*** nimish has joined #buildstream | 01:19 | |
*** toscalix has joined #buildstream | 01:56 | |
*** nimish has quit IRC | 03:18 | |
*** tristan has joined #buildstream | 05:49 | |
*** tristan has quit IRC | 06:52 | |
*** tristan has joined #buildstream | 07:02 | |
gitlab-br-bot | tristanvb opened MR !1212 (tristan/cleanup-workspace-tests->master: tests/frontend/workspace.py: Remove redundant and pointless tests) on buildstream https://gitlab.com/BuildStream/buildstream/merge_requests/1212 | 07:49 |
*** ChanServ sets mode: +o tristan | 08:22 | |
tristan | juergbi, Can I pick your brain for the infamous !1070 algorithm ? | 08:22 |
* tristan gropes around channel for the bot which should be linking !1070 | 08:24 |
juergbi | tristan: ah, I just commented before seeing this here | 08:24 |
juergbi | disclaimer, haven't followed it too closely | 08:25 |
tristan | juergbi, hold a sec, I can explain in human language | 08:27 |
tristan | Right... commented back, and seems like a good idea indeed | 08:29 |
tristan | juergbi, So here is the thing, Element.__update_state_recursively() gets called everywhere (except _fetch_done() which cannot resolve a key) instead of _update_state() | 08:29 |
juergbi | tristan: why artifact version bump? | 08:29 |
juergbi | we wouldn't have to store that recursive key | 08:30 |
juergbi | (nor want to) | 08:30 |
tristan | juergbi, Wouldn't the hash allow you to avoid recursing into dependencies while calculating the key ? | 08:30 |
tristan | Oh, so it's a different idea I didn't really comprehend completely | 08:30 |
tristan | I think | 08:30 |
juergbi | yes, but only avoiding repeated calls at runtime | 08:30 |
juergbi | runtime dep versions can change, we can't store something like this in the artifact | 08:30 |
juergbi | (which doesn't get rebuilt for runtime-only dep changes) | 08:31 |
tristan | Yeah at runtime, but I thought instead of using a "dependencies" list, we would slowly add data to the "dependencies hash" as we go, contributing to the key as we truck along | 08:32 |
juergbi | I would just recalculate it whenever something changes, otherwise it gets tricky | 08:32 |
juergbi | but we don't have to do things recursively all the time | 08:32 |
tristan | hmmm, just rereading this | 08:32 |
tristan | I'm not sure it handles the case where indirect reverse dependencies need to have their state updated | 08:33 |
juergbi | actually, we also have to update reverse runtime dependencies, not just reverse build dependencies | 08:33 |
juergbi | but the main point is that it would suffice to have a flat list of reverse dependencies | 08:34 |
tristan | juergbi, What I discovered, is that if Element.__update_state_recursively() only iterates over direct dependencies (build + runtime), then reverse dependencies of reverse runtime-only dependencies are missed | 08:34 |
juergbi | yes, that's what the new hash would fix | 08:34 |
tristan | I think you have a good idea | 08:35 |
*** toscalix has joined #buildstream | 08:35 | |
tristan | It would still have to be recursive when the hash of something changes | 08:36 |
juergbi | the state update would still be recursive, yes | 08:37 |
juergbi | but not driven from a single element | 08:37 |
juergbi | i.e., recurse only if the cache key or the new hash of the reverse dependency was updated | 08:37 |
juergbi | to bubble up | 08:37 |
tristan | And keep a list of *all* reverse dependencies, not only direct ones ? | 08:38 |
juergbi | no, just direct ones, but build+runtime | 08:38 |
juergbi | the recursion should happen on the method level, not by having indirect reverse dependencies stored | 08:38 |
juergbi | i.e., the basic structure of the current branch would still fit | 08:39 |
juergbi | but use Scope.ALL, recurse=False to construct reverse dependency lists | 08:40 |
juergbi | and then use that new hash as additional criterion in __update_state_recursively() | 08:41 |
tristan | Ok so I don't really understand what this hash is :-S | 08:42 |
tristan | It is a cache key of itself and its direct runtime dependencies | 08:43 |
tristan | Ah yes I see what it is now | 08:43 |
juergbi | self.recursive_runtime_hash = sha256(self.cache_key + [dep.recursive_runtime_hash for dep in self.dependencies(Scope.RUN, recurse=False)]) | 08:43 |
juergbi | or so | 08:43 |
juergbi | and it's undefined as long as any of the dependencies don't have a recursive_runtime_hash, of course | 08:44 |
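juergbi's one-liner above can be sketched as a small self-contained function. `Element` here is a hypothetical stand-in, not the real BuildStream class:

```python
import hashlib

class Element:
    """Hypothetical stand-in for a BuildStream element."""
    def __init__(self, cache_key=None, runtime_deps=()):
        self.cache_key = cache_key
        self.runtime_deps = list(runtime_deps)  # direct only (recurse=False)

def recursive_runtime_hash(element):
    # Undefined until the element's own cache key is resolved...
    if element.cache_key is None:
        return None
    h = hashlib.sha256(element.cache_key.encode())
    for dep in element.runtime_deps:
        dep_hash = recursive_runtime_hash(dep)
        # ...and undefined as long as any dependency's hash is undefined.
        if dep_hash is None:
            return None
        h.update(dep_hash.encode())
    return h.hexdigest()
```

Because each direct dependency's hash already folds in that dependency's own runtime dependencies, indirect runtime deps are covered implicitly, which is what makes `recurse=False` sufficient for the reverse-dependency lists.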
tristan | So basically it is a cache key which behaves differently, and does not omit the runtime-only gaps | 08:45 |
tristan | well, minus build-only dependencies, simply because we don't really need that if we're already observing the cache key changes | 08:45 |
juergbi | correct | 08:46 |
juergbi | theoretically, we'd only have to check runtime-only dependencies, but it's conceptually easier to cover all runtime dependencies | 08:46 |
tristan | Updated my original comment | 08:50 |
tristan | With a description of your idea in my own words after understanding it :) | 08:50 |
tristan | juergbi, Thanks I think that will solve it ! | 08:50 |
tristan | Just I still wonder... | 08:50 |
tristan | Is it enough ? | 08:50 |
tristan | The order is correct, I think it is enough | 08:51 |
juergbi | I think so, too | 08:52 |
tristan | juergbi, Maybe we could have a cache key which consists of every different kind of key (weak, strong, etc), and also considers direct runtime-only dependencies - like a "current state" hash | 08:52 |
juergbi | and I expect it to be lighter than the current branch but don't know how much memory/CPU is saved | 08:52 |
juergbi | weak is always a subset of strong | 08:52 |
tristan | And then we only ever have to check that one (i.e. just for coherency) | 08:52 |
juergbi | so that suggested hash would already cover everything | 08:53 |
tristan | Right, I mean we would *only* consult the current state hash, instead of consulting cache keys + the new hash | 08:53 |
tristan | making it... maybe conceptually lighter to swallow ? | 08:53 |
tristan | just an idea | 08:54 |
juergbi | agreed | 08:54 |
tristan | I wonder if I should hack it out, I feel bad for stringing benschubert along for so long now haha | 08:55 |
juergbi | it doesn't necessarily cover strict cache keys. but they might not be relevant for the condition | 08:55 |
tristan | benschubert, When you surface, let me know what you think of this idea and whether you will be able to work on it | 08:55 |
juergbi | the current branch should be correct as well, right? just maybe doing more than necessary | 08:56 |
tristan | Yes | 08:56 |
juergbi | however, if that's still an improvement compared to master, we could consider merging it already | 08:56 |
tristan | I don't know if the latest state of the branch corresponds to the numbers reported | 08:56 |
juergbi | right, should definitely be rechecked before | 08:57 |
tristan | I.e. it could be that the list of reverse dependencies being stored became recursive since the last benchmark | 08:57 |
juergbi | possible | 08:58 |
*** tristan has quit IRC | 09:02 | |
benschubert | I'm here! But I seem to have missed tristan -_-' | 09:30 |
benschubert | juergbi: I'm not sure I understand this idea of hashes | 09:30 |
benschubert | trying to catch up on the rest | 09:31 |
juergbi | benschubert: one way to look at it would be to define a new cache key variant that is like the strong cache key except that it takes all dependencies into account, not just build dependencies | 09:37 |
juergbi | I called it hash, not cache key, because it wouldn't be used as a key for the cache | 09:38 |
benschubert | juergbi: what would be the advantage compared to now? | 09:38 |
juergbi | it would allow use of recurse=False to construct the reverse dependency list | 09:39 |
benschubert | and what advantages would that bring? | 09:39 |
juergbi | we'd use that new hash as condition in __update_state_recursively() | 09:39 |
juergbi | recurse=False would avoid having to store indirect reverse dependencies in every element (quadratic memory/CPU) | 09:39 |
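juergbi's quadratic-scaling point is easy to see on a linear chain of elements (a purely illustrative model, not BuildStream code):

```python
def total_rdep_entries(n, recursive):
    """Total reverse-dependency entries stored across a linear chain of
    n elements, e1 <- e2 <- ... <- en (each depending on the previous)."""
    if recursive:
        # element i stores every element above it: n - i entries,
        # summing to n*(n-1)/2 overall
        return sum(n - i for i in range(1, n + 1))
    # element i stores only its single direct reverse dependency
    return n - 1
```

For a deep pipeline the difference is dramatic: a 1000-element chain stores 999 entries with direct-only lists versus 499500 with recursive lists.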
benschubert | juergbi: on essential-stack.bst from debian stretch, that results in 1MB of additional peak memory usage | 09:40 |
benschubert | (I haven't published those, oops) | 09:40 |
juergbi | as mentioned above, the current branch with Scope.BUILD, recurse=True should also be correct. if total performance is already better than master (and peak memory usage is not too high), my optimization shouldn't be a blocker | 09:41 |
benschubert | I should probably check for debian-stack.bst though | 09:41 |
*** tristan has joined #buildstream | 09:42 | |
*** jonathanmaw has joined #buildstream | 09:43 | |
*** ChanServ sets mode: +o tristan | 09:43 | |
* tristan catches backlog | 09:43 | |
benschubert | juergbi: just one quick check that I understand everything correctly: | 09:43 |
benschubert | 1) when an element has its weak_cache_key discovered, this allows us to discover the weak cache key of its reverse dependencies | 09:43 |
benschubert | 2) when an element has its strong_cache_key discovered, this allows us to discover the strong cache key of its reverse dependencies | 09:43 |
benschubert | Correct? | 09:43 |
benschubert | (having trouble waking up this morning) | 09:44 |
juergbi | benschubert: mostly but there are some exceptions | 09:44 |
benschubert | would you mind listing them? | 09:44 |
benschubert | workspace is one | 09:44 |
juergbi | if BST_STRICT_REBUILD is set, the weak cache of an element includes the strong cache keys of build dependencies | 09:45 |
juergbi | i.e., in that case 1) is not true, however, that shouldn't break the MR | 09:46 |
benschubert | yep | 09:47 |
*** raoul_ has quit IRC | 09:49 | |
tristan | benschubert, in short, the general advantage of only storing direct reverse dependencies is that you avoid recursing into dependencies which you know cannot be resolved yet (because the dependencies in between have not yet been resolved, so it would be impossible) - looking at __update_state_recursively(), it *appears* to be the intention | 09:50 |
tristan | Also they are much shorter lists (less memory consumption), and generally overall less iteration | 09:50 |
benschubert | So basically, if we didn't have _update_state(), we would have something like: | 09:50 |
benschubert | 1) on weak cache key update -> check updates on weak cache keys of rdeps that only depend on us for (RUNTIME if BST_STRICT_REBUILD else ALL) | 09:50 |
benschubert | 2) on strong cache key update -> check updates on weak cache keys of rdeps that depend on us for BUILD, if BST_STRICT_REBUILD | 09:50 |
benschubert | 3) on strong cache key update -> check updates on strong cache keys of rdeps | 09:50 |
benschubert | Correct? | 09:50 |
benschubert | tristan: if you look at __update_state_recursively(), it has a break if no key has changed, so it doesn't propagate up :) | 09:51 |
tristan | :-/ | 09:51 |
tristan | So do we somehow enforce the order of elements in the rdeps list ? | 09:52 |
benschubert | tristan: also, the problem with only storing direct BUILD deps is that runtime deps of runtime deps of build deps become way harder to treat | 09:52 |
tristan | If that is the intention, it would be worth a nice paragraph of comment around it | 09:52 |
benschubert | tristan: no, we don't need that. What happens is "did my cache key change? yes => propagate to rdeps, no? => we are done!" | 09:52 |
tristan | benschubert, storing only direct build dependencies is what I have been pointing out is incorrect since the beginning, runtime deps also need to be stored | 09:53 |
juergbi | benschubert: 1-3) yes, this sounds correct. there is also 'strict cache key', though, which may have to be considered | 09:53 |
benschubert | juergbi: then keeping a list of rdeps (we might need two, one for RUNTIME only, the other one for the rest), will allow more granular state propagation once _update_state is no more | 09:54 |
benschubert | juergbi: correct? | 09:54 |
tristan | I.e. you need *all* direct reverse deps for it to work, and the additional hash of the "cache key state" which juergbi proposes, fixes the issue of missing out on recursing into reverse runtime deps of deps | 09:54 |
benschubert | tristan: no, you don't need runtime deps, but runtime deps of BUILD deps, which is why we use scope=BUILD, recurse=TRUE, which gives us exactly that | 09:54 |
benschubert | tristan: because our runtime dependencies don't affect our cache key | 09:55 |
juergbi | benschubert: latest question: yes, that works but we might do too much work | 09:55 |
benschubert | juergbi: in what cases? | 09:55 |
juergbi | we iterate over reverse_dependencies which includes indirect runtime dependencies | 09:56 |
juergbi | and these indirect dependencies we might visit multiple times | 09:56 |
juergbi | because of the recursion of __update_state_recursively() itself | 09:56 |
tristan | benschubert, You are still speaking in terms of what dependencies are considered for the cache key of an element, I am speaking in terms of what direct reverse dependencies need to be stored to ensure that the recursive algorithm reaches all the deps which need to be updated | 09:56 |
benschubert | juergbi: agreed, we will visit them too many times, that is correct | 09:57 |
juergbi | i.e., as mentioned before, my new hash suggestion is just a possible optimization addressing this | 09:57 |
juergbi | I don't know how much the impact is in practice | 09:57 |
juergbi | *big | 09:57 |
benschubert | juergbi: I fail to see how it addresses this | 09:57 |
juergbi | the new hash allows reducing reverse_dependencies to the list of direct dependencies | 09:58 |
juergbi | (although we now have to include runtime dependencies as well) | 09:58 |
benschubert | juergbi: but removes the granularity of updates we could have if update_state was split, plus adds dependencies that are not needed | 09:58 |
tristan | That granularity is not worth it | 09:59 |
juergbi | it's not an alternative to splitting update_state() | 09:59 |
benschubert | tristan: why? | 09:59 |
tristan | And I don't believe it is true: in order to attempt that granularity, you are adding *all* the elements which might need to have their state recalculated as a result of a state update | 09:59 |
tristan | I.e. you are making the loop immensely more heavy duty, like deps * deps heavier | 10:00 |
tristan | And you will end up hitting those very few anyway | 10:00 |
juergbi | yes, the hash approach should scale a lot better | 10:00 |
tristan | At least it very much appears like this | 10:00 |
benschubert | tristan: so if instead of a recursive algorithm I had a list and kept an order, would that be satisfactory? | 10:01 |
juergbi | I can't estimate the impact but given that we're testing with huge element counts, I'd expect it to be relevant | 10:01 |
tristan | benschubert, that would be dangerous, the order of the list would be significant and delicate | 10:01 |
tristan | recurse one level at a time is much cleaner | 10:01 |
benschubert | tristan: we have a very simple order | 10:01 |
benschubert | tristan: that is the order in which the elements were created in _new_from_meta | 10:02 |
benschubert | so if we do in the constructor: __node_id = next(counter) as for MetaElement, we are done | 10:02 |
*** tpollard has joined #buildstream | 10:02 | |
tristan | benschubert, Exactly, that is delicate and honestly, I don't for a second trust anyone who jumps into the project today, to know that that order should remain the same | 10:02 |
benschubert | tristan: as I don't trust anybody jumping in the project today to know how _update_state works | 10:03 |
*** phildawson has joined #buildstream | 10:03 | |
tristan | (hence: if that is significant, that line of code is worth a comment that is a paragraph long) | 10:03 |
benschubert | I have no problem with that | 10:03 |
tristan | I do | 10:04 |
juergbi | I'm not sure this is even sufficient | 10:04 |
benschubert | juergbi: why? | 10:04 |
tristan | benschubert, I have a problem with overworking in the attempt to exclude a handful of elements while I am rather convinced that direct rdep recursion will work better than this anyway | 10:05 |
juergbi | I'm not sure whether the order is guaranteed to be topological if we include indirect runtime dependencies | 10:05 |
benschubert | juergbi: it is per the sorting we are doing in the loader. If it is not, then we have a bug in that place | 10:06 |
tristan | Staging order is topological, that is not the same as the order of the list of reverse dependencies | 10:06 |
tristan | it is only the order in which an element *adds itself* as a reverse dependency to its deps | 10:07 |
benschubert | tristan: and since we build the elements in topological order, the way they add themselves stays a topological order | 10:07 |
tristan | reasoning out whether that is a correct order is quite an expensive mental process | 10:07 |
benschubert | If you iterate over a topological tree, in topological order, and create lists of segments you go through, each segment will be a sub-topological order of its own | 10:08 |
tristan | So long as every line of code stays at exactly the same place, and care is taken in adding to the list (append/prepend significant) | 10:09 |
benschubert | as long as _new_from_meta doesn't change yes | 10:10 |
tristan | Perhaps, and even then, I wouldnt trust that it is without trying to reason it out; it kind of makes sense that it *should* be | 10:10 |
benschubert | but if _new_from_meta changes, then EVERY assumption about ordering also becomes false | 10:10 |
tristan | Right, in other words: Very complicated for no clear gain | 10:10 |
* Kinnison worries that we have behaviour which is this delicate, and for which we can't be confident the test suite will notify us if we break it | 10:10 | |
* juergbi agrees | 10:11 | |
* tristan only worries that we are adding to this delicateness | 10:12 | |
benschubert | tristan: Another approach that would be sane: as we call _set_required, add the current element to the list of every dep. That would assure us that the behavior is sane, but again it's more work than necessary | 10:12 |
tristan | I very much dislike having more than direct rdeps in the list, and also having the updates be recursive; and they have to be recursive anyway | 10:13 |
tristan | (A) it's an exponentially more complex loop, (B) most importantly, it causes the code to be harder to understand, (C) it's exponentially more expensive in memory consumption | 10:14 |
benschubert | if we have more than direct rdep, how do you handle runtime deps of runtime deps, without adding this hash that will make splitting _update_state harder, and prevent us from having sane optimizations in the future? | 10:14 |
tristan | But (B) is still at the top of my list | 10:14 |
tristan | benschubert, you add the hash, that propagation can still be safely split up by domain in the future | 10:15 |
benschubert | And C is not that significant, as I said, a 1MB increase out of 680MB, | 10:15 |
tristan | at least you start with the sane approach | 10:15 |
tristan | Ok here is the sensible consideration to make... | 10:17 |
tristan | All of this added complexity is in the hope of avoiding a descent into an element which might not need to have its state resolved "right now" | 10:18 |
tristan | In a reverse dependency | 10:18 |
benschubert | tristan: the hash, at least as how it is described in https://gitlab.com/BuildStream/buildstream/merge_requests/1070#note_148535495 doesn't fix anything, as we are still missing runtime deps of runtime deps. How would you fix that, without resorting to what I have, or having an update of everything? | 10:18 |
tristan | However, a saner, level-by-level recursion, storing only direct reverse deps, ensures that we don't get ahead of ourselves and call _update_state() unnecessarily | 10:19 |
tristan | Which we go to lengths to try to achieve with this approach of storing everything | 10:19 |
tristan | And in the end, it is a reverse dependency which is in the build plan somewhere, so its state *will* end up being calculated at some point | 10:19 |
juergbi | benschubert: the hash covers runtime deps of runtime deps | 10:20 |
tristan | In any case it will, maybe not at the same time, but ensuring we don't prematurely do it is the key | 10:20 |
tristan | (i.e. ensuring we don't call _update_state() until we know it could result in resolving a key) | 10:20 |
juergbi | benschubert: as the new recursive_runtime_hash would use the recursive_runtime_hash of the dependencies | 10:20 |
tristan | In the end, we get the same overall performance without all the headache | 10:20 |
juergbi | and thus, implicitly covering indirect dependencies | 10:20 |
benschubert | juergbi: so | 10:23 |
benschubert | 1) have __reverse_build_deps, containing all direct reverse dependencies that depend on us for either BUILD or BUILD and RUN | 10:23 |
benschubert | 2) have __reverse_runtime_deps, containing all direct reverse dependencies that depend on us for RUN only | 10:23 |
benschubert | 3) whenever we update our key, recurse in all __reverse_build_deps | 10:23 |
benschubert | 4) whenever we update our strong key, recurse in all __reverse_runtime_deps | 10:23 |
benschubert | Is that what you have in mind? | 10:23 |
juergbi | I would use a single list with build+runtime direct reverse dependencies | 10:24 |
juergbi | whenever the new hash changes, recurse | 10:24 |
tristan | Yeah, when you split up that propagation by domain, it is only the loops which change (different loops resolve different things) | 10:25 |
benschubert | juergbi: however, a weak cache key will never update a reverse_runtime_dep, correct? | 10:25 |
tristan | but the lists (or list) can mostly stay the same... that means we might check different hashes in the loops to decide whether or not to recurse | 10:25 |
benschubert | tristan: now, how does this fix calling multiple times the same update? | 10:26 |
juergbi | benschubert: if the weak cache key changes, either the strong cache key changes as well, or it's None | 10:26 |
juergbi | the big difference is that the reverse_dependencies list no longer includes indirect dependencies | 10:27 |
tristan | <benschubert> tristan: now, how does this fix calling multiple times the same update? <--- I don't compute what that means | 10:27 |
juergbi | so you only have recursion via __update_state_recursively(), and no longer implicit recursion by iterating through a list that already includes the indirect dependencies | 10:27 |
benschubert | juergbi: so that makes the recursion explicit, but actually introduces more work as we are now also updating runtime dependencies that might never be needed (if our element is a leaf in the current plan, correct?) and does not save any work otherwise since the calls will be done anyway | 10:29 |
benschubert | or am I missing something? | 10:30 |
juergbi | benschubert: right now the branch does both | 10:30 |
juergbi | i.e., explicit and implicit recursion | 10:30 |
juergbi | resulting in potential quadratic scaling | 10:30 |
juergbi | while with the hash we only have explicit recursion, although indeed covering some unneeded reverse dependencies | 10:30 |
juergbi | with large/deep element graphs, I'd expect the hash-based approach to win out | 10:31 |
tristan | Those unneeded reverse dependencies are only in the present loop | 10:31 |
tristan | They will probably be needed later anyway | 10:31 |
tristan | And if we want to avoid that completely, we can check if it is "required" in the loop and omit them | 10:31 |
tristan | in the off chance, but I *think* that by the nature of how we load, it can't happen that we won't want to calculate the keys | 10:32 |
tristan | I.e. we don't load all the possible reverse dependencies to start with, we only load dependencies whose keys we know we are going to want | 10:32 |
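The scheme juergbi and tristan converge on above could be sketched like this (all names are hypothetical, not BuildStream's internals): each element stores only its *direct* reverse dependencies, and a change in the recursive runtime hash is the gate that decides whether to keep bubbling up.

```python
import hashlib

class Element:
    """Minimal stand-in for a BuildStream element (illustrative only)."""
    def __init__(self, name, runtime_deps=()):
        self.cache_key = name + "-v1"
        self.runtime_deps = list(runtime_deps)
        self.reverse_deps = []            # direct reverse deps only
        for dep in self.runtime_deps:
            dep.reverse_deps.append(self)
        self.hash = None                  # the recursive runtime hash
        self.visits = 0                   # instrumentation for the sketch

    def compute_hash(self):
        h = hashlib.sha256(self.cache_key.encode())
        for dep in self.runtime_deps:
            if dep.hash is None:
                return None               # undefined until deps resolve
            h.update(dep.hash.encode())
        return h.hexdigest()

def update_state_recursively(element):
    # Recurse into *direct* reverse dependencies, and only when this
    # element's hash actually changed -- unchanged subgraphs are pruned.
    element.visits += 1
    old = element.hash
    element.hash = element.compute_hash()
    if element.hash == old:
        return
    for rdep in element.reverse_deps:
        update_state_recursively(rdep)
```

The explicit recursion happens one level at a time, so an element is revisited only when something it runs against really changed, rather than once per entry in a flattened indirect list.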
benschubert | Ok another proposal: | 10:33 |
benschubert | Instead of building this list at that moment, we actually build it in _set_required, that way we have a minimal graph and maximum gains and only direct deps | 10:33 |
benschubert | and this actually improves even more stuff | 10:33 |
benschubert | as we are not computing cache keys of things that are not yet required | 10:33 |
tristan | Maybe, depending on the meaning of required | 10:33 |
tristan | I.e. even if we are not going to need some elements across a build-only boundary, that doesn't necessarily mean we don't want to include their cache keys in a report, if those cache keys can be known | 10:34 |
benschubert | tristan: but then those cache keys would have been computed before right? | 10:35 |
tristan | benschubert, But yes in general I would be amenable to building the list of direct rdeps lazily at _set_required() time | 10:35 |
tristan | that does sound sensible | 10:35 |
benschubert | or that would mean removing my commit that checks the state at the end | 10:35 |
benschubert | ok | 10:35 |
benschubert | I'll see what this gives | 10:35 |
*** raoul has joined #buildstream | 10:40 | |
tristan | benschubert, One more point about that: it might not really make a difference whether we load that lazily or not - because we might assume that we are only going to be calling _update_state() on elements that we are processing | 10:41 |
tristan | I mean, that might be helpful if attempting to do it lazily turns out to be a weird and complex eyesore | 10:41 |
benschubert | tristan: but that's not the case, correct? | 10:41 |
benschubert | so we are always updating everything | 10:42 |
tristan | benschubert, I think it is the case | 10:42 |
benschubert | whereas here, we would have the smallest, flattest case | 10:42 |
tristan | benschubert, if you are iterating over reverse deps on an element already, you will not reach something that is not "required" | 10:42 |
tristan | benschubert, unrequired things are lower in the tree (cut off by build-only dependency boundaries), not higher (closer to the targets) | 10:43 |
benschubert | tristan: ah true | 10:43 |
benschubert | tristan: however doing it at this moment has two advantages: | 10:44 |
benschubert | 1) Don't create the lists when we do not need them | 10:44 |
benschubert | 2) We get the flattened list for free | 10:44 |
tristan | Yeah | 10:44 |
tristan | I'm just pointing it out in case it turns out to be weird code | 10:44 |
benschubert | right | 10:45 |
tristan | Like, when you set something required, you can easily do it recursively... _set_required() is already called for each element individually | 10:45 |
tristan | and without recursion, you don't have the rdeps in context right ? | 10:45 |
tristan | anyway, I'm not sure how that can work | 10:45 |
benschubert | tristan: I'll need to check | 10:46 |
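The lazy construction benschubert proposes (building the direct rdep lists as elements are marked required) might look roughly like this; all names are illustrative, not the real BuildStream internals:

```python
class Element:
    """Minimal stand-in for a BuildStream element (illustrative only)."""
    def __init__(self, name, build_deps=(), runtime_deps=()):
        self.name = name
        self.build_deps = list(build_deps)
        self.runtime_deps = list(runtime_deps)
        self.reverse_deps = []   # direct reverse deps, built lazily
        self.required = False

    def set_required(self):
        # Sketch of the proposal: when an element becomes required,
        # register it as a direct reverse dependency of each dep and
        # propagate requiredness downward. Elements that are never
        # marked required never get reverse-dependency entries at all.
        if self.required:
            return
        self.required = True
        for dep in self.build_deps + self.runtime_deps:
            dep.reverse_deps.append(self)
            dep.set_required()
```

As tristan notes, unrequired elements sit below build-only boundaries rather than above them, so iterating reverse deps from a required element should never reach an unrequired one anyway; this sketch just avoids allocating lists for the unrequired subgraph.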
gitlab-br-bot | marge-bot123 closed issue #922 (Make it easy to test BuildStream against external plugins) on buildstream https://gitlab.com/BuildStream/buildstream/issues/922 | 10:56 |
gitlab-br-bot | marge-bot123 merged MR !1158 (phil/external-plugin-testing->master: Make it easy to test BuildStream against external plugins) on buildstream https://gitlab.com/BuildStream/buildstream/merge_requests/1158 | 10:56 |
*** cs-shadow has joined #buildstream | 11:15 | |
jonathanmaw | cs-shadow: WSalmon says you'd be the person to ask about pypi credentials. I'm looking at https://gitlab.com/BuildStream/bst-external/merge_requests/71 and it'll need a username and password in pypi to be useful | 11:18 |
gitlab-br-bot | BenjaminSchubert approved MR !1211 (jennis/move_node_get_project_path->master: Cleanup: Move _yaml.node_get_project_path() to Project._get_path_from_node()) on buildstream https://gitlab.com/BuildStream/buildstream/merge_requests/1211 | 11:18 |
jonathanmaw | is there an account buildstream uses for uploading to pypi already? | 11:18 |
cs-shadow | jonathanmaw: not yet, we've been using our personal accounts so far. Although it definitely seems a worthwhile idea to get a buildstream-bot account | 11:19 |
cs-shadow | I'd imagine that can be shared by the buildstream project and various external plugin repos as well | 11:19 |
WSalmon | cs-shadow, did we get as far as having a "group" that some individual accounts are linked to? | 11:24 |
*** jonathanmaw has quit IRC | 11:29 | |
cs-shadow | WSalmon: I don't think PyPI supports the notion of groups, unlike DockerHub | 11:30 |
cs-shadow | having a buildstream-bot account that has a secondary email id of more than one person is as close as it is going to get AFAIK | 11:31 |
*** toscalix has quit IRC | 11:35 | |
*** toscalix has joined #buildstream | 11:37 | |
*** jonathanmaw has joined #buildstream | 11:38 | |
*** phildawson_ has joined #buildstream | 11:52 | |
*** phildawson has quit IRC | 11:52 | |
benschubert | tristan: my first test with set_required fails when workspaces are involved. I don't have time to look at this now, I'll postpone it a bit :/ | 12:00 |
benschubert | interesting. Running bst build on a project has so far printed "Resource temporarily unavailable" 89k times because we are spawning too many subprocesses a second | 12:25 |
tristan | benschubert, I believe there is a report for that | 12:32 |
* tristan recalls filing one a long time ago, it just doesn't happen in practice so I guess we're still not catching the exception | 12:32 | |
benschubert | It does happen in practice. That was our production system | 12:33 |
tristan | Ohhh maybe with remote execution, or a lot of --builders and a very horizontal graph ? | 12:33 |
tristan | benschubert, https://gitlab.com/BuildStream/buildstream/issues/93 | 12:34 |
tristan | 93 ! | 12:34 |
tristan | haha small number | 12:34 |
benschubert | 114'000 times for a 6k pipeline, that seems excessive | 13:28 |
*** nimish has joined #buildstream | 14:12 | |
*** toscalix has quit IRC | 14:15 | |
*** toscalix has joined #buildstream | 14:15 | |
*** jonathanmaw has quit IRC | 14:15 | |
*** slaf has quit IRC | 14:26 | |
gitlab-br-bot | jmacarthur approved MR !1192 (juerg/remote-execution-cas->master: Improve remote execution) on buildstream https://gitlab.com/BuildStream/buildstream/merge_requests/1192 | 14:42 |
gitlab-br-bot | martinblanchard approved MR !1192 (juerg/remote-execution-cas->master: Improve remote execution) on buildstream https://gitlab.com/BuildStream/buildstream/merge_requests/1192 | 14:43 |
*** nimish has quit IRC | 14:47 | |
*** nimish has joined #buildstream | 14:49 | |
gokcennurlu | I am confused about the `tar` source plugin | 14:50 |
jmac | What's the issue? I had some dealings with that recently | 14:51 |
gokcennurlu | I want to keep the folders intact, but buildstream is flattening the output | 14:51 |
jmac | That's odd, it shouldn't | 14:51 |
gokcennurlu | I test my tar with `tar -xvf` and it works, but can't do it with buildstream. I might be missing something, I will dig a bit more | 14:52 |
gokcennurlu | but | 14:52 |
gokcennurlu | The base_dir raised my eyebrow, here https://gitlab.com/BuildStream/buildstream/blob/master/buildstream/plugins/sources/tar.py#L75 | 14:52 |
gokcennurlu | my output is also missing some files. I will try inserting some print statements in `_find_base_dir` etc. | 14:59 |
jmac | Yeah, find_base_dir looks a bit suspicious to me | 15:00 |
jmac | Does your tarfile extract (in bst) if you specify base-dir: '' ? | 15:02 |
*** raoul has quit IRC | 15:03 | |
gokcennurlu | no, nothing. | 15:05 |
gitlab-br-bot | phildawson approved MR !1212 (tristan/cleanup-workspace-tests->master: tests/frontend/workspace.py: Remove redundant and pointless tests) on buildstream https://gitlab.com/BuildStream/buildstream/merge_requests/1212 | 15:06 |
gokcennurlu | ver 1.3.0 by the way. | 15:06 |
gokcennurlu | jmac: sorry, that fixed it, yes. | 15:11 |
gokcennurlu | Thank you! | 15:14 |
jmac | gokcennurlu: np, I don't think the base-dir option is particularly clear | 15:16 |
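For reference, the workaround jmac suggested would look something like this in the element's source declaration (the alias/URL is illustrative; an empty base-dir disables the automatic base-directory guessing so the archive's top-level layout is staged as-is):

```yaml
sources:
- kind: tar
  url: upstream:archive.tar.gz
  # An empty base-dir stages the archive root itself, instead of
  # descending into a guessed top-level directory.
  base-dir: ''
```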
*** slaf has joined #buildstream | 15:16 | |
*** raoul has joined #buildstream | 15:21 | |
*** slaf has quit IRC | 15:21 | |
*** slaf has joined #buildstream | 15:21 | |
*** nimish has quit IRC | 15:29 | |
gitlab-br-bot | marge-bot123 closed issue #935 (Remote Execution: build command failures hard to diagnose (always 'Directory not found')) on buildstream https://gitlab.com/BuildStream/buildstream/issues/935 | 15:42 |
gitlab-br-bot | marge-bot123 merged MR !1192 (juerg/remote-execution-cas->master: Improve remote execution) on buildstream https://gitlab.com/BuildStream/buildstream/merge_requests/1192 | 15:42 |
benschubert | how can I ensure I'm not caching build trees? | 15:45 |
tpollard | --cache-buildtrees never on the cli | 15:46 |
tpollard | or cache-buildtrees: never in user.conf | 15:47 |
benschubert | tpollard: I have cache-buildtrees: never in user.conf, but the size of my cache didn't go down (I wiped it entirely and rebuilt my stuff) | 15:47 |
tpollard | under the cache: section? | 15:48 |
benschubert | tpollard: under the cache: section in my configuration file provided as --config when running buildstream | 15:48 |
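For anyone following along, the configuration being described would look roughly like this (a fragment of a BuildStream user configuration file, as discussed above; only the cache section is shown):

```yaml
# Fragment of a user configuration file passed via --config.
cache:
  cache-buildtrees: never
```

The same behaviour can be requested per invocation with `--cache-buildtrees never` on the command line, as mentioned above.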
raoul | juergbi fyi, I think the remaining discussion points on !1124 are fixed, though I've left some open as I'm not 100% sure they're resolved | 15:49 |
gitlab-br-bot | MR !1124: Source cache https://gitlab.com/BuildStream/buildstream/merge_requests/1124 | 15:49 |
juergbi | ok, ta, will take a look asap, but will probably be Monday | 15:49 |
raoul | cheers, shall put up an MR for the remote source cache soon too as that's gone surprisingly smoothly | 15:50 |
juergbi | great | 15:50 |
tpollard | benschubert: what happens if you try open a build shell on one of the elements that would normally have a buildtree, with --use-buildtree? | 15:51 |
tpollard | There's integration tests for cache-buildtrees via cli and user conf | 15:51 |
tpollard | if you've got an empty buildtree cached, it should warn that it's empty when staging it | 15:52 |
tpollard | that should be covered by tests too | 15:53 |
benschubert | tpollard: I'll have a look | 15:53 |
jmac | tpollard: Does the --config option to bst specify project config or user config? | 15:54 |
tpollard | user iirc | 15:55 |
jmac | That should be fine then | 15:56 |
tpollard | the only other thing I can think is that none of the elements have a build-root, so the buildtree is irrelevant | 16:00 |
mablanch | juergbi: [WIP] CI job testing BuildStream remote execution against BuildGrid for your just-merged changes in !1192 (triggered manually): https://gitlab.com/BuildGrid/buildgrid.hub.docker.com/-/jobs/174298350 | 16:00 |
gitlab-br-bot | MR !1192: Improve remote execution https://gitlab.com/BuildStream/buildstream/merge_requests/1192 | 16:00 |
benschubert | tpollard: not sure I understand | 16:01 |
juergbi | mablanch: great, ta | 16:02 |
tpollard | The buildtree will always be empty regardless of config if the element being cached doesn't have a build-root benschubert | 16:02 |
benschubert | tpollard: ah, so if I didn't have it before, it wouldn't change anything correct? | 16:03 |
tpollard | benschubert: afaict yes | 16:03 |
tpollard | setting cache-buildtrees: never just enforces this behaviour for all elements | 16:04 |
*** nimish has joined #buildstream | 16:04 | |
tpollard | this will be handled more clearly once we move to the proto (which will define whether an artifact includes a buildtree, and whether it was originally created with one) | 16:07 |
*** nimish has quit IRC | 16:09 | |
*** nimish has joined #buildstream | 16:09 | |
gitlab-br-bot | cs-shadow opened MR !1213 (chandan/container-plugins->master: doc/source/core_plugins.rst: Add link to bst-plugins-container) on buildstream https://gitlab.com/BuildStream/buildstream/merge_requests/1213 | 16:13 |
*** nimish has quit IRC | 16:29 | |
*** nimish has joined #buildstream | 16:35 | |
*** nimish has quit IRC | 16:40 | |
*** nimish has joined #buildstream | 16:40 | |
*** toscalix has quit IRC | 16:43 | |
*** nimish has quit IRC | 16:45 | |
*** nimish has joined #buildstream | 16:45 | |
gitlab-br-bot | raoul.hidalgocharman opened MR !1214 (raoul/440-source-cache-remotes->master: Remote source cache) on buildstream https://gitlab.com/BuildStream/buildstream/merge_requests/1214 | 16:46 |
*** nimish has quit IRC | 17:00 | |
*** nimish has joined #buildstream | 17:01 | |
*** nimish has quit IRC | 17:06 | |
*** nimish has joined #buildstream | 17:21 | |
*** tpollard has quit IRC | 17:25 | |
*** nimish has quit IRC | 17:26 | |
*** nimish has joined #buildstream | 17:27 | |
*** nimish has quit IRC | 17:32 | |
*** nimish has joined #buildstream | 17:32 | |
benschubert | tristan: I tried with a PQ and the ordering instead of the _set_required part. This is what it would give: https://gitlab.com/BuildStream/buildstream/merge_requests/1070/diffs?commit_id=828510a8bbb6eeffdbfd05afd9998cdf2768377a (I can obviously comment/clean up the code) I'll also run benchmarks on this to see what we gain | 17:33 |
benschubert | tristan: let me know if this way would be a big no or not | 17:34 |
*** nimish has quit IRC | 17:37 | |
*** nimish has joined #buildstream | 17:48 | |
*** raoul has quit IRC | 17:51 | |
*** nimish has quit IRC | 17:58 | |
*** nimish has joined #buildstream | 17:58 | |
tristan | benschubert, it's a bit late here, I don't understand why the aversion to only storing direct reverse dependencies | 18:03 |
tristan | looks more complicated than it needs to be | 18:04 |
benschubert | tristan: ok I'll have a look with having both too | 18:05 |
benschubert | and benchmark all three solutions on a bigger dataset | 18:06 |
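The priority-queue idea mentioned above can be sketched generically (plain Python, not BuildStream's actual classes; the element names and the depth-based ordering are invented for illustration):

```python
import heapq
from functools import lru_cache

# Toy dependency graph: element -> direct dependencies.
deps = {
    "app.bst": ["lib.bst", "base.bst"],
    "lib.bst": ["base.bst"],
    "base.bst": [],
}

@lru_cache(maxsize=None)
def depth(element):
    """Topological depth: 0 for leaves, 1 + max(dep depths) otherwise."""
    return 1 + max((depth(d) for d in deps[element]), default=-1)

# Priority queue ordered by depth: dependencies are always
# popped before the elements that depend on them, so each
# element's state update sees up-to-date dependency state.
pq = [(depth(e), e) for e in deps]
heapq.heapify(pq)

order = []
while pq:
    _, element = heapq.heappop(pq)
    order.append(element)

print(order)  # -> ['base.bst', 'lib.bst', 'app.bst']
```

The point of the ordering is that no element is visited before all of its dependencies, which is what makes a single non-recursive pass per element possible.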
tristan | Honestly, the whole patch fixes an important bug *and* makes the code more sensible in general, if it doesnt make things slower, that is a good thing overall | 18:07 |
tristan | Also, I would prefer looking into splitting up update state details into more granular updates, before trying more complicated things | 18:08 |
benschubert | tristan: because even those updates won't be sufficient to get acceptable performance on medium-large projects | 18:10 |
benschubert | but yes, we can go per step | 18:10 |
*** nimish has quit IRC | 18:33 | |
*** nimish has joined #buildstream | 18:49 | |
*** nimish has quit IRC | 18:54 | |
*** nimish has joined #buildstream | 18:55 | |
tristan | benschubert, I think we can eliminate calls to Element.dependencies() in cache key calculation with this reverse deps thing as a first step | 18:59 |
tristan | i.e. since we're informing reverse deps in order to trigger changes, we could be passing the cache keys down one level at a time, and cumulatively constructing the dependency portion of the cache keys | 19:00 |
tristan | that should be a good win and scale better than recursively calculating them | 19:01 |
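A minimal sketch of what tristan describes above: each element's key folds in only its *direct* dependencies' keys, and because those keys already incorporate their own dependencies, the dependency portion of the key accumulates one level at a time with no recursive graph walk (plain Python; the graph, names and hashing scheme are invented for illustration, this is not BuildStream's Element API):

```python
import hashlib

# Toy graph: element -> direct dependencies.
deps = {
    "app.bst": ["lib.bst"],
    "lib.bst": ["base.bst"],
    "base.bst": [],
}

keys = {}

def resolve(element):
    """Compute a cache key from the direct dependencies' keys only.

    Each dependency key already folds in that dependency's own
    dependencies, so one level of lookups suffices -- no recursion.
    """
    h = hashlib.sha256()
    h.update(element.encode())
    for dep in sorted(deps[element]):
        h.update(keys[dep].encode())  # assumes deps were resolved first
    keys[element] = h.hexdigest()
    return keys[element]

# Resolve in dependency order (deepest first), as a scheduler would
# when informing reverse dependencies of newly resolved keys.
for element in ("base.bst", "lib.bst", "app.bst"):
    resolve(element)

# A change in base.bst ripples into app.bst's key via lib.bst's key,
# even though app.bst never inspected base.bst directly.
```

This is the "cumulatively constructing the dependency portion of the cache keys" idea: the cost per element is proportional to its direct dependencies, not to the size of its whole dependency subtree.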
*** nimish has quit IRC | 19:05 | |
*** nimish has joined #buildstream | 19:05 | |
*** nimish has joined #buildstream | 19:05 | |
*** nimish has quit IRC | 19:08 | |
*** nimish has joined #buildstream | 19:18 | |
*** nimish has quit IRC | 19:23 | |
*** tristan has quit IRC | 19:43 | |
benschubert | tristan: that is true and indeed better | 19:58 |
*** slaf has quit IRC | 20:53 | |
*** slaf has joined #buildstream | 21:12 | |
*** cs-shadow has quit IRC | 21:40 |
Generated by irclog2html.py 2.17.3 by Marius Gedminas - find it at https://mg.pov.lt/irclog2html/!