gitlab-br-bot | buildstream: merge request (issue-21_Caching_build_trees->master: WIP: Issue 21 caching build trees) #372 changed state ("opened"): https://gitlab.com/BuildStream/buildstream/merge_requests/372 | 00:10 |
*** Prince781 has joined #buildstream | 00:20 | |
*** jsgrant has joined #buildstream | 01:24 | |
*** jsgrant has quit IRC | 01:26 | |
*** Prince781 has quit IRC | 01:50 | |
*** Prince781 has joined #buildstream | 02:05 | |
*** Prince781 has quit IRC | 02:42 | |
*** tristan has joined #buildstream | 05:06 | |
gitlab-br-bot | buildstream: merge request (test-run-doc->master: HACKING.rst: Add integration and pytest notes) #319 changed state ("opened"): https://gitlab.com/BuildStream/buildstream/merge_requests/319 | 06:49 |
gitlab-br-bot | buildstream: merge request (test-run-doc->master: HACKING.rst: Add integration and pytest notes) #319 changed state ("merged"): https://gitlab.com/BuildStream/buildstream/merge_requests/319 | 07:09 |
gitlab-br-bot | buildstream: merge request (milloni/assert-bwrap-version->master: Assert minimum bwrap version on host) #379 changed state ("opened"): https://gitlab.com/BuildStream/buildstream/merge_requests/379 | 07:10 |
*** toscalix has joined #buildstream | 07:57 | |
tristan | juergbi, ok so I first fixed that "running files" serialization, with the thought in mind that serializing where sources were staged relative to the sandbox in the workspaces.yml was the right solution for `bst shell --sysroot` | 08:10 |
tristan | fixing that serialization of writes was important, but now I think maybe I had the wrong solution for `bst shell --sysroot` | 08:10 |
tristan | Consider that; first of all we want to cache artifacts of failed builds | 08:11 |
tristan | Also consider that... a failed build sandbox is probably a very specific state | 08:11 |
tristan | juergbi, so I'm starting to think that; instead, after a build fails and we intend to give the user a sysroot snapshot of that build failure... maybe we should just be copying the workspace files into the mount location, and not mount at `bst shell --sysroot` time ? | 08:13 |
tristan | It would require I guess a utils.copy_files(...preserve_mtime=True) semantic | 08:13 |
tristan | any thoughts on that ? | 08:13 |
tristan | for reference: https://gitlab.com/BuildStream/buildstream/issues/346 <-- that issue | 08:14 |
juergbi | tristan: hm, that would be a user-visible change, though | 08:15 |
gitlab-br-bot | buildstream: merge request (milloni/assert-bwrap-version->master: Assert minimum bwrap version on host) #350 changed state ("closed"): https://gitlab.com/BuildStream/buildstream/merge_requests/350 | 08:15 |
juergbi | users might rely on read-write workspaces | 08:15 |
juergbi | but maybe they shouldn't | 08:16 |
juergbi | this would be in line with my idea of separating workspace sources and build directory | 08:16 |
juergbi | from that perspective it sounds fine to me. just wondering whether there are use cases where people really need the mounted workspace | 08:17 |
tristan | juergbi, I think a user visible change was that `bst shell --sysroot` became completely broken when we started mounting the workspaces into the sandbox | 08:18 |
tristan | juergbi, the question is rather what is the appropriate fix; I feel like the appropriate fix is to give the user a sysroot that is actually representative of what failed | 08:19 |
tristan | not a sysroot that "might have changed because the user edited a workspace file" | 08:19 |
juergbi | true, I think that makes sense | 08:19 |
tristan | which would be more in line with what we will be doing once we solve caching failed builds | 08:20 |
juergbi | for the sysroot case | 08:20 |
juergbi | what about bst shell without sysroot? | 08:20 |
tristan | i.e. when we cache failed builds, we should be removing the sysroots completely I think, and always cleaning up the build directory | 08:20 |
tristan | bst shell without sysroot is not broken | 08:20 |
juergbi | yes, so we keep mounting it there, right? | 08:21 |
tristan | it stages everything on demand, delegates to Element.stage() and everything | 08:21 |
tristan | right, we keep mounting it there | 08:21 |
juergbi | so that will still need the running_files update | 08:21 |
*** segfault3[m] has joined #buildstream | 08:21 | |
tristan | juergbi, right: https://gitlab.com/BuildStream/buildstream/blob/master/buildstream/element.py#L522 :) | 08:22 |
tristan | interestingly, that code path cannot get hit with `--sysroot` anyway | 08:22 |
tristan | because nothing is staged when shelling into an existing sysroot | 08:23 |
segfault3[m] | Hi, I'm having issues with buildstream. I tried building both gtk+ and GNOME Shell on Debian Stable and Debian Testing, it always fails with this message: | 08:23 |
segfault3[m] | FAILURE [base/cyrus-sasl.bst]: tar source at base/cyrus-sasl.bst [line 48 column 4]: Error mirroring ftp://ftp.cyrusimap.org/cyrus-sasl/cyrus-sasl-2.1.26.tar.gz: <urlopen error ftp error: error_perm('500 OOPS: failed to open vsftpd log file:/var/log/vsftpd.log',)> | 08:23 |
tristan | segfault3[m], that looks like a legitimate error | 08:24 |
tristan | segfault3[m], i.e. you're getting 500 from the upstream site | 08:25 |
tristan | segfault3[m], wget ftp://ftp.cyrusimap.org/cyrus-sasl/cyrus-sasl-2.1.26.tar.gz is also reporting errors for me | 08:25 |
tristan | https://bpaste.net/show/0aae6b299b9c | 08:26 |
segfault3[m] | ok, thanks, I will ask the cyrusimap devs then | 08:27 |
tristan | segfault3[m], this is a typical issue, and we're currently working on a mirroring solution here https://gitlab.com/BuildStream/buildstream/issues/328 | 08:27 |
tristan | so we hope that we can setup alternative mirrors and easily enable mirroring such that we can fallback to alternative mirrored locations | 08:27 |
segfault3[m] | sounds good | 08:28 |
tristan | segfault3[m], oh, if you have a means of actually obtaining the tarball, a workaround would be to locally set the 'url' field in `base/cyrus-sasl.bst` of it to a local file:// url | 08:31 |
tristan | ah, we already use shutil.copystat() in safe_copy(), so no need to explicitly preserve mtime | 08:47 |
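For reference, a minimal sketch of what a copy helper that preserves timestamps could look like, in case the copy-instead-of-mount approach is taken for `bst shell --sysroot`. The function name and layout here are made up for illustration; as noted above, BuildStream's real safe_copy() already gets this behaviour from shutil.copystat(), and symlink handling is omitted for brevity.

```python
import os
import shutil

def copy_files_preserving_mtime(src_dir, dest_dir):
    """Copy a directory tree, keeping file timestamps intact.

    shutil.copy2() copies file contents and then applies shutil.copystat(),
    which carries over mtime (and permission bits), so files in the
    destination do not all look "newer" than the originals.
    """
    for root, dirs, files in os.walk(src_dir):
        rel = os.path.relpath(root, src_dir)
        target_root = os.path.join(dest_dir, rel)
        os.makedirs(target_root, exist_ok=True)
        for name in files:
            shutil.copy2(os.path.join(root, name),
                         os.path.join(target_root, name))
        # Also preserve the directory's own timestamps/permissions
        shutil.copystat(root, target_root)
```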
gitlab-br-bot | buildstream: merge request (135-expire-artifacts-in-local-cache->master: WIP: Resolve "Expire artifacts in local cache") #347 changed state ("opened"): https://gitlab.com/BuildStream/buildstream/merge_requests/347 | 08:57 |
segfault3[m] | tristan: I was able to download the tarball from https://www.cyrusimap.org/releases/cyrus-sasl-2.1.26.tar.gz, but I can't find the cyrus-sasl.bst on my system | 09:02 |
jmac | I've had this problem | 09:03 |
jmac | Cyrus-sasl's ftp site has been down for several days | 09:04 |
jmac | The bst file should be sdk/elements/base/cyrus-sasl.bst | 09:04 |
jmac | Actually that's freedesktop-sdk, which you may not be using | 09:05 |
segfault3[m] | I'm using gnome-build-meta, it only has freedesktop-sdk-junction.bst and linker-priority.bst in elements/base/ | 09:06 |
*** jonathanmaw has joined #buildstream | 09:06 | |
gitlab-br-bot | buildstream: merge request (tristan/fix-shell-worksace-files->master: Fix shelling into a failed build sandbox with workspaces active) #380 changed state ("opened"): https://gitlab.com/BuildStream/buildstream/merge_requests/380 | 09:12 |
tristan | juergbi, wanna take a look at https://gitlab.com/BuildStream/buildstream/merge_requests/380 ? | 09:12 |
juergbi | sure | 09:13 |
tristan | oops linting error, I'll have to repush, but it's certainly something minor | 09:15 |
tristan | maybe a line too long | 09:15 |
* tristan will look into optimizing "Calculating cached state" and hopefully improve that today | 09:15 | |
tlater | juergbi: I'm not sure how to tackle concurrency on !347 - if we want to take advantage of cached artifact cache sizes, and don't want to accidentally start deleting artifacts from another element's build, the data in the main process needs to be updated as fast as possible | 09:16 |
tlater | So an approach like how tristan dealt with the workspace metadata doesn't seem like it's going to work | 09:16 |
tristan | tlater, first of all, half of your problem doesnt make sense | 09:16 |
tristan | tlater, all of the artifacts which you are going to create in a build should be pinned anyway | 09:17 |
tristan | it may be a corner case for workspace artifacts, which should be added to the "pinned" list as soon as the cache key is discovered | 09:17 |
tristan | tlater, juergbi; I'm not sure I like the idea of abusing the IPC and synchronizing things for the purpose of LRU expiry | 09:19 |
tristan | it sounds like this would fit better into the existing architecture of queues | 09:19 |
tristan | tlater, if you had an entirely separate task, whose job was just to cache an already built artifact, where the dispatch mechanism has some opportunity to reconcile, wouldnt that make things easier ? | 09:20 |
tlater | Yeah, I do think so, dispatching "cache" tasks would allow synchronizing cache sizes after each operation, instead of between processes | 09:20 |
juergbi | tristan: !380 makes sense to me. I'm wondering whether this might be a bit expensive for git sources | 09:21 |
tristan | *maybe* treating the main process as a sort of "cache size counter server" to tasks, and doing IPC from task -> parent would also work | 09:21 |
tristan | juergbi, it might be expensive for linux.git | 09:22 |
tristan | re the above, looking at program flow: Compare a bunch of child processes doing assemble, and all wanting the attention of the main process to periodically negotiate a cache size counter... | 09:22 |
juergbi | tlater: can't we do it in _assemble_done? | 09:22 |
juergbi | in the main process | 09:22 |
tristan | with a program flow where caching the artifacts is done in a separate queue | 09:23 |
tlater | juergbi: Wouldn't we have synchronization issues then, where some processes think the cache is larger than it is? | 09:23 |
tristan | juergbi, you want to hold up the main process with deletion of artifacts ? | 09:23 |
tristan | actually, that doesnt work I think | 09:23 |
juergbi | no, just pin the cache key for workspaces | 09:23 |
tristan | you need to delete artifacts from the cache to make space for new incoming artifacts | 09:23 |
tlater | Ah, I see what you mean | 09:23 |
tristan | ah, that part yes | 09:23 |
juergbi | ah, synchronization of the deletion is still an issue | 09:23 |
tlater | Yeah :) | 09:24 |
juergbi | the thing is, committing to ostree doesn't actually consume additional space | 09:24 |
juergbi | if the build result already exists | 09:24 |
tristan | tlater, it's tempting to go with "Add some IPC sugar, because I dont want to split up this big task into two tasks" | 09:24 |
tristan | tlater, I dont want us to be lazy that way; instead think of what is the cleaner program flow | 09:25 |
juergbi | so maybe keep committing in the build job but have a separate (serial) queue for deletion if necessary? | 09:25 |
tristan | if it really is cleaner to add some IPC (I feel like it's doubtful, though), then we go that route | 09:25 |
tristan | But "Lets fix this while modifying the least number of lines of code" is how we get into trouble | 09:26 |
juergbi | I don't see a point of splitting out committing itself | 09:26 |
tristan | juergbi, seeing your point | 09:26 |
tristan | juergbi, it's not going to take additional disk space to commit the artifact and delete the build directory; or is it ? | 09:27 |
tristan | juergbi, when we commit an artifact, does ostree retain hardlinks to the sources ? | 09:27 |
tristan | or can it grow ? | 09:27 |
juergbi | there will be a very short peak as we delete it after committing | 09:27 |
juergbi | no, ostree doesn't take hardlinks to keep things safe | 09:27 |
juergbi | iirc | 09:27 |
tristan | right, that makes sense, so technically, we bust our limits if we take that approach, and we are kind of lying about the quota | 09:28 |
juergbi | theoretically, we could commit and delete on a file by file basis | 09:28 |
* tlater doesn't like this for the case in which we're limited by disk space | 09:28 | |
juergbi | so the peak would just be a single file | 09:28 |
juergbi | longer term with FUSE we might actually be able to hardlink/rename if we control everything | 09:29 |
tristan | juergbi, rather, if we have a separate queue/task which deletes-things-if-necessary, what would be the difference in that, from just committing to cache in a separate queue ? | 09:29 |
tristan | seems like not much of a big difference | 09:29 |
juergbi | seems more complex to me | 09:30 |
juergbi | the deletion aspect would also be required for pull and with remote execution | 09:30 |
juergbi | but separate committing makes much less sense in general | 09:30 |
juergbi | regarding peak disk usage: if we don't hold off parallel build jobs, we can't control peak disk usage anyway | 09:31 |
juergbi | we'll always need some extra space for working | 09:31 |
juergbi | and we must keep the case in mind where multiple bst sessions are running with the same local cache | 09:32 |
juergbi | i.e., we can't control this all anyway | 09:32 |
tlater | juergbi: That's what the 2GB headroom is for - I don't think that's quite enough for artifacts using the tar cache, though | 09:32 |
juergbi | we can hopefully replace the tar cache, so let's ignore it for architecture questions | 09:33 |
tristan | tlater, the headroom was originally a consideration that the user *might* have sources on the same volume | 09:33 |
tristan | tlater, and because we wanted to have a sensible "all" default value | 09:33 |
tristan | A.) We know the maximum space an artifact can potentially consume in the cache... B.) We know the size of the cache... C.) We (ungracefully) consider the build directory, to *not* be the cache (it can in fact be on a separate volume, but that's slow) | 09:35 |
tristan | for what is the cache, we can respect the limit; except for this synchronization issue | 09:35 |
juergbi | that also assumes no multiple concurrent bst sessions | 09:36 |
juergbi | the whole pinning actually as well | 09:36 |
tristan | Yes I noticed that | 09:36 |
tlater | juergbi: We work around that slightly | 09:37 |
tristan | juergbi, to mitigate this, I asked that tlater mark every artifact that was going to be used, as used; immediately | 09:37 |
tristan | juergbi, lessening the chance of such a parallel build deletion situation | 09:37 |
juergbi | right | 09:37 |
tristan | in the end, we *still* have to gracefully handle the out-of-space failures | 09:37 |
juergbi | yes, and also missing artifacts | 09:38 |
juergbi | (in case the quota is too small for the session) | 09:38 |
tlater | juergbi: Currently that's considered a failed build, although it will remove any artifacts not required in the pipeline. | 09:39 |
tlater | (Hard to work around that latter bit, but this should be a very rare case) | 09:39 |
juergbi | yes, failing is fine but we might need a suitable hint in the error message | 09:40 |
tlater | I do have a specific error message for that, could take some rewording | 09:40 |
juergbi | just saying we can't completely ignore it and completely crash and burn | 09:40 |
juergbi | good | 09:40 |
tristan | right, that is the graceful part :) | 09:40 |
gitlab-br-bot | buildstream: merge request (tristan/fix-shell-worksace-files->master: Fix shelling into a failed build sandbox with workspaces active) #380 changed state ("opened"): https://gitlab.com/BuildStream/buildstream/merge_requests/380 | 09:43 |
gitlab-br-bot | buildstream: issue #346 ("Fails to open build shell after build failure") changed state ("closed") https://gitlab.com/BuildStream/buildstream/issues/346 | 09:54 |
gitlab-br-bot | buildstream: merge request (tristan/fix-shell-worksace-files->master: Fix shelling into a failed build sandbox with workspaces active) #380 changed state ("merged"): https://gitlab.com/BuildStream/buildstream/merge_requests/380 | 09:54 |
tlater | Hm, so, I think I like the idea of a deletion queue. We could hold off on committing a handful of elements' artifacts and queue a "please make this much space in the artifact cache" request. | 09:56 |
tlater | That would reduce the number of times we need to calculate the cache size | 09:56 |
tlater | As well as help reduce the number of calls to ostree.repo.prune | 09:57 |
tlater | Plus as juergbi says, this is more flexible than trying to do this while we're deleting, it could be reused very easily. | 09:58 |
tlater | Some synchronization to figure out how much space we want to reserve would still need to happen for the build queue | 09:58 |
tlater | But the major problem of trying to figure out how much space there is in the cache would be a lot easier | 09:59 |
tlater | Thoughts? | 09:59 |
tlater | I think the deletion queue wouldn't need any concurrency, actually | 09:59 |
tlater | Since we'd just need to figure out what artifacts we want to delete, and do them all at once. | 10:00 |
*** aday has joined #buildstream | 10:02 | |
tristan | tlater, Sure, I dont mind so much except that it seems that it *could* respect the limit, while this approach is a bit more fuzzy on that | 10:06 |
tristan | tlater, but I dont mind so much, it will be more performant this way anyway | 10:07 |
tristan | tlater, one thing is, we probably wont be treating this as a regular "Queue", so we may need other ways to launch a Job | 10:07 |
tristan | that will probably change the architecture somewhat, interested to see what you have in mind... | 10:08 |
tristan | tlater, maybe it can be a regular queue with a special attribute such that it is not displayed in any status parts of the UI | 10:14 |
tristan | i.e. it makes no sense to see "You skipped the deletion of 200 elements" reported, or to see the queue in the status area in interactive mode | 10:15 |
tristan | tlater, also the log lines are weird in this case, they will automatically print the element in context being processed in that queue, but that element is not getting deleted | 10:16 |
tlater | Hm, right. I'm wondering if this actually is a queue, in that sense. | 10:17 |
tristan | maybe it will print "Making room in the cache", and then a message "detail" listing the artifacts it's deleting | 10:17 |
*** dominic has joined #buildstream | 10:18 | |
tristan | yeah that's what I mean, we could hijack the queue; but we'd have to special case the UI | 10:18 |
tristan | tlater, on the other hand, there is already some good cases for leveraging the scheduler without queues | 10:18 |
tlater | Really, the deletion method doesn't need to know anything about the elements it wants to make space for, except the total maximum size of their artifacts. | 10:18 |
tristan | there are some places we've wanted that, I think initialization is slow because we dont even interrogate source consistency state in parallel | 10:18 |
tristan | tlater, so having a way to spawn parallel tasks with the scheduler, without them being "elements in queues" is already a thing we can use | 10:19 |
tristan | if we do it, it should be streamlined in such a way that there is no code duplication between Queues and NewThing | 10:20 |
tristan | probably Queues uses the NewThing, to handle job completion and such parts | 10:20 |
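To make the pinning and batch-deletion idea above concrete, here is a rough sketch of a make-room step for an LRU expiry job. All names are invented for illustration; the real cache is an OSTree repository, so actual deletion would go through OSTree pruning rather than removing directories.

```python
import shutil

def make_room(artifacts_by_mtime, pinned, space_needed):
    """Delete least-recently-used artifacts until space_needed bytes are freed.

    artifacts_by_mtime: list of (path, size) tuples, oldest first.
    pinned: set of artifact paths the current session will use and which
    must never be expired.
    """
    freed = 0
    removed = []
    for path, size in artifacts_by_mtime:
        if freed >= space_needed:
            break
        if path in pinned:
            continue  # never delete artifacts the running build depends on
        shutil.rmtree(path)
        freed += size
        removed.append(path)
    if freed < space_needed:
        raise RuntimeError("artifact cache quota too small for this session")
    return removed
```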
tristan | ok, time to refactor artifact metadata | 10:33 |
tristan | lets split this all into separate files | 10:34 |
tristan | juergbi, oddly, your comment here https://gitlab.com/BuildStream/buildstream/issues/294#note_62585246 talks about loading meta/artifact.yml ... but Element._get_artifact_metadata() is *only* called at staging time | 10:35 |
tristan | maybe there is some justification for reading the meta/artifact.yaml via a separate code path, dunno... | 10:36 |
* tristan hunts it down and gets to work... | 10:36 | |
juergbi | tristan: _update_state() | 10:37 |
juergbi | # Load the strong cache key from the artifact | 10:37 |
juergbi | metadir = os.path.join(self.__extract(), 'meta') | 10:37 |
juergbi | meta = _yaml.load(os.path.join(metadir, 'artifact.yaml')) | 10:37 |
juergbi | self.__cache_key = meta['keys']['strong'] | 10:37 |
juergbi | should probably have used the helper | 10:37 |
juergbi | maybe it didn't exist yet | 10:37 |
tristan | gah | 10:56 |
tristan | looks like it's manually extracted all over the place | 10:56 |
tristan | I should note too, that the artifact.yaml is getting parsed *every time* | 11:05 |
tlater | tristan: The way I imagined this was that the build queue would be paused until there's enough space for a set of elements - this doesn't look particularly possible with the scheduler atm | 11:05 |
tlater | ^Re the concurrent deletion thing | 11:05 |
tristan | tlater, see Queue.status() | 11:07 |
tristan | tlater, it can return QueueStatus.WAIT | 11:07 |
tristan | tlater, I'm not saying that this can be done without a global revisiting of how scheduling is done, and probably some minor changes | 11:08 |
tristan | in any case, ability to run jobs outside of the context of a Queue is already a big change | 11:08 |
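As a sketch of the QueueStatus.WAIT idea, the gating check might look roughly like this; the enum mirrors the name mentioned above, while the element and cache attributes are purely hypothetical stand-ins for whatever bookkeeping the real build queue would keep.

```python
import enum

class QueueStatus(enum.Enum):
    READY = 1
    WAIT = 2

def build_queue_status(element, artifact_cache, quota_bytes):
    """Hold an element back until caching its artifact would fit the quota.

    element.estimated_artifact_size and artifact_cache.estimated_size are
    invented attributes, used only for this illustration.
    """
    projected = artifact_cache.estimated_size + element.estimated_artifact_size
    if projected > quota_bytes:
        # The scheduler keeps polling; once a deletion/expiry job has freed
        # enough space, a later status check returns READY.
        return QueueStatus.WAIT
    return QueueStatus.READY
```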
tristan | juergbi, every cached artifact has both a weak and a strong key encoded right ? | 11:18 |
juergbi | yes | 11:18 |
juergbi | (they may be identical, though) | 11:19 |
tristan | sure | 11:19 |
tristan | juergbi, think it's at all possible that that codepath in _update_state() is getting called multiple times recursively ? | 11:20 |
tristan | still, I'm adding Element.__metadata_cache hashtable | 11:20 |
tristan | just wondering if that might be enough to fix performance | 11:21 |
tristan | loading metadata only once per artifact would be a good start | 11:21 |
tristan | Eh, splitting up the files is still a good idea, but I'll start with cache as first step, and follow that pattern while splitting | 11:22 |
juergbi | that part in _update_state() should never be invoked multiple times for the same element | 11:23 |
juergbi | as it sets the cache key and will never unset it | 11:23 |
juergbi | (at least in a single process) | 11:24 |
juergbi | so I don't think a cache helps here but it might make sense in general | 11:24 |
tristan | yeah there are other workspace bits which ask for the metadata | 11:25 |
tristan | so it will still help | 11:25 |
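A minimal sketch of the per-artifact metadata cache being added here: parsed metadata is memoized per cache key so meta/artifact.yaml is read at most once. The class and its layout are invented, and the YAML loader is injected rather than assuming BuildStream's internal _yaml module.

```python
import os

class MetadataCache:
    """Parse an artifact's meta/artifact.yaml at most once per cache key."""

    def __init__(self, yaml_loader):
        self._load = yaml_loader
        self._cache = {}  # cache key -> parsed metadata dict

    def metadata(self, key, extract_dir):
        # extract_dir is assumed to contain the artifact's 'meta' subdirectory
        if key not in self._cache:
            meta_path = os.path.join(extract_dir, 'meta', 'artifact.yaml')
            self._cache[key] = self._load(meta_path)
        return self._cache[key]
```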
tristan | tlater, ... | 11:36 |
tristan | tlater, seeing as you were involved in the incremental builds stuff, and are now doing local LRU expiry... I should make sure... | 11:36 |
tristan | tlater, ... I should just double check and make sure you are not going to break this: https://gitlab.com/BuildStream/buildstream/blob/master/buildstream/element.py#L501 | 11:38 |
* tlater will check in a minute, is on the go but phone doesn't have enough ram to render files on gitlab | 11:40 | |
tristan | tlater, incremental builds in workspaces expect that you can load the last successful build metadata from the artifact cache at stage time | 11:41 |
tristan | tlater, so when that metadata suddenly disappears, it's important to not break. | 11:41 |
tristan | tlater, I'm more than half expecting an exclamation on your part, please feel free to use foul language :) | 11:42 |
*** marcledilson has joined #buildstream | 11:43 | |
tlater | Oh | 11:55 |
* tlater curses internally, easier to avoid doing so on IRC x) | 11:55 | |
tristan | tlater, I dont know, maybe you just need to wipe that incremental stuff to some kind of "clean slate" once the last_successful goes missing | 11:57 |
tlater | Hrm | 11:57 |
tlater | That would involve knowing what the build system changed and what was changed by the user | 11:57 |
tristan | I dont know that it would, it would rather involve ensuring that everything staged appears to be *new*, wouldnt it ? | 11:58 |
tlater | Oh, a new mtime on all files? | 11:58 |
tlater | Yeah, that works | 11:58 |
tristan | I havent followed what you've been doing closely, but seems that way | 11:59 |
tlater | ta, I'll make sure the expiry branch doesn't break here | 11:59 |
tlater | (In fact, I guess if someone manually deleted artifacts this would already break) | 12:00 |
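A rough sketch of the "clean slate" fallback discussed above: if the last successful build's artifact has been expired, give every staged file a fresh mtime so the build system treats the whole tree as new. The helper name is made up, and symlinks are simply skipped for brevity.

```python
import os
import time

def mark_staged_files_as_new(staged_dir):
    """Give every staged file a fresh mtime.

    With no 'last successful build' artifact to compare against, the safest
    fallback for incremental workspace builds is to make the whole staged
    tree look newly modified and let the build system rebuild everything.
    """
    now = time.time()
    for root, dirs, files in os.walk(staged_dir):
        for name in files:
            path = os.path.join(root, name)
            if not os.path.islink(path):  # leave symlinks alone
                os.utime(path, (now, now))
```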
* tristan really has to complete the privateness refactor | 12:00 | |
tristan | Element is still in a total mess, with plenty of really private things exposed with only a single underscore | 12:00 |
tristan | For this patch, I'm following current style and not paying heed to private policy | 12:01 |
tristan | better to fix that separately | 12:01 |
tlater | tristan: On that actually, I don't quite think the hacking file is clear enough on that | 12:03 |
* tlater at least doesn't quite understand where we make a distinction between "local private" and "API private" | 12:03 | |
tlater | At least before the refactor I assumed we only marked API private specially, but the wording now seems a bit confusing | 12:04 |
tristan | tlater, what don't you understand ? | 12:05 |
tristan | tlater, API Private is pretty clear, it means not exposed in public API | 12:06 |
tristan | tlater, local private means that it is not used outside of the current *scope* | 12:06 |
tlater | Do we mark *both* of those with a leading underscore? | 12:06 |
tristan | tlater, I'll be happy accept patches to the HACKING file | 12:06 |
tristan | Yes we do | 12:06 |
tristan | tlater, with double underscore in the few cases where there is ambiguity | 12:07 |
tlater | Ok, that makes sense then :) | 12:07 |
tristan | there is ambiguity when a class is public, and has internal things which are *also* local private (need 2 underscores) | 12:08 |
tlater | Annoyingly, that makes those variables unusable in subclasses | 12:08 |
tlater | So something like element.py might struggle using them | 12:08 |
tristan | things which are API private but not local private need a single underscore (like Element._assemble_done() for instance, which needs to be called from BuildQueue, but is not public API) | 12:08 |
tristan | tlater, that is the *whole point* | 12:09 |
tlater | Does an implementing class not count as local scope? | 12:09 |
tristan | tlater, it is not local private if it is accessed from somewhere else, like in another file, or a subclass | 12:09 |
tristan | No, "scope" is something pretty clearly defined :) | 12:09 |
tristan | it doesnt include fancy shmancy thingamabobs that Java, C++ and such languages add, this is just python :) | 12:10 |
* tlater thought in object scopes, as the subclass will actually be operating in a copy of its parent's scope, but doesn't further question this :) | 12:10 | |
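A small example of the convention being described, with invented attribute names: single-underscore (API private) members stay reachable from other BuildStream files and from subclasses, while double-underscore (local private) members get name-mangled and so are effectively off limits to subclasses.

```python
class Element:
    def __init__(self):
        self._cache_key = None    # API private: reachable from other
                                  # BuildStream modules and from plugins
        self.__artifacts = {}     # local private: mangled to
                                  # _Element__artifacts

class MyPlugin(Element):
    def peek(self):
        print(self._cache_key)    # fine: API-private members are usable here
        print(self.__artifacts)   # AttributeError: the lookup is mangled to
                                  # _MyPlugin__artifacts, so Element's
                                  # local-private member is unreachable
```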
tristan | juergbi, anything else you'd like to do while reving the artifact version ? | 12:36 |
tristan | juergbi, I'm going to do a little bit more cleanup, at least how cache keys are calculated for sources is right now a bit weird, worth changing if there is a rev bump | 12:37 |
* tristan thinks out loud that at this time... he would really love to see a *graph* of the performance of loading pipelines in non-strict mode | 12:47 | |
tlater | Gah, I've been designing something that relies on knowing how large ostree artifacts are :| | 12:54 |
tlater | o\ | 12:54 |
skullman | bah, worked out how to retry a pipeline, but it doesn't automatically merge after a retry | 12:56 |
skullman | https://gitlab.com/BuildStream/buildstream/merge_requests/375 will need someone with elevated permissions to prod it | 12:56 |
tristan | skullman, it looks like we're hitting some kind of weird gitlab bug :-S | 12:59 |
tristan | it's telling me I set it to be merged | 12:59 |
tristan | and the pipeline passed | 12:59 |
tristan | but nothing to merge it | 12:59 |
tristan | skullman, I take it you've rebased and it now passes ? | 12:59 |
tristan | ok wait, I've canceled automatic merge, and now it tells me I need a rebase again | 13:00 |
tristan | skullman, note that I rebased in the UI the other day, and that's when it failed, I suspect you re-pushed this without local rebase maybe ? | 13:00 |
tristan | try rebase it locally and repush that | 13:01 |
skullman | I didn't rebase or push. I fetched the rebase you did via the UI and tested that locally and it worked, so got it to retry. | 13:01 |
* skullman will rebase | 13:01 | |
*** xjuan has joined #buildstream | 13:02 | |
tristan | strange | 13:04 |
tristan | still a weird gitlab bug occurred :) | 13:04 |
tristan | should I try UI rebase ? | 13:14 |
skullman | I'm currently running tests locally on a manual rebase. | 13:15 |
skullman | I don't expect it to fail. | 13:16 |
skullman | I'm fine with either waiting for a local pass, or not. | 13:16 |
gitlab-br-bot | buildstream: merge request (tristan/restructure-artifact-metadata->master: Restructure artifact metadata) #381 changed state ("opened"): https://gitlab.com/BuildStream/buildstream/merge_requests/381 | 13:17 |
tristan | juergbi, want to take a look at #381 ? | 13:18 |
tristan | juergbi, I want to merge that and ask mcatanzaro to observe performance after... | 13:19 |
tristan | maybe that's bad, but the split is generally a good strategy I think | 13:19 |
*** jonathanmaw has quit IRC | 13:23 | |
*** jonathanmaw has joined #buildstream | 13:32 | |
gitlab-br-bot | buildstream: merge request (issue-21_Caching_build_trees->master: WIP: Issue 21 caching build trees) #372 changed state ("opened"): https://gitlab.com/BuildStream/buildstream/merge_requests/372 | 13:34 |
juergbi | tristan: taking a look | 13:35 |
*** bethw has joined #buildstream | 13:35 | |
*** marcledilson has quit IRC | 13:41 | |
*** Prince781 has joined #buildstream | 13:48 | |
* tristan leaves closing coffee shop... | 13:55 | |
tristan | will appear in a moment | 13:55 |
skullman | >:( there are non-deterministic test failures in tests/frontend/push.py and tests/integration/stack.py | 13:56 |
tlater | :| | 13:56 |
skullman | tests pass after a rebase with master, just not always | 13:56 |
tlater | Do you have a CI run for them? I've seen some such failures, though not recently. | 13:57 |
gitlab-br-bot | buildstream: merge request (jjardon/update_depencies->master: docs: Update list of dependecies) #373 changed state ("opened"): https://gitlab.com/BuildStream/buildstream/merge_requests/373 | 13:58 |
skullman | tlater: https://gitlab.com/BuildStream/buildstream/-/jobs/62223526 contains a stack.py fail | 13:58 |
*** tristan has quit IRC | 13:58 | |
jjardon | Can someone review https://gitlab.com/BuildStream/buildstream/merge_requests/373/ and the docker counterpart of this https://gitlab.com/BuildStream/buildstream-docker-images/merge_requests/32 , please? | 13:59 |
tlater | Hm, that's not an integration test, so unlikely to be a problem with cleanup | 13:59 |
skullman | tlater: https://paste.baserock.org/reyobuwuwe for logs of another failed test_stack, https://paste.baserock.org/uqojayotof for a test_push | 14:00 |
*** tristan has joined #buildstream | 14:01 | |
*** bethw has quit IRC | 14:02 | |
*** bethw has joined #buildstream | 14:02 | |
tlater | skullman: Sorry, I've only seen the last issue before, everything else you probably have better context on than me currently | 14:02 |
tlater | As for that last one, the only time I've seen it occur was when I broke cache keys for pushed artifacts | 14:03 |
tlater | But I doubt that's the issue here, considering the previous two traces | 14:03 |
tlater | The ruamel exceptions are red herrings, btw, I think they pop up with our specific combination of pylint+ruamel in the debian-8 image | 14:04 |
* tlater has tried to fix them before, but couldn't find a working combination | 14:04 | |
tristan | I have a regular "hang" in test_push_pull when run locally, but seems to never happen in CI | 14:04 |
tristan | happens maybe once out of 20 | 14:04 |
tristan | or less | 14:04 |
tristan | but still often enough that I notice | 14:05 |
tristan | juergbi, still not sure eh :) | 14:06 |
juergbi | I actually wanted to test it locally but that's not so quick, so let's just review it | 14:07 |
tristan | Are you able to easily test that ? | 14:07 |
tristan | juergbi, if you want, I'll hold off for a few hours | 14:07 |
tristan | did you start a build ? | 14:08 |
tristan | it will take a few hours | 14:08 |
tristan | I guess | 14:08 |
juergbi | I was considering it but noticed that I first have to rebase some other stuff, so maybe not a good idea right now | 14:09 |
tristan | Ok, just review, lets get it in and see about performance after | 14:09 |
tristan | it should help at least | 14:09 |
* tristan will also run a GNOME build overnight | 14:10 | |
gitlab-br-bot | buildstream: merge request (jjardon/update_depencies->master: docs: Update list of dependecies) #373 changed state ("merged"): https://gitlab.com/BuildStream/buildstream/merge_requests/373 | 14:10 |
gitlab-br-bot | buildstream: merge request (tristan/restructure-artifact-metadata->master: Restructure artifact metadata) #381 changed state ("opened"): https://gitlab.com/BuildStream/buildstream/merge_requests/381 | 14:11 |
skullman | :/ the test_stack failure from my local build is that umount is failing because there's nothing there. The one in CI is from a mount directory already being there. In CI I can believe that could be a scheduling problem of two runners running the same test on the same system at the same time. | 14:11 |
skullman | On the local failure it implies that there was some code expecting to not perform cleanup in some non-deterministic circumstances. | 14:12 |
tristan | the CI runners should be running at least in completely separate docker containers if they run on the same system | 14:13 |
tristan | for local failure, it does sound likely that your fix may have uncovered a bug that was silently ignored | 14:14 |
skullman | concurrency failures make me want to quit programming and work in a shop or something | 14:15 |
tristan | wait a sec | 14:15 |
tristan | oh no, if you hit the terminator codepath, there should be no opportunity to land in the finally: clause | 14:16 |
tlater | Don't quit *just* yet, skullman ;p | 14:16 |
tristan | hah, I always quite liked concurrency synchronization bugs | 14:16 |
tristan | it's ref count imbalances which drive me nuts :) | 14:16 |
* skullman wishes mount returned a mount ID and umount required it, so you'd at least get more obvious double-umount errors | 14:17 | |
juergbi | tristan: maybe I'm missing something but isn't the key return value of __extract() essentially unused? | 14:18 |
juergbi | it's either ignored or the same as what's passed as input key | 14:18 |
juergbi | ah, not in the None case | 14:18 |
juergbi | nm | 14:19 |
tristan | juergbi, yeah I was considering splitting that part of __extract() out | 14:19 |
tristan | juergbi, but there is so much noise already of "get the key" functions for now, that I wanted to just let it be | 14:20 |
juergbi | the whole keyed caching makes things not as trivial as expected | 14:20 |
tristan | well, I wanted to cache under both keys | 14:20 |
tristan | mostly, there is a lot of boiler plate, indeed | 14:20 |
tristan | to have nice separate functions with different names which return practical things | 14:21 |
tristan | skullman, I wonder if your try/finally blocks should be *within* the terminator context managers | 14:22 |
tristan | skullman, also, looking at https://gitlab.com/BuildStream/buildstream/merge_requests/375/diffs#9bff43db2e8aefc8dbfb4c1068b99f443647bc0d_146_145 ... | 14:22 |
juergbi | right. I think the changes are fine | 14:22 |
tristan | skullman, on that line it looks like if mount() fails, you anyway do a finally: umount() | 14:23 |
tristan | skullman, shortening the "try" blocks to be more specific may be a good idea | 14:23 |
skullman | tristan: that umount will need to be done whether that mount happens or not, since the mount is outside the try block, and that one just changes the flags. | 14:24 |
tristan | skullman, this one ? https://gitlab.com/BuildStream/buildstream/blob/621d0d85eede4ce824be4f35ecf15df516535ff7/buildstream/sandbox/_mounter.py#L145 | 14:27 |
* tristan links to file instead of MR | 14:27 | |
skullman | tristan: my rationale for putting the context managers inside the try-finally rather than the other way around was that I could at least reason about what happens if it gets terminated after the context ends but before the finally (it won't get umounted), whereas having the terminator run an umount after the umount already happened could unmount something underneath what was stacked | 14:27 |
tristan | skullman, by the looks of it, both _mount and _umount do things, and can raise an exception | 14:27 |
skullman | tristan: yeah, that's the mount call I meant | 14:27 |
tristan | if that one fails, you probably dont want to call umount | 14:28 |
tristan | you probably just want to reraise the exception no ? | 14:28 |
tristan | the terminator logic can be daunting indeed, I think when we use it for launching processes, we use it outside the block, and we only terminate the process if the process pointer has been assigned | 14:29 |
tristan | but that's indeed not helpful without an id/handle representing the mount | 14:30 |
skullman | I don't think I follow. I'd expect with `bind_mount()` that any mount it made would be unmounted on context exit. Since a bind-mount is created before the yield in the try-finally, we need to try after mounting and umount in the finally. | 14:30 |
skullman | If it fails to make the bind-mount's propagation rslave, then I'd want to roll back the context before yielding, so it'll umount and then reraise. | 14:31 |
tristan | ahhhh | 14:32 |
*** xjuan_ has joined #buildstream | 14:32 | |
tristan | skullman, I missed the part that the second call to _mount() was only modifying a property of the mount, sorry | 14:33 |
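For illustration, here is the mount/umount ordering skullman describes, reduced to plain mount(8) calls via subprocess. The real code goes through the _mounter.py helpers and the terminator context manager, which are omitted here.

```python
import subprocess
from contextlib import contextmanager

@contextmanager
def bind_mount(src, dest):
    """Simplified illustration of the ordering discussed above."""
    # If this first mount fails, nothing was mounted, so there is nothing
    # to clean up and the exception simply propagates.
    subprocess.run(['mount', '--bind', src, dest], check=True)
    try:
        # This second call only changes propagation flags on the existing
        # mount; if it fails, the umount in the finally block still runs.
        subprocess.run(['mount', '--make-rslave', dest], check=True)
        yield dest
    finally:
        subprocess.run(['umount', dest], check=True)
```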
gitlab-br-bot | buildstream: merge request (tristan/restructure-artifact-metadata->master: Restructure artifact metadata) #381 changed state ("merged"): https://gitlab.com/BuildStream/buildstream/merge_requests/381 | 14:34 |
*** xjuan has quit IRC | 14:35 | |
tristan | skullman, I'm not sure if this is helpful for debugging but maybe (and would be good anyway)... | 14:39 |
tristan | if we went into _exceptions.py and made SandboxError() support the newer `detail` kwarg which is available in the BstError base class... | 14:40 |
tristan | and then if we took the stdout/stderr from the mount calls and put them in that detail arg... | 14:40 |
tristan | we might see the errors better than just "mount returned exit status 255" or such | 14:40 |
tristan | really unsure if that's helpful here though :-S | 14:41 |
skullman | right now I'm just trying to get it to happen again to rule out neutrino strike | 14:42 |
skullman | it may be useful if it turns out we've hit some obscure corner case where mount can't actually do anything, but it's most likely that the mount doesn't exist | 14:42 |
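As a sketch of the detail-kwarg idea above (generic Python, not BuildStream's actual _exceptions.py), capturing the mount command's output and attaching it to the error could look like this:

```python
import subprocess

class SandboxError(Exception):
    """Toy stand-in for an error type carrying an optional 'detail' blob."""

    def __init__(self, message, detail=None):
        super().__init__(message)
        self.detail = detail

def run_mount(argv):
    # Capture stdout/stderr so a failure reports more than just the exit code
    proc = subprocess.run(argv, capture_output=True, text=True)
    if proc.returncode != 0:
        raise SandboxError(
            "{} returned exit status {}".format(argv[0], proc.returncode),
            detail=proc.stdout + proc.stderr)
    return proc
```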
*** Prince781 has quit IRC | 15:06 | |
*** Prince781 has joined #buildstream | 15:08 | |
*** xjuan_ has quit IRC | 15:13 | |
skullman | ok, so test_push fails reliably on my system since rebasing, but if I roll back my change it still happens, so could be something about my environment | 15:24 |
skullman | going to try and get test_stack to happen | 15:24 |
*** Prince781 has quit IRC | 15:31 | |
tlater | Hm, I'm struggling to understand the exit condition for the scheduler | 15:42 |
tlater | This function here suggests that it exits once that while loop finishes: https://gitlab.com/BuildStream/buildstream/blob/master/buildstream/_scheduler/scheduler.py#L318 | 15:42 |
tlater | But it only conditionally stops the loop | 15:43 |
tlater | Yet there's nothing to suggest that there's any other condition to stop if that doesn't happen: https://gitlab.com/BuildStream/buildstream/blob/master/buildstream/_scheduler/scheduler.py#L125 | 15:43 |
tlater | (Outside of the user ^C-ing out) | 15:43 |
tlater | So as far as I understand, once buildstream runs out of processes to move between queues (which it will eventually, if they are just stuck building for example) it will keep running forever - but that's not right. Could someone point me to the real exit condition? | 15:45 |
*** noisecell has quit IRC | 15:46 | |
juergbi | tlater: check Scheduler.sched() | 15:48 |
juergbi | it exits if no queues have active jobs | 15:49 |
juergbi | 'ticking' | 15:49 |
tlater | juergbi: That function is only run once though | 15:49 |
tlater | It only checks exactly once whether there's still something ticking | 15:49 |
juergbi | no, Queue.job_done() calls that as well | 15:49 |
tlater | Ah | 15:49 |
*** noisecell has joined #buildstream | 15:49 | |
tlater | ta juergbi, I overlooked that | 15:50 |
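A condensed illustration of the exit condition juergbi points at: sched() is invoked both at startup and from job_done(), and the event loop stops once no queue has active jobs. The names loosely echo scheduler.py and queue.py, but the bodies here are invented.

```python
class MiniScheduler:
    def __init__(self, queues, loop):
        self.queues = queues
        self.loop = loop

    def sched(self):
        # Try to move elements between queues and start new jobs ...
        for queue in self.queues:
            queue.process_ready()
        # ... then stop the event loop once nothing is ticking anywhere.
        if not any(queue.active_jobs for queue in self.queues):
            self.loop.stop()

    def job_done(self, queue, job):
        queue.active_jobs.remove(job)
        # Every completed job re-runs sched(), so the "no active jobs"
        # check is re-evaluated each time, not just once at startup.
        self.sched()
```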
tristan | looks like gnome-build-meta master does already depend on freedesktop-sdk... progress \o/ | 16:03 |
tlater | :O | 16:05 |
tlater | Looks like we can soon change the base.bst file for our tests :) | 16:05 |
tristan | I wouldnt be in a hurry - I'd honestly prefer that the base.bst file depend on a published BaseSdk, once it's honed down to a nice and tiny size | 16:07 |
tristan | and not inter-project depend for that purpose | 16:07 |
tristan | but yeah, it's coming :) | 16:07 |
tristan | w00t, my desktop has slowed down to a nice and reasonably slow speed | 16:08 |
tristan | a speed that you would expect under "regular circumstances", i.e. like when you are normally compiling gcc, glibc, webkit and spidermonkey at -j8 each in the background | 16:09 |
tristan | ahhh, life is back to normal :) | 16:09 |
*** noisecell has quit IRC | 16:11 | |
*** toscalix has quit IRC | 16:17 | |
*** bethw has quit IRC | 16:18 | |
bochecha | tristan: what kind of a monster machine do you have? :o | 16:25 |
tristan | bochecha, pretty strong one, 4 dual thread cores up to 3.4GHz and plenty of ssd :) | 16:29 |
tlater | tristan: Is that in a laptop? | 16:30 |
*** xjuan_ has joined #buildstream | 16:30 | |
tristan | I find that -j8 on 4 parallel builds is a decent balance, because most of the time 2 or 3 builds are doing ./configure | 16:30 |
tristan | would be good to get a job server and some standard across build systems, though | 16:31 |
tristan | tlater, yep | 16:32 |
* tlater is impressed | 16:32 | |
tlater | I just run my gentoo builds on my PC, which does things fine, but it's a pain when I'm not in the UK | 16:33 |
tristan | tlater, I was hoping that I could get at least 75% power out of a thin quad core thinkpad yoga this year (plugged in), but two things happened: A.) heat dissipation problem, meaning more throttling... B.) meltdown (so who wants to buy cores this year ?) | 16:34 |
*** jonathanmaw has quit IRC | 16:34 | |
gitlab-br-bot | buildstream: merge request (issue-21_Caching_build_trees->master: WIP: Issue 21 caching build trees) #372 changed state ("opened"): https://gitlab.com/BuildStream/buildstream/merge_requests/372 | 16:36 |
jjardon | Hi, is there someone around to review https://gitlab.com/BuildStream/buildstream-docker-images/merge_requests/33 ? | 16:37 |
tlater | jjardon: I can take a peek if you like | 16:37 |
jjardon | tlater: thanks! | 16:37 |
tlater | jjardon: Why do we need these calls now? https://gitlab.com/BuildStream/buildstream-docker-images/merge_requests/33/diffs#587d266bb27a4dc3022bbed44dfa19849df3044c_69_73 | 16:41 |
tlater | In fact, I don't think we want a latest tag for the testsuite images | 16:41 |
jjardon | tlater: not strictly needed, but it's convenient for some things | 16:42 |
jjardon | tlater: see latest commit, to use docker caching mechanism | 16:43 |
tlater | Oh, whoops, not sure why I thought it was all one commit | 16:43 |
tlater | Ah, I see what you mean... | 16:43 |
tlater | Hm | 16:43 |
* tlater was under the impression that we stored our old images on the runner | 16:44 | |
tlater | So that technically wouldn't be required | 16:44 |
*** Prince781 has joined #buildstream | 16:44 | |
jjardon | we dont | 16:44 |
tlater | It also seems to cut build times in half, so yeah, I don't see a reason not to | 16:45 |
jjardon | it's in the gitlab cache but that's: 1. slow, 2. not guaranteed to always be there | 16:45 |
tlater | Yeah, I agree, this is probably better | 16:45 |
tlater | And then we do need a latest tag for the testsuite images to support this | 16:46 |
* tlater wanted those anyway because it makes scripting stuff for dev use easier | 16:46 | |
tlater | Approved! I really like the changes jjardon, will make things much neater in the future, tyvm :) | 16:47 |
jjardon | np! thanks for the quick review tlater ! | 16:47 |
jjardon | tlater: would you mind reviewing https://gitlab.com/BuildStream/buildstream-docker-images/merge_requests/32 as well (it's quite trivial, this one) | 16:48 |
tlater | Less comfortable with that one because it's not just technical, but I'll have a look | 16:49 |
tlater | jjardon: One comment, otherwise also fine with this one, actually | 16:56 |
tlater | Maybe the whole thing can also be squashed into a single commit | 16:57 |
jjardon | tlater: sure, I have pushed a new version to make it more explicit | 17:00 |
tlater | jjardon: If you squash the commits I'll approve this one too, don't think there's much value in exactly explaining every split here :) | 17:00 |
jjardon | sure, done | 17:02 |
jjardon | tlater: ^ | 17:02 |
*** dominic has quit IRC | 17:03 | |
tlater | Awesome, approved now | 17:03 |
*** noisecell has joined #buildstream | 17:06 | |
jjardon[m] | thanks! | 17:09 |
*** bethw has joined #buildstream | 17:42 | |
*** Prince781 has quit IRC | 18:15 | |
tristan | hmmm, status area even does not display full element names | 18:45 |
tristan | ah, I may be wrong but that might have been worth changing along with today's cache key break, I think removing that normal name nonsense also implied a cache key break :-S | 18:46 |
tristan | missed opportunity, oh well | 18:47 |
jjardon | tristan: https://www.phoronix.com/scan.php?page=news_item&px=RHEL-8-No-Python-2 | 18:51 |
jjardon | tristan: is RHEL serious enough? :) | 18:52 |
*** valentind has quit IRC | 19:15 | |
*** valentind has joined #buildstream | 19:15 | |
*** bethw has quit IRC | 19:26 | |
*** bethw has joined #buildstream | 19:50 | |
*** bethwhite_ has joined #buildstream | 20:19 | |
*** cs_shadow_ has joined #buildstream | 20:20 | |
*** jjardon_ has joined #buildstream | 20:21 | |
*** bethw has quit IRC | 20:21 | |
*** hergertme has quit IRC | 20:21 | |
*** cs_shadow has quit IRC | 20:21 | |
*** jjardon has quit IRC | 20:21 | |
*** benbrown has quit IRC | 20:21 | |
*** laurence has quit IRC | 20:21 | |
*** awacheux[m] has quit IRC | 20:21 | |
*** abderrahim[m] has quit IRC | 20:21 | |
*** juergbi has quit IRC | 20:21 | |
*** waltervargas[m] has quit IRC | 20:21 | |
*** csoriano has quit IRC | 20:21 | |
*** ernestask[m] has quit IRC | 20:21 | |
*** ssssam[m] has quit IRC | 20:21 | |
*** segfault3[m] has quit IRC | 20:21 | |
*** theawless[m] has quit IRC | 20:21 | |
*** cgmcintyre[m] has quit IRC | 20:21 | |
*** pro[m] has quit IRC | 20:21 | |
*** connorshea[m] has quit IRC | 20:21 | |
*** m_22[m] has quit IRC | 20:21 | |
*** alatiera has quit IRC | 20:21 | |
*** rafaelff[m] has quit IRC | 20:21 | |
*** jjardon[m] has quit IRC | 20:21 | |
*** bochecha has quit IRC | 20:21 | |
*** kailueke[m] has quit IRC | 20:21 | |
*** ptomato[m] has quit IRC | 20:21 | |
*** asingh_[m] has quit IRC | 20:21 | |
*** inigomartinez has quit IRC | 20:21 | |
*** mattiasb has quit IRC | 20:21 | |
*** cs_shadow_ is now known as cs_shadow | 20:21 | |
*** jjardon_ is now known as jjardon | 20:21 | |
*** csoriano has joined #buildstream | 20:22 | |
*** benbrown has joined #buildstream | 20:24 | |
*** laurence has joined #buildstream | 20:27 | |
*** juergbi has joined #buildstream | 20:27 | |
*** hergertme has joined #buildstream | 20:27 | |
*** inigomartinez has joined #buildstream | 20:29 | |
*** bethwhite_ has quit IRC | 20:33 | |
*** mattiasb has joined #buildstream | 20:37 | |
*** ptomato[m] has joined #buildstream | 20:42 | |
gitlab-br-bot | buildstream: issue #354 ("Allow configuring an element to not be tracked by --track-all") changed state ("opened") https://gitlab.com/BuildStream/buildstream/issues/354 | 20:47 |
*** bethw has joined #buildstream | 20:51 | |
*** awacheux[m] has joined #buildstream | 20:52 | |
*** bethw has quit IRC | 21:04 | |
*** awacheux[m] has quit IRC | 21:05 | |
*** ptomato[m] has quit IRC | 21:05 | |
*** mattiasb has quit IRC | 21:05 | |
*** inigomartinez has quit IRC | 21:05 | |
*** noisecell has joined #buildstream | 21:11 | |
*** noisecell has quit IRC | 21:11 | |
*** bethw has joined #buildstream | 21:18 | |
*** aday has quit IRC | 21:22 | |
*** xjuan_ has quit IRC | 21:23 | |
*** bethw has quit IRC | 21:32 | |
*** rafaelff[m] has joined #buildstream | 21:34 |