IRC logs for #buildstream for Wednesday, 2018-04-11

gitlab-br-botbuildstream: merge request (issue-21_Caching_build_trees->master: WIP: Issue 21 caching build trees) #372 changed state ("opened"): https://gitlab.com/BuildStream/buildstream/merge_requests/372 00:10
*** Prince781 has joined #buildstream00:20
*** jsgrant has joined #buildstream01:24
*** jsgrant has quit IRC01:26
*** Prince781 has quit IRC01:50
*** Prince781 has joined #buildstream02:05
*** Prince781 has quit IRC02:42
*** tristan has joined #buildstream05:06
gitlab-br-botbuildstream: merge request (test-run-doc->master: HACKING.rst: Add integration and pytest notes) #319 changed state ("opened"): https://gitlab.com/BuildStream/buildstream/merge_requests/319 06:49
gitlab-br-botbuildstream: merge request (test-run-doc->master: HACKING.rst: Add integration and pytest notes) #319 changed state ("merged"): https://gitlab.com/BuildStream/buildstream/merge_requests/319 07:09
gitlab-br-botbuildstream: merge request (milloni/assert-bwrap-version->master: Assert minimum bwrap version on host) #379 changed state ("opened"): https://gitlab.com/BuildStream/buildstream/merge_requests/379 07:10
*** toscalix has joined #buildstream07:57
tristanjuergbi, ok so I first fixed that "running files" serialization, with the thought in mind that serializing where sources were staged relative to the sandbox in the workspaces.yml was the right solve for `bst shell --sysroot`08:10
tristanfixing that serialization of writes was important, but now I think maybe I had the wrong solve for `bst shell --sysroot`08:10
tristanConsider that; first of all we want to cache artifacts of failed builds08:11
tristanAlso consider that... a failed build sandbox is probably a very specific state08:11
tristanjuergbi, so I'm starting to think that; instead, after a build fails and we intend to give the user a sysroot snapshot of that build failure... maybe we should just be copying the workspace files into the mount location, and not mount at `bst shell --sysroot` time ?08:13
tristanIt would require I guess a utils.copy_files(...preserve_mtime=True) semantic08:13
tristanany thoughts on that ?08:13
tristanfor reference: https://gitlab.com/BuildStream/buildstream/issues/346 <-- that issue08:14
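The copy-instead-of-mount approach tristan sketches above boils down to copying the workspace tree into the failed-build sysroot with timestamps preserved. A minimal illustration follows; snapshot_copy() is a hypothetical name, not BuildStream's actual utils.copy_files(), and it relies only on the standard library (shutil.copy2() for files, shutil.copystat() for directories).

```python
# Minimal sketch of a timestamp-preserving snapshot copy (hypothetical,
# not BuildStream's utils.copy_files()).
import os
import shutil


def snapshot_copy(src_dir, dest_dir):
    # Walk bottom-up so each directory's mtime is restored only after
    # its contents have been written into the destination.
    for root, dirs, files in os.walk(src_dir, topdown=False):
        rel = os.path.relpath(root, src_dir)
        target = dest_dir if rel == '.' else os.path.join(dest_dir, rel)
        os.makedirs(target, exist_ok=True)
        for name in files:
            shutil.copy2(os.path.join(root, name),   # copies data + mtime
                         os.path.join(target, name))
        shutil.copystat(root, target)                 # restore directory mtime
```

Walking with topdown=False means directory timestamps survive the copy even though files are written into them along the way.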
juergbitristan: hm, that would be a user-visible change, though08:15
gitlab-br-botbuildstream: merge request (milloni/assert-bwrap-version->master: Assert minimum bwrap version on host) #350 changed state ("closed"): https://gitlab.com/BuildStream/buildstream/merge_requests/350 08:15
juergbiusers might rely on read-write workspaces08:15
juergbibut maybe they shouldn't08:16
juergbithis would be in line with my idea of separating workspace sources and build directory08:16
juergbifrom that perspective it sounds fine to me. just wondering whether there are use cases where people really need the mounted workspace08:17
tristanjuergbi, I think a user visible change was that `bst shell --sysroot` became completely broken when we started mounting the workspaces into the sandbox08:18
tristanjuergbi, the question is rather what is the appropriate fix; I feel like the appropriate fix is to give the user a sysroot that is actually representative of what failed08:19
tristannot a sysroot that "might have changed because the user edited a workspace file"08:19
juergbitrue, I think that makes sense08:19
tristanwhich would be more in line with what we will be doing once we solve caching failed builds08:20
juergbifor the sysroot case08:20
juergbiwhat about bst shell without sysroot?08:20
tristani.e. when we cache failed builds, we should be removing the sysroots completely I think, and always cleaning up the build directory08:20
tristanbst shell without sysroot is not broken08:20
juergbiyes, so we keep mounting it there, right?08:21
tristanit stages everything on demand, delegates to Element.stage() and everything08:21
tristanright, we keep mounting it there08:21
juergbiso that will still need the running_files update08:21
*** segfault3[m] has joined #buildstream08:21
tristanjuergbi, right: https://gitlab.com/BuildStream/buildstream/blob/master/buildstream/element.py#L522 :)08:22
tristaninterestingly, that code path cannot get hit with `--sysroot` anyway08:22
tristanbecause nothing is staged when shelling into an existing sysroot08:23
segfault3[m]Hi, I'm having issues with buildstream. I tried building both gtk+ and GNOME Shell on Debian Stable and Debian Testing, it always fails with this message:08:23
segfault3[m]FAILURE [base/cyrus-sasl.bst]: tar source at base/cyrus-sasl.bst [line 48 column 4]: Error mirroring ftp://ftp.cyrusimap.org/cyrus-sasl/cyrus-sasl-2.1.26.tar.gz: <urlopen error ftp error: error_perm('500 OOPS: failed to open vsftpd log file:/var/log/vsftpd.log',)>08:23
tristansegfault3[m], that looks like a legitimate error08:24
tristansegfault3[m], i.e. you're getting 500 from the upstream site08:25
tristansegfault3[m], wget ftp://ftp.cyrusimap.org/cyrus-sasl/cyrus-sasl-2.1.26.tar.gz is also reporting errors for me08:25
tristanhttps://bpaste.net/show/0aae6b299b9c 08:26
segfault3[m]ok, thanks, I will ask the cyrusimap devs then08:27
tristansegfault3[m], this is a typical issue, and we're currently working on a mirroring solution here https://gitlab.com/BuildStream/buildstream/issues/328 08:27
tristanso we hope that we can setup alternative mirrors and easily enable mirroring such that we can fallback to alternative mirrored locations08:27
segfault3[m]sounds good08:28
tristansegfault3[m], oh, if you have a means of actually obtaining the tarball, a workaround would be to locally set its 'url' field in `base/cyrus-sasl.bst` to a local file:// url08:31
tristanah, we already use shutil.copystat() in safe_copy(), so no need to explicitly preserve mtime08:47
gitlab-br-botbuildstream: merge request (135-expire-artifacts-in-local-cache->master: WIP: Resolve "Expire artifacts in local cache") #347 changed state ("opened"): https://gitlab.com/BuildStream/buildstream/merge_requests/347 08:57
segfault3[m]tristan: I was able to download the tarball from https://www.cyrusimap.org/releases/cyrus-sasl-2.1.26.tar.gz, but I can't find the cyrus-sasl.bst on my system09:02
jmacI've had this problem09:03
jmacCyrus-sasl's ftp site has been down for several days09:04
jmacThe bst file should be sdk/elements/base/cyrus-sasl.bst09:04
jmacActually that's freedesktop-sdk, which you may not be using09:05
segfault3[m]I'm using gnome-build-meta, it only has freedesktop-sdk-junction.bst and linker-priority.bst in elements/base/09:06
*** jonathanmaw has joined #buildstream09:06
gitlab-br-botbuildstream: merge request (tristan/fix-shell-worksace-files->master: Fix shelling into a failed build sandbox with workspaces active) #380 changed state ("opened"): https://gitlab.com/BuildStream/buildstream/merge_requests/380 09:12
tristanjuergbi, wanna take a look at https://gitlab.com/BuildStream/buildstream/merge_requests/380 ?09:12
juergbisure09:13
tristanoops linting error, I'll have to repush, but it's certainly something minor09:15
tristanmaybe a line too long09:15
* tristan will look into optimizing "Calculating cached state" and hopefully improve that today09:15
tlaterjuergbi: I'm not sure how to tackle concurrency on !347 - if we want to take advantage of cached artifact cache sizes, and don't want to accidentally start deleting artifacts from another element's build, the data in the main process needs to be updated as fast as possible09:16
tlaterSo an approach like how tristan dealt with the workspace metadata doesn't seem like it's going to work09:16
tristantlater, first of all, half of your problem doesnt make sense09:16
tristantlater, all of the artifacts which you are going to create in a build should be pinned anyway09:17
tristanit may be a corner case for workspace artifacts, which should be added to the "pinned" list as soon as the cache key is discovered09:17
tristantlater, juergbi; I'm not sure I like the idea of abusing the IPC and synchronizing things for the purpose of LRU expiry09:19
tristanit sounds like this would fit better into the existing architecture of queues09:19
tristantlater, if you had an entirely separate task, whose job was just to cache an already built artifact, where the dispatch mechanism has some opportunity to reconcile, wouldnt that make things easier ?09:20
tlaterYeah, I do think so, dispatching "cache" tasks would allow synchronizing cache sizes after each operation, instead of between processes09:20
juergbitristan: !380 makes sense to me. I'm wondering whether this might be a bit expensive for git sources09:21
tristan*maybe* treating the main process as a sort of "cache size counter server" to tasks, and doing IPC from task -> parent would also work09:21
tristanjuergbi, it might be expensive for linux.git09:22
tristanre the above, looking at program flow: Compare a bunch of child processes doing assemble, and all wanting the attention of the main process to periodically negotiate a cache size counter...09:22
juergbitlater: can't we do it in _assemble_done?09:22
juergbiin the main process09:22
tristanwith a program flow where caching the artifacts is done in a separate queue09:23
tlaterjuergbi: Wouldn't we have synchronization issues then, where some processes think the cache is larger than it is?09:23
tristanjuergbi, you want to hold up the main process with deletion of artifacts ?09:23
tristanactually, that doesnt work I think09:23
juergbino, just pin the cache key for workspaces09:23
tristanyou need to delete artifacts from the cache to make space for new incoming artifacts09:23
tlaterAh, I see what you mean09:23
tristanah, that part yes09:23
juergbiah, synchronization of the deletion is still an issue09:23
tlaterYeah :)09:24
juergbithe thing is, committing to ostree doesn't actually consume additional space09:24
juergbiif the build result already exists09:24
tristantlater, it's tempting to go with "Add some IPC sugar, because I dont want to split up this big task into two tasks"09:24
tristantlater, I dont want us to be lazy that way; instead think of what is the cleaner program flow09:25
juergbiso maybe keep committing in the build job but have a separate (serial) queue for deletion if necessary?09:25
tristanif it really is cleaner to add some IPC (I feel like it's doubtful, though), then we go that route09:25
tristanBut "Lets fix this while modifying the least number of lines of code" is how we get into trouble09:26
juergbiI don't see a point of splitting out committing itself09:26
tristanjuergbi, seeing your point09:26
tristanjuergbi, it's not going to take additional disk space to commit the artifact and delete the build directory; or is it ?09:27
tristanjuergbi, when we commit an artifact, does ostree retain hardlinks to the sources ?09:27
tristanor can it grow ?09:27
juergbithere will be a very short peak as we delete it after committing09:27
juergbino, ostree doesn't take hardlinks to keep things safe09:27
juergbiiirc09:27
tristanright, that makes sense, so technically, we bust our limits if we take that approach, and we are kind of lying about the quota09:28
juergbitheoretically, we could commit and delete on a file by file basis09:28
* tlater doesn't like this for the case in which we're limited by disk space09:28
juergbiso the peak would just be a single file09:28
juergbilonger term with FUSE we might actually be able to hardlink/rename if we control everything09:29
tristanjuergbi, rather, if we have a separate queue/task which deletes-things-if-necessary, what would be the difference in that, from just committing to cache in a separate queue ?09:29
tristanseems like not much of a big difference09:29
juergbiseems more complex to me09:30
juergbithe deletion aspect would also be required for pull and with remote execution09:30
juergbibut separate committing makes much less sense in general09:30
juergbiregarding peak disk usage: if we don't hold off parallel build jobs, we can't control peak disk usage anyway09:31
juergbiwe'll always need some extra space for working09:31
juergbiand we must keep the case in mind where multiple bst sessions are running with the same local cache09:32
juergbii.e., we can't control this all anyway09:32
tlaterjuergbi: That's what the 2GB headroom is for - I don't think that's quite enough for artifacts using the tar cache, though09:32
juergbiwe can hopefully replace the tar cache, so let's ignore it for architecture questions09:33
tristantlater, the headroom was originally a consideration that the user *might* have sources on the same volume09:33
tristantlater, and because we wanted to have a sensible "all" default value09:33
tristanA.) We know the maximum space an artifact can potentially consume in the cache... B.) We know the size of the cache... C.) We (ungracefully) consider the build directory, to *not* be the cache (it can in fact be on a separate volume, but that's slow)09:35
tristanfor what is the cache, we can respect the limit; except for this synchronization issue09:35
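Read together, points A, B and C above translate into a simple headroom check before committing a new artifact. The sketch below uses hypothetical names (cache_size, max_artifact_size, quota, headroom); the 2GB headroom default only mirrors the figure tlater mentions later.

```python
# Hypothetical quota check: how much must be expired before committing?
def bytes_to_free(cache_size, max_artifact_size, quota, headroom=2 * 1024 ** 3):
    # Project the worst case: current cache plus the largest artifact a
    # build could produce, plus the configured headroom for other users
    # of the same volume (e.g. sources).
    projected = cache_size + max_artifact_size + headroom
    return max(0, projected - quota)
```

A positive return value would mean a cleanup has to run before the commit; the synchronization problem discussed above is exactly about keeping cache_size accurate while several build jobs (or sessions) run at once.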
juergbithat also assumes no multiple concurrent bst sessions09:36
juergbithe whole pinning actually as well09:36
tristanYes I noticed that09:36
tlaterjuergbi: We work around that slightly09:37
tristanjuergbi, to mitigate this, I asked that tlater mark every artifact that was going to be used, as used; immediately09:37
tristanjuergbi, lessening the chance of such a parallel build deletion situation09:37
juergbiright09:37
tristanin the end, we *still* have to gracefully handle the out-of-space failures09:37
juergbiyes, and also missing artifacts09:38
juergbi(in case the quota is too small for the session)09:38
tlaterjuergbi: Currently that's considered a failed build, although it will remove any artifacts not required in the pipeline.09:39
tlater(Hard to work around that latter bit, but this should be a very rare case)09:39
juergbiyes, failing is fine but we might need a suitable hint in the error message09:40
tlaterI do have a specific error message for that, could take some rewording09:40
juergbijust saying we can't completely ignore it and completely crash and burn09:40
juergbigood09:40
tristanright, that is the graceful part :)09:40
gitlab-br-botbuildstream: merge request (tristan/fix-shell-worksace-files->master: Fix shelling into a failed build sandbox with workspaces active) #380 changed state ("opened"): https://gitlab.com/BuildStream/buildstream/merge_requests/380 09:43
gitlab-br-botbuildstream: issue #346 ("Fails to open build shell after build failure") changed state ("closed") https://gitlab.com/BuildStream/buildstream/issues/346 09:54
gitlab-br-botbuildstream: merge request (tristan/fix-shell-worksace-files->master: Fix shelling into a failed build sandbox with workspaces active) #380 changed state ("merged"): https://gitlab.com/BuildStream/buildstream/merge_requests/380 09:54
tlaterHm, so, I think I like the idea of a deletion queue. We could hold off on committing a handful of elements' artifacts and queue a "please make this much space in the artifact cache" request.09:56
tlaterThat would reduce the number of times we need to calculate the cache size09:56
tlaterAs well as help reduce the number of calls to ostree.repo.prune09:57
tlaterPlus as juergbi says, this is more flexible than trying to do this while we're deleting, it could be reused very easily.09:58
tlaterSome synchronization to figure out how much space we want to reserve would still need to happen for the build queue09:58
tlaterBut the major problem of trying to figure out how much space there is in the cache would be a lot easier09:59
tlaterThoughts?09:59
tlaterI think the deletion queue wouldn't need any concurrency, actually09:59
tlaterSince we'd just need to figure out what artifacts we want to delete, and do them all at once.10:00
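tlater's "please make this much space" request could look roughly like the sketch below. Every name here is hypothetical (list_artifacts_by_mtime(), remove(), the pinned_keys set); it only illustrates expiring least-recently-used artifacts while skipping those pinned by the running session, as tristan asked for earlier.

```python
# Rough sketch of a cleanup request; all cache helpers are assumptions.
def make_room(cache, needed_bytes, pinned_keys):
    """Delete least-recently-used artifacts until `needed_bytes` are free,
    skipping artifacts whose keys are pinned by the current session."""
    freed = 0
    for key, size in cache.list_artifacts_by_mtime():  # assumed: oldest first
        if freed >= needed_bytes:
            break
        if key in pinned_keys:
            continue                                    # never expire pinned artifacts
        cache.remove(key)                               # assumed removal helper
        freed += size
    return freed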
*** aday has joined #buildstream10:02
tristantlater, Sure, I dont mind so much except that it seems that it *could* respect the limit, while this approach is a bit more fuzzy on that10:06
tristantlater, but I dont mind so much, it will be more performant this way anyway10:07
tristantlater, one thing is, we probably wont be treating this as a regular "Queue", so we may need other ways to launch a Job10:07
tristanthat will probably change the architecture somewhat, interested to see what you have in mind...10:08
tristantlater, maybe it can be a regular queue with a special attribute such that it is not displayed in any status parts of the UI10:14
tristani.e. it makes no sense to see "You skipped the deletion of 200 elements" reported, or to see the queue in the status area in interactive mode10:15
tristantlater, also the log lines are weird in this case, they will automatically print the element in context being processed in that queue, but that element is not getting deleted10:16
tlaterHm, right. I'm wondering if this actually is a queue, in that sense.10:17
tristanmaybe it will print "Making room in the cache", and then a message "detail" listing the artifacts it's deleting10:17
*** dominic has joined #buildstream10:18
tristanyeah that's what I mean, we could hijack the queue; but we'd have to special case the UI10:18
tristantlater, on the other hand, there is already some good cases for leveraging the scheduler without queues10:18
tlaterReally, the deletion method doesn't need to know anything about the elements it wants to make space for, except the total maximum size of their artifacts.10:18
tristanthere are some places we've wanted that, I think initialization is slow because we dont even interrogate source consistency state in parallel10:18
tristantlater, so having a way to spawn parallel tasks with the scheduler, without them being "elements in queues" is already a thing we can use10:19
tristanif we do it, it should be streamlined in such a way that there is no code duplication between Queues and NewThing10:20
tristanprobably Queues uses the NewThing, to handle job completion and such parts10:20
tristanok, time to refactor artifact metadata10:33
tristanlets split this all into separate files10:34
tristanjuergbi, oddly, your comment here https://gitlab.com/BuildStream/buildstream/issues/294#note_62585246 talks about loading meta/artifact.yml ... but Element._get_artifact_metadata() in *only* called at staging time10:35
tristanmaybe there is some justification for reading the meta/artifact.yaml via a separate code path, dunno...10:36
* tristan hunts it down and gets to work...10:36
juergbitristan: _update_state()10:37
juergbi                # Load the strong cache key from the artifact10:37
juergbi                metadir = os.path.join(self.__extract(), 'meta')10:37
juergbi                meta = _yaml.load(os.path.join(metadir, 'artifact.yaml'))10:37
juergbi                self.__cache_key = meta['keys']['strong']10:37
juergbishould probably have used the helper10:37
juergbimaybe it didn't exist yet10:37
tristangah10:56
tristanlooks like it's manually extracted all over the place10:56
tristanI should note too, that the artifact.yaml is getting parsed *every time*11:05
tlatertristan: The way I imagined this was that the build queue would be paused until there's enough space for a set of elements - this doesn't look particularly possible with the scheduler atm11:05
tlater^Re the concurrent deletion thing11:05
tristantlater, see Queue.status()11:07
tristantlater, it can return QueueStatus.WAIT11:07
tristantlater, I'm not saying that this can be done without a global revisiting of how scheduling is done, and probably some minor changes11:08
tristanin any case, ability to run jobs outside of the context of a Queue is already a big change11:08
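A build queue that waits for cache space, as discussed here, could lean on Queue.status() returning QueueStatus.WAIT. The sketch below is illustrative only: the QueueStatus values mirror the name mentioned above, while CleanupWaitQueue and cache_has_room() are hypothetical.

```python
from enum import Enum


class QueueStatus(Enum):
    READY = 1
    WAIT = 2
    SKIP = 3


class CleanupWaitQueue:
    # Hypothetical queue that holds elements back while the cache is full.
    def __init__(self, cache_has_room):
        self._cache_has_room = cache_has_room   # assumed callable(element) -> bool

    def status(self, element):
        if not self._cache_has_room(element):
            return QueueStatus.WAIT             # scheduler re-evaluates this element later
        return QueueStatus.READY
```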
tristanjuergbi, every cached artifact has both a weak and a strong key encoded right ?11:18
juergbiyes11:18
juergbi(they may be identical, though)11:19
tristansure11:19
tristanjuergbi, think it's at all possible that that codepath in _update_state() is getting called multiple times recursively ?11:20
tristanstill, I'm adding Element.__metadata_cache hashtable11:20
tristanjust wondering if that might be enough to fix performance11:21
tristanonly load metadata once per artifact would be a good start11:21
tristanEh, splitting up the files is still a good idea, but I'll start with cache as first step, and follow that pattern while splitting11:22
juergbithat part in _update_state() should never be invoked multiple times for the same element11:23
juergbias it sets the cache key and will never unset it11:23
juergbi(at least in a single process)11:24
juergbiso I don't think a cache helps here but it might make sense in general11:24
tristanyeah there is other workspace bits which ask for the metadata11:25
tristanso it will still help11:25
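The Element.__metadata_cache idea amounts to memoizing the parsed meta/artifact.yaml per cache key, so the extract-and-parse cost is paid at most once per artifact. A standalone sketch, where _load_artifact_yaml stands in for whatever actually extracts and parses the file:

```python
# Sketch of a per-cache-key metadata cache; the loader is an assumption.
class MetadataCache:
    def __init__(self, load_artifact_yaml):
        self._load = load_artifact_yaml
        self._cache = {}          # cache key -> parsed metadata dict

    def get(self, key):
        if key not in self._cache:
            self._cache[key] = self._load(key)   # expensive: extract + YAML parse
        return self._cache[key]
```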
tristantlater, ...11:36
tristantlater, seeing as you were involved in the incremental builds stuff, and are now doing local LRU expiry... I should make sure...11:36
tristantlater, ... I should just double check and make sure you are not going to break this: https://gitlab.com/BuildStream/buildstream/blob/master/buildstream/element.py#L501 11:38
* tlater will check in a minute, is on the go but phone doesn't have enough ram to render files on gitlab11:40
tristantlater, incremental builds in workspaces expect that you can load the last successful build metadata from the artifact cache at stage time11:41
tristantlater, so when that metadata suddenly disappears, it's important to not break.11:41
tristantlater, I'm more than half expecting an exclamation on your part, please feel free to use foul language :)11:42
*** marcledilson has joined #buildstream11:43
tlaterOh11:55
* tlater curses internally, easier to avoid doing so on IRC x)11:55
tristantlater, I dont know, maybe you just need to wipe that incremental stuff to some kind of "clean slate" once the last_successful goes missing11:57
tlaterHrm11:57
tlaterThat would involve knowing what the build system changed and what was changed by the user11:57
tristanI dont know that it would, it would rather involve ensuring that everything staged appears to be *new*, wouldnt it ?11:58
tlaterOh, a new mtime on all files?11:58
tlaterYeah, that works11:58
tristanI havent followed what you've been doing closely, but seems that way11:59
tlaterta, I'll make sure the expiry branch doesn't break here11:59
tlater(In fact, I guess if someone manually deleted artifacts this would already break)12:00
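The "new mtime on all files" fallback could be as simple as touching everything staged when the last_successful metadata is no longer in the cache, so the incremental-build logic starts from a clean slate. Purely illustrative; mark_tree_as_new() is not a BuildStream function.

```python
# Sketch: make the whole staged tree look freshly written.
import os
import time


def mark_tree_as_new(staged_dir):
    now = time.time()
    for root, dirs, files in os.walk(staged_dir):
        for name in files:
            path = os.path.join(root, name)
            # follow_symlinks=False so dangling symlinks don't break the walk
            os.utime(path, (now, now), follow_symlinks=False)
```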
* tristan really has to complete the privateness refactor12:00
tristanElement is still in a total mess, with plenty of really private things exposed with only a single underscore12:00
tristanFor this patch, I'm following current style and not paying heed to private policy12:01
tristanbetter to fix that separately12:01
tlatertristan: On that actually, I don't quite think the hacking file is clear enough on that12:03
* tlater at least doesn't quite understand where we make a distinction between "local private" and "API private"12:03
tlaterAt least before the refactor I assumed we only marked API private specially, but the wording now seems a bit confusing12:04
tristantlater, what don't you understand ?12:05
tristantlater, API Private is pretty clear, it means not exposed in public API12:06
tristantlater, local private means that it is not used outside of the current *scope*12:06
tlaterDo we mark *both* of those with a leading underscore?12:06
tristantlater, I'll be happy accept patches to the HACKING file12:06
tristanYes we do12:06
tristantlater, with double underscore in the few cases where there is ambiguity12:07
tlaterOk, that makes sense then :)12:07
tristanthere is ambiguity when a class is public, and has internal things which are *also* local private (need 2 underscores)12:08
tlaterAnnoyingly, that makes those variables unusable in subclasses12:08
tlaterSo something like element.py might struggle using them12:08
tristanthose things need to have single underscore things which are API private but not local private (like Element._assemble_done() for instance, needs to be called from BuildQueue, but is not public API)12:08
tristantlater, that is the *whole point*12:09
tlaterDoes an implementing class not count as local scope?12:09
tristantlater, it is not local private if it is accessed from somewhere else, like in another file, or a subclass12:09
tristanNo, "scope" is something pretty clearly defined :)12:09
tristanit doesnt include fancy shmancy thingamabobs that Java, C++ and such languages add, this is just python :)12:10
* tlater thought in object scopes, as the subclass will actually be operating in a copy of its parent's scope, but doesn't further question this :)12:10
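A tiny example of the convention as described above: one leading underscore for API-private symbols (usable across BuildStream internals but not public API), two for local-private members of a public class, where Python's name mangling keeps them out of reach of subclasses. Only Element._assemble_done() is taken from the discussion; the subclass method is made up for illustration.

```python
class Element:                     # public API class (illustrative stand-in)

    def __init__(self):
        self.__cache_key = None    # local private: mangled to _Element__cache_key

    def _assemble_done(self):      # API private: called from e.g. the build queue,
        pass                       # but not part of the public API


class ScriptElement(Element):      # a subclass can call _assemble_done() ...
    def _finish(self):             # (hypothetical method, for illustration)
        self._assemble_done()
        # ... but self.__cache_key here would resolve to
        # _ScriptElement__cache_key, not the parent's attribute.
```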
tristanjuergbi, anything else you'd like to do while reving the artifact version ?12:36
tristanjuergbi, I'm going to do a little bit more cleanup, at least how cache keys are calculated for sources is right now a bit weird, worth changing if there is a rev bump12:37
* tristan thinks out loud that at this time... he would really love to see a *graph* of the performance of loading pipelines in non-strict mode12:47
tlaterGah, I've been designing something that relies on knowing how large ostree artifacts are :|12:54
tlatero\12:54
skullmanbah, worked out how to retry a pipeline, but it doesn't automatically merge after a retry12:56
skullmanhttps://gitlab.com/BuildStream/buildstream/merge_requests/375 will need someone with elevated permissions to prod it12:56
tristanskullman, it looks like we're hitting some kind of weird gitlab bug :-S12:59
tristanit's telling me I set it to be merged12:59
tristanand the pipeline passed12:59
tristanbut nothing to merge it12:59
tristanskullman, I take it you've rebased and it now passes ?12:59
tristanok wait, I've canceled automatic merge, and now it tells me I need a rebase again13:00
tristanskullman, note that I rebased in the UI the other day, and that's when it failed, I suspect you re-pushed this without local rebase maybe ?13:00
tristantry rebase it locally and repush that13:01
skullmanI didn't rebase or push. I fetched the rebase you did via the UI and tested that locally and it worked, so got it to retry.13:01
* skullman will rebase13:01
*** xjuan has joined #buildstream13:02
tristanstrange13:04
tristanstill a weird gitlab bug occurred :)13:04
tristanshould I try UI rebase ?13:14
skullmanI'm currently running tests locally on a manual rebase.13:15
skullmanI don't expect it to fail.13:16
skullmanI'm fine with either waiting for a local pass, or not.13:16
gitlab-br-botbuildstream: merge request (tristan/restructure-artifact-metadata->master: Restructure artifact metadata) #381 changed state ("opened"): https://gitlab.com/BuildStream/buildstream/merge_requests/381 13:17
tristanjuergbi, want to take a look at #381 ?13:18
tristanjuergbi, I want to merge that and ask mcatanzaro to observe performance after...13:19
tristanmaybe that's bad, but the split is generally a good strategy I think13:19
*** jonathanmaw has quit IRC13:23
*** jonathanmaw has joined #buildstream13:32
gitlab-br-botbuildstream: merge request (issue-21_Caching_build_trees->master: WIP: Issue 21 caching build trees) #372 changed state ("opened"): https://gitlab.com/BuildStream/buildstream/merge_requests/372 13:34
juergbitristan: taking a look13:35
*** bethw has joined #buildstream13:35
*** marcledilson has quit IRC13:41
*** Prince781 has joined #buildstream13:48
* tristan leaves closing coffee shop...13:55
tristanwill appear in a moment13:55
skullman>:( there's non-deterministic test failures in tests/frontend/push.py and tests/integration/stack.py13:56
tlater:|13:56
skullmantests pass after a rebase with master, just not always13:56
tlaterDo you have a CI run for them? I've seen some such failures, though not recently.13:57
gitlab-br-botbuildstream: merge request (jjardon/update_depencies->master: docs: Update list of dependecies) #373 changed state ("opened"): https://gitlab.com/BuildStream/buildstream/merge_requests/373 13:58
skullmantlater: https://gitlab.com/BuildStream/buildstream/-/jobs/62223526 contains a stack.py fail13:58
*** tristan has quit IRC13:58
jjardonCan someone review https://gitlab.com/BuildStream/buildstream/merge_requests/373/ and the docker counterpart of this https://gitlab.com/BuildStream/buildstream-docker-images/merge_requests/32 , please?13:59
tlaterHm, that's not an integration test, so unlikely to be a problem with cleanup13:59
skullmantlater: https://paste.baserock.org/reyobuwuwe for logs of another failed test_stack, https://paste.baserock.org/uqojayotof for a test_push14:00
*** tristan has joined #buildstream14:01
*** bethw has quit IRC14:02
*** bethw has joined #buildstream14:02
tlaterskullman: Sorry, I've only seen the last issue before, everything else you probably have better context on than me currently14:02
tlaterAs for that last one, the only time I've seen it occur was when I broke cache keys for pushed artifacts14:03
tlaterBut I doubt that's the issue here, considering the previous two traces14:03
tlaterThe ruamel exceptions are red herrings, btw, I think they pop up with our specific combination of pylint+ruamel in the debian-8 image14:04
* tlater has tried to fix them before, but couldn't find a working combination14:04
tristanI have a regular "hang" in test_push_pull when run locally, but seems to never happen in CI14:04
tristanhappens maybe once out of 2014:04
tristanor less14:04
tristanbut still often enough that I notice14:05
tristanjuergbi, still not sure eh :)14:06
juergbiI actually wanted to test it locally but that's not so quick, so let's just review it14:07
tristanAre you able to easily test that ?14:07
tristanjuergbi, if you want, I'll hold off for a few hours14:07
tristandid you start a build ?14:08
tristanit will take a few hours14:08
tristanI guess14:08
juergbiI was considering it but noticed that I first have to rebase some other stuff, so maybe not a good idea right now14:09
tristanOk, just review, lets get it in and see about performance after14:09
tristanit should help at least14:09
* tristan will also run a GNOME build overnight14:10
gitlab-br-botbuildstream: merge request (jjardon/update_depencies->master: docs: Update list of dependecies) #373 changed state ("merged"): https://gitlab.com/BuildStream/buildstream/merge_requests/373 14:10
gitlab-br-botbuildstream: merge request (tristan/restructure-artifact-metadata->master: Restructure artifact metadata) #381 changed state ("opened"): https://gitlab.com/BuildStream/buildstream/merge_requests/381 14:11
skullman:/ the test_stack failure from my local build is that umount is failing because there's nothing there. The one in CI is from a mount directory already being there. In CI I can believe that could be a scheduling problem of two runners running the same test on the same system at the same time.14:11
skullmanOn the local failure it implies that there was some code expecting to not perform cleanup in some non-deterministic circumstances.14:12
tristanthe CI runners should be running at least in completely separate docker containers if they run on the same system14:13
tristanfor local failure, it does sound likely that your fix may have uncovered a bug that was silently ignored14:14
skullmanconcurrency failures make me want to quit programming and work in a shop or something14:15
tristanwait a sec14:15
tristanoh no, if you hit the terminator codepath, there should be no opportunity to land in the finally: clause14:16
tlaterDon't quit *just* yet, skullman ;p14:16
tristanhah, I always quite liked concurrency synchronization bugs14:16
tristanit's ref count imbalances which drive me nuts :)14:16
* skullman wishes mount returned a mount ID and umount required it, so you'd at least get more obvious double-umount errors14:17
juergbitristan: maybe I'm missing something but isn't the key return value of __extract() essentially unused?14:18
juergbiit's either ignored or the same as what's passed as input key14:18
juergbiah, not in the None case14:18
juergbinm14:19
tristanjuergbi, yeah I was considering splitting that part of __extract() out14:19
tristanjuergbi, but there is so much noise already of "get the key" functions for now, that I wanted to just let it be14:20
juergbithe whole keyed caching makes things not as trivial as expected14:20
tristanwell, I wanted to cache under both keys14:20
tristanmostly, there is a lot of boiler plate, indeed14:20
tristanto have nice separate functions with different names which return practical things14:21
tristanskullman, I wonder if your try/finally blocks should be *within* the terminator context managers14:22
tristanskullman, also, looking at https://gitlab.com/BuildStream/buildstream/merge_requests/375/diffs#9bff43db2e8aefc8dbfb4c1068b99f443647bc0d_146_145 ...14:22
juergbiright. I think the changes are fine14:22
tristanskullman, on that line it looks like if mount() fails, you anyway do a finally: umount()14:23
tristanskullman, shortening the "try" blocks to be more specific may be a good idea14:23
skullmantristan: that umount will need to be done whether that mount happens or not, since the mount is outside the try block, and that one just changes the flags.14:24
tristanskullman, this one ? https://gitlab.com/BuildStream/buildstream/blob/621d0d85eede4ce824be4f35ecf15df516535ff7/buildstream/sandbox/_mounter.py#L145 14:27
* tristan links to file instead of MR14:27
skullmantristan: my rationale for putting the context managers inside the try-finally rather than the other way around was that I was at least able to reason about what happens if it gets terminated after the context ends but before the finally, that it won't get umounted, while termination having an umount after umount means it could unmount something underneath what was stacked14:27
tristanskullman, by the looks of it, both _mount and _umount do things, and can raise an exception14:27
skullmantristan: yeah, that's the mount call I meant14:27
tristanif that one fails, you probably dont want to call umount14:28
tristanyou probably just want to reraise the exception no ?14:28
tristanthe terminator logic can be daunting indeed, I think when we use it for launching processes, we use it outside the block, and we only terminate the process if the process pointer has been assigned14:29
tristanbut that's indeed not helpful without an id/handle representing the mount14:30
skullmanI don't think I follow. I'd expect with `bind_mount()` that any mount it made would be unmounted on context exit. Since a bind-mount is created before the yield in the try-finally, we need to try after mounting and umount in the finally.14:30
skullmanIf it fails to make the bind-mount's propagation rslave, then I'd want to roll back the context before yielding, so it'll umount and then reraise.14:31
tristanahhhh14:32
*** xjuan_ has joined #buildstream14:32
tristanskullman, I missed the part that the second call to _mount() was only modifying a property of the mount, sorry14:33
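The pattern skullman describes reduces to the following shape: create the bind mount first, then do the propagation-only remount inside the try block, so a failure there still unmounts in the finally clause before the exception propagates. This is a simplified sketch; _mount() and _umount() are placeholders for the real helpers in buildstream/sandbox/_mounter.py, and the terminator handling is omitted.

```python
from contextlib import contextmanager


@contextmanager
def bind_mount(dest, src, _mount, _umount):
    # _mount()/_umount() are placeholder callables wrapping mount(8)/umount(8).
    _mount(dest, src, '--rbind')              # create the bind mount
    try:
        _mount(dest, None, '--make-rslave')   # only changes propagation of the
        yield dest                            # existing mount, but can still fail
    finally:
        _umount(dest)                         # runs on failure and on normal exit
```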
gitlab-br-botbuildstream: merge request (tristan/restructure-artifact-metadata->master: Restructure artifact metadata) #381 changed state ("merged"): https://gitlab.com/BuildStream/buildstream/merge_requests/381 14:34
*** xjuan has quit IRC14:35
tristanskullman, I'm not sure if this is helpful for debugging but maybe (and would be good anyway)...14:39
tristanif we went into _exceptions.py and made SandboxError() support the newer `detail` kwarg which is available in the BstError base class...14:40
tristanand then if we took the stdout/stderr from the mount calls and put them in that detail arg...14:40
tristanwe might see the errors better than just "mount returned exit status 255" or such14:40
tristanreally unsure if that's helpful here though :-S14:41
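Attaching the mount output as detail might look like the sketch below. The SandboxError class here is only a stand-in carrying the proposed detail kwarg (the log tells us just that the BstError base class already accepts one), and the subprocess wrapper is hypothetical.

```python
import subprocess


class SandboxError(Exception):
    # Stand-in for buildstream._exceptions.SandboxError with the proposed
    # `detail` kwarg; the real class derives from BstError.
    def __init__(self, message, detail=None):
        super().__init__(message)
        self.detail = detail


def run_mount_command(argv):
    # Hypothetical wrapper: capture stdout/stderr so the frontend can show
    # more than "mount returned exit status 255".
    proc = subprocess.run(argv, stdout=subprocess.PIPE,
                          stderr=subprocess.STDOUT, universal_newlines=True)
    if proc.returncode != 0:
        raise SandboxError("{} returned {}".format(argv[0], proc.returncode),
                           detail=proc.stdout)
    return proc.stdout
```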
skullmanright now I'm just trying to get it to happen again to rule out neutrino strike14:42
skullmanit may be useful if it turns out we've hit some obscure corner case where mount can't actually do anything, but it's most likely that the mount doesn't exist14:42
*** Prince781 has quit IRC15:06
*** Prince781 has joined #buildstream15:08
*** xjuan_ has quit IRC15:13
skullmanok, so test_push fails reliably on my system since rebasing, but if I roll back my change it still happens, so could be something about my environment15:24
skullmangoing to try and get test_stack to happen15:24
*** Prince781 has quit IRC15:31
tlaterHm, I'm struggling to understand the exit condition for the scheduler15:42
tlaterThis function here suggests that it exits once that while loop finishes: https://gitlab.com/BuildStream/buildstream/blob/master/buildstream/_scheduler/scheduler.py#L318 15:42
tlaterBut it only conditionally stops the loop15:43
tlaterYet there's nothing to suggest that there's any other condition to stop if that doesn't happen: https://gitlab.com/BuildStream/buildstream/blob/master/buildstream/_scheduler/scheduler.py#L125 15:43
tlater(Outside of the user ^C-ing out)15:43
tlaterSo as far as I understand, once buildstream runs out of processes to move between queues (which it will eventually, if they are just stuck building for example) it will keep running forever - but that's not right. Could someone point me to the real exit condition?15:45
*** noisecell has quit IRC15:46
juergbitlater: check Scheduler.sched()15:48
juergbiit exits if no queues have active jobs15:49
juergbi'ticking'15:49
tlaterjuergbi: That function is only run once though15:49
tlaterIt only checks exactly once whether there's still something ticking15:49
juergbino, Queue.job_done() calls that as well15:49
tlaterAh15:49
*** noisecell has joined #buildstream15:49
tlaterta juergbi, I overlooked that15:50
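The exit condition juergbi points to can be condensed into a few lines: sched() runs at startup and again from every job completion, and it stops the event loop once no queue has active ("ticking") jobs. Illustrative only; the real logic lives in buildstream/_scheduler/scheduler.py and Queue.job_done().

```python
class Scheduler:
    def __init__(self, loop, queues):
        self.loop = loop              # asyncio event loop (assumed)
        self.queues = queues

    def sched(self):
        for q in self.queues:
            q.process_ready()         # assumed helper: start any jobs that are ready
        if not any(q.active_jobs for q in self.queues):
            self.loop.stop()          # exit condition: nothing running, nothing left to run

    def job_done(self, queue, job):
        # In BuildStream this call comes via Queue.job_done(); every job
        # completion re-runs sched(), which is what eventually stops the loop.
        queue.active_jobs.remove(job)
        self.sched()
```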
tristanlooks like gnome-build-meta master does already depend on freedesktop-sdk... progress \o/16:03
tlater:O16:05
tlaterLooks like we can soon change the base.bst file for our tests :)16:05
tristanI wouldnt be in a hurry - I'd honestly prefer that the base.bst file depend on a published BaseSdk, once it's honed down to a nice and tiny size16:07
tristanand not inter-project depend for that purpose16:07
tristanbut yeah, it's coming :)16:07
tristanw00t, my desktop has slowed down to a nice and reasonably slow speed16:08
tristana speed that you would expect under "regular circumstances", i.e. like when you are normally compiling gcc, glibc, webkit and spidermonkey at -j8 each in the background16:09
tristanahhh, life is back to normal :)16:09
*** noisecell has quit IRC16:11
*** toscalix has quit IRC16:17
*** bethw has quit IRC16:18
bochechatristan: what kind of a monster machine do you have? :o16:25
tristanbochecha, pretty strong one, 4 dual thread cores up to 3.4GHz and plenty of ssd :)16:29
tlatertristan: Is that in a laptop?16:30
*** xjuan_ has joined #buildstream16:30
tristanI find that -j8 on 4 parallel builds is a decent balance, because most of the time 2 or 3 builds are doing ./configure16:30
tristanwould be good to get a job server and some standard across build systems, though16:31
tristantlater, yep16:32
* tlater is impressed16:32
tlaterI just run my gentoo builds on my PC, which does things fine, but it's a pain when I'm not in the UK16:33
tristantlater, I was hoping that I could get at least 75% power out of a thin quad core thinkpad yoga this year (plugged in), but two things happened: A.) heat dissipation problem, meaning more throttling... B.) meltdown (so who wants to buy cores this year ?)16:34
*** jonathanmaw has quit IRC16:34
gitlab-br-botbuildstream: merge request (issue-21_Caching_build_trees->master: WIP: Issue 21 caching build trees) #372 changed state ("opened"): https://gitlab.com/BuildStream/buildstream/merge_requests/372 16:36
jjardonHi, is there someone around to review https://gitlab.com/BuildStream/buildstream-docker-images/merge_requests/33 ?16:37
tlaterjjardon: I can take a peek if you like16:37
jjardontlater: thanks!16:37
tlaterjjardon: Why do we need these calls now? https://gitlab.com/BuildStream/buildstream-docker-images/merge_requests/33/diffs#587d266bb27a4dc3022bbed44dfa19849df3044c_69_73 16:41
tlaterIn fact, I don't think we want a latest tag for the testsuite images16:41
jjardontlater: not strictly needed, but it's convenient for some things16:42
jjardontlater: see latest commit, to use docker caching mechanism16:43
tlaterOh, whoops, not sure why I thought it was all one commit16:43
tlaterAh, I see what you mean...16:43
tlaterHm16:43
* tlater was under the impression that we stored our old images on the runner16:44
tlaterSo that technically wouldn't be required16:44
*** Prince781 has joined #buildstream16:44
jjardonwe dont16:44
tlaterIt also seems to cut build times in half, so yeah, I don't see a reason not to16:45
jjardonit's in the gitlab cache but that's: 1. slow 2. not guaranteed to be always there16:45
tlaterYeah, I agree, this is probably better16:45
tlaterAnd then we do need a latest tag for the testsuite images to support this16:46
* tlater wanted those anyway because it makes scripting stuff for dev use easier16:46
tlaterApproved! I really like the changes jjardon, will make things much neater in the future, tyvm :)16:47
jjardonnp! thanks for the quick review tlater !16:47
jjardontlater: would you mind reviewing https://gitlab.com/BuildStream/buildstream-docker-images/merge_requests/32 as well (it's quite trivial, this one)16:48
tlaterLess comfortable with that one because it's not just technical, but I'll have a look16:49
tlaterjjardon: One comment, otherwise also fine with this one, actually16:56
tlaterMaybe the whole thing can also be squashed into a single commit16:57
jjardontlater: sure, I have pushed a new version to make it more explicit17:00
tlaterjjardon: If you squash the commits I'll approve this one too, don't think there's much value in exactly explaining every split here :)17:00
jjardonsure, done17:02
jjardontlater: ^17:02
*** dominic has quit IRC17:03
tlaterAwesome, approved now17:03
*** noisecell has joined #buildstream17:06
jjardon[m]thanks!17:09
*** bethw has joined #buildstream17:42
*** Prince781 has quit IRC18:15
tristanhmmm, status area even does not display full element names18:45
tristanah, I may be wrong but that might have been worth changing along with today's cache key break, I think removing that normal name nonsense also implied a cache key break :-S18:46
tristanmissed opportunity, oh well18:47
jjardontristan: https://www.phoronix.com/scan.php?page=news_item&px=RHEL-8-No-Python-2 18:51
jjardontristan: is RHEL serious enough? :)18:52
*** valentind has quit IRC19:15
*** valentind has joined #buildstream19:15
*** bethw has quit IRC19:26
*** bethw has joined #buildstream19:50
*** bethwhite_ has joined #buildstream20:19
*** cs_shadow_ has joined #buildstream20:20
*** jjardon_ has joined #buildstream20:21
*** bethw has quit IRC20:21
*** hergertme has quit IRC20:21
*** cs_shadow has quit IRC20:21
*** jjardon has quit IRC20:21
*** benbrown has quit IRC20:21
*** laurence has quit IRC20:21
*** awacheux[m] has quit IRC20:21
*** abderrahim[m] has quit IRC20:21
*** juergbi has quit IRC20:21
*** waltervargas[m] has quit IRC20:21
*** csoriano has quit IRC20:21
*** ernestask[m] has quit IRC20:21
*** ssssam[m] has quit IRC20:21
*** segfault3[m] has quit IRC20:21
*** theawless[m] has quit IRC20:21
*** cgmcintyre[m] has quit IRC20:21
*** pro[m] has quit IRC20:21
*** connorshea[m] has quit IRC20:21
*** m_22[m] has quit IRC20:21
*** alatiera has quit IRC20:21
*** rafaelff[m] has quit IRC20:21
*** jjardon[m] has quit IRC20:21
*** bochecha has quit IRC20:21
*** kailueke[m] has quit IRC20:21
*** ptomato[m] has quit IRC20:21
*** asingh_[m] has quit IRC20:21
*** inigomartinez has quit IRC20:21
*** mattiasb has quit IRC20:21
*** cs_shadow_ is now known as cs_shadow20:21
*** jjardon_ is now known as jjardon20:21
*** csoriano has joined #buildstream20:22
*** benbrown has joined #buildstream20:24
*** laurence has joined #buildstream20:27
*** juergbi has joined #buildstream20:27
*** hergertme has joined #buildstream20:27
*** inigomartinez has joined #buildstream20:29
*** bethwhite_ has quit IRC20:33
*** mattiasb has joined #buildstream20:37
*** ptomato[m] has joined #buildstream20:42
gitlab-br-botbuildstream: issue #354 ("Allow configuring an element to not be tracked by --track-all") changed state ("opened") https://gitlab.com/BuildStream/buildstream/issues/354 20:47
*** bethw has joined #buildstream20:51
*** awacheux[m] has joined #buildstream20:52
*** bethw has quit IRC21:04
*** awacheux[m] has quit IRC21:05
*** ptomato[m] has quit IRC21:05
*** mattiasb has quit IRC21:05
*** inigomartinez has quit IRC21:05
*** noisecell has joined #buildstream21:11
*** noisecell has quit IRC21:11
*** bethw has joined #buildstream21:18
*** aday has quit IRC21:22
*** xjuan_ has quit IRC21:23
*** bethw has quit IRC21:32
*** rafaelff[m] has joined #buildstream21:34

Generated by irclog2html.py 2.15.3 by Marius Gedminas - find it at mg.pov.lt!