gitlab-br-bot | buildstream: merge request (issue-21_Caching_build_trees->master: WIP: Issue 21 caching build trees) #372 changed state ("opened"): https://gitlab.com/BuildStream/buildstream/merge_requests/372 | 00:10 |
*** Prince781 has joined #buildstream | 00:20 | |
*** jsgrant has joined #buildstream | 01:24 | |
*** jsgrant has quit IRC | 01:26 | |
*** Prince781 has quit IRC | 01:50 | |
*** Prince781 has joined #buildstream | 02:05 | |
*** Prince781 has quit IRC | 02:42 | |
*** tristan has joined #buildstream | 05:06 | |
gitlab-br-bot | buildstream: merge request (test-run-doc->master: HACKING.rst: Add integration and pytest notes) #319 changed state ("opened"): https://gitlab.com/BuildStream/buildstream/merge_requests/319 | 06:49 |
gitlab-br-bot | buildstream: merge request (test-run-doc->master: HACKING.rst: Add integration and pytest notes) #319 changed state ("merged"): https://gitlab.com/BuildStream/buildstream/merge_requests/319 | 07:09 |
gitlab-br-bot | buildstream: merge request (milloni/assert-bwrap-version->master: Assert minimum bwrap version on host) #379 changed state ("opened"): https://gitlab.com/BuildStream/buildstream/merge_requests/379 | 07:10 |
*** toscalix has joined #buildstream | 07:57 | |
tristan | juergbi, ok so I first fixed that "running files" serialization, with the thought in mind that serializing where sources were staged relative to the sandbox in the workspaces.yml was the right solution for `bst shell --sysroot` | 08:10 |
tristan | fixing that serialization of writes was important, but now I think maybe I had the wrong solution for `bst shell --sysroot` | 08:10 |
tristan | Consider that; first of all we want to cache artifacts of failed builds | 08:11 |
tristan | Also consider that... a failed build sandbox is probably a very specific state | 08:11 |
tristan | juergbi, so I'm starting to think that; instead, after a build fails and we intend to give the user a sysroot snapshot of that build failure... maybe we should just be copying the workspace files into the mount location, and not mount at `bst shell --sysroot` time ? | 08:13 |
tristan | It would require I guess a utils.copy_files(...preserve_mtime=True) semantic | 08:13 |
tristan | any thoughts on that ? | 08:13 |
tristan | for reference: https://gitlab.com/BuildStream/buildstream/issues/346 <-- that issue | 08:14 |
juergbi | tristan: hm, that would be a user-visible change, though | 08:15 |
gitlab-br-bot | buildstream: merge request (milloni/assert-bwrap-version->master: Assert minimum bwrap version on host) #350 changed state ("closed"): https://gitlab.com/BuildStream/buildstream/merge_requests/350 | 08:15 |
juergbi | users might rely on read-write workspaces | 08:15 |
juergbi | but maybe they shouldn't | 08:16 |
juergbi | this would be in line with my idea of separating workspace sources and build directory | 08:16 |
juergbi | from that perspective it sounds fine to me. just wondering whether there are use cases where people really need the mounted workspace | 08:17 |
tristan | juergbi, I think a user visible change was that `bst shell --sysroot` became completely broken when we started mounting the workspaces into the sandbox | 08:18 |
tristan | juergbi, the question is rather what is the appropriate fix; I feel like the appropriate fix is to give the user a sysroot that is actually representative of what failed | 08:19 |
tristan | not a sysroot that "might have changed because the user edited a workspace file" | 08:19 |
juergbi | true, I think that makes sense | 08:19 |
tristan | which would be more in line with what we will be doing once we solve caching failed builds | 08:20 |
juergbi | for the sysroot case | 08:20 |
juergbi | what about bst shell without sysroot? | 08:20 |
tristan | i.e. when we cache failed builds, we should be removing the sysroots completely I think, and always cleaning up the build directory | 08:20 |
tristan | bst shell without sysroot is not broken | 08:20 |
juergbi | yes, so we keep mounting it there, right? | 08:21 |
tristan | it stages everything on demand, delegates to Element.stage() and everything | 08:21 |
tristan | right, we keep mounting it there | 08:21 |
juergbi | so that will still need the running_files update | 08:21 |
*** segfault3[m] has joined #buildstream | 08:21 | |
tristan | juergbi, right: https://gitlab.com/BuildStream/buildstream/blob/master/buildstream/element.py#L522 :) | 08:22 |
tristan | interestingly, that code path cannot get hit with `--sysroot` anyway | 08:22 |
tristan | because nothing is staged when shelling into an existing sysroot | 08:23 |
segfault3[m] | Hi, I'm having issues with buildstream. I tried building both gtk+ and GNOME Shell on Debian Stable and Debian Testing, it always fails with this message: | 08:23 |
segfault3[m] | FAILURE [base/cyrus-sasl.bst]: tar source at base/cyrus-sasl.bst [line 48 column 4]: Error mirroring ftp://ftp.cyrusimap.org/cyrus-sasl/cyrus-sasl-2.1.26.tar.gz: <urlopen error ftp error: error_perm('500 OOPS: failed to open vsftpd log file:/var/log/vsftpd.log',)> | 08:23 |
tristan | segfault3[m], that looks like a legitimate error | 08:24 |
tristan | segfault3[m], i.e. you're getting 500 from the upstream site | 08:25 |
tristan | segfault3[m], wget ftp://ftp.cyrusimap.org/cyrus-sasl/cyrus-sasl-2.1.26.tar.gz is also reporting errors for me | 08:25 |
tristan | https://bpaste.net/show/0aae6b299b9c | 08:26 |
segfault3[m] | ok, thanks, I will ask the cyrusimap devs then | 08:27 |
tristan | segfault3[m], this is a typical issue, and we're currently working on a mirroring solution here https://gitlab.com/BuildStream/buildstream/issues/328 | 08:27 |
tristan | so we hope that we can setup alternative mirrors and easily enable mirroring such that we can fallback to alternative mirrored locations | 08:27 |
segfault3[m] | sounds good | 08:28 |
tristan | segfault3[m], oh, if you have a means of actually obtaining the tarball, a workaround would be to locally set the 'url' field in `base/cyrus-sasl.bst` of it to a local file:// url | 08:31 |
tristan | ah, we already use shutil.copystat() in safe_copy(), so no need to explicitly preserve mtime | 08:47 |
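For reference, a minimal sketch of what a copy helper that preserves timestamps could look like, in case the copy-instead-of-mount approach is taken for `bst shell --sysroot`. The function name and layout here are made up for illustration; as noted above, BuildStream's real safe_copy() already gets this behaviour from shutil.copystat(), and symlink handling is omitted for brevity.

```python
import os
import shutil

def copy_files_preserving_mtime(src_dir, dest_dir):
    """Copy a directory tree, keeping file timestamps intact.

    shutil.copy2() copies file contents and then applies shutil.copystat(),
    which carries over mtime (and permission bits), so files in the
    destination do not all look "newer" than the originals.
    """
    for root, dirs, files in os.walk(src_dir):
        rel = os.path.relpath(root, src_dir)
        target_root = os.path.join(dest_dir, rel)
        os.makedirs(target_root, exist_ok=True)
        for name in files:
            shutil.copy2(os.path.join(root, name),
                         os.path.join(target_root, name))
        # Also preserve the directory's own timestamps/permissions
        shutil.copystat(root, target_root)
```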
gitlab-br-bot | buildstream: merge request (135-expire-artifacts-in-local-cache->master: WIP: Resolve "Expire artifacts in local cache") #347 changed state ("opened"): https://gitlab.com/BuildStream/buildstream/merge_requests/347 | 08:57 |
segfault3[m] | tristan: I was able to download the tarball from https://www.cyrusimap.org/releases/cyrus-sasl-2.1.26.tar.gz, but I can't find the cyrus-sasl.bst on my system | 09:02 |
jmac | I've had this problem | 09:03 |
jmac | Cyrus-sasl's ftp site has been down for several days | 09:04 |
jmac | The bst file should be sdk/elements/base/cyrus-sasl.bst | 09:04 |
jmac | Actually that's freedesktop-sdk, which you may not be using | 09:05 |
segfault3[m] | I'm using gnome-build-meta, it only has freedesktop-sdk-junction.bst and linker-priority.bst in elements/base/ | 09:06 |
*** jonathanmaw has joined #buildstream | 09:06 | |
gitlab-br-bot | buildstream: merge request (tristan/fix-shell-worksace-files->master: Fix shelling into a failed build sandbox with workspaces active) #380 changed state ("opened"): https://gitlab.com/BuildStream/buildstream/merge_requests/380 | 09:12 |
tristan | juergbi, wanna take a look at https://gitlab.com/BuildStream/buildstream/merge_requests/380 ? | 09:12 |
juergbi | sure | 09:13 |
tristan | oops linting error, I'll have to repush, but it's certainly something minor | 09:15 |
tristan | maybe a line too long | 09:15 |
* tristan will look into optimizing "Calculating cached state" and hopefully improve that today | 09:15 | |
tlater | juergbi: I'm not sure how to tackle concurrency on !347 - if we want to take advantage of cached artifact cache sizes, and don't want to accidentally start deleting artifacts from another element's build, the data in the main process needs to be updated as fast as possible | 09:16 |
tlater | So an approach like how tristan dealt with the workspace metadata doesn't seem like it's going to work | 09:16 |
tristan | tlater, first of all, half of your problem doesnt make sense | 09:16 |
tristan | tlater, all of the artifacts which you are going to create in a build should be pinned anyway | 09:17 |
tristan | it may be a corner case for workspace artifacts, which should be added to the "pinned" list as soon as the cache key is discovered | 09:17 |
tristan | tlater, juergbi; I'm not sure I like the idea of abusing the IPC and synchronizing things for the purpose of LRU expiry | 09:19 |
tristan | it sounds like this would fit better into the existing architecture of queues | 09:19 |
tristan | tlater, if you had an entirely separate task, whose job was just to cache an already built artifact, where the dispatch mechanism has some opportunity to reconcile, wouldnt that make things easier ? | 09:20 |
tlater | Yeah, I do think so, dispatching "cache" tasks would allow synchronizing cache sizes after each operation, instead of between processes | 09:20 |
juergbi | tristan: !380 makes sense to me. I'm wondering whether this might be a bit expensive for git sources | 09:21 |
tristan | *maybe* treating the main process as a sort of "cache size counter server" to tasks, and doing IPC from task -> parent would also work | 09:21 |
tristan | juergbi, it might be expensive for linux.git | 09:22 |
tristan | re the above, looking at program flow: Compare a bunch of child processes doing assemble, and all wanting the attention of the main process to periodically negotiate a cache size counter... | 09:22 |
juergbi | tlater: can't we do it in _assemble_done? | 09:22 |
juergbi | in the main process | 09:22 |
tristan | with a program flow where caching the artifacts is done in a separate queue | 09:23 |
tlater | juergbi: Wouldn't we have synchronization issues then, where some processes think the cache is larger than it is? | 09:23 |
tristan | juergbi, you want to hold up the main process with deletion of artifacts ? | 09:23 |
tristan | actually, that doesnt work I think | 09:23 |
juergbi | no, just pin the cache key for workspaces | 09:23 |
tristan | you need to delete artifacts from the cache to make space for new incoming artifacts | 09:23 |
tlater | Ah, I see what you mean | 09:23 |
tristan | ah, that part yes | 09:23 |
juergbi | ah, synchronization of the deletion is still an issue | 09:23 |
tlater | Yeah :) | 09:24 |
juergbi | the thing is, committing to ostree doesn't actually consume additional space | 09:24 |
juergbi | if the build result already exists | 09:24 |
tristan | tlater, it's tempting to go with "Add some IPC sugar, because I dont want to split up this big task into two tasks" | 09:24 |
tristan | tlater, I dont want us to be lazy that way; instead think of what is the cleaner program flow | 09:25 |
juergbi | so maybe keep committing in the build job but have a separate (serial) queue for deletion if necessary? | 09:25 |
tristan | if it really is cleaner to add some IPC (I feel like it's doubtful, though), then we go that route | 09:25 |
tristan | But "Lets fix this while modifying the least number of lines of code" is how we get into trouble | 09:26 |
juergbi | I don't see a point of splitting out committing itself | 09:26 |
tristan | juergbi, seeing your point | 09:26 |
tristan | juergbi, it's not going to take additional disk space to commit the artifact and delete the build directory; or is it ? | 09:27 |
tristan | juergbi, when we commit an artifact, does ostree retain hardlinks to the sources ? | 09:27 |
tristan | or can it grow ? | 09:27 |
juergbi | there will be a very short peak as we delete it after committing | 09:27 |
juergbi | no, ostree doesn't take hardlinks to keep things safe | 09:27 |
juergbi | iirc | 09:27 |
tristan | right, that makes sense, so technically, we bust our limits if we take that approach, and we are kind of lying about the quota | 09:28 |
juergbi | theoretically, we could commit and delete on a file by file basis | 09:28 |
* tlater doesn't like this for the case in which we're limited by disk space | 09:28 | |
juergbi | so the peak would just be a single file | 09:28 |
juergbi | longer term with FUSE we might actually be able to hardlink/rename if we control everything | 09:29 |
tristan | juergbi, rather, if we have a separate queue/task which deletes-things-if-necessary, what would be the difference in that, from just committing to cache in a separate queue ? | 09:29 |
tristan | seems like not much of a big difference | 09:29 |
juergbi | seems more complex to me | 09:30 |
juergbi | the deletion aspect would also be required for pull and with remote execution | 09:30 |
juergbi | but separate committing makes much less sense in general | 09:30 |
juergbi | regarding peak disk usage: if we don't hold off parallel build jobs, we can't control peak disk usage anyway | 09:31 |
juergbi | we'll always need some extra space for working | 09:31 |
juergbi | and we must keep the case in mind where multiple bst sessions are running with the same local cache | 09:32 |
juergbi | i.e., we can't control this all anyway | 09:32 |
tlater | juergbi: That's what the 2GB headroom is for - I don't think that's quite enough for artifacts using the tar cache, though | 09:32 |
juergbi | we can hopefully replace the tar cache, so let's ignore it for architecture questions | 09:33 |
tristan | tlater, the headroom was originally a consideration that the user *might* have sources on the same volume | 09:33 |
tristan | tlater, and because we wanted to have a sensible "all" default value | 09:33 |
tristan | A.) We know the maximum space an artifact can potentially consume in the cache... B.) We know the size of the cache... C.) We (ungracefully) consider the build directory, to *not* be the cache (it can in fact be on a separate volume, but that's slow) | 09:35 |
tristan | for what is the cache, we can respect the limit; except for this synchronization issue | 09:35 |
juergbi | that also assumes no multiple concurrent bst sessions | 09:36 |
juergbi | the whole pinning actually as well | 09:36 |
tristan | Yes I noticed that | 09:36 |
tlater | juergbi: We work around that slightly | 09:37 |
tristan | juergbi, to mitigate this, I asked that tlater mark every artifact that was going to be used, as used; immediately | 09:37 |
tristan | juergbi, lessening the chance of such a parallel build deletion situation | 09:37 |
juergbi | right | 09:37 |
tristan | in the end, we *still* have to gracefully handle the out-of-space failures | 09:37 |
juergbi | yes, and also missing artifacts | 09:38 |
juergbi | (in case the quota is too small for the session) | 09:38 |
tlater | juergbi: Currently that's considered a failed build, although it will remove any artifacts not required in the pipeline. | 09:39 |
tlater | (Hard to work around that latter bit, but this should be a very rare case) | 09:39 |
juergbi | yes, failing is fine but we might need a suitable hint in the error message | 09:40 |
tlater | I do have a specific error message for that, could take some rewording | 09:40 |
juergbi | just saying we can't completely ignore it and completely crash and burn | 09:40 |
juergbi | good | 09:40 |
tristan | right, that is the graceful part :) | 09:40 |
gitlab-br-bot | buildstream: merge request (tristan/fix-shell-worksace-files->master: Fix shelling into a failed build sandbox with workspaces active) #380 changed state ("opened"): https://gitlab.com/BuildStream/buildstream/merge_requests/380 | 09:43 |
gitlab-br-bot | buildstream: issue #346 ("Fails to open build shell after build failure") changed state ("closed") https://gitlab.com/BuildStream/buildstream/issues/346 | 09:54 |
gitlab-br-bot | buildstream: merge request (tristan/fix-shell-worksace-files->master: Fix shelling into a failed build sandbox with workspaces active) #380 changed state ("merged"): https://gitlab.com/BuildStream/buildstream/merge_requests/380 | 09:54 |
tlater | Hm, so, I think I like the idea of a deletion queue. We could hold off on committing a handful of elements' artifacts and queue a "please make this much space in the artifact cache" request. | 09:56 |
tlater | That would reduce the number of times we need to calculate the cache size | 09:56 |
tlater | As well as help reduce the number of calls to ostree.repo.prune | 09:57 |
tlater | Plus as juergbi says, this is more flexible than trying to do this while we're deleting, it could be reused very easily. | 09:58 |
tlater | Some synchronization to figure out how much space we want to reserve would still need to happen for the build queue | 09:58 |
tlater | But the major problem of trying to figure out how much space there is in the cache would be a lot easier | 09:59 |
tlater | Thoughts? | 09:59 |
tlater | I think the deletion queue wouldn't need any concurrency, actually | 09:59 |
tlater | Since we'd just need to figure out what artifacts we want to delete, and do them all at once. | 10:00 |
*** aday has joined #buildstream | 10:02 | |
tristan | tlater, Sure, I dont mind so much except that it seems that it *could* respect the limit, while this approach is a bit more fuzzy on that | 10:06 |
tristan | tlater, but I dont mind so much, it will be more performant this way anyway | 10:07 |
tristan | tlater, one thing is, we probably wont be treating this as a regular "Queue", so we may need other ways to launch a Job | 10:07 |
tristan | that will probably change the architecture somewhat, interested to see what you have in mind... | 10:08 |
tristan | tlater, maybe it can be a regular queue with a special attribute such that it is not displayed in any status parts of the UI | 10:14 |
tristan | i.e. it makes no sense to see "You skipped the deletion of 200 elements" reported, or to see the queue in the status area in interactive mode | 10:15 |
tristan | tlater, also the log lines are weird in this case, they will automatically print the element in context being processed in that queue, but that element is not getting deleted | 10:16 |
tlater | Hm, right. I'm wondering if this actually is a queue, in that sense. | 10:17 |
tristan | maybe it will print "Making room in the cache", and then a message "detail" listing the artifacts it's deleting | 10:17 |
*** dominic has joined #buildstream | 10:18 | |
tristan | yeah that's what I mean, we could hijack the queue; but we'd have to special case the UI | 10:18 |
tristan | tlater, on the other hand, there is already some good cases for leveraging the scheduler without queues | 10:18 |
tlater | Really, the deletion method doesn't need to know anything about the elements it wants to make space for, except the total maximum size of their artifacts. | 10:18 |
tristan | there are some places we've wanted that, I think initialization is slow because we dont even interrogate source consistency state in parallel | 10:18 |
tristan | tlater, so having a way to spawn parallel tasks with the scheduler, without them being "elements in queues" is already a thing we can use | 10:19 |
tristan | if we do it, it should be streamlined in such a way that there is no code duplication between Queues and NewThing | 10:20 |
tristan | probably Queues uses the NewThing, to handle job completion and such parts | 10:20 |
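To make the pinning and batch-deletion idea above concrete, here is a rough sketch of a make-room step for an LRU expiry job. All names are invented for illustration; the real cache is an OSTree repository, so actual deletion would go through OSTree pruning rather than removing directories.

```python
import shutil

def make_room(artifacts_by_mtime, pinned, space_needed):
    """Delete least-recently-used artifacts until space_needed bytes are freed.

    artifacts_by_mtime: list of (path, size) tuples, oldest first.
    pinned: set of artifact paths the current session will use and which
    must never be expired.
    """
    freed = 0
    removed = []
    for path, size in artifacts_by_mtime:
        if freed >= space_needed:
            break
        if path in pinned:
            continue  # never delete artifacts the running build depends on
        shutil.rmtree(path)
        freed += size
        removed.append(path)
    if freed < space_needed:
        raise RuntimeError("artifact cache quota too small for this session")
    return removed
```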
tristan | ok, time to refactor artifact metadata | 10:33 |
tristan | lets split this all into separate files | 10:34 |
tristan | juergbi, oddly, your comment here https://gitlab.com/BuildStream/buildstream/issues/294#note_62585246 talks about loading meta/artifact.yml ... but Element._get_artifact_metadata() is *only* called at staging time | 10:35 |
tristan | maybe there is some justification for reading the meta/artifact.yaml via a separate code path, dunno... | 10:36 |
* tristan hunts it down and gets to work... | 10:36 | |
juergbi | tristan: _update_state() | 10:37 |
juergbi | # Load the strong cache key from the artifact | 10:37 |
juergbi | metadir = os.path.join(self.__extract(), 'meta') | 10:37 |
juergbi | meta = _yaml.load(os.path.join(metadir, 'artifact.yaml')) | 10:37 |
juergbi | self.__cache_key = meta['keys']['strong'] | 10:37 |
juergbi | should probably have used the helper | 10:37 |
juergbi | maybe it didn't exist yet | 10:37 |
tristan | gah | 10:56 |
tristan | looks like it's manually extracted all over the place | 10:56 |
tristan | I should note too, that the artifact.yaml is getting parsed *every time* | 11:05 |
tlater | tristan: The way I imagined this was that the build queue would be paused until there's enough space for a set of elements - this doesn't look particularly possible with the scheduler atm | 11:05 |
tlater | ^Re the concurrent deletion thing | 11:05 |
tristan | tlater, see Queue.status() | 11:07 |
tristan | tlater, it can return QueueStatus.WAIT | 11:07 |
tristan | tlater, I'm not saying that this can be done without a global revisiting of how scheduling is done, and probably some minor changes | 11:08 |
tristan | in any case, ability to run jobs outside of the context of a Queue is already a big change | 11:08 |
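As a sketch of the QueueStatus.WAIT idea, the gating check might look roughly like this; the enum mirrors the name mentioned above, while the element and cache attributes are purely hypothetical stand-ins for whatever bookkeeping the real build queue would keep.

```python
import enum

class QueueStatus(enum.Enum):
    READY = 1
    WAIT = 2

def build_queue_status(element, artifact_cache, quota_bytes):
    """Hold an element back until caching its artifact would fit the quota.

    element.estimated_artifact_size and artifact_cache.estimated_size are
    invented attributes, used only for this illustration.
    """
    projected = artifact_cache.estimated_size + element.estimated_artifact_size
    if projected > quota_bytes:
        # The scheduler keeps polling; once a deletion/expiry job has freed
        # enough space, a later status check returns READY.
        return QueueStatus.WAIT
    return QueueStatus.READY
```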
tristan | juergbi, every cached artifact has both a weak and a strong key encoded right ? | 11:18 |
juergbi | yes | 11:18 |
juergbi | (they may be identical, though) | 11:19 |
tristan | sure | 11:19 |
tristan | juergbi, think it's at all possible that that codepath in _update_state() is getting called multiple times recursively ? | 11:20 |
tristan | still, I'm adding Element.__metadata_cache hashtable | 11:20 |
tristan | just wondering if that might be enough to fix performance | 11:21 |
tristan | loading metadata only once per artifact would be a good start | 11:21 |
tristan | Eh, splitting up the files is still a good idea, but I'll start with cache as first step, and follow that pattern while splitting | 11:22 |
juergbi | that part in _update_state() should never be invoked multiple times for the same element | 11:23 |
juergbi | as it sets the cache key and will never unset it | 11:23 |
juergbi | (at least in a single process) | 11:24 |
juergbi | so I don't think a cache helps here but it might make sense in general | 11:24 |
tristan | yeah there are other workspace bits which ask for the metadata | 11:25 |
tristan | so it will still help | 11:25 |
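A minimal sketch of the per-artifact metadata cache being added here: parsed metadata is memoized per cache key so meta/artifact.yaml is read at most once. The class and its layout are invented, and the YAML loader is injected rather than assuming BuildStream's internal _yaml module.

```python
import os

class MetadataCache:
    """Parse an artifact's meta/artifact.yaml at most once per cache key."""

    def __init__(self, yaml_loader):
        self._load = yaml_loader
        self._cache = {}  # cache key -> parsed metadata dict

    def metadata(self, key, extract_dir):
        # extract_dir is assumed to contain the artifact's 'meta' subdirectory
        if key not in self._cache:
            meta_path = os.path.join(extract_dir, 'meta', 'artifact.yaml')
            self._cache[key] = self._load(meta_path)
        return self._cache[key]
```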
tristan | tlater, ... | 11:36 |
tristan | tlater, seeing as you were involved in the incremental builds stuff, and are now doing local LRU expiry... I should make sure... | 11:36 |
tristan | tlater, ... I should just double check and make sure you are not going to break this: https://gitlab.com/BuildStream/buildstream/blob/master/buildstream/element.py#L501 | 11:38 |
* tlater will check in a minute, is on the go but phone doesn't have enough ram to render files on gitlab | 11:40 | |
tristan | tlater, incremental builds in workspaces expect that you can load the last successful build metadata from the artifact cache at stage time | 11:41 |
tristan | tlater, so when that metadata suddenly disappears, it's important to not break. | 11:41 |
tristan | tlater, I'm more than half expecting an exclamation on your part, please feel free to use foul language :) | 11:42 |
*** marcledilson has joined #buildstream | 11:43 | |
tlater | Oh | 11:55 |
* tlater curses internally, easier to avoid doing so on IRC x) | 11:55 | |
tristan | tlater, I dont know, maybe you just need to wipe that incremental stuff to some kind of "clean slate" once the last_successful goes missing | 11:57 |
tlater | Hrm | 11:57 |
tlater | That would involve knowing what the build system changed and what was changed by the user | 11:57 |
tristan | I dont know that it would, it would rather involve ensuring that everything staged appears to be *new*, wouldnt it ? | 11:58 |
tlater | Oh, a new mtime on all files? | 11:58 |
tlater | Yeah, that works | 11:58 |
tristan | I havent followed what you've been doing closely, but seems that way | 11:59 |
tlater | ta, I'll make sure the expiry branch doesn't break here | 11:59 |
tlater | (In fact, I guess if someone manually deleted artifacts this would already break) | 12:00 |
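A rough sketch of the "clean slate" fallback discussed above: if the last successful build's artifact has been expired, give every staged file a fresh mtime so the build system treats the whole tree as new. The helper name is made up, and symlinks are simply skipped for brevity.

```python
import os
import time

def mark_staged_files_as_new(staged_dir):
    """Give every staged file a fresh mtime.

    With no 'last successful build' artifact to compare against, the safest
    fallback for incremental workspace builds is to make the whole staged
    tree look newly modified and let the build system rebuild everything.
    """
    now = time.time()
    for root, dirs, files in os.walk(staged_dir):
        for name in files:
            path = os.path.join(root, name)
            if not os.path.islink(path):  # leave symlinks alone
                os.utime(path, (now, now))
```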
* tristan really has to complete the privateness refactor | 12:00 | |
tristan | Element is still in a total mess, with plenty of really private things exposed with only a single underscore | 12:00 |
tristan | For this patch, I'm following current style and not paying heed to private policy | 12:01 |
tristan | better to fix that separately | 12:01 |
tlater | tristan: On that actually, I don't quite think the hacking file is clear enough on that | 12:03 |
* tlater at least doesn't quite understand where we make a distinction between "local private" and "API private" | 12:03 | |
tlater | At least before the refactor I assumed we only marked API private specially, but the wording now seems a bit confusing | 12:04 |
tristan | tlater, what don't you understand ? | 12:05 |
tristan | tlater, API Private is pretty clear, it means not exposed in public API | 12:06 |
tristan | tlater, local private means that it is not used outside of the current *scope* | 12:06 |
tlater | Do we mark *both* of those with a leading underscore? | 12:06 |
tristan | tlater, I'll be happy accept patches to the HACKING file | 12:06 |
tristan | Yes we do | 12:06 |
tristan | tlater, with double underscore in the few cases where there is ambiguity | 12:07 |
tlater | Ok, that makes sense then :) | 12:07 |
tristan | there is ambiguity when a class is public, and has internal things which are *also* local private (need 2 underscores) | 12:08 |
tlater | Annoyingly, that makes those variables unusable in subclasses | 12:08 |
tlater | So something like element.py might struggle using them | 12:08 |
tristan | things which are API private but not local private need a single underscore (like Element._assemble_done() for instance, which needs to be called from BuildQueue, but is not public API) | 12:08 |
tristan | tlater, that is the *whole point* | 12:09 |
tlater | Does an implementing class not count as local scope? | 12:09 |
tristan | tlater, it is not local private if it is accessed from somewhere else, like in another file, or a subclass | 12:09 |
tristan | No, "scope" is something pretty clearly defined :) | 12:09 |
tristan | it doesnt include fancy shmancy thingamabobs that Java, C++ and such languages add, this is just python :) | 12:10 |
* tlater thought in object scopes, as the subclass will actually be operating in a copy of its parent's scope, but doesn't further question this :) | 12:10 | |
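A small example of the convention being described, with invented attribute names: single-underscore (API private) members stay reachable from other BuildStream files and from subclasses, while double-underscore (local private) members get name-mangled and so are effectively off limits to subclasses.

```python
class Element:
    def __init__(self):
        self._cache_key = None    # API private: reachable from other
                                  # BuildStream modules and from plugins
        self.__artifacts = {}     # local private: mangled to
                                  # _Element__artifacts

class MyPlugin(Element):
    def peek(self):
        print(self._cache_key)    # fine: API-private members are usable here
        print(self.__artifacts)   # AttributeError: the lookup is mangled to
                                  # _MyPlugin__artifacts, so Element's
                                  # local-private member is unreachable
```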
tristan | juergbi, anything else you'd like to do while reving the artifact version ? | 12:36 |
tristan | juergbi, I'm going to do a little bit more cleanup, at least how cache keys are calculated for sources is right now a bit weird, worth changing if there is a rev bump | 12:37 |
* tristan thinks out loud that at this time... he would really love to see a *graph* of the performance of loading pipelines in non-strict mode | 12:47 | |
tlater | Gah, I've been designing something that relies on knowing how large ostree artifacts are :| | 12:54 |
tlater | o\ | 12:54 |
skullman | bah, worked out how to retry a pipeline, but it doesn't automatically merge after a retry | 12:56 |
skullman | https://gitlab.com/BuildStream/buildstream/merge_requests/375 will need someone with elevated permissions to prod it | 12:56 |
tristan | skullman, it looks like we're hitting some kind of weird gitlab bug :-S | 12:59 |
tristan | it's telling me I set it to be merged | 12:59 |
tristan | and the pipeline passed | 12:59 |
tristan | but nothing to merge it | 12:59 |
tristan | skullman, I take it you've rebased and it now passes ? | 12:59 |
tristan | ok wait, I've canceled automatic merge, and now it tells me I need a rebase again | 13:00 |
tristan | skullman, note that I rebased in the UI the other day, and that's when it failed, I suspect you re-pushed this without local rebase maybe ? | 13:00 |
tristan | try rebase it locally and repush that | 13:01 |
skullman | I didn't rebase or push. I fetched the rebase you did via the UI and tested that locally and it worked, so got it to retry. | 13:01 |
* skullman will rebase | 13:01 | |
*** xjuan has joined #buildstream | 13:02 | |
tristan | strange | 13:04 |
tristan | still a weird gitlab bug occurred :) | 13:04 |
tristan | should I try UI rebase ? | 13:14 |
skullman | I'm currently running tests locally on a manual rebase. | 13:15 |
skullman | I don't expect it to fail. | 13:16 |
skullman | I'm fine with either waiting for a local pass, or not. | 13:16 |
gitlab-br-bot | buildstream: merge request (tristan/restructure-artifact-metadata->master: Restructure artifact metadata) #381 changed state ("opened"): https://gitlab.com/BuildStream/buildstream/merge_requests/381 | 13:17 |
tristan | juergbi, want to take a look at #381 ? | 13:18 |
tristan | juergbi, I want to merge that and ask mcatanzaro to observe performance after... | 13:19 |
tristan | maybe that's bad, but the split is generally a good strategy I think | 13:19 |
*** jonathanmaw has quit IRC | 13:23 | |
*** jonathanmaw has joined #buildstream | 13:32 | |
gitlab-br-bot | buildstream: merge request (issue-21_Caching_build_trees->master: WIP: Issue 21 caching build trees) #372 changed state ("opened"): https://gitlab.com/BuildStream/buildstream/merge_requests/372 | 13:34 |
juergbi | tristan: taking a look | 13:35 |
*** bethw has joined #buildstream | 13:35 | |
*** marcledilson has quit IRC | 13:41 | |
*** Prince781 has joined #buildstream | 13:48 | |
* tristan leaves closing coffee shop... | 13:55 | |
tristan | will appear in a moment | 13:55 |
skullman | >:( there are non-deterministic test failures in tests/frontend/push.py and tests/integration/stack.py | 13:56 |
tlater | :| | 13:56 |
skullman | tests pass after a rebase with master, just not always | 13:56 |
tlater | Do you have a CI run for them? I've seen some such failures, though not recently. | 13:57 |
gitlab-br-bot | buildstream: merge request (jjardon/update_depencies->master: docs: Update list of dependecies) #373 changed state ("opened"): https://gitlab.com/BuildStream/buildstream/merge_requests/373 | 13:58 |
skullman | tlater: https://gitlab.com/BuildStream/buildstream/-/jobs/62223526 contains a stack.py fail | 13:58 |
*** tristan has quit IRC | 13:58 | |
jjardon | Can someone review https://gitlab.com/BuildStream/buildstream/merge_requests/373/ and the docker counterpart of this https://gitlab.com/BuildStream/buildstream-docker-images/merge_requests/32 , please? | 13:59 |
tlater | Hm, that's not an integration test, so unlikely to be a problem with cleanup | 13:59 |
skullman | tlater: https://paste.baserock.org/reyobuwuwe for logs of another failed test_stack, https://paste.baserock.org/uqojayotof for a test_push | 14:00 |
*** tristan has joined #buildstream | 14:01 | |
*** bethw has quit IRC | 14:02 | |
*** bethw has joined #buildstream | 14:02 | |
tlater | skullman: Sorry, I've only seen the last issue before, everything else you probably have better context on than me currently | 14:02 |
tlater | As for that last one, the only time I've seen it occur was when I broke cache keys for pushed artifacts | 14:03 |
tlater | But I doubt that's the issue here, considering the previous two traces | 14:03 |
tlater | The ruamel exceptions are red herrings, btw, I think they pop up with our specific combination of pylint+ruamel in the debian-8 image | 14:04 |
* tlater has tried to fix them before, but couldn't find a working combination | 14:04 | |
tristan | I have a regular "hang" in test_push_pull when run locally, but seems to never happen in CI | 14:04 |
tristan | happens maybe once out of 20 | 14:04 |
tristan | or less | 14:04 |
tristan | but still often enough that I notice | 14:05 |
tristan | juergbi, still not sure eh :) | 14:06 |
juergbi | I actually wanted to test it locally but that's not so quick, so let's just review it | 14:07 |
tristan | Are you able to easily test that ? | 14:07 |
tristan | juergbi, if you want, I'll hold off for a few hours | 14:07 |
tristan | did you start a build ? | 14:08 |
tristan | it will take a few hours | 14:08 |
tristan | I guess | 14:08 |
juergbi | I was considering it but noticed that I first have to rebase some other stuff, so maybe not a good idea right now | 14:09 |
tristan | Ok, just review, lets get it in and see about performance after | 14:09 |
tristan | it should help at least | 14:09 |
* tristan will also run a GNOME build overnight | 14:10 | |
gitlab-br-bot | buildstream: merge request (jjardon/update_depencies->master: docs: Update list of dependecies) #373 changed state ("merged"): https://gitlab.com/BuildStream/buildstream/merge_requests/373 | 14:10 |
gitlab-br-bot | buildstream: merge request (tristan/restructure-artifact-metadata->master: Restructure artifact metadata) #381 changed state ("opened"): https://gitlab.com/BuildStream/buildstream/merge_requests/381 | 14:11 |
skullman | :/ the test_stack failure from my local build is that umount is failing because there's nothing there. The one in CI is from a mount directory already being there. In CI I can believe that could be a scheduling problem of two runners running the same test on the same system at the same time. | 14:11 |
skullman | On the local failure it implies that there was some code expecting to not perform cleanup in some non-deterministic circumstances. | 14:12 |
tristan | the CI runners should be running at least in completely separate docker containers if they run on the same system | 14:13 |
tristan | for local failure, it does sound likely that your fix may have uncovered a bug that was silently ignored | 14:14 |
skullman | concurrency failures make me want to quit programming and work in a shop or something | 14:15 |
tristan | wait a sec | 14:15 |
tristan | oh no, if you hit the terminator codepath, there should be no opportunity to land in the finally: clause | 14:16 |
tlater | Don't quit *just* yet, skullman ;p | 14:16 |
tristan | hah, I always quite liked concurrency synchronization bugs | 14:16 |
tristan | it's ref count imbalances which drive me nuts :) | 14:16 |
* skullman wishes mount returned a mount ID and umount required it, so you'd at least get more obvious double-umount errors | 14:17 | |
juergbi | tristan: maybe I'm missing something but isn't the key return value of __extract() essentially unused? | 14:18 |
juergbi | it's either ignored or the same as what's passed as input key | 14:18 |
juergbi | ah, not in the None case | 14:18 |
juergbi | nm | 14:19 |
tristan | juergbi, yeah I was considering splitting that part of __extract() out | 14:19 |
tristan | juergbi, but there is so much noise already of "get the key" functions for now, that I wanted to just let it be | 14:20 |
juergbi | the whole keyed caching makes things not as trivial as expected | 14:20 |
tristan | well, I wanted to cache under both keys | 14:20 |
tristan | mostly, there is a lot of boiler plate, indeed | 14:20 |
tristan | to have nice separate functions with different names which return practical things | 14:21 |
tristan | skullman, I wonder if your try/finally blocks should be *within* the terminator context managers | 14:22 |
tristan | skullman, also, looking at https://gitlab.com/BuildStream/buildstream/merge_requests/375/diffs#9bff43db2e8aefc8dbfb4c1068b99f443647bc0d_146_145 ... | 14:22 |
juergbi | right. I think the changes are fine | 14:22 |
tristan | skullman, on that line it looks like if mount() fails, you anyway do a finally: umount() | 14:23 |
tristan | skullman, shortening the "try" blocks to be more specific may be a good idea | 14:23 |
skullman | tristan: that umount will need to be done whether that mount happens or not, since the mount is outside the try block, and that one just changes the flags. | 14:24 |
tristan | skullman, this one ? https://gitlab.com/BuildStream/buildstream/blob/621d0d85eede4ce824be4f35ecf15df516535ff7/buildstream/sandbox/_mounter.py#L145 | 14:27 |
* tristan links to file instead of MR | 14:27 | |
skullman | tristan: my rationale for putting the context managers inside the try-finally rather than the other way around was that I could at least reason about what happens if it gets terminated after the context ends but before the finally (it won't get umounted), whereas having the terminator run an umount after the umount already happened could unmount something underneath what was stacked | 14:27 |
tristan | skullman, by the looks of it, both _mount and _umount do things, and can raise an exception | 14:27 |
skullman | tristan: yeah, that's the mount call I meant | 14:27 |
tristan | if that one fails, you probably dont want to call umount | 14:28 |
tristan | you probably just want to reraise the exception no ? | 14:28 |
tristan | the terminator logic can be daunting indeed, I think when we use it for launching processes, we use it outside the block, and we only terminate the process if the process pointer has been assigned | 14:29 |
tristan | but that's indeed not helpful without an id/handle representing the mount | 14:30 |
skullman | I don't think I follow. I'd expect with `bind_mount()` that any mount it made would be unmounted on context exit. Since a bind-mount is created before the yield in the try-finally, we need to try after mounting and umount in the finally. | 14:30 |
skullman | If it fails to make the bind-mount's propagation rslave, then I'd want to roll back the context before yielding, so it'll umount and then reraise. | 14:31 |
tristan | ahhhh | 14:32 |
*** xjuan_ has joined #buildstream | 14:32 | |
tristan | skullman, I missed the part that the second call to _mount() was only modifying a property of the mount, sorry | 14:33 |
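For illustration, here is the mount/umount ordering skullman describes, reduced to plain mount(8) calls via subprocess. The real code goes through the _mounter.py helpers and the terminator context manager, which are omitted here.

```python
import subprocess
from contextlib import contextmanager

@contextmanager
def bind_mount(src, dest):
    """Simplified illustration of the ordering discussed above."""
    # If this first mount fails, nothing was mounted, so there is nothing
    # to clean up and the exception simply propagates.
    subprocess.run(['mount', '--bind', src, dest], check=True)
    try:
        # This second call only changes propagation flags on the existing
        # mount; if it fails, the umount in the finally block still runs.
        subprocess.run(['mount', '--make-rslave', dest], check=True)
        yield dest
    finally:
        subprocess.run(['umount', dest], check=True)
```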
gitlab-br-bot | buildstream: merge request (tristan/restructure-artifact-metadata->master: Restructure artifact metadata) #381 changed state ("merged"): https://gitlab.com/BuildStream/buildstream/merge_requests/381 | 14:34 |
*** xjuan has quit IRC | 14:35 | |
tristan | skullman, I'm not sure if this is helpful for debugging but maybe (and would be good anyway)... | 14:39 |
tristan | if we went into _exceptions.py and made SandboxError() support the newer `detail` kwarg which is available in the BstError base class... | 14:40 |
tristan | and then if we took the stdout/stderr from the mount calls and put them in that detail arg... | 14:40 |
tristan | we might see the errors better than just "mount returned exit status 255" or such | 14:40 |
tristan | really unsure if that's helpful here though :-S | 14:41 |
skullman | right now I'm just trying to get it to happen again to rule out neutrino strike | 14:42 |
skullman | it may be useful if it turns out we've hit some obscure corner case where mount can't actually do anything, but it's most likely that the mount doesn't exist | 14:42 |
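As a sketch of the detail-kwarg idea above (generic Python, not BuildStream's actual _exceptions.py), capturing the mount command's output and attaching it to the error could look like this:

```python
import subprocess

class SandboxError(Exception):
    """Toy stand-in for an error type carrying an optional 'detail' blob."""

    def __init__(self, message, detail=None):
        super().__init__(message)
        self.detail = detail

def run_mount(argv):
    # Capture stdout/stderr so a failure reports more than just the exit code
    proc = subprocess.run(argv, capture_output=True, text=True)
    if proc.returncode != 0:
        raise SandboxError(
            "{} returned exit status {}".format(argv[0], proc.returncode),
            detail=proc.stdout + proc.stderr)
    return proc
```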
*** Prince781 has quit IRC | 15:06 | |
*** Prince781 has joined #buildstream | 15:08 | |
*** xjuan_ has quit IRC | 15:13 | |
skullman | ok, so test_push fails reliably on my system since rebasing, but if I roll back my change it still happens, so could be something about my environment | 15:24 |
skullman | going to try and get test_stack to happen | 15:24 |
*** Prince781 has quit IRC | 15:31 | |
tlater | Hm, I'm struggling to understand the exit condition for the scheduler | 15:42 |
tlater | This function here suggests that it exits once that while loop finishes: https://gitlab.com/BuildStream/buildstream/blob/master/buildstream/_scheduler/scheduler.py#L318 | 15:42 |
tlater | But it only conditionally stops the loop | 15:43 |
tlater | Yet there's nothing to suggest that there's any other condition to stop if that doesn't happen: https://gitlab.com/BuildStream/buildstream/blob/master/buildstream/_scheduler/scheduler.py#L125 | 15:43 |
tlater | (Outside of the user ^C-ing out) | 15:43 |
tlater | So as far as I understand, once buildstream runs out of processes to move between queues (which it will eventually, if they are just stuck building for example) it will keep running forever - but that's not right. Could someone point me to the real exit condition? | 15:45 |
*** noisecell has quit IRC | 15:46 | |
juergbi | tlater: check Scheduler.sched() | 15:48 |
juergbi | it exits if no queues have active jobs | 15:49 |
juergbi | 'ticking' | 15:49 |
tlater | juergbi: That function is only run once though | 15:49 |
tlater | It only checks exactly once whether there's still something ticking | 15:49 |
juergbi | no, Queue.job_done() calls that as well | 15:49 |
tlater | Ah | 15:49 |
*** noisecell has joined #buildstream | 15:49 | |
tlater | ta juergbi, I overlooked that | 15:50 |
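A condensed illustration of the exit condition juergbi points at: sched() is invoked both at startup and from job_done(), and the event loop stops once no queue has active jobs. The names loosely echo scheduler.py and queue.py, but the bodies here are invented.

```python
class MiniScheduler:
    def __init__(self, queues, loop):
        self.queues = queues
        self.loop = loop

    def sched(self):
        # Try to move elements between queues and start new jobs ...
        for queue in self.queues:
            queue.process_ready()
        # ... then stop the event loop once nothing is ticking anywhere.
        if not any(queue.active_jobs for queue in self.queues):
            self.loop.stop()

    def job_done(self, queue, job):
        queue.active_jobs.remove(job)
        # Every completed job re-runs sched(), so the "no active jobs"
        # check is re-evaluated each time, not just once at startup.
        self.sched()
```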
tristan | looks like gnome-build-meta master does already depend on freedesktop-sdk... progress \o/ | 16:03 |
tlater | :O | 16:05 |
tlater | Looks like we can soon change the base.bst file for our tests :) | 16:05 |
tristan | I wouldnt be in a hurry - I'd honestly prefer that the base.bst file depend on a published BaseSdk, once it's honed down to a nice and tiny size | 16:07 |
tristan | and not inter-project depend for that purpose | 16:07 |
tristan | but yeah, it's coming :) | 16:07 |
tristan | w00t, my desktop has slowed down to a nice and reasonably slow speed | 16:08 |
tristan | a speed that you would expect under "regular circumstances", i.e. like when you are normally compiling gcc, glibc, webkit and spidermonkey at -j8 each in the background | 16:09 |
tristan | ahhh, life is back to normal :) | 16:09 |
*** noisecell has quit IRC | 16:11 | |
*** toscalix has quit IRC | 16:17 | |
*** bethw has quit IRC | 16:18 | |
bochecha | tristan: what kind of a monster machine do you have? :o | 16:25 |
tristan | bochecha, pretty strong one, 4 dual thread cores up to 3.4GHz and plenty of ssd :) | 16:29 |
tlater | tristan: Is that in a laptop? | 16:30 |
*** xjuan_ has joined #buildstream | 16:30 | |
tristan | I find that -j8 on 4 parallel builds is a decent balance, because most of the time 2 or 3 builds are doing ./configure | 16:30 |
tristan | would be good to get a job server and some standard across build systems, though | 16:31 |
tristan | tlater, yep | 16:32 |
* tlater is impressed | 16:32 | |
tlater | I just run my gentoo builds on my PC, which does things fine, but it's a pain when I'm not in the UK | 16:33 |
tristan | tlater, I was hoping that I could get at least 75% power out of a thin quad core thinkpad yoga this year (plugged in), but two things happened: A.) heat dissipation problem, meaning more throttling... B.) meltdown (so who wants to buy cores this year ?) | 16:34 |
*** jonathanmaw has quit IRC | 16:34 | |
gitlab-br-bot | buildstream: merge request (issue-21_Caching_build_trees->master: WIP: Issue 21 caching build trees) #372 changed state ("opened"): https://gitlab.com/BuildStream/buildstream/merge_requests/372 | 16:36 |
jjardon | Hi, is there someone around to review https://gitlab.com/BuildStream/buildstream-docker-images/merge_requests/33 ? | 16:37 |
tlater | jjardon: I can take a peek if you like | 16:37 |
jjardon | tlater: thanks! | 16:37 |
tlater | jjardon: Why do we need these calls now? https://gitlab.com/BuildStream/buildstream-docker-images/merge_requests/33/diffs#587d266bb27a4dc3022bbed44dfa19849df3044c_69_73 | 16:41 |
tlater | In fact, I don't think we want a latest tag for the testsuite images | 16:41 |
jjardon | tlater: not strictly needed, but it's convenient for some things | 16:42 |
jjardon | tlater: see latest commit, to use docker caching mechanism | 16:43 |
tlater | Oh, whoops, not sure why I thought it was all one commit | 16:43 |
tlater | Ah, I see what you mean... | 16:43 |
tlater | Hm | 16:43 |
* tlater was under the impression that we stored our old images on the runner | 16:44 | |
tlater | So that technically wouldn't be required | 16:44 |
*** Prince781 has joined #buildstream | 16:44 | |
jjardon | we dont | 16:44 |
tlater | It also seems to cut build times in half, so yeah, I don't see a reason not to | 16:45 |
jjardon | it's in the gitlab cache but that's: 1. slow, 2. not guaranteed to always be there | 16:45 |
tlater | Yeah, I agree, this is probably better | 16:45 |
tlater | And then we do need a latest tag for the testsuite images to support this | 16:46 |
* tlater wanted those anyway because it makes scripting stuff for dev use easier | 16:46 | |
tlater | Approved! I really like the changes jjardon, will make things much neater in the future, tyvm :) | 16:47 |
jjardon | np! thanks for the quick review tlater ! | 16:47 |
jjardon | tlater: would you mind reviewing https://gitlab.com/BuildStream/buildstream-docker-images/merge_requests/32 as well (it's quite trivial, this one) | 16:48 |
tlater | Less comfortable with that one because it's not just technical, but I'll have a look | 16:49 |
tlater | jjardon: One comment, otherwise also fine with this one, actually | 16:56 |
tlater | Maybe the whole thing can also be squashed into a single commit | 16:57 |
jjardon | tlater: sure, I have pushed a new version to make it more explicit | 17:00 |
tlater | jjardon: If you squash the commits I'll approve this one too, don't think there's much value in exactly explaining every split here :) | 17:00 |
jjardon | sure, done | 17:02 |
jjardon | tlater: ^ | 17:02 |
*** dominic has quit IRC | 17:03 | |
tlater | Awesome, approved now | 17:03 |
*** noisecell has joined #buildstream | 17:06 | |
jjardon[m] | thanks! | 17:09 |
*** bethw has joined #buildstream | 17:42 | |
*** Prince781 has quit IRC | 18:15 | |
tristan | hmmm, status area even does not display full element names | 18:45 |
tristan | ah, I may be wrong but that might have been worth changing along with today's cache key break, I think removing that normal name nonsense also implied a cache key break :-S | 18:46 |
tristan | missed opportunity, oh well | 18:47 |
jjardon | tristan: https://www.phoronix.com/scan.php?page=news_item&px=RHEL-8-No-Python-2 | 18:51 |
jjardon | tristan: is RHEL serious enough? :) | 18:52 |
*** valentind has quit IRC | 19:15 | |
*** valentind has joined #buildstream | 19:15 | |
*** bethw has quit IRC | 19:26 | |
*** bethw has joined #buildstream | 19:50 | |
*** bethwhite_ has joined #buildstream | 20:19 | |
*** cs_shadow_ has joined #buildstream | 20:20 | |
*** jjardon_ has joined #buildstream | 20:21 | |
*** bethw has quit IRC | 20:21 | |
*** hergertme has quit IRC | 20:21 | |
*** cs_shadow has quit IRC | 20:21 | |
*** jjardon has quit IRC | 20:21 | |
*** benbrown has quit IRC | 20:21 | |
*** laurence has quit IRC | 20:21 | |
*** awacheux[m] has quit IRC | 20:21 | |
*** abderrahim[m] has quit IRC | 20:21 | |
*** juergbi has quit IRC | 20:21 | |
*** waltervargas[m] has quit IRC | 20:21 | |
*** csoriano has quit IRC | 20:21 | |
*** ernestask[m] has quit IRC | 20:21 | |
*** ssssam[m] has quit IRC | 20:21 | |
*** segfault3[m] has quit IRC | 20:21 | |
*** theawless[m] has quit IRC | 20:21 | |
*** cgmcintyre[m] has quit IRC | 20:21 | |
*** pro[m] has quit IRC | 20:21 | |
*** connorshea[m] has quit IRC | 20:21 | |
*** m_22[m] has quit IRC | 20:21 | |
*** alatiera has quit IRC | 20:21 | |
*** rafaelff[m] has quit IRC | 20:21 | |
*** jjardon[m] has quit IRC | 20:21 | |
*** bochecha has quit IRC | 20:21 | |
*** kailueke[m] has quit IRC | 20:21 | |
*** ptomato[m] has quit IRC | 20:21 | |
*** asingh_[m] has quit IRC | 20:21 | |
*** inigomartinez has quit IRC | 20:21 | |
*** mattiasb has quit IRC | 20:21 | |
*** cs_shadow_ is now known as cs_shadow | 20:21 | |
*** jjardon_ is now known as jjardon | 20:21 | |
*** csoriano has joined #buildstream | 20:22 | |
*** benbrown has joined #buildstream | 20:24 | |
*** laurence has joined #buildstream | 20:27 | |
*** juergbi has joined #buildstream | 20:27 | |
*** hergertme has joined #buildstream | 20:27 | |
*** inigomartinez has joined #buildstream | 20:29 | |
*** bethwhite_ has quit IRC | 20:33 | |
*** mattiasb has joined #buildstream | 20:37 | |
*** ptomato[m] has joined #buildstream | 20:42 | |
gitlab-br-bot | buildstream: issue #354 ("Allow configuring an element to not be tracked by --track-all") changed state ("opened") https://gitlab.com/BuildStream/buildstream/issues/354 | 20:47 |
*** bethw has joined #buildstream | 20:51 | |
*** awacheux[m] has joined #buildstream | 20:52 | |
*** bethw has quit IRC | 21:04 | |
*** awacheux[m] has quit IRC | 21:05 | |
*** ptomato[m] has quit IRC | 21:05 | |
*** mattiasb has quit IRC | 21:05 | |
*** inigomartinez has quit IRC | 21:05 | |
*** noisecell has joined #buildstream | 21:11 | |
*** noisecell has quit IRC | 21:11 | |
*** bethw has joined #buildstream | 21:18 | |
*** aday has quit IRC | 21:22 | |
*** xjuan_ has quit IRC | 21:23 | |
*** bethw has quit IRC | 21:32 | |
*** rafaelff[m] has joined #buildstream | 21:34 |