*** tristan_ has joined #buildstream | 00:00 | |
*** ChanServ sets mode: +o tristan_ | 00:00 | |
*** tristan_ has quit IRC | 00:04 | |
*** tristan_ has joined #buildstream | 03:48 | |
*** ChanServ sets mode: +o tristan_ | 03:48 | |
*** tristan_ is now known as tristan | 03:49 | |
tristan | Hmmmm, not sure how fond I am of Source.BST_KEY_REQUIRES_STAGE | 05:54 |
tristan | Seems to have a lot of implications wrapped up into there | 05:54 |
tristan | As a plugin author, what am I to do with "Whether the source will require staging in order to efficiently generate a unique key" ? | 05:55 |
tristan | Seems to be a strange contract | 05:55 |
tristan | Tracking sources doesn't inherently require downloading them; we have inefficiently implemented tracking in ways which download, but that's supposed to be improvable. Tracking gives you a ref, and a ref should be enough to get a key | 05:56 |
tristan | But | 05:56 |
tristan | Indeed, if tracking gives you only a ref, that doesn't necessarily mean you get a key, you have the opportunity to stage it first | 05:56 |
tristan | fetching requires a ref, not a key, oddly enough | 05:57 |
tristan | Although, I would certainly expect to have all my keys in a stable state before fetching sources | 05:57 |
tristan | We *know* exactly what we're going to build, that means we can show a key | 05:57 |
tristan | This weird trick, however, seems to be used only in the case that sources are available locally; it would seem like a good idea to at least make BST_KEY_REQUIRES_STAGE private | 05:58 |
tristan | Also note: This staging appears to happen for *all local sources* at startup, every time | 05:59 |
tristan | I guess this is a trick to leverage CAS in order to generate a checksum instead of using a checksum utility, but not sure how much better it is (especially if it has to stage every startup) | 06:00 |
tristan | https://gitlab.com/BuildStream/buildstream/-/merge_requests/1651 | 06:04 |
* tristan wonders what are the motivations behind this | 06:04 | |
tristan | So this whole thing is driven just by the workspace changes | 06:05 |
* tristan serves up a proposal on the ML for this | 07:29 | |
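A minimal sketch of the track → ref → key contract under discussion, written against the BuildStream Source plugin API; the RefKeyedSource plugin is hypothetical, and configure(), load_ref(), fetch() and stage() are omitted, with self.ref assumed to have been set from the element's YAML.

```python
from buildstream import Source


class RefKeyedSource(Source):
    # The contract in question: no staging is needed to produce a stable
    # cache key, because the ref alone identifies the input exactly.
    BST_KEY_REQUIRES_STAGE = False

    def get_unique_key(self):
        # A ref (commit sha, tarball checksum, ...) obtained by tracking is
        # enough to derive the key; no download or staging is required.
        return [self.ref]
```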
*** benschubert has joined #buildstream | 07:43 | |
tristan | Hi benschubert... I've been thinking about adding tests for the regression fix... and I think the right solution would be to tweak logging a bit; such that --verbose shows *everything* and --no-verbose continues to behave the way it does | 07:52 |
tristan | When I say *everything*, I mean: Completely nuke the concept of silent messages | 07:53 |
tristan | I gave a build a try this way, and that led me to discover a bunch more messages | 07:53 |
benschubert | That seems insanely verbose, no? Verbose is the default mode, so I'd rather not force users to _have_ to specify --no-verbose to have an ok-ish experience | 07:53 |
tristan | Well | 07:53 |
tristan | I kind of feel like we should make --no-verbose the default too | 07:54 |
tristan | Should we bite the bullet and break CLI API for it ? | 07:55 |
tristan | Make it an enum ? | 07:55 |
tristan | --verbose=[minimum|....|maximum] ? | 07:55 |
tristan | (A) I'd like to land the regression fix quickly... (B) I cannot realistically write a regression test for it without this additional change | 07:56 |
tristan | benschubert, honestly, I have to admit it is very interesting to see *all* the messages; for example, you can see the messages as we stage junctions at load time, which improves awareness (for us at least) | 07:57 |
tristan | So, I would very much like to have a mode where we can see every single logged line | 07:58 |
benschubert | tristan: if no verbose is default, I'm fine with that change. I'm wondering if a "--verbose --verbose" approach would be nicer ? Or "-vv/--very-verbose" ? | 07:58 |
benschubert | But I think that having that additional level of verbosity could be better | 07:59 |
benschubert | because I usually don't care about what happens at that level but might need "--verbose" to debug issues | 07:59 |
benschubert | (a single run for me results in hundreds of thousands of log lines already) | 07:59 |
tristan | Incidentally, seeing that we redundantly stage junctions *every single time* led me to discover the weird BST_KEY_REQUIRES_STAGE thing, which led me to make today's proposal on the ML | 07:59 |
tristan | (pretty lightweight proposal I think) | 08:00 |
benschubert | yeah I glanced through this, as long as we keep the optimization for 'local' and 'workspace' I'm probably fine :D Need to read more through it | 08:00 |
tristan | I kind of wrote it whilst understanding what it does | 08:01 |
tristan | So the end is probably more informed than the beginning :) | 08:01 |
tristan | Regarding CLI options: I'm a bit partial to enums.... (A) I'd rather keep verbosity control in a single CLI option, avoids overcrowding... (B) I'd rather not get onto the -v, -vv, -vvv, -vvvv train | 08:02 |
tristan | To be honest I personally don't like tools which do (B), they don't tell me how many -v options are meaningful, and tend to add more -v levels over time (more and more verbose) | 08:03 |
tristan | it's not clean | 08:03 |
benschubert | For the 'single enum' I'm happy with that, though it gets harder to check in the messenger, which already has a lot of work to do :) | 08:03 |
tristan | compared to an enum, to which we could add new values without breaking API, and which would still have meaningful descriptions in the man pages | 08:03 |
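A rough sketch of the enum-style verbosity option being floated here, using click (which the BuildStream CLI is built on); the option name and level names are invented for illustration, not an agreed design.

```python
import click

# Hypothetical levels; new values can be appended later without breaking the
# CLI, and each one is documented in --help and the generated man pages.
LEVELS = ["quiet", "normal", "verbose", "maximum"]


@click.command()
@click.option("--verbosity", type=click.Choice(LEVELS), default="normal",
              help="How much log detail to display")
def build(verbosity):
    # A real implementation would thread this down to the logging machinery;
    # here we only echo the selected level.
    click.echo(f"verbosity: {verbosity}")


if __name__ == "__main__":
    build()
```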
tristan | I think my branch will fix that though | 08:04 |
tristan | benschubert, The messenger code is a bit hacky, its details are controlled by external forces | 08:04 |
benschubert | My problem with the enum is if we end up: "If MyEnum.LOGLEVEL1 or (MyEnum.LOGLEVEL2 or not silence)" in lots of places, it's not ideal :/ | 08:04 |
tristan | _frontend/app.py and _scheduler/job.py presume to know the meaning of things in _message.py | 08:04 |
benschubert | yeah, I really need to get a good way of rewriting the messenger -_-' | 08:05 |
benschubert | which is anyway needed to finish the threaded scheduler | 08:05 |
tristan | That just needs a single filter function | 08:05 |
benschubert | so now we'd call yet another function for every line of logs? :D | 08:05 |
benschubert | A non-negligible amount of time is spent inside the messenger for big projects | 08:06 |
benschubert | But yeah, if we don't have a better way | 08:06 |
tristan | We can try to remove "function call overheads" separately, I don't know | 08:06 |
tristan | Spreading branch statements about doesn't seem like the right answer for any codebase | 08:06 |
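A toy sketch of the "single filter function" idea with a hypothetical level enum (this is not BuildStream's actual Messenger API); the per-message cost is one integer comparison, which is the overhead being debated.

```python
from enum import IntEnum


class Verbosity(IntEnum):
    QUIET = 0
    NORMAL = 1
    VERBOSE = 2
    MAXIMUM = 3


def should_emit(message_level: Verbosity, configured: Verbosity) -> bool:
    # One comparison per message: the policy lives in a single place instead
    # of branch statements scattered across the frontend and the scheduler.
    return message_level <= configured
```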
tristan | We could even have a C preprocessor run over the python files and expand macros, old school style | 08:07 |
benschubert | haha, well ok, let's disregard that point | 08:16 |
* tristan had success with a C preprocessor in a JS project in the past, was pretty nifty (different usecase though) | 08:18 | |
*** santi has joined #buildstream | 08:23 | |
benschubert | we could also just have some Cython code on the hotspots instead? maybe? :P | 08:32 |
benschubert | tristan: so what other option would you suggest for that? I think that "-vv" would not be too bad if it was an explicit option (and with --very-verbose), being inclusive of '--verbose'. Otherwise, maybe a flag like "--trace" ? We already have verbose and debug... | 08:39 |
benschubert | Gah, it gets hard to be clear on what does what | 08:39 |
tristan | I know what I *want* --debug to do | 09:10 |
tristan | I want `--debug artifact` or `--debug artifact:sandbox` or `--debug sourceplugin[pluginidentifier]:artifact` | 09:11 |
tristan | selection of debugging topics is what I want from a `--debug` option, and I want `Plugin.debug()` messages to be filtered by their plugin identifier | 09:11 |
tristan | with of course a catch all `--debug all` | 09:12 |
benschubert | So, should we use --debug=scheduler for showing such messages, for example? | 09:15 |
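A small sketch of the topic-based debug selection described above; the selector syntax and helper names are hypothetical.

```python
def parse_debug_topics(value: str) -> set:
    # e.g. "--debug artifact:sandbox" -> {"artifact", "sandbox"}
    return {topic.strip() for topic in value.split(":") if topic.strip()}


def debug_enabled(topic: str, selected: set) -> bool:
    # "--debug all" acts as the catch-all
    return "all" in selected or topic in selected
```

With something like this, a message tagged "scheduler" would show up under --debug scheduler or --debug all, and be filtered out otherwise.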
*** tristan has quit IRC | 09:15 | |
*** tristan has joined #buildstream | 09:22 | |
*** ChanServ sets mode: +o tristan | 09:22 | |
*** tristan_ has joined #buildstream | 09:26 | |
*** ChanServ sets mode: +o tristan_ | 09:26 | |
*** tristan has quit IRC | 09:27 | |
tristan_ | Any idea why this would be a fatal error: https://gitlab.com/BuildStream/buildstream/-/jobs/750164954 ? | 09:29 |
*** tristan_ is now known as tristan | 09:29 | |
tristan | If we can't pull it: Build it | 09:29 |
juergbi | tristan: iirc, we fall back to build if any part of the artifact is missing. however, here it seems like an unexpected pull failure, in which case we fail the job and don't move the element to any other queue | 09:35 |
juergbi | 13 is the grpc status code for internal error | 09:36 |
tristan | mmm, /me retried job | 09:37 |
juergbi | I'm not sure what the ideal behavior is. if the remote artifact cache is malfunctioning, the user may not want to get a silent rebuild | 09:38 |
juergbi | but this may vary among users | 09:38 |
coldtom | imo the failure should be reported loudly, but fallback should be automatic (at least in non-interactive mode) | 09:42 |
juergbi | continue with build after pull error if `--on-error continue` is specified (or the user interactively selects 'continue') sounds sensible | 09:52 |
tristan | the consistent thing would be to retry | 09:55 |
tristan | until the retry limit is reached I guess | 09:55 |
*** tristan has quit IRC | 10:02 | |
*** tristan has joined #buildstream | 10:03 | |
*** ChanServ sets mode: +o tristan | 10:03 | |
tristan | if it fails, continuing with the build is of course more useful, but it should probably retry first; errors in push/pull are network related and have a chance of succeeding on retry before failing hard | 10:18 |
tristan | and then, any kind of hard failure of a pull results in building instead - but of course only with `--on-error continue` or an interactive choice of continue; yes, I think that makes sense to me too | 10:19 |
tristan | in this case, it fails consistently: https://gitlab.com/BuildStream/buildstream/-/jobs/750379460 | 10:20 |
tristan | interesting, the second try fails in the same place | 10:21 |
tristan | pulling other artifacts succeeds for the same pipeline | 10:22 |
juergbi | I think we already distinguish between retriable failures and non-retriable ones | 10:22 |
juergbi | internal errors are essentially server (or buildbox) bugs, so not considering that as retriable sounds sensible to me | 10:22 |
coldtom | possible corruption in the cache, I wonder? | 10:34 |
coldtom | buildstream should probably log the error detail as well as the code, which would make debugging such failures easier | 10:35 |
tristan | I see, yeah if it's a bug it's a bug | 11:16 |
tristan | would be good to have more info and treat it as fatal | 11:16 |
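A hedged sketch of the retriable/non-retriable split mentioned above, classifying gRPC status codes; the exact classification is illustrative rather than what BuildStream actually ships.

```python
import grpc

# Transient, network-ish failures worth retrying up to the retry limit
RETRIABLE = {
    grpc.StatusCode.UNAVAILABLE,
    grpc.StatusCode.DEADLINE_EXCEEDED,
}


def is_retriable(code: grpc.StatusCode) -> bool:
    # INTERNAL (numeric code 13) indicates a server or buildbox bug, so
    # treating it as non-retriable (and loud) matches the discussion.
    return code in RETRIABLE
```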
jjardon | Hi, for some given inputs, if the binary output is different (because the project is not binary reproducible), how does buildstream treat that at the cache level? I guess the binary output is not part of the cache key, right? | 12:04 |
ironfoot | jjardon: I may be wrong here, but if that was the case, wouldn't it be impossible to benefit from the cache at all? | 12:06 |
tristan | jjardon, no it is not | 12:06 |
tristan | ironfoot, it would be possible with a different build topology - iirc bazel takes the opposite approach | 12:07 |
tristan | A guy who came to the last London build meetup gave an interesting talk about that | 12:07 |
tristan | for instance: you "just build", and then you take the hashes of the outputs and consider those as the inputs of reverse dependencies | 12:08 |
tristan | if you happen to produce identical output, you can do the "cutoff" thing where you stop building reverse dependencies because the inputs were binary identical (even if the sources are changed, because of comments or whatever) | 12:08 |
tristan | implementing that reverse cutoff in BuildStream is something we discussed a few times (I ineptly called this "artifact aliasing", i.e. aliasing newly created cache keys to identical build outputs) | 12:09 |
ironfoot | I see, interesting | 12:10 |
tristan | jjardon, the content digests of artifacts are of course used in CAS and in RE | 12:10 |
tristan | I | 12:10 |
tristan | gah | 12:10 |
tristan | I suppose that RE setups are probably optimized by reproducible builds in BuildStream, as there is less content to move from one place to another | 12:11 |
tristan | it looks like fixing the overnight tests might be blocking on this pull bug :'( | 12:14 |
tristan | Maybe we can merge it anyway so long as the "no-cache" pipeline passes: https://gitlab.com/BuildStream/buildstream/-/jobs/750164958 | 12:15 |
tristan | And zap the artifact cache | 12:15 |
jjardon | tristan: so in this case, buildstream will reuse the cache even if potentially the output binary is different? | 12:16 |
tristan | jjardon, Yes, also: if you don't ever build the same input multiple times, you don't know if it's reproducible or not | 12:17 |
tristan | jjardon, with BuildStream, we provide an environment that tries to be as conducive as possible to reproducibility, and strive to never build the same thing twice | 12:18 |
tristan | If the inputs have not changed (an artifact is available under that key), then the output (artifact) is reused | 12:18 |
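A toy illustration of input-derived cache keys versus the output-hash "cutoff" scheme described above; the helper and its inputs are invented for the example.

```python
import hashlib


def cache_key(parts):
    # Hash a list of identifying strings into a stable key
    h = hashlib.sha256()
    for part in parts:
        h.update(part.encode())
    return h.hexdigest()


# BuildStream-style: an element is keyed purely by its inputs (source refs,
# configuration, and the cache keys of its dependencies).
element_key = cache_key(["git:abc123", "config:v1", "dep-key:1111"])

# Cutoff-style: reverse dependencies are keyed by the *output* hashes of what
# they consume. If a comment-only change still produces bit-identical output,
# that output hash is unchanged, so reverse dependencies keep their keys and
# their cached artifacts are reused without rebuilding.
rdep_key = cache_key(["git:def456", "config:v1", "dep-output:2222"])
```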
tristan | https://bb-cache.buildstream.build:11002 <-- I don't know much/anything about this host | 12:18 |
jjardon | tristan: yup ok, thanks for the info | 12:18 |
tristan | jjardon, juergbi... That is used in the buildstream overnight tests... would we know how to get more information from that host which could be helpful to fix "Bug number 13" ? | 12:19 |
jjardon | tristan: I think it is defined here: https://gitlab.com/BuildStream/infrastructure/infrastructure | 12:19 |
tristan | WARNING cross-compilers/standard-libs-i686.bst: Could not pull from remote https://bb-cache.buildstream.build:11002: Failed to download blob d35f019a8c9d24115c4658147b340fe6baf23ab6d76922035835ff0d931ed3c5: 13 <-- bug 13 | 12:20 |
jjardon | coldtom: ^ ? | 12:20 |
tristan | that means someone has credentials to log in and poke around and gather logs ? | 12:20 |
tristan | I don't even know what to do with them to be honest | 12:20 |
tristan | just trying to figure out what is the appropriate course of action when we hit a "numbered bug" | 12:20 |
jjardon | AFAIK what is deployed there is https://github.com/buildbarn/bb-remote-asset | 12:22 |
jjardon | that implements the new protocols buildstream uses, as master will drop / has dropped support for bst-artifact-server | 12:23 |
tristan | Right okay, so... what should we do... do we have a "zap cache" button in case of blocking issues like this one ? | 12:27 |
tristan | That is an extreme measure of course | 12:27 |
tristan | Measure number 1 would be 's/13/Description of what actually happened/g' in https://github.com/buildbarn/bb-remote-asset, or whatever is producing the 13 | 12:28 |
tristan | as well as dumping everything important to a log file | 12:28 |
tristan | but we have 2 objectives: solve the bugs which happen... and keep CI passing when the failures are unrelated | 12:28 |
tristan | We can't be blocking up BuildStream development because of shaky bb-remote-asset implementations (although we should still be fixing those asynchronously) | 12:29 |
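A rough illustration of "measure number 1" above, surfacing the gRPC error detail rather than the bare status code; the pull helper is a stand-in, not BuildStream's real code.

```python
import grpc


def download_blob(call_blob, logger):
    # call_blob is a stand-in for whatever gRPC call fetches the blob
    try:
        return call_blob()
    except grpc.RpcError as err:
        # err.code() is e.g. StatusCode.INTERNAL (13); err.details() carries
        # the server's human-readable message, which is what we'd want in the
        # warning instead of just the number 13.
        logger.warning("Could not pull blob: %s: %s",
                       err.code().name, err.details())
        raise
```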
jjardon | tristan I would try to make the no-cache pipeline pass and that would be enough to merge for now | 12:30 |
jjardon | While we fix the cached one | 12:30 |
tristan | "we" is a loaded word I guess, I don't want to imply that "it's the BuildStream core developers' problem if bb-remote-asset has a bug" | 12:30 |
tristan | but, if we're using it in our CI, it does become our problem | 12:30 |
tristan | jjardon, yeah, I suspect it will, let's give it another day to finish the build ... | 12:31 |
* tristan just thinks that... it should be easy for a BuildStream developer to spin up a docker or vm locally and reproduce what is happening in the overnight tests, based on source code for BuildStream + buildbox-run + buildbox-casd + bb-remote-whatever-we-use | 12:32 | |
jjardon | tristan I'd say it's our problem to keep the only remote cache solution buildstream has working | 12:32 |
tristan | Having things in all kind of cloudy locations isn't really helpful for fixing stuff :-/ | 12:33 |
tristan | We don't really know where the bug is either | 12:33 |
tristan | But we should have our hands on all materials and be able to reproduce something in order to fix things | 12:34 |
tristan | I'm guessing it's not super hard to fix, just don't have an easy-to-spin-up dev environment right now | 12:34 |
jjardon | The cache server is on a normal machine people can SSH to, same as always. What you propose works for the CI but not if you want to preserve the cache for overnight tests. Check the CI jobs for the remote cache, I think they have a setup as you describe | 12:35 |
tristan | Ok, I'll take a look into that and try not to get lost | 12:36 |
tristan | I only wish I could have all services running in containers on my own laptop | 12:37 |
tristan | So I can do stuff like run the services under gdb and such, and launch builds | 12:37 |
jjardon | tristan afaik bst ci does exactly that for normal MRs in the remote cache job | 12:38 |
tristan | nice | 12:38 |
tristan | I thought it used some fancy gitlab feature with multiple docker containers | 12:38 |
tristan | but I'll be very happy if it doesn't :D | 12:38 |
*** tristan has quit IRC | 12:44 | |
*** tristan has joined #buildstream | 12:45 | |
*** ChanServ sets mode: +o tristan | 12:45 | |
coldtom | for the remote cache, I can't tell what's gone wrong without the error detail :/ all the log tells us is that it's a gRPC Internal error, and that it's either from buildbox or bb-storage | 12:47 |
coldtom | if the pipeline preserves the buildbox logs as an artifact, we might be able to rule out buildbox | 12:50 |
gitlab-br-bot | abderrahimk opened MR !2071 (abderrahim/stage-artifact-scriptelement->bst-1: scriptelement.py: use stage_artifact() instead of stage_dependency_artifact()) on buildstream https://gitlab.com/BuildStream/buildstream/-/merge_requests/2071 | 13:01 |
*** tomaz has joined #buildstream | 15:46 | |
douglaswinship | While writing a plugin, how would I set a default value for a list that i'm reading from the config info? | 17:42 |
douglaswinship | I want to write self.blacklist = self.node_subst_list(node, "blacklist", default=[]) | 17:42 |
douglaswinship | but node_subst_list doesn't have a "default" argument. | 17:42 |
douglaswinship | I want to let users create a list of "blacklist" items in the config dictionary. But if the user doesn't define a blacklist at all, I don't think it should throw an error. | 17:45 |
juergbi | douglaswinship: in master? node.get_str_list("blacklist", default=[]) should work | 17:49 |
douglaswinship | juergbi: very good to know. | 17:50 |
douglaswinship | How about 1.4.3 though? | 17:50 |
douglaswinship | is there a workaround? | 17:50 |
juergbi | douglaswinship: however, if this is a top-level config option, you should rather add it to the corresponding plugin.yaml as: blacklist: [] | 17:50 |
juergbi | and that way `blacklist` will always be defined | 17:50 |
juergbi | this should also work on 1.4 | 17:51 |
douglaswinship | juergbi: oh, I hadn't thought of that. Thanks! | 17:51 |
juergbi | yw | 17:51 |
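A small sketch of the plugin.yaml approach suggested above, assuming a hypothetical element plugin called myfilter and following the 1.4-era API used in the question; the YAML default is shown as a comment.

```python
# myfilter.yaml, shipped alongside the plugin module, provides the default:
#
#   blacklist: []
#
# so the key is always present and node_subst_list() needs no default value.
from buildstream import Element


class MyFilterElement(Element):
    def configure(self, node):
        self.blacklist = self.node_subst_list(node, "blacklist")
```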
*** santi has quit IRC | 18:04 | |
tomaz | people, I'm a bit lost trying to make a bootstrap image. I'm using freedesktop-sdk and changing the initial image for testing purposes, using archlinux. | 18:41 |
tomaz | while compiling an element, I'm having this issue: bootstrap/debugedit.bst fails with rpmio.c:1050:10: fatal error: zstd.h: No such file or directory | 18:41 |
tomaz | but that makes no sense to me, as this is installed on the image: | 18:41 |
tomaz | [bootstrap/debugedit.bst:/usr/include]$ ls | grep zstd.h | 18:42 |
tomaz | zstd.h | 18:42 |
tomaz | https://termbin.com/xxbn the log | 18:46 |
*** benschubert has quit IRC | 23:43 |
Generated by irclog2html.py 2.15.3 by Marius Gedminas - find it at mg.pov.lt!