IRC logs for #buildstream for Tuesday, 2020-09-22

00:00 *** tristan_ has joined #buildstream
00:00 *** ChanServ sets mode: +o tristan_
00:04 *** tristan_ has quit IRC
03:48 *** tristan_ has joined #buildstream
03:48 *** ChanServ sets mode: +o tristan_
03:49 *** tristan_ is now known as tristan
05:54 <tristan> Hmmmm, not sure how fond I am of Source.BST_KEY_REQUIRES_STAGE
05:54 <tristan> Seems to have a lot of implications wrapped up in there
05:55 <tristan> As a plugin author, what am I to do with "Whether the source will require staging in order to efficiently generate a unique key"?
05:55 <tristan> Seems to be a strange contract
05:56 <tristan> Tracking sources explicitly does not require downloading them; we have inefficiently implemented tracking in ways which download, but that's supposed to be improvable. Tracking gives you a ref, and a ref should be enough to get a key
05:56 <tristan> But
05:56 <tristan> Indeed, if tracking gives you only a ref, that doesn't necessarily mean you get a key; you have the opportunity to stage it first
05:57 <tristan> Fetching requires a ref, not a key, oddly enough
05:57 <tristan> Although, I would certainly expect to have all my keys in a stable state before fetching sources
05:57 <tristan> We *know* exactly what we're going to build, and that means we can show a key
05:58 <tristan> This weird trick however seems to be used only in the case that sources are available locally; it would seem like a good idea to at least make BST_KEY_REQUIRES_STAGE private
05:59 <tristan> Also note: this staging appears to happen for *all local sources* at startup, every time
06:00 <tristan> I guess this is a trick to leverage CAS in order to generate a checksum instead of using a checksum utility, but I'm not sure how much better it is (especially if it has to stage at every startup)
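For context, a minimal sketch of the flag under discussion, going only by the contract quoted above: BST_KEY_REQUIRES_STAGE is a class attribute a Source plugin can set so that BuildStream derives the source's unique key from its staged content rather than from get_unique_key(). The plugin body below is illustrative (remaining Source methods elided), not the actual implementation of any shipped plugin:

    from buildstream import Source

    class StageKeyedSource(Source):
        # Opting in: BuildStream stages this source up front and uses the
        # CAS content digest as the unique key, so get_unique_key() is
        # never consulted for this source
        BST_KEY_REQUIRES_STAGE = True

        def configure(self, node):
            self.path = node.get_str("path")

        def preflight(self):
            pass

        def stage(self, directory):
            # Stage self.path into directory; the staged content is what
            # gets hashed to produce the key (details elided)
            ...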
06:04 <tristan> https://gitlab.com/BuildStream/buildstream/-/merge_requests/1651
06:04 * tristan wonders what the motivations behind this are
06:05 <tristan> So this whole thing is driven just by the workspace changes
07:29 * tristan serves up a proposal on the ML for this
07:43 *** benschubert has joined #buildstream
07:52 <tristan> Hi benschubert... I've been thinking about adding tests for the regression fix... and I think the right solution would be to tweak logging a bit, such that --verbose shows *everything* and --no-verbose continues to behave the way it does
07:53 <tristan> When I say *everything*, I mean: completely nuke the concept of silent messages
07:53 <tristan> I gave a build a try this way, and that led me to discover a bunch more messages
07:53 <benschubert> That seems insanely verbose, no? Verbose is the default mode, so I'd rather not force users to _have_ to specify --no-verbose to get an ok-ish experience
07:53 <tristan> Well
07:54 <tristan> I kind of feel like we should make --no-verbose the default too
07:55 <tristan> Should we bite the bullet and break CLI API for it?
07:55 <tristan> Make it an enum?
07:55 <tristan> --verbose=[minimum|....|maximum] ?
07:56 <tristan> (A) I'd like to land the regression fix quickly... (B) I cannot realistically write a regression test for it without this additional change
07:57 <tristan> benschubert, honestly, it is very interesting to see *all* the messages, I have to admit; for example, you can see the messages as we stage junctions at load time, which improves awareness (for us at least)
07:58 <tristan> So, I would very much like to have a mode where we can see every single logged line
07:58 <benschubert> tristan: if no-verbose is the default, I'm fine with that change. I'm wondering if a "--verbose --verbose" approach would be nicer? Or "-vv/--very-verbose"?
07:59 <benschubert> But I think that having that additional level of verbosity could be better
07:59 <benschubert> because I usually don't care about what happens at that level but might need "--verbose" to debug issues
07:59 <benschubert> (a single run for me already results in hundreds of thousands of log lines)
07:59 <tristan> Consequently, seeing that we redundantly stage junctions *every single time* led me to discover the weird BST_KEY_REQUIRES_STAGE thing, which led me to make today's proposal on the ML
08:00 <tristan> (pretty lightweight proposal I think)
08:00 <benschubert> yeah, I glanced through this; as long as we keep the optimization for 'local' and 'workspace' I'm probably fine :D Need to read more through it
08:01 <tristan> I kind of wrote it whilst understanding what it does
08:01 <tristan> So the end is probably more informed than the beginning :)
08:02 <tristan> Regarding CLI options: I'm a bit partial to enums.... (A) I'd rather keep verbosity control in a single CLI option, which avoids overcrowding... (B) I'd rather not get onto the -v, -vv, -vvv, -vvvv train
08:03 <tristan> To be honest, I personally don't like tools which do (B); they don't tell me how many -v options are meaningful, and they tend to add more -v levels over time (more and more verbose)
08:03 <tristan> it's not clean
08:03 <benschubert> For the 'single enum' I'm happy with that, though it gets harder to check in the messenger, which already has a lot of work to do :)
08:03 <tristan> compared to an enum, which we could add new values to without breaking API, but which would still have meaningful descriptions in the man pages
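To make the enum idea concrete, a hypothetical sketch using click (which BuildStream's CLI is built on); the option name and level names are placeholders, not a real BuildStream interface:

    import click

    @click.command()
    @click.option("--verbosity",
                  type=click.Choice(["minimum", "normal", "maximum"]),
                  default="normal",
                  help="How much log output to emit")
    def cli(verbosity):
        # New levels can be appended to the Choice later without breaking
        # existing invocations, and click documents the accepted values
        # in --help output (and hence in generated man pages)
        click.echo(f"verbosity: {verbosity}")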
08:04 <tristan> I think my branch will fix that though
08:04 <tristan> benschubert, the messenger code is a bit hacky; its details are controlled by external forces
08:04 <benschubert> My problem with the enum is if we end up with "if MyEnum.LOGLEVEL1 or (MyEnum.LOGLEVEL2 or not silence)" in lots of places, it's not ideal :/
08:04 <tristan> _frontend/app.py and _scheduler/job.py presume to know the meaning of things in _message.py
08:05 <benschubert> yeah, I really need to find a good way of rewriting the messenger -_-'
08:05 <benschubert> which is anyway needed to finish the threaded scheduler
08:05 <tristan> That just needs a single filter function
08:05 <benschubert> so now call yet another function for every line of logs? :D
08:06 <benschubert> A non-negligible amount of time is spent inside the messenger for big projects
08:06 <benschubert> But yeah, if we don't have a better way
08:06 <tristan> We can try to remove "function call overheads" separately, I don't know
08:06 <tristan> Spreading branch statements about doesn't seem like the right answer for any codebase
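One way to read "a single filter function" without adding a branch per call site: precompute the filter whenever the verbosity settings change, so the per-message hot path is a single cheap membership test. The names below are hypothetical, not the real Messenger API:

    from dataclasses import dataclass

    @dataclass
    class Message:
        message_type: str
        text: str

    class Messenger:
        def __init__(self):
            self._silenced = frozenset()

        def configure_verbosity(self, silenced_types):
            # Recomputed only when settings change, not per message
            self._silenced = frozenset(silenced_types)

        def message(self, msg):
            # Single membership test on the hot path
            if msg.message_type in self._silenced:
                return
            self._emit(msg)

        def _emit(self, msg):
            print(msg.message_type, msg.text)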
08:07 <tristan> We could even have a C preprocessor run over the python files and expand macros, old school style
08:16 <benschubert> haha, well ok, let's disregard that point
08:18 * tristan had success with a C preprocessor in a JS project in the past, was pretty nifty (different usecase though)
08:23 *** santi has joined #buildstream
08:32 <benschubert> we could also just have some cython code on the hotspots instead? maybe? :P
08:39 <benschubert> tristan: so what other option would you suggest for that? I think that "-vv" would not be too bad if it was an explicit option (and with --very-verbose), being inclusive of '--verbose'. Otherwise, maybe a flag like "--trace"? We already have verbose and debug...
08:39 <benschubert> Gah, it gets hard to be clear on what does what
09:10 <tristan> I know what I *want* --debug to do
09:11 <tristan> I want `--debug artifact` or `--debug artifact:sandbox` or `--debug sourceplugin[pluginidentifier]:artifact`
09:11 <tristan> selection of debugging topics is what I want from a `--debug` option, and I want `Plugin.debug()` messages to be filtered by their plugin identifier
09:12 <tristan> with of course a catch-all `--debug all`
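A sketch of how such `--debug` selectors might be parsed; this feature is not implemented, so the grammar here is only a guess at what is being described:

    import re

    def parse_debug_topics(spec):
        """Parse e.g. 'sourceplugin[pluginidentifier]:artifact' into
        (topic, plugin-id-or-None) pairs."""
        topics = set()
        for token in spec.split(":"):
            match = re.fullmatch(r"(\w+)(?:\[([^\]]+)\])?", token)
            if match:
                topics.add((match.group(1), match.group(2)))
        return topics

    # parse_debug_topics("sourceplugin[pluginidentifier]:artifact")
    # -> {("sourceplugin", "pluginidentifier"), ("artifact", None)}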
09:15 <benschubert> So, should we use --debug=scheduler for showing such messages, for example?
09:15 *** tristan has quit IRC
09:22 *** tristan has joined #buildstream
09:22 *** ChanServ sets mode: +o tristan
09:26 *** tristan_ has joined #buildstream
09:26 *** ChanServ sets mode: +o tristan_
09:27 *** tristan has quit IRC
09:29 <tristan_> Any idea why this would be a fatal error: https://gitlab.com/BuildStream/buildstream/-/jobs/750164954 ?
09:29 *** tristan_ is now known as tristan
09:29 <tristan> If we can't pull it: build it
09:35 <juergbi> tristan: iirc, we fall back to building if any part of the artifact is missing. however, here it seems like an unexpected pull failure, in which case we fail the job and don't move the element to any other queue
09:36 <juergbi> 13 is the grpc status code for internal error
09:37 <tristan> mmm, /me retried job
09:38 <juergbi> I'm not sure what the ideal behavior is. if the remote artifact cache is malfunctioning, the user may not want a silent rebuild
09:38 <juergbi> but this may vary among users
09:42 <coldtom> imo the failure should be reported loudly, but the fallback should be automatic (at least in non-interactive mode)
09:52 <juergbi> continuing with a build after a pull error if `--on-error continue` is specified (or the user interactively selects 'continue') sounds sensible
09:55 <tristan> consistent would be to retry
09:55 <tristan> until the retry limit is reached, I guess
10:02 *** tristan has quit IRC
10:03 *** tristan has joined #buildstream
10:03 *** ChanServ sets mode: +o tristan
10:18 <tristan> if it fails, continuing to build is of course more useful, but it probably should already retry; errors in push/pull are network related and have a probability of succeeding before failing hard
10:19 <tristan> and then, any kind of hard failure of a pull results in building instead - but of course only with `--on-error continue` or an interactive choice of continue; yes, I think that makes sense to me too
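As a sketch, the retry-then-fall-back behaviour being converged on might look like this; the exception type and the pull callable are placeholders, not BuildStream internals:

    class TransientPullError(Exception):
        """Placeholder for a network-level, retriable pull failure."""

    def pull_with_retries(pull, max_retries=3):
        # Retry transient failures; re-raise once the retry limit is
        # reached so the caller can fall back to building instead
        for attempt in range(1, max_retries + 1):
            try:
                return pull()
            except TransientPullError:
                if attempt == max_retries:
                    raise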
10:20 <tristan> in this case, a consistent failure: https://gitlab.com/BuildStream/buildstream/-/jobs/750379460
10:21 <tristan> interesting, the second try fails in the same place
10:22 <tristan> pulling other artifacts is successful for the same pipeline
10:22 <juergbi> I think we already distinguish between retriable failures and non-retriable ones
10:22 <juergbi> internal errors are essentially server (or buildbox) bugs, so not considering them retriable sounds sensible to me
10:34 <coldtom> possible corruption in the cache, i wonder?
10:35 <coldtom> buildstream should probably log the error detail as well as the code, which would make debugging such failures easier
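A sketch of what that could look like with the grpc Python bindings: surface the status code's name and the server-provided detail string alongside the bare number, and classify which codes are retriable (the set here is one plausible choice, deliberately excluding INTERNAL, code 13):

    import grpc

    RETRIABLE = {grpc.StatusCode.UNAVAILABLE, grpc.StatusCode.DEADLINE_EXCEEDED}

    def describe_rpc_error(err: grpc.RpcError) -> str:
        code = err.code()        # e.g. grpc.StatusCode.INTERNAL (numeric value 13)
        detail = err.details()   # human-readable message from the server
        retriable = code in RETRIABLE
        return f"{code.name} ({code.value[0]}): {detail} (retriable: {retriable})"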
11:16 <tristan> I see, yeah, if it's a bug it's a bug
11:16 <tristan> would be good to have more info and treat it as fatal
12:04 <jjardon> Hi, for some given inputs, if the binary output is different (because the project is not binary reproducible), how does buildstream treat that at the cache level? I guess the binary output is not part of the cache key, right?
12:06 <ironfoot> jjardon: I may be wrong here, but if that were the case, wouldn't it be impossible to benefit from the cache at all?
12:06 <tristan> jjardon, no it is not
12:07 <tristan> ironfoot, it would be possible with a different build topology - iirc bazel takes the opposite approach
12:07 <tristan> A dude who came to the last london build meetup had an interesting talk about that
12:08 <tristan> for instance: you "just build", and then you take the hashes of the outputs and consider those as the inputs of reverse dependencies
12:08 <tristan> if you happen to produce identical output, you can do the "cutoff" thing where you stop building reverse dependencies because the inputs were binary identical (even if the sources changed, because of comments or whatever)
12:09 <tristan> implementing that reverse cutoff in BuildStream is something we discussed a few times (I ineptly called this "artifact aliasing", i.e. aliasing newly created cache keys to identical build outputs)
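A toy contrast of the two schemes being discussed: input-keyed caching hashes only the declared inputs, while the cutoff scheme keys a reverse dependency on the content digests its dependencies actually produced, so a change that yields bit-identical output stops the rebuild from propagating. Function names are illustrative only:

    import hashlib

    def _h(*parts):
        return hashlib.sha256("\0".join(parts).encode()).hexdigest()

    def input_key(source_ref, dep_keys):
        # Input-keyed (BuildStream-style): key depends only on inputs
        return _h("inputs", source_ref, *dep_keys)

    def cutoff_key(source_ref, dep_output_digests):
        # Cutoff-style: if a dependency rebuild yields an identical output
        # digest, this key is unchanged and the rebuild stops propagating
        return _h("outputs", source_ref, *dep_output_digests)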
12:10 <ironfoot> I see, interesting
12:10 <tristan> jjardon, the content digests of artifacts are of course used in CAS and in RE
12:10 <tristan> I
12:10 <tristan> gah
12:11 <tristan> I suppose that RE setups are probably optimized by reproducible builds in BuildStream, as there is less content to move from one place to another
12:14 <tristan> it looks like fixing the overnight tests might be blocked on this pull bug :'(
12:15 <tristan> Maybe we can merge it anyway so long as the "no-cache" pipeline passes: https://gitlab.com/BuildStream/buildstream/-/jobs/750164958
12:15 <tristan> And zap the artifact cache
12:16 <jjardon> tristan: so in this case, buildstream will reuse the cache even if the output binary is potentially different?
12:17 <tristan> jjardon, yes; also: if you don't ever build the same input multiple times, you don't know whether it's reproducible or not
12:18 <tristan> jjardon, with BuildStream, we provide an environment that tries to be as conducive as possible to reproducibility, and strive to never build the same thing twice
12:18 <tristan> If the inputs have not changed (an artifact is available under that key), then the output (artifact) is reused
12:18 <tristan> https://bb-cache.buildstream.build:11002 <-- I don't know much/anything about this host
12:18 <jjardon> tristan: yup ok, thanks for the info
12:19 <tristan> jjardon, juergbi... That is used in the buildstream overnight tests... would we know how to get more information from that host which could be helpful to fix "Bug number 13"?
12:19 <jjardon> tristan: I think it is defined here: https://gitlab.com/BuildStream/infrastructure/infrastructure
12:20 <tristan> WARNING cross-compilers/standard-libs-i686.bst: Could not pull from remote https://bb-cache.buildstream.build:11002: Failed to download blob d35f019a8c9d24115c4658147b340fe6baf23ab6d76922035835ff0d931ed3c5: 13 <-- bug 13
12:20 <jjardon> coldtom: ^ ?
12:20 <tristan> that means someone has credentials to log in and poke around and gather logs?
12:20 <tristan> I don't even know what to do with them, to be honest
12:20 <tristan> just trying to figure out what the appropriate course of action is when we hit a "numbered bug"
12:22 <jjardon> AFAIK what is deployed there is https://github.com/buildbarn/bb-remote-asset
12:23 <jjardon> that implements the new protocols buildstream uses, as master will drop / has dropped support for bst-artifact-server
12:27 <tristan> Right okay, so... what should we do... do we have a "zap cache" button in case of blocking issues like this one?
12:27 <tristan> That is an extreme measure of course
12:28 <tristan> Measure number 1 would be 's/13/Description of what actually happened/g' in https://github.com/buildbarn/bb-remote-asset, or whatever is producing the 13
12:28 <tristan> as well as dumping everything important to a log file
12:28 <tristan> but we have 2 objectives: solve the bugs which happen... and pass CI despite unrelated issues
12:29 <tristan> We can't be blocking up BuildStream development because of shaky bb-remote-asset implementations (although we should still be fixing those asynchronously)
12:30 <jjardon> tristan: I would try to make the no-cache pipeline pass, and that would be enough to merge for now
12:30 <jjardon> While we fix the cached one
12:30 <tristan> "we" is a loaded word I guess; I don't want to imply that "it's the BuildStream core developers' problem if bb-remote-asset has a bug"
12:30 <tristan> but, if we're using it in our CI, it does become our problem
12:31 <tristan> jjardon, yeah, I suspect it will; let's give it another day to finish the build ...
12:32 * tristan just thinks that... it should be easy for a BuildStream developer to spin up a docker container or vm locally and reproduce what is happening in the overnight tests, based on source code for BuildStream + buildbox-run + buildbox-casd + bb-remote-whatever-we-use
12:32 <jjardon> tristan: I'd say it's our problem to keep the only remote cache solution buildstream has working
12:33 <tristan> Having things in all kinds of cloudy locations isn't really helpful for fixing stuff :-/
12:33 <tristan> We don't really know where the bug is either
12:34 <tristan> But we should have our hands on all the materials and be able to reproduce something in order to fix things
12:34 <tristan> I'm guessing it's not super hard to fix; I just don't have an easy-to-spin-up dev environment right now
12:35 <jjardon> The cache server is on a normal machine people can SSH to, same as always. What you propose works for the CI but not if you want to preserve the cache for overnight tests. Check the CI jobs for the remote cache; I think they have a setup as you describe
12:36 <tristan> Ok, I'll take a look into that and try not to get lost
12:37 <tristan> I only wish I could have all the services running in containers on my own laptop
12:37 <tristan> So I can do stuff like run the services under gdb and such, and launch builds
12:38 <jjardon> tristan: afaik bst CI does exactly that for normal MRs in the remote cache job
12:38 <tristan> nice
12:38 <tristan> I thought it used some fancy gitlab feature with multiple docker containers
12:38 <tristan> but I'll be very happy if it doesn't :D
12:44 *** tristan has quit IRC
12:45 *** tristan has joined #buildstream
12:45 *** ChanServ sets mode: +o tristan
12:47 <coldtom> for the remote cache, i can't tell what's gone wrong without the error detail :/ all the error in the log tells us is that it's a gRPC Internal error, and that it's either from buildbox or bb-storage
12:50 <coldtom> if the pipeline preserves the buildbox logs as an artifact, we might be able to rule out buildbox
13:01 <gitlab-br-bot> abderrahimk opened MR !2071 (abderrahim/stage-artifact-scriptelement->bst-1: scriptelement.py: use stage_artifact() instead of stage_dependency_artifact()) on buildstream https://gitlab.com/BuildStream/buildstream/-/merge_requests/2071
15:46 *** tomaz has joined #buildstream
17:42 <douglaswinship> While writing a plugin, how would I set a default value for a list that I'm reading from the config info?
17:42 <douglaswinship> I want to write self.blacklist = self.node_subst_list(node, "blacklist", default=[])
17:42 <douglaswinship> but node_subst_list doesn't have a "default" argument.
17:45 <douglaswinship> I want to let users create a list of "blacklist" items in the config dictionary. But if the user doesn't define a blacklist at all, I don't think it should throw an error.
17:49 <juergbi> douglaswinship: in master? node.get_str_list("blacklist", default=[]) should work
17:50 <douglaswinship> juergbi: very good to know.
17:50 <douglaswinship> How about 1.4.3 though?
17:50 <douglaswinship> is there a workaround?
17:50 <juergbi> douglaswinship: however, if this is a top-level config option, you should rather add it to the corresponding plugin.yaml as: blacklist: []
17:50 <juergbi> and that way `blacklist` will always be defined
17:51 <juergbi> this should also work on 1.4
17:51 <douglaswinship> juergbi: oh, I hadn't thought of that. Thanks!
17:51 <juergbi> yw
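Putting juergbi's two suggestions together as a sketch (the element class name is hypothetical, the config key comes from the question above, and this targets master, where configure() receives a node supporting get_str_list; remaining plugin methods are elided):

    # plugin.yaml shipped alongside the plugin:
    #
    #   blacklist: []

    from buildstream import Element

    class BlacklistElement(Element):
        def configure(self, node):
            # With the plugin.yaml default, "blacklist" always exists;
            # on master the default= argument covers it either way
            self.blacklist = node.get_str_list("blacklist", default=[])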
18:04 *** santi has quit IRC
18:41 <tomaz> people, I'm a bit lost trying to make a bootstrap image. I'm using freedesktop-sdk and changing the initial image for testing purposes, using archlinux.
18:41 <tomaz> while compiling an element, I'm having this issue: bootstrap/debugedit.bst fails with rpmio.c:1050:10: fatal error: zstd.h: No such file or directory
18:41 <tomaz> but that makes no sense to me, as this is installed on the image:
18:42 <tomaz> [bootstrap/debugedit.bst:/usr/include]$ ls | grep zstd.h
18:42 <tomaz> zstd.h
18:46 <tomaz> https://termbin.com/xxbn the log
23:43 *** benschubert has quit IRC

Generated by irclog2html.py 2.15.3 by Marius Gedminas - find it at mg.pov.lt!