*** tristan_ has joined #buildstream | 00:00 | |
*** ChanServ sets mode: +o tristan_ | 00:00 | |
*** tristan_ has quit IRC | 00:04 | |
*** tristan_ has joined #buildstream | 03:48 | |
*** ChanServ sets mode: +o tristan_ | 03:48 | |
*** tristan_ is now known as tristan | 03:49 | |
tristan | Hmmmm, not sure how fond I am of Source.BST_KEY_REQUIRES_STAGE | 05:54 |
tristan | Seems to have a lot of implications wrapped up into there | 05:54 |
tristan | As a plugin author, what am I to do with "Whether the source will require staging in order to efficiently generate a unique key" ? | 05:55 |
tristan | Seems to be a strange contract | 05:55 |
tristan | Tracking sources doesn't inherently require downloading them; we have inefficiently implemented tracking in ways which download, but that's supposed to be improvable. Tracking gives you a ref, and a ref should be enough to get a key | 05:56 |
tristan | But | 05:56 |
tristan | Indeed, if tracking gives you only a ref, that doesn't necessarily mean you get a key, you have the opportunity to stage it first | 05:56 |
tristan | fetching requires a ref, not a key, oddly enough | 05:57 |
tristan | Although, I would certainly expect to have all my keys in a stable state before fetching sources | 05:57 |
tristan | We *know* exactly what we're going to build, that means we can show a key | 05:57 |
tristan | This weird trick, however, seems to be used only in the case that sources are available locally; it would seem like a good idea to at least make BST_KEY_REQUIRES_STAGE private | 05:58 |
tristan | Also note: This staging appears to happen for *all local sources* at startup, every time | 05:59 |
tristan | I guess this is a trick to leverage CAS in order to generate a checksum instead of using a checksum utility, but not sure how much better it is (especially if it has to stage every startup) | 06:00 |
tristan | https://gitlab.com/BuildStream/buildstream/-/merge_requests/1651 | 06:04 |
* tristan wonders what are the motivations behind this | 06:04 | |
tristan | So this whole thing is driven just by the workspace changes | 06:05 |
* tristan serves up a proposal on the ML for this | 07:29 | |
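A minimal sketch of the track → ref → key contract under discussion, written against the BuildStream Source plugin API; the RefKeyedSource plugin is hypothetical, and configure(), load_ref(), fetch() and stage() are omitted, with self.ref assumed to have been set from the element's YAML.

```python
from buildstream import Source


class RefKeyedSource(Source):
    # The contract in question: no staging is needed to produce a stable
    # cache key, because the ref alone identifies the input exactly.
    BST_KEY_REQUIRES_STAGE = False

    def get_unique_key(self):
        # A ref (commit sha, tarball checksum, ...) obtained by tracking is
        # enough to derive the key; no download or staging is required.
        return [self.ref]
```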
*** benschubert has joined #buildstream | 07:43 | |
tristan | Hi benschubert... I've been thinking about adding tests for the regression fix... and I think the right solution would be to tweak logging a bit; such that --verbose shows *everything* and --no-verbose continues to behave the way it does | 07:52 |
tristan | When I say *everything*, I mean: Completely nuke the concept of silent messages | 07:53 |
tristan | I gave a build a try this way, and that led me to discover a bunch more messages | 07:53 |
benschubert | That seems insanely verbose, no? Verbose is the default mode, so I'd rather not force users to _have_ to specify --no-verbose to have an ok-ish experience | 07:53 |
tristan | Well | 07:53 |
tristan | I kind of feel like we should make --no-verbose the default too | 07:54 |
tristan | Should we bite the bullet and break CLI API for it ? | 07:55 |
tristan | Make it an enum ? | 07:55 |
tristan | --verbose=[minimum|....|maximum] ? | 07:55 |
tristan | (A) I'd like to land the regression fix quickly... (B) I cannot realistically write a regression test for it without this additional change | 07:56 |
tristan | benschubert, honestly, I have to admit it is very interesting to see *all* the messages; for example, you can see the messages as we stage junctions at load time, which improves awareness (for us at least) | 07:57 |
tristan | So, I would very much like to have a mode where we can see every single logged line | 07:58 |
benschubert | tristan: if no verbose is default, I'm fine with that change. I'm wondering if a "--verbose --verbose" approach would be nicer ? Or "-vv/--very-verbose" ? | 07:58 |
benschubert | But I think that having that additional level of verbosity could be better | 07:59 |
benschubert | because I usually don't care about what happens at that level but might need "--verbose" to debug issues | 07:59 |
benschubert | (a single run for me results in hundreds of thousands of log lines already) | 07:59 |
tristan | Incidentally, seeing that we redundantly stage junctions *every single time* led me to discover the weird BST_KEY_REQUIRES_STAGE thing, which led me to make today's proposal on the ML | 07:59 |
tristan | (pretty lightweight proposal I think) | 08:00 |
benschubert | yeah I glanced through this, as long as we keep the optimization for 'local' and 'workspace' I'm probably fine :D Need to read more through it | 08:00 |
tristan | I kind of wrote it whilst understanding what it does | 08:01 |
tristan | So the end is probably more informed than the beginning :) | 08:01 |
tristan | Regarding CLI options: I'm a bit partial to enums.... (A) I'd rather keep verbosity control in a single CLI option, avoids overcrowding... (B) I'd rather not get onto the -v, -vv, -vvv, -vvvv train | 08:02 |
tristan | To be honest I personally don't like tools which do (B), they don't tell me how many -v options are meaningful, and tend to add more -v levels over time (more and more verbose) | 08:03 |
tristan | it's not clean | 08:03 |
benschubert | For the 'single enum' I'm happy with that, though it gets harder to check in the messenger, which already has a lot of work to do :) | 08:03 |
tristan | compared to an enum, to which we could add new values without breaking API, and which would still have meaningful descriptions in the man pages | 08:03 |
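A rough sketch of the enum-style verbosity option being floated here, using click (which the BuildStream CLI is built on); the option name and level names are invented for illustration, not an agreed design.

```python
import click

# Hypothetical levels; new values can be appended later without breaking the
# CLI, and each one is documented in --help and the generated man pages.
LEVELS = ["quiet", "normal", "verbose", "maximum"]


@click.command()
@click.option("--verbosity", type=click.Choice(LEVELS), default="normal",
              help="How much log detail to display")
def build(verbosity):
    # A real implementation would thread this down to the logging machinery;
    # here we only echo the selected level.
    click.echo(f"verbosity: {verbosity}")


if __name__ == "__main__":
    build()
```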
tristan | I think my branch will fix that though | 08:04 |
tristan | benschubert, The messenger code is a bit hacky, its details are controlled by external forces | 08:04 |
benschubert | My problem with the enum is if we end up: "If MyEnum.LOGLEVEL1 or (MyEnum.LOGLEVEL2 or not silence)" in lots of places, it's not ideal :/ | 08:04 |
tristan | _frontend/app.py and _scheduler/job.py presume to know the meaning of things in _message.py | 08:04 |
benschubert | yeah, I really need to get a good way of rewriting the messenger -_-' | 08:05 |
benschubert | which is anyway needed to finish the threaded scheduler | 08:05 |
tristan | That just needs a single filter function | 08:05 |
benschubert | so now we'd call yet another function for every line of logs? :D | 08:05 |
benschubert | A non-negligible amount of time is spent inside the messenger for big projects | 08:06 |
benschubert | But yeah, if we don't have a better way | 08:06 |
tristan | We can try to remove "function call overheads" separately, I don't know | 08:06 |
tristan | Spreading branch statements about doesn't seem like the right answer for any codebase | 08:06 |
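A toy sketch of the "single filter function" idea with a hypothetical level enum (this is not BuildStream's actual Messenger API); the per-message cost is one integer comparison, which is the overhead being debated.

```python
from enum import IntEnum


class Verbosity(IntEnum):
    QUIET = 0
    NORMAL = 1
    VERBOSE = 2
    MAXIMUM = 3


def should_emit(message_level: Verbosity, configured: Verbosity) -> bool:
    # One comparison per message: the policy lives in a single place instead
    # of branch statements scattered across the frontend and the scheduler.
    return message_level <= configured
```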
tristan | We could even have a C preprocessor run over the python files and expand macros, old school style | 08:07 |
benschubert | haha, well ok, let's disregard that point | 08:16 |
* tristan had success with a C preprocessor in a JS project in the past, was pretty nifty (different usecase though) | 08:18 | |
*** santi has joined #buildstream | 08:23 | |
benschubert | we could also just have some Cython code on the hotspots instead? maybe? :P | 08:32 |
benschubert | tristan: so what other option would you suggest for that? I think that "-vv" would not be too bad if it was an explicit option (and with --very-verbose), being inclusive of '--verbose'. Otherwise, maybe a flag like "--trace" ? We already have verbose and debug... | 08:39 |
benschubert | Gah, it gets hard to be clear on what does what | 08:39 |
tristan | I know what I *want* --debug to do | 09:10 |
tristan | I want `--debug artifact` or `--debug artifact:sandbox` or `--debug sourceplugin[pluginidentifier]:artifact` | 09:11 |
tristan | selection of debugging topics is what I want from a `--debug` option, and I want `Plugin.debug()` messages to be filtered by their plugin identifier | 09:11 |
tristan | with of course a catch all `--debug all` | 09:12 |
benschubert | So, should we use --debug=scheduler for showing such messages, for example? | 09:15 |
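A small sketch of the topic-based debug selection described above; the selector syntax and helper names are hypothetical.

```python
def parse_debug_topics(value: str) -> set:
    # e.g. "--debug artifact:sandbox" -> {"artifact", "sandbox"}
    return {topic.strip() for topic in value.split(":") if topic.strip()}


def debug_enabled(topic: str, selected: set) -> bool:
    # "--debug all" acts as the catch-all
    return "all" in selected or topic in selected
```

With something like this, a message tagged "scheduler" would show up under --debug scheduler or --debug all, and be filtered out otherwise.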
*** tristan has quit IRC | 09:15 | |
*** tristan has joined #buildstream | 09:22 | |
*** ChanServ sets mode: +o tristan | 09:22 | |
*** tristan_ has joined #buildstream | 09:26 | |
*** ChanServ sets mode: +o tristan_ | 09:26 | |
*** tristan has quit IRC | 09:27 | |
tristan_ | Any idea why this would be a fatal error: https://gitlab.com/BuildStream/buildstream/-/jobs/750164954 ? | 09:29 |
*** tristan_ is now known as tristan | 09:29 | |
tristan | If we can't pull it: Build it | 09:29 |
juergbi | tristan: iirc, we fall back to build if any part of the artifact is missing. however, here it seems like an unexpected pull failure, in which case we fail the job and don't move the element to any other queue | 09:35 |
juergbi | 13 is the grpc status code for internal error | 09:36 |
tristan | mmm, /me retried job | 09:37 |
juergbi | I'm not sure what the ideal behavior is. if the remote artifact cache is malfunctioning, the user may not want to get a silent rebuild | 09:38 |
juergbi | but this may vary among users | 09:38 |
coldtom | imo the failure should be reported loudly, but fallback should be automatic (at least in non-interactive mode) | 09:42 |
juergbi | continue with build after pull error if `--on-error continue` is specified (or the user interactively selects 'continue') sounds sensible | 09:52 |
tristan | the consistent thing would be to retry | 09:55 |
tristan | until the retry limit is reached I guess | 09:55 |
*** tristan has quit IRC | 10:02 | |
*** tristan has joined #buildstream | 10:03 | |
*** ChanServ sets mode: +o tristan | 10:03 | |
tristan | if it fails, continuing with the build is of course more useful, but it should probably retry first; errors in push/pull are network related and have a chance of succeeding on retry before failing hard | 10:18 |
tristan | and then, any kind of hard failure of a pull results in building instead - but of course only with `--on-error continue` or an interactive choice of continue; yes, I think that makes sense to me too | 10:19 |
tristan | in this case, it fails consistently: https://gitlab.com/BuildStream/buildstream/-/jobs/750379460 | 10:20 |
tristan | interesting, the second try fails in the same place | 10:21 |
tristan | pulling other artifacts succeeds for the same pipeline | 10:22 |
juergbi | I think we already distinguish between retriable failures and non-retriable ones | 10:22 |
juergbi | internal errors are essentially server (or buildbox) bugs, so not considering that as retriable sounds sensible to me | 10:22 |
coldtom | possible corruption in the cache, I wonder? | 10:34 |
coldtom | buildstream should probably log the error detail as well as the code, which would make debugging such failures easier | 10:35 |
tristan | I see, yeah if it's a bug it's a bug | 11:16 |
tristan | would be good to have more info and treat it as fatal | 11:16 |
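A hedged sketch of the retriable/non-retriable split mentioned above, classifying gRPC status codes; the exact classification is illustrative rather than what BuildStream actually ships.

```python
import grpc

# Transient, network-ish failures worth retrying up to the retry limit
RETRIABLE = {
    grpc.StatusCode.UNAVAILABLE,
    grpc.StatusCode.DEADLINE_EXCEEDED,
}


def is_retriable(code: grpc.StatusCode) -> bool:
    # INTERNAL (numeric code 13) indicates a server or buildbox bug, so
    # treating it as non-retriable (and loud) matches the discussion.
    return code in RETRIABLE
```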
jjardon | Hi, for some given inputs, if the binary output is different (because the project is not binary reproducible), how does buildstream treat that at the cache level? I guess the binary output is not part of the cache key, right? | 12:04 |
ironfoot | jjardon: I may be wrong here, but if that was the case, wouldn't it be impossible to benefit from the cache at all? | 12:06 |
tristan | jjardon, no it is not | 12:06 |
tristan | ironfoot, it would be possible with a different build topology - iirc bazel takes the opposite approach | 12:07 |
tristan | A guy who came to the last London build meetup gave an interesting talk about that | 12:07 |
tristan | for instance: you "just build", and then you take the hashes of the outputs and consider those as the inputs of reverse dependencies | 12:08 |
tristan | if you happen to produce identical output, you can do the "cutoff" thing where you stop building reverse dependencies because the inputs were binary identical (even if the sources are changed, because of comments or whatever) | 12:08 |
tristan | implementing that reverse cutoff in BuildStream is something we discussed a few times (I ineptly called this "artifact aliasing", i.e. aliasing newly created cache keys to identical build outputs) | 12:09 |
ironfoot | I see, interesting | 12:10 |
tristan | jjardon, the content digests of artifacts are of course used in CAS and in RE | 12:10 |
tristan | I | 12:10 |
tristan | gah | 12:10 |
tristan | I suppose that RE setups are probably optimized by reproducible builds in BuildStream, as there is less content to move from one place to another | 12:11 |
tristan | it looks like fixing the overnight tests might be blocking on this pull bug :'( | 12:14 |
tristan | Maybe we can merge it anyway so long as the "no-cache" pipeline passes: https://gitlab.com/BuildStream/buildstream/-/jobs/750164958 | 12:15 |
tristan | And zap the artifact cache | 12:15 |
jjardon | tristan: so in this case, buildstream will reuse the cache even if potentially the output binary is different? | 12:16 |
tristan | jjardon, Yes, also: if you don't ever build the same input multiple times, you don't know if it's reproducible or not | 12:17 |
tristan | jjardon, with BuildStream, we provide an environment that tries to be as conducive as possible to reproducibility, and strive to never build the same thing twice | 12:18 |
tristan | If the inputs have not changed (an artifact is available under that key), then the output (artifact) is reused | 12:18 |
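A toy illustration of input-derived cache keys versus the output-hash "cutoff" scheme described above; the helper and its inputs are invented for the example.

```python
import hashlib


def cache_key(parts):
    # Hash a list of identifying strings into a stable key
    h = hashlib.sha256()
    for part in parts:
        h.update(part.encode())
    return h.hexdigest()


# BuildStream-style: an element is keyed purely by its inputs (source refs,
# configuration, and the cache keys of its dependencies).
element_key = cache_key(["git:abc123", "config:v1", "dep-key:1111"])

# Cutoff-style: reverse dependencies are keyed by the *output* hashes of what
# they consume. If a comment-only change still produces bit-identical output,
# that output hash is unchanged, so reverse dependencies keep their keys and
# their cached artifacts are reused without rebuilding.
rdep_key = cache_key(["git:def456", "config:v1", "dep-output:2222"])
```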
tristan | https://bb-cache.buildstream.build:11002 <-- I don't know much/anything about this host | 12:18 |
jjardon | tristan: yup ok, thanks for the info | 12:18 |
tristan | jjardon, juergbi... That is used in the buildstream overnight tests... would we know how to get more information from that host which could be helpful to fix "Bug number 13" ? | 12:19 |
jjardon | tristan: I think it is defined here: https://gitlab.com/BuildStream/infrastructure/infrastructure | 12:19 |
tristan | WARNING cross-compilers/standard-libs-i686.bst: Could not pull from remote https://bb-cache.buildstream.build:11002: Failed to download blob d35f019a8c9d24115c4658147b340fe6baf23ab6d76922035835ff0d931ed3c5: 13 <-- bug 13 | 12:20 |
jjardon | coldtom: ^ ? | 12:20 |
tristan | that means someone has credentials to log in and poke around and gather logs ? | 12:20 |
tristan | I don't even know what to do with them to be honest | 12:20 |
tristan | just trying to figure out what is the appropriate course of action when we hit a "numbered bug" | 12:20 |
jjardon | AFAIK what is deployed there is https://github.com/buildbarn/bb-remote-asset | 12:22 |
jjardon | that implements the new protocols buildstream uses, as master will drop / has dropped support for bst-artifact-server | 12:23 |
tristan | Right okay, so... what should we do... do we have a "zap cache" button in case of blocking issues like this one ? | 12:27 |
tristan | That is an extreme measure of course | 12:27 |
tristan | Measure number 1 would be 's/13/Description of what actually happened/g' in https://github.com/buildbarn/bb-remote-asset, or whatever is producing the 13 | 12:28 |
tristan | as well as dumping everything important to a log file | 12:28 |
tristan | but we have 2 objectives: solve the bugs which happen... and keep CI passing when the failures are unrelated | 12:28 |
tristan | We can't be blocking up BuildStream development because of shaky bb-remote-asset implementations (although we should still be fixing those asynchronously) | 12:29 |
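A rough illustration of "measure number 1" above, surfacing the gRPC error detail rather than the bare status code; the pull helper is a stand-in, not BuildStream's real code.

```python
import grpc


def download_blob(call_blob, logger):
    # call_blob is a stand-in for whatever gRPC call fetches the blob
    try:
        return call_blob()
    except grpc.RpcError as err:
        # err.code() is e.g. StatusCode.INTERNAL (13); err.details() carries
        # the server's human-readable message, which is what we'd want in the
        # warning instead of just the number 13.
        logger.warning("Could not pull blob: %s: %s",
                       err.code().name, err.details())
        raise
```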
jjardon | tristan I would try to make the no-cache pipeline pass and that would be enough to merge for now | 12:30 |
jjardon | While we fix the cached one | 12:30 |
tristan | "we" is a loaded word I guess, I don't want to imply that "it's the BuildStream core developers' problem if bb-remote-asset has a bug" | 12:30 |
tristan | but, if we're using it in our CI, it does become our problem | 12:30 |
tristan | jjardon, yeah, I suspect it will, let's give it another day to finish the build ... | 12:31 |
* tristan just thinks that... it should be easy for a BuildStream developer to spin up a docker or vm locally and reproduce what is happening in the overnight tests, based on source code for BuildStream + buildbox-run + buildbox-casd + bb-remote-whatever-we-use | 12:32 | |
jjardon | tristan I'd say it's our problem to keep the only remote cache solution buildstream has working | 12:32 |
tristan | Having things in all kind of cloudy locations isn't really helpful for fixing stuff :-/ | 12:33 |
tristan | We don't really know where the bug is either | 12:33 |
tristan | But we should have our hands on all materials and be able to reproduce something in order to fix things | 12:34 |
tristan | I'm guessing it's not super hard to fix, just don't have an easy-to-spin-up dev environment right now | 12:34 |
jjardon | The cache server is on a normal machine people can SSH to, same as always. What you propose works for the CI but not if you want to preserve the cache for overnight tests. Check the CI jobs for the remote cache, I think they have a setup as you describe | 12:35 |
tristan | Ok, I'll take a look into that and try not to get lost | 12:36 |
tristan | I only wish I could have all services running in containers on my own laptop | 12:37 |
tristan | So I can do stuff like run the services under gdb and such, and launch builds | 12:37 |
jjardon | tristan afaik bst ci does exactly that for normal MRs in the remote cache job | 12:38 |
tristan | nice | 12:38 |
tristan | I thought it used some fancy gitlab feature with multiple docker containers | 12:38 |
tristan | but I'll be very happy if it doesn't :D | 12:38 |
*** tristan has quit IRC | 12:44 | |
*** tristan has joined #buildstream | 12:45 | |
*** ChanServ sets mode: +o tristan | 12:45 | |
coldtom | for the remote cache, I can't tell what's gone wrong without the error detail :/ all the log tells us is that it's a gRPC Internal error, and that it's either from buildbox or bb-storage | 12:47 |
coldtom | if the pipeline preserves the buildbox logs as an artifact, we might be able to rule out buildbox | 12:50 |
gitlab-br-bot | abderrahimk opened MR !2071 (abderrahim/stage-artifact-scriptelement->bst-1: scriptelement.py: use stage_artifact() instead of stage_dependency_artifact()) on buildstream https://gitlab.com/BuildStream/buildstream/-/merge_requests/2071 | 13:01 |
*** tomaz has joined #buildstream | 15:46 | |
douglaswinship | While writing a plugin, how would I set a default value for a list that i'm reading from the config info? | 17:42 |
douglaswinship | I want to write self.blacklist = self.node_subst_list(node, "blacklist", default=[]) | 17:42 |
douglaswinship | but node_subst_list doesn't have a "default" argument. | 17:42 |
douglaswinship | I want to let users create a list of "blacklist" items in the config dictionary. But if the user doesn't define a blacklist at all, I don't think it should throw an error. | 17:45 |
juergbi | douglaswinship: in master? node.get_str_list("blacklist", default=[]) should work | 17:49 |
douglaswinship | juergbi: very good to know. | 17:50 |
douglaswinship | How about 1.4.3 though? | 17:50 |
douglaswinship | is there a workaround? | 17:50 |
juergbi | douglaswinship: however, if this is a top-level config option, you should rather add it to the corresponding plugin.yaml as: blacklist: [] | 17:50 |
juergbi | and that way `blacklist` will always be defined | 17:50 |
juergbi | this should also work on 1.4 | 17:51 |
douglaswinship | juergbi: oh, I hadn't thought of that. Thanks! | 17:51 |
juergbi | yw | 17:51 |
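A small sketch of the plugin.yaml approach suggested above, assuming a hypothetical element plugin called myfilter and following the 1.4-era API used in the question; the YAML default is shown as a comment.

```python
# myfilter.yaml, shipped alongside the plugin module, provides the default:
#
#   blacklist: []
#
# so the key is always present and node_subst_list() needs no default value.
from buildstream import Element


class MyFilterElement(Element):
    def configure(self, node):
        self.blacklist = self.node_subst_list(node, "blacklist")
```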
*** santi has quit IRC | 18:04 | |
tomaz | people, I'm a bit lost trying to make a bootstrap image. I'm using freedesktop-sdk and changing the initial image for testing purposes, using archlinux. | 18:41 |
tomaz | while compiling an element, I'm having this issue: bootstrap/debugedit.bst fails with rpmio.c:1050:10: fatal error: zstd.h: No such file or directory | 18:41 |
tomaz | but that makes no sense to me, as this is installed on the image: | 18:41 |
tomaz | [bootstrap/debugedit.bst:/usr/include]$ ls | grep zstd.h | 18:42 |
tomaz | zstd.h | 18:42 |
tomaz | https://termbin.com/xxbn the log | 18:46 |
*** benschubert has quit IRC | 23:43 |
Generated by irclog2html.py 2.15.3 by Marius Gedminas - find it at mg.pov.lt!