*** benschubert has joined #buildstream | 08:10 | |
*** tpollard has joined #buildstream | 08:46 | |
*** rdale has joined #buildstream | 09:05 | |
*** tristan has quit IRC | 09:09 | |
*** santi has joined #buildstream | 09:40 | |
*** phildawson has joined #buildstream | 09:50 | |
*** lachlan has joined #buildstream | 09:53 | |
*** tristan has joined #buildstream | 10:04 | |
*** lachlan has quit IRC | 10:27 | |
*** narispo has quit IRC | 10:33 | |
*** narispo has joined #buildstream | 10:34 | |
*** lachlan has joined #buildstream | 10:36 | |
*** lachlan has quit IRC | 10:56 | |
*** lachlan has joined #buildstream | 10:57 | |
*** jib has joined #buildstream | 11:02 | |
*** jib has left #buildstream | 11:03 | |
*** lachlan has quit IRC | 11:06 | |
*** lachlan has joined #buildstream | 11:18 | |
*** lachlan has quit IRC | 12:05 | |
WSalmon | juergbi, benschubert: does bst start a new casd for every invocation of bst, or can invocations share one? | 12:11 |
---|---|---|
benschubert | every invocation, having it for multiple would be more complex but something we might have to consider | 12:12 |
WSalmon | do they look for the size at the start and then just add to a counter as they make stuff? | 12:14 |
WSalmon | if two were making stuff at the same time then they would get out of sync, but if you start bst tomorrow then it will be right then | 12:16 |
WSalmon | but under the new way it may not | 12:16 |
WSalmon | that's why buildgrid just has one that stays around all the time, but that's not really practical here as far as I can tell | 12:17 |
benschubert | buildgrid is a service, it's used in a widely different way than buildstream is. | 12:22 |
juergbi | WSalmon: yes, that's why I mentioned on the ML that one issue is missing protection against multiple casd instances | 12:22 |
benschubert | juergbi: however, running multiple bst at the same time has always been undefined behavior right? :) | 12:22 |
juergbi | it would be good to add this before implementing the disk usage file | 12:22 |
benschubert | (Not saying that's a good thing) | 12:22 |
benschubert | add this -> not sure what you mean by "this" ? | 12:23 |
juergbi | with regards to expiry, yes | 12:23 |
juergbi | don't allow multiple casd instances in the same directory | 12:23 |
benschubert | ah right | 12:23 |
juergbi | (or keep spawned casd running but I'm not really a fan of that) | 12:23 |
benschubert | Adding a lock file in the directory with a PID inside would do right? | 12:23 |
WSalmon | benschubert, 2 bst's is bad but it doesn't generally mess up your cache going forward, which was my concern | 12:25 |
juergbi | PID is not perfect but it might be good enough in practice | 12:25 |
benschubert | there is no perfect solution though right? :) | 12:25 |
juergbi | wondering whether race free uniqueness would be possible in a reasonable way by means of the socket file | 12:26 |
benschubert | I mean other than telling the user "ah you have a .lock file, is it stray? If so, delete it" | 12:26 |
WSalmon | I was gonna suggest maybe just removing the file if it was already there, but making sure that only one casd can run against one cache sounds like a good idea | 12:26 |
benschubert | But we generate random sockets. Would you want to encode the directory in the socket name? | 12:27 |
juergbi | the reason we generate random socket files is to allow multiple casd instances | 12:27 |
juergbi | we'd drop that, of course | 12:27 |
benschubert | (We _could_, but then what if I create my socket somewhere else?) | 12:27 |
juergbi | I wouldn't be worried about non-buildstream casd invocations in the same directory | 12:28 |
juergbi | to clarify, the socket file is already in the cas directory | 12:28 |
benschubert | fair | 12:28 |
benschubert | then yeah using a unique name for the socket would be a solution | 12:29 |
juergbi | we also create a directory in /tmp and a symlink to the socket but that's just to work around the idiotic path length restriction | 12:29 |
juergbi | (or rather, the missing connectat() syscall) | 12:29 |
benschubert | juergbi: oh btw, for userchroot and the newer pytest version, it seems like it's something between pytest and pytest-forked that fails :/ | 12:32 |
benschubert | (running without "-n2" makes the test pass) | 12:32 |
benschubert | I'll probably pin pytest to 4.3 in the meantime | 12:32 |
juergbi | I guess bisecting pytest or pytest-forked is not quite as straightforward | 12:33 |
benschubert | Yep :) It's a change in pytest breaking pytest-forked | 12:37 |
benschubert | I have found nothing obvious | 12:37 |
benschubert | I'll dig a bit more but will soon give up and pin | 12:37 |
*** lachlan has joined #buildstream | 12:49 | |
*** lachlan has quit IRC | 12:56 | |
*** CTtpollard has joined #buildstream | 14:00 | |
*** tpollard has quit IRC | 14:00 | |
*** lachlan has joined #buildstream | 14:18 | |
*** lachlan has quit IRC | 14:28 | |
*** CTtpollard has quit IRC | 14:36 | |
*** lachlan has joined #buildstream | 14:44 | |
gitlab-br-bot | jjardon opened issue #1276 (BuildStream build fails if the CAS is missing blobs) on buildstream https://gitlab.com/BuildStream/buildstream/-/issues/1276 | 15:20 |
*** lachlan has quit IRC | 15:26 | |
juergbi | jjardon: at a very quick glance it seems BuildStream doesn't recognize the error as a NOT_FOUND error and we currently only fall back to local build on NOT_FOUND errors | 15:32 |
juergbi | besides fixing that issue at hand, it might make sense to fall back even for other pull errors. that said, it would still be good for the user to be aware of such unexpected errors | 15:33 |
juergbi | (could be an issue in BuildStream or BuildBox) | 15:34 |
jjardon | juergbi: yup, that is why I mentioned there that maybe the behaviour should be configurable | 15:35 |
jjardon | ("always use the cache if present", "fallback even if present", "never use even if present", etc) | 15:36 |
*** tpollard has joined #buildstream | 15:36 | |
juergbi | for `bst build` we should always perform builds if it's not cached yet | 15:36 |
*** lachlan has joined #buildstream | 15:36 | |
jjardon | yeah, agree | 15:37 |
juergbi | remote cache is optional, of course. we might still be missing some CLI option to override remote cache in config files, don't remember | 15:37 |
juergbi | and if you don't want any builds, you should call bst artifact pull instead of bst build | 15:37 |
jjardon | that makes sense | 15:37 |
juergbi | what I was referring to is the distinction between 'blob not found' and other gRPC errors | 15:38 |
juergbi | e.g. network or internal server error | 15:38 |
cphang | juergbi: this came up because, in the deployments we've been working with, there isn't referential integrity between the artifact cache and CAS. | 15:38 |
tpollard | bst2 build can override remove cache via the cli | 15:38 |
jjardon | right | 15:38 |
tpollard | s/remove/remote | 15:38 |
juergbi | cphang: correct, buildstream should be able to deal with missing blobs | 15:38 |
tpollard | s/bst2/bst master | 15:39 |
juergbi | but within certain limits | 15:39 |
cphang | indeed | 15:39 |
juergbi | i.e., if we call findmissingblobs on the remote CAS server and it's all there (or freshly uploaded), we expect it to stay there for a while | 15:39 |
cphang | So in the buildbarn deployments we're working with, we'll be doing some server side development to make sure we can provide those guarantees. | 15:39 |
jjardon | tpollard: seems it's a bit broken: https://gitlab.com/BuildStream/buildstream/-/issues/1240 | 15:39 |
juergbi | we can't recover from blobs going missing in the same session | 15:39 |
cphang | Indeed | 15:39 |
cphang | Bazel is the same (well with builds without the bytes enabled) | 15:40 |
cphang | But Bazel does have the means to fall back and reupload, if that mode is not enabled. | 15:40 |
cphang | So I think it's beneficial for buildstream to have that same behaviour, if not *strictly* essential. | 15:41 |
tpollard | jjardon: there should be test coverage for it on master (the --remote option) | 15:41 |
jjardon | mmm, so can we say that at the moment the buildstream remote cache has the same cache restrictions as bazel with builds without the bytes? | 15:42 |
cphang | jjardon similar. Then there's the distinction between the action cache that Bazel uses, and the artifact cache that buildstream currently uses, and with the pending move to using the remote-asset api | 15:43 |
coldtom | tpollard, that means that if i want to avoid pushing, i also lose the ability to pull though? | 15:43 |
cphang | juergbi is that an accurate statement? | 15:43 |
juergbi | cphang: BuildStream checks all required blobs for a particular action are on the CAS server (uploads missing ones) before issuing an action | 15:43 |
jjardon | tpollard: nice, since when? seems coldtom experienced the same thing some months ago | 15:43 |
tpollard | coldtom; | 15:44 |
juergbi | cphang: BuildStream does not assume that all blobs are available for artifacts in the artifact cache, afaik | 15:44 |
tpollard | coldtom: yep, I would like to see it extended | 15:44 |
cphang | juergbi, I believe coldtom found that if you delete the CAS and then in a separate bst session try and pull blobs from the CAS that are referenced in the artifact cache then the build fails. coldtom can you confirm? | 15:47 |
*** lachlan has quit IRC | 15:47 | |
cphang | This is documented at https://gitlab.com/celduin/infrastructure/celduin-infra/-/issues/37 | 15:47 |
juergbi | it's definitely possible but that would be a bug, not by design | 15:47 |
juergbi | as per my initial comment here to jjardon | 15:48 |
juergbi | jjardon, cphang: this might help https://gitlab.com/BuildStream/buildstream/-/commit/b0e84e0cfaffa1fc7a196b991458600a8afd14c0 | 16:00 |
cphang | ooh | 16:01 |
cphang | coldtom ^ | 16:01 |
cphang | tvm juergbi | 16:04 |
coldtom | ta juergbi | 16:07 |
*** lachlan has joined #buildstream | 16:08 | |
jjardon | juergbi: great, thanks! | 16:09 |
jjardon | valentind: ^^ let's try again to build when that gets merged | 16:15 |
*** lachlan has quit IRC | 16:46 | |
*** lachlan has joined #buildstream | 16:52 | |
*** lachlan has quit IRC | 17:13 | |
valentind | jjardon, we use the latest tag. | 17:18 |
valentind | So it would be nice if there was a snapshot done at some point | 17:18 |
valentind | I can try to apply the patch however. | 17:20 |
valentind | jjardon, here: https://gitlab.com/freedesktop-sdk/infrastructure/freedesktop-sdk-docker-images/-/merge_requests/95 | 17:24 |
*** lachlan has joined #buildstream | 17:28 | |
*** tpollard has quit IRC | 17:28 | |
jjardon | valentind: coolio, thanks! | 17:33 |
valentind | jjardon, Just approve it, and I will merge. | 17:34 |
jjardon | valentind: done! | 17:34 |
*** lachlan has quit IRC | 17:45 | |
*** lachlan has joined #buildstream | 17:56 | |
*** lachlan has quit IRC | 18:15 | |
*** lachlan has joined #buildstream | 18:30 | |
*** santi has quit IRC | 18:42 | |
*** lachlan has quit IRC | 18:42 | |
*** lachlan has joined #buildstream | 18:45 | |
*** lachlan has quit IRC | 19:12 | |
*** toscalix has joined #buildstream | 19:18 | |
*** toscalix has quit IRC | 19:23 | |
*** toscalix has joined #buildstream | 19:23 | |
*** phildawson has quit IRC | 19:25 | |
*** phildawson has joined #buildstream | 19:26 | |
*** phildawson has quit IRC | 19:30 | |
*** rdale has quit IRC | 20:01 | |
*** mohan43u has quit IRC | 20:22 | |
*** mohan43u has joined #buildstream | 20:25 | |
*** benschubert has quit IRC | 20:31 | |
*** toscalix has quit IRC | 21:09 | |
*** narispo has quit IRC | 21:43 | |
*** narispo has joined #buildstream | 21:43 | |
valentind | jjardon, juergbi, same error with the patch: https://gitlab.com/freedesktop-sdk/freedesktop-sdk/-/jobs/489065406 | 22:47 |
valentind | It could be that in _CASBatchRead.send, missing_blobs is not None, so it never raises BlobNotFound. And instead it raises a generic CASRemoteError. | 22:59 |
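
Editor's sketch for the 12:23 to 12:26 exchange about guarding a CAS directory with a lock file that contains a PID. This is not BuildStream or buildbox-casd code: the file name `casd.lock` and the functions `acquire_lock` and `release_lock` are hypothetical, and POSIX semantics are assumed. It also shows why juergbi calls the PID approach "not perfect": the staleness check can be fooled by PID reuse, and there is a small window between creating the file and writing the PID into it.

```python
import os


def _pid_alive(pid):
    """Best-effort check that a process with this PID exists (POSIX)."""
    try:
        os.kill(pid, 0)  # signal 0 only performs the existence/permission check
    except ProcessLookupError:
        return False
    except PermissionError:
        return True  # the process exists but belongs to another user
    return True


def acquire_lock(directory, name="casd.lock"):
    """Create a PID lock file in `directory`, or raise RuntimeError.

    Hypothetical helper. A lock whose PID is unreadable or no longer
    running is treated as stray and replaced.
    """
    path = os.path.join(directory, name)
    for _ in range(2):  # at most one retry after removing a stray lock
        try:
            # O_EXCL makes creation fail if the file already exists.
            fd = os.open(path, os.O_CREAT | os.O_EXCL | os.O_WRONLY, 0o644)
        except FileExistsError:
            try:
                with open(path) as f:
                    pid = int(f.read().strip())
            except (FileNotFoundError, ValueError):
                pid = None  # vanished, empty, or garbage content
            if pid is not None and _pid_alive(pid):
                raise RuntimeError(f"{directory} is already in use by PID {pid}")
            try:
                os.unlink(path)  # stray lock: remove it and try again
            except FileNotFoundError:
                pass
            continue
        with os.fdopen(fd, "w") as f:
            f.write(str(os.getpid()))
        return path
    raise RuntimeError(f"could not acquire lock in {directory}")


def release_lock(path):
    """Remove the lock file created by acquire_lock."""
    os.unlink(path)
```

The alternative juergbi raises at 12:26, a fixed socket name inside the CAS directory, would tie uniqueness to the resource casd already creates instead of adding a second file.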
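
Editor's sketch for the 15:32 to 15:38 discussion of falling back to a local build. juergbi distinguishes a "blob not found" failure from other gRPC errors and suggests falling back in both cases while keeping the user aware of the unexpected ones. The classes below borrow the names `BlobNotFound` and `CASRemoteError` from valentind's 22:59 message, but they are stand-ins defined here, not BuildStream's real classes, and `pull_or_build` is a hypothetical function.

```python
class CASRemoteError(Exception):
    """Stand-in for a generic remote CAS failure (network, server error)."""


class BlobNotFound(CASRemoteError):
    """Stand-in for a pull that failed because the remote lacks a blob."""

    def __init__(self, digest):
        super().__init__(f"blob not found: {digest}")
        self.digest = digest


def pull_or_build(pull, build, warn=print):
    """Try the remote cache first, then fall back to a local build.

    `pull` and `build` are zero-argument callables. Returns a
    (source, result) pair where source is "remote" or "local".
    """
    try:
        return "remote", pull()
    except BlobNotFound:
        # Expected case: the artifact is referenced but its blobs are gone.
        return "local", build()
    except CASRemoteError as exc:
        # Unexpected case: still build locally, but surface the error.
        warn(f"unexpected remote cache error, building locally: {exc}")
        return "local", build()
```

Because `BlobNotFound` subclasses `CASRemoteError` here, the order of the `except` clauses matters. If a missing blob were raised as the generic error, as valentind suspects happens in `_CASBatchRead.send`, a caller that only handles the specific class would never take the fallback path.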