juergbi | nanonyme: no, I'm not aware of any hard limits | 06:32 |
---|---|---|
juergbi | there might be disk space issues with temporary files when dealing with multi-GB blobs | 06:34 |
nanonyme | juergbi, well, this just reproducibly broke with no failures logged when large files were pushed (I guess multi-GB) | 06:36 |
nanonyme | juergbi, it's a normal(ish) scenario, some projects have e.g. debuginfo files where individual files are multiple gigabytes in size | 06:36 |
juergbi | do you happen to know whether it might have run out of disk space (i.e. not triggering expiry early enough)? | 06:37 |
juergbi | casd reserves 2GB extra headroom by default. if you can reproduce the issue and suspect a disk space issue, you could increase that | 06:39 |
juergbi | I would still expect an error to be reported even if it was a disk space issue, though | 06:40 |
juergbi | nanonyme: is this an issue with casd running on the client side or an issue on the server side when casd is used as remote CAS server? | 06:42 |
juergbi | purely pushing shouldn't require any extra disk space on the client side | 06:43 |
nanonyme | We're using buildbox-casd as remote CAS server here | 06:43 |
nanonyme | So latter | 06:43 |
nanonyme | juergbi, the nasty bit here is that, as said, buildbox-casd doesn't emit errors in this scenario and bst1 vomits something like "Unexpected error in RPC handling", so it's hard to debug | 18:27 |
juergbi | nanonyme: if you can reproduce it, can you monitor disk usage on the server to check whether this might indeed be the issue? | 18:29 |
juergbi | I assume you can't easily test it with bst2 as client to check whether it reports a clearer error | 18:29 |
nanonyme | I haven't yet tried to reproduce it locally, I think abderrahim did | 18:30 |
nanonyme | BuildStream is having serious issues working at all with CAS as we realized (it's missing retries both for pulls and pushes) | 18:31 |
nanonyme | As in, master | 18:31 |
juergbi | or alternatively, invoke buildbox-casd on the server with a larger disk headroom, e.g. --reserved=8G (default is 2GB) | 18:31 |
juergbi | I assume that's for the (small) remote asset server requests as per the open issue | 18:32 |
nanonyme | I see. These are the current parameters https://gitlab.com/freedesktop-sdk/infrastructure/local-cache/-/blob/master/docker-compose.yml#L14 | 18:32 |
juergbi | need to fix that, probably not too difficult | 18:32 |
nanonyme | In other news, we in the freedesktop-sdk project currently run buildbox-casd as our primary remote cache implementation | 18:33 |
juergbi | hm, the quota is set to SIZE-100G. does this mean 100G are unused, i.e. there should be plenty of headroom on disk? or is this space used by something else? | 18:34 |
nanonyme | This is basically a shared builder, there's at maximum three concurrent containers running bst + separate container running a local CAS instance | 18:35 |
juergbi | buildbox-casd isn't designed as a scalable server but if you only need a single instance, it should be fine - after fixing this issue | 18:35 |
nanonyme | Current caching architecture is each runner has its own remote CAS (buildbox-casd) and then there's one central remote CAS (buildbox-casd) | 18:36 |
nanonyme | Former has been hitting these issues with large binaries but it's possible latter would too (bst just fails before trying) | 18:36 |
juergbi | we should definitely fix disk space handling for large temporary files | 18:37 |
nanonyme | juergbi, right. The first choice was bb-remote-asset but it had suspicious object eviction characteristics so we switched to buildbox-casd instead | 18:38 |
juergbi | however, I can't be sure this is the (only) issue in your case | 18:38 |
juergbi | buildgrid cas server would be an alternative that should be able to scale across multiple machines, if that's something you need | 18:39 |
nanonyme | I don't think multiple machines is currently the desired thing. | 18:39 |
nanonyme | The runner CAS is mostly for data locality | 18:41 |
nanonyme | Well, also for heterogeneous content caching, as each runner only has builds for a given arch | 18:43 |
nanonyme | Or rather, handling caching of heterogeneous content in a more efficient way | 18:43 |
nanonyme | juergbi, I think the 100GB is mostly "leave at minimum 100GB on disk for server to remain functional" | 18:49 |
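
The disk space question raised in the log (does a multi-GB upload eat into the reserved headroom before expiry kicks in?) can be checked from the operator side with a small script. This is only a sketch under assumptions: it does not reproduce how buildbox-casd accounts for space internally, the helper names `free_bytes` and `fits_with_headroom` are made up for illustration, and treating the 2GB default from the log as 2 GiB is a guess.

```python
import shutil

GIB = 1024 ** 3  # assumption: treat the "2GB" default from the log as 2 GiB


def free_bytes(path: str) -> int:
    """Return the free space, in bytes, of the filesystem containing `path`."""
    return shutil.disk_usage(path).free


def fits_with_headroom(free: int, blob_size: int, reserved: int = 2 * GIB) -> bool:
    """Return True if writing `blob_size` bytes still leaves `reserved` bytes free.

    This is an operator-side sanity check, not casd's own expiry logic.
    """
    return free - blob_size >= reserved


if __name__ == "__main__":
    # Hypothetical values: a 3 GiB debuginfo blob pushed to a cache on the current disk.
    free = free_bytes(".")
    blob = 3 * GIB
    print(f"free: {free / GIB:.1f} GiB, fits with default headroom: "
          f"{fits_with_headroom(free, blob)}")
```

Running this periodically on the server while a large push is in flight would show whether free space dips below the headroom, which is the situation `--reserved=8G` is meant to avoid.
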
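
The log also mentions that BuildStream master is missing retries for both pulls and pushes. As a rough illustration of what retrying a transient failure looks like, here is a generic exponential-backoff sketch. It is not BuildStream code: `with_retries`, its parameters, and the choice of `ConnectionError` as the retryable error are all assumptions made for the example.

```python
import time


def with_retries(operation, attempts=3, base_delay=0.5, sleep=time.sleep):
    """Call `operation()` up to `attempts` times, backing off between failures.

    Only ConnectionError is treated as transient here (an assumption); any
    other exception propagates immediately. The delay doubles after each
    failed attempt: base_delay, 2 * base_delay, 4 * base_delay, ...
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    last_error = None
    for attempt in range(attempts):
        try:
            return operation()
        except ConnectionError as exc:
            last_error = exc
            # Do not sleep after the final attempt; just give up.
            if attempt < attempts - 1:
                sleep(base_delay * 2 ** attempt)
    raise last_error
```

A push or pull would be wrapped as `with_retries(lambda: push(blob))`, where `push` stands in for whatever call talks to the remote CAS. The `sleep` parameter is injectable so the backoff can be exercised in tests without waiting.
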