*** tristan <tristan!tristan@223.62.169.153> has joined #buildstream | 05:08 | |
*** ChanServ sets mode: +o tristan | 05:08 | |
*** tristan <tristan!tristan@223.62.169.153> has quit IRC | 07:21 | |
doras | juergbi: was something like hardlink checkout + overlayfs considered as an alternative CAS protection mechanism instead of FUSE? | 15:08 |
---|---|---|
doras | For staging, I mean. | 15:11 |
juergbi | doras: overlayfs was not available to unprivileged users back then, which essentially ruled this out. however, I think it is available since linux 5.11, so may be worth considering as alternative | 15:12 |
juergbi | hardlink staging does add a significant cost in staging time if there are many files while staging with FUSE takes almost no time but adds a performance hit on some operations later | 15:15 |
juergbi | btw: buildbox-casd already supports hardlink staging but buildbox-casd has to be installed setuid (of a casd user, not root) for that | 15:16 |
juergbi | (as that uses uid separation for protection instead of overlayfs / bst1 fuse) | 15:17 |
doras | juergbi: doesn't this mean that staged hardlinks are essentially read-only? | 15:18 |
juergbi | doras: which part/variant are you referring to? | 15:19 |
doras | Hardlink staging + different uid for CAS (if I understood things correctly). | 15:20 |
juergbi | yes, indeed, that's the purpose of using a different uid | 15:20 |
juergbi | ah, or are you referring to potential issues due to that? | 15:21 |
doras | I'm referring to potential issues, yes | 15:22 |
doras | I mean, can bst2 actually make use of it without breaking some use cases that otherwise work with FUSE? | 15:22 |
juergbi | there may be issues in rare cases but well-written code usually has no issue with that (the staged files can still be replaced, they simply can't be modified in-place) | 15:22 |
juergbi | that said, we strongly recommend use of buildbox-fuse on Linux. the hardlink support is mainly for non-Linux | 15:25 |
juergbi | with regular non-setuid buildbox-casd, it will fall back to full file copies if buildbox-fuse is not available. while the full file copy fallback is obviously slow, it allows files to be modified, matching buildbox-fuse | 15:26 |
doras | I guess the recent composefs proposal could be useful to avoid the hardlink staging costs while still allowing the use of overlayfs on top for native read/write file access. Best of both worlds, almost. | 15:27 |
juergbi | composefs could indeed be useful but that's not something we can rely on in the foreseeable future | 15:28 |
juergbi | although, we could support it as additional option even if we can't rely on it being available, of course | 15:29 |
juergbi | I can't estimate the chances of it being mainlined | 15:30 |
*** tristan94 <tristan94!tristan@78.40.148.178> has quit IRC | 15:31 | |
doras | juergbi: supposedly EROFS already provides capabilities similar to composefs. | 15:33 |
doras | Minus a shared page cache between images, which is supposedly planned. | 15:42 |
doras | I'm not sure if it supports user namespaces though. | 15:43 |
juergbi | doras: isn't EROFS image-based? | 15:48 |
juergbi | doras: actually, it could be useful to use overlayfs also with buildbox-fuse. this could eliminate the hit on write performance. and read performance is typically less of an issue with buildbox-fuse | 15:49 |
juergbi | it might be tricky to use, though, as unprivileged mount is only possible within a corresponding mount namespace. might not be possible with the current protocol between buildbox-run and buildbox-casd. also, bubblewrap doesn't support overlayfs mounts yet (unclear whether that would actually be useful, though, depends on what the new protocol would look like) | 15:49 |
doras | juergbi: I see what you mean. Basically provide write protection through an overlay, and use FUSE only to redirect reads to CAS? | 16:22 |
juergbi | doras: yes. well, write protection is the wrong term in context of buildbox-fuse but redirect writes to a separate (possibly even tmpfs) overlay and use FUSE only to read unmodified staged files | 16:24 |
doras | juergbi: can't buildbox-fuse be used as-is as the lower directory of an overlay, with the upper directory of the overlay on a native filesystem? Or is this what you meant? | 16:33 |
juergbi | yes, that's what I meant | 16:33 |
doras | juergbi: I'm not familiar with the protocol between buildbox-casd and buildbox-fuse, but it sounds like in theory the FUSE filesystem itself can remain unchanged as long as it's mounted read-only. | 16:38 |
juergbi | yes, the question is how to handle overlayfs with regards to buildbox-run and buildbox-casd (and buildstream). buildbox-fuse itself shouldn't be affected | 16:40 |
juergbi | (except possibly minor changes to allow mounting within an unprivileged user namespace) | 16:40 |
doras | Oh, I now see I read your message wrong. | 16:41 |
doras | juergbi: back to the hardlinks + overlayfs idea for a moment, can't we perform the CAS "checkout" offline and not during staging? i.e., after an artifact is pulled? | 16:49 |
doras | Then use an overlay with multiple lower read-only directories, each representing a different artifact? | 16:50 |
doras | It will have some storage overhead for the hardlinks themselves, but it may be worth it. | 16:53 |
juergbi | doras: persistent artifact checkouts are an issue for LRU cleanup of the CAS. we used to have that. and there may also be issues with hardlink limits per file | 16:53 |
juergbi | overlayfs with many layers also has a performance penalty for open and readdir. don't know how much but overall I think it's questionable that it would perform better than read-only buildbox-fuse | 16:54 |
doras | I see. | 16:54 |
juergbi | if we can implement overlayfs support (single lower layer), I don't think there will be a significant remaining performance issue with buildbox-fuse | 16:55 |
juergbi | the reflink optimization for buildbox-fuse that we recently discussed, would probably not be possible with this, though. not a disadvantage compared to building on e.g. ext4 without FUSE but that optimization might be nice for some special cases such as ostree | 16:59 |
juergbi | avoiding a performance regression compared to building on ext4 is probably more important than being faster than ext4 in some special cases | 17:00 |
nanonyme | juergbi: sorry, where would this overlayfs help? I missed that | 22:48 |
nanonyme | Also wait wait wait, do you mean us putting buildbox-casd as setuid another user would help with performance? Did I read that correctly? That can be done if needed. | 22:51 |
nanonyme | We have a hack that allows putting setuid for files we create with BuildStream before creating Docker images out of them | 22:53 |
nanonyme | juergbi: the checkout performance isn't currently our primary problem. Checking out 7GB of data takes only a couple of minutes apparently. It's a waste of IO but such is life. The bigger problem is sandbox performance | 22:55 |
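
The point made at 15:22, that hardlink-staged files "can still be replaced, they simply can't be modified in-place", can be illustrated with a minimal sketch. This is plain Python using only the standard library, not buildbox-casd code; the helper names (`stage_hardlink`, `replace_staged`, `modify_in_place`, `demo`) and file names are hypothetical. It runs as a single user, so there is no uid protection here: the in-place write succeeds and alters the shared object, which is the situation the separate casd uid is meant to prevent.

```python
import os
import tempfile


def stage_hardlink(cas_path, staged_path):
    """Stage a CAS object by hardlinking it: no data is copied."""
    os.link(cas_path, staged_path)


def replace_staged(staged_path, data):
    """Replace a staged file: write a new inode, then rename it over the old name."""
    tmp_path = staged_path + ".tmp"
    with open(tmp_path, "wb") as f:
        f.write(data)
    os.replace(tmp_path, staged_path)


def modify_in_place(staged_path, data):
    """Overwrite the existing inode; through a hardlink this also changes the CAS object."""
    with open(staged_path, "r+b") as f:
        f.write(data)
        f.truncate()


def demo():
    """Return (shares_inode, cas_after_replace, cas_after_modify)."""
    with tempfile.TemporaryDirectory() as root:
        cas = os.path.join(root, "cas_object")  # stand-in for a CAS blob
        with open(cas, "wb") as f:
            f.write(b"original")

        staged_a = os.path.join(root, "staged_a")
        staged_b = os.path.join(root, "staged_b")
        stage_hardlink(cas, staged_a)
        stage_hardlink(cas, staged_b)
        shares_inode = os.stat(staged_a).st_ino == os.stat(cas).st_ino

        # Replacement only unlinks the staged name; the CAS object is untouched.
        replace_staged(staged_a, b"replaced")
        with open(cas, "rb") as f:
            cas_after_replace = f.read()

        # In-place modification writes through the shared inode.
        modify_in_place(staged_b, b"clobbered")
        with open(cas, "rb") as f:
            cas_after_modify = f.read()

        return shares_inode, cas_after_replace, cas_after_modify


if __name__ == "__main__":
    print(demo())
```

With uid separation as described in the log, the expectation is that the `modify_in_place` step would be refused because the staged inode belongs to the casd user, while `replace_staged` would still work as long as the staging directory itself is writable by the build user.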