*** tristan has joined #buildstream | 06:56 | |
*** ChanServ sets mode: +o tristan | 06:56 | |
*** iker has joined #buildstream | 07:00 | |
*** bochecha has joined #buildstream | 07:33 | |
gitlab-br-bot | buildstream: merge request (tristan/contributing-guide->master: Update contributing guide) #801 changed state ("opened"): https://gitlab.com/BuildStream/buildstream/merge_requests/801 | 07:42 |
*** toscalix has joined #buildstream | 07:45 | |
tristan | adds68, I created !801 above ^^^ which replaces your CONTRIBUTING.rst commit, only change is the symbolic link | 07:50 |
tristan | if it passes, I'll close your other MR | 07:50 |
*** finn has joined #buildstream | 07:56 | |
*** tiagogomes has quit IRC | 08:01 | |
*** tiagogomes has joined #buildstream | 08:02 | |
gitlab-br-bot | buildstream: merge request (tristan/contributing-guide->master: Update contributing guide) #801 changed state ("merged"): https://gitlab.com/BuildStream/buildstream/merge_requests/801 | 08:09 |
*** rdale has joined #buildstream | 08:17 | |
gitlab-br-bot | buildstream: issue #602 ("Cannot mount disk image in sandbox") changed state ("closed") https://gitlab.com/BuildStream/buildstream/issues/602 | 08:32 |
*** tpollard has joined #buildstream | 08:59 | |
*** tiagogomes has quit IRC | 09:01 | |
*** lachlan has joined #buildstream | 09:07 | |
*** jonathanmaw has joined #buildstream | 09:23 | |
laurence | i poked around on the website this morning - some of the contents are really quite good, nice work folks | 09:25 |
Nexus | we have a website? | 09:26 |
Nexus | woo, buildstream.build is up \o/ | 09:27 |
gitlab-br-bot | buildstream: merge request (adamjones/contribution-guide->master: Update contribution guide) #796 changed state ("closed"): https://gitlab.com/BuildStream/buildstream/merge_requests/796 | 09:27 |
tristan | juergbi, Related to this comment: https://gitlab.com/BuildStream/buildstream/merge_requests/797/diffs#note_101771156, I think we want to restructure this ArtifactCache <--> CasCache relationship | 09:31 |
tristan | juergbi, Do I understand correctly that a CasCache is more than just an ArtifactCache, and that an ArtifactCache is only one use case for a CasCache ? | 09:32 |
*** lachlan has quit IRC | 09:35 | |
toscalix | Nexus: we still have two key pages unfinished. I would prefer to finish them and then make a little noise | 09:38 |
*** tiagogomes has joined #buildstream | 09:46 | |
*** alatiera_ has joined #buildstream | 09:52 | |
Kinnison | tristan: Thanks for your comment on 797, I'm afraid I don't have the deeper understanding of what's going on there, so perhaps I should abandon the MR and let someone else tackle it who has more context? | 09:55 |
* Kinnison is only a very very lightweight dabbler | 09:57 | |
Kinnison | tristan: though a gut feeling I have is that if setup_remotes() should be called only once, it should probably gain a check and a BUG if it is called > 1 time :-) | 09:59 |
* Kinnison isn't sure if that fits with the style of code in BuildStream though :-) | 10:00 | |
juergbi | tristan: I think something in that direction would indeed make sense. will try to take a look | 10:01 |
tristan | Kinnison, re your last, it does; and the way to do that is simply with `assert` :) | 10:06 |
tristan | Kinnison, any assertion triggers a BUG message with its stack trace | 10:06
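The assert-as-BUG pattern tristan describes could be sketched roughly like this (hypothetical names and reporting code; BuildStream's actual error handling differs, but the principle is the same: any `AssertionError` is treated as a programming error and surfaced with its stack trace):

```python
import traceback

def run_task(task):
    # Hypothetical harness: convert an assertion failure into a
    # BUG report with its stack trace, rather than crashing.
    try:
        task()
    except AssertionError:
        print("BUG: assertion failed")
        traceback.print_exc()

class RemoteSetup:
    def __init__(self):
        self._initialized = False

    def setup_remotes(self):
        # Guard against double initialization, as Kinnison suggests:
        # a second call is a bug in the calling code, so assert.
        assert not self._initialized, "setup_remotes() called twice"
        self._initialized = True

setup = RemoteSetup()
run_task(setup.setup_remotes)  # first call succeeds
run_task(setup.setup_remotes)  # second call is reported as a BUG
```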
Kinnison | tristan: aha, then perhaps I should add that in my MR and see if it splodes :-) | 10:07 |
Kinnison | But if juergbi is going to look at rearranging Artifactcache to encapsulate a CASCache rather than the current structure, I should perhaps leave things to him | 10:07 |
* tristan looks back at 797 | 10:07 | |
tristan | Kinnison, The problem I have with 797, is that I don't know why people instantiated a CasCache in the first place (or why they called it cascache instead of `artifacts`, as we call it everywhere else) | 10:08 |
Kinnison | Fair, I know so little about everything that I didn't even think of that :-) | 10:08 |
tristan | Kinnison, I have a hunch that I am right that a Cas should be a delegate and separate from ArtifactCache | 10:08 |
tristan | So when I asked the question "why", I am rather wondering, what are you guys doing with this remote execution stuff ? how do *you* expect the architecture to work ? | 10:09 |
tristan | Maybe instantiating another one was intentional for some reason | 10:09 |
*** iker has quit IRC | 10:10 | |
*** iker has joined #buildstream | 10:10 | |
tristan | From what I understand, there will also be a SourceCache, which is used for caching the results of staging sources and providing those to a buildgrid | 10:10 |
tristan | And a SourceCache is not an ArtifactCache, but they both probably use an underlying CAS | 10:10 |
tristan | Maybe it's the same CAS, maybe it's not | 10:10 |
Kinnison | I fear mablanch or jmac needs to answer that unless juergbi knows | 10:12 |
tristan | juergbi, jmac, I will have to put together an overall architectural document soon; I think it would be good if we could sit down together and I can listen to what exactly the architecture is that you propose for all of this to fit together | 10:12 |
juergbi | yes, I think it makes sense to separate the relatively low level CAS and the code that uses it for various purposes | 10:13
tristan | I hope that we are not driving this through trial and error in implementation :) | 10:13 |
tristan | I think also, we *might* want multiple CAS handles in the application, while they might share the same local store, they might have different remotes | 10:14 |
tristan | But I'm really unsure of that, I need to sit down and understand the intentions of how a BuildGrid is supposed to work | 10:15 |
tristan | I.e. it makes sense to have a local CAS store for everything the client stores, but if everything is executing remote, it might also not make sense to ever store artifacts locally, but only serve up the sources | 10:16 |
tristan | If the end goal is to push the results to a remote artifact cache, that is of course not the same network as a build grid | 10:16 |
* tiagogomes wonders if not calling pull_tree() on the SandboxRemote is part of the plan for the rearranging | 10:16 |
tristan | etc | 10:16 |
juergbi | tristan: why are you saying that the remote artifact cache is of course not the same network as a build grid? while various combinations are possible, it typically makes sense to use the same CAS for both purposes | 10:20 |
juergbi | or maybe I misunderstood you | 10:20 |
tristan | juergbi, It is possible I'm sure | 10:21 |
tristan | juergbi, That doesnt mean it makes sense that it is forced right ? | 10:21 |
juergbi | right, but your 'of course' sounds like the split would be the typical case | 10:21 |
tristan | As, we already have allowances for multiple artifact caches, even if one of them is; it might not be the only one that is needed to push to / pull from for a given project | 10:21 |
tristan | I sort of expect that the split is the typical case yes | 10:22 |
tiagogomes | Scripts and other unprocessed files would be both in the source cache and artifact cache. So there is an advantage of both being in the same CAS for deduplication purposes | 10:22 |
tristan | juergbi, it might or might not be typical, my expectation would have been you have your store in one place, and you might spin up a build farm at times | 10:22 |
juergbi | overall most efficient solution would be single CAS infrastructure, afaict | 10:22 |
tristan | the build farm being rather more ephemeral than the store | 10:22 |
juergbi | you can certainly do that but the build farm would hopefully use the same store | 10:23 |
juergbi | not providing its own | 10:23 |
tristan | juergbi, Ok but it's rather unrelated; I suppose the architecture still does not require that | 10:23
juergbi | yes, agreed | 10:23 |
tristan | So anyway, what I'm expressing here is that I don't have a clear picture of how these components are intended to interact. In this specific case, I would hazard that one approach is to have a CAS handle which is not related to anything local at all | 10:24 |
paulsher1ood | any chance we could migrate this community discussion to freenode? | 10:24 |
tristan | juergbi, The BuildStream client running on your machine, might inform the shared CAS on a build grid which is close to the workers, that it must upload the artifacts to a specific artifact cache server | 10:25 |
juergbi | that's not supported by the protocol | 10:25 |
tristan | juergbi, In any case, there must be an intention before an implementation :) | 10:25 |
juergbi | if you want to use multiple CAS, you have to push everything via client | 10:25 |
jmac | The intention from my side is that there is one CAS | 10:26 |
tristan | juergbi, So does the client need to be *on the grid* and store the CAS ? | 10:26 |
juergbi | there must be a CAS server that is accessible by both the client and the grid | 10:26 |
tristan | Or can I run BuildStream and execute everything on a remote grid ? | 10:26 |
juergbi | other than that, the client can be anywhere | 10:27 |
juergbi | the client uploads whatever is needed for execution to that CAS | 10:27 |
jmac | One CAS, locally, that has remotes defined | 10:27 |
juergbi | the grid uses the same CAS to execute the build and uploads the output again to the same CAS | 10:27 |
tristan | But if the project says that there are multiple artifact servers to push to, and the user wants to run the build on a grid, the user will have to download everything from the grid on their own connection, and push it to the artifact servers themself | 10:28 |
juergbi | yes | 10:28 |
juergbi | hence I don't think it's a very reasonable thing to do | 10:28 |
tristan | So that seems a bit suboptimal | 10:28 |
jmac | Projects defining multiple push remotes isn't part of my plan at the moment | 10:28 |
juergbi | tristan: you'd want a protocol that supports remote-to-remote transfers? | 10:29 |
tristan | jmac, However it is rather already a part of BuildStream afaict | 10:29 |
juergbi | also seems a bit odd to have that client controlled | 10:29 |
tristan | juergbi, I am currently thinking, on the surface... "The project defines its artifact cache(s) as usual"... "The user might have a build grid at their disposal" | 10:30
tristan | juergbi, that's my basic expectation which leads me to that angle | 10:30 |
tristan | The user might have permission to use the resources of a given build grid, to build any project they want | 10:30 |
juergbi | it's far from an ideal overall setup | 10:31 |
tristan | I.e. that is a choice/privilege of a user, not the project | 10:31 |
tristan | Is it ? | 10:31 |
tristan | juergbi, Perhaps it's just a simple protocol from a BuildStream client to a BuildGrid server, which does everything on its grid where everything is close, on behalf of the calling user, if they are authenticated ? | 10:32
tristan | Where the user never downloads any sources or artifacts onto their own computer ? | 10:32 |
juergbi | remote source download is something that we have been thinking a bit about but not something planned out yet | 10:33 |
juergbi | however, it's mostly separate from the conversation we're having, imo | 10:33 |
tristan | juergbi, Ok so - What I want to make sure is that the implementation is design driven, and the design is not implementation driven | 10:34 |
tristan | I don't have a clear picture here how joe blow user is going to use their own pet project and build it on a grid they have access to | 10:34 |
juergbi | that is mainly the case but we don't have any plans for such remote-remote transfers | 10:35 |
juergbi | a configuration seems to be missing to mark a CAS remote as being the one to use for remote execution | 10:36 |
juergbi | right now the regular push setting is used, afaict | 10:36 |
tristan | That is one point I was mentioning above yes, it seems that you might want different remotes for different purposes | 10:37 |
juergbi | with multiple push remotes the current logic doesn't make sense, although it will still work | 10:37 |
tristan | Which led me to, perhaps you might want a CAS handle without any local storage at all | 10:37 |
juergbi | that's not on the plan | 10:37 |
tristan | multiple push remotes I think is certainly on the menu, we already do this at least in the case of junctions | 10:38 |
juergbi | it could be discussed but I don't think we should try to change everything in one go | 10:38 |
tristan | It's also strongly requested, and makes sense, that we push to the toplevel project even for subprojects | 10:38 |
tristan | I.e. the owner of a project should not have to trust the artifact servers of the projects they depend on in order to reliably store their own artifacts | 10:39
juergbi | the per-(sub)project config is indeed a bit of a problematic point. should discuss this in general | 10:39
tristan | That has to be considered in design yes | 10:39 |
juergbi | it's also an issue without remote exec, but remote exec makes it more pressing | 10:40 |
*** mohan43u has joined #buildstream | 10:41 | |
tristan | juergbi, jmac; ok so my goal here is just to put together a clear architectural document of what everything should look like (not necessarily what it *is*), this will likely take some time and iterations | 10:47 |
tristan | juergbi, jmac, Do you think we can schedule a video chat later this week and you both can explain to me (A) What user stories you have considered (B) How the intended architecture addresses those | 10:48 |
tristan | Then we can move forward and I can use that input for the architecture, or I can furiously be upset about my favorite use cases not being supported :) | 10:48 |
tristan | Although the latter is mostly just for dramatic effect | 10:48 |
jmac | No, I don't think so; I don't have those user stories | 10:49 |
jmac | I can schedule a video chat, of course | 10:49 |
*** lachlan has joined #buildstream | 10:50 | |
jmac | Nor do I think I have a clear written architecture other than the REAPI | 10:52 |
tristan | Ok so we need to make sure those are worked out | 10:56 |
tristan | The user story for the first iteration might be "You have to setup a build grid somewhere, and then you ssh into the build grid network and run BuildStream there" | 10:57 |
tristan | I mean, we should be thinking first, how are users supposed to use it; then architecture for that - in those cases where the protocol is not sufficient, we need to update those protocols to do what we intend | 10:57
tristan | jmac, I want to end up with a clear story of which component is responsible for doing what in the end picture, so that we can share that knowledge and everyone can code towards the same goals | 10:58 |
jmac | My only informal user 'story' at the moment is from a user who wants us to implement the REAPI | 10:58 |
jmac | tristan: Naturally. | 10:59 |
tristan | right, it's a bit scary if at this point we don't have that :) | 10:59 |
gitlab-br-bot | buildstream: merge request (willsalmon/outOfSourecBuild->master: WIP: out of source builds) #776 changed state ("opened"): https://gitlab.com/BuildStream/buildstream/merge_requests/776 | 10:59 |
jmac | tristan: It's a fairly familiar situation: You want things done properly, other people want features quickly, and I have to mediate between them | 11:01 |
juergbi | tristan: sure, we can have a chat this week. the current user story is that a remote execution user uses a single push CAS which is the same as what buildgrid uses | 11:08 |
juergbi | more complex configurations are possible, but we don't have user stories / plans for those at the moment | 11:09 |
*** jonathanmaw_ has joined #buildstream | 11:10 | |
*** jonathanmaw has quit IRC | 11:11 | |
juergbi | 'ssh into buildgrid network' should not be the typical case. depending on the location / bandwidth of the client, doing things via ssh could be faster, of course, but I don't think we want to recommend this in general | 11:12 |
*** lachlan has quit IRC | 11:17 | |
tristan | juergbi, Agreed; I very much like the idea that the user can run bst on their own machine, build and checkout the results, deploy an image to the rig sitting beside them, smoketest the results, etc; and that having access to a grid is just extra compute power | 11:19 |
tristan | over ssh is annoying for this - anyway it's fine if there are limitations in an initial implementation :) | 11:19 |
juergbi | the main limitation in the initial implementation is that some things still have to go through the client, requiring bandwidth, e.g. source fetching. i.e., the initial version is helpful to alleviate a local CPU bottleneck but it still requires network bandwidth | 11:22 |
juergbi | in the future we can hopefully improve this aspect further | 11:22 |
juergbi | handling of multiple independent CAS networks in an efficient way is to me lower priority than that | 11:23 |
tristan | Yes, and sorry I have not had time myself to get involved in the remote execution discussions in the last months since GUADEC - for now I'm mostly lagging behind and want to paint a clear picture :) | 11:23 |
juergbi | that's definitely understandable :) | 11:23 |
*** lachlan has joined #buildstream | 11:33 | |
*** solid_black has joined #buildstream | 11:51 | |
gitlab-br-bot | buildstream: issue #500 ("While caching build artifact: "Cannot extract [path to socket file] into staging-area. Unsupported type."") changed state ("closed") https://gitlab.com/BuildStream/buildstream/issues/500 | 11:55 |
gitlab-br-bot | buildstream: merge request (willsalmon/outOfSourecBuild->master: WIP: out of source builds) #776 changed state ("opened"): https://gitlab.com/BuildStream/buildstream/merge_requests/776 | 11:57 |
Kinnison | Wow, there was a lot of discussion while I was elsewhere :-) | 12:03 |
Kinnison | tristan, juergbi: For now, should I pause/abandon !797? | 12:04 |
tristan | Kinnison, I think we can at least go ahead with using the shared ArtifactCache instance, and not re-initializing the remotes | 12:16 |
tristan | Kinnison, I made some additional comments which I don't think need to be considered as blocking the patch from landing | 12:17 |
Kinnison | tristan: Okay, I'll look at it later, thank you for taking the time | 12:17 |
* Kinnison really appreciates it | 12:17 | |
tristan | Kinnison, Only if jmac or juergbi have a reason to want a separate CasCache instance would I consider dropping it | 12:17 |
tristan | But that didn't seem to happen | 12:17 |
Kinnison | Heh | 12:18 |
tristan | I will work on getting a better understanding of "how it is supposed to work" in the immediate future | 12:18
tristan | juergbi, Have a moment to brainstorm something with me ? | 12:24 |
tristan | I have just written up a proposal, but now I'm not sure anymore if I want to send it ;-) | 12:25 |
tristan | heh | 12:25 |
tristan | Anyone is welcome to brainstorm for that matter, of course | 12:25 |
tristan | So the problem is that; as mcatanzaro raised in his talk at GUADEC, BuildStream is unusable for developing applications; plain and simple, it's because of launch time of `bst shell` | 12:26 |
Kinnison | What makes the launch of bst shell slow? | 12:26 |
Kinnison | Has it been profiled? | 12:26 |
tristan | When reviewing what we had done to make BuildStream better for developing Flatpaks, Alex also raised that he wondered why we run the integration commands before the build instead of after; like flatpak-builder does | 12:27 |
tristan | The biggest bottleneck to launching a shell is clearly running the integration commands | 12:27 |
tristan | So, my proposal ran along the lines of caching the integrations post build, perhaps optionally | 12:27 |
* Kinnison suggests that would be good, and that actually jmac's work in the RE arena moves us a little toward that | 12:28 | |
tristan | And only integrating on-demand when starting a build (launching a shell being pretty equal to starting a build) | 12:28 |
Kinnison | since RE allows caching of actions, and integration commands are actions, so we have intermediate integrated things cached (almost) by side-effect | 12:28 |
tristan | So if two elements share the same set of dependencies, and the second element starts after the first has already completed, the second element does not need to integrate | 12:28 |
Kinnison | indeed | 12:28 |
Kinnison | that'd be excellent IMO | 12:28 |
tristan | However it also means potentially more integrations overall | 12:29 |
Kinnison | Why would it be more overall? | 12:29 |
tristan | Some that are unused | 12:29 |
tristan | Because it is fairly rare that multiple elements share exactly the same set of build dependencies | 12:29 |
Kinnison | So I'd suggest we make the keys which hold integrations to be weaker than those holding artifacts | 12:29 |
Kinnison | don't integrations run at various points during sandbox construction? Or is it only immediately before a build? | 12:29 |
tristan | So we don't really know if running the same set of integration commands produces the same output, if run against a differing set of dependencies | 12:30 |
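The dependency-set-keyed caching being discussed here could look roughly like this (an illustrative sketch only; the function name and key derivation are assumptions, not BuildStream's actual cache key code). Two elements whose build dependencies and integration commands are identical map to the same key, so the second build could reuse the cached integration; any difference in the dependency set forces a re-run, which reflects tristan's point that the same commands against different dependencies may produce different output:

```python
import hashlib
import json

def integration_cache_key(dep_keys, integration_commands):
    # Hypothetical: derive a cache key for an integrated sandbox
    # from the dependencies' cache keys plus the commands run.
    payload = json.dumps({
        "dependencies": sorted(dep_keys),  # order-independent set
        "commands": integration_commands,  # command order matters
    }, sort_keys=True)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()

# Same dependency set, different order -> same key, integration reused
a = integration_cache_key(["libfoo/abc", "libbar/def"], ["ldconfig"])
b = integration_cache_key(["libbar/def", "libfoo/abc"], ["ldconfig"])
assert a == b

# Differing dependency set -> different key, integration must rerun
c = integration_cache_key(["libbar/def"], ["ldconfig"])
assert a != c
```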
tristan | Always immediately before a build; actually this is an implementation detail of various plugins (complicating the implementation a bit more) | 12:30 |
Kinnison | Aaah | 12:30 |
tristan | Clearly, BuildElements would benefit from this | 12:30 |
* Kinnison thought integrations were run as built elements were staged into the sandbox | 12:30 | |
* Kinnison misunderstood | 12:30 | |
WSalmon | skullman, juergbi , etc, whats the best way to force the chroot sandbox? esp. in tests. can i just add a --addopts argument? | 12:31 |
* Kinnison would have to ponder further, do you have a brainstorm/proposal draft written up which I could read tristan? | 12:31 | |
tristan | So, for a developer, they might be more forgiving of a build finishing with integration commands, but annoyed that it costs integration commands when they want to launch a shell | 12:31
tristan | sec | 12:32 |
skullman | WSalmon: BST_FORCE_BACKEND=unix python3 setup.py test --addopts -x\ --integration | 12:32 |
WSalmon | skullman, Many Thanks! | 12:32 |
tristan | gitlab slooowwwwww | 12:33 |
WSalmon | that did it, test fails, woop.... thanks skullman | 12:34 |
tristan | Kinnison, https://gitlab.com/BuildStream/buildstream/snippets/1754601 | 12:35 |
* Kinnison takes a look | 12:36 | |
tristan | Kinnison, that is my half baked proposal | 12:36 |
tristan | So, there is another approach I was thinking of a while back, and I wonder if it, or a combination of both might be smarter | 12:36 |
tristan | That is https://gitlab.com/BuildStream/buildstream/issues/464 | 12:37 |
Kinnison | tristan: would it be sensible to integrate only the top level elements named in the build invocation? | 12:37 |
tristan | It basically says that "We integrate for certain purposes" like integrating for a build, vs integrating for the purpose of running, vs integrating for the purpose of creating a bootable image | 12:37 |
Kinnison | tristan: So only doing that post-build integration for those elements named | 12:37 |
tristan | Kinnison, Well, I'm rather worried that that specific element is also the one which the developer is working on in their edit/compile/test, but also no | 12:38 |
persia | Note that performing integration post-build will be wasted for some integration commands (like regeneratng ld.so cache). | 12:38 |
tristan | Kinnison, things need to be integrated before running consecutive builds; for instance `ldconfig` is an integration command | 12:38 |
Kinnison | True | 12:38 |
tristan | persia, Right, that's why we need to integrate on demand pre-build *anyways*, unless a cached integration is found | 12:39 |
Kinnison | I suppose the number of leaf elements ought not outweigh the number of intermediate elements, so always post-build integrating ought to be valuable | 12:39 |
tristan | So, that means in the edit/compile/test loop, we still run them, which is still bad | 12:39 |
tristan | Then I got to thinking, what if we were to reuse the integration of the element's dependencies at build time ? | 12:40 |
tristan | Still it has problems, but only on a subset of applications | 12:40 |
Kinnison | Can we do some kind of analysis of whether or not the dependency chains tend to be simple element -> simple element -> simple element or if it tends to be element -> {elements...} -> ... | 12:40 |
persia | tristan: Also, maybe we need more words, I can think of several cases where post-build is fine, but others where it needs to wait until the target environment is ready. Using two words for this might be better than trying to eithr duplicate effort or determine type in advance. | 12:40 |
Kinnison | i.e. do we tend to get linear chains | 12:40 |
tristan | E.g., a UI application which installs icons, doesnt get their icons included in the icon cache | 12:40 |
tristan | Kinnison, With gnome-build-meta and freedesktop-sdk as a sample, I normally build with 4 builders enabled on my laptop | 12:41 |
tristan | it is rare that I have only one build going on | 12:41 |
Kinnison | tristan: that implies either a very wide dependency set, or else a number of linear chains | 12:41 |
tristan | it happens maybe near the end of a webkit build, or near the end of an llvm build | 12:41 |
Kinnison | Basically a post-build integration only helps if you want to shell into *that* element, or if there are elements which depend exactly on the just-built element and its build-dependencies, and no more. | 12:42 |
tristan | So; I got to thinking that; maybe an element could advertise whether or not it "Contributes to an integration" | 12:42
persia | Kinnison: While I've not built many systems with BuildStream, depdendency maps from other software collections suggest that it is often single-threaded until some base is available, than widely parallel until approaching completion, with the remaining few linear chains being completed at the end of a build. | 12:42 |
tristan | Kinnison, Think of this for instance... a C library contributes to the integration of ld.so.cache, while a font contributes to an fc-cache integration | 12:42 |
Kinnison | It's very hard to know if element A's files contribute to the result of element B's integration commands, unless you have a way to know if the files were accessed (or even enumerated) | 12:43 |
Kinnison | Not least it requires knowledge backward through the tree (or forward, depending on your viewpoint) | 12:43 |
tristan | Yes, I don't see how that could be automated, but it *could* potentially be annotated | 12:43 |
tristan | So we could potentially paint the fc-cache integration with a domain name "fonts" | 12:44 |
Kinnison | If integrations could list sensitivities then that might help | 12:44 |
persia | Kinnison: a post-build integration may also coincidentally help if the integration detail doesn't depend on other parts of the system (or depends in a compatible way: e.g. determine whether to include some stanza in a config file only if some optional dependency is on the system: not including the stanza continues to work for every possible system without the optional dependency) | 12:44 |
tristan | And later paint elements which contribute to it with "fonts" | 12:44 |
Kinnison | tristan: Effectively sensitivity lists. A bit like dpkg triggers in Debian's infrastructure? | 12:44 |
persia | Please, not annotation. | 12:44 |
persia | Everything interesting that can be done with annotation can be automated better. | 12:44
tristan | persia, I'm not sure how, how could we possibly know ? | 12:45 |
Kinnison | tristan: So an ldconfig integration lists the dirs relevant to it, and then any element whose outputs include stuff in those dirs is automatically noted as contributing on that integration | 12:45 |
persia | tristan: Instrumentation | 12:45 |
Kinnison | tristan: Ditto fc-cache for font directories | 12:45 |
tristan | persia, yes but more specifically | 12:45 |
Kinnison | tristan: and any integration not listing a sensitivity set is a WARNING and assumes / | 12:45 |
tristan | Ok well, maybe as a starting point, we could have a glob pattern with an integration command set | 12:45 |
tristan | I.e. "I need to run whenever .foo files are installed into /usr/share/foo" | 12:46 |
Kinnison | Yeah a list of globs could be a good start | 12:46 |
Kinnison | ldconfig would say something like "/**/*.so*" | 12:46 |
Kinnison | as a very wide-ranging glob | 12:46 |
Kinnison | or could be more carefully limited to the lib dirs by a more savvy element author | 12:47 |
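The glob sensitivity lists sketched here could work along these lines (a hypothetical sketch; the config shape and function names are invented, not a real BuildStream schema). Each integration declares the paths it is sensitive to, and an element's installed files are matched against those patterns to decide which integrations it contributes to:

```python
from fnmatch import fnmatch

# Hypothetical sensitivity lists: each integration command set
# declares glob patterns for the installed paths it cares about.
# Note fnmatch's '*' matches '/' too, so these are wide-ranging.
INTEGRATIONS = {
    "ldconfig": ["/lib/*.so*", "/usr/lib/*.so*"],
    "fc-cache": ["/usr/share/fonts/*"],
}

def triggered_integrations(installed_files):
    # Return the set of integrations an element's output contributes to.
    triggered = set()
    for name, patterns in INTEGRATIONS.items():
        for path in installed_files:
            if any(fnmatch(path, pat) for pat in patterns):
                triggered.add(name)
                break
    return triggered

# A C library contributes to ldconfig, a font to fc-cache,
# and a plain binary triggers nothing.
print(triggered_integrations(["/usr/lib/libfoo.so.6"]))         # {'ldconfig'}
print(triggered_integrations(["/usr/share/fonts/dejavu.ttf"]))  # {'fc-cache'}
print(triggered_integrations(["/usr/bin/foo"]))                 # set()
```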
tristan | Ok, that is sort of where I was heading, minus the useful instrumentation idea you add here, after writing my proposal and sort of debunking it myself :) | 12:47 |
Kinnison | :-D | 12:47 |
Kinnison | brainstorming is always useful | 12:47 |
persia | My imaginary element author finds this too frustrating to debug to be worth doing and spends most of their days deleting entries, as many linux desktop users do with pulseaudio | 12:47 |
tristan | This will still involve caching, and at what level is unclear | 12:48 |
persia | Perhaps we drop "integration commands" entirely, and instead take deeper advantage of BuildStream's pipelining model? | 12:48 |
tristan | I think this also probably needs the option of annotations, only to be used for special cases; there I am not completely sure | 12:48 |
persia | Such than an "integration" just processes some element into some integrated element, and we can cache that normally? | 12:48 |
Kinnison | persia: an element author who is frustrated just skips the annotation and bst can't optimise for them | 12:48 |
Kinnison | integration implies element+others | 12:49 |
Kinnison | you don't integrate a single element in isolation | 12:49 |
tristan | Right, you do not - and imposing this structure on project authors is also yuck | 12:49 |
persia | Kinnison: If the automation can do automation user-invisibly, then the user has no lever to tweak. If you give a user a lever, they will either try different positions, or try to remove the lever. | 12:49 |
persia | Our current implementation of "integration" implies that one builds a sandbox and runs some code to put the sandbox in a new state. | 12:50 |
Kinnison | persia: So automation can only do it post-hoc and can't safely transfer that knowledge from build to build | 12:50 |
persia | I don't really understand how this is functionally different than "build". | 12:50 |
tristan | persia, I think that already the idea of a "list of globs which accompany an integration command set" is effectively a lever, right ? | 12:50 |
Kinnison | persia: integrations affect the sandbox as a whole, rather than producing new content in isolation | 12:50 |
persia | Kinnison: There's no reason automation can't cache, but reverse construction of globs requires levels of meta-computation not usually considered easily available. | 12:51 |
persia | tristan: Yes. | 12:51 |
Kinnison | persia: right but automation caching can't know at what point decisions were made | 12:51 |
tristan | Right, an integration is a filesystem permutation which is sensitive to what is inside the filesystem tree | 12:51 |
Kinnison | persia: If something readdir()s /lib that doesn't mean that it only cares about /lib, it might notice a special trigger filename in there to read the entirety of /bin for example | 12:51 |
tristan | It can do unpredictable things, including even removing files | 12:51 |
persia | Firstly, we have other tooling that notices changes to the sandbox as a whole, and calls that "new content". | 12:52 |
Kinnison | persia: this means it's nearly impossible (in the turing completeness sense) to understand what happened purely by looking at what IO was done | 12:52 |
persia | Secondly, who cares when something happened, as long as we capture input, output, and results over a period of time? | 12:52 |
persia | And Thirdly, are we assuming we have no means to notice reads or writes that makes this complex? Remember, automation is good at overwhelming amounts of detail in ways humans are not. | 12:53 |
tristan | persia, Ok so; practically speaking; a compose element does this | 12:53 |
persia | tristan: Yes. That is where I was going with that idea. | 12:53 |
tristan | persia, however introducing a compose element between each build is strange, and plausibly costly | 12:53 |
persia | Can it be made less expensive if we assume it always happens? | 12:54 |
persia | e.g. keep the sandbox around for an automatic second modification? | 12:54 |
tristan | currently it's rather costly; it only *might* be almost free if virtual filesystem and on-demand staging is as awesome as... well, really damn awesome | 12:54 |
Kinnison | tristan: I need to prep for meetings right now. Good luck with your brainstorming, and I'll look over !797 when I'm done either later today or first thing tomorrow. | 12:54 |
tristan | persia, Right, in that case it is by design "not another element" | 12:54 |
tristan | persia, which is pretty much what we have | 12:54 |
persia | Fair, and runs into the "does this result actually help?" problem. | 12:55 |
tristan | persia, or rather, we do it beforehand, but in general the design is one element, one sandbox, one artifact; you are moving it back to a side effect | 12:55 |
persia | I just think globs are nearly impossible for automation to generate, and don't trust users to use them responsibly. | 12:56 |
tristan | I'm not entirely clear at what can be cached and when | 12:56 |
tristan | persia, BuildStream *will not* grow its own internal understanding of integration commands; that must be provided to BuildStream in some form | 12:56 |
tristan | persia, however note that the globs we're talking about, are defined once | 12:57 |
persia | I think "what can be cached and when" depends on the contents of both the integration commands and the filesystems they run against. | 12:57 |
persia | I remember a discussion about build avoidance, wherein there was some tracking of what files in a filesystem were used when generating output. | 12:57 |
persia | I imagined that a similar technique could be used for integration avoidance. | 12:57 |
persia | Both share the property that they need to understand when a new file will start being used for build/integration. | 12:58 |
*** tristan has quit IRC | 13:00 | |
*** tristan has joined #buildstream | 13:09 | |
*** ChanServ sets mode: +o tristan | 13:09 | |
tristan | oops | 13:10 |
tristan | <tristan> persia, I.e. if your system uses a C library and has a C runtime, then in the declaration of your C library producing element, you will define an `ldconfig` integration command, which needs to be run when files are added in `/lib/`, `/usr/lib`, etc | 13:10 |
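[Editor's note: the "list of globs which accompany an integration command set" idea discussed above could be sketched roughly as follows. This is a hypothetical illustration only, not BuildStream's actual API; the `INTEGRATIONS` mapping and `commands_to_rerun` helper are invented names for the proposal being debated.]

```python
# Hypothetical sketch: each integration command is declared alongside the
# path globs whose changes make it necessary, so tooling can decide which
# commands need rerunning when new files are staged.
from fnmatch import fnmatch

# e.g. the element producing a C library declares that `ldconfig` must be
# rerun whenever files are added under /lib or /usr/lib
INTEGRATIONS = {
    "ldconfig": ["/lib/*", "/usr/lib/*"],
    "gtk-update-icon-cache": ["/usr/share/icons/*"],
}

def commands_to_rerun(added_files):
    """Return the integration commands whose globs match any added file."""
    needed = []
    for command, globs in INTEGRATIONS.items():
        if any(fnmatch(path, g) for path in added_files for g in globs):
            needed.append(command)
    return needed
```

The point of the declaration being made once, on the producing element, is that consumers never have to reverse-engineer these globs from observed IO.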
tristan | <tristan> persia, That build avoidance stuff afair is tied tightly to the not strictly deterministic approach of supporting incremental builds | 13:10 |
tristan | <tristan> persia, i.e. rather an optimization for a second class citizen type of build | 13:10 |
* tristan sees those missing from the log, so must not have made it to the channel | 13:10 | |
tristan | Still it's hard to see what can be cached, you cannot cache the removal of a file, and you cannot realistically combine integrations on top of staged artifacts | 13:11 |
tristan | So, maybe the best you can do is assume as a rule, that staging an artifact does not overwrite the result of an integration command | 13:12 |
tristan | Then you could cache integrations of sets of elements, and only stage the remaining artifacts which did not require any additional integration on top of that | 13:12 |
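[Editor's note: the caching rule tristan outlines here, cache the result of integrating a *set* of elements under a key derived from their artifact keys, then stage the remaining artifacts on top, could look roughly like the sketch below. It assumes, per the rule stated above, that staging never overwrites integration output. All names are illustrative, not BuildStream API.]

```python
# Rough sketch: memoize integrated filesystem states keyed by the
# (unordered) set of artifact keys that were integrated together.
import hashlib

_integration_cache = {}  # key -> integrated filesystem state (opaque here)

def integration_key(element_keys):
    """Derive a stable cache key from an unordered set of artifact keys."""
    digest = hashlib.sha256()
    for key in sorted(element_keys):
        digest.update(key.encode("utf-8"))
    return digest.hexdigest()

def stage_with_integration(element_keys, integrate, stage_rest):
    key = integration_key(element_keys)
    if key not in _integration_cache:
        # Only the first run pays the cost of the integration commands
        _integration_cache[key] = integrate(element_keys)
    base = _integration_cache[key]
    # Stage the artifacts that required no further integration on top
    return stage_rest(base)
```

This also captures persia's later suggestion: if the cache is consulted at shell time, only the first shell invocation is slow.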
tristan | Looking messy here | 13:12 |
*** solid_black has quit IRC | 13:18 | |
*** tristan has quit IRC | 13:20 | |
*** ikerperez has joined #buildstream | 13:28 | |
*** iker has quit IRC | 13:29 | |
*** finn_ has joined #buildstream | 13:30 | |
*** finn has quit IRC | 13:32 | |
*** tristan has joined #buildstream | 13:36 | |
*** ChanServ sets mode: +o tristan | 13:37 | |
persia | Thanks for the repost. Yes. Very messy indeed. I don't like annotations, but if they actually speed things up enough, they may be useful. | 13:44 |
persia | On the other hand, if we can just cache the results of integration run at shell time, so only the first run is a bit slow, that might achieve most of the benefits that are expected with annotations, without the complexity. | 13:45 |
*** ikerperez has quit IRC | 14:16 | |
*** iker has joined #buildstream | 14:17 | |
*** abderrahim has quit IRC | 14:45 | |
*** abderrahim has joined #buildstream | 14:46 | |
*** alatiera_ has quit IRC | 14:52 | |
*** alatiera_ has joined #buildstream | 14:55 | |
*** finn_ has quit IRC | 15:00 | |
gitlab-br-bot | buildstream: issue #657 ("Setup our own x86_64 runners") changed state ("opened") https://gitlab.com/BuildStream/buildstream/issues/657 | 15:02 |
juergbi | Nexus: fyi: https://github.com/projectatomic/bubblewrap/commits/wip/WSL | 15:11 |
*** finn_ has joined #buildstream | 15:13 | |
Nexus | juergbi: thanks, i found that a while ago i think, afaicr the only thing it doesn't have is FUSE | 15:13 |
juergbi | right, it doesn't help with the FUSE issue | 15:13 |
Nexus | yeah :/ i think it'll be a while before that gets put in by Mike Rosoft | 15:15 |
gitlab-br-bot | buildstream: merge request (willsalmon/outOfSourecBuild->master: WIP: out of source builds) #776 changed state ("opened"): https://gitlab.com/BuildStream/buildstream/merge_requests/776 | 15:44 |
*** iker has quit IRC | 15:55 | |
*** toscalix has quit IRC | 15:56 | |
*** iker has joined #buildstream | 15:56 | |
*** iker has quit IRC | 16:01 | |
gitlab-br-bot | buildstream: merge request (richardmaw/fix-chroot-sandbox-devices->master: fix chroot sandbox devices) #781 changed state ("opened"): https://gitlab.com/BuildStream/buildstream/merge_requests/781 | 16:25 |
tpollard | for anyone hacking on bst master whilst consuming freedesktop elements this might be of use https://gitlab.com/freedesktop-sdk/freedesktop-sdk/issues/384 | 16:44 |
*** dtf has quit IRC | 16:54 | |
*** finn_ has quit IRC | 17:11 | |
gitlab-br-bot | buildstream: merge request (jonathan/pickle-yaml->master: WIP: Add a cache of parsed and provenanced yaml) #787 changed state ("opened"): https://gitlab.com/BuildStream/buildstream/merge_requests/787 | 17:12 |
*** jonathanmaw_ has quit IRC | 17:30 | |
*** lachlan has quit IRC | 17:53 | |
*** finn has joined #buildstream | 17:57 | |
*** cs-shadow has joined #buildstream | 18:03 | |
*** xjuan has joined #buildstream | 18:19 | |
*** tristan has quit IRC | 18:20 | |
*** xjuan has quit IRC | 18:27 | |
*** xjuan has joined #buildstream | 18:31 | |
*** toscalix has joined #buildstream | 18:47 | |
*** tristan has joined #buildstream | 18:48 | |
*** xjuan has quit IRC | 19:07 | |
*** lachlan has joined #buildstream | 19:12 | |
*** rdale has quit IRC | 19:16 | |
*** lachlan has quit IRC | 19:19 | |
*** finn has joined #buildstream | 19:19 | |
*** xjuan has joined #buildstream | 20:05 | |
gitlab-br-bot | buildstream: issue #658 ("Conditionals not supported in element overrides") changed state ("opened") https://gitlab.com/BuildStream/buildstream/issues/658 | 20:34 |
*** alatiera__ has joined #buildstream | 20:59 | |
*** alatiera_ has quit IRC | 21:01 | |
*** alatiera_ has joined #buildstream | 21:02 | |
*** alatiera__ has quit IRC | 21:03 | |
*** alatiera__ has joined #buildstream | 21:05 | |
*** alatiera_ has quit IRC | 21:06 | |
*** tristan has quit IRC | 21:14 | |
*** bochecha has quit IRC | 21:23 | |
*** alatiera_ has joined #buildstream | 21:27 | |
*** alatiera__ has quit IRC | 21:28 | |
*** alatiera__ has joined #buildstream | 21:35 | |
*** alatiera_ has quit IRC | 21:37 | |
*** alatiera_ has joined #buildstream | 21:49 | |
*** alatiera__ has quit IRC | 21:50 | |
*** alatiera__ has joined #buildstream | 21:52 | |
*** alatiera_ has quit IRC | 21:54 | |
*** alatiera_ has joined #buildstream | 21:57 | |
*** alatiera__ has quit IRC | 21:58 | |
*** alatiera__ has joined #buildstream | 21:59 | |
*** alatiera_ has quit IRC | 22:01 | |
*** alatiera_ has joined #buildstream | 22:15 | |
*** alatiera__ has quit IRC | 22:17 | |
*** alatiera_ has quit IRC | 22:19 |
Generated by irclog2html.py 2.15.3 by Marius Gedminas - find it at mg.pov.lt!