IRC logs for #buildstream for Monday, 2018-02-19

*** Prince781 has quit IRC		05:12
*** tristan has joined #buildstream		06:28
*** tm has joined #buildstream		08:08
*** mwa has joined #buildstream		08:16
*** mwa has left #buildstream		08:17
*** toscalix has joined #buildstream		08:38
*** adds68 has joined #buildstream		09:08
*** jonathanmaw has joined #buildstream		09:38
*** valentind has joined #buildstream		09:44
*** tiago has joined #buildstream		09:49
*** tiago has quit IRC		09:52
*** tiago has joined #buildstream		09:54
*** noisecell has joined #buildstream		09:54
*** aday has joined #buildstream		09:59
*** ssam2 has joined #buildstream		10:12
*** valentind has quit IRC		10:18
*** ironfoot changes topic to "BuildStream 1.0.1 is out ! | https://gitlab.com/BuildStream/buildstream | Docs: https://buildstream.gitlab.io/buildstream | IRC logs: https://irclogs.baserock.org/buildstream | Roadmap: https://wiki.gnome.org/Projects/BuildStream/Roadmap"		10:19
*** ironfoot changes topic to "BuildStream 1.1.0 is out ! | https://gitlab.com/BuildStream/buildstream | Docs: https://buildstream.gitlab.io/buildstream | IRC logs: https://irclogs.baserock.org/buildstream | Roadmap: https://wiki.gnome.org/Projects/BuildStream/Roadmap"		10:19
*** dominic has joined #buildstream		10:23
juergbi	looks like we can't virtualize -march/mtune=native on x86 as that directly uses cpuid, not /proc/cpuinfo	10:50
juergbi	have to be careful never to use that to not break reproducibility. i think there are some upstream projects that enable this by default but not many	10:51
juergbi	bst toolchains might want to consider disabling support for this completely in gcc	10:51
tristan	juergbi, "bst toolchains" sounds... in direct conflict with overall goals	11:09
tristan	also, it seems cpuid is not enabled by default on my host at least	11:10
juergbi	tristan: i mean bst projects that include gcc	11:10
juergbi	tristan: hm, i don't think that part of cpuid can be disabled on any x86-64 CPU	11:11
tristan	it means the payload wants/needs to be aware of build environment	11:11
juergbi	(hypervisors can virtualize it)	11:11
juergbi	in the case of -march=native this breaks repeatability	11:12
juergbi	it's not a huge concern as gcc doesn't enable it by default but due to existence of build systems that enable it by default, it can be an issue	11:13
tristan	Yes, that indeed would seem to be the case	11:13
* tristan just looked up what it does		11:13
juergbi	so i would recommend completely patching that out for any project with gcc.bst, but there isn't really anything we can do about it, unfortunately	11:14
tristan	this sounds like a current shortcoming of our sandboxing environment which should probably be documented	11:14
juergbi	indeed, we should document it as we can't fix it	11:14
tristan	Well, not being able to fix it today is not necessarily not able to fix it ever, hopefully.	11:15
juergbi	i was initially hoping gcc would just use /proc/cpuinfo so we could mount bind a generic/fake but no such luck	11:15
juergbi	yes, theoretically, linux may be able to support virtualizing this in the future even for regular userspace applications using its hypervisor powers	11:16
tristan	ssam2, could you take a quick glance and probably merge https://gitlab.com/BuildStream/buildstream-docker-images/merge_requests/22 ?	11:19
tristan	ssam2, I think since you know anything about Dockerfiles and such, you will be able to merge it very quickly	11:19
ssam2	sure	11:20
ssam2	one thing useful to know about Docker is that the `docker build` system is amazingly primitive and dumb	11:20
ssam2	as that merge request demonstrates!	11:20
ssam2	i would love to see BuildStream building containers instead. although it's tricky as a lot of container builds are largely `apt-get install x, y, z`	11:21
* tristan doesnt really understand even what that means; I can interpret "primitive and dumb" as both a positive or a negative thing :)		11:21
ssam2	well, it has positive and negative aspects certainly	11:22
ssam2	in this case, i mean the fact that running all the commands we give it into a single shell script instead of 4 separate commands saves like 100MB of each of the images	11:22
tristan	yes that does seem strange; smells more like the other way around (as if Docker packed its own revisioning system under the hood, and has preserved the differences caused by each command)	11:23
ssam2	it does that, indeed	11:24
tristan	sounds more like overly smart heh	11:24
tristan	anyway I'm in no position to judge, I dont know all the use cases and dont claim to really know Docker :D	11:24
*** cs_shadow has joined #buildstream		11:28
*** ernestask has joined #buildstream		11:39
tristan	juergbi, any more thoughts on https://gitlab.com/BuildStream/buildstream/issues/260 ? i.e. the issue that strengthening cache keys with execution OS/arch has broken the assumptions of freedesktop-sdk ?	11:52
tristan	this seems to be the current fire	11:52
tristan	ssam2, also, do you think this also breaks baserock's bootstrapping ? or does that not do cross-arch ?	11:53
ssam2	Baserock bootstrapping will work fine	11:53
tristan	I'm curious as to why this happened in the first place :-S	11:53
ssam2	we push the results of the bootstrap to an OSTree repo once the bootstrap is complete	11:54
ssam2	then pull it with an 'import' element in the native build	11:54
tristan	Right, so at least the model we presented as the only example to follow, did not make this assumption	11:54
ssam2	a bit of extra manual work, but it means that we don't rely on any bugs in buildstream :-)	11:54
juergbi	tristan: if we can agree on the preliminary bits relatively soon, that makes sense to me	11:57
juergbi	if we see that it takes longer, freedesktop-sdk may have to switch to explicit export/import	11:57
juergbi	(or keep using the not officially supported branch with the revert)	11:58
tlater	Hrm, so, if we made the fallback platform configurable how far would that go? I think some form of an abstract class that has methods for the various mount/chroot commands we need would work, but it feels like we'd leave too much up to the user.	12:35
tlater	Perhaps I should just send a write-up of the fallback platform issue to the ML and get discussion going there...	12:36
*** mcatanzaro has joined #buildstream		12:39
juergbi	tlater: if we can avoid expose more API than we want to, maybe a plugin approach could make sense after all	12:39
juergbi	*exposing	12:39
tlater	Essentially making the current platform system another plugin system?	12:41
tlater	I'm a bit concerned of how much API that would end up being	12:42
tlater	But it might be the only way without trying to support everything	12:42
juergbi	yes. i'd rather not have it but it sounds better than tons of config options	12:42
juergbi	instead of an actual python plugin system, it could also be that we define a command-line interface for a single sandbox command	12:42
juergbi	the user could then specify the path to the corresponding system-specific implementation, but that would just be one script/tool that would have to implement the 'bst sandbox CLI API'	12:43
tristan	juergbi, looking back after having to reply to an email; note that actually even if we did the preliminary bits of that API, it would still be recommendable to change freedesktop-sdk to explicit export/import	12:44
tristan	because it would still be relying on an external artifact cache server to have magically produced an artifact in some way	12:44
tlater	That feels very ugly, but certainly would be the most flexible. I think it would be the only way to sort of support non-root buildstream in all environments.	12:44
tristan	which means builds are not really reproducible	12:44
juergbi	tristan: well, they would still be reproducible, just requiring two bst sessions running after the other on two different systems	12:46
juergbi	but i agree, it's definitely not ideal	12:46
juergbi	however, i don't consider it very fragile	12:46
juergbi	tlater: the issue is that there are systems where non-root can't create sandboxes at all (ignoring spinning up an emulator/VM), in which case you anyway need to write a setuid tool or reuse such a tool written by someone else	12:49
juergbi	and i don't think we want to maintain such tools as part of buildstream	12:49
tlater	Yeah, for those cases I think allowing the user to call such (possibly self-maintained) tools if they require them is probably the best solution	12:50
juergbi	we can also say we don't care about such systems, of course, i.e., require root	12:50
gitlab-br-bot	buildstream: merge request (sam/doc-sandbox->master: doc: Add 'sandboxing' section) #279 changed state ("opened"): https://gitlab.com/BuildStream/buildstream/merge_requests/279	12:58
laurenceurhegyi	hey jennis - have you noticed this ? https://gitlab.com/BuildStream/buildstream/issues/239	13:04
*** laurenceurhegyi is now known as ltu		13:04
ltu	not sure if it can tie in with the stuff you've done recently	13:04
*** tristan has quit IRC		13:33
*** tristan has joined #buildstream		13:37
*** mcatanzaro has quit IRC		13:51
*** slaf has quit IRC		13:52
*** slaf has joined #buildstream		13:54
*** slaf has joined #buildstream		13:55
*** slaf has quit IRC		14:00
*** mcatanzaro has joined #buildstream		14:00
jennis	ltu, yes I've been working on tlater's branch which addresses #239	14:16
*** slaf has joined #buildstream		14:46
*** slaf has quit IRC		14:49
*** slaf has joined #buildstream		14:51
*** slaf has quit IRC		14:56
*** slaf has joined #buildstream		14:58
jonathanmaw	juergbi: do you have anything else to add for https://gitlab.com/BuildStream/buildstream/merge_requests/259	15:08
jonathanmaw	or have I resolved all the issues you had?	15:09
juergbi	taking a look	15:09
*** slaf has quit IRC		15:09
*** slaf has joined #buildstream		15:13
gitlab-br-bot	buildstream: merge request (212-git-source-needs-a-way-to-disable-checking-out-submodules->master: Resolve "Git source needs a way to disable checking out submodules") #259 changed state ("opened"): https://gitlab.com/BuildStream/buildstream/merge_requests/259	15:17
*** slaf has quit IRC		15:19
jonathanmaw	\o/	15:23
*** slaf has joined #buildstream		15:24
jjardon[m]	Hi, for git sources, does buildstream do something intelligent to not download the entire git repo? We are thinking of moving from tarballs to git, but for cases like the linux repo this will increase download times considerably	15:28
ssam2	no, it downloads the whole repo	15:28
ssam2	i've never really seen an example of partial cloning that seemed to actually work	15:29
ssam2	i mean, it works, but you save like 400MB of a 4GB repo in return for your efforts	15:29
ssam2	*off a 4GB repo	15:29
ssam2	i'd love to be proved wrong though!	15:29
ssam2	but yeah, i think if you want speedy downloads, stick to tarballs	15:30
ssam2	for linux in particular you can use the GitHub API to get tarballs for any commit, not just releases: https://stackoverflow.com/questions/13636559/how-to-download-zip-from-github-for-a-particular-commit-sha#13636954	15:31
jjardon[m]	ok, thanks ssam2 !	15:34
skullman	partial cloning technically works provided you don't need to use git tag for anything, but if you want to reuse the repository you cloned for something else then things get awkward. At some point it could be cheaper to unshallow a range rather than have a bunch of individual fetches but over time it'll just get slower to operate since it'll need to look through all the unshallow markers	15:34
jjardon[m]	skullman: I will not do anything with the repo, only build the contents	15:36
skullman	aye, but ideally you'd cache that fetch rather than having to get it again on a subsequent build	15:37
*** noisecell has quit IRC		15:37
skullman	oh, and if you need to "track" (I think the terminology is) then you can't use a shallow clone	15:37
gitlab-br-bot	buildstream: issue #212 ("Git source needs a way to disable checking out submodules") changed state ("closed") https://gitlab.com/BuildStream/buildstream/issues/212	15:42
gitlab-br-bot	buildstream: merge request (212-git-source-needs-a-way-to-disable-checking-out-submodules->master: Resolve "Git source needs a way to disable checking out submodules") #259 changed state ("merged"): https://gitlab.com/BuildStream/buildstream/merge_requests/259	15:42
jjardon[m]	skullman: neither: our runners are elastic so they get destroyed after being used	15:43
skullman	probably why cloning is such an issue I guess, so finding a way to cache would help too	15:44
*** noisecell has joined #buildstream		15:49
*** toscalix has quit IRC		16:07
*** Prince781 has joined #buildstream		16:11
persia	For the running-in-disposable-instance case, it might be nice for buildstream to have a shallow clone feature, with error messages if the user tries to do anything interesting. What breaks with such a feature?	16:14
gitlab-br-bot	buildstream: issue #200 ("bst workspace open creates the directory and then fails.") changed state ("closed") https://gitlab.com/BuildStream/buildstream/issues/200	16:15
gitlab-br-bot	buildstream: merge request (workspace-directory-fix->master: Create workspace directory after checking for potential issues) #281 changed state ("closed"): https://gitlab.com/BuildStream/buildstream/merge_requests/281	16:15
skullman	persia: from my memory of when I looked at it… a couple of years ago now, wow, nothing insurmountable	16:16
*** mcatanzaro has quit IRC		16:17
ssam2	it seems pretty wrong headed to be running in disposable instances with no caching	16:17
skullman	if you need to do `bst track` then you can abuse `git ls-remote` to get the current state of branches	16:17
ssam2	the more your project grows, the more you cost those whose source code you build	16:17
skullman	if you need to build a specific version you need your git server to be configured to allow you to fetch commits by sha1 rather than branch	16:18
ssam2	partial clones will only ever be a partial solution; the correct approach is persistent caching / mirroring	16:18
*** mcatanzaro has joined #buildstream		16:19
skullman	ssam2: if it's completely elastic instances though then AIUI you're just moving the fetch from upstream's server onto your own infrastructure	16:19
skullman	or your cloud provider's	16:19
ssam2	the point is to reduce how often you fetch	16:20
skullman	persia: so shallow would be possible if you can guarantee that the git server permits fetching by sha1, which AIUI most servers turn off to make it easier to expunge accidental history leaks	16:20
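The two git operations skullman describes can be sketched as below. This is a minimal illustration, not BuildStream's git plugin: the helper names `shallow_fetch_commands`, `ls_remote_command` and `parse_ls_remote` are hypothetical, and the depth-1 fetch only succeeds if the server allows requesting a commit by SHA-1 (for example through git's `uploadpack.allowReachableSHA1InWant` setting).

```python
def shallow_fetch_commands(url, sha, dest):
    """Return the git invocations that fetch exactly one commit, without history."""
    return [
        ["git", "init", dest],
        ["git", "-C", dest, "remote", "add", "origin", url],
        # Only works if the server permits fetching by SHA-1 rather than by branch.
        ["git", "-C", dest, "fetch", "--depth", "1", "origin", sha],
        # The fetched commit is available as FETCH_HEAD (detached checkout).
        ["git", "-C", dest, "checkout", "FETCH_HEAD"],
    ]


def ls_remote_command(url, branch):
    """Return the git invocation that lists the remote tip of one branch."""
    return ["git", "ls-remote", url, "refs/heads/" + branch]


def parse_ls_remote(output, branch):
    """Extract the commit SHA for a branch from `git ls-remote` output.

    Each output line has the form "<sha>\t<refname>".
    Returns None if the branch is not listed.
    """
    wanted = "refs/heads/" + branch
    for line in output.splitlines():
        sha, _, ref = line.partition("\t")
        if ref.strip() == wanted:
            return sha
    return None
```

The command lists can be passed to `subprocess.run`; tracking a branch then needs no local history at all, only the output of `git ls-remote`.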
ssam2	if you have 10000 builds a day and each one fetches from upstream, you're doing something wrong	16:20
ssam2	at least, if each one does a full clone from upstream's servers	16:21
persia	ssam2: I've no issue with a persistent local mirror: my issue is that it is lots less expensive to schedule access to build resources on an as-needed basis than to keep a farm of build machines with primed caches about just in case I rebuild. Especially if I want to do thousands of builds a day, where the chance I can reuse the same build machine for two builds of the same thing is very low.	16:22
ssam2	sure, if you implement it badly then it will suck :-)	16:23
persia	But that means that when BuildStream processes an element, there's a good chance that it needs to do a local clone from the local mirror, and having that be a shallow clone is likely to be much faster.	16:23
ssam2	my point is just that we should never encourage DDOSing upstream servers	16:23
persia	How would you implement it well? Assume that I have about 15000 sources, and want to modify an arbitrary 3000 of them daily, with the smallest amount of hardware resources to keep up with the builds.	16:23
ssam2	I'd suggest a separate cache running on the same local network	16:25
persia	Oh, yeah, pulling from anything frequently without explicit permissions is poor behaviour. I was assuming a case where the arbitrary 3000 changes were desired to be updated.	16:25
ssam2	the elastic GitLab runners setup allows for this, and we do it	16:25
ssam2	although it seems to still take a while copying the cache around, so it's not ideal	16:25
persia	ssam2: Having a local cache works. Doesn't help with the clones though.	16:25
persia	Precisely. The point of shallow clones would only be to get from local mirror to the builder.	16:26
ssam2	so your point is that the extra complexity of supporting partial clones is worthwhile to reduce load in massive setups ?	16:26
ssam2	that may be true I suppose	16:26
persia	No, my question is how much extra complexity is introduced by adding support for partial clones. That it helps massive setups is the rationale for the query.	16:27
*** tristanmaat has joined #buildstream		16:27
*** tristanmaat is now known as tlater`		16:27
*** tlater` has left #buildstream		16:27
ssam2	ah. All I know is that it makes various things break, so it's dangerous	16:27
ssam2	or complex	16:28
persia	An example "massive" setup would be jjardon[m]'s situation, although the numbers are closer to 1500 and 300.	16:28
persia	And in that environment, having cloned the entire git repo offers no persistent benefit.	16:28
persia	I would expect the same to be true for any moderately sized project trying to build an entire system and using CI, even with far fewer than 10 developers.	16:29
tristan	From what I recall at the time, it is possible to achieve track (get latest commit sha for branch) _only_ if you make some assumption that a remote git repo is configured with some very special sauce	16:29
tristan	So it was just not viable.	16:29
ssam2	persia, in a local mirror server it certainly does have a benefit	16:29
ssam2	persia, as different builds may be building different tags of the same repo	16:29
persia	tristan: But I don't need track in the disposable-build-bot scenario (or the hard-to-schedule-because-numbers-build-bot scenario)	16:29
ssam2	persia, so partial clones might actually require 6 x 2GB partial clone of 6 different tags, vs. one 4GB clone that can provide every tag	16:30
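ssam2's storage argument for the mirror can be written out as simple arithmetic. The sketch below uses the illustrative 2GB and 4GB figures from the message above as defaults; the function names are hypothetical and the sizes are not measurements.

```python
def mirror_footprint_gb(distinct_refs, shallow_gb=2.0, full_gb=4.0):
    """Storage needed on a mirror for N distinct refs of one repository.

    Default sizes are the illustrative figures quoted in the channel.
    """
    shallow_total = distinct_refs * shallow_gb  # one shallow clone per ref
    full_total = full_gb                        # one full clone serves every ref
    return shallow_total, full_total


def break_even_refs(shallow_gb=2.0, full_gb=4.0):
    """Number of distinct refs at which shallow clones stop being smaller."""
    return full_gb / shallow_gb


print(mirror_footprint_gb(6))   # (12.0, 4.0)
print(break_even_refs())        # 2.0
```

With these figures a mirror is better off with the full clone as soon as more than two distinct refs are needed, which is the case ssam2 is describing; the picture changes if shallow clones are much smaller than assumed here.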
persia	ssam2: Yes. I believe a local mirror server is essential. I also think this is entirely independent of the discussion of what buildstream should do.	16:30
ssam2	ok. but my point stands that i've never seen proof of partial clones being worth the complexity	16:30
persia	And I don't know why I need to pull 6 tags for a build. Doesn't a build just act on a specific SHA1? After that, I don't expect to build the same source on the same builder.	16:30
ssam2	you are only theorising :-)	16:30
ssam2	ok, 6 different SHA1s	16:31
tristan	persia, BuildStream only requires 2 things: Ability to download the code for a specific ref, and ability to go find the latest ref for a given symbolic tracking branch	16:31
ssam2	i didn't say 6 for 1 build, i meant 6 different builds	16:31
tristan	persia, if it can be done with git, that would be great.	16:31
persia	I'm trying to understand how to resolve the issue jjardon[m] raised above. I feel like all the responses are "don't do that for reasons that have nothing to do with the stated use case".	16:31
persia	tristan: Is it ever possible to not perform the second of those actions? perhaps in a build-only situation, or special configuration?	16:32
persia	(or even just "don't run a subset of BuildStream commands")	16:32
persia	git is certainly capable of downloading only the things required to populate a tree for a specific REF, but it requires calling git with special arguments.	16:33
tristan	persia, that can paint you into weird corners, where you dont need something; until you do; and then you end up patching it up with never ending bandaids for weird corner cases	16:33
ssam2	a solution to jjardon[m]'s problem was proposed which I am skeptical of and i've explained why	16:34
tristan	persia, if all of that patchwork could live inside git.py source plugin, that's still a bit saner than more optionality	16:34
ssam2	i also proposed a different solution	16:34
persia	tristan: Yes. The question is if my expectation is that I will instantiate a builder; run buildstream to build an element; decommission the builder, am I ever likely to encounter such a case?	16:34
persia	ssam2: which? I didn't see it.	16:34
tristan	(when I say "that" above, I'm only referring to "BuildStream, please cripple the part of functionality I dont need, so that my specific use case goes a bit faster")	16:35
ssam2	persia, he can get tarballs of arbitrary commits of certain projects by using special features of GitHub	16:36
tristan	persia, likely is not really important, you open up the door to the innocent bystanders which use these hacks in other ways which you did not expect	16:36
persia	I don't even need a crippled buildstream. I'm just wondering, if nobody ever runs `bst track` during the lifecycle of a build instance, will buildstream care if none of the clones are real?	16:36
tristan	are github tarballs reliable, though ? I thought they were generated-on-the-fly and dont have stable checksums	16:37
persia	ssam2: Oh, yes. Also, for many projects, all the interesting commits have tarballs. I suppose I dismissed it as depending on a foreign closed-source API.	16:37
persia	tristan: unreliable.	16:37
ssam2	oh, that sucks	16:38
persia	And, I thought the point of the initial use case presentation was to stop using tarballs, although I can see that the underlying problem might be perceived as "I want a build artifact for every commit"	16:38
persia	ssam2: To be fair, one can cache them, and treat the cached ones as precious, and so it still works as a workaround.	16:38
tristan	There are tricks one can potentially play within git.py; like do a shallow clone by default until first track occurs, and then fallback to real clone	16:40
tristan	i.e.	16:40
tristan	<tristan> persia, if all of that patchwork could live inside git.py source plugin, that's still a bit saner than more optionality	16:41
tristan	but having a "mode" is going to explode in our face	16:41
tristan	also, it's questionable whether it's overall more performant	16:41
persia	It is always more performant if one never uses the git repo again, and almost always less performant if one wants to perform a second operation on the git repo.	16:41
persia	Sadly, that's something that is almost impossible to know within the plugin without a runtime switch.	16:42
persia	starting shallow and cloning to do something else is definitely less performant than just cloning.	16:42
tristan	Runtime switch = all source plugin bugs * 2	16:42
persia	yeah, that's the problem.	16:42
persia	I suppose one could configure a mirror with shallow repositories: e.g. history only back to last release or similar.	16:43
jjardon[m]	for a "gnome-continuous" kind of builds (always build latest master) shallow clones for everything can make sense. For elements that want a specific commit this will not work. I guess it is a git "limitation" that you can not clone a specific commit without the history	16:43
persia	And then track against that. Requires an external system, but solves the problem (in a similar manner to autogenerated tarballs)	16:43
persia	I remember someone demonstrating a hack to cause a truncated history to have all the same recent SHAs as a full history.	16:44
persia	jjardon[m]: You can fetch a specific commit without the history: the problem is that when you do so, you don't have the history, so any commands that inspect history (like in bst track) fail.	16:45
jjardon[m]	well, bst track doesnt fail when you use tarballs, so we can do something special for git shallow clones as well ?	16:46
persia	jjardon[m]: Have the git source plugin detect a shallow clone, and just disallow track operations? Possibly. Trick is how to tell the plugin when to use shallow vs. non-shallow.	16:48
jjardon[m]	what about a git-shallow source?	16:49
persia	Because that isn't something safe to define for a project or an element, as the only time it matters is for automated builders on elastic substrates (or large builder farms, where chance of git cache hit is low).	16:49
persia	That makes all the cache keys different from those used by the git source, and makes all the projects unsharable between folk who want to develop and folk who want to CI.	16:49
jjardon[m]	no, it matters even if I have a permanent server	16:49
persia	Having unsharable cache is one thing (although unnecessary if the refs are consistent), but having a project that cannot be used locally is fairly painful.	16:50
jjardon[m]	it will be faster the first time I run the build	16:50
persia	It will be slower the second time you run the build.	16:50
persia	Now, if you have three permanent servers, then it might always be faster. or if your permanent server does not have enough disk space for all the git repos for all the sources (but that requires lots of sources).	16:51
jjardon[m]	persia: mmm, why?; it will be exactly the same time or none because I already cache the repo somewhere	16:51
persia	jjardon[m]: If you don't change ref, then you don't need to rebuild (results are cached). If you change ref, the shallow clone becomes useless, and you need a new shallow clone.	16:52
jjardon[m]	yeah, but the shallow clone will take the same time as the first one, no more	16:53
persia	Oh, yes, roughly. The point is that if you have a single permanent server, it is faster to do a real clone, as the time lost on first build will be made up on subsequent builds.	16:53
persia	If you don't have persistent storage of the git repo, and have to perform a new clone operation, that's different.	16:54
persia	note that the persistent storage has to be in the builder: a network-local git mirror makes everything faster and better, but doesn't solve the IO issue of loading the data into the builder.	16:55
jjardon[m]	I think you are mixing 2 different things here	16:56
jjardon[m]	one thing is storing the sources/clones in a cache to be reused. Another is how much time it takes to clone/retrieve those sources again. and I think the second case is where shallow clones could be used instead of tarballs	16:58
jjardon[m]	also, even if you cache your sources, shallow clones will make that cache much smaller	16:59
persia	I've been trying to maintain that distinction, and have felt the mixture also. My apologies if the differentiation isn't clear in my text. Yes, the problem as you have stated is the one I would like to see resolved.	16:59
persia	Unfortunately, I don't see how it can be resolved except with some runtime behaviour determination, as it isn't possible to know until invoking BuildStream whether one is going to be able to reuse the clone.	17:00
persia	And tristan previously asserted "Runtime switch = all source plugin bugs * 2" as justification for not supporting such a thing.	17:00
tristan	If ref changes but track never happens, getting new ref is probably more expensive that second time than if you had done a full clone the first time	17:00
tristan	But, probably around the same as if a tarball has changed	17:01
persia	Hence my returning to considering "fake" git mirrors that are shallow in nature, so a full clone functioned like a shallow clone (done outside BuildStream)	17:01
tristan	So one could still have some local state in the git sources' directory where the git.py plugin could try to do something smarter	17:01
persia	tristan: Yes, it is more expensive, assuming you are using the same machine. In an elastic compute environment, that assumption is not likely to be met.	17:01
tristan	shallow clone until first track is not that horrible I think	17:01
tristan	also remember; if you dont need to rebuild a given artifact, BuildStream should not be downloading the source at all	17:02
persia	Right. We're only talking about changes here.	17:02
jjardon[m]	tristan: that would still be a huge gain in a linux repo (clone is more than 2GB , while tarball is ~100MB ?)	17:02
persia	Key is that when one is working locally, one can safely assume that a full clone will be useful, as one might use the same source twice. This isn't as likely to be true in a CI environment.	17:03
tristan	jjardon[m], I dont know, you'd have to try a git shallow clone of linux on the command line to see how much it would save	17:03
persia	git shallow clone and tarball are roughly similar in size.	17:03
persia	(depends on protocols used by git, how the tarball is compressed, etc., but usually within 15% or so)	17:04
persia	tristan: The problem with shallow clone followed by full clone is that in the linux example, it only costs an extra 5-10% time, but for a new, young project it might nearly double the time.	17:04
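The overhead persia refers to can be estimated with a one-line model. This sketch assumes the worst case, in which the later full clone downloads everything again and nothing from the shallow clone is reused. The linux sizes are the rough figures quoted in the channel (about 100MB shallow, about 2GB full); the "young project" sizes are invented purely for illustration, and the function name is hypothetical.

```python
def shallow_then_full_overhead(shallow_mb, full_mb):
    """Extra download, as a fraction of a plain full clone, when a shallow
    clone is fetched first and a full clone turns out to be needed later.

    Assumes nothing from the shallow clone is reused (worst case).
    """
    return shallow_mb / full_mb


linux = shallow_then_full_overhead(100, 2048)  # figures quoted in the channel
young = shallow_then_full_overhead(9, 10)      # assumed: almost no history to skip

print(f"linux: {linux:.1%}, young project: {young:.1%}")
# linux: 4.9%, young project: 90.0%
```

Under these assumptions the wasted shallow fetch is nearly free for a repository with a long history and nearly doubles the download for one with a short history, which is roughly the range given in the message above.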
tristan	My guess is that when dealing with a project that has stored the refs; one usually doesnt have to download sources at all unless one needs to modify the ref themselves	17:05
persia	Right. Assume automation that tracks N project git repos, and then dispatches buildstream builds with the result of modifying the ref for each change.	17:06
persia	It shouldn't be all the changes, because we don't want someone breaking something in one repo to cause that breakage to happen for another developer on another team in another repo.	17:06
persia	In the case of GNOME, this is about 100 ref changes daily. For larger or more active projects, this would be much larger.	17:07
persia	(where "100" is a rough estimate count of the build notifications in #testable on a given day)	17:08
persia	This translates to 100 (or whatever) source downloads daily, which one might schedule over a cluster of builders.	17:09
tristan	not sure my last got through; what I mean is; if you do have powerful enough CI; and I download your project at any time and run `bst build`, it will be fairly rare that I ever have to build anything or download sources, except for the ones I'm interested in working on	17:09
persia	If the builders are only instantiated on-demand, they have no prior cache, etc.	17:10
persia	tristan: Key is that in the use case under discussion, there is no human on the machine: this is only CI. The CI is never interested in working on any sources, just building them.	17:10
tristan	Oh look; here's another weird side effect though: If I happen to be so unfortunate to have a shallow clone in my cache, I will be shaking my fist when I open a workspace and cannot browse history	17:11
persia	And in that, admittedly narrow, case it would be interesting to be able to shortcut the full git clone.	17:11
persia	CI never opens a workspace.	17:11
persia	There are lots of weird corner cases. Calling it "--ci-mode" and documenting it with "this makes everything break except CI builds" would be fine by me :)	17:12
tristan	persia, I know that the use case you are discussing is CI and automation; I am trying to see if there is a good argument for a default behavior that suits both because of the probable nature of what happens when you work locally	17:12
tristan	oh no not mode	17:12
* tristan palmface		17:12
persia	I don't think so. I would argue passionately against the use of shallow clones by humans. I think that is a bad idea that is likely to cause confusion and pain.	17:12
persia	At best, it just makes it slow for the human if it doesn't absolutely work perfectly the first time, the user doesn't want to track, use a workspace, actually use git, or anything.	17:13
persia	And in that case, I think the result should have been precached by CI :)	17:13
tristan	Look, if it's download shallow by default, but resort to full clone on `bst track` or `bst workspace open`: The likelihood of things being suboptimal for users working locally is much reduced by a populated artifact cache.	17:13
persia	How does a populated artifact cache help?	17:14
tristan	That is the case I am making, I just don't know if you are trying to understand the case I'm trying to experimentally make.	17:14
tristan	persia, because you never download sources if you don't need to build the artifact	17:14
tristan	you don't need the source at all, for an artifact in the cache.	17:14
persia	Still, as a human, I always want the full clone if I get the source at all.	17:14
persia	Otherwise I'm just adding 10-100% download time to getting the source.	17:15
tristan	As a human, you would be opinionated about what you want, humans generally are :)	17:15
persia	Find me a human who actually likes working with shallow clones, and I'll show you someone who is prepared to volunteer to shill anything.	17:15
tristan	It's true though that one should not rely on having a good automated builder chugging builds	17:16
persia	And I think the current behaviour for the non-automated case is correct.	17:16
tristan	persia, a user builds a project with 400 repos, but only ever wants to develop on two or three of them	17:16
persia	The problem is that the automation is slow for projects with long history.	17:16
persia	tristan: Ah, without automation. You found the magic case. yes.	17:16
tristan	hopefully most of the time, they never need any clones of those other 397 repos	17:16
tristan	But let's even imagine there is no remote artifact cache with build results to provide the artifacts to download	17:17
persia	Right, but if there is no autobuilder, the user only wants the shallow clones to build them once and forget them.	17:17
tristan	Then, do you want shallow clones of 400 modules, until you open a workspace or `bst track` the module you want to work on ?	17:17
tristan	Or, do you want always 400 full clones ?	17:18
tristan	I don't know	17:18
persia	Is this tradeoff worth potentially taking twice as long to clone the three interesting repos?	17:18
tristan	I think on average it is worth it	17:18
persia	Thinking about it, the actual cost is that the user experiences clone delay on workspace open or track.	17:18
tristan	on an average of 3 repos I want to work on out of 400 it certainly would be	17:18
persia	At least the first time, which can be explained.	17:18
persia	Is there an opportunity to provide feedback to the user "preparing workspace...cloning full repository ...", etc.?	17:19
tristan	I'm just trying to look at what it might look like if we shallow-clone-until-need-full-clone, before going down the very ugly road of optionality that we hope should not have to happen.	17:19
persia	If so, I think I like shallow-first-then-real.	17:19
persia	Should probably also do a full clone in the case where a shallow clone not containing the desired ref is found, as this would be evidence that the user is working on the source, even if we didn't detect it properly.	17:20
persia	(or that the builder might benefit from having a proper local git repo as a cache, etc.)	17:21
tristan	There is an opportunity, there are Source-specific methods to deal with opening workspaces where the git plugin could do something custom	17:21
tristan	(we added that mostly so that it could `git remote set-url origin upstream-url`)	17:21
persia	Cool.	17:22
persia	jjardon[m]: Would that meet your use case?	17:22
jjardon[m]	for us 98% of the time we don't even have to know if the thing is built with bst or anything else; I do not normally need to do builds on my machine because I get everything from the cache	17:25
*** ssam2 has quit IRC		17:42
gitlab-br-bot	buildstream: merge request (jmac/configurable-logging->master: WIP: Configurable log line formatting) #282 changed state ("opened"): https://gitlab.com/BuildStream/buildstream/merge_requests/282	17:53
gitlab-br-bot	buildstream: issue #261 ("Investigate the use of git shallow clones (to build instead tarballs)") changed state ("opened") https://gitlab.com/BuildStream/buildstream/issues/261	18:08
jmac	tristan: Thanks for looking over the MR, I'll try and bash it into shape tomorrow.	18:17
*** dominic has quit IRC		18:20
tristan	jmac, yeah it probably needs thought	18:21
tristan	it's only "generally" straightforward, but as you point out, not all log lines are created equally	18:21
tristan	but there is much commonality	18:21
*** Prince781 has quit IRC		18:52
*** valentind has joined #buildstream		18:52
*** Prince781 has joined #buildstream		19:00
*** Prince781 has quit IRC		19:03
*** xjuan has joined #buildstream		19:31
*** ernestask has quit IRC		20:13
*** tm has quit IRC		20:26
*** jonathanmaw has quit IRC		20:59
*** aday has quit IRC		21:00
*** Prince781 has joined #buildstream		21:55
*** valentind has quit IRC		22:23
*** tristan has quit IRC		22:49
*** Prince781 has quit IRC		23:45
*** Prince781 has joined #buildstream		23:47
*** Prince781 has quit IRC		23:56