IRC logs for #buildstream for Monday, 2018-02-19

*** Prince781 has quit IRC05:12
*** tristan has joined #buildstream06:28
*** tm has joined #buildstream08:08
*** mwa has joined #buildstream08:16
*** mwa has left #buildstream08:17
*** toscalix has joined #buildstream08:38
*** adds68 has joined #buildstream09:08
*** jonathanmaw has joined #buildstream09:38
*** valentind has joined #buildstream09:44
*** tiago has joined #buildstream09:49
*** tiago has quit IRC09:52
*** tiago has joined #buildstream09:54
*** noisecell has joined #buildstream09:54
*** aday has joined #buildstream09:59
*** ssam2 has joined #buildstream10:12
*** valentind has quit IRC10:18
*** ironfoot changes topic to "BuildStream 1.0.1 is out ! | https://gitlab.com/BuildStream/buildstream | Docs: https://buildstream.gitlab.io/buildstream | IRC logs: https://irclogs.baserock.org/buildstream | Roadmap: https://wiki.gnome.org/Projects/BuildStream/Roadmap"10:19
*** ironfoot changes topic to "BuildStream 1.1.0 is out ! | https://gitlab.com/BuildStream/buildstream | Docs: https://buildstream.gitlab.io/buildstream | IRC logs: https://irclogs.baserock.org/buildstream | Roadmap: https://wiki.gnome.org/Projects/BuildStream/Roadmap"10:19
*** dominic has joined #buildstream10:23
juergbilooks like we can't virtualize -march/mtune=native on x86 as that directly uses cpuid, not /proc/cpuinfo10:50
juergbihave to be careful never to use it, so as not to break reproducibility. i think there are some upstream projects that enable this by default, but not many10:51
juergbibst toolchains might want to consider disabling support for this completely in gcc10:51
tristanjuergbi, "bst toolchains" sounds... in direct conflict with overall goals11:09
tristanalso, it seems cpuid is not enabled by default on my host at least11:10
juergbitristan: i mean bst projects that include gcc11:10
juergbitristan: hm, i don't think that part of cpuid can be disabled on any x86-64 CPU11:11
tristanit means the payload wants/needs to be aware of build environment11:11
juergbi(hypervisors can virtualize it)11:11
juergbiin the case of -march=native this breaks repeatability11:12
juergbiit's not a huge concern as gcc doesn't enable it by default but due to existence of build systems that enable it by default, it can be an issue11:13
tristanYes, that indeed would seem to be the case11:13
* tristan just looked up what it does11:13
juergbiso i would recommend completely patching that out for any project with gcc.bst, but there isn't really anything we can do about it, unfortunately11:14
tristanthis sounds like a current shortcoming of our sandboxing environment which should probably be documented11:14
juergbiindeed, we should document it as we can't fix it11:14
tristanWell, not being able to fix it today is not necessarily not able to fix it ever, hopefully.11:15
juergbii was initially hoping gcc would just use /proc/cpuinfo so we could mount bind a generic/fake but no such luck11:15
juergbiyes, theoretically, linux may be able to support virtualizing this in the future even for regular userspace applications using its hypervisor powers11:16
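A quick way to see what juergbi describes, i.e. that `-march=native` resolves to host-specific flags via cpuid. This is only a sketch: it assumes a real gcc on PATH and skips quietly otherwise, and the exact value printed depends on the build machine.

```shell
# Sketch only: shows how -march=native is resolved per host (via cpuid),
# which is what breaks reproducibility; does nothing if gcc is absent.
if command -v gcc >/dev/null 2>&1; then
    # the CPU gcc detects on *this* host differs from machine to machine:
    gcc -Q --help=target -march=native 2>/dev/null | grep -e '-march=' | head -n 1
    # pinning an explicit baseline keeps the result host-independent:
    gcc -Q --help=target -march=x86-64 2>/dev/null | grep -e '-march=' | head -n 1
fi
```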
tristanssam2, could you take a quick glance and probably merge https://gitlab.com/BuildStream/buildstream-docker-images/merge_requests/22 ?11:19
tristanssam2, I think since you know something about Dockerfiles and such, you will be able to merge it very quickly11:19
ssam2sure11:20
ssam2one thing useful to know about Docker is that the `docker build` system is amazingly primitive and dumb11:20
ssam2as that merge request demonstrates!11:20
ssam2i would love to see BuildStream building containers instead. although it's tricky as a lot of container builds are largely `apt-get install x, y, z`11:21
* tristan doesnt really understand even what that means; I can interpret "primitive and dumb" as both a positive or a negative thing :)11:21
ssam2well, it has positive and negative aspects certainly11:22
ssam2in this case, i mean the fact that running all the commands we give it as a single shell script, instead of as 4 separate commands, saves like 100MB off each of the images11:22
tristanyes that does seem strange; smells more like the other way around (as if Docker packed its own revisioning system under the hood, and has preserved the differences caused by each command)11:23
ssam2it does that, indeed11:24
tristansounds more like overly smart heh11:24
tristananyway I'm in no position to judge, I dont know all the use cases and dont claim to really know Docker :D11:24
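For context on the layering behaviour ssam2 and tristan are discussing: `docker build` snapshots the filesystem after every instruction, so files created in one `RUN` and deleted in a later `RUN` still ship inside the earlier layer. A sketch with an illustrative base image and package names:

```dockerfile
FROM debian:stable

# Separate RUN instructions = separate layers; the apt metadata deleted
# in a later step still occupies space inside the earlier layers:
#   RUN apt-get update
#   RUN apt-get install -y gcc make
#   RUN rm -rf /var/lib/apt/lists/*
#
# One RUN instruction = one layer; the intermediate files never land in
# any shipped layer:
RUN apt-get update \
    && apt-get install -y gcc make \
    && rm -rf /var/lib/apt/lists/*
```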
*** cs_shadow has joined #buildstream11:28
*** ernestask has joined #buildstream11:39
tristanjuergbi, any more thoughts on https://gitlab.com/BuildStream/buildstream/issues/260 ? i.e. the issue that strengthening cache keys with execution OS/arch has broken the assumptions of freedesktop-sdk ?11:52
tristanthis seems to be the current fire11:52
tristanssam2, also, do you think this also breaks baserock's bootstrapping ? or does that not do cross-arch ?11:53
ssam2Baserock bootstrapping will work fine11:53
tristanI'm curious as to why this happened in the first place :-S11:53
ssam2we push the results of the bootstrap to an OSTree repo once the bootstrap is complete11:54
ssam2then pull it with an 'import' element in the native build11:54
tristanRight, so at least the model we presented as the only example to follow, did not make this assumption11:54
ssam2a bit of extra manual work, but it means that we don't rely on any bugs in buildstream :-)11:54
juergbitristan: if we can agree on the preliminary bits relatively soon, that makes sense to me11:57
juergbiif we see that it takes longer, freedesktop-sdk may have to switch to explicit export/import11:57
juergbi(or keep using the not officially supported branch with the revert)11:58
tlaterHrm, so, if we made the fallback platform configurable how far would that go? I think some form of an abstract class that has methods for the various mount/chroot commands we need would work, but it feels like we'd leave too much up to the user.12:35
tlaterPerhaps I should just send a write-up of the fallback platform issue to the ML and get discussion going there...12:36
*** mcatanzaro has joined #buildstream12:39
juergbitlater: if we can avoid expose more API than we want to, maybe a plugin approach could make sense after all12:39
juergbi*exposing12:39
tlaterEssentially making the current platform system another plugin system?12:41
tlaterI'm a bit concerned of how much API that would end up being12:42
tlaterBut it might be the only way without trying to support *everything*12:42
juergbiyes. i'd rather not have it but it sounds better than tons of config options12:42
juergbiinstead of an actual python plugin system, it could also be that we define a command-line interface for a single sandbox command12:42
juergbithe user could then specify the path to the corresponding system-specific implementation, but that would just be one script/tool that would have to implement the 'bst sandbox CLI API'12:43
tristanjuergbi, looking back after having had to reply to an email; note that actually even if we did the preliminary bits of that API, it would still be recommendable to change freedesktop-sdk to explicit export/import12:44
tristanbecause it would *still* be relying on an external artifact cache server to have magically produced an artifact in some way12:44
tlaterThat feels *very* ugly, but certainly would be the most flexible. I think it would be the only way to sort of support non-root buildstream in all environments.12:44
tristanwhich means builds are not really reproducible12:44
juergbitristan: well, they would still be reproducible, just requiring two bst sessions run one after the other on two different systems12:46
juergbibut i agree, it's definitely not ideal12:46
juergbihowever, i don't consider it very fragile12:46
juergbitlater: the issue is that there are systems where non-root can't create sandboxes at all (ignoring spinning up an emulator/VM), in which case you anyway need to write a setuid tool or reuse such a tool written by someone else12:49
juergbiand i don't think we want to maintain such tools as part of buildstream12:49
tlaterYeah, for those cases I think allowing the user to call such (possibly self-maintained) tools if they require them is probably the best solution12:50
juergbiwe can also say we don't care about such systems, of course, i.e., require root12:50
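To make juergbi's idea concrete, here is a purely hypothetical sketch of what a single 'bst sandbox CLI' contract might look like. None of these names or flags exist; they are invented for illustration only:

```
# hypothetical: a user-provided executable implementing one contract;
# buildstream would invoke it for every build, e.g.:
#
#   my-sandbox-tool --root <staged sysroot> \
#                   --mount <host dir>:<sandbox dir> \
#                   --workdir <dir inside sandbox> \
#                   -- <command to run>
#
# the tool (possibly setuid on platforms where non-root users cannot
# create sandboxes) does the mount/chroot work and propagates the
# command's exit status back to buildstream
```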
gitlab-br-botbuildstream: merge request (sam/doc-sandbox->master: doc: Add 'sandboxing' section) #279 changed state ("opened"): https://gitlab.com/BuildStream/buildstream/merge_requests/27912:58
laurenceurhegyihey jennis - have you noticed this ? https://gitlab.com/BuildStream/buildstream/issues/23913:04
*** laurenceurhegyi is now known as ltu13:04
ltunot sure if it can tie in with the stuff you've done recently13:04
*** tristan has quit IRC13:33
*** tristan has joined #buildstream13:37
*** mcatanzaro has quit IRC13:51
*** slaf has quit IRC13:52
*** slaf has joined #buildstream13:54
*** slaf has joined #buildstream13:55
*** slaf has joined #buildstream13:55
*** slaf has quit IRC14:00
*** mcatanzaro has joined #buildstream14:00
jennisltu, yes I've been working on tlater's branch which addresses #23914:16
*** slaf has joined #buildstream14:46
*** slaf has quit IRC14:49
*** slaf has joined #buildstream14:51
*** slaf has quit IRC14:56
*** slaf has joined #buildstream14:58
jonathanmawjuergbi: do you have anything else to add for https://gitlab.com/BuildStream/buildstream/merge_requests/259 ?15:08
jonathanmawor have I resolved all the issues you had?15:09
juergbitaking a look15:09
*** slaf has quit IRC15:09
*** slaf has joined #buildstream15:13
*** slaf has joined #buildstream15:13
gitlab-br-botbuildstream: merge request (212-git-source-needs-a-way-to-disable-checking-out-submodules->master: Resolve "Git source needs a way to disable checking out submodules") #259 changed state ("opened"): https://gitlab.com/BuildStream/buildstream/merge_requests/25915:17
*** slaf has quit IRC15:19
jonathanmaw\o/15:23
*** slaf has joined #buildstream15:24
jjardon[m]Hi, for git sources, does buildstream do something intelligent to not download the entire git repo? We are thinking of moving from tarballs to git, but for cases like the linux repo this will increase download times considerably15:28
ssam2no, it downloads the whole repo15:28
ssam2i've never really seen an example of partial cloning that seemed to actually work15:29
ssam2i mean, it works, but you save like 400MB of a 4GB repo in return for your efforts15:29
ssam2*off a 4GB repo15:29
ssam2i'd love to be proved wrong though!15:29
ssam2but yeah, i think if you want speedy downloads, stick to tarballs15:30
ssam2for linux in particular you can use the GitHub API to get tarballs for any commit, not just releases: https://stackoverflow.com/questions/13636559/how-to-download-zip-from-github-for-a-particular-commit-sha#1363695415:31
jjardon[m]ok, thanks ssam2 !15:34
skullmanpartial cloning technically works provided you don't need to use git tag for anything, but if you want to reuse the repository you cloned for something else then things get awkward. At some point it could be cheaper to unshallow a range rather than have a bunch of individual fetches but over time it'll just get slower to operate since it'll need to look through all the unshallow markers15:34
jjardon[m]skullman: I will not do anything with the repo, only build the contents15:36
skullmanaye, but ideally you'd cache that fetch rather than having to get it again on a subsequent build15:37
*** noisecell has quit IRC15:37
skullmanoh, and if you need to "track" (I think the terminology is) then you can't use a shallow clone15:37
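The tradeoff skullman describes can be seen end-to-end with a throwaway local repo. A self-contained sketch (temp dirs only; `file://` is used so that `--depth` is honored):

```shell
# Build a disposable "upstream" repo with two commits:
set -e
upstream=$(mktemp -d)
git -C "$upstream" init -q
git -C "$upstream" -c user.name=t -c user.email=t@t commit -q --allow-empty -m one
git -C "$upstream" -c user.name=t -c user.email=t@t commit -q --allow-empty -m two

# A depth-1 clone fetches only the tip commit...
clone=$(mktemp -d)/shallow
git clone -q --depth 1 "file://$upstream" "$clone"
git -C "$clone" rev-list --count HEAD        # prints 1, not 2

# ...while tracking a branch head needs no clone at all:
git ls-remote "file://$upstream" HEAD
```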
gitlab-br-botbuildstream: issue #212 ("Git source needs a way to disable checking out submodules") changed state ("closed") https://gitlab.com/BuildStream/buildstream/issues/21215:42
gitlab-br-botbuildstream: merge request (212-git-source-needs-a-way-to-disable-checking-out-submodules->master: Resolve "Git source needs a way to disable checking out submodules") #259 changed state ("merged"): https://gitlab.com/BuildStream/buildstream/merge_requests/25915:42
jjardon[m]skullman: neither: our runners are elastic so they get destroyed after being used15:43
skullmanprobably why cloning is such an issue I guess, so finding a way to cache would help too15:44
*** noisecell has joined #buildstream15:49
*** toscalix has quit IRC16:07
*** Prince781 has joined #buildstream16:11
persiaFor the running-in-disposable-instance case, it might be nice for buildstream to have a shallow clone feature, with error messages if the user tries to do anything interesting.  What breaks with such a feature?16:14
gitlab-br-botbuildstream: issue #200 ("bst workspace open creates the directory and then fails.") changed state ("closed") https://gitlab.com/BuildStream/buildstream/issues/20016:15
gitlab-br-botbuildstream: merge request (workspace-directory-fix->master: Create workspace directory after checking for potential issues) #281 changed state ("closed"): https://gitlab.com/BuildStream/buildstream/merge_requests/28116:15
skullmanpersia: from my memory of when I looked at it… a couple of years ago now, wow, nothing insurmountable16:16
*** mcatanzaro has quit IRC16:17
ssam2it seems pretty wrong headed to be running in disposable instances with no caching16:17
skullmanif you need to do `bst track` then you can abuse `git ls-remote` to get the current state of branches16:17
ssam2the more your project grows, the more you cost those whose source code you build16:17
skullmanif you need to build a specific version you need your git server to be configured to allow you to fetch commits by sha1 rather than branch16:18
ssam2partial clones will only ever be a partial solution; the correct approach is persistent caching / mirroring16:18
*** mcatanzaro has joined #buildstream16:19
skullmanssam2: if it's completely elastic instances though then AIUI you're just moving the fetch from upstream's server onto your own infrastructure16:19
skullmanor your cloud provider's16:19
ssam2the point is to reduce how often you fetch16:20
skullmanpersia: so shallow would be possible if you can guarantee that the git server permits fetching by sha1, which AIUI most servers turn off to make it easier to expunge accidental history leaks16:20
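skullman's point about server configuration can be demonstrated locally. A sketch, assuming a reasonably recent git; `uploadpack.allowReachableSHA1InWant` is the real server-side switch, and it is off by default on most hosts:

```shell
# Upstream repo with two commits; we want the *older* one by sha1:
set -e
upstream=$(mktemp -d)
git -C "$upstream" init -q
git -C "$upstream" -c user.name=t -c user.email=t@t commit -q --allow-empty -m one
old=$(git -C "$upstream" rev-parse HEAD)
git -C "$upstream" -c user.name=t -c user.email=t@t commit -q --allow-empty -m two

# Without this server-side setting, the fetch below is rejected
# ("not our ref"):
git -C "$upstream" config uploadpack.allowReachableSHA1InWant true

# Fetch exactly that commit, with no surrounding history:
work=$(mktemp -d)/work
git init -q "$work"
git -C "$work" fetch -q --depth 1 "file://$upstream" "$old"
git -C "$work" checkout -q FETCH_HEAD
```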
ssam2if you have 10000 builds a day and each one fetches from upstream, you're doing something wrong16:20
ssam2at least, if each one does a full clone from upstream's servers16:21
persiassam2: I've no issue with a persistent local mirror: my issue is that it is lots less expensive to schedule access to build resources on an as-needed basis than to keep a farm of build machines with primed caches about just in case I rebuild.  Especially if I want to do thousands of builds a day, where the chance I can reuse the same build machine for two builds of the same thing is very low.16:22
ssam2sure, if you implement it badly then it will suck :-)16:23
persiaBut that means that when BuildStream processes an element, there's a good chance that it needs to do a local clone from the local mirror, and having that be a shallow clone is likely to be much faster.16:23
ssam2my point is just that we should never encourage DDOSing upstream servers16:23
persiaHow would you implement it well?  Assume that I have about 15000 sources, and want to modify an arbitrary 3000 of them daily, with the smallest amount of hardware resources to keep up with the builds.16:23
ssam2I'd suggest a separate cache running on the same local network16:25
persiaOh, yeah, pulling from anything frequently without explicit permissions is poor behaviour.  I was assuming a case where the arbitrary 3000 changes were desired to be updated.16:25
ssam2the elastic GitLab runners setup allows for this, and we do it16:25
ssam2although it seems to still take a while copying the cache around, so it's not ideal16:25
persiassam2: Having a local cache works.  Doesn't help with the clones though.16:25
persiaPrecisely.  The point of shallow clones would only be to get from local mirror to the builder.16:26
ssam2so your point is that the extra complexity of supporting partial clones is worthwhile to reduce load in massive setups ?16:26
ssam2that may be true I suppose16:26
persiaNo, my question is how much extra complexity is introduced by adding support for partial clones.  That it helps massive setups is the rationale for the query.16:27
*** tristanmaat has joined #buildstream16:27
*** tristanmaat is now known as tlater`16:27
*** tlater` has left #buildstream16:27
ssam2ah. All I know is that it makes various things break, so it's dangerous16:27
ssam2or complex16:28
persiaAn example "massive" setup would be jjardon[m]'s situation, although the numbers are closer to 1500 and 300.16:28
persiaAnd in that environment, having cloned the entire git repo offers no persistent benefit.16:28
persiaI would expect the same to be true for any moderately sized project trying to build an entire system and using CI, even with many less than 10 developers.16:29
tristanFrom what I recall at the time, it is possible to achieve track (get latest commit sha for branch) _only_ if you make some assumption that a remote git repo is configured with some very special sauce16:29
tristanSo it was just not viable.16:29
ssam2persia, in a local mirror server it certainly does have a benefit16:29
ssam2persia, as different builds maybe building different tags of the same repo16:29
persiatristan: But I don't need track in the disposable-build-bot scenario (or the hard-to-schedule-because-numbers-build-bot scenario)16:29
ssam2persia, so partial clones might actually require 6 x 2GB partial clone of 6 different tags, vs. one 4GB clone that can provide every tag16:30
persiassam2: Yes.  I believe a local mirror server is essential.  I also think this is entirely independent of the discussion of what buildstream should do.16:30
ssam2ok. but my point stands that i've never seen proof of partial clones being worth the complexity16:30
persiaAnd I don't know why I need to pull 6 tags for a build.  Doesn't a build just act on a specific SHA1?  After that, I don't expect to build the same source on the same builder.16:30
ssam2you are only theorising :-)16:30
ssam2ok, 6 different SHA1s16:31
tristanpersia, BuildStream only requires 2 things: Ability to download the code for a specific ref, and ability to go find the latest ref for a given symbolic tracking branch16:31
ssam2i didn't say 6 for 1 build, i meant 6 different builds16:31
tristanpersia, if it can be done with git, that would be great.16:31
persiaI'm trying to understand how to resolve the issue jjardon[m] raised above.  I feel like all the responses are "don't do that for reasons that have nothing to do with the stated use case".16:31
persiatristan: Is it ever possible to not perform the second of those actions?  perhaps in a build-only situation, or special configuration?16:32
persia(or even just "don't run a subset of BuildStream commands")16:32
persiagit is certainly capable of downloading only the things required to populate a tree for a specific REF, but it requires calling git with special arguments.16:33
tristanpersia, that can paint you into weird corners, where you dont need something; until you do; and then you end up patching it up with never ending bandaids for weird corner cases16:33
ssam2a solution to jjardon[m]'s problem was proposed which i am skeptical of, and i've explained why16:34
tristanpersia, if all of that patchwork could live *inside* git.py source plugin, that's still a bit saner than more optionality16:34
ssam2i also proposed a different solution16:34
persiatristan: Yes.  The question is if my expectation is that I will instantiate a builder; run buildstream to build an element; decommission the builder, am I ever likely to encounter such a case?16:34
persiassam2: which?  I didn't see it.16:34
tristan(when I say "that" above, I'm only referring to "BuildStream, please cripple the part of functionality I dont need, so that my specific use case goes a bit faster")16:35
ssam2persia, he can get tarballs of arbitrary commits of certain projects by using special features of GitHub16:36
tristanpersia, likely is not really important, you open up the door to the innocent bystanders which use these hacks in other ways which you did not expect16:36
persiaI don't even need a crippled buildstream.  I'm just wondering: if nobody ever runs `bst track` during the lifecycle of a build instance, will buildstream care if none of the clones are real?16:36
tristanare github tarballs reliable, though ? I thought they were generated-on-the-fly and dont have stable checksums16:37
persiassam2: Oh, yes.  Also, for many projects, all the interesting commits have tarballs.  I suppose I dismissed it as depending on a foreign closed-source API.16:37
persiatristan: unreliable.16:37
ssam2oh, that sucks16:38
persiaAnd, I thought the point of the initial use case presentation was to stop using tarballs, although I can see that the underlying problem might be perceived as "I want a build artifact for every commit"16:38
persiassam2: To be fair, one can cache them, and treat that cached ones as precious, and so it still works as a workaround.16:38
tristanThere are tricks one can potentially play within git.py; like do a shallow clone by default until first track occurs, and then fallback to real clone16:40
tristani.e.16:40
tristan<tristan> persia, if all of that patchwork could live *inside* git.py source plugin, that's still a bit saner than more optionality16:41
tristanbut having a "mode" is going to explode in our face16:41
tristanalso, it's questionable whether it's overall more performant16:41
persiaIt is always more performant if one never uses the git repo again, and almost always less performant if one wants to perform a second operation on the git repo.16:41
persiaSadly, that's something that is almost impossible to know within the plugin without a runtime switch.16:42
persiastarting shallow and cloning to do something else is definitely less performant than just cloning.16:42
tristanRuntime switch = all source plugin bugs * 216:42
persiayeah, that's the problem.16:42
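tristan's "shallow clone by default until first track" idea maps onto a real git operation: a shallow clone can be upgraded in place with `git fetch --unshallow`. A local sketch of the two states:

```shell
set -e
upstream=$(mktemp -d)
git -C "$upstream" init -q
for m in one two three; do
    git -C "$upstream" -c user.name=t -c user.email=t@t commit -q --allow-empty -m "$m"
done

clone=$(mktemp -d)/clone
git clone -q --depth 1 "file://$upstream" "$clone"
git -C "$clone" rev-list --count HEAD    # 1: enough for a build-only session
# a first `bst track` / `bst workspace open` could upgrade in place:
git -C "$clone" fetch -q --unshallow
git -C "$clone" rev-list --count HEAD    # 3: full history now available
```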
persiaI suppose one could configure a mirror with shallow repositories: e.g. history only back to the last release or similar.16:43
jjardon[m]for a "gnome-continuous" kind of build (always build latest master), shallow clones for everything can make sense. For elements that want a specific commit this will not work. I guess it's a git "limitation" that you can not clone a specific commit without the history16:43
persiaAnd then track against that.  Requires an external system, but solves the problem (in a similar manner to autogenerated tarballs)16:43
persiaI remember someone demonstrating a hack to cause a truncated history to have all the same recent SHAs as a full history.16:44
persiajjardon[m]: You *can* fetch a specific commit without the history: the problem is that when you do so, you don't have the history, so any commands that inspect history (like in bst track) fail.16:45
jjardon[m]well, bst track doesnt fail when you use tarballs, so can we do something special for git shallow clones as well?16:46
persiajjardon[m]: Have the git source plugin detect a shallow clone, and just disallow track operations?  Possibly.  Trick is how to tell the plugin when to use shallow vs. non-shallow.16:48
jjardon[m]what about a git-shallow source?16:49
persiaBecause that isn't something safe to define for a project or an element, as the only time it matters is for automated builders on elastic substrates (or large builder farms, where chance of git cache hit is low).16:49
persiaThat makes all the cache keys different from those used by the git source, and makes all the projects unsharable between folk who want to develop and folk who want to CI.16:49
jjardon[m]no, it matters even if I have a permanent server16:49
persiaHaving an unsharable cache is one thing (although unnecessary if the refs are consistent), but having a project that cannot be used locally is fairly painful.16:50
jjardon[m]it will be faster the first time I run the build16:50
persiaIt will be slower the second time you run the build.16:50
persiaNow, if you have three permanent servers, then it might always be faster.  or if your permanent server does not have enough disk space for all the git repos for all the sources (but that requires lots of sources).16:51
jjardon[m]persia: mmm, why?; it will be exactly the same time or none because I already cache the repo somewhere16:51
persiajjardon[m]: If you don't change ref, then you don't need to rebuild (results are cached).  If you change ref, the shallow clone becomes useless, and you need a new shallow clone.16:52
jjardon[m]yeah, but the shallow clone will take the same time as the first one, no more16:53
persiaOh, yes, roughly.  The point is that if you have a single permanent server, it is faster to do a real clone, as the time lost on first build will be made up on subsequent builds.16:53
persiaIf you don't have persisent storage of the git repo, and have to perform a new clone operation, that's different.16:54
persianote that the persistent storage has to be in the builder: a network-local git mirror makes everything faster and better, but doesn't solve the IO issue of loading the data into the builder.16:55
jjardon[m]I think you are mixing 2 different things here16:56
jjardon[m]one thing is storing the sources/clones in a cache to be reused. Another is how much time it takes to clone/retrieve those sources again, and I think the second case is where shallow clones could be used instead of tarballs16:58
jjardon[m]also, even if you cache your sources, shallow clones will make that cache much smaller16:59
persiaI've been trying to maintain that distinction, and have felt the mixture also.  My apologies if the differentiation isn't clear in my text.  Yes, the problem as you have stated is the one I would like to see resolved.16:59
persiaUnfortunately, I don't see how it can be resolved except with some runtime behaviour determination, as it isn't possible to know until invoking BuildStream whether one is going to be able to reuse the clone.17:00
persiaAnd tristan previously asserted "Runtime switch = all source plugin bugs * 2" as justification for not supporting such a thing.17:00
tristanIf ref changes but track never happens, getting the new ref is *probably* more expensive the second time than if you had done a full clone the first time17:00
tristanBut, probably around the same as if a tarball has changed17:01
persiaHence my returning to considering "fake" git mirrors that were shallow in nature, so a full clone functioned like a shallow clone (done outside BuildStream)17:01
tristanSo one could still have some local state in the git sources' directory where the git.py plugin could try to do something smarter17:01
persiatristan: Yes, it is more expensive, assuming you are using the same machine.  In an elastic compute environment, that assumption is not likely to be met.17:01
tristanshallow clone until first track is not that horrible I think17:01
tristanalso remember; if you dont need to rebuild a given artifact, BuildStream should not be downloading the source at all17:02
persiaRight.  We're only talking about changes here.17:02
jjardon[m]tristan: that would still be a huge gain in the linux repo (clone is more than 2GB, while a tarball is ~100MB ?)17:02
persiaKey is that when one is working locally, one can safely assume that a full clone will be useful, as one might use the same source twice.  This isn't as likely to be true in a CI environment.17:03
tristanjjardon[m], I dont know, you'd have to try a git shallow clone of linux on the command line to see how much it would save17:03
persiagit shallow clone and tarball are roughly similar in size.17:03
persia(depends on protocols used by git, how the tarball is compressed, etc., but usually within 15% or so)17:04
persiatristan: The problem with shallow clone followed by full clone is that in the linux example, it only costs an extra 5-10% time, but for a new, young project it might nearly double the time.17:04
tristanMy guess is that when dealing with a project that has stored the refs; one usually doesnt have to download sources at all unless one needs to modify the ref themselves17:05
persiaRight.  Assume automation that tracks N project git repos, and then dispatches buildstream builds with the result of modifying the ref for each change.17:06
persiaIt shouldn't be all the changes, because we don't want someone breaking something in one repo to cause that breakage to happen for another developer on another team in another repo.17:06
persiaIn the case of GNOME, this is about 100 ref changes daily.  For larger or more active projects, this would be much larger.17:07
persia(where "100" is a rough estimate count of the build notifications in #testable on a given day)17:08
persiaThis translates to 100 (or whatever) source downloads daily, which one might schedule over a cluster of builders.17:09
tristannot sure my last got through; what I mean is; if you do have powerful enough CI; and I download your project at any time and run `bst build`, it will be fairly rare that I ever have to build anything or download sources, *except* for the ones I'm interested in working on17:09
persiaIf the builders are only instantiated on-demand, they have no prior cache, etc.17:10
persiatristan: Key is that in the use case under discussion, there is no human on the machine: this is only CI.  The CI is *never* interested in working on any sources, just building them.17:10
tristanOh look; here's another weird side effect though: If I happen to be so unfortunate to have a shallow clone in my cache, I will be shaking my fist when I open a workspace and cannot browse history17:11
persiaAnd in that, admittedly narrow, case it would be interesting to be able to shortcut the full git clone.17:11
persiaCI never opens a workspace.17:11
persiaThere are *lots* of weird corner cases.  Calling it "--ci-mode" and documenting it with "this makes everything break except CI builds" would be fine by me :)17:12
tristanpersia, I know that the use case you are discussing is CI and automation; I am trying to see if there is a good argument for a default behavior that suits both *because* of the probable nature of what happens when you work locally17:12
tristanoh no not mode17:12
* tristan palmface17:12
persiaI don't think so.  I would argue passionately against the use of shallow clones by humans.  I think that is a bad idea that is likely to cause confusion and pain.17:12
persiaAt best, it just makes it slow for the human if it doesn't absolutely work perfectly the first time, the user doesn't want to track, use a workspace, actually use git, or anything.17:13
persiaAnd in that case, I think the result should have been precached by CI :)17:13
tristanLook, if it's download shallow by default, but resort to full clone on `bst track` or `bst workspace open`: The likelihood of things being suboptimal for users working locally is much reduced by a populated artifact cache.17:13
persiaHow does a populated artifact cache help?17:14
tristanThat is the case I am making, I just dont know if you are trying to understand the case I'm trying to experimentally make.17:14
tristanpersia, because you never download sources if you don't need to build the artifact17:14
tristanyou don't need the source at all, for an artifact in the cache.17:14
persiaStill, as a human, I always want the full clone if I get the source at all.17:14
persiaOtherwise I'm just adding 10-100% download time to getting the source.17:15
tristanAs a human, you would be opinionated about what you want, humans generally are :)17:15
persiaFind me a human who actually likes working with shallow clones, and I'll show you someone who is prepared to volunteer to shill anything.17:15
tristanIt's true though that one should not rely on having a good automated builder chugging builds17:16
persiaAnd I think the current behaviour for the non-automated case is correct.17:16
tristanpersia, a user builds a project with 400 repos, but only ever wants to develop on two or three of them17:16
persiaThe problem is that the automation is slow for projects with long history.17:16
persiatristan: Ah, without automation.  You found the magic case.  yes.17:16
tristanhopefully most of the time, they never need *any* clones of those other 397 repos17:16
tristanBut let's go further and imagine there is no remote artifact cache with build results to provide the artifacts to download17:17
persiaRight, but if there is no autobuilder, the user only wants the shallow clones to build them once and forget them.17:17
tristanThen, do you want shallow clones of 400 modules, *until* you open a workspace or `bst track` the module you want to work on?17:17
tristanOr, do you want 400 full clones, always?17:18
tristanI don't know17:18
persiaIs this tradeoff worth potentially taking twice as long to clone the three interesting repos?17:18
tristanI think on average it is worth it17:18
persiaThinking about it, the actual cost is that the user experiences clone delay on workspace open or track.17:18
tristanon an average of 3 repos I want to work on out of 400 it certainly would be17:18
persiaAt least the first time, which can be explained.17:18
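tristan's back-of-envelope above can be made concrete. With purely illustrative numbers (all figures are assumptions; assume a full clone downloads roughly five times as much as a shallow one), shallow-by-default wins comfortably when only a handful of 400 repos are ever worked on:

```python
# All figures are made up for illustration; real ratios depend on
# each repository's history size.
repos = 400
worked_on = 3          # repos the user actually opens workspaces for
shallow_cost = 1.0     # arbitrary download units per shallow clone
full_cost = 5.0        # assume a full clone costs ~5x a shallow one

# Always clone full history up front.
always_full = repos * full_cost

# Shallow-clone everything, upgrade only the worked-on repos later.
shallow_then_full = repos * shallow_cost + worked_on * full_cost

print(always_full, shallow_then_full)  # 2000.0 415.0
```

Even if the full/shallow ratio were only 2x rather than 5x, the shallow-first strategy would still come out far ahead at 3-of-400 repos.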
persiaIs there an opportunity to provide feedback to the user "preparing workspace...cloning full repository ...", etc.?17:19
tristanI'm just trying to see what it might look like if we shallow-clone-until-we-need-a-full-clone, before going down the very ugly road of optionality, which we hope won't be necessary.17:19
persiaIf so, I think I like shallow-first-then-real.17:19
persiaShould probably also do a full clone when we find a shallow clone that doesn't contain the desired ref, as this would be evidence that the user is working on the source, even if we didn't detect it properly.17:20
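The fallback persia describes here can be sketched with plain git plumbing. This is an illustration only, not BuildStream code; the helper name is hypothetical, and it assumes git >= 2.15 for `--is-shallow-repository`:

```python
import subprocess

def ensure_full_history(repo_dir, want_ref):
    """Hypothetical helper: if a cached clone is shallow and is missing
    the ref we need, fall back to full history, as suggested for
    `bst track` / `bst workspace open`."""
    def git(*args):
        return subprocess.run(["git", "-C", repo_dir, *args],
                              capture_output=True, text=True)

    # A shallow clone prints "true" here (git >= 2.15).
    shallow = git("rev-parse", "--is-shallow-repository").stdout.strip() == "true"
    # cat-file -e exits 0 iff the object exists locally.
    have_ref = git("cat-file", "-e", want_ref).returncode == 0

    if shallow and not have_ref:
        # One extra fetch converts the shallow clone into a normal one.
        git("fetch", "--unshallow")
        have_ref = git("cat-file", "-e", want_ref).returncode == 0
    return have_ref
```

The same check could also cover persia's "evidence the user is working on the source" case: any miss on a shallow clone triggers the upgrade to full history.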
persia(or that the builder might benefit from having a proper local git repo as a cache, etc.)17:21
tristanThere is an opportunity, there is Source specific methods to deal with opening workspaces where the git plugin could do something custom17:21
tristan(we added that mostly so that it could `git remote set-url origin upstream-url`)17:21
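The Source-specific hook tristan mentions could look roughly like this. This is a sketch, not the real git plugin's implementation: the class name and upstream URL are made up, and the body only illustrates the two behaviors discussed here (re-pointing origin at upstream, and unshallowing on workspace open):

```python
import subprocess

class GitSourceSketch:
    """Illustrative stand-in for a BuildStream git Source plugin
    overriding its workspace-open hook."""

    url = "https://example.com/upstream.git"  # hypothetical upstream URL

    def init_workspace(self, directory):
        def git(*args):
            subprocess.run(["git", "-C", directory, *args],
                           check=True, capture_output=True)

        # Point origin back at the real upstream rather than any
        # internal mirror used during the build.
        git("remote", "set-url", "origin", self.url)

        # If the build happened to use a shallow clone, recover full
        # history before handing the checkout to the user.
        probe = subprocess.run(
            ["git", "-C", directory, "rev-parse", "--is-shallow-repository"],
            capture_output=True, text=True)
        if probe.stdout.strip() == "true":
            git("fetch", "--unshallow")
```

This keeps the shallow/full decision entirely inside the git plugin, so other Source kinds are unaffected.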
persiaCool.17:22
persiajjardon[m]: Would that meet your use case?17:22
jjardon[m]for us, 98% of the time we don't even have to know whether the thing is built with bst or anything else; I do not normally need to build on my machine because I get everything from the cache17:25
*** ssam2 has quit IRC17:42
gitlab-br-botbuildstream: merge request (jmac/configurable-logging->master: WIP: Configurable log line formatting) #282 changed state ("opened"): https://gitlab.com/BuildStream/buildstream/merge_requests/28217:53
gitlab-br-botbuildstream: issue #261 ("Investigate the use of git shallow clones (to build instead tarballs)") changed state ("opened") https://gitlab.com/BuildStream/buildstream/issues/26118:08
jmactristan: Thanks for looking over the MR, I'll try and bash it into shape tomorrow.18:17
*** dominic has quit IRC18:20
tristanjmac, yeah it probably needs thought18:21
tristanit's only "generally" straightforward, but as you point out, not all log lines are created equal18:21
tristanbut there is much commonality18:21
*** Prince781 has quit IRC18:52
*** valentind has joined #buildstream18:52
*** Prince781 has joined #buildstream19:00
*** Prince781 has quit IRC19:03
*** xjuan has joined #buildstream19:31
*** ernestask has quit IRC20:13
*** tm has quit IRC20:26
*** jonathanmaw has quit IRC20:59
*** aday has quit IRC21:00
*** Prince781 has joined #buildstream21:55
*** valentind has quit IRC22:23
*** tristan has quit IRC22:49
*** Prince781 has quit IRC23:45
*** Prince781 has joined #buildstream23:47
*** Prince781 has quit IRC23:56

Generated by irclog2html.py 2.15.3 by Marius Gedminas - find it at mg.pov.lt!