*** Prince781 has quit IRC | 05:12 | |
*** tristan has joined #buildstream | 06:28 | |
*** tm has joined #buildstream | 08:08 | |
*** mwa has joined #buildstream | 08:16 | |
*** mwa has left #buildstream | 08:17 | |
*** toscalix has joined #buildstream | 08:38 | |
*** adds68 has joined #buildstream | 09:08 | |
*** jonathanmaw has joined #buildstream | 09:38 | |
*** valentind has joined #buildstream | 09:44 | |
*** tiago has joined #buildstream | 09:49 | |
*** tiago has quit IRC | 09:52 | |
*** tiago has joined #buildstream | 09:54 | |
*** noisecell has joined #buildstream | 09:54 | |
*** aday has joined #buildstream | 09:59 | |
*** ssam2 has joined #buildstream | 10:12 | |
*** valentind has quit IRC | 10:18 | |
*** ironfoot changes topic to "BuildStream 1.0.1 is out ! | https://gitlab.com/BuildStream/buildstream | Docs: https://buildstream.gitlab.io/buildstream | IRC logs: https://irclogs.baserock.org/buildstream | Roadmap: https://wiki.gnome.org/Projects/BuildStream/Roadmap" | 10:19 | |
*** ironfoot changes topic to "BuildStream 1.1.0 is out ! | https://gitlab.com/BuildStream/buildstream | Docs: https://buildstream.gitlab.io/buildstream | IRC logs: https://irclogs.baserock.org/buildstream | Roadmap: https://wiki.gnome.org/Projects/BuildStream/Roadmap" | 10:19 | |
*** dominic has joined #buildstream | 10:23 | |
juergbi | looks like we can't virtualize -march/mtune=native on x86 as that directly uses cpuid, not /proc/cpuinfo | 10:50 |
juergbi | have to be careful never to use that to not break reproducibility. i think there are some upstream projects that enable this by default but not many | 10:51 |
juergbi | bst toolchains might want to consider disabling support for this completely in gcc | 10:51 |
tristan | juergbi, "bst toolchains" sounds... in direct conflict with overall goals | 11:09 |
tristan | also, it seems cpuid is not enabled by default on my host at least | 11:10 |
juergbi | tristan: i mean bst projects that include gcc | 11:10 |
juergbi | tristan: hm, i don't think that part of cpuid can be disabled on any x86-64 CPU | 11:11 |
tristan | it means the payload wants/needs to be aware of build environment | 11:11 |
juergbi | (hypervisors can virtualize it) | 11:11 |
juergbi | in the case of -march=native this breaks repeatability | 11:12 |
juergbi | it's not a huge concern as gcc doesn't enable it by default but due to existence of build systems that enable it by default, it can be an issue | 11:13 |
tristan | Yes, that indeed would seem to be the case | 11:13 |
* tristan just looked up what it does | 11:13 | |
juergbi | so i would recommend completely patching that out for any project with gcc.bst, but there isn't really anything we can do about it, unfortunately | 11:14 |
tristan | this sounds like a current shortcoming of our sandboxing environment which should probably be documented | 11:14 |
juergbi | indeed, we should document it as we can't fix it | 11:14 |
tristan | Well, not being able to fix it today is not necessarily not able to fix it ever, hopefully. | 11:15 |
juergbi | i was initially hoping gcc would just use /proc/cpuinfo so we could bind mount a generic/fake one but no such luck | 11:15 |
juergbi | yes, theoretically, linux may be able to support virtualizing this in the future even for regular userspace applications using its hypervisor powers | 11:16 |
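
Since the sandbox cannot hide cpuid from the compiler, one defensive option is to look for the offending flags before they reach gcc. The following is a minimal sketch, not anything BuildStream ships: `find_native_flags` is a hypothetical helper that only covers the `-march=native` and `-mtune=native` spellings mentioned above.

```python
import re
import shlex

# Flags that make gcc tune for the CPU of the machine running the build.
# Only the two spellings discussed in the channel are matched here.
_NATIVE_FLAG = re.compile(r"^-m(arch|tune)=native$")


def find_native_flags(command_line):
    """Return the host-dependent -m*=native flags in a compiler command line."""
    return [arg for arg in shlex.split(command_line) if _NATIVE_FLAG.match(arg)]


if __name__ == "__main__":
    print(find_native_flags("gcc -O2 -march=native -mtune=native -c foo.c"))
```

A check like this could run over recorded build commands to flag elements whose output depends on the build host; it does not catch build systems that probe the CPU by other means.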
tristan | ssam2, could you take a quick glance and probably merge https://gitlab.com/BuildStream/buildstream-docker-images/merge_requests/22 ? | 11:19 |
tristan | ssam2, I think since you know anything about Dockerfiles and such, you will be able to merge it very quickly | 11:19 |
ssam2 | sure | 11:20 |
ssam2 | one thing useful to know about Docker is that the `docker build` system is amazingly primitive and dumb | 11:20 |
ssam2 | as that merge request demonstrates! | 11:20 |
ssam2 | i would love to see BuildStream building containers instead. although it's tricky as a lot of container builds are largely `apt-get install x, y, z` | 11:21 |
* tristan doesnt really understand even what that means; I can interpret "primitive and dumb" as both a positive or a negative thing :) | 11:21 | |
ssam2 | well, it has positive and negative aspects certainly | 11:22 |
ssam2 | in this case, i mean the fact that running all the commands we give it into a single shell script instead of 4 separate commands saves like 100MB of each of the images | 11:22 |
tristan | yes that does seem strange; smells more like the other way around (like as if Docker packed its own revisioning system under the hood, and has preserved the differences caused by each command) | 11:23 |
ssam2 | it does that, indeed | 11:24 |
tristan | sounds more like overly smart heh | 11:24 |
tristan | anyway I'm in no position to judge, I dont know all the use cases and dont claim to really know Docker :D | 11:24 |
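
The size saving ssam2 describes comes from each `RUN` instruction in a Dockerfile producing its own layer, so files removed by a later instruction still take up space in an earlier one. As a rough illustration, the hypothetical helper below joins several shell commands into a single `RUN` line; the example commands are assumptions and are not taken from the merge request.

```python
def single_run_instruction(commands):
    """Join shell commands into one Dockerfile RUN line so they share a layer."""
    if not commands:
        raise ValueError("need at least one command")
    # "&&" stops at the first failing command; the backslash-newline keeps
    # the generated Dockerfile readable.
    return "RUN " + " && \\\n    ".join(commands)


if __name__ == "__main__":
    print(single_run_instruction([
        "dnf install -y some-package",  # placeholder commands
        "dnf clean all",
    ]))
```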
*** cs_shadow has joined #buildstream | 11:28 | |
*** ernestask has joined #buildstream | 11:39 | |
tristan | juergbi, any more thoughts on https://gitlab.com/BuildStream/buildstream/issues/260 ? i.e. the issue that strengthening cache keys with execution OS/arch has broken the assumptions of freedesktop-sdk ? | 11:52 |
tristan | this seems to be the current fire | 11:52 |
tristan | ssam2, also, do you think this also breaks baserock's bootstrapping ? or does that not do cross-arch ? | 11:53 |
ssam2 | Baserock bootstrapping will work fine | 11:53 |
tristan | I'm curious as to why this happened in the first place :-S | 11:53 |
ssam2 | we push the results of the bootstrap to an OSTree repo once the bootstrap is complete | 11:54 |
ssam2 | then pull it with an 'import' element in the native build | 11:54 |
tristan | Right, so at least the model we presented as the only example to follow, did not make this assumption | 11:54 |
ssam2 | a bit of extra manual work, but it means that we don't rely on any bugs in buildstream :-) | 11:54 |
juergbi | tristan: if we can agree on the preliminary bits relatively soon, that makes sense to me | 11:57 |
juergbi | if we see that it takes longer, freedesktop-sdk may have to switch to explicit export/import | 11:57 |
juergbi | (or keep using the not officially supported branch with the revert) | 11:58 |
tlater | Hrm, so, if we made the fallback platform configurable how far would that go? I think some form of an abstract class that has methods for the various mount/chroot commands we need would work, but it feels like we'd leave too much up to the user. | 12:35 |
tlater | Perhaps I should just send a write-up of the fallback platform issue to the ML and get discussion going there... | 12:36 |
*** mcatanzaro has joined #buildstream | 12:39 | |
juergbi | tlater: if we can avoid expose more API than we want to, maybe a plugin approach could make sense after all | 12:39 |
juergbi | *exposing | 12:39 |
tlater | Essentially making the current platform system another plugin system? | 12:41 |
tlater | I'm a bit concerned of how much API that would end up being | 12:42 |
tlater | But it might be the only way without trying to support *everything* | 12:42 |
juergbi | yes. i'd rather not have it but it sounds better than tons of config options | 12:42 |
juergbi | instead of an actual python plugin system, it could also be that we define a command-line interface for a single sandbox command | 12:42 |
juergbi | the user could then specify the path to the corresponding system-specific implementation, but that would just be one script/tool that would have to implement the 'bst sandbox CLI API' | 12:43 |
tristan | juergbi, looking back after having to reply to an email; note that actually even if we did the preliminary bits of that API, it would still be recommendable to change freedesktop-sdk to explicit export/import | 12:44 |
tristan | because it would *still* be relying on an external artifact cache server to have magically produced an artifact in some way | 12:44 |
tlater | That feels *very* ugly, but certainly would be the most flexible. I think it would be the only way to sort of support non-root buildstream in all environments. | 12:44 |
tristan | which means builds are not really reproducible | 12:44 |
juergbi | tristan: well, they would still be reproducible, just requiring two bst sessions running after the other on two different systems | 12:46 |
juergbi | but i agree, it's definitely not ideal | 12:46 |
juergbi | however, i don't consider it very fragile | 12:46 |
juergbi | tlater: the issue is that there are systems where non-root can't create sandboxes at all (ignoring spinning up an emulator/VM), in which case you anyway need to write a setuid tool or reuse such a tool written by someone else | 12:49 |
juergbi | and i don't think we want to maintain such tools as part of buildstream | 12:49 |
tlater | Yeah, for those cases I think allowing the user to call such (possibly self-maintained) tools if they require them is probably the best solution | 12:50 |
juergbi | we can also say we don't care about such systems, of course, i.e., require root | 12:50 |
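
To make the two ideas in this exchange concrete (an abstract class for the platform, and a single external sandbox command), here is a sketch under stated assumptions. Neither `SandboxBackend` nor the `--root`/`--cwd` command-line contract exists in BuildStream; both are invented here purely for illustration.

```python
import abc


class SandboxBackend(abc.ABC):
    """Hypothetical interface for a pluggable sandbox platform."""

    @abc.abstractmethod
    def build_argv(self, rootfs, command, cwd="/"):
        """Return the argv that would run `command` inside `rootfs`."""


class ExternalToolBackend(SandboxBackend):
    """Delegates to a user-supplied (possibly setuid) tool.

    Assumed CLI contract: TOOL --root DIR --cwd DIR -- COMMAND...
    """

    def __init__(self, tool_path):
        self.tool_path = tool_path

    def build_argv(self, rootfs, command, cwd="/"):
        return [self.tool_path, "--root", rootfs, "--cwd", cwd, "--"] + list(command)


if __name__ == "__main__":
    backend = ExternalToolBackend("/usr/local/bin/my-sandbox")  # placeholder path
    print(backend.build_argv("/tmp/rootfs", ["make", "install"]))
```

The appeal of the CLI variant is that the only surface exposed to users is one argv layout, rather than a Python plugin API.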
gitlab-br-bot | buildstream: merge request (sam/doc-sandbox->master: doc: Add 'sandboxing' section) #279 changed state ("opened"): https://gitlab.com/BuildStream/buildstream/merge_requests/279 | 12:58 |
laurenceurhegyi | hey jennis - have you noticed this ? https://gitlab.com/BuildStream/buildstream/issues/239 | 13:04 |
*** laurenceurhegyi is now known as ltu | 13:04 | |
ltu | not sure if it can tie in with the stuff you've done recently | 13:04 |
*** tristan has quit IRC | 13:33 | |
*** tristan has joined #buildstream | 13:37 | |
*** mcatanzaro has quit IRC | 13:51 | |
*** slaf has quit IRC | 13:52 | |
*** slaf has joined #buildstream | 13:54 | |
*** slaf has joined #buildstream | 13:55 | |
*** slaf has joined #buildstream | 13:55 | |
*** slaf has quit IRC | 14:00 | |
*** mcatanzaro has joined #buildstream | 14:00 | |
jennis | ltu, yes I've been working on tlater's branch which addresses #239 | 14:16 |
*** slaf has joined #buildstream | 14:46 | |
*** slaf has quit IRC | 14:49 | |
*** slaf has joined #buildstream | 14:51 | |
*** slaf has quit IRC | 14:56 | |
*** slaf has joined #buildstream | 14:58 | |
jonathanmaw | juergbi: do you have anything else to add for https://gitlab.com/BuildStream/buildstream/merge_requests/259 | 15:08 |
jonathanmaw | or have I resolved all the issues you had? | 15:09 |
juergbi | taking a look | 15:09 |
*** slaf has quit IRC | 15:09 | |
*** slaf has joined #buildstream | 15:13 | |
*** slaf has joined #buildstream | 15:13 | |
gitlab-br-bot | buildstream: merge request (212-git-source-needs-a-way-to-disable-checking-out-submodules->master: Resolve "Git source needs a way to disable checking out submodules") #259 changed state ("opened"): https://gitlab.com/BuildStream/buildstream/merge_requests/259 | 15:17 |
*** slaf has quit IRC | 15:19 | |
jonathanmaw | \o/ | 15:23 |
*** slaf has joined #buildstream | 15:24 | |
jjardon[m] | Hi, for git sources, does buildstream do something intelligent to not download the entire git repo? We are thinking on moving from tarballs to git, but for cases like the linux repo this will increase download times considerably | 15:28 |
ssam2 | no, it downloads the whole repo | 15:28 |
ssam2 | i've never really seen an example of partial cloning that seemed to actually work | 15:29 |
ssam2 | i mean, it works, but you save like 400MB of a 4GB repo in return for your efforts | 15:29 |
ssam2 | *off a 4GB repo | 15:29 |
ssam2 | i'd love to be proved wrong though! | 15:29 |
ssam2 | but yeah, i think if you want speedy downloads, stick to tarballs | 15:30 |
ssam2 | for linux in particular you can use the GitHub API to get tarballs for any commit, not just releases: https://stackoverflow.com/questions/13636559/how-to-download-zip-from-github-for-a-particular-commit-sha#13636954 | 15:31 |
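
For reference, the tarball-per-commit approach amounts to building a URL of the form GitHub uses for archive downloads. The helper name below is made up, and the sketch makes no claim about whether the generated archive keeps a stable checksum over time.

```python
def github_commit_tarball_url(owner, repo, sha):
    """Build the archive URL GitHub serves for an arbitrary commit."""
    return "https://github.com/{}/{}/archive/{}.tar.gz".format(owner, repo, sha)


if __name__ == "__main__":
    # The sha here is a placeholder, not a real commit.
    print(github_commit_tarball_url("torvalds", "linux", "0123456789abcdef"))
```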
jjardon[m] | ok, thanks ssam2 ! | 15:34 |
skullman | partial cloning technically works provided you don't need to use git tag for anything, but if you want to reuse the repository you cloned for something else then things get awkward. At some point it could be cheaper to unshallow a range rather than have a bunch of individual fetches but over time it'll just get slower to operate since it'll need to look through all the unshallow markers | 15:34 |
jjardon[m] | skullman: I will not do anything with the repo, only build the contents | 15:36 |
skullman | aye, but ideally you'd cache that fetch rather than having to get it again on a subsequent build | 15:37 |
*** noisecell has quit IRC | 15:37 | |
skullman | oh, and if you need to "track" (I think the terminology is) then you can't use a shallow clone | 15:37 |
gitlab-br-bot | buildstream: issue #212 ("Git source needs a way to disable checking out submodules") changed state ("closed") https://gitlab.com/BuildStream/buildstream/issues/212 | 15:42 |
gitlab-br-bot | buildstream: merge request (212-git-source-needs-a-way-to-disable-checking-out-submodules->master: Resolve "Git source needs a way to disable checking out submodules") #259 changed state ("merged"): https://gitlab.com/BuildStream/buildstream/merge_requests/259 | 15:42 |
jjardon[m] | skullman: neither: our runners are elastic so they get destroyed after being used | 15:43 |
skullman | probably why cloning is such an issue I guess, so finding a way to cache would help too | 15:44 |
*** noisecell has joined #buildstream | 15:49 | |
*** toscalix has quit IRC | 16:07 | |
*** Prince781 has joined #buildstream | 16:11 | |
persia | For the running-in-disposable-instance case, it might be nice for buildstream to have a shallow clone feature, with error messages if the user tries to do anything interesting. What breaks with such a feature? | 16:14 |
gitlab-br-bot | buildstream: issue #200 ("bst workspace open creates the directory and then fails.") changed state ("closed") https://gitlab.com/BuildStream/buildstream/issues/200 | 16:15 |
gitlab-br-bot | buildstream: merge request (workspace-directory-fix->master: Create workspace directory after checking for potential issues) #281 changed state ("closed"): https://gitlab.com/BuildStream/buildstream/merge_requests/281 | 16:15 |
skullman | persia: from my memory of when I looked at it… a couple of years ago now, wow, nothing insurmountable | 16:16 |
*** mcatanzaro has quit IRC | 16:17 | |
ssam2 | it seems pretty wrong headed to be running in disposable instances with no caching | 16:17 |
skullman | if you need to do `bst track` then you can abuse `git ls-remote` to get the current state of branches | 16:17 |
ssam2 | the more your project grows, the more you cost those whose source code you build | 16:17 |
skullman | if you need to build a specific version you need your git server to be configured to allow you to fetch commits by sha1 rather than branch | 16:18 |
ssam2 | partial clones will only ever be a partial solution; the correct approach is persistent caching / mirroring | 16:18 |
*** mcatanzaro has joined #buildstream | 16:19 | |
skullman | ssam2: if it's completely elastic instances though then AIUI you're just moving the fetch from upstream's server onto your own infrastructure | 16:19 |
skullman | or your cloud provider's | 16:19 |
ssam2 | the point is to reduce how often you fetch | 16:20 |
skullman | persia: so shallow would be possible if you can guarantee that the git server permits fetching by sha1, which AIUI most servers turn off to make it easier to expunge accidental history leaks | 16:20 |
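
The two git operations skullman mentions can be sketched as follows: `git ls-remote` to resolve a branch to a commit without cloning, and a depth-1 fetch of that commit. The helper names are hypothetical. The fetch only works if the server allows requesting commits by sha1 (git has `uploadpack.allowReachableSHA1InWant` and `uploadpack.allowAnySHA1InWant` for this), which cannot be assumed for an arbitrary upstream.

```python
def ls_remote_argv(url, branch):
    """argv that asks the remote for the current tip of one branch."""
    return ["git", "ls-remote", url, "refs/heads/" + branch]


def parse_ls_remote(output, branch):
    """Extract the sha for `branch` from `git ls-remote` output, or None."""
    wanted = "refs/heads/" + branch
    for line in output.splitlines():
        # Each line is "<sha>\t<refname>".
        sha, _, ref = line.partition("\t")
        if ref.strip() == wanted:
            return sha
    return None


def shallow_fetch_argv(remote, sha):
    """argv that fetches a single commit without its history."""
    return ["git", "fetch", "--depth", "1", remote, sha]


if __name__ == "__main__":
    sample = "1111111111111111111111111111111111111111\trefs/heads/master\n"
    print(parse_ls_remote(sample, "master"))
    print(shallow_fetch_argv("origin", "1111111111111111111111111111111111111111"))
```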
ssam2 | if you have 10000 builds a day and each one fetches from upstream, you're doing something wrong | 16:20 |
ssam2 | at least, if each one does a full clone from upstream's servers | 16:21 |
persia | ssam2: I've no issue with a persistent local mirror: my issue is that it is lots less expensive to schedule access to build resources on an as-needed basis than to keep a farm of build machines with primed caches about just in case I rebuild. Especially if I want to do thousands of builds a day, where the chance I can reuse the same build machine for two builds of the same thing is very low. | 16:22 |
ssam2 | sure, if you implement it badly then it will suck :-) | 16:23 |
persia | But that means that when BuildStream processes an element, there's a good chance that it needs to do a local clone from the local mirror, and having that be a shallow clone is likely to be much faster. | 16:23 |
ssam2 | my point is just that we should never encourage DDOSing upstream servers | 16:23 |
persia | How would you implement it well? Assume that I have about 15000 sources, and want to modify an arbitrary 3000 of them daily, with the smallest amount of hardware resources to keep up with the builds. | 16:23 |
ssam2 | I'd suggest a separate cache running on the same local network | 16:25 |
persia | Oh, yeah, pulling from anything frequently without explicit permissions is poor behaviour. I was assuming a case where the arbitrary 3000 changes were desired to be updated. | 16:25 |
ssam2 | the elastic GitLab runners setup allows for this, and we do it | 16:25 |
ssam2 | although it seems to still take a while copying the cache around, so it's not ideal | 16:25 |
persia | ssam2: Having a local cache works. Doesn't help with the clones though. | 16:25 |
persia | Precisely. The point of shallow clones would only be to get from local mirror to the builder. | 16:26 |
ssam2 | so your point is that the extra complexity of supporting partial clones is worthwhile to reduce load in massive setups ? | 16:26 |
ssam2 | that may be true I suppose | 16:26 |
persia | No, my question is how much extra complexity is introduced by adding support for partial clones. That it helps massive setups is the rationale for the query. | 16:27 |
*** tristanmaat has joined #buildstream | 16:27 | |
*** tristanmaat is now known as tlater` | 16:27 | |
*** tlater` has left #buildstream | 16:27 | |
ssam2 | ah. All I know is that it makes various things break, so it's dangerous | 16:27 |
ssam2 | or complex | 16:28 |
persia | An example "massive" setup would be jjardon[m]'s situation, although the numbers are closer to 1500 and 300. | 16:28 |
persia | And in that environment, having cloned the entire git repo offers no persistent benefit. | 16:28 |
persia | I would expect the same to be true for any moderately sized project trying to build an entire system and using CI, even with many less than 10 developers. | 16:29 |
tristan | From what I recall at the time, it is possible to achieve track (get latest commit sha for branch) _only_ if you make some assumption that a remote git repo is configured with some very special sauce | 16:29 |
tristan | So it was just not viable. | 16:29 |
ssam2 | persia, in a local mirror server it certainly does have a benefit | 16:29 |
ssam2 | persia, as different builds maybe building different tags of the same repo | 16:29 |
persia | tristan: But I don't need track in the disposable-build-bot scenario (or the hard-to-schedule-because-numbers-build-bot scenario) | 16:29 |
ssam2 | persia, so partial clones might actually require 6 x 2GB partial clone of 6 different tags, vs. one 4GB clone that can provide every tag | 16:30 |
persia | ssam2: Yes. I believe a local mirror server is essential. I also think this is entirely independent of the discussion of what buildstream should do. | 16:30 |
ssam2 | ok. but my point stands that i've never seen proof of partial clones being worth the complexity | 16:30 |
persia | And I don't know why I need to pull 6 tags for a build. Doesn't a build just act on a specific SHA1? After that, I don't expect to build the same source on the same builder. | 16:30 |
ssam2 | you are only theorising :-) | 16:30 |
ssam2 | ok, 6 different SHA1s | 16:31 |
tristan | persia, BuildStream only requires 2 things: Ability to download the code for a specific ref, and ability to go find the latest ref for a given symbolic tracking branch | 16:31 |
ssam2 | i didn't say 6 for 1 build, i meant 6 different builds | 16:31 |
tristan | persia, if it can be done with git, that would be great. | 16:31 |
persia | I'm trying to understand how to resolve the issue jjardon[m] raised above. I feel like all the responses are "don't do that for reasons that have nothing to do with the stated use case". | 16:31 |
persia | tristan: Is it ever possible to not perform the second of those actions? perhaps in a build-only situation, or special configuration? | 16:32 |
persia | (or even just "don't run a subset of BuildStream commands") | 16:32 |
persia | git is certainly capable of downloading only the things required to populate a tree for a specific REF, but it requires calling git with special arguments. | 16:33 |
tristan | persia, that can paint you into weird corners, where you dont need something; until you do; and then you end up patching it up with never ending bandaids for weird corner cases | 16:33 |
ssam2 | a solution to jjardon[m]'s problem was proposed which i am skeptical of and i've explained why | 16:34 |
tristan | persia, if all of that patchwork could live *inside* git.py source plugin, that's still a bit saner than more optionality | 16:34 |
ssam2 | i also proposed a different solution | 16:34 |
persia | tristan: Yes. The question is if my expectation is that I will instantiate a builder; run buildstream to build an element; decommission the builder, am I ever likely to encounter such a case? | 16:34 |
persia | ssam2: which? I didn't see it. | 16:34 |
tristan | (when I say "that" above, I'm only referring to "BuildStream, please cripple the part of functionality I dont need, so that my specific use case goes a bit faster") | 16:35 |
ssam2 | persia, he can get tarballs of arbitrary commits of certain projects by using special features of GitHub | 16:36 |
tristan | persia, likely is not really important, you open up the door to the innocent bystanders which use these hacks in other ways which you did not expect | 16:36 |
persia | I don't even need a crippled buildstream. I'm just wondering if, if nobody ever runs `bst track` during the lifecycle of a build instance, will buildstream care if none of the clones are real? | 16:36 |
tristan | are github tarballs reliable, though ? I thought they were generated-on-the-fly and dont have stable checksums | 16:37 |
persia | ssam2: Oh, yes. Also, for many projects, all the interesting commits have tarballs. I suppose I dismissed it as depending on a foreign closed-source API. | 16:37 |
persia | tristan: unreliable. | 16:37 |
ssam2 | oh, that sucks | 16:38 |
persia | And, I thought the point of the initial use case presentation was to stop using tarballs, although I can see that the underlying problem might be perceived as "I want a build artifact for every commit" | 16:38 |
persia | ssam2: To be fair, one can cache them, and treat that cached ones as precious, and so it still works as a workaround. | 16:38 |
tristan | There are tricks one can potentially play within git.py; like do a shallow clone by default until first track occurs, and then fallback to real clone | 16:40 |
tristan | i.e. | 16:40 |
tristan | <tristan> persia, if all of that patchwork could live *inside* git.py source plugin, that's still a bit saner than more optionality | 16:41 |
tristan | but having a "mode" is going to explode in our face | 16:41 |
tristan | also, it's questionable whether it's overall more performant | 16:41 |
persia | It is always more performant if one never uses the git repo again, and almost always less performant if one wants to perform a second operation on the git repo. | 16:41 |
persia | Sadly, that's something that is almost impossible to know within the plugin without a runtime switch. | 16:42 |
persia | starting shallow and cloning to do something else is definitely less performant than just cloning. | 16:42 |
tristan | Runtime switch = all source plugin bugs * 2 | 16:42 |
persia | yeah, that's the problem. | 16:42 |
persia | I suppose one could configure a mirror with shallow repositories: e.g. history only back to last release or similar. | 16:43 |
jjardon[m] | for a "gnome-continuous" kind of builds (always build latest master) shallow clones for everything can make sense. For elements that want a specific commit this will not work. I guess is a git "limitation" that you can not clone a specific commit without the history | 16:43 |
persia | And then track against that. Requires an external system, but solves the problem (in a similar manner to autogenerated tarballs) | 16:43 |
persia | I remember someone demonstrating a hack to cause a truncated history to have all the same recent SHAs as a full history. | 16:44 |
persia | jjardon[m]: You *can* fetch a specific commit without the history: the problem is that when you do so, you don't have the history, so any commands that inspect history (like in bst track) fail. | 16:45 |
jjardon[m] | well, bst track doesn't fail when you use tarballs, so we can do something special for git shallow clones as well ? | 16:46 |
persia | jjardon[m]: Have the git source plugin detect a shallow clone, and just disallow track operations? Possibly. Trick is how to tell the plugin when to use shallow vs. non-shallow. | 16:48 |
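
Detecting a shallow clone is the easy half of that idea: git records the truncation points in a `shallow` file inside the git directory. The sketch below relies on that; `ShallowRepositoryError` and `check_can_track` are hypothetical names and not part of the git source plugin.

```python
import os


class ShallowRepositoryError(Exception):
    """Raised when an operation needing history meets a shallow clone."""


def is_shallow(git_dir):
    """True if `git_dir` (a .git directory or a bare repository) is shallow."""
    return os.path.isfile(os.path.join(git_dir, "shallow"))


def check_can_track(git_dir):
    """Refuse to track against a repository with truncated history."""
    if is_shallow(git_dir):
        raise ShallowRepositoryError(
            "{} is a shallow clone; tracking is disabled".format(git_dir)
        )
```

Deciding when to create a shallow clone in the first place is the harder half, and this sketch does not address it.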
jjardon[m] | what about a git-shallow source? | 16:49 |
persia | Because that isn't something safe to define for a project or an element, as the only time it matters is for automated builders on elastic substrates (or large builder farms, where chance of git cache hit is low). | 16:49 |
persia | That makes all the cache keys different from those used by the git source, and makes all the projects unsharable between folk who want to develop and folk who want to CI. | 16:49 |
jjardon[m] | no, it matters even if I have a permanent server | 16:49 |
persia | Having unsharable cache is one thing (although unnecessary if the refs are consistent), but having a project that cannot be used locally is fairly painful. | 16:50 |
jjardon[m] | it will be faster the first time I run the build | 16:50 |
persia | It will be slower the second time you run the build. | 16:50 |
persia | Now, if you have three permanent servers, then it might always be faster. or if your permanent server does not have enough disk space for all the git repos for all the sources (but that requires lots of sources). | 16:51 |
jjardon[m] | persia: mmm, why?; it will be exactly the same time or none because I already cache the repo somewhere | 16:51 |
persia | jjardon[m]: If you don't change ref, then you don't need to rebuild (results are cached). If you change ref, the shallow clone becomes useless, and you need a new shallow clone. | 16:52 |
jjardon[m] | yeah, but the shallow clone will take the same time as the first one, no more | 16:53 |
persia | Oh, yes, roughly. The point is that if you have a single permanent server, it is faster to do a real clone, as the time lost on first build will be made up on subsequent builds. | 16:53 |
persia | If you don't have persistent storage of the git repo, and have to perform a new clone operation, that's different. | 16:54 |
persia | note that the persistent storage has to be in the builder: a network-local git mirror makes everything faster and better, but doesn't solve the IO issue of loading the data into the builder. | 16:55 |
jjardon[m] | I think you are mixing 2 different things here | 16:56 |
jjardon[m] | one thing is store the sources/clones in a cache to be reused. Another is how much time takes to clone/retrieve those sources again. and I think in the second case is where shallow clones could be used instead tarballs | 16:58 |
jjardon[m] | also, even if you cache your sources, shallow clones will make that cache much smaller | 16:59 |
persia | I've been trying to maintain that distinction, and have felt the mixture also. My apologies if the differentiation isn't clear in my text. Yes, the problem as you have stated is the one I would like to see resolved. | 16:59 |
persia | Unfortunately, I don't see how it can be resolved except with some runtime behaviour determination, as it isn't possible to know until invoking BuildStream whether one is going to be able to reuse the clone. | 17:00 |
persia | And tristan previously asserted "Runtime switch = all source plugin bugs * 2" as justification for not supporting such a thing. | 17:00 |
tristan | If ref changes but track never happens, getting new ref is *probably* more expensive that second time than if you had done a full clone the first time | 17:00 |
tristan | But, probably around the same as if a tarball has changed | 17:01 |
persia | Hence my returning to considering "fake" git mirrors that are shallow in nature, so a full clone functions like a shallow clone (done outside BuildStream) | 17:01 |
tristan | So one could still have some local state in the git sources' directory where the git.py plugin could try to do something smarter | 17:01 |
persia | tristan: Yes, it is more expensive, assuming you are using the same machine. In an elastic compute environment, that assumption is not likely to be met. | 17:01 |
tristan | shallow clone until first track is not that horrible I think | 17:01 |
tristan | also remember; if you dont need to rebuild a given artifact, BuildStream should not be downloading the source at all | 17:02 |
persia | Right. We're only talking about changes here. | 17:02 |
jjardon[m] | tristan: that would still be a huge gain in a linux repo (clone is more than 2GB, while tarball is ~100MB ?) | 17:02 |
persia | Key is that when one is working locally, one can safely assume that a full clone will be useful, as one might use the same source twice. This isn't as likely to be true in a CI environment. | 17:03 |
tristan | jjardon[m], I dont know, you'd have to try a git shallow clone of linux on the command line to see how much it would save | 17:03 |
persia | git shallow clone and tarball are roughly similar in size. | 17:03 |
persia | (depends on protocols used by git, how the tarball is compressed, etc., but usually within 15% or so) | 17:04 |
persia | tristan: The problem with shallow clone followed by full clone is that in the linux example, it only costs an extra 5-10% time, but for a new, young project it might nearly double the time. | 17:04 |
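
The arithmetic behind that estimate can be written down as a toy model. It assumes, pessimistically, that the data from the initial shallow fetch is not reused when the full clone happens, so the overhead is simply the shallow size relative to the full size. The sizes in the example are illustrative round numbers loosely based on the ~100MB and 2GB figures floated above, not measurements.

```python
def extra_cost_ratio(shallow_size, full_size):
    """Overhead of 'shallow first, full clone later' relative to a full clone alone.

    Assumes nothing from the shallow fetch is reused by the later full clone.
    """
    if full_size <= 0:
        raise ValueError("full_size must be positive")
    return shallow_size / full_size


if __name__ == "__main__":
    # Long history: the shallow fetch is small next to the full clone.
    print(extra_cost_ratio(100, 2000))
    # Young project: shallow and full clone are nearly the same size.
    print(extra_cost_ratio(90, 100))
```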
tristan | My guess is that when dealing with a project that has stored the refs; one usually doesnt have to download sources at all unless one needs to modify the ref themselves | 17:05 |
persia | Right. Assume automation that tracks N project git repos, and then dispatches buildstream builds with the result of modifying the ref for each change. | 17:06 |
persia | It shouldn't be all the changes, because we don't want someone breaking something in one repo to cause that breakage to happen for another developer on another team in another repo. | 17:06 |
persia | In the case of GNOME, this is about 100 ref changes daily. For larger or more active projects, this would be much larger. | 17:07 |
persia | (where "100" is a rough estimate count of the build notifications in #testable on a given day) | 17:08 |
persia | This translates to 100 (or whatever) source downloads daily, which one might schedule over a cluster of builders. | 17:09 |
tristan | not sure my last got through; what I mean is; if you do have powerful enough CI; and I download your project at any time and run `bst build`, it will be fairly rare that I ever have to build anything or download sources, *except* for the ones I'm interested in working on | 17:09 |
persia | If the builders are only instantiated on-demand, they have no prior cache, etc. | 17:10 |
persia | tristan: Key is that in the use case under discussion, there is no human on the machine: this is only CI. The CI is *never* interested in working on any sources, just building them. | 17:10 |
tristan | Oh look; here's another weird side effect though: If I happen to be so unfortunate to have a shallow clone in my cache, I will be shaking my fist when I open a workspace and cannot browse history | 17:11 |
persia | And in that, admittedly narrow, case it would be interesting to be able to shortcut the full git clone. | 17:11 |
persia | CI never opens a workspace. | 17:11 |
persia | There are *lots* of weird corner cases. Calling it "--ci-mode" and documenting it with "this makes everything break except CI builds" would be fine by me :) | 17:12 |
tristan | persia, I know that the use case you are discussing is CI and automation; I am trying to see if there is a good argument for a default behavior that suits both *because* of the probable nature of what happens when you work locally | 17:12 |
tristan | oh no not mode | 17:12 |
* tristan palmface | 17:12 | |
persia | I don't think so. I would argue passionately against the use of shallow clones by humans. I think that is a bad idea that is likely to cause confusion and pain. | 17:12 |
persia | At best, it just makes it slow for the human if it doesn't absolutely work perfectly the first time, the user doesn't want to track, use a workspace, actually use git, or anything. | 17:13 |
persia | And in that case, I think the result should have been precached by CI :) | 17:13 |
tristan | Look, if it's download shallow by default, but resort to full clone on `bst track` or `bst workspace open`: The likelihood of things being suboptimal for users working locally is much reduced by a populated artifact cache. | 17:13 |
persia | How does a populated artifact cache help? | 17:14 |
tristan | That is the case I am making, I just dont know if you are trying to understand the case I'm trying to experimentally make. | 17:14 |
tristan | persia, because you never download sources if you dont need to build the artifact | 17:14 |
tristan | you dont need the source at all, for an artifact in the cache. | 17:14 |
persia | Still, as a human, I always want the full clone if I get the source at all. | 17:14 |
persia | Otherwise I'm just adding 10-100% download time to getting the source. | 17:15 |
tristan | As a human, you would be opinionated about what you want, humans generally are :) | 17:15 |
persia | Find me a human who actually likes working with shallow clones, and I'll show you someone who is prepared to volunteer to shill anything. | 17:15 |
tristan | Its true though that one should not rely on having a good automated builder chugging builds | 17:16 |
persia | And I think the current behaviour for the non-automated case is correct. | 17:16 |
tristan | persia, a user builds a project with 400 repos, but only ever wants to develop on 2 or three of them | 17:16 |
persia | The problem is that the automation is slow for projects with long history. | 17:16 |
persia | tristan: Ah, without automation. You found the magic case. yes. | 17:16 |
tristan | hopefully most of the time, they never need *any* clones of those other 397 repos | 17:16 |
tristan | But *even* lets imagine there is no remote artifact cache with build results to provide the artifacts to download | 17:17 |
persia | Right, but if there is no autobuilder, the user only wants the shallow clones to build them once and forget them. | 17:17 |
tristan | Then, do you want shallow clones of 400 modules, *until* you open a workspace or `bst track` the module you want to work on ? | 17:17 |
tristan | Or, do you want always 400 full clones ? | 17:18 |
tristan | I dont know | 17:18 |
persia | Is this tradeoff worth potentially taking twice as long to clone the three interesting repos? | 17:18 |
tristan | I think on average it is worth it | 17:18 |
persia | Thinking about it, the actual cost is that the user experiences clone delay on workspace open or track. | 17:18 |
tristan | on an average of 3 repos I want to work on out of 400 it certainly would be | 17:18 |
persia | At least the first time, which can be explained. | 17:18 |
persia | Is there an opportunity to provide feedback to the user "preparing workspace...cloning full repository ...", etc.? | 17:19 |
tristan | I'm just trying to look at what it might look like if we shallow-clone-until-need-full-clone, before going down the very ugly road of optionality that we hope should not have to happen. | 17:19 |
persia | If so, I think I like shallow-first-then-real. | 17:19 |
persia | Should probably also do a full clone in the case where a shallow clone not containing the desired ref is found, as this would be evidence that the user is working on the source, even if we didn't detect it properly. | 17:20 |
persia | (or that the builder might benefit from having a proper local git repo as a cache, etc.) | 17:21 |
tristan | There is an opportunity, there are Source-specific methods to deal with opening workspaces, where the git plugin could do something custom | 17:21 |
tristan | (we added that mostly so that it could `git remote set-url origin upstream-url`) | 17:21 |
persia | Cool. | 17:22 |
persia | jjardon[m]: Would that meet your use case? | 17:22 |
jjardon[m] | for us 98% of the time we dont even have to know if the thing is built with bst or anything else; I do not normally need to do builds on my machine because I get everything from the cache | 17:25 |
*** ssam2 has quit IRC | 17:42 | |
gitlab-br-bot | buildstream: merge request (jmac/configurable-logging->master: WIP: Configurable log line formatting) #282 changed state ("opened"): https://gitlab.com/BuildStream/buildstream/merge_requests/282 | 17:53 |
gitlab-br-bot | buildstream: issue #261 ("Investigate the use of git shallow clones (to build instead tarballs)") changed state ("opened") https://gitlab.com/BuildStream/buildstream/issues/261 | 18:08 |
jmac | tristan: Thanks for looking over the MR, I'll try and bash it into shape tomorrow. | 18:17 |
*** dominic has quit IRC | 18:20 | |
tristan | jmac, yeah it probably needs thought | 18:21 |
tristan | it's only "generally" straightforward, but as you point out, not all log lines are created equally | 17:21 |
tristan | but there is much commonality | 18:21 |
*** Prince781 has quit IRC | 18:52 | |
*** valentind has joined #buildstream | 18:52 | |
*** Prince781 has joined #buildstream | 19:00 | |
*** Prince781 has quit IRC | 19:03 | |
*** xjuan has joined #buildstream | 19:31 | |
*** ernestask has quit IRC | 20:13 | |
*** tm has quit IRC | 20:26 | |
*** jonathanmaw has quit IRC | 20:59 | |
*** aday has quit IRC | 21:00 | |
*** Prince781 has joined #buildstream | 21:55 | |
*** valentind has quit IRC | 22:23 | |
*** tristan has quit IRC | 22:49 | |
*** Prince781 has quit IRC | 23:45 | |
*** Prince781 has joined #buildstream | 23:47 | |
*** Prince781 has quit IRC | 23:56 |
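
The "shallow-first-then-real" idea discussed above (shallow clone by default, full clone on `bst track` or `bst workspace open`, and full clone when a cached shallow clone lacks the desired ref) can be sketched as a small decision function. This is only an illustration of the policy as described in the conversation, not BuildStream's implementation: `Action`, `CloneKind`, `CacheState` and `required_clone` are hypothetical names invented for this sketch.

```python
from dataclasses import dataclass
from enum import Enum


class Action(Enum):
    # Hypothetical labels for the operations mentioned in the discussion.
    BUILD = "build"
    TRACK = "track"
    WORKSPACE_OPEN = "workspace-open"


class CloneKind(Enum):
    NONE = "none"
    SHALLOW = "shallow"
    FULL = "full"


@dataclass
class CacheState:
    # What is currently in the local source cache (hypothetical model).
    kind: CloneKind
    has_ref: bool = False


def required_clone(action: Action, cache: CacheState) -> CloneKind:
    """Return the kind of clone the cache should hold for this action."""
    # Never downgrade an existing full clone.
    if cache.kind is CloneKind.FULL:
        return CloneKind.FULL
    # Tracking or opening a workspace suggests someone will work on the source.
    if action in (Action.TRACK, Action.WORKSPACE_OPEN):
        return CloneKind.FULL
    # A shallow clone without the desired ref is taken as evidence that the
    # source is being worked on, so fall back to a full clone.
    if cache.kind is CloneKind.SHALLOW and not cache.has_ref:
        return CloneKind.FULL
    # Plain builds only need the tree at one ref.
    return CloneKind.SHALLOW


if __name__ == "__main__":
    print(required_clone(Action.BUILD, CacheState(CloneKind.NONE)))
    print(required_clone(Action.WORKSPACE_OPEN, CacheState(CloneKind.SHALLOW, True)))
```

Under this sketch a CI builder with an empty cache only ever asks for shallow clones, while a developer pays the full-clone cost the first time they track or open a workspace on a module.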