*** gtristan has quit IRC | 01:30 | |
gitlab-br-bot | definitions: issue #19 ("Ostree and git cache fails to upload on minimal systems") changed state ("opened") https://gitlab.com/baserock/definitions/issues/19 | 08:58 |
noisecell | jjardon, could you limit the number of fetchers in ~/.config/buildstream.conf to 1, in case it helps with the pipeline blockage we see when fetching delta-linux? | 09:03
noisecell | https://gitlab.com/baserock/definitions/pipelines/24529547 | 09:03 |
noisecell | if this is not solved, the only way I can go forward with an MR is manually triggering every parallel job in the pipeline myself, which is not ideal | 09:04
noisecell | :( | 09:04 |
* noisecell presses buttons for the time being | 09:05 | |
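The fetcher limit noisecell mentions above lives in the BuildStream user configuration. A minimal sketch, assuming the BuildStream 1.x user config schema; the default value is around 10:

```yaml
# ~/.config/buildstream.conf -- BuildStream user configuration (1.x schema assumed)
scheduler:
  # Maximum number of parallel source fetches per job. Lowering it from
  # the default reduces the concurrent load on git.baserock.org when
  # several CI jobs fetch large repositories such as delta/linux at once.
  fetchers: 1
```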
*** jonathanmaw has joined #baserock | 09:09 | |
*** jonathanmaw has quit IRC | 09:45 | |
jjardon | noisecell: that has never failed before; are you aware of any change in git.baserock.org? | 09:48
jjardon | noisecell: you can probably change that parameter in your branch though? | 09:49
noisecell | jjardon, yeah, I've sent a commit, but it is more a workaround than anything else. We need to see why that is returning 500 | 09:49
noisecell | when trying to upload the cache | 09:50 |
noisecell | jjardon, I did open https://gitlab.com/baserock/definitions/issues/19 | 09:50 |
noisecell | and no, I haven't seen any issue with git.baserock.org | 09:51
noisecell | but to be fair I haven't worked with it for quite a while; it is only now that we are beating the tree with a stick that all these things are coming back to us | 09:52
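For context, the 500 above is returned when a job tries to upload its cache (the local OSTree artifact and git source directories) to the runners' cache server at the end of the build. A hypothetical sketch of the kind of `cache:` stanza involved; the key and paths are illustrative, not the actual entries in definitions' .gitlab-ci.yml:

```yaml
# .gitlab-ci.yml -- illustrative sketch only; key and paths are hypothetical
cache:
  key: buildstream
  paths:
    # Local OSTree artifact cache and git source cache preserved between
    # jobs; gitlab-runner uploads these to the distributed cache server,
    # which is the step currently failing with HTTP 500.
    - cache/artifacts/
    - cache/sources/
```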
*** jonathanmaw has joined #baserock | 10:00 | |
ironfoot | if we haven't seen this error before, it might be because the shared git cache was working | 10:05
ironfoot | that would really help with this case | 10:05 |
ironfoot | I have no idea how this cache server works, although there might be some docs in the infra README | 10:05
jjardon | those are different issues though: one is the cache not working, the other is git.baserock.org failing to serve after a number of jobs | 10:10
noisecell | jjardon, yeah, but if we fix the first one, the second one should disappear because we would be sharing that cache | 10:11
jjardon | noisecell: I think that cache server has been used by the buildstream team for a long time, maybe you can ask them | 10:11
noisecell | which is local to the runners? | 10:11 |
noisecell | jjardon, is that cache server not part of the runners? | 10:12 |
jjardon | cache server is not local to the runners | 10:12 |
jjardon | the cache server should always be an optimization; CI cannot fail because an optimization is not present | 10:12
ironfoot | I agree here | 10:12 |
jjardon | so yeah, maybe tweaking the number of fetchers is a better solution | 10:13
ironfoot | fixing this would also help: https://gitlab.com/BuildStream/buildstream/issues/5# | 10:13 |
ironfoot | g.b.o really needs to be re-deployed so that we can maintain it | 10:14 |
noisecell | I agree | 10:15 |
jjardon | noisecell: feel free to open an MR to reduce the number of fetchers if that helps the CI not to fail | 10:15
noisecell | ironfoot, maybe it's worth reminding buildstream that we are still suffering from that bug... it is already a year old | 10:15
noisecell | jjardon, I've already sent a commit in my branch with that change, let's see what happens | 10:16 |
jjardon | great! in the meantime, maybe poke the bst guys in case they are aware of the cache server being broken | 10:16 |
jjardon | maybe we can redeploy a new one | 10:17 |
noisecell | jjardon, how can I tell that http://10.131.43.226:9005 is the buildstream cache server? | 10:19
noisecell | is there anywhere where they said this? | 10:19 |
jjardon | buildstream is using our runners, so pretty sure it is :) | 10:19 |
ironfoot | that is 46.101.48.48 | 10:20 |
ironfoot | (i have access to the DO console) | 10:20 |
noisecell | pinging 10.131.43.226 does nothing | 10:20 |
ironfoot | noisecell: that's a private address | 10:20
jjardon | I just confirmed: 10.131.43.226 is the internal IP, 46.101.48.48 is the external one | 10:20
ironfoot | the public one is 46.101.48.48 | 10:20 |
noisecell | pinging 46.101.48.48 does work | 10:21 |
jjardon | runners use 10.131.43.226 to connect with it | 10:21 |
jjardon | (they are all in DigitalOcean) | 10:21 |
noisecell | jjardon, you know more about this than I do; why don't you ask in buildstream? Otherwise I'm going to be asking things both there and here | 10:22
jjardon | noisecell: because I do not have time atm, sorry | 10:22 |
noisecell | jjardon, ok | 10:22 |
gitlab-br-bot | infrastructure: merge request (new-runners-manager-ip-address->master: README: Update runners manager IP address) #28 changed state ("opened"): https://gitlab.com/baserock/infrastructure/merge_requests/28 | 10:23 |
ironfoot | jjardon: do you know how this manager and cache server were deployed, and who has access to them? I want to look inside to figure out what's failing | 10:26 |
ironfoot | it feels odd asking bst folks about baserock infra :) | 10:31 |
jjardon | sec | 10:32 |
jjardon | ironfoot: https://gitlab.com/baserock/infrastructure/#gitlab-ci-runners-setup | 10:34 |
jjardon | I'd say the easiest thing is to set up another distributed cache server | 10:34
* ironfoot hides and runs | 10:35 | |
jjardon | It should be easy following the steps at https://docs.gitlab.com/runner/configuration/autoscale.html#distributed-runners-caching | 10:36 |
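The distributed cache described in those docs is configured on the runners manager in gitlab-runner's config.toml, pointing at an S3-compatible object store. A sketch assuming the older flat key layout (recent runner versions nest these keys under `[runners.cache.s3]`); every value below is a placeholder:

```toml
# /etc/gitlab-runner/config.toml -- sketch only, placeholder values
[[runners]]
  name = "baserock-docker-runner"
  executor = "docker+machine"
  [runners.cache]
    Type = "s3"
    ServerAddress = "cache.example.com"  # any S3-compatible endpoint
    AccessKey = "ACCESS_KEY"
    SecretKey = "SECRET_KEY"
    BucketName = "runner-cache"
    Shared = true                        # share the cache across runners
```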
ironfoot | if wanted, I can restart the cache server to see if that helps | 10:48
ironfoot | but I don't have time to do more than that | 10:48 |
noisecell | ironfoot, it won't hurt, it is actually not working, so it might help :/ | 10:49 |
noisecell | unless jjardon is against that because he knows something that we don't | 10:49 |
noisecell | ironfoot, ^^ | 10:50 |
jjardon | noisecell: the only users of that cache server are baserock and buildstream, so taking into account that it seems not to be working, I do not see a problem with restarting it | 11:26
noisecell | ironfoot, sadly the cache server is still not working: https://gitlab.com/baserock/definitions/-/jobs/77447715 | 13:18
ironfoot | yeah, saw that | 13:18 |
ironfoot | now it is even worse | 13:18
noisecell | waiting now to see if limiting the fetchers to 2 solves the other issue | 13:18
noisecell | Duration: 140 minutes 30 seconds -- and the 3rd job in the pipeline hasn't finished yet.... | 15:04
paulsherwood | noisecell: what does the buildstream team recommend here? | 15:05 |
paulsherwood | can you spin up your own artifact server, for example? | 15:05 |
noisecell | paulsherwood, this is the baserock artifact server not running properly as discussed before: https://gitlab.com/baserock/definitions/issues/19 | 15:07 |
noisecell | https://gitlab.com/baserock/definitions/pipelines/24581307 -- this is the pipeline for reference | 15:08 |
noisecell | nothing related to buildstream, though | 15:13
*** gtristan has joined #baserock | 15:19 | |
ironfoot | don't confuse artifact server with git cache server | 15:22 |
noisecell | ironfoot, true | 15:23 |
ironfoot | all the pipelines fetching linux at the same time, not good | 15:23 |
ironfoot | if the runners' shared git cache was working, then this wouldn't happen | 15:24
ironfoot | not an excuse, of course; with all these runners and git.baserock.org being in the cloud, this should just work | 15:24
noisecell | yes, but the pipeline hasn't got that far yet with the change to reduce the fetchers to 2 | 15:24
ironfoot | i see, the pipeline is fetching linux twice in parallel | 15:25 |
ironfoot | so it's like 20 times in total | 15:26 |
ironfoot | paulsherwood: the main problem here is that the CI is broken, making it very hard (if not impossible) to contribute changes | 15:32
paulsherwood | ironfoot: ack. do we know what broke it? | 15:33 |
ironfoot | CI is broken due to various issues related to the baserock infra: | 15:34
ironfoot | - git.baserock.org not serving repositories quickly enough to the runners | 15:34
ironfoot | - shared git cache between the runners not working at all | 15:34 |
ironfoot | - buildstream not being clever about cloning the same repository twice | 15:34
paulsherwood | ah ok | 15:35 |
ironfoot | and somehow, I feel like the artifacts (ostree) cache isn't working either | 15:35 |
* paulsherwood believes that's true... | 15:35 | |
paulsherwood | iiuc they've already realized ostree is not a great fit for the cache-server | 15:35 |
ironfoot | regarding g.b.o not serving repositories fast enough: allowing 24 clients to fetch linux at the same time and expecting it to work might be too much | 15:36
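One hedged sketch of what a shared git cache could look like on the BuildStream side: point the user configuration's source cache at a volume shared between jobs on the same runner host, so linux.git is fetched once rather than once per job. The `sourcedir` key exists in the BuildStream 1.x user config; the mount path here is hypothetical:

```yaml
# ~/.config/buildstream.conf -- sketch; /shared/... is a hypothetical
# volume mounted into every CI job on the same runner host
sourcedir: /shared/buildstream/sources
```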
ironfoot | heheh, we could start by removing some ppc64 and arm64 jobs from the pipeline | 16:28 |
ironfoot | I had a quick look at them and they seem to be using x86 artifacts and building on x86 | 16:29
*** jonathanmaw has quit IRC | 17:10 | |
jjardon | noisecell: hey, did limiting the fetchers resolve the problem? | 17:38 |
*** gary_perkins has quit IRC | 18:03 | |
*** chrispolin has quit IRC | 18:03 | |
*** ltu has quit IRC | 18:03 | |
*** anahuelamo has quit IRC | 18:03 | |
*** tlater has quit IRC | 18:03 | |
*** anahuelamo has joined #baserock | 18:05 | |
*** anahuelamo has quit IRC | 18:05 | |
*** chrispolin has joined #baserock | 18:06 | |
*** paulsherwood has quit IRC | 18:07 | |
*** jmac_ct has quit IRC | 18:07 | |
*** benbrown_ has quit IRC | 18:07 | |
*** tlater has joined #baserock | 18:08 | |
*** anahuelamo has joined #baserock | 18:08 | |
*** gary_perkins has joined #baserock | 18:08 | |
*** laurence- has joined #baserock | 18:09 | |
gitlab-br-bot | infrastructure: merge request (new-runners-manager-ip-address->master: README: Update runners manager IP address) #28 changed state ("opened"): https://gitlab.com/baserock/infrastructure/merge_requests/28 | 19:06 |
gitlab-br-bot | infrastructure: merge request (new-runners-manager-ip-address->master: README: Update runners manager IP address) #28 changed state ("merged"): https://gitlab.com/baserock/infrastructure/merge_requests/28 | 19:06 |
gitlab-br-bot | infrastructure: issue #12 ("Setup new gitlab cache server") changed state ("opened") https://gitlab.com/baserock/infrastructure/issues/12 | 19:10 |
jjardon | noisecell: new gitlab cache server created (using DigitalOcean Spaces instead of our own instance, so one less thing to maintain) | 19:42
gitlab-br-bot | definitions: issue #19 ("Ostree and git cache fails to upload on minimal systems") changed state ("closed") https://gitlab.com/baserock/definitions/issues/19 | 20:11 |
*** gtristan has quit IRC | 20:28 | |
*** paulsherwood has joined #baserock | 21:03 | |
*** gtristan has joined #baserock | 22:25 | |
*** gtristan has quit IRC | 23:51 |