IRC logs for #baserock for Tuesday, 2018-06-26

*** gtristan has quit IRC [01:30]
<gitlab-br-bot> definitions: issue #19 ("Ostree and git cache fails to upload on minimal systems") changed state ("opened") https://gitlab.com/baserock/definitions/issues/19 [08:58]
<noisecell> jjardon, could you limit the number of fetchers in ~/.config/buildstream.conf to 1, just in case this helps with the blockage of the pipelines we have when fetching delta-linux? [09:03]
<noisecell> https://gitlab.com/baserock/definitions/pipelines/24529547 [09:03]
<noisecell> if this is not solved, the only way I can go forward with an MR is pressing the button for every job in parallel in the pipeline myself, which is not ideal [09:04]
<noisecell> :( [09:04]
* noisecell presses buttons for the time being [09:05]
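For context, a minimal sketch of the kind of ~/.config/buildstream.conf change being discussed here; the scheduler/fetchers setting controls how many source fetches BuildStream runs in parallel, and exact keys may vary between BuildStream versions:

    # ~/.config/buildstream.conf (sketch)
    # Limit how many source fetches run in parallel, so git.baserock.org
    # is not hit by many simultaneous clones from a single pipeline job.
    scheduler:
      fetchers: 1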
*** jonathanmaw has joined #baserock [09:09]
*** jonathanmaw has quit IRC [09:45]
<jjardon> noisecell: that has never failed before; are you aware of any change in git.baserock.org? [09:48]
<jjardon> noisecell: you can probably change that parameter in your branch though? [09:49]
<noisecell> jjardon, yeah, I've sent a commit, but it is more a workaround than anything else. We need to see why that is returning 500 [09:49]
<noisecell> when trying to upload the cache [09:50]
<noisecell> jjardon, I did open https://gitlab.com/baserock/definitions/issues/19 [09:50]
<noisecell> and no, I haven't seen any issue in git.baserock.org [09:51]
<noisecell> but to be fair I haven't worked with it for quite a while; it is only now that we are beating the tree with a stick that all these things are coming back to us [09:52]
*** jonathanmaw has joined #baserock [10:00]
<ironfoot> if we haven't seen this error before, it might be because the shared git cache was working [10:05]
<ironfoot> that would really help with this case [10:05]
<ironfoot> I have no idea how this cache server works, although there might be some docs in the infra readme [10:05]
<jjardon> those are different issues though: one is the cache not working, the other is that git.baserock.org fails to serve after a number of jobs [10:10]
<noisecell> jjardon, yeah, but if we fix the first one, the second one should disappear because we would be sharing that cache [10:11]
<jjardon> noisecell: I think that cache server has been used by the buildstream team for a long time, maybe you can ask them [10:11]
<noisecell> which is local to the runners? [10:11]
<noisecell> jjardon, is that cache server not part of the runners? [10:12]
<jjardon> the cache server is not local to the runners [10:12]
<jjardon> the cache server should always be an optimization; CI cannot fail because an optimization is not present [10:12]
<ironfoot> I agree here [10:12]
<jjardon> so yeah, maybe tweaking the number of fetchers is the better solution [10:13]
<ironfoot> fixing this would also help: https://gitlab.com/BuildStream/buildstream/issues/5# [10:13]
<ironfoot> g.b.o really needs to be re-deployed so that we can maintain it [10:14]
<noisecell> I agree [10:15]
<jjardon> noisecell: feel free to open an MR to reduce the number of fetchers if that helps the CI not to fail [10:15]
<noisecell> ironfoot, maybe it is worth reminding buildstream that we are still suffering from that bug... it is 1 year old already [10:15]
<noisecell> jjardon, I've already sent a commit in my branch with that change, let's see what happens [10:16]
<jjardon> great! in the meantime, maybe poke the bst guys in case they are aware of the cache server being broken [10:16]
<jjardon> maybe we can redeploy a new one [10:17]
<noisecell> jjardon, how can I claim that http://10.131.43.226:9005 is the buildstream cache server? [10:19]
<noisecell> have they said this anywhere? [10:19]
<jjardon> buildstream is using our runners, so pretty sure it is :) [10:19]
<ironfoot> that is 46.101.48.48 [10:20]
<ironfoot> (I have access to the DO console) [10:20]
<noisecell> pinging 10.131.43.226 does nothing [10:20]
<ironfoot> noisecell: that's a private address [10:20]
<jjardon> I just confirmed; 10.131.43.226 is the internal IP, 46.101.48.48 is the external one [10:20]
<ironfoot> the public one is 46.101.48.48 [10:20]
<noisecell> pinging 46.101.48.48 does work [10:21]
<jjardon> runners use 10.131.43.226 to connect with it [10:21]
<jjardon> (they are all in DigitalOcean) [10:21]
<noisecell> jjardon, you know better than me about this thing, why don't you ask in buildstream? if not, I'm going to be asking things there and here [10:22]
<jjardon> noisecell: because I do not have time atm, sorry [10:22]
<noisecell> jjardon, ok [10:22]
<gitlab-br-bot> infrastructure: merge request (new-runners-manager-ip-address->master: README: Update runners manager IP address) #28 changed state ("opened"): https://gitlab.com/baserock/infrastructure/merge_requests/28 [10:23]
<ironfoot> jjardon: do you know how this manager and cache server were deployed, and who has access to them? I want to look inside to figure out what's failing [10:26]
<ironfoot> it feels odd asking bst folks about baserock infra :) [10:31]
<jjardon> sec [10:32]
<jjardon> ironfoot: https://gitlab.com/baserock/infrastructure/#gitlab-ci-runners-setup [10:34]
<jjardon> I'd say the easiest thing is to set up another distributed cache server [10:34]
* ironfoot hides and runs [10:35]
<jjardon> It should be easy following the steps at https://docs.gitlab.com/runner/configuration/autoscale.html#distributed-runners-caching [10:36]
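For context, a rough sketch of the distributed cache setup described in the GitLab docs linked above, as it would appear in the runner manager's config.toml; key names and layout vary with the gitlab-runner version, and the server address, bucket and credentials are placeholders:

    # /etc/gitlab-runner/config.toml (sketch; flat cache layout used by 2018-era runners)
    [[runners]]
      # ... existing runner settings ...
      [runners.cache]
        Type = "s3"                          # use an S3-compatible object store for the cache
        ServerAddress = "cache.example.com"  # placeholder cache server address
        AccessKey = "ACCESS_KEY"             # placeholder credentials
        SecretKey = "SECRET_KEY"
        BucketName = "runner-cache"          # placeholder bucket name
        Shared = true                        # let all runners share one cache namespace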
<ironfoot> if wanted, I can restart the cache server to see if that helps [10:48]
<ironfoot> but I don't have time to do more than that [10:48]
<noisecell> ironfoot, it won't hurt, it is actually not working, so it might help :/ [10:49]
<noisecell> unless jjardon is against that because he knows something that we don't [10:49]
<noisecell> ironfoot, ^^ [10:50]
<jjardon> noisecell: the only users of that cache server are baserock and buildstream, so taking into account that it seems to not be working, I do not see a problem with restarting it [11:26]
<noisecell> ironfoot, sadly the cache server is still not working: https://gitlab.com/baserock/definitions/-/jobs/77447715 [13:18]
<ironfoot> yeah, saw that [13:18]
<ironfoot> now it is even worse [13:18]
<noisecell> waiting now to see if limiting the fetchers to 2 solves the other issue [13:18]
<noisecell> Duration: 140 minutes 30 seconds -- and the 3rd job in the pipeline hasn't finished yet.... [15:04]
<paulsherwood> noisecell: what does the buildstream team recommend here? [15:05]
<paulsherwood> can you spin up your own artifact server, for example? [15:05]
<noisecell> paulsherwood, this is the baserock artifact server not running properly, as discussed before: https://gitlab.com/baserock/definitions/issues/19 [15:07]
<noisecell> https://gitlab.com/baserock/definitions/pipelines/24581307 -- this is the pipeline for reference [15:08]
<noisecell> nothing related with buildstream, though [15:13]
*** gtristan has joined #baserock [15:19]
<ironfoot> don't confuse the artifact server with the git cache server [15:22]
<noisecell> ironfoot, true [15:23]
<ironfoot> all the pipelines fetching linux at the same time, not good [15:23]
<ironfoot> if the runners' shared git cache was working, then this wouldn't happen [15:24]
<ironfoot> not an excuse, of course; with all these runners and git.baserock.org being in the cloud, this should just work [15:24]
<noisecell> yes, but the pipeline hasn't got that far yet, with the change to reduce the fetchers to 2 [15:24]
<ironfoot> I see, the pipeline is fetching linux twice in parallel [15:25]
<ironfoot> so it's like 20 times in total [15:26]
<ironfoot> paulsherwood: the main problem here is that the CI is broken, making it very hard (if not impossible) to contribute changes [15:32]
<paulsherwood> ironfoot: ack. do we know what broke it? [15:33]
<ironfoot> CI is broken due to various issues related to the baserock infra: [15:34]
<ironfoot> - git.baserock.org not serving repositories quickly enough to the runners [15:34]
<ironfoot> - shared git cache between the runners not working at all [15:34]
<ironfoot> - buildstream not being clever about cloning the same repository twice [15:34]
<paulsherwood> ah ok [15:35]
<ironfoot> and somehow, I feel like the artifacts (ostree) cache isn't working either [15:35]
* paulsherwood believes that's true... [15:35]
<paulsherwood> iiuc they've already realized ostree is not a great fit for the cache server [15:35]
<ironfoot> regarding g.b.o not serving repositories fast enough: allowing 24 clients to fetch linux at the same time and expecting it to work might be too much [15:36]
<ironfoot> heheh, we could start by removing some ppc64 and arm64 jobs from the pipeline [16:28]
<ironfoot> I had a quick look at them and they seem to be using x86 artifacts and building on x86 [16:29]
*** jonathanmaw has quit IRC [17:10]
<jjardon> noisecell: hey, did limiting the fetchers resolve the problem? [17:38]
*** gary_perkins has quit IRC [18:03]
*** chrispolin has quit IRC [18:03]
*** ltu has quit IRC [18:03]
*** anahuelamo has quit IRC [18:03]
*** tlater has quit IRC [18:03]
*** anahuelamo has joined #baserock [18:05]
*** anahuelamo has quit IRC [18:05]
*** chrispolin has joined #baserock [18:06]
*** paulsherwood has quit IRC [18:07]
*** jmac_ct has quit IRC [18:07]
*** benbrown_ has quit IRC [18:07]
*** tlater has joined #baserock [18:08]
*** anahuelamo has joined #baserock [18:08]
*** gary_perkins has joined #baserock [18:08]
*** laurence- has joined #baserock [18:09]
<gitlab-br-bot> infrastructure: merge request (new-runners-manager-ip-address->master: README: Update runners manager IP address) #28 changed state ("opened"): https://gitlab.com/baserock/infrastructure/merge_requests/28 [19:06]
<gitlab-br-bot> infrastructure: merge request (new-runners-manager-ip-address->master: README: Update runners manager IP address) #28 changed state ("merged"): https://gitlab.com/baserock/infrastructure/merge_requests/28 [19:06]
<gitlab-br-bot> infrastructure: issue #12 ("Setup new gitlab cache server") changed state ("opened") https://gitlab.com/baserock/infrastructure/issues/12 [19:10]
<jjardon> noisecell: new gitlab cache server created (using DigitalOcean Spaces instead of our own instance, so one less thing to maintain) [19:42]
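For context, a sketch of what pointing the runner cache at DigitalOcean Spaces could look like, since Spaces exposes an S3-compatible API; the region endpoint, bucket name and credentials below are hypothetical:

    # config.toml on the runner manager (sketch; Spaces is S3-compatible)
    [runners.cache]
      Type = "s3"
      ServerAddress = "ams3.digitaloceanspaces.com"  # hypothetical Spaces region endpoint
      AccessKey = "SPACES_ACCESS_KEY"                # hypothetical Spaces key pair
      SecretKey = "SPACES_SECRET_KEY"
      BucketName = "baserock-runner-cache"           # hypothetical bucket (Space) name
      Shared = true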
<gitlab-br-bot> definitions: issue #19 ("Ostree and git cache fails to upload on minimal systems") changed state ("closed") https://gitlab.com/baserock/definitions/issues/19 [20:11]
*** gtristan has quit IRC [20:28]
*** paulsherwood has joined #baserock [21:03]
*** gtristan has joined #baserock [22:25]
*** gtristan has quit IRC [23:51]

Generated by irclog2html.py 2.15.3 by Marius Gedminas - find it at mg.pov.lt!