IRC logs for #buildstream for Wednesday, 2020-10-07

03:55 *** tristan has joined #buildstream
03:55 *** ChanServ sets mode: +o tristan
04:10 <tristan> FWIW, I was going to do the IRC public migration bit a few weeks ago
04:11 <tristan> And then I ended up reading some page about freenode which... (A) talks all about how "groups" or something work there, so that the proper groups can control the channels which belong to them... (B) did not provide a link or any useful information at all about how to create or register such a group on freenode
04:11 <tristan> I just wanted to register our channel with ChanServ there before making an announcement, but I don't know how
04:36 <juergbi> maybe the ASF is already registered as a group with freenode
04:41 <tristan> Maybe
04:42 <tristan> I was thinking that we still count as an organization, even if we're supported by ASF
04:42 <tristan> Any way works, but we should be able to register the channel
05:09 *** tristan has quit IRC
05:55 *** tristan has joined #buildstream
05:55 *** ChanServ sets mode: +o tristan
06:02  * tristan wonders why black is configured in pyproject.toml instead of setup.cfg, where everything else is configured
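(Editor's note: the likely answer to tristan's question is that black reads its configuration only from pyproject.toml and does not support setup.cfg, so it cannot live alongside the rest. A typical stanza looks like the sketch below; the values are illustrative, not necessarily BuildStream's actual settings.)

```toml
# pyproject.toml -- black only reads its settings from here,
# not from setup.cfg (values below are illustrative)
[tool.black]
line-length = 100
exclude = "_protos"
```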
06:15 <nanonyme> tristan, what does one need to do to get CI working on an MR to the BuildStream repo?
06:15 <tristan> nanonyme, simply create the branch *in* the BuildStream repo, and not in a forked repo
06:15 <nanonyme> I don't think I have permissions to do that
06:15 <tristan> nanonyme, our policy is to hand out developer access to anyone who asks for it, precisely for this reason
06:16 <tristan> so that people can share the CI setup
06:16 <nanonyme> Just trying to get this into bst-1: https://gitlab.com/BuildStream/buildstream/-/merge_requests/2082
06:16 <tristan> Ah, well this looks like it's already in the BuildStream repo
06:16 <nanonyme> The MR is from a fork
06:17 <tristan> oh, no it's not, it's nanonyme:...
06:18 <tristan> nanonyme, I've just sent you dev access, giving you permission to push to non-protected branches, which will allow you to create an MR within the repo where CI is configured
06:19 <gitlab-br-bot> nanonyme opened MR !2083 (nanonyme/fix-import->bst-1: Fix import of BlobNotFound) on buildstream https://gitlab.com/BuildStream/buildstream/-/merge_requests/2083
06:19 <tristan> :)
06:19 <gitlab-br-bot> nanonyme closed MR !2082 (nanonyme/fix-import->bst-1: Fix import of BlobNotFound) on buildstream https://gitlab.com/BuildStream/buildstream/-/merge_requests/2082
06:19 <tristan> Thanks :)
06:19  * tristan wonders if we have this problem in master
06:20 <nanonyme> Probably not. I assume this was a mistake made while backporting cache stuff as part of 1.6.x
06:20 <gitlab-br-bot> tristanvb approved MR !2083 (nanonyme/fix-import->bst-1: Fix import of BlobNotFound) on buildstream https://gitlab.com/BuildStream/buildstream/-/merge_requests/2083
06:21 <tristan> nod
06:21 <nanonyme> tristan, note, as said, the root cause is possibly https://gitlab.com/BuildStream/buildstream/-/issues/1382 which hasn't gotten much debugging yet
06:21 <tristan> In master we have "from ._cas.casremote import BlobNotFound"
06:21 <nanonyme> It's possible bst1 leaks fds
06:21 <tristan> So I suppose it's not an issue there
06:22 <tristan> It's not a leak
06:22 <tristan> it's a system limit on simultaneously open fds
06:22 <tristan> nanonyme, this is a symptom of bst-1's approach of using the python fuse layer, as I understand it
06:23 <nanonyme> So does using buildbox-casd and buildbox-fuse help?
06:23 <tristan> a lot of fds get opened per build - it can be worked around by increasing the limit at the OS level, and if that's not sufficient, then you need to reduce parallelism, unfortunately
06:24 <tristan> Yes, with the buildbox-fuse and buildbox-casd setup, this is not a problem
06:25 <tristan> Honestly, I have not been very involved in the buildbox side; I am merely repeating what I heard juergbi say on the same topic just last week
06:25 <nanonyme> Just wondering how much that cache backport actually pulled in. I guess it's not enough to move bst1 to buildbox-casd?
06:25 <tristan> But I think it's a credible source ;-)
06:26 <nanonyme> But just enough to support remote asset caches
06:26 <tristan> There was a whole other mountain of backports which I didn't want to risk pulling in, and those would have changed local cache management to make expiry work like in bst-2
06:27 <tristan> which would have been nice, but it seemed like a huge amount of effort that we should be spending towards actually making a bst-2 release instead
06:27 <juergbi> nanonyme: correct, backporting use of buildbox-casd and buildbox-fuse would likely be too big
06:28 <nanonyme> Maybe the default limits in bst-1 should be reduced then? It clearly doesn't scale as well as the default configuration claims it should
06:29 <nanonyme> Even with bst2 released, there's no intent to move already-released versions of bst projects to it. So pragmatically, bst1 maintenance still continues for at least two years after the bst2 release
06:33 <nanonyme> tristan, so. Is the bst1 CAS implementation more likely to fail if you bst build a larger number of components rather than a smaller number?
06:34 <nanonyme> So IOW, if you split into multiple bst build invocations, builds might start passing
06:34 <juergbi> the fd issue is not related to CAS
06:34 <juergbi> it's due to the SafeHardlinks FUSE layer
06:35 <nanonyme> juergbi, where exactly is it then, and how can we work around it properly? As said, we need to keep building this with bst1 for two more years
06:35 <nanonyme> At least
06:35 <nanonyme> Depends on how soon bst2 is out
06:36 <juergbi> iirc, the FUSE layer is running in the job process, so I wouldn't expect it to be affected by the number of concurrent build jobs
06:36 <nanonyme> It seems potentially more likely to fail when there are statistically more items to pull from CAS than to build
06:36 <juergbi> nanonyme: on Linux it's easy to increase the fd limit if you have root access to the build machine
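(Editor's note: for reference, checking and raising the open-file limit on Linux as juergbi describes can look like the sketch below; the limits.conf lines are an assumption about a typical PAM-based system, not a quote from this discussion.)

```shell
# Show the current soft and hard limits on open file descriptors
ulimit -Sn
ulimit -Hn

# A non-root user may raise the soft limit up to the hard limit
ulimit -n "$(ulimit -Hn)"

# Raising the hard limit persistently requires root; on a typical
# PAM-based system this goes in /etc/security/limits.conf, e.g.:
#   * soft nofile 1048576
#   * hard nofile 1048576
```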
06:37 <tristan> You should be able to reduce it with `bst --max-jobs {lower-number} build ...` I think
06:37 <juergbi> tristan: isn't pyfuse running in the job process, i.e., the fds are open in the job process and thus not affected by concurrent jobs?
06:37 <tristan> Hmmm
06:37 <tristan> Good point
06:38 <tristan> juergbi, but wouldn't fewer files be concurrently accessed in that job, if fewer subprocesses are opening files?
06:38 <nanonyme> juergbi, okay, I just noticed an anomaly in our build system. I had builds running on x86_64/i686 (which are failing the most) and they have different open-file ulimits (like a magnitude smaller) than the others
06:39 <juergbi> tristan: ah, right, I was thinking of --builders but you wrote --max-jobs
06:39 <nanonyme> I started https://gitlab.com/freedesktop-sdk/freedesktop-sdk/-/pipelines/198924964 which just outputs ulimit information for all builders
06:39 <tristan> That is of course a better workaround, to increase the limit
06:41 <nanonyme> Yeah, this definitely must be changed. On x86_64/i686 there's a hard/soft limit of 1024 (which is on the boundary of breakingly low) and on the others 1048576
06:41 <juergbi> yes, a hard limit of 1024 is _really_ low
06:41 <juergbi> on a modern system here it's 524288 by default
06:43 <juergbi> maybe we should add a warning at bst startup where we print out the hard limit if it's lower than 64k or so
06:43 <juergbi> in bst 1.x
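(Editor's note: juergbi's suggested startup warning could look roughly like the sketch below, using the stdlib `resource` module; the function name, threshold constant, and message are hypothetical, not actual bst code.)

```python
import resource

# Threshold juergbi suggests (~64k); anything below it risks
# "Too many open files" failures during large builds.
FD_WARN_THRESHOLD = 64 * 1024

def warn_if_fd_limit_low(warn=print):
    """Return (soft, hard) fd limits, warning if the hard limit is low."""
    soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    if hard != resource.RLIM_INFINITY and hard < FD_WARN_THRESHOLD:
        warn(
            "WARNING: open file descriptor hard limit is only %d; "
            "large builds may fail with 'Too many open files'" % hard
        )
    return soft, hard

soft, hard = warn_if_fd_limit_low()
```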
06:44 <nanonyme> 1024 is so low that we get common build failures where it fails to read the CAS push certificate from disk, 20 minutes or so into the build, when it's pulling a lot of things
06:44 <nanonyme> Not even building anything
06:46 <nanonyme> I wonder if the hard limit comes from the docker host here
06:51 <nanonyme> Created an issue on our side, referenced the bst issue. I don't think I can do much about it since the hard limits are so low as well.
06:55 <nanonyme> juergbi, also makes me wonder if the limits have been tightened at some point, since I don't recall seeing these failures when I originally joined the project
06:56 <juergbi> maybe there was a change in CI servers/VMs
06:57 <nanonyme> I'm suspecting the physical host
06:57 <nanonyme> Did another CI round that queries file-max from /proc
07:15 <gitlab-br-bot> marge-bot123 merged MR !2083 (nanonyme/fix-import->bst-1: Fix import of BlobNotFound) on buildstream https://gitlab.com/BuildStream/buildstream/-/merge_requests/2083
08:50 *** santi has joined #buildstream
10:48 *** tristan has quit IRC
11:13 <gitlab-br-bot> nanonyme closed issue #1403 (NameError while processing BlobNotFound by bst 1.6.0) on buildstream https://gitlab.com/BuildStream/buildstream/-/issues/1403
14:03 <nanonyme> juergbi, I wonder if you should even go as far as to refuse to run at all if the hard limit is at 1024...
14:04 <juergbi> maybe
14:05 <nanonyme> Anyway, file-max is just fine. It's definitely a userland issue. But it sounds like the most plausible origin of the configs is the host's security.conf, via the docker service
14:05 <nanonyme> (which is really annoying, it means I can't easily fix it)
14:50 *** cs-shadow has joined #buildstream
15:04 *** dftxbs3e has quit IRC
15:05 *** dftxbs3e has joined #buildstream
16:03 *** robjh has joined #buildstream
16:05 *** robjh has joined #buildstream
17:38 *** santi has quit IRC
17:50 <gitlab-br-bot> TheRealMichaelCatanzaro opened issue #1404 (Hang on "Initializing remote caches") on buildstream https://gitlab.com/BuildStream/buildstream/-/issues/1404
18:04 <benschubert> ^ yeah, it seems python 3.9 doesn't work with BuildStream... *sigh*
18:18 <nanonyme> benschubert, would moving to a thread-based scheduler make it less painful to support new Python versions?
18:19 <nanonyme> Iirc with 3.8 the issue was with some implementation detail in multiprocessing changing
18:19 <nanonyme> (I asked because I noticed you already have an issue open for this)
18:19 <benschubert> nanonyme: probably, I am in the process of testing my branch with python3.9 :)
18:20 <benschubert> my thread-based scheduler at least re-enables coverage with python3.8, so it's probably fixing some of those problems
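(Editor's note: the core idea of a thread-based scheduler, replacing the multiprocessing-based job handling that breaks when multiprocessing internals shift between Python releases, can be illustrated with the stdlib `concurrent.futures`. This is a generic sketch, not BuildStream's actual scheduler API; all names are hypothetical.)

```python
from concurrent.futures import ThreadPoolExecutor, as_completed

def run_job(element):
    # Stand-in for real build work. With threads there is no fork(),
    # so the scheduler no longer depends on multiprocessing internals
    # that change between Python releases (as happened with 3.8/3.9).
    return "built %s" % element

def schedule(elements, builders=4):
    # Run up to `builders` jobs concurrently on threads instead of
    # spawning one subprocess per job
    with ThreadPoolExecutor(max_workers=builders) as pool:
        futures = [pool.submit(run_job, e) for e in elements]
        return [f.result() for f in as_completed(futures)]

results = schedule(["base.bst", "app.bst"])
```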
18:20 <nanonyme> Nice
18:20 <benschubert> I need to find the time to finish it, and still need an MR from tristan to be in before :) But I think I'm getting there for the first implementation
18:21 <benschubert> Still have one deadlock in very rare cases I need to squash
18:24 <nanonyme> benschubert, I'm going through issues. Is #824 affected by your work?
18:24 <gitlab-br-bot> Issue #824: Scheduler requires significant rework https://gitlab.com/BuildStream/buildstream/-/issues/824
18:26 <benschubert> nanonyme: this one seems stale, the symptoms it describes have changed. Though it is true that it needs significant rework
18:27 <benschubert> I'll update it and possibly close it tomorrow
18:27 <benschubert> thanks for surfacing it
18:33 <gitlab-br-bot> nanonyme closed issue #762 (Periodic "ping" from long tasks in non-interactive mode) on buildstream https://gitlab.com/BuildStream/buildstream/-/issues/762
18:38 <nanonyme> Is #731 still a problem? I guess that builds these days?
18:38 <gitlab-br-bot> Issue #731: Freedesktop SDK does not build with master due to missing .git directory https://gitlab.com/BuildStream/buildstream/-/issues/731
18:40 <gitlab-br-bot> nanonyme closed issue #720 (Support alias without : in URL) on buildstream https://gitlab.com/BuildStream/buildstream/-/issues/720
18:47 <nanonyme> Done
18:53 *** hergertme has joined #buildstream
18:56 <nanonyme> benschubert, btw, after your thread-based scheduler is merged, it might make sense to review that this kind of hang, when running in stupid configurations, cannot happen with bst master https://gitlab.com/freedesktop-sdk/freedesktop-sdk/-/jobs/777212844
18:58 <nanonyme> (TBH it probably cannot; it's easy to just avoid re-reading the same data from a file over and over again if you have shared memory)
19:05 <gitlab-br-bot> BenjaminSchubert closed issue #731 (Freedesktop SDK does not build with master due to missing .git directory) on buildstream https://gitlab.com/BuildStream/buildstream/-/issues/731
19:06 <benschubert> nanonyme: thanks :D
19:06 <benschubert> nanonyme: the thing around not enough file descriptors should not be a problem in master already
19:07 <benschubert> also, for python3.9: my branch has 12 tests failing, 10 hanging, the rest passing. So it's better than master, but not perfect yet
19:07 <nanonyme> benschubert, right, because it uses fewer fds. But IMHO the problem there is not really about running out of fds but the fact that the build hangs in some error cases
19:07 <nanonyme> Build should never hang
19:07 <benschubert> ah! yes definitely
19:07 <benschubert> I still have one case where that happens, which I'm trying to reproduce and fix
19:08 <benschubert> and then... we'll need testing and time to see if we still have such cases
19:10 <nanonyme> benschubert, I'm astonished TBH at how well bst1 still works under that setup. Like, we say it uses a lot of fds, but that builder apparently has an fd hard limit of *1024*, yet it hasn't been reproducibly failing (until now in master with 1.6.0)
19:11 <nanonyme> There have clearly been issues before (valentind filed an issue a month ago). They have just been really rare
19:13 <benschubert> Yeah, that's quite something with so few resources
19:14 <benschubert> I remember when I broke it the other way around (reached the ext4 hard limit for hardlinks), you need to push a bit for it to start failing
19:15 <nanonyme> :)
19:18 *** toscalix has joined #buildstream
20:05 <nanonyme> benschubert, I think https://gitlab.com/freedesktop-sdk/freedesktop-sdk/-/merge_requests/3650/diffs will prevent us from ever regressing this way again
20:51 *** toscalix has quit IRC
20:54 <nanonyme> https://gitlab.com/freedesktop-sdk/freedesktop-sdk/-/jobs/778669118 if this fails, then we have bigger problems :) the open file limit is now 1048576
21:04 <benschubert> That should be safe :p
21:04 <nanonyme> Well, that's what we had on the other builders
21:05 <nanonyme> At least setting ulimit in CI should ensure it is the same value everywhere
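(Editor's note: pinning the open-file limit in CI, as nanonyme describes, might look like the sketch below in a `.gitlab-ci.yml`. This is an assumption about the shape of the change, not the actual freedesktop-sdk MR; job and element names are illustrative. As benschubert points out, `ulimit -n` can only go as high as the hard limit the runner's host allows.)

```yaml
# Sketch of raising the soft open-file limit in a GitLab CI job
build:
  before_script:
    - ulimit -Hn          # log the hard limit for debugging
    - ulimit -n 1048576   # raise the soft limit for the build
  script:
    - bst build all.bst   # element name illustrative
```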
21:16 <benschubert> nanonyme: actually it might not, if your host has a lower hard limit
21:17 <benschubert> If it doesn't though, it will be enough
22:24 *** nanonyme has quit IRC
22:47 *** cs-shadow has quit IRC
22:48 *** nanonyme has joined #buildstream

Generated by irclog2html.py 2.15.3 by Marius Gedminas - find it at mg.pov.lt!