*** tristan has joined #buildstream | 03:55 | |
*** ChanServ sets mode: +o tristan | 03:55 | |
tristan | FWIW, I was going to do the IRC public migration bit a few weeks ago | 04:10 |
tristan | And then I ended up reading some page about freenode which.... (A) Talks all about how "groups" or smth works there so that the proper groups can control the channels which belong to them.... (B) Did not provide a link or any useful information at all about how to create or register such a group on freenode | 04:11 |
tristan | I just wanted to register our channel with the chanserv there before making an announcement, but I don't know how | 04:11 |
juergbi | maybe the ASF is already registered as a group with freenode | 04:36 |
tristan | Maybe | 04:41 |
tristan | I was thinking that we still count as an organization, even if we're supported by ASF | 04:42 |
tristan | Any way works, but we should be able to register the channel | 04:42 |
*** tristan has quit IRC | 05:09 | |
*** tristan has joined #buildstream | 05:55 | |
*** ChanServ sets mode: +o tristan | 05:55 | |
* tristan wonders why black is configured in pyproject.toml instead of setup.cfg where everything else is configured | 06:02 | |
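For context: black only reads its configuration from pyproject.toml and has no support for setup.cfg, which is why its settings cannot live alongside the rest of the tooling. An illustrative fragment (the values here are hypothetical, not BuildStream's actual settings):

```toml
# pyproject.toml: the only file black reads configuration from
[tool.black]
line-length = 119
exclude = "\\.eggs|\\.git|build"
```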
nanonyme | tristan, what does one need to do to get CI working on a MR to BuildStream repo? | 06:15 |
tristan | nanonyme, simply create the branch *in* the BuildStream repo, and not in a forked repo | 06:15 |
nanonyme | I don't think I have permissions to do that | 06:15 |
tristan | nanonyme, our policy is to hand out developer access to anyone who asks for it, precisely for this reason | 06:15 |
tristan | so that people can share the CI setup | 06:16 |
nanonyme | Just trying to get this into bst-1 https://gitlab.com/BuildStream/buildstream/-/merge_requests/2082 | 06:16 |
tristan | Ah, well this looks like it's already in the BuildStream repo | 06:16 |
nanonyme | The MR is from fork | 06:16 |
tristan | oh, no it's not, it's nanonyme:... | 06:17 |
tristan | nanonyme, I've just sent you dev access, giving you permission to push to non-protected branches, which will allow you to create an MR within the repo where CI is configured | 06:18 |
gitlab-br-bot | nanonyme opened MR !2083 (nanonyme/fix-import->bst-1: Fix import of BlobNotFound) on buildstream https://gitlab.com/BuildStream/buildstream/-/merge_requests/2083 | 06:19 |
tristan | :) | 06:19 |
gitlab-br-bot | nanonyme closed MR !2082 (nanonyme/fix-import->bst-1: Fix import of BlobNotFound) on buildstream https://gitlab.com/BuildStream/buildstream/-/merge_requests/2082 | 06:19 |
tristan | Thanks :) | 06:19 |
* tristan wonders if we have this problem in master | 06:19 | |
nanonyme | Probably not. I assume this was a mistake made while backporting the cache stuff as part of 1.6.x | 06:20
gitlab-br-bot | tristanvb approved MR !2083 (nanonyme/fix-import->bst-1: Fix import of BlobNotFound) on buildstream https://gitlab.com/BuildStream/buildstream/-/merge_requests/2083 | 06:20 |
tristan | nod | 06:21 |
nanonyme | tristan, note as said, the root cause is possibly https://gitlab.com/BuildStream/buildstream/-/issues/1382 which hasn't gotten much debugging yet | 06:21 |
tristan | In master we have "from ._cas.casremote import BlobNotFound" | 06:21 |
nanonyme | It's possible bst1 leaks fd's | 06:21 |
tristan | So I suppose it's not an issue there | 06:21 |
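The backport bug that !2083 fixes is a missing import, which Python only surfaces lazily. A minimal self-contained sketch of the failure mode (illustrative, not the actual bst-1 code): the exception name in an `except` clause is evaluated only when an exception propagates to it, so the module imports cleanly and the missing import shows up at runtime as a `NameError` that masks the real error.

```python
# Illustrative sketch of the bug class behind issue #1403 (not the
# actual bst-1 code): BlobNotFound is referenced but never imported.

def pull_blob(fetch):
    try:
        return fetch()
    except BlobNotFound:  # name is only evaluated when an exception arrives here
        return None

def missing_blob():
    raise RuntimeError("blob not in remote")

pull_blob(lambda: b"ok")   # fine: the except clause is never evaluated
pull_blob(missing_blob)    # NameError: name 'BlobNotFound' is not defined
```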
tristan | It's not a leak | 06:22
tristan | it's a system limit on simultaneously open fds | 06:22
tristan | nanonyme, this is a symptom of bst-1's approach of using the python fuse layer as I understand it | 06:22 |
nanonyme | So does using buildbox-casd and buildbox-fuse help? | 06:23 |
tristan | a lot of fds get opened per build - it can be worked around by increasing the system limit at the OS level, and if not sufficient, then you need to reduce parallelism unfortunately | 06:23 |
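For reference, the limit tristan refers to is the per-process RLIMIT_NOFILE; a quick way to inspect it from Python on Linux, and to raise the soft limit up to the hard limit (which needs no privileges), is via the standard library `resource` module:

```python
import resource

# Query the current per-process open-file limits.
soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
print(f"open files: soft={soft} hard={hard}")

# Raise the soft limit to the hard limit for this process and its
# children; raising the hard limit itself requires root.
resource.setrlimit(resource.RLIMIT_NOFILE, (hard, hard))
```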
tristan | Yes, with the buildbox-fuse and buildbox-casd setup, this is not a problem | 06:24 |
tristan | Honestly, I have not been very involved in the buildbox side, I am merely repeating what I heard juergbi say on the same topic just last week | 06:25
nanonyme | Just wondering how much that cache backport actually pulled in. I guess it's not enough to move bst1 to buildbox-casd? | 06:25 |
tristan | But I think it's a credible source ;-) | 06:25 |
nanonyme | But just to support remote asset caches | 06:26 |
tristan | There was a whole other mountain of backports which I didn't want to risk pulling in, and those would have changed local cache management to make expiry work like in bst-2 | 06:26 |
tristan | which would have been nice, but it seems like a huge amount of effort that we should instead be spending towards actually making a bst-2 release | 06:27
juergbi | nanonyme: correct, backporting use of buildbox-casd and buildbox-fuse would likely be too big | 06:27 |
nanonyme | Maybe the default limits in bst-1 should be reduced then? It clearly doesn't scale as well as the default configuration claims it should | 06:28
nanonyme | Even with bst2 released, there's no intent to move already-released versions of bst projects to it. So pragmatically, bst1 maintenance still continues for at least two years after the bst2 release | 06:29
nanonyme | tristan, so. Is the bst1 CAS implementation more likely to fail if you bst build a larger number of components rather than a smaller number? | 06:33
nanonyme | So IOW if you split into multiple bst build invocations, builds might start passing | 06:34 |
juergbi | the fd issue is not related to CAS | 06:34 |
juergbi | it's due to the SafeHardlinks FUSE layer | 06:34 |
nanonyme | juergbi, where exactly is it then, and how can we work around it properly? As said, we need to keep building this with bst1 for two more years | 06:35
nanonyme | At least | 06:35 |
nanonyme | Depends on how soon bst2 is out | 06:35 |
juergbi | iirc, the FUSE layer is running in the job process, so I wouldn't expect it to be affected by the number of concurrent build jobs | 06:36 |
nanonyme | It seems potentially more likely to fail when there are statistically more items to pull from CAS than to build | 06:36
juergbi | nanonyme: on Linux it's easy to increase the fd limit if you have root access to the build machine | 06:36 |
tristan | You should be able to reduce it with `bst --max-jobs {lower-number} build ...` I think | 06:37 |
juergbi | tristan: isn't pyfuse running in the job process, i.e., the fds are open in the job process and thus not affected by concurrent jobs? | 06:37 |
tristan | Hmmm | 06:37 |
tristan | Good point | 06:37 |
tristan | juergbi, but, wouldn't fewer files be concurrently accessed in that job, if fewer subprocesses are opening files? | 06:38
nanonyme | juergbi, okay, I just noticed an anomaly in our build system. I had builds running on x86_64/i686 (which are failing the most) and they have different open-file ulimits (orders of magnitude smaller) than the others | 06:38
juergbi | tristan: ah, right, I was thinking of --builders but you wrote --max-jobs | 06:39 |
nanonyme | I started https://gitlab.com/freedesktop-sdk/freedesktop-sdk/-/pipelines/198924964 that just outputs ulimit information for all builders | 06:39 |
tristan | That is of course a better workaround, to increase the limit | 06:39 |
nanonyme | Yeah, this definitely must be changed. On x86_64/i686 there's a hard/soft limit of 1024 (which is on the boundary of breakingly low) and on the others 1048576 | 06:41
juergbi | yes, hard limit of 1024 is _really_ low | 06:41 |
juergbi | on a modern system here it's 524288 by default | 06:41 |
juergbi | maybe we should add a warning to bst startup where we print out the hard limit if it's lower than 64k or so | 06:43 |
juergbi | in bst 1.x | 06:43 |
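A minimal sketch of the startup warning juergbi proposes, assuming a plain print for output (the real patch would presumably go through bst 1.x's own messaging machinery instead):

```python
import resource

RECOMMENDED_HARD_LIMIT = 64 * 1024  # the "64k or so" threshold suggested above

def warn_if_fd_limit_low():
    # Warn at startup if the open-file hard limit looks too low to
    # survive a heavily parallel build.
    soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    if hard != resource.RLIM_INFINITY and hard < RECOMMENDED_HARD_LIMIT:
        print(f"WARNING: open-file hard limit is only {hard}; builds may "
              f"fail under load. Consider raising it to at least "
              f"{RECOMMENDED_HARD_LIMIT}.")
```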
nanonyme | 1024 is so low that we get common build failures where it fails to read the CAS push certificate from disk, 20 minutes or so into the build, when it's pulling a lot of things | 06:44
nanonyme | Not even building anything | 06:44 |
nanonyme | I wonder if the hard limit comes from the docker host here | 06:46 |
nanonyme | Created issue on our side, referenced bst issue. I don't think I can do much about it since hard limits are so low as well. | 06:51 |
nanonyme | juergbi, also makes me wonder if the limits have been tightened at some point since I don't recall seeing these failures when I originally joined the project | 06:55 |
juergbi | maybe there was a change in CI servers/VMs | 06:56 |
nanonyme | I'm suspecting the physical host | 06:57 |
nanonyme | Did another CI round that queries the file-max from /proc | 06:57 |
gitlab-br-bot | marge-bot123 merged MR !2083 (nanonyme/fix-import->bst-1: Fix import of BlobNotFound) on buildstream https://gitlab.com/BuildStream/buildstream/-/merge_requests/2083 | 07:15 |
*** santi has joined #buildstream | 08:50 | |
*** tristan has quit IRC | 10:48 | |
gitlab-br-bot | nanonyme closed issue #1403 (NameError while processing BlobNotFound by bst 1.6.0) on buildstream https://gitlab.com/BuildStream/buildstream/-/issues/1403 | 11:13 |
nanonyme | juergbi, I wonder if you should even go as far as to refuse to run at all if hard limit is at 1024... | 14:03 |
juergbi | maybe | 14:04 |
nanonyme | Anyway, file-max is just fine. It's definitely a userland issue. But it sounds like the most plausible origin of the configs is the host's security.conf, passed through the docker service | 14:05
nanonyme | (which is really annoying, it means I can't easily fix it) | 14:05 |
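For anyone who does control the docker host, dockerd's default ulimits can be overridden so containers no longer inherit the host's restrictive security.conf values; a sketch of /etc/docker/daemon.json with assumed values:

```json
{
  "default-ulimits": {
    "nofile": {
      "Name": "nofile",
      "Soft": 1048576,
      "Hard": 1048576
    }
  }
}
```

(The same override is available per container as `docker run --ulimit nofile=1048576:1048576`.)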
*** cs-shadow has joined #buildstream | 14:50 | |
*** dftxbs3e has quit IRC | 15:04 | |
*** dftxbs3e has joined #buildstream | 15:05 | |
*** robjh has joined #buildstream | 16:03 | |
*** robjh has joined #buildstream | 16:05 | |
*** santi has quit IRC | 17:38 | |
gitlab-br-bot | TheRealMichaelCatanzaro opened issue #1404 (Hang on "Initializing remote caches") on buildstream https://gitlab.com/BuildStream/buildstream/-/issues/1404 | 17:50 |
benschubert | ^ yeah it seems python 3.9 doesn't work with BuildStream... *sigh* | 18:04 |
nanonyme | benschubert, would moving to a thread-based scheduler make it less painful to support new Python versions? | 18:18
nanonyme | Iirc with 3.8 the issue was with some implementation detail in multiprocessing changing | 18:19 |
nanonyme | (I asked because I noticed you already have an issue open for this) | 18:19 |
benschubert | nanonyme: probably, I am in the process of testing my branch with python3.9 :) | 18:19 |
benschubert | my thread based scheduler at least re-enables coverage with python3.8, so probably fixing some of those problems | 18:20 |
nanonyme | Nice | 18:20 |
benschubert | I need to find the time to finish it, and still need an MR from tristan to land first :) But I think I'm getting there for the first implementation | 18:20
benschubert | Still have one deadlock in very rare cases I need to squash | 18:21
nanonyme | benschubert, I'm going through issues. Is #824 affected by your work? | 18:24 |
gitlab-br-bot | Issue #824: Scheduler requires significant rework https://gitlab.com/BuildStream/buildstream/-/issues/824 | 18:24 |
benschubert | nanonyme: this one seems stale, the symptoms it describes have changed. Though it is true that it needs significant rework | 18:26 |
benschubert | I'll update it and possibly close it tomorrow | 18:27 |
benschubert | thanks for surfacing it | 18:27 |
gitlab-br-bot | nanonyme closed issue #762 (Periodic "ping" from long tasks in non-interactive mode) on buildstream https://gitlab.com/BuildStream/buildstream/-/issues/762 | 18:33 |
nanonyme | Is #731 still a problem? I guess that's building these days? | 18:38
gitlab-br-bot | Issue #731: Freedesktop SDK does not build with master due to missing .git directory https://gitlab.com/BuildStream/buildstream/-/issues/731 | 18:38 |
gitlab-br-bot | nanonyme closed issue #720 (Support alias without : in URL) on buildstream https://gitlab.com/BuildStream/buildstream/-/issues/720 | 18:40 |
nanonyme | Done | 18:47 |
*** hergertme has joined #buildstream | 18:53 | |
nanonyme | benschubert, btw, after your thread-based scheduler is merged, it might make sense to verify that this kind of hang in stupid configurations cannot happen with bst master https://gitlab.com/freedesktop-sdk/freedesktop-sdk/-/jobs/777212844 | 18:56
nanonyme | (TBH it probably cannot; it's easy to just avoid re-reading the same data from a file over and over again if you have shared memory) | 18:58
gitlab-br-bot | BenjaminSchubert closed issue #731 (Freedesktop SDK does not build with master due to missing .git directory) on buildstream https://gitlab.com/BuildStream/buildstream/-/issues/731 | 19:05 |
benschubert | nanonyme: thanks :D | 19:06 |
benschubert | nanonyme: the thing around not having enough file descriptors should already not be a problem in master | 19:06
benschubert | also, for python3.9: my branch has 12 tests failing, 10 hanging, the rest passing. So it's better than master, but not perfect yet | 19:07 |
nanonyme | benschubert, right, because it uses fewer fds. But IMHO the problem there is not really about running out of fds but the fact that the build hangs in some error cases | 19:07
nanonyme | Build should never hang | 19:07 |
benschubert | ah! yes definitely | 19:07 |
benschubert | I still have one case where that happens, which I'm trying to reproduce and fix | 19:07 |
benschubert | and then... we'll need testing and time to see if we still have such cases | 19:08 |
nanonyme | benschubert, I'm astonished TBH how well bst1 still works under that setup. Like, we say it uses a lot of fds, but that builder apparently has an fd hard limit of *1024*, yet it hasn't been reproducibly failing (until now in master with 1.6.0) | 19:10
nanonyme | There have clearly been issues before (valentind filed an issue a month ago). They have just been really rare | 19:11 |
benschubert | Yeah, that's quite something with so little resources | 19:13 |
benschubert | I remember when I broke it the other way around (reached the ext4 hard limit for hardlinks); you need to push a bit for it to start failing | 19:14
nanonyme | :) | 19:15 |
*** toscalix has joined #buildstream | 19:18 | |
nanonyme | benschubert, I think https://gitlab.com/freedesktop-sdk/freedesktop-sdk/-/merge_requests/3650/diffs will prevent us from ever regressing this way again | 20:05
*** toscalix has quit IRC | 20:51 | |
nanonyme | https://gitlab.com/freedesktop-sdk/freedesktop-sdk/-/jobs/778669118 if this fails, then we have bigger problems :) open file limit is now 1048576 | 20:54 |
benschubert | That should be safe :p | 21:04
nanonyme | Well, that's what we had on other builders | 21:04 |
nanonyme | At least setting ulimit in CI should ensure it is the same value everywhere | 21:05
benschubert | nanonyme: actually it might not, if your host has a lower hard limit | 21:16
benschubert | If it doesn't though it will be enough | 21:17 |
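The linked MR isn't reproduced here, but a CI-side bump like this is one plausible shape for it, shown as a `.gitlab-ci.yml` sketch with hypothetical values; per benschubert's caveat, `ulimit` can only raise the soft limit up to whatever hard limit the docker host grants the container:

```yaml
before_script:
  - ulimit -Hn           # log the hard limit the container was given
  - ulimit -Sn 1048576   # raise the soft limit; fails if the hard limit is lower
```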
*** nanonyme has quit IRC | 22:24 | |
*** cs-shadow has quit IRC | 22:47 | |
*** nanonyme has joined #buildstream | 22:48 |