*** benschubert has quit IRC | 00:01 | |
*** phildawson has quit IRC | 02:09 | |
*** phildawson has joined #buildstream | 02:22 | |
*** tristan has quit IRC | 05:18 | |
*** tristan has joined #buildstream | 06:13 | |
*** ChanServ sets mode: +o tristan | 06:13 | |
*** tristan has quit IRC | 06:30 | |
*** tristan has joined #buildstream | 06:39 | |
*** ChanServ sets mode: +o tristan | 06:39 | |
*** benschubert has joined #buildstream | 06:57 | |
tristan | benschubert, Ummm, your 6K element test project.... is it 6K elements *wide* by chance ? | 07:30 |
tristan | It appears to be impossible to have a project that is > 1K elements *tall* | 07:30 |
benschubert | tristan: No, it's not a flat project | 07:30 |
benschubert | oh? | 07:30 |
benschubert | It's definitely not 1k elements tall though | 07:30 |
* tristan gets recursion error | 07:30 | |
benschubert | it's based on `base-files` in debian (and all its deps) | 07:30 |
* tristan used a simple script to make a chain of elements depend on each other | 07:31 | |
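A rough sketch of the kind of generator script described here, with hypothetical file names and chain length: each element depends only on the previous one, so the dependency graph is thousands of elements *tall*, which is what trips Python's default recursion limit (roughly 1000 frames) in a recursive dependency walk.

```python
# Sketch of a chain generator (names and paths are made up for illustration):
# chain-N.bst depends on chain-(N-1).bst, so loading the top of the chain
# recursively has to descend the full height of the project.
import os
import sys

def write_chain(directory: str, count: int) -> None:
    os.makedirs(directory, exist_ok=True)
    for i in range(count):
        lines = ["kind: stack\n"]
        if i > 0:
            lines += ["depends:\n", "- chain-{}.bst\n".format(i - 1)]
        with open(os.path.join(directory, "chain-{}.bst".format(i)), "w") as f:
            f.writelines(lines)

if __name__ == "__main__":
    # Python's default recursion limit is ~1000 frames, which is why a project
    # much taller than that blows up a recursive graph traversal.
    print("recursion limit:", sys.getrecursionlimit())
    write_chain("elements", 2000)
```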
tristan | I see | 07:31 |
tristan | You know, we have an `except RecursionError` in _frontend/app.py which says, "Sorry, you're SOL" | 07:31 |
benschubert | wait what? | 07:31 |
tristan | in more professional language | 07:32 |
benschubert | ah, I got worried :'D | 07:32 |
tristan | So that you don't get to see any stack trace | 07:32 |
tristan | I guess that's better than a stack trace | 07:32 |
benschubert | unless you --debug :) | 07:32 |
tristan | Really ? | 07:33 |
* tristan uncomments | 07:33 | |
benschubert | at least it used to be the case | 07:33 |
benschubert | if it's not the case anymore we need to re-enable that | 07:33 |
* tristan also noticed that it appears that --error-lines is getting ignored | 07:33 | |
tristan | I get the stack trace in a BUG message if I uncomment that handling | 07:34 |
tristan | But then, it shows me a hard coded limit of stderr lines | 07:34 |
tristan | or of exception lines | 07:34 |
tristan | Without consulting Context.log_error_lines | 07:34 |
* tristan guesses we really need to have a test for that if we want people to pay attention... sadly :-S | 07:35 | |
tristan | maybe it's just with stack traces | 07:36 |
tristan | no no | 07:36 |
tristan | benschubert, --debug doesn't cause the RecursionError stacktrace to be printed either ;-) | 07:36 |
tristan | Aaaaaanyway | 07:37 |
tristan | Polish for another day, and hope it stays polished after that | 07:37 |
benschubert | yeah :/ | 07:37 |
* tristan doesn't think we need to cover every little detail with a test case, and would rather expect people to fully understand code they are replacing | 07:38 | |
benschubert | oh by the way, I had a wild thought the other day: we are using mypy for the public API, how do you find it? Useful? I was wondering if we should switch from "only the minimum for the public API to pass" to "as you wish, as long as it's well made and not blocking others" ? | 07:38 |
tristan | I would like to get to a place where it is *everywhere* so long as there is no runtime overhead | 07:38 |
tristan | I don't know what the official policy is, but I guess it makes sense to mandate it in the public API | 07:39 |
tristan | Getting it *everywhere* would force us to restructure modules to avoid any circular dependencies of modules at all | 07:39 |
benschubert | yeah currently we've said "only public" | 07:39 |
tristan | Which would be the main benefit I would see | 07:39 |
benschubert | not necessarily, but yeah that could help with such refactors at least | 07:40 |
tristan | I mean, right now we think it's okay to call a method on an object whose class we did not even import | 07:40 |
tristan | That kind of stuff would become impossible | 07:40 |
tristan | not sure, but thought it might | 07:41 |
benschubert | still possible with forward declarations and `if TYPE_CHECKING` guards | 07:41 |
benschubert | so it's more work but it's possible | 07:41 |
benschubert | however it becomes obvious :) | 07:41 |
tristan | Yeah, that too, we can expose the warts in their horrible glory | 07:41 |
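For reference, the forward-declaration pattern mentioned above looks roughly like this; the module and class names are hypothetical, not BuildStream's real layout:

```python
# The import only runs for the type checker, so no circular import happens at
# runtime, but mypy still sees the real type behind the string annotation.
from typing import TYPE_CHECKING

if TYPE_CHECKING:
    from ._hypothetical_module import Element   # never imported at runtime

class Widget:
    def attach(self, element: "Element") -> None:
        # The wart is now visible: this module calls into Element without a
        # real runtime import of its class.
        element.do_something()
```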
benschubert | so would you be rather in favor if I start a ML thread about allowing mypy anywhere we want? | 07:41 |
tristan | I think I've already pushed code which adds annotations in non-public API | 07:42 |
tristan | nobody caught me doing it ;-) | 07:42 |
benschubert | ok I take that for a yes :D | 07:42 |
tristan | PluginFactory | 07:42 |
tristan | That uses some, not completely but some | 07:42 |
benschubert | second point I wanted to ask: I have found `black` to be very useful for our formatting. One last thing bugs me: our import statements are never consistent. Would you be fine with using `isort` in addition, to sort all our imports _in a single possible way_ ? | 07:43 |
tristan | Of course, it makes no sense for cdef cython things, but I guess outward facing def/cpdef functions on cython classes can still do it right ? | 07:43 |
tristan | what is isort ? | 07:43 |
benschubert | for cython, we can use .pyi files like I did for yaml.pyx :) | 07:44 |
benschubert | isort is a tool that focuses on one thing: sorting import statements for python | 07:44 |
benschubert | it has an autoformat mode and a check mode | 07:44 |
tristan | right, I would not favor alphabetic, but the webpage says something about sections | 07:44 |
tristan | not sure how it would work | 07:44 |
benschubert | you can decide, I want "stdlib - third party - first party - tests" for example | 07:44 |
tristan | we have a nicely treeish thing | 07:44 |
benschubert | and then inside each section you can say: "normal imports first, then 'from ... import', sorted alphabetically" | 07:45 |
benschubert | ah you prefer the tree? | 07:45 |
tristan | So I'd like to have [standard libraries] -> [parent package] -> [local package] imports | 07:45 |
tristan | I would not like those to be interspersed | 07:46 |
benschubert | yeah, you can ask for this :) | 07:46 |
tristan | from . import foo goes last | 07:46 |
tristan | preceded by from .. | 07:46 |
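As a concrete illustration of the ordering being asked for (module names are examples only, and this would sit inside a package so the relative imports resolve), the grouping isort can be configured to enforce would look like:

```python
# Standard library first
import os
from collections import OrderedDict

# Third party next
import click

# Parent package, then the local package last
from .. import utils
from . import widgets
from .widgets import Widget
```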
benschubert | So if we can get such an order, you'd be in favor of an auto-import formatter? | 07:46 |
tristan | Yep :) | 07:46 |
tristan | That's pretty much my only nit I think | 07:47 |
tristan | Yeah | 07:47 |
benschubert | ok! I'll see what I can do and push it to the list then, thanks | 07:47 |
traveltissues | tristan: is there a planned release of 1.93.4.dev0 to pypi? | 07:48 |
tristan | traveltissues, not particularly | 07:49 |
traveltissues | it would be useful if that were published. would you please consider that? | 07:49 |
tristan | traveltissues, pypi for dev snapshots at this point is more of an on demand/depending on whether cs-shadow feels like it kinda thing | 07:49 |
tristan | Pushing out tags is cheap | 07:50 |
benschubert | tristan: I can ping cs-shadow to ask him if he can push it if you need it | 07:50 |
tristan | Going and doing a whole pypi dance is another thing, we'd probably have another tag out by the time a pypi dance is performed | 07:50 |
tristan | benschubert, Ask traveltissues ... I'm not the one expressing a need :D | 07:50 |
benschubert | oops, I meant traveltissues | 07:50 |
benschubert | my autocomplete tricked me :) | 07:51 |
tristan | Ah, 't' ! | 07:51 |
tristan | 'tr' ! | 07:51 |
tristan | traveltissues, we're sorted in the same bucket ;-) | 07:51 |
tristan | bucket buddies | 07:51 |
traveltissues | well i don't strictly need it, it would be nicer if we could add the local dir to the sys.path when installing in dev mode | 07:53 |
benschubert | traveltissues: what do you mean? | 07:54 |
traveltissues | if you install bst-plugins-experimental in dev mode | 07:55 |
traveltissues | >>> print(sys.path) | 07:55 |
traveltissues | ['', '/usr/lib/python38.zip', '/usr/lib/python3.8', '/usr/lib/python3.8/lib-dynload', '/root/.local/lib/python3.8/site-packages', '/bst-plugins-experimental/src', '/usr/lib/python3.8/site-packages'] | 07:55 |
traveltissues | trying the same thing with bst and copying the source, libs, and binary to another container breaks it | 07:55 |
traveltissues | bash-5.0# bst | 07:55 |
traveltissues | Traceback (most recent call last): | 07:55 |
traveltissues | File "/usr/local/bin/bst", line 5, in <module> | 07:55 |
traveltissues | from buildstream._frontend import cli | 07:55 |
traveltissues | ModuleNotFoundError: No module named 'buildstream' | 07:55 |
benschubert | what do you call dev mode? `pip install -e ` ? | 07:55 |
traveltissues | yes | 07:55 |
benschubert | are you doing this in a venv? | 07:56 |
traveltissues | in docker | 07:56 |
benschubert | so you are `pip install -e buildstream` and it doesn't work? | 07:56 |
traveltissues | no, the installation works, copying this to another container doesn't. that's not the surprising thing, what i'm wondering is why is bst-plugins-experimental still found when it's copied but buildstream is not | 07:57 |
benschubert | you might not be copying everything correctly | 07:58 |
benschubert | pip install -e does quite a few things under the hood | 07:58 |
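One of the things it does, depending on the pip/setuptools version, is drop an `.egg-link` file and/or a `.pth` entry into site-packages that points back at the source checkout, so copying a container without that pointer (or without the checkout at the same path) leaves the module unimportable. A quick, hedged inspection sketch:

```python
# List what editable installs left behind in site-packages; if the paths these
# files point at don't exist in the destination container, the import breaks.
import glob
import os
import site
import sys

print(sys.path)
candidates = site.getsitepackages() + [site.getusersitepackages()]
for directory in candidates:
    for pattern in ("*.egg-link", "*.pth"):
        for path in glob.glob(os.path.join(directory, pattern)):
            print(path)
```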
benschubert | I'm curious why you need to copy stuff like that? | 07:58 |
traveltissues | yes, but i'm trying to understand what the difference is here if they're both installed the same way | 07:58 |
benschubert | bst-plugins-experimental is not a cli | 07:58 |
tristan | I think the next tag will be more interesting to have on pypi for what it's worth | 07:59 |
traveltissues | i'm copying it because the container i have doesn't have the tools to install, it doesn't have pip for example | 07:59 |
tristan | The last tag doesn't include the junction merge | 07:59 |
tristan | Which is a game changer | 07:59 |
benschubert | traveltissues: ./setup.py develop would work too | 07:59 |
tristan | I'd like to land a fix to Element.search() supporting full paths properly first and get all the full-pathing cases right | 08:00 |
traveltissues | nice | 08:00 |
benschubert | tristan: speaking of junction jungle.... it's not possible to have a project depending on itself right? | 08:00 |
traveltissues | i didn't know that, thanks benschubert | 08:00 |
tristan | benschubert, yes it is | 08:00 |
benschubert | traveltissues: no problem, guess it should solve your problem :) | 08:00 |
benschubert | tristan: oh! perfect | 08:00 |
traveltissues | yes, it did | 08:00 |
* benschubert is going back to his recursive project | 08:01 | |
tristan | benschubert, I think it needs to be 'internal' though | 08:01 |
tristan | we don't have a notation to express 'self' in 'duplicates' | 08:01 |
benschubert | tristan: yep my aim is to have it internal anyways | 08:02 |
tristan | But I think it's alright this way, the use cases for self junctioning I can think of should be internal, and we could extend with that if the need arises | 08:02 |
benschubert | yep, my aim is to have a stage3 rebuild itself | 08:02 |
benschubert | so internal is exactly what I need | 08:02 |
tristan | yeah, bootstrapping is the main use case | 08:02 |
tristan | (the known one anyway) | 08:03 |
tristan | So it looks like for me to do meaningful benchmarks, I really need to run the same intense project 1000 times and take an average | 08:04 |
tristan | The results are so erratic | 08:04 |
benschubert | yeah the way I run them is 4 runs and I discard the first on the debian-like | 08:05 |
tristan | only 4 ? maybe with 6K elements you have more stable results | 08:05 |
benschubert | yep | 08:06 |
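A minimal sketch of that measurement loop, assuming `bst show` of a single target is the workload and the first run is discarded as warm-up (the element name and run count are placeholders):

```python
# Time repeated `bst show` invocations and average everything but the first,
# cold-cache run; the stdev gives a feel for how erratic the results are.
import statistics
import subprocess
import time

def time_show(element: str, runs: int = 4):
    samples = []
    for _ in range(runs):
        start = time.monotonic()
        subprocess.run(["bst", "show", element], check=True,
                       stdout=subprocess.DEVNULL)
        samples.append(time.monotonic() - start)
    warm = samples[1:]  # discard the warm-up run
    return statistics.mean(warm), statistics.stdev(warm)

if __name__ == "__main__":
    mean, stdev = time_show("target.bst")
    print("mean: {:.2f}s, stdev: {:.2f}s".format(mean, stdev))
```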
tristan | With my measly 900K tall project it varies a lot | 08:06 |
tristan | 900 tall | 08:06 |
tristan | -K | 08:06 |
benschubert | you've seen the difference in timing in my results :) (the third column) | 08:06 |
tristan | yeah | 08:07 |
tristan | Ok what should I do :-S | 08:07 |
tristan | Hmmm | 08:07 |
tristan | benschubert, mind if I toss you some branches to run benchmarks, you can do this in background without too much distraction ? | 08:08 |
tristan | I'd like to try without the unneeded ref/unref calls, and try a more complete array approach to see if that improves things | 08:08 |
tristan | the first one I can dish out in a minute | 08:08 |
benschubert | Sure, won't be the same machine and will take longer but I can :) | 08:09 |
tristan | alright | 08:09 |
tristan | benschubert, pushed removal of refcounting | 08:13 |
benschubert | on it :) | 08:13 |
tristan | Aha | 08:14 |
tristan | benschubert, wait, can you run it while comparing the second last commit too ? | 08:14 |
tristan | Since it's a different machine, I'd like to compare the cost of that with the last, see if it makes a difference | 08:15 |
benschubert | sure | 08:15 |
traveltissues | i've been out of the loop for a while. what happened with the autobenchmark bot effort? | 08:20 |
tristan | Ok that will be helpful, I think I might have a breakthrough too, I'll let you know | 08:20 |
* tristan has spotted a crucial difference between the old and new _variables code | 08:20 | |
benschubert | traveltissues: not sure it is "ready"yet? Haven't heard about it for a bit | 08:23 |
tristan | Last I heard, we had a plan to at least run `bst build` for every commit in the benchmarks repo, without gating it on CI, and automatically updating a webpage with that result | 08:29 |
tristan | Even that never happened | 08:30 |
tristan | I think it just involves using an API key and interacting with gitlab restful API | 08:30 |
tristan | I noticed where my code is more complex. | 08:37 |
tristan | The old code has the _expstr_map, which is a dictionary of "either list or string" | 08:37 |
tristan | If it's a string, then it's resolved, if it's a list, then it needs to be recursively resolved | 08:37 |
tristan | There is a quicker return for resolved values | 08:38 |
benschubert | oh good catch | 08:38 |
tristan | Adding a second dictionary for resolved values gets much more similar performance, *but* it would significantly increase the memory footprint | 08:39 |
tristan | Looking at the current code, it would seem that the way things do get resolved, we don't technically need the ValueClasses for already resolved values | 08:39 |
tristan | i.e. you cannot have a circular reference error | 08:40 |
tristan | or an unresolved counterpart, for something that is already resolved | 08:40 |
tristan | So... I could probably change this to do a similar "str-or-Value" dictionary | 08:40 |
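A toy version of that "str-or-unresolved" table, mirroring the quick return the old _expstr_map gets for free (this is a sketch of the idea, not the real _variables code, and the variable names are invented):

```python
# Each entry is either a plain resolved string, or a list of
# (is_literal, text) parts that still needs resolving.  Resolved results are
# written back as flat strings so later lookups take the quick path.
def resolve(name, table):
    value = table[name]
    if isinstance(value, str):        # quick return for resolved values
        return value
    resolved = "".join(
        text if is_literal else resolve(text, table)
        for is_literal, text in value
    )
    table[name] = resolved            # cache the flat string for next time
    return resolved

variables = {
    "prefix": "/usr",
    "bindir": [(False, "prefix"), (True, "/bin")],   # i.e. "%{prefix}/bin"
}
print(resolve("bindir", variables))   # -> /usr/bin
```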
tristan | crap, that might not be it :-S | 08:44 |
* tristan was looking at false results | 08:44 | |
tristan | I could still be close though | 08:44 |
tristan | Hmmm, ok I need to rethink this | 08:45 |
tristan | too many dictionary lookups while resolving a variable, that is what is changing things | 08:45 |
tristan | Ahhhh, ok so a tiny quicker return could be helpful but maybe not | 08:46 |
tristan | However, when I do the reverse loop over value dependencies, instead of indexing variables, the resolved values of previously iterated values could be readily in context | 08:47 |
tristan | this would match the nature of the recursive resolve_variable_and_use_result algo we had before | 08:47 |
tristan | https://gitlab.com/BuildStream/buildstream/-/blob/tristan/partial-variables/src/buildstream/_variables.pyx#L332 | 08:48 |
tristan | iter_value.resolve() right there, could be given the previously resolved values, the order of the array should ensure that any value it *wants*, is the next value in the list | 08:49 |
tristan | so if a value needs 2 variables, it takes the last 2 previously resolved ones | 08:49 |
tristan | (whether they were resolved this round or not) | 08:49 |
WSalmon | juergbi, I reran this job quite a few times, it always crashes while just pulling, https://gitlab.com/celduin/bsps/bst-boardsupport/-/jobs/622631814 but https://gitlab.com/celduin/bsps/bst-boardsupport/-/commit/b3d248c38efb191088258df8d5a2d090de8fc038 this change allows it to get MUCH further, I'm mostly running 1.93.4, see master of https://gitlab.com/freedesktop-sdk/infrastructure/freedesktop-sdk-docker-images for the exact versions of everything | 09:00 |
WSalmon | i presume https://gitlab.com/freedesktop-sdk/infrastructure/freedesktop-sdk-docker-images/-/blob/master/project.refs#L176 is the version of casd that should not cause excessive memory usage? | 09:01 |
cphang | WSalmon I thought that the error handling of buildbox was cleaned up so there are more informative errors now? | 09:01 |
WSalmon | cphang, i think the issue is that the system is OOM so everything crashes a bit badly | 09:02 |
WSalmon | when I changed the other stages to be limited like that, I did it after running locally with the amount of RAM that those runners have. but I thought we had updated things since then. I should have done that for the last job too, but gitlab-ci variable substitution only goes 2 variables deep, which makes everything tricky, so that stage got missed.. | 09:07 |
WSalmon | I didn't mean to send ^, I was trying to delete it and fat-fingered, but you get the gist | 09:07 |
juergbi | WSalmon: do you have the kernel log? | 09:10 |
juergbi | to see the casd memory usage at time of OOM kill | 09:11 |
WSalmon | no, all i have is https://gitlab.com/celduin/bsps/bst-boardsupport/-/jobs/622631814/artifacts/browse/artifact/ | 09:11 |
juergbi | casd 0.0.9 should indeed already have the memory usage fixes but there might be more issues | 09:11 |
juergbi | WSalmon: how much RAM do these runners have? and are multiple docker containers running on a single runner without RAM restrictions? could it be caused by another container? | 09:13 |
juergbi | WSalmon: running buildbox-casd with `/usr/bin/time -v` prints the peak memory usage, iirc | 09:14 |
juergbi | if you want to look into this issue, describing steps that trigger higher than expected peak memory usage in casd would be very helpful | 09:15 |
juergbi | could be done by isolated pulling | 09:15 |
juergbi | I thought I did this before and after my memory fixes and it was reasonable afterwards | 09:16 |
WSalmon | juergbi, they are amazonec2-instance-type=a1.2xlarge with 16GB of RAM | 09:16 |
juergbi | however, my focus was on handling of large files, maybe there is an additional issue with lots of regular sized files (instead of individual large files) | 09:16 |
WSalmon | they have one ci job per vm | 09:16 |
WSalmon | so each ci job has 16GB to itself minus the OS and docker overhead, the vms are spun up by docker machine each morning | 09:17 |
juergbi | ok, so presumably 10 pulls at once trigger this on 16 GB | 09:17 |
WSalmon | yep | 09:17 |
WSalmon | afaict | 09:17 |
juergbi | should definitely not be happening | 09:17 |
juergbi | might be a leak | 09:18 |
juergbi | (the fixed issue was not a leak, simply not-optimal code requiring high peak usage) | 09:18 |
WSalmon | I can't remember if I have tried it with 0.0.9 but I have had it use that much RAM locally for that job | 09:18 |
WSalmon | * the x86 version | 09:18 |
juergbi | the ISA will likely not make a big difference | 09:19 |
juergbi | so it should be possible to reproduce this on x86-64 | 09:19 |
WSalmon | that's what I meant by locally | 09:19 |
WSalmon | incidentally you can pull that arm job on to your x86 machine with `rpi4-features]$ bst -o target_arch aarch64 build deploy/image.bst` | 09:20 |
WSalmon | i think you have the keys to that cache but its open for anyone to pull from | 09:21 |
WSalmon | if you clone and run from the sample/rpi4-feature folder | 09:22 |
juergbi | WSalmon: I think I can reproduce it | 09:36 |
WSalmon | :D | 09:36 |
juergbi | doesn't appear to be a memory leak but rather excessive memory usage with large trees | 09:41 |
juergbi | e.g. can be seen by the memory usage of pulling bootstrap/acl-build-deps.bst alone | 09:41 |
juergbi | WSalmon: https://gitlab.com/BuildGrid/buildbox/buildbox-casd/-/issues/76 | 10:00 |
tristan | Whoa, ok lists are definitely more expensive than arrays | 10:00 |
WSalmon | thanks juergbi | 10:03 |
juergbi | I'm just realizing, while this will fix 'the large artifact of small files' issue, it's still not ideal for large files | 10:05 |
WSalmon | juergbi, they want writing strat to disk? | 10:07 |
WSalmon | straight | 10:08 |
WSalmon | presumably it's worth not pulling too fast as well, as then you just get clogged up in IO issues? | 10:08 |
WSalmon | or will linux handle that nicely | 10:08 |
juergbi | we have to write them to disk, that's where our CAS cache is | 10:09 |
juergbi | (disk could be tmpfs but that can require a lot of RAM) | 10:09 |
WSalmon | right, I was just thinking that in some data centers you have huge pull bandwidth but terrible disk IO | 10:11 |
WSalmon | but that's probably premature optimisation, and we just need to get them onto disk and off RAM quicker | 10:11 |
juergbi | yes, there isn't too much that can be done about this | 10:12 |
juergbi | except striving towards reducing unnecessary pulling - or using machines with tons of RAM and tmpfs | 10:12 |
juergbi | *and | 10:13 |
WSalmon | i think bst defaulting to unlimited pullers might not be the best bet either | 10:15 |
juergbi | it defaults to 10 max | 10:15 |
WSalmon | ah | 10:18 |
coldtom | 10 pullers is still enough to put a lot of stress on a machine ime | 10:18 |
WSalmon | +1 | 10:18 |
WSalmon | Especially for anything based on FD | 10:18 |
benschubert | you can always set the default in your user conf no? | 10:20 |
juergbi | yes, the best default will vary depending on your machine and your connection | 10:21 |
WSalmon | benschubert, yes but it seems silly to have a default that requires you to have a 32GB machine..? having it be a function of ram or cpu seems more sane to me, but getting buildbox-casd to write straight to disk seems like the most important thing | 10:21 |
WSalmon | if you have a huge machine you may well need to bump it up anyway | 10:22 |
juergbi | the buildbox-casd issue needs to be fixed, of course | 10:22 |
juergbi | I don't think we should reduce the default in buildstream because of that | 10:22 |
benschubert | I agree with juergbi, it seems to work well with 10 even on my core 2duo 8GB ram machine | 10:22 |
WSalmon | benschubert, have you pulled 10 big chunks of FD from a fast cache like celduin's at the same time, or were you fetching from sources like github that are quite slow? | 10:25 |
benschubert | WSalmon: I have a local cache at home, so we're talking 1GBps :) | 10:26 |
benschubert | and I'm currently working with clang and friends | 10:27 |
benschubert | my version of bst is not 100% up to date though | 10:27 |
WSalmon | umm, interesting. also fetching and pulling work quite differently; I have thought for a while that they should have different limiters, as fetching hits the network via the source tool but pulling is via casd | 10:28 |
juergbi | the issue is with large artifacts | 10:29 |
juergbi | the relevant fdo-sdk artifacts are gigabytes (single artifact) | 10:29 |
juergbi | as they compose build-deps into an artifact of their own | 10:29 |
benschubert | juergbi: but that's a very old bug no? I remember opening it when I had 60GB artifacts | 10:29 |
benschubert | or was it fixed and came back in? | 10:30 |
juergbi | there was a capture issue that was fixed | 10:30 |
juergbi | for pulling I don't think it's a regression | 10:30 |
juergbi | at least on the buildbox side | 10:30 |
juergbi | it's very possible we fixed this in buildstream before we handed this task to buildbox-casd | 10:31 |
juergbi | (or it was correct from the beginning in pre-casd buildstream) | 10:31 |
benschubert | tristan: https://gitlab.com/BuildStream/buildstream/-/compare/master...bschubert%2Fisort that's roughly what isort would look like, there's still 1-2 details I'd need to fix around comments though | 10:49 |
benschubert | juergbi: seems like the only tests now failing are the ones using an artifact_share, do you have any idea where we could have more problems? | 10:50 |
juergbi | benschubert: BaseRemote.init? | 10:51 |
juergbi | can hopefully get rid of that fairly soon with step 2 of Remote Asset support | 10:51 |
benschubert | ah good point let me try | 10:52 |
tristan | benschubert, looking at a modified file (instead of the diff): https://gitlab.com/BuildStream/buildstream/-/blob/2833a6abd562a628f4039e74a1dadf12661d2b6d/src/buildstream/_frontend/app.py ... those comments are somehow preserved from the source code ? | 11:04 |
* tristan can see that private modules come before public ones, simply due to the sort, which seems alright | 11:04 | |
benschubert | tristan: yeah the comments are preserved but not moved correctly, I'd need to go through all of them to ensure they are in the correct order / remove the stale ones ;) | 11:08 |
benschubert | juergbi: it works! awesome, thanks a lot | 11:12 |
benschubert | ok now cleanup and make a WIP | 11:13 |
juergbi | :) | 11:13 |
tristan | benschubert, so confusing results so far, any word on that benchmark ? | 11:20 |
benschubert | let me go and check | 11:20 |
tristan | it will be interesting just to know if it makes any difference | 11:20 |
tristan | and if so, how much | 11:20 |
* tristan actually *lost* performance by removing unnecessary dict lookups | 11:21 | |
* tristan palmface | 11:21 | |
tristan | doesn't seem to make any sense | 11:21 |
benschubert | tristan: https://gitlab.com/snippets/1992419 though I realized I forgot to do one of the checks | 11:22 |
benschubert | which other commit did you want again? | 11:22 |
tristan | oh no :( | 11:22 |
tristan | I wanted the tip and the before tip | 11:23 |
tristan | just to measure the difference with and without refcounting | 11:23 |
benschubert | ok let me restart with the one before | 11:24 |
benschubert | is just the `bst show` enough? | 11:24 |
benschubert | I can speed up benchmarks a lot if so :) | 11:24 |
tristan | yes | 11:25 |
tristan | its enough | 11:25 |
benschubert | tristan: i went for everything: https://gitlab.com/snippets/1992419#note_373289531 | 12:00 |
*** tristan has quit IRC | 12:02 | |
*** tristan has joined #buildstream | 12:20 | |
*** ChanServ sets mode: +o tristan | 12:20 | |
benschubert | juergbi: seems like I've got still more problems :'D https://gitlab.com/BuildStream/buildstream/-/jobs/623129875#L860 | 14:09 |
benschubert | the diff's https://gitlab.com/BuildStream/buildstream/-/compare/master...bschubert%2Fno-multiprocessing | 14:10 |
benschubert | I'm not 100% sure how that can happen :/ | 14:10 |
* douglaswinship is particularly concerned by https://gitlab.com/BuildStream/buildstream/-/jobs/623129875#L6023 | 14:16 | |
douglaswinship | The error messages seem to be getting quite insulting! | 14:16 |
juergbi | benschubert: so that's ostree-related, right? | 14:16 |
juergbi | ostree is also known for background thread issues | 14:16 |
juergbi | do you only see this in CI? maybe it depends on the ostree version | 14:17 |
benschubert | juergbi: no sorry, line 860 :) | 14:17 |
benschubert | I've seen the ostree ones and I'll have a look at them | 14:17 |
benschubert | but the one line 860 seems worse currently | 14:17 |
juergbi | ah, hm | 14:18 |
juergbi | benschubert: CASDChannel.get_local_cas() is not thread-safe | 14:19 |
benschubert | oh gosh :'D | 14:20 |
juergbi | it assumes that _local_cas is set if _casd_channel is set | 14:20 |
benschubert | I've got a lock around establish_connection though | 14:20 |
benschubert | and same on the close | 14:20 |
juergbi | yes, but the lock is not used in get_local_cas | 14:21 |
juergbi | changing the conditional in get_local_cas to check for _local_cas probably fixes it | 14:21 |
benschubert | https://gitlab.com/BuildStream/buildstream/-/compare/master...bschubert%2Fno-multiprocessing#6b68d6e75602b4de08330322d00edd6aabf0a718_249_246 isn't that enough? | 14:21 |
benschubert | I mean, the 3 only get set at the same time right? | 14:21 |
juergbi | can't the GIL switch between threads between these assignments? | 14:22 |
benschubert | oh | 14:22 |
benschubert | -_-' right | 14:22 |
benschubert | gah thanks | 14:23 |
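A stand-in sketch of the race and the suggested fix (this is not the real CASDChannel, just an illustration): because the GIL can switch threads between the individual attribute assignments, the getter either has to take the same lock or key its check on the attribute it actually returns.

```python
import threading

class Channel:
    """Illustrative stand-in for a lazily-connected channel object."""

    def __init__(self):
        self._lock = threading.Lock()
        self._casd_channel = None
        self._local_cas = None

    def _establish_connection(self):
        # Called with self._lock held; the assignments happen one at a time,
        # so another thread could observe _casd_channel set but _local_cas not.
        self._casd_channel = object()   # placeholder for the gRPC channel
        self._local_cas = object()      # placeholder for the LocalCAS stub

    def get_local_cas(self):
        # Check the attribute we actually return (and double-check it under
        # the lock) instead of checking _casd_channel.
        if self._local_cas is None:
            with self._lock:
                if self._local_cas is None:
                    self._establish_connection()
        return self._local_cas
```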
benschubert | now ostree... | 14:23 |
benschubert | # Use local-only GIO, don't use the DBus-based GVfs backend. | 14:37 |
benschubert | # GVfs/DBus uses background threads, which can cause problems | 14:37 |
benschubert | # with BuildStream's fork-based process model. | 14:37 |
benschubert | os.environ["GIO_USE_VFS"] = "local" | 14:37 |
benschubert | Seems like we should not have threads? -_-' | 14:37 |
juergbi | benschubert: maybe GIO initialization is triggered at some earlier point | 14:43 |
benschubert | even setting it before the import of ostree doesn't change | 14:44 |
juergbi | try setting that environment variable early on in buildstream startup, if you can reproduce this | 14:44 |
benschubert | yeah I can reproduce easily locally | 14:44 |
juergbi | glib/gio might be triggered by something non-ostree, although I don't know what | 14:45 |
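The ordering being suggested, sketched out; the assumption (matching the guesswork above rather than a verified cause) is that GIO picks its VFS backend the first time something in the process touches it, which may be earlier than the ostree import:

```python
import os

# Must be exported before anything in the process touches GIO -- e.g. at the
# very top of the frontend entry point, not inside the plugin.
os.environ["GIO_USE_VFS"] = "local"

import gi                          # noqa: E402
gi.require_version("OSTree", "1.0")
from gi.repository import OSTree   # noqa: E402
```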
WSalmon | so I can pull down an artifact built on a different arch, but when I try to check it out it tries to run the sandbox. the final element is a script element with no runtime deps; I presume bst wants to run some final integration steps, can I turn that off? https://pastebin.com/cFX2ZGyT | 14:45 |
benschubert | juergbi: putting that in my pytest env doesn't change anything either | 14:47 |
abderrahim[m] | WSalmon: --no-integrate may help | 14:47 |
WSalmon | i just noticed that but it seems not | 14:47 |
WSalmon | https://pastebin.com/SUkzCW8E | 14:48 |
juergbi | benschubert: can you do a 'thread apply all bt' in gdb at a point where the extra thread exists? | 14:48 |
juergbi | maybe the backtrace of that thread provides some hints | 14:48 |
WSalmon | juergbi, tristan, benschubert i would have thought the --no-integration would fix it; is that a bug? or am I using it wrong / does it do something else? | 14:49 |
juergbi | WSalmon: maybe we instantiate the sandbox when we don't need to | 14:49 |
WSalmon | ok, ill go look at the code | 14:50 |
juergbi | I wouldn't expect artifact checkout --no-integrate to require the sandbox | 14:50 |
juergbi | it should even work without buildbox-run installed | 14:50 |
benschubert | juergbi: I can tell they are not python threads, I'm trying to get more info | 14:54 |
benschubert | do you know by any chance where's the code/doc for all of this? | 14:55 |
benschubert | (I mean ostree python) | 14:55 |
juergbi | benschubert: there is no Python OSTree code per se. pygobject dynamically generated Python bindings for the libostree C library | 15:06 |
benschubert | oh :/ less than ideal | 15:07 |
juergbi | the background thread is very likely to be created by libostree (or a dependency of that), not pygobject | 15:07 |
juergbi | it may also depend on the HTTP backend of ostree. iirc, there is a compile-time option between curl and libsoup | 15:08 |
benschubert | so what would you recommend? In the ostree tests we increment the potential number of threads? | 15:08 |
tristan | I'm not sure I understand the context of why ostree libraries starting threads becomes problematic | 15:10 |
tristan | I thought it was mostly problematic that it gets called too early when using the fork() model | 15:10 |
tristan | Quick recommendation though which would result in (A) Easier install story, (B) Lots of code removed, (C) Addressing any conflicts with threads | 15:11 |
juergbi | without fork it should be much less of an issue | 15:11 |
tristan | Would be: Just stop using python bindings for ostree, and call it normally with Plugin.call() | 15:11 |
juergbi | however, it would still be good to be sure that everything is cleaned up after each test | 15:11 |
juergbi | i.e., the tests are isolated | 15:12 |
benschubert | yeah that's the problem, for testing | 15:12 |
tristan | That would also have the benefit of (D): Not special casing that Python API break anymore | 15:12 |
juergbi | if the command line tool is sufficient for the purpose of the ostree source, that might indeed make sense | 15:12 |
juergbi | originally we depended on libostree as it was also used as our artifact cache | 15:13 |
juergbi | so the CLI might not have been sufficient | 15:13 |
juergbi | but purely for the source, it might be | 15:13 |
benschubert | that's true that this would solve the problem | 15:13 |
benschubert | and we need ostree-cli anyways to run the plugin | 15:13 |
benschubert | so in the end... | 15:13 |
tristan | I think the initial decision to use python bindings was rather arbitrary, but definitely driven by using it as artifact cache | 15:14 |
tristan | iirc there was a deeper API we needed for the ostree pushing thing to work well | 15:14 |
benschubert | So you'd both be fine with ostree-cli? We'd lose the reporting of progress, which I think is an improvement :) | 15:14 |
tristan | Haha, that progress has always been whacky | 15:15 |
tristan | percentage up and down as you discover during pull that your total is changing | 15:15 |
*** hasebastian has joined #buildstream | 15:15 | |
benschubert | ok! I'll see what I can do | 15:15 |
juergbi | I'd be fine with that. we don't have this for other sources either | 15:15 |
juergbi | benschubert: that said, fixing the background thread might be as simple as adding a close() call somewhere | 15:16 |
juergbi | with the forked job processes, background threads get eliminated by termination, so we didn't care | 15:16 |
benschubert | juergbi: I've run dir() on every ostree-related object we create | 15:16 |
benschubert | none has a `close` method | 15:16 |
benschubert | afaik | 15:16 |
tristan | Myeah, I don't expect state of the art | 15:17 |
tristan | I don't think it's GIOStream based | 15:17 |
juergbi | benschubert: have you also tried setting .repo to None? | 15:17 |
tristan | I was thinking that | 15:18 |
juergbi | essentially we might need the inverse of the ensure() method | 15:18 |
benschubert | I've `del repo` | 15:18 |
tristan | Even if it's GC'ed synchronously and "disposed", it would not be uncommon for the thread itself to asynchronously complete (no g_thread_join() in dispose()) | 15:19 |
juergbi | could be the case, yes | 15:19 |
juergbi | if ostree cli works, let's go for that | 15:20 |
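For what it's worth, a very rough sketch of what the CLI route could look like from the source plugin, using the Plugin.call() helper mentioned above. The exact ostree flags, the repo mode, and the remote handling are assumptions that would need checking against the real plugin and `ostree --help`:

```python
# Hypothetical helper inside the ostree source plugin; `source.call()` is the
# BuildStream Plugin API for running host commands with a failure message.
def fetch_with_cli(source, mirror_dir, url, ref):
    # Create the mirror repository on first use
    source.call(
        ["ostree", "init", "--repo=" + mirror_dir, "--mode=archive-z2"],
        fail="Failed to initialize OSTree mirror repository",
    )
    # Register the remote explicitly rather than relying on a default
    source.call(
        ["ostree", "remote", "add", "--if-not-exists", "--no-gpg-verify",
         "--repo=" + mirror_dir, "origin", url],
        fail="Failed to add remote {}".format(url),
    )
    # Mirror the requested ref into the local repository
    source.call(
        ["ostree", "pull", "--repo=" + mirror_dir, "--mirror", "origin", ref],
        fail="Failed to fetch ref {} from {}".format(ref, url),
    )
```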
benschubert | we're waiting up to 5s for all threads to close | 15:20 |
* tristan recalls spending time to make this happen synchronously for EDS EBookClient handles, and noticing that there wasn't much interest in having this work | 15:20 | |
* tristan rubs tabasco into his eyeball | 15:23 | |
tristan | sad | 15:23 |
benschubert | What part of the code prompted you to do that so I can avoid it? | 15:23 |
tristan | Right before the return from the pizza() function | 15:24 |
tristan | whew ok let's give this another go... I'm gonna throw it ALL away, _variable.py... the Values, the ValueClass, the ValueParts, all the nice stuff | 15:26 |
tristan | And see if I can't get a performant loop-instead-of-recurse using the ol' list returned from python regex | 15:26 |
tristan | I'm not particularly hopeful, every way I look at it, avoiding recursion requires allocating data structures | 15:27 |
tristan | So it would logically be slower than pushing a stack frame | 15:27 |
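A sketch of that loop-instead-of-recurse shape, driven by the alternating literal/variable-name list that re.split() with a capturing group returns, with an explicit work stack in place of recursive calls. The %{...} syntax matches BuildStream variables, but the rest is illustrative and skips circular-reference detection:

```python
import re

VARIABLE = re.compile(r"%\{([^}]+)\}")

def expand(name, variables):
    resolved = {}
    stack = [name]
    while stack:
        current = stack[-1]
        if current in resolved:
            stack.pop()
            continue
        # Odd indices are variable names, even indices are literal text
        parts = VARIABLE.split(variables[current])
        missing = [v for v in parts[1::2] if v not in resolved]
        if missing:
            stack.extend(missing)          # resolve dependencies first
            continue
        parts[1::2] = [resolved[v] for v in parts[1::2]]
        resolved[current] = "".join(parts)
        stack.pop()
    return resolved[name]

print(expand("bindir", {"prefix": "/usr", "bindir": "%{prefix}/bin"}))
# -> /usr/bin
```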
Frazer61 | hi, I'm getting an issue where it says "Did not find 'bzr' in PATH" when testing in the venv, I have tried installing bzr but it doesn't seem to fix it unfortunately | 15:29 |
tristan | That is odd | 15:31 |
tristan | and when you run `tox -e venv /bin/bash` and type `which bzr`, do you have it ? | 15:31 |
Frazer61 | it's when I try to build cachekey. nothing; when I try to install bzr in the venv: https://pastebin.com/hQS9ePxB | 15:32 |
tristan | uhhh | 15:33 |
tristan | is there a reason to install bzr into the venv itself ? | 15:34 |
tristan | maybe bzr is implemented in python (I don't keep track) but it shouldn't matter I think | 15:34 |
Frazer61 | https://pastebin.com/y1s4F5S1 | 15:34 |
Frazer61 | for bst build in cachekey | 15:34 |
tristan | right so when you are not in the venv, you have host bzr installed ? | 15:35 |
tristan | `which bzr` ? | 15:35 |
tristan | and when you are in `tox -e venv /bin/bash`, `which bzr` doesn't find your host bzr ? | 15:36 |
Frazer61 | no, because when i try to install it, it gives the error shown in the first pastebin. happens locally and in venv | 15:36 |
tristan | ok but what OS are you running ? | 15:37 |
Frazer61 | fedora 32 | 15:37 |
tristan | don't you have a reliable way of installing bzr on your host ? | 15:37 |
Frazer61 | using pip to try and install it | 15:37 |
tristan | what is the latest rpm -i... `aptitude` ? | 15:37 |
tristan | No thats not fedora | 15:37 |
tristan | I forget | 15:37 |
tristan | Frazer61, try using a reliable method provided by your distro to install bzr, that should work | 15:38 |
Frazer61 | when i try to install bzr it tries to install breezy, but i thought it was bazaar. is it fine to continue? | 15:39 |
Frazer61 | via dnf | 15:39 |
* tristan doesn't know what it is on fedora :-S | 15:40 | |
Frazer61 | dnf is equivalent to apt | 15:40 |
tristan | https://fedoraproject.org/wiki/Changes/ReplaceBazaarWithBreezy | 15:40 |
tristan | I mean I don't know what bzr is on fedora :) | 15:40 |
coldtom | breezy is bazaar iirc | 15:40 |
Frazer61 | nice bit of confusion :), thanks will give it a go | 15:41 |
tristan | Frazer61, yeah so it looks like it will work, different implementation of the same thing | 15:41 |
Frazer61 | y | 15:41 |
tristan | indeed confusing | 15:41 |
Frazer61 | sorry ignore that | 15:41 |
*** phildawson has quit IRC | 15:41 | |
Frazer61 | hopefully should get the first bit of bst source show working | 15:41 |
tristan | Frazer61, awesome, please do remember to summarize and get that formal proposal on the list though | 15:42 |
tristan | I know, hacking is more fun | 15:42 |
Frazer61 | will do, just using elements from fetch to get it working and go from there to make a proposal for what we want as a final solution | 15:43 |
tristan | It should pass without issue but not everyone is on IRC all the time, so it's better to have it formally on the list, and we can link to that in a resulting merge request etc for good provenance | 15:43 |
tristan | Yup :) | 15:43 |
benschubert | tristan: it does remove quite a bit of code from ostree to use the cli :D | 16:00 |
benschubert | i should have a prospective PR soon | 16:01 |
WSalmon | https://gitlab.com/BuildStream/buildstream/-/commits/willsalmon/checkout_no_sandbox feels really close to working. do we have a pattern for creating a vdir without a sandbox? I can just make a temp dir context and trigger one with that.. but I wonder if we already do it somewhere, so I can make sure I am consistent with it. juergbi maybe? | 16:03 |
juergbi | WSalmon: could just create a CASBasedDirectory directly | 16:06 |
juergbi | WSalmon: a possible alternative is to create a SandboxDummy | 16:07 |
juergbi | not sure off the top of my head what's the cleanest approach | 16:07 |
WSalmon | thanks juergbi | 16:17 |
benschubert | juergbi: tristan https://gitlab.com/BuildStream/bst-plugins-experimental/-/merge_requests/123/ still two tests failing but should work well :) | 16:40 |
juergbi | nice | 16:41 |
benschubert | not sure what's the deal with the failures :/ | 16:59 |
benschubert | thought I had reproduced the API correctly | 16:59 |
tristan | nice | 17:01 |
tristan | what are the failures ? gpg key related ? | 17:02 |
tristan | I think I added a test recently to ensure that keeps working | 17:02 |
tristan | I'm not sure if command line would tolerate changes of gpg key presence well either | 17:03 |
tristan | i.e. it happens in repo initialization, so if you add a gpg key or remove it later, it should probably mean you need a separate mirror directory for it ? | 17:04 |
tristan | repo url + hash of gpg key ? | 17:04 |
tristan | Either that or, `ostree init` needs to be able to reconfigure the repo at the beginning of every session ? | 17:05 |
benschubert | tristan: one sec, finishing the run, but apparently not, more fetch related afaict? | 17:06 |
tristan | not spotting the problem easily | 17:12 |
* tristan has been looking it over | 17:12 | |
tristan | what happens ? | 17:12 |
benschubert | I'm just waiting on a test to finish to share, will be simpler | 17:13 |
benschubert | but basically `remote origin not found` is one | 17:13 |
benschubert | and `${ref}.commit doesn't exist` | 17:13 |
benschubert | (two different errors) | 17:14 |
tristan | might be is_cached giving you trouble somehow ? | 17:15 |
tristan | that doesn't seem to ensure the remote anymore ? | 17:16 |
benschubert | oh, did it use to? shoot | 17:16 |
tristan | was just noticing that in the diff | 17:16 |
tristan | Also if is_cached() is a delegated method, which I think is one of the things replacing get_consistency(), seems unnecessary to call it in fetch() | 17:17 |
tristan | but probably our plugin facing API docs need to be clarified to specify these invariants | 17:17 |
benschubert | tristan: true I can remove that | 17:18 |
tristan | is_cached() should of course be able to return False if the mirror directory doesnt even exist | 17:19 |
tristan | like we shouldnt require spending time initializing an empty repo | 17:19 |
benschubert | yep I've just fixed that | 17:19 |
benschubert | [gw0] [ 66%] FAILED mirror.py::test_mirror_junction_from_includes[ostree] | 17:20 |
benschubert | [gw4] [ 53%] FAILED tests/sources/ostree.py::test_fetch_gpg_verify | 17:20 |
benschubert | Those are the two failures now | 17:20 |
tristan | clue: The reason why we don't just use 'origin' all around is because of mirrors | 17:21 |
tristan | if we have some commands that just rely on default remote, that could cause mirror tests to fail | 17:22 |
benschubert | so I need to set the default remote all the time? | 17:22 |
benschubert | I thought we only set one remote for every one? | 17:22 |
tristan | I don't think so | 17:22 |
tristan | Just guessing every command needs to specify the remote | 17:23 |
tristan | maybe in the old code we were setting the default remote ? not sure | 17:23 |
benschubert | https://gitlab.com/BuildStream/bst-plugins-experimental/-/jobs/623441172#L575 is one of them :/ | 17:24 |
benschubert | yeah I'm really not familiar with ostree | 17:24 |
*** hasebastian has quit IRC | 17:27 | |
WSalmon | so this works locally with and without --no-integration https://gitlab.com/BuildStream/buildstream/-/commits/willsalmon/checkout_no_sandbox juergbi the only difference in reality as far as I can tell is that self.__configure_sandbox(sandbox) is not called, would that affect what gets checked out for a --no-integration checkout? | 17:29 |
WSalmon | juergbi, ^ | 17:29 |
WSalmon | as far as I can tell that would only affect the case with integration | 17:29 |
tristan | benschubert, looks like it should be fairly easy to figure out with the failed test data and the ostree command line locally though | 17:30 |
tristan | You can literally copy paste the commands, run them on the test data and figure out where things went wrong | 17:30 |
tristan | I'm not sure if you want --mirror for instance, you might, I don't know | 17:31 |
benschubert | tristan: right, that seems the right way to go :D | 17:33 |
benschubert | Seems also like my multiprocessing change prevents termination :/ | 17:36 |
benschubert | pause works perfectly though | 17:36 |
benschubert | so there is that | 17:36 |
tristan | cool ! | 17:51 |
tristan | benschubert, checking `ps axf` to make sure you've put everything to sleep ? | 17:52 |
benschubert | I'll try that | 17:52 |
tristan | TS I think you want... iirc | 17:52 |
benschubert | So I was building LLVM, 12 cores used, load average 12. ctrl + c and in a second no more cores used at all :) | 17:53 |
benschubert | that works | 17:53 |
benschubert | the only thing I'm not 100% sure is around the python code | 17:53 |
benschubert | and continue -> restart immediately | 17:53 |
tristan | some things don't respond well, I think ninja might still suck at job control wrt SIGTSTP | 17:54 |
tristan | not our fault if those process trees don't suspend | 17:54 |
benschubert | yep, we're doing the best we can | 17:54 |
benschubert | but unless we are the kernel, well :D | 17:54 |
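The suspend/resume mechanics being described, as a minimal sketch; the child command is a placeholder, and whether a real build tool honours SIGTSTP cleanly is exactly the open question above:

```python
import os
import signal
import subprocess
import time

# Start the job in its own session/process group so the signal reaches the
# whole process tree, not just the immediate child.
proc = subprocess.Popen(["sleep", "60"], start_new_session=True)
pgid = os.getpgid(proc.pid)

time.sleep(1)
os.killpg(pgid, signal.SIGTSTP)   # suspend the group (shows as 'T' in ps axf)
time.sleep(1)
os.killpg(pgid, signal.SIGCONT)   # resume it again

proc.terminate()
proc.wait()
```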
tristan | We could get better support from the kernel in the future | 17:55 |
tristan | with containers | 17:55 |
tristan | who knows :) | 17:55 |
benschubert | ah namespaces yeah :) | 17:55 |
* benschubert is losing his sanity on the last bug of ostree | 17:55 |
tristan | join the club :) | 17:55 |
benschubert | WSalmon: what are the symptoms of your OOM errors again? | 17:56 |
tristan | I've lost it already earlier this week trying to get non-recursive variable resolution to perform decently :) | 17:56 |
tristan | Another thing to consider wrt ostree: compatibility with the older plugin | 17:56 |
benschubert | what kind of compatibility? | 17:57 |
tristan | if for any reason we do things differently (repo config or smth) causing an old cached repo to not work with the new cli based access commands | 17:57 |
tristan | then we'd have to namespace the cache directory | 17:57 |
benschubert | no I don't think that's a problem | 17:58 |
tristan | and trigger an expensive re-download of probably a lot of data (ostree mostly used for big downloads) | 17:58 |
tristan | yeah I doubt it | 17:58 |
WSalmon | I can't find an example but casd just uses all of the memory | 18:03 |
benschubert | ok I'm giving up for today, will come back to it tomorrow :'D | 18:03 |
benschubert | WSalmon: And what exit code? | 18:03 |
WSalmon | then it depends what needs a chunk of memory once it's all gone | 18:03 |
benschubert | ok, will need to monitor my build more closely | 18:04 |
benschubert | it's not succeeding and I've got 32GB of RAM :/ | 18:04 |
WSalmon | :( | 18:05 |
WSalmon | i should be able to find one but not now | 18:05 |
WSalmon | ill ping you one | 18:05 |
WSalmon | when i find it | 18:05 |
WSalmon | have a good non buildstreaming evening :D | 18:06 |
*** tomaz has joined #buildstream | 18:36 | |
tomaz | people, is there any `kind` for installing a ruby-gem? | 18:37 |
*** tomaz has quit IRC | 18:51 | |
WSalmon | ohhh I helped someone out with one a while back | 19:24 |
WSalmon | can't remember if it was here or in FD-SDK | 19:25 |
benschubert | Yep I can confirm, with casd's master from two days ago I can't build llvm on a 32GB machine | 19:35 |
WSalmon | toma | 20:49 |
WSalmon | sorry, fat fingers | 20:49 |
*** pointswaves has joined #buildstream | 21:01 |