*** phildawson has quit IRC | 02:42 | |
*** traveltissues has quit IRC | 05:42 | |
*** tristan has quit IRC | 05:54 | |
*** mohan43u has quit IRC | 05:59 | |
*** mohan43u has joined #buildstream | 05:59 | |
*** benschubert has joined #buildstream | 07:17 | |
*** Trevinho has joined #buildstream | 07:52 | |
*** benschubert has quit IRC | 08:02 | |
*** benschubert has joined #buildstream | 08:04 | |
*** Trevinho_ has quit IRC | 08:05 | |
benschubert | Would people be ok with making a new dev release of bst-plugins-experimental? It's accumulated quite a bit of bug fixes over time | 08:19 |
---|---|---|
WSalmon | +1 for new devtag for bst-plg-exp, on a related note we could also do with a new dev tag of bst soon, as the cache push fixes are a huge deal but i think there are still some bugs in master that it might be good to squash first, eg, junctions with the same name in subprojcets and checkout cross arch | 08:24 |
WSalmon | douglaswinship, did you land some fixes in bst-plugins-experimental recently or are they on there way, if they have not landed could we hold off till they are in? | 08:25 |
WSalmon | benschubert, ^ | 08:25 |
benschubert | WSalmon: there is no open MR from douglaswinship on experimental. We can wait a bit, but we'll need a new one soon as I need the ostree changes for some changes in master | 08:27 |
WSalmon | douglaswinship, is usually about from a little after 10am uk time so we should be clearer then | 08:30 |
WSalmon | i think it was https://gitlab.com/BuildStream/bst-plugins-experimental/-/merge_requests/125 so lets wait for him to confirm but if it is just that one then a tag would probably help me and douglaswinship too | 08:32 |
benschubert | ok! | 08:36 |
*** santi has joined #buildstream | 08:42 | |
*** tristan has joined #buildstream | 08:52 | |
*** ChanServ sets mode: +o tristan | 08:52 | |
juergbi | WSalmon: do you know of a bug with regards to junctions in master? please note that semantics have changed as discussed on ML (and here) | 09:06 |
tristan | juergbi, I verified, there is indeed a bug (but it's not the same one which WSalmon encountered and pasted on IRC) | 09:11 |
tristan | I will need to reproduce in a test case and fix it properly | 09:12 |
WSalmon | juergbi, sorry i meant to make a issue, i rased the issue here with tristan when i tried to go from 1.93.4 to master but i have not made a issue | 09:12 |
WSalmon | ah tristan beet me to the punch | 09:12 |
juergbi | ok | 09:12 |
* WSalmon dose to file a issue | 09:13 | |
WSalmon | goes | 09:13 |
tristan | it has something to do with a toplevel having a link to a subsubproject, in a certain order; we hit an exception in https://gitlab.com/BuildStream/buildstream/-/blob/master/src/buildstream/_loader/loader.py#L404 | 09:13 |
tristan | WSalmon, I do have a feeling you were not using master when you posted it, as you posted something like "ERROR: Dude, you gave me a 'link' element, but I wanted a 'junction' here !" | 09:14 |
tristan | The issue I get with master is that there is no ElementFactory created in the subproject in order to instantiate the junction | 09:15 |
tristan | It would appear the order in which subprojects get instantiated is significant, perhaps when the link accesses the subsubproject before the direct junction does, the order of project resolution is messed up | 09:15 |
tristan | Something like project.ensure_fully_loaded() is required at the right step | 09:16 |
WSalmon | tristan, sorry i missed the tags to me from yesteday when i logged on today i will make a issue and check the reproduction | 09:16 |
tristan | Thanks :) | 09:16 |
tristan | I do have a lot of new tests landed with the branch, so I think this is really about order of which subprojects get loaded (I'm sure I have a similar test scenario, but I'll bet that the direct subproject is instantiated before a link-through-subproject is accessed) | 09:17 |
douglaswinship | WSalmon, benschubert: Yes, https://gitlab.com/BuildStream/bst-plugins-experimental/-/merge_requests/125 was the change I was pushing for in plugins-experimental, and it merged yesterday. | 09:25 |
douglaswinship | So I'm fine with a new dev reliease. | 09:26 |
benschubert | Ok! tristan, juergbi anything against a bst-plugins-experimental release now? | 09:26 |
douglaswinship | WSalmon: thanks for raising it btw. Good to know you're looking out for me and my MRs :) | 09:27 |
juergbi | benschubert: no objections | 09:29 |
WSalmon | douglaswinship, no worries :D | 09:29 |
WSalmon | tristan, please see https://gitlab.com/BuildStream/buildstream/-/issues/1359 i have updated with a clearer reproduction method and have also done some hacking and it looks like it might be the includes and the order, i have included my findings in the issue | 09:37 |
tristan | WSalmon, yeah that's exactly the stack trace I saw | 09:45 |
tristan | indeed | 09:45 |
benschubert | https://gitlab.com/BuildStream/bst-plugins-experimental/-/merge_requests/126 for bst-plugins-experimental 1.93.4! | 09:50 |
tristan | benschubert, can you run tristan/partial-variables-manual-string-join ? | 10:18 |
benschubert | tristan: sure! | 10:59 |
tristan | benschubert, I could add another one at the same time if we're starting now | 11:01 |
tristan | I think a bit better | 11:01 |
benschubert | tristan: just add what you need and let me know when I can start :) | 11:01 |
benschubert | show only is fine hey? | 11:01 |
tristan | show only is perfectly fine yeah, the rest is rather irrelevant | 11:02 |
tristan | benschubert, similar to last time: tristan/partial-variables-manual-string-join and the before tip | 11:03 |
tristan | c4d0843c1 (tip) 20db0e114 (before tip) | 11:03 |
tristan | benschubert, have any idea if we can improve granularity of what cProfile generates ? | 11:03 |
tristan | it seems to really only enjoy showing us functions in our code | 11:03 |
benschubert | tristan: how do you run it and use it? | 11:04 |
benschubert | I usually do: `python -m cProfile -o profile.pstat $(which bst) my command`, then open it in snakeviz and I can see everything | 11:04 |
tristan | Like, I can't see how much time is spent in say, ResolutionStep() initializers | 11:04 |
tristan | Oh do you | 11:04 |
benschubert | ah for cython? | 11:04 |
benschubert | for cython, you'll need 'annotate=True' in the cythonize function call of setup.py | 11:05 |
benschubert | and then rebuild | 11:05 |
benschubert | we can probably make it more usefriendly sometimes | 11:05 |
benschubert | but need to make sure it's not accidentaly turned on as it destroys performance :) | 11:06 |
tristan | benschubert, I've followed the 'buildstream' way and did: https://gitlab.com/BuildStream/buildstream/-/commit/4a148bc3ccdfe509e64240e1b59d186bd6764c13 | 11:06 |
tristan | Then I run BST_CYTHON_PROFILE=1 tox -e venv /bin/bash | 11:07 |
tristan | and in there, I run BST_PROFILE=variables-check bst show element.bst | 11:07 |
benschubert | ah we already have BST_CYTHON_PROFILE | 11:07 |
benschubert | then that should have worked | 11:07 |
tristan | It "works" | 11:07 |
benschubert | can you delete the .so's and re-run? | 11:07 |
tristan | benschubert, https://bpa.st/QHWQ | 11:08 |
tristan | I'm just saying, it's a little "light" | 11:08 |
tristan | it tells me cumulative time in functions I've defined | 11:09 |
benschubert | can't access this, proxies :'D | 11:09 |
benschubert | snippet? | 11:09 |
tristan | ugh, how come bpaste went all weird | 11:09 |
benschubert | benchmark running btw | 11:10 |
tristan | pastebins getting all crazy, using javascript, not accepting middle click for paste (secondary clipboard which is pretty much primary on linux) | 11:10 |
tristan | world going insane | 11:10 |
tristan | emails with html in them | 11:10 |
tristan | what will come next /o\ | 11:10 |
tristan | benschubert, https://pastebin.com/C8MrB7jN | 11:11 |
benschubert | tristan: proxies, please use gitlab snippets :) | 11:11 |
tristan | looks like pastebin.com has reverted to something less javascripty | 11:11 |
tristan | proxies don't let you access normal pastebins ? | 11:11 |
benschubert | Nope | 11:11 |
* tristan goes sets off on a journey to learn the dark arts of "gitlab snippets" | 11:12 | |
tristan | benschubert, https://gitlab.com/snippets/1994285 | 11:13 |
tristan | I just think the report lacks a bit of detail | 11:14 |
benschubert | tristan: mmh weird. do you have the profile thing? can you try opening it in snakeviz? | 11:14 |
tristan | Would be nice to have a few thousand rows | 11:14 |
benschubert | (pip install nakeviz) | 11:14 |
benschubert | *snakeviz | 11:14 |
tristan | It wants to access my browser :-S | 11:15 |
tristan | snakeviz: error: no web browser found: could not locate runnable browser | 11:15 |
tristan | maybe install it on the host | 11:16 |
tristan | benschubert, nothing different | 11:17 |
tristan | it just shows me some bar chart | 11:17 |
tristan | with the same informative rows at the bottom | 11:17 |
tristan | benschubert, of course this is a snapshot of *only* what happens in Variables.check() (the loop which resolves variables) | 11:17 |
tristan | Not the entire bst load process | 11:18 |
tristan | So if you ran the profiler on the whole process you'd get different results | 11:18 |
tristan | I'm just wondering if I can get finer grained information | 11:18 |
benschubert | mmh have you tried what's in https://cython.readthedocs.io/en/latest/src/tutorial/profiling_tutorial.html ? | 11:19 |
benschubert | mainly profile=True ? | 11:19 |
tristan | That's enabled | 11:19 |
tristan | otherwise I would probably not have the signatures of the cython functions in context | 11:20 |
benschubert | ah, I think I know | 11:20 |
tristan | what I'm missing is like, how much time spent in ResolutionStep() cython class initializers | 11:20 |
tristan | tiny bits, any function call below my code | 11:20 |
benschubert | after: BST_CYTHON_PROFILE=1 tox -e venv /bin/bash | 11:20 |
benschubert | can you run printenv BST_CYTHON_PROFILE ? | 11:20 |
tristan | I see nothing of what core Python is doing in the reports | 11:20 |
tristan | nothing | 11:21 |
benschubert | gotcha | 11:21 |
benschubert | we don't pass it to tox, so I'm not sure if buildstream is built with it | 11:21 |
tristan | prinenv BST_CYTHON_PROFILE is like `env | grep BST_CYTHON_PROFILE` ? | 11:21 |
benschubert | yeah | 11:21 |
tristan | it doesnt make a different, so it should be | 11:22 |
tristan | I can try without it, but I expect it's built with it | 11:22 |
tristan | benschubert, I discovered BST_CYTHON_PROFILE only this morning | 11:22 |
tristan | before it was a pesky comment at the top of _variables.pyx | 11:22 |
benschubert | if you try without it and it doesn't make a difference, then we need to do some shenanigans to have it passed to the tox env | 11:22 |
benschubert | let me know, I'll let you know what to change if that's the case | 11:22 |
tristan | Right, I'm doing that now | 11:23 |
tristan | Still, I believe it's the same now as it was with the comment | 11:23 |
tristan | And still I don't get anything about python internals | 11:23 |
tristan | just time spent in buildstream defined functions | 11:23 |
tristan | Yeah, confirmed | 11:24 |
benschubert | what kind of python internals would you expect? | 11:24 |
tristan | benschubert, building without BST_CYTHON_PROFILE gives me a 3 line report, compared with the 15 line report I had before | 11:24 |
tristan | I would expect everything possible | 11:24 |
tristan | Like say, time spent in list() object initializers | 11:25 |
tristan | everything | 11:25 |
tristan | I want to see a full report | 11:25 |
benschubert | ah I don't think that's possible in cProfile | 11:25 |
tristan | Like, even if it takes 10 minutes to run a .05 second program | 11:25 |
benschubert | seems you want more some C profiling tool | 11:25 |
benschubert | perf? :) | 11:26 |
tristan | Myeah :-/ | 11:26 |
tristan | I never got friendly with perf actually, I liked the valgrind tool for that | 11:26 |
tristan | there was an extension/companion tool, rendered really nice graphs | 11:26 |
* tristan would think at *least* he would see the initializers of types he defined | 11:27 | |
tristan | like now I have cdef class Foo: (without any __init__ or __cinit__), but creating those Foo doesnt give anything | 11:27 |
benschubert | mmh if you defined a __init__ it should show | 11:28 |
benschubert | __cinit__ too | 11:28 |
benschubert | at that point I'm not sure what's happening | 11:28 |
tristan | But I wouldnt want to know the time spent in those functions | 11:28 |
tristan | I would want to know the total time spent in `foo = Foo()` | 11:29 |
benschubert | So you'll want to look at the time spend in Foo.__new__ | 11:29 |
tristan | yeah, that's not in the report | 11:30 |
benschubert | tristan: https://gitlab.com/snippets/1994291 benchmark result | 11:35 |
tristan | benschubert, .036 off the mark, my tip is slightly better than the before tip | 11:36 |
tristan | It's quite odd, the profiler I think changes the results significantly based on how many times I enter a function | 11:37 |
tristan | I suppose I can instrument further by declaring some functions and splitting things up | 11:40 |
tristan | little bitty wrapper functions | 11:40 |
benschubert | tristan: regarding the threading scheduler: I've got somethign mostly working. The parts that still need to be fixed is around the cancelling (termination), as the python code doesn't react to that. So I'll probably have to have some event that when set, the job code raises an exception, and we would check that at regular intervals. I don't think we can do much nicer. Would you be fine with something like this? | 11:51 |
benschubert | Like something at least in every `with signals.suspendable`, but potentially in some long running parts of the code (downloadsource._ensure_mirror, for example) | 11:52 |
tristan | don't we have signals.terminator() for that ? | 11:53 |
benschubert | because cancel() actually doesn't cancel already running stuff | 11:53 |
benschubert | ah sorry, not suspendable, I meant terminator yes | 11:53 |
tristan | I'm not sure how this is supposed to work with threads, I expect it to be difficult | 11:53 |
tristan | I mean, in the main event loop, we'll of course have a SIGTERM handler to handle it in the main loop | 11:54 |
tristan | in case bst itself receives SIGTERM | 11:54 |
benschubert | Thus why I'm thinking about setting an event should_terminate, that the jobs would check periodically | 11:54 |
tristan | But how we delegate that to supbprocesses is another thing | 11:54 |
tristan | I guess with threads, we need to register any subprocesses | 11:55 |
benschubert | Yeah that's fine | 11:55 |
benschubert | what's less fine is when we don't use a subprocess | 11:55 |
benschubert | like, when we use the urlopen() for downlaoding sources | 11:55 |
benschubert | or when we do grpc calls | 11:55 |
tristan | Right, we need a way to cause threads to return from poll() or select() or whatever is blocking | 11:55 |
benschubert | which I'm not really sure how to do :/ | 11:56 |
tristan | Those anyway wont respond to the regular "Please terminate... Wait a grace period... Sorry bro, you get an axe in your back -> SIGKILL" | 11:56 |
tristan | Yup | 11:57 |
tristan | But there should be a way ;-) | 11:57 |
tristan | Possibly a signal | 11:57 |
benschubert | well, we can't SIGKILL a process | 11:57 |
benschubert | aa thread | 11:57 |
benschubert | we can't send signals to threads | 11:57 |
tristan | benschubert, I mean we're going to still have to do that for subprocesses | 11:57 |
benschubert | yeah and subprocesses are not the concern :) | 11:57 |
benschubert | the concern is python code | 11:57 |
benschubert | when the thread is doing something in python | 11:57 |
tristan | So we probably have any launched subprocesses in a single pool which we can gracefully terminate/kill | 11:57 |
benschubert | like, urlopen() | 11:58 |
benschubert | Subprocesses are handled fine already :) | 11:58 |
tristan | For those, we might provoke a return from poll() with SIGIO | 11:58 |
tristan | And have the event loops observe a terminated state before going back to sleep | 11:58 |
benschubert | well, that still doesn't help for threads :) | 11:58 |
tristan | Or whatever is implementing `await` or whatever the python thing is | 11:58 |
benschubert | which are not running in the ioloop | 11:58 |
tristan | Well, they will be probably running an internal ioloop or such | 11:59 |
benschubert | the concern is really around threads, that are doing long running stuff | 11:59 |
tristan | under the hood | 11:59 |
benschubert | not even sure we have access to such apis | 11:59 |
tristan | While they are blocking, what happens, they sleep in poll(), poll returns automatically on some signals | 11:59 |
tristan | And by default returns to poll | 11:59 |
tristan | until real data is encountered, we need to sneak in and get it to observe a termination state at that point that we provoke it's syscall interruption | 12:00 |
tristan | We could certainly override the implementation right ? | 12:00 |
tristan | I would guess ? | 12:00 |
tristan | Replace it with something that is termination aware | 12:00 |
benschubert | I am not really sure :'D | 12:02 |
benschubert | but I can see whether I can find something | 12:02 |
tristan | plugins using raw python APIs for network connections could be problematic | 12:02 |
benschubert | https://docs.python.org/3/library/select.html | 12:02 |
benschubert | mmh we might | 12:03 |
tristan | it would be sensible to ensure those are off limits | 12:03 |
tristan | But if we could get a blanket approach it could be easier | 12:03 |
tristan | I.e. why would we allow plugins to just go and call stuff like urlopen() without going through a BuildStream API ? | 12:03 |
benschubert | tristan: then I can do much simpler: have `downloadablefilesource` urlopen() read chunk by chunk and on each chunk see if we need to terminate | 12:04 |
* tristan realizes this is largely unpoliced, unfortunately | 12:04 | |
benschubert | We _could_ make the should_terminate something public too, and plugins could react nicely | 12:04 |
tristan | I don't know, I would think something lower level is better | 12:05 |
benschubert | though I'm less in favor of that | 12:05 |
tristan | reading chunk by chunk can still block | 12:05 |
benschubert | which is fine | 12:05 |
benschubert | it's probably not for long | 12:05 |
tristan | Uhhh | 12:05 |
tristan | No, it's the network, can be unpredictable | 12:05 |
tristan | If you terminate at the same time as a connection gets lost, you end up with a case where we fail to terminate | 12:06 |
benschubert | well, until the timeout of the connection throws | 12:06 |
tristan | lost connection might take a long time | 12:06 |
benschubert | and it's rare to have that above a few seconds no? | 12:06 |
tristan | Can we guarantee that the timeout is always under our control ? | 12:06 |
benschubert | no | 12:06 |
tristan | So then I don't think it's appropriate at all | 12:06 |
benschubert | We can also have a timeout on the read, if no events happend in X time | 12:07 |
tristan | I'm not sure it's alright to focus only on DownloadableFileSource honestly; a drop in replacement of urllib in buildstream.utils would be more user friendly | 12:08 |
tristan | otherwise we are saying you can never open a url unless you use DownloadableFileSource | 12:08 |
tristan | We could do that I guess, though | 12:09 |
* tristan back a bit later | 12:10 | |
*** tristan has quit IRC | 12:13 | |
*** jward has quit IRC | 12:56 | |
*** Trevinho has quit IRC | 13:05 | |
*** tristan has joined #buildstream | 14:15 | |
*** ChanServ sets mode: +o tristan | 14:15 | |
tristan | I just went for a walk and will now go get some food, it's been on my mind though; these seems really unfortunate | 14:16 |
benschubert | tristan: I've found a solution :D | 14:16 |
tristan | I came across this SO post https://stackoverflow.com/questions/11817337/how-do-i-gracefully-interrupt-urllib2-downloads which looks grim, maybe things have improved in recent years | 14:16 |
tristan | Did you ? | 14:16 |
benschubert | yes sir :D | 14:16 |
tristan | is it nice and graceful ? what is it ? | 14:17 |
benschubert | https://gitlab.com/BuildStream/buildstream/-/merge_requests/1982/diffs?commit_id=41d3ee544ceabd1c608a0c1231a02916881334c4 | 14:17 |
benschubert | I still need to cleanup | 14:17 |
benschubert | but it seems to work 100% of the time | 14:17 |
* tristan was thinking that perhaps sending ourselves SIGTERM might provoke python to abort system calls but not sure | 14:18 | |
benschubert | I need to check: | 14:18 |
benschubert | - no unintended side effects | 14:18 |
benschubert | - clean flush of log files | 14:18 |
benschubert | - cleanup of casd | 14:18 |
benschubert | - subprocess are closed correctly | 14:18 |
tristan | What am I looking at ? | 14:18 |
benschubert | but apart from that it's 100% transparent to the threads unless they catch the exception we throw | 14:18 |
tristan | so many files | 14:18 |
benschubert | https://gitlab.com/BuildStream/buildstream/-/commit/41d3ee544ceabd1c608a0c1231a02916881334c4 this commit | 14:18 |
tristan | right that's what I'm looking at, ok so... what... abort_thread() ? | 14:19 |
benschubert | yeah | 14:19 |
benschubert | well, everything, it's 22 lines changed ;) | 14:19 |
tristan | PyThreadState_SetAsyncExc | 14:19 |
benschubert | https://docs.python.org/3/c-api/init.html#c.PyThreadState_SetAsyncExc | 14:19 |
tristan | yeah I see | 14:20 |
benschubert | and we could instead send a custom exception so that if we really want to cleanup something we can | 14:20 |
benschubert | but systemexit should be nice enough otherwise | 14:20 |
benschubert | and hard to catch "by accident" | 14:20 |
tristan | from that docs, I don't understand how this aborts a blocking syscall | 14:20 |
tristan | it says "Asynchronously raise an exception in a thread" | 14:21 |
tristan | Which leads me to think, it schedules something to happen in the immediate future | 14:21 |
tristan | after something blocking returns | 14:21 |
benschubert | yes | 14:21 |
tristan | So... not solved then ? | 14:22 |
benschubert | well, it does stop the thread immediatelly during a download | 14:22 |
benschubert | because it causes the poll() selector (which is python) to actually throw at the next invocation afaik | 14:22 |
tristan | :-S | 14:23 |
benschubert | ah shoot doesn't always work: I've tried a `time.sleep(20)` and I'm actually waiting... | 14:23 |
tristan | "the next invocation", but what about the current blocking one | 14:23 |
tristan | That SO post suggests that there are alternatives to urllib which are more suitable to production code | 14:24 |
tristan | It would be nice to have something deeper in python to ensure that we can interrupt anything, but I think it would be perfectly acceptable to enforce that plugins must use a specific networking API surface provided by BuildStream | 14:25 |
tristan | perhaps more than acceptable: expectable ;-P | 14:26 |
benschubert | if the thread becomes blocked on a blocking call I'm not so sure how we could unblock it at any point though | 14:26 |
tristan | there are certainly ways to do it in C, syscalls can be interrupted; so it should be doable in python | 14:27 |
tristan | there is no way we can direct signals at specific threads ? | 14:28 |
benschubert | there is no thread signals at all no | 14:28 |
tristan | perhaps anyway, as you mentioned before, we should have all of these file descriptors in the same event loop | 14:28 |
tristan | In which case we should have control of our poll(), and threads are only blocking on a thread condition | 14:29 |
benschubert | https://docs.python.org/3.4/library/signal.html#signals-and-threads | 14:29 |
benschubert | how do you interrupt a system call in C also? | 14:30 |
tristan | benschubert, I don't know if that is relevant | 14:30 |
benschubert | well, it means we can't send a signal to a thread | 14:30 |
tristan | I.e. it's not necessarily important that a signal handler be invoked in the thread | 14:30 |
benschubert | and then how do you kill the thread? | 14:31 |
tristan | if we sent a given signal to the whole process group and capture it in the main thread; I think we can have a way to provoke blocked syscalls to exit (regardless of where python code handles them) | 14:31 |
tristan | You would then rely on errors being reported from interrupted reads | 14:32 |
tristan | or such | 14:32 |
benschubert | > if we sent a given signal to the whole process group and capture it in the main thread; | 14:32 |
benschubert | Is really not a problem, not even sure why we'd need a signal there. | 14:32 |
benschubert | > ; I think we can have a way to provoke blocked syscalls to exit (regardless of where python code handles them) | 14:32 |
benschubert | Never seen non-kernel code interrupt a syscall. How do you do that? :) | 14:32 |
tristan | The thing is, most posix level code will run a syscall with ret = read(...); and then check if ret < 0; if ret == -EINTR, I believe you normally reenter the syscall | 14:34 |
benschubert | tristan: but how would you cause the EINTR first? | 14:34 |
benschubert | that's probably the first problem to solve | 14:35 |
tristan | SIGINT for sure, but most probably others | 14:35 |
tristan | it's a whole book that is not currently fresh on my mind hehe | 14:35 |
benschubert | we catch sigints, and we start killing at that point | 14:36 |
tristan | where is stephen's manual when you need it | 14:36 |
benschubert | not sure what you want to do in the other threads after that | 14:36 |
tristan | right, so one would think, we should be able to cooperate with sigint | 14:36 |
tristan | perhaps explicitly trigger it if it's not the cause of our desire to terminate, and streamline our exit strategy through sigint (just a thought) | 14:37 |
tristan | possibly another signal, not sure | 14:37 |
tristan | certainly it would make sense that python's SIGINT signal handler (in asyncio loop) would not override the SIGINT initially | 14:38 |
tristan | so if the main thread killed the group with SIGINT, we might be able to rely on that unblocking things, and cleanup asyncrhonously... *really* not sure | 14:39 |
tristan | possibly twisted's url libraries are worth a looksee | 14:39 |
benschubert | tristan: There are things like: https://docs.python.org/3/library/signal.html#signal.pthread_kill, however, if i `time.sleep()` in my thread, nothing will happen either | 14:40 |
tristan | I think time.sleep() uses alarm() | 14:40 |
tristan | sleep should be easy to abort in a uniform fashion though | 14:41 |
tristan | (even though it's not something one would expect) | 14:41 |
tristan | time.sleep() is an interesting test, but of course it's nothing that should ever happen in real life :-S | 14:42 |
* tristan is mostly concerned with a blocking socket connection, establishing the connection or reading when losing connectivity | 14:43 | |
benschubert | tristan: as I told you, my tests with downloads work fine with the previous commit, I'm not sure what else to test | 14:43 |
benschubert | So, disconnectin the internet connection on download? I can test that | 14:44 |
tristan | in this case, it doesnt really matter how small your chunks are, the chances that you are not inside a syscall at the time of connectivity loss are always going to be slim | 14:44 |
tristan | if you're in a loop reading chunks, this only lets you terminate gracefully when you *don't* lose connectivity | 14:45 |
tristan | otherwise you are hostage | 14:45 |
benschubert | Then I'm not sure how you'd expect to ensure it works? | 14:46 |
tristan | best to read the twisted implementation first before answering that in python | 14:46 |
benschubert | tristan: twisted's implementation is a non-blocking, async based one | 14:49 |
benschubert | So unless we decide to break the API and force downlaods to be done in an async way | 14:49 |
benschubert | I don't see a way we could handle this generically | 14:50 |
tristan | https://docs.python.org/3/library/socket.html#socket.socket.settimeout might be helpful | 14:51 |
tristan | Right | 14:51 |
tristan | non-blocking async sounds like something more trustworthy | 14:51 |
benschubert | but that's something that is a lot of work and breaking APIs :) | 14:52 |
tristan | benschubert, forcing them to be done in an async way does not mean the caller needs to use an async API | 14:52 |
tristan | benschubert, it *definitely* means BuildStream has to set the ground rules, saying "you must use *this* API to talk to the internet and nothing else" | 14:52 |
tristan | Which I think is a good API break | 14:52 |
benschubert | which I would rather not do | 14:53 |
benschubert | Now you want to run a scp command, using paramiko, you can't. You need to run scp? That's one more system dependency :) | 14:53 |
tristan | subprocess model is better ? | 14:53 |
tristan | Sure, at least a host depencency in a subprocess can be aborted | 14:54 |
tristan | And we can eventually look into providing sandboxes which provide those host deps | 14:54 |
benschubert | tristan: however, if setting this timeout (like 5-10 seconds?) globally at buildstream import is enough, I think that is a good solution | 14:55 |
benschubert | no idea whether other things would conflict, but hey | 14:55 |
* tristan worries it might not be | 14:55 | |
tristan | benschubert, in honesty, if we find that it "happens to work" with urllib, I would be a hundred times more confident if we started out by wrapping urllib internally in BuildStream and saying that every plugin must use the wrapped API | 14:56 |
tristan | At least this would allow us to fix things properly later | 14:57 |
benschubert | tristan: we have the downloadable filesource already | 14:57 |
benschubert | I'd be wary of forcing plugins to only pass through that | 14:57 |
benschubert | since it might not be the best way for a specific plugin | 14:57 |
tristan | That's not a promise that no other plugin will not use urllib | 14:57 |
benschubert | And if one plugin does a mistake and 'terminate' takes longer, what's the real problem? | 14:57 |
benschubert | The only problem I see is that users of this will realize it takes a long time to terminate | 14:58 |
benschubert | time that might not be that long anyways since most read() operations are chunked anyways | 14:58 |
tristan | A plugin cannot make a mistake if it is only allowed to use BuildStream APIs, if it only uses BuildStream APIs, then only we can make a mistake, and we can fix it | 14:58 |
tristan | this whole conversation is really making me start to favor the idea of using subprocesses instead of threads :-S | 14:59 |
* tristan goes to get some food before it closes | 14:59 | |
benschubert | and have the pickle stuff? | 15:01 |
benschubert | Because if that's the only problem we can also set the threads in the pool as daemon and just not wait for it to close | 15:04 |
benschubert | tristan: https://gitlab.com/snippets/1994376 here is a minimal POC, it's indeed not killed | 15:31 |
tristan | I'm not sure there is much problem with using (or copying) something like twisted internals for this to implement DownloadableFileSource, and not even exposing an API surface (yet) until a need might arise for a downloading API in BuildStream | 15:38 |
tristan | we should be able to get this right | 15:38 |
tristan | I thought we might have something ugly in the pip source which might be using some uncontrolled networking interface, but no; that is safely using Plugin.call() | 15:38 |
tristan | At the very least we could say "Never do that" and open a high severity issue against DownloadableFileSource to make it properly interruptable | 15:40 |
tristan | The "Never use urllib directly from your source plugin" seems to be mandatory with threads no matter how you slice it | 15:40 |
tristan | Sending them off to be their own little daemons doesnt seem adequate, they might pollute log files after buildstream exits, and anyway it would be quite sloppy to allow them life after bst exits | 15:41 |
tristan | s/might pollute log files/might cause all kinds of local state side effects such as polluting log files/ | 15:42 |
benschubert | tristan: otherwise: socket.setdefaulttimeout(10) <- maximum 10 seconds before the socket times out | 15:47 |
benschubert | though if you have a really really slow server, that might not be nice | 15:48 |
benschubert | tristan: Ok I guess we should decide how we move on and go with an implementation. What would you prefer? I'm not really fond of pulling a lot of twised in BuildStream "just" for that, but if we can't avoid | 16:01 |
tristan | a few things to consider | 16:32 |
tristan | an async/non-blocking/interruptable download should not be super onerous, hopefully we could get such a thing to live in a private module with private APIs | 16:32 |
tristan | inspired by twisted perhaps | 16:33 |
tristan | A plugin facing API I would think should be unnecessary | 16:33 |
tristan | Also, having it work properly might not be a good idea for a hard blocker for landing a change to a threading model | 16:33 |
tristan | BuildStream is not an http socket library or a server and should not become one, we should stick to our strengths | 16:34 |
tristan | And that is sandboxing and orchestration | 16:34 |
tristan | Launching commands, hopefully always in sandboxes in some optimal future, should be what we do best | 16:35 |
tristan | benschubert, my impression at this time is, we should get a better handle on what plugins can do in order for BuildStream to successfully deliver a fault tolerant and clean experience all around | 16:36 |
tristan | Maybe this reaches beyond just "Source plugins should not go and try to access the internet using python APIs", maybe the buck stops there | 16:37 |
tristan | Clearly the threading model changes things, though, so a first step would be to clearly/better understand what those changes are | 16:37 |
benschubert | tristan: I'm not sure having to use host tools for downloads is that great a model though :/ | 16:38 |
benschubert | Yeah | 16:38 |
tristan | benschubert, I also don't think this is a suitable call to be making in a midnight IRC chat | 16:38 |
benschubert | hahaha sure | 16:38 |
tristan | Well, I think host tools are perfect for downloads in the cases we use them, and we should focus on making those sandboxable | 16:38 |
benschubert | So, the move to a threading thing is not blocking anything else for bst 2.0 right? | 16:38 |
tristan | so that people can use pre-baked deterministic images containing all the host tools they need (git, bzr, etc) | 16:39 |
benschubert | If that's still correct, I'll just pause that part of the work for now | 16:39 |
benschubert | Do we want this for bst2.0? :'D | 16:39 |
tristan | benschubert, I am starting to think though, for downloading regular files, DownloadableFileSource abstract class should really be enough | 16:39 |
benschubert | I think so too | 16:39 |
tristan | So if we could keep that sealed in, and say; you wanna download a file: subclass this; then I think it would not be too bad of a limitation | 16:40 |
benschubert | since it can do http,https,ftp | 16:40 |
benschubert | I'd agree with that | 16:40 |
tristan | That should in theory let us bake in a more sophisticated implementation than urllib | 16:40 |
tristan | without necessarily blocking immediate progress | 16:40 |
benschubert | tristan: so you'd rather continue now with the push to a threading model right? | 16:40 |
tristan | but I would be partial to outlining policy | 16:40 |
tristan | I think so, that's my gut feeling, and it really is dependent on the idea that we have something sealed | 16:41 |
tristan | i.e. we could treat it as a bug | 16:41 |
tristan | by sealed, of course I mean: we need to have a policy in place which ensures that we've encapsulated this problem to an internal code body which we can eventually fix | 16:42 |
tristan | We don't allow plugin authors to write code which has the same bugs | 16:42 |
benschubert | tristan: there is an "immediate fix" which would be to add a timeout for now | 16:43 |
tristan | (of course this means a proper guide for plugins, but that starts with agreed policy) | 16:43 |
tristan | Right, that could be the short term thing | 16:43 |
benschubert | Ok, so ML post around preventing downloads from something else than a 'downloadablesource' ? | 16:43 |
benschubert | and if agreement, we continue on the threading | 16:43 |
tristan | ML post sounds like a good start | 16:44 |
tristan | It would be good if the post considers what kind of abstract guidelines which plugin authors need to follow | 16:45 |
tristan | Like avoidance of blocking system calls, and explanations about how buildstream goes about force terminating threads and such | 16:46 |
tristan | Maybe there is a cancellable API or such which plugins could implement, I don't know | 16:46 |
benschubert | ok I'll probably prepare a proposal next week for this then :) | 16:46 |
benschubert | thanks! | 16:46 |
tristan | sorry if I've been irritated at all today, this all came up quite suddenly :) | 16:48 |
* tristan has been struggling at the same time to put everlasting variables behind him while other important things creep up :-S | 16:48 | |
benschubert | tristan: no worries, I've been banging my head on this for more than two weeks now and it was good to have someone to discuss it with :D | 16:50 |
benschubert | do you want me to have a look at the variables stuff? | 16:50 |
benschubert | tristan: actually, otherwise we can ask for the downloads in python to be run in a subprocess. That could work too if we 'spawn' :) | 16:51 |
tristan | benschubert, with the same limitation that we do seal the API and come up with policy, this does seem like a very sane alternative to importing heaps of code we don't need to specialize in yes :) | 16:55 |
tristan | for variables, I am having a hard time shaving off those .4 remaining seconds, I have a couple of possible optimizations to consider | 16:57 |
benschubert | tristan: I believe a ```.. warning: When making long running operations in the plugins in python, please consider using a subprocess (multiprocess.Process), in order to allow us to nicely terminate BuildStream on demand``` | 16:57 |
*** toscalix has joined #buildstream | 16:57 | |
benschubert | in the docs would be a good thing | 16:57 |
tristan | Some of the profiling has shown longer durations locally but better performance in your benchmarks | 16:58 |
tristan | benschubert, I don't like "... please consider ..." | 16:59 |
benschubert | tristan: profiling of cython will add operations at each function call that you would not see in real C calls | 16:59 |
benschubert | tristan: `Please use...` ? | 16:59 |
tristan | benschubert, I would rather speak in terms of "must" | 16:59 |
benschubert | tristan: sure that would work | 16:59 |
benschubert | and it might make everything nicer no? | 16:59 |
* tristan not sure which "it" is making things nicer | 17:00 | |
benschubert | using multiprocessing.Process for those specific parts of the code :) | 17:00 |
tristan | I don't know if multiprocessing.Process is on the table | 17:01 |
tristan | Can it be used ? | 17:01 |
tristan | subprocess could be used I think, can't you use subprocess and pass it a function ? or you absolutely need an executable file on the host ? | 17:02 |
tristan | benschubert, I think that multiprocessing.Process (which we currently use for jobs) only works when you are careful to never have any threads going on in the main process | 17:02 |
benschubert | tristan: if you 'spawn' we should be good | 17:03 |
tristan | i.e. fork() without execve(): a perfectly fine and good multitasking model, as long as libraries dont do crazy nonsense like starting their own threads without being asked to | 17:03 |
tristan | Ok, so spawn takes care of encapsulating it all into a re-exec ? | 17:04 |
tristan | and you can give it a Queue() ? not sure about this | 17:04 |
tristan | but you probably have your head in the docs and know better :) | 17:04 |
* tristan thought multiprocessing was really only for the fork() copy-on-write memory model | 17:04 | |
benschubert | https://docs.python.org/3/library/multiprocessing.html#contexts-and-start-methods nope :) | 17:05 |
benschubert | and yeah you can give it a queue | 17:06 |
*** xjuan_ has joined #buildstream | 17:06 | |
benschubert | tristan: we could event make it more abstract maybe: Having a 'timed_activity' variant that takes a callable that we run in a multiprocessing.Process ourselves. That way we cna ensure everything works nicely. What do you think? (credits to cs-shadow) | 17:14 |
tristan | benschubert, that sounds promising | 17:18 |
tristan | benschubert, we should also make it have stdout/stderr pointed to the appropriate open logging fd | 17:19 |
benschubert | exactly | 17:19 |
benschubert | and we would pass the 'return_value' through a queue under our control | 17:20 |
tristan | yeah this looks like a decent way forward, subprocess anything which can get blocked in a system call | 17:20 |
benschubert | I think that is even something we can do first, without doing the rest of the changes | 17:20 |
benschubert | ah wait no | 17:20 |
tristan | of course for our internal cas interactions, we'd not want to do that | 17:20 |
tristan | but for that I'm sure we have better APIs which allow a more delicate touch | 17:20 |
benschubert | yeah cas should not block like this :) | 17:21 |
benschubert | since it's local | 17:21 |
benschubert | and if it dies, it gets cleaned up | 17:21 |
tristan | Well pushing to artifact servers etc | 17:21 |
benschubert | that's done by casd itself | 17:21 |
tristan | but this is grpc, I'll bet we can do it without blocking | 17:21 |
benschubert | yep | 17:21 |
tristan | oh yeah that's true also, casd takes care of it | 17:22 |
tristan | if you do have some inspiration around: https://gitlab.com/BuildStream/buildstream/-/blob/tristan/partial-variables-manual-string-join/src/buildstream/_variables.pyx#L319, let me know | 17:23 |
tristan | currently I'm thinking: convert ResolutionStep to a struct instead of an object, possibly reuse the values array and avoid re-allocation | 17:24 |
tristan | I'm quite happy with the bottom half: https://gitlab.com/BuildStream/buildstream/-/blob/tristan/partial-variables-manual-string-join/src/buildstream/_variables.pyx#L372 | 17:24 |
benschubert | One question that might be dumb but hey: Do you know if there is more work done somehow? Like is the algorithm at least as efficient? | 17:25 |
tristan | The separation between Variables._resolve() and Variables._resolve_value() ensures we never go into the loop for a resolved variable, while also ensuring there is never an extra dictionary lookup for the toplevel varname | 17:25 |
tristan | benschubert, there is a little more work at startup, populating the ValueClass table | 17:26 |
tristan | this is rather constant for large projects | 17:27 |
tristan | and doesnt appear to take much in profiling | 17:27 |
tristan | benschubert, however, the main thing is that the old algorithm is recursive, and the new one necessarily needs to track state on the heap | 17:27 |
tristan | Another property of the old algorithm is that broken down list-of-strings (returned by re.split()) are replaced inline with resolved values in the same dictionary | 17:28 |
tristan | I was thinking of replicating this behavior, it's possible, but I don't see how it should/would really improve performance | 17:29 |
tristan | and it would be messier and harder to follow I think (we'd end up checking if values are isinstance(str) or isinstance(value)) | 17:29 |
tristan | There is of course a tiny indirection of getting the Value and then (probably internal pointer arithmetic) to get from Value -> Value._resolved | 17:30 |
tristan | the recursion state is mostly captured by ResolutionStep, which is a multi-link list | 17:31 |
tristan | reverse linked list with pointers to parent "stack frame" basically | 17:31 |
tristan | and left to the gc to deal with | 17:31 |
tristan | So, changing that to structs and freeing them up could help | 17:32 |
tristan | If I just kept a pointer to the last step, I could reverse free the whole list in a go at the end, and avoid it being a PyObject | 17:33 |
tristan | asides from the instantiation overhead, there is also a ref-counting overhead when assigning link relations | 17:34 |
tristan | whenever you say step = step.prev or such | 17:35 |
tristan | we're probably talking about 2 cycles per assignment here, though | 17:35 |
benschubert | yeah, which probably doesn't account to .4s :) | 17:36 |
benschubert | gah, that's an itchy one too | 17:36 |
tristan | right | 17:36 |
tristan | I am a bit curious about https://gitlab.com/BuildStream/buildstream/-/blob/tristan/partial-variables-manual-string-join/src/buildstream/_variables.pyx#L791 | 17:37 |
tristan | benschubert, This gets called every time we enter _resolve_value() | 17:37 |
tristan | and freed at the end | 17:37 |
tristan | seems like it could be cached, and never realloced, at least until Variables.check() is complete | 17:38 |
tristan | so the array could grow as needed on allocation, length could be set to 0 upon entry of the loop and the same memory reused | 17:38 |
benschubert | so create it before and pass it as argument? | 17:38 |
tristan | Well, it could be stored on Variables directly | 17:39 |
tristan | as a cache | 17:39 |
benschubert | right | 17:39 |
tristan | you only need one as it's not reentrant | 17:39 |
benschubert | as long as it's cleaned up correctly | 17:39 |
tristan | right, and we could give it a cleanup at the end of check() (and in __dealloc__) | 17:39 |
benschubert | yup | 17:40 |
tristan | or even at the end of everything which incurs variable resolutions (like _expand()) | 17:40 |
tristan | still, does that account for .4 seconds ? maybe it all culmulatively does | 17:40 |
tristan | but no matter how I slice it, the logic to me appears that; if you go from recursive to non-recursive, you have to manage more state on the heap, and this must be the main source of the overhead | 17:41 |
tristan | I could be wrong but I think it's a sound theory ;-) | 17:42 |
tristan | of course, it should be possible to get very close anyway | 17:42 |
tristan | it's just a very delicate dance to do so | 17:42 |
tristan | If you look at _resolve_value(), the original code basically drives through only the last part of the loop, from left to right instead of right to left, and expands variables recursively as needed | 17:43 |
benschubert | right | 17:44 |
benschubert | I'll bump my head against it a moment this weekend to see what I can get out of it | 17:44 |
tristan | for part in value.value_parts: list.append(possibly_expanded(part)) ... "".join(list) | 17:44 |
tristan | I'll try my micro-optimizations (cache/reuse the array, structize ResolutionStep) tomorrow and we can see if it makes any dents | 17:46 |
benschubert | ok! | 17:52 |
benschubert | tristan: https://gitlab.com/BuildStream/buildstream/-/merge_requests/1982/diffs?commit_id=056d466cb51a98a0de6a266d55f284fb54e594c9 it's still lacking a bit on the handling but it's mostly there now | 18:04 |
benschubert | how would you find this solution? | 18:04 |
tristan | not using blocking_activity() yet ? | 18:08 |
benschubert | woops my bad | 18:08 |
tristan | :) | 18:08 |
tristan | something along those lines seems good though :) | 18:09 |
tristan | maybe ensure Plugin objects dont get pulled in (no self._foo callbacks) | 18:10 |
tristan | ensure that code inside there doesnt have buildstream API and state, and can't try weird things like self.message() | 18:11 |
benschubert | If that happens we get a pickling error | 18:12 |
*** toscalix has quit IRC | 18:29 | |
*** mohan43u has quit IRC | 19:01 | |
*** santi has quit IRC | 19:06 | |
*** mohan43u has joined #buildstream | 20:38 | |
*** mohan43u has quit IRC | 20:44 | |
*** mohan43u has joined #buildstream | 20:48 | |
*** mohan43u has quit IRC | 20:57 | |
*** mohan43u has joined #buildstream | 21:11 | |
*** pointswaves has quit IRC | 21:16 | |
*** mohan43u has quit IRC | 21:17 | |
*** xjuan_ has quit IRC | 22:36 | |
*** benschubert has quit IRC | 22:37 |
Generated by irclog2html.py 2.15.3 by Marius Gedminas - find it at mg.pov.lt!