IRC logs for #buildstream for Thursday, 2020-07-09

*** phildawson has quit IRC02:42
*** traveltissues has quit IRC05:42
*** tristan has quit IRC05:54
*** mohan43u has quit IRC05:59
*** mohan43u has joined #buildstream05:59
*** benschubert has joined #buildstream07:17
*** Trevinho has joined #buildstream07:52
*** benschubert has quit IRC08:02
*** benschubert has joined #buildstream08:04
*** Trevinho_ has quit IRC08:05
benschubertWould people be ok with making a new dev release of bst-plugins-experimental? It's accumulated quite a bit of bug fixes over time08:19
WSalmon+1 for new devtag for bst-plg-exp, on a related note we could also do with a new dev tag of bst soon, as the cache push fixes are a huge deal but i think there are still some bugs in master that it might be good to squash first, eg, junctions with the same name in subprojcets and checkout cross arch08:24
WSalmondouglaswinship, did you land some fixes in  bst-plugins-experimental recently or are they on there way, if they have not landed could we hold off till they are in?08:25
WSalmonbenschubert, ^08:25
benschubertWSalmon: there is no open MR from douglaswinship on experimental. We can wait a bit, but we'll need a new one soon as I need the ostree changes for some changes in master08:27
WSalmondouglaswinship, is usually about from a little after 10am uk time so we should be clearer then08:30
WSalmoni think it was https://gitlab.com/BuildStream/bst-plugins-experimental/-/merge_requests/125 so lets wait for him to confirm but if it is just that one then a tag would probably help me and douglaswinship too08:32
benschubertok!08:36
*** santi has joined #buildstream08:42
*** tristan has joined #buildstream08:52
*** ChanServ sets mode: +o tristan08:52
juergbiWSalmon: do you know of a bug with regards to junctions in master? please note that semantics have changed as discussed on ML (and here)09:06
tristanjuergbi, I verified, there is indeed a bug (but it's not the same one which WSalmon encountered and pasted on IRC)09:11
tristanI will need to reproduce in a test case and fix it properly09:12
WSalmonjuergbi, sorry i meant to make a issue, i rased the issue here with tristan when i tried to go from 1.93.4 to master but i have not made a issue09:12
WSalmonah tristan beet me to the punch09:12
juergbiok09:12
* WSalmon dose to file a issue09:13
WSalmongoes09:13
tristanit has something to do with a toplevel having a link to a subsubproject, in a certain order; we hit an exception in https://gitlab.com/BuildStream/buildstream/-/blob/master/src/buildstream/_loader/loader.py#L40409:13
tristanWSalmon, I do have a feeling you were not using master when you posted it, as you posted something like "ERROR: Dude, you gave me a 'link' element, but I wanted a 'junction' here !"09:14
tristanThe issue I get with master is that there is no ElementFactory created in the subproject in order to instantiate the junction09:15
tristanIt would appear the order in which subprojects get instantiated is significant, perhaps when the link accesses the subsubproject before the direct junction does, the order of project resolution is messed up09:15
tristanSomething like project.ensure_fully_loaded() is required at the right step09:16
WSalmontristan, sorry i missed the tags to me from yesteday when i logged on today i will make a issue and check the reproduction09:16
tristanThanks :)09:16
tristanI do have a lot of new tests landed with the branch, so I think this is really about order of which subprojects get loaded (I'm sure I have a similar test scenario, but I'll bet that the direct subproject is instantiated before a link-through-subproject is accessed)09:17
douglaswinshipWSalmon, benschubert: Yes, https://gitlab.com/BuildStream/bst-plugins-experimental/-/merge_requests/125 was the change I was pushing for in plugins-experimental, and it merged yesterday.09:25
douglaswinshipSo I'm fine with a new dev reliease.09:26
benschubertOk! tristan, juergbi anything against a bst-plugins-experimental release now?09:26
douglaswinshipWSalmon: thanks for raising it btw. Good to know you're looking out for me and my MRs :)09:27
juergbibenschubert: no objections09:29
WSalmondouglaswinship, no worries :D09:29
WSalmontristan, please see https://gitlab.com/BuildStream/buildstream/-/issues/1359 i have updated with a clearer reproduction method and have also done some hacking and it looks like it might be the includes and the order, i have included my findings in the issue09:37
tristanWSalmon, yeah that's exactly the stack trace I saw09:45
tristanindeed09:45
benschuberthttps://gitlab.com/BuildStream/bst-plugins-experimental/-/merge_requests/126 for bst-plugins-experimental 1.93.4!09:50
tristanbenschubert, can you run tristan/partial-variables-manual-string-join ?10:18
benschuberttristan: sure!10:59
tristanbenschubert, I could add another one at the same time if we're starting now11:01
tristanI think a bit better11:01
benschuberttristan: just add what you need and let me know when I can start :)11:01
benschubertshow only is fine hey?11:01
tristanshow only is perfectly fine yeah, the rest is rather irrelevant11:02
tristanbenschubert, similar to last time: tristan/partial-variables-manual-string-join and the before tip11:03
tristanc4d0843c1 (tip) 20db0e114 (before tip)11:03
tristanbenschubert, have any idea if we can improve granularity of what cProfile generates ?11:03
tristanit seems to really only enjoy showing us functions in our code11:03
benschuberttristan: how do you run it and use it?11:04
benschubertI usually do: `python -m cProfile -o profile.pstat $(which bst) my command`, then open it in snakeviz and I can see everything11:04
tristanLike, I can't see how much time is spent in say, ResolutionStep() initializers11:04
tristanOh do you11:04
benschubertah for cython?11:04
benschubertfor cython, you'll need 'annotate=True' in the cythonize function call of setup.py11:05
benschubertand then rebuild11:05
benschubertwe can probably make it more usefriendly sometimes11:05
benschubertbut need to make sure it's not accidentaly turned on as it destroys performance :)11:06
tristanbenschubert, I've followed the 'buildstream' way and did: https://gitlab.com/BuildStream/buildstream/-/commit/4a148bc3ccdfe509e64240e1b59d186bd6764c1311:06
tristanThen I run BST_CYTHON_PROFILE=1 tox -e venv /bin/bash11:07
tristanand in there, I run  BST_PROFILE=variables-check bst show element.bst11:07
benschubertah we already have BST_CYTHON_PROFILE11:07
benschubertthen that should have worked11:07
tristanIt "works"11:07
benschubertcan you delete the .so's and re-run?11:07
tristanbenschubert, https://bpa.st/QHWQ11:08
tristanI'm just saying, it's a little "light"11:08
tristanit tells me cumulative time in functions I've defined11:09
benschubertcan't access this, proxies :'D11:09
benschubertsnippet?11:09
tristanugh, how come bpaste went all weird11:09
benschubertbenchmark running btw11:10
tristanpastebins getting all crazy, using javascript, not accepting middle click for paste (secondary clipboard which is pretty much primary on linux)11:10
tristanworld going insane11:10
tristanemails with html in them11:10
tristanwhat will come next /o\11:10
tristanbenschubert, https://pastebin.com/C8MrB7jN11:11
benschuberttristan: proxies, please use gitlab snippets :)11:11
tristanlooks like pastebin.com has reverted to something less javascripty11:11
tristanproxies don't let you access normal pastebins ?11:11
benschubertNope11:11
* tristan goes sets off on a journey to learn the dark arts of "gitlab snippets"11:12
tristanbenschubert, https://gitlab.com/snippets/199428511:13
tristanI just think the report lacks a bit of detail11:14
benschuberttristan: mmh weird. do you have the profile thing? can you try opening it in snakeviz?11:14
tristanWould be nice to have a few thousand rows11:14
benschubert(pip install nakeviz)11:14
benschubert*snakeviz11:14
tristanIt wants to access my browser :-S11:15
tristansnakeviz: error: no web browser found: could not locate runnable browser11:15
tristanmaybe install it on the host11:16
tristanbenschubert, nothing different11:17
tristanit just shows me some bar chart11:17
tristanwith the same informative rows at the bottom11:17
tristanbenschubert, of course this is a snapshot of *only* what happens in Variables.check() (the loop which resolves variables)11:17
tristanNot the entire bst load process11:18
tristanSo if you ran the profiler on the whole process you'd get different results11:18
tristanI'm just wondering if I can get finer grained information11:18
benschubertmmh have you tried what's in https://cython.readthedocs.io/en/latest/src/tutorial/profiling_tutorial.html ?11:19
benschubertmainly profile=True ?11:19
tristanThat's enabled11:19
tristanotherwise I would probably not have the signatures of the cython functions in context11:20
benschubertah, I think I know11:20
tristanwhat I'm missing is like, how much time spent in ResolutionStep() cython class initializers11:20
tristantiny bits, any function call below my code11:20
benschubertafter: BST_CYTHON_PROFILE=1 tox -e venv /bin/bash11:20
benschubertcan you run printenv BST_CYTHON_PROFILE ?11:20
tristanI see nothing of what core Python is doing in the reports11:20
tristannothing11:21
benschubertgotcha11:21
benschubertwe don't pass it to tox, so I'm not sure if buildstream is built with it11:21
tristanprinenv BST_CYTHON_PROFILE is like `env | grep BST_CYTHON_PROFILE` ?11:21
benschubertyeah11:21
tristanit doesnt make a different, so it should be11:22
tristanI can try without it, but I expect it's built with it11:22
tristanbenschubert, I discovered BST_CYTHON_PROFILE only this morning11:22
tristanbefore it was a pesky comment at the top of _variables.pyx11:22
benschubertif you try without it and it doesn't make a difference, then we need to do some shenanigans to have it passed to the tox env11:22
benschubertlet me know, I'll let you know what to change if that's the case11:22
tristanRight, I'm doing that now11:23
tristanStill, I believe it's the same now as it was with the comment11:23
tristanAnd still I don't get anything about python internals11:23
tristanjust time spent in buildstream defined functions11:23
tristanYeah, confirmed11:24
benschubertwhat kind of python internals would you expect?11:24
tristanbenschubert, building without BST_CYTHON_PROFILE gives me a 3 line report, compared with the 15 line report I had before11:24
tristanI would expect everything possible11:24
tristanLike say, time spent in list() object initializers11:25
tristaneverything11:25
tristanI want to see a full report11:25
benschubertah I don't think that's possible in cProfile11:25
tristanLike, even if it takes 10 minutes to run a .05 second program11:25
benschubertseems you want more some C profiling tool11:25
benschubertperf? :)11:26
tristanMyeah :-/11:26
tristanI never got friendly with perf actually, I liked the valgrind tool for that11:26
tristanthere was an extension/companion tool, rendered really nice graphs11:26
* tristan would think at *least* he would see the initializers of types he defined11:27
tristanlike now I have cdef class Foo: (without any __init__ or __cinit__), but creating those Foo doesnt give anything11:27
benschubertmmh if you defined a __init__ it should show11:28
benschubert__cinit__ too11:28
benschubertat that point I'm not sure what's happening11:28
tristanBut I wouldnt want to know the time spent in those functions11:28
tristanI would want to know the total time spent in `foo = Foo()`11:29
benschubertSo you'll want to look at the time spend in Foo.__new__11:29
tristanyeah, that's not in the report11:30
benschuberttristan: https://gitlab.com/snippets/1994291 benchmark result11:35
tristanbenschubert, .036 off the mark, my tip is slightly better than the before tip11:36
tristanIt's quite odd, the profiler I think changes the results significantly based on how many times I enter a function11:37
tristanI suppose I can instrument further by declaring some functions and splitting things up11:40
tristanlittle bitty wrapper functions11:40
benschuberttristan: regarding the threading scheduler: I've got somethign mostly working. The parts that still need to be fixed is around the cancelling (termination), as the python code doesn't react to that. So I'll probably have to have some event that when set, the job code raises an exception, and we would check that at regular intervals. I don't think we can do much nicer. Would you be fine with something like this?11:51
benschubertLike something at least in every `with signals.suspendable`, but potentially in some long running parts of the code (downloadsource._ensure_mirror, for example)11:52
tristandon't we have signals.terminator() for that ?11:53
benschubertbecause cancel() actually doesn't cancel already running stuff11:53
benschubertah sorry, not suspendable, I meant terminator yes11:53
tristanI'm not sure how this is supposed to work with threads, I expect it to be difficult11:53
tristanI mean, in the main event loop, we'll of course have a SIGTERM handler to handle it in the main loop11:54
tristanin case bst itself receives SIGTERM11:54
benschubertThus why I'm thinking about setting an event should_terminate, that the jobs would check periodically11:54
tristanBut how we delegate that to supbprocesses is another thing11:54
tristanI guess with threads, we need to register any subprocesses11:55
benschubertYeah that's fine11:55
benschubertwhat's less fine is when we don't use a subprocess11:55
benschubertlike, when we use the urlopen() for downlaoding sources11:55
benschubertor when we do grpc calls11:55
tristanRight, we need a way to cause threads to return from poll() or select() or whatever is blocking11:55
benschubertwhich I'm not really sure how to do :/11:56
tristanThose anyway wont respond to the regular "Please terminate... Wait a grace period... Sorry bro, you get an axe in your back -> SIGKILL"11:56
tristanYup11:57
tristanBut there should be a way ;-)11:57
tristanPossibly a signal11:57
benschubertwell, we can't SIGKILL a process11:57
benschubertaa thread11:57
benschubertwe can't send signals to threads11:57
tristanbenschubert, I mean we're going to still have to do that for subprocesses11:57
benschubertyeah and subprocesses are not the concern :)11:57
benschubertthe concern is python code11:57
benschubertwhen the thread is doing something in python11:57
tristanSo we probably have any launched subprocesses in a single pool which we can gracefully terminate/kill11:57
benschubertlike, urlopen()11:58
benschubertSubprocesses are handled fine already :)11:58
tristanFor those, we might provoke a return from poll() with SIGIO11:58
tristanAnd have the event loops observe a terminated state before going back to sleep11:58
benschubertwell, that still doesn't help for threads :)11:58
tristanOr whatever is implementing `await` or whatever the python thing is11:58
benschubertwhich are not running in the ioloop11:58
tristanWell, they will be probably running an internal ioloop or such11:59
benschubertthe concern is really around threads, that are doing long running stuff11:59
tristanunder the hood11:59
benschubertnot even sure we have access to such apis11:59
tristanWhile they are blocking, what happens, they sleep in poll(), poll returns automatically on some signals11:59
tristanAnd by default returns to poll11:59
tristanuntil real data is encountered, we need to sneak in and get it to observe a termination state at that point that we provoke it's syscall interruption12:00
tristanWe could certainly override the implementation right ?12:00
tristanI would guess ?12:00
tristanReplace it with something that is termination aware12:00
benschubertI am not really sure :'D12:02
benschubertbut I can see whether I can find something12:02
tristanplugins using raw python APIs for network connections could be problematic12:02
benschuberthttps://docs.python.org/3/library/select.html12:02
benschubertmmh we might12:03
tristanit would be sensible to ensure those are off limits12:03
tristanBut if we could get a blanket approach it could be easier12:03
tristanI.e. why would we allow plugins to just go and call stuff like urlopen() without going through a BuildStream API ?12:03
benschuberttristan: then I can do much simpler: have `downloadablefilesource` urlopen() read chunk by chunk and on each chunk see if we need to terminate12:04
* tristan realizes this is largely unpoliced, unfortunately12:04
benschubertWe _could_ make the should_terminate something public too, and plugins could react nicely12:04
tristanI don't know, I would think something lower level is better12:05
benschubertthough I'm less in favor of that12:05
tristanreading chunk by chunk can still block12:05
benschubertwhich is fine12:05
benschubertit's probably not for long12:05
tristanUhhh12:05
tristanNo, it's the network, can be unpredictable12:05
tristanIf you terminate at the same time as a connection gets lost, you end up with a case where we fail to terminate12:06
benschubertwell, until the timeout of the connection throws12:06
tristanlost connection might take a long time12:06
benschubertand it's rare to have that above a few seconds no?12:06
tristanCan we guarantee that the timeout is always under our control ?12:06
benschubertno12:06
tristanSo then I don't think it's appropriate at all12:06
benschubertWe can also have a timeout on the read, if no events happend in X time12:07
tristanI'm not sure it's alright to focus only on DownloadableFileSource honestly; a drop in replacement of urllib in buildstream.utils would be more user friendly12:08
tristanotherwise we are saying you can never open a url unless you use DownloadableFileSource12:08
tristanWe could do that I guess, though12:09
* tristan back a bit later12:10
*** tristan has quit IRC12:13
*** jward has quit IRC12:56
*** Trevinho has quit IRC13:05
*** tristan has joined #buildstream14:15
*** ChanServ sets mode: +o tristan14:15
tristanI just went for a walk and will now go get some food, it's been on my mind though; these seems really unfortunate14:16
benschuberttristan: I've found a solution :D14:16
tristanI came across this SO post https://stackoverflow.com/questions/11817337/how-do-i-gracefully-interrupt-urllib2-downloads which looks grim, maybe things have improved in recent years14:16
tristanDid you ?14:16
benschubertyes sir :D14:16
tristanis it nice and graceful ? what is it ?14:17
benschuberthttps://gitlab.com/BuildStream/buildstream/-/merge_requests/1982/diffs?commit_id=41d3ee544ceabd1c608a0c1231a02916881334c414:17
benschubertI still need to cleanup14:17
benschubertbut it seems to work 100% of the time14:17
* tristan was thinking that perhaps sending ourselves SIGTERM might provoke python to abort system calls but not sure14:18
benschubertI need to check:14:18
benschubert- no unintended side effects14:18
benschubert- clean flush of log files14:18
benschubert- cleanup of casd14:18
benschubert- subprocess are closed correctly14:18
tristanWhat am I looking at ?14:18
benschubertbut apart from that it's 100% transparent to the threads unless they catch the exception we throw14:18
tristanso many files14:18
benschuberthttps://gitlab.com/BuildStream/buildstream/-/commit/41d3ee544ceabd1c608a0c1231a02916881334c4 this commit14:18
tristanright that's what I'm looking at, ok so... what... abort_thread() ?14:19
benschubertyeah14:19
benschubertwell, everything, it's 22 lines changed ;)14:19
tristanPyThreadState_SetAsyncExc14:19
benschuberthttps://docs.python.org/3/c-api/init.html#c.PyThreadState_SetAsyncExc14:19
tristanyeah I see14:20
benschubertand we could instead send a custom exception so that if we really want to cleanup something we can14:20
benschubertbut systemexit should be nice enough otherwise14:20
benschubertand hard to catch "by accident"14:20
tristanfrom that docs, I don't understand how this aborts a blocking syscall14:20
tristanit says "Asynchronously raise an exception in a thread"14:21
tristanWhich leads me to think, it schedules something to happen in the immediate future14:21
tristanafter something blocking returns14:21
benschubertyes14:21
tristanSo... not solved then ?14:22
benschubertwell, it does stop the thread immediatelly during a download14:22
benschubertbecause it causes the poll() selector (which is python) to actually throw at the next invocation afaik14:22
tristan:-S14:23
benschubertah shoot doesn't always work: I've tried a `time.sleep(20)` and I'm actually waiting...14:23
tristan"the next invocation", but what about the current blocking one14:23
tristanThat SO post suggests that there are alternatives to urllib which are more suitable to production code14:24
tristanIt would be nice to have something deeper in python to ensure that we can interrupt anything, but I think it would be perfectly acceptable to enforce that plugins must use a specific networking API surface provided by BuildStream14:25
tristanperhaps more than acceptable: expectable ;-P14:26
benschubertif the thread becomes blocked on a blocking call I'm not so sure how we could unblock it at any point though14:26
tristanthere are certainly ways to do it in C, syscalls can be interrupted; so it should be doable in python14:27
tristanthere is no way we can direct signals at specific threads ?14:28
benschubertthere is no thread signals at all no14:28
tristanperhaps anyway, as you mentioned before, we should have all of these file descriptors in the same event loop14:28
tristanIn which case we should have control of our poll(), and threads are only blocking on a thread condition14:29
benschuberthttps://docs.python.org/3.4/library/signal.html#signals-and-threads14:29
benschuberthow do you interrupt a system call in C also?14:30
tristanbenschubert, I don't know if that is relevant14:30
benschubertwell, it means we can't send a signal to a thread14:30
tristanI.e. it's not necessarily important that a signal handler be invoked in the thread14:30
benschubertand then how do you kill the thread?14:31
tristanif we sent a given signal to the whole process group and capture it in the main thread; I think we can have a way to provoke blocked syscalls to exit (regardless of where python code handles them)14:31
tristanYou would then rely on errors being reported from interrupted reads14:32
tristanor such14:32
benschubert> if we sent a given signal to the whole process group and capture it in the main thread;14:32
benschubertIs really not a problem, not even sure why we'd need a signal there.14:32
benschubert> ; I think we can have a way to provoke blocked syscalls to exit (regardless of where python code handles them)14:32
benschubertNever seen non-kernel code interrupt a syscall. How do you do that? :)14:32
tristanThe thing is, most posix level code will run a syscall with ret = read(...); and then check if ret < 0; if ret == -EINTR, I believe you normally reenter the syscall14:34
benschuberttristan: but how would you cause the EINTR first?14:34
benschubertthat's probably the first problem to solve14:35
tristanSIGINT for sure, but most probably others14:35
tristanit's a whole book that is not currently fresh on my mind hehe14:35
benschubertwe catch sigints, and we start killing at that point14:36
tristanwhere is stephen's manual when you need it14:36
benschubertnot sure what you want to do in the other threads after that14:36
tristanright, so one would think, we should be able to cooperate with sigint14:36
tristanperhaps explicitly trigger it if it's not the cause of our desire to terminate, and streamline our exit strategy through sigint (just a thought)14:37
tristanpossibly another signal, not sure14:37
tristancertainly it would make sense that python's SIGINT signal handler (in asyncio loop) would not override the SIGINT initially14:38
tristanso if the main thread killed the group with SIGINT, we might be able to rely on that unblocking things, and cleanup asyncrhonously... *really* not sure14:39
tristanpossibly twisted's url libraries are worth a looksee14:39
benschuberttristan: There are things like: https://docs.python.org/3/library/signal.html#signal.pthread_kill, however, if i `time.sleep()` in my thread, nothing will happen either14:40
tristanI think time.sleep() uses alarm()14:40
tristansleep should be easy to abort in a uniform fashion though14:41
tristan(even though it's not something one would expect)14:41
tristantime.sleep() is an interesting test, but of course it's nothing that should ever happen in real life :-S14:42
* tristan is mostly concerned with a blocking socket connection, establishing the connection or reading when losing connectivity14:43
benschuberttristan: as I told you, my tests with downloads work fine with the previous commit, I'm not sure what else to test14:43
benschubertSo, disconnectin the internet connection on download? I can test that14:44
tristanin this case, it doesnt really matter how small your chunks are, the chances that you are not inside a syscall at the time of connectivity loss are always going to be slim14:44
tristanif you're in a loop reading chunks, this only lets you terminate gracefully when you *don't* lose connectivity14:45
tristanotherwise you are hostage14:45
benschubertThen I'm not sure how you'd expect to ensure it works?14:46
tristanbest to read the twisted implementation first before answering that in python14:46
benschuberttristan: twisted's implementation is a non-blocking, async based one14:49
benschubertSo unless we decide to break the API and force downlaods to be done in an async way14:49
benschubertI don't see a way we could handle this generically14:50
tristanhttps://docs.python.org/3/library/socket.html#socket.socket.settimeout might be helpful14:51
tristanRight14:51
tristannon-blocking async sounds like something more trustworthy14:51
benschubertbut that's something that is a lot of work and breaking APIs :)14:52
tristanbenschubert, forcing them to be done in an async way does not mean the caller needs to use an async API14:52
tristanbenschubert, it *definitely* means BuildStream has to set the ground rules, saying "you must use *this* API to talk to the internet and nothing else"14:52
tristanWhich I think is a good API break14:52
benschubertwhich I would rather not do14:53
benschubertNow you want to run a scp command, using paramiko, you can't. You need to run scp? That's one more system dependency :)14:53
tristansubprocess model is better ?14:53
tristanSure, at least a host depencency in a subprocess can be aborted14:54
tristanAnd we can eventually look into providing sandboxes which provide those host deps14:54
benschuberttristan: however, if setting this timeout (like 5-10 seconds?) globally at buildstream import is enough, I think that is a good solution14:55
benschubertno idea whether other things would conflict, but hey14:55
* tristan worries it might not be14:55
tristanbenschubert, in honesty, if we find that it "happens to work" with urllib, I would be a hundred times more confident if we started out by wrapping urllib internally in BuildStream and saying that every plugin must use the wrapped API14:56
tristanAt least this would allow us to fix things properly later14:57
benschuberttristan: we have the downloadable filesource already14:57
benschubertI'd be wary of forcing plugins to only pass through that14:57
benschubertsince it might not be the best way for a specific plugin14:57
tristanThat's not a promise that no other plugin will not use urllib14:57
benschubertAnd if one plugin does a mistake and 'terminate' takes longer, what's the real problem?14:57
benschubertThe only problem I see is that users of this will realize it takes a long time to terminate14:58
benschuberttime that might not be that long anyways since most read() operations are chunked anyways14:58
tristanA plugin cannot make a mistake if it is only allowed to use BuildStream APIs, if it only uses BuildStream APIs, then only we can make a mistake, and we can fix it14:58
tristanthis whole conversation is really making me start to favor the idea of using subprocesses instead of threads :-S14:59
* tristan goes to get some food before it closes14:59
benschubertand have the pickle stuff?15:01
benschubertBecause if that's the only problem we can also set the threads in the pool as daemon and just not wait for it to close15:04
benschuberttristan: https://gitlab.com/snippets/1994376 here is a minimal POC, it's indeed not killed15:31
tristanI'm not sure there is much problem with using (or copying) something like twisted internals for this to implement DownloadableFileSource, and not even exposing an API surface (yet) until a need might arise for a downloading API in BuildStream15:38
tristanwe should be able to get this right15:38
tristanI thought we might have something ugly in the pip source which might be using some uncontrolled networking interface, but no; that is safely using Plugin.call()15:38
tristanAt the very least we could say "Never do that" and open a high severity issue against DownloadableFileSource to make it properly interruptable15:40
tristanThe "Never use urllib directly from your source plugin" seems to be mandatory with threads no matter how you slice it15:40
tristanSending them off to be their own little daemons doesnt seem adequate, they might pollute log files after buildstream exits, and anyway it would be quite sloppy to allow them life after bst exits15:41
tristans/might pollute log files/might cause all kinds of local state side effects such as polluting log files/15:42
benschuberttristan: otherwise: socket.setdefaulttimeout(10) <- maximum 10 seconds before the socket times out15:47
benschubertthough if you have a really really slow server, that might not be nice15:48
benschuberttristan: Ok I guess we should decide how we move on and go with an implementation. What would you prefer? I'm not really fond of pulling a lot of twised in BuildStream "just" for that, but if we can't avoid16:01
tristana few things to consider16:32
tristanan async/non-blocking/interruptable download should not be super onerous, hopefully we could get such a thing to live in a private module with private APIs16:32
tristaninspired by twisted perhaps16:33
tristanA plugin facing API I would think should be unnecessary16:33
tristanAlso, having it work properly might not be a good idea for a hard blocker for landing a change to a threading model16:33
tristanBuildStream is not an http socket library or a server and should not become one, we should stick to our strengths16:34
tristanAnd that is sandboxing and orchestration16:34
tristanLaunching commands, hopefully always in sandboxes in some optimal future, should be what we do best16:35
tristanbenschubert, my impression at this time is, we should get a better handle on what plugins can do in order for BuildStream to successfully deliver a fault tolerant and clean experience all around16:36
tristanMaybe this reaches beyond just "Source plugins should not go and try to access the internet using python APIs", maybe the buck stops there16:37
tristanClearly the threading model changes things, though, so a first step would be to clearly/better understand what those changes are16:37
benschuberttristan: I'm not sure having to use host tools for downloads is that great a model though :/16:38
benschubertYeah16:38
tristanbenschubert, I also don't think this is a suitable call to be making in a midnight IRC chat16:38
benschuberthahaha sure16:38
tristanWell, I think host tools are perfect for downloads in the cases we use them, and we should focus on making those sandboxable16:38
benschubertSo, the move to a threading thing is not blocking anything else for bst 2.0 right?16:38
tristanso that people can use pre-baked deterministic images containing all the host tools they need (git, bzr, etc)16:39
benschubertIf that's still correct, I'll just pause that part of the work for now16:39
benschubertDo we want this for bst2.0? :'D16:39
tristanbenschubert, I am starting to think though, for downloading regular files, DownloadableFileSource abstract class should really be enough16:39
benschubertI think so too16:39
tristanSo if we could keep that sealed in, and say; you wanna download a file: subclass this; then I think it would not be too bad of a limitation16:40
benschubertsince it can do http,https,ftp16:40
benschubertI'd agree with that16:40
tristanThat should in theory let us bake in a more sophisticated implementation than urllib16:40
tristanwithout necessarily blocking immediate progress16:40
benschuberttristan: so you'd rather continue now with the push to a threading model right?16:40
tristanbut I would be partial to outlining policy16:40
tristanI think so, that's my gut feeling, and it really is dependent on the idea that we have something sealed16:41
tristani.e. we could treat it as a bug16:41
tristanby sealed, of course I mean: we need to have a policy in place which ensures that we've encapsulated this problem to an internal code body which we can eventually fix16:42
tristanWe don't allow plugin authors to write code which has the same bugs16:42
benschuberttristan: there is an "immediate fix" which would be to add a timeout for now16:43
tristan(of course this means a proper guide for plugins, but that starts with agreed policy)16:43
tristanRight, that could be the short term thing16:43
benschubertOk, so ML post around preventing downloads from something else than a 'downloadablesource' ?16:43
benschubertand if agreement, we continue on the threading16:43
tristanML post sounds like a good start16:44
tristanIt would be good if the post considers what kind of abstract guidelines which plugin authors need to follow16:45
tristanLike avoidance of blocking system calls, and explanations about how buildstream goes about force terminating threads and such16:46
tristanMaybe there is a cancellable API or such which plugins could implement, I don't know16:46
benschubertok I'll probably prepare a proposal next week for this then :)16:46
benschubertthanks!16:46
tristansorry if I've been irritated at all today, this all came up quite suddenly :)16:48
* tristan has been struggling at the same time to put everlasting variables behind him while other important things creep up :-S16:48
benschuberttristan: no worries, I've been banging my head on this for more than two weeks now and it was good to have someone to discuss it with :D16:50
benschubertdo you want me to have a look at the variables stuff?16:50
benschuberttristan: actually, otherwise we can ask for the downloads in python to be run in a subprocess. That could work too if we 'spawn' :)16:51
tristanbenschubert, with the same limitation that we do seal the API and come up with policy, this does seem like a very sane alternative to importing heaps of code we don't need to specialize in yes :)16:55
tristanfor variables, I am having a hard time shaving off those .4 remaining seconds, I have a couple of possible optimizations to consider16:57
benschuberttristan: I believe a ```.. warning: When making long running operations in the plugins in python, please consider using a subprocess (multiprocess.Process), in order to allow us to nicely terminate BuildStream on demand```16:57
*** toscalix has joined #buildstream16:57
benschubertin the docs would be a good thing16:57
tristanSome of the profiling has shown longer durations locally but better performance in your benchmarks16:58
tristanbenschubert, I don't like "... please consider ..."16:59
benschuberttristan: profiling of cython will add operations at each function call that you would not see in real C calls16:59
benschuberttristan: `Please use...` ?16:59
tristanbenschubert, I would rather speak in terms of "must"16:59
benschuberttristan: sure that would work16:59
benschubertand it might make everything nicer no?16:59
* tristan not sure which "it" is making things nicer17:00
benschubertusing multiprocessing.Process for those specific parts of the code :)17:00
tristanI don't know if multiprocessing.Process is on the table17:01
tristanCan it be used ?17:01
tristansubprocess could be used I think, can't you use subprocess and pass it a function ? or you absolutely need an executable file on the host ?17:02
tristanbenschubert, I think that multiprocessing.Process (which we currently use for jobs) only works when you are careful to never have any threads going on in the main process17:02
benschuberttristan: if you 'spawn' we should be good17:03
tristani.e. fork() without execve(): a perfectly fine and good multitasking model, as long as libraries dont do crazy nonsense like starting their own threads without being asked to17:03
tristanOk, so spawn takes care of encapsulating it all into a re-exec ?17:04
tristanand you can give it a Queue() ? not sure about this17:04
tristanbut you probably have your head in the docs and know better :)17:04
* tristan thought multiprocessing was really only for the fork() copy-on-write memory model17:04
benschuberthttps://docs.python.org/3/library/multiprocessing.html#contexts-and-start-methods nope :)17:05
benschubertand yeah you can give it a queue17:06
*** xjuan_ has joined #buildstream17:06
benschuberttristan: we could event make it more abstract maybe: Having a 'timed_activity' variant that takes a callable that we run in a multiprocessing.Process ourselves. That way we cna ensure everything works nicely. What do you think? (credits to cs-shadow)17:14
tristanbenschubert, that sounds promising17:18
tristanbenschubert, we should also make it have stdout/stderr pointed to the appropriate open logging fd17:19
benschubertexactly17:19
benschubertand we would pass the 'return_value' through a queue under our control17:20
tristanyeah this looks like a decent way forward, subprocess anything which can get blocked in a system call17:20
benschubertI think that is even something we can do first, without doing the rest of the changes17:20
benschubertah wait no17:20
tristanof course for our internal cas interactions, we'd not want to do that17:20
tristanbut for that I'm sure we have better APIs which allow a more delicate touch17:20
benschubertyeah cas should not block like this :)17:21
benschubertsince it's local17:21
benschubertand if it dies, it gets cleaned up17:21
tristanWell pushing to artifact servers etc17:21
benschubertthat's done by casd itself17:21
tristanbut this is grpc, I'll bet we can do it without blocking17:21
benschubertyep17:21
tristanoh yeah that's true also, casd takes care of it17:22
tristanif you do have some inspiration around: https://gitlab.com/BuildStream/buildstream/-/blob/tristan/partial-variables-manual-string-join/src/buildstream/_variables.pyx#L319, let me know17:23
tristancurrently I'm thinking: convert ResolutionStep to a struct instead of an object, possibly reuse the values array and avoid re-allocation17:24
tristanI'm quite happy with the bottom half: https://gitlab.com/BuildStream/buildstream/-/blob/tristan/partial-variables-manual-string-join/src/buildstream/_variables.pyx#L37217:24
benschubertOne question that might be dumb but hey: Do you know if there is more work done somehow? Like is the algorithm at least as efficient?17:25
tristanThe separation between Variables._resolve() and Variables._resolve_value() ensures we never go into the loop for a resolved variable, while also ensuring there is never an extra dictionary lookup for the toplevel varname17:25
tristanbenschubert, there is a little more work at startup, populating the ValueClass table17:26
tristanthis is rather constant for large projects17:27
tristanand doesnt appear to take much in profiling17:27
tristanbenschubert, however, the main thing is that the old algorithm is recursive, and the new one necessarily needs to track state on the heap17:27
tristanAnother property of the old algorithm is that broken down list-of-strings (returned by re.split()) are replaced inline with resolved values in the same dictionary17:28
tristanI was thinking of replicating this behavior, it's possible, but I don't see how it should/would really improve performance17:29
tristanand it would be messier and harder to follow I think (we'd end up checking if values are isinstance(str) or isinstance(value))17:29
tristanThere is of course a tiny indirection of getting the Value and then (probably internal pointer arithmetic) to get from Value -> Value._resolved17:30
tristanthe recursion state is mostly captured by ResolutionStep, which is a multi-link list17:31
tristanreverse linked list with pointers to parent "stack frame" basically17:31
tristanand left to the gc to deal with17:31
tristanSo, changing that to structs and freeing them up could help17:32
tristanIf I just kept a pointer to the last step, I could reverse free the whole list in a go at the end, and avoid it being a PyObject17:33
tristanasides from the instantiation overhead, there is also a ref-counting overhead when assigning link relations17:34
tristanwhenever you say step = step.prev or such17:35
tristanwe're probably talking about 2 cycles per assignment here, though17:35
benschubertyeah, which probably doesn't account to .4s :)17:36
benschubertgah, that's an itchy one too17:36
tristanright17:36
tristanI am a bit curious about https://gitlab.com/BuildStream/buildstream/-/blob/tristan/partial-variables-manual-string-join/src/buildstream/_variables.pyx#L79117:37
tristanbenschubert, This gets called every time we enter _resolve_value()17:37
tristanand freed at the end17:37
tristanseems like it could be cached, and never realloced, at least until Variables.check() is complete17:38
tristanso the array could grow as needed on allocation, length could be set to 0 upon entry of the loop and the same memory reused17:38
benschubertso create it before and pass it as argument?17:38
tristanWell, it could be stored on Variables directly17:39
tristanas a cache17:39
benschubertright17:39
tristanyou only need one as it's not reentrant17:39
benschubertas long as it's cleaned up correctly17:39
tristanright, and we could give it a cleanup at the end of check() (and in __dealloc__)17:39
benschubertyup17:40
tristanor even at the end of everything which incurs variable resolutions (like _expand())17:40
tristanstill, does that account for .4 seconds ? maybe it all culmulatively does17:40
tristanbut no matter how I slice it, the logic to me appears that; if you go from recursive to non-recursive, you have to manage more state on the heap, and this must be the main source of the overhead17:41
tristanI could be wrong but I think it's a sound theory ;-)17:42
tristanof course, it should be possible to get very close anyway17:42
tristanit's just a very delicate dance to do so17:42
tristanIf you look at _resolve_value(), the original code basically drives through only the last part of the loop, from left to right instead of right to left, and expands variables recursively as needed17:43
benschubertright17:44
benschubertI'll bump my head against it a moment this weekend to see what I can get out of it17:44
tristanfor part in value.value_parts: list.append(possibly_expanded(part)) ... "".join(list)17:44
tristanI'll try my micro-optimizations (cache/reuse the array, structize ResolutionStep) tomorrow and we can see if it makes any dents17:46
benschubertok!17:52
benschuberttristan: https://gitlab.com/BuildStream/buildstream/-/merge_requests/1982/diffs?commit_id=056d466cb51a98a0de6a266d55f284fb54e594c9 it's still lacking a bit on the handling but it's mostly there now18:04
benschuberthow would you find this solution?18:04
tristannot using blocking_activity() yet ?18:08
benschubertwoops my bad18:08
tristan:)18:08
tristansomething along those lines seems good though :)18:09
tristanmaybe ensure Plugin objects dont get pulled in (no self._foo callbacks)18:10
tristanensure that code inside there doesnt have buildstream API and state, and can't try weird things like self.message()18:11
benschubertIf that happens we get a pickling error18:12
*** toscalix has quit IRC18:29
*** mohan43u has quit IRC19:01
*** santi has quit IRC19:06
*** mohan43u has joined #buildstream20:38
*** mohan43u has quit IRC20:44
*** mohan43u has joined #buildstream20:48
*** mohan43u has quit IRC20:57
*** mohan43u has joined #buildstream21:11
*** pointswaves has quit IRC21:16
*** mohan43u has quit IRC21:17
*** xjuan_ has quit IRC22:36
*** benschubert has quit IRC22:37

Generated by irclog2html.py 2.15.3 by Marius Gedminas - find it at mg.pov.lt!