IRC logs for #buildstream for Tuesday, 2020-06-16

*** xjuan has quit IRC00:01
*** valentind has quit IRC00:51
*** cphang has quit IRC00:51
*** scottclarke has quit IRC00:51
*** traveltissues has quit IRC00:51
*** robjh has quit IRC00:51
*** valentind has joined #buildstream00:56
*** robjh has joined #buildstream00:57
*** hasebastian has joined #buildstream03:36
*** mohan43u has quit IRC03:56
*** tristan has quit IRC03:56
*** mohan43u has joined #buildstream03:57
*** tristan has joined #buildstream04:57
*** ChanServ sets mode: +o tristan04:57
juergbitristan: I'm wondering whether it could make sense to move from the options 'fork' or 'spawn' to the options 'fork' or 'everything in the main process'07:20
juergbiwith the move to making better use of buildbox we've offloaded quite a bit of work from the buildstream (job) processes07:20
juergbiit might be feasible to run all python code in the main process (on systems where fork is not available)07:21
juergbiby making Element.assemble(), Source.fetch() and co. async coroutines and use async calls ast least for Sandbox.run() and subprocesses such as 'git fetch'07:22
juergbiand longer term (after optimizations) maybe even move away from fork on Linux as we've had our share of problems with that as well07:23
juergbinot sure whether the overall impact on the code base would be smaller than spawn. however, I'd rather move towards async than towards pickling07:25
* tristan resurfaces07:38
tristanHmmm07:39
tristanjuergbi, that sounds like it would be a much huger change07:40
tristanIn theory, python could be able to support subprocess calls asynchronously07:41
juergbipossible but it would be a move into a direction that may be beneficial to all systems07:41
juergbitristan: it already does: https://docs.python.org/3/library/asyncio-subprocess.html07:42
tristanAnd then the core main process would either have to employ (fake) threading07:42
tristanWhich we'd have to ensure works with that subprocessing (for sources), or we'd have to hold on to state of active calls07:42
juergbiwe already use asyncio and Python already has async coroutine support integrated into the syntax07:43
tristanJob control could suffer as a result of this07:43
tristanYes python is aware of this and we use asyncio, but our core will have to significantly change to adapt to that strategy07:43
juergbiproperly implemented, I don't think it should suffer. however, I don't know whether that's easy to achieve in Python or not07:43
tristanAnd then there are the plugin contract side effects to consider07:44
juergbido you think the core really has to change that much? it's possible but I hope it wouldn't have to07:44
tristanI'm thinking, such a change could push 2.0 forward by months07:44
juergbiyes, this would have an impact on plugin API, however, that may be done as part of https://gitlab.com/BuildStream/buildstream/-/issues/129307:45
tristanAnd then from a plugin author (i.e. end user) perspective, it also probably makes writeups of plugins more complicated07:45
tristanit's really nice to have your function and just say to your users: "do this in that function"07:45
tristanrather than have any event loop awareness (or top/bottom halfs) in plugin methods07:46
juergbiI would hope that the async syntax makes this easy enough but it's definitely possible that there are some aspects that still make this tricky07:46
juergbithere should definitely be no need to split methods. if the async support is not good enough to handle it without splitting methods, it's definitely a no-go07:46
tristanI would think BuildStream would have to leverage async syntax itself, and only feed out top/bottom half methods to plugin authors07:47
*** benschubert has joined #buildstream07:47
tristanotherwise BuildStream would not be able to police/ensure/force that plugins do the expected thing07:47
juergbiI'd rather make it easy on the plugin developers even if the core can't guarantee that the plugin does the right thing (i.e., block where it shouldn't)07:48
juergbiI think even with method splitting the core couldn't really enforce it07:49
tristanYeah, but with the current approach, we can; and things are much simpler for it07:49
juergbiin this regard, yes, on the other hand the fork model cases a lot of other issues07:49
* benschubert arrives in the middle of an important conversation?07:49
tristanI kind of think that has more value, we can always use async semantics for launching a child job, so long as we have the hardness in which that code runs in a subprocess07:50
tristanbenschubert, just started, check the logs and catch up :)07:50
juergbibenschubert: fork/spawn alternative: https://irclogs.baserock.org/buildstream/%23buildstream.2020-06-16.log.html#t2020-06-16T07:20:0007:50
benschubertthanks07:50
tristanbenschubert, you may want to read the ML thread as this sort of stems from there07:50
* tristan takes 5 while Ben catches up07:51
juergbitristan: I'm not quite sure I understand your last sentence (to me)07:51
tristanjuergbi, I kind of think that BuildStream being in total control has more value, especially given that we currently do; and things (plugin facing) are simpler for it already07:52
benschubertYeah it's about the removal of the pickling states, I've skimmed through that already, will have to have a more thorough look07:52
benschubertFWIW, we _could_ make everything async without changing the plugin API at all (and using `loop.run_in_executor`, which would execute the code in an 'executor', either a thread (default) or a subprocess (that would be much harder though).07:53
juergbihm, yes, while threads wouldn't provide parallelism (GIL), they might still be useful for concurrency07:54
benschubertI think the question async/not-async is orthogonal to separate in subprocess for user supplied code, and we should treat both separately :)07:54
juergbihowever, we'd need to make sure we don't add significant locking/synchronization complexity07:55
benschubertwell, if we assume most plugins are not _compute_ heavy, we're good with the GIL :)07:55
juergbi(GIL granularity might be sufficient in many cases)07:55
benschubertwell, if we use async, we _know_ where the splitting points are (cooperative multitasking), so as long as the plugins' API calls have no side effect when run in the thread, we're good07:56
benschubertbut that's something we _could_ discuss another time07:56
benschubertI'm also not fond with what's proposed on the ML (I'll answer to that, have to catch up after a break), having each attribute explicitely mark whether it should be pickled or not, as this would add some runtime impact for both fork and non-fork systems07:57
benschubertIMO, we'd be better with a forkserver/jobserver model07:58
juergbibased on our current plugin approach a threaded executor might be fairly straight forward. as a child subprocess anyway can't directly affect global state, so that rule would stay and might avoid the need for complex locking/synchronization. we also have to check potentially concurrent global state manipulation by the main thread which may be observable by job threads but our API might already be suitable for this as well08:00
benschubertbut we're loosing the clean separation of ensuring nothing bad in the stte change08:01
juergbi(again, as our existing API anyway assumes that observable global state doesn't change while the plugin is running)08:01
benschuberthowever, my opinion is that if someone _really_ wants it, well... they'll be able to08:01
juergbiit would have to be ensured by the API surface08:01
benschubertcorrect08:01
juergbiyes, it's not like we have full sandboxing protecting against evil plugins08:02
juergbithis is about accidental breakage08:02
benschubertI'd not say 'evil', people might have a real use case for it, though I can't see one :D08:02
benschubertbut I definitely remember having to override constants in a few python libraries that were not fit for what I was doing08:03
juergbiah, third party libraries08:03
benschubertso, yeah I think that we should not make is _easy_, but trying to defend too much against it is probably not a good win08:03
benschubertOne thing I'd rather have first though is asyncio support in grpc08:04
tristanThe root causes of this discussion are (A) Python doesnt really have threading... (B) Non unix/linux operating systems tend to have pathetic perfomance in their fork() implementations (just noting: it seems crappy to do a whole complicated async redesign around these obstacles)08:04
juergbi(B) is a bit different08:04
juergbinon-unix, i.e., Win32, doesn't have any fork08:05
juergbiand even on unix forking without exec is highly problematic08:05
juergbimainly due to threading08:05
juergbiwe've had tons of issues. ostree, grpc, maybe others08:05
juergbiso from my perspective it's rather moving away from the highly problematic fork08:06
juergbinot a workaround08:06
tristanYes, some libraries do crazy things like start threads without being asked to08:06
juergbithat's to be expected these days08:06
* tristan has always considered those to be severe bugs08:06
juergbiyou want to use your cores08:06
benschuberttristan:08:06
benschubertA) is not a problem _if_ we are not CPU intensive, which we used to be a lot, nowadays it's much better thanks to casd/buildbox-run/a lot of optimisations08:06
benschubertB) True, thus a forkserver/job worker implementation08:06
benschubertAsync: I don't believe the change would be that big, remember that:08:06
benschubert  1) We are not a GUI, so 'liveness' is not too important08:06
benschubert  2) Only 'long running' operations would need to run in async mode, the rest can be used as is08:06
benschubertalso, I have checked with a Python core developper: Async + fork is _not supported_, and will break in a lot of subbtle ways (as we already experienced)08:07
benschubertso I agree with juerg there that fork is problematic nevertheless08:07
juergbiright, I forgot about the default child watcher in Python 3.8 which causes problems with fork08:08
tristanEvery time I check with python developers, they tell me "dont do that": for just about anything useful I want to do :-(08:08
tristanSo from a reading of this conversation while I was outside... so far we have (A) Use threading in the main process... (B) Use proper subprocesses for child jobs08:09
tristanAm I wrong in that (B) implies Python object serialization anyway ?08:09
benschubertcorrect08:10
juergbiI'm a bit confused08:10
juergbitristna's A/B was about root causes, not options08:10
tristanjuergbi, this is a second A/B, sorry should have used 1/2 maybe ?08:10
juergbiah ok08:11
juergbiI'd really like to avoid pickling (at least outside very limited pickling scope)08:11
juergbiso for me it's either threading or full async without threading08:11
tristanWhat I don't want, is for plugin developers to have to know anything complicated: I don't even want plugin developers to really know Python08:11
tristanIt should be enough to just look at someones plugin, maybe graze the API reference (which we currently lack), and bang out a plugin that you need on demand08:12
benschubertjuergbi: we won't be able to use async without threads. However, without the thread API yes, is that what you meant with your separation?08:12
juergbiI think threading gets closest to this08:12
tristanSo I'm quite against the idea of offloading any knowledge of async coroutines to plugins08:12
juergbibenschubert: theoretically, everything could be based on async generators/coroutines (not counting native background threads)08:13
benschubertwe can have full async support without the plugins ever knowing, I'm confident of that08:13
benschubertjuergbi: gotcha08:13
juergbihowever, yes, my favorite right now is also full async support with threading without the plugins ever knowing08:13
tristanTerminology melt down: We currently have full async support already I think08:14
juergbiok, let's add 'without Python subprocesses'08:14
juergbii.e., threads instead of Python subprocesses08:14
benschuberttristan: using only 'asyncio' as the multitasking mechanism08:14
juergbiand if this is really feasible, it would theoretically not even be a 2.0 blocker as the plugin API wouldn't change. although, would be nice to make such a change before 2.008:15
tristanbenschubert, Do you think job control will suffer ?08:15
tristanbenschubert, Can I at least have all my subprocesses go to sleep when I ^Z buildstream ?08:15
juergbiI wouldn't expect any issues there with threads08:15
benschuberttristan: if we are full async, I think we could achieve better performance than currently08:15
benschubertbecause we'd have less signal passing around notably08:16
tristanbenschubert, better performance of job control ? you mean less buggy ?08:16
benschubertboth08:16
tristanQuicker08:16
tristanQuicker time-to-sleep/wakeup, sure08:16
tristanWell, there's still going to be a timeout on killing processes which fail to respond to SIGTERM08:17
tristananyway08:17
tristanNow regarding performance, I'm skeptical of threads08:17
tristanWe are saying that we are not processing intensive, but I'm not buying it08:17
juergbiah, implementation may not exactly be simpler as we'd have to suspend subprocesses of all threads from a single signal handler, I think08:17
tristanIf it were true, we would not have an initiative to subprocess the scheduler08:17
benschuberttristan: which didn't pass the cut and was actually slower. Yes the UI updates are _really_ bad, which was "fixed" by not refreshing more than every 1sec08:18
benschubert(actually user defined)08:18
tristanWe could revive that initiative, I would personally like to work on it, because I want to get rid of all that extra progress reporting codepaths and make sure it's all encapsulated in the mirrored State object in the scheduler process08:18
benschubertand again this part was slow, because of writing to the cscreen08:18
tristanHmm08:19
juergbithe job subprocesses do less work nowadays due to offloading to buildbox. however, we would require further optimization if we want to drop fork completely (instead of supporting it as less performant alternative for mac/windows)08:19
benschubert> I want to get rid of all that extra progress reporting codepaths and make sure it's all encapsulated in the mirrored State object in the scheduler process08:19
benschubertI'd love that, +1 definitely, not sure the process split is the best there08:19
tristanOk but what about the plugins which *do* process intensive stuff08:19
tristanThat stuff still exists and has it's place08:20
juergbie.g., 'virtual staging' still requires some Python CPU cycles. however, it should be possible to find an offload approach for that as well08:20
tristanAnything around compose elements08:20
tristanAnything obseriving and filtering files etc08:20
tristan*observing08:20
juergbidifficult to estimate the performance of this in real projects08:21
tristanEven recursive dependency walks can be expensive and add up08:21
tristanthose will block other tasks from being dispatched08:21
benschubertat that point, I think we should decide what "guarantees" on plugins isolation we want, and how much we are ready to change of public API(None is fine)08:21
benschubertThen have a shot at a few ways we could optimize that and test it out with some real and benchmarking projects08:21
*** santi has joined #buildstream08:23
benschuberttristan: one way of optimizing the recursive dependency (has been in my mind for a while), would be to use a list (dependencies_list) instead of a generator when we know we're going to consume all (or the "happy case will do at least). That would remove a lot of context switch but means two separate implementations08:23
juergbido you see any potential issues in pursuing a dual approach as a starting point, i.e., support both, make fork opt-out on Linux?08:23
benschubertjuergbi: both being async and fork?08:23
juergbiyes08:23
juergbikeep both options open depending on performance08:23
tristanSome things I would really want this conversation to conclude with are (1) Who is going to do this big refactor and when... (2) What are we going to do in order to immediately unblock !1901 ?08:23
benschubertI think that we might be able to optimize more 'async' if we don't have to keep fork, but we could at least have a try on a branch08:24
juergbitristan: if we agree on this direction, this would still mean merging !196508:24
juergbias it's a completely different approach08:24
benschubertand it would become unnecessary08:24
juergbibut (1) is a valid question08:24
benschubertso yeah, I think going forward with !196 is fine08:24
benschubertdepends on the time frame, I could probably have a go, but I'm pretty busy those days08:25
juergbia proof of concept prototype might not take that much time, may allow figuring out how much work is needed for a mergeable implementation08:26
tristanRegarding protections and dangers of threading: I'm happy with knowing that if a plugin invokes private API illegally, they cause everything to break, otherwise if anything breaks: It's mostly our fault (or our responsibility to print a reasonable error about what the plugin obviously did wrong)08:27
juergbiyes, I think that would be the goal08:27
tristanAny updates to state in the internal data model should still be synchronized when jobs complete08:27
benschubertyeah, definitely08:27
juergbijust using public API the worst a plugin should be able to do is to be slower than necessary08:27
tristanSo we may need some internal wrangling for that, but I think plugins don't update state themselves (it's all our internal jobs which do that)08:28
benschubertat least in the code path for jobs that's true08:28
benschubertso we should be good08:28
tristanRight, but in jobs we have some things like the job updating state in the data model before completion, passing that state over to the main process and then re-updating it08:28
tristanMaybe not much to fix around there08:29
benschubertyeah08:29
tristanOk08:30
tristanbenschubert, what I've gleaned from your previous comments regarding scheduler separation: I can already nuke the redundant code paths about state delivery in the scheduler08:30
tristanThose are *really* getting on my nerves08:30
tristanI was wanting to do that even if we had to revive the scheduler process separation08:31
benschubert:D08:31
tristanState was designed to do that in the first place, ok...08:32
tristanOk so, we don't really have a decision here do we ?08:32
tristanRegarding subprocess vs threads08:32
benschubertI think we should _try_ threads08:32
tristanThat's what decides !196508:32
tristanWhich blocks !190108:32
juergbiI'd say agreement is pretty clear on merging !196508:32
benschubertYes08:33
juergbias nobody is fully happy with the current state of affairs08:33
tristanOk, lets do it08:33
tristanAnd if we can also merge !1964 that will bring me one step closer to !190108:33
tristanBut we can do !1965 first08:34
tristanbenschubert, As you were going to reply to that mail anyway, would you like to announce that conclusion on the ML ?08:34
tristan(and perhaps link to this IRC log)08:34
benschubertsure, I'll do that today08:34
benschubertah08:34
benschubertI though you meant another thread08:35
benschubertbut yeah I can08:35
tristanI meant this one about nuking pickle jobbers :)08:35
tristanThere is a looming topic of artifact cache configurations which we can all afford to let simmer in the back of our minds for a while yet08:36
benschubertyeah i need to answer that08:36
benschuberton my plate is:08:37
benschubert- answer this ML thread08:37
benschubert- Answer the one about artifact cache08:37
benschubert- Start one with plugins and testing (gaaah)08:37
benschubertI should be able to do all of it by eow08:37
benschubertand after that I could try a POC for threading if nobody picks it up before08:37
juergbisounds great08:40
tristanbenschubert, if you can do this ML thread first, I'm eager to land !1965 and !196408:40
benschubertyeah I'll do that today08:40
benschuberttristan: ^08:40
* tristan thinks it's not a big mail, just wanted someone other than me to share the load of announcing conclusions of threads :)08:41
benschubertdefinitely08:41
*** traveltissues has joined #buildstream09:10
traveltissuesany opinions on excluding tests directory from sast scans? https://gitlab.com/BuildStream/bst-plugins-experimental/-/merge_requests/11909:11
traveltissues"secrets" will fail while scanning tests/sources/ostree due to lacking a gpg-agent socket.09:12
WSalmonis this a know issue i can update or do i need to make a new issue? https://hastebin.com/asibabicaf.sql09:12
traveltissuesi've seen this before WSalmon but i never diagnosed the actual issue. have you recompiled the cython components?09:13
WSalmontraveltissues, as far as i can tell09:13
WSalmonit only dose it for some elements in the project09:14
traveltissuesso is it an unhandled bad format?09:14
WSalmonbut they work fine in CI so i assume its a local funny09:14
WSalmontraveltissues, https://hastebin.com/uzevemomij.md more context09:15
traveltissuesif it's only local i'd try recompiling/cleanign venv09:15
WSalmonit looks like it can read all the yaml on the first pass09:15
traveltissuesyes, that's the same as what i was seeing09:15
WSalmonwhich is supper weired09:16
traveltissuesthis should be handled better either way09:16
traveltissuesimo09:16
traveltissuesprobably worth making an issue09:16
WSalmonwill do, didnt want to make a duplicate if some one knew of one already09:17
benschubertWSalmon: this is a bug in one of the plugins in bst-experimental, don't know why it's hitting now, one of the plugins uses ConfigParser09:18
WSalmonah09:19
WSalmonthanks benschubert09:19
*** hasebastian has quit IRC09:22
WSalmonFYI benschubert traveltissues https://gitlab.com/BuildStream/buildstream/-/issues/134109:29
traveltissuesty09:31
*** hasebastian has joined #buildstream09:38
*** tristan has quit IRC10:07
*** xjuan has joined #buildstream12:29
*** douglaswinship has joined #buildstream13:45
douglaswinshipwindow move 713:46
douglaswinship(ooops, sorry. Forgot the slash)13:46
douglaswinshipIs there an introduction anywhere to how to write plugins / what the various functions inside a plugin actually do / what invokes them?14:05
douglaswinshipEverything I've found so far seems to be written on the assumption that the reader already knows most of it.14:05
juergbidouglaswinship: have you seen the plugin API reference already?14:07
juergbihttps://docs.buildstream.build/master/core_framework.html14:07
juergbithe plugin methods are described there. for element plugins: https://docs.buildstream.build/master/buildstream.element.html14:07
juergbifor source plugins: https://docs.buildstream.build/master/buildstream.source.html14:07
douglaswinshipjuergbi: I've seen some of the pages from it, certainly. I still got pretty lost when I looked at them. I'll see if reading through from the beginning helps.14:09
juergbidouglaswinship: for some higher level info the architecture docs may be useful as well: https://docs.buildstream.build/master/arch_data_model.html#14:11
douglaswinshipjuergbi: thanks!14:11
douglaswinshipI've never seen the architecture section before. I think i must have been looking at older versions of the docs14:12
*** chipb has quit IRC14:16
*** chipb has joined #buildstream14:16
*** xjuan has quit IRC14:16
*** xjuan has joined #buildstream14:18
*** tristan has joined #buildstream16:24
*** ChanServ sets mode: +o tristan16:24
*** samwilson has joined #buildstream16:54
*** hasebastian has quit IRC17:39
*** toscalix has joined #buildstream17:41
*** samwilson has quit IRC17:46
*** santi has quit IRC18:10
*** toscalix has quit IRC18:30
*** santi has joined #buildstream18:35
*** santi has quit IRC18:37
*** santi has joined #buildstream18:42
*** santi has quit IRC18:44
benschuberttristan: any reason we explicitely suspend every child process one by one instead of getting the list of all children and pausing all of them?19:26
*** toscalix has joined #buildstream20:44
*** philn has joined #buildstream20:52
*** toscalix has quit IRC21:26
*** xjuan has quit IRC22:43
*** benschubert has quit IRC23:49

Generated by irclog2html.py 2.15.3 by Marius Gedminas - find it at mg.pov.lt!