*** xjuan has quit IRC | 00:01 | |
*** valentind has quit IRC | 00:51 | |
*** cphang has quit IRC | 00:51 | |
*** scottclarke has quit IRC | 00:51 | |
*** traveltissues has quit IRC | 00:51 | |
*** robjh has quit IRC | 00:51 | |
*** valentind has joined #buildstream | 00:56 | |
*** robjh has joined #buildstream | 00:57 | |
*** hasebastian has joined #buildstream | 03:36 | |
*** mohan43u has quit IRC | 03:56 | |
*** tristan has quit IRC | 03:56 | |
*** mohan43u has joined #buildstream | 03:57 | |
*** tristan has joined #buildstream | 04:57 | |
*** ChanServ sets mode: +o tristan | 04:57 | |
juergbi | tristan: I'm wondering whether it could make sense to move from the options 'fork' or 'spawn' to the options 'fork' or 'everything in the main process' | 07:20 |
---|---|---|
juergbi | with the move to making better use of buildbox we've offloaded quite a bit of work from the buildstream (job) processes | 07:20 |
juergbi | it might be feasible to run all python code in the main process (on systems where fork is not available) | 07:21 |
juergbi | by making Element.assemble(), Source.fetch() and co. async coroutines and use async calls ast least for Sandbox.run() and subprocesses such as 'git fetch' | 07:22 |
juergbi | and longer term (after optimizations) maybe even move away from fork on Linux as we've had our share of problems with that as well | 07:23 |
juergbi | not sure whether the overall impact on the code base would be smaller than spawn. however, I'd rather move towards async than towards pickling | 07:25 |
* tristan resurfaces | 07:38 | |
tristan | Hmmm | 07:39 |
tristan | juergbi, that sounds like it would be a much huger change | 07:40 |
tristan | In theory, python could be able to support subprocess calls asynchronously | 07:41 |
juergbi | possible but it would be a move into a direction that may be beneficial to all systems | 07:41 |
juergbi | tristan: it already does: https://docs.python.org/3/library/asyncio-subprocess.html | 07:42 |
tristan | And then the core main process would either have to employ (fake) threading | 07:42 |
tristan | Which we'd have to ensure works with that subprocessing (for sources), or we'd have to hold on to state of active calls | 07:42 |
juergbi | we already use asyncio and Python already has async coroutine support integrated into the syntax | 07:43 |
tristan | Job control could suffer as a result of this | 07:43 |
tristan | Yes python is aware of this and we use asyncio, but our core will have to significantly change to adapt to that strategy | 07:43 |
juergbi | properly implemented, I don't think it should suffer. however, I don't know whether that's easy to achieve in Python or not | 07:43 |
tristan | And then there are the plugin contract side effects to consider | 07:44 |
juergbi | do you think the core really has to change that much? it's possible but I hope it wouldn't have to | 07:44 |
tristan | I'm thinking, such a change could push 2.0 forward by months | 07:44 |
juergbi | yes, this would have an impact on plugin API, however, that may be done as part of https://gitlab.com/BuildStream/buildstream/-/issues/1293 | 07:45 |
tristan | And then from a plugin author (i.e. end user) perspective, it also probably makes writeups of plugins more complicated | 07:45 |
tristan | it's really nice to have your function and just say to your users: "do this in that function" | 07:45 |
tristan | rather than have any event loop awareness (or top/bottom halfs) in plugin methods | 07:46 |
juergbi | I would hope that the async syntax makes this easy enough but it's definitely possible that there are some aspects that still make this tricky | 07:46 |
juergbi | there should definitely be no need to split methods. if the async support is not good enough to handle it without splitting methods, it's definitely a no-go | 07:46 |
tristan | I would think BuildStream would have to leverage async syntax itself, and only feed out top/bottom half methods to plugin authors | 07:47 |
*** benschubert has joined #buildstream | 07:47 | |
tristan | otherwise BuildStream would not be able to police/ensure/force that plugins do the expected thing | 07:47 |
juergbi | I'd rather make it easy on the plugin developers even if the core can't guarantee that the plugin does the right thing (i.e., block where it shouldn't) | 07:48 |
juergbi | I think even with method splitting the core couldn't really enforce it | 07:49 |
tristan | Yeah, but with the current approach, we can; and things are much simpler for it | 07:49 |
juergbi | in this regard, yes, on the other hand the fork model cases a lot of other issues | 07:49 |
* benschubert arrives in the middle of an important conversation? | 07:49 | |
tristan | I kind of think that has more value, we can always use async semantics for launching a child job, so long as we have the hardness in which that code runs in a subprocess | 07:50 |
tristan | benschubert, just started, check the logs and catch up :) | 07:50 |
juergbi | benschubert: fork/spawn alternative: https://irclogs.baserock.org/buildstream/%23buildstream.2020-06-16.log.html#t2020-06-16T07:20:00 | 07:50 |
benschubert | thanks | 07:50 |
tristan | benschubert, you may want to read the ML thread as this sort of stems from there | 07:50 |
* tristan takes 5 while Ben catches up | 07:51 | |
juergbi | tristan: I'm not quite sure I understand your last sentence (to me) | 07:51 |
tristan | juergbi, I kind of think that BuildStream being in total control has more value, especially given that we currently do; and things (plugin facing) are simpler for it already | 07:52 |
benschubert | Yeah it's about the removal of the pickling states, I've skimmed through that already, will have to have a more thorough look | 07:52 |
benschubert | FWIW, we _could_ make everything async without changing the plugin API at all (and using `loop.run_in_executor`, which would execute the code in an 'executor', either a thread (default) or a subprocess (that would be much harder though). | 07:53 |
juergbi | hm, yes, while threads wouldn't provide parallelism (GIL), they might still be useful for concurrency | 07:54 |
benschubert | I think the question async/not-async is orthogonal to separate in subprocess for user supplied code, and we should treat both separately :) | 07:54 |
juergbi | however, we'd need to make sure we don't add significant locking/synchronization complexity | 07:55 |
benschubert | well, if we assume most plugins are not _compute_ heavy, we're good with the GIL :) | 07:55 |
juergbi | (GIL granularity might be sufficient in many cases) | 07:55 |
benschubert | well, if we use async, we _know_ where the splitting points are (cooperative multitasking), so as long as the plugins' API calls have no side effect when run in the thread, we're good | 07:56 |
benschubert | but that's something we _could_ discuss another time | 07:56 |
benschubert | I'm also not fond with what's proposed on the ML (I'll answer to that, have to catch up after a break), having each attribute explicitely mark whether it should be pickled or not, as this would add some runtime impact for both fork and non-fork systems | 07:57 |
benschubert | IMO, we'd be better with a forkserver/jobserver model | 07:58 |
juergbi | based on our current plugin approach a threaded executor might be fairly straight forward. as a child subprocess anyway can't directly affect global state, so that rule would stay and might avoid the need for complex locking/synchronization. we also have to check potentially concurrent global state manipulation by the main thread which may be observable by job threads but our API might already be suitable for this as well | 08:00 |
benschubert | but we're loosing the clean separation of ensuring nothing bad in the stte change | 08:01 |
juergbi | (again, as our existing API anyway assumes that observable global state doesn't change while the plugin is running) | 08:01 |
benschubert | however, my opinion is that if someone _really_ wants it, well... they'll be able to | 08:01 |
juergbi | it would have to be ensured by the API surface | 08:01 |
benschubert | correct | 08:01 |
juergbi | yes, it's not like we have full sandboxing protecting against evil plugins | 08:02 |
juergbi | this is about accidental breakage | 08:02 |
benschubert | I'd not say 'evil', people might have a real use case for it, though I can't see one :D | 08:02 |
benschubert | but I definitely remember having to override constants in a few python libraries that were not fit for what I was doing | 08:03 |
juergbi | ah, third party libraries | 08:03 |
benschubert | so, yeah I think that we should not make is _easy_, but trying to defend too much against it is probably not a good win | 08:03 |
benschubert | One thing I'd rather have first though is asyncio support in grpc | 08:04 |
tristan | The root causes of this discussion are (A) Python doesnt really have threading... (B) Non unix/linux operating systems tend to have pathetic perfomance in their fork() implementations (just noting: it seems crappy to do a whole complicated async redesign around these obstacles) | 08:04 |
juergbi | (B) is a bit different | 08:04 |
juergbi | non-unix, i.e., Win32, doesn't have any fork | 08:05 |
juergbi | and even on unix forking without exec is highly problematic | 08:05 |
juergbi | mainly due to threading | 08:05 |
juergbi | we've had tons of issues. ostree, grpc, maybe others | 08:05 |
juergbi | so from my perspective it's rather moving away from the highly problematic fork | 08:06 |
juergbi | not a workaround | 08:06 |
tristan | Yes, some libraries do crazy things like start threads without being asked to | 08:06 |
juergbi | that's to be expected these days | 08:06 |
* tristan has always considered those to be severe bugs | 08:06 | |
juergbi | you want to use your cores | 08:06 |
benschubert | tristan: | 08:06 |
benschubert | A) is not a problem _if_ we are not CPU intensive, which we used to be a lot, nowadays it's much better thanks to casd/buildbox-run/a lot of optimisations | 08:06 |
benschubert | B) True, thus a forkserver/job worker implementation | 08:06 |
benschubert | Async: I don't believe the change would be that big, remember that: | 08:06 |
benschubert | 1) We are not a GUI, so 'liveness' is not too important | 08:06 |
benschubert | 2) Only 'long running' operations would need to run in async mode, the rest can be used as is | 08:06 |
benschubert | also, I have checked with a Python core developper: Async + fork is _not supported_, and will break in a lot of subbtle ways (as we already experienced) | 08:07 |
benschubert | so I agree with juerg there that fork is problematic nevertheless | 08:07 |
juergbi | right, I forgot about the default child watcher in Python 3.8 which causes problems with fork | 08:08 |
tristan | Every time I check with python developers, they tell me "dont do that": for just about anything useful I want to do :-( | 08:08 |
tristan | So from a reading of this conversation while I was outside... so far we have (A) Use threading in the main process... (B) Use proper subprocesses for child jobs | 08:09 |
tristan | Am I wrong in that (B) implies Python object serialization anyway ? | 08:09 |
benschubert | correct | 08:10 |
juergbi | I'm a bit confused | 08:10 |
juergbi | tristna's A/B was about root causes, not options | 08:10 |
tristan | juergbi, this is a second A/B, sorry should have used 1/2 maybe ? | 08:10 |
juergbi | ah ok | 08:11 |
juergbi | I'd really like to avoid pickling (at least outside very limited pickling scope) | 08:11 |
juergbi | so for me it's either threading or full async without threading | 08:11 |
tristan | What I don't want, is for plugin developers to have to know anything complicated: I don't even want plugin developers to really know Python | 08:11 |
tristan | It should be enough to just look at someones plugin, maybe graze the API reference (which we currently lack), and bang out a plugin that you need on demand | 08:12 |
benschubert | juergbi: we won't be able to use async without threads. However, without the thread API yes, is that what you meant with your separation? | 08:12 |
juergbi | I think threading gets closest to this | 08:12 |
tristan | So I'm quite against the idea of offloading any knowledge of async coroutines to plugins | 08:12 |
juergbi | benschubert: theoretically, everything could be based on async generators/coroutines (not counting native background threads) | 08:13 |
benschubert | we can have full async support without the plugins ever knowing, I'm confident of that | 08:13 |
benschubert | juergbi: gotcha | 08:13 |
juergbi | however, yes, my favorite right now is also full async support with threading without the plugins ever knowing | 08:13 |
tristan | Terminology melt down: We currently have full async support already I think | 08:14 |
juergbi | ok, let's add 'without Python subprocesses' | 08:14 |
juergbi | i.e., threads instead of Python subprocesses | 08:14 |
benschubert | tristan: using only 'asyncio' as the multitasking mechanism | 08:14 |
juergbi | and if this is really feasible, it would theoretically not even be a 2.0 blocker as the plugin API wouldn't change. although, would be nice to make such a change before 2.0 | 08:15 |
tristan | benschubert, Do you think job control will suffer ? | 08:15 |
tristan | benschubert, Can I at least have all my subprocesses go to sleep when I ^Z buildstream ? | 08:15 |
juergbi | I wouldn't expect any issues there with threads | 08:15 |
benschubert | tristan: if we are full async, I think we could achieve better performance than currently | 08:15 |
benschubert | because we'd have less signal passing around notably | 08:16 |
tristan | benschubert, better performance of job control ? you mean less buggy ? | 08:16 |
benschubert | both | 08:16 |
tristan | Quicker | 08:16 |
tristan | Quicker time-to-sleep/wakeup, sure | 08:16 |
tristan | Well, there's still going to be a timeout on killing processes which fail to respond to SIGTERM | 08:17 |
tristan | anyway | 08:17 |
tristan | Now regarding performance, I'm skeptical of threads | 08:17 |
tristan | We are saying that we are not processing intensive, but I'm not buying it | 08:17 |
juergbi | ah, implementation may not exactly be simpler as we'd have to suspend subprocesses of all threads from a single signal handler, I think | 08:17 |
tristan | If it were true, we would not have an initiative to subprocess the scheduler | 08:17 |
benschubert | tristan: which didn't pass the cut and was actually slower. Yes the UI updates are _really_ bad, which was "fixed" by not refreshing more than every 1sec | 08:18 |
benschubert | (actually user defined) | 08:18 |
tristan | We could revive that initiative, I would personally like to work on it, because I want to get rid of all that extra progress reporting codepaths and make sure it's all encapsulated in the mirrored State object in the scheduler process | 08:18 |
benschubert | and again this part was slow, because of writing to the cscreen | 08:18 |
tristan | Hmm | 08:19 |
juergbi | the job subprocesses do less work nowadays due to offloading to buildbox. however, we would require further optimization if we want to drop fork completely (instead of supporting it as less performant alternative for mac/windows) | 08:19 |
benschubert | > I want to get rid of all that extra progress reporting codepaths and make sure it's all encapsulated in the mirrored State object in the scheduler process | 08:19 |
benschubert | I'd love that, +1 definitely, not sure the process split is the best there | 08:19 |
tristan | Ok but what about the plugins which *do* process intensive stuff | 08:19 |
tristan | That stuff still exists and has it's place | 08:20 |
juergbi | e.g., 'virtual staging' still requires some Python CPU cycles. however, it should be possible to find an offload approach for that as well | 08:20 |
tristan | Anything around compose elements | 08:20 |
tristan | Anything obseriving and filtering files etc | 08:20 |
tristan | *observing | 08:20 |
juergbi | difficult to estimate the performance of this in real projects | 08:21 |
tristan | Even recursive dependency walks can be expensive and add up | 08:21 |
tristan | those will block other tasks from being dispatched | 08:21 |
benschubert | at that point, I think we should decide what "guarantees" on plugins isolation we want, and how much we are ready to change of public API(None is fine) | 08:21 |
benschubert | Then have a shot at a few ways we could optimize that and test it out with some real and benchmarking projects | 08:21 |
*** santi has joined #buildstream | 08:23 | |
benschubert | tristan: one way of optimizing the recursive dependency (has been in my mind for a while), would be to use a list (dependencies_list) instead of a generator when we know we're going to consume all (or the "happy case will do at least). That would remove a lot of context switch but means two separate implementations | 08:23 |
juergbi | do you see any potential issues in pursuing a dual approach as a starting point, i.e., support both, make fork opt-out on Linux? | 08:23 |
benschubert | juergbi: both being async and fork? | 08:23 |
juergbi | yes | 08:23 |
juergbi | keep both options open depending on performance | 08:23 |
tristan | Some things I would really want this conversation to conclude with are (1) Who is going to do this big refactor and when... (2) What are we going to do in order to immediately unblock !1901 ? | 08:23 |
benschubert | I think that we might be able to optimize more 'async' if we don't have to keep fork, but we could at least have a try on a branch | 08:24 |
juergbi | tristan: if we agree on this direction, this would still mean merging !1965 | 08:24 |
juergbi | as it's a completely different approach | 08:24 |
benschubert | and it would become unnecessary | 08:24 |
juergbi | but (1) is a valid question | 08:24 |
benschubert | so yeah, I think going forward with !196 is fine | 08:24 |
benschubert | depends on the time frame, I could probably have a go, but I'm pretty busy those days | 08:25 |
juergbi | a proof of concept prototype might not take that much time, may allow figuring out how much work is needed for a mergeable implementation | 08:26 |
tristan | Regarding protections and dangers of threading: I'm happy with knowing that if a plugin invokes private API illegally, they cause everything to break, otherwise if anything breaks: It's mostly our fault (or our responsibility to print a reasonable error about what the plugin obviously did wrong) | 08:27 |
juergbi | yes, I think that would be the goal | 08:27 |
tristan | Any updates to state in the internal data model should still be synchronized when jobs complete | 08:27 |
benschubert | yeah, definitely | 08:27 |
juergbi | just using public API the worst a plugin should be able to do is to be slower than necessary | 08:27 |
tristan | So we may need some internal wrangling for that, but I think plugins don't update state themselves (it's all our internal jobs which do that) | 08:28 |
benschubert | at least in the code path for jobs that's true | 08:28 |
benschubert | so we should be good | 08:28 |
tristan | Right, but in jobs we have some things like the job updating state in the data model before completion, passing that state over to the main process and then re-updating it | 08:28 |
tristan | Maybe not much to fix around there | 08:29 |
benschubert | yeah | 08:29 |
tristan | Ok | 08:30 |
tristan | benschubert, what I've gleaned from your previous comments regarding scheduler separation: I can already nuke the redundant code paths about state delivery in the scheduler | 08:30 |
tristan | Those are *really* getting on my nerves | 08:30 |
tristan | I was wanting to do that even if we had to revive the scheduler process separation | 08:31 |
benschubert | :D | 08:31 |
tristan | State was designed to do that in the first place, ok... | 08:32 |
tristan | Ok so, we don't really have a decision here do we ? | 08:32 |
tristan | Regarding subprocess vs threads | 08:32 |
benschubert | I think we should _try_ threads | 08:32 |
tristan | That's what decides !1965 | 08:32 |
tristan | Which blocks !1901 | 08:32 |
juergbi | I'd say agreement is pretty clear on merging !1965 | 08:32 |
benschubert | Yes | 08:33 |
juergbi | as nobody is fully happy with the current state of affairs | 08:33 |
tristan | Ok, lets do it | 08:33 |
tristan | And if we can also merge !1964 that will bring me one step closer to !1901 | 08:33 |
tristan | But we can do !1965 first | 08:34 |
tristan | benschubert, As you were going to reply to that mail anyway, would you like to announce that conclusion on the ML ? | 08:34 |
tristan | (and perhaps link to this IRC log) | 08:34 |
benschubert | sure, I'll do that today | 08:34 |
benschubert | ah | 08:34 |
benschubert | I though you meant another thread | 08:35 |
benschubert | but yeah I can | 08:35 |
tristan | I meant this one about nuking pickle jobbers :) | 08:35 |
tristan | There is a looming topic of artifact cache configurations which we can all afford to let simmer in the back of our minds for a while yet | 08:36 |
benschubert | yeah i need to answer that | 08:36 |
benschubert | on my plate is: | 08:37 |
benschubert | - answer this ML thread | 08:37 |
benschubert | - Answer the one about artifact cache | 08:37 |
benschubert | - Start one with plugins and testing (gaaah) | 08:37 |
benschubert | I should be able to do all of it by eow | 08:37 |
benschubert | and after that I could try a POC for threading if nobody picks it up before | 08:37 |
juergbi | sounds great | 08:40 |
tristan | benschubert, if you can do this ML thread first, I'm eager to land !1965 and !1964 | 08:40 |
benschubert | yeah I'll do that today | 08:40 |
benschubert | tristan: ^ | 08:40 |
* tristan thinks it's not a big mail, just wanted someone other than me to share the load of announcing conclusions of threads :) | 08:41 | |
benschubert | definitely | 08:41 |
*** traveltissues has joined #buildstream | 09:10 | |
traveltissues | any opinions on excluding tests directory from sast scans? https://gitlab.com/BuildStream/bst-plugins-experimental/-/merge_requests/119 | 09:11 |
traveltissues | "secrets" will fail while scanning tests/sources/ostree due to lacking a gpg-agent socket. | 09:12 |
WSalmon | is this a know issue i can update or do i need to make a new issue? https://hastebin.com/asibabicaf.sql | 09:12 |
traveltissues | i've seen this before WSalmon but i never diagnosed the actual issue. have you recompiled the cython components? | 09:13 |
WSalmon | traveltissues, as far as i can tell | 09:13 |
WSalmon | it only dose it for some elements in the project | 09:14 |
traveltissues | so is it an unhandled bad format? | 09:14 |
WSalmon | but they work fine in CI so i assume its a local funny | 09:14 |
WSalmon | traveltissues, https://hastebin.com/uzevemomij.md more context | 09:15 |
traveltissues | if it's only local i'd try recompiling/cleanign venv | 09:15 |
WSalmon | it looks like it can read all the yaml on the first pass | 09:15 |
traveltissues | yes, that's the same as what i was seeing | 09:15 |
WSalmon | which is supper weired | 09:16 |
traveltissues | this should be handled better either way | 09:16 |
traveltissues | imo | 09:16 |
traveltissues | probably worth making an issue | 09:16 |
WSalmon | will do, didnt want to make a duplicate if some one knew of one already | 09:17 |
benschubert | WSalmon: this is a bug in one of the plugins in bst-experimental, don't know why it's hitting now, one of the plugins uses ConfigParser | 09:18 |
WSalmon | ah | 09:19 |
WSalmon | thanks benschubert | 09:19 |
*** hasebastian has quit IRC | 09:22 | |
WSalmon | FYI benschubert traveltissues https://gitlab.com/BuildStream/buildstream/-/issues/1341 | 09:29 |
traveltissues | ty | 09:31 |
*** hasebastian has joined #buildstream | 09:38 | |
*** tristan has quit IRC | 10:07 | |
*** xjuan has joined #buildstream | 12:29 | |
*** douglaswinship has joined #buildstream | 13:45 | |
douglaswinship | window move 7 | 13:46 |
douglaswinship | (ooops, sorry. Forgot the slash) | 13:46 |
douglaswinship | Is there an introduction anywhere to how to write plugins / what the various functions inside a plugin actually do / what invokes them? | 14:05 |
douglaswinship | Everything I've found so far seems to be written on the assumption that the reader already knows most of it. | 14:05 |
juergbi | douglaswinship: have you seen the plugin API reference already? | 14:07 |
juergbi | https://docs.buildstream.build/master/core_framework.html | 14:07 |
juergbi | the plugin methods are described there. for element plugins: https://docs.buildstream.build/master/buildstream.element.html | 14:07 |
juergbi | for source plugins: https://docs.buildstream.build/master/buildstream.source.html | 14:07 |
douglaswinship | juergbi: I've seen some of the pages from it, certainly. I still got pretty lost when I looked at them. I'll see if reading through from the beginning helps. | 14:09 |
juergbi | douglaswinship: for some higher level info the architecture docs may be useful as well: https://docs.buildstream.build/master/arch_data_model.html# | 14:11 |
douglaswinship | juergbi: thanks! | 14:11 |
douglaswinship | I've never seen the architecture section before. I think i must have been looking at older versions of the docs | 14:12 |
*** chipb has quit IRC | 14:16 | |
*** chipb has joined #buildstream | 14:16 | |
*** xjuan has quit IRC | 14:16 | |
*** xjuan has joined #buildstream | 14:18 | |
*** tristan has joined #buildstream | 16:24 | |
*** ChanServ sets mode: +o tristan | 16:24 | |
*** samwilson has joined #buildstream | 16:54 | |
*** hasebastian has quit IRC | 17:39 | |
*** toscalix has joined #buildstream | 17:41 | |
*** samwilson has quit IRC | 17:46 | |
*** santi has quit IRC | 18:10 | |
*** toscalix has quit IRC | 18:30 | |
*** santi has joined #buildstream | 18:35 | |
*** santi has quit IRC | 18:37 | |
*** santi has joined #buildstream | 18:42 | |
*** santi has quit IRC | 18:44 | |
benschubert | tristan: any reason we explicitely suspend every child process one by one instead of getting the list of all children and pausing all of them? | 19:26 |
*** toscalix has joined #buildstream | 20:44 | |
*** philn has joined #buildstream | 20:52 | |
*** toscalix has quit IRC | 21:26 | |
*** xjuan has quit IRC | 22:43 | |
*** benschubert has quit IRC | 23:49 |
Generated by irclog2html.py 2.15.3 by Marius Gedminas - find it at mg.pov.lt!