*** narispo has quit IRC | 00:23 | |
*** narispo has joined #buildstream | 00:24 | |
*** narispo has quit IRC | 00:27 | |
*** dylan-m_ has quit IRC | 00:28 | |
*** narispo has joined #buildstream | 00:30 | |
*** dylan-m_ has joined #buildstream | 00:43 | |
*** narispo has quit IRC | 00:45 | |
*** narispo has joined #buildstream | 00:47 | |
*** rdale has quit IRC | 00:57 | |
*** narispo has quit IRC | 01:00 | |
*** narispo has joined #buildstream | 01:00 | |
*** narispo has quit IRC | 01:21 | |
*** narispo has joined #buildstream | 01:23 | |
*** narispo has quit IRC | 01:45 | |
*** narispo has joined #buildstream | 01:45 | |
*** paulsherwood has joined #buildstream | 02:08 | |
*** dylan-m_ has quit IRC | 02:51 | |
*** paulsherwood has quit IRC | 02:56 | |
*** paulsherwood has joined #buildstream | 02:56 | |
*** traveltissues has joined #buildstream | 08:12 | |
*** juanalday has joined #buildstream | 08:33 | |
*** rdale has joined #buildstream | 09:04 | |
*** tiagogomes has joined #buildstream | 09:42 | |
*** santi has joined #buildstream | 09:50 | |
*** phildawson_ has joined #buildstream | 10:05 | |
*** rdale has quit IRC | 10:22 | |
*** jonathanmaw has joined #buildstream | 10:25 | |
*** lachlan has joined #buildstream | 10:32 | |
*** lachlan has quit IRC | 11:01 | |
gitlab-br-bot | traveltissues opened issue #1193 (Split `Stream._load` for target types) on buildstream https://gitlab.com/BuildStream/buildstream/issues/1193 | 11:06 |
*** lachlan has joined #buildstream | 11:09 | |
*** lachlan has quit IRC | 11:20 | |
*** lachlan has joined #buildstream | 11:21 | |
*** lachlan has quit IRC | 11:31 | |
*** bochecha has joined #buildstream | 11:35 | |
*** lachlan has joined #buildstream | 11:51 | |
*** phildawson_ has quit IRC | 12:08 | |
*** phildawson_ has joined #buildstream | 12:09 | |
*** juanalday has quit IRC | 12:10 | |
*** akvilebirgelyte has joined #buildstream | 12:12 | |
*** lachlan has quit IRC | 12:19 | |
*** lachlan has joined #buildstream | 12:32 | |
*** lachlan has quit IRC | 12:41 | |
gitlab-br-bot | traveltissues opened issue #1194 (Remove `Element.__tracking_scheduled`) on buildstream https://gitlab.com/BuildStream/buildstream/issues/1194 | 12:46 |
gitlab-br-bot | traveltissues opened (was WIP) MR !1689 (traveltissues/1186-3->master: skip tracking elements without trackable sources) on buildstream https://gitlab.com/BuildStream/buildstream/merge_requests/1689 | 13:04 |
gitlab-br-bot | traveltissues opened issue #1195 (Resolve FIXME from "skip tracking elements without trackable sources") on buildstream https://gitlab.com/BuildStream/buildstream/issues/1195 | 13:08 |
tlater[m] | juergbi: The only files left in my test env that don't have g+r are artifact protos - I don't think they are ever touched by buildbox-casd, yet I still get permission denied errors; albeit on the client side now. | 13:15 |
juergbi | tlater[m]: I assume they are owned by the buildstream user? | 13:16 |
juergbi | seems odd | 13:16 |
juergbi | and they should indeed never be accessed by casd | 13:16 |
juergbi | only bst client and bst-artifact-server (for server repo) | 13:17 |
* tlater[m] tries to get his logs for this | 13:17 | |
tlater[m] | juergbi: https://hastebin.com/roxonujiyu.makefile | 13:18 |
* tlater[m] doesn't know where to debug from here, since nothing shows up in the artifact server logs | 13:18 | |
tlater[m] | And I can't track down exactly which permission is being denied | 13:19 |
tlater[m] | Since artifacts aren't handled by buildbox-casd, and casserver.py still has no logging in the commit, I suspect it's the artifact servicer, but it's difficult to get any reasonable output to confirm that :> | 13:19 |
tlater[m] | s/>|/ | 13:20 |
tlater[m] | Any ideas? | 13:20 |
tlater[m] | Oh, maybe I can print to sys.stderr | 13:22 |
*** juanalday has joined #buildstream | 13:22 | |
juergbi | tlater[m]: so casd from the bst client fails to download from bst-artifact-server | 13:26 |
juergbi | I don't see how this can be related to artifact protos | 13:27 |
tlater[m] | Neither do I tbh | 13:27 |
tlater[m] | Everything else is at least g+r, though | 13:27 |
juergbi | tlater[m]: is this with your bst-artifact-server changes merged in? | 13:28 |
juergbi | it could also be a permission denied due to utime | 13:29 |
juergbi | which requires write access | 13:29 |
tlater[m] | No | 13:29 |
tlater[m] | I have an early return before those | 13:30 |
tlater[m] | Unless... | 13:30 |
tlater[m] | Hm, I suppose we might be calling that more often | 13:30 |
tlater[m] | "Files specified but no files found", hm, well, at least it's progress | 13:34 |
tlater[m] | I guess that means the pull succeeded | 13:35 |
tlater[m] | Oh, no, it just fails earlier | 13:35 |
* tlater[m] probably broke something | 13:35 | |
*** santi has quit IRC | 13:43 | |
*** santi has joined #buildstream | 13:56 | |
*** traveltissues has quit IRC | 14:00 | |
*** traveltissues has joined #buildstream | 14:03 | |
benschubert | juergbi, tlater[m] : Does one of you know of any case where the main process we have would receive a SIGTERM if it wasn't sent by outside? Or why the event loop we run would suddenly call the SIGTERM handler? (Or anybody else?) | 14:21 |
juergbi | benschubert: for the main process? hm, no, can't think of a reason | 14:21 |
juergbi | benschubert: I've looked into an issue where a SIGTERM was processed with a delay, though, due to the way asyncio works | 14:22 |
juergbi | and this actually resulted in double processing of SIGTERM, once by regular Python handle and once by asyncio | 14:22 |
*** lachlan has joined #buildstream | 14:25 | |
gitlab-br-bot | frazerleslieclews opened issue #1196 (flake8) on buildstream https://gitlab.com/BuildStream/buildstream/issues/1196 | 14:26 |
tlater[m] | benschubert: I don't know of any cases besides that either | 14:26 |
juergbi | benschubert: to elaborate, asyncio uses Python's wakeup_fd signal handler, which writes the signal number into an asyncio socket. Scheduler._disconnect_signals() will not disable that (because the child watcher is still active and needs it), which means that signals received while the loop is suspended between _disconnect_signals() and _connect_signals() will be passed to the handlers registered in _connect_signals() | 14:29 |
juergbi | independent of whether any other regular python signal handlers are active while the loop is suspended or not | 14:29 |
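The wakeup_fd mechanism described above can be observed in isolation. The following is a minimal sketch, not BuildStream or asyncio code: a socket pair stands in for the one asyncio keeps internally, and `wakeup_fd_demo` is a hypothetical name. It assumes a POSIX system and that it is called from the main thread, since `signal.set_wakeup_fd()` and `signal.signal()` only work there.

```python
import os
import signal
import socket


def wakeup_fd_demo():
    """Return the bytes Python writes to the wakeup fd for one SIGUSR1."""
    # Stand-in for the socket pair an asyncio loop keeps for itself.
    reader, writer = socket.socketpair()
    reader.settimeout(5.0)
    # The wakeup fd must be non-blocking.
    writer.setblocking(False)

    # A Python-level handler is needed; otherwise the default action of
    # SIGUSR1 (terminate the process) would apply.
    old_handler = signal.signal(signal.SIGUSR1, lambda signum, frame: None)
    old_fd = signal.set_wakeup_fd(writer.fileno())
    try:
        os.kill(os.getpid(), signal.SIGUSR1)
        # The signal number arrives as a single byte on the socket.
        return reader.recv(16)
    finally:
        # Restore the previous state so the demo has no side effects.
        signal.set_wakeup_fd(old_fd)
        signal.signal(signal.SIGUSR1, old_handler)
        reader.close()
        writer.close()
```

The byte is written by Python's low-level handler regardless of what the Python-level handler does, which is why anything still reading the other end of the socket sees the signal.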
gitlab-br-bot | tlater closed issue #1196 (flake8) on buildstream https://gitlab.com/BuildStream/buildstream/issues/1196 | 14:35 |
*** lachlan has quit IRC | 14:36 | |
benschubert | juergbi: ok my problem is that when I do CTRL-C twice, I end up with having roughly every .5s the event loop calling the SIGTERM handler, but I really can't find why we would. Tracing the process under strace -tt -f doesn't show any sigterm received on the main process either | 14:39 |
*** traveltissues has quit IRC | 14:39 | |
juergbi | hm, every .5s, that's really odd | 14:41 |
juergbi | you could check strace output for the signal number being written to the socket by the wakeup_fd handler | 14:41 |
juergbi | i.e., is this happening multiple times as well or not | 14:41 |
juergbi | maybe somehow child processes write to the socket that belongs to the parent's asyncio loop | 14:42 |
benschubert | oh interesting let me try that | 14:45 |
*** lachlan has joined #buildstream | 15:07 | |
benschubert | juergbi: write(12<socket:[24613]>, "\17", 1) = 1 I get quite a lot of things like that and not 100% sure how to interpret the "\17" | 15:08 |
juergbi | benschubert: that's 0o17 = 15 = SIGTERM | 15:08 |
benschubert | oh -_-' | 15:08 |
juergbi | so that would likely be Python's wakeup_fd signal handler | 15:08 |
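For reference, the conversion from the strace escape to a signal name can be done mechanically. This is a small sketch; `decode_strace_byte` is a hypothetical helper name, and the signal name lookup assumes a platform where signal 15 is SIGTERM (as on Linux).

```python
import signal


def decode_strace_byte(escaped):
    """Turn an strace octal escape such as r"\17" into (number, name)."""
    # strace prints non-printable bytes as backslash plus octal digits.
    number = int(escaped.lstrip("\\"), 8)
    return number, signal.Signals(number).name
```

With the write seen above, `decode_strace_byte(r"\17")` gives 15 and the name SIGTERM.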
benschubert | so by forking we keep it | 15:09 |
benschubert | interesting | 15:09 |
juergbi | do you see these writes from multiple processes? | 15:09 |
benschubert | I was seeing this from one of the child process | 15:09 |
benschubert | let me check if I can see similar reads on the main one | 15:09 |
juergbi | benschubert: you could try signal.set_wakeup_fd(-1) early on in the child process | 15:09 |
juergbi | as workaround | 15:09 |
benschubert | yup I can see similar reads | 15:11 |
benschubert | I'm really surprised by this | 15:13 |
benschubert | does that mean our main process will actually be reading every signal sent to the children? | 15:13 |
benschubert | ok, set_wakeup_fd(-1) does remove this problem | 15:14 |
benschubert | juergbi: do you see any problems that could happen if we disable the child process wakeup_fd? | 15:14 |
juergbi | benschubert: no, they should really not write to that socket | 15:16 |
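The effect of the workaround can be reproduced outside BuildStream. The sketch below is an illustration under assumptions, not the actual fix: it forks a child that inherits the parent's wakeup fd, lets the child signal itself, and reports whether the parent's socket received the signal number. `child_signal_reaches_parent` is a hypothetical name; the code assumes a POSIX system and a call from the main thread.

```python
import os
import signal
import socket


def child_signal_reaches_parent(reset_in_child):
    """Fork a child that signals itself; report whether the parent sees it."""
    reader, writer = socket.socketpair()
    reader.setblocking(False)
    writer.setblocking(False)
    old_handler = signal.signal(signal.SIGUSR1, lambda signum, frame: None)
    old_fd = signal.set_wakeup_fd(writer.fileno())
    try:
        pid = os.fork()
        if pid == 0:
            # Child: the handler and the wakeup fd are both inherited.
            try:
                if reset_in_child:
                    # The workaround discussed above.
                    signal.set_wakeup_fd(-1)
                os.kill(os.getpid(), signal.SIGUSR1)
            finally:
                os._exit(0)
        # Parent: wait for the child, then look at the shared socket.
        os.waitpid(pid, 0)
        try:
            data = reader.recv(16)
        except BlockingIOError:
            data = b""
        return signal.SIGUSR1 in data
    finally:
        signal.set_wakeup_fd(old_fd)
        signal.signal(signal.SIGUSR1, old_handler)
        reader.close()
        writer.close()
```

Without the reset, the child's signal shows up on the parent's socket; with it, nothing is written. The inherited socket itself stays open in the child either way, which is the part the workaround does not address. One possible place to hook such cleanup is `os.register_at_fork(after_in_child=...)`, though whether that fits here is an open question.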
benschubert | ok! I'll fix this and cleanup my PR | 15:16 |
juergbi | I'm just wondering whether there is a more elegant way to essentially tell asyncio it should clean up on fork | 15:16 |
benschubert | good point | 15:16 |
benschubert | let me check :) | 15:16 |
juergbi | it should ideally close the socket, not just not write to it | 15:16 |
*** lachlan has quit IRC | 15:16 | |
benschubert | https://bugs.python.org/issue21998 | 15:16 |
juergbi | maybe there is some kind of asyncio shutdown function? | 15:17 |
benschubert | https://bugs.python.org/issue21998#msg325820 that's bad | 15:19 |
benschubert | tldr: fork is not supported when you have an event loop | 15:23 |
coldtom | hi, has anyone else experienced a *lot* of GRPC errors while using master? | 15:26 |
tpollard | benschubert: ..... | 15:27 |
benschubert | tpollard: yes... I'm not joking | 15:27 |
benschubert | I think I can get around it | 15:27 |
benschubert | but gosh -_-' | 15:27 |
*** lachlan has joined #buildstream | 15:30 | |
benschubert | juergbi, tpollard: I think we have two choices: | 15:43 |
benschubert | 1) set wakeup_fd to -1, which means we will have a copy of the master event loop still running in each child, this is error prone and if someone uses an aioloop without creating one explicitly will actually be running things from the master process (potentially) | 15:43 |
benschubert | 2) Reimplement an event loop/Process that cleanly shuts off the event loop as we fork, which will add overhead to each call but is the cleanest | 15:43 |
benschubert | I tend to prefer 2 even though that's more work. Do you have any preference? | 15:43 |
juergbi | benschubert: reimplement as in not using asyncio at all? or what exactly would we reimplement? | 15:58 |
benschubert | ah, implement another event loop that handles this nicely (probably just helper methods called as soon as the new process is called) | 15:59 |
benschubert | so closing the necessary fds and cleaning up the memory | 16:00 |
benschubert | I think 2 is much cleaner in the end, we'll need to benchmark it however | 16:00 |
juergbi | benschubert: but still using some asyncio infrastructure, right? | 16:00 |
benschubert | ah yeah | 16:01 |
benschubert | I don't want to give up on that :) | 16:01 |
juergbi | :) | 16:01 |
juergbi | if that's achievable with reasonable effort, that sounds good to me | 16:01 |
benschubert | ok! I'll see what I can do | 16:01 |
benschubert | thanks a lot for the debugging help | 16:01 |
juergbi | benschubert: maybe a custom policy like https://bugs.python.org/issue21998#msg313196 plus a fork handler would suffice | 16:02 |
benschubert | yep | 16:03 |
tpollard | is 2 compatible with forking the 'backend' into a subprocess? | 16:04 |
tlater[m] | juergbi: buildbox-casd tells me it's created a socket and is listening on it but buildstream tells me it times out waiting for the socket :( | 16:11 |
tlater[m] | The socket definitely doesn't come into existence | 16:11 |
tlater[m] | Have you seen anything like that? | 16:11 |
juergbi | tlater[m]: is it possible this is happening due to the symlink approach leading to two different temp directories pointing to the same cas directory in a test? | 16:11 |
benschubert | tpollard: the backend would have the loop so that should be transparent | 16:11 |
tlater[m] | juergbi: I wouldn't see how, but perhaps if I revert that change for a minute... | 16:13 |
*** lachlan has quit IRC | 16:17 | |
tlater[m] | Yup, that fixes the problem | 16:19 |
tlater[m] | I suppose a solution here would be to randomize the socket name? | 16:20 |
juergbi | tlater[m]: yes, that would retain the current behavior when running multiple bst sessions with the same cache directory | 16:24 |
juergbi | we don't really support it, however, making sure that each bst process talks to its own casd process is still much better than having it talk to a random casd process | 16:24 |
juergbi | (whose lifetime may be very limited) | 16:24 |
juergbi | and it fixes tests ;) | 16:24 |
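One way to randomize the socket location, so that two sessions sharing a cache directory never pick the same path, is to give each session its own freshly created directory. This is only a sketch of the idea: `make_private_socket_path`, the `casd-` prefix and the `casd.sock` file name are all hypothetical and not taken from BuildStream.

```python
import os
import tempfile


def make_private_socket_path(base_dir):
    """Return a socket path inside a new, uniquely named directory."""
    # mkdtemp creates the directory with a random suffix and with
    # permissions restricted to the current user.
    session_dir = tempfile.mkdtemp(prefix="casd-", dir=base_dir)
    return os.path.join(session_dir, "casd.sock")
```

Unix socket paths have a short length limit (around 108 bytes on Linux), so a deep `base_dir` may need a shorter location in practice.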
*** dylan-m_ has joined #buildstream | 16:30 | |
*** santi has quit IRC | 16:44 | |
*** lachlan has joined #buildstream | 16:49 | |
*** narispo has quit IRC | 16:59 | |
*** santi has joined #buildstream | 17:03 | |
*** narispo has joined #buildstream | 17:03 | |
*** lachlan has quit IRC | 17:11 | |
*** narispo has quit IRC | 17:14 | |
*** narispo has joined #buildstream | 17:15 | |
*** jonathanmaw has quit IRC | 17:21 | |
*** tpollard has quit IRC | 17:29 | |
*** dtf has joined #buildstream | 17:33 | |
*** dylan-m_ is now known as dylanm | 17:35 | |
*** dylanm has quit IRC | 17:35 | |
*** dylan-m_ has joined #buildstream | 17:36 | |
*** juanalday has quit IRC | 17:36 | |
*** lachlan has joined #buildstream | 17:49 | |
*** tiagogomes has quit IRC | 17:49 | |
*** lachlan has quit IRC | 17:53 | |
*** narispo has quit IRC | 17:58 | |
*** narispo has joined #buildstream | 17:58 | |
*** santi has quit IRC | 18:01 | |
*** santi has joined #buildstream | 18:01 | |
*** santi has quit IRC | 18:04 | |
*** slaf_ has joined #buildstream | 18:28 | |
*** slaf_ has joined #buildstream | 18:28 | |
*** slaf_ has joined #buildstream | 18:29 | |
*** slaf_ has joined #buildstream | 18:29 | |
*** slaf has quit IRC | 18:30 | |
*** slaf_ is now known as slaf | 18:30 | |
*** dylan-m_ has quit IRC | 18:33 | |
*** narispo has quit IRC | 19:02 | |
*** narispo has joined #buildstream | 19:04 | |
*** dylan-m_ has joined #buildstream | 20:13 | |
*** dylan-m_ has quit IRC | 20:20 | |
*** benschubert has quit IRC | 21:19 | |
*** dylan-m_ has joined #buildstream | 21:47 | |
*** dylan-m__ has joined #buildstream | 23:25 | |
*** dylan-m_ has quit IRC | 23:25 |