*** slaf has joined #buildstream | 00:08 | |
*** slaf has joined #buildstream | 00:09 | |
*** traveltissues has joined #buildstream | 09:19 | |
*** jonathanmaw has joined #buildstream | 09:26 | |
*** santi has joined #buildstream | 09:27 | |
*** tiagogomes has joined #buildstream | 09:40 | |
*** bochecha has joined #buildstream | 10:00 | |
*** phildawson_ has joined #buildstream | 10:04 | |
*** lachlan has joined #buildstream | 10:29 | |
*** lachlan has quit IRC | 10:58 | |
*** santi has quit IRC | 11:16 | |
*** santi has joined #buildstream | 11:24 | |
*** lachlan has joined #buildstream | 11:24 | |
*** lachlan has quit IRC | 11:54 | |
*** phildawson_ has quit IRC | 12:06 | |
*** phildawson_ has joined #buildstream | 12:07 | |
*** dtf has joined #buildstream | 12:22 | |
gitlab-br-bot | traveltissues opened issue #1183 (automatic cache key updating is not compatible with !1651) on buildstream https://gitlab.com/BuildStream/buildstream/issues/1183 | 12:28 |
gitlab-br-bot | traveltissues opened (was WIP) MR !1651 (traveltissues/1161->master: extend source api and remove private use from workspace plugin) on buildstream https://gitlab.com/BuildStream/buildstream/merge_requests/1651 | 12:36 |
gitlab-br-bot | cs-shadow opened issue #1184 (Remote Execution tests produce too much logs) on buildstream https://gitlab.com/BuildStream/buildstream/issues/1184 | 12:42 |
*** phildawson_ has quit IRC | 13:03 | |
*** lachlan has joined #buildstream | 13:14 | |
gitlab-br-bot | traveltissues closed issue #1183 (automatic cache key updating is not compatible with !1651) on buildstream https://gitlab.com/BuildStream/buildstream/issues/1183 | 13:31 |
gitlab-br-bot | traveltissues opened (was WIP) MR !1651 (traveltissues/1161->master: extend source api and remove private use from workspace plugin) on buildstream https://gitlab.com/BuildStream/buildstream/merge_requests/1651 | 13:32 |
tlater[m] | juergbi: Hrm, we have a test that tries to spam the remote server with lots of small files | 13:41 |
juergbi | yes | 13:42 |
tlater[m] | It's either hanging or taking more than 20 minutes to process with the buildbox-casd proxying | 13:42 |
juergbi | hm, the proxying is supposed to reduce overhead | 13:42 |
juergbi | could potentially be a message size issue | 13:43 |
juergbi | (was the motivation for the test) | 13:43 |
tlater[m] | juergbi: Specifically, we never seem to make it past FindMissingBlobs | 13:43 |
tlater[m] | buildbox-casd seems to get a lot of Write requests, that's all I can see | 13:43 |
tlater[m] | What could happen to the message size? | 13:44 |
* tlater[m] just literally forwards the message, and sets the same message size constraints BuildStream normally does | 13:45 | |
tlater[m] | Curiously the casserver thinks FindMissingBlobs() completes, but the client doesn't. | 13:46 |
*** phildawson_ has joined #buildstream | 13:59 | |
*** santi has quit IRC | 14:06 | |
juergbi | tlater[m]: is this when proxying FindMissingBlobs() requests that bst-artifact-server receives to casd? | 14:08 |
juergbi | or when issuing FindMissingBlobs() as part of the artifact timestamp update code? | 14:08 |
tlater[m] | juergbi: This is part of proxying them when bst-artifact-server receives them | 14:09 |
* tlater[m] isn't sure why he never ran into this before | 14:09 | |
tlater[m] | Hm, worth checking if reverting things causes issues, this may be an issue with master? | 14:09 |
tlater[m] | buildbox-casd master that is | 14:10 |
juergbi | I've very recently tested buildbox-casd master against the test suite | 14:10 |
tlater[m] | Yes, but not proxied buildbox-casd | 14:11 |
tlater[m] | Yep, looks like it has broken between buildbox-casd versions | 14:13 |
* tlater[m] should check the old version, just to be sure | 14:13 | |
tlater[m] | Ah, no, just as I say that the test finishes | 14:14 |
gitlab-br-bot | BenjaminSchubert approved MR !1657 (aevri/enable_spawn_ci_5->master: job pickling: also pickle global state in node.pyx) on buildstream https://gitlab.com/BuildStream/buildstream/merge_requests/1657 | 14:14 |
* tlater[m] wonders if it's because he set LogLevel.TRACE | 14:21 | |
tlater[m] | Maybe that just makes buildbox-casd significantly slower | 14:21 |
benschubert | tlater[m]: I would be very surprised about that :) | 14:22 |
*** santi has joined #buildstream | 14:22 | |
tlater[m] | So would I! | 14:22 |
tlater[m] | But it's the only notable change I can see | 14:22 |
tlater[m] | Oh, yes | 14:22 |
tlater[m] | It is that | 14:22 |
tlater[m] | Wow | 14:22 |
benschubert | wut? | 14:23 |
tlater[m] | My guess is that it just spams so many files that the logging overhead becomes severe | 14:23 |
tlater[m] | Since every Write request is logged with an individual message | 14:23 |
tlater[m] | \o/ my tests pass now | 14:29 |
tlater[m] | So yeah, TRACE log level in buildbox-casd is very slow | 14:30 |
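The slowdown described above (one log message per Write request, multiplied by a very large number of small files) can be illustrated with a minimal sketch. This is a Python stand-in, not buildbox-casd's actual logging code: the `TRACE` level number, the logger names, and the `handle_writes` and `count_log_lines` helpers are all hypothetical and exist only for this example.

```python
import io
import logging

# Hypothetical TRACE level below DEBUG (10); Python has no built-in TRACE.
TRACE = 5
logging.addLevelName(TRACE, "TRACE")


def handle_writes(logger, n_blobs):
    """Pretend to serve n_blobs Write requests, logging each one at TRACE."""
    for i in range(n_blobs):
        # The guard keeps the disabled case cheap: no formatting, no I/O.
        if logger.isEnabledFor(TRACE):
            logger.log(TRACE, "Write request for blob %d", i)


def count_log_lines(level, n_blobs):
    """Run the fake workload at the given level and count emitted lines."""
    stream = io.StringIO()
    logger = logging.getLogger(f"casd-sketch-{level}")
    logger.handlers.clear()
    logger.propagate = False
    logger.setLevel(level)
    logger.addHandler(logging.StreamHandler(stream))
    handle_writes(logger, n_blobs)
    return len(stream.getvalue().splitlines())
```

With the level set to `TRACE`, every simulated request produces one formatted line, so the logging work grows linearly with the number of blobs; at `logging.INFO` the same workload emits nothing. Whether that alone explains a 20 minute run depends on the real logging backend, which this sketch does not model.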
gitlab-br-bot | traveltissues closed issue #1181 (resolve workspaces stages only once test failure for multiprocessing run) on buildstream https://gitlab.com/BuildStream/buildstream/issues/1181 | 15:12 |
*** santi has quit IRC | 15:42 | |
gitlab-br-bot | marge-bot123 merged MR !1657 (aevri/enable_spawn_ci_5->master: job pickling: also pickle global state in node.pyx) on buildstream https://gitlab.com/BuildStream/buildstream/merge_requests/1657 | 15:47 |
*** lachlan has quit IRC | 15:56 | |
gitlab-br-bot | aevri opened (was WIP) MR !1663 (aevri/enable_spawn_ci_6->master: Fix remaining spawn unit test breaks under 'tests/') on buildstream https://gitlab.com/BuildStream/buildstream/merge_requests/1663 | 16:03 |
gitlab-br-bot | tpollard opened issue #1185 (Build does not exit gracefully on a second CTRL-C) on buildstream https://gitlab.com/BuildStream/buildstream/issues/1185 | 16:13 |
*** phil has joined #buildstream | 16:16 | |
*** phildawson_ has quit IRC | 16:17 | |
tlater[m] | juergbi: buildbox-casd uses the same old LRU expiry approach, right? | 16:18 |
* tlater[m] sees an artifact that is not LRU being expired and is very confused | 16:18 | |
tlater[m] | That is, not LRU on the fs level | 16:19 |
juergbi | tlater[m]: it uses blob-based LRU expiry | 16:19 |
juergbi | i.e., like bst-artifact-server before the switch to casd, but not like the old bst client artifact-based expiry | 16:19 |
tlater[m] | juergbi: Does it delete the actual artifact protos? | 16:20 |
tlater[m] | I think I get that, but I'm trying to figure out why my artifact disappears | 16:21 |
tlater[m] | Does it, besides removing the least recently used blobs, also delete any artifacts that refer to them? | 16:21 |
tlater[m] | Or is that done by the ArtifactServicer? | 16:21 |
juergbi | tlater[m]: there is no artifact proto expiry at all yet | 16:22 |
juergbi | I think there is a bug about it but I don't think this has been implemented yet | 16:22 |
tlater[m] | Hehe, so why is that proto disappearing :D | 16:23 |
* tlater[m] will try to figure that out then | 16:23 | |
*** lachlan has joined #buildstream | 16:45 | |
*** lachlan has quit IRC | 16:50 | |
tlater[m] | juergbi: I'm stuck - it looks like FetchTree is simply not updating my mtimes | 17:04 |
tlater[m] | If/when you have time, is there anything obviously wrong you can spot in https://gitlab.com/BuildStream/buildstream/merge_requests/1645/diffs?commit_id=35fb6969f474c32b69df3270a8aae1131c65830b ? | 17:04 |
gitlab-br-bot | aevri opened (was WIP) MR !1674 (aevri/fuse_mount_private->master: _fuse/mount: make mount() and unmount() private) on buildstream https://gitlab.com/BuildStream/buildstream/merge_requests/1674 | 17:06 |
gitlab-br-bot | aevri opened (was WIP) MR !1673 (aevri/testutils_artifactshare->master: tests/artifactshare: safer cleanup_on_sigterm use) on buildstream https://gitlab.com/BuildStream/buildstream/merge_requests/1673 | 17:07 |
juergbi | tlater[m]: I commented on an issue with the old reference service but I assume you're testing with the artifact service | 17:07 |
juergbi | it's about artifact.files, I presume | 17:07 |
juergbi | I don't see an issue at a quick glance | 17:08 |
tlater[m] | I suppose I'll need to start stracing buildbox-casd then | 17:09 |
*** santi has joined #buildstream | 17:09 | |
* tlater[m] doesn't see why it wouldn't update mtimes, but yet some blobs are removed despite their parent artifacts not being marked as outdated | 17:09 | |
*** santi has quit IRC | 17:12 | |
juergbi | tlater[m]: you see this as a failure in one of our expiry tests, or is this manual testing? | 17:25 |
tlater[m] | juergbi: It's in frontend/push.py::test_recently_pulled_artifact_does_not_expire | 17:25 |
tlater[m] | I've played a little with the cache quota sizes to make sure it's a proper failure | 17:26 |
tlater[m] | And I've dug into the buildbox-casd logs, too | 17:26 |
juergbi | ok | 17:26 |
juergbi | I assume you've checked we're passing the right quota to casd? | 17:26 |
tlater[m] | yeah, buildbox-casd handily reports the cache quota early on | 17:27 |
tlater[m] | juergbi: For reference, here are the logs: https://hastebin.com/rodedorepo.sql | 17:27 |
tlater[m] | What should be happening is that the cache usage drops to ~5M | 17:28 |
tlater[m] | Err, ~10M that is | 17:28 |
tlater[m] | For element1 and element3 to fit | 17:28 |
tlater[m] | But that's probably hard to read from those logs :) | 17:28 |
*** santi has joined #buildstream | 17:33 | |
juergbi | tlater[m]: shouldn't it use a 22 MB quota, not a 25 MB one for this test? | 17:38 |
juergbi | or rather, instead of the 24 MB one | 17:38 |
tlater[m] | juergbi: Yes, that's my experimentation to push that higher in case the lower quota trips | 17:38 |
tlater[m] | I have two logs in there, one with 24M, one with 25M | 17:39 |
juergbi | ah ok | 17:39 |
juergbi | tlater[m]: I assume it deletes element1 and element2 instead of just element2? | 17:41 |
tlater[m] | Yes | 17:42 |
tlater[m] | The assertions for element 3 and 2 pass, but it then fails with the final assertion | 17:42 |
tlater[m] | I've traced all the way into the assertion, and it's a FileNotFoundError in the code that reads `artifact.files`' blobs | 17:43 |
tlater[m] | So it's pretty clear that the mtime setting isn't working | 17:43 |
juergbi | tlater[m]: ah, you need to set fetch_file_blobs in the FetchTreeRequest | 17:47 |
juergbi | otherwise only the directories will be covered | 17:47 |
tlater[m] | Oh! | 17:48 |
tlater[m] | Hm, I wonder if that instead means I need to call it twice, actually | 17:48 |
tlater[m] | But yes, that makes sense | 17:49 |
* tlater[m] was wondering if it was an argument he was missing, but dismissed that as unimportant o\ | 17:49 | |
tlater[m] | Ta juergbi, really appreciate it. | 17:49 |
tlater[m] | Ugh, I've been banging my head against this for hours :| | 17:49 |
juergbi | directory objects are always covered | 17:50 |
juergbi | files are optional | 17:50 |
juergbi | only one call required | 17:50 |
juergbi | I also missed it when first looking at your code | 17:50 |
tlater[m] | The doc should probably be changed then :) | 17:51 |
tlater[m] | Eh, easier to miss in review than when actually coding it | 17:51 |
juergbi | yes, I didn't look at the proto at that point | 17:52 |
* tlater[m] almost wants to try writing an artifact-server testing binary to really drive the protocol into his brain | 17:52 | |
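The failure discussed above (blob-based LRU expiry, plus a FetchTree call that only refreshes directory objects unless `fetch_file_blobs` is set) can be reproduced in a toy model. This is a sketch under stated assumptions, not buildbox-casd's implementation: `ToyCas`, `run_scenario`, the digest names and the sizes are all hypothetical, and a counter stands in for on-disk mtimes.

```python
import itertools


class ToyCas:
    """Toy blob store with a size quota and least-recently-used expiry."""

    def __init__(self, quota):
        self.quota = quota
        self.blobs = {}  # digest -> [size, last_used_tick]
        self.clock = itertools.count()  # stand-in for file mtimes

    def add(self, digest, size):
        self.blobs[digest] = [size, next(self.clock)]
        self.expire()

    def touch(self, digest):
        self.blobs[digest][1] = next(self.clock)

    def fetch_tree(self, dir_digests, file_digests, fetch_file_blobs=False):
        # Directory objects are always covered; file blobs are optional.
        for digest in dir_digests:
            self.touch(digest)
        if fetch_file_blobs:
            for digest in file_digests:
                self.touch(digest)

    def usage(self):
        return sum(size for size, _ in self.blobs.values())

    def expire(self):
        # Drop least recently used blobs until usage fits the quota.
        while self.usage() > self.quota:
            oldest = min(self.blobs, key=lambda d: self.blobs[d][1])
            del self.blobs[oldest]


def run_scenario(fetch_file_blobs):
    """Add element1 and element2, re-fetch element1, then add element3."""
    cas = ToyCas(quota=22)
    cas.add("e1-dir", 1)
    cas.add("e1-files", 10)
    cas.add("e2-dir", 1)
    cas.add("e2-files", 10)
    cas.fetch_tree(["e1-dir"], ["e1-files"], fetch_file_blobs=fetch_file_blobs)
    cas.add("e3-dir", 1)
    cas.add("e3-files", 10)
    return set(cas.blobs)
```

Calling `run_scenario(False)` leaves element1's directory object in place but expires its file blob, which matches the FileNotFoundError seen when reading `artifact.files`; calling `run_scenario(True)` keeps element1 intact and expires element2 instead. A single call with the flag set is enough, as noted in the conversation.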
*** lachlan has joined #buildstream | 17:55 | |
*** lachlan has quit IRC | 17:57 | |
*** traveltissues has quit IRC | 18:02 | |
*** jonathanmaw has quit IRC | 18:04 | |
*** santi has quit IRC | 18:06 | |
*** phil has quit IRC | 18:18 | |
*** rdale has quit IRC | 18:20 | |
*** benschubert has quit IRC | 21:26 | |
*** narispo has quit IRC | 22:42 | |
*** narispo has joined #buildstream | 22:43 | |
*** narispo has quit IRC | 23:12 | |
*** narispo has joined #buildstream | 23:12 |