IRC logs for #buildstream for Wednesday, 2020-07-01

*** philn has quit IRC00:07
*** phildawson has quit IRC02:28
*** phildawson has joined #buildstream02:28
*** tristan has quit IRC05:44
*** tristan_ has joined #buildstream05:49
*** ChanServ sets mode: +o tristan_05:49
*** benschubert has joined #buildstream07:13
tristan_https://bpa.st/KFWQ07:47
tristan_benschubert, ^^^^07:47
tristan_And better performance on the way, possibly equal07:47
benschubertoh nice! That's a really good output07:48
tristan_So interestingly, constructors in cython all want tuples07:59
tristan_But method calls don't07:59
tristan_I'm thinking, having constructors with no arguments, and a separate init() method, is cheaper than tuplizing your constructor arguments08:00
benschubertit's probably worth a try, though I'd bet that if cython does it like that, they have a good reason :)08:01
*** dftxbs3e has joined #buildstream08:07
tristan_Well, it will decrease yellowness08:08
benschubertas I said, worth a shot! For Cython I realized that benchmarking was actually the only way of knowing if I was improving the code or not :'D08:09
tristan_Yeah, I think that I have one more avenue though08:09
tristan_Remove the lists08:09
tristan_That appears expensive, now that I have ResolutionStep, I should be able to just link the steps and avoid append()/pop()08:10
benschubertok :)08:25
benschubertGah, for the threaded scheduler, I now have all tests passing when running pytest directly, but most fail when running through tox -_-'08:26
tristan_Myeah08:31
juergbibenschubert: due to tox running with -n? or independent of -n?08:43
*** santi has joined #buildstream08:49
tristan_benschubert, Can you run another benchmark on the latest update ?08:58
* tristan_ will remove the lists now09:00
tristan_benschubert, In fact, the last three commits on the tree (at this moment) are interesting to benchmark09:22
tristan_benschubert, The third last has an algorithm change for circular deps which very noticeably improves performance, the last two commits are (A): Remove yellowness, by avoiding tuplizing constructor arguments... (B) Remove lists in Variables._resolve(), by adding ->prev pointers to ResolutionStep, and adding a ValueLink object to use instead of a python list of deps09:24
tristan_list removal being the tip of the branch09:24
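A minimal sketch of the list-removal idea described above, assuming nothing about the real `Variables._resolve()` code: each `ResolutionStep` carries a `prev` link, so "push" is an allocation and "pop" is a pointer move. The field `name` and the helpers `chain_names()` and `in_chain()` are hypothetical, added only for illustration.

```python
class ResolutionStep:
    """One step of an in-progress resolution, linked to the step before it."""

    __slots__ = ("name", "prev")

    def __init__(self, name, prev=None):
        self.name = name  # hypothetical payload: the variable being resolved
        self.prev = prev  # plays the role of the ->prev pointer


def chain_names(step):
    """Walk the prev links from the newest step back to the oldest."""
    names = []
    while step is not None:
        names.append(step.name)
        step = step.prev
    return names


def in_chain(step, name):
    """Return True if `name` is already being resolved (a circular reference)."""
    while step is not None:
        if step.name == name:
            return True
        step = step.prev
    return False


# "Push": allocate a step that points at the current top of the chain.
top = None
for variable in ("a", "b", "c"):
    top = ResolutionStep(variable, top)

# "Pop": move the top pointer back one link; no list append()/pop() involved.
popped = top
top = top.prev
```

Whether this beats a Python list in practice is exactly the kind of thing the benchmarks in this conversation are meant to settle.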
benschuberttristan_: sure will do, starting now :)09:28
benschubertjuergbi: independent of -n09:28
benschubertpytest -n 10 works perfectly well09:28
benschubertI'll start looking in the env variables and see if I can reproduce09:29
benschubertbut it's annoying :/09:29
benschubertOtherwise I've got profiles now, and it seems like it might be just that we overwhelm the asyncio loop, so some cleanup might help well. Nothing too weird09:31
benschuberttristan_: benchmark running. Will let you know when it's done.09:34
robjhmy workflow; disable push to remote cache, build, fail? fix. success? reenable push, delete local cache and build again.09:54
robjhis there an easier way to make it not push failures?09:54
benschubertrobjh: I don't think so. You also do not need to delete the local cache in case of success, you could "just" rebuild / push directly, which would already be slightly better09:59
benschubert--no-push-failure or something might make sense though, not sure we have an issue about that09:59
juergbirobjh: can you elaborate on the 'fix' step. are you fixing the .bst file or the plugin?09:59
robjhthe bst file juergbi10:01
juergbiok, so you should definitely get a new cache key, so no manual deletion of local cache should be necessary10:01
juergbiand the only advantage of first disabling push is to save a bit of network bandwidth, correct?10:01
robjhif i dont delete the local cache first it doesnt trigger a push to the remote10:02
juergbithat seems odd10:02
juergbican you check whether the cache key of the element changes as it should?10:02
juergbiif not, this might be a plugin bug10:02
robjhno. failures being in the remote cache reduces my ability to debug because it doesnt show the full log of the failure and isnt then able to drop to a shell10:03
juergbihm, failure behavior shouldn't change just because the artifact has been pushed10:04
juergbiif you can reproduce this, can you please write down the steps in an issue?10:04
juergbithe main issue I'm aware of with cached failures is for transient failures or plugin bugs10:05
robjhi can certainly try10:05
juergbie.g. when an element fails because your system ran out of memory10:05
juergbibut in that case there is no fix in the .bst file and thus, this is a completely different scenario10:06
robjhin this case, im not sure the fix will work and i need more than one shot at seeing the failure happen10:07
tristan_benschubert, thanks10:08
robjhalso, what on earth does this mean?;10:08
robjhChoice: [continue]: s10:08
robjhDropping into an interactive shell in the failed build sandbox10:08
robjhError while attempting to create interactive shell: Buildtree is not cached locally or in available remotes10:08
juergbirobjh: I don't see why logs would deteriorate after pushing the failure10:08
robjhi just saw it try to build this one. it really should have everything10:08
juergbithat's right after the build failure in the same session?10:09
robjhyep10:09
juergbidon't have an idea off the top of my head10:09
robjhon failures, i like to go in with a shell and try it manually. once its failed, and cached, i cant do that10:10
juergbimaybe caching the buildtree failed (e.g. out of space) but I'd expect a different error message in that case10:10
juergbishell from a cached failure is supposed to work10:10
robjhonce its failed, and cached onto a server, its a pain to go and sort that out10:10
juergbiboth from the interactive build session and with `bst shell` afterwards10:10
abderrahim[m]caching the buildtree often fails for me10:10
abderrahim[m]and it fails silently10:10
abderrahim[m]I have filed this bug, but forgot to update it with my investigation10:11
juergbioh, that's new to me10:11
abderrahim[m]I'll try to do it later today10:11
juergbior maybe I forgot10:11
juergbithanks10:11
abderrahim[m]like it only caches things correctly for BuildElement, not for all Elements10:11
juergbihttps://gitlab.com/BuildStream/buildstream/-/issues/126310:11
juergbiah, that could be10:12
juergbirobjh: what10:12
juergbirobjh: what's the element kind in the shell failure?10:13
robjhjuergbi, this one is a kind: script10:13
juergbiok, thanks, maybe there is an issue in scriptelement10:13
robjhive worked around the issue by putting all the build deps into a compose element, checking it out and chrooting into it10:13
juergbias root cause for all of this10:13
robjhjuergbi, changing it to a manual and im able to get a shell10:15
abderrahim[m]https://gitlab.com/BuildStream/buildstream/-/blob/master/src/buildstream/element.py#L165410:15
robjh(<juergbi> both from the interactive build session and with `bst shell` afterwards) `bst shell` you say?10:17
juergbiyes, bst shell --build can use cached buildtrees10:18
juergbibut the behavior will/should be the same as in the interactive build session. so if it already fails there, bst shell will probably fail as well10:18
robjhi'll make a note of that10:18
robjhDo you want to use the cached buildtree? [y/N]: y10:26
robjhWARNING: using a buildtree from a failed build.10:26
robjhshocker!10:26
*** SamThursfield[m] has joined #buildstream10:36
SamThursfield[m]registering your nickname on the Matrix IRC bridge is certainly a faff. I felt like i was trying to get into a berlin nightclub :)10:37
*** benbrown_ has joined #buildstream10:47
*** juergbi` has joined #buildstream10:48
*** benbrown has quit IRC10:49
*** juergbi has quit IRC10:49
*** juergbi` is now known as juergbi10:51
abderrahim[m]Sam Thursfield: :)10:51
juergbiHi SamThursfield[m] o/10:52
tristan_It's been a while since there's been a 'faff', about as long a time since there was a SamThursfield[m] !10:52
tristan_coincidence ? hmmmm10:53
SamThursfield[m]i've been teaching more people to faff, though10:53
*** tristan_ has quit IRC11:17
robjhim trying to stage a dependency at a specific location. I have a config: layout: section but buildstream is complaining that layout is an unexpected key. has the way you do this changed since 1.4?11:24
juergbirobjh: is this in a script element? I don't think it has changed11:25
robjhthis is in a kind: compose11:26
juergbiit's not supported for standard build elements11:26
robjhahhh11:26
juergbicompose doesn't support it either11:26
juergbiit's a scriptelement feature11:26
WSalmonrobjh, only scripts and other plugins that do it themselves can do layout-like things11:26
juergbithere has been discussion about generalizing this in the yaml format11:27
WSalmonyou can have a pretty basic script element move it tho, and then you dep on that11:27
WSalmonbut i think you lose all the filter domains etc11:27
WSalmonso its not idea11:27
WSalmonideal11:27
juergbihttps://gitlab.com/BuildStream/buildstream/-/merge_requests/894 is related but it's unlikely to get merged as is11:27
robjhack, thanks11:28
douglaswinshipIf buildbox-casd dies during a CI run, and causes the job to fail, is that generally a problem with the branch I uploaded, or something going wrong with the runners? I'm pretty sure I've seen it before, and at the time I got the impression it was because the runners or the cache were reset in some way, while the CI job was running.11:50
abderrahim[m]IME buildbox only dies when it runs out of disk space11:51
abderrahim[m]you can add the buildstream logs to the CI artifacts (possibly only on failure) to investigate11:52
benschuberttristan: https://gitlab.com/snippets/199172511:54
*** Trevinho has quit IRC11:56
*** Trevinho has joined #buildstream11:56
WSalmondouglaswinship, buildbox-casd is still using a lot of memory, you will note that i limit the amount that it uses to stop it running out of ram in some places11:57
WSalmonespecially on pull11:57
juergbiWSalmon: do you still see this with 0.0.9+?11:58
WSalmonyep, i had it today locally11:58
WSalmoni think it might be better11:58
juergbihm, I thought I fixed all these issues but maybe I missed something11:59
WSalmoni am trying to do too much at once atm so i will double check when i get a chance11:59
WSalmonand update/create issues11:59
WSalmonjuergbi, it seems a good bit quicker tho, so thank you for all your efforts12:00
WSalmondouglaswinship, quite a few projects capture the casd logs tho, worth a check12:00
WSalmoni think fd do12:00
douglaswinshipwill have a look. Not sure yet what i'd expect to see.12:05
*** mohan43u has quit IRC12:05
*** mohan43u has joined #buildstream12:08
*** mohan43u has quit IRC12:11
*** mohan43u has joined #buildstream12:14
*** Frazer has quit IRC12:18
*** Frazer6 has joined #buildstream12:18
*** pointswaves has quit IRC12:21
*** Frazer61 has joined #buildstream12:52
*** Frazer6 has quit IRC12:52
*** tristan_ has joined #buildstream13:08
*** ChanServ sets mode: +o tristan_13:08
tristan_benschubert, any luck ?13:08
benschuberttristan_: https://gitlab.com/snippets/199172513:08
tristan_still not quite perfect13:10
tristan_how many elements is this btw ?13:11
tristan_Would be nice to know just how severely worse performance is13:11
tristan_right now we're down to half a second slower out of around 6 seconds13:12
*** dftxbs3e has quit IRC13:29
*** dftxbs3e has joined #buildstream13:32
benschuberttristan_: 6k elements13:33
benschubertI'd say 15-20% slower is quite a big thing for the `show` :/13:34
tristan_I'm trying structs and PyMem_Malloc/PyMem_Free13:43
Frazer61hey, not sure if this should be an issue for the mailing list or gitlab issue. but i was wondering if its wanted to add BuildStream to the linguist repository https://github.com/github/linguist ? mainly so BuildStream can be shown up in projects along side the other languages used in the project on github or gitlab down the road and give13:45
Frazer61BuildStream more notice? doesnt seem to be too hard to do https://github.com/github/linguist/blob/master/CONTRIBUTING.md#adding-a-language13:45
* tristan_ doesnt quite get struct pointer member dereferencing in python13:49
tristan_Struct *struct = <Struct *>PyMem_Malloc(...) ... struct->member is unliked by cython13:50
benschuberttristan_: that indeed seems weird :/13:50
tristan_struct.member doesnt cause any compile time error, but seems obviously wrong13:50
*** hasebastian has joined #buildstream14:28
tristan_I think 15-20% slower is an exaggeration, we're .5 seconds slower for 6k elements to show (where it already took almost 6 seconds) I think that is maximum 10% slower in show (and probably entirely negligible for anything other than show)14:52
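The arithmetic behind that estimate can be checked with a short sketch. It assumes the figures quoted in the conversation (0.5 seconds extra, roughly 6 seconds for `show`) and covers both readings of "almost 6 seconds"; the function name is hypothetical.

```python
def percent_slower(extra_seconds, baseline_seconds):
    """Relative slowdown, in percent, of `extra_seconds` over a baseline."""
    return 100.0 * extra_seconds / baseline_seconds


# Reading 1: 6 seconds is the time before the change.
if_six_is_baseline = percent_slower(0.5, 6.0)

# Reading 2: 6 seconds already includes the extra half second.
if_six_includes_slowdown = percent_slower(0.5, 5.5)

print(f"{if_six_is_baseline:.1f}% / {if_six_includes_slowdown:.1f}%")
```

Either way the result stays under 10%, which matches the "maximum 10% slower" figure.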
benschuberttristan_: ah sorry, I was still on the earlier numbers14:56
benschubertyeah, 10% is still too much though14:57
benschuberteven the current 6s is difficult to work with (and that's for only 6k elements)14:57
tristan_*only*14:57
tristan_heh14:57
tristan_Like as if I'll ever see a project in my life with that many elements ;-)14:57
benschubertI can tell you I need at least one more order of magnitude :)14:57
benschubertSo yeah, I care about those .5s :)14:59
tristan_yeah yeah... you'll get em back14:59
tristan_I'll have to ramp up the voltage though15:00
* tristan_ has been getting segfaults and this time around `gdb src/buildstream/_variables.cpython-37m-x86_64-linux-gnu.so core` is not being useful15:00
tristan_So I'll start with a little corner below keeping the rest intact and try out some things more incrementally15:01
benschubertLet me know if I can help, instead of only being picky ;)15:01
tristan_nah, there's not much room15:02
tristan_better to concentrate on other 2.0 related items15:02
tristan_I should get there in one more day, maaaaaybe two15:03
benschubertok!15:03
benschubertI'll go back hammering my threads then15:03
WSalmonjuergbi, et al. i have just had https://gitlab.com/celduin/bsps/bst-boardsupport/-/jobs/619700756 fail to push an index to https://push.public.aws.celduin.co.uk or http://cache.pointswaves.co.uk:50053, the second server does not have any proxy in front of it, it's just running a reasonably recent nightly bst docker image with the artifact server. the second one falsely reports that the artifact is already there far less often tho, so im truly15:33
WSalmonperplexed, the second is in a Scaleway data center while the runner and the first cache are both in15:33
WSalmonAWS Ireland15:34
juergbiWSalmon: maybe the remaining issue is related to the push issue I mentioned last week or so15:35
juergbiwhich should be fixed by https://gitlab.com/BuildStream/buildstream/-/merge_requests/197615:35
juergbiI also have a branch now that uses the Remote Asset API. however, the BuildStream core logic hasn't changed that much, so it might not change anything15:37
WSalmonso line 1385 is [--:--:--][c6f8bfed][    push:freedesktop-sdk.bst:bootstrap/diffutils.bst] INFO    Remote (http://cache.pointswaves.co.uk:50053) already has artifact c6f8bfed cached but sh-5.0# ls /data/artifacts/refs/freedesktop-sdk/bootstrap-diffutils/15:46
WSalmon68e95fa6fb1e1751f01ee175f2bad646443b2309bb4be9dd0aef85b20bdb71e0  9a9a994c84e7111f7fa6dbbac4cd755ed2b1799382b5f2624accd4a86a444a0b15:46
WSalmon81a4239a68f96a594bb2c18a6dd9edb74947c77c59d942445fdace52c99f0272  9b5e41aa0123519fc4b32d6ace6f2d783b14bc59f79b24a21412e6b490b5b47b15:46
WSalmonsh-5.0#15:46
WSalmonthere is no artifact for that key, its not that it is not updating it15:46
WSalmonjuergbi^15:46
coldtomyup that's the behaviour i observed too WSalmon, there's no artifacts in the cache15:47
WSalmonand there is 10s of Gb of disk left and plenty of cpu and ram15:47
coldtomseems like it happens consistently on the cache key too15:47
WSalmonbut my server does it a lot less than yours15:48
WSalmoni wondered if it was a payload but the artifacts are pretty small15:48
WSalmonsh-5.0# cat /data/artifacts/refs/freedesktop-sdk/bootstrap-coreutils/e27244816ea8bba93588004b40d2d1183ace971a512a5e1b8f67e06494ccec9015:49
WSalmon�succeeded*@e27244816ea8bba93588004b40d2d1183ace971a512a5e1b8f67e06494ccec902@c97c7cbe50806804e80fbe58b6d791e70c6cf163171bd039c7869bbe4b19bf39BD15:49
WSalmon@7383888b4c7ce5616a33965cfbfda64a830558ea820a1c11effb994999f200ecNJw15:49
WSalmonfreedesktop-sdk"bootstrap/coreutils-build-deps.bst�@a8c01fedda1bcff6e47b5ffb5aeee40a096bcdbefc22715816d534908e810f32RE15:49
WSalmon@4e73db136d9542d8e28e632f2adb18cc25f94df4cbe6df58d4240e4334c44a17Za15:49
WSalmone2724481-build.7008.logF15:49
WSalmonit seems like once the server decides it doesn't like that artifact it does it again, i wondered if there could be some weird thing with files not getting closed but i presume the with block should sort that15:51
WSalmonand the atomic write15:51
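For context, a common shape for "a with block plus an atomic write" is sketched below. This is an assumption about the general pattern, not the artifact server's actual code; `atomic_write()` is a hypothetical helper built only from the standard library.

```python
import contextlib
import os
import tempfile


def atomic_write(path, data):
    """Write `data` to `path` so readers see the old or new file, never a partial one."""
    directory = os.path.dirname(os.path.abspath(path))
    # The temporary file lives in the target directory so the final rename
    # stays on one filesystem.
    tmp = tempfile.NamedTemporaryFile(mode="wb", dir=directory, delete=False)
    try:
        with tmp:  # the with block guarantees the file is closed
            tmp.write(data)
            tmp.flush()
            os.fsync(tmp.fileno())
        os.replace(tmp.name, path)  # atomic rename over any existing file
    except BaseException:
        # On failure, remove the temporary file if it is still there.
        with contextlib.suppress(FileNotFoundError):
            os.unlink(tmp.name)
        raise
```

With this pattern a file left open or half-written would not normally show up as a ref on disk, which fits the presumption above that the cause lies elsewhere.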
juergbiWSalmon: do you use non-strict mode?15:54
WSalmonno15:54
juergbiit might still be related to weak cache keys, though15:55
juergbiit's possible that the weak cache key already exists on the server and that alone may be sufficient to trigger that message, due to a bug15:56
juergbi!1976 should fix this as well15:56
WSalmonthat would make more sense as the first build when we redeploy the server doesn't have issues15:56
WSalmon*as far as i have seen15:57
juergbiyes, the loop in _push_artifact_proto is buggy15:57
WSalmonwill !1976 be enough?15:58
WSalmonit takes a while to build the docker image i will use to test this so i can build one for that MR but I can hold back if more is likely to change?15:59
WSalmonthese are all client side changes right? looking at the MR thats my impression..16:00
juergbiyes, I don't see this issue with the version in !1976 based on reading the code16:01
juergbiyes16:01
WSalmonthanks juergbi16:01
WSalmoni spent a while looking at both the client and server code but not that bit..16:03
WSalmonalways the way16:03
juergbiabderrahim[m]: I've noticed your source-cache branch. that will likely conflict with my remote-asset branch16:45
juergbiI already have something in mind for a related speedup, which is to use casd's StageTree instead of staging previous sources in BuildStream itself. casd will use buildbox-fuse where available, so staging will be instant16:46
abderrahim[m]ok16:47
juergbi(and when planned capture optimizations are in place, the capture part will also be much faster)16:47
abderrahim[m]let's see :)16:47
juergbican hopefully get my changes into an MR soon16:47
abderrahim[m]any ETA on the remote-asset16:48
juergbithe main part is working now in my branch, all tests are passing16:49
juergbineed to fix up my casd branch16:49
juergbiand then it might soon be ready for an MR16:49
juergbiprobably at some point next week16:50
juergbithis is just the first step, though, moving over to the new protocol. a few additional things will be changed, especially in the source cache area16:51
abderrahim[m]ack16:54
abderrahim[m]I'm more interested in the new protocol :)16:54
juergbiabderrahim[m]: the changes are pushed in juerg/remote-asset if you're interested16:58
*** hasebastian has quit IRC17:17
*** santi has quit IRC17:44
*** tristan_ has quit IRC19:31
*** tristan has joined #buildstream19:32
*** ChanServ sets mode: +o tristan19:33
tristanbenschubert, I hit the mark !19:49
tristanWell, according to the variables test, I'm getting better results19:49
tristanAnd I'm only half way through structifying19:50
tristanI'm not sure about dropping 'str' for 'char *' though, if we do, I'd still want to take advantage of 'dict'19:50
tristanand intern strings19:50
* tristan not sure if that's possible or if it's worth bringing in a hash table implementation19:51
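As a rough illustration of what "take advantage of dict and intern strings" can mean at the Python level, the sketch below shows two standard approaches: `sys.intern()` and a plain dict used as an intern table. It is not taken from the `tristan/partial-variables` branch; `intern_table` and `intern_name()` are hypothetical names.

```python
import sys

# Approach 1: the interpreter's own intern table.
a = sys.intern("".join(["pre", "fix"]))  # built at runtime, then interned
b = sys.intern("".join(["pre", "fix"]))

# Approach 2: a hand-rolled intern table backed by a dict.
intern_table = {}


def intern_name(name):
    """Return the canonical object for `name`, storing it on first sight."""
    return intern_table.setdefault(name, name)


c = intern_name("".join(["var", "name"]))
d = intern_name("".join(["var", "name"]))

# Interned strings can be compared by identity rather than by content.
same_via_sys = a is b
same_via_dict = c is d
```

Dropping `str` for `char *` would give up both of these, which is why a separate hash table implementation comes up as the alternative.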
* tristan pushes tristan/partial-variables for the night (or early morning)19:53
tristanstill pretty dirty in there... but speed is getting nicer19:53
tristanPyCapsule !20:12
*** benschubert has quit IRC21:31
