*** benschubert has quit IRC | 00:34 | |
*** mohan43u has quit IRC | 01:40 | |
*** mohan43u has joined #buildstream | 01:40 | |
*** tristan has quit IRC | 06:50 | |
*** benschubert has joined #buildstream | 07:37 | |
*** tristan has joined #buildstream | 07:39 | |
*** ChanServ sets mode: +o tristan | 07:39 | |
*** samwilson has joined #buildstream | 07:43 | |
tristan | benschubert, There is still probably a lot which I don't understand about cython... could you explain how it is that PARSE_CACHE here is bound to a given element, and not process global state? https://gitlab.com/BuildStream/buildstream/-/blob/master/src/buildstream/_variables.pyx#L223 | 08:21 |
---|---|---|
benschubert | tristan: It is global state | 08:22 |
tristan | Oh wait | 08:22 |
tristan | I understand now | 08:23 |
tristan | It's a translation table of "%{varname}" -> "varname" | 08:23 |
benschubert | right | 08:23 |
benschubert | tht's actually a super comment we should add on top | 08:23 |
benschubert | would help a lot | 08:23 |
tristan | heh | 08:23 |
tristan | Yeah the comment is confusing, speaking already of lifecycles of contained strings being tied to elements which use it | 08:24 |
*** santi has joined #buildstream | 08:24 | |
tristan | Or wait, its a translation to a list | 08:25 |
tristan | So, it also indexes things like "%{foo} bar %{baz}" -> ["foo", "baz"] iiuc | 08:25 |
benschubert | I amsure of the correctness of this part | 08:25 |
tristan | yes that seems correct, when you try to expand something, you can have this form, so you would need a list of things which need to be resolved in order to resolve the thing | 08:26 |
benschubert | *I am not sure | 08:26 |
tristan | No ? | 08:26 |
benschubert | mmh no actually, should work sorry | 08:26 |
benschubert | this part of the code always gets me | 08:26 |
tristan | Well, it appears correct so far, but I think I will need to understand a bit more :) | 08:26 |
benschubert | don't hesitate to add comments there | 08:27 |
* tristan will try to leave it in a more understandable state yeah | 08:27 | |
benschubert | it's definitely a part of the coe I'm not comfortable with (most of the rest of the file is more direct at least) | 08:27 |
tristan | may rename some variables too | 08:27 |
tristan | Do you know if we support strings like "There are some { literal %{braces} in } here" ? | 08:29 |
tristan | I think at first we didn't really support that | 08:29 |
tristan | And I'm sure we never supported "%{var-%{name}}" dynamic variable name resolution | 08:30 |
tristan | (nor do I think we need to) | 08:30 |
tristan | Anyway, that's kind of orthogonal, I'll try to focus | 08:31 |
tristan | The part of variables which always bites me is the regex, I kind of despise them :) | 08:32 |
benschubert | > "%{var-%{name}}" definitely not. We should probably efend against it | 08:35 |
benschubert | > "There are some { literal %{braces} in } here" ? should work correctly, cs-shadow did a fix for that recently | 08:35 |
*** tomaz has joined #buildstream | 08:41 | |
*** phildawson has quit IRC | 08:42 | |
*** phildawson has joined #buildstream | 08:43 | |
*** tristan has quit IRC | 08:52 | |
juergbi | benschubert: have you seen https://github.com/cython/cython/issues/3515#issuecomment-646374155 ? adding a dummy inline function to node.pxd might work around the coverage 5 issues, which might allow getting py38 coverage to finally work | 09:02 |
juergbi | haven't tested it yet | 09:02 |
benschubert | That would be great news! | 09:09 |
*** tristan has joined #buildstream | 10:16 | |
*** ChanServ sets mode: +o tristan | 10:16 | |
tristan | benschubert, you don't like the try / except KeyErrors ? I think that we save a line of code in loader.py with your suggestion, but in _project.py, it seems cleaner to me to have try/except KeyError no ? | 10:45 |
tristan | To be honest, I've been hoping to make more explicit `try / except KeyError` statements throughout the codebase, to enforce a more "ask for forgiveness instead of permission" mentality/pattern everywhere | 10:46 |
tristan | To many code blocks are doing "if (this_thing_exists): do_this_thing()" instead of "try: get_this_thing() / except KeyError: oops_nothing_here()" | 10:47 |
tristan | resulting in a handful of redundant lookups | 10:47 |
tristan | I think/hope that try/except coding style helps to change the readers mind about how to go about things | 10:48 |
benschubert | tristan: I am not against try-except on the contrary! | 10:55 |
benschubert | Just that when we do `try-except-pass`, I think that `suppress` is more explicit. the `pass` always seems like someone forgot to implement it | 10:56 |
benschubert | if you find try-except-pass more redaable I'm fine with that too | 10:56 |
tristan | Hah | 10:56 |
tristan | Yeah I rather liked suppress in loader.py, in _project.py there is not pass | 10:57 |
*** abderrahim[m] has joined #buildstream | 10:57 | |
tristan | benschubert, however, looking again, I think even better is `junction_list = self._junction_duplicates.get(project_name, {})` | 10:58 |
tristan | rendering the following loop a no-op and doing it in one-line | 10:58 |
benschubert | definitely | 10:58 |
tristan | I will also rename it junction_dict, since it is a dict of dicts afterall | 10:58 |
benschubert | Seems good! | 10:59 |
tristan | That used to be a list of tuples | 10:59 |
tristan | But I changed it | 10:59 |
abderrahim[m] | no gitlab bot :S | 11:15 |
abderrahim[m] | tristan: I think you would love https://gitlab.com/BuildStream/buildstream/-/merge_requests/1970 | 11:16 |
abderrahim[m] | please tell me if not :) | 11:16 |
tristan | abderrahim[m], Well, removing `element._build_log_path = ...` from _artifact.py is already a great course correction ! | 11:22 |
tristan | I'm unsure exactly everything that's going on in there | 11:22 |
tristan | just a glance | 11:22 |
tristan | Logging the build environment in element.assemble() rather than in buildjob also looks good to me | 11:23 |
tristan | Oh ! that horrible hack from buildjob is going away ! | 11:24 |
* tristan jumps for joy \o/ | 11:24 | |
abderrahim[m] | I'd also appreciate a hint on where to put a regression test for https://gitlab.com/BuildStream/buildstream/-/merge_requests/1971 | 11:24 |
tristan | ohhhh | 11:25 |
tristan | abderrahim[m], you may want to wait for me to finish !1901, not sure... | 11:25 |
tristan | abderrahim[m], yeah, I think you should wait for !1901 to land, I'm landing it quite presently | 11:26 |
tristan | abderrahim[m], the point is that, an element path is not always the same, you can have multiple element paths from the command line which are all valid ways of reaching a given element | 11:27 |
abderrahim[m] | ok, I'll try to see if !1901 fixes the issue then | 11:27 |
tristan | So test cases for that should be parameterized to try paths (A) Absolute resolved paths, (B) Paths which involve links, (C) Paths which point to elements via overridden subprojects | 11:28 |
tristan | benschubert, I just replied regarding the error messages, if you have any thoughts on how to represent these strings (or agree with the approach), let me know: https://gitlab.com/BuildStream/buildstream/-/merge_requests/1901/#note_366414606 | 11:31 |
tristan | I'll try to bang it out now | 11:31 |
benschubert | tristan: answered | 11:34 |
benschubert | I agree with the approach and if you'd rather have it like that I'm also fine with it | 11:34 |
tristan | And answered your question :) | 11:40 |
tristan | So the whole error message is about a "project" being defined multiple times, where we display every instance - it is indeed left up to the user to know *why*, that is probably a shortcomming of the message | 11:41 |
tristan | Maybe a standard paragraph along with the error would be good. | 11:41 |
benschubert | or a link to the docs? | 11:42 |
benschubert | it might get out of date but it is a nice way of helping | 11:42 |
tristan | Hmmm | 11:43 |
tristan | Referring to urls in error messages | 11:43 |
benschubert | anyways I'll be back in a moment, I'm off for food | 11:43 |
tristan | I'm hoping we don't have that pattern too much | 11:43 |
tristan | Yup | 11:43 |
benschubert | I don't think we have | 11:43 |
tristan | benschubert, We have an agreement to do so for one case | 11:43 |
tristan | And I think it's exceptional | 11:43 |
benschubert | ok, let's not do that then | 11:44 |
tristan | I.e. it's regarding parallel installation of bst-1 / bst-2 | 11:44 |
tristan | Enjoy your dinner :) | 11:44 |
* tristan thinks he can capture the important bit with one sentence | 11:45 | |
tristan | "The following instances of project 'foo' were encountered: [ insert list ] .... \n\n Internal projects do not cause any conflicts. Conflicts can also be avoided by marking every instance of the project as a duplicate." | 11:47 |
tristan | Two short sentences | 11:48 |
benschubert | tristan: I like that | 12:22 |
tristan | benschubert, juergbi, I think I've addressed the majority of comments on the !1901 branch now | 12:44 |
tristan | juergbi, do you have strong feelings about https://gitlab.com/BuildStream/buildstream/-/merge_requests/1901/#note_366357312 ? Would you like additional comments explaining that alternative parents are needed in order to access the entire ancestry ? | 12:45 |
tristan | As I mentioned in the comments, I would rather have that function stand on it's own, and not speak about internal/duplicates as one of it's use cases | 12:45 |
tristan | How about, I'll let you guys think on it a bit more, and in the absence of further comments on the MR I will merge tomorrow morning Korea time ? | 12:47 |
*** tristan has quit IRC | 13:11 | |
*** phildawson has quit IRC | 13:34 | |
*** phildawson has joined #buildstream | 13:35 | |
benschubert | tristan: good for me, I'll have a lsat look tonight | 13:39 |
*** santi has quit IRC | 13:42 | |
*** santi has joined #buildstream | 13:45 | |
WSalmon | can anyone think of a good reason https://gitlab.com/BuildStream/buildstream/-/blob/master/src/buildstream/_cas/casserver.py#L512 would fail? | 15:50 |
juergbi | WSalmon: how does it fail? | 16:13 |
juergbi | the serialization or the write? | 16:13 |
WSalmon | any of it | 16:14 |
juergbi | out of disk space but rather unlikely | 16:14 |
juergbi | I wouldn't expect SerializeToString to fail | 16:14 |
WSalmon | i *think* something todo with pushing is failing and its being reported as a `artifact already there` the server its on is 200M/1G disk wise | 16:15 |
WSalmon | is that right coldtom ? | 16:15 |
WSalmon | when coldtom went and looked he couldnt find the artifact even though the error said `already here` | 16:15 |
WSalmon | <coldtom> [root@infra /]# kubectl exec -ti -n rex statefulset/storage-artifact -- find /artifacts -name f7b9a8c38b4d65f633117ff028ee433c488d6ce4a05643869a49f74b5c99c4f3 | 16:16 |
WSalmon | <coldtom> [root@infra /]# | 16:16 |
coldtom | that's correct WSalmon | 16:16 |
juergbi | what was the exact error when it couldn't find the artifact? | 16:16 |
juergbi | i.e., could it have been a blob missing? | 16:16 |
WSalmon | its a index only server | 16:16 |
juergbi | ok but buildstream always does both | 16:17 |
coldtom | juergbi, there have been no gRPC NOT_FOUNDs from the CAS | 16:17 |
coldtom | so i don't think it's missing blobs | 16:17 |
WSalmon | this is the right up https://gitlab.com/celduin/infrastructure/celduin-infra/-/issues/208 | 16:17 |
WSalmon | the error message seems to imply that it could push the cas bits | 16:17 |
WSalmon | but then failed to push to the index | 16:18 |
coldtom | the behaviour is "not cached" -> build -> push -> "already cached" | 16:18 |
coldtom | inspecting the artifact store shows that it is not cached | 16:18 |
juergbi | there is a logic error in the buildstream artifact cache client I've noticed recently, which may be related | 16:18 |
juergbi | pushing the artifact proto should actually replace the proto on the server, not return that it's already cached | 16:19 |
juergbi | as the artifact proto contents may be different (pointing to different CAS objects) | 16:19 |
juergbi | i.e., _push_artifact_proto() first calling get_artifact() seems incorrect to me | 16:20 |
juergbi | it's not a big issue on combined index/CAS servers as there get_artifact() will return NOT_FOUND if a CAS blob is missing | 16:20 |
juergbi | but with a standalone index server, the artifact proto might remain forever with dead references to CAS objects | 16:21 |
juergbi | I can't be sure that artifact protos with dead references are the issue in your tests but it could be | 16:22 |
WSalmon | that would make sence if the pull reported index but not cas, but the pull fails striaght away | 16:22 |
coldtom | hmm, we shouldn't have had any blob expiry in the CAS yet | 16:22 |
coldtom | and in that case, there should be a reference stored on the server, afaict the artifact is never stored | 16:23 |
juergbi | hm, so get_artifact() is successful despite the artifact proto not existing on the index server? | 16:24 |
juergbi | that's pretty odd | 16:24 |
juergbi | coldtom: you mention GetArtifact requests being logged. do you also have logs of the result (grpc status)? | 16:26 |
juergbi | i.e., is the server incorrectly returning OK or is BuildStream not detecting the error? | 16:27 |
coldtom | juergbi, unfortunately bst-artifact-server doesn't log the responses, and there's no way to tell which GetArtifact request is the offending one from the nginx logs | 16:29 |
juergbi | ok. I would expect at least one test in our test suite to exercise GetArtifact for a not existing artifact | 16:30 |
WSalmon | coldtom, do you record the standard out of the artifact-index server? | 16:35 |
coldtom | yup, but it only logs when requests come in so doesn't know the response | 16:38 |
juergbi | WSalmon, coldtom: I think we should head in this direction: https://gitlab.com/BuildStream/buildstream/commits/juerg/push | 16:40 |
juergbi | (and at a higher level, completely skip artifact push for remotes where we've pulled the artifact in the same session) | 16:41 |
juergbi | I still can't explain the odd behavior you're seeing but it might still help | 16:42 |
WSalmon | [--:--:--][f46ae283][ push:freedesktop-sdk.bst:bootstrap/file.bst] INFO Remote (https://push.public.aws.celduin.co.uk) already has artifact f46ae283 cached | 16:45 |
WSalmon | [--:--:--][f46ae283][ push:freedesktop-sdk.bst:bootstrap/file.bst] STATUS Pushing artifact f46ae283 -> http://localhost:10101 | 16:45 |
WSalmon | [--:--:--][f46ae283][ push:freedesktop-sdk.bst:bootstrap/file.bst] INFO Pushed artifact f46ae283 -> http://localhost:10101 | 16:45 |
WSalmon | [00:00:00][f46ae283][ push:freedesktop-sdk.bst:bootstrap/file.bst] SUCCESS freedesktop-sdk/bootstrap-file/f46ae283-push.2372052.log | 16:45 |
WSalmon | so this is annoying, (https://push.public.aws.celduin.co.uk) already has artifact f46ae283 cached is a lie | 16:45 |
WSalmon | but my local index only has just excepted the artifact | 16:46 |
*** tristan has joined #buildstream | 16:46 | |
*** ChanServ sets mode: +o tristan | 16:46 | |
* WSalmon looks suspiciously at coldtom's deployment | 16:47 | |
*** samwilson has quit IRC | 16:52 | |
WSalmon | [--:--:--][3a490e29][ push:freedesktop-sdk.bst:bootstrap/binutils.bst] STATUS Pushing data from artifact 3a490e29 -> https://push.public.aws.celduin.co.uk | 16:52 |
WSalmon | [--:--:--][3a490e29][ push:freedesktop-sdk.bst:bootstrap/binutils.bst] INFO Pushed data from artifact 3a490e29 -> https://push.public.aws.celduin.co.uk | 16:52 |
WSalmon | [--:--:--][3a490e29][ push:freedesktop-sdk.bst:bootstrap/binutils.bst] STATUS Pushing artifact 3a490e29 -> https://push.public.aws.celduin.co.uk | 16:52 |
WSalmon | [--:--:--][3a490e29][ push:freedesktop-sdk.bst:bootstrap/binutils.bst] INFO Remote (https://push.public.aws.celduin.co.uk) already has artifact 3a490e29 cached | 16:52 |
WSalmon | [--:--:--][3a490e29][ push:freedesktop-sdk.bst:bootstrap/binutils.bst] STATUS Pushing artifact 3a490e29 -> http://localhost:10101 | 16:52 |
WSalmon | [--:--:--][3a490e29][ push:freedesktop-sdk.bst:bootstrap/binutils.bst] INFO Pushed artifact 3a490e29 -> http://localhost:10101 | 16:52 |
WSalmon | [00:00:00][3a490e29][ push:freedesktop-sdk.bst:bootstrap/binutils.bst] SUCCESS freedesktop-sdk/bootstrap-binutils/3a490e29-push.2386028.log | 16:52 |
WSalmon | same again | 16:52 |
juergbi | WSalmon: and you're really certain that the artifact doesn't exist on the server? | 16:54 |
WSalmon | [--:--:--][3a490e29][ pull:freedesktop-sdk.bst:bootstrap/binutils.bst] INFO Remote (https://push.public.aws.celduin.co.uk) does not have artifact 3a490e29 cached | 16:57 |
WSalmon | [--:--:--][3a490e29][ pull:freedesktop-sdk.bst:bootstrap/binutils.bst] STATUS Pulling artifact 3a490e29 <- http://localhost:10101 | 16:57 |
WSalmon | [--:--:--][3a490e29][ pull:freedesktop-sdk.bst:bootstrap/binutils.bst] INFO Remote (http://localhost:10101) does not have artifact 3a490e29 cached | 16:57 |
WSalmon | [00:00:00][3a490e29][ pull:freedesktop-sdk.bst:bootstrap/binutils.bst] SKIPPED Pull | 16:57 |
WSalmon | one or other is failing and as top showed for that other one i dosent look like its there | 16:58 |
juergbi | WSalmon: and there is no chance it has been pushed by another buildstream client in the meantime? | 16:59 |
WSalmon | no | 16:59 |
WSalmon | we get this a lot | 16:59 |
WSalmon | its not the odd thing | 17:00 |
juergbi | ah, wait, that's the localhost:10101 | 17:00 |
juergbi | i.e., not the same server | 17:00 |
juergbi | it successfully pushes to localhost:10101 | 17:00 |
WSalmon | yes | 17:00 |
WSalmon | thats what i mean | 17:00 |
WSalmon | * WSalmon looks suspiciously at coldtom's deployment | 17:00 |
WSalmon | im gona delete the artifact and try again but i need to go sort something els out now | 17:01 |
juergbi | i.e., verify the deployed bst-artifact-server is actually up-to-date? | 17:01 |
juergbi | coldtom: is it possible that not both of the two bst-artifact-server instances (public pull-only and push) are launched with --index-only | 17:03 |
*** persia has joined #buildstream | 17:10 | |
*** adds68 has joined #buildstream | 17:11 | |
persia | Hi everyone. Was just chatting with adds68, who pointed out that https://buildstream.build/ was a bit outdated. Who covers https://gitlab.com/BuildStream/website these days? | 17:15 |
WSalmon | juergbi, i think there the same instance but with diffrent proxy settings but i could be wrong | 17:36 |
WSalmon | juergbi, so when i deleted bootstrap/file.bst from my local cache it could pull the proto from http://localhost:10101 and the elemnets from the cas of https://push.public.aws.celduin.co.uk even tho `Remote (https://public.aws.celduin.co.uk) does not have artifact f46ae283 cached` | 17:42 |
juergbi | WSalmon: could the issue be nginx caching grpc requests? | 17:43 |
juergbi | i.e., the responses are not up-to-date? | 17:43 |
WSalmon | so i think its coldtom's instance's index server or nginx that is the issue.... | 17:43 |
WSalmon | juergbi, maybe or the grpc forwarding, but it seems to have excepted some elements and not be excepting others.. | 17:45 |
WSalmon | i would expect the nginx to ether work or not, but it can be effected by size of artifact but the protos should be pretty small | 17:46 |
juergbi | if it's a stale cache issue it might depend on timing, i.e., it may seem random | 17:46 |
WSalmon | should be pretty easy for coldtom to rule anything like that out in the morning | 17:47 |
WSalmon | thanks juergbi for all you insight | 17:47 |
WSalmon | juergbi, once we get to the bottom of this i think we need to improve the error reporting in bst tho.. | 17:52 |
*** santi has quit IRC | 17:58 | |
*** tomaz has quit IRC | 18:08 |
Generated by irclog2html.py 2.15.3 by Marius Gedminas - find it at mg.pov.lt!