IRC logs for #buildstream for Tuesday, 2020-06-23

*** benschubert has quit IRC00:34
*** mohan43u has quit IRC01:40
*** mohan43u has joined #buildstream01:40
*** tristan has quit IRC06:50
*** benschubert has joined #buildstream07:37
*** tristan has joined #buildstream07:39
*** ChanServ sets mode: +o tristan07:39
*** samwilson has joined #buildstream07:43
tristanbenschubert, There is still probably a lot which I don't understand about cython... could you explain how it is that PARSE_CACHE here is bound to a given element, and not process global state?  https://gitlab.com/BuildStream/buildstream/-/blob/master/src/buildstream/_variables.pyx#L22308:21
benschuberttristan: It is global state08:22
tristanOh wait08:22
tristanI understand now08:23
tristanIt's a translation table of "%{varname}" -> "varname"08:23
benschubertright08:23
benschuberttht's actually a super comment we should add on top08:23
benschubertwould help a lot08:23
tristanheh08:23
tristanYeah the comment is confusing, speaking already of lifecycles of contained strings being tied to elements which use it08:24
*** santi has joined #buildstream08:24
tristanOr wait, its a translation to a list08:25
tristanSo, it also indexes things like "%{foo} bar %{baz}" -> ["foo", "baz"] iiuc08:25
benschubertI amsure of the correctness of this part08:25
tristanyes that seems correct, when you try to expand something, you can have this form, so you would need a list of things which need to be resolved in order to resolve the thing08:26
benschubert*I am not sure08:26
tristanNo ?08:26
benschubertmmh no actually, should work sorry08:26
benschubertthis part of the code always gets me08:26
tristanWell, it appears correct so far, but I think I will need to understand a bit more :)08:26
benschubertdon't hesitate to add comments there08:27
* tristan will try to leave it in a more understandable state yeah08:27
benschubertit's definitely a part of the coe I'm not comfortable with (most of the rest of the file is more direct at least)08:27
tristanmay rename some variables too08:27
tristanDo you know if we support strings like "There are some { literal %{braces} in } here" ?08:29
tristanI think at first we didn't really support that08:29
tristanAnd I'm sure we never supported "%{var-%{name}}" dynamic variable name resolution08:30
tristan(nor do I think we need to)08:30
tristanAnyway, that's kind of orthogonal, I'll try to focus08:31
tristanThe part of variables which always bites me is the regex, I kind of despise them :)08:32
benschubert> "%{var-%{name}}" definitely not. We should probably efend against it08:35
benschubert> "There are some { literal %{braces} in } here" ? should work correctly, cs-shadow did a fix for that recently08:35
*** tomaz has joined #buildstream08:41
*** phildawson has quit IRC08:42
*** phildawson has joined #buildstream08:43
*** tristan has quit IRC08:52
juergbibenschubert: have you seen https://github.com/cython/cython/issues/3515#issuecomment-646374155 ? adding a dummy inline function to node.pxd might work around the coverage 5 issues, which might allow getting py38 coverage to finally work09:02
juergbihaven't tested it yet09:02
benschubertThat would be great news!09:09
*** tristan has joined #buildstream10:16
*** ChanServ sets mode: +o tristan10:16
tristanbenschubert, you don't like the try / except KeyErrors ? I think that we save a line of code in loader.py with your suggestion, but in _project.py, it seems cleaner to me to have try/except KeyError no ?10:45
tristanTo be honest, I've been hoping to make more explicit `try / except KeyError` statements throughout the codebase, to enforce a more "ask for forgiveness instead of permission" mentality/pattern everywhere10:46
tristanTo many code blocks are doing "if (this_thing_exists): do_this_thing()" instead of "try: get_this_thing() / except KeyError: oops_nothing_here()"10:47
tristanresulting in a handful of redundant lookups10:47
tristanI think/hope that try/except coding style helps to change the readers mind about how to go about things10:48
benschuberttristan: I am not against try-except on the contrary!10:55
benschubertJust that when we do `try-except-pass`, I think that `suppress` is more explicit. the `pass` always seems like someone forgot to implement it10:56
benschubertif you find try-except-pass more redaable I'm fine with that too10:56
tristanHah10:56
tristanYeah I rather liked suppress in loader.py, in _project.py there is not pass10:57
*** abderrahim[m] has joined #buildstream10:57
tristanbenschubert, however, looking again, I think even better is `junction_list = self._junction_duplicates.get(project_name, {})`10:58
tristanrendering the following loop a no-op and doing it in one-line10:58
benschubertdefinitely10:58
tristanI will also rename it junction_dict, since it is a dict of dicts afterall10:58
benschubertSeems good!10:59
tristanThat used to be a list of tuples10:59
tristanBut I changed it10:59
abderrahim[m]no gitlab bot :S11:15
abderrahim[m]tristan: I think you would love https://gitlab.com/BuildStream/buildstream/-/merge_requests/197011:16
abderrahim[m]please tell me if not :)11:16
tristanabderrahim[m], Well, removing `element._build_log_path = ...` from _artifact.py is already a great course correction !11:22
tristanI'm unsure exactly everything that's going on in there11:22
tristanjust a glance11:22
tristanLogging the build environment in element.assemble() rather than in buildjob also looks good to me11:23
tristanOh ! that horrible hack from buildjob is going away !11:24
* tristan jumps for joy \o/11:24
abderrahim[m]I'd also appreciate a hint on where to put a regression test for https://gitlab.com/BuildStream/buildstream/-/merge_requests/197111:24
tristanohhhh11:25
tristanabderrahim[m], you may want to wait for me to finish !1901, not sure...11:25
tristanabderrahim[m], yeah, I think you should wait for !1901 to land, I'm landing it quite presently11:26
tristanabderrahim[m], the point is that, an element path is not always the same, you can have multiple element paths from the command line which are all valid ways of reaching a given element11:27
abderrahim[m]ok, I'll try to see if !1901 fixes the issue then11:27
tristanSo test cases for that should be parameterized to try paths (A) Absolute resolved paths, (B) Paths which involve links, (C) Paths which point to elements via overridden subprojects11:28
tristanbenschubert, I just replied regarding the error messages, if you have any thoughts on how to represent these strings (or agree with the approach), let me know: https://gitlab.com/BuildStream/buildstream/-/merge_requests/1901/#note_36641460611:31
tristanI'll try to bang it out now11:31
benschuberttristan: answered11:34
benschubertI agree with the approach and if you'd rather have it like that I'm also fine with it11:34
tristanAnd answered your question :)11:40
tristanSo the whole error message is about a "project" being defined multiple times, where we display every instance - it is indeed left up to the user to know *why*, that is probably a shortcomming of the message11:41
tristanMaybe a standard paragraph along with the error would be good.11:41
benschubertor a link to the docs?11:42
benschubertit might get out of date but it is a nice way of helping11:42
tristanHmmm11:43
tristanReferring to urls in error messages11:43
benschubertanyways I'll be back in a moment, I'm off for food11:43
tristanI'm hoping we don't have that pattern too much11:43
tristanYup11:43
benschubertI don't think we have11:43
tristanbenschubert, We have an agreement to do so for one case11:43
tristanAnd I think it's exceptional11:43
benschubertok, let's not do that then11:44
tristanI.e. it's regarding parallel installation of bst-1 / bst-211:44
tristanEnjoy your dinner :)11:44
* tristan thinks he can capture the important bit with one sentence11:45
tristan"The following instances of project 'foo' were encountered: [ insert list ] .... \n\n Internal projects do not cause any conflicts. Conflicts can also be avoided by marking every instance of the project as a duplicate."11:47
tristanTwo short sentences11:48
benschuberttristan: I like that12:22
tristanbenschubert, juergbi, I think I've addressed the majority of comments on the !1901 branch now12:44
tristanjuergbi, do you have strong feelings about https://gitlab.com/BuildStream/buildstream/-/merge_requests/1901/#note_366357312 ? Would you like additional comments explaining that alternative parents are needed in order to access the entire ancestry ?12:45
tristanAs I mentioned in the comments, I would rather have that function stand on it's own, and not speak about internal/duplicates as one of it's use cases12:45
tristanHow about, I'll let you guys think on it a bit more, and in the absence of further comments on the MR I will merge tomorrow morning Korea time ?12:47
*** tristan has quit IRC13:11
*** phildawson has quit IRC13:34
*** phildawson has joined #buildstream13:35
benschuberttristan: good for me, I'll have a lsat look tonight13:39
*** santi has quit IRC13:42
*** santi has joined #buildstream13:45
WSalmoncan anyone think of a good reason https://gitlab.com/BuildStream/buildstream/-/blob/master/src/buildstream/_cas/casserver.py#L512 would fail?15:50
juergbiWSalmon: how does it fail?16:13
juergbithe serialization or the write?16:13
WSalmonany of it16:14
juergbiout of disk space but rather unlikely16:14
juergbiI wouldn't expect SerializeToString to fail16:14
WSalmoni *think* something todo with pushing is failing and its being reported as a `artifact already there` the server its on is 200M/1G disk wise16:15
WSalmonis that right coldtom ?16:15
WSalmonwhen coldtom went and looked he couldnt find the artifact even though the error said `already here`16:15
WSalmon<coldtom> [root@infra /]# kubectl exec -ti -n rex statefulset/storage-artifact -- find /artifacts -name f7b9a8c38b4d65f633117ff028ee433c488d6ce4a05643869a49f74b5c99c4f316:16
WSalmon<coldtom> [root@infra /]#16:16
coldtomthat's correct WSalmon16:16
juergbiwhat was the exact error when it couldn't find the artifact?16:16
juergbii.e., could it have been a blob missing?16:16
WSalmonits a index only server16:16
juergbiok but buildstream always does both16:17
coldtomjuergbi, there have been no gRPC NOT_FOUNDs from the CAS16:17
coldtomso i don't think it's missing blobs16:17
WSalmonthis is the right up https://gitlab.com/celduin/infrastructure/celduin-infra/-/issues/20816:17
WSalmonthe error message seems to imply that it could push the cas bits16:17
WSalmonbut then failed to push to the index16:18
coldtomthe behaviour is "not cached" -> build -> push -> "already cached"16:18
coldtominspecting the artifact store shows that it is not cached16:18
juergbithere is a logic error in the buildstream artifact cache client I've noticed recently, which may be related16:18
juergbipushing the artifact proto  should actually replace the proto on the server, not return that it's already cached16:19
juergbias the artifact proto contents may be different (pointing to different CAS objects)16:19
juergbii.e., _push_artifact_proto() first calling get_artifact() seems incorrect to me16:20
juergbiit's not a big issue on combined index/CAS servers as there get_artifact() will return NOT_FOUND if a CAS blob is missing16:20
juergbibut with a standalone index server, the artifact proto might remain forever with dead references to CAS objects16:21
juergbiI can't be sure that artifact protos with dead references are the issue in your tests but it could be16:22
WSalmonthat would make sence if the pull reported index but not cas, but the pull fails striaght away16:22
coldtomhmm, we shouldn't have had any blob expiry in the CAS yet16:22
coldtomand in that case, there should be a reference stored on the server, afaict the artifact is never stored16:23
juergbihm, so get_artifact() is successful despite the artifact proto not existing on the index server?16:24
juergbithat's pretty odd16:24
juergbicoldtom: you mention GetArtifact requests being logged. do you also have logs of the result (grpc status)?16:26
juergbii.e., is the server incorrectly returning OK or is BuildStream not detecting the error?16:27
coldtomjuergbi, unfortunately bst-artifact-server doesn't log the responses, and there's no way to tell which GetArtifact request is the offending one from the nginx logs16:29
juergbiok. I would expect at least one test in our test suite to exercise GetArtifact for a not existing artifact16:30
WSalmoncoldtom, do you record the standard out of the artifact-index server?16:35
coldtomyup, but it only logs when requests come in so doesn't know the response16:38
juergbiWSalmon, coldtom: I think we should head in this direction: https://gitlab.com/BuildStream/buildstream/commits/juerg/push16:40
juergbi(and at a higher level, completely skip artifact push for remotes where we've pulled the artifact in the same session)16:41
juergbiI still can't explain the odd behavior you're seeing but it might still help16:42
WSalmon[--:--:--][f46ae283][    push:freedesktop-sdk.bst:bootstrap/file.bst] INFO    Remote (https://push.public.aws.celduin.co.uk) already has artifact f46ae283 cached16:45
WSalmon[--:--:--][f46ae283][    push:freedesktop-sdk.bst:bootstrap/file.bst] STATUS  Pushing artifact f46ae283 -> http://localhost:1010116:45
WSalmon[--:--:--][f46ae283][    push:freedesktop-sdk.bst:bootstrap/file.bst] INFO    Pushed artifact f46ae283 -> http://localhost:1010116:45
WSalmon[00:00:00][f46ae283][    push:freedesktop-sdk.bst:bootstrap/file.bst] SUCCESS freedesktop-sdk/bootstrap-file/f46ae283-push.2372052.log16:45
WSalmonso this is annoying, (https://push.public.aws.celduin.co.uk) already has artifact f46ae283 cached is a lie16:45
WSalmonbut my local index only has just excepted the artifact16:46
*** tristan has joined #buildstream16:46
*** ChanServ sets mode: +o tristan16:46
* WSalmon looks suspiciously at coldtom's deployment16:47
*** samwilson has quit IRC16:52
WSalmon[--:--:--][3a490e29][    push:freedesktop-sdk.bst:bootstrap/binutils.bst] STATUS  Pushing data from artifact 3a490e29 -> https://push.public.aws.celduin.co.uk16:52
WSalmon[--:--:--][3a490e29][    push:freedesktop-sdk.bst:bootstrap/binutils.bst] INFO    Pushed data from artifact 3a490e29 -> https://push.public.aws.celduin.co.uk16:52
WSalmon[--:--:--][3a490e29][    push:freedesktop-sdk.bst:bootstrap/binutils.bst] STATUS  Pushing artifact 3a490e29 -> https://push.public.aws.celduin.co.uk16:52
WSalmon[--:--:--][3a490e29][    push:freedesktop-sdk.bst:bootstrap/binutils.bst] INFO    Remote (https://push.public.aws.celduin.co.uk) already has artifact 3a490e29 cached16:52
WSalmon[--:--:--][3a490e29][    push:freedesktop-sdk.bst:bootstrap/binutils.bst] STATUS  Pushing artifact 3a490e29 -> http://localhost:1010116:52
WSalmon[--:--:--][3a490e29][    push:freedesktop-sdk.bst:bootstrap/binutils.bst] INFO    Pushed artifact 3a490e29 -> http://localhost:1010116:52
WSalmon[00:00:00][3a490e29][    push:freedesktop-sdk.bst:bootstrap/binutils.bst] SUCCESS freedesktop-sdk/bootstrap-binutils/3a490e29-push.2386028.log16:52
WSalmonsame again16:52
juergbiWSalmon: and you're really certain that the artifact doesn't exist on the server?16:54
WSalmon[--:--:--][3a490e29][    pull:freedesktop-sdk.bst:bootstrap/binutils.bst] INFO    Remote (https://push.public.aws.celduin.co.uk) does not have artifact 3a490e29 cached16:57
WSalmon[--:--:--][3a490e29][    pull:freedesktop-sdk.bst:bootstrap/binutils.bst] STATUS  Pulling artifact 3a490e29 <- http://localhost:1010116:57
WSalmon[--:--:--][3a490e29][    pull:freedesktop-sdk.bst:bootstrap/binutils.bst] INFO    Remote (http://localhost:10101) does not have artifact 3a490e29 cached16:57
WSalmon[00:00:00][3a490e29][    pull:freedesktop-sdk.bst:bootstrap/binutils.bst] SKIPPED Pull16:57
WSalmonone or other is failing and as top showed for that other one i dosent look like its there16:58
juergbiWSalmon: and there is no chance it has been pushed by another buildstream client in the meantime?16:59
WSalmonno16:59
WSalmonwe get this a lot16:59
WSalmonits not the odd thing17:00
juergbiah, wait, that's the localhost:1010117:00
juergbii.e., not the same server17:00
juergbiit successfully pushes to localhost:1010117:00
WSalmonyes17:00
WSalmonthats what i mean17:00
WSalmon* WSalmon looks suspiciously at coldtom's deployment17:00
WSalmonim gona delete the artifact and try again but i need to go sort something els out now17:01
juergbii.e., verify the deployed bst-artifact-server is actually up-to-date?17:01
juergbicoldtom: is it possible that not both of the two bst-artifact-server instances (public pull-only and push) are launched with --index-only17:03
*** persia has joined #buildstream17:10
*** adds68 has joined #buildstream17:11
persiaHi everyone.  Was just chatting with adds68, who pointed out that https://buildstream.build/ was a bit outdated.  Who covers https://gitlab.com/BuildStream/website these days?17:15
WSalmonjuergbi,  i think there the same instance but with diffrent proxy settings but i could be wrong17:36
WSalmonjuergbi, so when i deleted bootstrap/file.bst from my local cache it could pull the proto from  http://localhost:10101 and the elemnets from the cas of  https://push.public.aws.celduin.co.uk even tho `Remote (https://public.aws.celduin.co.uk) does not have artifact f46ae283 cached`17:42
juergbiWSalmon: could the issue be nginx caching grpc requests?17:43
juergbii.e., the responses are not up-to-date?17:43
WSalmonso i think its coldtom's instance's index server or nginx that is the issue....17:43
WSalmonjuergbi, maybe or the grpc forwarding, but it seems to have excepted some elements and not be excepting others..17:45
WSalmoni would expect the nginx to ether work or not, but it can be effected by size of artifact but the protos should be pretty small17:46
juergbiif it's a stale cache issue it might depend on timing, i.e., it may seem random17:46
WSalmonshould be pretty easy for coldtom to rule anything like that out in the morning17:47
WSalmonthanks juergbi for all you insight17:47
WSalmonjuergbi, once we get to the bottom of this i think we need to improve the error reporting in bst tho..17:52
*** santi has quit IRC17:58
*** tomaz has quit IRC18:08

Generated by irclog2html.py 2.15.3 by Marius Gedminas - find it at mg.pov.lt!