IRC logs for #buildstream for Friday, 2021-01-22

*** tristan has quit IRC04:44
*** tristan has joined #buildstream05:05
*** ChanServ sets mode: +o tristan05:05
tristanSo in remote configurations, if you need to only pull, you might have an insecure cache, or you might require a cert... and for pulling purposes, you would need a "server-cert"08:59
tristanI am *presuming* this is the public portion of the key08:59
tristanjuergbi, Does it make sense for projects to store a "server-cert" in a keys/ subdirectory when configuring a pull remote ?09:00
tristanAt the same as I go through this config as per https://lists.apache.org/thread.html/rf2da9830e2fa918357f99a6021e55fc43df876f0b19d43f68802f083%40%3Cdev.buildstream.apache.org%3E ... I wonder if it make sense to change the nomenclature of these parameters09:01
tristanIt may make sense to the BuildStream hacker to speak of these things in terms of "client" and "server" certs and keys, but I'm not convinced it makes sense to the user09:02
tristanTo the user (I think) it makes more sense to think in terms of "The certs and keys I need to push" and "The certs I need to pull"09:02
tristanin both cases, there is a client and a server, I don't think it makes sense to have to guess what each one means09:03
coldtomimo it makes sense as is - the "server-cert" is the public key of the server (needed for https if the server cert is not from a trusted CA on your machine, e.g. self-signed) and the client cert/key are required if you have mutual TLS as authentication on the server09:10
tristancoldtom, right, that makes sense if you happen to have some intimate knowledge of how certs work09:11
tristanthe naming is built around that understanding, which I don't think is a reasonable requirement09:12
tristanlooking at it, it confuses me, and then I have to dig into thinking about relations of certs and what machine does what, when in fact, even though I am a client and I am talking to a server, I only need to know "This is the cert I need to download" and "This is the cert/key I need to upload"09:13
tristanAt this point, I'm not entirely clear on whether you might need the "server-cert" if you only mean to push to that remote (and never pull from it)09:18
coldtomany time you want to talk to a server that is using a non-trusted certificate using HTTPS (for push or pull) it needs to be specified09:19
tristanIt if is possible to require the "server-cert" for push-only purposes as well, then indeed, it would not make sense09:19
tristanRight, that is rather unfortunate :-S09:20
tristanhttps://docs.buildstream.build/master/using_config.html#artifact-server <-- this deserves a lesson, possibly with some external links to help explain what this is all for.09:21
coldtomfor sure, public key stuff is very confusing in general09:22
tristanIn this case though, am I correct to assume that this "cert" portion is the public key ? I think so09:23
tristanIn which case that would at least answer my initial question: Yes, it makes sense to store this "server-cert" in a project directory09:23
coldtomyeah, the "cert" is the public key09:23
coldtomso yeah it does make sense to store it in the project09:23
juergbiyes, the server-cert is the public certificate09:25
juergbiclient key/cert is an authentication mechanism. it might well make sense to support other authentication mechanisms in the future09:26
tristanSo when we use client-cert/client-key, I presume the services have a list of "client-cert"s which it has decided to trust09:26
juergbimaybe creating an `auth` dict/group would make sense09:26
juergbiexactly09:26
tristanWhich these services could later revoke09:26
tristanthe servers need not have the client-keys, but the clients need to carry those safely around out of band09:27
juergbiyes09:27
tristanCreating an "auth" YAML fragment would be helpful yeah09:27
tristanThis way we could document it in one place, while it applies to artifact caches (index/storage), source caches, and also remote execution services09:28
coldtom+1 to `auth` yaml fragment09:28
tristanThis could also allow for extensibility (we could at some point add a "type" field to the "auth" dict)09:30
tristannot sure if that makes sense, but anyway, could help extensibility wise09:30
*** tristan has quit IRC09:45
*** santi has joined #buildstream09:45
*** tristan has joined #buildstream10:15
*** ChanServ sets mode: +o tristan10:15
nanonymeHey, we're having a situation where https://github.com/apache/buildstream/blob/bst-1/buildstream/_artifactcache/cascache.py#L489 is returning a directory instead of a file which is breaking things like https://paste.debian.net/plain/1182185; any debugging tips?11:03
tristanHmmm, that looks sadly lacking in terms of API comments, hard to even tell what _reachable_refs_dir() is all about, or even update_tree_mtime() (although the latter appears more self explanatory)11:21
tristannanonyme, it looks like that path is a data root of some sort, presuming this is reproducible I would trace it with sys.stderr.write("...\n"), printing anything going into there, and possibly adding some assertions for invariants higher up in the call stack11:22
tristanif you get a full stack trace of the culprit for providing that digest to resolve_ref(), we might find out why it got there11:23
nanonymetristan, created https://github.com/apache/buildstream/pull/1447 which would hopefully give better output whenever that happens. Not sure if it's worth merging though since it doesn't actually fix the issue11:25
tristanSo it's not reproducible ?11:26
tristanI guess11:26
nanonymeIt's probably reproducible if we figure out the exact key11:27
nanonymeI don't know. This happened once before and we don't know what fixed it, possibly full cache flush11:27
nanonymeI would imagine there's something bonkers written inside the file but I was having a hard time understanding the code in https://github.com/apache/buildstream/blob/3f6a92f9f7d4c113569abdaec8998c898dcb27c2/buildstream/_protos/build/bazel/remote/execution/v2/remote_execution_pb2.py anyway11:28
tristanI suppose we could try to figure that out, i.e. try to map these errors better to the keys for later debugging purposes11:29
nanonymeBut should we in that case return some actual error so you know "stop now and debug"?11:30
tristanif say, we were to raise a specific error that we could catch when servicing the main request, and print the key with the error (or not the "key" I suppose but rather the highest level digest)11:30
nanonymeHmm11:30
tristanNot sure11:30
nanonymeWe don't know if the digest is botched or not until we stop having access to the digest11:31
tristanknowing the digest which is causing problems is only half the problem, ultimately it looks like we've got a screwy cache, and we'd want to know the steps to reproduce a scenario that causes a cache to get screwed up11:31
nanonymeBut we can probably re-read the digest if we have access to the key11:31
nanonymeRight11:31
tristanOr, either there is something wrong with the cache, or there is something wrong with the code accessing it, but the accessing code is rather thin11:32
nanonymeWe tried deploying bst-1.4 to check when the problem started happening. It's not a regression in 1.611:32
tristanOne idea/question, does this start happening after the first time the cache gets pruned ?11:33
nanonymeBut rather before11:33
nanonymeI don't know TBH11:33
nanonymejjardon, valentind, can we figure out somehow which version of bst local CAS we have on other nodes than ppc64le? To try to somehow figure out timeframe of when these issues started happening11:33
*** santi has quit IRC12:36
*** santi has joined #buildstream12:36
*** tristan has quit IRC17:16
*** tristan has joined #buildstream17:16
*** ChanServ sets mode: +o tristan17:16
nanonymetristan, so basically here https://github.com/apache/buildstream/blob/45442f963a19a67c9346b53f9f802dc17ed560f2/buildstream/_artifactcache/cascache.py#L392 digest.hash is most likely an empty string17:36
nanonymeWhich made no sense to me but that remote_execution_pb2 looked so confusing to me that I had difficulty trying to figure out just *when* that could happen17:37
nanonymeOh. Can that happen if this file: https://github.com/apache/buildstream/blob/45442f963a19a67c9346b53f9f802dc17ed560f2/buildstream/_artifactcache/cascache.py#L488 is empty?17:38
nanonymeThis is *exactly* what happens17:41
nanonymeSo there's an empty file in the object storage17:41
nanonymeThere's a speacial case that remote_execution_pb2 doesn't validate input if it's an empty bytestring17:42
nanonymeSorry, remote_execution_pb2.Digest17:42
nanonymejuergbi, make sense to you?17:42
nanonymeOkay, so if this happens again, we should go through this directory https://github.com/apache/buildstream/blob/45442f963a19a67c9346b53f9f802dc17ed560f2/buildstream/_artifactcache/cascache.py#L671 and find out if any file is empty17:43
nanonymeCreated https://github.com/apache/buildstream/issues/1448 to gather these findings18:09
nanonymetristan, https://github.com/apache/buildstream/blob/45442f963a19a67c9346b53f9f802dc17ed560f2/buildstream/utils.py#L498 FWIW this is my conclusion. I'm missing proper data to know which one it is18:18
nanonymeWhy is this function so complex? It first gets fd, then it closes it just to open it again insteadd of calling fdopen o.O18:33
nanonymeSeems to completely mitigate the usefulness of using mkstemp18:34
nanonyme... how is this done in master branch18:35
nanonymeMore or less same way except reformatted by black18:36
*** toscalix has joined #buildstream19:01
nanonymeBut this doesn't make any sense. While I think the code is suboptimal, it should work19:02
*** santi has quit IRC19:12

Generated by irclog2html.py 2.15.3 by Marius Gedminas - find it at mg.pov.lt!