*** tristan has quit IRC | 04:44 | |
*** tristan has joined #buildstream | 05:05 | |
*** ChanServ sets mode: +o tristan | 05:05 | |
tristan | So in remote configurations, if you need to only pull, you might have an insecure cache, or you might require a cert... and for pulling purposes, you would need a "server-cert" | 08:59 |
---|---|---|
tristan | I am *presuming* this is the public portion of the key | 08:59 |
tristan | juergbi, Does it make sense for projects to store a "server-cert" in a keys/ subdirectory when configuring a pull remote ? | 09:00 |
tristan | At the same as I go through this config as per https://lists.apache.org/thread.html/rf2da9830e2fa918357f99a6021e55fc43df876f0b19d43f68802f083%40%3Cdev.buildstream.apache.org%3E ... I wonder if it make sense to change the nomenclature of these parameters | 09:01 |
tristan | It may make sense to the BuildStream hacker to speak of these things in terms of "client" and "server" certs and keys, but I'm not convinced it makes sense to the user | 09:02 |
tristan | To the user (I think) it makes more sense to think in terms of "The certs and keys I need to push" and "The certs I need to pull" | 09:02 |
tristan | in both cases, there is a client and a server, I don't think it makes sense to have to guess what each one means | 09:03 |
coldtom | imo it makes sense as is - the "server-cert" is the public key of the server (needed for https if the server cert is not from a trusted CA on your machine, e.g. self-signed) and the client cert/key are required if you have mutual TLS as authentication on the server | 09:10 |
tristan | coldtom, right, that makes sense if you happen to have some intimate knowledge of how certs work | 09:11 |
tristan | the naming is built around that understanding, which I don't think is a reasonable requirement | 09:12 |
tristan | looking at it, it confuses me, and then I have to dig into thinking about relations of certs and what machine does what, when in fact, even though I am a client and I am talking to a server, I only need to know "This is the cert I need to download" and "This is the cert/key I need to upload" | 09:13 |
tristan | At this point, I'm not entirely clear on whether you might need the "server-cert" if you only mean to push to that remote (and never pull from it) | 09:18 |
coldtom | any time you want to talk to a server that is using a non-trusted certificate using HTTPS (for push or pull) it needs to be specified | 09:19 |
tristan | It if is possible to require the "server-cert" for push-only purposes as well, then indeed, it would not make sense | 09:19 |
tristan | Right, that is rather unfortunate :-S | 09:20 |
tristan | https://docs.buildstream.build/master/using_config.html#artifact-server <-- this deserves a lesson, possibly with some external links to help explain what this is all for. | 09:21 |
coldtom | for sure, public key stuff is very confusing in general | 09:22 |
tristan | In this case though, am I correct to assume that this "cert" portion is the public key ? I think so | 09:23 |
tristan | In which case that would at least answer my initial question: Yes, it makes sense to store this "server-cert" in a project directory | 09:23 |
coldtom | yeah, the "cert" is the public key | 09:23 |
coldtom | so yeah it does make sense to store it in the project | 09:23 |
juergbi | yes, the server-cert is the public certificate | 09:25 |
juergbi | client key/cert is an authentication mechanism. it might well make sense to support other authentication mechanisms in the future | 09:26 |
tristan | So when we use client-cert/client-key, I presume the services have a list of "client-cert"s which it has decided to trust | 09:26 |
juergbi | maybe creating an `auth` dict/group would make sense | 09:26 |
juergbi | exactly | 09:26 |
tristan | Which these services could later revoke | 09:26 |
tristan | the servers need not have the client-keys, but the clients need to carry those safely around out of band | 09:27 |
juergbi | yes | 09:27 |
tristan | Creating an "auth" YAML fragment would be helpful yeah | 09:27 |
tristan | This way we could document it in one place, while it applies to artifact caches (index/storage), source caches, and also remote execution services | 09:28 |
coldtom | +1 to `auth` yaml fragment | 09:28 |
tristan | This could also allow for extensibility (we could at some point add a "type" field to the "auth" dict) | 09:30 |
tristan | not sure if that makes sense, but anyway, could help extensibility wise | 09:30 |
*** tristan has quit IRC | 09:45 | |
*** santi has joined #buildstream | 09:45 | |
*** tristan has joined #buildstream | 10:15 | |
*** ChanServ sets mode: +o tristan | 10:15 | |
nanonyme | Hey, we're having a situation where https://github.com/apache/buildstream/blob/bst-1/buildstream/_artifactcache/cascache.py#L489 is returning a directory instead of a file which is breaking things like https://paste.debian.net/plain/1182185; any debugging tips? | 11:03 |
tristan | Hmmm, that looks sadly lacking in terms of API comments, hard to even tell what _reachable_refs_dir() is all about, or even update_tree_mtime() (although the latter appears more self explanatory) | 11:21 |
tristan | nanonyme, it looks like that path is a data root of some sort, presuming this is reproducible I would trace it with sys.stderr.write("...\n"), printing anything going into there, and possibly adding some assertions for invariants higher up in the call stack | 11:22 |
tristan | if you get a full stack trace of the culprit for providing that digest to resolve_ref(), we might find out why it got there | 11:23 |
nanonyme | tristan, created https://github.com/apache/buildstream/pull/1447 which would hopefully give better output whenever that happens. Not sure if it's worth merging though since it doesn't actually fix the issue | 11:25 |
tristan | So it's not reproducible ? | 11:26 |
tristan | I guess | 11:26 |
nanonyme | It's probably reproducible if we figure out the exact key | 11:27 |
nanonyme | I don't know. This happened once before and we don't know what fixed it, possibly full cache flush | 11:27 |
nanonyme | I would imagine there's something bonkers written inside the file but I was having a hard time understanding the code in https://github.com/apache/buildstream/blob/3f6a92f9f7d4c113569abdaec8998c898dcb27c2/buildstream/_protos/build/bazel/remote/execution/v2/remote_execution_pb2.py anyway | 11:28 |
tristan | I suppose we could try to figure that out, i.e. try to map these errors better to the keys for later debugging purposes | 11:29 |
nanonyme | But should we in that case return some actual error so you know "stop now and debug"? | 11:30 |
tristan | if say, we were to raise a specific error that we could catch when servicing the main request, and print the key with the error (or not the "key" I suppose but rather the highest level digest) | 11:30 |
nanonyme | Hmm | 11:30 |
tristan | Not sure | 11:30 |
nanonyme | We don't know if the digest is botched or not until we stop having access to the digest | 11:31 |
tristan | knowing the digest which is causing problems is only half the problem, ultimately it looks like we've got a screwy cache, and we'd want to know the steps to reproduce a scenario that causes a cache to get screwed up | 11:31 |
nanonyme | But we can probably re-read the digest if we have access to the key | 11:31 |
nanonyme | Right | 11:31 |
tristan | Or, either there is something wrong with the cache, or there is something wrong with the code accessing it, but the accessing code is rather thin | 11:32 |
nanonyme | We tried deploying bst-1.4 to check when the problem started happening. It's not a regression in 1.6 | 11:32 |
tristan | One idea/question, does this start happening after the first time the cache gets pruned ? | 11:33 |
nanonyme | But rather before | 11:33 |
nanonyme | I don't know TBH | 11:33 |
nanonyme | jjardon, valentind, can we figure out somehow which version of bst local CAS we have on other nodes than ppc64le? To try to somehow figure out timeframe of when these issues started happening | 11:33 |
*** santi has quit IRC | 12:36 | |
*** santi has joined #buildstream | 12:36 | |
*** tristan has quit IRC | 17:16 | |
*** tristan has joined #buildstream | 17:16 | |
*** ChanServ sets mode: +o tristan | 17:16 | |
nanonyme | tristan, so basically here https://github.com/apache/buildstream/blob/45442f963a19a67c9346b53f9f802dc17ed560f2/buildstream/_artifactcache/cascache.py#L392 digest.hash is most likely an empty string | 17:36 |
nanonyme | Which made no sense to me but that remote_execution_pb2 looked so confusing to me that I had difficulty trying to figure out just *when* that could happen | 17:37 |
nanonyme | Oh. Can that happen if this file: https://github.com/apache/buildstream/blob/45442f963a19a67c9346b53f9f802dc17ed560f2/buildstream/_artifactcache/cascache.py#L488 is empty? | 17:38 |
nanonyme | This is *exactly* what happens | 17:41 |
nanonyme | So there's an empty file in the object storage | 17:41 |
nanonyme | There's a speacial case that remote_execution_pb2 doesn't validate input if it's an empty bytestring | 17:42 |
nanonyme | Sorry, remote_execution_pb2.Digest | 17:42 |
nanonyme | juergbi, make sense to you? | 17:42 |
nanonyme | Okay, so if this happens again, we should go through this directory https://github.com/apache/buildstream/blob/45442f963a19a67c9346b53f9f802dc17ed560f2/buildstream/_artifactcache/cascache.py#L671 and find out if any file is empty | 17:43 |
nanonyme | Created https://github.com/apache/buildstream/issues/1448 to gather these findings | 18:09 |
nanonyme | tristan, https://github.com/apache/buildstream/blob/45442f963a19a67c9346b53f9f802dc17ed560f2/buildstream/utils.py#L498 FWIW this is my conclusion. I'm missing proper data to know which one it is | 18:18 |
nanonyme | Why is this function so complex? It first gets fd, then it closes it just to open it again insteadd of calling fdopen o.O | 18:33 |
nanonyme | Seems to completely mitigate the usefulness of using mkstemp | 18:34 |
nanonyme | ... how is this done in master branch | 18:35 |
nanonyme | More or less same way except reformatted by black | 18:36 |
*** toscalix has joined #buildstream | 19:01 | |
nanonyme | But this doesn't make any sense. While I think the code is suboptimal, it should work | 19:02 |
*** santi has quit IRC | 19:12 |
Generated by irclog2html.py 2.15.3 by Marius Gedminas - find it at mg.pov.lt!