Please note: This isn't really a request for a feature to be implemented, it is just to get ideas and a discussion going. I don't know if people consider this part of their threat model, but it's something I mulled over, and I'm interested in hearing the ideas of other people who have written binary cache software. Attic looks great in any case!
I quite like Attic at a glance. I have developed several Nix binary caches in the past, including Eris and an unreleased "serverless" one running on WASM/JS function services, for which I had planned many of the same features as Attic. But I'd like to raise something I have mulled over a bit.
My serverless solution also has server-side signing, since it is relatively easy to compute the signature for a `.narinfo`, and it makes many bugs like nixos/nix#6960 irrelevant. It's nice. But I think it's important to note that server-side signing acts as a kind of oracle: anything uploaded to the binary cache is implicitly signed as if it was authored by you. This means that if anyone uploads anything invalid, garbage, or backdoored (e.g. from a compromised CI system), it can and will be shown as "authentic." There is also no secure provenance or identity attached to the original upload; it is not possible to prove after the fact where it came from.
For instance, if someone steals an authentication key from a (valid) person doing uploads, they can then upload anything they want with abandon, and it can never be tied back to them. For example, given the way I think the upload works from the description in #7, they can populate "correct" hashes with trojaned binaries: `deadbeef-firefox-100` is a valid hash that an honest user would compute, but the attacker uploads a trojaned binary under this hash. Now, if someone else tries to upload `deadbeef-firefox-100`, the `deadbeef.narinfo` file gets located, and therefore the new NAR itself is silently discarded. The cache then remains infected until it is purged.
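The poisoning scenario above can be modelled with a toy cache. This is a minimal sketch, assuming the cache deduplicates on the store path hash alone and keeps whichever upload arrives first, which is my reading of #7 and not something I have confirmed in the code. `ToyCache` and its methods are hypothetical names, not Attic's API.

```python
import hashlib


class ToyCache:
    """Hypothetical first-upload-wins cache keyed by store path hash."""

    def __init__(self):
        # store path hash -> sha256 hex digest of the NAR that was kept
        self.narinfos = {}

    def upload(self, store_hash: str, nar: bytes) -> bool:
        """Keep the NAR unless a narinfo already exists. Returns True if kept."""
        if store_hash in self.narinfos:
            # Assumed behaviour: an existing narinfo means the new NAR is dropped.
            return False
        self.narinfos[store_hash] = hashlib.sha256(nar).hexdigest()
        return True

    def nar_digest(self, store_hash: str) -> str:
        return self.narinfos[store_hash]


cache = ToyCache()
honest_nar = b"honest firefox build"
trojan_nar = b"trojaned firefox build"

# The attacker, holding a stolen upload token, gets there first.
stored_attacker = cache.upload("deadbeef", trojan_nar)
# The honest builder computes the same input hash, but its NAR is discarded.
stored_honest = cache.upload("deadbeef", honest_nar)

# The cache now serves the trojaned NAR under the "correct" hash.
poisoned = cache.nar_digest("deadbeef") == hashlib.sha256(trojan_nar).hexdigest()
```

Nothing in this model records who performed the first upload, which is the gap the rest of this issue is about.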
This is sort of a different take on the original problem signatures solved: current non-content-addressed store derivations take their hash from their inputs, not their output. Keys are used to authenticate that the output binary comes from a trusted source, because a malicious source could compute the same input hash but give you a trojaned binary under it. What you want is a kind of non-repudiation, so that when you get an upload, it cannot be denied where it came from. CA derivations partially address this because they're self-authenticating, so when you look up a hash, it can be verified immediately that the content is legitimate. But you still don't know who gave it to you.
One way around this is to sign uploads in a lock-step way. First, an agent requests to do an upload and establishes some identity that can be validated, e.g. "I am github CI runner on commit hash 0xDEADBEEF running at 12pm UTC, with the given `$GITHUB_TOKEN`", and you check on the server via OAuth that the `$GITHUB_TOKEN` is legitimate. You then issue a new short-term ed25519 signing key in return, which can be used to sign uploads for a short time frame, say, 15 minutes. The agent then uploads all its derivations under this signing key within this time frame, and the server validates the signatures. After the 15 minutes, the key is marked as "permissible, but not usable for any further signatures". Then, when a narinfo is requested, the signature identifies where it came from and at what time. This signature can then be replaced by a new signature "on the fly", and this replaced signature is what is shown to the user, just like it works today. This design is similar to the way SLSA Level 3 guidelines operate, as they require cryptographically sure proof of provenance. This approach isolates the user-facing key from the key used to authenticate the builder itself, rather than relying purely on simple bearer token schemes (which I suspect is what is used now, though I admit I haven't read the code thoroughly yet...)
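The lock-step flow above can be sketched roughly as follows. This is only an illustration under stated assumptions: the identity string is taken to be already validated (the OAuth check is not modelled), and HMAC-SHA256 stands in for ed25519 purely so the sketch runs on the Python standard library. Because HMAC is symmetric, it does not give the non-repudiation a real ed25519 key pair held by the agent would. `Server`, `UploadKey`, and `sign` are hypothetical names.

```python
import hashlib
import hmac
import secrets
from dataclasses import dataclass

KEY_LIFETIME_SECONDS = 15 * 60  # the "say, 15 minutes" window


@dataclass
class UploadKey:
    key_id: str
    secret: bytes      # stand-in for an ed25519 private key
    identity: str      # e.g. the validated CI runner identity
    issued_at: float


def sign(secret: bytes, store_hash: str, nar: bytes) -> bytes:
    """Stand-in signature over the store hash and the NAR digest."""
    message = store_hash.encode() + b"\0" + hashlib.sha256(nar).digest()
    return hmac.new(secret, message, hashlib.sha256).digest()


class Server:
    def __init__(self):
        self.keys = {}        # key_id -> UploadKey
        self.provenance = {}  # store hash -> (key_id, upload time)

    def issue_key(self, identity: str, now: float) -> UploadKey:
        # Assumes `identity` was already checked, e.g. via the OAuth step.
        key = UploadKey(secrets.token_hex(8), secrets.token_bytes(32), identity, now)
        self.keys[key.key_id] = key
        return key

    def accept_upload(self, key_id, store_hash, nar, signature, now) -> bool:
        key = self.keys.get(key_id)
        if key is None or now - key.issued_at > KEY_LIFETIME_SECONDS:
            return False  # unknown key, or past its signing window
        if not hmac.compare_digest(sign(key.secret, store_hash, nar), signature):
            return False  # signature does not match the upload
        self.provenance[store_hash] = (key_id, now)
        return True

    def who_uploaded(self, store_hash: str) -> str:
        key_id, _ = self.provenance[store_hash]
        return self.keys[key_id].identity


server = Server()
key = server.issue_key("github-ci:commit=deadbeef", now=0.0)

nar = b"nar bytes"
in_window = server.accept_upload(
    key.key_id, "deadbeef", nar, sign(key.secret, "deadbeef", nar), now=60.0
)
too_late = server.accept_upload(
    key.key_id, "cafebabe", nar, sign(key.secret, "cafebabe", nar), now=16 * 60.0
)
```

Here `in_window` is accepted and `too_late` is rejected, and `server.who_uploaded("deadbeef")` returns the identity bound to the key. The expired key stays in `server.keys`, so earlier uploads remain attributable; re-signing with the user-facing key at narinfo request time is left out of the sketch.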
Another very good thing to note is that this allows real revocation: if it is ever discovered that a build was compromised for some reason, it's now possible to track this down to the individual keys assigned during the build step and revoke those keys behind the scenes, e.g. when a narinfo is requested that is associated with a revoked key, simply return a 404 instead.
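A minimal sketch of that revocation check, assuming the server keeps a record of which short-lived key each narinfo was uploaded under. `NarinfoIndex` and its methods are hypothetical names, and the integers are just HTTP status codes standing in for a real response.

```python
class NarinfoIndex:
    """Hypothetical index mapping each narinfo to the key it was uploaded with."""

    def __init__(self):
        self.uploaded_with = {}  # store path hash -> key id
        self.revoked = set()     # key ids revoked after a compromise

    def record(self, store_hash: str, key_id: str) -> None:
        self.uploaded_with[store_hash] = key_id

    def revoke(self, key_id: str) -> None:
        self.revoked.add(key_id)

    def narinfo_status(self, store_hash: str) -> int:
        """Return the HTTP status a narinfo request would get."""
        key_id = self.uploaded_with.get(store_hash)
        if key_id is None or key_id in self.revoked:
            # Unknown paths and paths from revoked keys look the same to clients.
            return 404
        return 200


index = NarinfoIndex()
index.record("deadbeef", "key-from-bad-build")
index.record("cafebabe", "key-from-good-build")

before = index.narinfo_status("deadbeef")
index.revoke("key-from-bad-build")
after = index.narinfo_status("deadbeef")
unaffected = index.narinfo_status("cafebabe")
```

Revoking one key only hides the paths uploaded under it, so a single compromised build can be withdrawn without purging the whole cache.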
In the world of CA derivations, this provenance is still useful for those reasons, though end-user signatures are no longer needed since the hashes allow self-authentication.
Anyway, I'd be interested to know your thoughts on something like this. It is complex to work out the details, but I think it is a significant step up over the current state of the art in cache security and helps operate closer to modern standards like SLSA. With a globally deduplicated cache it's also important, since any user can easily "poison the well" for all other users, so having some auditability for cases like this is nice. That is something I realized while thinking about multi-tenancy and deduplication myself.