This article describes a proposal for a new Nix remote store protocol I’ve been thinking about for a while. It also gives an update on recent improvements in go-nix, as well as an outlook on what’s next.
As already written in a previous article, the NAR format itself isn't a great wire format:
- It doesn’t provide an index, so if you only want to access one file inside a big NAR file, right now you still need to download the whole archive, then seek through it until you’re at the file you initially requested.
- `nix copy` has an option to write `.ls` files alongside the NAR files, but it feels very bolted on, and doesn't have any trust root.
- Even with `.ls` files and range requests, there's no way to know the content hash of the chunk you want to download, so you can't tell if you already have it available elsewhere locally.
While thinking about how substitution between a nix-casync binary cache and a local client should look, I quickly realized what I was really looking for was a generic remote store protocol, so all these improvements could be used in Tvix and other projects as well.
I wanted to extend the metadata about a store path (so the index/list of files is included in the metadata), and let each regular file refer to a list of content-addressed chunks, turning chunk substitution into a much more out-of-band mechanism.
I also disliked the fact that uploaded `.drv` files are somewhat treated the same way as store paths, with a `$drvHash.narinfo` file and a NAR file containing the literal ATerm contents.
My current proposal uses a Manifest structure on a Derivation (not individual store path) level.
I brainstormed various versions of this with a bunch of people (thanks, everyone!).
In its current form, the manifest structure contains the following data (some of them being optional, TBD):
- The derivation path
- The derivation content
- A list/map of outputs
Each output contains the following information:
- The name of the output
- A list of references to other store paths
- A listing of all the elements in the output:
  - Each regular file can contain a list of chunks.
  - Each chunk is identified by its hash. It also carries some metadata on the hashing algorithm used, and its size (so we can seek into files).
- (TBD, see further down) A list of narinfo-style signatures, the NarHash and NarSize.
Actual chunk substitution happens out-of-band.
This design has a bunch of advantages:
- Assuming there’s some sort of local chunk cache, individual chunks that are already available locally can be re-used.
- Store paths without any regular files inside (symlinks etc.) don't need any chunk downloads at all.
- Substitution doesn’t care about the exact chunking mechanism used. We can start with one chunk per file, and use different chunking mechanisms as we go.
- Because chunks are content-addressed, they can trivially be substituted from anywhere, not just the binary cache being asked. This allows zero-trust, gossip-style substitution from local network peers, IPFS, or CDNs that don't need to be trusted.
- As chunks are independent from each other, they can be requested in a much more parallel fashion, allowing a higher substitution throughput for high-latency networks, or networks with slow per-stream throughput in general.
Right now, the protocol doesn't specify any signature mechanism, except maybe storing the existing narinfo-style signatures. However, as verifying those requires assembling (and possibly substituting) the whole NAR, we might want to come up with a better signature scheme in the longer run, one that doesn't require substituting all chunks.
Another option would be to simply require a proper HTTPS connection to the backend serving the Manifests. As can be seen with Cachix, people seem to be fine delegating the signing part to their binary cache.
So far, we only talked about the structure used to store metadata, not about the actual methods used to query a remote binary cache. There’s a lot of potential for optimization on the query side; similar to git’s “smart protocol”, substituting clients (which evaluated locally) could signal which manifests they already have, and get back all missing Manifests in one request, which should dramatically reduce roundtrip time.
We plan to experiment here a bit as we go, as more of this gets implemented.
Since the last blogpost, NinjaTrappeur did some more analysis on nix-casync and other substitution mechanisms.
One of my conclusions from it was that even when just taking individual files of store paths, there's significant deduplication potential.
This means the naïve approach used in nix-casync, feeding the entire NAR file to the chunker, is a good start, but it probably doesn't properly “cut” at file boundaries. If we instead feed the individual files inside a store path to the chunker, we'd be at least as efficient as the per-file hashing approach discussed in the article, if not much better.
One main reason why I initially fed the whole NAR file to the chunker was that the current binary cache protocol serves NAR files, and go-nix didn't have a way to produce them.
This motivated me to contribute a NAR Writer to go-nix, and also increase test coverage of the writer and reader (and disallow some odd corner cases we previously missed).
Now we can read in a NAR file, decompose it, chunk and mix it in a blender, and afterwards put it back together, restoring the same NAR contents byte-by-byte.
This is useful for things like a local translation proxy, speaking the new protocol to a remote, but still exposing .nar/.narinfo to the local Nix Daemon, or for compat interfaces in general.
Apart from the NAR Writer and improved test coverage, the following features were added:
- A parser for `.drv` files, and functions to (re-)calculate derivation paths and output hashes, as well as a writer (and JSON [de]serialization)
- An interface of a “Derivation store”, which can hold a graph of Derivations.
- An implementation using badger for the underlying storage
- An implementation providing a view into a local path containing derivations (parsing `.drv` files from there)
- An implementation requesting derivations via HTTP (think of a directory of `.drv` files served via `python3 -m http.server`)
- A lot of fine-tuning w.r.t. number of allocations and benchmarking in general has been done to make things fast.
There are more things in the works, like an implementation of a Builder using OCI, and Nix stores using the above-mentioned protocol.
I’m also planning to slowly move some of the other concepts from nix-casync (store path substitution, the current Nix HTTP binary cache interface) into go-nix, once interfaces have settled a bit.
There's now also a Matrix channel that's used for communication. Feel free to join!