NixOS Oceansprint 2022/2

After not having time for Oceansprint 2022/1, I managed to attend Oceansprint 2022/2!

This year, I decided to spend some time designing and writing code for the tvix-store protocol, a more efficient way to store and represent store paths.

It is what will be used for Tvix, both for {Evaluator,Builder} <-> Store interaction, as well as during substitution.

The current design went through several iterations; it started during the first Oceansprint with nix-casync.

As last time, this Oceansprint provided the right environment to get into focus mode for this kind of thing :-) It’s very easy to get help, both with coding and in the form of productive feedback.

I’d like to especially thank Thu and brainrake for helping with the implementation, as well as edef for helping from further away :-)

Let me share what we came up with:

Tvix Store protocol

Nix stores metadata about a store path in a local SQLite database or, on a binary cache, in a NARInfo file, alongside a NAR file containing the store path contents.

As soon as a single byte in the store path changes, a whole different NAR file needs to be downloaded, causing a lot of traffic when updating a system.

The Tvix Store protocol avoids this by representing store paths in a much more granular, content-addressed form [1].

See “Store Model” further below for a detailed explanation.

Components

The tvix-store protocol can be split into three services:

PathInfoService

The PathInfo service provides lookups from an output path hash to a PathInfo message.

PathInfo message

A PathInfo message contains a root node, which is one of {Directory,File,Symlink}Node, as a Nix store path can be any of these three types.

The root node’s name field is populated with the (base)name inside /nix/store, so xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx-pname-1.2.3.

The PathInfo message also stores references to other store paths, and some more NARInfo-specific metadata (signatures, narhash, narsize).
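
To make the lookup concrete, here is a minimal Python sketch of an in-memory PathInfo lookup. It is only an illustration: the names RootNode, PathInfo, InMemoryPathInfoService and output_hash are hypothetical, and keying by the 32-character hash part of the store path name is an assumption, not the actual tvix-store API.

```python
from dataclasses import dataclass, field


@dataclass
class RootNode:
    kind: str  # "directory", "file" or "symlink"
    name: str  # (base)name inside /nix/store


@dataclass
class PathInfo:
    node: RootNode
    references: list = field(default_factory=list)  # other store paths
    nar_size: int = 0
    nar_hash: bytes = b""
    signatures: list = field(default_factory=list)


def output_hash(store_path: str) -> str:
    """Return the hash part of a store path (or of its basename)."""
    name = store_path.rstrip("/").rsplit("/", 1)[-1]
    hash_part, sep, _rest = name.partition("-")
    if not sep or len(hash_part) != 32:
        raise ValueError(f"not a store path name: {name!r}")
    return hash_part


class InMemoryPathInfoService:
    """Hypothetical stand-in for a PathInfo service, backed by a dict."""

    def __init__(self):
        self._by_hash = {}

    def put(self, info: PathInfo) -> None:
        self._by_hash[output_hash(info.node.name)] = info

    def get(self, store_path: str):
        """Return the PathInfo for a store path, or None if it is unknown."""
        return self._by_hash.get(output_hash(store_path))
```

Everything the PathInfo points to is content-addressed, so this lookup table is the only part a client has to trust.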

BlobService

BlobService takes care of storing blobs of data, used to host regular file contents.

It is content-addressed, using blake3 as a hashing function.

Due to its content-addressed nature, we only store files once. Uploading the same file again is a no-op.
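
The deduplication falls out of the addressing scheme, which a small Python sketch can show. The class InMemoryBlobStore and the helper digest_of are hypothetical names, and sha256 is used purely as a stand-in because blake3 is not part of Python’s standard library.

```python
import hashlib


def digest_of(data: bytes) -> bytes:
    # Stand-in only: the protocol uses blake3, not sha256.
    return hashlib.sha256(data).digest()


class InMemoryBlobStore:
    """Hypothetical content-addressed blob store backed by a dict."""

    def __init__(self):
        self._blobs = {}

    def put(self, data: bytes) -> bytes:
        """Store a blob and return its digest; a repeated upload changes nothing."""
        digest = digest_of(data)
        self._blobs.setdefault(digest, data)
        return digest

    def get(self, digest: bytes) -> bytes:
        return self._blobs[digest]

    def __len__(self) -> int:
        return len(self._blobs)


store = InMemoryBlobStore()
first = store.put(b"hello world")
second = store.put(b"hello world")  # same contents, same key
```

Both calls return the same digest and the store still holds a single blob, because the key is derived from the contents alone.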

Blobs can be stored in more granular chunks, each of which is content-addressed too. Combined with fast content-defined chunking, this allows splitting larger files into chunks that can be reused.
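
As a rough illustration of the idea, here is a toy content-defined chunker in Python. It is not the algorithm tvix-store uses: the shift-and-add rolling value, the function names chunk and chunk_digests, and the mask and size parameters are all made up for this sketch, and sha256 again stands in for blake3.

```python
import hashlib


def chunk(data: bytes, mask: int = 0x3F, min_size: int = 16, max_size: int = 256) -> list:
    """Cut after any byte where the rolling value matches the boundary pattern."""
    chunks = []
    start = 0
    rolling = 0
    for i, byte in enumerate(data):
        # The bits selected by the default mask depend only on the last six
        # bytes, so a cut point is decided by nearby content, not by the offset.
        rolling = ((rolling << 1) + byte) & 0xFFFFFFFF
        length = i - start + 1
        at_boundary = length >= min_size and (rolling & mask) == 0
        if at_boundary or length >= max_size:
            chunks.append(data[start:i + 1])
            start = i + 1
            rolling = 0
    if start < len(data):
        chunks.append(data[start:])  # trailing chunk, may be shorter than min_size
    return chunks


def chunk_digests(data: bytes) -> list:
    """Content-address every chunk (sha256 as a stand-in for blake3)."""
    return [hashlib.sha256(c).digest() for c in chunk(data)]
```

Because cut points follow the content, an edit near the start of a file tends to change only the chunks around it, and the remaining chunks keep their digests and can be reused.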

Thanks to blake3 being a tree hash, there’s an opportunity to do verified streaming of parts of the file, which doesn’t need to trust (or sign) any more information than the root hash itself.

DirectoryService

DirectoryService allows asking for Directory objects and, recursively, for the Directory objects they contain.

Store Model

Directory message

Directory messages use the blake3 hash of their canonical protobuf serialization as their identifier.

A Directory message contains three lists, directories, files and symlinks, holding DirectoryNode, FileNode and SymlinkNode messages respectively. They describe all the direct child elements that are contained in a directory.

All three message types have a name field, specifying the (base)name of the element, and for reproducibility reasons, the lists MUST be sorted by that name.

In addition to the name field, the various *Node messages have the following fields:

DirectoryNode

A DirectoryNode message represents a child directory.

It has a digest field, which points to the identifier of another Directory message, making a Directory a merkle tree (or strictly speaking, a graph, as two elements pointing to a child directory with the same contents would point to the same Directory message).

There’s also a size field, containing the (total) number of all child elements in the referenced Directory, which helps for inode calculation.

FileNode

A FileNode message represents a child (regular) file.

Its digest field contains the blake3 hash of the file contents. It can be looked up in the BlobService.

The size field contains the size of the blob the digest field refers to.

The executable field specifies whether the file should be marked as executable or not.

SymlinkNode

A SymlinkNode message represents a child symlink.

Besides the name field, its only other field is target, a string containing the target of the symlink.
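
The store model described above can be sketched in Python as follows. This is an illustration under stated assumptions, not the real message definitions: a compact JSON encoding stands in for the canonical protobuf serialization, sha256 stands in for blake3, the size method encodes one reading of “total number of child elements” (direct children plus the sizes recorded on child directories), and the helper digest_of is hypothetical.

```python
import hashlib
import json
from dataclasses import dataclass, field


def digest_of(data: bytes) -> bytes:
    # Stand-in only: the protocol uses blake3, not sha256.
    return hashlib.sha256(data).digest()


@dataclass(frozen=True)
class DirectoryNode:
    name: str
    digest: bytes  # identifier of the referenced Directory
    size: int      # number of elements in the referenced Directory


@dataclass(frozen=True)
class FileNode:
    name: str
    digest: bytes  # hash of the file contents, looked up in the BlobService
    size: int      # size of the blob in bytes
    executable: bool = False


@dataclass(frozen=True)
class SymlinkNode:
    name: str
    target: str


@dataclass
class Directory:
    directories: list = field(default_factory=list)
    files: list = field(default_factory=list)
    symlinks: list = field(default_factory=list)

    def validate(self) -> None:
        """Each list MUST be sorted by name, so equal contents hash equally."""
        for nodes in (self.directories, self.files, self.symlinks):
            names = [node.name for node in nodes]
            if names != sorted(names):
                raise ValueError("entries must be sorted by name")

    def size(self) -> int:
        # Assumption: direct children plus everything below child directories.
        direct = len(self.directories) + len(self.files) + len(self.symlinks)
        return direct + sum(d.size for d in self.directories)

    def digest(self) -> bytes:
        self.validate()
        doc = {
            "directories": [[d.name, d.digest.hex(), d.size] for d in self.directories],
            "files": [[f.name, f.digest.hex(), f.size, f.executable] for f in self.files],
            "symlinks": [[s.name, s.target] for s in self.symlinks],
        }
        # Stand-in for the canonical protobuf serialization.
        encoded = json.dumps(doc, sort_keys=True, separators=(",", ":")).encode()
        return digest_of(encoded)


blob = b"hello"
leaf = Directory(files=[FileNode("a.txt", digest_of(blob), len(blob))])
root = Directory(directories=[
    DirectoryNode("x", leaf.digest(), leaf.size()),
    DirectoryNode("y", leaf.digest(), leaf.size()),
])
```

Here x and y have identical contents, so both nodes carry the same digest and refer to a single Directory message, which is why the structure is a graph rather than a strict tree.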

Trust model / Distribution

As already described above, the only non-content-addressed service is the PathInfo service.

This means all other messages (such as Blob and Directory messages) can be substituted from other sources/mirrors, which will make plugging in additional substitution strategies, such as IPFS or local network neighbors, super simple.
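
A short Python sketch shows why untrusted sources are fine for content-addressed data: the client re-hashes whatever it receives and only accepts a match. The function fetch_verified, the Source type and the example sources are hypothetical, and sha256 stands in for blake3.

```python
import hashlib
from typing import Callable, Iterable, Optional

# A source maps a digest to bytes, or returns None if it has nothing.
Source = Callable[[bytes], Optional[bytes]]


def digest_of(data: bytes) -> bytes:
    # Stand-in only: the protocol uses blake3, not sha256.
    return hashlib.sha256(data).digest()


def fetch_verified(digest: bytes, sources: Iterable[Source]) -> bytes:
    """Try each source in order and return the first answer matching the digest."""
    for source in sources:
        data = source(digest)
        if data is not None and digest_of(data) == digest:
            return data
    raise KeyError(f"no source returned valid data for {digest.hex()}")


wanted = digest_of(b"file contents")


def lying_mirror(_digest: bytes) -> Optional[bytes]:
    return b"tampered"  # rejected, because its hash does not match


def empty_mirror(_digest: bytes) -> Optional[bytes]:
    return None  # skipped


honest_mirror = {wanted: b"file contents"}.get

result = fetch_verified(wanted, [lying_mirror, empty_mirror, honest_mirror])
```

A mirror that returns the wrong bytes is simply skipped, so adding more sources never widens the set of parties that have to be trusted.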

So far, we don’t specify an additional signature mechanism, as the only “real” client is Nix, which gets streamed the whole NAR file (and it can use the NARInfo-based signatures for verification).

A future signature mechanism that signs only (parts of) the PathInfo message, which in turn only points to content-addressed data, will enable verified partial access into a store path. This opens up opportunities for lazy filesystem access, which is very useful in remote builder scenarios.

How to use this / Outlook

Parts of this can already be used from within Nix, without having to wait for Tvix to be fully functional.

We spent a bunch of time on nar-bridge, which provides a Nix HTTP Binary Cache interface for a tvix-store. This means you can point Nix to a link-local HTTP Binary Cache URL, which will respond to .narinfo and nar/….nar HTTP requests, but use the more efficient tvix-store protocol under the hood.

It works in both directions, so it can be used not only to substitute from tvix-stores, but also to upload things into a tvix-store.

As of today (2022-12-07), most of the code has been sent for review to the TVL Gerrit, which is where the Tvix development happens.

In addition to getting these changes reviewed and integrated with the rest of Tvix, I also want to write some “Store Combinators” to expose different tvix-store implementations via one interface, as well as something exposing an existing Nix-style binary cache via the tvix-store protocol, so I can have faster downloads from the cache on my laptop.

Getting involved

Tvix and TVL are always welcoming new contributors. Check out the main website on how to participate and get in touch!


  1. This refers to how store path contents themselves are represented internally, not how they are “mounted/named” in /nix/store. That can be either content-addressed or not; it’s another layer.