After not having time for OceanSprint 2022/1, I managed to attend OceanSprint 2022/2!
This year, I decided to spend some time designing and writing code for the tvix-store protocol, a more efficient way to store and represent store paths.
It is what will be used for Tvix, both for {Evaluator,Builder} <-> Store interaction, as well as during substitution.
The design went through several iterations, but started initially during the first OceanSprint with nix-casync.
Like last time, this OceanSprint provided the right environment to get into focus mode for this kind of thing :-) It’s very easy to get help, both with coding and through productive feedback.
I’d like to especially thank Thu and brainrake for helping with the implementation, as well as edef for helping from further away :-)
Let me share what we came up with:
Tvix Store protocol
Nix stores some information about a store path in a local SQLite database, or in NARInfo files on a binary cache, plus a NAR file containing the store path contents.
As soon as a single byte in the store path changes, a whole different NAR file needs to be downloaded, causing a lot of traffic when updating a system.
The Tvix Store protocol represents store paths in a much more granular, content-addressed form 1.
See “Store Model” further below for a detailed explanation.
Components
The tvix-store protocol can be split into three services:
PathInfoService
The PathInfo service provides lookups from an output path hash to a PathInfo message.
PathInfo message
A PathInfo message contains a root node ({Directory,File,Symlink}Node), as a Nix store path can be one of these three types.
The root node’s name field is populated with the (base)name inside /nix/store, so xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx-pname-1.2.3.
The PathInfo message also stores references to other store paths, and some more NARInfo-specific metadata (signatures, narhash, narsize).
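To make the shape of this more concrete, here is a minimal sketch of a PathInfo lookup using Python dataclasses instead of the actual protobuf definitions. The field names and structure are illustrative assumptions based on the description above, not the real tvix-store schema:

```python
# Illustrative sketch of a PathInfo message and the PathInfoService lookup.
# These are NOT the real tvix-store protobuf definitions; field names are
# assumptions based on the prose description.
from dataclasses import dataclass, field

@dataclass
class FileNode:
    name: str          # (base)name inside /nix/store for the root node
    digest: bytes      # hash of the file contents (BlobService key)
    size: int
    executable: bool

@dataclass
class PathInfo:
    node: FileNode                                   # root node; could also be a directory or symlink
    references: list = field(default_factory=list)   # other store paths referenced
    signatures: list = field(default_factory=list)   # NARInfo-style metadata
    narhash: str = ""
    narsize: int = 0

# The PathInfoService is then conceptually a lookup from output path hash
# to PathInfo message:
store: dict[str, PathInfo] = {}
store["xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx"] = PathInfo(
    node=FileNode(name="xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx-pname-1.2.3",
                  digest=b"\x00" * 32, size=0, executable=False)
)
```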
BlobService
BlobService takes care of storing blobs of data, used to host regular file contents. It is content-addressed, using blake3 as a hashing function.
Due to its content-addressed nature, we only store files once. Uploading the same file again is a no-op.
Blobs can be stored in more granular chunks, which are each content-addressed too. Combined with fast content-defined chunking, this allows splitting larger files into chunks that can be reused.
Thanks to blake3 being a tree hash, there’s an opportunity to do verified streaming of parts of a file, without needing to trust (or sign) any more information than the root hash itself.
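A toy version of the dedup behaviour described above can be sketched in a few lines. Note the stand-ins: sha256 from the stdlib takes the place of blake3, and fixed-size splitting takes the place of real content-defined chunking:

```python
# Toy content-addressed blob store: chunks are stored under their digest,
# so storing the same data twice is a no-op. sha256 stands in for blake3,
# and fixed-size splitting stands in for content-defined chunking.
import hashlib

chunks: dict[bytes, bytes] = {}  # chunk digest -> chunk data

def put_blob(data: bytes, chunk_size: int = 4) -> list[bytes]:
    """Split data into chunks, store each under its digest, return the digests."""
    digests = []
    for i in range(0, len(data), chunk_size):
        chunk = data[i:i + chunk_size]
        d = hashlib.sha256(chunk).digest()
        chunks[d] = chunk  # overwriting an identical chunk changes nothing
        digests.append(d)
    return digests

first = put_blob(b"hello world!")
count = len(chunks)
second = put_blob(b"hello world!")  # uploading the same blob again adds nothing
assert first == second and len(chunks) == count
```

Shared chunks between two similar files would be stored only once, which is what saves traffic compared to re-downloading a whole NAR.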
DirectoryService
DirectoryService allows asking for Directory objects, and recursively for the Directory objects contained in them.
Store Model
Directory message
Directory messages use the blake3 hash of their canonical protobuf serialization as their identifier.
A Directory message contains three lists, directories, files and symlinks, holding DirectoryNode, FileNode and SymlinkNode messages respectively. They describe all the direct child elements contained in a directory.
All three message types have a name field, specifying the (base)name of the element, and for reproducibility reasons, the lists MUST be sorted by that name.
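The sorting requirement exists precisely because the identifier is a hash of a canonical serialization. A toy illustration (sha256 stands in for blake3, and the JSON encoding here is a made-up stand-in for the canonical protobuf serialization):

```python
# Toy content-addressing of Directory messages: the identifier is a hash of
# a canonical serialization, so entries must be sorted by name for the digest
# to be reproducible. sha256 and JSON stand in for blake3 and canonical
# protobuf serialization.
import hashlib
import json

def directory_digest(entries: dict[str, str]) -> bytes:
    """entries maps a child name to that child's digest (as a hex string)."""
    canonical = json.dumps(sorted(entries.items())).encode()
    return hashlib.sha256(canonical).digest()

# The same contents always yield the same identifier, regardless of the
# order entries were added in:
d1 = directory_digest({"a.txt": "digest-a", "b.txt": "digest-b"})
d2 = directory_digest({"b.txt": "digest-b", "a.txt": "digest-a"})
assert d1 == d2
```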
In addition to the name field, the various *Node messages have the following fields:
DirectoryNode
A DirectoryNode message represents a child directory.
It has a digest field, which points to the identifier of another Directory message, making a Directory a merkle tree (or strictly speaking, a graph, as two elements pointing to a child directory with the same contents would point to the same Directory message).
There’s also a size field, containing the (total) number of all child elements in the referenced Directory, which helps with inode calculation.
FileNode
A FileNode message represents a child (regular) file.
Its digest field contains the blake3 hash of the file contents. It can be looked up in the BlobService.
The size field contains the size of the blob the digest field refers to.
The executable field specifies whether the file should be marked as executable or not.
SymlinkNode
A SymlinkNode message represents a child symlink.
In addition to the name field, the only other field is target, a string containing the target of the symlink.
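Putting the node types together, a client can walk the whole merkle structure by recursively fetching Directory messages by digest. A minimal sketch, where `get_directory` is a hypothetical stand-in for the DirectoryService lookup and the data model is simplified to plain dicts:

```python
# Sketch of walking the merkle structure: starting from a root digest,
# recursively resolve Directory messages via a (hypothetical) lookup.
# A "Directory" here maps child names to ("dir", subdigest) or
# ("file", blob_digest) entries -- a simplification of the real messages.
directories = {
    b"root": {"bin": ("dir", b"bin"), "README": ("file", b"blob1")},
    b"bin":  {"tool": ("file", b"blob2")},
}

def get_directory(digest: bytes) -> dict:
    """Stand-in for the DirectoryService get-by-digest call."""
    return directories[digest]

def walk(digest: bytes, prefix: str = "") -> list[str]:
    """Return all file paths reachable from a root Directory digest."""
    paths = []
    for name, (kind, child) in sorted(get_directory(digest).items()):
        if kind == "dir":
            paths.extend(walk(child, prefix + name + "/"))
        else:
            paths.append(prefix + name)
    return paths

assert walk(b"root") == ["README", "bin/tool"]
```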
Trust model / Distribution
As already described above, the only non-content-addressed service is the PathInfo service.
This means all other messages (such as Blob and Directory messages) can be substituted from other sources/mirrors, which will make plugging in additional substitution strategies, like IPFS or local network neighbors, super simple.
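What makes untrusted mirrors safe is that the client can always verify what it received against the digest it asked for. A sketch of that idea, with sha256 standing in for blake3 and plain dicts standing in for remote sources:

```python
# Sketch of digest-verified substitution from multiple untrusted sources.
# Because blobs are content-addressed, any source can be tried; a response
# that doesn't hash to the requested digest is simply ignored.
# sha256 stands in for blake3; the "sources" are plain dicts.
import hashlib

def fetch_blob(digest: bytes, sources: list[dict]) -> bytes:
    for source in sources:
        data = source.get(digest)
        if data is not None and hashlib.sha256(data).digest() == digest:
            return data  # verified: content matches the digest we asked for
    raise KeyError("blob not found in any source")

blob = b"some file contents"
digest = hashlib.sha256(blob).digest()
local, mirror = {}, {digest: blob}
assert fetch_blob(digest, [local, mirror]) == blob
```

A tampered source is harmless: its response fails verification and the next source is consulted.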
We don’t specify an additional signature mechanism yet, as the only “real” client so far is Nix, which gets streamed the whole NAR file (and can use the NARInfo-based signatures for verification).
A future signature mechanism that signs only (parts of) the PathInfo message, which itself only points to content-addressed data, will enable verified partial access into a store path, opening up opportunities for lazy filesystem access, which is very useful in remote builder scenarios.
How to use this / Outlook
Parts of this can already be used from within Nix, without having to wait for Tvix to be fully functional.
We spent a bunch of time on nar-bridge, which provides a Nix HTTP Binary Cache interface for a tvix-store. This means you can point Nix to a link-local HTTP Binary Cache URL, which will respond to .narinfo and nar/….nar HTTP requests, but use the more efficient tvix-store protocol under the hood.
It works in both directions, so it can be used not only to substitute from tvix-stores, but also to upload things into a tvix-store.
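Hypothetically, pointing Nix at a running nar-bridge instance could then look like an ordinary binary cache entry in nix.conf (the URL and port here are made-up placeholders, not something nar-bridge actually defaults to):

```
# nix.conf: treat a locally running nar-bridge as an additional binary cache
extra-substituters = http://localhost:9000
```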
As of today (2022-12-07), most of the code has been sent for review to the TVL Gerrit, which is where Tvix development happens.
In addition to getting these changes reviewed and integrated with the rest of Tvix, I also want to write some “Store Combinators” to expose different tvix-store implementations via one interface, as well as something exposing an existing Nix-style binary cache via the tvix-store protocol, so I can have faster downloads from the cache on my laptop.
Getting involved
Tvix and TVL are always welcoming new contributors. Check out the main website on how to participate and get in touch!
1. This refers to how store path contents themselves are represented internally, not how they are “mounted/named” in /nix/store. That can be either content-addressed or not; it’s another layer. ↩︎