Using Tvix Store to Reduce Nix Storage Costs by 90%

Victor Fuentes

For the past few years, Replit has been using Nix to serve packages/dependencies and provide consistent development environments to our users. Nix on Replit allows users to have access to a large number of packages and libraries that can be seamlessly integrated into a user's project.

Motivation

To serve thousands of packages to our users quickly, we attach a large Nix store persistent disk to all development containers serving Replit apps. This approach worked well for a while, but the persistent disk grows with every NixOS release (eventually reaching a size of 20 TB). When considering ways to reduce the size of this disk, one important constraint was that we never remove store paths from the cache. Projects that depend on packages in older Nix channel releases could still link to these store paths, so those store paths must remain to maintain backwards compatibility.

Tvix Store

Tvix is a new modular implementation of Nix made up of components that can be used individually. One of these components is the Tvix store. tvix-store is a Nix store implementation backed by tvix-castore. tvix-castore manages blobs (file contents) and directory info (file metadata such as names and permissions). tvix-store manages Nix path info metadata, effectively creating a mapping from Nix store paths to tvix-castore contents.
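To make the layering concrete, here is a minimal toy sketch of the idea: a content-addressed store for blobs and directories, plus a separate table mapping store paths to root directories. This is not the tvix-store API. The `CaStore` class, the `add_package` and `read_file` helpers, the store paths, and the use of SHA-256 as the digest are all assumptions made for illustration.

```python
import hashlib

def digest(data: bytes) -> str:
    # Stand-in hash (assumption): SHA-256 is used only because it ships with Python.
    return hashlib.sha256(data).hexdigest()

class CaStore:
    """Toy content-addressed store: blobs and directories are keyed by digest."""

    def __init__(self):
        self.blobs = {}        # digest -> file contents
        self.directories = {}  # digest -> {file name: (blob digest, executable flag)}

    def put_blob(self, data: bytes) -> str:
        d = digest(data)
        self.blobs[d] = data   # identical contents always map to the same key
        return d

    def put_directory(self, entries: dict) -> str:
        d = digest(repr(sorted(entries.items())).encode())
        self.directories[d] = entries
        return d

castore = CaStore()
path_infos = {}  # Nix store path -> digest of its root directory

def add_package(store_path: str, files: dict) -> None:
    # Hypothetical helper: store each file as a blob, then record the directory.
    entries = {name: (castore.put_blob(data), False) for name, data in files.items()}
    path_infos[store_path] = castore.put_directory(entries)

def read_file(store_path: str, name: str) -> bytes:
    # Resolve store path -> directory -> blob, the way a mount would on read.
    root = castore.directories[path_infos[store_path]]
    blob_digest, _executable = root[name]
    return castore.blobs[blob_digest]

# Two made-up package versions that ship the same LICENSE file.
add_package("/nix/store/aaaa-hello-1.0", {"LICENSE": b"MIT", "hello": b"v1"})
add_package("/nix/store/bbbb-hello-1.1", {"LICENSE": b"MIT", "hello": b"v2"})
print(len(path_infos), "store paths,", len(castore.blobs), "blobs")
```

Both store paths stay individually addressable, but the shared `LICENSE` contents are stored only once because the two directory entries point at the same digest.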

Tvix store model

tvix-castore is a content-addressed system, and larger blobs in the store are chunked. For our purposes, this means that when the files for multiple packages are stored in tvix-castore, any duplicate chunks are stored only once and referenced from multiple places. Because many of the files we store belong to different versions of the same package, content addressing massively reduces the amount of data we store.

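Content-defined chunking works by breaking files into smaller chunks and then hashing each chunk based on its contents. Chunk boundaries are chosen from the data itself rather than at fixed offsets, so a small edit to a file changes only the chunks near the edit.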
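The following toy sketch shows how content-defined chunking and a content-addressed chunk store combine to deduplicate two versions of a file. It is an illustration under stated assumptions, not the algorithm or parameters tvix-castore uses: the rolling "gear" hash, the tiny chunk sizes, the `chunk` and `store_chunks` helpers, and SHA-256 as the chunk digest are all chosen here to keep the example short and reproducible.

```python
import hashlib

# Deterministic 64-bit table for the rolling hash, derived from SHA-256
# so the demo gives the same chunks on every run.
GEAR = [int.from_bytes(hashlib.sha256(bytes([i])).digest()[:8], "big") for i in range(256)]
MASK64 = (1 << 64) - 1

def chunk(data: bytes, avg_bits: int = 6, min_size: int = 16, max_size: int = 256) -> list:
    """Split data where the rolling hash's low bits are zero (toy sizes)."""
    mask = (1 << avg_bits) - 1
    chunks, start, h = [], 0, 0
    for i, b in enumerate(data):
        h = ((h << 1) + GEAR[b]) & MASK64
        length = i - start + 1
        # Cut on a content-defined boundary, or force a cut at max_size.
        if (length >= min_size and (h & mask) == 0) or length >= max_size:
            chunks.append(data[start:i + 1])
            start, h = i + 1, 0
    if start < len(data):
        chunks.append(data[start:])  # trailing partial chunk
    return chunks

def store_chunks(store: dict, data: bytes) -> list:
    """Insert each chunk under its digest; return the digests that rebuild data."""
    refs = []
    for c in chunk(data):
        d = hashlib.sha256(c).hexdigest()
        store.setdefault(d, c)  # a chunk that is already present is not stored again
        refs.append(d)
    return refs

# Two "versions" of a file: v2 is v1 with five bytes inserted in the middle.
v1 = b"".join(hashlib.sha256(str(i).encode()).digest() for i in range(256))
v2 = v1[:4096] + b"PATCH" + v1[4096:]

store = {}
refs_v1 = store_chunks(store, v1)
refs_v2 = store_chunks(store, v2)
print(f"chunk references: {len(refs_v1) + len(refs_v2)}, unique chunks stored: {len(store)}")
```

Every chunk before the insertion is identical in both versions and is stored once. Because boundaries depend on nearby bytes rather than absolute offsets, chunks after the insertion typically line up again as well, which fixed-size chunking would not achieve once the data has shifted.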


Implementation in Replit

Previously, our cache disk was used directly as a Nix local store. The disk was mounted into each user’s container and used as part of the Nix store available to users.

Persistent disk nix store before tvix-store implementation

tvix-store can expose a Nix local store through a FUSE filesystem. Rather than holding the Nix store files themselves, the persistent disk can hold a compressed tvix-store. By running tvix-store mount as a service on our container manager VM, we can expose a Nix local store and mount it so that it appears identical from the container’s perspective.

Persistent disk contains tvix-store, and the tvix-store service exposes a nix local store

File operations on the FUSE filesystem exposed by tvix-store are slower than on the plain disk, but enabling basic FUSE caching mostly mitigates this issue.

By implementing this, we were able to compress our 6 TB cache of Nix store paths into 1.2 TB. In doing so, we also reduced our storage cost for the persistent disk cache by 90%.

Thank you to Florian Klink and the Tvix community for their help with this work. We are happy that we were able to monetarily support the development of Tvix and make the work available to all as open source. We look forward to the continued development of Tvix and the Nix ecosystem as a whole!

If solving hard problems like this is interesting to you, apply to work at Replit.
