Searching Nixpkgs in Under 30 Milliseconds

Colton Donnelly

Today, we’re releasing the first version of rippkgs, a CLI utility for indexing and searching Nix expressions. With rippkgs, you can quickly search the nixpkgs available to your system with accurate results. Read on for more details about why we created it, how to use it, and how it works.

Motivation

At Replit, we use Nix to empower millions of users with hundreds of programming languages. The power of Nix’s reproducible expressions allows us to share system packages fearlessly and quickly with ultimate flexibility for end users. However, users are often not familiar with Nix, so we need to give them the tools they need to interact with it comfortably.

Experienced Nix users looking to install or use a package may reach for nix-env, nix search, nix-locate, or search.nixos.org. These tools are excellent for visibility into what’s available in the largest package repository, nixpkgs. Unfortunately, none of these tools give us what we need to provide great search for Replit users:

  • nix-env and nix search are bundled with Nix, which means they’re already accessible in Replit environments, but searching for a package can take several seconds - way too long for those of us who are impatient and just want to find what we’re looking for quickly.
  • nix-locate works by indexing built derivation paths, which is great when you know the path you’re looking for (like /bin/jq), but not great when looking for a package with unknown output formats.
  • search.nixos.org is fast and responsive but only provides results for the most up-to-date channels. This is restrictive for users who are pinned to specific nixpkgs releases like we are at Replit.

While building search for the System Dependencies pane in the Replit Workspace, we needed better performance and more accurate results to give our users the information they want in a pleasant experience. We created rippkgs as an alternative to the solutions above, with a focus on speed and correctness.

We also incorporated rippkgs into our command-not-found handler, which gives users more information about the packages we find via nix-locate when prompted to accept or reject the suggested packages.

Usage

rippkgs needs an index database to provide results. A sibling CLI, rippkgs-index, builds that index: it evaluates a nixpkgs expression in a (mostly) safe way and writes the information it gathers to an SQLite database. Generating an index is as simple as:

rippkgs-index nixpkgs -o rippkgs-index.sqlite

This creates an index database in the current directory as rippkgs-index.sqlite. It uses NIX_PATH to find nixpkgs and starts indexing all of the packages in that nixpkgs distribution. Searching the index is as easy as:

rippkgs -i rippkgs-index.sqlite zsh

If your index is available at $XDG_DATA_HOME/rippkgs-index.sqlite, you can omit the -i flag and rippkgs will search that index.
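As a rough illustration of that lookup order, here is a minimal Python sketch. It is not rippkgs’ actual code (rippkgs is written in Rust). The function name resolve_index is hypothetical, and the fallback to ~/.local/share when XDG_DATA_HOME is unset is an assumption taken from the XDG Base Directory convention, not something this post confirms about rippkgs.

```python
import os
from pathlib import Path
from typing import Mapping, Optional

INDEX_FILENAME = "rippkgs-index.sqlite"


def resolve_index(explicit: Optional[str], env: Mapping[str, str] = os.environ) -> Path:
    """Pick the index file: an explicit -i value wins, otherwise use the XDG data dir."""
    if explicit is not None:
        return Path(explicit)
    data_home = env.get("XDG_DATA_HOME")
    if data_home:
        return Path(data_home) / INDEX_FILENAME
    # Assumption: fall back to the XDG default location when the variable is unset.
    return Path.home() / ".local" / "share" / INDEX_FILENAME


# Hypothetical usage: no -i flag given, XDG_DATA_HOME points at /data.
print(resolve_index(None, {"XDG_DATA_HOME": "/data"}))
```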

How it works

rippkgs-index evaluates all of nixpkgs, taking special measures to prevent failures in most cases. For every attribute available in nixpkgs, the attribute is evaluated to determine that:

  • The value shallowly evaluates successfully
  • The value is a derivation
  • The value’s platform availability can be successfully evaluated
  • The value is available on the target platform

When the attribute isn’t a derivation, rippkgs-index checks its recurseForDerivations field to decide whether the attribute’s own attributes should be checked as well. The result is a registry of the nixpkgs distribution, written out as a JSON file. rippkgs-index then builds the index database by iterating through that registry and inserting each attribute into the SQLite database. Since the index is a plain SQLite database, it can be queried freely using the sqlite3 CLI.
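To make the traversal concrete, here is a small Python sketch of the same idea over a mock attribute tree. It is only a model: the real evaluation happens in Nix, zero-argument callables stand in for lazy values that may fail to evaluate, and the packages table, its columns, and all of the mock data are hypothetical rather than the schema rippkgs-index actually writes.

```python
import json
import sqlite3
from typing import Any, Dict, Iterator, Tuple


def force(thunk: Any) -> Any:
    # Model Nix laziness: a zero-argument callable is a value not yet evaluated.
    return thunk() if callable(thunk) else thunk


def is_derivation(value: Any) -> bool:
    # In Nix, a derivation is an attribute set whose `type` is "derivation".
    return isinstance(value, dict) and value.get("type") == "derivation"


def collect(attrs: Dict[str, Any], platform: str,
            prefix: str = "") -> Iterator[Tuple[str, Dict[str, Any]]]:
    for name, thunk in attrs.items():
        try:
            value = force(thunk)  # check 1: the value shallowly evaluates
        except Exception:
            continue
        path = prefix + name
        if is_derivation(value):  # check 2: the value is a derivation
            platforms = value.get("platforms")
            if not isinstance(platforms, list):
                continue  # check 3: platform availability could not be evaluated
            if platform in platforms:  # check 4: available on the target platform
                yield path, {"version": value.get("version")}
        elif isinstance(value, dict) and value.get("recurseForDerivations") is True:
            # Not a derivation, but marked as worth descending into.
            yield from collect(value, platform, path + ".")


def broken() -> Any:
    raise RuntimeError("evaluation failed")


# Made-up stand-in for a nixpkgs attribute tree.
mock_nixpkgs = {
    "jq": {"type": "derivation", "version": "1.0", "platforms": ["x86_64-linux"]},
    "darwinOnly": {"type": "derivation", "version": "1.0", "platforms": ["aarch64-darwin"]},
    "broken": broken,
    "lib": {"hidden": {"type": "derivation", "version": "1.0", "platforms": ["x86_64-linux"]}},
    "python3Packages": {
        "recurseForDerivations": True,
        "requests": {"type": "derivation", "version": "2.0", "platforms": ["x86_64-linux"]},
    },
}

# Build the registry, round-trip it through JSON, then load it into SQLite.
registry_json = json.dumps(dict(collect(mock_nixpkgs, "x86_64-linux")))

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE packages (attribute TEXT PRIMARY KEY, version TEXT)")
for attribute, info in json.loads(registry_json).items():
    conn.execute("INSERT INTO packages VALUES (?, ?)", (attribute, info["version"]))

indexed = [row[0] for row in conn.execute("SELECT attribute FROM packages ORDER BY attribute")]
print(indexed)
```

In this mock, lib.hidden is skipped because lib does not set recurseForDerivations, while python3Packages.requests is picked up because its parent does.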

When the user queries rippkgs without the --exact flag, rippkgs uses rusqlite to install a scalar function into the SQLite connection in memory which provides a “score” for each attribute saved in the database. The function uses the fuzzy_matcher crate to determine how similar the attribute is to the query input, which SQLite then uses to order each row in the database according to its score. The top results are returned, and that’s it!
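The same pattern can be sketched with Python’s built-in sqlite3 module, which also lets you register a scalar function on a connection. This is an illustration only: rippkgs does this in Rust with rusqlite and the fuzzy_matcher crate, whereas the fuzzy_score function below is a toy subsequence scorer written for this example, and the packages table is a hypothetical schema.

```python
import sqlite3


def fuzzy_score(candidate: str, query: str) -> int:
    """Toy scorer: 0 unless `query` is a subsequence of `candidate`.

    Each matched character earns 1 point, or 2 when it directly follows
    the previous match; an exact match earns a bonus of 10.
    """
    candidate, query = candidate.lower(), query.lower()
    score, pos, prev = 0, 0, -2
    for ch in query:
        idx = candidate.find(ch, pos)
        if idx == -1:
            return 0
        score += 2 if idx == prev + 1 else 1
        prev, pos = idx, idx + 1
    if candidate == query:
        score += 10
    return score


conn = sqlite3.connect(":memory:")
# Register the Python function as a two-argument SQL scalar function.
conn.create_function("fuzzy_score", 2, fuzzy_score)

conn.execute("CREATE TABLE packages (attribute TEXT PRIMARY KEY)")
conn.executemany(
    "INSERT INTO packages VALUES (?)",
    [("bash",), ("zsh",), ("zsh-completions",), ("oh-my-zsh",)],
)

# SQLite calls fuzzy_score once per row, then orders rows by the result.
rows = conn.execute(
    """
    SELECT attribute, score
    FROM (SELECT attribute, fuzzy_score(attribute, ?) AS score FROM packages)
    WHERE score > 0
    ORDER BY score DESC, attribute
    LIMIT 3
    """,
    ("zsh",),
).fetchall()

for attribute, score in rows:
    print(attribute, score)
```

Keeping the scoring inside the query means sorting and limiting stay in SQLite, so only the top rows ever cross back into the host program.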

We hope others find rippkgs useful and come up with interesting ways to use it. For example, we’ve incorporated rippkgs into our command-not-found hook to give users better visibility into what they’re installing. Interested in writing small CLIs for problems affecting millions of users daily? We’re hiring!
