This article describes how we [1] extended nsncd to support NSS host lookups, and provided it for NixOS 22.11 as a drop-in alternative.
What is NSS, how does it work, why should I care?
NSS is what a Linux system uses under the hood to translate users, groups and hosts [2] from names to numbers/IPs (and back).
You have probably heard of DNS and /etc/hosts, which are used to look up hostnames, but on a modern system there are a bunch of other (dynamic) sources to choose from, such as:
- Local network device discovery (Zeroconf / Avahi)
- Names of containers running on your machine.
- *.localhost names, which come in handy for multi-vhost testing.
Similarly, user and group names might be provided by directory services (LDAP etc.).
All of these lookups are provided by the NSS (Name Service Switch) mechanism, which is part of glibc, a low-level system library used in most binaries.
glibc consults /etc/nsswitch.conf for the list of configured NSS modules, and then queries each of these in the defined order.
Usually, this file is configured by the system administrator according to the desired local configuration.
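To make the "list of modules, queried in order" idea concrete, here is a minimal Python sketch that extracts the module order per database from nsswitch.conf-style text. It is only an illustration and not how glibc parses the file; the function name `parse_nsswitch` and the sample configuration are made up for this example.

```python
def parse_nsswitch(text: str) -> dict[str, list[str]]:
    """Map each database (e.g. "hosts") to its ordered list of NSS modules."""
    databases: dict[str, list[str]] = {}
    for raw_line in text.splitlines():
        line = raw_line.split("#", 1)[0].strip()  # drop comments
        if not line or ":" not in line:
            continue
        database, _, sources = line.partition(":")
        modules = []
        in_action = False
        for token in sources.split():
            # Bracketed items such as [NOTFOUND=return] are actions, not modules.
            if token.startswith("["):
                in_action = True
            if not in_action:
                modules.append(token)
            if token.endswith("]"):
                in_action = False
        databases[database.strip()] = modules
    return databases


# Hypothetical sample configuration, not taken from a real system.
SAMPLE = """
passwd: files systemd
hosts:  mymachines resolve [!UNAVAIL=return] files myhostname dns
"""

for database, modules in parse_nsswitch(SAMPLE).items():
    print(f"{database}: {' -> '.join(modules)}")
```

Feeding it the contents of your own /etc/nsswitch.conf shows the order in which the modules would be consulted for each database.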
What’s problematic with it?
Every NSS module is essentially just a libnss_*.so file that's loaded (via dlopen()) from well-known locations into the running process on the first lookup.
On regular distros, this mostly works as long as the .so files are mostly compatible, but long-running processes (or just old binaries) dlopen()'ing new NSS modules can segfault the binary. [3]
Nix-built binaries running on non-NixOS systems can’t find the NSS modules specified in the host's
/etc/nsswitch.conf, because a nix-built glibc only knows how to load the most basic NSS modules (the ones shipped with glibc directly) look in
On NixOS (and Guix), this has so far been worked around by making use of nscd.
nscd was meant as a “caching daemon” for NSS requests. If glibc sees a Unix socket at /var/run/nscd/socket, it tries to connect to it and run queries through it, using an undocumented but somewhat stable binary protocol.
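As a rough illustration of the first step, the following Python sketch checks whether something that looks like an nscd socket exists at that path. It only mirrors the "is there a Unix socket" check; glibc itself goes on to connect to it, and the function name `nscd_socket_present` is invented for this example.

```python
import os
import stat

NSCD_SOCKET = "/var/run/nscd/socket"  # path named in the article


def nscd_socket_present(path: str = NSCD_SOCKET) -> bool:
    """Return True if `path` exists and is a Unix domain socket."""
    try:
        return stat.S_ISSOCK(os.stat(path).st_mode)
    except OSError:
        # Missing file, missing directory, or no permission to stat it.
        return False


print("nscd socket present:", nscd_socket_present())
```

On a machine running nscd or nsncd this should report the socket as present; a regular file at the same path would not count.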
That daemon can be “steered appropriately” to find the NSS modules specified in /etc/nsswitch.conf [4], or, in the case of non-NixOS, to use the host-provided NSS modules from /usr/lib or similar.
Because nscd takes care of the dlopen() calls, segfaults and problems with ABI incompatibilities are minimized.
Problems with nscd
nscd exposed some problems:
Caching, even when disabled
We tried hard to disable caching in nscd.
Yet, there were occurrences where nscd still seemed to cache results.
Occasionally getting stuck
When roaming around in various WiFi networks, especially those with captive portals, I often experienced entirely “stuck” DNS lookups for tens of seconds.
Sometimes systemctl restart nscd helped, sometimes not.
Search for alternatives
This problem kept popping up over and over again. We tried some of the alternatives, such as versioned import paths, or using some of the alternative nscd implementations, but ultimately none of them supported the feature set we needed to be an nscd replacement.
We ultimately decided to add host lookup support to nsncd, a non-caching nscd alternative written in Rust that already supported most of the other lookup types.
This required understanding a lot of very hard-to-read glibc code, combined with using sockdump to stare at the bits going over the wire, and re-implementing the various lookup methods required for it.
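To give a feel for what those bits look like, here is a small Python sketch of how a request could be framed. Since the protocol is undocumented, everything here is an assumption based on our reading of glibc's nscd-client.h: a header of three native-endian 32-bit integers (protocol version, request type, key length) followed by the NUL-terminated key. The constants and the helper name `build_request` are illustrative and should be checked against the glibc source before relying on them.

```python
import struct

NSCD_VERSION = 2   # assumption: protocol version from glibc's nscd-client.h
GETHOSTBYNAME = 4  # assumption: position in glibc's request_type enum


def build_request(request_type: int, key: str) -> bytes:
    """Pack three native-endian int32 header fields, then the NUL-terminated key."""
    payload = key.encode() + b"\0"
    header = struct.pack("=iii", NSCD_VERSION, request_type, len(payload))
    return header + payload


request = build_request(GETHOSTBYNAME, "example.org")
print(request.hex(" "))
```

Comparing such a hand-built request with what sockdump captures from a real glibc client is a quick way to confirm or refute these assumptions.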
We also added support for sd_notify readiness signalling.
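For readers unfamiliar with sd_notify: readiness signalling boils down to sending a short datagram to the Unix socket named in the NOTIFY_SOCKET environment variable. The Python sketch below shows that mechanism in general terms; it is not the nsncd implementation (which is written in Rust), and the function name is only borrowed from the systemd API for readability.

```python
import os
import socket


def sd_notify(state: str = "READY=1") -> bool:
    """Send `state` to the socket named in $NOTIFY_SOCKET; return False if unset."""
    address = os.environ.get("NOTIFY_SOCKET")
    if not address:
        return False  # not started by a service manager expecting notifications
    if address.startswith("@"):
        address = "\0" + address[1:]  # "@" prefix denotes an abstract socket
    with socket.socket(socket.AF_UNIX, socket.SOCK_DGRAM) as sock:
        sock.sendto(state.encode(), address)
    return True


print("notification sent:", sd_notify())
```

A daemon would call this once its listening socket is set up, so that units ordered after it only start once lookups can actually be served.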
A NixOS test was added to verify matching NSS lookup behaviour for both nscd
and nsncd, as well as providing wire format dumps via
We also added a NixOS option, services.nscd.enableNsncd, which can be set to true to use nsncd instead of nscd.
We plan to flip the default for the release after NixOS 22.11. Please give this some testing!
The host lookup patches are still under review by the upstream maintainer, Two Sigma. For now, the nixpkgs version points to a fork maintained in the nix-community project, but obviously, having host lookups “just work” in the “official” nsncd package will make it much easier for users on non-NixOS systems to install it.
We’re working with upstream to hopefully get this merged in some form.
nsncd: Wire Tests
We want to include more wire format unit tests of various lookup responses into nsncd itself. andi found a bug when looking up IPv6-only hosts that some glibc clients handled ungracefully.
nsncd: Socket activation
While nsncd is very quick to start up, we would still like to see it socket-activated to prevent failed lookups early during boot, or when switching to a new NixOS configuration and restarting nsncd while doing so.
nsncd had support for being socket-activated, but that got
removed due to some deadlocks.
It might have gotten fixed by a recent systemd commit and should probably be re-evaluated.
nsncd: Use client namespace
We also got feedback from some users that they disable ns(n)cd because they run some workloads in a separate network namespace, where everything is tunneled via a VPN, and don’t want to leak DNS lookups to the untunneled connection. We should investigate if we can detect the network namespace of the client that’s connecting, and do the lookup in that namespace, rather than in the host namespace.
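One possible starting point for that investigation, sketched in Python and untested against nsncd itself: on Linux, a server can ask a connected Unix socket for the peer's PID via SO_PEERCRED and then inspect /proc/<pid>/ns/net to identify that process's network namespace. The helper names `peer_pid` and `netns_of` are hypothetical, and this only covers detection, not performing the lookup inside that namespace.

```python
import os
import socket
import struct


def peer_pid(conn: socket.socket) -> int:
    """Return the PID of the process on the other end of a Unix socket (Linux only)."""
    size = struct.calcsize("3i")  # struct ucred: pid, uid, gid
    creds = conn.getsockopt(socket.SOL_SOCKET, socket.SO_PEERCRED, size)
    pid, _uid, _gid = struct.unpack("3i", creds)
    return pid


def netns_of(pid: int) -> str:
    """Return the symlink target identifying the PID's network namespace."""
    return os.readlink(f"/proc/{pid}/ns/net")


# Demonstration with a socket pair inside a single process.
server_side, client_side = socket.socketpair()
print("peer pid:", peer_pid(server_side))
```

Two caveats apply to this approach: reading another process's namespace link may require extra privileges, and the PID could be reused between the credential check and the /proc lookup, so a real implementation would need to handle both.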
There’s a nixpkgs PR adding a patch to glibc that makes it look for NSS modules in another path, without affecting other module loading paths.
It could serve as a workaround for:
- the “host lookup network namespace leakage problems” described above
- non-NixOS distributions where ns(n)cd can’t be run at all, but the user is super sure about the NSS modules pointed to being compatible with the run binaries
Even with all these improvements on nsncd, we should probably still include this somehow. Ideally, we would reach out to glibc upstream and see if something like this can be added.
glibc: Simplify client code
The current glibc nscd client code is pretty convoluted, and in some cases, asks for a file descriptor pointing to the internal nscd cache structures “to look if the response is there already”, and then some logic to extract it from there client-side. The protocol also has some more commands regarding shutdown and flushing of the cache.
None of these commands is really desirable for a non-caching implementation that simply acts as a dispatcher, so the client code could probably be simplified a lot (or rewritten) to stop using the other request types.
This should be in line with Fedora’s choice to remove nscd in Fedora 36 and discussion around a simplification.
We should use nsncd for these use cases, and get the nscd client code simplified.
[1] This is mostly me and NinjaTrappeur, while helping a NumTide customer, OTTO Motors.
[2] There are some more “databases” it provides lookups for; check nsswitch.conf for the full list.
[3] https://github.com/NixOS/nixpkgs/pull/138178#issuecomment-925104467, https://github.com/erikarvstedt/check-glibc-compatibilities/
[4] This is usually accomplished by setting its LD_LIBRARY_PATH to all the NSS module paths configured in the host OS.