NSS is what’s being used under the hood of a Linux system to translate users, groups and hosts 2 from names to numbers/IPs (and back).
You have probably heard of DNS and /etc/hosts, which are used to look up hostnames, but on a modern system there are a bunch of other (dynamic) sources to choose from, such as:
- Local network device discovery (Zeroconf / Avahi)
- Names of containers running on your machine.
- *.localhost names, which come in handy for multi-vhost testing.
Similarly, user and group names might be provided by a directory service (LDAP etc.).
All of these lookups are provided by the NSS (Name Service Switch) mechanism, which is part of glibc, a low-level system library used in most binaries.
glibc reads /etc/nsswitch.conf for the list of configured NSS modules, and then queries each of these in the defined order.
Usually, this file is configured by the system administrator according to the desired local configuration.
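To make the "list of modules, queried in order" idea concrete, here is a minimal sketch of how such a file maps each database to an ordered list of sources. The sample content is hypothetical and not taken from any real system, and the parser is a simplification: it drops bracketed action items such as [NOTFOUND=return] instead of interpreting them the way glibc does.

```python
import re

# Hypothetical example content, for illustration only.
SAMPLE = """\
# /etc/nsswitch.conf (illustrative)
passwd:    files systemd
group:     files systemd
hosts:     mymachines resolve [!UNAVAIL=return] files myhostname dns
"""


def parse_nsswitch(text):
    """Map each database (passwd, hosts, ...) to its ordered list of sources."""
    databases = {}
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # strip comments and whitespace
        if not line or ":" not in line:
            continue
        name, _, rest = line.partition(":")
        # Simplification: remove bracketed action items like [NOTFOUND=return].
        rest = re.sub(r"\[[^\]]*\]", " ", rest)
        databases[name.strip()] = rest.split()
    return databases


config = parse_nsswitch(SAMPLE)
print(config["hosts"])
```

Each name in the resulting list corresponds to one NSS module that would be asked in turn until one of them answers.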
Each NSS module is essentially just a libnss_*.so file that’s loaded from well-known locations into the running process on the first lookup.
On regular distros, this mostly works as long as the .so files are mostly compatible, but long-running processes (or just old binaries) dlopen()’ing new NSS modules can segfault the binary. 3
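As a rough illustration of the naming scheme, the sketch below derives the shared object name and the exported symbol name for a given source and lookup function. It assumes the convention described in the glibc manual (libnss_SERVICE.so.2 and _nss_SERVICE_FUNCTION); the helper names are hypothetical, and nothing is actually loaded here.

```python
def nss_module_filename(service):
    """File name glibc would dlopen() for a source listed in nsswitch.conf.

    Assumes the conventional ".so.2" interface version suffix.
    """
    return f"libnss_{service}.so.2"


def nss_symbol(service, function):
    """Name of the symbol a module exports for a given lookup function."""
    return f"_nss_{service}_{function}"


for service in ("files", "dns"):
    print(nss_module_filename(service), nss_symbol(service, "gethostbyname_r"))
```

Since the symbols are resolved inside the calling process, a module built against a different glibc than the running binary shares its address space, which is where the compatibility problems come from.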
Nix-built binaries running on non-NixOS systems can’t find the NSS modules specified in the host’s /etc/nsswitch.conf, because a Nix-built glibc only knows how to load the most basic NSS modules (the ones shipped with glibc directly).
On NixOS (and Guix), this has so far been worked around by making use of nscd.
nscd was meant as a “caching daemon” for NSS requests. If glibc sees a Unix socket at /var/run/nscd/socket, it tries to connect to it and run queries through it, using an undocumented, but somewhat stable binary protocol.
That daemon can be “steered appropriately” to find the NSS modules specified in /etc/nsswitch.conf 4, or in the case of non-NixOS, use the host-provided NSS modules from /usr/lib or similar.
Because nscd takes care of the dlopen() calls, segfaults and problems with ABI incompatibilities are minimized.
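To give a feel for what glibc sends over that socket, here is a minimal sketch of encoding a request. The protocol is undocumented, so everything below is an assumption based on our reading of glibc’s nscd-client.h: a header of three native-endian 32-bit integers (version, request type, key length) followed by the NUL-terminated key. The constant values and function names are illustrative and should be checked against the glibc source before relying on them.

```python
import os
import struct

NSCD_SOCKET = "/var/run/nscd/socket"

# Assumptions taken from our reading of glibc's nscd-client.h.
NSCD_VERSION = 2
GETPWBYNAME = 0    # assumed position in the request_type enum
GETHOSTBYNAME = 4  # assumed position in the request_type enum


def encode_request(req_type, key):
    """Build the bytes for a request: header, then the NUL-terminated key."""
    key_bytes = key.encode() + b"\0"
    # "=iii": three 32-bit ints in native byte order, without padding.
    header = struct.pack("=iii", NSCD_VERSION, req_type, len(key_bytes))
    return header + key_bytes


def nscd_socket_present(path=NSCD_SOCKET):
    """Roughly what the client checks first: is there anything at the socket path?"""
    return os.path.exists(path)


request = encode_request(GETPWBYNAME, "root")
print(len(request), nscd_socket_present())
```

A real client would connect to the socket, write these bytes, and then parse a type-specific response header; that part is left out here.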
nscd exposed some problems:
We tried hard to disable caching in nscd.
Yet, there were occurrences where nscd still seemed to cache results.
When roaming around in various WiFi networks, especially those with captive portals, I often experienced entirely “stuck” DNS lookups for tens of seconds.
Sometimes systemctl restart nscd helped, sometimes it did not.
This problem kept popping up over and over again. We tried some of the alternatives, such as versioned import paths, or using some of the alternative nscd implementations, but ultimately none of them supported the feature set we needed to be an nscd replacement.
This required understanding a lot of very hard-to-read glibc code, combined with using sockdump to stare at the bits going over the wire, and re-implementing the various lookup methods required for it.
We also added support for sd_notify readiness signalling.
A NixOS test was added to verify matching NSS lookup behaviour for both nscd
and nsncd, as well as providing wire format dumps via
We also added a NixOS option, services.nscd.enableNsncd, which can be set to true to use nsncd instead of nscd.
We plan to flip the default for the release after NixOS 22.11. Please give this some testing!
The host lookup patches are still under review by the upstream maintainer, Two Sigma. For now, the nixpkgs version points to a fork maintained in the nix-community project, but obviously, having host lookups “just work” in the “official” nsncd package will make it much easier for users on non-NixOS systems to install it.
We’re working with upstream to hopefully get this merged in some form.
While nsncd is very quick to start up, we would still like to see it being socket-activated to prevent failed lookups early during boot, or when switching to a new NixOS configuration and restarting nsncd while doing so.
nsncd had support for being socket-activated, but that got
removed due to some deadlocks.
It might have gotten fixed by a recent systemd commit and should probably be re-evaluated.
We also got feedback from some users that they disable ns(n)cd because they run some workloads in a separate network namespace, where everything is tunneled via a VPN, and don’t want to leak DNS lookups to the untunneled connection. We should investigate if we can detect the network namespace of the client that’s connecting, and do the lookup in that namespace, rather than in the host namespace.
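One possible way to detect the client’s namespace, sketched below, is to ask the kernel for the peer’s PID via SO_PEERCRED on the Unix socket and then inspect /proc/PID/ns/net. This is only an illustration of the idea on Linux, not something nsncd does today; the function names are hypothetical, and a socketpair within a single process stands in for a real client connection.

```python
import os
import socket
import struct


def peer_pid(conn):
    """PID of the process on the other end of a Unix socket (Linux only)."""
    # struct ucred holds pid, uid and gid, each 32 bits wide.
    creds = conn.getsockopt(socket.SOL_SOCKET, socket.SO_PEERCRED,
                            struct.calcsize("3i"))
    pid, _uid, _gid = struct.unpack("3i", creds)
    return pid


def netns_of(pid):
    """Identifier of a process's network namespace, or None if unreadable."""
    try:
        return os.readlink(f"/proc/{pid}/ns/net")
    except OSError:
        return None


# Stand-in for an accepted client connection: both ends live in this process.
server, client = socket.socketpair(socket.AF_UNIX, socket.SOCK_STREAM)
pid = peer_pid(server)
same_namespace = netns_of(pid) == netns_of(os.getpid())
print(pid, netns_of(pid), same_namespace)
server.close()
client.close()
```

Actually performing the lookup inside the client’s namespace would additionally require entering it (for example with setns(2)) and sufficient privileges, and the PID could be reused between the check and the lookup, so this would need careful evaluation.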
There’s a nixpkgs PR adding a patch to glibc, to have it look for NSS modules in another path, which doesn’t affect other module loading paths.
It could serve as a workaround for:
- the “host lookup network namespace leakage problems” described above
- non-NixOS distributions where ns(n)cd can’t be run at all, but the user is super sure about the NSS modules pointed to being compatible with the run binaries
Even with all these improvements on nsncd, we should probably still include this somehow. Ideally, reach out to glibc upstream, and see if something like this can be added.
The current glibc nscd client code is pretty convoluted, and in some cases, asks for a file descriptor pointing to the internal nscd cache structures “to look if the response is there already”, and then some logic to extract it from there client-side. The protocol also has some more commands regarding shutdown and flushing of the cache.
All of these commands are not really desirable in case of a non-caching implementation that simply acts as a dispatcher, so the client code could probably be simplified a lot / rewritten, to stop using the other request types.
This should be in line with Fedora’s choice to remove nscd in Fedora 36 and discussion around a simplification.
We should use nsncd for these use cases, and get the nscd client code simplified.
This is usually accomplished by setting its LD_LIBRARY_PATH to all the NSS module paths configured in the host OS.