Ludovic Courtès <ludo@gnu.org> writes:

> @@ -2013,23 +2057,36 @@ void DerivationGoal::runChild()
>  
>          _writeToStderr = 0;
>  
> +	if (readiness.writeSide > 0) readiness.writeSide.close();
> +
> +	if (readiness.readSide > 0) {
> +	     /* Wait for the parent process to initialize the UID/GID mapping
> +		of our user namespace.  */
> +	     char str[20] = { '\0' };
> +	     readFull(readiness.readSide, (unsigned char*)str, 3);
> +	     if (strcmp(str, "go\n") != 0)
> +		  throw Error("failed to initialize process in unprivileged user namespace");
> +	}
> +
>          restoreAffinity();

(in patch 7/16, in nix/libstore/build.cc)

Strictly speaking we should check whether the fds are >= 0, not > 0,
since 0 is technically a valid file descriptor, and we use -1 to
indicate the absence of a file descriptor.

Also, readiness.readSide isn't explicitly closed in the child after
we're done with it.
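
To make both points concrete, here is a rough standalone sketch of the
child-side wait.  It uses plain POSIX calls instead of the daemon's own
pipe wrapper and readFull, and the name waitForParentGo is made up for
illustration; it is not what the patch has to look like.

```cpp
#include <cerrno>
#include <cstring>
#include <stdexcept>
#include <unistd.h>

/* Hypothetical helper, not part of the patch: block until the parent
   writes "go\n" to READ_FD, then close READ_FD.  -1 means "no pipe";
   0 is treated as a perfectly valid descriptor.  */
void waitForParentGo(int readFd)
{
    if (readFd < 0) return;          /* compare against -1, not 0 */

    char buf[3];
    size_t got = 0;
    while (got < sizeof buf) {
        ssize_t n = read(readFd, buf + got, sizeof buf - got);
        if (n == -1 && errno == EINTR) continue;   /* interrupted, retry */
        if (n <= 0) break;                         /* error or early EOF */
        got += n;
    }

    close(readFd);                   /* the child is done with the read side */

    if (got != sizeof buf || std::memcmp(buf, "go\n", sizeof buf) != 0)
        throw std::runtime_error(
            "failed to initialize process in unprivileged user namespace");
}
```

Closing the descriptor before the comparison means it is released on
both the success and the failure path.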

> +
> +	# The unprivileged daemon cannot create the log directory by itself.
> +	mkdir /var/log/guix
> +	chown guix-daemon:guix-daemon /var/log/guix
> +	chmod 755 /var/log/guix

(in patch 15/16, in etc/guix-install.sh)

Should this be 'mkdir -p' or some other conditional creation?  If I
understand correctly, a plain mkdir will fail when /var/log/guix already
exists, for instance when overwriting an existing install using
GUIX_ALLOW_OVERWRITE.

Concerning guix-install.sh, to be clear, is the intent to specifically
not support installing the rootful daemon on systemd systems?

My two cents: switching to the rootless daemon is still a tradeoff, not
just because it relies on different kernel mechanisms for security, but
also because it currently causes visible changes to the build
environment (EROFS on /, and nothing owned by root, for example).
Which one should we consider the "canonical" build environment going
forward?


I decided to do some searching for container escapes / vulnerabilities
online, just to be extra careful, and found one that relies on the
container entry point being run as the user that owns the program that
execs the container entry point: CVE-2019-5736.  It exploits the fact
that /proc/self/exe, despite being displayed like a symlink in the
output of ls, does not actually act like a symlink, and indeed acts more
like a hardlink that readlink happens to have some associated data for.

The demo, modified for guix circumstances, would go something like this:

1. A derivation is created whose builder is /proc/self/exe, and whose
   LD_PRELOAD environment variable points into a malicious store item
   for one of its shared libraries - for example, libc.
2. The daemon reads this in, and, to my knowledge, does no verification
   of the builder string.  Note that this aspect isn't actually
   necessary, as the builder could also be a symlink to /proc/self/exe
   from the store.
3. The daemon sets up the build environment, and execs /proc/self/exe.
4. An attacker-controlled load-time function gets run.
5. It opens /proc/self/exe, initially read-only because a file can't be
   opened for writing while a process is executing it.  It then execs
   another attacker-controlled process, which inherits the open file
   descriptor and reopens it via /proc/self/fd/<fd>, this time
   read-write.  This works for three reasons: the process owns the
   file; if the file's mode doesn't allow writing, a quick fchmod fixes
   that; and the filesystem it was originally opened from isn't
   read-only, because guix-daemon starts before gnu-store.mount
   bind-mounts /gnu/store to itself and makes it read-only.  It then
   overwrites the file through the new descriptor with whatever
   contents it wants.
6. The next time guix-daemon is started outside the container, it runs
   attacker-controlled code.


There are several points at which that particular attack could be
stopped, and I'd like to try to stop it at as many of them as possible.
A good start would be canonicalizing the builder prior to executing it
and then checking to make sure it is in the store.  A more general
solution could look like writing out and then executing a tiny binary,
something like /tmp/runbuilder, that does nothing but unlink itself and
then exec the actual program.
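
As a rough sketch of the first idea (canonicalize, then check that the
result is in the store), something like the following could work.  The
function names and the storeDir parameter are made up for illustration
and are not the daemon's existing API.  I'm also assuming the check runs
where the builder's path resolves the same way it will at exec time.

```cpp
#include <cstdlib>
#include <stdexcept>
#include <string>

/* Return true if PATH is strictly inside STORE_DIR.  Both are assumed
   to be canonical absolute paths, with no trailing slash on STORE_DIR.  */
bool isInStoreDir(const std::string & path, const std::string & storeDir)
{
    return path.size() > storeDir.size() + 1
        && path.compare(0, storeDir.size(), storeDir) == 0
        && path[storeDir.size()] == '/';
}

/* Hypothetical helper: resolve every symlink in BUILDER and refuse
   anything that lands outside STORE_DIR.  The caller should exec the
   returned path, not the original string: realpath follows
   /proc/self/exe like an ordinary symlink, so checking the resolved
   path and then exec'ing the original string would defeat the check.  */
std::string checkedBuilder(const std::string & builder,
                           const std::string & storeDir)
{
    char * resolved = realpath(builder.c_str(), nullptr);
    if (resolved == nullptr)
        throw std::runtime_error("cannot canonicalize builder '" + builder + "'");

    std::string canonical(resolved);
    free(resolved);                  /* realpath allocated the buffer */

    if (!isInStoreDir(canonical, storeDir))
        throw std::runtime_error("builder '" + builder
                                 + "' resolves outside the store");
    return canonical;
}
```

The prefix test insists on a '/' right after the store directory so
that a sibling such as /gnu/storefoo is not mistaken for a store path.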

Here's a writeup of the CVE in question:
https://unit42.paloaltonetworks.com/breaking-docker-via-runc-explaining-cve-2019-5736/

Aside from all that, it looks good to me.

- reepca