Ludovic Courtès <ludo@gnu.org> writes:

>> I still think it would be a good idea to call unshare to create an extra
>> user and mount namespace just before executing the builder in the
>> unprivileged case, just to be sure that the mount-locking behavior is
>> triggered in a way that is documented.
>
> For some reason, it’s not working as advertised: mounts are seemingly
> not locked together and umount(2) on one of them returns EPERM (instead
> of EINVAL).  I suspect chroot, pivot_root, & co. spoil it all.

What this shows is that we're not currently testing the mount-locking
because the builder lacks the necessary capability in its user namespace
(this capability was removed from the effective set when the builder was
exec'ed).  That is, the kernel doesn't get as far as checking whether
the mount is locked because the caller wouldn't have the permission to
unmount it even if it weren't locked.  One way to test this would be to
use setns (perhaps via container-excursion) to enter the namespaces of
the builder, which will cause you to start out with a full set of
effective capabilities in its user namespace, then try umount.  Or a
test could be done within the daemon shortly prior to exec.

> Attached is a patch and test case.

[...]

> diff --git a/nix/libstore/build.cc b/nix/libstore/build.cc
> index 057a15ccd0..6a6a960a1c 100644
> --- a/nix/libstore/build.cc
> +++ b/nix/libstore/build.cc
> @@ -2244,6 +2244,13 @@ void DerivationGoal::runChild()
>  	    /* Remount root as read-only.  */
>              if (mount("/", "/", 0, MS_BIND | MS_REMOUNT | MS_RDONLY, 0) == -1)
>                  throw SysError(format("read-only remount of build root '%1%' failed") % chrootRootDir);
> +
> +	    if (getuid() != 0) {
> +		/* Create a new mount namespace to "lock" previous mounts.
> +		   See mount_namespaces(7).  */
> +		if (unshare(CLONE_NEWNS | CLONE_NEWUSER) == -1)
> +		    throw SysError(format("creating new user and mount namespaces"));
> +	    }
>          }
>  #endif

Note that we still need to initialize /proc/self/uid_map and friends for
the new user namespace, if I understand correctly.  My reading of
user_namespaces(7) is that it should be possible to do this from within
the new user namespace.

> To be sure, I wrote a minimal C program: umount returns EINVAL as
> expected.  However, when compiling it with -DWITH_CHROOT, unshare(2)
> fails with EPERM because “the caller's root directory does not match the
> root directory of the mount namespace in which it resides” (quoting
> unshare(2)).
>
> So I now get the idea of “locked mounts” but I’m at loss as to how this
> is supposed to interact with chroots.

I hadn't heard of that restriction on unshare and clone.  Thinking about
it, I suppose the reason could be that merely creating a user namespace
grants CAP_SYS_CHROOT, and because the current root directory is a
per-process property whose setting isn't limited by any namespace, it
would be possible to undo a chroot someone had tried to set as a
restriction on the current process by just calling chroot("/").  But if
we use pivot_root in conjunction with it, like we do in the daemon, it
should work.

- reepca