Ludovic Courtès <ludo@gnu.org> writes:

>> We seem to be relying entirely on the Linux behavior of chown to reset
>> setuid / setgid bits.
>
> Yes, and I think chown(2) is quite clear:
>
>   When the owner or group of an executable file is changed by an
>   unprivileged user, the S_ISUID and S_ISGID mode bits are cleared.
>   […] since Linux 2.2.13, root is treated like other users.

Ah, I misread it as "changed by an unprivileged process", not "changed
by an unprivileged user".  That clears things up.

>> Now, for the non-pedantic, significant issue that I came across while
>> writing all that: previously, it was not possible for the
>> tmpDir-exposing code to be reached without doing the _chown that also
>> reset setuid and setgid bits.  But with this patch, in the non-root,
>> non-CAP_CHOWN case (which is what is currently proposed for Guix
>> System), it can be reached through the catch clause.  In that case it
>> will expose tmpDir without changing any permission bits of files beneath
>> it, allowing anybody who can access a setuid program in tmpDir (which,
>> due to that 0755 chmod, is "everybody") to take control of the build
>> user (which in this case would be guix-daemon).
>
> I’m not sure I understand what you mean by “the tmpDir-exposing code”;
> are you talking about ‘DerivationGoal::deleteTmpDir’?

by "the tmpDir-exposing code" I mean this section inside of
DerivationGoal::deleteTmpDir, starting at nix/libstore/build.cc line
2818 in commit e6c588:

--8<---------------cut here---------------start------------->8---
		if (top != tmpDir) {
		    // Rename tmpDir to its parent, with an intermediate step.
		    string pivot = top + ".pivot";
		    if (rename(top.c_str(), pivot.c_str()) == -1)
			throw SysError("pivoting failed build tree");
		    if (rename((pivot + "/top").c_str(), top.c_str()) == -1)
			throw SysError("renaming failed build tree");

		    if (getuid() != 0)
			/* Running unprivileged but with CAP_CHOWN.  */
			chown(top.c_str(), uid, gid);

		    rmdir(pivot.c_str());
		}
--8<---------------cut here---------------end--------------->8---

>> I still think it would be a good idea to call unshare to create an extra
>> user and mount namespace just before executing the builder in the
>> unprivileged case, just to be sure that the mount-locking behavior is
>> triggered in a way that is documented.
>
> The problem with “mount locking” (and “peer group” and in fact most
> “concepts” mentioned in mount(2)) is that it’s not clearly defined.
> Here I’m relying on unit tests to ensure that the various bits can
> indeed not be remounted read-write, for instance.  (“make check” tests
> the same setup as unprivileged daemon, which is an advantage over the
> current situation where the separate-build-user setup is not covered by
> the test suite.)

Both of those concepts are described in mount_namespaces(7).  While my
reading of that leaves me with several questions, the section
"restrictions on mount namespaces" does have this:

       [5] The mount(2) flags MS_RDONLY, MS_NOSUID, MS_NOEXEC, and the "atime"
           flags  (MS_NOATIME,  MS_NODIRATIME,  MS_RELATIME)  settings  become
           locked when propagated from a more privileged to a less  privileged
           mount  namespace,  and  may  not  be changed in the less privileged
           mount namespace.

           This point is illustrated in the following example where, in a more
           privileged  mount  namespace, we create a bind mount that is marked
           as read-only.  For security reasons, it should not be  possible  to
           make  the  mount writable in a less privileged mount namespace, and
           indeed the kernel prevents this:

               $ sudo mkdir /mnt/dir
               $ sudo mount --bind -o ro /some/path /mnt/dir
               $ sudo unshare --user --map-root-user --mount \
                              mount -o remount,rw /mnt/dir
               mount: /mnt/dir: permission denied.

which seems to indicate that it is sufficient for preventing
modification of mount flags that the caller be in a less privileged
mount namespace than the one the mount was inherited from.  "Less
privileged" is defined as:

       [1] Each mount namespace has an owner  user  namespace.   As  explained
           above,  when  a  new  mount namespace is created, its mount list is
           initialized as a copy of the mount list of another mount namespace.
           If  the  new  namespace and the namespace from which the mount list
           was copied are owned by different user  namespaces,  then  the  new
           mount namespace is considered less privileged.

So putting the builder in a fresh mount namespace owned by a fresh user
namespace should suffice to achieve this.

It's worth noting that EPERM is returned by mount both for attempts to
modify locked mount points and for just generally not having the
required capability, so a unit test may have trouble establishing why a
particular behavior is being observed.  Ideally it wouldn't be possible
to modify the inputs even if the builder managed to acquire the required
capability in its user namespace.

- reepca