Ludovic Courtès writes: >> We seem to be relying entirely on the Linux behavior of chown to reset >> setuid / setgid bits. > > Yes, and I think chown(2) is quite clear: > > When the owner or group of an executable file is changed by an > unprivileged user, the S_ISUID and S_ISGID mode bits are cleared. > […] since Linux 2.2.13, root is treated like other users. Ah, I misread it as "changed by an unprivileged process", not "changed by an unprivileged user". That clears things up. >> Now, for the non-pedantic, significant issue that I came across while >> writing all that: previously, it was not possible for the >> tmpDir-exposing code to be reached without doing the _chown that also >> reset setuid and setgid bits. But with this patch, in the non-root, >> non-CAP_CHOWN case (which is what is currently proposed for Guix >> System), it can be reached through the catch clause. In that case it >> will expose tmpDir without changing any permission bits of files beneath >> it, allowing anybody who can access a setuid program in tmpDir (which, >> due to that 0755 chmod, is "everybody") to take control of the build >> user (which in this case would be guix-daemon). > > I’m not sure I understand what you mean by “the tmpDir-exposing code”; > are you talking about ‘DerivationGoal::deleteTmpDir’? by "the tmpDir-exposing code" I mean this section inside of DerivationGoal::deleteTmpDir, starting at nix/libstore/build.cc line 2818 in commit e6c588: --8<---------------cut here---------------start------------->8--- if (top != tmpDir) { // Rename tmpDir to its parent, with an intermediate step. string pivot = top + ".pivot"; if (rename(top.c_str(), pivot.c_str()) == -1) throw SysError("pivoting failed build tree"); if (rename((pivot + "/top").c_str(), top.c_str()) == -1) throw SysError("renaming failed build tree"); if (getuid() != 0) /* Running unprivileged but with CAP_CHOWN. */ chown(top.c_str(), uid, gid); rmdir(pivot.c_str()); } --8<---------------cut here---------------end--------------->8--- >> I still think it would be a good idea to call unshare to create an extra >> user and mount namespace just before executing the builder in the >> unprivileged case, just to be sure that the mount-locking behavior is >> triggered in a way that is documented. > > The problem with “mount locking” (and “peer group” and in fact most > “concepts” mentioned in mount(2)) is that it’s not clearly defined. > Here I’m relying on unit tests to ensure that the various bits can > indeed not be remounted read-write, for instance. (“make check” tests > the same setup as unprivileged daemon, which is an advantage over the > current situation where the separate-build-user setup is not covered by > the test suite.) Both of those concepts are described in mount_namespaces(7). While my reading of that leaves me with several questions, the section "restrictions on mount namespaces" does have this: [5] The mount(2) flags MS_RDONLY, MS_NOSUID, MS_NOEXEC, and the "atime" flags (MS_NOATIME, MS_NODIRATIME, MS_RELATIME) settings become locked when propagated from a more privileged to a less privileged mount namespace, and may not be changed in the less privileged mount namespace. This point is illustrated in the following example where, in a more privileged mount namespace, we create a bind mount that is marked as read-only. For security reasons, it should not be possible to make the mount writable in a less privileged mount namespace, and indeed the kernel prevents this: $ sudo mkdir /mnt/dir $ sudo mount --bind -o ro /some/path /mnt/dir $ sudo unshare --user --map-root-user --mount \ mount -o remount,rw /mnt/dir mount: /mnt/dir: permission denied. which seems to indicate that it is sufficient for preventing modification of mount flags that the caller be in a less privileged mount namespace than the one the mount was inherited from. "Less privileged" is defined as: [1] Each mount namespace has an owner user namespace. As explained above, when a new mount namespace is created, its mount list is initialized as a copy of the mount list of another mount namespace. If the new namespace and the namespace from which the mount list was copied are owned by different user namespaces, then the new mount namespace is considered less privileged. So putting the builder in a fresh mount namespace owned by a fresh user namespace should suffice to achieve this. It's worth noting that EPERM is returned by mount both for attempts to modify locked mount points and for just generally not having the required capability, so a unit test may have trouble establishing why a particular behavior is being observed. Ideally it wouldn't be possible to modify the inputs even if the builder managed to acquire the required capability in its user namespace. - reepca