Ludovic Courtès writes: >> For my two cents, I do think that it's still a tradeoff - not just >> because of the reliance on different kernel mechanisms for security, but >> also because the rootless daemon currently causes visible changes to the >> build environment (EROFS on /, and nothing owned by root, for example). >> Which one should we consider the "canonical" build environment going >> forward? > > The way I see it, we would gradually move to the non-root daemon: > > • First step here is to enable it by default on systemd distros. > > • Second step would be to allow Guix System users to migrate to > non-root daemon, keeping the default unchanged. > > • Third step (a year later maybe?) would be to default to non-root > daemon on Guix System and on remaining distros (though for these it > might be trickier because we probably cannot rely on CAP_SYS_CHOWN, > not as easily as with systemd at least). > > The visible changes in the build environment are unfortunate; I’m hoping > they won’t have any practical impact, not any more than the other > parameters that may change currently (build UID, binfmt_misc, file > system, etc.) We could test this hypothesis by rebuilding at least the > entire set of packages up to ‘hello’. (I tried doing it just now in a > Debian VM but failed since the main partition cannot easily be extended; > it’ll be easier to do with Guix System.) > For what it's worth, the visible changes could be avoided with subordinate ids, as I wrote in https://debbugs.gnu.org/cgi/bugreport.cgi?bug=75810#86. We could try it as-is and see how it goes, and if there are problems with reproducibility add on using subordinate ids. I would expect it to be a much smaller change than the root->rootless transition. Hopefully it works well enough as-is that offload systems could be set up without any special permissions. > Then there are 3 possibilities: > > 1. If /proc/self/exe points to (say) /usr/bin/guix-daemon, outside the > store, execve fails with ENOENT because that file is not mounted in > the chroot. No, like I wrote, /proc/self/exe, despite being reported as a symlink by stat, does not follow the usual symlink semantics. This is much like how the files in /proc/self/fd work, e.g.: scheme@(guile-user)> (open-file "/tmp/freshfile" "w+") $1 = # scheme@(guile-user)> (delete-file "/tmp/freshfile") scheme@(guile-user)> (stat:type (lstat "/proc/self/fd/13")) $3 = symlink scheme@(guile-user)> (readlink "/proc/self/fd/13") $4 = "/tmp/freshfile (deleted)" scheme@(guile-user)> (open-file "/proc/self/fd/13" "w+") $5 = # Here is a test program to demonstrate this (it's unfortunately rather tricky to demonstrate using usual command line tools): --8<---------------cut here---------------start------------->8--- /* Test program to demonstrate that /proc/self/exe does not behave like a symlink, and a process can exec /proc/self/exe even if there is no other filename by which the currently-executing program can be reached. Compile with -static (may require a 'guix shell gcc-toolchain glibc:static') and run. */ #define _GNU_SOURCE #include #include #include #include #include #include #include #include #include #include #include #include #include #define die(msg, status) \ do \ { \ perror(msg); \ exit(status); \ } while(0); \ #define pivot_root(new_root, put_old) (syscall(SYS_pivot_root, new_root,put_old)) int child_main(void * arg) { int *pipefds = (int *) arg; char *argv[2] = {"???", NULL}; int saved_errno; char c; if(close(pipefds[1])) die("Child closing write end of pipe", 20); if(read(pipefds[0], &c, 1 * sizeof(char)) < (1 * sizeof(char))) die("Reading from pipe", 21); if(c != 'y') die("Parent setup failed", 22); if(close(pipefds[0])) die("Child closing read end of pipe", 23); if(mkdir("/tmp/test-chroot", 0700) && errno != EEXIST) die("mkdir(\"/tmp/test-chroot\")", 24); if(mount(0, "/", 0, MS_REC|MS_PRIVATE, 0)) die("making / private", 25); if(mount("/tmp/test-chroot", "/tmp/test-chroot", 0, MS_BIND, 0)) die("making /tmp/test-chroot its own filesystem", 26); if(mkdir("/tmp/test-chroot/proc", 0700) && errno != EEXIST) die("mkdir(\"/tmp/test-chroot/proc\")", 27); if(mount("none", "/tmp/test-chroot/proc", "proc", 0, 0)) die("mount procfs", 28); if(chdir("/tmp/test-chroot")) die("chdir to /tmp/test-chroot", 29); if(mkdir("real-root", 0)) die("mkdir(\"real-root\")", 30); if(pivot_root(".", "real-root")) die("pivot_root", 31); if(chroot(".")) die("chroot", 32); if(umount2("real-root", MNT_DETACH)) die("unmounting real root", 33); if(rmdir("real-root")) die("removing real root directory", 34); if(execve("/proc/self/exe", argv, environ)) { saved_errno = errno; fprintf(stderr, "execve errno: %d\n", saved_errno); errno = saved_errno; die("execve", 35); } } int main(int argc, char **argv) { pid_t child_pid = -1; char stack[32 * 1024]; int flags; int result; int status; FILE * f; char strbuf[512]; int j; int pipefds[2]; if(getenv("TEST_PROGRAM_MAGIC_ENV_VAR")) { fprintf(stderr, "Self-exec'ed!\n"); fprintf(stderr, "Arguments: \n"); for(j = 0; j < argc; j++) fprintf(stderr, "%s\n", argv[j]); j = readlink("/proc/self/exe", strbuf, sizeof(strbuf) - 1); if(j >= 0) { strbuf[j] = '\0'; fprintf(stderr, "/proc/self/exe readlink result: %s\n", strbuf); f = fopen(strbuf, "r"); if(f) { fprintf(stderr, "... and that file exists!\n"); fclose(f); } else { perror("fopen /proc/self/exe readlink result error"); fprintf(stderr, "... and that file does not exist.\n"); } f = fopen("/proc/self/exe", "r"); if(f) { fprintf(stderr, "/proc/self/exe can be opened\n"); fclose(f); } else { perror("fopen /proc/self/exe error"); fprintf(stderr, "/proc/self/exe cannot be opened\n"); } } return 0; } setenv("TEST_PROGRAM_MAGIC_ENV_VAR", "1", 1); if(pipe(pipefds)) die("pipe()", 1); flags = CLONE_NEWPID | CLONE_NEWNS | CLONE_NEWIPC | CLONE_NEWUTS | CLONE_NEWUSER; flags = flags | SIGCHLD; child_pid = clone(child_main, (char *)(((uintptr_t)stack + sizeof(stack) - 8) & ~((uintptr_t)0xf)), flags, (void *) pipefds); if(child_pid == -1) die("clone()", 1); if(close(pipefds[0])) die("parent closing read pipe end", 2); if(snprintf(strbuf, sizeof(strbuf), "/proc/%d/uid_map", child_pid) >= sizeof(strbuf)) die("uid_map snprintf", 3); f = fopen(strbuf, "w"); if(!f) die("uid_map fopen", 4); if(snprintf(strbuf, sizeof(strbuf), "%u %u 1", geteuid(), geteuid()) >= sizeof(strbuf)) die("uid_map contents snprintf", 5); if(fwrite(strbuf, strlen(strbuf) * sizeof(char), 1, f) < 1) die("uid_map fwrite", 6); if(fclose(f) == EOF) die("uid_map fclose", 7); if(snprintf(strbuf, sizeof(strbuf), "/proc/%d/setgroups", child_pid) >= sizeof(strbuf)) die("setgroups snprintf", 8); f = fopen(strbuf, "w"); if(!f) die("setgroups fopen", 9); if(fwrite("deny", 4 * sizeof(char), 1, f) < 1) die("setgroups fwrite", 10); if(fclose(f) == EOF) die("setgroups fclose", 11); if(snprintf(strbuf, sizeof(strbuf), "/proc/%d/gid_map", child_pid) >= sizeof(strbuf)) die("gid_map snprintf", 12); f = fopen(strbuf, "w"); if(!f) die("gid_map fopen", 13); if(snprintf(strbuf, sizeof(strbuf), "%u %u 1", getegid(), getegid()) >= sizeof(strbuf)) die("gid_map contents snprintf", 14); if(fwrite(strbuf, strlen(strbuf) * sizeof(char), 1, f) < 1) die("gid_map fwrite", 15); if(fclose(f) == EOF) die("gid_map fclose", 16); if(write(pipefds[1], "y", sizeof("y") - (1 * sizeof(char))) < (1 * sizeof(char))) die("writing to pipe", 17); if(close(pipefds[1])) die("parent closing write pipe end", 18) while (1) { result = waitpid(child_pid, &status, 0); if(result == child_pid) return WEXITSTATUS(status); if(result == -1 && errno != EINTR) die("waitpid", 19); } } --8<---------------cut here---------------end--------------->8--- Here's what it looks like when I run it: --8<---------------cut here---------------start------------->8--- $ /tmp/test-program Self-exec'ed! Arguments: ??? /proc/self/exe readlink result: /tmp/test-program fopen /proc/self/exe readlink result error: No such file or directory ... and that file does not exist. /proc/self/exe can be opened --8<---------------cut here---------------end--------------->8--- > Here’s a test that fails both “rootfull” and “rootless”: > > (let* ((builder (add-file-tree-to-store %store > `("builder" symlink "/proc/self/exe"))) > (drv (derivation %store "attempt-to-run-guix-daemon" builder '() > #:env-vars > '(("LD_PRELOAD" . "attacker-controlled.so"))))) > (guard (c ((store-protocol-error? c) c)) > (build-derivations %store (list drv)) > #f)) From 'man 2 execve': ENOENT The file pathname or a script *or ELF interpreter* does not exist. (emphasis mine). The dynamic linker registered in guix-daemon's binary is not likely to exist in the container in this test, but an attacker could easily make it so as long as it's in the store. > diff --git a/nix/libstore/build.cc b/nix/libstore/build.cc > index 7484ea8fcf..970107c265 100644 > --- a/nix/libstore/build.cc > +++ b/nix/libstore/build.cc > @@ -2374,6 +2374,15 @@ void DerivationGoal::runChild() > writeFull(STDERR_FILENO, "error: " + string(e.what()) + "\n"); > _exit(1); > } > + } else { > + /* Ensure that the builder is within the store. This prevents > + users from using /proc/self/exe (or a symlink to it) as their > + builder, which could allow them to overwrite the guix-daemon > + binary (CVE-2019-5736). */ > + drv.builder = canonPath(drv.builder, true); > + printMsg(lvlError, format("builder is `%1%'") % drv.builder); > + if (!isInStore(drv.builder)) > + throw Error(format("derivation builder '%1%' is outside the store") % drv.builder); > } > > /* Fill in the arguments. */ Note that we should still supply the original name or basename as argv[0]. While ensuring that what actually gets execve'd is in the store suffices to eliminate the vulnerability, it may be "conceptually purer" to require that the links pointing to it are all in the store as well. For example, while a builder that is a symlink pointing to /proc/self/exe wouldn't be able to modify the daemon binary, it's still a piece of basically "undefined behavior" as far as the build environment is concerned, which could be closed up. But that can come later just as well. One more consideration I noticed when looking at v6's patch 14/16 (the guix-daemon.service one): we don't do anything to set the gid. I know on guix system we usually use both dedicated privilege separation users and groups for services. Should we use a dedicated group for guix-daemon as well? Note that currently the chroot directories have 0750 permissions, so it's very important that their group not be accessible to others. - reepca