GNU bug report logs - #75810
[PATCH 0/6] Rootless guix-daemon

Previous Next

Package: guix-patches;

Reported by: Ludovic Courtès <ludo <at> gnu.org>

Date: Fri, 24 Jan 2025 17:24:02 UTC

Severity: normal

Tags: patch

Done: Ludovic Courtès <ludo <at> gnu.org>

Bug is archived. No further changes may be made.

Full log


Message #340 received at 75810 <at> debbugs.gnu.org (full text, mbox):

From: Reepca Russelstein <reepca <at> russelstein.xyz>
To: Ludovic Courtès <ludo <at> gnu.org>
Cc: 75810 <at> debbugs.gnu.org
Subject: Re: [bug#75810] [PATCH v6 00/16] Rootless guix-daemon
Date: Tue, 18 Mar 2025 18:19:04 -0500
[Message part 1 (text/plain, inline)]
Ludovic Courtès <ludo <at> gnu.org> writes:

>> For my two cents, I do think that it's still a tradeoff - not just
>> because of the reliance on different kernel mechanisms for security, but
>> also because the rootless daemon currently causes visible changes to the
>> build environment (EROFS on /, and nothing owned by root, for example).
>> Which one should we consider the "canonical" build environment going
>> forward?
>
> The way I see it, we would gradually move to the non-root daemon:
>
>   • First step here is to enable it by default on systemd distros.
>
>   • Second step would be to allow Guix System users to migrate to
>     non-root daemon, keeping the default unchanged.
>
>   • Third step (a year later maybe?) would be to default to non-root
>     daemon on Guix System and on remaining distros (though for these it
>     might be trickier because we probably cannot rely on CAP_SYS_CHOWN,
>     not as easily as with systemd at least).
>
> The visible changes in the build environment are unfortunate; I’m hoping
> they won’t have any practical impact, not any more than the other
> parameters that may change currently (build UID, binfmt_misc, file
> system, etc.)  We could test this hypothesis by rebuilding at least the
> entire set of packages up to ‘hello’.  (I tried doing it just now in a
> Debian VM but failed since the main partition cannot easily be extended;
> it’ll be easier to do with Guix System.)
>

For what it's worth, the visible changes could be avoided with
subordinate ids, as I wrote in
https://debbugs.gnu.org/cgi/bugreport.cgi?bug=75810#86.

We could try it as-is and see how it goes, and if there are problems
with reproducibility add on using subordinate ids.  I would expect it to
be a much smaller change than the root->rootless transition.

Hopefully it works well enough as-is that offload systems could be set
up without any special permissions.

> Then there are 3 possibilities:
>
>   1. If /proc/self/exe points to (say) /usr/bin/guix-daemon, outside the
>      store, execve fails with ENOENT because that file is not mounted in
>      the chroot.

No, like I wrote, /proc/self/exe, despite being reported as a symlink by
stat, does not follow the usual symlink semantics.  This is much like
how the files in /proc/self/fd work, e.g.:

scheme@(guile-user)> (open-file "/tmp/freshfile" "w+")
$1 = #<input-output: /tmp/freshfile 13>
scheme@(guile-user)> (delete-file "/tmp/freshfile")
scheme@(guile-user)> (stat:type (lstat "/proc/self/fd/13"))
$3 = symlink
scheme@(guile-user)> (readlink "/proc/self/fd/13")
$4 = "/tmp/freshfile (deleted)"
scheme@(guile-user)> (open-file "/proc/self/fd/13" "w+")
$5 = #<input-output: /proc/self/fd/13 14>

Here is a test program to demonstrate this (it's unfortunately rather
tricky to demonstrate using usual command line tools):

--8<---------------cut here---------------start------------->8---
/* Test program to demonstrate that /proc/self/exe does not behave like a
   symlink, and a process can exec /proc/self/exe even if there is no other
   filename by which the currently-executing program can be reached.

   Compile with -static (may require a 'guix shell gcc-toolchain glibc:static')
   and run. */


#define _GNU_SOURCE
#include <stdio.h>
#include <stdlib.h>
#include <stdint.h>
#include <string.h>
#include <unistd.h>
#include <sched.h>
#include <sys/param.h>
#include <sys/mount.h>
#include <sys/syscall.h>
#include <sys/wait.h>
#include <sys/stat.h>
#include <errno.h>
#include <signal.h>

#define die(msg, status) \
  do \
    { \
      perror(msg); \
      exit(status); \
    } while(0); \
  
#define pivot_root(new_root, put_old) (syscall(SYS_pivot_root, new_root,put_old))

int child_main(void * arg)
{
  int *pipefds = (int *) arg;
  char *argv[2] = {"???", NULL};
  int saved_errno;
  char c;
  if(close(pipefds[1]))
    die("Child closing write end of pipe", 20);

  if(read(pipefds[0], &c, 1 * sizeof(char)) < (1 * sizeof(char)))
    die("Reading from pipe", 21);

  if(c != 'y')
    die("Parent setup failed", 22);

  if(close(pipefds[0]))
    die("Child closing read end of pipe", 23);

  if(mkdir("/tmp/test-chroot", 0700) && errno != EEXIST)
    die("mkdir(\"/tmp/test-chroot\")", 24);

  if(mount(0, "/", 0, MS_REC|MS_PRIVATE, 0))
    die("making / private", 25);

  if(mount("/tmp/test-chroot", "/tmp/test-chroot", 0, MS_BIND, 0))
    die("making /tmp/test-chroot its own filesystem", 26);

  if(mkdir("/tmp/test-chroot/proc", 0700) && errno != EEXIST)
    die("mkdir(\"/tmp/test-chroot/proc\")", 27);

  if(mount("none", "/tmp/test-chroot/proc", "proc", 0, 0))
    die("mount procfs", 28);

  if(chdir("/tmp/test-chroot"))
    die("chdir to /tmp/test-chroot", 29);

  if(mkdir("real-root", 0))
    die("mkdir(\"real-root\")", 30);

  if(pivot_root(".", "real-root"))
    die("pivot_root", 31);

  if(chroot("."))
    die("chroot", 32);

  if(umount2("real-root", MNT_DETACH))
    die("unmounting real root", 33);

  if(rmdir("real-root"))
    die("removing real root directory", 34);

  if(execve("/proc/self/exe", argv, environ))
    {
      saved_errno = errno;
      fprintf(stderr, "execve errno: %d\n", saved_errno);
      errno = saved_errno;
      die("execve", 35);
    }
}



int main(int argc, char **argv)
{
  pid_t child_pid = -1;
  char stack[32 * 1024];
  int flags;
  int result;
  int status;
  FILE * f;
  char strbuf[512];
  int j;
  int pipefds[2];
  if(getenv("TEST_PROGRAM_MAGIC_ENV_VAR"))
    {
      fprintf(stderr, "Self-exec'ed!\n");
      fprintf(stderr, "Arguments: \n");
      for(j = 0; j < argc; j++) fprintf(stderr, "%s\n", argv[j]);
      j = readlink("/proc/self/exe", strbuf, sizeof(strbuf) - 1);
      if(j >= 0)
        {
          strbuf[j] = '\0';
          fprintf(stderr, "/proc/self/exe readlink result: %s\n", strbuf);
          f = fopen(strbuf, "r");
          if(f)
            {
              fprintf(stderr, "... and that file exists!\n");
              fclose(f);
            }
          else
            {
              perror("fopen /proc/self/exe readlink result error");
              fprintf(stderr, "... and that file does not exist.\n");
            }
          f = fopen("/proc/self/exe", "r");
          if(f)
            {
              fprintf(stderr, "/proc/self/exe can be opened\n");
              fclose(f);
            }
          else
            {
              perror("fopen /proc/self/exe error");
              fprintf(stderr, "/proc/self/exe cannot be opened\n");
            }
        }
      return 0;
    }
  setenv("TEST_PROGRAM_MAGIC_ENV_VAR", "1", 1);
  
  if(pipe(pipefds))
    die("pipe()", 1);

  flags = CLONE_NEWPID | CLONE_NEWNS | CLONE_NEWIPC | CLONE_NEWUTS | CLONE_NEWUSER;
  flags = flags | SIGCHLD; 
  child_pid = clone(child_main,
                    (char *)(((uintptr_t)stack + sizeof(stack) - 8) & ~((uintptr_t)0xf)),
                    flags,
                    (void *) pipefds);
  if(child_pid == -1)
    die("clone()", 1);

  if(close(pipefds[0]))
    die("parent closing read pipe end", 2);

  if(snprintf(strbuf, sizeof(strbuf), "/proc/%d/uid_map", child_pid)
     >= sizeof(strbuf))
    die("uid_map snprintf", 3);

  f = fopen(strbuf, "w");
  if(!f) die("uid_map fopen", 4);

  if(snprintf(strbuf, sizeof(strbuf), "%u %u 1", geteuid(), geteuid())
     >= sizeof(strbuf))
    die("uid_map contents snprintf", 5);

  if(fwrite(strbuf, strlen(strbuf) * sizeof(char), 1, f) < 1)
    die("uid_map fwrite", 6);

  if(fclose(f) == EOF)
    die("uid_map fclose", 7);

  if(snprintf(strbuf, sizeof(strbuf), "/proc/%d/setgroups", child_pid)
     >= sizeof(strbuf))
    die("setgroups snprintf", 8);

  f = fopen(strbuf, "w");
  if(!f) die("setgroups fopen", 9);

  if(fwrite("deny", 4 * sizeof(char), 1, f) < 1)
    die("setgroups fwrite", 10);

  if(fclose(f) == EOF)
    die("setgroups fclose", 11);

  if(snprintf(strbuf, sizeof(strbuf), "/proc/%d/gid_map", child_pid)
     >= sizeof(strbuf))
    die("gid_map snprintf", 12);

  f = fopen(strbuf, "w");
  if(!f) die("gid_map fopen", 13);

  if(snprintf(strbuf, sizeof(strbuf), "%u %u 1", getegid(), getegid())
     >= sizeof(strbuf))
    die("gid_map contents snprintf", 14);

  if(fwrite(strbuf, strlen(strbuf) * sizeof(char), 1, f) < 1)
    die("gid_map fwrite", 15);

  if(fclose(f) == EOF)
    die("gid_map fclose", 16);

  if(write(pipefds[1], "y", sizeof("y") - (1 * sizeof(char))) < (1 * sizeof(char)))
    die("writing to pipe", 17);

  if(close(pipefds[1]))
    die("parent closing write pipe end", 18)

  while (1)
    {
      result = waitpid(child_pid, &status, 0);
      if(result == child_pid)
        return WEXITSTATUS(status);
      if(result == -1 && errno != EINTR)
        die("waitpid", 19);
    }
}
--8<---------------cut here---------------end--------------->8---

Here's what it looks like when I run it:

--8<---------------cut here---------------start------------->8---
$ /tmp/test-program
Self-exec'ed!
Arguments: 
???
/proc/self/exe readlink result: /tmp/test-program
fopen /proc/self/exe readlink result error: No such file or directory
... and that file does not exist.
/proc/self/exe can be opened
--8<---------------cut here---------------end--------------->8---

> Here’s a test that fails both “rootfull” and “rootless”:
>
> (let* ((builder (add-file-tree-to-store %store
>                                         `("builder" symlink "/proc/self/exe")))
>        (drv (derivation %store "attempt-to-run-guix-daemon" builder '()
>                         #:env-vars
>                         '(("LD_PRELOAD" . "attacker-controlled.so")))))
>   (guard (c ((store-protocol-error? c) c))
>     (build-derivations %store (list drv))
>     #f))

From 'man 2 execve':

       ENOENT The file pathname or a script *or ELF interpreter* does not exist.

(emphasis mine).  The dynamic linker registered in guix-daemon's binary
is not likely to exist in the container in this test, but an attacker
could easily make it so as long as it's in the store.

> diff --git a/nix/libstore/build.cc b/nix/libstore/build.cc
> index 7484ea8fcf..970107c265 100644
> --- a/nix/libstore/build.cc
> +++ b/nix/libstore/build.cc
> @@ -2374,6 +2374,15 @@ void DerivationGoal::runChild()
>                  writeFull(STDERR_FILENO, "error: " + string(e.what()) + "\n");
>                  _exit(1);
>              }
> +        } else {
> +	    /* Ensure that the builder is within the store.  This prevents
> +	       users from using /proc/self/exe (or a symlink to it) as their
> +	       builder, which could allow them to overwrite the guix-daemon
> +	       binary (CVE-2019-5736).  */
> +	    drv.builder = canonPath(drv.builder, true);
> +	    printMsg(lvlError, format("builder is `%1%'") % drv.builder);
> +	    if (!isInStore(drv.builder))
> +		throw Error(format("derivation builder '%1%' is outside the store") % drv.builder);
>  	}
>  
>          /* Fill in the arguments. */

Note that we should still supply the original name or basename as
argv[0].

While ensuring that what actually gets execve'd is in the store suffices
to eliminate the vulnerability, it may be "conceptually purer" to
require that the links pointing to it are all in the store as well.  For
example, while a builder that is a symlink pointing to /proc/self/exe
wouldn't be able to modify the daemon binary, it's still a piece of
basically "undefined behavior" as far as the build environment is
concerned, which could be closed up.  But that can come later just as
well.



One more consideration I noticed when looking at v6's patch 14/16 (the
guix-daemon.service one): we don't do anything to set the gid.  I know
on guix system we usually use both dedicated privilege separation users
and groups for services.  Should we use a dedicated group for
guix-daemon as well?  Note that currently the chroot directories have
0750 permissions, so it's very important that their group not be
accessible to others.

- reepca
[signature.asc (application/pgp-signature, inline)]

This bug report was last modified 56 days ago.

Previous Next


GNU bug tracking system
Copyright (C) 1999 Darren O. Benham, 1997,2003 nCipher Corporation Ltd, 1994-97 Ian Jackson.