GNU bug report logs - #77992
Childhurd stuck booting

Package: guix

Reported by: yelninei <at> tutamail.com

Date: Tue, 22 Apr 2025 15:46:02 UTC

Severity: normal

Done: Ludovic Courtès <ludo <at> gnu.org>

Bug is archived. No further changes may be made.


Message #11 received at submit <at> debbugs.gnu.org:

From: Ludovic Courtès <ludo <at> gnu.org>
To: yelninei--- via Bug reports for GNU Guix <bug-guix <at> gnu.org>
Cc: 77992 <at> debbugs.gnu.org, yelninei <at> tutamail.com
Subject: Re: bug#77992: Childhurd stuck booting
Date: Wed, 23 Apr 2025 00:17:12 +0200
[Message part 1 (text/plain, inline)]
Hi,

yelninei--- via Bug reports for GNU Guix <bug-guix <at> gnu.org> writes:

> Reverting da741d89310efd0530351670d9c55ec2f952ab98 "services: account: Create /var/guix/profiles/per-user/$USER." fixes this, but I am not sure why.

Wow, thanks for bisecting this; I would never have thought this could be
a problem.

I built the image for ‘bare-hurd.tmpl’ and booted it (with
“console=com1” on the Mach command line) and here’s what we see:

--8<---------------cut here---------------start------------->8---
shepherd[1]: Starting service file-systems...
shepherd[1]: Service file-systems started.
shepherd[1]: Service file-systems running with value #t.
shepherd[1]: Service file-systems has been started.
shepherd[1]: Starting service user-homes...
shepherd[1]: Service user-homes failed to start.
shepherd[1]: Exception caught while starting user-homes: (misc-error "scm_fdes_to_port" "requested file mode not available on fdes" () #f)
shepherd[1]: Service loopback has been started.
shepherd[1]: Service loopback started.
shepherd[1]: Service loopback running with value #t.
--8<---------------cut here---------------end--------------->8---

The ‘user-homes’ service fails to start, so basically the system isn’t
brought up.

The culprit appears to be ‘mkdir-p/perms’:

--8<---------------cut here---------------start------------->8---
ludo <at> childhurd ~$ rpctrace -o log guile -c '(use-modules (gnu build activation)) (mkdir-p/perms "foo/bar/baz" (getpwnam "ludo") #o755)'
Backtrace:
In ice-9/boot-9.scm:
  1752:10  7 (with-exception-handler _ _ #:unwind? _ # _)
In unknown file:
           6 (apply-smob/0 #<thunk 20f91a0>)
In ice-9/boot-9.scm:
    724:2  5 (call-with-prompt _ _ #<procedure default-prompt-handle?>)
In ice-9/eval.scm:
    619:8  4 (_ #(#(#<directory (guile-user) 20ec6e0>)))
In ice-9/command-line.scm:
   185:19  3 (_ #<input: string 2106fc0>)
In unknown file:
           2 (eval (mkdir-p/perms "foo/bar/baz" (getpwnam "ludo") #) #)
In gnu/build/activation.scm:
    97:20  1 (mkdir-p/perms _ #("ludo" "x" 1000 998 "Ludovic Cou?" ?) ?)
In unknown file:
           0 (open "." 7340032 #<undefined>)

ERROR: In procedure open:
In procedure scm_fdes_to_port: requested file mode not available on fdes
--8<---------------cut here---------------end--------------->8---

The relevant log snippet is this:

--8<---------------cut here---------------start------------->8---
  17<--33(pid168)->dir_lookup ("etc/passwd" 4194305 0) = 0 1 ""    66<--74(pid168)
  66<--74(pid168)->term_getctty () = 0xfffffed1 ((ipc/mig) bad request message ID) 
  66<--74(pid168)->io_stat_request () = 0 {23 7 0 56029 0 1745320104 0 33188 1 0 0 1841 0 17453
19370 840000000 1745319369 220000000 1745319369 220000000 8192 8 0 0 0 0 0 0 0 0 0 0 0}
  66<--74(pid168)->io_seek_request (0 0) = 0 0
  66<--74(pid168)->io_read_request (-1 8192) = 0 "root:x:0:0:System administrator:/root:/gnu/st
ore/a1vynvd381hxsf979qzv8r25bc3pd2r"
task13(pid168)-> 3206 (pn{ 30}) = 0 
  20<--32(pid168)->dir_lookup ("." 7340160 0) = 0 1 ""    66<--70(pid168)
  66<--70(pid168)->io_stat_request () = 0 {23 7 0 264001 0 1745320625 0 16832 3 1000 998 4096 0
 1745342831 30000000 1745342821 950000000 1745319372 110000000 8192 8 0 0 0 8388736 8388736 838
8736 8388736 8388736 8388736 8388736 8388736}
  66<--70(pid168)->term_getctty () = 0xfffffed1 ((ipc/mig) bad request message ID) 
  66<--70(pid168)->io_get_openmodes_request () = 0 0
  25<--37(pid168)->io_write_request ("Backtrace:\n" -1) = 0 11
--8<---------------cut here---------------end--------------->8---

The ‘io_get_openmodes’ RPC corresponds to F_GETFL in
‘scm_i_fdes_is_valid’ in Guile.

Can be reproduced with just this:

  guile -c '(open "." O_DIRECTORY)'

I think ‘flags_to_mode’ in Guile returns “r” on Linux, which is fine
because O_RDONLY is zero there, so it is implicitly set.  But on the
Hurd, O_RDONLY is a distinct bit, and it is not set:

--8<---------------cut here---------------start------------->8---
ludo <at> childhurd ~$ guile -c '(pk %host-type (fcntl (open-fdes "." O_DIRECTORY) F_GETFL))'

;;; ("i586-pc-gnu" 0)
--8<---------------cut here---------------end--------------->8---

vs.:

--8<---------------cut here---------------start------------->8---
$ guile -c '(pk %host-type (fcntl (open-fdes "." O_DIRECTORY) F_GETFL))'

;;; ("x86_64-unknown-linux-gnu" 98304)
--8<---------------cut here---------------end--------------->8---
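
The same comparison can be sketched outside Guile.  The Python snippet
below is only an illustration, assuming a POSIX system where Python's
‘os’ and ‘fcntl’ modules expose the platform's own constants;
‘access_mode’ is a hypothetical helper, not part of Guile or Guix.  It
reports the access-mode bits that F_GETFL returns for a directory
descriptor, with and without an explicit O_RDONLY.

```python
import fcntl
import os


def access_mode(open_flags):
    """Open "." with OPEN_FLAGS and return the access-mode bits from F_GETFL.

    Hypothetical helper for illustration; not part of Guile or Guix.
    """
    fd = os.open(".", open_flags)
    try:
        return fcntl.fcntl(fd, fcntl.F_GETFL) & os.O_ACCMODE
    finally:
        os.close(fd)


# Explicitly requesting O_RDONLY: the descriptor reports read access,
# whatever numeric value O_RDONLY happens to have on this platform.
explicit = access_mode(os.O_DIRECTORY | os.O_RDONLY)
print("O_RDONLY =", os.O_RDONLY, "reported access mode =", explicit)

# Omitting O_RDONLY: equivalent on Linux, where O_RDONLY is 0.  Per the
# transcripts above, F_GETFL reports 0 on the Hurd in this case, which
# does not equal O_RDONLY there.
implicit = access_mode(os.O_DIRECTORY)
print("without O_RDONLY, reported access mode =", implicit)
```

Running it on both systems should show whether the two values agree; on
Linux they are expected to be identical.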

Long story short, O_RDONLY = 0 on Linux but it’s non-zero on the Hurd,
so to placate ‘scm_i_fdes_is_valid’, we need to show it that the
directory is opened with O_RDONLY:

[Message part 2 (text/x-patch, inline)]
diff --git a/gnu/build/activation.scm b/gnu/build/activation.scm
index 11f7c82d67..038d8327de 100644
--- a/gnu/build/activation.scm
+++ b/gnu/build/activation.scm
@@ -90,6 +90,7 @@ (define (mkdir-p/perms directory owner bits)
   ;; By combining O_NOFOLLOW and O_DIRECTORY, this procedure automatically
   ;; verifies that no components are symlinks.
   (define open-flags (logior O_CLOEXEC ; don't pass the port on to subprocesses
+                             O_RDONLY  ;needed on the Hurd, harmless on Linux
                              O_NOFOLLOW ; don't follow symlinks
                              O_DIRECTORY)) ; reject anything not a directory
 
[Message part 3 (text/plain, inline)]
Tested on both systems and it seems to work.
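
To make the reasoning concrete, here is a small model of the check,
written in Python for brevity.  It is a simplified sketch and not
Guile's actual ‘scm_i_fdes_is_valid’: the ‘Platform’ tuple and the
‘read_port_allowed’ function are hypothetical, and the constant values
are assumed from the glibc headers for Linux and for the Hurd.

```python
from collections import namedtuple

# Access-mode constants per platform (values assumed from the glibc headers).
Platform = namedtuple("Platform", "name O_RDONLY O_WRONLY O_RDWR O_ACCMODE")
LINUX = Platform("linux", O_RDONLY=0, O_WRONLY=1, O_RDWR=2, O_ACCMODE=3)
HURD = Platform("hurd", O_RDONLY=1, O_WRONLY=2, O_RDWR=3, O_ACCMODE=3)


def read_port_allowed(platform, getfl_flags):
    """Model: may a read-mode port wrap a descriptor whose F_GETFL
    result is GETFL_FLAGS?  (Simplified; not Guile's real logic.)"""
    access = getfl_flags & platform.O_ACCMODE
    return access in (platform.O_RDONLY, platform.O_RDWR)


# F_GETFL values observed in this thread for (open "." O_DIRECTORY).
print(read_port_allowed(LINUX, 98304))  # access bits are 0, i.e. O_RDONLY
print(read_port_allowed(HURD, 0))       # no access bit set: rejected

# With the patch, the Hurd descriptor carries the O_RDONLY bit.
print(read_port_allowed(HURD, HURD.O_RDONLY))
```

In this model the unpatched flags pass on Linux and fail on the Hurd,
while adding O_RDONLY makes the Hurd case pass and leaves Linux
unchanged, since OR-ing in zero is a no-op.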

Let me know how it goes for you!

> Finding this was a lot of trial and error (bisecting did not work
> because of the python cross compilation failure) but sshd not showing
> up is caught by the childhurd system test. Encountering a record ABI
> mismatch requiring a recompile of the entire guix tree slowed this
> down as well.

For the ABI mismatch, you could probably rebuild just the small subset
of modules affected by this (for example, those that refer to
<guix-configuration> if that’s what’s involved).

> Also https://issues.guix.gnu.org/77610 is causing the rest of the
> failures in the childhurd system test which expect the guix daemon to
> be available immediately. I started looking around in glibc and hurd
> but I haven't found a good setup yet to easily try changes without a
> full rebuild.

For such things, I found that testing interactively in QEMU is best.

Thanks for finding and debugging this!

Ludo’.

This bug report was last modified 31 days ago.
