Package: guix;
Reported by: yelninei <at> tutamail.com
Date: Tue, 22 Apr 2025 15:46:02 UTC
Severity: normal
Done: Ludovic Courtès <ludo <at> gnu.org>
Bug is archived. No further changes may be made.
Message #11 received at submit <at> debbugs.gnu.org:
From: Ludovic Courtès <ludo <at> gnu.org>
To: yelninei--- via Bug reports for GNU Guix <bug-guix <at> gnu.org>
Cc: 77992 <at> debbugs.gnu.org, yelninei <at> tutamail.com
Subject: Re: bug#77992: Childhurd stuck booting
Date: Wed, 23 Apr 2025 00:17:12 +0200
[Message part 1 (text/plain, inline)]
Hi,

yelninei--- via Bug reports for GNU Guix <bug-guix <at> gnu.org> writes:

> Reverting da741d89310efd0530351670d9c55ec2f952ab98 "services: account: Create /var/guix/profiles/per-user/$USER." fixes this, but I am not sure why.

Woow, thanks for bisecting this, I would never have thought this could be a problem.

I built the image for ‘bare-hurd.tmpl’ and booted it (with “console=com1” on the Mach command line) and here’s what we see:

--8<---------------cut here---------------start------------->8---
shepherd[1]: Starting service file-systems...
shepherd[1]: Service file-systems started.
shepherd[1]: Service file-systems running with value #t.
shepherd[1]: Service file-systems has been started.
shepherd[1]: Starting service user-homes...
shepherd[1]: Service user-homes failed to start.
shepherd[1]: Exception caught while starting user-homes: (misc-error "scm_fdes_to_port" "requested file mode not available on fdes" () #f)
shepherd[1]: Service loopback has been started.
shepherd[1]: Service loopback started.
shepherd[1]: Service loopback running with value #t.
--8<---------------cut here---------------end--------------->8---

The ‘user-homes’ service fails to start, so basically the system isn’t brought up.

The culprit appears to be ‘mkdir-p/perms’:

--8<---------------cut here---------------start------------->8---
ludo <at> childhurd ~$ rpctrace -o log guile -c '(use-modules (gnu build activation)) (mkdir-p/perms "foo/bar/baz" (getpwnam "ludo") #o755)'
Backtrace:
In ice-9/boot-9.scm:
  1752:10  7 (with-exception-handler _ _ #:unwind? _ # _)
In unknown file:
           6 (apply-smob/0 #<thunk 20f91a0>)
In ice-9/boot-9.scm:
    724:2  5 (call-with-prompt _ _ #<procedure default-prompt-handle?>)
In ice-9/eval.scm:
    619:8  4 (_ #(#(#<directory (guile-user) 20ec6e0>)))
In ice-9/command-line.scm:
   185:19  3 (_ #<input: string 2106fc0>)
In unknown file:
           2 (eval (mkdir-p/perms "foo/bar/baz" (getpwnam "ludo") #) #)
In gnu/build/activation.scm:
    97:20  1 (mkdir-p/perms _ #("ludo" "x" 1000 998 "Ludovic Cou?" ?) ?)
In unknown file:
           0 (open "." 7340032 #<undefined>)

ERROR: In procedure open:
In procedure scm_fdes_to_port: requested file mode not available on fdes
--8<---------------cut here---------------end--------------->8---

The relevant log snippet is this:

--8<---------------cut here---------------start------------->8---
  17<--33(pid168)->dir_lookup ("etc/passwd" 4194305 0) = 0 1 ""    66<--74(pid168)
  66<--74(pid168)->term_getctty () = 0xfffffed1 ((ipc/mig) bad request message ID)
  66<--74(pid168)->io_stat_request () = 0 {23 7 0 56029 0 1745320104 0 33188 1 0 0 1841 0 1745319370 840000000 1745319369 220000000 1745319369 220000000 8192 8 0 0 0 0 0 0 0 0 0 0 0}
  66<--74(pid168)->io_seek_request (0 0) = 0 0
  66<--74(pid168)->io_read_request (-1 8192) = 0 "root:x:0:0:System administrator:/root:/gnu/store/a1vynvd381hxsf979qzv8r25bc3pd2r"
task13(pid168)-> 3206 (pn{ 30}) = 0
  20<--32(pid168)->dir_lookup ("." 7340160 0) = 0 1 ""    66<--70(pid168)
  66<--70(pid168)->io_stat_request () = 0 {23 7 0 264001 0 1745320625 0 16832 3 1000 998 4096 0 1745342831 30000000 1745342821 950000000 1745319372 110000000 8192 8 0 0 0 8388736 8388736 8388736 8388736 8388736 8388736 8388736 8388736}
  66<--70(pid168)->term_getctty () = 0xfffffed1 ((ipc/mig) bad request message ID)
  66<--70(pid168)->io_get_openmodes_request () = 0 0
  25<--37(pid168)->io_write_request ("Backtrace:\n" -1) = 0 11
--8<---------------cut here---------------end--------------->8---

The ‘io_get_openmodes’ RPC corresponds to F_GETFL in ‘scm_i_fdes_is_valid’ in Guile.

Can be reproduced with just this:

  guile -c '(open "." O_DIRECTORY)'

I think ‘flags_to_mode’ in Guile returns “r” on Linux, which is fine because O_RDONLY is set. But on the Hurd, O_RDONLY is not set:

--8<---------------cut here---------------start------------->8---
ludo <at> childhurd ~$ guile -c '(pk %host-type (fcntl (open-fdes "." O_DIRECTORY) F_GETFL))'

;;; ("i586-pc-gnu" 0)
--8<---------------cut here---------------end--------------->8---

vs.:

--8<---------------cut here---------------start------------->8---
$ guile -c '(pk %host-type (fcntl (open-fdes "." O_DIRECTORY) F_GETFL))'

;;; ("x86_64-unknown-linux-gnu" 98304)
--8<---------------cut here---------------end--------------->8---

Long story short, O_RDONLY = 0 on Linux but it’s non-zero on the Hurd, so to placate ‘scm_i_fdes_is_valid’, we need to show it that the directory is opened with O_RDONLY:
[Message part 2 (text/x-patch, inline)]
diff --git a/gnu/build/activation.scm b/gnu/build/activation.scm
index 11f7c82d67..038d8327de 100644
--- a/gnu/build/activation.scm
+++ b/gnu/build/activation.scm
@@ -90,6 +90,7 @@ (define (mkdir-p/perms directory owner bits)
   ;; By combining O_NOFOLLOW and O_DIRECTORY, this procedure automatically
   ;; verifies that no components are symlinks.
   (define open-flags (logior O_CLOEXEC ; don't pass the port on to subprocesses
+                             O_RDONLY ;need on the Hurd, harmless on Linux
                              O_NOFOLLOW ; don't follow symlinks
                              O_DIRECTORY)) ; reject anything not a directory
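Editorial sketch (not part of the original message): the numbers in the logs above can be decoded by hand, which makes the O_RDONLY difference easier to see. The Python snippet below is only an illustration under stated assumptions. The flag values are hard-coded from the GNU/Hurd and x86_64 Linux headers as best recalled, so check them against your own headers, and `access_mode` is a hypothetical helper that only loosely imitates the kind of check described for Guile, not Guile's actual code.

```python
# ASSUMPTION: flag values as found in the GNU/Hurd and x86_64 Linux headers.
# They are hard-coded so the sketch runs anywhere; verify before relying on them.
HURD = {
    "O_RDONLY": 0x0001,      # non-zero on the Hurd
    "O_WRONLY": 0x0002,
    "O_RDWR": 0x0003,
    "O_NOFOLLOW": 0x00100000,
    "O_DIRECTORY": 0x00200000,
    "O_CLOEXEC": 0x00400000,
}
LINUX = {
    "O_RDONLY": 0,           # zero on Linux
    "O_WRONLY": 0o1,
    "O_RDWR": 0o2,
    "O_LARGEFILE": 0o100000,  # kernel value, reported by F_GETFL on x86_64
    "O_DIRECTORY": 0o200000,
    "O_NOFOLLOW": 0o400000,
    "O_CLOEXEC": 0o2000000,
}
O_ACCMODE = 0x3  # access-mode mask, the same on both systems


def flag_names(flags, table):
    """Sorted names of the non-access-mode flags set in FLAGS."""
    return sorted(name for name, bit in table.items()
                  if (bit & ~O_ACCMODE) and (flags & bit) == bit)


def access_mode(flags, table):
    """Hypothetical helper: map the access-mode bits to 'r', 'w' or 'r+'.

    Returns None when the bits match no access mode, which is the situation
    that triggers "requested file mode not available on fdes" in the thread.
    """
    acc = flags & O_ACCMODE
    for name, mode in (("O_RDONLY", "r"), ("O_WRONLY", "w"), ("O_RDWR", "r+")):
        if acc == table[name]:
            return mode
    return None


if __name__ == "__main__":
    # F_GETFL results quoted in the message.
    print("Linux 98304:", flag_names(98304, LINUX), access_mode(98304, LINUX))
    print("Hurd 0:", flag_names(0, HURD), access_mode(0, HURD))
    # Flags passed to 'open' in the backtrace, without and with O_RDONLY.
    print("Hurd 7340032:", flag_names(7340032, HURD), access_mode(7340032, HURD))
    patched = 7340032 | HURD["O_RDONLY"]
    print("Hurd patched:", flag_names(patched, HURD), access_mode(patched, HURD))
```

Under these assumed values, 98304 decodes to O_DIRECTORY plus O_LARGEFILE with access bits 0, which is O_RDONLY on Linux. On the Hurd, access bits 0 match no mode at all, and 7340032 is exactly O_CLOEXEC, O_NOFOLLOW and O_DIRECTORY with no access mode. Adding O_RDONLY, as the patch does, sets the low bit; the earlier lookup of "etc/passwd" with 4194305 (O_CLOEXEC plus O_RDONLY) shows the same pattern.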
[Message part 3 (text/plain, inline)]
Tested on both systems and it seems to work. Let me know how it goes for you!

> Finding this was a lot of trial and error (bisecting did not work
> because of the python cross compilation failure) but sshd not showing
> up is caught by the childhurd system test. Encountering a record ABI
> mismatch requiring a recompile of the entire guix tree slowed this
> down as well.

For the ABI mismatch, you could probably rebuild just the small subset of modules affected by this (for example, those that refer to <guix-configuration> if that’s what’s involved).

> Also https://issues.guix.gnu.org/77610 is causing the rest of the
> failures in the childhurd system test which expect the guix daemon to
> be available immediately. I started looking around in glibc and hurd
> but I haven't found a good setup yet to easily try changes without a
> full rebuild.

For such things, I found that testing interactively in QEMU is best.

Thanks for finding and debugging this!

Ludo’.