GNU bug report logs - #76315
System does not boot after switching to system-log service


Package: guix

Reported by: Tomas Volf <~@wolfsden.cz>

Date: Sun, 16 Feb 2025 00:43:01 UTC

Severity: important

Done: Ludovic Courtès <ludo <at> gnu.org>

Bug is archived. No further changes may be made.

To add a comment to this bug, you must first unarchive it, by sending
a message to control AT debbugs.gnu.org, with unarchive 76315 in the body.
You can then email your comments to 76315 AT debbugs.gnu.org in the normal way.



Report forwarded to bug-guix <at> gnu.org:
bug#76315; Package guix. (Sun, 16 Feb 2025 00:43:02 GMT)

Acknowledgement sent to Tomas Volf <~@wolfsden.cz>:
New bug report received and forwarded. Copy sent to bug-guix <at> gnu.org. (Sun, 16 Feb 2025 00:43:02 GMT)

Message #5 received at submit <at> debbugs.gnu.org:

From: Tomas Volf <~@wolfsden.cz>
To: bug-guix <at> gnu.org
Subject: System does not boot after switching to system-log service
Date: Sun, 16 Feb 2025 01:41:50 +0100
Hello,

after pulling a recent Guix, I got this error during guix deploy:

--8<---------------cut here---------------start------------->8---
guix deploy: warning: an error occurred while upgrading services on '127.0.0.1':
%exception #<inferior-object #<&service-not-found-error service: system-log>> 
--8<---------------cut here---------------end--------------->8---

After rebooting, the system got stuck during startup.  No error message
was visible; it was just hanging.

Booting into the previous generation did work.

Tomas

-- 
There are only two hard things in Computer Science:
cache invalidation, naming things and off-by-one errors.

Information forwarded to bug-guix <at> gnu.org:
bug#76315; Package guix. (Sun, 16 Feb 2025 14:24:02 GMT)

Message #8 received at 76315 <at> debbugs.gnu.org:

From: Tomas Volf <~@wolfsden.cz>
To: 76315 <at> debbugs.gnu.org
Subject: Re: bug#76315: System does not boot after switching to system-log
 service
Date: Sun, 16 Feb 2025 15:23:12 +0100
I have put together a reproducer in a VM:

1. Install Guix system using 1.4.0 installer
  --> Include sshd, openbox

2. Reboot
3. Copy the /run/current-system/configuration.scm out of the VM
4. Adjust the configuration.scm (full file attached)
4.1 Allow NOPASSWD sudo
  (sudoers-file
   (plain-file "sudoers"
               (string-append (plain-file-content %sudoers-specification)
                              (format #f "x ALL = NOPASSWD: ALL~%"))))
4.2 Use %base-services, delete set-xorg-configuration service
4.3 Add dhcp-client-service-type service.
4.4 Authorize your key
  (simple-service
   'extra-authorized-keys guix-service-type
   (guix-extension
    (authorized-keys (list
                      (local-file "/etc/guix/signing-key.pub")))))

5. Manually tweak /etc/sudoers to support NOPASSWD for user x
6. Create machine configuration (full file attached)

7. Guix deploy the machine using b99df83c591104655a6b387817d8f7bb3c50204c
8. Reboot

9. Guix deploy the machine using 1afbf48b250f667ce45de40a6c275e3e42ade67c
  --> See the following error:
  
--8<---------------cut here---------------start------------->8---
building path(s) `/gnu/store/zdknxv3knkkxx52nwfbz120p32z4j2aa-upgrade-shepherd-services.scm'
building path(s) `/gnu/store/x7bzglpc0vvr5ak24k3i33ikq5ph8sfx-remote-exp.scm'
guix deploy: warning: an error occurred while upgrading services on 'localhost':
%exception #<inferior-object #<&service-not-found-error service: system-log>> 
--8<---------------cut here---------------end--------------->8---

A. Reboot
  --> The system does not come up (I gave it ~10 minutes).

[config.scm (text/x-scheme, inline)]
;; This is an operating system configuration generated
;; by the graphical installer.
;;
;; Once installation is complete, you can learn and modify
;; this file to tweak the system configuration, and pass it
;; to the 'guix system reconfigure' command to effect your
;; changes.


;; Indicate which modules to import to access the variables
;; used in this configuration.
(use-modules (gnu))
(use-service-modules cups desktop networking ssh xorg)

(operating-system
  (locale "en_US.utf8")
  (timezone "Europe/Prague")
  (keyboard-layout (keyboard-layout "us"))
  (host-name "x")

  ;; The list of user accounts ('root' is implicit).
  (users (cons* (user-account
                 (name "x")
                 (comment "X")
                 (group "users")
                 (home-directory "/home/x")
                 (supplementary-groups '("wheel" "netdev" "audio" "video")))
                %base-user-accounts))

  ;; Packages installed system-wide.  Users can also install packages
  ;; under their own account: use 'guix search KEYWORD' to search
  ;; for packages and 'guix install PACKAGE' to install a package.
  (packages (append (list (specification->package "openbox")
                          (specification->package "nss-certs"))
                    %base-packages))

  (sudoers-file
   (plain-file "sudoers"
               (string-append (plain-file-content %sudoers-specification)
                              (format #f "x ALL = NOPASSWD: ALL~%"))))

  ;; Below is the list of system services.  To search for available
  ;; services, run 'guix system search KEYWORD' in a terminal.
  (services
   (append (list
            (service dhcp-client-service-type)
            ;; To configure OpenSSH, pass an 'openssh-configuration'
            ;; record as a second argument to 'service' below.
            (service openssh-service-type)

            (simple-service
             'extra-authorized-keys guix-service-type
             (guix-extension
              (authorized-keys (list
                                (local-file "/etc/guix/signing-key.pub"))))))

           ;; This is the default list of services we
           ;; are appending to.
           %base-services))
  (bootloader (bootloader-configuration
               (bootloader grub-efi-bootloader)
               (targets (list "/boot/efi"))
               (keyboard-layout keyboard-layout)))
  (swap-devices (list (swap-space
                        (target (uuid
                                 "aa8dee07-5bf4-4ad2-8db7-8ee6139d6fc5")))))

  ;; The list of file systems that get "mounted".  The unique
  ;; file system identifiers there ("UUIDs") can be obtained
  ;; by running 'blkid' in a terminal.
  (file-systems (cons* (file-system
                         (mount-point "/boot/efi")
                         (device (uuid "79EB-4D57"
                                       'fat32))
                         (type "vfat"))
                       (file-system
                         (mount-point "/")
                         (device (uuid
                                  "11d0a98d-7200-4a9b-ae0a-0cb4db3e808d"
                                  'ext4))
                         (type "ext4")) %base-file-systems)))
[machine.scm (text/x-scheme, inline)]
(use-modules (gnu))

(use-service-modules networking ssh)
(use-package-modules bootloaders)

(list (machine
       (operating-system (primitive-load "config.scm"))
       (environment managed-host-environment-type)
       (configuration (machine-ssh-configuration
                       (build-locally? #f)
                       (host-name "localhost")
                       (system "x86_64-linux")
                       (user "x")
                       (port 8888)))))

Severity set to 'important' from 'normal' Request was from Ludovic Courtès <ludo <at> gnu.org> to control <at> debbugs.gnu.org. (Sun, 16 Feb 2025 17:47:02 GMT)

Information forwarded to bug-guix <at> gnu.org:
bug#76315; Package guix. (Sun, 16 Feb 2025 21:31:01 GMT)

Message #13 received at 76315 <at> debbugs.gnu.org:

From: Ludovic Courtès <ludo <at> gnu.org>
To: Tomas Volf <~@wolfsden.cz>
Cc: 76315 <at> debbugs.gnu.org
Subject: Re: bug#76315: System does not boot after switching to system-log
 service
Date: Sun, 16 Feb 2025 22:30:18 +0100
Hi,

Tomas Volf <~@wolfsden.cz> skribis:

> A. Reboot
>   --> The system does not come up (I gave it ~10 minutes).

I tried the config file you gave with:

  ./pre-inst-env guix system vm /tmp/config.scm

and it hangs, to my surprise (I’ve been using ‘system-log’ on my laptop
since June, and “make check-system TESTS=basic” & co. pass).

I’ll keep investigating and probably revert the change in the interim.

Ludo’.




Information forwarded to bug-guix <at> gnu.org:
bug#76315; Package guix. (Sun, 16 Feb 2025 22:22:02 GMT)

Message #16 received at 76315 <at> debbugs.gnu.org:

From: Ludovic Courtès <ludo <at> gnu.org>
To: Tomas Volf <~@wolfsden.cz>
Cc: 76315 <at> debbugs.gnu.org
Subject: Re: bug#76315: System does not boot after switching to system-log
 service
Date: Sun, 16 Feb 2025 23:20:57 +0100
Ludovic Courtès <ludo <at> gnu.org> skribis:

> I’ll keep investigating and probably revert the change in the interim.

Reverted in 8c483c12e94bcf43e4c44170f1d5fea5fbba4970.

Ludo'.




Information forwarded to bug-guix <at> gnu.org:
bug#76315; Package guix. (Wed, 19 Feb 2025 21:05:02 GMT)

Message #19 received at 76315 <at> debbugs.gnu.org:

From: Ludovic Courtès <ludo <at> gnu.org>
To: Tomas Volf <~@wolfsden.cz>
Cc: 76315 <at> debbugs.gnu.org
Subject: Re: bug#76315: System does not boot after switching to system-log
 service
Date: Wed, 19 Feb 2025 22:04:20 +0100
Hey Tomas,

Ludovic Courtès <ludo <at> gnu.org> skribis:

> I tried the config file you gave with:
>
>   ./pre-inst-env guix system vm /tmp/config.scm
>
> and it hangs, to my surprise (I’ve been using ‘system-log’ on my laptop
> since June, and “make check-system TESTS=basic” & co. pass).

After spending hours on this and fixing improbable issues in the
Shepherd (will push shortly), I found that the root of the problem is
exactly what I feared and which led to the patches at
<https://issues.guix.gnu.org/76262>.

Namely, ‘dhcp-client-service-type’ calls ‘waitpid’; that call competes
with the one done by shepherd’s SIGCHLD handler and, if you’re unlucky,
it loses the race and waits forever.  (I’m using
‘network-manager-service-type’ on my laptop, which is why I did not
stumble upon this bug.)
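
The underlying constraint is that a child's exit status can be collected only once. A minimal Guile sketch of two waiters competing for the same child follows. It is not shepherd or Guix code: `race-once` and the returned symbols are made up for illustration, and the first `waitpid` call merely stands in for a SIGCHLD handler that reaps children. In this simplified, single-threaded form the losing call fails immediately with ECHILD; the indefinite wait described above depends on shepherd internals that the sketch does not reproduce.

```scheme
;; Sketch only: two waiters competing for one child's exit status.
;; All names here are illustrative, not taken from shepherd or Guix.

(define (race-once)
  (let ((pid (primitive-fork)))
    (if (zero? pid)
        ;; Child: exit immediately without running exit hooks.
        (primitive-_exit 0)
        ;; Parent: the first waiter stands in for a SIGCHLD handler
        ;; that reaps whichever child has terminated.
        (let ((first (waitpid WAIT_ANY)))
          ;; The second waiter asks for the same PID.  The status has
          ;; already been consumed, so the kernel reports ECHILD.
          (catch 'system-error
            (lambda ()
              (waitpid pid)
              'second-waiter-got-status)
            (lambda args
              (if (and (= (car first) pid)
                       (= ECHILD (system-error-errno args)))
                  'second-waiter-lost
                  (apply throw args))))))))

(display (race-once))
(newline)
```

Run with `guile`, this should print `second-waiter-lost`: whichever waiter calls `waitpid` first consumes the status, and the other has nothing left to collect.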

Could you try your config with the patch at
<https://issues.guix.gnu.org/76262#2>, at least in a VM and ideally on
the metal?

Thanks in advance,
Ludo’.




Information forwarded to bug-guix <at> gnu.org:
bug#76315; Package guix. (Wed, 19 Feb 2025 21:09:02 GMT)

Message #22 received at 76315 <at> debbugs.gnu.org:

From: Ludovic Courtès <ludo <at> gnu.org>
To: Tomas Volf <~@wolfsden.cz>
Cc: 76315 <at> debbugs.gnu.org
Subject: Re: bug#76315: System does not boot after switching to system-log
 service
Date: Wed, 19 Feb 2025 22:07:53 +0100
Ludovic Courtès <ludo <at> gnu.org> skribis:

> Could you try your config with the patch at
> <https://issues.guix.gnu.org/76262#2>, at least in a VM and ideally on
> the metal?

You need to do that on top of a pre-revert commit, such as
eba8c08b1bfc7ac333a0eda658a0be5acac7f151.




Information forwarded to bug-guix <at> gnu.org:
bug#76315; Package guix. (Thu, 20 Feb 2025 21:33:02 GMT)

Message #25 received at 76315 <at> debbugs.gnu.org:

From: Tomas Volf <~@wolfsden.cz>
To: Ludovic Courtès <ludo <at> gnu.org>
Cc: 76315 <at> debbugs.gnu.org
Subject: Re: bug#76315: System does not boot after switching to system-log
 service
Date: Thu, 20 Feb 2025 22:32:03 +0100
Ludovic Courtès <ludo <at> gnu.org> writes:

> Hey Tomas,
>
> Ludovic Courtès <ludo <at> gnu.org> skribis:
>
>> I tried the config file you gave with:
>>
>>   ./pre-inst-env guix system vm /tmp/config.scm
>>
>> and it hangs, to my surprise (I’ve been using ‘system-log’ on my laptop
>> since June, and “make check-system TESTS=basic” & co. pass).
>
> After spending hours on this and fixing improbable issues in the
> Shepherd (will push shortly), I found that the root of the problem is
> exactly what I feared and which led to the patches at
> <https://issues.guix.gnu.org/76262>.
>
> Namely, ‘dhcp-client-service-type’ calls ‘waitpid’; that call competes
> with the one done by shepherd’s SIGCHLD handler and, if you’re unlucky,
> it loses the race and waits forever.

Observation here.  While yes, based on the description I agree that it
is (bad) luck based, in practice it seems to be extremely reliable to
reproduce.

At first I struggled to reproduce it again: it did not hang a single
time (out of 5 tries) on the bad commit.  But once I reverted my
configuration to what it was back then (i.e., removed a few shepherd
timers), the hang started happening every single time.

So, while in theory it should be a probabilistic problem, in practice it
does not seem to be the case.  Not sure where I am going with this, I
just think it is interesting.

>
> Could you try your config with the patch at
> <https://issues.guix.gnu.org/76262#2>, at least in a VM and ideally on
> the metal?

I have reverted your revert and applied patch 2 on top of that.

Steps I took (both in VM and on a spare laptop):

1. Reconfigure from commit 1.
2. Ensure it still hangs (5x).
3. Reconfigure from commit 2.
4. Ensure it no longer hangs (5x).

I can confirm that patch 2 fixes the issue for me, both in the VM and on
a physical machine.

The only thing I have noticed is that even when deploying the "good"
commit, I still see the following error in the log:

--8<---------------cut here---------------start------------->8---
guix deploy: warning: an error occurred while upgrading services on '127.0.0.1':
%exception #<inferior-object #<&service-not-found-error service: system-log>>
--8<---------------cut here---------------end--------------->8---

The system comes up fine after reboot though.

>
> Thanks in advance,
> Ludo’.

Thank you for figuring this one out. :)

Tomas


Information forwarded to bug-guix <at> gnu.org:
bug#76315; Package guix. (Fri, 21 Feb 2025 11:18:01 GMT)

Message #28 received at 76315 <at> debbugs.gnu.org:

From: Ludovic Courtès <ludo <at> gnu.org>
To: Tomas Volf <~@wolfsden.cz>
Cc: 76315 <at> debbugs.gnu.org
Subject: Re: bug#76315: System does not boot after switching to system-log
 service
Date: Fri, 21 Feb 2025 12:17:16 +0100
Hi,

Tomas Volf <~@wolfsden.cz> skribis:

>> After spending hours on this and fixing improbable issues in the
>> Shepherd (will push shortly), I found that the root of the problem is
>> exactly what I feared and which led to the patches at
>> <https://issues.guix.gnu.org/76262>.
>>
>> Namely, ‘dhcp-client-service-type’ calls ‘waitpid’; that call competes
>> with the one done by shepherd’s SIGCHLD handler and, if you’re unlucky,
>> it loses the race and waits forever.
>
> Observation here.  While yes, based on the description I agree that it
> is (bad) luck based, in practice it seems to be extremely reliable to
> reproduce.

Yes, I could reproduce it 100% with just ‘bare-bones.tmpl’.  Thing is,
as soon as you would change something non-trivial, for instance the
‘message-destination’ procedure of shepherd so that it writes everything
to /dev/console, the problem would go away.  Even just commenting out
some of the parameters passed to ‘system-log’ could make the problem
disappear (!), which is why it took me a lot of time to figure it out.

>> Could you try your config with the patch at
>> <https://issues.guix.gnu.org/76262#2>, at least in a VM and ideally on
>> the metal?

[...]

> I can confirm that patch 2 fixes the issue for me, both in the VM and on
> a physical machine.

Yay!

> The only thing I have noticed is that even when deploying the "good"
> commit, I still see the following error in the log:
>
> guix deploy: warning: an error occurred while upgrading services on '127.0.0.1':
> %exception #<inferior-object #<&service-not-found-error service: system-log>>

I think I understood this one now.

The old service has only one name: syslogd.  The new one, which upgrades
it, has two names: system-log and syslogd (system-log is its “canonical
name”).

The service upgrade machinery gets confused because it uses the
canonical name in one place.
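
The mismatch can be sketched in plain Guile. This is not the actual upgrade code from Guix or the Shepherd: each service is modelled as a bare list of names whose first element is the canonical one, and the procedure names and the other entries in the running list are assumptions made for illustration.

```scheme
;; Sketch only: why a lookup by canonical name can miss a service
;; that is still reachable through one of its other names.
(use-modules (srfi srfi-1))

;; Services as lists of names; the first name is the canonical one.
(define running-services            ; before the upgrade (illustrative)
  '((syslogd) (networking) (ssh-daemon sshd)))

(define new-service                 ; after the upgrade
  '(system-log syslogd))

(define (canonical-name service)
  (car service))

;; Matches only on the canonical name of each running service.
(define (lookup-by-canonical-name name services)
  (find (lambda (service)
          (eq? (canonical-name service) name))
        services))

;; Matches if any of the given names is provided by a running service.
(define (lookup-by-any-name names services)
  (find (lambda (service)
          (any (lambda (name) (memq name service)) names))
        services))

(display (lookup-by-canonical-name (canonical-name new-service)
                                   running-services))
(newline)
(display (lookup-by-any-name new-service running-services))
(newline)
```

The first lookup prints `#f`, because no running service is called `system-log` yet. The second prints `(syslogd)`, because the shared name `syslogd` links the old service to its replacement.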

I’ll investigate.

Ludo’.




Information forwarded to bug-guix <at> gnu.org:
bug#76315; Package guix. (Sun, 23 Feb 2025 14:50:02 GMT)

Message #31 received at 76315 <at> debbugs.gnu.org:

From: Ludovic Courtès <ludo <at> gnu.org>
To: Tomas Volf <~@wolfsden.cz>
Cc: 76315 <at> debbugs.gnu.org
Subject: Re: bug#76315: System does not boot after switching to system-log
 service
Date: Sun, 23 Feb 2025 15:49:44 +0100
Ludovic Courtès <ludo <at> gnu.org> skribis:

>> The only thing I have noticed is that even when deploying the "good"
>> commit, I still see the following error in the log:
>>
>> guix deploy: warning: an error occurred while upgrading services on '127.0.0.1':
>> %exception #<inferior-object #<&service-not-found-error service: system-log>>
>
> I think I understood this one now.

Patch 👉 https://issues.guix.gnu.org/76502




bug closed, send any further explanations to 76315 <at> debbugs.gnu.org and Tomas Volf <~@wolfsden.cz> Request was from Ludovic Courtès <ludo <at> gnu.org> to control <at> debbugs.gnu.org. (Mon, 07 Apr 2025 14:25:02 GMT)

bug archived. Request was from Debbugs Internal Request <help-debbugs <at> gnu.org> to internal_control <at> debbugs.gnu.org. (Tue, 06 May 2025 11:24:10 GMT)
