GNU bug report logs -
#76315
System does not boot after switching to system-log service
Reported by: Tomas Volf <~@wolfsden.cz>
Date: Sun, 16 Feb 2025 00:43:01 UTC
Severity: important
Done: Ludovic Courtès <ludo <at> gnu.org>
Bug is archived. No further changes may be made.
Message #25 received at 76315 <at> debbugs.gnu.org (full text, mbox):
Ludovic Courtès <ludo <at> gnu.org> writes:
> Hey Tomas,
>
> Ludovic Courtès <ludo <at> gnu.org> writes:
>
>> I tried the config file you gave with:
>>
>> ./pre-inst-env guix system vm /tmp/config.scm
>>
>> and it hangs, to my surprise (I’ve been using ‘system-log’ on my laptop
>> since June, and “make check-system TESTS=basic” & co. pass).
>
> After spending hours on this and fixing improbable issues in the
> Shepherd (will push shortly), I found that the root of the problem is
> exactly what I feared and which led to the patches at
> <https://issues.guix.gnu.org/76262>.
>
> Namely, ‘dhcp-client-service-type’ calls ‘waitpid’; that call competes
> with the one done by shepherd’s SIGCHLD handler and, if you’re unlucky,
> it loses the race and waits forever.
One observation: although I agree, based on the description, that this
comes down to (bad) luck, in practice it is extremely reliable to
reproduce.
At first I struggled to reproduce it again: it did not hang a single
time (out of 5 tries) on the bad commit. But once I reverted my
configuration to what it was back then (i.e., removed a few shepherd
timers), the hang started happening every single time.
So, while in theory this should be a probabilistic problem, in practice
it does not seem to behave like one. Not sure where I am going with
this; I just think it is interesting.
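For readers unfamiliar with this kind of race, here is a minimal sketch in Python (not code from Guix or the Shepherd; the function name `reap_twice` is made up for illustration). It shows the underlying rule: a child's exit status can be collected only once, so when two callers wait on the same child, one of them loses. In plain POSIX the loser gets an error, as below; how that turns into waiting forever inside shepherd is not modeled here.

```python
import os

def reap_twice():
    """Fork a child, then let two 'waiters' try to collect its exit status."""
    pid = os.fork()
    if pid == 0:
        os._exit(7)  # child: exit immediately with a recognizable status

    # First waiter (stands in for a SIGCHLD handler). A real handler would
    # typically wait on any child (-1); the explicit pid keeps this sketch
    # deterministic.
    reaped_pid, status = os.waitpid(pid, 0)
    first = (reaped_pid == pid, os.WEXITSTATUS(status))

    # Second waiter (stands in for the service code): the status has already
    # been collected, so there is nothing left to wait for.
    try:
        os.waitpid(pid, 0)
        second = "reaped"
    except ChildProcessError:
        second = "lost"
    return first, second
```

The order is fixed here for clarity; in the real situation, which waiter runs first depends on timing.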
>
> Could you try your config with the patch at
> <https://issues.guix.gnu.org/76262#2>, at least in a VM and ideally on
> the metal?
I have reverted your revert and applied patch 2 on top of that.
Steps I took (both in VM and on a spare laptop):
1. Reconfigure from commit 1.
2. Ensure it still hangs (5x).
3. Reconfigure from commit 2.
4. Ensure it no longer hangs (5x).
I can confirm that patch 2 fixes the issue for me, both in the VM and on
the physical machine.
The only thing I have noticed is that, even when deploying the "good"
commit, I see the following error in the log:
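The "5x" checks in the steps above were done by hand. If one wanted to automate them, a rough sketch could look like the following. The helper `count_hangs` is hypothetical, and the command to run (for example, a script that boots the VM and exits once the system is up) is left to the caller; a run is counted as a hang when it does not finish within the timeout.

```python
import subprocess
import sys

def count_hangs(cmd, tries=5, timeout=60.0):
    """Run cmd `tries` times; count the runs that exceed `timeout` seconds."""
    hangs = 0
    for _ in range(tries):
        try:
            # subprocess.run kills the child and raises TimeoutExpired
            # when the timeout elapses.
            subprocess.run(cmd, timeout=timeout, check=False,
                           stdout=subprocess.DEVNULL,
                           stderr=subprocess.DEVNULL)
        except subprocess.TimeoutExpired:
            hangs += 1
    return hangs
```

With this, "still hangs (5x)" corresponds to a result of 5 and "no longer hangs (5x)" to a result of 0, for a suitably chosen timeout.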
--8<---------------cut here---------------start------------->8---
guix deploy: warning: an error occurred while upgrading services on '127.0.0.1':
%exception #<inferior-object #<&service-not-found-error service: system-log>>
--8<---------------cut here---------------end--------------->8---
The system comes up fine after reboot though.
>
> Thanks in advance,
> Ludo’.
Thank you for figuring this one out. :)
Tomas
--
There are only two hard things in Computer Science:
cache invalidation, naming things and off-by-one errors.
This bug report was last modified 40 days ago.