GNU bug report logs - #57827
Shepherd 0.9.2 possible regressions

Previous Next

Package: guix;

Reported by: Mathieu Othacehe <othacehe <at> gnu.org>

Date: Thu, 15 Sep 2022 13:06:02 UTC

Severity: important

Done: Ludovic Courtès <ludo <at> gnu.org>

Bug is archived. No further changes may be made.

Full log


View this message in rfc822 format

From: help-debbugs <at> gnu.org (GNU bug Tracking System)
To: Mathieu Othacehe <othacehe <at> gnu.org>
Subject: bug#57827: closed (Re: bug#57827: Shepherd 0.9.2 possible
 regressions)
Date: Tue, 20 Sep 2022 17:32:02 +0000
[Message part 1 (text/plain, inline)]
Your bug report

#57827: Shepherd 0.9.2 possible regressions

which was filed against the guix package, has been closed.

The explanation is attached below, along with your original report.
If you require more details, please reply to 57827 <at> debbugs.gnu.org.

-- 
57827: https://debbugs.gnu.org/cgi/bugreport.cgi?bug=57827
GNU Bug Tracking System
Contact help-debbugs <at> gnu.org with problems
[Message part 2 (message/rfc822, inline)]
From: Ludovic Courtès <ludo <at> gnu.org>
To: Mathieu Othacehe <othacehe <at> gnu.org>
Cc: 57827-done <at> debbugs.gnu.org
Subject: Re: bug#57827: Shepherd 0.9.2 possible regressions
Date: Tue, 20 Sep 2022 19:30:53 +0200
Hi,

Mathieu Othacehe <othacehe <at> gnu.org> skribis:

> Regarding those four, I was able to reproduce the issue this way:
>
> $ guix repl
> (stop-service 'guix-daemon)
> (start-service 'guix-daemon (list (number->string (getpid))))

Or from the shell:

  herd stop guix-daemon
  herd start guix-daemon $$

I was able to reproduce it using a bare-bones.tmpl VM.

> The latter command hangs and Shepherd becomes unresponsive.  I collected
> an (attached) strace dump of Shepherd showing that there is no response
> on the socket when the service is started.
>
> Note that, this works:
>
> $ guix repl
> (stop-service 'guix-daemon)
> (start-service 'guix-daemon) 
>
> So the problem could be caused by the "container-excursion*" in the
> "fork+exec-command/container" procedure.

PID 1 gets stuck on read(16, …) forever, after reading the string “2866”
(a PID):

--8<---------------cut here---------------start------------->8---
[pid  2865] clone(child_stack=NULL, flags=CLONE_CHILD_CLEARTID|CLONE_CHILD_SETTID|SIGCHLDstrace: Process 2866 attached
, child_tidptr=0x7fccfbe00a10) = 2866
[pid  2866] set_robust_list(0x7fccfbe00a20, 24) = 0
[pid  2866] close(3)                    = 0
[pid  2865] write(39, "2866", 4 <unfinished ...>
[pid  2866] close(4 <unfinished ...>
[pid  2865] <... write resumed>)        = 4
[pid  2866] <... close resumed>)        = 0
[pid  2866] pipe2( <unfinished ...>
[pid  2865] close(39 <unfinished ...>
[pid  2866] <... pipe2 resumed>[3, 4], O_CLOEXEC) = 0
[pid  2865] <... close resumed>)        = 0
[pid  2865] exit_group(0)               = ?
[pid  2866] rt_sigaction(SIGCHLD, {sa_handler=SIG_DFL, sa_mask=[], sa_flags=SA_RESTORER, sa_restorer=0x7fccfc304d80}, {sa_handler=0x7fccfc427d50, sa_mask=[], sa_flags=SA_RESTORER|SA_NOCLDSTOP, sa_restorer=0x7fccfc304d80}, 8) = 0
[pid  2866] rt_sigaction(SIGINT, {sa_handler=SIG_DFL, sa_mask=[], sa_flags=SA_RESTORER, sa_restorer=0x7fccfc304d80}, {sa_handler=0x7fccfc427d50, sa_mask=[], sa_flags=SA_RESTORER, sa_restorer=0x7fccfc304d80}, 8) = 0
[pid  2866] rt_sigaction(SIGHUP, {sa_handler=SIG_DFL, sa_mask=[], sa_flags=SA_RESTORER, sa_restorer=0x7fccfc304d80}, {sa_handler=0x7fccfc427d50, sa_mask=[], sa_flags=SA_RESTORER, sa_restorer=0x7fccfc304d80}, 8) = 0
[pid  2866] rt_sigaction(SIGTERM, {sa_handler=SIG_DFL, sa_mask=[], sa_flags=SA_RESTORER, sa_restorer=0x7fccfc304d80}, {sa_handler=0x7fccfc427d50, sa_mask=[], sa_flags=SA_RESTORER, sa_restorer=0x7fccfc304d80}, 8) = 0
[pid  2866] rt_sigprocmask(SIG_UNBLOCK, [HUP INT TERM CHLD], [HUP INT TERM CHLD], 8) = 0
[pid  2866] mkdir("/var", 0777)         = -1 EEXIST (File exists)
[pid  2866] mkdir("/var/run", 0777)     = -1 EEXIST (File exists)
[pid  2865] +++ exited with 0 +++
[pid     1] <... wait4 resumed>[{WIFEXITED(s) && WEXITSTATUS(s) == 0}], 0, NULL) = 2865
[pid     1] close(39)                   = 0
[pid  2866] setsid( <unfinished ...>
[pid     1] read(16,  <unfinished ...>
[pid  2866] <... setsid resumed>)       = 2866
[pid     1] <... read resumed>"2866", 4096) = 4
[pid  2866] chdir("/")                  = 0
[pid     1] read(16,  <unfinished ...>
[pid  2866] prlimit64(0, RLIMIT_NOFILE, NULL, {rlim_cur=1024, rlim_max=4*1024}) = 0
[pid  2866] close(0)                    = 0
[pid  2866] openat(AT_FDCWD, "/dev/null", O_RDONLY) = 0
[pid  2866] dup2(0, 0)                  = 0
[pid  2866] close(1)                    = 0
[pid  2866] close(2)                    = 0
[pid  2866] openat(AT_FDCWD, "/var/log/guix-daemon.log", O_WRONLY|O_CREAT|O_APPEND, 0640) = 1
[pid  2866] dup2(1, 1)                  = 1
[pid  2866] dup2(1, 2)                  = 2
[pid  2866] execve("/gnu/store/bxnkqnpbf4q4z6245b61wgpm8gkr9nj1-guix-1.3.0-29.9e46320/bin/guix-daemon", ["/gnu/store/bxnkqnpbf4q4z6245b61w"..., "--build-users-group", "guixbuild", "--max-silent-time", "0", "--timeout", "0", "--log-compression", "gzip", "--discover=yes", "--substitute-urls", "https://substitutes.nonguix.org "...], 0x7fccf71fa480 /* 3 vars */) = 0
--8<---------------cut here---------------end--------------->8---

This happens because the other end of the file descriptor happens to be
inherited by 2866, which will never close it because it just execs
guix-daemon.

This is fixed by 6abdcef4a68e98f538ab69fde096adc5f5ca4ff4; the log
contains extra details.

Thanks!

Ludo’.

[Message part 3 (message/rfc822, inline)]
From: Mathieu Othacehe <othacehe <at> gnu.org>
To: bug-guix <at> gnu.org
Subject: Shepherd 0.9.2 possible regressions
Date: Thu, 15 Sep 2022 15:05:27 +0200
Hello,

Since Shepherd 0.9.2 the following tests are failing:

* cgit: https://ci.guix.gnu.org/build/1427375/details
* gitile https://ci.guix.gnu.org/build/1427377/details

It seems that an unexpected #<eof> object is received on the marionette
socket.

* gui-uefi-installed-os https://ci.guix.gnu.org/build/1431041/details
* gui-installed-os https://ci.guix.gnu.org/build/1431027/details
* gui-installed-os-encrypted https://ci.guix.gnu.org/build/1431040/details
* gui-installed-desktop-os-encrypted https://ci.guix.gnu.org/build/1431044/details

It seems that the Shepherd cannot be restarted in the install-system
procedure.

Thanks,

Mathieu



This bug report was last modified 2 years and 295 days ago.

Previous Next


GNU bug tracking system
Copyright (C) 1999 Darren O. Benham, 1997,2003 nCipher Corporation Ltd, 1994-97 Ian Jackson.