GNU bug report logs - #57827
Shepherd 0.9.2 possible regressions

Previous Next

Package: guix;

Reported by: Mathieu Othacehe <othacehe <at> gnu.org>

Date: Thu, 15 Sep 2022 13:06:02 UTC

Severity: important

Done: Ludovic Courtès <ludo <at> gnu.org>

Bug is archived. No further changes may be made.

Full log


View this message in rfc822 format

From: help-debbugs <at> gnu.org (GNU bug Tracking System)
To: Ludovic Courtès <ludo <at> gnu.org>
Cc: tracker <at> debbugs.gnu.org
Subject: bug#57827: closed (Shepherd 0.9.2 possible regressions)
Date: Tue, 20 Sep 2022 17:32:02 +0000
[Message part 1 (text/plain, inline)]
Your message dated Tue, 20 Sep 2022 19:30:53 +0200
with message-id <87r106lzqq.fsf <at> gnu.org>
and subject line Re: bug#57827: Shepherd 0.9.2 possible regressions
has caused the debbugs.gnu.org bug report #57827,
regarding Shepherd 0.9.2 possible regressions
to be marked as done.

(If you believe you have received this mail in error, please contact
help-debbugs <at> gnu.org.)


-- 
57827: https://debbugs.gnu.org/cgi/bugreport.cgi?bug=57827
GNU Bug Tracking System
Contact help-debbugs <at> gnu.org with problems
[Message part 2 (message/rfc822, inline)]
From: Mathieu Othacehe <othacehe <at> gnu.org>
To: bug-guix <at> gnu.org
Subject: Shepherd 0.9.2 possible regressions
Date: Thu, 15 Sep 2022 15:05:27 +0200
Hello,

Since Shepherd 0.9.2 the following tests are failing:

* cgit: https://ci.guix.gnu.org/build/1427375/details
* gitile https://ci.guix.gnu.org/build/1427377/details

It seems that an unexpected #<eof> object is received on the marionette
socket.

* gui-uefi-installed-os https://ci.guix.gnu.org/build/1431041/details
* gui-installed-os https://ci.guix.gnu.org/build/1431027/details
* gui-installed-os-encrypted https://ci.guix.gnu.org/build/1431040/details
* gui-installed-desktop-os-encrypted https://ci.guix.gnu.org/build/1431044/details

It seems that the Shepherd cannot be restarted in the install-system
procedure.

Thanks,

Mathieu


[Message part 3 (message/rfc822, inline)]
From: Ludovic Courtès <ludo <at> gnu.org>
To: Mathieu Othacehe <othacehe <at> gnu.org>
Cc: 57827-done <at> debbugs.gnu.org
Subject: Re: bug#57827: Shepherd 0.9.2 possible regressions
Date: Tue, 20 Sep 2022 19:30:53 +0200
Hi,

Mathieu Othacehe <othacehe <at> gnu.org> skribis:

> Regarding those four, I was able to reproduce the issue this way:
>
> $ guix repl
> (stop-service 'guix-daemon)
> (start-service 'guix-daemon (list (number->string (getpid))))

Or from the shell:

  herd stop guix-daemon
  herd start guix-daemon $$

I was able to reproduce it using a bare-bones.tmpl VM.

> The latter command hangs and Shepherd becomes unresponsive.  I collected
> an (attached) strace dump of Shepherd showing that there is no response
> on the socket when the service is started.
>
> Note that, this works:
>
> $ guix repl
> (stop-service 'guix-daemon)
> (start-service 'guix-daemon) 
>
> So the problem could be caused by the "container-excursion*" in the
> "fork+exec-command/container" procedure.

PID 1 gets stuck on read(16, …) forever, after reading the string “2866”
(a PID):

--8<---------------cut here---------------start------------->8---
[pid  2865] clone(child_stack=NULL, flags=CLONE_CHILD_CLEARTID|CLONE_CHILD_SETTID|SIGCHLDstrace: Process 2866 attached
, child_tidptr=0x7fccfbe00a10) = 2866
[pid  2866] set_robust_list(0x7fccfbe00a20, 24) = 0
[pid  2866] close(3)                    = 0
[pid  2865] write(39, "2866", 4 <unfinished ...>
[pid  2866] close(4 <unfinished ...>
[pid  2865] <... write resumed>)        = 4
[pid  2866] <... close resumed>)        = 0
[pid  2866] pipe2( <unfinished ...>
[pid  2865] close(39 <unfinished ...>
[pid  2866] <... pipe2 resumed>[3, 4], O_CLOEXEC) = 0
[pid  2865] <... close resumed>)        = 0
[pid  2865] exit_group(0)               = ?
[pid  2866] rt_sigaction(SIGCHLD, {sa_handler=SIG_DFL, sa_mask=[], sa_flags=SA_RESTORER, sa_restorer=0x7fccfc304d80}, {sa_handler=0x7fccfc427d50, sa_mask=[], sa_flags=SA_RESTORER|SA_NOCLDSTOP, sa_restorer=0x7fccfc304d80}, 8) = 0
[pid  2866] rt_sigaction(SIGINT, {sa_handler=SIG_DFL, sa_mask=[], sa_flags=SA_RESTORER, sa_restorer=0x7fccfc304d80}, {sa_handler=0x7fccfc427d50, sa_mask=[], sa_flags=SA_RESTORER, sa_restorer=0x7fccfc304d80}, 8) = 0
[pid  2866] rt_sigaction(SIGHUP, {sa_handler=SIG_DFL, sa_mask=[], sa_flags=SA_RESTORER, sa_restorer=0x7fccfc304d80}, {sa_handler=0x7fccfc427d50, sa_mask=[], sa_flags=SA_RESTORER, sa_restorer=0x7fccfc304d80}, 8) = 0
[pid  2866] rt_sigaction(SIGTERM, {sa_handler=SIG_DFL, sa_mask=[], sa_flags=SA_RESTORER, sa_restorer=0x7fccfc304d80}, {sa_handler=0x7fccfc427d50, sa_mask=[], sa_flags=SA_RESTORER, sa_restorer=0x7fccfc304d80}, 8) = 0
[pid  2866] rt_sigprocmask(SIG_UNBLOCK, [HUP INT TERM CHLD], [HUP INT TERM CHLD], 8) = 0
[pid  2866] mkdir("/var", 0777)         = -1 EEXIST (File exists)
[pid  2866] mkdir("/var/run", 0777)     = -1 EEXIST (File exists)
[pid  2865] +++ exited with 0 +++
[pid     1] <... wait4 resumed>[{WIFEXITED(s) && WEXITSTATUS(s) == 0}], 0, NULL) = 2865
[pid     1] close(39)                   = 0
[pid  2866] setsid( <unfinished ...>
[pid     1] read(16,  <unfinished ...>
[pid  2866] <... setsid resumed>)       = 2866
[pid     1] <... read resumed>"2866", 4096) = 4
[pid  2866] chdir("/")                  = 0
[pid     1] read(16,  <unfinished ...>
[pid  2866] prlimit64(0, RLIMIT_NOFILE, NULL, {rlim_cur=1024, rlim_max=4*1024}) = 0
[pid  2866] close(0)                    = 0
[pid  2866] openat(AT_FDCWD, "/dev/null", O_RDONLY) = 0
[pid  2866] dup2(0, 0)                  = 0
[pid  2866] close(1)                    = 0
[pid  2866] close(2)                    = 0
[pid  2866] openat(AT_FDCWD, "/var/log/guix-daemon.log", O_WRONLY|O_CREAT|O_APPEND, 0640) = 1
[pid  2866] dup2(1, 1)                  = 1
[pid  2866] dup2(1, 2)                  = 2
[pid  2866] execve("/gnu/store/bxnkqnpbf4q4z6245b61wgpm8gkr9nj1-guix-1.3.0-29.9e46320/bin/guix-daemon", ["/gnu/store/bxnkqnpbf4q4z6245b61w"..., "--build-users-group", "guixbuild", "--max-silent-time", "0", "--timeout", "0", "--log-compression", "gzip", "--discover=yes", "--substitute-urls", "https://substitutes.nonguix.org "...], 0x7fccf71fa480 /* 3 vars */) = 0
--8<---------------cut here---------------end--------------->8---

This happens because the other end of the file descriptor happens to be
inherited by 2866, which will never close it because it just execs
guix-daemon.

This is fixed by 6abdcef4a68e98f538ab69fde096adc5f5ca4ff4; the log
contains extra details.

Thanks!

Ludo’.


This bug report was last modified 2 years and 298 days ago.

Previous Next


GNU bug tracking system
Copyright (C) 1999 Darren O. Benham, 1997,2003 nCipher Corporation Ltd, 1994-97 Ian Jackson.