GNU bug report logs - #41948
Shepherd deadlocks
Reported by: Mathieu Othacehe <othacehe <at> gnu.org>
Date: Fri, 19 Jun 2020 08:42:01 UTC
Severity: important
Done: Ludovic Courtès <ludo <at> gnu.org>
Bug is archived. No further changes may be made.
Message #19 received at 41948 <at> debbugs.gnu.org:
Hey Ludo,
> We should be able to reproduce it with much simpler tests then, right?
> Like maybe “while : ; do herd restart guix-daemon ; done” or similar?
Well, I tried that without success. Then I had a closer look at the
strace log.
It turns out there are two concurrent "finalizer" threads:
--8<---------------cut here---------------start------------->8---
1 clone(child_stack=0x7f17981e6fb0, flags=CLONE_VM|CLONE_FS|CLONE_FILES|CLONE_SIGHAND|CLONE_THREAD|CLONE_SYSVSEM|CLONE_SETTLS|CLONE_PARENT_SETTID|CLONE_CHILD_CLEARTID, parent_tid=[271], tls=0x7f17981e7700, child_tidptr=0x7f17981e79d0) = 271
--8<---------------cut here---------------end--------------->8---
and this one,
--8<---------------cut here---------------start------------->8---
217 <... clone resumed>, parent_tid=[253], tls=0x7f1799309700, child_tidptr=0x7f17993099d0) = 253
--8<---------------cut here---------------end--------------->8---
The first one is spawned from Shepherd directly. The other one is
spawned from the forked process in "marionette-shepherd-service".
Those two finalizer threads share the same pipe, because the forked
process inherits the parent's pipe file descriptors. When we try to
stop the finalizer thread in Shepherd, right before forking a new
process, we send a '\1' byte to the finalizer pipe.
--8<---------------cut here---------------start------------->8---
1 write(6, "\1", 1 <unfinished ...>
--8<---------------cut here---------------end--------------->8---
which is received by the marionette finalizer thread (line 183597 of
the strace log):
--8<---------------cut here---------------start------------->8---
253 <... read resumed>"\1", 1) = 1
--8<---------------cut here---------------end--------------->8---
Then, we pthread_join the Shepherd finalizer thread, which never stops,
because the byte meant to wake it up was consumed by the other
process! Quite unfortunate.
A small reproducer is attached. So, unless I'm wrong, this is a Guile
issue that can cause any program making at least two primitive-fork
calls to hang.
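The core of the problem is plain POSIX behavior: after fork(), parent and child hold the same pipe, so a byte written by one process can be read by the other. Below is a minimal sketch, assuming a bare `os.pipe()` stands in for Guile's finalizer pipe and the child's `read()` plays the marionette finalizer thread. The function name `stolen_wakeup_demo` is hypothetical. Whichever process reads first gets the byte, and the other never sees it.

```python
import os


def stolen_wakeup_demo():
    """Return (byte read by the child, whether the parent still finds a byte)."""
    r, w = os.pipe()  # stands in for the finalizer pipe (assumption)
    pid = os.fork()
    if pid == 0:
        # Child: like a finalizer thread started after fork, it blocks on
        # the read end it inherited from the parent.
        code = 255
        try:
            code = os.read(r, 1)[0]
        finally:
            os._exit(code)  # never fall through into the parent's code
    # Parent: write the stop byte meant for its own finalizer thread.
    os.write(w, b"\1")
    _, status = os.waitpid(pid, 0)
    # Probe the parent's end without blocking; a blocking read here would
    # wait forever, which is what Shepherd's finalizer thread does.
    os.set_blocking(r, False)
    try:
        os.read(r, 1)
        parent_saw_byte = True
    except BlockingIOError:
        parent_saw_byte = False
    finally:
        os.close(r)
        os.close(w)
    return os.WEXITSTATUS(status), parent_saw_byte


print(stolen_wakeup_demo())  # (1, False)
```

The sequence is deterministic here because the parent reads only after the child has exited. In the real bug, both readers are blocked in `read()` at the same time and the kernel hands the byte to just one of them.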
I'm quite convinced that these two bugs are directly related to this one:
* https://issues.guix.info/31925
* https://issues.guix.gnu.org/42353
Now, regarding the fix, I guess that a process forked with
"primitive-fork" in Guile should close its parent's finalizer pipe and
open a new one.
WDYT?
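The proposed fix can be modeled with the same plain pipes. This is a hypothetical sketch, not Guile's actual primitive-fork code. The child closes both inherited ends and creates a private pipe for its own wake-ups, so the parent's '\1' stays in the parent's pipe. The function name `private_pipe_demo` and the child's '\2' byte are illustrative choices.

```python
import os


def private_pipe_demo():
    """Return (byte the child's own reader got, byte the parent's reader got)."""
    r, w = os.pipe()  # stands in for the parent's finalizer pipe (assumption)
    pid = os.fork()
    if pid == 0:
        code = 255
        try:
            # Child: drop the inherited ends instead of reading from them...
            os.close(r)
            os.close(w)
            # ...and open a fresh pipe for the child's own finalizer.
            cr, cw = os.pipe()
            os.write(cw, b"\2")  # the child's own wake-up byte
            code = os.read(cr, 1)[0]
        finally:
            os._exit(code)  # never fall through into the parent's code
    # Parent: send the stop byte to its own finalizer pipe.
    os.write(w, b"\1")
    _, status = os.waitpid(pid, 0)
    # The byte is still there: no other process holds the read end anymore.
    parent_byte = os.read(r, 1)
    os.close(r)
    os.close(w)
    return os.WEXITSTATUS(status), parent_byte


print(private_pipe_demo())  # (2, b'\x01')
```

With a private pipe per process, each finalizer only ever sees bytes written for it. The parent's `read()` therefore returns immediately instead of blocking.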
Thanks,
Mathieu
[t.scm (application/octet-stream, attachment)]