GNU bug report logs - #40981
shepherd 0.8.0 race condition can lead to stopping itself

Previous Next

Package: guix;

Reported by: Mathieu Othacehe <m.othacehe <at> gmail.com>

Date: Thu, 30 Apr 2020 11:52:02 UTC

Severity: important

Merged with 41429

Done: Mathieu Othacehe <mathieu <at> meru.i-did-not-set--mail-host-address--so-tickle-me>

Bug is archived. No further changes may be made.

Full log


View this message in rfc822 format

From: help-debbugs <at> gnu.org (GNU bug Tracking System)
To: Ludovic Courtès <ludo <at> gnu.org>
Cc: tracker <at> debbugs.gnu.org
Subject: bug#40981: closed (shepherd 0.8.0 race condition can lead to
 stopping itself)
Date: Mon, 11 May 2020 21:10:02 +0000
[Message part 1 (text/plain, inline)]
Your message dated Mon, 11 May 2020 23:09:05 +0200
with message-id <87o8qudvri.fsf <at> gnu.org>
and subject line Re: bug#40981: Graphical installer tests sometimes hang.
has caused the debbugs.gnu.org bug report #40981,
regarding shepherd 0.8.0 race condition can lead to stopping itself
to be marked as done.

(If you believe you have received this mail in error, please contact
help-debbugs <at> gnu.org.)


-- 
40981: http://debbugs.gnu.org/cgi/bugreport.cgi?bug=40981
GNU Bug Tracking System
Contact help-debbugs <at> gnu.org with problems
[Message part 2 (message/rfc822, inline)]
From: Mathieu Othacehe <m.othacehe <at> gmail.com>
To: bug-guix <at> gnu.org
Subject: Graphical installer tests sometimes hang.
Date: Thu, 30 Apr 2020 13:51:49 +0200
Hello,

Graphical installer tests sometimes hang, before starting marionette
tests. See for instance:
https://ci.guix.gnu.org/log/d31s9sycixhvfak5lpzdg0mzvz5aa9av-gui-installed-os-encrypted.

Restarting tests just after a hang (on the same installed image),
sometimes work. I removed the "quiet" kernel argument to see what's
going on.

--8<---------------cut here---------------start------------->8---
[    0.862608] Freeing unused kernel image memory: 1964K
[    0.863177] Run /init as init process
GC Warning: pthread_getattr_np or pthread_attr_getstack failed for main thread
GC Warning: Couldn't read /proc/stat
Welcome, this is GNU's early boot Guile.
Use '--repl' for an initrd REPL.

loading kernel modules...
[    0.915640] usbcore: registered new interface driver usb-storage
[    0.917800] usbcore: registered new interface driver uas
[    0.919569] hidraw: raw HID events driver (C) Jiri Kosina
[    0.920519] usbcore: registered new interface driver usbhid
[    0.921177] usbhid: USB HID core driver
[    0.933506] isci: Intel(R) C600 SAS Controller Driver - version 1.2.0
[    0.951303] PCI Interrupt Link [LNKD] enabled at IRQ 11
[    0.970144] PCI Interrupt Link [LNKA] enabled at IRQ 10
[    0.974033] virtio_blk virtio1: [vda] 4505600 512-byte logical blocks (2.31 GB/2.15 GiB)
[    0.976186]  vda: vda1 vda2

;; hanging here.
--8<---------------cut here---------------end--------------->8---

It seems that the boot freezes, soon after the initrd is started, and
before loading the boot script.

Mathieu


[Message part 3 (message/rfc822, inline)]
From: Ludovic Courtès <ludo <at> gnu.org>
To: Mathieu Othacehe <othacehe <at> gnu.org>
Cc: 40981-done <at> debbugs.gnu.org
Subject: Re: bug#40981: Graphical installer tests sometimes hang.
Date: Mon, 11 May 2020 23:09:05 +0200
Hello,

Mathieu Othacehe <othacehe <at> gnu.org> skribis:

>>> The work-around here is to save the installed SIGTERM handler and reset
>>> it. Then, after forking, the parent can restore the SIGTERM handler. The
>>> child will use the default SIGTERM handler that terminates the process.
>>
>> OK, makes sense.  (Another occasion to see how something like
>> ‘posix_spawn’ would be more appropriate than fork + exec…)
>
> Didn't know about that function, but it seems way easier to manipulate
> and less error prone indeed!

Make sure to read “A fork() in the Road” on that topic:

  https://lwn.net/Articles/785430/

>>> +# Try to trigger eventual race conditions, when killing a process between fork
>>> +# and execv calls.
>>> +for i in {1..50}
>>> +do
>>> +    $herd restart test3
>>> +done
>>
>> Would it reproduce the problem well enough?
>
> On a slow machine one time out of two, and on a faster machine,
> never. The 'reproducer' I used, was to add a 'sleep' between fork and
> exec, it works way better!
>
> Tell me if you think its better to drop it.

It’s better than nothing, let’s keep it.

>>From 79d3603bf15b8f815136178be8c8a236734a7596 Mon Sep 17 00:00:00 2001
> From: Mathieu Othacehe <m.othacehe <at> gmail.com>
> Date: Thu, 7 May 2020 18:39:41 +0200
> Subject: [PATCH] service: Fix 'make-kill-destructor' when PGID is zero.
>
> When a process is forked, and before its GID is changed in "exec-command",
> it will share the parent GID, which is 0 for Shepherd. In that case, use
> the PID instead of the PGID.
>
> Also make sure to reset the SIGTERM handler before forking a process. Failing
> to do so, will result in stopping Shepherd if a SIGTERM is received between
> fork and execv calls. Restore the SIGTERM handler once the process has been
> forked.
>
> * modules/shepherd/service.scm (fork+exec-command): Save the current SIGTERM
> handler and reset it before forking. Then, restore it on the parent after
> forking.
> (make-kill-destructor): Handle the case when PGID is zero, between the process
> fork and exec.

I added a “Fixes” line and pushed it.

Thanks a lot!

I can roll a 0.8.1 release soonish (I’d like to add signalfd support
while at it, we’ll see.)

Ludo’.


This bug report was last modified 4 years and 341 days ago.

Previous Next


GNU bug tracking system
Copyright (C) 1999 Darren O. Benham, 1997,2003 nCipher Corporation Ltd, 1994-97 Ian Jackson.