GNU bug report logs - #40981
shepherd 0.8.0 race condition can lead to stopping itself

Previous Next

Package: guix;

Reported by: Mathieu Othacehe <m.othacehe <at> gmail.com>

Date: Thu, 30 Apr 2020 11:52:02 UTC

Severity: important

Merged with 41429

Done: Mathieu Othacehe <mathieu <at> meru.i-did-not-set--mail-host-address--so-tickle-me>

Bug is archived. No further changes may be made.

Full log


View this message in rfc822 format

From: help-debbugs <at> gnu.org (GNU bug Tracking System)
To: Mathieu Othacehe <m.othacehe <at> gmail.com>
Subject: bug#40981: closed (Re: bug#40981: Graphical installer tests
 sometimes hang.)
Date: Mon, 11 May 2020 21:10:02 +0000
[Message part 1 (text/plain, inline)]
Your bug report

#40981: shepherd 0.8.0 race condition can lead to stopping itself

which was filed against the guix package, has been closed.

The explanation is attached below, along with your original report.
If you require more details, please reply to 40981 <at> debbugs.gnu.org.

-- 
40981: http://debbugs.gnu.org/cgi/bugreport.cgi?bug=40981
GNU Bug Tracking System
Contact help-debbugs <at> gnu.org with problems
[Message part 2 (message/rfc822, inline)]
From: Ludovic Courtès <ludo <at> gnu.org>
To: Mathieu Othacehe <othacehe <at> gnu.org>
Cc: 40981-done <at> debbugs.gnu.org
Subject: Re: bug#40981: Graphical installer tests sometimes hang.
Date: Mon, 11 May 2020 23:09:05 +0200
Hello,

Mathieu Othacehe <othacehe <at> gnu.org> skribis:

>>> The work-around here is to save the installed SIGTERM handler and reset
>>> it. Then, after forking, the parent can restore the SIGTERM handler. The
>>> child will use the default SIGTERM handler that terminates the process.
>>
>> OK, makes sense.  (Another occasion to see how something like
>> ‘posix_spawn’ would be more appropriate than fork + exec…)
>
> Didn't know about that function, but it seems way easier to manipulate
> and less error prone indeed!

Make sure to read “A fork() in the Road” on that topic:

  https://lwn.net/Articles/785430/

>>> +# Try to trigger eventual race conditions, when killing a process between fork
>>> +# and execv calls.
>>> +for i in {1..50}
>>> +do
>>> +    $herd restart test3
>>> +done
>>
>> Would it reproduce the problem well enough?
>
> On a slow machine one time out of two, and on a faster machine,
> never. The 'reproducer' I used, was to add a 'sleep' between fork and
> exec, it works way better!
>
> Tell me if you think its better to drop it.

It’s better than nothing, let’s keep it.

>>From 79d3603bf15b8f815136178be8c8a236734a7596 Mon Sep 17 00:00:00 2001
> From: Mathieu Othacehe <m.othacehe <at> gmail.com>
> Date: Thu, 7 May 2020 18:39:41 +0200
> Subject: [PATCH] service: Fix 'make-kill-destructor' when PGID is zero.
>
> When a process is forked, and before its GID is changed in "exec-command",
> it will share the parent GID, which is 0 for Shepherd. In that case, use
> the PID instead of the PGID.
>
> Also make sure to reset the SIGTERM handler before forking a process. Failing
> to do so, will result in stopping Shepherd if a SIGTERM is received between
> fork and execv calls. Restore the SIGTERM handler once the process has been
> forked.
>
> * modules/shepherd/service.scm (fork+exec-command): Save the current SIGTERM
> handler and reset it before forking. Then, restore it on the parent after
> forking.
> (make-kill-destructor): Handle the case when PGID is zero, between the process
> fork and exec.

I added a “Fixes” line and pushed it.

Thanks a lot!

I can roll a 0.8.1 release soonish (I’d like to add signalfd support
while at it, we’ll see.)

Ludo’.

[Message part 3 (message/rfc822, inline)]
From: Mathieu Othacehe <m.othacehe <at> gmail.com>
To: bug-guix <at> gnu.org
Subject: Graphical installer tests sometimes hang.
Date: Thu, 30 Apr 2020 13:51:49 +0200
Hello,

Graphical installer tests sometimes hang, before starting marionette
tests. See for instance:
https://ci.guix.gnu.org/log/d31s9sycixhvfak5lpzdg0mzvz5aa9av-gui-installed-os-encrypted.

Restarting tests just after a hang (on the same installed image),
sometimes work. I removed the "quiet" kernel argument to see what's
going on.

--8<---------------cut here---------------start------------->8---
[    0.862608] Freeing unused kernel image memory: 1964K
[    0.863177] Run /init as init process
GC Warning: pthread_getattr_np or pthread_attr_getstack failed for main thread
GC Warning: Couldn't read /proc/stat
Welcome, this is GNU's early boot Guile.
Use '--repl' for an initrd REPL.

loading kernel modules...
[    0.915640] usbcore: registered new interface driver usb-storage
[    0.917800] usbcore: registered new interface driver uas
[    0.919569] hidraw: raw HID events driver (C) Jiri Kosina
[    0.920519] usbcore: registered new interface driver usbhid
[    0.921177] usbhid: USB HID core driver
[    0.933506] isci: Intel(R) C600 SAS Controller Driver - version 1.2.0
[    0.951303] PCI Interrupt Link [LNKD] enabled at IRQ 11
[    0.970144] PCI Interrupt Link [LNKA] enabled at IRQ 10
[    0.974033] virtio_blk virtio1: [vda] 4505600 512-byte logical blocks (2.31 GB/2.15 GiB)
[    0.976186]  vda: vda1 vda2

;; hanging here.
--8<---------------cut here---------------end--------------->8---

It seems that the boot freezes, soon after the initrd is started, and
before loading the boot script.

Mathieu



This bug report was last modified 4 years and 341 days ago.

Previous Next


GNU bug tracking system
Copyright (C) 1999 Darren O. Benham, 1997,2003 nCipher Corporation Ltd, 1994-97 Ian Jackson.