GNU bug report logs - #40981
shepherd 0.8.0 race condition can lead to stopping itself

Previous Next

Package: guix;

Reported by: Mathieu Othacehe <m.othacehe <at> gmail.com>

Date: Thu, 30 Apr 2020 11:52:02 UTC

Severity: important

Merged with 41429

Done: Mathieu Othacehe <mathieu <at> meru.i-did-not-set--mail-host-address--so-tickle-me>

Bug is archived. No further changes may be made.

Full log


Message #38 received at 40981-done <at> debbugs.gnu.org (full text, mbox):

From: Ludovic Courtès <ludo <at> gnu.org>
To: Mathieu Othacehe <othacehe <at> gnu.org>
Cc: 40981-done <at> debbugs.gnu.org
Subject: Re: bug#40981: Graphical installer tests sometimes hang.
Date: Mon, 11 May 2020 23:09:05 +0200
Hello,

Mathieu Othacehe <othacehe <at> gnu.org> skribis:

>>> The work-around here is to save the installed SIGTERM handler and reset
>>> it. Then, after forking, the parent can restore the SIGTERM handler. The
>>> child will use the default SIGTERM handler that terminates the process.
>>
>> OK, makes sense.  (Another occasion to see how something like
>> ‘posix_spawn’ would be more appropriate than fork + exec…)
>
> Didn't know about that function, but it seems way easier to manipulate
> and less error prone indeed!

Make sure to read “A fork() in the Road” on that topic:

  https://lwn.net/Articles/785430/

>>> +# Try to trigger eventual race conditions, when killing a process between fork
>>> +# and execv calls.
>>> +for i in {1..50}
>>> +do
>>> +    $herd restart test3
>>> +done
>>
>> Would it reproduce the problem well enough?
>
> On a slow machine one time out of two, and on a faster machine,
> never. The 'reproducer' I used, was to add a 'sleep' between fork and
> exec, it works way better!
>
> Tell me if you think its better to drop it.

It’s better than nothing, let’s keep it.

>>From 79d3603bf15b8f815136178be8c8a236734a7596 Mon Sep 17 00:00:00 2001
> From: Mathieu Othacehe <m.othacehe <at> gmail.com>
> Date: Thu, 7 May 2020 18:39:41 +0200
> Subject: [PATCH] service: Fix 'make-kill-destructor' when PGID is zero.
>
> When a process is forked, and before its GID is changed in "exec-command",
> it will share the parent GID, which is 0 for Shepherd. In that case, use
> the PID instead of the PGID.
>
> Also make sure to reset the SIGTERM handler before forking a process. Failing
> to do so, will result in stopping Shepherd if a SIGTERM is received between
> fork and execv calls. Restore the SIGTERM handler once the process has been
> forked.
>
> * modules/shepherd/service.scm (fork+exec-command): Save the current SIGTERM
> handler and reset it before forking. Then, restore it on the parent after
> forking.
> (make-kill-destructor): Handle the case when PGID is zero, between the process
> fork and exec.

I added a “Fixes” line and pushed it.

Thanks a lot!

I can roll a 0.8.1 release soonish (I’d like to add signalfd support
while at it, we’ll see.)

Ludo’.




This bug report was last modified 4 years and 341 days ago.

Previous Next


GNU bug tracking system
Copyright (C) 1999 Darren O. Benham, 1997,2003 nCipher Corporation Ltd, 1994-97 Ian Jackson.