GNU bug report logs - #56674
[Shepherd] Use of ‘waitpid’, ‘system*’, etc. in service code can cause deadlocks

Package: guix

Reported by: Ludovic Courtès <ludo <at> gnu.org>

Date: Wed, 20 Jul 2022 21:40:01 UTC

Severity: important

Merged with 58926

Done: Ludovic Courtès <ludo <at> gnu.org>

Bug is archived. No further changes may be made.

From: Ludovic Courtès <ludo <at> gnu.org>
To: 56674 <at> debbugs.gnu.org
Subject: bug#56674: [Shepherd] Use of ‘waitpid’, ‘system*’, etc. in service code can cause deadlocks
Date: Wed, 20 Jul 2022 23:39:08 +0200

Hi!

We’ve just had a bad experience with the nginx service on berlin, where
‘herd restart nginx’ would cause shepherd to get stuck forever in
‘waitpid’ on the process that was supposed to start nginx.

The details are unclear, but one thing is clear: using ‘waitpid’
(either directly or indirectly via ‘system*’, which is what
‘nginx-service-type’ does) is not great:

  1. In the best case, shepherd (as of 0.9.1) is stuck while ‘system*’
     is in ‘waitpid’ waiting for child process completion (“stuck” as
     in: doesn’t do anything, not even answering ‘herd’ requests or
     inetd connections.)

  2. More generally, there’s a possibility that shepherd’s event loop
     will handle child process termination before some other user-made
     ‘waitpid’ call does.  I don’t think that can happen with ‘system*’
     (because it’s in C).

Anyway, that’s a bad situation.
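
To illustrate the second point outside of Shepherd, here is a minimal
Python sketch (not Guile, and not Shepherd code; the name ‘reap_twice’
is made up for this example).  It assumes a POSIX system, where a
child's exit status can be collected only once: after one caller has
reaped the child, a second ‘waitpid’ on the same PID fails with ECHILD.

```python
import os
import sys


def reap_twice():
    """Spawn a short-lived child, reap it, then try to reap it again."""
    # Start a child process that exits immediately with status 0.
    pid = os.posix_spawn(sys.executable,
                         [sys.executable, "-c", "pass"],
                         os.environ)

    # First waiter: stands in for an event loop that handles child
    # termination as soon as it happens.
    _, status = os.waitpid(pid, 0)
    exit_code = os.waitstatus_to_exitcode(status)

    # Second waiter: stands in for user code that calls waitpid itself.
    # The status has already been consumed, so the kernel reports ECHILD,
    # which Python surfaces as ChildProcessError.
    try:
        os.waitpid(pid, 0)
        outcome = "reaped twice"
    except ChildProcessError:
        outcome = "ECHILD"

    return exit_code, outcome


if __name__ == "__main__":
    print(reap_twice())
```

Whichever waiter loses the race never sees the exit status, so code that
depends on it has to cope with that failure mode.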

So I can think of two ways to address it:

  1. Change the nginx service ‘stop’ method to just
     (make-kill-destructor), which should work just as well as invoking
     “nginx -s stop”.

  2. Have Shepherd provide a replacement for ‘system*’.
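
As a rough sketch of what the second option could look like in spirit,
here is a Python analogy using asyncio (again not Guile and not
Shepherd's actual API; ‘system_star’ and ‘run_demo’ are hypothetical
names).  The idea is that waiting for the child suspends only the
calling task, so the event loop keeps serving other work in the
meantime.

```python
import asyncio
import sys


async def system_star(*argv):
    """Hypothetical non-blocking stand-in for system*.

    Spawns the command and waits for it cooperatively: only this
    coroutine is suspended, not the whole event loop.
    """
    proc = await asyncio.create_subprocess_exec(*argv)
    return await proc.wait()


async def _main():
    ticks = 0

    async def ticker():
        # Stands in for other work the loop must keep doing, such as
        # answering client requests.
        nonlocal ticks
        while True:
            ticks += 1
            await asyncio.sleep(0.01)

    task = asyncio.create_task(ticker())
    # The child sleeps for a moment to simulate a slow command.
    status = await system_star(sys.executable, "-c",
                               "import time; time.sleep(0.2)")
    task.cancel()
    try:
        await task
    except asyncio.CancelledError:
        pass
    return status, ticks


def run_demo():
    """Return (child exit status, number of ticks seen while waiting)."""
    return asyncio.run(_main())


if __name__ == "__main__":
    print(run_demo())
```

The tick counter keeps increasing while the child runs, which is the
property a blocking ‘waitpid’ in the main loop takes away.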

Thoughts?

Ludo’.




This bug report was last modified 2 years and 182 days ago.
