GNU bug report logs - #30637
[WIP] shepherd: Poll every 0.5s to find dead forked services
Hey Ludo,
On Wed, Feb 28 2018, Ludovic Courtès wrote:
>> The problem is that shepherd, when run as a user process, can "lose"
>> services which fork away. Shepherd can still kill them, but a SIGCHLD
>> won't be delivered if they die, so shepherd can't restart/disable
>> them. My prime example is emacs, which I run with --daemon. If I then
>> kill emacs, shepherd will still think that it is running.
>
> There are two issues here, I think.
>
>   1. shepherd cannot lose SIGCHLD: if a process dies immediately once
>      it’s been spawned, as is the case with “emacs --daemon” or any
>      other daemon-style program, it should receive SIGCHLD and process
>      it.
Yeah, that's true, but the problem is that shepherd only processes
the SIGCHLD if there is a service with its `running` slot set to
the pid. When emacs forks, the original process may have its
SIGCHLD handled, but that doesn't affect shepherd's service state
(as it shouldn't, because it's using #:pid-file to track the
forked process).
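To make the mismatch concrete, here is a minimal Python model of the dispatch I'm describing. It is not shepherd's actual code: the `Service` class, the `handle_sigchld` function and the PIDs are all hypothetical, and only the idea of matching the dead PID against a `running` slot comes from the behaviour above.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Service:
    name: str
    running: Optional[int] = None  # PID being tracked, if any (hypothetical model)
    respawn: bool = False


def handle_sigchld(services, dead_pid):
    """Return the service whose tracked PID just exited, or None if no match."""
    for svc in services:
        if svc.running == dead_pid:
            svc.running = None  # the service is now considered stopped
            return svc
    return None  # the exit is reaped, but no service state changes


# Hypothetical PIDs: 1000 is the launcher process that forked and exited,
# 1001 is the daemon whose PID was read from the PID file.
emacs = Service("emacs", running=1001, respawn=True)
assert handle_sigchld([emacs], 1000) is None  # launcher exit matches nothing
assert emacs.running == 1001                  # service still looks "running"
```

The launcher's exit is handled, but since the tracked PID is the daemon's, nothing ever tells the service manager when PID 1001 goes away.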
>   2. shepherd currently can’t do much with real daemons. So what we do
>      in GuixSD is to either start programs in non-daemon mode, when
>      that’s an option, or pass #:pid-file to retrieve the forked process
>      PID. I think you should do one of these as well.
I am doing that. The problem is that when a service dies (crashes,
quits, etc.) the `respawn?` option cannot be honoured because
shepherd is not notified that the process has terminated (because
it never receives a SIGCHLD for the forked pid). My patch polls
for the processes we expect, to make up for the lack of
notification. I would much rather it receive an event/signal to
notify that the forked process has died, but I don't know how to
do that in a robust, portable way so I chose to poll instead.
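As a rough illustration of the polling idea (a Python sketch, not the patch itself, which is written for shepherd in Scheme), one pass over the tracked PIDs could look like this. The names `pid_alive` and `poll_once` and the `tracked` mapping are made up for the example; the only real API used is `os.kill` with signal 0, which checks for existence without sending anything.

```python
import os


def pid_alive(pid):
    """Return True if a process with this PID currently exists."""
    try:
        os.kill(pid, 0)  # signal 0: existence/permission check only
    except ProcessLookupError:
        return False     # no such process
    except PermissionError:
        return True      # process exists but belongs to another user
    return True


def poll_once(tracked, alive=pid_alive):
    """tracked maps service name -> PID; drop and return names whose PID is gone."""
    dead = [name for name, pid in tracked.items() if not alive(pid)]
    for name in dead:
        del tracked[name]
    return dead
```

Calling `poll_once` every 0.5 seconds and respawning or disabling whatever it returns gives the behaviour I'm after. One limitation I'm aware of: if the PID is reused by an unrelated process between two polls, the check will wrongly report the service as alive.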
If you look at my test case in tests/respawn-service.sh (which can
be read in its entirety in the diff attached to my previous email)
you can see the problem that this patch solves. The test will fail
without the rest of my patch, but will pass with it (guix build
container issue notwithstanding).
Carlo
This bug report was last modified 7 years and 133 days ago.
GNU bug tracking system
Copyright (C) 1999 Darren O. Benham,
1997,2003 nCipher Corporation Ltd,
1994-97 Ian Jackson.