GNU bug report logs -
#30637
[WIP] shepherd: Poll every 0.5s to find dead forked services
Message #17 received at 30637 <at> debbugs.gnu.org (full text, mbox):
Hey Ludo,
On Fri, Mar 02 2018, Ludovic Courtès wrote:
>> I am doing that. The problem is that when a service dies
>> (crashes, quits, etc.) the `respawn?` option cannot be honoured
>> because shepherd is not notified that the process has
>> terminated (because it never receives a SIGCHLD for the forked
>> pid). My patch polls for the processes we expect, to make up
>> for the lack of notification.
>
> I see.
>
> Actually, thinking more about it, we should be using
> PR_SET_CHILD_SUBREAPER from prctl(2), which is designed exactly
> for that.

Excellent! This is exactly the information that I needed. It is
what I had been looking for, but I didn't know enough to find it.
Thanks!

> So what about this plan:
>
> 1. Add FFI bindings in (shepherd system) for prctl(2). We
> should arrange for it to throw to 'system-error when the
> ‘prctl’ symbol is missing, as is the case on GNU/Hurd.

Are we okay with this simply not working on GNU/Hurd (or on Linux
kernels older than 3.4, according to the prctl man page)? Could we
fall back to a polling approach when prctl isn't available? I
don't really like the idea of this working on some kernels but not
others, given that process supervision is one of the main jobs of
shepherd.

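
For illustration, a polling fallback of the kind mentioned above could look roughly like the sketch below. It is written in Python rather than shepherd's Guile, and the names `process_alive` and `wait_until_dead` are hypothetical, not part of shepherd. It assumes that probing a PID with signal 0 is an acceptable liveness check, which ignores PID reuse and treats a not-yet-reaped zombie as still alive.

```python
import os
import time


def process_alive(pid):
    """Return True if a process with this PID currently exists."""
    try:
        os.kill(pid, 0)  # signal 0 performs only the existence/permission check
    except ProcessLookupError:
        return False  # ESRCH: no such process
    except PermissionError:
        return True  # EPERM: the process exists but belongs to another user
    return True


def wait_until_dead(pid, interval=0.5, timeout=10.0):
    """Poll every `interval` seconds.

    Return True once `pid` no longer exists, or False if `timeout`
    seconds pass first. Both defaults are illustrative assumptions.
    """
    deadline = time.monotonic() + timeout
    while process_alive(pid):
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)
    return True
```

A supervisor would run such a check for every PID it expects to be alive and trigger its respawn logic for those that have disappeared. Unlike a SIGCHLD-driven approach, it cannot recover the exit status of the dead process.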
> 2. Use prctl/PR_SET_CHILD_SUBREAPER in ‘exec-command’. Here we
> must ‘catch-system-error’ around that call to cater to
> GNU/Hurd.

Why would we need to set it in exec-command? It looks like it
modifies the state of the calling process, which means we'd want
to set it once in the shepherd process itself, not in each of the
child processes.

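
To make the point about the calling process concrete, here is a minimal sketch in Python (not shepherd's Guile code) of a supervisor marking itself as a child subreaper and then reaping a double-forked grandchild. It assumes Linux 3.4 or later and calls `prctl` from libc through `ctypes`; the helper names `become_subreaper` and `spawn_daemonizing_service` are hypothetical, and the constant 36 is the value of `PR_SET_CHILD_SUBREAPER` in `<linux/prctl.h>`.

```python
import ctypes
import os

PR_SET_CHILD_SUBREAPER = 36  # from <linux/prctl.h>, available since Linux 3.4


def become_subreaper():
    """Mark the calling process as a child subreaper; raise OSError on failure."""
    libc = ctypes.CDLL(None, use_errno=True)
    args = [ctypes.c_ulong(v) for v in (1, 0, 0, 0)]
    if libc.prctl(ctypes.c_int(PR_SET_CHILD_SUBREAPER), *args) != 0:
        err = ctypes.get_errno()
        raise OSError(err, os.strerror(err))


def spawn_daemonizing_service():
    """Imitate a daemonizing service: the child forks a grandchild and exits.

    Return the grandchild's PID, passed back through a pipe.
    """
    r, w = os.pipe()
    pid = os.fork()
    if pid == 0:  # intermediate child
        os.close(r)
        gpid = os.fork()
        if gpid == 0:  # grandchild: stands in for the real service
            os.close(w)
            os._exit(7)
        os.write(w, str(gpid).encode())
        os._exit(0)
    os.close(w)
    data = os.read(r, 32)
    os.close(r)
    os.waitpid(pid, 0)  # reap the intermediate child
    return int(data)


if __name__ == "__main__":
    become_subreaper()  # called once, in the supervisor itself
    gpid = spawn_daemonizing_service()
    # The orphaned grandchild was reparented to us, so waitpid works on it.
    reaped, status = os.waitpid(gpid, 0)
    print("reaped grandchild", reaped, "exit code", os.WEXITSTATUS(status))
```

Without the `become_subreaper()` call the grandchild would be reparented to PID 1, and the final `waitpid` would fail with `ECHILD`. That is the situation described at the top of this thread, where shepherd never receives SIGCHLD for the forked PID.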
> That would address the main issue without having to resort to
> polling. Respawning will work only when #:pid-file is used
> though, but that’s already an improvement.
>
> Thoughts?

I'll try to get this working in the next few days. Hopefully
you'll see a patch from me soon.

Carlo
This bug report was last modified 7 years and 134 days ago.