GNU bug report logs - #30637
[WIP] shepherd: Poll every 0.5s to find dead forked services


Package: guix-patches;

Reported by: Carlo Zancanaro <carlo <at> zancanaro.id.au>

Date: Tue, 27 Feb 2018 21:58:02 UTC

Severity: normal

Done: ludo <at> gnu.org (Ludovic Courtès)

Bug is archived. No further changes may be made.



Message #17 received at 30637 <at> debbugs.gnu.org:

From: Carlo Zancanaro <carlo <at> zancanaro.id.au>
To: Ludovic Courtès <ludo <at> gnu.org>
Cc: 30637 <at> debbugs.gnu.org
Subject: Re: [bug#30637] [WIP] shepherd: Poll every 0.5s to find dead forked
 services
Date: Fri, 02 Mar 2018 21:13:53 +1100
Hey Ludo,

On Fri, Mar 02 2018, Ludovic Courtès wrote:
>> I am doing that. The problem is that when a service dies 
>> (crashes, quits, etc.) the `respawn?` option cannot be honoured 
>> because shepherd is not notified that the process has 
>> terminated (because it never receives a SIGCHLD for the forked 
>> pid). My patch polls for the processes we expect, to make up 
>> for the lack of notification.
>
> I see.
>
> Actually, thinking more about it, we should be using 
> PR_SET_CHILD_SUBREAPER from prctl(2), which is designed exactly 
> for that.
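For reference, here is one way the polling approach described in the quoted text could look: check every 0.5 s whether each expected pid still exists. This is a minimal C sketch, not the WIP patch itself, which is Guile Scheme. It assumes `kill(pid, 0)` as the liveness test. The helper names `process_exists` and `wait_for_any_dead` are hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <signal.h>
#include <stdbool.h>
#include <sys/types.h>
#include <time.h>

/* True if a process with this pid still exists.  Signal 0 performs only
   the existence/permission check; EPERM means the process exists but we
   may not signal it.  Caveat: a zombie still "exists" until it is reaped,
   which is fine here because a forked daemon is not our child and is
   reaped by whichever process it was reparented to. */
bool process_exists(pid_t pid)
{
    return kill(pid, 0) == 0 || errno == EPERM;
}

/* Hypothetical polling loop: scan the expected pids every 0.5 s and
   return the first one that has disappeared. */
pid_t wait_for_any_dead(const pid_t *pids, size_t n)
{
    const struct timespec interval = { 0, 500000000L };  /* 0.5 s */
    for (;;) {
        for (size_t i = 0; i < n; i++)
            if (!process_exists(pids[i]))
                return pids[i];
        nanosleep(&interval, NULL);
    }
}
```

The cost of this design is latency (up to one interval before a death is noticed) plus a wakeup every half second even when nothing happens.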

Excellent! This is exactly the information I needed. It's what 
I've been looking for, but I didn't know enough to be able to 
find it. Thanks!
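The following minimal C sketch shows what PR_SET_CHILD_SUBREAPER provides, assuming Linux 3.4 or later. Once the supervisor marks itself as a subreaper, a daemon that double-forks is reparented to the supervisor instead of PID 1. As a result, `waitpid` on the daemon's pid succeeds and the supervisor learns of its death. The helpers `become_subreaper` and `spawn_and_reap_daemon` are hypothetical, and the pipe stands in for a pid file.

```c
#define _GNU_SOURCE
#include <sys/prctl.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <time.h>
#include <unistd.h>

/* Mark the calling process as a child subreaper.  Returns 0 on success,
   -1 with errno set (e.g. EINVAL on kernels without this option). */
int become_subreaper(void)
{
    return prctl(PR_SET_CHILD_SUBREAPER, 1, 0, 0, 0);
}

/* Simulate a service that daemonizes: the child forks a grandchild (the
   "daemon"), reports its pid over a pipe, and exits, orphaning it.
   Returns the daemon's pid once it has been reaped, or -1 on failure. */
pid_t spawn_and_reap_daemon(void)
{
    int fds[2];
    if (pipe(fds) == -1)
        return -1;

    pid_t child = fork();
    if (child == -1)
        return -1;
    if (child == 0) {
        close(fds[0]);
        pid_t daemon = fork();
        if (daemon == 0) {
            /* The daemon runs briefly, then dies. */
            const struct timespec t = { 0, 100000000L };  /* 0.1 s */
            close(fds[1]);
            nanosleep(&t, NULL);
            _exit(0);
        }
        /* Report the daemon's pid (or -1 if fork failed) and exit. */
        ssize_t n = write(fds[1], &daemon, sizeof daemon);
        _exit(n == (ssize_t) sizeof daemon ? 0 : 1);
    }

    close(fds[1]);
    pid_t daemon_pid = -1;
    ssize_t n = read(fds[0], &daemon_pid, sizeof daemon_pid);
    close(fds[0]);
    waitpid(child, NULL, 0);  /* reap the short-lived intermediate child */
    if (n != (ssize_t) sizeof daemon_pid || daemon_pid <= 0)
        return -1;

    /* The orphaned daemon was reparented to us because we are a
       subreaper; without that bit this waitpid() fails with ECHILD. */
    int status;
    if (waitpid(daemon_pid, &status, 0) != daemon_pid)
        return -1;
    return daemon_pid;
}
```

Call `become_subreaper()` once at startup. After that, `spawn_and_reap_daemon()` should return the daemon's pid instead of -1.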

> So what about this plan:
>
>   1. Add FFI bindings in (shepherd system) for prctl(2). We 
>   should arrange for it to throw to 'system-error when the 
>   ‘prctl’ symbol is missing, as is the case on GNU/Hurd.

Are we okay with having this just not work on GNU/Hurd (or on kernels 
older than 3.4, according to the prctl man page)? Could we fall 
back to a polling approach if prctl isn't available? I don't 
really like the idea of this working on some kernels but not 
others, given that process supervision is one of the main jobs of 
shepherd.
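One possible shape for the fallback discussed here is sketched below in C. It is only an illustration, not the plan agreed in the thread. The compile-time check plays the role of "the prctl symbol is missing" (as on GNU/Hurd). A failing call covers old kernels, where prctl(2) reports EINVAL for an unknown option. The enum and function name are hypothetical.

```c
#ifdef __linux__
#include <sys/prctl.h>
#endif

/* Hypothetical: how the supervisor will notice dead forked services. */
enum supervision_mode { SUPERVISE_SUBREAPER, SUPERVISE_POLLING };

enum supervision_mode choose_supervision_mode(void)
{
#if defined(__linux__) && defined(PR_SET_CHILD_SUBREAPER)
    if (prctl(PR_SET_CHILD_SUBREAPER, 1, 0, 0, 0) == 0)
        return SUPERVISE_SUBREAPER;
    /* Call failed, e.g. EINVAL on a kernel older than 3.4. */
#endif
    /* No prctl at all (e.g. GNU/Hurd) or the call failed: poll instead. */
    return SUPERVISE_POLLING;
}
```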

>   2. Use prctl/PR_SET_CHILD_SUBREAPER in ‘exec-command’. Here we 
>   must ‘catch-system-error’ around that call to cater to 
>   GNU/Hurd.

Why would we need to set it in exec-command? It looks like it 
modifies the state of the calling process, which means we'd want 
to set it in the shepherd service, not in each of the child 
processes.
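The prctl(2) man page states that the subreaper setting is not inherited by children created by fork(2). That supports setting it once in shepherd itself rather than in each child: set in the child before exec, it would make the child the subreaper, not shepherd. The small C check below shows this. The helper names are hypothetical.

```c
#define _GNU_SOURCE
#include <sys/prctl.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Return the calling process's subreaper flag (0 or 1), or -1 on error. */
int subreaper_flag(void)
{
    int flag = 0;
    if (prctl(PR_GET_CHILD_SUBREAPER, &flag, 0, 0, 0) == -1)
        return -1;
    return flag;
}

/* Set the flag in the calling process, fork, and let the child report
   its own flag through its exit status.  Returns the child's flag, or
   -1 on error. */
int child_subreaper_flag_after_fork(void)
{
    if (prctl(PR_SET_CHILD_SUBREAPER, 1, 0, 0, 0) == -1)
        return -1;
    pid_t pid = fork();
    if (pid == -1)
        return -1;
    if (pid == 0) {
        int flag = subreaper_flag();
        _exit(flag == -1 ? 2 : flag);
    }
    int status;
    if (waitpid(pid, &status, 0) != pid || !WIFEXITED(status))
        return -1;
    int code = WEXITSTATUS(status);
    return code == 2 ? -1 : code;
}
```

On Linux the parent should then report 1 from `subreaper_flag()`, while the forked child reports 0.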

> That would address the main issue without having to resort to 
> polling. Respawning will work only when #:pid-file is used 
> though, but that’s already an improvement.
>
> Thoughts?
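A plausible reason respawning would depend on #:pid-file: when the reaper collects a dead process, it learns only that process's pid. For a daemon that forks, the pid file is the only record of which pid belongs to which service. The following is a hypothetical C sketch of that lookup. The struct and function names are illustrative and are not shepherd's API.

```c
#include <stddef.h>
#include <stdio.h>
#include <sys/types.h>

/* Hypothetical service record: a name plus the daemon pid read from
   its pid file (0 if unknown). */
struct service {
    const char *name;
    pid_t pid;
};

/* Read a decimal pid from a pid file.  Returns the pid, or -1 if the
   file is missing or malformed. */
pid_t read_pid_file(const char *path)
{
    FILE *f = fopen(path, "r");
    if (f == NULL)
        return -1;
    long pid = -1;
    int ok = fscanf(f, "%ld", &pid);
    fclose(f);
    return (ok == 1 && pid > 0) ? (pid_t) pid : -1;
}

/* Map a pid returned by waitpid() back to the service it belongs to,
   so that the supervisor knows which service to respawn. */
struct service *find_service_by_pid(struct service *services, size_t n,
                                    pid_t pid)
{
    for (size_t i = 0; i < n; i++)
        if (services[i].pid == pid)
            return &services[i];
    return NULL;
}
```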

I'll try to get this working in the next few days. Hopefully 
you'll see a patch from me soon.

Carlo

