GNU bug report logs - #57922
Shepherd doesn't seem to correctly handle waitpid itself
Hi,
Josselin Poiret <dev <at> jpoiret.xyz> writes:
> Maxim Cournoyer <maxim.cournoyer <at> gmail.com> writes:
>
>> This leads me to believe that Shepherd does not block until the process
>> is actually dead before marking it as stopped (it just calls waitpid on
>> the group pid with WNOHANG, which won't block if the child process
>> hasn't exited yet), if I'm correct.
Correct: the service is marked as stopped as soon as ‘stop’ returns.
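
To make the difference concrete, here is a minimal sketch in Python (not Shepherd's Guile code; `demo_waitpid` is a hypothetical name used only for illustration). It assumes a POSIX system and a child that takes half a second to exit. The non-blocking call returns immediately while the child is still alive; the blocking call returns only once the child is gone.

```python
import os
import time


def demo_waitpid():
    """Contrast waitpid with WNOHANG against a blocking waitpid."""
    pid = os.fork()
    if pid == 0:
        # Child: pretend to need some time for a clean shutdown.
        time.sleep(0.5)
        os._exit(0)

    # Non-blocking: the child is still running, so this returns (0, 0)
    # right away instead of waiting for it.
    early = os.waitpid(pid, os.WNOHANG)

    # Blocking: returns only after the child has actually terminated.
    reaped_pid, status = os.waitpid(pid, 0)
    exited_cleanly = os.WIFEXITED(status) and os.WEXITSTATUS(status) == 0
    return pid, early, reaped_pid, exited_cleanly
```

A caller that treats the first result as "the process is gone" would restart the service while the old process is still shutting down, which matches the behaviour reported above.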
>> When we are in the stop slot, we know for sure that the process should
>> terminate completely, hence it'd make sense to call 'waitpid' *without*
>> WNOHANG there, to avoid 'herd restart' from starting the service while
>> its stopped process is not done terminating.
>>
>> jamid can take quite some time to terminate cleanly because the
>> networking threads in the opendht library need to be finalized,
>> which is probably the reason this problem can be observed here.
>>
>> Thoughts?
>
> I agree with you, make-kill-destructor should waitpid the processes it's
> killing. There shouldn't be any issues waitpid'ing before the
> shepherd's signal handler, since stop actions are run with asyncs
> disabled. The signal handler will run once but won't get anything
> because all the processes were already waitpid'd and it uses WNOHANG.
I think we need an extra “stopping” state for services. In general,
we’ll want to send SIGTERM, wait for some grace period or dead process
notification, then send SIGKILL, and finally change state to “stopped”.
This is not possible in 0.9 but is something I’d like to have in 0.10¹.
Ludo’.
¹ https://lists.gnu.org/archive/html/guix-devel/2022-06/msg00350.html
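
The stop sequence outlined in the message (SIGTERM, grace period or death notification, SIGKILL, then "stopped") can be sketched as follows. This is a Python illustration under stated assumptions, not how Shepherd implements or will implement it: `stop_process`, `grace_period`, and `poll_interval` are hypothetical names, and polling with WNOHANG stands in for a real dead-process notification such as a SIGCHLD handler.

```python
import os
import signal
import time


def stop_process(pid, grace_period=5.0, poll_interval=0.05):
    """Stop child PID: SIGTERM, wait up to GRACE_PERIOD seconds, then SIGKILL.

    Returns the number of the signal that terminated the child, or None
    if it exited on its own.  The caller would mark the service as
    "stopped" only after this function returns.
    """
    os.kill(pid, signal.SIGTERM)          # the service is now "stopping"

    deadline = time.monotonic() + grace_period
    status = None
    while time.monotonic() < deadline:
        reaped, st = os.waitpid(pid, os.WNOHANG)
        if reaped == pid:                 # child died within the grace period
            status = st
            break
        time.sleep(poll_interval)

    if status is None:
        # Grace period elapsed: force termination and block until reaped.
        os.kill(pid, signal.SIGKILL)
        _, status = os.waitpid(pid, 0)

    return os.WTERMSIG(status) if os.WIFSIGNALED(status) else None
```

Because the function returns only after the child has been reaped, a restart issued afterwards cannot overlap with the old process.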
This bug report was last modified 2 years and 236 days ago.