GNU bug report logs -
#77373
[shepherd] Deadlock between service-controller and service-builtin-logger
Previous Next
Reported by: nathan <nathan_mail <at> nborghese.com>
Date: Sat, 29 Mar 2025 23:42:02 UTC
Severity: normal
Done: Ludovic Courtès <ludo <at> gnu.org>
Bug is archived. No further changes may be made.
To add a comment to this bug, you must first unarchive it, by sending
a message to control AT debbugs.gnu.org, with unarchive 77373 in the body.
You can then email your comments to 77373 AT debbugs.gnu.org in the normal way.
Toggle the display of automated, internal messages from the tracker.
Report forwarded
to
bug-guix <at> gnu.org
:
bug#77373
; Package
guix
.
(Sat, 29 Mar 2025 23:42:02 GMT)
Full text and
rfc822 format available.
Acknowledgement sent
to
nathan <nathan_mail <at> nborghese.com>
:
New bug report received and forwarded. Copy sent to
bug-guix <at> gnu.org
.
(Sat, 29 Mar 2025 23:42:02 GMT)
Full text and
rfc822 format available.
Message #5 received at submit <at> debbugs.gnu.org (full text, mbox):
I definitely have a deadlock problem with Shepherd and I do believe I've found it.
shepherd 1.0.3
This is in service-controller when the service has been stopped:
(when logger
(put-message logger 'terminate))
But in service-builtin-logger, this is called every time a line is read:
(or pid
(and service
(eq? 'running (service-status service))
(match (service-running-value service)
((? process? process)
(process-id process))
(value
value))))
service-status -> service-control-message -> put-message to the service
The fibers documentation says put-message is blocking. Surely this is a deadlock.
The value of `pid' can be cached and we might avoid the deadlock.
But this happens ONLY if a line of output is read while the service is 'running.
If output only comes while the service is 'starting/'stopping, we will keep calling service-status.
Potential fix:
At the time spawn-service-builtin-logger is called in fork+exec-command, we already know the pid.
Maybe just prepend the pid to the `command' argument if you want it to appear in the log output.
Information forwarded
to
bug-guix <at> gnu.org
:
bug#77373
; Package
guix
.
(Sun, 30 Mar 2025 14:45:02 GMT)
Full text and
rfc822 format available.
Message #8 received at 77373 <at> debbugs.gnu.org (full text, mbox):
Hi nathan,
nathan <nathan_mail <at> nborghese.com> skribis:
> I definitely have a deadlock problem with Shepherd and I do believe I've found it.
> shepherd 1.0.3
Could you explain exactly how that happens (sequence of actions leading
to the deadlock) and share the relevant /var/log/messages excerpt?
> This is in service-controller when the service has been stopped:
> (when logger
> (put-message logger 'terminate))
> But in service-builtin-logger, this is called every time a line is read:
> (or pid
> (and service
> (eq? 'running (service-status service))
> (match (service-running-value service)
> ((? process? process)
> (process-id process))
> (value
> value))))
>
> service-status -> service-control-message -> put-message to the service
> The fibers documentation says put-message is blocking. Surely this is a deadlock.
Yes, but only if ‘pid’ hasn’t been cached before, which I think would
mean that not a single line was logged before stopping the service.
I’ll take a closer look.
Thanks for reporting it and for investigating!
Ludo’.
Information forwarded
to
bug-guix <at> gnu.org
:
bug#77373
; Package
guix
.
(Sun, 30 Mar 2025 16:17:02 GMT)
Full text and
rfc822 format available.
Message #11 received at 77373 <at> debbugs.gnu.org (full text, mbox):
[Message part 1 (text/plain, inline)]
happy sunday
> Yes, but only if ‘pid’ hasn’t been cached before, which I think would
> mean that not a single line was logged before stopping the service.
doesn't get cached if no output is read while service is 'running.
fork+exec-command: if the pid-file doesnt show up immediately, there is an entire 1 second sleep. The logger can easily read the output while the service is still in 'starting
also: if the service doesn't flush stdout, we dont get its output until it dies. ('stopping)
> Could you explain exactly how that happens (sequence of actions leading
> to the deadlock) and share the relevant /var/log/messages excerpt?
./shepherd --socket /tmp/s2/mysocket --config <path>
GNU Shepherd 1.0.3 (Guile 3.0.9, x86_64-unknown-linux-gnu)
Starting service root...
Service root started.
Service root running with value #<<process> id: 18114 command: #f>.
Service root has been started.
starting services...
Configuration successfully loaded from '<path>'.
Starting service myservice...
Service myservice has been started.
Service myservice started.
Successfully started 1 service in the background.
Service myservice running with value #<<process> id: 18132 command: ("/tmp/a.out")>.
in other terminal:
./herd -s /tmp/s2/mysocket status myservice
<status is 'running and doesnt show any "Recent messages">
./herd -s /tmp/s2/mysocket stop myservice
works fine
more shepherd output:
Stopping service myservice...
Service myservice stopped.
Service myservice is now stopped.
in other terminal, all of these hang:
./herd -s /tmp/s2/mysocket status myservice
./herd -s /tmp/s2/mysocket stop myservice
./herd -s /tmp/s2/mysocket start myservice
./herd -s /tmp/s2/mysocket status
./herd -s /tmp/s2/mysocket stop root
does not hang:
./herd -s /tmp/s2/mysocket status aaaaa
herd: error: service 'aaaaa' could not be found
I have to kill -9 shepherd.
c source code attached for the test program.
I mentioned two possibilities above, and this is scenario #2. stdout not flushed. I also had what is probably scenario #1 with a different program.
On 3/30/25 2:44 PM, Ludovic Courtès wrote:
> Hi nathan,
>
> nathan <nathan_mail <at> nborghese.com> skribis:
>
>> I definitely have a deadlock problem with Shepherd and I do believe I've found it.
>> shepherd 1.0.3
>
> Could you explain exactly how that happens (sequence of actions leading
> to the deadlock) and share the relevant /var/log/messages excerpt?
>
>> This is in service-controller when the service has been stopped:
>> (when logger
>> (put-message logger 'terminate))
>> But in service-builtin-logger, this is called every time a line is read:
>> (or pid
>> (and service
>> (eq? 'running (service-status service))
>> (match (service-running-value service)
>> ((? process? process)
>> (process-id process))
>> (value
>> value))))
>>
>> service-status -> service-control-message -> put-message to the service
>> The fibers documentation says put-message is blocking. Surely this is a deadlock.
>
> Yes, but only if ‘pid’ hasn’t been cached before, which I think would
> mean that not a single line was logged before stopping the service.
>
> I’ll take a closer look.
>
> Thanks for reporting it and for investigating!
>
> Ludo’.
[b.c (text/x-csrc, attachment)]
Information forwarded
to
bug-guix <at> gnu.org
:
bug#77373
; Package
guix
.
(Sun, 30 Mar 2025 16:45:02 GMT)
Full text and
rfc822 format available.
Message #14 received at 77373 <at> debbugs.gnu.org (full text, mbox):
Hey,
nathan <nathan_mail <at> nborghese.com> skribis:
> happy sunday
>
>> Yes, but only if ‘pid’ hasn’t been cached before, which I think would
>> mean that not a single line was logged before stopping the service.
>
> doesn't get cached if no output is read while service is 'running.
> fork+exec-command: if the pid-file doesnt show up immediately, there is an entire 1 second sleep. The logger can easily read the output while the service is still in 'starting
> also: if the service doesn't flush stdout, we dont get its output until it dies. ('stopping)
>
>> Could you explain exactly how that happens (sequence of actions leading
>> to the deadlock) and share the relevant /var/log/messages excerpt?
>
> ./shepherd --socket /tmp/s2/mysocket --config <path>
Yup, I see.
I believe a5d9a41aa0bfadea9617df4411b86c6ea0c525bd fixes it. I added a
test case that’s similar to what you have: the service only starts
logging when being stopped, which makes it very likely that the logger
will receive lines of log after the service has terminated. It would
trigger the deadlock 100% of the time.
Let me know what you think.
Thanks a lot!
Ludo’.
bug closed, send any further explanations to
77373 <at> debbugs.gnu.org and nathan <nathan_mail <at> nborghese.com>
Request was from
Ludovic Courtès <ludo <at> gnu.org>
to
control <at> debbugs.gnu.org
.
(Sun, 30 Mar 2025 16:46:02 GMT)
Full text and
rfc822 format available.
Information forwarded
to
bug-guix <at> gnu.org
:
bug#77373
; Package
guix
.
(Sat, 05 Apr 2025 17:09:02 GMT)
Full text and
rfc822 format available.
Message #19 received at 77373 <at> debbugs.gnu.org (full text, mbox):
looks good to me and works. thanks
On 3/30/25 4:44 PM, Ludovic Courtès wrote:
> Hey,
>
> nathan <nathan_mail <at> nborghese.com> skribis:
>
>> happy sunday
>>
>>> Yes, but only if ‘pid’ hasn’t been cached before, which I think would
>>> mean that not a single line was logged before stopping the service.
>>
>> doesn't get cached if no output is read while service is 'running.
>> fork+exec-command: if the pid-file doesnt show up immediately, there is an entire 1 second sleep. The logger can easily read the output while the service is still in 'starting
>> also: if the service doesn't flush stdout, we dont get its output until it dies. ('stopping)
>>
>>> Could you explain exactly how that happens (sequence of actions leading
>>> to the deadlock) and share the relevant /var/log/messages excerpt?
>>
>> ./shepherd --socket /tmp/s2/mysocket --config <path>
>
> Yup, I see.
>
> I believe a5d9a41aa0bfadea9617df4411b86c6ea0c525bd fixes it. I added a
> test case that’s similar to what you have: the service only starts
> logging when being stopped, which makes it very likely that the logger
> will receive lines of log after the service has terminated. It would
> trigger the deadlock 100% of the time.
>
> Let me know what you think.
>
> Thanks a lot!
>
> Ludo’.
bug archived.
Request was from
Debbugs Internal Request <help-debbugs <at> gnu.org>
to
internal_control <at> debbugs.gnu.org
.
(Sun, 04 May 2025 11:24:24 GMT)
Full text and
rfc822 format available.
This bug report was last modified 97 days ago.
Previous Next
GNU bug tracking system
Copyright (C) 1999 Darren O. Benham,
1997,2003 nCipher Corporation Ltd,
1994-97 Ian Jackson.