GNU bug report logs - #77373
[shepherd] Deadlock between service-controller and service-builtin-logger

Previous Next

Package: guix;

Reported by: nathan <nathan_mail <at> nborghese.com>

Date: Sat, 29 Mar 2025 23:42:02 UTC

Severity: normal

Done: Ludovic Courtès <ludo <at> gnu.org>

Bug is archived. No further changes may be made.

To add a comment to this bug, you must first unarchive it, by sending
a message to control AT debbugs.gnu.org, with unarchive 77373 in the body.
You can then email your comments to 77373 AT debbugs.gnu.org in the normal way.

Toggle the display of automated, internal messages from the tracker.

View this report as an mbox folder, status mbox, maintainer mbox


Report forwarded to bug-guix <at> gnu.org:
bug#77373; Package guix. (Sat, 29 Mar 2025 23:42:02 GMT) Full text and rfc822 format available.

Acknowledgement sent to nathan <nathan_mail <at> nborghese.com>:
New bug report received and forwarded. Copy sent to bug-guix <at> gnu.org. (Sat, 29 Mar 2025 23:42:02 GMT) Full text and rfc822 format available.

Message #5 received at submit <at> debbugs.gnu.org (full text, mbox):

From: nathan <nathan_mail <at> nborghese.com>
To: bug-guix <at> gnu.org
Subject: [shepherd] Deadlock between service-controller and
 service-builtin-logger
Date: Sat, 29 Mar 2025 23:41:24 +0000
I definitely have a deadlock problem with Shepherd and I do believe I've found it.
shepherd 1.0.3

This is in service-controller when the service has been stopped:
(when logger
  (put-message logger 'terminate))
But in service-builtin-logger, this is called every time a line is read:
(or pid
    (and service
         (eq? 'running (service-status service))
         (match (service-running-value service)
           ((? process? process)
            (process-id process))
           (value
            value))))

service-status -> service-control-message -> put-message to the service
The fibers documentation says put-message is blocking. Surely this is a deadlock.

The value of `pid' can be cached and we might avoid the deadlock.
But this happens ONLY if a line of output is read while the service is 'running.
If output only comes while the service is 'starting/'stopping, we will keep calling service-status.

Potential fix:
At the time spawn-service-builtin-logger is called in fork+exec-command, we already know the pid.
Maybe just prepend the pid to the `command' argument if you want it to appear in the log output.





Information forwarded to bug-guix <at> gnu.org:
bug#77373; Package guix. (Sun, 30 Mar 2025 14:45:02 GMT) Full text and rfc822 format available.

Message #8 received at 77373 <at> debbugs.gnu.org (full text, mbox):

From: Ludovic Courtès <ludo <at> gnu.org>
To: nathan <nathan_mail <at> nborghese.com>
Cc: 77373 <at> debbugs.gnu.org
Subject: Re: bug#77373: [shepherd] Deadlock between service-controller and
 service-builtin-logger
Date: Sun, 30 Mar 2025 16:44:24 +0200
Hi nathan,

nathan <nathan_mail <at> nborghese.com> skribis:

> I definitely have a deadlock problem with Shepherd and I do believe I've found it.
> shepherd 1.0.3

Could you explain exactly how that happens (sequence of actions leading
to the deadlock) and share the relevant /var/log/messages excerpt?

> This is in service-controller when the service has been stopped:
> (when logger
>   (put-message logger 'terminate))
> But in service-builtin-logger, this is called every time a line is read:
> (or pid
>     (and service
>          (eq? 'running (service-status service))
>          (match (service-running-value service)
>            ((? process? process)
>             (process-id process))
>            (value
>             value))))
>
> service-status -> service-control-message -> put-message to the service
> The fibers documentation says put-message is blocking. Surely this is a deadlock.

Yes, but only if ‘pid’ hasn’t been cached before, which I think would
mean that not a single line was logged before stopping the service.

I’ll take a closer look.

Thanks for reporting it and for investigating!

Ludo’.




Information forwarded to bug-guix <at> gnu.org:
bug#77373; Package guix. (Sun, 30 Mar 2025 16:17:02 GMT) Full text and rfc822 format available.

Message #11 received at 77373 <at> debbugs.gnu.org (full text, mbox):

From: nathan <nathan_mail <at> nborghese.com>
To: Ludovic Courtès <ludo <at> gnu.org>
Cc: 77373 <at> debbugs.gnu.org
Subject: Re: bug#77373: [shepherd] Deadlock between service-controller and
 service-builtin-logger
Date: Sun, 30 Mar 2025 16:16:15 +0000
[Message part 1 (text/plain, inline)]
happy sunday

> Yes, but only if ‘pid’ hasn’t been cached before, which I think would
> mean that not a single line was logged before stopping the service.

doesn't get cached if no output is read while service is 'running.
fork+exec-command: if the pid-file doesnt show up immediately, there is an entire 1 second sleep. The logger can easily read the output while the service is still in 'starting
also: if the service doesn't flush stdout, we dont get its output until it dies. ('stopping)

> Could you explain exactly how that happens (sequence of actions leading
> to the deadlock) and share the relevant /var/log/messages excerpt?

./shepherd --socket /tmp/s2/mysocket --config <path>

GNU Shepherd 1.0.3 (Guile 3.0.9, x86_64-unknown-linux-gnu)
Starting service root...
Service root started.
Service root running with value #<<process> id: 18114 command: #f>.
Service root has been started.
starting services...
Configuration successfully loaded from '<path>'.
Starting service myservice...
Service myservice has been started.
Service myservice started.
Successfully started 1 service in the background.
Service myservice running with value #<<process> id: 18132 command: ("/tmp/a.out")>.

in other terminal:
./herd -s /tmp/s2/mysocket status myservice
<status is 'running and doesnt show any "Recent messages">

./herd -s /tmp/s2/mysocket stop myservice
works fine

more shepherd output:
Stopping service myservice...
Service myservice stopped.
Service myservice is now stopped.

in other terminal, all of these hang:
./herd -s /tmp/s2/mysocket status myservice
./herd -s /tmp/s2/mysocket stop myservice
./herd -s /tmp/s2/mysocket start myservice
./herd -s /tmp/s2/mysocket status
./herd -s /tmp/s2/mysocket stop root

does not hang:
./herd -s /tmp/s2/mysocket status aaaaa
herd: error: service 'aaaaa' could not be found

I have to kill -9 shepherd.

c source code attached for the test program.
I mentioned two possibilities above, and this is scenario #2. stdout not flushed. I also had what is probably scenario #1 with a different program.


On 3/30/25 2:44 PM, Ludovic Courtès wrote:
> Hi nathan,
> 
> nathan <nathan_mail <at> nborghese.com> skribis:
> 
>> I definitely have a deadlock problem with Shepherd and I do believe I've found it.
>> shepherd 1.0.3
> 
> Could you explain exactly how that happens (sequence of actions leading
> to the deadlock) and share the relevant /var/log/messages excerpt?
> 
>> This is in service-controller when the service has been stopped:
>> (when logger
>>    (put-message logger 'terminate))
>> But in service-builtin-logger, this is called every time a line is read:
>> (or pid
>>      (and service
>>           (eq? 'running (service-status service))
>>           (match (service-running-value service)
>>             ((? process? process)
>>              (process-id process))
>>             (value
>>              value))))
>>
>> service-status -> service-control-message -> put-message to the service
>> The fibers documentation says put-message is blocking. Surely this is a deadlock.
> 
> Yes, but only if ‘pid’ hasn’t been cached before, which I think would
> mean that not a single line was logged before stopping the service.
> 
> I’ll take a closer look.
> 
> Thanks for reporting it and for investigating!
> 
> Ludo’.
[b.c (text/x-csrc, attachment)]

Information forwarded to bug-guix <at> gnu.org:
bug#77373; Package guix. (Sun, 30 Mar 2025 16:45:02 GMT) Full text and rfc822 format available.

Message #14 received at 77373 <at> debbugs.gnu.org (full text, mbox):

From: Ludovic Courtès <ludo <at> gnu.org>
To: nathan <nathan_mail <at> nborghese.com>
Cc: 77373 <at> debbugs.gnu.org
Subject: Re: bug#77373: [shepherd] Deadlock between service-controller and
 service-builtin-logger
Date: Sun, 30 Mar 2025 18:44:50 +0200
Hey,

nathan <nathan_mail <at> nborghese.com> skribis:

> happy sunday
>
>> Yes, but only if ‘pid’ hasn’t been cached before, which I think would
>> mean that not a single line was logged before stopping the service.
>
> doesn't get cached if no output is read while service is 'running.
> fork+exec-command: if the pid-file doesnt show up immediately, there is an entire 1 second sleep. The logger can easily read the output while the service is still in 'starting
> also: if the service doesn't flush stdout, we dont get its output until it dies. ('stopping)
>
>> Could you explain exactly how that happens (sequence of actions leading
>> to the deadlock) and share the relevant /var/log/messages excerpt?
>
> ./shepherd --socket /tmp/s2/mysocket --config <path>

Yup, I see.

I believe a5d9a41aa0bfadea9617df4411b86c6ea0c525bd fixes it.  I added a
test case that’s similar to what you have: the service only starts
logging when being stopped, which makes it very likely that the logger
will receive lines of log after the service has terminated.  It would
trigger the deadlock 100% of the time.

Let me know what you think.

Thanks a lot!

Ludo’.




bug closed, send any further explanations to 77373 <at> debbugs.gnu.org and nathan <nathan_mail <at> nborghese.com> Request was from Ludovic Courtès <ludo <at> gnu.org> to control <at> debbugs.gnu.org. (Sun, 30 Mar 2025 16:46:02 GMT) Full text and rfc822 format available.

Information forwarded to bug-guix <at> gnu.org:
bug#77373; Package guix. (Sat, 05 Apr 2025 17:09:02 GMT) Full text and rfc822 format available.

Message #19 received at 77373 <at> debbugs.gnu.org (full text, mbox):

From: nathan <nathan_mail <at> nborghese.com>
To: Ludovic Courtès <ludo <at> gnu.org>
Cc: 77373 <at> debbugs.gnu.org
Subject: Re: bug#77373: [shepherd] Deadlock between service-controller and
 service-builtin-logger
Date: Sat, 5 Apr 2025 17:08:23 +0000
looks good to me and works. thanks

On 3/30/25 4:44 PM, Ludovic Courtès wrote:
> Hey,
> 
> nathan <nathan_mail <at> nborghese.com> skribis:
> 
>> happy sunday
>>
>>> Yes, but only if ‘pid’ hasn’t been cached before, which I think would
>>> mean that not a single line was logged before stopping the service.
>>
>> doesn't get cached if no output is read while service is 'running.
>> fork+exec-command: if the pid-file doesnt show up immediately, there is an entire 1 second sleep. The logger can easily read the output while the service is still in 'starting
>> also: if the service doesn't flush stdout, we dont get its output until it dies. ('stopping)
>>
>>> Could you explain exactly how that happens (sequence of actions leading
>>> to the deadlock) and share the relevant /var/log/messages excerpt?
>>
>> ./shepherd --socket /tmp/s2/mysocket --config <path>
> 
> Yup, I see.
> 
> I believe a5d9a41aa0bfadea9617df4411b86c6ea0c525bd fixes it.  I added a
> test case that’s similar to what you have: the service only starts
> logging when being stopped, which makes it very likely that the logger
> will receive lines of log after the service has terminated.  It would
> trigger the deadlock 100% of the time.
> 
> Let me know what you think.
> 
> Thanks a lot!
> 
> Ludo’.




bug archived. Request was from Debbugs Internal Request <help-debbugs <at> gnu.org> to internal_control <at> debbugs.gnu.org. (Sun, 04 May 2025 11:24:24 GMT) Full text and rfc822 format available.

This bug report was last modified 97 days ago.

Previous Next


GNU bug tracking system
Copyright (C) 1999 Darren O. Benham, 1997,2003 nCipher Corporation Ltd, 1994-97 Ian Jackson.