GNU bug report logs - #74279
Shepherd service is not getting respawned.


Package: guix;

Reported by: Tomas Volf <~@wolfsden.cz>

Date: Sat, 9 Nov 2024 15:01:01 UTC

Severity: normal

Tags: notabug

Done: Ludovic Courtès <ludo <at> gnu.org>

Bug is archived. No further changes may be made.

To add a comment to this bug, you must first unarchive it, by sending
a message to control AT debbugs.gnu.org, with unarchive 74279 in the body.
You can then email your comments to 74279 AT debbugs.gnu.org in the normal way.



Report forwarded to bug-guix <at> gnu.org:
bug#74279; Package guix. (Sat, 09 Nov 2024 15:01:01 GMT)

Acknowledgement sent to Tomas Volf <~@wolfsden.cz>:
New bug report received and forwarded. Copy sent to bug-guix <at> gnu.org. (Sat, 09 Nov 2024 15:01:01 GMT)

Message #5 received at submit <at> debbugs.gnu.org:

From: Tomas Volf <~@wolfsden.cz>
To: bug-guix <at> gnu.org
Subject: Shepherd service is not getting respawned.
Date: Sat, 09 Nov 2024 15:58:02 +0100
Hi,

I wrote a shepherd service to function as a check for networking being
actually up, but it does not get respawned when it fails and I do not
understand why.

This is the service in my operating-system:

--8<---------------cut here---------------start------------->8---
(simple-service
 'network-online
 shepherd-root-service-type
 (list (shepherd-service
        (requirement '(networking))
        (provision '(network-online))
        (documentation "Wait for the network to come up.")
        (start #~(lambda _
                   (let* ((cmd "/run/privileged/bin/ping -qc1 -W1 1.1.1.1")
                          (status (system cmd)))
                     (= 0 (status:exit-val status)))))
        (one-shot? #t)
        ;; Try every second.
        (respawn-delay 1)
        ;; Retry forever.  Double-quoting is intentional.
        (respawn-limit ''(5 . 5)))))
--8<---------------cut here---------------end--------------->8---

Now, when I reboot the machine, I see in the log that the service did
start:

--8<---------------cut here---------------start------------->8---
Nov  7 00:18:20 localhost shepherd[1]: Starting service network-online...
[..]
Nov  7 00:18:20 localhost shepherd[1]: [sh] PING 192.168.0.110 (192.168.0.110): 56 data bytes
Nov  7 00:18:20 localhost shepherd[1]: [sh] /run/privileged/bin/ping: sending packet: Network is unreachable
Nov  7 00:18:20 localhost shepherd[1]: Service network-online could not be started.
Nov  7 00:18:20 localhost shepherd[1]: Service network-online failed to start.
--8<---------------cut here---------------end--------------->8---

The failure on the first run is expected; the problem is that it starts
exactly once.  I do not see any attempts to respawn it in
/var/log/messages, but based on the documentation the service *should*
get respawned, since it failed.  What am I doing wrong?  Would anyone
have any suggestions, either what is wrong with the code above or how to
approach it in another way?

Have a nice day,
Tomas

-- 
There are only two hard things in Computer Science:
cache invalidation, naming things and off-by-one errors.




Information forwarded to bug-guix <at> gnu.org:
bug#74279; Package guix. (Sun, 10 Nov 2024 11:33:01 GMT)

Message #8 received at 74279 <at> debbugs.gnu.org:

From: Ludovic Courtès <ludo <at> gnu.org>
To: 74279 <at> debbugs.gnu.org
Subject: Re: bug#74279: Shepherd service is not getting respawned.
Date: Sun, 10 Nov 2024 12:32:48 +0100
Hi Tomas,

Tomas Volf <~@wolfsden.cz> skribis:

>         (start #~(lambda _
>                    (let* ((cmd "/run/privileged/bin/ping -qc1 -W1 1.1.1.1")
>                           (status (system cmd)))
>                      (= 0 (status:exit-val status)))))
>         (one-shot? #t)
>         ;; Try every second.
>         (respawn-delay 1)
>         ;; Retry forever.  Double-quoting is intentional.
>         (respawn-limit ''(5 . 5)))))

[...]

> Nov  7 00:18:20 localhost shepherd[1]: Starting service network-online...
> [..]
> Nov  7 00:18:20 localhost shepherd[1]: [sh] PING 192.168.0.110 (192.168.0.110): 56 data bytes
> Nov  7 00:18:20 localhost shepherd[1]: [sh] /run/privileged/bin/ping: sending packet: Network is unreachable
> Nov  7 00:18:20 localhost shepherd[1]: Service network-online could not be started.
> Nov  7 00:18:20 localhost shepherd[1]: Service network-online failed to start.

I think there’s a misunderstanding here: ‘respawn?’ is about respawning
a service that, once it is running, terminates prematurely.

In your case, the service does not start (its ‘start’ method returns
#f).
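To illustrate the distinction, here is a minimal sketch of the kind of service ‘respawn?’ is meant for: a long-running process launched with ‘make-forkexec-constructor’. The service name ‘my-daemon’ and the path "/path/to/my-daemon" are hypothetical placeholders. If this process exits after it has been started, the Shepherd respawns it; if ‘start’ itself returns #f, as in the report above, the service never counts as running and there is nothing to respawn.

```scheme
;; Hypothetical long-running service; 'my-daemon and its path are placeholders.
(simple-service
 'my-daemon
 shepherd-root-service-type
 (list (shepherd-service
        (provision '(my-daemon))
        (requirement '(networking))
        (documentation "Run a long-lived daemon; respawn it if it exits.")
        ;; The service counts as running once the process has been launched;
        ;; respawning only applies from that point on.
        (start #~(make-forkexec-constructor
                  (list "/path/to/my-daemon")))
        (stop #~(make-kill-destructor))
        ;; Restart the process if it terminates while the service is running.
        (respawn? #t)
        (respawn-delay 1))))
```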

Now, it would probably make sense to have a mechanism to retry starting
services.

In the specific case of ‘network-online’ though, you could use a
different approach: the ‘start’ method could itself retry pinging
the network several times and fail only if it failed to reach the
network after, say, 10s.  (Remember that ‘start’ and ‘stop’ must
complete in a timely fashion.)
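The suggestion above might look roughly like this: an untested sketch that keeps the original ‘ping’ command and target but retries it inside ‘start’, returning #t on the first success and #f only if every attempt fails. The limit of 10 attempts one second apart is an assumption chosen to match the "say, 10s" figure; because each ping may also wait up to one second (‘-W1’), the worst case is somewhat longer. Guile's plain ‘sleep’ may block the Shepherd while it waits, which is one more reason to keep the total bounded.

```scheme
(simple-service
 'network-online
 shepherd-root-service-type
 (list (shepherd-service
        (requirement '(networking))
        (provision '(network-online))
        (documentation "Wait for the network to come up.")
        (one-shot? #t)
        (start #~(lambda _
                   ;; Try up to 10 pings, one second apart; return #t on the
                   ;; first success and #f if every attempt fails.
                   (let loop ((attempts-left 10))
                     (let ((status (system* "/run/privileged/bin/ping"
                                            "-qc1" "-W1" "1.1.1.1")))
                       ;; status:exit-val is #f if ping was killed by a
                       ;; signal, so compare with eqv? rather than =.
                       (cond ((eqv? 0 (status:exit-val status))
                              #t)
                             ((> attempts-left 1)
                              (sleep 1)
                              (loop (- attempts-left 1)))
                             (else #f)))))))))
```

Since the retry now happens inside ‘start’, ‘respawn-delay’ and ‘respawn-limit’ are no longer needed for this service.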

HTH,
Ludo’.




Added tag(s) notabug. Request was from Ludovic Courtès <ludo <at> gnu.org> to control <at> debbugs.gnu.org. (Wed, 20 Nov 2024 21:49:02 GMT)

bug closed, send any further explanations to 74279 <at> debbugs.gnu.org and Tomas Volf <~@wolfsden.cz>. Request was from Ludovic Courtès <ludo <at> gnu.org> to control <at> debbugs.gnu.org. (Wed, 20 Nov 2024 21:49:02 GMT)

bug archived. Request was from Debbugs Internal Request <help-debbugs <at> gnu.org> to internal_control <at> debbugs.gnu.org. (Thu, 19 Dec 2024 12:24:08 GMT)




GNU bug tracking system
Copyright (C) 1999 Darren O. Benham, 1997,2003 nCipher Corporation Ltd, 1994-97 Ian Jackson.