GNU bug report logs - #58485
[shepherd] EADDRINUSE while restarting ssh-daemon

Package: guix;

Reported by: Lars-Dominik Braun <lars <at> 6xq.net>

Date: Thu, 13 Oct 2022 07:53:01 UTC

Severity: important

Done: Ludovic Courtès <ludo <at> gnu.org>

Bug is archived. No further changes may be made.


From: help-debbugs <at> gnu.org (GNU bug Tracking System)
To: Ludovic Courtès <ludo <at> gnu.org>
Cc: tracker <at> debbugs.gnu.org
Subject: bug#58485: closed ([shepherd] EADDRINUSE while restarting ssh-daemon)
Date: Sun, 11 Jun 2023 14:21:01 +0000
[Message part 1 (text/plain, inline)]
Your message dated Sun, 11 Jun 2023 16:20:21 +0200
with message-id <87ttveqdwa.fsf <at> gnu.org>
and subject line Re: bug#58485: [shepherd] EADDRINUSE while restarting ssh-daemon
has caused the debbugs.gnu.org bug report #58485,
regarding [shepherd] EADDRINUSE while restarting ssh-daemon
to be marked as done.

(If you believe you have received this mail in error, please contact
help-debbugs <at> gnu.org.)


-- 
58485: https://debbugs.gnu.org/cgi/bugreport.cgi?bug=58485
GNU Bug Tracking System
Contact help-debbugs <at> gnu.org with problems
[Message part 2 (message/rfc822, inline)]
From: Lars-Dominik Braun <lars <at> 6xq.net>
To: bug-guix <at> gnu.org
Cc: Ludovic Courtès <ludo <at> gnu.org>
Subject: [shepherd] Restarting guix-publish fails
Date: Thu, 13 Oct 2022 09:51:58 +0200
Hi,

it seems that `herd restart guix-publish` stopped working after the
introduction of socket activation into shepherd. This is a problem,
because I restart guix-publish automatically after unattended-upgrades. It
fails with the following error for me:

---snip---
Backtrace:
           7 (primitive-load "/gnu/store/7xrg2sbb529ki6hv99n27svg0fi?")
In ice-9/boot-9.scm:
    724:2  6 (call-with-prompt ("prompt") #<procedure 7f8173184940 ?> ?)
  1752:10  5 (with-exception-handler _ _ #:unwind? _ # _)
In ice-9/eval.scm:
    619:8  4 (_ #(#(#<directory (guile-user) 7f817318ac80>)))
In ice-9/boot-9.scm:
   260:13  3 (for-each #<procedure restart-service (name)> _)
In gnu/services/herd.scm:
    168:4  2 (invoke-action guix-publish restart () #<procedure 7f81?>)
    176:7  1 (failure)
In ice-9/boot-9.scm:
  1685:16  0 (raise-exception _ #:continuable? _)

ice-9/boot-9.scm:1685:16: In procedure raise-exception:
ERROR:
  1. &action-exception-error:
      service: guix-publish
      action: start
      key: system-error
      args: ("bind" "~A" ("Address already in use") (98))
---snap---

Note that, because of socket activation, you must visit the URL at least
once so that the guix-publish process is actually started; otherwise the
restart works fine. It also works fine the second time I invoke `herd
restart guix-publish`, because `guix-publish` is dead by that time.

Looking at an strace, shepherd does indeed kill `guix-publish` and then
immediately tries to re-bind to the same address:

---snip---
1     read(23, "(shepherd-command (version 0) (action restart) (service guix-publish) (arguments ()) (directory \"/root\"))", 1024) = 105
1     getpgid(18096)                    = 18096
1     getpgid(0)                        = 0
1     kill(-18096, SIGTERM)             = 0
1     newfstatat(AT_FDCWD, "/etc/localtime", {st_mode=S_IFREG|0444, st_size=2298, ...}, 0) = 0
1     write(17, "shepherd[1]: Service guix-publish has been stopped.\n", 52) = 52
1     socket(AF_INET, SOCK_STREAM|SOCK_CLOEXEC|SOCK_NONBLOCK, IPPROTO_IP) = 36
1     setsockopt(36, SOL_SOCKET, SO_REUSEADDR, [1], 4) = 0
1     bind(36, {sa_family=AF_INET, sin_port=htons(8082), sin_addr=inet_addr("127.0.0.1")}, 16) = -1 EADDRINUSE (Address already in use)
1     write(23, "(reply (version 0) (result #f) (error (error (version 0) action-exception start guix-publish system-error (\"bind\" \"~A\" (\"Address already in use\") (98)))) (messages (\"Service guix-publish has been stopped.\")))", 208) = 208
1     close(23)
---snap---
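
The failing bind() in the trace can be reproduced outside of shepherd. The
following is only a minimal sketch in Python, assuming Linux semantics: a
second socket that sets SO_REUSEADDR still gets EADDRINUSE as long as another
socket is listening on the same address and port. The function name
`bind_while_listening` is made up for this illustration.

```python
import errno
import socket


def bind_while_listening(host="127.0.0.1"):
    """Bind a second socket to a port that another socket still listens on.

    Returns the errno of the failed bind(), or None if the bind succeeded.
    """
    # Stand-in for the old service process that has not exited yet.
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    listener.bind((host, 0))  # port 0: let the kernel pick a free port
    listener.listen(1)
    port = listener.getsockname()[1]

    # Stand-in for the restart: same setsockopt() and bind() as in the trace.
    second = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    second.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    try:
        second.bind((host, port))
        return None
    except OSError as exc:
        return exc.errno
    finally:
        second.close()
        listener.close()


if __name__ == "__main__":
    err = bind_while_listening()
    print(errno.errorcode.get(err, err))
```

This suggests that SO_REUSEADDR alone does not help here while the old
listener is still alive.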

The obvious explanation would be that stopping does not wait for the
process to actually exit. It seems that make-kill-destructor does not call
waitpid, and 'running is set unconditionally to #f after 'stop has finished.

Cheers,
Lars
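
The explanation above amounts to "signal, then wait for the exit before
reporting the service as stopped". A minimal sketch of that idea in Python
follows; it is not Shepherd's code. The helper name `stop_and_wait` and the
grace period are assumptions, and it signals a single child rather than a
whole process group as in the trace.

```python
import signal
import subprocess
import sys


def stop_and_wait(proc, grace_seconds=5.0):
    """Send SIGTERM, then block until the child has really exited.

    Falls back to SIGKILL if the child is still alive after the grace
    period. Returns the child's exit status.
    """
    proc.send_signal(signal.SIGTERM)
    try:
        # wait() reaps the child, so its sockets are closed once it returns.
        return proc.wait(timeout=grace_seconds)
    except subprocess.TimeoutExpired:
        proc.kill()
        return proc.wait()


if __name__ == "__main__":
    child = subprocess.Popen(
        [sys.executable, "-c", "import time; time.sleep(60)"]
    )
    status = stop_and_wait(child)
    print("child exited with status", status)
```

Only after `stop_and_wait` returns would it be safe to bind the listening
socket again.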



[Message part 3 (message/rfc822, inline)]
From: Ludovic Courtès <ludo <at> gnu.org>
To: Lars-Dominik Braun <ldb <at> leibniz-psychology.org>
Cc: Lars-Dominik Braun <lars <at> 6xq.net>, 58485-done <at> debbugs.gnu.org
Subject: Re: bug#58485: [shepherd] EADDRINUSE while restarting ssh-daemon
Date: Sun, 11 Jun 2023 16:20:21 +0200
Hi Lars,

Ludovic Courtès <ludo <at> gnu.org> writes:

> Ludovic Courtès <ludo <at> gnu.org> writes:
>
>>> 1     14:12:15.117035 read(21, "(shepherd-command (version 0) (action restart) (service ssh-daemon) (arguments ()) (directory \"/root\"))", 1024) = 103
>>> 1     14:12:15.117254 close(27)         = 0
>>> 1     14:12:15.117283 close(30)         = 0
>>> 1     14:12:15.117416 newfstatat(AT_FDCWD, "/etc/localtime", {st_dev=makedev(0x8, 0x2), st_ino=110100491, st_mode=S_IFREG|0444, st_nlink=1, st_uid=0, st_gid=0, st_blksize=4096, st_blocks=8, st_size=2298, st_atime=1676898665 /* 2023-02-20T14:11:05.338746772+0100 */, st_atime_nsec=338746772, st_mtime=1676898664 /* 2023-02-20T14:11:04.874743456+0100 */, st_mtime_nsec=874743456, st_ctime=1676898664 /* 2023-02-20T14:11:04.874743456+0100 */, st_ctime_nsec=874743456}, 0) = 0
>>> 1     14:12:15.117475 write(17, "shepherd[1]: Service ssh-daemon has been stopped.\n", 50) = 50
>>> 1     14:12:15.117524 socket(AF_INET, SOCK_STREAM|SOCK_CLOEXEC|SOCK_NONBLOCK, IPPROTO_IP) = 26
>>> 1     14:12:15.117561 setsockopt(26, SOL_SOCKET, SO_REUSEADDR, [1], 4) = 0
>>> 1     14:12:15.117598 bind(26, {sa_family=AF_INET, sin_port=htons(2222), sin_addr=inet_addr("0.0.0.0")}, 16) = -1 EADDRINUSE (Address already in use)
>>> 1     14:12:15.117724 write(21, "(reply (version 0) (result #f) (error (error (version 0) action-exception start ssh-daemon system-error (\"bind\" \"~A\" (\"Address already in use\") (98)))) (messages (\"Service ssh-daemon has been stopped.\")))", 204) = 204
>>> 1     14:12:15.117754 close(21)         = 0
>
> [...]
>
>> Maybe we should just have shepherd retry upon EADDRINUSE (like nginx
>> does, as you wrote), though I’d like to understand under what conditions
>> we can get EADDRINUSE in the first place.
>
> Done:
>
>   https://git.savannah.gnu.org/cgit/shepherd.git/commit/?id=41789ee8d0e164967f9ca196db4e9601400a462e

I’m assuming that this is fixed in Shepherd 0.10.x.  Please reopen if
you stumble upon this issue again.

Ludo’.
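
The "retry upon EADDRINUSE" approach discussed above could look roughly like
the following Python sketch. It only illustrates the general technique and
is not the code from the referenced commit. The name `bind_with_retry`, the
number of attempts and the delay are all assumptions.

```python
import errno
import socket
import time


def bind_with_retry(address, attempts=5, delay=1.0):
    """Return a listening socket, retrying bind() while the address is busy.

    Any error other than EADDRINUSE, or EADDRINUSE on the last attempt,
    is re-raised to the caller.
    """
    for attempt in range(1, attempts + 1):
        sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
        sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
        try:
            sock.bind(address)
            sock.listen(16)
            return sock
        except OSError as exc:
            sock.close()
            if exc.errno != errno.EADDRINUSE or attempt == attempts:
                raise
            # The previous owner may still be shutting down; try again later.
            time.sleep(delay)


if __name__ == "__main__":
    server = bind_with_retry(("127.0.0.1", 0))
    print("listening on port", server.getsockname()[1])
    server.close()
```

A retry loop like this tolerates a short window in which the old process
still holds the port, at the cost of delaying the error report when the
address is permanently taken.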


This bug report was last modified 2 years and 18 days ago.


GNU bug tracking system
Copyright (C) 1999 Darren O. Benham, 1997,2003 nCipher Corporation Ltd, 1994-97 Ian Jackson.