GNU bug report logs - #58485
[shepherd] EADDRINUSE while restarting ssh-daemon

Package: guix

Reported by: Lars-Dominik Braun <lars <at> 6xq.net>

Date: Thu, 13 Oct 2022 07:53:01 UTC

Severity: important

Done: Ludovic Courtès <ludo <at> gnu.org>

Bug is archived. No further changes may be made.

From: Ludovic Courtès <ludo <at> gnu.org>
To: Lars-Dominik Braun <ldb <at> leibniz-psychology.org>
Cc: 58485 <at> debbugs.gnu.org, Lars-Dominik Braun <lars <at> 6xq.net>
Subject: bug#58485: [shepherd] Restarting guix-publish fails
Date: Thu, 27 Apr 2023 23:23:58 +0200

Hi,

Sorry for the late reply.  I’m going through Shepherd bug reports and I
remembered this discussion…

Lars-Dominik Braun <ldb <at> leibniz-psychology.org> skribis:

>> Can you confirm shepherd (PID 1) is 0.9.3?
> it is:
>
> root         1  0.2  0.2 308148 76816 ?        Sl   Feb07  52:08 /gnu/store/kphp5d85rrb3q1rdc2lfqc1mdklwh3qp-guile-3.0.9/bin/guile --no-auto-compile /gnu/store/4nw0zb4swga0cb8i35nvng3rg6z5qm8p-shepherd-0.9.3/bin/shepherd --config /gnu/store/cvrai6z8777jf7860rnvppfznl1lcxi1-shepherd.conf
>
>> ‘sudo herd restart ssh-daemon’ works fine on my laptop FWIW.
> This works fine too. Only unattended-upgrades seems to have this issue :/
>
> The strace looks unsuspicious right now:
>
> ---snip---
> 1     14:12:15.117035 read(21, "(shepherd-command (version 0) (action restart) (service ssh-daemon) (arguments ()) (directory \"/root\"))", 1024) = 103
> 1     14:12:15.117254 close(27)         = 0
> 1     14:12:15.117283 close(30)         = 0
> 1     14:12:15.117416 newfstatat(AT_FDCWD, "/etc/localtime", {st_dev=makedev(0x8, 0x2), st_ino=110100491, st_mode=S_IFREG|0444, st_nlink=1, st_uid=0, st_gid=0, st_blksize=4096, st_blocks=8, st_size=2298, st_atime=1676898665 /* 2023-02-20T14:11:05.338746772+0100 */, st_atime_nsec=338746772, st_mtime=1676898664 /* 2023-02-20T14:11:04.874743456+0100 */, st_mtime_nsec=874743456, st_ctime=1676898664 /* 2023-02-20T14:11:04.874743456+0100 */, st_ctime_nsec=874743456}, 0) = 0
> 1     14:12:15.117475 write(17, "shepherd[1]: Service ssh-daemon has been stopped.\n", 50) = 50
> 1     14:12:15.117524 socket(AF_INET, SOCK_STREAM|SOCK_CLOEXEC|SOCK_NONBLOCK, IPPROTO_IP) = 26
> 1     14:12:15.117561 setsockopt(26, SOL_SOCKET, SO_REUSEADDR, [1], 4) = 0
> 1     14:12:15.117598 bind(26, {sa_family=AF_INET, sin_port=htons(2222), sin_addr=inet_addr("0.0.0.0")}, 16) = -1 EADDRINUSE (Address already in use)
> 1     14:12:15.117724 write(21, "(reply (version 0) (result #f) (error (error (version 0) action-exception start ssh-daemon system-error (\"bind\" \"~A\" (\"Address already in use\") (98)))) (messages (\"Service ssh-daemon has been stopped.\")))", 204) = 204
> 1     14:12:15.117754 close(21)         = 0

This suggests ‘bind’ can return EADDRINUSE even though the sockets have
been closed before (presumably file descriptors 27 and 30 above).

Can you confirm nothing else is competing to bind port 2222 on that
machine?
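
To illustrate how a competing listener would produce exactly this error, here is a minimal Python sketch. It is not Shepherd code: the helper name `try_bind` and the use of a kernel-chosen port on 127.0.0.1 instead of port 2222 are assumptions made for the demo. On Linux, SO_REUSEADDR does not permit a second bind while another socket is still listening on the same address and port, whereas the bind goes through once that listener has been closed.

```python
import errno
import socket


def try_bind(port, host="127.0.0.1"):
    """Try to bind a TCP socket with SO_REUSEADDR set, as in the strace.

    Returns None on success, or the errno value on failure.
    (Hypothetical helper, for illustration only.)
    """
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
        sock.bind((host, port))
        return None
    except OSError as err:
        return err.errno
    finally:
        sock.close()


# Stand-in for "something else" holding the port: a live listener.
listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
listener.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
listener.bind(("127.0.0.1", 0))  # port 0: let the kernel pick a free port
listener.listen(1)
port = listener.getsockname()[1]

while_listening = try_bind(port)  # a listener still holds the port
listener.close()
after_close = try_bind(port)      # the listener is gone

print("while listening:", while_listening == errno.EADDRINUSE)
print("after close succeeded:", after_close is None)
```

If the first bind fails with EADDRINUSE and the second succeeds, that matches the "something still holds the port" explanation. It does not by itself explain the trace above, where the descriptors appear to have been closed before the bind.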

I tried to reproduce it with something as brutal as:

  while sudo herd restart sshd ; do : ; done

… to no avail (I’m on current Shepherd ‘master’ though).

Maybe we should just have shepherd retry upon EADDRINUSE (like nginx
does, as you wrote), though I’d like to understand under what conditions
we can get EADDRINUSE in the first place.
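
For concreteness, here is a rough sketch of what retrying upon EADDRINUSE could look like. It is written in Python rather than Guile for brevity, and it is neither nginx's nor Shepherd's actual implementation: the function name `bind_with_retry`, the number of attempts, and the delay are all assumptions chosen for illustration.

```python
import errno
import socket
import time


def bind_with_retry(host, port, attempts=5, delay=0.2):
    """Return a listening TCP socket, retrying only on EADDRINUSE.

    'attempts' and 'delay' are illustrative values, not taken from
    nginx or Shepherd. Any other error, or EADDRINUSE on the last
    attempt, is re-raised to the caller.
    """
    for attempt in range(1, attempts + 1):
        sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
        try:
            sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
            sock.bind((host, port))
            sock.listen(16)
            return sock
        except OSError as err:
            sock.close()
            if err.errno != errno.EADDRINUSE or attempt == attempts:
                raise
            time.sleep(delay)  # give the previous holder time to go away
```

A caller would use `bind_with_retry("0.0.0.0", 2222)` in place of a bare bind. Such a retry only papers over the symptom, so it does not answer the question of why the address is still in use.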

Ludo’.




This bug report was last modified 2 years and 18 days ago.
