GNU bug report logs - #58485
[shepherd] EADDRINUSE while restarting ssh-daemon

Package: guix;

Reported by: Lars-Dominik Braun <lars <at> 6xq.net>

Date: Thu, 13 Oct 2022 07:53:01 UTC

Severity: important

Done: Ludovic Courtès <ludo <at> gnu.org>

Bug is archived. No further changes may be made.

From: help-debbugs <at> gnu.org (GNU bug Tracking System)
To: Lars-Dominik Braun <lars <at> 6xq.net>
Subject: bug#58485: closed (Re: bug#58485: [shepherd] Restarting
 guix-publish fails)
Date: Thu, 17 Nov 2022 10:20:02 +0000
[Message part 1 (text/plain, inline)]
Your bug report

#58485: [shepherd] Restarting guix-publish fails

which was filed against the guix package, has been closed.

The explanation is attached below, along with your original report.
If you require more details, please reply to 58485 <at> debbugs.gnu.org.

-- 
58485: https://debbugs.gnu.org/cgi/bugreport.cgi?bug=58485
GNU Bug Tracking System
Contact help-debbugs <at> gnu.org with problems
[Message part 2 (message/rfc822, inline)]
From: Ludovic Courtès <ludo <at> gnu.org>
To: Lars-Dominik Braun <lars <at> 6xq.net>
Cc: 58485-done <at> debbugs.gnu.org
Subject: Re: bug#58485: [shepherd] Restarting guix-publish fails
Date: Thu, 17 Nov 2022 11:19:04 +0100
Ludovic Courtès <ludo <at> gnu.org> skribis:

> Indeed.  This is fixed by Shepherd commit
> d97592f58603ff51cb280ae57d413c8731e601b3, which will be in the upcoming
> 0.9.3 release.

The Shepherd 0.9.3 has landed in Guix commit
283d7318c5b312d7129adb6dbeea6ad205ce89d1.

Ludo’.

[Message part 3 (message/rfc822, inline)]
From: Lars-Dominik Braun <lars <at> 6xq.net>
To: bug-guix <at> gnu.org
Cc: Ludovic Courtès <ludo <at> gnu.org>
Subject: [shepherd] Restarting guix-publish fails
Date: Thu, 13 Oct 2022 09:51:58 +0200
Hi,

it seems that `herd restart guix-publish` stopped working after the
introduction of socket activation into shepherd. This is a problem,
because I restart guix-publish automatically after unattended-upgrades. It
fails with the following error for me:

---snip---
Backtrace:
           7 (primitive-load "/gnu/store/7xrg2sbb529ki6hv99n27svg0fi?")
In ice-9/boot-9.scm:
    724:2  6 (call-with-prompt ("prompt") #<procedure 7f8173184940 ?> ?)
  1752:10  5 (with-exception-handler _ _ #:unwind? _ # _)
In ice-9/eval.scm:
    619:8  4 (_ #(#(#<directory (guile-user) 7f817318ac80>)))
In ice-9/boot-9.scm:
   260:13  3 (for-each #<procedure restart-service (name)> _)
In gnu/services/herd.scm:
    168:4  2 (invoke-action guix-publish restart () #<procedure 7f81?>)
    176:7  1 (failure)
In ice-9/boot-9.scm:
  1685:16  0 (raise-exception _ #:continuable? _)

ice-9/boot-9.scm:1685:16: In procedure raise-exception:
ERROR:
  1. &action-exception-error:
      service: guix-publish
      action: start
      key: system-error
      args: ("bind" "~A" ("Address already in use") (98))
---snap---

Note that, because of socket activation, you must visit the URL at least
once so that the guix-publish process is actually started; if it was never
started, a restart works fine. It also works fine the second time I invoke
`herd restart guix-publish`, because `guix-publish` is dead by that time.

Looking at an strace, shepherd is indeed trying to kill `guix-publish`
and then re-bind to the same address:

---snip---
1     read(23, "(shepherd-command (version 0) (action restart) (service guix-publish) (arguments ()) (directory \"/root\"))", 1024) = 105
1     getpgid(18096)                    = 18096
1     getpgid(0)                        = 0
1     kill(-18096, SIGTERM)             = 0
1     newfstatat(AT_FDCWD, "/etc/localtime", {st_mode=S_IFREG|0444, st_size=2298, ...}, 0) = 0
1     write(17, "shepherd[1]: Service guix-publish has been stopped.\n", 52) = 52
1     socket(AF_INET, SOCK_STREAM|SOCK_CLOEXEC|SOCK_NONBLOCK, IPPROTO_IP) = 36
1     setsockopt(36, SOL_SOCKET, SO_REUSEADDR, [1], 4) = 0
1     bind(36, {sa_family=AF_INET, sin_port=htons(8082), sin_addr=inet_addr("127.0.0.1")}, 16) = -1 EADDRINUSE (Address already in use)
1     write(23, "(reply (version 0) (result #f) (error (error (version 0) action-exception start guix-publish system-error (\"bind\" \"~A\" (\"Address already in use\") (98)))) (messages (\"Service guix-publish has been stopped.\")))", 208) = 208
1     close(23)
---snap---

The obvious explanation would be that stopping does not wait for the
process to actually exit: make-kill-destructor does not seem to call
waitpid, and 'running is set unconditionally to #f once 'stop has finished.
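
The suspected race can be reproduced outside of shepherd. What follows is a minimal sketch in Python, not shepherd's actual code: it assumes a Linux host, and every name in it (`try_bind`, `spawn_slow_daemon`, `restart_without_waiting`, `restart_with_waiting`) is made up for illustration. A child process inherits a listening socket and takes half a second to exit after SIGTERM; the parent then tries to bind the same address, either immediately or only after waitpid has returned.

```python
import errno
import os
import signal
import socket
import time


def try_bind(port):
    """Try to bind 127.0.0.1:port; return None on success, else the errno."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    try:
        sock.bind(("127.0.0.1", port))
        return None
    except OSError as err:
        return err.errno
    finally:
        sock.close()


def spawn_slow_daemon():
    """Fork a child that holds a listening socket and exits 0.5 s after SIGTERM."""
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    listener.bind(("127.0.0.1", 0))  # port 0: let the kernel pick a free port
    listener.listen(1)
    port = listener.getsockname()[1]

    ready_r, ready_w = os.pipe()
    pid = os.fork()
    if pid == 0:
        # Child: simulate a daemon that is slow to shut down.
        try:
            os.close(ready_r)

            def on_term(signum, frame):
                time.sleep(0.5)
                os._exit(0)

            signal.signal(signal.SIGTERM, on_term)
            os.write(ready_w, b"x")  # tell the parent the handler is installed
            while True:
                signal.pause()
        finally:
            os._exit(1)

    # Parent: wait until the child is ready, then drop our copy of the socket
    # so that only the child keeps the address busy.
    os.close(ready_w)
    os.read(ready_r, 1)
    os.close(ready_r)
    listener.close()
    return pid, port


def restart_without_waiting():
    """Send SIGTERM and re-bind right away, as in the strace above."""
    pid, port = spawn_slow_daemon()
    os.kill(pid, signal.SIGTERM)
    result = try_bind(port)  # the child is still alive at this point
    os.waitpid(pid, 0)       # reap the child so no zombie is left behind
    return result


def restart_with_waiting():
    """Send SIGTERM, wait for the child to exit, and only then re-bind."""
    pid, port = spawn_slow_daemon()
    os.kill(pid, signal.SIGTERM)
    os.waitpid(pid, 0)
    return try_bind(port)
```

Under these assumptions, `restart_without_waiting()` should return `errno.EADDRINUSE` even though SO_REUSEADDR is set, because the child still holds the socket in the listening state, while `restart_with_waiting()` should return `None`.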

Cheers,
Lars




This bug report was last modified 2 years and 18 days ago.

