GNU bug report logs - #63982
[Shepherd] shepherd does not handle signals after 'daemonize'

Package: guix;

Reported by: Maxim Cournoyer <maxim.cournoyer <at> gmail.com>

Date: Fri, 9 Jun 2023 17:14:01 UTC

Severity: normal

Tags: moreinfo

Done: Maxim Cournoyer <maxim.cournoyer <at> gmail.com>

Bug is archived. No further changes may be made.


From: help-debbugs <at> gnu.org (GNU bug Tracking System)
To: Maxim Cournoyer <maxim.cournoyer <at> gmail.com>
Cc: tracker <at> debbugs.gnu.org
Subject: bug#63982: closed ([Shepherd] shepherd does not handle signals
 after 'daemonize')
Date: Wed, 19 Jul 2023 01:12:02 +0000
[Message part 1 (text/plain, inline)]
Your message dated Tue, 18 Jul 2023 21:11:42 -0400
with message-id <87sf9kln75.fsf <at> gmail.com>
and subject line Re: bug#63982: Shepherd can crash when a user service fails to start
has caused the debbugs.gnu.org bug report #63982,
regarding [Shepherd] shepherd does not handle signals after 'daemonize'
to be marked as done.

(If you believe you have received this mail in error, please contact
help-debbugs <at> gnu.org.)


-- 
63982: https://debbugs.gnu.org/cgi/bugreport.cgi?bug=63982
GNU Bug Tracking System
Contact help-debbugs <at> gnu.org with problems
[Message part 2 (message/rfc822, inline)]
From: Maxim Cournoyer <maxim.cournoyer <at> gmail.com>
To: bug-guix <bug-guix <at> gnu.org>
Subject: Shepherd can crash when a user service fails to start
Date: Fri, 09 Jun 2023 13:13:11 -0400
[Message part 3 (text/plain, inline)]
Hi!

I've noticed that while all my user services (managed via GNU Stow --
not via Guix Home) were working, 'herd status' would report that
/run/user/1000/shepherd/socket was missing and bail out.

Starting from a nonexistent /run/user/1000/shepherd/socket, using old
Shepherd 0.9.1:

--8<---------------cut here---------------start------------->8---
$ /gnu/store/dblbnj1yra4yrrfjbnzsa0ldcl3170ap-shepherd-0.9.1/bin/shepherd
Service root has been started.
WARNING: Use of `load' in declarative module (#{ g115}#).  Add #:declarative? #f to your define-module invocation.

Some deprecated features have been used.  Set the environment
variable GUILE_WARN_DEPRECATED to "detailed" and rerun the
program to get more information.  Set it to "no" to suppress
this message.
$
Warning: due to a long standing Gtk+ bug
https://gitlab.gnome.org/GNOME/gtk/issues/221
Emacs might crash when run in daemon mode and the X11 connection is unexpectedly lost.
Using an Emacs configured with --with-x-toolkit=lucid does not have this problem.
Loading time (native compiled elisp)...
Loading time (native compiled elisp)...done
Loading /home/maxim/.emacs.d/recentf...
Loading /home/maxim/.emacs.d/recentf...done
Cleaning up the recentf list...
Cleaning up the recentf list...done (0 removed)
../../.emacs: Warning: Use keywords rather than deprecated positional arguments to `define-minor-mode'
Preparing diary...
No diary entries for Friday, June 9, 2023
Preparing diary...done
Appointment reminders enabled
Loading /home/maxim/.emacs.d/emms/cache...
Loading /home/maxim/.emacs.d/emms/cache...done
[yas] Prepared just-in-time loading of snippets successfully.
[yas] Prepared just-in-time loading of snippets successfully.
Starting new Ispell process aspell with english dictionary... \
Starting new Ispell process aspell with english dictionary...done
Starting Emacs daemon.
Unable to start the daemon.
Another instance of Emacs is running the server, either as daemon or interactively.
You can use emacsclient to connect to that Emacs process.
Saving file /home/maxim/.emacs.d/emms/history...
Wrote /home/maxim/.emacs.d/emms/history
Wrote /home/maxim/.emacs.d/recentf
Error: server did not start correctly
Service emacs could not be started.
gpg-agent: a gpg-agent is already running - not starting a new one
Service gpg-agent could not be started.
Service ibus-daemon has been started.

$ herd status
Started:
 + ibus-daemon
 + root
Stopped:
 - emacs
 - gpg-agent
 - jackd
 - workrave
--8<---------------cut here---------------end--------------->8---

If I then run it anew, it fails with "shepherd: while opening socket
'/run/user/1000/shepherd/socket': bind: Address already in use", because
apparently 'herd stop root' didn't remove it.

--8<---------------cut here---------------start------------->8---
$ herd stop root
Exiting.
[...]

$ /gnu/store/dblbnj1yra4yrrfjbnzsa0ldcl3170ap-shepherd-0.9.1/bin/shepherd
Service root has been started.
WARNING: Use of `load' in declarative module (#{ g115}#).  Add #:declarative? #f to your define-module invocation.

Some deprecated features have been used.  Set the environment
variable GUILE_WARN_DEPRECATED to "detailed" and rerun the
program to get more information.  Set it to "no" to suppress
this message.
maxim <at> hurd ~/src/guix [env]$
Warning: due to a long standing Gtk+ bug
https://gitlab.gnome.org/GNOME/gtk/issues/221
Emacs might crash when run in daemon mode and the X11 connection is unexpectedly lost.
Using an Emacs configured with --with-x-toolkit=lucid does not have this problem.
Loading time (native compiled elisp)...
Loading time (native compiled elisp)...done
Loading /home/maxim/.emacs.d/recentf...
Loading /home/maxim/.emacs.d/recentf...done
Cleaning up the recentf list...
Cleaning up the recentf list...done (0 removed)
../../.emacs: Warning: Use keywords rather than deprecated positional arguments to `define-minor-mode'
Preparing diary...
No diary entries for Friday, June 9, 2023
Preparing diary...done
Appointment reminders enabled
Loading /home/maxim/.emacs.d/emms/cache...
Loading /home/maxim/.emacs.d/emms/cache...done
[yas] Prepared just-in-time loading of snippets successfully.
[yas] Prepared just-in-time loading of snippets successfully.
Starting new Ispell process aspell with english dictionary... \
Starting new Ispell process aspell with english dictionary...done
Starting Emacs daemon.
Unable to start the daemon.
Another instance of Emacs is running the server, either as daemon or interactively.
You can use emacsclient to connect to that Emacs process.
Saving file /home/maxim/.emacs.d/emms/history...
Wrote /home/maxim/.emacs.d/emms/history
Wrote /home/maxim/.emacs.d/recentf
Error: server did not start correctly
Service emacs could not be started.
gpg-agent: a gpg-agent is already running - not starting a new one
Service gpg-agent could not be started.
Service ibus-daemon has been started.
shepherd: while opening socket '/run/user/1000/shepherd/socket': bind: Address already in use

Exiting shepherd...
Service ibus-daemon has been stopped.

Some deprecated features have been used.  Set the environment
variable GUILE_WARN_DEPRECATED to "detailed" and rerun the
program to get more information.  Set it to "no" to suppress
this message.

$
--8<---------------cut here---------------end--------------->8---
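
For reference, "Address already in use" is the error bind(2) reports when the path of an AF_UNIX socket already exists in the file system, even if no process is listening on it any more.  The sketch below illustrates only that general behaviour.  It makes no claim about where Shepherd creates or removes its socket, and the names `bind_unix_socket' and `rebind_errno' are hypothetical:

```c
#include <errno.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/un.h>
#include <unistd.h>

/* Hypothetical helper: bind a fresh AF_UNIX stream socket to PATH.
   Return the socket, or -1 with errno set.  */
static int
bind_unix_socket (const char *path)
{
  struct sockaddr_un address;
  int sock, saved;

  if (strlen (path) >= sizeof address.sun_path)
    {
      errno = ENAMETOOLONG;
      return -1;
    }

  sock = socket (AF_UNIX, SOCK_STREAM, 0);
  if (sock < 0)
    return -1;

  memset (&address, 0, sizeof address);
  address.sun_family = AF_UNIX;
  strcpy (address.sun_path, path);
  if (bind (sock, (struct sockaddr *) &address, sizeof address) < 0)
    {
      saved = errno;
      close (sock);
      errno = saved;
      return -1;
    }

  return sock;
}

/* Bind to PATH, close the socket without removing its file, then bind
   again.  When UNLINK_FIRST is nonzero, remove the stale file before the
   second attempt.  Return 0 if the second bind succeeded, its errno value
   if it failed, or -1 if the first bind failed.  */
int
rebind_errno (const char *path, int unlink_first)
{
  int sock = bind_unix_socket (path);

  if (sock < 0)
    return -1;
  close (sock);                 /* the socket file stays behind */

  if (unlink_first)
    unlink (path);

  sock = bind_unix_socket (path);
  if (sock < 0)
    return errno;

  close (sock);
  unlink (path);
  return 0;
}
```

On Linux, `rebind_errno (path, 0)' should return EADDRINUSE and leave the stale file in place, while `rebind_errno (path, 1)' should return 0.  This says nothing about why the error persists in the next transcript after the manual `rm'.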

Even after removing it manually with 'rm
/run/user/1000/shepherd/socket', it still fails:

--8<---------------cut here---------------start------------->8---
$ /gnu/store/dblbnj1yra4yrrfjbnzsa0ldcl3170ap-shepherd-0.9.1/bin/shepherd
Service root has been started.
WARNING: Use of `load' in declarative module (#{ g115}#).  Add #:declarative? #f to your define-module invocation.

Some deprecated features have been used.  Set the environment
variable GUILE_WARN_DEPRECATED to "detailed" and rerun the
program to get more information.  Set it to "no" to suppress
this message.
maxim <at> hurd ~/src/guix [env]$
Warning: due to a long standing Gtk+ bug
https://gitlab.gnome.org/GNOME/gtk/issues/221
Emacs might crash when run in daemon mode and the X11 connection is unexpectedly lost.
Using an Emacs configured with --with-x-toolkit=lucid does not have this problem.
Loading time (native compiled elisp)...
Loading time (native compiled elisp)...done
Loading /home/maxim/.emacs.d/recentf...
Loading /home/maxim/.emacs.d/recentf...done
Cleaning up the recentf list...
Cleaning up the recentf list...done (0 removed)
../../.emacs: Warning: Use keywords rather than deprecated positional arguments to `define-minor-mode'
Preparing diary...
No diary entries for Friday, June 9, 2023
Preparing diary...done
Appointment reminders enabled
Loading /home/maxim/.emacs.d/emms/cache...
Loading /home/maxim/.emacs.d/emms/cache...done
[yas] Prepared just-in-time loading of snippets successfully.
[yas] Prepared just-in-time loading of snippets successfully.
Starting new Ispell process aspell with english dictionary... \
Starting new Ispell process aspell with english dictionary...done
Starting Emacs daemon.
Unable to start the daemon.
Another instance of Emacs is running the server, either as daemon or interactively.
You can use emacsclient to connect to that Emacs process.
Saving file /home/maxim/.emacs.d/emms/history...
Wrote /home/maxim/.emacs.d/emms/history
Wrote /home/maxim/.emacs.d/recentf
Error: server did not start correctly
Service emacs could not be started.
gpg-agent: a gpg-agent is already running - not starting a new one
Service gpg-agent could not be started.
Service ibus-daemon has been started.
shepherd: while opening socket '/run/user/1000/shepherd/socket': bind: Address already in use

Exiting shepherd...
Service ibus-daemon has been stopped.

Some deprecated features have been used.  Set the environment
variable GUILE_WARN_DEPRECATED to "detailed" and rerun the
program to get more information.  Set it to "no" to suppress
this message.
--8<---------------cut here---------------end--------------->8---

This is apparently caused by Emacs failing to start: if I comment it
out in the init.scm file, the same Shepherd invocation is happy:

--8<---------------cut here---------------start------------->8---
;; Services to start when shepherd starts:
(for-each start '(;emacs
		  gpg-agent
		  ibus-daemon))
--8<---------------cut here---------------end--------------->8---

--8<---------------cut here---------------start------------->8---
$ herd status
Started:
 + ibus-daemon
 + root
Stopped:
 - emacs
 - gpg-agent
 - jackd
 - workrave
--8<---------------cut here---------------end--------------->8---

But that's with Shepherd 0.9.1.  If I run the exact same config, which
now works there, with Shepherd 0.10.1, I see:

--8<---------------cut here---------------start------------->8---
rm /run/user/1000/shepherd/socket

$ /gnu/store/y826g8wrpzskcs82ffxppj7mmz257ksi-shepherd-0.10.1/bin/shepherd
Starting service root...
Service root started.
Service root running with value #t.
Service root has been started.
WARNING: Use of `load' in declarative module (#{ g119}#).  Add #:declarative? #f to your define-module invocation.

Some deprecated features have been used.  Set the environment
variable GUILE_WARN_DEPRECATED to "detailed" and rerun the
program to get more information.  Set it to "no" to suppress
this message.
Starting service gpg-agent...

$ herd status
herd: error: /run/user/1000/shepherd/socket: No such file or directory

$ file /run/user/1000/shepherd/socket
/run/user/1000/shepherd/socket: cannot open `/run/user/1000/shepherd/socket' (No such file or directory)

$ pgrep -a shepherd
1 /gnu/store/4gvgcfdiz67wv04ihqfa8pqwzsb0qpv5-guile-3.0.9/bin/guile --no-auto-compile /gnu/store/nl0948z46yndpx3kihhi540l5c422wv4-shepherd-0.10.0/bin/shepherd --config /gnu/store/7dxbjccbqamk4wa0nyf7zsc4ywimb1fh-shepherd.conf
24700 /gnu/store/4gvgcfdiz67wv04ihqfa8pqwzsb0qpv5-guile-3.0.9/bin/guile --no-auto-compile /gnu/store/y826g8wrpzskcs82ffxppj7mmz257ksi-shepherd-0.10.1/bin/shepherd
--8<---------------cut here---------------end--------------->8---

It seems that a bug exists in both 0.9.1 and 0.10.1, and that something
also regressed between 0.9.1 and 0.10.1.

Attached are the two relevant Shepherd config files to test:

[init.scm (application/octet-stream, attachment)]
[services.scm (application/octet-stream, attachment)]
[Message part 6 (text/plain, inline)]
-- 
Thanks,
Maxim
[Message part 7 (message/rfc822, inline)]
From: Maxim Cournoyer <maxim.cournoyer <at> gmail.com>
To: Ludovic Courtès <ludo <at> gnu.org>
Cc: 63982-done <at> debbugs.gnu.org
Subject: Re: bug#63982: Shepherd can crash when a user service fails to start
Date: Tue, 18 Jul 2023 21:11:42 -0400
Hey Ludo!

Ludovic Courtès <ludo <at> gnu.org> writes:

> Hi!
>
> Ludovic Courtès <ludo <at> gnu.org> skribis:
>
>> Turns out that this happens when calling the ‘daemonize’ action on
>> ‘root’.  I have a reproducer now and am investigating…
>
> Good news: this is fixed in Shepherd commit
> f4272d2f0f393d2aa3e9d76b36ab6aa5f2fc72c2!
>
> The root cause is inconsistent semantics when mixing epoll, signalfd,
> and fork, specifically this part from signalfd(2):
>
>    epoll(7) semantics
>        If a process adds (via epoll_ctl(2)) a signalfd file descriptor to an
>        epoll(7) instance, then epoll_wait(2) returns events only for signals
>        sent to that process.  In particular, if the process then uses fork(2)
>        to create a child process, then the child will be able to read(2)
>        signals that are sent to it using the signalfd file descriptor, but
>        epoll_wait(2) will not indicate that the signalfd file descriptor is
>        ready.  In this scenario, a possible workaround is that after the
>        fork(2), the child process can close the signalfd file descriptor that
>        it inherited from the parent process and then create another signalfd
>        file descriptor and add it to the epoll instance. […]
>
> The C program below illustrates this behavior:
>
> #include <stdlib.h>
> #include <stdio.h>
> #include <unistd.h>
> #include <signal.h>
> #include <sys/signalfd.h>
> #include <sys/epoll.h>
>
> int
> main ()
> {
>   int ep, sfd;
>
>   sigset_t signals;
>   sigemptyset (&signals);
>   sigaddset (&signals, SIGINT);
>   sigaddset (&signals, SIGHUP);
>
>   sigprocmask (SIG_BLOCK, &signals, NULL);
>   sfd = signalfd (-1, &signals, SFD_CLOEXEC);
>
>   ep = epoll_create1 (EPOLL_CLOEXEC);
>
>   struct epoll_event events = { .events = EPOLLIN | EPOLLONESHOT, .data = NULL };
>   epoll_ctl (ep, EPOLL_CTL_ADD, sfd, &events);
>
>   epoll_wait (ep, &events, 1, 123);
>
>   if (fork () == 0)
>     {
>       /* Quoth signalfd(2):
>
> 	 If a process adds (via epoll_ctl(2)) a signalfd file descriptor to an
> 	 epoll(7) instance, then epoll_wait(2) returns events only for signals
> 	 sent to that process.  In particular, if the process then uses fork(2)
> 	 to create a child process, then the child will be able to read(2)
> 	 signals that are sent to it using the signalfd file descriptor, but
> 	 epoll_wait(2) will not indicate that the signalfd file descriptor is
> 	 ready.  */
>
>       printf ("try this: kill -INT %i\n", getpid ());
>       while (1)
> 	{
> 	  struct signalfd_siginfo info;
> 	  if (epoll_wait (ep, &events, 1, 777) > 0)
> 	    {
> 	      read (sfd, &info, sizeof info);
> 	      printf ("got signal %i!\n", info.ssi_signo);
> 	      epoll_ctl (ep, EPOLL_CTL_MOD, sfd, &events);
> 	    }
> 	}
>     }
>
>   return 0;
> }
>
>
> Of course it took me a while to find out about this; I first looked at
> things individually and didn’t expect the mixture to behave
> inconsistently.

Tricky!  Thanks for sharing the result of your investigation, it's
always enlightening!

> Maxim, let me know if it works for you!

Better than ever!  Thanks a lot for fixing the various issues reported
here.

I'm closing this one!

-- 
Thanks,
Maxim


This bug report was last modified 1 year and 364 days ago.



GNU bug tracking system
Copyright (C) 1999 Darren O. Benham, 1997,2003 nCipher Corporation Ltd, 1994-97 Ian Jackson.