GNU bug report logs - #74912
Guix Home leaves user shepherd on logout, starts new instance on login

Package: guix;

Reported by: Jake <jforst.mailman <at> gmail.com>

Date: Mon, 16 Dec 2024 14:24:01 UTC

Severity: important

Merged with 67863, 76998

Done: Ludovic Courtès <ludo <at> gnu.org>

To reply to this bug, email your comments to 74912 AT debbugs.gnu.org.
There is no need to reopen the bug first.


Report forwarded to bug-guix <at> gnu.org:
bug#74912; Package guix. (Mon, 16 Dec 2024 14:24:02 GMT)

Acknowledgement sent to Jake <jforst.mailman <at> gmail.com>:
New bug report received and forwarded. Copy sent to bug-guix <at> gnu.org. (Mon, 16 Dec 2024 14:24:02 GMT)

Message #5 received at submit <at> debbugs.gnu.org:

From: Jake <jforst.mailman <at> gmail.com>
To: bug-guix <at> gnu.org
Cc: ludovic.courtes <at> inria.fr
Subject: Shepherd: Growing number of user shepherds when relogging
Date: Mon, 16 Dec 2024 14:23:20 +0000
Hi

I think I'm experiencing a bug in Shepherd since version 1.0.
Whenever I log out and log back in again, my user shepherd from the
previous login session is still present, and a new user shepherd spawns for
the current login session.
So relogging N times results in N+1 user shepherds.

For example, I have relogged 5 times since I last rebooted:

$ herd status root
Status of root:
  It is running since 00:30:02 (10 minutes ago).
  Main PID: 23450
  Command:
/gnu/store/mfkz7fvlfpv3ppwbkv0imb19nrf95akf-guile-3.0.9/bin/guile
--no-auto-compile
/gnu/store/nl0w5c7pxxdczqiv4r9iq44al7nd5y5g-shepherd-1.0.0/bin/shepherd
--silent --config /gnu/store/w3l6dmap815mm3qzx77xdazky853adda-shepherd.conf
...

$ pgrep shepherd
1
9891
10777
16417
18510
21960
23450

$  ps aux | grep shepherd
root         1  0.0  0.9 222872 74456 ?        Sl   Dec15   0:08
/gnu/store/mfkz7fvlfpv3ppwbkv0imb19nrf95akf-guile-3.0.9/bin/guile
--no-auto-compile
/gnu/store/nl0w5c7pxxdczqiv4r9iq44al7nd5y5g-shepherd-1.0.0/bin/shepherd
--config /gnu/store/p7al8wd1inwk8f5di2q4llcpd64mjn5q-shepherd.conf
jake      9891  0.0  0.2  75816 23624 ?        Ss   Dec15   0:04
/gnu/store/mfkz7fvlfpv3ppwbkv0imb19nrf95akf-guile-3.0.9/bin/guile
--no-auto-compile
/gnu/store/nl0w5c7pxxdczqiv4r9iq44al7nd5y5g-shepherd-1.0.0/bin/shepherd
--silent --config /gnu/store/w3l6dmap815mm3qzx77xdazky853adda-shepherd.conf
jake     10777  0.0  0.3  76224 24752 ?        Ss   Dec16   0:03
/gnu/store/mfkz7fvlfpv3ppwbkv0imb19nrf95akf-guile-3.0.9/bin/guile
--no-auto-compile
/gnu/store/nl0w5c7pxxdczqiv4r9iq44al7nd5y5g-shepherd-1.0.0/bin/shepherd
--silent --config /gnu/store/w3l6dmap815mm3qzx77xdazky853adda-shepherd.conf
jake     16417  0.0  0.3  75752 24004 ?        Ss   Dec16   0:02
/gnu/store/mfkz7fvlfpv3ppwbkv0imb19nrf95akf-guile-3.0.9/bin/guile
--no-auto-compile
/gnu/store/nl0w5c7pxxdczqiv4r9iq44al7nd5y5g-shepherd-1.0.0/bin/shepherd
--silent --config /gnu/store/w3l6dmap815mm3qzx77xdazky853adda-shepherd.conf
jake     18510  0.0  0.2  75752 23760 ?        Ss   Dec16   0:01
/gnu/store/mfkz7fvlfpv3ppwbkv0imb19nrf95akf-guile-3.0.9/bin/guile
--no-auto-compile
/gnu/store/nl0w5c7pxxdczqiv4r9iq44al7nd5y5g-shepherd-1.0.0/bin/shepherd
--silent --config /gnu/store/w3l6dmap815mm3qzx77xdazky853adda-shepherd.conf
jake     21960  0.0  0.2 114608 22124 ?        Ss   Dec16   0:00
/gnu/store/mfkz7fvlfpv3ppwbkv0imb19nrf95akf-guile-3.0.9/bin/guile
--no-auto-compile
/gnu/store/nl0w5c7pxxdczqiv4r9iq44al7nd5y5g-shepherd-1.0.0/bin/shepherd
--silent --config /gnu/store/w3l6dmap815mm3qzx77xdazky853adda-shepherd.conf
jake     23450  0.0  0.2 114204 21328 ?        Ss   00:30   0:00
/gnu/store/mfkz7fvlfpv3ppwbkv0imb19nrf95akf-guile-3.0.9/bin/guile
--no-auto-compile
/gnu/store/nl0w5c7pxxdczqiv4r9iq44al7nd5y5g-shepherd-1.0.0/bin/shepherd
--silent --config /gnu/store/w3l6dmap815mm3qzx77xdazky853adda-shepherd.conf
jake     23672  0.0  0.0   6636  2552 pts/1    S+   00:32   0:00 grep
--color=auto shepherd

In addition, any daemons managed by the zombie shepherds also persist!

I'm experiencing this on both of my Guix System machines. One is running
GDM and XFCE. The other is running GDM and CWM.
Please let me know if I can provide more information.

Thanks
Jake

Information forwarded to bug-guix <at> gnu.org:
bug#74912; Package guix. (Wed, 18 Dec 2024 22:39:01 GMT)

Message #8 received at 74912 <at> debbugs.gnu.org:

From: Ludovic Courtès <ludo <at> gnu.org>
To: Jake <jforst.mailman <at> gmail.com>
Cc: 74912 <at> debbugs.gnu.org
Subject: Re: bug#74912: Shepherd: Growing number of user shepherds when
 relogging
Date: Wed, 18 Dec 2024 23:35:58 +0100
Hello,

Jake <jforst.mailman <at> gmail.com> skribis:

> I think I'm experiencing a bug in Shepherd since version 1.0.
> Whenever I log out and log back in again, my user shepherd from the
> previous login session is still present, and a new user shepherd spawns for
> the current login session.
> So relogging N times results in N+1 user shepherds.

I have a user shepherd via Guix Home and I experience the same problem
(though because I rarely log out it’s not really annoying :-)).

I suspect the problem has to do with how Guix Home determines whether or
not it should launch shepherd, but I haven’t checked yet.

Thanks for reporting the issue,
Ludo’.




Information forwarded to bug-guix <at> gnu.org:
bug#74912; Package guix. (Thu, 19 Dec 2024 00:30:02 GMT)

Message #11 received at 74912 <at> debbugs.gnu.org:

From: Tomas Volf <~@wolfsden.cz>
To: Ludovic Courtès <ludo <at> gnu.org>
Cc: Jake <jforst.mailman <at> gmail.com>, 74912 <at> debbugs.gnu.org
Subject: Re: bug#74912: Shepherd: Growing number of user shepherds when
 relogging
Date: Thu, 19 Dec 2024 01:29:13 +0100
Ludovic Courtès <ludo <at> gnu.org> writes:

> Hello,
>
> Jake <jforst.mailman <at> gmail.com> skribis:
>
>> I think I'm experiencing a bug in Shepherd since version 1.0.
>> Whenever I log out and log back in again, my user shepherd from the
>> previous login session is still present, and a new user shepherd spawns for
>> the current login session.
>> So relogging N times results in N+1 user shepherds.
>
> I have a user shepherd via Guix Home and I experience the same problem
> (though because I rarely log out it’s not really annoying :-)).
>
> I suspect the problem has to do with how Guix Home determines whether or
> not it should launch shepherd, but I haven’t checked yet.

When you have another login session active while you log out and back in,
a new shepherd is *not* spawned.  I am guessing here, but the last log out
probably causes XDG_RUNTIME_DIR to be removed (by elogind in my case), so
on log in there is no /run/user/$UID/on-first-login-executed, and
on-first-login runs again and starts the shepherd.

But even if that were solved, since the runtime directory was nuked,
there is no shepherd socket around anymore, so the (still running)
shepherd from the previous login session cannot be contacted by herd.
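
The mechanism guessed at above can be modelled in a few lines.  The
following Python sketch is only an illustration under stated assumptions:
it assumes that on-first-login does nothing more than test for a flag file
named on-first-login-executed in the runtime directory.  The function names
(on_first_login, simulate) and the temporary directory standing in for
/run/user/$UID are made up for the example; this is not the Guix Home code.

```python
import os
import shutil
import tempfile

FLAG_NAME = "on-first-login-executed"  # file name mentioned in the message above


def on_first_login(runtime_dir, start_shepherd):
    """Call start_shepherd() unless the flag file exists in runtime_dir.

    Returns True if shepherd was started, False if the flag suppressed it.
    """
    flag = os.path.join(runtime_dir, FLAG_NAME)
    if os.path.exists(flag):
        return False
    start_shepherd()
    # Create the flag so later logins sharing this runtime dir are no-ops.
    with open(flag, "w"):
        pass
    return True


def simulate():
    """Model three logins; the runtime dir is wiped before the third."""
    started = []  # one entry per shepherd "process" that was launched
    base = tempfile.mkdtemp()
    runtime_dir = os.path.join(base, "run-user-1000")  # stand-in for /run/user/$UID

    os.mkdir(runtime_dir)
    first = on_first_login(runtime_dir, lambda: started.append("shepherd-1"))
    second = on_first_login(runtime_dir, lambda: started.append("shepherd-2"))

    # Last log out: the runtime directory is removed, but nothing stops
    # the shepherd that is already running.
    shutil.rmtree(runtime_dir)

    os.mkdir(runtime_dir)  # recreated on the next log in
    third = on_first_login(runtime_dir, lambda: started.append("shepherd-3"))

    shutil.rmtree(base)
    return first, second, third, started
```

In this model the second login is suppressed by the flag, while the login
after the directory was wiped starts another instance next to the first.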

Off the top of my head I can think of two possible solutions:

1. Stop the shepherd on log out.  So as we have on-first-login, we would
have on-last-logout.  I have no idea how to implement that.  Maybe we
could use ~/.bash_logout?  Or some PAM thing?

2. Shepherd could shut down gracefully when the control socket is deleted
from the file system.  It is arguable how useful running shepherd is
without the socket anyway.

Any other ideas?

Tomas

-- 
There are only two hard things in Computer Science:
cache invalidation, naming things and off-by-one errors.

Information forwarded to bug-guix <at> gnu.org:
bug#74912; Package guix. (Thu, 26 Dec 2024 10:51:02 GMT)

Message #14 received at 74912 <at> debbugs.gnu.org:

From: Ludovic Courtès <ludo <at> gnu.org>
To: Tomas Volf <~@wolfsden.cz>
Cc: Jake <jforst.mailman <at> gmail.com>, 74912 <at> debbugs.gnu.org
Subject: Re: bug#74912: Shepherd: Growing number of user shepherds when
 relogging
Date: Thu, 26 Dec 2024 11:50:00 +0100
Hi!

Tomas Volf <~@wolfsden.cz> skribis:

> When you have another login session active when you log out and in
> again, new shepherd is *not* spawned.  I am guessing here but probably
> last log out causes XDG_RUNTIME_DIR to be removed (by elogind in my
> case), so on log in there is no /run/user/$UID/on-first-login-executed,
> so it runs again and starts the shepherd.
>
> But even if that would be solved, since the runtime directory was nuked,
> there is no shepherd socket around anymore, so the (still running)
> shepherd from previous login session cannot be contacted by herd.

Hmm, when is /run/user/UID deleted?

> Off the top of my head I can think of two possible solutions:
>
> 1. Stop the shepherd on log out.  So as we have on-first-login, we would
> have on-last-logout.  I have no idea how to implement that.  Maybe we
> could use ~/.bash_logout?  Or some PAM thing?

Or some elogind thing, rather?

But then, how do we make it work on other distros?  Maybe on systemd
distros shepherd receives SIGTERM or something, in which case it
terminates properly.

> 2. Shepherd could shut down gracefully when the control socket is deleted
> from the file system.  It is arguable how useful running shepherd is
> without the socket anyway.

I don’t think that’s workable: you’d need to poll/inotify for the
existence of that socket, but even if it exists on the file system, you
cannot tell whether it matches the socket you’re accepting on.

Ludo’.




Information forwarded to bug-guix <at> gnu.org:
bug#74912; Package guix. (Thu, 26 Dec 2024 17:26:01 GMT)

Message #17 received at 74912 <at> debbugs.gnu.org:

From: bokr <at> bokr.com
To: Ludovic Courtès <ludo <at> gnu.org>
Cc: Jake <jforst.mailman <at> gmail.com>, 74912 <at> debbugs.gnu.org,
 Tomas Volf <~@wolfsden.cz>
Subject: Re: bug#74912: Shepherd: Growing number of user shepherds when
 relogging
Date: Thu, 26 Dec 2024 09:25:18 -0800
On +2024-12-26 11:50:00 +0100, Ludovic Courtès wrote:
> Hi!
> 
> Tomas Volf <~@wolfsden.cz> skribis:
> 
> > When you have another login session active when you log out and in
> > again, new shepherd is *not* spawned.  I am guessing here but probably
> > last log out causes XDG_RUNTIME_DIR to be removed (by elogind in my
> > case), so on log in there is no /run/user/$UID/on-first-login-executed,
> > so it runs again and starts the shepherd.
> >
> > But even if that would be solved, since the runtime directory was nuked,
> > there is no shepherd socket around anymore, so the (still running)
> > shepherd from previous login session cannot be contacted by herd.
> 
> Hmm, when is /run/user/UID deleted?
> 
> > Off the top of my head I can think of two possible solutions:
> >
> > 1. Stop the shepherd on log out.  So as we have on-first-login, we would
> > have on-last-logout.  I have no idea how to implement that.  Maybe we
> > could use ~/.bash_logout?  Or some PAM thing?
> 
> Or some elogind thing, rather?
> 
> But then, how do we make it work on other distros?  Maybe on systemd
> distros shepherd receives SIGTERM or something, in which case it
> terminates properly.
> 
> > 2. Shepherd could shut down gracefully when the control socket is deleted
> > from the file system.  It is arguable how useful running shepherd is
> > without the socket anyway.
> 
> I don’t think that’s workable: you’d need to poll/inotify for the
> existence of that socket, but even if it exists on the file system, you
> cannot tell whether it matches the socket you’re accepting on.
> 
> Ludo’.
> 
> 
> 

I wonder how many guix-daemon-process-relationship type problems would be simplified
if (radical vision) one let wayland's inner event-driven loop/protocol be the dispatcher
for guix processes instead of the current guix daemon switching between its collection of threads.
I.e., all the guix threads would be individual login or spawned user processes securely communicating
virtualizably (shared memory or networked rendezvous buffers etc) for offloading?




Information forwarded to bug-guix <at> gnu.org:
bug#74912; Package guix. (Fri, 27 Dec 2024 23:20:02 GMT)

Message #20 received at 74912 <at> debbugs.gnu.org:

From: Tomas Volf <~@wolfsden.cz>
To: Ludovic Courtès <ludo <at> gnu.org>
Cc: Jake <jforst.mailman <at> gmail.com>, 74912 <at> debbugs.gnu.org
Subject: Re: bug#74912: Shepherd: Growing number of user shepherds when
 relogging
Date: Sat, 28 Dec 2024 00:19:03 +0100
Ludovic Courtès <ludo <at> gnu.org> writes:

> Hi!
>
> Tomas Volf <~@wolfsden.cz> skribis:
>
>> When you have another login session active when you log out and in
>> again, new shepherd is *not* spawned.  I am guessing here but probably
>> last log out causes XDG_RUNTIME_DIR to be removed (by elogind in my
>> case), so on log in there is no /run/user/$UID/on-first-login-executed,
>> so it runs again and starts the shepherd.
>>
>> But even if that would be solved, since the runtime directory was nuked,
>> there is no shepherd socket around anymore, so the (still running)
>> shepherd from previous login session cannot be contacted by herd.
>
> Hmm, when is /run/user/UID deleted?

I believe it is done by elogind (in my setup) when the last user session
(for the given UID) logs out.  If I grepped right, it is done by the
user_finalize function in logind-user.c.

As far as I understand, it should be performed when the last session of
the seat terminates.  So if you log only into a single TTY, the
XDG_RUNTIME_DIR will be removed on every log out.

>
>> Off the top of my head I can think of two possible solutions:
>>
>> 1. Stop the shepherd on log out.  So as we have on-first-login, we would
>> have on-last-logout.  I have no idea how to implement that.  Maybe we
>> could use ~/.bash_logout?  Or some PAM thing?
>
> Or some elogind thing, rather?

I looked around the manual page, but did not find anything.  There is
KillUserProcesses, but that feels like a fairly big hammer, and something
that should *not* be enabled by default.

We could patch elogind to add a new RemoveRuntimeDirectory boolean flag to
allow keeping the XDG_RUNTIME_DIR even after the last log out (I personally
would prefer that behavior anyway).  I am not sure what our policy
regarding patches here is.

>
> But then, how do we make it work on other distros?  Maybe on systemd
> distros shepherd receives SIGTERM or something, in which case it
> terminates properly.

No idea here.  ~/.bash_logout?

>
>> 2. Shepherd could shut down gracefully when the control socket is deleted
>> from the file system.  It is arguable how useful running shepherd is
>> without the socket anyway.
>
> I don’t think that’s workable: you’d need to poll/inotify for the
> existence of that socket, but even if it exists on the file system, you
> cannot tell whether it matches the socket you’re accepting on.

For files I would suggest checking if both `stat:dev' and `stat:ino'
match in order to detect whether it is the same file.  Not sure if the
same strategy can be used for Unix sockets.
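
As a rough illustration of the device/inode comparison, here is a Python
sketch (not Guile, and not Shepherd code).  It assumes the listener records
the (st_dev, st_ino) pair of the path right after bind(), because calling
fstat on the socket descriptor itself would, at least on Linux, describe
the socket object rather than the file system entry.  The helper names
bind_and_remember, still_ours and demo are invented for the example, and
the check remains racy: the entry can change right after it is examined.

```python
import os
import socket
import tempfile


def bind_and_remember(path):
    """Bind a listening Unix socket at path and record the (dev, ino) of
    the file system entry that bind() created."""
    sock = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
    sock.bind(path)
    sock.listen(1)
    st = os.stat(path)
    return sock, (st.st_dev, st.st_ino)


def still_ours(path, identity):
    """True if path exists and is still the entry recorded at bind time."""
    try:
        st = os.stat(path)
    except FileNotFoundError:
        return False
    return (st.st_dev, st.st_ino) == identity


def demo():
    d = tempfile.mkdtemp()
    path = os.path.join(d, "socket")
    ours, identity = bind_and_remember(path)
    before = still_ours(path, identity)

    # Someone else binds under another name and renames it over our path.
    # Both entries exist at the same time, so their inodes must differ.
    other_path = os.path.join(d, "socket.other")
    other, _ = bind_and_remember(other_path)
    os.replace(other_path, path)
    replaced = still_ours(path, identity)

    os.unlink(path)
    deleted = still_ours(path, identity)

    ours.close()
    other.close()
    os.rmdir(d)
    return before, replaced, deleted
```

Here demo() reports that the path matches right after bind, and no longer
matches once it has been replaced or deleted.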

Tomas

-- 
There are only two hard things in Computer Science:
cache invalidation, naming things and off-by-one errors.

Information forwarded to bug-guix <at> gnu.org:
bug#74912; Package guix. (Fri, 27 Dec 2024 23:21:03 GMT)

Message #23 received at 74912 <at> debbugs.gnu.org:

From: Tomas Volf <~@wolfsden.cz>
To: bokr <at> bokr.com
Cc: Jake <jforst.mailman <at> gmail.com>, 74912 <at> debbugs.gnu.org,
 Ludovic Courtès <ludo <at> gnu.org>
Subject: Re: bug#74912: Shepherd: Growing number of user shepherds when
 relogging
Date: Sat, 28 Dec 2024 00:20:55 +0100
I am not sure how this relates to this specific bug report, but

bokr <at> bokr.com writes:

> I wonder how many guix-daemon-process-relationship type problems would be simplified
> if (radical vision) one let wayland's inner event-driven loop/protocol
> be the dispatcher

not everyone uses wayland.

> for guix processes instead of the current guix daemon switching between its collection of threads.
> I.e., all the guix threads would be individual login or spawned user processes securely communicating
> virtualizably (shared memory or networked rendezvous buffers etc) for offloading?

-- 
There are only two hard things in Computer Science:
cache invalidation, naming things and off-by-one errors.

Severity set to 'important' from 'normal' Request was from Ludovic Courtès <ludo <at> gnu.org> to control <at> debbugs.gnu.org. (Tue, 07 Jan 2025 23:00:01 GMT)

Merged 67863 74912. Request was from Julian Flake <flake <at> uni-koblenz.de> to control <at> debbugs.gnu.org. (Sat, 11 Jan 2025 21:55:02 GMT)

Merged 67863 74912 76998. Request was from Ludovic Courtès <ludo <at> gnu.org> to control <at> debbugs.gnu.org. (Sat, 15 Mar 2025 10:58:03 GMT)

Changed bug title to 'Guix Home leaves user shepherd on logout, starts new instance on login' from 'Shepherd: Growing number of user shepherds when relogging' Request was from Ludovic Courtès <ludo <at> gnu.org> to control <at> debbugs.gnu.org. (Mon, 17 Mar 2025 19:38:04 GMT)

Information forwarded to bug-guix <at> gnu.org:
bug#74912; Package guix. (Wed, 26 Mar 2025 12:19:04 GMT)

Message #34 received at 74912 <at> debbugs.gnu.org:

From: Danny Milosavljevic <dannym <at> friendly-machines.com>
To: Tomas Volf <~@wolfsden.cz>
Cc: Jake <jforst.mailman <at> gmail.com>, 74912 <at> debbugs.gnu.org,
 Ludovic Courtès <ludo <at> gnu.org>
Subject: Re: bug#74912: Shepherd: Growing number of user shepherds when
 relogging
Date: Wed, 26 Mar 2025 13:18:23 +0100
Hi,

>KillUserProcesses

Warning: That actually runs on every session logout (if enabled at all),
not just once per user.  Also, I think session_stop_scope is commented
out in our elogind, so it won't actually kill anything.  If it hadn't been
commented out, it would have used dbus to communicate with systemd to
stop a special (session) scope unit (see "manager_stop_unit").  That is
a good idea--to have only one guy managing all the user processes
(in order to prevent races).

>We could patch elogind to add new RemoveRuntimeDirectory boolean flag to
>allow keeping the XDG_RUNTIME_DIR even after last log out (I personally
>would prefer that behavior anyway).

About the implication:
I would prefer if random user processes would not linger after I logged out.
What possible good can come from that?  And definitely not have my
user services linger after I logged out.

> ~/.bash_logout?

I think first we have to decide whether shepherd should run per user or
per session.  These are not the same.  This is a design decision--and it
HAS to be decided--otherwise nothing will work right.  There is a risk
of data loss (backups run by shepherd step on each other's toes etc)
until that's decided.

I think shepherd should be run once per user, not per session.

I also think the on-first-login handling in guix home means that at
least guix home has already decided on shepherd once per user.

There used to be a check in shepherd to ensure that it can only run at
most once per user at the same time.  It wasn't perfect--but I mean that
even shepherd itself apparently had decided on shepherd once per user.

>> 2. Shepherd could shut down gracefully when the control socket is deleted
>> from the file system.  It is arguable how useful running shepherd is
>> without the socket anyway.

I recommend against magic like this.  I don't think it's possible to do this
in a way that is atomic.

Also, in an ideal world this would have been the way things worked in the
first place--but we aren't in that world.  So I don't think it would be
wise to single out just one UNIX program, shepherd, and do it just for
that.
If you want to do stuff like that, add it to the POSIX standard.
Otherwise it's too surprising.

I would suggest the following:

(1) For Guix native, patch elogind[i] to also kill -TERM shepherd
(See user_stop_service--which is for that).
How does it find the shepherd process, specifically?

So elogind probably could also start

  /run/current-system/profile/bin/shepherd
  (with which config?)

on first user session login (and remember its pid)
(See user_start_service--which is for that, anyway).

elogind also has control over the directory with the socket file, so
I think it's the best place to also control the process.

Alternatively, we'd tell system shepherd to do it.
If shepherd could do dbus, dbus is already hooked up in elogind.

elogind's "sd_event_source" already has "child": "process_owned",
"exited", "waited"; and "sd_event_add_child" exists and is used for
"brightness_writer_fork"--haha totally random functionality.
But that means there's already a process manager hooked up in elogind.
It also has "kill_and_sigcont" and/or "sigterm_wait"--which we'd
probably use.

(2) When a foreign distro uses systemd (there's a very high chance it
does), then we can just install shepherd as a systemd user unit
(from guix-install.sh).  systemd will do the right thing, the end.

(3) Maybe use .bash_logout and have it invoke "w" (or "loginctl") to see
whether we are the last session of that user (that would have a race...).
If we are, then kill shepherd.

I have seen bugs where no entry is added to "w" even though you
logged in.  Then we'd be out of luck for (3).

Also, it would have a race anyway--even otherwise.

So maybe let's not do (3)--although it was a good find (cool that that
exists!).
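
For reference, the "am I the last session?" test from (3) could look
roughly like the Python sketch below.  It is a sketch under assumptions:
it takes the session listing as text and assumes the layout printed by
"loginctl list-sessions --no-legend" (session ID in the first column,
numeric UID in the second).  The function names and the SAMPLE listing are
hypothetical.  It does nothing about the race mentioned above: a new
session may appear between the check and the kill.

```python
def sessions_for_uid(listing, uid):
    """Return the session IDs in listing that belong to uid.

    listing is assumed to have one session per line, with the session ID
    in the first column and the numeric UID in the second.
    """
    ids = []
    for line in listing.splitlines():
        fields = line.split()
        if len(fields) >= 2 and fields[1] == str(uid):
            ids.append(fields[0])
    return ids


def is_last_session(listing, uid, my_session):
    """True if my_session is the only remaining session of uid."""
    return sessions_for_uid(listing, uid) == [my_session]


# Hypothetical listing, made up for this example.
SAMPLE = """\
     2 1000 jake seat0 tty2
     5 1000 jake seat0 tty3
    c1  120 gdm  seat0 tty1
"""
```

With SAMPLE, user 1000 still has two sessions, so session "2" is not the
last one and shepherd would be left alone.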

------

What about shepherd's child processes (for example services)?
Will shepherd clean those up on shepherd termination?

There are also abstract UNIX domain sockets (think URN) that don't have
or need a filesystem entry.
It might be a good idea to use that for shepherd and prevent problems
stemming from the /run/user/xxx deletion.  But in my opinion, stopping
user shepherd (once user logged out of all their sessions) is more
important than that, anyway.
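
To make the abstract-socket idea concrete, here is a small Python sketch.
It relies on the Linux convention that an address starting with a NUL byte
lives in the abstract namespace and has no file system entry.  The socket
name and the function name are invented for the example; whether shepherd
could or should adopt this is left open.

```python
import os
import socket
import sys
import uuid


def abstract_socket_roundtrip():
    """Listen on an abstract-namespace Unix socket and connect to it.

    Returns the bytes received by the server side, or None on platforms
    without abstract sockets (they are a Linux extension).
    """
    if not sys.platform.startswith("linux"):
        return None
    # A leading NUL byte selects the abstract namespace; the rest is the
    # name.  The name below is made up and unique per call.
    name = b"\0" + ("shepherd-demo-%d-%s" % (os.getuid(), uuid.uuid4().hex)).encode()

    server = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
    server.bind(name)  # nothing is created on the file system
    server.listen(1)

    client = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
    client.connect(name)
    conn, _ = server.accept()
    client.sendall(b"status")
    data = conn.recv(16)

    for s in (conn, client, server):
        s.close()
    return data
```

Since no path is involved, removing /run/user/xxx would not cut clients
off from such a socket; the name disappears once the listener closes it.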

[i] Would cause 3571 dependents to rebuild

P.S. in elogind, almost the entire cgroup handling in src/core/cgroup.c
has been disabled.  That's disappointing.  Someday, we should have cgroup
support as well!




Information forwarded to bug-guix <at> gnu.org:
bug#74912; Package guix. (Tue, 01 Apr 2025 10:14:04 GMT)

Message #37 received at 74912 <at> debbugs.gnu.org:

From: Ludovic Courtès <ludo <at> gnu.org>
To: Danny Milosavljevic <dannym <at> friendly-machines.com>
Cc: Jake <jforst.mailman <at> gmail.com>, 74912 <at> debbugs.gnu.org,
 Tomas Volf <~@wolfsden.cz>
Subject: Re: bug#74912: Shepherd: Growing number of user shepherds when
 relogging
Date: Tue, 01 Apr 2025 12:13:43 +0200
Hi Danny,

Danny Milosavljevic <dannym <at> friendly-machines.com> skribis:

> I would suggest the following:
>
> (1) For Guix native, patch elogind[i] to also kill -TERM shepherd
> (See user_stop_service--which is for that).
> How does it find the shepherd process, specifically?

I think ‘user_stop_service’ could run:

  herd stop root -s /run/user/$UID/shepherd/socket

> So elogind probably could also start
>
>   /run/current-system/profile/bin/shepherd
>   (with which config?)
>
> on first user session login (and remember its pid)
> (See user_start_service--which is for that, anyway).

Oh yes, that too.

> (2) When a foreign distro uses systemd (there's a very high chance it
> does), then we can just install shepherd as a systemd user unit
> (from guix-install.sh).  systemd will do the right thing, the end.

I wouldn’t do it from ‘guix-install.sh’ because it only makes sense if
you’re going to use Guix Home; and if you use Guix Home, it has its own
way of starting shepherd.

> (3) Maybe use .bash_logout and have it invoke "w" (or "loginctl") to see
> whether we are the last session of that user (that would have a race...).
> If we are, then kill shepherd.

Yes.

The question is how to keep Home portable between Guix and foreign distros.
Neither the elogind nor the systemd approach is portable; the
‘.bash_logout’ thing may be portable, but it’s probably more fragile.

Maybe we shouldn’t try to be portable, and first start by fixing the
problem on Guix System?

> What about shepherd's child processes (for example services)?
> Will shepherd clean those up on shepherd termination?

Yes: if you ‘herd stop root’ or send SIGTERM to shepherd, it will shut
down all the services properly.

Thanks,
Ludo’.




Information forwarded to bug-guix <at> gnu.org:
bug#74912; Package guix. (Mon, 14 Apr 2025 08:10:02 GMT)

Message #40 received at 74912 <at> debbugs.gnu.org:

From: Ludovic Courtès <ludo <at> gnu.org>
To: Danny Milosavljevic <dannym <at> friendly-machines.com>
Cc: Jake <jforst.mailman <at> gmail.com>, 74912 <at> debbugs.gnu.org,
 Tomas Volf <~@wolfsden.cz>, Daniel Littlewood <dan <at> danielittlewood.xyz>
Subject: Re: bug#74912: Shepherd: Growing number of user shepherds when
 relogging
Date: Mon, 14 Apr 2025 10:08:04 +0200
Hi Danny and all,

Following reports by Daniel Littlewood, who talked about involuntarily
running a second shepherd instance shadowing the previous one (this time
not in a Guix Home context), I realized shepherd itself could avoid this
entirely.

So shepherd will now refuse to start when it determines that an instance
is already listening on its socket:

  https://git.savannah.gnu.org/cgit/shepherd.git/commit/?id=787d5a33aea061b5052faa0863c96be722440ce3

Feedback welcome!

Ludo’.
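
The general idea of "refuse to start if an instance is already listening"
can be illustrated by probing the socket with a connection attempt.  The
Python sketch below only illustrates that idea; it is not the code from
the commit referenced above, and the names instance_listening and demo are
made up.

```python
import os
import socket
import tempfile


def instance_listening(path):
    """True if something accepts connections on the Unix socket at path."""
    probe = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
    try:
        probe.connect(path)
    except (FileNotFoundError, ConnectionRefusedError):
        # No socket file, or a stale file that nobody is listening on.
        return False
    finally:
        probe.close()
    return True


def demo():
    d = tempfile.mkdtemp()
    path = os.path.join(d, "socket")
    missing = instance_listening(path)   # no socket file at all

    server = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
    server.bind(path)
    server.listen(1)
    running = instance_listening(path)   # a live listener

    server.close()                       # file stays behind, as after a crash
    stale = instance_listening(path)

    os.unlink(path)
    os.rmdir(d)
    return missing, running, stale
```

Note the first case: if the socket file is gone, for instance because the
runtime directory was removed at log out as suggested earlier in this
thread, a probe of this kind cannot see an instance that is still running.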




Information forwarded to bug-guix <at> gnu.org:
bug#74912; Package guix. (Wed, 14 May 2025 17:05:02 GMT)

Message #43 received at 74912 <at> debbugs.gnu.org:

From: Ludovic Courtès <ludo <at> gnu.org>
To: Danny Milosavljevic <dannym <at> friendly-machines.com>
Cc: Jake <jforst.mailman <at> gmail.com>, 74912 <at> debbugs.gnu.org,
 Tomas Volf <~@wolfsden.cz>, 76998-done <at> debbugs.gnu.org,
 Daniel Littlewood <dan <at> danielittlewood.xyz>
Subject: Re: bug#76998: Guix Home leaves user shepherd on logout, starts new
 instance on login
Date: Wed, 14 May 2025 18:06:11 +0200
Hi,

Ludovic Courtès <ludo <at> gnu.org> writes:

> So shepherd will now refuse to start when it determines that an instance
> is already listening on its socket:
>
>   https://git.savannah.gnu.org/cgit/shepherd.git/commit/?id=787d5a33aea061b5052faa0863c96be722440ce3

This commit is in 1.0.4.  Closing!

Ludo’.




Information forwarded to bug-guix <at> gnu.org:
bug#74912; Package guix. (Thu, 15 May 2025 02:18:02 GMT)

Message #46 received at 74912 <at> debbugs.gnu.org:

From: Jake <jforst.mailman <at> gmail.com>
To: 74912 <at> debbugs.gnu.org
Cc: ludo <at> gnu.org
Subject: Re: bug#76998: Guix Home leaves user shepherd on logout, starts new
 instance on login
Date: Thu, 15 May 2025 02:16:54 +0000
Hi Ludo’

That commit made a difference but didn't fix the problem for me.
After a couple of relogs since the last reboot:

$ herd --v
herd (GNU Shepherd) 1.0.4
Copyright (C) 2025 the Shepherd authors
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law

$ pgrep shepherd
1
1491
9303
28426

$ herd status
Started:
 + gpg-agent
 + root
 + timer
 + transient
Running timers:
 + log-rotation
Failed to start:
 ! dicod

The only difference from before is that now the home dicod service fails
to start on relog.

Thanks
Jake

Information forwarded to bug-guix <at> gnu.org:
bug#74912; Package guix. (Thu, 15 May 2025 08:34:02 GMT)

Message #49 received at 74912 <at> debbugs.gnu.org:

From: Ludovic Courtès <ludo <at> gnu.org>
To: Jake <jforst.mailman <at> gmail.com>
Cc: 74912 <at> debbugs.gnu.org
Subject: Re: bug#76998: Guix Home leaves user shepherd on logout, starts new
 instance on login
Date: Thu, 15 May 2025 10:32:35 +0200
Hi Jake,

Jake <jforst.mailman <at> gmail.com> writes:

> That commit made a difference but didn't fix the problem for me.
> After a couple of relogs since the last reboot:

Hmm, is /run/user/$UID deleted when logging out?  That would explain why
the fix in the Shepherd doesn’t make any difference.

Ludo’.




Information forwarded to bug-guix <at> gnu.org:
bug#74912; Package guix. (Thu, 15 May 2025 10:21:02 GMT)

Message #52 received at 74912 <at> debbugs.gnu.org:

From: Jake <jforst.mailman <at> gmail.com>
To: Ludovic Courtès <ludo <at> gnu.org>
Cc: 74912 <at> debbugs.gnu.org
Subject: Re: bug#76998: Guix Home leaves user shepherd on logout, starts new
 instance on login
Date: Thu, 15 May 2025 19:50:25 +0930
> is /run/user/$UID deleted when logging out?

I think so, since the timestamps in /run/user/$UID are updated to the
new login time.

Jake

On Thu, 15 May 2025 at 6:03 pm, Ludovic Courtès <ludo <at> gnu.org> wrote:

> Hi Jake,
>
> Jake <jforst.mailman <at> gmail.com> writes:
>
> > That commit made a difference but didn't fix the problem for me.
> > After a couple of relogs since the last reboot:
>
> Hmm, is /run/user/$UID deleted when logging out?  That would explain why
> the fix in the Shepherd doesn’t make any difference.
>
> Ludo’.
>

Did not alter fixed versions and reopened. Request was from Debbugs Internal Request <help-debbugs <at> gnu.org> to internal_control <at> debbugs.gnu.org. (Sat, 17 May 2025 15:34:03 GMT)

Information forwarded to bug-guix <at> gnu.org:
bug#74912; Package guix. (Sun, 18 May 2025 12:32:02 GMT)

Message #57 received at 74912 <at> debbugs.gnu.org:

From: Danny Milosavljevic <dannym <at> friendly-machines.com>
To: Ludovic Courtès <ludo <at> gnu.org>
Cc: 74912 <at> debbugs.gnu.org, 76998 <at> debbugs.gnu.org, Tomas Volf <~@wolfsden.cz>,
 76998-done <at> debbugs.gnu.org, Jake <jforst.mailman <at> gmail.com>,
 Daniel Littlewood <dan <at> danielittlewood.xyz>
Subject: Re: bug#74912: bug#76998: Guix Home leaves user shepherd on logout,
 starts new instance on login
Date: Sun, 18 May 2025 14:30:49 +0200
Hi Ludo,

That is not a fix.  It's a workaround for now.

It's good that the "is a shepherd already running" check is back in shepherd.  It was in shepherd years ago, then got removed without explanation, and now it's back again (in a very convoluted but safer way).  This shouldn't have been removed in the first place.  It's EXTREMELY dangerous to have multiple parallel shepherds for the same user (automated backup service destroying backups etc).  Please, let's not remove it ever again.

In any case, what shepherd 1.0.4 does is stop the bleeding, but not fix the problem:
It prevents two (or 100) user shepherds for the same user from running in parallel.
It does not stop shepherd when a user closed all their sessions.

Why close this bug report before elogind is patched and before ~/.bash_logout is generated in guix home?  That makes no sense.

Also, I don't understand why this is so broken for so long.  Isn't Guix used in HPC?
Doesn't HPC need support for multiple sessions for the same user on day one?

My untested elogind patch that invokes shepherd root stop is attached.  Reading the elogind source code, especially what they patched out and what they added themselves, makes me despair.  Why is it so terrible?  That all used to be fine! :P

Even my patch is not great.  A service manager's job is to manage services.  PID 1 is the main service manager.  It should manage services.  One of those services should be the user's shepherd, which should be managed by PID 1 shepherd and not weirdly attached to an already-running session (WTF!) of the user by this:

~$ cat ~/.profile
HOME_ENVIRONMENT=$HOME/.guix-home
. $HOME_ENVIRONMENT/setup-environment
$HOME_ENVIRONMENT/on-first-login
unset HOME_ENVIRONMENT

In my opinion, no one but the service manager should manage services.  Does ~/.profile look like a service manager?  No :P

I understand that we want to support this on non-guix-system stuff.  But the default should be a systemd user service to run the user shepherd.  If the user absolutely wants to do a workaround like ~/.profile above, fine, they can.  But let's not do that by default.

The problems with my elogind patch are the following:
- What if "herd stop root -s ..." hangs?  Then elogind hangs forever?  No one can log in or out anymore?  That's not okay.  Therefore, I don't wait.  Now user processes can have the floor they are walking upon removed on user stop, while they still need it :P
- When can /run/user/1000 be deleted?  There's a weird GC mechanism in elogind for that, and my patch says it can be deleted before waiting on the result of herd stop (see above why).  If I DID wait on the result of herd stop, I could wait indefinitely--which is not okay.  I think elogind uses signalfd, so I can't waitpid in a random spot either, or wait until waitpid returns.  I think the user shepherd knows when to delete /run/user/1000--and no one else.  But if the user shepherd crashes, it won't delete /run/user/1000 and we want it to be able to start again even when /run/user/1000 is still there.  Hence the complicated shepherd fix in 1.0.4 is useful.
- There are tool_fork_pid and sleep_fork_pid in elogind, neither of which is a queue.  And, again, that is trying to be a service manager.  What if those scripts hang?  What if they DON'T hang?  Similar questions as before.  Separate the concerns already :P

Personally, I'd also like something that, if all sessions of user x are closed, it kills all remaining processes of that effective user id.  elogind has a setting KillUserProcesses that--despite the name--kills (WHICH!?) processes when a SESSION (of 42 sessions of that user :P) is closed.  Who wants THAT?  And even if someone does: how would THAT be implemented?

elogind is like containers never happened.  It's so weird.

I think to fix this problem for good, first there needs to be a system diagram created on how this is supposed to work.

[ELOGIND.patch (text/x-patch, attachment)]

Did not alter fixed versions and reopened. Request was from Debbugs Internal Request <help-debbugs <at> gnu.org> to internal_control <at> debbugs.gnu.org. (Wed, 04 Jun 2025 16:26:04 GMT) Full text and rfc822 format available.

Information forwarded to bug-guix <at> gnu.org:
bug#74912; Package guix. (Sat, 14 Jun 2025 21:28:02 GMT) Full text and rfc822 format available.

Message #62 received at 74912 <at> debbugs.gnu.org (full text, mbox):

From: Ludovic Courtès <ludo <at> gnu.org>
To: Danny Milosavljevic <dannym <at> friendly-machines.com>
Cc: 74912 <at> debbugs.gnu.org, 76998 <at> debbugs.gnu.org, Tomas Volf <~@wolfsden.cz>,
 76998-done <at> debbugs.gnu.org, Jake <jforst.mailman <at> gmail.com>,
 Daniel Littlewood <dan <at> danielittlewood.xyz>
Subject: Re: bug#76998: Guix Home leaves user shepherd on logout, starts new
 instance on login
Date: Sat, 14 Jun 2025 23:26:53 +0200
Hi Danny,

Danny Milosavljevic <dannym <at> friendly-machines.com> writes:

> It's good that the "is a shepherd already running" check is back in
> shepherd.  It was in shepherd years ago, then got removed without
> explanation, then now it's back again (now in a very convoluted but
> safer way).  This shouldn't have been removed in the first place.
> It's EXTREMELY dangerous to have multiple parallel shepherds for the
> same user (automated backup service destroying backups etc).  Please,
> let's not remove it ever again.

If you’re referring to 649a98a6697d358a53eccc45b387e5130278b5ec (8 years
ago), I believe it wasn’t doing just that due to lingering.

(Aside: I think the tone of this paragraph is uncalled for.)

> In any case, what shepherd 1.0.4 does is stop the bleeding, but not fix the problem:
> It prevents two (or 100) user shepherds for the same user from running in parallel.
> It does not stop shepherd when a user closed all their sessions.

Yes.  It just occurred to me that we probably just got it wrong from the
start: ‘XDG_RUNTIME_DIR’ (/run/user/$UID) is specified as having limited
lifetime.  Quoth
<https://specifications.freedesktop.org/basedir-spec/latest/>:

  The lifetime of the directory MUST be bound to the user being logged
  in.  It MUST be created when the user first logs in and if the user
  fully logs out the directory MUST be removed.

So it was probably a bad idea in the first place for shepherd to store
its socket in /run/user/$UID (even more so that this directory doesn’t
exist on systems without elogind/systemd).  GnuPG avoids
$XDG_RUNTIME_DIR for exactly this reason (there’s a comment in
‘homedir.c’).

So, what can we do?

In the Shepherd 1.1, we could default to $XDG_STATE_HOME instead; we
probably shouldn’t change that in 1.0.x.
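
For reference, a minimal Python sketch of how such a default could be resolved.  The `$HOME/.local/state` fallback comes from the XDG Base Directory specification; the function name and the `shepherd/socket` suffix are assumptions made for illustration, not the Shepherd's actual layout.

```python
import os

def default_socket_path(env, home):
    """Return a hypothetical socket path under $XDG_STATE_HOME.

    Per the XDG Base Directory spec, an unset or empty XDG_STATE_HOME
    defaults to $HOME/.local/state.
    """
    state_home = env.get("XDG_STATE_HOME") or os.path.join(home, ".local", "state")
    # "shepherd/socket" is an illustrative suffix, not a confirmed layout.
    return os.path.join(state_home, "shepherd", "socket")
```

Passing the environment as a dictionary (for example `os.environ`) keeps the function easy to exercise without touching the real environment.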

Any other idea?

Thanks,
Ludo’.




Did not alter fixed versions and reopened. Request was from Debbugs Internal Request <help-debbugs <at> gnu.org> to internal_control <at> debbugs.gnu.org. (Sat, 14 Jun 2025 22:38:07 GMT) Full text and rfc822 format available.

Information forwarded to bug-guix <at> gnu.org:
bug#74912; Package guix. (Sun, 15 Jun 2025 13:41:02 GMT) Full text and rfc822 format available.

Message #67 received at 74912 <at> debbugs.gnu.org (full text, mbox):

From: Tomas Volf <~@wolfsden.cz>
To: Ludovic Courtès <ludo <at> gnu.org>
Cc: 74912 <at> debbugs.gnu.org, 76998 <at> debbugs.gnu.org, 76998-done <at> debbugs.gnu.org,
 Jake <jforst.mailman <at> gmail.com>, Daniel Littlewood <dan <at> danielittlewood.xyz>,
 Danny Milosavljevic <dannym <at> friendly-machines.com>
Subject: Re: bug#76998: Guix Home leaves user shepherd on logout, starts new
 instance on login
Date: Sun, 15 Jun 2025 15:40:33 +0200
Hello :)

Ludovic Courtès <ludo <at> gnu.org> writes:

>> In any case, what shepherd 1.0.4 does is stop the bleeding, but not fix the problem:
>> It prevents two (or 100) user shepherds for the same user from running in parallel.
>> It does not stop shepherd when a user closed all their sessions.
>
> Yes.  It just occurred to me that we probably just got it wrong from the
> start: ‘XDG_RUNTIME_DIR’ (/run/user/$UID) is specified as having limited
> lifetime.  Quoth
> <https://specifications.freedesktop.org/basedir-spec/latest/>:
>
>   The lifetime of the directory MUST be bound to the user being logged
>   in.  It MUST be created when the user first logs in and if the user
>   fully logs out the directory MUST be removed.
>
> So it was probably a bad idea in the first place for shepherd to store
> its socket in /run/user/$UID (even more so that this directory doesn’t
> exist on systems without elogind/systemd).  GnuPG avoids
> $XDG_RUNTIME_DIR for exactly this reason (there’s a comment in
> ‘homedir.c’).

Minor correction here.  Looking at the source code, GnuPG avoids the
XDG_RUNTIME_DIR environment variable, but it still tries to use the
/run/user/$UID directory, if it exists.

> So, what can we do?
>
> In the Shepherd 1.1, we could default to $XDG_STATE_HOME instead; we
> probably shouldn’t change that in 1.0.x.

Not sure here, the specification says the following about this location:

> The $XDG_STATE_HOME contains state data that *should persist between
> (application) restarts*, but that is not important or portable enough
> to the user that it should be stored in $XDG_DATA_HOME.

So... control socket does not seem to fit that description.

> Any other idea?

Well, since you have mentioned the GnuPG as an example, we could just
mirror what it does, and what I have suggested before.

--8<---------------cut here---------------start------------->8---
$ mkdir /tmp/xxx && cd /tmp/xxx
$ guix shell -u test -C findutils gnupg coreutils bash procps -- env HOME=/tmp/xxx GNUPGHOME=/tmp/xxx bash
test <at> xx ~ [env]$ gpg-agent --daemon
gpg-agent[2]: directory '/tmp/xxx/private-keys-v1.d' created
gpg-agent[3]: gpg-agent (GnuPG) 2.4.7 started
test <at> xx ~ [env]$ find /run/user
/run/user
/run/user/1000
/run/user/1000/gnupg
/run/user/1000/gnupg/d.j1yiifhhjrep9xunazyff54c
/run/user/1000/gnupg/d.j1yiifhhjrep9xunazyff54c/S.gpg-agent.ssh
/run/user/1000/gnupg/d.j1yiifhhjrep9xunazyff54c/S.gpg-agent.browser
/run/user/1000/gnupg/d.j1yiifhhjrep9xunazyff54c/S.gpg-agent.extra
/run/user/1000/gnupg/d.j1yiifhhjrep9xunazyff54c/S.gpg-agent
test <at> xx ~ [env]$ ps aux
USER       PID %CPU %MEM    VSZ   RSS TTY      STAT START   TIME COMMAND
test         1  0.0  0.0   5136  4068 ?        S    13:32   0:00 bash
test         3  0.0  0.0   5516  2400 ?        Ss   13:32   0:00 gpg-agent --daemon
test         5  0.0  0.0   5224  3852 ?        R+   13:32   0:00 ps aux
test <at> xx ~ [env]$ rm -r /run/user/1000/gnupg
gpg-agent[3]: socket file has been removed - shutting down
gpg-agent[3]: gpg-agent (GnuPG) 2.4.7 stopped
test <at> xx ~ [env]$ find /run/user
/run/user
/run/user/1000
test <at> xx ~ [env]$ ps aux
USER       PID %CPU %MEM    VSZ   RSS TTY      STAT START   TIME COMMAND
test         1  0.0  0.0   5136  4068 ?        S    13:32   0:00 bash
test         8  0.0  0.0   5224  3776 ?        R+   13:33   0:00 ps aux
--8<---------------cut here---------------end--------------->8---

So my suggestion is that when the socket is deleted, the shepherd
process stops itself.
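
A rough sketch of that behaviour, in Python rather than the Shepherd's Guile and using plain polling instead of inotify.  The function names, the polling interval, and the callback are all hypothetical; this only illustrates the idea, not how gpg-agent or shepherd implement it.

```python
import os
import time

def wait_for_file_deletion(path, interval=0.05, timeout=None):
    """Block until PATH no longer exists.

    Return True once the file is gone, False if TIMEOUT seconds elapse first.
    """
    deadline = None if timeout is None else time.monotonic() + timeout
    while os.path.lexists(path):
        if deadline is not None and time.monotonic() >= deadline:
            return False
        time.sleep(interval)
    return True

def monitor_socket(path, on_deletion, **kwargs):
    """Call ON_DELETION (e.g. a clean shutdown routine) once PATH disappears."""
    if wait_for_file_deletion(path, **kwargs):
        on_deletion()
        return True
    return False
```

In a daemon this would run in its own thread or fiber, so that the main loop keeps serving requests while the socket file is being watched.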

Tomas

-- 
There are only two hard things in Computer Science:
cache invalidation, naming things and off-by-one errors.




Did not alter fixed versions and reopened. Request was from Debbugs Internal Request <help-debbugs <at> gnu.org> to internal_control <at> debbugs.gnu.org. (Sun, 15 Jun 2025 13:48:03 GMT) Full text and rfc822 format available.

Information forwarded to bug-guix <at> gnu.org:
bug#74912; Package guix. (Mon, 16 Jun 2025 13:38:05 GMT) Full text and rfc822 format available.

Message #72 received at 74912 <at> debbugs.gnu.org (full text, mbox):

From: Ludovic Courtès <ludo <at> gnu.org>
To: Tomas Volf <~@wolfsden.cz>
Cc: 74912 <at> debbugs.gnu.org, 76998 <at> debbugs.gnu.org, 76998-done <at> debbugs.gnu.org,
 Jake <jforst.mailman <at> gmail.com>, Daniel Littlewood <dan <at> danielittlewood.xyz>,
 Danny Milosavljevic <dannym <at> friendly-machines.com>
Subject: Re: bug#76998: Guix Home leaves user shepherd on logout, starts new
 instance on login
Date: Mon, 16 Jun 2025 15:28:54 +0200
Hi,

Tomas Volf <~@wolfsden.cz> writes:

> Well, since you have mentioned the GnuPG as an example, we could just
> mirror what it does, and what I have suggested before.
>
> --8<---------------cut here---------------start------------->8---
> $ mkdir /tmp/xxx && cd /tmp/xxx
> $ guix shell -u test -C findutils gnupg coreutils bash procps -- env HOME=/tmp/xxx GNUPGHOME=/tmp/xxx bash
> test <at> xx ~ [env]$ gpg-agent --daemon
> gpg-agent[2]: directory '/tmp/xxx/private-keys-v1.d' created
> gpg-agent[3]: gpg-agent (GnuPG) 2.4.7 started
> test <at> xx ~ [env]$ find /run/user
> /run/user
> /run/user/1000
> /run/user/1000/gnupg
> /run/user/1000/gnupg/d.j1yiifhhjrep9xunazyff54c
> /run/user/1000/gnupg/d.j1yiifhhjrep9xunazyff54c/S.gpg-agent.ssh
> /run/user/1000/gnupg/d.j1yiifhhjrep9xunazyff54c/S.gpg-agent.browser
> /run/user/1000/gnupg/d.j1yiifhhjrep9xunazyff54c/S.gpg-agent.extra
> /run/user/1000/gnupg/d.j1yiifhhjrep9xunazyff54c/S.gpg-agent
> test <at> xx ~ [env]$ ps aux
> USER       PID %CPU %MEM    VSZ   RSS TTY      STAT START   TIME COMMAND
> test         1  0.0  0.0   5136  4068 ?        S    13:32   0:00 bash
> test         3  0.0  0.0   5516  2400 ?        Ss   13:32   0:00 gpg-agent --daemon
> test         5  0.0  0.0   5224  3852 ?        R+   13:32   0:00 ps aux
> test <at> xx ~ [env]$ rm -r /run/user/1000/gnupg
> gpg-agent[3]: socket file has been removed - shutting down
> gpg-agent[3]: gpg-agent (GnuPG) 2.4.7 stopped
> test <at> xx ~ [env]$ find /run/user
> /run/user
> /run/user/1000
> test <at> xx ~ [env]$ ps aux
> USER       PID %CPU %MEM    VSZ   RSS TTY      STAT START   TIME COMMAND
> test         1  0.0  0.0   5136  4068 ?        S    13:32   0:00 bash
> test         8  0.0  0.0   5224  3776 ?        R+   13:33   0:00 ps aux
> --8<---------------cut here---------------end--------------->8---
>
> So my suggestion is that when the socket is deleted, the shepherd
> process stops itself.

Brilliant!

The only downside is that it’ll be a bit of work (using inotify on Linux
and some other method elsewhere, presumably polling) but it definitely
sounds like a good plan.

I can look into it later if nobody beats me to it.

Ludo’.




Information forwarded to bug-guix <at> gnu.org:
bug#74912; Package guix. (Mon, 16 Jun 2025 22:54:06 GMT) Full text and rfc822 format available.

Message #75 received at 74912 <at> debbugs.gnu.org (full text, mbox):

From: Danny Milosavljevic <dannym <at> friendly-machines.com>
To: Ludovic Courtès <ludo <at> gnu.org>
Cc: 74912 <at> debbugs.gnu.org, 76998 <at> debbugs.gnu.org, Tomas Volf <~@wolfsden.cz>,
 76998-done <at> debbugs.gnu.org, Jake <jforst.mailman <at> gmail.com>,
 Daniel Littlewood <dan <at> danielittlewood.xyz>
Subject: Re: bug#76998: Guix Home leaves user shepherd on logout, starts new
 instance on login
Date: Tue, 17 Jun 2025 00:53:35 +0200
Hi Ludo,

First, I apologize for the tone back that day.  Sorry!

Hmm, I think using inotify is nice, but I don't think it's guaranteed to block the deletion of the socket (and thus the dismantling of the entire user scope) on a notification.  That doesn't have to be bad--but it means that whatever stopping of user services shepherd does happens in parallel to (and possibly later than!) the cleanup elogind and/or systemd does.  elogind/systemd could totally (and probably do!) remove the floor from underneath shepherd while it's still busy stopping services (and syncing user service stuff to disk etc).

I don't see obvious actionable downsides, but some potential for stuff going wrong.

Frankly, this is some really weird patchwork GNU/Linux is doing there.  There's a reason systemd is the service manager integrating almost everything slightly (in different cooperating processes, though) and it's not just because they are weird.

But I'd say for now we can totally do the inotify thing and handle problems as they arise.

That said, patching elogind (for example just the herd stop root thing) would be easy enough too--but wouldn't help on other distros.

Maybe we can future-proof shepherd a bit (if we don't already): If we wanted to change the mechanism later (to elogind, or to whatever), it could get hairy to decide who's responsible for stopping shepherd now.  So we should definitely figure out if we are already stopping shepherd--and not begin stopping shepherd again while we are busy stopping shepherd :)
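
One way to picture such a guard, sketched in Python rather than Guile: a small latch that lets the first stop request through and rejects later ones, whatever their origin.  The class name, method name, and the `source` labels are hypothetical.

```python
import threading

class StopOnce:
    """Run a shutdown routine at most once, however many triggers fire."""

    def __init__(self, do_stop):
        self._do_stop = do_stop
        self._lock = threading.Lock()
        self._stopping = False

    def request_stop(self, source):
        """Start stopping unless a stop is already in progress.

        Return True if this call started the shutdown, False otherwise.
        """
        with self._lock:
            if self._stopping:
                return False
            self._stopping = True
        # Run outside the lock so a slow shutdown does not block other callers.
        self._do_stop(source)
        return True
```

With this, a socket-deletion watcher and a hypothetical elogind hook could both call `request_stop` without triggering two overlapping shutdowns.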

As for gnupg, it's a lot easier to stop gnupg since it isn't responsible for stopping like half the machine in an orderly fashion, than it is shepherd. :)




Information forwarded to bug-guix <at> gnu.org:
bug#74912; Package guix. (Tue, 17 Jun 2025 08:16:02 GMT) Full text and rfc822 format available.

Message #78 received at 74912 <at> debbugs.gnu.org (full text, mbox):

From: Ludovic Courtès <ludo <at> gnu.org>
To: Danny Milosavljevic <dannym <at> friendly-machines.com>
Cc: 74912 <at> debbugs.gnu.org, 76998 <at> debbugs.gnu.org, Tomas Volf <~@wolfsden.cz>,
 76998-done <at> debbugs.gnu.org, Jake <jforst.mailman <at> gmail.com>,
 Daniel Littlewood <dan <at> danielittlewood.xyz>
Subject: Re: bug#76998: Guix Home leaves user shepherd on logout, starts new
 instance on login
Date: Tue, 17 Jun 2025 10:14:35 +0200
Hi Danny,

We can do both:

  1. Have ‘shepherd’ stop itself if its socket is removed (for user
     shepherd), or recreate the socket (for PID 1).

  2. Change elogind to potentially allow user shepherd to outlive user
     sessions.

From the Shepherd’s viewpoint, #1 seems to be the safe thing to do.
From the Guix Home viewpoint, #2 would be nice.
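
To make #1 concrete, here is a small Python sketch of the decision (the real thing would be Guile).  The function and parameter names are made up, and stopping when the socket cannot be re-created is an assumption of this sketch, not settled behaviour.

```python
def on_socket_deletion(is_pid1, reopen_socket, stop):
    """Decide what to do once the control socket has been removed.

    A user shepherd stops itself; PID 1 tries to re-create its socket.
    REOPEN_SOCKET and STOP are caller-supplied callables.
    """
    if not is_pid1:
        stop()
        return "stopped"
    try:
        reopen_socket()
        return "reopened"
    except OSError:
        # Assumption: without a control socket PID 1 cannot be managed,
        # so this sketch falls back to an orderly stop.
        stop()
        return "stopped"
```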

How does that sound?

Thanks,
Ludo’.




Did not alter fixed versions and reopened. Request was from Debbugs Internal Request <help-debbugs <at> gnu.org> to internal_control <at> debbugs.gnu.org. (Tue, 17 Jun 2025 09:55:02 GMT) Full text and rfc822 format available.

Information forwarded to bug-guix <at> gnu.org:
bug#74912; Package guix. (Sun, 29 Jun 2025 22:39:02 GMT) Full text and rfc822 format available.

Message #83 received at 74912 <at> debbugs.gnu.org (full text, mbox):

From: Ludovic Courtès <ludo <at> gnu.org>
To: Tomas Volf <~@wolfsden.cz>
Cc: 74912 <at> debbugs.gnu.org, Jake <jforst.mailman <at> gmail.com>,
 76998 <at> debbugs.gnu.org, Danny Milosavljevic <dannym <at> friendly-machines.com>,
 Daniel Littlewood <dan <at> danielittlewood.xyz>
Subject: Re: bug#76998: Guix Home leaves user shepherd on logout, starts new
 instance on login
Date: Sun, 29 Jun 2025 23:46:41 +0200
Hi Tomas,

Tomas Volf <~@wolfsden.cz> writes:

> So my suggestion is that when the socket is deleted, the shepherd
> process stops itself.

I posted a patch that does exactly that:

  https://codeberg.org/shepherd/shepherd/pulls/14

Let me know what you think!

Ludo’.




Information forwarded to bug-guix <at> gnu.org:
bug#74912; Package guix. (Mon, 30 Jun 2025 18:17:02 GMT) Full text and rfc822 format available.

Message #86 received at 74912 <at> debbugs.gnu.org (full text, mbox):

From: Tomas Volf <~@wolfsden.cz>
To: Ludovic Courtès <ludo <at> gnu.org>
Cc: 74912 <at> debbugs.gnu.org, Jake <jforst.mailman <at> gmail.com>,
 76998 <at> debbugs.gnu.org, Danny Milosavljevic <dannym <at> friendly-machines.com>,
 Daniel Littlewood <dan <at> danielittlewood.xyz>
Subject: Re: bug#76998: Guix Home leaves user shepherd on logout, starts new
 instance on login
Date: Mon, 30 Jun 2025 20:16:47 +0200
Hello :)

Thanks a lot for putting this together!

Ludovic Courtès <ludo <at> gnu.org> writes:

> Hi Tomas,
>
> Tomas Volf <~@wolfsden.cz> writes:
>
>> So my suggestion is that when the socket is deleted, the shepherd
>> process stops itself.
>
> I posted a patch that does exactly that:
>
>   https://codeberg.org/shepherd/shepherd/pulls/14
>
> Let me know what you think!

I wanted to try it, but did not figure out how to get the source code,
for some reason my git does not see the civodul/monitor-socket-deletion
branch:

--8<---------------cut here---------------start------------->8---
~/src/shepherd $ git remote -v
origin	https://codeberg.org/shepherd/shepherd.git (fetch)
origin	https://codeberg.org/shepherd/shepherd.git (push)
~/src/shepherd $ git fetch
~/src/shepherd $ git branch -r
  origin/HEAD -> origin/main
  origin/keyring
  origin/main
  origin/wip-goblinsify
~/src/shepherd $ git checkout civodul/monitor-socket-deletion
error: pathspec 'civodul/monitor-socket-deletion' did not match any file(s) known to git  
--8<---------------cut here---------------end--------------->8---

So all I could do is read it in the browser, without being able to test
the code locally.

It seems nice and there is only one bug I have noticed, so the list
below is mostly just a few suggestions and/or observations.

In configure.ac, I wonder whether you could use `action-if-fails'
argument of AC_COMPUTE_INT instead of the separate `test -z ...' block.
But not sure, maybe the current approach is more readable.  In the
comment, you have `Inotify', but it seems that it should always be
spelled in lower case (as in `inotify').  At least that is what
Wikipedia does, even at the start of a sentence.

In the shepherd.texi file, the `If this' on a line by itself looks a bit
weird, but maybe you did it this way intentionally to minimize the diff?

In the Note, you use `it' often and it is not always obvious what it
refers to (without knowing the context).  Maybe ending the first
sentence with `to control @command{shepherd}' would be a bit cleaner?
Also, if I read the code correctly, even the PID 1 shuts down if the
attempt to re-open the socket fails; that is not mentioned here.

In `socket-monitor', I admit I did not figure out when the
wait-for-file-deletion returns #f, it seems to me that when it returns,
it returns #t?  So I am unsure what the point of the `when' is instead
of just (wait-for-file-deletion socket-file) (on-deletion) (loop).

In `on-socket-deletion' (the PID 1 branch), I have to admit I am not
sure what exactly happens when you call (stop-service root-service).  Is
that a clean shutdown equivalent to running the `shutdown' command?  Since
simply stopping the PID 1 leads to a kernel panic, I want to make sure
that is not what we are doing here.  I am unsure whether stopping the
system is the right thing to do (even if we fail to re-open the socket),
but at least it should be graceful.

For the system.scm.in, I do not have much experience with inotify, so
cannot comment much here.  However I believe you have a race condition
in `wait-for-file-deletion'.  You are checking via (file-exists? file),
but you have no guarantee that the file on the disk matches the file you
have open as a socket.  I think you should use `stat' and check whether
`stat:dev' and `stat:ino' match what you expect them to be.
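
To illustrate the check I have in mind, here is a sketch in Python (the patch itself is Guile, where `stat:dev' and `stat:ino' play the role of `st_dev' and `st_ino' below).  The helper names are hypothetical.

```python
import os

def file_identity(path):
    """Return the (device, inode) pair identifying the file at PATH."""
    st = os.lstat(path)
    return (st.st_dev, st.st_ino)

def is_same_file(path, identity):
    """True if PATH still names the file recorded in IDENTITY.

    A missing file, or a different file now sitting at the same path,
    both count as "the original socket is gone".
    """
    try:
        return file_identity(path) == identity
    except FileNotFoundError:
        return False
```

The identity would be recorded right after the socket is bound and compared on every check.  Inode numbers can be reused once a file is deleted, so this narrows the race rather than closing it completely.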

One more observation (if I read the code right) is that if you fail to
set up the inotify watch, you will close the socket and immediately
return #t.  Which will cause the shepherd to spin forever in a tight
loop opening and closing the socket.  Did I miss anything?  I am not
sure what is the correct approach here.  Maybe, if the inotify setup
fails, we should fall back to the polling approach from GNU/Hurd?
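
Roughly what I mean, sketched in Python since I could not run the Guile code.  The `watch' argument stands in for an inotify-based waiter and is purely hypothetical, as are the other names and the interval.

```python
import os
import time

def wait_for_deletion(path, watch=None, poll_interval=0.5):
    """Block until PATH is gone; return which mechanism noticed it."""
    if watch is not None:
        try:
            watch(path)  # hypothetical blocking, event-based waiter
            if not os.path.lexists(path):
                return "watch"
        except OSError:
            pass  # setting up the watch failed: fall through to polling
    # Fallback: poll, sleeping between checks, so that a failed watch
    # never degenerates into a busy loop.
    while os.path.lexists(path):
        time.sleep(poll_interval)
    return "poll"
```

The important property is that a failure to set up the watch results in a sleep between checks instead of an immediate return.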

Tomas

-- 
There are only two hard things in Computer Science:
cache invalidation, naming things and off-by-one errors.




Information forwarded to bug-guix <at> gnu.org:
bug#74912; Package guix. (Sat, 12 Jul 2025 16:56:02 GMT) Full text and rfc822 format available.

Message #89 received at 74912 <at> debbugs.gnu.org (full text, mbox):

From: Jake <jforst.mailman <at> gmail.com>
To: Tomas Volf <~@wolfsden.cz>
Cc: 74912 <at> debbugs.gnu.org, Daniel Littlewood <dan <at> danielittlewood.xyz>,
 Ludovic Courtès <ludo <at> gnu.org>, 76998 <at> debbugs.gnu.org,
 Danny Milosavljevic <dannym <at> friendly-machines.com>
Subject: Re: bug#76998: Guix Home leaves user shepherd on logout, starts new
 instance on login
Date: Sat, 12 Jul 2025 16:55:01 +0000
This seems fixed for me in 1.0.6.  Thanks all!





bug closed, send any further explanations to 76998 <at> debbugs.gnu.org and dannym <at> friendly-machines.com Request was from Ludovic Courtès <ludo <at> gnu.org> to control <at> debbugs.gnu.org. (Sun, 13 Jul 2025 14:42:03 GMT) Full text and rfc822 format available.

This bug report was last modified 27 days ago.



GNU bug tracking system
Copyright (C) 1999 Darren O. Benham, 1997,2003 nCipher Corporation Ltd, 1994-97 Ian Jackson.