GNU bug report logs - #76998
Guix Home leaves user shepherd on logout, starts new instance on login

Package: guix;

Reported by: dannym <at> friendly-machines.com

Date: Thu, 13 Mar 2025 19:11:02 UTC

Severity: important

Merged with 67863, 74912

Done: Ludovic Courtès <ludo <at> gnu.org>

To reply to this bug, email your comments to 76998 AT debbugs.gnu.org.
There is no need to reopen the bug first.


Report forwarded to bug-guix <at> gnu.org:
bug#76998; Package guix. (Thu, 13 Mar 2025 19:11:02 GMT) Full text and rfc822 format available.

Acknowledgement sent to dannym <at> friendly-machines.com:
New bug report received and forwarded. Copy sent to bug-guix <at> gnu.org. (Thu, 13 Mar 2025 19:11:02 GMT) Full text and rfc822 format available.

Message #5 received at submit <at> debbugs.gnu.org (full text, mbox):

From: dannym <at> friendly-machines.com
To: Bug Guix <bug-guix <at> gnu.org>
Subject: user shepherd stays around with some zombies
Date: Thu, 13 Mar 2025 20:10:36 +0100
Steps to reproduce:

1. Log into the console using your regular user
2. Log into GUI using your regular user
3. Log out of GUI
4. Switch to logged-in console
5. Run "px --tree" there
6. Observe the following:

shepherd(1)
  accounts-daemon(1110)
  avahi-daemon:(2443)
    avahi-daemon:(2446)
  bluetoothd(1026)
  colord(25587)
  cupsd(2440)
  dbus-daemon(769)
  dnsmasq(1845)
    dnsmasq(1846)
  earlyoom(744)
  elogind(1024)
  gdm(1038)
  guix-daemon(740)
  libvirtd(1023)
  login(26536)
    -bash(6739)
  mcron(747)
  mingetty... (5×)
  ModemManager(1276)
  NetworkManager(1256)
  nginx:(797)
    nginx:(798)
  nscd(2177)
  polkitd(1231)
  postgres(852)
    postgres:... (6×)
  rasdaemon(796)
  rpc.idmapd(2447)
  rpc.mountd(2501)
  rpc.statd(2444)
  rpcbind(2441)
  shepherd(6395) <--- also dannym
    [dbus-daemon](6397)
    [ssh-agent](6444)
    [xdg-permission-](6411)
    wireplumber(6399)
  shepherd(26114) <--- dannym
    dbus-daemon(6881)
    pipewire(6882)
    pipewire-pulse(6883)
    ssh-agent(6880)
    wireplumber(6888)
    xdg-permission-store(7259)
  udevd(330)
  upowerd(1025)
  virtlogd(742)
  wpa_supplicant(1045)

Those "[...]" with brackets mean that these processes were not reaped
(so they are defunct).
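
A minimal sketch for listing such defunct entries without a process-tree tool, assuming a Linux /proc filesystem. The helper names `parse_stat` and `zombies` are made up for illustration; state `Z` in `/proc/<pid>/stat` marks a process that has exited but has not yet been waited for by its parent.

```python
import os


def parse_stat(text):
    """Parse the contents of /proc/<pid>/stat into (pid, comm, state, ppid)."""
    # comm is wrapped in parentheses and may itself contain spaces or ')',
    # so split on the last ')' instead of on whitespace.
    left, right = text.rsplit(")", 1)
    pid_str, comm = left.split("(", 1)
    fields = right.split()
    return int(pid_str), comm, fields[0], int(fields[1])


def zombies(proc="/proc"):
    """Yield (pid, comm, ppid) for every process currently in state Z."""
    try:
        entries = os.listdir(proc)
    except OSError:
        return  # no /proc here (not Linux, or not mounted)
    for entry in entries:
        if not entry.isdigit():
            continue
        try:
            with open(os.path.join(proc, entry, "stat")) as f:
                pid, comm, state, ppid = parse_stat(f.read())
        except (OSError, ValueError):
            continue  # the process went away while we were scanning
        if state == "Z":
            yield pid, comm, ppid


if __name__ == "__main__":
    for pid, comm, ppid in zombies():
        print(f"{comm}({pid}) is defunct; parent {ppid} has not reaped it")
```

The parent PID printed for each entry shows which shepherd instance is failing to reap its children.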

What the hell?

$ guix describe
Generation 194	Mar 13 2025 19:11:33	(current)
  guix 678b3dd
    repository URL: https://git.savannah.gnu.org/git/guix.git
    branch: master
    commit: 678b3dddfe442e643fe5cff7730d4f9690c3e2c2




Information forwarded to bug-guix <at> gnu.org:
bug#76998; Package guix. (Thu, 13 Mar 2025 22:50:02 GMT) Full text and rfc822 format available.

Message #8 received at 76998 <at> debbugs.gnu.org (full text, mbox):

From: Jake <jforst.mailman <at> gmail.com>
To: dannym <at> friendly-machines.com
Cc: 76998 <at> debbugs.gnu.org
Subject: Re: bug#76998: user shepherd stays around with some zombies
Date: Fri, 14 Mar 2025 09:18:49 +1030
Is this the same as
https://issues.guix.gnu.org/74912
?

Jake


Severity set to 'important' from 'normal' Request was from Ludovic Courtès <ludo <at> gnu.org> to control <at> debbugs.gnu.org. (Sat, 15 Mar 2025 10:57:02 GMT) Full text and rfc822 format available.

Merged 67863 74912 76998. Request was from Ludovic Courtès <ludo <at> gnu.org> to control <at> debbugs.gnu.org. (Sat, 15 Mar 2025 10:58:03 GMT) Full text and rfc822 format available.

Changed bug title to 'Guix Home leaves user shepherd on logout, starts new instance on login' from 'user shepherd stays around with some zombies' Request was from Ludovic Courtès <ludo <at> gnu.org> to control <at> debbugs.gnu.org. (Mon, 17 Mar 2025 19:38:04 GMT) Full text and rfc822 format available.

Reply sent to Ludovic Courtès <ludo <at> gnu.org>:
You have taken responsibility. (Wed, 14 May 2025 17:05:07 GMT) Full text and rfc822 format available.

Notification sent to dannym <at> friendly-machines.com:
bug acknowledged by developer. (Wed, 14 May 2025 17:05:07 GMT) Full text and rfc822 format available.

Message #19 received at 76998-done <at> debbugs.gnu.org (full text, mbox):

From: Ludovic Courtès <ludo <at> gnu.org>
To: Danny Milosavljevic <dannym <at> friendly-machines.com>
Cc: Jake <jforst.mailman <at> gmail.com>, 74912 <at> debbugs.gnu.org,
 Tomas Volf <~@wolfsden.cz>, 76998-done <at> debbugs.gnu.org,
 Daniel Littlewood <dan <at> danielittlewood.xyz>
Subject: Re: bug#76998: Guix Home leaves user shepherd on logout, starts new
 instance on login
Date: Wed, 14 May 2025 18:06:11 +0200
Hi,

Ludovic Courtès <ludo <at> gnu.org> writes:

> So shepherd will now refuse to start when it determines that an instance
> is already listening on its socket:
>
>   https://git.savannah.gnu.org/cgit/shepherd.git/commit/?id=787d5a33aea061b5052faa0863c96be722440ce3

This commit is in 1.0.4.  Closing!
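
A rough sketch of the general idea (connect to the socket to see whether anyone is listening). This is not the code from the commit above, which is written in Guile and handles more cases; the function name `socket_is_live` is hypothetical.

```python
import socket


def socket_is_live(path):
    """Return True if some process accepts connections on the Unix socket at PATH."""
    s = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
    try:
        s.connect(path)
        return True
    except (FileNotFoundError, ConnectionRefusedError):
        # Either there is no socket file at all, or the file is stale:
        # it exists on disk but no process is listening on it anymore.
        return False
    finally:
        s.close()
```

A socket file left behind by a crashed instance yields "connection refused", so a check of this kind can tell a crashed instance from a running one and still allow a restart in the first case.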

Ludo’.




Did not alter fixed versions and reopened. Request was from Debbugs Internal Request <help-debbugs <at> gnu.org> to internal_control <at> debbugs.gnu.org. (Sat, 17 May 2025 15:34:03 GMT) Full text and rfc822 format available.

Information forwarded to bug-guix <at> gnu.org:
bug#76998; Package guix. (Sun, 18 May 2025 12:32:02 GMT) Full text and rfc822 format available.

Message #34 received at 76998 <at> debbugs.gnu.org (full text, mbox):

From: Danny Milosavljevic <dannym <at> friendly-machines.com>
To: Ludovic Courtès <ludo <at> gnu.org>
Cc: 74912 <at> debbugs.gnu.org, 76998 <at> debbugs.gnu.org, Tomas Volf <~@wolfsden.cz>,
 76998-done <at> debbugs.gnu.org, Jake <jforst.mailman <at> gmail.com>,
 Daniel Littlewood <dan <at> danielittlewood.xyz>
Subject: Re: bug#74912: bug#76998: Guix Home leaves user shepherd on logout,
 starts new instance on login
Date: Sun, 18 May 2025 14:30:49 +0200
Hi Ludo,

That is not a fix.  It's a workaround for now.

It's good that the "is a shepherd already running" check is back in shepherd.  It was in shepherd years ago, then got removed without explanation, then now it's back again (now in a very convoluted but safer way).  This shouldn't have been removed in the first place.  It's EXTREMELY dangerous to have multiple parallel shepherds for the same user (automated backup service destroying backups etc).  Please, let's not remove it ever again.

In any case, what shepherd 1.0.4 does is stop the bleeding, but not fix the problem:
It prevents two (or 100) user shepherds for the same user from running in parallel.
It does not stop shepherd when a user closed all their sessions.

Why close this bug report before elogind is patched and before ~/.bash_logout is generated in guix home?  That makes no sense.

Also, I don't understand why this is so broken for so long.  Isn't Guix used in HPC?
Doesn't HPC need support for multiple sessions for the same user on day one?

My untested elogind patch that invokes shepherd root stop is attached.  Reading the elogind source code, especially what they patched out and what they added themselves, makes me despair.  Why is it so terrible?  That all used to be fine! :P

Even my patch is not great.  A service manager's job is to manage services.  PID 1 is the main service manager.  It should manage services.  One of those services should be the user's shepherd, which should be managed by PID 1 shepherd and not weirdly attached to an already-running session (WTF!) of the user by this:

~$ cat ~/.profile
HOME_ENVIRONMENT=$HOME/.guix-home
. $HOME_ENVIRONMENT/setup-environment
$HOME_ENVIRONMENT/on-first-login
unset HOME_ENVIRONMENT

In my opinion, no one but the service manager should manage services.  Does ~/.profile look like a service manager?  No :P

I understand that we want to support this on non-guix-system stuff.  But the default should be a systemd user service to run the user shepherd.  If the user absolutely wants to do a workaround like ~/.profile above, fine, they can.  But let's not do that by default.

The problems with my elogind patch are the following:
- What if "herd stop root -s ..." hangs?  Then elogind hangs forever?  No one can log in or out anymore?  That's not okay.  Therefore, I don't wait.  Now user processes can have the floor they are walking on removed on user stop, while they still need it :P
- When can /run/user/1000 be deleted?  There's a weird GC mechanism in elogind for that, and my patch says it can be deleted before waiting on the result of herd stop (see above why).  If I DID wait on the result of herd stop, I could wait indefinitely--which is not okay.  I think elogind uses signalfd, so I can't waitpid in a random spot either, or wait until waitpid returned.  I think the user shepherd knows when to delete /run/user/1000--and no one else.  But if user shepherd crashes, it won't delete /run/user/1000 and we want it to be able to start again even when /run/user/1000 is still there.  Hence the complicated shepherd fix in 1.0.4 is useful.
- There are tool_fork_pid and sleep_fork_pid in elogind, which are not a queue.  And, again, that is trying to be a service manager.  What if those scripts hang?  What if they DON'T hang?  Similar questions as before.  Separate the concerns already :P
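
The trade-off in the first two points can be pictured as a bounded wait: give the stop command a deadline and carry on either way. This is only an illustration in Python, not elogind code (elogind is written in C and has its own event loop); `run_bounded` and the 30-second figure in the comment are assumptions.

```python
import subprocess


def run_bounded(argv, timeout):
    """Run ARGV and wait at most TIMEOUT seconds for it.

    Return the exit status, or None if the deadline passed.  On timeout,
    subprocess.run() kills the child before raising TimeoutExpired.
    """
    try:
        return subprocess.run(argv, timeout=timeout).returncode
    except subprocess.TimeoutExpired:
        return None


# Hypothetical use from a logout hook:
#   status = run_bounded(["herd", "stop", "root"], timeout=30.0)
#   A result of None means the stop did not finish in time; the caller
#   then has to decide whether cleaning up /run/user/UID is still safe.
```

A deadline avoids the "hangs forever" case, but it does not answer who may delete the runtime directory when the deadline is missed.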

Personally, I'd also like something that, if all sessions of user x are closed, it kills all remaining processes of that effective user id.  elogind has a setting KillUserProcesses that--despite the name--kills (WHICH!?) processes when a SESSION (of 42 sessions of that user :P) is closed.  Who wants THAT?  And even if someone does: how would THAT be implemented?

elogind is like containers never happened.  It's so weird.

I think to fix this problem for good, first there needs to be a system diagram created on how this is supposed to work.

[ELOGIND.patch (text/x-patch, attachment)]

Reply sent to Danny Milosavljevic <dannym <at> friendly-machines.com>:
You have taken responsibility. (Sun, 18 May 2025 12:32:03 GMT) Full text and rfc822 format available.

Notification sent to dannym <at> friendly-machines.com:
bug acknowledged by developer. (Sun, 18 May 2025 12:32:03 GMT) Full text and rfc822 format available.

Did not alter fixed versions and reopened. Request was from Debbugs Internal Request <help-debbugs <at> gnu.org> to internal_control <at> debbugs.gnu.org. (Wed, 04 Jun 2025 16:26:05 GMT) Full text and rfc822 format available.

Information forwarded to bug-guix <at> gnu.org:
bug#76998; Package guix. (Sat, 14 Jun 2025 21:28:02 GMT) Full text and rfc822 format available.

Message #54 received at 76998 <at> debbugs.gnu.org (full text, mbox):

From: Ludovic Courtès <ludo <at> gnu.org>
To: Danny Milosavljevic <dannym <at> friendly-machines.com>
Cc: 74912 <at> debbugs.gnu.org, 76998 <at> debbugs.gnu.org, Tomas Volf <~@wolfsden.cz>,
 76998-done <at> debbugs.gnu.org, Jake <jforst.mailman <at> gmail.com>,
 Daniel Littlewood <dan <at> danielittlewood.xyz>
Subject: Re: bug#76998: Guix Home leaves user shepherd on logout, starts new
 instance on login
Date: Sat, 14 Jun 2025 23:26:53 +0200
Hi Danny,

Danny Milosavljevic <dannym <at> friendly-machines.com> writes:

> It's good that the "is a shepherd already running" check is back in
> shepherd.  It was in shepherd years ago, then got removed without
> explanation, then now it's back again (now in a very convoluted but
> safer way).  This shouldn't have been removed in the first place.
> It's EXTREMELY dangerous to have multiple parallel shepherds for the
> same user (automated backup service destroying backups etc).  Please,
> let's not remove it ever again.

If you’re referring to 649a98a6697d358a53eccc45b387e5130278b5ec (8 years
ago), I believe it wasn’t doing just that due to lingering.

(Aside: I think the tone of this paragraph is uncalled for.)

> In any case, what shepherd 1.0.4 does is stop the bleeding, but not fix the problem:
> It prevents two (or 100) user shepherds for the same user from running in parallel.
> It does not stop shepherd when a user closed all their sessions.

Yes.  It just occurred to me that we probably just got it wrong from the
start: ‘XDG_RUNTIME_DIR’ (/run/user/$UID) is specified as having limited
lifetime.  Quoth
<https://specifications.freedesktop.org/basedir-spec/latest/>:

  The lifetime of the directory MUST be bound to the user being logged
  in.  It MUST be created when the user first logs in and if the user
  fully logs out the directory MUST be removed.

So it was probably a bad idea in the first place for shepherd to store
its socket in /run/user/$UID (even more so that this directory doesn’t
exist on systems without elogind/systemd).  GnuPG avoids
$XDG_RUNTIME_DIR for exactly this reason (there’s a comment in
‘homedir.c’).

So, what can we do?

In the Shepherd 1.1, we could default to $XDG_STATE_HOME instead; we
probably shouldn’t change that in 1.0.x.

Any other idea?

Thanks,
Ludo’.




Reply sent to Ludovic Courtès <ludo <at> gnu.org>:
You have taken responsibility. (Sat, 14 Jun 2025 21:28:03 GMT) Full text and rfc822 format available.

Notification sent to dannym <at> friendly-machines.com:
bug acknowledged by developer. (Sat, 14 Jun 2025 21:28:03 GMT) Full text and rfc822 format available.

Did not alter fixed versions and reopened. Request was from Debbugs Internal Request <help-debbugs <at> gnu.org> to internal_control <at> debbugs.gnu.org. (Sat, 14 Jun 2025 22:38:06 GMT) Full text and rfc822 format available.

Information forwarded to bug-guix <at> gnu.org:
bug#76998; Package guix. (Sun, 15 Jun 2025 13:41:02 GMT) Full text and rfc822 format available.

Message #74 received at 76998 <at> debbugs.gnu.org (full text, mbox):

From: Tomas Volf <~@wolfsden.cz>
To: Ludovic Courtès <ludo <at> gnu.org>
Cc: 74912 <at> debbugs.gnu.org, 76998 <at> debbugs.gnu.org, 76998-done <at> debbugs.gnu.org,
 Jake <jforst.mailman <at> gmail.com>, Daniel Littlewood <dan <at> danielittlewood.xyz>,
 Danny Milosavljevic <dannym <at> friendly-machines.com>
Subject: Re: bug#76998: Guix Home leaves user shepherd on logout, starts new
 instance on login
Date: Sun, 15 Jun 2025 15:40:33 +0200
Hello :)

Ludovic Courtès <ludo <at> gnu.org> writes:

>> In any case, what shepherd 1.0.4 does is stop the bleeding, but not fix the problem:
>> It prevents two (or 100) user shepherds for the same user from running in parallel.
>> It does not stop shepherd when a user closed all their sessions.
>
> Yes.  It just occurred to me that we probably just got it wrong from the
> start: ‘XDG_RUNTIME_DIR’ (/run/user/$UID) is specified as having limited
> lifetime.  Quoth
> <https://specifications.freedesktop.org/basedir-spec/latest/>:
>
>   The lifetime of the directory MUST be bound to the user being logged
>   in.  It MUST be created when the user first logs in and if the user
>   fully logs out the directory MUST be removed.
>
> So it was probably a bad idea in the first place for shepherd to store
> its socket in /run/user/$UID (even more so that this directory doesn’t
> exist on systems without elogind/systemd).  GnuPG avoids
> $XDG_RUNTIME_DIR for exactly this reason (there’s a comment in
> ‘homedir.c’).

Minor correction here.  Looking at the source code, GnuPG avoids the
XDG_RUNTIME_DIR environment variable, but it still tries to use the
/run/user/$UID directory, if it exists.
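
The lookup order described in this correction, sketched under assumptions: ignore the environment variable, use /run/user/$UID when that directory exists, otherwise fall back to some other directory. This is not GnuPG's code; `pick_socket_dir`, `fallback` and `run_base` are hypothetical names, and the choice of fallback location is left to the caller.

```python
import os


def pick_socket_dir(uid, fallback, run_base="/run/user"):
    """Return RUN_BASE/UID if that directory exists, else FALLBACK.

    The XDG_RUNTIME_DIR environment variable is deliberately not consulted.
    """
    candidate = os.path.join(run_base, str(uid))
    if os.path.isdir(candidate):
        return candidate
    return fallback


if __name__ == "__main__":
    # Fallback chosen only for the demo: the user's home directory.
    print(pick_socket_dir(os.getuid(), os.path.expanduser("~")))
```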

> So, what can we do?
>
> In the Shepherd 1.1, we could default to $XDG_STATE_HOME instead; we
> probably shouldn’t change that in 1.0.x.

Not sure here, the specification says the following about this location:

> The $XDG_STATE_HOME contains state data that *should persist between
> (application) restarts*, but that is not important or portable enough
> to the user that it should be stored in $XDG_DATA_HOME.

So... control socket does not seem to fit that description.

> Any other idea?

Well, since you have mentioned the GnuPG as an example, we could just
mirror what it does, and what I have suggested before.

--8<---------------cut here---------------start------------->8---
$ mkdir /tmp/xxx && cd /tmp/xxx
$ guix shell -u test -C findutils gnupg coreutils bash procps -- env HOME=/tmp/xxx GNUPGHOME=/tmp/xxx bash
test <at> xx ~ [env]$ gpg-agent --daemon
gpg-agent[2]: directory '/tmp/xxx/private-keys-v1.d' created
gpg-agent[3]: gpg-agent (GnuPG) 2.4.7 started
test <at> xx ~ [env]$ find /run/user
/run/user
/run/user/1000
/run/user/1000/gnupg
/run/user/1000/gnupg/d.j1yiifhhjrep9xunazyff54c
/run/user/1000/gnupg/d.j1yiifhhjrep9xunazyff54c/S.gpg-agent.ssh
/run/user/1000/gnupg/d.j1yiifhhjrep9xunazyff54c/S.gpg-agent.browser
/run/user/1000/gnupg/d.j1yiifhhjrep9xunazyff54c/S.gpg-agent.extra
/run/user/1000/gnupg/d.j1yiifhhjrep9xunazyff54c/S.gpg-agent
test <at> xx ~ [env]$ ps aux
USER       PID %CPU %MEM    VSZ   RSS TTY      STAT START   TIME COMMAND
test         1  0.0  0.0   5136  4068 ?        S    13:32   0:00 bash
test         3  0.0  0.0   5516  2400 ?        Ss   13:32   0:00 gpg-agent --daemon
test         5  0.0  0.0   5224  3852 ?        R+   13:32   0:00 ps aux
test <at> xx ~ [env]$ rm -r /run/user/1000/gnupg
gpg-agent[3]: socket file has been removed - shutting down
gpg-agent[3]: gpg-agent (GnuPG) 2.4.7 stopped
test <at> xx ~ [env]$ find /run/user
/run/user
/run/user/1000
test <at> xx ~ [env]$ ps aux
USER       PID %CPU %MEM    VSZ   RSS TTY      STAT START   TIME COMMAND
test         1  0.0  0.0   5136  4068 ?        S    13:32   0:00 bash
test         8  0.0  0.0   5224  3776 ?        R+   13:33   0:00 ps aux
--8<---------------cut here---------------end--------------->8---

So my suggestion is that when the socket is deleted, the shepherd
process stops itself.
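
A minimal polling sketch of that suggestion, assuming nothing about how shepherd would implement it: remember the identity (device and inode) of the socket file at startup and call a shutdown hook once the path is gone or points at a different file. `watch`, `socket_still_ours` and `on_removed` are hypothetical names.

```python
import os
import time


def socket_still_ours(path, identity):
    """Return True if PATH exists and is still the file we created."""
    try:
        st = os.stat(path)
    except FileNotFoundError:
        return False
    return (st.st_dev, st.st_ino) == identity


def watch(path, on_removed, interval=1.0, sleep=time.sleep):
    """Poll PATH every INTERVAL seconds; call ON_REMOVED once it is gone.

    Also triggers when the file was replaced by another one, since the
    (device, inode) pair recorded at startup no longer matches.
    """
    st = os.stat(path)
    identity = (st.st_dev, st.st_ino)
    while socket_still_ours(path, identity):
        sleep(interval)
    on_removed()
```

Polling is portable but reacts only after up to one interval; a notification mechanism provided by the kernel would react sooner at the cost of platform-specific code.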

Tomas

-- 
There are only two hard things in Computer Science:
cache invalidation, naming things and off-by-one errors.




Reply sent to Tomas Volf <~@wolfsden.cz>:
You have taken responsibility. (Sun, 15 Jun 2025 13:41:03 GMT) Full text and rfc822 format available.

Notification sent to dannym <at> friendly-machines.com:
bug acknowledged by developer. (Sun, 15 Jun 2025 13:41:03 GMT) Full text and rfc822 format available.

Did not alter fixed versions and reopened. Request was from Debbugs Internal Request <help-debbugs <at> gnu.org> to internal_control <at> debbugs.gnu.org. (Sun, 15 Jun 2025 13:48:02 GMT) Full text and rfc822 format available.

Reply sent to Ludovic Courtès <ludo <at> gnu.org>:
You have taken responsibility. (Mon, 16 Jun 2025 13:37:06 GMT) Full text and rfc822 format available.

Notification sent to dannym <at> friendly-machines.com:
bug acknowledged by developer. (Mon, 16 Jun 2025 13:37:06 GMT) Full text and rfc822 format available.

Message #96 received at 76998-done <at> debbugs.gnu.org (full text, mbox):

From: Ludovic Courtès <ludo <at> gnu.org>
To: Tomas Volf <~@wolfsden.cz>
Cc: 74912 <at> debbugs.gnu.org, 76998 <at> debbugs.gnu.org, 76998-done <at> debbugs.gnu.org,
 Jake <jforst.mailman <at> gmail.com>, Daniel Littlewood <dan <at> danielittlewood.xyz>,
 Danny Milosavljevic <dannym <at> friendly-machines.com>
Subject: Re: bug#76998: Guix Home leaves user shepherd on logout, starts new
 instance on login
Date: Mon, 16 Jun 2025 15:28:54 +0200
Hi,

Tomas Volf <~@wolfsden.cz> writes:

> Well, since you have mentioned the GnuPG as an example, we could just
> mirror what it does, and what I have suggested before.
>
> --8<---------------cut here---------------start------------->8---
> $ mkdir /tmp/xxx && cd /tmp/xxx
> $ guix shell -u test -C findutils gnupg coreutils bash procps -- env HOME=/tmp/xxx GNUPGHOME=/tmp/xxx bash
> test <at> xx ~ [env]$ gpg-agent --daemon
> gpg-agent[2]: directory '/tmp/xxx/private-keys-v1.d' created
> gpg-agent[3]: gpg-agent (GnuPG) 2.4.7 started
> test <at> xx ~ [env]$ find /run/user
> /run/user
> /run/user/1000
> /run/user/1000/gnupg
> /run/user/1000/gnupg/d.j1yiifhhjrep9xunazyff54c
> /run/user/1000/gnupg/d.j1yiifhhjrep9xunazyff54c/S.gpg-agent.ssh
> /run/user/1000/gnupg/d.j1yiifhhjrep9xunazyff54c/S.gpg-agent.browser
> /run/user/1000/gnupg/d.j1yiifhhjrep9xunazyff54c/S.gpg-agent.extra
> /run/user/1000/gnupg/d.j1yiifhhjrep9xunazyff54c/S.gpg-agent
> test <at> xx ~ [env]$ ps aux
> USER       PID %CPU %MEM    VSZ   RSS TTY      STAT START   TIME COMMAND
> test         1  0.0  0.0   5136  4068 ?        S    13:32   0:00 bash
> test         3  0.0  0.0   5516  2400 ?        Ss   13:32   0:00 gpg-agent --daemon
> test         5  0.0  0.0   5224  3852 ?        R+   13:32   0:00 ps aux
> test <at> xx ~ [env]$ rm -r /run/user/1000/gnupg
> gpg-agent[3]: socket file has been removed - shutting down
> gpg-agent[3]: gpg-agent (GnuPG) 2.4.7 stopped
> test <at> xx ~ [env]$ find /run/user
> /run/user
> /run/user/1000
> test <at> xx ~ [env]$ ps aux
> USER       PID %CPU %MEM    VSZ   RSS TTY      STAT START   TIME COMMAND
> test         1  0.0  0.0   5136  4068 ?        S    13:32   0:00 bash
> test         8  0.0  0.0   5224  3776 ?        R+   13:33   0:00 ps aux
> --8<---------------cut here---------------end--------------->8---
>
> So my suggestion is that when the socket is deleted, the shepherd
> process stops itself.

Brilliant!

The only downside is that it’ll be a bit of work (using inotify on Linux
and some other method elsewhere, presumably polling) but it definitely
sounds like a good plan.

I can look into it later if nobody beats me to it.

Ludo’.




Information forwarded to bug-guix <at> gnu.org:
bug#76998; Package guix. (Mon, 16 Jun 2025 13:37:09 GMT) Full text and rfc822 format available.

Information forwarded to bug-guix <at> gnu.org:
bug#76998; Package guix. (Mon, 16 Jun 2025 22:54:09 GMT) Full text and rfc822 format available.

Message #112 received at 76998-done <at> debbugs.gnu.org (full text, mbox):

From: Danny Milosavljevic <dannym <at> friendly-machines.com>
To: Ludovic Courtès <ludo <at> gnu.org>
Cc: 74912 <at> debbugs.gnu.org, 76998 <at> debbugs.gnu.org, Tomas Volf <~@wolfsden.cz>,
 76998-done <at> debbugs.gnu.org, Jake <jforst.mailman <at> gmail.com>,
 Daniel Littlewood <dan <at> danielittlewood.xyz>
Subject: Re: bug#76998: Guix Home leaves user shepherd on logout, starts new
 instance on login
Date: Tue, 17 Jun 2025 00:53:35 +0200
Hi Ludo,

First, I apologize for the tone back that day.  Sorry!

Hmm, I think using inotify is nice, but I don't think it's guaranteed to block the deletion of the socket (and thus the dismantling of the entire user scope) on a notification.  That doesn't have to be bad--but it means that whatever stopping of user services shepherd does happens in parallel to (and possibly later than!) the cleanup elogind and/or systemd does.  elogind/systemd could totally (and probably do!) remove the floor from underneath shepherd while it's still busy stopping services (and syncing user service stuff to disk etc).

I don't see obvious actionable downsides, but some potential for stuff going wrong.

Frankly, this is some really weird patchwork GNU/Linux is doing there.  There's a reason systemd is the service manager integrating almost everything slightly (in different cooperating processes, though) and it's not just because they are weird.

But I'd say for now we can totally do the inotify thing and handle problems as they arise.

That said, patching elogind (for example just the herd stop root thing) would be easy enough too--but wouldn't help on other distros.

Maybe we can future-proof shepherd a bit (if we don't already): If we wanted to change the mechanism later (to elogind, or to whatever), it could get hairy to decide who's responsible for stopping shepherd now.  So we should definitely figure out if we are already stopping shepherd--and not begin stopping shepherd again while we are busy stopping shepherd :)
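
One way to picture that guard, as a sketch only: a small object that lets the first stop request through and ignores later ones, whichever mechanism they come from. `StopOnce` and the source labels are made up for illustration; shepherd itself is written in Guile and may already do something equivalent.

```python
import threading


class StopOnce:
    """Run the stop procedure at most once, no matter who asks."""

    def __init__(self, do_stop):
        self._do_stop = do_stop
        self._lock = threading.Lock()
        self._started = False

    def request(self, source):
        """Start stopping on behalf of SOURCE; return False if already started."""
        with self._lock:
            if self._started:
                return False
            self._started = True
        # Run outside the lock so a slow shutdown does not block other
        # callers; they return False immediately instead of waiting.
        self._do_stop(source)
        return True
```

With a guard of this kind, a socket-removal watcher and a logout hook can both request a stop without the second request restarting the shutdown sequence.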

As for gnupg, it's a lot easier to stop gnupg than it is to stop shepherd, since gnupg isn't responsible for stopping like half the machine in an orderly fashion. :)




Information forwarded to bug-guix <at> gnu.org:
bug#76998; Package guix. (Mon, 16 Jun 2025 22:55:03 GMT) Full text and rfc822 format available.


GNU bug tracking system
Copyright (C) 1999 Darren O. Benham, 1997,2003 nCipher Corporation Ltd, 1994-97 Ian Jackson.