GNU bug report logs - #76998
Guix Home leaves user shepherd on logout, starts new instance on login

Package: guix

Reported by: dannym <at> friendly-machines.com

Date: Thu, 13 Mar 2025 19:11:02 UTC

Severity: important

Merged with 67863, 74912


From: help-debbugs <at> gnu.org (GNU bug Tracking System)
To: Jake <jforst.mailman <at> gmail.com>
Subject: bug#74912: closed (Re: bug#74912: bug#76998: Guix Home leaves
 user shepherd on logout, starts new instance on login)
Date: Sun, 18 May 2025 12:32:04 +0000
[Message part 1 (text/plain, inline)]
Your bug report

#76998: Guix Home leaves user shepherd on logout, starts new instance on login

which was filed against the guix package, has been closed.

The explanation is attached below, along with your original report.
If you require more details, please reply to 74912 <at> debbugs.gnu.org.

-- 
76998: https://debbugs.gnu.org/cgi/bugreport.cgi?bug=76998
GNU Bug Tracking System
Contact help-debbugs <at> gnu.org with problems
[Message part 2 (message/rfc822, inline)]
From: Danny Milosavljevic <dannym <at> friendly-machines.com>
To: Ludovic Courtès <ludo <at> gnu.org>
Cc: 74912 <at> debbugs.gnu.org, 76998 <at> debbugs.gnu.org, Tomas Volf <~@wolfsden.cz>,
 76998-done <at> debbugs.gnu.org, Jake <jforst.mailman <at> gmail.com>,
 Daniel Littlewood <dan <at> danielittlewood.xyz>
Subject: Re: bug#74912: bug#76998: Guix Home leaves user shepherd on logout,
 starts new instance on login
Date: Sun, 18 May 2025 14:30:49 +0200
[Message part 3 (text/plain, inline)]
Hi Ludo,

That is not a fix.  It's a workaround for now.

It's good that the "is a shepherd already running" check is back in shepherd.  It was in shepherd years ago, then got removed without explanation, and now it's back again (in a very convoluted but safer way).  It shouldn't have been removed in the first place.  It's EXTREMELY dangerous to have multiple parallel shepherds for the same user (for example, an automated backup service destroying backups).  Please, let's not remove it ever again.

In any case, what shepherd 1.0.4 does is stop the bleeding, but not fix the problem:
It prevents two (or 100) user shepherds for the same user from running in parallel.
It does not stop shepherd when a user has closed all their sessions.
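
To illustrate what a single-instance check provides, here is a minimal sketch using flock(1) from util-linux.  It is not how shepherd 1.0.4 implements its check; the function name run_single_instance, the lock path in the usage line, and exit code 75 are assumptions made for illustration.

```shell
#!/bin/sh
# Hypothetical single-instance guard; NOT shepherd's actual mechanism.

# Usage: run_single_instance LOCK_FILE COMMAND [ARGS...]
# Runs COMMAND only if no other process currently holds LOCK_FILE.
run_single_instance() {
    lock_file=$1
    shift
    (
        # -n: fail immediately instead of blocking if the lock is taken.
        # 75 is an arbitrary exit code chosen here to mean "already running".
        flock -n 9 || exit 75
        "$@"
    ) 9>"$lock_file"
    # The kernel releases the lock once every process holding fd 9 has
    # exited, so a crashed instance does not block the next start.
}
```

A call could look like `run_single_instance "$XDG_RUNTIME_DIR/example.lock" some-daemon`.  Note that child processes inherit fd 9 and keep the lock alive for as long as they run.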

Why close this bug report before elogind is patched and before ~/.bash_logout is generated in guix home?  That makes no sense.
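
As a rough idea of what a generated ~/.bash_logout could do, the following sketch stops the user shepherd when the session being closed is the last one.  It is an assumption-laden illustration, not what Guix Home generates: the function names are hypothetical, and counting sessions through who(1) assumes every session has a utmp entry.

```shell
#!/bin/sh
# Hypothetical helpers for a ~/.bash_logout; not generated by Guix Home.

# Count the entries of user $1 in `who`-style input read from stdin.
count_sessions() {
    awk -v user="$1" '$1 == user { n++ } END { print n + 0 }'
}

# Stop the user shepherd if no other session of this user remains.
# The session that is logging out is assumed to still be listed by who,
# hence the comparison against 1 rather than 0.
stop_shepherd_if_last_session() {
    sessions=$(who | count_sessions "$(id -un)")
    if [ "$sessions" -le 1 ]; then
        herd stop root
    fi
}
```

Calling stop_shepherd_if_last_session from ~/.bash_logout would cover login shells only.  Graphical sessions typically do not run ~/.bash_logout, so this addresses only part of the problem.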

Also, I don't understand why this is so broken for so long.  Isn't Guix used in HPC?
Doesn't HPC need support for multiple sessions for the same user on day one?

My untested elogind patch that invokes shepherd root stop is attached.  Reading the elogind source code, especially what they patched out and what they added themselves, makes me despair.  Why is it so terrible?  That all used to be fine! :P

Even my patch is not great.  A service manager's job is to manage services.  PID 1 is the main service manager.  It should manage services.  One of those services should be the user's shepherd, which should be managed by PID 1 shepherd and not weirdly attached to an already-running session (WTF!) of the user by this:

~$ cat ~/.profile
HOME_ENVIRONMENT=$HOME/.guix-home
. $HOME_ENVIRONMENT/setup-environment
$HOME_ENVIRONMENT/on-first-login
unset HOME_ENVIRONMENT

In my opinion, no one but the service manager should manage services.  Does ~/.profile look like a service manager?  No :P

I understand that we want to support this on non-guix-system stuff.  But the default should be a systemd user service to run the user shepherd.  If the user absolutely wants to do a workaround like ~/.profile above, fine, they can.  But let's not do that by default.

The problems with my elogind patch are the following:
- What if "herd stop root -s ..." hangs?  Then elogind hangs forever?  No one can log in or out anymore?  That's not okay.  Therefore, I don't wait.  As a result, user processes can have the floor they are walking on removed at user stop while they still need it :P
- When can /run/user/1000 be deleted?  There's a weird GC mechanism in elogind for that, and my patch says it can be deleted before waiting on the result of herd stop (see above for why).  If I DID wait on the result of herd stop, I could wait indefinitely, which is not okay.  I think elogind uses signalfd, so I can't call waitpid in a random spot either, or block until waitpid returns.  I think the user shepherd knows when to delete /run/user/1000, and no one else does.  But if the user shepherd crashes, it won't delete /run/user/1000, and we want it to be able to start again even when /run/user/1000 is still there.  Hence the complicated shepherd fix in 1.0.4 is useful.
- elogind has tool_fork_pid and sleep_fork_pid, which are not a queue.  And, again, that is trying to be a service manager.  What if those scripts hang?  What if they DON'T hang?  Similar questions as before.  Separate the concerns already :P
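
The first bullet describes a choice between waiting forever and not waiting at all.  One possible middle ground is to wait with a deadline.  The sketch below shows that idea in shell using timeout(1) from GNU coreutils; it is an illustration only and not part of the attached elogind patch.  The name stop_with_deadline and the 10-second value in the usage line are hypothetical.

```shell
#!/bin/sh
# Hypothetical bounded wait; illustrates the trade-off, not elogind code.

# Usage: stop_with_deadline SECONDS COMMAND [ARGS...]
# Runs COMMAND and gives up once SECONDS have elapsed.
stop_with_deadline() {
    deadline=$1
    shift
    status=0
    # GNU timeout sends SIGTERM to COMMAND at the deadline and then
    # exits with status 124.
    timeout "$deadline" "$@" || status=$?
    if [ "$status" -eq 124 ]; then
        echo "gave up after ${deadline}s: $*" >&2
    fi
    return "$status"
}
```

For example, `stop_with_deadline 10 herd stop root` would return after at most about 10 seconds.  A timeout here only ends the herd client; whether the user shepherd itself has stopped still has to be checked separately.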

Personally, I'd also like something that, once all sessions of user x are closed, kills all remaining processes of that effective user id.  elogind has a setting KillUserProcesses that, despite the name, kills (WHICH!?) processes when a SESSION (one of 42 sessions of that user :P) is closed.  Who wants THAT?  And even if someone does: how would THAT be implemented?

elogind is like containers never happened.  It's so weird.

I think that, to fix this problem for good, we first need a system diagram of how this is supposed to work.

[ELOGIND.patch (text/x-patch, attachment)]
[Message part 5 (message/rfc822, inline)]
From: Jake <jforst.mailman <at> gmail.com>
To: bug-guix <at> gnu.org
Cc: ludovic.courtes <at> inria.fr
Subject: Shepherd: Growing number of user shepherds when relogging
Date: Mon, 16 Dec 2024 14:23:20 +0000
[Message part 6 (text/plain, inline)]
Hi

I think I'm experiencing a bug in Shepherd since version 1.0.
Whenever I log out and log back in again, my user shepherd from the
previous login session is still present, and a new user shepherd spawns for
the current login session.
So relogging N times results in N+1 user shepherds.

For example, I have relogged 5 times since I last rebooted:

$ herd status root
Status of root:
  It is running since 00:30:02 (10 minutes ago).
  Main PID: 23450
  Command: /gnu/store/mfkz7fvlfpv3ppwbkv0imb19nrf95akf-guile-3.0.9/bin/guile --no-auto-compile /gnu/store/nl0w5c7pxxdczqiv4r9iq44al7nd5y5g-shepherd-1.0.0/bin/shepherd --silent --config /gnu/store/w3l6dmap815mm3qzx77xdazky853adda-shepherd.conf
...

$ pgrep shepherd
1
9891
10777
16417
18510
21960
23450

$ ps aux | grep shepherd
root         1  0.0  0.9 222872 74456 ?        Sl   Dec15   0:08 /gnu/store/mfkz7fvlfpv3ppwbkv0imb19nrf95akf-guile-3.0.9/bin/guile --no-auto-compile /gnu/store/nl0w5c7pxxdczqiv4r9iq44al7nd5y5g-shepherd-1.0.0/bin/shepherd --config /gnu/store/p7al8wd1inwk8f5di2q4llcpd64mjn5q-shepherd.conf
jake      9891  0.0  0.2  75816 23624 ?        Ss   Dec15   0:04 /gnu/store/mfkz7fvlfpv3ppwbkv0imb19nrf95akf-guile-3.0.9/bin/guile --no-auto-compile /gnu/store/nl0w5c7pxxdczqiv4r9iq44al7nd5y5g-shepherd-1.0.0/bin/shepherd --silent --config /gnu/store/w3l6dmap815mm3qzx77xdazky853adda-shepherd.conf
jake     10777  0.0  0.3  76224 24752 ?        Ss   Dec16   0:03 /gnu/store/mfkz7fvlfpv3ppwbkv0imb19nrf95akf-guile-3.0.9/bin/guile --no-auto-compile /gnu/store/nl0w5c7pxxdczqiv4r9iq44al7nd5y5g-shepherd-1.0.0/bin/shepherd --silent --config /gnu/store/w3l6dmap815mm3qzx77xdazky853adda-shepherd.conf
jake     16417  0.0  0.3  75752 24004 ?        Ss   Dec16   0:02 /gnu/store/mfkz7fvlfpv3ppwbkv0imb19nrf95akf-guile-3.0.9/bin/guile --no-auto-compile /gnu/store/nl0w5c7pxxdczqiv4r9iq44al7nd5y5g-shepherd-1.0.0/bin/shepherd --silent --config /gnu/store/w3l6dmap815mm3qzx77xdazky853adda-shepherd.conf
jake     18510  0.0  0.2  75752 23760 ?        Ss   Dec16   0:01 /gnu/store/mfkz7fvlfpv3ppwbkv0imb19nrf95akf-guile-3.0.9/bin/guile --no-auto-compile /gnu/store/nl0w5c7pxxdczqiv4r9iq44al7nd5y5g-shepherd-1.0.0/bin/shepherd --silent --config /gnu/store/w3l6dmap815mm3qzx77xdazky853adda-shepherd.conf
jake     21960  0.0  0.2 114608 22124 ?        Ss   Dec16   0:00 /gnu/store/mfkz7fvlfpv3ppwbkv0imb19nrf95akf-guile-3.0.9/bin/guile --no-auto-compile /gnu/store/nl0w5c7pxxdczqiv4r9iq44al7nd5y5g-shepherd-1.0.0/bin/shepherd --silent --config /gnu/store/w3l6dmap815mm3qzx77xdazky853adda-shepherd.conf
jake     23450  0.0  0.2 114204 21328 ?        Ss   00:30   0:00 /gnu/store/mfkz7fvlfpv3ppwbkv0imb19nrf95akf-guile-3.0.9/bin/guile --no-auto-compile /gnu/store/nl0w5c7pxxdczqiv4r9iq44al7nd5y5g-shepherd-1.0.0/bin/shepherd --silent --config /gnu/store/w3l6dmap815mm3qzx77xdazky853adda-shepherd.conf
jake     23672  0.0  0.0   6636  2552 pts/1    S+   00:32   0:00 grep --color=auto shepherd

In addition, any daemons managed by the zombie shepherds also persist!
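
For anyone who wants to check whether they are affected, a small sketch that counts shepherd processes per user may help.  The function name count_shepherds_by_user is hypothetical, and matching on "/bin/shepherd" in the command line is an assumption based on the listing above.

```shell
#!/bin/sh
# Hypothetical check for duplicate user shepherds.

# Reads "USER ARGS..." lines on stdin and prints "USER COUNT" for every
# user that runs at least one process whose command line contains a
# ".../bin/shepherd" argument.
count_shepherds_by_user() {
    awk '$0 ~ /\/bin\/shepherd( |$)/ { n[$1]++ }
         END { for (u in n) print u, n[u] }' | sort
}
```

It can be fed with `ps -eo user=,args= | count_shepherds_by_user`.  A count above 1 for a regular user would match the behaviour described in this report.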

I'm experiencing this on both of my Guix System machines. One is running
GDM and XFCE. The other is running GDM and CWM.
Please let me know if I can provide more information.

Thanks
Jake
