GNU bug report logs -
#78355
guix-ownership inconsistent state
Report forwarded to ludo <at> gnu.org, bug-guix <at> gnu.org: bug#78355; Package guix. (Sat, 10 May 2025 15:35:01 GMT)
Acknowledgement sent to Rutherther <rutherther <at> ditigal.xyz>: New bug report received and forwarded. Copy sent to ludo <at> gnu.org, bug-guix <at> gnu.org. (Sat, 10 May 2025 15:35:02 GMT)
Message #5 received at submit <at> debbugs.gnu.org (full text, mbox):
There are reports from users of inconsistent ownership. It seems that at
least /var/guix is sometimes left with the wrong owner, and maybe even
parts of the store? I cannot verify that.
The guix-ownership service checks /gnu/store ownership to check if the
whole store and all files important for the daemon (/etc/guix,
/var/guix) are owned by the appropriate user.
If the directory isn't owned by the appropriate user, it goes through these steps:
1. Fix permissions in /gnu/store - first under it, then /gnu/store
itself as last step
2. Fix /var/guix
3. Fix /etc/guix
4. Fix /var/log/guix
So from these steps it should be obvious that if the guix-ownership
service somehow stops between steps 1 and 2, it will never recover the
ownership of /var/guix, /etc/guix and /var/log/guix. /gnu/store should
change owner last.
On the other hand, it seems like too much of a coincidence that users
would consistently hit reboots between those steps. So maybe I am
overlooking something else. I checked file-system-fold: it visits
/gnu/store itself last, so moving step 1 after step 4 should fix that.
Still, instead of moving the whole step, maybe only /gnu/store itself
should be skipped during the traversal and changed in a separate final
step 5, so that things stay correct even if file-system-fold somehow
changed its ordering? I am not sure how exactly it should behave in that
regard.
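The ordering problem described above can be illustrated with a small simulation. This is only a sketch in Python, not the service's Guile code: the `run_service` and `interrupted_then_retried` helpers are hypothetical, no real chown happens, and the ids 0 and 971 are used purely for illustration. It assumes the service decides whether to do anything by looking only at the owner of /gnu/store.

```python
# Simulation only: each path maps to an owner id; nothing is chown'ed.
STORE = "/gnu/store"
DATA_DIRS = ["/var/guix", "/etc/guix", "/var/log/guix"]


class Interrupted(Exception):
    """Stands in for a reboot or a killed service."""


def run_service(owners, target, order, stop_after=None):
    """Skip if the store already has the target owner; otherwise change
    owners in the given order, optionally stopping after N steps."""
    if owners[STORE] == target:
        return "skipped"
    for done, path in enumerate(order, start=1):
        owners[path] = target
        if stop_after is not None and done == stop_after:
            raise Interrupted(path)
    return "changed"


def interrupted_then_retried(order):
    """Interrupt the first run after one step, then run the service again."""
    owners = {path: 0 for path in [STORE] + DATA_DIRS}  # all owned by root
    try:
        run_service(owners, target=971, order=order, stop_after=1)
    except Interrupted:
        pass
    result = run_service(owners, target=971, order=order)  # e.g. next boot
    return result, owners


store_first = interrupted_then_retried([STORE] + DATA_DIRS)
store_last = interrupted_then_retried(DATA_DIRS + [STORE])
print(store_first)
print(store_last)
```

With the store changed first, the second run sees the expected owner on /gnu/store and skips the remaining directories; with the store changed last, the second run notices the mismatch and finishes the job.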
Regards
Rutherther
Information forwarded to bug-guix <at> gnu.org: bug#78355; Package guix. (Wed, 14 May 2025 21:52:05 GMT)
Message #8 received at 78355 <at> debbugs.gnu.org (full text, mbox):
Hi,
Rutherther <rutherther <at> ditigal.xyz> writes:
> There are reports from users with inconsistencies in ownership, it seems that at
> least /var/guix is sometimes left with wrong owner, but maybe even parts
> of the store? I cannot verify that.
Would be nice to get their reports here, otherwise we’re left
speculating.
> The guix-ownership service checks /gnu/store ownership to check if the
> whole store and all files important for the daemon (/etc/guix,
> /var/guix) are owned by the appropriate user.
>
> If the folder isn't owned by appropriate user, it moves to those steps:
> 1. Fix permissions in /gnu/store - first under it, then /gnu/store
> itself as last step
> 2. Fix /var/guix
> 3. Fix /etc/guix
> 4. Fix /var/log/guix
>
> So from those laid out steps it should be obvious that if guix-ownership
> service somehow stops between steps 1 and 2, it will never recover
> ownerships of /var/guix, /etc/guix and /var/log/guix. /gnu/store should
> change owner as last.
Well, the fundamental assumption is that ‘guix-ownership’ is not
interrupted during its work; manual intervention is needed to repair
things if it is interrupted.
I don’t see any way around that but perhaps we should warn about it more
clearly?
> On the other hand it feels much of a coincidence users would be
> consistently hitting reboots between those steps. So maybe I am
> overlooking another thing. I checked the file-system-fold, it goes to
> /gnu/store as last, so that would mean putting step 1 after 4 should fix
> that. Still, maybe only /gnu/store itself should be skipped instead of moving
> the step, and done as last, step 5 to ensure it's fine even if
> file-system-fold somehow changed the ordering? Not sure how exactly it
> should behave in that regard.
Doing /gnu/store last is a good idea because it reduces the window
during which the inconsistent state could go undetected.
Feel free to propose a patch; otherwise I’ll give it a try, but not
before next week.
Thanks,
Ludo’.
Information forwarded to bug-guix <at> gnu.org: bug#78355; Package guix. (Thu, 15 May 2025 07:37:02 GMT)
Message #11 received at 78355 <at> debbugs.gnu.org (full text, mbox):
Hi Ben,
I am CCing you to get more information about the inconsistent ownership,
if you could help with that.
The most important questions are probably:
0. Are you sure the service actually ran after you reconfigured to root?
It should definitely run after reboot, not sure if after reconfigure
as the service already exists, it's just modified
1. Do you think you could've killed the service while it was running
after you reconfigured back to the privileged daemon, i.e. by rebooting
while it was running?
2. Do you know what folders had wrong owners?
- Was everything under /var/guix fully owned by 971?
- Was everything under /etc/guix fully owned by 971?
- Was everything under store fully owned by 971?
Hi Ludo,
Ludovic Courtès <ludo <at> gnu.org> writes:
> Hi,
>
> Rutherther <rutherther <at> ditigal.xyz> writes:
>
>> There are reports from users with inconsistencies in ownership, it seems that at
>> least /var/guix is sometimes left with wrong owner, but maybe even parts
>> of the store? I cannot verify that.
>
> Would be nice to get their reports here, otherwise we’re left
> speculating.
I am afraid we will be left speculating either way, that's why I chose
this approach. That is because none of the users I know of took time to
debug it and just fixed it. CCing Ben Sturmfels who
encountered it (see
https://lists.gnu.org/archive/html/help-guix/2025-05/msg00052.html).
For the other user I know of, I don't know their e-mail.
Note that on IRC I recommended that the user run `chown $USER /gnu/store`
($USER just because it's easiest; any non-root user would be fine) and
then `herd start guix-ownership`, and that fixed the issue. So the service
definitely is doing its job. See https://logs.guix.gnu.org/guix/2025-05-10.log#171215
>
>> The guix-ownership service checks /gnu/store ownership to check if the
>> whole store and all files important for the daemon (/etc/guix,
>> /var/guix) are owned by the appropriate user.
>>
>> If the folder isn't owned by appropriate user, it moves to those steps:
>> 1. Fix permissions in /gnu/store - first under it, then /gnu/store
>> itself as last step
>> 2. Fix /var/guix
>> 3. Fix /etc/guix
>> 4. Fix /var/log/guix
>>
>> So from those laid out steps it should be obvious that if guix-ownership
>> service somehow stops between steps 1 and 2, it will never recover
>> ownerships of /var/guix, /etc/guix and /var/log/guix. /gnu/store should
>> change owner as last.
>
> Well, the fundamental assumption is that ‘guix-ownership’ is not
> interrupted during its work; manual intervention is needed to repair
> things if it is interrupted.
I think it would at least be good if there was a script to do what
guix-ownership does, but force it without the /gnu/store ownership
check, to make it easier for users to recover. Maybe even an optional argument to
guix-ownership, where you could `sudo herd start guix-ownership 1` and
that would force the chown'ing?
>
> I don’t see any way around that but perhaps we should warn about it more
> clearly?
That would definitely be great; I think you can easily overlook that the
service has started. Now, I am not sure whether one-shot services are
started after a change when you reconfigure. If they are, I think it's
going to be a common issue: people reconfigure and reboot! That means they
will usually stop the service, or am I mistaken here?
>
>> On the other hand it feels much of a coincidence users would be
>> consistently hitting reboots between those steps. So maybe I am
>> overlooking another thing. I checked the file-system-fold, it goes to
>> /gnu/store as last, so that would mean putting step 1 after 4 should fix
>> that. Still, maybe only /gnu/store itself should be skipped instead of moving
>> the step, and done as last, step 5 to ensure it's fine even if
>> file-system-fold somehow changed the ordering? Not sure how exactly it
>> should behave in that regard.
>
> Doing /gnu/store last is a good idea because it reduces the window
> during which the inconsistent state could go undetected.
I think it completely removes it. Or why do you think not?
If it really doesn't, I think it would be good if we came up with an
approach that removes this window. The surest way to have no
inconsistency window would be to just check all of the permissions, but
that's probably overkill.
A cheaper option would be, for example, to create a 'stamp' file
somewhere at the end that says the change was done, and to check that the
stamp file matches what we expect.
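A minimal sketch of the stamp idea, in Python rather than the service's Guile code. Everything here is an assumption made for illustration: the `stamp_matches` and `change_with_stamp` helpers, the "uid:gid" stamp format, and the stamp location are hypothetical, and the `change_all` callback stands in for the actual recursive ownership change.

```python
import os
import tempfile


def stamp_matches(stamp_path, uid, gid):
    """Return True if the stamp records a completed change to uid:gid."""
    try:
        with open(stamp_path, encoding="utf-8") as stamp:
            return stamp.read().strip() == f"{uid}:{gid}"
    except FileNotFoundError:
        return False


def change_with_stamp(stamp_path, uid, gid, change_all):
    """Run change_all(uid, gid) unless the stamp says it already completed."""
    if stamp_matches(stamp_path, uid, gid):
        return False
    # Remove a stale stamp first so an interruption leaves no stamp at all.
    if os.path.exists(stamp_path):
        os.remove(stamp_path)
    change_all(uid, gid)
    # Write the stamp atomically, only after every directory was handled.
    temporary = stamp_path + ".tmp"
    with open(temporary, "w", encoding="utf-8") as stamp:
        stamp.write(f"{uid}:{gid}\n")
    os.replace(temporary, stamp_path)
    return True


calls = []
with tempfile.TemporaryDirectory() as directory:
    stamp = os.path.join(directory, "ownership-stamp")
    record = lambda uid, gid: calls.append((uid, gid))
    first = change_with_stamp(stamp, 971, 971, record)   # no stamp yet: runs
    second = change_with_stamp(stamp, 971, 971, record)  # stamp matches: skipped
    third = change_with_stamp(stamp, 0, 0, record)       # other target: runs
```

Because the stamp is only written after the change completes, an interrupted run leaves no matching stamp and the next run starts over, regardless of which directory was being processed.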
>
> Feel free to propose a patch; otherwise I’ll give it a try, but not
> before next week.
>
> Thanks,
> Ludo’.
Thanks
Rutherther
Information forwarded to bug-guix <at> gnu.org: bug#78355; Package guix. (Thu, 15 May 2025 08:34:02 GMT)
Message #14 received at 78355 <at> debbugs.gnu.org (full text, mbox):
Hi,
Rutherther <rutherther <at> ditigal.xyz> writes:
> I think it would at least be good if there was a script to do what
> guix-ownership does, but force it without the /gnu/store ownership
> check, to make it easier for users to recover. Maybe even an optional argument to
> guix-ownership, where you could `sudo herd start guix-ownership 1` and
> that would force the chown'ing?
There’s more or less a script for that in the manual (info "(guix) Build
Environment Setup").
>>
>> I don’t see any way around that but perhaps we should warn about it more
>> clearly?
>
> That would definitely be great, I think you can easily oversee that the
> service has started. Now I am not sure if one-shot services are started
> after change when you reconfigure, if they are, I think it's going to be
> a common issue - people reconfigure & reboot! Meaning they will usually
> stop the service, or am I mistaken here?
The one-shot service is restarted upon reconfigure, but one also has to
restart guix-daemon in this case.
>> Doing /gnu/store last is a good idea because it reduces the window
>> during which the inconsistent state could go undetected.
>
> I think it completely removes it. Or why do you think not?
Yes, you’re right, as long as /gnu/store itself is done last.
Ludo’.
Information forwarded to bug-guix <at> gnu.org: bug#78355; Package guix. (Fri, 30 May 2025 09:32:01 GMT)
Message #17 received at 78355 <at> debbugs.gnu.org (full text, mbox):
(CCing the bug tracker)
Hi Ruther,
Thanks for all your work and help on this issue! Sorry for the delay - I
needed to find a quiet time to reconfigure and watch what happens more
carefully.
Rutherther <rutherther <at> ditigal.xyz> writes:
> I am CCing you to get more information about the inconsistent ownership,
> if you could help with that.
>
> The most important questions are probably:
> 0. Are you sure the service actually ran after you reconfigured to root?
> It should definitely run after reboot, not sure if after reconfigure
> as the service already exists, it's just modified
To be honest, I had no understanding of how the change was taking place. I
read about it briefly in the `guix pull --news --details` and may have
briefly referred to the manual - I can't recall exactly. Possibly naive of
me.
I'll give it another shot now. I've set "privileged? #f" again and run
"system reconfigure". I see that it prints:
...
building
/gnu/store/yyn762lwzra0nqrrwhgbwwlz2k0qyn0h-upgrade-shepherd-services.scm.drv...
shepherd: Starting service host-name...
shepherd: Service host-name started.
shepherd: Service host-name running with value "Marseille".
shepherd: Service host-name has been started.
shepherd: Starting service user-homes...
shepherd: Service user-homes has been started.
shepherd: Starting service sysctl...
shepherd: Service sysctl has been started.
Then appears to hang, though I now realise that it's busy updating
ownership. Unfortunately there is no feedback to the user or advice not to
interrupt the process. I watched the ownership in /gnu/store be
progressively updated to "guix-daemon".
It took about 5 mins, then quickly printed:
shepherd: Service user-homes has been started.
shepherd: Starting service guix-ownership...
shepherd: Changing to unprivileged guix-daemon.
shepherd: Service guix-ownership has been started.
shepherd: Starting service x11-socket-directory...
shepherd: Service x11-socket-directory has been started.
shepherd: Service user-homes has been started.
shepherd: Service tor is currently disabled.
shepherd: Service user-homes has been started.
...
The message "Changing to unprivileged guix-daemon" needs to be visible
before the work is done though. Is it possible that stdout is being
buffered?
> 1. Do you think you could've killed the service when it was running
> after you reconfigured back to privileged daemon, ie. by rebooting when
> it was running?
Absolutely I could have killed it - to me it just looks like reconfigure
had locked up for some reason. I'd probably forgotten that I even set the
"privileged?" option, or didn't consider the pause might be related.
So yes, user error on my part, but reflecting on it, the mental model I
have of Guix is that it's immune to issues such as power outages during
upgrades. I realise now that this is a complex migration, so I appreciate
that there's some state to manage and it takes time.
Possibly the unprivileged transition might need to be either:
a. more obvious to the user, e.g. red text and interactive: "The migration
to the unprivileged daemon needs to update a very large number of files and
must not be interrupted. Do not proceed on battery power. Proceed? (y/N)"
b. robust to being interrupted part way, such as starting the daemon as
privileged unless the transition process ran successfully. Maybe that's not
technically feasible though - just a thought.
> 2. Do you know what folders had wrong owners?
> - Was everything under /var/guix fully owned by 971?
> - Was everything under /etc/guix fully owned by 971?
> - Was everything under store fully owned by 971?
Not everything, just some. I'm not 100% sure it was 971, but something like
that - the ID of a user that didn't exist on the system (presumably because
I rolled back/reconfigured to "privileged? #t" again).
Now that it's run successfully, all of /gnu/store and /etc/guix are
guix-daemon:guix-daemon. All of /var/guix is guix-daemon:guix-daemon except
for files /var/guix/profiles and /var/guix/userpool, which are root:root.
I haven't rebooted yet, but if you don't hear back from me, assume that
worked fine.
Thanks again for all your work!
Ben
Information forwarded to bug-guix <at> gnu.org: bug#78355; Package guix. (Fri, 30 May 2025 12:30:02 GMT)
Message #20 received at 78355 <at> debbugs.gnu.org (full text, mbox):
Ben Sturmfels <ben <at> sturm.com.au> writes:
> Now that it's run successfully, all of /gnu/store and /etc/guix are
> guix-daemon:guix-daemon. All of /var/guix is guix-daemon:guix-daemon except
> for files /var/guix/profiles and /var/guix/userpool, which are root:root.
>
> I haven't rebooted yet, but if you don't hear back from me, assume that
> worked fine.
After rebooting, I didn't get any WiFi, but that was expected due to
https://issues.guix.gnu.org/78047. I think this is primarily what was
causing me problems originally.
I ran `guix system roll-back`, which completed nearly instantly. After
rebooting I could see that the "guix-ownership" process was running in the
background busily changing ownership back to root:root. Once that was
completed I ran `sudo herd restart NetworkManager` which gave me WiFi
again.
All seems to be working as intended. I didn't test rebooting/interrupting
the guix-ownership process part way through. Ideally it would retry I
guess?
Regards,
Ben
Information forwarded to bug-guix <at> gnu.org: bug#78355; Package guix. (Tue, 10 Jun 2025 13:50:05 GMT)
Message #23 received at 78355 <at> debbugs.gnu.org (full text, mbox):
[Message part 1 (text/plain, inline)]
Hi Rutherther,
Rutherther <rutherther <at> ditigal.xyz> writes:
> The guix-ownership service checks /gnu/store ownership to check if the
> whole store and all files important for the daemon (/etc/guix,
> /var/guix) are owned by the appropriate user.
>
> If the folder isn't owned by appropriate user, it moves to those steps:
> 1. Fix permissions in /gnu/store - first under it, then /gnu/store
> itself as last step
> 2. Fix /var/guix
> 3. Fix /etc/guix
> 4. Fix /var/log/guix
>
> So from those laid out steps it should be obvious that if guix-ownership
> service somehow stops between steps 1 and 2, it will never recover
> ownerships of /var/guix, /etc/guix and /var/log/guix. /gnu/store should
> change owner as last.
Sorry for dropping the ball. How about the patch below?
Note that it would only help if the user retries to change ownership in
the same direction after interrupting the service; ownership change
remains fundamentally non-atomic so it’s still possible to end up in a
partially chown’d state, if one insists.
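The caveat above, that changing the store last only helps when the retry goes in the same direction, can be seen in a small Python simulation. This is a sketch under stated assumptions, not the service's Guile code: `run_service` is a hypothetical stand-in that checks only the owner of /gnu/store, the ids 0 and 971 are illustrative, and no real chown happens.

```python
# Simulation only: each path maps to an owner id; nothing is chown'ed.
STORE = "/gnu/store"
DATA_DIRS = ["/var/guix", "/etc/guix", "/var/log/guix"]
ROOT, DAEMON = 0, 971  # illustrative ids


def run_service(owners, target, stop_after=None):
    """Store-last variant; returns False when the store check finds
    nothing to do, True when it started changing owners."""
    if owners[STORE] == target:
        return False
    for done, path in enumerate(DATA_DIRS + [STORE], start=1):
        owners[path] = target
        if done == stop_after:
            break  # simulated interruption
    return True


owners = {path: ROOT for path in DATA_DIRS + [STORE]}
# Switch to the unprivileged daemon, interrupted after two directories.
run_service(owners, DAEMON, stop_after=2)
# The user then switches back to the privileged daemon instead of retrying.
ran_again = run_service(owners, ROOT)
print(ran_again, owners)
```

Since /gnu/store still belongs to root after the interruption, the run in the opposite direction considers everything fine and leaves /var/guix and /etc/guix with the other owner.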
Thanks,
Ludo’.
[Message part 2 (text/x-patch, inline)]
diff --git a/gnu/services/base.scm b/gnu/services/base.scm
index edc6f45850..c2851ef1a9 100644
--- a/gnu/services/base.scm
+++ b/gnu/services/base.scm
@@ -1997,10 +1997,9 @@ (define (guix-ownership-change-program)
lstat))
(define (claim-data-ownership uid gid)
- (format #t "Changing file ownership for /gnu/store \
+ (format #t "Changing file ownership for ~a \
and data directories to ~a:~a...~%"
- uid gid)
- (change-ownership #$(%store-prefix) uid gid)
+ #$(%store-prefix) uid gid)
(let ((excluded '("." ".." "profiles" "userpool")))
(for-each (lambda (directory)
(change-ownership (in-vicinity "/var/guix" directory)
@@ -2012,7 +2011,11 @@ (define (guix-ownership-change-program)
(chown "/var/guix" uid gid)
(change-ownership "/etc/guix" uid gid)
(mkdir-p "/var/log/guix")
- (change-ownership "/var/log/guix" uid gid))
+ (change-ownership "/var/log/guix" uid gid)
+
+ ;; Change the store last so that, if this service is interrupted,
+ ;; ownership appears as having yet to be changed.
+ (change-ownership #$(%store-prefix) uid gid))
(match (command-line)
((_ (= string->number (? integer? uid))
Reply sent to Ludovic Courtès <ludo <at> gnu.org>: You have taken responsibility. (Tue, 01 Jul 2025 22:30:05 GMT)
Notification sent to Rutherther <rutherther <at> ditigal.xyz>: bug acknowledged by developer. (Tue, 01 Jul 2025 22:30:06 GMT)
Message #28 received at 78355-done <at> debbugs.gnu.org (full text, mbox):
Hi Rutherther,
Ludovic Courtès <ludo <at> gnu.org> writes:
> Rutherther <rutherther <at> ditigal.xyz> writes:
>
>> The guix-ownership service checks /gnu/store ownership to check if the
>> whole store and all files important for the daemon (/etc/guix,
>> /var/guix) are owned by the appropriate user.
>>
>> If the folder isn't owned by appropriate user, it moves to those steps:
>> 1. Fix permissions in /gnu/store - first under it, then /gnu/store
>> itself as last step
>> 2. Fix /var/guix
>> 3. Fix /etc/guix
>> 4. Fix /var/log/guix
>>
>> So from those laid out steps it should be obvious that if guix-ownership
>> service somehow stops between steps 1 and 2, it will never recover
>> ownerships of /var/guix, /etc/guix and /var/log/guix. /gnu/store should
>> change owner as last.
>
> Sorry for dropping the ball. How about the patch below?
Pushed as c33bc8008090bafda228e475dedc71cd06f56e4f.
Thanks!
Ludo'.
bug archived. Request was from Debbugs Internal Request <help-debbugs <at> gnu.org> to internal_control <at> debbugs.gnu.org. (Wed, 30 Jul 2025 11:24:07 GMT)
This bug report was last modified 10 days ago.