GNU bug report logs - #25018
GC incorrectly removes the temporary root file of the calling process

Previous Next

Package: guix;

Reported by: ludo <at> gnu.org (Ludovic Courtès)

Date: Thu, 24 Nov 2016 14:08:01 UTC

Severity: important

Full log


View this message in rfc822 format

From: help-debbugs <at> gnu.org (GNU bug Tracking System)
To: Maxim Cournoyer <maxim.cournoyer <at> gmail.com>
Cc: tracker <at> debbugs.gnu.org
Subject: bug#25018: closed (GC incorrectly removes the temporary root file
 of the calling process)
Date: Fri, 07 Oct 2022 21:00:02 +0000
[Message part 1 (text/plain, inline)]
Your message dated Fri, 07 Oct 2022 16:59:20 -0400
with message-id <87pmf34afr.fsf <at> gmail.com>
and subject line Re: bug#25018: GC incorrectly removes the temporary root file of the calling process
has caused the debbugs.gnu.org bug report #25018,
regarding GC incorrectly removes the temporary root file of the calling process
to be marked as done.

(If you believe you have received this mail in error, please contact
help-debbugs <at> gnu.org.)


-- 
25018: https://debbugs.gnu.org/cgi/bugreport.cgi?bug=25018
GNU Bug Tracking System
Contact help-debbugs <at> gnu.org with problems
[Message part 2 (message/rfc822, inline)]
From: ludo <at> gnu.org (Ludovic Courtès)
To: Eelco Dolstra <eelco.dolstra <at> logicblox.com>
Cc: Hartmut Goebel <h.goebel <at> crazy-compilers.com>, bug-guix <at> gnu.org
Subject: GC incorrectly removes the temporary root file of the calling process
Date: Thu, 24 Nov 2016 15:07:24 +0100
[Message part 3 (text/plain, inline)]
Hello,

The ‘readTempRoots’ function in gc.cc has this:

        /* Try to acquire a write lock without blocking.  This can
           only succeed if the owning process has died.  In that case
           we don't care about its temporary roots. */
        if (lockFile(*fd, ltWrite, false)) {
            printMsg(lvlError, format("removing stale temporary roots file `%1%'") % path);
            unlink(path.c_str());

There’s a thinko here: locking the file also succeeds when the lock is
already held by the calling process.

In that case, this code ends up removing the temporary root file of
calling process, which is bad.  Here’s a sample session:

--8<---------------cut here---------------start------------->8---
scheme@(guile-user)> ,use(guix)
scheme@(guile-user)> (define s (open-connection))
scheme@(guile-user)> (current-build-output-port (current-error-port))
$2 = #<output: file /dev/pts/9>
scheme@(guile-user)> (set-build-options s #:verbosity 10)
$3 = #t
scheme@(guile-user)> (add-text-to-store s "foo" "bar!")
acquiring global GC lock `/var/guix/gc.lock'
acquiring read lock on `/var/guix/temproots/4259'
acquiring write lock on `/var/guix/temproots/4259'
downgrading to read lock on `/var/guix/temproots/4259'
locking path `/gnu/store/0siy93lggjw7sfdg8gsvrzafaa974h2d-foo'
lock acquired on `/gnu/store/0siy93lggjw7sfdg8gsvrzafaa974h2d-foo.lock'
`/gnu/store/0siy93lggjw7sfdg8gsvrzafaa974h2d-foo' has hash `c756ef12a70bad10c9ac276ecd1213ea7cc3a2e6c462ba47e4f9a88756a055d0'
lock released on `/gnu/store/0siy93lggjw7sfdg8gsvrzafaa974h2d-foo.lock'
$4 = "/gnu/store/0siy93lggjw7sfdg8gsvrzafaa974h2d-foo"
scheme@(guile-user)> (delete-paths s (list $4))
acquiring global GC lock `/var/guix/gc.lock'
finding garbage collector roots...
executing `/gnu/store/l99rkv2713nl53kr3gn4akinvifsx19h-guix-0.11.0-3.7ca3/libexec/guix/list-runtime-roots' to find additional roots
[…]
reading temporary root file `/var/guix/temproots/4259'
removing stale temporary roots file `/var/guix/temproots/4259'
[…]
considering whether to delete `/gnu/store/0siy93lggjw7sfdg8gsvrzafaa974h2d-foo'
|   invalidating path `/gnu/store/0siy93lggjw7sfdg8gsvrzafaa974h2d-foo'
|   deleting `/gnu/store/0siy93lggjw7sfdg8gsvrzafaa974h2d-foo'
|   recursively deleting path `/gnu/store/0siy93lggjw7sfdg8gsvrzafaa974h2d-foo'
|   |   /gnu/store/0siy93lggjw7sfdg8gsvrzafaa974h2d-foo
deleting `/gnu/store/trash'
recursively deleting path `/gnu/store/trash'
|   /gnu/store/trash
deleting unused links...
deleting unused link `/gnu/store/.links/1l2ml1b8ga7rwi3vlqn4wsic6z7a2c9csvi7mk4i1b8blw9fymn7'
note: currently hard linking saves 6699.22 MiB
$5 = ("/gnu/store/0siy93lggjw7sfdg8gsvrzafaa974h2d-foo")
$6 = 4096
--8<---------------cut here---------------end--------------->8---

Notice the “removing stale temporary roots file” message.

Eelco: shouldn’t it be changed along the lines of the attached path?

Thanks,
Ludo’.

[Message part 4 (text/x-patch, inline)]
diff --git a/nix/libstore/gc.cc b/nix/libstore/gc.cc
index 72eff52..d92388f 100644
--- a/nix/libstore/gc.cc
+++ b/nix/libstore/gc.cc
@@ -2,6 +2,7 @@
 #include "misc.hh"
 #include "local-store.hh"
 
+#include <string>
 #include <functional>
 #include <queue>
 #include <algorithm>
@@ -225,10 +226,10 @@ static void readTempRoots(PathSet & tempRoots, FDs & fds)
         //FDPtr fd(new AutoCloseFD(openLockFile(path, false)));
         //if (*fd == -1) continue;
 
-        /* Try to acquire a write lock without blocking.  This can
-           only succeed if the owning process has died.  In that case
-           we don't care about its temporary roots. */
-        if (lockFile(*fd, ltWrite, false)) {
+        /* Try to acquire a write lock without blocking.  This can only
+           succeed if the owning process has died, in which case we don't care
+           about its temporary roots, or if we are the owning process.  */
+        if (i.name != std::to_string(getpid()) && lockFile(*fd, ltWrite, false)) {
             printMsg(lvlError, format("removing stale temporary roots file `%1%'") % path);
             unlink(path.c_str());
             writeFull(*fd, "d");
[Message part 5 (message/rfc822, inline)]
From: Maxim Cournoyer <maxim.cournoyer <at> gmail.com>
To: ludo <at> gnu.org (Ludovic Courtès)
Cc: h.goebel <at> crazy-compilers.com, Eelco Dolstra <eelco.dolstra <at> logicblox.com>,
 25018-done <at> debbugs.gnu.org
Subject: Re: bug#25018: GC incorrectly removes the temporary root file of
 the calling process
Date: Fri, 07 Oct 2022 16:59:20 -0400
Hi Ludo,

ludo <at> gnu.org (Ludovic Courtès) writes:

> Hello,
>
> The ‘readTempRoots’ function in gc.cc has this:
>
>         /* Try to acquire a write lock without blocking.  This can
>            only succeed if the owning process has died.  In that case
>            we don't care about its temporary roots. */
>         if (lockFile(*fd, ltWrite, false)) {
>             printMsg(lvlError, format("removing stale temporary roots file `%1%'") % path);
>             unlink(path.c_str());
>
> There’s a thinko here: locking the file also succeeds when the lock is
> already held by the calling process.
>
> In that case, this code ends up removing the temporary root file of
> calling process, which is bad.  Here’s a sample session:
>
> scheme@(guile-user)> ,use(guix)
> scheme@(guile-user)> (define s (open-connection))
> scheme@(guile-user)> (current-build-output-port (current-error-port))
> $2 = #<output: file /dev/pts/9>
> scheme@(guile-user)> (set-build-options s #:verbosity 10)
> $3 = #t
> scheme@(guile-user)> (add-text-to-store s "foo" "bar!")
> acquiring global GC lock `/var/guix/gc.lock'
> acquiring read lock on `/var/guix/temproots/4259'
> acquiring write lock on `/var/guix/temproots/4259'
> downgrading to read lock on `/var/guix/temproots/4259'
> locking path `/gnu/store/0siy93lggjw7sfdg8gsvrzafaa974h2d-foo'
> lock acquired on `/gnu/store/0siy93lggjw7sfdg8gsvrzafaa974h2d-foo.lock'
> `/gnu/store/0siy93lggjw7sfdg8gsvrzafaa974h2d-foo' has hash `c756ef12a70bad10c9ac276ecd1213ea7cc3a2e6c462ba47e4f9a88756a055d0'
> lock released on `/gnu/store/0siy93lggjw7sfdg8gsvrzafaa974h2d-foo.lock'
> $4 = "/gnu/store/0siy93lggjw7sfdg8gsvrzafaa974h2d-foo"
> scheme@(guile-user)> (delete-paths s (list $4))
> acquiring global GC lock `/var/guix/gc.lock'
> finding garbage collector roots...
> executing `/gnu/store/l99rkv2713nl53kr3gn4akinvifsx19h-guix-0.11.0-3.7ca3/libexec/guix/list-runtime-roots' to find additional roots
> […]
> reading temporary root file `/var/guix/temproots/4259'
> removing stale temporary roots file `/var/guix/temproots/4259'
> […]
> considering whether to delete `/gnu/store/0siy93lggjw7sfdg8gsvrzafaa974h2d-foo'
> |   invalidating path `/gnu/store/0siy93lggjw7sfdg8gsvrzafaa974h2d-foo'
> |   deleting `/gnu/store/0siy93lggjw7sfdg8gsvrzafaa974h2d-foo'
> |   recursively deleting path `/gnu/store/0siy93lggjw7sfdg8gsvrzafaa974h2d-foo'
> |   |   /gnu/store/0siy93lggjw7sfdg8gsvrzafaa974h2d-foo
> deleting `/gnu/store/trash'
> recursively deleting path `/gnu/store/trash'
> |   /gnu/store/trash
> deleting unused links...
> deleting unused link `/gnu/store/.links/1l2ml1b8ga7rwi3vlqn4wsic6z7a2c9csvi7mk4i1b8blw9fymn7'
> note: currently hard linking saves 6699.22 MiB
> $5 = ("/gnu/store/0siy93lggjw7sfdg8gsvrzafaa974h2d-foo")
> $6 = 4096
>
> Notice the “removing stale temporary roots file” message.
>
> Eelco: shouldn’t it be changed along the lines of the attached path?
>
>
> Thanks,
> Ludo’.
>
> diff --git a/nix/libstore/gc.cc b/nix/libstore/gc.cc
> index 72eff52..d92388f 100644
> --- a/nix/libstore/gc.cc
> +++ b/nix/libstore/gc.cc
> @@ -2,6 +2,7 @@
>  #include "misc.hh"
>  #include "local-store.hh"
>  
> +#include <string>
>  #include <functional>
>  #include <queue>
>  #include <algorithm>
> @@ -225,10 +226,10 @@ static void readTempRoots(PathSet & tempRoots, FDs & fds)
>          //FDPtr fd(new AutoCloseFD(openLockFile(path, false)));
>          //if (*fd == -1) continue;
>  
> -        /* Try to acquire a write lock without blocking.  This can
> -           only succeed if the owning process has died.  In that case
> -           we don't care about its temporary roots. */
> -        if (lockFile(*fd, ltWrite, false)) {
> +        /* Try to acquire a write lock without blocking.  This can only
> +           succeed if the owning process has died, in which case we don't care
> +           about its temporary roots, or if we are the owning process.  */
> +        if (i.name != std::to_string(getpid()) && lockFile(*fd, ltWrite, false)) {
>              printMsg(lvlError, format("removing stale temporary roots file `%1%'") % path);
>              unlink(path.c_str());
>              writeFull(*fd, "d");
>

I'm not Eelco, but your change LGTM.  Note that the upstream version
still uses the original code [0].

I've installed the change, tested that it had the expected result:

--8<---------------cut here---------------start------------->8---
reading temporary root file `/var/guix/temproots/8386'
waiting for read lock on `/var/guix/temproots/8386'
got temporary root `/gnu/store/0siy93lggjw7sfdg8gsvrzafaa974h2d-foo'
considering whether to delete `/gnu/store/0siy93lggjw7sfdg8gsvrzafaa974h2d-foo'
|   cannot delete `/gnu/store/0siy93lggjw7sfdg8gsvrzafaa974h2d-foo' because it's a root
|   cannot delete `/gnu/store/0siy93lggjw7sfdg8gsvrzafaa974h2d-foo' because it's still reachable
ice-9/boot-9.scm:1685:16: In procedure raise-exception:
ERROR:
  1. &store-protocol-error:
      message: "cannot delete path `/gnu/store/0siy93lggjw7sfdg8gsvrzafaa974h2d-foo' since it is still alive"
      status: 1
--8<---------------cut here---------------end--------------->8---

and pushed!

Closing.

[0]  https://github.com/NixOS/nix/blob/master/src/libstore/gc.cc#L194

-- 
Thanks,
Maxim


This bug report was last modified 2 years and 238 days ago.

Previous Next


GNU bug tracking system
Copyright (C) 1999 Darren O. Benham, 1997,2003 nCipher Corporation Ltd, 1994-97 Ian Jackson.