Package: guix-patches;
Reported by: Christopher Baines <mail <at> cbaines.net>
Date: Tue, 23 Jun 2020 16:38:02 UTC
Severity: normal
Tags: fixed, patch
Done: Ludovic Courtès <ludo <at> gnu.org>
Bug is archived. No further changes may be made.
Message #14 received at 42023 <at> debbugs.gnu.org (full text, mbox):
From: Ludovic Courtès <ludo <at> gnu.org> To: Caleb Ristvedt <caleb.ristvedt <at> cune.org> Cc: 42023 <at> debbugs.gnu.org, Christopher Baines <mail <at> cbaines.net> Subject: Re: [bug#42023] [PATCH] database: register-items: reduce transaction scope. Date: Thu, 25 Jun 2020 09:45:14 +0200
[Message part 1 (text/plain, inline)]
Hi, Caleb Ristvedt <caleb.ristvedt <at> cune.org> skribis: > Ludovic Courtès <ludo <at> gnu.org> writes: [...] >> Two questions: >> >> 1. Can another process come and fiddle with TO-REGISTER while we’re >> still in ‘reset-timestamps’? Or can GC happen while we’re in >> ‘reset-timestamps’ and delete TO-REGISTER and remove it from the >> database? > >> >> I think none of these scenarios can happen, as long as we’ve taken the >> .lock file for TO-REGISTER before, like ‘finalize-store-file’ does. > > The lock file has no bearing on liveness of the path it locks, though > the liveness of the path it locks *does* count as liveness for the lock > file and for the .chroot file; see tryToDelete() in nix/libstore/gc.cc. > > (Well, semi-liveness; .lock and .chroot files won't show up in a list of > live paths, but they will still be protected from deletion if their > associated store item is a temp root) > > So general "fiddling" can't happen, but GC can. The responsibility for > both locking and registering as temporary roots falls on the caller, > currently, as I believe it should. We may want to document this > responsibility in the docstring for register-items, though. Yes, it would be good to add a sentence to document it. > So currently finalize-store-file is safe from clobbering by another > process attempting to finalize to the same path, but not safe from > garbage collection between when the temporary store file is renamed and > when it is registered. It needs an add-temp-root prior to renaming. Ah, so we’d need to do that before applying the patch that reduces the scope of the transaction, right? >> 2. After the transaction, TO-REGISTER is considered valid. But are >> the effects of the on-going deduplication observable, due to >> non-atomicity of some operation? > > Subdirectories of store items need to be made writable prior to renaming > the temp link, so there will necessarily be a window of time during > which various subdirectories will appear writable. I don't think this > should cause problems; we already assume that the .lock file is held, so > we should be the only process attempting to deduplicate it. On a related > note, we should probably use dynamic-wind for setting and restoring the > permissions in replace-with-link. Yes. Here’s a patch for ‘dynamic-wind’:
[Message part 2 (text/x-patch, inline)]
diff --git a/.dir-locals.el b/.dir-locals.el index 92979fc5ed..155255a770 100644 --- a/.dir-locals.el +++ b/.dir-locals.el @@ -37,6 +37,7 @@ (eval . (put 'with-file-lock 'scheme-indent-function 1)) (eval . (put 'with-file-lock/no-wait 'scheme-indent-function 1)) (eval . (put 'with-profile-lock 'scheme-indent-function 1)) + (eval . (put 'with-writable-file 'scheme-indent-function 1)) (eval . (put 'package 'scheme-indent-function 0)) (eval . (put 'origin 'scheme-indent-function 0)) diff --git a/guix/store/deduplication.scm b/guix/store/deduplication.scm index 6784ee0b92..af52c03370 100644 --- a/guix/store/deduplication.scm +++ b/guix/store/deduplication.scm @@ -94,6 +94,20 @@ LINK-PREFIX." (try (tempname-in link-prefix)) (apply throw args)))))) +(define (call-with-writable-file file thunk) + (let ((stat (lstat file))) + (dynamic-wind + (lambda () + (make-file-writable file)) + thunk + (lambda () + (set-file-time file stat) + (chmod file (stat:mode stat)))))) + +(define-syntax-rule (with-writable-file file exp ...) + "Make FILE writable for the dynamic extent of EXP..." + (call-with-writable-file file (lambda () exp ...))) + ;; There are 3 main kinds of errors we can get from hardlinking: "Too many ;; things link to this" (EMLINK), "this link already exists" (EEXIST), and ;; "can't fit more stuff in this directory" (ENOSPC). @@ -120,20 +134,14 @@ Note: TARGET, TO-REPLACE, and SWAP-DIRECTORY must be on the same file system." ;; If we couldn't create TEMP-LINK, that's OK: just don't do the ;; replacement, which means TO-REPLACE won't be deduplicated. (when temp-link - (let* ((parent (dirname to-replace)) - (stat (stat parent))) - (make-file-writable parent) + (with-writable-file (dirname to-replace) (catch 'system-error (lambda () (rename-file temp-link to-replace)) (lambda args (delete-file temp-link) (unless (= EMLINK (system-error-errno args)) - (apply throw args)))) - - ;; Restore PARENT's mtime and permissions. - (set-file-time parent stat) - (chmod parent (stat:mode stat))))) + (apply throw args))))))) (define* (deduplicate path hash #:key (store %store-directory)) "Check if a store item with sha256 hash HASH already exists. If so,
[Message part 3 (text/plain, inline)]
> Also, replace-with-link doesn't check whether the "containing directory" > is the store like optimisePath_() does, so in theory if another process > tried to make a permanent change to the store's permissions it could be > clobbered when replace-with-link restores the permissions. I don't know > of any instance where this could happen currently, but it's something to > keep in mind. I guess we should also avoid changing permissions on /gnu/store, it would be wiser. > And, of course, there's the inherent visible change of deduplication - > possible modification of inode number, but I don't see how that could > cause problems with any reasonable programs. Yes, that’s fine. >> I think the ‘replace-with-link’ dance is atomic, so we should be fine. >> >> Thoughts? > > replace-with-link is atomic, but it can fail if the "canonical" link in > .links is deleted by the garbage collector between when its existence is > checked in 'deduplicate' and when temp link creation in > replace-with-link happens. Then ENOENT would be returned, and > 'deduplicate' wouldn't catch that exception, nor would optimisePath_() > in nix/libstore/optimise-store.cc. > > The proper behavior there, in my opinion, would be to retry the > deduplication. Attached is a patch that makes 'deduplicate' do that. > > Also, while I'm perusing optimisePath_(), there's a minor bug in which > EMLINK generated by renaming the temp link may still be treated as a > fatal error. This is because errno may be clobbered by unlink() prior to > being checked (according to the errno man page it can still be modified > even if the call succeeds). Indeed, good catch! > From 461064da9e169df3dd939b734bb2860598d113f4 Mon Sep 17 00:00:00 2001 > From: Caleb Ristvedt <caleb.ristvedt <at> cune.org> > Date: Wed, 24 Jun 2020 00:56:50 -0500 > Subject: [PATCH 1/2] deduplication: retry on ENOENT. > > It's possible for the garbage collector to remove the "canonical" link after > it's been detected as existing by 'deduplicate'. This would cause an ENOENT > error when replace-with-link attempts to create the temporary link. This > changes it so that it will properly handle that by retrying. > > * guix/store/deduplication.scm (deduplicate): retry on ENOENT. > --- > guix/store/deduplication.scm | 64 +++++++++++++++++++++++------------- > 1 file changed, 41 insertions(+), 23 deletions(-) > > diff --git a/guix/store/deduplication.scm b/guix/store/deduplication.scm > index 6784ee0b92..b6d94e49c2 100644 > --- a/guix/store/deduplication.scm > +++ b/guix/store/deduplication.scm > @@ -161,26 +161,44 @@ under STORE." > (scandir* path)) > (let ((link-file (string-append links-directory "/" > (bytevector->nix-base32-string hash)))) > - (if (file-exists? link-file) > - (replace-with-link link-file path > - #:swap-directory links-directory) > - (catch 'system-error > - (lambda () > - (link path link-file)) > - (lambda args > - (let ((errno (system-error-errno args))) > - (cond ((= errno EEXIST) > - ;; Someone else put an entry for PATH in > - ;; LINKS-DIRECTORY before we could. Let's use it. > - (replace-with-link path link-file > - #:swap-directory links-directory)) > - ((= errno ENOSPC) > - ;; There's not enough room in the directory index for > - ;; more entries in .links, but that's fine: we can > - ;; just stop. > - #f) > - ((= errno EMLINK) > - ;; PATH has reached the maximum number of links, but > - ;; that's OK: we just can't deduplicate it more. > - #f) > - (else (apply throw args))))))))))) > + (let retry () > + (if (file-exists? link-file) > + (catch 'system-error > + (lambda () > + (replace-with-link link-file path > + #:swap-directory links-directory)) > + (lambda args > + (if (and (= (system-error-errno args) ENOENT) > + ;; ENOENT can be produced because: > + ;; - LINK-FILE has missing directory components > + ;; - LINKS-DIRECTORY has missing directory > + ;; components > + ;; - PATH has missing directory components > + ;; > + ;; the last two are errors, the first just > + ;; requires a retry. > + (file-exists? (dirname path)) > + (file-exists? links-directory)) > + (retry) > + (apply throw args)))) I feel like there are TOCTTOU issues here and redundant ‘stat’ calls, plus the risk of catching a ‘system-error’ coming from “somewhere else.” What about baking this logic directly in ‘replace-with-link’, and replacing ‘file-exists?’ calls by 'system-error handling? > From a14ff0a9dab4d73680befaf9d244d6cce0a73ab3 Mon Sep 17 00:00:00 2001 > From: Caleb Ristvedt <caleb.ristvedt <at> cune.org> > Date: Wed, 24 Jun 2020 01:00:40 -0500 > Subject: [PATCH 2/2] .dir-locals.el: fix call-with-{retrying-}transaction > indenting. > > * .dir-locals.el (call-with-transaction, call-with-retrying-transaction): > change scheme-indent-function property from 2 to 1. LGTM! Thanks for your quick feedback and thorough analyses! Ludo’.
GNU bug tracking system
Copyright (C) 1999 Darren O. Benham,
1997,2003 nCipher Corporation Ltd,
1994-97 Ian Jackson.