GNU bug report logs - #58035
sync-before-registering is false, possibly the cause of empty files in the store

Previous Next

Package: guix-patches;

Reported by: Maxime Devos <maximedevos <at> telenet.be>

Date: Sat, 24 Sep 2022 01:36:01 UTC

Severity: normal

Tags: moreinfo

To reply to this bug, email your comments to 58035 AT debbugs.gnu.org.

Toggle the display of automated, internal messages from the tracker.

View this report as an mbox folder, status mbox, maintainer mbox


Report forwarded to guix-patches <at> gnu.org:
bug#58035; Package guix-patches. (Sat, 24 Sep 2022 01:36:01 GMT) Full text and rfc822 format available.

Acknowledgement sent to Maxime Devos <maximedevos <at> telenet.be>:
New bug report received and forwarded. Copy sent to guix-patches <at> gnu.org. (Sat, 24 Sep 2022 01:36:02 GMT) Full text and rfc822 format available.

Message #5 received at submit <at> debbugs.gnu.org (full text, mbox):

From: Maxime Devos <maximedevos <at> telenet.be>
To: "guix-patches <at> gnu.org" <guix-patches <at> gnu.org>
Subject: sync-before-registering is false, possibly the cause of empty files
 in the store
Date: Sat, 24 Sep 2022 03:35:01 +0200
[Message part 1 (text/plain, inline)]
libstore/local-store.cc has the following comment:

void LocalStore::registerValidPaths(const ValidPathInfos & infos)
{
    /* SQLite will fsync by default, but the new valid paths may not be 
fsync-ed.
     * So some may want to fsync them before registering the validity, 
at the
     * expense of some speed of the path registering operation. */
    if (settings.syncBeforeRegistering) sync();

    [...]
}

However, currently sync-before-registering is set to 'false' AFAICT.  I 
think this might be the cause of bugs like 
<https://issues.guix.gnu.org/58013> (‘Can't use "guix pull"’), and maybe 
<https://issues.guix.gnu.org/57838> (‘failing to boot, probably due to 
guix gc’).

As such, I think we need to set it to 'true' by default instead.  Or if 
that turns out to be too expensive, instead do a recursive 'fsync' on 
the 'paths' (store items) that are about to be registered (and maybe 
some other tricks like disabling the 'fsync/sync' for most tests).

(Unverified if this solves those issues, just an idea ...)

Greetings,
Maxime.
[OpenPGP_0x49E3EE22191725EE.asc (application/pgp-keys, attachment)]
[OpenPGP_signature (application/pgp-signature, attachment)]

Information forwarded to guix-patches <at> gnu.org:
bug#58035; Package guix-patches. (Tue, 04 Oct 2022 07:53:02 GMT) Full text and rfc822 format available.

Message #8 received at 58035 <at> debbugs.gnu.org (full text, mbox):

From: Ludovic Courtès <ludo <at> gnu.org>
To: Maxime Devos <maximedevos <at> telenet.be>
Cc: 58035 <at> debbugs.gnu.org
Subject: Re: bug#58035: sync-before-registering is false, possibly the cause
 of empty files in the store
Date: Tue, 04 Oct 2022 09:52:30 +0200
Hi,

Maxime Devos <maximedevos <at> telenet.be> skribis:

> libstore/local-store.cc has the following comment:
>
> void LocalStore::registerValidPaths(const ValidPathInfos & infos)
> {
>     /* SQLite will fsync by default, but the new valid paths may not
>     be fsync-ed.
>      * So some may want to fsync them before registering the validity,
>        at the
>      * expense of some speed of the path registering operation. */
>     if (settings.syncBeforeRegistering) sync();
>
>     [...]
> }
>
> However, currently sync-before-registering is set to 'false' AFAICT.
> I think this might be the cause of bugs like
> <https://issues.guix.gnu.org/58013> (‘Can't use "guix pull"’), and
> maybe <https://issues.guix.gnu.org/57838> (‘failing to boot, probably
> due to guix gc’).

It might be a factor, combined with the fact that the file system was
not properly unmounted (power outage or similar).

However, calling sync(2) for each store item is going to be expensive.
Recursive fsync/fdatasync calls are also likely to be too expensive (see
<https://issues.guix.gnu.org/55707> for a concrete example of the cost
on a spinning disk).

Thoughts?

Ludo’.




Information forwarded to guix-patches <at> gnu.org:
bug#58035; Package guix-patches. (Tue, 04 Oct 2022 14:05:01 GMT) Full text and rfc822 format available.

Message #11 received at 58035 <at> debbugs.gnu.org (full text, mbox):

From: Maxime Devos <maximedevos <at> telenet.be>
To: Ludovic Courtès <ludo <at> gnu.org>
Cc: 58035 <at> debbugs.gnu.org
Subject: Re: bug#58035: sync-before-registering is false, possibly the cause
 of empty files in the store
Date: Tue, 4 Oct 2022 16:04:29 +0200
[Message part 1 (text/plain, inline)]

On 04-10-2022 09:52, Ludovic Courtès wrote:
> Hi,
> 
> [...]
>> However, currently sync-before-registering is set to 'false' AFAICT.
>> I think this might be the cause of bugs like
>> <https://issues.guix.gnu.org/58013> (‘Can't use "guix pull"’), and
>> maybe <https://issues.guix.gnu.org/57838> (‘failing to boot, probably
>> due to guix gc’).
> 
> It might be a factor, combined with the fact that the file system was
> not properly unmounted (power outage or similar).
> 
> However, calling sync(2) for each store item is going to be expensive.
> Recursive fsync/fdatasync calls are also likely to be too expensive (see
> <https://issues.guix.gnu.org/55707> for a concrete example of the cost
> on a spinning disk).
> 
> Thoughts?

Debian uses fsync (going by https://wiki.debian.org/Teams/Dpkg/FAQ), and 
even though that according to that FAQ dpkg can be slow, in my 
experience it wasn't too bad.  Also, having to investigate store 
corruption and how to fix it is a form of slowness, especially when it 
fails or you don't have the technical expertise and consequentially you 
need to reinstall (losing old non-back-upped work).

'sync' seems relatively inexpensive to me, compared to the time required 
for building a package and even more inexpensive compared to the cost of 
debugging store corruption:

antipode <at> antipode ~$ time sync

real	0m0,230s
user	0m0,004s
sys	0m0,047s
antipode <at> antipode ~$
antipode <at> antipode ~$ time sync

real	0m0,045s
user	0m0,003s
sys	0m0,014s
antipode <at> antipode ~$ time sync

real	0m0,044s
user	0m0,004s
sys	0m0,012s

Or, after a download:

$ time "guix build download"
real	0m50,681s
user	0m3,856s
sys	0m0,198s
$ sync
# I forgot to properly time this one, but < 0.5 sec
# Don't have numbers on the time required for debugging corruption.

(On a SSD)

Also, the situation is unlike 55707 -- we don't need to call 'fsync' or 
'sync' after building each store item or writing each line of a store 
item file, we only need to do it before registering it in the database 
and returning it to the user -- in some sense, the 'fsync' can be done 
sort-of asynchronuously.

For example, if "guix build" asks for foo.drv is built, and it depends 
on bar.drv and baz.drv, then the daemon can build bar.drv, baz.drv and 
'foo.drv' (without registering or fsyncing or registering in the database).

Once all the things are built, the daemon could then fsync the things, 
and after the fsyncing completes, register things in the database -- on 
the speed, I would like to note that:

  (*) if the store items that were made were small, then fsync'ing them
      should be pretty fast, as there isn't much to sync (at least in
      theory, I think I read about some limitation in the ext3
      implementation where 'fsync' is essentially 'sync' or something
      like that?  Don't know if that's still the case, though.)

  (*) if the store items were sufficiently large (say, a bunch more than
      Linux is willing to buffer), then at some point Linux will have
      flushed most of them anyway.  I don't have a clue what heuristics
      it uses though (except for 'no more than there is free RAM :p)').

  (*) In theory, if a file is already written to disk (implicitly as
      part of some heuristic, or by an explicit 'fsync'), 'fsync'
      should be about zero cost.  Also, for a reasonable implementation
      of 'fsync', I would expect the OS to take the opportunity to
      write some other files as well (if the disk is seeking anyways,
      it might might as well write some stuff while it's moving
      to the right position and such).

(This requires changes to the daemon of course).

Another difference with 55707, is that the write/fsync pattern is very 
different -- in 55707, it's like

   write a small line (after waiting for the previous fsync to complete)
   fsync the file
   repeat very often

whereas with 'recursive fsync before registering (without other 
changes)', it's like

   write files (number and size varies)
   recursive fsync the store item (note: as written elsewhere, the cost 
of a recursive fsync should in theory be a lot less than the sum of the 
fsync cost of an individual file, as the kernel likely takes the 
opportunity to write some other stuff anyways)
   wait for fsync to complete
   repeat for the next store item (much less frequent than the previous 
case (*))

and with 'recursive fsync before registering, and delay the registering 
where possible':

  write files for a store item
  repeat for other store items
  fsync the new files (good chance they were flushed to disk already 
when the store items are large)
  wait for fsync to complete
  repeat with next call to "guix build" / "guix shell" ...

-- there should be much less frequent 'fsync' than in 55707, and the 
'fsyncs' that are done would be mostly batched.

(*) I'm not considering things like the 'can compute derivation' linter 
-- that linter would be in theory slowed down a lot, but I don't see a 
reason why that linter would need a daemon to talk too.

Summarised, the main two points are that:

  * the fsync can be delayed for a while
  * there's a good chance that delayed fsyncs are done automatically
    by the kernel in the background (making the explicit 'fsync'
    of later mostly free)
  * the total time for doing multiple fsyncs should be much less
    than the sum of the times of doing individual fsyncs

(This is currently all theoretical)

Greetings,
Maxime.
[OpenPGP_0x49E3EE22191725EE.asc (application/pgp-keys, attachment)]
[OpenPGP_signature (application/pgp-signature, attachment)]

Information forwarded to guix-patches <at> gnu.org:
bug#58035; Package guix-patches. (Tue, 04 Oct 2022 21:51:02 GMT) Full text and rfc822 format available.

Message #14 received at 58035 <at> debbugs.gnu.org (full text, mbox):

From: Ludovic Courtès <ludo <at> gnu.org>
To: Maxime Devos <maximedevos <at> telenet.be>
Cc: 58035 <at> debbugs.gnu.org
Subject: Re: bug#58035: sync-before-registering is false, possibly the cause
 of empty files in the store
Date: Tue, 04 Oct 2022 23:49:58 +0200
Hi,

Maxime Devos <maximedevos <at> telenet.be> skribis:

> On 04-10-2022 09:52, Ludovic Courtès wrote:
>> Hi,
>> [...]
>>> However, currently sync-before-registering is set to 'false' AFAICT.
>>> I think this might be the cause of bugs like
>>> <https://issues.guix.gnu.org/58013> (‘Can't use "guix pull"’), and
>>> maybe <https://issues.guix.gnu.org/57838> (‘failing to boot, probably
>>> due to guix gc’).
>> It might be a factor, combined with the fact that the file system
>> was
>> not properly unmounted (power outage or similar).
>> However, calling sync(2) for each store item is going to be
>> expensive.
>> Recursive fsync/fdatasync calls are also likely to be too expensive (see
>> <https://issues.guix.gnu.org/55707> for a concrete example of the cost
>> on a spinning disk).
>> Thoughts?
>
> Debian uses fsync (going by https://wiki.debian.org/Teams/Dpkg/FAQ),
> and even though that according to that FAQ dpkg can be slow, in my
> experience it wasn't too bad.  Also, having to investigate store
> corruption and how to fix it is a form of slowness, especially when it
> fails or you don't have the technical expertise and consequentially
> you need to reinstall (losing old non-back-upped work).

Having not experience corruption issues as in the bug you mention, I’m
tempted to minimize the problem, but it’s true that we get such reports
from time to time.

> 'sync' seems relatively inexpensive to me, compared to the time
> required for building a package and even more inexpensive compared to
> the cost of debugging store corruption:

That’s not a fair comparison.  :-) Imagine, you run reconfigure/upgrade;
that downloads tens to hundreds of store items.  Calling sync(2) after
each item may be hardly noticeably on an SSD, but I bet it’s going to be
super expensive on an HDD.  (In the syslogd case, each fsync(2) call—not
even sync(2)—would cause pauses of several 100s of ms.)

Maybe a good test would be to run a daemon on an “average” HDD with
sync-before-registering = true and to run ‘perf timechart record’ while
it’s fetching a large number of substitutes.  That way we’d have
concrete data to talk about.

Any takers?  :-)

Ludo’.




Information forwarded to guix-patches <at> gnu.org:
bug#58035; Package guix-patches. (Wed, 05 Oct 2022 07:59:02 GMT) Full text and rfc822 format available.

Message #17 received at 58035 <at> debbugs.gnu.org (full text, mbox):

From: Maxime Devos <maximedevos <at> telenet.be>
To: Ludovic Courtès <ludo <at> gnu.org>
Cc: 58035 <at> debbugs.gnu.org
Subject: Re: bug#58035: sync-before-registering is false, possibly the cause
 of empty files in the store
Date: Wed, 5 Oct 2022 09:58:18 +0200
[Message part 1 (text/plain, inline)]

On 04-10-2022 23:49, Ludovic Courtès wrote:
> Hi,
> 
> Maxime Devos <maximedevos <at> telenet.be> skribis:
>
>> 'sync' seems relatively inexpensive to me, compared to the time
>> required for building a package and even more inexpensive compared to
>> the cost of debugging store corruption:
> 
> That’s not a fair comparison.  :-)

Possibly, openjdk is a bit of an extreme case.

> Imagine, you run reconfigure/upgrade;
> that downloads tens to hundreds of store items.  Calling sync(2) after
> each item may be hardly noticeably on an SSD, but I bet it’s going to be
> super expensive on an HDD.  (In the syslogd case, each fsync(2) call—not
> even sync(2)—would cause pauses of several 100s of ms.)

If after some testing, this turns out to be a problem, there are some 
options to avoid this (see: the delaying 'fsync' of the previous response).

> Maybe a good test would be to run a daemon on an “average” HDD with
> sync-before-registering = true and to run ‘perf timechart record’ while
> it’s fetching a large number of substitutes.  That way we’d have
> concrete data to talk about.
> 
> Any takers?  :-)

I don't have a HDD to test sync-before-registering=true with.

Greetings,
Maxime.
[OpenPGP_0x49E3EE22191725EE.asc (application/pgp-keys, attachment)]
[OpenPGP_signature (application/pgp-signature, attachment)]

Information forwarded to guix-patches <at> gnu.org:
bug#58035; Package guix-patches. (Fri, 07 Oct 2022 07:56:02 GMT) Full text and rfc822 format available.

Message #20 received at 58035 <at> debbugs.gnu.org (full text, mbox):

From: zimoun <zimon.toutoune <at> gmail.com>
To: Ludovic Courtès <ludo <at> gnu.org>, Maxime Devos
 <maximedevos <at> telenet.be>
Cc: 58035 <at> debbugs.gnu.org
Subject: Re: [bug#58035] sync-before-registering is false, possibly the
 cause of empty files in the store
Date: Wed, 05 Oct 2022 13:54:18 +0200
Hi,

On Tue, 04 Oct 2022 at 23:49, Ludovic Courtès <ludo <at> gnu.org> wrote:

> Maybe a good test would be to run a daemon on an “average” HDD with
> sync-before-registering = true and to run ‘perf timechart record’ while
> it’s fetching a large number of substitutes.  That way we’d have
> concrete data to talk about.

What is ’perf timechart record’?  I can take a look if no one beats me.



Cheers,
simon




Information forwarded to guix-patches <at> gnu.org:
bug#58035; Package guix-patches. (Fri, 07 Oct 2022 09:45:02 GMT) Full text and rfc822 format available.

Message #23 received at 58035 <at> debbugs.gnu.org (full text, mbox):

From: Ludovic Courtès <ludo <at> gnu.org>
To: zimoun <zimon.toutoune <at> gmail.com>
Cc: Maxime Devos <maximedevos <at> telenet.be>, 58035 <at> debbugs.gnu.org
Subject: Re: [bug#58035] sync-before-registering is false, possibly the
 cause of empty files in the store
Date: Fri, 07 Oct 2022 11:44:17 +0200
zimoun <zimon.toutoune <at> gmail.com> skribis:

> On Tue, 04 Oct 2022 at 23:49, Ludovic Courtès <ludo <at> gnu.org> wrote:
>
>> Maybe a good test would be to run a daemon on an “average” HDD with
>> sync-before-registering = true and to run ‘perf timechart record’ while
>> it’s fetching a large number of substitutes.  That way we’d have
>> concrete data to talk about.
>
> What is ’perf timechart record’?

It’s a tool of Linux’s ‘perf’—see ‘guix install perf’.

HTH!

Ludo’.




Information forwarded to guix-patches <at> gnu.org:
bug#58035; Package guix-patches. (Mon, 17 Oct 2022 18:04:01 GMT) Full text and rfc822 format available.

Message #26 received at 58035 <at> debbugs.gnu.org (full text, mbox):

From: zimoun <zimon.toutoune <at> gmail.com>
To: Ludovic Courtès <ludo <at> gnu.org>, Maxime Devos
 <maximedevos <at> telenet.be>
Cc: 58035 <at> debbugs.gnu.org
Subject: Re: [bug#58035] sync-before-registering is false, possibly the
 cause of empty files in the store
Date: Mon, 17 Oct 2022 20:03:28 +0200
Hi,

3x time slower with my hardware.  Closing?


On mar., 04 oct. 2022 at 23:49, Ludovic Courtès <ludo <at> gnu.org> wrote:

> Maybe a good test would be to run a daemon on an “average” HDD with
> sync-before-registering = true and to run ‘perf timechart record’ while
> it’s fetching a large number of substitutes.  That way we’d have
> concrete data to talk about.

+ Comparison using the manifest below.
+ guix-daemon built using fd4cbcbed7; two flavors.
+ Guix revision f43a783.
+ Before the experiment, complete ’guix gc’.

  # test-58035/guix-daemon --build-users-group=guixbuild
  # perf timechart record
  $ time guix build -m some-packages.scm
  # perf timechart --highlight guix

1/ sync-before-registering = false

real    6m24.215s
user    0m10.627s
sys     0m0.512s

[ perf record: Woken up 1984 times to write data ]
[ perf record: Captured and wrote 497.403 MB perf.data (5042727 samples)]


2/ sync-before-registering = true

real    19m46.470s
user    0m12.367s
sys     0m0.557s      

[ perf record: Woken up 2625 times to write data ]
[ perf record: Captured and wrote 658.066 MB perf.data (6712342 samples) ]


Well, the charts are really big.

    $ du -sh {1,2}-build-*.svg
    331M    1-build-false.svg
    599M    2-build-true.svg

Therefore, please find them using this link (alive for 15 days):

https://filesender.renater.fr/?s=download&token=0e861286-ec0c-4815-bb49-2bce91b14462


Cheers,
simon


--8<---------------cut here---------------start------------->8---
(use-modules (guix build-system haskell)
             (guix build-system julia)
             (guix packages)
             (guix profiles)
             (gnu packages))

(manifest
 (map package->manifest-entry
      (fold-packages
       (lambda (package result)
         (if (or
              ;;(eq? (package-build-system package) haskell-build-system)
              (eq? (package-build-system package) julia-build-system))
             (cons package result)
             result))
       '())))
--8<---------------cut here---------------end--------------->8---

--8<---------------cut here---------------start------------->8---
diff --git a/nix/libstore/globals.cc b/nix/libstore/globals.cc
index d4f9a46a74..5f8a3a3031 100644
--- a/nix/libstore/globals.cc
+++ b/nix/libstore/globals.cc
@@ -40,7 +40,7 @@ Settings::Settings()
     reservedSize = 8 * 1024 * 1024;
     fsyncMetadata = true;
     useSQLiteWAL = true;
-    syncBeforeRegistering = false;
+    syncBeforeRegistering = true;
     useSubstitutes = true;
     useChroot = false;
     impersonateLinux26 = false;
--8<---------------cut here---------------end--------------->8---




Information forwarded to guix-patches <at> gnu.org:
bug#58035; Package guix-patches. (Mon, 17 Oct 2022 18:52:01 GMT) Full text and rfc822 format available.

Message #29 received at 58035 <at> debbugs.gnu.org (full text, mbox):

From: Maxime Devos <maximedevos <at> telenet.be>
To: zimoun <zimon.toutoune <at> gmail.com>, Ludovic Courtès
 <ludo <at> gnu.org>
Cc: 58035 <at> debbugs.gnu.org
Subject: Re: [bug#58035] sync-before-registering is false, possibly the cause
 of empty files in the store
Date: Mon, 17 Oct 2022 20:51:43 +0200
[Message part 1 (text/plain, inline)]

On 17-10-2022 20:03, zimoun wrote:
> 3x time slower with my hardware.  Closing?

I had an alternative proposal to 'sync-before-registering = true' -- 
more work to implement, but potentially faster, so IMO no.

Greetings,
Maxime.
[OpenPGP_0x49E3EE22191725EE.asc (application/pgp-keys, attachment)]
[OpenPGP_signature (application/pgp-signature, attachment)]

Information forwarded to guix-patches <at> gnu.org:
bug#58035; Package guix-patches. (Tue, 18 Oct 2022 15:35:02 GMT) Full text and rfc822 format available.

Message #32 received at 58035 <at> debbugs.gnu.org (full text, mbox):

From: Ludovic Courtès <ludo <at> gnu.org>
To: zimoun <zimon.toutoune <at> gmail.com>
Cc: Maxime Devos <maximedevos <at> telenet.be>, 58035 <at> debbugs.gnu.org
Subject: Re: [bug#58035] sync-before-registering is false, possibly the
 cause of empty files in the store
Date: Tue, 18 Oct 2022 17:34:19 +0200
Hi,

zimoun <zimon.toutoune <at> gmail.com> skribis:

> 3x time slower with my hardware.  Closing?

Ouch!

> On mar., 04 oct. 2022 at 23:49, Ludovic Courtès <ludo <at> gnu.org> wrote:
>
>> Maybe a good test would be to run a daemon on an “average” HDD with
>> sync-before-registering = true and to run ‘perf timechart record’ while
>> it’s fetching a large number of substitutes.  That way we’d have
>> concrete data to talk about.
>
> + Comparison using the manifest below.
> + guix-daemon built using fd4cbcbed7; two flavors.
> + Guix revision f43a783.
> + Before the experiment, complete ’guix gc’.
>
>   # test-58035/guix-daemon --build-users-group=guixbuild
>   # perf timechart record
>   $ time guix build -m some-packages.scm
>   # perf timechart --highlight guix

OK.

> 1/ sync-before-registering = false
>
> real    6m24.215s
> user    0m10.627s
> sys     0m0.512s
>
> [ perf record: Woken up 1984 times to write data ]
> [ perf record: Captured and wrote 497.403 MB perf.data (5042727 samples)]
>
>
> 2/ sync-before-registering = true
>
> real    19m46.470s
> user    0m12.367s
> sys     0m0.557s      
>
> [ perf record: Woken up 2625 times to write data ]
> [ perf record: Captured and wrote 658.066 MB perf.data (6712342 samples) ]

That speaks for itself.

Let’s keep the bug open in case Maxime or someone else comes up with a
more practical fix.

Thanks,
Ludo’.




Added tag(s) moreinfo. Request was from Christopher Baines <mail <at> cbaines.net> to control <at> debbugs.gnu.org. (Thu, 03 Nov 2022 15:42:02 GMT) Full text and rfc822 format available.

This bug report was last modified 2 years and 223 days ago.

Previous Next


GNU bug tracking system
Copyright (C) 1999 Darren O. Benham, 1997,2003 nCipher Corporation Ltd, 1994-97 Ian Jackson.