GNU bug report logs - #24937
"deleting unused links" GC phase is too slow


Package: guix;

Reported by: ludo <at> gnu.org (Ludovic Courtès)

Date: Sun, 13 Nov 2016 17:42:02 UTC

Severity: important



Message #64 received at 24937 <at> debbugs.gnu.org:

From: Ludovic Courtès <ludo <at> gnu.org>
To: Maxim Cournoyer <maxim.cournoyer <at> gmail.com>
Cc: 24937 <at> debbugs.gnu.org
Subject: Re: bug#24937: "deleting unused links" GC phase is too slow
Date: Sat, 13 Nov 2021 17:56:52 +0100
Hi,

Maxim Cournoyer <maxim.cournoyer <at> gmail.com> skribis:

> I haven't done any analysis, just grabbed the result, but here is what
> it looks like for me:

A bit more than 35% of the deduplicated files are < 1KiB, and there’s
not much to be gained by deduplicating them.

On IRC, several people shared the results from their own machines. A
number of them were similar, and one person had a lot more of those
small files (50% of their deduplicated files were < 1KiB).
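The "< 1KiB" share above is the fraction of deduplicated files whose size is below a threshold. Here is a minimal Guile sketch of that computation. It assumes the deduplicated files are the entries of /gnu/store/.links. `link-sizes` and `fraction-below` are hypothetical helpers written for illustration, not part of Guix.

```scheme
(use-modules (ice-9 ftw)     ; scandir
             (srfi srfi-1))  ; count

;; Hypothetical helper: return the sizes, in bytes, of the entries of
;; DIRECTORY (e.g. "/gnu/store/.links", assuming that is where the
;; deduplicated files live).  `lstat' is used so symlinks are not followed.
(define (link-sizes directory)
  (map (lambda (name)
         (stat:size (lstat (string-append directory "/" name))))
       (scandir directory
                (lambda (name)
                  (not (member name '("." "..")))))))

;; Hypothetical helper: return the fraction of SIZES that are strictly
;; smaller than THRESHOLD bytes, as an exact rational (0 if SIZES is empty).
(define (fraction-below sizes threshold)
  (if (null? sizes)
      0
      (/ (count (lambda (size) (< size threshold)) sizes)
         (length sizes))))

;; (fraction-below '(100 800 2000 5000) 1024)  ⇒ 1/2
;; (exact->inexact (fraction-below (link-sizes "/gnu/store/.links") 1024))
```

Returning an exact rational keeps the count-based result precise. Calling `exact->inexact` turns it into a percentage-friendly float only when printing. Note that `scandir` builds and sorts the whole directory listing, which can take a while on a large store.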

The chart (with a kinda bogus layout) below is perhaps more interesting:
it shows the contribution of files below a certain size to the overall
space savings.

[space-saving-contribution.png (image/png, inline)]
In a nutshell:

  • Files < 1KiB contribute 0.3% of the space savings;

  • Files < 4KiB contribute 2.5% of the space savings;

  • Files < 256KiB contribute 42% of the space savings.

You can create this plot with:

--8<---------------cut here---------------start------------->8---
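(use-modules (charting)         ; guile-charting: make-scatter-plot
             (srfi srfi-26))    ; cut

;; `l', `saved-space' and `deduplicated-file-size' are not defined in
;; this message.
(make-scatter-plot #:title "Contribution to space savings"
                   #:write-to-png "/tmp/space-saving-contribution.png"
                   #:chart-width 1000
                   #:y-axis-label "contribution (%)"
                   #:x-axis-label "size (B)"
                   #:log-x-base 2
                   #:min-x 513
                   #:data
                   ;; One point per threshold from 2^10 (1KiB) to 2^21
                   ;; (2MiB): the percentage of the total savings due to
                   ;; files smaller than that threshold.
                   (let ((total (saved-space l)))
                     `(("contribution"
                        ,@(map (lambda (size)
                                 (cons size
                                       (/ (saved-space (filter (lambda (file)
                                                                 (< (deduplicated-file-size
                                                                     file)
                                                                    size))
                                                               l))
                                          total .01)))   ; ratio / 0.01 = percent
                               (map (cut expt 2 <>)
                                    (iota 12 10 1)))))))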
--8<---------------cut here---------------end--------------->8---

You can also compute individual points like this:

--8<---------------cut here---------------start------------->8---
scheme@(guile-user)> (/ (saved-space (filter (lambda (file)
					       (< (deduplicated-file-size file) 1024))
					     l))
			(saved-space l) 1.)
$60 = 0.0034284626558736746
scheme@(guile-user)> (/ (saved-space (filter (lambda (file)
					       (< (deduplicated-file-size file) 4096))
					     l))
			(saved-space l) 1.)
$62 = 0.025190871178467848
scheme@(guile-user)> (/ (saved-space (filter (lambda (file)
					       (< (deduplicated-file-size file) (expt 2 18)))
					     l))
			(saved-space l) 1.)
$65 = 0.42411104869782185
--8<---------------cut here---------------end--------------->8---
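The REPL session above relies on `l`, `saved-space` and `deduplicated-file-size`, which are not defined in this message. Below is a self-contained sketch of the same computation under a hypothetical model. Each deduplicated file is a record holding its size and hard-link count. A file with N links, counting its /gnu/store/.links entry, is assumed to save (N - 2) × size bytes, since it appears N - 1 times in the store but is stored once. The record type and procedure names are made up for illustration and may not match the definitions actually used in this thread.

```scheme
(use-modules (srfi srfi-1)   ; fold, filter
             (srfi srfi-9))  ; define-record-type

;; Hypothetical stand-in for the elements of `l'.
(define-record-type <dedup-file>
  (make-dedup-file size links)
  dedup-file?
  (size  dedup-file-size)    ; size in bytes
  (links dedup-file-links))  ; hard-link count, including the .links entry

;; Assumed model: N links = N - 1 store copies stored once, so N - 2
;; copies are avoided.  Unused links (N = 1) save nothing.
(define (file-saved-space file)
  (* (dedup-file-size file)
     (max 0 (- (dedup-file-links file) 2))))

(define (total-saved-space files)
  (fold (lambda (file total) (+ total (file-saved-space file)))
        0 files))

;; Fraction of the total savings due to files smaller than THRESHOLD
;; bytes, as an exact rational (0 when nothing is saved at all).
(define (contribution-below files threshold)
  (let ((total (total-saved-space files)))
    (if (zero? total)
        0
        (/ (total-saved-space
            (filter (lambda (file)
                      (< (dedup-file-size file) threshold))
                    files))
           total))))

;; (define files (list (make-dedup-file 512 10)      ; saves 4096 bytes
;;                     (make-dedup-file 8192 3)      ; saves 8192 bytes
;;                     (make-dedup-file 100000 2)))  ; saves nothing
;; (contribution-below files 1024)  ⇒ 1/3
```

Keeping the per-file savings in a separate procedure makes it easy to swap in a different savings model if the thread's `saved-space` is defined differently.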

Choosing a deduplication threshold of 2KiB or 4KiB would thus have a
negligible impact on disk usage on my machine: files < 4KiB account for
only 2.5% of the space savings.
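In practice, such a threshold means skipping small files when deduplicating. Here is a minimal sketch of that check in Guile Scheme, assuming a 4KiB cutoff. `%deduplication-threshold` and `deduplicate?` are hypothetical names. The real deduplication code lives in the C++ guix-daemon, not in Scheme.

```scheme
;; Hypothetical cutoff: regular files smaller than this many bytes are
;; left alone instead of being hard-linked into /gnu/store/.links.
(define %deduplication-threshold 4096)

;; Return true if FILE is a regular file (symlinks are not followed)
;; whose size is at least %DEDUPLICATION-THRESHOLD bytes.
(define (deduplicate? file)
  (let ((st (lstat file)))
    (and (eq? 'regular (stat:type st))
         (>= (stat:size st) %deduplication-threshold))))

;; (deduplicate? "/tmp/some-small-file")  ; #f for a file of a few bytes
```

Skipping small files would presumably also keep their entries out of /gnu/store/.links, leaving fewer links for the "deleting unused links" phase to scan.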

Thanks,
Ludo’.



