GNU bug report logs -
#65720
Guile-Git-managed checkouts grow way too much
Reported by: Ludovic Courtès <ludo <at> gnu.org>
Date: Sun, 3 Sep 2023 20:45:02 UTC
Severity: important
Done: Ludovic Courtès <ludo <at> gnu.org>
Bug is archived. No further changes may be made.
Ludovic Courtès <ludo <at> gnu.org> writes:
> As reported by Tobias on IRC (in the context of ‘hpcguix-web’),
> checkouts managed by Guile-Git appear to grow beyond reason. As an
> example, here’s the same ‘.git’ managed with Guile-Git and with Git:
>
> $ du -hs ~/.cache/guix/checkouts/pjmkglp4t7znuugeurpurzikxq3tnlaywmisyr27shj7apsnalwq
> 6.7G /home/ludo/.cache/guix/checkouts/pjmkglp4t7znuugeurpurzikxq3tnlaywmisyr27shj7apsnalwq
> $ du -hs .git
> 517M .git
Unsurprisingly, GC makes a big difference:
--8<---------------cut here---------------start------------->8---
$ cp -r ~/.cache/guix/checkouts/pjmkglp4t7znuugeurpurzikxq3tnlaywmisyr27shj7apsnalwq /tmp/checkout
$ (cd /tmp/checkout/; git gc)
Enumerating objects: 717785, done.
Counting objects: 100% (717785/717785), done.
Delta compression using up to 4 threads
Compressing objects: 100% (154644/154644), done.
Writing objects: 100% (717785/717785), done.
Total 717785 (delta 569440), reused 710535 (delta 562274), pack-reused 0
Enumerating cruft objects: 103412, done.
Traversing cruft objects: 81753, done.
Counting objects: 100% (64171/64171), done.
Delta compression using up to 4 threads
Compressing objects: 100% (17379/17379), done.
Writing objects: 100% (64171/64171), done.
Total 64171 (delta 52330), reused 58296 (delta 46792), pack-reused 0
Expanding reachable commits in commit graph: 133730, done.
$ du -hs /tmp/checkout
539M /tmp/checkout
--8<---------------cut here---------------end--------------->8---
> It would seem that libgit2 doesn’t do the equivalent of ‘git gc’.
Confirmed: <https://github.com/libgit2/libgit2/issues/3247>.
My inclination for the short term would be to work around this
limitation by (1) finding a heuristic to determine whether a checkout has
likely accumulated too much cruft, and (2) considering such checkouts as
expired (thereby forcing a re-clone) or running ‘git gc’ on them if
‘git’ is available.
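For illustration, step (2) could look roughly like the shell sketch below. The helper name `compact_checkout` is made up, it assumes a ‘git’ binary that supports ‘-C’, and it leaves the decision to re-clone to the caller.

```shell
# Hypothetical helper, not part of Guix: compact the checkout in DIR with
# 'git gc' when a 'git' binary is on PATH.  Return non-zero otherwise, so
# that the caller can treat the checkout as expired and re-clone it.
compact_checkout () {
    dir=$1
    if command -v git >/dev/null 2>&1; then
        git -C "$dir" gc --quiet
    else
        return 1
    fi
}
```

A caller would then write something like ‘compact_checkout "$dir" || rm -rf "$dir"’, where the removal stands in for whatever marks the checkout as expired.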
I can’t think of a good heuristic for (1). Birth time could be one, but
we’d need statx(2):
--8<---------------cut here---------------start------------->8---
$ stat ~/.cache/guix/checkouts/pjmkglp4t7znuugeurpurzikxq3tnlaywmisyr27shj7apsnalwq | tail -4
Access: 2023-09-04 23:13:54.668279105 +0200
Modify: 2023-09-04 11:34:41.665385000 +0200
Change: 2023-09-04 11:34:41.661629102 +0200
Birth: 2021-08-09 10:48:17.748722151 +0200
--8<---------------cut here---------------end--------------->8---
Lacking statx(2), we can approximate creation time by looking at
‘.git/config’:
--8<---------------cut here---------------start------------->8---
$ stat ~/.cache/guix/checkouts/pjmkglp4t7znuugeurpurzikxq3tnlaywmisyr27shj7apsnalwq/.git/config | tail -3
Modify: 2021-08-09 10:50:28.031760953 +0200
Change: 2021-08-09 10:50:28.031760953 +0200
Birth: 2021-08-09 10:50:28.031760953 +0200
--8<---------------cut here---------------end--------------->8---
This strategy can be implemented like this:
diff --git a/guix/git.scm b/guix/git.scm
index ebe2600209..ed3fa56bc8 100644
--- a/guix/git.scm
+++ b/guix/git.scm
@@ -405,7 +405,16 @@ (define cached-checkout-expiration
   ;; Use the mtime rather than the atime to cope with file systems mounted
   ;; with 'noatime'.
-  (file-expiration-time (* 90 24 3600) stat:mtime))
+  (let ((ttl (* 90 24 3600))
+        (max-checkout-retention (* 9 30 24 3600)))
+    (lambda (file)
+      (match (false-if-exception (lstat file))
+        (#f 0)                       ;FILE may have been deleted in the meantime
+        (st (min (pk 'ttl (+ (stat:mtime st) ttl))
+                 (pk 'maxttl (match (false-if-exception
+                                     (lstat (in-vicinity file ".git/config")))
+                               (#f +inf.0)
+                               (st (+ (stat:mtime st) max-checkout-retention))))))))))
 (define %checkout-cache-cleanup-period
   ;; Period for the removal of expired cached checkouts.
Namely, a cached checkout is considered “expired” after 9 months. In
my case, it gives this:
--8<---------------cut here---------------start------------->8---
scheme@(guix git)> (cached-checkout-expiration "/home/ludo/.cache/guix/checkouts/pjmkglp4t7znuugeurpurzikxq3tnlaywmisyr27shj7apsnalwq/")
;;; (ttl 1701596081)
;;; (maxttl 1651827028)
$6 = 1651827028
--8<---------------cut here---------------end--------------->8---
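To make the arithmetic explicit, here is a rough shell equivalent of the patched procedure, without the ‘pk’ debugging calls. It assumes GNU ‘stat’ (for ‘-c %Y’), and the function name `checkout_expiration` is hypothetical. Working back from the values printed above, the directory mtime is 1693820081 and the ‘.git/config’ mtime is 1628499028, so the two candidates are 1693820081 + 7776000 = 1701596081 and 1628499028 + 23328000 = 1651827028, and the smaller one wins.

```shell
# Sketch only: print the expiration time (seconds since the Epoch) of the
# cached checkout in DIR, mirroring the patched 'cached-checkout-expiration'.
checkout_expiration () {
    dir=$1
    ttl=$((90 * 24 * 3600))                # 90 days since last modification
    max_retention=$((9 * 30 * 24 * 3600))  # roughly 9 months since creation

    # DIR may have been deleted in the meantime: report it as expired.
    dir_mtime=$(stat -c %Y "$dir" 2>/dev/null) || { echo 0; return; }
    expiry=$((dir_mtime + ttl))

    # The mtime of '.git/config' approximates the creation time.  When the
    # file is missing, only the 90-day rule applies.
    if config_mtime=$(stat -c %Y "$dir/.git/config" 2>/dev/null); then
        cap=$((config_mtime + max_retention))
        [ "$cap" -lt "$expiry" ] && expiry=$cap
    fi
    echo "$expiry"
}
```

The result, 1651827028, is 2022-05-06 08:50:28 UTC, so this particular checkout would already count as expired.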
Of course having to re-clone entire repositories every 9 months is
ridiculous, but storing gigabytes of packs is worse IMO (I’m
specifically thinking about the Guix repo, which every user copies via
‘guix pull’).
Thoughts?
Thanks,
Ludo’.
This bug report was last modified 1 year and 178 days ago.