GNU bug report logs - #65720
Guile-Git-managed checkouts grow way too much


Package: guix;

Reported by: Ludovic Courtès <ludo <at> gnu.org>

Date: Sun, 3 Sep 2023 20:45:02 UTC

Severity: important

Done: Ludovic Courtès <ludo <at> gnu.org>

Bug is archived. No further changes may be made.



Message #43 received at 65720 <at> debbugs.gnu.org (full text, mbox):

From: Simon Tournier <zimon.toutoune <at> gmail.com>
To: Ludovic Courtès <ludo <at> gnu.org>
Cc: Josselin Poiret <dev <at> jpoiret.xyz>, 65720 <at> debbugs.gnu.org
Subject: Re: bug#65720: Guile-Git-managed checkouts grow way too much
Date: Sat, 09 Sep 2023 12:31:48 +0200
Hi,

On Fri, 08 Sep 2023 at 19:09, Ludovic Courtès <ludo <at> gnu.org> wrote:

>>> It would also be pretty bad for closure size:
>>>
>>> --8<---------------cut here---------------start------------->8---
>>> $ guix size guile-git | tail -1
>>> total: 106.6 MiB
>>> $ guix size guile-git git-minimal | tail -1
>>> total: 169.8 MiB
>>> --8<---------------cut here---------------end--------------->8---
>>>
>>> It’s also not clear concretely how we’d add that dependency.  Try
>>> invoking ‘git’ from $PATH and print a warning if it doesn’t work?
>>> But then, what about applications like Cuirass and hpcguix-web?
>>
>> I think we can rely on something like,
>>
>>     guix shell -C git-minimal -- git gc
>
> We’re talking about the implementation of a cache (meant to speed up
> operations), that would actually fill said cache plus do a whole bunch
> of expensive operations?  Nah.  :-)

I do not think so.  If I understand correctly, we need to run “git gc” at
some point, therefore git-minimal needs to be around.  The question is
how and when.

Well, maybe I am missing what the bug is about.  For me, it is about
running ‘git gc’ for cleaning the Git checkout cache, no?


Solution #1.  Add git-minimal as an input.  It increases the closure
size, and the extra load (on average) is about the ratio between the
rate of “guix pull” and the rate of git-minimal changes.

Assume that people run “guix pull” once per week and that, say, “git
gc” is run after 50 pulls.  (Both numbers are totally arbitrary and
based on my personal estimate.)

Data Service [1] tells:

        2023-07-07 15:45:22 2023-09-08 21:22:08
        2023-05-11 16:10:48 2023-07-07 14:21:45
        2023-05-01 16:40:08 2023-05-11 14:36:16
        2023-04-25 13:34:54 2023-05-01 15:19:55
        2023-04-25 13:34:54 2023-09-08 21:22:08        
        2023-03-06 17:22:28 2023-04-25 12:27:33
        2023-01-17 23:49:19 2023-03-06 16:48:43
        2022-11-08 13:06:42 2023-01-17 15:11:47
        2022-10-08 05:14:46 2022-11-08 09:56:31
        2022-09-06 15:00:08 2022-10-08 04:15:43
        2022-08-13 22:02:31 2022-09-06 12:58:52
        …

It means that, between two “git gc” runs, a user will download
git-minimal ~10 times for nothing.
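
The ~10 figure can be checked with a rough calculation.  The sketch
below is only an illustration: it reuses the timestamps listed above
and the two arbitrary numbers from this message (one pull per week,
“git gc” after 50 pulls); the variable names are made up for the
example.

```python
from datetime import datetime

# First-seen timestamps (UTC) copied from the Data Service excerpt above.
# The 2023-04-25 entry appears twice in the excerpt; the set below dedupes it.
FIRST_SEEN = [
    "2023-07-07 15:45:22",
    "2023-05-11 16:10:48",
    "2023-05-01 16:40:08",
    "2023-04-25 13:34:54",
    "2023-04-25 13:34:54",
    "2023-03-06 17:22:28",
    "2023-01-17 23:49:19",
    "2022-11-08 13:06:42",
    "2022-10-08 05:14:46",
    "2022-09-06 15:00:08",
    "2022-08-13 22:02:31",
]
LAST_SEEN = "2023-09-08 21:22:08"  # most recent timestamp in the excerpt

PULLS_PER_GC = 50   # arbitrary, as stated in the message
DAYS_PER_PULL = 7   # one "guix pull" per week, also arbitrary


def parse(timestamp):
    return datetime.strptime(timestamp, "%Y-%m-%d %H:%M:%S")


starts = sorted({parse(t) for t in FIRST_SEEN})
span_days = (parse(LAST_SEEN) - starts[0]).total_seconds() / 86400

# Average time between two successive git-minimal outputs in the excerpt.
mean_days_between_changes = span_days / len(starts)

# Number of git-minimal changes expected between two "git gc" runs.
gc_period_days = PULLS_PER_GC * DAYS_PER_PULL
changes_per_gc_period = gc_period_days / mean_days_between_changes

print(f"{len(starts)} versions over {span_days:.0f} days")
print(f"one change every {mean_days_between_changes:.1f} days on average")
print(f"about {changes_per_gc_period:.1f} changes per gc period")
```

With these inputs the estimate comes out at roughly 9 changes per gc
period, the same order of magnitude as the ~10 above.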


Solution #2.  The one I am proposing. :-)  Download git-minimal only
when Guix needs it for running “git gc”.  Yeah, there is probably a
small overhead with some operations.  But, I bet this overhead is much
smaller than the one of solution #1.

Well, it depends on the number of times people are updating the cache vs
the rate of change of git-minimal.

For sure, if one updates the cache 100 times per week, having
git-minimal as an input is far better.  But I do not think that is the
regular usage on average. :-)
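
To make the trade-off concrete, here is a minimal sketch of the
comparison.  It is a toy model, not a measurement: the function name
and parameters are hypothetical, the 39-day interval is a rough
average read off the Data Service excerpt (10 versions over about 391
days), and solution #2 is charged its worst case of one git-minimal
download per “git gc” run.

```python
def downloads_per_gc_cycle(pulls_per_week, pulls_per_gc=50,
                           days_between_changes=39.0):
    """Return (solution_1, solution_2) git-minimal downloads per gc cycle.

    All parameters are illustrative assumptions, not measured values.
    """
    # Length of one gc cycle in days, given how often the cache is updated.
    cycle_days = pulls_per_gc / pulls_per_week * 7

    # Solution 1: git-minimal is an input, so every new version seen by a
    # pull is downloaded; there cannot be more downloads than pulls.
    changes_in_cycle = cycle_days / days_between_changes
    as_input = min(pulls_per_gc, changes_in_cycle)

    # Solution 2: git-minimal is fetched only when "git gc" runs, so at
    # most once per cycle (worst case: nothing is already in the store).
    on_demand = 1.0

    return as_input, on_demand


for rate in (1, 100):
    as_input, on_demand = downloads_per_gc_cycle(rate)
    print(f"{rate:>3} updates/week: input={as_input:.2f} "
          f"on-demand={on_demand:.2f}")
```

Under these assumptions, fetching on demand wins at one update per
week, and having git-minimal as an input wins at 100 updates per week.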

That’s why I am proposing to have an option for turning off this “git
gc” operation.

Well, we have lived for years without running ‘git gc’, so running it
once per year on average is probably enough to keep the cache size
reasonable.  And git-minimal changes every month.
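
As an illustration only, the “run it every N pulls, unless turned off”
idea could be sketched like this.  Nothing here exists in Guix: the
function, the counter and the `gc_enabled` switch are hypothetical, and
the only thing taken from this thread is the command itself.

```python
# Command quoted earlier in this thread; it would be run from within the
# cached checkout directory.
GC_COMMAND = ["guix", "shell", "-C", "git-minimal", "--", "git", "gc"]


def maybe_gc_command(pulls_since_gc, gc_enabled=True, pulls_per_gc=50):
    """Return the command to run, or None when no gc should happen.

    `pulls_since_gc`, `gc_enabled` and `pulls_per_gc` are hypothetical
    names; 50 is the arbitrary threshold used in this message.
    """
    if not gc_enabled:
        return None          # the user turned the "git gc" step off
    if pulls_since_gc < pulls_per_gc:
        return None          # not enough cache updates since the last gc
    return list(GC_COMMAND)  # copy, so callers cannot mutate the constant


print(maybe_gc_command(12))
print(maybe_gc_command(50))
print(maybe_gc_command(50, gc_enabled=False))
```

A caller would pass the returned list to something like
`subprocess.run(cmd, cwd=checkout_dir)`, where `checkout_dir` stands
for the cached checkout, and reset its counter afterwards.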


Maybe, there is some solution #3. ;-)

Cheers,
simon


1: https://data.guix.gnu.org/repository/1/branch/master/package/git-minimal/output-history




This bug report was last modified 1 year and 177 days ago.



GNU bug tracking system
Copyright (C) 1999 Darren O. Benham, 1997,2003 nCipher Corporation Ltd, 1994-97 Ian Jackson.