GNU bug report logs - #76503
[GCD] Migrating repositories, issues, and patches to Codeberg


Package: guix-patches;

Reported by: Ludovic Courtès <ludo <at> gnu.org>

Date: Sun, 23 Feb 2025 15:21:02 UTC

Severity: normal

Done: Maxim Cournoyer <maxim.cournoyer <at> gmail.com>


From: Arun Isaac <arunisaac <at> systemreboot.net>
To: 76503 <at> debbugs.gnu.org
Cc: Ricardo Wurmus <rekado <at> elephly.net>, Christopher Baines <guix <at> cbaines.net>, Ludovic Courtès <ludo <at> gnu.org>, Benjamin Slade <slade <at> lambda-y.net>
Subject: [bug#76503] [GCD] Migrating repositories, issues, and patches to Codeberg
Date: Tue, 25 Feb 2025 14:03:02 +0000
Hi,

First off, full disclosure: Along with Ricardo, I wrote most of mumi and
the underlying Guile libraries. So, I am simultaneously saddened and
relieved that we are deprecating mumi and moving to Codeberg—saddened
because mumi was many years of work, and relieved because I do not have
to bear any responsibility for the future! So, very mixed feelings. :-)

And, while I strongly prefer the email workflow, I concede that moving
to a pull request workflow will lower the barrier to entry simply
because it is more familiar thanks to GitHub's dominant mindshare. So,
unless there is significant support for mumi and the email workflow, I
will stand aside and go with the flow of the community. That said, my
arguments against Codeberg follow.

GitHub invented the fork+pull-request workflow. Despite git's
under-the-hood object-sharing tricks (hardlinks for local clones,
"alternates" on the server side), the pull-request workflow is very
expensive in terms of disk space: every fork takes up space, and we
need one fork per contributor. That's an enormous amount of disk space.
Now, I would think git's object storage model makes it easy to
deduplicate storage. But apparently, it's not that simple. GitHub does
complex software engineering behind the scenes to make it all seem easy.
See [1] for details. Cash-strapped and labour-starved community projects
like Codeberg/Forgejo don't (yet?) have the means to replicate such
cloud-scale software engineering. In fact, I am not speculating here:
Codeberg openly acknowledges this in [2], and it is already so strapped
for storage that it is introducing storage quotas[3] to cope.
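For context, the deduplication mechanism git itself provides is
"alternates": a clone can borrow objects from a shared repository
instead of storing its own copies. A minimal local sketch (all paths
here are made-up scratch directories, not anything on Codeberg):

```shell
# Git's built-in deduplication: a clone can borrow objects from another
# repository via "alternates" instead of duplicating them on disk.
set -e
git init -q upstream
git -C upstream -c user.email=demo@example.com -c user.name=demo \
    commit -q --allow-empty -m 'initial commit'
git clone -q --bare upstream base.git
git clone -q --reference "$PWD/base.git" upstream fork
# The fork records where to find shared objects instead of copying them:
cat fork/.git/objects/info/alternates
```

The hard part GitHub solves at scale[1] is keeping such shared storage
consistent and garbage-collectable across millions of forks; that is
the engineering a small organization cannot easily replicate.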

The storage issue is particularly pertinent considering how large a
project we are and how big our git repo is. At this moment, our git
repo is close to 1 GiB in size. Incidentally, that's already bigger
than the 750 MiB Codeberg storage quota. Codeberg is willing to
allocate more storage on a per-project basis, but you have to request
it. Now, we have a little more than 1,000 contributors. That means we
are already at roughly 1 TiB of fork storage. That's enormous,
especially considering that all data on Codeberg combined adds up to
only 12 TiB[3]. And, for comparison, all the substitutes we have
accumulated since the founding of Guix add up to only about 25 TiB[4].
Now, imagine a 10-fold increase in contributors, a plausible prospect
given drive-by contributors, and we would be at 10 TiB of storage for
our git repo alone. That's ginormous! We would be putting a lot of
stress on a tiny community organization.
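As a quick sanity check on the arithmetic above (repo size and
contributor count are the figures quoted in this message; the model
naively assumes each fork costs a full repo's worth of storage, i.e.
no server-side deduplication):

```python
def fork_storage_gib(contributors, repo_gib=1.0):
    """Total storage if each contributor's fork stores a full copy."""
    return contributors * repo_gib

# ~1,000 contributors today, ~1 GiB repo
print(fork_storage_gib(1_000))    # 1000.0 GiB, i.e. ~1 TiB
# 10-fold growth in contributors
print(fork_storage_gib(10_000))   # 10000.0 GiB, i.e. ~10 TiB
```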

Then, there's the bandwidth issue as well. Our git repo is not merely a
development repo. Every Guix user clones[5] it and pulls from it
regularly. As the Guix userbase grows, this will be an increasingly
heavy toll on Codeberg's bandwidth. [2] already alludes to occasional
slowness when the load is high. Even if we move to Codeberg, we should
maintain a mirror of the git repo on our own machines that is actually
capable of serving a large userbase.
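A similarly rough bandwidth sketch, to make the shape of the cost
concrete. Every figure below is a hypothetical assumption of mine, not
a number from this message or from Codeberg:

```python
def monthly_pull_gib(users, mib_per_pull, pulls_per_month):
    """Rough monthly download volume the forge must serve, in GiB."""
    return users * mib_per_pull * pulls_per_month / 1024

# Hypothetical: 10,000 users each fetching ~50 MiB of new objects
# once a week (4 times a month).
print(round(monthly_pull_gib(10_000, 50, 4)))  # ~1953 GiB per month
```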

I was present with Ludo and others when we visited the Codeberg stall at
FOSDEM, and enquired about the possibility of hosting Guix on Codeberg.
The person at the stall was hesitant about our large repo, and our many
users. In fact, in order to save on disk space, they suggested that we
encourage our contributors to delete their forks once done. :-D Needless
to say, that's never going to happen!

As well-intentioned as Codeberg is, a single non-profit hoping to host
all the git repos in the world in perpetuity and free of charge is a
very tough proposition. Technical solutions like federation that may
mitigate this are unlikely to materialize, especially in the short term.
More likely, Codeberg may go the way of Matrix, where federation is
possible in theory, but matrix.org is the only server in practice.

It would be nice if we were just a normal software project that could
fit well into a traditional forge. But, we are not. We are a
"distribution", and "distributing" is part of our core competency.
Critical parts of our distribution infrastructure should be directly
under our own control. We are a large enough and specialized enough
organization that this is necessary.

[1]: https://github.blog/open-source/git/counting-objects/
[2]: https://blog.codeberg.org/the-hardest-scaling-issue.html
[3]: https://blog.codeberg.org/more-power-for-you-what-a-storage-quota-will-bring.html
[4]: number from Chris Baines

[5]: Quick digression: Users must actually download about 1 GiB of data
on their first guix pull. That's frustrating to new users, and
effectively excludes users from parts of the world where a good Internet
connection cannot be taken for granted.

Cheers,
Arun



