GNU bug report logs - #43442
Code stored with Subversion (SVN) cannot be retrieved from SWH

Previous Next

Package: guix-patches;

Reported by: zimoun <zimon.toutoune <at> gmail.com>

Date: Wed, 16 Sep 2020 08:15:01 UTC

Severity: important

Tags: patch

Done: Ludovic Courtès <ludovic.courtes <at> inria.fr>

Bug is archived. No further changes may be made.

Full log


Message #43 received at 43442 <at> debbugs.gnu.org (full text, mbox):

From: Ludovic Courtès <ludovic.courtes <at> inria.fr>
To: zimoun <zimon.toutoune <at> gmail.com>
Cc: 43442 <at> debbugs.gnu.org
Subject: Re: bug#43442: [PATCH] Fixes init of #42162: gforge.inria.fr down
 Dec. 2020
Date: Mon, 20 Mar 2023 15:09:11 +0100
[Message part 1 (text/plain, inline)]
Hi!

Ludovic Courtès <ludo <at> gnu.org> skribis:

> Ah yes, under “extra_headers” there’s the SVN revision number.  (guix
> swh) doesn’t expose “extra_headers” yet, but once it does, we can walk
> snapshots until we find the SVN revision we’re looking for.
>
> scheme@(guile-user)> (lookup-origin "https://scm.gforge.inria.fr/anonscm/svn/mpfi/")
> $3 = #<<origin> visits-url: "https://archive.softwareheritage.org/api/1/origin/https://scm.gforge.inria.fr/anonscm/svn/mpfi/visits/" type: #f url: "https://scm.gforge.inria.fr/anonscm/svn/mpfi">
> scheme@(guile-user)> (origin-visits $3)
> $4 = (#<<visit> date: #<date nanosecond: 902765 second: 25 minute: 53 hour: 21 day: 21 month: 9 year: 2020 zone-offset: 0> origin: "https://scm.gforge.inria.fr/anonscm/svn/mpfi" url: "https://archive.softwareheritage.org/api/1/origin/https://scm.gforge.inria.fr/anonscm/svn/mpfi/visit/1/" snapshot-url: "https://archive.softwareheritage.org/api/1/snapshot/e7fdd4dc6230f710dbc55c1b308804fa1b5f51f0/" status: full number: 1>)
> scheme@(guile-user)> (visit-snapshot (car $4))
> $5 = #<<snapshot> branches: (#<<branch> name: "HEAD" target-type: revision target-url: "https://archive.softwareheritage.org/api/1/revision/f7b445a6bdc38bf075f29265120ca49824f698ea/">)>
>
> So the next step is to augment (guix swh) with a
> ‘lookup-subversion-revision’ procedure.

The attached patch does that:

--8<---------------cut here---------------start------------->8---
scheme@(guile-user)> (lookup-subversion-revision "https://scm.gforge.inria.fr/anonscm/svn/mpfi" 680)
$12 = #<<revision> id: "72102de7605a2459ebcb016338ebbf1a997e8c8d" date: #<date nanosecond: 938388 second: 35 minute: 32 hour: 11 day: 6 month: 9 year: 2018 zone-offset: 0> directory: "5c89c025a4cd9d16befdfec12dfc23f7318d0d5b" directory-url: "https://archive.softwareheritage.org/api/1/directory/5c89c025a4cd9d16befdfec12dfc23f7318d0d5b/" parents-ids: ("16da41f1848d77a93aec565320b72b460c429b61") extra-headers: (("svn_repo_uuid" . "e2f78e0c-bb60-4709-9413-9660a31d4696") ("svn_revision" . "680"))>
scheme@(guile-user)> (lookup-subversion-revision "https://scm.gforge.inria.fr/anonscm/svn/mpfi" 666)
$13 = #<<revision> id: "148eb1e7206b111af4075c73c656e54c9efed6cb" date: #<date nanosecond: 654167 second: 2 minute: 52 hour: 12 day: 2 month: 8 year: 2018 zone-offset: 0> directory: "ed7b0bd7019fb85cd86d948a97c23b9d43aa8728" directory-url: "https://archive.softwareheritage.org/api/1/directory/ed7b0bd7019fb85cd86d948a97c23b9d43aa8728/" parents-ids: ("0ba2aa7e0d3fc0a1eb3ba72b32094515415ae47a") extra-headers: (("svn_repo_uuid" . "e2f78e0c-bb60-4709-9413-9660a31d4696") ("svn_revision" . "666"))>
--8<---------------cut here---------------end--------------->8---

The implementation is pretty bad though, because it walks the revision
history until it finds the right revision number—so you’re likely to
reach the bandwidth rate limit before you’ve found the revision you’re
looking for.

More importantly, most svn origins cannot be found, or at least not
by passing the URL as-is:

  https://sympa.inria.fr/sympa/arc/swh-devel/2023-03/msg00009.html

This whole hack looks like a dead end.

It would be ideal if SWH would compute nar hashes, as you proposed:

  https://gitlab.softwareheritage.org/swh/meta/-/issues/4538

As a stopgap, I wonder if we could use “double hashing” on our side, but
only for svn: we’d store both the nar sha256 as we currently do, plus
the swhid.  It still seems to me that it’d be hard to scale and to
maintain that over time, even if it’s limited to svn.  Plus, there’d
still be the problem of ‘svn-multi-fetch’, which is what most TeX Live
packages use.

Thoughts?

Ludo’.

[Message part 2 (text/x-patch, inline)]
diff --git a/guix/swh.scm b/guix/swh.scm
index c7c1c873a2..a65635b1db 100644
--- a/guix/swh.scm
+++ b/guix/swh.scm
@@ -1,5 +1,5 @@
 ;;; GNU Guix --- Functional package management for GNU
-;;; Copyright © 2018, 2019, 2020, 2021 Ludovic Courtès <ludo <at> gnu.org>
+;;; Copyright © 2018, 2019, 2020, 2021, 2023 Ludovic Courtès <ludo <at> gnu.org>
 ;;; Copyright © 2020 Jakub Kądziołka <kuba <at> kadziolka.net>
 ;;; Copyright © 2021 Xinglu Chen <public <at> yoctocell.xyz>
 ;;; Copyright © 2021 Simon Tournier <zimon.toutoune <at> gmail.com>
@@ -75,8 +75,10 @@ (define-module (guix swh)
             revision-id
             revision-date
             revision-directory
+            revision-parents
             lookup-revision
             lookup-origin-revision
+            lookup-subversion-revision
 
             content?
             content-checksums
@@ -207,6 +209,14 @@ (define string*
     ((? null?) #f)                                ;Guile-JSON 3.x
     ('null #f)))                                  ;Guile-JSON 4.x
 
+(define pair-vector->alist
+  (match-lambda
+    ('null '())
+    ((= vector->list lst)
+     (map (match-lambda
+            (#(key value) (cons key value)))
+          lst))))
+
 (define %allow-request?
   ;; Takes a URL and method (e.g., the 'http-get' procedure) and returns true
   ;; to keep going.  This can be used to disallow requests when
@@ -346,7 +356,14 @@ (define-json-mapping <revision> make-revision revision?
   (id            revision-id)
   (date          revision-date "date" (maybe-null string->date*))
   (directory     revision-directory)
-  (directory-url revision-directory-url "directory_url"))
+  (directory-url revision-directory-url "directory_url")
+  (parents-ids   revision-parent-ids "parents"
+                 (lambda (vector)
+                   (map (lambda (alist)
+                          (assoc-ref alist "id"))
+                        (vector->list vector))))
+  (extra-headers revision-extra-headers      ;alist--e.g., with "svn_revision"
+                 "extra_headers" pair-vector->alist))
 
 ;; <https://archive.softwareheritage.org/api/1/content/>
 (define-json-mapping <content> make-content content?
@@ -524,6 +541,50 @@ (define (lookup-origin-revision url tag)
         (()
          #f)))))
 
+(define (revision-parents revision)
+  "Return the parent revision(s) of REVISION."
+  (filter-map lookup-revision (revision-parent-ids revision)))
+
+(define (lookup-subversion-revision-in-history revision revision-number)
+  "Look for Subversion REVISION-NUMBER starting from REVISION and going back
+in history."
+  (let loop ((revision revision))
+    (let ((number (and=> (assoc-ref (revision-extra-headers revision)
+                                    "svn_revision")
+                         string->number)))
+      (and number
+           (cond ((= number revision-number)
+                  ;; Found it!
+                  revision)
+                 ((< number revision-number)
+                  ;; REVISION is ancestor of REVISION-NUMBER, so stop here.
+                  #f)
+                 (else
+                  ;; Check the parent(s) of REVISION.
+                  (any loop (revision-parents revision))))))))
+
+(define (lookup-subversion-revision url revision-number)
+  "Return either #f or the revision of the Subversion repository once
+available at URL with the given REVISION-NUMBER."
+  (match (lookup-origin url)
+    (#f #f)
+    (origin
+      (match (filter (lambda (visit)
+                       ;; Return #f if (visit-snapshot VISIT) would return #f.
+                       (and (visit-snapshot-url visit)
+                            (eq? 'full (visit-status visit))))
+                     (origin-visits origin))
+        (()
+         #f)
+        ((visit . _)
+         (any (lambda (branch)
+                (match (branch-target branch)
+                  ((? revision? revision)
+                   (lookup-subversion-revision-in-history revision
+                                                          revision-number))
+                  (_ #f)))
+              (snapshot-branches (visit-snapshot visit))))))))
+
 (define (release-target release)
   "Return the revision that is the target of RELEASE."
   (match (release-target-type release)

This bug report was last modified 1 year and 106 days ago.

Previous Next


GNU bug tracking system
Copyright (C) 1999 Darren O. Benham, 1997,2003 nCipher Corporation Ltd, 1994-97 Ian Jackson.