From unknown Sat Jun 21 17:35:03 2025 Content-Disposition: inline Content-Transfer-Encoding: quoted-printable MIME-Version: 1.0 X-Mailer: MIME-tools 5.509 (Entity 5.509) Content-Type: text/plain; charset=utf-8 From: bug#33432 <33432@debbugs.gnu.org> To: bug#33432 <33432@debbugs.gnu.org> Subject: Status: [PATCH 0/2] Download Git checkouts from Software Heritage as a last resort Reply-To: bug#33432 <33432@debbugs.gnu.org> Date: Sun, 22 Jun 2025 00:35:03 +0000 retitle 33432 [PATCH 0/2] Download Git checkouts from Software Heritage as = a last resort reassign 33432 guix-patches submitter 33432 Ludovic Court=C3=A8s severity 33432 normal tag 33432 patch thanks From debbugs-submit-bounces@debbugs.gnu.org Mon Nov 19 11:13:54 2018 Received: (at submit) by debbugs.gnu.org; 19 Nov 2018 16:13:54 +0000 Received: from localhost ([127.0.0.1]:32875 helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1gOmBO-0005V0-0x for submit@debbugs.gnu.org; Mon, 19 Nov 2018 11:13:54 -0500 Received: from eggs.gnu.org ([208.118.235.92]:40691) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1gOmBM-0005Uk-It for submit@debbugs.gnu.org; Mon, 19 Nov 2018 11:13:52 -0500 Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71) (envelope-from ) id 1gOmBF-0001j4-HB for submit@debbugs.gnu.org; Mon, 19 Nov 2018 11:13:47 -0500 X-Spam-Checker-Version: SpamAssassin 3.3.2 (2011-06-06) on eggs.gnu.org X-Spam-Level: X-Spam-Status: No, score=-1.9 required=5.0 tests=BAYES_00 autolearn=disabled version=3.3.2 Received: from lists.gnu.org ([2001:4830:134:3::11]:59857) by eggs.gnu.org with esmtps (TLS1.0:RSA_AES_256_CBC_SHA1:32) (Exim 4.71) (envelope-from ) id 1gOmBF-0001ix-De for submit@debbugs.gnu.org; Mon, 19 Nov 2018 11:13:45 -0500 Received: from eggs.gnu.org ([2001:4830:134:3::10]:50296) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1gOmBE-0005ym-Dv for guix-patches@gnu.org; Mon, 19 Nov 2018 11:13:45 -0500 Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71) (envelope-from ) id 1gOmBD-0001hG-IC for guix-patches@gnu.org; Mon, 19 Nov 2018 11:13:44 -0500 Received: from fencepost.gnu.org ([2001:4830:134:3::e]:53847) by eggs.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1gOmB9-0001da-KI; Mon, 19 Nov 2018 11:13:39 -0500 Received: from [2001:660:6102:320:e120:2c8f:8909:cdfe] (port=60742 helo=gnu.org) by fencepost.gnu.org with esmtpsa (TLS1.2:DHE_RSA_AES_256_CBC_SHA1:256) (Exim 4.82) (envelope-from ) id 1gOmB9-00065r-BM; Mon, 19 Nov 2018 11:13:39 -0500 From: =?UTF-8?q?Ludovic=20Court=C3=A8s?= To: guix-patches@gnu.org Subject: [PATCH 0/2] Download Git checkouts from Software Heritage as a last resort Date: Mon, 19 Nov 2018 17:13:25 +0100 Message-Id: <20181119161325.7801-1-ludo@gnu.org> X-Mailer: git-send-email 2.19.1 MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit X-detected-operating-system: by eggs.gnu.org: GNU/Linux 2.2.x-3.x [generic] X-detected-operating-system: by eggs.gnu.org: GNU/Linux 2.6.x X-Received-From: 2001:4830:134:3::11 X-Spam-Score: -5.0 (-----) X-Debbugs-Envelope-To: submit Cc: =?UTF-8?q?Ludovic=20Court=C3=A8s?= X-BeenThere: debbugs-submit@debbugs.gnu.org X-Mailman-Version: 2.1.18 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: debbugs-submit-bounces@debbugs.gnu.org Sender: "Debbugs-submit" X-Spam-Score: -6.0 (------) Hello Guix! This patch series adds the Software Heritage (SWH) client library initially discussed at: https://lists.gnu.org/archive/html/guix-devel/2018-11/msg00285.html Furthermore, it uses it in (guix git-download) to download code from SWH when it is unavailable upstream and on our servers. This bit relies on the “vault” API of SWH, which allows you to fetch a checkout as a tarball. Not all revisions are readily available as tarballs, understandably, so the vault API has a mechanism that allows you to request the “cooking” of a specific checkout. Cooking is asynchronous and can take some time. https://docs.softwareheritage.org/devel/swh-vault/api.html When downloading over SWH, the ‘swh-download’ procedure first resolves the tag (if it’s a tag), then tries to download the corresponding tarball from the vault. If the vault doesn’t have it yet, it sends a cooking request and waits for it to complete by periodically checking the cooking status. In the future, we should provide a “lister” and “loader” so that SWH can regularly obtain a list of Guix packages with their source URL and commit/tag: https://forge.softwareheritage.org/T1352 The SWH team is also considering pre-cooking all VCS tags such that every time we refer to a tag, we can be sure its contents are already available in the vault: https://forge.softwareheritage.org/T1350 Feedback welcome! Ludo’. Ludovic Courtès (2): Add (guix swh). git-download: Download from Software Heritage as a last resort. Makefile.am | 1 + guix/git-download.scm | 64 +++-- guix/swh.scm | 551 ++++++++++++++++++++++++++++++++++++++++++ 3 files changed, 596 insertions(+), 20 deletions(-) create mode 100644 guix/swh.scm -- 2.19.1 From debbugs-submit-bounces@debbugs.gnu.org Mon Nov 19 11:24:30 2018 Received: (at 33432) by debbugs.gnu.org; 19 Nov 2018 16:24:30 +0000 Received: from localhost ([127.0.0.1]:32889 helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1gOmLd-0007lu-NH for submit@debbugs.gnu.org; Mon, 19 Nov 2018 11:24:29 -0500 Received: from eggs.gnu.org ([208.118.235.92]:44714) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1gOmLb-0007lb-Ms for 33432@debbugs.gnu.org; Mon, 19 Nov 2018 11:24:28 -0500 Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71) (envelope-from ) id 1gOmLV-0004jg-SQ for 33432@debbugs.gnu.org; Mon, 19 Nov 2018 11:24:22 -0500 X-Spam-Checker-Version: SpamAssassin 3.3.2 (2011-06-06) on eggs.gnu.org X-Spam-Level: X-Spam-Status: No, score=-1.9 required=5.0 tests=BAYES_00 autolearn=disabled version=3.3.2 Received: from fencepost.gnu.org ([2001:4830:134:3::e]:54063) by eggs.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1gOmLV-0004jS-P5; Mon, 19 Nov 2018 11:24:21 -0500 Received: from [2001:660:6102:320:e120:2c8f:8909:cdfe] (port=60932 helo=gnu.org) by fencepost.gnu.org with esmtpsa (TLS1.2:DHE_RSA_AES_256_CBC_SHA1:256) (Exim 4.82) (envelope-from ) id 1gOmLV-000700-GF; Mon, 19 Nov 2018 11:24:21 -0500 From: =?UTF-8?q?Ludovic=20Court=C3=A8s?= To: 33432@debbugs.gnu.org Subject: [PATCH 2/2] git-download: Download from Software Heritage as a last resort. Date: Mon, 19 Nov 2018 17:24:09 +0100 Message-Id: <20181119162409.8130-2-ludo@gnu.org> X-Mailer: git-send-email 2.19.1 In-Reply-To: <20181119162409.8130-1-ludo@gnu.org> References: <20181119162409.8130-1-ludo@gnu.org> MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit X-detected-operating-system: by eggs.gnu.org: GNU/Linux 2.2.x-3.x [generic] X-Received-From: 2001:4830:134:3::e X-Spam-Score: -5.0 (-----) X-Debbugs-Envelope-To: 33432 Cc: =?UTF-8?q?Ludovic=20Court=C3=A8s?= X-BeenThere: debbugs-submit@debbugs.gnu.org X-Mailman-Version: 2.1.18 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: debbugs-submit-bounces@debbugs.gnu.org Sender: "Debbugs-submit" X-Spam-Score: -6.0 (------) From: Ludovic Courtès * guix/git-download.scm (git-fetch)[inputs]: Add gzip and tar when 'git-reference-recursive?' is false. [guile-json, gnutls]: New variables. [modules]: Add (guix swh). [build]: Wrap in 'with-extensions'. Add call to 'swh-download'. --- guix/git-download.scm | 64 +++++++++++++++++++++++++++++-------------- 1 file changed, 44 insertions(+), 20 deletions(-) diff --git a/guix/git-download.scm b/guix/git-download.scm index fa94fad8f8..2689658af8 100644 --- a/guix/git-download.scm +++ b/guix/git-download.scm @@ -74,11 +74,22 @@ HASH-ALGO (a symbol). Use NAME as the file name, or a generic name if #f." ;; available so that 'git submodule' works. (if (git-reference-recursive? ref) (standard-packages) - '())) + + ;; The 'swh-download' procedure requires tar and gzip. + `(("gzip" ,(module-ref (resolve-interface '(gnu packages compression)) + 'gzip)) + ("tar" ,(module-ref (resolve-interface '(gnu packages base)) + 'tar))))) (define zlib (module-ref (resolve-interface '(gnu packages compression)) 'zlib)) + (define guile-json + (module-ref (resolve-interface '(gnu packages guile)) 'guile-json)) + + (define gnutls + (module-ref (resolve-interface '(gnu packages tls)) 'gnutls)) + (define config.scm (scheme-file "config.scm" #~(begin @@ -93,30 +104,43 @@ HASH-ALGO (a symbol). Use NAME as the file name, or a generic name if #f." (delete '(guix config) (source-module-closure '((guix build git) (guix build utils) - (guix build download-nar)))))) + (guix build download-nar) + (guix swh)))))) (define build (with-imported-modules modules - #~(begin - (use-modules (guix build git) - (guix build utils) - (guix build download-nar) - (ice-9 match)) + (with-extensions (list guile-json gnutls) ;for (guix swh) + #~(begin + (use-modules (guix build git) + (guix build utils) + (guix build download-nar) + (guix swh) + (ice-9 match)) - ;; The 'git submodule' commands expects Coreutils, sed, - ;; grep, etc. to be in $PATH. - (set-path-environment-variable "PATH" '("bin") - (match '#+inputs - (((names dirs outputs ...) ...) - dirs))) + (define recursive? + (call-with-input-string (getenv "git recursive?") read)) - (or (git-fetch (getenv "git url") (getenv "git commit") - #$output - #:recursive? (call-with-input-string - (getenv "git recursive?") - read) - #:git-command (string-append #+git "/bin/git")) - (download-nar #$output))))) + ;; The 'git submodule' commands expects Coreutils, sed, + ;; grep, etc. to be in $PATH. + (set-path-environment-variable "PATH" '("bin") + (match '#+inputs + (((names dirs outputs ...) ...) + dirs))) + + (setvbuf (current-output-port) 'line) + (setvbuf (current-error-port) 'line) + + (or (git-fetch (getenv "git url") (getenv "git commit") + #$output + #:recursive? recursive? + #:git-command (string-append #+git "/bin/git")) + (download-nar #$output) + + ;; As a last resort, attempt to download from Software Heritage. + ;; XXX: Currently recursive checkouts are not supported. + (and (not recursive?) + (swh-download (getenv "git url") (getenv "git commit") + #$output))))))) (mlet %store-monad ((guile (package->derivation guile system))) (gexp->derivation (or name "git-checkout") build -- 2.19.1 From debbugs-submit-bounces@debbugs.gnu.org Mon Nov 19 11:24:31 2018 Received: (at 33432) by debbugs.gnu.org; 19 Nov 2018 16:24:31 +0000 Received: from localhost ([127.0.0.1]:32891 helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1gOmLf-0007m2-1M for submit@debbugs.gnu.org; Mon, 19 Nov 2018 11:24:31 -0500 Received: from eggs.gnu.org ([208.118.235.92]:44719) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1gOmLc-0007ld-EI for 33432@debbugs.gnu.org; Mon, 19 Nov 2018 11:24:29 -0500 Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71) (envelope-from ) id 1gOmLV-0004ip-9r for 33432@debbugs.gnu.org; Mon, 19 Nov 2018 11:24:23 -0500 X-Spam-Checker-Version: SpamAssassin 3.3.2 (2011-06-06) on eggs.gnu.org X-Spam-Level: X-Spam-Status: No, score=0.8 required=5.0 tests=BAYES_50 autolearn=disabled version=3.3.2 Received: from fencepost.gnu.org ([2001:4830:134:3::e]:54062) by eggs.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1gOmLV-0004id-5V; Mon, 19 Nov 2018 11:24:21 -0500 Received: from [2001:660:6102:320:e120:2c8f:8909:cdfe] (port=60932 helo=gnu.org) by fencepost.gnu.org with esmtpsa (TLS1.2:DHE_RSA_AES_256_CBC_SHA1:256) (Exim 4.82) (envelope-from ) id 1gOmLU-000700-My; Mon, 19 Nov 2018 11:24:21 -0500 From: =?UTF-8?q?Ludovic=20Court=C3=A8s?= To: 33432@debbugs.gnu.org Subject: [PATCH 1/2] Add (guix swh). Date: Mon, 19 Nov 2018 17:24:08 +0100 Message-Id: <20181119162409.8130-1-ludo@gnu.org> X-Mailer: git-send-email 2.19.1 MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit X-detected-operating-system: by eggs.gnu.org: GNU/Linux 2.2.x-3.x [generic] X-Received-From: 2001:4830:134:3::e X-Spam-Score: -5.0 (-----) X-Debbugs-Envelope-To: 33432 Cc: =?UTF-8?q?Ludovic=20Court=C3=A8s?= X-BeenThere: debbugs-submit@debbugs.gnu.org X-Mailman-Version: 2.1.18 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: debbugs-submit-bounces@debbugs.gnu.org Sender: "Debbugs-submit" X-Spam-Score: -6.0 (------) From: Ludovic Courtès * guix/swh.scm: New file. * Makefile.am (MODULES): Add it. --- Makefile.am | 1 + guix/swh.scm | 551 +++++++++++++++++++++++++++++++++++++++++++++++++++ 2 files changed, 552 insertions(+) create mode 100644 guix/swh.scm diff --git a/Makefile.am b/Makefile.am index c63b65ba56..63266bd96b 100644 --- a/Makefile.am +++ b/Makefile.am @@ -74,6 +74,7 @@ MODULES = \ guix/discovery.scm \ guix/git-download.scm \ guix/hg-download.scm \ + guix/swh.scm \ guix/monads.scm \ guix/monad-repl.scm \ guix/gexp.scm \ diff --git a/guix/swh.scm b/guix/swh.scm new file mode 100644 index 0000000000..c188e17c69 --- /dev/null +++ b/guix/swh.scm @@ -0,0 +1,551 @@ +;;; GNU Guix --- Functional package management for GNU +;;; Copyright © 2018 Ludovic Courtès +;;; +;;; This file is part of GNU Guix. +;;; +;;; GNU Guix is free software; you can redistribute it and/or modify it +;;; under the terms of the GNU General Public License as published by +;;; the Free Software Foundation; either version 3 of the License, or (at +;;; your option) any later version. +;;; +;;; GNU Guix is distributed in the hope that it will be useful, but +;;; WITHOUT ANY WARRANTY; without even the implied warranty of +;;; MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the +;;; GNU General Public License for more details. +;;; +;;; You should have received a copy of the GNU General Public License +;;; along with GNU Guix. If not, see . + +(define-module (guix swh) + #:use-module (guix base16) + #:use-module (guix build utils) + #:use-module ((guix build syscalls) #:select (mkdtemp!)) + #:use-module (web client) + #:use-module (web response) + #:use-module (json) + #:use-module (srfi srfi-1) + #:use-module (srfi srfi-9) + #:use-module (srfi srfi-11) + #:use-module (srfi srfi-19) + #:use-module (ice-9 match) + #:use-module (ice-9 regex) + #:use-module (ice-9 popen) + #:use-module ((ice-9 ftw) #:select (scandir)) + #:export (origin? + origin-id + origin-type + origin-url + origin-visits + lookup-origin + + visit? + visit-date + visit-origin + visit-url + visit-snapshot-url + visit-status + visit-number + visit-snapshot + + branch? + branch-name + branch-target + + release? + release-id + release-name + release-message + release-target + + revision? + revision-id + revision-date + revision-directory + lookup-revision + lookup-origin-revision + + content? + content-checksums + content-data-url + content-length + lookup-content + + directory-entry? + directory-entry-name + directory-entry-type + directory-entry-checksums + directory-entry-length + directory-entry-permissions + lookup-directory + directory-entry-target + + vault-reply? + vault-reply-id + vault-reply-fetch-url + vault-reply-object-id + vault-reply-object-type + vault-reply-progress-message + vault-reply-status + query-vault + request-cooking + vault-fetch + + swh-download)) + +;;; Commentary: +;;; +;;; This module provides bindings to the HTTP interface of Software Heritage. +;;; It allows you to browse the archive, look up revisions (such as SHA1 +;;; commit IDs), "origins" (code hosting URLs), content (files), etc. See +;;; for more information. +;;; +;;; The high-level 'swh-download' procedure allows you to download a Git +;;; revision from Software Heritage, provided it is available. +;;; +;;; Code: + +(define %swh-base-url + ;; Presumably we won't need to change it. + "https://archive.softwareheritage.org") + +(define (swh-url path . rest) + (define url + (string-append %swh-base-url path + (string-join rest "/" 'prefix))) + + ;; Ensure there's a trailing slash or we get a redirect. + (if (string-suffix? "/" url) + url + (string-append url "/"))) + +(define-syntax-rule (define-json-reader json->record ctor spec ...) + "Define JSON->RECORD as a procedure that converts a JSON representation, +read from a port, string, or hash table, into a record created by CTOR and +following SPEC, a series of field specifications." + (define (json->record input) + (let ((table (cond ((port? input) + (json->scm input)) + ((string? input) + (json-string->scm input)) + ((hash-table? input) + input)))) + (let-syntax ((extract-field (syntax-rules () + ((_ table (field key json->value)) + (json->value (hash-ref table key))) + ((_ table (field key)) + (hash-ref table key)) + ((_ table (field)) + (hash-ref table + (symbol->string 'field)))))) + (ctor (extract-field table spec) ...))))) + +(define-syntax-rule (define-json-mapping rtd ctor pred json->record + (field getter spec ...) ...) + "Define RTD as a record type with the given FIELDs and GETTERs, à la SRFI-9, +and define JSON->RECORD as a conversion from JSON to a record of this type." + (begin + (define-record-type rtd + (ctor field ...) + pred + (field getter) ...) + + (define-json-reader json->record ctor + (field spec ...) ...))) + +(define %date-regexp + ;; Match strings like "2014-11-17T22:09:38+01:00" or + ;; "2018-09-30T23:20:07.815449+00:00"". + (make-regexp "^([0-9]{4})-([0-9]{2})-([0-9]{2})T([0-9]{2}):([0-9]{2}):([0-9]{2})((\\.[0-9]+)?)([+-][0-9]{2}):([0-9]{2})$")) + +(define (string->date* str) + "Return a SRFI-19 date parsed from STR, a date string as returned by +Software Heritage." + ;; We can't use 'string->date' because of the timezone format: SWH returns + ;; "+01:00" when the '~z' template expects "+0100". So we roll our own! + (or (and=> (regexp-exec %date-regexp str) + (lambda (match) + (define (ref n) + (string->number (match:substring match n))) + + (make-date (let ((ns (match:substring match 8))) + (if ns + (string->number (string-drop ns 1)) + 0)) + (ref 6) (ref 5) (ref 4) + (ref 3) (ref 2) (ref 1) + (+ (* 3600 (ref 9)) ;time zone + (if (< (ref 9) 0) + (- (ref 10)) + (ref 10)))))) + str)) ;oops! + +(define* (call url decode #:optional (method http-get) + #:key (false-if-404? #t)) + "Invoke the endpoint at URL using METHOD. Decode the resulting JSON body +using DECODE, a one-argument procedure that takes an input port. When +FALSE-IF-404? is true, return #f upon 404 responses." + (let*-values (((response port) + (method url #:streaming? #t))) + ;; See . + (match (assq-ref (response-headers response) 'x-ratelimit-remaining) + (#f #t) + ((? (compose zero? string->number)) + (throw 'swh-error url response)) + (_ #t)) + + (cond ((= 200 (response-code response)) + (let ((result (decode port))) + (close-port port) + result)) + ((and false-if-404? + (= 404 (response-code response))) + (close-port port) + #f) + (else + (close-port port) + (throw 'swh-error url response))))) + +(define-syntax define-query + (syntax-rules (path) + "Define a procedure that performs a Software Heritage query." + ((_ (name args ...) docstring (path components ...) + json->value) + (define (name args ...) + docstring + (call (swh-url components ...) json->value))))) + +;; +(define-json-mapping make-origin origin? + json->origin + (id origin-id) + (visits-url origin-visits-url "origin_visits_url") + (type origin-type) + (url origin-url)) + +;; +(define-json-mapping make-visit visit? + json->visit + (date visit-date "date" string->date*) + (origin visit-origin) + (url visit-url "origin_visit_url") + (snapshot-url visit-snapshot-url "snapshot_url") + (status visit-status) + (number visit-number "visit")) + +;; +(define-json-mapping make-snapshot snapshot? + json->snapshot + (branches snapshot-branches "branches" json->branches)) + +;; This is used for the "branches" field of snapshots. +(define-record-type + (make-branch name target-type target-url) + branch? + (name branch-name) + (target-type branch-target-type) ;release | revision + (target-url branch-target-url)) + +(define (json->branches branches) + (hash-map->list (lambda (key value) + (make-branch key + (string->symbol + (hash-ref value "target_type")) + (hash-ref value "target_url"))) + branches)) + +;; +(define-json-mapping make-release release? + json->release + (id release-id) + (name release-name) + (message release-message) + (target-type release-target-type "target_type" string->symbol) + (target-url release-target-url "target_url")) + +;; +(define-json-mapping make-revision revision? + json->revision + (id revision-id) + (date revision-date "date" string->date*) + (directory revision-directory) + (directory-url revision-directory-url "directory_url")) + +;; +(define-json-mapping make-content content? + json->content + (checksums content-checksums "checksums" json->checksums) + (data-url content-data-url "data_url") + (file-type-url content-file-type-url "filetype_url") + (language-url content-language-url "language_url") + (length content-length) + (license-url content-license-url "license_url")) + +(define (json->checksums checksums) + (hash-map->list (lambda (key value) + (cons key (base16-string->bytevector value))) + checksums)) + +;; +(define-json-mapping make-directory-entry directory-entry? + json->directory-entry + (name directory-entry-name) + (type directory-entry-type "type" + (match-lambda + ("dir" 'directory) + (str (string->symbol str)))) + (checksums directory-entry-checksums "checksums" + (match-lambda + (#f #f) + (lst (json->checksums lst)))) + (id directory-entry-id "dir_id") + (length directory-entry-length) + (permissions directory-entry-permissions "perms") + (target-url directory-entry-target-url "target_url")) + +;; +(define-json-mapping make-save-reply save-reply? + json->save-reply + (origin-url save-reply-origin-url "origin_url") + (origin-type save-reply-origin-type "origin_type") + (request-date save-reply-request-date "save_request_date" + string->date*) + (request-status save-reply-request-status "save_request_status" + string->symbol) + (task-status save-reply-task-status "save_task_status" + (match-lambda + ("not created" 'not-created) + ((? string? str) (string->symbol str))))) + +;; +(define-json-mapping make-vault-reply vault-reply? + json->vault-reply + (id vault-reply-id) + (fetch-url vault-reply-fetch-url "fetch_url") + (object-id vault-reply-object-id "obj_id") + (object-type vault-reply-object-type "obj_type" string->symbol) + (progress-message vault-reply-progress-message "progress_message") + (status vault-reply-status "status" string->symbol)) + + +;;; +;;; RPCs. +;;; + +(define-query (lookup-origin url) + "Return an origin for URL." + (path "/api/1/origin/git/url" url) + json->origin) + +(define-query (lookup-content hash type) + "Return a content for HASH, of the given TYPE--e.g., \"sha256\"." + (path "/api/1/content" + (string-append type ":" + (bytevector->base16-string hash))) + json->content) + +(define-query (lookup-revision id) + "Return the revision with the given ID, typically a Git commit SHA1." + (path "/api/1/revision" id) + json->revision) + +(define-query (lookup-directory id) + "Return the directory with the given ID." + (path "/api/1/directory" id) + json->directory-entries) + +(define (json->directory-entries port) + (map json->directory-entry (json->scm port))) + +(define (origin-visits origin) + "Return the list of visits of ORIGIN, a record as returned by +'lookup-origin'." + (call (swh-url (origin-visits-url origin)) + (lambda (port) + (map json->visit (json->scm port))))) + +(define (visit-snapshot visit) + "Return the snapshot corresponding to VISIT." + (call (swh-url (visit-snapshot-url visit)) + json->snapshot)) + +(define (branch-target branch) + "Return the target of BRANCH, either a or a ." + (match (branch-target-type branch) + ('release + (call (swh-url (branch-target-url branch)) + json->release)) + ('revision + (call (swh-url (branch-target-url branch)) + json->revision)))) + +(define (lookup-origin-revision url tag) + "Return a corresponding to the given TAG for the repository +coming from URL. Example: + + (lookup-origin-release \"https://github.com/guix-mirror/guix/\" \"v0.8\") + => #< id: \"44941…\" …> + +The information is based on the latest visit of URL available. Return #f if +URL could not be found." + (match (lookup-origin url) + (#f #f) + (origin + (match (origin-visits origin) + ((visit . _) + (let ((snapshot (visit-snapshot visit))) + (match (and=> (find (lambda (branch) + (string=? (string-append "refs/tags/" tag) + (branch-name branch))) + (snapshot-branches snapshot)) + branch-target) + ((? release? release) + (release-target release)) + ((? revision? revision) + revision) + (#f ;tag not found + #f)))) + (() + #f))))) + +(define (release-target release) + "Return the revision that is the target of RELEASE." + (match (release-target-type release) + ('revision + (call (swh-url (release-target-url release)) + json->revision)))) + +(define (directory-entry-target entry) + "If ENTRY, a directory entry, has type 'directory, return its list of +directory entries; if it has type 'file, return its object." + (call (swh-url (directory-entry-target-url entry)) + (match (directory-entry-type entry) + ('file json->content) + ('directory json->directory-entries)))) + +(define* (save-origin url #:optional (type "git")) + "Request URL to be saved." + (call (swh-url "/api/1/origin/save" type "url" url) json->save-reply + http-post)) + +(define-query (save-origin-status url type) + "Return the status of a /save request for URL and TYPE (e.g., \"git\")." + (path "/api/1/origin/save" type "url" url) + json->save-reply) + +(define-query (query-vault id kind) + "Ask the availability of object ID and KIND to the vault, where KIND is +'directory or 'revision. Return #f if it could not be found, or a + on success." + ;; + ;; There's a single format supported for directories and revisions and for + ;; now, the "/format" bit of the URL *must* be omitted. + (path "/api/1/vault" (symbol->string kind) id) + json->vault-reply) + +(define (request-cooking id kind) + "Request the cooking of object ID and KIND (one of 'directory or 'revision) +to the vault. Return a ." + (call (swh-url "/api/1/vault" (symbol->string kind) id) + json->vault-reply + http-post)) + +(define* (vault-fetch id kind + #:key (log-port (current-error-port))) + "Return an input port from which a bundle of the object with the given ID +and KIND (one of 'directory or 'revision) can be retrieved, or #f if the +object could not be found. + +For a directory, the returned stream is a gzip-compressed tarball. For a +revision, it is a gzip-compressed stream for 'git fast-import'." + (let loop ((reply (query-vault id kind))) + (match reply + (#f + (and=> (request-cooking id kind) loop)) + (_ + (match (vault-reply-status reply) + ('done + ;; Fetch the bundle. + (let-values (((response port) + (http-get (swh-url (vault-reply-fetch-url reply)) + #:streaming? #t))) + (if (= (response-code response) 200) + port + (begin ;shouldn't happen + (close-port port) + #f)))) + ('failed + ;; Upon failure, we're supposed to try again. + (format log-port "SWH vault: failure: ~a~%" + (vault-reply-progress-message reply)) + (format log-port "SWH vault: retrying...~%") + (loop (request-cooking id kind))) + ((and (or 'new 'pending) status) + ;; Wait until the bundle shows up. + (let ((message (vault-reply-progress-message reply))) + (when (eq? 'new status) + (format log-port "SWH vault: \ +requested bundle cooking, waiting for completion...~%")) + (when (string? message) + (format log-port "SWH vault: ~a~%" message)) + + ;; Wait long enough so we don't exhaust our maximum number of + ;; requests per hour too fast (as of this writing, the limit is 60 + ;; requests per hour per IP address.) + (sleep (if (eq? status 'new) 60 30)) + + (loop (query-vault id kind))))))))) + + +;;; +;;; High-level interface. +;;; + +(define (commit-id? reference) + "Return true if REFERENCE is likely a commit ID, false otherwise---e.g., if +it is a tag name." + (and (= (string-length reference) 40) + (string-every char-set:hex-digit reference))) + +(define (call-with-temporary-directory proc) ;FIXME: factorize + "Call PROC with a name of a temporary directory; close the directory and +delete it when leaving the dynamic extent of this call." + (let* ((directory (or (getenv "TMPDIR") "/tmp")) + (template (string-append directory "/guix-directory.XXXXXX")) + (tmp-dir (mkdtemp! template))) + (dynamic-wind + (const #t) + (lambda () + (proc tmp-dir)) + (lambda () + (false-if-exception (delete-file-recursively tmp-dir)))))) + +(define (swh-download url reference output) + "Download from Software Heritage a checkout of the Git tag or commit +REFERENCE originating from URL, and unpack it in OUTPUT. Return #t on success +and #f on failure. + +This procedure uses the \"vault\", which contains \"cooked\" directories in +the form of tarballs. If the requested directory is not cooked yet, it will +wait until it becomes available, which could take several minutes." + (match (if (commit-id? reference) + (lookup-revision reference) + (lookup-origin-revision url reference)) + ((? revision? revision) + (call-with-temporary-directory + (lambda (directory) + (let ((input (vault-fetch (revision-directory revision) 'directory)) + (tar (open-pipe* OPEN_WRITE "tar" "-C" directory "-xzvf" "-"))) + (dump-port input tar) + (close-port input) + (let ((status (close-pipe tar))) + (unless (zero? status) + (error "tar extraction failure" status))) + + (match (scandir directory) + (("." ".." sub-directory) + (copy-recursively (string-append directory "/" sub-directory) + output + #:log (%make-void-port "w")) + #t)))))) + (#f + #f))) -- 2.19.1 From debbugs-submit-bounces@debbugs.gnu.org Wed Nov 21 05:16:01 2018 Received: (at 33432) by debbugs.gnu.org; 21 Nov 2018 10:16:01 +0000 Received: from localhost ([127.0.0.1]:37809 helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1gPPY9-0006m8-AA for submit@debbugs.gnu.org; Wed, 21 Nov 2018 05:16:01 -0500 Received: from eggs.gnu.org ([208.118.235.92]:44164) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1gPPY8-0006lr-C9 for 33432@debbugs.gnu.org; Wed, 21 Nov 2018 05:16:00 -0500 Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71) (envelope-from ) id 1gPPY2-0006KP-2R for 33432@debbugs.gnu.org; Wed, 21 Nov 2018 05:15:55 -0500 X-Spam-Checker-Version: SpamAssassin 3.3.2 (2011-06-06) on eggs.gnu.org X-Spam-Level: X-Spam-Status: No, score=-1.9 required=5.0 tests=BAYES_00 autolearn=disabled version=3.3.2 Received: from fencepost.gnu.org ([2001:4830:134:3::e]:43705) by eggs.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1gPPY1-0006Jw-Rk for 33432@debbugs.gnu.org; Wed, 21 Nov 2018 05:15:53 -0500 Received: from [2a01:e0a:1d:7270:af76:b9b:ca24:c465] (port=45678 helo=ribbon) by fencepost.gnu.org with esmtpsa (TLS1.2:RSA_AES_256_CBC_SHA1:256) (Exim 4.82) (envelope-from ) id 1gPPY1-0003xK-KL for 33432@debbugs.gnu.org; Wed, 21 Nov 2018 05:15:53 -0500 From: ludo@gnu.org (Ludovic =?utf-8?Q?Court=C3=A8s?=) To: 33432@debbugs.gnu.org Subject: On tags References: <20181119161325.7801-1-ludo@gnu.org> Date: Wed, 21 Nov 2018 11:15:52 +0100 In-Reply-To: <20181119161325.7801-1-ludo@gnu.org> ("Ludovic \=\?utf-8\?Q\?Cour\?\= \=\?utf-8\?Q\?t\=C3\=A8s\=22's\?\= message of "Mon, 19 Nov 2018 17:13:25 +0100") Message-ID: <87muq2u46f.fsf@gnu.org> User-Agent: Gnus/5.13 (Gnus v5.13) Emacs/26.1 (gnu/linux) MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: quoted-printable X-detected-operating-system: by eggs.gnu.org: GNU/Linux 2.2.x-3.x [generic] X-Received-From: 2001:4830:134:3::e X-Spam-Score: -5.0 (-----) X-Debbugs-Envelope-To: 33432 X-BeenThere: debbugs-submit@debbugs.gnu.org X-Mailman-Version: 2.1.18 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: debbugs-submit-bounces@debbugs.gnu.org Sender: "Debbugs-submit" X-Spam-Score: -6.0 (------) Hello, Ludovic Court=C3=A8s skribis: > When downloading over SWH, the =E2=80=98swh-download=E2=80=99 procedure f= irst resolves > the tag (if it=E2=80=99s a tag), then tries to download the corresponding= tarball Speaking of tags, it=E2=80=99s not news but tags are bad from a reproducibi= lity standpoint: they are mutable and per-repository. Tag lookup is necessarily relative to a repository URL (and to a snapshot of the repository, since it can be mutated): scheme@(guile-user)> (lookup-origin-revision "https://git.savannah.gnu.or= g/git/guix.git" "v0.15.0") $5 =3D #< id: "359fdda40f754bbf1b5dc261e7427b75463b59be" date: = # directory: "27c69c5d298a43096a53affbf881e7b13f17bdcd= " directory-url: "/api/1/directory/27c69c5d298a43096a53affbf881e7b13f17bdcd= /"> So if, say, SWH archived a mirror of but not itself, then tag lookup will fail, which is sad given that the code is actually there. To address this, possible options include: 1. Always store commit IDs rather than tags, effectively giving us =E2=80=9Cnormal=E2=80=9D Git content-addressability. This is not grea= t for code readability and review though. 2. Store =E2=80=98sha1_git=E2=80=99 hashes (SHA1s of Git trees) instead o= f or in addition to nar sha256 hashes so we can perform lookups by content hash on SWH or Git mirrors. #2 might be the best long-term option though it would require daemon support to compute, store, and check these Git-style hashes. Ludo=E2=80=99. From debbugs-submit-bounces@debbugs.gnu.org Mon Nov 26 05:11:35 2018 Received: (at 33432-done) by debbugs.gnu.org; 26 Nov 2018 10:11:35 +0000 Received: from localhost ([127.0.0.1]:47837 helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1gRDrb-0002Pm-LN for submit@debbugs.gnu.org; Mon, 26 Nov 2018 05:11:35 -0500 Received: from eggs.gnu.org ([208.118.235.92]:44897) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1gRDra-0002PZ-Fk for 33432-done@debbugs.gnu.org; Mon, 26 Nov 2018 05:11:34 -0500 Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71) (envelope-from ) id 1gRDrP-0006xf-Md for 33432-done@debbugs.gnu.org; Mon, 26 Nov 2018 05:11:29 -0500 X-Spam-Checker-Version: SpamAssassin 3.3.2 (2011-06-06) on eggs.gnu.org X-Spam-Level: X-Spam-Status: No, score=-1.9 required=5.0 tests=BAYES_00 autolearn=disabled version=3.3.2 Received: from fencepost.gnu.org ([2001:4830:134:3::e]:43382) by eggs.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1gRDrP-0006xY-JS for 33432-done@debbugs.gnu.org; Mon, 26 Nov 2018 05:11:23 -0500 Received: from [2001:660:6102:320:e120:2c8f:8909:cdfe] (port=49918 helo=ribbon) by fencepost.gnu.org with esmtpsa (TLS1.2:RSA_AES_256_CBC_SHA1:256) (Exim 4.82) (envelope-from ) id 1gRDrP-00004r-Am for 33432-done@debbugs.gnu.org; Mon, 26 Nov 2018 05:11:23 -0500 From: ludo@gnu.org (Ludovic =?utf-8?Q?Court=C3=A8s?=) To: 33432-done@debbugs.gnu.org Subject: Re: [bug#33432] [PATCH 0/2] Download Git checkouts from Software Heritage as a last resort References: <20181119161325.7801-1-ludo@gnu.org> Date: Mon, 26 Nov 2018 11:11:21 +0100 In-Reply-To: <20181119161325.7801-1-ludo@gnu.org> ("Ludovic \=\?utf-8\?Q\?Cour\?\= \=\?utf-8\?Q\?t\=C3\=A8s\=22's\?\= message of "Mon, 19 Nov 2018 17:13:25 +0100") Message-ID: <87tvk49mie.fsf@gnu.org> User-Agent: Gnus/5.13 (Gnus v5.13) Emacs/26.1 (gnu/linux) MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: quoted-printable X-detected-operating-system: by eggs.gnu.org: GNU/Linux 2.2.x-3.x [generic] X-Received-From: 2001:4830:134:3::e X-Spam-Score: -5.0 (-----) X-Debbugs-Envelope-To: 33432-done X-BeenThere: debbugs-submit@debbugs.gnu.org X-Mailman-Version: 2.1.18 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: debbugs-submit-bounces@debbugs.gnu.org Sender: "Debbugs-submit" X-Spam-Score: -6.0 (------) Hello, Ludovic Court=C3=A8s skribis: > Add (guix swh). > git-download: Download from Software Heritage as a last resort. Pushed! 608d3dca89 git-download: Download from Software Heritage as a last resort. de2bfe9029 Add (guix swh). Ludo=E2=80=99. From unknown Sat Jun 21 17:35:03 2025 Received: (at fakecontrol) by fakecontrolmessage; To: internal_control@debbugs.gnu.org From: Debbugs Internal Request Subject: Internal Control Message-Id: bug archived. Date: Mon, 24 Dec 2018 12:24:04 +0000 User-Agent: Fakemail v42.6.9 # This is a fake control message. # # The action: # bug archived. thanks # This fakemail brought to you by your local debbugs # administrator