GNU bug report logs - #54787
importer Bioconductor: no tarball, only Git


Package: guix;

Reported by: zimoun <zimon.toutoune <at> gmail.com>

Date: Fri, 8 Apr 2022 11:52:01 UTC

Severity: normal

To reply to this bug, email your comments to 54787 AT debbugs.gnu.org.



Report forwarded to rekado <at> elephly.net, bug-guix <at> gnu.org:
bug#54787; Package guix. (Fri, 08 Apr 2022 11:52:02 GMT)

Acknowledgement sent to zimoun <zimon.toutoune <at> gmail.com>:
New bug report received and forwarded. Copy sent to rekado <at> elephly.net, bug-guix <at> gnu.org. (Fri, 08 Apr 2022 11:52:02 GMT)

Message #5 received at submit <at> debbugs.gnu.org:

From: zimoun <zimon.toutoune <at> gmail.com>
To: Bug Guix <bug-guix <at> gnu.org>
Subject: importer Bioconductor: no tarball, only Git
Date: Fri, 08 Apr 2022 13:48:58 +0200
Hi,

Consider the package CHETAH, which is included in Bioconductor release 3.14:

<https://bioconductor.org/packages/release/bioc/html/CHETAH.html>

but then,

--8<---------------cut here---------------start------------->8---
$ guix import cran -a bioconductor CHETAH
guix import: warning: failed to retrieve package information from https://cran.r-project.org/web/packages/CHETAH/DESCRIPTION: 404 (Not Found)
guix import: error: failed to download description for package 'CHETAH'
--8<---------------cut here---------------end--------------->8---

The reason is that there is no source tarball, only the Git source
repository.


Cheers,
simon




Information forwarded to bug-guix <at> gnu.org:
bug#54787; Package guix. (Mon, 11 Apr 2022 16:19:01 GMT)

Message #8 received at 54787 <at> debbugs.gnu.org:

From: Ricardo Wurmus <rekado <at> elephly.net>
To: zimoun <zimon.toutoune <at> gmail.com>
Cc: debbugs-submit <at> debbugs.gnu.org, 54787 <at> debbugs.gnu.org
Subject: Re: bug#54787: importer Bioconductor: no tarball, only Git
Date: Mon, 11 Apr 2022 18:15:39 +0200
zimoun <zimon.toutoune <at> gmail.com> writes:

> $ guix import cran -a bioconductor CHETAH
> guix import: warning: failed to retrieve package information from https://cran.r-project.org/web/packages/CHETAH/DESCRIPTION: 404 (Not Found)
> guix import: error: failed to download description for package 'CHETAH'
>
> The reason is that there is no source tarball, only the Git source
> repository.

We should finally switch to fetching the sources from Git.  I wonder why
we haven’t done this earlier.

I guess we should do this gradually to avoid mass updates, so perhaps we
should introduce bioconductor-git-reference and switch over packages one
by one.

What do you think?

-- 
Ricardo




Information forwarded to bug-guix <at> gnu.org:
bug#54787; Package guix. (Tue, 12 Apr 2022 16:27:02 GMT)

Message #11 received at 54787 <at> debbugs.gnu.org:

From: zimoun <zimon.toutoune <at> gmail.com>
To: Ricardo Wurmus <rekado <at> elephly.net>
Cc: 54787 <at> debbugs.gnu.org
Subject: Re: bug#54787: importer Bioconductor: no tarball, only Git
Date: Tue, 12 Apr 2022 18:25:51 +0200
Hi Ricardo,

On Mon, 11 Apr 2022 at 18:15, Ricardo Wurmus <rekado <at> elephly.net> wrote:
> zimoun <zimon.toutoune <at> gmail.com> writes:
>
>> $ guix import cran -a bioconductor CHETAH
>> guix import: warning: failed to retrieve package information from https://cran.r-project.org/web/packages/CHETAH/DESCRIPTION: 404 (Not Found)
>> guix import: error: failed to download description for package 'CHETAH'
>>
>> The reason is that there is no source tarball, only the Git source
>> repository.
>
> We should finally switch to fetching the sources from Git.  I wonder why
> we haven’t done this earlier.

Maybe because we had only just finished the janitorial work of cleaning
up the files cran.scm, bioconductor.scm and bioinformatics.scm. :-)

> I guess we should do this gradually to avoid mass updates, so perhaps we
> should introduce bioconductor-git-reference and switch over packages one
> by one.

First, note that annotation packages do not always have a Git
repository, e.g.,

<https://bioconductor.org/packages/release/data/annotation/html/GenomeInfoDbData.html>

Second, if we go for something like:

--8<---------------cut here---------------start------------->8---
(define* (bioconductor-git-reference name #:optional
                                     (release %bioconductor-version))
  "Return a <git-reference> for the Git repository of the R package NAME on
Bioconductor, pointing to the release branch for RELEASE (e.g. RELEASE_3_14)."
  (git-reference
   (url (string-append %bioconductor-git-url name))
   (commit (string-append "RELEASE_" (string-replace-substring
                                      release "." "_")))))
--8<---------------cut here---------------end--------------->8---

then the question arises: should it go in import/cran.scm or
build-system/r.scm?  That is, do we add a module dependency on
(guix git-download) to the r-build-system or not?

TeX Live already has a dependency on svn-download, so why not.
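The branch name built by bioconductor-git-reference can be checked on
its own.  Here is a minimal sketch in plain Guile (no Guix modules);
bioconductor-release-branch is a hypothetical helper name, not an
existing Guix procedure, and it uses string-map instead of
string-replace-substring so that it runs without any extra module.

--8<---------------cut here---------------start------------->8---
;; Hypothetical helper, not part of Guix: map a Bioconductor release
;; string such as "3.14" to its Git branch name, "RELEASE_3_14".
(define (bioconductor-release-branch release)
  (string-append "RELEASE_"
                 ;; Replace every "." by "_", e.g. "3.14" -> "3_14".
                 (string-map (lambda (c) (if (char=? c #\.) #\_ c))
                             release)))

(bioconductor-release-branch "3.14")    ;=> "RELEASE_3_14"
--8<---------------cut here---------------end--------------->8---

A branch name is a moving target, though, so a reference built this way
would still need to be resolved to a commit hash to be reproducible.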

Well, I am also in favor of breaking the API and moving
%bioconductor-version and %bioconductor-url to (guix build-system r).
WDYT?  I guess it would simplify some things (#36805 and #39885).


Third, adjusting the importer will require a large cup of coffee.


Back to CHETAH, note that

   guix import cran -a git https://git.bioconductor.org/packages/CHETAH

works, but it points to master instead of RELEASE_3_14.  I am not very
familiar with Bioconductor's release workflow.


Last, using this in gnu/packages/bioconductor.scm,

--8<---------------cut here---------------start------------->8---
(define-public r-chetah
  (package
    (name "r-chetah")
    (version "1.11.2")
    (source
     (origin
       (method git-fetch)
       (uri (bioconductor-git-reference "CHETAH"))
       (file-name (git-file-name name version))
       (sha256
        (base32 "021v5831zqdy4pirfsb35kbnz8kmz4lxqc4cwi55qgd6r081xlgh"))))
    (properties `((upstream-name . "CHETAH")))
    (build-system r-build-system)
    (propagated-inputs
     (list r-biodist
           r-corrplot
           r-cowplot
           r-dendextend
           r-ggplot2
           r-gplots
           r-pheatmap
           r-plotly
           r-reshape2
           r-s4vectors
           r-shiny
           r-singlecellexperiment
           r-summarizedexperiment))
    (native-inputs (list r-knitr))
    (home-page "https://git.bioconductor.org/packages/CHETAH")
    (synopsis "Fast and accurate scRNA-seq cell type identification")
    (description
     "CHETAH (CHaracterization of cEll Types Aided by Hierarchical classification) is
an accurate, selective and fast scRNA-seq classifier.  Classification is guided
by a reference dataset, preferentially also a scRNA-seq dataset.  By
hierarchical clustering of the reference data, CHETAH creates a classification
tree that enables a step-wise, top-to-bottom classification.  Using a novel
stopping rule, CHETAH classifies the input cells to the cell types of the
references and to \"intermediate types\": more general classifications that ended
in an intermediate node of the tree.")
    (license #f)))
--8<---------------cut here---------------end--------------->8---

it builds fine with:

    ./pre-inst-env guix build r-chetah



WDYT?


Cheers,
simon




Information forwarded to bug-guix <at> gnu.org:
bug#54787; Package guix. (Thu, 14 Apr 2022 11:48:02 GMT)

Message #14 received at 54787 <at> debbugs.gnu.org:

From: Ricardo Wurmus <rekado <at> elephly.net>
To: zimoun <zimon.toutoune <at> gmail.com>
Cc: 54787 <at> debbugs.gnu.org
Subject: Re: bug#54787: importer Bioconductor: no tarball, only Git
Date: Thu, 14 Apr 2022 13:43:36 +0200
zimoun <zimon.toutoune <at> gmail.com> writes:

> First, note that annotation packages do not always have a Git
> repository, e.g.,
>
> <https://bioconductor.org/packages/release/data/annotation/html/GenomeInfoDbData.html>

That’s fine.  We can simply leave annotation and experiment packages as
they are and use Git only for regular packages.

> Second, if we go for something like:
>
> (define* (bioconductor-git-reference name #:optional
>                                      (release %bioconductor-version))
>   "Return a <git-reference> for the Git repository of the R package NAME on
> Bioconductor, pointing to the release branch for RELEASE (e.g. RELEASE_3_14)."
>   (git-reference
>    (url (string-append %bioconductor-git-url name))
>    (commit (string-append "RELEASE_" (string-replace-substring
>                                       release "." "_")))))
>
>
> then the question arises: should it go in import/cran.scm or
> build-system/r.scm?  That is, do we add a module dependency on
> (guix git-download) to the r-build-system or not?
>
> TeX Live already has a dependency on svn-download, so why not.

Yes, I don’t think that’s a problem.

We probably should *not* use RELEASE_3_14 (or whatever) as the commit,
though, because that is a moving target.  We need to resolve to the
actual commit and use its hash.
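Resolving a branch to its commit can be done with git ls-remote, which
prints one "<commit><TAB><ref>" pair per line, for instance with

   git ls-remote https://git.bioconductor.org/packages/CHETAH refs/heads/RELEASE_3_14

A minimal sketch of picking the commit out of that output in plain
Guile; ls-remote-commit is a hypothetical name, and obtaining the output
(which needs network access) is left out.

--8<---------------cut here---------------start------------->8---
;; Hypothetical helper, not part of Guix.  OUTPUT is the text printed by
;; "git ls-remote URL", one "<commit>\t<ref>" pair per line.  Return the
;; commit that REF (e.g. "refs/heads/RELEASE_3_14") points to, or #f.
(define (ls-remote-commit output ref)
  (let loop ((lines (string-split output #\newline)))
    (if (null? lines)
        #f
        (let* ((line (car lines))
               (tab  (string-index line #\tab)))
          ;; Compare the ref part (after the tab) with REF.
          (if (and tab (string=? (substring line (+ tab 1)) ref))
              (substring line 0 tab)      ; the commit hash
              (loop (cdr lines)))))))
--8<---------------cut here---------------end--------------->8---

The resulting full hash is what would go into the commit field, rather
than the branch name itself.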

I wonder how the updater would need to be changed.  It would need to
know about the release branch and look for new commits in that branch
only.

> Well, I am also in favor of breaking the API and moving
> %bioconductor-version and %bioconductor-url to (guix build-system r).
> WDYT?  I guess it would simplify some things (#36805 and #39885).

We tried this before and couldn’t do it because of a circular
reference.

> Back to CHETAH, note that
>
>    guix import cran -a git https://git.bioconductor.org/packages/CHETAH
>
> works, but it points to master instead of RELEASE_3_14.  I am not very
> familiar with Bioconductor's release workflow.

That’s because the importer doesn’t let us specify a different branch.
We should add that, but it’s strictly separate from the migration we’re
about to embark on.

> Last, using this in gnu/packages/bioconductor.scm,
>
> (define-public r-chetah
>   (package
>     (name "r-chetah")
>     (version "1.11.2")
>     (source
>      (origin
>        (method git-fetch)
>        (uri (bioconductor-git-reference "CHETAH"))
>        (file-name (git-file-name name version))
>        (sha256
>         (base32 "021v5831zqdy4pirfsb35kbnz8kmz4lxqc4cwi55qgd6r081xlgh"))))
>     (properties `((upstream-name . "CHETAH")))
>     (build-system r-build-system)
>     (propagated-inputs
>      (list r-biodist
>            r-corrplot
>            r-cowplot
>            r-dendextend
>            r-ggplot2
>            r-gplots
>            r-pheatmap
>            r-plotly
>            r-reshape2
>            r-s4vectors
>            r-shiny
>            r-singlecellexperiment
>            r-summarizedexperiment))
>     (native-inputs (list r-knitr))
>     (home-page "https://git.bioconductor.org/packages/CHETAH")
>     (synopsis "Fast and accurate scRNA-seq cell type identification")
>     (description
>      "CHETAH (CHaracterization of cEll Types Aided by Hierarchical classification) is
> an accurate, selective and fast scRNA-seq classifier.  Classification is guided
> by a reference dataset, preferentially also a scRNA-seq dataset.  By
> hierarchical clustering of the reference data, CHETAH creates a classification
> tree that enables a step-wise, top-to-bottom classification.  Using a novel
> stopping rule, CHETAH classifies the input cells to the cell types of the
> references and to \"intermediate types\": more general classifications that ended
> in an intermediate node of the tree.")
>     (license #f)))
>
> it builds fine with:
>
>     ./pre-inst-env guix build r-chetah
>
>
>
> WDYT?

Neat :)

-- 
Ricardo




Information forwarded to bug-guix <at> gnu.org:
bug#54787; Package guix. (Thu, 14 Apr 2022 13:10:01 GMT)

Message #17 received at 54787 <at> debbugs.gnu.org:

From: zimoun <zimon.toutoune <at> gmail.com>
To: Ricardo Wurmus <rekado <at> elephly.net>
Cc: 54787 <at> debbugs.gnu.org
Subject: Re: bug#54787: importer Bioconductor: no tarball, only Git
Date: Thu, 14 Apr 2022 14:59:54 +0200
Hi Ricardo,

On Thu, 14 Apr 2022 at 13:43, Ricardo Wurmus <rekado <at> elephly.net> wrote:

> We probably should *not* use RELEASE_3_14 (or whatever) as the commit,
> though, because that is a moving target.  We need to resolve to the
> actual commit and use its hash.
>
> I wonder how the updater would need to be changed.  It would need to
> know about the release branch and look for new commits in that branch
> only.

To be honest, I have not checked the Bioconductor documentation about
their Git repo structure.  What I see is:

--8<---------------cut here---------------start------------->8---
$ git clone https://git.bioconductor.org/packages/CHETAH
$ cd CHETAH
$ git branch -av
* master                      5d5f5df [origin/master] Pass serialized S4 instances thru updateObject()
  remotes/origin/HEAD         -> origin/master
  remotes/origin/RELEASE_3_10 063de2d bump x.y.z version to even y prior to creation of RELEASE_3_10 branch
  remotes/origin/RELEASE_3_11 701ca7f bump x.y.z version to even y prior to creation of RELEASE_3_11 branch
  remotes/origin/RELEASE_3_12 cd3dd78 bump x.y.z version to even y prior to creation of RELEASE_3_12 branch
  remotes/origin/RELEASE_3_13 1eacdb8 bump x.y.z version to even y prior to creation of RELEASE_3_13 branch
  remotes/origin/RELEASE_3_14 03295c9 bump x.y.z version to even y prior to creation of RELEASE_3_14 branch
  remotes/origin/RELEASE_3_9  22b53f2 version bump
  remotes/origin/master       5d5f5df Pass serialized S4 instances thru updateObject()
--8<---------------cut here---------------end--------------->8---
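Note that these branch names do not sort correctly as plain strings:
RELEASE_3_9 compares greater than RELEASE_3_14.  If an updater ever
needs to pick the newest release branch from such a listing, the release
numbers have to be compared numerically.  A minimal sketch in plain
Guile, with hypothetical helper names:

--8<---------------cut here---------------start------------->8---
;; Hypothetical helpers, not part of Guix.
;; Parse "RELEASE_3_14" into (3 14); return #f for other names like "master".
(define (release-branch->numbers branch)
  (and (string-prefix? "RELEASE_" branch)
       (let ((numbers (map string->number
                           (string-split (substring branch 8) #\_))))
         (and (not (memq #f numbers)) numbers))))

;; Compare two lists of integers lexicographically.
(define (numbers<? a b)
  (cond ((null? a) (not (null? b)))
        ((null? b) #f)
        ((< (car a) (car b)) #t)
        ((> (car a) (car b)) #f)
        (else (numbers<? (cdr a) (cdr b)))))

;; Return the newest RELEASE_X_Y branch among BRANCHES, or #f if none.
(define (latest-release-branch branches)
  (let loop ((branches branches) (best #f) (best-numbers #f))
    (if (null? branches)
        best
        (let ((numbers (release-branch->numbers (car branches))))
          (if (and numbers
                   (or (not best-numbers)
                       (numbers<? best-numbers numbers)))
              (loop (cdr branches) (car branches) numbers)
              (loop (cdr branches) best best-numbers))))))

(latest-release-branch
 '("master" "RELEASE_3_9" "RELEASE_3_10" "RELEASE_3_14" "RELEASE_3_13"))
;=> "RELEASE_3_14"
--8<---------------cut here---------------end--------------->8---

Whether to follow the newest branch or the one matching
%bioconductor-version is a separate policy question; the latter keeps
all packages on the same release.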


Do we follow ‘master’?  Is it a mirror of what Bioconductor calls their
3.14 release?

My guess was that RELEASE_3_14 mirrors their 3.14 release.


>> Well, I am also in favor of breaking the API and moving
>> %bioconductor-version and %bioconductor-url to (guix build-system r).
>> WDYT?  I guess it would simplify some things (#36805 and #39885).
>
> We tried this before and couldn’t do it because of a circular
> reference.

Well, I have something that works, so I do not know whether this
circular reference still exists.



> That’s because the importer doesn’t let us specify a different branch.
> We should add that, but it’s strictly separate from the migration we’re
> about to embark on.

I am not familiar with the updater (guix refresh -u).  My plan is:

 1. Add bioconductor-git-reference
 2. Adapt the bioconductor importer.
 3. Updater?

The question is: do we have to include the migration in the updater?  Or
do we do the migration with custom scripts?


Note that, because we do not support shallow clones, the complete
sources will be a bit bigger, since they contain the whole Bioconductor
history of each package.


Cheers,
simon





Information forwarded to bug-guix <at> gnu.org:
bug#54787; Package guix. (Thu, 14 Apr 2022 14:01:02 GMT)

Message #20 received at 54787 <at> debbugs.gnu.org:

From: Ricardo Wurmus <rekado <at> elephly.net>
To: zimoun <zimon.toutoune <at> gmail.com>
Cc: 54787 <at> debbugs.gnu.org
Subject: Re: bug#54787: importer Bioconductor: no tarball, only Git
Date: Thu, 14 Apr 2022 15:57:25 +0200
zimoun <zimon.toutoune <at> gmail.com> writes:

> On Thu, 14 Apr 2022 at 13:43, Ricardo Wurmus <rekado <at> elephly.net> wrote:
>
>> We probably should *not* use RELEASE_3_14 (or whatever) as the commit,
>> though, because that is a moving target.  We need to resolve to the
>> actual commit and use its hash.
>>
>> I wonder how the updater would need to be changed.  It would need to
>> know about the release branch and look for new commits in that branch
>> only.
>
> To be honest, I have not checked the Bioconductor documentation about
> their Git repo structure.  What I see is:
>
> $ git clone https://git.bioconductor.org/packages/CHETAH
> $ cd CHETAH
> $ git branch -av
> * master                      5d5f5df [origin/master] Pass serialized S4 instances thru updateObject()
>   remotes/origin/HEAD         -> origin/master
>   remotes/origin/RELEASE_3_10 063de2d bump x.y.z version to even y prior to creation of RELEASE_3_10 branch
>   remotes/origin/RELEASE_3_11 701ca7f bump x.y.z version to even y prior to creation of RELEASE_3_11 branch
>   remotes/origin/RELEASE_3_12 cd3dd78 bump x.y.z version to even y prior to creation of RELEASE_3_12 branch
>   remotes/origin/RELEASE_3_13 1eacdb8 bump x.y.z version to even y prior to creation of RELEASE_3_13 branch
>   remotes/origin/RELEASE_3_14 03295c9 bump x.y.z version to even y prior to creation of RELEASE_3_14 branch
>   remotes/origin/RELEASE_3_9  22b53f2 version bump
>   remotes/origin/master       5d5f5df Pass serialized S4 instances thru updateObject()
>
>
> Do we follow ‘master’?  Is it a mirror of what Bioconductor calls their
> 3.14 release?

We should not follow “master”.  That’s the development branch.  We
should follow the current release branch.
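The commit messages in the listing above ("bump x.y.z version to even y
prior to creation of RELEASE_3_14 branch") also reflect the Bioconductor
version convention: an even y on release branches and an odd y on
master.  A minimal sketch of a check an importer or updater could use,
in plain Guile with a hypothetical helper name:

--8<---------------cut here---------------start------------->8---
;; Hypothetical helper, not part of Guix.  Following the Bioconductor
;; convention, an odd middle component y in x.y.z denotes a development
;; (master) version; an even one denotes a release-branch version.
(define (bioconductor-devel-version? version)
  (let ((parts (string-split version #\.)))
    (and (>= (length parts) 2)
         (let ((y (string->number (cadr parts))))
           (and y (odd? y))))))

(bioconductor-devel-version? "1.11.2")  ;=> #t
--8<---------------cut here---------------end--------------->8---

For instance, the 1.11.2 used for r-chetah earlier in this thread is
flagged as a development version, which matches the importer having
taken master.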

> My guess was that RELEASE_3_14 mirrors their 3.14 release.

Correct.

>>> Well, I am also in favor of breaking the API and moving
>>> %bioconductor-version and %bioconductor-url to (guix build-system r).
>>> WDYT?  I guess it would simplify some things (#36805 and #39885).
>>
>> We tried this before and couldn’t do it because of a circular
>> reference.
>
> Well, I have something that works, so I do not know whether this
> circular reference still exists.

If “make as-derivation” does not fail, it is probably okay.

>> That’s because the importer doesn’t let us specify a different branch.
>> We should add that, but it’s strictly separate from the migration we’re
>> about to embark on.
>
> I am not familiar with the updater (guix refresh -u).  My plan is:
>
>  1. Add bioconductor-git-reference
>  2. Adapt the bioconductor importer.
>  3. Updater?

The updater is closely connected to the importer.  It just needs to be
told how it can find new releases.
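One way for the updater to detect a new release, sketched in plain Guile
under two assumptions: the DESCRIPTION file is read from the tip of the
release branch, and versions are purely numeric x.y.z strings.
description-version and newer-version? are hypothetical names, not the
actual (guix import cran) code.

--8<---------------cut here---------------start------------->8---
;; Hypothetical helpers, not the actual (guix import cran) code.
;; Return the value of the "Version:" field in DESCRIPTION text, or #f.
(define (description-version text)
  (let loop ((lines (string-split text #\newline)))
    (cond ((null? lines) #f)
          ((string-prefix? "Version:" (car lines))
           (string-trim-both (substring (car lines) 8)))
          (else (loop (cdr lines))))))

;; Return #t if dotted numeric version CANDIDATE is newer than CURRENT.
(define (newer-version? candidate current)
  (let loop ((a (map string->number (string-split candidate #\.)))
             (b (map string->number (string-split current #\.))))
    (cond ((null? a) #f)
          ((null? b) #t)
          ((> (car a) (car b)) #t)
          ((< (car a) (car b)) #f)
          (else (loop (cdr a) (cdr b))))))

(newer-version? "1.10.2" "1.10.0")      ;=> #t
(newer-version? "1.9.0" "1.10.0")       ;=> #f
--8<---------------cut here---------------end--------------->8---

If the version on the release branch is newer, the updater would then
resolve the branch tip to a commit hash and record both.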

> The question is: do we have to include the migration in the updater?  Or
> do we do the migration with custom scripts?

We can do the migration manually.  But if we end up with a broken
updater I won’t be able to update Bioconductor packages in bulk; that
would be a serious problem for future maintenance.

> Note that, because we do not support shallow clones, the complete
> sources will be a bit bigger, since they contain the whole Bioconductor
> history of each package.

Doesn’t Guile-Git support shallow clones?  In any case, this should not
be an obstacle for us.  Ensuring long-term reproducibility is more
important than space savings.

-- 
Ricardo




Information forwarded to bug-guix <at> gnu.org:
bug#54787; Package guix. (Thu, 14 Apr 2022 14:05:02 GMT)

Message #23 received at 54787 <at> debbugs.gnu.org:

From: Maxime Devos <maximedevos <at> telenet.be>
To: Ricardo Wurmus <rekado <at> elephly.net>, zimoun <zimon.toutoune <at> gmail.com>
Cc: 54787 <at> debbugs.gnu.org
Subject: Re: bug#54787: importer Bioconductor: no tarball, only Git
Date: Thu, 14 Apr 2022 16:04:51 +0200
Ricardo Wurmus wrote on Thu 14-04-2022 at 13:43 [+0200]:
> I wonder how the updater would need to be changed.  It would need to
> know about the release branch and look for new commits in that branch
> only.

Perhaps <https://issues.guix.gnu.org/53144> would be useful?  It adds a
'latest-git-updater' refresher that looks for the latest commit in a
branch (or, more generally, in any reference, so in principle a tag that
is repeatedly replaced would work as well).  There are some unaddressed
comments, though.

Greetings,
Maxime.

Information forwarded to bug-guix <at> gnu.org:
bug#54787; Package guix. (Thu, 14 Apr 2022 15:11:02 GMT)

Message #26 received at 54787 <at> debbugs.gnu.org:

From: zimoun <zimon.toutoune <at> gmail.com>
To: Ricardo Wurmus <rekado <at> elephly.net>
Cc: 54787 <at> debbugs.gnu.org
Subject: Re: bug#54787: importer Bioconductor: no tarball, only Git
Date: Thu, 14 Apr 2022 17:03:37 +0200
On Thu, 14 Apr 2022 at 15:57, Ricardo Wurmus <rekado <at> elephly.net> wrote:
> zimoun <zimon.toutoune <at> gmail.com> writes:
>
>> On Thu, 14 Apr 2022 at 13:43, Ricardo Wurmus <rekado <at> elephly.net> wrote:
>>
>>> We probably should *not* use RELEASE_3_14 (or whatever) as the commit,
>>> though, because that is a moving target.  We need to resolve to the
>>> actual commit and use its hash.

[...]

>> Do we follow ‘master’?  Is it a mirror of what Bioconductor calls their
>> 3.14 release?
>
> We should not follow “master”.  That’s the development branch.  We
> should follow the current release branch.

To make sure I understand you correctly, your point is to have
something like:

--8<---------------cut here---------------start------------->8---
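  (define* (bioconductor-git-reference name #:key commit)
    ;; COMMIT is the full hash of a commit on the release branch, not a
    ;; branch name, so that the source cannot change under our feet.
    (git-reference
     (url (string-append %bioconductor-git-url name))
     (commit commit)))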
--8<---------------cut here---------------end--------------->8---

with an explicit commit for each package definition, right?


> Doesn’t Guile-Git support shallow clones?  In any case, this should not
> be an obstacle for us.  Ensuring long-term reproducibility is more
> important than space savings.

No, since libgit2 does not support it, IIUC.

<https://github.com/libgit2/libgit2/issues/3058>


Cheers,
simon




This bug report was last modified 3 years and 63 days ago.



GNU bug tracking system
Copyright (C) 1999 Darren O. Benham, 1997,2003 nCipher Corporation Ltd, 1994-97 Ian Jackson.