GNU bug report logs - #78542
[Security] hash locking needed for tree-sitter downloads


Package: emacs

Reported by: Daniel Colascione <dancol <at> dancol.org>

Date: Wed, 21 May 2025 19:13:04 UTC

Severity: normal

Fixed in version 31.0.50

Done: Juri Linkov <juri <at> linkov.net>

Bug is archived. No further changes may be made.

To add a comment to this bug, you must first unarchive it, by sending
a message to control AT debbugs.gnu.org, with unarchive 78542 in the body.
You can then email your comments to 78542 AT debbugs.gnu.org in the normal way.


Report forwarded to bug-gnu-emacs <at> gnu.org:
bug#78542; Package emacs. (Wed, 21 May 2025 19:13:04 GMT) Full text and rfc822 format available.

Acknowledgement sent to Daniel Colascione <dancol <at> dancol.org>:
New bug report received and forwarded. Copy sent to bug-gnu-emacs <at> gnu.org. (Wed, 21 May 2025 19:13:04 GMT) Full text and rfc822 format available.

Message #5 received at submit <at> debbugs.gnu.org (full text, mbox):

From: Daniel Colascione <dancol <at> dancol.org>
To: bug-gnu-emacs <at> gnu.org
Subject: [Security] hash locking needed for tree-sitter downloads
Date: Wed, 21 May 2025 15:12:32 -0400
When downloading code, a tag isn't good enough.  We should insist on a
specific commit.

We have a fair bit of code in Emacs that looks like this:

(add-to-list
 'treesit-language-source-alist
 '(javascript "https://github.com/tree-sitter/tree-sitter-javascript" "v0.23.1")
 t)
(add-to-list
 'treesit-language-source-alist
 '(jsdoc "https://github.com/tree-sitter/tree-sitter-jsdoc" "v0.23.2")
 t)

The entries in treesit-language-source-alist mostly have tags but not
commit hashes.  The expected commit hash should be *mandatory*, because
right now, anyone with access to one of these repositories can retarget
any of those tags at malicious code.

See https://snyk.io/blog/npm-security-preventing-supply-chain-attacks/

Every other important language ecosystem has evolved some kind of "hash
locking" capability for breaking the author-retargets-to-malware attack
vector.  We should too.  We shouldn't allow the commit hash to be absent
for ordinary users.
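
To make the retargeting risk concrete, here is a minimal sketch that uses only a throwaway local repository. The names `demo`, `g`, and the tag `v1.0` are illustrative assumptions and are not taken from Emacs. It records the commit a tag points at, moves the tag the way anyone with push access could, and then compares the two.

```shell
# Sketch only: a tag is a movable name, a commit hash is not.
demo=$(mktemp -d)

# Helper (hypothetical): run git inside the throwaway repository with a
# fixed identity and signing disabled so commits work on any machine.
g() {
  git -C "$demo" -c user.name=demo -c user.email=demo@example.invalid \
      -c commit.gpgsign=false -c tag.gpgsign=false "$@"
}

g init -q
g commit -q --allow-empty -m "reviewed release"
g tag v1.0
pinned=$(g rev-parse 'v1.0^{commit}')   # the hash a lock entry would record

# Later the tag is moved to a different commit.
g commit -q --allow-empty -m "unreviewed change"
g tag -f v1.0 >/dev/null
now=$(g rev-parse 'v1.0^{commit}')

# Hash locking means refusing to proceed unless these two agree.
if [ "$now" = "$pinned" ]; then
  verdict="tag unchanged"
else
  verdict="tag moved"
fi
echo "$verdict"
rm -rf "$demo"
```

A download that trusts only the tag name would silently pick up the second commit; one that compares against the recorded hash would notice the move.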

P.S. we've debated vendoring these grammars with Emacs.  I still think
that's the right way to go.  But if we're going to download and build,
we should at least do it in a secure way.

P.P.S. Do we need the list of grammars in build.sh under admin? It
duplicates what's in Lisp elsewhere in the tree.




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#78542; Package emacs. (Thu, 22 May 2025 06:47:04 GMT) Full text and rfc822 format available.

Message #8 received at 78542 <at> debbugs.gnu.org (full text, mbox):

From: Juri Linkov <juri <at> linkov.net>
To: Daniel Colascione <dancol <at> dancol.org>
Cc: Yuan Fu <casouri <at> gmail.com>, 78542 <at> debbugs.gnu.org
Subject: Re: bug#78542: [Security] hash locking needed for tree-sitter
 downloads
Date: Thu, 22 May 2025 09:36:57 +0300
> When downloading code, a tag isn't good enough.  We should insist on a
> specific commit.
> [...]
> The entries in treesit-language-source-alist mostly have tags but not
> commit hashes.  The expected commit hash should be *mandatory*, because
> right now, anyone with access to one of these repositories can retarget
> any of those tags at malicious code.

Indeed, tags can be easily relocated to a different commit.

> Every other important language ecosystem has evolved some kind of "hash
> locking" capability for breaking the author-retargets-to-malware attack
> vector.  We should too.  We shouldn't allow the commit hash to be absent
> for ordinary users.

Agreed, "hash locking" should lock commit hashes, not tags.

> P.S. we've debated vendoring these grammars with Emacs.  I still think
> that's the right way to go.  But if we're going to download and build,
> we should at least do it in a secure way.

Tags are currently used instead of commit hashes only because the
current implementation cannot check out a specific commit when
'treesit--install-language-grammar-full-clone' has its default value
of nil.

> P.P.S. Do we need the list of grammars in build.sh under admin? It
> duplicates what's in Lisp elsewhere in the tree.

Apparently no need, so they could be removed.




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#78542; Package emacs. (Sat, 07 Jun 2025 08:07:02 GMT) Full text and rfc822 format available.

Message #11 received at 78542 <at> debbugs.gnu.org (full text, mbox):

From: Eli Zaretskii <eliz <at> gnu.org>
To: Juri Linkov <juri <at> linkov.net>, casouri <at> gmail.com
Cc: 78542 <at> debbugs.gnu.org, dancol <at> dancol.org
Subject: Re: bug#78542: [Security] hash locking needed for tree-sitter
 downloads
Date: Sat, 07 Jun 2025 11:05:51 +0300
Ping!  Do we want to make some progress here?





Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#78542; Package emacs. (Sun, 08 Jun 2025 17:50:01 GMT) Full text and rfc822 format available.

Message #14 received at 78542 <at> debbugs.gnu.org (full text, mbox):

From: Juri Linkov <juri <at> linkov.net>
To: Eli Zaretskii <eliz <at> gnu.org>
Cc: 78542 <at> debbugs.gnu.org, casouri <at> gmail.com, dancol <at> dancol.org
Subject: Re: bug#78542: [Security] hash locking needed for tree-sitter
 downloads
Date: Sun, 08 Jun 2025 20:45:42 +0300
>> The only reason currently tags are used instead of commit hashes is
>> because there is no way to checkout a specific commit with the
>> current implementation when the default value of
>> 'treesit--install-language-grammar-full-clone' is nil.

Here is the current state:

1. (treesit--install-language-grammar-1
    (locate-user-emacs-file "tree-sitter") 'json
    "https://github.com/tree-sitter/tree-sitter-json")

  installs the latest commit 46aa487.

2. (treesit--install-language-grammar-1
    (locate-user-emacs-file "tree-sitter") 'json
    "https://github.com/tree-sitter/tree-sitter-json"
    "v0.24.8")

  installs the commit ee35a6e tagged v0.24.8.

3. (treesit--install-language-grammar-1
    (locate-user-emacs-file "tree-sitter") 'json
    "https://github.com/tree-sitter/tree-sitter-json"
    "4d770d3")

  fails to check out "4d770d3" with the error:

  git clone https://github.com/tree-sitter/tree-sitter-json --quiet --depth 1 -b 4d770d3
  warning: Could not find remote branch 4d770d3 to clone
  fatal: Remote branch 4d770d3 not found in upstream origin

4. (treesit--install-language-grammar-1
    (locate-user-emacs-file "tree-sitter") 'json
    "https://github.com/tree-sitter/tree-sitter-json"
    nil nil nil nil "4d770d3")

  fails to check out "4d770d3" with the error:

  git -C /tmp/treesit-workdirHhEIhg/repo checkout 4d770d3
  error: pathspec '4d770d3' did not match any file(s) known to git

After (setq treesit--install-language-grammar-full-clone t):

5. (treesit--install-language-grammar-1
    (locate-user-emacs-file "tree-sitter") 'json
    "https://github.com/tree-sitter/tree-sitter-json"
    "4d770d3")

  successfully installs the commit "v0.24.8-1-g4d770d3".

When treesit--install-language-grammar-full-clone is nil,
"--depth 1" is added to "git clone".

So we need a Git guru to recommend a command line to use
"git clone" with "--depth 1" to check out a single commit.
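
One commonly used way to get a single commit without history is to skip `git clone` and instead fetch the full hash into an empty repository with `--depth 1`. The sketch below is an illustration only, not what this thread ended up adopting. It runs against a throwaway local upstream (`work`, `up`, and `g` are hypothetical names) and assumes the server permits fetching by object id, which the demo enables explicitly with `uploadpack.allowAnySHA1InWant`. Whether a given hosting service permits this is outside the sketch, and abbreviated hashes are not accepted here.

```shell
# Sketch only: shallow-fetch one commit by its full hash.
work=$(mktemp -d)
up="$work/upstream"

# Helper (hypothetical): git with a fixed identity and signing disabled.
g() {
  git -c user.name=demo -c user.email=demo@example.invalid \
      -c commit.gpgsign=false "$@"
}

# Build a small upstream whose wanted commit is neither a tag nor a tip.
g init -q "$up"
g -C "$up" commit -q --allow-empty -m one
g -C "$up" commit -q --allow-empty -m two
want=$(g -C "$up" rev-parse HEAD)
g -C "$up" commit -q --allow-empty -m three
g -C "$up" config uploadpack.allowAnySHA1InWant true

# Empty repository, then a depth-1 fetch of exactly that commit.
g init -q "$work/repo"
g -C "$work/repo" fetch -q --depth 1 "file://$up" "$want"
g -C "$work/repo" checkout -q FETCH_HEAD

got=$(g -C "$work/repo" rev-parse HEAD)
count=$(g -C "$work/repo" rev-list --count HEAD)
echo "$got $count"
rm -rf "$work"
```

The `rev-list --count` check is there to confirm that only the requested commit, and none of its ancestors, ended up in the local repository.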




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#78542; Package emacs. (Tue, 10 Jun 2025 01:39:01 GMT) Full text and rfc822 format available.

Message #17 received at 78542 <at> debbugs.gnu.org (full text, mbox):

From: Yuan Fu <casouri <at> gmail.com>
To: Juri Linkov <juri <at> linkov.net>
Cc: 78542 <at> debbugs.gnu.org, Eli Zaretskii <eliz <at> gnu.org>, dancol <at> dancol.org
Subject: Re: bug#78542: [Security] hash locking needed for tree-sitter
 downloads
Date: Mon, 9 Jun 2025 18:38:03 -0700

> On Jun 8, 2025, at 10:45 AM, Juri Linkov <juri <at> linkov.net> wrote:
> 
> When treesit--install-language-grammar-full-clone is nil,
> "--depth 1" is added to "git clone".
> 
> So we need a Git guru to recommend a command line to use
> "git clone" with "--depth 1" to check out a single commit.

Would it work to do a blobless full clone, check out the commit, and then fetch with --depth=1? E.g.,

git clone https://github.com/tree-sitter/tree-sitter-json.git --filter=blob:none
cd tree-sitter-json
git checkout 4d770d3
git fetch --depth=1

Yuan



Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#78542; Package emacs. (Tue, 10 Jun 2025 06:48:03 GMT) Full text and rfc822 format available.

Message #20 received at 78542 <at> debbugs.gnu.org (full text, mbox):

From: Juri Linkov <juri <at> linkov.net>
To: Yuan Fu <casouri <at> gmail.com>
Cc: 78542 <at> debbugs.gnu.org, Eli Zaretskii <eliz <at> gnu.org>, dancol <at> dancol.org
Subject: Re: bug#78542: [Security] hash locking needed for tree-sitter
 downloads
Date: Tue, 10 Jun 2025 09:23:31 +0300
>> When treesit--install-language-grammar-full-clone is nil,
>> "--depth 1" is added to "git clone".
>> 
>> So we need a Git guru to recommend a command line to use
>> "git clone" with "--depth 1" to check out a single commit.
>
> Would it work if we do a blobless full clone, checkout the commit, and fetch depth=1? Eg,
>
> git clone https://github.com/tree-sitter/tree-sitter-json.git --filter=blob:none
> cd tree-sitter-json
> git checkout 4d770d3
> git fetch --depth=1

This still keeps the full history.  So if there is no way to clone
at a specific commit without fetching the full history, should we
simply set the default value of
treesit--install-language-grammar-full-clone to t, or remove this
variable completely?




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#78542; Package emacs. (Tue, 10 Jun 2025 07:45:02 GMT) Full text and rfc822 format available.

Message #23 received at 78542 <at> debbugs.gnu.org (full text, mbox):

From: Daniel Colascione <dancol <at> dancol.org>
To: Juri Linkov <juri <at> linkov.net>
Cc: Yuan Fu <casouri <at> gmail.com>, 78542 <at> debbugs.gnu.org,
 Eli Zaretskii <eliz <at> gnu.org>
Subject: Re: bug#78542: [Security] hash locking needed for tree-sitter
 downloads
Date: Tue, 10 Jun 2025 00:44:30 -0700
Juri Linkov <juri <at> linkov.net> writes:

>>> When treesit--install-language-grammar-full-clone is nil,
>>> "--depth 1" is added to "git clone".
>>> 
>>> So we need a Git guru to recommend a command line to use
>>> "git clone" with "--depth 1" to check out a single commit.
>>
>> Would it work if we do a blobless full clone, checkout the commit, and fetch depth=1? Eg,
>>
>> git clone https://github.com/tree-sitter/tree-sitter-json.git --filter=blob:none
>> cd tree-sitter-json
>> git checkout 4d770d3
>> git fetch --depth=1
>
> This still keeps full history.

There's a difference between full history and all blobs for all
revisions in this history.  You can also use --shallow-since during the
clone with a date to further limit history.  --shallow-exclude would
probably work even better, since you wouldn't need a date, but it's
broken for me somehow, at least with the repository above.
But --shallow-since works.




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#78542; Package emacs. (Thu, 19 Jun 2025 17:09:01 GMT) Full text and rfc822 format available.

Message #26 received at 78542 <at> debbugs.gnu.org (full text, mbox):

From: Juri Linkov <juri <at> linkov.net>
To: Daniel Colascione <dancol <at> dancol.org>
Cc: Yuan Fu <casouri <at> gmail.com>, 78542 <at> debbugs.gnu.org,
 Eli Zaretskii <eliz <at> gnu.org>
Subject: Re: bug#78542: [Security] hash locking needed for tree-sitter
 downloads
Date: Thu, 19 Jun 2025 20:06:51 +0300
>>>> When treesit--install-language-grammar-full-clone is nil,
>>>> "--depth 1" is added to "git clone".
>>>> 
>>>> So we need a Git guru to recommend a command line to use
>>>> "git clone" with "--depth 1" to check out a single commit.
>>>
>>> Would it work if we do a blobless full clone, checkout the commit, and fetch depth=1? Eg,
>>>
>>> git clone https://github.com/tree-sitter/tree-sitter-json.git --filter=blob:none
>>> cd tree-sitter-json
>>> git checkout 4d770d3
>>> git fetch --depth=1
>>
>> This still keeps full history.
>
> There's a difference between full history and all blobs for all
> revisions in this history.  You can also use --shallow-since during the
> clone with a date to further limit history.  --shallow-exclude would
> probably work even better, since you wouldn't need a date, but it's
> broken for me somehow, at least with the repository above.
> But --shallow-since works.

I can't find what value to provide for --shallow-since.
So let's just use a blobless full clone:

diff --git a/lisp/treesit.el b/lisp/treesit.el
index 353e991ec20..5d03f0cf45e 100644
--- a/lisp/treesit.el
+++ b/lisp/treesit.el
@@ -5238,7 +5238,13 @@ treesit--install-language-grammar-1
           (if url-is-dir
               (when revision
                 (treesit--git-checkout-branch workdir revision))
-            (treesit--git-clone-repo url revision workdir))
+            (if commit
+                ;; Force blobless full clone to be able later
+                ;; to checkout a commit (bug#78542).
+                (let ((treesit--install-language-grammar-blobless t)
+                      (treesit--install-language-grammar-full-clone t))
+                  (treesit--git-clone-repo url revision workdir))
+              (treesit--git-clone-repo url revision workdir)))
           (when commit
             (treesit--git-checkout-branch workdir commit))
           (setq version (treesit--language-git-revision workdir))




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#78542; Package emacs. (Thu, 19 Jun 2025 17:57:01 GMT) Full text and rfc822 format available.

Message #29 received at 78542 <at> debbugs.gnu.org (full text, mbox):

From: Juri Linkov <juri <at> linkov.net>
To: Daniel Colascione <dancol <at> dancol.org>
Cc: Yuan Fu <casouri <at> gmail.com>, 78542 <at> debbugs.gnu.org,
 Eli Zaretskii <eliz <at> gnu.org>
Subject: Re: bug#78542: [Security] hash locking needed for tree-sitter
 downloads
Date: Thu, 19 Jun 2025 20:54:08 +0300
> +            (if commit
> +                ;; Force blobless full clone to be able later
> +                ;; to checkout a commit (bug#78542).
> +                (let ((treesit--install-language-grammar-blobless t)
> +                      (treesit--install-language-grammar-full-clone t))
> +                  (treesit--git-clone-repo url revision workdir))
> +              (treesit--git-clone-repo url revision workdir)))

Since with this change it's possible to specify the commit,
let's also improve the format of the source list.
Currently adding a commit to the list requires
prefixing it with four nils:

  (treesit--install-language-grammar-1
   (locate-user-emacs-file "tree-sitter") 'json
   "https://github.com/tree-sitter/tree-sitter-json"
   nil nil nil nil "4d770d3")

The following patch introduces an alternative format
using keywords, e.g.:

  (treesit--install-language-grammar-1
   (locate-user-emacs-file "tree-sitter") 'json
   "https://github.com/tree-sitter/tree-sitter-json"
   :commit "4d770d3")

[treesit-language-source-alist.patch (text/x-diff, inline)]
diff --git a/lisp/treesit.el b/lisp/treesit.el
index 353e991ec20..fedcb6ed1e9 100644
--- a/lisp/treesit.el
+++ b/lisp/treesit.el
@@ -4998,7 +4998,7 @@ treesit-language-source-alist
 
 The value should be an alist where each element has the form
 
-    (LANG . (URL REVISION SOURCE-DIR CC C++ COMMIT [KEYWORD VALUE]...))
+    (LANG . (URL REVISION SOURCE-DIR CC C++ COMMIT))
 
 Only LANG and URL are mandatory.  LANG is the language symbol.
 URL is the URL of the grammar's Git repository or a directory
@@ -5015,8 +5015,17 @@ treesit-language-source-alist
 CC and C++ are C and C++ compilers, defaulting to \"cc\" and
 \"c++\", respectively.
 
+Another way to specify optional data is to use keywords:
+
+    (LANG . (URL [KEYWORD VALUE]...))
+
 The currently supported keywords:
 
+`:revision' is the same as REVISION above.
+`:source-dir' is the same as SOURCE-DIR above.
+`:cc' is the same as CC above.
+`:c++' is the same as C++ above.
+`:commit' is the same as COMMIT above.
 `:copy-queries' when non-nil specifies whether to copy the files
 in the \"queries\" directory from the source directory to the
 installation directory.")
@@ -5203,7 +5212,7 @@ treesit--git-clone-repo
     (apply #'treesit--call-process-signal args)))
 
 (defun treesit--install-language-grammar-1
-    (out-dir lang url &optional revision source-dir cc c++ commit &rest args)
+    (out-dir lang url &rest args)
   "Compile and install a tree-sitter language grammar library.
 
 OUT-DIR is the directory to put the compiled library file.  If it
@@ -5211,8 +5220,7 @@ treesit--install-language-grammar-1
 configuration directory is used (and automatically created if it
 does not exist).
 
-For LANG, URL, REVISION, SOURCE-DIR, GRAMMAR-DIR, CC, C++, COMMIT, see
-`treesit-language-source-alist'.
+For ARGS, see `treesit-language-source-alist'.
 
 Return the git revision of the installed grammar.  The revision is
 generated by \"git describe\".  It only works when
@@ -5225,13 +5233,25 @@ treesit--install-language-grammar-1
          (workdir (if url-is-dir
                       maybe-repo-dir
                     (expand-file-name "repo")))
-         copy-queries version)
+         version
+         revision source-dir cc c++ commit copy-queries)
 
     ;; Process the keyword args.
     (while (keywordp (car args))
       (pcase (pop args)
-        (:copy-queries (setq copy-queries (pop args)))
-        (_ (pop args))))
+        (:revision     (setq revision     (pop args)))
+        (:source-dir   (setq source-dir   (pop args)))
+        (:cc           (setq cc           (pop args)))
+        (:c++          (setq c++          (pop args)))
+        (:commit       (setq commit       (pop args)))
+        (:copy-queries (setq copy-queries (pop args)))))
+
+    ;; Old positional convention for backward-compatibility:
+    (unless revision   (setq revision   (nth 0 args)))
+    (unless source-dir (setq source-dir (nth 1 args)))
+    (unless cc         (setq cc         (nth 2 args)))
+    (unless c++        (setq c++        (nth 3 args)))
+    (unless commit     (setq commit     (nth 4 args)))
 
     (unwind-protect
         (with-temp-buffer

Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#78542; Package emacs. (Thu, 19 Jun 2025 18:14:03 GMT) Full text and rfc822 format available.

Message #32 received at 78542 <at> debbugs.gnu.org (full text, mbox):

From: Daniel Colascione <dancol <at> dancol.org>
To: Juri Linkov <juri <at> linkov.net>
Cc: Yuan Fu <casouri <at> gmail.com>, 78542 <at> debbugs.gnu.org,
 Eli Zaretskii <eliz <at> gnu.org>
Subject: Re: bug#78542: [Security] hash locking needed for tree-sitter downloads
Date: Thu, 19 Jun 2025 14:12:56 -0400
On June 19, 2025 1:54:08 PM EDT, Juri Linkov <juri <at> linkov.net> wrote:
>> +            (if commit
>> +                ;; Force blobless full clone to be able later
>> +                ;; to checkout a commit (bug#78542).
>> +                (let ((treesit--install-language-grammar-blobless t)
>> +                      (treesit--install-language-grammar-full-clone t))
>> +                  (treesit--git-clone-repo url revision workdir))
>> +              (treesit--git-clone-repo url revision workdir)))
>
>Since with this change it's possible to specify the commit,
>let's also improve the format of the source list.
>Currently adding a commit to the list requires
>prefixing it with four nils:
>
>  (treesit--install-language-grammar-1
>   (locate-user-emacs-file "tree-sitter") 'json
>   "https://github.com/tree-sitter/tree-sitter-json"
>   nil nil nil nil "4d770d3")
>
>The following patch introduces an alternative format
>using keywords, e.g.:
>
>  (treesit--install-language-grammar-1
>   (locate-user-emacs-file "tree-sitter") 'json
>   "https://github.com/tree-sitter/tree-sitter-json"
>   :commit "4d770d3")
>


Great. While you're doing this, can you also please use full hashes? Short ones aren't particularly collision resistant.
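
A check along these lines could reject abbreviated hashes before anything is downloaded. This is a minimal sketch that assumes SHA-1 object names (40 hexadecimal characters); `is_full_sha1` is a hypothetical helper name, not an existing Emacs or git function.

```shell
# Sketch only: accept a revision string only if it is a full SHA-1 name.
is_full_sha1() {
  # Exactly 40 characters...
  [ "${#1}" -eq 40 ] || return 1
  # ...and none of them outside the hexadecimal digits.
  case $1 in
    *[!0-9a-fA-F]*) return 1 ;;
  esac
  return 0
}

for rev in 4d770d31f732d50d3ec373865822fbe659e47c75 4d770d3 v0.24.8; do
  if is_full_sha1 "$rev"; then
    echo "$rev: full hash"
  else
    echo "$rev: rejected"
  fi
done
```

Of the three values from this thread, only the first is accepted; the short hash and the tag name are both rejected.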





Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#78542; Package emacs. (Fri, 20 Jun 2025 06:56:02 GMT) Full text and rfc822 format available.

Message #35 received at 78542 <at> debbugs.gnu.org (full text, mbox):

From: Eli Zaretskii <eliz <at> gnu.org>
To: Juri Linkov <juri <at> linkov.net>
Cc: 78542 <at> debbugs.gnu.org, casouri <at> gmail.com, dancol <at> dancol.org
Subject: Re: bug#78542: [Security] hash locking needed for tree-sitter
 downloads
Date: Fri, 20 Jun 2025 09:55:04 +0300
> From: Juri Linkov <juri <at> linkov.net>
> Cc: Yuan Fu <casouri <at> gmail.com>,  78542 <at> debbugs.gnu.org,  Eli Zaretskii
>  <eliz <at> gnu.org>
> Date: Thu, 19 Jun 2025 20:54:08 +0300
> 
>  The value should be an alist where each element has the form
>  
> -    (LANG . (URL REVISION SOURCE-DIR CC C++ COMMIT [KEYWORD VALUE]...))
> +    (LANG . (URL REVISION SOURCE-DIR CC C++ COMMIT))
>  
>  Only LANG and URL are mandatory.  LANG is the language symbol.
>  URL is the URL of the grammar's Git repository or a directory
> @@ -5015,8 +5015,17 @@ treesit-language-source-alist
>  CC and C++ are C and C++ compilers, defaulting to \"cc\" and
>  \"c++\", respectively.
>  
> +Another way to specify optional data is to use keywords:
> +
> +    (LANG . (URL [KEYWORD VALUE]...))
> +
>  The currently supported keywords:
>  
> +`:revision' is the same as REVISION above.
> +`:source-dir' is the same as SOURCE-DIR above.
> +`:cc' is the same as CC above.
> +`:c++' is the same as C++ above.
> +`:commit' is the same as COMMIT above.
>  `:copy-queries' when non-nil specifies whether to copy the files
>  in the \"queries\" directory from the source directory to the
>  installation directory.")

This is okay, but I guess the keywords are not entirely independent?
That is, to have a valid spec one needs several keywords to be
specified together?  In that case, I think this should be stated in
the doc string.




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#78542; Package emacs. (Fri, 20 Jun 2025 17:00:02 GMT) Full text and rfc822 format available.

Message #38 received at 78542 <at> debbugs.gnu.org (full text, mbox):

From: Juri Linkov <juri <at> linkov.net>
To: Eli Zaretskii <eliz <at> gnu.org>
Cc: 78542 <at> debbugs.gnu.org, casouri <at> gmail.com, dancol <at> dancol.org
Subject: Re: bug#78542: [Security] hash locking needed for tree-sitter
 downloads
Date: Fri, 20 Jun 2025 19:48:09 +0300
>>  The value should be an alist where each element has the form
>>  
>> -    (LANG . (URL REVISION SOURCE-DIR CC C++ COMMIT [KEYWORD VALUE]...))
>> +    (LANG . (URL REVISION SOURCE-DIR CC C++ COMMIT))
>>  
>>  Only LANG and URL are mandatory.  LANG is the language symbol.
>>  URL is the URL of the grammar's Git repository or a directory
>> @@ -5015,8 +5015,17 @@ treesit-language-source-alist
>>  CC and C++ are C and C++ compilers, defaulting to \"cc\" and
>>  \"c++\", respectively.
>>  
>> +Another way to specify optional data is to use keywords:
>> +
>> +    (LANG . (URL [KEYWORD VALUE]...))
>> +
>>  The currently supported keywords:
>>  
>> +`:revision' is the same as REVISION above.
>> +`:source-dir' is the same as SOURCE-DIR above.
>> +`:cc' is the same as CC above.
>> +`:c++' is the same as C++ above.
>> +`:commit' is the same as COMMIT above.
>>  `:copy-queries' when non-nil specifies whether to copy the files
>>  in the \"queries\" directory from the source directory to the
>>  installation directory.")
>
> This is okay, but I guess the keywords are not entirely independent?
> That is, to have a valid spec one needs several keywords to be
> specified together?  In that case, I think this should be stated in
> the doc string.

Actually, the keywords are independent.  This was the reason
to introduce the keywords, so they could be specified separately
from other keywords.




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#78542; Package emacs. (Fri, 20 Jun 2025 17:01:02 GMT) Full text and rfc822 format available.

Message #41 received at 78542 <at> debbugs.gnu.org (full text, mbox):

From: Juri Linkov <juri <at> linkov.net>
To: Daniel Colascione <dancol <at> dancol.org>
Cc: Yuan Fu <casouri <at> gmail.com>, 78542 <at> debbugs.gnu.org,
 Eli Zaretskii <eliz <at> gnu.org>
Subject: Re: bug#78542: [Security] hash locking needed for tree-sitter
 downloads
Date: Fri, 20 Jun 2025 19:56:46 +0300
close 78542 31.0.50
thanks

>> The following patch introduces an alternative format
>> using keywords, e.g.:
>>
>>  (treesit--install-language-grammar-1
>>   (locate-user-emacs-file "tree-sitter") 'json
>>   "https://github.com/tree-sitter/tree-sitter-json"
>>   :commit "4d770d3")
>
> Great. While you're doing this, can you also please use full hashes?
> Short ones aren't particularly collision resistant.

So I have now replaced the tags with full hashes that either
correspond to the previous tags or are mentioned explicitly in the
comments section of the ts-mode files.

> P.P.S. Do we need the list of grammars in build.sh under admin? It
> duplicates what's in Lisp elsewhere in the tree.

I don't know if build.sh is still used or can be removed.
Maybe Yuan could answer.




bug marked as fixed in version 31.0.50, send any further explanations to 78542 <at> debbugs.gnu.org and Daniel Colascione <dancol <at> dancol.org> Request was from Juri Linkov <juri <at> linkov.net> to control <at> debbugs.gnu.org. (Fri, 20 Jun 2025 17:01:03 GMT) Full text and rfc822 format available.

Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#78542; Package emacs. (Fri, 20 Jun 2025 22:39:03 GMT) Full text and rfc822 format available.

Message #46 received at 78542 <at> debbugs.gnu.org (full text, mbox):

From: Peter Oliver <p.d.oliver <at> mavit.org.uk>
To: juri <at> linkov.net
Cc: casouri <at> gmail.com, 78542 <at> debbugs.gnu.org, dancol <at> dancol.org, eliz <at> gnu.org
Subject: Re: bug#78542: [Security] hash locking needed for tree-sitter
Date: Fri, 20 Jun 2025 23:37:47 +0100 (BST)
On Jun 8, 2025, at 10:45 AM, Juri Linkov <juri <at> linkov.net> wrote:

> Here is the current state:
> 
> 3. (treesit--install-language-grammar-1
>    (locate-user-emacs-file "tree-sitter") 'json
>    "https://github.com/tree-sitter/tree-sitter-json"
>    "4d770d3")
>
>  fails to check out "4d770d3" with the error:
>
>  git clone https://github.com/tree-sitter/tree-sitter-json --quiet --depth 1 -b 4d770d3
>  warning: Could not find remote branch 4d770d3 to clone
>  fatal: Remote branch 4d770d3 not found in upstream origin

I’m a bit late to the party, here, but would it make sense to have, say:

  (treesit--install-language-grammar-1
   (locate-user-emacs-file "tree-sitter") 'json
   "https://github.com/tree-sitter/tree-sitter-json"
   :tag "v0.24.8"
   :commit "4d770d31f732d50d3ec373865822fbe659e47c75")

We could then:

  git clone https://github.com/tree-sitter/tree-sitter-json --quiet --depth 1 -b v0.24.8
  git checkout 4d770d31f732d50d3ec373865822fbe659e47c75

Additionally, I think including the tag helps to clarify the intention to anyone reading the code, without them having to go away and refer to the repository to find out about that commit.

-- 
Peter Oliver

Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#78542; Package emacs. (Fri, 20 Jun 2025 23:06:02 GMT) Full text and rfc822 format available.

Message #49 received at 78542 <at> debbugs.gnu.org (full text, mbox):

From: Stéphane Marks <shipmints <at> gmail.com>
To: Peter Oliver <p.d.oliver <at> mavit.org.uk>
Cc: casouri <at> gmail.com, 78542 <at> debbugs.gnu.org, dancol <at> dancol.org, eliz <at> gnu.org,
 juri <at> linkov.net
Subject: Re: bug#78542: [Security] hash locking needed for tree-sitter
Date: Fri, 20 Jun 2025 19:04:55 -0400
On Fri, Jun 20, 2025 at 6:39 PM Peter Oliver <p.d.oliver <at> mavit.org.uk>
wrote:

> On Jun 8, 2025, at 10:45 AM, Juri Linkov <juri <at> linkov.net> wrote:
>
> > Here is the current state:
> >
> > 3. (treesit--install-language-grammar-1
> >    (locate-user-emacs-file "tree-sitter") 'json
> >    "https://github.com/tree-sitter/tree-sitter-json"
> >    "4d770d3")
> >
> >  fails to check out "4d770d3" with the error:
> >
> >  git clone https://github.com/tree-sitter/tree-sitter-json --quiet --depth 1 -b 4d770d3
> >  warning: Could not find remote branch 4d770d3 to clone
> >  fatal: Remote branch 4d770d3 not found in upstream origin
>
> I’m a bit late to the party, here, but would it make sense to have, say:
>
>    (treesit--install-language-grammar-1
>     (locate-user-emacs-file "tree-sitter") 'json
>     "https://github.com/tree-sitter/tree-sitter-json"
>     :tag "v0.24.8"
>     :commit "4d770d31f732d50d3ec373865822fbe659e47c75")
>
> We could then:
>
>    git clone https://github.com/tree-sitter/tree-sitter-json --quiet --depth 1 -b v0.24.8
>    git checkout 4d770d31f732d50d3ec373865822fbe659e47c75
>
> Additionally, I think including the tag helps to clarify the intention to
> anyone reading the code, without them having to go away and refer to the
> repository to find out about that commit.


Git tags aren't really immutable, though: they can be changed to point
to other commits.  If you specify both a commit hash and a tag, and the
tag doesn't point, or no longer points, to that commit, that would be
confusing.  I'd say prioritize commit hashes over tags.  I'm not sure
whether a :tag keyword would act as anything more than documentation;
if not, why not just use a comment?

Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#78542; Package emacs. (Sat, 21 Jun 2025 04:25:02 GMT) Full text and rfc822 format available.

Message #52 received at 78542 <at> debbugs.gnu.org (full text, mbox):

From: Daniel Colascione <dancol <at> dancol.org>
To: Stéphane Marks <shipmints <at> gmail.com>
Cc: 78542 <at> debbugs.gnu.org, casouri <at> gmail.com,
 Peter Oliver <p.d.oliver <at> mavit.org.uk>, eliz <at> gnu.org, juri <at> linkov.net
Subject: Re: bug#78542: [Security] hash locking needed for tree-sitter
Date: Sat, 21 Jun 2025 00:24:27 -0400
Stéphane Marks <shipmints <at> gmail.com> writes:

> On Fri, Jun 20, 2025 at 6:39 PM Peter Oliver <p.d.oliver <at> mavit.org.uk>
> wrote:
>
>> On Jun 8, 2025, at 10:45 AM, Juri Linkov <juri <at> linkov.net> wrote:
>>
>> > Here is the current state:
>> >
>> > 3. (treesit--install-language-grammar-1
>> >    (locate-user-emacs-file "tree-sitter") 'json
>> >    "https://github.com/tree-sitter/tree-sitter-json"
>> >    "4d770d3")
>> >
>> >  fails to check out "4d770d3" with the error:
>> >
>> >  git clone https://github.com/tree-sitter/tree-sitter-json --quiet --depth 1 -b 4d770d3
>> >  warning: Could not find remote branch 4d770d3 to clone
>> >  fatal: Remote branch 4d770d3 not found in upstream origin
>>
>> I’m a bit late to the party, here, but would it make sense to have, say:
>>
>>    (treesit--install-language-grammar-1
>>     (locate-user-emacs-file "tree-sitter") 'json
>>     "https://github.com/tree-sitter/tree-sitter-json"
>>     :tag "v0.24.8"
>>     :commit "4d770d31f732d50d3ec373865822fbe659e47c75")
>>
>> We could then:
>>
>>    git clone https://github.com/tree-sitter/tree-sitter-json --quiet --depth 1 -b v0.24.8
>>    git checkout 4d770d31f732d50d3ec373865822fbe659e47c75
>>
>> Additionally, I think including the tag helps to clarify the intention to
>> anyone reading the code, without them having to go away and refer to the
>> repository to find out about that commit.
>
>
> git tags aren't really immutable, though, as they can be changed to point
> to other commits.  If you want to specify both a commit hash and a tag and
> the tag doesn't or no longer points to that commit, that would be
> confusing.

Or an error. I guess you could include tag names to allow for some kind
of UX shorthand while verifying, using the hashes, that the tags still
refer to their designated trees.
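
A minimal shell sketch of that check, assuming the grammar repository has already been cloned locally: the tag serves only as a readable label, and the full commit hash is what gets verified. The helper names `is_full_sha1` and `tag_matches_commit` are hypothetical and are not part of Emacs or Git.

```shell
# Hypothetical helper: true only for a full, 40-hex-digit SHA-1 object name.
is_full_sha1() {
    printf '%s\n' "$1" | grep -Eq '^[0-9a-f]{40}$'
}

# Hypothetical helper: succeed (0) only if TAG in the clone at DIR still
# resolves to the commit EXPECTED.  Returns 1 on a mismatch and 2 when the
# arguments cannot be checked at all.
tag_matches_commit() {
    dir=$1 tag=$2 expected=$3
    is_full_sha1 "$expected" || {
        echo "not a full SHA-1: $expected" >&2
        return 2
    }
    # "^{commit}" peels an annotated tag down to the commit it points at.
    actual=$(git -C "$dir" rev-parse --verify --quiet "refs/tags/$tag^{commit}") || {
        echo "tag $tag not found in $dir" >&2
        return 2
    }
    [ "$actual" = "$expected" ]
}
```

Treating a mismatch as a hard failure, rather than silently preferring one of the two values, matches the "Or an error" reading above.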




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#78542; Package emacs. (Sat, 21 Jun 2025 06:28:02 GMT) Full text and rfc822 format available.

Message #55 received at 78542 <at> debbugs.gnu.org (full text, mbox):

From: Eli Zaretskii <eliz <at> gnu.org>
To: Juri Linkov <juri <at> linkov.net>
Cc: 78542 <at> debbugs.gnu.org, casouri <at> gmail.com, dancol <at> dancol.org
Subject: Re: bug#78542: [Security] hash locking needed for tree-sitter
 downloads
Date: Sat, 21 Jun 2025 09:27:38 +0300
> From: Juri Linkov <juri <at> linkov.net>
> Cc: dancol <at> dancol.org,  casouri <at> gmail.com,  78542 <at> debbugs.gnu.org
> Date: Fri, 20 Jun 2025 19:48:09 +0300
> 
> >>  The value should be an alist where each element has the form
> >>  
> >> -    (LANG . (URL REVISION SOURCE-DIR CC C++ COMMIT [KEYWORD VALUE]...))
> >> +    (LANG . (URL REVISION SOURCE-DIR CC C++ COMMIT))
> >>  
> >>  Only LANG and URL are mandatory.  LANG is the language symbol.
> >>  URL is the URL of the grammar's Git repository or a directory
> >> @@ -5015,8 +5015,17 @@ treesit-language-source-alist
> >>  CC and C++ are C and C++ compilers, defaulting to \"cc\" and
> >>  \"c++\", respectively.
> >>  
> >> +Another way to specify optional data is to use keywords:
> >> +
> >> +    (LANG . (URL [KEYWORD VALUE]...))
> >> +
> >>  The currently supported keywords:
> >>  
> >> +`:revision' is the same as REVISION above.
> >> +`:source-dir' is the same as SOURCE-DIR above.
> >> +`:cc' is the same as CC above.
> >> +`:c++' is the same as C++ above.
> >> +`:commit' is the same as COMMIT above.
> >>  `:copy-queries' when non-nil specifies whether to copy the files
> >>  in the \"queries\" directory from the source directory to the
> >>  installation directory.")
> >
> > This is okay, but I guess the keywords are not entirely independent?
> > That is, to have a valid spec one needs several keywords to be
> > specified together?  In that case, I think this should be stated in
> > the doc string.
> 
> Actually, the keywords are independent.

You mean, it's okay to have just the :source-dir, say, and nothing
else, and that would produce a complete specification that could be
used to install or upgrade the grammar library?




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#78542; Package emacs. (Sat, 21 Jun 2025 10:52:04 GMT) Full text and rfc822 format available.

Message #58 received at 78542 <at> debbugs.gnu.org (full text, mbox):

From: Stéphane Marks <shipmints <at> gmail.com>
To: Daniel Colascione <dancol <at> dancol.org>
Cc: 78542 <at> debbugs.gnu.org, casouri <at> gmail.com,
 Peter Oliver <p.d.oliver <at> mavit.org.uk>, eliz <at> gnu.org, juri <at> linkov.net
Subject: Re: bug#78542: [Security] hash locking needed for tree-sitter
Date: Sat, 21 Jun 2025 06:51:01 -0400
On Sat, Jun 21, 2025 at 12:24 AM Daniel Colascione <dancol <at> dancol.org>
wrote:

> Stéphane Marks <shipmints <at> gmail.com> writes:
>
> > On Fri, Jun 20, 2025 at 6:39 PM Peter Oliver <p.d.oliver <at> mavit.org.uk>
> > wrote:
> >
> >> On Jun 8, 2025, at 10:45 AM, Juri Linkov <juri <at> linkov.net> wrote:
> >>
> >> > Here is the current state:
> >> >
> >> > 3. (treesit--install-language-grammar-1
> >> >    (locate-user-emacs-file "tree-sitter") 'json
> >> >    "https://github.com/tree-sitter/tree-sitter-json"
> >> >    "4d770d3")
> >> >
> >> >  fails to check out "4d770d3" with the error:
> >> >
> >> >  git clone https://github.com/tree-sitter/tree-sitter-json --quiet --depth 1 -b 4d770d3
> >> >  warning: Could not find remote branch 4d770d3 to clone
> >> >  fatal: Remote branch 4d770d3 not found in upstream origin
> >>
> >> I’m a bit late to the party, here, but would it make sense to have, say:
> >>
> >>    (treesit--install-language-grammar-1
> >>     (locate-user-emacs-file "tree-sitter") 'json
> >>     "https://github.com/tree-sitter/tree-sitter-json"
> >>     :tag "v0.24.8"
> >>     :commit "4d770d31f732d50d3ec373865822fbe659e47c75")
> >>
> >> We could then:
> >>
> >>    git clone https://github.com/tree-sitter/tree-sitter-json --quiet --depth 1 -b v0.24.8
> >>    git checkout 4d770d31f732d50d3ec373865822fbe659e47c75
> >>
> >> Additionally, I think including the tag helps to clarify the intention to
> >> anyone reading the code, without them having to go away and refer to the
> >> repository to find out about that commit.
> >
> >
> > git tags aren't really immutable, though, as they can be changed to point
> > to other commits.  If you want to specify both a commit hash and a tag and
> > the tag doesn't or no longer points to that commit, that would be
> > confusing.
>
> Or an error. I guess you could include tag names to allow for some kind
> of UX shorthand while verifying, using the hashes, that the tags still
> refer to their designated trees.
>

Good.

Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#78542; Package emacs. (Sun, 22 Jun 2025 07:00:02 GMT) Full text and rfc822 format available.

Message #61 received at 78542 <at> debbugs.gnu.org (full text, mbox):

From: Juri Linkov <juri <at> linkov.net>
To: Eli Zaretskii <eliz <at> gnu.org>
Cc: 78542 <at> debbugs.gnu.org, casouri <at> gmail.com, dancol <at> dancol.org
Subject: Re: bug#78542: [Security] hash locking needed for tree-sitter
 downloads
Date: Sun, 22 Jun 2025 09:44:49 +0300
>> >>  The value should be an alist where each element has the form
>> >>  
>> >> -    (LANG . (URL REVISION SOURCE-DIR CC C++ COMMIT [KEYWORD VALUE]...))
>> >> +    (LANG . (URL REVISION SOURCE-DIR CC C++ COMMIT))
>> >>  
>> >>  Only LANG and URL are mandatory.  LANG is the language symbol.
>> >>  URL is the URL of the grammar's Git repository or a directory
>> >> @@ -5015,8 +5015,17 @@ treesit-language-source-alist
>> >>  CC and C++ are C and C++ compilers, defaulting to \"cc\" and
>> >>  \"c++\", respectively.
>> >>  
>> >> +Another way to specify optional data is to use keywords:
>> >> +
>> >> +    (LANG . (URL [KEYWORD VALUE]...))
>> >> +
>> >>  The currently supported keywords:
>> >>  
>> >> +`:revision' is the same as REVISION above.
>> >> +`:source-dir' is the same as SOURCE-DIR above.
>> >> +`:cc' is the same as CC above.
>> >> +`:c++' is the same as C++ above.
>> >> +`:commit' is the same as COMMIT above.
>> >>  `:copy-queries' when non-nil specifies whether to copy the files
>> >>  in the \"queries\" directory from the source directory to the
>> >>  installation directory.")
>> >
>> > This is okay, but I guess the keywords are not entirely independent?
>> > That is, to have a valid spec one needs several keywords to be
>> > specified together?  In that case, I think this should be stated in
>> > the doc string.
>> 
>> Actually, the keywords are independent.
>
> You mean, it's okay to have just the :source-dir, say, and nothing
> else, and that would produce a complete specification that could be
> used to install or upgrade the grammar library?

Yes, specifying just :source-dir means using that subdirectory from the HEAD commit.




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#78542; Package emacs. (Sun, 22 Jun 2025 07:01:02 GMT) Full text and rfc822 format available.

Message #64 received at 78542 <at> debbugs.gnu.org (full text, mbox):

From: Juri Linkov <juri <at> linkov.net>
To: Peter Oliver <p.d.oliver <at> mavit.org.uk>
Cc: casouri <at> gmail.com, 78542 <at> debbugs.gnu.org, dancol <at> dancol.org, eliz <at> gnu.org
Subject: Re: bug#78542: [Security] hash locking needed for tree-sitter
Date: Sun, 22 Jun 2025 09:53:31 +0300
>> 3. (treesit--install-language-grammar-1
>>    (locate-user-emacs-file "tree-sitter") 'json
>>    "https://github.com/tree-sitter/tree-sitter-json"
>>    "4d770d3")
>>
>>  fails to check out "4d770d3" with the error:
>>
>>  git clone https://github.com/tree-sitter/tree-sitter-json --quiet --depth 1 -b 4d770d3
>>  warning: Could not find remote branch 4d770d3 to clone
>>  fatal: Remote branch 4d770d3 not found in upstream origin
>
> I’m a bit late to the party, here, but would it make sense to have, say:
>
>   (treesit--install-language-grammar-1
>    (locate-user-emacs-file "tree-sitter") 'json
>    "https://github.com/tree-sitter/tree-sitter-json"
>    :tag "v0.24.8"
>    :commit "4d770d31f732d50d3ec373865822fbe659e47c75")
>
> We could then:
>
>   git clone https://github.com/tree-sitter/tree-sitter-json --quiet --depth 1 -b v0.24.8
>   git checkout 4d770d31f732d50d3ec373865822fbe659e47c75

This fails with

  fatal: reference is not a tree: 4d770d31f732d50d3ec373865822fbe659e47c75

because the required commit is later than the tag.
This is indicated in the comments section of json-ts-mode.el:

  ;; - tree-sitter-json: v0.24.8-1-g4d770d3

> Additionally, I think including the tag helps to clarify the intention
> to anyone reading the code, without them having to go away and refer
> to the repository to find out about that commit.

Anyone reading the code could look into the comments section where tags
are generated by treesit-admin using 'treesit--language-git-revision'.
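
The string v0.24.8-1-g4d770d3 quoted above is in `git describe` format: the nearest tag, the number of commits after it, and the abbreviated hash. A rough shell sketch of splitting it apart shows why the shallow clone of the tag cannot contain the pinned commit, which is one commit later. The function name `parse_describe` is hypothetical, and the sketch assumes lowercase hexadecimal hashes.

```shell
# Hypothetical helper: split a `git describe` string into
# "TAG COMMITS-SINCE-TAG ABBREVIATED-HASH".
parse_describe() {
    desc=$1
    # Only strings ending in "-<count>-g<hex>" carry the extra fields;
    # anything else is treated as a bare tag with zero commits on top.
    if printf '%s\n' "$desc" | grep -Eq -- '-[0-9]+-g[0-9a-f]+$'; then
        abbrev=${desc##*-g}   # text after the last "-g"
        rest=${desc%-g*}      # everything before that "-g"
        count=${rest##*-}     # number of commits after the tag
        tag=${rest%-*}        # the tag name itself
        printf '%s %s %s\n' "$tag" "$count" "$abbrev"
    else
        printf '%s 0 -\n' "$desc"
    fi
}

parse_describe v0.24.8-1-g4d770d3   # prints "v0.24.8 1 4d770d3"
```

A count greater than zero means `--depth 1 -b TAG` fetches only an ancestor of the wanted commit, hence the "reference is not a tree" failure.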




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#78542; Package emacs. (Mon, 23 Jun 2025 01:48:02 GMT) Full text and rfc822 format available.

Message #67 received at 78542 <at> debbugs.gnu.org (full text, mbox):

From: Dmitry Gutov <dmitry <at> gutov.dev>
To: Juri Linkov <juri <at> linkov.net>, Yuan Fu <casouri <at> gmail.com>
Cc: Eli Zaretskii <eliz <at> gnu.org>, 78542 <at> debbugs.gnu.org, dancol <at> dancol.org
Subject: Re: bug#78542: [Security] hash locking needed for tree-sitter
 downloads
Date: Mon, 23 Jun 2025 04:47:32 +0300
On 10/06/2025 09:23, Juri Linkov wrote:
> This still keeps full history.  This means we could simply
> set the default value of treesit--install-language-grammar-full-clone
> to t, or completely remove this variable, if there is no way
> to clone at a specific commit without fetching full history?

This SO answer gives two solutions: https://stackoverflow.com/a/43136160

The first (shorter) one requires a very recent Git client to be
installed - something for us to note for the future.

The second just requires a suitably configured Git server, which GitHub's
servers are. Quoting it here:

  git init
  git remote add origin <url>
  git fetch --depth 1 origin <sha1>
  git checkout FETCH_HEAD

The sha1 value must be full, but those are what we decided to use already.
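
The recipe above could be wrapped roughly as follows. This is only a sketch: the function name `fetch_pinned_commit` is hypothetical, and it assumes the server permits fetching a commit by its full hash, as GitHub is said to do above. Abbreviated hashes are rejected before any network access.

```shell
# Hypothetical helper: shallow-fetch exactly one commit, named by its full
# SHA-1, from URL into a fresh repository at DEST.
fetch_pinned_commit() {
    url=$1 sha=$2 dest=$3
    # The server will not resolve an abbreviated hash, so insist on all
    # 40 lowercase hexadecimal digits.
    if [ "${#sha}" -ne 40 ]; then
        echo "need a full 40-character SHA-1, got: $sha" >&2
        return 2
    fi
    case $sha in
        *[!0-9a-f]*)
            echo "not a hexadecimal SHA-1: $sha" >&2
            return 2 ;;
    esac
    git init --quiet "$dest" &&
        git -C "$dest" remote add origin "$url" &&
        git -C "$dest" fetch --quiet --depth 1 origin "$sha" &&
        git -C "$dest" checkout --quiet FETCH_HEAD &&
        # Confirm that what was checked out is what was asked for.
        [ "$(git -C "$dest" rev-parse HEAD)" = "$sha" ]
}
```

For the json grammar discussed in this thread, the call would look like `fetch_pinned_commit https://github.com/tree-sitter/tree-sitter-json 4d770d31f732d50d3ec373865822fbe659e47c75 ./tree-sitter-json`.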




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#78542; Package emacs. (Mon, 23 Jun 2025 06:43:02 GMT) Full text and rfc822 format available.

Message #70 received at 78542 <at> debbugs.gnu.org (full text, mbox):

From: Juri Linkov <juri <at> linkov.net>
To: Dmitry Gutov <dmitry <at> gutov.dev>
Cc: Yuan Fu <casouri <at> gmail.com>, 78542 <at> debbugs.gnu.org, dancol <at> dancol.org,
 Eli Zaretskii <eliz <at> gnu.org>
Subject: Re: bug#78542: [Security] hash locking needed for tree-sitter
 downloads
Date: Mon, 23 Jun 2025 09:39:20 +0300
>> This still keeps full history.  This means we could simply
>> set the default value of treesit--install-language-grammar-full-clone
>> to t, or completely remove this variable, if there is no way
>> to clone at a specific commit without fetching full history?
>
> This SO answer gives two solutions: https://stackoverflow.com/a/43136160
>
> The first (shorter one) requires the very latest Git client to be installed
> - something for us to note for the future.

Good news!  The new --revision option, added in March 2025, was long overdue;
it should have been added together with the --branch option.

> The second just requires a suitable configured Git server, which Github
> servers are. Quoting it here:
>
>   git init
>   git remote add origin <url>
>   git fetch --depth 1 origin <sha1>
>   git checkout FETCH_HEAD
>
> The sha1 value must be full, but those are what we decided to use already.

When I tried various similar recipes, they all failed.  Maybe because I tried
with abbreviated SHA1s.  However, with the full SHA1 this seems to work.
I don't know how reliable this method is, since it requires setting
uploadpack.allowReachableSHA1InWant=true on the server side.

Otherwise, let's wait until the new --revision option becomes more widespread.




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#78542; Package emacs. (Mon, 23 Jun 2025 15:47:02 GMT) Full text and rfc822 format available.

Message #73 received at 78542 <at> debbugs.gnu.org (full text, mbox):

From: Dmitry Gutov <dmitry <at> gutov.dev>
To: Juri Linkov <juri <at> linkov.net>
Cc: Yuan Fu <casouri <at> gmail.com>, 78542 <at> debbugs.gnu.org, dancol <at> dancol.org,
 Eli Zaretskii <eliz <at> gnu.org>
Subject: Re: bug#78542: [Security] hash locking needed for tree-sitter
 downloads
Date: Mon, 23 Jun 2025 18:46:21 +0300
On 23/06/2025 09:39, Juri Linkov wrote:
> When I tried various similar recipes, they all failed.  Maybe because I tried
> with abbreviated SHA1s.  However, with the full SHA1 this seems to work.
> I don't know how reliable this method is, since it requires setting
> uploadpack.allowReachableSHA1InWant=true on the server side.

I wonder if the new --revision option relies on that server setting 
anyway (how else would it be implemented?)

> Otherwise, let's wait until the new --revision option becomes more widespread.

Might take a few years (or 5-10). This script runs on the user's 
machine, and historically we've been hesitant to up the requirements on 
the installed version of Git.




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#78542; Package emacs. (Mon, 23 Jun 2025 16:54:05 GMT) Full text and rfc822 format available.

Message #76 received at 78542 <at> debbugs.gnu.org (full text, mbox):

From: Juri Linkov <juri <at> linkov.net>
To: Dmitry Gutov <dmitry <at> gutov.dev>
Cc: Yuan Fu <casouri <at> gmail.com>, 78542 <at> debbugs.gnu.org, dancol <at> dancol.org,
 Eli Zaretskii <eliz <at> gnu.org>
Subject: Re: bug#78542: [Security] hash locking needed for tree-sitter
 downloads
Date: Mon, 23 Jun 2025 19:51:35 +0300
>> When I tried various similar recipes, they all failed.  Maybe because I tried
>> with abbreviated SHA1s.  However, with the full SHA1 this seems to work.
>> I don't know how reliable this method is, since it requires setting
>> uploadpack.allowReachableSHA1InWant=true on the server side.
>
> I wonder if the new --revision option relies on that server setting anyway
> (how else would it be implemented?)

Can't find any mentions of allowReachableSHA1InWant in
https://github.com/git/git/commit/337855629f59a3f435dabef900e22202ce8e00e1

Probably because --revision is a simplified and limited version of --branch:

  Option `--revision` on contrary detaches HEAD, creates no tracking
  branches, and writes no fetch refspec.

>> Otherwise, let's wait until the new --revision option becomes more widespread.
>
> Might take a few years (or 5-10). This script runs on the user's machine,
> and historically we've been hesitant to up the requirements on the
> installed version of Git.

Meanwhile we could use something like
(version<= "2.49.0" (vc-git--program-version))
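
For comparison, the same gate can be sketched outside Emacs in shell. This is an illustration only: `version_at_least` and `git_supports_revision` are hypothetical names, and `sort -V` (version sort) is assumed to be available, as it is in GNU coreutils.

```shell
# Hypothetical helper: true when version $2 is at least version $1.
version_at_least() {
    # After a version sort, the smaller of the two comes first, so $1 must
    # be the first line whenever $2 >= $1.
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$1" ]
}

# Hypothetical helper: true when the installed Git is new enough (2.49.0,
# the threshold used above) to understand `git clone --revision`.
git_supports_revision() {
    # `git --version` prints e.g. "git version 2.49.0"; take the third field.
    v=$(git --version 2>/dev/null | awk '{print $3}')
    [ -n "$v" ] && version_at_least 2.49.0 "$v"
}
```

A caller could try the `--revision` path when `git_supports_revision` succeeds and fall back to the init/fetch recipe otherwise.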




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#78542; Package emacs. (Tue, 24 Jun 2025 00:46:02 GMT) Full text and rfc822 format available.

Message #79 received at 78542 <at> debbugs.gnu.org (full text, mbox):

From: Dmitry Gutov <dmitry <at> gutov.dev>
To: Juri Linkov <juri <at> linkov.net>
Cc: Yuan Fu <casouri <at> gmail.com>, 78542 <at> debbugs.gnu.org, dancol <at> dancol.org,
 Eli Zaretskii <eliz <at> gnu.org>
Subject: Re: bug#78542: [Security] hash locking needed for tree-sitter
 downloads
Date: Tue, 24 Jun 2025 03:45:15 +0300
On 23/06/2025 19:51, Juri Linkov wrote:
>>> When I tried various similar recipes, they all failed.  Maybe because I tried
>>> with abbreviated SHA1s.  However, with the full SHA1 this seems to work.
>>> I don't know how reliable this method is, since it requires setting
>>> uploadpack.allowReachableSHA1InWant=true on the server side.
>>
>> I wonder if the new --revision option relies on that server setting anyway
>> (how else would it be implemented?)
> 
> Can't find any mentions of allowReachableSHA1InWant in
> https://github.com/git/git/commit/337855629f59a3f435dabef900e22202ce8e00e1

I think that's client code. Whereas the setting above would be on the 
server.

> Probably because --revision is a simplified and limited version of --branch:
> 
>    Option `--revision` on contrary detaches HEAD, creates no tracking
>    branches, and writes no fetch refspec.

It still needs to know how to fetch a single revision that is not
referenced by a tag or similar. Maybe that capability was already available
internally - but then why would the server setting be needed for
the "old" solution, the one using 'git fetch'?

>>> Otherwise, let's wait until the new --revision option becomes more widespread.
>>
>> Might take a few years (or 5-10). This script runs on the user's machine,
>> and historically we've been hesitant to up the requirements on the
>> installed version of Git.
> 
> Meanwhile we could use something like
> (version<= "2.49.0" (vc-git--program-version))

Yeah, that should work.




bug archived. Request was from Debbugs Internal Request <help-debbugs <at> gnu.org> to internal_control <at> debbugs.gnu.org. (Tue, 22 Jul 2025 11:24:05 GMT) Full text and rfc822 format available.

This bug report was last modified 23 days ago.


