GNU bug report logs - #78542
[Security] hash locking needed for tree-sitter downloads


Package: emacs;

Reported by: Daniel Colascione <dancol <at> dancol.org>

Date: Wed, 21 May 2025 19:13:04 UTC

Severity: normal

To reply to this bug, email your comments to 78542 AT debbugs.gnu.org.


Report forwarded to bug-gnu-emacs <at> gnu.org:
bug#78542; Package emacs. (Wed, 21 May 2025 19:13:04 GMT)

Acknowledgement sent to Daniel Colascione <dancol <at> dancol.org>:
New bug report received and forwarded. Copy sent to bug-gnu-emacs <at> gnu.org. (Wed, 21 May 2025 19:13:04 GMT)

Message #5 received at submit <at> debbugs.gnu.org:

From: Daniel Colascione <dancol <at> dancol.org>
To: bug-gnu-emacs <at> gnu.org
Subject: [Security] hash locking needed for tree-sitter downloads
Date: Wed, 21 May 2025 15:12:32 -0400

When downloading code, a tag isn't good enough.  We should insist on a
specific commit.

We have a fair bit of code in Emacs that looks like this:

(add-to-list
 'treesit-language-source-alist
 '(javascript "https://github.com/tree-sitter/tree-sitter-javascript" "v0.23.1")
 t)
(add-to-list
 'treesit-language-source-alist
 '(jsdoc "https://github.com/tree-sitter/tree-sitter-jsdoc" "v0.23.2")
 t)

The entries in treesit-language-source-alist mostly have tags but not
commit hashes.  The expected commit hash should be *mandatory*, because
right now, anyone with access to one of these repositories can retarget
any of those tags at malicious code.

See https://snyk.io/blog/npm-security-preventing-supply-chain-attacks/

Every other important language ecosystem has evolved some kind of "hash
locking" capability for breaking the author-retargets-to-malware attack
vector.  We should too.  We shouldn't allow the commit hash to be absent
for ordinary users.
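
As an illustration of the "mandatory hash" idea, here is a minimal sketch in shell. It is not existing Emacs code: the helper names `is_full_commit_hash` and `verify_checkout` are hypothetical, and the sketch assumes the pinned revision is written as a full lowercase hash (40 hex digits for SHA-1, 64 for SHA-256), which is the form `git rev-parse` prints. The first helper rejects tags, branch names and abbreviated hashes; the second compares what was actually checked out against the pinned value, so that a retargeted tag would be detected rather than built.

```shell
# Hypothetical helper (not part of Emacs): succeed only if REV looks like a
# full Git object name, i.e. 40 or 64 lowercase hex digits.
is_full_commit_hash() {
  rev=$1
  case $rev in
    ''|*[!0-9a-f]*) return 1 ;;   # empty, or contains a non-hex character
  esac
  [ "${#rev}" -eq 40 ] || [ "${#rev}" -eq 64 ]
}

# Hypothetical helper: compare the commit checked out in DIR with the pinned
# hash EXPECTED.  Returns 0 on match, 1 on mismatch, 2 on a usage or Git error.
verify_checkout() {
  dir=$1 expected=$2
  if ! is_full_commit_hash "$expected"; then
    echo "not a full commit hash: $expected" >&2
    return 2
  fi
  actual=$(git -C "$dir" rev-parse HEAD) || return 2
  if [ "$actual" != "$expected" ]; then
    echo "hash mismatch: expected $expected, got $actual" >&2
    return 1
  fi
}
```

With helpers like these, a value such as "v0.23.1" or "4d770d3" would be refused up front, and a build would proceed only when `verify_checkout` succeeds.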

P.S. we've debated vendoring these grammars with Emacs.  I still think
that's the right way to go.  But if we're going to download and build,
we should at least do it in a secure way.

P.P.S. Do we need the list of grammars in build.sh under admin? It
duplicates what's in Lisp elsewhere in the tree.




Message #8 received at 78542 <at> debbugs.gnu.org:

From: Juri Linkov <juri <at> linkov.net>
To: Daniel Colascione <dancol <at> dancol.org>
Cc: Yuan Fu <casouri <at> gmail.com>, 78542 <at> debbugs.gnu.org
Subject: Re: bug#78542: [Security] hash locking needed for tree-sitter
 downloads
Date: Thu, 22 May 2025 09:36:57 +0300

> When downloading code, a tag isn't good enough.  We should insist on a
> specific commit.
> [...]
> The entries in treesit-language-source-alist mostly have tags but not
> commit hashes.  The expected commit hash should be *mandatory*, because
> right now, anyone with access to one of these repositories can retarget
> any of those tags at malicious code.

Indeed, tags can be easily relocated to a different commit.

> Every other important language ecosystem has evolved some kind of "hash
> locking" capability for breaking the author-retargets-to-malware attack
> vector.  We should too.  We shouldn't allow the commit hash to be absent
> for ordinary users.

Agreed, "hash locking" should lock commit hashes, not tags.

> P.S. we've debated vendoring these grammars with Emacs.  I still think
> that's the right way to go.  But if we're going to download and build,
> we should at least do it in a secure way.

The only reason tags are currently used instead of commit hashes is
that the current implementation cannot check out a specific commit
while 'treesit--install-language-grammar-full-clone' has its default
value of nil.

> P.P.S. Do we need the list of grammars in build.sh under admin? It
> duplicates what's in Lisp elsewhere in the tree.

Apparently not, so the list could be removed.




Message #11 received at 78542 <at> debbugs.gnu.org:

From: Eli Zaretskii <eliz <at> gnu.org>
To: Juri Linkov <juri <at> linkov.net>, casouri <at> gmail.com
Cc: 78542 <at> debbugs.gnu.org, dancol <at> dancol.org
Subject: Re: bug#78542: [Security] hash locking needed for tree-sitter
 downloads
Date: Sat, 07 Jun 2025 11:05:51 +0300

Ping!  Do we want to make some progress here?




Message #14 received at 78542 <at> debbugs.gnu.org:

From: Juri Linkov <juri <at> linkov.net>
To: Eli Zaretskii <eliz <at> gnu.org>
Cc: 78542 <at> debbugs.gnu.org, casouri <at> gmail.com, dancol <at> dancol.org
Subject: Re: bug#78542: [Security] hash locking needed for tree-sitter
 downloads
Date: Sun, 08 Jun 2025 20:45:42 +0300

>> The only reason currently tags are used instead of commit hashes is
>> because there is no way to checkout a specific commit with the
>> current implementation when the default value of
>> 'treesit--install-language-grammar-full-clone' is nil.

Here is the current state:

1. (treesit--install-language-grammar-1
    (locate-user-emacs-file "tree-sitter") 'json
    "https://github.com/tree-sitter/tree-sitter-json")

  installs the latest commit 46aa487.

2. (treesit--install-language-grammar-1
    (locate-user-emacs-file "tree-sitter") 'json
    "https://github.com/tree-sitter/tree-sitter-json"
    "v0.24.8")

  installs the commit ee35a6e tagged v0.24.8.

3. (treesit--install-language-grammar-1
    (locate-user-emacs-file "tree-sitter") 'json
    "https://github.com/tree-sitter/tree-sitter-json"
    "4d770d3")

  fails to check out "4d770d3" with the error:

  git clone https://github.com/tree-sitter/tree-sitter-json --quiet --depth 1 -b 4d770d3
  warning: Could not find remote branch 4d770d3 to clone
  fatal: Remote branch 4d770d3 not found in upstream origin

4. (treesit--install-language-grammar-1
    (locate-user-emacs-file "tree-sitter") 'json
    "https://github.com/tree-sitter/tree-sitter-json"
    nil nil nil nil "4d770d3")

  fails to check out "4d770d3" with the error:

  git -C /tmp/treesit-workdirHhEIhg/repo checkout 4d770d3
  error: pathspec '4d770d3' did not match any file(s) known to git

After (setq treesit--install-language-grammar-full-clone t):

5. (treesit--install-language-grammar-1
    (locate-user-emacs-file "tree-sitter") 'json
    "https://github.com/tree-sitter/tree-sitter-json"
    "4d770d3")

  successfully installs the commit "v0.24.8-1-g4d770d3".

When treesit--install-language-grammar-full-clone is nil,
"--depth 1" is added to "git clone".

So we need a Git guru to recommend a command line to use
"git clone" with "--depth 1" to check out a single commit.

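One commonly used pattern for this, offered as a sketch rather than a tested patch for treesit.el: skip "git clone", create an empty repository with "git init", fetch exactly the wanted commit with "--depth=1", and check out FETCH_HEAD. This assumes the server permits fetching a commit by its object name (on the server side this is governed by settings such as uploadpack.allowReachableSHA1InWant), and it needs the full hash, because an abbreviated one such as 4d770d3 is treated as a ref name on the fetch command line. The function name and its arguments are hypothetical.

```shell
# Hypothetical helper: fetch exactly one commit, without any history.
# URL is the repository, SHA the full commit hash, DEST the target directory.
fetch_single_commit() {
  url=$1 sha=$2 dest=$3
  git init --quiet "$dest" &&
  git -C "$dest" remote add origin "$url" &&
  git -C "$dest" fetch --quiet --depth=1 origin "$sha" &&
  git -C "$dest" checkout --quiet FETCH_HEAD   # leaves a detached HEAD at SHA
}

# Example call (needs network access; <full-commit-hash> is a placeholder for
# the complete hash of the commit to install):
# fetch_single_commit https://github.com/tree-sitter/tree-sitter-json \
#     <full-commit-hash> /tmp/tree-sitter-json
```

Because the hash is requested directly, no tag is consulted at all, which is also what the original report asks for.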



Message #17 received at 78542 <at> debbugs.gnu.org:

From: Yuan Fu <casouri <at> gmail.com>
To: Juri Linkov <juri <at> linkov.net>
Cc: 78542 <at> debbugs.gnu.org, Eli Zaretskii <eliz <at> gnu.org>, dancol <at> dancol.org
Subject: Re: bug#78542: [Security] hash locking needed for tree-sitter
 downloads
Date: Mon, 9 Jun 2025 18:38:03 -0700

> On Jun 8, 2025, at 10:45 AM, Juri Linkov <juri <at> linkov.net> wrote:
> 
> [...]
> 
> When treesit--install-language-grammar-full-clone is nil,
> "--depth 1" is added to "git clone".
> 
> So we need a Git guru to recommend a command line to use
> "git clone" with "--depth 1" to check out a single commit.

Would it work if we did a blobless full clone, checked out the commit, and then fetched with --depth=1? E.g.,

git clone https://github.com/tree-sitter/tree-sitter-json.git --filter=blob:none
cd tree-sitter-json
git checkout 4d770d3
git fetch --depth=1

Yuan



Message #20 received at 78542 <at> debbugs.gnu.org:

From: Juri Linkov <juri <at> linkov.net>
To: Yuan Fu <casouri <at> gmail.com>
Cc: 78542 <at> debbugs.gnu.org, Eli Zaretskii <eliz <at> gnu.org>, dancol <at> dancol.org
Subject: Re: bug#78542: [Security] hash locking needed for tree-sitter
 downloads
Date: Tue, 10 Jun 2025 09:23:31 +0300

>> When treesit--install-language-grammar-full-clone is nil,
>> "--depth 1" is added to "git clone".
>> 
>> So we need a Git guru to recommend a command line to use
>> "git clone" with "--depth 1" to check out a single commit.
>
> Would it work if we do a blobless full clone, checkout the commit, and fetch depth=1? Eg,
>
> git clone https://github.com/tree-sitter/tree-sitter-json.git --filter=blob:none
> cd tree-sitter-json
> git checkout 4d770d3
> git fetch --depth=1

This still keeps the full history.  If there is no way to clone at a
specific commit without fetching the full history, should we simply
change the default value of treesit--install-language-grammar-full-clone
to t, or remove this variable completely?




Message #23 received at 78542 <at> debbugs.gnu.org:

From: Daniel Colascione <dancol <at> dancol.org>
To: Juri Linkov <juri <at> linkov.net>
Cc: Yuan Fu <casouri <at> gmail.com>, 78542 <at> debbugs.gnu.org,
 Eli Zaretskii <eliz <at> gnu.org>
Subject: Re: bug#78542: [Security] hash locking needed for tree-sitter
 downloads
Date: Tue, 10 Jun 2025 00:44:30 -0700

Juri Linkov <juri <at> linkov.net> writes:

>>> When treesit--install-language-grammar-full-clone is nil,
>>> "--depth 1" is added to "git clone".
>>> 
>>> So we need a Git guru to recommend a command line to use
>>> "git clone" with "--depth 1" to check out a single commit.
>>
>> Would it work if we do a blobless full clone, checkout the commit, and fetch depth=1? Eg,
>>
>> git clone https://github.com/tree-sitter/tree-sitter-json.git --filter=blob:none
>> cd tree-sitter-json
>> git checkout 4d770d3
>> git fetch --depth=1
>
> This still keeps full history.

There's a difference between full history and all blobs for all
revisions in this history.  You can also use --shallow-since during the
clone with a date to further limit history.  --shallow-exclude would
probably work even better, since you wouldn't need a date, but it's
broken for me somehow, at least with the repository above.
But --shallow-since works.
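
To make the distinction concrete, here is a sketch of the two variants discussed in this message. "--filter=blob:none" keeps every commit and tree but downloads file contents only when a checkout needs them, while "--shallow-since" truncates the commit history itself at a date. The helper names, the cutoff date and the destination paths are placeholders, and neither variant has been tried against treesit.el.

```shell
# Variant 1 (hypothetical helper): blobless clone, then check out the pinned
# commit.  The full commit history is present; blobs arrive on demand.
clone_blobless_at() {
  url=$1 sha=$2 dest=$3
  git clone --quiet --filter=blob:none --no-checkout "$url" "$dest" &&
  git -C "$dest" checkout --quiet "$sha"
}

# Variant 2 (hypothetical helper): history limited to commits newer than
# SINCE (a date Git understands, e.g. 2025-01-01).  The pinned commit must
# lie within the fetched history, otherwise the checkout fails.
clone_since_at() {
  url=$1 sha=$2 since=$3 dest=$4
  git clone --quiet --shallow-since="$since" --no-checkout "$url" "$dest" &&
  git -C "$dest" checkout --quiet "$sha"
}
```

The second variant needs a date that is known to precede the pinned commit, which is the drawback noted above for "--shallow-since" compared with "--shallow-exclude".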




This bug report was last modified 7 days ago.
