GNU bug report logs - #78542
[Security] hash locking needed for tree-sitter downloads
To reply to this bug, email your comments to 78542 AT debbugs.gnu.org.
Report forwarded to bug-gnu-emacs <at> gnu.org: bug#78542; Package emacs. (Wed, 21 May 2025 19:13:04 GMT)
Acknowledgement sent to Daniel Colascione <dancol <at> dancol.org>: New bug report received and forwarded. Copy sent to bug-gnu-emacs <at> gnu.org. (Wed, 21 May 2025 19:13:04 GMT)
Message #5 received at submit <at> debbugs.gnu.org (full text, mbox):
When downloading code, a tag isn't good enough. We should insist on a
specific commit.
We have a fair bit of code in Emacs that looks like this:
    (add-to-list
     'treesit-language-source-alist
     '(javascript "https://github.com/tree-sitter/tree-sitter-javascript" "v0.23.1")
     t)
    (add-to-list
     'treesit-language-source-alist
     '(jsdoc "https://github.com/tree-sitter/tree-sitter-jsdoc" "v0.23.2")
     t)
The entries in treesit-language-source-alist mostly have tags but not
commit hashes. The expected commit hash should be *mandatory*, because
right now, anyone with access to one of these repositories can retarget
any of those tags at malicious code.
See https://snyk.io/blog/npm-security-preventing-supply-chain-attacks/
Every other important language ecosystem has evolved some kind of "hash
locking" capability for breaking the author-retargets-to-malware attack
vector. We should too. We shouldn't allow the commit hash to be absent
for ordinary users.
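As an illustration of what a hash-locking check could look like at the Git level, here is a minimal sketch that compares the checked-out commit against an expected full hash. The function name verify_pinned_commit and its arguments are hypothetical and not part of Emacs; the sketch only assumes that git rev-parse is available.

```shell
# Hypothetical helper (not part of Emacs): succeed only if the checkout
# in DIR is at exactly the expected full commit hash.
verify_pinned_commit() {
    dir=$1
    expected=$2   # full commit hash recorded alongside the grammar URL
    # Resolve HEAD to a commit object ID; fail if DIR is not a usable repo.
    actual=$(git -C "$dir" rev-parse --verify 'HEAD^{commit}') || return 1
    if [ "$actual" != "$expected" ]; then
        echo "commit mismatch: expected $expected, got $actual" >&2
        return 1
    fi
}

# Usage (placeholders, not real values):
#   verify_pinned_commit ./tree-sitter-javascript <full-commit-hash> || exit 1
```

With a check like this, a retargeted tag would resolve to a different commit and the comparison would fail before anything is built.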
P.S. we've debated vendoring these grammars with Emacs. I still think
that's the right way to go. But if we're going to download and build,
we should at least do it in a secure way.
P.P.S. Do we need the list of grammars in build.sh under admin? It
duplicates what's in Lisp elsewhere in the tree.
Information forwarded to bug-gnu-emacs <at> gnu.org: bug#78542; Package emacs. (Thu, 22 May 2025 06:47:04 GMT)
Message #8 received at 78542 <at> debbugs.gnu.org (full text, mbox):
> When downloading code, a tag isn't good enough. We should insist on a
> specific commit.
> [...]
> The entries in treesit-language-source-alist mostly have tags but not
> commit hashes. The expected commit hash should be *mandatory*, because
> right now, anyone with access to one of these repositories can retarget
> any of those tags at malicious code.
Indeed, tags can be easily relocated to a different commit.
> Every other important language ecosystem has evolved some kind of "hash
> locking" capability for breaking the author-retargets-to-malware attack
> vector. We should too. We shouldn't allow the commit hash to be absent
> for ordinary users.
Agreed, "hash locking" should lock commit hashes, not tags.
> P.S. we've debated vendoring these grammars with Emacs. I still think
> that's the right way to go. But if we're going to download and build,
> we should at least do it in a secure way.
The only reason tags are currently used instead of commit hashes is
that the current implementation cannot check out a specific commit
when 'treesit--install-language-grammar-full-clone' has its default
value of nil.
> P.P.S. Do we need the list of grammars in build.sh under admin? It
> duplicates what's in Lisp elsewhere in the tree.
Apparently not, so the list could be removed.
Information forwarded to bug-gnu-emacs <at> gnu.org: bug#78542; Package emacs. (Sat, 07 Jun 2025 08:07:02 GMT)
Message #11 received at 78542 <at> debbugs.gnu.org (full text, mbox):
Ping! Do we want to make some progress here?
> Cc: Yuan Fu <casouri <at> gmail.com>, 78542 <at> debbugs.gnu.org
> From: Juri Linkov <juri <at> linkov.net>
> Date: Thu, 22 May 2025 09:36:57 +0300
> [...]
Information forwarded to bug-gnu-emacs <at> gnu.org: bug#78542; Package emacs. (Sun, 08 Jun 2025 17:50:01 GMT)
Message #14 received at 78542 <at> debbugs.gnu.org (full text, mbox):
>> The only reason currently tags are used instead of commit hashes is
>> because there is no way to checkout a specific commit with the
>> current implementation when the default value of
>> 'treesit--install-language-grammar-full-clone' is nil.
Here is the current state:
1. (treesit--install-language-grammar-1
    (locate-user-emacs-file "tree-sitter") 'json
    "https://github.com/tree-sitter/tree-sitter-json")

   installs the latest commit 46aa487.

2. (treesit--install-language-grammar-1
    (locate-user-emacs-file "tree-sitter") 'json
    "https://github.com/tree-sitter/tree-sitter-json"
    "v0.24.8")

   installs the commit ee35a6e tagged v0.24.8.

3. (treesit--install-language-grammar-1
    (locate-user-emacs-file "tree-sitter") 'json
    "https://github.com/tree-sitter/tree-sitter-json"
    "4d770d3")

   fails to check out "4d770d3" with the error:

     git clone https://github.com/tree-sitter/tree-sitter-json --quiet --depth 1 -b 4d770d3
     warning: Could not find remote branch 4d770d3 to clone
     fatal: Remote branch 4d770d3 not found in upstream origin

4. (treesit--install-language-grammar-1
    (locate-user-emacs-file "tree-sitter") 'json
    "https://github.com/tree-sitter/tree-sitter-json"
    nil nil nil nil "4d770d3")

   fails to check out "4d770d3" with the error:

     git -C /tmp/treesit-workdirHhEIhg/repo checkout 4d770d3
     error: pathspec '4d770d3' did not match any file(s) known to git

After (setq treesit--install-language-grammar-full-clone t):

5. (treesit--install-language-grammar-1
    (locate-user-emacs-file "tree-sitter") 'json
    "https://github.com/tree-sitter/tree-sitter-json"
    "4d770d3")

   successfully installs the commit "v0.24.8-1-g4d770d3".

When treesit--install-language-grammar-full-clone is nil,
"--depth 1" is added to "git clone".
So we need a Git guru to recommend a command line to use
"git clone" with "--depth 1" to check out a single commit.
Information forwarded to bug-gnu-emacs <at> gnu.org: bug#78542; Package emacs. (Tue, 10 Jun 2025 01:39:01 GMT)
Message #17 received at 78542 <at> debbugs.gnu.org (full text, mbox):
> On Jun 8, 2025, at 10:45 AM, Juri Linkov <juri <at> linkov.net> wrote:
>
> [...]
>
> When treesit--install-language-grammar-full-clone is nil,
> "--depth 1" is added to "git clone".
>
> So we need a Git guru to recommend a command line to use
> "git clone" with "--depth 1" to check out a single commit.
Would it work if we do a blobless full clone, check out the commit, and then fetch with --depth=1? E.g.,

    git clone https://github.com/tree-sitter/tree-sitter-json.git --filter=blob:none
    cd tree-sitter-json
    git checkout 4d770d3
    git fetch --depth=1

Yuan
Information forwarded to bug-gnu-emacs <at> gnu.org: bug#78542; Package emacs. (Tue, 10 Jun 2025 06:48:03 GMT)
Message #20 received at 78542 <at> debbugs.gnu.org (full text, mbox):
>> When treesit--install-language-grammar-full-clone is nil,
>> "--depth 1" is added to "git clone".
>>
>> So we need a Git guru to recommend a command line to use
>> "git clone" with "--depth 1" to check out a single commit.
>
> Would it work if we do a blobless full clone, checkout the commit, and fetch depth=1? Eg,
>
> git clone https://github.com/tree-sitter/tree-sitter-json.git --filter=blob:none
> cd tree-sitter-json
> git checkout 4d770d3
> git fetch --depth=1
This still keeps full history. This means we could simply
set the default value of treesit--install-language-grammar-full-clone
to t, or completely remove this variable, if there is no way
to clone at a specific commit without fetching full history?
Information forwarded to bug-gnu-emacs <at> gnu.org: bug#78542; Package emacs. (Tue, 10 Jun 2025 07:45:02 GMT)
Message #23 received at 78542 <at> debbugs.gnu.org (full text, mbox):
Juri Linkov <juri <at> linkov.net> writes:
>>> When treesit--install-language-grammar-full-clone is nil,
>>> "--depth 1" is added to "git clone".
>>>
>>> So we need a Git guru to recommend a command line to use
>>> "git clone" with "--depth 1" to check out a single commit.
>>
>> Would it work if we do a blobless full clone, checkout the commit, and fetch depth=1? Eg,
>>
>> git clone https://github.com/tree-sitter/tree-sitter-json.git --filter=blob:none
>> cd tree-sitter-json
>> git checkout 4d770d3
>> git fetch --depth=1
>
> This still keeps full history.
There's a difference between full history and all blobs for all
revisions in this history. You can also use --shallow-since during the
clone with a date to further limit history. --shallow-exclude would
probably work even better, since you wouldn't need a date, but it's
broken for me somehow, at least with the repository above.
But --shallow-since works.
This bug report was last modified 7 days ago.
GNU bug tracking system
Copyright (C) 1999 Darren O. Benham,
1997,2003 nCipher Corporation Ltd,
1994-97 Ian Jackson.