GNU bug report logs - #78542
[Security] hash locking needed for tree-sitter downloads
Reported by: Daniel Colascione <dancol <at> dancol.org>
Date: Wed, 21 May 2025 19:13:04 UTC
Severity: normal
Fixed in version 31.0.50
Done: Juri Linkov <juri <at> linkov.net>
Bug is archived. No further changes may be made.
To add a comment to this bug, you must first unarchive it, by sending
a message to control AT debbugs.gnu.org, with unarchive 78542 in the body.
You can then email your comments to 78542 AT debbugs.gnu.org in the normal way.
Report forwarded to bug-gnu-emacs <at> gnu.org: bug#78542; Package emacs. (Wed, 21 May 2025 19:13:04 GMT)
Full text and rfc822 format available.
Acknowledgement sent to Daniel Colascione <dancol <at> dancol.org>: New bug report received and forwarded. Copy sent to bug-gnu-emacs <at> gnu.org. (Wed, 21 May 2025 19:13:04 GMT)
Message #5 received at submit <at> debbugs.gnu.org (full text, mbox):
When downloading code, a tag isn't good enough. We should insist on a
specific commit.
We have a fair bit of code in Emacs that looks like this:

  (add-to-list
   'treesit-language-source-alist
   '(javascript "https://github.com/tree-sitter/tree-sitter-javascript" "v0.23.1")
   t)
  (add-to-list
   'treesit-language-source-alist
   '(jsdoc "https://github.com/tree-sitter/tree-sitter-jsdoc" "v0.23.2")
   t)
The entries in treesit-language-source-alist mostly have tags but not
commit hashes. The expected commit hash should be *mandatory*, because
right now, anyone with access to one of these repositories can retarget
any of those tags at malicious code.
See https://snyk.io/blog/npm-security-preventing-supply-chain-attacks/
Every other important language ecosystem has evolved some kind of "hash
locking" capability for breaking the author-retargets-to-malware attack
vector. We should too. We shouldn't allow the commit hash to be absent
for ordinary users.
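A minimal sketch of what hash locking could mean at the Git level, assuming each source entry records the expected full commit hash: after the download, resolve what HEAD actually points to and refuse to build unless it equals the pinned value, treating a missing pin as an error as well. The function names are hypothetical and not part of treesit.el; `git rev-parse --verify` is the only real Git command used.

```shell
#!/bin/sh
# Hypothetical hash-lock check (not part of treesit.el): refuse a grammar
# checkout unless HEAD is exactly the pinned commit.

# hash_matches EXPECTED ACTUAL
# Returns 0 if the hashes are identical, 1 on a mismatch,
# and 2 if no expected hash was supplied at all.
hash_matches() {
  if [ -z "$1" ]; then
    echo "refusing to build: no pinned commit hash" >&2
    return 2
  fi
  if [ "$1" != "$2" ]; then
    echo "hash mismatch: expected $1, got $2" >&2
    return 1
  fi
}

# verify_checkout DIR EXPECTED
# Resolves HEAD of the clone in DIR to a full hash and compares it.
verify_checkout() {
  actual=$(git -C "$1" rev-parse --verify HEAD) || return
  hash_matches "$2" "$actual"
}
```

Calling `verify_checkout "$dir" "$pinned" || exit 1` right before compiling would turn a retargeted tag into a hard failure instead of a silent build of different code.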
P.S. we've debated vendoring these grammars with Emacs. I still think
that's the right way to go. But if we're going to download and build,
we should at least do it in a secure way.
P.P.S. Do we need the list of grammars in build.sh under admin? It
duplicates what's in Lisp elsewhere in the tree.
Information forwarded to bug-gnu-emacs <at> gnu.org: bug#78542; Package emacs. (Thu, 22 May 2025 06:47:04 GMT)
Message #8 received at 78542 <at> debbugs.gnu.org (full text, mbox):
> When downloading code, a tag isn't good enough. We should insist on a
> specific commit.
> [...]
> The entries in treesit-language-source-alist mostly have tags but not
> commit hashes. The expected commit hash should be *mandatory*, because
> right now, anyone with access to one of these repositories can retarget
> any of those tags at malicious code.
Indeed, tags can be easily relocated to a different commit.
> Every other important language ecosystem has evolved some kind of "hash
> locking" capability for breaking the author-retargets-to-malware attack
> vector. We should too. We shouldn't allow the commit hash to be absent
> for ordinary users.
Agreed, "hash locking" should lock commit hashes, not tags.
> P.S. we've debated vendoring these grammars with Emacs. I still think
> that's the right way to go. But if we're going to download and build,
> we should at least do it in a secure way.
The only reason tags are currently used instead of commit hashes is
that the current implementation has no way to check out a specific
commit when 'treesit--install-language-grammar-full-clone' has its
default value, nil.
> P.P.S. Do we need the list of grammars in build.sh under admin? It
> duplicates what's in Lisp elsewhere in the tree.
Apparently there is no need, so they could be removed.
Information forwarded to bug-gnu-emacs <at> gnu.org: bug#78542; Package emacs. (Sat, 07 Jun 2025 08:07:02 GMT)
Message #11 received at 78542 <at> debbugs.gnu.org (full text, mbox):
Ping! Do we want to make some progress here?
> Cc: Yuan Fu <casouri <at> gmail.com>, 78542 <at> debbugs.gnu.org
> From: Juri Linkov <juri <at> linkov.net>
> Date: Thu, 22 May 2025 09:36:57 +0300
>
> > When downloading code, a tag isn't good enough. We should insist on a
> > specific commit.
> > [...]
> > The entries in treesit-language-source-alist mostly have tags but not
> > commit hashes. The expected commit hash should be *mandatory*, because
> > right now, anyone with access to one of these repositories can retarget
> > any of those tags at malicious code.
>
> Indeed, tags can be easily relocated to a different commit.
>
> > Every other important language ecosystem has evolved some kind of "hash
> > locking" capability for breaking the author-retargets-to-malware attack
> > vector. We should too. We shouldn't allow the commit hash to be absent
> > for ordinary users.
>
> Agreed, "hash locking" should lock commit hashes, not tags.
>
> > P.S. we've debated vendoring these grammars with Emacs. I still think
> > that's the right way to go. But if we're going to download and build,
> > we should at least do it in a secure way.
>
> The only reason currently tags are used instead of commit hashes is
> because there is no way to checkout a specific commit with the
> current implementation when the default value of
> 'treesit--install-language-grammar-full-clone' is nil.
>
> > P.P.S. Do we need the list of grammars in build.sh under admin? It
> > duplicates what's in Lisp elsewhere in the tree.
>
> Apparently no need, so they could be removed.
Information forwarded to bug-gnu-emacs <at> gnu.org: bug#78542; Package emacs. (Sun, 08 Jun 2025 17:50:01 GMT)
Message #14 received at 78542 <at> debbugs.gnu.org (full text, mbox):
>> The only reason currently tags are used instead of commit hashes is
>> because there is no way to checkout a specific commit with the
>> current implementation when the default value of
>> 'treesit--install-language-grammar-full-clone' is nil.
Here is the current state:

1. (treesit--install-language-grammar-1
    (locate-user-emacs-file "tree-sitter") 'json
    "https://github.com/tree-sitter/tree-sitter-json")

   installs the latest commit 46aa487.

2. (treesit--install-language-grammar-1
    (locate-user-emacs-file "tree-sitter") 'json
    "https://github.com/tree-sitter/tree-sitter-json"
    "v0.24.8")

   installs the commit ee35a6e tagged v0.24.8.

3. (treesit--install-language-grammar-1
    (locate-user-emacs-file "tree-sitter") 'json
    "https://github.com/tree-sitter/tree-sitter-json"
    "4d770d3")

   fails to check out "4d770d3" with the error:

     git clone https://github.com/tree-sitter/tree-sitter-json --quiet --depth 1 -b 4d770d3
     warning: Could not find remote branch 4d770d3 to clone
     fatal: Remote branch 4d770d3 not found in upstream origin

4. (treesit--install-language-grammar-1
    (locate-user-emacs-file "tree-sitter") 'json
    "https://github.com/tree-sitter/tree-sitter-json"
    nil nil nil nil "4d770d3")

   fails to check out "4d770d3" with the error:

     git -C /tmp/treesit-workdirHhEIhg/repo checkout 4d770d3
     error: pathspec '4d770d3' did not match any file(s) known to git

After (setq treesit--install-language-grammar-full-clone t):

5. (treesit--install-language-grammar-1
    (locate-user-emacs-file "tree-sitter") 'json
    "https://github.com/tree-sitter/tree-sitter-json"
    "4d770d3")

   successfully installs the commit "v0.24.8-1-g4d770d3".
When treesit--install-language-grammar-full-clone is nil,
"--depth 1" is added to "git clone".
So we need a Git guru to recommend a command line that uses
"git clone" with "--depth 1" and still checks out a single commit.
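One commonly used answer, sketched here as a hypothetical shell function rather than a tested patch for treesit.el: since `git clone -b` only accepts branch and tag names, create an empty repository instead and fetch the single commit by its *full* hash with `--depth 1`. This assumes the server accepts a request for a commit that is not advertised as a ref tip (GitHub does for reachable commits; self-hosted servers may need `uploadpack.allowReachableSHA1InWant` or similar), and it does not work with an abbreviated hash such as 4d770d3, which `git fetch` would treat as a ref name.

```shell
#!/bin/sh
# Hypothetical helper: shallow-fetch exactly one commit, named by its full
# hash, into DIR. Requires server support for fetching by hash.
shallow_checkout_commit() {
  url=$1 commit=$2 dir=$3
  # "git clone -b" only accepts branch and tag names, so create an empty
  # repository, fetch the one commit by hash, and check out FETCH_HEAD
  # (which leaves HEAD detached at that commit).
  git init --quiet "$dir" &&
    git -C "$dir" remote add origin "$url" &&
    git -C "$dir" fetch --quiet --depth 1 origin "$commit" &&
    git -C "$dir" checkout --quiet FETCH_HEAD
}
```

For example, `shallow_checkout_commit https://github.com/tree-sitter/tree-sitter-json <full-hash> repo` downloads one commit's worth of objects, which keeps the size advantage of `--depth 1` while pinning the exact code.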
Information forwarded to bug-gnu-emacs <at> gnu.org: bug#78542; Package emacs. (Tue, 10 Jun 2025 01:39:01 GMT)
Message #17 received at 78542 <at> debbugs.gnu.org (full text, mbox):
> On Jun 8, 2025, at 10:45 AM, Juri Linkov <juri <at> linkov.net> wrote:
>
>>> The only reason currently tags are used instead of commit hashes is
>>> because there is no way to checkout a specific commit with the
>>> current implementation when the default value of
>>> 'treesit--install-language-grammar-full-clone' is nil.
>
> Here is the current state:
>
> 1. (treesit--install-language-grammar-1
> (locate-user-emacs-file "tree-sitter") 'json
> "https://github.com/tree-sitter/tree-sitter-json")
>
> installs the latest commit 46aa487.
>
> 2. (treesit--install-language-grammar-1
> (locate-user-emacs-file "tree-sitter") 'json
> "https://github.com/tree-sitter/tree-sitter-json"
> "v0.24.8")
>
> installs the commit ee35a6e tagged v0.24.8.
>
> 3. (treesit--install-language-grammar-1
> (locate-user-emacs-file "tree-sitter") 'json
> "https://github.com/tree-sitter/tree-sitter-json"
> "4d770d3")
>
> fails to check out "4d770d3" with the error:
>
> git clone https://github.com/tree-sitter/tree-sitter-json --quiet --depth 1 -b 4d770d3
> warning: Could not find remote branch 4d770d3 to clone
> fatal: Remote branch 4d770d3 not found in upstream origin
>
> 4. (treesit--install-language-grammar-1
> (locate-user-emacs-file "tree-sitter") 'json
> "https://github.com/tree-sitter/tree-sitter-json"
> nil nil nil nil "4d770d3")
>
> fails to check out "4d770d3" with the error:
>
> git -C /tmp/treesit-workdirHhEIhg/repo checkout 4d770d3
> error: pathspec '4d770d3' did not match any file(s) known to git
>
> After (setq treesit--install-language-grammar-full-clone t):
>
> 5. (treesit--install-language-grammar-1
> (locate-user-emacs-file "tree-sitter") 'json
> "https://github.com/tree-sitter/tree-sitter-json"
> "4d770d3")
>
> successfully installs the commit "v0.24.8-1-g4d770d3".
>
> When treesit--install-language-grammar-full-clone is nil,
> "--depth 1" is added to "git clone".
>
> So we need a Git guru to recommend a command line to use
> "git clone" with "--depth 1" to check out a single commit.
Would it work if we did a blobless full clone, checked out the commit, and then fetched with depth=1? E.g.,

  git clone https://github.com/tree-sitter/tree-sitter-json.git --filter=blob:none
  cd tree-sitter-json
  git checkout 4d770d3
  git fetch --depth=1
Yuan
Information forwarded to bug-gnu-emacs <at> gnu.org: bug#78542; Package emacs. (Tue, 10 Jun 2025 06:48:03 GMT)
Message #20 received at 78542 <at> debbugs.gnu.org (full text, mbox):
>> When treesit--install-language-grammar-full-clone is nil,
>> "--depth 1" is added to "git clone".
>>
>> So we need a Git guru to recommend a command line to use
>> "git clone" with "--depth 1" to check out a single commit.
>
> Would it work if we do a blobless full clone, checkout the commit, and fetch depth=1? Eg,
>
> git clone https://github.com/tree-sitter/tree-sitter-json.git --filter=blob:none
> cd tree-sitter-json
> git checkout 4d770d3
> git fetch --depth=1
This still keeps the full history. So if there is no way to clone
at a specific commit without fetching the full history, could we
simply set the default value of treesit--install-language-grammar-full-clone
to t, or remove this variable completely?
Information forwarded to bug-gnu-emacs <at> gnu.org: bug#78542; Package emacs. (Tue, 10 Jun 2025 07:45:02 GMT)
Message #23 received at 78542 <at> debbugs.gnu.org (full text, mbox):
Juri Linkov <juri <at> linkov.net> writes:
>>> When treesit--install-language-grammar-full-clone is nil,
>>> "--depth 1" is added to "git clone".
>>>
>>> So we need a Git guru to recommend a command line to use
>>> "git clone" with "--depth 1" to check out a single commit.
>>
>> Would it work if we do a blobless full clone, checkout the commit, and fetch depth=1? Eg,
>>
>> git clone https://github.com/tree-sitter/tree-sitter-json.git --filter=blob:none
>> cd tree-sitter-json
>> git checkout 4d770d3
>> git fetch --depth=1
>
> This still keeps full history.
There's a difference between the full history and all the blobs for
every revision in that history: a blobless clone downloads the commits
and trees, but fetches file contents only when a checkout needs them.
You can also pass --shallow-since with a date during the clone to limit
the history further. --shallow-exclude would probably work even better,
since you wouldn't need a date, but for some reason it fails for me, at
least with the repository above. --shallow-since does work, though.
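For illustration, a hypothetical function combining the two options mentioned here: a blobless clone limited with `--shallow-since`, followed by a checkout of the pinned commit. The date is an assumption the caller must supply; it has to be on or before the pinned commit's date (and the commit must be reachable from what was cloned), otherwise the commit is cut off and the checkout fails. `--no-checkout` only avoids materializing the default branch before switching to the pinned commit.

```shell
#!/bin/sh
# Hypothetical helper: blobless, date-limited clone, then check out a pinned
# commit. SINCE must not be later than the commit's date, or it is cut off.
blobless_since_checkout() {
  url=$1 commit=$2 since=$3 dir=$4
  # With --filter=blob:none, the blobs for the checked-out tree are fetched
  # lazily from the remote during the checkout step.
  git clone --quiet --no-checkout --filter=blob:none \
      --shallow-since="$since" "$url" "$dir" &&
    git -C "$dir" checkout --quiet "$commit"
}
```

Choosing the date remains the awkward part: it has to be known in advance for each grammar, which is one reason a fetch by full hash may be simpler when the server supports it.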
Information forwarded to bug-gnu-emacs <at> gnu.org: bug#78542; Package emacs. (Thu, 19 Jun 2025 17:09:01 GMT)
Message #26 received at 78542 <at> debbugs.gnu.org (full text, mbox):
>>>> When treesit--install-language-grammar-full-clone is nil,
>>>> "--depth 1" is added to "git clone".
>>>>
>>>> So we need a Git guru to recommend a command line to use
>>>> "git clone" with "--depth 1" to check out a single commit.
>>>
>>> Would it work if we do a blobless full clone, checkout the commit, and fetch depth=1? Eg,
>>>
>>> git clone https://github.com/tree-sitter/tree-sitter-json.git --filter=blob:none
>>> cd tree-sitter-json
>>> git checkout 4d770d3
>>> git fetch --depth=1
>>
>> This still keeps full history.
>
> There's a difference between full history and all blobs for all
> revisions in this history. You can also use --shallow-since during the
> clone with a date to further limit history. --shallow-exclude would
> probably work even better, since you wouldn't need a date, but it's
> broken for me somehow, at least with the repository above.
> But --shallow-since works.
I can't find what value to provide for --shallow-since.
So let's just use a blobless full clone:
diff --git a/lisp/treesit.el b/lisp/treesit.el
index 353e991ec20..5d03f0cf45e 100644
--- a/lisp/treesit.el
+++ b/lisp/treesit.el
@@ -5238,7 +5238,13 @@ treesit--install-language-grammar-1
(if url-is-dir
(when revision
(treesit--git-checkout-branch workdir revision))
- (treesit--git-clone-repo url revision workdir))
+ (if commit
+ ;; Force blobless full clone to be able later
+ ;; to checkout a commit (bug#78542).
+ (let ((treesit--install-language-grammar-blobless t)
+ (treesit--install-language-grammar-full-clone t))
+ (treesit--git-clone-repo url revision workdir))
+ (treesit--git-clone-repo url revision workdir)))
(when commit
(treesit--git-checkout-branch workdir commit))
(setq version (treesit--language-git-revision workdir))
Information forwarded to bug-gnu-emacs <at> gnu.org: bug#78542; Package emacs. (Thu, 19 Jun 2025 17:57:01 GMT)
Message #29 received at 78542 <at> debbugs.gnu.org (full text, mbox):
> + (if commit
> + ;; Force blobless full clone to be able later
> + ;; to checkout a commit (bug#78542).
> + (let ((treesit--install-language-grammar-blobless t)
> + (treesit--install-language-grammar-full-clone t))
> + (treesit--git-clone-repo url revision workdir))
> + (treesit--git-clone-repo url revision workdir)))
Since this change makes it possible to specify the commit,
let's also improve the format of the source list.
Currently, adding a commit to the list requires
preceding it with four nils:
  (treesit--install-language-grammar-1
   (locate-user-emacs-file "tree-sitter") 'json
   "https://github.com/tree-sitter/tree-sitter-json"
   nil nil nil nil "4d770d3")
The following patch introduces an alternative format
using keywords, e.g.:
  (treesit--install-language-grammar-1
   (locate-user-emacs-file "tree-sitter") 'json
   "https://github.com/tree-sitter/tree-sitter-json"
   :commit "4d770d3")
[treesit-language-source-alist.patch (text/x-diff, inline)]
diff --git a/lisp/treesit.el b/lisp/treesit.el
index 353e991ec20..fedcb6ed1e9 100644
--- a/lisp/treesit.el
+++ b/lisp/treesit.el
@@ -4998,7 +4998,7 @@ treesit-language-source-alist
The value should be an alist where each element has the form
- (LANG . (URL REVISION SOURCE-DIR CC C++ COMMIT [KEYWORD VALUE]...))
+ (LANG . (URL REVISION SOURCE-DIR CC C++ COMMIT))
Only LANG and URL are mandatory. LANG is the language symbol.
URL is the URL of the grammar's Git repository or a directory
@@ -5015,8 +5015,17 @@ treesit-language-source-alist
CC and C++ are C and C++ compilers, defaulting to \"cc\" and
\"c++\", respectively.
+Another way to specify optional data is to use keywords:
+
+ (LANG . (URL [KEYWORD VALUE]...))
+
The currently supported keywords:
+`:revision' is the same as REVISION above.
+`:source-dir' is the same as SOURCE-DIR above.
+`:cc' is the same as CC above.
+`:c++' is the same as C++ above.
+`:commit' is the same as COMMIT above.
`:copy-queries' when non-nil specifies whether to copy the files
in the \"queries\" directory from the source directory to the
installation directory.")
@@ -5203,7 +5212,7 @@ treesit--git-clone-repo
(apply #'treesit--call-process-signal args)))
(defun treesit--install-language-grammar-1
- (out-dir lang url &optional revision source-dir cc c++ commit &rest args)
+ (out-dir lang url &rest args)
"Compile and install a tree-sitter language grammar library.
OUT-DIR is the directory to put the compiled library file. If it
@@ -5211,8 +5220,7 @@ treesit--install-language-grammar-1
configuration directory is used (and automatically created if it
does not exist).
-For LANG, URL, REVISION, SOURCE-DIR, GRAMMAR-DIR, CC, C++, COMMIT, see
-`treesit-language-source-alist'.
+For ARGS, see `treesit-language-source-alist'.
Return the git revision of the installed grammar. The revision is
generated by \"git describe\". It only works when
@@ -5225,13 +5233,25 @@ treesit--install-language-grammar-1
(workdir (if url-is-dir
maybe-repo-dir
(expand-file-name "repo")))
- copy-queries version)
+ version
+ revision source-dir cc c++ commit copy-queries)
;; Process the keyword args.
(while (keywordp (car args))
(pcase (pop args)
- (:copy-queries (setq copy-queries (pop args)))
- (_ (pop args))))
+ (:revision (setq revision (pop args)))
+ (:source-dir (setq source-dir (pop args)))
+ (:cc (setq cc (pop args)))
+ (:c++ (setq c++ (pop args)))
+ (:commit (setq commit (pop args)))
+ (:copy-queries (setq copy-queries (pop args)))))
+
+ ;; Old positional convention for backward-compatibility:
+ (unless revision (setq revision (nth 0 args)))
+ (unless source-dir (setq source-dir (nth 1 args)))
+ (unless cc (setq cc (nth 2 args)))
+ (unless c++ (setq c++ (nth 3 args)))
+ (unless commit (setq commit (nth 4 args)))
(unwind-protect
(with-temp-buffer
Information forwarded to bug-gnu-emacs <at> gnu.org: bug#78542; Package emacs. (Thu, 19 Jun 2025 18:14:03 GMT)
Message #32 received at 78542 <at> debbugs.gnu.org (full text, mbox):
On June 19, 2025 1:54:08 PM EDT, Juri Linkov <juri <at> linkov.net> wrote:
>> + (if commit
>> + ;; Force blobless full clone to be able later
>> + ;; to checkout a commit (bug#78542).
>> + (let ((treesit--install-language-grammar-blobless t)
>> + (treesit--install-language-grammar-full-clone t))
>> + (treesit--git-clone-repo url revision workdir))
>> + (treesit--git-clone-repo url revision workdir)))
>
>Since with this change it's possible to specify the commit,
>let's also improve the format of the source list.
>Currently adding a commit to the list requires
>prefixing it with four nils:
>
> (treesit--install-language-grammar-1
> (locate-user-emacs-file "tree-sitter") 'json
> "https://github.com/tree-sitter/tree-sitter-json"
> nil nil nil nil "4d770d3")
>
>The following patch introduces an alternative format
>using keywords, e.g.:
>
> (treesit--install-language-grammar-1
> (locate-user-emacs-file "tree-sitter") 'json
> "https://github.com/tree-sitter/tree-sitter-json"
> :commit "4d770d3")
>
Great. While you're doing this, can you also please use full hashes? Short ones aren't particularly collision resistant.
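To illustrate the concern: an abbreviated id like 4d770d3 is only 7 hex digits (28 bits), so producing a different commit with the same prefix takes on the order of 2^28 hash attempts, which is cheap; a full SHA-1 commit id is 40 hex digits. A hypothetical check that a pinned value is a full hash, plus a way to expand a short id inside an existing clone with the real `git rev-parse --verify`, could look like this. It assumes SHA-1 repositories; repositories using the SHA-256 object format have 64-digit ids.

```shell
#!/bin/sh
# Hypothetical validator: accept only a full 40-hex-digit SHA-1 commit id
# (repositories using SHA-256 object names would need 64 digits instead).
is_full_sha1() {
  case "$1" in
    *[!0-9a-f]*) return 1 ;;  # reject anything that is not lowercase hex
  esac
  [ "${#1}" -eq 40 ]
}

# expand_commit DIR ID
# Prints the full hash for a (possibly abbreviated) ID in the clone at DIR.
# "^{commit}" makes git reject IDs that do not name a commit.
expand_commit() {
  git -C "$1" rev-parse --verify --quiet "$2^{commit}"
}
```

A maintainer could run `expand_commit repo 4d770d3` once in a full clone and paste the result into the source list, while `is_full_sha1` guards against short ids slipping back in.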
Information forwarded to bug-gnu-emacs <at> gnu.org: bug#78542; Package emacs. (Fri, 20 Jun 2025 06:56:02 GMT)
Message #35 received at 78542 <at> debbugs.gnu.org (full text, mbox):
> From: Juri Linkov <juri <at> linkov.net>
> Cc: Yuan Fu <casouri <at> gmail.com>, 78542 <at> debbugs.gnu.org, Eli Zaretskii
> <eliz <at> gnu.org>
> Date: Thu, 19 Jun 2025 20:54:08 +0300
>
> The value should be an alist where each element has the form
>
> - (LANG . (URL REVISION SOURCE-DIR CC C++ COMMIT [KEYWORD VALUE]...))
> + (LANG . (URL REVISION SOURCE-DIR CC C++ COMMIT))
>
> Only LANG and URL are mandatory. LANG is the language symbol.
> URL is the URL of the grammar's Git repository or a directory
> @@ -5015,8 +5015,17 @@ treesit-language-source-alist
> CC and C++ are C and C++ compilers, defaulting to \"cc\" and
> \"c++\", respectively.
>
> +Another way to specify optional data is to use keywords:
> +
> + (LANG . (URL [KEYWORD VALUE]...))
> +
> The currently supported keywords:
>
> +`:revision' is the same as REVISION above.
> +`:source-dir' is the same as SOURCE-DIR above.
> +`:cc' is the same as CC above.
> +`:c++' is the same as C++ above.
> +`:commit' is the same as COMMIT above.
> `:copy-queries' when non-nil specifies whether to copy the files
> in the \"queries\" directory from the source directory to the
> installation directory.")
This is okay, but I guess the keywords are not entirely independent?
That is, to have a valid spec one needs several keywords to be
specified together? In that case, I think this should be stated in
the doc string.
Information forwarded to bug-gnu-emacs <at> gnu.org: bug#78542; Package emacs. (Fri, 20 Jun 2025 17:00:02 GMT)
Message #38 received at 78542 <at> debbugs.gnu.org (full text, mbox):
>> The value should be an alist where each element has the form
>>
>> - (LANG . (URL REVISION SOURCE-DIR CC C++ COMMIT [KEYWORD VALUE]...))
>> + (LANG . (URL REVISION SOURCE-DIR CC C++ COMMIT))
>>
>> Only LANG and URL are mandatory. LANG is the language symbol.
>> URL is the URL of the grammar's Git repository or a directory
>> @@ -5015,8 +5015,17 @@ treesit-language-source-alist
>> CC and C++ are C and C++ compilers, defaulting to \"cc\" and
>> \"c++\", respectively.
>>
>> +Another way to specify optional data is to use keywords:
>> +
>> + (LANG . (URL [KEYWORD VALUE]...))
>> +
>> The currently supported keywords:
>>
>> +`:revision' is the same as REVISION above.
>> +`:source-dir' is the same as SOURCE-DIR above.
>> +`:cc' is the same as CC above.
>> +`:c++' is the same as C++ above.
>> +`:commit' is the same as COMMIT above.
>> `:copy-queries' when non-nil specifies whether to copy the files
>> in the \"queries\" directory from the source directory to the
>> installation directory.")
>
> This is okay, but I guess the keywords are not entirely independent?
> That is, to have a valid spec one needs several keywords to be
> specified together? In that case, I think this should be stated in
> the doc string.
Actually, the keywords are independent. That was the reason
for introducing them: so that each one can be specified
without the others.
Information forwarded to bug-gnu-emacs <at> gnu.org: bug#78542; Package emacs. (Fri, 20 Jun 2025 17:01:02 GMT)
Message #41 received at 78542 <at> debbugs.gnu.org (full text, mbox):
close 78542 31.0.50
thanks
>> The following patch introduces an alternative format
>> using keywords, e.g.:
>>
>> (treesit--install-language-grammar-1
>> (locate-user-emacs-file "tree-sitter") 'json
>> "https://github.com/tree-sitter/tree-sitter-json"
>> :commit "4d770d3")
>
> Great. While you're doing this, can you also please use full hashes?
> Short ones aren't particularly collision resistant.
So I have now replaced the tags with full hashes that either correspond
to the previous tags or are mentioned explicitly in the comment
sections of the ts-mode files.
> P.P.S. Do we need the list of grammars in build.sh under admin? It
> duplicates what's in Lisp elsewhere in the tree.
I don't know if build.sh is still used or can be removed.
Maybe Yuan could answer.
bug marked as fixed in version 31.0.50, send any further explanations to
78542 <at> debbugs.gnu.org and Daniel Colascione <dancol <at> dancol.org>
Request was from Juri Linkov <juri <at> linkov.net> to control <at> debbugs.gnu.org. (Fri, 20 Jun 2025 17:01:03 GMT)
Information forwarded to bug-gnu-emacs <at> gnu.org: bug#78542; Package emacs. (Fri, 20 Jun 2025 22:39:03 GMT)
Message #46 received at 78542 <at> debbugs.gnu.org (full text, mbox):
On Jun 8, 2025, at 10:45 AM, Juri Linkov <juri <at> linkov.net> wrote:
> Here is the current state:
>
> 3. (treesit--install-language-grammar-1
> (locate-user-emacs-file "tree-sitter") 'json
> "https://github.com/tree-sitter/tree-sitter-json"
> "4d770d3")
>
> fails to check out "4d770d3" with the error:
>
> git clone https://github.com/tree-sitter/tree-sitter-json --quiet --depth 1 -b 4d770d3
> warning: Could not find remote branch 4d770d3 to clone
> fatal: Remote branch 4d770d3 not found in upstream origin
I’m a bit late to the party, here, but would it make sense to have, say:
  (treesit--install-language-grammar-1
   (locate-user-emacs-file "tree-sitter") 'json
   "https://github.com/tree-sitter/tree-sitter-json"
   :tag "v0.24.8"
   :commit "4d770d31f732d50d3ec373865822fbe659e47c75")

We could then:

  git clone https://github.com/tree-sitter/tree-sitter-json --quiet --depth 1 -b v0.24.8
  git checkout 4d770d31f732d50d3ec373865822fbe659e47c75
Additionally, I think including the tag helps to clarify the intention to anyone reading the code, without them having to go away and refer to the repository to find out about that commit.
--
Peter Oliver
Information forwarded to bug-gnu-emacs <at> gnu.org: bug#78542; Package emacs. (Fri, 20 Jun 2025 23:06:02 GMT)
Message #49 received at 78542 <at> debbugs.gnu.org (full text, mbox):
On Fri, Jun 20, 2025 at 6:39 PM Peter Oliver <p.d.oliver <at> mavit.org.uk>
wrote:
> On Jun 8, 2025, at 10:45 AM, Juri Linkov <juri <at> linkov.net> wrote:
>
> > Here is the current state:
> >
> > 3. (treesit--install-language-grammar-1
> > (locate-user-emacs-file "tree-sitter") 'json
> > "https://github.com/tree-sitter/tree-sitter-json"
> > "4d770d3")
> >
> > fails to check out "4d770d3" with the error:
> >
> > git clone https://github.com/tree-sitter/tree-sitter-json --quiet --depth 1 -b 4d770d3
> > warning: Could not find remote branch 4d770d3 to clone
> > fatal: Remote branch 4d770d3 not found in upstream origin
>
> I’m a bit late to the party, here, but would it make sense to have, say:
>
> (treesit--install-language-grammar-1
> (locate-user-emacs-file "tree-sitter") 'json
> "https://github.com/tree-sitter/tree-sitter-json"
> :tag "v0.24.8"
> :commit "4d770d31f732d50d3ec373865822fbe659e47c75")
>
> We could then:
>
> git clone https://github.com/tree-sitter/tree-sitter-json --quiet --depth 1 -b v0.24.8
> git checkout 4d770d31f732d50d3ec373865822fbe659e47c75
>
> Additionally, I think including the tag helps to clarify the intention to
> anyone reading the code, without them having to go away and refer to the
> repository to find out about that commit.
Git tags aren't really immutable, though, as they can be changed to point
to other commits. If you specify both a commit hash and a tag, and the
tag doesn't (or no longer does) point to that commit, that would be
confusing. I'd say prioritize commit hashes over tags. I'm not sure
whether a :tag keyword would act only as documentation; if so, why not
just use a comment?
Information forwarded to bug-gnu-emacs <at> gnu.org: bug#78542; Package emacs. (Sat, 21 Jun 2025 04:25:02 GMT)
Message #52 received at 78542 <at> debbugs.gnu.org (full text, mbox):
Stéphane Marks <shipmints <at> gmail.com> writes:
> On Fri, Jun 20, 2025 at 6:39 PM Peter Oliver <p.d.oliver <at> mavit.org.uk>
> wrote:
>
>> On Jun 8, 2025, at 10:45 AM, Juri Linkov <juri <at> linkov.net> wrote:
>>
>> > Here is the current state:
>> >
>> > 3. (treesit--install-language-grammar-1
>> > (locate-user-emacs-file "tree-sitter") 'json
>> > "https://github.com/tree-sitter/tree-sitter-json"
>> > "4d770d3")
>> >
>> > fails to check out "4d770d3" with the error:
>> >
>> > git clone https://github.com/tree-sitter/tree-sitter-json --quiet --depth 1 -b 4d770d3
>> > warning: Could not find remote branch 4d770d3 to clone
>> > fatal: Remote branch 4d770d3 not found in upstream origin
>>
>> I’m a bit late to the party, here, but would it make sense to have, say:
>>
>> (treesit--install-language-grammar-1
>> (locate-user-emacs-file "tree-sitter") 'json
>> "https://github.com/tree-sitter/tree-sitter-json"
>> :tag "v0.24.8"
>> :commit "4d770d31f732d50d3ec373865822fbe659e47c75")
>>
>> We could then:
>>
>> git clone https://github.com/tree-sitter/tree-sitter-json --quiet --depth 1 -b v0.24.8
>> git checkout 4d770d31f732d50d3ec373865822fbe659e47c75
>>
>> Additionally, I think including the tag helps to clarify the intention to
>> anyone reading the code, without them having to go away and refer to the
>> repository to find out about that commit.
>
>
> git tags aren't really immutable, though, as they can be changed to point
> to other commits. If you want to specify both a commit hash and a tag and
> the tag doesn't or no longer points to that commit, that would be
> confusing.
Or an error. I guess you could include tag names to allow for some kind
of UX shorthand while verifying, using the hashes, that the tags still
refer to their designated trees.
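A minimal sketch of that suggestion, with a hypothetical function name: clone the tag shallowly, then insist that it still resolves to the pinned full hash, and fail otherwise, so the tag is only a readable label and the hash remains what is trusted. Note that in this thread's own data the proposed pair would be rejected: v0.24.8 was reported to be commit ee35a6e, while 4d770d3 is described as v0.24.8-1-g4d770d3, i.e. one commit after the tag, so a `--depth 1` clone of the tag would not even contain 4d770d3.

```shell
#!/bin/sh
# Hypothetical helper: shallow-clone TAG, then verify that it still points at
# the pinned full COMMIT hash. A moved tag becomes an error, not a build.
clone_tag_pinned() {
  url=$1 tag=$2 commit=$3 dir=$4
  git clone --quiet --depth 1 --branch "$tag" "$url" "$dir" || return
  # After cloning a tag, HEAD is detached at the commit the tag points to.
  actual=$(git -C "$dir" rev-parse --verify HEAD) || return
  if [ "$actual" != "$commit" ]; then
    echo "tag $tag resolves to $actual, not the pinned $commit" >&2
    return 1
  fi
}
```

With this design, keeping the tag in the source list costs nothing in security: if upstream retargets the tag, installation stops with a message naming both hashes.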
Information forwarded to bug-gnu-emacs <at> gnu.org: bug#78542; Package emacs. (Sat, 21 Jun 2025 06:28:02 GMT)
Message #55 received at 78542 <at> debbugs.gnu.org (full text, mbox):
> From: Juri Linkov <juri <at> linkov.net>
> Cc: dancol <at> dancol.org, casouri <at> gmail.com, 78542 <at> debbugs.gnu.org
> Date: Fri, 20 Jun 2025 19:48:09 +0300
>
> >> The value should be an alist where each element has the form
> >>
> >> - (LANG . (URL REVISION SOURCE-DIR CC C++ COMMIT [KEYWORD VALUE]...))
> >> + (LANG . (URL REVISION SOURCE-DIR CC C++ COMMIT))
> >>
> >> Only LANG and URL are mandatory. LANG is the language symbol.
> >> URL is the URL of the grammar's Git repository or a directory
> >> @@ -5015,8 +5015,17 @@ treesit-language-source-alist
> >> CC and C++ are C and C++ compilers, defaulting to \"cc\" and
> >> \"c++\", respectively.
> >>
> >> +Another way to specify optional data is to use keywords:
> >> +
> >> + (LANG . (URL [KEYWORD VALUE]...))
> >> +
> >> The currently supported keywords:
> >>
> >> +`:revision' is the same as REVISION above.
> >> +`:source-dir' is the same as SOURCE-DIR above.
> >> +`:cc' is the same as CC above.
> >> +`:c++' is the same as C++ above.
> >> +`:commit' is the same as COMMIT above.
> >> `:copy-queries' when non-nil specifies whether to copy the files
> >> in the \"queries\" directory from the source directory to the
> >> installation directory.")
> >
> > This is okay, but I guess the keywords are not entirely independent?
> > That is, to have a valid spec one needs several keywords to be
> > specified together? In that case, I think this should be stated in
> > the doc string.
>
> Actually, the keywords are independent.
You mean, it's okay to have just the :source-dir, say, and nothing
else, and that would produce a complete specification that could be
used to install or upgrade the grammar library?
Information forwarded to bug-gnu-emacs <at> gnu.org: bug#78542; Package emacs. (Sat, 21 Jun 2025 10:52:04 GMT)
Message #58 received at 78542 <at> debbugs.gnu.org (full text, mbox):
On Sat, Jun 21, 2025 at 12:24 AM Daniel Colascione <dancol <at> dancol.org>
wrote:
> Stéphane Marks <shipmints <at> gmail.com> writes:
>
> > On Fri, Jun 20, 2025 at 6:39 PM Peter Oliver <p.d.oliver <at> mavit.org.uk>
> > wrote:
> >
> >> On Jun 8, 2025, at 10:45 AM, Juri Linkov <juri <at> linkov.net> wrote:
> >>
> >> > Here is the current state:
> >> >
> >> > 3. (treesit--install-language-grammar-1
> >> > (locate-user-emacs-file "tree-sitter") 'json
> >> > "https://github.com/tree-sitter/tree-sitter-json"
> >> > "4d770d3")
> >> >
> >> > fails to check out "4d770d3" with the error:
> >> >
> >> > git clone https://github.com/tree-sitter/tree-sitter-json --quiet --depth 1 -b 4d770d3
> >> > warning: Could not find remote branch 4d770d3 to clone
> >> > fatal: Remote branch 4d770d3 not found in upstream origin
> >>
> >> I’m a bit late to the party, here, but would it make sense to have, say:
> >>
> >> (treesit--install-language-grammar-1
> >> (locate-user-emacs-file "tree-sitter") 'json
> >> "https://github.com/tree-sitter/tree-sitter-json"
> >> :tag "v0.24.8"
> >> :commit "4d770d31f732d50d3ec373865822fbe659e47c75")
> >>
> >> We could then:
> >>
> >> git clone https://github.com/tree-sitter/tree-sitter-json --quiet --depth 1 -b v0.24.8
> >> git checkout 4d770d31f732d50d3ec373865822fbe659e47c75
> >>
> >> Additionally, I think including the tag helps to clarify the intention
> >> to anyone reading the code, without them having to go away and refer to
> >> the repository to find out about that commit.
> >
> >
> > git tags aren't really immutable, though, as they can be changed to point
> > to other commits. If you want to specify both a commit hash and a tag,
> > and the tag doesn't, or no longer does, point to that commit, that would
> > be confusing.
>
> Or an error. I guess you could include tag names to allow for some kind
> of UX shorthand while verifying, using the hashes, that the tags still
> refer to their designated trees.
>
Good.
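A minimal sketch of that verification in POSIX shell, assuming the usual output of 'git ls-remote --tags URL', in which an annotated tag is listed twice: once as the tag object and once, with a '^{}' suffix, as the commit it peels to. The helper name tag_points_to is hypothetical and not part of Emacs or treesit.el:

```shell
# Hypothetical helper (not part of Emacs): succeed only if TAG currently
# resolves to the expected full commit hash COMMIT.
# Reads the output of `git ls-remote --tags URL` on standard input.
tag_points_to() {
  tag=$1
  want=$2
  got=$(awk -v ref="refs/tags/$tag" '
    $2 == ref "^{}" { peeled = $1 }  # annotated tag: the commit it peels to
    $2 == ref       { plain = $1 }   # lightweight tag: already a commit
    END { print (peeled != "" ? peeled : plain) }')
  # A missing tag or a retargeted tag both count as failure.
  [ -n "$got" ] && [ "$got" = "$want" ]
}
```

It would be used as 'git ls-remote --tags URL | tag_points_to TAG FULL-HASH', so a retargeted tag produces an error instead of a silent download. Note that this only helps when the pinned commit is the tagged commit itself.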
Information forwarded to bug-gnu-emacs <at> gnu.org: bug#78542; Package emacs. (Sun, 22 Jun 2025 07:00:02 GMT)
Message #61 received at 78542 <at> debbugs.gnu.org (full text, mbox):
>> >> The value should be an alist where each element has the form
>> >>
>> >> - (LANG . (URL REVISION SOURCE-DIR CC C++ COMMIT [KEYWORD VALUE]...))
>> >> + (LANG . (URL REVISION SOURCE-DIR CC C++ COMMIT))
>> >>
>> >> Only LANG and URL are mandatory. LANG is the language symbol.
>> >> URL is the URL of the grammar's Git repository or a directory
>> >> @@ -5015,8 +5015,17 @@ treesit-language-source-alist
>> >> CC and C++ are C and C++ compilers, defaulting to \"cc\" and
>> >> \"c++\", respectively.
>> >>
>> >> +Another way to specify optional data is to use keywords:
>> >> +
>> >> + (LANG . (URL [KEYWORD VALUE]...))
>> >> +
>> >> The currently supported keywords:
>> >>
>> >> +`:revision' is the same as REVISION above.
>> >> +`:source-dir' is the same as SOURCE-DIR above.
>> >> +`:cc' is the same as CC above.
>> >> +`:c++' is the same as C++ above.
>> >> +`:commit' is the same as COMMIT above.
>> >> `:copy-queries' when non-nil specifies whether to copy the files
>> >> in the \"queries\" directory from the source directory to the
>> >> installation directory.")
>> >
>> > This is okay, but I guess the keywords are not entirely independent?
>> > That is, to have a valid spec one needs several keywords to be
>> > specified together? In that case, I think this should be stated in
>> > the doc string.
>>
>> Actually, the keywords are independent.
>
> You mean, it's okay to have just the :source-dir, say, and nothing
> else, and that would produce a complete specification that could be
> used to install or upgrade the grammar library?
Yes, having just :source-dir means using that subdirectory of the HEAD commit.
Information forwarded to bug-gnu-emacs <at> gnu.org: bug#78542; Package emacs. (Sun, 22 Jun 2025 07:01:02 GMT)
Message #64 received at 78542 <at> debbugs.gnu.org (full text, mbox):
>> 3. (treesit--install-language-grammar-1
>> (locate-user-emacs-file "tree-sitter") 'json
>> "https://github.com/tree-sitter/tree-sitter-json"
>> "4d770d3")
>>
>> fails to check out "4d770d3" with the error:
>>
>> git clone https://github.com/tree-sitter/tree-sitter-json --quiet --depth 1 -b 4d770d3
>> warning: Could not find remote branch 4d770d3 to clone
>> fatal: Remote branch 4d770d3 not found in upstream origin
>
> I’m a bit late to the party, here, but would it make sense to have, say:
>
> (treesit--install-language-grammar-1
> (locate-user-emacs-file "tree-sitter") 'json
> "https://github.com/tree-sitter/tree-sitter-json"
> :tag "v0.24.8"
> :commit "4d770d31f732d50d3ec373865822fbe659e47c75")
>
> We could then:
>
> git clone https://github.com/tree-sitter/tree-sitter-json --quiet --depth 1 -b v0.24.8
> git checkout 4d770d31f732d50d3ec373865822fbe659e47c75
This fails with
fatal: reference is not a tree: 4d770d31f732d50d3ec373865822fbe659e47c75
because the required commit comes after the tag: a depth-1 clone of v0.24.8
contains only the tagged commit, not its successor.
This is indicated in the comments section of json-ts-mode.el:
;; - tree-sitter-json: v0.24.8-1-g4d770d3
> Additionally, I think including the tag helps to clarify the intention
> to anyone reading the code, without them having to go away and refer
> to the repository to find out about that commit.
Anyone reading the code could look into the comments section where tags
are generated by treesit-admin using 'treesit--language-git-revision'.
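The string v0.24.8-1-g4d770d3 has the shape of 'git describe' output: the nearest tag, the number of commits since that tag, and 'g' followed by an abbreviated commit hash. Here it says the pinned commit is one commit after v0.24.8. A small POSIX shell sketch that splits such a string (split_describe is a hypothetical name used only for illustration):

```shell
# Hypothetical helper: split a `git describe` string of the form
# TAG-N-gABBREV into its three parts, printed space-separated.
split_describe() {
  desc=$1
  abbrev=${desc##*-g}   # text after the last "-g": the abbreviated hash
  rest=${desc%-g*}      # drop the trailing "-gABBREV"
  count=${rest##*-}     # number of commits since TAG
  tag=${rest%-*}        # TAG itself (may contain hyphens)
  printf '%s %s %s\n' "$tag" "$count" "$abbrev"
}
```

For example, split_describe v0.24.8-1-g4d770d3 prints "v0.24.8 1 4d770d3".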
Information forwarded to bug-gnu-emacs <at> gnu.org: bug#78542; Package emacs. (Mon, 23 Jun 2025 01:48:02 GMT)
Message #67 received at 78542 <at> debbugs.gnu.org (full text, mbox):
On 10/06/2025 09:23, Juri Linkov wrote:
> This still keeps full history. This means we could simply
> set the default value of treesit--install-language-grammar-full-clone
> to t, or completely remove this variable, if there is no way
> to clone at a specific commit without fetching full history?
This SO answer gives two solutions: https://stackoverflow.com/a/43136160
The first (shorter one) requires the very latest Git client to be
installed, something for us to note for the future.
The second just requires a suitably configured Git server, which GitHub's
servers are. Quoting it here:
git init
git remote add origin <url>
git fetch --depth 1 origin <sha1>
git checkout FETCH_HEAD
The sha1 value must be full, but those are what we decided to use already.
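A sketch of the recipe above as a POSIX shell function, assuming SHA-1 object names (40 hex digits), a Git new enough to support 'git -C', and a server that accepts requests for an unadvertised commit. fetch_pinned_commit is a hypothetical name. It rejects abbreviated hashes before touching the network and, after checkout, refuses to succeed unless HEAD is exactly the pinned commit:

```shell
# Hypothetical helper: check out exactly COMMIT from URL into DIR,
# fetching only that commit (depth 1).
fetch_pinned_commit() {
  url=$1
  commit=$2
  dir=$3
  # Require a full, lowercase, 40-hex-digit SHA-1; abbreviations are rejected.
  case $commit in
    *[!0-9a-f]*) echo "not a hex commit id: $commit" >&2; return 1 ;;
  esac
  if [ "${#commit}" -ne 40 ]; then
    echo "need a full 40-character SHA-1, got: $commit" >&2
    return 1
  fi
  git init --quiet "$dir" &&
    git -C "$dir" remote add origin "$url" &&
    git -C "$dir" fetch --quiet --depth 1 origin "$commit" &&
    git -C "$dir" checkout --quiet FETCH_HEAD || return 1
  # Hash lock: fail unless HEAD is exactly the pinned commit.
  head=$(git -C "$dir" rev-parse HEAD) || return 1
  if [ "$head" != "$commit" ]; then
    echo "checked out $head, expected $commit" >&2
    return 1
  fi
}
```

For the tree-sitter-json commit discussed above, the call would be fetch_pinned_commit https://github.com/tree-sitter/tree-sitter-json 4d770d31f732d50d3ec373865822fbe659e47c75 tree-sitter-json.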
Information forwarded to bug-gnu-emacs <at> gnu.org: bug#78542; Package emacs. (Mon, 23 Jun 2025 06:43:02 GMT)
Message #70 received at 78542 <at> debbugs.gnu.org (full text, mbox):
>> This still keeps full history. This means we could simply
>> set the default value of treesit--install-language-grammar-full-clone
>> to t, or completely remove this variable, if there is no way
>> to clone at a specific commit without fetching full history?
>
> This SO answer gives two solutions: https://stackoverflow.com/a/43136160
>
> The first (shorter one) requires the very latest Git client to be installed,
> something for us to note for the future.
Good news! The new --revision option added in March 2025 is long overdue
and should have been added long ago together with the --branch option.
> The second just requires a suitably configured Git server, which GitHub's
> servers are. Quoting it here:
>
> git init
> git remote add origin <url>
> git fetch --depth 1 origin <sha1>
> git checkout FETCH_HEAD
>
> The sha1 value must be full, but those are what we decided to use already.
When I tried various similar recipes, they all failed. Maybe because I tried
with abbreviated SHA1s. However, with the full SHA1 this seems to work.
I don't know how reliable this method is, since it requires setting
uploadpack.allowReachableSHA1InWant=true on the server side.
Otherwise, let's wait until the new --revision option becomes more widespread.
Information forwarded to bug-gnu-emacs <at> gnu.org: bug#78542; Package emacs. (Mon, 23 Jun 2025 15:47:02 GMT)
Message #73 received at 78542 <at> debbugs.gnu.org (full text, mbox):
On 23/06/2025 09:39, Juri Linkov wrote:
> When I tried various similar recipes, they all failed. Maybe because I tried
> with abbreviated SHA1s. However, with the full SHA1 this seems to work.
> I don't know how reliable this method is, since it requires setting
> uploadpack.allowReachableSHA1InWant=true on the server side.
I wonder if the new --revision option relies on that server setting
anyway (how else would it be implemented?)
> Otherwise, let's wait until the new --revision option becomes more widespread.
Might take a few years (or 5-10). This script runs on the user's
machine, and historically we've been hesitant to up the requirements on
the installed version of Git.
Information forwarded to bug-gnu-emacs <at> gnu.org: bug#78542; Package emacs. (Mon, 23 Jun 2025 16:54:05 GMT)
Message #76 received at 78542 <at> debbugs.gnu.org (full text, mbox):
>> When I tried various similar recipes, they all failed. Maybe because I tried
>> with abbreviated SHA1s. However, with the full SHA1 this seems to work.
>> I don't know how reliable this method is, since it requires setting
>> uploadpack.allowReachableSHA1InWant=true on the server side.
>
> I wonder if the new --revision option relies on that server setting anyway
> (how else would it be implemented?)
Can't find any mentions of allowReachableSHA1InWant in
https://github.com/git/git/commit/337855629f59a3f435dabef900e22202ce8e00e1
Probably because --revision is a simplified and limited version of --branch:
Option `--revision` on contrary detaches HEAD, creates no tracking
branches, and writes no fetch refspec.
>> Otherwise, let's wait until the new --revision option becomes more widespread.
>
> Might take a few years (or 5-10). This script runs on the user's machine,
> and historically we've been hesitant to up the requirements on the
> installed version of Git.
Meanwhile we could use something like
(version<= "2.49.0" (vc-git--program-version))
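For illustration, the same version gate in a shell script might look like the following. This is not how treesit.el implements it; version_ge, git_version_number and clone_pinned are hypothetical names. It assumes that 'git clone --revision' (Git 2.49 and later) accepts a full commit hash and can be combined with '--depth 1', and that 'git --version' prints 'git version X.Y.Z', possibly followed by a vendor suffix such as '(Apple Git-...)' or '.windows.N'. Older Git falls back to a full clone followed by a checkout:

```shell
# Hypothetical: succeed if dotted version $1 >= $2, comparing numeric
# components left to right (a missing component counts as 0).
version_ge() {
  va=$1
  vb=$2
  while [ -n "$va" ] || [ -n "$vb" ]; do
    xa=${va%%.*}
    xb=${vb%%.*}
    xa=${xa:-0}
    xb=${xb:-0}
    [ "$xa" -gt "$xb" ] && return 0
    [ "$xa" -lt "$xb" ] && return 1
    # Equal so far: drop the component just compared.
    case $va in *.*) va=${va#*.} ;; *) va= ;; esac
    case $vb in *.*) vb=${vb#*.} ;; *) vb= ;; esac
  done
  return 0
}

# Extract the numeric "X.Y.Z" part from `git --version` output given as $1.
git_version_number() {
  printf '%s\n' "$1" | sed -n 's/^git version \([0-9][0-9.]*[0-9]\).*/\1/p'
}

# Clone exactly COMMIT (a full SHA-1) from URL into DIR.
clone_pinned() {
  url=$1
  commit=$2
  dir=$3
  if version_ge "$(git_version_number "$(git --version)")" 2.49.0; then
    # Git >= 2.49: fetch only the history leading to COMMIT, shallowly.
    git clone --quiet --depth 1 --revision "$commit" "$url" "$dir"
  else
    # Older Git: full clone, then detach HEAD at COMMIT.
    git clone --quiet "$url" "$dir" &&
      git -C "$dir" checkout --quiet --detach "$commit"
  fi
}
```

The full-clone fallback is slower, but it does not depend on the server allowing fetches of arbitrary commits; it only requires the commit to be reachable from a branch or tag.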
Information forwarded to bug-gnu-emacs <at> gnu.org: bug#78542; Package emacs. (Tue, 24 Jun 2025 00:46:02 GMT)
Message #79 received at 78542 <at> debbugs.gnu.org (full text, mbox):
On 23/06/2025 19:51, Juri Linkov wrote:
>>> When I tried various similar recipes, they all failed. Maybe because I tried
>>> with abbreviated SHA1s. However, with the full SHA1 this seems to work.
>>> I don't know how reliable this method is, since it requires setting
>>> uploadpack.allowReachableSHA1InWant=true on the server side.
>>
>> I wonder if the new --revision option relies on that server setting anyway
>> (how else would it be implemented?)
>
> Can't find any mentions of allowReachableSHA1InWant in
> https://github.com/git/git/commit/337855629f59a3f435dabef900e22202ce8e00e1
I think that's client code, whereas the setting above would be on the
server.
> Probably because --revision is a simplified and limited version of --branch:
>
> Option `--revision` on contrary detaches HEAD, creates no tracking
> branches, and writes no fetch refspec.
It still needs to know how to fetch a single revision that is not
referenced by a tag or other ref. Maybe that capability was already
available internally, but then why would the server setting be needed
for the "old" solution, the one using 'git fetch'?
>>> Otherwise, let's wait until the new --revision option becomes more widespread.
>>
>> Might take a few years (or 5-10). This script runs on the user's machine,
>> and historically we've been hesitant to up the requirements on the
>> installed version of Git.
>
> Meanwhile we could use something like
> (version<= "2.49.0" (vc-git--program-version))
Yeah, that should work.
bug archived. Request was from Debbugs Internal Request <help-debbugs <at> gnu.org> to internal_control <at> debbugs.gnu.org. (Tue, 22 Jul 2025 11:24:05 GMT)
This bug report was last modified 23 days ago.
GNU bug tracking system
Copyright (C) 1999 Darren O. Benham,
1997,2003 nCipher Corporation Ltd,
1994-97 Ian Jackson.