GNU bug report logs - #77255
Treesit font-lock override for embed ranges


Package: emacs;

Reported by: Juri Linkov <juri <at> linkov.net>

Date: Tue, 25 Mar 2025 18:30:02 UTC

Severity: normal

Fixed in version 31.0.50

Done: Juri Linkov <juri <at> linkov.net>

Bug is archived. No further changes may be made.

To add a comment to this bug, you must first unarchive it, by sending
a message to control AT debbugs.gnu.org, with unarchive 77255 in the body.
You can then email your comments to 77255 AT debbugs.gnu.org in the normal way.



Report forwarded to casouri <at> gmail.com, v.pupillo <at> gmail.com, bug-gnu-emacs <at> gnu.org:
bug#77255; Package emacs. (Tue, 25 Mar 2025 18:30:02 GMT)

Acknowledgement sent to Juri Linkov <juri <at> linkov.net>:
New bug report received and forwarded. Copy sent to casouri <at> gmail.com, v.pupillo <at> gmail.com, bug-gnu-emacs <at> gnu.org. (Tue, 25 Mar 2025 18:30:02 GMT)

Message #5 received at submit <at> debbugs.gnu.org:

From: Juri Linkov <juri <at> linkov.net>
To: bug-gnu-emacs <at> gnu.org
Subject: Treesit font-lock override for embed ranges
Date: Tue, 25 Mar 2025 20:25:17 +0200
It looks like we need a new keyword like ':override t' for
'treesit-range-rules' that would override host font-lock rules.

I'm trying to create a generic minor mode for the AlpineJS framework
where some known HTML attributes contain JS code.  For example:

  <div x-data="{ open: false }">
  <div x-bind:class="! open ? 'hidden' : ''">
  <span x-text="new Date().getFullYear()">

This works nicely with the following code, added to mhtml-ts-mode for testing:

#+begin_src emacs-lisp
    (setq-local treesit-range-settings
                (append treesit-range-settings
                        (treesit-range-rules
                         :embed 'javascript
                         :host 'html
                         :local t
                         `((attribute
                            (attribute_name) @_name
                            (:match ,(rx (or "x-data" "x-bind" "x-text")) @_name)
                            (quoted_attribute_value
                             (attribute_value) @cap))))))
#+end_src

But the problem is that the JS highlighting is not visible because
the host html-ts-mode font-lock overrides the embedded js-ts-mode font-lock.

html-ts-mode--font-lock-settings contains:

   :language 'html
   :override t
   :feature 'string
   `((quoted_attribute_value) @font-lock-string-face)

So the whole quoted attribute value is highlighted with
font-lock-string-face, which overrides the JS highlighting.

Probably there is no way to add ':override t' to all 'javascript' rules in
'js--treesit-font-lock-settings', the way ':override t' is already added
to all 'jsdoc' rules in 'js--treesit-font-lock-settings'.




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#77255; Package emacs. (Wed, 26 Mar 2025 07:31:02 GMT)

Message #8 received at 77255 <at> debbugs.gnu.org:

From: Juri Linkov <juri <at> linkov.net>
To: 77255 <at> debbugs.gnu.org
Cc: Yuan Fu <casouri <at> gmail.com>, Vincenzo Pupillo <v.pupillo <at> gmail.com>
Subject: Re: bug#77255: Treesit font-lock override for embed ranges
Date: Wed, 26 Mar 2025 09:27:14 +0200
> It looks like we need a new keyword like ':override t' for
> 'treesit-range-rules' that would override host font-lock rules.

Or maybe not.  I managed to do this without any changes in core.
Also not sure if such functions as treesit-merge-font-lock-feature-list
and treesit-replace-font-lock-feature-settings could be used here.

The solution below works simply by replacing the html rule with
another rule that matches only HTML attributes that don't contain
js code:

#+begin_src emacs-lisp
    (setq-local treesit-range-settings
                (append treesit-range-settings
                        (treesit-range-rules
                         :embed 'javascript
                         :host 'html
                         :local t
                         `((attribute
                            (attribute_name) @_name
                            (:match ,(rx (or "x-data" "x-bind" "x-text")) @_name)
                            (quoted_attribute_value
                             (attribute_value) @cap))))))

    (setq-local treesit-font-lock-settings
                (mapcar (lambda (s)
                          (if (and (eq (treesit-query-language
                                        (treesit-font-lock-setting-query s))
                                       'html)
                                   (eq (treesit-font-lock-setting-feature s)
                                       'string))
                              (car (treesit-font-lock-rules
                                    :language 'html
                                    :override t
                                    :feature 'string
                                    `((attribute
                                       (attribute_name) @_name
                                       ;; (:match (not ,(rx (or "x-data" "x-bind" "x-text"))) @_name)
                                       ;; (:pred (lambda (node)
                                       ;;           (not (string-match-p (rx (or "x-data" "x-bind" "x-text"))
                                       ;;                 (treesit-node-text node t))))
                                       ;;        @_name)
                                       (:pred mhtml-ts-mode--not-match @_name)
                                       (quoted_attribute_value) @font-lock-string-face))))
                            s))
                        treesit-font-lock-settings))
#+end_src

The commented-out code shows attempts to use a negated :match,
which is not supported.  It also seems that a lambda is not supported
in :pred, so I needed to add a separate function:

#+begin_src emacs-lisp
(defun mhtml-ts-mode--not-match (node)
  (not (string-match-p (rx (or "x-data" "x-bind" "x-text"))
                       (treesit-node-text node t))))
#+end_src

Then everything works: all HTML attributes are highlighted
except those that should highlight js code in them.
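Since `:pred` expects the name of a function rather than a lambda, one possible workaround is a small helper that defines such a named predicate from a regexp, so each negated match does not need a hand-written function.  This is a minimal sketch and not part of treesit: `my-treesit-define-not-match`, `my-treesit--node-not-matching-p` and `my-alpine--not-js-attribute` are hypothetical names, and it assumes that `:pred` calls the named function with the captured node, which is what `mhtml-ts-mode--not-match` above relies on.

#+begin_src emacs-lisp
(require 'rx)

(defun my-treesit--node-not-matching-p (regexp node)
  "Return non-nil if the text of NODE does not match REGEXP.
Hypothetical helper, not part of treesit."
  (not (string-match-p regexp (treesit-node-text node t))))

(defun my-treesit-define-not-match (name regexp)
  "Define NAME as a one-argument predicate usable in a `:pred' clause.
The predicate returns non-nil when the node text does NOT match REGEXP.
`apply-partially' is used instead of a lambda so the regexp is captured
even without lexical binding.  Return NAME."
  (defalias name (apply-partially #'my-treesit--node-not-matching-p regexp))
  name)

;; Same behavior as `mhtml-ts-mode--not-match' in the message above.
(my-treesit-define-not-match
 'my-alpine--not-js-attribute
 (rx (or "x-data" "x-bind" "x-text")))
#+end_src

The html `string` rule would then use `(:pred my-alpine--not-js-attribute @_name)` in place of `mhtml-ts-mode--not-match`.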




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#77255; Package emacs. (Wed, 26 Mar 2025 16:09:01 GMT)

Message #11 received at 77255 <at> debbugs.gnu.org:

From: Vincenzo Pupillo <v.pupillo <at> gmail.com>
To: 77255 <at> debbugs.gnu.org, Juri Linkov <juri <at> linkov.net>
Cc: Yuan Fu <casouri <at> gmail.com>
Subject: Re: bug#77255: Treesit font-lock override for embed ranges
Date: Wed, 26 Mar 2025 17:07:58 +0100
Hi Juri,

On Wednesday, 26 March 2025 08:27:14 Central European Standard Time,
Juri Linkov wrote:
> > It looks like we need a new keyword like ':override t' for
> > 'treesit-range-rules' that would override host font-lock rules.
> 
> Or maybe not.  I managed to do this without any changes in core.
> Also not sure if such functions as treesit-merge-font-lock-feature-list
> and treesit-replace-font-lock-feature-settings could be used here.
> 
treesit-replace-font-lock-feature-settings works reliably only if the 
replacement is done for rules of the same language. So something like:

#+begin_src emacs-lisp
(setq-local liquid-ts-mode--font-lock-settings
            (treesit-replace-font-lock-feature-settings
             (treesit-font-lock-rules
              :language 'html
              :override t
              :feature 'string
              `((attribute
                 (attribute_name) @_name
                 (:pred mhtml-ts-mode--not-match @_name)
                 (quoted_attribute_value) @font-lock-string-face)))
             html-ts-mode--font-lock-settings))
#+end_src

Then:

#+begin_src emacs-lisp
(defvar mhtml-ts-mode--treesit-font-lock-feature-list
  (treesit-merge-font-lock-feature-list
   liquid-ts-mode--treesit-font-lock-feature-list
   (treesit-merge-font-lock-feature-list
    js--treesit-font-lock-feature-list
    css--treesit-font-lock-feature-list))
  "Settings for `treesit-font-lock-feature-list'.")
#+end_src

However, we could modify treesit-replace-font-lock-feature-settings to check 
the language in addition to the feature.

Vincenzo










Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#77255; Package emacs. (Thu, 27 Mar 2025 04:21:07 GMT)

Message #14 received at 77255 <at> debbugs.gnu.org:

From: Yuan Fu <casouri <at> gmail.com>
To: Juri Linkov <juri <at> linkov.net>
Cc: 77255 <at> debbugs.gnu.org, Vincenzo Pupillo <v.pupillo <at> gmail.com>
Subject: Re: bug#77255: Treesit font-lock override for embed ranges
Date: Wed, 26 Mar 2025 21:20:31 -0700

> On Mar 26, 2025, at 12:27 AM, Juri Linkov <juri <at> linkov.net> wrote:
> 
>> It looks like we need a new keyword like ':override t' for
>> 'treesit-range-rules' that would override host font-lock rules.
> 
> Or maybe not.  I managed to do this without any changes in core.
> Also not sure if such functions as treesit-merge-font-lock-feature-list
> and treesit-replace-font-lock-feature-settings could be used here.
> 
> Then everything works: all HTML attributes are highlighted
> except those that should highlight js code in them.

Looks reasonable to me. But if it’s a minor mode, we might need to have a way to negate the change made to treesit-font-lock-settings? OTOH if we use :override, we might run into an override arms race when enabling multiple minor modes, etc.

Yuan



Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#77255; Package emacs. (Thu, 27 Mar 2025 19:09:02 GMT)

Message #17 received at 77255 <at> debbugs.gnu.org:

From: Juri Linkov <juri <at> linkov.net>
To: Yuan Fu <casouri <at> gmail.com>
Cc: 77255 <at> debbugs.gnu.org, Vincenzo Pupillo <v.pupillo <at> gmail.com>
Subject: Re: bug#77255: Treesit font-lock override for embed ranges
Date: Thu, 27 Mar 2025 21:04:10 +0200
>> The commented-out code shows attempts to use a negated :match,
>> which is not supported.  It also seems that a lambda is not supported
>> in :pred, so I needed to add a separate function:
>> 
>> #+begin_src emacs-lisp
>> (defun mhtml-ts-mode--not-match (node)
>>  (not (string-match-p (rx (or "x-data" "x-bind" "x-text"))
>>                       (treesit-node-text node t))))
>> #+end_src
>> 
>> Then everything works: all HTML attributes are highlighted
>> except those that should highlight js code in them.
>
> Looks reasonable to me. But if it’s a minor mode, we might need to
> have a way to negate the change made to treesit-font-lock-settings?
> OTOH if we use :override, we might run into an override arms race when
> enabling multiple minor modes, etc.

We could declare that the last minor mode wins.  But indeed still need
a way to restore the original treesit-font-lock-settings after disabling
the minor mode.
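One way to make such a change reversible is to save the buffer-local value of treesit-font-lock-settings when the mode is turned on and put it back when the mode is turned off.  A minimal sketch follows; `my-alpine-mode`, `my-alpine--saved-settings` and `my-alpine--adjust-settings` are hypothetical names, and the adjustment function is only a placeholder for replacing the html `string` rule.  It does not solve the ordering problem: if another mode changes the same settings after this one, disabling this mode restores the saved value and discards that later change.

#+begin_src emacs-lisp
(require 'treesit)

(defvar-local my-alpine--saved-settings nil
  "One-element list holding the original `treesit-font-lock-settings'.
A list is used so that an original value of nil can still be restored.")

(defun my-alpine--adjust-settings (settings)
  "Hypothetical stand-in for the real change to SETTINGS.
A real mode would replace the html `string' rule here."
  (cons 'my-alpine-rule settings))

(define-minor-mode my-alpine-mode
  "Sketch of a minor mode that changes tree-sitter font-lock reversibly."
  :lighter " Alpine"
  (if my-alpine-mode
      ;; Save only once, so enabling the mode twice does not stack changes.
      (unless my-alpine--saved-settings
        (setq my-alpine--saved-settings (list treesit-font-lock-settings))
        (setq-local treesit-font-lock-settings
                    (my-alpine--adjust-settings treesit-font-lock-settings)))
    ;; Restore the saved value, if any, and forget it.
    (when my-alpine--saved-settings
      (setq-local treesit-font-lock-settings (car my-alpine--saved-settings))
      (setq my-alpine--saved-settings nil)))
  ;; Refontify so the change becomes visible.
  (when font-lock-mode
    (font-lock-flush)))
#+end_src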

BTW, I found another problem.  Please confirm if the range rules allow
only one query per embed language, or I'm doing something wrong?

I tried two queries to enable the liquid parser in html nodes 'text'
and also in html attributes 'attribute_value':

#+begin_src emacs-lisp
(setq-local treesit-range-settings
            (append treesit-range-settings
                    (treesit-range-rules
                     :embed 'liquid
                     :host 'html
                     `(((text) @cap1
                        (:match ,(rx (or "{{" "}}" "{%" "%}")) @cap1))
                       ((quoted_attribute_value
                         (attribute_value) @cap2)
                        (:match ,(rx (or "{{" "}}" "{%" "%}")) @cap2))))))
#+end_src

But it handles only one of these queries: when I remove the rule
for (text), it handles attribute_value, but when I remove the rule
for (attribute_value), it enables the liquid parser only for text.

This revealed another problem.  Actually, Liquid is a preprocessor.
Since it can be embedded everywhere in every html node, not depending on
the structure in the html parser, it would be more correct first to use
the liquid parser, and then allow html+js+css parsers to handle
remaining parts.  But both liquid and html parsers should apply
on the whole file.  The only difference is that liquid has a higher
precedence to decide what overlapping parts belong to the liquid parser.
Or maybe it makes sense to have two primary parsers?  They both could
add own highlighting.  And in regard to navigation, one of primary
parsers could have a precedence.




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#77255; Package emacs. (Sat, 29 Mar 2025 08:25:02 GMT)

Message #20 received at 77255 <at> debbugs.gnu.org:

From: Yuan Fu <casouri <at> gmail.com>
To: Juri Linkov <juri <at> linkov.net>
Cc: 77255 <at> debbugs.gnu.org, Vincenzo Pupillo <v.pupillo <at> gmail.com>
Subject: Re: bug#77255: Treesit font-lock override for embed ranges
Date: Sat, 29 Mar 2025 01:23:47 -0700

> On Mar 27, 2025, at 12:04 PM, Juri Linkov <juri <at> linkov.net> wrote:
> 
>>> The commented-out code shows attempts to use a negated :match,
>>> which is not supported.  It also seems that a lambda is not supported
>>> in :pred, so I needed to add a separate function:
>>> 
>>> #+begin_src emacs-lisp
>>> (defun mhtml-ts-mode--not-match (node)
>>> (not (string-match-p (rx (or "x-data" "x-bind" "x-text"))
>>>                      (treesit-node-text node t))))
>>> #+end_src
>>> 
>>> Then everything works: all HTML attributes are highlighted
>>> except those that should highlight js code in them.
>> 
>> Looks reasonable to me. But if it’s a minor mode, we might need to
>> have a way to negate the change made to treesit-font-lock-settings?
>> OTOH if we use :override, we might run into an override arms race when
>> enabling multiple minor modes, etc.
> 
> We could declare that the last minor mode wins.  But indeed still need
> a way to restore the original treesit-font-lock-settings after disabling
> the minor mode.
> 
> BTW, I found another problem.  Please confirm if the range rules allow
> only one query per embed language, or I'm doing something wrong?

That’s curious: even if you include multiple patterns in the query, it’s still one query, and the range functions support multiple captured ranges when setting up ranges (see treesit-query-range). So something is wrong here. I can look into this, but give me a few days.

> I tried two queries to enable the liquid parser in html nodes 'text'
> and also in html attributes 'attribute_value':
> 
> #+begin_src emacs-lisp
> (setq-local treesit-range-settings
>            (append treesit-range-settings
>                    (treesit-range-rules
>                     :embed 'liquid
>                     :host 'html
>                     `(((text) @cap1
>                        (:match ,(rx (or "{{" "}}" "{%" "%}")) @cap1))
>                       ((quoted_attribute_value
>                         (attribute_value) @cap2)
>                        (:match ,(rx (or "{{" "}}" "{%" "%}")) @cap2))))))
> #+end_src
> 
> But it handles only one of these queries: when I remove the rule
> for (text), it handles attribute_value, but when I remove the rule
> for (attribute_value), it enables the liquid parser only for text.
> 
> This revealed another problem.  Actually, Liquid is a preprocessor.
> Since it can be embedded everywhere in every html node, not depending on
> the structure in the html parser, it would be more correct first to use
> the liquid parser, and then allow html+js+css parsers to handle
> remaining parts.  But both liquid and html parsers should apply
> on the whole file.  The only difference is that liquid has a higher
> precedence to decide what overlapping parts belong to the liquid parser.
> Or maybe it makes sense to have two primary parsers?  They both could
> add own highlighting.  And in regard to navigation, one of primary
> parsers could have a precedence.

IMO the preprocessor should definitely be the primary parser, with HTML embedded in it. In the case of Liquid, it happens to use a syntax that’s compatible with HTML; that’s fine, but it’s not worth it, or even necessary, to add support for multiple primary parsers because of it. As for precedence, it can be customized by treesit-language-at-point-function.

Yuan



Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#77255; Package emacs. (Mon, 31 Mar 2025 17:06:01 GMT)

Message #23 received at 77255 <at> debbugs.gnu.org:

From: Juri Linkov <juri <at> linkov.net>
To: Yuan Fu <casouri <at> gmail.com>
Cc: 77255 <at> debbugs.gnu.org, Vincenzo Pupillo <v.pupillo <at> gmail.com>
Subject: Re: bug#77255: Treesit font-lock override for embed ranges
Date: Mon, 31 Mar 2025 19:57:18 +0300
>>> Looks reasonable to me. But if it’s a minor mode, we might need to
>>> have a way to negate the change made to treesit-font-lock-settings?
>>> OTOH if we use :override, we might run into an override arm race when
>>> enabling multiple minor modes, etc.
>> 
>> We could declare that the last minor mode wins.  But indeed still need
>> a way to restore the original treesit-font-lock-settings after disabling
>> the minor mode.

I'm convinced now that minor modes should be avoided since it's not
straightforward to revert the original settings when they are disabled.

Everything works nicely in the attached example of the liquid major mode
where liquid is the primary parser.  Currently it copies settings
from mhtml-ts-mode.  But later I'll try to inherit from mhtml-ts-mode.

>> BTW, I found another problem.  Please confirm if the range rules allow
>> only one query per embed language, or I'm doing something wrong?
>
> That’s curious, even if you included multiple patterns in the query, it’s
> still one query; and the range functions support multiple captured ranges
> when setting up ranges. So something is wrong here. (See
> treesit-query-range) I can look into this, but give me a few days.

Multiple captured ranges are not needed anymore for the attached example.

>> This revealed another problem.  Actually, Liquid is a preprocessor.
>> Since it can be embedded everywhere in every html node, not depending on
>> the structure in the html parser, it would be more correct first to use
>> the liquid parser, and then allow html+js+css parsers to handle
>> remaining parts.  But both liquid and html parsers should apply
>> on the whole file.  The only difference is that liquid has a higher
>> precedence to decide what overlapping parts belong to the liquid parser.
>> Or maybe it makes sense to have two primary parsers?  They both could
>> add own highlighting.  And in regard to navigation, one of primary
>> parsers could have a precedence.
>
> IMO preprocessor definitely should be the primary parser and let HTML embed
> in it. In the case of Liquid, it happens to use a syntax that’s compatible
> with HTML; that’s fine, but it’s not worth it or even necessary to add support
> for multiple primary parsers because of it. As for precedence, it can be
> customized by treesit-language-at-point-function.

Thanks for the suggestion to use the preprocessor as the primary parser.
So multiple primary parsers are not required anymore since other parsers
are embedded in the primary parser ('define-treesit-generic-mode' sets
the primary parser).  And everything works for any embedded level:

liquid -> html -> js -> jsdoc
liquid -> html -> css
liquid -> yaml

#+begin_src emacs-lisp
(define-treesit-generic-mode liquid-generic-ts-mode
  "Tree-sitter generic mode for Liquid templates."
  :lang 'liquid
  :source "https://github.com/hankthetank27/tree-sitter-liquid"
  :mode-remap '(html-mode mhtml-mode html-ts-mode mhtml-ts-mode)
  :name "Liquid"
  ;; TODO: :parent mhtml-ts-mode

  (treesit-parser-create 'html)
  (treesit-parser-create 'css)
  (treesit-parser-create 'javascript)

  (setq-local treesit-range-settings
              (treesit-range-rules
               :embed 'html
               :host 'liquid
               '(((template_content) @cap))

               :embed 'javascript
               :host 'liquid
               '(((js_content) @cap))

               :embed 'css
               :host 'liquid
               '(((style_content) @cap))

               :embed 'javascript
               :host 'html
               '((script_element
                  (start_tag (tag_name))
                  (raw_text) @cap))

               :embed 'css
               :host 'html
               '((style_element
                  (start_tag (tag_name))
                  (raw_text) @cap))))

  (when (treesit-ready-p 'yaml t)
    (treesit-parser-create 'yaml)
    (setq-local treesit-range-settings
                (append treesit-range-settings
                        (treesit-range-rules
                         :embed 'yaml
                         :host 'liquid
                         '(((front_matter) @cap))))))

  (setq-local treesit-font-lock-settings
              (append treesit-font-lock-settings
                      html-ts-mode--font-lock-settings
                      js--treesit-font-lock-settings
                      (treesit-replace-font-lock-feature-settings
                       (treesit-font-lock-rules
                        :language 'css
                        :override t
                        :feature 'variable
                        '((plain_value) @mhtml-ts-mode--colorize-css-value
                          (color_value) @mhtml-ts-mode--colorize-css-value))
                       css--treesit-settings)))

  (setq-local treesit-font-lock-feature-list
              (treesit-merge-font-lock-feature-list
               treesit-font-lock-feature-list
               (treesit-merge-font-lock-feature-list
                html-ts-mode--treesit-font-lock-feature-list
                (treesit-merge-font-lock-feature-list
                 js--treesit-font-lock-feature-list
                 css--treesit-font-lock-feature-list))))

  (when (treesit-ready-p 'jsdoc t)
    (treesit-parser-create 'jsdoc)
    (setq-local treesit-range-settings
                (append treesit-range-settings
                        (treesit-range-rules
                         :embed 'jsdoc
                         :host 'javascript
                         :local t
                         `(((comment) @cap
                            (:match ,js--treesit-jsdoc-beginning-regexp @cap)))))))

  (setq treesit-thing-settings
        (append
         `((liquid (sexp (not ,(rx bos (or "program") eos)))
                   (list ,(rx bos (or "range"
                                      "if_statement"
                                      "for_loop_statement"
                                      "case_statement"
                                      "unless_statement"
                                      "capture_statement"
                                      "form_statement"
                                      "tablerow_statement"
                                      "paginate_statement")
                              eos))))
         mhtml-ts-mode--treesit-thing-settings))

  (setq-local treesit-aggregated-outline-predicate
              `((liquid . ,(rx bos (or "if_statement"
                                       "for_loop_statement")
                               eos))
                (html . ,#'html-ts-mode--outline-predicate)
                (javascript . ,js-ts-mode--outline-predicate)
                (css . ,css-ts-mode--outline-predicate))))
#+end_src




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#77255; Package emacs. (Tue, 01 Apr 2025 00:39:03 GMT)

Message #26 received at 77255 <at> debbugs.gnu.org:

From: Yuan Fu <casouri <at> gmail.com>
To: Juri Linkov <juri <at> linkov.net>
Cc: 77255 <at> debbugs.gnu.org, Vincenzo Pupillo <v.pupillo <at> gmail.com>
Subject: Re: bug#77255: Treesit font-lock override for embed ranges
Date: Mon, 31 Mar 2025 17:38:00 -0700

> On Mar 31, 2025, at 9:57 AM, Juri Linkov <juri <at> linkov.net> wrote:
> 
> Thanks for the suggestion to use the preprocessor as the primary parser.
> So multiple primary parsers are not required anymore since other parsers
> are embedded in the primary parser ('define-treesit-generic-mode' sets
> the primary parser).  And everything works for any embedded level:
> 
> liquid -> html -> js -> jsdoc
> liquid -> html -> css
> liquid -> yaml
> 

Awesome!





Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#77255; Package emacs. (Tue, 01 Apr 2025 17:23:02 GMT)

Message #29 received at 77255 <at> debbugs.gnu.org:

From: Juri Linkov <juri <at> linkov.net>
To: Vincenzo Pupillo <v.pupillo <at> gmail.com>
Cc: 77255 <at> debbugs.gnu.org, Yuan Fu <casouri <at> gmail.com>
Subject: Re: bug#77255: Treesit font-lock override for embed ranges
Date: Tue, 01 Apr 2025 20:17:58 +0300
> However, we could modify treesit-replace-font-lock-feature-settings to check 
> the language in addition to the feature.

Thanks for the suggestion.  So I modified
treesit-replace-font-lock-feature-settings
to check the language as well.
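A minimal sketch of what matching on both the language and the feature means. It assumes a simplified, hypothetical representation where each setting is a plist with :language, :feature and :query keys; the real entries of treesit-font-lock-settings are structured differently, and my-replace-feature-settings is a made-up name, not the treesit.el implementation. When the settings of several languages are merged, as in the Liquid example in this thread, the same feature name can come from more than one language, so replacing by feature alone could discard rules of the wrong language:

#+begin_src emacs-lisp
(require 'cl-lib)

;; Hypothetical simplified model: each setting is a plist
;; (:language LANG :feature FEATURE :query QUERY).
(defun my-replace-feature-settings (new-settings settings)
  "Drop entries of SETTINGS whose language and feature match NEW-SETTINGS.
The entries of NEW-SETTINGS are then appended at the end."
  (let ((keys (mapcar (lambda (s)
                        (cons (plist-get s :language) (plist-get s :feature)))
                      new-settings)))
    (append
     ;; Keep an old entry unless the same (LANGUAGE . FEATURE) pair
     ;; is being replaced; a matching feature alone is not enough.
     (cl-remove-if (lambda (s)
                     (member (cons (plist-get s :language)
                                   (plist-get s :feature))
                             keys))
                   settings)
     new-settings)))

;; Only the CSS `variable' entry is replaced; the Liquid one survives.
(my-replace-feature-settings
 '((:language css :feature variable :query new-css))
 '((:language css :feature variable :query old-css)
   (:language liquid :feature variable :query liquid-rules)))
;; => ((:language liquid :feature variable :query liquid-rules)
;;     (:language css :feature variable :query new-css))
#+end_src

This sketch appends the replacements at the end. Since the order of font-lock settings matters for :override, a real implementation may prefer to keep each replacement at the position of the entry it replaces.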




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#77255; Package emacs. (Tue, 01 Apr 2025 17:24:02 GMT) Full text and rfc822 format available.

Message #32 received at 77255 <at> debbugs.gnu.org (full text, mbox):

From: Juri Linkov <juri <at> linkov.net>
To: Yuan Fu <casouri <at> gmail.com>
Cc: 77255 <at> debbugs.gnu.org, Vincenzo Pupillo <v.pupillo <at> gmail.com>
Subject: Re: bug#77255: Treesit font-lock override for embed ranges
Date: Tue, 01 Apr 2025 20:18:36 +0300
close 77255 31.0.50
thanks

>> liquid -> html -> js -> jsdoc
>> liquid -> html -> css
>> liquid -> yaml
>
> Awesome!

So it is now added to treesit-x.el, and the bug is closed.




bug marked as fixed in version 31.0.50, send any further explanations to 77255 <at> debbugs.gnu.org and Juri Linkov <juri <at> linkov.net> Request was from Juri Linkov <juri <at> linkov.net> to control <at> debbugs.gnu.org. (Tue, 01 Apr 2025 17:24:02 GMT) Full text and rfc822 format available.

Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#77255; Package emacs. (Tue, 01 Apr 2025 17:45:03 GMT) Full text and rfc822 format available.

Message #37 received at 77255 <at> debbugs.gnu.org (full text, mbox):

From: Juri Linkov <juri <at> linkov.net>
To: Yuan Fu <casouri <at> gmail.com>
Cc: 77255 <at> debbugs.gnu.org, Vincenzo Pupillo <v.pupillo <at> gmail.com>
Subject: Re: bug#77255: Treesit font-lock override for embed ranges
Date: Tue, 01 Apr 2025 20:43:34 +0300
> Everything works nicely in the attached example of the liquid major mode
> where liquid is the primary parser.  Currently it copies settings
> from mhtml-ts-mode.  But later I'll try to inherit from mhtml-ts-mode.

However, with inheritance, which causes several changes of the
primary parser and several calls to treesit-major-mode-setup
during ts-mode initialization, I sometimes get backtraces like this
only after the first edit:

Debugger entered--Lisp error: (treesit-node-outdated #<treesit-node-outdated>)
  #<subr F616e6f6e796d6f75732d6c616d626461_anonymous_lambda_78>(beg #<treesit-node-outdated>)
  treesit-navigate-thing(35 1 beg html-ts-mode--outline-predicate)

Could you suggest where to look?




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#77255; Package emacs. (Tue, 01 Apr 2025 19:19:01 GMT) Full text and rfc822 format available.

Message #40 received at 77255 <at> debbugs.gnu.org (full text, mbox):

From: Vincenzo Pupillo <v.pupillo <at> gmail.com>
To: Yuan Fu <casouri <at> gmail.com>, Juri Linkov <juri <at> linkov.net>
Cc: 77255 <at> debbugs.gnu.org
Subject: Re: bug#77255: Treesit font-lock override for embed ranges
Date: Tue, 01 Apr 2025 21:17:55 +0200
Wow!!! Great work!

On Monday, 31 March 2025 18:57:18 Central European Summer Time, Juri
Linkov wrote:
> >>> Looks reasonable to me. But if it’s a minor mode, we might need to
> >>> have a way to negate the change made to treesit-font-lock-settings?
> >>> OTOH if we use :override, we might run into an override arms race when
> >>> enabling multiple minor modes, etc.
> >> 
> >> We could declare that the last minor mode wins.  But indeed still need
> >> a way to restore the original treesit-font-lock-settings after disabling
> >> the minor mode.
> 
> I'm convinced now that minor modes should be avoided since it's not
> straightforward to revert the original settings when they are disabled.
> 
> Everything works nicely in the attached example of the liquid major mode
> where liquid is the primary parser.  Currently it copies settings
> from mhtml-ts-mode.  But later I'll try to inherit from mhtml-ts-mode.
> 
> >> BTW, I found another problem.  Please confirm if the range rules allow
> >> only one query per embed language, or I'm doing something wrong?
> > 
> > That’s curious, even if you included multiple patterns in the query, it’s
> > still one query; and the range functions support multiple captured ranges
> > when setting up ranges. So something is wrong here. (See
> > treesit-query-range) I can look into this, but give me a few days.
> 
> Multiple captured ranges are not needed anymore for the attached example.
> 
> >> This revealed another problem.  Actually, Liquid is a preprocessor.
> >> Since it can be embedded everywhere in every html node, not depending on
> >> the structure in the html parser, it would be more correct first to use
> >> the liquid parser, and then allow html+js+css parsers to handle
> >> remaining parts.  But both liquid and html parsers should apply
> >> on the whole file.  The only difference is that liquid has higher
> >> precedence when deciding which overlapping parts belong to the liquid parser.
> >> Or maybe it makes sense to have two primary parsers?  They both could
> >> add their own highlighting.  And in regard to navigation, one of the
> >> primary parsers could take precedence.
> > 
> > IMO the preprocessor should definitely be the primary parser and let HTML
> > embed in it. In the case of Liquid, it happens to use a syntax that’s
> > compatible with HTML; that’s fine, but it’s not worth it or even necessary
> > to add support for multiple primary parsers because of it. As for
> > precedence, it can be customized by treesit-language-at-point-function.
> 
> Thanks for the suggestion to use the preprocessor as the primary parser.
> So multiple primary parsers are not required anymore since other parsers
> are embedded in the primary parser ('define-treesit-generic-mode' sets
> the primary parser).  And everything works for any embedded level:
> 
> liquid -> html -> js -> jsdoc
> liquid -> html -> css
> liquid -> yaml
> 
> #+begin_src emacs-lisp
> (define-treesit-generic-mode liquid-generic-ts-mode
>   "Tree-sitter generic mode for Liquid templates."
> 
>   :lang 'liquid
>   :source "https://github.com/hankthetank27/tree-sitter-liquid"
>   :mode-remap '(html-mode mhtml-mode html-ts-mode mhtml-ts-mode)
>   :name "Liquid"
> 
>   ;; TODO: :parent mhtml-ts-mode
> 
>   (treesit-parser-create 'html)
>   (treesit-parser-create 'css)
>   (treesit-parser-create 'javascript)
> 
>   (setq-local treesit-range-settings
>               (treesit-range-rules
> 
>                :embed 'html
>                :host 'liquid
> 
>                '(((template_content) @cap))
> 
>                :embed 'javascript
>                :host 'liquid
> 
>                '(((js_content) @cap))
> 
>                :embed 'css
>                :host 'liquid
> 
>                '(((style_content) @cap))
> 
>                :embed 'javascript
>                :host 'html
> 
>                '((script_element
>                   (start_tag (tag_name))
>                   (raw_text) @cap))
> 
>                :embed 'css
>                :host 'html
> 
>                '((style_element
>                   (start_tag (tag_name))
>                   (raw_text) @cap))))
> 
>   (when (treesit-ready-p 'yaml t)
>     (treesit-parser-create 'yaml)
>     (setq-local treesit-range-settings
>                 (append treesit-range-settings
>                         (treesit-range-rules
> 
>                          :embed 'yaml
>                          :host 'liquid
> 
>                          '(((front_matter) @cap))))))
> 
>   (setq-local treesit-font-lock-settings
>               (append treesit-font-lock-settings
>                       html-ts-mode--font-lock-settings
>                       js--treesit-font-lock-settings
>                       (treesit-replace-font-lock-feature-settings
>                        (treesit-font-lock-rules
> 
>                         :language 'css
>                         :override t
>                         :feature 'variable
> 
>                         '((plain_value) @mhtml-ts-mode--colorize-css-value
>                           (color_value) @mhtml-ts-mode--colorize-css-value))
>                        css--treesit-settings)))
> 
>   (setq-local treesit-font-lock-feature-list
>               (treesit-merge-font-lock-feature-list
>                treesit-font-lock-feature-list
>                (treesit-merge-font-lock-feature-list
>                 html-ts-mode--treesit-font-lock-feature-list
>                 (treesit-merge-font-lock-feature-list
>                  js--treesit-font-lock-feature-list
>                  css--treesit-font-lock-feature-list))))
> 
>   (when (treesit-ready-p 'jsdoc t)
>     (treesit-parser-create 'jsdoc)
>     (setq-local treesit-range-settings
>                 (append treesit-range-settings
>                         (treesit-range-rules
> 
>                          :embed 'jsdoc
>                          :host 'javascript
>                          :local t
> 
>                          `(((comment) @cap
>                             (:match ,js--treesit-jsdoc-beginning-regexp @cap)))))))
> 
>   (setq treesit-thing-settings
>         (append
>          `((liquid (sexp (not ,(rx bos (or "program") eos)))
>                    (list ,(rx bos (or "range"
>                                       "if_statement"
>                                       "for_loop_statement"
>                                       "case_statement"
>                                       "unless_statement"
>                                       "capture_statement"
>                                       "form_statement"
>                                       "tablerow_statement"
>                                       "paginate_statement")
>                               eos))))
>          mhtml-ts-mode--treesit-thing-settings))
> 
>   (setq-local treesit-aggregated-outline-predicate
>               `((liquid . ,(rx bos (or "if_statement"
>                                        "for_loop_statement")
>                                eos))
>                 (html . ,#'html-ts-mode--outline-predicate)
>                 (javascript . ,js-ts-mode--outline-predicate)
>                 (css . ,css-ts-mode--outline-predicate))))
> #+end_src








Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#77255; Package emacs. (Tue, 01 Apr 2025 23:42:02 GMT) Full text and rfc822 format available.

Message #43 received at 77255 <at> debbugs.gnu.org (full text, mbox):

From: Yuan Fu <casouri <at> gmail.com>
To: Juri Linkov <juri <at> linkov.net>
Cc: 77255 <at> debbugs.gnu.org, Vincenzo Pupillo <v.pupillo <at> gmail.com>
Subject: Re: bug#77255: Treesit font-lock override for embed ranges
Date: Tue, 1 Apr 2025 16:41:13 -0700

> On Apr 1, 2025, at 10:43 AM, Juri Linkov <juri <at> linkov.net> wrote:
> 
>> Everything works nicely in the attached example of the liquid major mode
>> where liquid is the primary parser.  Currently it copies settings
>> from mhtml-ts-mode.  But later I'll try to inherit from mhtml-ts-mode.
> 
> However, with inheritance, which causes several changes of the
> primary parser and several calls to treesit-major-mode-setup
> during ts-mode initialization, I sometimes get backtraces like this
> only after the first edit:
> 
> Debugger entered--Lisp error: (treesit-node-outdated #<treesit-node-outdated>)
>  #<subr F616e6f6e796d6f75732d6c616d626461_anonymous_lambda_78>(beg #<treesit-node-outdated>)
>  treesit-navigate-thing(35 1 beg html-ts-mode--outline-predicate)
> 
> Could you suggest where to look?

Could it be that you passed a lambda/closure to treesit-navigate-thing which contains a tree-sitter node?

Yuan



Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#77255; Package emacs. (Wed, 02 Apr 2025 07:01:03 GMT) Full text and rfc822 format available.

Message #46 received at 77255 <at> debbugs.gnu.org (full text, mbox):

From: Juri Linkov <juri <at> linkov.net>
To: Yuan Fu <casouri <at> gmail.com>
Cc: 77255 <at> debbugs.gnu.org, Vincenzo Pupillo <v.pupillo <at> gmail.com>
Subject: Re: bug#77255: Treesit font-lock override for embed ranges
Date: Wed, 02 Apr 2025 09:55:18 +0300
>>> Everything works nicely in the attached example of the liquid major mode
>>> where liquid is the primary parser.  Currently it copies settings
>>> from mhtml-ts-mode.  But later I'll try to inherit from mhtml-ts-mode.
>> 
>> However, with inheritance, which causes several changes of the
>> primary parser and several calls to treesit-major-mode-setup
>> during ts-mode initialization, I sometimes get backtraces like this
>> only after the first edit:
>> 
>> Debugger entered--Lisp error: (treesit-node-outdated #<treesit-node-outdated>)
>>  #<subr F616e6f6e796d6f75732d6c616d626461_anonymous_lambda_78>(beg #<treesit-node-outdated>)
>>  treesit-navigate-thing(35 1 beg html-ts-mode--outline-predicate)
>> 
>> Could you suggest where to look?
>
> Could it be that you passed a lambda/closure to treesit-navigate-thing
> which contains a tree-sitter node?

Thanks, indeed this was because outline-minor-mode was activated
too early, on the wrong hook.  It should be activated on the last hook,
i.e. liquid-generic-ts-mode-hook, instead of the parent's mhtml-ts-mode-hook.
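A minimal sketch of the fix just described, assuming outline-minor-mode is enabled from a user's init file rather than from the mode definitions themselves: add it to the child mode's own hook instead of mhtml-ts-mode-hook.

#+begin_src emacs-lisp
;; Enable outline-minor-mode from the final mode's own hook,
;; as described above, instead of from the parent's mhtml-ts-mode-hook.
(add-hook 'liquid-generic-ts-mode-hook #'outline-minor-mode)
#+end_src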




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#77255; Package emacs. (Wed, 09 Apr 2025 17:34:02 GMT) Full text and rfc822 format available.

Message #49 received at 77255 <at> debbugs.gnu.org (full text, mbox):

From: Juri Linkov <juri <at> linkov.net>
To: Yuan Fu <casouri <at> gmail.com>
Cc: 77255 <at> debbugs.gnu.org, Vincenzo Pupillo <v.pupillo <at> gmail.com>
Subject: Re: bug#77255: Treesit font-lock override for embed ranges
Date: Wed, 09 Apr 2025 20:31:28 +0300
BTW, for testing multi-parser ranges I used a trick that
highlights different ranges with different background colors from hi-lock.
Maybe something like this could be added to 'treesit-explore-mode'
or 'treesit-inspect-mode':

diff --git a/lisp/treesit.el b/lisp/treesit.el
index 8e57a6dae14..5a2721cdda4 100644
--- a/lisp/treesit.el
+++ b/lisp/treesit.el
@@ -1036,6 +1026,9 @@ treesit--update-ranges-non-local
                   (overlay-put ov 'treesit-parser embed-parser)
                   (overlay-put ov 'treesit-parser-local-p nil)
                   (overlay-put ov 'treesit-host-parser host-parser)
+                  ;; (overlay-put ov 'font-lock-face (nth embed-level hi-lock-face-defaults))
+                  (overlay-put ov 'font-lock-face (nth (length (memq embed-parser (treesit-parser-list))) hi-lock-face-defaults))
+                  (overlay-put ov 'priority (+ 1000 embed-level))
                   (overlay-put ov 'treesit-parser-ov-timestamp
                                modified-tick)))))
           ;; Set ranges for the embed parser.
@@ -1130,6 +1123,9 @@ treesit--update-ranges-local
                 (treesit-parser-set-embed-level embedded-parser embed-level)
                 (overlay-put ov 'treesit-parser embedded-parser)
                 (overlay-put ov 'treesit-parser-local-p t)
+                ;; (overlay-put ov 'font-lock-face (nth embed-level hi-lock-face-defaults))
+                (overlay-put ov 'font-lock-face (nth (length (memq embedded-parser (treesit-parser-list))) hi-lock-face-defaults))
+                (overlay-put ov 'priority (+ 1000 embed-level))
                 (overlay-put ov 'treesit-host-parser host-parser)
                 (overlay-put ov 'treesit-parser-ov-timestamp
                              modified-tick)
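The same idea can be tried without patching treesit.el. The following is a minimal standalone sketch, not part of the patch: the command names my-treesit-show-ranges and my-treesit-hide-ranges and the overlay property my-treesit-range are hypothetical. It reads each parser's ranges with treesit-parser-included-ranges and colors them with faces from hi-lock-face-defaults, one face per parser. It assumes that treesit-parser-list called without arguments may not return local parsers (such as jsdoc set up with :local t above), so their ranges might not be shown.

#+begin_src emacs-lisp
(require 'treesit)
(require 'hi-lock)

(defun my-treesit-hide-ranges ()
  "Remove the overlays created by `my-treesit-show-ranges'."
  (interactive)
  (remove-overlays (point-min) (point-max) 'my-treesit-range t))

(defun my-treesit-show-ranges ()
  "Show the ranges of embedded parsers using hi-lock faces.
Parsers without explicit ranges, such as the primary parser that
covers the whole buffer, are skipped."
  (interactive)
  (my-treesit-hide-ranges)
  (let ((i 0))
    (dolist (parser (treesit-parser-list))
      (let ((ranges (treesit-parser-included-ranges parser))
            ;; Cycle through the hi-lock faces, one face per parser.
            (face (nth (mod i (length hi-lock-face-defaults))
                       hi-lock-face-defaults)))
        (when ranges
          (dolist (range ranges)
            (let ((ov (make-overlay (car range) (cdr range))))
              (overlay-put ov 'my-treesit-range t)
              ;; hi-lock-face-defaults holds face names as strings.
              (overlay-put ov 'face (if (stringp face) (intern face) face))
              ;; A high priority, as in the patch, so other overlays
              ;; do not hide the range colors.
              (overlay-put ov 'priority 1000)
              (overlay-put ov 'help-echo
                           (symbol-name (treesit-parser-language parser)))))
          (setq i (1+ i)))))))
#+end_src

Unlike the patch, which colors the overlays that treesit.el itself maintains, this is a snapshot: it does not follow later edits, so it has to be run again after the ranges change.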




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#77255; Package emacs. (Thu, 17 Apr 2025 23:41:02 GMT) Full text and rfc822 format available.

Message #52 received at 77255 <at> debbugs.gnu.org (full text, mbox):

From: Yuan Fu <casouri <at> gmail.com>
To: Juri Linkov <juri <at> linkov.net>
Cc: 77255 <at> debbugs.gnu.org, Vincenzo Pupillo <v.pupillo <at> gmail.com>
Subject: Re: bug#77255: Treesit font-lock override for embed ranges
Date: Thu, 17 Apr 2025 16:40:24 -0700

> On Apr 9, 2025, at 10:31 AM, Juri Linkov <juri <at> linkov.net> wrote:
> 
> BTW, for testing multi-parser ranges I used a trick that
> highlights different ranges with different background colors from hi-lock.
> Maybe something like this could be added to 'treesit-explore-mode'
> or 'treesit-inspect-mode':
> 
> diff --git a/lisp/treesit.el b/lisp/treesit.el
> index 8e57a6dae14..5a2721cdda4 100644
> --- a/lisp/treesit.el
> +++ b/lisp/treesit.el
> @@ -1036,6 +1026,9 @@ treesit--update-ranges-non-local
>                   (overlay-put ov 'treesit-parser embed-parser)
>                   (overlay-put ov 'treesit-parser-local-p nil)
>                   (overlay-put ov 'treesit-host-parser host-parser)
> +                  ;; (overlay-put ov 'font-lock-face (nth embed-level hi-lock-face-defaults))
> +                  (overlay-put ov 'font-lock-face (nth (length (memq embed-parser (treesit-parser-list))) hi-lock-face-defaults))
> +                  (overlay-put ov 'priority (+ 1000 embed-level))
>                   (overlay-put ov 'treesit-parser-ov-timestamp
>                                modified-tick)))))
>           ;; Set ranges for the embed parser.
> @@ -1130,6 +1123,9 @@ treesit--update-ranges-local
>                 (treesit-parser-set-embed-level embedded-parser embed-level)
>                 (overlay-put ov 'treesit-parser embedded-parser)
>                 (overlay-put ov 'treesit-parser-local-p t)
> +                ;; (overlay-put ov 'font-lock-face (nth embed-level hi-lock-face-defaults))
> +                (overlay-put ov 'font-lock-face (nth (length (memq embedded-parser (treesit-parser-list))) hi-lock-face-defaults))
> +                (overlay-put ov 'priority (+ 1000 embed-level))
>                 (overlay-put ov 'treesit-host-parser host-parser)
>                 (overlay-put ov 'treesit-parser-ov-timestamp
>                              modified-tick)

Yes, that’ll be a fantastic addition. I thought about something like this but didn’t have the time to implement it.

Yuan



bug archived. Request was from Debbugs Internal Request <help-debbugs <at> gnu.org> to internal_control <at> debbugs.gnu.org. (Fri, 16 May 2025 11:24:09 GMT) Full text and rfc822 format available.



GNU bug tracking system
Copyright (C) 1999 Darren O. Benham, 1997,2003 nCipher Corporation Ltd, 1994-97 Ian Jackson.