GNU bug report logs - #76398
treesit-aggregated-outline-predicate

Package: emacs;

Reported by: Juri Linkov <juri <at> linkov.net>

Date: Tue, 18 Feb 2025 17:36:01 UTC

Severity: normal

Fixed in version 31.0.50

Done: Juri Linkov <juri <at> linkov.net>

Bug is archived. No further changes may be made.

To add a comment to this bug, you must first unarchive it, by sending
a message to control AT debbugs.gnu.org, with unarchive 76398 in the body.
You can then email your comments to 76398 AT debbugs.gnu.org in the normal way.


Report forwarded to casouri <at> gmail.com, v.pupillo <at> gmail.com, bug-gnu-emacs <at> gnu.org:
bug#76398; Package emacs. (Tue, 18 Feb 2025 17:36:02 GMT)

Acknowledgement sent to Juri Linkov <juri <at> linkov.net>:
New bug report received and forwarded. Copy sent to casouri <at> gmail.com, v.pupillo <at> gmail.com, bug-gnu-emacs <at> gnu.org. (Tue, 18 Feb 2025 17:36:02 GMT)

Message #5 received at submit <at> debbugs.gnu.org:

From: Juri Linkov <juri <at> linkov.net>
To: bug-gnu-emacs <at> gnu.org
Subject: treesit-aggregated-outline-predicate
Date: Tue, 18 Feb 2025 19:27:17 +0200
As discussed in bug#74610, in multi-language modes
treesit-outline-predicate ends abruptly after the first embedded range:
once it can't find more matches in its range, it can't get back out
to the primary parser.

So this patch helps 'treesit-outline-search' to get out of the local parser
to the primary parser to continue search for the next outline predicate.

'treesit-outline-level' should do the same, but currently I can't find
a suitable function to break out of embedded confinement
and get the host node that contains the guest ranges.
I mean that e.g. (treesit-parser-root-node (treesit-node-parser node))
can get the root node of the local parser, but how to get its parent node
in the primary parser?  It's understandable that treesit-node-parent
doesn't go out of its parser.  But maybe there is another function?
If no such function exists, that's fine; the node could then be found
manually by calculating from treesit-parser-included-ranges.
[treesit-aggregated-outline-predicate.patch (text/x-diff, inline)]
diff --git a/lisp/textmodes/mhtml-ts-mode.el b/lisp/textmodes/mhtml-ts-mode.el
index 83f8879f427..7a481599310 100644
--- a/lisp/textmodes/mhtml-ts-mode.el
+++ b/lisp/textmodes/mhtml-ts-mode.el
@@ -580,7 +580,10 @@ mhtml-ts-mode
     (setq-local treesit-aggregated-simple-imenu-settings
                 mhtml-ts-mode--treesit-aggregated-simple-imenu-settings)
 
-    ;; (setq-local treesit-outline-predicate nil)
+    (setq-local treesit-aggregated-outline-predicate
+                `((html . ,#'html-ts-mode--outline-predicate)
+                  (javascript . "\\`function_declaration\\'")
+                  (css . "\\`rule_set\\'")))
 
     (treesit-major-mode-setup)
 
diff --git a/lisp/treesit.el b/lisp/treesit.el
index 30efd4d4599..ab9bfc33d3d 100644
--- a/lisp/treesit.el
+++ b/lisp/treesit.el
@@ -3601,6 +3601,16 @@ treesit-outline-predicate
 is constructed from the value of `treesit-simple-imenu-settings'
 when a major mode sets it.")
 
+(defvar-local treesit-aggregated-outline-predicate nil
+  "Settings that configure `treesit-outline-search' for multi-language modes.
+
+The value should be an alist of (LANG . SETTINGS), where LANG is a
+language symbol, and SETTINGS has the same form as
+`treesit-outline-predicate'.
+
+When both this variable and `treesit-outline-predicate' are non-nil,
+this variable takes priority.")
+
 (defun treesit-outline-predicate--from-imenu (node)
   ;; Return an outline searching predicate created from Imenu.
   ;; Return the value suitable to set `treesit-outline-predicate'.
@@ -3618,7 +3628,10 @@ treesit-outline-predicate--from-imenu
 
 (defun treesit-outline--at-point ()
   "Return the outline heading node at the current line."
-  (let* ((pred treesit-outline-predicate)
+  (let* ((pred (if treesit-aggregated-outline-predicate
+                   (alist-get (treesit-language-at (point))
+                              treesit-aggregated-outline-predicate)
+                 treesit-outline-predicate))
          (bol (pos-bol))
          (eol (pos-eol))
          (current (treesit-thing-at (point) pred))
@@ -3649,9 +3662,35 @@ treesit-outline-search
               (if (eq (point) (pos-bol))
                   (if (bobp) (point) (1- (point)))
                 (pos-eol))))
+           (pred (if treesit-aggregated-outline-predicate
+                     (alist-get (treesit-language-at pos)
+                                treesit-aggregated-outline-predicate)
+                   treesit-outline-predicate))
            (found (or bob-pos
-                      (treesit-navigate-thing pos (if backward -1 1) 'beg
-                                              treesit-outline-predicate))))
+                      (treesit-navigate-thing pos (if backward -1 1) 'beg pred))))
+
+      ;; Handle multi-language modes
+      (when-let* ((ranges (mapcar #'treesit-parser-included-ranges
+                                  (treesit-parser-list)))
+                  (ranges (delq nil (delete '((1 . 1)) ranges))))
+        (if found
+            nil
+          ;; Possibly was inside the local range, and when can't find
+          ;; more matches inside the local range then need to go out
+          (when-let* ((bounds (seq-filter
+                               (lambda (p) (if backward (< p pos) (> p pos)))
+                               (flatten-list
+                                (mapcar (lambda (rr)
+                                          (mapcar (if backward #'car #'cdr) rr))
+                                        ranges))))
+                      (closest (when bounds (if backward (seq-max bounds) (seq-min bounds)))))
+            (setq found (treesit-navigate-thing
+                         closest (if backward -1 1) 'beg
+                         (if treesit-aggregated-outline-predicate
+                             (alist-get (treesit-language-at closest)
+                                        treesit-aggregated-outline-predicate)
+                           treesit-outline-predicate))))))
+
       (if found
           (if (or (not bound) (if backward (>= found bound) (<= found bound)))
               (progn
@@ -3667,10 +3706,25 @@ treesit-outline-search
 (defun treesit-outline-level ()
   "Return the depth of the current outline heading."
   (let* ((node (treesit-outline--at-point))
-         (level 1))
-    (while (setq node (treesit-parent-until node treesit-outline-predicate))
+         (level 1)
+         (parser (when treesit-aggregated-outline-predicate
+                   (treesit-node-parser node)))
+         (pred (if treesit-aggregated-outline-predicate
+                   (alist-get (treesit-language-at (point))
+                              treesit-aggregated-outline-predicate)
+                 treesit-outline-predicate)))
+    (while (setq node (treesit-parent-until node pred))
       (setq level (1+ level)))
-    (if (zerop level) 1 level)))
+    (when-let* ((_ parser)
+                (host-lang (treesit-parser-language treesit-primary-parser))
+                (_ (not (eq (treesit-language-at (point)) host-lang)))
+                (host-pred (alist-get host-lang treesit-aggregated-outline-predicate)))
+      ;; Now need to break out of embedded confinement
+      ;; and get the host node that contains the guest ranges
+      (setq node (treesit-parser-root-node parser))
+      (while (setq node (treesit-parent-until node host-pred))
+        (setq level (1+ level))))
+    level))
 
 ;;; Hideshow mode
 
@@ -3955,11 +4009,14 @@ treesit-major-mode-setup
                 #'treesit-simple-imenu))
 
   ;; Outline minor mode.
-  (when (and (or treesit-outline-predicate treesit-simple-imenu-settings)
+  (when (and (or treesit-outline-predicate
+                 treesit-aggregated-outline-predicate
+                 treesit-simple-imenu-settings)
              (not (seq-some #'local-variable-p
                             '(outline-search-function
                               outline-regexp outline-level))))
-    (unless treesit-outline-predicate
+    (unless (or treesit-outline-predicate
+                treesit-aggregated-outline-predicate)
       (setq treesit-outline-predicate
             #'treesit-outline-predicate--from-imenu))
     (setq-local outline-search-function #'treesit-outline-search

Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#76398; Package emacs. (Wed, 19 Feb 2025 07:48:02 GMT)

Message #8 received at 76398 <at> debbugs.gnu.org:

From: Juri Linkov <juri <at> linkov.net>
To: 76398 <at> debbugs.gnu.org
Cc: casouri <at> gmail.com, v.pupillo <at> gmail.com
Subject: Re: bug#76398: treesit-aggregated-outline-predicate
Date: Wed, 19 Feb 2025 09:46:23 +0200
> So this patch helps 'treesit-outline-search' to get out of the local parser
> to the primary parser to continue search for the next outline predicate.
>
> 'treesit-outline-level' should do the same, but currently I can't find
> a suitable function to break out of embedded confinement
> and get the host node that contains the guest ranges.
> I mean that e.g. (treesit-parser-root-node (treesit-node-parser node))
> can get the root node of the local parser, but how to get its parent node
> in the primary parser?  It's understandable that treesit-node-parent
> doesn't go out of its parser.  But maybe there is another function?
> If such function doesn't exist, this is fine, then could find that
> node manually by calculating from treesit-parser-included-ranges.

Maybe we need two new primitives:

  (treesit-next-parser-boundary POS)
  (treesit-prev-parser-boundary POS)

and

  (treesit-upper-parser-node POS)




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#76398; Package emacs. (Fri, 21 Feb 2025 07:58:02 GMT)

Message #11 received at 76398 <at> debbugs.gnu.org:

From: Juri Linkov <juri <at> linkov.net>
To: 76398 <at> debbugs.gnu.org
Cc: casouri <at> gmail.com, v.pupillo <at> gmail.com
Subject: Re: bug#76398: treesit-aggregated-outline-predicate
Date: Fri, 21 Feb 2025 09:56:10 +0200
>> So this patch helps 'treesit-outline-search' to get out of the local parser
>> to the primary parser to continue search for the next outline predicate.
>>
>> 'treesit-outline-level' should do the same, but currently I can't find
>> a suitable function to break out of embedded confinement
>> and get the host node that contains the guest ranges.
>> I mean that e.g. (treesit-parser-root-node (treesit-node-parser node))
>> can get the root node of the local parser, but how to get its parent node
>> in the primary parser?  It's understandable that treesit-node-parent
>> doesn't go out of its parser.  But maybe there is another function?
>> If such function doesn't exist, this is fine, then could find that
>> node manually by calculating from treesit-parser-included-ranges.
>
> Maybe we need two new primitives:
>
>   (treesit-next-parser-boundary POS)
>   (treesit-prev-parser-boundary POS)

Now pushed as 'treesit-closest-parser-boundary'.

>   (treesit-upper-parser-node POS)

The addition of 'treesit-upper-parser-node' is underway; it should
be used in 'treesit-up-list' as well.




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#76398; Package emacs. (Fri, 21 Feb 2025 08:09:02 GMT)

Message #14 received at 76398 <at> debbugs.gnu.org:

From: Eli Zaretskii <eliz <at> gnu.org>
To: Juri Linkov <juri <at> linkov.net>
Cc: 76398 <at> debbugs.gnu.org, v.pupillo <at> gmail.com, casouri <at> gmail.com
Subject: Re: bug#76398: treesit-aggregated-outline-predicate
Date: Fri, 21 Feb 2025 10:07:54 +0200
> Cc: casouri <at> gmail.com, v.pupillo <at> gmail.com
> From: Juri Linkov <juri <at> linkov.net>
> Date: Fri, 21 Feb 2025 09:56:10 +0200
> 
> > Maybe we need two new primitives:
> >
> >   (treesit-next-parser-boundary POS)
> >   (treesit-prev-parser-boundary POS)
> 
> Now pushed as 'treesit-closest-parser-boundary'.

Should it be documented in parsing.texi, together with other treesit
functions?

Thanks.




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#76398; Package emacs. (Sat, 22 Feb 2025 06:14:02 GMT)

Message #17 received at 76398 <at> debbugs.gnu.org:

From: Yuan Fu <casouri <at> gmail.com>
To: Juri Linkov <juri <at> linkov.net>
Cc: 76398 <at> debbugs.gnu.org, v.pupillo <at> gmail.com
Subject: Re: bug#76398: treesit-aggregated-outline-predicate
Date: Fri, 21 Feb 2025 22:12:59 -0800

> On Feb 20, 2025, at 11:56 PM, Juri Linkov <juri <at> linkov.net> wrote:
> 
>>> So this patch helps 'treesit-outline-search' to get out of the local parser
>>> to the primary parser to continue search for the next outline predicate.
>>> 
>>> 'treesit-outline-level' should do the same, but currently I can't find
>>> a suitable function to break out of embedded confinement
>>> and get the host node that contains the guest ranges.
>>> I mean that e.g. (treesit-parser-root-node (treesit-node-parser node))
>>> can get the root node of the local parser, but how to get its parent node
>>> in the primary parser?  It's understandable that treesit-node-parent
>>> doesn't go out of its parser.  But maybe there is another function?
>>> If such function doesn't exist, this is fine, then could find that
>>> node manually by calculating from treesit-parser-included-ranges.
>> 
>> Maybe we need two new primitives:
>> 
>>  (treesit-next-parser-boundary POS)
>>  (treesit-prev-parser-boundary POS)
> 
> Now pushed as 'treesit-closest-parser-boundary'.

Hold on, let’s not get ahead of ourselves. First of all, the name is not very descriptive IMO: it actually finds a range boundary, not a parser boundary; and the docstring mentions local parsers while the function itself doesn’t really involve local parsers: it just checks parser ranges. It can be used for getting out of local parsers, yes, but that’s a use-case, not what it does. So if we want to add this function to the public API set for tree-sitter, it needs a better docstring. (And at the moment I have doubts about its general usefulness.)

Moreover, is this even necessary? Why do we need to go over all the ranges for all the parsers to get out of a local parser? I thought we could just get the local parser and get its range?

>>  (treesit-upper-parser-node POS)
> 
> Addition of 'treesit-upper-parser-node' is underway that should
> be used in 'treesit-up-list' as well.

And if we need to get the “parent node” of a local parser, we can do it in much nicer ways. We can record the parent node when creating the local parser, by either adding a field to the parser object, or recording it in a local database, or even just saving it in the text property alongside the local parser. Let’s take some time and think of the best way to solve this. Whatever you have in mind, I suspect that it wouldn’t work if there is more than one layer of nesting of parsers, i.e., what if you want to get out of an embedded (local) parser’s embedded parser?

On the same note, we actually need some proper tree structure for the primary parser - local parser relationship, because there can be more than one layer. What we currently have doesn’t handle this well (font-lock and indentation). It’s a real use case: someone requested this for the Perl (or Haskell?) mode, and imagine a Rust buffer that embeds a Markdown comment which embeds Rust code examples. I bring this up because this tree structure would’ve solved your problem here as well, once we have it.

Yuan



Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#76398; Package emacs. (Sat, 22 Feb 2025 19:56:01 GMT)

Message #20 received at 76398 <at> debbugs.gnu.org:

From: Juri Linkov <juri <at> linkov.net>
To: Yuan Fu <casouri <at> gmail.com>
Cc: 76398 <at> debbugs.gnu.org, v.pupillo <at> gmail.com
Subject: Re: bug#76398: treesit-aggregated-outline-predicate
Date: Sat, 22 Feb 2025 21:53:02 +0200
>>>> So this patch helps 'treesit-outline-search' to get out of the local parser
>>>> to the primary parser to continue search for the next outline predicate.
>>>> 
>>>> 'treesit-outline-level' should do the same, but currently I can't find
>>>> a suitable function to break out of embedded confinement
>>>> and get the host node that contains the guest ranges.
>>>> I mean that e.g. (treesit-parser-root-node (treesit-node-parser node))
>>>> can get the root node of the local parser, but how to get its parent node
>>>> in the primary parser?  It's understandable that treesit-node-parent
>>>> doesn't go out of its parser.  But maybe there is another function?
>>>> If such function doesn't exist, this is fine, then could find that
>>>> node manually by calculating from treesit-parser-included-ranges.
>>> 
>>> Maybe we need two new primitives:
>>> 
>>>  (treesit-next-parser-boundary POS)
>>>  (treesit-prev-parser-boundary POS)
>> 
>> Now pushed as 'treesit-closest-parser-boundary'.
>
> Hold on, let’s not get ahead of ourselves. First of all, the name is not
> very descriptive IMO: it actually finds a range boundary, not a parser boundary;
> and the docstring mentions local parser while the function itself doesn’t
> really involve local parsers—it just checks parser ranges. It can be used
> for getting out of local parsers, yes, but that’s a use-case, not what it
> does. So if we want to add this function to the public API set for
> tree-sitter, it needs a better docstring. (And at the moment I have doubt
> on its general usefulness.)

I assumed that this function will not remain in the final
implementation of this feature.  I added it temporarily
to get the embedded outlines into a working state.

> Moreover, is this even necessary?  Why do we need to go over all the
> ranges for all the parsers to get out of a local parser?  I thought we
> could just get the local parser and get its range?

I tried to use treesit-local-parsers-at, but it always returns nil.

>>>  (treesit-upper-parser-node POS)
>> 
>> Addition of 'treesit-upper-parser-node' is underway that should
>> be used in 'treesit-up-list' as well.
>
> And if we need to get the “parent node” of a local parser, we can do it in
> much nicer ways. We can record the parent node when creating the local
> parser, by either adding a field to the parser object, or record it in
> a local database, or even just save it in the text property alongside the
> local parser. Let’s take some time and think of the best way to solve
> this.

Now I improved treesit-outline-level as well without adding
new functions.  Everything works now, so we can do more
refactoring without introducing regressions.

> Whatever you have in mind, I suspect that it wouldn’t work if there
> is more than one layer of nesting of parsers, i.e., what if you want to get
> out of an embedded (local) parser’s embedded parser?

The current implementation in treesit-outline-level works with any depth
of nested parsers.  But the current solution is quite fragile.  So we need
to find a better way such as recording the parent parser node somewhere.

> On the same note, we actually need some proper tree structure for the
> primary parser - local parser relationship, because there can be more than
> one layer. What we currently have doesn’t handle this well (font-lock and
> indentation). It’s a real use case, someone requested for this for the Perl
> (or Haskell?) mode, and imagine a rust buffer embeds a markdown comment
> which embeds rust code examples. I bring this up because this tree
> structure would’ve solved your problem here as well, once we have it.

Agreed, this needs a better design.




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#76398; Package emacs. (Mon, 24 Feb 2025 06:49:02 GMT)

Message #23 received at 76398 <at> debbugs.gnu.org:

From: Yuan Fu <casouri <at> gmail.com>
To: Juri Linkov <juri <at> linkov.net>
Cc: 76398 <at> debbugs.gnu.org, v.pupillo <at> gmail.com
Subject: Re: bug#76398: treesit-aggregated-outline-predicate
Date: Sun, 23 Feb 2025 22:47:47 -0800

> On Feb 22, 2025, at 11:53 AM, Juri Linkov <juri <at> linkov.net> wrote:
> 
>>>>> So this patch helps 'treesit-outline-search' to get out of the local parser
>>>>> to the primary parser to continue search for the next outline predicate.
>>>>> 
>>>>> 'treesit-outline-level' should do the same, but currently I can't find
>>>>> a suitable function to break out of embedded confinement
>>>>> and get the host node that contains the guest ranges.
>>>>> I mean that e.g. (treesit-parser-root-node (treesit-node-parser node))
>>>>> can get the root node of the local parser, but how to get its parent node
>>>>> in the primary parser?  It's understandable that treesit-node-parent
>>>>> doesn't go out of its parser.  But maybe there is another function?
>>>>> If such function doesn't exist, this is fine, then could find that
>>>>> node manually by calculating from treesit-parser-included-ranges.
>>>> 
>>>> Maybe we need two new primitives:
>>>> 
>>>> (treesit-next-parser-boundary POS)
>>>> (treesit-prev-parser-boundary POS)
>>> 
>>> Now pushed as 'treesit-closest-parser-boundary'.
>> 
>> Hold on, let’s not get ahead of ourselves. First of all, the name is not
>> very descriptive IMO: it actually finds a range boundary, not a parser boundary;
>> and the docstring mentions local parser while the function itself doesn’t
>> really involve local parsers—it just checks parser ranges. It can be used
>> for getting out of local parsers, yes, but that’s a use-case, not what it
>> does. So if we want to add this function to the public API set for
>> tree-sitter, it needs a better docstring. (And at the moment I have doubt
>> on its general usefulness.)
> 
> I assumed that this function will not remain in the final
> implementation of this feature.  I added it temporarily
> to get the embedded outlines into a working state.

Great. Then let’s use a double dash in the name and maybe even add some comments to explain that, WDYT?

> 
>> Moreover, is this even necessary?  Why do we need to go over all the
>> ranges for all the parsers to get out of a local parser?  I thought we
>> could just get the local parser and get its range?
> 
> I tried to use treesit-local-parsers-at, but it always returns nil.

That’s probably because the major mode you’re testing with doesn’t use a local parser for the embedded language. I’ll add a function to get the “parent node”; that should solve your problem here.

> 
>>>> (treesit-upper-parser-node POS)
>>> 
>>> Addition of 'treesit-upper-parser-node' is underway that should
>>> be used in 'treesit-up-list' as well.
>> 
>> And if we need to get the “parent node” of a local parser, we can do it in
>> much nicer ways. We can record the parent node when creating the local
>> parser, by either adding a field to the parser object, or record it in
>> a local database, or even just save it in the text property alongside the
>> local parser. Let’s take some time and think of the best way to solve
>> this.
> 
> Now I improved treesit-outline-level as well without adding
> new functions.  Everything works now, so we can do more
> refactoring without introducing regressions.
> 
>> Whatever you have in mind, I suspect that it wouldn’t work if there
>> is more than one layer of nesting of parsers, i.e., what if you want to get
>> out of an embedded (local) parser’s embedded parser?
> 
> The current implementation in treesit-outline-level works with any depth
> of nested parsers.  But the current solution is quite fragile.  So we need
> to find a better way such as recording the parent parser node somewhere.
> 
>> On the same note, we actually need some proper tree structure for the
>> primary parser - local parser relationship, because there can be more than
>> one layer. What we currently have doesn’t handle this well (font-lock and
>> indentation). It’s a real use case, someone requested for this for the Perl
>> (or Haskell?) mode, and imagine a rust buffer embeds a markdown comment
>> which embeds rust code examples. I bring this up because this tree
>> structure would’ve solved your problem here as well, once we have it.
> 
> Agreed, this needs a better design.

Cool, working on it.

Yuan



Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#76398; Package emacs. (Mon, 24 Feb 2025 19:41:02 GMT)

Message #26 received at 76398 <at> debbugs.gnu.org:

From: Juri Linkov <juri <at> linkov.net>
To: Yuan Fu <casouri <at> gmail.com>
Cc: 76398 <at> debbugs.gnu.org, v.pupillo <at> gmail.com
Subject: Re: bug#76398: treesit-aggregated-outline-predicate
Date: Mon, 24 Feb 2025 21:37:16 +0200
>>>>>> So this patch helps 'treesit-outline-search' to get out of the local parser
>>>>>> to the primary parser to continue search for the next outline predicate.
>>>>>> 
>>>>>> 'treesit-outline-level' should do the same, but currently I can't find
>>>>>> a suitable function to break out of embedded confinement
>>>>>> and get the host node that contains the guest ranges.
>>>>>> I mean that e.g. (treesit-parser-root-node (treesit-node-parser node))
>>>>>> can get the root node of the local parser, but how to get its parent node
>>>>>> in the primary parser?  It's understandable that treesit-node-parent
>>>>>> doesn't go out of its parser.  But maybe there is another function?
>>>>>> If such function doesn't exist, this is fine, then could find that
>>>>>> node manually by calculating from treesit-parser-included-ranges.
>>>>> 
>>>>> Maybe we need two new primitives:
>>>>> 
>>>>> (treesit-next-parser-boundary POS)
>>>>> (treesit-prev-parser-boundary POS)
>>>> 
>>>> Now pushed as 'treesit-closest-parser-boundary'.
>>> 
>>> Hold on, let’s not get ahead of ourselves. First of all, the name is not
>>> very descriptive IMO: it actually finds a range boundary, not a parser boundary;
>>> and the docstring mentions local parser while the function itself doesn’t
>>> really involve local parsers—it just checks parser ranges. It can be used
>>> for getting out of local parsers, yes, but that’s a use-case, not what it
>>> does. So if we want to add this function to the public API set for
>>> tree-sitter, it needs a better docstring. (And at the moment I have doubt
>>> on its general usefulness.)
>> 
>> I assumed that this function will not remain in the final
>> implementation of this feature.  I added it temporarily
>> to get the embedded outlines into a working state.
>
> Great. Then let’s use double dash and maybe even add some comments to explain that, WDYT?

Maybe a better name would be 'treesit-outline--closest-range-boundary'?

It's needed to prevent skipping the range boundaries that
treesit-navigate-thing does by default, e.g. instead of

from (the starting point of the navigation)
range_1_beg
range_1_end
to (the ending point of the navigation)

we need to stop inside the next range and search inside from its beginning:

from
range_1_beg
to
range_1_end

Another case is that when inside a local range, treesit-navigate-thing
returns nil, but we need to go outside the local range and continue the search:

range_1_beg
from
range_1_end
to
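
For illustration, the last case above (leaving a local range once nothing
more is found inside it) can be sketched roughly as follows.  This mirrors
the boundary search in the patch from the first message, but it is only a
minimal sketch: the name `my-closest-range-boundary' is made up here, and it
takes a plain list of (BEG . END) ranges, in the format returned by
treesit-parser-included-ranges, instead of consulting parsers.

```elisp
(require 'seq)

;; Hypothetical helper, not part of treesit.el.
(defun my-closest-range-boundary (ranges pos &optional backward)
  "Return the boundary of RANGES closest to POS in the search direction.
RANGES is a list of (BEG . END) pairs.  Searching forward, consider
range ends after POS; with BACKWARD non-nil, consider range
beginnings before POS.  Return nil if there is no such boundary."
  (let ((bounds (seq-filter
                 ;; Keep only boundaries that lie in the search direction.
                 (lambda (p) (if backward (< p pos) (> p pos)))
                 (mapcar (if backward #'car #'cdr) ranges))))
    (when bounds
      (if backward (seq-max bounds) (seq-min bounds)))))
```

With ranges ((10 . 20) (30 . 40)), a forward search from position 15 yields
20, and a backward search from position 35 yields 30; the search for the next
outline heading would then be restarted from that position.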

>>> Moreover, is this even necessary?  Why do we need to go over all the
>>> ranges for all the parsers to get out of a local parser?  I thought we
>>> could just get the local parser and get its range?
>> 
>> I tried to use treesit-local-parsers-at, but it always returns nil.
>
> That’s probably because the major mode you’re testing with doesn’t use
> local parser for the embedded language. I’ll add a function to get the
> “parent node”, that should solve your problem here.

Like you noticed, we need to check ranges, not parsers anyway,
so treesit-local-parsers-at can't help here.

>>>>> (treesit-upper-parser-node POS)
>>>> 
>>>> Addition of 'treesit-upper-parser-node' is underway that should
>>>> be used in 'treesit-up-list' as well.
>>> 
>>> And if we need to get the “parent node” of a local parser, we can do it in
>>> much nicer ways. We can record the parent node when creating the local
>>> parser, by either adding a field to the parser object, or record it in
>>> a local database, or even just save it in the text property alongside the
>>> local parser. Let’s take some time and think of the best way to solve
>>> this.
>> 
>> Now I improved treesit-outline-level as well without adding
>> new functions.  Everything works now, so we can do more
>> refactoring without introducing regressions.
>> 
>>> Whatever you have in mind, I suspect that it wouldn’t work if there
>>> is more than one layer of nesting of parsers, i.e., what if you want to get
>>> out of an embedded (local) parser’s embedded parser?
>> 
>> The current implementation in treesit-outline-level works with any depth
>> of nested parsers.  But the current solution is quite fragile.  So we need
>> to find a better way such as recording the parent parser node somewhere.

Actually, here as well we need a parent node for the range, not the parser.
Maybe the range should be a special object like the parser is?

>>> On the same note, we actually need some proper tree structure for the
>>> primary parser - local parser relationship, because there can be more than
>>> one layer. What we currently have doesn’t handle this well (font-lock and
>>> indentation). It’s a real use case, someone requested for this for the Perl
>>> (or Haskell?) mode, and imagine a rust buffer embeds a markdown comment
>>> which embeds rust code examples. I bring this up because this tree
>>> structure would’ve solved your problem here as well, once we have it.
>> 
>> Agreed, this needs a better design.
>
> Cool, working on it.

Thanks.




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#76398; Package emacs. (Fri, 28 Feb 2025 03:11:02 GMT)

Message #29 received at 76398 <at> debbugs.gnu.org:

From: Yuan Fu <casouri <at> gmail.com>
To: Juri Linkov <juri <at> linkov.net>
Cc: 76398 <at> debbugs.gnu.org, Vincenzo Pupillo <v.pupillo <at> gmail.com>
Subject: Re: bug#76398: treesit-aggregated-outline-predicate
Date: Thu, 27 Feb 2025 19:10:00 -0800

> On Feb 24, 2025, at 11:37 AM, Juri Linkov <juri <at> linkov.net> wrote:
> 
>>>>>>> So this patch helps 'treesit-outline-search' to get out of the local parser
>>>>>>> to the primary parser to continue search for the next outline predicate.
>>>>>>> 
>>>>>>> 'treesit-outline-level' should do the same, but currently I can't find
>>>>>>> a suitable function to break out of embedded confinement
>>>>>>> and get the host node that contains the guest ranges.
>>>>>>> I mean that e.g. (treesit-parser-root-node (treesit-node-parser node))
>>>>>>> can get the root node of the local parser, but how to get its parent node
>>>>>>> in the primary parser?  It's understandable that treesit-node-parent
>>>>>>> doesn't go out of its parser.  But maybe there is another function?
>>>>>>> If such function doesn't exist, this is fine, then could find that
>>>>>>> node manually by calculating from treesit-parser-included-ranges.
>>>>>> 
>>>>>> Maybe we need two new primitives:
>>>>>> 
>>>>>> (treesit-next-parser-boundary POS)
>>>>>> (treesit-prev-parser-boundary POS)
>>>>> 
>>>>> Now pushed as 'treesit-closest-parser-boundary'.
>>>> 
>>>> Hold on, let’s not get ahead of ourselves. First of all, the name is not
>>>> very descriptive IMO: it actually finds a range boundary, not a parser boundary;
>>>> and the docstring mentions local parser while the function itself doesn’t
>>>> really involve local parsers—it just checks parser ranges. It can be used
>>>> for getting out of local parsers, yes, but that’s a use-case, not what it
>>>> does. So if we want to add this function to the public API set for
>>>> tree-sitter, it needs a better docstring. (And at the moment I have doubt
>>>> on its general usefulness.)
>>> 
>>> I assumed that this function will not remain in the final
>>> implementation of this feature.  I added it temporarily
>>> to get the embedded outlines into a working state.
>> 
>> Great. Then let’s use double dash and maybe even add some comments to explain that, WDYT?
> 
> Maybe a better name would be 'treesit-outline--closest-range-boundary'?
> 
> It's needed to prevent skipping the range boundaries that
> treesit-navigate-thing does by default, e.g. instead of
> 
> from (the starting point of the navigation)
> range_1_beg
> range_1_end
> to (the ending point of the navigation)
> 
> we need to stop inside the next range and search inside from its beginning:
> 
> from
> range_1_beg
> to
> range_1_end
> 
> Another case is that when inside a local range, treesit-navigate-thing
> returns nil, but we need to go outside the local range and continue the search:
> 
> range_1_beg
> from
> range_1_end
> to
> 
>>>> Moreover, is this even necessary?  Why do we need to go over all the
>>>> ranges for all the parsers to get out of a local parser?  I thought we
>>>> can just get the local parser and get its range?
>>> 
>>> I tried to use treesit-local-parsers-at, but it always returns nil.
>> 
>> That’s probably because the major mode you’re testing with doesn’t use
>> local parser for the embedded language. I’ll add a function to get the
>> “parent node”, that should solve your problem here.
> 
> Like you noticed, we need to check ranges, not parsers anyway,
> so treesit-local-parsers-at can't help here.

Combined with your reply above, it seems we need some information attached to the ranges (with either an overlay or a text property), rather than to the parser. Is that right? When updating ranges, we can mark the ranges covered by each embedded parser, and perhaps link the text property or overlay to the “parent node”.


>>>>>> (treesit-upper-parser-node POS)
>>>>> 
>>>>> Addition of 'treesit-upper-parser-node' is underway that should
>>>>> be used in 'treesit-up-list' as well.
>>>> 
>>>> And if we need to get the “parent node” of a local parser, we can do it in
>>>> much nicer ways. We can record the parent node when creating the local
>>>> parser, by either adding a field to the parser object, or record it in
>>>> a local database, or even just save it in the text property alongside the
>>>> local parser. Let’s take some time and think of the best way to solve
>>>> this.
>>> 
>>> Now I improved treesit-outline-level as well without adding
>>> new functions.  Everything works now, so we can do more
>>> refactoring without introducing regressions.
>>> 
>>>> Whatever you have in mind, I suspect that it wouldn’t work if there
>>>> are more than one layer of nesting of parsers, ie, what if you want to get
>>>> out of an embedded (local) parser’s embedded parser?
>>> 
>>> The current implementation in treesit-outline-level works with any depth
>>> of nested parsers.  But the current solution is quite fragile.  So we need
>>> to find a better way such as recording the parent parser node somewhere.
> 
> Actually, here as well we need a parent node for the range, not the parser.
> Maybe the range should be a special object like the parser is?
> 
>>>> On the same note, we actually need some proper tree structure for the
>>>> primary parser - local parser relationship, because there can be more than
>>>> one layer. What we currently have doesn’t handle this well (font-lock and
>>>> indentation). It’s a real use case, someone requested for this for the Perl
>>>> (or Haskell?) mode, and imagine a rust buffer embeds a markdown comment
>>>> which embeds rust code examples. I bring this up because this tree
>>>> structure would’ve solved your problem here as well, once we have it.
>>> 
>>> Agreed, this needs a better design.
>> 
>> Cool, working on it.
> 
> Thanks.

I’ve mostly worked out arbitrary nesting of embedded parsers. I tested with markdown -> javascript -> jsdoc, and it seems to work fine. What I’m not sure about right now is the navigation part.

Yuan



Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#76398; Package emacs. (Sun, 02 Mar 2025 17:53:02 GMT)

Message #32 received at 76398 <at> debbugs.gnu.org:

From: Juri Linkov <juri <at> linkov.net>
To: Yuan Fu <casouri <at> gmail.com>
Cc: 76398 <at> debbugs.gnu.org, Vincenzo Pupillo <v.pupillo <at> gmail.com>
Subject: Re: bug#76398: treesit-aggregated-outline-predicate
Date: Sun, 02 Mar 2025 19:27:42 +0200

>> Maybe a better name would be 'treesit-outline--closest-range-boundary'?
>> 
>> It's needed to prevent skipping the range boundaries that
>> treesit-navigate-thing does by default, e.g. instead of
>> 
>> from (the starting point of the navigation)
>> range_1_beg
>> range_1_end
>> to (the ending point of the navigation)

But currently handling a local parser is not possible in the above function.
For example, I tried:

  (mapcar #'treesit-parser-included-ranges
          (append (treesit-parser-list)
                  (treesit-local-parsers-at (point))))

But it's position-dependent and can't find the next local parser
when point is before the start of that local parser.

>> Like you noticed, we need to check ranges, not parsers anyway,
>> so treesit-local-parsers-at can't help here.
>
> Combine with your reply above, it seems we need some information attached
> to the ranges (with either overlay or text property), rather than to the
> parser. Is that right?

Maybe.  One of the two needs I have discovered so far is the ability
to find the next/previous range boundary.

> When updating ranges, we can mark the ranges covered
> by each embed parser, and perhaps link the text prop or overlay to the
> “parent node”.

The ability to find the parent node of the range is another need.

>>>>> I bring this up because this tree structure would’ve
>>>>> solved your problem here as well, once we have it.

A tree of ranges (including ranges of local parsers)
would solve the problem indeed.

> I’ve mostly worked out arbitrary nesting of embedded parsers.
> I tested with markdown -> javascript -> jsdoc, and it seems to work
> fine.  What I’m not sure about right now is the navigation part.

I tested on mhtml-ts-mode -> javascript -> jsdoc and
(mapcar 'treesit-parser-embed-level (treesit-local-parsers-at (point)))
correctly returns (2).  Whereas
(mapcar 'treesit-parser-parent-node (treesit-local-parsers-at (point)))
returns (nil).

BTW, while deleting a jsdoc block I get this backtrace:

Debugger entered--Lisp error: (treesit-parser-deleted #<treesit-parser for jsdoc>)
  treesit-node-on(1 202 #<treesit-parser for jsdoc>)
  treesit--explorer-refresh()
  apply(treesit--explorer-refresh nil)
  timer-event-handler([t ...

and while editing a jsdoc block (inserting a newline before it):

Debugger entered--Lisp error: (treesit-no-parser jsdoc)
  treesit-buffer-root-node(jsdoc)
  treesit-node-on(60 60)
  treesit--indent-1()
  treesit-indent()
  indent-according-to-mode()
  electric-indent-post-self-insert-function()
  newline(nil 1)
  funcall-interactively(newline nil 1)
  command-execute(newline)




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#76398; Package emacs. (Tue, 11 Mar 2025 08:48:02 GMT)

Message #35 received at 76398 <at> debbugs.gnu.org:

From: Yuan Fu <casouri <at> gmail.com>
To: Juri Linkov <juri <at> linkov.net>
Cc: 76398 <at> debbugs.gnu.org, Vincenzo Pupillo <v.pupillo <at> gmail.com>
Subject: Re: bug#76398: treesit-aggregated-outline-predicate
Date: Tue, 11 Mar 2025 01:46:47 -0700

> On Mar 2, 2025, at 9:27 AM, Juri Linkov <juri <at> linkov.net> wrote:
> 
>>> Maybe a better name would be 'treesit-outline--closest-range-boundary'?
>>> 
>>> It's needed to prevent skipping the range boundaries that
>>> treesit-navigate-thing does by default, e.g. instead of
>>> 
>>> from (the starting point of the navigation)
>>> range_1_beg
>>> range_1_end
>>> to (the ending point of the navigation)
> 
> But currently handling a local parser is not possible in the above function.
> For example, I tried:
> 
>  (mapcar #'treesit-parser-included-ranges
>          (append (treesit-parser-list)
>                  (treesit-local-parsers-at (point))))
> 
> But it's position-dependent and can't find the next local-parser
> when the position of point is before the start of the local-parser.
> 
>>> Like you noticed, we need to check ranges, not parsers anyway,
>>> so treesit-local-parsers-at can't help here.
>> 
>> Combine with your reply above, it seems we need some information attached
>> to the ranges (with either overlay or text property), rather than to the
>> parser. Is that right?
> 
> Maybe.  One of two needs I have discovered so far is the ability
> to find the next/previous range boundary.

I pushed some changes. Now this should be doable by searching for overlay boundaries. Both local and non-local parsers now have an overlay that spans the range they’re in. In the case of non-local parsers, each range the parser covers gets its own overlay. You can search for the comment that’s marked with (ref:local-parser-overlay) in treesit.el to see the properties I put on the overlays.

>> When updating ranges, we can mark the ranges covered
>> by each embed parser, and perhaps link the text prop or overlay to the
>> “parent node”.
> 
> The ability to find the parent node of the range is another need.

I think your use-case is to continue searching upwards across the parser boundary, right? I wonder if having access to the parent parser is enough? Because once you know the parser, you can just get the node at point.

>>>>>> I bring this up because this tree structure would’ve
>>>>>> solved your problem here as well, once we have it.
> 
> A tree of ranges (including ranges of local parsers)
> would solve the problem indeed.

This should be possible now by going over the overlays at point and finding the parser that has a -1 embed level. There should be only one parser with a -1 embed level.

>> I’ve mostly worked out arbitrary nesting of embedded parsers.
>> I tested with markdown -> javascript -> jsdoc, and it seems to work
>> fine.  What I’m not sure about right now is the navigation part.
> 
> I tested on mhtml-ts-mode -> javascript -> jsdoc and
> (mapcar 'treesit-parser-embed-level (treesit-local-parsers-at (point)))
> correctly returns (2).  Whereas
> (mapcar 'treesit-parser-parent-node (treesit-local-parsers-at (point)))
> returns (nil).
> 
> BTW, while deleting a jsdoc block I get such backtrace:
> 
> Debugger entered--Lisp error: (treesit-parser-deleted #<treesit-parser for jsdoc>)
>  treesit-node-on(1 202 #<treesit-parser for jsdoc>)
>  treesit--explorer-refresh()
>  apply(treesit--explorer-refresh nil)
>  timer-event-handler([t ...
> 
> and while editing a jsdoc block (inserting a newline before it):
> 
> Debugger entered--Lisp error: (treesit-no-parser jsdoc)
>  treesit-buffer-root-node(jsdoc)
>  treesit-node-on(60 60)
>  treesit--indent-1()
>  treesit-indent()
>  indent-according-to-mode()
>  electric-indent-post-self-insert-function()
>  newline(nil 1)
>  funcall-interactively(newline nil 1)
>  command-execute(newline)

Thanks, these should be fixed now.

Yuan



Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#76398; Package emacs. (Tue, 11 Mar 2025 18:23:02 GMT)

Message #38 received at 76398 <at> debbugs.gnu.org:

From: Juri Linkov <juri <at> linkov.net>
To: Yuan Fu <casouri <at> gmail.com>
Cc: 76398 <at> debbugs.gnu.org, Vincenzo Pupillo <v.pupillo <at> gmail.com>
Subject: Re: bug#76398: treesit-aggregated-outline-predicate
Date: Tue, 11 Mar 2025 20:18:32 +0200

close 76398 31.0.50
thanks

>>> Combine with your reply above, it seems we need some information attached
>>> to the ranges (with either overlay or text property), rather than to the
>>> parser. Is that right?
>> 
>> Maybe.  One of two needs I have discovered so far is the ability
>> to find the next/previous range boundary.
>
> I pushed some changes. Now this should be doable by searching for overlays
> boundaries. Both local and non-local parsers now have an overlay that spans
> the range they’re in. In the case of non-local parsers, each range it
> parses gets an overlay. You can search for the comment that’s marked with
> (ref:local-parser-overlay) in treesit.el to see the properties I put on the
> overlays.

Thanks!  Searching for the next overlay now works nicely,
so pushed the changes to 'treesit-outline-search'.
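
For illustration, a minimal sketch of such an overlay-boundary search.
This is not the code pushed to 'treesit-outline-search'.  The helper name
my/treesit-next-range-boundary is hypothetical, and the sketch assumes
that range overlays can be recognized by a non-nil 'treesit-host-parser
property; the real set of overlay properties is described at the
(ref:local-parser-overlay) comment in treesit.el.

```elisp
(require 'seq)

(defun my/treesit-next-range-boundary (pos &optional prop)
  "Return the next position after POS where a range overlay starts or ends.
A range overlay is one with a non-nil PROP property.  PROP defaults to
`treesit-host-parser', which is an assumption of this sketch.
Return nil if there is no such boundary after POS."
  (let ((prop (or prop 'treesit-host-parser))
        (next pos)
        (found nil))
    (while (and (not found) (< next (point-max)))
      ;; Jump to the next position where any overlay starts or ends.
      (setq next (next-overlay-change next))
      ;; Accept it only if a range overlay has an edge exactly there.
      (when (seq-some (lambda (ov)
                        (and (overlay-get ov prop)
                             (or (= (overlay-start ov) next)
                                 (= (overlay-end ov) next))))
                      (overlays-in (max (point-min) (1- next))
                                   (min (point-max) (1+ next))))
        (setq found next)))
    found))
```

Calling (my/treesit-next-range-boundary (point)) repeatedly would visit
each range start and end in turn, so a search can stop at range_1_beg
instead of skipping over the whole range.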

>>> When updating ranges, we can mark the ranges covered
>>> by each embed parser, and perhaps link the text prop or overlay to the
>>> “parent node”.
>> 
>> The ability to find the parent node of the range is another need.
>
> I think your use-case is to continue searching upwards across parser
> boundary, right?  I wonder if having access to the parent parser is
> enough?  Because once you know the parser, you can just get the
> node-at-point.

Yes, having access to the parent parser is enough,
and everything works with the 'treesit-host-parser' overlay
by finding the parent parser node with 'treesit-node-at',
so pushed the changes to 'treesit-outline-level' as well.
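
Roughly, the lookup can be sketched as follows.  This is only an
illustration under stated assumptions, not the code pushed to
'treesit-outline-level': the helper name my/treesit-host-node-at is
hypothetical, and it assumes that the overlay's 'treesit-host-parser
property holds the parent parser object.

```elisp
(require 'seq)

(defun my/treesit-host-node-at (pos)
  "Return the host parser's node at POS, or nil if POS is not embedded.
Looks for an overlay at POS with a non-nil `treesit-host-parser'
property (assumed to hold the parent parser) and asks that parser
for the node at POS."
  (let* ((ov (seq-find (lambda (o) (overlay-get o 'treesit-host-parser))
                       (overlays-at pos)))
         (host (and ov (overlay-get ov 'treesit-host-parser))))
    (when host
      ;; `treesit-node-at' accepts a parser as its second argument.
      (treesit-node-at pos host))))
```

With several nesting levels there can be more than one such overlay at
POS, and this sketch simply takes the first one that `overlays-at'
returns, so climbing all the way to the primary parser would need
additional logic.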




bug marked as fixed in version 31.0.50, send any further explanations to 76398 <at> debbugs.gnu.org and Juri Linkov <juri <at> linkov.net> Request was from Juri Linkov <juri <at> linkov.net> to control <at> debbugs.gnu.org. (Tue, 11 Mar 2025 18:23:03 GMT)

bug archived. Request was from Debbugs Internal Request <help-debbugs <at> gnu.org> to internal_control <at> debbugs.gnu.org. (Wed, 09 Apr 2025 11:24:23 GMT)
