GNU bug report logs - #78703
beginning-of-defun and friends still wrong in typescript-ts-mode

Package: emacs

Reported by: Daniel Colascione <dancol <at> dancol.org>

Date: Thu, 5 Jun 2025 23:41:02 UTC

Severity: normal

To reply to this bug, email your comments to 78703 AT debbugs.gnu.org.

Report forwarded to bug-gnu-emacs <at> gnu.org:
bug#78703; Package emacs. (Thu, 05 Jun 2025 23:41:02 GMT)

Acknowledgement sent to Daniel Colascione <dancol <at> dancol.org>:
New bug report received and forwarded. Copy sent to bug-gnu-emacs <at> gnu.org. (Thu, 05 Jun 2025 23:41:02 GMT)

Message #5 received at submit <at> debbugs.gnu.org:

From: Daniel Colascione <dancol <at> dancol.org>
To: bug-gnu-emacs <at> gnu.org
Subject: beginning-of-defun and friends still wrong in typescript-ts-mode
Date: Thu, 05 Jun 2025 16:40:03 -0700

Right now, C-M-a runs treesit-beginning-of-defun which goes to the
start of the previous defun in buffer text, not the enclosing defun.

If we have this program:

    1 function foo() {
    2   function bar() {
    3     return 5
    4   }
    5   return 7
    6 }


and point is on line 5, then if we hit C-M-a, point goes to line 2, not
line 1.  In every single situation, when I use beginning-of-defun, I
intend to go to the start of my enclosing defun not the one that happens
to be previous in buffer linearization.

Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#78703; Package emacs. (Fri, 06 Jun 2025 07:02:02 GMT)

Message #8 received at 78703 <at> debbugs.gnu.org:

From: Eli Zaretskii <eliz <at> gnu.org>
To: Daniel Colascione <dancol <at> dancol.org>,
 Yuan Fu <casouri <at> gmail.com>
Cc: 78703 <at> debbugs.gnu.org
Subject: Re: bug#78703: beginning-of-defun and friends still wrong in
 typescript-ts-mode
Date: Fri, 06 Jun 2025 10:01:20 +0300

> From: Daniel Colascione <dancol <at> dancol.org>
> Date: Thu, 05 Jun 2025 16:40:03 -0700
> 
> Right now, C-M-a runs treesit-beginning-of-defun which goes to the
> start of the previous defun in buffer text, not the enclosing defun.
> 
> If we have this program:
> 
>     1 function foo() {
>     2   function bar() {
>     3     return 5
>     4   }
>     5   return 7
>     6 }
> 
> 
> and point is on line 5, then if we hit C-M-a, point goes to line 2, not
> line 1.  In every single situation, when I use beginning-of-defun, I
> intend to go to the start of my enclosing defun not the one that happens
> to be previous in buffer linearization.

I think you want to set treesit-defun-tactic to 'top-level'.

Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#78703; Package emacs. (Fri, 06 Jun 2025 07:24:02 GMT)

Message #11 received at 78703 <at> debbugs.gnu.org:

From: Yuan Fu <casouri <at> gmail.com>
To: Eli Zaretskii <eliz <at> gnu.org>
Cc: Daniel Colascione <dancol <at> dancol.org>, 78703 <at> debbugs.gnu.org
Subject: Re: bug#78703: beginning-of-defun and friends still wrong in
 typescript-ts-mode
Date: Fri, 6 Jun 2025 00:23:23 -0700


> On Jun 6, 2025, at 12:01 AM, Eli Zaretskii <eliz <at> gnu.org> wrote:
> 
>> From: Daniel Colascione <dancol <at> dancol.org>
>> Date: Thu, 05 Jun 2025 16:40:03 -0700
>> 
>> Right now, C-M-a runs treesit-beginning-of-defun which goes to the
>> start of the previous defun in buffer text, not the enclosing defun.
>> 
>> If we have this program:
>> 
>>    1 function foo() {
>>    2   function bar() {
>>    3     return 5
>>    4   }
>>    5   return 7
>>    6 }
>> 
>> 
>> and point is on line 5, then if we hit C-M-a, point goes to line 2, not
>> line 1.  In every single situation, when I use beginning-of-defun, I
>> intend to go to the start of my enclosing defun not the one that happens
>> to be previous in buffer linearization.
> 
> I think you want to set treesit-defun-tactic to 'top-level’.

Yes. Most likely top-level is what you want. 

Yuan

Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#78703; Package emacs. (Fri, 06 Jun 2025 07:58:01 GMT)

Message #14 received at 78703 <at> debbugs.gnu.org:

From: Daniel Colascione <dancol <at> dancol.org>
To: Yuan Fu <casouri <at> gmail.com>, Eli Zaretskii <eliz <at> gnu.org>
Cc: 78703 <at> debbugs.gnu.org
Subject: Re: bug#78703: beginning-of-defun and friends still wrong in typescript-ts-mode
Date: Fri, 06 Jun 2025 00:57:36 -0700


On June 6, 2025 12:23:23 AM PDT, Yuan Fu <casouri <at> gmail.com> wrote:
>
>
>> On Jun 6, 2025, at 12:01 AM, Eli Zaretskii <eliz <at> gnu.org> wrote:
>> 
>>> From: Daniel Colascione <dancol <at> dancol.org>
>>> Date: Thu, 05 Jun 2025 16:40:03 -0700
>>> 
>>> Right now, C-M-a runs treesit-beginning-of-defun which goes to the
>>> start of the previous defun in buffer text, not the enclosing defun.
>>> 
>>> If we have this program:
>>> 
>>>    1 function foo() {
>>>    2   function bar() {
>>>    3     return 5
>>>    4   }
>>>    5   return 7
>>>    6 }
>>> 
>>> 
>>> and point is on line 5, then if we hit C-M-a, point goes to line 2, not
>>> line 1.  In every single situation, when I use beginning-of-defun, I
>>> intend to go to the start of my enclosing defun not the one that happens
>>> to be previous in buffer linearization.
>> 
>> I think you want to set treesit-defun-tactic to 'top-level’.
>
>Yes. Most likely top-level is what you want. 
>
>Yuan
>
>

I don't think that's right either --- if I'm on line 3, I should go to line 2, not 1. Go to beginning of defun should mean go to the beginning of *my* defun. That's the traditional behavior from cc-mode, js-mode, etc. and the one that makes most sense. I can see top-level being an option, but I don't think the way nested works right now is useful and it's confusing and inconsistent as a default.
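
The three outcomes discussed so far can be compared with a small model. The TypeScript sketch below is not how treesit.el is implemented; it only treats each defun as a range of line numbers and reproduces the results described in this thread for the sample program. The names `Defun`, `sampleDefuns`, `previousDefunStart`, `enclosingDefunStart`, and `topLevelDefunStart` are hypothetical and exist only for illustration.

```typescript
// Toy model (an assumption for illustration, not Emacs code):
// each defun is an inclusive range of 1-based line numbers.
interface Defun {
  name: string;
  startLine: number;
  endLine: number;
}

// The sample program from the report: bar (lines 2-4) nested in foo (lines 1-6).
const sampleDefuns: Defun[] = [
  { name: "foo", startLine: 1, endLine: 6 },
  { name: "bar", startLine: 2, endLine: 4 },
];

// Defuns whose line range contains the given line.
function defunsContaining(defuns: Defun[], line: number): Defun[] {
  return defuns.filter((d) => d.startLine <= line && line <= d.endLine);
}

// Reported behavior: the closest defun start strictly before point
// in buffer order, whether or not that defun contains point.
function previousDefunStart(defuns: Defun[], line: number): number | null {
  const starts = defuns.map((d) => d.startLine).filter((s) => s < line);
  return starts.length > 0 ? Math.max(...starts) : null;
}

// Requested behavior: start of the innermost defun containing point.
// Among nested ranges, the innermost one is the one that starts last.
function enclosingDefunStart(defuns: Defun[], line: number): number | null {
  const enclosing = defunsContaining(defuns, line);
  if (enclosing.length === 0) return null;
  return Math.max(...enclosing.map((d) => d.startLine));
}

// top-level-like behavior: start of the outermost defun containing point.
// Returns null outside any defun; that case is not modeled here.
function topLevelDefunStart(defuns: Defun[], line: number): number | null {
  const enclosing = defunsContaining(defuns, line);
  if (enclosing.length === 0) return null;
  return Math.min(...enclosing.map((d) => d.startLine));
}
```

In this model, with point on line 5, `previousDefunStart` gives line 2 (the reported behavior) and `enclosingDefunStart` gives line 1. With point on line 3, `enclosingDefunStart` gives line 2, while `topLevelDefunStart` gives line 1, which is why `top-level` does not cover the nested case either.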

Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#78703; Package emacs. (Fri, 06 Jun 2025 17:01:01 GMT)

Message #17 received at 78703 <at> debbugs.gnu.org:

From: Troy Brown <brownts <at> troybrown.dev>
To: Daniel Colascione <dancol <at> dancol.org>
Cc: Yuan Fu <casouri <at> gmail.com>, 78703 <at> debbugs.gnu.org,
 Eli Zaretskii <eliz <at> gnu.org>
Subject: Re: bug#78703: beginning-of-defun and friends still wrong in
 typescript-ts-mode
Date: Fri, 6 Jun 2025 13:00:22 -0400

Daniel Colascione <dancol <at> dancol.org> writes:

> On June 6, 2025 12:23:23 AM PDT, Yuan Fu <casouri <at> gmail.com> wrote:
>>
>>
>>> On Jun 6, 2025, at 12:01 AM, Eli Zaretskii <eliz <at> gnu.org> wrote:
>>>
>>>> From: Daniel Colascione <dancol <at> dancol.org>
>>>> Date: Thu, 05 Jun 2025 16:40:03 -0700
>>>>
>>>> Right now, C-M-a runs treesit-beginning-of-defun which goes to the
>>>> start of the previous defun in buffer text, not the enclosing defun.
>>>>
>>>> If we have this program:
>>>>
>>>>    1 function foo() {
>>>>    2   function bar() {
>>>>    3     return 5
>>>>    4   }
>>>>    5   return 7
>>>>    6 }
>>>>
>>>>
>>>> and point is on line 5, then if we hit C-M-a, point goes to line 2, not
>>>> line 1.  In every single situation, when I use beginning-of-defun, I
>>>> intend to go to the start of my enclosing defun not the one that happens
>>>> to be previous in buffer linearization.
>>>
>>> I think you want to set treesit-defun-tactic to 'top-level’.
>>
>>Yes. Most likely top-level is what you want.
>>
>>Yuan
>>
>>
>
> I don't think that's right either --- if I'm on line 3, I should go to
> line 2, not 1. Go to beginning of defun should mean go to the
> beginning of *my* defun. That's the traditional behavior from cc-mode,
> js-mode, etc. and the one that makes most sense. I can see top-level
> being an option, but I don't think the way nested works right now is
> useful and it's confusing and inconsistent as a default.

I agree.  This behavior is not intuitive at all.  I reported this in
detail in Bug#68664, but it never seemed to go anywhere.  Furthermore,
using a `treesit-defun-tactic` of `top-level` doesn't work when you
are arbitrarily nested and just want to go to the beginning of the
function containing point.

This behavior not only impacts interactive use of
`treesit-beginning-of-defun` (via "C-M-a"), but other functionality
which builds upon `beginning-of-defun`, most notably
`prog-fill-reindent-defun`.  For the Tree-sitter modes that I
maintain, I had to create a mode-specific version of
`prog-fill-reindent-defun` instead.  That's because, as written,
`prog-fill-reindent-defun` uses `beginning-of-defun`, and since in
cases like this, `treesit-beginning-of-defun` goes to the previous
nested function, rather than the function containing point, you end up
re-indenting the previous nested function rather than the function
containing point.  The mode-specific version that I created instead
uses `treesit-defun-at-point` to locate the containing function.

Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#78703; Package emacs. (Sat, 07 Jun 2025 06:10:01 GMT)

Message #20 received at 78703 <at> debbugs.gnu.org:

From: Yuan Fu <casouri <at> gmail.com>
To: Troy Brown <brownts <at> troybrown.dev>
Cc: Eli Zaretskii <eliz <at> gnu.org>, Daniel Colascione <dancol <at> dancol.org>,
 78703 <at> debbugs.gnu.org
Subject: Re: bug#78703: beginning-of-defun and friends still wrong in
 typescript-ts-mode
Date: Fri, 6 Jun 2025 23:08:57 -0700


> On Jun 6, 2025, at 10:00 AM, Troy Brown <brownts <at> troybrown.dev> wrote:
> 
> Daniel Colascione <dancol <at> dancol.org> writes:
> 
>> On June 6, 2025 12:23:23 AM PDT, Yuan Fu <casouri <at> gmail.com> wrote:
>>> 
>>> 
>>>> On Jun 6, 2025, at 12:01 AM, Eli Zaretskii <eliz <at> gnu.org> wrote:
>>>> 
>>>>> From: Daniel Colascione <dancol <at> dancol.org>
>>>>> Date: Thu, 05 Jun 2025 16:40:03 -0700
>>>>> 
>>>>> Right now, C-M-a runs treesit-beginning-of-defun which goes to the
>>>>> start of the previous defun in buffer text, not the enclosing defun.
>>>>> 
>>>>> If we have this program:
>>>>> 
>>>>>   1 function foo() {
>>>>>   2   function bar() {
>>>>>   3     return 5
>>>>>   4   }
>>>>>   5   return 7
>>>>>   6 }
>>>>> 
>>>>> 
>>>>> and point is on line 5, then if we hit C-M-a, point goes to line 2, not
>>>>> line 1.  In every single situation, when I use beginning-of-defun, I
>>>>> intend to go to the start of my enclosing defun not the one that happens
>>>>> to be previous in buffer linearization.
>>>> 
>>>> I think you want to set treesit-defun-tactic to 'top-level’.
>>> 
>>> Yes. Most likely top-level is what you want.
>>> 
>>> Yuan
>>> 
>>> 
>> 
>> I don't think that's right either --- if I'm on line 3, I should go to
>> line 2, not 1. Go to beginning of defun should mean go to the
>> beginning of *my* defun. That's the traditional behavior from cc-mode,
>> js-mode, etc. and the one that makes most sense. I can see top-level
>> being an option, but I don't think the way nested works right now is
>> useful and it's confusing and inconsistent as a default.
> 
> I agree.  This behavior is not intuitive at all.  I reported this in
> detail in Bug#68664, but it never seemed to go anywhere.  Furthermore,
> using a `treesit-defun-tactic` of `top-level` doesn't work when you
> are arbitrarily nested and just want to go to the beginning of the
> function containing point.
> 
> This behavior not only impacts interactive use of
> `treesit-beginning-of-defun` (via "C-M-a"), but other functionality
> which builds upon `beginning-of-defun`, most notably
> `prog-fill-reindent-defun`.  For the Tree-sitter modes that I
> maintain, I had to create a mode-specific version of
> `prog-fill-reindent-defun` instead.  That's because, as written,
> `prog-fill-reindent-defun` uses `beginning-of-defun`, and since in
> cases like this, `treesit-beginning-of-defun' goes to the previous
> nested function, rather than the function containing point, you end up
> re-indenting the previous nested function rather than the function
> containing point.  The mode-specific version that I created, instead
> uses `treesit-defun-at-point` to locate the containing function.

So for this tactic, point should move out of the enclosing defun if it is inside a defun; and if point isn’t inside any defun, it should move to the previous defun beginning?

Any suggestions for a good name for this tactic? Right now we have `nested` and `top-level`.

Yuan

Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#78703; Package emacs. (Sat, 07 Jun 2025 07:15:01 GMT)

Message #23 received at 78703 <at> debbugs.gnu.org:

From: Eli Zaretskii <eliz <at> gnu.org>
To: Yuan Fu <casouri <at> gmail.com>
Cc: 78703 <at> debbugs.gnu.org, brownts <at> troybrown.dev, dancol <at> dancol.org
Subject: Re: bug#78703: beginning-of-defun and friends still wrong in
 typescript-ts-mode
Date: Sat, 07 Jun 2025 10:14:34 +0300

> From: Yuan Fu <casouri <at> gmail.com>
> Date: Fri, 6 Jun 2025 23:08:57 -0700
> Cc: Daniel Colascione <dancol <at> dancol.org>,
>  Eli Zaretskii <eliz <at> gnu.org>,
>  78703 <at> debbugs.gnu.org
> 
> So for this tactic, point should move out of the enclosing defun if it is inside a defun; and if point isn’t inside any defun, it should move to the previous defun beginning?
> 
> Any suggestions for a good name for this tactic? Right now we have `nested` and `top-level`.

Something like 'same-level'?

Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#78703; Package emacs. (Sat, 07 Jun 2025 14:04:02 GMT)

Message #26 received at 78703 <at> debbugs.gnu.org:

From: Troy Brown <brownts <at> troybrown.dev>
To: Yuan Fu <casouri <at> gmail.com>
Cc: Eli Zaretskii <eliz <at> gnu.org>, Daniel Colascione <dancol <at> dancol.org>,
 78703 <at> debbugs.gnu.org
Subject: Re: bug#78703: beginning-of-defun and friends still wrong in
 typescript-ts-mode
Date: Sat, 7 Jun 2025 10:03:24 -0400

On Sat, Jun 7, 2025 at 2:09 AM Yuan Fu <casouri <at> gmail.com> wrote:
>
> So for this tactic, point should move out of the enclosing defun if it is inside a defun; and if point isn’t inside any defun, it should move to the previous defun beginning?

Care must be taken when talking about being "inside" or "not inside" a
defun.  Unless point is before, after, or between top-level defuns, it
will always be inside some defun.  I previously suggested
differentiating behavior based on whether point was at a defun
boundary.  By boundary, I mean point is either immediately before or
immediately after a defun.

It's also important to point out that classes, namespaces, packages,
etc. could all be defuns, not just functions.  Therefore the nesting
being described is not a pathological exercise, but a much more
common occurrence than one might initially consider.

As far as behavior, I think the way I've observed it working in
non-Tree-sitter modes was more intuitive than the current Tree-sitter
behavior (but I haven't exhaustively checked).  Additionally, I think
it should work as described in the documentation.   Much of what I'm
saying was already expressed in Bug#68664.  There, I tried to codify
how I believed the non-Tree-sitter modes were behaving and how
intuitively I thought it should work for Tree-sitter modes.

Regardless of what tactic is configured, `treesit-beginning-of-defun`
must go to the beginning of the enclosing defun when point is not at a
defun boundary.  I think this is fundamental or the assumptions made
when using `beginning-of-defun` in the general case no longer hold
(such as its use in `prog-fill-reindent-defun`).  I think the tactic
should only come into play when point is already at a defun boundary.

After that, I'm less concerned about its behavior, but think the
current `nested` behavior of visiting the previous sibling defun until
there are no more, then visiting the parent defun makes sense.  The
key difference is that you only visit the previous sibling if point is
at a defun boundary...otherwise visit the defun containing point.
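
One possible reading of the rule proposed above can be written down as a small model. The TypeScript sketch below is an interpretation, not the treesit.el implementation: it assumes defuns form a tree with character offsets, and that "at a defun boundary" means point equals the start or end offset of some defun. The names `DefunNode`, `sampleTree`, `innermostContaining`, `atDefunBoundary`, and `boundaryAwareBeginning`, the offsets, and the extra sibling `baz` are all hypothetical.

```typescript
// Hypothetical tree of defuns with character offsets (end is exclusive).
interface DefunNode {
  name: string;
  start: number;
  end: number;
  children: DefunNode[];
}

// foo contains two sibling defuns, bar and baz (offsets are made up).
const sampleTree: DefunNode[] = [
  {
    name: "foo", start: 0, end: 100,
    children: [
      { name: "bar", start: 20, end: 40, children: [] },
      { name: "baz", start: 50, end: 70, children: [] },
    ],
  },
];

// Innermost defun with start < point < end, or null if there is none.
function innermostContaining(defuns: DefunNode[], point: number): DefunNode | null {
  for (const d of defuns) {
    if (d.start < point && point < d.end) {
      const inner = innermostContaining(d.children, point);
      return inner !== null ? inner : d;
    }
  }
  return null;
}

// True when point is immediately before or after some defun.
function atDefunBoundary(defuns: DefunNode[], point: number): boolean {
  return defuns.some(
    (d) => d.start === point || d.end === point || atDefunBoundary(d.children, point),
  );
}

// Assumed rule: away from a boundary, go to the enclosing defun's start.
// At a boundary (or outside any defun), go to the nearest sibling that
// starts before point, and fall back to the parent when there is none.
function boundaryAwareBeginning(defuns: DefunNode[], point: number): number | null {
  const container = innermostContaining(defuns, point);
  if (container !== null && !atDefunBoundary(defuns, point)) {
    return container.start;
  }
  const siblings = container !== null ? container.children : defuns;
  const earlier = siblings.filter((d) => d.start < point);
  if (earlier.length > 0) {
    return Math.max(...earlier.map((d) => d.start));
  }
  return container !== null ? container.start : null;
}
```

Under these assumptions, a point at offset 80 (inside foo, after baz, not at a boundary) goes to 0, the start of foo. Repeated presses from offset 70, the end of baz, visit 50, then 20, then 0, which matches the sibling-then-parent order described above.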

Furthermore, consider the following `top-level` tactic example using
`c++-ts-mode`.  Here, we have a C++ namespace (which is considered a
defun for C++) as well as a "doSomething" function defun within the
namespace.  If we place point on the "printf" and press `M-q`,
triggering `prog-fill-reindent-defun`, we'll see that the entire
namespace has been re-indented (including "int i;").  I'd argue that
was not intuitively what I would have expected to happen.  Instead, I
would have expected only "doSomething" to have been re-indented.

```cpp
// -*- mode: c++-ts -*-
#include <cstdio>

namespace Hello
{
int i;

  void doSomething(void)
  {
printf("doSomething\n");
  }
};

// Local Variables:
// treesit-defun-tactic: top-level
// End:
```

If the behavior of `treesit-beginning-of-defun` was to first move to
the beginning of the enclosing defun when point is not at a defun
boundary (as previously described), the expected behavior of only
reindenting "doSomething" would have occurred.  If
`treesit-beginning-of-defun` worked the way I propose, and you really
wanted to re-indent the entire namespace, it would have been as simple
as moving to the beginning of the enclosing defun and triggering
`prog-fill-reindent-defun` there (i.e., `C-M-a` `M-q`).
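
The difference in the re-indented region can be made concrete with a toy model. The TypeScript sketch below is an illustration only, not Emacs code: it assumes the two defuns of the C++ snippet are line ranges counted from the mode cookie line (namespace on lines 4-12, "doSomething" on lines 8-11, "printf" on line 10). The names `DefunRange`, `cppDefuns`, and `reindentRange` are hypothetical.

```typescript
// Toy model: a defun is an inclusive range of 1-based line numbers.
interface DefunRange {
  name: string;
  startLine: number;
  endLine: number;
}

// Ranges assumed for the C++ snippet above, counting from its first line.
const cppDefuns: DefunRange[] = [
  { name: "namespace Hello", startLine: 4, endLine: 12 },
  { name: "doSomething", startLine: 8, endLine: 11 },
];

// "outermost" stands in for the top-level outcome described above;
// "innermost" stands in for the proposed enclosing-defun outcome.
type Choice = "outermost" | "innermost";

// Which defun's lines would be re-indented for a given point and choice.
function reindentRange(
  defuns: DefunRange[],
  line: number,
  choice: Choice,
): DefunRange | null {
  const enclosing = defuns
    .filter((d) => d.startLine <= line && line <= d.endLine)
    .sort((a, b) => a.startLine - b.startLine);
  const picked =
    choice === "outermost" ? enclosing[0] : enclosing[enclosing.length - 1];
  return picked === undefined ? null : picked;
}
```

With point on line 10, the "outermost" choice selects lines 4-12, so "int i;" on line 6 is touched as well, while the "innermost" choice selects only lines 8-11.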

> Any suggestions for a good name for this tactic? Right now we have `nested` and `top-level`.

See above.  I don't think a new tactic is necessary, just a change in
how `treesit-beginning-of-defun` works when point is not at a defun
boundary, regardless of the configured tactic.

Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#78703; Package emacs. (Tue, 10 Jun 2025 06:03:02 GMT)

Message #29 received at 78703 <at> debbugs.gnu.org:

From: Yuan Fu <casouri <at> gmail.com>
To: Troy Brown <brownts <at> troybrown.dev>
Cc: Eli Zaretskii <eliz <at> gnu.org>, Daniel Colascione <dancol <at> dancol.org>,
 78703 <at> debbugs.gnu.org
Subject: Re: bug#78703: beginning-of-defun and friends still wrong in
 typescript-ts-mode
Date: Mon, 9 Jun 2025 23:02:33 -0700


> On Jun 7, 2025, at 7:03 AM, Troy Brown <brownts <at> troybrown.dev> wrote:
> 
> On Sat, Jun 7, 2025 at 2:09 AM Yuan Fu <casouri <at> gmail.com> wrote:
>> 
>> So for this tactic, point should move out of the enclosing defun if it is inside a defun; and if point isn’t inside any defun, it should move to the previous defun beginning?
> 
> Care must be taken when talking about being "inside" or "not inside" a
> defun.  Unless point is before, after, or between top-level defuns, it
> will always be inside some defun.  I previously suggested
> differentiating behavior based on whether point was at a defun
> boundary.  By boundary, I mean point is either immediately before or
> immediately after a defun.
> 
> It's also important to point out that classes, namespaces, packages,
> etc. could all be defuns, not just functions.  Therefore the nesting
> as being described is not a pathological exercise, but a much more
> common occurrence than one might initially consider.
> 
> As far as behavior, I think the way I've observed it working in
> non-Tree-sitter modes was more intuitive than the current Tree-sitter
> behavior (but I haven't exhaustively checked).  Additionally, I think
> it should work as described in the documentation.   Much of what I'm
> saying was already expressed in Bug#68664.  There, I tried to codify
> how I believed the non-Tree-sitter modes were behaving and how
> intuitively I thought it should work for Tree-sitter modes.
> 
> Regardless of what tactic is configured, `treesit-beginning-of-defun`
> must go to the beginning of the enclosing defun when point is not at a
> defun boundary.  I think this is fundamental or the assumptions made
> when using `beginning-of-defun` in the general case no longer hold
> (such as its use in `prog-fill-reindent-defun`).  I think the tactic
> should only come into play when point is already at a defun boundary.

Consider that existing code in Emacs isn’t set in stone; it isn’t really fundamental that beginning-of-defun must work in a way that makes prog-fill-reindent-defun behave desirably. prog-fill-reindent-defun uses beginning-of-defun because we didn’t have better choices before tree-sitter. In tree-sitter major modes, what we’ve been doing is making the existing commands customizable so that tree-sitter can provide its own version of them. We’ve done this for forward-sexp: we added forward-sexp-function. Some commands have had customization points for a long time, like beginning-of-defun, which has beginning-of-defun-function.

So I added prog-fill-reindent-defun-function and a tree-sitter version, treesit-fill-reindent-defun. The tree-sitter implementation uses treesit-defun-at-point, so it doesn’t even need to concern itself with tactics.

Now in tree-sitter major modes, prog-fill-reindent-defun should always indent the enclosing defun.

> After that, I'm less concerned about its behavior, but think the
> current `nested` behavior of visiting the previous sibling defun until
> there are no more, then visiting the parent defun makes sense.  The
> key difference is that you only visit the previous sibling if point is
> at a defun boundary...otherwise visit the defun containing point.

Hmmm, it doesn’t feel very convenient: you’d need to first adjust your point to be precisely at the boundary, then press C-M-a/e? IMO that adds too much overhead. I added a tactic `parent-first` that always moves to the beginning/end of the enclosing defun. People who prefer this kind of defun movement can use this tactic.

> 
> Furthermore, consider the following `top-level` tactic example using
> `c++-ts-mode`.  Here, we have a C++ namespace (which is considered a
> defun for C++) as well as a "doSomething" function defun within the
> namespace.  If we place point on the "printf" and press `M-q`,
> triggering `prog-fill-reindent-defun`, we'll see that the entire
> namespace has been re-indented (including "int i;").  I'd argue that
> was not intuitively what I would have expected to happen.  Instead, I
> would have expected only "doSomething" to have been re-indented.
> 
> ```cpp
> // -*- mode: c++-ts
> #include <cstdio>
> 
> namespace Hello
> {
> int i;
> 
>  void doSomething(void)
>  {
> printf("doSomething\n");
>  }
> };
> 
> // Local Variables:
> // treesit-defun-tactic: top-level
> // End:
> ```
> 
> If the behavior of `treesit-beginning-of-defun` was to first move to
> the beginning of the enclosing defun when point is not at a defun
> boundary (as previously described), the expected behavior of only
> reindenting "doSomething" would have occurred.  If
> `treesit-beginning-of-defun` worked the way I propose, and you really
> wanted to re-indent the entire namespace, it would have been as simple
> as moving to the beginning of the enclosing defun and triggering
> `prog-fill-reindent-defun` there (i.e., `C-M-a` `M-q`).
> 
>> Any suggestions for a good name for this tactic? Right now we have `nested` and `top-level`.
> 
> See above.  I don't think a new tactic is necessary, just a change in
> how `treesit-beginning-of-defun` works when point is not at a defun
> boundary, regardless of the configured tactic.

Yuan

Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#78703; Package emacs. (Tue, 10 Jun 2025 07:25:02 GMT)

Message #32 received at 78703 <at> debbugs.gnu.org:

From: Daniel Colascione <dancol <at> dancol.org>
To: Yuan Fu <casouri <at> gmail.com>
Cc: Eli Zaretskii <eliz <at> gnu.org>, Troy Brown <brownts <at> troybrown.dev>,
 78703 <at> debbugs.gnu.org
Subject: Re: bug#78703: beginning-of-defun and friends still wrong in
 typescript-ts-mode
Date: Tue, 10 Jun 2025 00:24:54 -0700

Yuan Fu <casouri <at> gmail.com> writes:

>> On Jun 7, 2025, at 7:03 AM, Troy Brown <brownts <at> troybrown.dev> wrote:
>> 
>> On Sat, Jun 7, 2025 at 2:09 AM Yuan Fu <casouri <at> gmail.com> wrote:
>>> 
>>> So for this tactic, point should move out of the enclosing defun if
>>> it is inside a defun; and if point isn’t inside any defun it should
>>> move to the previous defun beginning?
>> 
>> Care must be taken when talking about being "inside" or "not inside" a
>> defun.  Unless point is before, after, or between top-level defuns, it
>> will always be inside some defun.  I previously suggested
>> differentiating behavior based on whether point was at a defun
>> boundary.  By boundary, I mean point is either immediately before or
>> immediately after a defun.
>> 
>> It's also important to point out that classes, namespaces, packages,
>> etc. could all be defuns, not just functions.  Therefore the nesting
>> as being described is not a pathological exercise, but a much more
>> common occurrence than one might initially consider.
>> 
>> As far as behavior, I think the way I've observed it working in
>> non-Tree-sitter modes was more intuitive than the current Tree-sitter
>> behavior (but I haven't exhaustively checked).  Additionally, I think
>> it should work as described in the documentation.   Much of what I'm
>> saying was already expressed in Bug#68664.  There, I tried to codify
>> how I believed the non-Tree-sitter modes were behaving and how
>> intuitively I thought it should work for Tree-sitter modes.
>> 
>> Regardless of what tactic is configured, `treesit-beginning-of-defun`
>> must go to the beginning of the enclosing defun when point is not at a
>> defun boundary.  I think this is fundamental or the assumptions made
>> when using `beginning-of-defun` in the general case no longer hold
>> (such as its use in `prog-fill-reindent-defun`).  I think the tactic
>> should only come into play when point is already at a defun boundary.
>
> Consider that existing code in Emacs isn’t set in stone, it’s not
> really that fundamental that beginning-of-defun must work in a way
> that makes prog-fill-reindent-defun behave
> desirably.

Experimenting with a different UI paradigm doesn't justify introducing
an inconsistency to behavior that's worked well for literally decades.
If I'm using, say, c++-ts-mode, my navigation commands should do the
same thing they do in c++-mode.  Plenty of code as well as muscle
memories rely on this behavior.

The "tactic" concept is an unnecessary layer of indirection.
If operation A takes you to place X and operation B takes you to
different place Y, the way to express the difference between operations
A and B is to make them _different commands_, not by twiddling some
global switch.

When you change tree-sitter "strategies" right now, you're silently
turning one command into another command, which is confusing for everyone.
Please give these "strategies" individual command names.  We have
beginning-of-defun and beginning-of-defun-comments, not a knob that
alters beginning-of-defun.

> prog-fill-reindent-defun uses beginning-of-defun because we
> didn’t have better choices before tree-sitter. In tree-sitter major
> modes, what we’ve been doing is to make the existing commands
> customizable so tree-sitter can provide a tree-sitter version of
> them.

Why does there need to be a tree-sitter version of
prog-fill-reindent-defun?  Isn't it enough that tree-sitter provide the
low-level syntactic analysis for prog-fill-reindent-defun to do its job?
Why the high level hook?

> We’ve done this for forward-sexp: we added
> forward-sexp-function. Some commands already have customization points
> long ago, like beginning-of-defun, which has
> beginning-of-defun-function.

Modes generally use these "customization points" to _implement_ the
familiar behavior, not to give them random different
user-visible semantics.

> So I added prog-fill-reindent-defun-function and a tree-sitter version,
> treesit-fill-reindent-defun. The tree-sitter implementation uses
> treesit-defun-at-point, so it doesn’t even need to concern itself
> with tactics.
>
> Now in tree-sitter major modes, prog-fill-reindent-defun should always
> indent the enclosing defun.

Which now means prog-fill-reindent-defun can indent something other than
what mark-defun highlights?  That seems odd to me.  Tree sitter's job is
syntactic analysis, not UI differentiation.

>> After that, I'm less concerned about its behavior, but think the
>> current `nested` behavior of visiting the previous sibling defun until
>> there are no more, then visiting the parent defun makes sense.  The
>> key difference is that you only visit the previous sibling if point is
>> at a defun boundary...otherwise visit the defun containing point.
>
> Hmmm, it doesn’t feel very convenient: you’d need to first adjust your
> point to be precisely at the boundary, then press C-M-a/e? IMO that
> adds too much overhead. I added a tactic `parent-first` that always
> moves to the beginning/end of the enclosing defun. People who prefer
> this kind of defun movement can use this tactic.

The default should be to match behavior that's been stable for decades.
Use of tree sitter should be an implementation detail for users.

If we want to provide UI to better handle nested defuns, this UI should
go in prog-mode.el and rely on mode-provided syntactic analysis, not
just delegate to a mode function that does different random stuff in
each mode.

>> Furthermore, consider the following `top-level` tactic example using
>> `c++-ts-mode`.  Here, we have a C++ namespace (which is considered a
>> defun for C++)

Namespaces aren't defuns and c++-ts-mode shouldn't be indenting their
contents by a level either.  c++-ts-mode is unusable without hacks like
programmatically editing the indentation rules in user configuration.

>> as well as a "doSomething" function defun within the
>> namespace.  If we place point on the "printf" and press `M-q`,
>> triggering `prog-fill-reindent-defun`, we'll see that the entire
>> namespace has been re-indented (including "int i;").  I'd argue that
>> was not intuitively what I would have expected to happen.  Instead, I
>> would have expected only "doSomething" to have been re-indented.

Yes, because a namespace isn't a defun.

>> If the behavior of `treesit-beginning-of-defun` was to first move to
>> the beginning of the enclosing defun when point is not at a defun
>> boundary (as previously described), the expected behavior of only
>> reindenting "doSomething" would have occurred.  If
>> `treesit-beginning-of-defun` worked the way I propose, and you really
>> wanted to re-indent the entire namespace, it would have been as simple
>> as moving to the beginning of the enclosing defun and triggering
>> `prog-fill-reindent-defun` there (i.e., `C-M-a` `M-q`).
>> 
>>> Any suggestions for a good name for this tactic? Right now we have `nested` and `top-level`.
>> 
>> See above.  I don't think a new tactic is necessary, just a change in
>> how `treesit-beginning-of-defun` works when point is not at a defun
>> boundary, regardless of the configured tactic.
>
> Yuan

Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#78703; Package emacs. (Tue, 10 Jun 2025 12:13:02 GMT)

Message #35 received at 78703 <at> debbugs.gnu.org:

From: Eli Zaretskii <eliz <at> gnu.org>
To: Daniel Colascione <dancol <at> dancol.org>
Cc: casouri <at> gmail.com, brownts <at> troybrown.dev, 78703 <at> debbugs.gnu.org
Subject: Re: bug#78703: beginning-of-defun and friends still wrong in
 typescript-ts-mode
Date: Tue, 10 Jun 2025 15:12:22 +0300

> From: Daniel Colascione <dancol <at> dancol.org>
> Cc: Troy Brown <brownts <at> troybrown.dev>,  Eli Zaretskii <eliz <at> gnu.org>,
>   78703 <at> debbugs.gnu.org
> Date: Tue, 10 Jun 2025 00:24:54 -0700
> 
> If I'm using, say, c++-ts-mode, my navigation commands should do the
> same thing they do in c++-mode.  Plenty of code as well as muscle
> memories rely on this behavior.

I'm not sure I agree.  That c++-mode behaved like that doesn't mean
it's the last word, or that nothing can be improved in that behavior.

In addition, TS-based modes make certain behaviors very hard (at least
not if we base it on the parser information), and OTOH make certain
behaviors very easy that were hard with the "traditional" modes.  So
we should keep an open mind about these aspects, and not automatically
demand 110% compatibility to past behavior.

> The "tactic" concept is an unnecessary layer of indirection.

From where I stand, it's a new feature that was unavailable in non-TS
implementation.

> If operation A takes you to place X and operation B takes you to
> different place Y, the way to express the difference between operations
> A and B is to make them _different commands_, not by twiddling some
> global switch.

But if the semantics of a command is ambiguous, then a switch makes
perfect sense.  In this case, what exactly "beginning of defun" means
when there are nested defuns is ambiguous.

> When you change tree-sitter "strategies" right now, you're silently
> turning one command into another command, that's confusing for everyone.
> Please give these "strategies" individual command names.  We have
> beginning-of-defun and beginning-of-defun-comments, not a knob that
> alters beginning-of-defun.

I don't want to memorize two commands when one will do.  That's why we
have the various optional behaviors of commands and DWIM-ish
variations in their behavior.

> > prog-fill-reindent-defun uses beginning-of-defun because we
> > don’t have better choices before tree-sitter. In tree-sitter major
> > modes, what we’ve been doing is to make the existing commands
> > customizable so tree-sitter can provide a tree-sitter version of
> > it.
> 
> Why does there need to be a tree-sitter version of
> prog-fill-reindent-defun?

Because the way to get the indentation information from tree-sitter is
significantly different from the ad-hoc ways we do that in
"traditional" modes.

> Isn't it enough that tree-sitter provide the
> low-level syntactic analysis for prog-fill-reindent-defun to do its job?
> Why the high level hook?

How do you implement anything like c-set-offset or indentation styles
based only on low-level syntactic analysis? where will the rest of the
necessary information come from, and who and how will apply it?

> > We’ve done this for forward-sexp: we added
> > forward-sexp-function. Some commands already have customization points
> > long ago, like beginning-of-defun, which has
> > beginning-of-defun-function.
> 
> Modes generally use these "customization points" to _implement_ the
> familiar behavior, not to give them random different
> user-visible semantics.

I think the point in the above example was that the semantics of "sexp"
is ambiguous in any language that is not Lisp.  That was (and still
is) the hard part of figuring out how forward-sexp should behave in
TS-based modes.  (In non-TS modes the behavior is just arbitrary
nonsense, if you ask me.)

> > So I added prog-fill-reindent-defun-function and a tree-sitter version
> > treesit-fill-reindent-defun. The tree-sitter implementation uses
> > treesit-defun-at-point, so it doesn’t even need to concern
> > with tactics.
> >
> > Now in tree-sitter major modes, prog-fill-reindent-defun should always
> > indent the enclosing defun.
> 
> Which now means prog-fill-reindent-defun can indent something other than
> what mark-defun highlights?

They did subtly different things since long ago.  It's clearly visible
in the code.

> Tree sitter's job is syntactic analysis, not UI differentiation.

The way we use syntactic information in these commands is a leaky
abstraction: the syntax aspects leak into the UI.  So it is a small
wonder that tree-sitter affects the UI in some (relatively minor)
ways.

> The default should be to match behavior that's been stable for decades.

As I tried to explain above, I don't necessarily agree.

> Use of tree sitter should be an implementation detail for users.

Since the introduction of tree-sitter based capabilities into Emacs,
we've learned that this simply doesn't work, not in Emacs.  Syntax and
semantics leak into our UI, and tree-sitter deals with syntactic and
semantic information that is sometimes very different from what, e.g.,
syntax-ppss and friends let us use.

So I do understand where you are coming from, but experience taught
us that it cannot work that way in Emacs.  If we were designing Emacs
from scratch today, perhaps we could have done that in a way that
would avoid these leaks, but we are not there.

> If we want to provide UI to better handle nested defuns, this UI should
> go in prog-mode.el and rely on mode-provided syntactic analysis, not
> just delegate to a mode function that does different random stuff in
> each mode.

That'd be a massive rewrite of gobs of existing code, I'm afraid.  I
invite you to take a look at the existing code and see how it mixes
syntax with UI.  That's even visible at the level of the command
names: "sexp" only makes sense in Lisp, and the notion of "balanced
parens" has no place in languages without brackets and braces.

> >> Furthermore, consider the following `top-level` tactic example using
> >> `c++-ts-mode`.  Here, we have a C++ namespace (which is considered a
> >> defun for C++)
> 
> Namespaces aren't defuns and c++-ts-mode shouldn't be indenting their
> contents by a level either.

But c++-mode does indent them.  Doesn't this contradict what you said
about following past practices?

Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#78703; Package emacs. (Tue, 10 Jun 2025 15:51:05 GMT) Full text and rfc822 format available.

Message #38 received at 78703 <at> debbugs.gnu.org (full text, mbox):

From: Daniel Colascione <dancol <at> dancol.org>
To: Eli Zaretskii <eliz <at> gnu.org>
Cc: casouri <at> gmail.com, brownts <at> troybrown.dev, 78703 <at> debbugs.gnu.org
Subject: Re: bug#78703: beginning-of-defun and friends still wrong in typescript-ts-mode
Date: Tue, 10 Jun 2025 08:49:48 -0700

On June 10, 2025 5:12:22 AM PDT, Eli Zaretskii <eliz <at> gnu.org> wrote:
>> From: Daniel Colascione <dancol <at> dancol.org>
>> Cc: Troy Brown <brownts <at> troybrown.dev>,  Eli Zaretskii <eliz <at> gnu.org>,
>>   78703 <at> debbugs.gnu.org
>> Date: Tue, 10 Jun 2025 00:24:54 -0700
>> 
>> If I'm using, say, c++-ts-mode, my navigation commands should do the
>> same thing they do in c++-mode.  Plenty of code as well as muscle
>> memories rely on this behavior.
>
>I'm not sure I agree.  That c++-mode behaved like that doesn't mean
>it's the last word, or that nothing can be improved in that behavior.

It's not just c++-mode. It's how most modes have behaved.

>In addition, TS-based modes make certain behaviors very hard (at least
>not if we base it on the parser information), 

Very hard how? You can use tree sitter like a more powerful parse partial sexp. It provides strictly more information. There is no behavior whatsoever that's harder to implement because tree sitter is giving you more information. 

> and OTOH make certain
>behaviors very easy that were hard with the "traditional" modes.  So
>we should keep an open mind about these aspects, and not automatically
>demand 110% compatibility to past behavior.
>
>> The "tactic" concept is an unnecessary layer of indirection.
>
>From where I stand, it's a new feature that was unavailable in non-TS
>implementation.

It's not a feature. It's a UI and programming annoyance. How are you supposed to write code against functions with behavior that shifts on a whim with no stable functions to call instead?

What am I supposed to do, let-bind every possible value around every function call? Why have functions then. Let's just have one function with a strategy.

(let ((dwim-strategy 'call-process)) (dwim "date -R"))

(let ((dwim-strategy 'switch-to-buffer)) (dwim "*scratch*"))

We already have a knob for users to express the concept of what happens when a key is pressed: the command binding mechanism.

org-mode is annoying in this way too. I'm in org mode. I want to see what, say, C-tab does. I type C-h c C-tab and I get something like "org-dwim-control-tab".

Yeah, that's useful.

Why bother having keymaps at all? Org made an inner platform for key binding. Inner platforms are bad and hurt generality.

>> If operation A takes you to place X and operation B takes you to
>> different place Y, the way to express the difference between operations
>> A and B is to make them _different commands_, not by twiddling some
>> global switch.
>
>But if the semantics of a command is ambiguous, then a switch makes
>perfect sense.  In this case, what exactly "beginning of defun" means
>when there are nested defuns is ambiguous.

Yet we use the concept of command names to express different concepts elsewhere. And if the concept is ambiguously defined, provide a minimal knob to adjust that concept, not change the operation of primitives to be inconsistent with each other. 

>> When you change tree-sitter "strategies" right now, you're silently
>> turning one command into another command, that's confusing for everyone.
>> Please give these "strategies" individual command names.  We have
>> beginning-of-defun and beginning-of-defun-comments, not a knob that
>> alters beginning-of-defun.
>
>I don't want to memorize two commands when one will do.  That's why we
>have the various optional behaviors of commands and DWIM-ish
>variations in their behavior.

This isn't DW*I*M and it's hard to imagine the current default of going to the previous lexical function beginning being what many people mean.

>> > prog-fill-reindent-defun uses beginning-of-defun because we
>> > don’t have better choices before tree-sitter. In tree-sitter major
>> > modes, what we’ve been doing is to make the existing commands
>> > customizable so tree-sitter can provide a tree-sitter version of
>> > it.
>> 
>> Why does there need to be a tree-sitter version of
>> prog-fill-reindent-defun?
>
>Because the way to get the indentation information from tree-sitter is
>significantly different from the ad-hoc ways we do that in
>"traditional" modes.

No it isn't. If the defun navigation functions in TS modes had their traditional behavior, they'd continue to work for higher level constructs built on top of them like the prog-mode reindent and mark defun. TS modes broke a whole bunch of things that had worked fine for decades, and instead of fixing them, they just made even more abstractions to plug inconsistent tree sitter things in place of the broken things.

You apply this procedure repeatedly and you get a new editor, and going by the defaults I've seen from the TS modes, it's not a better editor. 

>> Isn't it enough that tree-sitter provide the
>> low-level syntactic analysis for prog-fill-reindent-defun to do its job?
>> Why the high level hook?
>
>How do you implement anything like c-set-offset or indentation styles
>based only on low-level syntactic analysis? 

By using TS to implement c-guess-basic-syntax and friends. cc-mode indentation styles are clear expressions of user intent. No reason at all TS modes couldn't respect this intent and merely implement it a different way. Want to know whether you're after a class? Inside a namespace? Where a declaration begins? You have an AST right there!

> where will the rest of the
>necessary information come from, and who and how will apply it?

From the AST. Where else?

>> > We’ve done this for forward-sexp: we added
>> > forward-sexp-function. Some commands already have customization points
>> > long ago, like beginning-of-defun, which has
>> > beginning-of-defun-function.
>> 
>> Modes generally use these "customization points" to _implement_ the
>> familiar behavior, not to give them random different
>> user-visible semantics.
>
>I think the point in the above example was that the semantics of "sexp"
>is ambiguous in any language that is not Lisp.  That was (and still
>is) the hard part of figuring out how forward-sexp should behave in
>TS-based modes.  (In non-TS modes the behavior is just arbitrary
>nonsense, if you ask me.)

Yes, and because it's ambiguous we get annoyances like python-mode's default sexp movement. Now every mode is like that, and you can't turn it off half the time?

The key is the *relationships* between the commands that help users form mental models of what their actions are going to do. For example, if blink-paren-mode highlights the other end of some balanced construct, forward or backward-sexp will take you there. Easy to learn and predict. Likewise, beginning of defun should move to the start of the point that mark-defun highlights, and indent defun should indent the same part of the buffer that mark-defun highlights.

That's why it's just weird to have a TS hook specifically for indenting a defun: it just invites the kind of inconsistency that makes the system hard to reason about and annoying to work with.

>> Tree sitter's job is syntactic analysis, not UI differentiation.
>
>The way we use syntactic information in these commands is a leaky
>abstraction: the syntax aspects leak into the UI. 

It doesn't have to. There is nothing about the additional information TS provides that *forces* you to implement beginning-of-defun in a way that fails to respect program hierarchy. That was a choice, and the existence of this "strategy" system shows it.

> So it is a small
>wonder that tree-sitter affects the UI in some (relatively minor)
>ways.

No, it breaks the UI in unnecessary ways.

>> The default should be to match behavior that's been stable for decades.
>
>As I tried to explain above, I don't necessarily agree.
>
>> Use of tree sitter should be an implementation detail for users.
>
>Since the introduction of tree-sitter based capabilities into Emacs,
>we've learned that this simply doesn't work, not in Emacs.  

It would work fine if people cared about UI consistency. Subtle differences in semantic analysis are to be expected. Gross behavioral differences in long-stable and otherwise consistent commands are not.

That's like saying cars have inconsistent varying UIs, so when you buy an electric car, you should *expect* the steering wheel to be in the right rear seat. Hey, power train is a leaky abstraction!

> Syntax and
>semantics leak into our UI, and tree-sitter deals with syntactic and
>semantic information that is sometimes very different from what, e.g.,
>syntax-ppss and friends let us use.
>
>So I do understand where you are coming from, but experience taught
>us that it cannot work that way in Emacs.  If we were designing Emacs
>from scratch today, perhaps we could have done that in a way that
>would avoid these leaks, but we are not there.
>
>> If we want to provide UI to better handle nested defuns, this UI should
>> go in prog-mode.el and rely on mode-provided syntactic analysis, not
>> just delegate to a mode function that does different random stuff in
>> each mode.
>
>That'd be a massive rewrite of gobs of existing code, I'm afraid. 

Would it? How? You'd start with a baseline no different from today's and let gradually increasing lexical and syntactic knowledge provided by modes (perhaps using TS as a backend) add capabilities. forward-class, for example, might just signal in modes that didn't provide a definition of that construct.

Yes, languages have different ideas about what constitutes a function, but the idea of nesting constructs is common enough across languages that it ought to come with a roughly consistent way to navigate them.

> I
>invite you to take a look at the existing code and see how it mixes
>syntax with UI.  That's even visible at the level of the command
>names: "sexp" only makes sense in Lisp, and the notion of "balanced
>parens" has no place in languages without brackets and braces.
>
>> >> Furthermore, consider the following `top-level` tactic example using
>> >> `c++-ts-mode`.  Here, we have a C++ namespace (which is considered a
>> >> defun for C++)
>> 
>> Namespaces aren't defuns and c++-ts-mode shouldn't be indenting their
>> contents by a level either.
>
>But c++-mode does indent them.  Doesn't this contradict what you said
>about following past practices?

In c++-mode, I can turn it off with a documented user knob. In c++-ts-mode, I have to write fragile hacks to monkeypatch mode internals. They're not the same thing. And no matter what indent style I choose in c++-mode, namespace isn't magically a defun.

Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#78703; Package emacs. (Tue, 10 Jun 2025 16:20:07 GMT) Full text and rfc822 format available.

Message #41 received at 78703 <at> debbugs.gnu.org (full text, mbox):

From: Eli Zaretskii <eliz <at> gnu.org>
To: Daniel Colascione <dancol <at> dancol.org>
Cc: casouri <at> gmail.com, brownts <at> troybrown.dev, 78703 <at> debbugs.gnu.org
Subject: Re: bug#78703: beginning-of-defun and friends still wrong in
 typescript-ts-mode
Date: Tue, 10 Jun 2025 19:19:03 +0300

> Date: Tue, 10 Jun 2025 08:49:48 -0700
> From: Daniel Colascione <dancol <at> dancol.org>
> CC: casouri <at> gmail.com, brownts <at> troybrown.dev, 78703 <at> debbugs.gnu.org
> 
> >> If I'm using, say, c++-ts-mode, my navigation commands should do the
> >> same thing they do in c++-mode.  Plenty of code as well as muscle
> >> memories rely on this behavior.
> >
> >I'm not sure I agree.  That c++-mode behaved like that doesn't mean
> >it's the last word, or that nothing can be improved in that behavior.
> 
> It's not just c++-mode. It's how most modes have behaved.
> 
> >In addition, TS-based modes make certain behaviors very hard (at least
> >not if we base it on the parser information), 
> 
> Very hard how? You can use tree sitter like a more powerful parse partial sexp. It provides strictly more information.

Not really, not when you look closely.  The tools we've built before
tree-sitter are ad-hoc, so they allow us to provide information that
parsers don't have and don't need to have.  Our syntax tables are not
exactly "less efficient parsing", and regular expressions allow us to
match whatever we want and call that anything we want.

Take the DEFUN recognition by CC mode as an example.  Tree-sitter
knows nothing about them.

So "strictly more information" is perhaps an expectation, but it
breaks at closer looking.

> There is no behavior whatsoever that's harder to implement because tree sitter is giving you more information. 

I invite you to look at c-ts-mode sources.  You will see plenty of
what was "harder to implement".  We still lack some useful
functionalities that are present in CC Mode, for that very reason;
what was easy to implement was done long ago.

> >> The "tactic" concept is an unnecessary layer of indirection.
> >
> >From where I stand, it's a new feature that was unavailable in non-TS
> >implementation.
> 
> It's not a feature. It's a UI and programming annoyance. How are you supposed to write code against functions with behavior that shifts on a whim with no stable functions to call instead?

We've been doing that since day one: you write code that looks at the
variables to figure out what behavior to expect, or you write code
that is general enough to not care.

> What am I supposed to do, let-bind every possible value around every function call?

Sometimes, yes.  Although hopefully not so frequently and not "every
possible value".

> Why have functions then. Let's just have one function with a strategy.

Arguments "ad absurdum" are not always useful.  In this case, no one
is calling for such an extremity.  But sometimes this has to be done.

> >But if the semantics of a command is ambiguous, then a switch makes
> >perfect sense.  In this case, what exactly "beginning of defun" means
> >when there are nested defuns is ambiguous.
> 
> Yet we use the concept of command names to express different concepts elsewhere. And if the concept is ambiguously defined, provide a minimal knob to adjust that concept, not change the operation of primitives to be inconsistent with each other. 

I think we do the former, or at least we try.

> >> Why does there need to be a tree-sitter version of
> >> prog-fill-reindent-defun?
> >
> >Because the way to get the indentation information from tree-sitter is
> >significantly different from the ad-hoc ways we do that in
> >"traditional" modes.
> 
> No it isn't. If the defun navigation functions in TS modes had their traditional behavior, they'd continue to work for higher level constructs built on top of them like the prog-mode reindent and mark defun. TS modes broke a whole bunch of things that had worked fine for decades, and instead of fixing them, they just made even more abstractions to plug inconsistent tree sitter things in place of the broken things.

Indentation is a lot more than just navigation.  And I disagree with
your extreme interpretation of the current state of indentation and
navigation support in TS-based modes.

> >How do you implement anything like c-set-offset or indentation styles
> >based only on low-level syntactic analysis? 
> 
> By using TS to implement c-guess-basic-syntax and friends.

Did you look at the implementation of how c-set-offset encodes
indentation information?  Did you try to think how to get the same
information from tree-sitter?  If you did, and found the way, how
about implementing c-ts-set-offset? I think it's sorely missed.

> cc-mode indentation styles are clear expressions of user intent. No reason at all TS modes couldn't respect this intent and merely implement it a different way. Want to know whether you're after a class? Inside a namespace? Where a declaration begins? You have an AST right there!

Sorry, this is a simplification.  A typical declaration breaks down into
smaller parts, and we have expectations and ideas about indentation of
each one of them.  But the tree-sitter classification of the AST
constituents does not necessarily make that easy, because you could
have the same syntactic symbol both inside a declaration and in other
places.  So having an AST does not always immediately tell you how to
indent correctly.

> > where will the rest of the
> >necessary information come from, and who and how will apply it?
> 
> From the AST. Where else?

See above.

> >I think the point in the above example was that the semantics of "sexp"
> >is ambiguous in any language that is not Lisp.  That was (and still
> >is) the hard part of figuring out how forward-sexp should behave in
> >TS-based modes.  (In non-TS modes the behavior is just arbitrary
> >nonsense, if you ask me.)
> 
> Yes, and because it's ambiguous we get annoyances like python-mode's default sexp movement. Now every mode is like that, and you can't turn it off half the time?

What else did you expect?  Some users like one style, others like the
other.  Are we supposed to say "my way or the highway"?  And that's
even before we consider that the disagreement cuts through the
developers themselves.

I find continuing this kind of argument not constructive, so I will
stop here.  Let me just say that I think you are looking at this stuff
from some semi-abstract, almost idealistic, aspect.  As if we didn't
have 40 years of development and user experience and expectations to
keep and uphold.

> >> >> Furthermore, consider the following `top-level` tactic example using
> >> >> `c++-ts-mode`.  Here, we have a C++ namespace (which is considered a
> >> >> defun for C++)
> >> 
> >> Namespaces aren't defuns and c++-ts-mode shouldn't be indenting their
> >> contents by a level either.
> >
> >But c++-mode does indent them.  Doesn't this contradict what you said
> >about following past practices?
> 
> In c++-mode, I can turn it off with a documented user knob.

You've changed the subject.  But by all means, let's add such a knob
to c++-ts-mode, sure.

> In c++-ts-mode, I have to write fragile hacks to monkeypatch mode internals. They're not the same thing. And no matter what indent style I choose in c++-mode, namespace isn't magically a defun.

But beginning-of-defun nevertheless takes me to the beginning of the
namespace in c++-mode.

Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#78703; Package emacs. (Tue, 10 Jun 2025 16:57:02 GMT) Full text and rfc822 format available.

Message #44 received at 78703 <at> debbugs.gnu.org (full text, mbox):

From: Daniel Colascione <dancol <at> dancol.org>
To: Eli Zaretskii <eliz <at> gnu.org>
Cc: casouri <at> gmail.com, brownts <at> troybrown.dev, 78703 <at> debbugs.gnu.org
Subject: Re: bug#78703: beginning-of-defun and friends still wrong in typescript-ts-mode
Date: Tue, 10 Jun 2025 09:55:59 -0700


On June 10, 2025 9:19:03 AM PDT, Eli Zaretskii <eliz <at> gnu.org> wrote:
>> Date: Tue, 10 Jun 2025 08:49:48 -0700
>> From: Daniel Colascione <dancol <at> dancol.org>
>> CC: casouri <at> gmail.com, brownts <at> troybrown.dev, 78703 <at> debbugs.gnu.org
>> 
>> >> If I'm using, say, c++-ts-mode, my navigation commands should do the
>> >> same thing they do in c++-mode.  Plenty of code as well as muscle
>> >> memories rely on this behavior.
>> >
>> >I'm not sure I agree.  That c++-mode behaved like that doesn't mean
>> >it's the last word, or that nothing can be improved in that behavior.
>> 
>> It's not just c++-mode. It's how most modes have behaved.
>> 
>> >In addition, TS-based modes make certain behaviors very hard (at least
>> >not if we base it on the parser information), 
>> 
>> Very hard how? You can use tree sitter like a more powerful parse partial sexp. It provides strictly more information.
>
>Not really, not when you look closely.  The tools we've built before
>tree-sitter are ad-hoc, so they allow us to provide information that
>parsers don't have and don't need to have.  Our syntax tables are not
>exactly "less efficient parsing", and regular expressions allow us to
>match whatever we want and call that anything we want.

There is not one bit of information available to c-mode that is not available to c-ts-mode.

>Take the DEFUN recognition by CC mode as an example.  Tree-sitter
>knows nothing about them.
>
>So "strictly more information" is perhaps an expectation, but it
>breaks at closer looking.
>
>> There is no behavior whatsoever that's harder to implement because tree sitter is giving you more information. 
>
>I invite you to look at c-ts-mode sources.  You will see plenty of
>what was "harder to implement".

That's a choice.

> We still lack some useful
>functionalities that are present in CC Mode, for that very reason;
>what was easy to implement was done long ago.

That's the opposite of reality. Alan and others have spent years building a flexible and fast backtracking syntactic analyser for CC mode. Tree sitter does the same thing but in a more general way, in native code for better performance. Its availability makes doing what cc-mode does easier, not harder.

>> >> The "tactic" concept is an unnecessary layer of indirection.
>> >
>> >From where I stand, it's a new feature that was unavailable in non-TS
>> >implementation.
>> 
>> It's not a feature. It's a UI and programming annoyance. How are you supposed to write code against functions with behavior that shifts on a whim with no stable functions to call instead?
>
>We've been doing that since day one: you write code that looks at the
>variables to figure out what behavior to expect, or you write code
>that is general enough to not care.

I can't wait to program against our glorious new dwim function. The problem with let binding the world is that the set of dynamic inputs becomes unbounded. What if I just bind the strategy option and one day TS introduces, say, a new sub-strategy option that makes the function I call behave differently?

To the extent possible, and in the strategy case for TS mode it's certainly possible, commands should do one thing and if you want to do a different thing, you run a different command.

>> What am I supposed to do, let-bind every possible value around every function call?
>
>Sometimes, yes.  Although hopefully not so frequently and not "every
>possible value".
>
>> Why have functions then. Let's just have one function with a strategy.
>
>Arguments "ad absurdum" are not always useful.  In this case, no one
>is calling for such an extremity.  But sometimes this has to be done.
>
>> >But if the semantics of a command is ambiguous, then a switch makes
>> >perfect sense.  In this case, what exactly "beginning of defun" means
>> >when there are nested defuns is ambiguous.
>> 
>> Yet we use the concept of command names to express different concepts elsewhere. And if the concept is ambiguously defined, provide a minimal knob to adjust that concept, not change the operation of primitives to be inconsistent with each other. 
>
>I think we do the former, or at least we try.

Then let's make separate commands to express moving to a defun boundary one way versus another way and let users express their preference for connecting input to action using keymaps.

>> >> Why does there need to be a tree-sitter version of
>> >> prog-fill-reindent-defun?
>> >
>> >Because the way to get the indentation information from tree-sitter is
>> >significantly different from the ad-hoc ways we do that in
>> >"traditional" modes.
>> 
>> No it isn't. If the defun navigation functions in TS modes had their traditional behavior, they'd continue to work for higher level constructs built on top of them like the prog-mode reindent and mark defun. TS modes broke a whole bunch of things that had worked fine for decades, and instead of fixing them, they just made even more abstractions to plug inconsistent tree sitter things in place of the broken things.
>
>Indentation is a lot more than just navigation.  And I disagree with
>your extreme interpretation of the current state of indentation and
>navigation support in TS-based modes.

I'm right.

>> >How do you implement anything like c-set-offset or indentation styles
>> >based only on low-level syntactic analysis? 
>> 
>> By using TS to implement c-guess-basic-syntax and friends.
>
>Did you look at the implementation of how c-set-offset encodes
>indentation information?  Did you try to think how to get the same
>information from tree-sitter?  If you did, and found the way, how
>about implementing c-ts-set-offset? I think it's sorely missed.
>
>> cc-mode indentation styles are clear expressions of user intent. No reason at all TS modes couldn't respect this intent and merely implement it a different way. Want to know whether you're after a class? Inside a namespace? Where a declaration begins? You have an AST right there!
>
>Sorry, this is a simplification.  A typical declaration breaks down into
>smaller parts, and we have expectations and ideas about indentation of
>each one of them.  But the tree-sitter classification of the AST
>constituents does not necessarily make that easy, because you could
>have the same syntactic symbol both inside a declaration and in other
>places.  So having an AST does not always immediately tell you how to
>indent correctly.

No, but it gives you more information than looking-at does, and cc-mode does its job admirably given only that simple tool. You can look at nesting and context in the TS AST to figure out what to do.


>> > where will the rest of the
>> >necessary information come from, and who and how will apply it?
>> 
>> From the AST. Where else?
>
>See above.
>
>> >I think the point in the above example was that the semantics of "sexp"
>> >is ambiguous in any language that is not Lisp.  That was (and still
>> >is) the hard part of figuring out how forward-sexp should behave in
>> >TS-based modes.  (In non-TS modes the behavior is just arbitrary
>> >nonsense, if you ask me.)
>> 
>> Yes, and because it's ambiguous we get annoyances like python-mode's default sexp movement. Now every mode is like that, and you can't turn it off half the time?
>
>What else did you expect?  Some users like one style, others like the
>other.  Are we supposed to say "my way or the highway"?  And that's
>even before we consider that the disagreement cuts through the
>developers themselves.

No. I'm expecting a generally consistent experience, and if we want to provide a configuration knob, it should affect everything consistently. One shouldn't have to form independent and different muscle memory for each language mode because the whims of their authors were different.

>I find continuing this kind of argument not constructive, so I will
>stop here.  Let me just say that I think you are looking at this stuff
>from some semi-abstract, almost idealistic, aspect.  As if we didn't
>have 40 years of development and user experience and expectations to
>keep and uphold.

I'll never understand the mindset that holds that things making sense and having a structure is bad actually because sense and structure are "academic" and "idealistic".

What 40 years of development and user experience holds is that if I'm four pages deep into a nasty TypeScript function and hit beginning-of-defun, I want to go to the beginning of the four page defun I am editing and not some random place two pages up that I didn't even know about in which someone scribbled out some kind of nested lambda irrelevant to my present task.

>> >> >> Furthermore, consider the following `top-level` tactic example using
>> >> >> `c++-ts-mode`.  Here, we have a C++ namespace (which is considered a
>> >> >> defun for C++)
>> >> 
>> >> Namespaces aren't defuns and c++-ts-mode shouldn't be indenting their
>> >> contents by a level either.
>> >
>> >But c++-mode does indent them.  Doesn't this contradict what you said
>> >about following past practices?
>> 
>> In c++-mode, I can turn it off with a documented user knob.
>
>You've changed the subject.  But by all means, let's add such a knob
>to c++-ts-mode, sure.
>
>> In c++-ts-mode, I have to write fragile hacks to monkeypatch mode internals. They're not the same thing. And no matter what indent style I choose in c++-mode, namespace isn't magically a defun.
>
>But beginning-of-defun nevertheless takes me to the beginning of the
>namespace in c++-mode.

Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#78703; Package emacs. (Tue, 10 Jun 2025 19:13:02 GMT) Full text and rfc822 format available.

Message #47 received at 78703 <at> debbugs.gnu.org (full text, mbox):

From: Yuan Fu <casouri <at> gmail.com>
To: Daniel Colascione <dancol <at> dancol.org>
Cc: Eli Zaretskii <eliz <at> gnu.org>, Troy Brown <brownts <at> troybrown.dev>,
 78703 <at> debbugs.gnu.org
Subject: Re: bug#78703: beginning-of-defun and friends still wrong in
 typescript-ts-mode
Date: Tue, 10 Jun 2025 12:11:34 -0700

I’m hesitant to get into another long debate, but let me give my two cents.

>> Consider that existing code in Emacs isn’t set in stone, it’s not
>> really that fundamental that beginning-of-defun must work in a way
>> that makes prog-fill-reindent-defun behave
>> desirably.
> 
> Experimenting with a different UI paradigm doesn't justify introducing
> an inconsistency to behavior that's worked well for literally decades.
> If I'm using, say, c++-ts-mode, my navigation commands should do the
> same thing they do in c++-mode.  Plenty of code as well as muscle
> memories rely on this behavior.
> 
> The "tactic" concept is an unnecessary layer of indirection.
> If operation A takes you to place X and operation B takes you to
> different place Y, the way to express the difference between operations
> A and B is to make them _different commands_, not by twiddling some
> global switch.

c-defun-tactic has existed since Emacs 24, and treesit-defun-tactic is actually modeled after it. treesit-defun-tactic’s `top-level` and `nested` correspond to c-defun-tactic’s `t` and `go-outward`. The only difference is that tree-sitter defaults to the `nested` option. So we’re not breaking the UI paradigm as you imagined. Also, c++-ts-mode is a new, separate mode that provides vastly different features from c++-mode, so some difference is warranted.
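
For illustration, a minimal sketch of how a user might pick a tactic from an init file. It assumes an Emacs built with tree-sitter support and that treesit-defun-tactic is buffer-local (hence the mode hook); the choice of typescript-ts-mode-hook is only an example, not a recommended configuration:

```elisp
;; Assumption: `treesit-defun-tactic' is buffer-local, so it is set
;; from the mode hook rather than globally.
(add-hook 'typescript-ts-mode-hook
          (lambda ()
            ;; With `top-level', defun movement only considers
            ;; top-level defuns; `nested' also recognizes nested ones.
            (setq-local treesit-defun-tactic 'top-level)))

;; The cc-mode variable named above; `t' is the value described in
;; this message as the counterpart of `top-level'.
(setq c-defun-tactic t)
```

Either way the command bound to C-M-a stays the same; only the variable changes what it treats as a defun.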

A lot of features in c++-mode are missing simply because we don’t have the resources to recreate them in c++-ts-mode. After all, Alan and others spent decades developing c++-mode. We also can’t just build c++-ts-mode on top of c++-mode because the engine in c++-mode is very heavy (meaning slow); if we use both the tree-sitter parser (which has its own overhead) and the cc engine, it’ll be even slower than c++-mode for not much benefit. It just makes no sense.

Also, cc-mode is a huge codebase, and I don’t have expertise in it. If Alan or some other cc-mode expert decides to incorporate tree-sitter into cc-mode, they might get strictly more information as you claimed. But I and other people willing to work on c++-ts-mode don’t have that ability.

(Also Theo is the maintainer of c++-ts-mode, I’m just a co-maintainer trying to fix things up when he’s not available.)

> When you change tree-sitter "strategies" right now, you're silently
> turning one command into another command, that's confusing for everyone.
> Please give these "strategies" individual command names.  We have
> beginning-of-defun and beginning-of-defun-comments, not a knob that
> alters beginning-of-defun.

Using variables to alter a command’s behavior is a very well accepted practice; it’s everywhere in Emacs. Both approaches are equally valid and are suitable for different circumstances. In our case, there’s already the long-standing tactic concept, so we want to keep to it, and people usually only want the command to behave one way or another, so they can just set the variable and be done with it. Multiple commands are more suitable for cases where the user wants to use different behaviors at the same time; then they can bind each command to different keys. Multiple commands have their own downsides, like being more hassle to configure and harder to discover, so they’re not strictly better.

>> prog-fill-reindent-defun uses beginning-of-defun because we
>> don’t have better choices before tree-sitter. In tree-sitter major
>> modes, what we’ve been doing is to make the existing commands
>> customizable so tree-sitter can provide a tree-sitter version of
>> it.
> 
> Why does there need to be a tree-sitter version of
> prog-fill-reindent-defun?  Isn't it enough that tree-sitter provide the
> low-level syntactic analysis for prog-fill-reindent-defun to do its job?
> Why the high level hook?

If we want to use a lower-level abstraction, we’d need to create some framework to get the defun at point, and add a tree-sitter provider of it. That just brings a whole can of worms: what should this API look like exactly? Should we change all other existing functions (many of which are hundred-line monsters that are hard to understand and refactor without introducing breakage) to use this new framework? Should it be limited to only defuns, or should it support other constructs? If we want to support other constructs, how do we implement the provider for those constructs in non-tree-sitter settings? Many other functions that use beginning/end-of-defun need more than getting the defun at point: they might want to know if point is at the beginning of a defun, or they might want to know if there’s another defun after this one. How do we design the framework so these requests are met?

For the reasons above, thing-at-point isn’t really suitable for our purpose, or at least we’d need to make significant changes to its API to make it suitable. Thing-at-point is mostly a simple API to get things like the URL, symbol, or word at point, not something that can give us rich information about the syntactic construct at point.

You might think prog-fill-reindent-defun can just use tree-sitter functions to get syntax information, but that’s a big no-no. Existing Emacs features like imenu, font-lock, prog-fill-reindent-defun, etc, shouldn’t even know about tree-sitter; they are user-facing and only provide customization points for major modes/tree-sitter/eglot/etc to plug in and provide functionality.

So things aren’t that simple once you start thinking about how to actually do it. If you can design something that gives the right abstraction that major modes/tree-sitter/eglot/etc can plug into, and can make it work for both non-tree-sitter and tree-sitter, and for potential future tools that provide parse trees, I’ll be very happy, because we do want that.

So given all those considerations, I decided to go with the higher-level abstraction you saw. After my refactor of prog-fill-reindent-defun, reimplementing it for tree-sitter is just 8 lines of code. The small code duplication is perfectly acceptable IMO. A higher-level abstraction also has plenty of benefits: major modes can define their own version if necessary, we have more freedom in our implementation because there’s a looser coupling, etc.
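
To give a rough sense of the scale involved, a tree-sitter-based version could look something like the sketch below. This is not the actual treesit-fill-reindent-defun: the function name my/treesit-reindent-defun-at-point is hypothetical, and the sketch only reindents (it does no filling). It relies on treesit-defun-at-point, treesit-node-start and treesit-node-end, and assumes the current buffer is in a tree-sitter major mode:

```elisp
(require 'treesit)

;; Hypothetical helper for illustration only.
(defun my/treesit-reindent-defun-at-point ()
  "Reindent the defun node that `treesit-defun-at-point' returns.
Do nothing when there is no defun at point."
  (interactive)
  (let ((node (treesit-defun-at-point)))
    (when node
      ;; Reindent exactly the region covered by the defun node.
      (indent-region (treesit-node-start node)
                     (treesit-node-end node)))))
```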

> 
>> We’ve done this for forward-sexp: we added
>> forward-sexp-function. Some commands already have customization points
>> long ago, like beginning-of-defun, which has
>> beginning-of-defun-function.
> 
> Modes use generally these "customization points" to _implement_ the
> familiar behavior, not to give them random different
> user-visible semantics.

Let me also reply to your later messages here:

> It's not a feature. It's a UI and programming annoyance. How are you supposed to write code against functions with behavior that shifts on a whim with no stable functions to call instead?
> 
> Yet we use the concept of command names to express different concepts elsewhere. And if the concept is ambiguously defined, provide a minimal knob to adjust that concept, not change the operation of primitives to be inconsistent with each other. 

I agree that beginning/end-of-defun has evolved into a kind of primitive function for getting the defun at point, because we didn’t have a parse tree or an API for getting the defun at point. But I disagree with your conclusion that one can’t write code against a beginning/end-of-defun that can change behavior. I mean, that’s the point of abstraction, no? Users can choose whether they want the top-level or nested kind of defun, and by changing that, all the functions that use beginning/end-of-defun automatically switch to the chosen behavior. Isn’t that better than having no choice?

But of course there'll be cases where the abstraction doesn’t work, like prog-fill-reindent-defun. It’s mostly because beginning/end-of-defun is a god-awful abstraction, but that’s what all the existing functions use and we have to live with it until someone creates a better abstraction and refactors all the existing functions to use it. Anyway, in cases where the abstraction doesn’t work well, we just need to do a bit more to make it work, which is why I added prog-fill-reindent-defun-function.

>> So I added prog-fill-reindent-defun-function and a tree-sitter version
>> treesit-fill-reindent-defun. The tree-sitter implementation uses
>> treesit-defun-at-point, so it doesn’t even need to concern
>> with tactics.
>> 
>> Now in tree-sitter major modes, prog-fill-reindent-defun should always
>> indent the enclosing defun.
> 
> Which now means prog-fill-reindent-defun can indent something other than
> what mark-defun highlights?  That seems odd to me.  Tree sitter's job is
> syntactic analysis, not UI differentiation.
> 
>>> After that, I'm less concerned about its behavior, but think the
>>> current `nested` behavior of visiting the previous sibling defun until
>>> there are no more, then visiting the parent defun makes sense.  The
>>> key difference is that you only visit the previous sibling if point is
>>> at a defun boundary...otherwise visit the defun containing point.
>> 
>> Hmmm, it doesn’t feel very convenient, you’d need to first adjust your
>> point to be precisely at the boundary, then press C-M-a/e? IMO that
>> adds too much overhead. I added a tactic `parent-first` that always
>> move to the beginning/end of the enclosing defun. People that prefers
>> this kind of defun movement can use this tactic.
> 
> The default should be to match behavior that's been stable for decades.
> Use of tree sitter should be an implementation detail for users.
> 
> If we want to provide UI to better handle nested defuns, this UI should
> go in prog-mode.el and rely on mode-provided syntactic analysis,

As I described above, it’s not as simple as you imagined.

> not
> just delegate to a mode function that does different random stuff in
> each mode.

If the mode uses the tree-sitter-provided default (which 99.99% of them will), the behavior is well defined. Even if a major mode decides to provide its own version, I trust the major mode author to know what they’re doing, to have a good reason to implement their own, and to provide a function that fits the docstring of prog-fill-reindent-defun. So no, it won’t be random stuff.

>>> Furthermore, consider the following `top-level` tactic example using
>>> `c++-ts-mode`.  Here, we have a C++ namespace (which is considered a
>>> defun for C++)
> 
> Namespaces aren't defuns and c++-ts-mode shouldn't be indenting their
> contents by a level either.  c++-ts-mode is unusable without hacks like
> programmatically editing the indentation rules in user configuration.

If it’s a bug in the indentation rule, we can fix it. But can you open another bug report and describe what exactly is wrong?

Yuan

Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#78703; Package emacs. (Tue, 10 Jun 2025 20:13:01 GMT) Full text and rfc822 format available.

Message #50 received at 78703 <at> debbugs.gnu.org (full text, mbox):

From: Daniel Colascione <dancol <at> dancol.org>
To: Yuan Fu <casouri <at> gmail.com>
Cc: Eli Zaretskii <eliz <at> gnu.org>, Troy Brown <brownts <at> troybrown.dev>,
 78703 <at> debbugs.gnu.org
Subject: Re: bug#78703: beginning-of-defun and friends still wrong in typescript-ts-mode
Date: Tue, 10 Jun 2025 13:12:18 -0700


On June 10, 2025 12:11:34 PM PDT, Yuan Fu <casouri <at> gmail.com> wrote:
>I’m hesitant to get into another long debate, but let me give my two cents.
>
>>> Consider that existing code in Emacs isn’t set in stone, it’s not
>>> really that fundamental that beginning-of-defun must work in a way
>>> that makes prog-fill-reindent-defun behave
>>> desirably.
>> 
>> Experimenting with a different UI paradigm doesn't justify introducing
>> an inconsistency to behavior that's worked well for literally decades.
>> If I'm using, say, c++-ts-mode, my navigation commands should do the
>> same thing they do in c++-mode.  Plenty of code as well as muscle
>> memories rely on this behavior.
>> 
>> The "tactic" concept is an unnecessary layer of indirection.
>> If operation A takes you to place X and operation B takes you to
>> different place Y, the way to express the difference between operations
>> A and B is to make them _different commands_, not by twiddling some
>> global switch.
>
>c-defun-tactic has existed since Emacs 24, and treesit-defun-tactic is actually modeled after it. treesit-defun-tactic’s `top-level` and `nested` correspond to c-defun-tactic’s `t` and `go-outward`. The only difference is that tree-sitter defaults to the `nested` option. So we’re not breaking the UI paradigm as you imagined. Also, c++-ts-mode is a new, separate mode that provides vastly different features from c++-mode, so some difference is warranted.


I don't think I've seen that cc-mode variable change. And there's still no point in changing the default from cc-mode. If there's an exact correspondence, we should use it to minimize disruption.

>A lot of features in c++-mode are missing simply because we don’t have the resources to recreate them in c++-ts-mode. After all, Alan and others spent decades developing c++-mode. We also can’t just build c++-ts-mode on top of c++-mode because the engine in c++-mode is very heavy (meaning slow); if we use both the tree-sitter parser (which has its own overhead) and the cc engine, it’ll be even slower than c++-mode for not much benefit. It just makes no sense.

Sure. That's not to say that c++-ts-mode can do less than cc-mode by its nature, however. It's an as-yet unimplemented feature, not an architectural gap. And given how powerful having an AST is, I'd expect the effort to implement parity with cc-mode's lexical and syntactic analysis to be a few OOMs less than it was for cc-mode.

>Also, cc-mode is a huge codebase, and I don’t have expertise in it. If Alan or some other cc-mode expert decides to incorporate tree-sitter into cc-mode, they might get strictly more information as you claimed. But I and other people willing to work on c++-ts-mode don’t have that ability.
>
>(Also Theo is the maintainer of c++-ts-mode, I’m just a co-maintainer trying to fix things up when he’s not available.)
>
>> When you change tree-sitter "strategies" right now, you're silently
>> turning one command into another command, that's confusing for everyone.
>> Please give these "strategies" individual command names.  We have
>> beginning-of-defun and beginning-of-defun-comments, not a knob that
>> alters beginning-of-defun.
>
>Using variables to alter a command’s behavior is a very well accepted practice, 

I for one disapprove of this practice. I don't like DWIM commands because they make it hard to discover the actual behavior and hard to make user customizations. Instead of DWIM commands we should have keymaps a mode adjusts dynamically depending on context.

For example, in org, we should have a function that says "do the tab key thing". Org should have an org-dwim keymap from which org-mode-map inherits and that org changes depending on context as point moves. This way, C-h c would correctly say that *here* TAB indents and *there* it cycles visibility whenever the user invoked it -- and the user could override org-mode-map to either override TAB globally or remap a specific operation (e.g. whatever the org indent function is)

Likewise, for prog modes, instead of having a DWIM function movement command, we should have a customize option that changes the default command key binding in that composed keymap without affecting user bindings and while making C-h c tell the user about the true behavior.

> it’s everywhere in Emacs. 

Unfortunately.

> Both approaches are equally valid and are suitable for different circumstances. 

Commands that make big changes to their behavior based on user preferences are an approach to avoid.

> In our case, there’s already the long-standing tactic concept, so we want to keep to it, and people usually only want the command to behave one way or another, so they can just set the variable and be done with it. Multiple commands are more suitable for cases where the user wants to use different behaviors at the same time; then they can bind each command to different keys.

The multiple keymap approach I described above is compatible with users explicitly binding commands to different keys.

>>> prog-fill-reindent-defun uses beginning-of-defun because we
>>> don’t have better choices before tree-sitter. In tree-sitter major
>>> modes, what we’ve been doing is to make the existing commands
>>> customizable so tree-sitter can provide a tree-sitter version of
>>> it.
>> 
>> Why does there need to be a tree-sitter version of
>> prog-fill-reindent-defun?  Isn't it enough that tree-sitter provide the
>> low-level syntactic analysis for prog-fill-reindent-defun to do its job?
>> Why the high level hook?

I get there being a TS version of a routine to get the bounds of the current defun. I can understand a TS hook for filling a region. What I don't understand is why we need to go even higher level than that and make a mode hook for *filling* the defun in particular. Why wouldn't composing the two hooks above be correct generic code we could put in prog-mode?

>If we want to use a lower-level abstraction, we’d need to create some framework to get the defun at point, and add a tree-sitter provider of it. 

Yeah. That's what CEDET's functions are for and can be for if we rehabilitate them. It's not like we're the first people talking about this.

Instead of modes defining random beginning of defun and other low level primitives, we:

1) define user level beginning of defun, etc. in terms of CEDET tags (if provided)

2) provide options for users to customize these semantic movement commands in a generic way, e.g. go to parent, go to sibling, whatever 

3) provide fallbacks for existing modes

This way, TS modes (and other modes) can use arbitrarily sophisticated analysis to determine what a defun *is* and the nesting structure of defuns and we decouple this analysis from movement between constructs, which we can implement with existing infrastructure. This way, we get UI consistency.

> That just brings a whole can of worms: what should this API look like exactly? Should we change all other existing functions (many of which are hundred-line monsters that are hard to understand and refactor without introducing breakage) to use this new framework? Should it be limited to only defuns, or should it support other constructs? If we want to support other constructs, how do we implement the provider for those constructs in non-tree-sitter settings? 

We've had such a framework for years and years now. Shame nobody is using it.


> Many other functions that use beginning/end-of-defun need more than getting the defun at point: they might want to know if point is at the beginning of a defun, or they might want to know if there’s another defun after this one. How do we design the framework so these requests are met?
>
>For the reasons above, thing-at-point isn’t really suitable for our purpose, or at least we’d need to make significant changes to its API to make it suitable. Thing-at-point is mostly a simple API to get things like the URL, symbol, or word at point, not something that can give us rich information about the syntactic construct at point.
>
>You might think prog-fill-reindent-defun can just use tree-sitter functions to get syntax information, but that’s a big no-no. Existing Emacs features like imenu, font-lock, prog-fill-reindent-defun, etc, shouldn’t even know about tree-sitter; they are user-facing and only provide customization points for major modes/tree-sitter/eglot/etc to plug in and provide functionality.
>
>So things aren’t that simple once you start thinking about how to actually do it. If you can design something that gives the right abstraction that major modes/tree-sitter/eglot/etc can plug into, and can make it work for both non-tree-sitter and tree-sitter, and for potential future tools that provide parse trees, I’ll be very happy, because we do want that.
>
>So given all those considerations, I decided to go with the higher-level abstraction you saw. After my refactor of prog-fill-reindent-defun, reimplementing it for tree-sitter is just 8 lines of code. The small code duplication is perfectly acceptable IMO. A higher-level abstraction also has plenty of benefits: major modes can define their own version if necessary, we have more freedom in our implementation because there’s a looser coupling, etc.
>
>> 
>>> We’ve done this for forward-sexp: we added
>>> forward-sexp-function. Some commands already have customization points
>>> long ago, like beginning-of-defun, which has
>>> beginning-of-defun-function.
>> 
>> Modes use generally these "customization points" to _implement_ the
>> familiar behavior, not to give them random different
>> user-visible semantics.
>
>Let me also reply to your later messages here:
>
>> It's not a feature. It's a UI and programming annoyance. How are you supposed to write code against functions with behavior that shifts on a whim with no stable functions to call instead?
>> 
>> Yet we use the concept of command names to express different concepts elsewhere. And if the concept is ambiguously defined, provide a minimal knob to adjust that concept, not change the operation of primitives to be inconsistent with each other. 
>
>I agree that beginning/end-of-defun has evolved into a kind of primitive function for getting the defun at point, because we didn’t have a parse tree or an API for getting the defun at point. But I disagree with your conclusion that one can’t write code against a beginning/end-of-defun that can change behavior. I mean, that’s the point of abstraction, no? Users can choose whether they want the top-level or nested kind of defun, and by changing that, all the functions that use beginning/end-of-defun automatically switch to the chosen behavior. Isn’t that better than having no choice?

You can write code against these functions only if their behavior changes in consistent ways. For example, mark-defun in prog today moves backwards and forwards over defuns to find the defun region. This sibling strategy for movement can result in this region-finding algorithm producing nonsense, yes?


>But of course there'll be cases where the abstraction doesn’t work, like prog-fill-reindent-defun. It’s mostly because beginning/end-of-defun is a god-awful abstraction, but that’s what all the existing functions use and we have to live with it until someone creates a better abstraction and refactors all the existing functions to use it. Anyway, in cases where the abstraction doesn’t work well, we just need to do a bit more to make it work, which is why I added prog-fill-reindent-defun-function. 

I agree it's not the best abstraction, but 1) a better one is available, and 2) it's worked for many years. Adding a better abstraction doesn't preclude implementing the current one as best we can.

>
>>> So I added prog-fill-reindent-defun-function and a tree-sitter version
>>> treesit-fill-reindent-defun. The tree-sitter implementation uses
>>> treesit-defun-at-point, so it doesn’t even need to concern
>>> with tactics.
>>> 
>>> Now in tree-sitter major modes, prog-fill-reindent-defun should always
>>> indent the enclosing defun.
>> 
>> Which now means prog-fill-reindent-defun can indent something other than
>> what mark-defun highlights?  That seems odd to me.  Tree sitter's job is
>> syntactic analysis, not UI differentiation.
>> 
>>>> After that, I'm less concerned about its behavior, but think the
>>>> current `nested` behavior of visiting the previous sibling defun until
>>>> there are no more, then visiting the parent defun makes sense.  The
>>>> key difference is that you only visit the previous sibling if point is
>>>> at a defun boundary...otherwise visit the defun containing point.
>>> 
>>> Hmmm, it doesn’t feel very convenient, you’d need to first adjust your
>>> point to be precisely at the boundary, then press C-M-a/e? IMO that
>>> adds too much overhead. I added a tactic `parent-first` that always
>>> move to the beginning/end of the enclosing defun. People that prefers
>>> this kind of defun movement can use this tactic.
>> 
>> The default should be to match behavior that's been stable for decades.
>> Use of tree sitter should be an implementation detail for users.
>> 
>> If we want to provide UI to better handle nested defuns, this UI should
>> go in prog-mode.el and rely on mode-provided syntactic analysis,
>
>As I described above, it’s not as simple as you imagined.
>
>> not
>> just delegate to a mode function that does different random stuff in
>> each mode.
>
>If the mode uses the tree-sitter-provided default (which 99.99% of them will), the behavior is well defined. Even if a major mode decides to provide its own version, I trust the major mode author to know what they’re doing, to have a good reason to implement their own, and to provide a function that fits the docstring of prog-fill-reindent-defun. So no, it won’t be random stuff.

And TS modes will still be randomly inconsistent with everything else in Emacs. Users shouldn't have to use the editor differently because a mode happens to use tree sitter.

>
>>>> Furthermore, consider the following `top-level` tactic example using
>>>> `c++-ts-mode`.  Here, we have a C++ namespace (which is considered a
>>>> defun for C++)
>> 
>> Namespaces aren't defuns and c++-ts-mode shouldn't be indenting their
>> contents by a level either.  c++-ts-mode is unusable without hacks like
>> programmatically editing the indentation rules in user configuration.
>
>If it’s a bug in the indentation rule, we can fix it. But can you open another bug report and describe what exactly is wrong?

It's not a bug. It's a missing feature. Lots of customization rules can't be expressed using the knobs TS affords right now, and moreover, those knobs depend on unstable raw AST nodes and so are fragile.

I have an existing bug somewhere about TS modes supporting high level indentation style customization using cc-mode's quite good rules and anchor system. There's more than enough information in the TS AST to implement c-guess-basic-syntax.

Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#78703; Package emacs. (Wed, 11 Jun 2025 12:09:02 GMT) Full text and rfc822 format available.

Message #53 received at 78703 <at> debbugs.gnu.org (full text, mbox):

From: Eli Zaretskii <eliz <at> gnu.org>
To: Daniel Colascione <dancol <at> dancol.org>
Cc: casouri <at> gmail.com, brownts <at> troybrown.dev, 78703 <at> debbugs.gnu.org
Subject: Re: bug#78703: beginning-of-defun and friends still wrong in
 typescript-ts-mode
Date: Wed, 11 Jun 2025 15:08:24 +0300

> Date: Tue, 10 Jun 2025 09:55:59 -0700
> From: Daniel Colascione <dancol <at> dancol.org>
> CC: casouri <at> gmail.com, brownts <at> troybrown.dev, 78703 <at> debbugs.gnu.org
> 
> >Not really, not when you look closely.  The tools we've built before
> >tree-sitter are ad-hoc, so they allow us to provide information that
> >parsers don't have and don't need to have.  Our syntax tables are not
> >exactly "less efficient parsing", and regular expressions allow us to
> >match whatever we want and call that anything we want.
> 
> There is not one bit of information available to c-mode that is not available to c-ts-mode.

Sure, if you want c-ts-mode to do everything c-mode did, plus what it
does based on the parser.  But the idea is to toss as much of the
ad-hoc "parsing" and "matching" we have, so that we don't need to keep
updating that forever, as the languages evolve and add/redefine
features.  That was the main purpose of integrating tree-sitter in
Emacs, after all.  So from where I stand, going back to all the stuff
we have in c-mode would mean we wasted a lot of effort for nothing.
People who just want the c-mode behavior 1:1 should simply use c-mode,
it will never be removed from Emacs.

> > We still lack some useful
> >functionalities that are present in CC Mode, for that very reason;
> >what was easy to implement was done long ago.
> 
> That's the opposite of reality. Alan and others have spent years building a flexible and fast backtracking syntactic analyser for CC mode. Tree sitter does the same thing but in a more general way, in native code for better performance. Its availability makes doing what cc-mode does easier, not harder.

We expected that, yes.  The reality disappointed us to some extent.

> >We've been doing that since day one: you write code that looks at the
> >variables to figure out what behavior to expect, or you write code
> >that is general enough to not care.
> 
> I can't wait to program against our glorious new dwim function. The problem with let binding the world is that the set of dynamic inputs becomes unbounded. What if I just bind the strategy option and one day TS introduces, say, a new sub-strategy option that makes the function I call behave differently?

If we decide it's a good idea, yes.  But these additions are not a
force of nature: we decide whether to add or not to add, and we should
consider the adverse effects each such addition has when we do.  If we
do a good job, there won't be an unbounded set of dynamic inputs to
consider in each case.

> >> Yet we use the concept of command names to express different concepts elsewhere. And if the concept is ambiguously defined, provide a minimal knob to adjust that concept, not change the operation of primitives to be inconsistent with each other. 
> >
> >I think we do the former, or at least we try.
> 
> Then let's make separate commands to express moving to a defun boundary one way versus another way and let users express their preference for connecting input to action using keymaps.

I'm not saying we should never do that.  I was responding to your much
more general remarks, not to this particular case.

IOW, if you argue about this specific issue, and with arguments
specific to it, it's possible we will eventually agree.  Just let's
not introduce general arguments like "this should never be done",
because I don't think I agree, based on our experiences and practices.
So such arguments will not convince.

> >> No it isn't. If the defun navigation functions in TS modes had their traditional behavior, they'd continue to work for higher level constructs built on top of them like the prog-mode reindent and mark defun. TS modes broke a whole bunch of things that had worked fine for decades, and instead of fixing them, they just made even more abstractions to plug inconsistent tree sitter things in place of the broken things.
> >
> >Indentation is a lot more than just navigation.  And I disagree with
> >your extreme interpretation of the current state of indentation and
> >navigation support in TS-based modes.
> 
> I'm right.

I beg to differ.

> >Sorry, this is simplification.  A typical declaration breaks down into
> >smaller parts, and we have expectations and ideas about indentation of
> >each one of them.  But the tree-sitter classification of the AST
> >constituents does not necessarily make that easy, because you could
> >have the same syntactic symbol both inside a declaration and in other
> >places.  So having an AST does not always immediately tell you how to
> >indent correctly.
> 
> No, but it gives you more information than looking-at does, and cc-mode does its job admirably given only that simple tool. You can look at nesting and context in the TS AST to figure out what to do.

Once again, going the looking-at way means we bring back all the
ad-hoc stuff that attempts to "parse" the code using heuristic
regexps, with the serious disadvantage that these heuristics need to
be well understood by someone who has a good knowledge of the
underlying language, and updated as the language evolves.  So I'd like
to do that only when absolutely necessary, and as little as possible
even then.

> >> Yes, and because it's ambiguous we get annoyances like python-mode's default sexp movement. Now every mode is like that, and you can't turn it off half the time?
> >
> >What else did you expect?  Some users like one style, others like the
> >other.  Are we supposed to say "my way or the highway"?  And that's
> >even before we consider that the disagreement cuts through the
> >developers themselves.
> 
> No. I'm expecting a generally consistent experience, and if we want to provide a configuration knob, it should affect everything consistently. One shouldn't have to form independent and different muscle memory for each language mode because the whims of their authors were different.

Consistency is problematic here, because "sexp" doesn't translate
consistently between languages, and even "defun" doesn't always
(cf. the "namespace" case).

> >I find continuing this kind of argument not constructive, so I will
> >stop here.  Let me just say that I think you are looking at this stuff
> >from some semi-abstract, almost idealistic, aspect.  As if we didn't
> >have 40 years of development and user experience and expectations to
> >keep and uphold.
> 
> I'll never understand the mindset that holds that things making sense and having a structure is bad actually because sense and structure are "academic" and "idealistic".

There's no such mindset.  I have no disagreement with you on that
level.  The disagreement is on a very practical level: we did try to
do things that way at the beginning, two years ago.  We are where we
are because that didn't work well enough.

> What 40 years of development and user experience holds is that if I'm four pages deep into a nasty TypeScript function and hit beginning-of-defun, I want to go to the beginning of the four page defun I am editing and not some random place two pages up that I didn't even know about in which someone scribbled out some kind of nested lambda irrelevant to my present task.

It turns out other users have other preferences and expectations,
especially if we include other languages in the context.  We cannot
tell them to get lost.

Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#78703; Package emacs. (Wed, 11 Jun 2025 13:00:02 GMT) Full text and rfc822 format available.

Message #56 received at 78703 <at> debbugs.gnu.org (full text, mbox):

From: Daniel Colascione <dancol <at> dancol.org>
To: Eli Zaretskii <eliz <at> gnu.org>
Cc: casouri <at> gmail.com, brownts <at> troybrown.dev, 78703 <at> debbugs.gnu.org
Subject: Re: bug#78703: beginning-of-defun and friends still wrong in
 typescript-ts-mode
Date: Wed, 11 Jun 2025 05:59:28 -0700

Eli Zaretskii <eliz <at> gnu.org> writes:

>> Date: Tue, 10 Jun 2025 09:55:59 -0700
>> From: Daniel Colascione <dancol <at> dancol.org>
>> CC: casouri <at> gmail.com, brownts <at> troybrown.dev, 78703 <at> debbugs.gnu.org
>> 
>> >Not really, not when you look closely.  The tools we've built before
>> >tree-sitter are ad-hoc, so they allow us to provide information that
>> >parsers don't have and don't need to have.  Our syntax tables are not
>> >exactly "less efficient parsing", and regular expressions allow us to
>> >match whatever we want and call that anything we want.
>> 
>> There is not one bit of information available to c-mode that is not available to c-ts-mode.
>
> Sure, if you want c-ts-mode do everything c-mode did, plus what it
> does based on the parser.  But the idea is to toss as much of the
> ad-hoc "parsing" and "matching" we have, so that we don't need to keep
> updating that forever, as the languages evolve and add/redefine
> features.  That was the main purpose of integrating tree-sitter in
> Emacs, after all.  So from where I stand, going back to all the stuff
> we have in c-mode would mean we wasted a lot of effort for nothing.
> People who just want the c-mode behavior 1:1 should simply use c-mode,
> it will never be removed from Emacs.
>
>> > We still lack some useful
>> >functionalities that are present in CC Mode, for that very reason;
>> >what was easy to implement was done long ago.
>> 
>> That's the opposite of reality. Alan and others have spent years
>> building a flexible and fast backtracking syntactic analyser for CC
>> mode. Tree sitter does the same thing but in a more general way, in
>> native code for better performance. Its availability makes doing
>> what cc-mode does easier, not harder.
>
> We expected that, yes.  The reality disappointed us to some extent.

What specific information is missing from the TS AST?

>> >We've been doing that since day one: you write code that looks at the
>> >variables to figure out what behavior to expect, or you write code
>> >that is general enough to not care.
>> 
>> I can't wait to program against our glorious new dwim function. The
>> problem with let binding the world is that the set of dynamic inputs
>> becomes unbounded. What if I just bind the strategy option and one
>> day TS introduces, say, a new sub-strategy option that makes the
>> function I call behave differently?
>
> If we decide it's a good idea, yes.  But these additions are not a
> force of nature: we decide whether to add or not to add, and we should
> consider the adverse effects each such addition has when we do.  If we
> do a good job, there won't be an unbounded set of dynamic inputs to
> consider in each case.
>
>> >> Yet we use the concept of command names to express different
>> >> concepts elsewhere. And if the concept is ambiguously defined,
>> >> provide a minimal knob to adjust that concept, not change the
>> >> operation of primitives to be inconsistent with each other.
>> >
>> >I think we do the former, or at least we try.
>> 
>> Then let's make separate commands to express moving to a defun boundary one way versus another way and let users express their preference for connecting input to action using keymaps.
>
> I'm not saying we should never do that.  I was responding to your much
> more general remarks, not to this particular case.

By default, commands should do one thing.  One can rebut the presumption
that a command should do one simple thing for certain special cases, I
guess, but I don't see a strong argument for beginning-of-defun
being one of these cases when it's easy to provide commands for the
various supported behaviors.

> IOW, if you argue about this specific issue, and with arguments
> specific to it, it's possible we will eventually agree.  Just let's
> not introduce general arguments like "this should never be done",
> because I don't think I agree, based on our experiences and practices.
> So such arguments will not convince.
>
>> >> No it isn't. If the defun navigation functions in TS modes had
>> >> their traditional behavior, they'd continue to work for higher
>> >> level constructs built on top of them like the prog-mode reindent
>> >> and mark defun. TS modes broke a whole bunch of things that had
>> >> worked fine for decades, and instead of fixing them, they just
>> >> made even more abstractions to plug inconsistent tree sitter
>> >> things in place of the broken things.
>> >
>> >Indentation is a lot more than just navigation.  And I disagree with
>> >your extreme interpretation of the current state of indentation and
>> >navigation support in TS-based modes.
>> 
>> I'm right.
>
> I beg to differ.
>
>> >Sorry, this is simplification.  A typical declaration breaks down into
>> >smaller parts, and we have expectations and ideas about indentation of
>> >each one of them.  But the tree-sitter classification of the AST
>> >constituents does not necessarily make that easy, because you could
>> >have the same syntactic symbol both inside a declaration and in other
>> >places.  So having an AST does not always immediately tell you how to
>> >indent correctly.
>> 
>> No, but it gives you more information than looking-at does, and cc-mode does its job admirably given only that simple tool. You can look at nesting and context in the TS AST to figure out what to do.
>
> Once again, going the looking-at way means we bring back all the
> ad-hoc stuff that attempts to "parse" the code using heuristic
> regexps, with the serious disadvantage that these heuristics need to
> be well understood by someone who has a good knowledge of the
> underlying language, and updated as the language evolves.  So I'd like
> to do that only when absolutely necessary, and as little as possible
> even then.

I'm not saying c++-ts-mode should parse using regexps like cc-mode does.
I am saying that the AST should contain all the information cc-mode
would instead get from this parsing and can be put to the same use.
What information is the AST missing?

>> >> Yes, and because it's ambiguous we get annoyances like python-mode's default sexp movement. Now every mode is like that, and you can't turn it off half the time?
>> >
>> >What else did you expect?  Some users like one style, others like the
>> >other.  Are we supposed to say "my way or the highway"?  And that's
>> >even before we consider that the disagreement cuts through the
>> >developers themselves.
>> 
>> No. I'm expecting a generally consistent experience, and if we want
>> to provide a configuration knob, it should affect everything
>> consistently. One shouldn't have to form independent and different
>> muscle memory for each language mode because the whims of their
>> authors were different.
>
> Consistency is problematic here, because "sexp" doesn't translate
> consistently between languages, and even "defun" doesn't always
> (cf. the "namespace" case).

That "sexp" and "defun" mean different things in different languages
doesn't mean movement commands between defuns should do different things
in different languages.  Movement _between_ defuns should work the same
way everywhere even if the _definition_ of a defun is language-specific.

I can write with a pen or a pencil.  I don't have to switch from
left-handed to right-handed writing when I want to write with ink
instead of graphite.

>> >I find continuing this kind of argument not constructive, so I will
>> >stop here.  Let me just say that I think you are looking at this stuff
>> >from some semi-abstract, almost idealistic, aspect.  As if we didn't
>> >have 40 years of development and user experience and expectations to
>> >keep and uphold.
>> 
>> I'll never understand the mindset that holds that things making sense and having a structure is bad actually because sense and structure are "academic" and "idealistic".
>
> There's no such mindset.  I have no disagreement with you on that
> level.  The disagreement is on a very practical level: we did try to
> do things that way at the beginning, two years ago.  We are where we
> are because that didn't work well enough.

Maybe we didn't try hard enough.  If there's something missing in the
AST, perhaps it could be added.

>> What 40 years of development and user experience holds is that if
>> I'm four pages deep into a nasty TypeScript function and hit
>> beginning-of-defun, I want to go to the beginning of the four page
>> defun I am editing and not some random place two pages up that I
>> didn't even know about in which someone scribbled out some kind of
>> nested lambda irrelevant to my present task.
>
> It turns out other users have other preferences and expectations,
> especially if we include other languages in the context.  We cannot
> tell them to get lost.

Users really prefer go-to-sibling behavior for beginning-of-defun?
Says who?

Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#78703; Package emacs. (Wed, 11 Jun 2025 13:17:02 GMT) Full text and rfc822 format available.

Message #59 received at 78703 <at> debbugs.gnu.org (full text, mbox):

From: Eli Zaretskii <eliz <at> gnu.org>
To: Daniel Colascione <dancol <at> dancol.org>
Cc: casouri <at> gmail.com, brownts <at> troybrown.dev, 78703 <at> debbugs.gnu.org
Subject: Re: bug#78703: beginning-of-defun and friends still wrong in
 typescript-ts-mode
Date: Wed, 11 Jun 2025 16:16:14 +0300

> From: Daniel Colascione <dancol <at> dancol.org>
> Cc: casouri <at> gmail.com,  brownts <at> troybrown.dev,  78703 <at> debbugs.gnu.org
> Date: Wed, 11 Jun 2025 05:59:28 -0700
> 
> Eli Zaretskii <eliz <at> gnu.org> writes:
> 
> >> > We still lack some useful
> >> >functionalities that are present in CC Mode, for that very reason;
> >> >what was easy to implement was done long ago.
> >> 
> >> That's the opposite of reality. Alan and others have spent years
> >> building a flexible and fast backtracking syntactic analyser for CC
> >> mode. Tree sitter does the same thing but in a more general way, in
> >> native code for better performance. Its availability makes doing
> >> what cc-mode does easier, not harder.
> >
> > We expected that, yes.  The reality disappointed us to some extent.
> 
> What specific information is missing from the TS AST?

Anything that is not part of the C grammar.  I already gave one
example: the way we support DEFUN.  Handling of cpp directives is also
not entirely satisfactory (because macros are not C, so TS added them
as a semi-kludgey feature).

> > I'm not saying we should never do that.  I was responding to your much
> > more general remarks, not to this particular case.
> 
> By default, commands should do one thing.  One can rebut the presumption
> that a command should do one simple thing for certain special cases, I
> guess, but I don't see a strong argument for beginning-of-defun
> being one of these cases when it's easy to provide commands for the
> various supported behaviors.

In this case, the different behavior comes from the user's preference
for either the nested function or the top-level one.  That preference
affects more than just beginning-of-defun.  I think Yuan explained why
we are where we are with this.

> > Once again, going the looking-at way means we bring back all the
> > ad-hoc stuff that attempts to "parse" the code using heuristic
> > regexps, with the serious disadvantage that these heuristics need to
> > be well understood by someone who has a good knowledge of the
> > underlying language, and updated as the language evolves.  So I'd like
> > to do that only when absolutely necessary, and as little as possible
> > even then.
> 
> I'm not saying c++-ts-mode should parse using regexps like cc-mode does.
> I am saying that the AST should contain all the information cc-mode
> would instead get from this parsing and can be put to the same use.
> What information is the AST missing?

See above.

As another example that I saw only recently (which is why I still
remember it): the C grammar library only now added support for
__cdecl, something that c-mode supported long ago.  So we also have
features missing from tree-sitter just because they are missing.

> > Consistency is problematic here, because "sexp" doesn't translate
> > consistently between languages, and even "defun" doesn't always
> > (cf. the "namespace" case).
> 
> That "sexp" and "defun" mean different things in different languages
> doesn't mean movement commands between defuns should do different things
> in different languages.  Movement _between_ defuns should work the same
> way everywhere even if the _definition_ of a defun is language-specific.

Yes, but here the definition of the defun differs.

> > There's no such mindset.  I have no disagreement with you on that
> > level.  The disagreement is on a very practical level: we did try to
> > do things that way at the beginning, two years ago.  We are where we
> > are because that didn't work well enough.
> 
> Maybe we didn't try hard enough.  If there's something missing in the
> AST, perhaps it could be added.

I certainly hope so.  Please be sure to propose improvements based on
the AST, I don't think anyone will object.  What we do (or don't do)
now is just because we didn't yet get to implementing it or found it
hard, not because we don't want the same features as in non-TS modes.

> >> What 40 years of development and user experience holds is that if
> >> I'm four pages deep into a nasty TypeScript function and hit
> >> beginning-of-defun, I want to go to the beginning of the four page
> >> defun I am editing and not some random place two pages up that I
> >> didn't even know about in which someone scribbled out some kind of
> >> nested lambda irrelevant to my present task.
> >
> > It turns out other users have other preferences and expectations,
> > especially if we include other languages in the context.  We cannot
> > tell them to get lost.
> 
> Users really prefer go-to-sibling behavior for beginning-of-defun?

Turns out that way.

> Says who?

Users.

Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#78703; Package emacs. (Wed, 11 Jun 2025 14:16:02 GMT) Full text and rfc822 format available.

Message #62 received at 78703 <at> debbugs.gnu.org (full text, mbox):

From: Daniel Colascione <dancol <at> dancol.org>
To: Eli Zaretskii <eliz <at> gnu.org>
Cc: casouri <at> gmail.com, brownts <at> troybrown.dev, 78703 <at> debbugs.gnu.org
Subject: Re: bug#78703: beginning-of-defun and friends still wrong in
 typescript-ts-mode
Date: Wed, 11 Jun 2025 07:15:27 -0700

Eli Zaretskii <eliz <at> gnu.org> writes:
>> 
>> Users really prefer go-to-sibling behavior for beginning-of-defun?
>
> Turns out that way.
>
>> Says who?
>
> Users.

Do they?  Which ones?  What every user I've seen dislikes more than any
default is inconsistency, especially gratuitous inconsistencies between
things that are logically the same and happen to have divergent
implementation details.  I find it impossible to believe that real
users, in code like this:

function foo() {
  blah;
  function bar() {
    ...
  }
  // [Snip four pages]
  for (let x of y) {
    [point]
  } 
}

When pressing C-M-a, want to go to the definition of bar, not the
definition of foo.  It doesn't make any sense to me that anyone would want
to perform that operation.  If you're going to insist on bad defaults,
you're just going to drive more people to Doom and such.

Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#78703; Package emacs. (Wed, 11 Jun 2025 14:24:02 GMT) Full text and rfc822 format available.

Message #65 received at 78703 <at> debbugs.gnu.org (full text, mbox):

From: Daniel Colascione <dancol <at> dancol.org>
To: Eli Zaretskii <eliz <at> gnu.org>
Cc: casouri <at> gmail.com, brownts <at> troybrown.dev, 78703 <at> debbugs.gnu.org
Subject: Re: bug#78703: beginning-of-defun and friends still wrong in
 typescript-ts-mode
Date: Wed, 11 Jun 2025 07:23:24 -0700

Daniel Colascione <dancol <at> dancol.org> writes:

> Eli Zaretskii <eliz <at> gnu.org> writes:
>>> 
>>> Users really prefer go-to-sibling behavior for beginning-of-defun?
>>
>> Turns out that way.
>>
>>> Says who?
>>
>> Users.
>
> Do they?  Which ones?  What every user I've seen dislikes more than any
> default is inconsistency, especially gratuitous inconsistencies between
> things that are logically the same and happen to have divergent
> implementation details.  I find it impossible to believe that real
> users, in code like this:
>
> function foo() {
>   blah;
>   function bar() {
>     ...
>   }
>   // [Snip four pages]
>   for (let x of y) {
>     [point]
>   } 
> }
>
> When pressing C-M-a, want to go to the definition of bar, not the
> definition of foo.  It doesn't make any sense to me that anyone would want
> to perform that operation.  If you're going to insist on bad defaults,
> you're just going to drive more people to Doom and such.

To clarify, the behavior above is useless because it's impossible to
predict.  With standard behavior, I can see, in the modeline, that I'm
in a defun called "foo".  I can reason, therefore, that if I type C-M-a,
I will go to the start of foo.  I can predict the effect of my actions.
I can't predict the behavior of C-M-a with the current TS default.
Maybe it'll go to bar.  Maybe someone deleted bar in a merge and it'll
go to foo.  Maybe someone added a guy after bar and C-M-a will go there
instead.  I can't know in advance.  This lack of predictability makes
C-M-a as a whole less useful.

Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#78703; Package emacs. (Thu, 12 Jun 2025 12:06:02 GMT) Full text and rfc822 format available.

Message #68 received at 78703 <at> debbugs.gnu.org (full text, mbox):

From: Troy Brown <brownts <at> troybrown.dev>
To: Yuan Fu <casouri <at> gmail.com>
Cc: Eli Zaretskii <eliz <at> gnu.org>, Daniel Colascione <dancol <at> dancol.org>,
 78703 <at> debbugs.gnu.org
Subject: Re: bug#78703: beginning-of-defun and friends still wrong in
 typescript-ts-mode
Date: Thu, 12 Jun 2025 08:04:55 -0400

On Tue, Jun 10, 2025 at 2:02 AM Yuan Fu <casouri <at> gmail.com> wrote:
>
> > After that, I'm less concerned about its behavior, but think the
> > current `nested` behavior of visiting the previous sibling defun until
> > there are no more, then visiting the parent defun makes sense.  The
> > key difference is that you only visit the previous sibling if point is
> > at a defun boundary...otherwise visit the defun containing point.
>
> Hmmm, it doesn’t feel very convenient, you’d need to first adjust your point to be precisely at the boundary, then press C-M-a/e? IMO that adds too much overhead.

You shouldn't need to adjust point at all.  By boundary, I'm referring
to the place that `beginning-of-defun` (or
`beginning-of-defun-comments`) would place you after running.  I think
something like the following demonstrates what I mean by this.  When
you're within a defun, it takes you to the beginning of the enclosing
defun.  After that, it moves according to the tactic.  To me, this is
much more intuitive than the current behavior, and I believe would
have fixed the issues I described with the prog-fill-reindent-defun
behavior.

```elisp
(defun defun-boundary-p ()
  "Return non-nil if point is at a defun boundary or outside any defun."
  (or
   ;; Just before a defun
   (save-excursion
     (forward-comment (point-max))
     (when-let* ((treesit-defun-tactic 'nested)
                 (node (treesit-defun-at-point)))
       (eq (point) (treesit-node-start node))))
   ;; Just after a defun
   (save-excursion
     (skip-chars-backward " \t\n")
     (when (not (bobp))
       (goto-char (1- (point)))
       (when-let* ((treesit-defun-tactic 'nested)
                   (node (treesit-defun-at-point)))
         (eq (1+ (point)) (treesit-node-end node)))))
   ;; No enclosing defun
   (null (treesit-defun-at-point))))

(defun my/beginning-of-defun ()
  "Go to the enclosing defun's start; at a boundary, follow the defun tactic."
  (interactive)
  (if (defun-boundary-p)
      ;; adhere to tactic
      (beginning-of-defun)
    ;; within a defun: go to the start of the enclosing defun
    (let* ((treesit-defun-tactic 'nested)
           (node (treesit-defun-at-point)))
      (goto-char (treesit-node-start node))
      (beginning-of-line))))
```
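
To try the command above interactively, something like the following might do.  It is only a sketch: it assumes the two functions above are already loaded, the helper name `my/typescript-ts-defun-keys` is made up for illustration, and none of this is part of Emacs or of any proposed patch.

```elisp
;; Hypothetical setup for experimenting with `my/beginning-of-defun';
;; the helper name below is an assumption, not an existing Emacs function.
(defun my/typescript-ts-defun-keys ()
  "Bind C-M-a to `my/beginning-of-defun' in the major mode's local keymap."
  (local-set-key (kbd "C-M-a") #'my/beginning-of-defun))

;; Run the helper whenever a buffer enters `typescript-ts-mode'.
(add-hook 'typescript-ts-mode-hook #'my/typescript-ts-defun-keys)
```

With this in place, C-M-a inside a function body should go to the start of the enclosing defun first, and only follow `treesit-defun-tactic` once point is already at a defun boundary.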

This bug report was last modified 57 days ago.


GNU bug tracking system
Copyright (C) 1999 Darren O. Benham, 1997,2003 nCipher Corporation Ltd, 1994-97 Ian Jackson.
