GNU bug report logs -
#59628
29.0.50; treesit-beginning/end-of-defun problems in C/C++
Previous Next
Reported by: Eli Zaretskii <eliz <at> gnu.org>
Date: Sun, 27 Nov 2022 10:13:01 UTC
Severity: normal
Found in version 29.0.50
Done: Eli Zaretskii <eliz <at> gnu.org>
Bug is archived. No further changes may be made.
To add a comment to this bug, you must first unarchive it, by sending
a message to control AT debbugs.gnu.org, with unarchive 59628 in the body.
You can then email your comments to 59628 AT debbugs.gnu.org in the normal way.
Toggle the display of automated, internal messages from the tracker.
Report forwarded
to
bug-gnu-emacs <at> gnu.org
:
bug#59628
; Package
emacs
.
(Sun, 27 Nov 2022 10:13:01 GMT)
Full text and
rfc822 format available.
Acknowledgement sent
to
Eli Zaretskii <eliz <at> gnu.org>
:
New bug report received and forwarded. Copy sent to
bug-gnu-emacs <at> gnu.org
.
(Sun, 27 Nov 2022 10:13:02 GMT)
Full text and
rfc822 format available.
Message #5 received at submit <at> debbugs.gnu.org (full text, mbox):
To reproduce, visit any C source file in the Emacs tree, turn on c-ts-mode
or c++-ts-mode, go to the middle of some function, and type
M-: (treesit-beginning-of-defun) RET
or
M-: (treesit-end-of-defun) RET
This will move point to very strange places, which generally are neither the
beginning nor the end of the function. In very simple functions, like this
one:
void
__executable_start (void)
{
emacs_abort ();
}
the result is correct. But once the function is even slightly more
complicated, for example, like this:
static int
margin_glyphs_to_reserve (struct window *w, int total_glyphs, int margin)
{
if (margin > 0)
{
int width = w->total_cols;
double d = max (0, margin);
d = min (width / 2 - 1, d);
/* Since MARGIN is positive, we cannot possibly have less than
one glyph for the marginal area. */
return max (1, (int) ((double) total_glyphs / width * d));
}
return 0;
}
the results are very far off the mark.
These two functions are the only ones to move by defuns in treesit-based
modes, right? So they should be improved, IMO.
In GNU Emacs 29.0.50 (build 2273, i686-pc-mingw32) of 2022-11-27 built
on HOME-C4E4A596F7
Repository revision: 80dcd78ff1fce3241043edf1951289eef0bf50c9
Repository branch: master
Windowing system distributor 'Microsoft Corp.', version 5.1.2600
System Description: Microsoft Windows XP Service Pack 3 (v5.1.0.2600)
Configured using:
'configure -C --prefix=/d/usr --with-wide-int
--enable-checking=yes,glyphs 'CFLAGS=-O0 -gdwarf-4 -g3''
Configured features:
ACL GIF GMP GNUTLS HARFBUZZ JPEG JSON LCMS2 LIBXML2 MODULES NOTIFY
W32NOTIFY PDUMPER PNG RSVG SOUND SQLITE3 THREADS TIFF
TOOLKIT_SCROLL_BARS TREE_SITTER WEBP XPM ZLIB
Important settings:
value of $LANG: ENU
locale-coding-system: cp1255
Major mode: C
Minor modes in effect:
tooltip-mode: t
global-eldoc-mode: t
show-paren-mode: t
electric-indent-mode: t
mouse-wheel-mode: t
tool-bar-mode: t
menu-bar-mode: t
file-name-shadow-mode: t
global-font-lock-mode: t
font-lock-mode: t
blink-cursor-mode: t
line-number-mode: t
indent-tabs-mode: t
transient-mark-mode: t
auto-composition-mode: t
auto-encryption-mode: t
auto-compression-mode: t
Load-path shadows:
None found.
Features:
(shadow sort mail-extr emacsbug message mailcap yank-media puny dired
dired-loaddefs rfc822 mml mml-sec password-cache epa derived epg rfc6068
epg-config gnus-util text-property-search time-date subr-x mm-decode
mm-bodies mm-encode mail-parse rfc2231 mailabbrev gmm-utils mailheader
sendmail rfc2047 rfc2045 ietf-drums mm-util mail-prsvr mail-utils
misearch multi-isearch vc-git diff-mode easy-mmode vc-dispatcher
c-ts-mode rx treesit cl-seq cl-loaddefs cl-lib rmc iso-transl tooltip
cconv eldoc paren electric uniquify ediff-hook vc-hooks lisp-float-type
elisp-mode mwheel dos-w32 ls-lisp disp-table term/w32-win w32-win
w32-vars term/common-win tool-bar dnd fontset image regexp-opt fringe
tabulated-list replace newcomment text-mode lisp-mode prog-mode register
page tab-bar menu-bar rfn-eshadow isearch easymenu timer select
scroll-bar mouse jit-lock font-lock syntax font-core term/tty-colors
frame minibuffer nadvice seq simple cl-generic indonesian philippine
cham georgian utf-8-lang misc-lang vietnamese tibetan thai tai-viet lao
korean japanese eucjp-ms cp51932 hebrew greek romanian slovak czech
european ethiopic indian cyrillic chinese composite emoji-zwj charscript
charprop case-table epa-hook jka-cmpr-hook help abbrev obarray oclosure
cl-preloaded button loaddefs theme-loaddefs faces cus-face macroexp
files window text-properties overlay sha1 md5 base64 format env
code-pages mule custom widget keymap hashtable-print-readable backquote
threads w32notify w32 lcms2 multi-tty make-network-process emacs)
Memory information:
((conses 16 60472 7440)
(symbols 48 7296 0)
(strings 16 20076 2163)
(string-bytes 1 498360)
(vectors 16 11107)
(vector-slots 8 164960 11733)
(floats 8 29 319)
(intervals 40 2962 92)
(buffers 896 14))
Information forwarded
to
bug-gnu-emacs <at> gnu.org
:
bug#59628
; Package
emacs
.
(Mon, 28 Nov 2022 10:57:02 GMT)
Full text and
rfc822 format available.
Message #8 received at 59628 <at> debbugs.gnu.org (full text, mbox):
Eli Zaretskii <eliz <at> gnu.org> writes:
> To reproduce, visit any C source file in the Emacs tree, turn on c-ts-mode
> or c++-ts-mode, go to the middle of some function, and type
>
> M-: (treesit-beginning-of-defun) RET
> or
> M-: (treesit-end-of-defun) RET
>
> This will move point to very strange places, which generally are neither the
> beginning nor the end of the function. In very simple functions, like this
> one:
>
> void
> __executable_start (void)
> {
> emacs_abort ();
> }
>
> the result is correct. But once the function is even slightly more
> complicated, for example, like this:
>
> static int
> margin_glyphs_to_reserve (struct window *w, int total_glyphs, int margin)
> {
> if (margin > 0)
> {
> int width = w->total_cols;
> double d = max (0, margin);
> d = min (width / 2 - 1, d);
> /* Since MARGIN is positive, we cannot possibly have less than
> one glyph for the marginal area. */
> return max (1, (int) ((double) total_glyphs / width * d));
> }
> return 0;
> }
>
> the results are very far off the mark.
>
> These two functions are the only ones to move by defuns in treesit-based
> modes, right? So they should be improved, IMO.
>
If I type
M-: (setq treesit-defun-type-regexp "function_definition") RET
treesit-beginning-of-defun and treesit-end-of-defun do the right thing.
That begs the question: Is it really necessary to have a Tree-sitter
regexp variable to match defun nodes? If yes, should it already have a
sensible default value so things work out of the box in most major
modes?
Information forwarded
to
bug-gnu-emacs <at> gnu.org
:
bug#59628
; Package
emacs
.
(Mon, 28 Nov 2022 22:09:01 GMT)
Full text and
rfc822 format available.
Message #11 received at 59628 <at> debbugs.gnu.org (full text, mbox):
Daniel Martín <mardani29 <at> yahoo.es> writes:
> Eli Zaretskii <eliz <at> gnu.org> writes:
>
>> To reproduce, visit any C source file in the Emacs tree, turn on c-ts-mode
>> or c++-ts-mode, go to the middle of some function, and type
>>
>> M-: (treesit-beginning-of-defun) RET
>> or
>> M-: (treesit-end-of-defun) RET
>>
>> This will move point to very strange places, which generally are neither the
>> beginning nor the end of the function. In very simple functions, like this
>> one:
>>
>> void
>> __executable_start (void)
>> {
>> emacs_abort ();
>> }
>>
>> the result is correct. But once the function is even slightly more
>> complicated, for example, like this:
>>
>> static int
>> margin_glyphs_to_reserve (struct window *w, int total_glyphs, int margin)
>> {
>> if (margin > 0)
>> {
>> int width = w->total_cols;
>> double d = max (0, margin);
>> d = min (width / 2 - 1, d);
>> /* Since MARGIN is positive, we cannot possibly have less than
>> one glyph for the marginal area. */
>> return max (1, (int) ((double) total_glyphs / width * d));
>> }
>> return 0;
>> }
>>
>> the results are very far off the mark.
>>
>> These two functions are the only ones to move by defuns in treesit-based
>> modes, right? So they should be improved, IMO.
Yeah, I’ll need to look at C grammar and fix treesit-defun-type-regexp.
>
> If I type
>
> M-: (setq treesit-defun-type-regexp "function_definition") RET
>
> treesit-beginning-of-defun and treesit-end-of-defun do the right thing.
> That begs the question: Is it really necessary to have a Tree-sitter
> regexp variable to match defun nodes? If yes, should it already have a
> sensible default value so things work out of the box in most major
> modes?
Different languages have different grammars that give different names to
function definitions and class definitions. So it is necessary to have a
regexp variable. Finding such a regexp isn’t too hard, so I don’t think
we need a default value. If we do have a default, it would be often wrong,
given differences between language grammars.
Information forwarded
to
bug-gnu-emacs <at> gnu.org
:
bug#59628
; Package
emacs
.
(Tue, 29 Nov 2022 00:14:02 GMT)
Full text and
rfc822 format available.
Message #14 received at 59628 <at> debbugs.gnu.org (full text, mbox):
Yuan Fu <casouri <at> gmail.com> writes:
>
> Different languages have different grammars that give different names to
> function definitions and class definitions. So it is necessary to have a
> regexp variable. Finding such a regexp isn’t too hard, so I don’t think
> we need a default value. If we do have a default, it would be often wrong,
> given differences between language grammars.
I see that each major mode sets the value of that buffer-local variable.
c-ts-mode sets it to "\\(?:definition\\|specifier\\)" but, is that
correct? In C code, treesit-explore-mode shows function definition
nodes as "function_definition", so I think the regexp is matching more
nodes than expected, causing C-M-a C-M-e to move to weird places in the
buffer.
Information forwarded
to
bug-gnu-emacs <at> gnu.org
:
bug#59628
; Package
emacs
.
(Wed, 30 Nov 2022 23:08:02 GMT)
Full text and
rfc822 format available.
Message #17 received at 59628 <at> debbugs.gnu.org (full text, mbox):
Daniel Martín <mardani29 <at> yahoo.es> writes:
> Yuan Fu <casouri <at> gmail.com> writes:
>
>>
>> Different languages have different grammars that give different names to
>> function definitions and class definitions. So it is necessary to have a
>> regexp variable. Finding such a regexp isn’t too hard, so I don’t think
>> we need a default value. If we do have a default, it would be often wrong,
>> given differences between language grammars.
>
> I see that each major mode sets the value of that buffer-local variable.
> c-ts-mode sets it to "\\(?:definition\\|specifier\\)" but, is that
> correct? In C code, treesit-explore-mode shows function definition
> nodes as "function_definition", so I think the regexp is matching more
> nodes than expected, causing C-M-a C-M-e to move to weird places in the
> buffer.
Right, I’ve fixed the value in 599369bf3a3.
Yuan
Reply sent
to
Eli Zaretskii <eliz <at> gnu.org>
:
You have taken responsibility.
(Thu, 01 Dec 2022 08:10:02 GMT)
Full text and
rfc822 format available.
Notification sent
to
Eli Zaretskii <eliz <at> gnu.org>
:
bug acknowledged by developer.
(Thu, 01 Dec 2022 08:10:02 GMT)
Full text and
rfc822 format available.
Message #22 received at 59628-done <at> debbugs.gnu.org (full text, mbox):
> From: Yuan Fu <casouri <at> gmail.com>
> Date: Wed, 30 Nov 2022 15:07:45 -0800
> Cc: eliz <at> gnu.org,
> 59628 <at> debbugs.gnu.org
>
>
> Daniel Martín <mardani29 <at> yahoo.es> writes:
>
> > Yuan Fu <casouri <at> gmail.com> writes:
> >
> >>
> >> Different languages have different grammars that give different names to
> >> function definitions and class definitions. So it is necessary to have a
> >> regexp variable. Finding such a regexp isn’t too hard, so I don’t think
> >> we need a default value. If we do have a default, it would be often wrong,
> >> given differences between language grammars.
> >
> > I see that each major mode sets the value of that buffer-local variable.
> > c-ts-mode sets it to "\\(?:definition\\|specifier\\)" but, is that
> > correct? In C code, treesit-explore-mode shows function definition
> > nodes as "function_definition", so I think the regexp is matching more
> > nodes than expected, causing C-M-a C-M-e to move to weird places in the
> > buffer.
>
> Right, I’ve fixed the value in 599369bf3a3.
Thanks, this seems to work now as expected. So I'm closing the bug.
bug archived.
Request was from
Debbugs Internal Request <help-debbugs <at> gnu.org>
to
internal_control <at> debbugs.gnu.org
.
(Thu, 29 Dec 2022 12:24:07 GMT)
Full text and
rfc822 format available.
This bug report was last modified 2 years and 230 days ago.
Previous Next
GNU bug tracking system
Copyright (C) 1999 Darren O. Benham,
1997,2003 nCipher Corporation Ltd,
1994-97 Ian Jackson.