GNU bug report logs - #59415
29.0.50; [feature/tree-sitter] c-ts-mode fails to fontify a portion of a large C file

Package: emacs;

Reported by: Eli Zaretskii <eliz <at> gnu.org>

Date: Sun, 20 Nov 2022 17:56:02 UTC

Severity: normal

Found in version 29.0.50

Done: Yuan Fu <casouri <at> gmail.com>

Bug is archived. No further changes may be made.

To add a comment to this bug, you must first unarchive it, by sending
a message to control AT debbugs.gnu.org, with unarchive 59415 in the body.
You can then email your comments to 59415 AT debbugs.gnu.org in the normal way.



Report forwarded to bug-gnu-emacs <at> gnu.org:
bug#59415; Package emacs. (Sun, 20 Nov 2022 17:56:02 GMT)

Acknowledgement sent to Eli Zaretskii <eliz <at> gnu.org>:
New bug report received and forwarded. Copy sent to bug-gnu-emacs <at> gnu.org. (Sun, 20 Nov 2022 17:56:02 GMT)

Message #5 received at submit <at> debbugs.gnu.org:

From: Eli Zaretskii <eliz <at> gnu.org>
To: bug-gnu-emacs <at> gnu.org
Cc: Yuan Fu <casouri <at> gmail.com>, Theodor Thornhill <theo <at> thornhill.no>
Subject: 29.0.50;
 [feature/tree-sitter] c-ts-mode fails to fontify a portion of a large
 C file
Date: Sun, 20 Nov 2022 19:55:54 +0200
To reproduce:

  emacs -Q
  Evaluate:

    (setq auto-mode-alist
	  (append
	   '(("\\.c\\'" . c-ts-mode))
	   auto-mode-alist))
    (setq treesit-max-buffer-size (* 11 1024 1024))

  C-x C-f packet-rrc.c RET

(This file is the one from bug#45248.)

  C-u 194770 M-g g

Observe that fontifications stop at this line for some reason.
Fontification reappears on line 209271.  Maybe it's because of the many
braces that appear in warning face?  Why does TS think there are syntax
errors here?  The C++ TS parser doesn't have that problem, btw.

P.S. Btw, isn't the treesit-max-buffer-size limit too low?  4 MiB?


In GNU Emacs 29.0.50 (build 31, i686-pc-mingw32) of 2022-11-20 built on
 HOME-C4E4A596F7
Repository revision: 4fa13b2d838e11cbe3b713f3172721cb61d499f3
Repository branch: feature/tree-sitter
Windowing system distributor 'Microsoft Corp.', version 5.1.2600
System Description: Microsoft Windows XP Service Pack 3 (v5.1.0.2600)

Configured using:
 'configure -C --prefix=/d/usr --with-wide-int
 --enable-checking=yes,glyphs 'CFLAGS=-O0 -gdwarf-4 -g3''

Configured features:
ACL GIF GMP GNUTLS HARFBUZZ JPEG JSON LCMS2 LIBXML2 MODULES NOTIFY
W32NOTIFY PDUMPER PNG RSVG SOUND SQLITE3 THREADS TIFF
TOOLKIT_SCROLL_BARS TREE_SITTER WEBP XPM ZLIB

Important settings:
  value of $LANG: ENU
  locale-coding-system: cp1255

Major mode: C

Minor modes in effect:
  tooltip-mode: t
  global-eldoc-mode: t
  show-paren-mode: t
  electric-indent-mode: t
  mouse-wheel-mode: t
  tool-bar-mode: t
  menu-bar-mode: t
  file-name-shadow-mode: t
  global-font-lock-mode: t
  font-lock-mode: t
  blink-cursor-mode: t
  line-number-mode: t
  indent-tabs-mode: t
  transient-mark-mode: t
  auto-composition-mode: t
  auto-encryption-mode: t
  auto-compression-mode: t

Load-path shadows:
None found.

Features:
(shadow sort mail-extr emacsbug message mailcap yank-media puny dired
dired-loaddefs rfc822 mml mml-sec password-cache epa derived epg rfc6068
epg-config gnus-util text-property-search time-date subr-x mm-decode
mm-bodies mm-encode mail-parse rfc2231 mailabbrev gmm-utils mailheader
sendmail rfc2047 rfc2045 ietf-drums mm-util mail-prsvr mail-utils
c-ts-mode rx treesit cl-seq vc-bzr vc-dispatcher vc-cvs vc-rcs log-view
easy-mmode pcvs-util cc-mode cc-fonts cc-guess cc-menus cc-cmds
cc-styles cc-align cc-engine cc-vars cc-defs cl-loaddefs cl-lib rmc
iso-transl tooltip eldoc paren electric uniquify ediff-hook vc-hooks
lisp-float-type elisp-mode mwheel dos-w32 ls-lisp disp-table
term/w32-win w32-win w32-vars term/common-win tool-bar dnd fontset image
regexp-opt fringe tabulated-list replace newcomment text-mode lisp-mode
prog-mode register page tab-bar menu-bar rfn-eshadow isearch easymenu
timer select scroll-bar mouse jit-lock font-lock syntax font-core
term/tty-colors frame minibuffer nadvice seq simple cl-generic
indonesian philippine cham georgian utf-8-lang misc-lang vietnamese
tibetan thai tai-viet lao korean japanese eucjp-ms cp51932 hebrew greek
romanian slovak czech european ethiopic indian cyrillic chinese
composite emoji-zwj charscript charprop case-table epa-hook
jka-cmpr-hook help abbrev obarray oclosure cl-preloaded button loaddefs
faces cus-face macroexp files window text-properties overlay sha1 md5
base64 format env code-pages mule custom widget keymap
hashtable-print-readable backquote threads w32notify w32 lcms2 multi-tty
make-network-process emacs)

Memory information:
((conses 16 71496 109093)
 (symbols 48 8917 42)
 (strings 16 25650 9189)
 (string-bytes 1 810278)
 (vectors 16 13855)
 (vector-slots 8 190997 54271)
 (floats 8 26 158)
 (intervals 40 583 684)
 (buffers 904 13))




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#59415; Package emacs. (Sun, 20 Nov 2022 19:55:02 GMT)

Message #8 received at submit <at> debbugs.gnu.org:

From: Theodor Thornhill <theo <at> thornhill.no>
To: Eli Zaretskii <eliz <at> gnu.org>, bug-gnu-emacs <at> gnu.org
Cc: Yuan Fu <casouri <at> gmail.com>
Subject: Re: 29.0.50; [feature/tree-sitter] c-ts-mode fails to fontify a
 portion of a large C file
Date: Sun, 20 Nov 2022 20:54:05 +0100
Hi and thanks for cc.
>
> Observe that fontifications stop at this line for some reason.
> Fontification reappears on line 209271.  Maybe it's because of the many
> braces that appear in warning face?  Why does TS think there are syntax
> errors here?  The C++ TS parser doesn't have that problem, btw.
>

It seems the c parser definitely can't handle what it's seeing.

> P.S. Btw, isn't the treesit-max-buffer-size limit too low?  4 MiB?
>

It might be!  IIRC treesit uses 10x the buffer size to store the ast, so
it'll be some more memory usage.  I'll do some more digging, but in the
meantime I attach this profiler report that shows font-locking as the
culprit:

In this profile I followed your repro, and did some more movement around
the buffer after.  This isn't from emacs -Q, but I believe the results
will be just the same, considering where the slowness seems to be


       16695  85% - redisplay_internal (C function)
       16695  85%  - jit-lock-function
       16695  85%   - jit-lock-fontify-now
       16695  85%    - jit-lock--run-functions
       16695  85%     - run-hook-wrapped
       16695  85%      - #<compiled -0x156eddb48a262583>
       16695  85%       - font-lock-fontify-region
       16695  85%        - font-lock-default-fontify-region
       16679  84%         - treesit-font-lock-fontify-region
        2080  10%            treesit-buffer-root-node
        2689  13% - command-execute
        2689  13%  - call-interactively
        2380  12%   - funcall-interactively
        1576   8%    - scroll-up-command
        1525   7%     - scroll-up
        1525   7%      - jit-lock-function
        1525   7%       - jit-lock-fontify-now
        1525   7%        - jit-lock--run-functions
        1525   7%         - run-hook-wrapped
        1525   7%          - #<compiled -0x15bd2ea490f7f983>
        1525   7%           - font-lock-fontify-region
        1525   7%            - font-lock-default-fontify-region
        1525   7%               treesit-font-lock-fontify-region
         633   3%    - end-of-buffer
         628   3%     - recenter
         628   3%      - jit-lock-function
         628   3%       - jit-lock-fontify-now
         628   3%        - jit-lock--run-functions
         628   3%         - run-hook-wrapped
         628   3%          - #<compiled -0x14388b9914c40883>
         628   3%           - font-lock-fontify-region
         628   3%            - font-lock-default-fontify-region
         628   3%               treesit-font-lock-fontify-region
           5   0%       push-mark
         128   0%    - project-find-file
         128   0%     - project-find-file-in
          86   0%      - project--read-file-cpd-relative
          86   0%       - project--completing-read-strict
          86   0%        - completing-read
          86   0%         - completing-read-default
          86   0%          - apply
          86   0%           - vertico--advice
          86   0%            - apply
          86   0%             - #<compiled -0x2e553dfe9f75520>
          79   0%              - read-from-minibuffer
          37   0%               - vertico--exhibit
          26   0%                - vertico--update
          22   0%                   redisplay
           4   0%                 - vertico--recompute
           4   0%                  - vertico-sort-history-length-alpha
           4   0%                   - mapcan
           4   0%                    - #<compiled -0x1cada1a01280ac5f>
           4   0%                       sort
          11   0%                - vertico--display-candidates
          11   0%                   vertico--resize-window
          15   0%               - timer-event-handler
          10   0%                - apply
           7   0%                 - battery-update-handler
           7   0%                  - sit-for
           7   0%                   - redisplay
           7   0%                      redisplay_internal (C function)
           3   0%                   #<compiled 0x12c58df73848dc86>
           2   0%               - internal-timer-start-idle
           2   0%                  timerp
           2   0%               - command-execute
           2   0%                - call-interactively
           2   0%                 - funcall-interactively
           2   0%                  - vertico-exit
           2   0%                   - vertico--match-p
           2   0%                    - test-completion
           2   0%                     - #<compiled -0x1464df124877e5c8>
           2   0%                        complete-with-action
          27   0%      - find-file
          27   0%       - find-file-noselect
          24   0%        - find-file-noselect-1
           4   0%         - insert-file-contents
           4   0%          - set-auto-coding
           4   0%           - find-auto-coding
           4   0%              sgml-html-meta-auto-coding-function
           4   0%         - after-find-file
           4   0%          - normal-mode
           4   0%           - set-auto-mode
           4   0%            - set-auto-mode--apply-alist
           4   0%             - set-auto-mode-0
           4   0%              - c-ts-mode
           4   0%                 treesit-ready-p
           3   0%        - find-buffer-visiting
           3   0%           abbreviate-file-name
          15   0%      - project-files
          15   0%       - apply
          15   0%        - #<compiled -0x7a9f28e22b82f80>
          15   0%         - mapcan
          15   0%          - #<compiled 0x14d13416934a6c69>
          15   0%           - project--vc-list-files
          11   0%            - apply
          11   0%             - vc-git--run-command-string
          11   0%              - #<compiled 0x88854d79be8a>
          11   0%               - kill-buffer
          11   0%                - replace-buffer-in-windows
          11   0%                 - unrecord-window-buffer
          11   0%                    assq-delete-all
           4   0%              split-string
           5   0%    - next-line
           5   0%     - line-move
           5   0%        line-move-visual
           4   0%    - execute-extended-command
           4   0%     - command-execute
           4   0%      - call-interactively
           4   0%       - funcall-interactively
           4   0%          profiler-stop
           2   0%    - digit-argument
           2   0%     - universal-argument--mode
           2   0%        set-transient-map
         309   1%   - byte-code
         309   1%    - read-extended-command
         309   1%     - read-extended-command-1
         309   1%      - completing-read
         309   1%       - completing-read-default
         309   1%        - apply
         309   1%         - vertico--advice
         309   1%          - apply
         309   1%           - #<compiled -0x2e553dfe9f75520>
         276   1%            - read-from-minibuffer
         253   1%             - vertico--exhibit
         249   1%              - vertico--update
         240   1%               - vertico--recompute
         236   1%                - vertico--all-completions
         236   1%                 - apply
         236   1%                  - completion-all-completions
         236   1%                   - completion--nth-completion
         236   1%                    - completion--some
         236   1%                     - #<compiled -0x18735a95ea969dbf>
         163   0%                      - completion-basic-all-completions
         163   0%                       - completion-pcm--all-completions
         163   0%                        - all-completions
         163   0%                         - #<compiled -0xf2f3e8a19f62ad2>
         163   0%                          - complete-with-action
           4   0%                           - all-completions
           4   0%                            - #<compiled 0xadd42c29ce50255>
           4   0%                               #<compiled 0x1a1dcc3780af9553>
          73   0%                      - completion-substring-all-completions
          73   0%                       - completion-substring--all-completions
          64   0%                        - completion-pcm--all-completions
          64   0%                         - all-completions
          64   0%                          - #<compiled -0x1464df124877e5c8>
          64   0%                             complete-with-action
           4   0%                - test-completion
           4   0%                 - #<compiled -0xf2f3e8a19f62ad2>
           4   0%                    complete-with-action
           7   0%                 redisplay
           4   0%              - vertico--display-candidates
           4   0%                 vertico--resize-window
           4   0%             - redisplay_internal (C function)
           4   0%              - eval
           4   0%                 unless
         201   1% + timer-event-handler
          50   0% + ...
           4   0%   set-message-functions




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#59415; Package emacs. (Sun, 20 Nov 2022 20:17:02 GMT)

Message #11 received at submit <at> debbugs.gnu.org:

From: Eli Zaretskii <eliz <at> gnu.org>
To: Theodor Thornhill <theo <at> thornhill.no>
Cc: bug-gnu-emacs <at> gnu.org, casouri <at> gmail.com
Subject: Re: 29.0.50; [feature/tree-sitter] c-ts-mode fails to fontify a
 portion of a large C file
Date: Sun, 20 Nov 2022 22:16:25 +0200
> From: Theodor Thornhill <theo <at> thornhill.no>
> Cc: Yuan Fu <casouri <at> gmail.com>
> Date: Sun, 20 Nov 2022 20:54:05 +0100
> 
> > Observe that fontifications stop at this line for some reason.
> > Fontification reappears on line 209271.  Maybe it's because of the many
> > braces that appear in warning face?  Why does TS think there are syntax
> > errors here?  The C++ TS parser doesn't have that problem, btw.
> 
> It seems the c parser definitely can't handle what it's seeing.

Yes, but do you have any clue why it gives up at that line?

One thing that I see is that many braces around there are shown in warning
face, so perhaps the parser is overwhelmed by the amount of parsing errors?

> > P.S. Btw, isn't the treesit-max-buffer-size limit too low?  4 MiB?
> 
> It might be!  IIRC treesit uses 10x the buffer size to store the ast, so
> it'll be some more memory usage.

After lifting the limit to allow visiting the file, this file causes Emacs
to go up to 350 MiB.  Which is significant, but definitely not outrageous
enough to prevent using TS with this file.  And I'm sure "normal" C files
(as opposed to ones written by a program) will need less memory.  So 4 MiB
sounds too restrictive to me.  We should maybe increase that to 15 MiB on
32-bit systems and say 40 MiB on 64-bit?
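As a rough illustration of this proposal, a build could pick its default soft limit from the pointer width. The following C sketch is hypothetical. `proposed_max_buffer_size` and `buffer_within_limit` are illustrative names, not Emacs functions. The 15 MiB / 40 MiB figures are only the numbers suggested above, not what Emacs ships.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* One mebibyte, as a size_t so the products below use size_t.  */
#define MiB ((size_t) 1024 * 1024)

/* Hypothetical defaults from the proposal: 15 MiB for 32-bit builds,
   40 MiB for 64-bit builds.  Both must stay below the C-level hard
   limit of UINT32_MAX.  */
#define MAX_SIZE_32BIT (15 * MiB)
#define MAX_SIZE_64BIT (40 * MiB)
static_assert (MAX_SIZE_64BIT <= UINT32_MAX,
               "soft limit must not exceed the hard limit");

/* Pick the default soft limit from the pointer width of this build.  */
static size_t
proposed_max_buffer_size (void)
{
  return sizeof (void *) <= 4 ? MAX_SIZE_32BIT : MAX_SIZE_64BIT;
}

/* Nonzero if a buffer of SIZE bytes may be parsed under LIMIT.  */
static int
buffer_within_limit (size_t size, size_t limit)
{
  return size <= limit;
}
```

A buffer of 11 MiB (the limit used in the recipe) is rejected by a 4 MiB limit. Either proposed default would accept it.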

> I'll do some more digging, but in the
> meantime I attach this profiler report that shows font-locking as the
> culprit:

Culprit for what?  For slow performance?  Don't get me wrong: from my POV,
TS works here better than CC Mode, in many use cases which are much more
important than scrolling through the entire humongous file top to bottom.
For example, just visiting the file takes 3 times as long with CC Mode as
with c-ts-mode; going to EOB with CC Mode takes more than 1 min 20 sec, whereas
TS does it in 2.5 sec.  And likewise jumping into a random point in the
file.  Instead of Alan's 150 sec for a full scroll by CC Mode I get 27 min.
The number of GC cycles with CC Mode is 10 times as large as with TS.
(Caveat: my Emacs is built without optimizations, whereas Tree-sitter and
the language support libraries are, of course, fully optimized.)

> In this profile I followed your repro, and did some more movement around
> the buffer after.  This isn't from emacs -Q, but I believe the results
> will be just the same, considering where the slowness seems to be
> 
> 
>        16695  85% - redisplay_internal (C function)
>        16695  85%  - jit-lock-function
>        16695  85%   - jit-lock-fontify-now
>        16695  85%    - jit-lock--run-functions
>        16695  85%     - run-hook-wrapped
>        16695  85%      - #<compiled -0x156eddb48a262583>
>        16695  85%       - font-lock-fontify-region
>        16695  85%        - font-lock-default-fontify-region
>        16679  84%         - treesit-font-lock-fontify-region

Yes, treesit-font-lock-fontify-region takes the lion's share.  If you or
Yuan can speed this up, please do.  But I see no reason to consider this a
catastrophe, quite to the contrary.




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#59415; Package emacs. (Sun, 20 Nov 2022 20:18:01 GMT)

Message #14 received at submit <at> debbugs.gnu.org:

From: Theodor Thornhill <theo <at> thornhill.no>
To: Eli Zaretskii <eliz <at> gnu.org>, bug-gnu-emacs <at> gnu.org
Cc: Yuan Fu <casouri <at> gmail.com>
Subject: Re: 29.0.50; [feature/tree-sitter] c-ts-mode fails to fontify a
 portion of a large C file
Date: Sun, 20 Nov 2022 21:17:20 +0100
>
> P.S. Btw, isn't the treesit-max-buffer-size limit too low?  4 MiB?

I thought it was supposed to be 4 gigs, as seen in this function:

```
static void
treesit_check_buffer_size (struct buffer *buffer)
{
  ptrdiff_t buffer_size = (BUF_Z (buffer) - BUF_BEG (buffer));
  if (buffer_size > UINT32_MAX)
    xsignal2 (Qtreesit_buffer_too_large,
	      build_pure_c_string ("Buffer size cannot be larger than 4GB"),
	      make_fixnum (buffer_size));
}
```

So my guess is that that is a typo, and should be


(defcustom treesit-max-buffer-size (* 4 1024 1024 1024)
  "Maximum buffer size for enabling tree-sitter parsing (in bytes)."
  :type 'integer
  :version "29.1")

or something like that :-)
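The two checks under discussion can be modelled as two limits. The hard limit is the C-level `UINT32_MAX` check; tree-sitter reports byte offsets as `uint32_t`. The soft limit is what `treesit-max-buffer-size` would enforce. This is a sketch, not Emacs code, and `classify_buffer_size` and the enum are made-up names. It makes one detail visible: `(* 4 1024 1024 1024)` is 4294967296, exactly one more than `UINT32_MAX`.

```c
#include <assert.h>
#include <stdint.h>

/* (* 4 1024 1024 1024) from the suggested defcustom.  */
#define FOUR_GIB (4ULL * 1024 * 1024 * 1024)

/* 4 GiB is one more than the largest size the C-level check accepts.  */
static_assert (FOUR_GIB == (unsigned long long) UINT32_MAX + 1,
               "4 GiB is UINT32_MAX + 1");

enum size_verdict
{
  SIZE_OK,              /* The buffer may be parsed.  */
  SIZE_OVER_SOFT_LIMIT, /* Rejected by the user option.  */
  SIZE_OVER_HARD_LIMIT  /* Rejected by the C-level check.  */
};

/* Classify a buffer of SIZE bytes against the hard limit (UINT32_MAX)
   and a user-configurable SOFT_LIMIT.  The hard check runs first.  */
static enum size_verdict
classify_buffer_size (unsigned long long size, unsigned long long soft_limit)
{
  if (size > UINT32_MAX)
    return SIZE_OVER_HARD_LIMIT;
  if (size > soft_limit)
    return SIZE_OVER_SOFT_LIMIT;
  return SIZE_OK;
}
```

With a 4 GiB soft limit, the hard check already rejects every size the soft limit would reject. A soft limit only does useful work when it is set well below `UINT32_MAX`.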

Theo




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#59415; Package emacs. (Sun, 20 Nov 2022 20:34:02 GMT)

Message #17 received at submit <at> debbugs.gnu.org:

From: Theodor Thornhill <theo <at> thornhill.no>
To: Eli Zaretskii <eliz <at> gnu.org>
Cc: bug-gnu-emacs <at> gnu.org, casouri <at> gmail.com
Subject: Re: 29.0.50; [feature/tree-sitter] c-ts-mode fails to fontify a
 portion of a large C file
Date: Sun, 20 Nov 2022 21:33:06 +0100
Eli Zaretskii <eliz <at> gnu.org> writes:

>> From: Theodor Thornhill <theo <at> thornhill.no>
>> Cc: Yuan Fu <casouri <at> gmail.com>
>> Date: Sun, 20 Nov 2022 20:54:05 +0100
>> 
>> > Observe that fontifications stop at this line for some reason.
>> > Fontification reappears on line 209271.  Maybe it's because of the many
>> > braces that appear in warning face?  Why does TS think there are syntax
>> > errors here?  The C++ TS parser doesn't have that problem, btw.
>> 
>> It seems the c parser definitely can't handle what it's seeing.
>
> Yes, but do you have any clue why it gives up at that line?
>

No, not yet.


> One thing that I see is that many braces around there are shown in warning
> face, so perhaps the parser is overwhelmed by the amount of parsing errors?
>

Yeah that's my first guess, but that shouldn't be an issue, it should be
able to font-lock _something_.

>> > P.S. Btw, isn't the treesit-max-buffer-size limit too low?  4 MiB?
>> 
>> It might be!  IIRC treesit uses 10x the buffer size to store the ast, so
>> it'll be some more memory usage.
>
> After lifting the limit to allow visiting the file, this file causes Emacs
> to go up to 350 MiB.  Which is significant, but definitely not outrageous
> enough to prevent using TS with this file.  And I'm sure "normal" C files
> (as opposed to ones written by a program) will need less memory.  So 4 MiB
> sounds too restrictive to me.  We should maybe increase that to 15 MiB on
> 32-bit systems and say 40 MiB on 64-bit?
>

I think it should probably be the same as in the C level, as I mentioned
in the other mail?

>> I'll do some more digging, but in the
>> meantime I attach this profiler report that shows font-locking as the
>> culprit:
>
> Culprit for what?  For slow performance?

Yeah.

> Don't get me wrong: from my POV, TS works here better than CC Mode, in
> many use cases which are much more important than scrolling through
> the entire humongous file top to bottom.  For example, just visiting
> the file takes 3 times as long with CC Mode as with c-ts-mode; going
> to EOB with CC Mode takes more than 1 min 20 sec, whereas TS does it in 2.5
> sec.  And likewise jumping into a random point in the file.  Instead
> of Alan's 150 sec for a full scroll by CC Mode I get 27 min.  The
> number of GC cycles with CC Mode is 10 times as large as with TS.
> (Caveat: my Emacs is built without optimizations, whereas Tree-sitter
> and the language support libraries are, of course, fully optimized.)
>

Ok, that's good to know!

>> In this profile I followed your repro, and did some more movement around
>> the buffer after.  This isn't from emacs -Q, but I believe the results
>> will be just the same, considering where the slowness seems to be
>> 
>> 
>>        16695  85% - redisplay_internal (C function)
>>        16695  85%  - jit-lock-function
>>        16695  85%   - jit-lock-fontify-now
>>        16695  85%    - jit-lock--run-functions
>>        16695  85%     - run-hook-wrapped
>>        16695  85%      - #<compiled -0x156eddb48a262583>
>>        16695  85%       - font-lock-fontify-region
>>        16695  85%        - font-lock-default-fontify-region
>>        16679  84%         - treesit-font-lock-fontify-region
>
> Yes, treesit-font-lock-fontify-region takes the lion's share.  If you or
> Yuan can speed this up, please do.  But I see no reason to consider this a
> catastrophe, quite to the contrary.

I think it boils down to getting the root too many times.  In an
unmodified buffer I think getting the root node should be instant, and
it seems to take some time.  I'll try to figure out why.

Theo




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#59415; Package emacs. (Sun, 20 Nov 2022 20:53:02 GMT)

Message #20 received at submit <at> debbugs.gnu.org:

From: Theodor Thornhill <theo <at> thornhill.no>
To: Eli Zaretskii <eliz <at> gnu.org>
Cc: bug-gnu-emacs <at> gnu.org, casouri <at> gmail.com
Subject: Re: 29.0.50; [feature/tree-sitter] c-ts-mode fails to fontify a
 portion of a large C file
Date: Sun, 20 Nov 2022 21:51:51 +0100
Theodor Thornhill <theo <at> thornhill.no> writes:

> Eli Zaretskii <eliz <at> gnu.org> writes:
>
>>> From: Theodor Thornhill <theo <at> thornhill.no>
>>> Cc: Yuan Fu <casouri <at> gmail.com>
>>> Date: Sun, 20 Nov 2022 20:54:05 +0100
>>> 
>>> > Observe that fontifications stop at this line for some reason.
>>> > Fontification reappears on line 209271.  Maybe it's because of the many
>>> > braces that appear in warning face?  Why does TS think there are syntax
>>> > errors here?  The C++ TS parser doesn't have that problem, btw.
>>> 
>>> It seems the c parser definitely can't handle what it's seeing.
>>
>> Yes, but do you have any clue why it gives up at that line?
>>
>
> No, not yet.
>
>

This diff fixes the font-lock issues:

diff --git a/lisp/treesit.el b/lisp/treesit.el
index 674c984dfe..0f84d8b83e 100644
--- a/lisp/treesit.el
+++ b/lisp/treesit.el
@@ -774,12 +774,12 @@ treesit-font-lock-fontify-region
       ;; will give you that quote node.  We want to capture the string
       ;; and apply string face to it, but querying on the quote node
       ;; will not give us the string node.
-      (when-let ((root (treesit-buffer-root-node language))
+      (when-let (
                  ;; Only activate if ENABLE flag is t.
                  (activate (eq t enable)))
         (ignore activate)
         (let ((captures (treesit-query-capture
-                         root query start end))
+                         (treesit-node-on start end) query start end))
               (inhibit-point-motion-hooks t))
           (with-silent-modifications
             (dolist (capture captures)


However, the comment right above makes a case for why we should have
this.  BUT, is this still relevant, Yuan, after the changes in treesit
reporting what has changed etc.?  In what exact case is that an issue?  And
is it more severe than the behavior this bug is exhibiting?
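The trade-off in this patch can be shown with a toy tree. The sketch below is not tree-sitter's API; all three names are hypothetical stand-ins:

- `struct node` is a toy node type.
- `smallest_node_on` roughly mimics `treesit-node-on`.
- `count_captures` stands in for a query that captures string literals.

Suppose the region covers only a string's opening quote. Then the smallest covering node is the quote itself, and a query started there never sees the enclosing string node. That is the situation the comment in `treesit-font-lock-fontify-region` describes.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Toy syntax-tree node covering bytes [start, end).  Not tree-sitter's
   TSNode; just enough structure to illustrate the argument.  */
struct node
{
  const char *type;
  unsigned start, end;
  size_t nchildren;
  const struct node *children[4];
};

/* Descend from ROOT to the smallest node whose range contains
   [start, end), roughly like treesit-node-on.  */
static const struct node *
smallest_node_on (const struct node *root, unsigned start, unsigned end)
{
  assert (root->start <= start && end <= root->end);
  for (size_t i = 0; i < root->nchildren; i++)
    {
      const struct node *c = root->children[i];
      if (c->start <= start && end <= c->end)
        return smallest_node_on (c, start, end);
    }
  return root;
}

/* Count nodes of TYPE in the subtree rooted at N that overlap
   [start, end); a stand-in for running a capture query from N.  */
static int
count_captures (const struct node *n, const char *type,
                unsigned start, unsigned end)
{
  int count = 0;
  if (n->start < end && start < n->end && strcmp (n->type, type) == 0)
    count++;
  for (size_t i = 0; i < n->nchildren; i++)
    count += count_captures (n->children[i], type, start, end);
  return count;
}
```

Consider a string spanning [5, 15) with the region [5, 6). Querying from the root finds the `string_literal`. Querying from `smallest_node_on (&root, 5, 6)`, which is the quote node, finds nothing.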




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#59415; Package emacs. (Sun, 20 Nov 2022 21:00:02 GMT)

Message #23 received at submit <at> debbugs.gnu.org:

From: Yuan Fu <casouri <at> gmail.com>
To: Theodor Thornhill <theo <at> thornhill.no>
Cc: Eli Zaretskii <eliz <at> gnu.org>, Bug Report Emacs <bug-gnu-emacs <at> gnu.org>
Subject: Re: 29.0.50; [feature/tree-sitter] c-ts-mode fails to fontify a
 portion of a large C file
Date: Sun, 20 Nov 2022 12:59:42 -0800

> On Nov 20, 2022, at 12:33 PM, Theodor Thornhill <theo <at> thornhill.no> wrote:
> 
> Eli Zaretskii <eliz <at> gnu.org> writes:
> 
>>> From: Theodor Thornhill <theo <at> thornhill.no>
>>> Cc: Yuan Fu <casouri <at> gmail.com>
>>> Date: Sun, 20 Nov 2022 20:54:05 +0100
>>> 
>>>> Observe that fontifications stop at this line for some reason.
>>>> Fontification reappears on line 209271.  Maybe it's because of the many
>>>> braces that appear in warning face?  Why does TS think there are syntax
>>>> errors here?  The C++ TS parser doesn't have that problem, btw.
>>> 
>>> It seems the c parser definitely can't handle what it's seeing.
>> 
>> Yes, but do you have any clue why it gives up at that line?
>> 
> 
> No, not yet.

Because the whole thing is contained in an ERROR node. It wasn’t covered in error face because our rule for error doesn’t “override”: if there are existing faces in the range, the error face isn’t applied. If I change the rule fontifying errors to override, everything is in error face. Alternatively, if you disable fontifying errors, like this:

(add-hook 'c-ts-mode-hook #'c-ts-setup)
(defun c-ts-setup ()
  (treesit-font-lock-recompute-features nil '(error)))
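The non-overriding behaviour described above can be modelled in a few lines. With override off, a face is applied to a range only if no character in it has a face yet. With override on, the new face replaces whatever is there. This is a toy sketch; the `face` enum and `apply_face` are illustrative, not Emacs internals.

```c
#include <assert.h>
#include <stddef.h>

/* Faces for the toy model; FACE_NONE means "not fontified".  */
enum face { FACE_NONE, FACE_KEYWORD, FACE_ERROR };

/* Apply FACE to FACES[START..END).  Without OVERRIDE, leave the whole
   range alone if any character in it already has a face; with
   OVERRIDE, replace whatever is there.  */
static void
apply_face (enum face *faces, size_t start, size_t end,
            enum face face, int override)
{
  assert (start <= end);
  if (!override)
    for (size_t i = start; i < end; i++)
      if (faces[i] != FACE_NONE)
        return;
  for (size_t i = start; i < end; i++)
    faces[i] = face;
}
```

Take a large ERROR node whose range already contains some fontified text. Without override it is skipped entirely. With override the whole range ends up in the error face.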

> 
> 
>> One thing that I see is that many braces around there are shown in warning
>> face, so perhaps the parser is overwhelmed by the amount of parsing errors?
>> 
> 
> Yeah that's my first guess, but that shouldn't be an issue, it should be
> able to font-lock _something_.

Yeah, see above.

> 
>>>> P.S. Btw, isn't the treesit-max-buffer-size limit too low?  4 MiB?
>>> 
>>> It might be!  IIRC treesit uses 10x the buffer size to store the ast, so
>>> it'll be some more memory usage.
>> 
>> After lifting the limit to allow visiting the file, this file causes Emacs
>> to go up to 350 MiB.  Which is significant, but definitely not outrageous
>> enough to prevent using TS with this file.  And I'm sure "normal" C files
>> (as opposed to ones written by a program) will need less memory.  So 4 MiB
>> sounds too restrictive to me.  We should maybe increase that to 15 MiB on
>> 32-bit systems and say 40 MiB on 64-bit?
>> 
> 
> I think it should probably be the same as in the C level, as I mentioned
> in the other mail?

4GB is the absolute upper limit, but the practical maximum size is well below that. Still, 4MB might be too conservative.

> 
>>> I'll do some more digging, but in the
>>> meantime I attach this profiler report that shows font-locking as the
>>> culprit:
>> 
>> Culprit for what?  For slow performance?
> 
> Yeah.
> 
>> Don't get me wrong: from my POV, TS works here better than CC Mode, in
>> many use cases which are much more important than scrolling through
>> the entire humongous file top to bottom.  For example, just visiting
>> the file takes 3 times as long with CC Mode as with c-ts-mode; going
>> to EOB with CC Mode takes more than 1 min 20 sec, whereas TS does it in 2.5
>> sec.  And likewise jumping into a random point in the file.  Instead
>> of Alan's 150 sec for a full scroll by CC Mode I get 27 min.  The
>> number of GC cycles with CC Mode is 10 times as large as with TS.
>> (Caveat: my Emacs is built without optimizations, whereas Tree-sitter
>> and the language support libraries are, of course, fully optimized.)
>> 
> 
> Ok, that's good to know!
> 
>>> In this profile I followed your repro, and did some more movement around
>>> the buffer after.  This isn't from emacs -Q, but I believe the results
>>> will be just the same, considering where the slowness seems to be
>>> 
>>> 
>>>       16695  85% - redisplay_internal (C function)
>>>       16695  85%  - jit-lock-function
>>>       16695  85%   - jit-lock-fontify-now
>>>       16695  85%    - jit-lock--run-functions
>>>       16695  85%     - run-hook-wrapped
>>>       16695  85%      - #<compiled -0x156eddb48a262583>
>>>       16695  85%       - font-lock-fontify-region
>>>       16695  85%        - font-lock-default-fontify-region
>>>       16679  84%         - treesit-font-lock-fontify-region
>> 
>> Yes, treesit-font-lock-fontify-region takes the lion's share.  If you or
>> Yuan can speed this up, please do.  But I see no reason to consider this a
>> catastrophe, quite to the contrary.
> 
> I think it boils down to getting the root too many times.  In an
> unmodified buffer I think getting the root node should be instant, and
> it seems to take some time.  I'll try to figure out why.

Getting the root is trivial; the bulk of the time is spent in treesit-query-capture.

Running the following in that file gives me 1.87 seconds, while in a smaller file it only takes 0.00016 seconds.

(benchmark-run 100
  (let ((query (caar treesit-font-lock-settings))
        (root (treesit-buffer-root-node)))
    (treesit-query-capture root query 7700472 7703604)))

> This diff fixes the font-lock issues:
> 
> diff --git a/lisp/treesit.el b/lisp/treesit.el
> index 674c984dfe..0f84d8b83e 100644
> --- a/lisp/treesit.el
> +++ b/lisp/treesit.el
> @@ -774,12 +774,12 @@ treesit-font-lock-fontify-region
>       ;; will give you that quote node.  We want to capture the string
>       ;; and apply string face to it, but querying on the quote node
>       ;; will not give us the string node.
> -      (when-let ((root (treesit-buffer-root-node language))
> +      (when-let (
>                  ;; Only activate if ENABLE flag is t.
>                  (activate (eq t enable)))
>         (ignore activate)
>         (let ((captures (treesit-query-capture
> -                         root query start end))
> +                         (treesit-node-on start end) query start end))
>               (inhibit-point-motion-hooks t))
>           (with-silent-modifications
>             (dolist (capture captures)
> 
> 
> However, the comment right above makes a case for why we should have
> this.  BUT, is this still relevant, Yuan, after the changes in treesit
> reporting what has changed etc.?  In what exact case is that an issue?  And
> is it more severe than the behavior this bug is exhibiting?

The case described by the comment is still relevant. With this patch, the quote described in that case still wouldn’t be fontified. We can use some heuristic to get a node that is “large enough” but is not the root node, e.g., find some top-level node. That should make query-capture much faster.
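A minimal sketch of the “top-level node” idea, assuming the Emacs 29 treesit API (`treesit-node-at`, `treesit-node-parent`): start from the node at a position and climb until the next parent would be the root. The name `my/treesit-top-level-node-at` is hypothetical and not part of treesit.el.

```elisp
;; Hypothetical helper (not in treesit.el): return the child of the
;; root node that contains POS, i.e. the "top-level" node there.
(defun my/treesit-top-level-node-at (pos)
  (let* ((node (treesit-node-at pos))
         (parent (and node (treesit-node-parent node))))
    ;; Climb while PARENT itself still has a parent, i.e. while
    ;; PARENT is not yet the root node.
    (while (and parent (treesit-node-parent parent))
      (setq node parent
            parent (treesit-node-parent parent)))
    node))
```

Passing this node to `treesit-query-capture` instead of the root only helps if top-level nodes are reasonably small, and it covers only the top-level node containing POS; a region spanning several top-level nodes would still need a larger node.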

Yuan



Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#59415; Package emacs. (Sun, 20 Nov 2022 21:10:02 GMT) Full text and rfc822 format available.

Message #26 received at submit <at> debbugs.gnu.org (full text, mbox):

From: Theodor Thornhill <theo <at> thornhill.no>
To: Yuan Fu <casouri <at> gmail.com>
Cc: Eli Zaretskii <eliz <at> gnu.org>, Bug Report Emacs <bug-gnu-emacs <at> gnu.org>
Subject: Re: 29.0.50; [feature/tree-sitter] c-ts-mode fails to fontify a
 portion of a large C file
Date: Sun, 20 Nov 2022 22:09:37 +0100
>> This diff fixes the font-lock issues:
>> 
>> diff --git a/lisp/treesit.el b/lisp/treesit.el
>> index 674c984dfe..0f84d8b83e 100644
>> --- a/lisp/treesit.el
>> +++ b/lisp/treesit.el
>> @@ -774,12 +774,12 @@ treesit-font-lock-fontify-region
>>       ;; will give you that quote node.  We want to capture the string
>>       ;; and apply string face to it, but querying on the quote node
>>       ;; will not give us the string node.
>> -      (when-let ((root (treesit-buffer-root-node language))
>> +      (when-let (
>>                  ;; Only activate if ENABLE flag is t.
>>                  (activate (eq t enable)))
>>         (ignore activate)
>>         (let ((captures (treesit-query-capture
>> -                         root query start end))
>> +                         (treesit-node-on start end) query start end))
>>               (inhibit-point-motion-hooks t))
>>           (with-silent-modifications
>>             (dolist (capture captures)
>> 
>> 
>> However, the comment right above makes a case for why we should have
>> this.  BUT, is this still relevant, Yuan, after the changes in treesit
>> reporting what has changed etc.?  In what exact case is that an issue?  And
>> is it more severe than the behavior this bug is exhibiting?
>
> The case described by the comment is still relevant. With this patch,
> the quote described in that case still wouldn’t be fontified. We can
> use some heuristic to get a node “large enough” and not the root
> node. E.g., find some top-level node. That should make query-capture
> much faster.
>

I appreciate the explanation.  I think getting the root is a bit
excessive.  I got the same results as you in the capture.  Maybe reuse
the treesit-defun-type-regexp, and default to root if none found?




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#59415; Package emacs. (Sun, 20 Nov 2022 21:28:01 GMT) Full text and rfc822 format available.

Message #29 received at submit <at> debbugs.gnu.org (full text, mbox):

From: Yuan Fu <casouri <at> gmail.com>
To: Theodor Thornhill <theo <at> thornhill.no>
Cc: Eli Zaretskii <eliz <at> gnu.org>, Bug Report Emacs <bug-gnu-emacs <at> gnu.org>
Subject: Re: 29.0.50; [feature/tree-sitter] c-ts-mode fails to fontify a
 portion of a large C file
Date: Sun, 20 Nov 2022 13:27:04 -0800

> On Nov 20, 2022, at 1:09 PM, Theodor Thornhill <theo <at> thornhill.no> wrote:
> 
>>> This diff fixes the font-lock issues:
>>> 
>>> diff --git a/lisp/treesit.el b/lisp/treesit.el
>>> index 674c984dfe..0f84d8b83e 100644
>>> --- a/lisp/treesit.el
>>> +++ b/lisp/treesit.el
>>> @@ -774,12 +774,12 @@ treesit-font-lock-fontify-region
>>>      ;; will give you that quote node.  We want to capture the string
>>>      ;; and apply string face to it, but querying on the quote node
>>>      ;; will not give us the string node.
>>> -      (when-let ((root (treesit-buffer-root-node language))
>>> +      (when-let (
>>>                 ;; Only activate if ENABLE flag is t.
>>>                 (activate (eq t enable)))
>>>        (ignore activate)
>>>        (let ((captures (treesit-query-capture
>>> -                         root query start end))
>>> +                         (treesit-node-on start end) query start end))
>>>              (inhibit-point-motion-hooks t))
>>>          (with-silent-modifications
>>>            (dolist (capture captures)
>>> 
>>> 
>>> However, the comment right above makes a case for why we should have
>>> this.  BUT, is this still relevant, Yuan, after the changes in treesit
>>> reporting what has changed etc.?  In what exact case is that an issue?  And
>>> is it more severe than the behavior this bug is exhibiting?
>> 
>> The case described by the comment is still relevant. With this patch,
>> the quote described in that case still wouldn’t be fontified. We can
>> use some heuristic to get a node “large enough” and not the root
>> node. E.g., find some top-level node. That should make query-capture
>> much faster.
>> 
> 
> I appreciate the explanation.  I think getting the root is a bit
> excessive.  I got the same results as you in the capture.  Maybe reuse
> the treesit-defun-type-regexp, and default to root if none found?

I tried the “top-level node” approach, and it didn’t help in packet-rrc.c: the top-level node (a function definition) is still too large (it spans 7680306-9936062). Since the case I described in the comment against using treesit-node-on is the exception rather than the norm, maybe we can go the other way around: use treesit-node-on first, and if the node seems too small (by some heuristic), enlarge it to some degree.
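A sketch of the “treesit-node-on first, then enlarge” heuristic, assuming the Emacs 29 treesit API. The size threshold and both names below are hypothetical; “too small” is measured here simply as the number of characters a node spans.

```elisp
;; Hypothetical threshold, in characters: keep enlarging the node
;; until it spans at least this much text.  The value is a guess.
(defvar my/treesit-query-node-min-size 1000
  "Minimum span of the node used for font-lock queries.")

(defun my/treesit-node-for-query (start end)
  "Return a node covering START..END to run font-lock queries on.
Start from the smallest node covering the region and climb to its
ancestors only while it spans fewer than
`my/treesit-query-node-min-size' characters."
  (let ((node (treesit-node-on start end)))
    (while (and node
                (< (- (treesit-node-end node) (treesit-node-start node))
                   my/treesit-query-node-min-size)
                (treesit-node-parent node))
      (setq node (treesit-node-parent node)))
    node))
```

In treesit-font-lock-fontify-region the call would then look like `(treesit-query-capture (my/treesit-node-for-query start end) query start end)`. The intent is that a tiny node, such as the lone quote from the comment in that function, gets enlarged to an ancestor that includes the whole string, while a normal region stays close to its smallest covering node.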

Yuan



Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#59415; Package emacs. (Sun, 20 Nov 2022 21:57:02 GMT) Full text and rfc822 format available.

Message #32 received at submit <at> debbugs.gnu.org (full text, mbox):

From: Theodor Thornhill <theo <at> thornhill.no>
To: Yuan Fu <casouri <at> gmail.com>
Cc: Eli Zaretskii <eliz <at> gnu.org>, Bug Report Emacs <bug-gnu-emacs <at> gnu.org>
Subject: Re: 29.0.50; [feature/tree-sitter] c-ts-mode fails to fontify a
 portion of a large C file
Date: Sun, 20 Nov 2022 22:56:12 +0100
>> 
>> I appreciate the explanation.  I think getting the root is a bit
>> excessive.  I got the same results as you in the capture.  Maybe reuse
>> the treesit-defun-type-regexp, and default to root if none found?
>
> I tried the “top-level node” approach, and it didn’t help in
> packet-rrc.c: the top-level node (a function definition) is still too
> large (spans 7680306-9936062). Since the case I described in the
> comment against using treesit-node-on is the exception rather than the
> norm, maybe we can go the other way around: use treesit-node-on first,
> and if the node seems too small (by some heuristic), enlarge it to
> some degree.
>

Makes sense!

BTW, should the chunk-size of jit-lock be up for discussion again?  I
ran the benchmarks from this thread [0] on this file, and it seems like
increasing the chunk-size from 1500 to 4500, in increments of 500, brings
the average down from 2 seconds to 1.65 seconds.
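A sketch of how such a chunk-size sweep might be scripted. It assumes a scroll benchmark command such as `scroll-up-benchmark` (posted later in this thread) is already defined and reports its timing via `message`; `my/jit-lock-chunk-size-sweep` is a hypothetical name.

```elisp
(require 'jit-lock)

;; Sketch: re-run a scroll benchmark once per chunk size.
;; Assumes `scroll-up-benchmark' is defined; results go to *Messages*.
(defun my/jit-lock-chunk-size-sweep ()
  "Run `scroll-up-benchmark' with chunk sizes 1500, 2000, ..., 4500."
  (interactive)
  (dolist (size (number-sequence 1500 4500 500))
    ;; `jit-lock-chunk-size' is a special variable, so this binding
    ;; is seen by the redisplays triggered inside the benchmark.
    (let ((jit-lock-chunk-size size))
      ;; Discard existing fontification so each run starts cold.
      (font-lock-flush)
      (goto-char (point-min))
      (set-window-start nil (point-min))
      (message "jit-lock-chunk-size: %d" size)
      (scroll-up-benchmark))))
```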

The density of that file absolutely is a concern performance-wise.

Theo


[0]: https://lists.gnu.org/archive/html/emacs-devel/2021-09/msg00538.html




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#59415; Package emacs. (Mon, 21 Nov 2022 01:28:02 GMT) Full text and rfc822 format available.

Message #35 received at 59415 <at> debbugs.gnu.org (full text, mbox):

From: Yuan Fu <casouri <at> gmail.com>
To: Theodor Thornhill <theo <at> thornhill.no>
Cc: Eli Zaretskii <eliz <at> gnu.org>, 59415 <at> debbugs.gnu.org
Subject: Re: bug#59415: 29.0.50; [feature/tree-sitter] c-ts-mode fails to
 fontify a portion of a large C file
Date: Sun, 20 Nov 2022 17:27:16 -0800

> On Nov 20, 2022, at 1:56 PM, Theodor Thornhill via Bug reports for GNU Emacs, the Swiss army knife of text editors <bug-gnu-emacs <at> gnu.org> wrote:
> 
>>> 
>>> I appreciate the explanation.  I think getting the root is a bit
>>> excessive.  I got the same results as you in the capture.  Maybe reuse
>>> the treesit-defun-type-regexp, and default to root if none found?
>> 
>> I tried the “top-level node” approach, and it didn’t help in
>> packet-rrc.c: the top-level node (a function definition) is still too
>> large (spans 7680306-9936062). Since the case I described in the
>> comment against using treesit-node-on is the exception rather than the
>> norm, maybe we can go the other way around: use treesit-node-on first,
>> and if the node seems too small (by some heuristic), enlarge it to
>> some degree.
>> 
> 
> Makes sense!

I pushed a change that uses treesit-node-on. Now scrolling in most parts of the buffer is pretty fast. Scrolling around line 194770 is still laggy, because the node we get from treesit-node-on is still too large. I tried some heuristics but they didn’t work very well, IMO because tree-sitter couldn’t parse that part of the code very well. The code should follow a structure like {{}, {}, {}, {}, {}, …} where there are tens of thousands of inner brackets, so ideally we only need to grab the {}’s in the region we want to fontify. But tree-sitter seems to understand it as some weird structure, and we still end up with very large nodes, far larger than the region we want to fontify and slow to query.
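To see this effect directly, one could compare the node that treesit-node-on returns for the visible window region with the size of that region. A minimal diagnostic sketch, assuming the Emacs 29 treesit API; the command name is hypothetical. If the analysis above is right, near line 194770 of packet-rrc.c it should report a node much larger than the window.

```elisp
;; Diagnostic sketch: compare the node returned by `treesit-node-on'
;; for the visible region with the size of the region itself.
(defun my/treesit-report-query-node ()
  "Show how large the node covering the visible window region is."
  (interactive)
  (let* ((start (window-start))
         (end (window-end nil t))
         (node (treesit-node-on start end)))
    (message "%s node spans %d-%d (%d chars) for a %d-char region"
             (treesit-node-type node)
             (treesit-node-start node)
             (treesit-node-end node)
             (- (treesit-node-end node) (treesit-node-start node))
             (- end start))))
```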

I’ll try to improve it further in the future, but for now I think it’s good enough (because in most cases fontification is pretty fast).

Also I think we should probably disable fontifying errors in C. C’s macros just create too many errors.

Yuan



Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#59415; Package emacs. (Mon, 21 Nov 2022 11:01:02 GMT) Full text and rfc822 format available.

Message #38 received at 59415 <at> debbugs.gnu.org (full text, mbox):

From: Theodor Thornhill <theo <at> thornhill.no>
To: Yuan Fu <casouri <at> gmail.com>
Cc: Eli Zaretskii <eliz <at> gnu.org>, 59415 <at> debbugs.gnu.org
Subject: Re: bug#59415: 29.0.50; [feature/tree-sitter] c-ts-mode fails to
 fontify a portion of a large C file
Date: Mon, 21 Nov 2022 12:00:37 +0100
Yuan Fu <casouri <at> gmail.com> writes:

>> On Nov 20, 2022, at 1:56 PM, Theodor Thornhill via Bug reports for GNU Emacs, the Swiss army knife of text editors <bug-gnu-emacs <at> gnu.org> wrote:
>> 
>>>> 
>>>> I appreciate the explanation.  I think getting the root is a bit
>>>> excessive.  I got the same results as you in the capture.  Maybe reuse
>>>> the treesit-defun-type-regexp, and default to root if none found?
>>> 
>>> I tried the “top-level node” approach, and it didn’t help in
>>> packet-rrc.c: the top-level node (a function definition) is still too
>>> large (spans 7680306-9936062). Since the case I described in the
>>> comment against using treesit-node-on is the exception rather than the
>>> norm, maybe we can go the other way around: use treesit-node-on first,
>>> and if the node seems too small (by some heuristic), enlarge it to
>>> some degree.
>>> 
>> 
>> Makes sense!
>
> I pushed a change that uses treesit-node-on. Now scrolling in most
> parts of the buffer is pretty fast. Scrolling around line 194770 is
> still laggy, because the node we get from treesit-node-on is still too
> large. I tried some heuristics but they didn’t work very well, IMO
> because tree-sitter couldn’t parse that part of the code very
> well. The code should follow a structure like {{}, {}, {}, {}, {}, …}
> where there are tens of thousands of inner brackets, so ideally we only
> need to grab the {}’s in the region we want to fontify. But
> tree-sitter seems to understand it as some weird structure, and we
> still end up with very large nodes, far larger than the region we
> want to fontify and slow to query.
>
> I’ll try to improve it further in the future, but for now I think it’s
> good enough (because in most cases fontification is pretty fast).
>
> Also I think we should probably disable fontifying errors in C. C’s
> macros just create too many errors.


Good job.  I ran this in both c-ts-mode and c-mode on the same file:

(defun scroll-up-benchmark ()
  (interactive)
  (let ((oldgc gcs-done)
        (oldtime (float-time)))
    (condition-case nil (while t (scroll-up) (redisplay))
      (error (message "GCs: %d Elapsed time: %f seconds"
                      (- gcs-done oldgc) (- (float-time) oldtime))))))


c-ts-mode: GCs: 87 Elapsed time: 135.700742 seconds

c-mode: GCs: 224 Elapsed time: 133.329396 seconds

Font locking seems correct too.

Theo




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#59415; Package emacs. (Mon, 21 Nov 2022 12:42:01 GMT) Full text and rfc822 format available.

Message #41 received at submit <at> debbugs.gnu.org (full text, mbox):

From: Eli Zaretskii <eliz <at> gnu.org>
To: Theodor Thornhill <theo <at> thornhill.no>
Cc: casouri <at> gmail.com, bug-gnu-emacs <at> gnu.org
Subject: Re: 29.0.50; [feature/tree-sitter] c-ts-mode fails to fontify a
 portion of a large C file
Date: Mon, 21 Nov 2022 14:41:11 +0200
> From: Theodor Thornhill <theo <at> thornhill.no>
> Cc: Eli Zaretskii <eliz <at> gnu.org>, Bug Report Emacs <bug-gnu-emacs <at> gnu.org>
> Date: Sun, 20 Nov 2022 22:56:12 +0100
> 
> > I tried the “top-level node” approach, and it didn’t help in
> > packet-rrc.c: the top-level node (a function definition) is still too
> > large (spans 7680306-9936062). Since the case I described in the
> > comment against using treesit-node-on is the exception rather than the
> > norm, maybe we can go the other way around: use treesit-node-on first,
> > and if the node seems too small (by some heuristic), enlarge it to
> > some degree.
> >
> 
> Makes sense!
> 
> BTW, should the chunk-size of jit-lock be up for discussion again?  I
> ran the benchmarks from this thread [0] on this file, and it seems like
> increasing the chunk-size from 1500 to 4500, in increments of 500, brings
> the average down from 2 seconds to 1.65 seconds.
> 
> The density of that file absolutely is a concern performance-wise.

FWIW, if the root cause is the humongous data structure, I'm not too
worried, because such cases are extremely rare.  If some clever idea arises
that could improve things without endangering more practical use cases, then
fine; otherwise, I'm okay with the slightly slower performance in these
extreme cases -- after all, the interactive responsiveness is not that bad.

But I still don't understand why fontification stopped _completely_
starting at that line.  That is, if the entire struct is in error, why is
most of it fontified, while only the last part isn't?  What is the
mechanism that causes that?

Thanks.




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#59415; Package emacs. (Mon, 21 Nov 2022 13:45:02 GMT) Full text and rfc822 format available.

Message #44 received at 59415 <at> debbugs.gnu.org (full text, mbox):

From: Eli Zaretskii <eliz <at> gnu.org>
To: Yuan Fu <casouri <at> gmail.com>
Cc: 59415 <at> debbugs.gnu.org, theo <at> thornhill.no
Subject: Re: bug#59415: 29.0.50; [feature/tree-sitter] c-ts-mode fails to
 fontify a portion of a large C file
Date: Mon, 21 Nov 2022 15:44:34 +0200
> From: Yuan Fu <casouri <at> gmail.com>
> Date: Sun, 20 Nov 2022 17:27:16 -0800
> Cc: Eli Zaretskii <eliz <at> gnu.org>,
>  59415 <at> debbugs.gnu.org
> 
> Also I think we should probably disable fontifying errors in C. C’s macros just create too many errors.

Let's make it optional.  I think at least I would like to see those errors.
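One way a user could opt out per buffer, sketched under assumptions: in the released Emacs 29 API (which postdates this thread), c-ts-mode exposes error highlighting as a font-lock feature named `error`, and `treesit-font-lock-recompute-features` accepts ADD-LIST and REMOVE-LIST. This only matters when the `error` feature is enabled, e.g. at a high `treesit-font-lock-level`. The hook function name is hypothetical.

```elisp
;; Hypothetical user hook: turn off the `error' font-lock feature in
;; c-ts-mode buffers while leaving every other feature unchanged.
(defun my/c-ts-mode-hide-errors ()
  (treesit-font-lock-recompute-features nil '(error)))

(add-hook 'c-ts-mode-hook #'my/c-ts-mode-hide-errors)
```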

Thanks.




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#59415; Package emacs. (Mon, 21 Nov 2022 15:16:01 GMT) Full text and rfc822 format available.

Message #47 received at 59415 <at> debbugs.gnu.org (full text, mbox):

From: Eli Zaretskii <eliz <at> gnu.org>
To: Yuan Fu <casouri <at> gmail.com>
Cc: 59415 <at> debbugs.gnu.org, theo <at> thornhill.no
Subject: Re: bug#59415: 29.0.50; [feature/tree-sitter] c-ts-mode fails to
 fontify a portion of a large C file
Date: Mon, 21 Nov 2022 17:15:13 +0200
> From: Yuan Fu <casouri <at> gmail.com>
> Date: Sun, 20 Nov 2022 17:27:16 -0800
> Cc: Eli Zaretskii <eliz <at> gnu.org>,
>  59415 <at> debbugs.gnu.org
> 
> I pushed a change that uses treesit-node-on. Now scrolling in most parts of the buffer is pretty fast. Scrolling around line 194770 is still laggy, because the node we get from treesit-node-on is still too large. I tried some heuristics but they didn’t work very well, IMO because tree-sitter couldn’t parse that part of the code very well. The code should follow a structure like {{}, {}, {}, {}, {}, …} where there are tens of thousands of inner brackets, so ideally we only need to grab the {}’s in the region we want to fontify. But tree-sitter seems to understand it as some weird structure, and we still end up with very large nodes, far larger than the region we want to fontify and slow to query.
> 
> I’ll try to improve it further in the future, but for now I think it’s good enough (because in most cases fontification is pretty fast).

Agreed.

Thanks, I think we can close the bug now.

What do you think about enlarging treesit-max-buffer-size as I proposed
up-thread?




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#59415; Package emacs. (Mon, 21 Nov 2022 16:54:01 GMT) Full text and rfc822 format available.

Message #50 received at 59415 <at> debbugs.gnu.org (full text, mbox):

From: Yuan Fu <casouri <at> gmail.com>
To: Eli Zaretskii <eliz <at> gnu.org>
Cc: 59415 <at> debbugs.gnu.org, Theodor Thornhill <theo <at> thornhill.no>
Subject: Re: bug#59415: 29.0.50; [feature/tree-sitter] c-ts-mode fails to
 fontify a portion of a large C file
Date: Mon, 21 Nov 2022 08:53:25 -0800

> On Nov 21, 2022, at 7:15 AM, Eli Zaretskii <eliz <at> gnu.org> wrote:
> 
>> From: Yuan Fu <casouri <at> gmail.com>
>> Date: Sun, 20 Nov 2022 17:27:16 -0800
>> Cc: Eli Zaretskii <eliz <at> gnu.org>,
>> 59415 <at> debbugs.gnu.org
>> 
>> I pushed a change that uses treesit-node-on. Now scrolling in most parts of the buffer is pretty fast. Scrolling around line 194770 is still laggy, because the node we get from treesit-node-on is still too large. I tried some heuristics but they didn’t work very well, IMO because tree-sitter couldn’t parse that part of the code very well. The code should follow a structure like {{}, {}, {}, {}, {}, …} where there are tens of thousands of inner brackets, so ideally we only need to grab the {}’s in the region we want to fontify. But tree-sitter seems to understand it as some weird structure, and we still end up with very large nodes, far larger than the region we want to fontify and slow to query.
>> 
>> I’ll try to improve it further in the future, but for now I think it’s good enough (because in most cases fontification is pretty fast).
> 
> Agreed.
> 
> Thanks, I think we can close the bug now.
> 
> What do you think about enlarging treesit-max-buffer-size as I proposed
> up-thread?

Yeah we should do it, but to what value though? 40MB?

Yuan





Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#59415; Package emacs. (Mon, 21 Nov 2022 17:18:01 GMT) Full text and rfc822 format available.

Message #53 received at 59415 <at> debbugs.gnu.org (full text, mbox):

From: Eli Zaretskii <eliz <at> gnu.org>
To: Yuan Fu <casouri <at> gmail.com>
Cc: 59415 <at> debbugs.gnu.org, theo <at> thornhill.no
Subject: Re: bug#59415: 29.0.50; [feature/tree-sitter] c-ts-mode fails to
 fontify a portion of a large C file
Date: Mon, 21 Nov 2022 19:17:31 +0200
> From: Yuan Fu <casouri <at> gmail.com>
> Date: Mon, 21 Nov 2022 08:53:25 -0800
> Cc: Theodor Thornhill <theo <at> thornhill.no>,
>  59415 <at> debbugs.gnu.org
> 
> > What do you think about enlarging treesit-max-buffer-size as I proposed
> > up-thread?
> 
> Yeah we should do it, but to what value though? 40MB?

My suggestion was 15 MiB on 32-bit systems and 40 MiB on 64-bit systems.
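This proposal could be expressed roughly as below. It is a sketch, not the code that was committed: it assumes that a `most-positive-fixnum` below 2^31 identifies a 32-bit build, so a 32-bit Emacs configured `--with-wide-int` would be treated as 64-bit. The helper name is hypothetical.

```elisp
;; Hypothetical helper: 15 MiB on 32-bit builds, 40 MiB on 64-bit builds.
(defun my/treesit-max-buffer-size-default ()
  (let ((mib (* 1024 1024)))
    ;; On 64-bit builds fixnums are far wider than 2^31; on plain
    ;; 32-bit builds they are not.
    (if (< most-positive-fixnum (* 2.0 1024 mib))
        (* 15 mib)
      (* 40 mib))))

;; Usage:
(setq treesit-max-buffer-size (my/treesit-max-buffer-size-default))
```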




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#59415; Package emacs. (Tue, 22 Nov 2022 07:33:01 GMT) Full text and rfc822 format available.

Message #56 received at 59415 <at> debbugs.gnu.org (full text, mbox):

From: Yuan Fu <casouri <at> gmail.com>
To: Eli Zaretskii <eliz <at> gnu.org>
Cc: 59415 <at> debbugs.gnu.org, Theodor Thornhill <theo <at> thornhill.no>,
 59415-done <at> debbugs.gnu.org
Subject: Re: bug#59415: 29.0.50; [feature/tree-sitter] c-ts-mode fails to
 fontify a portion of a large C file
Date: Mon, 21 Nov 2022 23:31:57 -0800

> On Nov 21, 2022, at 9:17 AM, Eli Zaretskii <eliz <at> gnu.org> wrote:
> 
>> From: Yuan Fu <casouri <at> gmail.com>
>> Date: Mon, 21 Nov 2022 08:53:25 -0800
>> Cc: Theodor Thornhill <theo <at> thornhill.no>,
>> 59415 <at> debbugs.gnu.org
>> 
>>> What do you think about enlarging treesit-max-buffer-size as I proposed
>>> up-thread?
>> 
>> Yeah we should do it, but to what value though? 40MB?
> 
> My suggestion was 15 MiB on 32-bit systems and 40 MiB on 64-bit systems.

Cool, changed, will push soon.




Reply sent to Yuan Fu <casouri <at> gmail.com>:
You have taken responsibility. (Tue, 22 Nov 2022 07:33:02 GMT) Full text and rfc822 format available.

Notification sent to Eli Zaretskii <eliz <at> gnu.org>:
bug acknowledged by developer. (Tue, 22 Nov 2022 07:33:02 GMT) Full text and rfc822 format available.

bug archived. Request was from Debbugs Internal Request <help-debbugs <at> gnu.org> to internal_control <at> debbugs.gnu.org. (Tue, 20 Dec 2022 12:24:06 GMT) Full text and rfc822 format available.



GNU bug tracking system
Copyright (C) 1999 Darren O. Benham, 1997,2003 nCipher Corporation Ltd, 1994-97 Ian Jackson.