GNU bug report logs - #59574
29.0.50; Emacs crashes when using tree-sitter-based mode in an empty buffer


Package: emacs;

Reported by: Eli Zaretskii <eliz <at> gnu.org>

Date: Fri, 25 Nov 2022 15:05:02 UTC

Severity: normal

Found in version 29.0.50

Done: Eli Zaretskii <eliz <at> gnu.org>

Bug is archived. No further changes may be made.

To add a comment to this bug, you must first unarchive it, by sending
a message to control AT debbugs.gnu.org, with unarchive 59574 in the body.
You can then email your comments to 59574 AT debbugs.gnu.org in the normal way.


Report forwarded to bug-gnu-emacs <at> gnu.org:
bug#59574; Package emacs. (Fri, 25 Nov 2022 15:05:02 GMT)

Acknowledgement sent to Eli Zaretskii <eliz <at> gnu.org>:
New bug report received and forwarded. Copy sent to bug-gnu-emacs <at> gnu.org. (Fri, 25 Nov 2022 15:05:02 GMT)

Message #5 received at submit <at> debbugs.gnu.org:

From: Eli Zaretskii <eliz <at> gnu.org>
To: bug-gnu-emacs <at> gnu.org
Cc: Yuan Fu <casouri <at> gmail.com>
Subject: 29.0.50;
 Emacs crashes when using tree-sitter-based mode in an empty buffer
Date: Fri, 25 Nov 2022 17:04:27 +0200

To reproduce:

  emacs -Q
  C-x C-f foo.c RET
  M-x c-ts-mode RET
  Type "in"

Make sure foo.c doesn't exist, so you start from an empty buffer.  As soon
as you type the second character of "in", there's an assertion violation:

treesit.c:1383: Emacs fatal error: assertion failed: end_byte <= BUF_ZV_BYTE (buffer)

  Thread 1 hit Breakpoint 1, terminate_due_to_signal (sig=22, backtrace_limit=2147483647) at emacs.c:427
  427       signal (sig, SIG_DFL);
  (gdb) up
  #1  0x01230802 in die (
      msg=0x18e6778 <DEFAULT_REHASH_SIZE+3288> "end_byte <= BUF_ZV_BYTE (buffer)", file=0x18e5fcc <DEFAULT_REHASH_SIZE+1324> "treesit.c", line=1383)
      at alloc.c:7697
  7697      terminate_due_to_signal (SIGABRT, INT_MAX);
  (gdb)
  #2  0x01355636 in treesit_make_ranges (ranges=0x856a778, len=1,
      buffer=0x7fe94b0) at treesit.c:1383
  1383          eassert (end_byte <= BUF_ZV_BYTE (buffer));
  (gdb) p end_byte
  $1 = 4
  (gdb) p BUF_ZV_BYTE(buffer)
  $2 = 3

Interestingly, this only happens once, when the buffer includes exactly 1
byte and an additional character is inserted.  If you get past this
assertion, further characters can be inserted without any problems, and
end_byte always equals BUF_ZV_BYTE.
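
For reference, the failing check compares the end of a range, expressed as a buffer byte position, against the end of the accessible portion of the buffer.  The sketch below is a simplified stand-alone model, not the code in treesit.c: toy_buffer, range_end_to_buffer_pos and range_end_in_bounds are hypothetical names, and it assumes that the parser reports 0-based byte offsets which are shifted by the start of the accessible portion.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Toy stand-in for the accessible portion of a buffer.  Byte positions
   are 1-based, so an N-byte buffer has begv_byte == 1 and
   zv_byte == N + 1.  */
struct toy_buffer
{
  uint32_t begv_byte;
  uint32_t zv_byte;
};

/* Assumption: the parser reports 0-based offsets relative to the start
   of the accessible portion, so converting means adding begv_byte.  */
static uint32_t
range_end_to_buffer_pos (const struct toy_buffer *buf, uint32_t parser_end)
{
  assert (buf != NULL);
  return parser_end + buf->begv_byte;
}

/* The condition the failing eassert expresses: a range must not end
   past the end of the accessible portion.  */
static bool
range_end_in_bounds (const struct toy_buffer *buf, uint32_t parser_end)
{
  return range_end_to_buffer_pos (buf, parser_end) <= buf->zv_byte;
}
```

With a two-byte buffer (begv_byte 1, zv_byte 3), an offset of 2 maps to position 3 and passes the check, while an offset of 3 maps to position 4 and fails it.  Under the stated assumption, that matches the end_byte and BUF_ZV_BYTE values printed by gdb above.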

The backtrace is below, if it is interesting.

I couldn't figure out where tree-sitter gets the range it returns to us.
Yuan, can you describe how the parser gets the range it needs to
consider?  If I put a breakpoint in treesit-parser-set-included-ranges, the
breakpoint is never hit, so this doesn't seem to be how the range is set in
this scenario.

There's also something strange in treesit_record_change: when it is called
for the first time in a buffer which was empty and you insert one character,
we bypass the updating of visible_beg and visible_end fields of the Lisp
parser object, because XTS_PARSER (lisp_parser)->tree is NULL.  But it looks
to me that we should still update these two fields regardless, no?  Only the
call to treesit_tree_edit_1 needs the tree.  (I thought that maybe this lack
of update explains the assertion, but even if I move the condition to guard
only treesit_tree_edit_1, the assertion still happens, so I guess my
hypothesis eats dust.)
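
The two placements of the NULL check discussed above can be sketched as follows.  This is a toy model under stated assumptions, not the code of treesit_record_change: toy_parser and both function names are hypothetical, has_tree stands in for a non-NULL tree, tree_edits merely counts calls that would go to treesit_tree_edit_1, and the insertion is assumed to fall entirely inside the visible region.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

struct toy_parser
{
  bool has_tree;         /* stands in for a non-NULL tree */
  uint32_t visible_beg;
  uint32_t visible_end;
  int tree_edits;        /* how often the tree would have been edited */
};

/* Placement A: without a tree, the whole update is skipped, so
   visible_end is left untouched.  */
static void
record_insert_skip_all (struct toy_parser *p, uint32_t nbytes)
{
  assert (p != NULL);
  if (!p->has_tree)
    return;
  p->tree_edits++;
  p->visible_end += nbytes;
}

/* Placement B: only the tree edit is guarded; visible_end is updated
   whether or not a tree exists.  */
static void
record_insert_guard_edit_only (struct toy_parser *p, uint32_t nbytes)
{
  assert (p != NULL);
  if (p->has_tree)
    p->tree_edits++;
  p->visible_end += nbytes;
}
```

For a parser without a tree whose visible region is empty (visible_beg and visible_end both 1), inserting one byte leaves visible_end at 1 under placement A and moves it to 2 under placement B.  As noted above, switching to placement B did not make the assertion go away, so the sketch only illustrates the difference in bookkeeping, not the cause of the crash.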

Here's the backtrace I promised:

(gdb) bt
#0  terminate_due_to_signal (sig=22, backtrace_limit=2147483647)
    at emacs.c:427
#1  0x01230802 in die (
    msg=0x18e6778 <DEFAULT_REHASH_SIZE+3288> "end_byte <= BUF_ZV_BYTE (buffer)",
 file=0x18e5fcc <DEFAULT_REHASH_SIZE+1324> "treesit.c", line=1383)
    at alloc.c:7697
#2  0x01355636 in treesit_make_ranges (ranges=0x856a778, len=1,
    buffer=0x7fe94b0) at treesit.c:1383
#3  0x01353c7e in treesit_call_after_change_functions (old_tree=0x84d9fe0,
    new_tree=0x856a5d0, parser=XIL(0xa00000000853e4e8)) at treesit.c:859
#4  0x01353fff in treesit_ensure_parsed (parser=XIL(0xa00000000853e4e8))
    at treesit.c:906
#5  0x01354ff8 in Ftreesit_parser_root_node (parser=XIL(0xa00000000853e4e8))
    at treesit.c:1328
#6  0x012773d2 in funcall_subr (subr=0x1883640 <Streesit_parser_root_node>,
    numargs=1, args=0x6c10470) at eval.c:3034
#7  0x012e9b92 in exec_byte_code (fun=XIL(0xa00000000850edc8),
    args_template=256, nargs=1, args=0x6c10390) at bytecode.c:809
#8  0x0127799a in fetch_and_exec_byte_code (fun=XIL(0xa0000000084b0d20),
    args_template=257, nargs=1, args=0x6c101c8) at eval.c:3081
#9  0x01277ef9 in funcall_lambda (fun=XIL(0xa0000000084b0d20), nargs=1,
    arg_vector=0x6c101c8) at eval.c:3153
#10 0x01276e66 in funcall_general (fun=XIL(0xa0000000084b0d20), numargs=1,
    args=0x6c101c8) at eval.c:2945
#11 0x012771eb in Ffuncall (nargs=2, args=0x6c101c0) at eval.c:2995
#12 0x012762ae in run_hook_wrapped_funcall (nargs=2, args=0x6c101c0)
    at eval.c:2773
#13 0x01276765 in run_hook_with_args (nargs=2, args=0x6c101c0,
    funcall=0x1276266 <run_hook_wrapped_funcall>) at eval.c:2854
#14 0x012762fd in Frun_hook_wrapped (nargs=2, args=0x6c101c0) at eval.c:2788
#15 0x0127784b in funcall_subr (subr=0x187cf00 <Srun_hook_wrapped>,
    numargs=2, args=0x6c101c0) at eval.c:3059
#16 0x012e9b92 in exec_byte_code (fun=XIL(0xa0000000061302c4),
    args_template=514, nargs=2, args=0x6c100f8) at bytecode.c:809
#17 0x0127799a in fetch_and_exec_byte_code (fun=XIL(0xa00000000612fd94),
    args_template=257, nargs=1, args=0x82ac88) at eval.c:3081
#18 0x01277ef9 in funcall_lambda (fun=XIL(0xa00000000612fd94), nargs=1,
    arg_vector=0x82ac88) at eval.c:3153
#19 0x01276e66 in funcall_general (fun=XIL(0xa00000000612fd94), numargs=1,
    args=0x82ac88) at eval.c:2945
#20 0x012771eb in Ffuncall (nargs=2, args=0x82ac80) at eval.c:2995
#21 0x012712a1 in internal_condition_case_n (bfun=0x127709f <Ffuncall>,
    nargs=2, args=0x82ac80, handlers=XIL(0x30),
    hfun=0x104286e <safe_eval_handler>) at eval.c:1558
#22 0x01042aa1 in safe__call (inhibit_quit=false, nargs=2,
    func=XIL(0x47648c4), ap=0x82ad44 "") at xdisp.c:3024
#23 0x01042b1a in safe_call (nargs=2, func=XIL(0x47648c4)) at xdisp.c:3039
#24 0x01042b6e in safe_call1 (fn=XIL(0x47648c4), arg=make_fixnum(1))
    at xdisp.c:3050
#25 0x010469d4 in handle_fontified_prop (it=0x82afd0) at xdisp.c:4416
#26 0x010453c7 in handle_stop (it=0x82afd0) at xdisp.c:3951
#27 0x01051ebf in reseat (it=0x82afd0, pos=..., force_p=true) at xdisp.c:7469
#28 0x01044495 in init_iterator (it=0x82afd0, w=0x7958be0, charpos=1,
    bytepos=1, row=0x7a214a0, base_face_id=DEFAULT_FACE_ID) at xdisp.c:3488
#29 0x010446c3 in start_display (it=0x82afd0, w=0x7958be0, pos=...)
    at xdisp.c:3568
#30 0x0107c99e in try_window (window=XIL(0xa000000007958be0), pos=...,
    flags=1) at xdisp.c:20511
#31 0x01079579 in redisplay_window (window=XIL(0xa000000007958be0),
    just_this_one_p=true) at xdisp.c:19903
#32 0x010706c6 in redisplay_window_1 (window=XIL(0xa000000007958be0))
    at xdisp.c:17405
#33 0x0127108e in internal_condition_case_1 (
    bfun=0x107066e <redisplay_window_1>, arg=XIL(0xa000000007958be0),
    handlers=XIL(0xc000000006462abc), hfun=0x10702c6 <redisplay_window_error>)
    at eval.c:1498
#34 0x0106f10a in redisplay_internal () at xdisp.c:16944
#35 0x0106c163 in redisplay () at xdisp.c:16006
#36 0x01174cf8 in read_char (commandflag=1, map=XIL(0xc000000008096220),
    prev_event=XIL(0), used_mouse_menu=0x82f41f, end_time=0x0)
    at keyboard.c:2623
#37 0x0118ec5e in read_key_sequence (keybuf=0x82f6f8, prompt=XIL(0),
    dont_downcase_last=false, can_return_switch_frame=true,
    fix_current_buffer=true, prevent_redisplay=false) at keyboard.c:10070
#38 0x0117033d in command_loop_1 () at keyboard.c:1376
#39 0x01270fa4 in internal_condition_case (bfun=0x116fcdc <command_loop_1>,
    handlers=XIL(0x90), hfun=0x116ecaa <cmd_error>) at eval.c:1474
#40 0x0116f749 in command_loop_2 (handlers=XIL(0x90)) at keyboard.c:1125
#41 0x0126fe2b in internal_catch (tag=XIL(0x10290),
    func=0x116f712 <command_loop_2>, arg=XIL(0x90)) at eval.c:1197
#42 0x0116f6b4 in command_loop () at keyboard.c:1103
#43 0x0116e70a in recursive_edit_1 () at keyboard.c:712
#44 0x0116e9a8 in Frecursive_edit () at keyboard.c:795
#45 0x0116975d in main (argc=2, argv=0xa428e0) at emacs.c:2523

Lisp Backtrace:
"treesit-parser-root-node" (0x6c10470)
"treesit-buffer-root-node" (0x6c10388)
"treesit-font-lock-fontify-region" (0x6c10300)
"font-lock-default-fontify-region" (0x6c10298)
"font-lock-fontify-region" (0x6c10230)
0x84b0d20 PVEC_COMPILED
"run-hook-wrapped" (0x6c101c0)
"jit-lock--run-functions" (0x6c100e8)
"jit-lock-fontify-now" (0x6c10058)
"jit-lock-function" (0x82ac88)
"redisplay_internal (C function)" (0x0)
(gdb)


In GNU Emacs 29.0.50 (build 2261, i686-pc-mingw32) of 2022-11-25 built
 on HOME-C4E4A596F7
Repository revision: af545234314601ba3dcd8bf32e0d9b46e1917f79
Repository branch: master
Windowing system distributor 'Microsoft Corp.', version 5.1.2600
System Description: Microsoft Windows XP Service Pack 3 (v5.1.0.2600)

Configured using:
 'configure -C --prefix=/d/usr --with-wide-int
 --enable-checking=yes,glyphs 'CFLAGS=-O0 -gdwarf-4 -g3''

Configured features:
ACL GIF GMP GNUTLS HARFBUZZ JPEG JSON LCMS2 LIBXML2 MODULES NOTIFY
W32NOTIFY PDUMPER PNG RSVG SOUND SQLITE3 THREADS TIFF
TOOLKIT_SCROLL_BARS TREE_SITTER WEBP XPM ZLIB

Important settings:
  value of $LANG: ENU
  locale-coding-system: cp1255

Major mode: Lisp Interaction

Minor modes in effect:
  tooltip-mode: t
  global-eldoc-mode: t
  eldoc-mode: t
  show-paren-mode: t
  electric-indent-mode: t
  mouse-wheel-mode: t
  tool-bar-mode: t
  menu-bar-mode: t
  file-name-shadow-mode: t
  global-font-lock-mode: t
  font-lock-mode: t
  blink-cursor-mode: t
  line-number-mode: t
  indent-tabs-mode: t
  transient-mark-mode: t
  auto-composition-mode: t
  auto-encryption-mode: t
  auto-compression-mode: t

Load-path shadows:
None found.

Features:
(shadow sort mail-extr emacsbug message mailcap yank-media puny dired
dired-loaddefs rfc822 mml mml-sec password-cache epa derived epg rfc6068
epg-config gnus-util text-property-search time-date subr-x mm-decode
mm-bodies mm-encode mail-parse rfc2231 mailabbrev gmm-utils mailheader
cl-loaddefs cl-lib sendmail rfc2047 rfc2045 ietf-drums mm-util
mail-prsvr mail-utils rmc iso-transl tooltip cconv eldoc paren electric
uniquify ediff-hook vc-hooks lisp-float-type elisp-mode mwheel dos-w32
ls-lisp disp-table term/w32-win w32-win w32-vars term/common-win
tool-bar dnd fontset image regexp-opt fringe tabulated-list replace
newcomment text-mode lisp-mode prog-mode register page tab-bar menu-bar
rfn-eshadow isearch easymenu timer select scroll-bar mouse jit-lock
font-lock syntax font-core term/tty-colors frame minibuffer nadvice seq
simple cl-generic indonesian philippine cham georgian utf-8-lang
misc-lang vietnamese tibetan thai tai-viet lao korean japanese eucjp-ms
cp51932 hebrew greek romanian slovak czech european ethiopic indian
cyrillic chinese composite emoji-zwj charscript charprop case-table
epa-hook jka-cmpr-hook help abbrev obarray oclosure cl-preloaded button
loaddefs theme-loaddefs faces cus-face macroexp files window
text-properties overlay sha1 md5 base64 format env code-pages mule
custom widget keymap hashtable-print-readable backquote threads
w32notify w32 lcms2 multi-tty make-network-process emacs)

Memory information:
((conses 16 42624 11101)
 (symbols 48 6278 0)
 (strings 16 16553 2914)
 (string-bytes 1 398654)
 (vectors 16 9312)
 (vector-slots 8 146415 13640)
 (floats 8 23 27)
 (intervals 40 274 97)
 (buffers 896 10))




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#59574; Package emacs. (Sat, 26 Nov 2022 03:19:01 GMT)

Message #8 received at 59574 <at> debbugs.gnu.org:

From: Yuan Fu <casouri <at> gmail.com>
To: Eli Zaretskii <eliz <at> gnu.org>
Cc: 59574 <at> debbugs.gnu.org
Subject: Re: bug#59574: 29.0.50; Emacs crashes when using tree-sitter-based
 mode in an empty buffer
Date: Fri, 25 Nov 2022 19:18:09 -0800

> On Nov 25, 2022, at 7:04 AM, Eli Zaretskii <eliz <at> gnu.org> wrote:
> 
> To reproduce:
> 
>  emacs -Q
>  C-x C-f foo.c RET
>  M-x c-ts-mode RET
>  Type "in"

Thanks for finding this out! 

> 
> Make sure foo.c doesn't exist, so you start from an empty buffer.  As soon
> as you type the second character of "in", there's an assertion violation:
> 
> treesit.c:1383: Emacs fatal error: assertion failed: end_byte <= BUF_ZV_BYTE (buffer)
> 
>  Thread 1 hit Breakpoint 1, terminate_due_to_signal (sig=22, backtrace_limit=2147483647) at emacs.c:427
>  427       signal (sig, SIG_DFL);
>  (gdb) up
>  #1  0x01230802 in die (
>      msg=0x18e6778 <DEFAULT_REHASH_SIZE+3288> "end_byte <= BUF_ZV_BYTE (buffer)", file=0x18e5fcc <DEFAULT_REHASH_SIZE+1324> "treesit.c", line=1383)
>      at alloc.c:7697
>  7697      terminate_due_to_signal (SIGABRT, INT_MAX);
>  (gdb)
>  #2  0x01355636 in treesit_make_ranges (ranges=0x856a778, len=1,
>      buffer=0x7fe94b0) at treesit.c:1383
>  1383          eassert (end_byte <= BUF_ZV_BYTE (buffer));
>  (gdb) p end_byte
>  $1 = 4
>  (gdb) p BUF_ZV_BYTE(buffer)
>  $2 = 3
> 
> Interestingly, this only happens once, when the buffer includes exactly 1
> byte and an additional character is inserted.  If you get past this
> assertion, further characters can be inserted without any problems, and
> end_byte always equals BUF_ZV_BYTE.
> 
> The backtrace is below, if it is interesting.
> 
> I couldn't figure out where tree-sitter gets the range it returns to us.
> Yuan, can you describe how the parser gets the range it needs to
> consider?  If I put a breakpoint in treesit-parser-set-included-ranges, the
> breakpoint is never hit, so this doesn't seem to be how the range is set in
> this scenario.

After we parse the buffer (in treesit_ensure_parsed), we compute the ranges that have changed since the last parse by calling ts_tree_get_changed_ranges, and pass those ranges to the notifier functions (those added by treesit-parser-add-notifier). This range is different from the range within which a parser operates. That range is set by treesit-parser-set-included-ranges, and is not involved with the parsing, treesit_record_change, or visible_beg/end stuff.

Both features happen to use treesit_make_ranges as a helper function, but the similarity ends there.
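
The relationship described above, two unrelated features sharing one conversion helper, can be sketched as a toy model.  Everything here is hypothetical and simplified: byte_range, toy_make_ranges, toy_notify_changed, toy_included_ranges and toy_record_last_end are illustrative names, the changed ranges are passed in by the caller rather than computed by tree-sitter, and the conversion is assumed to be a plain shift by the start of the accessible portion.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct byte_range
{
  uint32_t beg;
  uint32_t end;
};

#define TOY_MAX_RANGES 8

/* Shared helper in this toy: shift 0-based parser offsets to 1-based
   buffer byte positions.  Both features below call it.  */
static void
toy_make_ranges (const struct byte_range *in, size_t len,
                 uint32_t begv_byte, struct byte_range *out)
{
  for (size_t i = 0; i < len; i++)
    {
      out[i].beg = in[i].beg + begv_byte;
      out[i].end = in[i].end + begv_byte;
    }
}

typedef void (*toy_notifier) (const struct byte_range *ranges, size_t len,
                              void *data);

/* Feature 1: after a reparse, hand the changed ranges to a notifier.  */
static void
toy_notify_changed (const struct byte_range *changed, size_t len,
                    uint32_t begv_byte, toy_notifier fn, void *data)
{
  struct byte_range converted[TOY_MAX_RANGES];
  assert (len <= TOY_MAX_RANGES);
  toy_make_ranges (changed, len, begv_byte, converted);
  fn (converted, len, data);
}

/* Feature 2: report the ranges a parser is restricted to.  It shares
   the helper with feature 1 but nothing else.  */
static size_t
toy_included_ranges (const struct byte_range *included, size_t len,
                     uint32_t begv_byte, struct byte_range *out)
{
  toy_make_ranges (included, len, begv_byte, out);
  return len;
}

/* Example notifier: remember the end of the last range it was given.  */
static void
toy_record_last_end (const struct byte_range *ranges, size_t len, void *data)
{
  if (len > 0)
    *(uint32_t *) data = ranges[len - 1].end;
}
```

The point of the sketch is only that a breakpoint on the included-ranges path says nothing about the changed-ranges path: both reach the helper, but from different callers.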

> There's also something strange in treesit_record_change: when it is called
> for the first time in a buffer which was empty and you insert one character,
> we bypass the updating of visible_beg and visible_end fields of the Lisp
> parser object, because XTS_PARSER (lisp_parser)->tree is NULL.  But it looks
> to me that we should still update these two fields regardless, no?  Only the
> call to treesit_tree_edit_1 needs the tree.  (I thought that maybe this lack
> of update explains the assertion, but even if I move the condition to guard
> only treesit_tree_edit_1, the assertion still happens, so I guess my
> hypothesis eats dust.)

We don’t need to update visible_beg/end in treesit_record_change if tree is NULL, because visible_beg/end represent the range of the buffer that the tree sees, so if there is no tree, visible_beg/end can be considered uninitialized. However, you are right that visible_beg/end need to be updated, just in a different place: treesit_ensure_position_synced (which I renamed to treesit_sync_visible_region). That’s where we ensure visible_beg/end equal BUF_BEGV_BYTE and friends.

The problem is we don’t update visible_beg/end for the very first parse, when tree is NULL.
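
A minimal sketch of that difference, as a toy model rather than the actual change: toy_buffer, toy_parser and the three function names are hypothetical, only the two fields are tracked, and the adjustment an existing tree would also need is left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

struct toy_buffer
{
  uint32_t begv_byte;
  uint32_t zv_byte;
};

struct toy_parser
{
  bool has_tree;         /* stands in for a non-NULL tree */
  uint32_t visible_beg;
  uint32_t visible_end;
};

/* Behavior described as the problem: with no tree yet, the visible
   region is left as it was when the parser was created.  */
static void
toy_sync_only_with_tree (struct toy_parser *p, const struct toy_buffer *buf)
{
  assert (p != NULL && buf != NULL);
  if (!p->has_tree)
    return;
  p->visible_beg = buf->begv_byte;
  p->visible_end = buf->zv_byte;
}

/* Behavior described as the goal: the visible region matches the
   buffer's accessible portion before every parse, including the
   first one.  */
static void
toy_sync_visible_region (struct toy_parser *p, const struct toy_buffer *buf)
{
  assert (p != NULL && buf != NULL);
  p->visible_beg = buf->begv_byte;
  p->visible_end = buf->zv_byte;
}

static bool
toy_in_sync (const struct toy_parser *p, const struct toy_buffer *buf)
{
  return p->visible_beg == buf->begv_byte && p->visible_end == buf->zv_byte;
}
```

Assuming a parser created on an empty buffer (visible_beg and visible_end both 1) and a buffer that has since grown to one byte (zv_byte 2), the first function leaves the parser out of sync before its first parse, while the second brings it in sync.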

I also added some comments, hopefully they sufficiently explain everything.

Yuan





Reply sent to Eli Zaretskii <eliz <at> gnu.org>:
You have taken responsibility. (Sat, 26 Nov 2022 14:32:02 GMT)

Notification sent to Eli Zaretskii <eliz <at> gnu.org>:
bug acknowledged by developer. (Sat, 26 Nov 2022 14:32:02 GMT)

Message #13 received at 59574-done <at> debbugs.gnu.org:

From: Eli Zaretskii <eliz <at> gnu.org>
To: Yuan Fu <casouri <at> gmail.com>
Cc: 59574-done <at> debbugs.gnu.org
Subject: Re: bug#59574: 29.0.50; Emacs crashes when using tree-sitter-based
 mode in an empty buffer
Date: Sat, 26 Nov 2022 16:31:59 +0200

> From: Yuan Fu <casouri <at> gmail.com>
> Date: Fri, 25 Nov 2022 19:18:09 -0800
> Cc: 59574 <at> debbugs.gnu.org
> 
> > There's also something strange in treesit_record_change: when it is called
> > for the first time in a buffer which was empty and you insert one character,
> > we bypass the updating of visible_beg and visible_end fields of the Lisp
> > parser object, because XTS_PARSER (lisp_parser)->tree is NULL.  But it looks
> > to me that we should still update these two fields regardless, no?  Only the
> > call to treesit_tree_edit_1 needs the tree.  (I thought that maybe this lack
> > of update explains the assertion, but even if I move the condition to guard
> > only treesit_tree_edit_1, the assertion still happens, so I guess my
> > hypothesis eats dust.)
> 
> We don’t need to update visible_beg/end in treesit_record_change if tree is NULL, because visible_beg/end represent the range of the buffer that the tree sees, so if there is no tree, visible_beg/end can be considered uninitialized. However, you are right that visible_beg/end need to be updated, just in a different place: treesit_ensure_position_synced (which I renamed to treesit_sync_visible_region). That’s where we ensure visible_beg/end equal BUF_BEGV_BYTE and friends.
> 
> The problem is we don’t update visible_beg/end for the very first parse, when tree is NULL.
> 
> I also added some comments, hopefully they sufficiently explain everything.

Thanks, the problem is gone, so I'm closing the bug.




bug archived. Request was from Debbugs Internal Request <help-debbugs <at> gnu.org> to internal_control <at> debbugs.gnu.org. (Sun, 25 Dec 2022 12:24:09 GMT)

This bug report was last modified 2 years and 234 days ago.


GNU bug tracking system
Copyright (C) 1999 Darren O. Benham, 1997,2003 nCipher Corporation Ltd, 1994-97 Ian Jackson.