GNU bug report logs - #60237
30.0.50; tree sitter core dumps when I edebug view a node

Previous Next

Package: emacs;

Reported by: Mickey Petersen <mickey <at> masteringemacs.org>

Date: Wed, 21 Dec 2022 12:30:02 UTC

Severity: normal

Found in version 30.0.50

Done: Stefan Monnier <monnier <at> iro.umontreal.ca>

Bug is archived. No further changes may be made.

Full log


View this message in rfc822 format

From: Mickey Petersen <mickey <at> masteringemacs.org>
To: Yuan Fu <casouri <at> gmail.com>
Cc: Po Lu <luangruo <at> yahoo.com>, Eli Zaretskii <eliz <at> gnu.org>, 60237 <at> debbugs.gnu.org
Subject: bug#60237: 30.0.50; tree sitter core dumps when I edebug view a node
Date: Mon, 27 Feb 2023 08:22:04 +0000
Yuan Fu <casouri <at> gmail.com> writes:

>> On Feb 26, 2023, at 1:41 AM, Mickey Petersen <mickey <at> masteringemacs.org> wrote:
>>
>>
>> Yuan Fu <casouri <at> gmail.com> writes:
>>
>>>> GC has historically never called xmalloc, so the profiler will
>>>> likely
>>>> crash upon growing the mark stack as well.  I guess another
>>>> important
>>>> question is why ts_delete_parser is calling xmalloc.
>>>>
>>>
>>>> As you see, when we call ts_tree_delete, it calls
>>>> ts_subtree_release,
>>>> which in turn calls malloc (redirected into our xmalloc).  Is this
>>>> expected?  Can you look in the tree-sitter sources and verify that
>>>> this is OK?
>>>
>>> I had a look, and it seems legit. In tree-sitter, a TSTree (or more
>>> precisely, a Subtree) is just some inlined data plus a refcounted
>>> pointer to the complete data. This way multiple trees share common
>>> subtrees/nodes. Eg, when incrementally parsing, you pass in an old
>>> tree and get a new tree, these two trees will share the unchanged part
>>> of the tree.
>>
>> Would that mean we could possibly preserve node instances -- either
>> the real TS ones, or an Emacs-created facsimile -- between
>> incremental parsing? That would be useful for refactoring.
>
> What kind of exact interface (function) do you want? The
> treesit-node-outdated error is solely Emacs’s product, tree-sitter
> itself doesn’t mark a node outdated. It is possible for Emacs to not
> delete the old tree and give it to you, or allow you to access
> information of an outdated node.

OK, so let me explain:

Touching the buffer for any reason invalidates the whole tree; that's
not good. It's not good, because a lot of the information may still be
useful and viable. Outdating the node is not a bad idea as it avoids a
lot of 'traps' around accidental modifications that can corrupt things
without the developer's knowledge.

I'd like to be able to access all the information possible; perhaps
behind a flag variable like `treesit-allow-outdated-node-access'. What
I'm really mostly interested in is:

- How well the node references handle changes in byte positions in TS.

- Does changing something at X shift (like a `point-marker`) everything
below it? Does an outdated node correctly reference its new location
and state, such as changes to children or its position in the tree?

Right now, Combobulate can make a proxy node, which essentially
captures the basics of a live node and stores it in a defstruct. That
way I can at least retain the start/end, type, text, etc. of a node
and still do light refactoring without contorting myself to do things
in a particular order, which is not always possible (like delaying
editing to the very end.)





This bug report was last modified 2 years and 128 days ago.

Previous Next


GNU bug tracking system
Copyright (C) 1999 Darren O. Benham, 1997,2003 nCipher Corporation Ltd, 1994-97 Ian Jackson.