Package: emacs;
Reported by: Mickey Petersen <mickey <at> masteringemacs.org>
Date: Wed, 21 Dec 2022 12:30:02 UTC
Severity: normal
Found in version 30.0.50
Done: Stefan Monnier <monnier <at> iro.umontreal.ca>
Bug is archived. No further changes may be made.
View this message in rfc822 format
From: Yuan Fu <casouri <at> gmail.com> To: Mickey Petersen <mickey <at> masteringemacs.org> Cc: Po Lu <luangruo <at> yahoo.com>, Eli Zaretskii <eliz <at> gnu.org>, 60237 <at> debbugs.gnu.org Subject: bug#60237: 30.0.50; tree sitter core dumps when I edebug view a node Date: Mon, 27 Feb 2023 14:37:22 -0800
> On Feb 27, 2023, at 6:29 AM, Mickey Petersen <mickey <at> masteringemacs.org> wrote: > > > Yuan Fu <casouri <at> gmail.com> writes: > >>> On Feb 27, 2023, at 12:22 AM, Mickey Petersen <mickey <at> masteringemacs.org> wrote: >>> >>> >>> Yuan Fu <casouri <at> gmail.com> writes: >>> >>>>> On Feb 26, 2023, at 1:41 AM, Mickey Petersen <mickey <at> masteringemacs.org> wrote: >>>>> >>>>> >>>>> Yuan Fu <casouri <at> gmail.com> writes: >>>>> >>>>>>> GC has historically never called xmalloc, so the profiler will >>>>>>> likely >>>>>>> crash upon growing the mark stack as well. I guess another >>>>>>> important >>>>>>> question is why ts_delete_parser is calling xmalloc. >>>>>>> >>>>>> >>>>>>> As you see, when we call ts_tree_delete, it calls >>>>>>> ts_subtree_release, >>>>>>> which in turn calls malloc (redirected into our xmalloc). Is this >>>>>>> expected? Can you look in the tree-sitter sources and verify that >>>>>>> this is OK? >>>>>> >>>>>> I had a look, and it seems legit. In tree-sitter, a TSTree (or more >>>>>> precisely, a Subtree) is just some inlined data plus a refcounted >>>>>> pointer to the complete data. This way multiple trees share common >>>>>> subtrees/nodes. Eg, when incrementally parsing, you pass in an old >>>>>> tree and get a new tree, these two trees will share the unchanged part >>>>>> of the tree. >>>>> >>>>> Would that mean we could possibly preserve node instances -- either >>>>> the real TS ones, or an Emacs-created facsimile -- between >>>>> incremental parsing? That would be useful for refactoring. >>>> >>>> What kind of exact interface (function) do you want? The >>>> treesit-node-outdated error is solely Emacs’s product, tree-sitter >>>> itself doesn’t mark a node outdated. It is possible for Emacs to not >>>> delete the old tree and give it to you, or allow you to access >>>> information of an outdated node. >>> >>> OK, so let me explain: >>> >>> Touching the buffer for any reason invalidates the whole tree; that's >>> not good. It's not good, because a lot of the information may still be >>> useful and viable. Outdating the node is not a bad idea as it avoids a >>> lot of 'traps' around accidental modifications that can corrupt things >>> without the developer's knowledge. >>> >>> I'd like to be able to access all the information possible; perhaps >>> behind a flag variable like `treesit-allow-outdated-node-access'. What >>> I'm really mostly interested in is: >>> >>> - How well the node references handle changes in byte positions in TS. >> >> They don’t handle position changes. If the buffer content changed, we >> need to reparse. Once we reparsed the buffer, a new tree is >> born. While it is true that the new tree shares some node with the old >> tree, tree-sitter does not expose any function or information that >> tells you which node in the new tree is “the same” as which node in >> the old tree; nor does it tell you whether a node in the old tree >> still “exists” in the new tree. >> >> Now, there does exist a function (in tree-sitter’s API) that allows >> you to “edit” a node with position changes. But a) I’m not sure how >> does it handle the case where the node is deleted by the change and b) >> it is not very useful because once you reparse the buffer, the new >> tree is completely independent from the old tree (ignoring the >> implementation detail which is not exposed). >> >>> >>> - Does changing something at X shift (like a `point-marker`) everything >>> below it? Does an outdated node correctly reference its new location >>> and state, such as changes to children or its position in the tree? >> >> Like I said above, any buffer change will create a new tree with no relation to the old tree, so there is no shifting. >> >> And there really isn’t a “new location”: we don’t know if the old node >> is still in the new tree. Mind you, even if the node is completely >> outside of the changed region, it can still disappear from the new >> tree because of change of its surrounding context. For example, in the >> following C code: >> >> /* >> int c = 1; >> >> If I insert a closing comment delimiter, and buffer becomes >> >> /* >> int c = 1; >> */ >> >> Even though int c = 1; is not in the changed range, nor did it’s >> position move, all those nodes (int, c, =, etc) are not in the new >> tree anymore, because the whole thing becomes a comment. >> >> I made any access to outdated nodes error because there really isn’t >> any good reason to use them, at least I didn’t think of any at the >> time. And make them error out should help people catch errors. >> >>> >>> Right now, Combobulate can make a proxy node, which essentially >>> captures the basics of a live node and stores it in a defstruct. That >>> way I can at least retain the start/end, type, text, etc. of a node >>> and still do light refactoring without contorting myself to do things >>> in a particular order, which is not always possible (like delaying >>> editing to the very end.) >> >> IIUC, you want to do some very minor whitespace edit to the buffer >> which doesn’t really change the parse tree, so you don’t want the >> nodes to be invalidated for no good reason? Not erroring on outdated >> nodes is easy. As you said, we can add a >> treesit-inhibit-error-outdated variable. But not it’s not so easy to >> automatically update outdated nodes’ positions (with aforementioned >> tree-sitter function). However, if you are making those changes, you >> much know how to adjust your nodes position, right? So maybe it isn’t >> a must-have for your purpose. > > It's a good point, but it's also easy to create a scenario where you > at least want to keep the position and esp. the type and text (for > reporting information to the user, or similar.) I should be clearer. I meant that treesit-inhibit-error-outdated is reasonable and easy to implement. So if you want we can add it. OTOH auto-updating outdated nodes with position information is nontrivial, and might not be must-have for your purpose. > > My main interest is now refactoring and how to best do it. If TS can > do some of it, then all the better. I realise it was never meant to, > but if we can continue accessing the information contained in a node > even if it is outdated, then that could be useful, however niche. I guess “refactoring” includes not only whitespace changes but also some structural changes like slurping (or whatever it’s called), right? If you want to do structural changes, tree-sitter probably can’t help you much, as you observed. Maybe it’s better to “export” the tree-sitter tree to your own tree and do transformations with it? Maybe that’s already what you does now. > Currently I use overlays and point markers, but they are not > infallible. Yuan
GNU bug tracking system
Copyright (C) 1999 Darren O. Benham,
1997,2003 nCipher Corporation Ltd,
1994-97 Ian Jackson.