#70077 - An easier way to track buffer changes

GNU bug report logs - #70077
An easier way to track buffer changes

Package: emacs;

Reported by: Stefan Monnier <monnier <at> iro.umontreal.ca>

Date: Fri, 29 Mar 2024 16:17:01 UTC

Severity: normal

Tags: patch

Done: Stefan Monnier <monnier <at> iro.umontreal.ca>

Bug is archived. No further changes may be made.

Message #80 received at 70077 <at> debbugs.gnu.org (full text, mbox):

From: Ihor Radchenko <yantar92 <at> posteo.net> To: Stefan Monnier <monnier <at> iro.umontreal.ca> Cc: casouri <at> gmail.com, 70077 <at> debbugs.gnu.org, qhong <at> alum.mit.edu, frederic.bour <at> lakaban.net, joaotavora <at> gmail.com, mail <at> nicolasgoaziou.fr, acm <at> muc.de, Eli Zaretskii <eliz <at> gnu.org>, stephen_leake <at> stephe-leake.org, alan.zimm <at> gmail.com, phillip.lord <at> russet.org.uk Subject: Re: bug#70077: An easier way to track buffer changes Date: Tue, 02 Apr 2024 16:21:25 +0000

Stefan Monnier <monnier <at> iro.umontreal.ca> writes: >>> I'm not sure how to combine the benefits of combining small changes into >>> larger ones with the benefits of keeping distant changes separate. >> I am not sure if combining small changes into larger ones is at all a >> good idea. > > In my experience, if the amount of work to be done "per change" is not > completely trivial, combining small changes is indispensable (the worst > part being that the cases where it's needed are sufficiently infrequent > that naive users of `*-change-functions` won't know about that, so we > end up with unacceptable slowdowns in corner cases). I agree, but only when subsequent changes intersect each other. When distant changes are combined together, they cover a lot of buffer text that has not been changed - they may trigger a lot more work compared to more granular changes. Consider, for example, two changes near the beginning and the end of the buffer: (1..10) and (1000000...(point-max)) If such changes are tracked by tree-sitter or other parser, there is a world of difference between requesting to re-parse individual segments and the whole buffer. >> The changes are stored in strings, which get allocated and >> re-allocated repeatedly. > > Indeed. Tho for N small changes, my code should perform only O(log N) > re-allocations. May you explain a bit more about this? My reading of `track-changes--before' is that `concat' is called on every change except when a new change is already inside the previously recorded one. (aside: I have a hard time reading the code because of confusing slot names: bbeg vs beg??) >> Repeated string allocations, especially when strings keep growing >> towards the buffer size, is likely going to increase consing and make >> GCs more frequent. > > Similar allocations presumably take place anyway while running the code > (e.g. for the `buffer-undo-list`), so I'm hoping the effect will be > "lost in the noise". I disagree. `buffer-undo-list' only includes the text that has been actually changed. It never creates strings that span between distant regions. As a terminal case, consider alternating between first and second half of the buffer, starting in the middle and editing towards the buffer boundaries - this will involve re-allocation of buffer-long strings on average. >>> We could expose a list of simultaneous (and thus disjoint) changes, >>> which avoids the last problem. But it's a fair bit more work for us, it >>> makes the API more complex for the clients, and it's rarely what the >>> clients really want anyway. >> FYI, Org parser maintains such a list. > > Could you point me to the relevant code (I failed to find it)? That code handles a lot more than just changes, so it might not be the best reference. Anyway... See the docstring of `org-element--cache-sync-requests', and the branch in `org-element--cache-submit-request' starting from ;; Current changes can be merged with first sync request: we ;; can save a partial cache synchronization. >> We previously discussed a similar API in >> https://yhetil.org/emacs-bugs/87o7iq1emo.fsf <at> localhost/ > > IIUC this discusses a *sequence* of edits. In the point to which you > replied I was discussing keeping a list of *simultaneous* edits. Hmm. By referring to buffer-undo-list, I meant that intersecting edits will be automatically merged, as it is usually done in undo system. I am also not 100% sure why edits being simultaneous is any relevant. Maybe we are talking about different things? What I was talking about is (1) automatically merging subsequent edits like 1..5 2..7 7..9 into 1..9. (2) if a subsequent edit does not intersect with previous edited region, record it separately without merging. > This said, the downside in both cases is that the specific data that we > need from such a list tends to depend on the user. E.g. you suggest > > (BEG END_BEFORE END_AFTER COUNTER) > > but that is not sufficient reconstruct the corresponding buffer state, > so things like Eglot/CRDT can't use it. Ideally for CRDT I think you'd > want a sequence of > > (BEG END-BEFORE STRING-AFTER) Right. It's just that Org mode does not need STRING-AFTER, which is why I did not think about it in my proposal. Of course, having STRING-AFTER is required to get full reconstruction of the buffer state. > but for Eglot this is not sufficient because Eglot needs to convert BEG > and END_BEFORE into LSP positions (i.e. "line+col") and for that it > needs to reproduce the past buffer state. So instead, what Eglot needs > (and does indeed build using `*-change-functions`) is a sequence of > > (LSP-BEG LSP-END-BEFORE STRING-AFTER) > > [ Tho it seems it also needs a "LENGTH" of the deleted chunk, not sure > exactly why, but I guess it's a piece of redundant info the servers > can use to sanity-check the data? ] It is to support deprecated LSP spec (I presume that older LSP servers may still be using the old spec): https://microsoft.github.io/language-server-protocol/specifications/specification-3-15/ export type TextDocumentContentChangeEvent = { * The range of the document that changed. range: Range; * The optional length of the range that got replaced. * @deprecated use range instead. rangeLength?: number; * The new text for the provided range. text: string; > The code changes were actually quite simple, so I have now included it. > See my current PoC code below. Am I missing something, or will `track-changes--signal-if-disjoint' alter `track-changes--state' when it calls `track-changes-fetch' -> `track-changes-reset'? Won't that interfere with the information passed to non-disjoint trackers? -- Ihor Radchenko // yantar92, Org mode contributor, Learn more about Org mode at <https://orgmode.org/>. Support Org development at <https://liberapay.com/org-mode>, or support my work at <https://liberapay.com/yantar92>

This bug report was last modified 1 year and 101 days ago.

GNU bug tracking system
Copyright (C) 1999 Darren O. Benham, 1997,2003 nCipher Corporation Ltd, 1994-97 Ian Jackson.

GNU bug report logs - #70077 An easier way to track buffer changes

GNU bug report logs - #70077
An easier way to track buffer changes