GNU bug report logs - #70077
An easier way to track buffer changes

Previous Next

Package: emacs;

Reported by: Stefan Monnier <monnier <at> iro.umontreal.ca>

Date: Fri, 29 Mar 2024 16:17:01 UTC

Severity: normal

Tags: patch

Done: Stefan Monnier <monnier <at> iro.umontreal.ca>

Bug is archived. No further changes may be made.

Full log


Message #80 received at 70077 <at> debbugs.gnu.org (full text, mbox):

From: Ihor Radchenko <yantar92 <at> posteo.net>
To: Stefan Monnier <monnier <at> iro.umontreal.ca>
Cc: casouri <at> gmail.com, 70077 <at> debbugs.gnu.org, qhong <at> alum.mit.edu,
 frederic.bour <at> lakaban.net, joaotavora <at> gmail.com, mail <at> nicolasgoaziou.fr,
 acm <at> muc.de, Eli Zaretskii <eliz <at> gnu.org>, stephen_leake <at> stephe-leake.org,
 alan.zimm <at> gmail.com, phillip.lord <at> russet.org.uk
Subject: Re: bug#70077: An easier way to track buffer changes
Date: Tue, 02 Apr 2024 16:21:25 +0000
Stefan Monnier <monnier <at> iro.umontreal.ca> writes:

>>> I'm not sure how to combine the benefits of combining small changes into
>>> larger ones with the benefits of keeping distant changes separate.
>> I am not sure if combining small changes into larger ones is at all a
>> good idea.
>
> In my experience, if the amount of work to be done "per change" is not
> completely trivial, combining small changes is indispensable (the worst
> part being that the cases where it's needed are sufficiently infrequent
> that naive users of `*-change-functions` won't know about that, so we
> end up with unacceptable slowdowns in corner cases).

I agree, but only when subsequent changes intersect each other.
When distant changes are combined together, they cover a lot of buffer
text that has not been changed - they may trigger a lot more work
compared to more granular changes. Consider, for example, two changes
near the beginning and the end of the buffer:
(1..10) and (1000000...(point-max))
If such changes are tracked by tree-sitter or other parser, there is a
world of difference between requesting to re-parse individual segments
and the whole buffer.

>> The changes are stored in strings, which get allocated and
>> re-allocated repeatedly.
>
> Indeed.  Tho for N small changes, my code should perform only O(log N)
> re-allocations.

May you explain a bit more about this?
My reading of `track-changes--before' is that `concat' is called on
every change except when a new change is already inside the previously
recorded one. (aside: I have a hard time reading the code because of
confusing slot names: bbeg vs beg??)

>> Repeated string allocations, especially when strings keep growing
>> towards the buffer size, is likely going to increase consing and make
>> GCs more frequent.
>
> Similar allocations presumably take place anyway while running the code
> (e.g. for the `buffer-undo-list`), so I'm hoping the effect will be
> "lost in the noise".

I disagree. `buffer-undo-list' only includes the text that has been
actually changed. It never creates strings that span between
distant regions. As a terminal case, consider alternating between first
and second half of the buffer, starting in the middle and editing
towards the buffer boundaries - this will involve re-allocation of
buffer-long strings on average.

>>> We could expose a list of simultaneous (and thus disjoint) changes,
>>> which avoids the last problem.  But it's a fair bit more work for us, it
>>> makes the API more complex for the clients, and it's rarely what the
>>> clients really want anyway.
>> FYI, Org parser maintains such a list.
>
> Could you point me to the relevant code (I failed to find it)?

That code handles a lot more than just changes, so it might not be the
best reference. Anyway...

See the docstring of `org-element--cache-sync-requests', and
the branch in `org-element--cache-submit-request' starting from
	  ;; Current changes can be merged with first sync request: we
	  ;; can save a partial cache synchronization.

>> We previously discussed a similar API in
>> https://yhetil.org/emacs-bugs/87o7iq1emo.fsf <at> localhost/
>
> IIUC this discusses a *sequence* of edits.  In the point to which you
> replied I was discussing keeping a list of *simultaneous* edits.

Hmm. By referring to buffer-undo-list, I meant that intersecting edits
will be automatically merged, as it is usually done in undo system.

I am also not 100% sure why edits being simultaneous is any relevant.
Maybe we are talking about different things?

What I was talking about is (1) automatically merging subsequent edits
like 1..5 2..7 7..9 into 1..9. (2) if a subsequent edit does not
intersect with previous edited region, record it separately without
merging.

> This said, the downside in both cases is that the specific data that we
> need from such a list tends to depend on the user.  E.g. you suggest
>
>     (BEG END_BEFORE END_AFTER COUNTER)
>
> but that is not sufficient reconstruct the corresponding buffer state,
> so things like Eglot/CRDT can't use it.  Ideally for CRDT I think you'd
> want a sequence of
>
>     (BEG END-BEFORE STRING-AFTER)

Right. It's just that Org mode does not need STRING-AFTER, which is why
I did not think about it in my proposal. Of course, having STRING-AFTER
is required to get full reconstruction of the buffer state.

> but for Eglot this is not sufficient because Eglot needs to convert BEG
> and END_BEFORE into LSP positions (i.e. "line+col") and for that it
> needs to reproduce the past buffer state.  So instead, what Eglot needs
> (and does indeed build using `*-change-functions`) is a sequence of
>
>     (LSP-BEG LSP-END-BEFORE STRING-AFTER)
>
> [ Tho it seems it also needs a "LENGTH" of the deleted chunk, not sure
>   exactly why, but I guess it's a piece of redundant info the servers
>   can use to sanity-check the data?  ]

It is to support deprecated LSP spec (I presume that older LSP servers
may still be using the old spec):

https://microsoft.github.io/language-server-protocol/specifications/specification-3-15/
export type TextDocumentContentChangeEvent = {
	 * The range of the document that changed.
	range: Range;

	 * The optional length of the range that got replaced.
	 * @deprecated use range instead.
	rangeLength?: number;

	 * The new text for the provided range.
	text: string;

> The code changes were actually quite simple, so I have now included it.
> See my current PoC code below.

Am I missing something, or will `track-changes--signal-if-disjoint'
alter `track-changes--state' when it calls `track-changes-fetch' ->
`track-changes-reset'? Won't that interfere with the information passed
to non-disjoint trackers?

-- 
Ihor Radchenko // yantar92,
Org mode contributor,
Learn more about Org mode at <https://orgmode.org/>.
Support Org development at <https://liberapay.com/org-mode>,
or support my work at <https://liberapay.com/yantar92>




This bug report was last modified 1 year and 101 days ago.

Previous Next


GNU bug tracking system
Copyright (C) 1999 Darren O. Benham, 1997,2003 nCipher Corporation Ltd, 1994-97 Ian Jackson.