GNU bug report logs - #70077
An easier way to track buffer changes

Previous Next

Package: emacs;

Reported by: Stefan Monnier <monnier <at> iro.umontreal.ca>

Date: Fri, 29 Mar 2024 16:17:01 UTC

Severity: normal

Tags: patch

Done: Stefan Monnier <monnier <at> iro.umontreal.ca>

Bug is archived. No further changes may be made.

Full log


View this message in rfc822 format

From: Stefan Monnier <monnier <at> iro.umontreal.ca>
To: Ihor Radchenko <yantar92 <at> posteo.net>
Cc: casouri <at> gmail.com, 70077 <at> debbugs.gnu.org, qhong <at> alum.mit.edu, frederic.bour <at> lakaban.net, joaotavora <at> gmail.com, mail <at> nicolasgoaziou.fr, acm <at> muc.de, Eli Zaretskii <eliz <at> gnu.org>, stephen_leake <at> stephe-leake.org, alan.zimm <at> gmail.com, phillip.lord <at> russet.org.uk
Subject: bug#70077: An easier way to track buffer changes
Date: Tue, 02 Apr 2024 13:51:42 -0400
> I agree, but only when subsequent changes intersect each other.
> When distant changes are combined together, they cover a lot of buffer
> text that has not been changed - they may trigger a lot more work
> compared to more granular changes. Consider, for example, two changes
> near the beginning and the end of the buffer:
> (1..10) and (1000000...(point-max))
> If such changes are tracked by tree-sitter or other parser, there is a
> world of difference between requesting to re-parse individual segments
> and the whole buffer.

Oh, definitely.  For some clients, the amount of work doesn't really
depend on the size of the change, tho (OTOH the work done by
`track-changes.el` *is* affected by the size of the change).
Handling disjoint changes can be very important.

I'm fairly satisfied with my `track-changes-register-disjoint` solution
to this problem.

>>> The changes are stored in strings, which get allocated and
>>> re-allocated repeatedly.
>> Indeed.  Tho for N small changes, my code should perform only O(log N)
>> re-allocations.
> May you explain a bit more about this?
> My reading of `track-changes--before' is that `concat' is called on
> every change except when a new change is already inside the previously
> recorded one.

The "previously recorded one" (i.e. the info stored in
`track-changes--before-beg/end/string`) is a conservative
over-approximation of the changed area.  We (at least) double its size
every time we have to grow it, specifically to reduce the number of
times we need to grow it (otherwise we quickly fall into the O(N²)
trap).

> (aside: I have a hard time reading the code because of
> confusing slot names: bbeg vs beg??)

"bbeg/bend" stands for "before beg" and "before end" (i.e. the positions as
they were before the change).  BTW, those two slots don't exist any more
in the latest code I sent (but vars with those names are still present
here and there).

>>>> We could expose a list of simultaneous (and thus disjoint) changes,
>>>> which avoids the last problem.  But it's a fair bit more work for us, it
>>>> makes the API more complex for the clients, and it's rarely what the
>>>> clients really want anyway.
>>> FYI, Org parser maintains such a list.
>> Could you point me to the relevant code (I failed to find it)?
> That code handles a lot more than just changes, so it might not be the
> best reference. Anyway...
> See the docstring of `org-element--cache-sync-requests', and
> the branch in `org-element--cache-submit-request' starting from
> 	  ;; Current changes can be merged with first sync request: we
> 	  ;; can save a partial cache synchronization.

Thanks.  [ I see I did search the right file, but the code was too
intertwined with other things for me to find that part.  ]

[ Further discussions of Org's code moved off-list.  ]

> Hmm. By referring to buffer-undo-list, I meant that intersecting edits
> will be automatically merged, as it is usually done in undo system.

[ Actually `buffer-undo-list` doesn't do very much merging, if any.
  AFAIK the only merging that happens there is to "amalgamate" consecutive
  `self-insert-command`s or consecutive single-char `delete-char`s. šŸ™‚  ]

> I am also not 100% sure why edits being simultaneous is any relevant.

If they're not simultaneous it means that they describe N different
buffer states and that to interpret a specific change in the list
(e.g. to convert buffer positions to line+col positions) you
may have to reconstruct its before&after states by applying the
other changes.

For some use cases this is quite inconvenient.
In any case, it's not very important, I think.

> What I was talking about is (1) automatically merging subsequent edits
> like 1..5 2..7 7..9 into 1..9. (2) if a subsequent edit does not
> intersect with previous edited region, record it separately without
> merging.

That's what you can get now with `track-changes-register-disjoint`.

>> The code changes were actually quite simple, so I have now included it.
>> See my current PoC code below.
> Am I missing something, or will `track-changes--signal-if-disjoint'
> alter `track-changes--state' when it calls `track-changes-fetch' ->
> `track-changes-reset'?

[ I assume you meant `track-changes--clean-state` instead of
  `track-changes-reset`.  ] 

Yes it will (and there's a bug in `track-changes--before` because of
that that I still need to fix šŸ™), but that should not be a problem for
the clients.

> Won't that interfere with the information passed to
> non-disjoint trackers?

No: the separate states will be (re)combined when those trackers call
`track-changes-fetch`.

There is a nearby poor interference between clients, tho: if one
client calls `track-changes-fetch` eagerly all the time, it may prevent
another client from getting a "disjoint change" signal because
disjointness is noticed only between changes that occurred since the
last call to `track-changes--clean-state` (which is called mostly by
`track-changes-fetch`).
I guess this deserves a FIXME.


        Stefan





This bug report was last modified 1 year and 99 days ago.

Previous Next


GNU bug tracking system
Copyright (C) 1999 Darren O. Benham, 1997,2003 nCipher Corporation Ltd, 1994-97 Ian Jackson.