GNU bug report logs -
#58158
29.0.50; [overlay] Interval tree iteration considered harmful
Previous Next
Reported by: Gerd Möllmann <gerd.moellmann <at> gmail.com>
Date: Thu, 29 Sep 2022 05:30:02 UTC
Severity: normal
Found in version 29.0.50
Fixed in version 30.1
Done: Gerd Möllmann <gerd.moellmann <at> gmail.com>
Bug is archived. No further changes may be made.
Full log
View this message in rfc822 format
>> > I may be missing something, but it looks like the sole purpose of the
>> > iter_start/iter_finish dance is to ensure only one iteration per tree
>> > is running at any given time, and that's because the iteration uses
>> > some state variable(s) of which there's only one instance per tree.
>> >
>> > Stefan, am I missing something?
>>
>> One reason is that traversing a binary tree usually requires something
>> like recursion, but that wouldn't fit very conveniently with the current
>> code (nor with C in general since you can't make a local recursive
>> closure which accesses local variables from the surrounding function).
>
> I'm not sure I understand how recursion is related. Are you saying
> that recursion is replaced with iteration? But then, if _start and
> _finish are called by the same caller, we don't really need the
> protection, since no one can start another iteration while the first
> one runs. Right?
Typically, current code will look something like:
int x;
Lisp_Object y;
buffer_overlay_iter_start (current_buffer, prev, pos, ITREE_DESCENDING);
while ((node = buffer_overlay_iter_next (current_buffer)))
{
... do something that updates x and y ...
}
buffer_overlay_iter_finish (current_buffer);
If we were to use recursion, then we'd need to define a new (recursive)
function which does what's currently done in the `while` loop, but this
function can't access `x` and `y`, so it would need to take them as
argument, or a reference to them...
The use of an iterator is definitely convenient (and is a standard
approach in many languages).
>> Another is the need to update the begin/end fields (these need updating
>> because of insertions/deletions but they're updated lazily while
>> traversing the tree to avoid an O(N) complexity during the
>> insertions/deletions). Hiding that behind 'some kind of "next node"
>> function keeps the code more readable.
>
> Where in the code do you see iteration that adds or deletes nodes?
No, I meant insertion/deletion of text in the buffer, thus requiring
updates to `begin/end` fields.
> Btw, couldn't we handle this by having a flag in the node that tells
> us whether the begin/end fields can be trusted? Then the first caller
> who need them would run the update and reset the flags, and we still
> have lazy update, albeit somewhat less lazy, but without the need for
> guarding the iteration with start/finish calls. Would that work?
Yes, it would, but it's still O(N).
The current approach is inspired by the approach used in `intervals.c`
which also updates those fields lazily/ondemand so as to avoid the O(N)
performance impact.
>> For now, I pushed a simple fix to traverse the tree "by hand" in the GC
>> rather than via the iterator.
> So this removes the restriction of not having GC during iteration?
Yes.
> Also, I take it that you don't consider the current code is as
> "harmful" as Gerd thinks it is?
I don't like the global state it uses, but I think we can fix this
aspect without too much trouble.
> IOW, you don't share his opinion that this implementation is
> a "no-go"?
No, indeed, I don't.
Stefan
This bug report was last modified 1 year and 311 days ago.
Previous Next
GNU bug tracking system
Copyright (C) 1999 Darren O. Benham,
1997,2003 nCipher Corporation Ltd,
1994-97 Ian Jackson.