#58342 - 29.0.50; noverlay branch is O(N) for important calls

GNU bug report logs - #58342
29.0.50; noverlay branch is O(N) for important calls

Package: emacs;

Reported by: Matt Armstrong <matt <at> rfc20.org>

Date: Thu, 6 Oct 2022 23:27:01 UTC

Severity: normal

Merged with 58361

Found in version 29.0.50

Message #5 received at submit <at> debbugs.gnu.org (full text, mbox):

From: Matt Armstrong <matt <at> rfc20.org> To: bug-gnu-emacs <at> gnu.org Cc: Gerd Möllmann <gerd.moellmann <at> gmail.com>, Eli Zaretskii <eliz <at> gnu.org>, Andreas Politz <mail <at> andreas-politz.de>, Stefan Monnier <monnier <at> iro.umontreal.ca> Subject: 29.0.50; noverlay branch is O(N) for important calls Date: Thu, 06 Oct 2022 16:25:48 -0700

[Message part 1 (text/plain, inline)]

First let me preface: A) I hope I'm wrong here, but after careful thought I've failed to convince myself that I am, so I am either right or need help from others to spot my flawed logic. B) Even if I'm right the problem may not be judged serious enough to address. I believe that the feature/noverlay branch uses an O(N) algorithm for next-overlay-change and previous-overlay-change. Or, more precisely, O(MIN(N, (buffer-size))), where N is the overlay count. Now, in "normal" cases the real world achieved performance will be fine. If overlays form mostly disjoint intervals, without too many overlapping overlays, without too many overlays that span large portions of the buffer, the algorithm used will find the next/previous change quickly. However, consider the new next_overlay_change: 1 ptrdiff_t 2 next_overlay_change (ptrdiff_t pos) 3 { 4 ptrdiff_t next = ZV; 5 struct interval_node *node; 6 ITREE_FOREACH (node, current_buffer->overlays, pos, next, ASCENDING) 7 { 8 if (node->begin > pos) 9 { 10 /* If we reach this branch, node->begin must be the least upper bound 11 of pos, because the search is limited to [pos,next) . */ 12 eassert (node->begin < next); 13 next = node->begin; 14 ITREE_FOREACH_ABORT (); 15 break; 16 } 17 else if (node->begin < node->end && node->end < next) 18 { 19 next = node->end; 20 ITREE_FOREACH_NARROW (pos, next); 21 } 22 } 23 return next; 24 } Here we traverse overlays in ASCENDING order of BEG positions. The best we can say is that this loop executes in O(K*log(N)) time, where K is the MIN of number of overlays that overlap POS and the number of valid positions in the buffer after POS. It is always going to be possible to construct pathalogical cases where lines 19-20 are taken for as many positions as there are in the buffer after POS, since the tree is not ordered by END. I've attached a patch that can go in to itree.c (which, if I'm right, I think would be good to add, especially if we decide to not improve the data structure). I've quoted the text: ==== FIXME: this data structure is O(N) for important operations -matt ==== The amortized costs of Emacs' previous-overlay-change and next-overlay-change functions are O(N) with this data structure. The root problem is that we only have an order for the BEG field, but not the END. The previous/next overlay change operations need to find the nearest point where there is *either* an interval BEG or END point, but there is no efficient way to narrow the search space over END postions. Consider the case where next-overlay-change is called at POS, all interval BEG positions are less than pos POS and all interval END posistions are after. These END positions have no order, and so *every* interval must be examined. This is at least O(N). The previous-overlay-change case is similar. The root issue is that the iterative "narrowing" approach is not guaranteed to reduce the search space in logarithmic time, since END is not ordered in the tree. One might argue that the LIMIT value will do this narrowing, but this narrowing is O(K*log(N)) where K is the size of the result set. If we are interested in finding the node in a range with the smallest END, we might have to examine all K nodes in that range. In the case of the *-overlay-channge functions, K may well be equal to N. Ideally, a tree based data structure for overlays would have O(log(N)) performance for previous-overlay-change and next-overlay-change, as these are called in performance sensitive situations such as redisplay. The only way I can think of achieving this is by keeping one ordering by BEG and a separate ordering by END, and then performing logic quite similar to the current Emacs overlays-before and overlays-after lists.

From 6ca364a6ad45b1cfdcf080c492fde536975f6e09 Mon Sep 17 00:00:00 2001 From: Matt Armstrong <matt <at> rfc20.org> Date: Thu, 6 Oct 2022 15:47:20 -0700 Subject: [PATCH 5/5] ; * src/itree.c: Add comment describing when noverlay is O(N) --- src/itree.c | 36 ++++++++++++++++++++++++++++++++++++ 1 file changed, 36 insertions(+) diff --git a/src/itree.c b/src/itree.c index 79e39d6e2a..6de84ae0b8 100644 --- a/src/itree.c +++ b/src/itree.c @@ -62,6 +62,42 @@ Copyright (C) 2017-2022 Free Software Foundation, Inc. complexity of O(K*log(N)) for this operation, where K is the size of the result set and N the size of the tree. + ==== + FIXME: this data structure is O(N) for important operations -matt + ==== + + The amortized costs of Emacs' previous-overlay-change and + next-overlay-change functions are O(N) with this data structure. + The root problem is that we only have an order for the BEG field, + but not the END. The previous/next overlay change operations need + to find the nearest point where there is *either* an interval BEG + or END point, but there is no efficient way to narrow the search + space over END postions. + + Consider the case where next-overlay-change is called at POS, all + interval BEG positions are less than pos POS and all interval END + posistions are after. These END positions have no order, and so + *every* interval must be examined. This is at least O(N). The + previous-overlay-change case is similar. The root issue is that + the iterative "narrowing" approach is not guaranteed to reduce the + search space in logarithmic time, since END is not ordered in the + tree. + + One might argue that the LIMIT value will do this narrowing, but + this narrowing is O(K*log(N)) where K is the size of the result + set. If we are interested in finding the node in a range with the + smallest END, we might have to examine all K nodes in that range. + In the case of the *-overlay-channge functions, K may well be equal + to N. + + Ideally, a tree based data structure for overlays would have + O(log(N)) performance for previous-overlay-change and + next-overlay-change, as these are called in performance sensitive + situations such as redisplay. The only way I can think of + achieving this is by keeping one ordering by BEG and a separate + ordering by END, and then performing logic quite similar to the + current Emacs overlays-before and overlays-after lists. + ==== Adjusting intervals ==== Since this data-structure will be used for overlays in an Emacs -- 2.35.1

This bug report was last modified 2 years and 347 days ago.

GNU bug tracking system
Copyright (C) 1999 Darren O. Benham, 1997,2003 nCipher Corporation Ltd, 1994-97 Ian Jackson.

GNU bug report logs - #58342 29.0.50; noverlay branch is O(N) for important calls

GNU bug report logs - #58342
29.0.50; noverlay branch is O(N) for important calls