#77942 - 31.0.50; replace-region-contents gets stuck

GNU bug report logs - #77942
31.0.50; replace-region-contents gets stuck

Package: emacs;

Reported by: Gerd Möllmann <gerd.moellmann <at> gmail.com>

Date: Sun, 20 Apr 2025 15:30:07 UTC

Severity: normal

Found in version 31.0.50

View this message in rfc822 format

From: Stefan Monnier <monnier <at> iro.umontreal.ca> To: Paul Eggert <eggert <at> cs.ucla.edu> Cc: Gerd Möllmann <gerd.moellmann <at> gmail.com>, 77942 <at> debbugs.gnu.org Subject: bug#77942: 31.0.50; replace-region-contents gets stuck Date: Mon, 21 Apr 2025 22:47:22 -0400

>> Paul, IIUC you've played with this algorithm. Would you be tempted to >> try and improve that situation where the replacement text is*much* >> smaller the text it replaces? > > My quick reaction is that you'd need a different algorithm than the > Myers-Ukkonen algorithm used by GNU diff etc. A good place to start might be > here: > > A. Andoni, R. Krauthgamer and K. Onak, "Polylogarithmic Approximation for > Edit Distance and the Asymmetric Query Complexity," FOCS 2010, 377-386, > <https://doi.org/10.1109/FOCS.2010.43>. > > ... though this is just the tip of an iceberg that I haven't had time to > look into. If there's real interest in this (it'd be some work) I can ask > a colleague who's more of an expert.... I don't think I have the time&energy to look into it. But the issue at hand is that in Gerd's example, the diff code takes a *really* long time (even though it has already found an optimal answer, AFAICT), and more importantly: without calling the XVECREF_YVECREF_EQUAL function, which prevents us from aborting the operation with a timeout. For example, if I interrupt the execution I found it is in `diag` in the following loop: for (d = fmax; d >= fmin; d -= 2) { OFFSET x; OFFSET y; OFFSET tlo = fd[d - 1]; OFFSET thi = fd[d + 1]; OFFSET x0 = tlo < thi ? thi : tlo + 1; for (x = x0, y = x0 - d; x < xlim && y < ylim && XREF_YREF_EQUAL (x, y); x++, y++) continue; if (x - x0 > SNAKE_LIMIT) big_snake = true; fd[d] = x; if (odd && bmin <= d && d <= bmax && bd[d] <= x) { part->xmid = x; part->ymid = y; part->lo_minimal = part->hi_minimal = true; return; } } where (in our case when we replace a very large region with a short string), the `y < ylim` is apparently always false (once the first 74 calls to `XVECREF_YVECREF_EQUAL` took place): (gdb) p x $1 = 51823 (gdb) p xlim $2 = 489755 (gdb) p y $3 = <optimized out> (gdb) p ylim $4 = 14 (gdb) p x0 $5 = 51823 (gdb) p d $6 = 41094 (gdb) p x0 - d $7 = 10729 I don't understand this code enough, but I get the impression that we might be able to streamline this loop once we get to that state because every iteration seems to do very little useful work that does not change between iterations (clearly this depends on being able to detect "that state", which depends on the `tlo` and `thi` we read from the array, so it might require keeping track somewhere of the min/max values stored in the `fd` array?). The same probably happens for the loop that follows (and does the same but for "the bottom-up search", according to the comment). I'm hoping you understand enough of the algorithm and of what I'm saying to be able to tell me if I'm making sense. Stefan

This bug report was last modified 109 days ago.

GNU bug tracking system
Copyright (C) 1999 Darren O. Benham, 1997,2003 nCipher Corporation Ltd, 1994-97 Ian Jackson.

GNU bug report logs - #77942 31.0.50; replace-region-contents gets stuck

GNU bug report logs - #77942
31.0.50; replace-region-contents gets stuck