GNU bug report logs - #77942
31.0.50; replace-region-contents gets stuck

Previous Next

Package: emacs;

Reported by: Gerd Möllmann <gerd.moellmann <at> gmail.com>

Date: Sun, 20 Apr 2025 15:30:07 UTC

Severity: normal

Found in version 31.0.50

Full log


View this message in rfc822 format

From: Stefan Monnier <monnier <at> iro.umontreal.ca>
To: Paul Eggert <eggert <at> cs.ucla.edu>
Cc: Gerd Möllmann <gerd.moellmann <at> gmail.com>, 77942 <at> debbugs.gnu.org
Subject: bug#77942: 31.0.50; replace-region-contents gets stuck
Date: Mon, 21 Apr 2025 22:47:22 -0400
>> Paul, IIUC you've played with this algorithm.  Would you be tempted to
>> try and improve that situation where the replacement text is*much*
>> smaller the text it replaces?
>
> My quick reaction is that you'd need a different algorithm than the
> Myers-Ukkonen algorithm used by GNU diff etc. A good place to start might be
> here:
>
> A. Andoni, R. Krauthgamer and K. Onak, "Polylogarithmic Approximation for
>  Edit Distance and the Asymmetric Query Complexity," FOCS 2010, 377-386,
> <https://doi.org/10.1109/FOCS.2010.43>.
>
> ... though this is just the tip of an iceberg that I haven't had time to
>  look into. If there's real interest in this (it'd be some work) I can ask
> a colleague who's more of an expert....

I don't think I have the time&energy to look into it.

But the issue at hand is that in Gerd's example, the diff code takes
a *really* long time (even though it has already found an optimal
answer, AFAICT), and more importantly: without calling the
XVECREF_YVECREF_EQUAL function, which prevents us from aborting the
operation with a timeout.

For example, if I interrupt the execution I found it is in `diag` in the
following loop:

      for (d = fmax; d >= fmin; d -= 2)
        {
          OFFSET x;
          OFFSET y;
          OFFSET tlo = fd[d - 1];
          OFFSET thi = fd[d + 1];
          OFFSET x0 = tlo < thi ? thi : tlo + 1;

          for (x = x0, y = x0 - d;
               x < xlim && y < ylim && XREF_YREF_EQUAL (x, y);
               x++, y++)
            continue;
          if (x - x0 > SNAKE_LIMIT)
            big_snake = true;
          fd[d] = x;
          if (odd && bmin <= d && d <= bmax && bd[d] <= x)
            {
              part->xmid = x;
              part->ymid = y;
              part->lo_minimal = part->hi_minimal = true;
              return;
            }
        }

where (in our case when we replace a very large region with
a short string), the `y < ylim` is apparently always false (once the
first 74 calls to `XVECREF_YVECREF_EQUAL` took place):

    (gdb) p x
    $1 = 51823
    (gdb) p xlim
    $2 = 489755
    (gdb) p y
    $3 = <optimized out>
    (gdb) p ylim
    $4 = 14
    (gdb) p x0
    $5 = 51823
    (gdb) p d
    $6 = 41094
    (gdb) p x0 - d
    $7 = 10729

I don't understand this code enough, but I get the impression that we
might be able to streamline this loop once we get to that state because
every iteration seems to do very little useful work that does not change
between iterations (clearly this depends on being able to detect "that
state", which depends on the `tlo` and `thi` we read from the array, so
it might require keeping track somewhere of the min/max values stored in
the `fd` array?).

The same probably happens for the loop that follows (and does the same
but for "the bottom-up search", according to the comment).

I'm hoping you understand enough of the algorithm and of what I'm saying
to be able to tell me if I'm making sense.


        Stefan





This bug report was last modified 56 days ago.

Previous Next


GNU bug tracking system
Copyright (C) 1999 Darren O. Benham, 1997,2003 nCipher Corporation Ltd, 1994-97 Ian Jackson.