GNU bug report logs - #58558
29.0.50; re-search-forward is slow in some buffers

Previous Next

Package: emacs;

Reported by: Ihor Radchenko <yantar92 <at> posteo.net>

Date: Sun, 16 Oct 2022 01:27:02 UTC

Severity: normal

Found in version 29.0.50

Full log


View this message in rfc822 format

From: Stefan Monnier <monnier <at> iro.umontreal.ca>
To: Ihor Radchenko <yantar92 <at> posteo.net>
Cc: 58558 <at> debbugs.gnu.org, Eli Zaretskii <eliz <at> gnu.org>, larsi <at> gnus.org
Subject: bug#58558: 29.0.50; re-search-forward is slow in some buffers
Date: Tue, 13 Dec 2022 12:38:13 -0500
>>> I will look how to do it. Maybe perf probe.
>>> I guess, it will be useful to compile Emacs with debug symbols at this
>>> point.
>>
>> AFAIR, you can ask perf to profile a single function, and you can ask
>> it to annotate the profile with the source code.
>
> I now compiled Emacs with debug symbols, waited enough to see observable
> increase in the benchmark-run timing, and recorded the perf data.
>
> buf_bytepos_to_charpos is still on the top
>
>     78.06%  emacs         emacs                       [.] buf_bytepos_to_charpos
>      3.00%  emacs         emacs                       [.] re_match_2_internal
>      1.05%  emacs         emacs                       [.] find_interval
>      1.04%  emacs         emacs                       [.] CHAR_TABLE_REF_ASCII
>      0.85%  emacs         emacs                       [.] make_lisp_symbol
>      0.80%  emacs         emacs                       [.] re_search_2
>      0.76%  emacs         emacs                       [.] builtin_lisp_symbol
>      0.62%  emacs         emacs                       [.] PSEUDOVECTORP

AFAIK the main places where we call `buf_bytepos_to_charpos` from
`re_match_2_internal` is via the `SYNTAX_TABLE_BYTE_TO_CHAR` macro, used
for regexp elements that depend on syntax tables (i.e. \<, \>, \_<, ...).

But I'd expect those to be executed "frequently&closely" enough that the
`cached_(byte|char)pos` data should almost always be nearby, making the
call to `buf_bytepos_to_charpos` fairly cheap (more specifically
the `for (tail = BUF_MARKERS (b);...` loop should not iterate many
times, regardless how many markers there are).

> My guess: number of markers is growing somehow?

`buf_bytepos_to_charpos` itself creates markers (using them as a cache
of previous conversions), so that might be why.

But we only look at the first N markers where N*50 is the distance to
the closest marker found so far.  So growth is not sufficient (it's
clearly a part of the reason, tho).

Regarding growth: could you call `garbage-collect` between the calls to
`re-search-forward` to see if that avoids the accumulation?
[ I presume here that those markers are created/added by
  `buf_bytepos_to_charpos` itself, so they should be recovered by the GC
  because they're not referenced from anywhere else.  ]

I'd be interested to know how many iterations of the `for (tail =
BUF_MARKERS (b);...` loop are executed on average during your
`re-search-forward` (and how that average changes between runs of
`re-search-forward`).


        Stefan


PS: Of course, another approach would be to replace this code with
something else.  Using markers as a cache of bytepos/charpos conversions
has been a source of a few performance issues over the year.

Another approach could be to use a "vector with gap" managed alongside
the actual buffer text.  It could be indexed by "charpos divided by
1024", so conversion from charpos to bytepos could be a simple vector
lookup followed by scanning at most 1kB, and conversion in the other
direction would use a binary search in that same vector (or we could use
2 "vectors with gap", one per direction of conversion).





This bug report was last modified 2 years and 64 days ago.

Previous Next


GNU bug tracking system
Copyright (C) 1999 Darren O. Benham, 1997,2003 nCipher Corporation Ltd, 1994-97 Ian Jackson.