GNU bug report logs - #63225
Compiling regexp patterns (and REGEXP_CACHE_SIZE in search.c)
Message #32 received at 63225 <at> debbugs.gnu.org:
Mattias Engdegård <mattiase <at> acm.org> writes:
>> I was able to get rid of the regex compilation-related slowdown simply
>> by increasing REGEXP_CACHE_SIZE 10x (see the attached patch).
>
> Indeed it sounds like you are suffering from regexp cache thrashing. I'm attaching two patches: one to measure the cache miss rate, and one that allows the regexp cache size to be changed at run time.
>
> That should let you find the working set size for your application, and ideally come up with a way to reduce it. Perhaps you could give us an idea of what these regexps look like and how they are used?
>
>> Does anyone know if there are potential side effects of this increase if
>> applied across Emacs? Or, alternatively, may Emacs provide an ability to
>> store compiled regexp patterns from Elisp (similar to what
>> `treesit-query-compile' does)?
>
> I don't think it's necessarily a good idea to increase the size to 200
> right away because of the linear cache lookup mechanism. Allowing the
> size to be changed at run time is probably less controversial (but
> arguably just as much of a crutch).
>
> Introducing regexp objects that could store compiled regexps and be used instead of strings would be quite some work but probably worthwhile.
Thanks for curing this instance of C programmer's disease.
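
To illustrate the thrashing and the linear lookup mentioned above, here is a minimal toy model, not the code in search.c. It assumes a small array searched front to back, with the matched or newly inserted entry moved to the front and the last (least recently used) slot overwritten on a miss. Integer keys stand in for compiled patterns, and every name in it (toy_cache, toy_cache_access, toy_miss_rate) is hypothetical.

```c
#include <assert.h>
#include <string.h>

#define MAX_CACHE 256

/* Toy stand-in for a regexp cache: integer keys replace compiled patterns.  */
struct toy_cache
{
  int keys[MAX_CACHE];  /* keys[0] is the most recently used entry */
  int used;             /* number of valid entries */
  int size;             /* capacity, 1 .. MAX_CACHE */
  long hits, misses;
  long comparisons;     /* key comparisons made by the linear scans */
};

static void
toy_cache_init (struct toy_cache *c, int size)
{
  assert (size >= 1 && size <= MAX_CACHE);
  memset (c, 0, sizeof *c);
  c->size = size;
}

/* Look KEY up by linear scan.  On a hit, move the entry to the front.
   On a miss, put KEY at the front; if the cache is full, the last
   (least recently used) entry is dropped.  */
static void
toy_cache_access (struct toy_cache *c, int key)
{
  int i;

  for (i = 0; i < c->used; i++)
    {
      c->comparisons++;
      if (c->keys[i] == key)
        break;
    }

  if (i < c->used)
    c->hits++;
  else
    {
      c->misses++;
      if (c->used < c->size)
        c->used++;
      i = c->used - 1;  /* slot that will be overwritten by the shift */
    }

  /* Shift entries 0 .. i-1 down by one and store KEY at the front.  */
  memmove (&c->keys[1], &c->keys[0], (size_t) i * sizeof c->keys[0]);
  c->keys[0] = key;
}

/* Access WORKING_SET distinct keys in cyclic order, ROUNDS times,
   and return the fraction of accesses that missed.  */
static double
toy_miss_rate (int cache_size, int working_set, int rounds)
{
  struct toy_cache c;

  toy_cache_init (&c, cache_size);
  for (int r = 0; r < rounds; r++)
    for (int k = 0; k < working_set; k++)
      toy_cache_access (&c, k);

  return (double) c.misses / (double) (c.hits + c.misses);
}
```

In this model, toy_miss_rate (20, 30, 100) returns 1.0: with 30 keys cycling through 20 slots, each key is evicted before it is needed again, so every access misses. toy_miss_rate (200, 30, 100) returns 0.01, since only the first round misses. The comparisons field is there to show the other side of the trade-off: each lookup scans the live entries in order, so a larger, fuller cache makes every miss (and every hit near the tail) more expensive.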
> From f1246af3cc558bd38527f320964bb0e0a1e74de0 Mon Sep 17 00:00:00 2001
> From: =?UTF-8?q?Mattias=20Engdeg=C3=A5rd?= <mattiase <at> acm.org>
> Date: Sat, 7 Nov 2020 17:00:53 +0100
> Subject: [PATCH 1/2] Add regexp cache hit/miss counters
>
> ---
> src/search.c | 13 ++++++++++++-
> 1 file changed, 12 insertions(+), 1 deletion(-)
>
> diff --git a/src/search.c b/src/search.c
> index 0bb52c03eef..6f71f3d16c1 100644
> --- a/src/search.c
> +++ b/src/search.c
> @@ -220,7 +220,10 @@ compile_pattern (Lisp_Object pattern, struct re_registers *regp,
> || EQ (cp->syntax_table, BVAR (current_buffer, syntax_table)))
> && !NILP (Fequal (cp->f_whitespace_regexp, Vsearch_spaces_regexp))
> && cp->buf.charset_unibyte == charset_unibyte)
> - break;
> + {
> + regexp_cache_hit++;
> + break;
> + }
>
> /* If we're at the end of the cache, compile into the last
> (least recently used) non-busy cell in the cache. */
> @@ -232,6 +235,7 @@ compile_pattern (Lisp_Object pattern, struct re_registers *regp,
> cp = *cpp;
> compile_it:
> eassert (!cp->busy);
> + regexp_cache_miss++;
> compile_pattern_1 (cp, pattern, translate, posix);
> break;
> }
> @@ -3431,6 +3435,13 @@ syms_of_search (void)
> is to bind it with `let' around a small expression. */);
> Vinhibit_changing_match_data = Qnil;
>
> + DEFVAR_INT("regexp-cache-hit", regexp_cache_hit,
> + doc: /* Regexp cache hit count. Internal use only. */);
> + regexp_cache_hit = 0;
> + DEFVAR_INT("regexp-cache-miss", regexp_cache_miss,
> + doc: /* Regexp cache miss count. Internal use only. */);
> + regexp_cache_miss = 0;
Please put a space between `DEFVAR_INT' and `('.
This bug report was last modified 2 years and 37 days ago.