GNU bug report logs - #63225
Compiling regexp patterns (and REGEXP_CACHE_SIZE in search.c)


Package: emacs;

Reported by: Ihor Radchenko <yantar92 <at> posteo.net>

Date: Tue, 2 May 2023 07:35:02 UTC

Severity: normal

Tags: patch



Message #32 received at 63225 <at> debbugs.gnu.org:

From: Po Lu <luangruo <at> yahoo.com>
To: Mattias Engdegård <mattiase <at> acm.org>
Cc: 63225 <at> debbugs.gnu.org, Ihor Radchenko <yantar92 <at> posteo.net>
Subject: Re: bug#63225: Compiling regexp patterns (and REGEXP_CACHE_SIZE in
 search.c)
Date: Wed, 03 May 2023 07:36:46 +0800
Mattias Engdegård <mattiase <at> acm.org> writes:

>> I was able to get rid of the regex compilation-related slowdown simply
>> by increasing REGEXP_CACHE_SIZE 10x (see the attached patch).
>
> Indeed it sounds like you are suffering from regexp cache thrashing. I'm attaching two patches: one to measure the cache miss rate, and one that allows the regexp cache size to be changed at run time.
>
> That should let you find the working set size for your application, and ideally come up with a way to reduce it. Perhaps you could give us an idea of what these regexps look like and how they are used?
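The following is a minimal sketch of how hit/miss counters reveal a working set. The C code replays a trace of pattern strings through a simplified cache of a given size, ordered from most to least recently used, and counts hits and misses. It ignores the translate table, syntax table, POSIX flag and busy cells that compile_pattern also checks. The names `simulate` and `MAX_CACHE` are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MAX_CACHE 64  /* hypothetical upper bound for the simulation */

/* Replay N pattern strings from TRACE through a cache of CAPACITY entries
   kept in most-recently-used-first order, counting hits and misses.  */
static void
simulate (const char *const *trace, size_t n, size_t capacity,
          long *hits, long *misses)
{
  const char *cache[MAX_CACHE];
  size_t used = 0;

  assert (capacity >= 1 && capacity <= MAX_CACHE);
  *hits = *misses = 0;

  for (size_t i = 0; i < n; i++)
    {
      size_t j;

      /* Linear scan from the most to the least recently used entry.  */
      for (j = 0; j < used; j++)
        if (strcmp (cache[j], trace[i]) == 0)
          break;

      if (j < used)
        ++*hits;
      else
        {
          ++*misses;
          /* Grow into a free slot, or reuse the least recently used one.  */
          if (used < capacity)
            used++;
          j = used - 1;
        }

      /* Move the entry found (or the slot being reused) to the front.  */
      memmove (&cache[1], &cache[0], j * sizeof cache[0]);
      cache[0] = trace[i];
    }
}
```

Suppose a trace cycles round-robin through five patterns. Any cache smaller than five then misses on every lookup, because each entry is evicted just before it is needed again. A cache of exactly five misses only on the first five lookups, so the working set for that trace is five.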
>
>> Does anyone know if there are potential side effects of this increase if
>> applied across Emacs? Or, alternatively, may Emacs provide an ability to
>> store compiled regexp patterns from Elisp (similar to what
>> `treesit-query-compile' does)?
>
> I don't think it's necessarily a good idea to increase the size to 200
> right away because of the linear cache lookup mechanism. Allowing the
> size to be changed at run time is probably less controversial (but
> arguably just as much of a crutch).
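This sketch, using hypothetical names, shows what a run-time size means for a linear lookup. The capacity is a variable rather than a compile-time constant, and the lookup returns the number of string comparisons it made so the cost is visible. A miss has to scan every entry. Going from 20 to 200 entries therefore means ten times as many comparisons per miss, while hits near the front stay cheap.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Toy pattern cache in most-recently-used-first order (hypothetical,
   not the search.c structures).  */
struct pcache
{
  const char **entries;  /* entries[0] is the most recently used */
  size_t used, capacity;
};

/* Change the capacity at run time; shrinking drops the least recently
   used entries.  Returns 0 on success, -1 if allocation fails.  */
static int
pcache_resize (struct pcache *c, size_t capacity)
{
  assert (capacity >= 1);
  const char **p = realloc (c->entries, capacity * sizeof *p);
  if (!p)
    return -1;
  c->entries = p;
  c->capacity = capacity;
  if (c->used > capacity)
    c->used = capacity;
  return 0;
}

/* Look up PATTERN, inserting it on a miss, and store whether it was found
   in *HIT.  Returns the number of comparisons: a hit at position J costs
   J + 1, while a miss costs one per cached entry.  */
static size_t
pcache_lookup (struct pcache *c, const char *pattern, int *hit)
{
  size_t j;

  assert (c->capacity >= 1);
  for (j = 0; j < c->used; j++)
    if (strcmp (c->entries[j], pattern) == 0)
      break;

  size_t comparisons = j < c->used ? j + 1 : c->used;
  *hit = j < c->used;
  if (!*hit)
    {
      if (c->used < c->capacity)
        c->used++;
      j = c->used - 1;  /* free slot or least recently used entry */
    }
  memmove (&c->entries[1], &c->entries[0], j * sizeof c->entries[0]);
  c->entries[0] = pattern;
  return comparisons;
}
```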
>
> Introducing regexp objects that could store compiled regexps and be used instead of strings would be quite some work but probably worthwhile.
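The sketch below illustrates the compile-once idea by analogy. It uses POSIX `<regex.h>`, not Emacs's own regex engine. The caller compiles the pattern once into a `regex_t` and reuses it for every search, so no cache lookup or recompilation happens per call. A Lisp-level regexp object would play roughly the role of `regex_t` here. The function name `count_matches` is hypothetical.

```c
#include <assert.h>
#include <regex.h>
#include <stddef.h>

/* Count the strings in LINES that match the extended regexp PATTERN.
   PATTERN is compiled once and the compiled form is reused for every line.
   Returns -1 if PATTERN fails to compile.  */
static int
count_matches (const char *pattern, const char *const *lines, size_t n)
{
  regex_t re;
  int count = 0;

  assert (lines != NULL || n == 0);
  if (regcomp (&re, pattern, REG_EXTENDED | REG_NOSUB) != 0)
    return -1;
  for (size_t i = 0; i < n; i++)
    if (regexec (&re, lines[i], 0, NULL, 0) == 0)
      count++;
  regfree (&re);  /* release the compiled pattern when done with it */
  return count;
}
```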

Thanks for curing this instance of C programmer's disease.

> From f1246af3cc558bd38527f320964bb0e0a1e74de0 Mon Sep 17 00:00:00 2001
> From: =?UTF-8?q?Mattias=20Engdeg=C3=A5rd?= <mattiase <at> acm.org>
> Date: Sat, 7 Nov 2020 17:00:53 +0100
> Subject: [PATCH 1/2] Add regexp cache hit/miss counters
>
> ---
>  src/search.c | 13 ++++++++++++-
>  1 file changed, 12 insertions(+), 1 deletion(-)
>
> diff --git a/src/search.c b/src/search.c
> index 0bb52c03eef..6f71f3d16c1 100644
> --- a/src/search.c
> +++ b/src/search.c
> @@ -220,7 +220,10 @@ compile_pattern (Lisp_Object pattern, struct re_registers *regp,
>  	      || EQ (cp->syntax_table, BVAR (current_buffer, syntax_table)))
>  	  && !NILP (Fequal (cp->f_whitespace_regexp, Vsearch_spaces_regexp))
>  	  && cp->buf.charset_unibyte == charset_unibyte)
> -	break;
> +        {
> +          regexp_cache_hit++;
> +          break;
> +        }
>  
>        /* If we're at the end of the cache, compile into the last
>  	 (least recently used) non-busy cell in the cache.  */
> @@ -232,6 +235,7 @@ compile_pattern (Lisp_Object pattern, struct re_registers *regp,
>            cp = *cpp;
>  	compile_it:
>            eassert (!cp->busy);
> +          regexp_cache_miss++;
>  	  compile_pattern_1 (cp, pattern, translate, posix);
>  	  break;
>  	}
> @@ -3431,6 +3435,13 @@ syms_of_search (void)
>  is to bind it with `let' around a small expression.  */);
>    Vinhibit_changing_match_data = Qnil;
>  
> +  DEFVAR_INT("regexp-cache-hit", regexp_cache_hit,
> +             doc: /* Regexp cache hit count.  Internal use only. */);
> +  regexp_cache_hit = 0;
> +  DEFVAR_INT("regexp-cache-miss", regexp_cache_miss,
> +             doc: /* Regexp cache miss count.  Internal use only. */);
> +  regexp_cache_miss = 0;

Please put a space between `DEFVAR_INT' and `('.







GNU bug tracking system
Copyright (C) 1999 Darren O. Benham, 1997,2003 nCipher Corporation Ltd, 1994-97 Ian Jackson.