GNU bug report logs - #63225
Compiling regexp patterns (and REGEXP_CACHE_SIZE in search.c)

Previous Next

Package: emacs;

Reported by: Ihor Radchenko <yantar92 <at> posteo.net>

Date: Tue, 2 May 2023 07:35:02 UTC

Severity: normal

Tags: patch

Full log


View this message in rfc822 format

From: Mattias EngdegÄrd <mattias.engdegard <at> gmail.com>
To: Ihor Radchenko <yantar92 <at> posteo.net>
Cc: 63225 <at> debbugs.gnu.org
Subject: bug#63225: Compiling regexp patterns (and REGEXP_CACHE_SIZE in search.c)
Date: Wed, 3 May 2023 10:39:10 +0200
2 maj 2023 kl. 23.21 skrev Ihor Radchenko <yantar92 <at> posteo.net>:

> I tried this, and it did not give any noticeable improvement.
> Most likely, because the actual `cond' is
> 
>  (cond ((looking-at "foo) (<parser function calling many more regex searches>)) ...)

I see, so it doesn't run through all top-level cases very often then? I thought that would be the common path (plain text).

Would consolidating some of the secondary regexps help at all? What are the most frequent branches in the parser?

Perhaps you just don't see much improvement until the working set of regexps fits in the cache.

>> Otherwise it's very much a matter of optimisation of everything, including regexps. Minimise backtracking.
>> If you want to match five or more dashes, use "------*" instead of "-\\{5,\\}". And so on.
> 
> This example sounds like something that regexp compilation should be
> able to optimize, no? I do not easily see why the latter should cause
> more CPU time compared to the former.

It's a trivial point and definitely not the source of your problems, sorry! (Counted repetitions are slightly less efficient because they need to maintain the counter, it's all done in a terrible way.)
The regexp compiler doesn't do much optimisation in order to keep the translation fast. It doesn't even convert "[a]" to "a".

> Or, alternatively, the parsed regexps can be attached to string objects
> internally. Then, regexp cache lookup will degenerate to looking into a
> string object slot.

That would work too but we really don't want to make our strings any fancier, they are already much too big and slow.





This bug report was last modified 2 years and 37 days ago.

Previous Next


GNU bug tracking system
Copyright (C) 1999 Darren O. Benham, 1997,2003 nCipher Corporation Ltd, 1994-97 Ian Jackson.