GNU bug report logs - #63225
Compiling regexp patterns (and REGEXP_CACHE_SIZE in search.c)


Package: emacs;

Reported by: Ihor Radchenko <yantar92 <at> posteo.net>

Date: Tue, 2 May 2023 07:35:02 UTC

Severity: normal

Tags: patch



Message #80 received at 63225 <at> debbugs.gnu.org (full text, mbox):

From: Mattias Engdegård <mattias.engdegard <at> gmail.com>
To: Ihor Radchenko <yantar92 <at> posteo.net>
Cc: 63225 <at> debbugs.gnu.org
Subject: Re: bug#63225: Compiling regexp patterns (and REGEXP_CACHE_SIZE in
 search.c)
Date: Mon, 8 May 2023 20:21:03 +0200

On 8 May 2023, at 13:58, Ihor Radchenko <yantar92 <at> posteo.net> wrote:

> 		 (save-excursion
> 		   (beginning-of-line 0)
> 		   (not (looking-at-p "[[:blank:]]*$"))))

I wonder if that last part isn't better written as

  (save-excursion
    (forward-line -1)  ; previous line, like (beginning-of-line 0), but faster
    (skip-chars-forward "[:blank:]") ; faster than looking-at-p
    (not (eolp)))   ; very cheap

which doesn't use regexps at all. Worth a try.
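
A minimal sketch of the regexp-free check wrapped in a function, for trying it out in a scratch buffer. The name `my/previous-line-nonblank-p' is hypothetical and not part of Org. Note that `(beginning-of-line 0)' moves to the start of the previous line, so the sketch assumes that is the intended line and uses `(forward-line -1)'.

```elisp
;; Hypothetical helper, not part of Org: test the previous line
;; for non-blank content without compiling or matching a regexp.
(defun my/previous-line-nonblank-p ()
  "Return non-nil if the line before point has a non-blank character."
  (save-excursion
    (forward-line -1)                 ; start of the previous line
    (skip-chars-forward "[:blank:]")  ; skip horizontal whitespace
    (not (eolp))))                    ; anything left before the newline?

;; Usage: line 2 is blank, line 1 is not.
(with-temp-buffer
  (insert "first line\n   \t\nthird")
  (list (progn (goto-char (point-max))
               (my/previous-line-nonblank-p))   ; previous line is blank
        (progn (forward-line -1)
               (my/previous-line-nonblank-p)))) ; previous line has text
;; => (nil t)
```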

But yes, I sort of understand what you are getting at (except the business with the MODE parameter which is still a bit mysterious to me).

> [now part of the giant rx]
> 
> (rx line-start (0+ (any ?\s ?\t))
>      ":" (1+ (any ?- ?_ word)) ":"
>      (0+ (any ?\s ?\t)) line-end)

Any reason you don't capture the part between the colons here, so that you don't need to match it later on?
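
As an illustration only, here is roughly what capturing the part between the colons could look like. The names `my/drawer-line-re' and `my/drawer-name-at-point' are made up for this sketch, and the pattern is the stand-alone fragment quoted above, not the full rx form used in Org.

```elisp
(require 'rx)

;; Hypothetical stand-alone version of the quoted pattern, with the
;; name between the colons wrapped in a capture group.
(defconst my/drawer-line-re
  (rx line-start (0+ (any ?\s ?\t))
      ":" (group (1+ (any ?- ?_ word))) ":"
      (0+ (any ?\s ?\t)) line-end))

(defun my/drawer-name-at-point ()
  "Return the name between the colons if the line at point matches, else nil.
Point is assumed to be at the beginning of the line."
  (when (looking-at my/drawer-line-re)
    (match-string-no-properties 1)))

;; Usage: the name is available right after the first match,
;; so no second match is needed to extract it.
(with-temp-buffer
  (insert "  :PROPERTIES:  \n")
  (goto-char (point-min))
  (my/drawer-name-at-point))
;; => "PROPERTIES"
```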

> But why? Aren't (in word ?_ ?-) and (or word ?_ ?-) not the same?

"[-_[:word:]]" and "\\w\\|[_-]" indeed match the same thing but they don't generate the same regexp bytecode -- the former is faster. (In this case rx makes a literal translation to those strings but we should probably make it optimise to the faster regexp.)

There is a regexp disassembler for the really curious but it doesn't come with Emacs.
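
For anyone who wants to measure the difference rather than read bytecode, a rough timing sketch follows. The two regexps are hand-written equivalents of the strings above with a `+' added so that they scan a whole string; the sample text and the repetition count are arbitrary assumptions, and the numbers will vary between machines and Emacs versions.

```elisp
(require 'benchmark)

;; Rough comparison of the bracket form and the alternation form.
;; TEXT and the repetition count are arbitrary choices for this sketch.
(let ((re-in "[-_[:word:]]+")            ; single bracket expression
      (re-or "\\(?:\\w\\|[_-]\\)+")      ; alternation, same matches
      (text (make-string 200 ?a)))
  (list (benchmark-run 10000 (string-match-p re-in text))
        (benchmark-run 10000 (string-match-p re-or text))))
;; Each element is a list (ELAPSED-SECONDS GC-RUNS GC-SECONDS).
```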

> Your suggestions about using (not (in ...)) in place of (in ...) are
> good, but I am afraid to break CJK cases where people can use unexpected
> set of characters. I was bitten by this in the past.

In this case there's no need since you could gain some speed by the simple rewrite (or -> in) above, but there may be others where conditions are different.

>> Maybe, but you still cons each time. (And remember that the plist-get equality funarg is new in Emacs 29.)
> 
> Sure it does.
> It is just one of the variable parts of Org syntax that might be
> changed. There are ways to make this into constant, but it is a fragile
> area of the code that I do not want to touch without a reason.
> (Especially given that I am not familiar with org-list.el)

So it's fine to use elisp constructs new in Emacs 29 in Org? Then the line

 ;; Package-Requires: ((emacs "26.1"))

in org.el should probably be updated, right?

> I hope we do. If only Emacs had a way to define `case-fold-search' right
> within the regexp itself.

I would like that too, but changing that isn't easy.
By the way, it seems that org-element-node-property-parser binds case-fold-search without actually using it. Bug?





This bug report was last modified 2 years and 37 days ago.



GNU bug tracking system
Copyright (C) 1999 Darren O. Benham, 1997,2003 nCipher Corporation Ltd, 1994-97 Ian Jackson.