GNU bug report logs - #64128
regexp parser zero-width assertion bugs

Package: emacs;

Reported by: Mattias Engdegård <mattias.engdegard <at> gmail.com>

Date: Sat, 17 Jun 2023 12:21:02 UTC

Severity: normal


Message #20 received at 64128 <at> debbugs.gnu.org (full text, mbox):

From: Mattias Engdegård <mattias.engdegard <at> gmail.com>
To: Eli Zaretskii <eliz <at> gnu.org>
Cc: Paul Eggert <eggert <at> cs.ucla.edu>, monnier <at> iro.umontreal.ca,
 64128 <at> debbugs.gnu.org
Subject: Re: bug#64128: regexp parser zero-width assertion bugs
Date: Sun, 18 Jun 2023 22:26:28 +0200

On 18 June 2023 at 06:55, Eli Zaretskii <eliz <at> gnu.org> wrote:

> My comment is that since this was a documented feature, I'm not
> interested in making it an error.

Yes, it would be unwise to raise an error for "^*" or the like; it's in active use.
The manual is a bit hazy about what we actually promise, though.

As Paul notes, we must be able to document whatever behaviour we settle on, and that might not be easy, so perhaps we shouldn't even try to change or document it?

To make everything clear, we have two groups of zero-width assertions:

Group A: ^ $ \` \' \b \B
Group B: \< \> \_< \_> \=

Group B assertions work like ordinary elements, syntactically and semantically. Simple, predictable, but also useless.

Group A assertions are more interesting: either there is nothing before a train of such assertions, such as

   "^\\`\\b\\`*?"

which turns the first character of the operator into a literal (and a second character, if present, now becomes an operator acting on that literal).
Or there is something, and the operator acts on the last element preceding the assertions, except that multiple literal characters coalesce into a single element. The exception to that exception is an out-of-place `^` among the literal characters, which splits the sequence of literals into separate segments, though not exactly where you would expect.
For example,

  "abc^def\\B\\B+?"

means, I think,

  (seq "ab" (+? "c^def" not-word-boundary not-word-boundary))
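
For anyone who wants to check these readings on their own build, a minimal sketch follows. `my-probe-regexp' is a hypothetical helper made up for this illustration, not part of Emacs. Only the simple "^*" case is shown with a result, since that is the historical behaviour people rely on; the two longer regexps are merely printed, because what they match may well differ between Emacs versions.

```elisp
;; Sketch only: `my-probe-regexp' is a hypothetical helper, not an Emacs API.
(defun my-probe-regexp (regexp string)
  "Return the text REGEXP matches in STRING, or nil if there is no match.
Return the symbol `invalid' if REGEXP is rejected by the regexp compiler."
  (condition-case nil
      (and (string-match regexp string)
           (match-string 0 string))
    (invalid-regexp 'invalid)))

;; Nothing precedes the operator in "^*", so `*' is a literal asterisk.
(my-probe-regexp "^*" "*scratch*")   ; => "*"
(my-probe-regexp "^*" "scratch")     ; => nil

;; The trickier cases discussed above.  No expected results are given;
;; run this and compare against your own reading of the regexp.
(dolist (re '("^\\`\\b\\`*?" "abc^def\\B\\B+?"))
  (message "%S -> %S" re (my-probe-regexp re "*abc^def")))
```

Evaluating the buffer and looking in *Messages* shows what each regexp actually matched, which is easier to compare against the rx form than reasoning about the parser by hand.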







This bug report was last modified 2 years and 2 days ago.
