GNU bug report logs - #64128
regexp parser zero-width assertion bugs

Package: emacs;

Reported by: Mattias Engdegård <mattias.engdegard <at> gmail.com>

Date: Sat, 17 Jun 2023 12:21:02 UTC

Severity: normal


Message #26 received at 64128 <at> debbugs.gnu.org (full text, mbox):

From: Mattias Engdegård <mattias.engdegard <at> gmail.com>
To: Stefan Monnier <monnier <at> iro.umontreal.ca>
Cc: Eli Zaretskii <eliz <at> gnu.org>, Paul Eggert <eggert <at> cs.ucla.edu>,
 64128 <at> debbugs.gnu.org
Subject: Re: bug#64128: regexp parser zero-width assertion bugs
Date: Mon, 19 Jun 2023 10:44:04 +0200

On 19 June 2023 at 05:04, Stefan Monnier <monnier <at> iro.umontreal.ca> wrote:

> `^` is only special if it's at the beginning of a group, so `^*` will
> always treat this * as a literal, right?
> "Similarly" `$` is only special if it's at the end of a group, so `$*` will
> always be a repetition of the $ character no?

Yes, ^ and $ have additional rules that determine when they are plain literals, and in those cases they are not subject to these bugs at all.

The literal-splitting powers of ^ have now been removed (075e77ac44).

> So the remaining problematic elements are \` \' \b and \B

\`* has been observed, so we probably need to keep that working as well.

> I suspect if we don't want to signal errors, the next best thing is to
> treat them like group B.

Yes, maybe; they are less likely to be followed by an operator-literal, but it would also be good to have all zero-width assertions work the same way.
On the other hand, it can't be worse than what we have now, as long as we get rid of the "quack,\\b*" semantics.





This bug report was last modified 2 years and 2 days ago.
