GNU bug report logs -
#64128
regexp parser zero-width assertion bugs
On 2023-06-19 11:34, Mattias Engdegård wrote:
> Here is a reduced patch that only fixes the really silly behaviour reported earlier, by making sure that `laststart` is reset correctly for all group A assertions. This should be uncontroversial.
> Maybe we should change group B assertions so that they work in the same way.
> - operand. Reset at the beginning of groups and alternatives. */
> + operand. Reset at the beginning of groups and alternatives,
> + and after zero-width assertions which should not be the target
> + of any postfix repetition operators. */
If I understand things correctly, this would cause "\b*c" to be treated
like "\b\*c". If so, it's headed in the wrong direction.
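To make the disagreement concrete, here is a toy model of how a parser might decide whether a "*" is a repetition operator or an ordinary character. It is a minimal sketch, not the regex-emacs.c code. The names `tokenize` and `star_roles` are hypothetical, and `laststart` is borrowed from the quoted comment only as an analogy. The chosen set of zero-width escapes is an assumption. The model ignores "^", "$", "+", "?", intervals and bracket expressions. With `reset_after_assertions=True`, it follows the patch's rule, so "\b*c" behaves like "\b\*c". With `False`, the assertion is an ordinary operand and the "*" repeats it.

```python
# Hypothetical toy model of "*" handling; NOT the actual regex-emacs.c logic.
# Assumed set of zero-width backslash escapes.
ZERO_WIDTH = {"\\b", "\\B", "\\<", "\\>", "\\_<", "\\_>", "\\`", "\\'", "\\="}


def tokenize(pattern):
    """Split a pattern into backslash escapes ("\\_<" style ones are 3 chars) and single chars."""
    tokens = []
    i = 0
    while i < len(pattern):
        if pattern[i] == "\\" and i + 1 < len(pattern):
            width = 3 if pattern[i + 1] == "_" and i + 2 < len(pattern) else 2
            tokens.append(pattern[i:i + width])
            i += width
        else:
            tokens.append(pattern[i])
            i += 1
    return tokens


def star_roles(pattern, reset_after_assertions):
    """Return 'literal' or 'repeat' for each unescaped '*' in the pattern.

    laststart is None where a '*' has no operand: at the start of the
    pattern and after '\\(' or '\\|'.  If reset_after_assertions is True
    (the quoted patch's rule), zero-width assertions reset it too.
    """
    roles = []
    laststart = None
    for tok in tokenize(pattern):
        if tok == "*":
            if laststart is None:
                roles.append("literal")
                laststart = tok  # the literal '*' itself becomes an operand
            else:
                roles.append("repeat")  # operand unchanged, so "a**" is fine
        elif tok in ("\\(", "\\|"):
            laststart = None
        elif tok in ZERO_WIDTH and reset_after_assertions:
            laststart = None
        else:
            laststart = tok
    return roles
```

For example, `star_roles(r"\b*c", True)` treats the "*" as literal, while `star_roles(r"\b*c", False)` treats it as repetition of "\b". A "*" at the start of a group stays literal under both settings.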
It's long been documented that the only reason "*" is ordinary at the
start of a regular expression or subexpression is "historical
compatibility", and it's also long been documented that you shouldn't
take advantage of this and you should backslash-escape the "*" anyway.
In contrast, for constructs like \b* there is not a historical
compatibility reason, so there's not a good argument for treating "*" as
an ordinary character after "\b".
Instead, \b should not be a special case before "*", and \b* should be
equivalent to \(\b\)* and should match only the empty string. Similarly
for the other zero-width backslash escapes. This is what I would expect
from these constructs from the longstanding documentation.
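The claim that \b* (read as \(\b\)*) "should match only the empty string" can be checked with a tiny set-based backtracking matcher. This is a hedged sketch over a hypothetical AST of tuples, not Emacs's matcher. It makes two assumptions. First, word constituents are approximated by `str.isalnum()` rather than a syntax table. Second, following the Emacs manual, "\b" is treated as matching at the beginning and end of the text. The star is computed as a fixpoint, so a zero-width body cannot loop forever.

```python
# Hypothetical AST nodes: ("char", c), ("assert", pred), ("seq", [nodes]), ("star", node)


def is_word(ch):
    # Assumption: approximate Emacs word syntax with str.isalnum().
    return ch.isalnum()


def at_word_boundary(text, pos):
    # Like Emacs \b: always matches at the start or end of the text.
    if pos == 0 or pos == len(text):
        return True
    return is_word(text[pos - 1]) != is_word(text[pos])


def ends(node, text, pos):
    """Return the set of positions where `node` can finish matching from `pos`."""
    kind = node[0]
    if kind == "char":
        return {pos + 1} if pos < len(text) and text[pos] == node[1] else set()
    if kind == "assert":
        return {pos} if node[1](text, pos) else set()
    if kind == "seq":
        positions = {pos}
        for child in node[1]:
            positions = {e for p in positions for e in ends(child, text, p)}
        return positions
    if kind == "star":
        # Zero or more repetitions; stop once no new end positions appear.
        reached, frontier = {pos}, {pos}
        while frontier:
            frontier = {e for p in frontier for e in ends(node[1], text, p)} - reached
            reached |= frontier
        return reached
    raise ValueError(kind)


def consumed(node, text, pos=0):
    """Lengths of text that `node` can consume starting at `pos`."""
    return sorted(e - pos for e in ends(node, text, pos))


BOUNDARY = ("assert", at_word_boundary)
STAR_B = ("star", BOUNDARY)                                  # \(\b\)*
proposed = ("seq", [STAR_B, ("char", "c")])                  # \b*c read as \(\b\)*c
patched = ("seq", [BOUNDARY, ("char", "*"), ("char", "c")])  # \b*c read as \b\*c
```

Under this model, `consumed(STAR_B, text, pos)` is always `[0]`, whether or not "\b" holds at `pos`. So the proposed reading of "\b*c" matches "c", while the patched reading needs a literal "*" before the "c".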
If we instead added a rule to say that a construct that can only match
the empty string causes a following "*" to be ordinary, then \b* and \(\b\)*
would both be equivalent to \*. Although consistent, this would be
confusing: it would compound the historical-compatibility mistake. Let's
keep things simple instead.
Also, whatever change we make to the behavior should be documented in
the manual and in etc/NEWS.
This bug report was last modified 2 years and 2 days ago.
GNU bug tracking system
Copyright (C) 1999 Darren O. Benham,
1997,2003 nCipher Corporation Ltd,
1994-97 Ian Jackson.