GNU bug report logs - #64128
regexp parser zero-width assertion bugs

Previous Next

Package: emacs;

Reported by: Mattias EngdegÄrd <mattias.engdegard <at> gmail.com>

Date: Sat, 17 Jun 2023 12:21:02 UTC

Severity: normal

Full log


Message #50 received at 64128 <at> debbugs.gnu.org (full text, mbox):

From: Mattias EngdegÄrd <mattias.engdegard <at> gmail.com>
To: Stefan Monnier <monnier <at> iro.umontreal.ca>
Cc: Eli Zaretskii <eliz <at> gnu.org>, Paul Eggert <eggert <at> cs.ucla.edu>,
 64128 <at> debbugs.gnu.org
Subject: Re: bug#64128: regexp parser zero-width assertion bugs
Date: Tue, 20 Jun 2023 13:36:42 +0200
[Message part 1 (text/plain, inline)]
19 juni 2023 kl. 22.08 skrev Stefan Monnier <monnier <at> iro.umontreal.ca>:

> Hmm... maybe it's less wrong, but I'd rather make it behave like
> AB\(\b\)*C, which is, I'd argue, even less wrong.

I agree, and you are probably right that it's safe to do that.

> Or maybe make it signal an error: I can't imagine that the current
> behavior is used by very much code at all, seeing how it's so
> seriously non-intuitive.

That might be even better if we can get away with it.

19 juni 2023 kl. 22.40 skrev Paul Eggert <eggert <at> cs.ucla.edu>:

> In other words, how about if we change the groups from your list:
> 
> Group A: ^ $ \` \' \b \B
> Group B: \< \> \_< \_> \=
> 
> to this:
> 
> Group A: ^ \`
> Group B: $ \' \b \B \< \> \_< \_> \=
> 
> where "*" is ordinary after Group A, and special after Group B and there is no other squirrelly behavior. And similarly for the other repetition operators.

Sounds fine, with the option to go full error on group B if we agree that that's even better.

> Attached is a proposed doc change for this, which I have not installed.

Thank you, it has been incorporated in the attached patch which follows your suggestions above.

Your previous regexp doc updates are most appreciated. I still think the whole chapter needs a reform from the sheer weight of organic growth over the years. In particular, the division between "regexp special" and "regexp backslash" is purely syntactical, not semantic, and groups things in the wrong way.


[0001-Straighten-regexp-postfix-operator-after-zero-width-.patch (application/octet-stream, attachment)]

This bug report was last modified 2 years and 2 days ago.

Previous Next


GNU bug tracking system
Copyright (C) 1999 Darren O. Benham, 1997,2003 nCipher Corporation Ltd, 1994-97 Ian Jackson.