GNU bug report logs - #58726
29.0.50; Bug in regexp matching with shy groups


Package: emacs;

Reported by: Michael Heerdegen <michael_heerdegen <at> web.de>

Date: Sun, 23 Oct 2022 01:42:02 UTC

Severity: normal

Found in version 29.0.50

Done: Mattias Engdegård <mattiase <at> acm.org>

Bug is archived. No further changes may be made.

To add a comment to this bug, you must first unarchive it, by sending
a message to control AT debbugs.gnu.org, with unarchive 58726 in the body.
You can then email your comments to 58726 AT debbugs.gnu.org in the normal way.



Report forwarded to bug-gnu-emacs <at> gnu.org:
bug#58726; Package emacs. (Sun, 23 Oct 2022 01:42:02 GMT)

Acknowledgement sent to Michael Heerdegen <michael_heerdegen <at> web.de>:
New bug report received and forwarded. Copy sent to bug-gnu-emacs <at> gnu.org. (Sun, 23 Oct 2022 01:42:02 GMT)

Message #5 received at submit <at> debbugs.gnu.org:

From: Michael Heerdegen <michael_heerdegen <at> web.de>
To: bug-gnu-emacs <at> gnu.org
Subject: 29.0.50; Bug in regexp matching with shy groups
Date: Sun, 23 Oct 2022 03:41:25 +0200
Hello,

  (string-match-p "\\`\\(?:ab\\)*\\'" "a") ==> 0

That's wrong, the expected result is nil.  The language matched by that
regexp is {"", "ab", "abab", "ababab", ...}.

Changing to a non-shy group doesn't trigger the issue:

  (string-match-p "\\`\\(ab\\)*\\'" "a")  ==> nil

as expected.
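
For readers who want to check the claim about the matched language, here is a minimal sketch that compares the regexp against a hand-written recogniser for {"", "ab", "abab", ...}. The `my/` names and the default sample strings are assumptions made for illustration; they are not part of the report.

```elisp
;; Sketch only: the `my/' helpers are hypothetical, not from the report.
(require 'cl-lib)

(defun my/ab-star-p (s)
  "Return non-nil if S is zero or more repetitions of \"ab\"."
  (and (cl-evenp (length s))
       (cl-loop for i from 0 below (length s) by 2
                always (string= (substring s i (+ i 2)) "ab"))))

(defun my/shy-group-mismatches (&optional strings)
  "Return the STRINGS on which the regexp and the reference disagree."
  (let ((re "\\`\\(?:ab\\)*\\'"))
    (cl-remove-if
     (lambda (s)
       ;; Normalise both answers to t/nil before comparing them.
       (eq (and (string-match-p re s) t)
           (and (my/ab-star-p s) t)))
     (or strings '("" "a" "ab" "aba" "abab" "b" "ba")))))
```

Evaluating `(my/shy-group-mismatches)` should return nil on an Emacs without the bug; on an affected build the list would, going by the report, contain at least "a".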

I've been told (emacs-help, Bruno Barbier) that the problem exists at
least in emacs 27, emacs 28 and emacs 29.


TIA,

Michael.






Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#58726; Package emacs. (Sun, 23 Oct 2022 13:52:01 GMT)

Message #8 received at 58726 <at> debbugs.gnu.org:

From: Mattias Engdegård <mattiase <at> acm.org>
To: Michael Heerdegen <michael_heerdegen <at> web.de>
Cc: 58726 <at> debbugs.gnu.org
Subject: 29.0.50; Bug in regexp matching with shy groups
Date: Sun, 23 Oct 2022 15:50:41 +0200
Michael, thank you for finding this amusing bug!

>   (string-match-p "\\`\\(?:ab\\)*\\'" "a") ==> 0

With a bit of help from the regexp-disasm package, we see that this compiles to

    0  begbuf
    1  on-failure-jump-smart to 11
    4  exact "ab"
    8  jump to 1
   11  endbuf
   12  succeed

where the on-failure-jump-smart op turns into on-failure-keep-string-jump the first time it's executed.

This gives us a clue about what is wrong: when there is a failure inside an 'exact' string match, the target pointer should be reset to the start of that string ("ab" here) before jumping to the failure location.

Reading the source, it becomes clear that this is done correctly when there is a mismatch, but not when the target string ends prematurely, because PREFETCH() has no idea that it should reset the target pointer! Easy enough to fix.
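
The mechanism described above can be illustrated with a toy interpreter for the six-instruction program in the disassembly. This is a deliberately simplified model and not the code in the Emacs regexp engine: byte offsets are replaced by vector indices, every loop iteration pushes a fresh failure point, and all `my/` names as well as the RESET-ON-PREFETCH flag are hypothetical.

```elisp
;; Toy model only; names and structure are assumptions for illustration.
(defconst my/ab-star-program
  [(begbuf)
   (on-failure-keep-string-jump 4) ; the "smart" op after its first execution
   (exact "ab")
   (jump 1)
   (endbuf)
   (succeed)]
  "The disassembled program, with byte offsets replaced by indices.")

(defun my/toy-match (string reset-on-prefetch)
  "Run `my/ab-star-program' on STRING; return 0 on a match, else nil.
If RESET-ON-PREFETCH is nil, `exact' leaves the string position where
it ran out of input (the bug); otherwise it rewinds first (the fix)."
  (let ((pc 0) (d 0) (len (length string))
        (stack nil)                    ; pending failure targets
        (result 'running))
    (while (eq result 'running)
      (let* ((insn (aref my/ab-star-program pc))
             (ok
              (pcase insn
                (`(begbuf) (= d 0))
                (`(endbuf) (= d len))
                (`(succeed) (setq result 0) t)
                (`(jump ,target) (setq pc (1- target)) t)
                (`(on-failure-keep-string-jump ,target)
                 (push target stack) t)
                (`(exact ,lit)
                 (let ((start d) (i 0) (good t))
                   (while (and good (< i (length lit)))
                     (cond
                      ((>= d len)      ; input exhausted mid-literal
                       (when reset-on-prefetch (setq d start))
                       (setq good nil))
                      ((/= (aref string d) (aref lit i)) ; plain mismatch
                       (setq d start good nil))
                      (t (setq d (1+ d) i (1+ i)))))
                   good)))))
        (cond
         ((not (eq result 'running)))  ; `succeed' was executed
         (ok (setq pc (1+ pc)))
         ;; Keep-string failure: jump to the target, leave D untouched.
         (stack (setq pc (pop stack)))
         (t (setq result nil)))))
    result))
```

In this model `(my/toy-match "a" nil)` returns 0, reproducing the reported result, because the position left at 1 by the truncated literal satisfies `endbuf`. `(my/toy-match "a" t)` returns nil, since the position is rewound to 0 before the failure jump.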

Please try the attached patch. (The patch takes care of counted repetitions for good measure although I wasn't able to provoke a failure directly.)

[0001-Fix-regexp-matching-with-atomic-strings-and-optimise.patch (application/octet-stream, attachment)]
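
The patch itself is not reproduced in this log. As a sketch of how the fixed behaviour could be pinned down, a regression test along the following lines covers both the starred group from the report and a counted repetition. The test name and the choice of inputs are assumptions; this is not the test from the actual commit.

```elisp
;; Hypothetical regression test; not taken from the Emacs test suite.
(require 'ert)

(ert-deftest my/regexp-truncated-literal ()
  "Input that ends inside a repeated literal must not match."
  (let ((star "\\`\\(?:ab\\)*\\'")
        (counted "\\`\\(?:ab\\)\\{2\\}\\'"))
    ;; Members of the language still match at position 0.
    (should (eq (string-match-p star "") 0))
    (should (eq (string-match-p star "abab") 0))
    (should (eq (string-match-p counted "abab") 0))
    ;; Truncated input must be rejected.
    (should-not (string-match-p star "a"))
    (should-not (string-match-p star "aba"))
    (should-not (string-match-p counted "aba"))))
```

The test can be run interactively with `M-x ert RET my/regexp-truncated-literal RET`.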

Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#58726; Package emacs. (Mon, 24 Oct 2022 02:39:02 GMT)

Message #11 received at 58726 <at> debbugs.gnu.org:

From: Michael Heerdegen <michael_heerdegen <at> web.de>
To: Mattias Engdegård <mattiase <at> acm.org>
Cc: 58726 <at> debbugs.gnu.org
Subject: Re: 29.0.50; Bug in regexp matching with shy groups
Date: Mon, 24 Oct 2022 04:38:42 +0200
Mattias Engdegård <mattiase <at> acm.org> writes:

> Please try the attached patch. (The patch takes care of counted
> repetitions for good measure although I wasn't able to provoke a
> failure directly.)

Yes, works for me, thanks.

Unfortunately I can't estimate whether your fix is correct and the right
thing to do, so all I have to offer is that I will run Emacs with your
patch installed and watch for any problems that it may have introduced.

Thanks again,

Michael.





Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#58726; Package emacs. (Mon, 24 Oct 2022 10:57:02 GMT)

Message #14 received at 58726 <at> debbugs.gnu.org:

From: Mattias Engdegård <mattiase <at> acm.org>
To: Michael Heerdegen <michael_heerdegen <at> web.de>
Cc: 58726 <at> debbugs.gnu.org
Subject: Re: 29.0.50; Bug in regexp matching with shy groups
Date: Mon, 24 Oct 2022 12:55:57 +0200
On 24 Oct 2022, at 04:38, Michael Heerdegen <michael_heerdegen <at> web.de> wrote:

> Unfortunately I can't estimate whether your fix is correct and the right
> thing to do, so all I have to offer is that I will run Emacs with your
> patch installed and watch for any problems that it may have introduced.

Thanks for testing! I'm fairly certain of its correctness, but there could be other places with a similar bug that I didn't find.
Nevertheless this should do it for now -- pushed to master.





Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#58726; Package emacs. (Mon, 24 Oct 2022 11:19:02 GMT)

Message #17 received at 58726 <at> debbugs.gnu.org:

From: Michael Heerdegen <michael_heerdegen <at> web.de>
To: Mattias Engdegård <mattiase <at> acm.org>
Cc: 58726 <at> debbugs.gnu.org
Subject: Re: 29.0.50; Bug in regexp matching with shy groups
Date: Mon, 24 Oct 2022 13:17:47 +0200
Mattias Engdegård <mattiase <at> acm.org> writes:

> Thanks for testing! I'm fairly certain of its correctness, but there
> could be other places with a similar bug that I didn't find.
> Nevertheless this should do it for now -- pushed to master.

Ok, thanks.  Can we close this one?

Side note: I'm sorry to tell you, but you messed up the example in the
commit message (that one evals to 1).


Michael.




Reply sent to Mattias Engdegård <mattiase <at> acm.org>:
You have taken responsibility. (Mon, 24 Oct 2022 11:29:01 GMT)

Notification sent to Michael Heerdegen <michael_heerdegen <at> web.de>:
bug acknowledged by developer. (Mon, 24 Oct 2022 11:29:02 GMT)

Message #22 received at 58726-done <at> debbugs.gnu.org:

From: Mattias Engdegård <mattiase <at> acm.org>
To: Michael Heerdegen <michael_heerdegen <at> web.de>
Cc: 58726-done <at> debbugs.gnu.org
Subject: Re: 29.0.50; Bug in regexp matching with shy groups
Date: Mon, 24 Oct 2022 13:28:27 +0200
On 24 Oct 2022, at 13:17, Michael Heerdegen <michael_heerdegen <at> web.de> wrote:

> Ok, thanks.  Can we close this one?

Of course, done.

> I'm sorry to tell you, but you messed up the example in the
> commit message (that one evals to 1).

Oh dear, sorry about that! At least there is a reference to this bug in case someone wonders.





bug archived. Request was from Debbugs Internal Request <help-debbugs <at> gnu.org> to internal_control <at> debbugs.gnu.org. (Mon, 21 Nov 2022 12:24:07 GMT)



