GNU bug report logs - #76731
C-style comment regexp example in (info "(elisp)Rx Notation") is not correct
Reported by: "Yue Yi" <include_yy <at> qq.com>
Date: Tue, 4 Mar 2025 04:09:02 UTC
Severity: wishlist
Done: Mattias Engdegård <mattias.engdegard <at> gmail.com>
Bug is archived. No further changes may be made.
Message #8 received at 76731 <at> debbugs.gnu.org (full text, mbox):
"Yue Yi" via "Bug reports for GNU Emacs, the Swiss army knife of text
editors" <bug-gnu-emacs <at> gnu.org> writes:
> Hello Emacs,
>
> In Elisp Manual's Rx Notation section, we have
>
> -------------------------------------------------------------------
>
> Here is an ‘rx’ regexp(1) that matches a block comment in the C
> programming language:
>
> (rx "/*"                    ; Initial /*
>     (zero-or-more
>      (or (not "*")          ;  Either non-*,
>          (seq "*"           ;  or * followed by
>               (not "/"))))  ;  non-/
>     (one-or-more "*")       ; At least one star,
>     "/")                    ; and the final /
>
> or, using shorter synonyms and written more compactly,
>
> (rx "/*"
>     (* (| (not "*")
>           (: "*" (not "/"))))
>     (+ "*") "/")
>
> In conventional string syntax, it would be written
>
> "/\\*\\(?:[^*]\\|\\*[^/]\\)*\\*+/"
> --------------------------------------------------------------------
>
> Sadly, this regexp is not correct, as demonstrated by this simple
> example (try M-x isearch-forward-regexp with
> /\*\(?:[^*]\|\*[^/]\)*\*+/):
>
> /***/ 123 /* anything else */
>
> As you can see, the entire line above is highlighted by the search,
> meaning that the whole line has been matched. In fact, this issue
> occurs when the number of asterisks in /*(nstar)*/ is odd.
>
> The correct regular expression is:
>
> /\*\(?:[^*]\|\*+[^*/]\)*\*+/
>
> The corresponding RX expression in the original document could be:
>
> (rx "/*"
>     (zero-or-more
>      (or (not "*")
>          (seq (one-or-more "*")
>               (not (or "*" "/")))))
>     (one-or-more "*")
>     "/")
>
> Or:
>
> (rx "/*"
>     (* (| (not "*")
>           (: (1+ "*") (not (or "*" "/")))))
>     (1+ "*") "/")
>
> BTW, using non-greedy `*?', the simplest way might be:
>
> (rx "/*"
>     (*? anything)
>     "*/")
>
> "/\\*[^z-a]*?\\*/" or "/\\*\\(?:.\\|\n\\)*?\\*/"
>
> Regards.
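
For reference, the behaviour described in the report can also be checked
outside isearch. The following is only a minimal sketch: it uses the three
regexps in string syntax exactly as quoted above, while the `bug76731-'
names are hypothetical helpers introduced here for illustration and are
not part of Emacs or the manual.

```elisp
;; Illustrative sketch; every `bug76731-' name is a hypothetical helper.

(defconst bug76731-old-re "/\\*\\(?:[^*]\\|\\*[^/]\\)*\\*+/"
  "Regexp currently shown in the manual, in string syntax.")

(defconst bug76731-new-re "/\\*\\(?:[^*]\\|\\*+[^*/]\\)*\\*+/"
  "Regexp proposed in the report, in string syntax.")

(defconst bug76731-lazy-re "/\\*\\(?:.\\|\n\\)*?\\*/"
  "Non-greedy variant mentioned at the end of the report.")

(defconst bug76731-sample "/***/ 123 /* anything else */"
  "Test line from the report.")

(defun bug76731-first-match (regexp)
  "Return the text of the first match of REGEXP in `bug76731-sample', or nil."
  (when (string-match regexp bug76731-sample)
    (match-string 0 bug76731-sample)))

;; The manual's regexp runs past the first comment:
(bug76731-first-match bug76731-old-re)   ; => "/***/ 123 /* anything else */"
;; The proposed regexp and the non-greedy variant stop at the first "*/":
(bug76731-first-match bug76731-new-re)   ; => "/***/"
(bug76731-first-match bug76731-lazy-re)  ; => "/***/"
```

Evaluating the last three forms (for example with C-x C-e) should show that
only the regexp from the manual swallows the text between the two comments.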
Mattias, any comments?
GNU bug tracking system
Copyright (C) 1999 Darren O. Benham,
1997,2003 nCipher Corporation Ltd,
1994-97 Ian Jackson.