GNU bug report logs - #76731
C-style comment regexp example in (info "(elisp)Rx Notation") is not correct

Package: emacs

Reported by: "Yue Yi" <include_yy <at> qq.com>

Date: Tue, 4 Mar 2025 04:09:02 UTC

Severity: wishlist

Done: Mattias Engdegård <mattias.engdegard <at> gmail.com>

Bug is archived. No further changes may be made.


Message #8 received at 76731 <at> debbugs.gnu.org (full text, mbox):

From: Stefan Kangas <stefankangas <at> gmail.com>
To: Yue Yi <include_yy <at> qq.com>, 76731 <at> debbugs.gnu.org
Cc: Mattias Engdegård <mattias.engdegard <at> gmail.com>
Subject: Re: bug#76731: C-style comment regexp example in (info "(elisp)Rx Notation") is not correct
Date: Tue, 4 Mar 2025 18:10:43 +0000

"Yue Yi" via "Bug reports for GNU Emacs, the Swiss army knife of text
editors" <bug-gnu-emacs <at> gnu.org> writes:

> Hello Emacs,
>
> In Elisp Manual's Rx Notation section, we have
>
> -------------------------------------------------------------------
>
>    Here is an ‘rx’ regexp(1) that matches a block comment in the C
> programming language:
>
>      (rx "/*"                    ; Initial /*
>          (zero-or-more
>           (or (not "*")          ;  Either non-*,
>               (seq "*"           ;  or * followed by
>                    (not "/"))))  ;     non-/
>          (one-or-more "*")       ; At least one star,
>          "/")                    ; and the final /
>
> or, using shorter synonyms and written more compactly,
>
>      (rx "/*"
>          (* (| (not "*")
>                (: "*" (not "/"))))
>          (+ "*") "/")
>
> In conventional string syntax, it would be written
>
>      "/\\*\\(?:[^*]\\|\\*[^/]\\)*\\*+/"
> --------------------------------------------------------------------
>
> Sadly, this regexp is not correct, as demonstrated by this simple
> example: (Try M-x isearch-forward-regexp with
> /\*\(?:[^*]\|\*[^/]\)*\*+/)
>
> /***/ 123 /* anything else */
>
> As you can see, the entire line above is highlighted by the search,
> meaning that the whole line has been matched. In fact, this issue
> occurs when the number of asterisks in /*(nstar)*/ is odd.
>
> The correct regular expression is:
>
> /\*\(?:[^*]\|\*+[^*/]\)*\*+/
>
> The corresponding RX expression in the original document could be:
>
> (rx "/*"
>     (zero-or-more
>      (or (not "*")
>          (seq (one-or-more "*")
>               (not (or "*" "/")))))
>     (one-or-more "*")
>     "/")
>
> Or:
>
> (rx "/*"
>     (* (| (not "*")
>           (: (1+ "*") (not (or "*" "/")))))
>     (1+ "*") "/")
>
> BTW, using non-greedy `*?', the simplest way might be:
>
> (rx "/*"
>     (*? anything)
>     "*/")
>
> "/\\*[^z-a]*?\\*/" or "/\\*\\(?:.\\|\n\\)*?\\*/"
>
> Regards.

Mattias, any comments?
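
For anyone who wants to check the report without going through isearch,
here is a minimal sketch that runs the regexps quoted above against the
sample line with `string-match'. The names `my-manual-comment-re',
`my-fixed-comment-re', `my-lazy-comment-re', `my-sample' and
`my-first-match' are made up for this illustration and are not part of
Emacs or of the manual.

```elisp
;;; -*- lexical-binding: t -*-
;; Sketch only: all `my-' names are hypothetical helpers for this check.

(defconst my-manual-comment-re "/\\*\\(?:[^*]\\|\\*[^/]\\)*\\*+/"
  "String form of the regexp shown in the manual, as quoted in the report.")

(defconst my-fixed-comment-re "/\\*\\(?:[^*]\\|\\*+[^*/]\\)*\\*+/"
  "String form of the regexp proposed in the report.")

(defconst my-lazy-comment-re "/\\*[^z-a]*?\\*/"
  "Non-greedy variant mentioned at the end of the report.")

(defconst my-sample "/***/ 123 /* anything else */"
  "The sample line from the report.")

(defun my-first-match (regexp string)
  "Return the text of the first match of REGEXP in STRING, or nil."
  (save-match-data
    (when (string-match regexp string)
      (match-string 0 string))))

;; The manual's regexp pairs up the second and third `*' via "\\*[^/]",
;; so the closing "*/" of the first comment is missed and the match
;; runs on to the end of the second comment.
(my-first-match my-manual-comment-re my-sample)
;; => "/***/ 123 /* anything else */"

;; The proposed regexp and the non-greedy variant both stop at the
;; first comment.
(my-first-match my-fixed-comment-re my-sample)   ; => "/***/"
(my-first-match my-lazy-comment-re my-sample)    ; => "/***/"
```

Evaluating the three calls in *scratch* or with M-: should be enough to
reproduce the difference described in the report.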







GNU bug tracking system
Copyright (C) 1999 Darren O. Benham, 1997,2003 nCipher Corporation Ltd, 1994-97 Ian Jackson.