GNU bug report logs - #76731
C-style comment regexp example in (info "(elisp)Rx Notation") is not correct


Package: emacs;

Reported by: "Yue Yi" <include_yy <at> qq.com>

Date: Tue, 4 Mar 2025 04:09:02 UTC

Severity: wishlist

Done: Mattias Engdegård <mattias.engdegard <at> gmail.com>

Bug is archived. No further changes may be made.


From: "Yue Yi" <include_yy <at> qq.com>
To: 76731 <at> debbugs.gnu.org
Subject: bug#76731: C-style comment regexp example in (info "(elisp)Rx Notation") is not correct
Date: Tue, 4 Mar 2025 11:58:36 +0800
Hello Emacs,

In the Elisp Manual's Rx Notation section, we have

-------------------------------------------------------------------
   Here is an ‘rx’ regexp(1) that matches a block comment in the C
programming language:

     (rx "/*"                    ; Initial /*
         (zero-or-more
          (or (not "*")          ;  Either non-*,
              (seq "*"           ;  or * followed by
                   (not "/"))))  ;     non-/
         (one-or-more "*")       ; At least one star,
         "/")                    ; and the final /

or, using shorter synonyms and written more compactly,

     (rx "/*"
         (* (| (not "*")
               (: "*" (not "/"))))
         (+ "*") "/")

In conventional string syntax, it would be written

     "/\\*\\(?:[^*]\\|\\*[^/]\\)*\\*+/"
--------------------------------------------------------------------

Sadly, this regexp is not correct, as demonstrated by this simple example
(try M-x isearch-forward-regexp with /\*\(?:[^*]\|\*[^/]\)*\*+/):

/***/ 123 /* anything else */

The search highlights the entire line above, meaning that the whole line
has been matched as a single comment.

In fact, this issue occurs when the number of asterisks in /*(nstar)*/
is odd. After the opening "/*", the `\*[^/]' alternative consumes the
remaining stars in pairs, so the star that should close the comment is
swallowed and the match runs on to the end of a later comment.

The correct regular expression is:

/\*\(?:[^*]\|\*+[^*/]\)*\*+/

The corresponding RX expression in the original document could be:

(rx "/*"
    (zero-or-more
     (or (not "*")
         (seq (one-or-more "*")
              (not (or "*" "/")))))
    (one-or-more "*")
    "/")

Or:

(rx "/*"
    (* (| (not "*")
          (: (1+ "*") (not (or "*" "/")))))
    (1+ "*") "/")

BTW, using non-greedy `*?', the simplest way might be:

(rx "/*"
    (*? anything)
    "*/")

"/\\*[^z-a]*?\\*/" or "/\\*\\(?:.\\|\n\\)*?\\*/"

Regards.
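
The behaviour described in the report can be checked outside of isearch with a small sketch like the one below. It only uses `string-match' and `match-string'; the names `my-manual-comment-re', `my-proposed-comment-re', `my-nongreedy-comment-re' and `my-first-match' are hypothetical helpers introduced here for illustration and are not part of Emacs or of the report.

```elisp
;; Hypothetical names for illustration; the regexp strings themselves
;; are copied from the report.
(defconst my-manual-comment-re "/\\*\\(?:[^*]\\|\\*[^/]\\)*\\*+/"
  "Regexp from the manual's example, in string syntax.")

(defconst my-proposed-comment-re "/\\*\\(?:[^*]\\|\\*+[^*/]\\)*\\*+/"
  "Corrected regexp proposed in the report.")

(defconst my-nongreedy-comment-re "/\\*[^z-a]*?\\*/"
  "Non-greedy alternative mentioned in the report.")

(defun my-first-match (regexp string)
  "Return the text of the first match of REGEXP in STRING, or nil."
  (when (string-match regexp string)
    (match-string 0 string)))

(let ((text "/***/ 123 /* anything else */"))
  (list (my-first-match my-manual-comment-re text)
        (my-first-match my-proposed-comment-re text)
        (my-first-match my-nongreedy-comment-re text)))
;; => ("/***/ 123 /* anything else */" "/***/" "/***/")
```

Evaluating the `let' form (for example with C-x C-e) should show the manual's regexp matching the whole line, while the other two stop at the end of the first comment.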




GNU bug tracking system
Copyright (C) 1999 Darren O. Benham, 1997,2003 nCipher Corporation Ltd, 1994-97 Ian Jackson.