GNU bug report logs - #76731
C-style comment regexp example in (info "(elisp)Rx Notation") is not correct


Package: emacs;

Reported by: "Yue Yi" <include_yy <at> qq.com>

Date: Tue, 4 Mar 2025 04:09:02 UTC

Severity: wishlist

Done: Mattias Engdegård <mattias.engdegard <at> gmail.com>

Bug is archived. No further changes may be made.



Message #21 received at 76731-done <at> debbugs.gnu.org (full text, mbox):

From: Mattias Engdegård <mattias.engdegard <at> gmail.com>
To: Yue Yi <include_yy <at> qq.com>
Cc: Stefan Kangas <stefankangas <at> gmail.com>,
 Stefan Monnier <monnier <at> iro.umontreal.ca>, 76731-done <at> debbugs.gnu.org
Subject: Re: bug#76731: C-style comment regexp example in (info "(elisp)Rx
 Notation") is not correct
Date: Sat, 17 May 2025 12:21:52 +0200

On 16 May 2025 at 17:12, Yue Yi <include_yy <at> qq.com> wrote:

> I'm not an expert in regular expressions, but it seems that cases like C
> block comments are hard to handle without introducing
> backtracking.

I see no fundamental reason why they should be, as the C comment syntax can be parsed efficiently by a tiny state machine. The first "/*" encountered is always the beginning of the comment no matter what is found later, and the first "*/" after that is always the end. There is never any reason to go back and try a different parse.
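
As a rough illustration of the state machine described above (not the code or regexp that was pushed), here is a minimal Python sketch. The function name `find_c_comment` and the four state names are hypothetical, and the sketch assumes only block comments matter: it does not treat string literals or `//` line comments specially.

```python
def find_c_comment(text, start=0):
    """Return (begin, end) of the first C block comment in text, or None.

    `end` is the index just past the closing "*/".  Each character is
    examined exactly once; the scan never moves backwards.
    Hypothetical helper for illustration only.
    """
    OUTSIDE, SLASH, INSIDE, STAR = range(4)
    state = OUTSIDE
    begin = -1
    for i in range(start, len(text)):
        ch = text[i]
        if state == OUTSIDE:
            if ch == "/":
                state = SLASH          # possible start of "/*"
        elif state == SLASH:
            if ch == "*":
                state = INSIDE         # first "/*" seen: comment begins
                begin = i - 1
            elif ch != "/":
                state = OUTSIDE        # on "/", stay in SLASH
        elif state == INSIDE:
            if ch == "*":
                state = STAR           # possible start of "*/"
        else:  # state == STAR
            if ch == "/":
                return begin, i + 1    # first "*/" after the start: done
            if ch != "*":
                state = INSIDE         # on "*", stay in STAR
    return None                        # no complete comment found
```

For instance, `find_c_comment("a /* b */ c")` returns the span covering `/* b */`, and an unterminated `/* ...` yields `None`. Because the state only ever advances with the input, the scan is linear and uses constant extra space, which is the property the paragraph above appeals to.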

Non-DFA regexp engines such as the one in Emacs need some hacks and/or carefully formulated regexps to avoid consuming stack space, but that's a different matter. I still think we should be able to do better with either your or my regexps.

I kept your proposed fix instead of switching to a different example. The quoted-string case is simpler but the amount of backslashes detracted from the point of the exercise.

Fix pushed to master. Thank you again!





This bug report was last modified 1 day ago.



GNU bug tracking system
Copyright (C) 1999 Darren O. Benham, 1997,2003 nCipher Corporation Ltd, 1994-97 Ian Jackson.