GNU bug report logs - #76731
C-style comment regexp example in (info "(elisp)Rx Notation") is not correct
Reported by: "Yue Yi" <include_yy <at> qq.com>
Date: Tue, 4 Mar 2025 04:09:02 UTC
Severity: wishlist
Done: Mattias Engdegård <mattias.engdegard <at> gmail.com>
Bug is archived. No further changes may be made.
Message #21 received at 76731-done <at> debbugs.gnu.org:
On 16 May 2025 at 17:12, Yue Yi <include_yy <at> qq.com> wrote:
> I'm not an expert in regular expressions, but it seems that cases like C
> block comments are hard to handle without introducing
> backtracking.
I see no fundamental reason why they should be, as the C comment syntax can be parsed efficiently by a tiny state machine. The first "/*" encountered is always the beginning of the comment no matter what is found later, and the first "*/" after that is always the end. There is never any reason to go back and try a different parse.
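For illustration only (this is not part of the original message), a minimal sketch of such a state machine in Python. The function name find_c_comment and the state names are hypothetical, and the sketch deliberately ignores string literals and "//" line comments, which a real C lexer would have to handle. It reads each character exactly once and never revisits an earlier position.

```python
def find_c_comment(text):
    """Return (start, end) of the first C block comment in text, or None.

    start is the index of the opening "/" and end is the index just past
    the closing "*/". String literals and "//" comments are NOT handled.
    """
    OUTSIDE, SLASH, INSIDE, STAR = range(4)
    state = OUTSIDE
    start = None
    for i, ch in enumerate(text):
        if state == OUTSIDE:
            if ch == "/":
                state = SLASH          # possible start of "/*"
        elif state == SLASH:
            if ch == "*":
                state = INSIDE         # saw "/*": the comment begins here
                start = i - 1
            elif ch != "/":
                state = OUTSIDE        # a further "/" keeps us in SLASH
        elif state == INSIDE:
            if ch == "*":
                state = STAR           # possible start of "*/"
        elif state == STAR:
            if ch == "/":
                return (start, i + 1)  # saw "*/": the comment ends here
            elif ch != "*":
                state = INSIDE         # a further "*" keeps us in STAR
    return None                        # no complete comment found
```

Each input character causes exactly one state transition, so the running time is linear in the length of the text and no stack is needed.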
Non-DFA regexp engines such as the one in Emacs need some hacks and/or carefully formulated regexps to avoid consuming stack space, but that's a different matter. I still think we should be able to do better with either your regexps or mine.
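As an illustration of what "carefully formulated" can mean (again, not part of the original message, and not the regexp that was pushed to Emacs), here is a sketch using Python's re module, which is also a backtracking engine. The pattern is the well-known "unrolled" form for C block comments; the name C_COMMENT is hypothetical. After the opening "/*", every choice the matcher makes is decided by the next character alone, which leaves it very little to retry.

```python
import re

# "/*", then non-stars, then one or more stars; after the stars, either a
# "/" closes the comment, or any other character resumes the comment body.
C_COMMENT = re.compile(r"/\*[^*]*\*+(?:[^/*][^*]*\*+)*/")

def first_comment(text):
    """Return the first C block comment in text, or None if there is none."""
    m = C_COMMENT.search(text)
    return m.group(0) if m else None
```

For example, first_comment("x = 1; /* a * b **/ y */") returns "/* a * b **/", stopping at the first "*/" after the opening "/*". Like the state machine above, this sketch does not account for string literals or "//" comments.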
I kept your proposed fix instead of switching to a different example. The quoted-string case is simpler, but the number of backslashes detracted from the point of the exercise.
Fix pushed to master. Thank you again!
This bug report was last modified 1 day ago.