GNU bug report logs -
#76731
C-style comment regexp example in (info "(elisp)Rx Notation") is not correct
Previous Next
Reported by: "Yue Yi" <include_yy <at> qq.com>
Date: Tue, 4 Mar 2025 04:09:02 UTC
Severity: wishlist
Done: Mattias EngdegÄrd <mattias.engdegard <at> gmail.com>
Bug is archived. No further changes may be made.
Full log
View this message in rfc822 format
>> Our regexp engine lacks to "switch" kind of operation needed for that.
>> After a series of `*` we have two mutually exclusive paths, depending on whether
>> the next char is a `/` or something else, but we don't have a "look
>> ahead" so we have to push a backtrack-point before we try to match the
>> next char with `/`.
>
> All true but we do have the ad-hoc non-backtracking optimisation. Right now
> it's only used for 'simple' patterns as in
>
> (: (* "a") "b")
>
> but there shouldn't be any reason not to use it for, say,
>
> (: (* "a" ARBITRARY-PATTERN) "b")
>
> should it? Except for ease of implementation, of course.
Yup, presumably the same `mutually_exclusive_p` test could be used to
detect when such an optimisation can be used, so it's a "small matter of
programming" (tho it includes the choice/design of appropriate
bytecodes, which is ... language design more than programming per se).
Last time I looked/thought about it, I saw two ways to do that:
- Add a bytecode to `pop` the last backtrack-point. This might(?) be
easy-ish to implement, and would indeed reduce the stack usage but
without speeding things up very much (we'd still push&pop the same
number of backtrack-points). Also it would require extra care
(e.g. punting) for cases where another backtrack-point gets pushed
in-between such as
(: (* (or "a" RE1) (or (and "abc") RE2)) "b")
- Add a kind of `switch` bytecode with a table that maps chars to
corresponding labels to jump to. This would speed up matching (and
would allow using that same regexp engine with a different
regexp-compiler that always generates a DFA) but it requires a fair
bit more work on the regexp-compiler side. I think it would require
a whole new regexp-compiler, which I would imagine implementing in
ELisp and calling explicitly at compile-time.
Stefan
This bug report was last modified 1 day ago.
Previous Next
GNU bug tracking system
Copyright (C) 1999 Darren O. Benham,
1997,2003 nCipher Corporation Ltd,
1994-97 Ian Jackson.