GNU bug report logs - #76731
C-style comment regexp example in (info "(elisp)Rx Notation") is not correct

Previous Next

Package: emacs;

Reported by: "Yue Yi" <include_yy <at> qq.com>

Date: Tue, 4 Mar 2025 04:09:02 UTC

Severity: wishlist

Done: Mattias EngdegÄrd <mattias.engdegard <at> gmail.com>

Bug is archived. No further changes may be made.

Full log


View this message in rfc822 format

From: Stefan Monnier <monnier <at> iro.umontreal.ca>
To: Mattias EngdegÄrd <mattias.engdegard <at> gmail.com>
Cc: Yue Yi <include_yy <at> qq.com>, 76731 <at> debbugs.gnu.org, Stefan Kangas <stefankangas <at> gmail.com>
Subject: bug#76731: C-style comment regexp example in (info "(elisp)Rx Notation") is not correct
Date: Tue, 20 May 2025 10:21:27 -0400
>> Our regexp engine lacks to "switch" kind of operation needed for that.
>> After a series of `*` we have two mutually exclusive paths, depending on whether
>> the next char is a `/` or something else, but we don't have a "look
>> ahead" so we have to push a backtrack-point before we try to match the
>> next char with `/`.
>
> All true but we do have the ad-hoc non-backtracking optimisation. Right now
> it's only used for 'simple' patterns as in
>
> (: (* "a") "b")
>
> but there shouldn't be any reason not to use it for, say,
>
> (: (* "a" ARBITRARY-PATTERN) "b")
>
> should it? Except for ease of implementation, of course.

Yup, presumably the same `mutually_exclusive_p` test could be used to
detect when such an optimisation can be used, so it's a "small matter of
programming" (tho it includes the choice/design of appropriate
bytecodes, which is ... language design more than programming per se).
Last time I looked/thought about it, I saw two ways to do that:

- Add a bytecode to `pop` the last backtrack-point.  This might(?) be
  easy-ish to implement, and would indeed reduce the stack usage but
  without speeding things up very much (we'd still push&pop the same
  number of backtrack-points).  Also it would require extra care
  (e.g. punting) for cases where another backtrack-point gets pushed
  in-between such as

      (: (* (or "a" RE1) (or (and "abc") RE2)) "b")

- Add a kind of `switch` bytecode with a table that maps chars to
  corresponding labels to jump to.  This would speed up matching (and
  would allow using that same regexp engine with a different
  regexp-compiler that always generates a DFA) but it requires a fair
  bit more work on the regexp-compiler side.  I think it would require
  a whole new regexp-compiler, which I would imagine implementing in
  ELisp and calling explicitly at compile-time.


        Stefan





This bug report was last modified 1 day ago.

Previous Next


GNU bug tracking system
Copyright (C) 1999 Darren O. Benham, 1997,2003 nCipher Corporation Ltd, 1994-97 Ian Jackson.