GNU bug report logs -
#37659
rx additions: anychar, unmatchable, unordered-or
Reported by: Mattias Engdegård <mattiase <at> acm.org>
Date: Tue, 8 Oct 2019 09:37:01 UTC
Severity: wishlist
Tags: fixed, patch
Fixed in version 27.1
Done: Mattias Engdegård <mattiase <at> acm.org>
Bug is archived. No further changes may be made.
On 22 Oct 2019, at 19:33, Paul Eggert <eggert <at> cs.ucla.edu> wrote:
> Moreover, if greed is the longstanding tradition for regexp-opt, shouldn't plain "or" be greedy, to be consistent with other operators?
Having second thoughts, I've come to believe that Paul may have been right after all. We might just as well let plain 'or' (alias '|') match as much as possible when it is able to do so. In particular, we should guarantee that this will happen when all arguments are strings, as used to be the case.
Initially I thought it was a bug that (or "a" "ab") was optimised into "ab?" on the grounds that this made the behaviour unpredictable: when matching the string "abc", (or "a" "ab") matched "ab", whereas (or "a" "ab" space) would match "a". However, the current 'fixed' code isn't necessarily more useful.
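The difference described above can be reproduced with plain regexps, without going through rx at all. The following is a minimal sketch; the helper name my-match-at-start is made up for illustration and is not part of Emacs or of the patch.

```elisp
;; Hypothetical helper: return the text that REGEXP matches at the
;; very start of STRING, or nil if it does not match there.
(defun my-match-at-start (regexp string)
  (and (string-match (concat "\\`\\(?:" regexp "\\)") string)
       (match-string 0 string)))

;; A plain alternation tries its branches left to right.
(my-match-at-start "a\\|ab" "abc")                  ; => "a"
(my-match-at-start "a\\|ab\\|[[:space:]]" "abc")    ; => "a"

;; regexp-opt (without KEEP-ORDER) yields a longest-match regexp.
(my-match-at-start (regexp-opt '("a" "ab")) "abc")  ; => "ab"
```

The first two forms correspond to what an unoptimised 'or' translates to; the last one corresponds to the all-string case being handed to regexp-opt.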
Since the change was introduced in Emacs 27, which has not yet been released, I suggest the attached patch for emacs-27. It reverts the use of regexp-opt with KEEP-ORDER = t. What do you think? It would solve the problem without introducing new constructs, and without running the risk of introducing subtle errors in existing rx expressions.
(In fact, if we do not do this in Emacs 27, we'd have to add a NEWS entry to warn users about the change.)
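For reference, the effect of the KEEP-ORDER argument can be seen by calling regexp-opt directly. This is only a sketch: it assumes Emacs 27 or later (where regexp-opt accepts the third argument), and the function name my-keep-order-demo is invented for the example.

```elisp
;; Hypothetical demo function; needs Emacs 27+ for KEEP-ORDER.
;; Returns what each variant matches in "abc", as a two-element list.
(defun my-keep-order-demo ()
  (let ((ordered (regexp-opt '("a" "ab") nil t))  ; KEEP-ORDER = t
        (longest (regexp-opt '("a" "ab"))))       ; default: longest match
    (list (and (string-match ordered "abc") (match-string 0 "abc"))
          (and (string-match longest "abc") (match-string 0 "abc")))))

(my-keep-order-demo)  ; => ("a" "ab")
```

With KEEP-ORDER the result behaves like the strings joined by \| in the given order, so the shorter "a" wins; without it, the longest string wins.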
A further improvement would be to ensure that nested all-string 'or' forms would have the same property, and that expansion of user-defined forms would be transparent. In other words, that
(rx-let ((x (or "abc" "de")))
  (rx (or "a" x (or "ab" "def"))))
would be equivalent to
(rx (or "abc" "ab" "a" "def" "de"))
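The intended property can be approximated outside rx by flattening the nested forms by hand and passing the strings to regexp-opt. This is a rough sketch under that assumption, not the code of the proposed patch; my-or-strings and my-longest-match are hypothetical names.

```elisp
;; Hypothetical helper: collect the strings of FORM, which must be a
;; string or a (possibly nested) (or ...) form containing only strings.
(defun my-or-strings (form)
  (cond ((stringp form) (list form))
        ((eq (car-safe form) 'or)
         (apply #'append (mapcar #'my-or-strings (cdr form))))
        (t (error "Not an all-string or-form: %S" form))))

;; Hypothetical helper: return the longest alternative of FORM that
;; matches at the start of STRING, or nil if none does.
(defun my-longest-match (form string)
  (let ((regexp (concat "\\`\\(?:"
                        (regexp-opt (my-or-strings form))
                        "\\)")))
    (and (string-match regexp string)
         (match-string 0 string))))

;; The nested form from above, with x already expanded by hand.
(my-longest-match '(or "a" (or "abc" "de") (or "ab" "def")) "abcd")  ; => "abc"
(my-longest-match '(or "a" (or "abc" "de") (or "ab" "def")) "defg")  ; => "def"
```

Since regexp-opt ignores the order of its arguments here, the nesting and the position of x make no difference to the result, which is the transparency being asked for.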
I'll prepare a patch for this QoI improvement, but the attached patch should be required no matter what.
[0001-rx-Use-longest-match-for-all-string-or-forms-bug-376.patch (application/octet-stream, attachment)]
This bug report was last modified 5 years and 81 days ago.