GNU bug report logs -
#37659
rx additions: anychar, unmatchable, unordered-or
Reported by: Mattias Engdegård <mattiase <at> acm.org>
Date: Tue, 8 Oct 2019 09:37:01 UTC
Severity: wishlist
Tags: fixed, patch
Fixed in version 27.1
Done: Mattias Engdegård <mattiase <at> acm.org>
Bug is archived. No further changes may be made.
'regexp-opt' always generates a regexp preferring long matches. This is undocumented, but useful enough that I would be surprised if this property wasn't exploited (perhaps unknowingly) by callers. It's quite natural: given a set of strings, surely the caller wants them all to be candidates for a match, even if there is no following anchoring pattern.
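To make the long-match property concrete, here is a minimal sketch comparing a hand-written alternation with the output of 'regexp-opt'. The helper name 'my-first-match' is made up for this illustration; only 'string-match', 'match-string' and 'regexp-opt' are real Emacs functions.

```elisp
;; Hypothetical helper for this illustration: return the text that
;; REGEXP matches in STRING, or nil if there is no match.
(defun my-first-match (regexp string)
  (and (string-match regexp string)
       (match-string 0 string)))

;; A hand-written alternation tries its branches left to right,
;; so the short branch "a" wins even though "ab" would also match.
(my-first-match "a\\|ab" "ab")                   ; => "a"

;; regexp-opt prefers the longest string, whatever the input order.
(my-first-match (regexp-opt '("a" "ab")) "ab")   ; => "ab"
(my-first-match (regexp-opt '("ab" "a")) "ab")   ; => "ab"
```

A caller who passes an unsorted list to 'regexp-opt' is therefore already relying on this behaviour, whether or not they know it.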
Thus, instead of 'unordered-or', define the operator in terms of long matches: 'or-max' (working name) would work like 'or' but guarantee a longest match, and only permit strings and 'or-max' forms as arguments. That way, the rx user gets all the benefits of 'regexp-opt' in a composable way, without needing to sort the strings or otherwise prepare them.
(The old 'or' behaviour always used 'regexp-opt' when possible, which was very fragile: (or "a" "ab") would match "ab", but (or "a" "ab" digit) would just match "a". 'or-max' is robust, without surprises.)
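The fragility can be reproduced without rx by writing out the regexps by hand. This is only a sketch: the helper 'my-match-text' is a made-up name, and the strings below are my assumption of roughly what the two 'or' forms expand to, not the exact rx output.

```elisp
;; Hypothetical helper for this illustration: return the text that
;; REGEXP matches in STRING, or nil if there is no match.
(defun my-match-text (regexp string)
  (and (string-match regexp string)
       (match-string 0 string)))

;; With a non-string branch present, the strings can no longer be
;; handed to regexp-opt as a set, so a plain left-to-right
;; alternation results and the short branch wins.
(my-match-text "a\\|ab\\|[[:digit:]]" "ab")      ; => "a"

;; Keeping the strings together in one regexp-opt group and adding
;; the other branch beside it preserves the longest match.
(let ((re (concat (regexp-opt '("a" "ab")) "\\|[[:digit:]]")))
  (list (my-match-text re "ab")                  ; => "ab"
        (my-match-text re "7")))                 ; => "7"
```

The second form is the kind of composition a longest-match operator restricted to strings would give directly.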
Of course, we should also guarantee the maximum-matching property of regexp-opt. This is just a matter of documentation (and test); it does not restrict optimisations as far as I can tell.
Again, I'm open to suggestions about a better name than 'or-max'.
The other patches (anychar, unmatchable, and [^z-a]) have been pushed to master.
This bug report was last modified 5 years and 81 days ago.
GNU bug tracking system
Copyright (C) 1999 Darren O. Benham,
1997,2003 nCipher Corporation Ltd,
1994-97 Ian Jackson.