GNU bug report logs - #34641
rx: (or ...) order unpredictable
The rx (or ...) construct sometimes reorders its subexpressions, which makes its semantics unpredictable. For example,
(rx (or "ab" "a") (or "a" "ab"))
=>
"\\(?:ab?\\)\\(?:ab?\\)"
The user reasonably expects (or e1 e2) to translate to E1\|E2, where ei translates to Ei, or a semantic equivalent. Not having this control makes rx useless or dangerous for many purposes.
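To make the difference concrete, here is a small sketch of how the order of alternatives shows up in a match. The helper my-match-length is made up for this illustration and is not part of Emacs; the sketch assumes the ordinary (non-POSIX) matcher used by string-match, which tries alternatives from left to right.

```elisp
(defun my-match-length (regexp string)
  "Return the length of the match of REGEXP at the start of STRING, or nil.
Hypothetical helper, for illustration only."
  ;; Anchor at the start of the string and wrap REGEXP in a shy group so
  ;; that a top-level \\| stays inside the anchored expression.
  (when (string-match (concat "\\`\\(?:" regexp "\\)") string)
    (match-end 0)))

(my-match-length "a\\|ab" "ab")   ; => 1, "a" is tried first and succeeds
(my-match-length "ab\\|a" "ab")   ; => 2, "ab" is tried first and succeeds
(my-match-length "ab?" "ab")      ; => 2, the form rx generated for both
```

So (or "a" "ab") as written would match only the "a", while the generated "ab?" matches both characters.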
The reason for the reordering is the use of regexp-opt behind the scenes. Whether rx is the place to do this kind of optimisation is a matter of opinion; mine is that it belongs in the regexp engine, where other, more aggressive optimisations (DFA, native-code generation, etc.) could be performed as well.
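As a fix, rx could determine whether any string in the (or ...) form is a prefix of another. If none is, at most one alternative can match at any given position, so the order cannot matter and regexp-opt should be safe to call. Alternatively, this check could be done in regexp-opt itself (activated by a flag). That would be my preferred short-term solution.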
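The prefix check could look roughly like the following. This is only a sketch: the name my-no-string-is-prefix-p is hypothetical, and nothing like it is part of rx or regexp-opt as far as this report is concerned.

```elisp
(defun my-no-string-is-prefix-p (strings)
  "Return non-nil if no element of STRINGS is a proper prefix of another.
Hypothetical helper; not part of rx or regexp-opt."
  ;; Work on a copy, since `delete-dups' and `sort' are destructive.
  ;; Duplicates are dropped first: equal strings do not affect the order
  ;; semantics.
  (let ((sorted (sort (delete-dups (copy-sequence strings)) #'string<))
        (safe t))
    ;; After sorting, a string that is a prefix of any other string is
    ;; also a prefix of its immediate successor, so it is enough to
    ;; compare adjacent pairs.
    (while (and safe (cdr sorted))
      (when (string-prefix-p (car sorted) (cadr sorted))
        (setq safe nil))
      (setq sorted (cdr sorted)))
    safe))

(my-no-string-is-prefix-p '("ab" "a"))    ; => nil, "a" is a prefix of "ab"
(my-no-string-is-prefix-p '("ab" "cd"))   ; => t
```

With such a predicate, regexp-opt would be called only when it returns non-nil; otherwise the alternatives would be emitted in the order given.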
(Speaking of regexp-opt, it has another bug that does not affect rx: it returns the empty string if given an empty list of strings. The correct return value is a regexp that never matches anything. Fix it, document it, or turn it into an error?)
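To show why the empty string is the wrong result for an empty list, compare it with a regexp that cannot match. The constant name below is hypothetical, and the pattern is just one candidate for a never-matching regexp, not necessarily what regexp-opt should return.

```elisp
;; The empty regexp matches (the empty string) at every position.
(string-match "" "anything")                     ; => 0

;; One candidate for a regexp that never matches: an "a" that would
;; have to be followed by the beginning of the string.
(defconst my-unmatchable-regexp "\\`a\\`"
  "Hypothetical constant: a regexp that matches nothing.")

(string-match my-unmatchable-regexp "anything")  ; => nil
(string-match my-unmatchable-regexp "a")         ; => nil
(string-match my-unmatchable-regexp "")          ; => nil
```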
This bug report was last modified 6 years and 72 days ago.