GNU bug report logs - #37849
composable character alternatives in rx


Package: emacs

Reported by: Mattias Engdegård <mattiase <at> acm.org>

Date: Mon, 21 Oct 2019 10:25:01 UTC

Severity: normal

Done: Mattias Engdegård <mattiase <at> acm.org>

Bug is archived. No further changes may be made.



Message #11 received at 37849 <at> debbugs.gnu.org (full text, mbox):

From: Mattias Engdegård <mattiase <at> acm.org>
To: 37849 <at> debbugs.gnu.org
Subject: Re: bug#37849: composable character alternatives in rx  
Date: Fri, 6 Dec 2019 22:58:46 +0100
This patch adds `union' and `intersection' to rx. Both take zero or more charsets as arguments. A charset is an `any' form that does not contain character classes, a `union' or `intersection' form, or a `not' form with a charset argument.

Example:

(rx (union (any "a-f") (any "b-m")))
=> "[a-m]"

(rx (intersection (any "a-f") (any "b-m")))
=> "[b-f]"
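
The two results above can be reproduced by hand if a charset is modelled as a list of character ranges. The following is only a minimal sketch under that assumption, not the code from the patch; the names `my-ranges-union' and `my-ranges-intersection' are hypothetical, and each range is a (FROM . TO) cons of character codes.

```elisp
;; Hypothetical helpers for illustration; not taken from the patch.
;; A charset is modelled as a list of (FROM . TO) character ranges.

(defun my-ranges-union (a b)
  "Return the union of range lists A and B as sorted, merged ranges."
  (let ((sorted (sort (append a b nil)          ; copy, since `sort' is destructive
                      (lambda (x y) (< (car x) (car y)))))
        (result nil))
    (dolist (r sorted)
      (if (and result (<= (car r) (1+ (cdr (car result)))))
          ;; Overlapping or adjacent: extend the most recent range.
          (setcdr (car result) (max (cdr (car result)) (cdr r)))
        ;; Otherwise start a new range (a fresh cons, so inputs stay intact).
        (push (cons (car r) (cdr r)) result)))
    (nreverse result)))

(defun my-ranges-intersection (a b)
  "Return the ranges common to A and B.
A and B are assumed to be sorted lists of disjoint ranges."
  (let ((result nil))
    (dolist (x a)
      (dolist (y b)
        (let ((lo (max (car x) (car y)))
              (hi (min (cdr x) (cdr y))))
          ;; Keep the overlap only if it is non-empty.
          (when (<= lo hi)
            (push (cons lo hi) result)))))
    (nreverse result)))

(my-ranges-union '((?a . ?f)) '((?b . ?m)))         ; => ((97 . 109)), i.e. a-m
(my-ranges-intersection '((?a . ?f)) '((?b . ?m)))  ; => ((98 . 102)), i.e. b-f
```

Character classes have no fixed range list in this model, which is one way to see why they are excluded from charsets.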

The character class limitation stems from the inability to complement or intersect classes in general. It would be possible to partially lift this restriction for `union'; it is clear that

(rx (union (any "ab" space) (any "bc" space digit)))
=> "[abc[:space:][:digit:]]"

but doing so would make the facility harder to explain to the user in a way that makes sense. Still, it could be a future extension.

A `difference' operator was not included but could be added; it is trivially defined in rx as

(rx-define difference (a b)
  (intersection a (not b)))
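
As a usage sketch for such a definition: the example below assumes an rx that accepts `intersection' with charset arguments as described in this message, and uses `rx-let' so that `difference' stays local to the example. The constant name `my-consonant-regexp' is made up for illustration.

```elisp
(require 'rx)

;; Assumption: `intersection' and `not' accept charsets as proposed above.
;; `difference' is defined only within this `rx-let' form.
(defconst my-consonant-regexp
  (rx-let ((difference (a b) (intersection a (not b))))
    (rx (difference (any "a-z") (any "aeiou"))))
  "Regexp matching one lower-case ASCII letter that is not a vowel.")

(let ((case-fold-search nil))                      ; keep matching case-sensitive
  (list (string-match-p my-consonant-regexp "b")   ; 0: `b' is in the difference
        (string-match-p my-consonant-regexp "e"))) ; nil: `e' is excluded
```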

The names `union' and `intersection' are verbose, but they should be used rarely enough that descriptive names are preferable.
SRE, from which the concept was taken, uses `|' and `&' respectively, with `~' for complement and `-' for difference.

[0001-Add-union-and-intersection-to-rx-bug-37849.patch (application/octet-stream, attachment)]

This bug report was last modified 5 years and 161 days ago.


