GNU bug report logs - #37849
composable character alternatives in rx

Package: emacs

Reported by: Mattias Engdegård <mattiase <at> acm.org>

Date: Mon, 21 Oct 2019 10:25:01 UTC

Severity: normal

Done: Mattias Engdegård <mattiase <at> acm.org>

Bug is archived. No further changes may be made.

From: Mattias Engdegård <mattiase <at> acm.org>
To: 37849 <at> debbugs.gnu.org
Subject: bug#37849: composable character alternatives in rx
Date: Mon, 21 Oct 2019 12:24:21 +0200
Now that rx is user-extendible, some holes are showing. Example (from python.el):

      (simple-operator      . ,(rx (any ?+ ?- ?/ ?& ?^ ?~ ?| ?* ?< ?> ?= ?%)))
      ;; FIXME: rx should support (not simple-operator).
      (not-simple-operator  . ,(rx
                                (not
                                 (any ?+ ?- ?/ ?& ?^ ?~ ?| ?* ?< ?> ?= ?%))))

(This code uses the old rx-constituents mechanism, but the point applies equally to new-style definitions.)
More generally, there is currently no way to:

(1) Get the complement of a defined (any ...) form
(2) Get the union of two defined (any ...) forms
(3) Get the intersection of two defined (not (any ...)) forms

(1), which the example above was about, could be solved by expanding definitions inside 'not'. This is a step away from the principle that user-defined things are only allowed where general rx forms are, but perhaps tolerable. Proposed patch attached.
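Until something like (1) exists, the duplication in the example can be avoided by keeping the character list in one place and splicing it into both forms. This is only a sketch of a workaround, not the attached patch; the `my-` names are hypothetical, and only `rx-to-string` and `string-match-p` are existing Emacs functions.

```elisp
(require 'rx)

;; Hypothetical names, for illustration only.
(defconst my-simple-operator-chars
  '(?+ ?- ?/ ?& ?^ ?~ ?| ?* ?< ?> ?= ?%)
  "Characters of the simple-operator set, copied from the example above.")

;; Build both regexps from the single list by splicing it into an rx form.
(defconst my-simple-operator-re
  (rx-to-string `(any ,@my-simple-operator-chars) t)
  "Regexp matching one simple-operator character.")

(defconst my-not-simple-operator-re
  (rx-to-string `(not (any ,@my-simple-operator-chars)) t)
  "Regexp matching one character that is not a simple operator.")
```

The cost is that the set lives outside rx as a plain Lisp list, which is exactly the kind of workaround the proposal tries to make unnecessary.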

(2) can be solved by expanding definitions inside 'any', and allowing 'any' inside 'any' (flattening). Not sure I like this.

An alternative is to ensure that (or (any X) (any Y)) -> (any X Y), but then we either need to allow 'or' inside 'not', or add an intersection operator:

  (intersect (not (any X)) (not (any Y))) -> (not (any X Y))

We could also make 'not' variadic, turning it into complement-of-union:

  (not (any A) (any B)) -> (not (any A B))
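Both rewrites rest on the same identity: a character lies outside X and outside Y exactly when it lies outside their union. A rough way to convince oneself, using only forms rx already accepts, is sketched below. The helper names are hypothetical, and the character sets are assumed to be given as plain lists of characters.

```elisp
(require 'rx)

;; Hypothetical helpers; only `rx-to-string' and `string-match-p'
;; are existing Emacs functions.
(defun my-matches-p (form char)
  "Return t if the rx FORM matches the one-character string of CHAR."
  (and (string-match-p (rx-to-string form t) (string char)) t))

(defun my-de-morgan-holds-p (xs ys char)
  "Return t if both sides of the rewrite agree on CHAR.
XS and YS are lists of characters.  The left side is the
intersection of the two complements, the right side is the
complement of the union."
  (eq (and (my-matches-p `(not (any ,@xs)) char)
           (my-matches-p `(not (any ,@ys)) char))
      (my-matches-p `(not (any ,@xs ,@ys)) char)))
```

For instance, `(my-de-morgan-holds-p '(?+ ?-) '(?* ?/) ?a)` compares the two sides for the character `a`.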

Olin Shivers's SRE has a complete and closed set of operations on character sets (https://scsh.net/docu/post/sre.html). That would be principled and perhaps useful, but difficult to do fully in rx because not all such expressions can be rendered into Emacs regexps. Nothing prevents us from making a partial implementation, however.

[0001-Expand-rx-definitions-inside-not.patch (application/octet-stream, attachment)]
