GNU bug report logs - #30568
27.0.50; `rx' doesn't create optimal regex for (group (or ...))


Package: emacs;

Reported by: p.stephani2 <at> gmail.com

Date: Wed, 21 Feb 2018 14:33:01 UTC

Severity: wishlist

Found in version 27.0.50

To reply to this bug, email your comments to 30568 AT debbugs.gnu.org.



Report forwarded to bug-gnu-emacs <at> gnu.org:
bug#30568; Package emacs. (Wed, 21 Feb 2018 14:33:01 GMT)

Acknowledgement sent to p.stephani2 <at> gmail.com:
New bug report received and forwarded. Copy sent to bug-gnu-emacs <at> gnu.org. (Wed, 21 Feb 2018 14:33:01 GMT)

Message #5 received at submit <at> debbugs.gnu.org:

From: p.stephani2 <at> gmail.com
To: bug-gnu-emacs <at> gnu.org
Subject: 27.0.50; `rx' doesn't create optimal regex for (group (or ...))
Date: Wed, 21 Feb 2018 15:31:56 +0100
emacs -Q -batch --eval='(progn (princ (rx (group (or "aaa" "bbb")))) (terpri))'
==> \(\(?:aaa\|bbb\)\)

This should generate \(aaa\|bbb\) instead.  Of course, these regexes are
equivalent, but the second one is easier to read (and maybe faster).
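
The equivalence claimed above can be checked with a minimal sketch. The helper name `bug30568-group-1' and the sample inputs are assumptions made for illustration; they are not part of rx or of the original report.

```elisp
;; Hypothetical helper for illustration only; not part of rx.
(defun bug30568-group-1 (regexp string)
  "Return the text of capture group 1 if REGEXP matches STRING, else nil."
  (when (string-match regexp string)
    (match-string 1 string)))

;; The shy group \(?:...\) is not numbered, so in both regexps the
;; alternative ends up in capture group 1.
(let ((generated "\\(\\(?:aaa\\|bbb\\)\\)") ; form printed in the report
      (wanted "\\(aaa\\|bbb\\)"))           ; form requested in the report
  (dolist (input '("xxaaayy" "bbb" "abab"))
    (message "%S  generated: %S  wanted: %S"
             input
             (bug30568-group-1 generated input)
             (bug30568-group-1 wanted input))))
```

For every input, both regexps should report the same group 1 text, or nil when neither matches.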


In GNU Emacs 27.0.50 (build 16, x86_64-pc-linux-gnu, GTK+ Version 3.22.24)
 of 2018-02-21 built on localhost
Repository revision: d599dce1353ce59d134fcff21cde02c70025253d
Windowing system distributor 'The X.Org Foundation', version 11.0.11903000
System Description: Debian GNU/Linux buster/sid

Recent messages:
For information about GNU Emacs and the GNU system, type C-h C-a.

Configured using:
 'configure --without-threads --enable-gcc-warnings=warn-only
 --enable-gtk-deprecation-warnings --without-pop --with-mailutils
 --enable-checking --enable-check-lisp-object-type --with-modules
 'CFLAGS=-O0 -ggdb3''

Configured features:
XPM JPEG TIFF GIF PNG SOUND DBUS GSETTINGS NOTIFY GNUTLS FREETYPE XFT
ZLIB TOOLKIT_SCROLL_BARS GTK3 X11 MODULES JSON

Important settings:
  value of $LANG: en_US.UTF-8
  locale-coding-system: utf-8-unix

Major mode: Lisp Interaction

Minor modes in effect:
  tooltip-mode: t
  global-eldoc-mode: t
  eldoc-mode: t
  electric-indent-mode: t
  mouse-wheel-mode: t
  tool-bar-mode: t
  menu-bar-mode: t
  file-name-shadow-mode: t
  global-font-lock-mode: t
  font-lock-mode: t
  blink-cursor-mode: t
  auto-composition-mode: t
  auto-encryption-mode: t
  auto-compression-mode: t
  line-number-mode: t
  transient-mark-mode: t

Load-path shadows:
None found.

Features:
(shadow sort mail-extr emacsbug message rmc puny seq byte-opt gv
bytecomp byte-compile cconv cl-loaddefs cl-lib dired dired-loaddefs
format-spec rfc822 mml easymenu mml-sec password-cache epa derived epg
epg-config gnus-util rmail rmail-loaddefs mm-decode mm-bodies mm-encode
mail-parse rfc2231 mailabbrev gmm-utils mailheader sendmail rfc2047
rfc2045 ietf-drums mm-util mail-prsvr mail-utils elec-pair time-date
mule-util tooltip eldoc electric uniquify ediff-hook vc-hooks
lisp-float-type mwheel term/x-win x-win term/common-win x-dnd tool-bar
dnd fontset image regexp-opt fringe tabulated-list replace newcomment
text-mode elisp-mode lisp-mode prog-mode register page menu-bar
rfn-eshadow isearch timer select scroll-bar mouse jit-lock font-lock
syntax facemenu font-core term/tty-colors frame cl-generic cham georgian
utf-8-lang misc-lang vietnamese tibetan thai tai-viet lao korean
japanese eucjp-ms cp51932 hebrew greek romanian slovak czech european
ethiopic indian cyrillic chinese composite charscript charprop
case-table epa-hook jka-cmpr-hook help simple abbrev obarray minibuffer
cl-preloaded nadvice loaddefs button faces cus-face macroexp files
text-properties overlay sha1 md5 base64 format env code-pages mule
custom widget hashtable-print-readable backquote dbusbind inotify
dynamic-setting system-font-setting font-render-setting move-toolbar gtk
x-toolkit x multi-tty make-network-process emacs)

Memory information:
((conses 16 95399 9075)
 (symbols 48 20247 1)
 (miscs 40 40 121)
 (strings 32 28348 1815)
 (string-bytes 1 757412)
 (vectors 16 14141)
 (vector-slots 8 499378 12892)
 (floats 8 50 67)
 (intervals 56 223 0)
 (buffers 992 12))





Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#30568; Package emacs. (Fri, 13 Dec 2019 19:20:02 GMT)

Message #8 received at 30568 <at> debbugs.gnu.org:

From: Mattias Engdegård <mattiase <at> acm.org>
To: 30568 <at> debbugs.gnu.org
Cc: Philipp <p.stephani2 <at> gmail.com>
Subject: Re: bug#30568: 27.0.50; `rx' doesn't create optimal regex for (group
 (or ...))
Date: Fri, 13 Dec 2019 20:18:58 +0100
> (rx (group (or "aaa" "bbb")))
> ==> \(\(?:aaa\|bbb\)\)
>
> This should generate \(aaa\|bbb\) instead.  Of course, these regexes are
> equivalent, but the second one is easier to read (and maybe faster).

This remains unchanged, I'm afraid, despite rx having been completely rewritten. Not that it matters much: non-capturing brackets do not generate any regexp bytecode, so matching performance isn't affected once the regexp has been compiled. When the brackets are required, there is no waste:

(rx (+ (or "aaa" "bbb"))) 
=> "\\(?:aaa\\|bbb\\)+"

Still, it's a bit untidy, and I like that you reported it. We could add a special value for the PAREN argument to regexp-opt to prevent bracketing altogether, I suppose. It isn't immediately on my to-do list, however.
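
To illustrate why the brackets are required under a postfix operator, here is a small sketch. The helper `bug30568-matches-whole-p' is a hypothetical name used only for this example. Without the shy group, the `+' binds only to the final "b" of the second alternative.

```elisp
;; Hypothetical helper for illustration only.
(defun bug30568-matches-whole-p (regexp string)
  "Return non-nil if the first match of REGEXP in STRING spans all of STRING."
  (and (string-match regexp string)
       (= (match-beginning 0) 0)
       (= (match-end 0) (length string))))

;; With the shy group, `+' repeats the whole alternative.
(bug30568-matches-whole-p "\\(?:aaa\\|bbb\\)+" "aaabbb") ; non-nil

;; Without it, the regexp means "aaa", or "bb" followed by one or more "b",
;; so the first match stops after "aaa".
(bug30568-matches-whole-p "aaa\\|bbb+" "aaabbb")         ; nil
```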





This bug report was last modified 5 years and 184 days ago.


