GNU bug report logs - #16046
Bug with Regexp Containing only a Character Class with a Caret
Reported by: Cameron Desautels <camdez <at> gmail.com>
Date: Wed, 4 Dec 2013 10:06:03 UTC
Severity: normal
Done: Stefan Monnier <monnier <at> iro.umontreal.ca>
Bug is archived. No further changes may be made.
To add a comment to this bug, you must first unarchive it, by sending
a message to control AT debbugs.gnu.org, with unarchive 16046 in the body.
You can then email your comments to 16046 AT debbugs.gnu.org in the normal way.
Report forwarded to bug-gnu-emacs <at> gnu.org: bug#16046; Package emacs. (Wed, 04 Dec 2013 10:06:03 GMT)
Acknowledgement sent to Cameron Desautels <camdez <at> gmail.com>: New bug report received and forwarded. Copy sent to bug-gnu-emacs <at> gnu.org. (Wed, 04 Dec 2013 10:06:03 GMT)
Message #5 received at submit <at> debbugs.gnu.org (full text, mbox):
Hi all,
I've run across a dilemma, in the most literal sense: either there's a
problem in Emacs's regexp engine or there's an issue with
`regexp-opt-charset`, and I'm not sure which.
The issue has to do with regular expressions containing character
classes with only a caret character. I know this seems like a rather
silly case (why not just use "\\^"?) but it came up in the context of
trying to track down a bug in ruby-mode, so it does occur in real (and
particularly *programmatic*) settings.
The simplest case to reproduce is the following:
(re-search-forward "[^]")
; => Debugger entered--Lisp error: (invalid-regexp "Unmatched [ or [^")
; re-search-forward("[^]")
; eval((re-search-forward "[^]") nil)
; eval-last-sexp-1(t)
; eval-last-sexp(t)
; eval-print-last-sexp()
; call-interactively(eval-print-last-sexp record nil)
; command-execute(eval-print-last-sexp record)
; execute-extended-command(nil "eval-print-last-sexp")
; call-interactively(execute-extended-command nil nil)
Now, you can make a compelling case that that's not a valid regexp
(and the Emacs Lisp Reference Manual doesn't seem to *directly*
contradict this argument), but that presents a problem when paired
with `regexp-opt-charset`:
(regexp-opt-charset '(?^))
=> "[^]"
Note that this produces the problem regexp, which is to say that the
following code is bound to fail when it should succeed:
(re-search-forward (regexp-opt-charset '(?^)))
What's the correct behavior? I'd be happy to offer a patch for either
side of the equation but I'm not sure which one to target.
All the best.
-- Cameron
In GNU Emacs 24.3.1 (x86_64-apple-darwin11.4.2, Carbon Version 1.6.0
AppKit 1138.51)
of 2013-05-13 on atago
Windowing system distributor `Apple Inc.', version 10.9.0
Configured using:
`configure '--with-mac'
'--enable-mac-app=/Users/xin/Documents/emacs-mac-port/build'
'--prefix=/Users/xin/Documents/emacs-mac-port/build''
Important settings:
value of $LANG: en_US.UTF-8
locale-coding-system: utf-8-unix
default enable-multibyte-characters: t
Major mode: Lisp Interaction
Minor modes in effect:
tooltip-mode: t
mouse-wheel-mode: t
tool-bar-mode: t
menu-bar-mode: t
file-name-shadow-mode: t
global-font-lock-mode: t
font-lock-mode: t
auto-composition-mode: t
auto-encryption-mode: t
auto-compression-mode: t
line-number-mode: t
transient-mark-mode: t
Load-path shadows:
/Applications/Emacs.app/Contents/Resources/lisp/.dir-locals hides
/Applications/Emacs.app/Contents/Resources/lisp/gnus/.dir-locals
Features:
(shadow sort gnus-util mail-extr emacsbug message format-spec rfc822 mml
mml-sec mm-decode mm-bodies mm-encode mail-parse rfc2231 mailabbrev
gmm-utils mailheader sendmail rfc2047 rfc2045 ietf-drums mm-util
mail-prsvr mail-utils help-mode easymenu debug time-date tooltip
ediff-hook vc-hooks lisp-float-type mwheel mac-win tool-bar dnd fontset
image regexp-opt fringe tabulated-list newcomment lisp-mode register
page menu-bar rfn-eshadow timer select scroll-bar mouse jit-lock
font-lock syntax facemenu font-core frame cham georgian utf-8-lang
misc-lang vietnamese tibetan thai tai-viet lao korean japanese hebrew
greek romanian slovak czech european ethiopic indian cyrillic chinese
case-table epa-hook jka-cmpr-hook help simple abbrev minibuffer loaddefs
button faces cus-face macroexp files text-properties overlay sha1 md5
base64 format env code-pages mule custom widget hashtable-print-readable
backquote mac multi-tty make-network-process emacs)
Information forwarded to bug-gnu-emacs <at> gnu.org: bug#16046; Package emacs. (Thu, 05 Dec 2013 19:28:01 GMT)
Message #8 received at 16046 <at> debbugs.gnu.org (full text, mbox):
[Message part 1 (text/plain, inline)]
After further experimentation, I suspect that "[^]" is simply not
a valid regular expression. For instance, grep(1) gives the
following behavior:
$ echo "^" | grep "[^]"
grep: brackets ([ ]) not balanced
This suggests that the broken behavior is within
`regexp-opt-charset`. I've attached a patch for that function.
Here are some test cases which reveal the behavior of the unpatched
and patched versions of the function (the only difference is the
handling of the "[^]" case):
;; Pre-patch
(regexp-opt-charset (list ?^)) ; "[^]"
(regexp-opt-charset (list ?^ ?a)) ; "[a^]"
(regexp-opt-charset (list ?^ ?-)) ; "[-^]"
(regexp-opt-charset (list ?^ ?\])) ; "[]^]"
(regexp-opt-charset (list ?^ ?- ?\])) ; "[]^-]"
;; Post-patch
(regexp-opt-charset (list ?^)) ; "\\^"
(regexp-opt-charset (list ?^ ?a)) ; "[a^]"
(regexp-opt-charset (list ?^ ?-)) ; "[-^]"
(regexp-opt-charset (list ?^ ?\])) ; "[]^]"
(regexp-opt-charset (list ?^ ?- ?\])) ; "[]^-]"
--
Cameron Desautels <camdez <at> gmail.com>
[regexp-opt.el.diff (text/plain, attachment)]
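The placement rules visible in the post-patch results can be sketched in Emacs Lisp. This is a hypothetical, simplified helper: `my-charset-regexp` is an illustrative name, not the attached patch and not the real `regexp-opt-charset`. It does not merge characters into ranges and assumes CHARS is non-empty. It puts `]` first, keeps `^` out of the first position, and puts `-` first or last, falling back to the escaped "\\^" when a lone caret would otherwise produce the invalid "[^]".

```elisp
;; Hypothetical, simplified helper (not the attached patch): build a
;; bracket expression for CHARS without merging runs into ranges.
(defun my-charset-regexp (chars)
  "Return a regexp matching any one character in CHARS (a non-empty list)."
  (let ((bracket "") (caret "") (dash "") (others ""))
    ;; Set aside the three characters whose meaning inside [...]
    ;; depends on their position; collect everything else in order.
    (dolist (c (delete-dups (copy-sequence chars)))
      (cond ((eq c ?\]) (setq bracket "]"))
            ((eq c ?^) (setq caret "^"))
            ((eq c ?-) (setq dash "-"))
            (t (setq others (concat others (string c))))))
    (if (and (string= others "") (string= bracket ""))
        (if (string= dash "")
            ;; Only a caret: "[^]" is not a valid regexp, so escape it.
            "\\^"
          ;; "-" goes first, then "^" (if present), so "^" never negates.
          (concat "[" dash caret "]"))
      ;; "]" first (literal there), "^" after other chars, "-" last.
      (concat "[" bracket others caret dash "]"))))
```

For the five inputs in the test cases, this sketch returns the same strings as the post-patch column; the real function additionally collapses consecutive characters into ranges such as "a-z".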
Reply sent to Stefan Monnier <monnier <at> iro.umontreal.ca>: You have taken responsibility. (Thu, 05 Dec 2013 20:27:01 GMT)
Notification sent to Cameron Desautels <camdez <at> gmail.com>: bug acknowledged by developer. (Thu, 05 Dec 2013 20:27:02 GMT)
Message #13 received at 16046-done <at> debbugs.gnu.org (full text, mbox):
> After further experimentation, I suspect that "[^]" is simply not
> a valid regular expression.
Indeed, according to the documentation, for ^ to be treated as itself,
it needs to be "not the first char", but since we have nothing else to
put there, we're kind of screwed.
> This suggests that the broken behavior is within
> `regexp-opt-charset`. I've attached a patch for that function.
Thank you for tracking down the problem and providing a fix. I just
installed it in trunk; closing.
Stefan
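The rule Stefan cites (a caret is literal only when it is not the first character of the bracket expression) can be checked from inside Emacs with a small predicate. `my-regexp-valid-p` is a hypothetical helper, not an Emacs API; it relies only on `string-match-p` signaling an `invalid-regexp` error for a malformed pattern, as `re-search-forward` did in the original report.

```elisp
;; Hypothetical helper: non-nil if REGEXP compiles, nil if Emacs
;; signals `invalid-regexp' for it.
(defun my-regexp-valid-p (regexp)
  "Return non-nil if REGEXP is a syntactically valid Emacs regexp."
  (condition-case nil
      (progn (string-match-p regexp "") t)
    (invalid-regexp nil)))

;; A leading "^" inside [...] negates the set, so in "[^]" the "]"
;; becomes a literal member and the bracket is never closed.
(my-regexp-valid-p "[^]")        ; => nil
;; When it is not first, "^" stands for itself.
(my-regexp-valid-p "[a^]")       ; => t
(string-match "[a^]" "x^")       ; => 1
```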
bug archived. Request was from Debbugs Internal Request <help-debbugs <at> gnu.org> to internal_control <at> debbugs.gnu.org. (Fri, 03 Jan 2014 12:24:03 GMT)
This bug report was last modified 11 years and 220 days ago.
GNU bug tracking system
Copyright (C) 1999 Darren O. Benham,
1997,2003 nCipher Corporation Ltd,
1994-97 Ian Jackson.