GNU bug report logs - #16046
Bug with Regexp Containing only a Character Class with a Caret

Package: emacs;

Reported by: Cameron Desautels <camdez <at> gmail.com>

Date: Wed, 4 Dec 2013 10:06:03 UTC

Severity: normal

Done: Stefan Monnier <monnier <at> iro.umontreal.ca>

Bug is archived. No further changes may be made.

From: help-debbugs <at> gnu.org (GNU bug Tracking System)
To: Stefan Monnier <monnier <at> iro.umontreal.ca>
Cc: tracker <at> debbugs.gnu.org
Subject: bug#16046: closed (Bug with Regexp Containing only a Character
 Class with a Caret)
Date: Thu, 05 Dec 2013 20:27:01 +0000
[Message part 1 (text/plain, inline)]
Your message dated Thu, 05 Dec 2013 15:26:38 -0500
with message-id <jwvfvq7cavg.fsf-monnier+emacsbugs <at> gnu.org>
and subject line Re: bug#16046: Bug with Regexp Containing only a Character Class with a Caret (PATCH)
has caused the debbugs.gnu.org bug report #16046,
regarding Bug with Regexp Containing only a Character Class with a Caret
to be marked as done.

(If you believe you have received this mail in error, please contact
help-debbugs <at> gnu.org.)


-- 
16046: http://debbugs.gnu.org/cgi/bugreport.cgi?bug=16046
GNU Bug Tracking System
Contact help-debbugs <at> gnu.org with problems
[Message part 2 (message/rfc822, inline)]
From: Cameron Desautels <camdez <at> gmail.com>
To: bug-gnu-emacs <at> gnu.org
Subject: Bug with Regexp Containing only a Character Class with a Caret
Date: Tue, 3 Dec 2013 22:57:56 -0600
Hi all,

I've run across a dilemma, in the most literal sense: either there's a
problem in Emacs's regexp engine or there's an issue with
`regexp-opt-charset`---I'm not sure which.

The issue has to do with regular expressions containing character
classes that hold only a caret character.  I know this seems like a rather
silly case (why not just use "\\^"?), but it came up in the context of
trying to track down a bug in ruby-mode, so it does occur in real (and
particularly *programmatic*) settings.

The simplest case to reproduce is the following:

    (re-search-forward "[^]")
    ; => Debugger entered--Lisp error: (invalid-regexp "Unmatched [ or [^")
    ;   re-search-forward("[^]")
    ;   eval((re-search-forward "[^]") nil)
    ;   eval-last-sexp-1(t)
    ;   eval-last-sexp(t)
    ;   eval-print-last-sexp()
    ;   call-interactively(eval-print-last-sexp record nil)
    ;   command-execute(eval-print-last-sexp record)
    ;   execute-extended-command(nil "eval-print-last-sexp")
    ;   call-interactively(execute-extended-command nil nil)
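The same behavior can be checked without going through the interactive debugger. The following is a minimal sketch, evaluable in `*scratch*`. `my-caret-search-demo` is a hypothetical helper, not part of Emacs, and the sketch relies only on the standard `with-temp-buffer`, `re-search-forward`, `condition-case` and `regexp-quote`. It searches the text "a^b" and reports either where the match ended or that the regexp failed to compile.

```elisp
(defun my-caret-search-demo (regexp)
  "Search forward for REGEXP in a temporary buffer containing \"a^b\".
Return the position after the match, nil if nothing matches, or the
symbol `invalid-regexp' if REGEXP cannot be compiled."
  (with-temp-buffer
    (insert "a^b")
    (goto-char (point-min))
    (condition-case nil
        ;; NOERROR = t only suppresses the "search failed" error; a
        ;; malformed regexp still signals `invalid-regexp'.
        (re-search-forward regexp nil t)
      (invalid-regexp 'invalid-regexp))))

(my-caret-search-demo "[^]")               ; => invalid-regexp
(my-caret-search-demo "\\^")               ; => 3
(my-caret-search-demo (regexp-quote "^"))  ; => 3
```

`regexp-quote` produces the same "\\^" workaround mentioned above, which is why the last two calls agree.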

Now, you can make a compelling case that that's not a valid regexp
(and the Emacs Lisp Reference Manual doesn't seem to *directly*
contradict this argument), but that presents a problem when paired
with `regexp-opt-charset`:

    (regexp-opt-charset '(?^))
    ; => "[^]"

Note that this produces the problematic regexp, which means that the
following code is bound to fail when it should succeed:

    (re-search-forward (regexp-opt-charset '(?^)))

What's the correct behavior? I'd be happy to offer a patch for either
side of the equation but I'm not sure which one to target.

All the best.

-- Cameron


In GNU Emacs 24.3.1 (x86_64-apple-darwin11.4.2, Carbon Version 1.6.0
AppKit 1138.51)
 of 2013-05-13 on atago
Windowing system distributor `Apple Inc.', version 10.9.0
Configured using:
 `configure '--with-mac'
 '--enable-mac-app=/Users/xin/Documents/emacs-mac-port/build'
 '--prefix=/Users/xin/Documents/emacs-mac-port/build''

Important settings:
  value of $LANG: en_US.UTF-8
  locale-coding-system: utf-8-unix
  default enable-multibyte-characters: t

Major mode: Lisp Interaction

Minor modes in effect:
  tooltip-mode: t
  mouse-wheel-mode: t
  tool-bar-mode: t
  menu-bar-mode: t
  file-name-shadow-mode: t
  global-font-lock-mode: t
  font-lock-mode: t
  auto-composition-mode: t
  auto-encryption-mode: t
  auto-compression-mode: t
  line-number-mode: t
  transient-mark-mode: t

Load-path shadows:
/Applications/Emacs.app/Contents/Resources/lisp/.dir-locals hides
/Applications/Emacs.app/Contents/Resources/lisp/gnus/.dir-locals

Features:
(shadow sort gnus-util mail-extr emacsbug message format-spec rfc822 mml
mml-sec mm-decode mm-bodies mm-encode mail-parse rfc2231 mailabbrev
gmm-utils mailheader sendmail rfc2047 rfc2045 ietf-drums mm-util
mail-prsvr mail-utils help-mode easymenu debug time-date tooltip
ediff-hook vc-hooks lisp-float-type mwheel mac-win tool-bar dnd fontset
image regexp-opt fringe tabulated-list newcomment lisp-mode register
page menu-bar rfn-eshadow timer select scroll-bar mouse jit-lock
font-lock syntax facemenu font-core frame cham georgian utf-8-lang
misc-lang vietnamese tibetan thai tai-viet lao korean japanese hebrew
greek romanian slovak czech european ethiopic indian cyrillic chinese
case-table epa-hook jka-cmpr-hook help simple abbrev minibuffer loaddefs
button faces cus-face macroexp files text-properties overlay sha1 md5
base64 format env code-pages mule custom widget hashtable-print-readable
backquote mac multi-tty make-network-process emacs)


[Message part 3 (message/rfc822, inline)]
From: Stefan Monnier <monnier <at> iro.umontreal.ca>
To: Cameron Desautels <camdez <at> gmail.com>
Cc: 16046-done <at> debbugs.gnu.org
Subject: Re: bug#16046: Bug with Regexp Containing only a Character Class with
 a Caret (PATCH)
Date: Thu, 05 Dec 2013 15:26:38 -0500
> After further experimentation, I suspect that "[^]" is simply not
> a valid regular expression.

Indeed, according to the documentation, for ^ to be treated as itself,
it needs to be "not the first char", but since we have nothing else to
put there, we're kind of screwed.
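Because a bracket expression holding nothing but `^` cannot be written, any code that builds character classes has to special-case it. Below is a simplified, hypothetical sketch of that idea. `my-charset-regexp` is an illustrative name; it is not the patch that was installed, which this log does not reproduce. It puts `]` first, keeps `-` at an edge of the class and `^` away from the front, and falls back to the quoted form "\\^" when the caret is the only member. Unlike `regexp-opt-charset`, it does not collapse runs of characters into ranges.

```elisp
(require 'cl-lib)

(defun my-charset-regexp (chars)
  "Return a regexp matching any single character in the list CHARS.
Hypothetical sketch for illustration: unlike `regexp-opt-charset', it
does not merge consecutive characters into ranges."
  (let* ((uniq (delete-dups (copy-sequence chars)))
         (bracket (if (memq ?\] uniq) "]" ""))
         (caret (if (memq ?^ uniq) "^" ""))
         (dash (if (memq ?- uniq) "-" ""))
         ;; Every character except the three needing special placement.
         (others (apply #'string
                        (cl-remove-if (lambda (c) (memq c '(?\] ?^ ?-)))
                                      uniq))))
    (cond
     ((null uniq) (error "CHARS must not be empty"))
     ;; No `]' and no ordinary characters: only `^' and/or `-' remain,
     ;; so the only thing that can safely precede the caret is `-'.
     ((and (string= bracket "") (string= others ""))
      (if (string= dash "")
          "\\^"                        ; caret alone: "[^]" is invalid
        (concat "[" dash caret "]")))  ; "[-^]" or "[-]"
     ;; General case: `]' first, `^' not first, `-' last.
     (t (concat "[" bracket others caret dash "]")))))

(my-charset-regexp '(?^))      ; => "\\^"
(my-charset-regexp '(?- ?^))   ; => "[-^]"
(my-charset-regexp '(?a ?^))   ; => "[a^]"
(my-charset-regexp '(?\] ?^))  ; => "[]^]"
```

In the caret-and-dash case the dash goes first: putting it last would give "[^-]", a negated class that matches anything except a dash.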

> This suggests that the broken behavior is within
> `regexp-opt-charset`.  I've attached a patch for that function.

Thank you for tracking down the problem and providing a fix.  I just
installed it in trunk; closing.


        Stefan



GNU bug tracking system
Copyright (C) 1999 Darren O. Benham, 1997,2003 nCipher Corporation Ltd, 1994-97 Ian Jackson.