From debbugs-submit-bounces@debbugs.gnu.org Wed Dec 04 05:05:43 2013
Received: (at submit) by debbugs.gnu.org; 4 Dec 2013 10:05:43 +0000
From: Cameron Desautels
To: bug-gnu-emacs@gnu.org
Subject: Bug with Regexp Containing only a Character Class with a Caret
Date: Tue, 3 Dec 2013 22:57:56 -0600

Hi all,

I've run across a dilemma, in the most literal sense: either there's a
problem in Emacs's regexp engine or there's an issue with
`regexp-opt-charset`---I'm not sure which.

The issue has to do with regular expressions containing character
classes with only a caret character. I know this seems like a rather
silly case (why not just use "\\^"?) but it came up in the context of
trying to track down a bug in ruby-mode, so it does occur in real (and
particularly *programmatic*) settings.

The simplest case to reproduce is the following:

    (re-search-forward "[^]")
    ; => Debugger entered--Lisp error: (invalid-regexp "Unmatched [ or [^")
    ;   re-search-forward("[^]")
    ;   eval((re-search-forward "[^]") nil)
    ;   eval-last-sexp-1(t)
    ;   eval-last-sexp(t)
    ;   eval-print-last-sexp()
    ;   call-interactively(eval-print-last-sexp record nil)
    ;   command-execute(eval-print-last-sexp record)
    ;   execute-extended-command(nil "eval-print-last-sexp")
    ;   call-interactively(execute-extended-command nil nil)

Now, you can make a compelling case that that's not a valid regexp (and
the Emacs Lisp Reference Manual doesn't seem to *directly* contradict
this argument), but that presents a problem when paired with
`regexp-opt-charset`:

    (regexp-opt-charset '(?^)) => "[^]"

Note that that produces the problem regexp; which is to say that the
following code is bound to fail when it should succeed:

    (re-search-forward (regexp-opt-charset '(?^)))

What's the correct behavior? I'd be happy to offer a patch for either
side of the equation but I'm not sure which one to target.

All the best.

--
Cameron

In GNU Emacs 24.3.1 (x86_64-apple-darwin11.4.2, Carbon Version 1.6.0 AppKit 1138.51)
 of 2013-05-13 on atago
Windowing system distributor `Apple Inc.', version 10.9.0
Configured using:
 `configure '--with-mac'
 '--enable-mac-app=/Users/xin/Documents/emacs-mac-port/build'
 '--prefix=/Users/xin/Documents/emacs-mac-port/build''

Important settings:
  value of $LANG: en_US.UTF-8
  locale-coding-system: utf-8-unix
  default enable-multibyte-characters: t

Major mode: Lisp Interaction

Minor modes in effect:
  tooltip-mode: t
  mouse-wheel-mode: t
  tool-bar-mode: t
  menu-bar-mode: t
  file-name-shadow-mode: t
  global-font-lock-mode: t
  font-lock-mode: t
  auto-composition-mode: t
  auto-encryption-mode: t
  auto-compression-mode: t
  line-number-mode: t
  transient-mark-mode: t

Load-path shadows:
/Applications/Emacs.app/Contents/Resources/lisp/.dir-locals hides /Applications/Emacs.app/Contents/Resources/lisp/gnus/.dir-locals

Features:
(shadow sort gnus-util mail-extr emacsbug message format-spec rfc822 mml
mml-sec mm-decode mm-bodies mm-encode mail-parse rfc2231 mailabbrev
gmm-utils mailheader sendmail rfc2047 rfc2045 ietf-drums mm-util
mail-prsvr mail-utils help-mode easymenu debug time-date tooltip
ediff-hook vc-hooks lisp-float-type mwheel mac-win tool-bar dnd fontset
image regexp-opt fringe tabulated-list newcomment lisp-mode register page
menu-bar rfn-eshadow timer select scroll-bar mouse jit-lock font-lock
syntax facemenu font-core frame cham georgian utf-8-lang misc-lang
vietnamese tibetan thai tai-viet lao korean japanese hebrew greek
romanian slovak czech european ethiopic indian cyrillic chinese
case-table epa-hook jka-cmpr-hook help simple abbrev minibuffer loaddefs
button faces cus-face macroexp files text-properties overlay sha1 md5
base64 format env code-pages mule custom widget hashtable-print-readable
backquote mac multi-tty make-network-process emacs)

From debbugs-submit-bounces@debbugs.gnu.org Thu Dec 05 14:27:02 2013
Received: (at 16046) by debbugs.gnu.org; 5 Dec 2013
 19:27:02 +0000
From: Cameron Desautels
To: 16046@debbugs.gnu.org
Subject: Bug with Regexp Containing only a Character Class with a Caret (PATCH)
Date: Thu, 5 Dec 2013 13:26:58 -0600

After further experimentation, I suspect that "[^]" is simply not a
valid regular expression. For instance, grep(1) gives the following
behavior:

    $ echo "^" | grep "[^]"
    grep: brackets ([ ]) not balanced

This suggests that the broken behavior is within `regexp-opt-charset`.
I've attached a patch for that function.

Here are some test cases which reveal the behavior of the unpatched and
patched versions of the function (the only difference is the handling of
the "[^]" case):

    ;; Pre-patch
    (regexp-opt-charset (list ?^))         ; "[^]"
    (regexp-opt-charset (list ?^ ?a))      ; "[a^]"
    (regexp-opt-charset (list ?^ ?-))      ; "[-^]"
    (regexp-opt-charset (list ?^ ?\]))     ; "[]^]"
    (regexp-opt-charset (list ?^ ?- ?\]))  ; "[]^-]"

    ;; Post-patch
    (regexp-opt-charset (list ?^))         ; "\\^"
    (regexp-opt-charset (list ?^ ?a))      ; "[a^]"
    (regexp-opt-charset (list ?^ ?-))      ; "[-^]"
    (regexp-opt-charset (list ?^ ?\]))     ; "[]^]"
    (regexp-opt-charset (list ?^ ?- ?\]))  ; "[]^-]"

--
Cameron Desautels

[Attachment: regexp-opt.el.diff]

*** regexp-opt.el.orig	Thu Dec  5 11:17:19 2013
--- regexp-opt.el	Thu Dec  5 11:19:31 2013
*************** CHARS should be a list of characters."
*** 285,291 ****
      ;;
      ;; Make sure a caret is not first and a dash is first or last.
      (if (and (string-equal charset "") (string-equal bracket ""))
! 	(concat "[" dash caret "]")
        (concat "[" bracket charset caret dash "]"))))
  
  (provide 'regexp-opt)
--- 285,293 ----
      ;;
      ;; Make sure a caret is not first and a dash is first or last.
      (if (and (string-equal charset "") (string-equal bracket ""))
! 	(if (string-equal dash "")
!             "\\^"                       ; [^] is not a valid regexp
!           (concat "[" dash caret "]"))
        (concat "[" bracket charset caret dash "]"))))
  
  (provide 'regexp-opt)

From debbugs-submit-bounces@debbugs.gnu.org Thu Dec 05 15:26:43 2013
Received: (at 16046-done) by debbugs.gnu.org; 5 Dec 2013 20:26:43 +0000
From: Stefan Monnier
To: Cameron Desautels
Cc: 16046-done@debbugs.gnu.org
Subject: Re: bug#16046: Bug with Regexp Containing only a Character Class with a Caret (PATCH)
Date: Thu, 05 Dec 2013 15:26:38 -0500
User-Agent: Gnus/5.13 (Gnus v5.13) Emacs/24.3.50 (gnu/linux)

> After further experimentation, I suspect that "[^]" is simply not
> a valid regular expression.

Indeed, according to the documentation, for ^ to be treated as itself,
it needs to be "not the first char", but since we have nothing else to
put there, we're kind of screwed.

> This suggests that the broken behavior is within
> `regexp-opt-charset`. I've attached a patch for that function.

Thank you for tracking down the problem and providing a fix.
I just installed it in trunk, closing,

        Stefan

From unknown Sat Aug 09 13:19:04 2025
Received: (at fakecontrol) by fakecontrolmessage;
To: internal_control@debbugs.gnu.org
From: Debbugs Internal Request
Subject: Internal Control
Message-Id: bug archived.
Date: Fri, 03 Jan 2014 12:24:03 +0000
User-Agent: Fakemail v42.6.9

# This is a fake control message.
#
# The action:
# bug archived.
thanks
# This fakemail brought to you by your local debbugs
# administrator
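
Editorial sketch (not part of the original thread): the test cases from the second message can be re-run on any Emacs to see whether `regexp-opt-charset` hands back a regexp that actually compiles. The helper name `my-charset-regexp-status` is hypothetical and is not part of regexp-opt.el; it assumes only `regexp-opt-charset`, `string-match-p`, and the `invalid-regexp` error shown in the first message's backtrace.

```elisp
(require 'regexp-opt)

(defun my-charset-regexp-status (chars)
  "Return (REGEXP . STATUS) for the character list CHARS.
REGEXP is whatever `regexp-opt-charset' builds for CHARS.
STATUS is the symbol `ok' if REGEXP compiles, or `invalid' if
using it signals `invalid-regexp'.  Hypothetical helper, for
illustration only."
  (let ((re (regexp-opt-charset chars)))
    (cons re
          (condition-case nil
              ;; Matching against the empty string is enough to
              ;; force the regexp to be compiled.
              (progn (string-match-p re "") 'ok)
            (invalid-regexp 'invalid)))))

;; The five character lists used in the thread's test cases.
(mapcar #'my-charset-regexp-status
        (list (list ?^)
              (list ?^ ?a)
              (list ?^ ?-)
              (list ?^ ?\])
              (list ?^ ?- ?\])))
```

Going by the thread, an unpatched Emacs should report `invalid` only for the lone-caret list, and a patched one should report `ok` for all five; the exact regexp strings returned may differ between Emacs versions.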