From monnier@iro.umontreal.ca Sun Mar 30 15:10:48 2008
Received: (at submit) by emacsbugs.donarmstrong.com; 30 Mar 2008 22:10:48 +0000
Resent-To: submit@debbugs.gnu.org
Resent-From: Stefan Monnier
Resent-Date: Sun, 30 Mar 2008 18:10:35 -0400
Message-Id: <85r6ds6ory.fsf@boum.org>
From: intrigeri
To: bug-gnu-emacs@gnu.org
Cc: Colin Marquardt, rfrancoise@debian.org
Date: Sun, 30 Mar 2008 20:38:09 +0200
Subject: 23.0.60; Segmentation fault loading auto-lang.el

Hello,

First glance :

- download http://www.marquardt-home.de/auto-lang.el to ~/.elisp/
- run emacs -Q
- M-x load-file
- choose file ~/.elisp/auto-lang.el
=> Emacs segfaults (same result with emacs -Q -nw)

Trying harder :

- download http://www.marquardt-home.de/auto-lang.el to ~/.elisp/
- run emacs -Q
- C-x C-f ~/.elisp/auto-lang.el
- select region from the beginning of the file to, and including, line 1398
- eval-region
=> Emacs eval’s the region just fine
- then eval the next sexp : (defvar al-german-common-8bit-regexp ... )
=> Emacs segfaults (same result with emacs -Q -nw)

I know that auto-lang.el is not part of GNU Emacs, but I guess that
Emacs is supposed not to segfault when loading random *.el files.

I’m running Romain Françoise’s emacs-snapshot Debian package, based on
Emacs CVS (2008-03-28) :

In GNU Emacs 23.0.60.1 (i486-pc-linux-gnu, GTK+ Version 2.12.9)
 of 2008-03-28 on elegiac, modified by Debian
 (emacs-snapshot package, version 1:20080328-1)
configured using `configure '--build' 'i486-linux-gnu' '--host' 'i486-linux-gnu' '--prefix=/usr' '--sharedstatedir=/var/lib' '--libexecdir=/usr/lib' '--localstatedir=/var' '--infodir=/usr/share/info' '--mandir=/usr/share/man' '--with-pop=yes' '--enable-locallisppath=/etc/emacs-snapshot:/etc/emacs:/usr/local/share/emacs/23.0.60/site-lisp:/usr/local/share/emacs/site-lisp:/usr/share/emacs/23.0.60/site-lisp:/usr/share/emacs/site-lisp:/usr/share/emacs/23.0.60/leim' '--with-x=yes' '--with-x-toolkit=gtk' 'build_alias=i486-linux-gnu' 'host_alias=i486-linux-gnu' 'CFLAGS=-DDEBIAN -DSITELOAD_PURESIZE_EXTRA=5000 -g -O2''

Important settings:
  value of $LC_ALL: fr_FR.UTF-8
  value of $LC_COLLATE: nil
  value of $LC_CTYPE: UTF-8
  value of $LC_MESSAGES: nil
  value of $LC_MONETARY: nil
  value of $LC_NUMERIC: nil
  value of $LC_TIME: nil
  value of $LANG: fr_FR.UTF-8
  value of $XMODIFIERS: nil
  locale-coding-system: utf-8-unix
  default-enable-multibyte-characters: t

Major mode: Lisp Interaction

Minor modes in effect:
  menu-bar-mode: t
  file-name-shadow-mode: t
  global-font-lock-mode: t
  font-lock-mode: t
  global-auto-composition-mode: t
  auto-composition-mode: t
  auto-compression-mode: t
  line-number-mode: t
  transient-mark-mode: t

Recent input:
ESC x r e p o r t - e m TAB RET

Recent messages:
("emacs" "-Q")

Bye,
--
intrigeri

From intrigeri@boum.org Mon Mar 31 04:29:43 2008
Received: (at 103) by emacsbugs.donarmstrong.com; 31 Mar 2008 11:29:43 +0000
Message-Id: <85fxu76sj1.fsf@boum.org>
From: intrigeri
To: 103@debbugs.gnu.org
Subject: Details
Date: Mon, 31 Mar 2008 13:29:22 +0200

Hello,

I tried to further isolate the segfault cause.

I first ran a clean Emacs :

  emacs -Q -nw

Then I eval’d only the (defcustom al-german-common-words ...) and the
(defcustom al-german-8bit-words ...) from auto-lang.el.

Then I tried to eval parts of (defvar al-german-common-8bit-regexp ... )
to find out which part of it makes Emacs segfault.

I could eval the following without any issue :

  (mapcar 'string-as-unibyte
          (append al-german-common-words al-german-8bit-words nil))

But the following triggers the segfault :

  (regexp-opt
   (mapcar 'string-as-unibyte
           (append al-german-common-words al-german-8bit-words nil)))

So it seems the regexp-opt function call is the one that triggers the
segfault.

Bye,
--
intrigeri

From cyd@stupidchicken.com Mon Apr 7 22:29:53 2008
Received: (at 103) by emacsbugs.donarmstrong.com; 8 Apr 2008 05:29:53 +0000
From: Chong Yidong
To: emacs-devel@gnu.org
Cc: 103@debbugs.gnu.org, intrigeri
Subject: Re: 23.0.60; Segmentation fault loading auto-lang.el
Date: Tue, 08 Apr 2008 01:29:41 -0400
Message-ID: <87r6dg3oe2.fsf@stupidchicken.com>

> - download http://www.marquardt-home.de/auto-lang.el to ~/.elisp/
> - run emacs -Q
> - M-x load-file
> - choose file ~/.elisp/auto-lang.el
> => Emacs segfaults (same result with emacs -Q -nw)

This is due to an infinite nesting depth in regexp-opt, which can be
tracked down to the following problem:

(let ((str (string-as-unibyte "ä")))
  (string-match (char-to-string (string-to-char str)) str))

evaluates to 0 in Emacs 22, and to nil in Emacs 23.  It turns out that
this screws up the use of all-completions in regexp-opt-group.

Anyone have any idea what's going on here?
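
A minimal sketch of how a failed prefix match can turn into unbounded
recursion.  It assumes that regexp-opt-group splits its sorted input by
collecting the strings that complete a one-character prefix and taking
the remainder with nthcdr; the helper name my-split-by-prefix is
hypothetical and is not part of regexp-opt.el.

```elisp
;; Hypothetical helper (not from regexp-opt.el): split a sorted list of
;; strings into those that start with PREFIX and the remaining tail.
(defun my-split-by-prefix (prefix strings)
  "Return (HALF1 . HALF2) for sorted STRINGS split on PREFIX."
  (let* ((completion-ignore-case nil)   ; compare case-sensitively
         (completion-regexp-list nil)   ; no extra completion filters
         (half1 (all-completions prefix strings))
         (half2 (nthcdr (length half1) strings)))
    (cons half1 half2)))

;; Prefix taken from the first string: both halves are shorter than
;; the input, so recursing on them terminates.
(my-split-by-prefix "a" '("ab" "ac" "b"))
;; => (("ab" "ac") "b")

;; Prefix that matches nothing: HALF1 is empty and HALF2 is the whole
;; input again, so recursing on HALF2 would never make progress.
(my-split-by-prefix "x" '("ab" "ac" "b"))
;; => (nil "ab" "ac" "b")
```

If the prefix built from the first string does not actually match that
string, the second case applies on every call, which would be consistent
with the infinite nesting depth reported above.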

From handa@m17n.org Mon Apr 7 23:52:57 2008
Received: (at 103) by emacsbugs.donarmstrong.com; 8 Apr 2008 06:52:57 +0000
From: Kenichi Handa
To: Chong Yidong
CC: emacs-devel@gnu.org, intrigeri@boum.org, 103@debbugs.gnu.org
In-reply-to: <87r6dg3oe2.fsf@stupidchicken.com> (message from Chong Yidong on Tue, 08 Apr 2008 01:29:41 -0400)
Subject: Re: 23.0.60; Segmentation fault loading auto-lang.el
References: <87r6dg3oe2.fsf@stupidchicken.com>
Date: Tue, 08 Apr 2008 15:52:27 +0900

In article <87r6dg3oe2.fsf@stupidchicken.com>, Chong Yidong writes:

> > - download http://www.marquardt-home.de/auto-lang.el to ~/.elisp/
> > - run emacs -Q
> > - M-x load-file
> > - choose file ~/.elisp/auto-lang.el
> > => Emacs segfaults (same result with emacs -Q -nw)

> This is due to an infinite nesting depth in regexp-opt, which can be
> tracked down to the following problem:

> (let ((str (string-as-unibyte "ä")))
>   (string-match (char-to-string (string-to-char str)) str))

> evaluates to 0 in Emacs 22, and to nil in Emacs 23.  It turns out that
> this screws up the use of all-completions in regexp-opt-group.

> Anyone have any idea what's going on here?

(string-as-unibyte "ä") => "\303\244"
(string-to-char "\303\244") => 195 (because ?\303 == 195)
(char-to-string 195) => "Ã" (because 195==0xC3 U+00C3=='Ã')
(string-match "Ã" "ä") => nil (obvious)

Any Lisp program that depends on the result of
string-as-unibyte (thus Emacs' internal character
representation) won't work in Emacs 23.

---
Kenichi Handa
handa@ni.aist.go.jp

From cyd@stupidchicken.com Tue Apr 8 09:54:05 2008
Received: (at 103) by emacsbugs.donarmstrong.com; 8 Apr 2008 16:54:05 +0000
From: Chong Yidong
To: Kenichi Handa
Cc: emacs-devel@gnu.org, intrigeri@boum.org, 103@debbugs.gnu.org
Subject: Re: 23.0.60; Segmentation fault loading auto-lang.el
References: <87r6dg3oe2.fsf@stupidchicken.com>
Date: Tue, 08 Apr 2008 12:50:11 -0400
In-Reply-To: (Kenichi Handa's message of "Tue, 08 Apr 2008 15:52:27 +0900")
Message-ID: <87skxwl29o.fsf@stupidchicken.com>

Kenichi Handa writes:

> In article <87r6dg3oe2.fsf@stupidchicken.com>, Chong Yidong writes:
>
>> > - download http://www.marquardt-home.de/auto-lang.el to ~/.elisp/
>> > - run emacs -Q
>> > - M-x load-file
>> > - choose file ~/.elisp/auto-lang.el
>> > => Emacs segfaults (same result with emacs -Q -nw)
>
>> This is due to an infinite nesting depth in regexp-opt, which can be
>> tracked down to the following problem:
>
>> (let ((str (string-as-unibyte "ä")))
>>   (string-match (char-to-string (string-to-char str)) str))
>
>> evaluates to 0 in Emacs 22, and to nil in Emacs 23.  It turns out that
>> this screws up the use of all-completions in regexp-opt-group.
>
>> Anyone have any idea what's going on here?
>
> (string-as-unibyte "ä") => "\303\244"
> (string-to-char "\303\244") => 195 (because ?\303 == 195)
> (char-to-string 195) => "Ã" (because 195==0xC3 U+00C3=='Ã')
> (string-match "Ã" "ä") => nil (obvious)
>
> Any Lisp program that depends on the result of
> string-as-unibyte (thus Emacs' internal character
> representation) won't work in Emacs 23.

I see.  However, maybe the following change to regexp-opt-group in
regexp-opt.el would make things a little more predictable.  What do you
think?

*** trunk/lisp/emacs-lisp/regexp-opt.el.~1.37.~ 2008-03-14 17:17:34.000000000 -0400
--- trunk/lisp/emacs-lisp/regexp-opt.el 2008-04-08 12:46:49.000000000 -0400
***************
*** 226,232 ****
  
          ;; Otherwise, divide the list into those that start with a
          ;; particular letter and those that do not, and recurse on them.
!         (let* ((char (char-to-string (string-to-char (car strings))))
                 (half1 (all-completions char strings))
                 (half2 (nthcdr (length half1) strings)))
            (concat open-group
--- 226,232 ----
  
          ;; Otherwise, divide the list into those that start with a
          ;; particular letter and those that do not, and recurse on them.
!         (let* ((char (substring (car strings) 0 1))
                 (half1 (all-completions char strings))
                 (half2 (nthcdr (length half1) strings)))
            (concat open-group

From monnier@iro.umontreal.ca Tue Apr 8 18:42:24 2008
Received: (at 103) by emacsbugs.donarmstrong.com; 9 Apr 2008 01:42:25 +0000
From: Stefan Monnier
To: Chong Yidong
Cc: Kenichi Handa, intrigeri@boum.org, 103@debbugs.gnu.org, emacs-devel@gnu.org
Subject: Re: 23.0.60; Segmentation fault loading auto-lang.el
References: <87r6dg3oe2.fsf@stupidchicken.com> <87skxwl29o.fsf@stupidchicken.com>
Date: Tue, 08 Apr 2008 21:42:14 -0400
In-Reply-To: <87skxwl29o.fsf@stupidchicken.com> (Chong Yidong's message of "Tue, 08 Apr 2008 12:50:11 -0400")

>>> (let ((str (string-as-unibyte "ä")))
>>>   (string-match (char-to-string (string-to-char str)) str))
>>
>>> evaluates to 0 in Emacs 22, and to nil in Emacs 23.  It turns out that
>>> this screws up the use of all-completions in regexp-opt-group.
>>
>>> Anyone have any idea what's going on here?
>>
>> (string-as-unibyte "ä") => "\303\244"
>> (string-to-char "\303\244") => 195 (because ?\303 == 195)
>> (char-to-string 195) => "Ã" (because 195==0xC3 U+00C3=='Ã')
>> (string-match "Ã" "ä") => nil (obvious)
>>
>> Any Lisp program that depends on the result of
>> string-as-unibyte (thus Emacs' internal character
>> representation) won't work in Emacs 23.

Notice that the problem is unrelated to string-as-unibyte:

   (string-match (char-to-string (string-to-char str)) str)

this should intuitively always return 0.  Of course, once you replace
`char-to-string' with just `string', you may be reminded that Emacs-23
introduced `unibyte-string', which leads you to the key: if `str' is
unibyte, you need to do

   (string-match (unibyte-string (string-to-char str)) str)

In Emacs-22, `string' used a heuristic to decide whether to build
a unibyte or multibyte string, and more importantly, the character
representing byte code 209 had code 209, whereas in Emacs-23, we have
the strange situation that byte 209 is character 4194257.  So an integer
<256 needs to be accompanied by some contextual info that says whether
it represents a char or a byte; otherwise you get ambiguity that leads
to bugs.  And string-to-char returns either a byte or a char depending
on whether the string was unibyte or multibyte.

> I see.  However, maybe the following change to regexp-opt-group in
> regexp-opt.el would make things a little more predictable.  What do you
> think?

Yes, it looks like a good fix.  Maybe "-no-properties" would be
even better.


        Stefan

From handa@m17n.org Tue Apr 8 19:20:02 2008
Received: (at 103) by emacsbugs.donarmstrong.com; 9 Apr 2008 02:20:03 +0000
From: Kenichi Handa
To: Chong Yidong
CC: intrigeri@boum.org, 103@debbugs.gnu.org, emacs-devel@gnu.org
In-reply-to: <87skxwl29o.fsf@stupidchicken.com> (message from Chong Yidong on Tue, 08 Apr 2008 12:50:11 -0400)
Subject: Re: 23.0.60; Segmentation fault loading auto-lang.el
References: <87r6dg3oe2.fsf@stupidchicken.com> <87skxwl29o.fsf@stupidchicken.com>
Date: Wed, 09 Apr 2008 11:19:43 +0900

In article <87skxwl29o.fsf@stupidchicken.com>, Chong Yidong writes:

> > Any Lisp program that depends on the result of
> > string-as-unibyte (thus Emacs' internal character
> > representation) won't work in Emacs 23.

> I see.  However, maybe the following change to regexp-opt-group in
> regexp-opt.el would make things a little more predictable.  What do you
> think?

I agree because that change will avoid a unibyte string
being changed to multibyte by accident.

But, I've just downloaded auto-lang.el and found that it has
code like this:

(string-as-multibyte
 (regexp-opt
  (mapcar 'string-as-unibyte
          (append al-german-common-words al-german-8bit-words nil))))

All of them should be changed to this simple form:

(regexp-opt (append al-german-common-words al-german-8bit-words))

The above German case works just by chance, but
al-danish-common-words doesn't.  You'll see peculiar 8-bit
codes in it.

And, the file should have a coding tag.

---
Kenichi Handa
handa@ni.aist.go.jp

> *** trunk/lisp/emacs-lisp/regexp-opt.el.~1.37.~ 2008-03-14 17:17:34.000000000 -0400
> --- trunk/lisp/emacs-lisp/regexp-opt.el 2008-04-08 12:46:49.000000000 -0400
> ***************
> *** 226,232 ****
>           ;; Otherwise, divide the list into those that start with a
>           ;; particular letter and those that do not, and recurse on them.
> !         (let* ((char (char-to-string (string-to-char (car strings))))
>                  (half1 (all-completions char strings))
>                  (half2 (nthcdr (length half1) strings)))
>             (concat open-group
> --- 226,232 ----
>           ;; Otherwise, divide the list into those that start with a
>           ;; particular letter and those that do not, and recurse on them.
> !         (let* ((char (substring (car strings) 0 1))
>                  (half1 (all-completions char strings))
>                  (half2 (nthcdr (length half1) strings)))
>             (concat open-group

From cyd@stupidchicken.com Tue Apr 8 21:32:24 2008
Received: (at 103-close) by emacsbugs.donarmstrong.com; 9 Apr 2008 04:32:24 +0000
From: Chong Yidong
To: 103-close@debbugs.gnu.org
Subject: Re: 23.0.60; Segmentation fault loading auto-lang.el
Date: Wed, 09 Apr 2008 00:32:15 -0400
Message-ID: <878wznlkc0.fsf@stupidchicken.com>

I've checked in the fix to regexp-opt.

From unknown Wed Jun 18 23:16:46 2025
Received: (at fakecontrol) by fakecontrolmessage;
To: internal_control@debbugs.gnu.org
From: $requester
Subject: Internal Control
Message-Id: bug archived.
Date: Wed, 07 May 2008 14:24:03 +0000
User-Agent: Fakemail v42.6.9

# A New Hope
# A log time ago, in a galaxy far, far away
# something happened.
#
# Magically this resulted in the following
# action being taken, but this fake control
# message doesn't tell you why it happened
#
# The action:
# bug archived.
thanks
# This fakemail brought to you by your local debbugs
# administrator
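
The byte-versus-character distinction discussed in this thread can be
sketched as follows.  This is only an illustration, assuming Emacs 23 or
later; the function name my-byte-vs-char-demo is hypothetical and was
chosen for this example.

```elisp
;; Hypothetical demo function, not part of Emacs.
(defun my-byte-vs-char-demo ()
  "Compare three ways of building a one-element prefix of a unibyte string."
  ;; Written with octal escapes, this literal is a unibyte string
  ;; holding the two UTF-8 bytes of "ä".
  (let* ((str "\303\244")
         (n (string-to-char str)))      ; the integer 195
    (list
     ;; STR itself is unibyte.
     (multibyte-string-p str)
     ;; `char-to-string' reads 195 as the character U+00C3 and builds
     ;; a multibyte string, which is not a prefix of STR.
     (multibyte-string-p (char-to-string n))
     ;; `unibyte-string' reads 195 as a byte and stays unibyte.
     (string-match (unibyte-string n) str)
     ;; `substring' keeps the representation of its argument.
     (string-match (substring str 0 1) str))))

(my-byte-vs-char-demo)
;; => (nil t 0 0)
```

The last two forms both match at position 0; the substring variant has
the advantage of never going through an integer, so it works the same
way for unibyte and multibyte input.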