From juri@jurta.org Sun Jul 6 11:45:35 2008
X-Spam-Checker-Version: SpamAssassin 3.2.3-bugs.debian.org_2005_01_02 (2007-08-08) on rzlab.ucr.edu
X-Spam-Level:
X-Spam-Status: No, score=-2.5 required=4.0 tests=AWL,BAYES_00,FOURLA,KOI8R, RCVD_IN_DNSWL_MED autolearn=ham version=3.2.3-bugs.debian.org_2005_01_02
Received: (at submit) by emacsbugs.donarmstrong.com; 6 Jul 2008 18:45:35 +0000
Received: from fencepost.gnu.org (fencepost.gnu.org [140.186.70.10]) by rzlab.ucr.edu (8.13.8/8.13.8/Debian-3) with ESMTP id m66IjRPG031052 for ; Sun, 6 Jul 2008 11:45:28 -0700
Received: from mx10.gnu.org ([199.232.76.166]:58619) by fencepost.gnu.org with esmtp (Exim 4.67) (envelope-from ) id 1KFZEO-0003Wr-Rs for emacs-pretest-bug@gnu.org; Sun, 06 Jul 2008 14:45:04 -0400
Received: from Debian-exim by monty-python.gnu.org with spam-scanned (Exim 4.60) (envelope-from ) id 1KFZEh-0000sw-4O for emacs-pretest-bug@gnu.org; Sun, 06 Jul 2008 14:45:26 -0400
Received: from relay03.kiev.sovam.com ([62.64.120.201]:63486) by monty-python.gnu.org with esmtps (TLS-1.0:DHE_RSA_AES_256_CBC_SHA1:32) (Exim 4.60) (envelope-from ) id 1KFZEg-0000sY-Pd for emacs-pretest-bug@gnu.org; Sun, 06 Jul 2008 14:45:22 -0400
Received: from [83.170.232.243] (helo=smtp.svitonline.com) by relay03.kiev.sovam.com with esmtp (Exim 4.67) (envelope-from ) id 1KFZEe-0002AR-Gz for emacs-pretest-bug@gnu.org; Sun, 06 Jul 2008 21:45:20 +0300
From: Juri Linkov
To: emacs-pretest-bug@gnu.org
Subject: 23.0.60; Unicode search bug
Organization: JURTA
Date: Sun, 06 Jul 2008 21:43:23 +0300
Message-ID: <87ej66q2os.fsf@jurta.org>
User-Agent: Gnus/5.13 (Gnus v5.13) Emacs/23.0.60 (x86_64-pc-linux-gnu)
MIME-Version: 1.0
Content-Type: text/plain; charset=koi8-r
Content-Transfer-Encoding: 8bit
X-Scanner-Signature: 4204c266f86c3e9b1e0231505dd73b85
X-DrWeb-checked: yes
X-SpamTest-Envelope-From: juri@jurta.org
X-SpamTest-Group-ID: 00000000
X-SpamTest-Header: Trusted
X-SpamTest-Info: Profiles 4235s [July 6 2008]
X-SpamTest-Info: {received from trusted
 relay: common white list}
X-SpamTest-Method: white ip list
X-SpamTest-Rate: 0
X-SpamTest-Status: Trusted
X-SpamTest-Status-Extended: trusted
X-SpamTest-Version: SMTP-Filter Version 3.0.0 [0278], KAS30/Release
X-detected-kernel: by monty-python.gnu.org: FreeBSD 6.x (1)

There is a weird bug in searching Unicode text. The search function
fails on Cyrillic letters between codepoints #x0400 and #x041f, but
successfully finds a Cyrillic letter between #x0420 and #x042f.

I tried to debug this and saw that in case of failure it calls
`boyer_moore', and in case of successful search it calls
`simple_search'. I checked the Unicode properties, but everything
seems correct.

This bug didn't exist before the Unicode merge.

The easiest way to reproduce it: run `emacs -Q', put in the *scratch*
buffer the following 4 lines (note the leading space):

(search-forward " П" nil t)
(search-forward " Р" nil t)
 П
 Р

and type `C-x C-e' after each of the first two lines.

In GNU Emacs 23.0.60 (x86_64-pc-linux-gnu)

Important settings:
  value of $LC_ALL: nil
  value of $LC_COLLATE: nil
  value of $LC_CTYPE: nil
  value of $LC_MESSAGES: nil
  value of $LC_MONETARY: nil
  value of $LC_NUMERIC: nil
  value of $LC_TIME: nil
  value of $LANG: en_US.UTF-8
  value of $XMODIFIERS: nil
  locale-coding-system: utf-8-unix
  default-enable-multibyte-characters: t

-- 
Juri Linkov
http://www.jurta.org/emacs/

From cyd@stupidchicken.com Tue Aug 26 21:14:14 2008
X-Spam-Checker-Version: SpamAssassin 3.2.3-bugs.debian.org_2005_01_02 (2007-08-08) on rzlab.ucr.edu
X-Spam-Level:
X-Spam-Status: No, score=-3.6 required=4.0 tests=AWL,BAYES_00 autolearn=ham version=3.2.3-bugs.debian.org_2005_01_02
Received: (at 540) by emacsbugs.donarmstrong.com; 27 Aug 2008 04:14:14 +0000
Received: from cyd.mit.edu (CYD.MIT.EDU [18.115.2.24]) by rzlab.ucr.edu (8.13.8/8.13.8/Debian-3) with ESMTP id m7R4EBo5016985 for <540@emacsbugs.donarmstrong.com>; Tue, 26 Aug 2008 21:14:12 -0700
Received: by cyd.mit.edu (Postfix, from userid 1000) id 7744657E34B; Wed, 27 Aug 2008 00:15:57 -0400 (EDT)
From: Chong Yidong
To: Kenichi Handa
Cc: 540@debbugs.gnu.org
Subject: 23.0.60; Unicode search bug
Date: Wed, 27 Aug 2008 00:15:57 -0400
Message-ID: <87wsi3qeiq.fsf@cyd.mit.edu>
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: quoted-printable

Hi Handa-san,

Could you take a look at this bug report? Thanks.

Juri Linkov wrote:

> There is a weird bug in searching Unicode text. The search function
> fails on Cyrillic letters between codepoints #x0400 and #x041f, but
> successfully finds a Cyrillic letter between #x0420 and #x042f.
>
> I tried to debug this and saw that in case of failure it calls
> `boyer_moore', and in case of successful search it calls
> `simple_search'. I checked the Unicode properties, but everything
> seems correct.
>
> This bug didn't exist before the Unicode merge.
>
> The easiest way to reproduce it: run `emacs -Q', put in the *scratch*
> buffer the following 4 lines (note the leading space):
>
> (search-forward " П" nil t)
> (search-forward " Р" nil t)
>  П
>  Р
>
> and type `C-x C-e' after each of the first two lines.

Here, the failing case is:

  П          = 1055 = 10000011111
  inverse(П) = 1087 = 10000111111
                           ^^^^^^

whereas the case that works (by setting boyer_moore_ok to 0) is

  Р          = 1056 = 10000100000
  inverse(Р) = 1088 = 10001000000
                           ^^^^^^

I've indicated the last 6 bits, according to the logic in
search_buffer (which I don't fully understand).
From schwab@suse.de Wed Aug 27 03:59:46 2008
X-Spam-Checker-Version: SpamAssassin 3.2.3-bugs.debian.org_2005_01_02 (2007-08-08) on rzlab.ucr.edu
X-Spam-Level:
X-Spam-Status: No, score=-10.1 required=4.0 tests=AWL,BAYES_00,HAS_BUG_NUMBER, RCVD_IN_DNSWL_MED autolearn=ham version=3.2.3-bugs.debian.org_2005_01_02
Received: (at 540) by emacsbugs.donarmstrong.com; 27 Aug 2008 10:59:46 +0000
Received: from mx2.suse.de (mx2.suse.de [195.135.220.15]) by rzlab.ucr.edu (8.13.8/8.13.8/Debian-3) with ESMTP id m7RAxgW4019457 for <540@emacsbugs.donarmstrong.com>; Wed, 27 Aug 2008 03:59:44 -0700
Received: from Relay2.suse.de (mail2.suse.de [195.135.221.8]) (using TLSv1 with cipher DHE-RSA-AES256-SHA (256/256 bits)) (No client certificate requested) by mx2.suse.de (Postfix) with ESMTP id EE1E646133; Wed, 27 Aug 2008 12:59:40 +0200 (CEST)
From: Andreas Schwab
To: Chong Yidong
Cc: 540@debbugs.gnu.org, Kenichi Handa
Subject: Re: bug#540: 23.0.60; Unicode search bug
References: <87wsi3qeiq.fsf@cyd.mit.edu>
X-Yow: Now KEN is having a MENTAL CRISIS because his "R.V." PAYMENTS are OVER-DUE!!
Date: Wed, 27 Aug 2008 12:59:40 +0200
In-Reply-To: <87wsi3qeiq.fsf@cyd.mit.edu> (Chong Yidong's message of "Wed, 27 Aug 2008 00:15:57 -0400")
Message-ID:
User-Agent: Gnus/5.110009 (No Gnus v0.9) Emacs/22.2 (gnu/linux)
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: quoted-printable

Chong Yidong writes:

>> The easiest way to reproduce it: run `emacs -Q', put in the *scratch*
>> buffer the following 4 lines (note the leading space):
>>
>> (search-forward " П" nil t)
>> (search-forward " Р" nil t)
>>  П
>>  Р
>>
>> and type `C-x C-e' after each of the first two lines.

Should be fixed now.

Andreas.

-- 
Andreas Schwab, SuSE Labs, schwab@suse.de
SuSE Linux Products GmbH, Maxfeldstraße 5, 90409 Nürnberg, Germany
PGP key fingerprint = 58CA 54C7 6D53 942B 1756 01D3 44D5 214B 8276 4ED5
"And now for something completely different."
From cyd@stupidchicken.com Wed Aug 27 07:32:55 2008
X-Spam-Checker-Version: SpamAssassin 3.2.3-bugs.debian.org_2005_01_02 (2007-08-08) on rzlab.ucr.edu
X-Spam-Level:
X-Spam-Status: No, score=-5.1 required=4.0 tests=AWL,BAYES_00,HAS_BUG_NUMBER autolearn=ham version=3.2.3-bugs.debian.org_2005_01_02
Received: (at 540-done) by emacsbugs.donarmstrong.com; 27 Aug 2008 14:32:55 +0000
Received: from cyd.mit.edu (CYD.MIT.EDU [18.115.2.24]) by rzlab.ucr.edu (8.13.8/8.13.8/Debian-3) with ESMTP id m7REWpHS025941 for <540-done@emacsbugs.donarmstrong.com>; Wed, 27 Aug 2008 07:32:52 -0700
Received: by cyd.mit.edu (Postfix, from userid 1000) id 532A857E32E; Wed, 27 Aug 2008 10:34:40 -0400 (EDT)
To: Andreas Schwab
Cc: 540-done@debbugs.gnu.org, Kenichi Handa
Subject: Re: bug#540: 23.0.60; Unicode search bug
References: <87wsi3qeiq.fsf@cyd.mit.edu>
From: Chong Yidong
Date: Wed, 27 Aug 2008 10:34:40 -0400
In-Reply-To: (Andreas Schwab's message of "Wed\, 27 Aug 2008 12\:59\:40 +0200")
Message-ID: <87wsi2a5mn.fsf@cyd.mit.edu>
User-Agent: Gnus/5.11 (Gnus v5.11) Emacs/22.2.91 (gnu/linux)
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii

Andreas Schwab writes:

> Should be fixed now.

Thanks!

From unknown Thu Sep 25 13:50:51 2025
Received: (at fakecontrol) by fakecontrolmessage;
To: internal_control@debbugs.gnu.org
From: $requester
Subject: Internal Control
Message-Id: bug archived.
Date: Thu, 25 Sep 2008 14:24:03 +0000
User-Agent: Fakemail v42.6.9

# A New Hope
# A log time ago, in a galaxy far, far away
# something happened.
#
# Magically this resulted in the following
# action being taken, but this fake control
# message doesn't tell you why it happened
#
# The action:
# bug archived.
thanks
# This fakemail brought to you by your local debbugs
# administrator