GNU bug report logs - #540
23.0.60; Unicode search bug

Previous Next

Package: emacs;

Reported by: Juri Linkov <juri <at> jurta.org>

Date: Sun, 6 Jul 2008 18:55:05 UTC

Severity: normal

Done: Chong Yidong <cyd <at> stupidchicken.com>

Bug is archived. No further changes may be made.

To add a comment to this bug, you must first unarchive it, by sending
a message to control AT debbugs.gnu.org, with unarchive 540 in the body.
You can then email your comments to 540 AT debbugs.gnu.org in the normal way.

Toggle the display of automated, internal messages from the tracker.

View this report as an mbox folder, status mbox, maintainer mbox


Report forwarded to bug-submit-list <at> lists.donarmstrong.com, Emacs Bugs <bug-gnu-emacs <at> gnu.org>:
bug#540; Package emacs. Full text and rfc822 format available.

Acknowledgement sent to Juri Linkov <juri <at> jurta.org>:
New bug report received and forwarded. Copy sent to Emacs Bugs <bug-gnu-emacs <at> gnu.org>. Full text and rfc822 format available.

Message #5 received at submit <at> emacsbugs.donarmstrong.com (full text, mbox):

From: Juri Linkov <juri <at> jurta.org>
To: emacs-pretest-bug <at> gnu.org
Subject: 23.0.60; Unicode search bug
Date: Sun, 06 Jul 2008 21:43:23 +0300
There is a weird bug in searching Unicode text.  The search function
fails on Cyrillic letters between codepoints #x0400 and #x041f, but
successfully finds a Cyrillic letter between #x0420 and #x042f.

I tried to debug this and see that in case of failure
it calls `boyer_moore', and in case of successful search
it calls `simple_search'.  I checked the Unicode properties,
but everything seems correct.

This bug didn't exist before the Unicode merge.

The easiest way to reproduce it: run `emacs -Q',
put in the *scratch* buffer the following 4 lines
(note the leading space):

(search-forward " П" nil t)
(search-forward " Р" nil t)
 П
 Р

and type `C-x C-e' after each of first two lines.

In GNU Emacs 23.0.60 (x86_64-pc-linux-gnu)
Important settings:
  value of $LC_ALL: nil
  value of $LC_COLLATE: nil
  value of $LC_CTYPE: nil
  value of $LC_MESSAGES: nil
  value of $LC_MONETARY: nil
  value of $LC_NUMERIC: nil
  value of $LC_TIME: nil
  value of $LANG: en_US.UTF-8
  value of $XMODIFIERS: nil
  locale-coding-system: utf-8-unix
  default-enable-multibyte-characters: t

-- 
Juri Linkov
http://www.jurta.org/emacs/




Information forwarded to bug-submit-list <at> lists.donarmstrong.com, Emacs Bugs <bug-gnu-emacs <at> gnu.org>:
bug#540; Package emacs. Full text and rfc822 format available.

Acknowledgement sent to Chong Yidong <cyd <at> stupidchicken.com>:
Extra info received and forwarded to list. Copy sent to Emacs Bugs <bug-gnu-emacs <at> gnu.org>. Full text and rfc822 format available.

Message #10 received at 540 <at> emacsbugs.donarmstrong.com (full text, mbox):

From: Chong Yidong <cyd <at> stupidchicken.com>
To: Kenichi Handa <handa <at> m17n.org>
Cc: 540 <at> debbugs.gnu.org
Subject: 23.0.60; Unicode search bug
Date: Wed, 27 Aug 2008 00:15:57 -0400
Hi Handa-san,

Could you take a look at this bug report?  Thanks.

Juri Linkov <juri <at> jurta.org> wrote:
> There is a weird bug in searching Unicode text.  The search function
> fails on Cyrillic letters between codepoints #x0400 and #x041f, but
> successfully finds a Cyrillic letter between #x0420 and #x042f.
>
> I tried to debug this and see that in case of failure it calls
> `boyer_moore', and in case of successful search it calls
> `simple_search'.  I checked the Unicode properties, but everything
> seems correct.
>
> This bug didn't exist before the Unicode merge.
>
> The easiest way to reproduce it: run `emacs -Q', put in the *scratch*
> buffer the following 4 lines (note the leading space):
>
> (search-forward " П" nil t)
> (search-forward " Р" nil t)
>  П
>  Р
>
> and type `C-x C-e' after each of first two lines.

Here, the failing case is:

П          = 1055 = 10000011111
inverse(П) = 1087 = 10000111111
                         ^^^^^^

whereas the case that works (by setting boyer_moore_ok to 0) is

Р          = 1056 = 10000100000
inverse(Р) = 1088 = 10001000000
                         ^^^^^^

I've indicated the last 6 bits, according to the logic in search_buffer
(which I don't fully understand).




Information forwarded to bug-submit-list <at> lists.donarmstrong.com, Emacs Bugs <bug-gnu-emacs <at> gnu.org>:
bug#540; Package emacs. Full text and rfc822 format available.

Acknowledgement sent to Andreas Schwab <schwab <at> suse.de>:
Extra info received and forwarded to list. Copy sent to Emacs Bugs <bug-gnu-emacs <at> gnu.org>. Full text and rfc822 format available.

Message #15 received at 540 <at> emacsbugs.donarmstrong.com (full text, mbox):

From: Andreas Schwab <schwab <at> suse.de>
To: Chong Yidong <cyd <at> stupidchicken.com>
Cc: 540 <at> debbugs.gnu.org, Kenichi Handa <handa <at> m17n.org>
Subject: Re: bug#540: 23.0.60; Unicode search bug
Date: Wed, 27 Aug 2008 12:59:40 +0200
Chong Yidong <cyd <at> stupidchicken.com> writes:

>> The easiest way to reproduce it: run `emacs -Q', put in the *scratch*
>> buffer the following 4 lines (note the leading space):
>>
>> (search-forward " П" nil t)
>> (search-forward " Р" nil t)
>>  П
>>  Р
>>
>> and type `C-x C-e' after each of first two lines.

Should be fixed now.

Andreas.

-- 
Andreas Schwab, SuSE Labs, schwab <at> suse.de
SuSE Linux Products GmbH, Maxfeldstraße 5, 90409 Nürnberg, Germany
PGP key fingerprint = 58CA 54C7 6D53 942B 1756  01D3 44D5 214B 8276 4ED5
"And now for something completely different."




Reply sent to Chong Yidong <cyd <at> stupidchicken.com>:
You have taken responsibility. Full text and rfc822 format available.

Notification sent to Juri Linkov <juri <at> jurta.org>:
bug acknowledged by developer. Full text and rfc822 format available.

Message #20 received at 540-done <at> emacsbugs.donarmstrong.com (full text, mbox):

From: Chong Yidong <cyd <at> stupidchicken.com>
To: Andreas Schwab <schwab <at> suse.de>
Cc: 540-done <at> debbugs.gnu.org, Kenichi Handa <handa <at> m17n.org>
Subject: Re: bug#540: 23.0.60; Unicode search bug
Date: Wed, 27 Aug 2008 10:34:40 -0400
Andreas Schwab <schwab <at> suse.de> writes:

> Should be fixed now.

Thanks!




bug archived. Request was from Debbugs Internal Request <don <at> donarmstrong.com> to internal_control <at> emacsbugs.donarmstrong.com. (Thu, 25 Sep 2008 14:24:03 GMT) Full text and rfc822 format available.

This bug report was last modified 15 years and 246 days ago.

Previous Next


GNU bug tracking system
Copyright (C) 1999 Darren O. Benham, 1997,2003 nCipher Corporation Ltd, 1994-97 Ian Jackson.