GNU bug report logs - #13084
boyer_moore crashes with certain characters in the case table

Package: emacs;

Reported by: Juri Linkov <juri <at> jurta.org>

Date: Wed, 5 Dec 2012 00:37:02 UTC

Severity: normal

Done: Juri Linkov <juri <at> jurta.org>

Bug is archived. No further changes may be made.

From: Kenichi Handa <handa <at> gnu.org>
To: Eli Zaretskii <eliz <at> gnu.org>
Cc: juri <at> jurta.org, 13084 <at> debbugs.gnu.org
Subject: bug#13084: boyer_moore crashes with certain characters in the case table
Date: Sat, 15 Dec 2012 22:17:17 +0900

In article <83obhxoo2v.fsf <at> gnu.org>, Eli Zaretskii <eliz <at> gnu.org> writes:

> > Here, A and B belong to the same character group iff A and
> > B have the same multibyte sequence except for the last byte.
> > Under this condition, we should be able to use the table
> > simple_translate.

> OK, then maybe just the comments need to be fixed.  They shouldn't
> talk about "charset" and "row", which are undefined in Unicode Emacs.
> They should instead use terminology that corresponds to the UTF-8
> multibyte representation of characters we use today.

I've just committed this change.  How does it look?

=== modified file 'src/search.c'
--- src/search.c	2012-10-10 20:09:47 +0000
+++ src/search.c	2012-12-15 13:04:46 +0000
@@ -1313,8 +1313,11 @@
 	     non-nil, we can use boyer-moore search only if TRT can be
 	     represented by the byte array of 256 elements.  For that,
 	     all non-ASCII case-equivalents of all case-sensitive
-	     characters in STRING must belong to the same charset and
-	     row.  */
+	     characters in STRING must belong to the same character
+	     group (two characters belong to the same group iff their
+	     multibyte forms are the same except for the last byte;
+	     i.e. every 64 characters form a group; U+0000..U+003F,
+	     U+0040..U+007F, U+0080..U+00BF, ...).  */
 
 	  while (--len >= 0)
 	    {
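
The revised comment describes a group in two ways: characters whose multibyte forms differ only in the last byte, and blocks of 64 consecutive code points. The C sketch below checks that the two descriptions agree for non-ASCII characters. It assumes standard UTF-8 over U+0000..U+10FFFF (Emacs's internal multibyte form also covers code points beyond U+10FFFF, which the sketch does not model), and the helper names `utf8_encode`, `char_group`, `same_prefix` and `grouping_matches_utf8` are hypothetical, not taken from `search.c`. For ASCII the byte-prefix test is vacuous, because every ASCII form is a single byte, so the ASCII ranges listed in the comment follow from the `c & ~0x3F` arithmetic rather than from byte prefixes.

```c
#include <stdbool.h>
#include <string.h>

/* Encode code point C (assumed to be in U+0000..U+10FFFF) as standard
   UTF-8 into BUF, which must hold 4 bytes.  Return the length.  */
static int
utf8_encode (unsigned int c, unsigned char *buf)
{
  if (c < 0x80)
    {
      buf[0] = c;
      return 1;
    }
  if (c < 0x800)
    {
      buf[0] = 0xC0 | (c >> 6);
      buf[1] = 0x80 | (c & 0x3F);
      return 2;
    }
  if (c < 0x10000)
    {
      buf[0] = 0xE0 | (c >> 12);
      buf[1] = 0x80 | ((c >> 6) & 0x3F);
      buf[2] = 0x80 | (c & 0x3F);
      return 3;
    }
  buf[0] = 0xF0 | (c >> 18);
  buf[1] = 0x80 | ((c >> 12) & 0x3F);
  buf[2] = 0x80 | ((c >> 6) & 0x3F);
  buf[3] = 0x80 | (c & 0x3F);
  return 4;
}

/* Hypothetical: the 64-character group of C, as the comment describes it.  */
static unsigned int
char_group (unsigned int c)
{
  return c & ~0x3Fu;
}

/* True iff the UTF-8 forms of A and B have the same length and are
   identical except for the last byte.  */
static bool
same_prefix (unsigned int a, unsigned int b)
{
  unsigned char ba[4], bb[4];
  int la = utf8_encode (a, ba);
  int lb = utf8_encode (b, bb);
  return la == lb && memcmp (ba, bb, la - 1) == 0;
}

/* Sanity check over [LO, HI): each pair of adjacent code points shares
   a UTF-8 prefix exactly when it shares a 64-character group.  */
static bool
grouping_matches_utf8 (unsigned int lo, unsigned int hi)
{
  for (unsigned int c = lo; c + 1 < hi; c++)
    if (same_prefix (c, c + 1) != (char_group (c) == char_group (c + 1)))
      return false;
  return true;
}
```

Under these assumptions, `grouping_matches_utf8 (0x80, 0x110000)` returns true, while the same check over U+0000..U+007F returns false: U+003F and U+0040 share an (empty) byte prefix but fall in different 64-character groups.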

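The condition stated in the revised comment, that all non-ASCII case-equivalents of the case-sensitive characters in STRING fall in one group, can be expressed as a small predicate. This is a hypothetical sketch, not the code in Emacs's `search.c`: the name `equivs_share_group` and its plain array-of-code-points interface are assumptions made for illustration. It skips ASCII code points and requires every remaining one to have the same value of `c & ~0x3F`.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper, not Emacs's actual code: return true if every
   non-ASCII code point in EQUIVS (N entries) belongs to the same
   64-character group, i.e. has the same value of C & ~0x3F.  */
static bool
equivs_share_group (const unsigned int *equivs, size_t n)
{
  bool seen = false;      /* has a non-ASCII code point been seen yet? */
  unsigned int base = 0;  /* group of the first non-ASCII code point */

  for (size_t i = 0; i < n; i++)
    {
      unsigned int c = equivs[i];
      if (c < 0x80)
        continue;         /* the rule only constrains non-ASCII characters */
      unsigned int this_base = c & ~0x3Fu;
      if (!seen)
        {
          base = this_base;
          seen = true;
        }
      else if (this_base != base)
        return false;
    }
  return true;
}
```

For example, é and É (U+00E9, U+00C9) both fall in U+00C0..U+00FF and pass, whereas the Cyrillic pair ѐ and Ѐ (U+0450, U+0400) land in U+0440..U+047F and U+0400..U+043F respectively, so a pattern containing them would fail this check.
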
---
Kenichi Handa
handa <at> gnu.org
