GNU bug report logs - #13084
boyer_moore crashes with certain characters in the case table
Reported by: Juri Linkov <juri <at> jurta.org>
Date: Wed, 5 Dec 2012 00:37:02 UTC
Severity: normal
Done: Juri Linkov <juri <at> jurta.org>
Bug is archived. No further changes may be made.
In article <83obhxoo2v.fsf <at> gnu.org>, Eli Zaretskii <eliz <at> gnu.org> writes:
> > Here, A and B belong to the same character group iff A and
> > B have the same multibyte sequence except for the last byte.
> > In this condition, we should be able to use the table
> > simple_translate.
> OK, then maybe just the comments need to be fixed. They shouldn't
> talk about "charset" and "row", which are undefined in Unicode Emacs.
> They should instead use terminology that corresponds to UTF-8 multibyte
> representation of characters we use today.
I've just committed this change. How is it?
=== modified file 'src/search.c'
--- src/search.c 2012-10-10 20:09:47 +0000
+++ src/search.c 2012-12-15 13:04:46 +0000
@@ -1313,8 +1313,11 @@
non-nil, we can use boyer-moore search only if TRT can be
represented by the byte array of 256 elements. For that,
all non-ASCII case-equivalents of all case-sensitive
- characters in STRING must belong to the same charset and
- row. */
+ characters in STRING must belong to the same character
+ group (two characters belong to the same group iff their
+ multibyte forms are the same except for the last byte;
+ i.e. every 64 characters form a group; U+0000..U+003F,
+ U+0040..U+007F, U+0080..U+00BF, ...). */
while (--len >= 0)
{
---
Kenichi Handa
handa <at> gnu.org
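
The "same character group" condition in the new comment can be illustrated with a small standalone check. This is only a sketch under stated assumptions: it uses plain UTF-8 for code points up to U+10FFFF rather than Emacs's internal multibyte representation, and the names utf8_encode and same_char_group are hypothetical helpers, not functions from src/search.c.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: encode code point C (U+0000..U+10FFFF) as
   standard UTF-8 into BUF (at least 4 bytes); return the byte length.
   Emacs has its own macros for its internal representation.  */
static size_t
utf8_encode (uint32_t c, unsigned char *buf)
{
  if (c < 0x80)
    {
      buf[0] = (unsigned char) c;
      return 1;
    }
  if (c < 0x800)
    {
      buf[0] = (unsigned char) (0xC0 | (c >> 6));
      buf[1] = (unsigned char) (0x80 | (c & 0x3F));
      return 2;
    }
  if (c < 0x10000)
    {
      buf[0] = (unsigned char) (0xE0 | (c >> 12));
      buf[1] = (unsigned char) (0x80 | ((c >> 6) & 0x3F));
      buf[2] = (unsigned char) (0x80 | (c & 0x3F));
      return 3;
    }
  buf[0] = (unsigned char) (0xF0 | (c >> 18));
  buf[1] = (unsigned char) (0x80 | ((c >> 12) & 0x3F));
  buf[2] = (unsigned char) (0x80 | ((c >> 6) & 0x3F));
  buf[3] = (unsigned char) (0x80 | (c & 0x3F));
  return 4;
}

/* Hypothetical helper: true iff A and B have multibyte forms of the
   same length that differ at most in the last byte.  Intended for
   non-ASCII characters, where this is equivalent to
   (A >> 6) == (B >> 6).  */
static bool
same_char_group (uint32_t a, uint32_t b)
{
  unsigned char ba[4], bb[4];
  size_t la = utf8_encode (a, ba);
  size_t lb = utf8_encode (b, bb);
  return la == lb && memcmp (ba, bb, la - 1) == 0;
}
```

For example, same_char_group (0x00E9, 0x00C9) holds because U+00E9 and U+00C9 encode as C3 A9 and C3 89, while same_char_group (0x00FF, 0x0178) does not because U+00FF and U+0178 encode as C3 BF and C5 B8. When every case-equivalent pair passes such a check, only the last byte changes under translation, which is presumably why a 256-element byte table is enough.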
This bug report was last modified 12 years and 164 days ago.