GNU bug report logs -
#13084
boyer_moore crashes with certain characters in the case table
Reported by: Juri Linkov <juri <at> jurta.org>
Date: Wed, 5 Dec 2012 00:37:02 UTC
Severity: normal
Done: Juri Linkov <juri <at> jurta.org>
Bug is archived. No further changes may be made.
> From: Kenichi Handa <handa <at> gnu.org>
> Cc: juri <at> jurta.org, 13084 <at> debbugs.gnu.org
> Date: Thu, 13 Dec 2012 22:39:29 +0900
>
> I have not yet checked the code, but what I remember is that
> search_buffer checks the search string and decides which to
> use; boyer_moore or simple_search. If all equivalent
> characters of all non-ASCII characters in the search string
> are in the same character group, we can use boyer_moore.
Yes, that's my reading of the code as well.
> Here, A and B belong to the same character group iff A and
> B have the same multibyte sequence except for the last byte.
> Under this condition, we should be able to use the table
> simple_translate.
OK, then maybe just the comments need to be fixed. They shouldn't
talk about "charset" and "row", which are undefined in Unicode Emacs.
They should instead use terminology that corresponds to the UTF-8
multibyte representation of characters that we use today.
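To make the "character group" condition discussed above concrete, here is a
minimal sketch in Python, not the Emacs C code. It assumes plain UTF-8 as a
stand-in for Emacs's internal multibyte representation, and it uses
str.lower/str.upper as a rough substitute for the case table. The names
same_char_group, case_equivalents and can_use_boyer_moore are hypothetical
and exist only for this illustration.

```python
def same_char_group(a: str, b: str) -> bool:
    """True if a and b have identical UTF-8 bytes except for the last byte."""
    ea, eb = a.encode("utf-8"), b.encode("utf-8")
    return len(ea) == len(eb) and ea[:-1] == eb[:-1]


def case_equivalents(ch: str) -> set:
    """Hypothetical stand-in for a case table lookup.

    Only single-character results are kept; multi-character mappings
    such as the uppercase of U+00DF are ignored in this sketch.
    """
    return {c for c in (ch, ch.lower(), ch.upper()) if len(c) == 1}


def can_use_boyer_moore(pattern: str) -> bool:
    """Apply the condition from the thread to a search string.

    Collect every non-ASCII character of the pattern together with its
    case equivalents, then require that all of them fall into one
    character group. Otherwise a caller would fall back to a simple search.
    """
    members = []
    for ch in pattern:
        if ord(ch) >= 0x80:  # ASCII characters in the pattern are not checked
            members.extend(case_equivalents(ch))
    if not members:
        return True
    first = members[0]
    return all(same_char_group(first, c) for c in members[1:])
```

For instance, U+00E9 and U+00C9 encode as C3 A9 and C3 89, so they share a
group. U+00FF (C3 BF) and its uppercase U+0178 (C5 B8) do not, so a pattern
containing U+00FF fails the check in this sketch.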
This bug report was last modified 12 years and 164 days ago.