GNU bug report logs - #16544
Optimization for is_mb_middle
Reported by: Norihiro Tanaka <noritnk <at> kcn.ne.jp>
Date: Sat, 25 Jan 2014 02:40:02 UTC
Severity: normal
Tags: patch
Done: Jim Meyering <jim <at> meyering.net>
Bug is archived. No further changes may be made.
Package: grep
Tags: patch
When kwsexec or dfaexec finds text that matches a regular expression, we
need to check whether it is in the middle of a multibyte character.
`is_mb_middle' in searchutils.c is used for this. However, it is expensive,
even when most of the text consists of single-byte characters.
For example, source code written in a language with multibyte characters
still contains a lot of single-byte characters.
I am posting a patch that optimizes `is_mb_middle'. Before execution, it
checks whether each single byte is complete as a character by itself, and
caches the results. In addition, a further optimization is performed for
UTF-8. Only when the length of a multibyte character cannot be determined
from the cache is it determined with `mbrlen' at run time.
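The caching idea can be illustrated with a minimal sketch. This is not the
attached patch: it is a self-contained UTF-8-only example, and the names
`utf8_len', `init_utf8_len' and `in_mb_middle_utf8' are hypothetical. It
assumes the buffer is UTF-8, so the per-byte table can give a sequence length
for every lead byte and no `mbrlen' fallback is needed; other encodings would
still need that fallback for bytes the table cannot classify.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical cache: utf8_len[b] is the sequence length implied by
   lead byte B, or 0 if B cannot start a character.  */
static unsigned char utf8_len[256];
static bool utf8_len_ready;

/* Fill the table once, before any searching is done.  */
void
init_utf8_len (void)
{
  for (int b = 0; b < 256; b++)
    {
      if (b < 0x80)
        utf8_len[b] = 1;        /* ASCII: complete by itself */
      else if (b >= 0xC2 && b <= 0xDF)
        utf8_len[b] = 2;
      else if (b >= 0xE0 && b <= 0xEF)
        utf8_len[b] = 3;
      else if (b >= 0xF0 && b <= 0xF4)
        utf8_len[b] = 4;
      else
        utf8_len[b] = 0;        /* continuation or invalid lead byte */
    }
  utf8_len_ready = true;
}

/* Return true if POS lies strictly inside a multibyte character of the
   UTF-8 buffer [BUF, END).  BUF is assumed to start on a character
   boundary.  Invalid or truncated sequences are stepped over one byte
   at a time.  */
bool
in_mb_middle_utf8 (const char *buf, const char *end, const char *pos)
{
  const unsigned char *p = (const unsigned char *) buf;
  const unsigned char *e = (const unsigned char *) end;
  const unsigned char *stop = (const unsigned char *) pos;

  if (!utf8_len_ready)
    init_utf8_len ();

  while (p < stop)
    {
      size_t len = utf8_len[*p];

      if (len == 0 || (size_t) (e - p) < len)
        len = 1;
      else
        for (size_t i = 1; i < len; i++)
          if ((p[i] & 0xC0) != 0x80)
            {
              len = 1;          /* not a continuation byte */
              break;
            }
      p += len;
    }

  /* If the walk overshot POS, POS is inside the last character.  */
  return p > stop;
}
```

With the buffer "a\xC3\xA9z" (the letter a, U+00E9, the letter z), offset 2
is inside the two-byte character, while offsets 0, 1 and 3 are on character
boundaries. For runs of ASCII bytes the loop does one table lookup per byte.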
[is_mb_middle.txt (application/octet-stream, attachment)]
This bug report was last modified 11 years and 163 days ago.