GNU bug report logs - #16544
Optimazation for is_mb_middle


Package: grep;

Reported by: Norihiro Tanaka <noritnk <at> kcn.ne.jp>

Date: Sat, 25 Jan 2014 02:40:02 UTC

Severity: normal

Tags: patch

Done: Jim Meyering <jim <at> meyering.net>

Bug is archived. No further changes may be made.


From: help-debbugs <at> gnu.org (GNU bug Tracking System)
To: Norihiro Tanaka <noritnk <at> kcn.ne.jp>
Subject: bug#16544: closed (Re: bug#16544: Optimazation for is_mb_middle)
Date: Sun, 02 Feb 2014 16:36:03 +0000
[Message part 1 (text/plain, inline)]
Your bug report

#16544: Optimazation for is_mb_middle

which was filed against the grep package, has been closed.

The explanation is attached below, along with your original report.
If you require more details, please reply to 16544 <at> debbugs.gnu.org.

-- 
16544: http://debbugs.gnu.org/cgi/bugreport.cgi?bug=16544
GNU Bug Tracking System
Contact help-debbugs <at> gnu.org with problems
[Message part 2 (message/rfc822, inline)]
From: Jim Meyering <jim <at> meyering.net>
To: Norihiro Tanaka <noritnk <at> kcn.ne.jp>
Cc: Paul Eggert <eggert <at> cs.ucla.edu>, 16544-done <at> debbugs.gnu.org
Subject: Re: bug#16544: Optimazation for is_mb_middle
Date: Sun, 2 Feb 2014 08:35:15 -0800
Merged and pushed.

[Message part 3 (message/rfc822, inline)]
From: Norihiro Tanaka <noritnk <at> kcn.ne.jp>
To: submit <at> debbugs.gnu.org
Subject: Optimazation for is_mb_middle
Date: Sat, 25 Jan 2014 11:39:11 +0900
[Message part 4 (text/plain, inline)]
Package: grep
Tags: patch

When kwsexec or dfaexec finds text matching a regular expression, we need
to check whether the match begins in the middle of a multibyte character.

`is_mb_middle' in searchutils.c is used for this. However, it is expensive,
even when most of the input consists of single-byte characters.
For example, source code written in a language with multibyte characters
still contains many single-byte characters.

This patch optimizes `is_mb_middle'. Before execution, it checks whether
each single byte is a complete character by itself, and caches the
results. In addition, a further optimization is performed for UTF-8.
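
The report does not spell out the UTF-8 optimization. One plausible shortcut,
sketched here as an assumption rather than as the content of the patch, relies
on the fact that UTF-8 continuation bytes always have the form 10xxxxxx, so a
position can be classified by looking at most three bytes backwards instead of
scanning from the start of the buffer. The names `utf8_seq_len' and
`utf8_in_middle' are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Expected sequence length for a UTF-8 lead byte, or 0 if the byte
   cannot start a sequence (continuation byte or invalid lead).  */
static size_t utf8_seq_len(unsigned char b)
{
    if (b < 0x80) return 1;
    if (b >= 0xC2 && b <= 0xDF) return 2;
    if (b >= 0xE0 && b <= 0xEF) return 3;
    if (b >= 0xF0 && b <= 0xF4) return 4;
    return 0;
}

/* Return true if p points inside a multibyte sequence of [buf, end).
   Hypothetical helper: it only inspects the lead byte and the bytes
   between the lead byte and p, so it is not a full UTF-8 validator.  */
static bool utf8_in_middle(const char *buf, const char *end, const char *p)
{
    assert(buf <= p && p <= end);

    /* A byte that is not a continuation byte always starts a character.  */
    if (p == end || ((unsigned char) *p & 0xC0) != 0x80)
        return false;

    /* Back up over at most three bytes to find a non-continuation byte.  */
    const char *lead = p;
    for (int i = 0; i < 3 && lead > buf; i++) {
        lead--;
        if (((unsigned char) *lead & 0xC0) != 0x80)
            break;
    }

    /* p is in the middle only if the lead byte announces a sequence
       that is long enough to cover p and fits in the buffer.  */
    size_t len = utf8_seq_len((unsigned char) *lead);
    return len > 1 && (size_t) (p - lead) < len && (size_t) (end - lead) >= len;
}
```

In this sketch a stray continuation byte with no valid lead byte before it is
treated as a character of its own, so a match starting there is not rejected.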

Only when the length of a multibyte character cannot be determined from
the cache is it determined with `mbrlen' at execution time.
[is_mb_middle.txt (application/octet-stream, attachment)]
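
The caching scheme described in the message can be sketched roughly as follows.
This is a minimal illustration under stated assumptions, not the attached
patch: the names `mbclen_cache', `build_mbclen_cache' and `in_mb_middle' are
hypothetical, the encoding is assumed to be stateless, and invalid or
truncated sequences are simply treated as single bytes.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>
#include <wchar.h>

/* Per-byte result of mbrlen on that byte alone (hypothetical name):
   1 = complete single-byte character, 0 = NUL,
   (size_t) -2 = first byte of a longer character, (size_t) -1 = invalid.  */
static size_t mbclen_cache[256];

/* Fill the cache once, after setlocale and before searching.  */
static void build_mbclen_cache(void)
{
    for (int i = 0; i < 256; i++) {
        char c = (char) i;
        mbstate_t mbs;
        memset(&mbs, 0, sizeof mbs);
        mbclen_cache[i] = mbrlen(&c, 1, &mbs);
    }
}

/* Return true if p lies strictly inside a multibyte character of
   [buf, end).  Scans forward from buf, consulting the cache first.  */
static bool in_mb_middle(const char *buf, const char *end, const char *p)
{
    assert(buf <= p && p <= end);

    mbstate_t mbs;
    memset(&mbs, 0, sizeof mbs);

    const char *cur = buf;
    while (cur < p) {
        size_t len = mbclen_cache[(unsigned char) *cur];

        /* The cache cannot tell the length of a multibyte character,
           so fall back to mbrlen only for possible lead bytes.  */
        if (len == (size_t) -2)
            len = mbrlen(cur, (size_t) (end - cur), &mbs);

        /* NUL, invalid, or truncated input: step over one byte and
           reset the conversion state.  */
        if (len == 0 || len == (size_t) -1 || len == (size_t) -2) {
            len = 1;
            memset(&mbs, 0, sizeof mbs);
        }
        cur += len;
    }
    /* If the scan overshot p, p was inside the last character.  */
    return cur > p;
}
```

With mostly single-byte input, the loop above advances by table lookup alone
and calls `mbrlen' only at bytes that may begin a multibyte character.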

This bug report was last modified 11 years and 163 days ago.
