GNU bug report logs - #16842
[PATCH] Use mbrtowc_cache in DFA engine

Package: grep

Reported by: Norihiro Tanaka <noritnk <at> kcn.ne.jp>

Date: Sat, 22 Feb 2014 15:47:01 UTC

Severity: normal

Tags: patch

Done: Paul Eggert <eggert <at> cs.ucla.edu>

Bug is archived. No further changes may be made.

To add a comment to this bug, you must first unarchive it, by sending
a message to control AT debbugs.gnu.org, with unarchive 16842 in the body.
You can then email your comments to 16842 AT debbugs.gnu.org in the normal way.


Report forwarded to bug-grep <at> gnu.org:
bug#16842; Package grep. (Sat, 22 Feb 2014 15:47:02 GMT)

Acknowledgement sent to Norihiro Tanaka <noritnk <at> kcn.ne.jp>:
New bug report received and forwarded. Copy sent to bug-grep <at> gnu.org. (Sat, 22 Feb 2014 15:47:02 GMT)

Message #5 received at submit <at> debbugs.gnu.org:

From: Norihiro Tanaka <noritnk <at> kcn.ne.jp>
To: submit <at> debbugs.gnu.org
Subject: [PATCH] Use mbrtowc_cache in DFA engine
Date: Sun, 23 Feb 2014 00:46:27 +0900
Package: grep
Tags: patch

This patch is the DFA version of patch#16544 "Optimazation for is_mb_middle".
It will improve performance for non-UTF-8 locales in the DFA engine.

I tested it with the commands below.  In both cases, the speed-up was 3-3.5x.

$ yes $(printf '%078dm' 0)|head -1000000 > in
$ for i in `seq 5`; do env LC_ALL=ja_JP.eucJP time src/grep n in; done

$ yes jjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjj | head -1000000 > k
$ for i in `seq 5`; do env LC_ALL=ja_JP.eucJP time src/grep -i foobar k; done

Norihiro
[use_mb_cache_in_dfa.txt (application/octet-stream, attachment)]
[tests.txt (application/octet-stream, attachment)]

Information forwarded to bug-grep <at> gnu.org:
bug#16842; Package grep. (Fri, 28 Mar 2014 07:29:02 GMT)

Message #8 received at 16842 <at> debbugs.gnu.org:

From: Paul Eggert <eggert <at> cs.ucla.edu>
To: Norihiro Tanaka <noritnk <at> kcn.ne.jp>, 16842 <at> debbugs.gnu.org
Subject: Re: bug#16842: [PATCH] Use mbrtowc_cache in DFA engine
Date: Fri, 28 Mar 2014 00:27:51 -0700
Thanks very much.  I read through that patch and think we can come up 
with a simpler cache that need not store lengths, but reserves WEOF to 
represent an incomplete multibyte character.  This approach simplifies 
the code and avoids some glitches when mbrtowc returns special values 
not in the range 1..N.  How about the attached patch instead?
[0001-dfa-cache-results-of-mbrtowc-for-speed.patch (text/plain, attachment)]
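
The cache described above can be sketched roughly as follows.  This is only an illustration of the idea, not the attached patch: the names byte_cache and build_byte_cache are hypothetical, and it assumes one cache slot per byte value, filled for whatever LC_CTYPE locale is current.  A byte that is a complete character by itself is stored as its wide character; any other byte is stored as WEOF, so no length needs to be kept.

```c
#include <limits.h>
#include <string.h>
#include <wchar.h>

/* Hypothetical name: one slot per possible byte value.  */
static wint_t byte_cache[UCHAR_MAX + 1];

/* Fill the cache for the current LC_CTYPE locale.  */
static void
build_byte_cache (void)
{
  for (int i = 0; i <= UCHAR_MAX; i++)
    {
      char c = (char) i;
      wchar_t wc = L'\0';
      mbstate_t s;
      memset (&s, 0, sizeof s);   /* start from the initial state */

      size_t n = mbrtowc (&wc, &c, 1, &s);

      /* 0 (the NUL byte) and 1 mean the byte is a complete character.
         (size_t) -1 and (size_t) -2 mean invalid or incomplete, and
         both are recorded as WEOF.  */
      byte_cache[i] = n <= 1 ? (wint_t) wc : WEOF;
    }
}
```

After build_byte_cache has run, a scanner can look up byte_cache[b] for an unsigned char b and only call mbrtowc when the entry is WEOF.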

Information forwarded to bug-grep <at> gnu.org:
bug#16842; Package grep. (Fri, 28 Mar 2014 16:06:01 GMT)

Message #11 received at 16842 <at> debbugs.gnu.org:

From: Norihiro Tanaka <noritnk <at> kcn.ne.jp>
To: Paul Eggert <eggert <at> cs.ucla.edu>
Cc: 16842 <at> debbugs.gnu.org
Subject: bug#16842: [PATCH] Use mbrtowc_cache in DFA engine
Date: Sat, 29 Mar 2014 01:05:23 +0900
Paul,

Thanks very much.  I checked the patch and added the following fixes
to it.

 1. Fixed a compiler warning.

    dfa.c: In function 'build_mbrtowc_cache':
    dfa.c:448: warning: pointer targets in passing argument 1 of
    'mbrtowc' differ in signedness

 2. Moved mbrtowc_cache into a new member of struct dfa.

    When more than one struct dfa is in use at the same time, a shared
    mbrtowc cache may conflict.  So make mbrtowc_cache a member of
    struct dfa, so that each struct dfa has its own cache.

Norihiro
[patch.txt (text/plain, attachment)]

Reply sent to Paul Eggert <eggert <at> cs.ucla.edu>:
You have taken responsibility. (Fri, 28 Mar 2014 16:37:02 GMT)

Notification sent to Norihiro Tanaka <noritnk <at> kcn.ne.jp>:
bug acknowledged by developer. (Fri, 28 Mar 2014 16:37:03 GMT)

Message #16 received at 16842-done <at> debbugs.gnu.org:

From: Paul Eggert <eggert <at> cs.ucla.edu>
To: Norihiro Tanaka <noritnk <at> kcn.ne.jp>
Cc: 16842-done <at> debbugs.gnu.org
Subject: Re: bug#16842: [PATCH] Use mbrtowc_cache in DFA engine
Date: Fri, 28 Mar 2014 09:36:11 -0700
Thanks for the review and the fixes.  I found a couple more things. 
First, it's not portable to cast wint_t * to wchar_t *, since the 
pointed-to types might be different sizes or representations. Second, we 
can put the cache directly in the struct dfa, saving the overhead of 
doing a separate malloc.

The attached further patch should address these problems.  I pushed 
this, along with the earlier two patches in this sequence, and am 
marking this as done.


[0003-dfa-avoid-an-indirection-and-port-wint_t-usage.patch (text/x-patch, attachment)]
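
The two points in this message can be illustrated with a small sketch.  It is not the attached patch: struct matcher, matcher_init_cache, and matcher_decode are hypothetical names standing in for struct dfa and its helpers, and the fallback logic is an assumption about how such a cache might be consulted.  The cache is an array embedded in the struct, so each instance has its own copy and no separate allocation is needed.  mbrtowc always writes into a real wchar_t, and the value is then converted to wint_t, rather than casting a wint_t * to wchar_t *.

```c
#include <limits.h>
#include <string.h>
#include <wchar.h>

/* Hypothetical stand-in for the matcher state; not grep's struct dfa.  */
struct matcher
{
  /* Embedded array: allocated with the struct, one cache per instance.  */
  wint_t mb_cache[UCHAR_MAX + 1];
};

static void
matcher_init_cache (struct matcher *m)
{
  for (int i = 0; i <= UCHAR_MAX; i++)
    {
      char c = (char) i;
      wchar_t wc = L'\0';
      mbstate_t s;
      memset (&s, 0, sizeof s);
      /* Convert into a wchar_t, then store the value as wint_t.  */
      size_t n = mbrtowc (&wc, &c, 1, &s);
      m->mb_cache[i] = n <= 1 ? (wint_t) wc : WEOF;
    }
}

/* Decode one character from the N bytes at S (N must be at least 1).
   Store the wide character, or WEOF on an invalid or incomplete
   sequence, in *PWC and return the number of bytes consumed.  */
static size_t
matcher_decode (const struct matcher *m, wint_t *pwc,
                const char *s, size_t n, mbstate_t *st)
{
  unsigned char uc = (unsigned char) s[0];
  wint_t cached = m->mb_cache[uc];

  if (cached != WEOF)
    {
      /* Fast path: the first byte is a complete character.  */
      *pwc = cached;
      return 1;
    }

  /* Slow path: let mbrtowc examine the whole sequence.  */
  wchar_t wc;
  size_t nbytes = mbrtowc (&wc, s, n, st);
  if (0 < nbytes && nbytes < (size_t) -2)
    {
      *pwc = (wint_t) wc;
      return nbytes;
    }

  /* Invalid or incomplete: reset the state and skip one byte.  */
  memset (st, 0, sizeof *st);
  *pwc = WEOF;
  return 1;
}
```

With this layout, two struct matcher objects used at the same time cannot interfere with each other's cache.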

bug archived. Request was from Debbugs Internal Request <help-debbugs <at> gnu.org> to internal_control <at> debbugs.gnu.org. (Sat, 26 Apr 2014 11:24:03 GMT)
