GNU bug report logs - #17070
[PATCH] grep: optimization of DFA by reuse of multi-byte buffers in non-UTF8 locales

Package: grep;

Reported by: Norihiro Tanaka <noritnk <at> kcn.ne.jp>

Date: Sun, 23 Mar 2014 13:20:03 UTC

Severity: normal

Tags: patch

Done: Paul Eggert <eggert <at> cs.ucla.edu>

Bug is archived. No further changes may be made.

To add a comment to this bug, you must first unarchive it, by sending
a message to control AT debbugs.gnu.org, with unarchive 17070 in the body.
You can then email your comments to 17070 AT debbugs.gnu.org in the normal way.

Report forwarded to bug-grep <at> gnu.org:
bug#17070; Package grep. (Sun, 23 Mar 2014 13:20:03 GMT) Full text and rfc822 format available.

Acknowledgement sent to Norihiro Tanaka <noritnk <at> kcn.ne.jp>:
New bug report received and forwarded. Copy sent to bug-grep <at> gnu.org. (Sun, 23 Mar 2014 13:20:03 GMT) Full text and rfc822 format available.

Message #5 received at submit <at> debbugs.gnu.org (full text, mbox):

From: Norihiro Tanaka <noritnk <at> kcn.ne.jp>
To: submit <at> debbugs.gnu.org
Subject: [PATCH] grep: optimization of DFA by reuse of multi-byte buffers in
 non-UTF8 locales
Date: Sun, 23 Mar 2014 22:19:36 +0900
[Message part 1 (text/plain, inline)]
Package: grep
Tags: patch

dfaexec() allocates and deallocates many buffers in non-UTF8 locales,
which is very inefficient.

If they are kept in struct dfa and reused, then when the DFA rather than
regex is used for ANYCHAR, matching speeds up by about 20-30% in
non-UTF8 locales.
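
The reuse idea can be sketched roughly as follows. This is a minimal
illustration, not the attached patch: `struct matcher', `matcher_reserve',
`matcher_exec' and the `nallocs' counter are hypothetical names standing in
for struct dfa and dfaexec(), and the per-byte work is a toy.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for the part of struct dfa that would own the
   scratch buffer; none of these names are taken from dfa.c.  */
struct matcher
{
  unsigned char *lens;    /* per-byte scratch data, reused across calls */
  size_t lens_alloc;      /* number of bytes currently allocated */
  unsigned long nallocs;  /* how many times the buffer had to grow */
};

/* Ensure M->lens can hold N bytes.  Grow only when the existing buffer
   is too small, so repeated calls with similar sizes allocate nothing.
   Return 0 on success, -1 on allocation failure.  */
static int
matcher_reserve (struct matcher *m, size_t n)
{
  if (n <= m->lens_alloc)
    return 0;
  unsigned char *p = realloc (m->lens, n);
  if (!p)
    return -1;
  m->lens = p;
  m->lens_alloc = n;
  m->nallocs++;
  return 0;
}

/* Toy per-line routine: fill the scratch buffer and return the number
   of bytes in LINE that have the high bit set, or -1 on failure.  */
static long
matcher_exec (struct matcher *m, char const *line, size_t len)
{
  if (matcher_reserve (m, len) != 0)
    return -1;
  long count = 0;
  for (size_t i = 0; i < len; i++)
    {
      m->lens[i] = ((unsigned char) line[i] & 0x80) != 0;
      count += m->lens[i];
    }
  return count;
}

/* Release the buffer once, when the matcher itself is discarded.  */
static void
matcher_free (struct matcher *m)
{
  free (m->lens);
  m->lens = NULL;
  m->lens_alloc = 0;
}
```

Calling matcher_exec() on many lines with one zero-initialized
`struct matcher' allocates only when a line is longer than any seen so
far, instead of once per call.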

Norihiro
[patch.txt (text/plain, attachment)]

Information forwarded to bug-grep <at> gnu.org:
bug#17070; Package grep. (Fri, 28 Mar 2014 17:36:02 GMT) Full text and rfc822 format available.

Message #8 received at 17070 <at> debbugs.gnu.org (full text, mbox):

From: Norihiro Tanaka <noritnk <at> kcn.ne.jp>
To: Norihiro Tanaka <noritnk <at> kcn.ne.jp>
Cc: 17070 <at> debbugs.gnu.org
Subject: Re: bug#17070: [PATCH] grep: optimization of DFA by reuse of
 multi-byte buffers in non-UTF8 locales
Date: Sat, 29 Mar 2014 02:34:59 +0900
[Message part 1 (text/plain, inline)]
I rebased this patch and added a bug fix to it.

If `elems' of `follows' is reallocated in transit_state(), it may cause
a segfault.  So I changed the code so that it no longer copies
d->mb_follows to the `follows' variable.
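
The hazard described above can be sketched as follows. This is only an
illustration under assumed names: `struct set', `struct owner', `fill' and
`sum_after_fill' are hypothetical and are not the types or functions used
in dfa.c.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical growable set, loosely modeled on the description above.  */
struct set
{
  int *elems;
  size_t nelem;
  size_t alloc;
};

/* Hypothetical long-lived object that owns the reusable set.  */
struct owner
{
  struct set follows;
};

/* Append V, growing the array when it is full.  realloc may move the
   array, which invalidates any `elems' pointer copied earlier.  */
static int
set_append (struct set *s, int v)
{
  if (s->nelem == s->alloc)
    {
      size_t n = s->alloc ? 2 * s->alloc : 4;
      int *p = realloc (s->elems, n * sizeof *p);
      if (!p)
        return -1;
      s->elems = p;
      s->alloc = n;
    }
  s->elems[s->nelem++] = v;
  return 0;
}

/* A callee that may grow the owner's set.  */
static int
fill (struct owner *o, int n)
{
  o->follows.nelem = 0;
  for (int i = 0; i < n; i++)
    if (set_append (&o->follows, i) != 0)
      return -1;
  return 0;
}

/* Hazardous pattern (shown only in this comment, never executed):
     struct set follows = o->follows;   copies the elems pointer
     fill (o, 1000);                    may realloc o->follows.elems
     use (follows.elems);               stale pointer, possible segfault
   Safer pattern: keep a pointer to the owner's set and read its fields
   only after the call that can grow it.  */
static long
sum_after_fill (struct owner *o, int n)
{
  struct set *follows = &o->follows;
  if (fill (o, n) != 0)
    return -1;
  long sum = 0;
  for (size_t i = 0; i < follows->nelem; i++)
    sum += follows->elems[i];
  return sum;
}
```

Because `follows' here is a pointer to the owner's set rather than a copy
of it, a reallocation inside fill() is always visible to the caller.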
[patch.txt (text/plain, attachment)]

Information forwarded to bug-grep <at> gnu.org:
bug#17070; Package grep. (Sat, 29 Mar 2014 22:12:01 GMT) Full text and rfc822 format available.

Message #11 received at 17070 <at> debbugs.gnu.org (full text, mbox):

From: Norihiro Tanaka <noritnk <at> kcn.ne.jp>
To: 17070 <at> debbugs.gnu.org
Subject: bug#17070: [PATCH] grep: optimization of DFA by reuse of multi-byte
 buffers in non-UTF8 locales
Date: Sun, 30 Mar 2014 07:11:22 +0900
[Message part 1 (text/plain, inline)]
I added a further improvement to the previous patch.

When dfaexec() runs in non-UTF8 locales, the length and wide character
representation are computed for every character of a line in the input
string.  However, if a match occurs early in the line, the results for
the remaining characters are wasted.

The new patch does not reuse `mblen_buf' and `inputwcs'; it stops using
them altogether and checks multibyte characters on demand.  This speeds
up the case of an early match and reduces the memory required.
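
On-demand checking can be sketched with the standard mbrtowc() function.
This is a simplified illustration, not the code from the patch:
`find_wchar_lazy' is a hypothetical helper, and treating an invalid or
incomplete sequence as a single non-matching byte is an assumption made
to keep the example short.

```c
#include <assert.h>
#include <string.h>
#include <wchar.h>

/* Decode the characters of LINE one at a time and stop at the first one
   equal to TARGET.  Store the number of decoding steps in *DECODED.
   Return the byte offset of the match, or -1 if there is none.
   Assumption: an invalid or incomplete sequence is skipped as one byte
   that never matches.  */
static long
find_wchar_lazy (char const *line, size_t len, wchar_t target,
                 size_t *decoded)
{
  mbstate_t st;
  memset (&st, 0, sizeof st);
  size_t i = 0;
  *decoded = 0;
  while (i < len)
    {
      wchar_t wc;
      size_t n = mbrtowc (&wc, line + i, len - i, &st);
      ++*decoded;
      if (n == (size_t) -1 || n == (size_t) -2)
        {
          /* Reset the shift state and skip the offending byte.  */
          memset (&st, 0, sizeof st);
          i++;
          continue;
        }
      if (n == 0)
        n = 1;  /* a null character occupies one byte */
      if (wc == target)
        return (long) i;
      i += n;
    }
  return -1;
}
```

With an early match, only the characters up to the match are decoded and
no per-line arrays are needed.  A real program would first call
setlocale (LC_ALL, "") from <locale.h> so that mbrtowc() decodes in the
user's locale; without that call the "C" locale is in effect.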

Norihiro
[patch.txt (text/plain, attachment)]

Information forwarded to bug-grep <at> gnu.org:
bug#17070; Package grep. (Sun, 30 Mar 2014 11:15:03 GMT) Full text and rfc822 format available.

Message #14 received at 17070 <at> debbugs.gnu.org (full text, mbox):

From: Norihiro Tanaka <noritnk <at> kcn.ne.jp>
To: 17070 <at> debbugs.gnu.org
Subject: bug#17070: [PATCH] grep: optimization of DFA by reuse of multi-byte
 buffers in non-UTF8 locales
Date: Sun, 30 Mar 2014 20:14:21 +0900
[Message part 1 (text/plain, inline)]
I added a further improvement to the previous patch.

When dfaexec() runs in non-UTF8 locales, the length and wide character
representation are computed for every character of a line in the input
string.  However, if a match occurs early in the line, the results for
the remaining characters are wasted.

The new patch does not reuse `mblen_buf' and `inputwcs'; it stops using
them altogether and checks multibyte characters on demand.  This speeds
up the case of an early match and reduces the memory required.

Norihiro
[patch.txt (text/plain, attachment)]

Information forwarded to bug-grep <at> gnu.org:
bug#17070; Package grep. (Tue, 01 Apr 2014 08:54:02 GMT) Full text and rfc822 format available.

Message #17 received at 17070 <at> debbugs.gnu.org (full text, mbox):

From: Paolo Bonzini <bonzini <at> gnu.org>
To: Norihiro Tanaka <noritnk <at> kcn.ne.jp>, 17070 <at> debbugs.gnu.org
Subject: Re: bug#17070: [PATCH] grep: optimization of DFA by reuse of
 multi-byte buffers in non-UTF8 locales
Date: Tue, 01 Apr 2014 10:53:10 +0200
Applying this patch too.

Paolo




Reply sent to Paul Eggert <eggert <at> cs.ucla.edu>:
You have taken responsibility. (Sun, 06 Apr 2014 05:14:03 GMT) Full text and rfc822 format available.

Notification sent to Norihiro Tanaka <noritnk <at> kcn.ne.jp>:
bug acknowledged by developer. (Sun, 06 Apr 2014 05:14:05 GMT) Full text and rfc822 format available.

Message #22 received at 17070-done <at> debbugs.gnu.org (full text, mbox):

From: Paul Eggert <eggert <at> cs.ucla.edu>
To: Norihiro Tanaka <noritnk <at> kcn.ne.jp>
Cc: Paolo Bonzini <bonzini <at> gnu.org>, 17070-done <at> debbugs.gnu.org,
 17156 <at> debbugs.gnu.org
Subject: Re: bug#17070: [PATCH] grep: optimization of DFA by reuse of
 multi-byte buffers in non-UTF8 locales
Date: Sat, 05 Apr 2014 22:12:58 -0700
[Message part 1 (text/plain, inline)]
Norihiro Tanaka wrote:
> I rebased this patch and added a bug fix to it.

Thanks.  Paolo wrote it up in <http://bugs.gnu.org/17156#11>, and I just 
now tweaked its ChangeLog and merged the code and installed it (patch 
attached).  I followed up with minor cleanups (2nd patch attached).
[0001-grep-reuse-multibyte-DFA-buffers-in-non-UTF8-locales.patch (text/plain, attachment)]
[0002-grep-minor-improvements-to-previous-patch.patch (text/plain, attachment)]

bug archived. Request was from Debbugs Internal Request <help-debbugs <at> gnu.org> to internal_control <at> debbugs.gnu.org. (Sun, 04 May 2014 11:24:04 GMT) Full text and rfc822 format available.

This bug report was last modified 11 years and 99 days ago.


GNU bug tracking system
Copyright (C) 1999 Darren O. Benham, 1997,2003 nCipher Corporation Ltd, 1994-97 Ian Jackson.