GNU bug report logs - #22103
[PATCH] grep: improve performance for grep -P in UTF-8
Reported by: Norihiro Tanaka <noritnk <at> kcn.ne.jp>
Date: Sun, 6 Dec 2015 23:02:01 UTC
Severity: normal
Tags: patch
Done: Norihiro Tanaka <noritnk <at> kcn.ne.jp>
Bug is archived. No further changes may be made.
To add a comment to this bug, you must first unarchive it, by sending
a message to control AT debbugs.gnu.org, with unarchive 22103 in the body.
You can then email your comments to 22103 AT debbugs.gnu.org in the normal way.
Report forwarded to bug-grep <at> gnu.org: bug#22103; Package grep. (Sun, 06 Dec 2015 23:02:02 GMT)
Acknowledgement sent to Norihiro Tanaka <noritnk <at> kcn.ne.jp>: New bug report received and forwarded. Copy sent to bug-grep <at> gnu.org. (Sun, 06 Dec 2015 23:02:02 GMT)
Message #5 received at submit <at> debbugs.gnu.org (full text, mbox):
Once grep -P has found its first match, the TEXTBIN_UNKNOWN optimization
is no longer used. As a result, grep -P is very slow in UTF-8 whenever a
match occurs early in the input.
$ time -p grep -P ^1$ <(seq 999999)
1
real 14.55
user 13.77
sys 1.12
Likewise, grep -Pa does not use the TEXTBIN_UNKNOWN optimization, so it
is also very slow in UTF-8.
$ time -p grep -Pa a <(seq 999999)
real 14.53
user 13.65
sys 1.35
This change defers abandoning the TEXTBIN_UNKNOWN optimization until
grep -P finds a binary character.
It yields a speedup of more than 10x:
$ time -p src/grep -P ^1$ <(seq 999999)
1
real 0.97
user 0.79
sys 0.24
$ time -p src/grep -Pa a <(seq 999999)
real 0.98
user 0.23
sys 0.99
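As a quick check of the "more than 10x" figure, the speedup can be computed from the "real" times reported above. This is only a minimal sketch: the `speedup` helper is a hypothetical name introduced here for illustration, and it assumes an `awk` is available on the system.

```shell
# speedup BEFORE AFTER: print BEFORE / AFTER with one decimal place.
# "speedup" is a hypothetical helper, not part of grep or the patch.
speedup() {
  awk -v before="$1" -v after="$2" 'BEGIN { printf "%.1f\n", before / after }'
}

speedup 14.55 0.97   # grep -P ^1$  -> prints 15.0
speedup 14.53 0.98   # grep -Pa a   -> prints 14.8
```

Both cases come out at roughly 15x on the reported wall-clock times, which is consistent with the claim above.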
BTW, this change conflicts with the proposal in bug#22028.
[0001-grep-improve-performance-for-grep-P-in-UTF-8.patch (text/plain, attachment)]
Reply sent to Norihiro Tanaka <noritnk <at> kcn.ne.jp>: You have taken responsibility. (Fri, 08 Jan 2016 13:47:02 GMT)
Notification sent to Norihiro Tanaka <noritnk <at> kcn.ne.jp>: bug acknowledged by developer. (Fri, 08 Jan 2016 13:47:02 GMT)
Message #10 received at 22103-done <at> debbugs.gnu.org (full text, mbox):
On Wed, 6 Jan 2016 09:57:46 -0800
Paul Eggert <eggert <at> cs.ucla.edu> wrote:
> On 01/06/2016 12:32 AM, Paul Eggert wrote:
> > I installed the attached patch, which fixed this performance bug for me.
> Whoops! I forgot to 'git add src/search.h' before committing. We also need the attached followup patch, which I installed.
Great, thanks! Your patches fix many issues, including the output of
invalid sequences. They also fix bug#22103, so I am closing it.
Information forwarded to bug-grep <at> gnu.org: bug#22103; Package grep. (Fri, 08 Jan 2016 21:36:02 GMT)
Message #13 received at 22103 <at> debbugs.gnu.org (full text, mbox):
On Fri, Jan 8, 2016 at 5:46 AM, Norihiro Tanaka <noritnk <at> kcn.ne.jp> wrote:
>
> On Wed, 6 Jan 2016 09:57:46 -0800
> Paul Eggert <eggert <at> cs.ucla.edu> wrote:
>
>> On 01/06/2016 12:32 AM, Paul Eggert wrote:
>> > I installed the attached patch, which fixed this performance bug for me.
>> Whoops! I forgot to 'git add src/search.h' before committing. We also need the attached followup patch, which I installed.
>
> Great, thanks! Your patches fix many issues, including the output of
> invalid sequences. They also fix bug#22103, so I am closing it.
Thank you for helping with bug triage.
Information forwarded to bug-grep <at> gnu.org: bug#22103; Package grep. (Fri, 08 Jan 2016 21:36:02 GMT)
bug archived. Request was from Debbugs Internal Request <help-debbugs <at> gnu.org> to internal_control <at> debbugs.gnu.org. (Sat, 06 Feb 2016 12:24:03 GMT)
This bug report was last modified 9 years and 132 days ago.
GNU bug tracking system
Copyright (C) 1999 Darren O. Benham,
1997,2003 nCipher Corporation Ltd,
1994-97 Ian Jackson.