GNU bug report logs - #22103
[PATCH] grep: improve performance for grep -P in UTF-8

Package: grep

Reported by: Norihiro Tanaka <noritnk <at> kcn.ne.jp>

Date: Sun, 6 Dec 2015 23:02:01 UTC

Severity: normal

Tags: patch

Done: Norihiro Tanaka <noritnk <at> kcn.ne.jp>

Bug is archived. No further changes may be made.

To add a comment to this bug, you must first unarchive it, by sending
a message to control AT debbugs.gnu.org, with unarchive 22103 in the body.
You can then email your comments to 22103 AT debbugs.gnu.org in the normal way.

Report forwarded to bug-grep <at> gnu.org:
bug#22103; Package grep. (Sun, 06 Dec 2015 23:02:02 GMT)

Acknowledgement sent to Norihiro Tanaka <noritnk <at> kcn.ne.jp>:
New bug report received and forwarded. Copy sent to bug-grep <at> gnu.org. (Sun, 06 Dec 2015 23:02:02 GMT)

Message #5 received at submit <at> debbugs.gnu.org:

From: Norihiro Tanaka <noritnk <at> kcn.ne.jp>
To: <bug-grep <at> gnu.org>
Subject: [PATCH] grep: improve performance for grep -P in UTF-8
Date: Mon, 07 Dec 2015 08:01:23 +0900

After grep -P finds its first match, the TEXTBIN_UNKNOWN optimization is
no longer used.  Therefore, if grep -P finds a match early, it is very
slow in UTF-8.

  $ time -p grep -P ^1$ <(seq 999999)
  1
  real 14.55
  user 13.77
  sys 1.12

Likewise, grep -Pa does not use the TEXTBIN_UNKNOWN optimization.
Therefore, it is also very slow in UTF-8.

  $ time -p grep -Pa a <(seq 999999)
  real 14.53
  user 13.65
  sys 1.35

This change defers abandoning the TEXTBIN_UNKNOWN optimization until
grep -P finds a binary character.
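
To make the idea concrete, here is a toy model in C; it is not the
attached patch.  Only the name TEXTBIN_UNKNOWN comes from this report.
TEXTBIN_TEXT, TEXTBIN_BINARY, the numeric values, and the helpers
has_binary_byte and fast_path_lines are hypothetical names chosen for
illustration, and "binary character" is approximated by bytes that can
never occur in well-formed UTF-8.  The model counts how many lines are
scanned while the state is still unknown, first when a match ends that
state and then when only a binary byte does.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical three-state model; only TEXTBIN_UNKNOWN is named in the
   report, the other names and all values are assumptions. */
enum textbin { TEXTBIN_BINARY = -1, TEXTBIN_UNKNOWN = 0, TEXTBIN_TEXT = 1 };

/* Simplified stand-in for "binary character": the bytes 0xC0, 0xC1 and
   0xF5..0xFF never appear in well-formed UTF-8.  A real check would
   validate complete multibyte sequences. */
static bool has_binary_byte(const char *line)
{
  for (const unsigned char *p = (const unsigned char *) line; *p; p++)
    if (*p == 0xC0 || *p == 0xC1 || *p >= 0xF5)
      return true;
  return false;
}

/* Count the lines scanned while the state is still TEXTBIN_UNKNOWN,
   i.e. while the fast path would apply.
   deferred == false: a match also ends the unknown state (the problem).
   deferred == true:  only a binary byte ends it (the proposed change).
   A "match" here is plain string equality with needle. */
size_t fast_path_lines(const char *const *lines, size_t n,
                       const char *needle, bool deferred)
{
  enum textbin state = TEXTBIN_UNKNOWN;
  size_t fast = 0;

  for (size_t i = 0; i < n; i++)
    {
      if (state != TEXTBIN_UNKNOWN)
        continue;               /* slow path, not counted */
      fast++;
      if (has_binary_byte(lines[i]))
        state = TEXTBIN_BINARY;
      else if (!deferred && strcmp(lines[i], needle) == 0)
        state = TEXTBIN_TEXT;
    }
  return fast;
}
```

For the four clean lines "1", "2", "3", "4" and the needle "1", the
eager variant keeps only the first line on the fast path, while the
deferred variant keeps all four.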

It yields a speedup of more than 10x.

  $ time -p src/grep -P ^1$ <(seq 999999)
  1
  real 0.97
  user 0.79
  sys 0.24

  $ time -p src/grep -Pa a <(seq 999999)
  real 0.98
  user 0.23
  sys 0.99
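
The speedup can be checked from the "real" times above.  The helper
below is a hypothetical one-liner for that arithmetic, not part of grep.

```c
/* Ratio of wall-clock time before the change to the time after it.
   Both arguments are in seconds; after must be positive. */
double speedup(double before, double after)
{
  return before / after;
}
```

With the figures in this report, speedup(14.55, 0.97) is about 15.0 and
speedup(14.53, 0.98) is about 14.8, consistent with the "more than 10x"
claim.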

BTW, this change conflicts with the proposal in bug#22028.
[0001-grep-improve-performance-for-grep-P-in-UTF-8.patch (text/plain, attachment)]

Reply sent to Norihiro Tanaka <noritnk <at> kcn.ne.jp>:
You have taken responsibility. (Fri, 08 Jan 2016 13:47:02 GMT)

Notification sent to Norihiro Tanaka <noritnk <at> kcn.ne.jp>:
bug acknowledged by developer. (Fri, 08 Jan 2016 13:47:02 GMT)

Message #10 received at 22103-done <at> debbugs.gnu.org:

From: Norihiro Tanaka <noritnk <at> kcn.ne.jp>
To: 22103-done <at> debbugs.gnu.org
Subject: Re: bug#20526: grep BUG: text file is detected as binary
Date: Fri, 08 Jan 2016 22:46:33 +0900
On Wed, 6 Jan 2016 09:57:46 -0800
Paul Eggert <eggert <at> cs.ucla.edu> wrote:

> On 01/06/2016 12:32 AM, Paul Eggert wrote:
> > I installed the attached patch, which fixed this performance bug for me. 
> Whoops! I forgot to 'git add src/search.h' before committing. We also need the attached followup patch, which I installed.

Great!  Thanks.  Your patches fix many issues, including the output of
invalid sequences.  They also fix bug#22103, so I am closing it.

Information forwarded to bug-grep <at> gnu.org:
bug#22103; Package grep. (Fri, 08 Jan 2016 21:36:02 GMT)

Message #13 received at 22103 <at> debbugs.gnu.org:

From: Jim Meyering <jim <at> meyering.net>
To: 22103 <at> debbugs.gnu.org, Norihiro Tanaka <noritnk <at> kcn.ne.jp>
Cc: 22103-done <at> debbugs.gnu.org
Subject: Re: bug#22103: bug#20526: grep BUG: text file is detected as binary
Date: Fri, 8 Jan 2016 13:35:12 -0800
On Fri, Jan 8, 2016 at 5:46 AM, Norihiro Tanaka <noritnk <at> kcn.ne.jp> wrote:
>
> On Wed, 6 Jan 2016 09:57:46 -0800
> Paul Eggert <eggert <at> cs.ucla.edu> wrote:
>
>> On 01/06/2016 12:32 AM, Paul Eggert wrote:
>> > I installed the attached patch, which fixed this performance bug for me.
>> Whoops! I forgot to 'git add src/search.h' before committing. We also need the attached followup patch, which I installed.
>
> Great!  Thanks.  Your patches fix many issues, including the output of
> invalid sequences.  They also fix bug#22103, so I am closing it.

Thank you for helping with bug triage.




bug archived. Request was from Debbugs Internal Request <help-debbugs <at> gnu.org> to internal_control <at> debbugs.gnu.org. (Sat, 06 Feb 2016 12:24:03 GMT)

GNU bug tracking system
Copyright (C) 1999 Darren O. Benham, 1997,2003 nCipher Corporation Ltd, 1994-97 Ian Jackson.