GNU bug report logs - #22103
[PATCH] grep: improve performance for grep -P in UTF-8

Package: grep

Reported by: Norihiro Tanaka <noritnk <at> kcn.ne.jp>

Date: Sun, 6 Dec 2015 23:02:01 UTC

Severity: normal

Tags: patch

Done: Norihiro Tanaka <noritnk <at> kcn.ne.jp>

Bug is archived. No further changes may be made.


Message #5 received at submit <at> debbugs.gnu.org (full text, mbox):

From: Norihiro Tanaka <noritnk <at> kcn.ne.jp>
To: <bug-grep <at> gnu.org>
Subject: [PATCH] grep: improve performance for grep -P in UTF-8
Date: Mon, 07 Dec 2015 08:01:23 +0900
After grep -P finds its first match, the TEXTBIN_UNKNOWN optimizations are
no longer used.  Therefore, if grep -P finds a match early, it is very
slow in UTF-8.

  $ time -p grep -P ^1$ <(seq 999999)
  1
  real 14.55
  user 13.77
  sys 1.12

Also, grep -Pa does not use the TEXTBIN_UNKNOWN optimizations.  Therefore,
it is also very slow in UTF-8.

  $ time -p grep -Pa a <(seq 999999)
  real 14.53
  user 13.65
  sys 1.35

This change defers leaving the TEXTBIN_UNKNOWN optimizations until
grep -P finds a binary character.

It brings more than a 10x speedup.

  $ time -p src/grep -P ^1$ <(seq 999999)
  1
  real 0.97
  user 0.79
  sys 0.24

  $ time -p src/grep -Pa a <(seq 999999)
  real 0.98
  user 0.23
  sys 0.99
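
The before/after timings above can be reproduced side by side with a
small helper script.  This is only a sketch: OLD_GREP, NEW_GREP,
BENCH_LOCALE, run_case and bench are hypothetical names chosen for
illustration, the UTF-8 locale name is an assumption about the test
machine, and the input is fed through a pipe rather than the process
substitution used in the report.

```shell
#!/usr/bin/env bash
# Hypothetical helper, not part of the patch: time two grep binaries on
# the two cases from this report.  All names below are illustrative.
OLD_GREP=${OLD_GREP:-grep}                 # assumed: unpatched grep on PATH
NEW_GREP=${NEW_GREP:-src/grep}             # assumed: patched build in the source tree
BENCH_LOCALE=${BENCH_LOCALE:-en_US.UTF-8}  # assumed: any installed UTF-8 locale

# Run one grep binary with the given options on the output of `seq 999999`.
run_case() {
  grep_bin=$1
  shift
  seq 999999 | env LC_ALL="$BENCH_LOCALE" "$grep_bin" "$@"
}

# Time one case; matches are discarded so only the `time -p` lines remain.
bench() {
  echo "== $*"
  { time -p run_case "$@" >/dev/null; } 2>&1
}

# Only benchmark when asked to, so the functions can also be sourced.
if [ "${1:-}" = "--run" ]; then
  for g in "$OLD_GREP" "$NEW_GREP"; do
    bench "$g" -P '^1$'
    bench "$g" -Pa a
  done
fi
```

Saved under any file name and run with --run as its only argument, the
script prints one real/user/sys block per binary and case, which can be
compared directly with the numbers quoted in this message.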

BTW, this change conflicts with the proposal in bug#22028.
[0001-grep-improve-performance-for-grep-P-in-UTF-8.patch (text/plain, attachment)]

This bug report was last modified 9 years and 133 days ago.
