GNU bug report logs - #18454
Improve performance when -P (PCRE) is used in UTF-8 locales

Package: grep;

Reported by: Vincent Lefevre <vincent <at> vinc17.net>

Date: Fri, 12 Sep 2014 01:26:02 UTC

Severity: normal

Done: Paul Eggert <eggert <at> cs.ucla.edu>

Bug is archived. No further changes may be made.

From: Jim Meyering <jim <at> meyering.net>
To: Santiago Ruano Rincón <santiago <at> debian.org>
Cc: Paul Eggert <eggert <at> cs.ucla.edu>, 18454 <at> debbugs.gnu.org
Subject: bug#18454: Improve performance when -P (PCRE) is used in UTF-8 locales
Date: Fri, 19 Sep 2014 09:06:47 -0700
On Thu, Sep 18, 2014 at 12:36 PM, Jim Meyering <jim <at> meyering.net> wrote:
> It looks like most of the difference is the result of
> commit cd36abd46c5e0768606979ea75a51732062f5624,
> "grep: treat a file as binary if its prefix contains encoding errors",

Hi Paul,

I found that the above commit induces a large performance hit.
Over 50x in this example:

  seq 99999999 > k
  LC_ALL=C diff -u \
    <(PATH=.bin/2.20-31:$PATH env time -f %e grep asdf k 2>&1) \
    <(PATH=.bin/2.20-32:$PATH env time -f %e grep asdf k 2>&1)
  ...
  -0.21
  +11.47

The problem is that the new function is processing all of
the input, not just a prefix.
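The fix implied here is to bound the encoding check by a prefix length instead of scanning the whole input. Below is a minimal C sketch of that idea. It is not grep's actual code. The helper names `prefix_has_encoding_errors` and `utf8_seq_len` are hypothetical, and the validator is a hand-written strict UTF-8 checker standing in for grep's own locale-dependent multibyte handling. It assumes `buf` holds all `size` bytes of data. The key property is that the work done is bounded by `limit` (plus at most three bytes to finish a character straddling the boundary), however large `size` is.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper: return the length of the valid UTF-8 sequence
   starting at P when N bytes are available, or 0 if the bytes there
   are invalid (bad lead byte, bad continuation, overlong form,
   surrogate, code point above U+10FFFF, or truncation at end of data). */
static size_t
utf8_seq_len (const unsigned char *p, size_t n)
{
  unsigned char c = p[0];
  unsigned char lo = 0x80, hi = 0xBF;   /* allowed range of 2nd byte */
  size_t len;

  if (c < 0x80)
    return 1;
  else if (0xC2 <= c && c <= 0xDF)
    len = 2;
  else if (0xE0 <= c && c <= 0xEF)
    {
      len = 3;
      if (c == 0xE0)
        lo = 0xA0;                      /* reject overlong forms */
      else if (c == 0xED)
        hi = 0x9F;                      /* reject UTF-16 surrogates */
    }
  else if (0xF0 <= c && c <= 0xF4)
    {
      len = 4;
      if (c == 0xF0)
        lo = 0x90;                      /* reject overlong forms */
      else if (c == 0xF4)
        hi = 0x8F;                      /* reject code points > U+10FFFF */
    }
  else
    return 0;                           /* 0x80..0xC1 or 0xF5..0xFF */

  if (n < len)
    return 0;                           /* sequence cut off by end of data */
  if (p[1] < lo || hi < p[1])
    return 0;
  for (size_t i = 2; i < len; i++)
    if (p[i] < 0x80 || 0xBF < p[i])
      return 0;
  return len;
}

/* Hypothetical helper: return true if the first LIMIT bytes of BUF
   (which is SIZE bytes long) contain an encoding error.  Bytes after
   the prefix are never examined, except to finish a character that
   starts inside the prefix and straddles its end.  */
static bool
prefix_has_encoding_errors (const char *buf, size_t size, size_t limit)
{
  const unsigned char *p = (const unsigned char *) buf;
  size_t end = size < limit ? size : limit;
  size_t i = 0;

  while (i < end)
    {
      size_t len = utf8_seq_len (p + i, size - i);
      if (len == 0)
        return true;
      i += len;
    }
  return false;
}
```

Calling `prefix_has_encoding_errors (buf, size, size)` reproduces the whole-input scan that makes the cost grow with file size; passing a fixed `limit` (the value is an assumption, e.g. a few tens of kilobytes) keeps the check cheap even on a file like the `seq 99999999` output above. Reading a straddling character whole avoids falsely reporting an error just because the artificial prefix boundary split a multibyte sequence.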
