GNU bug report logs - #18454
Improve performance when -P (PCRE) is used in UTF-8 locales


Package: grep

Reported by: Vincent Lefevre <vincent <at> vinc17.net>

Date: Fri, 12 Sep 2014 01:26:02 UTC

Severity: normal

Done: Paul Eggert <eggert <at> cs.ucla.edu>

Bug is archived. No further changes may be made.


Message #25 received at 18454 <at> debbugs.gnu.org (full text, mbox):

From: Paul Eggert <eggert <at> cs.ucla.edu>
To: 18454 <at> debbugs.gnu.org, Vincent Lefevre <vincent <at> vinc17.net>
Subject: Re: Improve performance when -P (PCRE) is used in UTF-8 locales
Date: Tue, 16 Sep 2014 18:43:26 -0700
[Message part 1 (text/plain, inline)]
I worked on this some more, and came up with the attached patches 
proposed against the current grep Savannah master (commit 
9ea9254ea58456b84ed2f0c1481ca91cdd325bf7).

For years I've been wanting to write that last patch and I finally got 
around to it.  It improves grep -P's performance by a factor of 1.2 
trillion on one (admittedly artificial) benchmark.  I hope its 1 ZB/s 
scan rate is some kind of record.  The last patch probably won't help 
your test cases, though I hope the other patches do help somewhat.
[0001-grep-refactor-binary-vs-unknown-vs-text-flags-for-cl.patch (text/plain, attachment)]
[0002-grep-z-no-longer-considers-200-to-be-binary-data.patch (text/plain, attachment)]
[0003-grep-non-text-bytes-in-binary-data-may-be-treated-as.patch (text/plain, attachment)]
[0004-grep-minor-P-speedup-with-jit_stack.patch (text/plain, attachment)]
[0005-grep-improve-P-performance-in-typical-cases.patch (text/plain, attachment)]
[0006-grep-skip-past-holes-efficiently.patch (text/plain, attachment)]

This bug report was last modified 3 years and 181 days ago.


GNU bug tracking system
Copyright (C) 1999 Darren O. Benham, 1997,2003 nCipher Corporation Ltd, 1994-97 Ian Jackson.