GNU bug report logs -
#18454
Improve performance when -P (PCRE) is used in UTF-8 locales
Previous Next
Full log
View this message in rfc822 format
On Thu, Sep 18, 2014 at 1:33 AM, Santiago Ruano Rincón
<santiago <at> debian.org> wrote:
> El 17/09/14 a las 23:00, Paul Eggert escribió:
>> I've installed all the patches mentioned so far.
>>
>
> I've successfully build the latest commit
> (f6de00f6cec3831b8f334de7dbd1b59115627457), but I don't see any
> performance boost. Rather the opposite.
>
> Comparing with debian's grep 2.20-3, that includes your first patch to solve
> this -P issue, 0001-grep-P-invalid-utf8-non-matching.patch:
>
> grep -P asdf /usr/bin/* 12,42s user 0,12s system 99% cpu 12,545 total
> src/grep -P asdf /usr/bin/* 14,37s user 0,12s system 99% cpu 14,492 total
>
> Note that basic grep also slowdowns:
>
> grep asdf /usr/bin/* 0,22s user 0,16s system 99% cpu 0,382 total
> src/grep asdf /usr/bin/* 1,26s user 0,12s system 99% cpu 1,384 total
Thank you for running timing comparisons.
Once I verified that I had no large, sparse files in my grep working directory,
I ran the same test there (du -sh . reports 176M, du --app -sh . reports 139M)
The following shows a performance regression when searching files
like those in my grep working directory.
The new grep (v2.20-46-gf6de00f) takes 2.5x longer than 2.20.14.
This is with a hot cache (best of several runs) on a
Intel(R) Xeon(R) CPU E5-2660, compiled with gcc-5.x
$ diff -u <(env time grep -r asdf . 2>&1) <(PATH=src:$PATH env time
grep -r asdf . 2>&1)
--- /proc/self/fd/11 2014-09-18 12:07:43.169721947 -0700
+++ /proc/self/fd/12 2014-09-18 12:07:43.169721947 -0700
@@ -1,3 +1,3 @@
./src/grep.c: printf 'asdfqwerzxcv\rASDF\tZXCV\n'
-0.08user 0.10system 0:00.18elapsed 100%CPU (0avgtext+0avgdata
6256maxresident)k
-0inputs+0outputs (0major+670minor)pagefaults 0swaps
+0.40user 0.11system 0:00.51elapsed 99%CPU (0avgtext+0avgdata 5328maxresident)k
+0inputs+0outputs (0major+634minor)pagefaults 0swaps
It looks like most of the difference is the result of
commit cd36abd46c5e0768606979ea75a51732062f5624,
"grep: treat a file as binary if its prefix contains encoding errors",
with its new,
locale-sensitive "is_binary" test. I saw the above timing difference
even with LC_ALL=C, so one quick fix would be to skip the use of
mbrlen when possible.
This bug report was last modified 3 years and 181 days ago.
Previous Next
GNU bug tracking system
Copyright (C) 1999 Darren O. Benham,
1997,2003 nCipher Corporation Ltd,
1994-97 Ian Jackson.