GNU bug report logs - #22655
grep-2.21 (and git master): --null-data and ranges work in an odd way (-P works fine)

Previous Next

Package: grep;

Reported by: Sergei Trofimovich <slyfox <at> gentoo.org>

Date: Sat, 13 Feb 2016 23:24:01 UTC

Severity: normal

Done: Paul Eggert <eggert <at> cs.ucla.edu>

Bug is archived. No further changes may be made.

Full log


Message #58 received at 22655 <at> debbugs.gnu.org (full text, mbox):

From: Stephane Chazelas <stephane.chazelas <at> gmail.com>
To: Paul Eggert <eggert <at> cs.ucla.edu>
Cc: 22655 <at> debbugs.gnu.org
Subject: Re: bug#22655: grep -Pz '^' now fails!
Date: Fri, 18 Nov 2016 16:37:54 +0000
2016-11-18 16:24:02 +0000, Stephane Chazelas:
[...]
> Why would it make it slower. AFAICT, PCRE_MULTILINE *adds*
> some overhead. It is really meant to be only about changing the
> behaviour of ^ and $.
[...]

Unsurprisingly,

where ~/a contains the output of find / -print0 (which is
typically what grep -z is used on) 304MiB, 3696401 records,
none of which contain a newline character in this instance.
with grep 2.10 (Ubuntu 12.04 amd64) in a UTF-8 locale:

$ time grep -Pz '(?-m)^/' ~/a > /dev/null
grep -Pz '(?-m)^/' ~/a > /dev/null  0.84s user 0.15s system 99% cpu 0.990 total
$ time grep -Pz '^/' ~/a > /dev/null
grep -Pz '^/' ~/a > /dev/null  0.87s user 0.12s system 99% cpu 0.989 total

Not much difference as "/" is found at the beginning of each
record.

$ time grep -Pz '(?-m)^x' ~/a > /dev/null
grep -Pz '(?-m)^x' ~/a > /dev/null  0.41s user 0.06s system 99% cpu 0.473 total
$ time grep -Pz '^x' ~/a > /dev/null
grep -Pz '^x' ~/a > /dev/null  0.81s user 0.04s system 99% cpu 0.854 total

PCRE_MULTILINE significantly slows things down as even though
"x" is not found at the beginning of the subject, grep still
needs to look for extra newline characters in the record.

-- 
Stephane




This bug report was last modified 8 years and 190 days ago.

Previous Next


GNU bug tracking system
Copyright (C) 1999 Darren O. Benham, 1997,2003 nCipher Corporation Ltd, 1994-97 Ian Jackson.