GNU bug report logs - #38690
grep -P quadratic on long lines

Previous Next

Package: grep;

Reported by: Martin Raszyk <m.raszyk <at> gmail.com>

Date: Fri, 20 Dec 2019 15:33:02 UTC

Severity: normal

Full log


View this message in rfc822 format

From: Martin Raszyk <m.raszyk <at> gmail.com>
To: 38690 <at> debbugs.gnu.org
Subject: bug#38690: grep -P quadratic on long lines
Date: Fri, 20 Dec 2019 10:54:14 +0100
Dear grep maintainers,

I've realized that grep -P takes quadratic time on long lines with
many short matches. For instance, executing './src/grep -P -o "a"
a.txt > a.out' on a file 'a.txt' consisting of N characters 'a' takes
time quadratic in N. I've used grep-3.3 and pcre-8.43 for the
benchmarks.

The root causes for this behavior are as follows:
1. in src/pcresearch.c on l. 222 (at commit
cf09252295c554dd3eba4cdb8eb53911fb495f40), the end of the line is
searched each time a new match is searched; this already results in
quadratic runtime in the above mentioned case
2. the function 'pcre_exec' from pcre-8.43 called in src/pcresearch.c
on l. 71 for each match checks if the provided string is valid UTF-8
(code implemented in pcre_valid_utf8.c); this also results in
quadratic runtime

On your side, it is possible to fix the first root cause. I'll post an
e-mail to PCRE mailing list about the second root cause.

Best regards,
Martin




This bug report was last modified 5 years and 178 days ago.

Previous Next


GNU bug tracking system
Copyright (C) 1999 Darren O. Benham, 1997,2003 nCipher Corporation Ltd, 1994-97 Ian Jackson.