GNU bug report logs -
#38690
grep -P quadratic on long lines
Previous Next
Full log
View this message in rfc822 format
Dear grep maintainers,
I've realized that grep -P takes quadratic time on long lines with
many short matches. For instance, executing './src/grep -P -o "a"
a.txt > a.out' on a file 'a.txt' consisting of N characters 'a' takes
time quadratic in N. I've used grep-3.3 and pcre-8.43 for the
benchmarks.
The root causes for this behavior are as follows:
1. in src/pcresearch.c on l. 222 (at commit
cf09252295c554dd3eba4cdb8eb53911fb495f40), the end of the line is
searched each time a new match is searched; this already results in
quadratic runtime in the above mentioned case
2. the function 'pcre_exec' from pcre-8.43 called in src/pcresearch.c
on l. 71 for each match checks if the provided string is valid UTF-8
(code implemented in pcre_valid_utf8.c); this also results in
quadratic runtime
On your side, it is possible to fix the first root cause. I'll post an
e-mail to PCRE mailing list about the second root cause.
Best regards,
Martin
This bug report was last modified 5 years and 178 days ago.
Previous Next
GNU bug tracking system
Copyright (C) 1999 Darren O. Benham,
1997,2003 nCipher Corporation Ltd,
1994-97 Ian Jackson.