GNU bug report logs - #18266
grep -P and invalid exits with error


Package: grep;

Reported by: Santiago <santiago <at> debian.org>

Date: Thu, 14 Aug 2014 15:43:02 UTC

Severity: wishlist

Merged with 18455

Done: Paul Eggert <eggert <at> cs.ucla.edu>

Bug is archived. No further changes may be made.


From: Vincent Lefevre <vincent <at> vinc17.net>
To: Paul Eggert <eggert <at> cs.ucla.edu>
Cc: 18266 <at> debbugs.gnu.org, Santiago <santiago <at> debian.org>, 758105 <at> bugs.debian.org, 761157 <at> bugs.debian.org
Subject: bug#18266: Bug#758105: bug#18266: grep -P and invalid exits with error
Date: Fri, 12 Sep 2014 03:42:47 +0200
On 2014-09-11 10:07:49 -0700, Paul Eggert wrote:
> Vincent Lefevre wrote:
> >I've just reported a new Debian bug concerning the performance problem.
> 
> It's not clear from http://bugs.debian.org/761157 that the performance
> problem occurs only with -P, but I assume that's what is meant.

It's specific to -P:

2.18-2   0.9s with -P, 0.4s without -P
2.20-3  11.6s with -P, 0.4s without -P

> Since this is a performance bug with PCRE, I suggest moving the Debian bug
> report to the Debian libpcre3 package.  Grep cannot go back to the old way,
> which could cause grep to crash, and the bug cannot be fixed in grep because
> libpcre3 does not provide a fast way to search arbitrary data that may
> include encoding errors.  It really is a problem that requires changes to
> libpcre3 to fix; grep cannot fix it.

Fixing the performance problem in libpcre3 would indeed be better
(even with the old version of grep, matching with -P was about twice
as slow as without it, but this is less critical than the current
13x slowdown).

However, a workaround in grep could be simpler. I've just opened a
new bug and suggested several solutions:

  http://debbugs.gnu.org/cgi/bugreport.cgi?bug=18454

> In the meantime, in order to use 'grep' to search for strings in arbitrary
> data, I suggest omitting the '-P'.

This is a bit annoying because I sometimes use specific PCRE features.
I could try to parse the arguments, detect where the pattern is used,
and avoid -P if the pattern doesn't use specific PCRE features (at
least for the most common forms). An additional advantage is that it
could be twice as fast in most cases (see above). This could also be
done in grep, as I suggested in my new bug report.
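
A minimal sketch of that idea, assuming a deliberately conservative
whitelist: the pattern is passed to "grep -E" only if it consists of
plain characters and the few operators that behave the same with -E
and -P; anything else keeps -P. The names needs_pcre and grep_flag are
hypothetical, the check does not validate the pattern (for example,
unbalanced parentheses are not detected), and it is only a heuristic
for the most common forms, not something grep itself does.

```python
import re

# Conservative whitelist: literal characters plus the operators that
# behave the same way under "grep -E" and "grep -P".
# Anything outside this subset (backslash escapes, brackets, braces,
# "(?...)" groups, lazy or possessive quantifiers) keeps -P.
_SAFE_PATTERN = re.compile(
    r"""
    (?: [\w\x20.^$|)]      # word characters, space, simple metacharacters
      | \((?!\?)           # an opening parenthesis not followed by a question mark
      | [*+?](?![*+?])     # one quantifier, not followed by another one
    )*
    """,
    re.VERBOSE,
)


def needs_pcre(pattern: str) -> bool:
    """Return True unless the pattern stays inside the safe subset."""
    return _SAFE_PATTERN.fullmatch(pattern) is None


def grep_flag(pattern: str) -> str:
    """Pick the grep option to use for this pattern."""
    return "-P" if needs_pcre(pattern) else "-E"


if __name__ == "__main__":
    for p in [r"foo.*bar", r"\d+", r"(?<=a)b", r"a+?", r"(a|b)+c"]:
        print(grep_flag(p), p)
```

Erring on the side of -P means a misclassified pattern only costs
time, never correctness.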

> Also, I suggest using the C locale.

This could be a solution: in practice, I pipe the result to
"less -FRX", and only grep has to use the C locale, so the accented
characters would still be displayed correctly by "less". However, with
some (rare?) patterns, it won't work because an accented character
would no longer be seen as a single character.
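
The last point can be illustrated with a small sketch. It uses
Python's re module as a stand-in for grep (an assumption made only for
illustration): matching on a str plays the role of a UTF-8 locale,
and matching on the encoded bytes plays the role of the C locale,
where each byte counts as one character.

```python
import re

line = "café"
utf8_bytes = line.encode("utf-8")  # 'é' is encoded as two bytes

# Character-wise matching (like a UTF-8 locale): 'é' is one character,
# so the single '.' before the end of line can match it.
char_match = re.search(r"caf.$", line)

# Byte-wise matching (like the C locale): 'é' is two bytes,
# so a single '.' cannot cover it and the pattern fails.
byte_match = re.search(rb"caf.$", utf8_bytes)

if __name__ == "__main__":
    print("characters:", len(line), "bytes:", len(utf8_bytes))
    print("character-wise match:", char_match is not None)
    print("byte-wise match:", byte_match is not None)
```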

-- 
Vincent Lefèvre <vincent <at> vinc17.net> - Web: <https://www.vinc17.net/>
100% accessible validated (X)HTML - Blog: <https://www.vinc17.net/blog/>
Work: CR INRIA - computer arithmetic / AriC project (LIP, ENS-Lyon)




This bug report was last modified 10 years and 249 days ago.



GNU bug tracking system
Copyright (C) 1999 Darren O. Benham, 1997,2003 nCipher Corporation Ltd, 1994-97 Ian Jackson.