GNU bug report logs - #18454
Improve performance when -P (PCRE) is used in UTF-8 locales


Package: grep

Reported by: Vincent Lefevre <vincent <at> vinc17.net>

Date: Fri, 12 Sep 2014 01:26:02 UTC

Severity: normal

Done: Paul Eggert <eggert <at> cs.ucla.edu>

Bug is archived. No further changes may be made.



Message #131 received at 18454 <at> debbugs.gnu.org (full text, mbox):

From: Paul Eggert <eggert <at> cs.ucla.edu>
To: Zoltán Herczeg <hzmester <at> freemail.hu>
Cc: 18454 <at> debbugs.gnu.org
Subject: Re: bug#18454: Improve performance when -P (PCRE) is used in UTF-8 locales
Date: Tue, 30 Sep 2014 12:39:17 -0700

On 09/30/2014 11:10 AM, Zoltán Herczeg wrote:
>
>> Grep already does that sort of thing.  And it's smart enough to start matching
>> only at character boundaries.  It's not libpcre's job to worry about this; the
>> caller can worry about it.
> Thank you for bringing this up. I don't see any point of reimplementing what is already there.

Sorry, it sounds like my earlier comment was unclear.  GNU grep is smart 
enough to start matching at character boundaries without checking the 
validity of the input data.  This helps it run faster.  However, because 
libpcre requires a validity prepass, grep -P must slow down and do the 
validity check one way or another.  Grep does this only when libpcre is 
used, and that's one reason grep -P is slower than plain grep.
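
To illustrate the point about character boundaries, here is a minimal sketch in C. It is not taken from grep's source; the name next_char_boundary is hypothetical. It relies only on the fact that UTF-8 continuation bytes have the bit pattern 10xxxxxx, so a scanner can skip to the next boundary without deciding whether the surrounding data is valid.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper (an assumption for illustration, not grep's code):
   return the first position at or after P that is not a UTF-8
   continuation byte (10xxxxxx).  No validity check is performed, so
   malformed input is simply stepped over.  */
static const unsigned char *
next_char_boundary (const unsigned char *p, const unsigned char *end)
{
  assert (p <= end);
  while (p < end && (*p & 0xC0) == 0x80)
    p++;
  return p;
}
```

A matcher that lands in the middle of a multibyte sequence could call such a helper and resume from the returned pointer, at a cost proportional to the few bytes skipped rather than to the whole buffer.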

It's not a question of duplicating code: grep already has code to 
validate binary data.  It's a question of performance. Requiring a 
prepass for validity checking is typically slower (or takes more energy, 
or whatever) than checking validity on the fly.  And in many cases going 
multithreaded would just make matters worse.
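
For comparison, a validity prepass could look roughly like the sketch below. This is an illustration under stated assumptions, not libpcre's or grep's actual code; the name utf8_is_valid is hypothetical. The function reads every byte of the buffer before any matching starts, so a caller that validates and then matches traverses the data twice.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical prepass (an assumption for illustration): return true
   if BUF[0..LEN-1] is well-formed UTF-8.  Rejects stray continuation
   bytes, truncated sequences, overlong encodings, surrogates, and
   code points above U+10FFFF.  */
static bool
utf8_is_valid (const unsigned char *buf, size_t len)
{
  size_t i = 0;
  while (i < len)
    {
      unsigned char c = buf[i];
      size_t n;           /* continuation bytes expected */
      unsigned long min;  /* smallest code point for this length */
      unsigned long cp;   /* decoded code point */

      if (c < 0x80)
        {
          i++;            /* ASCII byte */
          continue;
        }
      else if ((c & 0xE0) == 0xC0)
        { n = 1; min = 0x80; cp = c & 0x1F; }
      else if ((c & 0xF0) == 0xE0)
        { n = 2; min = 0x800; cp = c & 0x0F; }
      else if ((c & 0xF8) == 0xF0)
        { n = 3; min = 0x10000; cp = c & 0x07; }
      else
        return false;     /* stray continuation or invalid lead byte */

      if (len - i <= n)
        return false;     /* sequence runs past the end of the buffer */

      for (size_t k = 1; k <= n; k++)
        {
          unsigned char t = buf[i + k];
          if ((t & 0xC0) != 0x80)
            return false; /* expected a continuation byte */
          cp = (cp << 6) | (t & 0x3F);
        }

      if (cp < min || cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF))
        return false;     /* overlong, out of range, or surrogate */

      i += n + 1;
    }
  return true;
}
```

Checking validity on the fly would instead fold these tests into the matcher's own byte loop, so each byte is loaded once; whether that is faster in practice depends on the matcher and the input.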

I can understand that you don't want to take on the burden of making a 
nontrivial libpcre performance improvement.  Also, I hope 'grep -P' 
performance, though not great, is good enough now to satisfy most 
users.  So perhaps we should just give the topic a rest.




This bug report was last modified 3 years and 181 days ago.


