GNU bug report logs - #19306
[PATCH 1/2] dfa: avoid execution for a pattern including an unsupported expression

Previous Next

Package: grep;

Reported by: Norihiro Tanaka <noritnk <at> kcn.ne.jp>

Date: Mon, 8 Dec 2014 15:26:01 UTC

Severity: normal

Tags: patch

Done: Jim Meyering <jim <at> meyering.net>

Bug is archived. No further changes may be made.

Full log


View this message in rfc822 format

From: Jim Meyering <jim <at> meyering.net>
To: Norihiro Tanaka <noritnk <at> kcn.ne.jp>
Cc: 19306 <at> debbugs.gnu.org
Subject: bug#19306: [PATCH 1/2] dfa: avoid execution for a pattern including an unsupported expression
Date: Sun, 19 Jul 2015 20:14:52 -0700
On Sun, Jul 19, 2015 at 12:42 AM, Norihiro Tanaka <noritnk <at> kcn.ne.jp> wrote:
> On Sat, 18 Jul 2015 22:15:33 -0700
> Jim Meyering <jim <at> meyering.net> wrote:
>
>> Hello,
>> Thank you for the patches in this report:
>>
>>   http://bugs.gnu.org/19306
>>
>> Please excuse my delay in getting back to you on this.
>> Would you revise each of those to include a test case
>> that demonstrates the problem/fix?
>
> Thanks for your reviewing of this report.
>
> This is not bug fix.  It avoids that BACKREF is found in the process of
> DFAEXEC and passed to regex in multibyte locale.  In other words, if a
> pattern includes BACKREF, grep does not try to use DFA from the
> beginning.
>
> I confirmed about 10% speed-up for a test case in attachment.
>
>   Before patching: real 7.29  user 7.26  sys 0.02
>   After patching : real 6.57  user 6.55  sys 0.01
>
> KWset and DFA superset succeeds for all rows in the test case, and DFA
> for multibyte succeeds, too.  However, all rows are rejected in regex.
>
> After patching, grep does not try DFA for multibyte, as pattern includes
> BACKREF.
>
> In addtion, I believe that DFA is simplified by removal of handling for
> BACKREF from dfaanalyze(), dfassbuild() and dfaexec().

Thank you for the additional information and the test script.
I like most of this patch, but not the fact that it causes the
word-delim-multibyte test to fail. I do see that also applying your
following patch makes that test pass once again.  However, it does so
at the cost of forcing a new class of regexps (any that contain a use
of \b, \<  or \>) from DFA into the slower regex matcher.

That feels like too large a performance penalty, in general.

Can you quantify it?




This bug report was last modified 9 years and 363 days ago.

Previous Next


GNU bug tracking system
Copyright (C) 1999 Darren O. Benham, 1997,2003 nCipher Corporation Ltd, 1994-97 Ian Jackson.