GNU bug report logs - #16966
[PATCH] grep: optimization with the superset of DFA

Previous Next

Package: grep;

Reported by: Norihiro Tanaka <noritnk <at> kcn.ne.jp>

Date: Sat, 8 Mar 2014 05:43:01 UTC

Severity: normal

Tags: patch

Done: Paul Eggert <eggert <at> cs.ucla.edu>

Bug is archived. No further changes may be made.

Full log


View this message in rfc822 format

From: Norihiro Tanaka <noritnk <at> kcn.ne.jp>
To: Paolo Bonzini <bonzini <at> gnu.org>
Cc: 16966 <at> debbugs.gnu.org
Subject: bug#16966: [PATCH] grep: optimization with the superset of DFA
Date: Wed, 02 Apr 2014 07:21:02 +0900
Paolo Bonzini wrote:
> Yeah, but my problem is that a.b will look at a very long line if it
> is translated to a[\x0-\xff]*b.  Better translate it to a[\x0-\xff]{1,2}b
> or something similar.

I seem that it's no problem.

For example, I try following text for the pattern `a.b'.  Whereas the
digit isn't a part of the text but the line number.

1 accccccccccccccccccccccccccccccccccccccc
2 cccccccccccccccccccccccccccccccccccccccc
3 cccccccccccccccccccccccccccccccccccccccc
4 cccccccccccccccccccccccccccccccccccccccc
5 cccccccccccccccccccccccccccccccccccccccb

a --> a + CSET --> a + CSET --> ...... --> a + CSET + b (match)

Then all lines are matched fast.  However, because matches multiple
lines, retry from the last line (line 5).

It doesn't matches the pattern `a.b'.  Therefore the text is rejected.

Next, I try following text for it.

1 accccccccccccccccccccccccccccccccccccccc
2 cccccccccccccccccccccccccccccccccccccccc
3 cccccccccccccccccccccccccccccccccccccccc
4 cccccccccccccccccccccccccccccccccccccccc
5 accccccccccccccccccccccccccccccccccccccb

Then all lines are matched fast.  However, because matches multiple
lines, retry from the last line (line 5).  It's accepted by the
superset.  However, It's rejected by normal DFA.

On the other hands, It can be constituted just three DFA states.  It's
too simple.

             /\
            /  \
            \  /
             \/
  1:a ---> 2:CSET ---> 3:b

Thanks,
Norihiro





This bug report was last modified 11 years and 107 days ago.

Previous Next


GNU bug tracking system
Copyright (C) 1999 Darren O. Benham, 1997,2003 nCipher Corporation Ltd, 1994-97 Ian Jackson.