GNU bug report logs -
#16966
[PATCH] grep: optimization with the superset of DFA
Previous Next
Reported by: Norihiro Tanaka <noritnk <at> kcn.ne.jp>
Date: Sat, 8 Mar 2014 05:43:01 UTC
Severity: normal
Tags: patch
Done: Paul Eggert <eggert <at> cs.ucla.edu>
Bug is archived. No further changes may be made.
Full log
View this message in rfc822 format
Paolo Bonzini wrote:
> Yeah, but my problem is that a.b will look at a very long line if it
> is translated to a[\x0-\xff]*b. Better translate it to a[\x0-\xff]{1,2}b
> or something similar.
I seem that it's no problem.
For example, I try following text for the pattern `a.b'. Whereas the
digit isn't a part of the text but the line number.
1 accccccccccccccccccccccccccccccccccccccc
2 cccccccccccccccccccccccccccccccccccccccc
3 cccccccccccccccccccccccccccccccccccccccc
4 cccccccccccccccccccccccccccccccccccccccc
5 cccccccccccccccccccccccccccccccccccccccb
a --> a + CSET --> a + CSET --> ...... --> a + CSET + b (match)
Then all lines are matched fast. However, because matches multiple
lines, retry from the last line (line 5).
It doesn't matches the pattern `a.b'. Therefore the text is rejected.
Next, I try following text for it.
1 accccccccccccccccccccccccccccccccccccccc
2 cccccccccccccccccccccccccccccccccccccccc
3 cccccccccccccccccccccccccccccccccccccccc
4 cccccccccccccccccccccccccccccccccccccccc
5 accccccccccccccccccccccccccccccccccccccb
Then all lines are matched fast. However, because matches multiple
lines, retry from the last line (line 5). It's accepted by the
superset. However, It's rejected by normal DFA.
On the other hands, It can be constituted just three DFA states. It's
too simple.
/\
/ \
\ /
\/
1:a ---> 2:CSET ---> 3:b
Thanks,
Norihiro
This bug report was last modified 11 years and 107 days ago.
Previous Next
GNU bug tracking system
Copyright (C) 1999 Darren O. Benham,
1997,2003 nCipher Corporation Ltd,
1994-97 Ian Jackson.