GNU bug report logs -
#16919
[PATCH] fix mismatch between dfa and regex for treatment of titlecase
Reported by: Norihiro Tanaka <noritnk <at> kcn.ne.jp>
Date: Sun, 2 Mar 2014 00:34:01 UTC
Severity: normal
Tags: patch
Done: Paul Eggert <eggert <at> cs.ucla.edu>
Bug is archived. No further changes may be made.
Message #23 received at 16919 <at> debbugs.gnu.org (full text, mbox):
Hi Paul,
Thanks for the thorough investigation. I now understand that we cannot
generally tell whether the DFA's or the regex's behavior is right.
I have tested the behavior of several regex engines. Interestingly,
most of the results differ from one another, so there is no clear way
to tell which one is right.
--
GNU grep (DFA):
$ env LANG=en_US.utf8 ./test.sh "src/grep -i" 2>/dev/null | nl -ba
1 c7 87 | c7 89
2 c7 87 | c7 88 | c7 89
3 c7 87 | c7 89
4 49 | 69
5 49 | 69
6 69 | c4 b0
7 49 | c4 b1
GNU grep (regex):
$ env LANG=en_US.utf8 ./test.sh "src/grep -i" '\(\)\1' 2>/dev/null | nl -ba
1 c7 87 | c7 88 | c7 89
2 c7 87 | c7 88 | c7 89
3 c7 87 | c7 88 | c7 89
4 49 | 69 | c4 b1
5 49 | 69 | c4 b1
6 c4 b0
7 49 | 69 | c4 b1
pcregrep:
$ env LANG=en_US.utf8 ./test.sh "pcregrep -iu" 2>/dev/null | nl -ba
1 c7 87 | c7 88 | c7 89
2 c7 87 | c7 88 | c7 89
3 c7 87 | c7 88 | c7 89
4 49 | 69
5 49 | 69
6 c4 b0
7 c4 b1
Solaris grep (xpg4):
$ env LANG=ja_JP.UTF-8 ./test.sh "/usr/xpg4/bin/grep -i" 2>/dev/null | nl -ba
1 c7 87 | c7 89
2 c7 88
3 c7 87 | c7 89
4 49 | 69
5 49 | 69
6 error
7 error
HP-UX grep:
$ env LANG=en_US.utf8 ./test.sh "/bin/grep -i" 2>/dev/null | nl -ba
1 c7 87 | c7 88 | c7 89
2 c7 87 | c7 88 | c7 89
3 c7 87 | c7 88 | c7 89
4 49 | 69
5 49 | 69
6 c4 b0
7 c4 b1
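For reading the tables above: the hex values are UTF-8 byte sequences. c7 87, c7 88 and c7 89 are U+01C7, U+01C8 (the titlecase form) and U+01C9; 49 and 69 are ASCII I and i; c4 b0 and c4 b1 are U+0130 and U+0131. The following is a minimal Python sketch, not part of the report or of test.sh, that decodes each sequence and lists its case mappings. The names `SEQUENCES` and `describe` are illustrative only. Python takes its mappings from the Unicode database bundled with the interpreter rather than from the C locale that grep uses, so this only identifies the characters; it does not say which engine is right.

```python
import unicodedata

# UTF-8 byte sequences as they are printed in the result tables (hex).
SEQUENCES = ["c7 87", "c7 88", "c7 89", "49", "69", "c4 b0", "c4 b1"]


def to_hex(text):
    """Encode text as UTF-8 and format it like the tables, e.g. 'c7 87'."""
    return text.encode("utf-8").hex(" ")  # bytes.hex(sep) needs Python 3.8+


def describe(hex_bytes):
    """Decode one hex UTF-8 sequence and report Python's case mappings for it."""
    ch = bytes.fromhex(hex_bytes).decode("utf-8")
    return {
        "codepoint": f"U+{ord(ch):04X}",
        "name": unicodedata.name(ch),
        "upper": to_hex(ch.upper()),
        "lower": to_hex(ch.lower()),
        "title": to_hex(ch.title()),
    }


if __name__ == "__main__":
    for seq in SEQUENCES:
        info = describe(seq)
        print(f"{seq:6} {info['codepoint']} {info['name']}")
        print(f"       upper={info['upper']} lower={info['lower']} title={info['title']}")
```

The three case forms of the LJ digraph, and the fact that U+0130 and U+0131 do not map back and forth symmetrically with I and i, are where the engines above disagree.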
--
Norihiro
[test.sh (application/octet-stream, attachment)]
This bug report was last modified 11 years and 134 days ago.
GNU bug tracking system
Copyright (C) 1999 Darren O. Benham,
1997,2003 nCipher Corporation Ltd,
1994-97 Ian Jackson.