GNU bug report logs - #16919
[PATCH] fix mismatch between dfa and regex for treatment of titlecase


Package: grep;

Reported by: Norihiro Tanaka <noritnk <at> kcn.ne.jp>

Date: Sun, 2 Mar 2014 00:34:01 UTC

Severity: normal

Tags: patch

Done: Paul Eggert <eggert <at> cs.ucla.edu>

Bug is archived. No further changes may be made.



Message #23 received at 16919 <at> debbugs.gnu.org:

From: Norihiro TANAKA <noritnk <at> kcn.ne.jp>
To: Paul Eggert <eggert <at> cs.ucla.edu>
Cc: 16919 <at> debbugs.gnu.org
Subject: Re: bug#16919: [PATCH] fix mismatch between dfa and regex for
 treatment of titlecase
Date: Wed, 05 Mar 2014 23:11:40 +0900
Hi Paul,

Thanks for all the investigation.  I now understand that, in general,
we cannot tell whether DFA's or regex's behavior is the right one.

I have tested the behavior of several regex engines.  Interestingly,
most of the results differ from one another, and nobody can say which
one is right.

--
GNU grep (DFA):

$ env LANG=en_US.utf8 ./test.sh "src/grep -i" 2>/dev/null | nl -ba
     1   c7 87 | c7 89
     2   c7 87 | c7 88 | c7 89
     3   c7 87 | c7 89
     4   49 | 69
     5   49 | 69
     6   69 | c4 b0
     7   49 | c4 b1

GNU grep (regex):

$ env LANG=en_US.utf8 ./test.sh "src/grep -i" '\(\)\1' 2>/dev/null | nl -ba
     1   c7 87 | c7 88 | c7 89
     2   c7 87 | c7 88 | c7 89
     3   c7 87 | c7 88 | c7 89
     4   49 | 69 | c4 b1
     5   49 | 69 | c4 b1
     6   c4 b0
     7   49 | 69 | c4 b1

pcregrep:

$ env LANG=en_US.utf8 ./test.sh "pcregrep -iu" 2>/dev/null | nl -ba
     1   c7 87 | c7 88 | c7 89
     2   c7 87 | c7 88 | c7 89
     3   c7 87 | c7 88 | c7 89
     4   49 | 69
     5   49 | 69
     6   c4 b0
     7   c4 b1

Solaris grep (xpg4):

$ env LANG=ja_JP.UTF-8 ./test.sh  "/usr/xpg4/bin/grep -i" 2>/dev/null | nl -ba
     1           c7 87 | c7 89
     2           c7 88
     3           c7 87 | c7 89
     4           49 | 69
     5           49 | 69
     6           error
     7           error

HP-UX grep:

$ env LANG=en_US.utf8 ./test.sh  "/bin/grep -i" 2>/dev/null | nl -ba
     1   c7 87 | c7 88 | c7 89
     2   c7 87 | c7 88 | c7 89
     3   c7 87 | c7 88 | c7 89
     4   49 | 69
     5   49 | 69
     6   c4 b0
     7   c4 b1
--
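Each row above appears to list matched characters as their UTF-8 bytes in hex, with alternatives separated by `|`. For reference, `c7 87`, `c7 88` and `c7 89` are U+01C7, U+01C8 and U+01C9 (uppercase LJ, titlecase Lj, lowercase lj), while `49`, `69`, `c4 b0` and `c4 b1` are I, i, U+0130 (capital I with dot above) and U+0131 (dotless i). A minimal Python sketch for decoding such a row; `decode_row` is a hypothetical helper, not part of test.sh (whose contents are not shown here), and it does not handle the Solaris "error" rows.

```python
import unicodedata


def decode_row(row):
    """Decode a row such as "c7 87 | c7 89" into a list of
    "U+XXXX NAME" strings, one per '|'-separated alternative."""
    result = []
    for alt in row.split("|"):
        # Each alternative is a run of hex bytes forming one UTF-8 character;
        # bytes.fromhex ignores the whitespace between the bytes.
        ch = bytes.fromhex(alt.strip()).decode("utf-8")
        result.append(f"U+{ord(ch):04X} {unicodedata.name(ch)}")
    return result


if __name__ == "__main__":
    for row in ["c7 87 | c7 88 | c7 89", "49 | 69 | c4 b0 | c4 b1"]:
        print(decode_row(row))
```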

Norihiro
[test.sh (application/octet-stream, attachment)]
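As one more reference point (not a claim about how any of the tools above is implemented), the sketch below groups the seven characters from the tables by Python's `str.casefold()`, which applies Unicode full case folding. Under full folding U+0130 folds to the two-character sequence "i" + U+0307 (combining dot above) rather than to plain "i", and U+0131 has no folding at all, so each stays in its own class. Unicode also defines a Turkic variant that folds U+0130 to "i", which is another sign that there is more than one defensible answer. The byte groups this prints are the same groups that appear in the pcregrep and HP-UX output. The names `CHARS` and `fold_classes` are hypothetical.

```python
from collections import defaultdict

# The seven characters that appear (as UTF-8 hex) in the tables above.
CHARS = ["\u01C7", "\u01C8", "\u01C9", "I", "i", "\u0130", "\u0131"]


def fold_classes(chars):
    """Group characters whose str.casefold() results are equal.
    str.casefold() uses Unicode full case folding, not the Turkic variant."""
    classes = defaultdict(list)
    for ch in chars:
        classes[ch.casefold()].append(ch)
    # dicts keep insertion order, so classes come out in input order.
    return list(classes.values())


if __name__ == "__main__":
    for cls in fold_classes(CHARS):
        # Print each class in the same "c7 87 | c7 88" style as the tables.
        print(" | ".join(ch.encode("utf-8").hex(" ") for ch in cls))
```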



GNU bug tracking system
Copyright (C) 1999 Darren O. Benham, 1997,2003 nCipher Corporation Ltd, 1994-97 Ian Jackson.