GNU bug report logs - #16919
[PATCH] fix mismatch between dfa and regex for treatment of titlecase
Reported by: Norihiro Tanaka <noritnk <at> kcn.ne.jp>
Date: Sun, 2 Mar 2014 00:34:01 UTC
Severity: normal
Tags: patch
Done: Paul Eggert <eggert <at> cs.ucla.edu>
Bug is archived. No further changes may be made.
Message #29 received at 16919 <at> debbugs.gnu.org:
On 03/05/2014 07:11 AM, Norihiro Tanaka wrote:
> I still believe that upper or lower case of a character should
> also match title case
The (soon-to-be-fixed) gnulib regex code agrees with you, assuming that
towupper (X) returns the same value for the uppercase, lowercase, and
titlecase forms of X, because it uses (towupper (input) == towupper
(pattern)). However, the most plausible reading of POSIX does not agree
with you, as it would require (input == pattern || towlower (input) ==
pattern || towupper (input) == pattern), which means a titlecase pattern
will match only itself.
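
The two comparisons above can be written side by side as a minimal C
sketch. The function names gnulib_style_match and posix_style_match are
hypothetical labels for this illustration, not names taken from gnulib
or glibc, and the result for a titlecase character such as U+01C5
depends on the locale and on the C library's case-mapping tables.

```c
#include <stdbool.h>
#include <wchar.h>
#include <wctype.h>

/* Comparison described for the gnulib regex code: fold both sides to
   upper case and compare.  (Hypothetical helper name.) */
static bool gnulib_style_match(wint_t input, wint_t pattern)
{
    return towupper(input) == towupper(pattern);
}

/* Comparison under the most plausible reading of POSIX: the pattern
   character is left alone, and the input is tried as-is and in each
   case.  If towlower and towupper never return a titlecase character,
   a titlecase pattern can match only itself.  (Hypothetical helper
   name.) */
static bool posix_style_match(wint_t input, wint_t pattern)
{
    return input == pattern
        || towlower(input) == pattern
        || towupper(input) == pattern;
}
```

For plain ASCII letters the two predicates agree; they can differ only
where a character has a third, titlecase form.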
It seems pretty clear to me that the most plausible reading of POSIX is
buggy, for this reason. No wonder so many implementations fail to
conform to it.
I thought of another way in which gnulib/glibc regex does not conform
to POSIX, and here there doesn't seem to be any ambiguity about it. In
the POSIX locale when ignoring case, the pattern '[Z-a]' matches the
data 'Z', 'z', 'A', 'a', and the nonalphabetic characters like '^' that
collate between 'Z' and 'a'. But the glibc regex code rejects that
pattern entirely. Conversely, in the same situation the glibc regex
code says '[A-z]' matches only alphabetic characters, whereas POSIX says
it should also match the nonalphabetic characters like '^' that collate
between 'Z' and 'a'. It appears that nobody cares, as this
incompatibility has been present for years and I don't recall anyone
complaining. It is weird, though, that this means "grep PAT" can match
some lines that "grep -i PAT" doesn't.
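
The POSIX behavior described above for ranges can be sketched in C as
follows. This is only an illustration of the rule "try the input
character and its case counterpart against the unmodified range",
assuming the POSIX (C) locale, where collation order is byte order; the
names in_range and range_match_icase are hypothetical, and this is not
the glibc implementation.

```c
#include <ctype.h>
#include <stdbool.h>

/* True if c lies in the inclusive range lo..hi by byte value, which is
   the collation order in the POSIX (C) locale. */
static bool in_range(unsigned char c, unsigned char lo, unsigned char hi)
{
    return lo <= c && c <= hi;
}

/* Case-insensitive range match: the range itself is not case-folded;
   the input character is tried as-is, lowercased, and uppercased.
   (Hypothetical helper, assuming the default "C" locale.) */
static bool range_match_icase(unsigned char c,
                              unsigned char lo, unsigned char hi)
{
    return in_range(c, lo, hi)
        || in_range((unsigned char) tolower(c), lo, hi)
        || in_range((unsigned char) toupper(c), lo, hi);
}
```

Under this sketch, range_match_icase (c, 'Z', 'a') accepts 'Z', 'z',
'A', 'a', and '^', while range_match_icase ('^', 'A', 'z') also
succeeds, which is the case where glibc's behavior is said to differ.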
Here POSIX is not merely ambiguous; it clearly disagrees with common
practice. It's not clear whether the bug is in POSIX or in the
implementation.
This bug report was last modified 11 years and 135 days ago.