GNU bug report logs - #16581: suggested code simplification in dfa.c
Reported by: Aharon Robbins <arnold <at> skeeve.com>
Date: Tue, 28 Jan 2014 20:12:01 UTC
Severity: normal
Done: Paul Eggert <eggert <at> cs.ucla.edu>
Bug is archived. No further changes may be made.
On 01/28/2014 11:48 PM, Paul Eggert wrote:
>
> +/* The following functions exploit the commutativity and associativity of ^,
> + and the fact that X ^ X is zero. POSIX requires that C equals
> + either tolower (C) or toupper (C);
Unfortunately, while this is true, I'm not sure if it accurately covers
all possible case-folded comparisons outside of the C locale.
http://www.unicode.org/faq/casemap_charprop.html
Consider the Greek locale, el_GR.UTF-8, which has two lower-case sigmas:
L'\x3c3' and L'\x3c2', but only one upper-case: L'\x3a3'. As a result,
all three wchar_t values must compare case-insensitively to one another.
Or consider titlecase characters, such as Unicode L'\x1c8' (Lj), which
has both an uppercase mapping L'\x1c7' (LJ) and lowercase mapping
L'\x1c9' (lj) - again, all three wchar_t values must compare
case-insensitively to one another.
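
To make the two examples above concrete, here is a minimal sketch of the XOR idea from the quoted comment. It is not the code in dfa.c. The table and the names toy_table, toy_towlower, toy_towupper and xor_counterpart are hypothetical stand-ins for the locale's towlower and towupper, so that the result does not depend on which locale is installed.

```c
#include <stddef.h>
#include <wchar.h>

/* Hypothetical stand-in for the locale's case tables.  Each row is
   { character, lowercase mapping, uppercase mapping }.  Only the
   characters discussed in this message are listed.  */
struct case_row { wint_t c, lower, upper; };

static const struct case_row toy_table[] = {
  { 0x3a3, 0x3c3, 0x3a3 },  /* Greek capital sigma */
  { 0x3c3, 0x3c3, 0x3a3 },  /* Greek small sigma */
  { 0x3c2, 0x3c2, 0x3a3 },  /* Greek small final sigma */
  { 0x1c7, 0x1c9, 0x1c7 },  /* LJ (uppercase) */
  { 0x1c8, 0x1c9, 0x1c7 },  /* Lj (titlecase) */
  { 0x1c9, 0x1c9, 0x1c7 },  /* lj (lowercase) */
};

static const struct case_row *
find_row (wint_t c)
{
  for (size_t i = 0; i < sizeof toy_table / sizeof toy_table[0]; i++)
    if (toy_table[i].c == c)
      return &toy_table[i];
  return NULL;
}

/* Characters missing from the toy table map to themselves.  */
static wint_t
toy_towlower (wint_t c)
{
  const struct case_row *r = find_row (c);
  return r ? r->lower : c;
}

static wint_t
toy_towupper (wint_t c)
{
  const struct case_row *r = find_row (c);
  return r ? r->upper : c;
}

/* The XOR trick: if C equals one of its two mappings, that mapping
   cancels out and the other one is returned.  */
static wint_t
xor_counterpart (wint_t c)
{
  return c ^ toy_towlower (c) ^ toy_towupper (c);
}
```

With this table, xor_counterpart (0x3a3) is 0x3c3 and xor_counterpart (0x3c3) is 0x3a3, so starting from either of them the final sigma 0x3c2 is never produced. For the titlecase 0x1c8, which equals neither of its mappings, the result is 0x1c6, a character from a different letter family.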
Your hack is great at finding characters that have a case mapping, but
not necessarily at finding all such characters that map to the same
result when passed through towlower(towupper(c)).
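
The towlower(towupper(c)) comparison mentioned above can be sketched as follows. This is an illustration only, not a proposal for dfa.c and not full Unicode case folding. The table and the names toy_table, toy_towlower, toy_towupper, fold_key and same_ignoring_case are hypothetical; the table stands in for the locale's towlower and towupper.

```c
#include <stdbool.h>
#include <stddef.h>
#include <wchar.h>

/* Hypothetical stand-in for the locale's case tables.  Each row is
   { character, lowercase mapping, uppercase mapping }.  */
struct case_row { wint_t c, lower, upper; };

static const struct case_row toy_table[] = {
  { 0x3a3, 0x3c3, 0x3a3 },  /* Greek capital sigma */
  { 0x3c3, 0x3c3, 0x3a3 },  /* Greek small sigma */
  { 0x3c2, 0x3c2, 0x3a3 },  /* Greek small final sigma */
  { 0x1c7, 0x1c9, 0x1c7 },  /* LJ (uppercase) */
  { 0x1c8, 0x1c9, 0x1c7 },  /* Lj (titlecase) */
  { 0x1c9, 0x1c9, 0x1c7 },  /* lj (lowercase) */
};

static const struct case_row *
find_row (wint_t c)
{
  for (size_t i = 0; i < sizeof toy_table / sizeof toy_table[0]; i++)
    if (toy_table[i].c == c)
      return &toy_table[i];
  return NULL;
}

/* Characters missing from the toy table map to themselves.  */
static wint_t
toy_towlower (wint_t c)
{
  const struct case_row *r = find_row (c);
  return r ? r->lower : c;
}

static wint_t
toy_towupper (wint_t c)
{
  const struct case_row *r = find_row (c);
  return r ? r->upper : c;
}

/* Reduce C to a single representative: uppercase first, then
   lowercase the result.  */
static wint_t
fold_key (wint_t c)
{
  return toy_towlower (toy_towupper (c));
}

/* Two characters compare equal, ignoring case, when they share a key.  */
static bool
same_ignoring_case (wint_t a, wint_t b)
{
  return fold_key (a) == fold_key (b);
}
```

With this table, all three sigmas share the key 0x3c3 and all three members of the LJ family share the key 0x1c9, so each group compares equal within itself. Matching a given character against every member of its group would still require enumerating the characters that share its key.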
--
Eric Blake eblake redhat com +1-919-301-3266
Libvirt virtualization library http://libvirt.org