On 01/29/2014 06:42 AM, Eric Blake wrote:

> Your hack is great at finding characters that have a case mapping, but
> not necessarily at finding all such characters that map to the same
> result when passed through towlower(towupper(c)).
> 

In particular, note that the Java API documentation for
String.equalsIgnoreCase() formalizes case-insensitive comparison as
follows:

http://docs.oracle.com/javase/1.5.0/docs/api/java/lang/String.html#equalsIgnoreCase%28java.lang.String%29

Two characters c1 and c2 are considered the same, ignoring case if at
least one of the following is true:

    * The two characters are the same (as compared by the == operator).
    * Applying the method Character.toUpperCase(char) to each character
      produces the same result.
    * Applying the method Character.toLowerCase(char) to each character
      produces the same result.
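
As a rough illustration of the three quoted conditions, here is a minimal
Java sketch. The class and method names (IgnoreCaseChars,
sameIgnoringCase) are made up for this example and are not part of the
JDK; only Character.toUpperCase(char) and Character.toLowerCase(char) are
real API calls.

```java
final class IgnoreCaseChars {
    // Hypothetical helper: true if at least one of the three quoted
    // conditions holds for the two characters.
    static boolean sameIgnoringCase(char c1, char c2) {
        if (c1 == c2) {
            return true; // identical characters
        }
        if (Character.toUpperCase(c1) == Character.toUpperCase(c2)) {
            return true; // same uppercase mapping
        }
        // last chance: same lowercase mapping
        return Character.toLowerCase(c1) == Character.toLowerCase(c2);
    }

    public static void main(String[] args) {
        System.out.println(sameIgnoringCase('a', 'A'));
        // U+017F (LATIN SMALL LETTER LONG S) has no lowercase mapping of
        // its own, but its uppercase mapping is 'S'.
        System.out.println(sameIgnoringCase('\u017F', 's'));
        System.out.println(sameIgnoringCase('a', 'b'));
    }
}
```

The long-s case matches only through the uppercase check, which is the
kind of character that a lowercase-only comparison would miss.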

and, lower down on the same page, for compareToIgnoreCase():

Compares two strings lexicographically, ignoring case differences. This
method returns an integer whose sign is that of calling compareTo with
normalized versions of the strings where case differences have been
eliminated by calling
Character.toLowerCase(Character.toUpperCase(character)) on each character.

Note that this method does not take locale into account, and will result
in an unsatisfactory ordering for certain locales. The java.text package
provides collators to allow locale-sensitive ordering.


Note that the specification deliberately requires a double case
conversion, uppercase first and then lowercase, so that single-character
oddities all normalize to one value. It also warns that true Unicode
collation has further special cases that can't be decided on a
character-by-character basis.
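
To make the double conversion concrete, here is a small Java sketch of
the per-character normalization quoted above. The class name
CaseFoldSketch and the helper fold() are hypothetical names chosen for
this example; the sample characters are ones I believe have one-way
simple case mappings in the Unicode data that Java uses.

```java
final class CaseFoldSketch {
    // Hypothetical helper: the normalization quoted from
    // compareToIgnoreCase(), uppercase first, then lowercase.
    static char fold(char c) {
        return Character.toLowerCase(Character.toUpperCase(c));
    }

    public static void main(String[] args) {
        // 'A', 'a', long s (U+017F), dotless i (U+0131), micro sign (U+00B5)
        char[] samples = { 'A', 'a', '\u017F', '\u0131', '\u00B5' };
        for (char c : samples) {
            // Show lowercase-only next to uppercase-then-lowercase.
            System.out.printf("U+%04X lower-only=U+%04X upper-then-lower=U+%04X%n",
                    (int) c, (int) Character.toLowerCase(c), (int) fold(c));
        }
    }
}
```

For the last three samples, lowercasing alone leaves the character
unchanged, while going through uppercase first lands on the ordinary
lowercase letter. The same idea in C would be towlower(towupper(c)),
subject to whatever mappings the current locale provides.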

-- 
Eric Blake   eblake redhat com    +1-919-301-3266
Libvirt virtualization library http://libvirt.org