On 01/29/2014 06:42 AM, Eric Blake wrote:
> Your hack is great at finding characters that have a case mapping, but
> not necessarily at finding all such characters that map to the same
> result when passed through towlower(towupper(c)).
>

In particular, note that the Java language has formalized case-insensitive
comparison as follows:

http://docs.oracle.com/javase/1.5.0/docs/api/java/lang/String.html#equalsIgnoreCase%28java.lang.String%29

    Two characters c1 and c2 are considered the same, ignoring case if at
    least one of the following is true:

      * The two characters are the same (as compared by the == operator).
      * Applying the method Character.toUpperCase(char) to each character
        produces the same result.
      * Applying the method Character.toLowerCase(char) to each character
        produces the same result.

and lower down, compareToIgnoreCase():

    Compares two strings lexicographically, ignoring case differences.
    This method returns an integer whose sign is that of calling compareTo
    with normalized versions of the strings where case differences have
    been eliminated by calling
    Character.toLowerCase(Character.toUpperCase(character)) on each
    character.

    Note that this method does not take locale into account, and will
    result in an unsatisfactory ordering for certain locales. The
    java.text package provides collators to allow locale-sensitive
    ordering.

In particular, the specification was careful to require double-case
conversion, with uppercase first, in order to normalize all
single-character oddities, while still mentioning that true Unicode
collation has even more special cases that can't be decided on a
character-by-character basis.

-- 
Eric Blake   eblake redhat com    +1-919-301-3266
Libvirt virtualization library http://libvirt.org
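For illustration, here is a minimal sketch of the difference between the three per-character equalsIgnoreCase() rules and the toLowerCase(toUpperCase(c)) normalization quoted above. It assumes 16-bit Java char values and the simple (one-to-one) case mappings that Character provides. The class name CaseFold and its methods are hypothetical and chosen only for this example. The code implements the rules as the documentation words them and makes no claim about how the JDK itself implements String.equalsIgnoreCase().

```java
// Hypothetical helper class for illustration; names are not part of any API.
// Works on single chars with simple case mappings only, so multi-character
// mappings (e.g. German sharp s -> "SS") and locale rules are out of scope.
final class CaseFold {
    private CaseFold() {}

    // The three equalsIgnoreCase() conditions, applied to one char pair
    // exactly as the quoted documentation states them.
    static boolean sameByDocumentedRules(char c1, char c2) {
        return c1 == c2
            || Character.toUpperCase(c1) == Character.toUpperCase(c2)
            || Character.toLowerCase(c1) == Character.toLowerCase(c2);
    }

    // The compareToIgnoreCase() normalization: uppercase first, then lowercase.
    static char fold(char c) {
        return Character.toLowerCase(Character.toUpperCase(c));
    }

    // Two chars are treated as equal if they normalize to the same char.
    static boolean sameByFolding(char c1, char c2) {
        return fold(c1) == fold(c2);
    }

    public static void main(String[] args) {
        char[][] pairs = {
            {'\u017F', 's'},      // LATIN SMALL LETTER LONG S vs 's'
            {'\u212A', 'k'},      // KELVIN SIGN vs 'k'
            {'\u0131', '\u0130'}, // DOTLESS I vs CAPITAL I WITH DOT ABOVE
        };
        for (char[] p : pairs) {
            System.out.printf(
                "U+%04X U+%04X lower-only=%b upper-only=%b rules=%b folded=%b%n",
                (int) p[0], (int) p[1],
                Character.toLowerCase(p[0]) == Character.toLowerCase(p[1]),
                Character.toUpperCase(p[0]) == Character.toUpperCase(p[1]),
                sameByDocumentedRules(p[0], p[1]),
                sameByFolding(p[0], p[1]));
        }
    }
}
```

Lowercasing alone misses long s vs 's' (U+017F has no lowercase mapping, but uppercases to 'S'), and uppercasing alone misses the Kelvin sign vs 'k' (U+212A has no uppercase mapping, but lowercases to 'k'). The dotless i pair differs under all three documented rules, yet both sides normalize to 'i' once uppercase is applied first. This is the kind of single-character oddity that only the uppercase-then-lowercase order catches.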