On 02/19/2014 11:59 AM, Ben Boeckel wrote:
> [ I am not subscribed; please keep me on the CC. ]
>
> Hi,
>
> From the new grep announcement on LWN[1], I had a thought about how the
> German eszett was handled. It seems that it hasn't been handled at all.
> This may fall to the same resolution as the recent LJ/Lj thread[2]
> though.
>
> Basically, it seems that grep doesn't support alternates when changing
> case. The uppercase of 'ß' is either 'SS' or 'ẞ' depending on the
> context[3].

Alas, in terms of POSIX functionality, we can only change case between
single-character entities. Changing ß to SS is a single->multi-character
change; it is DIFFERENT than the Turkish i situation (there, although we
change between single-byte and multi-byte, the changes are still always
single character). Similar problems apply to Greek trailing sigma, which
is also a context-sensitive change operation.

As long as we are stuck using the POSIX definition of case changes on a
character-by-character basis, where the input and output are 1:1
character mappings, we cannot handle the German eszett case specially.
For PROPER handling of locale-sensitive case rules, we'd need full
Unicode rules that operate on words, rather than characters, which
quickly gets out of scope of what we can do in POSIX regex.

-- 
Eric Blake eblake redhat com +1-919-301-3266
Libvirt virtualization library http://libvirt.org
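
To make the 1:1 limitation above concrete, here is a rough Python sketch. It is not grep's code. The helper names simple_upper and per_char_upper are hypothetical, and simple_upper only approximates a POSIX-style per-character mapping such as towupper(): it assumes a character stays unchanged whenever its full uppercase form is not exactly one character long.

```python
def simple_upper(ch):
    """Hypothetical stand-in for a POSIX-style 1:1 mapping such as towupper()."""
    up = ch.upper()
    # Assumption: when the full uppercase form is not exactly one
    # character, a 1:1 mapping has nothing to return but the input.
    return up if len(up) == 1 else ch


def per_char_upper(text):
    """Uppercase a string one character at a time, never changing its length."""
    return "".join(simple_upper(ch) for ch in text)


word = "straße"
one_to_one = per_char_upper(word)  # ß survives: no single-character uppercase
full = word.upper()                # Python applies the full mapping, ß -> SS

print(one_to_one)
print(full)
print(len(word), len(one_to_one), len(full))
```

The per-character result keeps the ß and the original length, while the full mapping grows the string by one character. That length change is exactly what a character-by-character interface cannot express.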