On Fri, Jan 10, 2014 at 5:49 PM, Pádraig Brady wrote:
> Cool so it does this transformation:
>
> sed 's/./[\L&\U&]/g'
>
> Though multi byte case handling has all sorts of edge cases (pardon the pun),
> and it may not be always valid to treat each character independently?
> For example see some of the tests in:
> http://git.sv.gnu.org/gitweb/?p=gnulib.git;a=blob;f=tests/unicase/test-ulc-casecmp.c;hb=HEAD

It seems you're right.  Since it's a many-to-one mapping in some cases,
simply using one lower case character and one upper case version
won't cover all possibilities.

> I wonder might this faster path be restricted to a safer but very common input subset of:
>
> (MB_CUR_MAX == 1 || (in_utf8 && *c < 0x80))

That sounds like a good approach.
Now I need another test case, to demonstrate that the current code
can cause trouble.

> Also are the following printfs in the test redundant?
>
>> +data=$( printf "I:$I $i:i")
>> +search_str=$(printf "$i:i I:$I")

Good catch.  Those were vestiges of pre-factoring code, where they were needed.
Here's the patch to fix that part, in your name:
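
An aside, separate from the patch itself, to illustrate the many-to-one point above. This is only a minimal sketch in Python, not grep's code: the helper name `bracket_pattern` is hypothetical, and it relies on Python's built-in Unicode case tables, which are assumed here to be a reasonable stand-in for what a UTF-8 locale would do. It builds the same per-character `[lower upper]` class as the sed expression and shows one input it fails to cover.

```python
import re

def bracket_pattern(text: str) -> str:
    # Hypothetical helper mirroring the sed transformation quoted above:
    # every character becomes a two-member class [lower upper].
    return "".join(
        "[" + re.escape(ch.lower()) + re.escape(ch.upper()) + "]" for ch in text
    )

pattern = bracket_pattern("σ")   # class holding only σ (U+03C3) and Σ (U+03A3)
final_sigma = "ς"                # U+03C2, a second lowercase form of Σ

# Both lowercase sigmas share one uppercase form: a many-to-one mapping.
same_upper = final_sigma.upper() == "σ".upper()

# The per-character class nevertheless fails to match the final sigma.
missed = re.fullmatch(pattern, final_sigma) is None

print(pattern, same_upper, missed)
```

For pure ASCII input the per-character class is complete, which is consistent with restricting the fast path to the `MB_CUR_MAX == 1` or `*c < 0x80` subset suggested above.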