As previously discussed on the coreutils mailing list (beginning with […]),
most of the coreutils text-processing commands process bytes instead of characters regardless of the user's locale, so they do not handle UTF-8 text properly, either in their input or in their option arguments.
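To make the byte/character distinction concrete, here is a minimal sketch, not taken from coreutils or from the proposed patch. POSIX gives cut separate `-b` (bytes) and `-c` (characters) options, but a byte-oriented implementation treats them the same. The helper names `select_bytes` and `select_chars` are hypothetical. The sketch assumes UTF-8 input and uses Python's decoded code points as a stand-in for the locale's multibyte characters. Ranges are 1-based and inclusive, as with `cut -c START-END`.

```python
# Hypothetical illustration (not coreutils code): byte-oriented vs
# character-oriented selection on UTF-8 text, 1-based inclusive ranges.

def select_bytes(line: str, start: int, end: int) -> bytes:
    """Select bytes START..END of the UTF-8 encoding (what a byte-based cut -c does)."""
    return line.encode("utf-8")[start - 1:end]


def select_chars(line: str, start: int, end: int) -> str:
    """Select characters START..END (the POSIX meaning of cut -c).

    Assumption: Python code points stand in for the locale's multibyte characters.
    """
    return line[start - 1:end]


if __name__ == "__main__":
    text = "héllo"  # 'é' is U+00E9, encoded in UTF-8 as the two bytes 0xC3 0xA9
    print(select_bytes(text, 1, 2))  # b'h\xc3': the 'é' is split in half
    print(select_chars(text, 1, 2))  # hé
```

Selecting "characters" 1-2 byte-wise leaves a dangling UTF-8 lead byte, which is not valid text on its own.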
I propose the changes in
https://github.com/ericfischer/coreutils/compare/multibyte-squash
to convert sort, uniq, join, tr, cut, paste, expand, and unexpand to process characters instead of bytes, allowing them to work correctly on non-ASCII text, as specified by POSIX.
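As a second illustration, here is a hedged sketch of why tr in particular benefits from character processing. It is not the patch's implementation. It models tr with literal sets only: no ranges, classes, or escapes. It assumes a non-empty SET2 that is padded by repeating its last element, as GNU tr does when SET2 is shorter than SET1. The functions `tr_bytes`, `tr_chars`, and `_build_map` are hypothetical names chosen for this example.

```python
# Hypothetical model of tr (not coreutils code): literal SET1 -> SET2
# translation done over bytes vs over characters. Assumes set2 is non-empty;
# a shorter set2 is padded by repeating its last element.

def _build_map(set1, set2):
    """Pair each element of set1 with set2, padding set2 with its last element."""
    padded = list(set2) + [set2[-1]] * (len(set1) - len(set2))
    return dict(zip(set1, padded))


def tr_bytes(text: str, set1: str, set2: str) -> bytes:
    """Byte-oriented: sets and input are all treated as UTF-8 bytes."""
    table = _build_map(set1.encode("utf-8"), set2.encode("utf-8"))
    return bytes(table.get(b, b) for b in text.encode("utf-8"))


def tr_chars(text: str, set1: str, set2: str) -> str:
    """Character-oriented: each (possibly multibyte) character is one element."""
    table = _build_map(set1, set2)
    return "".join(table.get(c, c) for c in text)


if __name__ == "__main__":
    print(tr_bytes("é", "é", "e"))  # b'ee': both bytes of 'é' map to 'e'
    print(tr_bytes("à", "é", "e"))  # b'e\xa0': 'à' (0xC3 0xA0) is corrupted too
    print(tr_chars("é", "é", "e"))  # e
    print(tr_chars("à", "é", "e"))  # à
```

Under this model, the byte version sees the one-character SET1 "é" as two bytes and maps both of them. That also damages unrelated characters that share the lead byte 0xC3. The character version maps exactly one character to one character.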
Eric Fischer