[re-adding the list]

On 02/02/2011 02:11 PM, Kostya Stopani wrote:
> On Wed, Feb 02, 2011 at 10:15:53AM -0700, Eric Blake wrote:
>
>> Thanks for the patch. However, it's not trivial, so it would need
>> copyright assignment.
>
> Oh boy... Anyway I don't mind signing papers, if you (or whoever)
> don't mind bothering with it.

OK, I'll send you those details off-list.

>
>> Furthermore, there are already known issues where upstream coreutils
>> is lacking multibyte character support, but a solution has to be
>> both maintainable and no-impact to the single-byte locale case.
>
> I believe this patch doesn't break single-byte behavior because no
> conversion takes place. mbsnrtowcs() is used only to count
> characters. I've tested various cases (8-bit encoding was KOI8-R):
>
> |--------+---------------+--------------------------|
> | Locale | Text encoding | Result                   |
> |--------+---------------+--------------------------|
> | UTF-8  | UTF-8         | old fmt: text too narrow |
> |        |               | new fmt: ok              |
> |--------+---------------+--------------------------|
> | UTF-8  | 8-bit         | same                     |
> |--------+---------------+--------------------------|
> | 8-bit  | UTF-8         | same                     |
> |--------+---------------+--------------------------|
> | 8-bit  | 8-bit         | same                     |
> |--------+---------------+--------------------------|
>
> From my point of view the alternative is to convert everything to
> wchar_t, which imposes the need to keep track of conversion errors and
> gracefully fall back to single-byte.

Keeping things in multibyte rather than converting to wchar_t is the
way to go (especially given the ongoing discussion of how to handle the
fact that on Cygwin, wchar_t is UTF-16 and thus still multi-unit as an
extension to POSIX, with all sorts of ramifications to programs that
expect POSIX semantics).

-- 
Eric Blake   eblake@redhat.com    +1-801-349-2682
Libvirt virtualization library http://libvirt.org
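Kostya's point that mbsnrtowcs() can count characters without keeping any converted output can be illustrated with a minimal C sketch. It assumes a POSIX.1-2008 C library (for mbsnrtowcs() and strnlen()); the helper name count_chars() is hypothetical and is not taken from the patch. With a null destination pointer, mbsnrtowcs() stores nothing and only reports how many wide characters the input would yield. On an invalid byte sequence the sketch falls back to the byte count, and in locales where MB_CUR_MAX is 1 it skips the multibyte call entirely, so single-byte locales keep plain byte counting.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <locale.h>   /* callers are expected to have called setlocale() */
#include <stdlib.h>   /* MB_CUR_MAX */
#include <string.h>   /* memset(), strnlen() */
#include <wchar.h>    /* mbsnrtowcs(), mbstate_t */

/* Hypothetical helper (not from the patch): return the number of
   characters in the first NBYTES bytes of S according to the current
   LC_CTYPE locale, stopping early at an embedded NUL.  The text stays
   multibyte; no wide-character buffer is ever filled.  */
static size_t
count_chars (const char *s, size_t nbytes)
{
  assert (s != NULL);

  /* Single-byte locale: every byte is one character, so there is no
     need to touch the multibyte conversion functions at all.  */
  if (MB_CUR_MAX == 1)
    return strnlen (s, nbytes);

  mbstate_t state;
  memset (&state, 0, sizeof state);
  const char *src = s;

  /* With a null destination, mbsnrtowcs() converts nothing and only
     returns how many wide characters the bytes would produce.  */
  size_t n = mbsnrtowcs (NULL, &src, nbytes, 0, &state);

  /* Invalid sequence for this locale: treat the text as single-byte.  */
  if (n == (size_t) -1)
    return strnlen (s, nbytes);

  return n;
}
```

A character count is not the same as a display-column count: double-width CJK characters and combining marks would also need wcwidth(), which this sketch does not attempt.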