GNU bug report logs - #7960
multibyte: fmt: fix formatting multibyte text (bug #7372)
Reported by: Kostya Stopani <hatta <at> depni.sinp.msu.ru>
Date: Wed, 2 Feb 2011 14:42:01 UTC
Severity: normal
Tags: moreinfo, patch
Done: Assaf Gordon <assafgordon <at> gmail.com>
Bug is archived. No further changes may be made.
[readding the list]
On 02/02/2011 02:11 PM, Kostya Stopani wrote:
> On Wed, Feb 02, 2011 at 10:15:53AM -0700, Eric Blake wrote:
>
>> Thanks for the patch. However, it's not trivial, so it would need
>> copyright assignment.
>
> Oh boy... Anyway I don't mind signing papers, if you (or whoever)
> don't mind bothering with it.
OK, I'll send you those details off-list.
>
>> Furthermore, there are already known issues where upstream coreutils
>> is lacking multibyte character support, but a solution has to be
>> both maintainable and no-impact to the single-byte locale case.
>
> I believe this patch doesn't break single-byte behavior because no
> conversion takes place. mbsnrtowcs() is used only to count
> characters. I've tested various cases (8-bit encoding was KOI8-R):
>
> |--------+---------------+--------------------------|
> | Locale | Text encoding | Result                   |
> |--------+---------------+--------------------------|
> | UTF-8  | UTF-8         | old fmt: text too narrow |
> |        |               | new fmt: ok              |
> |--------+---------------+--------------------------|
> | UTF-8  | 8-bit         | same                     |
> |--------+---------------+--------------------------|
> | 8-bit  | UTF-8         | same                     |
> |--------+---------------+--------------------------|
> | 8-bit  | 8-bit         | same                     |
> |--------+---------------+--------------------------|
> From my point of view the alternative is to convert everything to
> wchar_t, which imposes the need to keep track of conversion errors and
> gracefully fall back to single-byte.
Keeping things in multibyte rather than converting to wchar_t is the way
to go (especially given the ongoing discussion of how to handle the fact
that on cygwin, wchar_t is UTF-16 and thus still multi-unit as an
extension to POSIX, with all sorts of ramifications to programs that
expect POSIX semantics).
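
For illustration, the counting approach discussed above (keep the text as
multibyte and call mbsnrtowcs() only to count characters) might look
roughly like the sketch below. This is not code from the patch: the helper
name count_chars() and the fall-back to the byte count on an invalid
sequence are assumptions made for the example. mbsnrtowcs() itself is the
POSIX.1-2008 function; with a NULL destination it stores nothing and only
reports how many wide characters the input would decode to.

```c
#define _POSIX_C_SOURCE 200809L
#include <stddef.h>
#include <string.h>
#include <wchar.h>

/* Hypothetical helper (not from the patch): return the number of
   characters in the first NBYTES bytes of BUF, according to the
   current locale's encoding.  The text itself is never converted.  */
size_t
count_chars (const char *buf, size_t nbytes)
{
  mbstate_t state;
  const char *src = buf;
  size_t n;

  if (buf == NULL || nbytes == 0)
    return 0;

  /* Start from the initial conversion state.  */
  memset (&state, 0, sizeof state);

  /* A NULL destination means "count only"; the length argument
     (here 0) is ignored in that case.  */
  n = mbsnrtowcs (NULL, &src, nbytes, 0, &state);

  /* Assumed fall-back: if the bytes are not valid in the current
     locale, treat each byte as one character, which is what a
     single-byte fmt would do.  */
  if (n == (size_t) -1)
    return nbytes;

  return n;
}
```

A caller would run setlocale (LC_ALL, "") once at startup. In a UTF-8
locale the 6-byte string "héllo" then counts as 5 characters, while pure
ASCII input gives the same result as the byte count in any locale. Note
that this counts characters, not display columns, and that the count stops
at an embedded NUL byte.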
--
Eric Blake eblake <at> redhat.com +1-801-349-2682
Libvirt virtualization library http://libvirt.org
This bug report was last modified 6 years and 264 days ago.