GNU bug report logs - #54388
printf doesn't handle multi-byte values
Reported by: Pádraig Brady <P <at> draigBrady.com>
Date: Mon, 14 Mar 2022 15:39:02 UTC
Severity: normal
Done: Pádraig Brady <P <at> draigBrady.com>
Bug is archived. No further changes may be made.
[Message part 1 (text/plain, inline)]
Your message dated Fri, 18 Mar 2022 14:59:42 +0000
with message-id <fd2fd96e-f1d6-40a3-85a1-18817c15400d <at> draigBrady.com>
and subject line Re: bug#54388: printf doesn't handle multi-byte values
has caused the debbugs.gnu.org bug report #54388,
regarding printf doesn't handle multi-byte values
to be marked as done.
(If you believe you have received this mail in error, please contact
help-debbugs <at> gnu.org.)
--
54388: http://debbugs.gnu.org/cgi/bugreport.cgi?bug=54388
GNU Bug Tracking System
Contact help-debbugs <at> gnu.org with problems
[Message part 2 (message/rfc822, inline)]
On 14/03/2022 03:27, Christoph Anton Mitterer wrote:
> Hey Pádraig.
>
> I just wanted to ask, whether the following could be a bug in printf:
>
> POSIX says[0], that e.g.:
> printf '%d\n' \"3
> should give the numeric value of the character, and that "in a locale
> with multi-byte characters, the value of a character is intended to be
> the value of the equivalent of the wchar_t representation of the
> character".
>
> In bash:
> $ printf '%d\n' $'"\u2208'
> 8712
>
> here the printf is bash's built-in printf, and there it works.
>
>
> But using GNU coreutils' printf (version 8.32):
> $ /usr/bin/printf '%d\n' $'"\u2208'
> /usr/bin/printf: warning: ��: character(s) following character constant have been ignored
> 226
>
>
> Do I have some wrong assumptions or should I report that as a bug?
>
>
> Thanks,
> Chris.
>
>
> [0] https://pubs.opengroup.org/onlinepubs/9699919799/utilities/printf.html
This is a limitation of coreutils printf, which currently handles only single-byte characters.
This email will open an issue in our bug tracker.
To summarize:
$ ord() { printf "0x%x\n" "'$1"; } # bash's printf
$ ord 3
0x33
$ ord $'\u2208'
0x2208
$ ord() { env printf "0x%x\n" "'$1"; } # coreutils' printf
$ ord 3
0x33
$ ord $'\u2208'
0xprintf: warning: ��: character(s) following character constant have been ignored
e2
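The 226 (0xe2) printed by coreutils printf is the first byte of the UTF-8 encoding of U+2208, while bash reports the decoded code point 8712 (0x2208). The following is a minimal sketch of that relationship, assuming a UTF-8 encoding and a POSIX shell with od and tr available; the helper names utf8_bytes and decode3 are hypothetical and are not part of bash or coreutils.

```shell
# Emit U+2208 as its three UTF-8 bytes (written as octal escapes so the
# result does not depend on the current locale) and dump them as hex.
utf8_bytes() {
    printf '\342\210\210' | od -An -tx1 | tr -d ' \n'
}

# Decode a 3-byte UTF-8 sequence into its code point.
# $1 $2 $3: the three bytes as hex digits without a 0x prefix.
# Layout: 1110xxxx 10xxxxxx 10xxxxxx
decode3() {
    echo $(( ((0x$1 & 0x0f) << 12) | ((0x$2 & 0x3f) << 6) | (0x$3 & 0x3f) ))
}

utf8_bytes            # prints e28888
printf '\n'
printf '%d\n' 0xe2    # prints 226, the value of the first byte alone
decode3 e2 88 88      # prints 8712, i.e. 0x2208
```

A printf that reads only the first byte therefore yields 0xe2 and warns about the two trailing bytes, whereas one that converts the whole multi-byte sequence yields 0x2208.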
cheers,
Pádraig
[Message part 3 (message/rfc822, inline)]
[Message part 4 (text/plain, inline)]
On 14/03/2022 15:38, Pádraig Brady wrote:
> On 14/03/2022 03:27, Christoph Anton Mitterer wrote:
>> Hey Pádraig.
>>
>> I just wanted to ask, whether the following could be a bug in printf:
>>
>> POSIX says[0], that e.g.:
>> printf '%d\n' \"3
>> should give the numeric value of the character, and that "in a locale
>> with multi-byte characters, the value of a character is intended to be
>> the value of the equivalent of the wchar_t representation of the
>> character".
>>
>> In bash:
>> $ printf '%d\n' $'"\u2208'
>> 8712
>>
>> here the printf is bash's built-in printf, and there it works.
>>
>>
>> But using GNU coreutils' printf (version 8.32):
>> $ /usr/bin/printf '%d\n' $'"\u2208'
>> /usr/bin/printf: warning: ��: character(s) following character constant have been ignored
>> 226
>>
>>
>> Do I have some wrong assumptions or should I report that as a bug?
>>
>>
>> Thanks,
>> Chris.
>>
>>
>> [0] https://pubs.opengroup.org/onlinepubs/9699919799/utilities/printf.html
>
> This is a limitation of coreutils printf, which currently handles only single-byte characters.
> This email will open an issue in our bug tracker.
>
> To summarize:
> $ ord() { printf "0x%x\n" "'$1"; } # bash's printf
> $ ord 3
> 0x33
> $ ord $'\u2208'
> 0x2208
>
> $ ord() { env printf "0x%x\n" "'$1"; } # coreutils' printf
> $ ord 3
> 0x33
> $ ord $'\u2208'
> 0xprintf: warning: ��: character(s) following character constant have been ignored
> e2
The attached should fix this up.
Marking this as done.
cheers,
Pádraig
[printf-mb-values.patch (text/x-patch, attachment)]
This bug report was last modified 3 years and 115 days ago.