GNU bug report logs -
#54388
printf doesn't handle multi-byte values
Reported by: Pádraig Brady <P <at> draigBrady.com>
Date: Mon, 14 Mar 2022 15:39:02 UTC
Severity: normal
Done: Pádraig Brady <P <at> draigBrady.com>
Bug is archived. No further changes may be made.
Report forwarded to bug-coreutils <at> gnu.org: bug#54388; Package coreutils. (Mon, 14 Mar 2022 15:39:02 GMT)
Acknowledgement sent to Pádraig Brady <P <at> draigBrady.com>: New bug report received and forwarded. Copy sent to bug-coreutils <at> gnu.org. (Mon, 14 Mar 2022 15:39:02 GMT)
Message #5 received at submit <at> debbugs.gnu.org:
On 14/03/2022 03:27, Christoph Anton Mitterer wrote:
> Hey Pádraig.
>
> I just wanted to ask, whether the following could be a bug in printf:
>
> POSIX says[0], that e.g.:
> printf '%d\n' \"3
> should give the numeric value of the character, and that "in a locale
> with multi-byte characters, the value of a character is intended to be
> the value of the equivalent of the wchar_t representation of the
> character".
>
> In bash:
> $ printf '%d\n' $'"\u2208'
> 8712
>
> here the printf is bash's built-in printf, and there it works.
>
>
> But using GNU coreutils' printf (version 8.32):
> $ /usr/bin/printf '%d\n' $'"\u2208'
> /usr/bin/printf: warning: ��: character(s) following character constant have been ignored
> 226
>
>
> Do I have some wrong assumptions or should I report that as a bug?
>
>
> Thanks,
> Chris.
>
>
> [0] https://pubs.opengroup.org/onlinepubs/9699919799/utilities/printf.html
This is a limitation of the current coreutils printf, which only handles single-byte characters.
This email will open an issue in our bug tracker.
To summarize:
$ ord() { printf "0x%x\n" "'$1"; } # bash's printf
$ ord 3
0x33
$ ord $'\u2208'
0x2208
$ ord() { env printf "0x%x\n" "'$1"; } # coreutils' printf
$ ord 3
0x33
$ ord $'\u2208'
0xprintf: warning: ��: character(s) following character constant have been ignored
e2
cheers,
Pádraig
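
The 226 (0xe2) printed by coreutils printf is the first byte of the UTF-8 encoding of U+2208, which is E2 88 88; the two trailing bytes are the ones the warning reports as ignored. A minimal C sketch of that arithmetic, assuming a UTF-8 locale (the helper name utf8_encode is hypothetical and is not part of coreutils):

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper for illustration only: encode code point CP as
   UTF-8 into OUT (which must hold at least 4 bytes) and return the
   number of bytes written, or 0 if CP is a surrogate or lies beyond
   U+10FFFF.  */
size_t
utf8_encode (uint32_t cp, unsigned char *out)
{
  if (cp < 0x80)
    {
      out[0] = (unsigned char) cp;
      return 1;
    }
  if (cp < 0x800)
    {
      out[0] = (unsigned char) (0xC0 | (cp >> 6));
      out[1] = (unsigned char) (0x80 | (cp & 0x3F));
      return 2;
    }
  if (cp < 0x10000)
    {
      if (0xD800 <= cp && cp <= 0xDFFF)
        return 0;               /* surrogates are not encodable */
      out[0] = (unsigned char) (0xE0 | (cp >> 12));
      out[1] = (unsigned char) (0x80 | ((cp >> 6) & 0x3F));
      out[2] = (unsigned char) (0x80 | (cp & 0x3F));
      return 3;
    }
  if (cp <= 0x10FFFF)
    {
      out[0] = (unsigned char) (0xF0 | (cp >> 18));
      out[1] = (unsigned char) (0x80 | ((cp >> 12) & 0x3F));
      out[2] = (unsigned char) (0x80 | ((cp >> 6) & 0x3F));
      out[3] = (unsigned char) (0x80 | (cp & 0x3F));
      return 4;
    }
  return 0;
}
```

Calling utf8_encode (0x2208, buf) fills buf with E2 88 88, so a printf that looks only at the first byte reports 0xe2, that is 226.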
Reply sent to Pádraig Brady <P <at> draigBrady.com>: You have taken responsibility. (Fri, 18 Mar 2022 15:01:01 GMT)
Notification sent to Pádraig Brady <P <at> draigBrady.com>: bug acknowledged by developer. (Fri, 18 Mar 2022 15:01:01 GMT)
Message #10 received at 54388-done <at> debbugs.gnu.org:
On 14/03/2022 15:38, Pádraig Brady wrote:
> On 14/03/2022 03:27, Christoph Anton Mitterer wrote:
>> Hey Pádraig.
>>
>> I just wanted to ask, whether the following could be a bug in printf:
>>
>> POSIX says[0], that e.g.:
>> printf '%d\n' \"3
>> should give the numeric value of the character, and that "in a locale
>> with multi-byte characters, the value of a character is intended to be
>> the value of the equivalent of the wchar_t representation of the
>> character".
>>
>> In bash:
>> $ printf '%d\n' $'"\u2208'
>> 8712
>>
>> here the printf is bash's built-in printf, and there it works.
>>
>>
>> But using GNU coreutils' printf (version 8.32):
>> $ /usr/bin/printf '%d\n' $'"\u2208'
>> /usr/bin/printf: warning: ��: character(s) following character constant have been ignored
>> 226
>>
>>
>> Do I have some wrong assumptions or should I report that as a bug?
>>
>>
>> Thanks,
>> Chris.
>>
>>
>> [0] https://pubs.opengroup.org/onlinepubs/9699919799/utilities/printf.html
>
> This is a limitation of current coreutils printf that only handles single byte chars currently.
> This email will open an issue in our bug tracker.
>
> To summarize:
> $ ord() { printf "0x%x\n" "'$1"; } # bash's printf
> $ ord 3
> 0x33
> $ ord $'\u2208'
> 0x2208
>
> $ ord() { env printf "0x%x\n" "'$1"; } # coreutils' printf
> $ ord 3
> 0x33
> $ ord $'\u2208'
> 0xprintf: warning: ��: character(s) following character constant have been ignored
> e2
The attached should fix this up.
Marking this as done.
cheers,
Pádraig
[printf-mb-values.patch (text/x-patch, attachment)]
Information forwarded to bug-coreutils <at> gnu.org: bug#54388; Package coreutils. (Fri, 18 Mar 2022 15:08:01 GMT)
Message #13 received at 54388 <at> debbugs.gnu.org:
On Fri, 2022-03-18 at 14:59 +0000, Pádraig Brady wrote:
> The attached should fix this up.
Thanks!
Information forwarded to bug-coreutils <at> gnu.org: bug#54388; Package coreutils. (Fri, 18 Mar 2022 15:42:02 GMT)
Message #16 received at 54388 <at> debbugs.gnu.org:
On 18/03/2022 14:59, Pádraig Brady wrote:
> The attached should fix this up.
The following should make this more efficient for the normal unibyte case.
No multi-byte encoding contains NUL bytes, so when the byte after the first
one is NUL the argument is a single byte and needs no conversion:
- if (MB_CUR_MAX > 1) \
+ if (MB_CUR_MAX > 1 && *(s + 1)) \
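
The reasoning behind the extra test can be sketched in isolation. The macro surrounding this line is not shown in the log, so the sketch assumes that s points at the first byte after the quote, and it passes MB_CUR_MAX in as a parameter (mb_cur_max) purely so the check can be exercised without switching locales. The name needs_mb_decode is hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper mirroring the condition in the hunk above.
   S must be non-empty and point at the first byte after the quote
   (an assumption; the enclosing macro is not shown in the log).
   Multi-byte decoding is only worth attempting when the locale is
   multi-byte and at least one more byte follows the first one.  */
bool
needs_mb_decode (const char *s, size_t mb_cur_max)
{
  return mb_cur_max > 1 && s[1] != '\0';
}
```

For an argument such as "3" the second byte is the terminating NUL, so the check fails even in a multi-byte locale and the conversion call is skipped.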
Bug archived. Request was from Debbugs Internal Request <help-debbugs <at> gnu.org> to internal_control <at> debbugs.gnu.org. (Sat, 16 Apr 2022 11:24:09 GMT)