GNU bug report logs - #54388
printf doesn't handle multi-byte values


Package: coreutils;

Reported by: Pádraig Brady <P <at> draigBrady.com>

Date: Mon, 14 Mar 2022 15:39:02 UTC

Severity: normal

Done: Pádraig Brady <P <at> draigBrady.com>

Bug is archived. No further changes may be made.

To add a comment to this bug, you must first unarchive it, by sending
a message to control AT debbugs.gnu.org, with unarchive 54388 in the body.
You can then email your comments to 54388 AT debbugs.gnu.org in the normal way.




Report forwarded to bug-coreutils <at> gnu.org:
bug#54388; Package coreutils. (Mon, 14 Mar 2022 15:39:02 GMT)

Acknowledgement sent to Pádraig Brady <P <at> draigBrady.com>:
New bug report received and forwarded. Copy sent to bug-coreutils <at> gnu.org. (Mon, 14 Mar 2022 15:39:02 GMT)

Message #5 received at submit <at> debbugs.gnu.org:

From: Pádraig Brady <P <at> draigBrady.com>
To: Christoph Anton Mitterer <calestyo <at> scientia.org>,
 Report bugs to <bug-coreutils <at> gnu.org>
Subject: printf doesn't handle multi-byte values
Date: Mon, 14 Mar 2022 15:38:23 +0000
On 14/03/2022 03:27, Christoph Anton Mitterer wrote:
> Hey Pádraig.
> 
> I just wanted to ask whether the following could be a bug in printf:
> 
> POSIX says[0] that, e.g.:
>     printf '%d\n' \"3
> should give the numeric value of the character, and that "in a locale
> with multi-byte characters, the value of a character is intended to be
> the value of the equivalent of the wchar_t representation of the
> character".
> 
> In bash:
> $ printf '%d\n' $'"\u2208'
> 8712
> 
> Here, printf is bash's built-in printf, and there it works.
> 
> 
> But using GNU coreutils' printf (version 8.32):
> $ /usr/bin/printf '%d\n' $'"\u2208'
> /usr/bin/printf: warning: ��: character(s) following character constant have been ignored
> 226
> 
> 
> Do I have some wrong assumptions or should I report that as a bug?
> 
> 
> Thanks,
> Chris.
> 
> 
> [0] https://pubs.opengroup.org/onlinepubs/9699919799/utilities/printf.html

This is a limitation of the current coreutils printf, which only handles single-byte chars.
This email will open an issue in our bug tracker.

To summarize:
$ ord() { printf "0x%x\n" "'$1"; }  # bash's printf
$ ord 3
0x33
$ ord $'\u2208'
0x2208

$ ord() { env printf "0x%x\n" "'$1"; }  # coreutils' printf
$ ord 3
0x33
$ ord $'\u2208'
0xprintf: warning: ��: character(s) following character constant have been ignored
e2

cheers,
Pádraig
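The POSIX rule quoted above says that the value of a character constant is its wchar_t value. A minimal C sketch of that idea follows, with one important caveat: printf-mb-values.patch is not reproduced in this report, so this is not the actual coreutils change. The helper name char_constant_value is hypothetical. The sketch rests on three assumptions:

- The caller has already selected a locale, for example with setlocale.
- wchar_t holds Unicode code points, as it does with glibc in UTF-8 locales.
- The NUL check from the later follow-up message is folded in as a fast path.

```c
#include <stdlib.h>  /* MB_CUR_MAX */
#include <string.h>  /* strlen, memset */
#include <wchar.h>   /* mbrtowc, mbstate_t, wchar_t */

/* Hypothetical helper (not the coreutils patch): return the numeric value
   POSIX gives a printf argument that starts with ' or ".  STR points just
   past the leading quote.  *EXTRA receives the number of trailing bytes
   that were not part of the first character; coreutils warns about such
   bytes ("character(s) following character constant have been ignored").  */
static long
char_constant_value (char const *str, size_t *extra)
{
  size_t len = strlen (str);
  if (len == 0)
    {
      *extra = 0;
      return 0;
    }

  /* Only try a multi-byte decode in a multi-byte locale, and only when more
     than one byte is present: a NUL never occurs inside a multi-byte
     character, so a lone byte is always a complete character.  */
  if (MB_CUR_MAX > 1 && str[1] != '\0')
    {
      mbstate_t state;
      wchar_t wc;
      memset (&state, 0, sizeof state);
      size_t n = mbrtowc (&wc, str, len, &state);
      if (n != (size_t) -1 && n != (size_t) -2)
        {
          *extra = len - n;
          return (long) wc;
        }
      /* Invalid or incomplete sequence: fall back to the first byte.  */
    }

  *extra = len - 1;
  return (unsigned char) str[0];
}
```

In a UTF-8 locale, the bytes e2 88 88 (U+2208) yield 0x2208 (8712), matching bash's output above. In a single-byte locale such as glibc's C locale, the same input falls back to the first byte, 0xe2 (226), with 2 ignored bytes. That is the coreutils 8.32 behaviour shown in the report.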




Reply sent to Pádraig Brady <P <at> draigBrady.com>:
You have taken responsibility. (Fri, 18 Mar 2022 15:01:01 GMT)

Notification sent to Pádraig Brady <P <at> draigBrady.com>:
bug acknowledged by developer. (Fri, 18 Mar 2022 15:01:01 GMT)

Message #10 received at 54388-done <at> debbugs.gnu.org:

From: Pádraig Brady <P <at> draigBrady.com>
To: calestyo <at> scientia.org, 54388-done <at> debbugs.gnu.org
Subject: Re: bug#54388: printf doesn't handle multi-byte values
Date: Fri, 18 Mar 2022 14:59:42 +0000
On 14/03/2022 15:38, Pádraig Brady wrote:
> This is a limitation of current coreutils printf that only handles single byte chars currently.

The attached should fix this up.

Marking this as done.

cheers,
Pádraig
[printf-mb-values.patch (text/x-patch, attachment)]

Information forwarded to bug-coreutils <at> gnu.org:
bug#54388; Package coreutils. (Fri, 18 Mar 2022 15:08:01 GMT)

Message #13 received at 54388 <at> debbugs.gnu.org:

From: Christoph Anton Mitterer <calestyo <at> scientia.org>
To: Pádraig Brady <P <at> draigBrady.com>, 54388 <at> debbugs.gnu.org
Subject: Re: bug#54388: printf doesn't handle multi-byte values
Date: Fri, 18 Mar 2022 16:07:20 +0100
On Fri, 2022-03-18 at 14:59 +0000, Pádraig Brady wrote:
> The attached should fix this up.

Thanks!




Information forwarded to bug-coreutils <at> gnu.org:
bug#54388; Package coreutils. (Fri, 18 Mar 2022 15:42:02 GMT)

Message #16 received at 54388 <at> debbugs.gnu.org:

From: Pádraig Brady <P <at> draigBrady.com>
To: calestyo <at> scientia.org, 54388 <at> debbugs.gnu.org
Subject: Re: bug#54388: printf doesn't handle multi-byte values
Date: Fri, 18 Mar 2022 15:41:11 +0000
On 18/03/2022 14:59, Pádraig Brady wrote:
> The attached should fix this up.

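The following should make this more efficient for the normal unibyte case.
A NUL byte can't occur inside a character in any multi-byte encoding.
So if the byte after the character's first byte is NUL, the character is a single byte and the multi-byte conversion can be skipped.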

-      if (MB_CUR_MAX > 1)                                               \
+      if (MB_CUR_MAX > 1 && *(s + 1))                                   \
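The shortcut above depends on one rule: a multi-byte character never contains a NUL byte. The C standard guarantees this for every locale's multibyte encoding. The C sketch below checks the rule exhaustively for UTF-8 only. It does this with a hand-written encoder, so no locale is needed. In UTF-8, lead bytes of multi-byte sequences have their top two bits set, and continuation bytes have the form 10xxxxxx. As a result, neither kind of byte can be zero.

The same encoder also explains the 226 (0xe2) printed by coreutils 8.32: U+2208 encodes as e2 88 88, and only the first byte was used. The names utf8_encode and utf8_multibyte_never_contains_nul are hypothetical illustration helpers.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Encode code point CP as UTF-8 into BUF (at least 4 bytes) and return the
   number of bytes written.  Assumes CP is a Unicode scalar value, i.e. not
   a surrogate and not above 0x10FFFF.  */
static size_t
utf8_encode (uint32_t cp, unsigned char *buf)
{
  if (cp < 0x80)
    {
      buf[0] = cp;
      return 1;
    }
  if (cp < 0x800)
    {
      buf[0] = 0xC0 | (cp >> 6);
      buf[1] = 0x80 | (cp & 0x3F);
      return 2;
    }
  if (cp < 0x10000)
    {
      buf[0] = 0xE0 | (cp >> 12);
      buf[1] = 0x80 | ((cp >> 6) & 0x3F);
      buf[2] = 0x80 | (cp & 0x3F);
      return 3;
    }
  buf[0] = 0xF0 | (cp >> 18);
  buf[1] = 0x80 | ((cp >> 12) & 0x3F);
  buf[2] = 0x80 | ((cp >> 6) & 0x3F);
  buf[3] = 0x80 | (cp & 0x3F);
  return 4;
}

/* Return true if no multi-byte UTF-8 sequence (code points from 0x80 up)
   contains a NUL byte.  If this holds, a NUL directly after the first
   byte means the character was a single byte.  */
static bool
utf8_multibyte_never_contains_nul (void)
{
  unsigned char buf[4];
  for (uint32_t cp = 0x80; cp <= 0x10FFFF; cp++)
    {
      if (cp >= 0xD800 && cp <= 0xDFFF)
        continue;  /* surrogates have no UTF-8 encoding */
      size_t n = utf8_encode (cp, buf);
      for (size_t i = 0; i < n; i++)
        if (buf[i] == 0)
          return false;
    }
  return true;
}
```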





bug archived. Request was from Debbugs Internal Request <help-debbugs <at> gnu.org> to internal_control <at> debbugs.gnu.org. (Sat, 16 Apr 2022 11:24:09 GMT)




GNU bug tracking system
Copyright (C) 1999 Darren O. Benham, 1997,2003 nCipher Corporation Ltd, 1994-97 Ian Jackson.