GNU bug report logs - #69951
coreutils: printf formatting bug for nb_NO and nn_NO locales

Package: coreutils;

Reported by: Thomas Dreibholz <dreibh <at> simula.no>

Date: Fri, 22 Mar 2024 22:11:01 UTC

Severity: normal

Tags: notabug

Done: Pádraig Brady <P <at> draigBrady.com>

Bug is archived. No further changes may be made.


From: Thomas Dreibholz <dreibh <at> simula.no>
To: 69951 <at> debbugs.gnu.org
Subject: bug#69951: coreutils: printf formatting bug for nb_NO and nn_NO locales
Date: Sat, 23 Mar 2024 12:39:59 +0100
Hi,

I did some further debugging using a hexdump of the printf output, i.e.:

#!/bin/bash
for l in de_DE en_US nb_NO nn_NO ; do
   echo "LC_NUMERIC=$l.UTF-8"
   for n in 1 100 1000 10000 100000 1000000 10000000 ; do
      LC_NUMERIC=$l.UTF-8 /usr/bin/printf "<%'10d>" $n | hexdump -C
   done
done

The output is:

...
LC_NUMERIC=nb_NO.UTF-8
00000000  3c 20 20 20 20 20 20 20  20 20 31 3e              |<         1>|
0000000c
00000000  3c 20 20 20 20 20 20 20  31 30 30 3e              |<       100>|
0000000c
00000000  3c 20 20 20 31 e2 80 af  30 30 30 3e              |<   1...000>|
0000000c
00000000  3c 20 20 31 30 e2 80 af  30 30 30 3e              |<  10...000>|
0000000c
00000000  3c 20 31 30 30 e2 80 af  30 30 30 3e              |< 100...000>|
0000000c
00000000  3c 31 e2 80 af 30 30 30  e2 80 af 30 30 30 3e     |<1...000...000>|
0000000f
00000000  3c 31 30 e2 80 af 30 30  30 e2 80 af 30 30 30 3e  |<10...000...000>|
00000010
LC_NUMERIC=nn_NO.UTF-8
00000000  3c 20 20 20 20 20 20 20  20 20 31 3e              |<         1>|
0000000c
00000000  3c 20 20 20 20 20 20 20  31 30 30 3e              |<       100>|
0000000c
00000000  3c 20 20 20 31 e2 80 af  30 30 30 3e              |<   1...000>|
0000000c
00000000  3c 20 20 31 30 e2 80 af  30 30 30 3e              |<  10...000>|
0000000c
00000000  3c 20 31 30 30 e2 80 af  30 30 30 3e              |< 100...000>|
0000000c
00000000  3c 31 e2 80 af 30 30 30  e2 80 af 30 30 30 3e     |<1...000...000>|
0000000f
00000000  3c 31 30 e2 80 af 30 30  30 e2 80 af 30 30 30 3e  |<10...000...000>|
00000010

printf seems to insert the 3-byte UTF-8 sequence 0xe2 0x80 0xaf as the
thousands separator. This sequence encodes U+202F NARROW NO-BREAK SPACE ->
https://www.fileformat.info/info/unicode/char/202f/index.htm . But
terminal output (tested with Konsole and XTerm) has fixed spacing, so the
"narrow space" should probably be a regular space or a regular
no-break space (0xc2 0xa0, HTML "&nbsp;")? Note also that
LibreOffice cannot render U+202F NARROW NO-BREAK SPACE correctly,
even with proportional fonts, when loading the output of
the test script as a text file.
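
To double-check that the three bytes in the hexdump really are U+202F, the sequence can be decoded by hand. The following is only a minimal sketch: the function name utf8_3byte_to_codepoint is made up for illustration, and it assumes its input is a well-formed 3-byte UTF-8 sequence given as three hex bytes (it does no validation).

```shell
#!/bin/bash
# Hypothetical helper (not part of coreutils): decode a 3-byte UTF-8
# sequence, given as three hex bytes, into its Unicode code point.
# Assumes well-formed input of the form 1110xxxx 10yyyyyy 10zzzzzz.
utf8_3byte_to_codepoint() {
   local b1=$((16#$1)) b2=$((16#$2)) b3=$((16#$3))
   # Keep the 4 payload bits of the lead byte and the 6 payload bits
   # of each continuation byte, then concatenate them.
   printf 'U+%04X\n' $(( ((b1 & 0x0F) << 12) | ((b2 & 0x3F) << 6) | (b3 & 0x3F) ))
}

# The separator bytes seen in the hexdump output above:
utf8_3byte_to_codepoint e2 80 af
```

For the bytes above this prints U+202F. If the locale is installed, something like "LC_ALL=nb_NO.UTF-8 locale thousands_sep | hexdump -C" should show where the separator comes from, although I have only checked the printf output shown here.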

Screenshots for illustration:

 * Terminal output:
   https://bugs.launchpad.net/ubuntu/+source/coreutils/+bug/2058775/+attachment/5758462/+files/Screenshot_20240322_213947.png
 * LibreOffice output:
   https://bugs.launchpad.net/ubuntu/+source/coreutils/+bug/2058775/+attachment/5758464/+files/Screenshot_20240322_222052.png

-- 
Best regards / Mit freundlichen Grüßen / Med vennlig hilsen

=======================================================================
 Thomas Dreibholz

 Simula Metropolitan Centre for Digital Engineering
 Centre for Resilient Networks and Applications
 Pilestredet 52
 0167 Oslo, Norway
-----------------------------------------------------------------------
 E-Mail:dreibh <at> simula.no
 Homepage:http://simula.no/people/dreibh
=======================================================================

