GNU bug report logs - #69951
coreutils: printf formatting bug for nb_NO and nn_NO locales
Reported by: Thomas Dreibholz <dreibh <at> simula.no>
Date: Fri, 22 Mar 2024 22:11:01 UTC
Severity: normal
Tags: notabug
Done: Pádraig Brady <P <at> draigBrady.com>
Bug is archived. No further changes may be made.
Hi,
Some further debugging, using a hexdump of the printf output:
#!/bin/bash
for l in de_DE en_US nb_NO nn_NO ; do
    echo "LC_NUMERIC=$l.UTF-8"
    for n in 1 100 1000 10000 100000 1000000 10000000 ; do
        LC_NUMERIC=$l.UTF-8 /usr/bin/printf "<%'10d>" $n | hexdump -C
    done
done
The output is:
...
LC_NUMERIC=nb_NO.UTF-8
00000000 3c 20 20 20 20 20 20 20 20 20 31 3e |<         1>|
0000000c
00000000 3c 20 20 20 20 20 20 20 31 30 30 3e |<       100>|
0000000c
00000000 3c 20 20 20 31 e2 80 af 30 30 30 3e |<   1...000>|
0000000c
00000000 3c 20 20 31 30 e2 80 af 30 30 30 3e |<  10...000>|
0000000c
00000000 3c 20 31 30 30 e2 80 af 30 30 30 3e |< 100...000>|
0000000c
00000000 3c 31 e2 80 af 30 30 30 e2 80 af 30 30 30 3e |<1...000...000>|
0000000f
00000000 3c 31 30 e2 80 af 30 30 30 e2 80 af 30 30 30 3e |<10...000...000>|
00000010
LC_NUMERIC=nn_NO.UTF-8
00000000 3c 20 20 20 20 20 20 20 20 20 31 3e |<         1>|
0000000c
00000000 3c 20 20 20 20 20 20 20 31 30 30 3e |<       100>|
0000000c
00000000 3c 20 20 20 31 e2 80 af 30 30 30 3e |<   1...000>|
0000000c
00000000 3c 20 20 31 30 e2 80 af 30 30 30 3e |<  10...000>|
0000000c
00000000 3c 20 31 30 30 e2 80 af 30 30 30 3e |< 100...000>|
0000000c
00000000 3c 31 e2 80 af 30 30 30 e2 80 af 30 30 30 3e |<1...000...000>|
0000000f
00000000 3c 31 30 e2 80 af 30 30 30 e2 80 af 30 30 30 3e |<10...000...000>|
00000010
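The dumps above are consistent with the field width of 10 being filled in bytes rather than in displayed characters: for n=1000 there are 10 bytes between "<" and ">", but only 8 characters. A minimal sketch of that count follows. The helper names count_bytes and count_chars are illustrative only and not part of coreutils, and the character count assumes valid UTF-8 input.

```shell
#!/bin/sh
# Count the bytes in a string.
count_bytes() {
    printf '%s' "$1" | wc -c | tr -d ' '
}

# Count UTF-8 code points: every byte except continuation bytes (0x80-0xBF)
# starts a new character, so delete those bytes and count what is left.
count_chars() {
    printf '%s' "$1" | LC_ALL=C tr -d '\200-\277' | wc -c | tr -d ' '
}

# Field content from the nb_NO dump for n=1000:
# three spaces, "1", the bytes e2 80 af (octal 342 200 257), "000".
field=$(printf '   1\342\200\257000')

echo "bytes=$(count_bytes "$field") chars=$(count_chars "$field")"
```

The byte count matches the requested width, while the character count falls short by two for each separator.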
printf seems to insert the 3-byte UTF-8 sequence 0xe2 0x80 0xaf as the
thousands separator. This sequence encodes U+202F NARROW NO-BREAK SPACE:
https://www.fileformat.info/info/unicode/char/202f/index.htm
However, terminal output (tested with Konsole and XTerm) uses fixed-width
spacing, so the "narrow space" should probably be a regular space or a
regular no-break space (0xc2 0xa0, HTML "&nbsp;") instead. Note also that
LibreOffice cannot render the output correctly with NARROW NO-BREAK SPACE,
even with proportional fonts, when the output of the test script is loaded
as a text file.
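The mapping from the byte sequences mentioned above to Unicode code points can be checked by hand with shell arithmetic. This is only a sketch: the function names decode_utf8_3 and decode_utf8_2 are made up for illustration, and the functions do not validate their input.

```shell
#!/bin/sh
# Decode a 3-byte UTF-8 sequence (1110xxxx 10xxxxxx 10xxxxxx).
decode_utf8_3() {
    printf 'U+%04X\n' $(( (($1 & 0x0f) << 12) | (($2 & 0x3f) << 6) | ($3 & 0x3f) ))
}

# Decode a 2-byte UTF-8 sequence (110xxxxx 10xxxxxx).
decode_utf8_2() {
    printf 'U+%04X\n' $(( (($1 & 0x1f) << 6) | ($2 & 0x3f) ))
}

decode_utf8_3 0xe2 0x80 0xaf   # separator seen in the dumps: U+202F
decode_utf8_2 0xc2 0xa0        # regular no-break space: U+00A0
```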
Screenshots for illustration:
* Terminal output:
https://bugs.launchpad.net/ubuntu/+source/coreutils/+bug/2058775/+attachment/5758462/+files/Screenshot_20240322_213947.png
* LibreOffice output:
https://bugs.launchpad.net/ubuntu/+source/coreutils/+bug/2058775/+attachment/5758464/+files/Screenshot_20240322_222052.png
--
Best regards / Mit freundlichen Grüßen / Med vennlig hilsen
=======================================================================
Thomas Dreibholz
Simula Metropolitan Centre for Digital Engineering
Centre for Resilient Networks and Applications
Pilestredet 52
0167 Oslo, Norway
-----------------------------------------------------------------------
E-Mail:dreibh <at> simula.no
Homepage:http://simula.no/people/dreibh
=======================================================================
This bug report was last modified 1 year and 55 days ago.