GNU bug report logs - #69951
coreutils: printf formatting bug for nb_NO and nn_NO locales


Package: coreutils;

Reported by: Thomas Dreibholz <dreibh <at> simula.no>

Date: Fri, 22 Mar 2024 22:11:01 UTC

Severity: normal

Tags: notabug

Done: Pádraig Brady <P <at> draigBrady.com>

Bug is archived. No further changes may be made.



Message #8 received at 69951 <at> debbugs.gnu.org (full text, mbox):

From: Pádraig Brady <P <at> draigBrady.com>
To: Thomas Dreibholz <dreibh <at> simula.no>, 69951 <at> debbugs.gnu.org
Subject: Re: coreutils: printf formatting bug for nb_NO and nn_NO locales
Date: Sat, 23 Mar 2024 14:39:04 +0000

tag 69951 notabug
close 69951
stop

On 22/03/2024 20:22, Thomas Dreibholz wrote:
> Hi,
> 
> I just discovered a printf bug for at least the nb_NO and nn_NO locales
> when printing numbers with thousands separator. To reproduce:
> 
> #!/bin/bash
> for l in de_DE nb_NO ; do
>     echo "LC_NUMERIC=$l.UTF-8"
>     for n in 1 100 1000 10000 100000 1000000 10000000 ; do
>         LC_NUMERIC=$l.UTF-8 /usr/bin/printf "<%'10d>\n" "$n"
>     done
> done
> 
> The expected output of "%'10d" is a right-aligned number string that is
> 10 characters wide.
> 
> The output of the test script is fine for e.g. LC_NUMERIC=de_DE.UTF-8
> and LC_NUMERIC=en_US.UTF-8:
> 
> LC_NUMERIC=de_DE.UTF-8
> <         1>
> <       100>
> <     1.000>
> <    10.000>
> <   100.000>
> < 1.000.000>
> <10.000.000>

> However, for LC_NUMERIC=nb_NO.UTF-8 and LC_NUMERIC=nn_NO.UTF-8, the
> formatting is wrong:
> 
> LC_NUMERIC=nb_NO.UTF-8
> <         1>
> <       100>
> <   1 000>
> <  10 000>
> < 100 000>
> <1 000 000>
> <10 000 000>

> I reproduced the issue with coreutils-8.32-4.1ubuntu1.1 (Ubuntu 22.04)
> as well as coreutils-9.3-5.fc39.x86_64 (Fedora 39).
> 
> Under FreeBSD 14.0-RELEASE (coreutils-9.4_1), the output looks slightly
> better but is still wrong:
> 
> LC_NUMERIC=nb_NO.UTF-8
> <         1>
> <       100>
> <    1 000>
> <   10 000>
> <  100 000>
> <1 000 000>
> <10 000 000>
> LC_NUMERIC=nn_NO.UTF-8
> <         1>
> <       100>
> <    1 000>
> <   10 000>
> <  100 000>
> <1 000 000>
> <10 000 000>
> 
> Maybe the issue is that the thousands separator for the Norwegian
> locales is a space " ", while it is "."/"," for German/US English locales.

The issue looks to be that the thousands separator for Norwegian locales
is "NARROW NO-BREAK SPACE" (U+202F), or more problematically the _three_ byte
UTF-8 sequence E2 80 AF. So it looks like an issue with libc routines
counting bytes rather than characters in this case.
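
The arithmetic behind the misaligned output can be sketched without any
Norwegian locale installed. The snippet below is only an illustration: it
assumes the separator is the three-byte sequence above and that the padding
is computed as field width minus byte count. The variable names are made up
for this example.

```shell
# U+202F NARROW NO-BREAK SPACE, written as its UTF-8 bytes E2 80 AF (octal escapes).
sep=$(printf '\342\200\257')
s="1${sep}000"

# Byte count: what a byte-oriented width calculation sees.
bytes=$(printf '%s' "$s" | wc -c | tr -d ' ')

# Character count: delete UTF-8 continuation bytes (0x80-0xBF), count what is left.
chars=$(printf '%s' "$s" | LC_ALL=C tr -d '\200-\277' | wc -c | tr -d ' ')

# Padding derived from the byte count for a field width of 10,
# and the number of characters that actually end up on screen.
pad=$(( 10 - bytes ))
shown=$(( pad + chars ))

echo "bytes=$bytes chars=$chars pad=$pad shown=$shown"
# prints: bytes=7 chars=5 pad=3 shown=8
```

Three spaces of padding plus five characters gives eight visible characters,
which matches the "<   1 000>" line in the report.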

One suggestion is to do the alignment afterwards. For example:

$ export LC_NUMERIC=nb_NO.UTF-8
$ printf "%'.f\n" $(seq -f '1E%.f' 7) | column --table-right=1 -t
        10
       100
     1 000
    10 000
   100 000
 1 000 000
10 000 000
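
If column(1) is not available, the same "align afterwards" idea can be
sketched in plain shell. This is a rough sketch, not anything from coreutils:
pad_left is a hypothetical helper, and it counts code points rather than
display columns, so it assumes every character is one column wide (true for
digits and U+202F, not for wide or combining characters).

```shell
# Hypothetical helper: right-align TEXT in a field WIDTH code points wide.
# The body runs in a subshell, so its variables do not leak to the caller.
pad_left() (
    width=$1
    text=$2
    # Count code points by deleting UTF-8 continuation bytes (0x80-0xBF).
    chars=$(printf '%s' "$text" | LC_ALL=C tr -d '\200-\277' | wc -c | tr -d ' ')
    pad=$(( width - chars ))
    if [ "$pad" -gt 0 ]; then
        printf "%${pad}s" ''
    fi
    printf '%s' "$text"
)

# U+202F as UTF-8 bytes E2 80 AF, standing in for the nb_NO separator.
sep=$(printf '\342\200\257')

for n in "1" "100" "1${sep}000" "10${sep}000" "1${sep}000${sep}000"; do
    printf '<%s>\n' "$(pad_left 10 "$n")"
done
```

In practice the strings would come from an unpadded printf "%'d" call under
the desired LC_NUMERIC; they are hard-coded here so the sketch runs anywhere.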

Actually I've just noticed that specifying the %'10.f format
does count characters and not bytes! So another solution is:

$ export LC_NUMERIC=nb_NO.UTF-8
$ printf "%'10.f\n" $(seq -f '1E%.f' 7)
        10
       100
     1 000
    10 000
   100 000
 1 000 000
10 000 000

The issue, if there is one, is in libc at least.
It would be worth checking existing glibc reports about this,
and reporting it if it is not already mentioned.

cheers,
Pádraig.




This bug report was last modified 1 year and 54 days ago.


