#69951 - coreutils: printf formatting bug for nb_NO and nn_NO locales

GNU bug report logs - #69951
coreutils: printf formatting bug for nb_NO and nn_NO locales

Reported by: Thomas Dreibholz <dreibh <at> simula.no>

Date: Fri, 22 Mar 2024 22:11:01 UTC

Severity: normal

Tags: notabug

Done: Pádraig Brady <P <at> draigBrady.com>

Bug is archived. No further changes may be made.

Message #8 received at 69951 <at> debbugs.gnu.org (full text, mbox):

From: Pádraig Brady <P <at> draigBrady.com>
To: Thomas Dreibholz <dreibh <at> simula.no>, 69951 <at> debbugs.gnu.org
Subject: Re: coreutils: printf formatting bug for nb_NO and nn_NO locales
Date: Sat, 23 Mar 2024 14:39:04 +0000

tag 69951 notabug
close 69951
stop

On 22/03/2024 20:22, Thomas Dreibholz wrote:
> Hi,
>
> I just discovered a printf bug for at least the nb_NO and nn_NO locales
> when printing numbers with thousands separator. To reproduce:
>
> #!/bin/bash
> for l in de_DE nb_NO ; do
>    echo "LC_NUMERIC=$l.UTF-8"
>    for n in 1 100 1000 10000 100000 1000000 10000000 ; do
>       LC_NUMERIC=$l.UTF-8 /usr/bin/printf "<%'10d>\n" $n
>    done
> done
>
> The expected output of "%'10d" is a right-formatted number string with
> 10 characters.
>
> The output of the test script is fine for e.g. LC_NUMERIC=de_DE.UTF-8
> and LC_NUMERIC=en_US.UTF-8:
>
> LC_NUMERIC=de_DE.UTF-8
> <         1>
> <       100>
> <     1.000>
> <    10.000>
> <   100.000>
> < 1.000.000>
> <10.000.000>
> However, for LC_NUMERIC=nb_NO.UTF-8 and LC_NUMERIC=nn_NO.UTF-8, the
> formatting is wrong:
>
> LC_NUMERIC=nb_NO.UTF-8
> <         1>
> <       100>
> <   1 000>
> <  10 000>
> < 100 000>
> <1 000 000>
> <10 000 000>
> I reproduced the issue with coreutils-8.32-4.1ubuntu1.1 (Ubuntu 22.04)
> as well as coreutils-9.3-5.fc39.x86_64 (Fedora 39).
>
> Under FreeBSD 14.0-RELEASE (coreutils-9.4_1), the output looks slightly
> better but is still wrong:
>
> LC_NUMERIC=nb_NO.UTF-8
> <         1>
> <       100>
> <    1 000>
> <   10 000>
> <  100 000>
> <1 000 000>
> <10 000 000>
> LC_NUMERIC=nn_NO.UTF-8
> <         1>
> <       100>
> <    1 000>
> <   10 000>
> <  100 000>
> <1 000 000>
> <10 000 000>
>
> May be the issue is that the thousands separator for the Norwegian
> locales is a space " ", while it is "."/"," for German/US English locales.

The issue looks to be that the thousands separator for Norwegian locales
is "NARROW NO-BREAK SPACE" (U+202F), or more problematically the _three_
byte UTF-8 sequence E2 80 AF. So it looks like an issue with libc routines
counting bytes rather than characters in this case.

One suggestion is to do the alignment afterwards. For example:

  $ export LC_NUMERIC=nb_NO.UTF-8
  $ printf "%'.f\n" $(seq -f '1E%.f' 7) | column --table-right=1 -t
          10
         100
       1 000
      10 000
     100 000
   1 000 000
  10 000 000

Actually I've just noticed that specifying the %'10.f format
does count characters and not bytes! So another solution is:

  $ export LC_NUMERIC=nb_NO.UTF-8
  $ printf "%'10.f\n" $(seq -f '1E%.f' 7)
          10
         100
       1 000
      10 000
     100 000
   1 000 000
  10 000 000

The issue, if there is one, is in libc at least.
It would be worth checking existing glibc reports about this,
and reporting it if it is not already mentioned.

cheers,
Pádraig.
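
The byte-versus-character explanation above can be checked without installing the Norwegian locales. The following is only a rough sketch, not something taken from coreutils or libc: it builds the string "1 000 000" with U+202F as the separator (written as octal escapes) and compares its byte length with its character length. The variable names and the trick of counting non-continuation bytes are illustrative assumptions.

```shell
#!/bin/sh
# Build "1 000 000" using U+202F NARROW NO-BREAK SPACE (UTF-8: E2 80 AF)
# as the grouping separator. Octal escapes avoid any locale dependency.
sep=$(printf '\342\200\257')
s="1${sep}000${sep}000"

# Byte length: what a byte-counting field width would see.
bytes=$(( $(printf '%s' "$s" | wc -c) ))

# Character length: delete UTF-8 continuation bytes (0x80-0xBF) and count
# what remains, i.e. one byte per character.
chars=$(( $(printf '%s' "$s" | LC_ALL=C tr -d '\200-\277' | wc -c) ))

# Padding a field of width 10 would receive under each counting rule.
width=10
pad_by_bytes=$(( width > bytes ? width - bytes : 0 ))
pad_by_chars=$(( width > chars ? width - chars : 0 ))

echo "bytes=$bytes chars=$chars"
echo "padding if the width counts bytes: $pad_by_bytes"
echo "padding if the width counts characters: $pad_by_chars"
```

The string is 13 bytes but only 9 characters, so a width of 10 yields no padding when bytes are counted and one space when characters are counted, which is consistent with the <1 000 000> line in the report.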

This bug report was last modified 1 year and 117 days ago.

