GNU bug report logs - #17196
multibyte: printf: %s counts bytes instead of characters

Package: coreutils

Reported by: Jan Novak <jn <at> turbo.sk>

Date: Sat, 5 Apr 2014 23:22:01 UTC

Severity: wishlist


Message #8 received at 17196 <at> debbugs.gnu.org (full text, mbox):

From: Pádraig Brady <P <at> draigBrady.com>
To: Jan Novak <jn <at> turbo.sk>
Cc: 17196 <at> debbugs.gnu.org
Subject: Re: bug#17196: UTF-8 printf string formatting problem
Date: Sun, 06 Apr 2014 11:15:46 +0100

On 04/06/2014 12:17 AM, Jan Novak wrote:
> Hello,
> 
> printf string format counts bytes instead of chars, which leads to broken output ...
> (the same problem occurs with bash built in printf)
> 
> 
> just try this:
> 
> $ echo $LANG
> us_US.UTF-8
> 
> 
> $ printf "|%3s|\n" "a"
> |  a|
> 
> $ printf "|%3s|\n" "á"     (char is a-acute)
> | á|
> 
> expected output:
> |  á|
> 
> Is there some easy solution?
> 
> TIA for the answer

Yes, printf follows the C standard, which counts field widths in bytes,
not characters. awk (gawk, at least) does respect characters in width
specifiers, though:

  $ awk 'BEGIN{printf "|%3s|\n", "á"}'
  |  á|
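
If awk is not an option, the padding can also be computed by hand in the
shell. The following is only a minimal sketch, not anything shipped in
coreutils: pad_left is a hypothetical helper name, and it assumes the
input is valid UTF-8. It counts code points rather than display columns,
so combining or double-width characters would still misalign.

```shell
# pad_left WIDTH STRING  (hypothetical helper, assumes valid UTF-8 input)
# Right-align STRING in a field of WIDTH characters, counting UTF-8
# characters instead of bytes.
pad_left() {
    width=$1
    str=$2
    # Each UTF-8 character has exactly one byte outside 0x80-0xBF, so
    # deleting the continuation bytes and counting what remains gives
    # the number of characters, whatever the current locale is.
    chars=$(printf '%s' "$str" | LC_ALL=C tr -d '\200-\277' | wc -c)
    pad=$(( width - chars ))
    # Never pass a negative width: printf would left-justify instead.
    [ "$pad" -gt 0 ] || pad=0
    printf '%*s%s' "$pad" '' "$str"
}

printf '|%s|\n' "$(pad_left 3 'á')"
```

Because the count is done on raw bytes under LC_ALL=C, the result should
not depend on whether $LANG names a valid locale.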

I don't think we can change the current behavior of printf, for
backwards compatibility reasons. We might, though, be able to leverage
the existing multibyte-character-aware alignment/truncation code in:
http://git.sv.gnu.org/gitweb/?p=coreutils.git;a=blob;f=gl/lib/mbsalign.c;hb=HEAD

thanks,
Pádraig.




This bug report was last modified 6 years and 250 days ago.
