GNU bug report logs - #17196
multibyte: printf: %s counts bytes instead of characters
Message #17 received at 17196 <at> debbugs.gnu.org (full text, mbox):
On 04/06/2014 07:24 PM, Bob Proulx wrote:
> Pádraig Brady wrote:
>> Yes printf follows the C standard which only considers bytes.
>> ...
>> I don't think we'd be able to change the current operation of printf
>> due to backwards compat reasons? Though we might be able to somehow leverage
>> the existing multibyte character aware alignment/truncation code in:
>> http://git.sv.gnu.org/gitweb/?p=coreutils.git;a=blob;f=gl/lib/mbsalign.c;hb=HEAD
>
> Dan Douglas pointed out in the corresponding discussion in bug-bash
> that ksh uses the L modifier.
>
> http://lists.gnu.org/archive/html/bug-bash/2014-04/msg00021.html
>
> Dan Douglas wrote:
> > ksh93 already has this feature using the "L" modifier:
> >
> > ksh -c "printf '%.3Ls\n' $'\u2605\u2605\u2605\u2605\u2605'"
> > ★★★
>
> At least there is prior art for it.
So we can count bytes, chars or cells (graphemes).
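
To make the three units concrete, here is a minimal sketch that measures one string each way. The helper names (cell_width, measure) are hypothetical, UTF-8 is assumed as the encoding, and the width rule is only a rough approximation of what wcwidth() would report in a UTF-8 locale.

```python
import unicodedata

def cell_width(ch: str) -> int:
    """Approximate display columns for one code point (a simplification, not wcwidth)."""
    # Combining marks and format characters occupy no column of their own.
    if unicodedata.combining(ch) or unicodedata.category(ch) in ("Mn", "Me", "Cf"):
        return 0
    # East Asian Wide and Fullwidth characters occupy two columns.
    if unicodedata.east_asian_width(ch) in ("W", "F"):
        return 2
    return 1

def measure(text: str, encoding: str = "utf-8") -> dict:
    """Return the length of text in bytes, characters (code points) and cells."""
    return {
        "bytes": len(text.encode(encoding)),
        "chars": len(text),
        "cells": sum(cell_width(ch) for ch in text),
    }

# 'a' + COMBINING ACUTE ACCENT + BLACK STAR
sample = "a\u0301\u2605"
print(measure(sample))  # {'bytes': 6, 'chars': 3, 'cells': 2}
```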
Thinking a bit more about it, I think shell-level printf
should deal in text of the current encoding and count cells.
In the edge case where you want to deal in bytes, one can do:
LC_ALL=C printf ...
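
One thing to keep in mind with byte counting is that a precision can land in the middle of a multibyte sequence. A small sketch of that effect, assuming UTF-8 input (truncate_bytes is a hypothetical helper, not anything printf provides):

```python
def truncate_bytes(text: str, max_bytes: int, encoding: str = "utf-8") -> bytes:
    """Keep at most max_bytes bytes, ignoring character boundaries."""
    return text.encode(encoding)[:max_bytes]

# Two BLACK STAR characters, three bytes each in UTF-8.
raw = truncate_bytes("\u2605\u2605", 4)
print(raw)  # b'\xe2\x98\x85\xe2'

# The trailing byte is an incomplete sequence; a decoder has to replace or reject it.
print(raw.decode("utf-8", errors="replace"))
```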
I see that ksh behaves as I would expect and counts cells,
though it requires the explicit L modifier to do so:
$ ksh -c "printf '%.3Ls\n' $'a\u0301\u2605\u2605\u2605'"
á★★
$ ksh -c "printf '%.3Ls\n' $'Ａ\u2605\u2605\u2605'"
Ａ★
$ ksh -c "printf '%.3Ls\n' $'ＡＡ\u2605\u2605\u2605'"
Ａ
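
Truncation by cells, as opposed to characters, can be sketched as follows. This is only an illustration of the idea, not ksh's implementation: cell_width and truncate_cells are hypothetical helpers, the width rule is a simplification of wcwidth(), and U+FF21 (FULLWIDTH LATIN CAPITAL LETTER A) is used here as a sample double-width character.

```python
import unicodedata

def cell_width(ch: str) -> int:
    """Approximate display columns for one code point (a simplification, not wcwidth)."""
    if unicodedata.combining(ch) or unicodedata.category(ch) in ("Mn", "Me", "Cf"):
        return 0
    if unicodedata.east_asian_width(ch) in ("W", "F"):
        return 2
    return 1

def truncate_cells(text: str, max_cells: int) -> str:
    """Keep the longest prefix of text that fits in max_cells display columns."""
    kept = []
    used = 0
    for ch in text:
        width = cell_width(ch)
        # Stop before a character that would overflow; zero-width marks always fit.
        if used + width > max_cells:
            break
        kept.append(ch)
        used += width
    return "".join(kept)

# The combining accent costs no cell, so two stars still fit.
print(truncate_cells("a\u0301\u2605\u2605\u2605", 3))
# A double-width character leaves room for only one star.
print(truncate_cells("\uff21\u2605\u2605\u2605", 3))
# A second double-width character would need a fourth cell, so it is dropped.
print(truncate_cells("\uff21\uff21\u2605\u2605\u2605", 3))
```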
zsh seems to just count characters:
$ zsh -c "printf '%.3Ls\n' $'a\u0301\u2605\u2605\u2605'"
á★
$ zsh -c "printf '%.3s\n' $'a\u0301\u2605\u2605\u2605'"
á★
$ zsh -c "printf '%.3Ls\n' $'Ａ\u2605\u2605\u2605'"
Ａ★★
I see that dash reports an invalid directive for each of %ls, %Ls and %S.
Pity there is no consensus here.
Personally I would go for:
printf '%3s' 'blah' # count cells
printf '%3Ls' 'blah' # count chars
LANG=C printf '%3Ls' 'blah' # count bytes
LANG=C printf '%3s' 'blah' # count bytes
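
The effect of the unit on a field width can be sketched like this. It is only a model of the proposal above, not coreutils code: pad_left and its unit labels ("cells", "chars", "bytes") are hypothetical, UTF-8 is assumed, and the width rule is a simplification of wcwidth().

```python
import unicodedata

def cell_width(ch: str) -> int:
    """Approximate display columns for one code point (a simplification, not wcwidth)."""
    if unicodedata.combining(ch) or unicodedata.category(ch) in ("Mn", "Me", "Cf"):
        return 0
    if unicodedata.east_asian_width(ch) in ("W", "F"):
        return 2
    return 1

def pad_left(text: str, width: int, unit: str = "cells", encoding: str = "utf-8") -> str:
    """Right-justify text in a field of the given width, measured in the chosen unit."""
    if unit == "cells":
        length = sum(cell_width(ch) for ch in text)
    elif unit == "chars":
        length = len(text)
    elif unit == "bytes":
        length = len(text.encode(encoding))
    else:
        raise ValueError(f"unknown unit: {unit!r}")
    # Like printf, never truncate when the text is already wider than the field.
    return " " * max(0, width - length) + text

wide = "\uff21"  # one character, two cells, three UTF-8 bytes
for unit in ("cells", "chars", "bytes"):
    print(f"{unit}: [{pad_left(wide, 3, unit)}]")
```

With a width of 3, the same one-character argument gets one space of padding when counting cells, two when counting characters, and none when counting bytes.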
Pádraig.
This bug report was last modified 6 years and 250 days ago.