GNU bug report logs - #9365
multibyte: tr: TR operates on bytes, not characters

Package: coreutils

Reported by: "Urmas" <davian818 <at> gmail.com>

Date: Thu, 25 Aug 2011 04:51:01 UTC

Severity: wishlist

Merged with 9569, 10880, 12192, 13362


Message #14 received at control <at> debbugs.gnu.org (full text, mbox):

From: Jim Meyering <jim <at> meyering.net>
To: Michael Stummvoll <michael <at> stummi.org>
Cc: 12192 <at> debbugs.gnu.org
Subject: Re: bug#12192: tr - bytes vs characters
Date: Sat, 15 Sep 2012 12:28:54 +0200

forcemerge 12192 9365
thanks

Michael Stummvoll wrote:
> Hi gnu folks,
>
> As is already known, tr cannot handle multibyte encodings like UTF-8:
>
>> mst <at> eddie:~$ echo "foo" | tr o ö
>> fÃÃ
>
> I know that multibyte encoding support is not needed for
> POSIX compliance, BUT:
>
> the manpage of tr says the following:
>
>> Translate, squeeze, and/or delete characters from standard input,
>> writing to standard output.
>
> and that's the inconsistency, IMHO.
>
> The typical interpretation of "character" in such a context is one
> character on the display, regardless of which encoding is used or how
> many bytes are used to represent it. So, if tr really translates
> "characters", it should preserve the encoding. If it doesn't, it does
> not translate "characters" but "bytes". So I see two ways:
>
> - add multibyte encoding support to tr
> or
> - change the manpage and help text to say "bytes" instead of "characters"
>
> Since it doesn't seem that anybody wants to add the support to tr, an
> update of the manpage would be the easier way to ensure consistency.

Thanks for the report.
I'm merging this issue with the others that relate to tr
and multi-byte support.
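
For readers who want to see why the quoted example prints "fÃÃ", the
following is a minimal Python sketch of the difference between byte-wise
and character-wise translation. It is not tr's source code; the helper
names translate_bytes and translate_chars are hypothetical, and the sketch
assumes UTF-8 input, a SET2 at least as long as SET1, and that excess
bytes in SET2 are ignored, as GNU tr documents.

```python
# Illustrative simulation only; function names are hypothetical,
# and this is not how tr is implemented internally.

def translate_bytes(data: bytes, set1: bytes, set2: bytes) -> bytes:
    """Map each byte of set1 to the byte at the same position in set2.

    Assumes len(set2) >= len(set1); excess bytes of set2 are dropped,
    mirroring the documented GNU tr behaviour for a longer SET2.
    """
    table = bytes.maketrans(set1, set2[:len(set1)])
    return data.translate(table)


def translate_chars(text: str, set1: str, set2: str) -> str:
    """Map each character (code point) of set1 to its partner in set2."""
    return text.translate(str.maketrans(set1, set2[:len(set1)]))


# "ö" is two bytes in UTF-8 (0xC3 0xB6); "o" is one byte (0x6F).
# Byte-wise, only the first byte of "ö" pairs up with "o".
raw = translate_bytes("foo".encode("utf-8"),
                      "o".encode("utf-8"),
                      "ö".encode("utf-8"))
print(raw)                    # b'f\xc3\xc3'  (not valid UTF-8)
print(raw.decode("latin-1"))  # fÃÃ  (how a Latin-1 view renders it)

# Character-wise translation gives what the reporter expected.
print(translate_chars("foo", "o", "ö"))  # föö
```

The byte-wise result contains two lone 0xC3 lead bytes, which is why a
terminal shows garbage instead of "föö".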




This bug report was last modified 6 years and 245 days ago.


GNU bug tracking system
Copyright (C) 1999 Darren O. Benham, 1997,2003 nCipher Corporation Ltd, 1994-97 Ian Jackson.