GNU bug report logs - #10880
multibyte: tr: TR operates on bytes, not characters


Package: coreutils

Reported by: "Marton Kadar" <marton.kadar <at> mail.com>

Date: Fri, 24 Feb 2012 17:31:02 UTC

Severity: wishlist

Merged with 9365, 9569, 12192, 13362


From: Chris Jones <cjns1989 <at> gmail.com>
To: 10880 <at> debbugs.gnu.org
Subject: bug#10880: instead of characters, tr works on bytes
Date: Mon, 27 Feb 2012 00:44:56 -0500

On Fri, Feb 24, 2012 at 09:29:12AM EST, Marton Kadar wrote:

[..]

> > $ set | grep ^L
> > LANG=hu_HU.UTF-8
> > LC_ALL=hu_HU.UTF-8
> > LINES=73
> > LOGNAME=kadar1marto518
> > 
> > Now let's see the bytestream for the following string
> > (which means flood in Hungarian):
> > 
> > $ echo árvíz | od -c
> > 0000000 303 241   r   v 303 255   z  \n
> > 0000010
> > 
> > Let us try to delete a character and see if it worked:
> > 
> > $ echo árvíz | tr -d á | od -c
> > 0000000   r   v 255   z  \n
> > 0000005

[..]

Try this for size...

$ echo árvíz | od -t x1z -w16 
$ echo árvíz | tr -d é | od -t x1z -w16 

$ echo árvíz | tr -d é > /tmp/u.txt
$ isutf8 /tmp/u.txt

And there is not even an ‘é’ in ‘árvíz’.
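
To make the byte-level effect explicit, here is a minimal sketch of the
same experiment. It writes the UTF-8 bytes as octal escapes so it does not
depend on the terminal's encoding. The helper names word and dump are made
up for illustration, and the iconv check is only an assumed stand-in for
isutf8 (which ships with moreutils and may not be installed):

```shell
# UTF-8 bytes of "árvíz\n": á = 303 241, í = 303 255 (octal).
word() { printf '\303\241rv\303\255z\n'; }

# Dump stdin as octal bytes, collapsed onto one space-separated line.
dump() { echo $(od -An -v -t o1); }

# tr takes its argument as a set of single bytes. "é" is 303 251, so every
# 303 byte is deleted even though the input contains no "é".
bytewise=$(word | LC_ALL=C tr -d '\303\251' | dump)

# Deleting the two-byte sequence for "á" (303 241) as a unit leaves "í" intact.
a=$(printf '\303\241')
seqwise=$(word | LC_ALL=C sed "s/$a//g" | dump)

echo "tr -d é:   $bytewise"
echo "sed s/á//: $seqwise"

# Rough validity check: iconv should fail on malformed UTF-8 input.
if word | LC_ALL=C tr -d '\303\251' | iconv -f UTF-8 -t UTF-8 >/dev/null 2>&1
then echo "tr output is valid UTF-8"
else echo "tr output is NOT valid UTF-8"
fi
```

The tr run should leave 241 162 166 255 172 012, i.e. two orphaned
continuation bytes, while the sed run should leave 162 166 303 255 172 012,
which is still well-formed.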

CJ

P.S. Though you do have to look for it a bit, the coreutils manual
clearly states that only single-byte encodings are supported: 

http://www.gnu.org/software/coreutils/manual/html_node/tr-invocation.html

-- 
Mooo Canada!!!!





