GNU bug report logs - #10880
multibyte: tr: TR operates on bytes, not characters
On Fri, Feb 24, 2012 at 09:29:12AM EST, Marton Kadar wrote:
[..]
> > $ set | grep ^L
> > LANG=hu_HU.UTF-8
> > LC_ALL=hu_HU.UTF-8
> > LINES=73
> > LOGNAME=kadar1marto518
> >
> > Now let's see the bytestream for the following string
> > (which means flood in Hungarian):
> >
> > $ echo árvíz | od -c
> > 0000000 303 241 r v 303 255 z \n
> > 0000010
> >
> > Let us try to delete a character and see if it worked:
> >
> > $ echo árvíz | tr -d á | od -c
> > 0000000 r v 255 z \n
> > 0000005
[..]
Try this for size...
$ echo árvíz | od -t x1z -w16
$ echo árvíz | tr -d é | od -t x1z -w16
$ echo árvíz | tr -d é > /tmp/u.txt
$ isutf8 /tmp/u.txt
And there is not even an ‘é’ in ‘árvíz’.
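To spell out why that happens: in UTF-8, ‘é’ is the byte pair 303 251, while ‘á’ is 303 241 and ‘í’ is 303 255. A byte-oriented tr -d é therefore deletes every 303 byte and every 251 byte, which strips the lead bytes of ‘á’ and ‘í’ and leaves their continuation bytes dangling. Here is a minimal sketch of the same experiment written with octal escapes, so it does not depend on the terminal encoding. The function name is made up for illustration, and setting LC_ALL=C is my assumption, used only to make the byte semantics explicit.

```shell
# Hypothetical helper: feed the UTF-8 bytes of "árvíz\n" through tr and
# print what is left as one hex string.
#   \303\241 = á    \303\255 = í    \303\251 = é   (UTF-8, octal escapes)
delete_e_acute() {
    printf '\303\241rv\303\255z\n' |
        LC_ALL=C tr -d '\303\251' |   # deletes bytes 303 and 251 individually
        od -An -t x1 | tr -d ' \n'    # hex dump with the whitespace squeezed out
}

delete_e_acute; echo
# prints a17276ad7a0a: the stray continuation bytes a1 and ad survive,
# but both c3 lead bytes are gone, so the result is no longer valid UTF-8.
```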
CJ
P.S. Though you do have to look for it a bit, the coreutils manual
clearly states that only single-byte encodings are supported:
http://www.gnu.org/software/coreutils/manual/html_node/tr-invocation.html
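One possible workaround, offered as a sketch rather than a recommendation from the manual: sed substitutes a whole sequence instead of treating its argument as a set of independent bytes, so deleting ‘á’ removes the pair 303 241 together. This assumes a sed that either understands the UTF-8 locale or simply matches the pattern as a plain byte sequence. The function name is hypothetical.

```shell
# Hypothetical helper: delete "á" (UTF-8 bytes 303 241) as one unit.
# printf builds the sed script so the example does not depend on the
# encoding of the terminal or of this file.
delete_a_acute() {
    sed "$(printf 's/\303\241//g')"
}

printf '\303\241rv\303\255z\n' | delete_a_acute | od -An -t x1 | tr -d ' \n'; echo
# prints 7276c3ad7a0a: "rvíz\n", with the í (c3 ad) left intact.
```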
--
Mooo Canada!!!!
This bug report was last modified 6 years and 304 days ago.
GNU bug tracking system
Copyright (C) 1999 Darren O. Benham,
1997,2003 nCipher Corporation Ltd,
1994-97 Ian Jackson.