GNU bug report logs - #10880
multibyte: tr: TR operates on bytes, not characters
I don't know the official way to report a bug in 'tr',
so I am copying this list too. Please CC me on replies, as I am
not subscribed.
> ----- Original Message -----
> From: Marton Kadar
> Sent: 02/24/12 03:18 PM
> To: 9365 <at> debbugs.gnu.org
> Subject: Example
>
> The environment is a Hungarian locale, where á and í are proper
> lowercase letters (Spanish, for example, has these letters too):
>
> $ set | grep ^L
> LANG=hu_HU.UTF-8
> LC_ALL=hu_HU.UTF-8
> LINES=73
> LOGNAME=kadar1marto518
>
> Now let's look at the byte stream for the following string
> (which means "flood" in Hungarian):
>
> $ echo árvíz | od -c
> 0000000 303 241 r v 303 255 z \n
> 0000010
>
> Let us try to delete a character and see if it worked:
>
> $ echo árvíz | tr -d á | od -c
> 0000000 r v 255 z \n
> 0000005
>
> The correct, expected behavior would instead be:
>
> $ echo árvíz | tr -d á | od -c
> 0000000 r v 303 255 z \n
> 0000006
>
> I'll check the source for tr myself, although I have never coded in C.
> This should be a trivial fix. The problem is especially annoying
> because we currently have no simple, good general-purpose case
> conversion tool (correct me if I'm wrong, but tr should be this
> tool).
>
> Marton Kadar
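
The difference between the observed and the expected output in the report above can be reproduced outside of tr. The following is only an illustrative sketch in Python, not tr's actual implementation: the helper names (delete_bytewise, delete_charwise, octal_dump) are hypothetical, and the character-wise variant assumes the input is valid UTF-8. The byte-wise version removes every byte that appears in the encoding of á (octal 303 and 241), which also strips the 303 lead byte of í; the character-wise version removes only the whole character.

```python
# Illustrative sketch only; these helpers are hypothetical and are
# not taken from the tr source code.

A_ACUTE = "\u00e1"               # á, encoded in UTF-8 as octal 303 241
SAMPLE = "\u00e1rv\u00edz\n"     # "árvíz" followed by a newline


def delete_bytewise(data: bytes, chars: str) -> bytes:
    """Delete every byte that occurs in the UTF-8 encoding of `chars`."""
    unwanted = set(chars.encode("utf-8"))
    return bytes(b for b in data if b not in unwanted)


def delete_charwise(data: bytes, chars: str) -> bytes:
    """Decode as UTF-8 (assumed valid), delete whole characters, re-encode."""
    text = data.decode("utf-8")
    kept = "".join(ch for ch in text if ch not in chars)
    return kept.encode("utf-8")


def octal_dump(data: bytes) -> str:
    """Render bytes loosely like `od -c`: printable ASCII as-is, newline
    as a backslash escape, everything else as three octal digits."""
    cells = []
    for b in data:
        if b == 0x0A:
            cells.append("\\n")
        elif 0x20 < b < 0x7F:
            cells.append(chr(b))
        else:
            cells.append(format(b, "03o"))
    return " ".join(cells)


raw = SAMPLE.encode("utf-8")
print(octal_dump(raw))                            # 303 241 r v 303 255 z \n
print(octal_dump(delete_bytewise(raw, A_ACUTE)))  # r v 255 z \n
print(octal_dump(delete_charwise(raw, A_ACUTE)))  # r v 303 255 z \n
```

The second line of output matches the behavior reported for tr -d á, and the third matches the behavior the report asks for. The stray 255 byte left by the byte-wise variant is no longer valid UTF-8 on its own, which is why the result of the reported command is corrupted rather than merely different.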
This bug report was last modified 6 years and 303 days ago.