GNU bug report logs - #10880
multibyte: tr: tr operates on bytes, not characters


Package: coreutils;

Reported by: "Marton Kadar" <marton.kadar <at> mail.com>

Date: Fri, 24 Feb 2012 17:31:02 UTC

Severity: wishlist

Merged with 9365, 9569, 12192, 13362




From: "Marton Kadar" <marton.kadar <at> mail.com>
To: 10880 <at> debbugs.gnu.org
Subject: bug#10880: instead of characters, tr works on bytes
Date: Fri, 24 Feb 2012 09:29:12 -0500
I don't know the official way to report a bug in 'tr',
so I am copying this to the list too. Please CC me on replies,
as I am not subscribed.

> ----- Original Message -----
> From: Marton Kadar
> Sent: 02/24/12 03:18 PM
> To: 9365 <at> debbugs.gnu.org
> Subject: Example
> 
> Environment for Hungary, where á and í are proper lowercase letters
> (Spanish, for example, has these letters too):
> 
> $ set | grep ^L
> LANG=hu_HU.UTF-8
> LC_ALL=hu_HU.UTF-8
> LINES=73
> LOGNAME=kadar1marto518
> 
> Now let's see the byte stream for the following string
> (which means "flood" in Hungarian):
> 
> $ echo árvíz | od -c
> 0000000 303 241   r   v 303 255   z  \n
> 0000010
> 
> Let us try to delete a character and see whether it works:
> 
> $ echo árvíz | tr -d á | od -c
> 0000000   r   v 255   z  \n
> 0000005
> 
> The correct, expected behavior would be:
> 
> $ echo árvíz | tr -d á | od -c
> 0000000   r   v 303 255   z  \n
> 0000006
> 
> I'll check the source for tr myself, although I have never coded in C.
> This should be a trivial fix. The problem is especially annoying,
> as we currently have no simple, good general-purpose case
> conversion tool (correct me if I'm wrong, but tr should be this
> tool).
> 
> Marton Kadar
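
The byte values in the report follow from UTF-8: á is U+00E1, encoded as the bytes 0303 0241, and í is U+00ED, encoded as 0303 0255. The sketch below (assuming GNU tr, sed and od, and a script saved as UTF-8; the variable name `input` is only illustrative) spells the input out as octal escapes so the bytes are explicit. Because tr treats its SET argument as individual bytes, `tr -d á` behaves like `tr -d '\303\241'`: it deletes every 0303 byte and every 0241 byte, including the 0303 that begins í. The sed line is one possible workaround, not something proposed in the report: its pattern matches á as a whole (one character in a UTF-8 locale, a contiguous two-byte sequence in the C locale), so only complete occurrences of the letter are removed.

```shell
#!/bin/sh
# Input from the report ("árvíz" plus newline) as octal byte escapes:
# á = 0303 0241, í = 0303 0255 in UTF-8.
input='\303\241rv\303\255z\n'

# tr treats each byte of SET as a separate member, so this deletes every
# 0303 and every 0241 byte, leaving a stray 0255 where í used to be.
printf "$input" | tr -d '\303\241' | od -c

# sed matches 'á' as a complete pattern, so only the whole letter is
# removed and í (0303 0255) survives intact.
printf "$input" | sed 's/á//g' | od -c
```

Run in the reporter's environment, the first pipeline should reproduce the 5-byte output shown in the report, and the second should match the 6-byte output the report gives as the expected behavior.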
