GNU bug report logs - #9365
multibyte: tr: TR operates on bytes, not characters


Package: coreutils

Reported by: "Urmas" <davian818 <at> gmail.com>

Date: Thu, 25 Aug 2011 04:51:01 UTC

Severity: wishlist

Merged with 9569, 10880, 12192, 13362

To reply to this bug, email your comments to 9365 AT debbugs.gnu.org.




Report forwarded to owner <at> debbugs.gnu.org, bug-coreutils <at> gnu.org:
bug#9365; Package coreutils. (Thu, 25 Aug 2011 04:51:02 GMT)

Acknowledgement sent to "Urmas" <davian818 <at> gmail.com>:
New bug report received and forwarded. Copy sent to bug-coreutils <at> gnu.org. (Thu, 25 Aug 2011 04:51:02 GMT)

Message #5 received at submit <at> debbugs.gnu.org:

From: "Urmas" <davian818 <at> gmail.com>
To: <bug-coreutils <at> gnu.org>
Subject: TR operates on bytes, not characters
Date: Thu, 25 Aug 2011 09:01:14 +0700
[coreutils 8.5]
tr treats UTF-8 characters in SET1/SET2 as byte sequences.
Correct either this behavior or the manual, which states that SET1 and SET2 are sets of 'characters', not bytes.
 

Information forwarded to bug-coreutils <at> gnu.org:
bug#9365; Package coreutils. (Fri, 24 Feb 2012 17:30:02 GMT)

Message #8 received at 9365 <at> debbugs.gnu.org:

From: "Marton Kadar" <marton.kadar <at> mail.com>
To: 9365 <at> debbugs.gnu.org
Subject: Example
Date: Fri, 24 Feb 2012 09:18:24 -0500
The environment is a Hungarian locale, where á and í are proper lowercase
letters (Spanish, for example, has these letters too):

$ set | grep ^L
LANG=hu_HU.UTF-8
LC_ALL=hu_HU.UTF-8
LINES=73
LOGNAME=kadar1marto518

Now let's see the bytestream for the following string
(which means flood in Hungarian):

$ echo árvíz | od -c
0000000 303 241   r   v 303 255   z  \n
0000010

Let us try to delete a character and see if it worked:

$ echo árvíz | tr -d á | od -c
0000000   r   v 255   z  \n
0000005

The correct, expected behavior would instead be:

$ echo árvíz | tr -d á | od -c
0000000   r   v 303 255   z  \n
0000006

I'll check the source for tr myself, although I have never coded in C.
This should be a trivial fix. The problem is especially annoying
because we currently have no really simple, good general-purpose case
conversion tool (correct me if I'm wrong, but tr should be this
tool).

Marton Kadar
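
The two od dumps in the message above can be reproduced outside of tr. The following Python sketch is only an illustration of the difference between deleting bytes and deleting characters; it is not how tr is implemented, and the helper names (delete_bytewise, delete_charwise, od_c) are hypothetical, introduced here for the example. It assumes the input is valid UTF-8.

```python
# Sketch only: models the reported behavior, not the actual tr source.
# All function names below are hypothetical helpers for illustration.

def delete_bytewise(data: bytes, set1: bytes) -> bytes:
    """Drop every byte of `data` whose value occurs anywhere in `set1`."""
    unwanted = set(set1)  # individual byte values, e.g. {0xC3, 0xA1}
    return bytes(b for b in data if b not in unwanted)


def delete_charwise(data: bytes, set1: str, encoding: str = "utf-8") -> bytes:
    """Decode first, drop whole characters found in `set1`, then re-encode."""
    text = data.decode(encoding)
    kept = "".join(ch for ch in text if ch not in set1)
    return kept.encode(encoding)


def od_c(data: bytes) -> list:
    """Render bytes roughly like `od -c`: octal for non-ASCII bytes."""
    cells = []
    for b in data:
        if b == 0x0A:
            cells.append("\\n")
        elif 0x20 <= b < 0x7F:
            cells.append(chr(b))
        else:
            cells.append(format(b, "03o"))
    return cells


a_acute = "\u00e1"                            # á, encoded as C3 A1 in UTF-8
line = "\u00e1rv\u00edz\n".encode("utf-8")    # "árvíz\n" as bytes

print(od_c(line))
print(od_c(delete_bytewise(line, a_acute.encode("utf-8"))))  # reported result
print(od_c(delete_charwise(line, a_acute)))                  # expected result
```

Because á and í share the lead byte 303 (0xC3), deleting the two bytes of á also strips the lead byte of í and leaves a lone 255 behind. This matches the 5-byte dump reported above, while the character-wise variant yields the 6-byte dump given as the expected result.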




Forcibly Merged 9365 9569. Request was from Paul Eggert <eggert <at> cs.ucla.edu> to control <at> debbugs.gnu.org. (Fri, 24 Feb 2012 18:29:02 GMT)

Forcibly Merged 9365 9569 10880. Request was from Paul Eggert <eggert <at> cs.ucla.edu> to control <at> debbugs.gnu.org. (Fri, 24 Feb 2012 18:33:02 GMT)

Forcibly Merged 9365 9569 10880 12192. Request was from Jim Meyering <jim <at> meyering.net> to control <at> debbugs.gnu.org. (Sat, 15 Sep 2012 10:30:03 GMT)

Forcibly Merged 9365 9569 10880 12192 13362. Request was from Pádraig Brady <P <at> draigBrady.com> to control <at> debbugs.gnu.org. (Sun, 06 Jan 2013 12:24:03 GMT)

Severity set to 'wishlist' from 'normal'. Request was from Assaf Gordon <assafgordon <at> gmail.com> to control <at> debbugs.gnu.org. (Mon, 15 Oct 2018 14:07:02 GMT)

Changed bug title to 'multibyte: tr: TR operates on bytes, not characters' from 'TR operates on bytes, not characters'. Request was from Assaf Gordon <assafgordon <at> gmail.com> to control <at> debbugs.gnu.org. (Mon, 15 Oct 2018 14:07:02 GMT)

This bug report was last modified 6 years and 244 days ago.


