GNU bug report logs - #69488
tr (question)


Package: coreutils

Reported by: lacsaP Patatetom <patatetom <at> gmail.com>

Date: Fri, 1 Mar 2024 15:35:02 UTC

Severity: normal



Message #8 received at 69488 <at> debbugs.gnu.org (full text, mbox):

From: Pádraig Brady <P <at> draigBrady.com>
To: lacsaP Patatetom <patatetom <at> gmail.com>, 69488 <at> debbugs.gnu.org
Subject: Re: bug#69488: tr (question)
Date: Fri, 1 Mar 2024 19:30:33 +0000
On 01/03/2024 15:33, lacsaP Patatetom wrote:
> hi,
> 
> I did a few tests with tr and I'm surprised by the results...
> 
> $ echo éèçà
> éèçà
> 
> these characters are encoded in utf-8 on 2 bytes :
> 
> $ echo éèçà | xxd
> 00000000: c3a9 c3a8 c3a7 c3a0 0a                   .........
> 
> now I use tr to remove non-printable characters :
> 
> $ echo éèçà | tr -cd '[:print:]'
> $ echo éèçà | tr -cd '[:print:]' | wc
>        0       0       0
> 
> all characters are deleted by tr
> now I want to keep the "é" character :
> 
> $ echo éèçà | tr -cd '[:print:]é'
> ��
> 
> why do the "�" characters appear ?
> 
> regards, lacsaP.


It's a known issue that tr is currently not multi-byte aware.

thanks,
Pádraig
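
To make the byte-wise behaviour concrete, here is a minimal sketch that dumps the bytes surviving the filter instead of printing them to a terminal. It assumes the C locale (where [:print:] covers only bytes 0x20-0x7e) and spells the two UTF-8 bytes of "é" (c3 a9) as octal escapes; the function name keep_printable_and_e_acute is made up for this illustration.

```shell
# Sketch under stated assumptions: C locale, tr working one byte at a time.
# '\303\251' is the octal spelling of the UTF-8 bytes of "é" (c3 a9), so the
# set given to tr is "printable ASCII, plus byte c3, plus byte a9".
keep_printable_and_e_acute() {
  LC_ALL=C tr -cd '[:print:]\303\251'
}

# Feed the UTF-8 bytes of "éèçà" plus a newline, then dump what is left in hex.
printf '\303\251\303\250\303\247\303\240\n' |
  keep_printable_and_e_acute |
  od -An -tx1
# bytes kept: c3 a9 c3 c3 c3
```

Every c3 lead byte is kept because it belongs to "é", while the trailing bytes a8, a7 and a0 of the other characters are deleted. The leftover c3 bytes are incomplete UTF-8 sequences, which a terminal typically renders as "�"; how many of them show up depends on the terminal.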




This bug report was last modified 1 year and 103 days ago.


