On Fri, 1 Mar 2024 at 20:30, Pádraig Brady <P@draigbrady.com> wrote:
On 01/03/2024 15:33, lacsaP Patatetom wrote:
> hi,
>
> I did a few tests with tr and I'm surprised by the results...
>
> $ echo éèçà
> éèçà
>
> these characters are encoded in UTF-8 as 2 bytes each:
>
> $ echo éèçà | xxd
> 00000000: c3a9 c3a8 c3a7 c3a0 0a                   .........
>
> now I use tr to remove non-printable characters:
>
> $ echo éèçà | tr -cd '[:print:]'
> $ echo éèçà | tr -cd '[:print:]' | wc
>        0       0       0
>
> all characters are deleted by tr.
> now I want to keep the "é" character:
>
> $ echo éèçà | tr -cd '[:print:]é'
> ��
>
> why do the "�" characters appear?
>
> regards, lacsaP.


It's a known issue that tr is currently not multi-byte aware.

thanks,
Pádraig
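A minimal shell sketch of where the stray bytes come from, assuming GNU coreutils `tr` (which handles its arguments and its input one byte at a time) and `od` to dump the result. Because tr is byte-oriented, the `é` in the set simply adds its two bytes, 0xC3 and 0xA9, so every 0xC3 lead byte survives while the continuation bytes of è, ç and à are deleted:

```shell
# Assumes GNU tr: SET1 is a set of single bytes, so 'é' contributes
# the two bytes 0xc3 and 0xa9 rather than one character.
printf 'éèçà\n' | tr -cd '[:print:]é' | od -An -tx1
# Kept: c3 a9 (a complete "é") plus three lone c3 lead bytes whose
# continuation bytes (a8, a7, a0) were deleted, as was the newline.
```

The lone 0xC3 bytes are not valid UTF-8 on their own, which is why the terminal shows replacement characters.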
hi,

thank you for this clarification.

what alternative to `tr` would you recommend for this kind of processing?

regards, lacsaP.