GNU bug report logs - #26362
multibyte: tr: "tr -cd" -- Problem with UTF-8?

Package: coreutils;

Reported by: Ronald Schaten <ronald <at> schatenseite.de>

Date: Tue, 4 Apr 2017 15:25:02 UTC

Severity: wishlist

Tags: notabug


Message #5 received at submit <at> debbugs.gnu.org (full text, mbox):

From: Ronald Schaten <ronald <at> schatenseite.de>
To: bug-coreutils <at> gnu.org
Subject: tr -cd -- Problem with UTF-8?
Date: Tue, 4 Apr 2017 16:01:52 +0200
Hey...

I'm not sure if this is a bug or if I'm using it wrong. As a matter of
fact, I tested this on several systems, and on BSD-based systems (Mac)
the tr tool gives a different result -- the one I expected.

The simplest way to reproduce this looks like this (sorry, umlaut
ahead):

$ echo -ne "\xc3\x82" | tr -cd "ä" | xxd
% 00000000: c3                                       .

The echo prints a capital A with a circumflex (Â), and I expect the tr
command to delete everything except the small umlaut ä. It looks as if
tr just deletes the second byte.

When I try without the umlaut it gives me the empty result, as expected:

$ echo -ne "\xc3\x82" | tr -cd "a" | xxd
[empty result]
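
One way to read both results, assuming tr compares single bytes rather
than whole UTF-8 characters (an assumption on my side, forced here with
LC_ALL=C so the sketch behaves the same on any tr): "ä" is the two bytes
c3 a4, so the set passed to -cd is really {c3, a4}, and the leading c3
of "Â" (c3 82) survives. The variable names below are only illustrative.

```shell
# Assumption: LC_ALL=C makes tr treat every byte as one character.
# "Â" is c3 82 in UTF-8 (octal \303\202); "ä" is c3 a4 (octal \303\244).

# Keep set {c3, a4}: byte 82 is deleted, byte c3 is kept.
kept=$(printf '\303\202' | LC_ALL=C tr -cd '\303\244' | od -An -tx1 | tr -d ' \n')

# Keep set {61} ("a"): neither c3 nor 82 is in it, so both are deleted.
empty=$(printf '\303\202' | LC_ALL=C tr -cd 'a' | od -An -tx1 | tr -d ' \n')

echo "kept=[$kept] empty=[$empty]"
```

If that assumption holds, the first command leaves c3 and the second
leaves nothing, which matches the two outputs above.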

I tested several systems: the oldest is a Debian with coreutils 8.5, the
newest an Ubuntu with coreutils 8.25.


For the moment, I'll try to solve my problem differently, but... is this
a bug? Thanks in advance!


Regards,
Ronald.

-- 
There is no reason for any individual to have a computer in his home.
(Ken Olsen, DEC)




This bug report was last modified 6 years and 292 days ago.


