GNU bug report logs - #16168
uniq mis-handles UTF8 (8bit) characters

Package: coreutils

Reported by: Shlomo Urbach <urbach <at> google.com>

Date: Mon, 16 Dec 2013 16:56:03 UTC

Severity: normal

Done: Pádraig Brady <P <at> draigBrady.com>

Bug is archived. No further changes may be made.

Message #5 received at submit <at> debbugs.gnu.org:

From: Shlomo Urbach <urbach <at> google.com>
To: bug-coreutils <at> gnu.org
Subject: uniq mis-handles UTF8 (8bit) characters
Date: Mon, 16 Dec 2013 15:50:15 +0200
Lines containing CJK characters are judged equal by length only, since the
characters themselves seem to be ignored.
I understand this is due to the locale.
Still, it would be nice if a simple flag enabled a locale-independent
comparison (i.e., two lines are equal only when all their bytes are equal).
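
The byte-wise comparison requested above can be sketched in Python. This is a minimal illustration under stated assumptions, not coreutils code: `uniq_bytes` and `_line_key` are hypothetical names, and the input is assumed to arrive as raw byte lines. Like uniq, it only collapses adjacent duplicates, so unsorted input would need sorting first. With uniq itself, the common workaround is to run it in the C locale (for example `LC_ALL=C uniq`), where lines compare byte by byte.

```python
from typing import Iterable, Iterator, Optional


def _line_key(line: bytes) -> bytes:
    # Hypothetical helper: ignore the trailing newline, so a final line
    # that lacks "\n" still matches an identical line just before it.
    return line[:-1] if line.endswith(b"\n") else line


def uniq_bytes(lines: Iterable[bytes]) -> Iterator[bytes]:
    """Drop adjacent lines that are byte-for-byte equal (hypothetical sketch).

    No locale collation is involved: two lines count as duplicates only if
    every byte matches, so distinct CJK characters of equal length differ.
    """
    previous: Optional[bytes] = None
    for line in lines:
        key = _line_key(line)
        if key != previous:
            yield line  # keep the first line of each run of duplicates
        previous = key
```

To filter standard input, one could call `sys.stdout.buffer.writelines(uniq_bytes(sys.stdin.buffer))` after `import sys`; reading through `sys.stdin.buffer` keeps the data as raw bytes, so no decoding or locale rules are applied.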

This bug report was last modified 11 years and 164 days ago.

GNU bug tracking system
Copyright (C) 1999 Darren O. Benham, 1997,2003 nCipher Corporation Ltd, 1994-97 Ian Jackson.