GNU bug report logs - #16168
uniq mis-handles UTF8 (8bit) characters

Package: coreutils;

Reported by: Shlomo Urbach <urbach <at> google.com>

Date: Mon, 16 Dec 2013 16:56:03 UTC

Severity: normal

Done: Pádraig Brady <P <at> draigBrady.com>

Bug is archived. No further changes may be made.

From: help-debbugs <at> gnu.org (GNU bug Tracking System)
To: Pádraig Brady <P <at> draigBrady.com>
Cc: tracker <at> debbugs.gnu.org
Subject: bug#16168: closed (uniq mis-handles UTF8 (8bit) characters)
Date: Mon, 16 Dec 2013 17:34:02 +0000
[Message part 1 (text/plain, inline)]
Your message dated Mon, 16 Dec 2013 17:33:23 +0000
with message-id <52AF3963.6020003 <at> draigBrady.com>
and subject line Re: bug#16168: uniq mis-handles UTF8 (8bit) characters
has caused the debbugs.gnu.org bug report #16168,
regarding uniq mis-handles UTF8 (8bit) characters
to be marked as done.

(If you believe you have received this mail in error, please contact
help-debbugs <at> gnu.org.)


-- 
16168: http://debbugs.gnu.org/cgi/bugreport.cgi?bug=16168
GNU Bug Tracking System
Contact help-debbugs <at> gnu.org with problems
[Message part 2 (message/rfc822, inline)]
From: Shlomo Urbach <urbach <at> google.com>
To: bug-coreutils <at> gnu.org
Subject: uniq mis-handles UTF8 (8bit) characters
Date: Mon, 16 Dec 2013 15:50:15 +0200
[Message part 3 (text/plain, inline)]
Lines with CJK letters are deemed equal by length only, since the
characters seem to be ignored.
I understand this is due to locale.
But, it would be nice if a simple flag would do a locale-free comparison
(i.e. equal = all bytes are equal).
[Message part 5 (message/rfc822, inline)]
From: Pádraig Brady <P <at> draigBrady.com>
To: Shlomo Urbach <urbach <at> google.com>
Cc: 16168-done <at> debbugs.gnu.org
Subject: Re: bug#16168: uniq mis-handles UTF8 (8bit) characters
Date: Mon, 16 Dec 2013 17:33:23 +0000
tag 16168 notabug
close 16168
stop

On 12/16/2013 01:50 PM, Shlomo Urbach wrote:
> Lines with CJK letters are deemed equal by length only, since the
> characters seem to be ignored.
> I understand this is due to locale.
> But, it would be nice if a simple flag would do a locale-free comparison
> (i.e. equal = all bytes are equal).

If you want to compare byte by byte:

LC_ALL=C uniq ....
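
A minimal sketch of the suggestion above, assuming a POSIX shell and GNU uniq. The sample lines and the variable names `input` and `result` are made up for illustration and do not come from the report. Setting LC_ALL=C for the uniq invocation alone makes it compare lines as raw bytes, so two different CJK lines of equal length stay distinct while a true duplicate is still collapsed:

```shell
# Hypothetical sample data: two distinct CJK lines of equal length,
# followed by an exact duplicate of the second line.
input=$(printf '%s\n' '日本語' '中国語' '中国語')

# Override the locale for this one command only, so uniq compares
# lines byte by byte instead of using the locale's collation rules.
result=$(printf '%s\n' "$input" | LC_ALL=C uniq)

# Two lines remain: the distinct lines are kept, the duplicate is dropped.
printf '%s\n' "$result"
```

Because the variable is set only on the command line of uniq, the rest of the shell session keeps its original locale.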

thanks,
Pádraig.


This bug report was last modified 11 years and 164 days ago.


GNU bug tracking system
Copyright (C) 1999 Darren O. Benham, 1997,2003 nCipher Corporation Ltd, 1994-97 Ian Jackson.