GNU bug report logs - #16168
uniq mis-handles UTF8 (8bit) characters


Package: coreutils;

Reported by: Shlomo Urbach <urbach <at> google.com>

Date: Mon, 16 Dec 2013 16:56:03 UTC

Severity: normal

Done: Pádraig Brady <P <at> draigBrady.com>

Bug is archived. No further changes may be made.


From: help-debbugs <at> gnu.org (GNU bug Tracking System)
To: Shlomo Urbach <urbach <at> google.com>
Subject: bug#16168: closed (Re: bug#16168: uniq mis-handles UTF8 (8bit)
 characters)
Date: Mon, 16 Dec 2013 17:34:02 +0000
[Message part 1 (text/plain, inline)]
Your bug report

#16168: uniq mis-handles UTF8 (8bit) characters

which was filed against the coreutils package, has been closed.

The explanation is attached below, along with your original report.
If you require more details, please reply to 16168 <at> debbugs.gnu.org.

-- 
16168: http://debbugs.gnu.org/cgi/bugreport.cgi?bug=16168
GNU Bug Tracking System
Contact help-debbugs <at> gnu.org with problems
[Message part 2 (message/rfc822, inline)]
From: Pádraig Brady <P <at> draigBrady.com>
To: Shlomo Urbach <urbach <at> google.com>
Cc: 16168-done <at> debbugs.gnu.org
Subject: Re: bug#16168: uniq mis-handles UTF8 (8bit) characters
Date: Mon, 16 Dec 2013 17:33:23 +0000
tag 16168 notabug
close 16168
stop

On 12/16/2013 01:50 PM, Shlomo Urbach wrote:
> Lines with CJK letters are deemed equal by length only, since the
> characters seem to be ignored.
> I understand this is due to locale.
> But, it would be nice if a simple flag would do a locale-free comparison
> (i.e. equal = all bytes are equal).

If you want to compare byte by byte:

LC_ALL=C uniq ....

thanks,
Pádraig.
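
As an illustration of the suggestion above, here is a minimal sketch of the byte-by-byte comparison. It assumes a POSIX shell and GNU uniq; the variable names `sample`, `bytewise`, and `expected` and the sample strings are hypothetical and do not come from the original report. Setting LC_ALL overrides LC_COLLATE and LANG, so uniq compares raw bytes no matter how the surrounding environment is configured.

```shell
# Hypothetical sample input: one repeated line, then a different line
# of the same length, all made of CJK characters.
sample=$(printf '%s\n' '日本' '日本' '中文')

# LC_ALL=C selects the C locale for this one command only, so uniq
# treats two lines as equal only when all of their bytes are equal.
bytewise=$(printf '%s\n' "$sample" | LC_ALL=C uniq)

# The adjacent duplicate is collapsed; the distinct line is kept.
printf '%s\n' "$bytewise"
```

Prefixing the variable to the command, rather than exporting it, leaves the locale of the rest of the shell session unchanged.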

[Message part 3 (message/rfc822, inline)]
From: Shlomo Urbach <urbach <at> google.com>
To: bug-coreutils <at> gnu.org
Subject: uniq mis-handles UTF8 (8bit) characters
Date: Mon, 16 Dec 2013 15:50:15 +0200
[Message part 4 (text/plain, inline)]
Lines with CJK letters are deemed equal by length only, since the
characters seem to be ignored.
I understand this is due to locale.
But, it would be nice if a simple flag would do a locale-free comparison
(i.e. equal = all bytes are equal).

This bug report was last modified 11 years and 164 days ago.



GNU bug tracking system
Copyright (C) 1999 Darren O. Benham, 1997,2003 nCipher Corporation Ltd, 1994-97 Ian Jackson.