GNU bug report logs - #11968: Bug in "uniq"
Reported by: Jaime Gaspar <mail <at> jaimegaspar.com>
Date: Tue, 17 Jul 2012 21:30:03 UTC
Severity: normal
Tags: notabug
Merged with 11967
Done: Eric Blake <eblake <at> redhat.com>
Bug is archived. No further changes may be made.
Message #9 received at control <at> debbugs.gnu.org:
forcemerge 11967 11968
tag 11967 notabug
thanks
On 07/17/2012 12:17 PM, Jaime Gaspar wrote:
> I think that there is a bug in "uniq" (version 8.13).
Is this your distro's build? However, I reproduced your result with the
latest coreutils.git (post-8.17), so this is not likely to be a bug in
a distro-specific multibyte patch.
>
> The file "bug.txt" attached consists of two lines:
> - the first one containing a character that
> looks like a "v" and a line break;
> - the second one containing a character that
> looks like an upside-down "v" and a line break.
> In hex:
>
> E2 88 A8 0A
> E2 88 A7 0A
Those glyphs that you describe line up with Unicode characters. I bet
you are using a locale with UTF-8 character encoding.
>
> When we run "uniq bug.txt" in a terminal, "uniq" outputs a single line, so "uniq" thinks that the two lines are equal, but they are not.
I can reproduce your symptoms, but only when I fudge my locale:
$ LC_ALL=C uniq ../bug.txt
∨
∧
$ LC_ALL=en_US.UTF-8 uniq ../bug.txt
∨
$
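For anyone without the attachment, the file can be rebuilt from the hex
dump quoted above. This is only a minimal sketch, assuming a POSIX shell;
the scratch directory and the variable names are illustrative and not part
of the original report. It writes the bytes with octal escapes (POSIX
printf does not guarantee \x) and counts the lines that uniq keeps when
lines are compared byte-wise:

```shell
# Scratch location is an assumption; only the file name follows the report.
dir=$(mktemp -d)

# E2 88 A8 0A / E2 88 A7 0A from the hex dump, written as octal escapes.
printf '\342\210\250\n\342\210\247\n' > "$dir/bug.txt"

# Under the C locale lines are compared byte by byte, so both survive.
kept=$(LC_ALL=C uniq "$dir/bug.txt" | wc -l | tr -d ' ')
echo "lines kept under LC_ALL=C: $kept"
```

With LC_ALL=C the count should be 2, matching the first command in the
session above. Substituting the name of a UTF-8 locale installed on your
system lets you check whether that locale folds the two lines.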
Remember, 'uniq' is required by POSIX to use the same line comparison
techniques as 'sort'; and 'sort' is required to use strcoll() (not
strcmp) to compare lines. And in your particular choice of locale,
strcoll() happens to state that '∨' and '∧' collate identically; hence
uniq is correct in stating that you have a duplicated line according to
your current locale.
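One way to check whether a given locale is folding lines like this is to
compare uniq's line count in that locale against the byte-wise count. The
following is a hedged sketch, assuming a POSIX shell; the function name
locale_folds_lines and the sample file are hypothetical helpers, not
anything shipped with coreutils:

```shell
# Hypothetical helper: succeeds when uniq in the current locale keeps
# fewer lines than byte-wise uniq, i.e. the locale's collation treats
# some adjacent lines as equal even though their bytes differ.
locale_folds_lines() {
    in_locale=$(uniq "$1" | wc -l | tr -d ' ')
    in_c=$(LC_ALL=C uniq "$1" | wc -l | tr -d ' ')
    [ "$in_locale" -lt "$in_c" ]
}

# Sample input: the two lines from the report, written as octal escapes.
sample=$(mktemp)
printf '\342\210\250\n\342\210\247\n' > "$sample"

if locale_folds_lines "$sample"; then
    echo "current locale folds lines that differ byte-wise"
else
    echo "current locale keeps both lines"
fi
```

Which branch runs depends on the locale in effect and on the collation
tables of the installed C library, so no particular output is claimed
here. If byte-wise behavior is what you want, running uniq with LC_ALL=C
sidesteps the question.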
$ LC_ALL=en_US.UTF-8 sort ../bug.txt -u --debug
sort: using ‘en_US.UTF-8’ sorting rules
∨
_
$
So I'm closing this as not a bug, along with a final pointer to our FAQ:
https://www.gnu.org/software/coreutils/faq/#Sort-does-not-sort-in-normal-order_0021
--
Eric Blake eblake <at> redhat.com +1-919-301-3266
Libvirt virtualization library http://libvirt.org
This bug report was last modified 13 years and 6 days ago.