GNU bug report logs - #11967
Bug in "uniq"


Package: coreutils

Reported by: Jaime Gaspar <mail <at> jaimegaspar.com>

Date: Tue, 17 Jul 2012 21:30:02 UTC

Severity: normal

Tags: notabug

Merged with 11968

Done: Eric Blake <eblake <at> redhat.com>

Bug is archived. No further changes may be made.


From: help-debbugs <at> gnu.org (GNU bug Tracking System)
To: Jaime Gaspar <mail <at> jaimegaspar.com>
Subject: bug#11967: closed (Re: bug#11967: Bug in "uniq")
Date: Tue, 17 Jul 2012 21:56:02 +0000
[Message part 1 (text/plain, inline)]
Your bug report

#11967: Bug in "uniq"

which was filed against the coreutils package, has been closed.

The explanation is attached below, along with your original report.
If you require more details, please reply to 11967 <at> debbugs.gnu.org.

-- 
11967: http://debbugs.gnu.org/cgi/bugreport.cgi?bug=11967
GNU Bug Tracking System
Contact help-debbugs <at> gnu.org with problems
[Message part 2 (message/rfc822, inline)]
From: Eric Blake <eblake <at> redhat.com>
To: Jaime Gaspar <mail <at> jaimegaspar.com>
Cc: control <at> debbugs.gnu.org, 11967-done <at> debbugs.gnu.org
Subject: Re: bug#11967: Bug in "uniq"
Date: Tue, 17 Jul 2012 15:49:38 -0600
[Message part 3 (text/plain, inline)]
forcemerge 11967 11968
tag 11967 notabug
thanks

On 07/17/2012 12:17 PM, Jaime Gaspar wrote:
> I think that there is a bug in "uniq" (version 8.13).

Is this your distro's build?  In any case, I reproduced your result with
the latest coreutils.git (post-8.17), so this is not likely to be a bug
in a distro-specific multibyte patch.

> 
> The file "bug.txt" attached consists of two lines:
> - the first one containing a character that
>   looks like a "v" and a line break;
> - the second one containing a character that
>   looks like an upside-down "v" and a line break.
> In hex:
> 
>     E2 88 A8  0A
>     E2 88 A7  0A

Those glyphs that you describe line up with Unicode characters.  I bet
you are using a locale with UTF-8 character encoding.
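
For anyone without the attachment, the file can be rebuilt from the hex dump.
The following is a minimal sketch, assuming a POSIX shell with printf and od;
the helper name make_bug_file is hypothetical and not part of the report.
E2 88 A8 is the UTF-8 encoding of U+2228 (LOGICAL OR) and E2 88 A7 is that of
U+2227 (LOGICAL AND), written here as octal escapes because those are the
portable form for printf.

```shell
# Hypothetical helper: write the two lines from the hex dump to the given path.
make_bug_file() {
    # \342\210\250 = E2 88 A8 (U+2228), \342\210\247 = E2 88 A7 (U+2227)
    printf '\342\210\250\n\342\210\247\n' > "$1"
}

make_bug_file bug.txt
# Dump the bytes in hex to check them against the report.
od -An -tx1 bug.txt
```

The two lines differ only in their final byte (A8 versus A7), so any
byte-wise comparison treats them as distinct.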

> 
> When we run "uniq bug.txt" in a terminal, "uniq" outputs a single line, so "uniq" thinks that the two lines are equal, but they are not.

I can reproduce your symptoms, but only when I fudge my locale:

$ LC_ALL=C uniq ../bug.txt
∨
∧
$ LC_ALL=en_US.UTF-8 uniq ../bug.txt
∨
$

Remember, 'uniq' is required by POSIX to use the same line comparison
techniques as 'sort'; and 'sort' is required to use strcoll() (not
strcmp()) to compare lines.  And in your particular choice of locale,
strcoll() happens to state that '∨' and '∧' collate identically; hence
uniq is correct in stating that you have a duplicated line according to
your current locale.
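
If byte-for-byte comparison is what is wanted, one option is to select the C
locale for that single command, where collation follows byte values.  A
minimal sketch, assuming a POSIX shell; the wrapper name uniq_bytewise is
hypothetical:

```shell
# Hypothetical wrapper: compare lines by byte value, whatever the caller's locale.
uniq_bytewise() {
    LC_ALL=C uniq "$@"
}

# The two lines from the report differ in their last byte, so both are kept.
printf '\342\210\250\n\342\210\247\n' | uniq_bytewise
```

Setting LC_ALL on the command line affects only that invocation, so the
rest of the session keeps its usual locale.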

$ LC_ALL=en_US.UTF-8 sort ../bug.txt -u --debug
sort: using ‘en_US.UTF-8’ sorting rules
∨
_
$

So I'm closing this as not a bug, along with a final pointer to our FAQ:

https://www.gnu.org/software/coreutils/faq/#Sort-does-not-sort-in-normal-order_0021

-- 
Eric Blake   eblake <at> redhat.com    +1-919-301-3266
Libvirt virtualization library http://libvirt.org



[signature.asc (application/pgp-signature, attachment)]
[Message part 5 (message/rfc822, inline)]
From: Jaime Gaspar <mail <at> jaimegaspar.com>
To: bug-coreutils <at> gnu.org
Subject: Bug in "uniq"
Date: Tue, 17 Jul 2012 10:17:43 -0800

Dear Sir or Madam,

I think that there is a bug in "uniq" (version 8.13).

The file "bug.txt" attached consists of two lines:
- the first one containing a character that
  looks like a "v" and a line break;
- the second one containing a character that
  looks like an upside-down "v" and a line break.
In hex:

    E2 88 A8  0A
    E2 88 A7  0A

When we run "uniq bug.txt" in a terminal, "uniq" outputs a single line, so "uniq" thinks that the two lines are equal, but they are not.

Regards,
Jaime Gaspar
_____________________________
Homepage: www.jaimegaspar.com
E-mail: mail <at> jaimegaspar.com


This bug report was last modified 13 years and 8 days ago.


