GNU bug report logs - #9780
sort -u throws out non-duplicates


Package: coreutils

Reported by: Bernhard Rosenkraenzer <bero <at> bero.eu>

Date: Tue, 18 Oct 2011 01:04:02 UTC

Severity: normal

Tags: moreinfo

Done: Jim Meyering <jim <at> meyering.net>

Bug is archived. No further changes may be made.


From: Eric Blake <eblake <at> redhat.com>
To: Bernhard Rosenkraenzer <bero <at> bero.eu>
Cc: 9780 <at> debbugs.gnu.org
Subject: bug#9780: sort -u throws out non-duplicates
Date: Mon, 17 Oct 2011 20:22:52 -0600
tag 9780 moreinfo
thanks

On 10/17/2011 06:59 PM, Bernhard Rosenkraenzer wrote:
> [bero@matterhorn tmp]$ wget http://bero.eu/java-source-list
> [...]
> [bero@matterhorn tmp]$ tr ' ' '\n' <java-source-list |sort |grep X509Certificate
> libcore/luni/src/main/java/java/security/cert/X509Certificate.java
> libcore/luni/src/main/java/javax/security/cert/X509Certificate.java
>
> This is correct...
>
> [bero@matterhorn tmp]$ tr ' ' '\n' <java-source-list |sort -u |grep X509Certificate
> libcore/luni/src/main/java/javax/security/cert/X509Certificate.java
>
> Note the missing .../java/java/security/cert/X509Certificate.java

Thanks for the report.  Unfortunately, you did not provide enough 
information to reproduce this - for example, what platform are you 
running on?  Can you narrow it down to a single file of say 5 or so 
lines?  Can you reproduce the problem with shorter input lines?

My guess, although I need more info to confirm it, is that this is not a 
bug, but rather that java-source-list contains some lines that differ in 
case and/or punctuation but happen to collate identically.  If so, then 
sort -u is picking the lower-case version as the unique line, at which 
point your grep for the case-sensitive X509Certificate is obviously failing.

The fact that you already proved that LC_ALL=C changes the behavior
lends credence to my supposition, since the C locale compares raw bytes,
but most other locales collate case-insensitively.  See also the FAQ:

https://www.gnu.org/software/coreutils/faq/#Sort-does-not-sort-in-normal-order_0021
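
One way to test this supposition is to compare what 'sort -u' keeps in the C locale with what it keeps in the current locale.  The following is only a sketch: dropped_by_locale_sort is a hypothetical helper name, not part of coreutils, and it assumes the input has already been split into one entry per line.

```shell
#!/bin/sh
# Hypothetical helper (not part of coreutils): print the lines that
# 'sort -u' keeps in the C locale but drops in the current locale.
dropped_by_locale_sort() {
    input=$1
    tmpdir=$(mktemp -d) || return 1

    # Byte-wise unique lines: the reference set.
    LC_ALL=C sort -u "$input" > "$tmpdir/bytes"

    # Unique lines under the ambient locale, re-sorted byte-wise so
    # that comm sees both inputs in the same order.
    sort -u "$input" | LC_ALL=C sort > "$tmpdir/locale"

    # Lines only in the first file compared equal to some other line
    # under the locale's collation rules and were discarded.
    LC_ALL=C comm -23 "$tmpdir/bytes" "$tmpdir/locale"

    rm -rf "$tmpdir"
}
```

For the input in this report, that might be run as `tr ' ' '\n' <java-source-list >list.txt` followed by `dropped_by_locale_sort list.txt`.  Any output names lines that the current locale treated as duplicates of another line; no output suggests the two locales agree on which lines are unique.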

> The problem occurs (at least) with sort from coreutils 8.12, 8.13 and 8.14.

Use 'sort --debug' to help decipher sort's behavior.  Here's my 
demonstration that I cannot reproduce it using coreutils.git with just 
two input lines:

$ printf 'libcore/luni/src/main/java/java/security/cert/X509Certificate.java\nlibcore/luni/src/main/java/javax/security/cert/X509Certificate.java\n' \
  | sort -u --debug
sort: using `en_US.UTF-8' sorting rules
libcore/luni/src/main/java/java/security/cert/X509Certificate.java
__________________________________________________________________
libcore/luni/src/main/java/javax/security/cert/X509Certificate.java
___________________________________________________________________

So there's definitely something else in java-source-list that we aren't 
seeing that is (probably correctly) affecting your output.
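
To look for such lines, a rough sketch like the following lists entries that differ only in letter case.  The function name case_variants is hypothetical, the input is assumed to have one entry per line, and 'uniq -D' is a GNU extension.  It covers only the case half of the guess, not punctuation.

```shell
#!/bin/sh
# Hypothetical helper: list lines that are distinct byte-wise but
# identical once letter case is ignored.
case_variants() {
    # 1. Drop exact duplicates byte-wise.
    # 2. Sort with case folded so case variants become adjacent.
    # 3. Print every member of each group that matches ignoring case.
    LC_ALL=C sort -u "$1" | LC_ALL=C sort -f | LC_ALL=C uniq -iD
}
```

Running `case_variants list.txt` on the split-out list would print each group of case variants together; feeding one such group to 'sort -u --debug' should then show which line survives.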

-- 
Eric Blake   eblake <at> redhat.com    +1-801-349-2682
Libvirt virtualization library http://libvirt.org




This bug report was last modified 12 years and 278 days ago.


GNU bug tracking system
Copyright (C) 1999 Darren O. Benham, 1997,2003 nCipher Corporation Ltd, 1994-97 Ian Jackson.