GNU bug report logs -
#9780
sort -u throws out non-duplicates
Reported by: Bernhard Rosenkraenzer <bero <at> bero.eu>
Date: Tue, 18 Oct 2011 01:04:02 UTC
Severity: normal
Tags: moreinfo
Done: Jim Meyering <jim <at> meyering.net>
Bug is archived. No further changes may be made.
Message #13 received at control <at> debbugs.gnu.org:
tag 9780 moreinfo
thanks
On 10/17/2011 06:59 PM, Bernhard Rosenkraenzer wrote:
> [bero <at> matterhorn tmp]$ wget http://bero.eu/java-source-list
> [...]
> [bero <at> matterhorn tmp]$ tr ' ' '\n' <java-source-list |sort |grep X509Certificate
> libcore/luni/src/main/java/java/security/cert/X509Certificate.java
> libcore/luni/src/main/java/javax/security/cert/X509Certificate.java
>
> This is correct...
>
> [bero <at> matterhorn tmp]$ tr ' ' '\n' <java-source-list |sort -u |grep X509Certificate
> libcore/luni/src/main/java/javax/security/cert/X509Certificate.java
>
> Note the missing .../java/java/security/cert/X509Certificate.java
Thanks for the report. Unfortunately, you did not provide enough
information to reproduce this. For example, what platform are you
running on? Can you narrow it down to a single file of, say, five or so
lines? Can you reproduce the problem with shorter input lines?
My guess, although I need more info to confirm it, is that this is not a
bug, but rather that java-source-list contains some lines that differ in
case and/or punctuation but happen to collate identically. If so, then
sort -u is keeping the lower-case version as the unique line, at which
point your case-sensitive grep for X509Certificate no longer matches it.
The fact that you already showed that LC_ALL=C changes the behavior
lends credence to my supposition, since the C locale compares raw bytes,
whereas many other locales give case and punctuation little or no weight
when collating. See also the FAQ:
https://www.gnu.org/software/coreutils/faq/#Sort-does-not-sort-in-normal-order_0021
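To make the locale point concrete, here is a minimal sketch. The two file
names are made up for illustration and are not taken from java-source-list.
In the C locale the lines differ byte-for-byte, so both must survive
sort -u; under another locale the result depends on the collation data
installed on the system, so no particular output is claimed for the second
command.

```shell
# Byte-wise comparison: 'F' and 'f' are different bytes, so both lines are kept.
printf 'Foo.java\nfoo.java\n' | LC_ALL=C sort -u

# Same input under whatever locale the environment selects. Whether the two
# lines collate identically (and one is discarded) depends on that locale.
printf 'Foo.java\nfoo.java\n' | sort -u
```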
> The problem occurs (at least) with sort from coreutils 8.12, 8.13 and 8.14.
Use 'sort --debug' to help decipher sort's behavior. Here's my
demonstration that I cannot reproduce it using coreutils.git with just
two input lines:
$ printf 'libcore/luni/src/main/java/java/security/cert/X509Certificate.java\nlibcore/luni/src/main/java/javax/security/cert/X509Certificate.java\n' \
    | sort -u --debug
sort: using `en_US.UTF-8' sorting rules
libcore/luni/src/main/java/java/security/cert/X509Certificate.java
__________________________________________________________________
libcore/luni/src/main/java/javax/security/cert/X509Certificate.java
___________________________________________________________________
So there's definitely something else in java-source-list that we aren't
seeing that is (probably correctly) affecting your output.
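One possible way to track down the offending lines, offered as a sketch
rather than a definitive recipe: compare what sort -u keeps in the C locale
against what it keeps in your own locale. The helper name dropped_lines and
the file name lines.txt are hypothetical; mktemp and comm are assumed to be
available, as they are on typical GNU systems.

```shell
# Hypothetical helper (not part of coreutils): list the lines that a
# byte-wise "sort -u" keeps but a locale-aware "sort -u" discards.
dropped_lines() {
    file=$1
    keep_c=$(mktemp) || return 1
    keep_loc=$(mktemp) || { rm -f "$keep_c"; return 1; }

    # Unique lines when every byte counts.
    LC_ALL=C sort -u "$file" > "$keep_c"
    # Unique lines under the caller's locale, re-sorted byte-wise so that
    # comm sees both inputs in the same order.
    sort -u "$file" | LC_ALL=C sort > "$keep_loc"

    # comm -23 prints only the lines unique to the first (byte-wise) file.
    LC_ALL=C comm -23 "$keep_c" "$keep_loc"
    rm -f "$keep_c" "$keep_loc"
}

# Possible usage with the reporter's data (assumes java-source-list exists):
#   tr ' ' '\n' <java-source-list >lines.txt
#   dropped_lines lines.txt
```

Any line this prints was treated as a duplicate of some other line by the
locale's collation rules; feeding that pair to sort -u --debug should show
which one it was paired with.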
--
Eric Blake eblake <at> redhat.com +1-801-349-2682
Libvirt virtualization library http://libvirt.org
This bug report was last modified 12 years and 278 days ago.