GNU bug report logs - #7323
sort bug

Package: coreutils

Reported by: Thomas A Schweiger <tom.schweiger <at> acxiom.com>

Date: Wed, 3 Nov 2010 14:53:02 UTC

Severity: normal

Done: Pádraig Brady <P <at> draigBrady.com>

Bug is archived. No further changes may be made.

From: Eric Blake <eblake <at> redhat.com>
To: Thomas A Schweiger <tom.schweiger <at> acxiom.com>
Cc: 7323 <at> debbugs.gnu.org
Subject: bug#7323: sort bug
Date: Wed, 03 Nov 2010 09:28:58 -0600
On 11/03/2010 08:52 AM, Thomas A Schweiger wrote:
> sort -t\| -k 10  source.dat

Most likely a bug in your usage, and not in sort.

> 
> 
> I get the following result:
> 
> 
> 7|1|1||MARY||JONES   |||19610202|||||| 1400 |N | eastwood | drive | || |Fayetteville|AR| 72701 | ||| |||||TEST||
> 5|1|1||MARY||JONES   |||19610203|||||| 1400 |N | eastwood | drive | || |Fayetteville|AR| 72701 | ||| |||||TEST||
> 9|1|1||Terry|a|Willis|||1961020|||||| 315 | E | Sutton | Street | || |Fayetteville|AR| 72701 | ||| |||||TEST||
> 1|1|1||Terry|a|Willis|||19610203|||||| 315 | E | Sutton | Street | || |Fayetteville|AR| 72701 | ||| |||||TEST||
> 3|1|1||Andy||smith   |||19610203|||||| 315 | | Willow | Street | || |Fayetteville|AR| 72701| ||| |||||TEST||
> 2|1|1||Terry| |Willis|||19610204|||||| 315 | E | Sutton | Street | || |Fayetteville|AR|72701 | ||| |||||TEST|| 
> 10|1|1||Robert|W|Travillian|||19610222|||||| 249 ||Murdoch|Street||||||51035| ||| |||||TEST||
> 11|1|1||Robert|W|Travillian|||19610222|||||||||||||||| ||| |||||TEST||
> 4|1|1||Andy||smith   |||19610302|||||| 315 | | Willow | Street | || |Fayetteville|AR| 72701 | ||| |||||TEST||
> 8|1|1||MARY||JONES   |||19615292|||||| 1400 |N | eastwood | drive | || |Fayetteville|AR| 72701 | ||| |||||TEST||
> 6|1|1||MARY||JONES   |||19660203|||||| 1400 |N | eastwood | drive | || |Fayetteville|AR| 72701 | ||| |||||TEST||
> 
> 
> Note in particular the location of record 9.

Where did you expect it to appear?  The latest coreutils 8.6 release
includes a --debug option that makes it more obvious what you did wrong
(I'm trimming down your example to a bare minimum):

$ printf '5|19610203|||||| 1400 |\n9|1961020|||||| 315 |\n1|19610203|||||| 315 |\n' |
    src/sort --debug -t\| -k2
src/sort: using `en_US.UTF-8' sorting rules
5|19610203|||||| 1400 |
 _____________________
______________________
9|1961020|||||| 315 |
 ___________________
____________________
1|19610203|||||| 315 |
 ____________________
_____________________

Notice that in the en_US.UTF-8 locale, punctuation does NOT affect
collation order.  And, since you explicitly requested that your key
start at field 10 (field 2 in my trimmed example) and extend to the end
of the line, 1961020315 (from row 9) collates less than 19610203315
(from row 1).
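
To make that comparison concrete, here is a rough sketch that builds the
strings the locale is effectively comparing.  It is only an
approximation: it assumes the separators and blanks carry no weight at
all, and it imitates that by deleting them with tr and then sorting
byte-wise.  Real locale collation is more involved than this, and the
variable names rows and keys are just for illustration.

```shell
# The three trimmed rows from the example above.
rows='5|19610203|||||| 1400 |
9|1961020|||||| 315 |
1|19610203|||||| 315 |'

# cut takes the -k2 key (field 2 through end of line); tr deletes the
# '|' separators and blanks, approximating what the locale ignores.
keys=$(printf '%s\n' "$rows" | cut -d'|' -f2- | tr -d '| ' | LC_ALL=C sort)
echo "$keys"
```

The three strings come out in the order of rows 5, 9 and 1, which is the
order the first sort invocation produced.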

But, if you instead require byte-wise sorting, and restrict your key to
JUST the field, you get results that I'm assuming you were expecting:

$ printf '5|19610203|||||| 1400 |\n9|1961020|||||| 315 |\n1|19610203|||||| 315 |\n' |
    LC_ALL=C src/sort --debug -t\| -k2,2
src/sort: using simple byte comparison
9|1961020|||||| 315 |
  _______
_____________________
1|19610203|||||| 315 |
  ________
______________________
5|19610203|||||| 1400 |
  ________
_______________________
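
Applied to the original command, the same two changes would look
roughly like the sketch below.  The sample file is hypothetical: four
rows cut down from the report, keeping the record number in field 1 and
the date in field 10.  Substitute source.dat for the temporary file to
try it on the real data.

```shell
# Hypothetical sample data, trimmed from the rows in the report.
tmp=$(mktemp)
cat > "$tmp" <<'EOF'
5|1|1||MARY||JONES   |||19610203|||||| 1400 |N |
9|1|1||Terry|a|Willis|||1961020|||||| 315 | E |
1|1|1||Terry|a|Willis|||19610203|||||| 315 | E |
7|1|1||MARY||JONES   |||19610202|||||| 1400 |N |
EOF

# -k10,10 ends the key at field 10; LC_ALL=C compares raw bytes.
# Rows with equal keys fall back to a whole-line comparison.
order=$(LC_ALL=C sort -t'|' -k10,10 "$tmp" | cut -d'|' -f1 | paste -sd' ' -)
echo "$order"
rm -f "$tmp"
```

Record 9 now sorts first, because its shorter date 1961020 is a prefix
of the others, and records 1 and 5 (same date) are ordered by the
whole-line tie-break.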


> The information contained in this communication is confidential,

It is considered poor netiquette to send emails to publicly archived
lists with disclaimers like this, since the very nature of public
archival makes this clause unenforceable.  You are better off using a
secondary account that does not add your employer's disclaimer on the end.

-- 
Eric Blake   eblake <at> redhat.com    +1-801-349-2682
Libvirt virtualization library http://libvirt.org
