GNU bug report logs - #7323: sort bug
Message #8 received at 7323 <at> debbugs.gnu.org:
On 11/03/2010 08:52 AM, Thomas A Schweiger wrote:
> sort -t\| -k 10 source.dat
Most likely a bug in your usage, and not in sort.
>
>
> I get the following result:
>
>
> 7|1|1||MARY||JONES |||19610202|||||| 1400 |N | eastwood | drive | || |Fayetteville|AR| 72701 | ||| |||||TEST||
> 5|1|1||MARY||JONES |||19610203|||||| 1400 |N | eastwood | drive | || |Fayetteville|AR| 72701 | ||| |||||TEST||
> 9|1|1||Terry|a|Willis|||1961020|||||| 315 | E | Sutton | Street | || |Fayetteville|AR| 72701 | ||| |||||TEST||
> 1|1|1||Terry|a|Willis|||19610203|||||| 315 | E | Sutton | Street | || |Fayetteville|AR| 72701 | ||| |||||TEST||
> 3|1|1||Andy||smith |||19610203|||||| 315 | | Willow | Street | || |Fayetteville|AR| 72701| ||| |||||TEST||
> 2|1|1||Terry| |Willis|||19610204|||||| 315 | E | Sutton | Street | || |Fayetteville|AR|72701 | ||| |||||TEST||
> 10|1|1||Robert|W|Travillian|||19610222|||||| 249 ||Murdoch|Street||||||51035| ||| |||||TEST||
> 11|1|1||Robert|W|Travillian|||19610222|||||||||||||||| ||| |||||TEST||
> 4|1|1||Andy||smith |||19610302|||||| 315 | | Willow | Street | || |Fayetteville|AR| 72701 | ||| |||||TEST||
> 8|1|1||MARY||JONES |||19615292|||||| 1400 |N | eastwood | drive | || |Fayetteville|AR| 72701 | ||| |||||TEST||
> 6|1|1||MARY||JONES |||19660203|||||| 1400 |N | eastwood | drive | || |Fayetteville|AR| 72701 | ||| |||||TEST||
>
>
> Note in particular the location of record 9.
Where did you expect it to appear? The latest coreutils 8.6 release
includes a --debug option that makes it more obvious what you did wrong
(I'm trimming down your example to a bare minimum):
$ printf '5|19610203|||||| 1400 |\n9|1961020|||||| 315 |\n1|19610203|||||| 315 |\n' | src/sort --debug -t\| -k2
src/sort: using `en_US.UTF-8' sorting rules
5|19610203|||||| 1400 |
_____________________
______________________
9|1961020|||||| 315 |
___________________
____________________
1|19610203|||||| 315 |
____________________
_____________________
Notice that in the en_US.UTF-8 locale, punctuation and spaces do NOT
affect collation order. And since your key specification (-k 10, with
no end field) starts the key at field 10 and extends it to the end of
the line, the key for row 9 effectively compares as 1961020315, which
collates less than 19610203315 from row 1.
But if you instead request byte-wise sorting (LC_ALL=C) and restrict
your key to JUST the field (-k2,2 here, -k10,10 for your real data),
you get the results I assume you were expecting:
$ printf '5|19610203|||||| 1400 |\n9|1961020|||||| 315 |\n1|19610203|||||| 315 |\n' | LC_ALL=C src/sort --debug -t\| -k2,2
src/sort: using simple byte comparison
9|1961020|||||| 315 |
_______
_____________________
1|19610203|||||| 315 |
________
______________________
5|19610203|||||| 1400 |
________
_______________________
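Applied to the full records from your report, the same fix would look roughly like the sketch below. It assumes, as in your original command, that the date is field 10; the three rows are trimmed after the street number, and the variable names rows and sorted are just for illustration:

```shell
#!/bin/sh
# Three rows from the report, trimmed after the street number.
# Field 10 (counting '|'-separated fields from 1) holds the date.
rows='5|1|1||MARY||JONES |||19610203|||||| 1400 |
9|1|1||Terry|a|Willis|||1961020|||||| 315 |
1|1|1||Terry|a|Willis|||19610203|||||| 315 |'

# -t'|'    : split fields on '|'
# -k10,10  : the key is field 10 only, not field 10 through end of line
# LC_ALL=C : compare raw bytes instead of using locale collation rules,
#            which would otherwise ignore the '|' and space characters
sorted=$(printf '%s\n' "$rows" | LC_ALL=C sort -t'|' -k10,10)
printf '%s\n' "$sorted"
```

Rows 1 and 5 tie on the date, so sort falls back to comparing the whole lines, which is why row 1 comes out before row 5; add -s if you would rather keep tied rows in their input order.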
> The information contained in this communication is confidential,
It is considered poor netiquette to send emails to publicly archived
lists with disclaimers like this, since the very nature of public
archival makes this clause unenforceable. You are better off using a
secondary account that does not add your employer's disclaimer on the end.
--
Eric Blake eblake <at> redhat.com +1-801-349-2682
Libvirt virtualization library http://libvirt.org