GNU bug report logs - #7323: sort bug
To add a comment to this bug, you must first unarchive it, by sending
a message to control AT debbugs.gnu.org, with unarchive 7323 in the body.
You can then email your comments to 7323 AT debbugs.gnu.org in the normal way.
Report forwarded to owner <at> debbugs.gnu.org, bug-coreutils <at> gnu.org: bug#7323; Package coreutils. (Wed, 03 Nov 2010 14:53:02 GMT)
Acknowledgement sent to Thomas A Schweiger <tom.schweiger <at> acxiom.com>: New bug report received and forwarded. Copy sent to bug-coreutils <at> gnu.org. (Wed, 03 Nov 2010 14:53:02 GMT)
Message #5 received at submit <at> debbugs.gnu.org:
I have found an anomaly in the sort utility.
Given the input:
1|1|1||Terry|a|Willis|||19610203|||||| 315 | E | Sutton | Street | || |Fayetteville|AR| 72701 | ||| |||||TEST||
2|1|1||Terry| |Willis|||19610204|||||| 315 | E | Sutton | Street | || |Fayetteville|AR|72701 | ||| |||||TEST||
3|1|1||Andy||smith |||19610203|||||| 315 | | Willow | Street | || |Fayetteville|AR| 72701| ||| |||||TEST||
4|1|1||Andy||smith |||19610302|||||| 315 | | Willow | Street | || |Fayetteville|AR| 72701 | ||| |||||TEST||
5|1|1||MARY||JONES |||19610203|||||| 1400 |N | eastwood | drive | || |Fayetteville|AR| 72701 | ||| |||||TEST||
6|1|1||MARY||JONES |||19660203|||||| 1400 |N | eastwood | drive | || |Fayetteville|AR| 72701 | ||| |||||TEST||
7|1|1||MARY||JONES |||19610202|||||| 1400 |N | eastwood | drive | || |Fayetteville|AR| 72701 | ||| |||||TEST||
8|1|1||MARY||JONES |||19615292|||||| 1400 |N | eastwood | drive | || |Fayetteville|AR| 72701 | ||| |||||TEST||
9|1|1||Terry|a|Willis|||1961020|||||| 315 | E | Sutton | Street | || |Fayetteville|AR| 72701 | ||| |||||TEST||
10|1|1||Robert|W|Travillian|||19610222|||||| 249 ||Murdoch|Street||||||51035| ||| |||||TEST||
11|1|1||Robert|W|Travillian|||19610222|||||||||||||||| ||| |||||TEST||
If I sort on the 10th pipe delimited field using the command:
sort -t\| -k 10 source.dat
I get the following result:
7|1|1||MARY||JONES |||19610202|||||| 1400 |N | eastwood | drive | || |Fayetteville|AR| 72701 | ||| |||||TEST||
5|1|1||MARY||JONES |||19610203|||||| 1400 |N | eastwood | drive | || |Fayetteville|AR| 72701 | ||| |||||TEST||
9|1|1||Terry|a|Willis|||1961020|||||| 315 | E | Sutton | Street | || |Fayetteville|AR| 72701 | ||| |||||TEST||
1|1|1||Terry|a|Willis|||19610203|||||| 315 | E | Sutton | Street | || |Fayetteville|AR| 72701 | ||| |||||TEST||
3|1|1||Andy||smith |||19610203|||||| 315 | | Willow | Street | || |Fayetteville|AR| 72701| ||| |||||TEST||
2|1|1||Terry| |Willis|||19610204|||||| 315 | E | Sutton | Street | || |Fayetteville|AR|72701 | ||| |||||TEST||
10|1|1||Robert|W|Travillian|||19610222|||||| 249 ||Murdoch|Street||||||51035| ||| |||||TEST||
11|1|1||Robert|W|Travillian|||19610222|||||||||||||||| ||| |||||TEST||
4|1|1||Andy||smith |||19610302|||||| 315 | | Willow | Street | || |Fayetteville|AR| 72701 | ||| |||||TEST||
8|1|1||MARY||JONES |||19615292|||||| 1400 |N | eastwood | drive | || |Fayetteville|AR| 72701 | ||| |||||TEST||
6|1|1||MARY||JONES |||19660203|||||| 1400 |N | eastwood | drive | || |Fayetteville|AR| 72701 | ||| |||||TEST||
Note in particular the location of record 9.
This occurs on CentOS 4, Ubuntu 9.04, and RHEL 5.5.
__________________________________________________________________________________
Thomas A. J. Schweiger, PhD, PE | Acxiom Global Consulting Services
535 Research Center Blvd.
Fayetteville, AR 72701
(501) 342-6294
***************************************************************************
The information contained in this communication is confidential, is
intended only for the use of the recipient named above, and may be legally
privileged.
If the reader of this message is not the intended recipient, you are
hereby notified that any dissemination, distribution or copying of this
communication is strictly prohibited.
If you have received this communication in error, please resend this
communication to the sender and delete the original message or any copy
of it from your computer system.
Thank You.
****************************************************************************
Information forwarded to owner <at> debbugs.gnu.org, bug-coreutils <at> gnu.org: bug#7323; Package coreutils. (Wed, 03 Nov 2010 15:25:02 GMT)
Message #8 received at 7323 <at> debbugs.gnu.org:
On 11/03/2010 08:52 AM, Thomas A Schweiger wrote:
> sort -t\| -k 10 source.dat
Most likely a bug in your usage, and not in sort.
>
>
> I get the following result:
>
>
> 7|1|1||MARY||JONES |||19610202|||||| 1400 |N | eastwood | drive | || |Fayetteville|AR| 72701 | ||| |||||TEST||
> 5|1|1||MARY||JONES |||19610203|||||| 1400 |N | eastwood | drive | || |Fayetteville|AR| 72701 | ||| |||||TEST||
> 9|1|1||Terry|a|Willis|||1961020|||||| 315 | E | Sutton | Street | || |Fayetteville|AR| 72701 | ||| |||||TEST||
> 1|1|1||Terry|a|Willis|||19610203|||||| 315 | E | Sutton | Street | || |Fayetteville|AR| 72701 | ||| |||||TEST||
> 3|1|1||Andy||smith |||19610203|||||| 315 | | Willow | Street | || |Fayetteville|AR| 72701| ||| |||||TEST||
> 2|1|1||Terry| |Willis|||19610204|||||| 315 | E | Sutton | Street | || |Fayetteville|AR|72701 | ||| |||||TEST||
> 10|1|1||Robert|W|Travillian|||19610222|||||| 249 ||Murdoch|Street||||||51035| ||| |||||TEST||
> 11|1|1||Robert|W|Travillian|||19610222|||||||||||||||| ||| |||||TEST||
> 4|1|1||Andy||smith |||19610302|||||| 315 | | Willow | Street | || |Fayetteville|AR| 72701 | ||| |||||TEST||
> 8|1|1||MARY||JONES |||19615292|||||| 1400 |N | eastwood | drive | || |Fayetteville|AR| 72701 | ||| |||||TEST||
> 6|1|1||MARY||JONES |||19660203|||||| 1400 |N | eastwood | drive | || |Fayetteville|AR| 72701 | ||| |||||TEST||
>
>
> Note in particular the location of record 9.
Where did you expect it to appear? The latest coreutils 8.6 release
includes a --debug option that makes it more obvious what you did wrong
(I'm trimming down your example to a bare minimum):
$ printf '5|19610203|||||| 1400 |\n9|1961020|||||| 315 |\n1|19610203|||||| 315 |\n' | src/sort --debug -t\| -k2
src/sort: using `en_US.UTF-8' sorting rules
5|19610203|||||| 1400 |
  _____________________
______________________
9|1961020|||||| 315 |
  ___________________
____________________
1|19610203|||||| 315 |
  ____________________
_____________________
Notice that in the en_US.UTF-8 locale, punctuation does NOT affect
collation order. And, since you explicitly requested that your key
start at field 10 and extend to the end of the line, 1961020315 (from
row 9) collates less than 19610203315 (from row 1).
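The comparison described above can be approximated without a UTF-8 locale installed. The following is only a rough sketch: it assumes that "punctuation does not affect collation" can be imitated by deleting the '|' and space characters from the key and then comparing bytes, which is a simplification of what the real locale rules do. The variable names are illustrative only.

```shell
# Three trimmed records from the message above (field 2 stands in for field 10).
data='5|19610203|||||| 1400 |
9|1961020|||||| 315 |
1|19610203|||||| 315 |'

# Imitate the key that -k2 selects: everything from field 2 to end of line,
# with '|' and spaces removed (assumed stand-in for "punctuation is ignored").
effective_keys=$(printf '%s\n' "$data" | cut -d'|' -f2- | tr -d '| ')
printf '%s\n' "$effective_keys"

# Byte-wise sorting of those stripped keys gives the order of records 5, 9, 1:
#   196102031400
#   1961020315
#   19610203315
sorted_keys=$(printf '%s\n' "$effective_keys" | LC_ALL=C sort)
printf '%s\n' "$sorted_keys"
```

Record 9 lands between records 5 and 1 because the digits of the following field (315) become part of its key.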
But, if you instead require byte-wise sorting, and restrict your key to
JUST the field, you get results that I'm assuming you were expecting:
$ printf '5|19610203|||||| 1400 |\n9|1961020|||||| 315 |\n1|19610203|||||| 315 |\n' | LC_ALL=C src/sort --debug -t\| -k2,2
src/sort: using simple byte comparison
9|1961020|||||| 315 |
  _______
_____________________
1|19610203|||||| 315 |
  ________
______________________
5|19610203|||||| 1400 |
  ________
_______________________
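The same restriction can be tried against the field numbering of the original report. This is a minimal sketch using three records from the report, truncated after the house-number field to keep the lines short; the variable names are illustrative only.

```shell
# Records 1, 9 and 5 from the report, truncated after the house number.
data='1|1|1||Terry|a|Willis|||19610203|||||| 315 |
9|1|1||Terry|a|Willis|||1961020|||||| 315 |
5|1|1||MARY||JONES |||19610203|||||| 1400 |'

# Key is field 10 only, compared byte-wise.
by_field=$(printf '%s\n' "$data" | LC_ALL=C sort -t'|' -k10,10)

# Show just the record numbers in sorted order.
ids=$(printf '%s\n' "$by_field" | cut -d'|' -f1 | paste -s -d ' ' -)
printf '%s\n' "$ids"   # prints: 9 1 5
```

Records 1 and 5 have identical keys, so their relative order comes from sort's last-resort comparison of the whole line.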
> The information contained in this communication is confidential,
It is considered poor netiquette to send emails to publicly archived
lists with disclaimers like this, since the very nature of public
archival makes this clause unenforceable. You are better off using a
secondary account that does not add your employer's disclaimer on the end.
--
Eric Blake eblake <at> redhat.com +1-801-349-2682
Libvirt virtualization library http://libvirt.org
Reply sent to Pádraig Brady <P <at> draigBrady.com>: You have taken responsibility. (Wed, 03 Nov 2010 15:59:02 GMT)
Notification sent to Thomas A Schweiger <tom.schweiger <at> acxiom.com>: bug acknowledged by developer. (Wed, 03 Nov 2010 15:59:02 GMT)
Message #13 received at 7323-done <at> debbugs.gnu.org:
On 03/11/10 14:52, Thomas A Schweiger wrote:
>
> I have found an anomaly in the sort utility.
>
> If I sort on the 10th pipe delimited field using the command:
>
> sort -t\| -k 10 source.dat
That sorts on a key running from the 10th field to the end of the line.
If you want just the 10th field, use -k10,10
If the field is numeric (and not fixed width), use -k10,10n
cheers,
Pádraig.
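The difference between -k10,10 and -k10,10n only shows up when the values have different widths. The sketch below uses hypothetical two-field data (not taken from the report), with field 2 playing the role of field 10, to compare text ordering with numeric ordering.

```shell
# Hypothetical data: a label and a number of varying width.
data='a|9
b|10
c|100'

# Text comparison on field 2: "10" < "100" < "9".
text_order=$(printf '%s\n' "$data" | LC_ALL=C sort -t'|' -k2,2 |
  cut -d'|' -f1 | paste -s -d ' ' -)
printf '%s\n' "$text_order"   # prints: b c a

# Numeric comparison on field 2: 9 < 10 < 100.
num_order=$(printf '%s\n' "$data" | LC_ALL=C sort -t'|' -k2,2n |
  cut -d'|' -f1 | paste -s -d ' ' -)
printf '%s\n' "$num_order"    # prints: a b c
```

For fixed-width values such as well-formed YYYYMMDD dates, the two orderings coincide, which is why the n modifier matters only for variable-width numbers.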
bug archived. Request was from Debbugs Internal Request <help-debbugs <at> gnu.org> to internal_control <at> debbugs.gnu.org. (Thu, 02 Dec 2010 12:24:03 GMT)
This bug report was last modified 14 years and 202 days ago.
GNU bug tracking system
Copyright (C) 1999 Darren O. Benham,
1997,2003 nCipher Corporation Ltd,
1994-97 Ian Jackson.