GNU bug report logs - #7323
sort bug


Package: coreutils

Reported by: Thomas A Schweiger <tom.schweiger <at> acxiom.com>

Date: Wed, 3 Nov 2010 14:53:02 UTC

Severity: normal

Done: Pádraig Brady <P <at> draigBrady.com>

Bug is archived. No further changes may be made.

To add a comment to this bug, you must first unarchive it, by sending
a message to control AT debbugs.gnu.org, with unarchive 7323 in the body.
You can then email your comments to 7323 AT debbugs.gnu.org in the normal way.


Report forwarded to owner <at> debbugs.gnu.org, bug-coreutils <at> gnu.org:
bug#7323; Package coreutils. (Wed, 03 Nov 2010 14:53:02 GMT)

Acknowledgement sent to Thomas A Schweiger <tom.schweiger <at> acxiom.com>:
New bug report received and forwarded. Copy sent to bug-coreutils <at> gnu.org. (Wed, 03 Nov 2010 14:53:02 GMT)

Message #5 received at submit <at> debbugs.gnu.org:

From: Thomas A Schweiger <tom.schweiger <at> acxiom.com>
To: bug-coreutils <at> gnu.org
Subject: sort bug
Date: Wed, 03 Nov 2010 09:52:25 -0500
I have found an anomaly in the sort utility.

Given the input:

1|1|1||Terry|a|Willis|||19610203|||||| 315 | E | Sutton | Street | || |Fayetteville|AR| 72701 | ||| |||||TEST||
2|1|1||Terry| |Willis|||19610204|||||| 315 | E | Sutton | Street | || |Fayetteville|AR|72701 | ||| |||||TEST|| 
3|1|1||Andy||smith   |||19610203|||||| 315 | | Willow | Street | || |Fayetteville|AR| 72701| ||| |||||TEST||
4|1|1||Andy||smith   |||19610302|||||| 315 | | Willow | Street | || |Fayetteville|AR| 72701 | ||| |||||TEST||
5|1|1||MARY||JONES   |||19610203|||||| 1400 |N | eastwood | drive | || |Fayetteville|AR| 72701 | ||| |||||TEST||
6|1|1||MARY||JONES   |||19660203|||||| 1400 |N | eastwood | drive | || |Fayetteville|AR| 72701 | ||| |||||TEST||
7|1|1||MARY||JONES   |||19610202|||||| 1400 |N | eastwood | drive | || |Fayetteville|AR| 72701 | ||| |||||TEST||
8|1|1||MARY||JONES   |||19615292|||||| 1400 |N | eastwood | drive | || |Fayetteville|AR| 72701 | ||| |||||TEST||
9|1|1||Terry|a|Willis|||1961020|||||| 315 | E | Sutton | Street | || |Fayetteville|AR| 72701 | ||| |||||TEST||
10|1|1||Robert|W|Travillian|||19610222|||||| 249 ||Murdoch|Street||||||51035| ||| |||||TEST||
11|1|1||Robert|W|Travillian|||19610222|||||||||||||||| ||| |||||TEST||


If I sort on the 10th pipe delimited field using the command:

sort -t\| -k 10  source.dat


I get the following result:


7|1|1||MARY||JONES   |||19610202|||||| 1400 |N | eastwood | drive | || |Fayetteville|AR| 72701 | ||| |||||TEST||
5|1|1||MARY||JONES   |||19610203|||||| 1400 |N | eastwood | drive | || |Fayetteville|AR| 72701 | ||| |||||TEST||
9|1|1||Terry|a|Willis|||1961020|||||| 315 | E | Sutton | Street | || |Fayetteville|AR| 72701 | ||| |||||TEST||
1|1|1||Terry|a|Willis|||19610203|||||| 315 | E | Sutton | Street | || |Fayetteville|AR| 72701 | ||| |||||TEST||
3|1|1||Andy||smith   |||19610203|||||| 315 | | Willow | Street | || |Fayetteville|AR| 72701| ||| |||||TEST||
2|1|1||Terry| |Willis|||19610204|||||| 315 | E | Sutton | Street | || |Fayetteville|AR|72701 | ||| |||||TEST|| 
10|1|1||Robert|W|Travillian|||19610222|||||| 249 ||Murdoch|Street||||||51035| ||| |||||TEST||
11|1|1||Robert|W|Travillian|||19610222|||||||||||||||| ||| |||||TEST||
4|1|1||Andy||smith   |||19610302|||||| 315 | | Willow | Street | || |Fayetteville|AR| 72701 | ||| |||||TEST||
8|1|1||MARY||JONES   |||19615292|||||| 1400 |N | eastwood | drive | || |Fayetteville|AR| 72701 | ||| |||||TEST||
6|1|1||MARY||JONES   |||19660203|||||| 1400 |N | eastwood | drive | || |Fayetteville|AR| 72701 | ||| |||||TEST||


Note in particular the location of record 9.

This occurs on CentOS 4, Ubuntu 9.04, and RHEL 5.5.

__________________________________________________________________________________

Thomas A. J. Schweiger, PhD, PE  |  Acxiom Global Consulting Services 
535 Research Center Blvd.
Fayetteville, AR 72701
(501) 342-6294

***************************************************************************
The information contained in this communication is confidential, is
intended only for the use of the recipient named above, and may be legally
privileged.

If the reader of this message is not the intended recipient, you are
hereby notified that any dissemination, distribution or copying of this
communication is strictly prohibited.

If you have received this communication in error, please resend this
communication to the sender and delete the original message or any copy
of it from your computer system.

Thank You.
****************************************************************************

Information forwarded to owner <at> debbugs.gnu.org, bug-coreutils <at> gnu.org:
bug#7323; Package coreutils. (Wed, 03 Nov 2010 15:25:02 GMT)

Message #8 received at 7323 <at> debbugs.gnu.org:

From: Eric Blake <eblake <at> redhat.com>
To: Thomas A Schweiger <tom.schweiger <at> acxiom.com>
Cc: 7323 <at> debbugs.gnu.org
Subject: Re: bug#7323: sort bug
Date: Wed, 03 Nov 2010 09:28:58 -0600
On 11/03/2010 08:52 AM, Thomas A Schweiger wrote:
> sort -t\| -k 10  source.dat

Most likely a bug in your usage, and not in sort.

> 
> 
> I get the following result:
> 
> 
> 7|1|1||MARY||JONES   |||19610202|||||| 1400 |N | eastwood | drive | || |Fayetteville|AR| 72701 | ||| |||||TEST||
> 5|1|1||MARY||JONES   |||19610203|||||| 1400 |N | eastwood | drive | || |Fayetteville|AR| 72701 | ||| |||||TEST||
> 9|1|1||Terry|a|Willis|||1961020|||||| 315 | E | Sutton | Street | || |Fayetteville|AR| 72701 | ||| |||||TEST||
> 1|1|1||Terry|a|Willis|||19610203|||||| 315 | E | Sutton | Street | || |Fayetteville|AR| 72701 | ||| |||||TEST||
> 3|1|1||Andy||smith   |||19610203|||||| 315 | | Willow | Street | || |Fayetteville|AR| 72701| ||| |||||TEST||
> 2|1|1||Terry| |Willis|||19610204|||||| 315 | E | Sutton | Street | || |Fayetteville|AR|72701 | ||| |||||TEST|| 
> 10|1|1||Robert|W|Travillian|||19610222|||||| 249 ||Murdoch|Street||||||51035| ||| |||||TEST||
> 11|1|1||Robert|W|Travillian|||19610222|||||||||||||||| ||| |||||TEST||
> 4|1|1||Andy||smith   |||19610302|||||| 315 | | Willow | Street | || |Fayetteville|AR| 72701 | ||| |||||TEST||
> 8|1|1||MARY||JONES   |||19615292|||||| 1400 |N | eastwood | drive | || |Fayetteville|AR| 72701 | ||| |||||TEST||
> 6|1|1||MARY||JONES   |||19660203|||||| 1400 |N | eastwood | drive | || |Fayetteville|AR| 72701 | ||| |||||TEST||
> 
> 
> Note in particular the location of record 9.

Where did you expect it to appear?  The latest coreutils 8.6 release
includes a --debug option that makes it more obvious what you did wrong
(I'm trimming down your example to a bare minimum):

$ printf '5|19610203|||||| 1400 |\n9|1961020|||||| 315 |\n1|19610203|||||| 315 |\n' | src/sort --debug -t\| -k2
src/sort: using `en_US.UTF-8' sorting rules
5|19610203|||||| 1400 |
 _____________________
______________________
9|1961020|||||| 315 |
 ___________________
____________________
1|19610203|||||| 315 |
 ____________________
_____________________

Notice that in the en_US.UTF-8 locale, punctuation does NOT affect
collation order.  And, since you explicitly requested that your key
start at field 10 and extend to the end of the line, 1961020315 (from
row 9) collates less than 19610203315 (from row 1).

But, if you instead require byte-wise sorting, and restrict your key to
JUST the field, you get results that I'm assuming you were expecting:

$ printf '5|19610203|||||| 1400 |\n9|1961020|||||| 315 |\n1|19610203|||||| 315 |\n' | LC_ALL=C src/sort --debug -t\| -k2,2
src/sort: using simple byte comparison
9|1961020|||||| 315 |
  _______
_____________________
1|19610203|||||| 315 |
  ________
______________________
5|19610203|||||| 1400 |
  ________
_______________________
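
The two changes in the second command (byte-wise comparison and a bounded key) can also be tried one at a time. The following is a minimal sketch, assuming a POSIX shell and a sort that accepts -t and -k. It reuses the trimmed records above; the variable names are illustrative only. It prints just the record ids in sorted order.

```shell
# Trimmed records from this thread: record id in field 1, date in field 2.
data='5|19610203|||||| 1400 |
9|1961020|||||| 315 |
1|19610203|||||| 315 |'

# Open-ended key: the comparison runs from field 2 to the end of the line.
open_key=$(printf '%s\n' "$data" | LC_ALL=C sort -t'|' -k2 | cut -d'|' -f1 | paste -sd' ' -)

# Bounded key: the comparison covers field 2 only.
bounded_key=$(printf '%s\n' "$data" | LC_ALL=C sort -t'|' -k2,2 | cut -d'|' -f1 | paste -sd' ' -)

echo "open-ended -k2:  $open_key"     # 5 1 9
echo "bounded -k2,2:   $bounded_key"  # 9 1 5
```

In the C locale the open-ended key places record 9 last, because the byte for `|` compares greater than any digit, so "1961020|" sorts after "19610203". Bounding the key with -k2,2 places record 9 first. For this data, LC_ALL=C on its own does not give the expected order; the bounded key is the part that matters.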


> The information contained in this communication is confidential,

It is considered poor netiquette to send emails to publicly archived
lists with disclaimers like this, since the very nature of public
archival makes this clause unenforceable.  You are better off using a
secondary account that does not add your employer's disclaimer on the end.

-- 
Eric Blake   eblake <at> redhat.com    +1-801-349-2682
Libvirt virtualization library http://libvirt.org


Reply sent to Pádraig Brady <P <at> draigBrady.com>:
You have taken responsibility. (Wed, 03 Nov 2010 15:59:02 GMT)

Notification sent to Thomas A Schweiger <tom.schweiger <at> acxiom.com>:
bug acknowledged by developer. (Wed, 03 Nov 2010 15:59:02 GMT)

Message #13 received at 7323-done <at> debbugs.gnu.org:

From: Pádraig Brady <P <at> draigBrady.com>
To: Thomas A Schweiger <tom.schweiger <at> acxiom.com>
Cc: 7323-done <at> debbugs.gnu.org
Subject: Re: bug#7323: sort bug
Date: Wed, 03 Nov 2010 16:03:09 +0000
On 03/11/10 14:52, Thomas A Schweiger wrote:
> 
> I have found an anomaly in the sort utility.
> 
> If I sort on the 10th pipe delimited field using the command:
> 
> sort -t\| -k 10  source.dat

That sorts from the 10th field to the end of the line.
If you want only the 10th field, use -k10,10.
If the field is numeric (and not fixed width), use -k10,10n.

cheers,
Pádraig.
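
The effect of the n modifier can be seen on the street number field, which is not fixed width in the reported data (" 315 " versus " 1400 "). The following is a minimal sketch, assuming a POSIX shell. It uses the trimmed records from the earlier reply, in which the street number is field 8 rather than field 16; the variable names are illustrative only.

```shell
# Trimmed records from this thread: id, date, five empty fields, street number.
data='5|19610203|||||| 1400 |
9|1961020|||||| 315 |
1|19610203|||||| 315 |'

# String comparison on field 8: " 1400 " sorts before " 315 " because '1' < '3'.
lexical=$(printf '%s\n' "$data" | LC_ALL=C sort -t'|' -k8,8 | cut -d'|' -f1 | paste -sd' ' -)

# Numeric comparison on field 8: leading blanks are skipped and 315 < 1400.
numeric=$(printf '%s\n' "$data" | LC_ALL=C sort -t'|' -k8,8n | cut -d'|' -f1 | paste -sd' ' -)

echo "-k8,8:   $lexical"   # 5 1 9
echo "-k8,8n:  $numeric"   # 1 9 5
```

Records 1 and 9 have equal keys in both runs, so their relative order comes from sort's last-resort comparison of the whole line.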




bug archived. Request was from Debbugs Internal Request <help-debbugs <at> gnu.org> to internal_control <at> debbugs.gnu.org. (Thu, 02 Dec 2010 12:24:03 GMT)

This bug report was last modified 14 years and 202 days ago.
