GNU bug report logs - #9334
sort bug


Package: coreutils;

Reported by: "ROGER GRAYDON CHRISTMAN" <dvl <at> psu.edu>

Date: Sat, 20 Aug 2011 20:28:01 UTC

Severity: normal

Done: Bob Proulx <bob <at> proulx.com>

Bug is archived. No further changes may be made.

To add a comment to this bug, you must first unarchive it, by sending
a message to control AT debbugs.gnu.org, with unarchive 9334 in the body.
You can then email your comments to 9334 AT debbugs.gnu.org in the normal way.




Report forwarded to owner <at> debbugs.gnu.org, bug-coreutils <at> gnu.org:
bug#9334; Package coreutils. (Sat, 20 Aug 2011 20:28:01 GMT)

Acknowledgement sent to "ROGER GRAYDON CHRISTMAN" <dvl <at> psu.edu>:
New bug report received and forwarded. Copy sent to bug-coreutils <at> gnu.org. (Sat, 20 Aug 2011 20:28:01 GMT)

Message #5 received at submit <at> debbugs.gnu.org:

From: "ROGER GRAYDON CHRISTMAN" <dvl <at> psu.edu>
To: bug-coreutils <at> gnu.org
Subject: sort bug
Date: Sat, 20 Aug 2011 10:54:45 -0400
First: some version information:
sort (GNU coreutils) 8.4

I run a series of pipes, and after piping into 'sort -n', I see this: 
    1   12
    1    4
    5   16
    9   20

The first column sorted correctly, numerically, but the second did not.
I do not have sufficient data to determine whether the second column
is sorted lexicographically, or simply ignored.

Roger Christman
Computer Science and Engineering
Pennsylvania State University






Reply sent to Bob Proulx <bob <at> proulx.com>:
You have taken responsibility. (Mon, 22 Aug 2011 01:59:01 GMT)

Notification sent to "ROGER GRAYDON CHRISTMAN" <dvl <at> psu.edu>:
bug acknowledged by developer. (Mon, 22 Aug 2011 01:59:01 GMT)

Message #10 received at 9334-done <at> debbugs.gnu.org:

From: Bob Proulx <bob <at> proulx.com>
To: ROGER GRAYDON CHRISTMAN <dvl <at> psu.edu>
Cc: 9334-done <at> debbugs.gnu.org
Subject: Re: bug#9334: sort bug
Date: Sun, 21 Aug 2011 19:55:57 -0600
tags 9334 + notabug
thanks

ROGER GRAYDON CHRISTMAN wrote:
> First: some version information:
> sort (GNU coreutils) 8.4

Thanks!

> I run a series of pipes, and after piping into 'sort -n', I see this: 
>     1   12
>     1    4
>     5   16
>     9   20
> 
> The first column sorted correctly, numerically, but the second did not.
> I do not have sufficient data to determine whether the second column
> is sorted lexicographically, or simply ignored.

Thanks for the report, but what you are seeing is not a bug in sort; it
is in how sort is being used.  You have insufficiently qualified the sort
criteria.  Try this:

  sort -n -k1,1 -k2,2

Or my preference:

  sort -k1,1n -k2,2n
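A minimal shell sketch, assuming GNU sort and LC_ALL=C for reproducibility, runs both suggested commands on the four lines from the report, fed in a scrambled order. With GNU sort, a key given without ordering options of its own inherits the global ones, so both spellings should produce the same order. The variable names are illustrative only.

```shell
#!/bin/sh
# The four lines from the report, deliberately fed in scrambled order.
# LC_ALL=C is an added assumption so the result does not depend on locale.
data='    9   20
    1   12
    5   16
    1    4'

# First suggestion: global -n, inherited by keys that carry no options.
a=$(printf '%s\n' "$data" | LC_ALL=C sort -n -k1,1 -k2,2)

# Preferred spelling: the ordering option is attached to each key.
b=$(printf '%s\n' "$data" | LC_ALL=C sort -k1,1n -k2,2n)

# Field 1 orders by 1, 5, 9; the tie on 1 is broken numerically by
# field 2 (4 before 12).
printf '%s\n' "$b"
```

Attaching `n` to each key keeps the intent visible when keys need different treatment, for example a numeric first field and a textual second one.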

The reasoning is as found in the sort documentation:

     A pair of lines is compared as follows: `sort' compares each pair of
  fields, in the order specified on the command line, according to the
  associated ordering options, until a difference is found or no fields
  are left.  If no key fields are specified, `sort' uses a default key of
  the entire line.  Finally, as a last resort when all keys compare
  equal, `sort' compares entire lines as if no ordering options other
  than `--reverse' (`-r') were specified.  The `--stable' (`-s') option
  disables this "last-resort comparison" so that lines in which all
  fields compare equal are left in their original relative order.  The
  `--unique' (`-u') option also disables the last-resort comparison.
  ...
  `-n'
  `--numeric-sort'
  `--sort=numeric'
       Sort numerically.  The number begins each line and consists of
       optional blanks, an optional `-' sign, and zero or more digits
       possibly separated by thousands separators, optionally followed by
       a decimal-point character and zero or more digits.  An empty
       number is treated as `0'.  ...
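The effect of the last-resort comparison quoted above can be seen with a small shell sketch. The three input lines are made up for illustration and LC_ALL=C is assumed, so the last-resort comparison is a plain byte comparison. Two lines tie on the numeric key in field 1; without `-s` the whole-line comparison reorders them, while with `-s` they keep their input order.

```shell
#!/bin/sh
# Hypothetical data: "1 b" and "1 a" tie on the numeric key in field 1.
data='2 x
1 b
1 a'

# Default: tied lines fall through to the last-resort whole-line
# comparison, which under LC_ALL=C is byte order, so "1 a" precedes "1 b".
plain=$(printf '%s\n' "$data" | LC_ALL=C sort -k1,1n)

# --stable (-s) disables the last-resort comparison, so the tied lines
# keep their input order: "1 b" stays ahead of "1 a".
stable=$(printf '%s\n' "$data" | LC_ALL=C sort -s -k1,1n)

printf 'default order:\n%s\n\nwith -s:\n%s\n' "$plain" "$stable"
```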

Since no fields are specified sort is using a default key of the
entire line.  Since you care about sorting on fields you should
include sort field options.
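What "a default key of the entire line" means with `-n` can be checked directly. In this sketch (LC_ALL=C assumed, inputs hypothetical), `-s` suppresses the last-resort comparison, so lines keep their input order exactly when their numeric keys compare equal. If the whole line were read as one number ("1 12" as 112, "1 4" as 14), the first run would be reordered; it is not, because the number stops at the first character that cannot continue it.

```shell
#!/bin/sh
# Same two lines, fed in both orders; LC_ALL=C is an added assumption.
fwd=$(printf '1 12\n1 4\n' | LC_ALL=C sort -s -n)
rev=$(printf '1 4\n1 12\n' | LC_ALL=C sort -s -n)

# Both runs echo their input unchanged: numeric parsing stops at the
# blank after the leading "1", so both lines have key 1 and tie.
printf '%s\n---\n%s\n' "$fwd" "$rev"
```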

Bob




Information forwarded to owner <at> debbugs.gnu.org, bug-coreutils <at> gnu.org:
bug#9334; Package coreutils. (Mon, 22 Aug 2011 04:14:02 GMT)

Message #13 received at 9334 <at> debbugs.gnu.org:

From: Aaron Davies <aaron.davies <at> gmail.com>
To: "9334 <at> debbugs.gnu.org" <9334 <at> debbugs.gnu.org>,
	"bob <at> proulx.com" <bob <at> proulx.com>
Subject: Re: bug#9334: sort bug
Date: Mon, 22 Aug 2011 00:11:12 -0400
On Sunday, August 21, 2011, Bob Proulx <bob <at> proulx.com> wrote:
> ROGER GRAYDON CHRISTMAN wrote:
>> I run a series of pipes, and after piping into 'sort -n', I see this:
>>     1   12
>>     1    4
>>     5   16
>>     9   20
> [...]
> Since no fields are specified sort is using a default key of the
> entire line.  Since you care about sorting on fields you should
> include sort field options.

Out of curiosity, what does the output mean in this case?  "Two lines
starting with the number one, in their original order"; "two lines
starting with the number one, also containing the strings '12' and '4',
and sorted lexicographically by those"; or something else entirely?
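One way to see what decides the order of the two lines that start with 1 is to re-sort the reported lines in the C locale. This sketch assumes the blanks shown in the archive are plain spaces and that LC_ALL=C is set; the report does not say which locale or whitespace was actually in effect. Both lines have numeric key 1, so the tie goes to the last-resort comparison of entire lines. In the C locale that comparison puts the "4" line first, which differs from the order in the report, so the reporter's order presumably came from the locale's collation rules or from whitespace (such as tabs) that the archive does not preserve.

```shell
#!/bin/sh
# The reported lines, assuming the blanks are spaces (an assumption).
data='    9   20
    1   12
    5   16
    1    4'

# Whole-line key with -n: both "1" lines have numeric key 1, so the
# last-resort comparison of entire lines breaks the tie. Under LC_ALL=C
# this is byte order; at the first differing byte, ' ' (0x20) sorts
# before '1' (0x31), so "    1    4" comes before "    1   12".
out=$(printf '%s\n' "$data" | LC_ALL=C sort -n)
printf '%s\n' "$out"
```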

-- 
Aaron Davies
aaron.davies <at> gmail.com

Information forwarded to owner <at> debbugs.gnu.org, bug-coreutils <at> gnu.org:
bug#9334; Package coreutils. (Mon, 22 Aug 2011 14:43:01 GMT)

Message #16 received at 9334 <at> debbugs.gnu.org:

From: "ROGER GRAYDON CHRISTMAN" <dvl <at> psu.edu>
To: Bob Proulx <bob <at> proulx.com>
Subject: Re: bug#9334: sort bug
Date: Mon, 22 Aug 2011 09:47:13 -0400
Thanks.  I guess I misinterpreted "uses a default key of the entire line"
as "uses the entire line as keys by default", in which case, if the first
column was equal, it would compare the second, then the third, and so on.

I guess I don't know what "default key of the entire line" means with
respect to -n, since it apparently didn't treat "1    12" as "112" and
"1   4" as "14".  I'm curious to find out what this phrase means in this
context.

Roger Christman


Information forwarded to owner <at> debbugs.gnu.org, bug-coreutils <at> gnu.org:
bug#9334; Package coreutils. (Mon, 22 Aug 2011 14:48:01 GMT)

Message #19 received at 9334 <at> debbugs.gnu.org:

From: Eric Blake <eblake <at> redhat.com>
To: ROGER GRAYDON CHRISTMAN <dvl <at> psu.edu>
Cc: 9334 <at> debbugs.gnu.org, Bob Proulx <bob <at> proulx.com>
Subject: Re: bug#9334: sort bug
Date: Mon, 22 Aug 2011 08:44:57 -0600
On 08/22/2011 07:47 AM, ROGER GRAYDON CHRISTMAN wrote:
> Thanks.  I guess I misinterpreted "uses a default key of the entire line"
> as "uses the entire line as keys by default", in which case if the first column
> was equal, it would compare the second, then the third, etc.
>
> I guess I don't know what "default key of the entire line" means with respect
> to -n,
> since it apparently didn't treat "1    12" as "112" and "1   4" as 14.
> I'm curious to find out what this phrase means in this context.

'sort --debug' is your friend.  In the C locale, global -n means 'parse 
as much of the prefix of the line as can be treated as a number as the 
primary key, then treat the entire line as the secondary key'.

$ printf ' 1 12\n 1  4\n 5 16\n 9 20\n' | LC_ALL=C sort --debug -n
sort: using simple byte comparison
 1  4
 _
_____
 1 12
 _
_____
 5 16
 _
_____
 9 20
 _
_____


-- 
Eric Blake   eblake <at> redhat.com    +1-801-349-2682
Libvirt virtualization library http://libvirt.org




bug archived. Request was from Debbugs Internal Request <help-debbugs <at> gnu.org> to internal_control <at> debbugs.gnu.org. (Tue, 20 Sep 2011 11:24:03 GMT)

This bug report was last modified 13 years and 275 days ago.



GNU bug tracking system
Copyright (C) 1999 Darren O. Benham, 1997,2003 nCipher Corporation Ltd, 1994-97 Ian Jackson.