GNU bug report logs - #24527
Problem while sorting comma separated values using sort command.

Package: coreutils

Reported by: Jash Dave <jashdave23 <at> gmail.com>

Date: Sat, 24 Sep 2016 13:13:02 UTC

Severity: normal

Tags: notabug

Done: Assaf Gordon <assafgordon <at> gmail.com>

Bug is archived. No further changes may be made.

To add a comment to this bug, you must first unarchive it, by sending
a message to control AT debbugs.gnu.org, with unarchive 24527 in the body.
You can then email your comments to 24527 AT debbugs.gnu.org in the normal way.


Report forwarded to bug-coreutils <at> gnu.org:
bug#24527; Package coreutils. (Sat, 24 Sep 2016 13:13:02 GMT)

Acknowledgement sent to Jash Dave <jashdave23 <at> gmail.com>:
New bug report received and forwarded. Copy sent to bug-coreutils <at> gnu.org. (Sat, 24 Sep 2016 13:13:02 GMT)

Message #5 received at submit <at> debbugs.gnu.org:

From: Jash Dave <jashdave23 <at> gmail.com>
To: bug-coreutils <at> gnu.org
Subject: Problem while sorting comma separated values using sort command.
Date: Sat, 24 Sep 2016 15:14:28 +0530
There is a problem when sorting comma-separated entries (specifically
numbers). Even when the separator symbol is set to comma, sort reads all
the following columns as part of the number, and doesn't treat the comma
as a separator between the following numbers.

If I use the command:
sort -t"," -k1 -n Example.csv

Example.csv :
1,100,a,1,a
4,1000,d,4,c
3,1002,c,3,c
22,10,a,2,b

Output:
1,100,a,1,a
22,10,a,2,b
3,1002,c,3,c
4,1000,d,4,c


Expected:
1,100,a,1,a
3,1002,c,3,c
4,1000,d,4,c
22,10,a,2,b

But it works with column 2 or 4, since there are no following numbers.

Regards,
Jash Dave.

Information forwarded to bug-coreutils <at> gnu.org:
bug#24527; Package coreutils. (Sat, 24 Sep 2016 15:36:02 GMT)

Message #8 received at 24527 <at> debbugs.gnu.org:

From: Bernhard Voelker <mail <at> bernhard-voelker.de>
To: Jash Dave <jashdave23 <at> gmail.com>, 24527 <at> debbugs.gnu.org
Subject: Re: bug#24527: Problem while sorting comma separated values using
 sort command.
Date: Sat, 24 Sep 2016 17:35:12 +0200
tag 24527 notabug
close 24527
stop

On 09/24/2016 11:44 AM, Jash Dave wrote:
> There is a problem when sorting comma-separated entries (specifically
> numbers). Even when the separator symbol is set to comma, sort reads all
> the following columns as part of the number, and doesn't treat the comma
> as a separator between the following numbers.
> 
> If I use the command:
> sort -t"," -k1 -n Example.csv
> 
> Example.csv :
> 1,100,a,1,a
> 4,1000,d,4,c
> 3,1002,c,3,c
> 22,10,a,2,b
> 
> Output:
> 1,100,a,1,a
> 22,10,a,2,b
> 3,1002,c,3,c
> 4,1000,d,4,c
> 
> 
> Expected:
> 1,100,a,1,a
> 3,1002,c,3,c
> 4,1000,d,4,c
> 22,10,a,2,b
> 
> But it works with column 2 or 4, since there are no following numbers.

When having trouble with sort, usually the --debug option helps:

  $ sort --debug -t"," -k1 -n
  sort: using ‘en_US.UTF-8’ sorting rules
  sort: key 1 is numeric and spans multiple fields
  1,100,a,1,a
  4,1000,d,4,c
  3,1002,c,3,c
  22,10,a,2,b
  1,100,a,1,a
  ______
  ___________
  22,10,a,2,b
  ______
  ___________
  3,1002,c,3,c
  _______
  ____________
  4,1000,d,4,c
  _______
  ____________

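Aha, the key spans multiple fields: -k1 starts the key at field 1
and lets it run to the end of the line.
As you want to sort on the first field only, you need to
tell sort to end the key at field 1 as well, with -k1,1: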

  $ sort --debug -t"," -k1,1 -n
  sort: using ‘en_US.UTF-8’ sorting rules
  1,100,a,1,a
  4,1000,d,4,c
  3,1002,c,3,c
  22,10,a,2,b
  1,100,a,1,a
  _
  ___________
  3,1002,c,3,c
  _
  ____________
  4,1000,d,4,c
  _
  ____________
  22,10,a,2,b
  __
  ___________
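
As a compact illustration of the fix, the sketch below feeds the four
sample lines from the report to sort with the key limited to the first
field. Writing the numeric flag as part of the key (-k1,1n) is an
equivalent spelling chosen here for brevity, and the variable name
"sorted" is only for illustration. The -k1 result in the report is most
likely explained by the en_US locale, where the comma is the thousands
separator, so a numeric key that runs past field 1 lets sort read
"22,10" as a single number.

```shell
# Sample data copied from the report; piped in so no file is needed.
# -t,    : fields are separated by commas
# -k1,1n : the key starts AND ends at field 1, compared numerically
sorted=$(printf '%s\n' \
  '1,100,a,1,a' \
  '4,1000,d,4,c' \
  '3,1002,c,3,c' \
  '22,10,a,2,b' | sort -t, -k1,1n)

printf '%s\n' "$sorted"
```

Because the key stops at the first comma, no separator can end up inside
the number being compared, whatever the locale.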

Therefore, I'm marking this as not a bug in sort.

Thanks & have a nice day,
Berny




Added tag(s) notabug. Request was from Assaf Gordon <assafgordon <at> gmail.com> to control <at> debbugs.gnu.org. (Sun, 28 Oct 2018 06:37:03 GMT)

bug closed, send any further explanations to 24527 <at> debbugs.gnu.org and Jash Dave <jashdave23 <at> gmail.com>. Request was from Assaf Gordon <assafgordon <at> gmail.com> to control <at> debbugs.gnu.org. (Sun, 28 Oct 2018 06:37:03 GMT)

bug archived. Request was from Debbugs Internal Request <help-debbugs <at> gnu.org> to internal_control <at> debbugs.gnu.org. (Sun, 25 Nov 2018 12:24:06 GMT)

This bug report was last modified 6 years and 266 days ago.


GNU bug tracking system
Copyright (C) 1999 Darren O. Benham, 1997,2003 nCipher Corporation Ltd, 1994-97 Ian Jackson.