GNU bug report logs - #24527
Problem while sorting comma separated values using sort command.

Package: coreutils;

Reported by: Jash Dave <jashdave23 <at> gmail.com>

Date: Sat, 24 Sep 2016 13:13:02 UTC

Severity: normal

Tags: notabug

Done: Assaf Gordon <assafgordon <at> gmail.com>

Bug is archived. No further changes may be made.

From: Bernhard Voelker <mail <at> bernhard-voelker.de>
To: Jash Dave <jashdave23 <at> gmail.com>, 24527 <at> debbugs.gnu.org
Subject: bug#24527: Problem while sorting comma separated values using sort command.
Date: Sat, 24 Sep 2016 17:35:12 +0200
tag 24527 notabug
close 24527
stop

On 09/24/2016 11:44 AM, Jash Dave wrote:
> There is a problem when sorting comma-separated entries (specifically
> numbers). Even when the separator is set to a comma, sort reads all
> following columns containing numbers and doesn't treat the comma as a
> separator between the following numbers.
> 
> If I use the command:
> sort -t"," -k1 -n Example.csv
> 
> Example.csv :
> 1,100,a,1,a
> 4,1000,d,4,c
> 3,1002,c,3,c
> 22,10,a,2,b
> 
> Output:
> 1,100,a,1,a
> 22,10,a,2,b
> 3,1002,c,3,c
> 4,1000,d,4,c
> 
> 
> Expected:
> 1,100,a,1,a
> 3,1002,c,3,c
> 4,1000,d,4,c
> 22,10,a,2,b
> 
> But it works with column 2 or 4, since there are no following numbers.

When you have trouble with sort, the --debug option usually helps:

  $ sort --debug -t"," -k1 -n
  sort: using ‘en_US.UTF-8’ sorting rules
  sort: key 1 is numeric and spans multiple fields
  1,100,a,1,a
  4,1000,d,4,c
  3,1002,c,3,c
  22,10,a,2,b
  1,100,a,1,a
  ______
  ___________
  22,10,a,2,b
  ______
  ___________
  3,1002,c,3,c
  _______
  ____________
  4,1000,d,4,c
  _______
  ____________

Aha, the key spans multiple fields: -k1 without an end position runs
from field 1 to the end of the line.
As you want to sort on the first field only, you need to
tell sort where the key ends:

  $ sort --debug -t"," -k1,1 -n
  sort: using ‘en_US.UTF-8’ sorting rules
  1,100,a,1,a
  4,1000,d,4,c
  3,1002,c,3,c
  22,10,a,2,b
  1,100,a,1,a
  _
  ___________
  3,1002,c,3,c
  _
  ____________
  4,1000,d,4,c
  _
  ____________
  22,10,a,2,b
  __
  ___________
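
The same fix can be checked without --debug in a small script. This is
only a sketch: the temporary file stands in for Example.csv, and the
variable names csv and sorted are made up for illustration.

```shell
# Recreate the reporter's sample data in a temporary file
# (stand-in for Example.csv; the file name is illustrative).
csv=$(mktemp)
printf '%s\n' '1,100,a,1,a' '4,1000,d,4,c' '3,1002,c,3,c' '22,10,a,2,b' > "$csv"

# -t, sets the field separator; -k1,1 starts and ends the key at field 1;
# -n compares that key numerically.
sorted=$(sort -t, -k1,1 -n "$csv")
printf '%s\n' "$sorted"

rm -f "$csv"
```

Because the key ends at field 1, the result should not depend on the
locale. With plain -k1 the key runs to the end of the line, and in a
locale such as en_US.UTF-8, where the comma is the thousands separator,
-n can read "22,10" as a single number, which would explain the output
in the report.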

Therefore, I'm marking this as not a bug in sort.

Thanks & have a nice day,
Berny




This bug report was last modified 6 years and 267 days ago.
