GNU bug report logs - #22109: Sort gives incorrect order when changing delimiters
Reported by: Ed Brambley <edbrambley <at> gmail.com>
Date: Mon, 7 Dec 2015 16:17:03 UTC
Severity: normal
Tags: notabug
Done: Eric Blake <eblake <at> redhat.com>
Bug is archived. No further changes may be made.
tag 22109 notabug
close 22109
stop
Hello Ed,
On 12/07/2015 10:36 AM, Ed Brambley wrote:
> The following problem came to light following a StackOverflow question [1]. The lexical ordering of sort appears to depend on the delimiter used, and I believe it shouldn't. As a minimal example:
>
> ### Correct ordering ###
> $ printf "1,a,1\n2,aa,2" | LC_ALL=C sort -k2 -t,
> 1,a,1
> 2,aa,2
>
> ### Incorrect ordering by replacing the "," delimiter by "~" ###
> $ printf "1~a~1\n2~aa~2" | LC_ALL=C sort -k2 -t~
> 2~aa~2
> 1~a~1
>
This is not a bug in 'sort', but simply an incorrect usage of the key options.
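The parameter "-k2" means: sort on the text that starts at the second field *and runs to the end of the line*.
In this case the delimiter that follows the second field (',' or '~') is part of the key and does come into play: in the C locale ',' sorts before 'a' while '~' sorts after it, so "a,1" precedes "aa,2" but "a~1" follows "aa~2".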
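The byte ordering behind this can be checked directly. The following is only a small sketch; the variable names (order, comma_keys, tilde_keys) are made up for illustration and are not part of the original report:

```shell
# Bytewise order of the three characters involved (C locale).
order=$(printf '%s\n' '~' 'a' ',' | LC_ALL=C sort | tr -d '\n')
echo "$order"        # ,a~

# The open-ended keys that -k2 ends up comparing, sorted as plain lines.
comma_keys=$(printf 'a,1\naa,2\n' | LC_ALL=C sort)   # a,1 then aa,2
tilde_keys=$(printf 'a~1\naa~2\n' | LC_ALL=C sort)   # aa~2 then a~1
printf '%s\n' "$comma_keys" "$tilde_keys"
```

The second character of each key decides the comparison: the delimiter in the short key is compared against the 'a' in the long key.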
The correct usage is to specify the key as "-k2,2", meaning: sort by the second field alone (equal keys are then resolved by comparing the entire line, unless --stable is used).
$ printf "1~a~1\n2~aa~2" | LC_ALL=C sort -k2,2 -t~
1~a~1
2~aa~2
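The tie-breaking rule can be seen with two records whose second field is identical. This is a sketch with made-up sample data (2~a~x and 1~a~y) and illustrative variable names, not taken from the report:

```shell
# Both records have the key "a", so -k2,2 alone cannot order them.
# Default: ties are broken by a last-resort comparison of the whole line.
default_order=$(printf '2~a~x\n1~a~y\n' | LC_ALL=C sort -t'~' -k2,2)

# With -s (--stable) the last-resort comparison is disabled,
# so records with equal keys keep their input order.
stable_order=$(printf '2~a~x\n1~a~y\n' | LC_ALL=C sort -t'~' -k2,2 -s)

printf 'default:\n%s\nstable:\n%s\n' "$default_order" "$stable_order"
```

In the default case 1~a~y comes first because the whole line is compared; with -s the input order (2~a~x first) is preserved.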
Using sort's "--debug" option will illustrate the difference (the first line of underscores below each record marks the characters used as the key; the second marks the whole line, which is used as the last-resort comparison):
Incorrect usage (-k2):
$ printf "1~a~1\n2~aa~2" | LC_ALL=C sort --debug -k2 -t~
sort: using simple byte comparison
2~aa~2
  ____
______
1~a~1
  ___
_____
Better usage (-k2,2):
$ printf "1~a~1\n2~aa~2" | LC_ALL=C sort --debug -k2,2 -t~
sort: using simple byte comparison
1~a~1
  _
_____
2~aa~2
  __
______
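To confirm that the order no longer depends on the delimiter once the key is limited to field 2, the comparison can be repeated for both delimiters. A rough sketch follows; first_fields is a hypothetical helper written for this illustration only:

```shell
# Hypothetical helper: build the two sample records with delimiter $1,
# sort them with key spec $2, and print the first field of each line.
first_fields() {
  printf '1%sa%s1\n2%saa%s2\n' "$1" "$1" "$1" "$1" |
    LC_ALL=C sort -t"$1" -k"$2" | cut -d"$1" -f1 | tr '\n' ' '
}

first_fields ,   2;   echo   # open-ended key, ',' delimiter
first_fields '~' 2;   echo   # open-ended key, '~' delimiter
first_fields ,   2,2; echo   # key limited to field 2
first_fields '~' 2,2; echo   # key limited to field 2
```

Only the open-ended "-k2" runs should disagree between the two delimiters; both "-k2,2" runs should list record 1 before record 2.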
regards,
- assaf
This bug report was last modified 9 years and 166 days ago.