GNU bug report logs - #22109
Sort gives incorrect order when changing delimiters

Package: coreutils

Reported by: Ed Brambley <edbrambley <at> gmail.com>

Date: Mon, 7 Dec 2015 16:17:03 UTC

Severity: normal

Tags: notabug

Done: Eric Blake <eblake <at> redhat.com>

Bug is archived. No further changes may be made.


From: help-debbugs <at> gnu.org (GNU bug Tracking System)
To: Eric Blake <eblake <at> redhat.com>
Cc: tracker <at> debbugs.gnu.org
Subject: bug#22109: closed (Sort gives incorrect order when changing
 delimiters)
Date: Mon, 07 Dec 2015 16:50:04 +0000
[Message part 1 (text/plain, inline)]
Your message dated Mon, 7 Dec 2015 09:49:08 -0700
with message-id <5665B884.1080407 <at> redhat.com>
and subject line Re: bug#22109: Sort gives incorrect order when changing delimiters
has caused the debbugs.gnu.org bug report #22109,
regarding Sort gives incorrect order when changing delimiters
to be marked as done.

(If you believe you have received this mail in error, please contact
help-debbugs <at> gnu.org.)


-- 
22109: http://debbugs.gnu.org/cgi/bugreport.cgi?bug=22109
GNU Bug Tracking System
Contact help-debbugs <at> gnu.org with problems
[Message part 2 (message/rfc822, inline)]
From: Ed Brambley <edbrambley <at> gmail.com>
To: bug-coreutils <at> gnu.org
Subject: Sort gives incorrect order when changing delimiters
Date: Mon, 7 Dec 2015 15:36:12 +0000
[Message part 3 (text/plain, inline)]
The following problem came to light following a StackOverflow question [1].
The lexical ordering of sort appears to depend on the delimiter used, and I
believe it shouldn't. As a minimal example:

### Correct ordering ###
$ printf "1,a,1\n2,aa,2" | LC_ALL=C sort -k2 -t,
1,a,1
2,aa,2

### Incorrect ordering by replacing the "," delimiter by "~" ###
$ printf "1~a~1\n2~aa~2" | LC_ALL=C sort -k2 -t~
2~aa~2
1~a~1

I think this is because, in ASCII, "," < "a" < "~".

Many thanks,
Ed

[1] http://stackoverflow.com/questions/34134677/trying-to-understand-the-sort-utilty-in-linux
[Message part 5 (message/rfc822, inline)]
From: Eric Blake <eblake <at> redhat.com>
To: Ed Brambley <edbrambley <at> gmail.com>, 22109-done <at> debbugs.gnu.org
Subject: Re: bug#22109: Sort gives incorrect order when changing delimiters
Date: Mon, 7 Dec 2015 09:49:08 -0700
[Message part 6 (text/plain, inline)]
tag 22109 notabug
thanks

On 12/07/2015 08:36 AM, Ed Brambley wrote:
> The following problem came to light following a StackOverflow question [1].
> The lexical ordering of sort appears to depend on the delimiter used, and I
> believe it shouldn't. As a minimal example:

Thanks for the report.  However, this is not a bug in sort; the result
follows from the way your command line specifies the sort key.

Let's investigate further with the --debug option:

> 
> ### Correct ordering ###
> $ printf "1,a,1\n2,aa,2" | LC_ALL=C sort -k2 -t,
> 1,a,1
> 2,aa,2

$ printf '1,a,1\n2,aa,2' | LC_ALL=C sort -k2 -t, --debug
sort: using simple byte comparison
1,a,1
  ___
_____
2,aa,2
  ____
______

You are comparing the string "a,1" with "aa,2"; so the relative relation
between ',' and 'a' matters.
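
To make the byte ordering concrete, here is a small sketch that prints the numeric value of each character involved. It relies on the POSIX printf convention that an argument beginning with a single quote yields the code of the character that follows; the helper name byte_value is made up for this illustration.

```shell
# Hypothetical helper: print the numeric byte value of a single character.
# A leading single quote in a %d argument makes printf emit the character's code.
byte_value() {
  printf '%d\n' "'$1"
}

# The tilde is quoted so the shell does not treat it as a home directory.
for c in , a '~'; do
  printf '%s = %s\n' "$c" "$(byte_value "$c")"
done
# prints:
# , = 44
# a = 97
# ~ = 126
```

In the C locale sort compares these byte values directly, so ',' sorts before 'a' and 'a' sorts before '~'.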

> 
> ### Incorrect ordering by replacing the "," delimiter by "~" ###
> $ printf "1~a~1\n2~aa~2" | LC_ALL=C sort -k2 -t~
> 2~aa~2
> 1~a~1

The same applies here.

$ printf '1~a~1\n2~aa~2' | LC_ALL=C sort -k2 -t~ --debug
sort: using simple byte comparison
2~aa~2
  ____
______
1~a~1
  ___
_____

You compared the string "aa~2" with "a~1".
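
The same effect can be reproduced without any key options by sorting the compared key strings as whole lines. This is only a sketch of the comparison; sort_keys is a helper name invented for the example.

```shell
# Hypothetical helper: sort its arguments as whole lines, byte by byte (C locale).
sort_keys() {
  printf '%s\n' "$@" | LC_ALL=C sort
}

# Comma variant: the second byte is ',' (44) vs 'a' (97), so "a,1" comes first.
sort_keys 'aa,2' 'a,1'
# prints:
# a,1
# aa,2

# Tilde variant: the second byte is 'a' (97) vs '~' (126), so "aa~2" comes first.
sort_keys 'aa~2' 'a~1'
# prints:
# aa~2
# a~1
```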


> 
> I think this is because, in ASCII, "," < "a" < "~".

Yes, so you saw exactly what you asked for.  But what you asked for
("sort on a key that starts at the second field and runs through to the
end of the line") is probably not what you wanted.  It sounds like you
wanted "sort on ONLY the second field", which is spelled differently:

$ printf '1~a~1\n2~aa~2' | LC_ALL=C sort -k2,2 -t~ --debug
sort: using simple byte comparison
1~a~1
  _
_____
2~aa~2
  __
______


Note that there is a very distinct difference between '-k2' and '-k2,2';
only the latter limits the sort key to JUST the second field ("a" vs.
"aa", regardless of delimiter), while the former slurps in the rest of
the line, delimiters included, such that the spelling of the delimiter
affects the result.
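
As a side-by-side check, the following sketch runs the same two records through sort with either key spelling and either delimiter. The function name sort_sample and its argument order are assumptions made for this example; the input is fed in reverse order so that the output reflects the sort rather than the input.

```shell
# Hypothetical helper: $1 = field delimiter, $2 = key option (-k2 or -k2,2).
sort_sample() {
  printf '2%saa%s2\n1%sa%s1\n' "$1" "$1" "$1" "$1" |
    LC_ALL=C sort -t "$1" "$2"
}

sort_sample ,   -k2,2   # prints 1,a,1 then 2,aa,2
sort_sample '~' -k2,2   # prints 1~a~1 then 2~aa~2 (delimiter no longer matters)
sort_sample '~' -k2     # prints 2~aa~2 then 1~a~1 (key runs to end of line)
```

With '-k2,2' both delimiters give the same record order, because the key ends at the second field and never includes a delimiter byte.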

I'm marking this as not a bug in the database, but feel free to add
further comments.

-- 
Eric Blake   eblake redhat com    +1-919-301-3266
Libvirt virtualization library http://libvirt.org


This bug report was last modified 9 years and 166 days ago.


GNU bug tracking system
Copyright (C) 1999 Darren O. Benham, 1997,2003 nCipher Corporation Ltd, 1994-97 Ian Jackson.