tag 22109 notabug
thanks

On 12/07/2015 08:36 AM, Ed Brambley wrote:
> The following problem came to light following a StackOverflow question [1].
> The lexical ordering of sort appears to depend on the delimiter used, and I
> believe it shouldn't. As a minimal example:

Thanks for the report. However, you have not found a bug in sort, only a
misuse of the command line and some incorrect assumptions. Let's
investigate further with the --debug option:

>
> ### Correct ordering ###
> $ printf "1,a,1\n2,aa,2" | LC_ALL=C sort -k2 -t,
> 1,a,1
> 2,aa,2

$ printf '1,a,1\n2,aa,2' | LC_ALL=C sort -k2 -t, --debug
sort: using simple byte comparison
1,a,1
  ___
_____
2,aa,2
  ____
______

You are comparing the string "a,1" with "aa,2", so the relative order of
',' and 'a' matters.

>
> ### Incorrect ordering by replacing the "," delimiter by "~" ###
> $ printf "1~a~1\n2~aa~2" | LC_ALL=C sort -k2 -t~
> 2~aa~2
> 1~a~1

The same goes for here:

$ printf '1~a~1\n2~aa~2' | LC_ALL=C sort -k2 -t~ --debug
sort: using simple byte comparison
2~aa~2
  ____
______
1~a~1
  ___
_____

You compared the string "aa~2" with "a~1".

>
> I think this is because, in ASCII, "," < "a" < "~".

Yes, so you saw exactly what you asked for. But what you asked for ("sort
starting from the second field through to the end of the line") is
probably not what you wanted. It sounds like you wanted "sort on ONLY the
second field", which is spelled differently:

$ printf '1~a~1\n2~aa~2' | LC_ALL=C sort -k2,2 -t~ --debug
sort: using simple byte comparison
1~a~1
  _
_____
2~aa~2
  __
______

Note that there is a very distinct difference between '-k2' and '-k2,2':
only the latter limits the key to JUST the second field ("a" vs. "aa",
regardless of delimiter), while the former slurps in the rest of the line,
so the spelling of the delimiter affects the result.

I'm marking this as not a bug in the database, but feel free to add
further comments.
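For anyone who wants to check the '-k2' versus '-k2,2' difference locally,
here is a small POSIX shell sketch. It sorts the same two records with ','
and with '~' as the delimiter, under each key specification, and prints the
resulting order of the first field. The helper name `order` is purely
illustrative and is not part of coreutils.

```shell
#!/bin/sh
# Illustrative sketch: compare -k2 and -k2,2 with two different delimiters.
export LC_ALL=C   # plain byte comparison, as in the examples in this thread

# order DELIM KEYSPEC (hypothetical helper):
# sort the two sample records using DELIM as the field separator and
# KEYSPEC as the key, then print their first fields on one line.
order() {
  printf '1%sa%s1\n2%saa%s2\n' "$1" "$1" "$1" "$1" |
    sort -t "$1" "$2" |
    cut -d "$1" -f1 |
    paste -s -d ' ' -
}

for key in -k2 -k2,2; do
  for delim in , '~'; do
    printf '%s -t%s: %s\n' "$key" "$delim" "$(order "$delim" "$key")"
  done
done
```

With byte comparison this should print "1 2" for both delimiters under
-k2,2, but "1 2" for ',' and "2 1" for '~' under -k2, because the latter
key runs to the end of the line and so includes the delimiter itself.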
-- 
Eric Blake   eblake redhat com    +1-919-301-3266
Libvirt virtualization library http://libvirt.org