On 02/02/2011 05:42 AM, Francesco Bettella wrote:
> hi,
> I may have bumped into an undesired feature/bug of sort, which appears to be
> still present in the version 8.9 of coreutils.

Thanks for the report.  However, this is a feature, and not a bug, of
sort.

>
> I'm issuing the following sort commands (see attached files):
>
> [prompt1] > sort -k 1.4,1n asd1 > asd1.sorted
>
> [prompt2] > sort -k 2.4,2n asd2 > asd2.sorted

If I'm correct, asd1 and asd2 have the same contents, except that you
have swapped columns 1 and 2 between the two and resorted the lines.
And your desired goal is that the output matches asd1.sorted, again with
the columns swapped for asd2.sorted.

>
> the first one works as I would expect, the second one doesn't.

Let's examine why:

$ head -3 asd1 | sort -k 1.4,1n --debug
sort: using `en_US.UTF-8' sorting rules
sort: leading blanks are significant in key 1; consider also specifying `b'
chr>coding_gene
   ^ no match for key
_______________
chr1>PRAMEF1
   _
____________
chr1>PRAMEF4
   _
____________

$ head -3 asd1 | LC_ALL=C sort -k 1.4,1n --debug
sort: using simple byte comparison
sort: leading blanks are significant in key 1; consider also specifying `b'
chr>coding_gene
   ^ no match for key
_______________
chr1>PRAMEF1
   _
____________
chr1>PRAMEF4
   _
____________

In both cases, when there is no match for a key but numeric sorting was
requested, then that line sorts first; meanwhile, you get the fallback
sort of the complete line after the first key has been sorted, so that
the end result matches asd1.sorted whether you use the C locale or
dictionary sorting.

But notice that warning about not using -b, and how it affects asd2 (and
also, how the difference in dictionary vs. byte-ordering plays a role in
the secondary sorting):

$ head -3 asd2 | sort -k 2.4,2n --debug
sort: using `en_US.UTF-8' sorting rules
sort: leading blanks are significant in key 1; consider also specifying `b'
coding_gene>chr
              ^ no match for key
_______________
PRAMEF1>chr1
          ^ no match for key
____________
PRAMEF4>chr1
          ^ no match for key
____________

$ head -3 asd2 | LC_ALL=C sort -k 2.4,2n --debug
sort: using simple byte comparison
sort: leading blanks are significant in key 1; consider also specifying `b'
PRAMEF1>chr1
          ^ no match for key
____________
PRAMEF4>chr1
          ^ no match for key
____________
coding_gene>chr
              ^ no match for key

But when you add -b (note, b is the one option you have to add to the
start field, since it affects start and end fields specially; all other
options can be added to start, end, or both, and affect the entire key):

$ head -3 asd2 | sort -k 2.4b,2n --debug
sort: using `en_US.UTF-8' sorting rules
coding_gene>chr
               ^ no match for key
_______________
PRAMEF1>chr1
           _
____________
PRAMEF4>chr1
           _
____________

$ head -3 asd2 | LC_ALL=C coreutils/src/sort -k 2.4b,2n --debug
coreutils/src/sort: using simple byte comparison
coding_gene>chr
               ^ no match for key
_______________
PRAMEF1>chr1
           _
____________
PRAMEF4>chr1
           _
____________

That is, your expectations were insufficient - without telling sort
enough additional information, sort correctly followed what you told it
to do, but what you told it was not what you meant.  And the --debug
option is your [new] friend :)

-- 
Eric Blake   eblake@redhat.com    +1-801-349-2682
Libvirt virtualization library http://libvirt.org
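
P.S. For anyone who wants to reproduce the effect of b without the
attachments, here is a minimal sketch.  The four sample rows are made up
for illustration (they are not the contents of asd2); the only
assumption is that the columns are separated by a single tab, as the `>'
markers in the --debug output suggest.

```shell
#!/bin/sh
# Hypothetical stand-in for asd2: name, TAB, chromosome label.
# These rows are invented for illustration, not taken from the attachment.
tmp=$(mktemp)
printf 'PRAMEF4\tchr1\ncoding_gene\tchr\nGENEX\tchr10\nGENEY\tchr2\n' > "$tmp"

# Without b, field 2 still contains its leading tab, so character 4 of the
# field is the "r" of "chr".  No key parses as a number, every line ties,
# and the order comes from the last-resort whole-line byte comparison.
without_b=$(LC_ALL=C sort -k 2.4,2n "$tmp")

# With b on the start position, the leading blank is skipped first, so
# character 4 is the digit after "chr" and the numeric key is what you meant.
with_b=$(LC_ALL=C sort -k 2.4b,2n "$tmp")

printf 'without b:\n%s\n\nwith b:\n%s\n' "$without_b" "$with_b"
rm -f "$tmp"
```

With b, the rows should come out in chr, chr1, chr2, chr10 order; without
it, the order is just the byte order of the whole lines.  Adding --debug
to either command (GNU sort only) shows which bytes were used as the key.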