tag 15450 needinfo
thanks

On 09/22/2013 08:28 PM, sam@netinetics.com wrote:
> While most items are alphabetically sorted, the following occurs (for
> example):
>
> "Universe (1960 film)"
> "Universe"
>
> "Yellow 2G"
> "Yellow"

It sounds like you might be falling foul of a FAQ:
https://www.gnu.org/software/coreutils/faq/coreutils-faq.html#Sort-does-not-sort-in-normal-order_0021

But to know that for sure, you need to provide more information: what
locale settings are you using, and does your locale treat punctuation
as insignificant when doing strcoll()? Are you using LC_ALL=C to force
C locale sorting?

>
> the lines are in the wrong order. My C++ program which searches the
> index expects that "Universe" comes before "Universe (1960 film)" when
> doing a string compare.
>
> Interestingly, if I copy these problem lines into a separate text file
> and run SORT on them, it sorts correctly.
> I have tried every switch combination I can think of but the problem
> remains.

You didn't show the actual options you are trying, so it's hard to say
without more information. Are those the full lines that you are
sorting, or are you sorting something more like this:

$ LC_ALL=en_US.UTF-8 sort -t/ foo
Yellow 2G/1
Yellow/2
$ LC_ALL=en_US.UTF-8 sort -t/ -k1,1 foo
Yellow/2
Yellow 2G/1

Note how in the en_US locale, which ignores punctuation, I was able to
get a different sort order depending on whether I remembered to
terminate the sort key at the separator, rather than letting strcoll()
compare the full line. Have you played with the --debug option, to make
sure you are sorting on what you THINK you should be sorting on?

> I am wondering if it is something to do with the size of the file I am
> trying to sort. 605 megabytes, about 10,000,000 lines of text. Again,
> most of the lines are sorted correctly, but some (and I haven't checked
> exactly how many, but am finding them at random) are not.

Most likely, the size of the file has nothing to do with it.
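A minimal sketch of the locale and key points, assuming a POSIX shell with GNU coreutils sort and using the sample lines from the report (the helper name c_sort and the file name index.txt are hypothetical). In the C locale, strcoll() reduces to a plain byte comparison like strcmp(), which is presumably what the C++ program's string compare does if it uses std::string::compare or strcmp(); a string that is a prefix of another then always sorts first. Note that even in the C locale the whole-line comparison differs from the keyed one, because the space byte (0x20) sorts below the '/' separator (0x2F).

```shell
#!/bin/sh
# Sketch only: c_sort is a hypothetical helper, not part of coreutils.
# Forcing LC_ALL=C makes sort compare raw bytes, like strcmp().
c_sort() {
    LC_ALL=C sort "$@"
}

# Sample lines from the report: each shorter string is a prefix of the
# longer one, so byte order puts the shorter one first.
printf '%s\n' 'Universe (1960 film)' 'Universe' 'Yellow 2G' 'Yellow' | c_sort
# Universe
# Universe (1960 film)
# Yellow
# Yellow 2G

# Whole-line comparison (-t alone sets no key): ' ' (0x20) < '/' (0x2F).
printf '%s\n' 'Yellow/2' 'Yellow 2G/1' | c_sort -t/
# Yellow 2G/1
# Yellow/2

# Key ends at the first '/', so "Yellow" is compared with "Yellow 2G".
printf '%s\n' 'Yellow 2G/1' 'Yellow/2' | c_sort -t/ -k1,1
# Yellow/2
# Yellow 2G/1

# To check whether an existing (hypothetical) index file is already in
# byte order: sort -C exits non-zero at the first out-of-order line.
# c_sort -C index.txt && echo "index.txt is in byte order"
```

Adding --debug to any of these commands makes GNU sort annotate the part of each line it actually uses as the key, which is a quick way to confirm that the -t/-k options do what you intend.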
To rule out a bad merge when sort splits the work across multiple
threads, rerun your command as 'sort --parallel=1 $your_options...' so
that a single thread does all the sorting (if there IS a bug in how the
per-thread results are merged, we definitely want to fix that; it would
only show up with --parallel larger than 1). Again, I suspect the
problem is in your locale or command line, but without more details I
can't prove that. So I'll leave this bug open while waiting for more
details.

-- 
Eric Blake   eblake redhat com    +1-919-301-3266
Libvirt virtualization library http://libvirt.org