Thanks for the fast response.

Right or wrong, POSIX is POSIX. Yet an LF as part of a line does seem worth counting. A line must terminate with a line feed, whereas a string does not require one. Paul Eggert's assumption seems correct.

During line indexing, the lines that consist of only a line feed can be counted and excluded from the sort. The correct number of line feeds can then be output before the sorted lines, or after them if the -r option is present. Consecutive LFs seem plausible for test data, but would they be common in actual sorting tasks? This might be an optimization worth omitting.

If the sort function's compare function were inlined rather than called through a pointer, a modest performance gain of about 5% might result. Implementing that would require some creativity. If the input data were not copied and string conversion were omitted, another 5% or so might be gained.

I do not know which sort method is used. However, a merge sort has some surprisingly frequent code paths, such as a 3-way comparison, which can be implemented with 2 or 3 comparisons and 0 to 4 memory moves. A 15% to 20% overall performance improvement from the three suggestions is not implausible.

Thanks for making it faster.

On Sun, Apr 9, 2017 at 12:04 PM, Paul Eggert wrote:
> Historically, 'sort' ignored the \n at the end of each line, so that empty
> lines (i.e., lines consisting only of a single \n) collated before all
> other lines. An earlier version of the POSIX spec was (mis)written to
> require treating the \n as part of the data, and during development in 1999
> GNU sort was briefly changed to conform to that, but this was an error in
> the POSIX spec that was eventually fixed and GNU sort was changed back to
> the traditional behavior, before any release was made with the funky
> behavior.
>
> So, it's not a bug that \t\n collates after \n, since "\t" is
> lexicographically after "".
>
> As I understand it, the empty string should collate before all other
> strings in all POSIX locales, so empty lines should always sort first in
> 'sort' output. I'm by no means a collation expert, though, and if I'm wrong
> I'd like to see a counterexample.
>
> Come to think of it, 'sort' might be able to improve performance in the
> common case of sorting text files containing many empty lines, by merely
> counting the lines rather than storing them internally. I suppose this is a
> different topic, though.
>
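
For what it is worth, here is a minimal sketch of the counting idea, written in Python for brevity rather than in the C that GNU sort uses. The function name sort_lines_counting_empty and its reverse flag (standing in for -r) are hypothetical. It assumes plain byte-wise comparison, where the empty line sorts before every other line, and it ignores locale collation and all other sort options.

```python
def sort_lines_counting_empty(data: bytes, reverse: bool = False) -> bytes:
    """Sort LF-terminated lines, counting empty lines instead of sorting them.

    Hypothetical helper: byte-wise comparison only, no locale handling.
    """
    pieces = data.split(b"\n")
    # A trailing LF (or empty input) leaves one empty piece that is not a line.
    if pieces[-1] == b"":
        pieces.pop()

    empty_count = 0
    non_empty = []
    for line in pieces:
        if line:
            non_empty.append(line)
        else:
            # An empty line is only counted, never stored or compared.
            empty_count += 1

    non_empty.sort(reverse=reverse)

    sorted_part = b"".join(line + b"\n" for line in non_empty)
    empty_part = b"\n" * empty_count
    # Empty lines collate first, so with reverse they are emitted last.
    return sorted_part + empty_part if reverse else empty_part + sorted_part


if __name__ == "__main__":
    sample = b"b\n\na\n\t\n\n"
    print(sort_lines_counting_empty(sample))
    print(sort_lines_counting_empty(sample, reverse=True))
```

The sketch still splits the whole input before counting, so it only shows the output ordering. A real implementation would presumably count the empty lines while indexing, as suggested above, and would add a missing final LF the way this one does.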