GNU bug report logs - #26422 historical feature or grand daddy bug?
Message #13 received at 26422-done <at> debbugs.gnu.org (full text, mbox):
Historically, 'sort' ignored the \n at the end of each line, so that
empty lines (i.e., lines consisting only of a single \n) collated before
all other lines. An earlier version of the POSIX spec was (mis)written
to require treating the \n as part of the data, and during development
in 1999 GNU sort was briefly changed to conform to that. This was an
error in the POSIX spec, though; it was eventually fixed, and GNU sort
was changed back to the traditional behavior before any release was made
with the funky behavior.
So, it's not a bug that \t\n collates after \n, since "\t" is
lexicographically after "".
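To make the two behaviors concrete, here is a minimal sketch in Python, not GNU sort's actual code. It assumes plain byte-wise comparison (roughly what LC_ALL=C gives you), and the function names are made up for illustration.

```python
def split_lines(data: bytes) -> list[bytes]:
    """Split input into lines, dropping each line's trailing newline."""
    lines = data.split(b"\n")
    if lines and lines[-1] == b"":
        lines.pop()  # input ended with a newline; that is not an extra line
    return lines


def sort_without_newline(data: bytes) -> list[bytes]:
    """Traditional behavior: the newline is not part of the compared data."""
    return sorted(split_lines(data))


def sort_with_newline(data: bytes) -> list[bytes]:
    """The short-lived 1999 behavior: the newline is compared as data."""
    return sorted(line + b"\n" for line in split_lines(data))
```

With the input b"\t\n\n" (a tab-only line followed by an empty line), sort_without_newline puts the empty line first, because b"" is a prefix of b"\t". sort_with_newline puts the tab line first, because the byte 0x09 (tab) is less than 0x0a (newline).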
As I understand it, the empty string should collate before all other
strings in all POSIX locales, so empty lines should always sort first in
'sort' output. I'm by no means a collation expert, though, and if I'm
wrong I'd like to see a counterexample.
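For anyone who wants to hunt for such a counterexample, a rough probe along these lines might help. It is only a sketch: it uses Python's locale.strcoll rather than sort itself, the function name and the candidate list are hypothetical, and it only tries the strings you hand it, so an empty result proves nothing about the locale in general.

```python
import locale


def find_counterexamples(candidates, locale_name="C"):
    """Return the non-empty candidates that collate strictly before "".

    Ties (strcoll returning 0) are not reported. Candidates must not
    contain NUL characters, which locale.strcoll rejects.
    """
    saved = locale.setlocale(locale.LC_COLLATE)  # query the current setting
    locale.setlocale(locale.LC_COLLATE, locale_name)
    try:
        return [s for s in candidates if s and locale.strcoll(s, "") < 0]
    finally:
        locale.setlocale(locale.LC_COLLATE, saved)  # restore on the way out
```

Calling find_counterexamples(["\t", " ", "a"], "C") probes the C locale; passing another locale name only works if that locale is installed on the system.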
Come to think of it, 'sort' might be able to improve performance in the
common case of sorting text files containing many empty lines, by merely
counting the lines rather than storing them internally. I suppose this
is a different topic, though.
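The counting idea could look roughly like the sketch below. This is not how GNU sort works today; it assumes byte-wise comparison with no key or reverse options, so that empty lines really do sort first, and the function name is invented for the example.

```python
def sort_counting_empty(lines):
    """Yield lines in sorted order, counting empty lines instead of storing them."""
    empty_count = 0
    non_empty = []
    for line in lines:
        if line == b"":
            empty_count += 1        # remember only how many there were
        else:
            non_empty.append(line)  # only non-empty lines occupy memory
    for _ in range(empty_count):
        yield b""                   # empty lines collate first
    non_empty.sort()
    yield from non_empty
```

The saving is in memory and in the number of comparisons: the empty lines never enter the sort at all.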
This bug report was last modified 8 years and 93 days ago.