GNU bug report logs - #26422
historical feature or grand daddy bug?

Package: coreutils;

Reported by: Kyle Sallee <kyle.sallee <at> gmail.com>

Date: Sun, 9 Apr 2017 18:38:02 UTC

Severity: normal

Done: Paul Eggert <eggert <at> cs.ucla.edu>

Bug is archived. No further changes may be made.

From: help-debbugs <at> gnu.org (GNU bug Tracking System)
To: Paul Eggert <eggert <at> cs.ucla.edu>
Cc: tracker <at> debbugs.gnu.org
Subject: bug#26422: closed (historical feature or grand daddy bug?)
Date: Sun, 09 Apr 2017 19:05:02 +0000
[Message part 1 (text/plain, inline)]
Your message dated Sun, 9 Apr 2017 12:04:34 -0700
with message-id <d25988e3-2d97-adc2-5286-c57933b8c597 <at> cs.ucla.edu>
and subject line Re: bug#26422: historical feature or grand daddy bug?
has caused the debbugs.gnu.org bug report #26422,
regarding historical feature or grand daddy bug?
to be marked as done.

(If you believe you have received this mail in error, please contact
help-debbugs <at> gnu.org.)


-- 
26422: http://debbugs.gnu.org/cgi/bugreport.cgi?bug=26422
GNU Bug Tracking System
Contact help-debbugs <at> gnu.org with problems
[Message part 2 (message/rfc822, inline)]
From: Kyle Sallee <kyle.sallee <at> gmail.com>
To: bug-coreutils <at> gnu.org
Subject: historical feature or grand daddy bug?
Date: Sun, 9 Apr 2017 11:37:34 -0700
[Message part 3 (text/plain, inline)]
When the sort program sorts a file,
lines that start with a line feed are output
earlier than lines that begin with a tab.

Tab ASCII value is 9.
LF  ASCII value is 10.
Shouldn't tabs sort first?

However, if the lines are converted to strings,
then presumably each LF is replaced with a 0 byte
to avoid using a larger address space.
Yet if the 0 byte were placed after the LF,
the expected output would result.

If the expected behavior were adopted,
scripts that rely on the historical behavior might break.

I have not looked at the sort.c source code,
so I am not offering a patch.
I would like to solicit discussion
about empty lines sorting first.
Is it a bug?
Should it be fixed?

Because I am not on the mailing list,
if the topic is worth discussing
and a decision is made,
please forward it to me.
Thanks for maintaining and sharing awesome software.
[Message part 5 (message/rfc822, inline)]
From: Paul Eggert <eggert <at> cs.ucla.edu>
To: Kyle Sallee <kyle.sallee <at> gmail.com>, 26422-done <at> debbugs.gnu.org
Subject: Re: bug#26422: historical feature or grand daddy bug?
Date: Sun, 9 Apr 2017 12:04:34 -0700
Historically, 'sort' ignored the \n at the end of each line, so that 
empty lines (i.e., lines consisting only of a single \n) collated before 
all other lines. An earlier version of the POSIX spec was (mis)written 
to require treating the \n as part of the data, and during development 
in 1999 GNU sort was briefly changed to conform to that, but this was an 
error in the POSIX spec that was eventually fixed and GNU sort was 
changed back to the traditional behavior, before any release was made 
with the funky behavior.

So, it's not a bug that \t\n collates after \n, since "\t" is 
lexicographically after "".
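
To make the two orderings concrete, here is a minimal Python sketch, not GNU sort's actual code. It assumes plain byte-wise comparison (as in the C locale), and the function names are hypothetical. The first function compares lines with the trailing \n stripped, which is the traditional behavior described above; the second treats the \n as part of the data, which is the ordering the original report expected.

```python
def strip_newline(line: bytes) -> bytes:
    """Return the line without its single trailing newline, if any."""
    return line[:-1] if line.endswith(b"\n") else line


def sort_traditional(lines):
    """Sort by line content only, ignoring the trailing newline."""
    return sorted(lines, key=strip_newline)


def sort_with_newline(lines):
    """Sort with the trailing newline treated as part of the data."""
    return sorted(lines)


if __name__ == "__main__":
    data = [b"\t\n", b"a\n", b"\n"]
    # Keys are b"\t", b"a", b"": the empty key collates first.
    print(sort_traditional(data))   # [b'\n', b'\t\n', b'a\n']
    # Whole lines are compared: byte 9 (tab) sorts before byte 10 (LF).
    print(sort_with_newline(data))  # [b'\t\n', b'\n', b'a\n']
```

In the first ordering the empty line wins because its key is the empty string; in the second, the tab line wins because 9 is less than 10.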

As I understand it, the empty string should collate before all other 
strings in all POSIX locales, so empty lines should always sort first in 
'sort' output. I'm by no means a collation expert, though, and if I'm 
wrong I'd like to see a counterexample.

Come to think of it, 'sort' might be able to improve performance in the 
common case of sorting text files containing many empty lines, by merely 
counting the lines rather than storing them internally. I suppose this 
is a different topic, though.
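
The counting idea could be sketched roughly as follows in Python. This only illustrates the suggestion and is not how GNU sort is implemented. It assumes whole-line comparison with no key, reverse, or uniqueness options, and that the empty string collates before every other string, as discussed above. The function name is hypothetical.

```python
def sort_counting_empty(lines):
    """Sort lines (newlines already stripped), counting empty lines
    rather than storing each one."""
    empty_count = 0
    nonempty = []
    for line in lines:
        if line == "":
            empty_count += 1       # remember only how many there were
        else:
            nonempty.append(line)  # only these need to be kept and compared
    nonempty.sort()
    # Assumption: empty lines collate first, so emit them before the rest.
    return [""] * empty_count + nonempty


if __name__ == "__main__":
    print(sort_counting_empty(["b", "", "\t", "", "a"]))
    # ['', '', '\t', 'a', 'b']
```

Under those assumptions the result matches an ordinary sort of the same input, while the empty lines never take part in a comparison.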


This bug report was last modified 8 years and 39 days ago.
