GNU bug report logs - #18291
Unix Sort Bug Report

Package: coreutils

Reported by: NTENTOS STAVROS <ntentos <at> inf.uth.gr>

Date: Mon, 18 Aug 2014 15:36:02 UTC

Severity: normal

Tags: notabug

Done: Eric Blake <eblake <at> redhat.com>

Bug is archived. No further changes may be made.

To add a comment to this bug, you must first unarchive it, by sending
a message to control AT debbugs.gnu.org, with unarchive 18291 in the body.
You can then email your comments to 18291 AT debbugs.gnu.org in the normal way.


Report forwarded to bug-coreutils <at> gnu.org:
bug#18291; Package coreutils. (Mon, 18 Aug 2014 15:36:02 GMT)

Acknowledgement sent to NTENTOS STAVROS <ntentos <at> inf.uth.gr>:
New bug report received and forwarded. Copy sent to bug-coreutils <at> gnu.org. (Mon, 18 Aug 2014 15:36:02 GMT)

Message #5 received at submit <at> debbugs.gnu.org:

From: NTENTOS STAVROS <ntentos <at> inf.uth.gr>
To: bug-coreutils <at> gnu.org
Subject: Unix Sort Bug Report
Date: Mon, 18 Aug 2014 11:55:21 +0300
Hello developers,

Recently, using the sort utility I ran into an omission. While I
cannot disclose the file in question, I will try to explain the issue:
On a Windows-created file (line ending: \r\n) I tried to perform a  
sorting, which happened to sort the last entry somewhere above. The  
last line did not have a line ending of any kind, and sort created a  
Unix-like ending (\r), which afterwards creates a parsing problem with  
the file.
-- 
Ntentos Stavros





Information forwarded to bug-coreutils <at> gnu.org:
bug#18291; Package coreutils. (Mon, 18 Aug 2014 15:58:02 GMT)

Message #8 received at 18291 <at> debbugs.gnu.org:

From: Pádraig Brady <P <at> draigBrady.com>
To: NTENTOS STAVROS <ntentos <at> inf.uth.gr>
Cc: 18291 <at> debbugs.gnu.org
Subject: Re: bug#18291: Unix Sort Bug Report
Date: Mon, 18 Aug 2014 16:57:36 +0100
On 08/18/2014 09:55 AM, NTENTOS STAVROS wrote:
> 
> Hello developers,
> 
> Recently, using the sort utility I ran into an omission. While I cannot disclose the file in question, I will try to explain the issue:
> On a Windows-created file (line ending: \r\n) I tried to perform a sorting, which happened to sort the last entry somewhere above. The last line did not have a line ending of any kind, and sort created a Unix-like ending (\r), which afterwards creates a parsing problem with the file.

Well, a \n is actually inserted, not \r, but yes, that is a problem on Windows.
This demonstrates the behavior:

  $ printf '2\r\n1' | sort | od -Ax -tx1z -v
  000000 31 0a 32 0d 0a                                   >1.2..<

The \n is inserted so as to delimit the reordered item appropriately,
which is set here:

http://git.sv.gnu.org/gitweb/?p=coreutils.git;a=blob;f=src/sort.c;h=c2493192;hb=HEAD#l178

It seems that this should be set to '\r\n' on cygwin builds,
(with other adjustments to handle multiple chars).

thanks,
Pádraig.





Added tag(s) notabug. Request was from Eric Blake <eblake <at> redhat.com> to control <at> debbugs.gnu.org. (Mon, 18 Aug 2014 16:28:01 GMT)

Reply sent to Eric Blake <eblake <at> redhat.com>:
You have taken responsibility. (Mon, 18 Aug 2014 16:28:02 GMT)

Notification sent to NTENTOS STAVROS <ntentos <at> inf.uth.gr>:
bug acknowledged by developer. (Mon, 18 Aug 2014 16:28:03 GMT)

Message #15 received at 18291-done <at> debbugs.gnu.org:

From: Eric Blake <eblake <at> redhat.com>
To: NTENTOS STAVROS <ntentos <at> inf.uth.gr>, 18291-done <at> debbugs.gnu.org
Subject: Re: bug#18291: Unix Sort Bug Report
Date: Mon, 18 Aug 2014 10:27:50 -0600
tag 18291 notabug
thanks

On 08/18/2014 02:55 AM, NTENTOS STAVROS wrote:
> 
> Hello developers,
> 
> Recently, using the sort utility I ran into an omission. While I cannot
> disclose the file in question, I will try to explain the issue:
> On a Windows-created file (line ending: \r\n) I tried to perform a
> sorting, which happened to sort the last entry somewhere above. The last
> line did not have a line ending of any kind, and sort created a
> Unix-like ending (\r), which afterwards creates a parsing problem with
> the file.

(Unix line ending is \n, not \r)

Per POSIX, sort(1) is only required to operate on text files with one
exception:

http://pubs.opengroup.org/onlinepubs/9699919799/utilities/sort.html

"The input files shall be text files, except that the sort utility shall
add a <newline> to the end of a file ending with an incomplete last line."

and the POSIX definition of a text file is one that is either empty or
has a trailing newline to begin with:

http://pubs.opengroup.org/onlinepubs/9699919799/basedefs/V1_chap03.html

"3.397 Text File"
"A file that contains characters organized into zero or more lines. The
lines do not contain NUL characters and none can exceed {LINE_MAX} bytes
in length, including the <newline> character. Although POSIX.1-2008 does
not distinguish between text files and binary files (see the ISO C
standard), many utilities only produce predictable or meaningful output
when operating on text files. The standard utilities that have such
restrictions always specify "text files" in their STDIN or INPUT FILES
sections."
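
As a rough illustration of the definition quoted above, the following
shell sketch tests whether a file is empty or ends with a <newline>.
The function name ends_with_newline and the sample file names are made
up for this example and are not part of coreutils; only tail, printf,
mktemp and ordinary command substitution are relied on.

```shell
# Hypothetical helper (not part of coreutils): succeeds when the file is
# empty or its last byte is \n, i.e. it has no incomplete last line.
ends_with_newline() {
    # tail -c 1 prints the last byte. Command substitution strips a
    # trailing newline, so an empty result means "empty file" or "\n".
    [ -z "$(tail -c 1 -- "$1")" ]
}

# Demo on two throwaway files in a temporary directory.
demo_dir=$(mktemp -d)
printf '2\r\n1' > "$demo_dir/incomplete.txt"
printf '2\r\n1\r\n' > "$demo_dir/complete.txt"

for f in "$demo_dir/incomplete.txt" "$demo_dir/complete.txt"; do
    if ends_with_newline "$f"; then
        echo "${f##*/}: ends with a newline"
    else
        echo "${f##*/}: incomplete last line"
    fi
done
rm -rf "$demo_dir"
```

The check looks only at the final byte; it does not verify the other
parts of the definition (no NUL characters, lines within {LINE_MAX}).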

As such, coreutils is doing what is already required by POSIX, and the
bug is more on you for providing a non-text file without a trailing
newline and expecting sane behavior.  I seriously doubt cygwin can
second-guess your intention to use only Windows line endings. You are
better off guaranteeing that you have a text file with the desired line
ending already in place than relying on sort's requirement to add a \n
when the file was not a text file merely because it had an incomplete
last line.
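
One way to follow this advice, sketched under the assumption that every
complete line in the file already ends in \r\n, is to append the missing
\r\n before sort reads the data. The name sort_crlf is hypothetical and
is not an existing tool or sort option.

```shell
# Hypothetical wrapper (an illustration, not a coreutils feature):
# sort a CRLF file whose last line may lack its terminator.
sort_crlf() {
    # A non-empty result from tail -c 1 means the last byte is not \n,
    # so the final line is incomplete and needs its own \r\n.
    if [ -n "$(tail -c 1 -- "$1")" ]; then
        { cat -- "$1"; printf '\r\n'; } | sort
    else
        sort -- "$1"
    fi
}

# Demo with the same two-line input as the earlier od demonstration.
demo_file=$(mktemp)
printf '2\r\n1' > "$demo_file"
sort_crlf "$demo_file" | od -An -tx1
rm -f "$demo_file"
```

With this input the od dump should show the bytes 31 0d 0a 32 0d 0a,
so both lines carry \r\n, instead of the 31 0a 32 0d 0a that plain sort
produces.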

-- 
Eric Blake   eblake redhat com    +1-919-301-3266
Libvirt virtualization library http://libvirt.org

Information forwarded to bug-coreutils <at> gnu.org:
bug#18291; Package coreutils. (Mon, 18 Aug 2014 16:33:01 GMT)

Message #18 received at 18291 <at> debbugs.gnu.org:

From: Eric Blake <eblake <at> redhat.com>
To: Pádraig Brady <P <at> draigBrady.com>,
 NTENTOS STAVROS <ntentos <at> inf.uth.gr>
Cc: 18291 <at> debbugs.gnu.org
Subject: Re: bug#18291: Unix Sort Bug Report
Date: Mon, 18 Aug 2014 10:32:07 -0600
On 08/18/2014 09:57 AM, Pádraig Brady wrote:
> On 08/18/2014 09:55 AM, NTENTOS STAVROS wrote:
>>
>> Hello developers,
>>
>> Recently, using the sort utility I ran into an omission. While I cannot disclose the file in question, I will try to explain the issue:
>> On a Windows-created file (line ending: \r\n) I tried to perform a sorting, which happened to sort the last entry somewhere above. The last line did not have a line ending of any kind, and sort created a Unix-like ending (\r), which afterwards creates a parsing problem with the file.
> 
> Well, a \n is actually inserted, not \r, but yes, that is a problem on Windows.
> This demonstrates the behavior:
> 
>   $ printf '2\r\n1' | sort | od -Ax -tx1z -v
>   000000 31 0a 32 0d 0a                                   >1.2..<
> 
> The \n is inserted so as to delimit the reordered item appropriately,
> which is set here:
> 
> http://git.sv.gnu.org/gitweb/?p=coreutils.git;a=blob;f=src/sort.c;h=c2493192;hb=HEAD#l178
> 
> It seems that this should be set to '\r\n' on cygwin builds,
> (with other adjustments to handle multiple chars).

If the file was opened in text mode, then sort only sees \n line endings
on input (cygwin already shortened \r\n to \n before handing the line to
sort), and on output all \n are automatically converted back to \r\n.
If the file was opened in binary mode, then cygwin CANNOT second guess
what line endings you wanted.  It sounds like your file lives on a
binary mount point, when you want it to live on a text mount point
instead; at which point cygwin should do the right thing (although I
admit I did not actually try this on cygwin, because I seldom use cygwin
text mounts).  But that is probably more a question for cygwin
downstream, not for upstream coreutils (the POSIX requirement is that
text and binary file modes are identical, so any system like cygwin
where they are not is already non-POSIX and starts to get into a
question of whether pushing upstream fixes for a downstream-only problem
is maintainable).
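
For a file on a binary mount, the translation described above can be
approximated by hand: strip the \r from each line on input, sort, and
put \r\n back on output. This is only a sketch of the idea, not how
cygwin implements text mode, and the name sort_as_text is hypothetical.

```shell
# Hypothetical wrapper (an illustration of the text-mode idea only):
# sort sees plain \n line endings, and the output gets \r\n back.
sort_as_text() {
    # First awk: drop one trailing \r per line, like text-mode input.
    # Last awk: terminate every output line with \r\n.
    awk '{ sub(/\r$/, ""); print }' < "$1" \
        | sort \
        | awk '{ printf "%s\r\n", $0 }'
}

# Demo with the same two-line input as the earlier od demonstration.
demo_file=$(mktemp)
printf '2\r\n1' > "$demo_file"
sort_as_text "$demo_file" | od -An -tx1
rm -f "$demo_file"
```

Because the first awk prints every record with a trailing \n, the
incomplete last line is completed before sort runs, and the final dump
should read 31 0d 0a 32 0d 0a.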

-- 
Eric Blake   eblake redhat com    +1-919-301-3266
Libvirt virtualization library http://libvirt.org

bug archived. Request was from Debbugs Internal Request <help-debbugs <at> gnu.org> to internal_control <at> debbugs.gnu.org. (Tue, 16 Sep 2014 11:24:04 GMT)


GNU bug tracking system
Copyright (C) 1999 Darren O. Benham, 1997,2003 nCipher Corporation Ltd, 1994-97 Ian Jackson.