GNU bug report logs - #22001
Is it possible to tab separate concatenated files?


Package: coreutils

Reported by: "Macdonald, Kim - BCCDC" <kim.macdonald <at> bccdc.ca>

Date: Mon, 23 Nov 2015 21:03:02 UTC

Severity: normal

Tags: notabug

Done: Assaf Gordon <assafgordon <at> gmail.com>

Bug is archived. No further changes may be made.


From: Eric Blake <eblake <at> redhat.com>
To: Linda Walsh <coreutils <at> tlinx.org>, Bob Proulx <bob <at> proulx.com>
Cc: 22001 <at> debbugs.gnu.org, kim.macdonald <at> bccdc.ca
Subject: bug#22001: Is it possible to tab separate concatenated files?
Date: Thu, 26 Nov 2015 20:28:13 -0700
On 11/26/2015 04:52 PM, Linda Walsh wrote:

>> Because every plain
>> text line in a file must be terminated with a newline.
> ----
>    That's only a recent POSIX definition.  It's not related to
> real life.  When I looked for a text file definition on google, nothing
> was mentioned about needing a newline on the last line -- except on
> 1 site -- and that site was clearly not talking about 'text' files, but
> Unix-text-record files w/each record terminated by a NL char.
> 

Quit spreading FUD about POSIX.  That definition of text file is NOT a
recent invention; even back in POSIX 2001 the definition read:

3.392 Text File

A file that contains characters organized into one or more lines. The
lines do not contain NUL characters and none can exceed {LINE_MAX} bytes
in length, including the <newline>. Although IEEE Std 1003.1-2001 does
not distinguish between text files and binary files (see the ISO C
standard), many utilities only produce predictable or meaningful output
when operating on text files. The standard utilities that have such
restrictions always specify "text files" in their STDIN or INPUT FILES
sections.
http://pubs.opengroup.org/onlinepubs/009695399/basedefs/xbd_chap03.html

That was POSIX Issue 6; the more recent POSIX Issue 7 corrected the
definition to also allow a completely empty file to be considered as a
text file.  But the point is that POSIX has always required a text file
to end in a newline.
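
The trailing-newline requirement follows from the quoted definition together with POSIX's definition of a line, which is a sequence of zero or more non-<newline> characters plus a terminating <newline>. As a rough sketch only, the conditions could be checked as below. The function name `is_posix_text`, the `allow_empty` switch, and the default limit are illustrative assumptions, not part of any standard utility; 2048 is the minimum value POSIX permits for {LINE_MAX}, and a real system may report a larger one.

```python
# Hypothetical helper: tests a byte string against the text-file
# definition quoted above. Names and defaults are assumptions.
POSIX2_LINE_MAX = 2048  # smallest {LINE_MAX} a conforming system may use


def is_posix_text(data: bytes,
                  line_max: int = POSIX2_LINE_MAX,
                  allow_empty: bool = False) -> bool:
    if not data:
        # Issue 6 wording needs at least one line; Issue 7 accepts
        # an empty file, which allow_empty=True models.
        return allow_empty
    if b"\0" in data:
        return False  # lines must not contain NUL characters
    if not data.endswith(b"\n"):
        return False  # the last line is missing its terminator
    # data ends in a newline, so the final split element is empty.
    lines = data.split(b"\n")[:-1]
    # The limit counts the newline itself, hence the +1.
    return all(len(line) + 1 <= line_max for line in lines)
```

For instance, `is_posix_text(b"a\nb\n")` is true, while `is_posix_text(b"a\nb")` is false because the final line has no newline.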

>    On a mac, txt files have records separated by 'CR', and on DOS/Win,
> txt files have txt records separated by CRLF.

And those systems aren't POSIX.  So they aren't relevant to a discussion
about POSIX.


>> Why isn't there a newline at the end of the file?  Fix that and all of
>> your problems and many others go away.
>>   
> ---
>    Didn't used to be a requirement -- it was added because of a broken
> interpretation of the posix standard.  Please remember that a posixified
> definition of 'X' (for any X), may not be the same as a real-life 'X'.

No, it has ALWAYS been a problem.  Even 40 years ago, before POSIX was
invented, the only PORTABLE way to use programs like sed was to run them
on text files - namely, files where no line exceeded LINE_MAX bytes,
where no line contained NUL bytes, and where ALL lines ended in
newline.  That is because there were vendor implementations of sed (not
GNU coreutils, mind you, but other vendors) that really were hardcoded
to some rather small limits, understandably so in a day when computers
did not have as much memory as they do today.  POSIX just standardized
the existing practice of the Unix systems of that time on what formed a
text file.
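
To illustrate why the missing final newline matters when files are concatenated, here is a small sketch. The helper `ensure_trailing_newline` and the sample contents are made up for illustration, and this is only one possible way to repair the input before joining it.

```python
# Hypothetical helper: append the terminator only when the last
# line lacks one; empty input is returned unchanged.
def ensure_trailing_newline(data: bytes) -> bytes:
    if data and not data.endswith(b"\n"):
        return data + b"\n"
    return data


first = b"alpha\nbeta"   # last line is not newline-terminated
second = b"gamma\n"

# Plain concatenation glues "beta" and "gamma" into a single line.
naive = first + second

# Repairing each piece first keeps every line separate.
fixed = ensure_trailing_newline(first) + ensure_trailing_newline(second)
```

Here `naive` is `b"alpha\nbetagamma\n"`, whereas `fixed` is `b"alpha\nbeta\ngamma\n"`.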

-- 
Eric Blake   eblake redhat com    +1-919-301-3266
Libvirt virtualization library http://libvirt.org


This bug report was last modified 6 years and 213 days ago.
