GNU bug report logs -
#26574
v4.4: POSIX violation with respect to output of a trailing newline, even with --posix
On 04/20/2017 11:36 AM, Michael Klement wrote:
> Thanks for the detailed feedback, Eric.
>
> The POSIX spec. is, unfortunately, vague on this topic:
>
> The definition of a line (which you quote) is complemented with the definition of an incomplete line <http://pubs.opengroup.org/onlinepubs/9699919799/basedefs/V1_chap03.html#tag_03_195>:
>
>> A sequence of one or more non-<newline> characters at the end of the file.
>
>
> So while the standard is aware of this possibility and gives it a name suggesting a kind of line with something missing, it prescribes precious little behavior with respect to such incomplete lines.
>
You're welcome to submit a bug report to get POSIX to more clearly word
its intentions that a file with an incomplete line is NOT a text file
(http://austingroupbugs.net/main_page.php), but everyone on the Austin
Group (myself included) has already agreed that the intention is there
(even if the wording could be improved): Omitting a trailing newline
causes sed to enter into the realm of undefined behavior - and this is
BECAUSE there are existing sed implementations that behave differently
when a trailing newline is omitted. Some do not do anything with an
incomplete line (sed behaves as though the file were truncated at the
last newline).
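
To make the divergence concrete, here is a minimal Python sketch (not taken from any sed implementation; the function name is hypothetical) that splits input into its complete lines and a trailing incomplete line. An implementation that behaves as though the file were truncated at the last newline would process only the first part; others would also process the second.

```python
from typing import Tuple


def split_incomplete_line(data: bytes) -> Tuple[bytes, bytes]:
    """Return (complete_lines, incomplete_tail) for the given input bytes.

    complete_lines ends at the last newline (or is empty if there is none);
    incomplete_tail holds any non-newline bytes that follow it.
    """
    cut = data.rfind(b"\n") + 1  # rfind gives -1 when no newline exists, so cut is 0
    return data[:cut], data[cut:]
```

An empty tail means the input ends in a newline (or is empty), so the two behaviors described above coincide.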
> So we have:
>
> sed's "input files shall be text files."
> a text file contains "characters organized into zero or more lines"
>
> Beyond the "zero or more lines", the only restrictions placed on what constitutes a text file <http://pubs.opengroup.org/onlinepubs/9699919799/basedefs/V1_chap03.html#tag_03_403> are:
> "The lines do not contain NUL characters and none can exceed {LINE_MAX} bytes in length, including the <newline> character."
>
> If you interpret the word "lines" in the phrase "zero or more lines" to mean complete lines only (which is reasonable), then indeed any file that ends in an incomplete line is not a text file.
>
> I really wish the spec. were more explicit about incomplete lines.
As I said, you're welcome to propose a bug report with suggested wording
improvements.
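
Under the stricter reading discussed above (only complete lines count as lines), the quoted text-file definition could be sketched as the following check. This is an illustration only: the function name is hypothetical, and the default of 2048 for LINE_MAX is an assumption; the real limit on a given system can be queried with `getconf LINE_MAX`.

```python
ASSUMED_LINE_MAX = 2048  # assumption for illustration; check `getconf LINE_MAX`


def is_posix_text(data: bytes, line_max: int = ASSUMED_LINE_MAX) -> bool:
    """Check data against the text-file definition, reading "lines" as complete lines."""
    if b"\0" in data:
        return False  # lines must not contain NUL characters
    if data and not data.endswith(b"\n"):
        return False  # the file ends in an incomplete line
    # After the check above, the final split element is always empty, so drop it.
    lines = data.split(b"\n")[:-1]
    # Each line's length, including its newline, must not exceed line_max.
    return all(len(line) + 1 <= line_max for line in lines)
```

Note that an empty file passes, since "zero or more lines" permits zero lines.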
>
>> If anything, the only
>> change I would make is have 'sed --posix' error out on non-text input,
>> to call attention to the user's attempt to feed non-posix-compliant data
>> to sed.
>
>
> That is definitely an option, but perhaps intuitive understanding and historical practice / other implementations could be considered instead:
>
> Intuitively, a file containing text with an incomplete line is obviously still a text file
Not per the POSIX definition of a text file.
It is still a file, but no longer a text file.
It wouldn't be the first time intuition has been wrong.
> wc is an interesting case, which doesn't count an incomplete line as a line (the spec <http://pubs.opengroup.org/onlinepubs/9699919799/utilities/wc.html> is actually unambiguous there and mandates counting the newlines),
Indeed, wc is a good example of how the POSIX writers specifically went
out of their way to describe the behavior of programs that MUST be
consistent when presented with a non-text file. It also illustrates the
escape clause: for all other programs that require text file inputs
(including sed), the behavior is intentionally unspecified if the
trailing newline is not present.
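
As a rough illustration of the wc rule mentioned above (counting newlines rather than lines), a Python sketch with a hypothetical function name might look like this; it is not wc's actual implementation.

```python
def count_newlines(data: bytes) -> int:
    """Count <newline> characters, so a trailing incomplete line adds nothing."""
    return data.count(b"\n")
```

With this rule, input consisting of one complete line followed by an incomplete line yields a count of 1, because only one newline is present.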
--
Eric Blake, Principal Software Engineer
Red Hat, Inc. +1-919-301-3266
Virtualization: qemu.org | libvirt.org
This bug report was last modified 8 years and 35 days ago.
GNU bug tracking system
Copyright (C) 1999 Darren O. Benham,
1997,2003 nCipher Corporation Ltd,
1994-97 Ian Jackson.