On 04/20/2017 11:36 AM, Michael Klement wrote:
> Thanks for the detailed feedback, Eric.
>
> The POSIX spec. is, unfortunately, vague on this topic:
>
> The definition of a line (which you quote) is complemented with the definition of an incomplete line:
>
>> A sequence of one or more non-<newline> characters at the end of the file.
>
> So while the standard is aware of this possibility and gives it a name that suggests it is a kind of line, but something's missing, there is precious little behavior prescribed with respect to such incomplete lines.

You're welcome to submit a bug report to get POSIX to more clearly word its intentions that a file with an incomplete line is NOT a text file (http://austingroupbugs.net/main_page.php), but everyone on the Austin Group (myself included) has already agreed that the intention is there (even if the wording could be improved): Omitting a trailing newline causes sed to enter into the realm of undefined behavior - and this is BECAUSE there are existing sed implementations that behave differently when a trailing newline is omitted. Some do not do anything with an incomplete line (sed behaves as though the file were truncated at the last newline).

> So we have:
>
> sed's "input files shall be text files."
> a text file contains "characters organized into zero or more lines"
>
> Beyond the "zero or more lines", the only restrictions placed on what constitutes a text file are:
> "The lines do not contain NUL characters and none can exceed {LINE_MAX} bytes in length, including the <newline> character."
>
> If you interpret the word "lines" in the phrase "zero or more lines" to mean complete lines only (which is reasonable), then indeed any file that ends in an incomplete line is not a text file.
>
> I really wish the spec. were more explicit about incomplete lines.

As I said, you're welcome to propose a bug report with suggested wording improvements.
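To make the reading above concrete, here is a minimal sketch of a check for whether a byte string counts as a text file under that interpretation: zero or more complete lines, no NUL bytes, and no line longer than {LINE_MAX} bytes including its newline. The function names are hypothetical, and the limit of 2048 is an assumption (it is the minimum POSIX guarantees for {LINE_MAX}; the real value is system-specific). This is not how any sed implementation decides what to do with its input.

```python
# Assumption: 2048 is the POSIX-guaranteed minimum for {LINE_MAX};
# the actual limit on a given system may be larger.
LINE_MAX = 2048


def has_incomplete_line(data: bytes) -> bool:
    """True if data ends with one or more non-newline bytes."""
    return len(data) > 0 and not data.endswith(b"\n")


def is_posix_text(data: bytes, line_max: int = LINE_MAX) -> bool:
    """Check data against the text-file reading discussed above (hypothetical helper)."""
    if b"\0" in data:
        return False  # lines must not contain NUL characters
    if has_incomplete_line(data):
        return False  # only complete lines count as lines
    # data is now empty or ends in a newline, so the final split piece is empty.
    lines = data.split(b"\n")[:-1]
    # Each line's length is measured including its terminating newline.
    return all(len(line) + 1 <= line_max for line in lines)
```

Under this sketch an empty file is a text file (zero lines), while `b"a\nb"` is not, because of the trailing incomplete line `b`.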
>
>> If anything, the only
>> change I would make is have 'sed --posix' error out on non-text input,
>> to call attention to the user's attempt to feed non-posix-compliant data
>> to sed.
>
> That is definitely an option, but perhaps intuitive understanding and historical practice / other implementations could be considered instead:
>
> Intuitively, a file containing text with an incomplete line is obviously still a text file

Not per the POSIX definition of a text file. It is still a file, but no longer a text file. It wouldn't be the first time intuition has been wrong.

> wc is an interesting case, which doesn't count an incomplete line as a line (the spec. is actually unambiguous there and mandates counting the newlines),

Indeed, wc is a good example of how the POSIX writers specifically went out of their way to describe behaviors of programs that MUST be consistent when presented with a non-text file; as well as the escape clause that for all other programs (including sed) that require text file inputs, the behavior is intentionally unspecified if the trailing newline is not present.

--
Eric Blake, Principal Software Engineer
Red Hat, Inc. +1-919-301-3266
Virtualization: qemu.org | libvirt.org