GNU bug report logs - #26574
v4.4: POSIX violation with respect to output of a trailing newline, even with --posix


Package: sed;

Reported by: Michael Klement <michael.klement <at> usa.net>

Date: Thu, 20 Apr 2017 04:00:02 UTC

Severity: normal

Tags: notabug

Done: Eric Blake <eblake <at> redhat.com>

Bug is archived. No further changes may be made.


Message #24 received at 26574-done <at> debbugs.gnu.org (full text, mbox):

From: Michael Klement <michael.klement <at> usa.net>
To: Assaf Gordon <assafgordon <at> gmail.com>
Cc: Eric Blake <eblake <at> redhat.com>, 26574-done <at> debbugs.gnu.org
Subject: Re: bug#26574: v4.4: POSIX violation with respect to output of a
 trailing newline, even with --posix
Date: Thu, 20 Apr 2017 15:32:13 -0400
[Message part 1 (text/plain, inline)]
Thanks for digging into this, it indeed illustrates the point well.

Just for the record:

Here's what I get on FreeBSD 10.1.2 and on macOS 10.12.4:

$ printf 'a' | sed '' | od -tx1
0000000    61  0a                                                        
0000002

macOS typically ships an older version of the BSD implementation. Neither supports --version, but the man pages are dated June 20, 2014 on FreeBSD and May 10, 2005 on macOS.

Another (minor) point of interest:

On macOS 10.12.4 (but not FreeBSD 10.1.2), sed fails on bytes that aren't valid UTF-8 when using regex-based functionality:

$ printf '\xfc\n' | sed  -n '/./p'
sed: RE error: illegal byte sequence
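
A possible workaround, sketched here under assumptions: running sed in the C locale should make it treat every byte as a single character, so the invalid UTF-8 byte is no longer rejected. The wrapper name sed_bytes is hypothetical and not part of any sed; the byte is written in octal (\374) because not every printf understands \x escapes. This was not tested on the systems mentioned in this thread.

```shell
# Hypothetical wrapper (name is an assumption): run sed in the C locale so
# that input is handled byte by byte instead of as UTF-8.
sed_bytes() {
    LC_ALL=C sed "$@"
}

# Same check as above, with 0xfc written as octal \374 for portability.
printf '\374\n' | sed_bytes -n '/./p' | od -An -tx1
```

The trade-off is that in the C locale multibyte characters are matched as separate bytes, so this only suits scripts that do not depend on character-level matching.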




> On Apr 20, 2017, at 2:32 PM, Assaf Gordon <assafgordon <at> gmail.com> wrote:
> 
> Hello,
> 
> On Thu, Apr 20, 2017 at 11:46:15AM -0500, Eric Blake wrote:
>> On 04/20/2017 11:36 AM, Michael Klement wrote:
>>> Thanks for the detailed feedback, Eric.
>>> 
>>> The POSIX spec. is, unfortunately, vague on this topic:
>>> 
>>> The definition of a line (which you quote) is complemented with the definition of an incomplete line <http://pubs.opengroup.org/onlinepubs/9699919799/basedefs/V1_chap03.html#tag_03_195>:
>>> 
>>>> A sequence of one or more non-<newline> characters at the end of the file.
>>> 
>>> 
>>> So the standard is aware of this possibility and gives it a name suggesting a kind of line with something missing, yet it prescribes precious little behavior for such incomplete lines.
>>> 
>> 
>> You're welcome to submit a bug report to get POSIX to more clearly word
>> its intentions that a file with an incomplete line is NOT a text file
>> (http://austingroupbugs.net/main_page.php), but everyone on the Austin
>> Group (myself included) has already agreed that the intention is there
>> (even if the wording could be improved): Omitting a trailing newline
>> causes sed to enter into the realm of undefined behavior - and this is
>> BECAUSE there are existing sed implementations that behave differently
>> when a trailing newline is omitted.  Some do not do anything with an
>> incomplete line (sed behaves as though the file were truncated at the
>> last newline).
>> 
> 
> For completeness, here's the behaviour of several implementations:
> 
> sed implementations that do not add a newline (like GNU sed):
>  FreeBSD 10
>  OpenBSD 5.9
>  BusyBox 1.22
>  ToyBox 7.2
>  AIX 7
> 
> sed implementations that do add a newline:
>  NetBSD 7.0
>  Heirloom
> 
> SunOS 5.11's sed prints nothing if there is no newline:
>  $ printf 'a' | sed '' | od -tx1
>  0000000
>  $ printf 'a\n' | sed '' | od -tx1
>  0000000 61 0a
>  0000002
>  $ uname -a
>  SunOS unstable11s 5.11 11.2 sun4u sparc SUNW,SPARC-Enterprise
>  $ which sed
>  /usr/bin/sed
> 
> 
> The behaviour when processing a file whose last line lacks a newline also differs across other programs, languages, and implementations:
> 
>  $ printf a | perl -npe '' | od -tx1
>  0000000 61
>  0000001
> 
>  $ printf a | perl -lnpe '' | od -tx1
>  0000000 61 0a
>  0000002
> 
>  $ printf a | awk '{print}' | od -tx1
>  0000000 61 0a
>  0000002
> 
>  $ printf 'a' | sh -c 'while read A ; do echo $A ; done' | od -tx1
>  0000000
> 
>  $ printf 'a' \
>     | python3 -c 'import sys; [print(x,end="") for x in sys.stdin]' \
>     | od -tx1
>  0000000 61
>  0000001
> 
>  $ printf a | uniq-gnu | od -t x1
>  0000000 61 0a
>  0000002
> 
>  $ printf a | uniq-freebsd-11 | od -t x1
>  0000000    61
>  0000001
> 
>  $ printf a | cut-gnu -f1 | od -tx1
>  0000000 61 0a
>  0000002
> 
>  $ printf a | cut-freebsd-11 -f1 | od -tx1
>  0000000    61
>  0000001
> 
>  $ printf a | sort | od -t x1
>  0000000 61 0a
>  0000002
> 
> 
> And this reinforces what Eric wrote: there is simply no
> 'one correct' (or agreed-upon) way to deal with files without newlines on the last line.
> 
> 
> regards,
> - assaf
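
Since the behaviour differs this much, a script that must be portable can avoid the undefined case by making sure the input ends with a newline before sed sees it. The following is only a sketch: ends_with_newline and with_final_newline are hypothetical helper names, not anything proposed in this thread, and they assume the input is a regular file whose path is passed as the first argument.

```shell
# Hypothetical helpers (names are assumptions) for handing sed a complete
# last line regardless of how the local sed treats incomplete lines.

# Succeeds if the file named by $1 is empty or ends with a newline.
# Command substitution strips a trailing newline, so the captured last
# byte is empty exactly in those cases.
ends_with_newline() {
    [ -z "$(tail -c 1 "$1")" ]
}

# Prints the file named by $1, appending a newline only if one is missing.
with_final_newline() {
    cat "$1"
    ends_with_newline "$1" || echo
}
```

A call such as `with_final_newline input.txt | sed ''` then always feeds sed a text file in the POSIX sense, at the cost of not preserving the original missing newline in the output.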

This bug report was last modified 8 years and 35 days ago.


GNU bug tracking system
Copyright (C) 1999 Darren O. Benham, 1997,2003 nCipher Corporation Ltd, 1994-97 Ian Jackson.