From unknown Sun Jun 22 00:03:54 2025
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8
From: bug#26574 <26574@debbugs.gnu.org>
To: bug#26574 <26574@debbugs.gnu.org>
Subject: Status: v4.4: POSIX violation with respect to output of a trailing newline, even with --posix
Reply-To: bug#26574 <26574@debbugs.gnu.org>
Date: Sun, 22 Jun 2025 07:03:54 +0000

retitle 26574 v4.4: POSIX violation with respect to output of a trailing newline, even with --posix
reassign 26574 sed
submitter 26574 Michael Klement
severity 26574 normal
tag 26574 notabug
thanks

From debbugs-submit-bounces@debbugs.gnu.org Wed Apr 19 23:59:34 2017
From: Michael Klement
Content-Type: multipart/alternative; boundary="Apple-Mail=_F99DFE2A-54E3-4EB2-8287-B3F98832AFC3"
Mime-Version: 1.0 (Mac OS X Mail 10.3 \(3273\))
Subject: v4.4: POSIX violation with respect to output of a trailing newline, even with --posix
Message-Id: <8BD9A8CD-00A9-4B06-8C4A-5F0C6FF6A022@usa.net>
Date: Wed, 19 Apr 2017 21:43:15 -0400
To: bug-sed@gnu.org

--Apple-Mail=_F99DFE2A-54E3-4EB2-8287-B3F98832AFC3
Content-Type: text/html; charset=us-ascii

$ sed --version
sed (GNU sed) 4.4

The POSIX spec. states:
"Whenever the pattern space is written to standard output or a named file, sed shall immediately follow it with a <newline>."

While GNU Sed's default behavior of preserving the trailing-newline status of the input's last line is defensible and can be helpful,
it should exhibit POSIX-compliant behavior when invoked with --posix.

# Acceptable default behavior - the no-trailing-newline status of the input is preserved.
$ printf 'a' | sed '' | od -t x1
0000000    61

# SHOULD include a trailing newline, per POSIX, but currently doesn't.
$ printf 'a' | sed --posix '' | od -t x1
0000000    61
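
One way to sidestep the difference inside a pipeline is to complete the last line before sed ever sees it. The following is only a sketch: the function name ensure_trailing_newline is hypothetical, and it assumes the local awk passes an incomplete final line through and terminates it (POSIX leaves that unspecified for awk as well, although gawk, mawk and BSD awk are commonly observed to behave this way).

```shell
# Hypothetical helper, not part of sed: terminate the last line of stdin
# with a newline if it lacks one. `awk 1` prints every input record
# followed by the output record separator, which defaults to a newline.
ensure_trailing_newline() {
  awk 1
}

# Both sed invocations now receive a complete line, so their output
# should end in 0a regardless of --posix.
printf 'a' | ensure_trailing_newline | sed '' | od -An -t x1
```

Input that already ends in a newline passes through unchanged, so the helper can be applied unconditionally.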



Regards,

Michael
--Apple-Mail=_F99DFE2A-54E3-4EB2-8287-B3F98832AFC3--

From debbugs-submit-bounces@debbugs.gnu.org Thu Apr 20 06:42:19 2017
Subject: Re: bug#26574: v4.4: POSIX violation with respect to output of a trailing newline, even with --posix
To: Michael Klement, 26574-done@debbugs.gnu.org
References: <8BD9A8CD-00A9-4B06-8C4A-5F0C6FF6A022@usa.net>
From: Eric Blake
Organization: Red Hat, Inc.
Date: Thu, 20 Apr 2017 05:42:08 -0500
MIME-Version: 1.0
In-Reply-To: <8BD9A8CD-00A9-4B06-8C4A-5F0C6FF6A022@usa.net>
Content-Type: multipart/signed; micalg=pgp-sha256; protocol="application/pgp-signature"; boundary="GDAk9kDD2OgX5owWPtUnEPx8SceA1PuVh"

This is an OpenPGP/MIME signed message (RFC 4880 and 3156)
--GDAk9kDD2OgX5owWPtUnEPx8SceA1PuVh
Content-Type: multipart/mixed; boundary="JRwJar7rbp5fXc2aleL3GsQdAnX87xUlN"; protected-headers="v1"

--JRwJar7rbp5fXc2aleL3GsQdAnX87xUlN
Content-Type: text/plain; charset=utf-8

tag 26574 notabug
thanks

On 04/19/2017 08:43 PM, Michael Klement wrote:
> $ sed --version
> sed (GNU sed) 4.4
>
> The POSIX spec. states:
> "Whenever the pattern space is written to standard output or a named file, sed shall immediately follow it with a <newline>."
>
> While GNU Sed's default behavior of preserving the trailing-newline status of the input's last line is defensible and can be helpful,
> it should exhibit POSIX-compliant behavior when invoked with --posix.

POSIX also requires that input given to sed be text files:

http://pubs.opengroup.org/onlinepubs/9699919799/utilities/sed.html
"The input files shall be text files."

And per the definition of text file, ALL input lines must have a
trailing newline in the first place:
http://pubs.opengroup.org/onlinepubs/9699919799/basedefs/V1_chap03.html
"3.403 Text File
A file that contains characters organized into zero or more lines. The
lines do not contain NUL characters and none can exceed {LINE_MAX} bytes
in length, including the <newline> character. Although POSIX.1-2008 does
not distinguish between text files and binary files (see the ISO C
standard), many utilities only produce predictable or meaningful output
when operating on text files. The standard utilities that have such
restrictions always specify "text files" in their STDIN or INPUT FILES
sections."

"3.206 Line
A sequence of zero or more non-<newline> characters plus a terminating
<newline> character."

Input that does NOT end in a trailing newline is NOT a text file, and
therefore is NOT a POSIX-compliant use of sed, and therefore, sed
--posix need not do anything different with it because you are already
outside the bounds of what POSIX requires.

Therefore, I don't think you have a case for changing any behavior, at
least not on the grounds of appealing to POSIX, so I'm marking this as
not a bug, but feel free to continue discussion.  If anything, the only
change I would make is have 'sed --posix' error out on non-text input,
to call attention to the user's attempt to feed non-posix-compliant data
to sed.

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.
+1-919-301-3266
Virtualization: qemu.org | libvirt.org

--JRwJar7rbp5fXc2aleL3GsQdAnX87xUlN--

--GDAk9kDD2OgX5owWPtUnEPx8SceA1PuVh--

From debbugs-submit-bounces@debbugs.gnu.org Thu Apr 20 12:36:13 2017
From: Michael Klement
Message-Id: <9777309F-9E38-4E9F-A688-A4C8E30F270F@usa.net>
Content-Type: multipart/alternative; boundary="Apple-Mail=_F413DE9E-1CD7-4E9A-A4A8-7487F2DB5F54"
Mime-Version: 1.0 (Mac OS X Mail 10.3 \(3273\))
Subject: Re: bug#26574: v4.4: POSIX violation with respect to output of a trailing newline, even with --posix
Date: Thu, 20 Apr 2017 12:36:03 -0400
To: Eric Blake
References: <8BD9A8CD-00A9-4B06-8C4A-5F0C6FF6A022@usa.net>
Cc: 26574-done@debbugs.gnu.org

--Apple-Mail=_F413DE9E-1CD7-4E9A-A4A8-7487F2DB5F54
Content-Type: text/html; charset=us-ascii

Thanks for the detailed feedback, Eric.

The POSIX spec. is, unfortunately, vague on this topic:

The definition of a line (which you quote) is complemented with the definition of an incomplete line:

> A sequence of one or more non-<newline> characters at the end of the file.

So while the standard is aware of this possibility and gives it a name that suggests it is a kind of line, but something's missing, there is precious little behavior prescribed with respect to such incomplete lines.

So we have:

  • sed's "input files shall be text files."
  • a text file contains "characters organized into zero or more lines"

Beyond the "zero or more lines", the only restrictions placed on what constitutes a text file are:
  • "The lines do not contain NUL characters and none can exceed {LINE_MAX} bytes in length, including the <newline> character."

If you interpret the word "lines" in the phrase "zero or more lines" to mean complete lines only (which is reasonable), then indeed any file that ends in an incomplete line is not a text file.

I really wish the spec. were more explicit about incomplete lines.

> If anything, the only
> change I would make is have 'sed --posix' error out on non-text input,
> to call attention to the user's attempt to feed non-posix-compliant data
> to sed.

That is definitely an option, but perhaps intuitive understanding and historical practice / other implementations could be considered instead:

  • Intuitively, a file containing text with an incomplete line is obviously still a text file - just one that has no trailing \n, so treating incomplete lines (mostly) like lines makes sense.
    • In practice, most utilities still read the incomplete line - the shell's read builtin being a notable exception.
    • wc is an interesting case, which doesn't count an incomplete line as a line (the spec. is actually unambiguous there and mandates counting the newlines), yet still counts its words and characters/bytes.
  • BSD/macOS Sed is a mostly POSIX-features-only implementation, and it always appends a trailing \n, even when encountering an incomplete line. (On the flip side, that makes it fundamentally unsuited to operating on binary files - unlike GNU Sed).
  • I'm not sure about other implementations (or even if there are any that still matter today).
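
The wc and read observations above can be tried directly. This is a sketch, assuming a POSIX-style shell such as bash or dash; count_lines is a hypothetical helper. In those shells read returns a non-zero status when input ends without a newline but still stores the partial line in the variable, which is why the extra `[ -n "$line" ]` test is a common idiom for picking it up.

```shell
# An incomplete line: one word, one byte, zero newline characters.
printf 'a' | wc -l   # counts newline characters only
printf 'a' | wc -w   # the word is still counted
printf 'a' | wc -c   # so is the byte

# Hypothetical helper: count lines on stdin, including a final
# incomplete one. read fails on the last chunk, but "$line" is
# non-empty, so the loop body still runs for it.
count_lines() {
  n=0
  while IFS= read -r line || [ -n "$line" ]; do
    n=$((n + 1))
  done
  echo "$n"
}

printf 'a\nb' | count_lines
```

Dropping the `|| [ -n "$line" ]` part makes the loop skip the trailing "b", which matches the behavior of read described in the list.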

So, as a compromise, GNU sed --posix could treat files with an incomplete line as text files, as long as the incomplete line contains no NULs and contains at most getconf LINE_MAX - 1 characters.
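
To make the proposed rule concrete, here is a rough sketch of such a check done outside of sed. The function is_lenient_text_file is hypothetical and does not correspond to any existing sed option; it measures lengths in bytes by forcing the C locale, and it treats an incomplete last line like any other line with its missing newline counted in.

```shell
# Hypothetical check, not an existing sed feature: succeed if FILE has
# no NUL bytes and no line longer than LINE_MAX bytes, where a final
# incomplete line is allowed as long as it is at most LINE_MAX - 1 bytes.
is_lenient_text_file() {
  file=$1
  max=$(getconf LINE_MAX)
  # NUL bytes: deleting them must leave the content unchanged.
  tr -d '\000' < "$file" | cmp -s - "$file" || return 1
  # Length of each line plus one byte for its (present or missing) newline.
  LC_ALL=C awk -v max="$max" '
    length($0) + 1 > max { bad = 1; exit }
    END { exit bad }
  ' < "$file"
}

tmp=$(mktemp)
printf 'a' > "$tmp"
is_lenient_text_file "$tmp" && echo "accepted"
rm -f "$tmp"
```

A strict POSIX check would additionally require the last byte of a non-empty file to be a newline.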

Maybe the issue at hand is rarely of concern in the real world, but I've stumbled over it on several occasions when writing portable Sed commands (at least portable between Linux and macOS).
This issue and the infamous -i option incompatibility (which probably will never go away) are what get in the way of writing such commands.
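
For the -i part, a common way around the incompatibility (GNU sed takes an optional suffix attached to -i, while BSD/macOS sed requires a separate, possibly empty, suffix argument) is to avoid -i altogether. The wrapper below is a sketch with a hypothetical name, sed_inplace; it assumes mktemp is available, which holds on Linux and macOS but is not guaranteed by POSIX.

```shell
# Hypothetical wrapper: apply a sed script to a file "in place"
# without relying on the non-portable -i option.
sed_inplace() {
  script=$1
  file=$2
  tmp=$(mktemp) || return 1
  if sed -e "$script" "$file" > "$tmp"; then
    # Overwrite the content rather than renaming, so the original
    # file keeps its inode and permissions.
    cat "$tmp" > "$file"
    rm -f "$tmp"
  else
    rm -f "$tmp"
    return 1
  fi
}

f=$(mktemp)
printf 'foo\n' > "$f"
sed_inplace 's/foo/bar/' "$f"
cat "$f"
rm -f "$f"
```

The trailing-newline difference still applies to whatever sed writes into the temporary file, so the wrapper addresses only the -i half of the portability problem.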

Thanks,

Michael






> On Apr 20, 2017, at 6:42 AM, Eric Blake <eblake@redhat.com> wrote:
>
> tag 26574 notabug
> thanks
>
> On 04/19/2017 08:43 PM, Michael Klement wrote:
>> $ sed --version
>> sed (GNU sed) 4.4
>>
>> The POSIX spec. <http://pubs.opengroup.org/onlinepubs/9699919799/utilities/sed.html> states:
>> "Whenever the pattern space is written to standard output or a named file, sed shall immediately follow it with a <newline>."
>>
>> While GNU Sed's default behavior of preserving the trailing-newline status of the input's last line is defensible and can be helpful,
>> it should exhibit POSIX-compliant behavior when invoked with --posix.
>
> POSIX also requires that input given to sed be text files:
>
> http://pubs.opengroup.org/onlinepubs/9699919799/utilities/sed.html
> "The input files shall be text files."
>
> And per the definition of text file, ALL input lines must have a
> trailing newline in the first place:
> http://pubs.opengroup.org/onlinepubs/9699919799/basedefs/V1_chap03.html
> "3.403 Text File
> A file that contains characters organized into zero or more lines. The
> lines do not contain NUL characters and none can exceed {LINE_MAX} bytes
> in length, including the <newline> character. Although POSIX.1-2008 does
> not distinguish between text files and binary files (see the ISO C
> standard), many utilities only produce predictable or meaningful output
> when operating on text files. The standard utilities that have such
> restrictions always specify "text files" in their STDIN or INPUT FILES
> sections."
>
> "3.206 Line
> A sequence of zero or more non-<newline> characters plus a terminating
> <newline> character."
>
> Input that does NOT end in a trailing newline is NOT a text file, and
> therefore is NOT a POSIX-compliant use of sed, and therefore, sed
> --posix need not do anything different with it because you are already
> outside the bounds of what POSIX requires.
>
> Therefore, I don't think you have a case for changing any behavior, at
> least not on the grounds of appealing to POSIX, so I'm marking this as
> not a bug, but feel free to continue discussion.  If anything, the only
> change I would make is have 'sed --posix' error out on non-text input,
> to call attention to the user's attempt to feed non-posix-compliant data
> to sed.
>
> --
> Eric Blake, Principal Software Engineer
> Red Hat, Inc.           +1-919-301-3266
> Virtualization:  qemu.org | libvirt.org


--Apple-Mail=_F413DE9E-1CD7-4E9A-A4A8-7487F2DB5F54--

From debbugs-submit-bounces@debbugs.gnu.org Thu Apr 20 12:46:25 2017
Subject: Re: bug#26574: v4.4: POSIX violation with respect to output of a trailing newline, even with --posix
To: Michael Klement
References: <8BD9A8CD-00A9-4B06-8C4A-5F0C6FF6A022@usa.net> <9777309F-9E38-4E9F-A688-A4C8E30F270F@usa.net>
From: Eric Blake
Organization: Red Hat, Inc.
Message-ID: <195a1041-b6a6-4dcd-ce0d-ed06febfb86e@redhat.com>
Date: Thu, 20 Apr 2017 11:46:15 -0500
MIME-Version: 1.0
In-Reply-To: <9777309F-9E38-4E9F-A688-A4C8E30F270F@usa.net>
Content-Type: multipart/signed; micalg=pgp-sha256; protocol="application/pgp-signature"; boundary="JvpiHFqPeQV0HeULQi7aJOOwDK2HRnFEh"
Cc: 26574-done@debbugs.gnu.org

This is an OpenPGP/MIME signed message (RFC 4880 and 3156)
--JvpiHFqPeQV0HeULQi7aJOOwDK2HRnFEh
Content-Type: multipart/mixed; boundary="9FqJvOCe8GIGIOrtEBWkJsWgXBbR4JpIO"; protected-headers="v1"

--9FqJvOCe8GIGIOrtEBWkJsWgXBbR4JpIO
Content-Type: text/plain; charset=utf-8

On 04/20/2017 11:36 AM, Michael Klement wrote:
> Thanks for the detailed feedback, Eric.
>
> The POSIX spec. is, unfortunately, vague on this topic:
>
> The definition of a line (which you quote) is complemented with the definition of an incomplete line:
>
>> A sequence of one or more non-<newline> characters at the end of the file.
>
>
> So while the standard is aware of this possibility and gives it a name that suggests it is a kind of line, but something's missing, there is precious little behavior prescribed with respect to such incomplete lines.
>

You're welcome to submit a bug report to get POSIX to more clearly word
its intentions that a file with an incomplete line is NOT a text file
(http://austingroupbugs.net/main_page.php), but everyone on the Austin
Group (myself included) has already agreed that the intention is there
(even if the wording could be improved): Omitting a trailing newline
causes sed to enter into the realm of undefined behavior - and this is
BECAUSE there are existing sed implementations that behave differently
when a trailing newline is omitted.  Some do not do anything with an
incomplete line (sed behaves as though the file were truncated at the
last newline).

> So we have:
>
> sed's "input files shall be text files."
> a text file contains "characters organized into zero or more lines"
>
> Beyond the "zero or more lines", the only restrictions placed on what constitutes a text file are:
> " The lines do not contain NUL characters and none can exceed {LINE_MAX} bytes in length, including the <newline> character. "
>
> If you interpret the word "lines" in the phrase "zero or more lines" to mean complete lines only (which is reasonable), then indeed any file that ends in an incomplete line is not a text file.
>
> I really wish the spec. were more explicit about incomplete lines.

As I said, you're welcome to propose a bug report with suggested wording
improvements.
>
>> If anything, the only
>> change I would make is have 'sed --posix' error out on non-text input,
>> to call attention to the user's attempt to feed non-posix-compliant data
>> to sed.
>
>
> That is definitely an option, but perhaps intuitive understanding and historical practice / other implementations could be considered instead:
>
> Intuitively, a file containing text with an incomplete line is obviously still a text file

Not per the POSIX definition of a text file.  It is still a file, but no
longer a text file.  It wouldn't be the first time intuition has been wrong.

> wc is an interesting case, which doesn't count an incomplete line as a line (the spec. is actually unambiguous there and mandates counting the newlines),

Indeed, wc is a good example of how the POSIX writers specifically went
out of their way to describe behaviors of programs that MUST be
consistent when presented with a non-text file; as well as the escape
clause that for all other programs (including sed) that require text
file inputs, the behavior is intentionally unspecified if the trailing
newline is not present.

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.
+1-919-301-3266 Virtualization: qemu.org | libvirt.org --9FqJvOCe8GIGIOrtEBWkJsWgXBbR4JpIO-- --JvpiHFqPeQV0HeULQi7aJOOwDK2HRnFEh Content-Type: application/pgp-signature; name="signature.asc" Content-Description: OpenPGP digital signature Content-Disposition: attachment; filename="signature.asc" -----BEGIN PGP SIGNATURE----- Version: GnuPG v2 Comment: Public key at http://people.redhat.com/eblake/eblake.gpg Comment: Using GnuPG with Thunderbird - http://www.enigmail.net/ iQEcBAEBCAAGBQJY+OXXAAoJEKeha0olJ0NqvToH/iNIgMSFF+MPUpB1pTLyAyHA Fw6GVHFRxgRNjYNGvxDF9uycpH3i3GIjmBXrz6V0oy36j3Y76y9wOPdJSkNKQoj1 S/C+38pndyGIEJSkm2Bkc9hnUdAUZCTIEMJVs7hCLZ1aPFDkw8N3e6xOIZOWQZIF ka+mcOlFUX0KSsmGLJbMR3kSiIxQ5eA7hfogu5R8U9+P24XYJHMsIy1A5O/DrpDv QmxgReGCcCTIhJ9Tl5cQ6Xc022+SwLi9lud4lSP0P6Q2T7pGmpREWtpI0FyEEV2n TIr+frzg2QQay/anFQSIkN6v9+8sR27tN1jz/9tnG2HVuP7l8KUFEIVAlDh8PZE= =48yK -----END PGP SIGNATURE----- --JvpiHFqPeQV0HeULQi7aJOOwDK2HRnFEh-- From debbugs-submit-bounces@debbugs.gnu.org Thu Apr 20 14:33:07 2017 Received: (at 26574-done) by debbugs.gnu.org; 20 Apr 2017 18:33:07 +0000 Received: from localhost ([127.0.0.1]:59320 helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1d1Gt9-0005Rc-5x for submit@debbugs.gnu.org; Thu, 20 Apr 2017 14:33:07 -0400 Received: from mail-qk0-f182.google.com ([209.85.220.182]:32808) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1d1Gt7-0005R8-3x for 26574-done@debbugs.gnu.org; Thu, 20 Apr 2017 14:33:05 -0400 Received: by mail-qk0-f182.google.com with SMTP id h67so54261552qke.0 for <26574-done@debbugs.gnu.org>; Thu, 20 Apr 2017 11:33:05 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20161025; h=date:from:to:cc:subject:message-id:references:mime-version :content-disposition:in-reply-to:user-agent; bh=UC4VUhhBmxn3TrfqiCCMKpP4O8hATk3p7y92TG8yab0=; b=sJqGEsTqE3nCP9zCb//whnu+lPqpg2IKC9OuB/fZc78dhAUF0WuXcPMA2Y6yNu9lo/ 
pGOFXhhvtq7/Y1h30IsXE9QAPmV8BIsnh2XI4C2wLHJJw/2lVUfwerW9rRuPfkdCmUtZ 6dAq/BIzMRuYzIhUGorcl2JUyXFpb28ApuZ20oyKE1WwjMrv+Nvzw+myiFh2mSXFgqSI D/k6Z+kHKRQmrvoAXX5B7DcC4EaGcCqpIbHIqVM0pXH90nm6bQXCPyaeYoerL0JjOT71 Z7/ZL1MmjejyLDeyyAr77xLNOq2E06/tecfD9GJwk365HamYXWov/1DxlDWF8kI9Y/Wj geGQ== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:date:from:to:cc:subject:message-id:references :mime-version:content-disposition:in-reply-to:user-agent; bh=UC4VUhhBmxn3TrfqiCCMKpP4O8hATk3p7y92TG8yab0=; b=ImWL9lyPk71gEx+VDrdwvrRAGKxQA2b9liWM3C8XTuohBPLD3HcmOIK0P6d/lCzjQ3 sz56dgsJBE6X7lyUO2d6MRuTIptVcksMW+BYuhKr0YstTCm64DSwRLi/qj4llr2yyg/1 UpA1leO1mFXNjgI1xPLYUWWORxAaLhfY6y/xadwbj/Equq2rNDVqOVDhoT8RPm95Jb91 voLQ4MvuaHYU515dxpoz7KkDEbC24KYY8tqgUJ0u8Z8U7/2PARK7zpji/rkiTHb6Lc7M PPPACyPTYpUCEBYnMa56BHw/E2f7Ey+6d/ByFsmac6tZKUyaudd4v9WLBoaSThSxx2ha F44A== X-Gm-Message-State: AN3rC/5fgubmsVRx3W8WSGSpiGXZcqiTOWDbfuTN3Ae/dof/3cowQs5v AmlGDgIOEyNOfQ== X-Received: by 10.55.128.1 with SMTP id b1mr9114117qkd.226.1492713179517; Thu, 20 Apr 2017 11:32:59 -0700 (PDT) Received: from gmail.com (housegordon.org. 
[104.236.108.240]) by smtp.gmail.com with ESMTPSA id p64sm4670801qkf.62.2017.04.20.11.32.58 (version=TLS1_2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128/128); Thu, 20 Apr 2017 11:32:58 -0700 (PDT) Date: Thu, 20 Apr 2017 18:32:22 +0000 From: Assaf Gordon To: Eric Blake Subject: Re: bug#26574: v4.4: POSIX violation with respect to output of a trailing newline, even with --posix Message-ID: <20170420183220.GC11565@gmail.com> References: <8BD9A8CD-00A9-4B06-8C4A-5F0C6FF6A022@usa.net> <9777309F-9E38-4E9F-A688-A4C8E30F270F@usa.net> <195a1041-b6a6-4dcd-ce0d-ed06febfb86e@redhat.com> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii; format=flowed Content-Disposition: inline In-Reply-To: <195a1041-b6a6-4dcd-ce0d-ed06febfb86e@redhat.com> User-Agent: Mutt/1.5.23 (2014-03-12) X-Spam-Score: 0.5 (/) X-Debbugs-Envelope-To: 26574-done Cc: Michael Klement , 26574-done@debbugs.gnu.org X-BeenThere: debbugs-submit@debbugs.gnu.org X-Mailman-Version: 2.1.18 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: debbugs-submit-bounces@debbugs.gnu.org Sender: "Debbugs-submit" X-Spam-Score: 0.5 (/) Hello, On Thu, Apr 20, 2017 at 11:46:15AM -0500, Eric Blake wrote: >On 04/20/2017 11:36 AM, Michael Klement wrote: >> Thanks for the detailed feedback, Eric. >> >> The POSIX spec. is, unfortunately, vague on this topic: >> >> The definition of a line (which you quote) is complemented with the definition of an incomplete line : >> >>> A sequence of one or more non- characters at the end of the file. >> >> >> So while the standard is aware of this possibility and gives it a name that suggests it is a kind of line, but something's missing, there is precious little behavior prescribed with respect to such incomplete lines. 
>>
>
>You're welcome to submit a bug report to get POSIX to more clearly word
>its intentions that a file with an incomplete line is NOT a text file
>(http://austingroupbugs.net/main_page.php), but everyone on the Austin
>Group (myself included) has already agreed that the intention is there
>(even if the wording could be improved): Omitting a trailing newline
>causes sed to enter into the realm of undefined behavior - and this is
>BECAUSE there are existing sed implementations that behave differently
>when a trailing newline is omitted.  Some do not do anything with an
>incomplete line (sed behaves as though the file were truncated at the
>last newline).
>

For completeness, here's the behaviour of several implementations:

sed implementations that do not add a newline (like GNU sed):
 FreeBSD 10
 OpenBSD 5.9
 BusyBox 1.22
 ToyBox 7.2
 AIX 7

sed implementations that do add a newline:
 NetBSD 7.0
 Heirloom

SunOS 5.11's sed prints nothing if there is no newline:
 $ printf 'a' | sed '' | od -tx1
 0000000
 $ printf 'a\n' | sed '' | od -tx1
 0000000 61 0a
 0000002
 $ uname -a
 SunOS unstable11s 5.11 11.2 sun4u sparc SUNW,SPARC-Enterprise
 $ which sed
 /usr/bin/sed


The behaviour (of processing a file without newline at the last line) also differs in other programs/languages/implementations:

 $ printf a | perl -npe '' | od -tx1
 0000000 61
 0000001

 $ printf a | perl -lnpe '' | od -tx1
 0000000 61 0a
 0000002

 $ printf a | awk '{print}' | od -tx1
 0000000 61 0a
 0000002

 $ printf 'a' | sh -c 'while read A ; do echo $A ; done' | od -tx1
 0000000

 $ printf 'a' \
    | python3 -c 'import sys; [print(x,end="") for x in sys.stdin]' \
    | od -tx1
 0000000 61
 0000001

 $ printf a | uniq-gnu | od -t x1
 0000000 61 0a
 0000002

 $ printf a | uniq-freebsd-11 | od -t x1
 0000000    61
 0000001

 $ printf a | cut-gnu -f1 | od -tx1
 0000000 61 0a
 0000002

 $ printf a | cut-freebsd-11 -f1 | od -tx1
 0000000    61
 0000001

 $ printf a | sort | od -t x1
 0000000 61 0a
 0000002


And this reinforces what Eric wrote: there is simply no 'one correct'
(or agreed-upon) way to deal with files without newlines on the last line. regards, - assaf From debbugs-submit-bounces@debbugs.gnu.org Thu Apr 20 15:32:24 2017 Received: (at 26574-done) by debbugs.gnu.org; 20 Apr 2017 19:32:24 +0000 Received: from localhost ([127.0.0.1]:59349 helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1d1HoW-0006uL-An for submit@debbugs.gnu.org; Thu, 20 Apr 2017 15:32:24 -0400 Received: from mail-qk0-f175.google.com ([209.85.220.175]:34567) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1d1HoU-0006u9-CY for 26574-done@debbugs.gnu.org; Thu, 20 Apr 2017 15:32:23 -0400 Received: by mail-qk0-f175.google.com with SMTP id y63so25306713qkd.1 for <26574-done@debbugs.gnu.org>; Thu, 20 Apr 2017 12:32:22 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20161025; h=sender:from:message-id:mime-version:subject:date:in-reply-to:cc:to :references; bh=EkWSCRsI9FlA7cTBtwslw1aK39C+aHxGyjQSVL6cu7I=; b=MMSTJTP2QM8hxswMO6BAwMK9Ykl5tw3wsrKZ/RXwfT+ll7HYk4rJtKBuABninv/LjB CxXnfOHvMOnN3oJLVh2zGdLaX7/DDiY6BZjtmlHN6CuzNnqgOx8zYY2uSH4NGTkqcfRa pmn4kks0d9F6Erfc8PknXPg7UoYiknUmxeYwDNCcLTPwnP6OLk3/L2uaWXjZ382gm10t m8E6CmfRQiVd/oUmHKEE6LrpNVSg/s1bHPUj5SXAoNNGXJ1pE+94uC9BJH/W3Oe5bH3e zIEkTDr5WvID6fC+mrZ+i1W8vNctVQpJpjIcR8mumA2WaY/+jilzTAyr7t7kZEACRyEP mTAA== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:sender:from:message-id:mime-version:subject:date :in-reply-to:cc:to:references; bh=EkWSCRsI9FlA7cTBtwslw1aK39C+aHxGyjQSVL6cu7I=; b=GUEBYnk3dXsCBlCzYTbhL67oRuy8kWWbwM+ryGlejknt08hxHdZuNhWZpBu7hSN2BO el7ecYEJaW0Re0oDn/N8mob6dBloqmcak2P4Lak0OJQBhnHr7Xik+C4ngPyiv2s9wl32 mIOINTNVKdzFEMmHeQIH+cKFbmhDDtoV7T86/q9KVwN2qntEOHFYoWs0UdJYvJMaumR0 7jK41YbWyyteJTg+9EEzM6v7lkJ7idUp3abTxlZX1RgTmc9qTcT8hzsObt7qZmoP3eXB 3k/QyUiUqIgo0Z+RXSfltTs5ZeqqjaIqEDLnvnj9h50MjyHHNSX5McVjDD4EDUza8Jjl A/Zw== X-Gm-Message-State: 
AN3rC/6FSDA5oBIZE93RlBKt+lcgvZv0rwnjrKs/s5SL2f2lN3tKKUXV uf9j4b5LD+/a3w== X-Received: by 10.55.60.196 with SMTP id j187mr10611147qka.93.1492716736828; Thu, 20 Apr 2017 12:32:16 -0700 (PDT) Received: from mkimac.home (pool-173-48-237-207.bstnma.fios.verizon.net. [173.48.237.207]) by smtp.gmail.com with ESMTPSA id t68sm4792553qkc.44.2017.04.20.12.32.14 (version=TLS1_2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128/128); Thu, 20 Apr 2017 12:32:14 -0700 (PDT) From: Michael Klement Message-Id: <65260560-2F7E-4CCB-82D4-5D3AFC509933@usa.net> Content-Type: multipart/alternative; boundary="Apple-Mail=_3C0550B9-E15E-4511-8ECF-3ED09BE40E4B" Mime-Version: 1.0 (Mac OS X Mail 10.3 \(3273\)) Subject: Re: bug#26574: v4.4: POSIX violation with respect to output of a trailing newline, even with --posix Date: Thu, 20 Apr 2017 15:32:13 -0400 In-Reply-To: <20170420183220.GC11565@gmail.com> To: Assaf Gordon References: <8BD9A8CD-00A9-4B06-8C4A-5F0C6FF6A022@usa.net> <9777309F-9E38-4E9F-A688-A4C8E30F270F@usa.net> <195a1041-b6a6-4dcd-ce0d-ed06febfb86e@redhat.com> <20170420183220.GC11565@gmail.com> X-Mailer: Apple Mail (2.3273) X-Spam-Score: -1.8 (-) X-Debbugs-Envelope-To: 26574-done Cc: Eric Blake , 26574-done@debbugs.gnu.org X-BeenThere: debbugs-submit@debbugs.gnu.org X-Mailman-Version: 2.1.18 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: debbugs-submit-bounces@debbugs.gnu.org Sender: "Debbugs-submit" X-Spam-Score: -1.8 (-) --Apple-Mail=_3C0550B9-E15E-4511-8ECF-3ED09BE40E4B Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset=us-ascii Thanks for digging into this, it indeed illustrates the point well. 
Just for the record:

Here's what I get on FreeBSD 10.1.2 and on macOS 10.12.4:

$ printf 'a' | sed '' | od -tx1
0000000    61  0a
0000002

macOS typically comes with an older version of the BSD implementation (which doesn't support --version, but the man pages are dated June 20, 2014 and May 10, 2005, respectively).

Another (minor) point of interest:

On macOS 10.12.4 (but not FreeBSD 10.1.2), sed chokes on bytes that aren't valid in UTF-8 encoding, when using regex-based functionality:

$ printf '\xfc\n' | sed -n '/./p'
sed: RE error: illegal byte sequence

> On Apr 20, 2017, at 2:32 PM, Assaf Gordon wrote:
>
> Hello,
>
> On Thu, Apr 20, 2017 at 11:46:15AM -0500, Eric Blake wrote:
>> On 04/20/2017 11:36 AM, Michael Klement wrote:
>>> Thanks for the detailed feedback, Eric.
>>>
>>> The POSIX spec. is, unfortunately, vague on this topic:
>>>
>>> The definition of a line (which you quote) is complemented with the definition of an incomplete line <http://pubs.opengroup.org/onlinepubs/9699919799/basedefs/V1_chap03.html#tag_03_195>:
>>>
>>>> A sequence of one or more non-<newline> characters at the end of the file.
>>>
>>>
>>> So while the standard is aware of this possibility and gives it a name that suggests it is a kind of line, but something's missing, there is precious little behavior prescribed with respect to such incomplete lines.
>>>
>>
>> You're welcome to submit a bug report to get POSIX to more clearly word
>> its intentions that a file with an incomplete line is NOT a text file
>> (http://austingroupbugs.net/main_page.php), but everyone on the Austin
>> Group (myself included) has already agreed that the intention is there
>> (even if the wording could be improved): Omitting a trailing newline
>> causes sed to enter into the realm of undefined behavior - and this is
>> BECAUSE there are existing sed implementations that behave differently
>> when a trailing newline is omitted.
Some do not do anything with an
>> incomplete line (sed behaves as though the file were truncated at the
>> last newline).
>>
>
> For completeness, here's the behaviour of several implementations:
>
> sed implementations that do not add a newline (like GNU sed):
>  FreeBSD 10
>  OpenBSD 5.9
>  BusyBox 1.22
>  ToyBox 7.2
>  AIX 7
>
> sed implementations that do add a newline:
>  NetBSD 7.0
>  Heirloom
>
> SunOS 5.11's sed prints nothing if there is no newline:
>  $ printf 'a' | sed '' | od -tx1
>  0000000
>  $ printf 'a\n' | sed '' | od -tx1
>  0000000 61 0a
>  0000002
>  $ uname -a
>  SunOS unstable11s 5.11 11.2 sun4u sparc SUNW,SPARC-Enterprise
>  $ which sed
>  /usr/bin/sed
>
>
> The behaviour (of processing a file without newline at the last line) also differs in other programs/languages/implementations:
>
>  $ printf a | perl -npe '' | od -tx1
>  0000000 61
>  0000001
>
>  $ printf a | perl -lnpe '' | od -tx1
>  0000000 61 0a
>  0000002
>
>  $ printf a | awk '{print}' | od -tx1
>  0000000 61 0a
>  0000002
>
>  $ printf 'a' | sh -c 'while read A ; do echo $A ; done' | od -tx1
>  0000000
>
>  $ printf 'a' \
>     | python3 -c 'import sys; [print(x,end="") for x in sys.stdin]' \
>     | od -tx1
>  0000000 61
>  0000001
>
>  $ printf a | uniq-gnu | od -t x1
>  0000000 61 0a
>  0000002
>
>  $ printf a | uniq-freebsd-11 | od -t x1
>  0000000    61
>  0000001
>
>  $ printf a | cut-gnu -f1 | od -tx1
>  0000000 61 0a
>  0000002
>
>  $ printf a | cut-freebsd-11 -f1 | od -tx1
>  0000000    61
>  0000001
>
>  $ printf a | sort | od -t x1
>  0000000 61 0a
>  0000002
>
>
> And this reinforces what Eric wrote: there is simply no
> 'one correct' (or agreed-upon) way to deal with files without newlines on the last line.
>
>
> regards,
> - assaf

--Apple-Mail=_3C0550B9-E15E-4511-8ECF-3ED09BE40E4B
Content-Transfer-Encoding: quoted-printable
Content-Type: text/html; charset=us-ascii
= --Apple-Mail=_3C0550B9-E15E-4511-8ECF-3ED09BE40E4B-- From debbugs-submit-bounces@debbugs.gnu.org Thu Apr 20 15:36:48 2017 Received: (at 26574-done) by debbugs.gnu.org; 20 Apr 2017 19:36:48 +0000 Received: from localhost ([127.0.0.1]:59362 helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1d1Hsm-00071h-Bh for submit@debbugs.gnu.org; Thu, 20 Apr 2017 15:36:48 -0400 Received: from mx1.redhat.com ([209.132.183.28]:56104) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1d1Hsk-00071U-9K for 26574-done@debbugs.gnu.org; Thu, 20 Apr 2017 15:36:46 -0400 Received: from smtp.corp.redhat.com (int-mx05.intmail.prod.int.phx2.redhat.com [10.5.11.15]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by mx1.redhat.com (Postfix) with ESMTPS id 93AA36E77C; Thu, 20 Apr 2017 19:36:40 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mx1.redhat.com 93AA36E77C Authentication-Results: ext-mx01.extmail.prod.ext.phx2.redhat.com; dmarc=none (p=none dis=none) header.from=redhat.com Authentication-Results: ext-mx01.extmail.prod.ext.phx2.redhat.com; spf=pass smtp.mailfrom=eblake@redhat.com DKIM-Filter: OpenDKIM Filter v2.11.0 mx1.redhat.com 93AA36E77C Received: from [10.10.121.102] (ovpn-121-102.rdu2.redhat.com [10.10.121.102]) by smtp.corp.redhat.com (Postfix) with ESMTP id 12C02173D0; Thu, 20 Apr 2017 19:36:39 +0000 (UTC) Subject: Re: bug#26574: v4.4: POSIX violation with respect to output of a trailing newline, even with --posix To: Michael Klement , Assaf Gordon References: <8BD9A8CD-00A9-4B06-8C4A-5F0C6FF6A022@usa.net> <9777309F-9E38-4E9F-A688-A4C8E30F270F@usa.net> <195a1041-b6a6-4dcd-ce0d-ed06febfb86e@redhat.com> <20170420183220.GC11565@gmail.com> <65260560-2F7E-4CCB-82D4-5D3AFC509933@usa.net> From: Eric Blake Openpgp: url=http://people.redhat.com/eblake/eblake.gpg Organization: Red Hat, Inc. 
Message-ID: <18766970-3874-451a-c9db-9c17fcdb9627@redhat.com> Date: Thu, 20 Apr 2017 14:36:39 -0500 User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:52.0) Gecko/20100101 Thunderbird/52.0 MIME-Version: 1.0 In-Reply-To: <65260560-2F7E-4CCB-82D4-5D3AFC509933@usa.net> Content-Type: multipart/signed; micalg=pgp-sha256; protocol="application/pgp-signature"; boundary="jkle1LgOpEA9fued7UuM3hnJd7398aVcn" X-Scanned-By: MIMEDefang 2.79 on 10.5.11.15 X-Greylist: Sender IP whitelisted, not delayed by milter-greylist-4.5.16 (mx1.redhat.com [10.5.110.25]); Thu, 20 Apr 2017 19:36:40 +0000 (UTC) X-Spam-Score: -5.0 (-----) X-Debbugs-Envelope-To: 26574-done Cc: 26574-done@debbugs.gnu.org X-BeenThere: debbugs-submit@debbugs.gnu.org X-Mailman-Version: 2.1.18 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: debbugs-submit-bounces@debbugs.gnu.org Sender: "Debbugs-submit" X-Spam-Score: -5.0 (-----) This is an OpenPGP/MIME signed message (RFC 4880 and 3156) --jkle1LgOpEA9fued7UuM3hnJd7398aVcn Content-Type: multipart/mixed; boundary="FIaiXScwDEOjJL8bsJcXvJPFPTCDklqXn"; protected-headers="v1" From: Eric Blake To: Michael Klement , Assaf Gordon Cc: 26574-done@debbugs.gnu.org Message-ID: <18766970-3874-451a-c9db-9c17fcdb9627@redhat.com> Subject: Re: bug#26574: v4.4: POSIX violation with respect to output of a trailing newline, even with --posix References: <8BD9A8CD-00A9-4B06-8C4A-5F0C6FF6A022@usa.net> <9777309F-9E38-4E9F-A688-A4C8E30F270F@usa.net> <195a1041-b6a6-4dcd-ce0d-ed06febfb86e@redhat.com> <20170420183220.GC11565@gmail.com> <65260560-2F7E-4CCB-82D4-5D3AFC509933@usa.net> In-Reply-To: <65260560-2F7E-4CCB-82D4-5D3AFC509933@usa.net> --FIaiXScwDEOjJL8bsJcXvJPFPTCDklqXn Content-Type: text/plain; charset=utf-8 Content-Language: en-US Content-Transfer-Encoding: quoted-printable On 04/20/2017 02:32 PM, Michael Klement wrote: > On macOS 10.12.4 (but not FreeBSD 10.1.2), Sed chokes on bytes that are= n't valid in UTF-8 encoding, when 
using regex-based functionality:
>
> $ printf '\xfc\n' | sed -n '/./p'
> sed: RE error: illegal byte sequence
>

That's locale dependent (should not happen with LC_ALL=C) - but it
illustrates another nice point about POSIX text files: a text file may
not have encoding errors, but as a corollary of that fact, there exist
files which are text files in some locales but binary files in others!
The behavior of sed is only specified when you have no encoding errors,
so your choice of locale can indeed affect whether you get output that
you wanted.

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc. +1-919-301-3266
Virtualization: qemu.org | libvirt.org

--FIaiXScwDEOjJL8bsJcXvJPFPTCDklqXn--

--jkle1LgOpEA9fued7UuM3hnJd7398aVcn
Content-Type: application/pgp-signature; name="signature.asc"
Content-Description: OpenPGP digital signature
Content-Disposition: attachment; filename="signature.asc"

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2
Comment: Public key at http://people.redhat.com/eblake/eblake.gpg
Comment: Using GnuPG with Thunderbird - http://www.enigmail.net/

iQEcBAEBCAAGBQJY+Q3HAAoJEKeha0olJ0NqajUH/1Ax/XyWcpkJ+pQ7BsEN8/Ot
OJzlLSdq9FnvlvvY9pjaP9KPuw4WbDqotjlFnwOHOftrWCaYm1GIKjU5nCkmV5Kl
+aAobOq7Ni3yow3/QxJe5nu+muG+actzJ47mr2Am/z4vWOdMjo7Ji4gpssheo41y
r4aP+1idyHdINRW3SeIi+zGRy/4+whR+RNabFptVygj2ILkOfYXvdBBO5Npr3L/Y
sazH7QVXUFSEdblvEq4ph5e8N41ebYdx3WIn+zqm5QhfxvJ9Q2F+utKor3sV9b7h
O/KhnQOWVC9TA9YfKoR9OHDofMUZhamgOp4OumIIKJNDDTKz7/W57T58PhRpvgo=
=RnwL
-----END PGP SIGNATURE-----

--jkle1LgOpEA9fued7UuM3hnJd7398aVcn--

From unknown Sun Jun 22 00:03:54 2025
Received: (at fakecontrol) by fakecontrolmessage;
To: internal_control@debbugs.gnu.org
From: Debbugs Internal Request
Subject: Internal Control
Message-Id: bug archived.
Date: Fri, 19 May 2017 11:24:05 +0000
User-Agent: Fakemail v42.6.9

# This is a fake control message.
#
# The action:
# bug archived.
thanks
# This fakemail brought to you by your local debbugs
# administrator
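
The survey and the locale remark in this thread can be reproduced locally with a small sketch along the following lines. The helper name `probe_incomplete_line` is hypothetical and exists only for this illustration; the result of the sed probe depends on which sed is installed, so no single output is claimed for it. The octal escape `\374` is used instead of `\xfc` on the assumption that not every `printf` accepts hex escapes.

```shell
# Hypothetical helper: feed the single byte 'a' (no trailing newline) to a
# command and print the bytes it emits as compact hex.
probe_incomplete_line() {
    printf 'a' | "$@" | od -An -tx1 | tr -d ' \n'
    echo
}

# Depending on the implementation this may print 61 (newline not added),
# 610a (newline added), or an empty line (incomplete line dropped).
probe_incomplete_line sed ''
probe_incomplete_line awk '{print}'

# Locale remark: byte 0xfc (octal 374) is not valid UTF-8 on its own, but in
# the C locale every byte is a character, so /./ is expected to match it.
printf '\374\n' | LC_ALL=C sed -n '/./p' | od -An -tx1 | tr -d ' \n'
echo
```

Running the same probe against other filters (`sort`, `cut -f1`, `uniq`) gives a quick picture of how the local tools treat an incomplete last line.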