From unknown Fri Jun 20 07:25:58 2025
From: bug#19240 <19240@debbugs.gnu.org>
To: bug#19240 <19240@debbugs.gnu.org>
Subject: Status: cut 8.22 adds newline
Reply-To: bug#19240 <19240@debbugs.gnu.org>
Date: Fri, 20 Jun 2025 14:25:58 +0000

retitle 19240 cut 8.22 adds newline
reassign 19240 coreutils
submitter 19240 John Kendall
severity 19240 normal
tag 19240 notabug
thanks

From debbugs-submit-bounces@debbugs.gnu.org Mon Dec 01 11:43:39 2014
From: John Kendall
To: "bug-coreutils@gnu.org"
Subject: cut 8.22 adds newline
Date: Mon, 1 Dec 2014 13:39:38 +0000
Message-ID: <0192FC7D-5C87-48C6-A4AC-C695639E7C41@capps.com>

Hi,

I don't know if this is a bug, but I wonder if there is a consensus on correct behavior.
The solaris version of cut does not add a newline if there was no newline on the input:

Consider this printf command:

  $ printf "1\n12\n123\n1234\n12345\n123456"
  1
  12
  123
  1234
  12345
  123456$

Note that the shell prompt appears after the 6 on the last line.

  # Solaris cut
  $ printf "1\n12\n123\n1234\n12345\n123456" | cut -c1-4
  1
  12
  123
  1234
  1234
  1234$

Note that the shell prompt appears after the 4 on the last line.

  # GNU 8.22 cut
  /$ printf "1\n12\n123\n1234\n12345\n123456" | cut -c1-4
  1
  12
  123
  1234
  1234
  1234
  $

Note that the shell prompt appears on its own line.

I came upon this while porting scripts from Solaris 10 to Centos 7.

Interested to hear your thoughts.

Thanks and best regards,
John

---
John Kendall
System Administrator
CAI International

From debbugs-submit-bounces@debbugs.gnu.org Mon Dec 01 12:05:42 2014
From: Eric Blake
Organization: Red Hat, Inc.
To: John Kendall, 19240-done@debbugs.gnu.org
Subject: Re: bug#19240: cut 8.22 adds newline
Date: Mon, 01 Dec 2014 10:05:37 -0700
Message-ID: <547C9FE1.2020702@redhat.com>
In-Reply-To: <0192FC7D-5C87-48C6-A4AC-C695639E7C41@capps.com>

tag 19240 notabug
thanks

On 12/01/2014 06:39 AM, John Kendall wrote:
> Hi,
>
> I don't know if this is a bug, but I wonder if there is a consensus on correct behavior.
> The solaris version of cut does not add a newline if there was no newline on the input:

Such an input is not a text file (the POSIX definition of text file
requires that if the file is not empty, it ends in newline); and POSIX
leaves the behavior of 'cut' unspecified if it is not operating on a
text file.

http://pubs.opengroup.org/onlinepubs/9699919799/basedefs/V1_chap03.html#tag_03_397
http://pubs.opengroup.org/onlinepubs/9699919799/utilities/cut.html

Therefore, it is unspecified whether cut will add or skip a trailing
newline.

>
> I came upon this while porting scripts from Solaris 10 to Centos 7.

GNU chose to make cut behave similarly to sort, which IS required to add
a trailing newline even when the input lacks one (that is, POSIX goes
the extra mile and defines sort behavior on non-text files that are
non-text only because they lack a newline). Solaris chose differently.
But the problem is that you are relying on unspecified behavior; fix
your input files to have a trailing newline, then you won't have to
worry about it.

At any rate, I see no reason to change GNU behavior, so I'm closing this
as not a bug. Feel free to add further comments, though, including if
you have a stronger argument for why we should reopen the bug and change
behavior.

-- 
Eric Blake   eblake redhat com    +1-919-301-3266
Libvirt virtualization library http://libvirt.org

From debbugs-submit-bounces@debbugs.gnu.org Mon Dec 01 16:18:21 2014
From: Eric Blake
Organization: Red Hat, Inc.
To: John Kendall, 19240@debbugs.gnu.org
Subject: Re: bug#19240: cut 8.22 adds newline
Date: Mon, 01 Dec 2014 14:18:16 -0700
Message-ID: <547CDB18.3010205@redhat.com>

[re-adding the bug, with permission]

On 12/01/2014 01:10 PM, John Kendall wrote:
> Thanks, Eric.
>
> My only, admittedly weak, rebuttal is that the behavior of sort might not
> be the best behavior to imitate. It's understandable why POSIX defines
> how sort behaves, since it's intended for multi-line input.
>
> It seems sed, which is frequently used for single lines of input, might be
> a better analogy. Gnu sed 4.2.2 and solaris sed act the same way as
> solaris cut (no newline added):
>
> $ printf "ooooooooooo" | sed 's/o/p/g'
> ppppppppppp$

As a counter-argument, I recall hearing of other implementations of sed
that silently omit a trailing line that lacks a newline. And perhaps
GNU sed should be changed to always emit a trailing newline, but that's
something to bring up on the sed mailing list :)

>
>
> If my weak rebuttal is unconvincing, then I wonder if a note could be
> added to the cut man page so that the next porter can find an answer
> a little easier. As an interesting counterpoint, the Solaris version of
> sort announces loudly when it does what POSIX requires:
>
> $ printf "ooooooooooo" | sort
> sort: missing NEWLINE added at end of input file STDIN
> ooooooooooo
> $

Ouch - that's a bug in Solaris. POSIX does not allow for noise on
stderr when giving a default 0 success exit status.

>
>
>
> Thanks for taking the time to clarify this. I've been using SunOS and
> Solaris exclusively since 1992, so I've had a stable environment and
> was oblivious to the unspecified behavior that my scripts depended on.
>
> Cheers,
> John
>

I'll leave it to other contributors to weigh in on whether omitting the
final newline on output when it was missing on input is worth the
complexity of a change.

From debbugs-submit-bounces@debbugs.gnu.org Mon Dec 01 17:07:01 2014
From: Pádraig Brady
To: Eric Blake, John Kendall, 19240@debbugs.gnu.org
Subject: Re: bug#19240: cut 8.22 adds newline
Date: Mon, 01 Dec 2014 22:06:55 +0000
Message-ID: <547CE67F.1050608@draigBrady.com>
In-Reply-To: <547CDB18.3010205@redhat.com>

On 01/12/14 21:18, Eric Blake wrote:
> [re-adding the bug, with permission]
>
> On 12/01/2014 01:10 PM, John Kendall wrote:
>> Thanks, Eric.
>>
>> My only, admittedly weak, rebuttal is that the behavior of sort might not
>> be the best behavior to imitate. It's understandable why POSIX defines
>> how sort behaves, since it's intended for multi-line input.
>>
>> It seems sed, which is frequently used for single lines of input, might be
>> a better analogy. Gnu sed 4.2.2 and solaris sed act the same way as
>> solaris cut (no newline added):
>>
>> $ printf "ooooooooooo" | sed 's/o/p/g'
>> ppppppppppp$
>
> As a counter-argument, I recall hearing of other implementations of sed
> that silently omit a trailing line that lacks a newline. And perhaps
> GNU sed should be changed to always emit a trailing newline, but that's
> something to bring up on the sed mailing list :)

I don't think so. I agree that a newline should only be added
where needed, especially with a low level tool like sed.
sort can reorder the last item elsewhere in the output
and so needs to output the extra '\n'.

BTW the argument that it's not a text file is a bit beside the point
as POSIX also says text files can't contain NUL chars, but we process
this just fine:

  $ printf 'a\000b' | cut -c3
  b

>> If my weak rebuttal is unconvincing, then I wonder if a note could be
>> added to the cut man page so that the next porter can find an answer
>> a little easier. As an interesting counterpoint, the Solaris version of
>> sort announces loudly when it does what POSIX requires:
>>
>> $ printf "ooooooooooo" | sort
>> sort: missing NEWLINE added at end of input file STDIN
>> ooooooooooo
>> $
>
> Ouch - that's a bug in Solaris. POSIX does not allow for noise on
> stderr when giving a default 0 success exit status.
>
>>
>>
>>
>> Thanks for taking the time to clarify this. I've been using SunOS and
>> Solaris exclusively since 1992, so I've had a stable environment and
>> was oblivious to the unspecified behavior that my scripts depended on.
>>
>> Cheers,
>> John
>>
>
> I'll leave it to other contributors to weigh in on whether omitting the
> final newline on output when it was missing on input is worth the
> complexity of a change.

Our current behaviour wrt newlines is "documented" at:
http://git.sv.gnu.org/gitweb/?p=coreutils.git;a=blob;f=tests/misc/cut.pl;h=04188621b#l132
though those tests were only added in v8.21

Note I see that solaris is inconsistent with -c and -f in this regard:

  solaris> printf '1\n2' | cut -c1
  1
  2solaris>

  solaris> printf '1\n2' | cut -f1
  1
  2
  solaris>

I kid you not that FreeBSD does the opposite and
outputs the extra '\n' in the -c case but not with -f.

Also comparing other tools like uniq we have:

  solaris> printf '1' | uniq
  solaris>  (nothing output!)

  freebsd> printf '1' | uniq
  1freebsd>

  coreutl> printf '1' | uniq
  1
  coreutl>

If we were just implementing now, I'd not output the extra '\n',
but changing at this stage needs to be carefully considered,
and with all the textutils, not just cut(1).

thanks,
Pádraig.
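
The prompt-placement comparisons above can also be checked mechanically instead of by eye. A minimal sketch, assuming a POSIX shell with tail and wc available; ends_with_newline is a hypothetical helper name, not part of coreutils:

```shell
# Hypothetical helper (not part of coreutils): succeed if the data on
# stdin ends with a newline, fail otherwise (including for empty input).
ends_with_newline() {
  # tail -c 1 keeps only the last byte; wc -l counts newline bytes,
  # so the count is 1 exactly when that last byte is a newline.
  [ "$(tail -c 1 | wc -l)" -eq 1 ]
}

# Example: report what the local cut does with an unterminated last line.
if printf '1\n2' | cut -c1 | ends_with_newline; then
  echo "cut -c added the missing newline"
else
  echo "cut -c left the last line unterminated"
fi
```

Swapping cut -c1 for cut -f1 or uniq in the pipeline gives the same yes/no answer for each tool, which is easier to compare across platforms than reading where the prompt lands.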

From debbugs-submit-bounces@debbugs.gnu.org Mon Dec 01 17:25:10 2014
From: Paul Eggert
Organization: UCLA Computer Science Department
To: Pádraig Brady, Eric Blake, John Kendall, 19240@debbugs.gnu.org
Subject: Re: bug#19240: cut 8.22 adds newline
Date: Mon, 01 Dec 2014 14:24:55 -0800
Message-ID: <547CEAB7.9000304@cs.ucla.edu>
In-Reply-To: <547CE67F.1050608@draigBrady.com>

On 12/01/2014 02:06 PM, Pádraig Brady wrote:
> If we were just implementing now, I'd not output the extra '\n',

I have just the opposite kneejerk reaction; typically text-based apps
are simpler and easier to document and use when they silently pretend
that the input had a trailing newline. That's what 'awk' and 'grep' do,
for example, and they work fine. There are some solid counterexamples
(e.g., Emacs, diff) but they have good reasons to be counterexamples.

> a newline should only be added where needed,
> especially with a low level tool like sed.

I'm afraid 'sed' is not that low-level, and GNU sed's current behavior
is inconsistent. Sometimes it silently appends a trailing newline to
the input before processing it, and sometimes it doesn't:

  $ printf x | sed '$a\
  > y'
  x
  y
  $ printf x | sed 's/$/y/'
  xy$

> changing at this stage needs to be carefully considered

Yes, the use cases are key here.

From debbugs-submit-bounces@debbugs.gnu.org Mon Dec 01 18:06:17 2014
From: Eric Blake
Organization: Red Hat, Inc.
To: Pádraig Brady, John Kendall, 19240@debbugs.gnu.org
Subject: Re: bug#19240: cut 8.22 adds newline
Date: Mon, 01 Dec 2014 16:06:11 -0700
Message-ID: <547CF463.5080903@redhat.com>
In-Reply-To: <547CE67F.1050608@draigBrady.com>

On 12/01/2014 03:06 PM, Pádraig Brady wrote:

> BTW the argument that it's not a text file is a bit beside the point
> as POSIX also says text files can't contain NUL chars, but we process
> this just fine:
>
> $ printf 'a\000b' | cut -c3
> b

The fact that GNU offers an extension where we gracefully handle NUL
bytes is a bonus of GNU, and does not change the fact that POSIX already
says we are in unspecified territory and can do whatever we deem most
useful.
I suspect that in multibyte locales with non-character encoding
errors, the behavior becomes harder to pinpoint on what makes the most
sense - but again, that is another aspect that makes a file binary
rather than text and therefore falls under unspecified behavior.

> Also comparing other tools like uniq we have:
>
> solaris> printf '1' | uniq
> solaris>  (nothing output!)
>
> freebsd> printf '1' | uniq
> 1freebsd>
>
> coreutl> printf '1' | uniq
> 1
> coreutl>

What about:
  printf '1\n1' | uniq

GNU treats the two lines as identical (and thus supplied a missing \n on
the second line); but I don't have ready access to test the other two as
I type this.

> If we were just implementing now, I'd not output the extra '\n',
> but changing at this stage needs to be carefully considered,
> and with all the textutils, not just cut(1).

I tend to go the opposite - producing text output, even on non-text
input, is more likely to be useful when piping files to other utilities
that don't handle non-text files as gracefully as the coreutils. But I
definitely agree that it is not something we change lightly.
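
One concrete case of a consumer that handles non-text input less gracefully is the plain shell read loop. A minimal sketch, assuming a POSIX shell; count_lines_naive is a hypothetical name used only for illustration:

```shell
# Hypothetical illustration: count lines with a plain `read` loop.
count_lines_naive() {
  n=0
  # read returns non-zero when it hits EOF before a newline, so a final
  # unterminated line never reaches the loop body.
  while IFS= read -r line; do
    n=$((n + 1))
  done
  echo "$n"
}

printf '1\n2' | count_lines_naive     # prints 1: the unterminated "2" is skipped
printf '1\n2\n' | count_lines_naive   # prints 2
```

A filter that always terminates its last output line keeps a loop like this from silently dropping data further down the pipeline.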

From debbugs-submit-bounces@debbugs.gnu.org Mon Dec 01 18:15:33 2014
From: Pádraig Brady
To: Eric Blake, John Kendall, 19240@debbugs.gnu.org
Subject: Re: bug#19240: cut 8.22 adds newline
Date: Mon, 01 Dec 2014 23:15:28 +0000
Message-ID: <547CF690.6070803@draigBrady.com>
In-Reply-To: <547CF463.5080903@redhat.com>

On 01/12/14 23:06, Eric Blake wrote:
> On 12/01/2014 03:06 PM, Pádraig Brady wrote:
>
>> BTW the argument that it's not a text file is a bit beside the point
>> as POSIX also says text files can't contain NUL chars, but we process
>> this just fine:
>>
>> $ printf 'a\000b' | cut -c3
>> b
>
> The fact that GNU offers an extension where we gracefully handle NUL
> bytes is a bonus of GNU, and does not change the fact that POSIX already
> says we are in unspecified territory and can do whatever we deem most
> useful. I suspect that in multibyte locales with non-character encoding
> errors, the behavior becomes harder to pinpoint on what makes the most
> sense - but again, that is another aspect that makes a file binary
> rather than text and therefore falls under unspecified behavior.
>
>
>> Also comparing other tools like uniq we have:
>>
>> solaris> printf '1' | uniq
>> solaris>  (nothing output!)
>>
>> freebsd> printf '1' | uniq
>> 1freebsd>
>>
>> coreutl> printf '1' | uniq
>> 1
>> coreutl>
>
> What about:
> printf '1\n1' | uniq

Both solaris and FreeBSD behave like GNU with that input.

> GNU treats the two lines as identical (and thus supplied a missing \n on
> the second line); but I don't have ready access to test the other two as
> I type this.
>
>> If we were just implementing now, I'd not output the extra '\n',
>> but changing at this stage needs to be carefully considered,
>> and with all the textutils, not just cut(1).
>
> I tend to go the opposite - producing text output, even on non-text
> input, is more likely to be useful when piping files to other utilities
> that don't handle non-text files as gracefully as the coreutils. But I
> definitely agree that it is not something we change lightly.
>

cheers,
Pádraig.

From debbugs-submit-bounces@debbugs.gnu.org Thu Dec 04 12:48:45 2014
From: Bob Proulx
To: John Kendall, 19240@debbugs.gnu.org
Subject: Re: bug#19240: cut 8.22 adds newline
Date: Thu, 4 Dec 2014 10:48:38 -0700
Message-ID: <20141204174838.GA17532@hysteria.proulx.com>
In-Reply-To: <547CF463.5080903@redhat.com>

Eric Blake wrote:
> I'll leave it to other contributors to weigh in on whether omitting
> the final newline on output when it was missing on input is worth
> the complexity of a change.
>
> Pádraig Brady wrote:
> > If we were just implementing now, I'd not output the extra '\n',
> > but changing at this stage needs to be carefully considered,
> > and with all the textutils, not just cut(1).
>
> I tend to go the opposite - producing text output, even on non-text
> input, is more likely to be useful when piping files to other utilities
> that don't handle non-text files as gracefully as the coreutils. But I
> definitely agree that it is not something we change lightly.

I have these thoughts and comments to make.

1. I don't "like" input file lines that don't have trailing newlines.
It raises the question of whether the input is actually valid input.
It feels to me like any line missing a newline is incomplete. There
is likely to have been an error in the creation of it. Handling it
silently feels like ignoring the error. But raising an actual error
by exit code or by emitting a warning or error message feels too heavy
handed. I would lean toward assuming that any incomplete input line
is actually terminated by a newline as the lesser of the evils.

2. The suggestion for handling *fields* that do not end with a
trailing newline differently from those that do doesn't make any sense
to me at all. What is a field? Is the newline part of the field? I
think not. Consider this.

  $ printf "one two" | awk '{print$1}'
  one
  $ printf "one two" | awk '{print$2}'
  two
  $ printf "one two\n" | awk '{print$1}'
  one
  $ printf "one two\n" | awk '{print$2}'
  two

The newline is not part of field two. Otherwise printing it in the
second pair of commands would result in two newlines being output.
$ printf "one two" | cut -d' ' -f1 one $ printf "one two" | cut -d' ' -f2 two $ printf "one two\n" | cut -d' ' -f1 one $ printf "one two\n" | cut -d' ' -f2 two Same thing for cut. The newline is not part of any of the fields. The newline terminates the input line. The newline is not associated with any of the delimited fields contained in an input line. For byte or character operations in the utils such as head -c those are binary operations and should be interpreted strictly according to the bytes. But not for cut -c which is column based. John Kendall wrote: > # Solaris cut > $ printf "1\n12\n123\n1234\n12345\n123456" | cut -c1-4 > 1 > 12 > 123 > 1234 > 1234 > 1234$ That is tickling non-portable behavior. I had a friend run some tests on HP-UX and IBM AIX and the results there were different from Solaris. Seems Solaris is already the unusual case. When looking count the "1234" lines carefully. Because HP-UX and older AIX don't process the line without a trailing newline at all. It is omitted there. Newer AIX appears to handle it like GNU. # uname -srm HP-UX B.10.20 9000/785 # printf "1\n12\n123\n1234\n12345\n123456" | cut -c1-4 1 12 123 1234 1234 # # uname -srm HP-UX B.11.31 ia64 # printf "1\n12\n123\n1234\n12345\n123456" | cut -c1-4 1 12 123 1234 1234 # # uname -s ; oslevel AIX 4.3.3.0 # printf "1\n12\n123\n1234\n12345\n123456" | cut -c1-4 1 12 123 1234 1234 # # uname -s ; oslevel AIX 7.1.0.0 # printf "1\n12\n123\n1234\n12345\n123456" | cut -c1-4 1 12 123 1234 1234 1234 # # head -1 /etc/motd ; uname -m Compaq Tru64 UNIX V5.0A alpha # printf "1\n12\n123\n1234\n12345\n123456" | cut -c1-4 1 12 123 1234 1234 # # uname -s Darwin # printf "1\n12\n123\n1234\n12345\n123456" | cut -c1-4 1 12 123 1234 1234 1234 # Using input lines without a trailing newline is already a minefield of portability problems. It depends upon details of the implementation. I think what Solaris cut must be doing is processing the emission of characters across the line character by character. 
When it hits the input newline it knows it is done and emits a newline itself and starts again on a new line. When it hits EOF on the input it probably just stops doing anything and exits itself without printing anything more and therefore not emitting a newline. Likely just an accident of implementation. This is what makes "lines" without a newline such an unportable thing to count upon. It causes it to depend upon an implementation detail. Different implementation might do different things. And in fact different ones do actually do different things. This probably isn't too widespread of an issue or it would have come up more often. And more specific to the Solaris code port there would be similar problems differently if trying to use other legacy Unix platforms. Best to avoid the construct entirely for robust operation. > I came upon this while porting scripts from Solaris 10 to Centos 7. Can you share with us the specific construct that caused this to arise? I have done a lot of script porting to and from HP-UX systems and am curious as to the issue. 
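One way to sidestep the unterminated-last-line problem described above is to normalize the text before it reaches cut. The following is only a minimal sketch, assuming a POSIX sh; normalize_eol is a hypothetical helper name, not something from coreutils or from this thread.

```sh
#!/bin/sh
# normalize_eol (hypothetical helper): copy stdin to stdout so that the
# output ends in exactly one newline. Command substitution strips every
# trailing newline, then printf puts a single one back.
normalize_eol() {
    text=$(cat)
    printf '%s\n' "$text"
}

# The last "line" (123456) has no newline on input, but cut now always
# sees a complete final line, whatever the cut implementation.
printf '1\n12\n123456' | normalize_eol | cut -c1-4
```

Caveats of this sketch: it collapses several trailing blank lines into a single newline and turns empty input into one empty line, so it suits short formatted strings rather than arbitrary data.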
Bob

From debbugs-submit-bounces@debbugs.gnu.org Thu Dec 04 13:41:54 2014 Received: (at 19240) by debbugs.gnu.org; 4 Dec 2014 18:41:55 +0000 Received: from localhost ([127.0.0.1]:53885 helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.80) (envelope-from ) id 1XwbLe-0007AL-5T for submit@debbugs.gnu.org; Thu, 04 Dec 2014 13:41:54 -0500 Received: from hub021-ca-8.exch021.serverdata.net ([64.78.56.73]:40265) by debbugs.gnu.org with esmtp (Exim 4.80) (envelope-from ) id 1XwbLa-0007A9-1c for 19240@debbugs.gnu.org; Thu, 04 Dec 2014 13:41:51 -0500 Received: from MBX021-W3-CA-2.exch021.domain.local ([10.254.4.78]) by HUB021-CA-8.exch021.domain.local ([10.254.4.112]) with mapi id 14.03.0174.001; Thu, 4 Dec 2014 10:41:48 -0800 From: John Kendall To: "19240@debbugs.gnu.org" <19240@debbugs.gnu.org> Subject: Re: bug#19240: cut 8.22 adds newline Thread-Topic: bug#19240: cut 8.22 adds newline Thread-Index: AQHQDWxAAovutUfLoU2mru2rZH4tFJx7fVGAgAAzxoCAABLRAIAADZiAgAAQj4CABF5FAIAADtqA Date: Thu, 4 Dec 2014 18:41:48 +0000 Message-ID: References: <0192FC7D-5C87-48C6-A4AC-C695639E7C41@capps.com> <547C9FE1.2020702@redhat.com> <547CDB18.3010205@redhat.com> <547CE67F.1050608@draigBrady.com> <547CF463.5080903@redhat.com> <20141204174838.GA17532@hysteria.proulx.com> In-Reply-To: <20141204174838.GA17532@hysteria.proulx.com> Accept-Language: en-US Content-Language: en-US X-MS-Has-Attach: X-MS-TNEF-Correlator: x-originating-ip: [65.197.152.29] Content-Type: text/plain; charset="iso-8859-1" Content-ID: <0A9BB2C2E3B3AD40ACB535BFFA4FF1BC@exch021.domain.local> Content-Transfer-Encoding: quoted-printable MIME-Version: 1.0 X-Spam-Score: -0.7 (/) X-Debbugs-Envelope-To: 19240 Cc: Bob Proulx X-BeenThere: debbugs-submit@debbugs.gnu.org X-Mailman-Version: 2.1.15 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: debbugs-submit-bounces@debbugs.gnu.org Sender: "Debbugs-submit" X-Spam-Score: -0.7 (/)

Bob Proulx wrote:
> ...
>
>> I came upon this while porting scripts from Solaris 10 to Centos 7.
>
> Can you share with us the specific construct that caused this to
> arise? I have done a lot of script porting to and from HP-UX systems
> and am curious as to the issue.
>

The construct in question is just for formatting the output
of a script that compares disc files to what's in a database.

  echo "$FILE ===========================\c"| cut -c1-30
  echo " matches =========="

The output on Solaris might look something like this (with
monospaced font on a terminal all the "matches" line up):

getDFL_info ================== matches ==========
transWestim_msg ============== matches ==========
selfBillDepotStoHan ========== matches ==========
addSale_invoice ============== matches ==========
buildInvoice ================= matches ==========
addInvoice =================== matches ==========
chgUnit ====================== matches ==========
updSale_invoice ============== matches ==========

The gnu output is:

getDFL_info ==================
 matches ==========
transWestim_msg ==============
 matches ==========
selfBillDepotStoHan ==========
 matches ==========
addSale_invoice ==============
 matches ==========
buildInvoice =================
 matches ==========
addInvoice ===================
 matches ==========
chgUnit ======================
 matches ==========
updSale_invoice ==============
 matches ==========

This can be re-written, of course. (There is one corner case that
Solaris's cut handled nicely that I have not been able to come up
with a quick fix.)

John

> Bob

From debbugs-submit-bounces@debbugs.gnu.org Thu Dec 04 15:24:43 2014 Received: (at 19240) by debbugs.gnu.org; 4 Dec 2014 20:24:43 +0000 Received: from localhost ([127.0.0.1]:54007 helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.80) (envelope-from ) id 1Xwcx9-0001IQ-0b for submit@debbugs.gnu.org; Thu, 04 Dec 2014 15:24:43 -0500 Received: from joseki.proulx.com ([216.17.153.58]:52015) by debbugs.gnu.org with esmtp (Exim 4.80) (envelope-from ) id 1Xwcx6-0001IH-NK for 19240@debbugs.gnu.org; Thu, 04 Dec 2014 15:24:41 -0500 Received: from hysteria.proulx.com (hysteria.proulx.com [192.168.230.119]) by joseki.proulx.com (Postfix) with ESMTP id 2228021236; Thu, 4 Dec 2014 13:24:37 -0700 (MST) Received: by hysteria.proulx.com (Postfix, from userid 1000) id 122402DBC1; Thu, 4 Dec 2014 13:24:36 -0700 (MST) Date: Thu, 4 Dec 2014 13:24:36 -0700 From: Bob Proulx To: John Kendall , 19240@debbugs.gnu.org Subject: Re: bug#19240: cut 8.22 adds newline Message-ID: <20141204132122651284424@bob.proulx.com> Mail-Followup-To: John Kendall , 19240@debbugs.gnu.org References: <0192FC7D-5C87-48C6-A4AC-C695639E7C41@capps.com> <547C9FE1.2020702@redhat.com> <547CDB18.3010205@redhat.com> <547CE67F.1050608@draigBrady.com> <547CF463.5080903@redhat.com> <20141204174838.GA17532@hysteria.proulx.com> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20141204174838.GA17532@hysteria.proulx.com> User-Agent: Mutt/1.5.23
(2014-03-12) X-Spam-Score: -0.0 (/) X-Debbugs-Envelope-To: 19240 X-BeenThere: debbugs-submit@debbugs.gnu.org X-Mailman-Version: 2.1.15 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: debbugs-submit-bounces@debbugs.gnu.org Sender: "Debbugs-submit" X-Spam-Score: -0.0 (/) Additional interesting cases from my friend Ken. HP-UX 11.31 (and earlier) # printf "one two" | cut -d' ' -f2 # AIX 4.3 (and presumably also earlier releases) # printf "one two" | cut -d' ' -f2 # Tru64 V5.0A # printf "one two" | cut -d' ' -f2 # AIX 5.2 (and later releases) # printf "one two" | cut -d' ' -f2 two # Solaris 11 (and earlier) # printf "one two" | cut -d' ' -f2 two # printf "one two" | cut -c5-7 two# From debbugs-submit-bounces@debbugs.gnu.org Thu Dec 04 15:39:29 2014 Received: (at 19240) by debbugs.gnu.org; 4 Dec 2014 20:39:29 +0000 Received: from localhost ([127.0.0.1]:54018 helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.80) (envelope-from ) id 1XwdBR-0001gF-B0 for submit@debbugs.gnu.org; Thu, 04 Dec 2014 15:39:29 -0500 Received: from smtp.cs.ucla.edu ([131.179.128.62]:39895) by debbugs.gnu.org with esmtp (Exim 4.80) (envelope-from ) id 1XwdBO-0001g6-Ty for 19240@debbugs.gnu.org; Thu, 04 Dec 2014 15:39:27 -0500 Received: from localhost (localhost.localdomain [127.0.0.1]) by smtp.cs.ucla.edu (Postfix) with ESMTP id D2426A6012D; Thu, 4 Dec 2014 12:39:25 -0800 (PST) X-Virus-Scanned: amavisd-new at smtp.cs.ucla.edu Received: from smtp.cs.ucla.edu ([127.0.0.1]) by localhost (smtp.cs.ucla.edu [127.0.0.1]) (amavisd-new, port 10024) with ESMTP id EnchYWUGLNwb; Thu, 4 Dec 2014 12:39:17 -0800 (PST) Received: from penguin.cs.ucla.edu (Penguin.CS.UCLA.EDU [131.179.64.200]) by smtp.cs.ucla.edu (Postfix) with ESMTPSA id 08794A6012A; Thu, 4 Dec 2014 12:39:17 -0800 (PST) Message-ID: <5480C674.9070507@cs.ucla.edu> Date: Thu, 04 Dec 2014 12:39:16 -0800 From: Paul Eggert Organization: UCLA Computer Science Department User-Agent: 
Mozilla/5.0 (X11; Linux x86_64; rv:31.0) Gecko/20100101 Thunderbird/31.3.0 MIME-Version: 1.0 To: John Kendall , "19240@debbugs.gnu.org" <19240@debbugs.gnu.org> Subject: Re: bug#19240: cut 8.22 adds newline References: <0192FC7D-5C87-48C6-A4AC-C695639E7C41@capps.com> <547C9FE1.2020702@redhat.com> <547CDB18.3010205@redhat.com> <547CE67F.1050608@draigBrady.com> <547CF463.5080903@redhat.com> <20141204174838.GA17532@hysteria.proulx.com> In-Reply-To: Content-Type: text/plain; charset=utf-8; format=flowed Content-Transfer-Encoding: 7bit X-Spam-Score: -2.3 (--) X-Debbugs-Envelope-To: 19240 Cc: Bob Proulx X-BeenThere: debbugs-submit@debbugs.gnu.org X-Mailman-Version: 2.1.15 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: debbugs-submit-bounces@debbugs.gnu.org Sender: "Debbugs-submit" X-Spam-Score: -2.3 (--) On 12/04/2014 10:41 AM, John Kendall wrote: > echo "$FILE ===========================\c"| cut -c1-30 Since you're going to have to rewrite it anyway if you want it to be portable, I suggest doing it this way: printf '%.30s' "$FILE ===========================" as it's a lot more efficient anyway. 
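
The printf suggestion above can be wrapped up roughly as follows. This is only a sketch, assuming a POSIX sh; label30 is a hypothetical helper name, while the width of 30 and the '=' filler come from the thread. The filler is generated rather than typed so that even a very short name is padded out to the full width.

```sh
#!/bin/sh
# label30 (hypothetical helper): print "NAME ====..." cut off at 30 bytes,
# with no trailing newline, so the caller can append the rest of the line.
label30() {
    pad=$(printf '%30s' '' | tr ' ' '=')   # 30 '=' signs, enough for any name
    printf '%.30s' "$1 $pad"               # precision .30 caps the output
}

label30 getDFL_info
printf ' matches ==========\n'
```

Because printf writes no newline unless asked to, no cut and no trailing-newline behavior is involved at all.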
From debbugs-submit-bounces@debbugs.gnu.org Thu Dec 04 16:07:01 2014 Received: (at 19240) by debbugs.gnu.org; 4 Dec 2014 21:07:01 +0000 Received: from localhost ([127.0.0.1]:54022 helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.80) (envelope-from ) id 1Xwdc4-0002Ka-LC for submit@debbugs.gnu.org; Thu, 04 Dec 2014 16:07:01 -0500 Received: from joseki.proulx.com ([216.17.153.58]:52205) by debbugs.gnu.org with esmtp (Exim 4.80) (envelope-from ) id 1Xwdc2-0002KQ-Gr for 19240@debbugs.gnu.org; Thu, 04 Dec 2014 16:06:59 -0500 Received: from hysteria.proulx.com (hysteria.proulx.com [192.168.230.119]) by joseki.proulx.com (Postfix) with ESMTP id 4DB7621236; Thu, 4 Dec 2014 14:06:57 -0700 (MST) Received: by hysteria.proulx.com (Postfix, from userid 1000) id 299232DC35; Thu, 4 Dec 2014 14:06:57 -0700 (MST) Date: Thu, 4 Dec 2014 14:06:57 -0700 From: Bob Proulx To: John Kendall , 19240@debbugs.gnu.org Subject: Re: bug#19240: cut 8.22 adds newline Message-ID: <20141204132440101636018@bob.proulx.com> Mail-Followup-To: John Kendall , 19240@debbugs.gnu.org References: <0192FC7D-5C87-48C6-A4AC-C695639E7C41@capps.com> <547C9FE1.2020702@redhat.com> <547CDB18.3010205@redhat.com> <547CE67F.1050608@draigBrady.com> <547CF463.5080903@redhat.com> <20141204174838.GA17532@hysteria.proulx.com> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: User-Agent: Mutt/1.5.23 (2014-03-12) X-Spam-Score: -0.0 (/) X-Debbugs-Envelope-To: 19240 X-BeenThere: debbugs-submit@debbugs.gnu.org X-Mailman-Version: 2.1.15 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: debbugs-submit-bounces@debbugs.gnu.org Sender: "Debbugs-submit" X-Spam-Score: -0.0 (/) John Kendall wrote: > Bob Proulx wrote: > >> I came upon this while porting scripts from Solaris 10 to Centos 7. > > > > Can you share with us the specific construct that caused this to > > arise? 
I have done a lot of script porting to and from HP-UX systems
> > and am curious as to the issue.
>
> The construct in question is just for formatting the output
> of a script that compares disc files to what's in a database.
>
> echo "$FILE ===========================\c"| cut -c1-30
> echo " matches =========="

Eww... Immediately I have a second immune reaction to the above. The
reason is that the use of echo above is non-portable. It uses the old
System V echo interface that interprets escape sequences by default.
This can be enabled in bash at build time with the
--enable-usg-echo-default configure flag but it is off by default
because BSD doesn't support it by default.

The solution to this problem has been to recommend using 'printf'
anywhere that an escape sequence is needed or that the trailing
newline must be suppressed, since printf is POSIX standard and avoids
the echo unportability. Use of echo can be very unportable and the
"\c" is one of those unportable things.

> The output on Solaris might look something like this (with
> monospaced font on a terminal all the "matches" line up):
> ...

Cool.

> This can be re-written, of course. (There is one corner case that
> Solaris's cut handled nicely that I have not been able to come up
> with a quick fix.)

Immediately printf comes to mind. Use %s with a precision specifier.
Since printf is POSIX standard this should work anywhere. The failure
mode of not having printf available on really, really, really old
systems is trivially handled by providing a printf for that system.
Much easier than dealing with other differences.

  printf "%.30s matches ==========\n" "$FILE ==========================="

One thing I still don't like about the above is that it will truncate
any long file names. Any file name longer than 30 will be truncated.
But of course that would require changes in output format to address.
My preference would be to have "matches" first and the file name
second and let the file name go as long as it needs to go.
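The status-first layout preferred above might look like the following. It is a minimal sketch assuming a POSIX sh; report_match is a hypothetical helper name and the exact banner text is only illustrative.

```sh
#!/bin/sh
# report_match (hypothetical helper): fixed-width status first, file name
# last, so the columns line up and no file name is ever truncated.
report_match() {
    printf 'matches ========== %s\n' "$1"
}

report_match getDFL_info
report_match a_file_name_well_over_thirty_characters_long
```

Since the variable-length part comes last, neither cut nor a precision in the format string is needed.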
Bob From debbugs-submit-bounces@debbugs.gnu.org Thu Dec 04 16:13:44 2014 Received: (at 19240) by debbugs.gnu.org; 4 Dec 2014 21:13:44 +0000 Received: from localhost ([127.0.0.1]:54026 helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.80) (envelope-from ) id 1XwdiZ-0002UR-Qi for submit@debbugs.gnu.org; Thu, 04 Dec 2014 16:13:44 -0500 Received: from hub021-ca-2.exch021.serverdata.net ([64.78.22.169]:14291) by debbugs.gnu.org with esmtp (Exim 4.80) (envelope-from ) id 1XwdiW-0002UE-1N for 19240@debbugs.gnu.org; Thu, 04 Dec 2014 16:13:42 -0500 Received: from MBX021-W3-CA-2.exch021.domain.local ([10.254.4.78]) by HUB021-CA-2.exch021.domain.local ([10.254.4.33]) with mapi id 14.03.0174.001; Thu, 4 Dec 2014 13:13:38 -0800 From: John Kendall To: Paul Eggert Subject: Re: bug#19240: cut 8.22 adds newline Thread-Topic: bug#19240: cut 8.22 adds newline Thread-Index: AQHQDWxAAovutUfLoU2mru2rZH4tFJx7fVGAgAAzxoCAABLRAIAADZiAgAAQj4CABF5FAIAADtqAgAAg0wCAAAmZgA== Date: Thu, 4 Dec 2014 21:13:37 +0000 Message-ID: <08AA9BD2-7A0F-4B76-91B5-146968FF29C6@capps.com> References: <0192FC7D-5C87-48C6-A4AC-C695639E7C41@capps.com> <547C9FE1.2020702@redhat.com> <547CDB18.3010205@redhat.com> <547CE67F.1050608@draigBrady.com> <547CF463.5080903@redhat.com> <20141204174838.GA17532@hysteria.proulx.com> <5480C674.9070507@cs.ucla.edu> In-Reply-To: <5480C674.9070507@cs.ucla.edu> Accept-Language: en-US Content-Language: en-US X-MS-Has-Attach: X-MS-TNEF-Correlator: x-originating-ip: [65.197.152.29] Content-Type: text/plain; charset="us-ascii" Content-ID: Content-Transfer-Encoding: quoted-printable MIME-Version: 1.0 X-Spam-Score: -0.7 (/) X-Debbugs-Envelope-To: 19240 Cc: Bob Proulx , "19240@debbugs.gnu.org" <19240@debbugs.gnu.org> X-BeenThere: debbugs-submit@debbugs.gnu.org X-Mailman-Version: 2.1.15 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: debbugs-submit-bounces@debbugs.gnu.org Sender: "Debbugs-submit" X-Spam-Score: -0.7 (/) 
Paul Eggert wrote:
> On 12/04/2014 10:41 AM, John Kendall wrote:
>> echo "$FILE ===========================\c"| cut -c1-30
>
> Since you're going to have to rewrite it anyway if you want it to be portable, I suggest doing it this way:
>
> printf '%.30s' "$FILE ==========================="
>
> as it's a lot more efficient anyway.

Yes, that's what I've done. The corner case I mentioned is
handled badly by this, however. In the corner case $FILE
is a list of files separated by newlines. Solaris cut would
list them and then the ============= would be tacked
on to the last line:

filename1
filename2
filename3
filename4
filename5 ========================= matches

When printf is used, it truncates the list of filenames if the sum
of them exceeds 30 chars in length. The format string %.30s
doesn't treat embedded newlines specially:

filename1
filename2
filename3 ========================= matches

filenames start getting lopped off.

I'll rework the code. It worked for 15 years, don't be too offended by it. :)

From debbugs-submit-bounces@debbugs.gnu.org Thu Dec 04 16:36:45 2014 Received: (at 19240) by debbugs.gnu.org; 4 Dec 2014 21:36:45 +0000 Received: from localhost ([127.0.0.1]:54052 helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.80) (envelope-from ) id 1Xwe4r-00035x-CZ for submit@debbugs.gnu.org; Thu, 04 Dec 2014 16:36:45 -0500 Received: from mx1.redhat.com ([209.132.183.28]:53012) by debbugs.gnu.org with esmtp (Exim 4.80) (envelope-from ) id 1Xwe4o-00035p-UR for 19240@debbugs.gnu.org; Thu, 04 Dec 2014 16:36:43 -0500 Received: from int-mx14.intmail.prod.int.phx2.redhat.com (int-mx14.intmail.prod.int.phx2.redhat.com [10.5.11.27]) by mx1.redhat.com (8.14.4/8.14.4) with ESMTP id sB4LafQ3004371 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-GCM-SHA384 bits=256 verify=FAIL); Thu, 4 Dec 2014 16:36:41 -0500 Received: from [10.3.113.183] (ovpn-113-183.phx2.redhat.com [10.3.113.183]) by int-mx14.intmail.prod.int.phx2.redhat.com (8.14.4/8.14.4) with ESMTP id sB4Laenv010233; Thu, 4 Dec 2014 16:36:40 -0500 Message-ID: <5480D3E8.7030303@redhat.com> Date: Thu, 04 Dec 2014 14:36:40 -0700 From: Eric Blake Organization: Red Hat, Inc.
User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:31.0) Gecko/20100101 Thunderbird/31.3.0 MIME-Version: 1.0 To: John Kendall , "19240@debbugs.gnu.org" <19240@debbugs.gnu.org> Subject: Re: bug#19240: cut 8.22 adds newline References: <0192FC7D-5C87-48C6-A4AC-C695639E7C41@capps.com> <547C9FE1.2020702@redhat.com> <547CDB18.3010205@redhat.com> <547CE67F.1050608@draigBrady.com> <547CF463.5080903@redhat.com> <20141204174838.GA17532@hysteria.proulx.com> In-Reply-To: OpenPGP: url=http://people.redhat.com/eblake/eblake.gpg Content-Type: multipart/signed; micalg=pgp-sha256; protocol="application/pgp-signature"; boundary="eIUF67agFgIXErJjmCVFaAEbqrItWJne6" X-Scanned-By: MIMEDefang 2.68 on 10.5.11.27 X-Spam-Score: -5.0 (-----) X-Debbugs-Envelope-To: 19240 Cc: Bob Proulx X-BeenThere: debbugs-submit@debbugs.gnu.org X-Mailman-Version: 2.1.15 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: debbugs-submit-bounces@debbugs.gnu.org Sender: "Debbugs-submit" X-Spam-Score: -5.0 (-----)

This is an OpenPGP/MIME signed message (RFC 4880 and 3156)
--eIUF67agFgIXErJjmCVFaAEbqrItWJne6
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: quoted-printable

On 12/04/2014 11:41 AM, John Kendall wrote:

>
> The construct in question is just for formatting the output
> of a script that compares disc files to what's in a database.
>
> echo "$FILE ===========================\c"| cut -c1-30
> echo " matches =========="

echo '\c' is non-portable. POSIX says that echo cannot portably be used
on any string that uses a backslash (some implementations, like Solaris,
interpret that backslash; others print it literally). Use 'printf'
instead (in particular, 'printf "%b" "...\c"' does the same as your use
of 'echo "...\c"'; that is, 'printf %b' should be a drop-in replacement
for echoes that know backslash escapes. On the other hand, if all you
are using is \c to end a line, that's the same as 'printf %s ...' with
no \c)

Also, your example doesn't work for 1-byte $FILE (you don't have enough
=== in the line, and need at least 1 more, which I stick in my responses
below).

Now, for your particular use case:

You can use command substitution to strip trailing newlines, although
that is not portable to builds of cut that skip output if the last line
doesn't have a newline to begin with:

echo "$(printf %b "$FILE ============================\c"| cut -c1-30)" \
  " matches =========="

but you can guarantee a newline ending, and get rid of the non-portable
\ to echo, all for a shorter line that portably works:

echo "$(echo "$FILE ============================"| cut -c1-30)" \
  " matches =========="

'head' can do what you are using cut for:

echo "$FILE ============================" | head -c30
echo " matches =========="

And if you are using bash, you can even do it without forking:

line="$FILE ============================"
line=${line::30}
echo "$line matches =========="

There's probably lots of other one-liner solutions that don't require
particular behavior of 'cut'.

> This can be re-written, of course. (There is one corner case that
> Solaris's cut handled nicely that I have not been able to come up
> with a quick fix.)

Hope my quick fix ideas help you. Feel free to keep asking questions,
although you are now moving the topic a bit more into the realm of shell
programming than coreutils usage. And remember, it always helps to ask
questions related to your end goal, rather than your attempted solution:

https://meta.stackexchange.com/questions/66377/what-is-the-xy-problem
http://www.perlmonks.org/?node=XY+Problem

-- 
Eric Blake eblake redhat com +1-919-301-3266
Libvirt virtualization library http://libvirt.org

--eIUF67agFgIXErJjmCVFaAEbqrItWJne6--

From debbugs-submit-bounces@debbugs.gnu.org Thu Dec 04 16:48:34 2014 Received: (at 19240) by debbugs.gnu.org; 4 Dec 2014 21:48:34 +0000 Received: from localhost ([127.0.0.1]:54056 helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.80) (envelope-from ) id 1XweGH-0003PG-N7 for submit@debbugs.gnu.org; Thu, 04 Dec 2014 16:48:34 -0500 Received: from mx1.redhat.com ([209.132.183.28]:39052) by debbugs.gnu.org with esmtp (Exim 4.80) (envelope-from ) id 1XweGD-0003P6-TQ for 19240@debbugs.gnu.org; Thu, 04 Dec 2014 16:48:30 -0500 Received: from int-mx10.intmail.prod.int.phx2.redhat.com (int-mx10.intmail.prod.int.phx2.redhat.com [10.5.11.23]) by mx1.redhat.com (8.14.4/8.14.4) with ESMTP id sB4LmIYP016563 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-GCM-SHA384 bits=256 verify=FAIL); Thu, 4 Dec 2014 16:48:19 -0500 Received: from [10.3.113.183] (ovpn-113-183.phx2.redhat.com [10.3.113.183])
by int-mx10.intmail.prod.int.phx2.redhat.com (8.14.4/8.14.4) with ESMTP id sB4LmHKP003586; Thu, 4 Dec 2014 16:48:17 -0500 Message-ID: <5480D6A1.7070303@redhat.com> Date: Thu, 04 Dec 2014 14:48:17 -0700 From: Eric Blake Organization: Red Hat, Inc. User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:31.0) Gecko/20100101 Thunderbird/31.3.0 MIME-Version: 1.0 To: Paul Eggert , John Kendall , "19240@debbugs.gnu.org" <19240@debbugs.gnu.org> Subject: Re: bug#19240: cut 8.22 adds newline References: <0192FC7D-5C87-48C6-A4AC-C695639E7C41@capps.com> <547C9FE1.2020702@redhat.com> <547CDB18.3010205@redhat.com> <547CE67F.1050608@draigBrady.com> <547CF463.5080903@redhat.com> <20141204174838.GA17532@hysteria.proulx.com> <5480C674.9070507@cs.ucla.edu> In-Reply-To: <5480C674.9070507@cs.ucla.edu> OpenPGP: url=http://people.redhat.com/eblake/eblake.gpg Content-Type: multipart/signed; micalg=pgp-sha256; protocol="application/pgp-signature"; boundary="SIlX1Q9orrMWkchs2Hjb9iD6gFE1ppvSs" X-Scanned-By: MIMEDefang 2.68 on 10.5.11.23 X-Spam-Score: -5.0 (-----) X-Debbugs-Envelope-To: 19240 Cc: Bob Proulx X-BeenThere: debbugs-submit@debbugs.gnu.org X-Mailman-Version: 2.1.15 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: debbugs-submit-bounces@debbugs.gnu.org Sender: "Debbugs-submit" X-Spam-Score: -5.0 (-----) This is an OpenPGP/MIME signed message (RFC 4880 and 3156) --SIlX1Q9orrMWkchs2Hjb9iD6gFE1ppvSs Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: quoted-printable On 12/04/2014 01:39 PM, Paul Eggert wrote: > On 12/04/2014 10:41 AM, John Kendall wrote: >> echo "$FILE =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D= =3D=3D=3D=3D=3D=3D=3D\c"| cut -c1-30 >=20 > Since you're going to have to rewrite it anyway if you want it to be > portable, I suggest doing it this way: >=20 > printf '%.30s' "$FILE =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D= =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D" >=20 > as it's a lot more 
efficient anyway.

Be careful; the POSIX specification of '%.30s' does NOT work well with multibyte characters; it is specified as:

http://pubs.opengroup.org/onlinepubs/9699919799/basedefs/V1_chap05.html#tag_05

"precision Gives the minimum number of digits to appear for the d, o, i, u, x, or X conversion specifiers (the field is padded with leading zeros), the number of digits to appear after the radix character for the e and f conversion specifiers, the maximum number of significant digits for the g conversion specifier; or the maximum number of bytes to be written from a string in the s conversion specifier. The precision shall take the form of a ( '.' ) followed by a decimal digit string; a null digit string is treated as zero."

which means that it CAN and WILL corrupt output if the number of bytes written falls in the middle of a multi-byte character.

-- 
Eric Blake eblake redhat com +1-919-301-3266
Libvirt virtualization library http://libvirt.org
--SIlX1Q9orrMWkchs2Hjb9iD6gFE1ppvSs Content-Type: application/pgp-signature; name="signature.asc" Content-Description: OpenPGP digital signature Content-Disposition: attachment; filename="signature.asc" -----BEGIN PGP SIGNATURE----- Version: GnuPG v1 Comment: Public key at http://people.redhat.com/eblake/eblake.gpg iQEcBAEBCAAGBQJUgNahAAoJEKeha0olJ0NqRNgH/12yITtQ2VzDGsTjhY30DCHl RgSwRJLOMGaU2e2JSdfgyb+U/T7AZYM3u55OW8rKhWirygiN8XtD6nE4k6RAzQw6 imXpbu7PfjBaQZ4iIs0p3st3Sjn9CeCv45FW+2152vwJfYwrj7cUd3VATi/Bcbfe fUpFfs6awog6UlYJ/2kOBOC6cttGQrotIqtb2xtRcq1JZtM46ZzPcT+yJ5nw/mZb RcOt+hJ7zUxzaEXxrReTfbUxS7hr8e9ZpHHvjmpgAt0+L+aCScct5TptY8xkWDeL cc8NCRy+E4NfGQg5e08EAxkdGhNX9+HZwZgQSXCJZLOoL0XwzzC0+fM4ylCACls= =87fG -----END PGP SIGNATURE----- --SIlX1Q9orrMWkchs2Hjb9iD6gFE1ppvSs--
From debbugs-submit-bounces@debbugs.gnu.org Thu Dec 04 16:55:08 2014 Received: (at 19240) by debbugs.gnu.org; 4 Dec 2014 21:55:08 +0000 Received: from localhost ([127.0.0.1]:54060 helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.80)
(envelope-from ) id 1XweMd-0003Yx-J9 for submit@debbugs.gnu.org; Thu, 04 Dec 2014 16:55:08 -0500 Received: from joseki.proulx.com ([216.17.153.58]:52444) by debbugs.gnu.org with esmtp (Exim 4.80) (envelope-from ) id 1XweMa-0003Yo-V6 for 19240@debbugs.gnu.org; Thu, 04 Dec 2014 16:55:05 -0500 Received: from hysteria.proulx.com (hysteria.proulx.com [192.168.230.119]) by joseki.proulx.com (Postfix) with ESMTP id 95C6C21236; Thu, 4 Dec 2014 14:55:03 -0700 (MST) Received: by hysteria.proulx.com (Postfix, from userid 1000) id 61EC82DC35; Thu, 4 Dec 2014 14:55:03 -0700 (MST) Date: Thu, 4 Dec 2014 14:55:03 -0700 From: Bob Proulx To: John Kendall , 19240@debbugs.gnu.org Subject: Re: bug#19240: cut 8.22 adds newline Message-ID: <20141204145237260439689@bob.proulx.com> Mail-Followup-To: John Kendall , 19240@debbugs.gnu.org References: <0192FC7D-5C87-48C6-A4AC-C695639E7C41@capps.com> <547C9FE1.2020702@redhat.com> <547CDB18.3010205@redhat.com> <547CE67F.1050608@draigBrady.com> <547CF463.5080903@redhat.com> <20141204174838.GA17532@hysteria.proulx.com> <5480C674.9070507@cs.ucla.edu> <5480D6A1.7070303@redhat.com> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <5480D6A1.7070303@redhat.com> User-Agent: Mutt/1.5.23 (2014-03-12) X-Spam-Score: -0.0 (/) X-Debbugs-Envelope-To: 19240 X-BeenThere: debbugs-submit@debbugs.gnu.org X-Mailman-Version: 2.1.15 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: debbugs-submit-bounces@debbugs.gnu.org Sender: "Debbugs-submit" X-Spam-Score: -0.0 (/) Eric Blake wrote: > Be careful; the POSIX specification of '%.30s' does NOT work well with > multibyte characters; it is specified as: > ... > which means that it CAN and WILL corrupt output if the number of bytes > written falls in the middle of a multi-byte character. Good point. 
Which leads me back to thinking that printing a tag first and then the filename second and letting it be as long as it needs to be without truncation is the best solution. But of course in the original application coming from a legacy environment the file names would never be multibyte. Bob From debbugs-submit-bounces@debbugs.gnu.org Thu Dec 04 16:57:00 2014 Received: (at 19240) by debbugs.gnu.org; 4 Dec 2014 21:57:01 +0000 Received: from localhost ([127.0.0.1]:54064 helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.80) (envelope-from ) id 1XweOS-0003be-DZ for submit@debbugs.gnu.org; Thu, 04 Dec 2014 16:57:00 -0500 Received: from mx1.redhat.com ([209.132.183.28]:42699) by debbugs.gnu.org with esmtp (Exim 4.80) (envelope-from ) id 1XweOQ-0003bW-Gd for 19240@debbugs.gnu.org; Thu, 04 Dec 2014 16:56:58 -0500 Received: from int-mx14.intmail.prod.int.phx2.redhat.com (int-mx14.intmail.prod.int.phx2.redhat.com [10.5.11.27]) by mx1.redhat.com (8.14.4/8.14.4) with ESMTP id sB4Luums023020 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-GCM-SHA384 bits=256 verify=FAIL); Thu, 4 Dec 2014 16:56:56 -0500 Received: from [10.3.113.183] (ovpn-113-183.phx2.redhat.com [10.3.113.183]) by int-mx14.intmail.prod.int.phx2.redhat.com (8.14.4/8.14.4) with ESMTP id sB4Luuqg021515; Thu, 4 Dec 2014 16:56:56 -0500 Message-ID: <5480D8A8.2070707@redhat.com> Date: Thu, 04 Dec 2014 14:56:56 -0700 From: Eric Blake Organization: Red Hat, Inc. 
User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:31.0) Gecko/20100101 Thunderbird/31.3.0 MIME-Version: 1.0 To: John Kendall , Paul Eggert Subject: Re: bug#19240: cut 8.22 adds newline References: <0192FC7D-5C87-48C6-A4AC-C695639E7C41@capps.com> <547C9FE1.2020702@redhat.com> <547CDB18.3010205@redhat.com> <547CE67F.1050608@draigBrady.com> <547CF463.5080903@redhat.com> <20141204174838.GA17532@hysteria.proulx.com> <5480C674.9070507@cs.ucla.edu> <08AA9BD2-7A0F-4B76-91B5-146968FF29C6@capps.com> In-Reply-To: <08AA9BD2-7A0F-4B76-91B5-146968FF29C6@capps.com> OpenPGP: url=http://people.redhat.com/eblake/eblake.gpg Content-Type: multipart/signed; micalg=pgp-sha256; protocol="application/pgp-signature"; boundary="MwTN5RgcAj1cdmTU9TMQc0xWUFSdH0cIU" X-Scanned-By: MIMEDefang 2.68 on 10.5.11.27 X-Spam-Score: -5.0 (-----) X-Debbugs-Envelope-To: 19240 Cc: Bob Proulx , "19240@debbugs.gnu.org" <19240@debbugs.gnu.org> X-BeenThere: debbugs-submit@debbugs.gnu.org X-Mailman-Version: 2.1.15 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: debbugs-submit-bounces@debbugs.gnu.org Sender: "Debbugs-submit" X-Spam-Score: -5.0 (-----) This is an OpenPGP/MIME signed message (RFC 4880 and 3156) --MwTN5RgcAj1cdmTU9TMQc0xWUFSdH0cIU Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: quoted-printable
On 12/04/2014 02:13 PM, John Kendall wrote:
> Yes, that's what I've done. The corner case I mentioned is
> handled badly by this, however. In the corner case $FILE
> is a list of files separated by newlines. Solaris cut would
> list them and then the ============= would be tacked
> on to the last line:

Again, mention your goal up front, and you can save us some iterations.
So you really DO want to grab a rectangular region of text, and append to just the last line, rather than chop a single line of input at a fixed length (it was not obvious to us from the naming or your example that you intended for $FILE to contain newlines).

So, my solution of using command substitution still does this, and portably:

echo "$(echo "$FILE ============================"| cut -c1-30)" \
    " matches =========="

So does sed, although no longer a short one-liner:

echo "$FILE" | sed -e 's/^\(.\{30\}\).*/\1/' \
    -e '$ {' \
    -e 's/$/ ============================/' \
    -e 's/^\(.\{30\}\).*/\1/' \
    -e '$ s/$/ matches ==========/' \
    -e '}'

-- 
Eric Blake eblake redhat com +1-919-301-3266
Libvirt virtualization library http://libvirt.org
--MwTN5RgcAj1cdmTU9TMQc0xWUFSdH0cIU Content-Type: application/pgp-signature; name="signature.asc" Content-Description: OpenPGP digital signature Content-Disposition: attachment; filename="signature.asc" -----BEGIN PGP SIGNATURE----- Version: GnuPG v1 Comment: Public key at http://people.redhat.com/eblake/eblake.gpg iQEcBAEBCAAGBQJUgNioAAoJEKeha0olJ0NqfnUH/j7P5qAT695HIql+MZd2EpTt mv4ICza6W6FXphNOvBfSaWjv3/q8FhHG73r3A2oBTtOVyLgHmKn6nRzCK5DG+y+v Hhmp7IcqrG4nXDphwwE81kTqlO/g6sC/KfaR+tTNO2KS8ksi0uBn4cQ2Bs4W+RYM srm4duR6fvSR4VDvCFG5LyIJ6NfZqPfpwPrnKGkFv2r97OuxGkENeC8yv0YHWzEd IxPf2e2Vou1oj4o85ZfWAgPO69/lRYDqxVEdNET20ul4lmUzKP0KV0zv70qPQ46m GzfvI7EbtX60AjAVSeLXDUXsYK00sxNHaoIo9rRPHPv5k2qZEIu5XvSXE+IBZ38= =QZs9 -----END PGP SIGNATURE----- --MwTN5RgcAj1cdmTU9TMQc0xWUFSdH0cIU--
From debbugs-submit-bounces@debbugs.gnu.org Thu Dec 04 17:01:30 2014 Received: (at 19240) by debbugs.gnu.org; 4 Dec 2014 22:01:30 +0000 Received: from localhost ([127.0.0.1]:54072 helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.80) (envelope-from ) id 1XweSn-0003js-OX for submit@debbugs.gnu.org; Thu,
04 Dec 2014 17:01:30 -0500 Received: from mx1.redhat.com ([209.132.183.28]:43851) by debbugs.gnu.org with esmtp (Exim 4.80) (envelope-from ) id 1XweSl-0003jj-E1 for 19240@debbugs.gnu.org; Thu, 04 Dec 2014 17:01:28 -0500 Received: from int-mx13.intmail.prod.int.phx2.redhat.com (int-mx13.intmail.prod.int.phx2.redhat.com [10.5.11.26]) by mx1.redhat.com (8.14.4/8.14.4) with ESMTP id sB4M1OPm021513 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-GCM-SHA384 bits=256 verify=FAIL); Thu, 4 Dec 2014 17:01:24 -0500 Received: from [10.3.113.183] (ovpn-113-183.phx2.redhat.com [10.3.113.183]) by int-mx13.intmail.prod.int.phx2.redhat.com (8.14.4/8.14.4) with ESMTP id sB4M1O03032531; Thu, 4 Dec 2014 17:01:24 -0500 Message-ID: <5480D9B3.7090309@redhat.com> Date: Thu, 04 Dec 2014 15:01:23 -0700 From: Eric Blake Organization: Red Hat, Inc. User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:31.0) Gecko/20100101 Thunderbird/31.3.0 MIME-Version: 1.0 To: John Kendall , Paul Eggert Subject: Re: bug#19240: cut 8.22 adds newline References: <0192FC7D-5C87-48C6-A4AC-C695639E7C41@capps.com> <547C9FE1.2020702@redhat.com> <547CDB18.3010205@redhat.com> <547CE67F.1050608@draigBrady.com> <547CF463.5080903@redhat.com> <20141204174838.GA17532@hysteria.proulx.com> <5480C674.9070507@cs.ucla.edu> <08AA9BD2-7A0F-4B76-91B5-146968FF29C6@capps.com> <5480D8A8.2070707@redhat.com> In-Reply-To: <5480D8A8.2070707@redhat.com> OpenPGP: url=http://people.redhat.com/eblake/eblake.gpg Content-Type: multipart/signed; micalg=pgp-sha256; protocol="application/pgp-signature"; boundary="R9E4q8AtaaD01U5UxMRw376Oc0i13Rwt0" X-Scanned-By: MIMEDefang 2.68 on 10.5.11.26 X-Spam-Score: -5.0 (-----) X-Debbugs-Envelope-To: 19240 Cc: Bob Proulx , "19240@debbugs.gnu.org" <19240@debbugs.gnu.org> X-BeenThere: debbugs-submit@debbugs.gnu.org X-Mailman-Version: 2.1.15 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: debbugs-submit-bounces@debbugs.gnu.org Sender: "Debbugs-submit" 
X-Spam-Score: -5.0 (-----) This is an OpenPGP/MIME signed message (RFC 4880 and 3156) --R9E4q8AtaaD01U5UxMRw376Oc0i13Rwt0 Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: quoted-printable
On 12/04/2014 02:56 PM, Eric Blake wrote:
> So does sed, although no longer a short one-liner:
>
> echo "$FILE" | sed -e 's/^\(.\{30\}\).*/\1/' \
>     -e '$ {' \
>     -e 's/$/ ============================/' \
>     -e 's/^\(.\{30\}\).*/\1/' \
>     -e '$ s/$/ matches ==========/' \
>     -e '}'

Or:

echo "$FILE ============================" | \
    sed -e 's/^\(.\{30\}\).*/\1/' \
    -e '$ s/$/ matches ==========/'

-- 
Eric Blake eblake redhat com +1-919-301-3266
Libvirt virtualization library http://libvirt.org
--R9E4q8AtaaD01U5UxMRw376Oc0i13Rwt0 Content-Type: application/pgp-signature; name="signature.asc" Content-Description: OpenPGP digital signature Content-Disposition: attachment; filename="signature.asc" -----BEGIN PGP SIGNATURE----- Version: GnuPG v1 Comment: Public key at http://people.redhat.com/eblake/eblake.gpg iQEcBAEBCAAGBQJUgNmzAAoJEKeha0olJ0NqbucH/iMBbJ/m37DPmrng3vhc5U/5 6YDiNu3matpT9GI6i8iXXNSbJaeXEs6Ma4sC6WmIGhdIb5mSuYUgtyIw2k6IMrIy USn0/KEZQDlUlvCNWLSA/2qWpRWArg06TrVB1UJ7WKIloBXp04Qg9NfWUnpzmz/k +bmgllyM/lN9sAuOMPBxC1klodSBI8ZKMjGPJksNlOV2h+m+VBLozvIxOA0g0FD3 FuE/z6dDJVxGf9zG/u4uPM3v2EEwiJ7xKhwzvVC0/c2mjHq6eesap7t9nJLx5Ic/ l9+rFltm12KuERWRHT2tX3CeiEp3kWjEpoCxR3C3fPuRRdKkye28gyC5ZcICuAQ= =FGl3 -----END PGP SIGNATURE----- --R9E4q8AtaaD01U5UxMRw376Oc0i13Rwt0--
From debbugs-submit-bounces@debbugs.gnu.org Thu Dec 04 17:29:58 2014 Received: (at 19240) by debbugs.gnu.org; 4 Dec 2014 22:29:58 +0000 Received: from localhost ([127.0.0.1]:54090 helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.80) (envelope-from ) id 1XweuM-0004P1-2H for submit@debbugs.gnu.org; Thu, 04 Dec 2014 17:29:58 -0500 Received: from
hub021-ca-4.exch021.serverdata.net ([64.78.22.171]:64835) by debbugs.gnu.org with esmtp (Exim 4.80) (envelope-from ) id 1XweuJ-0004Os-NN for 19240@debbugs.gnu.org; Thu, 04 Dec 2014 17:29:57 -0500 Received: from MBX021-W3-CA-2.exch021.domain.local ([10.254.4.78]) by HUB021-CA-4.exch021.domain.local ([10.254.4.39]) with mapi id 14.03.0174.001; Thu, 4 Dec 2014 14:29:54 -0800 From: John Kendall To: Eric Blake Subject: Re: bug#19240: cut 8.22 adds newline Thread-Topic: bug#19240: cut 8.22 adds newline Thread-Index: AQHQDWxAAovutUfLoU2mru2rZH4tFJx7fVGAgAAzxoCAABLRAIAADZiAgAAQj4CABF5FAIAADtqAgAAg0wCAAAmZgIAADBoAgAAJNAA= Date: Thu, 4 Dec 2014 22:29:53 +0000 Message-ID: References: <0192FC7D-5C87-48C6-A4AC-C695639E7C41@capps.com> <547C9FE1.2020702@redhat.com> <547CDB18.3010205@redhat.com> <547CE67F.1050608@draigBrady.com> <547CF463.5080903@redhat.com> <20141204174838.GA17532@hysteria.proulx.com> <5480C674.9070507@cs.ucla.edu> <08AA9BD2-7A0F-4B76-91B5-146968FF29C6@capps.com> <5480D8A8.2070707@redhat.com> In-Reply-To: <5480D8A8.2070707@redhat.com> Accept-Language: en-US Content-Language: en-US X-MS-Has-Attach: X-MS-TNEF-Correlator: x-originating-ip: [65.197.152.29] Content-Type: text/plain; charset="us-ascii" Content-ID: <2F101B4D8B73F24386FA43B51A5E4FE0@exch021.domain.local> Content-Transfer-Encoding: quoted-printable MIME-Version: 1.0 X-Spam-Score: -0.7 (/) X-Debbugs-Envelope-To: 19240 Cc: Paul Eggert , Bob Proulx , "19240@debbugs.gnu.org" <19240@debbugs.gnu.org> X-BeenThere: debbugs-submit@debbugs.gnu.org X-Mailman-Version: 2.1.15 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: debbugs-submit-bounces@debbugs.gnu.org Sender: "Debbugs-submit" X-Spam-Score: -0.7 (/)
On Dec 4, 2014, at 1:56 PM, Eric Blake wrote:
> On 12/04/2014 02:13 PM, John Kendall wrote:
>> Yes, that's what I've done. The corner case I mentioned is
>> handled badly by this, however.
In the corner case $FILE
>> is a list of files separated by newlines. Solaris cut would
>> list them and then the ============= would be tacked
>> on to the last line:
>
> Again, mention your goal up front, and you can save us some iterations.
> So you really DO want to grab a rectangular region of text, and append
> to just the last line, rather than chop a single line of input at a
> fixed length (it was not obvious to us from the naming or your example
> that you intended for $FILE to contain newlines).
>

My goal was to bring up the differences between Solaris cut and gnu cut and hear the justification. And I've learned a lot. I've been in the Solaris gated community for so long, imagine how much I have never had to think about!

But it was never my intention to have you solve the re-write for me. I only shared my code because Bob asked. But I really appreciate you solving it for me!

Thanks again to all of you.

> So, my solution of using command substitution still does this, and portably:
>
> echo "$(echo "$FILE ============================"| cut -c1-30)" \
>     " matches =========="
>
> So does sed, although no longer a short one-liner:
>
> echo "$FILE" | sed -e 's/^\(.\{30\}\).*/\1/' \
>     -e '$ {' \
>     -e 's/$/ ============================/' \
>     -e 's/^\(.\{30\}\).*/\1/' \
>     -e '$ s/$/ matches ==========/' \
>     -e '}'
>
> -- 
> Eric Blake eblake redhat com +1-919-301-3266
> Libvirt virtualization library http://libvirt.org
>
From debbugs-submit-bounces@debbugs.gnu.org Thu Dec 04 22:17:26 2014 Received: (at 19240) by debbugs.gnu.org; 5 Dec 2014 03:17:26 +0000 Received: from localhost ([127.0.0.1]:54204 helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.80) (envelope-from ) id 1XwjOX-0002sM-M7 for submit@debbugs.gnu.org; Thu, 04 Dec
2014 22:17:25 -0500 Received: from joseki.proulx.com ([216.17.153.58]:53955) by debbugs.gnu.org with esmtp (Exim 4.80) (envelope-from ) id 1XwjOU-0002sC-MR for 19240@debbugs.gnu.org; Thu, 04 Dec 2014 22:17:23 -0500 Received: from hysteria.proulx.com (hysteria.proulx.com [192.168.230.119]) by joseki.proulx.com (Postfix) with ESMTP id 2A59421236; Thu, 4 Dec 2014 20:17:19 -0700 (MST) Received: by hysteria.proulx.com (Postfix, from userid 1000) id EE0512DC35; Thu, 4 Dec 2014 20:17:18 -0700 (MST) Date: Thu, 4 Dec 2014 20:17:18 -0700 From: Bob Proulx To: John Kendall Subject: Re: bug#19240: cut 8.22 adds newline Message-ID: <20141204200949425889978@bob.proulx.com> Mail-Followup-To: John Kendall , 19240@debbugs.gnu.org References: <547CDB18.3010205@redhat.com> <547CE67F.1050608@draigBrady.com> <547CF463.5080903@redhat.com> <20141204174838.GA17532@hysteria.proulx.com> <5480C674.9070507@cs.ucla.edu> <08AA9BD2-7A0F-4B76-91B5-146968FF29C6@capps.com> <5480D8A8.2070707@redhat.com> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: User-Agent: Mutt/1.5.23 (2014-03-12) X-Spam-Score: -0.0 (/) X-Debbugs-Envelope-To: 19240 Cc: 19240@debbugs.gnu.org X-BeenThere: debbugs-submit@debbugs.gnu.org X-Mailman-Version: 2.1.15 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: debbugs-submit-bounces@debbugs.gnu.org Sender: "Debbugs-submit" X-Spam-Score: -0.0 (/) John Kendall wrote: > My goal was to bring up the differences between Solaris cut and gnu cut > and hear the justification. And I've learned a lot. I've been in the > Solaris gated community for so long, imagine how much I have never > had to think about! At one time I was exactly the same way after years of using HP-UX! :-) Well... Maybe not because there were always other machines in the mix too. > But it was never my intention to have you solve the re-write for me. I > only shared my code because Bob asked. 
But I really appreciate you
> solving it for me!
>
> Thanks again to all of you.

Thanks for sharing. As I said, I was curious as to the code issue that was problematic for portability. I already knew it wasn't portable or it wouldn't have been a squeaky wheel. So seeing something unportable was simply expected.

And I will speak for the group and say you are most welcome. We do this because if you set a tangled ball of string in front of us, we would untangle it. It is just our nature.

Bob

From unknown Fri Jun 20 07:25:58 2025 Received: (at fakecontrol) by fakecontrolmessage; To: internal_control@debbugs.gnu.org From: Debbugs Internal Request Subject: Internal Control Message-Id: bug archived. Date: Fri, 02 Jan 2015 12:24:04 +0000 User-Agent: Fakemail v42.6.9 # This is a fake control message. # # The action: # bug archived. thanks # This fakemail brought to you by your local debbugs # administrator
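
Editorial sketch, not part of the archived messages: the shorter sed form that Eric Blake posted in this thread can be tried out by wrapping it in a small function. The function name pad_and_tag is hypothetical, and using printf '%s' instead of echo to feed the pipeline is an assumption made here (it avoids relying on how a given echo treats backslashes or \c, which is where the thread started). The 30-column width, the 28-character padding and the " matches ==========" suffix are taken from the messages above. Because sed's "." matches characters rather than bytes in implementations that honor the locale, this form should not be subject to the byte-count caveat raised for '%.30s', though that has not been verified here for every sed.

```shell
#!/bin/sh
# pad_and_tag is a hypothetical name chosen for this sketch.
# It truncates every line of its argument to 30 characters and appends
# " matches ==========" to the last line only.
pad_and_tag() {
    # printf is an assumption of this sketch; the thread's examples use echo.
    printf '%s ============================\n' "$1" |
        sed -e 's/^\(.\{30\}\).*/\1/' \
            -e '$ s/$/ matches ==========/'
}

# A single name: padded with "=" up to column 30, then tagged.
pad_and_tag "a.c"

# Two names separated by a newline: the first line is passed through
# unchanged, and only the last line is padded and tagged.
pad_and_tag "a.c
b.c"
```

Lines longer than 30 characters are cut at column 30 on every line, which matches the "rectangular region" behavior described in the thread.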