GNU bug report logs - #19240
cut 8.22 adds newline

Reported by: John Kendall <john <at> capps.com>

Date: Mon, 1 Dec 2014 16:44:01 UTC

Severity: normal

Tags: notabug

Done: Eric Blake <eblake <at> redhat.com>

Bug is archived. No further changes may be made.

To add a comment to this bug, you must first unarchive it, by sending
a message to control AT debbugs.gnu.org, with unarchive 19240 in the body.
You can then email your comments to 19240 AT debbugs.gnu.org in the normal way.

Report forwarded to bug-coreutils <at> gnu.org:
bug#19240; Package coreutils. (Mon, 01 Dec 2014 16:44:01 GMT)

Acknowledgement sent to John Kendall <john <at> capps.com>:
New bug report received and forwarded. Copy sent to bug-coreutils <at> gnu.org. (Mon, 01 Dec 2014 16:44:02 GMT)

Message #5 received at submit <at> debbugs.gnu.org:

From: John Kendall <john <at> capps.com>
To: "bug-coreutils <at> gnu.org" <bug-coreutils <at> gnu.org>
Subject: cut 8.22 adds newline
Date: Mon, 1 Dec 2014 13:39:38 +0000

Hi,

I don't know if this is a bug, but I wonder if there is a consensus on correct behavior.
The solaris version of cut does not add a newline if there was no newline on the input:

Consider this printf command:

$ printf "1\n12\n123\n1234\n12345\n123456"
1
12
123
1234
12345
123456$

Note that the shell prompt appears after the 6 on the last line.


# Solaris cut
$ printf "1\n12\n123\n1234\n12345\n123456" | cut -c1-4
1
12
123
1234
1234
1234$

Note that the shell prompt appears after the 4 on the last line.

# GNU 8.22 cut
$ printf "1\n12\n123\n1234\n12345\n123456" | cut -c1-4
1
12
123
1234
1234
1234
$

Note that the shell prompt appears on its own line.

I came upon this while porting scripts from Solaris 10 to CentOS 7.

Interested to hear your thoughts.

Thanks and best regards,
John
---
John Kendall
System Administrator
CAI International

Added tag(s) notabug. Request was from Eric Blake <eblake <at> redhat.com> to control <at> debbugs.gnu.org. (Mon, 01 Dec 2014 17:06:02 GMT)

Reply sent to Eric Blake <eblake <at> redhat.com>:
You have taken responsibility. (Mon, 01 Dec 2014 17:06:02 GMT)

Notification sent to John Kendall <john <at> capps.com>:
bug acknowledged by developer. (Mon, 01 Dec 2014 17:06:03 GMT)

Message #12 received at 19240-done <at> debbugs.gnu.org:

From: Eric Blake <eblake <at> redhat.com>
To: John Kendall <john <at> capps.com>, 19240-done <at> debbugs.gnu.org
Subject: Re: bug#19240: cut 8.22 adds newline
Date: Mon, 01 Dec 2014 10:05:37 -0700

tag 19240 notabug
thanks

On 12/01/2014 06:39 AM, John Kendall wrote:
> Hi,
> 
> I don't know if this is a bug, but I wonder if there is a consensus on correct behavior.
> The solaris version of cut does not add a newline if there was no newline on the input:

Such an input is not a text file (the POSIX definition of text file
requires that if the file is not empty, it ends in newline); and POSIX
leaves the behavior of 'cut' unspecified if it is not operating on a
text file.

http://pubs.opengroup.org/onlinepubs/9699919799/basedefs/V1_chap03.html#tag_03_397

http://pubs.opengroup.org/onlinepubs/9699919799/utilities/cut.html

Therefore, it is unspecified whether cut will add or skip a trailing
newline.

> 
> I came upon this while porting scripts from Solaris 10 to Centos 7.

GNU chose to make cut behave similarly to sort, which IS required to add
a trailing newline even when the input lacks one (that is, POSIX goes
the extra mile and defines sort behavior on non-text files that are
non-text only because they lack a newline).  Solaris chose differently.
But the problem is that you are relying on unspecified behavior; fix
your input files to have a trailing newline, and then you won't have to
worry about it.
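
A minimal sketch of that advice, assuming a POSIX shell and an awk that
treats an unterminated final line as a complete record (the behaviour of
common awk implementations, not something POSIX guarantees for non-text
input).  The helper name ensure_trailing_newline is hypothetical:

```shell
# Hypothetical helper: pass stdin through, terminating the final line
# with a newline if the input lacked one.
ensure_trailing_newline() {
    # Assumption: awk reads an unterminated last line as a record,
    # and print always ends its output with a newline (ORS).
    awk '{ print }'
}

# Normalise the input first, so cut only ever sees newline-terminated lines.
printf '1\n12\n123\n1234\n12345\n123456' | ensure_trailing_newline | cut -c1-4
```

With the input normalised this way, cut is operating on a text file and
the unspecified case does not arise.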

At any rate, I see no reason to change GNU behavior, so I'm closing this
as not a bug.  Feel free to add further comments, though, including if
you have a stronger argument for why we should reopen the bug and change
behavior.

-- 
Eric Blake   eblake redhat com    +1-919-301-3266
Libvirt virtualization library http://libvirt.org

Information forwarded to bug-coreutils <at> gnu.org:
bug#19240; Package coreutils. (Mon, 01 Dec 2014 21:19:02 GMT)

Message #15 received at 19240 <at> debbugs.gnu.org:

From: Eric Blake <eblake <at> redhat.com>
To: John Kendall <john <at> capps.com>, 19240 <at> debbugs.gnu.org
Subject: Re: bug#19240: cut 8.22 adds newline
Date: Mon, 01 Dec 2014 14:18:16 -0700

[re-adding the bug, with permission]

On 12/01/2014 01:10 PM, John Kendall wrote:
> Thanks, Eric.
> 
> My only, admittedly weak, rebuttal is that the behavior of sort might not 
> be the best behavior to imitate.  It's understandable why POSIX defines 
> how sort behaves, since it's intended for multi-line input.
> 
> It seems sed, which is frequently used for single lines of input, might be 
> a better analogy.  Gnu sed 4.2.2 and solaris sed act the same way as 
> solaris cut (no newline added):
> 
> $ printf "ooooooooooo" | sed 's/o/p/g'
> ppppppppppp$

As a counter-argument, I recall hearing of other implementations of sed
that silently omit a trailing line that lacks a newline.  And perhaps
GNU sed should be changed to always emit a trailing newline, but that's
something to bring up on the sed mailing list :)

> 
> 
> If my weak rebuttal is unconvincing, then I wonder if a note could be 
> added to the cut man page so that the next porter can find an answer 
> a little easier.   As an interesting counterpoint, the Solaris version of
> sort announces loudly when it does what POSIX requires:
> 
> $ printf "ooooooooooo" | sort
> sort: missing NEWLINE added at end of input file STDIN
> ooooooooooo
> $

Ouch - that's a bug in Solaris.  POSIX does not allow for noise on
stderr when giving a default 0 success exit status.

> 
> 
> 
> Thanks for taking the time to clarify this.  I've been using SunOS and 
> Solaris exclusively since 1992, so I've had a stable environment and 
> was oblivious to the unspecified behavior that my scripts depended on.  
> 
> Cheers,
> John
> 

I'll leave it to other contributors to weigh in on whether omitting the
final newline on output when it was missing on input is worth the
complexity of a change.

-- 
Eric Blake   eblake redhat com    +1-919-301-3266
Libvirt virtualization library http://libvirt.org

Information forwarded to bug-coreutils <at> gnu.org:
bug#19240; Package coreutils. (Mon, 01 Dec 2014 22:07:02 GMT)

Message #18 received at 19240 <at> debbugs.gnu.org:

From: Pádraig Brady <P <at> draigBrady.com>
To: Eric Blake <eblake <at> redhat.com>, John Kendall <john <at> capps.com>, 
 19240 <at> debbugs.gnu.org
Subject: Re: bug#19240: cut 8.22 adds newline
Date: Mon, 01 Dec 2014 22:06:55 +0000

On 01/12/14 21:18, Eric Blake wrote:
> [re-adding the bug, with permission]
> 
> On 12/01/2014 01:10 PM, John Kendall wrote:
>> Thanks, Eric.
>>
>> My only, admittedly weak, rebuttal is that the behavior of sort might not 
>> be the best behavior to imitate.  It's understandable why POSIX defines 
>> how sort behaves, since it's intended for multi-line input.
>>
>> It seems sed, which is frequently used for single lines of input, might be 
>> a better analogy.  Gnu sed 4.2.2 and solaris sed act the same way as 
>> solaris cut (no newline added):
>>
>> $ printf "ooooooooooo" | sed 's/o/p/g'
>> ppppppppppp$
> 
> As a counter-argument, I recall hearing of other implementations of sed
> that silently omit a trailing line that lacks a newline.  And perhaps
> GNU sed should be changed to always emit a trailing newline, but that's
> something to bring up on the sed mailing list :)

I don't think so.
I agree that a newline should only be added where needed,
especially with a low level tool like sed.

sort can reorder the last item elsewhere in the output
and so needs to output the extra '\n'.

BTW the argument that it's not a text file is a bit beside the point
as POSIX also says text files can't contain NUL chars, but we process
this just fine:

  $ printf 'a\000b' | cut -c3
  b

>> If my weak rebuttal is unconvincing, then I wonder if a note could be 
>> added to the cut man page so that the next porter can find an answer 
>> a little easier.   As an interesting counterpoint, the Solaris version of
>> sort announces loudly when it does what POSIX requires:
>>
>> $ printf "ooooooooooo" | sort
>> sort: missing NEWLINE added at end of input file STDIN
>> ooooooooooo
>> $
> 
> Ouch - that's a bug in Solaris.  POSIX does not allow for noise on
> stderr when giving a default 0 success exit status.
> 
>>
>>
>>
>> Thanks for taking the time to clarify this.  I've been using SunOS and 
>> Solaris exclusively since 1992, so I've had a stable environment and 
>> was oblivious to the unspecified behavior that my scripts depended on.  
>>
>> Cheers,
>> John
>>
> 
> I'll leave it to other contributors to weigh in on whether omitting the
> final newline on output when it was missing on input is worth the
> complexity of a change.

Our current behaviour wrt newlines is "documented" at:
http://git.sv.gnu.org/gitweb/?p=coreutils.git;a=blob;f=tests/misc/cut.pl;h=04188621b#l132
though those tests were only added in v8.21

Note I see that solaris is inconsistent with -c and -f in this regard:

solaris> printf '1\n2' | cut -c1
1
2solaris>

solaris> printf '1\n2' | cut -f1
1
2
solaris>

I kid you not that FreeBSD does the opposite and outputs
the extra '\n' in the -c case but not with -f.

Also comparing other tools like uniq we have:

solaris> printf '1' | uniq
solaris> (nothing output!)

freebsd> printf '1' | uniq
1freebsd>

coreutl> printf '1' | uniq
1
coreutl>


If we were just implementing now, I'd not output the extra '\n',
but changing at this stage needs to be carefully considered,
and with all the textutils, not just cut(1).

thanks,
Pádraig.

Information forwarded to bug-coreutils <at> gnu.org:
bug#19240; Package coreutils. (Mon, 01 Dec 2014 22:26:01 GMT)

Message #21 received at 19240 <at> debbugs.gnu.org:

From: Paul Eggert <eggert <at> cs.ucla.edu>
To: Pádraig Brady <P <at> draigBrady.com>, 
 Eric Blake <eblake <at> redhat.com>,
 John Kendall <john <at> capps.com>, 19240 <at> debbugs.gnu.org
Subject: Re: bug#19240: cut 8.22 adds newline
Date: Mon, 01 Dec 2014 14:24:55 -0800

On 12/01/2014 02:06 PM, Pádraig Brady wrote:
> If we were just implementing now, I'd not output the extra '\n',

I have just the opposite kneejerk reaction; typically text-based apps 
are simpler and easier to document and use when they silently pretend 
that the input had a trailing newline.  That's what 'awk' and 'grep' do, 
for example, and they work fine.  There are some solid counterexamples
(e.g., Emacs, diff) but they have good reasons to be counterexamples.

> a newline should only be added where needed,
> especially with a low level tool like sed.

I'm afraid 'sed' is not that low-level, and GNU sed's current behavior 
is inconsistent.  Sometimes it silently appends a trailing newline to 
the input before processing it, and sometimes it doesn't:

$ printf x | sed '$a\
> y'
x
y
$ printf x | sed 's/$/y/'
xy$

> changing at this stage needs to be carefully considered

Yes, the use cases are key here.

Information forwarded to bug-coreutils <at> gnu.org:
bug#19240; Package coreutils. (Mon, 01 Dec 2014 23:07:02 GMT)

Message #24 received at 19240 <at> debbugs.gnu.org:

From: Eric Blake <eblake <at> redhat.com>
To: Pádraig Brady <P <at> draigbrady.com>,
 John Kendall <john <at> capps.com>, 19240 <at> debbugs.gnu.org
Subject: Re: bug#19240: cut 8.22 adds newline
Date: Mon, 01 Dec 2014 16:06:11 -0700

On 12/01/2014 03:06 PM, Pádraig Brady wrote:

> BTW the argument that it's not a text file is a bit beside the point
> as POSIX also says text files can't contain NUL chars, but we process
> this just fine:
> 
>   $ printf 'a\000b' | cut -c3
>   b

The fact that GNU offers an extension where we gracefully handle NUL
bytes is a bonus of GNU, and does not change the fact that POSIX already
says we are in unspecified territory and can do whatever we deem most
useful.  I suspect that in multibyte locales, with encoding errors that
do not form valid characters, it becomes harder to pin down which
behavior makes the most sense - but again, that is another aspect that
makes a file binary rather than text and therefore falls under
unspecified behavior.

> Also comparing other tools like uniq we have:
> 
> solaris> printf '1' | uniq
> solaris> (nothing output!)
> 
> freebsd> printf '1' | uniq
> 1freebsd>
> 
> coreutl> printf '1' | uniq
> 1
> coreutl>

What about:
printf '1\n1' | uniq

GNU treats the two lines as identical (and thus supplies the missing \n
on the second line); but I don't have ready access to test the other two
as I type this.

> If we were just implementing now, I'd not output the extra '\n',
> but changing at this stage needs to be carefully considered,
> and with all the textutils, not just cut(1).

I tend to go the opposite - producing text output, even on non-text
input, is more likely to be useful when piping files to other utilities
that don't handle non-text files as gracefully as the coreutils.  But I
definitely agree that it is not something we change lightly.

-- 
Eric Blake   eblake redhat com    +1-919-301-3266
Libvirt virtualization library http://libvirt.org

Information forwarded to bug-coreutils <at> gnu.org:
bug#19240; Package coreutils. (Mon, 01 Dec 2014 23:16:02 GMT)

Message #27 received at 19240 <at> debbugs.gnu.org:

From: Pádraig Brady <P <at> draigBrady.com>
To: Eric Blake <eblake <at> redhat.com>, John Kendall <john <at> capps.com>, 
 19240 <at> debbugs.gnu.org
Subject: Re: bug#19240: cut 8.22 adds newline
Date: Mon, 01 Dec 2014 23:15:28 +0000

On 01/12/14 23:06, Eric Blake wrote:
> On 12/01/2014 03:06 PM, Pádraig Brady wrote:
> 
>> BTW the argument that it's not a text file is a bit beside the point
>> as POSIX also says text files can't contain NUL chars, but we process
>> this just fine:
>>
>>   $ printf 'a\000b' | cut -c3
>>   b
> 
> The fact that GNU offers an extension where we gracefully handle NUL
> bytes is a bonus of GNU, and does not change the fact that POSIX already
> says we are in unspecified territory and can do whatever we deem most
> useful.  I suspect that in multibyte locales with non-character encoding
> errors, the behavior becomes harder to pinpoint on what makes the most
> sense - but again, that is another aspect that makes a file binary
> rather than text and therefore falls under unspecified behavior.
> 
> 
>> Also comparing other tools like uniq we have:
>>
>> solaris> printf '1' | uniq
>> solaris> (nothing output!)
>>
>> freebsd> printf '1' | uniq
>> 1freebsd>
>>
>> coreutl> printf '1' | uniq
>> 1
>> coreutl>
> 
> What about:
> printf '1\n1' | uniq

Both solaris and FreeBSD behave like GNU with that input.

> GNU treats the two lines as identical (and thus supplied a missing \n on
> the second line); but I don't have ready access to test the other two as
> I type this.
> 
>> If we were just implementing now, I'd not output the extra '\n',
>> but changing at this stage needs to be carefully considered,
>> and with all the textutils, not just cut(1).
> 
> I tend to go the opposite - producing text output, even on non-text
> input, is more likely to be useful when piping files to other utilities
> that don't handle non-text files as gracefully as the coreutils.  But I
> definitely agree that it is not something we change lightly.
> 

cheers,
Pádraig.

Information forwarded to bug-coreutils <at> gnu.org:
bug#19240; Package coreutils. (Thu, 04 Dec 2014 17:49:02 GMT)

Message #30 received at 19240 <at> debbugs.gnu.org:

From: Bob Proulx <bob <at> proulx.com>
To: John Kendall <john <at> capps.com>, 19240 <at> debbugs.gnu.org
Subject: Re: bug#19240: cut 8.22 adds newline
Date: Thu, 4 Dec 2014 10:48:38 -0700

Eric Blake wrote:
> I'll leave it to other contributors to weigh in on whether omitting
> the final newline on output when it was missing on input is worth
> the complexity of a change.

> Pádraig Brady wrote:
> > If we were just implementing now, I'd not output the extra '\n',
> > but changing at this stage needs to be carefully considered,
> > and with all the textutils, not just cut(1).
> 
> I tend to go the opposite - producing text output, even on non-text
> input, is more likely to be useful when piping files to other utilities
> that don't handle non-text files as gracefully as the coreutils.  But I
> definitely agree that it is not something we change lightly.

I have these thoughts and comments to make.

1. I don't "like" input file lines that don't have trailing newlines.
It raises the question of whether the input is actually valid input.
It feels to me like any line missing a newline is incomplete.  There
is likely to have been an error in the creation of it.  Handling it
silently feels like ignoring the error.  But raising an actual error
by exit code or by emitting a warning or error message feels too heavy
handed.  I would lean toward assuming that any incomplete input line
is actually terminated by a newline as the lesser of the evils.

2. The suggestion for handling *fields* that do not end with a
trailing newline differently from those that do doesn't make any sense
to me at all.  What is a field?  Is the newline part of the field?  I
think not.  Consider this.

  $ printf "one two" | awk '{print$1}'
  one

  $ printf "one two" | awk '{print$2}'
  two

  $ printf "one two\n" | awk '{print$1}'
  one

  $ printf "one two\n" | awk '{print$2}'
  two

The newline is not part of field two.  If it were, printing field two
of the newline-terminated input would output two newlines.

  $ printf "one two" | cut -d' ' -f1
  one

  $ printf "one two" | cut -d' ' -f2
  two

  $ printf "one two\n" | cut -d' ' -f1
  one

  $ printf "one two\n" | cut -d' ' -f2
  two

Same thing for cut.  The newline is not part of any of the fields.
The newline terminates the input line.  The newline is not associated
with any of the delimited fields contained in an input line.

For byte or character operations in the utils such as head -c those
are binary operations and should be interpreted strictly according to
the bytes.  But not for cut -c which is column based.

John Kendall wrote:
> # Solaris cut
> $ printf "1\n12\n123\n1234\n12345\n123456" | cut -c1-4
> 1
> 12
> 123
> 1234
> 1234
> 1234$

That is tickling non-portable behavior.  I had a friend run some tests
on HP-UX and IBM AIX and the results there were different from
Solaris.  Seems Solaris is already the unusual case.

When looking, count the "1234" lines carefully, because HP-UX and
older AIX don't process the line without a trailing newline at all.
It is omitted there.  Newer AIX appears to handle it like GNU.

  # uname -srm
  HP-UX B.10.20 9000/785
  # printf "1\n12\n123\n1234\n12345\n123456" | cut -c1-4
  1
  12
  123
  1234
  1234
  #

  # uname -srm
  HP-UX B.11.31 ia64
  # printf "1\n12\n123\n1234\n12345\n123456" | cut -c1-4
  1
  12
  123
  1234
  1234
  #

  # uname -s ; oslevel
  AIX
  4.3.3.0
  # printf "1\n12\n123\n1234\n12345\n123456" | cut -c1-4
  1
  12
  123
  1234
  1234
  #

  # uname -s ; oslevel
  AIX
  7.1.0.0
  # printf "1\n12\n123\n1234\n12345\n123456" | cut -c1-4
  1
  12
  123
  1234
  1234
  1234
  #

  # head -1 /etc/motd ; uname -m
  Compaq Tru64 UNIX V5.0A
  alpha
  # printf "1\n12\n123\n1234\n12345\n123456" | cut -c1-4
  1
  12
  123
  1234
  1234
  #

  # uname -s
  Darwin
  # printf "1\n12\n123\n1234\n12345\n123456" | cut -c1-4
  1
  12
  123
  1234
  1234
  1234
  #

Using input lines without a trailing newline is already a minefield of
portability problems.  It depends upon details of the implementation.

I think what Solaris cut must be doing is processing the emission of
characters across the line character by character.  When it hits the
input newline it knows it is done and emits a newline itself and
starts again on a new line.  When it hits EOF on the input it probably
just stops doing anything and exits itself without printing anything
more and therefore not emitting a newline.  Likely just an accident of
implementation.

This is what makes "lines" without a newline such an unportable thing
to count upon.  It makes the result depend upon an implementation
detail.  Different implementations might do different things, and in
fact they do.  This probably isn't too widespread an issue or it would
have come up more often.  And, more specific to porting the Solaris
code, there would be similar but different problems when moving to
other legacy Unix platforms.  Best to avoid the construct entirely for
robust operation.
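
A small probe along these lines can show which behaviour a given cut
has before a script depends on it.  This is only a sketch, assuming a
POSIX shell with tail -c, od and tr available; the helper name
ends_with_newline is hypothetical:

```shell
# Hypothetical helper: succeed only if stdin is non-empty and its
# last byte is a newline (hex 0a).
ends_with_newline() {
    # tail -c 1 keeps the final byte; od prints it in hex;
    # tr strips the whitespace that od adds around the value.
    last_byte=$(tail -c 1 | od -An -tx1 | tr -d '[:space:]')
    [ "$last_byte" = "0a" ]
}

# Feed cut an unterminated final line and inspect what comes out.
if printf '1\n2' | cut -c1 | ends_with_newline; then
    echo "this cut terminates the final line"
else
    echo "this cut leaves the final line unterminated (or drops it)"
fi
```

The same helper can be pointed at a data file to check whether it is
safe to hand to the text utilities at all.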

> I came upon this while porting scripts from Solaris 10 to Centos 7.

Can you share with us the specific construct that caused this to
arise?  I have done a lot of script porting to and from HP-UX systems
and am curious as to the issue.

Bob

Information forwarded to bug-coreutils <at> gnu.org:
bug#19240; Package coreutils. (Thu, 04 Dec 2014 18:42:01 GMT)

Message #33 received at 19240 <at> debbugs.gnu.org:

From: John Kendall <john <at> capps.com>
To: "19240 <at> debbugs.gnu.org" <19240 <at> debbugs.gnu.org>
Cc: Bob Proulx <bob <at> proulx.com>
Subject: Re: bug#19240: cut 8.22 adds newline
Date: Thu, 4 Dec 2014 18:41:48 +0000

Bob Proulx wrote:
> Eric Blake wrote:
>> I'll leave it to other contributors to weigh in on whether omitting
>> the final newline on output when it was missing on input is worth
>> the complexity of a change.
> 
>> Pádraig Brady wrote:
>>> If we were just implementing now, I'd not output the extra '\n',
>>> but changing at this stage needs to be carefully considered,
>>> and with all the textutils, not just cut(1).
>> 
>> I tend to go the opposite - producing text output, even on non-text
>> input, is more likely to be useful when piping files to other utilities
>> that don't handle non-text files as gracefully as the coreutils.  But I
>> definitely agree that it is not something we change lightly.
> 
> I have these thoughts and comments to make.
> 
> 1. I don't "like" input file lines that don't have trailing newlines.
> It raises the question of whether the input is actually valid input.
> It feels to me like any line missing a newline is incomplete.  There
> is likely to have been an error in the creation of it.  Handling it
> silently feels like ignoring the error.  But raising an actual error
> by exit code or by emitting a warning or error message feels too heavy
> handed.  I would lean toward assuming that any incomplete input line
> is actually terminated by a newline as the lesser of the evils.
> 
> 2. The suggestion for handling *fields* that do not end with a
> trailing newline differently from those that do doesn't make any sense
> to me at all.  What is a field?  Is the newline part of the field?  I
> think not.  Consider this.
> 
>  $ printf "one two" | awk '{print$1}'
>  one
> 
>  $ printf "one two" | awk '{print$2}'
>  two
> 
>  $ printf "one two\n" | awk '{print$1}'
>  one
> 
>  $ printf "one two\n" | awk '{print$2}'
>  two
> 
> The newline is not part of field two.  Otherwise printing it would
> result in the second having two newlines output.
> 
>  $ printf "one two" | cut -d' ' -f1
>  one
> 
>  $ printf "one two" | cut -d' ' -f2
>  two
> 
>  $ printf "one two\n" | cut -d' ' -f1
>  one
> 
>  $ printf "one two\n" | cut -d' ' -f2
>  two
> 
> Same thing for cut.  The newline is not part of any of the fields.
> The newline terminates the input line.  The newline is not associated
> with any of the delimited fields contained in an input line.
> 
> For byte or character operations in the utils such as head -c those
> are binary operations and should be interpreted strictly according to
> the bytes.  But not for cut -c which is column based.
> 
> John Kendall wrote:
>> # Solaris cut
>> $ printf "1\n12\n123\n1234\n12345\n123456" | cut -c1-4
>> 1
>> 12
>> 123
>> 1234
>> 1234
>> 1234$
> 
> That is tickling non-portable behavior.  I had a friend run some tests
> on HP-UX and IBM AIX and the results there were different from
> Solaris.  Seems Solaris is already the unusual case.
> 
> When looking count the "1234" lines carefully.  Because HP-UX and
> older AIX don't process the line without a trailing newline at all.
> It is omitted there.  Newer AIX appears to handle it like GNU.
> 
>  # uname -srm
>  HP-UX B.10.20 9000/785
>  # printf "1\n12\n123\n1234\n12345\n123456" | cut -c1-4
>  1
>  12
>  123
>  1234
>  1234
>  #
> 
>  # uname -srm
>  HP-UX B.11.31 ia64
>  # printf "1\n12\n123\n1234\n12345\n123456" | cut -c1-4
>  1
>  12
>  123
>  1234
>  1234
>  #
> 
>  # uname -s ; oslevel
>  AIX
>  4.3.3.0
>  # printf "1\n12\n123\n1234\n12345\n123456" | cut -c1-4
>  1
>  12
>  123
>  1234
>  1234
>  #
> 
>  # uname -s ; oslevel
>  AIX
>  7.1.0.0
>  # printf "1\n12\n123\n1234\n12345\n123456" | cut -c1-4
>  1
>  12
>  123
>  1234
>  1234
>  1234
>  #
> 
>  # head -1 /etc/motd ; uname -m
>  Compaq Tru64 UNIX V5.0A
>  alpha
>  # printf "1\n12\n123\n1234\n12345\n123456" | cut -c1-4
>  1
>  12
>  123
>  1234
>  1234
>  #
> 
>  # uname -s
>  Darwin
>  # printf "1\n12\n123\n1234\n12345\n123456" | cut -c1-4
>  1
>  12
>  123
>  1234
>  1234
>  1234
>  #
> 
> Using input lines without a trailing newline is already a minefield of
> portability problems.  It depends upon details of the implementation.
> 
> I think what Solaris cut must be doing is processing the emission of
> characters across the line character by character.  When it hits the
> input newline it knows it is done and emits a newline itself and
> starts again on a new line.  When it hits EOF on the input it probably
> just stops doing anything and exits itself without printing anything
> more and therefore not emitting a newline.  Likely just an accident of
> implementation.
> 
> This is what makes "lines" without a newline such an unportable thing
> to count upon.  It causes it to depend upon an implementation detail.
> Different implementation might do different things.  And in fact
> different ones do actually do different things.  This probably isn't
> too widespread of an issue or it would have come up more often.  And
> more specific to the Solaris code port there would be similar problems
> differently if trying to use other legacy Unix platforms.  Best to
> avoid the construct entirely for robust operation.
> 
>> I came upon this while porting scripts from Solaris 10 to Centos 7.
> 
> Can you share with us the specific construct that caused this to
> arise?  I have done a lot of script porting to and from HP-UX systems
> and am curious as to the issue.
> 

The construct in question is just for formatting the output
of a script that compares disc files to what's in a database.

 echo "$FILE ===========================\c"| cut -c1-30
 echo " matches =========="


The output on Solaris might look something like this (with 
monospaced font on a terminal all the "matches" line up):

getDFL_info ================== matches ==========
transWestim_msg ============== matches ==========
selfBillDepotStoHan ========== matches ==========
addSale_invoice ============== matches ==========
buildInvoice ================= matches ==========
addInvoice =================== matches ==========
chgUnit ====================== matches ==========
updSale_invoice ============== matches ==========

The gnu output is:

getDFL_info ==================
 matches ==========
transWestim_msg ==============
 matches ==========
selfBillDepotStoHan ==========
 matches ==========
addSale_invoice ==============
 matches ==========
buildInvoice =================
 matches ==========
addInvoice ===================
 matches ==========
chgUnit ======================
 matches ==========
updSale_invoice ==============
 matches ==========

This can be re-written, of course.  (There is one corner case that 
Solaris's cut handled nicely that I have not been able to come up 
with a quick fix.) 
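
One possible rewrite that keeps cut, sketched under the assumption of a
POSIX shell: feed cut a newline-terminated line, so its behaviour is
specified, and let command substitution strip the trailing newline from
the result.  The function name fixed_width_prefix is hypothetical.

```shell
# Hypothetical helper: print the first 30 characters of "NAME ====...",
# without a trailing newline.
fixed_width_prefix() {
    # The input line is newline-terminated, so cut sees a text file.
    # $(...) removes the trailing newline from cut's output.
    prefix=$(printf '%s\n' "$1 ===========================" | cut -c1-30)
    printf '%s' "$prefix"
}

FILE=getDFL_info   # example value
fixed_width_prefix "$FILE"
echo " matches =========="
```

This does not rely on either the Solaris or the GNU treatment of an
unterminated last line.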

John

> Bob

Information forwarded to bug-coreutils <at> gnu.org:
bug#19240; Package coreutils. (Thu, 04 Dec 2014 20:25:02 GMT)

Message #36 received at 19240 <at> debbugs.gnu.org:

From: Bob Proulx <bob <at> proulx.com>
To: John Kendall <john <at> capps.com>, 19240 <at> debbugs.gnu.org
Subject: Re: bug#19240: cut 8.22 adds newline
Date: Thu, 4 Dec 2014 13:24:36 -0700

Additional interesting cases from my friend Ken.

  HP-UX 11.31 (and earlier)
  # printf "one two" | cut -d' ' -f2
  #

  AIX 4.3 (and presumably also earlier releases)
  # printf "one two" | cut -d' ' -f2
  #

  Tru64 V5.0A
  # printf "one two" | cut -d' ' -f2
  #

  AIX 5.2 (and later releases)
  # printf "one two" | cut -d' ' -f2
  two
  #

  Solaris 11 (and earlier)
  # printf "one two" | cut -d' ' -f2
  two
  # printf "one two" | cut -c5-7
  two#

Information forwarded to bug-coreutils <at> gnu.org:
bug#19240; Package coreutils. (Thu, 04 Dec 2014 20:40:02 GMT)

Message #39 received at 19240 <at> debbugs.gnu.org:

From: Paul Eggert <eggert <at> cs.ucla.edu>
To: John Kendall <john <at> capps.com>, 
 "19240 <at> debbugs.gnu.org" <19240 <at> debbugs.gnu.org>
Cc: Bob Proulx <bob <at> proulx.com>
Subject: Re: bug#19240: cut 8.22 adds newline
Date: Thu, 04 Dec 2014 12:39:16 -0800

On 12/04/2014 10:41 AM, John Kendall wrote:
> echo "$FILE ===========================\c"| cut -c1-30

Since you're going to have to rewrite it anyway if you want it to be 
portable, I suggest doing it this way:

printf '%.30s' "$FILE ==========================="

as it's a lot more efficient anyway.

Information forwarded to bug-coreutils <at> gnu.org:
bug#19240; Package coreutils. (Thu, 04 Dec 2014 21:08:01 GMT)

Message #42 received at 19240 <at> debbugs.gnu.org:

From: Bob Proulx <bob <at> proulx.com>
To: John Kendall <john <at> capps.com>, 19240 <at> debbugs.gnu.org
Subject: Re: bug#19240: cut 8.22 adds newline
Date: Thu, 4 Dec 2014 14:06:57 -0700

John Kendall wrote:
> Bob Proulx wrote:
> >> I came upon this while porting scripts from Solaris 10 to Centos 7.
> > 
> > Can you share with us the specific construct that caused this to
> > arise?  I have done a lot of script porting to and from HP-UX systems
> > and am curious as to the issue.
> 
> The construct in question is just for formatting the output
> of a script that compares disc files to what's in a database.  
> 
>  echo "$FILE ===========================\c"| cut -c1-30
>  echo " matches =========="

Eww...  Immediately I have a second immune reaction to the above.  The
reason is that the use of echo above is non-portable.  It uses the old
System V echo interface that interprets escape sequences by default.
This can be enabled in bash at build time with the
--enable-usg-echo-default configure option, but it is off by default
because BSD doesn't support it by default.

The solution to this problem has been to recommend using 'printf'
anywhere that an escape sequence is needed or that a trailing newline
must be suppressed, since printf is POSIX standard and avoids the echo
unportability.  Use of echo can be very unportable and the "\c" is one
of those unportable things.

> The output on Solaris might look something like this (with 
> monospaced font on a terminal all the "matches" line up):
> ...

Cool.

> This can be re-written, of course.  (There is one corner case that 
> Solaris's cut handled nicely that I have not been able to come up 
> with a quick fix.) 

Immediately printf comes to mind.  Use %s with a precision
specifier.  Since printf is POSIX standard this should work anywhere.
The failure mode of not having printf available on really, really,
really old systems is trivially handled by providing a printf for that
system.  Much easier than dealing with other differences.

  printf "%.30s matches ==========\n" "$FILE ==========================="

One thing I still don't like about the above is that it will truncate
any long file names.  Any file name longer than 30 will be truncated.
But of course that would require changes in output format to address.
My preference would be to have "matches" first and the file name
second and let the file name go as long as it needs to go.
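
A sketch of that alternative layout, with a made-up helper name and tag text.  Because nothing follows the file name, no truncation or padding is needed:

```shell
# Hypothetical format: fixed tag first, file name last and unbounded.
report_match() {
    printf 'matches ========== %s\n' "$1"
}

report_match "a/very/long/path/that/would/not/fit/in/thirty/columns.txt"
```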

Bob

Information forwarded to bug-coreutils <at> gnu.org:
bug#19240; Package coreutils. (Thu, 04 Dec 2014 21:14:01 GMT) Full text and rfc822 format available.

Message #45 received at 19240 <at> debbugs.gnu.org (full text, mbox):

From: John Kendall <john <at> capps.com>
To: Paul Eggert <eggert <at> cs.ucla.edu>
Cc: Bob Proulx <bob <at> proulx.com>,
 "19240 <at> debbugs.gnu.org" <19240 <at> debbugs.gnu.org>
Subject: Re: bug#19240: cut 8.22 adds newline
Date: Thu, 4 Dec 2014 21:13:37 +0000

Paul Eggert wrote:

> On 12/04/2014 10:41 AM, John Kendall wrote:
>> echo "$FILE ===========================\c"| cut -c1-30
> 
> Since you're going to have to rewrite it anyway if you want it to be portable, I suggest doing it this way:
> 
> printf '%.30s' "$FILE ==========================="
> 
> as it's a lot more efficient anyway.

Yes, that's what I've done.  The corner case I mentioned is 
handled badly by this, however.  In the corner case $FILE 
is a list of files separated by newlines.  Solaris cut would 
list them and then the ============= would be tacked 
on to the last line:

filename1
filename2
filename3
filename4
filename5 ========================= matches

When printf is used, it truncates the list of filenames if the sum 
of them exceeds 30 chars in length.  The format string %.30s 
doesn't treat embedded newlines specially:

filename1
filename2
filename3 ========================= matches

filenames start getting lopped off.
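
The effect is easy to reproduce.  In this sketch (the sample names and the helper truncate30 are illustrative), each name plus its newline is 10 bytes, so the 30-byte limit is used up after the third name, and neither the remaining names nor any '=' padding is printed:

```shell
FILE='filename1
filename2
filename3
filename4
filename5'

# Hypothetical helper wrapping the printf form under discussion.
truncate30() {
    printf '%.30s' "$1 ============================"
}

# The limit applies to the whole string, newlines included.
truncate30 "$FILE"
printf ' matches\n'
```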

I'll rework the code.  It worked for 15 years, don't be too
offended by it.   :)

Information forwarded to bug-coreutils <at> gnu.org:
bug#19240; Package coreutils. (Thu, 04 Dec 2014 21:37:01 GMT) Full text and rfc822 format available.

Message #48 received at 19240 <at> debbugs.gnu.org (full text, mbox):

From: Eric Blake <eblake <at> redhat.com>
To: John Kendall <john <at> capps.com>,
 "19240 <at> debbugs.gnu.org" <19240 <at> debbugs.gnu.org>
Cc: Bob Proulx <bob <at> proulx.com>
Subject: Re: bug#19240: cut 8.22 adds newline
Date: Thu, 04 Dec 2014 14:36:40 -0700

On 12/04/2014 11:41 AM, John Kendall wrote:

> 
> The construct in question is just for formatting the output
> of a script that compares disc files to what's in a database.  
> 
>  echo "$FILE ===========================\c"| cut -c1-30
>  echo " matches =========="

echo '\c' is non-portable.  POSIX says that echo cannot portably be used on
any string that uses a backslash (some implementations, like Solaris,
interpret that backslash; others print it literally).  Use 'printf'
instead (in particular, 'printf "%b" "...\c"' does the same as your use
of 'echo "...\c"'; that is, 'printf %b' should be a drop-in replacement
for echoes that know backslash escapes.  On the other hand, if all you
are using is \c to end a line, that's the same as 'printf %s ...' with
no \c)
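
A small sketch of the difference between the two conversions, using only standard printf behavior: '%b' expands backslash escapes in its argument and stops all output at \c, while '%s' copies the argument untouched and never adds a newline.

```shell
# %b: \n becomes a newline, and \c ends the output (the text after it is dropped).
printf '%b' 'one\ntwo\cdropped'
printf '\n'

# %s: backslashes are copied literally; no trailing newline is added.
printf '%s' 'three\c'
printf '\n'
```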

Also, your example doesn't work for 1-byte $FILE (you don't have enough
=== in the line, and need at least 1 more, which I stick in my responses
below).

Now, for your particular use case:

You can use command substitution to strip trailing newlines, although
that is not portable to builds of cut that skip output if the last line
doesn't have a newline to begin with:

echo "$(printf %b "$FILE ============================\c"| cut -c1-30)" \
     " matches =========="

but you can guarantee a newline ending, and get rid of the non-portable
\ to echo, all for a shorter line that portably works:

echo "$(echo "$FILE ============================"| cut -c1-30)" \
     " matches =========="

'head' can do what you are using cut for:

 echo "$FILE ============================" | head -c30
 echo " matches =========="

And if you are using bash, you can even do it without forking:

 line="$FILE ============================"
 line=${line::30}
 echo "$line matches =========="

There's probably lots of other one-liner solutions that don't require
particular behavior of 'cut'.
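
To illustrate the command-substitution form: cut ends its output with a newline, and $(...) strips trailing newlines, so the trailer can be appended on the same line.  The sketch below wraps this in a hypothetical function, label30, and uses printf for the outer step so that the spacing does not depend on how echo joins its arguments:

```shell
# Hypothetical wrapper around the portable form quoted above.
label30() {
    # printf '%s\n' guarantees cut sees a newline-terminated line;
    # the command substitution then strips the newline cut writes.
    prefix=$(printf '%s\n' "$1 ============================" | cut -c1-30)
    printf '%s matches ==========\n' "$prefix"
}

label30 "report.txt"
```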

> This can be re-written, of course.  (There is one corner case that 
> Solaris's cut handled nicely that I have not been able to come up 
> with a quick fix.) 

Hope my quick fix ideas help you.  Feel free to keep asking questions,
although you are now moving the topic a bit more into the realm of shell
programming than coreutils usage.  And remember, it always helps to ask
questions related to your end goal, rather than your attempted solution:

https://meta.stackexchange.com/questions/66377/what-is-the-xy-problem
http://www.perlmonks.org/?node=XY+Problem

-- 
Eric Blake   eblake redhat com    +1-919-301-3266
Libvirt virtualization library http://libvirt.org

Information forwarded to bug-coreutils <at> gnu.org:
bug#19240; Package coreutils. (Thu, 04 Dec 2014 21:49:01 GMT) Full text and rfc822 format available.

Message #51 received at 19240 <at> debbugs.gnu.org (full text, mbox):

From: Eric Blake <eblake <at> redhat.com>
To: Paul Eggert <eggert <at> cs.ucla.edu>, John Kendall <john <at> capps.com>,
 "19240 <at> debbugs.gnu.org" <19240 <at> debbugs.gnu.org>
Cc: Bob Proulx <bob <at> proulx.com>
Subject: Re: bug#19240: cut 8.22 adds newline
Date: Thu, 04 Dec 2014 14:48:17 -0700

On 12/04/2014 01:39 PM, Paul Eggert wrote:
> On 12/04/2014 10:41 AM, John Kendall wrote:
>> echo "$FILE ===========================\c"| cut -c1-30
> 
> Since you're going to have to rewrite it anyway if you want it to be
> portable, I suggest doing it this way:
> 
> printf '%.30s' "$FILE ==========================="
> 
> as it's a lot more efficient anyway.

Be careful; the POSIX specification of '%.30s' does NOT work well with
multibyte characters; it is specified as:

http://pubs.opengroup.org/onlinepubs/9699919799/basedefs/V1_chap05.html#tag_05
"precision
    Gives the minimum number of digits to appear for the d, o, i, u, x,
or X conversion specifiers (the field is padded with leading zeros), the
number of digits to appear after the radix character for the e and f
conversion specifiers, the maximum number of significant digits for the
g conversion specifier; or the maximum number of bytes to be written
from a string in the s conversion specifier. The precision shall take
the form of a <period> ( '.' ) followed by a decimal digit string; a
null digit string is treated as zero."

which means that it CAN and WILL corrupt output if the number of bytes
written falls in the middle of a multi-byte character.
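
A sketch of the failure, assuming UTF-8 data: the sample string is built from octal escapes so that its bytes are explicit, and the C locale is forced so that the precision is certain to count bytes.

```shell
LC_ALL=C
export LC_ALL

# "café" in UTF-8 is 5 bytes: c a f \303 \251 (the last two encode one character).
name=$(printf 'caf\303\251')

# A precision of 4 keeps "caf" plus only the first byte of the two-byte character.
cut4=$(printf '%.4s' "$name")

printf '%s' "$name" | wc -c    # byte count of the full string
printf '%s' "$cut4" | wc -c    # byte count after truncation
```

The truncated value ends in a lone lead byte, which is not valid UTF-8 on its own.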

-- 
Eric Blake   eblake redhat com    +1-919-301-3266
Libvirt virtualization library http://libvirt.org

Information forwarded to bug-coreutils <at> gnu.org:
bug#19240; Package coreutils. (Thu, 04 Dec 2014 21:56:02 GMT) Full text and rfc822 format available.

Message #54 received at 19240 <at> debbugs.gnu.org (full text, mbox):

From: Bob Proulx <bob <at> proulx.com>
To: John Kendall <john <at> capps.com>, 19240 <at> debbugs.gnu.org
Subject: Re: bug#19240: cut 8.22 adds newline
Date: Thu, 4 Dec 2014 14:55:03 -0700

Eric Blake wrote:
> Be careful; the POSIX specification of '%.30s' does NOT work well with
> multibyte characters; it is specified as:
> ...
> which means that it CAN and WILL corrupt output if the number of bytes
> written falls in the middle of a multi-byte character.

Good point.  Which leads me back to thinking that printing a tag first
and then the filename second and letting it be as long as it needs to
be without truncation is the best solution.

But of course in the original application coming from a legacy
environment the file names would never be multibyte.

Bob

Information forwarded to bug-coreutils <at> gnu.org:
bug#19240; Package coreutils. (Thu, 04 Dec 2014 21:57:02 GMT) Full text and rfc822 format available.

Message #57 received at 19240 <at> debbugs.gnu.org (full text, mbox):

From: Eric Blake <eblake <at> redhat.com>
To: John Kendall <john <at> capps.com>, Paul Eggert <eggert <at> cs.ucla.edu>
Cc: Bob Proulx <bob <at> proulx.com>,
 "19240 <at> debbugs.gnu.org" <19240 <at> debbugs.gnu.org>
Subject: Re: bug#19240: cut 8.22 adds newline
Date: Thu, 04 Dec 2014 14:56:56 -0700

On 12/04/2014 02:13 PM, John Kendall wrote:
> Yes, that's what I've done.  The corner case I mentioned is 
> handled badly by this, however.  In the corner case $FILE 
> is a list of files separated by newlines.  Solaris cut would 
> list them and then the ============= would be tacked 
> on to the last line:

Again, mention your goal up front, and you can save us some iterations.
 So you really DO want to grab a rectangular region of text, and append
to just the last line, rather than chop a single line of input at a
fixed length (it was not obvious to us from the naming or your example
that you intended for $FILE to contain newlines).

So, my solution of using command substitution still does this, and portably:

 echo "$(echo "$FILE ============================"| cut -c1-30)" \
      " matches =========="

So does sed, although no longer a short one-liner:

echo "$FILE" | sed -e 's/^\(.\{30\}\).*/\1/' \
                   -e '$ {' \
                   -e   's/$/ ============================/' \
                   -e   's/^\(.\{30\}\).*/\1/' \
                   -e   '$ s/$/ matches ==========/' \
                   -e '}'

-- 
Eric Blake   eblake redhat com    +1-919-301-3266
Libvirt virtualization library http://libvirt.org

Information forwarded to bug-coreutils <at> gnu.org:
bug#19240; Package coreutils. (Thu, 04 Dec 2014 22:02:02 GMT) Full text and rfc822 format available.

Message #60 received at 19240 <at> debbugs.gnu.org (full text, mbox):

From: Eric Blake <eblake <at> redhat.com>
To: John Kendall <john <at> capps.com>, Paul Eggert <eggert <at> cs.ucla.edu>
Cc: Bob Proulx <bob <at> proulx.com>,
 "19240 <at> debbugs.gnu.org" <19240 <at> debbugs.gnu.org>
Subject: Re: bug#19240: cut 8.22 adds newline
Date: Thu, 04 Dec 2014 15:01:23 -0700

On 12/04/2014 02:56 PM, Eric Blake wrote:
> So does sed, although no longer a short one-liner:
> 
> echo "$FILE" | sed -e 's/^\(.\{30\}\).*/\1/' \
>                    -e '$ {' \
>                    -e   's/$/ ============================/' \
>                    -e   's/^\(.\{30\}\).*/\1/' \
>                    -e   '$ s/$/ matches ==========/' \
>                    -e '}'

Or:

 echo "$FILE ============================" | \
  sed -e 's/^\(.\{30\}\).*/\1/' \
      -e '$ s/$/ matches ==========/'
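
Run against the multi-line case from earlier in the thread, this shorter sed form can be checked as follows (a sketch; the sample names and the function name clip_and_tag are illustrative, and printf stands in for echo so that the input does not depend on echo's escape handling):

```shell
FILE='filename1
filename2
filename3'

# Hypothetical wrapper: clip every line to 30 characters, then append
# the trailer to the last line only.
clip_and_tag() {
    printf '%s\n' "$1 ============================" |
      sed -e 's/^\(.\{30\}\).*/\1/' \
          -e '$ s/$/ matches ==========/'
}

clip_and_tag "$FILE"
```

Lines shorter than 30 characters pass through unchanged, so only the padded last line is actually clipped.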

-- 
Eric Blake   eblake redhat com    +1-919-301-3266
Libvirt virtualization library http://libvirt.org

Information forwarded to bug-coreutils <at> gnu.org:
bug#19240; Package coreutils. (Thu, 04 Dec 2014 22:30:03 GMT) Full text and rfc822 format available.

Message #63 received at 19240 <at> debbugs.gnu.org (full text, mbox):

From: John Kendall <john <at> capps.com>
To: Eric Blake <eblake <at> redhat.com>
Cc: Paul Eggert <eggert <at> cs.ucla.edu>, Bob Proulx <bob <at> proulx.com>,
 "19240 <at> debbugs.gnu.org" <19240 <at> debbugs.gnu.org>
Subject: Re: bug#19240: cut 8.22 adds newline
Date: Thu, 4 Dec 2014 22:29:53 +0000

On Dec 4, 2014, at 1:56 PM, Eric Blake <eblake <at> redhat.com>
 wrote:

> On 12/04/2014 02:13 PM, John Kendall wrote:
>> Yes, that's what I've done.  The corner case I mentioned is 
>> handled badly by this, however.  In the corner case $FILE 
>> is a list of files separated by newlines.  Solaris cut would 
>> list them and then the ============= would be tacked 
>> on to the last line:
> 
> Again, mention your goal up front, and you can save us some iterations.
> So you really DO want to grab a rectangular region of text, and append
> to just the last line, rather than chop a single line of input at a
> fixed length (it was not obvious to us from the naming or your example
> that you intended for $FILE to contain newlines).
> 

My goal was to bring up the differences between Solaris cut and gnu cut 
and hear the justification.  And I've learned a lot.  I've been in the
Solaris gated community for so long, imagine how much I have never
had to think about!


But it was never my intention to have you solve the re-write for me.  I 
only shared my code because Bob asked.  But I really appreciate you 
solving it for me!

Thanks again to all of you.



> So, my solution of using command substitution still does this, and portably:
> 
> echo "$(echo "$FILE ============================"| cut -c1-30)" \
>      " matches =========="
> 
> So does sed, although no longer a short one-liner:
> 
> echo "$FILE" | sed -e 's/^\(.\{30\}\).*/\1/' \
>                   -e '$ {' \
>                   -e   's/$/ ============================/' \
>                   -e   's/^\(.\{30\}\).*/\1/' \
>                   -e   '$ s/$/ matches ==========/' \
>                   -e '}'
> 
> -- 
> Eric Blake   eblake redhat com    +1-919-301-3266
> Libvirt virtualization library http://libvirt.org
>

Information forwarded to bug-coreutils <at> gnu.org:
bug#19240; Package coreutils. (Fri, 05 Dec 2014 03:18:01 GMT) Full text and rfc822 format available.

Message #66 received at 19240 <at> debbugs.gnu.org (full text, mbox):

From: Bob Proulx <bob <at> proulx.com>
To: John Kendall <john <at> capps.com>
Cc: 19240 <at> debbugs.gnu.org
Subject: Re: bug#19240: cut 8.22 adds newline
Date: Thu, 4 Dec 2014 20:17:18 -0700

John Kendall wrote:
> My goal was to bring up the differences between Solaris cut and gnu cut 
> and hear the justification.  And I've learned a lot.  I've been in the
> Solaris gated community for so long, imagine how much I have never
> had to think about!

At one time I was exactly the same way after years of using HP-UX! :-)
Well...  Maybe not because there were always other machines in the mix
too.

> But it was never my intention to have you solve the re-write for me.  I 
> only shared my code because Bob asked.  But I really appreciate you 
> solving it for me!
>
> Thanks again to all of you.

Thanks for the sharing.  As I said I was curious as to the code issue
that was problematic for portability.  I already knew it wasn't
portable or it wouldn't have been a squeaky wheel.  So seeing
something unportable was simply expected.

And I will speak for the group and say you are most welcome.  We do
this because if you set a tangled ball of string in front of us we
would untangle it.  It is just our nature.

Bob

bug archived. Request was from Debbugs Internal Request <help-debbugs <at> gnu.org> to internal_control <at> debbugs.gnu.org. (Fri, 02 Jan 2015 12:24:04 GMT) Full text and rfc822 format available.

This bug report was last modified 10 years and 222 days ago.

GNU bug tracking system
Copyright (C) 1999 Darren O. Benham, 1997,2003 nCipher Corporation Ltd, 1994-97 Ian Jackson.
