GNU bug report logs - #13498
"cut -f" lags a line

Package: coreutils;

Reported by: Scott Lamb <slamb <at> slamb.org>

Date: Sat, 19 Jan 2013 17:27:01 UTC

Severity: normal

Done: Pádraig Brady <P <at> draigBrady.com>

Bug is archived. No further changes may be made.


Message #11 received at 13498 <at> debbugs.gnu.org (full text, mbox):

From: Pádraig Brady <P <at> draigBrady.com>
To: Scott Lamb <slamb <at> slamb.org>
Cc: 13498 <at> debbugs.gnu.org
Subject: Re: bug#13498: "cut -f" lags a line
Date: Sun, 20 Jan 2013 12:41:12 +0000
On 01/19/2013 08:35 AM, Scott Lamb wrote:
> "cut -f" has an apparently long-standing behavior that I'd consider a
> bug: it does not fully send line N to stdout until the first character
> of line N+1 has been read on stdin. This is confusing when stdin comes
> from "tail -f" or the like. The exact behavior varies slightly. If
> stdin is a tty, all but the trailing newline will be flushed
> immediately and then the trailing newline will be flushed when the
> next character shows up. If stdin is not a tty, there's no flush at
> all until the next character shows up.
>
> For example, if I type the following into a shell on Ubuntu 12.04.1,
> meaning cut from coreutils 8.13 and glibc package version
> 2.15-0ubuntu10.3:
>
>      cut -f1-
>      foo
>      bar
>      baz
>      ^D
>
> I will see the following:
>
>      $ cut -f1-
>      foo
>      foobar
>
>      barbaz
>
>      baz
>      $
>
> and if I instead use "cat | cut -f1-" in the first line, I will see
> the following:
>
>      $ cat | cut -f1-
>      foo
>      bar
>      foo
>      baz
>      bar
>      baz
>      $
>
> (coreutils's cut -c does not have the same laggy behavior. Neither
> does BSD cut on my OS X machine in either -c or -f mode.)
>
> This code in cut_fields (still found in trunk tip) is responsible for
> delaying the newline; it runs between the newline being read and being
> written:
>
>        if (c == '\n')
>          {
>            c = getc (stream);
>            if (c != EOF)
>              {
>                ungetc (c, stream);
>                c = '\n';
>              }
>          }
>
> I believe that code is there to avoid turning one newline at EOF into
> two, but that goal could be accomplished in another way.
>
> I don't know exactly why the behavior differs based on stdin being a
> tty or not. My best guess is that glibc might have some logic that, if
> stdin is a tty, automatically flushes stdout any time the program
> blocks on stdin. glibc's stdio internals are a bit hard for me to
> follow, so I haven't found the code in question. Apparently this is a
> vaguely standardized behavior; I see a stackoverflow post mentioning
> the following:
>
> """
> The input and output dynamics of interactive devices shall take place
> as specified in 7.19.3. The intent of these requirements is that
> unbuffered or line-buffered output appear as soon as possible, to
> ensure that prompting messages actually appear prior to a program
> waiting for input.
>
> (ISO/IEC 9899:TC2 Committee Draft -- May 6, 2005, page 14).
> """
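
The alternative the report suggests, dropping the lookahead while still avoiding a doubled newline at EOF, can be sketched as follows. This is a minimal, hypothetical C sketch, not coreutils' cut_fields(). The function name cut_first_field() is made up, and it handles only field 1 with the default TAB delimiter (no -d, -s, or field ranges). It writes each newline the moment it is read and tracks whether the current line has any bytes yet, so only an unterminated final line gets a newline appended at EOF.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical sketch, not the coreutils implementation: print the first
   TAB-separated field of each line, roughly "cut -f1", reading one byte
   at a time and never peeking past a newline. Lines with no TAB are
   printed whole. Returns 0 on success, EOF on a read or write error. */
int
cut_first_field (FILE *in, FILE *out)
{
  assert (in != NULL && out != NULL);

  bool in_field1 = true;   /* still inside field 1 of the current line */
  bool line_open = false;  /* some bytes of the current line were read */
  int c;

  while ((c = getc (in)) != EOF)
    {
      if (c == '\n')
        {
          /* Write the newline immediately: there is no getc()/ungetc()
             lookahead, so line N is complete in OUT's buffer before any
             byte of line N+1 is requested from IN. */
          if (putc ('\n', out) == EOF)
            return EOF;
          in_field1 = true;
          line_open = false;
          continue;
        }
      line_open = true;
      if (c == '\t')
        in_field1 = false;           /* later bytes belong to field 2+ */
      else if (in_field1 && putc (c, out) == EOF)
        return EOF;
    }

  if (ferror (in))
    return EOF;

  /* EOF handling without lookahead: only an unterminated last line gets
     a newline added, so "x\n" yields "x\n" rather than "x\n\n". */
  if (line_open && putc ('\n', out) == EOF)
    return EOF;

  return fflush (out);
}
```

With the lookahead gone, any remaining delay comes only from output buffering: stdout to a terminal is typically line-buffered, so each line appears as soon as its newline is written, while output into a pipe is fully buffered unless the program flushes.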

For my reference:
http://comments.pixelbeat.org/programming/stdio_buffering/#comment-250521

Yes, the use of ungetc() is awkward in cut.
I notice that pr is the only other util using ungetc.
Also the i18n version of cut on my system has a rewritten
cut_fields() function that doesn't exhibit the behavior.

ungetc() is coupled with the use of getndelim2(),
but I'll have a look at addressing this.

thanks,
Pádraig.
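
Pádraig notes that the ungetc() call is coupled with getndelim2(). As a hedged illustration of a design that decouples them (not necessarily how coreutils resolved this bug), the sketch below swaps in POSIX getline() to read one whole line at a time. getline() returns as soon as it has consumed a '\n', and whether the line ended with one is visible in the returned buffer, so no lookahead is needed to decide how to terminate the last line. The name cut_field_by_line() and its parameters are hypothetical. It selects a single 1-based field, prints delimiter-free lines whole as cut does without -s, and prints an empty line when a line has fewer fields than requested.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>

/* Hypothetical sketch, not coreutils code: print field FIELDNO (1-based)
   of each DELIM-separated line, reading whole lines with POSIX getline().
   Returns 0 on success, EOF on a read or write error. */
int
cut_field_by_line (FILE *in, FILE *out, size_t fieldno, char delim)
{
  assert (in != NULL && out != NULL && fieldno >= 1);

  char *line = NULL;
  size_t cap = 0;
  ssize_t len;
  int status = 0;

  while ((len = getline (&line, &cap, in)) != -1)
    {
      /* Drop the newline if there is one; exactly one is written back
         below, so an unterminated last line still gets terminated. */
      if (len > 0 && line[len - 1] == '\n')
        line[--len] = '\0';

      const char *start = line;
      const char *end = line + len;

      /* Lines without DELIM are printed whole, as "cut -f" does without -s. */
      if (memchr (line, delim, (size_t) len) != NULL)
        {
          /* Skip the FIELDNO - 1 fields before the one we want. */
          for (size_t f = 1; f < fieldno && start != NULL; f++)
            {
              const char *d = memchr (start, delim, (size_t) (end - start));
              start = d != NULL ? d + 1 : NULL;
            }
          if (start == NULL)
            start = end;                /* too few fields: empty output line */
          else
            {
              const char *d = memchr (start, delim, (size_t) (end - start));
              if (d != NULL)
                end = d;                /* stop at the next delimiter */
            }
        }

      size_t n = (size_t) (end - start);
      if (fwrite (start, 1, n, out) != n || putc ('\n', out) == EOF)
        {
          status = EOF;
          break;
        }
    }

  if (ferror (in))
    status = EOF;
  free (line);
  if (fflush (out) == EOF)
    status = EOF;
  return status;
}
```

Reading whole lines costs a buffer as large as the longest line, which getline() grows as needed; a byte-at-a-time loop avoids that at the cost of more bookkeeping.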




This bug report was last modified 10 years and 363 days ago.


GNU bug tracking system
Copyright (C) 1999 Darren O. Benham, 1997,2003 nCipher Corporation Ltd, 1994-97 Ian Jackson.