GNU bug report logs - #50129
-f - option doesn't respond to single EOF from TTY.


Package: grep;

Reported by: Kaz Kylheku <kaz <at> kylheku.com>

Date: Fri, 20 Aug 2021 01:01:01 UTC

Severity: normal

Done: Paul Eggert <eggert <at> cs.ucla.edu>

Bug is archived. No further changes may be made.

To add a comment to this bug, you must first unarchive it, by sending
a message to control AT debbugs.gnu.org, with unarchive 50129 in the body.
You can then email your comments to 50129 AT debbugs.gnu.org in the normal way.


Report forwarded to bug-grep <at> gnu.org:
bug#50129; Package grep. (Fri, 20 Aug 2021 01:01:01 GMT) Full text and rfc822 format available.

Acknowledgement sent to Kaz Kylheku <kaz <at> kylheku.com>:
New bug report received and forwarded. Copy sent to bug-grep <at> gnu.org. (Fri, 20 Aug 2021 01:01:01 GMT) Full text and rfc822 format available.

Message #5 received at submit <at> debbugs.gnu.org (full text, mbox):

From: Kaz Kylheku <kaz <at> kylheku.com>
To: bug-grep <at> gnu.org
Subject: -f - option doesn't respond to single EOF from TTY.
Date: Thu, 19 Aug 2021 17:59:53 -0700
This reproduces on grep 3.1 on Ubuntu.

The command:

   grep -f -

should accept a list of patterns from standard input, like this:

   $ grep -f -
   pat1
   pat2
   pat3
   [Ctrl-D]

Upon receiving the EOF indication (zero byte read), the program
should immediately conclude that the list of patterns has ended,
and begin processing the input using the patterns.

This does not seem to be working. After a single Ctrl-D, grep is
still accumulating patterns; it appears that Ctrl-D must be issued twice:

   $ grep -f -
   pat1
   [Ctrl-D] ;; effectively ignored
   pat2     ;; can add more patterns
   pat3
   [Ctrl-D]
   [Ctrl-D] ;; OK, now we are in the matching loop.
   pat3blahblah
   pat3blahblah
   ...

Cheers ...




Information forwarded to bug-grep <at> gnu.org:
bug#50129; Package grep. (Fri, 20 Aug 2021 02:11:01 GMT) Full text and rfc822 format available.

Message #8 received at submit <at> debbugs.gnu.org (full text, mbox):

From: "Paul Jackson" <pj <at> usa.net>
To: bug-grep <at> gnu.org
Subject: Re: bug#50129: -f - option doesn't respond to single EOF from TTY.
Date: Thu, 19 Aug 2021 21:10:32 -0500
My fading recollection from long, long ago is that this is a
difficult-to-avoid artifact of using stdio.

If no patterns are fed into grep, just a single EOF, then grep exits immediately.

But if some non-empty input (just a single "pat1" is sufficient) is fed to grep,
then it takes two EOFs to get grep to exit.

These two cases can be seen in the output of the following two commands:

# With an input of "pat1", it takes two reads returning 0 bytes (EOF's) to exit.
(sleep 2; echo pat1) | strace -tt -T  grep -f - 2>&1 | tail -15

# With no "pat1" input, it only takes one read of 0 bytes to exit.
sleep 2 | strace -tt -T  grep -f - 2>&1 | tail -15

=== ===

Here's a copy-paste of a terminal session in which I invoke the above two commands.
Notice the read() system calls that return 0 bytes.  That's how the kernel presents an
EOF to a user process on a read.

$ (sleep 2; echo pat1) | strace -tt -T  grep -f - 2>&1 | tail -15
21:07:12.887450 openat(AT_FDCWD, "/usr/share/locale-langpack/en.utf8/LC_MESSAGES/grep.mo", O_RDONLY) = -1 ENOENT (No such file or directory) <0.000013>
21:07:12.887490 openat(AT_FDCWD, "/usr/share/locale-langpack/en/LC_MESSAGES/grep.mo", O_RDONLY) = -1 ENOENT (No such file or directory) <0.000013>
21:07:12.887535 rt_sigaction(SIGSEGV, {sa_handler=0x55e0d5c49570, sa_mask=[], sa_flags=SA_RESTORER|SA_ONSTACK|SA_NODEFER|SA_RESETHAND|SA_SIGINFO, sa_restorer=0x7f9b26cd3210}, NULL, 8) = 0 <0.000011>
21:07:12.887579 fstat(0, {st_mode=S_IFIFO|0600, st_size=0, ...}) = 0 <0.000012>
21:07:12.887623 read(0, "pat1\n", 4096) = 5 <1.993486>
21:07:14.881203 read(0, "", 4096)       = 0 <0.000020>
21:07:14.881269 fstat(1, {st_mode=S_IFIFO|0600, st_size=0, ...}) = 0 <0.000019>
21:07:14.881420 brk(0x55e0d7954000)     = 0x55e0d7954000 <0.000023>
21:07:14.881477 fstat(0, {st_mode=S_IFIFO|0600, st_size=0, ...}) = 0 <0.000012>
21:07:14.881522 lseek(0, 0, SEEK_CUR)   = -1 ESPIPE (Illegal seek) <0.000012>
21:07:14.881570 read(0, "", 98304)      = 0 <0.000012>
21:07:14.881616 close(1)                = 0 <0.000012>
21:07:14.881655 close(2)                = 0 <0.000011>
21:07:14.881706 exit_group(1)           = ?
21:07:14.881936 +++ exited with 1 +++

$ sleep 2 | strace -tt -T  grep -f - 2>&1 | tail -15
21:07:25.407841 openat(AT_FDCWD, "/usr/share/locale/en/LC_MESSAGES/grep.mo", O_RDONLY) = -1 ENOENT (No such file or directory) <0.000015>
21:07:25.407888 openat(AT_FDCWD, "/usr/share/locale-langpack/en_US.UTF-8/LC_MESSAGES/grep.mo", O_RDONLY) = -1 ENOENT (No such file or directory) <0.000014>
21:07:25.407933 openat(AT_FDCWD, "/usr/share/locale-langpack/en_US.utf8/LC_MESSAGES/grep.mo", O_RDONLY) = -1 ENOENT (No such file or directory) <0.000014>
21:07:25.407980 openat(AT_FDCWD, "/usr/share/locale-langpack/en_US/LC_MESSAGES/grep.mo", O_RDONLY) = -1 ENOENT (No such file or directory) <0.000014>
21:07:25.408026 openat(AT_FDCWD, "/usr/share/locale-langpack/en.UTF-8/LC_MESSAGES/grep.mo", O_RDONLY) = -1 ENOENT (No such file or directory) <0.000014>
21:07:25.408071 openat(AT_FDCWD, "/usr/share/locale-langpack/en.utf8/LC_MESSAGES/grep.mo", O_RDONLY) = -1 ENOENT (No such file or directory) <0.000014>
21:07:25.408116 openat(AT_FDCWD, "/usr/share/locale-langpack/en/LC_MESSAGES/grep.mo", O_RDONLY) = -1 ENOENT (No such file or directory) <0.000015>
21:07:25.408165 rt_sigaction(SIGSEGV, {sa_handler=0x55ac5b515570, sa_mask=[], sa_flags=SA_RESTORER|SA_ONSTACK|SA_NODEFER|SA_RESETHAND|SA_SIGINFO, sa_restorer=0x7f84afc54210}, NULL, 8) = 0 <0.000012>
21:07:25.408213 fstat(0, {st_mode=S_IFIFO|0600, st_size=0, ...}) = 0 <0.000012>
21:07:25.408260 read(0, "", 4096)       = 0 <1.992492>
21:07:27.400830 fstat(1, {st_mode=S_IFIFO|0600, st_size=0, ...}) = 0 <0.000019>
21:07:27.400921 close(1)                = 0 <0.000018>
21:07:27.400979 close(2)                = 0 <0.000019>
21:07:27.401046 exit_group(1)           = ?
21:07:27.401259 +++ exited with 1 +++


-- 
                Paul Jackson
                pj <at> usa.net




Reply sent to Paul Eggert <eggert <at> cs.ucla.edu>:
You have taken responsibility. (Fri, 20 Aug 2021 15:26:02 GMT) Full text and rfc822 format available.

Notification sent to Kaz Kylheku <kaz <at> kylheku.com>:
bug acknowledged by developer. (Fri, 20 Aug 2021 15:26:02 GMT) Full text and rfc822 format available.

Message #13 received at 50129-done <at> debbugs.gnu.org (full text, mbox):

From: Paul Eggert <eggert <at> cs.ucla.edu>
To: Kaz Kylheku <kaz <at> kylheku.com>
Cc: 50129-done <at> debbugs.gnu.org
Subject: Re: bug#50129: -f - option doesn't respond to single EOF from TTY.
Date: Fri, 20 Aug 2021 08:25:12 -0700
On 8/19/21 5:59 PM, Kaz Kylheku wrote:
> This reproduces on grep 3.1 on Ubuntu.

That's a pretty old version of grep. I cannot reproduce the problem on 
Ubuntu 21.04, which has grep 3.6. So it sounds like the bug is fixed, 
and you can fix your problem by upgrading to a more-recent Ubuntu version.

(Not that anyone would ever want to *use* plain "grep -f -", except 
perhaps to file bug reports....)




Information forwarded to bug-grep <at> gnu.org:
bug#50129; Package grep. (Fri, 20 Aug 2021 16:54:01 GMT) Full text and rfc822 format available.

Message #16 received at 50129 <at> debbugs.gnu.org (full text, mbox):

From: Kaz Kylheku <kaz <at> kylheku.com>
To: 50129 <at> debbugs.gnu.org
Subject: Re: bug#50129: closed (Re: bug#50129: -f - option doesn't respond to
 single EOF from TTY.)
Date: Fri, 20 Aug 2021 09:53:38 -0700
Please allow me to add a semi-final comment to a closed bug.

Looking at grep's small body of code for handling the -f
option, I don't see anything substantially different between
3.1 and 3.7. It's the same logic wrapped around the fread
function.

Though the code has been worked on, there is no difference
in how the end of input is detected.

From that perspective, this looks like it may in fact be a
problem in the libc fread function. However, on the
same system, I cannot reproduce the issue with a similar
loop around fread, e.g. with this program:

  #include <stdio.h>

  int main(void)
  {
    char buf[128];
    setvbuf(stdin, NULL, _IOFBF, 0);
    while (fread(buf, sizeof buf, 1, stdin) != 0);
    return 0;
  }

Regardless of buffering mode, the fread loop promptly
quits if Ctrl-D is given from the TTY on an empty line.

The most plausible explanation at this point is that Debian/Ubuntu
applied some stinky patch to grep.

If I have some time, I will drill into that, in which case
I will post a final comment here, too.

Cheers ...




Information forwarded to bug-grep <at> gnu.org:
bug#50129; Package grep. (Fri, 20 Aug 2021 17:26:02 GMT) Full text and rfc822 format available.

Message #19 received at 50129 <at> debbugs.gnu.org (full text, mbox):

From: Kaz Kylheku <kaz <at> kylheku.com>
To: 50129 <at> debbugs.gnu.org
Subject: Re: bug#50129: closed (Re: bug#50129: -f - option doesn't respond to
 single EOF from TTY.)
Date: Fri, 20 Aug 2021 10:25:10 -0700
Nope! It looks like a behavior of fread.

We have to reverse the middle two arguments of fread,
so that it's reading an n-sized array of 1-byte objects,
rather than one object of size n.

  #include <stdio.h>

  int main(void)
  {
    char buf[8192];
    while (fread(buf, 1, sizeof buf, stdin) != 0);
    //                ^^^^^^^^^^^^^^ reversed these!
    return 0;
  }

and now, a behavior closely resembling the issue is
reproduced. You can see in the strace that two
zero-sized reads are needed to bail out:

  $ strace ./fread
   read(0, abc
   "abc\n", 8192)                  = 4
  read(0, "", 7168)                       = 0
  read(0, "", 8192)                       = 0
  exit_group(0)                           = ?
  +++ exited with 0 +++

What happens is that upon receiving the zero-byte read,
fread returns the number of bytes it has accumulated
to the application. The application then calls fread again.

This is expected, because fread didn't fail.

In my original example, fread is called only once
and returns zero, because it was told to read one
128-byte object, during which it received a short read,
at which point it is game over.

Still, I don't understand.  The grep 3.1 and 3.7
code both have 1 as the second argument of fread,
and the size as the third.

This argument order is necessary, so that fread
doesn't discard partial data at EOF.

So there is something more there. I'd have to play
with the code to see how the second argument is behaving
under the buffer resize logic and whatnot.

(E.g. if it happens that the second argument is
now always 1, that would explain why the behavior is
fixed.)




Information forwarded to bug-grep <at> gnu.org:
bug#50129; Package grep. (Fri, 20 Aug 2021 18:53:01 GMT) Full text and rfc822 format available.

Message #22 received at submit <at> debbugs.gnu.org (full text, mbox):

From: "Paul Jackson" <pj <at> usa.net>
To: bug-grep <at> gnu.org
Subject: Re: bug#50129: closed (Re: bug#50129: -f - option doesn't respond to single EOF from TTY.)
Date: Fri, 20 Aug 2021 13:51:06 -0500
I'd guess you're seeing some of the design decisions in stdio,
that were rather "controversial", back around Version 7 Unix,
inside Bell Labs, circa late 1978, when stdio was first introduced.

I've avoided using fread(3S) "forever", even to the point of rolling
my own i/o buffering library, the most recent incarnation of which
can be found at https://github.com/ThePythonicCow/rawscan

The fixed input buffer size of my 'rawscan' probably makes it unsuitable
for most uses such as GNU grep that expect to handle arbitrarily long
input lines.  This 'rawscan' scans over and otherwise ignores the tail end
of any input line that exceeds the line length you specified when opening
the buffer stream.

On a very large test case rather carefully chosen to highlight the
performance of rawscan, it was more than twice as fast as grep at
searching for a fixed pattern.  See the "Comparative Performance"
section of rawscan's github page for these results:

https://github.com/ThePythonicCow/rawscan

-- 
                Paul Jackson
                pj <at> usa.net




Information forwarded to bug-grep <at> gnu.org:
bug#50129; Package grep. (Sat, 21 Aug 2021 17:51:01 GMT) Full text and rfc822 format available.

Message #25 received at 50129 <at> debbugs.gnu.org (full text, mbox):

From: Paul Eggert <eggert <at> cs.ucla.edu>
To: Kaz Kylheku <kaz <at> kylheku.com>
Cc: 50129 <at> debbugs.gnu.org
Subject: Re: bug#50129: -f - option doesn't respond to single EOF from TTY.
Date: Sat, 21 Aug 2021 10:50:30 -0700
On 8/20/21 8:25 AM, Paul Eggert wrote:

> (Not that anyone would ever want to *use* plain "grep -f -", except 
> perhaps to file bug reports....)

I discovered a more-artificial case where grep messes up even on modern 
platforms, namely 'grep -f - -f -' where grep essentially ignores the 
second '-f -'. I installed the attached to fix that, along with another 
bug where grep wasn't reporting fclose errors.

I think these bugs are so unlikely and artificial that they're not worth 
mentioning in NEWS.

As a side effect, this patch might fix the bug#50129 problem on your old 
platform. Hard to say.
[0001-grep-avoid-sticky-problem-with-f-f.patch (text/x-patch, attachment)]

bug archived. Request was from Debbugs Internal Request <help-debbugs <at> gnu.org> to internal_control <at> debbugs.gnu.org. (Sun, 19 Sep 2021 11:24:05 GMT) Full text and rfc822 format available.

This bug report was last modified 3 years and 275 days ago.



GNU bug tracking system
Copyright (C) 1999 Darren O. Benham, 1997,2003 nCipher Corporation Ltd, 1994-97 Ian Jackson.