GNU bug report logs - #50129: -f - option doesn't respond to single EOF from TTY.
Reported by: Kaz Kylheku <kaz <at> kylheku.com>
Date: Fri, 20 Aug 2021 01:01:01 UTC
Severity: normal
Done: Paul Eggert <eggert <at> cs.ucla.edu>
Bug is archived. No further changes may be made.
To add a comment to this bug, you must first unarchive it, by sending
a message to control AT debbugs.gnu.org, with unarchive 50129 in the body.
You can then email your comments to 50129 AT debbugs.gnu.org in the normal way.
Report forwarded to bug-grep <at> gnu.org: bug#50129; Package grep. (Fri, 20 Aug 2021 01:01:01 GMT)
Acknowledgement sent to Kaz Kylheku <kaz <at> kylheku.com>: New bug report received and forwarded. Copy sent to bug-grep <at> gnu.org. (Fri, 20 Aug 2021 01:01:01 GMT)
Message #5 received at submit <at> debbugs.gnu.org (full text, mbox):
This reproduces on grep 3.1 on Ubuntu.
The command:
grep -f -
should accept a list of patterns from standard input, like this:
$ grep -f -
pat1
pat2
pat3
[Ctrl-D]
Upon receiving the EOF indication (zero byte read), the program
should immediately conclude that the list of patterns has ended,
and begin processing the input using the patterns.
This does not seem to be working. After a single Ctrl-D, grep is
still accumulating patterns. It appears that Ctrl-D must be issued twice:
$ grep -f -
pat1
[Ctrl-D] ;; effectively ignored
pat2 ;; can add more patterns
pat3
[Ctrl-D]
[Ctrl-D] ;; OK, now we are in the matching loop.
pat3blahblah
pat3blahblah
...
Cheers ...
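For reference, the "zero byte read" mentioned above can be sketched at the system-call level. This is an illustration only, not grep's code, and the helper name read_until_eof is made up for the sketch. On a TTY, Ctrl-D on an empty line makes one read() return 0, but the descriptor stays usable and a later read() can return more input, so a program that wants to honor a single Ctrl-D has to stop at the first zero-length read.

```c
#include <errno.h>
#include <unistd.h>

/* Hypothetical helper: consume fd until the first zero-length read,
   which is how the kernel reports EOF.  Returns the number of bytes
   seen, or -1 on a read error. */
static long read_until_eof(int fd)
{
    char buf[4096];
    long total = 0;

    for (;;) {
        ssize_t n = read(fd, buf, sizeof buf);
        if (n > 0)
            total += n;
        else if (n == 0)
            return total;          /* first EOF indication: stop here */
        else if (errno != EINTR)
            return -1;             /* real error, not an interrupted call */
    }
}
```

Calling read_until_eof(0) would consume standard input and return at the first Ctrl-D typed on an empty line.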
Information forwarded to bug-grep <at> gnu.org: bug#50129; Package grep. (Fri, 20 Aug 2021 02:11:01 GMT)
Message #8 received at submit <at> debbugs.gnu.org (full text, mbox):
My fading recollection from long, long ago is that this is a
difficult-to-avoid artifact of using stdio.
If no patterns are fed into grep, just a single EOF, then grep exits immediately.
But if some non-empty input (just a single "pat1" is sufficient) is fed to grep,
then it takes two EOF's to get grep to exit.
These two cases can be seen in the output of the following two commands:
# With an input of "pat1", it takes two reads returning 0 bytes (EOF's) to exit.
(sleep 2; echo pat1) | strace -tt -T grep -f - 2>&1 | tail -15
# With no "pat1" input, it only takes one read of 0 bytes to exit.
sleep 2 | strace -tt -T grep -f - 2>&1 | tail -15
=== ===
Here's a copy-paste of a terminal session in which I invoke the above two commands.
Notice the read() system calls that return 0 bytes. That's how the kernel presents an
EOF to a user process on a read.
$ (sleep 2; echo pat1) | strace -tt -T grep -f - 2>&1 | tail -15
21:07:12.887450 openat(AT_FDCWD, "/usr/share/locale-langpack/en.utf8/LC_MESSAGES/grep.mo", O_RDONLY) = -1 ENOENT (No such file or directory) <0.000013>
21:07:12.887490 openat(AT_FDCWD, "/usr/share/locale-langpack/en/LC_MESSAGES/grep.mo", O_RDONLY) = -1 ENOENT (No such file or directory) <0.000013>
21:07:12.887535 rt_sigaction(SIGSEGV, {sa_handler=0x55e0d5c49570, sa_mask=[], sa_flags=SA_RESTORER|SA_ONSTACK|SA_NODEFER|SA_RESETHAND|SA_SIGINFO, sa_restorer=0x7f9b26cd3210}, NULL, 8) = 0 <0.000011>
21:07:12.887579 fstat(0, {st_mode=S_IFIFO|0600, st_size=0, ...}) = 0 <0.000012>
21:07:12.887623 read(0, "pat1\n", 4096) = 5 <1.993486>
21:07:14.881203 read(0, "", 4096) = 0 <0.000020>
21:07:14.881269 fstat(1, {st_mode=S_IFIFO|0600, st_size=0, ...}) = 0 <0.000019>
21:07:14.881420 brk(0x55e0d7954000) = 0x55e0d7954000 <0.000023>
21:07:14.881477 fstat(0, {st_mode=S_IFIFO|0600, st_size=0, ...}) = 0 <0.000012>
21:07:14.881522 lseek(0, 0, SEEK_CUR) = -1 ESPIPE (Illegal seek) <0.000012>
21:07:14.881570 read(0, "", 98304) = 0 <0.000012>
21:07:14.881616 close(1) = 0 <0.000012>
21:07:14.881655 close(2) = 0 <0.000011>
21:07:14.881706 exit_group(1) = ?
21:07:14.881936 +++ exited with 1 +++
$ sleep 2 | strace -tt -T grep -f - 2>&1 | tail -15
21:07:25.407841 openat(AT_FDCWD, "/usr/share/locale/en/LC_MESSAGES/grep.mo", O_RDONLY) = -1 ENOENT (No such file or directory) <0.000015>
21:07:25.407888 openat(AT_FDCWD, "/usr/share/locale-langpack/en_US.UTF-8/LC_MESSAGES/grep.mo", O_RDONLY) = -1 ENOENT (No such file or directory) <0.000014>
21:07:25.407933 openat(AT_FDCWD, "/usr/share/locale-langpack/en_US.utf8/LC_MESSAGES/grep.mo", O_RDONLY) = -1 ENOENT (No such file or directory) <0.000014>
21:07:25.407980 openat(AT_FDCWD, "/usr/share/locale-langpack/en_US/LC_MESSAGES/grep.mo", O_RDONLY) = -1 ENOENT (No such file or directory) <0.000014>
21:07:25.408026 openat(AT_FDCWD, "/usr/share/locale-langpack/en.UTF-8/LC_MESSAGES/grep.mo", O_RDONLY) = -1 ENOENT (No such file or directory) <0.000014>
21:07:25.408071 openat(AT_FDCWD, "/usr/share/locale-langpack/en.utf8/LC_MESSAGES/grep.mo", O_RDONLY) = -1 ENOENT (No such file or directory) <0.000014>
21:07:25.408116 openat(AT_FDCWD, "/usr/share/locale-langpack/en/LC_MESSAGES/grep.mo", O_RDONLY) = -1 ENOENT (No such file or directory) <0.000015>
21:07:25.408165 rt_sigaction(SIGSEGV, {sa_handler=0x55ac5b515570, sa_mask=[], sa_flags=SA_RESTORER|SA_ONSTACK|SA_NODEFER|SA_RESETHAND|SA_SIGINFO, sa_restorer=0x7f84afc54210}, NULL, 8) = 0 <0.000012>
21:07:25.408213 fstat(0, {st_mode=S_IFIFO|0600, st_size=0, ...}) = 0 <0.000012>
21:07:25.408260 read(0, "", 4096) = 0 <1.992492>
21:07:27.400830 fstat(1, {st_mode=S_IFIFO|0600, st_size=0, ...}) = 0 <0.000019>
21:07:27.400921 close(1) = 0 <0.000018>
21:07:27.400979 close(2) = 0 <0.000019>
21:07:27.401046 exit_group(1) = ?
21:07:27.401259 +++ exited with 1 +++
--
Paul Jackson
pj <at> usa.net
Reply sent to Paul Eggert <eggert <at> cs.ucla.edu>: You have taken responsibility. (Fri, 20 Aug 2021 15:26:02 GMT)
Notification sent to Kaz Kylheku <kaz <at> kylheku.com>: bug acknowledged by developer. (Fri, 20 Aug 2021 15:26:02 GMT)
Message #13 received at 50129-done <at> debbugs.gnu.org (full text, mbox):
On 8/19/21 5:59 PM, Kaz Kylheku wrote:
> This reproduces on grep 3.1 on Ubuntu.
That's a pretty old version of grep. I cannot reproduce the problem on
Ubuntu 21.04, which has grep 3.6. So it sounds like the bug is fixed,
and you can fix your problem by upgrading to a more-recent Ubuntu version.
(Not that anyone would ever want to *use* plain "grep -f -", except
perhaps to file bug reports....)
Information forwarded to bug-grep <at> gnu.org: bug#50129; Package grep. (Fri, 20 Aug 2021 16:54:01 GMT)
Message #16 received at 50129 <at> debbugs.gnu.org (full text, mbox):
Please allow me to add a semi-final comment to a closed bug.
Looking at grep's small body of code for handling the -f
option, I don't see anything substantially different between
3.1 and 3.7. It's the same logic wrapped around the fread
function.
Though the code has been worked on, there is no difference
in how the end of input is detected.
From that perspective, this looks like it may in fact be a
problem in the libc fread function. However, on the
same system, I cannot reproduce the issue with a similar
loop around fread, e.g. with this program:
#include <stdio.h>

int main(void)
{
  char buf[128];
  setvbuf(stdin, NULL, _IOFBF, 0);
  while (fread(buf, sizeof buf, 1, stdin) != 0);
  return 0;
}
Regardless of buffering mode, the fread loop promptly
quits if Ctrl-D is given from the TTY on an empty line.
The most plausible explanation at this point is that Debian/Ubuntu
had applied some stinky patch to grep.
If I have some time, I will drill into that, in which case
I will post a final comment here, too.
Cheers ...
Information forwarded to bug-grep <at> gnu.org: bug#50129; Package grep. (Fri, 20 Aug 2021 17:26:02 GMT)
Message #19 received at 50129 <at> debbugs.gnu.org (full text, mbox):
Nope! It looks like a behavior of fread.
We have to reverse the middle two arguments of fread,
so that it's reading an n-sized array of 1-byte objects,
rather than one object of size n.
#include <stdio.h>

int main(void)
{
  char buf[8192];
  while (fread(buf, 1, sizeof buf, stdin) != 0);
  //                ^^^^^^^^^^^^^ reversed these!
  return 0;
}
and now, a behavior closely resembling the issue is
reproduced. You can see in the strace that two
zero-sized reads are needed to bail out:
$ strace ./fread
read(0, abc
"abc\n", 8192) = 4
read(0, "", 7168) = 0
read(0, "", 8192) = 0
exit_group(0) = ?
+++ exited with 0 +++
What happens is that upon receiving the zero-byte read,
fread is returning the number of bytes it has accumulated
to the application. The application calls fread again.
This is expected, because fread didn't fail.
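The extra call described above can be avoided by checking the stream's EOF indicator after each fread, instead of waiting for a zero return. A minimal sketch, not taken from grep; drain_stream and its calls out-parameter are hypothetical names used only for illustration:

```c
#include <stdio.h>

/* Hypothetical helper: read fp to end of input in 1-byte units.
   Stops as soon as feof() or ferror() is set, so a short but
   nonzero fread() result does not trigger another read.
   Stores the number of fread() calls made in *calls. */
static size_t drain_stream(FILE *fp, int *calls)
{
    char buf[8192];
    size_t total = 0;

    *calls = 0;
    for (;;) {
        size_t n = fread(buf, 1, sizeof buf, fp);
        ++*calls;
        total += n;
        if (feof(fp) || ferror(fp))
            break;                 /* EOF or error already recorded */
    }
    return total;
}
```

With this shape, the loop does not depend on whether the C library would go back to the descriptor when fread is called again after an EOF.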
In my original example, fread is called only once
and returns zero, because it was told to read a 128
byte object, during which it received a short read,
at which point it is game over.
Still, I don't understand. The grep 3.1 and 3.7
code both have 1 as the second argument of fread,
and the size as the third.
This argument order is necessary, so that fread
doesn't discard partial data at EOF.
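The difference between the two argument orders can be checked on a regular file, with no TTY involved. A small sketch; items_read is a hypothetical helper, and the 128-byte buffer mirrors the first test program earlier in this thread:

```c
#include <stdio.h>

/* Hypothetical helper: rewind fp and return what a single fread()
   reports for the given size/count pair. */
static size_t items_read(FILE *fp, size_t size, size_t count)
{
    char buf[128];

    if (size * count > sizeof buf)
        return 0;                  /* request would not fit in buf */
    rewind(fp);                    /* also clears the EOF indicator */
    return fread(buf, size, count, fp);
}
```

For a stream holding only the five bytes "pat1\n", items_read(fp, 128, 1) returns 0 because no complete 128-byte object arrived, while items_read(fp, 1, 128) returns 5.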
So there is something more there. I'd have to play
with the code to see how the third argument is behaving
under the buffer resize logic and whatnot.
(E.g. if it happens that the third argument is
now always 1, that would explain why the behavior is
fixed.)
Information forwarded to bug-grep <at> gnu.org: bug#50129; Package grep. (Fri, 20 Aug 2021 18:53:01 GMT)
Message #22 received at submit <at> debbugs.gnu.org (full text, mbox):
I'd guess you're seeing some of the design decisions in stdio,
that were rather "controversial", back around Version 7 Unix,
inside Bell Labs, circa late 1978, when stdio was first introduced.
I've avoided using fread(3S) "forever", even to the point of rolling
my own i/o buffering library, the most recent incarnation of which
can be found at https://github.com/ThePythonicCow/rawscan
The fixed input buffer size of my 'rawscan' probably makes it unsuitable
for most uses such as GNU grep that expect to handle arbitrarily long
input lines. This 'rawscan' scans over and otherwise ignores the tail end
of any input line that exceeds the line length you specified when opening
the buffer stream.
On a very large test case rather carefully chosen to highlight the
performance of rawscan, it was over twice as fast as grep at searching
for a fixed pattern. See the "Comparative Performance"
section of rawscan's github page for these results:
https://github.com/ThePythonicCow/rawscan
--
Paul Jackson
pj <at> usa.net
Information forwarded to bug-grep <at> gnu.org: bug#50129; Package grep. (Sat, 21 Aug 2021 17:51:01 GMT)
Message #25 received at 50129 <at> debbugs.gnu.org (full text, mbox):
On 8/20/21 8:25 AM, Paul Eggert wrote:
> (Not that anyone would ever want to *use* plain "grep -f -", except
> perhaps to file bug reports....)
I discovered a more-artificial case where grep messes up even on modern
platforms, namely 'grep -f - -f -' where grep essentially ignores the
second '-f -'. I installed the attached patch to fix that, along with
another bug where grep wasn't reporting fclose errors.
I think these bugs are so unlikely and artificial that they're not worth
mentioning in NEWS.
As a side effect, this patch might fix the bug#50129 problem on your old
platform. Hard to say.
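The patch itself is an attachment and is not reproduced in this log. Purely as an illustration of the kind of stickiness involved (an assumption about the mechanism, not a description of the actual change), a reader that may be pointed at the same stream a second time can clear the stream's EOF indicator once it has finished. read_patterns is a hypothetical name:

```c
#include <stdio.h>

/* Hypothetical helper: read up to cap bytes of patterns from fp,
   stopping at EOF or error.  On EOF the indicator is cleared, so a
   later call on the same stream (as with a second "-f -") attempts
   a fresh read instead of seeing a latched EOF. */
static size_t read_patterns(FILE *fp, char *buf, size_t cap)
{
    size_t total = 0;

    while (total < cap) {
        total += fread(buf + total, 1, cap - total, fp);
        if (ferror(fp))
            break;                 /* leave the error for the caller */
        if (feof(fp)) {
            clearerr(fp);          /* un-latch EOF for the next reader */
            break;
        }
    }
    return total;
}
```

On a TTY, a second call after clearerr would wait for new input, which is the behavior one would expect from a second '-f -'.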
[0001-grep-avoid-sticky-problem-with-f-f.patch (text/x-patch, attachment)]
bug archived. Request was from Debbugs Internal Request <help-debbugs <at> gnu.org> to internal_control <at> debbugs.gnu.org. (Sun, 19 Sep 2021 11:24:05 GMT)
This bug report was last modified 3 years and 275 days ago.
GNU bug tracking system
Copyright (C) 1999 Darren O. Benham,
1997,2003 nCipher Corporation Ltd,
1994-97 Ian Jackson.