GNU bug report logs - #22945
Surprising behaviour (bug?) of zgrep in combination with the -f option and process substitutions

Package: gzip;

Reported by: Fulvio Scapin <trantorvega <at> gmail.com>

Date: Tue, 8 Mar 2016 16:33:03 UTC

Severity: normal

Done: Paul Eggert <eggert <at> cs.ucla.edu>

Bug is archived. No further changes may be made.

To add a comment to this bug, you must first unarchive it, by sending
a message to control AT debbugs.gnu.org, with unarchive 22945 in the body.
You can then email your comments to 22945 AT debbugs.gnu.org in the normal way.


Report forwarded to bug-gzip <at> gnu.org:
bug#22945; Package gzip. (Tue, 08 Mar 2016 16:33:03 GMT)

Acknowledgement sent to Fulvio Scapin <trantorvega <at> gmail.com>:
New bug report received and forwarded. Copy sent to bug-gzip <at> gnu.org. (Tue, 08 Mar 2016 16:33:03 GMT)

Message #5 received at submit <at> debbugs.gnu.org:

From: Fulvio Scapin <trantorvega <at> gmail.com>
To: bug-gzip <at> gnu.org
Subject: Surprising behaviour (bug?) of zgrep in combination with the -f
 option and process substitutions
Date: Tue, 8 Mar 2016 12:42:30 +0100
Hello.

There is a problem with zgrep whenever the -f option actually reads from
the output of a process substitution in bash.
A deliberately trivial example follows.

$ mkdir /tmp/test

$ cd /tmp/test

$ cat > first
aaa

$ cat > second
bbb

$ cat > third
ccc

$ cat > fourth
ddd

$ tail *
==> first <==
aaa

==> fourth <==
ddd

==> second <==
bbb

==> third <==
ccc

$ gzip -9 *

$ ls
first.gz  fourth.gz  second.gz  third.gz

$ cat > patterns
aaa
bbb
ccc
ddd

$ tail patterns
aaa
bbb
ccc
ddd

$ zfgrep -f <( cat patterns ) first.gz fourth.gz second.gz third.gz
first.gz:aaa

$ zfgrep -f patterns first.gz fourth.gz second.gz third.gz
first.gz:aaa
fourth.gz:ddd
second.gz:bbb
third.gz:ccc



zfgrep -f <( cat patterns ) first.gz fourth.gz second.gz third.gz
translates into
zfgrep -f /dev/fd/XX first.gz fourth.gz second.gz third.gz
where XX is a number, for instance 63.
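The path that bash substitutes refers to a pipe rather than a regular file, so the first program that reads it to end-of-file consumes its contents and any later reader sees nothing. The following bash sketch shows this; read_twice is a hypothetical helper used only for illustration.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read the same path twice and show what each read returns.
read_twice() {
  local path=$1
  printf 'first read: [%s]\n' "$(cat -- "$path")"
  # The pipe behind a process substitution has already been drained here.
  printf 'second read: [%s]\n' "$(cat -- "$path")"
}

echo <(true)                 # prints a path such as /dev/fd/63; the number varies
read_twice <(echo pattern)   # first read: [pattern] / second read: []
```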

The problem, from what I understand, arises because

zgrep -f patternfile a.gz b.gz c.gz

is actually a succession of

gzip -dc a.gz | grep -f patternfile
gzip -dc b.gz | grep -f patternfile
gzip -dc c.gz | grep -f patternfile


Since patternfile in this case is /dev/fd/XX, a pipe whose contents can be
read only once, only the grep in the first pipeline actually reads a pattern
list; the second and third invocations get nothing, so there is no match
for b.gz and c.gz.


In /bin/zgrep (version 1.6, Ubuntu 15.10) one can read:

  (-f | --file)
   # The pattern is coming from a file rather than the command-line.
   # If the file is actually stdin then we need to do a little
   # magic, since we use stdin to pass the gzip output to grep.
   # Turn the -f option into an -e option by copying the file's
   # contents into OPTARG.
   case $optarg in
   (" '-'" | " '/dev/stdin'" | " '/dev/fd/0'")
     option=-e
     optarg=" '"$(sed "$escape") || exit 2;;
   esac
   have_pat=1;;

Perhaps the workaround for stdin should also apply to situations such as
the one in my example?

Thanks in advance.

Fulvio Scapin

Information forwarded to bug-gzip <at> gnu.org:
bug#22945; Package gzip. (Wed, 16 Mar 2016 02:36:01 GMT)

Message #8 received at 22945 <at> debbugs.gnu.org:

From: Jim Meyering <jim <at> meyering.net>
To: Fulvio Scapin <trantorvega <at> gmail.com>
Cc: 22945 <at> debbugs.gnu.org
Subject: Re: bug#22945: Surprising behaviour (bug?) of zgrep in combination
 with the -f option and process substitutions
Date: Tue, 15 Mar 2016 19:34:49 -0700
On Tue, Mar 8, 2016 at 3:42 AM, Fulvio Scapin <trantorvega <at> gmail.com> wrote:
> Hello.
>
> There is a problem with zgrep whenever the -f option actually reads from
> the output of a process substitution in bash.
> A willingly trivial example below.
>
> $ mkdir /tmp/test
...
> From /bin/zgrep (Version 1.6, Ubuntu 15.10) one can read

Thank you for the report.
To summarize, with zgrep-1.6, this erroneously prints matches only
from the first file:

  $ zgrep -f <(echo .) <(echo a) <(echo b)
  /dev/fd/12:a

However, with the latest from git (and soon to be gzip-1.7), this now
works as desired:

  $ zgrep -f <(echo .) <(echo a) <(echo b)
  /dev/fd/12:a
  /dev/fd/13:b

I see there is no NEWS entry for this fix and haven't yet identified
the origin of the bug or the commit that fixed it, but will do so.




Information forwarded to bug-gzip <at> gnu.org:
bug#22945; Package gzip. (Wed, 16 Mar 2016 21:07:01 GMT)

Message #11 received at 22945 <at> debbugs.gnu.org:

From: Paul Eggert <eggert <at> cs.ucla.edu>
To: Jim Meyering <jim <at> meyering.net>, Fulvio Scapin <trantorvega <at> gmail.com>
Cc: 22945 <at> debbugs.gnu.org
Subject: Re: bug#22945: Surprising behaviour (bug?) of zgrep in combination
 with the -f option and process substitutions
Date: Wed, 16 Mar 2016 14:06:12 -0700
On 03/15/2016 07:34 PM, Jim Meyering wrote:
> Thank you for the report.
> To summarize, with zgrep-1.6, this erroneously prints matches only
> from the first file:
>
>    $ zgrep -f <(echo .) <(echo a) <(echo b)
>    /dev/fd/12:a
>
> However, with the latest from git (and soon to be gzip-1.7), this now
> works as desired:
>
>    $ zgrep -f <(echo .) <(echo a) <(echo b)
>    /dev/fd/12:a
>    /dev/fd/13:b
>
> I see there is no NEWS entry for this fix and haven't yet identified
> the origin of the bug or the commit that fixed it, but will do so.

Draft gzip 1.7 doesn't work for me (Fedora 23 x86-64). I have worked on 
a patch but don't have a reliable fix yet, or even a portable test case 
to illustrate the bug. Perhaps we should just think of it as a known bug 
for now.




Information forwarded to bug-gzip <at> gnu.org:
bug#22945; Package gzip. (Thu, 17 Mar 2016 16:33:02 GMT)

Message #14 received at 22945 <at> debbugs.gnu.org:

From: Antonio Diaz Diaz <antonio <at> gnu.org>
To: 22945 <at> debbugs.gnu.org
Cc: Fulvio Scapin <trantorvega <at> gmail.com>
Subject: Re: bug#22945: Surprising behaviour (bug?) of zgrep in combination
 with the -f option and process substitutions
Date: Thu, 17 Mar 2016 17:39:24 +0100
Paul Eggert wrote:
> Draft gzip 1.7 doesn't work for me (Fedora 23 x86-64). I have worked on 
> a patch but don't have a reliable fix yet, or even a portable test case 
> to illustrate the bug. Perhaps we should just think of it as a known bug 
> for now.

What about using command substitution with '-e' instead of process 
substitution with '-f'?

  zgrep -e "$(cat FILE)" file1.lz file2.gz
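This works because the shell expands "$(cat FILE)" once, before zgrep runs, and grep accepts a newline-separated list of patterns in a single -e argument. The bash sketch below demonstrates that grep behaviour on a decompressed stream, using gzip -dc | grep directly rather than zgrep; the scratch files are invented for the example.

```shell
#!/usr/bin/env bash
dir=$(mktemp -d)
trap 'rm -rf -- "$dir"' EXIT
printf 'aaa\nccc\n' > "$dir/patterns"
printf 'aaa\nbbb\nccc\n' | gzip -c > "$dir/sample.gz"

# The command substitution is evaluated once, so nothing is re-read later.
pats=$(cat -- "$dir/patterns")
# grep splits the single -e argument at newlines into two patterns.
gzip -dc -- "$dir/sample.gz" | grep -e "$pats"   # prints aaa and ccc, not bbb
```

Two caveats are worth keeping in mind: an empty FILE becomes -e '', which matches every line, whereas GNU grep -f with an empty file matches nothing; and command substitution strips trailing newlines and cannot carry NUL bytes.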


Best regards,
Antonio.




Information forwarded to bug-gzip <at> gnu.org:
bug#22945; Package gzip. (Thu, 17 Mar 2016 20:14:01 GMT)

Message #17 received at 22945 <at> debbugs.gnu.org:

From: Paul Eggert <eggert <at> cs.ucla.edu>
To: Jim Meyering <jim <at> meyering.net>, Fulvio Scapin <trantorvega <at> gmail.com>
Cc: 22945 <at> debbugs.gnu.org
Subject: Re: bug#22945: Surprising behaviour (bug?) of zgrep in combination
 with the -f option and process substitutions
Date: Thu, 17 Mar 2016 13:13:34 -0700
[Message part 1 (text/plain, inline)]
On 03/16/2016 02:06 PM, Paul Eggert wrote:
> I have worked on a patch but don't have a reliable fix yet, or even a 
> portable test case to illustrate the bug.

On further thought I found a test case and a fix, which I've attached. 
Normally I would just install this, but we're so close to a release that 
I'll wait for a word from Jim.
[0001-zgrep-with-f-SPECIAL-read-SPECIAL-just-once.patch (application/x-patch, attachment)]

Information forwarded to bug-gzip <at> gnu.org:
bug#22945; Package gzip. (Fri, 18 Mar 2016 03:59:01 GMT)

Message #20 received at 22945 <at> debbugs.gnu.org:

From: Jim Meyering <jim <at> meyering.net>
To: Paul Eggert <eggert <at> cs.ucla.edu>
Cc: Fulvio Scapin <trantorvega <at> gmail.com>, 22945 <at> debbugs.gnu.org
Subject: Re: bug#22945: Surprising behaviour (bug?) of zgrep in combination
 with the -f option and process substitutions
Date: Thu, 17 Mar 2016 20:58:14 -0700
On Thu, Mar 17, 2016 at 1:13 PM, Paul Eggert <eggert <at> cs.ucla.edu> wrote:
> On 03/16/2016 02:06 PM, Paul Eggert wrote:
>>
>> I have worked on a patch but don't have a reliable fix yet, or even a
>> portable test case to illustrate the bug.
>
> On further thought I found a test case and a fix, which I've attached.
> Normally I would just install this, but we're so close to a release that
> I'll wait for a word from Jim.

Thank you for working on that.
One nit: perhaps it should continue to work when a search string
contains a NUL byte?  E.g., this works before your change on OS X
yet finds no match with the patch applied:

  $ zgrep -af <(printf 'b\0\na') <(printf 'b\0') <(echo a)
  /dev/fd/12:b
  /dev/fd/13:a

Might be tricky to portably transform that NUL byte into something we
can embed in a command-line-specified search string. Is there even a
notation for that? I don't think so.
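The underlying reason is that command-line arguments reach a program as NUL-terminated C strings, so no argument (such as the operand of grep -e) can contain a NUL byte; bash command substitution discards NUL bytes as well. A short bash sketch:

```shell
#!/usr/bin/env bash
# Bash drops the NUL byte here; bash 4.4 and later also print an
# "ignored null byte" warning on stderr.
s=$(printf 'b\0')
printf '%s' "$s" | od -An -c          # only the "b" survives
printf 'length: %d\n' "${#s}"         # prints: length: 1

# The same bytes stored in a file keep the NUL intact.
f=$(mktemp)
printf 'b\0\n' > "$f"
wc -c < "$f"                          # 3 bytes: b, NUL, newline
rm -f -- "$f"
```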

But NUL problems aside, this also should work, requiring alternation
in the regexp derived from input with two or more lines, but then
we'll have to escape embedded '|' bytes, too:

  $ zgrep -f <(printf 'a\nb') <(echo b) <(echo a)
  /dev/fd/12:b
  /dev/fd/13:a
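To see why escaping matters, consider a pattern line that already contains '|', which is an ordinary character in a POSIX basic regular expression. The bash sketch below, with an invented two-line pattern file, compares grep -f against a naive join into one extended regular expression, and against a join that first protects each '|' with a bracket expression. It only illustrates the concern and ignores the other syntax differences between basic and extended regular expressions.

```shell
#!/usr/bin/env bash
dir=$(mktemp -d)
trap 'rm -rf -- "$dir"' EXIT
printf 'x|y\nz\n' > "$dir/patterns"

# Original meaning: two basic REs, "x|y" (literal bar) and "z".
printf 'x\n' | grep -c -f "$dir/patterns"   # prints 0

# Naive join: "x|y|z" as an extended RE now matches a plain "x".
joined=$(paste -s -d '|' "$dir/patterns")
printf 'x\n' | grep -c -E -e "$joined"      # prints 1

# Protect existing bars as "[|]" before joining: "x[|]y|z".
escaped=$(sed 's/|/[|]/g' "$dir/patterns" | paste -s -d '|' -)
printf 'x\n' | grep -c -E -e "$escaped"     # prints 0
```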




Information forwarded to bug-gzip <at> gnu.org:
bug#22945; Package gzip. (Fri, 18 Mar 2016 07:47:02 GMT)

Message #23 received at 22945 <at> debbugs.gnu.org:

From: Paul Eggert <eggert <at> cs.ucla.edu>
To: Jim Meyering <jim <at> meyering.net>
Cc: Fulvio Scapin <trantorvega <at> gmail.com>, 22945 <at> debbugs.gnu.org
Subject: Re: bug#22945: Surprising behaviour (bug?) of zgrep in combination
 with the -f option and process substitutions
Date: Fri, 18 Mar 2016 00:46:25 -0700
[Message part 1 (text/plain, inline)]
Jim Meyering wrote:
> Might be tricky to portably transform that NUL byte into something we
> can embed in a command-line-specified search string. Is there even a
> notation for that? I don't think so.
>
> But NUL problems aside, this also should work, requiring alternation
> in the regexp derived from input with two or more lines, but then
> we'll have to escape embedded '|' bytes, too:

How about the attached patch instead? It uses a bigger hammer, which should 
address both issues.
[0001-zgrep-with-f-SPECIAL-read-SPECIAL-just-once.patch (text/x-diff, attachment)]

Information forwarded to bug-gzip <at> gnu.org:
bug#22945; Package gzip. (Fri, 18 Mar 2016 20:26:01 GMT)

Message #26 received at 22945 <at> debbugs.gnu.org:

From: Jim Meyering <jim <at> meyering.net>
To: Paul Eggert <eggert <at> cs.ucla.edu>
Cc: Fulvio Scapin <trantorvega <at> gmail.com>, 22945 <at> debbugs.gnu.org
Subject: Re: bug#22945: Surprising behaviour (bug?) of zgrep in combination
 with the -f option and process substitutions
Date: Fri, 18 Mar 2016 13:24:55 -0700
[Message part 1 (text/plain, inline)]
On Fri, Mar 18, 2016 at 12:46 AM, Paul Eggert <eggert <at> cs.ucla.edu> wrote:
> Jim Meyering wrote:
>>
>> Might be tricky to portably transform that NUL byte into something we
>> can embed in a command-line-specified search string. Is there even a
>> notation for that? I don't think so.
>>
>> But NUL problems aside, this also should work, requiring alternation
>> in the regexp derived from input with two or more lines, but then
>> we'll have to escape embedded '|' bytes, too:
>
>
> How about the attached patch instead? It uses a bigger hammer, which should
> address both issues.

Very nice. Thank you very much.
You are welcome to push that with changes like the following:
 - retain the 2-empty-line section separator in NEWS (there's a
syntax-check hook to test for that in other packages, but not yet here
in gzip)
 - adjust the test to cover the case of more than one line in -f's input:
[k.patch (text/x-patch, attachment)]

Reply sent to Paul Eggert <eggert <at> cs.ucla.edu>:
You have taken responsibility. (Fri, 18 Mar 2016 22:31:02 GMT)

Notification sent to Fulvio Scapin <trantorvega <at> gmail.com>:
bug acknowledged by developer. (Fri, 18 Mar 2016 22:31:02 GMT)

Message #31 received at 22945-done <at> debbugs.gnu.org:

From: Paul Eggert <eggert <at> cs.ucla.edu>
To: Jim Meyering <jim <at> meyering.net>
Cc: Fulvio Scapin <trantorvega <at> gmail.com>, 22945-done <at> debbugs.gnu.org
Subject: Re: bug#22945: Surprising behaviour (bug?) of zgrep in combination
 with the -f option and process substitutions
Date: Fri, 18 Mar 2016 15:30:24 -0700
[Message part 1 (text/plain, inline)]
On 03/18/2016 01:24 PM, Jim Meyering wrote:
> You are welcome to push that with changes like the following:
OK, thanks, I pushed the attached patch, which contains those changes, 
plus one more change: check for errors when writing to the temporary 
pattern file. Marking this as done.
[0001-zgrep-with-f-SPECIAL-read-SPECIAL-just-once.patch (application/x-patch, attachment)]
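The committed fix writes the patterns to a temporary file (the final message above mentions checking for errors when writing it). Here is a minimal bash sketch of that general idea, not the patch itself; zgrep_f_once is a hypothetical name. Because the pattern source is read exactly once and copied byte for byte, every grep sees the same patterns, including multiple lines and NUL bytes, which is presumably how this approach sidesteps both of Jim's concerns.

```shell
#!/usr/bin/env bash
# Hypothetical sketch, not the committed gzip patch: read the -f operand
# exactly once into a temporary file, then let every per-file grep read
# that regular-file copy instead of the original (possibly one-shot) path.
zgrep_f_once() {
  local patsrc=$1 tmp f line
  shift
  tmp=$(mktemp) || return 2
  # Copy the patterns byte for byte and give up if the copy fails.
  if ! cat -- "$patsrc" > "$tmp"; then
    rm -f -- "$tmp"
    return 2
  fi
  for f in "$@"; do
    # Prefix each match with its file name, as zgrep does for several files.
    gzip -dc -- "$f" | grep -f "$tmp" |
      while IFS= read -r line; do printf '%s:%s\n' "$f" "$line"; done
  done
  rm -f -- "$tmp"
}

# Demo: four one-line compressed files and a process-substitution pattern list.
dir=$(mktemp -d)
trap 'rm -rf -- "$dir"' EXIT
for w in aaa bbb ccc ddd; do printf '%s\n' "$w" | gzip -c > "$dir/$w.gz"; done
zgrep_f_once <(printf '%s\n' aaa bbb ccc ddd) "$dir"/*.gz   # one match per file
```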

bug archived. Request was from Debbugs Internal Request <help-debbugs <at> gnu.org> to internal_control <at> debbugs.gnu.org. (Sat, 16 Apr 2016 11:24:03 GMT)


GNU bug tracking system
Copyright (C) 1999 Darren O. Benham, 1997,2003 nCipher Corporation Ltd, 1994-97 Ian Jackson.