GNU bug report logs - #23892
grep is not "grepping" from grep-2.23-1 (archlinux) with external fixed patterns file.


Package: grep

Reported by: Pascal <patatetom <at> gmail.com>

Date: Mon, 4 Jul 2016 13:58:02 UTC

Severity: normal

Done: Jim Meyering <jim <at> meyering.net>

Bug is archived. No further changes may be made.

Report forwarded to bug-grep <at> gnu.org:
bug#23892; Package grep. (Mon, 04 Jul 2016 13:58:02 GMT)

Acknowledgement sent to Pascal <patatetom <at> gmail.com>:
New bug report received and forwarded. Copy sent to bug-grep <at> gnu.org. (Mon, 04 Jul 2016 13:58:02 GMT)

Message #5 received at submit <at> debbugs.gnu.org:

From: Pascal <patatetom <at> gmail.com>
To: bug-grep <at> gnu.org
Subject: grep is not "grepping" from grep-2.23-1 (archlinux) with external
 fixed patterns file.
Date: Mon, 4 Jul 2016 15:57:18 +0200
Hi,

I have a large (3.3 GB) gzipped file from the NSRL whose fields are separated
by a single tab:

$ zcat nsrlfiletxt.gz | head -2
sha-1    md5    crc32    filename    filesize    productcode    opsystemcode    specialcode
000000206738748edd92c4e3d2e823896700f849    392126e756571ebf112cb1c1cdedf926    ebd105a0    i05002t2.pfb    98865    3095    win

I have a file of fixed patterns (Windows-only values of field 7, opsystemcode):

$ cat win.os
2000 sp 4
2ksp3
dos
...
xp sp2
xphomeedw/sp2
xpprofessw/sp2

My OS is:

$ uname -a
Linux arch 4.4.14-1-lts #1 SMP Fri Jun 24 21:35:25 CEST 2016 x86_64 GNU/Linux

and grep is:

$ grep --version
grep (GNU grep) 2.25
...

$ pacman -Q grep
grep 2.25-2

When I try this:

$ zcat nsrlfiletxt.gz | pv -l |
    grep --fixed-strings --file=<( sed 's;^.*$;\t&\t;' win.os ) > /opt/nsrl.windows
59,4k 0:00:00 [ 776k/s] [ <=> ]

only 59.4k lines are processed, with no error :-( !
(sed is applied to win.os so that each pattern matches only a whole field,
and pv, the pipe viewer, is used to show progress.)
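
The effect of the sed step can be illustrated on a small scale. The following
is a minimal sketch, not the reporter's actual data: the two sample rows and
the temporary pattern file are hypothetical, awk stands in for sed so that the
tab handling does not depend on GNU sed's \t escape, and -F and -f are the
short forms of --fixed-strings and --file.

```shell
# Hypothetical pattern list standing in for win.os.
patterns=$(mktemp)

# Wrap each pattern in tabs so it can only match a complete field.
printf 'dos\nxp sp2\n' | awk '{ printf "\t%s\t\n", $0 }' > "$patterns"

# Two hypothetical tab-separated rows. Only the first has "dos" as a whole
# field; the second merely contains "dos" inside a file name.
rows=$(printf 'aaa\tfile.txt\t10\tdos\t\nbbb\tdos.txt\t10\twin\t\n')

# -F: treat patterns as fixed strings; -f: read the patterns from a file.
matched=$(printf '%s\n' "$rows" | grep -F -f "$patterns")
printf '%s\n' "$matched"

rm -f "$patterns"
```

Only the first row should be printed, because the second row never contains
"dos" with a tab on both sides.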

I downgraded to grep 2.24:

# pacman -U /var/cache/pacman/pkg/grep-2.24-1-x86_64.pkg.tar.xz
...

and retried the same command:

$ zcat nsrlfiletxt.gz | pv -l |
    grep --fixed-strings --file=<( sed 's;^.*$;\t&\t;' win.os ) > /opt/nsrl.windows
59,4k 0:00:00 [ 863k/s] [ <=> ]

Again, only 59.4k lines are processed, with no error :-( !

I downgraded to grep 2.23:

# pacman -U /var/cache/pacman/pkg/grep-2.23-1-x86_64.pkg.tar.xz
...

and retried the same command:

$ zcat nsrlfiletxt.gz | pv -l |
    grep --fixed-strings --file=<( sed 's;^.*$;\t&\t;' win.os ) > /opt/nsrl.windows
59,1k 0:00:00 [ 823k/s] [ <=> ]

Only 59.1k lines are processed, with no error :-( !

I downgraded to grep 2.22:

# pacman -U /var/cache/pacman/pkg/grep-2.22-1-x86_64.pkg.tar.xz
...

and retried the same command:

$ zcat nsrlfiletxt.gz | pv -l |
    grep --fixed-strings --file=<( sed 's;^.*$;\t&\t;' win.os ) > /opt/nsrl.windows
 157M 0:04:36 [ 567k/s] [ <=> ]

All 157M lines are processed correctly :-) !

So I think a bug was introduced in grep 2.23...

Regards.

Information forwarded to bug-grep <at> gnu.org:
bug#23892; Package grep. (Mon, 04 Jul 2016 14:52:01 GMT)

Message #8 received at 23892 <at> debbugs.gnu.org:

From: Jim Meyering <jim <at> meyering.net>
To: Pascal <patatetom <at> gmail.com>
Cc: 23892 <at> debbugs.gnu.org
Subject: Re: bug#23892: grep is not "grepping" from grep-2.23-1 (archlinux)
 with external fixed patterns file.
Date: Mon, 4 Jul 2016 07:51:13 -0700
On Mon, Jul 4, 2016 at 6:57 AM, Pascal <patatetom <at> gmail.com> wrote:
> [...]
> so I think there's a bug introduced with grep 2.23...

Thank you for the report. However, I'll bet that your input is not
what POSIX calls a "text file," and your locale is neither C nor
POSIX. I.e., I'll bet the input contains a NUL byte or a sequence of
bytes that constitutes an invalid character in your locale. Either of
those would make your use of grep non-conformant. You may be able to
make your command work portably by adding grep's "-a" option or by
running grep in the C locale:

  zcat nsrlfiletxt.gz | pv -l | LC_ALL=C grep --fixed-strings --file=...

or

  zcat nsrlfiletxt.gz | pv -l | grep -a --fixed-strings --file=...

If you look at the actual output, you should see an indication of the
problem: when you have less output than expected, there should be at
least one line of the form "Binary file ... matches".
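
The NUL-byte case can be checked on a small scale. The following is a minimal
sketch under stated assumptions: the three-line sample file is hypothetical,
and counting NULs with tr and wc is just one possible diagnostic, not a step
taken from this thread. It counts the NUL bytes in the input and then shows
that -a makes grep print every matching line.

```shell
# Hypothetical sample: three lines, the middle one containing a NUL byte.
sample=$(mktemp)
printf 'foo 1\nbar\0baz\nfoo 2\n' > "$sample"

# Count NUL bytes: tr deletes every byte except NUL, wc counts what is left.
nul_count=$(tr -cd '\000' < "$sample" | wc -c)

# With -a, grep treats the input as text and prints each matching line.
text_matches=$(grep -a 'foo' "$sample")

echo "NUL bytes: $nul_count"
printf '%s\n' "$text_matches"

rm -f "$sample"
```

On the real data, the same count could be taken from the decompressed stream,
for example with zcat nsrlfiletxt.gz | tr -cd '\000' | wc -c.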




Reply sent to Jim Meyering <jim <at> meyering.net>:
You have taken responsibility. (Mon, 04 Jul 2016 20:06:01 GMT)

Notification sent to Pascal <patatetom <at> gmail.com>:
bug acknowledged by developer. (Mon, 04 Jul 2016 20:06:01 GMT)

Message #13 received at 23892-done <at> debbugs.gnu.org:

From: Jim Meyering <jim <at> meyering.net>
To: Pascal <patatetom <at> gmail.com>
Cc: 23892-done <at> debbugs.gnu.org
Subject: Re: bug#23892: grep is not "grepping" from grep-2.23-1 (archlinux)
 with external fixed patterns file.
Date: Mon, 4 Jul 2016 13:05:25 -0700
tags 23892 notabug
thanks

[I've re-added the bug-tracking address to record that this was not a
bug and that the issue auto-created by your email is closed. ]

On Mon, Jul 4, 2016 at 11:56 AM, Pascal <patatetom <at> gmail.com> wrote:
> that's right, with LANG=C before grep : all lines are processed :-)

Use LC_ALL=C, not LANG=C. The latter is not portable, while the former is.
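
One reason LANG=C can fail to take effect is the POSIX precedence among the
locale variables: LC_ALL overrides LC_CTYPE, which in turn overrides LANG. The
following is a minimal sketch of that lookup order. The helper name
effective_ctype is hypothetical, the locale name en_US.UTF-8 is only an
example value, and the fallback to C when nothing is set is an assumption
(POSIX leaves that default to the implementation).

```shell
# Resolve the locale used for character classification (LC_CTYPE).
# $1 = value of LC_ALL, $2 = value of LC_CTYPE, $3 = value of LANG.
# An empty value counts as unset; "C" as the final fallback is an assumption.
effective_ctype() {
    printf '%s\n' "${1:-${2:-${3:-C}}}"
}

# LANG=C is ignored when LC_CTYPE is already set in the environment.
with_lang=$(effective_ctype '' 'en_US.UTF-8' 'C')

# LC_ALL=C wins regardless of the other two variables.
with_lc_all=$(effective_ctype 'C' 'en_US.UTF-8' 'en_US.UTF-8')

echo "LANG=C only: $with_lang"
echo "LC_ALL=C:    $with_lc_all"
```

To inspect the current shell, the helper could be called as
effective_ctype "${LC_ALL-}" "${LC_CTYPE-}" "${LANG-}".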

> but why it was good with grep 2.22 ?

We discovered bugs in 2.22, triggered by e.g. invalid multibyte characters,
that could cause a segfault or an infinite loop. To fix them, we had to make
grep stricter.
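
Whether input contains byte sequences that are invalid in a UTF-8 locale can
be checked before running grep. The following is a minimal sketch, assuming
that iconv is installed and that the locale's encoding is UTF-8; the helper
name is_valid_utf8 and the two sample strings are hypothetical. It relies on
iconv exiting with a non-zero status when it meets an invalid sequence.

```shell
# Succeed only if standard input is entirely valid UTF-8.
is_valid_utf8() {
    iconv -f UTF-8 -t UTF-8 > /dev/null 2>&1
}

# "café" encoded as UTF-8 (bytes C3 A9 for the last letter).
if printf 'caf\303\251\n' | is_valid_utf8; then good=valid; else good=invalid; fi

# "café" encoded as Latin-1 (byte E9), which is not valid UTF-8.
if printf 'caf\351\n' | is_valid_utf8; then bad=valid; else bad=invalid; fi

echo "UTF-8 sample:   $good"
echo "Latin-1 sample: $bad"
```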




bug archived. Request was from Debbugs Internal Request <help-debbugs <at> gnu.org> to internal_control <at> debbugs.gnu.org. (Tue, 02 Aug 2016 11:24:04 GMT)