GNU bug report logs -
#23892
grep is not "grepping" from grep-2.23-1 (archlinux) with external fixed patterns file.
Reported by: Pascal <patatetom <at> gmail.com>
Date: Mon, 4 Jul 2016 13:58:02 UTC
Severity: normal
Done: Jim Meyering <jim <at> meyering.net>
Bug is archived. No further changes may be made.
Report forwarded to bug-grep <at> gnu.org: bug#23892; Package grep. (Mon, 04 Jul 2016 13:58:02 GMT)
Acknowledgement sent to Pascal <patatetom <at> gmail.com>: New bug report received and forwarded. Copy sent to bug-grep <at> gnu.org. (Mon, 04 Jul 2016 13:58:02 GMT)
Message #5 received at submit <at> debbugs.gnu.org (full text, mbox):
Hi,
I have a big (3.3 GB) gzipped file from the NSRL, with fields separated by a single tab:
$ zcat nsrlfiletxt.gz | head -2
sha-1 md5 crc32 filename filesize productcode opsystemcode specialcode
000000206738748edd92c4e3d2e823896700f849 392126e756571ebf112cb1c1cdedf926 ebd105a0 i05002t2.pfb 98865 3095 win
I have a file of fixed patterns (Windows entries only, taken from field 7, opsystemcode):
$ cat win.os
2000 sp 4
2ksp3
dos
...
xp sp2
xphomeedw/sp2
xpprofessw/sp2
My OS is:
$ uname -a
Linux arch 4.4.14-1-lts #1 SMP Fri Jun 24 21:35:25 CEST 2016 x86_64 GNU/Linux
and grep is:
$ grep --version
grep (GNU grep) 2.25
...
$ pacman -Q grep
grep 2.25-2
When I try this:
$ zcat nsrlfiletxt.gz | pv -l | grep --fixed-strings --file=<( sed 's;^.*$;\t&\t;' win.os ) > /opt/nsrl.windows
59,4k 0:00:00 [ 776k/s] [ <=> ]
only 59.4k lines are processed, with no error :-( !
(sed wraps each pattern from win.os in tabs so that it matches only a whole field, and pv (pipe viewer) is used to show progress)
I downgraded to grep 2.24:
# pacman -U /var/cache/pacman/pkg/grep-2.24-1-x86_64.pkg.tar.xz
...
and retried the same command:
$ zcat nsrlfiletxt.gz | pv -l | grep --fixed-strings --file=<( sed 's;^.*$;\t&\t;' win.os ) > /opt/nsrl.windows
59,4k 0:00:00 [ 863k/s] [ <=> ]
Again, only 59.4k lines are processed, with no error :-( !
I downgraded to grep 2.23:
# pacman -U /var/cache/pacman/pkg/grep-2.23-1-x86_64.pkg.tar.xz
...
and retried the same command:
$ zcat nsrlfiletxt.gz | pv -l | grep --fixed-strings --file=<( sed 's;^.*$;\t&\t;' win.os ) > /opt/nsrl.windows
59,1k 0:00:00 [ 823k/s] [ <=> ]
Only 59.1k lines are processed, with no error :-( !
I downgraded to grep 2.22:
# pacman -U /var/cache/pacman/pkg/grep-2.22-1-x86_64.pkg.tar.xz
...
and retried the same command:
$ zcat nsrlfiletxt.gz | pv -l | grep --fixed-strings --file=<( sed 's;^.*$;\t&\t;' win.os ) > /opt/nsrl.windows
157M 0:04:36 [ 567k/s] [ <=> ]
All 157M lines are processed correctly :-) !
So I think a bug was introduced in grep 2.23...
Regards.
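The tab-wrapping trick used in the report can be tried on a tiny data set. The following is only a sketch: the three records and the two patterns are made up for illustration, a temporary pattern file replaces the bash process substitution so that it also runs under plain sh, a literal tab is inserted through a shell variable rather than sed's `\t` escape, and `-F -f` are the short forms of `--fixed-strings --file`. It assumes, as in the NSRL layout, that the matched field is followed by another tab.

```shell
# Hypothetical miniature of the setup: two patterns, three tab-separated records.
tab=$(printf '\t')
workdir=$(mktemp -d)
printf '%s\n' 'dos' 'xp sp2' > "$workdir/win.os"
printf 'aaa\tf1.txt\t10\t1\tdos\t\n'    >  "$workdir/records"
printf 'bbb\tdos.txt\t20\t2\tlinux\t\n' >> "$workdir/records"
printf 'ccc\tf3.txt\t30\t3\txp sp2\t\n' >> "$workdir/records"

# Wrap every pattern in tabs so that it can only match a complete field.
sed "s;^.*\$;${tab}&${tab};" "$workdir/win.os" > "$workdir/patterns"

# LC_ALL=C keeps the sketch independent of the caller's locale.
matched=$(LC_ALL=C grep -F -f "$workdir/patterns" "$workdir/records" | cut -f1)
printf '%s\n' "$matched"
```

The second record is not selected even though its file name starts with "dos", because the pattern requires a tab on both sides.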
Information forwarded to bug-grep <at> gnu.org: bug#23892; Package grep. (Mon, 04 Jul 2016 14:52:01 GMT)
Message #8 received at 23892 <at> debbugs.gnu.org (full text, mbox):
On Mon, Jul 4, 2016 at 6:57 AM, Pascal <patatetom <at> gmail.com> wrote:
> so I think there's a bug introduced with grep 2.23...
Thank you for the report. However, I'll bet that your input is not
what POSIX calls a "text file," and your locale is neither C nor
POSIX. I.e., I'll bet the input contains a NUL byte or a sequence of
bytes that constitutes an invalid character in your locale. Either of
those would make your use of grep non-conformant. You may be able to
make your command work portably by adding grep's "-a" option or by
running grep in the C locale:
zcat nsrlfiletxt.gz | pv -l | LC_ALL=C grep --fixed-strings --file=...
or
zcat nsrlfiletxt.gz | pv -l | grep -a --fixed-strings --file=...
If you look at the actual output, you should see an indication of the
problem: when you have less output than expected, there should be at
least one line of the form "Binary file ... matches".
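The two conditions named above (a NUL byte, or a byte sequence that is invalid in the current locale) can be checked directly on the input. The sketch below assumes a UTF-8 locale and uses a small made-up sample file; `count_nul_bytes` and `is_valid_utf8` are hypothetical helper names, not part of grep.

```shell
# Count NUL bytes: total size minus the size once NULs are stripped.
count_nul_bytes() {
    echo $(( $(wc -c < "$1") - $(tr -d '\000' < "$1" | wc -c) ))
}

# Succeeds only if iconv can decode the whole file as UTF-8.
is_valid_utf8() {
    iconv -f UTF-8 -t UTF-8 < "$1" > /dev/null 2>&1
}

# Made-up sample: one clean line, one stray 0xFF byte, one NUL byte.
sample=$(mktemp)
printf 'ok line\n'        >  "$sample"
printf 'bad \377 byte\n'  >> "$sample"
printf 'nul \000 byte\n'  >> "$sample"

echo "NUL bytes: $(count_nul_bytes "$sample")"
if is_valid_utf8 "$sample"; then
    echo "valid UTF-8"
else
    echo "not valid UTF-8"
fi
```

For the file in the report, the same checks would be run on the decompressed stream (for example by piping `zcat nsrlfiletxt.gz` into `tr -d '\000' | wc -c`), and `tail -n 1 /opt/nsrl.windows` should show the "Binary file ... matches" line mentioned above.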
Reply sent to Jim Meyering <jim <at> meyering.net>: You have taken responsibility. (Mon, 04 Jul 2016 20:06:01 GMT)
Notification sent to Pascal <patatetom <at> gmail.com>: bug acknowledged by developer. (Mon, 04 Jul 2016 20:06:01 GMT)
Message #13 received at 23892-done <at> debbugs.gnu.org (full text, mbox):
tags 23892 notabug
thanks
[I've re-added the bug-tracking address to record that this was not a
bug and that the issue auto-created by your email is closed. ]
On Mon, Jul 4, 2016 at 11:56 AM, Pascal <patatetom <at> gmail.com> wrote:
> that's right, with LANG=C before grep, all lines are processed :-)
Use LC_ALL=C, not LANG=C. The latter is not portable, while the former is.
> but why did it work with grep 2.22?
grep 2.22 contained bugs, triggered by e.g. invalid multibyte characters,
that could cause a segfault or an infinite loop. To fix them, we had to
make grep stricter.
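One plausible reading of the LC_ALL advice is the POSIX precedence among locale variables: LC_ALL overrides the per-category variables such as LC_CTYPE, which in turn override LANG. If the environment already exports LC_CTYPE or LC_ALL, setting LANG=C alone changes nothing for grep's character handling. The sketch below only models that lookup order; `effective_ctype` is a hypothetical helper that takes the three values as arguments instead of reading the real environment.

```shell
# Model of the POSIX lookup order for the character-type locale.
# $1 = LC_ALL, $2 = LC_CTYPE, $3 = LANG; an empty string means "unset".
effective_ctype() {
    if [ -n "$1" ]; then
        printf '%s\n' "$1"
    elif [ -n "$2" ]; then
        printf '%s\n' "$2"
    elif [ -n "$3" ]; then
        printf '%s\n' "$3"
    else
        printf '%s\n' 'implementation default'
    fi
}

# LANG=C loses to an inherited LC_CTYPE:
effective_ctype '' 'en_US.UTF-8' 'C'
# LC_ALL=C wins regardless of the other two:
effective_ctype 'C' 'en_US.UTF-8' 'en_US.UTF-8'
```

On a real system, running the `locale` utility with no arguments shows the values actually in effect.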
Bug archived. Request was from Debbugs Internal Request <help-debbugs <at> gnu.org> to internal_control <at> debbugs.gnu.org. (Tue, 02 Aug 2016 11:24:04 GMT)
GNU bug tracking system
Copyright (C) 1999 Darren O. Benham,
1997,2003 nCipher Corporation Ltd,
1994-97 Ian Jackson.