Subject: bug#23892: grep is not "grepping" from grep-2.23-1 (archlinux) with external fixed patterns file.
From: Pascal
To: bug-grep@gnu.org
Date: Mon, 4 Jul 2016 15:57:18 +0200

hi,

I have a big (3.3 GB) gzipped file from the NSRL, with fields separated by a single tab:

$ zcat nsrlfiletxt.gz | head -2
sha-1 md5 crc32 filename filesize productcode opsystemcode specialcode
000000206738748edd92c4e3d2e823896700f849 392126e756571ebf112cb1c1cdedf926 ebd105a0 i05002t2.pfb 98865 3095 win

I also have a file of fixed patterns, holding the Windows-only values of field 7 (opsystemcode):

$ cat win.os
2000 sp 4
2ksp3
dos
...
xp sp2
xphomeedw/sp2
xpprofessw/sp2

My OS is:

$ uname -a
Linux arch 4.4.14-1-lts #1 SMP Fri Jun 24 21:35:25 CEST 2016 x86_64 GNU/Linux

and grep is:

$ grep --version
grep (GNU grep) 2.25
...

$ pacman -Q grep
grep 2.25-2

When I run this:

$ zcat nsrlfiletxt.gz | pv -l | grep --fixed-strings --file=<( sed 's;^.*$;\t&\t;' win.os ) > /opt/nsrl.windows
59,4k 0:00:00 [ 776k/s] [ <=> ]

only 59.4k lines are processed, and no error is reported :-(
(sed wraps each line of win.os in tabs so that a pattern only matches a whole field, and pv (Pipe Viewer) shows the progress.)

I downgraded to grep 2.24:

# pacman -U /var/cache/pacman/pkg/grep-2.24-1-x86_64.pkg.tar.xz
...

and ran the same command again:

$ zcat nsrlfiletxt.gz | pv -l | grep --fixed-strings --file=<( sed 's;^.*$;\t&\t;' win.os ) > /opt/nsrl.windows
59,4k 0:00:00 [ 863k/s] [ <=> ]

Again, only 59.4k lines are processed, with no error :-(

I downgraded to grep 2.23:

# pacman -U /var/cache/pacman/pkg/grep-2.23-1-x86_64.pkg.tar.xz
...

and ran the same command again:

$ zcat nsrlfiletxt.gz | pv -l | grep --fixed-strings --file=<( sed 's;^.*$;\t&\t;' win.os ) > /opt/nsrl.windows
59,1k 0:00:00 [ 823k/s] [ <=> ]

Only 59.1k lines are processed, with no error :-(

I downgraded to grep 2.22:

# pacman -U /var/cache/pacman/pkg/grep-2.22-1-x86_64.pkg.tar.xz
...

and ran the same command again:

$ zcat nsrlfiletxt.gz | pv -l | grep --fixed-strings --file=<( sed 's;^.*$;\t&\t;' win.os ) > /opt/nsrl.windows
157M 0:04:36 [ 567k/s] [ <=> ]

All 157M lines are processed :-)

So I think a bug was introduced in grep 2.23...

regards.
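You can check the pattern-file trick from the report (sed wraps each line of win.os in tabs before passing it to "grep --fixed-strings --file") on a tiny sample. The sketch below assumes GNU sed, because POSIX sed does not promise that "\t" means a tab. The three-line win.os, the two records and the temporary directory are made up for illustration. A temporary file replaces bash's "<( ... )" process substitution.

```shell
#!/bin/sh
# Sketch: what the sed step in the report does to a fixed-pattern file.
# Assumes GNU sed (for \t). win.os and sample.tsv are hypothetical stand-ins.
tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT

# A cut-down, hypothetical pattern file.
printf '2ksp3\ndos\nxp sp2\n' > "$tmp/win.os"

# Wrap every pattern in tabs (same sed command as the report), so a
# pattern can only match a complete tab-delimited field.
sed 's;^.*$;\t&\t;' "$tmp/win.os" > "$tmp/win.patterns"

# Two hypothetical records: "dos" is a whole field in the first one,
# but only part of "dosbox.exe" in the second.
printf 'aaa\tgame.exe\t12\tdos\t\nbbb\tdosbox.exe\t34\tlinux\t\n' > "$tmp/sample.tsv"

# Bare patterns match substrings, so both records match: prints 2
grep -c --fixed-strings --file="$tmp/win.os" "$tmp/sample.tsv"

# Tab-wrapped patterns match only the first record
grep --fixed-strings --file="$tmp/win.patterns" "$tmp/sample.tsv"
```

This approach has two consequences. First, a wrapped pattern matches its value in any column, not only in field 7. Second, it cannot match a value in the last column unless the line ends with a tab. In the NSRL header, opsystemcode is followed by specialcode, so field 7 is not the last column.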
From: Jim Meyering
To: Pascal
Cc: 23892@debbugs.gnu.org
Date: Mon, 4 Jul 2016 07:51:13 -0700
Subject: bug#23892: grep is not "grepping" from grep-2.23-1 (archlinux) with external fixed patterns file.

On Mon, Jul 4, 2016 at 6:57 AM, Pascal wrote:
> hi,
>
> I've a big (3.3Go) gzipped file which comes from nsrl with fields separated
> by one tabulation :
> [...]
> so I think there's a bug introduced with grep 2.23...

Thank you for the report. However, I'll bet that your input is not what POSIX calls a "text file," and your locale is neither C nor POSIX. I.e., I'll bet the input contains a NUL byte or a sequence of bytes that constitutes an invalid character in your locale. Either of those would make your use of grep non-conformant.

You may be able to make your command work portably by adding grep's "-a" option or by running grep in the C locale:

  zcat nsrlfiletxt.gz | pv -l | LC_ALL=C grep --fixed-strings --file=...

or

  zcat nsrlfiletxt.gz | pv -l | grep -a --fixed-strings --file=...

If you look at the actual output, you should see an indication of the problem: when you have less output than expected, there should be at least one line of the form "Binary file ... matches".

From: help-debbugs@gnu.org (GNU bug Tracking System)
To: Pascal
Date: Mon, 04 Jul 2016 20:06:01 +0000
Subject: bug#23892: closed (Re: bug#23892: grep is not "grepping" from grep-2.23-1 (archlinux) with external fixed patterns file.)

Your bug report

#23892: grep is not "grepping" from grep-2.23-1 (archlinux) with external fixed patterns file.

which was filed against the grep package, has been closed.

The explanation is attached below, along with your original report.
If you require more details, please reply to 23892@debbugs.gnu.org.
-- 
23892: http://debbugs.gnu.org/cgi/bugreport.cgi?bug=23892
GNU Bug Tracking System
Contact help-debbugs@gnu.org with problems

From: Jim Meyering
To: Pascal
Cc: 23892-done@debbugs.gnu.org
Date: Mon, 4 Jul 2016 13:05:25 -0700
Subject: Re: bug#23892: grep is not "grepping" from grep-2.23-1 (archlinux) with external fixed patterns file.

tags 23892 notabug
thanks

[I've re-added the bug-tracking address to record that this was not a bug and that the issue auto-created by your email is closed.]

On Mon, Jul 4, 2016 at 11:56 AM, Pascal wrote:
> that's right, with LANG=C before grep : all lines are processed :-)

Use LC_ALL=C, not LANG=C. The latter is not portable, while the former is.

> but why it was good with grep 2.22 ?

We discovered bugs in 2.22, triggered for example by invalid multibyte characters, that could cause a segfault or an infinite loop. To fix them, we had to make grep stricter.
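You can reproduce Jim's diagnosis on a tiny synthetic input instead of the 3.3 GB NSRL file. This is a minimal sketch that assumes GNU grep 2.23 or later. The sample bytes are hypothetical: a NUL, and a Latin-1 e-acute (octal 351), which is invalid UTF-8. The function names are hypothetical too.

Without "-a", grep switches to binary mode on such input. It prints a single "binary file matches" notice at the first match and stops reading. Grep 2.23 through 3.4 write that notice to standard output, which the report redirected to /opt/nsrl.windows. That is presumably why no error appeared on the terminal, and why pv counted only about 59k lines. Grep 3.5 and later write the notice to standard error instead.

```shell
#!/bin/sh
# Sketch: reproduce grep's binary-file behaviour on tiny synthetic input.
# Sample data and function names are hypothetical.

# A line containing a NUL byte, then an ordinary line.
nul_sample() { printf 'x\000y\tneedle\nplain\tneedle\n'; }

# A Latin-1 e-acute (octal 351): an encoding error in a UTF-8 locale,
# but an ordinary byte in the C locale.
latin1_sample() { printf 'caf\351\tneedle\n'; }

# Count the NUL bytes on stdin: delete every other byte, count what is left.
count_nuls() { tr -d -c '\000' | wc -c | tr -d ' '; }

# Print (with line numbers) the lines on stdin that contain an encoding error
# in locale $1. "." never matches an invalid byte, so -x '.*' fails on those
# lines; -v keeps exactly them; -a stops grep from switching to binary mode.
bad_lines() { LC_ALL="$1" grep -n -a -x -v '.*'; }

# Workaround 1 from the reply: -a forces text mode, so both lines are printed.
nul_sample | grep -a needle

# Workaround 2 from the reply: in the C locale every byte is a valid
# character, so the Latin-1 line is plain text. (A NUL would still make
# grep treat the input as binary here; that case needs -a.)
latin1_sample | LC_ALL=C grep needle
```

On the real data, "zcat nsrlfiletxt.gz | count_nuls" would count the NUL bytes. "zcat nsrlfiletxt.gz | bad_lines fr_FR.UTF-8 | head" would show the first lines that are invalid in a UTF-8 locale; fr_FR.UTF-8 is only an example, so use a UTF-8 locale listed by "locale -a". LANG=C let all lines through, which suggests the file contains encoding errors rather than NUL bytes, though the thread does not check this.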
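The POSIX lookup order for locale categories gives one concrete reason to prefer LC_ALL=C over LANG=C. A non-empty LC_ALL wins. Next comes the category's own variable: LC_CTYPE governs character classification, which decides whether a byte sequence is a valid character. LANG comes last. So LANG=C has no effect on grep's notion of a valid character if LC_ALL or LC_CTYPE is already set. The sketch below only models that rule in shell and does not query the C library. It also ignores variables that name a locale that is not installed. The function name and the fr_FR.UTF-8 values are hypothetical.

```shell
#!/bin/sh
# Model of the POSIX precedence for the LC_CTYPE category.
# Arguments: the values of LC_ALL, LC_CTYPE and LANG ("" means unset).
# Hypothetical helper; it does not query the C library.
effective_ctype() {
  lc_all=$1 lc_ctype=$2 lang=$3
  # A null value counts as unset; with nothing set the default is
  # implementation-defined (glibc uses the C locale).
  printf '%s\n' "${lc_all:-${lc_ctype:-${lang:-C}}}"
}

# LANG=C loses to an LC_CTYPE already in the environment: prints fr_FR.UTF-8
effective_ctype "" fr_FR.UTF-8 C

# LC_ALL=C overrides everything else: prints C
effective_ctype C fr_FR.UTF-8 fr_FR.UTF-8

# The value that applies in the current shell
effective_ctype "${LC_ALL-}" "${LC_CTYPE-}" "${LANG-}"
```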