From unknown Thu Sep 11 09:18:12 2025
X-Loop: help-debbugs@gnu.org
Subject: bug#23185: GNU grep matching discrepancy between -a/--text and not.
Resent-From: Shlomi Fish
Original-Sender: "Debbugs-submit"
Resent-CC: bug-grep@gnu.org
Resent-Date: Sat, 02 Apr 2016 12:06:01 +0000
Resent-Sender: help-debbugs@gnu.org
X-GNU-PR-Message: report 23185
X-GNU-PR-Package: grep
To: 23185@debbugs.gnu.org
X-Debbugs-Original-To: bug-grep@gnu.org
Date: Sat, 2 Apr 2016 15:00:12 +0300
From: Shlomi Fish
Message-ID: <20160402150012.37fd239e@telaviv1.shlomifish.org>
X-Mailer: Claws Mail 3.13.2 (GTK+ 2.24.30; x86_64-mageia-linux-gnu)
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8

Hi all,

as can be seen in this repository:

https://github.com/shlomif/gnu-grep-trailing-space-and-CR-on-riddles.he-false-match

GNU grep says a document it suspects to be binary matches without -a/--text and
doesn't match it or return results with that flag applied. perl sides with the
latter.

I'm on Mageia linux x86-64 v6 and have built GNU grep from the latest git
commit ( c767ed70eca9a82d76f07dcdbcaafa21ec7f86d6 ) to test.

Regards,

    Shlomi Fish

P.S: it seems the build system uses gperf but configure does not verify that it
exists in the path.

-- 
-----------------------------------------------------------------
Shlomi Fish       http://www.shlomifish.org/
Interview with Ben Collins-Sussman - http://shlom.in/sussman

Can I SCO now? Sue who you wanna sue, it doesn't matter anyhoo, it's time to
litigate.
    -- http://www.shlomifish.org/humour/bits/Can-I-SCO-Now/

Please reply to list if it's a mailing list post - http://shlom.in/reply .

From unknown Thu Sep 11 09:18:12 2025
MIME-Version: 1.0
X-Mailer: MIME-tools 5.505 (Entity 5.505)
X-Loop: help-debbugs@gnu.org
From: help-debbugs@gnu.org (GNU bug Tracking System)
To: Shlomi Fish
Subject: bug#23185: closed (Re: bug#23185: GNU grep matching discrepancy between -a/--text and not.)
References: <5704B316.2020601@cs.ucla.edu> <20160402150012.37fd239e@telaviv1.shlomifish.org>
X-Gnu-PR-Message: they-closed 23185
X-Gnu-PR-Package: grep
Reply-To: 23185@debbugs.gnu.org
Date: Wed, 06 Apr 2016 06:57:02 +0000
Content-Type: multipart/mixed; boundary="----------=_1459925822-21337-1"

This is a multi-part message in MIME format...
------------=_1459925822-21337-1
Content-Disposition: inline
Content-Type: text/plain; charset="utf-8"

Your bug report

#23185: GNU grep matching discrepancy between -a/--text and not.

which was filed against the grep package, has been closed.

The explanation is attached below, along with your original report.
If you require more details, please reply to 23185@debbugs.gnu.org.

-- 
23185: http://debbugs.gnu.org/cgi/bugreport.cgi?bug=23185
GNU Bug Tracking System
Contact help-debbugs@gnu.org with problems

------------=_1459925822-21337-1
Content-Type: message/rfc822
Content-Disposition: inline
Content-Transfer-Encoding: 7bit

Subject: Re: bug#23185: GNU grep matching discrepancy between -a/--text and not.
To: Shlomi Fish, 23185-done@debbugs.gnu.org
References: <20160402150012.37fd239e@telaviv1.shlomifish.org>
From: Paul Eggert
Organization: UCLA Computer Science Department
Message-ID: <5704B316.2020601@cs.ucla.edu>
Date: Tue, 5 Apr 2016 23:56:22 -0700
User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:38.0) Gecko/20100101 Thunderbird/38.6.0
MIME-Version: 1.0
In-Reply-To: <20160402150012.37fd239e@telaviv1.shlomifish.org>
Content-Type: multipart/mixed; boundary="------------090801030602070609010805"

This is a multi-part message in MIME format.
--------------090801030602070609010805
Content-Type: text/plain; charset=utf-8; format=flowed

Thanks for pointing out the seeming inconsistency. The documentation mentions
the issue but is perhaps not clear enough, so I installed the attached patch.

The input file contains NUL bytes and so is treated as binary data, and the grep
documentation (section "File and Directory Selection", option "--binary-files")
says "When processing binary data, ‘grep’ may treat non-text bytes as line
terminators". This behavior was added to GNU grep in release 2.21 dated 2014,
partly for performance reasons.

There are two instances in riddles.he of a space followed by a NUL byte, so

    grep -P '[ \t]\r?$' riddles.he

finds a match when the $ matches just before the NUL byte.

-a is one way to get the behavior you evidently expected. Another (perhaps
better) way is -z.
The command:

    grep -zP '[ \t]\r?\n' riddles.he

outputs nothing and exits with status 1.

--------------090801030602070609010805
Content-Type: text/x-diff;
 name="0001-Give-another-example-of-binary-file-processing.patch"
Content-Transfer-Encoding: 7bit
Content-Disposition: attachment;
 filename*0="0001-Give-another-example-of-binary-file-processing.patch"

>From 7cfd9d20773e1a67cb085a14206fd33274c64387 Mon Sep 17 00:00:00 2001
From: Paul Eggert
Date: Tue, 5 Apr 2016 23:53:30 -0700
Subject: [PATCH] Give another example of binary file processing

Problem reported by Shlomi Fish
* doc/grep.texi (File and Directory Selection):
Document that 'q$' might match 'q' followed by a NUL if
--binary-files=binary is in effect.
---
 doc/grep.texi | 14 ++++++++++----
 1 file changed, 10 insertions(+), 4 deletions(-)

diff --git a/doc/grep.texi b/doc/grep.texi
index 074113b..1d3d5cb 100644
--- a/doc/grep.texi
+++ b/doc/grep.texi
@@ -607,10 +607,6 @@ By default, @var{type} is @samp{binary}, and when @command{grep} discovers
 that a file is binary it suppresses any further output, and instead
 outputs either a one-line message saying that a binary file matches, or
 no message if there is no match.
-When processing binary data, @command{grep} may treat non-text bytes
-as line terminators; for example, the pattern @samp{.} (period) might
-not match a null byte, as the null byte might be treated as a line
-terminator even without the @option{-z} (@option{--null-data}) option.
 
 If @var{type} is @samp{without-match},
 when @command{grep} discovers that a file is binary
@@ -621,6 +617,16 @@ If @var{type} is @samp{text},
 @command{grep} processes a binary file as if it were text;
 this is equivalent to the @option{-a} option.
 
+When @var{type} is @samp{binary}, @command{grep} may treat non-text
+bytes as line terminators even without the @option{-z}
+(@option{--null-data}) option.  This means choosing @samp{binary}
+versus @samp{text} can affect whether a pattern matches a file.  For
+example, when @var{type} is @samp{binary} the pattern @samp{q$} might
+match @samp{q} immediately followed by a null byte, even though this
+is not matched when @var{type} is @samp{text}.  Conversely, when
+@var{type} is @samp{binary} the pattern @samp{.} (period) might not
+match a null byte.
+
 @emph{Warning:} @samp{--binary-files=text} might output binary garbage,
 which can have nasty side effects if the output is a terminal and
-- 
2.5.5

--------------090801030602070609010805--

------------=_1459925822-21337-1--
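
The discrepancy discussed in this thread can probably be reproduced without
riddles.he. The following is a minimal sketch, not taken from the thread: the
temporary file and its contents ("foo", a space, a NUL byte, "bar") are made up
for illustration, and it assumes GNU grep 2.21 or later. It uses the plain
pattern ' $' rather than the -P pattern from the report so that PCRE support
is not needed.

```shell
#!/bin/sh
# Hypothetical test file: one line with a space immediately followed by NUL.
tmp=$(mktemp)
printf 'foo \000bar\n' > "$tmp"

# Default mode (--binary-files=binary): grep may treat the NUL byte as a
# line terminator, so ' $' can match the space just before it.
binary_status=0
LC_ALL=C grep -q ' $' "$tmp" || binary_status=$?

# With -a the NUL byte is ordinary data, so no line ends in a space.
text_status=0
LC_ALL=C grep -a -q ' $' "$tmp" || text_status=$?

rm -f "$tmp"
echo "default: exit $binary_status; -a: exit $text_status"
```

On such a grep the default run should exit with status 0 (match) and the -a
run with status 1 (no match). The manual only says grep "may" treat non-text
bytes as line terminators, so other versions or implementations could differ.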