From debbugs-submit-bounces@debbugs.gnu.org Sat Apr 02 08:05:49 2016
Date: Sat, 2 Apr 2016 15:00:12 +0300
From: Shlomi Fish
To: bug-grep@gnu.org
Subject: GNU grep matching discrepancy between -a/--text and not.
Message-ID: <20160402150012.37fd239e@telaviv1.shlomifish.org>

Hi all, as can be seen in this repository:
https://github.com/shlomif/gnu-grep-trailing-space-and-CR-on-riddles.he-false-match

GNU grep reports that a file it suspects to be binary matches when run without -a/--text, but with that flag applied it reports no match and prints no results. perl sides with the latter.

I'm on Mageia Linux x86-64 v6 and have built GNU grep from the latest git commit (c767ed70eca9a82d76f07dcdbcaafa21ec7f86d6) to test.

Regards,

Shlomi Fish

P.S.: It seems the build system uses gperf, but configure does not check that it exists in the PATH.

-- 
-----------------------------------------------------------------
Shlomi Fish       http://www.shlomifish.org/
Interview with Ben Collins-Sussman - http://shlom.in/sussman

Can I SCO now? Sue who you wanna sue, it doesn't matter anyhoo, it's time to litigate.
    — http://www.shlomifish.org/humour/bits/Can-I-SCO-Now/

Please reply to list if it's a mailing list post - http://shlom.in/reply .

From debbugs-submit-bounces@debbugs.gnu.org Wed Apr 06 02:56:36 2016
Date: Tue, 5 Apr 2016 23:56:22 -0700
From: Paul Eggert
To: Shlomi Fish, 23185-done@debbugs.gnu.org
Subject: Re: bug#23185: GNU grep matching discrepancy between -a/--text and not.
Message-ID: <5704B316.2020601@cs.ucla.edu>
In-Reply-To: <20160402150012.37fd239e@telaviv1.shlomifish.org>

Thanks for pointing out the seeming inconsistency. The documentation mentions the issue but is perhaps not clear enough, so I installed the attached patch.

The input file contains NUL bytes and so is treated as binary data, and the grep documentation (section "File and Directory Selection", option "--binary-files") says "When processing binary data, ‘grep’ may treat non-text bytes as line terminators". This behavior was added to GNU grep in release 2.21, dated 2014, partly for performance reasons.
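The documented behavior ("non-text bytes as line terminators") can be modelled with a short Python sketch. It is not grep's implementation: it assumes, for illustration only, that NUL is the sole non-text byte promoted to a terminator in binary mode, it uses Python's `re` in place of PCRE, and it runs on a made-up input (`SAMPLE`, plus the helpers `split_records` and `matches`, are hypothetical names), not on the real riddles.he.

```python
import re

# Hypothetical input (an assumption, not the real riddles.he): the first
# "line" ends with a space immediately followed by a NUL byte.
SAMPLE = b"riddle one \x00more text\nriddle two\n"


def split_records(data: bytes, terminators: bytes) -> list[bytes]:
    """Split data into records, ending a record at any byte in `terminators`.

    The terminator bytes are dropped, roughly as grep matches each line
    without its line terminator.
    """
    records, start = [], 0
    for i, byte in enumerate(data):
        if byte in terminators:
            records.append(data[start:i])
            start = i + 1
    if start < len(data):
        records.append(data[start:])
    return records


def matches(pattern: bytes, data: bytes, terminators: bytes) -> bool:
    """Return True if `pattern` matches inside at least one record."""
    regex = re.compile(pattern)
    return any(regex.search(rec) for rec in split_records(data, terminators))


# Binary mode (as modelled here): NUL ends a record as well as newline, so
# the space before the NUL sits at the end of a record and `$` matches there.
binary = matches(rb"[ \t]\r?$", SAMPLE, b"\n\x00")

# -a / --text: only newline ends a record, so that space is mid-line.
text = matches(rb"[ \t]\r?$", SAMPLE, b"\n")

# -z / --null-data: NUL ends a record and newline is ordinary data,
# so the pattern has to spell out the newline explicitly.
null_data = matches(rb"[ \t]\r?\n", SAMPLE, b"\x00")

print(binary, text, null_data)  # True False False
```

Under the same model, `.` can never match a NUL in binary mode, because the NUL is consumed as a record boundary and never reaches the regex; that is the converse case the grep manual describes.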
There are two instances in riddles.he of a space followed by a NUL byte, so

  grep -P '[ \t]\r?$' riddles.he

finds a match when the $ matches just before the NUL byte.

-a is one way to get the behavior you evidently expected. Another (perhaps better) way is -z. The command:

  grep -zP '[ \t]\r?\n' riddles.he

outputs nothing and exits with status 1.

--------------090801030602070609010805
Content-Type: text/x-diff; name="0001-Give-another-example-of-binary-file-processing.patch"
Content-Transfer-Encoding: 7bit
Content-Disposition: attachment; filename="0001-Give-another-example-of-binary-file-processing.patch"

>From 7cfd9d20773e1a67cb085a14206fd33274c64387 Mon Sep 17 00:00:00 2001
From: Paul Eggert
Date: Tue, 5 Apr 2016 23:53:30 -0700
Subject: [PATCH] Give another example of binary file processing

Problem reported by Shlomi Fish
* doc/grep.texi (File and Directory Selection):
Document that 'q$' might match 'q' followed by a NUL
if --binary-files=binary is in effect.
---
 doc/grep.texi | 14 ++++++++++----
 1 file changed, 10 insertions(+), 4 deletions(-)

diff --git a/doc/grep.texi b/doc/grep.texi
index 074113b..1d3d5cb 100644
--- a/doc/grep.texi
+++ b/doc/grep.texi
@@ -607,10 +607,6 @@ By default, @var{type} is @samp{binary}, and when
 @command{grep} discovers that a file is binary it suppresses any
 further output, and instead outputs either a one-line message saying
 that a binary file matches, or no message if there is no match.
-When processing binary data, @command{grep} may treat non-text bytes
-as line terminators; for example, the pattern @samp{.} (period) might
-not match a null byte, as the null byte might be treated as a line
-terminator even without the @option{-z} (@option{--null-data}) option.

 If @var{type} is @samp{without-match},
 when @command{grep} discovers that a file is binary
@@ -621,6 +617,16 @@ If @var{type} is @samp{text}, @command{grep}
 processes a binary file as if it were text;
 this is equivalent to the @option{-a} option.

+When @var{type} is @samp{binary}, @command{grep} may treat non-text
+bytes as line terminators even without the @option{-z}
+(@option{--null-data}) option. This means choosing @samp{binary}
+versus @samp{text} can affect whether a pattern matches a file. For
+example, when @var{type} is @samp{binary} the pattern @samp{q$} might
+match @samp{q} immediately followed by a null byte, even though this
+is not matched when @var{type} is @samp{text}. Conversely, when
+@var{type} is @samp{binary} the pattern @samp{.} (period) might not
+match a null byte.
+
 @emph{Warning:} @samp{--binary-files=text} might output
 binary garbage, which can have nasty side effects
 if the output is a terminal and
-- 
2.5.5

--------------090801030602070609010805--

From unknown Thu Sep 11 09:18:20 2025
Date: Wed, 04 May 2016 11:24:04 +0000
From: Debbugs Internal Request
To: internal_control@debbugs.gnu.org
Subject: Internal Control
Message-Id: bug archived.

# This is a fake control message.
#
# The action:
# bug archived.
thanks
# This fakemail brought to you by your local debbugs
# administrator