From unknown Sun Jul 27 03:51:47 2025
Content-Disposition: inline
Content-Transfer-Encoding: quoted-printable
MIME-Version: 1.0
X-Mailer: MIME-tools 5.509 (Entity 5.509)
Content-Type: text/plain; charset=utf-8
From: bug#22461 <22461@debbugs.gnu.org>
To: bug#22461 <22461@debbugs.gnu.org>
Subject: Status: problem with "Binary file messages" in latest snapshot
Reply-To: bug#22461 <22461@debbugs.gnu.org>
Date: Sun, 27 Jul 2025 10:51:47 +0000

retitle 22461 problem with "Binary file messages" in latest snapshot
reassign 22461 grep
submitter 22461 Paul Eggert
severity 22461 normal
thanks

From debbugs-submit-bounces@debbugs.gnu.org Mon Jan 25 04:20:04 2016
To: grep mailing list
From: Paul Eggert
Subject: problem with "Binary file messages" in latest snapshot
Organization: UCLA Computer Science Department
Message-ID: <56A5E8AE.3090709@cs.ucla.edu>
Date: Mon, 25 Jan 2016 01:19:42 -0800
User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:38.0) Gecko/20100101 Thunderbird/38.5.1
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8; format=flowed
Content-Transfer-Encoding: 7bit

I ran into this problem when using 'grep' to search through GNU Emacs
installed files. Here's how to reproduce the problem:

$ (echo xxx && yes yyy | sed 100000q && printf '\0') >big
$ grep xxx big
xxx
Binary file big matches

The last line should not be output. I'll look into fixing this.

From debbugs-submit-bounces@debbugs.gnu.org Mon Feb 01 02:31:28 2016
To: 22461-done@debbugs.gnu.org
From: Paul Eggert
Subject: Re: problem with "Binary file messages" in latest snapshot
Organization: UCLA Computer Science Department
Message-ID: <56AF09C6.8010408@cs.ucla.edu>
Date: Sun, 31 Jan 2016 23:31:18 -0800
User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:38.0) Gecko/20100101 Thunderbird/38.5.1
MIME-Version: 1.0
Content-Type: multipart/mixed; boundary="------------080407050007030205010604"

This is a multi-part message in MIME format.
--------------080407050007030205010604
Content-Type: text/plain; charset=utf-8; format=flowed
Content-Transfer-Encoding: 7bit

I installed the attached patch, which should fix the bug, and am closing this.

--------------080407050007030205010604
Content-Type: text/x-diff; name="0001-Omit-excess-Binary-file-.-matches.patch"
Content-Transfer-Encoding: 7bit
Content-Disposition: attachment; filename="0001-Omit-excess-Binary-file-.-matches.patch"

>From 1d6609c299d2a51747c9bc9e82a399d53c54f8ea Mon Sep 17 00:00:00 2001
From: Paul Eggert
Date: Sun, 31 Jan 2016 23:29:01 -0800
Subject: [PATCH] Omit excess "Binary file ... matches"

Problem reported in: http://bugs.gnu.org/22461
* src/grep.c (grep): Don't report "Binary file ... matches" merely
because the file contained both matches and binary data.
Insist that the binary data contained a match.
* tests/null-byte: Add a test for this.
---
 src/grep.c      | 16 +++++++++++-----
 tests/null-byte |  5 +++++
 2 files changed, 16 insertions(+), 5 deletions(-)

diff --git a/src/grep.c b/src/grep.c
index 10aabf9..73c3651 100644
--- a/src/grep.c
+++ b/src/grep.c
@@ -1373,7 +1373,11 @@ grep (int fd, struct stat const *st)
   char nul_zapper = '\0';
   bool done_on_match_0 = done_on_match;
   bool out_quiet_0 = out_quiet;
-  bool has_nulls = false;
+
+  /* The value of NLINES when nulls were first deduced in the input;
+     this is not necessarily the same as the number of matching lines
+     before the first null.  -1 if no input nulls have been deduced.  */
+  intmax_t nlines_first_null = -1;
 
   if (! reset (fd, st))
     return 0;
@@ -1400,15 +1404,15 @@ grep (int fd, struct stat const *st)
 
   for (bool firsttime = true; ; firsttime = false)
     {
-      if (!has_nulls && eol && binary_files != TEXT_BINARY_FILES
+      if (nlines_first_null < 0 && eol && binary_files != TEXT_BINARY_FILES
           && (buf_has_nulls (bufbeg, buflim - bufbeg)
               || (firsttime && file_must_have_nulls (buflim - bufbeg, fd, st))))
         {
-          has_nulls = true;
           if (binary_files == WITHOUT_MATCH_BINARY_FILES)
             return 0;
           if (!count_matches)
             done_on_match = out_quiet = true;
+          nlines_first_null = nlines;
           nul_zapper = eol;
           skip_nuls = skip_empty_lines;
         }
@@ -1445,7 +1449,8 @@ grep (int fd, struct stat const *st)
       nlines += grepbuf (beg, lim);
       if (pending)
         prpending (lim);
-      if ((!outleft && !pending) || (nlines && done_on_match))
+      if ((!outleft && !pending)
+          || (done_on_match && MAX (0, nlines_first_null) < nlines))
         goto finish_grep;
     }
 
@@ -1490,7 +1495,8 @@ grep (int fd, struct stat const *st)
 finish_grep:
   done_on_match = done_on_match_0;
   out_quiet = out_quiet_0;
-  if ((has_nulls || encoding_error_output) && !out_quiet && nlines != 0)
+  if (!out_quiet && (encoding_error_output
+                     || (0 <= nlines_first_null && nlines_first_null < nlines)))
     {
       printf (_("Binary file %s matches\n"), filename);
       if (line_buffered)
diff --git a/tests/null-byte b/tests/null-byte
index 44dad92..9a76887 100755
--- a/tests/null-byte
+++ b/tests/null-byte
@@ -51,4 +51,9 @@ for left in '' a '#' '\0'; do
   done
 done
 
+(echo xxx && yes yyy | sed 100000q && printf '\0') >in || framework_failure_
+echo xxx >exp || framework_failure_
+grep xxx in >out || fail=1
+compare exp out || fail=1
+
 Exit $fail
-- 
2.5.0

--------------080407050007030205010604--

From debbugs-submit-bounces@debbugs.gnu.org Mon Feb 01 11:20:43 2016
MIME-Version: 1.0
In-Reply-To: <56AF097A.3000703@cs.ucla.edu>
References: <56AAC17C.7060307@cs.ucla.edu> <56AF097A.3000703@cs.ucla.edu>
From: Jim Meyering
Date: Mon, 1 Feb 2016 08:20:15 -0800
Subject: Re: bug#22443: Subject: new snapshot available: grep-2.22.31-8b6a
To: Paul Eggert
Cc: 22461@debbugs.gnu.org
Content-Type: text/plain; charset=UTF-8

On Sun, Jan 31, 2016 at 11:30 PM, Paul Eggert wrote:
> Jim Meyering wrote:
>>
>> I looked into it and do not see a way to fix it with reasonable cost.
>> When grep finds a NUL byte, it punts on the entire block.
>
>
> Hmm, I think of it differently. When grep finds a NUL it records that the
> file is binary but if no match has been found so far, it keeps looking for
> the first match, either in the block containing the NUL or in a later block.
> In this case grep stops reading the file only after finding a match, and it
> works correctly.
>
> The problem occurs if grep finds one or more text matches before the first
> NUL: it reports those matches, records the fact that it found them, then
> sees the NUL, then stops reading that file, and then the calling code
> notices that this was (1) a binary file that (2) contained matches, so
> outputs the "Binary file ... matches" message. This is wrong, because no
> binary data actually matched.
>
> When grep finds a NUL, it should record that the file is binary and then
> look for one more match after the NUL, then quit reading the file and report
> "Binary file ... matches" only if it found that one more match.
>
> The code is complicated by the fact that the file could also be binary
> because an output line contains an encoding error, something that's detected
> in a different part of the code.
>
> Argh, I'm taking too long to explain this. It's easier to fix than to
> explain. I installed a patch; what do you think?

Oh, I see, now. Nicely done.

I noticed only now that I replied to you off-list. Didn't mean to.
So am adding the bug email in Cc, so your explanation is recorded there.

I added one more test case:

http://git.savannah.gnu.org/cgit/grep.git/commit/?id=43f6246fe82f1

From unknown Sun Jul 27 03:51:47 2025
Received: (at fakecontrol) by fakecontrolmessage;
To: internal_control@debbugs.gnu.org
From: Debbugs Internal Request
Subject: Internal Control
Message-Id: bug archived.
Date: Tue, 01 Mar 2016 12:24:04 +0000
User-Agent: Fakemail v42.6.9

# This is a fake control message.
#
# The action:
# bug archived.
thanks
# This fakemail brought to you by your local debbugs
# administrator
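
Annotation (not part of the original thread): the change in the final
reporting condition can be checked in isolation. The sketch below copies
the two conditions from the removed and added lines of the patch into
stand-alone functions. The function names old_reports_binary,
new_reports_binary and self_check are hypothetical and do not exist in
src/grep.c; the variable names are the ones used in the patch. It models
only the test at finish_grep, not buffer handling or the early-exit test
in the read loop.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Pre-patch condition, taken from the removed line at finish_grep:
   any NUL plus any match triggers the message.  */
bool
old_reports_binary (bool has_nulls, bool encoding_error_output,
                    bool out_quiet, intmax_t nlines)
{
  return (has_nulls || encoding_error_output) && !out_quiet && nlines != 0;
}

/* Post-patch condition, taken from the added lines at finish_grep:
   NLINES_FIRST_NULL is the match count when the first NUL was deduced
   (-1 if none), so the message needs a match counted after that point.  */
bool
new_reports_binary (intmax_t nlines_first_null, bool encoding_error_output,
                    bool out_quiet, intmax_t nlines)
{
  return !out_quiet && (encoding_error_output
                        || (0 <= nlines_first_null
                            && nlines_first_null < nlines));
}

/* The reproducer from the report: one match is counted, then a NUL is
   seen in a later buffer, and no further match follows.  */
void
self_check (void)
{
  assert (old_reports_binary (true, false, false, 1));  /* spurious message */
  assert (!new_reports_binary (1, false, false, 1));    /* message omitted */
}
```

With nlines_first_null == 0 (NUL deduced before any match was counted)
and nlines == 1, new_reports_binary returns true, which corresponds to
the case described in the thread where the message is still wanted.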