From unknown Sat Jun 21 12:14:34 2025 X-Loop: help-debbugs@gnu.org Subject: bug#30326: grep not searching through a text file (thinking it binary) Resent-From: "L. A. Walsh" Original-Sender: "Debbugs-submit" Resent-CC: bug-grep@gnu.org Resent-Date: Fri, 02 Feb 2018 19:31:02 +0000 Resent-Message-ID: Resent-Sender: help-debbugs@gnu.org X-GNU-PR-Message: report 30326 X-GNU-PR-Package: grep X-GNU-PR-Keywords: To: 30326@debbugs.gnu.org X-Debbugs-Original-To: bug-grep@gnu.org Received: via spool by submit@debbugs.gnu.org id=B.15175998304547 (code B ref -1); Fri, 02 Feb 2018 19:31:02 +0000 Received: (at submit) by debbugs.gnu.org; 2 Feb 2018 19:30:30 +0000 Received: from localhost ([127.0.0.1]:54554 helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1ehh2b-0001BH-Vd for submit@debbugs.gnu.org; Fri, 02 Feb 2018 14:30:30 -0500 Received: from eggs.gnu.org ([208.118.235.92]:58930) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1ehh2a-0001B5-DG for submit@debbugs.gnu.org; Fri, 02 Feb 2018 14:30:28 -0500 Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71) (envelope-from ) id 1ehh2U-0000DJ-F1 for submit@debbugs.gnu.org; Fri, 02 Feb 2018 14:30:23 -0500 X-Spam-Checker-Version: SpamAssassin 3.3.2 (2011-06-06) on eggs.gnu.org X-Spam-Level: X-Spam-Status: No, score=-1.9 required=5.0 tests=BAYES_00 autolearn=disabled version=3.3.2 Received: from lists.gnu.org ([2001:4830:134:3::11]:51638) by eggs.gnu.org with esmtps (TLS1.0:RSA_AES_256_CBC_SHA1:32) (Exim 4.71) (envelope-from ) id 1ehh2U-0000D5-C2 for submit@debbugs.gnu.org; Fri, 02 Feb 2018 14:30:22 -0500 Received: from eggs.gnu.org ([2001:4830:134:3::10]:49873) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1ehh2S-0004KL-UH for bug-grep@gnu.org; Fri, 02 Feb 2018 14:30:22 -0500 Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71) (envelope-from ) id 1ehh2O-0000Ag-1U for bug-grep@gnu.org; Fri, 02 Feb 2018 14:30:20 -0500 Received: 
from ishtar.tlinx.org ([173.164.175.65]:56832 helo=Ishtar.sc.tlinx.org) by eggs.gnu.org with esmtps (TLS1.0:DHE_RSA_AES_256_CBC_SHA1:32) (Exim 4.71) (envelope-from ) id 1ehh2N-00007t-Gr for bug-grep@gnu.org; Fri, 02 Feb 2018 14:30:15 -0500 Received: from [192.168.3.12] (Athenae [192.168.3.12]) by Ishtar.sc.tlinx.org (8.14.7/8.14.4/SuSE Linux 0.8) with ESMTP id w12JU8FQ032278 for ; Fri, 2 Feb 2018 11:30:10 -0800 Message-ID: <5A74BC3F.1030401@tlinx.org> Date: Fri, 02 Feb 2018 11:30:07 -0800 From: "L. A. Walsh" User-Agent: Thunderbird MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8; format=flowed Content-Transfer-Encoding: 7bit X-detected-operating-system: by eggs.gnu.org: GNU/Linux 2.2.x-3.x (no timestamps) [generic] [fuzzy] X-detected-operating-system: by eggs.gnu.org: GNU/Linux 2.6.x X-Received-From: 2001:4830:134:3::11 X-Spam-Score: -5.0 (-----) X-BeenThere: debbugs-submit@debbugs.gnu.org X-Mailman-Version: 2.1.18 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: debbugs-submit-bounces@debbugs.gnu.org Sender: "Debbugs-submit" X-Spam-Score: -5.0 (-----)

I've used grep to search through my mbox-format emails for decades, but I've run into a case where it seems to ignore a text mailbox because, I guess, it thinks it is "binary" (I think ignoring binary is a default in my aliases file).

I used:

> grep -Pr 'Game:\s+NCSOFT' *

and it ignored a mailbox named 'Domain' that contained the string:

" =E2=80=A2=09Game: NCSOFT"

> file Domain
Domain: Non-ISO extended-ASCII text, with very long lines

If I used "-Par" it finds it.

It seems that grep believes the file to be binary and ignores it, though "file" calls it "text".

Any ideas?

grep -V
grep (GNU grep) 2.21.31-adf9

Maybe grep is being a bit overzealous in calling files 'binary'?

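A hedged diagnostic sketch for a report like the one above: `file` and grep can disagree because they answer different questions, so it may help to check the current locale's encoding and whether the mailbox decodes cleanly in it. This assumes the mismatch is an encoding one, that the locale is UTF-8, and that `iconv` is installed; the file names, sample contents, and the `is_utf8` helper are invented for illustration and are not from the report.

```shell
#!/bin/sh
# Hypothetical diagnostic; file names and contents are invented for illustration.
tmp=$(mktemp -d)
printf 'Game:\tNCSOFT\n' > "$tmp/ascii_box"
# \267 is a lone 8-bit byte (0xB7), which is not a valid UTF-8 sequence by itself.
printf 'Game:\tNCSOFT \267 stray byte\n' > "$tmp/latin1_box"

# Report the character encoding of the current locale, if `locale` is available.
locale charmap 2>/dev/null || echo "charmap unknown"

# is_utf8 FILE: prints "valid" if iconv can decode FILE as UTF-8, else "invalid".
is_utf8() {
    if iconv -f UTF-8 -t UTF-8 "$1" >/dev/null 2>&1; then
        echo valid
    else
        echo invalid
    fi
}

ascii_result=$(is_utf8 "$tmp/ascii_box")
latin1_result=$(is_utf8 "$tmp/latin1_box")
echo "ascii_box: $ascii_result"
echo "latin1_box: $latin1_result"
rm -rf "$tmp"
```

A file that fails this check while the locale charmap is UTF-8 is a plausible candidate for the behavior described in the report, although grep's own classification logic is not reproduced here.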
From debbugs-submit-bounces@debbugs.gnu.org Fri Feb 02 14:55:10 2018 Received: (at control) by debbugs.gnu.org; 2 Feb 2018 19:55:10 +0000 Received: from localhost ([127.0.0.1]:54577 helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1ehhQU-0001jn-9R for submit@debbugs.gnu.org; Fri, 02 Feb 2018 14:55:10 -0500 Received: from mx1.redhat.com ([209.132.183.28]:49024) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1ehhQS-0001jV-CJ; Fri, 02 Feb 2018 14:55:08 -0500 Received: from smtp.corp.redhat.com (int-mx04.intmail.prod.int.phx2.redhat.com [10.5.11.14]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by mx1.redhat.com (Postfix) with ESMTPS id A2C2237E7B; Fri, 2 Feb 2018 19:55:02 +0000 (UTC) Received: from [10.10.120.51] (ovpn-120-51.rdu2.redhat.com [10.10.120.51]) by smtp.corp.redhat.com (Postfix) with ESMTP id E451B67DEF; Fri, 2 Feb 2018 19:55:01 +0000 (UTC) Subject: Re: bug#30326: grep not searching through a text file (thinking it binary) To: "L. A. Walsh" , 30326-done@debbugs.gnu.org, GNU bug control References: <5A74BC3F.1030401@tlinx.org> From: Eric Blake Openpgp: url=http://people.redhat.com/eblake/eblake.gpg Organization: Red Hat, Inc. 
Message-ID: <2c00563c-9347-c596-4ade-a87bd9262ca1@redhat.com> Date: Fri, 2 Feb 2018 13:55:00 -0600 User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:52.0) Gecko/20100101 Thunderbird/52.5.2 MIME-Version: 1.0 In-Reply-To: <5A74BC3F.1030401@tlinx.org> Content-Type: multipart/signed; micalg=pgp-sha256; protocol="application/pgp-signature"; boundary="rkTFVqYNTxOzqI7zanAw7m6dBglPT1hp8" X-Scanned-By: MIMEDefang 2.79 on 10.5.11.14 X-Greylist: Sender IP whitelisted, not delayed by milter-greylist-4.5.16 (mx1.redhat.com [10.5.110.29]); Fri, 02 Feb 2018 19:55:02 +0000 (UTC) X-Spam-Score: -5.0 (-----) X-Debbugs-Envelope-To: control X-BeenThere: debbugs-submit@debbugs.gnu.org X-Mailman-Version: 2.1.18 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: debbugs-submit-bounces@debbugs.gnu.org Sender: "Debbugs-submit" X-Spam-Score: -5.0 (-----)

This is an OpenPGP/MIME signed message (RFC 4880 and 3156)
--rkTFVqYNTxOzqI7zanAw7m6dBglPT1hp8
Content-Type: multipart/mixed; boundary="ovvDdxf6o9ny5GxWSiJtLenKE8BfnrqTP"; protected-headers="v1" From: Eric Blake To: "L. A. Walsh" , 30326-done@debbugs.gnu.org, GNU bug control Message-ID: <2c00563c-9347-c596-4ade-a87bd9262ca1@redhat.com> Subject: Re: bug#30326: grep not searching through a text file (thinking it binary) References: <5A74BC3F.1030401@tlinx.org> In-Reply-To: <5A74BC3F.1030401@tlinx.org>

--ovvDdxf6o9ny5GxWSiJtLenKE8BfnrqTP
Content-Type: text/plain; charset=utf-8 Content-Language: en-US Content-Transfer-Encoding: quoted-printable

tag 30326 notabug
thanks

On 02/02/2018 01:30 PM, L. A. Walsh wrote:
> I've used grep to search through my mbox-format emails for decades, but
> I've run into a case where it seems to ignore a text mailbox
> because, I guess, it thinks it is "binary"

Yes, that's correct.

> If I used "-Par" it finds it.

Yes, that's also correct.

>
> It seems that grep believes the file to be binary and ignores it, though
> "file" calls it "text".

The file is conditionally text. The POSIX definition of a text file is one whose lines consist of valid characters in the current locale - but note this definition is locale-dependent! So a file that is text under one locale may be binary under another. When you are grepping a file encoded correctly for the current locale, you get the output you want; when you are grepping a file that contains encoding errors for the current locale, POSIX says behavior is undefined, so GNU grep warns you that the file is binary (in the current locale); and your use of -a tells grep to process it anyways. As 'file' reported that your file was using non-ISO extended-ASCII, it probably means the file was encoded for an 8-bit single-byte locale; and my guess is that you were running grep under a UTF-8 locale, and generally, UTF-8 treats 8-bit single-byte inputs as encoding errors. Hence the warning that your file is binary, under the current locale.

You can also use 'LC_ALL=C grep' to force a locale where EVERY byte is a valid character, and thus where you will never encounter encoding errors (you may encounter OTHER things that make your file binary, such as embedded NULs, but that's a different matter).

This behavior is documented and intentional, so I'm closing this as not a bug in the tracker. However, feel free to add further comments or questions to the thread.

And perhaps we could tweak the grep diagnostics to clarify whether a file is binary because NUL bytes were encountered, vs. a file is binary because encoding errors were encountered.

--
Eric Blake, Principal Software Engineer Red Hat, Inc.
+1-919-301-3266 Virtualization: qemu.org | libvirt.org --ovvDdxf6o9ny5GxWSiJtLenKE8BfnrqTP-- --rkTFVqYNTxOzqI7zanAw7m6dBglPT1hp8 Content-Type: application/pgp-signature; name="signature.asc" Content-Description: OpenPGP digital signature Content-Disposition: attachment; filename="signature.asc" -----BEGIN PGP SIGNATURE----- Comment: Public key at http://people.redhat.com/eblake/eblake.gpg Comment: Using GnuPG with Thunderbird - http://www.enigmail.net/ iQEzBAEBCAAdFiEEccLMIrHEYCkn0vOqp6FrSiUnQ2oFAlp0whQACgkQp6FrSiUn Q2pHpAgAlrJkU2Jcq63V8zZctfKtgIEUI1iG89qQ+PyAxKEslc9TMby25AhfsMNu UXutMh0Vxd/hriLKpG5G6fp2gNtcXSvyAE3J4wP32dcGhbg4TX4sDDmIxtmqK2UU kDYv+9T5bFL4M2s80ZcouSLqciEkWSMG7oOptexj/OpNnWAF5ndYu898dQQ3XQd9 VOrlIwFsUB4+pEoN9pN1AQXWHFEiak+rNPeej+j2c8bNNAvNuQ4Yd+Ggjiv5APpK 3Tmxs3cEwVUx+Zz1mXx6QhxTW6bDG0G9hopAGCSWfcVlXM2Jrbm5Ex0Uf9uyd0tX KjNRNY50l4zkD/wZ2ZVrynHtLCoYkQ== =F0gd -----END PGP SIGNATURE----- --rkTFVqYNTxOzqI7zanAw7m6dBglPT1hp8--

From unknown Sat Jun 21 12:14:34 2025 MIME-Version: 1.0 X-Mailer: MIME-tools 5.505 (Entity 5.505) X-Loop: help-debbugs@gnu.org From: help-debbugs@gnu.org (GNU bug Tracking System) To: "L. A. Walsh" Subject: bug#30326: closed (Re: bug#30326: grep not searching through a text file (thinking it binary)) Message-ID: References: <2c00563c-9347-c596-4ade-a87bd9262ca1@redhat.com> <5A74BC3F.1030401@tlinx.org> X-Gnu-PR-Message: they-closed 30326 X-Gnu-PR-Package: grep X-Gnu-PR-Keywords: notabug Reply-To: 30326@debbugs.gnu.org Date: Fri, 02 Feb 2018 19:56:02 +0000 Content-Type: multipart/mixed; boundary="----------=_1517601362-6764-1"

This is a multi-part message in MIME format...
------------=_1517601362-6764-1
Content-Disposition: inline Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8"

Your bug report #30326: grep not searching through a text file (thinking it binary) which was filed against the grep package, has been closed. The explanation is attached below, along with your original report. If you require more details, please reply to 30326@debbugs.gnu.org.

--
30326: http://debbugs.gnu.org/cgi/bugreport.cgi?bug=30326
GNU Bug Tracking System
Contact help-debbugs@gnu.org with problems
------------=_1517601362-6764-1--

From unknown Sat Jun 21 12:14:34 2025 X-Loop: help-debbugs@gnu.org Subject: bug#30326: grep not searching through a text file (thinking it binary) Resent-From: L A Walsh Original-Sender: "Debbugs-submit" Resent-CC: bug-grep@gnu.org Resent-Date: Fri, 02 Feb 2018 20:10:02 +0000 Resent-Message-ID: Resent-Sender: help-debbugs@gnu.org X-GNU-PR-Message: followup 30326 X-GNU-PR-Package: grep X-GNU-PR-Keywords: notabug Cc: 30326-done@debbugs.gnu.org, GNU bug control Received: via spool by 30326-done@debbugs.gnu.org id=D30326.151760216815369 (code D ref 30326); Fri, 02 Feb 2018 20:10:02 +0000 Received: (at 30326-done) by debbugs.gnu.org; 2 Feb 2018 20:09:28 +0000 Received: from localhost ([127.0.0.1]:54601 helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1ehheK-0003zk-3L for submit@debbugs.gnu.org; Fri, 02 Feb 2018 15:09:28 -0500 Received: from ishtar.tlinx.org ([173.164.175.65]:42360 helo=Ishtar.sc.tlinx.org) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1ehheI-0003zX-QQ; Fri, 02 Feb 2018 15:09:27 -0500 Received: from [192.168.3.12] (Athenae [192.168.3.12]) by Ishtar.sc.tlinx.org (8.14.7/8.14.4/SuSE Linux 0.8) with ESMTP id w12K9NRQ036026; Fri, 2 Feb 2018 12:09:25 -0800 Message-ID: <5A74C573.7030101@tlinx.org> Date: Fri, 02 Feb 2018 12:09:23 -0800 From: L A Walsh User-Agent: Thunderbird MIME-Version: 1.0 References: <5A74BC3F.1030401@tlinx.org> <2c00563c-9347-c596-4ade-a87bd9262ca1@redhat.com> In-Reply-To: <2c00563c-9347-c596-4ade-a87bd9262ca1@redhat.com> Content-Type: text/plain; charset=UTF-8; format=flowed Content-Transfer-Encoding: 7bit X-Spam-Score: 1.2 (+) X-Spam-Report: Spam detection software, running on the system "debbugs.gnu.org", has NOT identified this incoming email as spam. The original message has been attached to this so you can view it or label similar future email. If you have any questions, see the administrator of that system for details.
 Content preview: Grep was around long before POSIX, as were most of the unix utils. Grep was able to find text strings in mboxes without a POSIX definition telling it that it was "broken". I don't want it displaying random binary that throws my terminal into weird modes, which is why I skip binary files. To have grep searching through some mailboxes while skipping others, randomly based on what email happens to be in the box at the time, is hardly a useful utility. [...]
 Content analysis details: (1.2 points, 10.0 required)
  pts rule name              description
 ---- ---------------------- --------------------------------------------------
 -0.0 T_RP_MATCHES_RCVD      Envelope sender domain matches handover relay domain
  1.2 MISSING_HEADERS        Missing To: header
X-BeenThere: debbugs-submit@debbugs.gnu.org X-Mailman-Version: 2.1.18 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: debbugs-submit-bounces@debbugs.gnu.org Sender: "Debbugs-submit"

Grep was around long before POSIX, as were most of the unix utils. Grep was able to find text strings in mboxes without a POSIX definition telling it that it was "broken".

I don't want it displaying random binary that throws my terminal into weird modes, which is why I skip binary files.

To have grep searching through some mailboxes while skipping others, randomly based on what email happens to be in the box at the time, is hardly a useful utility.

I did not ask for POSIXLY_CORRECT -- if you need to have it be POSIXLY Correct, then use the existing var, but grep is now broken -- since POSIX doesn't define "text" files "out in the real world", but only for files that adhere to the POSIX standard. People don't write emails that adhere to the POSIX standard.

Also, FWIW, grep's manpage doesn't say it is limited to posix-only files. Its summary says:

  grep, egrep, fgrep - print lines matching a pattern

which it does not do. It doesn't say "print lines matching a pattern only from POSIX text files".

Eric Blake wrote:
> tag 30326 notabug
> thanks
>
> On 02/02/2018 01:30 PM, L. A. Walsh wrote:
>
>> I've used grep to search through my mbox-format emails for decades, but
>> I've run into a case where it seems to ignore a text mailbox
>> because, I guess, it thinks it is "binary"
>>
>
> Yes, that's correct.
>
>
>> If I used "-Par" it finds it.
>>
>
> Yes, that's also correct.
>
>
>> It seems that grep believes the file to be binary and ignores it, though
>> "file" calls it "text".
>>
>
> The file is conditionally text. The POSIX definition of a text file is
> one whose lines consist of valid characters in the current locale - but
> note this definition is locale-dependent! So a file that is text under
> one locale may be binary under another. When you are grepping a file
> encoded correctly for the current locale, you get the output you want;
> when you are grepping a file that contains encoding errors for the
> current locale, POSIX says behavior is undefined, so GNU grep warns you
> that the file is binary (in the current locale); and your use of -a
> tells grep to process it anyways. As 'file' reported that your file was
> using non-ISO extended-ASCII, it probably means the file was encoded for
> an 8-bit single-byte locale; and my guess is that you were running grep
> under a UTF-8 locale, and generally, UTF-8 treats 8-bit single-byte
> inputs as encoding errors. Hence the warning that your file is binary,
> under the current locale.
>
> You can also use 'LC_ALL=C grep' to force a locale where EVERY byte is a
> valid character, and thus where you will never encounter encoding errors
> (you may encounter OTHER things that make your file binary, such as
> embedded NULs, but that's a different matter).
>
> This behavior is documented and intentional, so I'm closing this as not
> a bug in the tracker. However, feel free to add further comments or
> questions to the thread.
>
> And perhaps we could tweak the grep diagnostics to clarify whether a
> file is binary because NUL bytes were encountered, vs. a file is binary
> because encoding errors were encountered.
> > From unknown Sat Jun 21 12:14:34 2025 X-Loop: help-debbugs@gnu.org Subject: bug#30326: grep not searching through a text file (thinking it binary) Resent-From: Paul Eggert Original-Sender: "Debbugs-submit" Resent-CC: bug-grep@gnu.org Resent-Date: Fri, 02 Feb 2018 23:10:03 +0000 Resent-Message-ID: Resent-Sender: help-debbugs@gnu.org X-GNU-PR-Message: followup 30326 X-GNU-PR-Package: grep X-GNU-PR-Keywords: notabug To: L A Walsh Cc: 30326-done@debbugs.gnu.org, GNU bug control Received: via spool by 30326-done@debbugs.gnu.org id=D30326.15176129555986 (code D ref 30326); Fri, 02 Feb 2018 23:10:03 +0000 Received: (at 30326-done) by debbugs.gnu.org; 2 Feb 2018 23:09:15 +0000 Received: from localhost ([127.0.0.1]:54709 helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1ehkSI-0001YT-Qu for submit@debbugs.gnu.org; Fri, 02 Feb 2018 18:09:14 -0500 Received: from zimbra.cs.ucla.edu ([131.179.128.68]:46844) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1ehkSH-0001YB-N3; Fri, 02 Feb 2018 18:09:14 -0500 Received: from localhost (localhost [127.0.0.1]) by zimbra.cs.ucla.edu (Postfix) with ESMTP id E4C631615A9; Fri, 2 Feb 2018 15:09:07 -0800 (PST) Received: from zimbra.cs.ucla.edu ([127.0.0.1]) by localhost (zimbra.cs.ucla.edu [127.0.0.1]) (amavisd-new, port 10032) with ESMTP id kqdHLNWE-ucZ; Fri, 2 Feb 2018 15:09:07 -0800 (PST) Received: from localhost (localhost [127.0.0.1]) by zimbra.cs.ucla.edu (Postfix) with ESMTP id 3B0AA1615AC; Fri, 2 Feb 2018 15:09:07 -0800 (PST) X-Virus-Scanned: amavisd-new at zimbra.cs.ucla.edu Received: from zimbra.cs.ucla.edu ([127.0.0.1]) by localhost (zimbra.cs.ucla.edu [127.0.0.1]) (amavisd-new, port 10026) with ESMTP id 116YXEokzOQ7; Fri, 2 Feb 2018 15:09:07 -0800 (PST) Received: from Penguin.CS.UCLA.EDU (Penguin.CS.UCLA.EDU [131.179.64.200]) by zimbra.cs.ucla.edu (Postfix) with ESMTPSA id 1E0D5161588; Fri, 2 Feb 2018 15:09:07 -0800 (PST) References: <5A74BC3F.1030401@tlinx.org> 
<2c00563c-9347-c596-4ade-a87bd9262ca1@redhat.com> <5A74C573.7030101@tlinx.org> From: Paul Eggert Organization: UCLA Computer Science Department Message-ID: <8ae08329-1d8d-f83b-0898-72bd09a3841b@cs.ucla.edu> Date: Fri, 2 Feb 2018 15:09:06 -0800 User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:52.0) Gecko/20100101 Thunderbird/52.5.2 MIME-Version: 1.0 In-Reply-To: <5A74C573.7030101@tlinx.org> Content-Type: text/plain; charset=utf-8; format=flowed Content-Transfer-Encoding: 7bit Content-Language: en-US X-Spam-Score: -2.3 (--) X-BeenThere: debbugs-submit@debbugs.gnu.org X-Mailman-Version: 2.1.18 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: debbugs-submit-bounces@debbugs.gnu.org Sender: "Debbugs-submit" X-Spam-Score: -2.3 (--) On 02/02/2018 12:09 PM, L A Walsh wrote: > Grep was able to find text strings in mboxes without a POSIX > definition telling it that it was "broken". It's not a question of POSIX telling us what to do. It's a question of what is a good thing for GNU grep to do, and making sure that this behavior conforms to POSIX (at least if POSIXLY_CORRECT is set). When grep encounters binary data, there are different "good" things to do depending on the application, so grep has options. The behavior you're asking for is available as an option. As I understand it, the main point of your bug report is that you want the option to be the default behavior. However, that would adversely affect some other common uses of grep and it's not clear that it's a good idea. 
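The options mentioned in this thread can be sketched roughly as follows. This is an illustration under stated assumptions, not a statement of how any particular grep release behaves: it assumes GNU grep, and the sample file, its name, and its contents are invented. Only flags named in the thread or documented for GNU grep are used (`-a`, `-I`, `LC_ALL=C`).

```shell
#!/bin/sh
# Sketch: ways to search a file that grep may classify as binary.
# The sample file and its contents are invented for illustration.
tmp=$(mktemp -d)
sample="$tmp/Domain"
# One line containing a lone 8-bit byte (0xB7), which is not valid UTF-8 by itself.
printf 'Game:\tNCSOFT \267 old single-byte mail\n' > "$sample"

# 1. -a (--binary-files=text): treat the file as text regardless of locale.
with_a=$(grep -a -c 'Game:' "$sample")

# 2. LC_ALL=C: every byte is a valid character, so no encoding errors arise.
with_c=$(LC_ALL=C grep -c 'Game:' "$sample")

# 3. -I (--binary-files=without-match): skip files classified as binary.
#    Under LC_ALL=C this file has no NUL bytes, so it is still searched.
with_I=$(LC_ALL=C grep -I -c 'Game:' "$sample")

echo "$with_a $with_c $with_I"
rm -rf "$tmp"
```

The third form is probably closest to what the reporter describes wanting: files with embedded NULs (cores, executables) are still skipped, while 8-bit mail is searched. The trade-off is that in the C locale multibyte characters are matched byte by byte.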
From unknown Sat Jun 21 12:14:34 2025 X-Loop: help-debbugs@gnu.org Subject: bug#30326: grep not searching through a text file (thinking it binary) Resent-From: L A Walsh Original-Sender: "Debbugs-submit" Resent-CC: bug-grep@gnu.org Resent-Date: Fri, 02 Feb 2018 23:18:02 +0000 Resent-Message-ID: Resent-Sender: help-debbugs@gnu.org X-GNU-PR-Message: followup 30326 X-GNU-PR-Package: grep X-GNU-PR-Keywords: notabug To: Paul Eggert Cc: 30326-done@debbugs.gnu.org, GNU bug control Received: via spool by 30326-done@debbugs.gnu.org id=D30326.151761342414038 (code D ref 30326); Fri, 02 Feb 2018 23:18:02 +0000 Received: (at 30326-done) by debbugs.gnu.org; 2 Feb 2018 23:17:04 +0000 Received: from localhost ([127.0.0.1]:54722 helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1ehkZr-0003eG-TE for submit@debbugs.gnu.org; Fri, 02 Feb 2018 18:17:04 -0500 Received: from ishtar.tlinx.org ([173.164.175.65]:49054 helo=Ishtar.sc.tlinx.org) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1ehkZp-0003Zw-04; Fri, 02 Feb 2018 18:17:01 -0500 Received: from [192.168.3.12] (Athenae [192.168.3.12]) by Ishtar.sc.tlinx.org (8.14.7/8.14.4/SuSE Linux 0.8) with ESMTP id w12NGnPi051694; Fri, 2 Feb 2018 15:16:51 -0800 Message-ID: <5A74F161.2060403@tlinx.org> Date: Fri, 02 Feb 2018 15:16:49 -0800 From: L A Walsh User-Agent: Thunderbird MIME-Version: 1.0 References: <5A74BC3F.1030401@tlinx.org> <2c00563c-9347-c596-4ade-a87bd9262ca1@redhat.com> <5A74C573.7030101@tlinx.org> <8ae08329-1d8d-f83b-0898-72bd09a3841b@cs.ucla.edu> In-Reply-To: <8ae08329-1d8d-f83b-0898-72bd09a3841b@cs.ucla.edu> Content-Type: text/plain; charset=UTF-8; format=flowed Content-Transfer-Encoding: 7bit X-Spam-Score: -0.0 (/) X-BeenThere: debbugs-submit@debbugs.gnu.org X-Mailman-Version: 2.1.18 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: debbugs-submit-bounces@debbugs.gnu.org Sender: "Debbugs-submit" X-Spam-Score: -0.0 
(/)

Paul Eggert wrote:
> On 02/02/2018 12:09 PM, L A Walsh wrote:
>
>> Grep was able to find text strings in mboxes without a POSIX
>> definition telling it that it was "broken".
>>
>
> It's not a question of POSIX telling us what to do. It's a question of
> what is a good thing for GNU grep to do, and making sure that this
> behavior conforms to POSIX (at least if POSIXLY_CORRECT is set).
>
In this case it is not.

> When grep encounters binary data, there are different "good" things to
> do depending on the application, so grep has options. The behavior
> you're asking for is available as an option.

It also used to be the default. I still don't want it to search through a core or executable if they happened to be in the same directory. But email is organized in lines -- and I don't think I've ever had it spew binary out to my screen (for an email search).

I.e., I want it to work as it used to work, pre-POSIX, while still filtering out binary files. In this case "file" is able to determine that it is a text file. Grep used to get it right after the option was added to skip binary files, but before input had to be well-formed POSIX text.

FWIW, grep does handle at least one "binary case" -- when the last line doesn't have a linefeed -- something that some would like to believe indicates binary -- but grep still handles that as text, resulting in some differing output when piped through "wc".

> As I understand it, the
> main point of your bug report is that you want the option to be the
> default behavior. However, that would adversely affect some other common
> uses of grep and it's not clear that it's a good idea.
> > From unknown Sat Jun 21 12:14:34 2025 X-Loop: help-debbugs@gnu.org Subject: bug#30326: grep not searching through a text file (thinking it binary) Resent-From: Paul Eggert Original-Sender: "Debbugs-submit" Resent-CC: bug-grep@gnu.org Resent-Date: Fri, 02 Feb 2018 23:20:02 +0000 Resent-Message-ID: Resent-Sender: help-debbugs@gnu.org X-GNU-PR-Message: followup 30326 X-GNU-PR-Package: grep X-GNU-PR-Keywords: notabug To: L A Walsh Cc: 30326-done@debbugs.gnu.org, GNU bug control Received: via spool by 30326-done@debbugs.gnu.org id=D30326.151761358414287 (code D ref 30326); Fri, 02 Feb 2018 23:20:02 +0000 Received: (at 30326-done) by debbugs.gnu.org; 2 Feb 2018 23:19:44 +0000 Received: from localhost ([127.0.0.1]:54729 helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1ehkcS-0003iL-B5 for submit@debbugs.gnu.org; Fri, 02 Feb 2018 18:19:44 -0500 Received: from zimbra.cs.ucla.edu ([131.179.128.68]:49032) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1ehkcR-0003i7-5b; Fri, 02 Feb 2018 18:19:43 -0500 Received: from localhost (localhost [127.0.0.1]) by zimbra.cs.ucla.edu (Postfix) with ESMTP id AF9B5161588; Fri, 2 Feb 2018 15:19:37 -0800 (PST) Received: from zimbra.cs.ucla.edu ([127.0.0.1]) by localhost (zimbra.cs.ucla.edu [127.0.0.1]) (amavisd-new, port 10032) with ESMTP id edQmMTUyKZKR; Fri, 2 Feb 2018 15:19:37 -0800 (PST) Received: from localhost (localhost [127.0.0.1]) by zimbra.cs.ucla.edu (Postfix) with ESMTP id 10B951615AC; Fri, 2 Feb 2018 15:19:37 -0800 (PST) X-Virus-Scanned: amavisd-new at zimbra.cs.ucla.edu Received: from zimbra.cs.ucla.edu ([127.0.0.1]) by localhost (zimbra.cs.ucla.edu [127.0.0.1]) (amavisd-new, port 10026) with ESMTP id 6UVThZs1OHmo; Fri, 2 Feb 2018 15:19:36 -0800 (PST) Received: from Penguin.CS.UCLA.EDU (Penguin.CS.UCLA.EDU [131.179.64.200]) by zimbra.cs.ucla.edu (Postfix) with ESMTPSA id EA43D161588; Fri, 2 Feb 2018 15:19:36 -0800 (PST) References: <5A74BC3F.1030401@tlinx.org> 
<2c00563c-9347-c596-4ade-a87bd9262ca1@redhat.com> <5A74C573.7030101@tlinx.org> <8ae08329-1d8d-f83b-0898-72bd09a3841b@cs.ucla.edu> <5A74F161.2060403@tlinx.org> From: Paul Eggert Organization: UCLA Computer Science Department Message-ID: <82fb317f-3a94-00f8-4382-243b766ab900@cs.ucla.edu> Date: Fri, 2 Feb 2018 15:19:36 -0800 User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:52.0) Gecko/20100101 Thunderbird/52.5.2 MIME-Version: 1.0 In-Reply-To: <5A74F161.2060403@tlinx.org> Content-Type: text/plain; charset=utf-8; format=flowed Content-Transfer-Encoding: 7bit Content-Language: en-US X-Spam-Score: -2.3 (--) X-BeenThere: debbugs-submit@debbugs.gnu.org X-Mailman-Version: 2.1.18 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: debbugs-submit-bounces@debbugs.gnu.org Sender: "Debbugs-submit" X-Spam-Score: -2.3 (--) On 02/02/2018 03:16 PM, L A Walsh wrote: > It also used to be the default. Single-byte locales also used to be the default. Times have changed, and things have gotten more complicated. We don't change default behavior for no reason, but we also don't keep the default the same even when the world has changed and another default behavior would typically be better. 
From unknown Sat Jun 21 12:14:34 2025 X-Loop: help-debbugs@gnu.org Subject: bug#30326: grep not searching through a text file (thinking it binary) Resent-From: L A Walsh Original-Sender: "Debbugs-submit" Resent-CC: bug-grep@gnu.org Resent-Date: Fri, 02 Feb 2018 23:31:01 +0000 Resent-Message-ID: Resent-Sender: help-debbugs@gnu.org X-GNU-PR-Message: followup 30326 X-GNU-PR-Package: grep X-GNU-PR-Keywords: notabug To: Paul Eggert Cc: 30326-done@debbugs.gnu.org Received: via spool by 30326-done@debbugs.gnu.org id=D30326.151761424418170 (code D ref 30326); Fri, 02 Feb 2018 23:31:01 +0000 Received: (at 30326-done) by debbugs.gnu.org; 2 Feb 2018 23:30:44 +0000 Received: from localhost ([127.0.0.1]:54738 helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1ehkn6-0004ij-If for submit@debbugs.gnu.org; Fri, 02 Feb 2018 18:30:44 -0500 Received: from ishtar.tlinx.org ([173.164.175.65]:49560 helo=Ishtar.sc.tlinx.org) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1ehkn5-0004gG-0v for 30326-done@debbugs.gnu.org; Fri, 02 Feb 2018 18:30:43 -0500 Received: from [192.168.3.12] (Athenae [192.168.3.12]) by Ishtar.sc.tlinx.org (8.14.7/8.14.4/SuSE Linux 0.8) with ESMTP id w12NUd1U052778; Fri, 2 Feb 2018 15:30:41 -0800 Message-ID: <5A74F49D.3050909@tlinx.org> Date: Fri, 02 Feb 2018 15:30:37 -0800 From: L A Walsh User-Agent: Thunderbird MIME-Version: 1.0 References: <5A74BC3F.1030401@tlinx.org> <2c00563c-9347-c596-4ade-a87bd9262ca1@redhat.com> <5A74C573.7030101@tlinx.org> <8ae08329-1d8d-f83b-0898-72bd09a3841b@cs.ucla.edu> <5A74F161.2060403@tlinx.org> <82fb317f-3a94-00f8-4382-243b766ab900@cs.ucla.edu> In-Reply-To: <82fb317f-3a94-00f8-4382-243b766ab900@cs.ucla.edu> Content-Type: text/plain; charset=UTF-8; format=flowed Content-Transfer-Encoding: 7bit X-Spam-Score: -0.0 (/) X-BeenThere: debbugs-submit@debbugs.gnu.org X-Mailman-Version: 2.1.18 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , 
Errors-To: debbugs-submit-bounces@debbugs.gnu.org Sender: "Debbugs-submit" X-Spam-Score: -0.0 (/) Paul Eggert wrote: > On 02/02/2018 03:16 PM, L A Walsh wrote: > >> It also used to be the default. >> > > Single-byte locales also used to be the default. Times have changed, and > things have gotten more complicated. We don't change default behavior > for no reason, but we also don't keep the default the same even when the > world has changed and another default behavior would typically be better. > But most computer files (vs. user-files) are still single-byte. Even UTF-8 is mostly single byte, though treating it as a text-stream works most of the time -- unless the user specifies a character above 0x7e. I more often use a search in a GUI for user-based files. From unknown Sat Jun 21 12:14:34 2025 X-Loop: help-debbugs@gnu.org Subject: bug#30326: grep not searching through a text file (thinking it binary) Resent-From: Paul Eggert Original-Sender: "Debbugs-submit" Resent-CC: bug-grep@gnu.org Resent-Date: Fri, 02 Feb 2018 23:45:01 +0000 Resent-Message-ID: Resent-Sender: help-debbugs@gnu.org X-GNU-PR-Message: followup 30326 X-GNU-PR-Package: grep X-GNU-PR-Keywords: notabug To: L A Walsh Cc: 30326-done@debbugs.gnu.org Received: via spool by 30326-done@debbugs.gnu.org id=D30326.151761509523798 (code D ref 30326); Fri, 02 Feb 2018 23:45:01 +0000 Received: (at 30326-done) by debbugs.gnu.org; 2 Feb 2018 23:44:55 +0000 Received: from localhost ([127.0.0.1]:54747 helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1ehl0p-0006Bl-36 for submit@debbugs.gnu.org; Fri, 02 Feb 2018 18:44:55 -0500 Received: from zimbra.cs.ucla.edu ([131.179.128.68]:53772) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1ehl0n-0006BZ-BT for 30326-done@debbugs.gnu.org; Fri, 02 Feb 2018 18:44:53 -0500 Received: from localhost (localhost [127.0.0.1]) by zimbra.cs.ucla.edu (Postfix) with ESMTP id C90B71615B2; Fri, 2 Feb 2018 15:44:47 -0800 (PST) Received: 
from zimbra.cs.ucla.edu ([127.0.0.1]) by localhost (zimbra.cs.ucla.edu [127.0.0.1]) (amavisd-new, port 10032) with ESMTP id CLXuTg_MjNPM; Fri, 2 Feb 2018 15:44:46 -0800 (PST) Received: from localhost (localhost [127.0.0.1]) by zimbra.cs.ucla.edu (Postfix) with ESMTP id 324A71615C9; Fri, 2 Feb 2018 15:44:46 -0800 (PST) X-Virus-Scanned: amavisd-new at zimbra.cs.ucla.edu Received: from zimbra.cs.ucla.edu ([127.0.0.1]) by localhost (zimbra.cs.ucla.edu [127.0.0.1]) (amavisd-new, port 10026) with ESMTP id ZZx-N765M1Ir; Fri, 2 Feb 2018 15:44:46 -0800 (PST) Received: from Penguin.CS.UCLA.EDU (Penguin.CS.UCLA.EDU [131.179.64.200]) by zimbra.cs.ucla.edu (Postfix) with ESMTPSA id 18F521615B2; Fri, 2 Feb 2018 15:44:46 -0800 (PST) References: <5A74BC3F.1030401@tlinx.org> <2c00563c-9347-c596-4ade-a87bd9262ca1@redhat.com> <5A74C573.7030101@tlinx.org> <8ae08329-1d8d-f83b-0898-72bd09a3841b@cs.ucla.edu> <5A74F161.2060403@tlinx.org> <82fb317f-3a94-00f8-4382-243b766ab900@cs.ucla.edu> <5A74F49D.3050909@tlinx.org> From: Paul Eggert Organization: UCLA Computer Science Department Message-ID: Date: Fri, 2 Feb 2018 15:44:45 -0800 User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:52.0) Gecko/20100101 Thunderbird/52.5.2 MIME-Version: 1.0 In-Reply-To: <5A74F49D.3050909@tlinx.org> Content-Type: text/plain; charset=utf-8; format=flowed Content-Transfer-Encoding: 7bit Content-Language: en-US X-Spam-Score: -2.3 (--) X-BeenThere: debbugs-submit@debbugs.gnu.org X-Mailman-Version: 2.1.18 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: debbugs-submit-bounces@debbugs.gnu.org Sender: "Debbugs-submit" X-Spam-Score: -2.3 (--) On 02/02/2018 03:30 PM, L A Walsh wrote: > most computer files (vs. user-files) are still single-byte. That's because so many of them are ASCII. But ASCII files are not the issue here. grep's behavior hasn't changed when operating on ASCII files in typical locales. 
The issue is text using a non-ASCII encoding that is not compatible with your locale; e.g., if your text file uses ISO 8859-1 but your locale specifies UTF-8. In my experience, UTF-8 has long been winning this battle, in the sense that UTF-8 is by far the dominant encoding for the non-ASCII files I regularly use. So I use a UTF-8 locale, and suggest this as a good default for most users nowadays. It's not possible to get direct statistics about encoding for all user files. However, we can see what's being published on the web. Currently UTF-8 is being used by about 90% of public websites whose character encoding can be determined, according to the latest W3Techs survey. ISO 8859-1 is in second place, at about 4%. See: https://w3techs.com/technologies/overview/character_encoding/all From unknown Sat Jun 21 12:14:34 2025 X-Loop: help-debbugs@gnu.org Subject: bug#30326: grep not searching through a text file (thinking it binary) Resent-From: L A Walsh Original-Sender: "Debbugs-submit" Resent-CC: bug-grep@gnu.org Resent-Date: Sat, 03 Feb 2018 00:53:02 +0000 Resent-Message-ID: Resent-Sender: help-debbugs@gnu.org X-GNU-PR-Message: followup 30326 X-GNU-PR-Package: grep X-GNU-PR-Keywords: notabug To: Paul Eggert Cc: 30326-done@debbugs.gnu.org Received: via spool by 30326-done@debbugs.gnu.org id=D30326.151761912429628 (code D ref 30326); Sat, 03 Feb 2018 00:53:02 +0000 Received: (at 30326-done) by debbugs.gnu.org; 3 Feb 2018 00:52:04 +0000 Received: from localhost ([127.0.0.1]:54772 helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1ehm3o-0007ho-6n for submit@debbugs.gnu.org; Fri, 02 Feb 2018 19:52:04 -0500 Received: from ishtar.tlinx.org ([173.164.175.65]:52264 helo=Ishtar.sc.tlinx.org) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1ehm3l-0007hN-Kt for 30326-done@debbugs.gnu.org; Fri, 02 Feb 2018 19:52:02 -0500 Received: from [192.168.3.12] (Athenae [192.168.3.12]) by Ishtar.sc.tlinx.org (8.14.7/8.14.4/SuSE Linux 0.8) 
with ESMTP id w130pt58057807; Fri, 2 Feb 2018 16:51:57 -0800 Message-ID: <5A7507AB.7060402@tlinx.org> Date: Fri, 02 Feb 2018 16:51:55 -0800 From: L A Walsh User-Agent: Thunderbird MIME-Version: 1.0 References: <5A74BC3F.1030401@tlinx.org> <2c00563c-9347-c596-4ade-a87bd9262ca1@redhat.com> <5A74C573.7030101@tlinx.org> <8ae08329-1d8d-f83b-0898-72bd09a3841b@cs.ucla.edu> <5A74F161.2060403@tlinx.org> <82fb317f-3a94-00f8-4382-243b766ab900@cs.ucla.edu> <5A74F49D.3050909@tlinx.org> In-Reply-To: Content-Type: text/plain; charset=UTF-8; format=flowed Content-Transfer-Encoding: 7bit X-Spam-Score: -0.0 (/) X-BeenThere: debbugs-submit@debbugs.gnu.org X-Mailman-Version: 2.1.18 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: debbugs-submit-bounces@debbugs.gnu.org Sender: "Debbugs-submit" X-Spam-Score: -0.0 (/)

Paul Eggert wrote:
> On 02/02/2018 03:30 PM, L A Walsh wrote:
> > most computer files (vs. user-files) are still single-byte.
>
> That's because so many of them are ASCII. But ASCII files are not the
> issue here. grep's behavior hasn't changed when operating on ASCII files
> in typical locales. The issue is text using a non-ASCII encoding that is
> not compatible with your locale; e.g., if your text file uses ISO 8859-1
> but your locale specifies UTF-8.
----
I've had my locale as UTF-8 since around 2000. My music collection needed French, English, Middle Eastern, and now Japanese characters, so I set things to UTF-8. I didn't need perfection.

For the email, I needed to know which files the text was in so I could look at those mboxes with a mail reader or with a text editor. I needed grep to work as a first-level search tool. It has failed on that score.

Still, if it just searched for the bytes that I put in the search string, I'm not sure how it would "go wrong".
>
> In my experience, UTF-8 has long been winning this battle, in the sense
> that UTF-8 is by far the dominant encoding for the non-ASCII files I
> regularly use. So I use a UTF-8 locale, and suggest this as a good
> default for most users nowadays.
>
> It's not possible to get direct statistics about encoding for all user
> files. However, we can see what's being published on the web. Currently
> UTF-8 is being used by about 90% of public websites whose character
> encoding can be determined, according to the latest W3Techs survey. ISO
> 8859-1 is in second place, at about 4%. See:
>
> https://w3techs.com/technologies/overview/character_encoding/all
>
Whereas this one was:

Domain: Non-ISO extended-ASCII text, with very long lines

So theoretically, it would never match any locale. Problem is on a mailbox, different emails can have different encodings. But I didn't care -- I typed in an ASCII string -- so let it search in octets with no encoding.

Also, a mailbox is very likely to be organized in lines (maybe "very long lines"), and the text I was searching for was <80 chars.

I'm really surprised it was decided to break compatibility, as I've been doing searches like this for over two decades. Not often, mind you, but it's one of the big advantages for me of keeping mailboxes for my IMAP server in mbox format. Maildir format or others would kill search ability with slow file I/O.
;^/ From unknown Sat Jun 21 12:14:34 2025 X-Loop: help-debbugs@gnu.org Subject: bug#30326: grep not searching through a text file (thinking it binary) Resent-From: Paul Eggert Original-Sender: "Debbugs-submit" Resent-CC: bug-grep@gnu.org Resent-Date: Sun, 04 Feb 2018 17:31:02 +0000 Resent-Message-ID: Resent-Sender: help-debbugs@gnu.org X-GNU-PR-Message: followup 30326 X-GNU-PR-Package: grep X-GNU-PR-Keywords: notabug To: L A Walsh Cc: 30326-done@debbugs.gnu.org Received: via spool by 30326-done@debbugs.gnu.org id=D30326.151776540312920 (code D ref 30326); Sun, 04 Feb 2018 17:31:02 +0000 Received: (at 30326-done) by debbugs.gnu.org; 4 Feb 2018 17:30:03 +0000 Received: from localhost ([127.0.0.1]:57286 helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1eiO79-0003MJ-6D for submit@debbugs.gnu.org; Sun, 04 Feb 2018 12:30:03 -0500 Received: from zimbra.cs.ucla.edu ([131.179.128.68]:50890) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1eiO77-0003LQ-EI for 30326-done@debbugs.gnu.org; Sun, 04 Feb 2018 12:30:01 -0500 Received: from localhost (localhost [127.0.0.1]) by zimbra.cs.ucla.edu (Postfix) with ESMTP id 82DA3161615; Sun, 4 Feb 2018 09:29:55 -0800 (PST) Received: from zimbra.cs.ucla.edu ([127.0.0.1]) by localhost (zimbra.cs.ucla.edu [127.0.0.1]) (amavisd-new, port 10032) with ESMTP id z0SapOdRETWc; Sun, 4 Feb 2018 09:29:54 -0800 (PST) Received: from localhost (localhost [127.0.0.1]) by zimbra.cs.ucla.edu (Postfix) with ESMTP id C873116161E; Sun, 4 Feb 2018 09:29:54 -0800 (PST) X-Virus-Scanned: amavisd-new at zimbra.cs.ucla.edu Received: from zimbra.cs.ucla.edu ([127.0.0.1]) by localhost (zimbra.cs.ucla.edu [127.0.0.1]) (amavisd-new, port 10026) with ESMTP id 9dlVqW0YYM7j; Sun, 4 Feb 2018 09:29:54 -0800 (PST) Received: from [192.168.1.9] (unknown [47.154.30.119]) by zimbra.cs.ucla.edu (Postfix) with ESMTPSA id A7D9A161615; Sun, 4 Feb 2018 09:29:54 -0800 (PST) References: <5A74BC3F.1030401@tlinx.org> 
<2c00563c-9347-c596-4ade-a87bd9262ca1@redhat.com> <5A74C573.7030101@tlinx.org> <8ae08329-1d8d-f83b-0898-72bd09a3841b@cs.ucla.edu> <5A74F161.2060403@tlinx.org> <82fb317f-3a94-00f8-4382-243b766ab900@cs.ucla.edu> <5A74F49D.3050909@tlinx.org> <5A7507AB.7060402@tlinx.org> From: Paul Eggert Organization: UCLA Computer Science Department Message-ID: <2e031d27-ba2b-6e17-7885-a1de9a904681@cs.ucla.edu> Date: Sun, 4 Feb 2018 09:29:54 -0800 User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:52.0) Gecko/20100101 Thunderbird/52.6.0 MIME-Version: 1.0 In-Reply-To: <5A7507AB.7060402@tlinx.org> Content-Type: text/plain; charset=utf-8; format=flowed Content-Language: en-US Content-Transfer-Encoding: quoted-printable X-Spam-Score: -2.3 (--) X-BeenThere: debbugs-submit@debbugs.gnu.org X-Mailman-Version: 2.1.18 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: debbugs-submit-bounces@debbugs.gnu.org Sender: "Debbugs-submit" X-Spam-Score: -2.3 (--)

L A Walsh wrote:
> I didn't care

Some users do care: they don't want grep to output binary junk that may mess up
their screen.

> Problem is on a mailbox, different emails can have different encodings.

There's no general solution to that problem. No matter what grep does, it will
mishandle some cases. At best the user will get an approximation to what is
really wanted. And there will be some cases where grep's default behavior (no
matter what the default is) will do the "wrong" thing.
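
To make the mixed-encoding point concrete, here is a small Python sketch (an illustration only, not how grep works internally). The helper name encodings_that_fit and the sample strings are hypothetical. It checks which candidate encodings can decode a byte string without error, then shows that a mailbox holding one ISO 8859-1 message and one UTF-8 message is not valid UTF-8 as a whole, and that the single-byte encoding accepts it only by misreading one of the two messages.

```python
def encodings_that_fit(data: bytes,
                       candidates=("ascii", "utf-8", "iso-8859-1")) -> list:
    """Return the candidate encodings that decode `data` without errors."""
    fits = []
    for name in candidates:
        try:
            data.decode(name)
        except UnicodeDecodeError:
            continue
        fits.append(name)
    return fits


# The same word, "caf\u00e9", written by two hypothetical mail clients.
latin1_msg = "caf\u00e9".encode("iso-8859-1")  # b"caf\xe9"
utf8_msg = "caf\u00e9".encode("utf-8")         # b"caf\xc3\xa9"
mailbox = latin1_msg + b"\n" + utf8_msg + b"\n"

print(encodings_that_fit(latin1_msg))  # only the single-byte encoding fits
print(encodings_that_fit(utf8_msg))    # UTF-8 fits; ISO 8859-1 accepts any bytes
print(encodings_that_fit(mailbox))     # UTF-8 no longer fits the whole file

# ISO 8859-1 "fits" the mailbox, but the UTF-8 message comes out garbled.
print(mailbox.decode("iso-8859-1").splitlines())
```

Whichever single encoding is chosen for the whole file, one of the two messages is either rejected or decoded incorrectly, which is the sense in which any default can only approximate what the user wants.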
From unknown Sat Jun 21 12:14:34 2025 X-Loop: help-debbugs@gnu.org Subject: bug#30326: grep not searching through a text file (thinking it binary) Resent-From: Paul Jackson Original-Sender: "Debbugs-submit" Resent-CC: bug-grep@gnu.org Resent-Date: Mon, 05 Feb 2018 16:06:01 +0000 Resent-Message-ID: Resent-Sender: help-debbugs@gnu.org X-GNU-PR-Message: followup 30326 X-GNU-PR-Package: grep X-GNU-PR-Keywords: notabug To: 30326@debbugs.gnu.org X-Debbugs-Original-To: bug-grep@gnu.org Received: via spool by submit@debbugs.gnu.org id=B.151784672219064 (code B ref -1); Mon, 05 Feb 2018 16:06:01 +0000 Received: (at submit) by debbugs.gnu.org; 5 Feb 2018 16:05:22 +0000 Received: from localhost ([127.0.0.1]:58933 helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1eijGk-0004xP-5g for submit@debbugs.gnu.org; Mon, 05 Feb 2018 11:05:22 -0500 Received: from eggs.gnu.org ([208.118.235.92]:48485) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1eijGi-0004xB-VM for submit@debbugs.gnu.org; Mon, 05 Feb 2018 11:05:21 -0500 Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71) (envelope-from ) id 1eijGc-00057x-Oz for submit@debbugs.gnu.org; Mon, 05 Feb 2018 11:05:15 -0500 X-Spam-Checker-Version: SpamAssassin 3.3.2 (2011-06-06) on eggs.gnu.org X-Spam-Level: X-Spam-Status: No, score=0.8 required=5.0 tests=BAYES_50,T_DKIM_INVALID autolearn=disabled version=3.3.2 Received: from lists.gnu.org ([2001:4830:134:3::11]:54626) by eggs.gnu.org with esmtps (TLS1.0:RSA_AES_256_CBC_SHA1:32) (Exim 4.71) (envelope-from ) id 1eijGc-00057V-Lo for submit@debbugs.gnu.org; Mon, 05 Feb 2018 11:05:14 -0500 Received: from eggs.gnu.org ([2001:4830:134:3::10]:39449) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1eijGb-0005Pv-Be for bug-grep@gnu.org; Mon, 05 Feb 2018 11:05:14 -0500 Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71) (envelope-from ) id 1eijGV-0004jH-LP for bug-grep@gnu.org; Mon, 05 Feb 2018 
11:05:10 -0500 Received: from out5-smtp.messagingengine.com ([66.111.4.29]:43669) by eggs.gnu.org with esmtps (TLS1.0:DHE_RSA_AES_256_CBC_SHA1:32) (Exim 4.71) (envelope-from ) id 1eijGV-0004g9-G0 for bug-grep@gnu.org; Mon, 05 Feb 2018 11:05:07 -0500 Received: from compute1.internal (compute1.nyi.internal [10.202.2.41]) by mailout.nyi.internal (Postfix) with ESMTP id 0A7F120F49 for ; Mon, 5 Feb 2018 11:05:06 -0500 (EST) Received: from web2 ([10.202.2.212]) by compute1.internal (MEProxy); Mon, 05 Feb 2018 11:05:06 -0500 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d= messagingengine.com; h=content-transfer-encoding:content-type :date:from:in-reply-to:message-id:mime-version:references :subject:to:x-me-sender:x-me-sender:x-sasl-enc; s=fm1; bh=jZa+6n WWO9cYwUW37IjInxnEZOabTVp1uWy6SJOL9VQ=; b=jZLPeKLvIhG5y/7iOnnm1V Q73eByTykiBTnJAXSYxw+yjuOhALSqzusWJ0/wfmaDPlBGbifD6dmT3oAkaSM7R4 FZSPmTlu4AEisegBtEW9oAPLPQKzrtIlGQBjuHgaleftYqytF7yydNLz6b57Gymy BaJst/YROYjUXTEIpO5upRXdtzB4Zct+B1KsBrU0SL66lEWBiXVuNPXo1k5IZIoh hans2YYyGn0IQvdFLtW8eo+hk2fngGpWVuy7vl5hYU5CjXGokdVAOto5FVRqHoLT Sq9GvrVgG44D9xB90wU5V54VNxeigKRbNXzDd4TqJSvn6oTLRVqwtsZDiLBtxFjQ == X-ME-Sender: Received: by mailuser.nyi.internal (Postfix, from userid 99) id E0D2D62BAB; Mon, 5 Feb 2018 11:05:05 -0500 (EST) Message-Id: <1517846705.1634806.1260027824.786C5AEF@webmail.messagingengine.com> From: Paul Jackson MIME-Version: 1.0 Content-Transfer-Encoding: 7bit Content-Type: text/plain; charset="utf-8" X-Mailer: MessagingEngine.com Webmail Interface - ajax-fde26eb3 Date: Mon, 05 Feb 2018 10:05:05 -0600 In-Reply-To: <5A7507AB.7060402@tlinx.org> References: <5A74BC3F.1030401@tlinx.org> <2c00563c-9347-c596-4ade-a87bd9262ca1@redhat.com> <5A74C573.7030101@tlinx.org> <8ae08329-1d8d-f83b-0898-72bd09a3841b@cs.ucla.edu> <5A74F161.2060403@tlinx.org> <82fb317f-3a94-00f8-4382-243b766ab900@cs.ucla.edu> <5A74F49D.3050909@tlinx.org> <5A7507AB.7060402@tlinx.org> X-detected-operating-system: by eggs.gnu.org: GNU/Linux 2.2.x-3.x 
[generic] [fuzzy] X-detected-operating-system: by eggs.gnu.org: GNU/Linux 2.6.x X-Received-From: 2001:4830:134:3::11 X-Spam-Score: -4.4 (----) X-BeenThere: debbugs-submit@debbugs.gnu.org X-Mailman-Version: 2.1.18 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: debbugs-submit-bounces@debbugs.gnu.org Sender: "Debbugs-submit" X-Spam-Score: -4.4 (----)

A couple of possible "solutions" to this quandary:

===

If one goal of the current grep behavior is to avoid putting out "junk" unexpectedly, then instead of rejecting input files that have any such "junk", rather happily grep on any dang file, by default, but then filter the output to suppress the "junk".

For many years now, I've been using my own private mutant semi-brain damaged grep-variant for searching for text within mostly binary files, and it does this: it looks for any specified sequence of non-nul bytes within any bucket of bits and, when found, works forward and backward until it hits either a newline or a non-ASCII character, then limits its output to what is between those beginning and ending points. No non-ASCII junk will be output (except insofar as that was part of the requested search string).

My private mutant only does fixed strings (grep -F equivalent), but I imagine that the same trimming of output could be done on a real grep as well.

Since "grep" is commonly used in shell scripts, I name my mutant by some other name, and let "grep" continue to be whatever is the current convention.

In short, if the goal is to not output "junk", then perhaps that is what the current "grep" should do, rather than refusing to consider everything in a file after it encounters any "junk" character (even if it has already successfully found and emitted some matches earlier in the file).
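
The output-trimming idea described above can be sketched in a few lines of Python. This is a guess at the approach, not the actual private tool: the function names are hypothetical, and "junk" is interpreted here as any byte outside printable ASCII and tab (so control bytes such as NUL also stop the scan), which is an assumption that goes slightly beyond "non-ASCII".

```python
def _is_plain(byte: int) -> bool:
    """Assumed definition of non-junk: printable ASCII or a tab."""
    return byte == 0x09 or 0x20 <= byte <= 0x7E


def trimmed_matches(data: bytes, needle: bytes) -> list:
    """Find each fixed-string match and widen it to the surrounding plain text.

    The scan stops at a newline or at the first junk byte in either
    direction, so only the clean context around each match is returned.
    """
    if not needle:
        raise ValueError("needle must not be empty")
    results = []
    start = data.find(needle)
    while start != -1:
        end = start + len(needle)
        lo = start
        while lo > 0 and _is_plain(data[lo - 1]):
            lo -= 1  # walk backward over plain bytes
        hi = end
        while hi < len(data) and _is_plain(data[hi]):
            hi += 1  # walk forward over plain bytes
        results.append(data[lo:hi])
        start = data.find(needle, hi)  # resume after the emitted span
    return results


# C-like text, then ELF-like junk that happens to contain the string.
blob = b"int x;\n/* pj@usa.net */\n\x7fELF\x00\x01\xff pj@usa.net\xfe\x00more\n"
for hit in trimmed_matches(blob, b"pj@usa.net"):
    print(hit)
```

Because the whole input is held in memory as one byte string, scanning backward from a match is trivial here; a streaming tool would need its own buffering to do the same.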
=== Second possibility: keep one's own private copy of whatever grep last performed as desired, in a "bin" that's on one's path ahead of whatever "standard" and "current" grep is installed. For many years now, I've continued to use the "ed" command that was current back then (with a couple of my own hacks), in preference to the current evolving ed. Since "ed" is seldom used within shell scripts, and when so used, is never that I've noticed used in a way that depends on which version of "ed" is used, I don't need to rename my preferred, archaic, "ed". But, perhaps L. A. Walsh might choose to do with "grep" as I have done with "ed" ... put an old version ahead of the current version on $PATH. (wave to "law" ... hope you're doing well.) -- Paul Jackson pj@usa.net From unknown Sat Jun 21 12:14:34 2025 X-Loop: help-debbugs@gnu.org Subject: bug#30326: grep not searching through a text file (thinking it binary) Resent-From: Paul Eggert Original-Sender: "Debbugs-submit" Resent-CC: bug-grep@gnu.org Resent-Date: Mon, 05 Feb 2018 16:51:01 +0000 Resent-Message-ID: Resent-Sender: help-debbugs@gnu.org X-GNU-PR-Message: followup 30326 X-GNU-PR-Package: grep X-GNU-PR-Keywords: notabug To: Paul Jackson , 30326@debbugs.gnu.org Received: via spool by 30326-submit@debbugs.gnu.org id=B30326.151784944923113 (code B ref 30326); Mon, 05 Feb 2018 16:51:01 +0000 Received: (at 30326) by debbugs.gnu.org; 5 Feb 2018 16:50:49 +0000 Received: from localhost ([127.0.0.1]:58961 helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1eijyj-00060j-7p for submit@debbugs.gnu.org; Mon, 05 Feb 2018 11:50:49 -0500 Received: from zimbra.cs.ucla.edu ([131.179.128.68]:55328) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1eijyh-00060U-QQ for 30326@debbugs.gnu.org; Mon, 05 Feb 2018 11:50:48 -0500 Received: from localhost (localhost [127.0.0.1]) by zimbra.cs.ucla.edu (Postfix) with ESMTP id 6F24A1616B3; Mon, 5 Feb 2018 08:50:41 -0800 (PST) Received: from 
zimbra.cs.ucla.edu ([127.0.0.1]) by localhost (zimbra.cs.ucla.edu [127.0.0.1]) (amavisd-new, port 10032) with ESMTP id GEBPj_aO9bwI; Mon, 5 Feb 2018 08:50:40 -0800 (PST) Received: from localhost (localhost [127.0.0.1]) by zimbra.cs.ucla.edu (Postfix) with ESMTP id 47D671616BF; Mon, 5 Feb 2018 08:50:40 -0800 (PST) X-Virus-Scanned: amavisd-new at zimbra.cs.ucla.edu Received: from zimbra.cs.ucla.edu ([127.0.0.1]) by localhost (zimbra.cs.ucla.edu [127.0.0.1]) (amavisd-new, port 10026) with ESMTP id TKFBYtYW662K; Mon, 5 Feb 2018 08:50:40 -0800 (PST) Received: from Penguin.CS.UCLA.EDU (Penguin.CS.UCLA.EDU [131.179.64.200]) by zimbra.cs.ucla.edu (Postfix) with ESMTPSA id 2BB531616B3; Mon, 5 Feb 2018 08:50:40 -0800 (PST) References: <5A74BC3F.1030401@tlinx.org> <2c00563c-9347-c596-4ade-a87bd9262ca1@redhat.com> <5A74C573.7030101@tlinx.org> <8ae08329-1d8d-f83b-0898-72bd09a3841b@cs.ucla.edu> <5A74F161.2060403@tlinx.org> <82fb317f-3a94-00f8-4382-243b766ab900@cs.ucla.edu> <5A74F49D.3050909@tlinx.org> <5A7507AB.7060402@tlinx.org> <1517846705.1634806.1260027824.786C5AEF@webmail.messagingengine.com> From: Paul Eggert Organization: UCLA Computer Science Department Message-ID: <83e92515-a0ed-d4e1-2fb2-69ff17602f10@cs.ucla.edu> Date: Mon, 5 Feb 2018 08:50:37 -0800 User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:52.0) Gecko/20100101 Thunderbird/52.5.2 MIME-Version: 1.0 In-Reply-To: <1517846705.1634806.1260027824.786C5AEF@webmail.messagingengine.com> Content-Type: text/plain; charset=utf-8; format=flowed Content-Transfer-Encoding: 7bit Content-Language: en-US X-Spam-Score: -2.3 (--) X-BeenThere: debbugs-submit@debbugs.gnu.org X-Mailman-Version: 2.1.18 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: debbugs-submit-bounces@debbugs.gnu.org Sender: "Debbugs-submit" X-Spam-Score: -2.3 (--) On 02/05/2018 08:05 AM, Paul Jackson wrote: > If one goal of the current grep behavior is to avoid putting out > "junk" unexpectedly, then 
instead of rejecting input files that > have any such "junk", rather happily grep on any dang file, by > default, but then filter the output to suppress the "junk". We've done that already, if memory serves. From unknown Sat Jun 21 12:14:34 2025 X-Loop: help-debbugs@gnu.org Subject: bug#30326: grep not searching through a text file (thinking it binary) Resent-From: Paul Jackson Original-Sender: "Debbugs-submit" Resent-CC: bug-grep@gnu.org Resent-Date: Mon, 05 Feb 2018 21:28:01 +0000 Resent-Message-ID: Resent-Sender: help-debbugs@gnu.org X-GNU-PR-Message: followup 30326 X-GNU-PR-Package: grep X-GNU-PR-Keywords: notabug To: Paul Eggert , 30326@debbugs.gnu.org Received: via spool by 30326-submit@debbugs.gnu.org id=B30326.151786607723952 (code B ref 30326); Mon, 05 Feb 2018 21:28:01 +0000 Received: (at 30326) by debbugs.gnu.org; 5 Feb 2018 21:27:57 +0000 Received: from localhost ([127.0.0.1]:59089 helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1eioIv-0006EG-0m for submit@debbugs.gnu.org; Mon, 05 Feb 2018 16:27:57 -0500 Received: from out1-smtp.messagingengine.com ([66.111.4.25]:39587) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1eioIr-0006E7-MX for 30326@debbugs.gnu.org; Mon, 05 Feb 2018 16:27:54 -0500 Received: from compute1.internal (compute1.nyi.internal [10.202.2.41]) by mailout.nyi.internal (Postfix) with ESMTP id 9283820AFA; Mon, 5 Feb 2018 16:27:53 -0500 (EST) Received: from web2 ([10.202.2.212]) by compute1.internal (MEProxy); Mon, 05 Feb 2018 16:27:53 -0500 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d= messagingengine.com; h=content-transfer-encoding:content-type :date:from:in-reply-to:message-id:mime-version:references :subject:to:x-me-sender:x-me-sender:x-sasl-enc; s=fm1; bh=M/r+O1 xvYShRlMry/YrHeJVWIbtV1ANXM2+lFuZ7LZo=; b=phrUjh6UYOQ2Ex+QUcM3Jo 1MUuQgXqxbb1nPjxLalW/fmojUJdtngaQTGKhiDuAtrub+KtjhuHZmCX1ONUz7tE jqSuq45qeSKJS8E0emUzRRh4zPRvVAbITjY3qGEX9hUu5KgGzWThwGxj3fjgdCSK 
Mk7KQ9UZTjy5N75pavY2gATXDcbWHZdIpSLvb3KRbSwLwIqHvhgkOkp0NIpLUaMW gR2LJtAaYRehBOXGv/1AhdP2gfoj0kAZ0n5Ne2waLAuZm0uPeJzm4Vkjmhgy0uDB m6h3wcB5q2TB8CxtIosYnoL3mglLYtwCC9DiJnxyHzmmEsc4HSkClZCbM4L80bhg == X-ME-Sender: Received: by mailuser.nyi.internal (Postfix, from userid 99) id 6FB7A62BAB; Mon, 5 Feb 2018 16:27:53 -0500 (EST) Message-Id: <1517866073.2340426.1260485416.49767342@webmail.messagingengine.com> From: Paul Jackson MIME-Version: 1.0 Content-Transfer-Encoding: 7bit Content-Type: multipart/alternative; boundary="_----------=_151786607323404262" X-Mailer: MessagingEngine.com Webmail Interface - ajax-fde26eb3 In-Reply-To: <83e92515-a0ed-d4e1-2fb2-69ff17602f10@cs.ucla.edu> Date: Mon, 05 Feb 2018 15:27:53 -0600 References: <5A74BC3F.1030401@tlinx.org> <2c00563c-9347-c596-4ade-a87bd9262ca1@redhat.com> <5A74C573.7030101@tlinx.org> <8ae08329-1d8d-f83b-0898-72bd09a3841b@cs.ucla.edu> <5A74F161.2060403@tlinx.org> <82fb317f-3a94-00f8-4382-243b766ab900@cs.ucla.edu> <5A74F49D.3050909@tlinx.org> <5A7507AB.7060402@tlinx.org> <1517846705.1634806.1260027824.786C5AEF@webmail.messagingengine.com> <83e92515-a0ed-d4e1-2fb2-69ff17602f10@cs.ucla.edu> X-Spam-Score: -0.1 (/) X-BeenThere: debbugs-submit@debbugs.gnu.org X-Mailman-Version: 2.1.18 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: debbugs-submit-bounces@debbugs.gnu.org Sender: "Debbugs-submit" X-Spam-Score: -0.1 (/) This is a multi-part message in MIME format. --_----------=_151786607323404262 Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8"

Paul Eggert wrote, in response to my suggestion to filter grep output, not input, for "binary junk":
>> We've done that already, if memory serves.

I don't think so :).

The installed grep on the system I'm typing on right now is "grep (GNU grep) 3.0".
I've not checked closely, but I believe that should be a fairly recent grep.

I created a large file ("/tmp/pjbb") by concatenating: 1) a big plain ASCII file of C source code, 2) a small ELF executable, and 3) another big plain ASCII file of C source code. Then I grep'd in this big file for the string "pj@usa.net", which appeared twice in the first file of C source code, and once again in the second file of C source code. Here's what I see: =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D= =3D=3D=3D *$* grep --version | head -1 grep (GNU grep) 3.0 *$* grep pj@usa.net /tmp/pjbb * pj@usa.net * pj@usa.net Binary file /tmp/pjbb matches *$* grep -a pj@usa.net /tmp/pjbb * pj@usa.net * pj@usa.net * pj@usa.net =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D= =3D=3D=3D By default, grep sees the first two "pj@usa.net", then abandons the search before seeing the third such, when it first encounters the ELF binary. Using "grep -a" to ask grep to persist, it sees all three "pj@usa.net" strings. =3D=3D=3D My ancient home-brew hack that provides ASCII trimmed output when scanning binary files for ASCII strings, contains custom code to buffer the already scanned input, in order that it can then scan backwards, once it finds a match. The usual line oriented buffering doesn't work so well when the input file might have no, or at least infrequent, line breaks. -- Paul Jackson =C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2= =A0=C2=A0=C2=A0 pj@usa.net --_----------=_151786607323404262 Content-Transfer-Encoding: 7bit Content-Type: text/html; charset="utf-8"

Paul Eggert wrote, in response to my suggestion to filter grep output, not input, for "binary junk":
>> We've done that already, if memory serves.

I don't think so :).

The installed grep on the system I'm typing on right now is "grep (GNU grep) 3.0".
I've not checked closely, but I believe that should be a fairly recent grep.

I created a large file ("/tmp/pjbb") by concatenating:
1) a big plain ASCII file of C source code,
2) a small ELF executable, and
3) another big plain ASCII file of C source code.

Then I grep'd in this big file for the string "pj@usa.net", which
appeared twice in the first file of C source code, and once
again in the second file of C source code.

Here's what I see:
============================

$ grep --version | head -1
grep (GNU grep) 3.0

$ grep pj@usa.net /tmp/pjbb
 * pj@usa.net
 * pj@usa.net
Binary file /tmp/pjbb matches

$ grep -a pj@usa.net /tmp/pjbb
 * pj@usa.net
 * pj@usa.net
 * pj@usa.net
============================

By default, grep sees the first two "pj@usa.net",
then abandons the search before seeing the third
such, when it first encounters the ELF binary.

Using "grep -a" to ask grep to persist, it sees all
three "pj@usa.net" strings.
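
The experiment can be approximated without a real ELF binary. What follows is a minimal sketch, assuming GNU grep and a POSIX shell. The sample file, its contents, and the variable names are made up for illustration, with a few NUL bytes standing in for the ELF executable. On a file this small, grep may print only the binary-file message in default mode, since the NUL bytes sit in the first buffer it reads.

```shell
#!/bin/sh
# Hypothetical stand-in for /tmp/pjbb: text, then NUL bytes, then more text.
f=$(mktemp)
{
    printf ' * pj@usa.net\n * pj@usa.net\n'   # first "C source" section
    printf 'ELF\000\000\000\n'                # NUL bytes mark the data as binary
    printf ' * pj@usa.net\n'                  # second "C source" section
} > "$f"

# Default mode: the exact output depends on the grep version,
# so it is only displayed here, not checked.
grep pj@usa.net "$f" || true

# With -a (--binary-files=text) every matching line is counted and printable.
all=$(grep -a -c pj@usa.net "$f")
echo "matching lines with -a: $all"
rm -f "$f"
```

Counting with -a keeps the check independent of how a given grep version words its binary-file message.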

===

My ancient home-brew hack, which provides ASCII-trimmed
output when scanning binary files for ASCII strings, contains
custom code to buffer the already-scanned input so
that it can scan backwards once it finds a match.

The usual line oriented buffering doesn't work so well when
the input file might have no, or at least infrequent, line breaks.
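
A rough approximation of such trimmed output is possible with standard tools, by letting grep search the raw bytes and then filtering what it prints. This is only a sketch and not the home-brew hack described above: it assumes GNU grep, the sample data is invented, and it does no backward scanning or custom buffering.

```shell
#!/bin/sh
# Invented sample: a match surrounded by control and non-ASCII bytes.
f=$(mktemp)
printf 'junk\001\377\200 pj@usa.net \302\n' > "$f"

# Search the raw bytes (-a), then delete every byte that is not a tab,
# a newline, or a printable ASCII character, so only readable text remains.
trimmed=$(LC_ALL=C grep -a pj@usa.net "$f" | LC_ALL=C tr -cd '\11\12\40-\176')
printf '%s\n' "$trimmed"
rm -f "$f"
```

Because this filter still works line by line, it inherits the weakness mentioned above: input with few or no line breaks produces very long output lines.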

--
                Paul Jackson
                pj@usa.net
--_----------=_151786607323404262-- From unknown Sat Jun 21 12:14:34 2025 X-Loop: help-debbugs@gnu.org Subject: bug#30326: grep not searching through a text file (thinking it binary) Resent-From: Paul Eggert Original-Sender: "Debbugs-submit" Resent-CC: bug-grep@gnu.org Resent-Date: Mon, 05 Feb 2018 23:39:01 +0000 Resent-Message-ID: Resent-Sender: help-debbugs@gnu.org X-GNU-PR-Message: followup 30326 X-GNU-PR-Package: grep X-GNU-PR-Keywords: notabug To: Paul Jackson , 30326@debbugs.gnu.org Received: via spool by 30326-submit@debbugs.gnu.org id=B30326.15178739373428 (code B ref 30326); Mon, 05 Feb 2018 23:39:01 +0000 Received: (at 30326) by debbugs.gnu.org; 5 Feb 2018 23:38:57 +0000 Received: from localhost ([127.0.0.1]:59217 helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1eiqLh-0000tC-5F for submit@debbugs.gnu.org; Mon, 05 Feb 2018 18:38:57 -0500 Received: from zimbra.cs.ucla.edu ([131.179.128.68]:60304) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1eiqLf-0000sy-9u for 30326@debbugs.gnu.org; Mon, 05 Feb 2018 18:38:55 -0500 Received: from localhost (localhost [127.0.0.1]) by zimbra.cs.ucla.edu (Postfix) with ESMTP id 38AE7160D39; Mon, 5 Feb 2018 15:38:49 -0800 (PST) Received: from zimbra.cs.ucla.edu ([127.0.0.1]) by localhost (zimbra.cs.ucla.edu [127.0.0.1]) (amavisd-new, port 10032) with ESMTP id jpqfigodde4U; Mon, 5 Feb 2018 15:38:48 -0800 (PST) Received: from localhost (localhost [127.0.0.1]) by zimbra.cs.ucla.edu (Postfix) with ESMTP id 44A58160E18; Mon, 5 Feb 2018 15:38:48 -0800 (PST) X-Virus-Scanned: amavisd-new at zimbra.cs.ucla.edu Received: from zimbra.cs.ucla.edu ([127.0.0.1]) by localhost (zimbra.cs.ucla.edu [127.0.0.1]) (amavisd-new, port 10026) with ESMTP id JEfkIJPkxXBW; Mon, 5 Feb 2018 15:38:48 -0800 (PST) Received: from Penguin.CS.UCLA.EDU (Penguin.CS.UCLA.EDU [131.179.64.200]) by zimbra.cs.ucla.edu (Postfix) with ESMTPSA id 28491160D39; Mon, 5 Feb 2018 15:38:48 -0800 (PST) References: 
<5A74BC3F.1030401@tlinx.org> <2c00563c-9347-c596-4ade-a87bd9262ca1@redhat.com> <5A74C573.7030101@tlinx.org> <8ae08329-1d8d-f83b-0898-72bd09a3841b@cs.ucla.edu> <5A74F161.2060403@tlinx.org> <82fb317f-3a94-00f8-4382-243b766ab900@cs.ucla.edu> <5A74F49D.3050909@tlinx.org> <5A7507AB.7060402@tlinx.org> <1517846705.1634806.1260027824.786C5AEF@webmail.messagingengine.com> <83e92515-a0ed-d4e1-2fb2-69ff17602f10@cs.ucla.edu> <1517866073.2340426.1260485416.49767342@webmail.messagingengine.com> From: Paul Eggert Organization: UCLA Computer Science Department Message-ID: <7b18847c-f646-a877-6e51-36abd7d94142@cs.ucla.edu> Date: Mon, 5 Feb 2018 15:38:47 -0800 User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:52.0) Gecko/20100101 Thunderbird/52.5.2 MIME-Version: 1.0 In-Reply-To: <1517866073.2340426.1260485416.49767342@webmail.messagingengine.com> Content-Type: text/plain; charset=utf-8; format=flowed Content-Language: en-US Content-Transfer-Encoding: quoted-printable X-Spam-Score: -2.3 (--) X-BeenThere: debbugs-submit@debbugs.gnu.org X-Mailman-Version: 2.1.18 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: debbugs-submit-bounces@debbugs.gnu.org Sender: "Debbugs-submit" X-Spam-Score: -2.3 (--) On 02/05/2018 01:27 PM, Paul Jackson wrote: > > I created a large file ("/tmp/pjbb")=C2=A0 by concatenating: > 1) a big plain ASCII file of C source code, > 2) a small ELF executable, and > 3) another big plain ASCII file of C source code. > > Then I grep'd in this big file for the string "pj@usa.net=20 > ", which > appeared twice in=C2=A0 the first file of C source code,=C2=A0 and once > again in the second file of C source code. That example contains NULs, which have indicated binary data for ages. I=20 was referring to text containing encoding errors without containing=20 NULs, which is what this bug report originally was about. Sorry I didn't=20 make that clear. 
From unknown Sat Jun 21 12:14:34 2025 X-Loop: help-debbugs@gnu.org Subject: bug#30326: grep not searching through a text file (thinking it binary) Resent-From: Paul Jackson Original-Sender: "Debbugs-submit" Resent-CC: bug-grep@gnu.org Resent-Date: Tue, 06 Feb 2018 05:39:01 +0000 Resent-Message-ID: Resent-Sender: help-debbugs@gnu.org X-GNU-PR-Message: followup 30326 X-GNU-PR-Package: grep X-GNU-PR-Keywords: notabug To: Paul Eggert , 30326@debbugs.gnu.org Received: via spool by 30326-submit@debbugs.gnu.org id=B30326.151789548919458 (code B ref 30326); Tue, 06 Feb 2018 05:39:01 +0000 Received: (at 30326) by debbugs.gnu.org; 6 Feb 2018 05:38:09 +0000 Received: from localhost ([127.0.0.1]:59415 helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1eivxI-00053m-V0 for submit@debbugs.gnu.org; Tue, 06 Feb 2018 00:38:09 -0500 Received: from out1-smtp.messagingengine.com ([66.111.4.25]:59611) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1eivxH-00053e-2g for 30326@debbugs.gnu.org; Tue, 06 Feb 2018 00:38:07 -0500 Received: from compute1.internal (compute1.nyi.internal [10.202.2.41]) by mailout.nyi.internal (Postfix) with ESMTP id 382F520C39; Tue, 6 Feb 2018 00:38:06 -0500 (EST) Received: from web2 ([10.202.2.212]) by compute1.internal (MEProxy); Tue, 06 Feb 2018 00:38:06 -0500 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d= messagingengine.com; h=content-transfer-encoding:content-type :date:from:in-reply-to:message-id:mime-version:references :subject:to:x-me-sender:x-me-sender:x-sasl-enc; s=fm1; bh=wR8t/T oWh3fFlvAuXJrAwlZeHtxzyr0Kytfqj32jAG0=; b=QZIjHMh3yc05vy2aMMEHsk 0CVPCIb1P0+c40OsXIY2ogTr57bw7rjCyXW2eVIS4aVO/5fhMfvrHuJ5Ugiq2b/h G/W38fK0MtXRLfBJKWH5DK1mRb4ZuFlntcMg36EkeqqLY3aPG8iQlJ9SzgTSHojj 5fkB9A8gNng72AR417jCu4IqI1/G17lBkuNyuBbKFrYKPJbin8HBxrxGH6T4BAH8 WUF0Cuz5XHvhNH8bd6lwVTEkpUfizdGRR4RThqC29nt2QVTMBb42KSVPAQ1zSWuV 40c5gFUfwJGur9w4LisM8iH9fq6UadyCF1bnC76IkSAplcTB5vwHPP+lRvJAYvsA == X-ME-Sender: Received: by 
mailuser.nyi.internal (Postfix, from userid 99) id 072C6621DC; Tue, 6 Feb 2018 00:38:05 -0500 (EST) Message-Id: <1517895485.2535753.1260869064.0188CA1B@webmail.messagingengine.com> From: Paul Jackson MIME-Version: 1.0 Content-Transfer-Encoding: 7bit Content-Type: multipart/alternative; boundary="_----------=_151789548525357530" X-Mailer: MessagingEngine.com Webmail Interface - ajax-fde26eb3 In-Reply-To: <7b18847c-f646-a877-6e51-36abd7d94142@cs.ucla.edu> References: <5A74BC3F.1030401@tlinx.org> <2c00563c-9347-c596-4ade-a87bd9262ca1@redhat.com> <5A74C573.7030101@tlinx.org> <8ae08329-1d8d-f83b-0898-72bd09a3841b@cs.ucla.edu> <5A74F161.2060403@tlinx.org> <82fb317f-3a94-00f8-4382-243b766ab900@cs.ucla.edu> <5A74F49D.3050909@tlinx.org> <5A7507AB.7060402@tlinx.org> <1517846705.1634806.1260027824.786C5AEF@webmail.messagingengine.com> <83e92515-a0ed-d4e1-2fb2-69ff17602f10@cs.ucla.edu> <1517866073.2340426.1260485416.49767342@webmail.messagingengine.com> <7b18847c-f646-a877-6e51-36abd7d94142@cs.ucla.edu> Date: Mon, 05 Feb 2018 23:38:05 -0600 X-Spam-Score: -0.1 (/) X-BeenThere: debbugs-submit@debbugs.gnu.org X-Mailman-Version: 2.1.18 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: debbugs-submit-bounces@debbugs.gnu.org Sender: "Debbugs-submit" X-Spam-Score: -0.1 (/) This is a multi-part message in MIME format. --_----------=_151789548525357530 Content-Transfer-Encoding: 7bit Content-Type: text/plain; charset="utf-8" Paul Eggert wrote: >> I was referring to text containing encoding errors without >> containing NULs Ah - that makes sense. The following experiment leads me to conclude that grep entirely suppressesemitting any portion of a match that would contain an encoding error, ratherthan emitting some substring of the match that can be correctly encoded. 
That is, it seems that if grep is asked to emit what it thinks would be amatch with an encoding error, grep seems to suppress that output line entirely, and continues looking for matches that it can emit without encodingerrors, and then at the end, if it saw a match that would have emitted anencoding error, it issues the "*Binary file ... matches*" error, just before exiting (or ending processing of that particular file.) I demonstrated this by replacing the ELF executable of my previous example withthe output of the following C program, which issues every possible pair of bytes,except for no nul and no 255 bytes: *main()** * *{** * * int i, j;** * * for (i = 1; i < 255; i++) {** * * for (j = 1; j < 255; j++)** * * printf("%c%c", i, j);** * * }** * * puts("");** * *}** * So I tested on a file (*/tmp/pjcc*) containing (1) a bunch of ASCII C code,(2) output from the above program, and (3) another copy of the same ASCII C code. Then, with the following settings: *LC_COLLATE=C** * *LANGUAGE=en_US.UTF-8** * *LC_ALL=en_US.UTF-8** * *LANG=en_US.UTF-8** * I ran the command: *grep "'N'" /tmp/pjcc** * I got the following output: * case 'N':** * * case 'N':** * *Binary file /tmp/pjcc matches** * The "*case 'N':*" string appears once in the C code used in the file, butthere are two copies of that C code in the file, so that grep prints that line twice. I also double checked that my file */tmp/pjcc* did not contain any nul bytes. The three character sequence *'N'* also appears in the middle section ofall non-nul, non-255 pairs of bytes, as well as in the ASCII C code, andit was (I presume) the match on that section of the file that caused grepto issue the ""*Binary file /tmp/pjcc matches* complaint at the end of its processing of that file. 
If on the other hand, I ran the command: *grep "'N':" /tmp/pjcc* then I got the output: * case 'N':* * case 'N':* with*_out_* any complaint that the *Binary file /tmp/pjcc matches.* The four character sequence *'N':* appears (twice) in the C code, but zero times in the middle section of all non-nul, non-255 pairs of bytes. >From this I conclude that if grep, in its default mode, is asked to emit a matchingpattern that would contain encoding errors, that it does not trim the output to whatwould encode correctly and continue onward, but rather emits nothing for that match,continues onward looking for more matches that it can emit correctly, and thenprints the "*Binary file ... matches*" error just before it exits or goes to thenext file. If I were designing grep from scratch, and had infinite resources, I might refer tohave grep emit some substring of each match that it can encode correctly, ratherthan emit nothing in case of an encoding error. However, I can't imagine that this is worth the effort, and (being a stickin the mud old fart) I usually recommend against incompatible changes unless strongly necessary. So ... whatever ... nevermind ... as they say. -- Paul Jackson pj@usa.net --_----------=_151789548525357530 Content-Transfer-Encoding: 7bit Content-Type: text/html; charset="utf-8"

Paul Eggert wrote:
>>  I was referring to text containing encoding errors without containing NULs

Ah - that makes sense.

The following experiment leads me to conclude that grep entirely suppresses
emitting any portion of a match that would contain an encoding error, rather
than emitting some substring of the match that can be correctly encoded.

That is, if grep is asked to emit what it thinks would be a
match with an encoding error, it seems to suppress that output line
entirely and continue looking for matches that it can emit without encoding
errors. At the end, if it saw a match that would have emitted an
encoding error, it issues the "Binary file ... matches" message, just
before exiting (or ending its processing of that particular file).

I demonstrated this by replacing the ELF executable of my previous example with
the output of the following C program, which emits every possible pair of bytes,
excluding the nul byte and byte 255:

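#include <stdio.h>

int main(void)
{
    int i, j;

    /* Emit every pair of bytes with values 1..254 (no nul, no 255). */
    for (i = 1; i < 255; i++) {
        for (j = 1; j < 255; j++)
            printf("%c%c", i, j);
    }
    puts("");   /* terminate the output with a newline */
    return 0;
}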

So I tested on a file (/tmp/pjcc) containing (1) a bunch of ASCII C code,
(2) output from the above program, and (3) another copy of the same ASCII C code.
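
For readers without a C compiler at hand, a similar test file can be assembled in the shell. This is a sketch, assuming an awk whose printf "%c" emits single bytes in the C locale. The directory, the file names, and the one-line stand-in for the C source are hypothetical.

```shell
#!/bin/sh
dir=$(mktemp -d)
# Stand-in for the ASCII C code (the real experiment used a large source file).
printf "        case 'N':\n" > "$dir/code.c"

# Every pair of bytes with values 1..254, followed by one newline,
# mirroring the C program above. LC_ALL=C keeps awk from emitting
# multibyte characters for values above 127.
LC_ALL=C awk 'BEGIN {
    for (i = 1; i < 255; i++)
        for (j = 1; j < 255; j++)
            printf "%c%c", i, j
    print ""
}' > "$dir/pairs"

# Text, byte pairs, then the same text again.
cat "$dir/code.c" "$dir/pairs" "$dir/code.c" > "$dir/pjcc"

size=$(wc -c < "$dir/pairs" | tr -d ' ')
nonnul=$(tr -d '\000' < "$dir/pairs" | wc -c | tr -d ' ')
echo "pairs: $size bytes, $nonnul of them non-NUL"
rm -rf "$dir"
```

The pair section should come to 254 * 254 * 2 + 1 = 129033 bytes, and comparing that with the non-NUL byte count confirms that no nul bytes slipped in.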

Then, with the following settings:

LC_COLLATE=C
LANGUAGE=en_US.UTF-8
LC_ALL=en_US.UTF-8
LANG=en_US.UTF-8

I ran the command:

grep "'N'" /tmp/pjcc

I got the following output:

        case 'N':
        case 'N':
Binary file /tmp/pjcc matches

The "case 'N':" line appears once in the C code used in the file, but
there are two copies of that C code in the file, so grep prints that line twice.

I also double-checked that my file /tmp/pjcc did not contain any nul bytes.

The three-character sequence 'N' also appears in the middle section of
all non-nul, non-255 pairs of bytes, as well as in the ASCII C code, and
it was (I presume) the match on that section of the file that caused grep
to issue the "Binary file /tmp/pjcc matches" complaint at the
end of its processing of that file.

If on the other hand, I ran the command:

grep "'N':" /tmp/pjcc

then I got the output:

        case 'N':
        case 'N':

without any "Binary file /tmp/pjcc matches" complaint.

The four character sequence 'N':  appears (twice) in the C code,
but zero times in the middle section of all non-nul, non-255 pairs of bytes.
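
The difference follows from the layout of the pair stream. Within the row for a first byte i, the output reads i,1,i,2,...,i,254, so every other byte repeats. The bytes of 'N' occur where the pair (apostrophe, N) is followed by the pair (apostrophe, O). Appending a colon would require the byte after the second apostrophe to be both O and a colon, and the other alignment fails in the same way. A sketch that checks this on the raw bytes, assuming GNU grep and an awk that emits single bytes in the C locale (the temporary file name is arbitrary):

```shell
#!/bin/sh
pairs=$(mktemp)
# Regenerate the pair stream: all byte pairs with values 1..254, then a newline.
LC_ALL=C awk 'BEGIN {
    for (i = 1; i < 255; i++)
        for (j = 1; j < 255; j++)
            printf "%c%c", i, j
    print ""
}' > "$pairs"

# Count matching lines byte-wise: the C locale has no encoding errors,
# and -a stops grep from treating the data as binary.
# grep exits with status 1 when the count is 0, hence "|| true".
three=$(LC_ALL=C grep -a -c "'N'" "$pairs" || true)
four=$(LC_ALL=C grep -a -c "'N':" "$pairs" || true)
echo "lines matching the three-character pattern: $three"
echo "lines matching the four-character pattern:  $four"
rm -f "$pairs"
```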

From this I conclude that if grep, in its default mode, is asked to emit a
match that would contain encoding errors, it does not trim the output to what
would encode correctly and continue onward. Rather, it emits nothing for that match,
continues onward looking for more matches that it can emit correctly, and then
prints the "Binary file ... matches" message just before it exits or goes to the
next file.

If I were designing grep from scratch, and had infinite resources, I might prefer to
have grep emit some substring of each match that it can encode correctly, rather
than emit nothing in case of an encoding error.
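
Something in this direction can already be approximated with existing options. This is a sketch, assuming GNU grep: -a searches the data as text and -o (--only-matching) prints just the matched part of each line, so an ASCII pattern yields ASCII output even when the rest of the line is not valid UTF-8. The sample data is invented, and this is not a proposal for how grep itself should behave.

```shell
#!/bin/sh
f=$(mktemp)
# Invented sample: one clean line and one line with invalid UTF-8 bytes.
printf "        case 'N':\n" > "$f"
printf "\200\377 case 'N' \376\n" >> "$f"

# Print only the matched text, one match per output line.
matches=$(grep -a -o "'N'" "$f")
printf '%s\n' "$matches"
count=$(printf '%s\n' "$matches" | wc -l | tr -d ' ')
rm -f "$f"
```

The trade-off is that -o discards the surrounding context, which is often the reason for running grep in the first place.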

However, I can't imagine that this is worth the effort, and (being a
stick-in-the-mud old fart) I usually recommend against incompatible changes
unless they are strongly necessary.

So ... whatever ... nevermind ... as they say.

--
                Paul Jackson
                pj@usa.net

--_----------=_151789548525357530-- From debbugs-submit-bounces@debbugs.gnu.org Fri Apr 20 18:16:52 2018 Received: (at control) by debbugs.gnu.org; 20 Apr 2018 22:16:53 +0000 Received: from localhost ([127.0.0.1]:34395 helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1f9eKq-0006xo-NF for submit@debbugs.gnu.org; Fri, 20 Apr 2018 18:16:52 -0400 Received: from zimbra.cs.ucla.edu ([131.179.128.68]:54736) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1f9eKo-0006xb-R3 for control@debbugs.gnu.org; Fri, 20 Apr 2018 18:16:51 -0400 Received: from localhost (localhost [127.0.0.1]) by zimbra.cs.ucla.edu (Postfix) with ESMTP id A7B6A16161E for ; Fri, 20 Apr 2018 15:16:44 -0700 (PDT) Received: from zimbra.cs.ucla.edu ([127.0.0.1]) by localhost (zimbra.cs.ucla.edu [127.0.0.1]) (amavisd-new, port 10032) with ESMTP id MVQiDrzB8yh6 for ; Fri, 20 Apr 2018 15:16:44 -0700 (PDT) Received: from localhost (localhost [127.0.0.1]) by zimbra.cs.ucla.edu (Postfix) with ESMTP id 0BFE9161668 for ; Fri, 20 Apr 2018 15:16:44 -0700 (PDT) X-Virus-Scanned: amavisd-new at zimbra.cs.ucla.edu Received: from zimbra.cs.ucla.edu ([127.0.0.1]) by localhost (zimbra.cs.ucla.edu [127.0.0.1]) (amavisd-new, port 10026) with ESMTP id jhbXQz0dhNbq for ; Fri, 20 Apr 2018 15:16:43 -0700 (PDT) Received: from Penguin.CS.UCLA.EDU (Penguin.CS.UCLA.EDU [131.179.64.200]) by zimbra.cs.ucla.edu (Postfix) with ESMTPSA id E5EB416161E for ; Fri, 20 Apr 2018 15:16:43 -0700 (PDT) To: GNU bug control From: Paul Eggert Subject: unarchive 30326 Organization: UCLA Computer Science Department Message-ID: <7027248f-f8d9-1ea0-f7fd-3adb0195d592@cs.ucla.edu> Date: Fri, 20 Apr 2018 15:16:43 -0700 User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:52.0) Gecko/20100101 Thunderbird/52.7.0 MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8; format=flowed Content-Transfer-Encoding: 7bit Content-Language: en-US X-Spam-Score: -2.3 (--) X-Debbugs-Envelope-To: control X-BeenThere: 
debbugs-submit@debbugs.gnu.org X-Mailman-Version: 2.1.18 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: debbugs-submit-bounces@debbugs.gnu.org Sender: "Debbugs-submit" X-Spam-Score: -3.3 (---) unarchive 30326 From unknown Sat Jun 21 12:14:34 2025 X-Loop: help-debbugs@gnu.org Subject: bug#30326: grep not searching through a text file (thinking it binary) Resent-From: Paul Eggert Original-Sender: "Debbugs-submit" Resent-CC: bug-grep@gnu.org Resent-Date: Fri, 20 Apr 2018 22:25:01 +0000 Resent-Message-ID: Resent-Sender: help-debbugs@gnu.org X-GNU-PR-Message: followup 30326 X-GNU-PR-Package: grep X-GNU-PR-Keywords: notabug To: Paul Jackson Cc: 30326@debbugs.gnu.org Received: via spool by 30326-submit@debbugs.gnu.org id=B30326.152426307027473 (code B ref 30326); Fri, 20 Apr 2018 22:25:01 +0000 Received: (at 30326) by debbugs.gnu.org; 20 Apr 2018 22:24:30 +0000 Received: from localhost ([127.0.0.1]:34400 helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1f9eSC-00078z-GG for submit@debbugs.gnu.org; Fri, 20 Apr 2018 18:24:30 -0400 Received: from zimbra.cs.ucla.edu ([131.179.128.68]:56088) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1f9eSA-00078n-KC for 30326@debbugs.gnu.org; Fri, 20 Apr 2018 18:24:27 -0400 Received: from localhost (localhost [127.0.0.1]) by zimbra.cs.ucla.edu (Postfix) with ESMTP id F1E2816161E; Fri, 20 Apr 2018 15:24:20 -0700 (PDT) Received: from zimbra.cs.ucla.edu ([127.0.0.1]) by localhost (zimbra.cs.ucla.edu [127.0.0.1]) (amavisd-new, port 10032) with ESMTP id mVZ5A-KCK0ny; Fri, 20 Apr 2018 15:24:19 -0700 (PDT) Received: from localhost (localhost [127.0.0.1]) by zimbra.cs.ucla.edu (Postfix) with ESMTP id E726A161668; Fri, 20 Apr 2018 15:24:19 -0700 (PDT) X-Virus-Scanned: amavisd-new at zimbra.cs.ucla.edu Received: from zimbra.cs.ucla.edu ([127.0.0.1]) by localhost (zimbra.cs.ucla.edu [127.0.0.1]) (amavisd-new, port 10026) with ESMTP id 
9YEOycTNsjFI; Fri, 20 Apr 2018 15:24:19 -0700 (PDT) Received: from Penguin.CS.UCLA.EDU (Penguin.CS.UCLA.EDU [131.179.64.200]) by zimbra.cs.ucla.edu (Postfix) with ESMTPSA id C812F16161E; Fri, 20 Apr 2018 15:24:19 -0700 (PDT) From: Paul Eggert References: <5A74BC3F.1030401@tlinx.org> <2c00563c-9347-c596-4ade-a87bd9262ca1@redhat.com> <5A74C573.7030101@tlinx.org> <8ae08329-1d8d-f83b-0898-72bd09a3841b@cs.ucla.edu> <5A74F161.2060403@tlinx.org> <82fb317f-3a94-00f8-4382-243b766ab900@cs.ucla.edu> <5A74F49D.3050909@tlinx.org> <5A7507AB.7060402@tlinx.org> <1517846705.1634806.1260027824.786C5AEF@webmail.messagingengine.com> <83e92515-a0ed-d4e1-2fb2-69ff17602f10@cs.ucla.edu> <1517866073.2340426.1260485416.49767342@webmail.messagingengine.com> <7b18847c-f646-a877-6e51-36abd7d94142@cs.ucla.edu> Organization: UCLA Computer Science Department Message-ID: <3b7740be-c8cf-dcb7-2100-c14f4e0b7031@cs.ucla.edu> Date: Fri, 20 Apr 2018 15:24:19 -0700 User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:52.0) Gecko/20100101 Thunderbird/52.7.0 MIME-Version: 1.0 In-Reply-To: <7b18847c-f646-a877-6e51-36abd7d94142@cs.ucla.edu> Content-Type: multipart/mixed; boundary="------------0A67923EFC29633409C84E53" Content-Language: en-US X-Spam-Score: -2.3 (--) X-BeenThere: debbugs-submit@debbugs.gnu.org X-Mailman-Version: 2.1.18 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: debbugs-submit-bounces@debbugs.gnu.org Sender: "Debbugs-submit" X-Spam-Score: -3.3 (---) This is a multi-part message in MIME format. --------------0A67923EFC29633409C84E53 Content-Type: text/plain; charset=utf-8; format=flowed Content-Transfer-Encoding: 7bit On 02/05/2018 03:38 PM, Paul Eggert wrote: > I was referring to text containing encoding errors without containing > NULs, which is what this bug report originally was about. Sorry I > didn't make that clear. 
> Following up on this (with some delay...), I installed the attached patch to try to cover this point more clearly in the grep manual. --------------0A67923EFC29633409C84E53 Content-Type: text/plain; charset=UTF-8; name="0001-doc-mention-encoding-errors.txt" Content-Transfer-Encoding: base64 Content-Disposition: attachment; filename="0001-doc-mention-encoding-errors.txt" RnJvbSA5OTA0YTJiY2IwOTkwNDhlNWExN2JkZDZlZGY2NTk1NzY0OTExNzQxIE1vbiBTZXAg MTcgMDA6MDA6MDAgMjAwMQpGcm9tOiBQYXVsIEVnZ2VydCA8ZWdnZXJ0QGNzLnVjbGEuZWR1 PgpEYXRlOiBGcmksIDIwIEFwciAyMDE4IDE1OjE5OjA5IC0wNzAwClN1YmplY3Q6IFtQQVRD SF0gZG9jOiBtZW50aW9uIGVuY29kaW5nIGVycm9ycwpNSU1FLVZlcnNpb246IDEuMApDb250 ZW50LVR5cGU6IHRleHQvcGxhaW47IGNoYXJzZXQ9VVRGLTgKQ29udGVudC1UcmFuc2Zlci1F bmNvZGluZzogOGJpdAoKVGhpcyBhdHRlbXB0cyB0byBkb2N1bWVudCB0aGUgZW5jb2Rpbmct ZXJyb3IgcHJvYmxlbSBtb3JlCnByZWNpc2VseSAoQnVnIzMwMzI2KS4KKiBkb2MvZ3JlcC5p bi4xLCBkb2MvZ3JlcC50ZXhpOiBNZW50aW9uIHRoYXQgdGhlIGJlaGF2aW9yIG9mCnBhdHRl cm5zIGxpa2Ug4oCYLuKAmSBpcyBub3Qgc3BlY2lmaWVkIG9uIGVuY29kaW5nIGVycm9ycy4K LS0tCiBkb2MvZ3JlcC5pbi4xIHwgIDYgKysrKy0tCiBkb2MvZ3JlcC50ZXhpIHwgNDAgKysr KysrKysrKysrKysrKysrKysrKysrKysrKystLS0tLS0tLS0tLQogMiBmaWxlcyBjaGFuZ2Vk LCAzMyBpbnNlcnRpb25zKCspLCAxMyBkZWxldGlvbnMoLSkKCmRpZmYgLS1naXQgYS9kb2Mv Z3JlcC5pbi4xIGIvZG9jL2dyZXAuaW4uMQppbmRleCA5MzkzYjM3Li5hZTE0ZTU0IDEwMDY0 NAotLS0gYS9kb2MvZ3JlcC5pbi4xCisrKyBiL2RvYy9ncmVwLmluLjEKQEAgLTc0NCw2ICs3 NDQsNyBAQCBtYXkgYmUgcXVvdGVkIGJ5IHByZWNlZGluZyBpdCB3aXRoIGEgYmFja3NsYXNo LgogVGhlIHBlcmlvZAogLkIgLlwmCiBtYXRjaGVzIGFueSBzaW5nbGUgY2hhcmFjdGVyLgor SXQgaXMgdW5zcGVjaWZpZWQgd2hldGhlciBpdCBtYXRjaGVzIGFuIGVuY29kaW5nIGVycm9y LgogLlNTICJDaGFyYWN0ZXIgQ2xhc3NlcyBhbmQgQnJhY2tldCBFeHByZXNzaW9ucyIKIEEK IC5JICJicmFja2V0IGV4cHJlc3Npb24iCkBAIC03NTIsMTIgKzc1MywxMyBAQCBpcyBhIGxp c3Qgb2YgY2hhcmFjdGVycyBlbmNsb3NlZCBieQogYW5kCiAuQlIgXSAuCiBJdCBtYXRjaGVz IGFueSBzaW5nbGUKLWNoYXJhY3RlciBpbiB0aGF0IGxpc3Q7IGlmIHRoZSBmaXJzdCBjaGFy YWN0ZXIgb2YgdGhlIGxpc3QKK2NoYXJhY3RlciBpbiB0aGF0IGxpc3QuCitJZiB0aGUgZmly 
c3QgY2hhcmFjdGVyIG9mIHRoZSBsaXN0CiBpcyB0aGUgY2FyZXQKIC5CIF4KIHRoZW4gaXQg bWF0Y2hlcyBhbnkgY2hhcmFjdGVyCiAuSSBub3QKLWluIHRoZSBsaXN0LgoraW4gdGhlIGxp c3Q7IGl0IGlzIHVuc3BlY2lmaWVkIHdoZXRoZXIgaXQgbWF0Y2hlcyBhbiBlbmNvZGluZyBl cnJvci4KIEZvciBleGFtcGxlLCB0aGUgcmVndWxhciBleHByZXNzaW9uCiAuQiBbMDEyMzQ1 Njc4OV0KIG1hdGNoZXMgYW55IHNpbmdsZSBkaWdpdC4KZGlmZiAtLWdpdCBhL2RvYy9ncmVw LnRleGkgYi9kb2MvZ3JlcC50ZXhpCmluZGV4IDkyMmQ5NmUuLjU4Y2FhNjIgMTAwNjQ0Ci0t LSBhL2RvYy9ncmVwLnRleGkKKysrIGIvZG9jL2dyZXAudGV4aQpAQCAtMTAxNiw2ICsxMDE2 LDggQEAgaW50ZXJwcmV0ZWQuCiBAdmluZGV4IExDX0FMTCBAcntlbnZpcm9ubWVudCB2YXJp YWJsZX0KIEB2aW5kZXggTENfQ1RZUEUgQHJ7ZW52aXJvbm1lbnQgdmFyaWFibGV9CiBAdmlu ZGV4IExBTkcgQHJ7ZW52aXJvbm1lbnQgdmFyaWFibGV9CitAY2luZGV4IGVuY29kaW5nIGVy cm9yCitAY2luZGV4IG51bGwgY2hhcmFjdGVyCiBUaGVzZSB2YXJpYWJsZXMgc3BlY2lmeSB0 aGUgbG9jYWxlIGZvciB0aGUgQGVudntMQ19DVFlQRX0gY2F0ZWdvcnksCiB3aGljaCBkZXRl cm1pbmVzIHRoZSB0eXBlIG9mIGNoYXJhY3RlcnMsCiBlLmcuLCB3aGljaCBjaGFyYWN0ZXJz IGFyZSB3aGl0ZXNwYWNlLgpAQCAtMTAyMyw2ICsxMDI1LDE4IEBAIFRoaXMgY2F0ZWdvcnkg YWxzbyBkZXRlcm1pbmVzIHRoZSBjaGFyYWN0ZXIgZW5jb2RpbmcsIHRoYXQgaXMsIHdoZXRo ZXIKIHRleHQgaXMgZW5jb2RlZCBpbiBVVEYtOCwgQVNDSUksIG9yIHNvbWUgb3RoZXIgZW5j b2RpbmcuICBJbiB0aGUKIEBzYW1we0N9IG9yIEBzYW1we1BPU0lYfSBsb2NhbGUsIGFsbCBj aGFyYWN0ZXJzIGFyZSBlbmNvZGVkIGFzIGEKIHNpbmdsZSBieXRlIGFuZCBldmVyeSBieXRl IGlzIGEgdmFsaWQgY2hhcmFjdGVyLgorSW4gbW9yZS1jb21wbGV4IGVuY29kaW5ncyBzdWNo IGFzIFVURi04LCBhIHNlcXVlbmNlIG9mIG11bHRpcGxlIGJ5dGVzCittYXkgYmUgbmVlZGVk IHRvIHJlcHJlc2VudCBhIGNoYXJhY3RlciwgYW5kIHNvbWUgYnl0ZXMgbWF5IGJlIGVuY29k aW5nCitlcnJvcnMgdGhhdCBkbyBub3QgY29udHJpYnV0ZSB0byB0aGUgcmVwcmVzZW50YXRp b24gb2YgYW55IGNoYXJhY3Rlci4KK1BPU0lYIGRvZXMgbm90IHNwZWNpZnkgdGhlIGJlaGF2 aW9yIG9mIEBjb21tYW5ke2dyZXB9IHdoZW4gcGF0dGVybnMgb3IKK2lucHV0IGRhdGEgY29u dGFpbiBlbmNvZGluZyBlcnJvcnMgb3IgbnVsbCBjaGFyYWN0ZXJzLCBzbyBwb3J0YWJsZQor c2NyaXB0cyBzaG91bGQgYXZvaWQgc3VjaCB1c2FnZS4gIEFzIGFuIGV4dGVuc2lvbiB0byBQ T1NJWCwgR05VCitAY29tbWFuZHtncmVwfSB0cmVhdHMgbnVsbCBjaGFyYWN0ZXJzIGxpa2Ug 
YW55IG90aGVyIGNoYXJhY3Rlci4KK0hvd2V2ZXIsIHVubGVzcyB0aGUgQG9wdGlvbnstYX0g KEBvcHRpb257LS1iaW5hcnktZmlsZXM9dGV4dH0pIG9wdGlvbgoraXMgdXNlZCwgdGhlIHBy ZXNlbmNlIG9mIG51bGwgY2hhcmFjdGVycyBpbiBpbnB1dCBvciBvZiBlbmNvZGluZworZXJy b3JzIGluIG91dHB1dCBjYXVzZXMgR05VIEBjb21tYW5ke2dyZXB9IHRvIHRyZWF0IHRoZSBm aWxlIGFzIGJpbmFyeQorYW5kIHN1cHByZXNzIGRldGFpbHMgYWJvdXQgbWF0Y2hlcy4gIEB4 cmVme0ZpbGUgYW5kIERpcmVjdG9yeQorU2VsZWN0aW9ufS4KIAogQGl0ZW0gTEFOR1VBR0UK IEBpdGVteCBMQ19BTEwKQEAgLTExODcsMTYgKzEyMDEsMTYgQEAgYXJlIHJlZ3VsYXIgZXhw cmVzc2lvbnMgdGhhdCBtYXRjaCB0aGVtc2VsdmVzLgogQW55IG1ldGEtY2hhcmFjdGVyCiB3 aXRoIHNwZWNpYWwgbWVhbmluZyBtYXkgYmUgcXVvdGVkIGJ5IHByZWNlZGluZyBpdCB3aXRo IGEgYmFja3NsYXNoLgogCi1BIHJlZ3VsYXIgZXhwcmVzc2lvbiBtYXkgYmUgZm9sbG93ZWQg Ynkgb25lIG9mIHNldmVyYWwKLXJlcGV0aXRpb24gb3BlcmF0b3JzOgotCi1AdGFibGUgQHNh bXAKLQotQGl0ZW0gLgogQG9waW5kZXggLgogQGNpbmRleCBkb3QKIEBjaW5kZXggcGVyaW9k CiBUaGUgcGVyaW9kIEBzYW1wey59IG1hdGNoZXMgYW55IHNpbmdsZSBjaGFyYWN0ZXIuCitJ dCBpcyB1bnNwZWNpZmllZCB3aGV0aGVyIEBzYW1wey59IG1hdGNoZXMgYW4gZW5jb2Rpbmcg ZXJyb3IuCisKK0EgcmVndWxhciBleHByZXNzaW9uIG1heSBiZSBmb2xsb3dlZCBieSBvbmUg b2Ygc2V2ZXJhbAorcmVwZXRpdGlvbiBvcGVyYXRvcnM6CisKK0B0YWJsZSBAc2FtcAogCiBA aXRlbSA/CiBAb3BpbmRleCA/CkBAIC0xMjY3LDExICsxMjgxLDE1IEBAIEFuIHVubWF0Y2hl ZCBAc2FtcHspfSBtYXRjaGVzIGp1c3QgaXRzZWxmLgogQGNpbmRleCBjaGFyYWN0ZXIgY2xh c3MKIEEgQGRmbnticmFja2V0IGV4cHJlc3Npb259IGlzIGEgbGlzdCBvZiBjaGFyYWN0ZXJz IGVuY2xvc2VkIGJ5IEBzYW1we1t9IGFuZAogQHNhbXB7XX0uCi1JdCBtYXRjaGVzIGFueSBz aW5nbGUgY2hhcmFjdGVyIGluIHRoYXQgbGlzdDsKLWlmIHRoZSBmaXJzdCBjaGFyYWN0ZXIg b2YgdGhlIGxpc3QgaXMgdGhlIGNhcmV0IEBzYW1we159LAotdGhlbiBpdCBtYXRjaGVzIGFu eSBjaGFyYWN0ZXIgQHN0cm9uZ3tub3R9IGluIHRoZSBsaXN0LgorSXQgbWF0Y2hlcyBhbnkg c2luZ2xlIGNoYXJhY3RlciBpbiB0aGF0IGxpc3QuCitJZiB0aGUgZmlyc3QgY2hhcmFjdGVy IG9mIHRoZSBsaXN0IGlzIHRoZSBjYXJldCBAc2FtcHtefSwKK3RoZW4gaXQgbWF0Y2hlcyBh bnkgY2hhcmFjdGVyIEBzdHJvbmd7bm90fSBpbiB0aGUgbGlzdCwKK2FuZCBpdCBpcyB1bnNw ZWNpZmllZCB3aGV0aGVyIGl0IG1hdGNoZXMgYW4gZW5jb2RpbmcgZXJyb3IuCiBGb3IgZXhh 
bXBsZSwgdGhlIHJlZ3VsYXIgZXhwcmVzc2lvbgotQHNhbXB7WzAxMjM0NTY3ODldfSBtYXRj aGVzIGFueSBzaW5nbGUgZGlnaXQuCitAc2FtcHtbMDEyMzQ1Njc4OV19IG1hdGNoZXMgYW55 IHNpbmdsZSBkaWdpdCwKK3doZXJlYXMgQHNhbXB7W14oKV19IG1hdGNoZXMgYW55IHNpbmds ZSBjaGFyYWN0ZXIgdGhhdCBpcyBub3QKK2FuIG9wZW5pbmcgb3IgY2xvc2luZyBwYXJlbnRo ZXNpcywgYW5kIG1pZ2h0IG9yIG1pZ2h0IG5vdCBtYXRjaCBhbgorZW5jb2RpbmcgZXJyb3Iu CiAKIEBjaW5kZXggcmFuZ2UgZXhwcmVzc2lvbgogV2l0aGluIGEgYnJhY2tldCBleHByZXNz aW9uLCBhIEBkZm57cmFuZ2UgZXhwcmVzc2lvbn0gY29uc2lzdHMgb2YgdHdvCkBAIC0xODU2 LDcgKzE4NzQsNyBAQCBPbiBzb21lIG9wZXJhdGluZyBzeXN0ZW1zIHRoYXQgc3VwcG9ydCBm aWxlcyB3aXRoIGhvbGVzLS0tbGFyZ2UKIHJlZ2lvbnMgb2YgemVyb3MgdGhhdCBhcmUgbm90 IHBoeXNpY2FsbHkgcHJlc2VudCBvbiBzZWNvbmRhcnkKIHN0b3JhZ2UtLS1AY29tbWFuZHtn cmVwfSBjYW4gc2tpcCBvdmVyIHRoZSBob2xlcyBlZmZpY2llbnRseSB3aXRob3V0CiBuZWVk aW5nIHRvIHJlYWQgdGhlIHplcm9zLiAgVGhpcyBvcHRpbWl6YXRpb24gaXMgbm90IGF2YWls YWJsZSBpZiB0aGUKLUBvcHRpb257LWF9IChAb3B0aW9uey0tdGV4dH0pIG9wdGlvbiBpcyB1 c2VkIChAcHhyZWZ7RmlsZSBhbmQKK0BvcHRpb257LWF9IChAb3B0aW9uey0tYmluYXJ5LWZp bGVzPXRleHR9KSBvcHRpb24gaXMgdXNlZCAoQHB4cmVme0ZpbGUgYW5kCiBEaXJlY3Rvcnkg U2VsZWN0aW9ufSksIHVubGVzcyB0aGUgQG9wdGlvbnsten0gKEBvcHRpb257LS1udWxsLWRh dGF9KQogb3B0aW9uIGlzIGFsc28gdXNlZCAoQHB4cmVme090aGVyIE9wdGlvbnN9KS4KIAot LSAKMi4xNC4zCgo= --------------0A67923EFC29633409C84E53--