From debbugs-submit-bounces@debbugs.gnu.org Wed Apr 06 16:44:02 2016 Received: (at submit) by debbugs.gnu.org; 6 Apr 2016 20:44:02 +0000 Received: from localhost ([127.0.0.1]:52127 helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1anuIz-0002E0-Ai for submit@debbugs.gnu.org; Wed, 06 Apr 2016 16:44:01 -0400 Received: from eggs.gnu.org ([208.118.235.92]:57863) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1ant5C-0000FA-7Z for submit@debbugs.gnu.org; Wed, 06 Apr 2016 15:25:42 -0400 Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71) (envelope-from ) id 1ant55-0003Pp-QQ for submit@debbugs.gnu.org; Wed, 06 Apr 2016 15:25:37 -0400 X-Spam-Checker-Version: SpamAssassin 3.3.2 (2011-06-06) on eggs.gnu.org X-Spam-Level: X-Spam-Status: No, score=0.8 required=5.0 tests=BAYES_50,T_DKIM_INVALID autolearn=disabled version=3.3.2 Received: from lists.gnu.org ([2001:4830:134:3::11]:57659) by eggs.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1ant55-0003Ph-ME for submit@debbugs.gnu.org; Wed, 06 Apr 2016 15:25:35 -0400 Received: from eggs.gnu.org ([2001:4830:134:3::10]:48826) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1ant54-0002uH-8G for bug-grep@gnu.org; Wed, 06 Apr 2016 15:25:35 -0400 Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71) (envelope-from ) id 1ant50-0003OK-Mn for bug-grep@gnu.org; Wed, 06 Apr 2016 15:25:34 -0400 Received: from elli.j3e.de ([193.175.80.161]:55892 helo=mail.j3e.de) by eggs.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1ant50-0003NF-9n for bug-grep@gnu.org; Wed, 06 Apr 2016 15:25:30 -0400 DKIM-Signature: v=1; a=rsa-sha256; q=dns/txt; c=relaxed/relaxed; d=j3e.de; s=2954282; h=Message-ID:To:From:Date; bh=7QfWSyDQyrX2IeOdhekZazktb4ww6jceolJwweMeMyk=; 
b=Q3rNjTAexv6eE0ehGo3byTw9hxqXqWqviIRyhMuxnMLOIGH/3zF5+lPqYSJXVBEVyWK3eQFdY5agQKKAfBksKiHIZ3+xKI+aZ+kLag8vvlslsmbfMklgtQbPtc7nnFjxdcxa2ad3H9DdfbN9Ibr439PrLjVnM5ruXXEHM/R9rojGDyZNluQOdZ4TEnqxUV5y9mRaWjMEkJSXdvJ+UDCvHYaOKM9IanVj5PcZ+g5qFvEbcfXV2QG1ipTI2IOtu8HYbh2dnFc1BGOO4VCiI4gzGwFDGrQu6OPRHveBOudrT/Juig8RkWWgFRl3B52A78UTYciZvc/gTDOaiGDSSEc5gw==; Received: from [127.0.0.2] (localhost [127.0.0.1]) by mail.j3e.de with esmtpsa (TLS1.2:ECDHE_RSA_AES_128_GCM_SHA256:128) (Exim) id 1ant4s-0007vj-Bg for bug-grep@gnu.org; Wed, 06 Apr 2016 21:25:22 +0200 Received: from bjacke by pell.sernet.de with local (Exim 4.86_2) (envelope-from ) id 1ant4r-0003qS-JJ for bug-grep@gnu.org; Wed, 06 Apr 2016 21:25:21 +0200 Date: Wed, 6 Apr 2016 21:25:21 +0200 From: =?iso-8859-1?Q?Bj=F6rn?= JACKE To: bug-grep@gnu.org Subject: unexpected results with charset handling in GNU grep 2.23 Message-ID: <20160406192521.GA14451@SerNet.DE> MIME-Version: 1.0 Content-Type: text/plain; charset=iso-8859-1 Content-Disposition: inline Content-Transfer-Encoding: 8bit X-Q: Die Schriftsteller koennen nicht so schnell schreiben, wie die Regierungen Kriege machen; denn das Schreiben verlangt Denkarbeit. - Brecht X-detected-operating-system: by eggs.gnu.org: GNU/Linux 2.2.x-3.x (no timestamps) [generic] X-detected-operating-system: by eggs.gnu.org: GNU/Linux 2.6.x X-Received-From: 2001:4830:134:3::11 X-Spam-Score: -4.1 (----) X-Debbugs-Envelope-To: submit X-Mailman-Approved-At: Wed, 06 Apr 2016 16:44:00 -0400 X-BeenThere: debbugs-submit@debbugs.gnu.org X-Mailman-Version: 2.1.18 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: debbugs-submit-bounces@debbugs.gnu.org Sender: "Debbugs-submit" X-Spam-Score: -4.1 (----) Hi, this change in GNU grep 2.23 has severe consequences: > Binary files are now less likely to generate diagnostics and more > likely to yield text matches. 
> grep now reports "Binary file FOO matches" and suppresses further
> output instead of outputting a line containing an encoding error;
> hence grep can now report matching text before a later binary match.
> Formerly, grep reported FOO to be binary when it found an encoding
> error in FOO before generating output for FOO, which meant it never
> reported both matching text and matching binary data; this was less
> useful for searching text containing encoding errors in non-matching
> lines.

I got a report that the build of the German spellcheck dictionary broke. It turned out that this happened after the update of GNU grep to 2.23:

https://bugzilla.redhat.com/show_bug.cgi?id=1316359

The change leaves no reliable way to grep lines out of any text file that contains non-ASCII characters. Until now it was quite safe to use grep in the C locale, also for non-ASCII text. After the change, the locale charmap has to match the encoding of the whole input file. Unfortunately, the only locale that is guaranteed to exist is the C locale. We cannot assume that any other locale definition exists on an unknown system. For a script that wants to use grep, this is a big problem now.

Let's take this example using grep 2.23:

# echo -e "test\ntäst\ntest" | iconv -f utf8 -t latin1 | LC_ALL=C grep "st" ; echo $?
test
Binary file (standard input) matches
0

There are several problems here. Someone might want to assume that the locale definitions for en_US.ISO-8859-1 exist. Unfortunately, such an assumption cannot be made. Whatever locale is used, its definition might not be there, and we then fall back to the C locale anyway. The result is that we get the first matching line in the example. The second matching line, which contains a non-ASCII character, produces the text "Binary file (standard input) matches" on stdout (which might even be a valid matching line of the input file!), and the following matches are skipped.
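A minimal sketch of the same reproduction without 'echo -e' or iconv, in case those behave differently on an unknown system. It assumes a POSIX printf, where the octal escape \344 produces byte 0xE4 ("ä" in ISO-8859-1). What grep prints depends on the installed version and C library, so no output is shown here:

```shell
# Sketch of the reproduction above without 'echo -e' or iconv.
# \344 is the octal escape for byte 0xE4, which is "ä" in ISO-8859-1.
out=$(printf 'test\nt\344st\ntest\n' | LC_ALL=C grep 'st')
status=$?   # exit status of grep, the last command in the pipeline

# Show what grep wrote to stdout, followed by its exit status.
printf '%s\n' "$out"
echo "exit status: $status"
```

Comparing the printed lines against the three input lines shows whether the installed grep stopped at the line containing the non-ASCII byte.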
(Finally, the return code is 0. As the grepping stopped early, a return code >1 might be desirable, but I don't want to dive into that point right now.)

Let me draw a bigger picture. Have a look at what a POSIX-compliant grep is expected to do:

http://pubs.opengroup.org/onlinepubs/009604499/utilities/grep.html

Read the description section, especially:

--snip--
By default, an input line shall be selected if any pattern, treated as an entire basic regular expression (BRE) as described in the Base Definitions volume of IEEE Std 1003.1-2001, Section 9.3, Basic Regular Expressions, matches any part of the line excluding the terminating <newline>;
--snap--

That means a POSIX-compliant grep should not try to be too smart and tell the user that a binary file matches the search pattern (people can use "strings" if they want). It should just output the line. From that perspective GNU grep was not POSIX compliant before either, but that was obviously not a big problem for most people. With the recent change and the issues described above, though, I think a lot of scripts using (GNU) grep will break.

I really hope this change will be reverted as soon as possible. I would actually prefer GNU grep to become POSIX compliant and not do any binary detection by default.
Cheers Björn From debbugs-submit-bounces@debbugs.gnu.org Wed Apr 06 17:04:34 2016 Received: (at 23234) by debbugs.gnu.org; 6 Apr 2016 21:04:34 +0000 Received: from localhost ([127.0.0.1]:52176 helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1anucr-0004QF-Rs for submit@debbugs.gnu.org; Wed, 06 Apr 2016 17:04:34 -0400 Received: from mx1.redhat.com ([209.132.183.28]:34856) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1anucp-0004Q3-Vq for 23234@debbugs.gnu.org; Wed, 06 Apr 2016 17:04:32 -0400 Received: from int-mx11.intmail.prod.int.phx2.redhat.com (int-mx11.intmail.prod.int.phx2.redhat.com [10.5.11.24]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mx1.redhat.com (Postfix) with ESMTPS id C58E1E872; Wed, 6 Apr 2016 21:04:27 +0000 (UTC) Received: from [10.3.113.199] (ovpn-113-199.phx2.redhat.com [10.3.113.199]) by int-mx11.intmail.prod.int.phx2.redhat.com (8.14.4/8.14.4) with ESMTP id u36L4RWW011013; Wed, 6 Apr 2016 17:04:27 -0400 Subject: Re: bug#23234: unexpected results with charset handling in GNU grep 2.23 To: =?UTF-8?Q?Bj=c3=b6rn_JACKE?= , 23234@debbugs.gnu.org References: <20160406192521.GA14451@SerNet.DE> From: Eric Blake Openpgp: url=http://people.redhat.com/eblake/eblake.gpg X-Enigmail-Draft-Status: N1110 Organization: Red Hat, Inc. 
Message-ID: <570579DA.9020602@redhat.com>
Date: Wed, 6 Apr 2016 15:04:26 -0600
User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:38.0) Gecko/20100101 Thunderbird/38.7.1
MIME-Version: 1.0
In-Reply-To: <20160406192521.GA14451@SerNet.DE>
Content-Type: multipart/signed; micalg=pgp-sha256; protocol="application/pgp-signature"; boundary="ghBv2JR9B8R2VAvKBNjjS1538flFbmxcm"
X-Scanned-By: MIMEDefang 2.68 on 10.5.11.24
X-Spam-Score: -6.0 (------)
X-Debbugs-Envelope-To: 23234
X-BeenThere: debbugs-submit@debbugs.gnu.org
X-Mailman-Version: 2.1.18
Precedence: list
List-Id:
List-Unsubscribe: ,
List-Archive:
List-Post:
List-Help:
List-Subscribe: ,
Errors-To: debbugs-submit-bounces@debbugs.gnu.org
Sender: "Debbugs-submit"
X-Spam-Score: -6.0 (------)

This is an OpenPGP/MIME signed message (RFC 4880 and 3156)
--ghBv2JR9B8R2VAvKBNjjS1538flFbmxcm
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: quoted-printable

On 04/06/2016 01:25 PM, Björn JACKE wrote:
> Let's take this example using grep 2.23:
>
> # echo -e "test\ntäst\ntest" | iconv -f utf8 -t latin1 | LC_ALL=C grep "st" ; echo $?

[As a side point, 'echo -e' is non-portable; better is to use printf.]

Hmm. POSIX says that a file is binary if it does not end in newline, if it contains embedded NUL, or if it contains an encoding error. But it also says that LC_ALL=C is _required_ to treat all 256 byte values as valid characters (ASCII is only required to treat 7-bit characters as valid, and may reject 8-bit bytes, but LC_ALL=C is _not_ ASCII).

This indeed looks like a bug in current grep.git, as I can reproduce it:

$ git rev-parse HEAD
2ba6ab34da05d3aebc5e7e3dfaedb1cf3ddc5a73
$ printf "test\ntäst\ntest\n" | iconv -f utf8 -t latin1 | LC_ALL=C src/grep "st"
test
Binary file (standard input) matches

Looks like we don't have something quite right in claiming that 0xe4 is not a valid character when in the single-byte C locale.

> I really hope this change will be reverted as soon as possible.
> I would rather prefer GNU grep to become posix compliant and not do any
> binary detection by default actually.

The change of treating encoding errors as binary files will NOT be reverted, but here, you HAVE pointed out a bug where we are treating something as binary that is NOT an encoding error (because by definition, LC_ALL=C has no encoding errors - all 256 byte values are characters). So this is indeed a bug to be fixed.

-- 
Eric Blake eblake redhat com +1-919-301-3266
Libvirt virtualization library http://libvirt.org

--ghBv2JR9B8R2VAvKBNjjS1538flFbmxcm
Content-Type: application/pgp-signature; name="signature.asc"
Content-Description: OpenPGP digital signature
Content-Disposition: attachment; filename="signature.asc"

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2
Comment: Public key at http://people.redhat.com/eblake/eblake.gpg
Comment: Using GnuPG with Thunderbird - http://www.enigmail.net/

iQEcBAEBCAAGBQJXBXnaAAoJEKeha0olJ0NqFDsH/2KDFBmvGHLkN/VU7+qv2iay
hsN4LwaJlOIJvM02PUSYzZGp5vtRFIZZA9p2X9661lu8ojBdBFZLslloOZAHLRTq
5VzAnyMJybeoys7UF8uEhR+XBfmrtb5EHrzGrEXNrC3w0cspugVNUsrcTo4FLV/z
glTFEID5n35lsc1z7KSanEjLNohipKrx6OY9LKwc/NpC5ZLswlbPUkoOt5cDU2q7
bTEiWL1ICnv3k02RJGy/BieZwDJe0P8TF1PKIC5XGboPjVgYYGugnikCoybOICLV
BW1YPny9olhFLPSXiF1WQ3pyOiNhA9WlM301Pmm+lrAZgwq9ZDRppCGUwt20QQ0=
=6W4w
-----END PGP SIGNATURE-----

--ghBv2JR9B8R2VAvKBNjjS1538flFbmxcm--

From debbugs-submit-bounces@debbugs.gnu.org Wed Apr 06 18:23:57 2016
Received: (at 23234) by debbugs.gnu.org; 6 Apr 2016 22:23:57 +0000
Received: from localhost ([127.0.0.1]:52196 helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1anvrh-0006QO-Jy for submit@debbugs.gnu.org; Wed, 06 Apr 2016 18:23:57 -0400
Received: from elli.j3e.de ([193.175.80.161]:56569 helo=mail.j3e.de) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1anvrf-0006QF-VK for 23234@debbugs.gnu.org; Wed, 06 Apr 2016 18:23:56 -0400
DKIM-Signature: v=1; a=rsa-sha256; q=dns/txt; c=relaxed/relaxed;
d=j3e.de; s=2954282; h=Date:Message-ID:From:To; bh=93kM1PJZYOYSvBZrfWO4Tlf0P+9H5u3OpdygliJjhHA=; b=Ypr+Ay3Nl91vVtnNPmCPK5X3nexCmwUA4PtuHPQrHIJVRCIOaztnUx7sXfMfsIHxwQoycXWlxQ3ycbtrKTP0DO5Is+XH5+gRAOBh/bnULW5ZDWRN6oczkhhwnClE8bgwCkQNaBPaRcpm+2K0pkc+g5Z5hoSShaIUTmHqxasRz6Z/m6Ui3z6/5YNA1NtzV8L50OPOmgPsAoqCtHJSE9Wl9Mt5AL9IiEliPiFuJi4llJyUdDn5GwVSJeKsvvB2phF46ri3ksZc8tTITwzMRrP8pjOiWTnCRZ8B9TYSORkMTwv6NctapuNC6cNASHXo8SFY3ZApkzw02rERlYs8ZsnJwA==; Received: from [127.0.0.2] (localhost [127.0.0.1]) by mail.j3e.de with esmtpsa (TLS1.2:ECDHE_RSA_AES_128_GCM_SHA256:128) (Exim) id 1anvre-0000hZ-5X; Thu, 07 Apr 2016 00:23:54 +0200 Subject: Re: bug#23234: unexpected results with charset handling in GNU grep 2.23 To: Eric Blake , 23234@debbugs.gnu.org References: <20160406192521.GA14451@SerNet.DE> <570579DA.9020602@redhat.com> From: Bjoern Jacke X-Enigmail-Draft-Status: N1110 Message-ID: <57058C79.5080407@j3e.de> Date: Thu, 7 Apr 2016 00:23:53 +0200 User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:38.0) Gecko/20100101 Thunderbird/38.6.0 MIME-Version: 1.0 In-Reply-To: <570579DA.9020602@redhat.com> Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: 8bit X-Spam-Score: -1.0 (-) X-Debbugs-Envelope-To: 23234 X-BeenThere: debbugs-submit@debbugs.gnu.org X-Mailman-Version: 2.1.18 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: debbugs-submit-bounces@debbugs.gnu.org Sender: "Debbugs-submit" X-Spam-Score: -1.0 (-) On 06.04.2016 23:04, Eric Blake wrote: > The change of treating encoding errors as binary files will NOT be > reverted, but here, hmm ... think of log files: In log files you will usually find all kind of encodings. If a user greps for a certain error message string in a log file he will not be able to find the errors because GNU grep will terminate grepping as soon as the first byte which does not fit into the locate encoding pops up. 
The only way would be to advice users to use the C locale if that is the only one that will be fixed. I can't believe that this is what you intended to achieve here. And what about the output of "Binary file (standard input) matches" on *stdout*? This is not distinguishable from a line that matched and contains this text. How should a script catch this situation? Björn From debbugs-submit-bounces@debbugs.gnu.org Wed Apr 06 18:33:32 2016 Received: (at 23234) by debbugs.gnu.org; 6 Apr 2016 22:33:32 +0000 Received: from localhost ([127.0.0.1]:52204 helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1anw0y-0006f8-44 for submit@debbugs.gnu.org; Wed, 06 Apr 2016 18:33:32 -0400 Received: from mx1.redhat.com ([209.132.183.28]:57197) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1anw0w-0006ev-Rn for 23234@debbugs.gnu.org; Wed, 06 Apr 2016 18:33:31 -0400 Received: from int-mx14.intmail.prod.int.phx2.redhat.com (int-mx14.intmail.prod.int.phx2.redhat.com [10.5.11.27]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mx1.redhat.com (Postfix) with ESMTPS id 22A9A201E8; Wed, 6 Apr 2016 22:33:25 +0000 (UTC) Received: from [10.3.113.199] (ovpn-113-199.phx2.redhat.com [10.3.113.199]) by int-mx14.intmail.prod.int.phx2.redhat.com (8.14.4/8.14.4) with ESMTP id u36MXO6g016901; Wed, 6 Apr 2016 18:33:24 -0400 Subject: Re: bug#23234: unexpected results with charset handling in GNU grep 2.23 To: Bjoern Jacke , 23234@debbugs.gnu.org References: <20160406192521.GA14451@SerNet.DE> <570579DA.9020602@redhat.com> <57058C79.5080407@j3e.de> From: Eric Blake Openpgp: url=http://people.redhat.com/eblake/eblake.gpg Organization: Red Hat, Inc. 
Message-ID: <57058EB4.4050200@redhat.com>
Date: Wed, 6 Apr 2016 16:33:24 -0600
User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:38.0) Gecko/20100101 Thunderbird/38.7.1
MIME-Version: 1.0
In-Reply-To: <57058C79.5080407@j3e.de>
Content-Type: multipart/signed; micalg=pgp-sha256; protocol="application/pgp-signature"; boundary="jcVH0EcHSilOI2wjXdQ9teAHAu4K0g521"
X-Scanned-By: MIMEDefang 2.68 on 10.5.11.27
X-Spam-Score: -6.0 (------)
X-Debbugs-Envelope-To: 23234
X-BeenThere: debbugs-submit@debbugs.gnu.org
X-Mailman-Version: 2.1.18
Precedence: list
List-Id:
List-Unsubscribe: ,
List-Archive:
List-Post:
List-Help:
List-Subscribe: ,
Errors-To: debbugs-submit-bounces@debbugs.gnu.org
Sender: "Debbugs-submit"
X-Spam-Score: -6.0 (------)

This is an OpenPGP/MIME signed message (RFC 4880 and 3156)
--jcVH0EcHSilOI2wjXdQ9teAHAu4K0g521
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: quoted-printable

On 04/06/2016 04:23 PM, Bjoern Jacke wrote:
> On 06.04.2016 23:04, Eric Blake wrote:
>> The change of treating encoding errors as binary files will NOT be
>> reverted, but here,
>
> hmm ... think of log files: In log files you will usually find all kind
> of encodings. If a user greps for a certain error message string in a
> log file he will not be able to find the errors because GNU grep will
> terminate grepping as soon as the first byte which does not fit into the
> locate encoding pops up.

'grep -a' is your friend.

> And what about the output of "Binary file (standard input) matches" on
> *stdout*? This is not distinguishable from a line that matched and
> contains this text. How should a script catch this situation?

That behavior complies with POSIX requirements. Again, a script SHOULD NOT be grepping binary files (POSIX only defines grep on text files) without knowing the ramifications. Meanwhile, 'grep -a' guarantees you won't get the "Binary file" message.
--=20 Eric Blake eblake redhat com +1-919-301-3266 Libvirt virtualization library http://libvirt.org --jcVH0EcHSilOI2wjXdQ9teAHAu4K0g521 Content-Type: application/pgp-signature; name="signature.asc" Content-Description: OpenPGP digital signature Content-Disposition: attachment; filename="signature.asc" -----BEGIN PGP SIGNATURE----- Version: GnuPG v2 Comment: Public key at http://people.redhat.com/eblake/eblake.gpg Comment: Using GnuPG with Thunderbird - http://www.enigmail.net/ iQEcBAEBCAAGBQJXBY60AAoJEKeha0olJ0NqGVYH/2D1zFhyVyUbMNpLGPiRoNjx FICfxTj39hzJSrl0R4Us5wdg6FPSOa9tVYPCjq2FIie/UIYiRwZYTuoaxfzue3TA qri4TivP8YwezB6sLNSOfAxWb1MRCAygfvothOg5VSPceUbn1IOLlOIQdCTz6ztG UV4wuUrdrKGegp6lJvQqrPyLTQUD1ovKkQuQoQ6wD5UFFG1ndJb175SGFWj1JDUj YkYwLp4CZCWGPfp7oveHYPMW0Je52sqMKEw/DBk/bID6KwZSozO5w/e0wcPWy3yA z57xJeaKr8CNz11BaILOPU/CfO5/p6YOGj/Lo9j5f9x3g0puYtw492psAYqM+lA= =vLro -----END PGP SIGNATURE----- --jcVH0EcHSilOI2wjXdQ9teAHAu4K0g521-- From debbugs-submit-bounces@debbugs.gnu.org Wed Apr 06 19:04:08 2016 Received: (at 23234) by debbugs.gnu.org; 6 Apr 2016 23:04:09 +0000 Received: from localhost ([127.0.0.1]:52224 helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1anwUa-0007Op-MP for submit@debbugs.gnu.org; Wed, 06 Apr 2016 19:04:08 -0400 Received: from elli.j3e.de ([193.175.80.161]:59045 helo=mail.j3e.de) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1anwUZ-0007Of-0C for 23234@debbugs.gnu.org; Wed, 06 Apr 2016 19:04:07 -0400 DKIM-Signature: v=1; a=rsa-sha256; q=dns/txt; c=relaxed/relaxed; d=j3e.de; s=2954282; h=Date:Message-ID:From:To; bh=85o4TlYJpCswlrc3MkbkvPD0YsG2wZiHUS1eXZ/blCY=; 
b=AODkXV+pkHUBUvX2xZLQdUXTKhIFFcnZiyQZRAVz9bNRPaw+wA5sc0Lcq4DZQRbUcjeRBhvLn/QEqBaD9syKbGFIPpgmmRd0Il4dRqGyyv0Sf9umyGqgn0g+diCC5mUmL6p/88OgjTVYBBm+4xugtKAzGcSwLVcRBfT2HILEkuCT/FfXzA5uIMyFb8Eg/f549SFHWF7gD4cnRZEac9U2f8k+a+V4C2oQqwaNZZsg8GE79GvcQzwdz5TRJyEEI8Oxe8ddpes6w7lRctylp7d9e720G7g1/baJncNlqo9pepDKov4E8XnP+EoU+50JFeyKGdW2vhs+AuF8hbDPPJBwcw==; Received: from [127.0.0.2] (localhost [127.0.0.1]) by mail.j3e.de with esmtpsa (TLS1.2:ECDHE_RSA_AES_128_GCM_SHA256:128) (Exim) id 1anwUW-00045c-1X; Thu, 07 Apr 2016 01:04:04 +0200 Subject: Re: bug#23234: unexpected results with charset handling in GNU grep 2.23 To: Eric Blake , 23234@debbugs.gnu.org References: <20160406192521.GA14451@SerNet.DE> <570579DA.9020602@redhat.com> <57058C79.5080407@j3e.de> <57058EB4.4050200@redhat.com> From: Bjoern Jacke X-Enigmail-Draft-Status: N1110 Message-ID: <570595E4.4010602@j3e.de> Date: Thu, 7 Apr 2016 01:04:04 +0200 User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:38.0) Gecko/20100101 Thunderbird/38.6.0 MIME-Version: 1.0 In-Reply-To: <57058EB4.4050200@redhat.com> Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: 8bit X-Spam-Score: -1.0 (-) X-Debbugs-Envelope-To: 23234 X-BeenThere: debbugs-submit@debbugs.gnu.org X-Mailman-Version: 2.1.18 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: debbugs-submit-bounces@debbugs.gnu.org Sender: "Debbugs-submit" X-Spam-Score: -1.0 (-) On 07.04.2016 00:33, Eric Blake wrote: > That behavior complies with POSIX requirements. can you give a quote here? One thing which is not POSIX compliant is that the diagnostic messages is given back on stdout. http://pubs.opengroup.org/onlinepubs/9699919799/ says: --snip-- LC_MESSAGES Determine the locale that should be used to affect the format and contents of diagnostic messages written to standard error. --snap-- which implies that diagnostic messages should be given back to standard error. 
> Again, a script SHOULD
> NOT be grepping binary files (POSIX only defines grep on text files)
> without knowing the ramifications. Meanwhile, 'grep -a' guarantees you
> won't get the "Binary file" message.

If you consider grepping text files with mixed encodings an invalid use of grep, then you should not return 0 and/or output "Binary file (standard input) matches" on stdout. This makes the output of GNU grep look like a valid match.

You say "grep -a" is your friend to all the users who want to grep log files (because they tend to contain mixed encodings). Sure, -a is a workaround to make GNU grep work as before again. Realistically, 99.99% of the users will not know that, though, because I guess this is the first grep version ever that requires it. Also, -a is a GNU-only option, so portable scripts will not be able to use it.

I guess you are aware that you will break a lot of existing scripts with this change of treating mixed-encoding input files as binary, the way GNU grep >= 2.23 does now?
Björn From debbugs-submit-bounces@debbugs.gnu.org Wed Apr 06 19:15:30 2016 Received: (at 23234) by debbugs.gnu.org; 6 Apr 2016 23:15:30 +0000 Received: from localhost ([127.0.0.1]:52241 helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1anwfa-0007fO-5h for submit@debbugs.gnu.org; Wed, 06 Apr 2016 19:15:30 -0400 Received: from mx1.redhat.com ([209.132.183.28]:60137) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1anwfX-0007fF-Pz for 23234@debbugs.gnu.org; Wed, 06 Apr 2016 19:15:28 -0400 Received: from int-mx09.intmail.prod.int.phx2.redhat.com (int-mx09.intmail.prod.int.phx2.redhat.com [10.5.11.22]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mx1.redhat.com (Postfix) with ESMTPS id 1B25B72675; Wed, 6 Apr 2016 23:15:27 +0000 (UTC) Received: from [10.3.113.199] (ovpn-113-199.phx2.redhat.com [10.3.113.199]) by int-mx09.intmail.prod.int.phx2.redhat.com (8.14.4/8.14.4) with ESMTP id u36NFP0J002106; Wed, 6 Apr 2016 19:15:26 -0400 Subject: Re: bug#23234: unexpected results with charset handling in GNU grep 2.23 To: Bjoern Jacke , 23234@debbugs.gnu.org References: <20160406192521.GA14451@SerNet.DE> <570579DA.9020602@redhat.com> <57058C79.5080407@j3e.de> <57058EB4.4050200@redhat.com> <570595E4.4010602@j3e.de> From: Eric Blake Openpgp: url=http://people.redhat.com/eblake/eblake.gpg X-Enigmail-Draft-Status: N1110 Organization: Red Hat, Inc. 
Message-ID: <5705988D.7030001@redhat.com> Date: Wed, 6 Apr 2016 17:15:25 -0600 User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:38.0) Gecko/20100101 Thunderbird/38.7.1 MIME-Version: 1.0 In-Reply-To: <570595E4.4010602@j3e.de> Content-Type: multipart/signed; micalg=pgp-sha256; protocol="application/pgp-signature"; boundary="TQuugC2gv724UpqpQv9JUwF1tWa7pI5RE" X-Scanned-By: MIMEDefang 2.68 on 10.5.11.22 X-Greylist: Sender IP whitelisted, not delayed by milter-greylist-4.5.16 (mx1.redhat.com [10.5.110.39]); Wed, 06 Apr 2016 23:15:27 +0000 (UTC) X-Spam-Score: -6.0 (------) X-Debbugs-Envelope-To: 23234 X-BeenThere: debbugs-submit@debbugs.gnu.org X-Mailman-Version: 2.1.18 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: debbugs-submit-bounces@debbugs.gnu.org Sender: "Debbugs-submit" X-Spam-Score: -6.0 (------) This is an OpenPGP/MIME signed message (RFC 4880 and 3156) --TQuugC2gv724UpqpQv9JUwF1tWa7pI5RE Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: quoted-printable On 04/06/2016 05:04 PM, Bjoern Jacke wrote: > On 07.04.2016 00:33, Eric Blake wrote: >> That behavior complies with POSIX requirements. >=20 > can you give a quote here? One thing which is not POSIX compliant is > that the diagnostic messages is given back on stdout. > http://pubs.opengroup.org/onlinepubs/9699919799/ says: >=20 > --snip-- > LC_MESSAGES > Determine the locale that should be used to affect the format and > contents of diagnostic messages written to standard error. > --snap-- http://pubs.opengroup.org/onlinepubs/9699919799/utilities/grep.html STDIN The standard input shall be used if no file operands are specified, and shall be used if a file operand is '-' and the implementation treats the '-' as meaning standard input. Otherwise, the standard input shall not be used. See the INPUT FILES section. INPUT FILES The input files shall be text files. 
As soon as you supply grep with non-text-file input, POSIX no longer applies, and we can do WHATEVER WE WANT. The violation is not in grep's behavior, but in yours for passing a binary file.

We have chosen that WHATEVER WE WANT means that by default, we will tell you (on stdout) that the binary file matches, but if you use the (non-standard extension) -a option, we will pretend the file is text anyways. And it's been documented that way for basically "forever" in GNU grep. What's changed recently is what we've done under the hood (more efficient recognition of binary files, treating '\0' and '\n' identically as line terminators when -a is not in effect because of the speed improvements it lets us gain, and attempts with heuristics to avoid spamming terminals or downstream clients with encoding errors when -a is not in effect). But all of those still fall under the broad category of WHATEVER WE WANT as it falls outside the POSIX standard.

And yes, maybe we could change grep to print the "Binary file matches" message to stderr, but that in turn will probably break other scripts, and lead to even more complaints from people doing non-standard things and expecting consistent results.

That said, patches are still welcome, if you think you have better heuristics than what we currently have, and as long as it still falls within the realm of WHATEVER WE WANT.

> if you consider grepping text files with mixed encodings as invalid use
> of grep, then you should not return 0 and/or output the "Binary file
> (standard input) matches" on stdout. This makes the output of GNU grep
> look like a valid match.

Maybe changing the exit status when a binary file is encountered is worth doing - but not returning status 0 when a match is detected is more likely to do harm than good.

>
> You say "grep -a" is your friend to all the users, who want to grep log
> files (cause they tend to contain mixed encodings). Sure, -a is a
> workaround to make GNU grep work as before again.
> Realistically 99.99% of the users will not know that though, because
> this is the first grep version ever I guess, that requires this. Also
> -a is a GNU option only, so portable scripts will not be able to use
> that.

Portable scripts are not able to grep binary files, period. As long as you don't mind non-portable extensions, 'grep -a' is what you want.

>
> I guess you are aware, that you will break a lot of existing scripts
> with that change of treating mixed encoding input files as binary like
> the way you do it now with GNU grep >= 2.23 ?

Yes, we are aware that lots of users are getting an education on the subtleties of POSIX. But that doesn't mean it is a bug.

-- 
Eric Blake eblake redhat com +1-919-301-3266
Libvirt virtualization library http://libvirt.org

--TQuugC2gv724UpqpQv9JUwF1tWa7pI5RE
Content-Type: application/pgp-signature; name="signature.asc"
Content-Description: OpenPGP digital signature
Content-Disposition: attachment; filename="signature.asc"

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2
Comment: Public key at http://people.redhat.com/eblake/eblake.gpg
Comment: Using GnuPG with Thunderbird - http://www.enigmail.net/

iQEcBAEBCAAGBQJXBZiNAAoJEKeha0olJ0NqV0oH/Rfk7mUi+uwwwk6BX1ZYpnH+
JFlPhWCQ+0Nom2sjSk8ChY9/Kp+bWTx5lYYwflEFpnFrOWcwF07r4RhOGviGPQxz
dJNn1BDjGWNpiW0kjb504f34HaG1+lHhFd0ICorOlnujoWrh9vmq3QO58JnqTBBv
4x/11IaXR4H7WRe7NJ+vmnaptuxQQdpXRgBU9UQGIHPJu2rPQ4kM3+wOBdaMZjvm
fLq8BYWDupExVo3529RkZuteLD/l7Bj1EH+45nGcbLjwnXyJXg9rMfFsqg+O6wid
uyRqHa2uJ+PEtjM46bY7a4E+4kt7SFLmKHPrqYnpsbeFaszdfkBCLLznbhh0Q7c=
=DAn8
-----END PGP SIGNATURE-----

--TQuugC2gv724UpqpQv9JUwF1tWa7pI5RE--

From debbugs-submit-bounces@debbugs.gnu.org Wed Apr 06 21:25:25 2016
Received: (at 23234) by debbugs.gnu.org; 7 Apr 2016 01:25:25 +0000
Received: from localhost ([127.0.0.1]:52273 helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1anyhI-0002Sd-Vz for submit@debbugs.gnu.org; Wed, 06 Apr 2016 21:25:25 -0400
Received: from
zimbra.cs.ucla.edu ([131.179.128.68]:39148) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1anyhG-0002SP-R5 for 23234@debbugs.gnu.org; Wed, 06 Apr 2016 21:25:23 -0400 Received: from localhost (localhost [127.0.0.1]) by zimbra.cs.ucla.edu (Postfix) with ESMTP id F1BB6160E3C; Wed, 6 Apr 2016 18:25:16 -0700 (PDT) Received: from zimbra.cs.ucla.edu ([127.0.0.1]) by localhost (zimbra.cs.ucla.edu [127.0.0.1]) (amavisd-new, port 10032) with ESMTP id uERpBe0FoIDP; Wed, 6 Apr 2016 18:25:16 -0700 (PDT) Received: from localhost (localhost [127.0.0.1]) by zimbra.cs.ucla.edu (Postfix) with ESMTP id 44ABB160FF0; Wed, 6 Apr 2016 18:25:16 -0700 (PDT) X-Virus-Scanned: amavisd-new at zimbra.cs.ucla.edu Received: from zimbra.cs.ucla.edu ([127.0.0.1]) by localhost (zimbra.cs.ucla.edu [127.0.0.1]) (amavisd-new, port 10026) with ESMTP id BzT1Q1fvZlpb; Wed, 6 Apr 2016 18:25:16 -0700 (PDT) Received: from penguin.cs.ucla.edu (Penguin.CS.UCLA.EDU [131.179.64.200]) by zimbra.cs.ucla.edu (Postfix) with ESMTPSA id 2BC39160E3C; Wed, 6 Apr 2016 18:25:16 -0700 (PDT) Subject: Re: bug#23234: unexpected results with charset handling in GNU grep 2.23 To: Eric Blake , Bjoern Jacke , 23234@debbugs.gnu.org References: <20160406192521.GA14451@SerNet.DE> <570579DA.9020602@redhat.com> <57058C79.5080407@j3e.de> <57058EB4.4050200@redhat.com> <570595E4.4010602@j3e.de> <5705988D.7030001@redhat.com> From: Paul Eggert Organization: UCLA Computer Science Department Message-ID: <5705B6FC.6040908@cs.ucla.edu> Date: Wed, 6 Apr 2016 18:25:16 -0700 User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:38.0) Gecko/20100101 Thunderbird/38.7.1 MIME-Version: 1.0 In-Reply-To: <5705988D.7030001@redhat.com> Content-Type: text/plain; charset=utf-8; format=flowed Content-Transfer-Encoding: 7bit X-Spam-Score: -1.0 (-) X-Debbugs-Envelope-To: 23234 X-BeenThere: debbugs-submit@debbugs.gnu.org X-Mailman-Version: 2.1.18 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , 
Errors-To: debbugs-submit-bounces@debbugs.gnu.org
Sender: "Debbugs-submit"
X-Spam-Score: -1.0 (-)

On 04/06/2016 04:15 PM, Eric Blake wrote:
> And yes, maybe we could change grep to print the "Binary file matches"
> message to stderr, but that in turn will probably break other scripts,
> and lead to even more complaints from people doing non-standard things
> and expecting consistent results.

Yes, I'm dubious about this idea. grep's behavior was inspired by diff's similar behavior, and grep and diff have worked that way for many years and I expect people depend on it. POSIX says that diff should output its binary-file message to stdout, and I expect that if POSIX standardized grep's behavior on binary files it would do something similar.

> Maybe changing the exit status when a binary file is encountered is
> worth doing

Possibly, though I don't see the use case yet. If it's needed I suggest doing the change only if a new option is specified (--binary-files=error, say) so that it's upward-compatible with existing behavior.
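
As background for the exit-status question: POSIX specifies that grep exits with 0 when one or more lines were selected, 1 when none were, and a value greater than 1 when an error occurred. The following is only a minimal sketch of how a script can branch on those three cases today. The sample input and the use of the GNU-specific -a option are illustrative assumptions, not a fix proposed in this thread:

```shell
# Branch on grep's exit status: 0 = match, 1 = no match, >1 = error.
# -a is the GNU extension discussed in this thread; -q suppresses normal output.
# \344 is the octal escape for byte 0xE4 ("ä" in ISO-8859-1).
printf 'test\nt\344st\ntest\n' | LC_ALL=C grep -a -q 'st'
case $? in
  0) result="match" ;;
  1) result="no match" ;;
  *) result="error" ;;
esac
echo "$result"
```

Because the decision is taken from the exit status rather than from the text on stdout, a "Binary file ... matches" line cannot be mistaken for a matched input line in this pattern.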
From debbugs-submit-bounces@debbugs.gnu.org Wed Apr 06 21:28:42 2016 Received: (at 23234) by debbugs.gnu.org; 7 Apr 2016 01:28:42 +0000 Received: from localhost ([127.0.0.1]:52278 helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1anykU-0002XB-F0 for submit@debbugs.gnu.org; Wed, 06 Apr 2016 21:28:42 -0400 Received: from zimbra.cs.ucla.edu ([131.179.128.68]:39307) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1anykS-0002Wx-4X for 23234@debbugs.gnu.org; Wed, 06 Apr 2016 21:28:40 -0400 Received: from localhost (localhost [127.0.0.1]) by zimbra.cs.ucla.edu (Postfix) with ESMTP id C6AB7161287; Wed, 6 Apr 2016 18:28:34 -0700 (PDT) Received: from zimbra.cs.ucla.edu ([127.0.0.1]) by localhost (zimbra.cs.ucla.edu [127.0.0.1]) (amavisd-new, port 10032) with ESMTP id Z-09n4wxLnka; Wed, 6 Apr 2016 18:28:33 -0700 (PDT) Received: from localhost (localhost [127.0.0.1]) by zimbra.cs.ucla.edu (Postfix) with ESMTP id D2EB4161288; Wed, 6 Apr 2016 18:28:33 -0700 (PDT) X-Virus-Scanned: amavisd-new at zimbra.cs.ucla.edu Received: from zimbra.cs.ucla.edu ([127.0.0.1]) by localhost (zimbra.cs.ucla.edu [127.0.0.1]) (amavisd-new, port 10026) with ESMTP id kGtz6Uo79Z-W; Wed, 6 Apr 2016 18:28:33 -0700 (PDT) Received: from penguin.cs.ucla.edu (Penguin.CS.UCLA.EDU [131.179.64.200]) by zimbra.cs.ucla.edu (Postfix) with ESMTPSA id B95D0161287; Wed, 6 Apr 2016 18:28:33 -0700 (PDT) Subject: Re: bug#23234: unexpected results with charset handling in GNU grep 2.23 To: Eric Blake , =?UTF-8?Q?Bj=c3=b6rn_JACKE?= , 23234@debbugs.gnu.org References: <20160406192521.GA14451@SerNet.DE> <570579DA.9020602@redhat.com> From: Paul Eggert Organization: UCLA Computer Science Department Message-ID: <5705B7C1.4040301@cs.ucla.edu> Date: Wed, 6 Apr 2016 18:28:33 -0700 User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:38.0) Gecko/20100101 Thunderbird/38.7.1 MIME-Version: 1.0 In-Reply-To: <570579DA.9020602@redhat.com> Content-Type: text/plain; charset=utf-8; 
format=flowed Content-Transfer-Encoding: quoted-printable X-Spam-Score: -1.0 (-) X-Debbugs-Envelope-To: 23234 X-BeenThere: debbugs-submit@debbugs.gnu.org X-Mailman-Version: 2.1.18 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: debbugs-submit-bounces@debbugs.gnu.org Sender: "Debbugs-submit" X-Spam-Score: -1.0 (-) On 04/06/2016 02:04 PM, Eric Blake wrote: > POSIX ... says that LC_ALL=C is _required_ to treat all 256 byte > values as valid characters Although that was the intent of POSIX, it's not what the current standard says, and it's not what many popular platforms do. Problematic platforms include Fedora 23, where mbrtowc reports an encoding error in the C locale when given a byte outside the range 0-127. This affects many programs other than 'grep'. This bug in the standard is intended to be fixed in a future version of POSIX (see ). I suppose glibc and eventually Fedora will be fixed to conform to the new standard in due course. Perhaps grep should work around this problem on systems like Fedora 23 where the underlying C library does not conform to the next version of POSIX. It sounds like a new gnulib module or two might do the trick. This should fix the problems that Björn mentions. In the meantime grep -a is the way to go. Yes, it's not portable to non-GNU grep, but there is no portable solution given the abovementioned POSIX problems, so a GNU-grep-only workaround is all one can reasonably ask for.
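As a rough illustration of the grep -a workaround mentioned above (a sketch only; the file name sample.txt and its contents are made up for this example), the following creates one line containing a byte outside the range 0-127 and searches it with -a, so that the matching line is treated as text instead of being summarized as a binary-file match:

```shell
# Hypothetical sample: one Latin-1 line (byte 0xF6, octal 366, is "ö" in ISO-8859-1).
tmpdir=$(mktemp -d)
printf 'Bj\366rn\n' > "$tmpdir/sample.txt"

# -a (--binary-files=text) tells grep to treat every input file as text.
# -c prints the number of matching lines rather than the lines themselves,
# which keeps raw non-ASCII bytes off the terminal.
matched=$(LC_ALL=C grep -a -c 'rn' "$tmpdir/sample.txt")
echo "matching lines: $matched"

rm -rf "$tmpdir"
```

Counting with -c is a choice made for this sketch; dropping -c prints the matching line itself, raw byte included.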
From debbugs-submit-bounces@debbugs.gnu.org Sat Apr 09 04:34:42 2016 Received: (at 23234) by debbugs.gnu.org; 9 Apr 2016 08:34:42 +0000 Received: from localhost ([127.0.0.1]:55134 helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1aooLq-0004yu-8j for submit@debbugs.gnu.org; Sat, 09 Apr 2016 04:34:42 -0400 Received: from zimbra.cs.ucla.edu ([131.179.128.68]:43784) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1aooLo-0004yh-Md for 23234@debbugs.gnu.org; Sat, 09 Apr 2016 04:34:41 -0400 Received: from localhost (localhost [127.0.0.1]) by zimbra.cs.ucla.edu (Postfix) with ESMTP id B03EE161248; Sat, 9 Apr 2016 01:34:34 -0700 (PDT) Received: from zimbra.cs.ucla.edu ([127.0.0.1]) by localhost (zimbra.cs.ucla.edu [127.0.0.1]) (amavisd-new, port 10032) with ESMTP id 17YxgRpxR1aq; Sat, 9 Apr 2016 01:34:33 -0700 (PDT) Received: from localhost (localhost [127.0.0.1]) by zimbra.cs.ucla.edu (Postfix) with ESMTP id 992D0161249; Sat, 9 Apr 2016 01:34:33 -0700 (PDT) X-Virus-Scanned: amavisd-new at zimbra.cs.ucla.edu Received: from zimbra.cs.ucla.edu ([127.0.0.1]) by localhost (zimbra.cs.ucla.edu [127.0.0.1]) (amavisd-new, port 10026) with ESMTP id XCuI9HuPoOr8; Sat, 9 Apr 2016 01:34:33 -0700 (PDT) Received: from [192.168.1.9] (unknown [100.32.155.148]) by zimbra.cs.ucla.edu (Postfix) with ESMTPSA id 5817D161248; Sat, 9 Apr 2016 01:34:33 -0700 (PDT) Subject: Re: bug#23234: unexpected results with charset handling in GNU grep 2.23 To: Eric Blake , =?UTF-8?Q?Bj=c3=b6rn_JACKE?= , 23234@debbugs.gnu.org References: <20160406192521.GA14451@SerNet.DE> <570579DA.9020602@redhat.com> <5705B7C1.4040301@cs.ucla.edu> From: Paul Eggert Organization: UCLA Computer Science Department Message-ID: <5708BE92.6010002@cs.ucla.edu> Date: Sat, 9 Apr 2016 01:34:26 -0700 User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:38.0) Gecko/20100101 Thunderbird/38.6.0 MIME-Version: 1.0 In-Reply-To: <5705B7C1.4040301@cs.ucla.edu> Content-Type: text/plain; 
charset=utf-8; format=flowed Content-Transfer-Encoding: quoted-printable X-Spam-Score: -1.0 (-) X-Debbugs-Envelope-To: 23234 X-BeenThere: debbugs-submit@debbugs.gnu.org X-Mailman-Version: 2.1.18 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: debbugs-submit-bounces@debbugs.gnu.org Sender: "Debbugs-submit" X-Spam-Score: -1.0 (-) Paul Eggert wrote: > Perhaps grep should work around this problem on systems like Fedora 23 where the > underlying C library does not conform to the next version of POSIX. It sounds > like a new gnulib module or two might do the trick. This should fix the problems > that Björn mentions. I've started on this by changing the mbrtowc module in gnulib to work around the future-POSIX incompatibility of mbrtowc in glibc. See: http://git.savannah.gnu.org/cgit/gnulib.git/commit/?id=b7bc3c1a4e78add4cbad39ae1a0c4fb0747b483f I plan to change GNU grep to use this new facility, and to add some grep test cases for this issue.
From debbugs-submit-bounces@debbugs.gnu.org Sat Apr 09 05:29:12 2016 Received: (at 23234) by debbugs.gnu.org; 9 Apr 2016 09:29:12 +0000 Received: from localhost ([127.0.0.1]:55166 helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1aopCa-0006Ku-8v for submit@debbugs.gnu.org; Sat, 09 Apr 2016 05:29:12 -0400 Received: from mailgw02.kcn.ne.jp ([61.86.7.209]:52902) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1aopCY-0006Kg-EG for 23234@debbugs.gnu.org; Sat, 09 Apr 2016 05:29:11 -0400 Received: from mxs01-s (mailgw1.kcn.ne.jp [61.86.15.233]) by mailgw02.kcn.ne.jp (Postfix) with ESMTP id 702F7BFC6B for <23234@debbugs.gnu.org>; Sat, 9 Apr 2016 18:29:02 +0900 (JST) X-matriXscan-loop-detect: ed1dd64397ba4e1e720cd9c85c2d192165780f06 Received: from mail03.kcn.ne.jp ([61.86.6.182]) by mxs01-s with ESMTP; Sat, 09 Apr 2016 18:29:00 +0900 (JST) Received: from [10.120.1.5] (i118-21-128-66.s30.a048.ap.plala.or.jp [118.21.128.66]) by mail03.kcn.ne.jp (Postfix) with ESMTPA id A2F44141009A; Sat, 9 Apr 2016 18:29:00 +0900 (JST) Date: Sat, 09 Apr 2016 18:29:01 +0900 From: Norihiro Tanaka To: Paul Eggert Subject: Re: bug#23234: unexpected results with charset handling in GNU grep 2.23 In-Reply-To: <5705B6FC.6040908@cs.ucla.edu> References: <5705988D.7030001@redhat.com> <5705B6FC.6040908@cs.ucla.edu> Message-Id: <20160409182901.9FB2.27F6AC2D@kcn.ne.jp> MIME-Version: 1.0 Content-Type: text/plain; charset="US-ASCII" Content-Transfer-Encoding: 7bit X-Mailer: Becky! ver. 2.65.07 [ja] X-matriXscan-Sophos-AV: Clean X-matriXscan-Action: Approve X-matriXscan: Uncategorized X-Spam-Score: 1.7 (+) X-Spam-Report: Spam detection software, running on the system "debbugs.gnu.org", has NOT identified this incoming email as spam. The original message has been attached to this so you can view it or label similar future email. If you have any questions, see the administrator of that system for details. 
Content preview: On Wed, 6 Apr 2016 18:25:16 -0700 Paul Eggert wrote: > On 04/06/2016 04:15 PM, Eric Blake wrote: > > And yes, maybe we could change grep to print the "Binary file matches" > > message to stderr, but that in turn will probably break other scripts, > > and lead to even more complaints from people doing non-standard things > > and expecting consistent results. > > Yes, I'm dubious about this idea. grep's behavior was inspired by > diff's similar behavior, and grep and diff have worked that way for > many years and I expect people depend on it. POSIX says that diff > should output its binary-file message to stdout, and I expect that if > POSIX standardized grep's behavior on binary files it would do > something similar. [...] Content analysis details: (1.7 points, 10.0 required) pts rule name description ---- ---------------------- -------------------------------------------------- 2.7 RCVD_IN_PSBL RBL: Received via a relay in PSBL [61.86.7.209 listed in psbl.surriel.com] -1.0 RP_MATCHES_RCVD Envelope sender domain matches handover relay domain -0.0 SPF_PASS SPF: sender matches SPF record X-Debbugs-Envelope-To: 23234 Cc: Bjoern Jacke , Eric Blake , 23234@debbugs.gnu.org X-BeenThere: debbugs-submit@debbugs.gnu.org X-Mailman-Version: 2.1.18 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: debbugs-submit-bounces@debbugs.gnu.org Sender: "Debbugs-submit" X-Spam-Score: 1.7 (+)
On Wed, 6 Apr 2016 18:25:16 -0700 Paul Eggert wrote: > On 04/06/2016 04:15 PM, Eric Blake wrote: > > And yes, maybe we could change grep to print the "Binary file matches" > > message to stderr, but that in turn will probably break other scripts, > > and lead to even more complaints from people doing non-standard things > > and expecting consistent results. > > Yes, I'm dubious about this idea. grep's behavior was inspired by > diff's similar behavior, and grep and diff have worked that way for > many years and I expect people depend on it. POSIX says that diff > should output its binary-file message to stdout, and I expect that if > POSIX standardized grep's behavior on binary files it would do > something similar. Hmm, diff does not output "Binary file matches" between text files, but grep does it.
$ cp src/grep grep.bin $ LC_ALL=en_US.utf8 src/grep g grep.bin Binary file grep.bin matches $ cat >grep.bin <) id 1ap5ii-0001X7-1q for submit@debbugs.gnu.org; Sat, 09 Apr 2016 23:07:28 -0400 Received: from zimbra.cs.ucla.edu ([131.179.128.68]:46729) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1ap5if-0001Wr-LC for 23234@debbugs.gnu.org; Sat, 09 Apr 2016 23:07:26 -0400 Received: from localhost (localhost [127.0.0.1]) by zimbra.cs.ucla.edu (Postfix) with ESMTP id AA1FD161245; Sat, 9 Apr 2016 20:07:19 -0700 (PDT) Received: from zimbra.cs.ucla.edu ([127.0.0.1]) by localhost (zimbra.cs.ucla.edu [127.0.0.1]) (amavisd-new, port 10032) with ESMTP id 2MwcE55LvdiI; Sat, 9 Apr 2016 20:07:19 -0700 (PDT) Received: from localhost (localhost [127.0.0.1]) by zimbra.cs.ucla.edu (Postfix) with ESMTP id EF4FA16124A; Sat, 9 Apr 2016 20:07:18 -0700 (PDT) X-Virus-Scanned: amavisd-new at zimbra.cs.ucla.edu Received: from zimbra.cs.ucla.edu ([127.0.0.1]) by localhost (zimbra.cs.ucla.edu [127.0.0.1]) (amavisd-new, port 10026) with ESMTP id MxYjYgFhCRWT; Sat, 9 Apr 2016 20:07:18 -0700 (PDT) Received: from [192.168.1.9] (unknown [100.32.155.148]) by zimbra.cs.ucla.edu (Postfix) with ESMTPSA id CB04A161245; Sat, 9 Apr 2016 20:07:18 -0700 (PDT) Subject: Re: bug#23234: unexpected results with charset handling in GNU grep 2.23 To: Norihiro Tanaka References: <5705988D.7030001@redhat.com> <5705B6FC.6040908@cs.ucla.edu> <20160409182901.9FB2.27F6AC2D@kcn.ne.jp> From: Paul Eggert Organization: UCLA Computer Science Department Message-ID: <5709C366.8060307@cs.ucla.edu> Date: Sat, 9 Apr 2016 20:07:18 -0700 User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:38.0) Gecko/20100101 Thunderbird/38.6.0 MIME-Version: 1.0 In-Reply-To: <20160409182901.9FB2.27F6AC2D@kcn.ne.jp> Content-Type: text/plain; charset=utf-8; format=flowed Content-Transfer-Encoding: 7bit X-Spam-Score: -1.0 (-) X-Debbugs-Envelope-To: 23234 Cc: Bjoern Jacke , Eric Blake , 23234@debbugs.gnu.org X-BeenThere: 
debbugs-submit@debbugs.gnu.org X-Mailman-Version: 2.1.18 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: debbugs-submit-bounces@debbugs.gnu.org Sender: "Debbugs-submit" X-Spam-Score: -1.0 (-) Norihiro Tanaka wrote: > Hmm, diff does not output "Binary file matches" between text files, but > grep does it. I wasn't referring to the exact string "Binary file matches", merely to the idea that diff outputs a message to stdout saying that there was a binary file, rather than to stderr. Something like this: $ diff /usr/bin/diff /usr/bin/emacs 2>/dev/null Binary files /usr/bin/diff and /usr/bin/emacs differ > When a user got "Binary file matches" from grep, he cannot distinguish > whether matched a binary file or a line including "Binary file matches" > of a text file from only this result. Although that's a problem, it's not a serious one, as one can easily work around it by using -n or -H. If there were need, I suppose we could add another operand to the --binary-files option to cause it to do something else with a match.
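To make the -n / -H workaround concrete, here is a minimal sketch (the file name note.txt and its contents are invented for illustration). A text file whose line merely looks like grep's binary-file notice is reported with a file-name or line-number prefix, which, per the discussion above, a genuine binary-file notice would not carry:

```shell
tmpdir=$(mktemp -d)
# A text file whose only line happens to read like grep's binary-file notice.
printf 'Binary file grep.bin matches\n' > "$tmpdir/note.txt"

# With -H every real matching line is prefixed with "FILE:".
with_name=$(cd "$tmpdir" && grep -H 'matches' note.txt)
# With -n every real matching line is prefixed with "LINENO:".
with_lineno=$(cd "$tmpdir" && grep -n 'matches' note.txt)

echo "$with_name"
echo "$with_lineno"
rm -rf "$tmpdir"
```

A script that parses grep output can then treat any unprefixed "Binary file ... matches" line as a notice rather than a match. That rule is an assumption about how one might use the prefixes, not something grep enforces.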
From debbugs-submit-bounces@debbugs.gnu.org Sun Apr 10 04:43:24 2016 Received: (at 23234-done) by debbugs.gnu.org; 10 Apr 2016 08:43:24 +0000 Received: from localhost ([127.0.0.1]:56575 helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1apAxl-0001LG-5T for submit@debbugs.gnu.org; Sun, 10 Apr 2016 04:43:24 -0400 Received: from zimbra.cs.ucla.edu ([131.179.128.68]:55195) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1apAxj-0001L3-BK for 23234-done@debbugs.gnu.org; Sun, 10 Apr 2016 04:43:20 -0400 Received: from localhost (localhost [127.0.0.1]) by zimbra.cs.ucla.edu (Postfix) with ESMTP id 33AA2161245; Sun, 10 Apr 2016 01:43:13 -0700 (PDT) Received: from zimbra.cs.ucla.edu ([127.0.0.1]) by localhost (zimbra.cs.ucla.edu [127.0.0.1]) (amavisd-new, port 10032) with ESMTP id QRcg69FhdXCc; Sun, 10 Apr 2016 01:43:11 -0700 (PDT) Received: from localhost (localhost [127.0.0.1]) by zimbra.cs.ucla.edu (Postfix) with ESMTP id 1B16116124A; Sun, 10 Apr 2016 01:43:11 -0700 (PDT) X-Virus-Scanned: amavisd-new at zimbra.cs.ucla.edu Received: from zimbra.cs.ucla.edu ([127.0.0.1]) by localhost (zimbra.cs.ucla.edu [127.0.0.1]) (amavisd-new, port 10026) with ESMTP id I6_jV7vp6mmW; Sun, 10 Apr 2016 01:43:11 -0700 (PDT) Received: from [192.168.1.9] (unknown [100.32.155.148]) by zimbra.cs.ucla.edu (Postfix) with ESMTPSA id E9648161245; Sun, 10 Apr 2016 01:43:10 -0700 (PDT) Subject: Re: bug#23234: unexpected results with charset handling in GNU grep 2.23 To: Eric Blake , =?UTF-8?Q?Bj=c3=b6rn_JACKE?= , 23234-done@debbugs.gnu.org References: <20160406192521.GA14451@SerNet.DE> <570579DA.9020602@redhat.com> <5705B7C1.4040301@cs.ucla.edu> <5708BE92.6010002@cs.ucla.edu> From: Paul Eggert Organization: UCLA Computer Science Department Message-ID: <570A121E.4010802@cs.ucla.edu> Date: Sun, 10 Apr 2016 01:43:10 -0700 User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:38.0) Gecko/20100101 Thunderbird/38.6.0 MIME-Version: 1.0 In-Reply-To: 
<5708BE92.6010002@cs.ucla.edu> Content-Type: multipart/mixed; boundary="------------080206080106020003010701" X-Spam-Score: -1.0 (-) X-Debbugs-Envelope-To: 23234-done X-BeenThere: debbugs-submit@debbugs.gnu.org X-Mailman-Version: 2.1.18 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: debbugs-submit-bounces@debbugs.gnu.org Sender: "Debbugs-submit" X-Spam-Score: -1.0 (-) This is a multi-part message in MIME format. --------------080206080106020003010701 Content-Type: text/plain; charset=utf-8; format=flowed Content-Transfer-Encoding: 7bit Paul Eggert wrote: > I plan to change GNU grep to use this new facility, and to add some grep test > cases for this issue. I did that by installing the attached patches into the grep master. This fixes the bug for me, so I'm closing the bug report. These patches mostly just report the fix and add test cases. The actual fix was in gnulib, here: http://git.savannah.gnu.org/cgit/gnulib.git/commit/?id=b7bc3c1a4e78add4cbad39ae1a0c4fb0747b483f This gnulib fix works around the underlying glibc facility which caused the problem, for which I've filed a bug report here: https://sourceware.org/bugzilla/show_bug.cgi?id=19932 It's not clear when the glibc bug will be fixed. Until it is, one should expect similar problems to crop up in applications other than 'grep'.
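For readers who want to try the behavior without the grep test harness, here is a standalone sketch modeled on the tests/c-locale script in the attached patch. It is an approximation rather than the committed test: the harness helpers from init.sh are replaced by plain cmp and a temporary directory, and LC_ALL=C is set explicitly here as an assumption. For every byte value from 1 to 255 it writes a one-byte line and checks that "grep ." in the C locale echoes the line back unchanged:

```shell
tmpdir=$(mktemp -d)
fail=0
c=1
while [ "$c" -lt 256 ]; do
  # Build an octal escape such as \101 and let tr turn "X" into that byte.
  oct=$(printf '\\%o' "$c")
  echo X | LC_ALL=C tr X "$oct" > "$tmpdir/in"
  # Byte 10 (newline) yields two lines, so it is skipped, as in the patch.
  if [ "$(wc -l < "$tmpdir/in")" -eq 1 ]; then
    # In the C locale every byte should count as a character matched by ".".
    LC_ALL=C grep . "$tmpdir/in" > "$tmpdir/out" || fail=1
    cmp -s "$tmpdir/in" "$tmpdir/out" || fail=1
  fi
  c=$((c + 1))
done
rm -rf "$tmpdir"
echo "fail=$fail"
```

A grep affected by the problem described in this thread would be expected to report a binary-file match for bytes outside 0-127 instead of echoing the line, leaving fail set to 1.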
--------------080206080106020003010701 Content-Type: text/plain; charset=UTF-8; name="0001-build-update-gnulib-submodule-to-latest.txt" Content-Transfer-Encoding: base64 Content-Disposition: attachment; filename="0001-build-update-gnulib-submodule-to-latest.txt" RnJvbSAwZDZhNDViMzdhYjg0ZGQzMzhlM2I1OTU3MTZkNzcwZjFhYzJkMDdjIE1vbiBTZXAg MTcgMDA6MDA6MDAgMjAwMQpGcm9tOiBQYXVsIEVnZ2VydCA8ZWdnZXJ0QGNzLnVjbGEuZWR1 PgpEYXRlOiBTdW4sIDEwIEFwciAyMDE2IDAwOjI1OjI3IC0wNzAwClN1YmplY3Q6IFtQQVRD SCAxLzJdIGJ1aWxkOiB1cGRhdGUgZ251bGliIHN1Ym1vZHVsZSB0byBsYXRlc3QKCi0tLQog Z251bGliIHwgMiArLQogMSBmaWxlIGNoYW5nZWQsIDEgaW5zZXJ0aW9uKCspLCAxIGRlbGV0 aW9uKC0pCgpkaWZmIC0tZ2l0IGEvZ251bGliIGIvZ251bGliCmluZGV4IGNkNmE0NTIuLmI3 YmMzYzEgMTYwMDAwCi0tLSBhL2dudWxpYgorKysgYi9nbnVsaWIKQEAgLTEgKzEgQEAKLVN1 YnByb2plY3QgY29tbWl0IGNkNmE0NTI5MmNkYjdiM2M0YjYyOGYxY2IwZjE5OWEwMjE0MGVh N2MKK1N1YnByb2plY3QgY29tbWl0IGI3YmMzYzFhNGU3OGFkZDRjYmFkMzlhZTFhMGM0ZmIw NzQ3YjQ4M2YKLS0gCjIuNS41Cgo= --------------080206080106020003010701 Content-Type: text/plain; charset=UTF-8; name="0002-grep-in-C-locale-all-bytes-are-valid-characters.txt" Content-Transfer-Encoding: base64 Content-Disposition: attachment; filename="0002-grep-in-C-locale-all-bytes-are-valid-characters.txt" RnJvbSAxN2ZiNjA0YTRjZDIzYjA3Yjk5NTg0NzA2ZjkyZGI4ZDZkZDA1ZTc0IE1vbiBTZXAg MTcgMDA6MDA6MDAgMjAwMQpGcm9tOiBQYXVsIEVnZ2VydCA8ZWdnZXJ0QGNzLnVjbGEuZWR1 PgpEYXRlOiBTdW4sIDEwIEFwciAyMDE2IDAxOjMzOjI1IC0wNzAwClN1YmplY3Q6IFtQQVRD SCAyLzJdIGdyZXA6IGluIEMgbG9jYWxlLCBhbGwgYnl0ZXMgYXJlIHZhbGlkIGNoYXJhY3Rl cnMKTUlNRS1WZXJzaW9uOiAxLjAKQ29udGVudC1UeXBlOiB0ZXh0L3BsYWluOyBjaGFyc2V0 PVVURi04CkNvbnRlbnQtVHJhbnNmZXItRW5jb2Rpbmc6IDhiaXQKClRoaXMgd29ya3MgYXJv dW5kIGdsaWJjIGJ1ZyAxOTkzMjoKaHR0cHM6Ly9zb3VyY2V3YXJlLm9yZy9idWd6aWxsYS9z aG93X2J1Zy5jZ2k/aWQ9MTk5MzIKVGhlIGFjdHVhbCBidWcgZml4IHdhcyB0aGUgdXBkYXRl IHRvIHRoZSBjdXJyZW50IHZlcnNpb24gb2YgR251bGliLgpncmVwIHByb2JsZW0gcmVwb3J0 ZWQgYnkgQmrDtnJuIEphY2tlIGluOiBodHRwOi8vYnVncy5nbnUub3JnLzIzMjM0CiogTkVX 
UzogTWVudGlvbiB0aGlzLgoqIGRvYy9ncmVwLnRleGkgKEZpbGUgYW5kIERpcmVjdG9yeSBT ZWxlY3Rpb24pOiBDcm9zc3JlZiB0byBMQ18qCnNlY3Rpb24uICBTdWdnZXN0IHdoeSAtYSBv ciBMQ19BTEw9QyBtaWdodCBiZSB1c2VmdWwuCihFbnZpcm9ubWVudCBWYXJpYWJsZXMpOiBN ZW50aW9uICdsb2NhbGUgLWEnLgpTYXkgdGhhdCBMQ19DVFlQRSBhbHNvIHNwZWNpZmllcyBl bmNvZGluZywgYW5kIHRoYXQgZXZlcnkKYnl0ZSBpcyBhIHZhbGlkIGNoYXJhY3RlciBpbiB0 aGUgQyBvciBQT1NJWCBsb2NhbGUuCiogdGVzdHMvYy1sb2NhbGU6IE5ldyB0ZXN0LgoqIHRl c3RzL01ha2VmaWxlLmFtIChURVNUUyk6IEFkZCBpdC4KLS0tCiBORVdTICAgICAgICAgICAg ICB8ICA2ICsrKysrKwogZG9jL2dyZXAudGV4aSAgICAgfCAxOSArKysrKysrKysrKysrKy0t LS0tCiB0ZXN0cy9NYWtlZmlsZS5hbSB8ICAxICsKIHRlc3RzL2MtbG9jYWxlICAgIHwgMjYg KysrKysrKysrKysrKysrKysrKysrKysrKysKIDQgZmlsZXMgY2hhbmdlZCwgNDcgaW5zZXJ0 aW9ucygrKSwgNSBkZWxldGlvbnMoLSkKIGNyZWF0ZSBtb2RlIDEwMDc1NSB0ZXN0cy9jLWxv Y2FsZQoKZGlmZiAtLWdpdCBhL05FV1MgYi9ORVdTCmluZGV4IDY5ZTRhMjMuLjYzNzY3YWEg MTAwNjQ0Ci0tLSBhL05FV1MKKysrIGIvTkVXUwpAQCAtNCw2ICs0LDEyIEBAIEdOVSBncmVw IE5FV1MgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAtKi0gb3V0bGluZSAt Ki0KIAogKiogQnVnIGZpeGVzCiAKKyAgSW4gdGhlIEMgb3IgUE9TSVggbG9jYWxlLCBncmVw IG5vdyB0cmVhdHMgYWxsIGJ5dGVzIGFzIHZhbGlkCisgIGNoYXJhY3RlcnMgZXZlbiBpZiB0 aGUgQyBydW50aW1lIGxpYnJhcnkgc2F5cyBvdGhlcndpc2UuICBUaGUKKyAgcmV2aXNlZCBi ZWhhdmlvciBpcyBtb3JlIGNvbXBhdGlibGUgd2l0aCB0aGUgb3JpZ2luYWwgaW50ZW50IG9m CisgIFBPU0lYLCBhbmQgdGhlIG5leHQgcmVsZWFzZSBvZiBQT1NJWCB3aWxsIGxpa2VseSBt YWtlIHRoaXMgb2ZmaWNpYWwuCisgIFtidWcgaW50cm9kdWNlZCBpbiBncmVwLTIuMjNdCisK ICAgZ3JlcCAtUHogbm8gbG9uZ2VyIG1pc3Rha2VubHkgZGlhZ25vc2VzIHBhdHRlcm5zIGxp a2UgW15hXSB0aGF0IHVzZQogICBuZWdhdGVkIGNoYXJhY3RlciBjbGFzc2VzLiBbYnVnIGlu dHJvZHVjZWQgaW4gZ3JlcC0yLjI0XQogCmRpZmYgLS1naXQgYS9kb2MvZ3JlcC50ZXhpIGIv ZG9jL2dyZXAudGV4aQppbmRleCAxZDNkNWNiLi40ZTBlNDhlIDEwMDY0NAotLS0gYS9kb2Mv Z3JlcC50ZXhpCisrKyBiL2RvYy9ncmVwLnRleGkKQEAgLTU5OSw3ICs1OTksOCBAQCBJZiBh IGZpbGUncyBkYXRhIG9yIG1ldGFkYXRhCiBpbmRpY2F0ZSB0aGF0IHRoZSBmaWxlIGNvbnRh aW5zIGJpbmFyeSBkYXRhLAogYXNzdW1lIHRoYXQgdGhlIGZpbGUgaXMgb2YgdHlwZSBAdmFy 
e3R5cGV9LgogTm9uLXRleHQgYnl0ZXMgaW5kaWNhdGUgYmluYXJ5IGRhdGE7IHRoZXNlIGFy ZSBlaXRoZXIgb3V0cHV0IGJ5dGVzIHRoYXQgYXJlCi1pbXByb3Blcmx5IGVuY29kZWQgZm9y IHRoZSBjdXJyZW50IGxvY2FsZSwgb3IgbnVsbCBpbnB1dCBieXRlcyB3aGVuIHRoZQoraW1w cm9wZXJseSBlbmNvZGVkIGZvciB0aGUgY3VycmVudCBsb2NhbGUgKEBweHJlZntFbnZpcm9u bWVudAorVmFyaWFibGVzfSksIG9yIG51bGwgaW5wdXQgYnl0ZXMgd2hlbiB0aGUKIEBvcHRp b257LXp9IChAb3B0aW9uey0tbnVsbC1kYXRhfSkgb3B0aW9uIGlzIG5vdCBnaXZlbiAoQHB4 cmVme090aGVyCiBPcHRpb25zfSkuCiAKQEAgLTYyNywxMCArNjI4LDEzIEBAIGlzIG5vdCBt YXRjaGVkIHdoZW4gQHZhcnt0eXBlfSBpcyBAc2FtcHt0ZXh0fS4gIENvbnZlcnNlbHksIHdo ZW4KIEB2YXJ7dHlwZX0gaXMgQHNhbXB7YmluYXJ5fSB0aGUgcGF0dGVybiBAc2FtcHsufSAo cGVyaW9kKSBtaWdodCBub3QKIG1hdGNoIGEgbnVsbCBieXRlLgogCi1AZW1waHtXYXJuaW5n On0gQHNhbXB7LS1iaW5hcnktZmlsZXM9dGV4dH0gbWlnaHQgb3V0cHV0IGJpbmFyeSBnYXJi YWdlLAotd2hpY2ggY2FuIGhhdmUgbmFzdHkgc2lkZSBlZmZlY3RzCi1pZiB0aGUgb3V0cHV0 IGlzIGEgdGVybWluYWwgYW5kCi1pZiB0aGUgdGVybWluYWwgZHJpdmVyIGludGVycHJldHMg c29tZSBvZiBpdCBhcyBjb21tYW5kcy4KK0BlbXBoe1dhcm5pbmc6fSBUaGUgQG9wdGlvbnst YX0gKEBvcHRpb257LS1iaW5hcnktZmlsZXM9dGV4dH0pIG9wdGlvbgorbWlnaHQgb3V0cHV0 IGJpbmFyeSBnYXJiYWdlLCB3aGljaCBjYW4gaGF2ZSBuYXN0eSBzaWRlIGVmZmVjdHMgaWYg dGhlCitvdXRwdXQgaXMgYSB0ZXJtaW5hbCBhbmQgaWYgdGhlIHRlcm1pbmFsIGRyaXZlciBp bnRlcnByZXRzIHNvbWUgb2YgaXQKK2FzIGNvbW1hbmRzLiAgT24gdGhlIG90aGVyIGhhbmQs IHdoZW4gcmVhZGluZyBmaWxlcyB3aG9zZSB0ZXh0CitlbmNvZGluZ3MgYXJlIHVua25vd24s IGl0IGNhbiBiZSBoZWxwZnVsIHRvIHVzZSBAb3B0aW9uey1hfSBvciB0byBzZXQKK0BzYW1w e0xDX0FMTD0nQyd9IGluIHRoZSBlbnZpcm9ubWVudCwgaW4gb3JkZXIgdG8gZmluZCBtb3Jl IG1hdGNoZXMKK2V2ZW4gaWYgdGhlIG1hdGNoZXMgYXJlIHVuc2FmZSBmb3IgZGlyZWN0IGRp c3BsYXkuCiAKIEBpdGVtIC1EIEB2YXJ7YWN0aW9ufQogQGl0ZW14IC0tZGV2aWNlcz1AdmFy e2FjdGlvbn0KQEAgLTgwMyw2ICs4MDcsNyBAQCBUaGUgQHNhbXB7Q30gbG9jYWxlIGlzIHVz ZWQgaWYgbm9uZSBvZiB0aGVzZSBlbnZpcm9ubWVudCB2YXJpYWJsZXMgYXJlIHNldCwKIGlm IHRoZSBsb2NhbGUgY2F0YWxvZyBpcyBub3QgaW5zdGFsbGVkLAogb3IgaWYgQGNvbW1hbmR7 Z3JlcH0gd2FzIG5vdCBjb21waWxlZAogd2l0aCBuYXRpb25hbCBsYW5ndWFnZSBzdXBwb3J0 
IChOTFMpLgorVGhlIHNoZWxsIGNvbW1hbmQgQGNvZGV7bG9jYWxlIC1hfSBsaXN0cyBsb2Nh bGVzIHRoYXQgYXJlIGN1cnJlbnRseSBhdmFpbGFibGUuCiAKIE1hbnkgb2YgdGhlIGVudmly b25tZW50IHZhcmlhYmxlcyBpbiB0aGUgZm9sbG93aW5nIGxpc3QgbGV0IHlvdQogY29udHJv bCBoaWdobGlnaHRpbmcgdXNpbmcKQEAgLTEwMDQsNiArMTAwOSwxMCBAQCBpbnRlcnByZXRl ZC4KIFRoZXNlIHZhcmlhYmxlcyBzcGVjaWZ5IHRoZSBsb2NhbGUgZm9yIHRoZSBAZW52e0xD X0NUWVBFfSBjYXRlZ29yeSwKIHdoaWNoIGRldGVybWluZXMgdGhlIHR5cGUgb2YgY2hhcmFj dGVycywKIGUuZy4sIHdoaWNoIGNoYXJhY3RlcnMgYXJlIHdoaXRlc3BhY2UuCitUaGlzIGNh dGVnb3J5IGFsc28gZGV0ZXJtaW5lcyB0aGUgY2hhcmFjdGVyIGVuY29kaW5nLCB0aGF0IGlz LCB3aGV0aGVyCit0ZXh0IGlzIGVuY29kZWQgaW4gVVRGLTgsIEFTQ0lJLCBvciBzb21lIG90 aGVyIGVuY29kaW5nLiAgSW4gdGhlCitAc2FtcHtDfSBvciBAc2FtcHtQT1NJWH0gbG9jYWxl LCBhbGwgY2hhcmFjdGVycyBhcmUgZW5jb2RlZCBhcyBhCitzaW5nbGUgYnl0ZSBhbmQgZXZl cnkgYnl0ZSBpcyBhIHZhbGlkIGNoYXJhY3Rlci4KIAogQGl0ZW0gTEFOR1VBR0UKIEBpdGVt eCBMQ19BTEwKZGlmZiAtLWdpdCBhL3Rlc3RzL01ha2VmaWxlLmFtIGIvdGVzdHMvTWFrZWZp bGUuYW0KaW5kZXggYjY1ZmMzOS4uNDU5MDhjZSAxMDA2NDQKLS0tIGEvdGVzdHMvTWFrZWZp bGUuYW0KKysrIGIvdGVzdHMvTWFrZWZpbGUuYW0KQEAgLTUzLDYgKzUzLDcgQEAgVEVTVFMg PQkJCQkJCVwKICAgYmlnLW1hdGNoCQkJCQlcCiAgIGJvZ3VzLXdjdG9iCQkJCQlcCiAgIGJy ZQkJCQkJCVwKKyAgYy1sb2NhbGUJCQkJCVwKICAgY2FzZS1mb2xkLWJhY2tyZWYJCQkJXAog ICBjYXNlLWZvbGQtYmFja3NsYXNoLXcJCQkJXAogICBjYXNlLWZvbGQtY2hhci1jbGFzcwkJ CQlcCmRpZmYgLS1naXQgYS90ZXN0cy9jLWxvY2FsZSBiL3Rlc3RzL2MtbG9jYWxlCm5ldyBm aWxlIG1vZGUgMTAwNzU1CmluZGV4IDAwMDAwMDAuLjFmZTVjNzAKLS0tIC9kZXYvbnVsbAor KysgYi90ZXN0cy9jLWxvY2FsZQpAQCAtMCwwICsxLDI2IEBACisjISAvYmluL3NoCisjIFJl Z3Jlc3Npb24gdGVzdCBmb3IgR05VIGdyZXAuCisjCisjIENvcHlyaWdodCAyMDE2IEZyZWUg U29mdHdhcmUgRm91bmRhdGlvbiwgSW5jLgorIworIyBDb3B5aW5nIGFuZCBkaXN0cmlidXRp b24gb2YgdGhpcyBmaWxlLCB3aXRoIG9yIHdpdGhvdXQgbW9kaWZpY2F0aW9uLAorIyBhcmUg cGVybWl0dGVkIGluIGFueSBtZWRpdW0gd2l0aG91dCByb3lhbHR5IHByb3ZpZGVkIHRoZSBj b3B5cmlnaHQKKyMgbm90aWNlIGFuZCB0aGlzIG5vdGljZSBhcmUgcHJlc2VydmVkLgorCisu ICIke3NyY2Rpcj0ufS9pbml0LnNoIjsgcGF0aF9wcmVwZW5kXyAuLi9zcmMKKworZmFpbD0w 
CisKK2M9MQord2hpbGUgdGVzdCAkYyAtbHQgMjU2OyBkbworICB0cjI9JChwcmludGYgJ1xc JW9cbicgJGMpCisgIGVjaG8gWCB8IHRyIFggIiR0cjIiID5pbgorICBpZiB0ZXN0ICQod2Mg LWwgPGluKSAtZXEgMTsgdGhlbgorICAgIGdyZXAgLiBpbiA+b3V0IHx8IGZhaWw9MQorICAg IGNvbXBhcmUgaW4gb3V0IHx8IGZhaWw9MQorICBmaQorICB0ZXN0ICRmYWlsIC1uZSAwICYm IEV4aXQgJGZhaWwKKyAgYz0kKGV4cHIgJGMgKyAxKQorZG9uZQorCitFeGl0ICRmYWlsCi0t IAoyLjUuNQoK --------------080206080106020003010701-- From debbugs-submit-bounces@debbugs.gnu.org Sun Apr 10 17:10:37 2016 Received: (at 23234) by debbugs.gnu.org; 10 Apr 2016 21:10:38 +0000 Received: from localhost ([127.0.0.1]:57582 helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1apMcv-0003ww-Fn for submit@debbugs.gnu.org; Sun, 10 Apr 2016 17:10:37 -0400 Received: from mail-oi0-f67.google.com ([209.85.218.67]:35099) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1apMcs-0003wd-86; Sun, 10 Apr 2016 17:10:35 -0400 Received: by mail-oi0-f67.google.com with SMTP id w18so22038587oie.2; Sun, 10 Apr 2016 14:10:34 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113; h=mime-version:sender:in-reply-to:references:from:date:message-id :subject:to:cc; bh=FSNREibGp/Wu7n7INQf80df5jSAVyq+wtAp/DC+ERCE=; b=sKt8OLAOsKCnvBAEsDCZ3UsIJzKlNuSHbKNvnb7VDTHTRPv2AB9tsFHqJJOg94IWvH NyLDDT0vzOY2ZPPR5YyiWu4fG5G0C11hX0fohbgU6AuWuIRlwp1O8YzPZm29fB9/Nv0A fDwNq3faVFSM5mlfR3BvH5SzOUbh0EmaaCXjBa2ElN7ytcAXtXdBKBgTOlDKzE7XKtHf /ucO0xCR+3WbER5+gwQ+V8tazqt/xaiC/4+5JocFFdhA/RQEAkBDkywaZn0YSb3RXqmp cySyf8VzP8uzdLS5iF00/0VsPvZkLHDNUS398jdwlbWg0PylUYjqdOjvmmdkGvomxwhH XXDA== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20130820; h=x-gm-message-state:mime-version:sender:in-reply-to:references:from :date:message-id:subject:to:cc; bh=FSNREibGp/Wu7n7INQf80df5jSAVyq+wtAp/DC+ERCE=; b=XPrVgJwaerB/N92G7fJwBRG8PGjnRjKd0YPy0yJCb5BbjkjfwneT+lVmy8oy17/yxg a3QD2MqRhCIUcfQVioMNowgnU0Jtf2Eu09AJZw+bPP2tDrFh4o04Kl72h7LLPR8icROT 
fxl6tyLkLxqtJcer4Cdxa41W9peVhuB+orvenREechG3ZgtLMs3fXkYNJ1+q5mdhDESH QLcff6h/qCV3wgDH5GHIAiSxL7mF68CzavCuiuipgdy3ZDKblM1CLVNbGTidZ+hIqw7M 7ZEAwBJMDo7EbPn+DvGXqosB1gFS0T7zI4jtnDnVj/HHA/mlurb5xDOTkEh9HWSuxB2p wsjg== X-Gm-Message-State: AD7BkJJ3/SaA+c0QTBT3a1XfPIyEOQH1Ycfoilv7AdWVJYNMESyxwBftINl/gMS7tKFqmTXUoRZB76mWlnYNUw== X-Received: by 10.202.218.133 with SMTP id r127mr8409797oig.36.1460322628727; Sun, 10 Apr 2016 14:10:28 -0700 (PDT) MIME-Version: 1.0 Received: by 10.202.213.141 with HTTP; Sun, 10 Apr 2016 14:10:08 -0700 (PDT) In-Reply-To: <570A121E.4010802@cs.ucla.edu> References: <20160406192521.GA14451@SerNet.DE> <570579DA.9020602@redhat.com> <5705B7C1.4040301@cs.ucla.edu> <5708BE92.6010002@cs.ucla.edu> <570A121E.4010802@cs.ucla.edu> From: Jim Meyering Date: Sun, 10 Apr 2016 14:10:08 -0700 X-Google-Sender-Auth: 223rzzpv1gvSNRZsfSMKyW55ftk Message-ID: Subject: Re: bug#23234: unexpected results with charset handling in GNU grep 2.23 To: 23234@debbugs.gnu.org, Paul Eggert , Bjoern Jacke Content-Type: multipart/mixed; boundary=001a113d44e68b1fd6053027dbdf X-Spam-Score: -0.5 (/) X-Debbugs-Envelope-To: 23234 Cc: 23234-done@debbugs.gnu.org, Eric Blake X-BeenThere: debbugs-submit@debbugs.gnu.org X-Mailman-Version: 2.1.18 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: debbugs-submit-bounces@debbugs.gnu.org Sender: "Debbugs-submit" X-Spam-Score: -0.5 (/) --001a113d44e68b1fd6053027dbdf Content-Type: text/plain; charset=UTF-8 On Sun, Apr 10, 2016 at 1:43 AM, Paul Eggert wrote: > Paul Eggert wrote: >> >> I plan to change GNU grep to use this new facility, and to add some grep >> test >> cases for this issue. > > > I did that by installing the attached patches into the grep master. This > fixes the bug for me, so I'm closing the bug report. > > These patches mostly just report the fix and add test cases. 
The actual fix > was in gnulib, here: > > http://git.savannah.gnu.org/cgit/gnulib.git/commit/?id=b7bc3c1a4e78add4cbad39ae1a0c4fb0747b483f > > This gnulib fix works around the underyling glibc facility which caused the > problem, for which I've filed a bug report here: > > https://sourceware.org/bugzilla/show_bug.cgi?id=19932 > > It's not clear when the glibc bug will be fixed. Until it is, one should > expect similar problems to crop up in applications other than 'grep'. Thanks for the fine work, Paul. With this fix, I would like to make yet another grep release. Does anyone have any pending changes we should consider? Incidentally, looking at mbrtowc uses, I found an unused function and removed it with this patch: --001a113d44e68b1fd6053027dbdf Content-Type: text/x-patch; charset=US-ASCII; name="0001-maint-remove-unused-mbtoupper-function.patch" Content-Disposition: attachment; filename="0001-maint-remove-unused-mbtoupper-function.patch" Content-Transfer-Encoding: base64 X-Attachment-Id: f_imv2c17l2 RnJvbSA3MDRkZTg3MjVmYTlkZjgwYjBjYjc0MzA1MjczYWNmNWRkZTBiMWQ3IE1vbiBTZXAgMTcg MDA6MDA6MDAgMjAwMQpGcm9tOiBKaW0gTWV5ZXJpbmcgPG1leWVyaW5nQGZiLmNvbT4KRGF0ZTog U3VuLCAxMCBBcHIgMjAxNiAxMjozNjoxOCAtMDcwMApTdWJqZWN0OiBbUEFUQ0hdIG1haW50OiBy ZW1vdmUgdW51c2VkIG1idG91cHBlciBmdW5jdGlvbgoKKiBzcmMvc2VhcmNodXRpbHMuYyAobWJ0 b3VwcGVyKTogUmVtb3ZlIG5vdy11bnVzZWQgZnVuY3Rpb24uCkFsc28gcmVtb3ZlIGluY2x1c2lv biBvZiA8YXNzZXJ0Lmg+LCBzaW5jZSB0aGlzIGNoYW5nZSByZW1vdmVkCnRoZSBmaW5hbCB1c2Ug b2YgYXNzZXJ0LgoqIHNyYy9zZWFyY2guaCAobWJ0b3VwcGVyKTogUmVtb3ZlIGRlY2xhcmF0aW9u LgotLS0KIHNyYy9zZWFyY2guaCAgICAgIHwgICAxIC0KIHNyYy9zZWFyY2h1dGlscy5jIHwgMTY1 IC0tLS0tLS0tLS0tLS0tLS0tLS0tLS0tLS0tLS0tLS0tLS0tLS0tLS0tLS0tLS0tLS0tLS0tLQog MiBmaWxlcyBjaGFuZ2VkLCAxNjYgZGVsZXRpb25zKC0pCgpkaWZmIC0tZ2l0IGEvc3JjL3NlYXJj aC5oIGIvc3JjL3NlYXJjaC5oCmluZGV4IGE2OWJmMTkuLjdkYzE5NDAgMTAwNjQ0Ci0tLSBhL3Ny Yy9zZWFyY2guaAorKysgYi9zcmMvc2VhcmNoLmgKQEAgLTQ4LDcgKzQ4LDYgQEAgdHlwZWRlZiBz 
aWduZWQgY2hhciBtYl9sZW5fbWFwX3Q7CiAvKiBzZWFyY2h1dGlscy5jICovCiBleHRlcm4gdm9p ZCBrd3Npbml0IChrd3NldF90ICopOwoKLWV4dGVybiBjaGFyICptYnRvdXBwZXIgKGNoYXIgY29u c3QgKiwgc2l6ZV90ICosIG1iX2xlbl9tYXBfdCAqKik7CiBleHRlcm4gdm9pZCBidWlsZF9tYmNs ZW5fY2FjaGUgKHZvaWQpOwogZXh0ZXJuIHNpemVfdCBtYmNsZW5fY2FjaGVbXTsKIGV4dGVybiBw dHJkaWZmX3QgbWJfZ29iYWNrIChjaGFyIGNvbnN0ICoqLCBjaGFyIGNvbnN0ICosIGNoYXIgY29u c3QgKik7CmRpZmYgLS1naXQgYS9zcmMvc2VhcmNodXRpbHMuYyBiL3NyYy9zZWFyY2h1dGlscy5j CmluZGV4IDJlYWIzZGMuLjFmMjFhMGUgMTAwNjQ0Ci0tLSBhL3NyYy9zZWFyY2h1dGlscy5jCisr KyBiL3NyYy9zZWFyY2h1dGlscy5jCkBAIC0yMiw4ICsyMiw2IEBACiAjZGVmaW5lIFNZU1RFTV9J TkxJTkUgX0dMX0VYVEVSTl9JTkxJTkUKICNpbmNsdWRlICJzZWFyY2guaCIKCi0jaW5jbHVkZSA8 YXNzZXJ0Lmg+Ci0KICNkZWZpbmUgTkNIQVIgKFVDSEFSX01BWCArIDEpCgogc2l6ZV90IG1iY2xl bl9jYWNoZVtOQ0hBUl07CkBAIC00OCwxNjkgKzQ2LDYgQEAga3dzaW5pdCAoa3dzZXRfdCAqa3dz ZXQpCiAgICAgeGFsbG9jX2RpZSAoKTsKIH0KCi0vKiBDb252ZXJ0IEJFRywgYW4gKk4tYnl0ZSBz dHJpbmcsIHRvIHVwcGVyY2FzZSwgYW5kIHdyaXRlIHRoZQotICAgTlVMLXRlcm1pbmF0ZWQgcmVz dWx0IGludG8gbWFsbG9jJ2Qgc3RvcmFnZS4gIFVwb24gc3VjY2Vzcywgc2V0ICpOCi0gICB0byB0 aGUgbGVuZ3RoIChpbiBieXRlcykgb2YgdGhlIHJlc3VsdGluZyBzdHJpbmcgKG5vdCBpbmNsdWRp bmcgdGhlCi0gICB0cmFpbGluZyBOVUwgYnl0ZSksIGFuZCByZXR1cm4gYSBwb2ludGVyIHRvIHRo ZSB1cHBlcmNhc2Ugc3RyaW5nLgotICAgVXBvbiBtZW1vcnkgYWxsb2NhdGlvbiBmYWlsdXJlLCBl eGl0LiAgKk4gbXVzdCBiZSBwb3NpdGl2ZS4KLQotICAgQWx0aG91Z2ggdGhpcyBmdW5jdGlvbiBy ZXR1cm5zIGEgcG9pbnRlciB0byBtYWxsb2MnZCBzdG9yYWdlLAotICAgdGhlIGNhbGxlciBtdXN0 IG5vdCBmcmVlIGl0LCBzaW5jZSB0aGlzIGZ1bmN0aW9uIHJldGFpbnMgYSBwb2ludGVyCi0gICB0 byB0aGUgYnVmZmVyIGFuZCByZXVzZXMgaXQgb24gYW55IHN1YnNlcXVlbnQgY2FsbC4gIEFzIGEg Y29uc2VxdWVuY2UsCi0gICB0aGlzIGZ1bmN0aW9uIGlzIG5vdCB0aHJlYWQtc2FmZS4KLQotICAg V2hlbiBlYWNoIGNoYXJhY3RlciBpbiB0aGUgdXBwZXJjYXNlIHJlc3VsdCBzdHJpbmcgaGFzIHRo ZSBzYW1lIGxlbmd0aAotICAgYXMgdGhlIGNvcnJlc3BvbmRpbmcgY2hhcmFjdGVyIGluIHRoZSBp bnB1dCBzdHJpbmcsIHNldCAqTEVOX01BUF9QCi0gICB0byBOVUxMLiAgT3RoZXJ3aXNlLCBzZXQg 
aXQgdG8gYSBtYWxsb2MnZCBidWZmZXIgKGxpa2UgdGhlIHJldHVybmVkCi0gICBidWZmZXIsIHRo aXMgbXVzdCBub3QgYmUgZnJlZWQgYnkgY2FsbGVyKSBvZiB0aGUgc2FtZSBsZW5ndGggYXMgdGhl Ci0gICByZXN1bHQgc3RyaW5nLiAgKCpMRU5fTUFQX1ApW0pdIGlzIHRoZSBjaGFuZ2UgaW4gYnl0 ZS1sZW5ndGggb2YgdGhlCi0gICBjaGFyYWN0ZXIgaW4gQkVHIHRoYXQgZm9ybWVkIGJ5dGUgSiBv ZiB0aGUgcmVzdWx0IGFzIGl0IHdhcyBjb252ZXJ0ZWQgdG8KLSAgIHVwcGVyY2FzZS4gIEl0IGlz IHVzdWFsbHkgemVyby4gIEZvciBsb3dlcmNhc2UgVHVya2lzaCBkb3RsZXNzIEkgaXQKLSAgIGlz IC0xLCBzaW5jZSB0aGUgbG93ZXJjYXNlIGlucHV0IG9jY3VwaWVzIHR3byBieXRlcywgd2hpbGUg dGhlCi0gICB1cHBlcmNhc2Ugb3V0cHV0IG9jY3VwaWVzIG9ubHkgb25lIGJ5dGUuICBGb3IgbG93 ZXJjYXNlIEkgaW4gdGhlCi0gICB0cl9UUi51dGY4IGxvY2FsZSwgaXQgaXMgMSBiZWNhdXNlIHRo ZSB1cHBlcmNhc2UgVHVya2lzaCBkb3R0ZWQgSQotICAgaXMgb25lIGJ5dGUgbG9uZ2VyIHRoYW4g dGhlIG9yaWdpbmFsLiAgV2hlbiB0aGF0IGhhcHBlbnMsIHdlIGhhdmUgdHdvCi0gICBvciBtb3Jl IHNsb3RzIGluICpMRU5fTUFQX1AgZm9yIGVhY2ggc3VjaCBjaGFyYWN0ZXIuICBXZSBzdG9yZSB0 aGUKLSAgIGRpZmZlcmVuY2UgaW4gdGhlIGZpcnN0IG9uZSBhbmQgMCdzIGluIGFueSByZW1haW5p bmcgc2xvdHMuCi0KLSAgIFRoaXMgbWFwIGlzIHVzZWQgYnkgdGhlIGNhbGxlciB0byBjb252ZXJ0 IG9mZnNldCxsZW5ndGggcGFpcnMgdGhhdAotICAgcmVmZXJlbmNlIHRoZSB1cHBlcmNhc2UgcmVz dWx0IHRvIG51bWJlcnMgdGhhdCByZWZlciB0byB0aGUgbWF0Y2hlZAotICAgcGFydCBvZiB0aGUg b3JpZ2luYWwgYnVmZmVyLiAgKi8KLQotY2hhciAqCi1tYnRvdXBwZXIgKGNvbnN0IGNoYXIgKmJl Zywgc2l6ZV90ICpuLCBtYl9sZW5fbWFwX3QgKipsZW5fbWFwX3ApCi17Ci0gIHN0YXRpYyBjaGFy ICpvdXQ7Ci0gIHN0YXRpYyBtYl9sZW5fbWFwX3QgKmxlbl9tYXA7Ci0gIHN0YXRpYyBzaXplX3Qg b3V0YWxsb2M7Ci0gIHNpemVfdCBvdXRsZW4sIG1iX2N1cl9tYXg7Ci0gIG1ic3RhdGVfdCBpcywg b3M7Ci0gIGNvbnN0IGNoYXIgKmVuZDsKLSAgY2hhciAqcDsKLSAgbWJfbGVuX21hcF90ICptOwot ICBib29sIGxlbmd0aHNfZGlmZmVyID0gZmFsc2U7Ci0KLSAgaWYgKCpuID4gb3V0YWxsb2MgfHwg b3V0YWxsb2MgPT0gMCkKLSAgICB7Ci0gICAgICBvdXRhbGxvYyA9IE1BWCAoMSwgKm4pOwotICAg ICAgb3V0ID0geHJlYWxsb2MgKG91dCwgb3V0YWxsb2MpOwotICAgICAgbGVuX21hcCA9IHhyZWFs bG9jIChsZW5fbWFwLCBvdXRhbGxvYyk7Ci0gICAgfQotCi0gIC8qIGFwcGVhc2UgY2xhbmctMi42 
ICovCi0gIGFzc2VydCAob3V0KTsKLSAgYXNzZXJ0IChsZW5fbWFwKTsKLSAgaWYgKCpuID09IDAp Ci0gICAgcmV0dXJuIG91dDsKLQotICBtZW1zZXQgKCZpcywgMCwgc2l6ZW9mIChpcykpOwotICBt ZW1zZXQgKCZvcywgMCwgc2l6ZW9mIChvcykpOwotICBlbmQgPSBiZWcgKyAqbjsKLQotICBtYl9j dXJfbWF4ID0gTUJfQ1VSX01BWDsKLSAgcCA9IG91dDsKLSAgbSA9IGxlbl9tYXA7Ci0gIG91dGxl biA9IDA7Ci0gIHdoaWxlIChiZWcgPCBlbmQpCi0gICAgewotICAgICAgd2NoYXJfdCB3YzsKLSAg ICAgIHNpemVfdCBtYmNsZW4gPSBtYnJ0b3djICgmd2MsIGJlZywgZW5kIC0gYmVnLCAmaXMpOwot I2lmZGVmIF9fQ1lHV0lOX18KLSAgICAgIC8qIEhhbmRsZSBhIFVURi04IHNlcXVlbmNlIGZvciBh IGNoYXJhY3RlciBiZXlvbmQgdGhlIGJhc2UgcGxhbmUuCi0gICAgICAgICBDeWd3aW4ncyB3Y2hh cl90IGlzIFVURi0xNiwgYXMgaW4gdGhlIHVuZGVybHlpbmcgT1MuICBUaGlzCi0gICAgICAgICBy ZXN1bHRzIGluIHN1cnJvZ2F0ZSBwYWlycyB3aGljaCBuZWVkIHNvbWUgZXh0cmEgYXR0ZW50aW9u LiAgKi8KLSAgICAgIHdpbnRfdCB3Y2kgPSAwOwotICAgICAgaWYgKG1iY2xlbiA9PSAzICYmICh3 YyAmIDB4ZGMwMCkgPT0gMHhkODAwKQotICAgICAgICB7Ci0gICAgICAgICAgLyogV2UgZ290IHRo ZSBzdGFydCBvZiBhIDQgYnl0ZSBVVEYtOCBzZXF1ZW5jZS4gIFRoaXMgaXMgcmV0dXJuZWQKLSAg ICAgICAgICAgICBhcyBhIFVURi0xNiBzdXJyb2dhdGUgcGFpci4gIFRoZSBmaXJzdCBjYWxsIHRv IG1icnRvd2MgcmV0dXJuZWQgMwotICAgICAgICAgICAgIGFuZCB3YyBoYXMgYmVlbiBzZXQgdG8g YSBoaWdoIHN1cnJvZ2F0ZSB2YWx1ZSwgbm93IHdlJ3JlIGdvaW5nCi0gICAgICAgICAgICAgdG8g ZmV0Y2ggdGhlIG1hdGNoaW5nIGxvdyBzdXJyb2dhdGUuICBUaGlzIHNlY29uZCBjYWxsIHRvIG1i cnRvd2MKLSAgICAgICAgICAgICBpcyBzdXBwb3NlZCB0byByZXR1cm4gMSB0byBjb21wbGV0ZSB0 aGUgNCBieXRlIFVURi04IHNlcXVlbmNlLiAgKi8KLSAgICAgICAgICB3Y2hhcl90IHdjXzI7Ci0g ICAgICAgICAgc2l6ZV90IG1iY2xlbl8yID0gbWJydG93YyAoJndjXzIsIGJlZyArIG1iY2xlbiwg ZW5kIC0gYmVnIC0gbWJjbGVuLAotICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAg ICZpcyk7Ci0gICAgICAgICAgaWYgKG1iY2xlbl8yID09IDEgJiYgKHdjXzIgJiAweGRjMDApID09 IDB4ZGMwMCkKLSAgICAgICAgICAgIHsKLSAgICAgICAgICAgICAgLyogTWF0Y2guICBDb252ZXJ0 IHRoaXMgdG8gYSA0IGJ5dGUgd2ludF90IHdoaWNoIGNvbnN0aXR1dGVzCi0gICAgICAgICAgICAg ICAgIGEgMzItYml0IFVURi0zMiB2YWx1ZS4gICovCi0gICAgICAgICAgICAgIHdjaSA9ICggKCgo 
d2ludF90KSAod2MgLSAweGQ4MDApKSA8PCAxMCkKLSAgICAgICAgICAgICAgICAgICAgIHwgKCh3 aW50X3QpICh3Y18yIC0gMHhkYzAwKSkpCi0gICAgICAgICAgICAgICAgICAgICsgMHgxMDAwMDsK LSAgICAgICAgICAgICAgKyttYmNsZW47Ci0gICAgICAgICAgICB9Ci0gICAgICAgICAgZWxzZQot ICAgICAgICAgICAgewotICAgICAgICAgICAgICAvKiBJbnZhbGlkIFVURi04IHNlcXVlbmNlLiAg Ki8KLSAgICAgICAgICAgICAgbWJjbGVuID0gKHNpemVfdCkgLTE7Ci0gICAgICAgICAgICB9Ci0g ICAgICAgIH0KLSNlbmRpZgotICAgICAgaWYgKG91dGxlbiArIG1iX2N1cl9tYXggPj0gb3V0YWxs b2MpCi0gICAgICAgIHsKLSAgICAgICAgICBzaXplX3QgZG0gPSBtIC0gbGVuX21hcDsKLSAgICAg ICAgICBvdXQgPSB4Mm5yZWFsbG9jIChvdXQsICZvdXRhbGxvYywgMSk7Ci0gICAgICAgICAgbGVu X21hcCA9IHhyZWFsbG9jIChsZW5fbWFwLCBvdXRhbGxvYyk7Ci0gICAgICAgICAgcCA9IG91dCAr IG91dGxlbjsKLSAgICAgICAgICBtID0gbGVuX21hcCArIGRtOwotICAgICAgICB9Ci0KLSAgICAg IGlmIChtYmNsZW4gPT0gKHNpemVfdCkgLTEgfHwgbWJjbGVuID09IChzaXplX3QpIC0yIHx8IG1i Y2xlbiA9PSAwKQotICAgICAgICB7Ci0gICAgICAgICAgLyogQW4gaW52YWxpZCBzZXF1ZW5jZSwg b3IgYSB0cnVuY2F0ZWQgbXVsdGktb2N0ZXQgY2hhcmFjdGVyLgotICAgICAgICAgICAgIFdlIHRy ZWF0IGl0IGFzIGEgc2luZ2xlLW9jdGV0IGNoYXJhY3Rlci4gICovCi0gICAgICAgICAgKm0rKyA9 IDA7Ci0gICAgICAgICAgKnArKyA9ICpiZWcrKzsKLSAgICAgICAgICBvdXRsZW4rKzsKLSAgICAg ICAgICBtZW1zZXQgKCZpcywgMCwgc2l6ZW9mIChpcykpOwotICAgICAgICAgIG1lbXNldCAoJm9z LCAwLCBzaXplb2YgKG9zKSk7Ci0gICAgICAgIH0KLSAgICAgIGVsc2UKLSAgICAgICAgewotICAg ICAgICAgIHNpemVfdCBvbWJjbGVuOwotICAgICAgICAgIGJlZyArPSBtYmNsZW47Ci0jaWZkZWYg X19DWUdXSU5fXwotICAgICAgICAgIC8qIEhhbmRsZSBVbmljb2RlIGNoYXJhY3RlcnMgYmV5b25k IHRoZSBiYXNlIHBsYW5lLiAgKi8KLSAgICAgICAgICBpZiAobWJjbGVuID09IDQpCi0gICAgICAg ICAgICB7Ci0gICAgICAgICAgICAgIC8qIHRvd3VwcGVyLCB0YWtpbmcgd2ludF90ICg0IGJ5dGVz KSwgaGFuZGxlcyBVQ1MtNCB2YWx1ZXMuICAqLwotICAgICAgICAgICAgICB3Y2kgPSB0b3d1cHBl ciAod2NpKTsKLSAgICAgICAgICAgICAgaWYgKHdjaSA+PSAweDEwMDAwKQotICAgICAgICAgICAg ICAgIHsKLSAgICAgICAgICAgICAgICAgIHdjaSAtPSAweDEwMDAwOwotICAgICAgICAgICAgICAg ICAgd2MgPSAod2NpID4+IDEwKSB8IDB4ZDgwMDsKLSAgICAgICAgICAgICAgICAgIC8qIE5vIG5l 
ZWQgdG8gY2hlY2sgdGhlIHJldHVybiB2YWx1ZS4gIFdoZW4gcmVhZGluZyB0aGUKLSAgICAgICAg ICAgICAgICAgICAgIGhpZ2ggc3Vycm9nYXRlLCB0aGUgcmV0dXJuIHZhbHVlIHdpbGwgYmUgMCBh bmQgb25seSB0aGUKLSAgICAgICAgICAgICAgICAgICAgIG1ic3RhdGUgaW5kaWNhdGVzIHRoYXQg d2UncmUgaW4gdGhlIG1pZGRsZSBvZiByZWFkaW5nIGEKLSAgICAgICAgICAgICAgICAgICAgIHN1 cnJvZ2F0ZSBwYWlyLiAgVGhlIG5leHQgd2NydG9tYiBjYWxsIHJlYWRpbmcgdGhlIGxvdwotICAg ICAgICAgICAgICAgICAgICAgc3Vycm9nYXRlIHdpbGwgdGhlbiByZXR1cm4gNCBhbmQgcmVzZXQg dGhlIG1ic3RhdGUuICAqLwotICAgICAgICAgICAgICAgICAgd2NydG9tYiAocCwgd2MsICZvcyk7 Ci0gICAgICAgICAgICAgICAgICB3YyA9ICh3Y2kgJiAweDNmZikgfCAweGRjMDA7Ci0gICAgICAg ICAgICAgICAgfQotICAgICAgICAgICAgICBlbHNlCi0gICAgICAgICAgICAgICAgewotICAgICAg ICAgICAgICAgICAgd2MgPSAod2NoYXJfdCkgd2NpOwotICAgICAgICAgICAgICAgIH0KLSAgICAg ICAgICAgICAgb21iY2xlbiA9IHdjcnRvbWIgKHAsIHdjLCAmb3MpOwotICAgICAgICAgICAgfQot ICAgICAgICAgIGVsc2UKLSNlbmRpZgotICAgICAgICAgIG9tYmNsZW4gPSB3Y3J0b21iIChwLCB0 b3d1cHBlciAod2MpLCAmb3MpOwotICAgICAgICAgICptID0gbWJjbGVuIC0gb21iY2xlbjsKLSAg ICAgICAgICBtZW1zZXQgKG0gKyAxLCAwLCBvbWJjbGVuIC0gMSk7Ci0gICAgICAgICAgbSArPSBv bWJjbGVuOwotICAgICAgICAgIHAgKz0gb21iY2xlbjsKLSAgICAgICAgICBvdXRsZW4gKz0gb21i Y2xlbjsKLSAgICAgICAgICBsZW5ndGhzX2RpZmZlciB8PSAobWJjbGVuICE9IG9tYmNsZW4pOwot ICAgICAgICB9Ci0gICAgfQotCi0gICpsZW5fbWFwX3AgPSBsZW5ndGhzX2RpZmZlciA/IGxlbl9t YXAgOiBOVUxMOwotICAqbiA9IHAgLSBvdXQ7Ci0gICpwID0gMDsKLSAgcmV0dXJuIG91dDsKLX0K LQogLyogSW5pdGlhbGl6ZSBhIGNhY2hlIG9mIG1icmxlbiB2YWx1ZXMgZm9yIGVhY2ggb2YgaXRz IDEtYnl0ZSBpbnB1dHMuICAqLwogdm9pZAogYnVpbGRfbWJjbGVuX2NhY2hlICh2b2lkKQotLSAK Mi41LjUKCg== --001a113d44e68b1fd6053027dbdf-- From debbugs-submit-bounces@debbugs.gnu.org Sun Apr 10 17:59:23 2016 Received: (at 23234) by debbugs.gnu.org; 10 Apr 2016 21:59:23 +0000 Received: from localhost ([127.0.0.1]:57607 helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1apNO7-0005De-CR for submit@debbugs.gnu.org; Sun, 10 Apr 2016 17:59:23 -0400 Received: from thorn.bewilderbeest.net ([71.19.156.171]:43486) by 
debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1apNO5-0005DQ-O2; Sun, 10 Apr 2016 17:59:22 -0400 Received: from hatter.bewilderbeest.net (hatter.bewilderbeest.net [IPv6:2001:470:c3f4:1::1:1]) (using TLSv1.2 with cipher AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) (Authenticated sender: zev) by thorn.bewilderbeest.net (Postfix) with ESMTPSA id D63748042E; Sun, 10 Apr 2016 14:59:19 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=bewilderbeest.net; s=thorn; t=1460325560; bh=sVWw7ibq4B3E28KnDe5cjkHnRnvb/wQUcm/VOwGObrA=; h=Date:From:To:Cc:Subject:References:In-Reply-To:From; b=EWg75NCXpIpHEPOkiDPx2HzbTWLu4sEn2UZt16edp0h5771k6OnWvMBsyt3Vxy1Oa iQzKkQqVRUOIpRBwNu+HWBZMeAfVx9DLlKvEft3/32v9OZhFwSLaX1UDHK1+vMC15q /7mhnwz/8PhSeVVNu/ynFzzprVxKlmmIcDaQYyuw= Date: Sun, 10 Apr 2016 16:59:08 -0500 From: Zev Weiss To: Jim Meyering Subject: Re: bug#23234: unexpected results with charset handling in GNU grep 2.23 Message-ID: <20160410215908.GA23038@hatter.bewilderbeest.net> References: <20160406192521.GA14451@SerNet.DE> <570579DA.9020602@redhat.com> <5705B7C1.4040301@cs.ucla.edu> <5708BE92.6010002@cs.ucla.edu> <570A121E.4010802@cs.ucla.edu> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii; format=flowed Content-Disposition: inline In-Reply-To: User-Agent: Mutt/1.5.24 (2015-08-30) X-Spam-Score: -1.0 (-) X-Debbugs-Envelope-To: 23234 Cc: Bjoern Jacke , 23234-done@debbugs.gnu.org, Paul Eggert , 23234@debbugs.gnu.org X-BeenThere: debbugs-submit@debbugs.gnu.org X-Mailman-Version: 2.1.18 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: debbugs-submit-bounces@debbugs.gnu.org Sender: "Debbugs-submit" X-Spam-Score: -1.0 (-) On Sun, Apr 10, 2016 at 02:10:08PM -0700, Jim Meyering wrote: >On Sun, Apr 10, 2016 at 1:43 AM, Paul Eggert wrote: >> Paul Eggert wrote: >>> >>> I plan to change GNU grep to use this new facility, and to add some grep >>> test >>> cases for this 
issue. >> >> >> I did that by installing the attached patches into the grep master. This >> fixes the bug for me, so I'm closing the bug report. >> >> These patches mostly just report the fix and add test cases. The actual fix >> was in gnulib, here: >> >> http://git.savannah.gnu.org/cgit/gnulib.git/commit/?id=b7bc3c1a4e78add4cbad39ae1a0c4fb0747b483f >> >> This gnulib fix works around the underlying glibc facility which caused the >> problem, for which I've filed a bug report here: >> >> https://sourceware.org/bugzilla/show_bug.cgi?id=19932 >> >> It's not clear when the glibc bug will be fixed. Until it is, one should >> expect similar problems to crop up in applications other than 'grep'. > >Thanks for the fine work, Paul. >With this fix, I would like to make yet another grep release. >Does anyone have any pending changes we should consider? > >Incidentally, looking at mbrtowc uses, I found an unused >function and removed it with this patch: Well, I still have my multithreading patch series (https://github.com/zevweiss/grep/) awaiting review, which I'd hope to get applied at some point, though I'd guess it's enough of a review task that delaying an impending release for it isn't likely (the mbtoupper()-removal patch made that series one patch shorter though, since one was to deal with that function's thread-unsafety). I've been rebasing it periodically and running it on my own system in /usr/local without any problems for a while now, for what that's worth. With current HEAD from savannah though, all check-very-expensive tests pass for me on Debian stretch with gcc 5.3, glibc 2.22, and Linux kernel 4.3.
Zev Weiss From debbugs-submit-bounces@debbugs.gnu.org Sun Apr 10 18:09:43 2016 Received: (at 23234) by debbugs.gnu.org; 10 Apr 2016 22:09:43 +0000 Received: from localhost ([127.0.0.1]:57625 helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1apNY7-0005UH-4G for submit@debbugs.gnu.org; Sun, 10 Apr 2016 18:09:43 -0400 Received: from zimbra.cs.ucla.edu ([131.179.128.68]:48547) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1apNY4-0005To-FY; Sun, 10 Apr 2016 18:09:40 -0400 Received: from localhost (localhost [127.0.0.1]) by zimbra.cs.ucla.edu (Postfix) with ESMTP id 04292160FD3; Sun, 10 Apr 2016 15:09:35 -0700 (PDT) Received: from zimbra.cs.ucla.edu ([127.0.0.1]) by localhost (zimbra.cs.ucla.edu [127.0.0.1]) (amavisd-new, port 10032) with ESMTP id RapsR_ztgtow; Sun, 10 Apr 2016 15:09:34 -0700 (PDT) Received: from localhost (localhost [127.0.0.1]) by zimbra.cs.ucla.edu (Postfix) with ESMTP id 106A6161250; Sun, 10 Apr 2016 15:09:34 -0700 (PDT) X-Virus-Scanned: amavisd-new at zimbra.cs.ucla.edu Received: from zimbra.cs.ucla.edu ([127.0.0.1]) by localhost (zimbra.cs.ucla.edu [127.0.0.1]) (amavisd-new, port 10026) with ESMTP id kIZBwdkxNnAV; Sun, 10 Apr 2016 15:09:33 -0700 (PDT) Received: from [192.168.1.25] (unknown [71.109.149.160]) by zimbra.cs.ucla.edu (Postfix) with ESMTPSA id 2E1EE160FD3; Sun, 10 Apr 2016 15:09:28 -0700 (PDT) Subject: Re: bug#23234: unexpected results with charset handling in GNU grep 2.23 To: Zev Weiss , Jim Meyering References: <20160406192521.GA14451@SerNet.DE> <570579DA.9020602@redhat.com> <5705B7C1.4040301@cs.ucla.edu> <5708BE92.6010002@cs.ucla.edu> <570A121E.4010802@cs.ucla.edu> <20160410215908.GA23038@hatter.bewilderbeest.net> From: Paul Eggert Message-ID: <570ACF0D.2040507@cs.ucla.edu> Date: Sun, 10 Apr 2016 15:09:17 -0700 User-Agent: Mozilla/5.0 (X11; Linux i686; rv:38.0) Gecko/20100101 Thunderbird/38.6.0 MIME-Version: 1.0 In-Reply-To: <20160410215908.GA23038@hatter.bewilderbeest.net> 
Content-Type: text/plain; charset=windows-1252; format=flowed Content-Transfer-Encoding: 7bit X-Spam-Score: -1.0 (-) X-Debbugs-Envelope-To: 23234 Cc: Bjoern Jacke , 23234-done@debbugs.gnu.org, 20768@debbugs.gnu.org, 23234@debbugs.gnu.org X-BeenThere: debbugs-submit@debbugs.gnu.org X-Mailman-Version: 2.1.18 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: debbugs-submit-bounces@debbugs.gnu.org Sender: "Debbugs-submit" X-Spam-Score: -1.0 (-) On 04/10/2016 02:59 PM, Zev Weiss wrote: > I still have my multithreading patch series > (https://github.com/zevweiss/grep/) awaiting review, which I'd hope to > get applied at some point, though I'd guess it's enough of a review > task that delaying an impending release for it isn't likely (the > mbtoupper()-removal patch made that series one patch shorter though, > since one was to deal with that function's thread-unsafety). I've > been rebasing it periodically and running it on my own system in > /usr/local without any problems for a while now, for what that's worth. > > With current HEAD from savannah though, all check-very-expensive tests > pass for me on Debian stretch with gcc 5.3, glibc 2.22, and Linux > kernel 4.3. Thanks for pinging us about this. Sorry, I kind of dropped the ball on this one. I will try to bump its priority. There are some other long-pending patches that also need review. I agree that these shouldn't delay the next release, but perhaps it could delay the release after that.... 
From debbugs-submit-bounces@debbugs.gnu.org Mon Apr 11 00:46:34 2016 Received: (at 23234) by debbugs.gnu.org; 11 Apr 2016 04:46:34 +0000 Received: from localhost ([127.0.0.1]:57833 helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1apTkA-0003Ob-B2 for submit@debbugs.gnu.org; Mon, 11 Apr 2016 00:46:34 -0400 Received: from mail-oi0-f67.google.com ([209.85.218.67]:33062) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1apTk7-0003OJ-Q3; Mon, 11 Apr 2016 00:46:32 -0400 Received: by mail-oi0-f67.google.com with SMTP id v67so22776806oie.0; Sun, 10 Apr 2016 21:46:31 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113; h=mime-version:sender:in-reply-to:references:from:date:message-id :subject:to:cc; bh=HYjuiG/ATZmamqcDHfp4E694VHHNLpLycATajFwpgIg=; b=EK63exVhy6m4NEG/LYSIECqC6eoaOycxvMkMyOtpNJIwvfPNTdXi9fWAA/jPzjVBy4 iCzkMm34nW6MIDy6ZsQDa0Oxw2tMDpSACEeolE9zvQjkedh9LHdZlMtbqn6r88SiMhLr 14ZHvtQrO64HfkSpAcHLc3vxfx5bMT2F+QjvqOtwfKKUtpe6XyEC64/OVR3tMn+af6sN v0BLDqhHSir/+UR4RscOiEIWKYKIIOl1mx5aLoi1JTPPgf71HTJD+ecPGUBHeY6aHcYY LCVp6fRPXdjaZxApR28O9Cjcs9KKlXwhIe6BVpabEYxBMdhEuvMNoFCQxlcBATzX6Ksa hCNw== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20130820; h=x-gm-message-state:mime-version:sender:in-reply-to:references:from :date:message-id:subject:to:cc; bh=HYjuiG/ATZmamqcDHfp4E694VHHNLpLycATajFwpgIg=; b=Na/4oqpRRWnDoXRoetHpo4liN8As60AZcW0T7xK5YrQIZ91wIRX1898sD1izicLpy+ XMOuoun+xMea5Lw9wuL6jC5OXiqOgxJ59vni0UKrLvAqiwu3YaquW6HyUACIbfdfh+e/ 1VNh27XV/GpUYuzLNTWAPVWpoVd09BsN0KvkKY8IEKL3HzgIsEaBOQQBrrSk+iNTo1OD r9+Do63Ld6iohF2ooI0JG9DQdhyKHNWnuQOeEai4v0sFPe9g5oNhR86L54C/n85G7+53 1Gew00cU/L5+flQgK4AUZboxL6Sz3AomUFRTirlY8L6RYZG4qbj/nOIJPxJ/11Bykxvl fsRA== X-Gm-Message-State: AD7BkJIPRfFQgyET1r64F4Xmr+Bmdz6BAQMk4kidoVuqykEFgUaq3K1pKJ2ufc67vXeGBw1uoRpogJYsXVkdXg== X-Received: by 10.202.0.78 with SMTP id 75mr8818990oia.134.1460349985906; Sun, 10 Apr 2016 21:46:25 -0700 
(PDT) MIME-Version: 1.0 Received: by 10.202.213.141 with HTTP; Sun, 10 Apr 2016 21:46:06 -0700 (PDT) In-Reply-To: <20160410215908.GA23038@hatter.bewilderbeest.net> References: <20160406192521.GA14451@SerNet.DE> <570579DA.9020602@redhat.com> <5705B7C1.4040301@cs.ucla.edu> <5708BE92.6010002@cs.ucla.edu> <570A121E.4010802@cs.ucla.edu> <20160410215908.GA23038@hatter.bewilderbeest.net> From: Jim Meyering Date: Sun, 10 Apr 2016 21:46:06 -0700 X-Google-Sender-Auth: hBdGFBn3nX7nwhDqKa2tId2Ll5s Message-ID: Subject: Re: bug#23234: unexpected results with charset handling in GNU grep 2.23 To: Zev Weiss Content-Type: text/plain; charset=UTF-8 X-Spam-Score: -0.5 (/) X-Debbugs-Envelope-To: 23234 Cc: Bjoern Jacke , 23234-done@debbugs.gnu.org, Paul Eggert , 23234@debbugs.gnu.org X-BeenThere: debbugs-submit@debbugs.gnu.org X-Mailman-Version: 2.1.18 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: debbugs-submit-bounces@debbugs.gnu.org Sender: "Debbugs-submit" X-Spam-Score: -0.5 (/) On Sun, Apr 10, 2016 at 2:59 PM, Zev Weiss wrote: ... > Well, I still have my multithreading patch series > (https://github.com/zevweiss/grep/) awaiting review, which I'd hope to get > applied at some point, though I'd guess it's enough of a review task that > delaying an impending release for it isn't likely (the mbtoupper()-removal > patch made that series one patch shorter though, since one was to deal with > that function's thread-unsafety). I've been rebasing it periodically and > running it on my own system in /usr/local without any problems for a while > now, for what that's worth. > > With current HEAD from savannah though, all check-very-expensive tests pass > for me on Debian stretch with gcc 5.3, glibc 2.22, and Linux kernel 4.3. Thanks for your patience. Definitely a worthwhile feature. You're right: I want to ensure the core functionality is in a very solid state before making a release including multithreading. 
From debbugs-submit-bounces@debbugs.gnu.org Mon Apr 11 01:19:43 2016 Received: (at 23234) by debbugs.gnu.org; 11 Apr 2016 05:19:43 +0000 Received: from localhost ([127.0.0.1]:57864 helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1apUGE-0007d4-SV for submit@debbugs.gnu.org; Mon, 11 Apr 2016 01:19:43 -0400 Received: from zimbra.cs.ucla.edu ([131.179.128.68]:35307) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1apUGD-0007cq-0I for 23234@debbugs.gnu.org; Mon, 11 Apr 2016 01:19:41 -0400 Received: from localhost (localhost [127.0.0.1]) by zimbra.cs.ucla.edu (Postfix) with ESMTP id 84C7C161257; Sun, 10 Apr 2016 22:19:35 -0700 (PDT) Received: from zimbra.cs.ucla.edu ([127.0.0.1]) by localhost (zimbra.cs.ucla.edu [127.0.0.1]) (amavisd-new, port 10032) with ESMTP id Pwgc3SYxn1FK; Sun, 10 Apr 2016 22:19:34 -0700 (PDT) Received: from localhost (localhost [127.0.0.1]) by zimbra.cs.ucla.edu (Postfix) with ESMTP id D650B16125E; Sun, 10 Apr 2016 22:19:34 -0700 (PDT) X-Virus-Scanned: amavisd-new at zimbra.cs.ucla.edu Received: from zimbra.cs.ucla.edu ([127.0.0.1]) by localhost (zimbra.cs.ucla.edu [127.0.0.1]) (amavisd-new, port 10026) with ESMTP id jqRaekszzjFi; Sun, 10 Apr 2016 22:19:34 -0700 (PDT) Received: from [192.168.1.9] (unknown [100.32.155.148]) by zimbra.cs.ucla.edu (Postfix) with ESMTPSA id B04B0161257; Sun, 10 Apr 2016 22:19:34 -0700 (PDT) Subject: Re: bug#23234: unexpected results with charset handling in GNU grep 2.23 To: Jim Meyering , 23234@debbugs.gnu.org, Bjoern Jacke References: <20160406192521.GA14451@SerNet.DE> <570579DA.9020602@redhat.com> <5705B7C1.4040301@cs.ucla.edu> <5708BE92.6010002@cs.ucla.edu> <570A121E.4010802@cs.ucla.edu> From: Paul Eggert Organization: UCLA Computer Science Department Message-ID: <570B33E6.1010001@cs.ucla.edu> Date: Sun, 10 Apr 2016 22:19:34 -0700 User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:38.0) Gecko/20100101 Thunderbird/38.6.0 MIME-Version: 1.0 In-Reply-To: Content-Type: 
text/plain; charset=utf-8; format=flowed Content-Transfer-Encoding: 7bit X-Spam-Score: -1.0 (-) X-Debbugs-Envelope-To: 23234 Cc: Eric Blake X-BeenThere: debbugs-submit@debbugs.gnu.org X-Mailman-Version: 2.1.18 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: debbugs-submit-bounces@debbugs.gnu.org Sender: "Debbugs-submit" X-Spam-Score: -1.0 (-) Jim Meyering wrote: > With this fix, I would like to make yet another grep release. > Does anyone have any pending changes we should consider? I did a bit of bug triage, closing some bug reports and fixing one minor documentation bug (Bug#22911). I didn't see any pending changes that can't wait until after the next release. Thanks for volunteering to make these releases. From debbugs-submit-bounces@debbugs.gnu.org Mon Apr 11 11:35:13 2016 Received: (at 23234) by debbugs.gnu.org; 11 Apr 2016 15:35:13 +0000 Received: from localhost ([127.0.0.1]:59594 helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1apdrt-0000y0-8K for submit@debbugs.gnu.org; Mon, 11 Apr 2016 11:35:13 -0400 Received: from mail-ob0-f169.google.com ([209.85.214.169]:35608) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1apdrq-0000xm-PR for 23234@debbugs.gnu.org; Mon, 11 Apr 2016 11:35:11 -0400 Received: by mail-ob0-f169.google.com with SMTP id k9so16961171obk.2 for <23234@debbugs.gnu.org>; Mon, 11 Apr 2016 08:35:10 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113; h=mime-version:sender:in-reply-to:references:from:date:message-id :subject:to:cc; bh=USK+qVhWqkbDcqFe5+MHPSUtyOYI3OpmjX4eK733tcA=; b=QwSd5KnLWNDro9qMe8zByqMCCwZWfkL+5aB2yaj5yYsQLaaUZUWD3rJcF8iY6Gios2 bs8eSyVj71k6UubsaxnudLCM02oc5ok4iSp979Pf+N8DRMHYcI1UgyCvXQqR/xWRuyJH StEVd/zMinLK7LV75Z9ENpy9af1Y3Asr3P5Jbl30UbYCzm6L0qGt0kcoG4TMZID/hQOa mDBtU0p1DyUEcVKQtrzKlOZsn7AOT0R/iJfvaKkYqgQQp76C84mIy4ZBRtjg/E8yNxiW 
6pZH1yxwBFHkhmUijEz/eFz2ofutyA/cx+TKbaVASN2UAP9KoyMiH3ArsL0CBMgPIbvV AN6w== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20130820; h=x-gm-message-state:mime-version:sender:in-reply-to:references:from :date:message-id:subject:to:cc; bh=USK+qVhWqkbDcqFe5+MHPSUtyOYI3OpmjX4eK733tcA=; b=Agn37OOd9c1QV4OlwNibz931z+i5NsKn1WVsCPbqud+1sFIdnbY8PgWBau7ZW5eY7r 8m2Knm54KQCuAjr2HYCtXByjKzIhRWPsRM5U+hkIqDJG6mLpk5Mt3RgUhBmDgwLaqVEG OSL5p6SzNKRtnleQ2CUcJ/TJmy7HoTSZnov1q7lDyy6YBedTe2yu3BUNLlANuG88ju4Y 2Y7MN+Cp+WTLwokSR5Fg22f11TQdRx2coRn6DhozKXJAY858vRjsRhlvOIEELLdOoC32 4FJj1QyVqcXKvaLvZBCA+xrdagMXQe/ZeIqPqhZSeH76r9dQR4BxM2okKZSoPY+Ei01n akbw== X-Gm-Message-State: AD7BkJITaNksSzCzvZxmQDftNaQ2cW1frr12+pA5nlSvNDmlBO+L4vMTvqLB3ubNWShu11cKJyJ1fWlozawi3A== X-Received: by 10.182.242.4 with SMTP id wm4mr10469796obc.85.1460388905212; Mon, 11 Apr 2016 08:35:05 -0700 (PDT) MIME-Version: 1.0 Received: by 10.202.213.141 with HTTP; Mon, 11 Apr 2016 08:34:45 -0700 (PDT) In-Reply-To: <570B33E6.1010001@cs.ucla.edu> References: <20160406192521.GA14451@SerNet.DE> <570579DA.9020602@redhat.com> <5705B7C1.4040301@cs.ucla.edu> <5708BE92.6010002@cs.ucla.edu> <570A121E.4010802@cs.ucla.edu> <570B33E6.1010001@cs.ucla.edu> From: Jim Meyering Date: Mon, 11 Apr 2016 08:34:45 -0700 X-Google-Sender-Auth: klVn7PmDku_ahlGT48RL5kPwy2g Message-ID: Subject: Re: bug#23234: unexpected results with charset handling in GNU grep 2.23 To: Paul Eggert Content-Type: text/plain; charset=UTF-8 X-Spam-Score: -0.5 (/) X-Debbugs-Envelope-To: 23234 Cc: Bjoern Jacke , Eric Blake , 23234@debbugs.gnu.org X-BeenThere: debbugs-submit@debbugs.gnu.org X-Mailman-Version: 2.1.18 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: debbugs-submit-bounces@debbugs.gnu.org Sender: "Debbugs-submit" X-Spam-Score: -0.5 (/) On Sun, Apr 10, 2016 at 10:19 PM, Paul Eggert wrote: > Jim Meyering wrote: >> >> With this fix, I would like to make yet another grep 
release. >> Does anyone have any pending changes we should consider? > > > I did a bit of bug triage, closing some bug reports and fixing one minor > documentation bug (Bug#22911). I didn't see any pending changes that can't > wait until after the next release. Thank *you*. > Thanks for volunteering to make these releases. It's the least I can do, when you're fixing so many bugs. From unknown Mon Jun 23 11:27:05 2025 Received: (at fakecontrol) by fakecontrolmessage; To: internal_control@debbugs.gnu.org From: Debbugs Internal Request Subject: Internal Control Message-Id: bug archived. Date: Tue, 10 May 2016 11:24:03 +0000 User-Agent: Fakemail v42.6.9 # This is a fake control message. # # The action: # bug archived. thanks # This fakemail brought to you by your local debbugs # administrator