From unknown Thu Jun 19 14:04:20 2025 Content-Disposition: inline Content-Transfer-Encoding: quoted-printable MIME-Version: 1.0 X-Mailer: MIME-tools 5.509 (Entity 5.509) Content-Type: text/plain; charset=utf-8 From: bug#18266 <18266@debbugs.gnu.org> To: bug#18266 <18266@debbugs.gnu.org> Subject: Status: grep -P and invalid exits with error Reply-To: bug#18266 <18266@debbugs.gnu.org> Date: Thu, 19 Jun 2025 21:04:20 +0000 retitle 18266 grep -P and invalid exits with error=20 reassign 18266 grep submitter 18266 Santiago severity 18266 wishlist thanks From debbugs-submit-bounces@debbugs.gnu.org Thu Aug 14 11:42:53 2014 Received: (at submit) by debbugs.gnu.org; 14 Aug 2014 15:42:53 +0000 Received: from localhost ([127.0.0.1]:43318 helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.80) (envelope-from ) id 1XHxAy-0002UQ-VX for submit@debbugs.gnu.org; Thu, 14 Aug 2014 11:42:53 -0400 Received: from eggs.gnu.org ([208.118.235.92]:49866) by debbugs.gnu.org with esmtp (Exim 4.80) (envelope-from ) id 1XHxAw-0002UB-SO for submit@debbugs.gnu.org; Thu, 14 Aug 2014 11:42:51 -0400 Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71) (envelope-from ) id 1XHxAm-00025W-DP for submit@debbugs.gnu.org; Thu, 14 Aug 2014 11:42:45 -0400 X-Spam-Checker-Version: SpamAssassin 3.3.2 (2011-06-06) on eggs.gnu.org X-Spam-Level: X-Spam-Status: No, score=-1.9 required=5.0 tests=BAYES_00,UNPARSEABLE_RELAY autolearn=disabled version=3.3.2 Received: from lists.gnu.org ([2001:4830:134:3::11]:44917) by eggs.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1XHxAm-00025S-AO for submit@debbugs.gnu.org; Thu, 14 Aug 2014 11:42:40 -0400 Received: from eggs.gnu.org ([2001:4830:134:3::10]:51547) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1XHxAh-0001uC-7u for bug-grep@gnu.org; Thu, 14 Aug 2014 11:42:40 -0400 Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71) (envelope-from ) id 1XHxAc-0001zJ-E8 for bug-grep@gnu.org; Thu, 14 Aug 2014 11:42:35 
-0400 Received: from mx1.riseup.net ([198.252.153.129]:56020) by eggs.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1XHxAc-0001vF-7v for bug-grep@gnu.org; Thu, 14 Aug 2014 11:42:30 -0400 Received: from plantcutter.riseup.net (unknown [10.0.1.121]) (using TLSv1 with cipher DHE-RSA-AES256-SHA (256/256 bits)) (Client CN "*.riseup.net", Issuer "Gandi Standard SSL CA" (not verified)) by mx1.riseup.net (Postfix) with ESMTPS id CE8F256210; Thu, 14 Aug 2014 08:42:23 -0700 (PDT) Received: from [127.0.0.1] (localhost [127.0.0.1]) (Authenticated sender: santiagorr) with ESMTPSA id 12DD82206C Received: by nomada (sSMTP sendmail emulation); Thu, 14 Aug 2014 17:42:57 +0200 Date: Thu, 14 Aug 2014 17:42:57 +0200 From: Santiago To: bug-grep@gnu.org Subject: grep -P and invalid exits with error Message-ID: <20140814154257.GA29230@nomada> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline User-Agent: Mutt/1.5.23 (2014-03-12) X-Virus-Scanned: clamav-milter 0.98.1 at mx1 X-Virus-Status: Clean X-detected-operating-system: by eggs.gnu.org: GNU/Linux 2.6.x X-detected-operating-system: by eggs.gnu.org: Error: Malformed IPv6 address (bad octet value). X-Received-From: 2001:4830:134:3::11 X-Spam-Score: -5.0 (-----) X-Debbugs-Envelope-To: submit Cc: 758105@bugs.debian.org X-BeenThere: debbugs-submit@debbugs.gnu.org X-Mailman-Version: 2.1.15 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: debbugs-submit-bounces@debbugs.gnu.org Sender: "Debbugs-submit" X-Spam-Score: -5.0 (-----) Hi, Please, revert ca7868cc27db3d9deafaa2e0ac5a2bb0aa8ef373 That commit (re)introduced a regression bug (See http://debbugs.gnu.org/15758). pcresearch checks again if input is UTF-8 valid. 
The problem is that binary files are not valid UTF-8, so grep -P, in Unicode locales, exits with an error:

LANG=en_US.UTF-8 grep -P -r x /usr/bin/
grep: invalid UTF-8 byte sequence in input

printf 'j\x82\nj\n'|LC_ALL=en_US.UTF-8 grep -P j|cat -A; echo $?
grep: invalid UTF-8 byte sequence in input
0

should be:

printf 'j\x82\nj\n'|LC_ALL=en_US.UTF-8 src/grep -P j|cat -A; echo $?
jM-^B$
j$
0

Tested on Debian and Archlinux with pcre 8.35.

Thanks,

Santiago

From debbugs-submit-bounces@debbugs.gnu.org Thu Aug 14 12:16:13 2014 Received: (at 18266) by debbugs.gnu.org; 14 Aug 2014 16:16:13 +0000 Received: from localhost ([127.0.0.1]:43328 helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.80) (envelope-from ) id 1XHxhE-0003eT-2U for submit@debbugs.gnu.org; Thu, 14 Aug 2014 12:16:12 -0400 Received: from smtp.cs.ucla.edu ([131.179.128.62]:56020) by debbugs.gnu.org with esmtp (Exim 4.80) (envelope-from ) id 1XHxhB-0003e6-PE for 18266@debbugs.gnu.org; Thu, 14 Aug 2014 12:16:10 -0400 Received: from localhost (localhost.localdomain [127.0.0.1]) by smtp.cs.ucla.edu (Postfix) with ESMTP id A02D1A60017; Thu, 14 Aug 2014 09:16:03 -0700 (PDT) X-Virus-Scanned: amavisd-new at smtp.cs.ucla.edu Received: from smtp.cs.ucla.edu ([127.0.0.1]) by localhost (smtp.cs.ucla.edu [127.0.0.1]) (amavisd-new, port 10024) with ESMTP id c7okB2cvat8Y; Thu, 14 Aug 2014 09:15:58 -0700 (PDT) Received: from [192.168.1.9] (pool-71-177-17-123.lsanca.dsl-w.verizon.net [71.177.17.123]) by smtp.cs.ucla.edu (Postfix) with ESMTPSA id C6174A60018; Thu, 14 Aug 2014 09:15:58 -0700 (PDT) Message-ID: <53ECE0BE.8090603@cs.ucla.edu> Date: Thu, 14 Aug 2014 09:15:58 -0700 From: Paul Eggert Organization: UCLA Computer Science Department User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:31.0) Gecko/20100101 Thunderbird/31.0 MIME-Version: 1.0 To: Santiago , 18266@debbugs.gnu.org Subject: Re: bug#18266: grep -P and invalid exits with error References: <20140814154257.GA29230@nomada> In-Reply-To: <20140814154257.GA29230@nomada>
Content-Type: text/plain; charset=utf-8; format=flowed Content-Transfer-Encoding: 7bit X-Spam-Score: -3.0 (---) X-Debbugs-Envelope-To: 18266 Cc: 758105@bugs.debian.org X-BeenThere: debbugs-submit@debbugs.gnu.org X-Mailman-Version: 2.1.15 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: debbugs-submit-bounces@debbugs.gnu.org Sender: "Debbugs-submit" X-Spam-Score: -3.0 (---)

Santiago wrote:
> Please, revert ca7868cc27db3d9deafaa2e0ac5a2bb0aa8ef373

That commit was necessary to avoid undefined behavior in libpcre. We can't
simply undo the commit (unless you want to reintroduce security holes into
grep :-). The current behavior is the best we can do, unless someone fixes
libpcre (which doesn't appear to be likely), or unless someone takes the
time to write code in grep to work around the problem.

One way forward is suggested in . No doubt
there are others. Can you suggest a volunteer to take this on?

From debbugs-submit-bounces@debbugs.gnu.org Thu Aug 14 12:22:34 2014 Received: (at control) by debbugs.gnu.org; 14 Aug 2014 16:22:34 +0000 Received: from localhost ([127.0.0.1]:43339 helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.80) (envelope-from ) id 1XHxnO-0003s9-8a for submit@debbugs.gnu.org; Thu, 14 Aug 2014 12:22:34 -0400 Received: from smtp.cs.ucla.edu ([131.179.128.62]:56441) by debbugs.gnu.org with esmtp (Exim 4.80) (envelope-from ) id 1XHxnL-0003rj-Q0 for control@debbugs.gnu.org; Thu, 14 Aug 2014 12:22:32 -0400 Received: from localhost (localhost.localdomain [127.0.0.1]) by smtp.cs.ucla.edu (Postfix) with ESMTP id DDB4BA60023 for ; Thu, 14 Aug 2014 09:22:25 -0700 (PDT) X-Virus-Scanned: amavisd-new at smtp.cs.ucla.edu Received: from smtp.cs.ucla.edu ([127.0.0.1]) by localhost (smtp.cs.ucla.edu [127.0.0.1]) (amavisd-new, port 10024) with ESMTP id eGvg3fy6iQSu for ; Thu, 14 Aug 2014 09:22:17 -0700 (PDT) Received: from [192.168.1.9] (pool-71-177-17-123.lsanca.dsl-w.verizon.net [71.177.17.123])
by smtp.cs.ucla.edu (Postfix) with ESMTPSA id 438D0A60018 for ; Thu, 14 Aug 2014 09:22:17 -0700 (PDT) Message-ID: <53ECE238.7040709@cs.ucla.edu> Date: Thu, 14 Aug 2014 09:22:16 -0700 From: Paul Eggert Organization: UCLA Computer Science Department User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:31.0) Gecko/20100101 Thunderbird/31.0 MIME-Version: 1.0 To: control@debbugs.gnu.org Subject: 18266 is wishlist Content-Type: text/plain; charset=utf-8; format=flowed Content-Transfer-Encoding: 7bit X-Spam-Score: -3.0 (---) X-Debbugs-Envelope-To: control X-BeenThere: debbugs-submit@debbugs.gnu.org X-Mailman-Version: 2.1.15 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: debbugs-submit-bounces@debbugs.gnu.org Sender: "Debbugs-submit" X-Spam-Score: -3.0 (---) severity 18266 wishlist From debbugs-submit-bounces@debbugs.gnu.org Thu Aug 14 13:44:47 2014 Received: (at 18266) by debbugs.gnu.org; 14 Aug 2014 17:44:47 +0000 Received: from localhost ([127.0.0.1]:43367 helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.80) (envelope-from ) id 1XHz4x-0007wQ-10 for submit@debbugs.gnu.org; Thu, 14 Aug 2014 13:44:47 -0400 Received: from ioooi.vinc17.net ([92.243.22.117]:42345) by debbugs.gnu.org with esmtp (Exim 4.80) (envelope-from ) id 1XHz4u-0007wD-LR for 18266@debbugs.gnu.org; Thu, 14 Aug 2014 13:44:45 -0400 Received: from smtp-xvii.vinc17.net (128.119.75.86.rev.sfr.net [86.75.119.128]) by ioooi.vinc17.net (Postfix) with ESMTPSA id 5329570A; Thu, 14 Aug 2014 19:44:43 +0200 (CEST) Received: by xvii.vinc17.org (Postfix, from userid 1000) id E200E21A07C; Thu, 14 Aug 2014 19:44:42 +0200 (CEST) Date: Thu, 14 Aug 2014 19:44:42 +0200 From: Vincent Lefevre To: Paul Eggert , 758105@bugs.debian.org Subject: Re: Bug#758105: bug#18266: grep -P and invalid exits with error Message-ID: <20140814174442.GA11558@xvii.vinc17.org> References: <20140814154257.GA29230@nomada> <53ECE0BE.8090603@cs.ucla.edu> MIME-Version: 1.0 Content-Type: 
text/plain; charset=iso-8859-1 Content-Disposition: inline Content-Transfer-Encoding: 8bit In-Reply-To: <53ECE0BE.8090603@cs.ucla.edu> X-Mailer-Info: http://www.vinc17.net/mutt/ User-Agent: Mutt/1.5.23-6361-vl-r59709 (2014-07-25) X-Spam-Score: -0.7 (/) X-Debbugs-Envelope-To: 18266 Cc: Santiago , 18266@debbugs.gnu.org X-BeenThere: debbugs-submit@debbugs.gnu.org X-Mailman-Version: 2.1.15 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: debbugs-submit-bounces@debbugs.gnu.org Sender: "Debbugs-submit" X-Spam-Score: -0.7 (/)

On 2014-08-14 09:15:58 -0700, Paul Eggert wrote:
> That commit was necessary to avoid undefined behavior in libpcre. We can't
> simply undo the commit (unless you want to reintroduce security holes into
> grep :-). The current behavior is the best we can do, unless someone fixes
> libpcre (which doesn't appear to be likely), or unless someone takes the
> time to write code in grep to work around the problem.
>
> One way forward is suggested in . No doubt
> there are others. Can you suggest a volunteer to take this on?

Discarding input lines with invalid UTF-8 sequences is not OK.
IMHO, it would be better to replace invalid UTF-8 sequences by
zero bytes before passing them to libpcre. Is it allowed to do
that in Pexecute()?

-- Vincent Lefèvre - Web: 100% accessible validated (X)HTML - Blog: Work: CR INRIA - computer arithmetic / AriC project (LIP, ENS-Lyon) From debbugs-submit-bounces@debbugs.gnu.org Thu Aug 14 14:19:48 2014 Received: (at 18266) by debbugs.gnu.org; 14 Aug 2014 18:19:48 +0000 Received: from localhost ([127.0.0.1]:43484 helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.80) (envelope-from ) id 1XHzcp-0000m6-Ap for submit@debbugs.gnu.org; Thu, 14 Aug 2014 14:19:48 -0400 Received: from smtp.cs.ucla.edu ([131.179.128.62]:35024) by debbugs.gnu.org with esmtp (Exim 4.80) (envelope-from ) id 1XHzcm-0000lh-88 for 18266@debbugs.gnu.org; Thu, 14 Aug 2014 14:19:45 -0400 Received: from localhost (localhost.localdomain [127.0.0.1]) by smtp.cs.ucla.edu (Postfix) with ESMTP id 8F19EA60018; Thu, 14 Aug 2014 11:19:37 -0700 (PDT) X-Virus-Scanned: amavisd-new at smtp.cs.ucla.edu Received: from smtp.cs.ucla.edu ([127.0.0.1]) by localhost (smtp.cs.ucla.edu [127.0.0.1]) (amavisd-new, port 10024) with ESMTP id pEzUvIJuyuxj; Thu, 14 Aug 2014 11:19:29 -0700 (PDT) Received: from [192.168.1.9] (pool-71-177-17-123.lsanca.dsl-w.verizon.net [71.177.17.123]) by smtp.cs.ucla.edu (Postfix) with ESMTPSA id CB3FCA60017; Thu, 14 Aug 2014 11:19:28 -0700 (PDT) Message-ID: <53ECFDB0.40601@cs.ucla.edu> Date: Thu, 14 Aug 2014 11:19:28 -0700 From: Paul Eggert Organization: UCLA Computer Science Department User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:31.0) Gecko/20100101 Thunderbird/31.0 MIME-Version: 1.0 To: Vincent Lefevre , 758105@bugs.debian.org Subject: Re: Bug#758105: bug#18266: grep -P and invalid exits with error References: <20140814154257.GA29230@nomada> <53ECE0BE.8090603@cs.ucla.edu> <20140814174442.GA11558@xvii.vinc17.org> In-Reply-To: <20140814174442.GA11558@xvii.vinc17.org> Content-Type: text/plain; charset=utf-8; format=flowed Content-Transfer-Encoding: 7bit X-Spam-Score: -3.0 (---) X-Debbugs-Envelope-To: 18266 Cc: Santiago , 18266@debbugs.gnu.org X-BeenThere: 
debbugs-submit@debbugs.gnu.org X-Mailman-Version: 2.1.15 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: debbugs-submit-bounces@debbugs.gnu.org Sender: "Debbugs-submit" X-Spam-Score: -3.0 (---)

Vincent Lefevre wrote:
> it would be better to replace invalid UTF-8 sequences by
> zero bytes before passing them to libpcre. Is it allowed to do
> that in Pexecute()?

Sorry, I don't know. I was hoping that the volunteer (whoever it is) could
figure all this stuff out.

grep should work correctly even if the input contains NUL bytes, so perhaps
it would be better to replace an invalid byte by the UTF-8 sequence for
U+FFFD REPLACEMENT CHARACTER, as that's one standard way to deal with this
problem. Or perhaps the volunteer will have a better idea.

From debbugs-submit-bounces@debbugs.gnu.org Thu Aug 14 16:11:54 2014 Received: (at 18266) by debbugs.gnu.org; 14 Aug 2014 20:11:54 +0000 Received: from localhost ([127.0.0.1]:43600 helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.80) (envelope-from ) id 1XI1NJ-0005mz-My for submit@debbugs.gnu.org; Thu, 14 Aug 2014 16:11:54 -0400 Received: from ioooi.vinc17.net ([92.243.22.117]:42367) by debbugs.gnu.org with esmtp (Exim 4.80) (envelope-from ) id 1XI1NG-0005ml-CB for 18266@debbugs.gnu.org; Thu, 14 Aug 2014 16:11:51 -0400 Received: by ioooi.vinc17.net (Postfix, from userid 1001) id E1EE8A2E; Thu, 14 Aug 2014 22:11:48 +0200 (CEST) Date: Thu, 14 Aug 2014 22:11:48 +0200 From: Vincent Lefevre To: Paul Eggert Subject: Re: Bug#758105: bug#18266: grep -P and invalid exits with error Message-ID: <20140814201148.GC1951@ioooi.vinc17.net> References: <20140814154257.GA29230@nomada> <53ECE0BE.8090603@cs.ucla.edu> <20140814174442.GA11558@xvii.vinc17.org> <53ECFDB0.40601@cs.ucla.edu> MIME-Version: 1.0 Content-Type: text/plain; charset=iso-8859-1 Content-Disposition: inline Content-Transfer-Encoding: 8bit In-Reply-To: <53ECFDB0.40601@cs.ucla.edu> X-Mailer-Info:
http://www.vinc17.net/mutt/ User-Agent: Mutt/1.5.23-6361-vl-r59709 (2014-07-25) X-Spam-Score: -0.7 (/) X-Debbugs-Envelope-To: 18266 Cc: 18266@debbugs.gnu.org, Santiago , 758105@bugs.debian.org X-BeenThere: debbugs-submit@debbugs.gnu.org X-Mailman-Version: 2.1.15 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: debbugs-submit-bounces@debbugs.gnu.org Sender: "Debbugs-submit" X-Spam-Score: -0.7 (/)

On 2014-08-14 11:19:28 -0700, Paul Eggert wrote:
> grep should work correctly even if the input contains NUL bytes, so perhaps
> it would be better to replace an invalid byte by the UTF-8 sequence for
> U+FFFD REPLACEMENT CHARACTER, as that's one standard way to deal with this
> problem. Or perhaps the volunteer will have a better idea.

The problem with this solution is that it would change the length
of the text, while replacing invalid bytes by zero bytes could be
done in place (if allowed), with very little change of the code,
I think.

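The in-place idea can be sketched as standalone C. This is only an
illustration under stated assumptions, not a patch to Pexecute(): the
names utf8_seq_len and scrub_invalid_utf8 are hypothetical and belong
to neither grep nor libpcre, and the replacement byte is a parameter so
that either NUL or '?' can be tried. Because exactly one byte is
written for each invalid byte, the buffer length and all newline
positions stay the same.

```c
#include <assert.h>
#include <stddef.h>

/* Return the length (1 to 4) of the valid UTF-8 sequence starting at
   p, or 0 if the bytes at p do not start a valid sequence.  n is the
   number of bytes available and must be at least 1.  Overlong forms,
   surrogates and code points above U+10FFFF are rejected.  */
static size_t
utf8_seq_len (unsigned char const *p, size_t n)
{
  unsigned char c = p[0];
  unsigned char lo = 0x80, hi = 0xBF;   /* allowed range of byte 2 */
  size_t len, i;

  if (c < 0x80)
    return 1;
  if (c >= 0xC2 && c <= 0xDF)
    len = 2;
  else if (c >= 0xE0 && c <= 0xEF)
    {
      len = 3;
      if (c == 0xE0) lo = 0xA0;         /* no overlong 3-byte forms */
      if (c == 0xED) hi = 0x9F;         /* no UTF-16 surrogates */
    }
  else if (c >= 0xF0 && c <= 0xF4)
    {
      len = 4;
      if (c == 0xF0) lo = 0x90;         /* no overlong 4-byte forms */
      if (c == 0xF4) hi = 0x8F;         /* nothing above U+10FFFF */
    }
  else
    return 0;                           /* stray continuation or bad lead */

  if (n < len || p[1] < lo || p[1] > hi)
    return 0;
  for (i = 2; i < len; i++)
    if ((p[i] & 0xC0) != 0x80)
      return 0;
  return len;
}

/* Overwrite, in place, every byte of buf that is not part of a valid
   UTF-8 sequence with repl (which should be an ASCII byte).  Return
   the number of bytes overwritten.  */
size_t
scrub_invalid_utf8 (unsigned char *buf, size_t len, unsigned char repl)
{
  size_t i = 0, replaced = 0;

  assert (buf != NULL || len == 0);
  while (i < len)
    {
      size_t k = utf8_seq_len (buf + i, len - i);
      if (k == 0)
        {
          buf[i++] = repl;
          replaced++;
        }
      else
        i += k;
    }
  return replaced;
}
```

Applied to the bytes of the test case from the original report
('j', 0x82, newline, 'j', newline) with repl set to 0, only the 0x82
byte is overwritten. Whether grep may modify its input buffer like
this is exactly the open question raised above.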
-- Vincent Lefèvre - Web: 100% accessible validated (X)HTML - Blog: Work: CR INRIA - computer arithmetic / AriC project (LIP, ENS-Lyon) From debbugs-submit-bounces@debbugs.gnu.org Thu Aug 14 16:14:04 2014 Received: (at 18266) by debbugs.gnu.org; 14 Aug 2014 20:14:04 +0000 Received: from localhost ([127.0.0.1]:43604 helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.80) (envelope-from ) id 1XI1PP-0005rl-Jj for submit@debbugs.gnu.org; Thu, 14 Aug 2014 16:14:04 -0400 Received: from smtp.cs.ucla.edu ([131.179.128.62]:41002) by debbugs.gnu.org with esmtp (Exim 4.80) (envelope-from ) id 1XI1PM-0005qu-IY for 18266@debbugs.gnu.org; Thu, 14 Aug 2014 16:14:01 -0400 Received: from localhost (localhost.localdomain [127.0.0.1]) by smtp.cs.ucla.edu (Postfix) with ESMTP id 73AA4A6002D; Thu, 14 Aug 2014 13:13:54 -0700 (PDT) X-Virus-Scanned: amavisd-new at smtp.cs.ucla.edu Received: from smtp.cs.ucla.edu ([127.0.0.1]) by localhost (smtp.cs.ucla.edu [127.0.0.1]) (amavisd-new, port 10024) with ESMTP id tIMhwxjlfENn; Thu, 14 Aug 2014 13:13:45 -0700 (PDT) Received: from [192.168.1.9] (pool-71-177-17-123.lsanca.dsl-w.verizon.net [71.177.17.123]) by smtp.cs.ucla.edu (Postfix) with ESMTPSA id BE8D5A60017; Thu, 14 Aug 2014 13:13:45 -0700 (PDT) Message-ID: <53ED1879.8050400@cs.ucla.edu> Date: Thu, 14 Aug 2014 13:13:45 -0700 From: Paul Eggert Organization: UCLA Computer Science Department User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:31.0) Gecko/20100101 Thunderbird/31.0 MIME-Version: 1.0 To: Vincent Lefevre Subject: Re: Bug#758105: bug#18266: grep -P and invalid exits with error References: <20140814154257.GA29230@nomada> <53ECE0BE.8090603@cs.ucla.edu> <20140814174442.GA11558@xvii.vinc17.org> <53ECFDB0.40601@cs.ucla.edu> <20140814201148.GC1951@ioooi.vinc17.net> In-Reply-To: <20140814201148.GC1951@ioooi.vinc17.net> Content-Type: text/plain; charset=utf-8; format=flowed Content-Transfer-Encoding: 7bit X-Spam-Score: -3.0 (---) X-Debbugs-Envelope-To: 18266 Cc: 18266@debbugs.gnu.org, 
Santiago , 758105@bugs.debian.org X-BeenThere: debbugs-submit@debbugs.gnu.org X-Mailman-Version: 2.1.15 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: debbugs-submit-bounces@debbugs.gnu.org Sender: "Debbugs-submit" X-Spam-Score: -3.0 (---) Vincent Lefevre wrote: > The problem with this solution is that it would change the length > of the text, while replacing invalid bytes by zero bytes could be > done in place (if allowed), with very little change of the code, > I think. True. Though it might be more user-friendly to use '?' as the replacement byte. From debbugs-submit-bounces@debbugs.gnu.org Thu Aug 14 17:03:55 2014 Received: (at 18266) by debbugs.gnu.org; 14 Aug 2014 21:03:55 +0000 Received: from localhost ([127.0.0.1]:43637 helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.80) (envelope-from ) id 1XI2Bf-0007XP-85 for submit@debbugs.gnu.org; Thu, 14 Aug 2014 17:03:55 -0400 Received: from ioooi.vinc17.net ([92.243.22.117]:42375) by debbugs.gnu.org with esmtp (Exim 4.80) (envelope-from ) id 1XI2Bc-0007XG-Lc for 18266@debbugs.gnu.org; Thu, 14 Aug 2014 17:03:53 -0400 Received: from smtp-xvii.vinc17.net (128.119.75.86.rev.sfr.net [86.75.119.128]) by ioooi.vinc17.net (Postfix) with ESMTPSA id 4C6F670A; Thu, 14 Aug 2014 23:03:51 +0200 (CEST) Received: by xvii.vinc17.org (Postfix, from userid 1000) id 0C1C321A07C; Thu, 14 Aug 2014 23:03:50 +0200 (CEST) Date: Thu, 14 Aug 2014 23:03:50 +0200 From: Vincent Lefevre To: Paul Eggert Subject: Re: Bug#758105: bug#18266: grep -P and invalid exits with error Message-ID: <20140814210350.GJ5034@xvii.vinc17.org> References: <20140814154257.GA29230@nomada> <53ECE0BE.8090603@cs.ucla.edu> <20140814174442.GA11558@xvii.vinc17.org> <53ECFDB0.40601@cs.ucla.edu> <20140814201148.GC1951@ioooi.vinc17.net> <53ED1879.8050400@cs.ucla.edu> MIME-Version: 1.0 Content-Type: text/plain; charset=iso-8859-1 Content-Disposition: inline Content-Transfer-Encoding: 8bit In-Reply-To: 
<53ED1879.8050400@cs.ucla.edu> X-Mailer-Info: http://www.vinc17.net/mutt/ User-Agent: Mutt/1.5.23-6361-vl-r59709 (2014-07-25) X-Spam-Score: -0.7 (/) X-Debbugs-Envelope-To: 18266 Cc: 18266@debbugs.gnu.org, Santiago , 758105@bugs.debian.org X-BeenThere: debbugs-submit@debbugs.gnu.org X-Mailman-Version: 2.1.15 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: debbugs-submit-bounces@debbugs.gnu.org Sender: "Debbugs-submit" X-Spam-Score: -0.7 (/)

On 2014-08-14 13:13:45 -0700, Paul Eggert wrote:
> Vincent Lefevre wrote:
> >The problem with this solution is that it would change the length
> >of the text, while replacing invalid bytes by zero bytes could be
> >done in place (if allowed), with very little change of the code,
> >I think.
>
> True. Though it might be more user-friendly to use '?' as the
> replacement byte.

On output, yes (though in most cases, non-printable characters are
probably seen as garbage and don't really matter); and when the
lines are not printed, this doesn't matter.

On input, using null bytes may be better if one wants to be able to
match real replacement characters without false positives. Matching
null bytes is not common, AFAIK.

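The false-positive concern can be made concrete with a toy example. It
is a sketch only: a plain byte search stands in for the regex matcher,
and the helper names contains and literal_question_mark_matches are
hypothetical. It shows that a line whose invalid byte was rewritten to
'?' starts matching a search for a literal "a?b", while the same line
rewritten with a NUL byte does not. It does not settle which choice is
friendlier to users.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Return 1 if the n-byte needle occurs somewhere in the h-byte
   haystack.  A plain byte search stands in for the regex matcher.  */
static int
contains (char const *hay, size_t h, char const *needle, size_t n)
{
  size_t i;

  assert (hay != NULL && needle != NULL);
  for (i = 0; i + n <= h; i++)
    if (memcmp (hay + i, needle, n) == 0)
      return 1;
  return 0;
}

/* Take the three bytes 'a', 0x82, 'b' (a lone 0x82 is not valid
   UTF-8), overwrite the bad byte with repl, and report whether a
   search for the literal text "a?b" now succeeds.  */
int
literal_question_mark_matches (char repl)
{
  char line[] = { 'a', (char) 0x82, 'b' };

  line[1] = repl;
  return contains (line, sizeof line, "a?b", 3);
}
```

With repl set to '?' the function reports a match that the original
input never contained; with repl set to '\0' it reports none.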
-- Vincent Lefèvre - Web: 100% accessible validated (X)HTML - Blog: Work: CR INRIA - computer arithmetic / AriC project (LIP, ENS-Lyon) From debbugs-submit-bounces@debbugs.gnu.org Thu Aug 14 17:33:56 2014 Received: (at 18266) by debbugs.gnu.org; 14 Aug 2014 21:33:56 +0000 Received: from localhost ([127.0.0.1]:43677 helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.80) (envelope-from ) id 1XI2eh-0008Vu-Kb for submit@debbugs.gnu.org; Thu, 14 Aug 2014 17:33:55 -0400 Received: from smtp.cs.ucla.edu ([131.179.128.62]:45221) by debbugs.gnu.org with esmtp (Exim 4.80) (envelope-from ) id 1XI2ee-0008Va-Mo for 18266@debbugs.gnu.org; Thu, 14 Aug 2014 17:33:53 -0400 Received: from localhost (localhost.localdomain [127.0.0.1]) by smtp.cs.ucla.edu (Postfix) with ESMTP id 7CEABA6002F; Thu, 14 Aug 2014 14:33:46 -0700 (PDT) X-Virus-Scanned: amavisd-new at smtp.cs.ucla.edu Received: from smtp.cs.ucla.edu ([127.0.0.1]) by localhost (smtp.cs.ucla.edu [127.0.0.1]) (amavisd-new, port 10024) with ESMTP id DiLVLgWny9Hq; Thu, 14 Aug 2014 14:33:37 -0700 (PDT) Received: from [192.168.1.9] (pool-71-177-17-123.lsanca.dsl-w.verizon.net [71.177.17.123]) by smtp.cs.ucla.edu (Postfix) with ESMTPSA id CAF0BA6002D; Thu, 14 Aug 2014 14:33:37 -0700 (PDT) Message-ID: <53ED2B30.2050108@cs.ucla.edu> Date: Thu, 14 Aug 2014 14:33:36 -0700 From: Paul Eggert Organization: UCLA Computer Science Department User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:31.0) Gecko/20100101 Thunderbird/31.0 MIME-Version: 1.0 To: Vincent Lefevre Subject: Re: bug#18266: Bug#758105: bug#18266: grep -P and invalid exits with error References: <20140814154257.GA29230@nomada> <53ECE0BE.8090603@cs.ucla.edu> <20140814174442.GA11558@xvii.vinc17.org> <53ECFDB0.40601@cs.ucla.edu> <20140814201148.GC1951@ioooi.vinc17.net> <53ED1879.8050400@cs.ucla.edu> <20140814210350.GJ5034@xvii.vinc17.org> In-Reply-To: <20140814210350.GJ5034@xvii.vinc17.org> Content-Type: text/plain; charset=utf-8; format=flowed Content-Transfer-Encoding: 7bit 
X-Spam-Score: -3.0 (---) X-Debbugs-Envelope-To: 18266 Cc: 18266@debbugs.gnu.org, 758105@bugs.debian.org X-BeenThere: debbugs-submit@debbugs.gnu.org X-Mailman-Version: 2.1.15 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: debbugs-submit-bounces@debbugs.gnu.org Sender: "Debbugs-submit" X-Spam-Score: -3.0 (---) Vincent Lefevre wrote: > On input, using null bytes may be better if one wants to be able to > match real replacement characters without false positives. Maybe, though this is no place to get fancy. It's simple to tell users "an invalid byte acts like '?'". Simple is good. Anyway, this is a matter for the implementing volunteer to decide, whoever that happens to be. From debbugs-submit-bounces@debbugs.gnu.org Sat Aug 16 10:00:58 2014 Received: (at 18266) by debbugs.gnu.org; 16 Aug 2014 14:00:58 +0000 Received: from localhost ([127.0.0.1]:44607 helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.80) (envelope-from ) id 1XIeXR-0006RX-AW for submit@debbugs.gnu.org; Sat, 16 Aug 2014 10:00:57 -0400 Received: from mx1.riseup.net ([198.252.153.129]:43727) by debbugs.gnu.org with esmtp (Exim 4.80) (envelope-from ) id 1XIeXN-0006RN-68 for 18266@debbugs.gnu.org; Sat, 16 Aug 2014 10:00:54 -0400 Received: from plantcutter.riseup.net (plantcutter-pn.riseup.net [10.0.1.121]) (using TLSv1 with cipher ECDHE-RSA-AES256-SHA (256/256 bits)) (Client CN "*.riseup.net", Issuer "Gandi Standard SSL CA" (not verified)) by mx1.riseup.net (Postfix) with ESMTPS id 8920B56C00; Sat, 16 Aug 2014 14:00:51 +0000 (UTC) Received: from [127.0.0.1] (localhost [127.0.0.1]) (Authenticated sender: santiagorr) with ESMTPSA id 01C8222E96 Received: by nomada (sSMTP sendmail emulation); Sat, 16 Aug 2014 16:01:27 +0200 Date: Sat, 16 Aug 2014 16:01:27 +0200 From: Santiago To: Paul Eggert , 758105@bugs.debian.org Subject: Re: Bug#758105: bug#18266: Bug#758105: bug#18266: grep -P and invalid exits with error Message-ID: 
<20140816140127.GA2252@nomada> References: <20140814154257.GA29230@nomada> <53ECE0BE.8090603@cs.ucla.edu> <20140814174442.GA11558@xvii.vinc17.org> <53ECFDB0.40601@cs.ucla.edu> <20140814201148.GC1951@ioooi.vinc17.net> <53ED1879.8050400@cs.ucla.edu> <20140814210350.GJ5034@xvii.vinc17.org> <53ED2B30.2050108@cs.ucla.edu> MIME-Version: 1.0 Content-Type: multipart/signed; micalg=pgp-sha256; protocol="application/pgp-signature"; boundary="OwLcNYc0lM97+oe1" Content-Disposition: inline In-Reply-To: <53ED2B30.2050108@cs.ucla.edu> User-Agent: Mutt/1.5.23 (2014-03-12) X-Virus-Scanned: clamav-milter 0.98.4 at mx1 X-Virus-Status: Clean X-Spam-Score: 0.0 (/) X-Debbugs-Envelope-To: 18266 Cc: 18266@debbugs.gnu.org, Vincent Lefevre X-BeenThere: debbugs-submit@debbugs.gnu.org X-Mailman-Version: 2.1.15 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: debbugs-submit-bounces@debbugs.gnu.org Sender: "Debbugs-submit" X-Spam-Score: 0.0 (/)

--OwLcNYc0lM97+oe1
Content-Type: multipart/mixed; boundary="5vNYLRcllDrimb99"
Content-Disposition: inline

--5vNYLRcllDrimb99
Content-Type: text/plain; charset=iso-8859-1
Content-Disposition: inline
Content-Transfer-Encoding: 7bit

On 14/08/14 at 14:33, Paul Eggert wrote:
> Vincent Lefevre wrote:
> >On input, using null bytes may be better if one wants to be able to
> >match real replacement characters without false positives.
>
> Maybe, though this is no place to get fancy. It's simple to tell users "an
> invalid byte acts like '?'". Simple is good.
>
> Anyway, this is a matter for the implementing volunteer to decide, whoever
> that happens to be.
>

Workaround attached. It's too slow against binary files, but I haven't
found a simpler solution.

What do you think?

Santiago

--5vNYLRcllDrimb99
Content-Type: text/x-diff; charset=us-ascii
Content-Disposition: attachment; filename="grep-pcresearch-clean-utf8-1.patch"
Content-Transfer-Encoding: 7bit

From 7dd8d7c8682ee29bcb0ec9a64b98170fb7c6a064 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Santiago=20Ruano=20Rinc=C3=B3n?=
Date: Sat, 16 Aug 2014 14:24:43 +0200
Subject: [PATCH] Workaround to don't abort for invalid UTF8 input

* src/pcresearch.c (Pexecute): When pcre_exec returns an invalid
UTF8 character error, copies line_buf to an auxiliar buffer, removes
invalid characters and evaluates against it.
* tests/pcre-infloop: Exit status is 1 again.
* tests/pcre-invalid-utf8-input: Check again if grep doesn't abort.
Also cheks for match after a second invalid character in the same line.

Closes http://debbugs.gnu.org/18266
---
 src/pcresearch.c              | 16 ++++++++++++++++
 tests/pcre-infloop            |  2 +-
 tests/pcre-invalid-utf8-input | 12 +++++++++---
 3 files changed, 26 insertions(+), 4 deletions(-)

diff --git a/src/pcresearch.c b/src/pcresearch.c
index 820dd00..2b81e2b 100644
--- a/src/pcresearch.c
+++ b/src/pcresearch.c
@@ -164,6 +164,22 @@ Pexecute (char const *buf, size_t size, size_t *match_size,
       e = pcre_exec (cre, extra, line_buf, line_end - line_buf,
                      start_ofs < 0 ? 0 : start_ofs, 0,
                      sub, sizeof sub / sizeof *sub);
+
+      /* Workaround to don't abort for invalid multi-byte input (until
+         libpcre provides a better solution?)
+         If pcre_exec returns PCRE_ERROR_BADUTF8, copy the input, clean it
+         and evaluate again. */
+      if (e == PCRE_ERROR_BADUTF8){
+        char *line_utf8_clean = xmemdup (line_buf, line_end - line_buf);
+
+        while (e == PCRE_ERROR_BADUTF8) {
+          line_utf8_clean[sub[0]] = '\0';
+
+          e = pcre_exec (cre, extra, line_utf8_clean, line_end - line_buf,
+                         start_ofs < 0 ? 0 : start_ofs, 0,
+                         sub, sizeof sub / sizeof *sub);
+        }
+      }
     }
 
   if (e <= 0)
diff --git a/tests/pcre-infloop b/tests/pcre-infloop
index 1b33e72..b92f8e1 100755
--- a/tests/pcre-infloop
+++ b/tests/pcre-infloop
@@ -28,6 +28,6 @@ printf 'a\201b\r' > in || framework_failure_
 fail=0
 
 LC_ALL=en_US.UTF-8 timeout 3 grep -P 'a.?..b' in
-test $? = 2 || fail_ "libpcre's match function appears to infloop"
+test $? = 1 || fail_ "libpcre's match function appears to infloop"
 
 Exit $fail
diff --git a/tests/pcre-invalid-utf8-input b/tests/pcre-invalid-utf8-input
index 913e8ee..2c6aadb 100755
--- a/tests/pcre-invalid-utf8-input
+++ b/tests/pcre-invalid-utf8-input
@@ -13,9 +13,15 @@ require_en_utf8_locale_
 
 fail=0
 
-printf 'j\202\nj\n' > in || framework_failure_
+printf 'j\202j\202\x\njx\n' > in || framework_failure_
 
-LC_ALL=en_US.UTF-8 grep -P j in
-test $? -eq 2 || fail=1
+LC_ALL=en_US.UTF-8 grep -P j in > out 2>&1 || fail=1
+compare in out || fail=1
+compare /dev/null err || fail=1
+
+# Match after a second invalid UTF-8 character
+LC_ALL=en_US.UTF-8 grep -P x in > out 2>&1 || fail=1
+compare in out || fail=1
+compare /dev/null err || fail=1
 
 Exit $fail
-- 
1.7.10.4

--5vNYLRcllDrimb99--

--OwLcNYc0lM97+oe1
Content-Type: application/pgp-signature; name="signature.asc"
Content-Description: Digital signature

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1

iQIcBAEBCAAGBQJT72Q3AAoJELPyXWsAqAZiyW8P/RFNnrtD4xSN5RWz1kF3uFDH
b7D7FLvkF/gUN8lr1JQtwlGsmKKn4eSazSbj1SFQZcWdGg5/d1qAuHM3zc7yn/A/
PhBpLBMC0ebh64Ppgf5iW6675kQ79/Yq+v8Z7jLKSaNrq2hhWtIye1bPZarL7ZxD
cOt8IMZFbBfYx4+X5WZ7Pzh29U5weWKW3Ur7L7aAnLPmwRL7gTXMj48bMnVFf+KF
vKorgQSntC0vnQNFY3J+CIUg2x4eNdM/jgrCYaYfiubWTBGlj78TG93EBnyfsGTb
D/LqZn8rDJMOKnON/igWUgiu8GgEfoJNWR4AoI+ik9Xi9pMHuMDn7XjQyHaiqSmI
NstYGeXAnzbSNs0g2t4H3w4LUCliBZZI7dnxgthNmlxMm2uEgUdBm4yGrSKa+DPb
nP1NL5XsRhEBB3ZiX0Aep9mkicbTJ2pEwzVGv9q+T9NXcHmrPz5ymRJ/3WARGG1+
YXkUORZ6nWX4MIcUHGAfS4SBf5LRwn8ovROPWHCtrnR8bQg5wnT/dOtrrUZ9F3NC
DrPQZJBw4CBWiPS1fEUjqHVddtyIAWMduEFya5bYHOjsF/tXUze+BRnuXwWHvgxD ASuK1ej2ZZHdeCiHfDkYbRkuPaLyIuX+/BW3LenX0Dj02LN+XGsCtAPINpm8c/K9 /oxHFxa3rSaqKVLCOE0Z =SxA4 -----END PGP SIGNATURE----- --OwLcNYc0lM97+oe1-- From debbugs-submit-bounces@debbugs.gnu.org Sat Aug 16 12:26:26 2014 Received: (at 18266) by debbugs.gnu.org; 16 Aug 2014 16:26:26 +0000 Received: from localhost ([127.0.0.1]:44656 helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.80) (envelope-from ) id 1XIgoD-0002Co-OT for submit@debbugs.gnu.org; Sat, 16 Aug 2014 12:26:26 -0400 Received: from ioooi.vinc17.net ([92.243.22.117]:42551) by debbugs.gnu.org with esmtp (Exim 4.80) (envelope-from ) id 1XIgoB-0002Ca-Hk for 18266@debbugs.gnu.org; Sat, 16 Aug 2014 12:26:24 -0400 Received: from smtp-xvii.vinc17.net (128.119.75.86.rev.sfr.net [86.75.119.128]) by ioooi.vinc17.net (Postfix) with ESMTPSA id ECAE38C2; Sat, 16 Aug 2014 18:26:21 +0200 (CEST) Received: by xvii.vinc17.org (Postfix, from userid 1000) id 6DBD921A07B; Sat, 16 Aug 2014 18:26:21 +0200 (CEST) Date: Sat, 16 Aug 2014 18:26:21 +0200 From: Vincent Lefevre To: Santiago Subject: Re: Bug#758105: bug#18266: Bug#758105: bug#18266: grep -P and invalid exits with error Message-ID: <20140816162621.GM5034@xvii.vinc17.org> References: <20140814154257.GA29230@nomada> <53ECE0BE.8090603@cs.ucla.edu> <20140814174442.GA11558@xvii.vinc17.org> <53ECFDB0.40601@cs.ucla.edu> <20140814201148.GC1951@ioooi.vinc17.net> <53ED1879.8050400@cs.ucla.edu> <20140814210350.GJ5034@xvii.vinc17.org> <53ED2B30.2050108@cs.ucla.edu> <20140816140127.GA2252@nomada> MIME-Version: 1.0 Content-Type: text/plain; charset=iso-8859-1 Content-Disposition: inline Content-Transfer-Encoding: 8bit In-Reply-To: <20140816140127.GA2252@nomada> X-Mailer-Info: http://www.vinc17.net/mutt/ User-Agent: Mutt/1.5.23-6361-vl-r59709 (2014-07-25) X-Spam-Score: -0.7 (/) X-Debbugs-Envelope-To: 18266 Cc: 18266@debbugs.gnu.org, Paul Eggert , 758105@bugs.debian.org X-BeenThere: debbugs-submit@debbugs.gnu.org 
X-Mailman-Version: 2.1.15 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: debbugs-submit-bounces@debbugs.gnu.org Sender: "Debbugs-submit" X-Spam-Score: -0.7 (/) On 2014-08-16 16:01:27 +0200, Santiago wrote: > Workaround attached. It's too slow against binary files, but I haven't > found a simpler solution. To avoid the slowness, I think that it would be better to detect (directly, not via PCRE) invalid UTF-8 sequences and replace them by null bytes *in-place*. It might slow down the general case, though. However I'm not sure, because if the UTF8 validity check (via the replacement of invalid sequences) is done in grep, it doesn't need to be done in PCRE. -- Vincent Lefèvre - Web: 100% accessible validated (X)HTML - Blog: Work: CR INRIA - computer arithmetic / AriC project (LIP, ENS-Lyon) From debbugs-submit-bounces@debbugs.gnu.org Sat Aug 16 13:56:07 2014 Received: (at 18266) by debbugs.gnu.org; 16 Aug 2014 17:56:07 +0000 Received: from localhost ([127.0.0.1]:44792 helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.80) (envelope-from ) id 1XIiD0-0004wi-Ej for submit@debbugs.gnu.org; Sat, 16 Aug 2014 13:56:06 -0400 Received: from mx1.riseup.net ([198.252.153.129]:43082) by debbugs.gnu.org with esmtp (Exim 4.80) (envelope-from ) id 1XIiCx-0004wa-Ll for 18266@debbugs.gnu.org; Sat, 16 Aug 2014 13:56:04 -0400 Received: from berryeater.riseup.net (berryeater-pn.riseup.net [10.0.1.120]) (using TLSv1 with cipher ECDHE-RSA-AES256-SHA (256/256 bits)) (Client CN "*.riseup.net", Issuer "Gandi Standard SSL CA" (not verified)) by mx1.riseup.net (Postfix) with ESMTPS id 1CF235852A; Sat, 16 Aug 2014 17:56:02 +0000 (UTC) Received: from [127.0.0.1] (localhost [127.0.0.1]) (Authenticated sender: santiagorr) with ESMTPSA id E989642C14 Received: by nomada (sSMTP sendmail emulation); Sat, 16 Aug 2014 19:56:37 +0200 Date: Sat, 16 Aug 2014 19:56:37 +0200 From: Santiago To: Vincent Lefevre Subject: Re: Bug#758105: 
 bug#18266: Bug#758105: bug#18266: grep -P and invalid exits with error
Message-ID: <20140816175637.GA6115@nomada>
References: <20140814154257.GA29230@nomada> <53ECE0BE.8090603@cs.ucla.edu>
 <20140814174442.GA11558@xvii.vinc17.org> <53ECFDB0.40601@cs.ucla.edu>
 <20140814201148.GC1951@ioooi.vinc17.net> <53ED1879.8050400@cs.ucla.edu>
 <20140814210350.GJ5034@xvii.vinc17.org> <53ED2B30.2050108@cs.ucla.edu>
 <20140816140127.GA2252@nomada> <20140816162621.GM5034@xvii.vinc17.org>
MIME-Version: 1.0
Content-Type: text/plain; charset=iso-8859-1
Content-Disposition: inline
Content-Transfer-Encoding: 8bit
In-Reply-To: <20140816162621.GM5034@xvii.vinc17.org>
User-Agent: Mutt/1.5.23 (2014-03-12)
X-Virus-Scanned: clamav-milter 0.98.4 at mx1
X-Virus-Status: Clean
X-Spam-Score: 0.0 (/)
X-Debbugs-Envelope-To: 18266
Cc: 18266@debbugs.gnu.org, Paul Eggert , 758105@bugs.debian.org
X-BeenThere: debbugs-submit@debbugs.gnu.org
X-Mailman-Version: 2.1.15
Precedence: list
List-Id:
List-Unsubscribe: ,
List-Archive:
List-Post:
List-Help:
List-Subscribe: ,
Errors-To: debbugs-submit-bounces@debbugs.gnu.org
Sender: "Debbugs-submit"
X-Spam-Score: 0.0 (/)

On 16/08/14 at 18:26, Vincent Lefevre wrote:
> On 2014-08-16 16:01:27 +0200, Santiago wrote:
> > Workaround attached. It's too slow against binary files, but I haven't
> > found a simpler solution.
>
> To avoid the slowness, I think that it would be better to detect
> (directly, not via PCRE) invalid UTF-8 sequences and replace them
> by null bytes *in-place*.
>
> It might slow down the general case, though. However I'm not sure,
> because if the UTF8 validity check (via the replacement of invalid
> sequences) is done in grep, it doesn't need to be done in PCRE.
>

I think that would require similar work to replace the "invalid" content
in binary files.

Another solution would be to don't check if binary files are valid
(passing PCRE_NO_UTF8_CHECK to pcre_exec), but I don't know if that'd
avoid security holes, and I don't know how to do it either.
Regards, Santiago From debbugs-submit-bounces@debbugs.gnu.org Sat Aug 16 14:36:46 2014 Received: (at 18266) by debbugs.gnu.org; 16 Aug 2014 18:36:46 +0000 Received: from localhost ([127.0.0.1]:44805 helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.80) (envelope-from ) id 1XIiqL-0007LL-Mh for submit@debbugs.gnu.org; Sat, 16 Aug 2014 14:36:46 -0400 Received: from smtp.cs.ucla.edu ([131.179.128.62]:47519) by debbugs.gnu.org with esmtp (Exim 4.80) (envelope-from ) id 1XIiqJ-0007L7-9Z for 18266@debbugs.gnu.org; Sat, 16 Aug 2014 14:36:44 -0400 Received: from localhost (localhost.localdomain [127.0.0.1]) by smtp.cs.ucla.edu (Postfix) with ESMTP id 3BDC2A60012; Sat, 16 Aug 2014 11:36:37 -0700 (PDT) X-Virus-Scanned: amavisd-new at smtp.cs.ucla.edu Received: from smtp.cs.ucla.edu ([127.0.0.1]) by localhost (smtp.cs.ucla.edu [127.0.0.1]) (amavisd-new, port 10024) with ESMTP id 9hzJGs2eBNfI; Sat, 16 Aug 2014 11:36:28 -0700 (PDT) Received: from [192.168.1.9] (pool-71-177-17-123.lsanca.dsl-w.verizon.net [71.177.17.123]) by smtp.cs.ucla.edu (Postfix) with ESMTPSA id 8D06239E80BE; Sat, 16 Aug 2014 11:36:28 -0700 (PDT) Message-ID: <53EFA4AC.7090308@cs.ucla.edu> Date: Sat, 16 Aug 2014 11:36:28 -0700 From: Paul Eggert Organization: UCLA Computer Science Department User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:31.0) Gecko/20100101 Thunderbird/31.0 MIME-Version: 1.0 To: Santiago , Vincent Lefevre Subject: Re: Bug#758105: bug#18266: Bug#758105: bug#18266: grep -P and invalid exits with error References: <20140814154257.GA29230@nomada> <53ECE0BE.8090603@cs.ucla.edu> <20140814174442.GA11558@xvii.vinc17.org> <53ECFDB0.40601@cs.ucla.edu> <20140814201148.GC1951@ioooi.vinc17.net> <53ED1879.8050400@cs.ucla.edu> <20140814210350.GJ5034@xvii.vinc17.org> <53ED2B30.2050108@cs.ucla.edu> <20140816140127.GA2252@nomada> <20140816162621.GM5034@xvii.vinc17.org> <20140816175637.GA6115@nomada> In-Reply-To: <20140816175637.GA6115@nomada> Content-Type: text/plain; charset=utf-8; format=flowed 
Content-Transfer-Encoding: 7bit X-Spam-Score: -1.3 (-) X-Debbugs-Envelope-To: 18266 Cc: 18266@debbugs.gnu.org, 758105@bugs.debian.org X-BeenThere: debbugs-submit@debbugs.gnu.org X-Mailman-Version: 2.1.15 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: debbugs-submit-bounces@debbugs.gnu.org Sender: "Debbugs-submit" X-Spam-Score: -1.3 (-) Santiago wrote: > Another solution would be to don't check if binary files are valid > (passing PCRE_NO_UTF8_CHECK to pcre_exec), but I don't know if that'd > avoid security holes It wouldn't. (We already tried it.) From debbugs-submit-bounces@debbugs.gnu.org Fri Aug 29 01:47:12 2014 Received: (at 18266) by debbugs.gnu.org; 29 Aug 2014 05:47:12 +0000 Received: from localhost ([127.0.0.1]:53062 helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.80) (envelope-from ) id 1XNF1j-00053V-AE for submit@debbugs.gnu.org; Fri, 29 Aug 2014 01:47:11 -0400 Received: from mx1.riseup.net ([198.252.153.129]:41412) by debbugs.gnu.org with esmtp (Exim 4.80) (envelope-from ) id 1XNF1g-00053L-1l for 18266@debbugs.gnu.org; Fri, 29 Aug 2014 01:47:09 -0400 Received: from berryeater.riseup.net (berryeater-pn.riseup.net [10.0.1.120]) (using TLSv1 with cipher ECDHE-RSA-AES256-SHA (256/256 bits)) (Client CN "*.riseup.net", Issuer "Gandi Standard SSL CA" (not verified)) by mx1.riseup.net (Postfix) with ESMTPS id C3133533EF; Fri, 29 Aug 2014 05:47:06 +0000 (UTC) Received: from [127.0.0.1] (localhost [127.0.0.1]) (Authenticated sender: santiagorr) with ESMTPSA id 2986842045 Received: by nomada (sSMTP sendmail emulation); Thu, 28 Aug 2014 22:47:54 -0700 Date: Thu, 28 Aug 2014 22:47:54 -0700 From: Santiago To: Paul Eggert , 758105@bugs.debian.org Subject: Re: grep -P and invalid exits with error Message-ID: <20140829054754.GC5210@nomada> References: <20140814174442.GA11558@xvii.vinc17.org> <53ECFDB0.40601@cs.ucla.edu> <20140814201148.GC1951@ioooi.vinc17.net> <53ED1879.8050400@cs.ucla.edu> 
 <20140814210350.GJ5034@xvii.vinc17.org> <53ED2B30.2050108@cs.ucla.edu>
 <20140816140127.GA2252@nomada> <20140816162621.GM5034@xvii.vinc17.org>
 <20140816175637.GA6115@nomada> <53EFA4AC.7090308@cs.ucla.edu>
MIME-Version: 1.0
Content-Type: multipart/signed; micalg=pgp-sha256; protocol="application/pgp-signature"; boundary="ALfTUftag+2gvp1h"
Content-Disposition: inline
In-Reply-To: <53EFA4AC.7090308@cs.ucla.edu>
User-Agent: Mutt/1.5.23 (2014-03-12)
X-Virus-Scanned: clamav-milter 0.98.4 at mx1
X-Virus-Status: Clean
X-Spam-Score: 0.0 (/)
X-Debbugs-Envelope-To: 18266
Cc: 18266@debbugs.gnu.org, Vincent Lefevre
X-BeenThere: debbugs-submit@debbugs.gnu.org
X-Mailman-Version: 2.1.15
Precedence: list
List-Id:
List-Unsubscribe: ,
List-Archive:
List-Post:
List-Help:
List-Subscribe: ,
Errors-To: debbugs-submit-bounces@debbugs.gnu.org
Sender: "Debbugs-submit"
X-Spam-Score: 0.0 (/)

--ALfTUftag+2gvp1h
Content-Type: multipart/mixed; boundary="oLBj+sq0vYjzfsbl"
Content-Disposition: inline

--oLBj+sq0vYjzfsbl
Content-Type: text/plain; charset=iso-8859-1
Content-Disposition: inline
Content-Transfer-Encoding: quoted-printable

On 16/08/14 at 11:36, Paul Eggert wrote:
> Santiago wrote:
> >Another solution would be to don't check if binary files are valid
> >(passing PCRE_NO_UTF8_CHECK to pcre_exec), but I don't know if that'd
> >avoid security holes
>
> It wouldn't. (We already tried it.)
>

Another try. This patch is far more efficient.

With the previous patch #1:

% time grep -P faz /usr/bin/*
...
grep: /usr/bin/X11: Is a directory
grep -P faz /usr/bin/*  519,78s user 0,32s system 99% cpu 8:41,19 total

With this one:

% time src/grep -P faz /usr/bin/*
src/grep -P faz /usr/bin/*  7,36s user 0,33s system 99% cpu 7,695 total

Cheers,

Santiago

--oLBj+sq0vYjzfsbl
Content-Type: text/x-diff; charset=us-ascii
Content-Disposition: attachment; filename="grep-pcresearch-clean-utf8-2.patch"
Content-Transfer-Encoding: quoted-printable

From 1f8aa0f711f1954b688a790c54b0cadbde165e5a Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Santiago=20Ruano=20Rinc=C3=B3n?=
Date: Thu, 28 Aug 2014 22:39:51 -0700
Subject: [PATCH] Workaround to don't abort for invalid UTF8 input

* src/pcresearch.c (Pexecute): When pcre_exec returns an invalid
UTF8 character error, copies line_buf to an auxiliar buffer,
removes invalid characters and evaluates against it.
* tests/pcre-infloop: Exit status is 1 again.
* tests/pcre-invalid-utf8-input: Check again if grep doesn't
abort. Also cheks for match after a second invalid character
in the same line.

Closes http://debbugs.gnu.org/18266
---
 src/pcresearch.c              |   21 +++++++++++++++++++++
 tests/pcre-infloop            |    2 +-
 tests/pcre-invalid-utf8-input |   12 +++++++++---
 3 files changed, 31 insertions(+), 4 deletions(-)

diff --git a/src/pcresearch.c b/src/pcresearch.c
index 820dd00..31661f9 100644
--- a/src/pcresearch.c
+++ b/src/pcresearch.c
@@ -164,6 +164,27 @@ Pexecute (char const *buf, size_t size, size_t *match_size,
           e = pcre_exec (cre, extra, line_buf, line_end - line_buf,
                          start_ofs < 0 ? 0 : start_ofs, 0,
                          sub, sizeof sub / sizeof *sub);
+
+          if (e == PCRE_ERROR_BADUTF8){
+            char *line_utf8_clean = xmemdup (line_buf, line_end - line_buf);
+            long invalid_pos = 0;
+
+            /* Change invalid UTF-8 characters (according to pcre_exec) to '\0' */
+            while (e == PCRE_ERROR_BADUTF8){
+              line_utf8_clean[sub[0]+invalid_pos] = '\0';
+              invalid_pos += sub[0];
+
+              /* Evaluate the remaining line_utf8_clean section */
+              e = pcre_exec (cre, extra, line_utf8_clean + invalid_pos, line_end - line_buf - invalid_pos,
+                             start_ofs < 0 ? 0 : start_ofs, 0,
+                             sub, sizeof sub / sizeof *sub);
+            }
+
+            /* Evaluate the cleaned line_utf8_clean */
+            e = pcre_exec (cre, extra, line_utf8_clean, line_end - line_buf,
+                           start_ofs < 0 ? 0 : start_ofs, 0,
+                           sub, sizeof sub / sizeof *sub);
+          }
         }
 
   if (e <= 0)
diff --git a/tests/pcre-infloop b/tests/pcre-infloop
index 1b33e72..b92f8e1 100755
--- a/tests/pcre-infloop
+++ b/tests/pcre-infloop
@@ -28,6 +28,6 @@ printf 'a\201b\r' > in || framework_failure_
 fail=0
 
 LC_ALL=en_US.UTF-8 timeout 3 grep -P 'a.?..b' in
-test $? = 2 || fail_ "libpcre's match function appears to infloop"
+test $? = 1 || fail_ "libpcre's match function appears to infloop"
 
 Exit $fail
diff --git a/tests/pcre-invalid-utf8-input b/tests/pcre-invalid-utf8-input
index 913e8ee..2c6aadb 100755
--- a/tests/pcre-invalid-utf8-input
+++ b/tests/pcre-invalid-utf8-input
@@ -13,9 +13,15 @@ require_en_utf8_locale_
 
 fail=0
 
-printf 'j\202\nj\n' > in || framework_failure_
+printf 'j\202j\202\x\njx\n' > in || framework_failure_
 
-LC_ALL=en_US.UTF-8 grep -P j in
-test $? -eq 2 || fail=1
+LC_ALL=en_US.UTF-8 grep -P j in > out 2>&1 || fail=1
+compare in out || fail=1
+compare /dev/null err || fail=1
+
+# Match after a second invalid UTF-8 character
+LC_ALL=en_US.UTF-8 grep -P x in > out 2>&1 || fail=1
+compare in out || fail=1
+compare /dev/null err || fail=1
 
 Exit $fail
-- 
1.7.10.4

--oLBj+sq0vYjzfsbl--

--ALfTUftag+2gvp1h
Content-Type: application/pgp-signature; name="signature.asc"
Content-Description: Digital signature

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1

iQIcBAEBCAAGBQJUABQKAAoJELPyXWsAqAZiMG4P/059N1/Ccq5hp1R2KXihPDZo
MUtrRJvqXUhFRS5nocbwTQAkHgXZObY36E6mFiya17oQeZ4hvA5IqGfigWWXZCOB
12FswzXcQ9+py/IlStnxkCJhbF07O4oQJ3EedBIqqVXv5OY+nBbLawFhsyVJB3ZD
JCFbT+v1zCls2wtf2HQKL9ie0AfABXDQiZGm4K4b59mKirhZ5mg30Jg4Crm9Kl0L
kknRFmfQ7VICdI7H60HsTUAetvxD915+mAzbpILQlOyEuM8HzmYV50/rQp5qDraI
6Bo06Ukf9cmUZ27/HX9OvEv+dSpjetAFudILKXDARSQHwM8PK3mi048z+K7JXNoI
fOqMhGCv+DtQOngCmCNvk0Vj/MKYtJdCmSa3ja5jLLODcOsXVvu662VM1Nw/POpB
qHLMPr2z0WMWHE8lMAK6KDoCpwmfY+3/yplNdo/XbZFujaIZedTN2wKbZZx1ojQr
Pb9F473Dv7Gk9wXuje3ePTN3IE+/MEuYfKCphjNkLF3lm6BmgSfRegjVf0WvOiJw
S1GHXgeT2IjroKGb8KVRO241Ch6Lk2oSPeuDpFXIykT9QqCtq3qq+9DDBlnZmT/L
XUHqdSHqPiIyDL64btXiTXfCg3g2FgRcCovv0wSs2z4gjXWohnA/CgK+001gKE4q
YvaIB3qr3QIoefrzs5Z2
=OBFh
-----END PGP SIGNATURE-----

--ALfTUftag+2gvp1h--

From debbugs-submit-bounces@debbugs.gnu.org Fri Aug 29 08:58:36 2014
Received: (at 18266) by debbugs.gnu.org; 29 Aug 2014 12:58:36 +0000
Received: from localhost ([127.0.0.1]:53194 helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.80) (envelope-from ) id 1XNLlD-0001TY-AT for submit@debbugs.gnu.org; Fri, 29 Aug 2014 08:58:35 -0400
Received: from mx1.redhat.com ([209.132.183.28]:8560) by debbugs.gnu.org with esmtp (Exim 4.80) (envelope-from ) id 1XNLl8-0001TJ-On for 18266@debbugs.gnu.org; Fri, 29 Aug 2014 08:58:31 -0400
Received: from int-mx10.intmail.prod.int.phx2.redhat.com (int-mx10.intmail.prod.int.phx2.redhat.com [10.5.11.23]) by mx1.redhat.com
 (8.14.4/8.14.4) with ESMTP id s7TCwJsl025987 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-GCM-SHA384 bits=256 verify=FAIL); Fri, 29 Aug 2014 08:58:20 -0400
Received: from [10.3.113.127] (ovpn-113-127.phx2.redhat.com [10.3.113.127]) by int-mx10.intmail.prod.int.phx2.redhat.com (8.14.4/8.14.4) with ESMTP id s7TCwI9F023228; Fri, 29 Aug 2014 08:58:18 -0400
Message-ID: <540078E9.6080108@redhat.com>
Date: Fri, 29 Aug 2014 06:58:17 -0600
From: Eric Blake
Organization: Red Hat, Inc.
User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:24.0) Gecko/20100101 Thunderbird/24.7.0
MIME-Version: 1.0
To: Santiago , Paul Eggert , 758105@bugs.debian.org
Subject: Re: bug#18266: grep -P and invalid exits with error
References: <20140814174442.GA11558@xvii.vinc17.org> <53ECFDB0.40601@cs.ucla.edu>
 <20140814201148.GC1951@ioooi.vinc17.net> <53ED1879.8050400@cs.ucla.edu>
 <20140814210350.GJ5034@xvii.vinc17.org> <53ED2B30.2050108@cs.ucla.edu>
 <20140816140127.GA2252@nomada> <20140816162621.GM5034@xvii.vinc17.org>
 <20140816175637.GA6115@nomada> <53EFA4AC.7090308@cs.ucla.edu>
 <20140829054754.GC5210@nomada>
In-Reply-To: <20140829054754.GC5210@nomada>
OpenPGP: url=http://people.redhat.com/eblake/eblake.gpg
Content-Type: multipart/signed; micalg=pgp-sha256; protocol="application/pgp-signature"; boundary="QM2Jnhaqq2P5TS9RoFrQMkoq12cwKFbf6"
X-Scanned-By: MIMEDefang 2.68 on 10.5.11.23
X-Spam-Score: -5.0 (-----)
X-Debbugs-Envelope-To: 18266
Cc: 18266@debbugs.gnu.org, Vincent Lefevre
X-BeenThere: debbugs-submit@debbugs.gnu.org
X-Mailman-Version: 2.1.15
Precedence: list
List-Id:
List-Unsubscribe: ,
List-Archive:
List-Post:
List-Help:
List-Subscribe: ,
Errors-To: debbugs-submit-bounces@debbugs.gnu.org
Sender: "Debbugs-submit"
X-Spam-Score: -5.0 (-----)

This is an OpenPGP/MIME signed message (RFC 4880 and 3156)
--QM2Jnhaqq2P5TS9RoFrQMkoq12cwKFbf6
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: quoted-printable

On 08/28/2014 11:47 PM, Santiago wrote:
> On 16/08/14 at 11:36, Paul Eggert wrote:
>> Santiago wrote:
>>> Another solution would be to don't check if binary files are valid
>>> (passing PCRE_NO_UTF8_CHECK to pcre_exec), but I don't know if that'd
>>> avoid security holes
>>
>> It wouldn't. (We already tried it.)
>>
> Another try. This patch is by far more efficient.

> * src/pcresearch.c (Pexecute): When pcre_exec returns an invalid
> UTF8 character error, copies line_buf to an auxiliar buffer,

s/auxiliar/auxiliary/

> removes invalid characters and evaluates against it.
> * tests/pcre-infloop: Exit status is 1 again.
> * tests/pcre-invalid-utf8-input: Check again if grep doesn't
> abort. Also cheks for match after a second invalid character

s/cheks/checks/

> +            /* Change invalid UTF-8 characters (according to pcre_exec) to '\0' */
> +            while (e == PCRE_ERROR_BADUTF8){

Space before {

> +              line_utf8_clean[sub[0]+invalid_pos] = '\0';

Spaces around +

-- 
Eric Blake   eblake redhat com    +1-919-301-3266
Libvirt virtualization library http://libvirt.org

--QM2Jnhaqq2P5TS9RoFrQMkoq12cwKFbf6
Content-Type: application/pgp-signature; name="signature.asc"
Content-Description: OpenPGP digital signature
Content-Disposition: attachment; filename="signature.asc"

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1
Comment: Public key at http://people.redhat.com/eblake/eblake.gpg

iQEcBAEBCAAGBQJUAHjpAAoJEKeha0olJ0Nql10H/jnwU8cYpGqBhJ7OfBYrixWK
BfY+kePBbxeb8n22OJiKDwn9ZH3Sj2hGuwwCsZIXvVaRiZiwBN2rl3xzxQQ41hY2
Ob5eQiVhPuiS30FCFoztBeo6+ByhXb3al0E5RooFWZ612EwhSPlfFibNLXS2hs9a
rHCTh9bgJC1RpZg4HXKcIf5JRvYJk4cLk8wUz4lyVP4K5ZF/3GpIXm7ysKBmjAV6
z1VHGJ7Hc/m1Cl51yr9V13gpoX+UiEQBNE455KjIpxFgaQnZ4tYhxjzN90PaIuWq
zy01e1YtPMxkUl2/By1Vwf6XKC8FyKNn1JMAxUytZNS3DiZ+jqCtbIzY09uAQss=
=keKL
-----END PGP SIGNATURE-----

--QM2Jnhaqq2P5TS9RoFrQMkoq12cwKFbf6--

From debbugs-submit-bounces@debbugs.gnu.org Fri Aug 29 09:44:05 2014
Received: (at 18266) by debbugs.gnu.org; 29 Aug 2014 13:44:05 +0000
Received: from localhost ([127.0.0.1]:53204
helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.80) (envelope-from ) id 1XNMTF-0002cK-4s for submit@debbugs.gnu.org; Fri, 29 Aug 2014 09:44:05 -0400 Received: from smtp.cs.ucla.edu ([131.179.128.62]:49039) by debbugs.gnu.org with esmtp (Exim 4.80) (envelope-from ) id 1XNMTC-0002bg-0X for 18266@debbugs.gnu.org; Fri, 29 Aug 2014 09:44:02 -0400 Received: from localhost (localhost.localdomain [127.0.0.1]) by smtp.cs.ucla.edu (Postfix) with ESMTP id DDDBBA60001; Fri, 29 Aug 2014 06:43:55 -0700 (PDT) X-Virus-Scanned: amavisd-new at smtp.cs.ucla.edu Received: from smtp.cs.ucla.edu ([127.0.0.1]) by localhost (smtp.cs.ucla.edu [127.0.0.1]) (amavisd-new, port 10024) with ESMTP id XAtUTcTe222j; Fri, 29 Aug 2014 06:43:47 -0700 (PDT) Received: from [192.168.1.9] (pool-71-177-17-123.lsanca.dsl-w.verizon.net [71.177.17.123]) by smtp.cs.ucla.edu (Postfix) with ESMTPSA id AEB6FA60013; Fri, 29 Aug 2014 06:43:46 -0700 (PDT) Message-ID: <54008391.6030801@cs.ucla.edu> Date: Fri, 29 Aug 2014 06:43:45 -0700 From: Paul Eggert Organization: UCLA Computer Science Department User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:31.0) Gecko/20100101 Thunderbird/31.0 MIME-Version: 1.0 To: Santiago , 758105@bugs.debian.org Subject: Re: grep -P and invalid exits with error References: <20140814174442.GA11558@xvii.vinc17.org> <53ECFDB0.40601@cs.ucla.edu> <20140814201148.GC1951@ioooi.vinc17.net> <53ED1879.8050400@cs.ucla.edu> <20140814210350.GJ5034@xvii.vinc17.org> <53ED2B30.2050108@cs.ucla.edu> <20140816140127.GA2252@nomada> <20140816162621.GM5034@xvii.vinc17.org> <20140816175637.GA6115@nomada> <53EFA4AC.7090308@cs.ucla.edu> <20140829054754.GC5210@nomada> In-Reply-To: <20140829054754.GC5210@nomada> Content-Type: text/plain; charset=utf-8; format=flowed Content-Transfer-Encoding: 7bit X-Spam-Score: -2.3 (--) X-Debbugs-Envelope-To: 18266 Cc: 18266@debbugs.gnu.org, Vincent Lefevre X-BeenThere: debbugs-submit@debbugs.gnu.org X-Mailman-Version: 2.1.15 Precedence: list List-Id: List-Unsubscribe: , 
List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: debbugs-submit-bounces@debbugs.gnu.org Sender: "Debbugs-submit" X-Spam-Score: -2.3 (--) Thanks, but that patch seems to depend on libpcre internals, in that it "knows" that pcre_exec cannot possibly succeed without first checking its entire input buffer for invalid UTF-8 bytes. Even if that's true now, it reflects a performance bug that might be fixed in a future libpcre version. Also, I don't see why grep needs to copy the buffer when there's an encoding error. Why not simply rerun the matcher on the initial prefix that doesn't have an encoding-error byte, and then (if that doesn't find a match), try matching the suffix after the encoding-error byte? This approach would not only avoid the buffer copy, it would avoid knowledge of libpcre internals. From debbugs-submit-bounces@debbugs.gnu.org Mon Sep 01 04:18:36 2014 Received: (at 18266) by debbugs.gnu.org; 1 Sep 2014 08:18:36 +0000 Received: from localhost ([127.0.0.1]:55381 helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.80) (envelope-from ) id 1XOMot-0008CV-Ks for submit@debbugs.gnu.org; Mon, 01 Sep 2014 04:18:35 -0400 Received: from ypig.lip.ens-lyon.fr ([140.77.13.48]:34238) by debbugs.gnu.org with esmtp (Exim 4.80) (envelope-from ) id 1XOMoq-0008CM-Tp for 18266@debbugs.gnu.org; Mon, 01 Sep 2014 04:18:34 -0400 Received: from vlefevre by ypig.lip.ens-lyon.fr with local (Exim 4.84) (envelope-from ) id 1XOMog-0001Zt-Ac; Mon, 01 Sep 2014 10:18:22 +0200 Date: Mon, 1 Sep 2014 10:18:22 +0200 From: Vincent Lefevre To: Paul Eggert Subject: Re: grep -P and invalid exits with error Message-ID: <20140901081822.GB3775@ypig.lip.ens-lyon.fr> References: <20140814201148.GC1951@ioooi.vinc17.net> <53ED1879.8050400@cs.ucla.edu> <20140814210350.GJ5034@xvii.vinc17.org> <53ED2B30.2050108@cs.ucla.edu> <20140816140127.GA2252@nomada> <20140816162621.GM5034@xvii.vinc17.org> <20140816175637.GA6115@nomada> <53EFA4AC.7090308@cs.ucla.edu> 
<20140829054754.GC5210@nomada> <54008391.6030801@cs.ucla.edu> MIME-Version: 1.0 Content-Type: text/plain; charset=iso-8859-1 Content-Disposition: inline Content-Transfer-Encoding: 8bit In-Reply-To: <54008391.6030801@cs.ucla.edu> X-Mailer-Info: http://www.vinc17.net/mutt/ User-Agent: Mutt/1.5.23-6361-vl-r59709 (2014-07-25) X-Spam-Score: 0.0 (/) X-Debbugs-Envelope-To: 18266 Cc: 18266@debbugs.gnu.org, Santiago , 758105@bugs.debian.org X-BeenThere: debbugs-submit@debbugs.gnu.org X-Mailman-Version: 2.1.15 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: debbugs-submit-bounces@debbugs.gnu.org Sender: "Debbugs-submit" X-Spam-Score: 0.0 (/) On 2014-08-29 06:43:45 -0700, Paul Eggert wrote: > Thanks, but that patch seems to depend on libpcre internals, in that it > "knows" that pcre_exec cannot possibly succeed without first checking its > entire input buffer for invalid UTF-8 bytes. Even if that's true now, it > reflects a performance bug that might be fixed in a future libpcre version. If I understand correctly, I don't think that's an internal. The pcreapi(3) man page says about PCRE_NO_UTF8_CHECK: [...] Note that this option can also be passed to pcre_exec() and pcre_dfa_exec(), to suppress the validity checking of subject strings only. If the same string is being matched many times, the option can be safely set for the second and subsequent matchings to improve performance. The last sentence would imply that the UTF8 checking is done on the whole input buffer before matching is done. > Also, I don't see why grep needs to copy the buffer when there's an encoding > error. Why not simply rerun the matcher on the initial prefix that doesn't > have an encoding-error byte, and then (if that doesn't find a match), try > matching the suffix after the encoding-error byte? This approach would not > only avoid the buffer copy, it would avoid knowledge of libpcre internals. 
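[Editorial sketch, not part of the original message.] The prefix/suffix retry described in the paragraph quoted above can be illustrated without libpcre. The C below is a minimal sketch under stated assumptions: first_invalid_utf8, find_literal and search_valid_segments are hypothetical names, find_literal is a plain substring search standing in for pcre_exec, and the encoding-error position comes from a hand-written validity scan rather than from PCRE's error offset. It is not the code from any patch in this thread.

```c
#include <stddef.h>
#include <string.h>

/* Return the offset of the first byte in s[0..len) that does not begin a
   well-formed UTF-8 sequence, or -1 if the whole range is valid.  */
static ptrdiff_t
first_invalid_utf8 (const unsigned char *s, size_t len)
{
  size_t i = 0;
  while (i < len)
    {
      unsigned char c = s[i];
      size_t n;           /* continuation bytes expected */
      unsigned long min;  /* smallest code point allowed at this length */
      unsigned long cp;

      if (c < 0x80)
        {
          i++;
          continue;
        }
      else if ((c & 0xE0) == 0xC0) { n = 1; min = 0x80;    cp = c & 0x1F; }
      else if ((c & 0xF0) == 0xE0) { n = 2; min = 0x800;   cp = c & 0x0F; }
      else if ((c & 0xF8) == 0xF0) { n = 3; min = 0x10000; cp = c & 0x07; }
      else
        return (ptrdiff_t) i;      /* stray continuation or bad lead byte */

      if (len - i <= n)
        return (ptrdiff_t) i;      /* sequence truncated by end of buffer */
      for (size_t k = 1; k <= n; k++)
        {
          if ((s[i + k] & 0xC0) != 0x80)
            return (ptrdiff_t) i;
          cp = (cp << 6) | (s[i + k] & 0x3F);
        }
      /* Reject overlong forms, surrogates and values above U+10FFFF.  */
      if (cp < min || cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF))
        return (ptrdiff_t) i;
      i += n + 1;
    }
  return -1;
}

/* Hypothetical stand-in for the real matcher: offset of NEEDLE in
   hay[0..hay_len), or -1 if it does not occur.  */
static ptrdiff_t
find_literal (const char *hay, size_t hay_len, const char *needle)
{
  size_t n = strlen (needle);
  for (size_t i = 0; i + n <= hay_len; i++)
    if (memcmp (hay + i, needle, n) == 0)
      return (ptrdiff_t) i;
  return -1;
}

/* Match only inside the valid stretches of buf: try the prefix before the
   first encoding-error byte, and if that fails, skip that one byte and
   continue with the suffix.  No copy of the buffer is made.  Returns the
   offset of the match relative to buf, or -1.  */
static ptrdiff_t
search_valid_segments (const char *buf, size_t len, const char *needle)
{
  size_t beg = 0;
  while (beg <= len)
    {
      ptrdiff_t bad = first_invalid_utf8 ((const unsigned char *) buf + beg,
                                          len - beg);
      size_t seg_len = bad < 0 ? len - beg : (size_t) bad;
      ptrdiff_t m = find_literal (buf + beg, seg_len, needle);
      if (m >= 0)
        return (ptrdiff_t) beg + m;
      if (bad < 0)
        break;                     /* last segment, nothing found */
      beg += seg_len + 1;          /* step over the offending byte */
    }
  return -1;
}
```

With this scheme a match can never span an encoding-error byte, and the null byte keeps its normal meaning because nothing in the buffer is overwritten; the cost is one extra matcher call per invalid byte.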
If there are many invalid UTF8 bytes, this would be slow, IMHO (it could be worth a try, though). But is the copy of the buffer really needed? Couldn't the invalid UTF8 sequences just be replaced by null bytes? Note that in case of invalid UTF8 bytes, in some (many?) cases, the cause is a binary file (possibly with some text in it), where lines can be very long. So, wouldn't it mean that it can take significantly more memory? -- Vincent Lefèvre - Web: 100% accessible validated (X)HTML - Blog: Work: CR INRIA - computer arithmetic / AriC project (LIP, ENS-Lyon) From debbugs-submit-bounces@debbugs.gnu.org Mon Sep 01 04:32:12 2014 Received: (at 18266) by debbugs.gnu.org; 1 Sep 2014 08:32:12 +0000 Received: from localhost ([127.0.0.1]:55385 helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.80) (envelope-from ) id 1XON23-00006m-HG for submit@debbugs.gnu.org; Mon, 01 Sep 2014 04:32:11 -0400 Received: from smtp.cs.ucla.edu ([131.179.128.62]:47698) by debbugs.gnu.org with esmtp (Exim 4.80) (envelope-from ) id 1XON20-00006X-3T for 18266@debbugs.gnu.org; Mon, 01 Sep 2014 04:32:09 -0400 Received: from localhost (localhost.localdomain [127.0.0.1]) by smtp.cs.ucla.edu (Postfix) with ESMTP id 4483EA6001B; Mon, 1 Sep 2014 01:32:02 -0700 (PDT) X-Virus-Scanned: amavisd-new at smtp.cs.ucla.edu Received: from smtp.cs.ucla.edu ([127.0.0.1]) by localhost (smtp.cs.ucla.edu [127.0.0.1]) (amavisd-new, port 10024) with ESMTP id l9L2+RSd04lf; Mon, 1 Sep 2014 01:31:53 -0700 (PDT) Received: from [192.168.1.9] (pool-71-177-17-123.lsanca.dsl-w.verizon.net [71.177.17.123]) by smtp.cs.ucla.edu (Postfix) with ESMTPSA id 937E0A6001D; Mon, 1 Sep 2014 01:31:53 -0700 (PDT) Message-ID: <54042EF9.6000309@cs.ucla.edu> Date: Mon, 01 Sep 2014 01:31:53 -0700 From: Paul Eggert Organization: UCLA Computer Science Department User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:31.0) Gecko/20100101 Thunderbird/31.0 MIME-Version: 1.0 To: Vincent Lefevre Subject: Re: grep -P and invalid exits with error 
References: <20140814201148.GC1951@ioooi.vinc17.net> <53ED1879.8050400@cs.ucla.edu> <20140814210350.GJ5034@xvii.vinc17.org> <53ED2B30.2050108@cs.ucla.edu> <20140816140127.GA2252@nomada> <20140816162621.GM5034@xvii.vinc17.org> <20140816175637.GA6115@nomada> <53EFA4AC.7090308@cs.ucla.edu> <20140829054754.GC5210@nomada> <54008391.6030801@cs.ucla.edu> <20140901081822.GB3775@ypig.lip.ens-lyon.fr> In-Reply-To: <20140901081822.GB3775@ypig.lip.ens-lyon.fr> Content-Type: text/plain; charset=utf-8; format=flowed Content-Transfer-Encoding: 7bit X-Spam-Score: -4.0 (----) X-Debbugs-Envelope-To: 18266 Cc: 18266@debbugs.gnu.org, Santiago , 758105@bugs.debian.org X-BeenThere: debbugs-submit@debbugs.gnu.org X-Mailman-Version: 2.1.15 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: debbugs-submit-bounces@debbugs.gnu.org Sender: "Debbugs-submit" X-Spam-Score: -4.0 (----) Vincent Lefevre wrote: > [...] Note that this option can also be passed to pcre_exec() > and pcre_dfa_exec(), to suppress the validity checking of > subject strings only. If the same string is being matched > many times, the option can be safely set for the second and > subsequent matchings to improve performance. > > The last sentence would imply that the UTF8 checking is done on the > whole input buffer before matching is done. That's pretty subtle, and perhaps too subtle. A plausible interpretation of the phrase "same string is being matched" is that libpcre checks only the matched string, and that bytes after the match (which did not need to be examined to do the match) are not checked. Can you confirm with the libpcre authors that this plausible interpretation is incorrect, i.e., that the entire input string is checked, even the unmatched part? If that's what is intended, the documentation should state so clearly, so at least there's a documentation bug there. > If there are many invalid UTF8 bytes, this would be slow, IMHO That's OK. 
We don't need grep -P to be fast on invalid input. > But is the copy of the buffer really needed? Couldn't the invalid > UTF8 sequences just be replaced by null bytes? I'd rather not, because that changes the semantics of matching. The null byte is valid input data that might get matched. > in case of invalid UTF8 bytes, in some (many?) cases, the > cause is a binary file (possibly with some text in it), where lines > can be very long. So, wouldn't it mean that it can take significantly > more memory? Sure. But that's the same for -P as it is for plain grep. From debbugs-submit-bounces@debbugs.gnu.org Mon Sep 08 22:44:54 2014 Received: (at 18266) by debbugs.gnu.org; 9 Sep 2014 02:44:54 +0000 Received: from localhost ([127.0.0.1]:34287 helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.80) (envelope-from ) id 1XRBQL-0001Lm-I6 for submit@debbugs.gnu.org; Mon, 08 Sep 2014 22:44:54 -0400 Received: from mx1.riseup.net ([198.252.153.129]:47101) by debbugs.gnu.org with esmtp (Exim 4.80) (envelope-from ) id 1XRBQH-0001LZ-5a for 18266@debbugs.gnu.org; Mon, 08 Sep 2014 22:44:50 -0400 Received: from berryeater.riseup.net (berryeater-pn.riseup.net [10.0.1.120]) (using TLSv1 with cipher ECDHE-RSA-AES256-SHA (256/256 bits)) (Client CN "*.riseup.net", Issuer "Gandi Standard SSL CA" (not verified)) by mx1.riseup.net (Postfix) with ESMTPS id E6DBE5121D; Mon, 8 Sep 2014 19:44:47 -0700 (PDT) Received: from [127.0.0.1] (localhost [127.0.0.1]) (Authenticated sender: santiagorr) with ESMTPSA id 3ABF6427BF Received: by nomada (sSMTP sendmail emulation); Tue, 09 Sep 2014 04:44:43 +0200 Date: Tue, 9 Sep 2014 04:44:43 +0200 From: Santiago To: Paul Eggert Subject: Re: grep -P and invalid exits with error Message-ID: <20140909024442.GA4021@nomada> References: <20140814210350.GJ5034@xvii.vinc17.org> <53ED2B30.2050108@cs.ucla.edu> <20140816140127.GA2252@nomada> <20140816162621.GM5034@xvii.vinc17.org> <20140816175637.GA6115@nomada> <53EFA4AC.7090308@cs.ucla.edu> 
 <20140829054754.GC5210@nomada> <54008391.6030801@cs.ucla.edu>
 <20140901081822.GB3775@ypig.lip.ens-lyon.fr> <54042EF9.6000309@cs.ucla.edu>
MIME-Version: 1.0
Content-Type: multipart/signed; micalg=pgp-sha256; protocol="application/pgp-signature"; boundary="uXxzq0nDebZQVNAZ"
Content-Disposition: inline
In-Reply-To: <54042EF9.6000309@cs.ucla.edu>
User-Agent: Mutt/1.5.23 (2014-03-12)
X-Virus-Scanned: clamav-milter 0.98.4 at mx1
X-Virus-Status: Clean
X-Spam-Score: 0.0 (/)
X-Debbugs-Envelope-To: 18266
Cc: 758105@bugs.debian.org, Vincent Lefevre , 18266@debbugs.gnu.org
X-BeenThere: debbugs-submit@debbugs.gnu.org
X-Mailman-Version: 2.1.15
Precedence: list
List-Id:
List-Unsubscribe: ,
List-Archive:
List-Post:
List-Help:
List-Subscribe: ,
Errors-To: debbugs-submit-bounces@debbugs.gnu.org
Sender: "Debbugs-submit"
X-Spam-Score: 0.0 (/)

--uXxzq0nDebZQVNAZ
Content-Type: multipart/mixed; boundary="24zk1gE8NUlDmwG9"
Content-Disposition: inline

--24zk1gE8NUlDmwG9
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline

Patch updated. Paul, thanks for the previous comments.

As you suggested, the attached patch doesn't copy the buffer and splits
the input when it finds an invalid character. For the moment, I don't
see a cleaner way to avoid the pcre internals.

Regards,

Santiago

--24zk1gE8NUlDmwG9
Content-Type: text/x-diff; charset=us-ascii
Content-Disposition: attachment; filename="grep-pcresearch-clean-utf8-3.patch"
Content-Transfer-Encoding: quoted-printable

From d58b53f86bb3f4b97137f708c159b4a3bc40c543 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Santiago=20Ruano=20Rinc=C3=B3n?=
Date: Tue, 9 Sep 2014 00:02:52 +0200
Subject: [PATCH] Workaround to don't abort for invalid UTF8 input

* src/pcresearch.c (Pexecute): If pcre_exec returns an invalid
UTF8 character error, evaluates the valid characters only,
iteratively dividing line_buf in two sections, before and after
each invalid character it finds.
* tests/pcre-infloop: Exit status is 1 again.
* tests/pcre-invalid-utf8-input: Check again if grep doesn't
abort. Also checks for match after a second invalid character
in the same line.
* tests/foad1: Add simple --color tests with -P matcher

Closes http://debbugs.gnu.org/18266
---
 src/pcresearch.c              |   31 +++++++++++++++++++++++++++++++
 tests/foad1                   |    2 ++
 tests/pcre-infloop            |    2 +-
 tests/pcre-invalid-utf8-input |   16 +++++++++++++---
 4 files changed, 47 insertions(+), 4 deletions(-)

diff --git a/src/pcresearch.c b/src/pcresearch.c
index 820dd00..e542d48 100644
--- a/src/pcresearch.c
+++ b/src/pcresearch.c
@@ -166,6 +166,37 @@ Pexecute (char const *buf, size_t size, size_t *match_size,
                          sub, sizeof sub / sizeof *sub);
         }
 
+  if (e == PCRE_ERROR_BADUTF8)
+    {
+      ptrdiff_t clean_offset = start_ptr ? start_ptr - line_buf : 0;
+      const char * clean_sect_beg = line_buf + clean_offset;
+
+      while (e == PCRE_ERROR_BADUTF8)
+        {
+          if (line_buf < clean_sect_beg + sub[0])
+            {
+              /* Evaluate the buffer section previous to the invalid character */
+              e = pcre_exec (cre, extra, clean_sect_beg, sub[0],
+                             0, 0, sub, sizeof sub / sizeof *sub);
+            }
+          if (e == 1)
+            continue; /* Match */
+          else if (clean_sect_beg + sub[0] + 1 < line_end)
+            {
+              clean_sect_beg += sub[0] + 1;
+
+              /* Evaluate the remaining buffer section, after the invalid
+                 character */
+              e = pcre_exec (cre, extra, clean_sect_beg, line_end - clean_sect_beg,
+                             0, 0, sub, sizeof sub / sizeof *sub);
+            }
+        }
+
+      /* Fix offsets */
+      sub[0] += clean_sect_beg - line_buf;
+      sub[1] += clean_sect_beg - line_buf;
+    }
+
   if (e <= 0)
     {
       switch (e)
diff --git a/tests/foad1 b/tests/foad1
index eeab51a..fe9d0f5 100755
--- a/tests/foad1
+++ b/tests/foad1
@@ -134,8 +134,10 @@ grep_test "$x1" "$y1" -E -w --color=always -e ccc -e bb
 grep_test "$x1" "$y1" -F -w --color=always -e ccc -e bb
 grep_test "$x2" "$y2" -E -w --color=always bc
 grep_test "$x2" "$y2" -F -w --color=always bc
+grep_test "$x2" "$y2" -P -w --color=always bc
 grep_test "$x3" "$y3" -E -w --color=always bc
 grep_test "$x3" "$y3" -F -w --color=always bc
+grep_test "$x3" "$y3" -P -w --color=always bc
 
 # Skip the rest of the tests - known to fail. TAA.
 Exit $failures
diff --git a/tests/pcre-infloop b/tests/pcre-infloop
index 1b33e72..b92f8e1 100755
--- a/tests/pcre-infloop
+++ b/tests/pcre-infloop
@@ -28,6 +28,6 @@ printf 'a\201b\r' > in || framework_failure_
 fail=0
 
 LC_ALL=en_US.UTF-8 timeout 3 grep -P 'a.?..b' in
-test $? = 2 || fail_ "libpcre's match function appears to infloop"
+test $? = 1 || fail_ "libpcre's match function appears to infloop"
 
 Exit $fail
diff --git a/tests/pcre-invalid-utf8-input b/tests/pcre-invalid-utf8-input
index 913e8ee..a5ae7bc 100755
--- a/tests/pcre-invalid-utf8-input
+++ b/tests/pcre-invalid-utf8-input
@@ -13,9 +13,19 @@ require_en_utf8_locale_
 
 fail=0
 
-printf 'j\202\nj\n' > in || framework_failure_
+printf 'j\202j\202x\njx\n' > in || framework_failure_
 
-LC_ALL=en_US.UTF-8 grep -P j in
-test $? -eq 2 || fail=1
+LC_ALL=en_US.UTF-8 grep -P j in > out 2>&1 || fail=1
+compare in out || fail=1
+compare /dev/null err || fail=1
 
+# Match after a second invalid UTF-8 character
+#LC_ALL=en_US.UTF-8 grep -P x in > out 2>&1 || fail=1
+#compare in out || fail=1
+#compare /dev/null err || fail=1
+
+printf '\202xj\n' > in || framework_failure_
+LC_ALL=en_US.UTF-8 grep -P x in > out 2>&1 || fail=1
+compare in out || fail=1
+compare /dev/null err || fail=1
 Exit $fail
-- 
1.7.10.4

--24zk1gE8NUlDmwG9--

--uXxzq0nDebZQVNAZ
Content-Type: application/pgp-signature; name="signature.asc"
Content-Description: Digital signature

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1

iQIcBAEBCAAGBQJUDmmaAAoJELPyXWsAqAZifJEQAMP4S+YBYCb3G7NpjFrIuDmE
ZF6CcWrMRWcrYl/QMHTNlp3LtsL2RYGERdZ+tRaWKK+Ugh+FpGwgM/wUKqouM2Fy
kJSJO+nT/aImYptkt+6Ni58ljGZF0NYExn2IpVFlYoRnobQxLBtgoaN+phbvo1gt
AlDuIRsSs+ZX1At6620n28kXY32WgI4KoQb+0npfYKbGjXatOsncYcXq7bEnjpPJ
921OTjMdGs2TfaQcnVGo0XoCC9DudBAadmR4+mEXw9Pf3MrXPQkaURp/mLNz7JP0 JeXT9lMdeXl0zRyaj06E0asH4ZEeJBCM/JXXRCel2VzM98b2XJYv1egoejef+aQN 2bjyPWJzxK7UNsXwzIygbFsy/xJpLgWj4zyc2QXqwhYr+dKXSid0YP2oko/xVaQ8 IDAMUpJ17TmFh+nJt5glzZ2gr5Ac7Ewykd9YplAl9m/DDXSMGBno3loAcz4Bgurh OIZCmsDSGDpE2T5DB6Itf3tN4SSHMUZmqV1Z6pt53Q+b5mYOWq/q6AOfQv/8tNos 9COVP3q6oXc9zzH9QegZffTe8Jh/VxLRVy1ejhk2WzEoO+N+0v4SpMkeAI3cGznl GFzxojl7J/XSbfggGi2XVW6Pv5AXZOvUTAIHuS1AsukfnBpS/el0I6Iu8xmhIw6h 5gswlzSNz60foUpWPWXv =pUbK -----END PGP SIGNATURE----- --uXxzq0nDebZQVNAZ-- From debbugs-submit-bounces@debbugs.gnu.org Tue Sep 09 11:41:08 2014 Received: (at 18266) by debbugs.gnu.org; 9 Sep 2014 15:41:08 +0000 Received: from localhost ([127.0.0.1]:35501 helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.80) (envelope-from ) id 1XRNXY-0005k9-6J for submit@debbugs.gnu.org; Tue, 09 Sep 2014 11:41:08 -0400 Received: from mailgw06.kcn.ne.jp ([61.86.7.213]:35834) by debbugs.gnu.org with esmtp (Exim 4.80) (envelope-from ) id 1XRNXV-0005jW-1U for 18266@debbugs.gnu.org; Tue, 09 Sep 2014 11:41:06 -0400 Received: from imp02 (mailgw6.kcn.ne.jp [61.86.15.232]) by mailgw06.kcn.ne.jp (Postfix) with ESMTP id 7B6F1C8001 for <18266@debbugs.gnu.org>; Wed, 10 Sep 2014 00:40:57 +0900 (JST) Received: from mail03.kcn.ne.jp ([61.86.6.182]) by imp02 with bizsmtp id p3gx1o0063veGq5013gxo4; Wed, 10 Sep 2014 00:40:57 +0900 X-OrgRCPT: 18266@debbugs.gnu.org Received: from [10.120.1.60] (i118-21-128-66.s30.a048.ap.plala.or.jp [118.21.128.66]) by mail03.kcn.ne.jp (Postfix) with ESMTPA id 1F376141009A; Wed, 10 Sep 2014 00:40:57 +0900 (JST) Date: Wed, 10 Sep 2014 00:40:53 +0900 From: Norihiro Tanaka To: Santiago Subject: Re: bug#18266: grep -P and invalid exits with error In-Reply-To: <20140909024442.GA4021@nomada> References: <54042EF9.6000309@cs.ucla.edu> <20140909024442.GA4021@nomada> Message-Id: <20140910004053.6D8A.27F6AC2D@kcn.ne.jp> MIME-Version: 1.0 Content-Type: text/plain; charset="US-ASCII" Content-Transfer-Encoding: 7bit 
X-Mailer: Becky! ver. 2.65.07 [ja] X-Spam-Score: -2.5 (--) X-Debbugs-Envelope-To: 18266 Cc: 758105@bugs.debian.org, Paul Eggert , Vincent Lefevre , 18266@debbugs.gnu.org X-BeenThere: debbugs-submit@debbugs.gnu.org X-Mailman-Version: 2.1.15 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: debbugs-submit-bounces@debbugs.gnu.org Sender: "Debbugs-submit" X-Spam-Score: -2.5 (--) I'm worried that to re-run for invalid UTF-8 makes slowness for searching of the large number of binary files. From debbugs-submit-bounces@debbugs.gnu.org Tue Sep 09 15:59:43 2014 Received: (at 18266) by debbugs.gnu.org; 9 Sep 2014 19:59:43 +0000 Received: from localhost ([127.0.0.1]:35685 helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.80) (envelope-from ) id 1XRRZm-0005TQ-H3 for submit@debbugs.gnu.org; Tue, 09 Sep 2014 15:59:43 -0400 Received: from smtp.cs.ucla.edu ([131.179.128.62]:53689) by debbugs.gnu.org with esmtp (Exim 4.80) (envelope-from ) id 1XRRZj-0005T8-3c for 18266@debbugs.gnu.org; Tue, 09 Sep 2014 15:59:40 -0400 Received: from localhost (localhost.localdomain [127.0.0.1]) by smtp.cs.ucla.edu (Postfix) with ESMTP id A65A039E8012; Tue, 9 Sep 2014 12:59:32 -0700 (PDT) X-Virus-Scanned: amavisd-new at smtp.cs.ucla.edu Received: from smtp.cs.ucla.edu ([127.0.0.1]) by localhost (smtp.cs.ucla.edu [127.0.0.1]) (amavisd-new, port 10024) with ESMTP id oQ-oQv6bANPD; Tue, 9 Sep 2014 12:59:28 -0700 (PDT) Received: from [192.168.1.9] (pool-71-177-17-123.lsanca.dsl-w.verizon.net [71.177.17.123]) by smtp.cs.ucla.edu (Postfix) with ESMTPSA id CE4A239E8015; Tue, 9 Sep 2014 12:59:27 -0700 (PDT) Message-ID: <540F5C1F.2040008@cs.ucla.edu> Date: Tue, 09 Sep 2014 12:59:27 -0700 From: Paul Eggert Organization: UCLA Computer Science Department User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:31.0) Gecko/20100101 Thunderbird/31.0 MIME-Version: 1.0 To: Norihiro Tanaka , Santiago Subject: Re: bug#18266: grep -P and invalid exits with 
error References: <54042EF9.6000309@cs.ucla.edu> <20140909024442.GA4021@nomada> <20140910004053.6D8A.27F6AC2D@kcn.ne.jp> In-Reply-To: <20140910004053.6D8A.27F6AC2D@kcn.ne.jp> Content-Type: multipart/mixed; boundary="------------060706050205060007010706" X-Spam-Score: -4.8 (----) X-Debbugs-Envelope-To: 18266 Cc: 758105@bugs.debian.org, Vincent Lefevre , 18266@debbugs.gnu.org X-BeenThere: debbugs-submit@debbugs.gnu.org X-Mailman-Version: 2.1.15 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: debbugs-submit-bounces@debbugs.gnu.org Sender: "Debbugs-submit" X-Spam-Score: -4.8 (----) This is a multi-part message in MIME format. --------------060706050205060007010706 Content-Type: text/plain; charset=utf-8; format=flowed Content-Transfer-Encoding: 7bit Norihiro Tanaka wrote: > I'm worried that to re-run for invalid UTF-8 makes slowness for searching > of the large number of binary files. Yes, that could be a problem, but even so it's better for grep to report matches than to give up and fail. Perhaps someone could optimize this better later, but to be honest given how flaky libpcre is we're probably better off spending our scarce development resources elsewhere. Santiago's latest patch still had some troubles, unfortunately. It could mishandle '^' by having it match just past an encoding error. It was less efficient than it could be, as it checked all valid bytes for UTF-8-edness twice. If I understand PCRE correctly (which quite possibly I don't), it also appeared to mishandle matches that contain nested subexpressions. But the worst part was that the code was too complicated (and this was true even before Santiago's patch was applied). So I rewrote it and installed the attached patch instead. Please give it a try. 
--------------060706050205060007010706 Content-Type: text/plain; charset=UTF-8; name="0001-grep-P-now-treats-invalid-UTF-8-input-as-non-matchin.patch" Content-Transfer-Encoding: base64 Content-Disposition: attachment; filename*0="0001-grep-P-now-treats-invalid-UTF-8-input-as-non-matchin.pa"; filename*1="tch" RnJvbSAyOTg1NWU3YmJlNDdiOTE2ODBhZTBjYmE1NzI5YzViZWNmYWEzMjE2IE1vbiBTZXAg MTcgMDA6MDA6MDAgMjAwMQpGcm9tOiBQYXVsIEVnZ2VydCA8ZWdnZXJ0QGNzLnVjbGEuZWR1 PgpEYXRlOiBUdWUsIDkgU2VwIDIwMTQgMTI6NDE6NTQgLTA3MDAKU3ViamVjdDogW1BBVENI XSBncmVwOiAtUCBub3cgdHJlYXRzIGludmFsaWQgVVRGLTggaW5wdXQgYXMgbm9uLW1hdGNo aW5nCgpQcm9ibGVtIHJlcG9ydGVkIGJ5IFNhbnRpYWdvIFZpbGEgaW46IGh0dHA6Ly9idWdz LmdudS5vcmcvMTgyNjYKKiBORVdTOiBNZW50aW9uIHRoaXMuCiogc3JjL3BjcmVzZWFyY2gu YyAoUGV4ZWN1dGUpOiBUcmVhdCBVVEYtOCBlbmNvZGluZyBlcnJvcnMKYXMgbm9uLW1hdGNo aW5nIGRhdGEsIGluc3RlYWQgb2YgZXhpdGluZyAnZ3JlcCcuCiogdGVzdHMvcGNyZS1pbmZs b29wOiBncmVwIG5vdyBleGl0cyB3aXRoIHN0YXR1cyAxLCBub3QgMi4KKiB0ZXN0cy9wY3Jl LWludmFsaWQtdXRmOC1pbnB1dDogZ3JlcCBub3cgZXhpdHMgd2l0aCBzdGF0dXMgMCwgbm90 IDIuCi0tLQogTkVXUyAgICAgICAgICAgICAgICAgICAgICAgICAgfCAgMyArKwogc3JjL3Bj cmVzZWFyY2guYyAgICAgICAgICAgICAgfCA3MCArKysrKysrKysrKysrKysrKy0tLS0tLS0t LS0tLS0tLS0tLS0tLS0tLS0tCiB0ZXN0cy9wY3JlLWluZmxvb3AgICAgICAgICAgICB8ICAy ICstCiB0ZXN0cy9wY3JlLWludmFsaWQtdXRmOC1pbnB1dCB8ICAyICstCiA0IGZpbGVzIGNo YW5nZWQsIDMzIGluc2VydGlvbnMoKyksIDQ0IGRlbGV0aW9ucygtKQoKZGlmZiAtLWdpdCBh L05FV1MgYi9ORVdTCmluZGV4IDU1MGJmNGMuLmNhNzk1MjUgMTAwNjQ0Ci0tLSBhL05FV1MK KysrIGIvTkVXUwpAQCAtNiw2ICs2LDkgQEAgR05VIGdyZXAgTkVXUyAgICAgICAgICAgICAg ICAgICAgICAgICAgICAgICAgICAgIC0qLSBvdXRsaW5lIC0qLQogCiAgIFBlcmZvcm1hbmNl IGhhcyBpbXByb3ZlZCBmb3IgdmVyeSBsb25nIHN0cmluZ3MgaW4gcGF0dGVybnMuCiAKKyAg Z3JlcCAtUCBubyBsb25nZXIgcmVwb3J0cyBhbiBlcnJvciBhbmQgZXhpdHMgd2hlbiBnaXZl biBpbnZhbGlkIFVURi04IGRhdGEuCisgIEluc3RlYWQsIGl0IGNvbnNpZGVycyB0aGUgZGF0 YSB0byBiZSBub24tbWF0Y2hpbmcuCisKICoqIEJ1ZyBmaXhlcwogCiAgIGdyZXAgLUUgcmVq ZWN0ZWQgdW5tYXRjaGVkICcpJywgaW5zdGVhZCBvZiB0cmVhdGluZyBpdCBsaWtlICdcKScu 
CmRpZmYgLS1naXQgYS9zcmMvcGNyZXNlYXJjaC5jIGIvc3JjL3BjcmVzZWFyY2guYwppbmRl eCA4MjBkZDAwLi4yYTAxZTZkIDEwMDY0NAotLS0gYS9zcmMvcGNyZXNlYXJjaC5jCisrKyBi L3NyYy9wY3Jlc2VhcmNoLmMKQEAgLTEzNiwzNCArMTM2LDQxIEBAIFBleGVjdXRlIChjaGFy IGNvbnN0ICpidWYsIHNpemVfdCBzaXplLCBzaXplX3QgKm1hdGNoX3NpemUsCiAjZWxzZQog ICAvKiBUaGlzIGFycmF5IG11c3QgaGF2ZSBhdCBsZWFzdCB0d28gZWxlbWVudHM7IGV2ZXJ5 dGhpbmcgYWZ0ZXIgdGhhdAogICAgICBpcyBqdXN0IGZvciBwZXJmb3JtYW5jZSBpbXByb3Zl bWVudCBpbiBwY3JlX2V4ZWMuICAqLwotICBpbnQgc3ViWzMwMF07CisgIGVudW0geyBuc3Vi ID0gMzAwIH07CisgIGludCBzdWJbbnN1Yl07CiAKLSAgY29uc3QgY2hhciAqbGluZV9idWYs ICpsaW5lX2VuZCwgKmxpbmVfbmV4dDsKKyAgY2hhciBjb25zdCAqcCA9IHN0YXJ0X3B0ciA/ IHN0YXJ0X3B0ciA6IGJ1ZjsKKyAgaW50IG9wdGlvbnMgPSBwID09IGJ1ZiB8fCBwWy0xXSA9 PSBlb2xieXRlID8gMCA6IFBDUkVfTk9UQk9MOworICBjaGFyIGNvbnN0ICpsaW5lX3N0YXJ0 ID0gYnVmOwogICBpbnQgZSA9IFBDUkVfRVJST1JfTk9NQVRDSDsKLSAgcHRyZGlmZl90IHN0 YXJ0X29mcyA9IHN0YXJ0X3B0ciA/IHN0YXJ0X3B0ciAtIGJ1ZiA6IDA7CisgIGNoYXIgY29u c3QgKmxpbmVfZW5kOwogCiAgIC8qIFBDUkUgY2FuJ3QgbGltaXQgdGhlIG1hdGNoaW5nIHRv IHNpbmdsZSBsaW5lcywgdGhlcmVmb3JlIHdlIGhhdmUgdG8KICAgICAgbWF0Y2ggZWFjaCBs aW5lIGluIHRoZSBidWZmZXIgc2VwYXJhdGVseS4gICovCi0gIGZvciAobGluZV9uZXh0ID0g YnVmOwotICAgICAgIGUgPT0gUENSRV9FUlJPUl9OT01BVENIICYmIGxpbmVfbmV4dCA8IGJ1 ZiArIHNpemU7Ci0gICAgICAgc3RhcnRfb2ZzIC09IGxpbmVfbmV4dCAtIGxpbmVfYnVmKQor ICBmb3IgKDsgcCA8IGJ1ZiArIHNpemU7IHAgPSBsaW5lX3N0YXJ0ID0gbGluZV9lbmQgKyAx KQogICAgIHsKLSAgICAgIGxpbmVfYnVmID0gbGluZV9uZXh0OwotICAgICAgbGluZV9lbmQg PSBtZW1jaHIgKGxpbmVfYnVmLCBlb2xieXRlLCAoYnVmICsgc2l6ZSkgLSBsaW5lX2J1Zik7 Ci0gICAgICBpZiAobGluZV9lbmQgPT0gTlVMTCkKLSAgICAgICAgbGluZV9uZXh0ID0gbGlu ZV9lbmQgPSBidWYgKyBzaXplOwotICAgICAgZWxzZQotICAgICAgICBsaW5lX25leHQgPSBs aW5lX2VuZCArIDE7Ci0KLSAgICAgIGlmIChzdGFydF9wdHIgJiYgc3RhcnRfcHRyID49IGxp bmVfZW5kKQotICAgICAgICBjb250aW51ZTsKKyAgICAgIGxpbmVfZW5kID0gbWVtY2hyIChw LCBlb2xieXRlLCBidWYgKyBzaXplIC0gcCk7CiAKLSAgICAgIGlmIChJTlRfTUFYIDwgbGlu ZV9lbmQgLSBsaW5lX2J1ZikKKyAgICAgIGlmIChJTlRfTUFYIDwgbGluZV9lbmQgLSBwKQog 
ICAgICAgICBlcnJvciAoRVhJVF9UUk9VQkxFLCAwLCBfKCJleGNlZWRlZCBQQ1JFJ3MgbGlu ZSBsZW5ndGggbGltaXQiKSk7CiAKLSAgICAgIGUgPSBwY3JlX2V4ZWMgKGNyZSwgZXh0cmEs IGxpbmVfYnVmLCBsaW5lX2VuZCAtIGxpbmVfYnVmLAotICAgICAgICAgICAgICAgICAgICAg c3RhcnRfb2ZzIDwgMCA/IDAgOiBzdGFydF9vZnMsIDAsCi0gICAgICAgICAgICAgICAgICAg ICBzdWIsIHNpemVvZiBzdWIgLyBzaXplb2YgKnN1Yik7CisgICAgICAvKiBUcmVhdCBlbmNv ZGluZy1lcnJvciBieXRlcyBhcyBkYXRhIHRoYXQgY2Fubm90IG1hdGNoLiAgKi8KKyAgICAg IGZvciAoOzspCisgICAgICAgIHsKKyAgICAgICAgICBlID0gcGNyZV9leGVjIChjcmUsIGV4 dHJhLCBwLCBsaW5lX2VuZCAtIHAsIDAsIG9wdGlvbnMsIHN1YiwgbnN1Yik7CisgICAgICAg ICAgaWYgKGUgIT0gUENSRV9FUlJPUl9CQURVVEY4KQorICAgICAgICAgICAgYnJlYWs7Cisg ICAgICAgICAgZSA9IHBjcmVfZXhlYyAoY3JlLCBleHRyYSwgcCwgc3ViWzBdLCAwLAorICAg ICAgICAgICAgICAgICAgICAgICAgIG9wdGlvbnMgfCBQQ1JFX05PX1VURjhfQ0hFQ0ssIHN1 YiwgbnN1Yik7CisgICAgICAgICAgaWYgKGUgIT0gUENSRV9FUlJPUl9OT01BVENIKQorICAg ICAgICAgICAgYnJlYWs7CisgICAgICAgICAgcCArPSBzdWJbMF0gKyAxOworICAgICAgICAg IG9wdGlvbnMgPSBQQ1JFX05PVEJPTDsKKyAgICAgICAgfQorCisgICAgICBpZiAoZSAhPSBQ Q1JFX0VSUk9SX05PTUFUQ0gpCisgICAgICAgIGJyZWFrOworICAgICAgb3B0aW9ucyA9IDA7 CiAgICAgfQogCiAgIGlmIChlIDw9IDApCkBAIC0xODAsMTAgKzE4Nyw2IEBAIFBleGVjdXRl IChjaGFyIGNvbnN0ICpidWYsIHNpemVfdCBzaXplLCBzaXplX3QgKm1hdGNoX3NpemUsCiAg ICAgICAgICAgZXJyb3IgKEVYSVRfVFJPVUJMRSwgMCwKICAgICAgICAgICAgICAgICAgXygi ZXhjZWVkZWQgUENSRSdzIGJhY2t0cmFja2luZyBsaW1pdCIpKTsKIAotICAgICAgICBjYXNl IFBDUkVfRVJST1JfQkFEVVRGODoKLSAgICAgICAgICBlcnJvciAoRVhJVF9UUk9VQkxFLCAw LAotICAgICAgICAgICAgICAgICBfKCJpbnZhbGlkIFVURi04IGJ5dGUgc2VxdWVuY2UgaW4g aW5wdXQiKSk7Ci0KICAgICAgICAgZGVmYXVsdDoKICAgICAgICAgICAvKiBGb3Igbm93LCB3 ZSBsdW1wIGFsbCByZW1haW5pbmcgUENSRSBmYWlsdXJlcyBpbnRvIHRoaXMgYmFza2V0Lgog ICAgICAgICAgICAgIElmIGFueW9uZSBjYXJlcyB0byBwcm92aWRlIHNhbXBsZSBncmVwIHVz YWdlIHRoYXQgY2FuIHRyaWdnZXIKQEAgLTE5NywyNSArMjAwLDggQEAgUGV4ZWN1dGUgKGNo YXIgY29uc3QgKmJ1Ziwgc2l6ZV90IHNpemUsIHNpemVfdCAqbWF0Y2hfc2l6ZSwKICAgICB9 CiAgIGVsc2UKICAgICB7Ci0gICAgICAvKiBOYXJyb3cgZG93biB0byB0aGUgbGluZSB3ZSd2 
ZSBmb3VuZC4gICovCi0gICAgICBjaGFyIGNvbnN0ICpiZWcgPSBsaW5lX2J1ZiArIHN1Ylsw XTsKLSAgICAgIGNoYXIgY29uc3QgKmVuZCA9IGxpbmVfYnVmICsgc3ViWzFdOwotICAgICAg Y2hhciBjb25zdCAqYnVmbGltID0gYnVmICsgc2l6ZTsKLSAgICAgIGNoYXIgZW9sID0gZW9s Ynl0ZTsKLSAgICAgIGlmICghc3RhcnRfcHRyKQotICAgICAgICB7Ci0gICAgICAgICAgLyog RklYTUU6IFRoZSBjYXNlIHdoZW4gJ1xuJyBpcyBub3QgZm91bmQgaW5kaWNhdGVzIGEgYnVn OgotICAgICAgICAgICAgIFNpbmNlIGdyZXAgaXMgbGluZSBvcmllbnRlZCwgdGhlIG1hdGNo IHNob3VsZCBuZXZlciBjb250YWluCi0gICAgICAgICAgICAgYSBuZXdsaW5lLCBzbyB0aGVy ZSBfbXVzdF8gYmUgYSBuZXdsaW5lIGZvbGxvd2luZy4KLSAgICAgICAgICAgKi8KLSAgICAg ICAgICBpZiAoIShlbmQgPSBtZW1jaHIgKGVuZCwgZW9sLCBidWZsaW0gLSBlbmQpKSkKLSAg ICAgICAgICAgIGVuZCA9IGJ1ZmxpbTsKLSAgICAgICAgICBlbHNlCi0gICAgICAgICAgICBl bmQrKzsKLSAgICAgICAgICB3aGlsZSAoYnVmIDwgYmVnICYmIGJlZ1stMV0gIT0gZW9sKQot ICAgICAgICAgICAgLS1iZWc7Ci0gICAgICAgIH0KLQorICAgICAgY2hhciBjb25zdCAqYmVn ID0gc3RhcnRfcHRyID8gcCArIHN1YlswXSA6IGxpbmVfc3RhcnQ7CisgICAgICBjaGFyIGNv bnN0ICplbmQgPSBzdGFydF9wdHIgPyBwICsgc3ViWzFdIDogbGluZV9lbmQgKyAxOwogICAg ICAgKm1hdGNoX3NpemUgPSBlbmQgLSBiZWc7CiAgICAgICByZXR1cm4gYmVnIC0gYnVmOwog ICAgIH0KZGlmZiAtLWdpdCBhL3Rlc3RzL3BjcmUtaW5mbG9vcCBiL3Rlc3RzL3BjcmUtaW5m bG9vcAppbmRleCAxYjMzZTcyLi5iOTJmOGUxIDEwMDc1NQotLS0gYS90ZXN0cy9wY3JlLWlu Zmxvb3AKKysrIGIvdGVzdHMvcGNyZS1pbmZsb29wCkBAIC0yOCw2ICsyOCw2IEBAIHByaW50 ZiAnYVwyMDFiXHInID4gaW4gfHwgZnJhbWV3b3JrX2ZhaWx1cmVfCiBmYWlsPTAKIAogTENf QUxMPWVuX1VTLlVURi04IHRpbWVvdXQgMyBncmVwIC1QICdhLj8uLmInIGluCi10ZXN0ICQ/ ID0gMiB8fCBmYWlsXyAibGlicGNyZSdzIG1hdGNoIGZ1bmN0aW9uIGFwcGVhcnMgdG8gaW5m bG9vcCIKK3Rlc3QgJD8gPSAxIHx8IGZhaWxfICJsaWJwY3JlJ3MgbWF0Y2ggZnVuY3Rpb24g YXBwZWFycyB0byBpbmZsb29wIgogCiBFeGl0ICRmYWlsCmRpZmYgLS1naXQgYS90ZXN0cy9w Y3JlLWludmFsaWQtdXRmOC1pbnB1dCBiL3Rlc3RzL3BjcmUtaW52YWxpZC11dGY4LWlucHV0 CmluZGV4IDkxM2U4ZWUuLmY0MmUwZGQgMTAwNzU1Ci0tLSBhL3Rlc3RzL3BjcmUtaW52YWxp ZC11dGY4LWlucHV0CisrKyBiL3Rlc3RzL3BjcmUtaW52YWxpZC11dGY4LWlucHV0CkBAIC0x Niw2ICsxNiw2IEBAIGZhaWw9MAogcHJpbnRmICdqXDIwMlxualxuJyA+IGluIHx8IGZyYW1l 
d29ya19mYWlsdXJlXwogCiBMQ19BTEw9ZW5fVVMuVVRGLTggZ3JlcCAtUCBqIGluCi10ZXN0 ICQ/IC1lcSAyIHx8IGZhaWw9MQordGVzdCAkPyAtZXEgMCB8fCBmYWlsPTEKIAogRXhpdCAk ZmFpbAotLSAKMS45LjMKCg== --------------060706050205060007010706-- From debbugs-submit-bounces@debbugs.gnu.org Tue Sep 09 19:39:48 2014 Received: (at 18266) by debbugs.gnu.org; 9 Sep 2014 23:39:48 +0000 Received: from localhost ([127.0.0.1]:35765 helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.80) (envelope-from ) id 1XRV0l-0002kG-ON for submit@debbugs.gnu.org; Tue, 09 Sep 2014 19:39:47 -0400 Received: from conuserg007.nifty.com ([202.248.44.33]:61975 helo=conuserg007-v.nifty.com) by debbugs.gnu.org with esmtp (Exim 4.80) (envelope-from ) id 1XRV0h-0002jw-W9 for 18266@debbugs.gnu.org; Tue, 09 Sep 2014 19:39:45 -0400 Received: from [10.120.1.42] (i118-21-128-66.s30.a048.ap.plala.or.jp [118.21.128.66]) (authenticated) by conuserg007-v.nifty.com with ESMTP id s89NdDCi024095; Wed, 10 Sep 2014 08:39:13 +0900 X-Nifty-SrcIP: [118.21.128.66] Date: Wed, 10 Sep 2014 08:39:10 +0900 From: Norihiro Tanaka To: Paul Eggert Subject: Re: bug#18266: grep -P and invalid exits with error In-Reply-To: <540F5C1F.2040008@cs.ucla.edu> References: <20140910004053.6D8A.27F6AC2D@kcn.ne.jp> <540F5C1F.2040008@cs.ucla.edu> Message-Id: <20140910083909.6495.27F6AC2D@kcn.ne.jp> MIME-Version: 1.0 Content-Type: text/plain; charset="US-ASCII" Content-Transfer-Encoding: 7bit X-Mailer: Becky! ver. 2.65.07 [ja] X-Spam-Score: -1.3 (-) X-Debbugs-Envelope-To: 18266 Cc: 758105@bugs.debian.org, Santiago , Vincent Lefevre , 18266@debbugs.gnu.org X-BeenThere: debbugs-submit@debbugs.gnu.org X-Mailman-Version: 2.1.15 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: debbugs-submit-bounces@debbugs.gnu.org Sender: "Debbugs-submit" X-Spam-Score: -1.3 (-) I see that new version has no response for following test which was used previously. 
printf '\x80ab\n' | env LC_ALL=en_US.utf8 src/grep -P '.?b' From debbugs-submit-bounces@debbugs.gnu.org Tue Sep 09 20:01:10 2014 Received: (at 18266) by debbugs.gnu.org; 10 Sep 2014 00:01:10 +0000 Received: from localhost ([127.0.0.1]:35770 helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.80) (envelope-from ) id 1XRVLR-0003IX-Ma for submit@debbugs.gnu.org; Tue, 09 Sep 2014 20:01:09 -0400 Received: from smtp.cs.ucla.edu ([131.179.128.62]:36816) by debbugs.gnu.org with esmtp (Exim 4.80) (envelope-from ) id 1XRVLO-0003Hp-LR for 18266@debbugs.gnu.org; Tue, 09 Sep 2014 20:01:07 -0400 Received: from localhost (localhost.localdomain [127.0.0.1]) by smtp.cs.ucla.edu (Postfix) with ESMTP id 5AC3439E8011; Tue, 9 Sep 2014 17:01:00 -0700 (PDT) X-Virus-Scanned: amavisd-new at smtp.cs.ucla.edu Received: from smtp.cs.ucla.edu ([127.0.0.1]) by localhost (smtp.cs.ucla.edu [127.0.0.1]) (amavisd-new, port 10024) with ESMTP id lQec9j5YYVR9; Tue, 9 Sep 2014 17:00:51 -0700 (PDT) Received: from [192.168.1.9] (pool-71-177-17-123.lsanca.dsl-w.verizon.net [71.177.17.123]) by smtp.cs.ucla.edu (Postfix) with ESMTPSA id A05C539E8013; Tue, 9 Sep 2014 17:00:51 -0700 (PDT) Message-ID: <540F94B3.1040804@cs.ucla.edu> Date: Tue, 09 Sep 2014 17:00:51 -0700 From: Paul Eggert Organization: UCLA Computer Science Department User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:31.0) Gecko/20100101 Thunderbird/31.0 MIME-Version: 1.0 To: Norihiro Tanaka Subject: Re: bug#18266: grep -P and invalid exits with error References: <20140910004053.6D8A.27F6AC2D@kcn.ne.jp> <540F5C1F.2040008@cs.ucla.edu> <20140910083909.6495.27F6AC2D@kcn.ne.jp> In-Reply-To: <20140910083909.6495.27F6AC2D@kcn.ne.jp> Content-Type: text/plain; charset=utf-8; format=flowed Content-Transfer-Encoding: 7bit X-Spam-Score: -4.8 (----) X-Debbugs-Envelope-To: 18266 Cc: 758105@bugs.debian.org, Santiago , Vincent Lefevre , 18266@debbugs.gnu.org X-BeenThere: debbugs-submit@debbugs.gnu.org X-Mailman-Version: 2.1.15 Precedence: list List-Id: 
List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: debbugs-submit-bounces@debbugs.gnu.org Sender: "Debbugs-submit" X-Spam-Score: -4.8 (----) Norihiro Tanaka wrote: > I see that new version has no response for following test which was used > previously. > > printf '\x80ab\n' | env LC_ALL=en_US.utf8 src/grep -P '.?b' > Thanks for reporting that. The test case works for me (Fedora 20 x86-64, GCC 4.9.1): $ printf '\x80ab\n' | env LC_ALL=en_US.utf8 src/grep -P '.?b' | od -c 0000000 200 a b \n 0000004 Fedora 20 is using pcre version 8.33-6.fc20; perhaps there's a PCRE version dependency here? Can you use GDB to put a breakpoint on pcre_exec and see what values it's returning, and what it's storing into sub[0] and sub[1]? Here's what I see (I compiled grep with '-g3 -O0'): $ printf '\x80ab\n' >in $ gdb src/grep ... (gdb) b pcre_exec ... (gdb) r -P '.?b' in ... (gdb) fin ... (gdb) n ... (gdb) p e $1 = -10 (gdb) c ... (gdb) fin ... (gdb) n ... (gdb) p e $2 = -1 (gdb) c ... (gdb) fin ... (gdb) n ... (gdb) p e $3 = 1 (gdb) p sub[0] $4 = 0 (gdb) p sub[1] $5 = 2 (gdb) p p $6 = 0x62f001 "ab\n" (gdb) p buf $7 = 0x62f000 "\200ab\n" That is, the first call to pcre_exec reports the encoding error, the second one (on the empty string) reports no match, and the third one (on "ab") finds the match. 
From debbugs-submit-bounces@debbugs.gnu.org Wed Sep 10 03:08:32 2014 Received: (at 18266) by debbugs.gnu.org; 10 Sep 2014 07:08:32 +0000 Received: from localhost ([127.0.0.1]:35913 helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.80) (envelope-from ) id 1XRc12-0007A8-7X for submit@debbugs.gnu.org; Wed, 10 Sep 2014 03:08:32 -0400 Received: from smtp.cs.ucla.edu ([131.179.128.62]:50912) by debbugs.gnu.org with esmtp (Exim 4.80) (envelope-from ) id 1XRc0z-00079s-I1 for 18266@debbugs.gnu.org; Wed, 10 Sep 2014 03:08:30 -0400 Received: from localhost (localhost.localdomain [127.0.0.1]) by smtp.cs.ucla.edu (Postfix) with ESMTP id B40F639E801D; Wed, 10 Sep 2014 00:08:23 -0700 (PDT) X-Virus-Scanned: amavisd-new at smtp.cs.ucla.edu Received: from smtp.cs.ucla.edu ([127.0.0.1]) by localhost (smtp.cs.ucla.edu [127.0.0.1]) (amavisd-new, port 10024) with ESMTP id EP0-AIdSNQuy; Wed, 10 Sep 2014 00:08:19 -0700 (PDT) Received: from [192.168.1.9] (pool-71-177-17-123.lsanca.dsl-w.verizon.net [71.177.17.123]) by smtp.cs.ucla.edu (Postfix) with ESMTPSA id D8D0639E8014; Wed, 10 Sep 2014 00:08:18 -0700 (PDT) Message-ID: <540FF8E2.9080903@cs.ucla.edu> Date: Wed, 10 Sep 2014 00:08:18 -0700 From: Paul Eggert Organization: UCLA Computer Science Department User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:31.0) Gecko/20100101 Thunderbird/31.0 MIME-Version: 1.0 To: Norihiro Tanaka Subject: Re: bug#18266: grep -P and invalid exits with error References: <20140910004053.6D8A.27F6AC2D@kcn.ne.jp> <540F5C1F.2040008@cs.ucla.edu> <20140910083909.6495.27F6AC2D@kcn.ne.jp> <540F94B3.1040804@cs.ucla.edu> In-Reply-To: <540F94B3.1040804@cs.ucla.edu> Content-Type: multipart/mixed; boundary="------------050808070400090005020100" X-Spam-Score: -4.8 (----) X-Debbugs-Envelope-To: 18266 Cc: 758105@bugs.debian.org, Santiago , Vincent Lefevre , 18266@debbugs.gnu.org X-BeenThere: debbugs-submit@debbugs.gnu.org X-Mailman-Version: 2.1.15 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: 
List-Help: List-Subscribe: , Errors-To: debbugs-submit-bounces@debbugs.gnu.org Sender: "Debbugs-submit" X-Spam-Score: -4.8 (----) This is a multi-part message in MIME format. --------------050808070400090005020100 Content-Type: text/plain; charset=utf-8; format=flowed Content-Transfer-Encoding: 7bit Paul Eggert wrote: > perhaps there's a PCRE version dependency here? I found a PCRE-version-dependent problem that may be relevant, and installed the attached further patch to fix it. --------------050808070400090005020100 Content-Type: text/plain; charset=UTF-8; name="0001-grep-port-recent-fix-to-older-pcre-version.patch" Content-Transfer-Encoding: base64 Content-Disposition: attachment; filename="0001-grep-port-recent-fix-to-older-pcre-version.patch" RnJvbSBkYzdkNTMyZDE2ZGVjNzQwZDExYjY4MTdjOWI1NTg1NDNhY2EwMTM2IE1vbiBTZXAg MTcgMDA6MDA6MDAgMjAwMQpGcm9tOiBQYXVsIEVnZ2VydCA8ZWdnZXJ0QGNzLnVjbGEuZWR1 PgpEYXRlOiBXZWQsIDEwIFNlcCAyMDE0IDAwOjA0OjU4IC0wNzAwClN1YmplY3Q6IFtQQVRD SF0gZ3JlcDogcG9ydCByZWNlbnQgZml4IHRvIG9sZGVyIHBjcmUgdmVyc2lvbgoKKiBzcmMv cGNyZXNlYXJjaC5jIChQZXhlY3V0ZSk6IERvbid0IGFzc3VtZSB0aGF0IGEgcGNyZV9leGVj CnRoYXQgcmV0dXJucyBQQ1JFX0VSUk9SX05PTUFUQ0ggbGVhdmVzIGl0cyBzdWIgYXJndW1l bnQgYWxvbmUuClRoaXMgYXNzdW1wdGlvbiBpcyBmYWxzZSBmb3IgbGlicGNyZS0zIHZlcnNp b24gOC4zMS0ydWJ1bnR1Mi4KLS0tCiBzcmMvcGNyZXNlYXJjaC5jIHwgNiArKysrLS0KIDEg ZmlsZSBjaGFuZ2VkLCA0IGluc2VydGlvbnMoKyksIDIgZGVsZXRpb25zKC0pCgpkaWZmIC0t Z2l0IGEvc3JjL3BjcmVzZWFyY2guYyBiL3NyYy9wY3Jlc2VhcmNoLmMKaW5kZXggMmEwMWU2 ZC4uNGUyY2NmOCAxMDA2NDQKLS0tIGEvc3JjL3BjcmVzZWFyY2guYworKysgYi9zcmMvcGNy ZXNlYXJjaC5jCkBAIC0xNTcsMTQgKzE1NywxNiBAQCBQZXhlY3V0ZSAoY2hhciBjb25zdCAq YnVmLCBzaXplX3Qgc2l6ZSwgc2l6ZV90ICptYXRjaF9zaXplLAogICAgICAgLyogVHJlYXQg ZW5jb2RpbmctZXJyb3IgYnl0ZXMgYXMgZGF0YSB0aGF0IGNhbm5vdCBtYXRjaC4gICovCiAg ICAgICBmb3IgKDs7KQogICAgICAgICB7CisgICAgICAgICAgaW50IHZhbGlkX2J5dGVzOwog ICAgICAgICAgIGUgPSBwY3JlX2V4ZWMgKGNyZSwgZXh0cmEsIHAsIGxpbmVfZW5kIC0gcCwg MCwgb3B0aW9ucywgc3ViLCBuc3ViKTsKICAgICAgICAgICBpZiAoZSAhPSBQQ1JFX0VSUk9S 
X0JBRFVURjgpCiAgICAgICAgICAgICBicmVhazsKLSAgICAgICAgICBlID0gcGNyZV9leGVj IChjcmUsIGV4dHJhLCBwLCBzdWJbMF0sIDAsCisgICAgICAgICAgdmFsaWRfYnl0ZXMgPSBz dWJbMF07CisgICAgICAgICAgZSA9IHBjcmVfZXhlYyAoY3JlLCBleHRyYSwgcCwgdmFsaWRf Ynl0ZXMsIDAsCiAgICAgICAgICAgICAgICAgICAgICAgICAgb3B0aW9ucyB8IFBDUkVfTk9f VVRGOF9DSEVDSywgc3ViLCBuc3ViKTsKICAgICAgICAgICBpZiAoZSAhPSBQQ1JFX0VSUk9S X05PTUFUQ0gpCiAgICAgICAgICAgICBicmVhazsKLSAgICAgICAgICBwICs9IHN1YlswXSAr IDE7CisgICAgICAgICAgcCArPSB2YWxpZF9ieXRlcyArIDE7CiAgICAgICAgICAgb3B0aW9u cyA9IFBDUkVfTk9UQk9MOwogICAgICAgICB9CiAKLS0gCjEuOS4zCgo= --------------050808070400090005020100-- From debbugs-submit-bounces@debbugs.gnu.org Wed Sep 10 07:22:47 2014 Received: (at 18266) by debbugs.gnu.org; 10 Sep 2014 11:22:47 +0000 Received: from localhost ([127.0.0.1]:35988 helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.80) (envelope-from ) id 1XRfz5-0005Dd-6r for submit@debbugs.gnu.org; Wed, 10 Sep 2014 07:22:47 -0400 Received: from mx1.riseup.net ([198.252.153.129]:47648) by debbugs.gnu.org with esmtp (Exim 4.80) (envelope-from ) id 1XRfz0-0005DS-LQ for 18266@debbugs.gnu.org; Wed, 10 Sep 2014 07:22:44 -0400 Received: from berryeater.riseup.net (berryeater-pn.riseup.net [10.0.1.120]) (using TLSv1 with cipher ECDHE-RSA-AES256-SHA (256/256 bits)) (Client CN "*.riseup.net", Issuer "Gandi Standard SSL CA" (not verified)) by mx1.riseup.net (Postfix) with ESMTPS id 67DC04DB09; Wed, 10 Sep 2014 04:22:41 -0700 (PDT) Received: from [127.0.0.1] (localhost [127.0.0.1]) (Authenticated sender: santiagorr) with ESMTPSA id 8CAC64206B Received: by nomada (sSMTP sendmail emulation); Wed, 10 Sep 2014 13:22:36 +0200 Date: Wed, 10 Sep 2014 13:22:36 +0200 From: Santiago To: Paul Eggert , 758105@bugs.debian.org Subject: Re: Bug#758105: bug#18266: grep -P and invalid exits with error Message-ID: <20140910112235.GA17843@nomada> References: <20140910004053.6D8A.27F6AC2D@kcn.ne.jp> <540F5C1F.2040008@cs.ucla.edu> <20140910083909.6495.27F6AC2D@kcn.ne.jp> 
<540F94B3.1040804@cs.ucla.edu> <540FF8E2.9080903@cs.ucla.edu> MIME-Version: 1.0 Content-Type: text/plain; charset=iso-8859-1 Content-Disposition: inline Content-Transfer-Encoding: 8bit In-Reply-To: <540FF8E2.9080903@cs.ucla.edu> User-Agent: Mutt/1.5.23 (2014-03-12) X-Virus-Scanned: clamav-milter 0.98.4 at mx1 X-Virus-Status: Clean X-Spam-Score: 0.0 (/) X-Debbugs-Envelope-To: 18266 Cc: 18266@debbugs.gnu.org, Vincent Lefevre X-BeenThere: debbugs-submit@debbugs.gnu.org X-Mailman-Version: 2.1.15 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: debbugs-submit-bounces@debbugs.gnu.org Sender: "Debbugs-submit" X-Spam-Score: 0.0 (/) El 10/09/14 a las 00:08, Paul Eggert escribió: > Paul Eggert wrote: > >perhaps there's a PCRE version dependency here? > > I found a PCRE-version-dependent problem that may be relevant, and installed > the attached further patch to fix it. Thanks! I'm including this fix in the current debian package. Santiago (Ruano Rincón) P.S. Vincent Lefevre actually reported this bug, not Santiago Vila. 
From debbugs-submit-bounces@debbugs.gnu.org Wed Sep 10 10:20:43 2014 Received: (at 18266) by debbugs.gnu.org; 10 Sep 2014 14:20:43 +0000 Received: from localhost ([127.0.0.1]:36634 helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.80) (envelope-from ) id 1XRilH-0002iI-4S for submit@debbugs.gnu.org; Wed, 10 Sep 2014 10:20:43 -0400 Received: from mailgw04.kcn.ne.jp ([61.86.7.211]:44450) by debbugs.gnu.org with esmtp (Exim 4.80) (envelope-from ) id 1XRilC-0002hv-Sg for 18266@debbugs.gnu.org; Wed, 10 Sep 2014 10:20:41 -0400 Received: from imp02 (mailgw6.kcn.ne.jp [61.86.15.232]) by mailgw04.kcn.ne.jp (Postfix) with ESMTP id E43F36C1BA3 for <18266@debbugs.gnu.org>; Wed, 10 Sep 2014 23:20:31 +0900 (JST) Received: from mail07.kcn.ne.jp ([61.86.6.186]) by imp02 with bizsmtp id pSLX1o00C40oyB901SLX7S; Wed, 10 Sep 2014 23:20:31 +0900 X-OrgRCPT: 18266@debbugs.gnu.org Received: from [10.120.1.59] (i118-21-128-66.s30.a048.ap.plala.or.jp [118.21.128.66]) by mail07.kcn.ne.jp (Postfix) with ESMTPA id 3D971D5009A; Wed, 10 Sep 2014 23:20:31 +0900 (JST) Date: Wed, 10 Sep 2014 23:20:28 +0900 From: Norihiro Tanaka To: Paul Eggert Subject: Re: bug#18266: grep -P and invalid exits with error In-Reply-To: <540FF8E2.9080903@cs.ucla.edu> References: <540F94B3.1040804@cs.ucla.edu> <540FF8E2.9080903@cs.ucla.edu> Message-Id: <20140910232027.9B3E.27F6AC2D@kcn.ne.jp> MIME-Version: 1.0 Content-Type: text/plain; charset="US-ASCII" Content-Transfer-Encoding: 7bit X-Mailer: Becky! ver. 2.65.07 [ja] X-Spam-Score: -2.5 (--) X-Debbugs-Envelope-To: 18266 Cc: 758105@bugs.debian.org, Santiago , Vincent Lefevre , 18266@debbugs.gnu.org X-BeenThere: debbugs-submit@debbugs.gnu.org X-Mailman-Version: 2.1.15 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: debbugs-submit-bounces@debbugs.gnu.org Sender: "Debbugs-submit" X-Spam-Score: -2.5 (--) Thanks. I have confirmed that the new version gives the expected response, as follows.
$ env LC_ALL=en_US.utf8 src/grep -P '.?b' in ab From debbugs-submit-bounces@debbugs.gnu.org Thu Sep 11 04:15:23 2014 Received: (at 18266) by debbugs.gnu.org; 11 Sep 2014 08:15:23 +0000 Received: from localhost ([127.0.0.1]:37247 helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.80) (envelope-from ) id 1XRzXH-0004aT-8v for submit@debbugs.gnu.org; Thu, 11 Sep 2014 04:15:23 -0400 Received: from ypig.lip.ens-lyon.fr ([140.77.13.48]:57150) by debbugs.gnu.org with esmtp (Exim 4.80) (envelope-from ) id 1XRzXE-0004aI-Av for 18266@debbugs.gnu.org; Thu, 11 Sep 2014 04:15:21 -0400 Received: from vlefevre by ypig.lip.ens-lyon.fr with local (Exim 4.84) (envelope-from ) id 1XRzX4-0001S6-F1; Thu, 11 Sep 2014 10:15:10 +0200 Date: Thu, 11 Sep 2014 10:15:10 +0200 From: Vincent Lefevre To: Santiago Subject: Re: Bug#758105: bug#18266: grep -P and invalid exits with error Message-ID: <20140911081510.GA4859@ypig.lip.ens-lyon.fr> References: <20140910004053.6D8A.27F6AC2D@kcn.ne.jp> <540F5C1F.2040008@cs.ucla.edu> <20140910083909.6495.27F6AC2D@kcn.ne.jp> <540F94B3.1040804@cs.ucla.edu> <540FF8E2.9080903@cs.ucla.edu> <20140910112235.GA17843@nomada> MIME-Version: 1.0 Content-Type: text/plain; charset=iso-8859-1 Content-Disposition: inline Content-Transfer-Encoding: 8bit In-Reply-To: <20140910112235.GA17843@nomada> X-Mailer-Info: http://www.vinc17.net/mutt/ User-Agent: Mutt/1.5.23-6361-vl-r59709 (2014-07-25) X-Spam-Score: 0.0 (/) X-Debbugs-Envelope-To: 18266 Cc: 18266@debbugs.gnu.org, Paul Eggert , 758105@bugs.debian.org X-BeenThere: debbugs-submit@debbugs.gnu.org X-Mailman-Version: 2.1.15 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: debbugs-submit-bounces@debbugs.gnu.org Sender: "Debbugs-submit" X-Spam-Score: 0.0 (/) On 2014-09-10 13:22:36 +0200, Santiago wrote: > Thanks! I'm including this fix in the current debian package. Unfortunately, it is very slow, with a large slowdown factor. 
I've just reported a new Debian bug concerning the performance problem. -- Vincent Lefèvre - Web: 100% accessible validated (X)HTML - Blog: Work: CR INRIA - computer arithmetic / AriC project (LIP, ENS-Lyon) From debbugs-submit-bounces@debbugs.gnu.org Thu Sep 11 07:07:12 2014 Received: (at 18266) by debbugs.gnu.org; 11 Sep 2014 11:07:12 +0000 Received: from localhost ([127.0.0.1]:37369 helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.80) (envelope-from ) id 1XS2DX-0000W6-Fi for submit@debbugs.gnu.org; Thu, 11 Sep 2014 07:07:11 -0400 Received: from ypig.lip.ens-lyon.fr ([140.77.13.48]:34502) by debbugs.gnu.org with esmtp (Exim 4.80) (envelope-from ) id 1XS2DT-0000Vv-M4 for 18266@debbugs.gnu.org; Thu, 11 Sep 2014 07:07:08 -0400 Received: from vlefevre by ypig.lip.ens-lyon.fr with local (Exim 4.84) (envelope-from ) id 1XS2DM-0005wD-V2; Thu, 11 Sep 2014 13:07:01 +0200 Date: Thu, 11 Sep 2014 13:07:00 +0200 From: Vincent Lefevre To: Paul Eggert Subject: handling bytes not part of the charset, and other garbage (was: grep -P and invalid exits with error) Message-ID: <20140911110700.GA20565@ypig.lip.ens-lyon.fr> References: <20140814210350.GJ5034@xvii.vinc17.org> <53ED2B30.2050108@cs.ucla.edu> <20140816140127.GA2252@nomada> <20140816162621.GM5034@xvii.vinc17.org> <20140816175637.GA6115@nomada> <53EFA4AC.7090308@cs.ucla.edu> <20140829054754.GC5210@nomada> <54008391.6030801@cs.ucla.edu> <20140901081822.GB3775@ypig.lip.ens-lyon.fr> <54042EF9.6000309@cs.ucla.edu> MIME-Version: 1.0 Content-Type: text/plain; charset=iso-8859-1 Content-Disposition: inline Content-Transfer-Encoding: 8bit In-Reply-To: <54042EF9.6000309@cs.ucla.edu> X-Mailer-Info: http://www.vinc17.net/mutt/ User-Agent: Mutt/1.5.23-6361-vl-r59709 (2014-07-25) X-Spam-Score: 0.0 (/) X-Debbugs-Envelope-To: 18266 Cc: 18266@debbugs.gnu.org, Santiago , 758105@bugs.debian.org X-BeenThere: debbugs-submit@debbugs.gnu.org X-Mailman-Version: 2.1.15 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: 
List-Help: List-Subscribe: , Errors-To: debbugs-submit-bounces@debbugs.gnu.org Sender: "Debbugs-submit" X-Spam-Score: 0.0 (/) On 2014-09-01 01:31:53 -0700, Paul Eggert wrote: > Vincent Lefevre wrote: > >If there are many invalid UTF8 bytes, this would be slow, IMHO > > That's OK. We don't need grep -P to be fast on invalid input. I can see a too important slowdown in practical cases. > >But is the copy of the buffer really needed? Couldn't the invalid > >UTF8 sequences just be replaced by null bytes? > > I'd rather not, because that changes the semantics of matching. The null > byte is valid input data that might get matched. It appears that the current behavior in UTF-8 is incorrect, even without -P. For instance: $ printf 'tr\xe8s\n' > text $ grep 'tr.s' text $ LC_ALL=C grep 'tr.s' text trs There's no reason that '.' matches something that doesn't belong to the charset in C locale, but doesn't match in a UTF-8 locale. The pattern tr.s is used here to match the French word "très" in files that could be encoded in ISO-8859-1 or UTF-8 locales. In the past, before using UTF-8 locales, I was doing something like: grep -E 'tr..?s' text to match both encodings, and this worked (I could get false positives, but anyway, one is often not interested in all the real grep matches in practice, so that even when knowing the encoding, one was already getting false positives). It's annoying that now in UTF-8, one can no longer match ISO-8859-1 text, and doing a pre-conversion would take too much time. Concerning binary files, I've never wanted to differentiate explicitly null bytes and invalid UTF-8 sequences: IMHO, this is just garbage. There are obviously no differences with patterns like 'some_word' or 'foo[0-9]*bar', but when I use a pattern like 'foo.bar' or 'foo.*bar', I can see two valid reasons to handle these sequences in a similar way with '.': 1. One may want to match "valid" (often in the sense "printable", in the specified encoding) but unknown characters. 2. 
One may also want to match garbage (including null bytes, and also bytes that do not have any meaning in the charset), with the drawback that if the garbage contains a newline character, this won't work. -- Vincent Lefèvre - Web: 100% accessible validated (X)HTML - Blog: Work: CR INRIA - computer arithmetic / AriC project (LIP, ENS-Lyon) From debbugs-submit-bounces@debbugs.gnu.org Thu Sep 11 12:23:02 2014 Received: (at 18266) by debbugs.gnu.org; 11 Sep 2014 16:23:02 +0000 Received: from localhost ([127.0.0.1]:38311 helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.80) (envelope-from ) id 1XS79B-0003AP-J9 for submit@debbugs.gnu.org; Thu, 11 Sep 2014 12:23:01 -0400 Received: from smtp.cs.ucla.edu ([131.179.128.62]:45095) by debbugs.gnu.org with esmtp (Exim 4.80) (envelope-from ) id 1XS799-0003A7-Jn for 18266@debbugs.gnu.org; Thu, 11 Sep 2014 12:23:00 -0400 Received: from localhost (localhost.localdomain [127.0.0.1]) by smtp.cs.ucla.edu (Postfix) with ESMTP id 6CFFDA6001D; Thu, 11 Sep 2014 09:22:58 -0700 (PDT) X-Virus-Scanned: amavisd-new at smtp.cs.ucla.edu Received: from smtp.cs.ucla.edu ([127.0.0.1]) by localhost (smtp.cs.ucla.edu [127.0.0.1]) (amavisd-new, port 10024) with ESMTP id EOMBdTG+-6m4; Thu, 11 Sep 2014 09:22:49 -0700 (PDT) Received: from [192.168.1.9] (pool-71-177-17-123.lsanca.dsl-w.verizon.net [71.177.17.123]) by smtp.cs.ucla.edu (Postfix) with ESMTPSA id B1B48A6001A; Thu, 11 Sep 2014 09:22:49 -0700 (PDT) Message-ID: <5411CC59.10407@cs.ucla.edu> Date: Thu, 11 Sep 2014 09:22:49 -0700 From: Paul Eggert Organization: UCLA Computer Science Department User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:31.0) Gecko/20100101 Thunderbird/31.0 MIME-Version: 1.0 To: Vincent Lefevre Subject: Re: handling bytes not part of the charset, and other garbage References: <20140814210350.GJ5034@xvii.vinc17.org> <53ED2B30.2050108@cs.ucla.edu> <20140816140127.GA2252@nomada> <20140816162621.GM5034@xvii.vinc17.org> <20140816175637.GA6115@nomada> 
<53EFA4AC.7090308@cs.ucla.edu> <20140829054754.GC5210@nomada> <54008391.6030801@cs.ucla.edu> <20140901081822.GB3775@ypig.lip.ens-lyon.fr> <54042EF9.6000309@cs.ucla.edu> <20140911110700.GA20565@ypig.lip.ens-lyon.fr> In-Reply-To: <20140911110700.GA20565@ypig.lip.ens-lyon.fr> Content-Type: text/plain; charset=utf-8; format=flowed Content-Transfer-Encoding: 7bit X-Spam-Score: -4.8 (----) X-Debbugs-Envelope-To: 18266 Cc: 18266@debbugs.gnu.org, Santiago , 758105@bugs.debian.org X-BeenThere: debbugs-submit@debbugs.gnu.org X-Mailman-Version: 2.1.15 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: debbugs-submit-bounces@debbugs.gnu.org Sender: "Debbugs-submit" X-Spam-Score: -4.8 (----) Vincent Lefevre wrote: > There's no reason that '.' matches something that doesn't belong to > the charset in C locale, but doesn't match in a UTF-8 locale. In the C locale on GNU/Linux, all byte values are members of the charset. That is why it's OK for '.' to accept that byte in the C locale but reject it in a UTF-8 locale. > It's annoying that now in UTF-8, one can no longer match ISO-8859-1 text This has been true for quite some time in 'grep', at least with the standard matchers. It may not have been true for -P but that relied on undefined behavior that could crash grep, and we can't have that. It would make sense to add a notation to mean "match any character or invalid byte", as an extension. That'd take some work, though. 
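
For illustration only, the C-locale half of this discussion can be sketched as below. This is a minimal sketch that assumes a POSIX shell and a grep that treats every byte as a character in the C locale, as described above for GNU/Linux. The helper names sample and count_c are made up for the example. The bytes are written in octal (\350 is 0xE8, "è" in ISO-8859-1, and \303\250 is the UTF-8 encoding of the same letter) because POSIX printf does not guarantee \x escapes.

```shell
# Two lines: "très" in ISO-8859-1 (one byte for è), then in UTF-8 (two bytes).
sample() {
    printf 'tr\350s\ntr\303\250s\n'
}

# Hypothetical helper: count the lines matching an ERE, with grep in the C locale.
count_c() {
    sample | LC_ALL=C grep -E -c -e "$1"
}

count_c 'tr.s'     # only the ISO-8859-1 line has exactly one byte between "tr" and "s"
count_c 'tr..?s'   # one or two bytes: covers both encodings
```

With the assumptions above this should print 1 and then 2. Rerunning the same patterns with LC_ALL set to an installed UTF-8 locale is one way to observe the difference being debated here; the result depends on the grep version and on which locales exist on the system, so no output is claimed for that case.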
From debbugs-submit-bounces@debbugs.gnu.org Thu Sep 11 13:07:59 2014 Received: (at 18266-done) by debbugs.gnu.org; 11 Sep 2014 17:07:59 +0000 Received: from localhost ([127.0.0.1]:38427 helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.80) (envelope-from ) id 1XS7qg-0004UE-Q5 for submit@debbugs.gnu.org; Thu, 11 Sep 2014 13:07:59 -0400 Received: from smtp.cs.ucla.edu ([131.179.128.62]:47384) by debbugs.gnu.org with esmtp (Exim 4.80) (envelope-from ) id 1XS7qd-0004U4-O0 for 18266-done@debbugs.gnu.org; Thu, 11 Sep 2014 13:07:56 -0400 Received: from localhost (localhost.localdomain [127.0.0.1]) by smtp.cs.ucla.edu (Postfix) with ESMTP id ED7BBA60001; Thu, 11 Sep 2014 10:07:54 -0700 (PDT) X-Virus-Scanned: amavisd-new at smtp.cs.ucla.edu Received: from smtp.cs.ucla.edu ([127.0.0.1]) by localhost (smtp.cs.ucla.edu [127.0.0.1]) (amavisd-new, port 10024) with ESMTP id lLLI9zgH66od; Thu, 11 Sep 2014 10:07:50 -0700 (PDT) Received: from [192.168.1.9] (pool-71-177-17-123.lsanca.dsl-w.verizon.net [71.177.17.123]) by smtp.cs.ucla.edu (Postfix) with ESMTPSA id 0AC62A6001C; Thu, 11 Sep 2014 10:07:50 -0700 (PDT) Message-ID: <5411D6E5.4000402@cs.ucla.edu> Date: Thu, 11 Sep 2014 10:07:49 -0700 From: Paul Eggert Organization: UCLA Computer Science Department User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:31.0) Gecko/20100101 Thunderbird/31.0 MIME-Version: 1.0 To: Vincent Lefevre , Santiago Subject: Re: bug#18266: Bug#758105: bug#18266: grep -P and invalid exits with error References: <20140910004053.6D8A.27F6AC2D@kcn.ne.jp> <540F5C1F.2040008@cs.ucla.edu> <20140910083909.6495.27F6AC2D@kcn.ne.jp> <540F94B3.1040804@cs.ucla.edu> <540FF8E2.9080903@cs.ucla.edu> <20140910112235.GA17843@nomada> <20140911081510.GA4859@ypig.lip.ens-lyon.fr> In-Reply-To: <20140911081510.GA4859@ypig.lip.ens-lyon.fr> Content-Type: multipart/mixed; boundary="------------010905060707060206020300" X-Spam-Score: -4.8 (----) X-Debbugs-Envelope-To: 18266-done Cc: 18266-done@debbugs.gnu.org, 
758105@bugs.debian.org, 761157@bugs.debian.org X-BeenThere: debbugs-submit@debbugs.gnu.org X-Mailman-Version: 2.1.15 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: debbugs-submit-bounces@debbugs.gnu.org Sender: "Debbugs-submit" X-Spam-Score: -4.8 (----) This is a multi-part message in MIME format. --------------010905060707060206020300 Content-Type: text/plain; charset=utf-8; format=flowed Content-Transfer-Encoding: 7bit Vincent Lefevre wrote: > I've just reported a new Debian concerning the performance problem. It's not clear from http://bugs.debian.org/761157 that the performance problem occurs only with -P, but I assume that's what is meant. Since this is a performance bug with PCRE, I suggest moving the Debian bug report to the Debian libpcre3 package. Grep cannot go back to the old way, which could cause grep to crash, and the bug cannot be fixed in grep because libpcre3 does not provide a fast way to search arbitrary data that may include encoding errors. It really is a problem that requires changes to libpcre3 to fix; grep cannot fix it. In the meantime, in order to use 'grep' to search for strings in arbitrary data, I suggest omitting the '-P'. Also, I suggest using the C locale. As the GNU bug 18266 "grep -P and invalid exits with error" has been fixed, I'm closing that bug report. Please feel free to open a separate GNU bug report for the performance issue. PS. While composing this email I noticed another bug in grep -P and encoding errors, which I fixed by installing the attached patch. 
--------------010905060707060206020300 Content-Type: text/plain; charset=UTF-8; name="0001-grep-fix-false-matches-with-P-.-and-invalid-UTF-8.patch" Content-Transfer-Encoding: base64 Content-Disposition: attachment; filename*0="0001-grep-fix-false-matches-with-P-.-and-invalid-UTF-8.patch" RnJvbSBmYjM5YjMyYjEyYmUwYzYxMTRmMDlkNTE4MThjZDcwMzE2MWIxMDRlIE1vbiBTZXAg MTcgMDA6MDA6MDAgMjAwMQpGcm9tOiBQYXVsIEVnZ2VydCA8ZWdnZXJ0QGNzLnVjbGEuZWR1 PgpEYXRlOiBUaHUsIDExIFNlcCAyMDE0IDA5OjUyOjAxIC0wNzAwClN1YmplY3Q6IFtQQVRD SF0gZ3JlcDogZml4IGZhbHNlIG1hdGNoZXMgd2l0aCAtUCAnLi4uJCcgYW5kIGludmFsaWQg VVRGLTgKCiogc3JjL3BjcmVzZWFyY2guYyAoUGV4ZWN1dGUpOiBVc2UgUENSRV9OT1RFT0wg d2hlbiBtYXRjaGluZwppbml0aWFsIHN1YnN0cmluZ3Mgb2YgYSBsaW5lLgotLS0KIHNyYy9w Y3Jlc2VhcmNoLmMgfCAzICsrLQogMSBmaWxlIGNoYW5nZWQsIDIgaW5zZXJ0aW9ucygrKSwg MSBkZWxldGlvbigtKQoKZGlmZiAtLWdpdCBhL3NyYy9wY3Jlc2VhcmNoLmMgYi9zcmMvcGNy ZXNlYXJjaC5jCmluZGV4IDRlMmNjZjguLjE3ZTBlMzIgMTAwNjQ0Ci0tLSBhL3NyYy9wY3Jl c2VhcmNoLmMKKysrIGIvc3JjL3BjcmVzZWFyY2guYwpAQCAtMTYzLDcgKzE2Myw4IEBAIFBl eGVjdXRlIChjaGFyIGNvbnN0ICpidWYsIHNpemVfdCBzaXplLCBzaXplX3QgKm1hdGNoX3Np emUsCiAgICAgICAgICAgICBicmVhazsKICAgICAgICAgICB2YWxpZF9ieXRlcyA9IHN1Ylsw XTsKICAgICAgICAgICBlID0gcGNyZV9leGVjIChjcmUsIGV4dHJhLCBwLCB2YWxpZF9ieXRl cywgMCwKLSAgICAgICAgICAgICAgICAgICAgICAgICBvcHRpb25zIHwgUENSRV9OT19VVEY4 X0NIRUNLLCBzdWIsIG5zdWIpOworICAgICAgICAgICAgICAgICAgICAgICAgIG9wdGlvbnMg fCBQQ1JFX05PX1VURjhfQ0hFQ0sgfCBQQ1JFX05PVEVPTCwKKyAgICAgICAgICAgICAgICAg ICAgICAgICBzdWIsIG5zdWIpOwogICAgICAgICAgIGlmIChlICE9IFBDUkVfRVJST1JfTk9N QVRDSCkKICAgICAgICAgICAgIGJyZWFrOwogICAgICAgICAgIHAgKz0gdmFsaWRfYnl0ZXMg KyAxOwotLSAKMS45LjMKCg== --------------010905060707060206020300-- From debbugs-submit-bounces@debbugs.gnu.org Thu Sep 11 14:37:44 2014 Received: (at 18266) by debbugs.gnu.org; 11 Sep 2014 18:37:44 +0000 Received: from localhost ([127.0.0.1]:38464 helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.80) (envelope-from ) id 1XS9FX-0006x6-B4 for submit@debbugs.gnu.org; Thu, 11 Sep 2014 
14:37:44 -0400 Received: from mail-wi0-f173.google.com ([209.85.212.173]:50448) by debbugs.gnu.org with esmtp (Exim 4.80) (envelope-from ) id 1XS9FQ-0006wn-I0; Thu, 11 Sep 2014 14:37:37 -0400 Received: by mail-wi0-f173.google.com with SMTP id em10so1473296wid.0 for ; Thu, 11 Sep 2014 11:37:31 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113; h=mime-version:sender:in-reply-to:references:from:date:message-id :subject:to:cc:content-type; bh=F17RruAT//RCpakYRx/xfw+xiDOpJJvhtunBepJYQG0=; b=wbAyesKcnZMbUc0LoqPJXvl1x0aou1ahtSyIkBmuR9cmT5dOYnpXmlBNpFAiya7YR9 kD5o6erUC+p7Q39kPFRRIGyQ7pZHa+aoDVxvgMrUpZFse7OVcynRj/PGH/g7iIdWWAS8 sNwwrEboMGeHQqcZvlyLhiVeZCjolu1LtVk80XY212jCJBRwSs/IXRZ6nOPgBqMJp+50 mlRmPDxQZsYsazsrFCCtCJOUa5S6SHyDn/Vf3F7X5/zcfXlSjnRe14chIq6ITmZY1XJ3 IvUefhgphs3ZTHc5M43zGVDiu5UtmyeZ/pIoWmQH0khG7YGh5Ap0E42ncJlWyRgI5/p6 /NgQ== X-Received: by 10.194.59.18 with SMTP id v18mr4214337wjq.64.1410460651303; Thu, 11 Sep 2014 11:37:31 -0700 (PDT) MIME-Version: 1.0 Received: by 10.194.41.202 with HTTP; Thu, 11 Sep 2014 11:37:11 -0700 (PDT) In-Reply-To: <5411D6E5.4000402@cs.ucla.edu> References: <20140910004053.6D8A.27F6AC2D@kcn.ne.jp> <540F5C1F.2040008@cs.ucla.edu> <20140910083909.6495.27F6AC2D@kcn.ne.jp> <540F94B3.1040804@cs.ucla.edu> <540FF8E2.9080903@cs.ucla.edu> <20140910112235.GA17843@nomada> <20140911081510.GA4859@ypig.lip.ens-lyon.fr> <5411D6E5.4000402@cs.ucla.edu> From: Jim Meyering Date: Thu, 11 Sep 2014 11:37:11 -0700 X-Google-Sender-Auth: xk-R4xawpbRwMJsicuPonEydz4A Message-ID: Subject: Re: bug#18266: Bug#758105: bug#18266: grep -P and invalid exits with error To: 18266@debbugs.gnu.org, Paul Eggert , Santiago Content-Type: text/plain; charset=ISO-8859-1 X-Spam-Score: -0.7 (/) X-Debbugs-Envelope-To: 18266 Cc: 758105@bugs.debian.org, 18266-done@debbugs.gnu.org, Vincent Lefevre , 761157@bugs.debian.org X-BeenThere: debbugs-submit@debbugs.gnu.org X-Mailman-Version: 2.1.15 Precedence: list List-Id: List-Unsubscribe: , 
List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: debbugs-submit-bounces@debbugs.gnu.org Sender: "Debbugs-submit" X-Spam-Score: -0.7 (/) On Thu, Sep 11, 2014 at 10:07 AM, Paul Eggert wrote: > Vincent Lefevre wrote: > >> I've just reported a new Debian concerning the performance problem. > > > It's not clear from http://bugs.debian.org/761157 that the performance > problem occurs only with -P, but I assume that's what is meant. > > Since this is a performance bug with PCRE, I suggest moving the Debian bug > report to the Debian libpcre3 package. Grep cannot go back to the old way, > which could cause grep to crash, and the bug cannot be fixed in grep because > libpcre3 does not provide a fast way to search arbitrary data that may > include encoding errors. It really is a problem that requires changes to > libpcre3 to fix; grep cannot fix it. > > In the meantime, in order to use 'grep' to search for strings in arbitrary > data, I suggest omitting the '-P'. Also, I suggest using the C locale. > > As the GNU bug 18266 "grep -P and invalid exits with error" has been fixed, > I'm closing that bug report. Please feel free to open a separate GNU bug > report for the performance issue. > > PS. While composing this email I noticed another bug in grep -P and > encoding errors, which I fixed by installing the attached patch. Thanks for fixing yet another bug, Paul. Would you mind adding a test to trigger that one? 
From debbugs-submit-bounces@debbugs.gnu.org Thu Sep 11 15:10:38 2014 Received: (at 18266) by debbugs.gnu.org; 11 Sep 2014 19:10:38 +0000 Received: from localhost ([127.0.0.1]:38471 helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.80) (envelope-from ) id 1XS9lO-0007q4-6m for submit@debbugs.gnu.org; Thu, 11 Sep 2014 15:10:38 -0400 Received: from smtp.cs.ucla.edu ([131.179.128.62]:53668) by debbugs.gnu.org with esmtp (Exim 4.80) (envelope-from ) id 1XS9lK-0007pn-OZ; Thu, 11 Sep 2014 15:10:36 -0400 Received: from localhost (localhost.localdomain [127.0.0.1]) by smtp.cs.ucla.edu (Postfix) with ESMTP id 8D257A6001E; Thu, 11 Sep 2014 12:10:33 -0700 (PDT) X-Virus-Scanned: amavisd-new at smtp.cs.ucla.edu Received: from smtp.cs.ucla.edu ([127.0.0.1]) by localhost (smtp.cs.ucla.edu [127.0.0.1]) (amavisd-new, port 10024) with ESMTP id Uy3YVIWefBUQ; Thu, 11 Sep 2014 12:10:28 -0700 (PDT) Received: from penguin.cs.ucla.edu (Penguin.CS.UCLA.EDU [131.179.64.200]) by smtp.cs.ucla.edu (Postfix) with ESMTPSA id 451D8A60001; Thu, 11 Sep 2014 12:10:28 -0700 (PDT) Message-ID: <5411F3A3.9050403@cs.ucla.edu> Date: Thu, 11 Sep 2014 12:10:27 -0700 From: Paul Eggert Organization: UCLA Computer Science Department User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:31.0) Gecko/20100101 Thunderbird/31.1.0 MIME-Version: 1.0 To: Jim Meyering , 18266@debbugs.gnu.org, Santiago Subject: Re: bug#18266: Bug#758105: bug#18266: grep -P and invalid exits with error References: <20140910004053.6D8A.27F6AC2D@kcn.ne.jp> <540F5C1F.2040008@cs.ucla.edu> <20140910083909.6495.27F6AC2D@kcn.ne.jp> <540F94B3.1040804@cs.ucla.edu> <540FF8E2.9080903@cs.ucla.edu> <20140910112235.GA17843@nomada> <20140911081510.GA4859@ypig.lip.ens-lyon.fr> <5411D6E5.4000402@cs.ucla.edu> In-Reply-To: Content-Type: multipart/mixed; boundary="------------090703060506070000090907" X-Spam-Score: -4.8 (----) X-Debbugs-Envelope-To: 18266 Cc: 758105@bugs.debian.org, 18266-done@debbugs.gnu.org, Vincent Lefevre , 761157@bugs.debian.org 
X-BeenThere: debbugs-submit@debbugs.gnu.org X-Mailman-Version: 2.1.15 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: debbugs-submit-bounces@debbugs.gnu.org Sender: "Debbugs-submit" X-Spam-Score: -4.8 (----) This is a multi-part message in MIME format. --------------090703060506070000090907 Content-Type: text/plain; charset=utf-8; format=flowed Content-Transfer-Encoding: 7bit On 09/11/2014 11:37 AM, Jim Meyering wrote: > Would you mind adding a test to trigger that one? Ordinarily I would have done that already but this -P stuff is so buggy and slow that I got discouraged. (If we keep having trouble with -P I may start lobbying to remove it....) Anyway, I gave it a shot with the attached further patch. --------------090703060506070000090907 Content-Type: text/x-patch; name="0001-grep-fix-false-matches-with-P-.-and-invalid-UTF-8.patch" Content-Transfer-Encoding: 7bit Content-Disposition: attachment; filename*0="0001-grep-fix-false-matches-with-P-.-and-invalid-UTF-8.patch" >From 266b8d4485053a6733e11d43a66c09d080c520fa Mon Sep 17 00:00:00 2001 From: Paul Eggert Date: Thu, 11 Sep 2014 12:05:19 -0700 Subject: [PATCH] grep: fix false matches with -P '...$' and invalid UTF-8 * tests/pcre-invalid-utf8-input: Add a test for that. --- tests/pcre-invalid-utf8-input | 5 ++++- 1 file changed, 4 insertions(+), 1 deletion(-) diff --git a/tests/pcre-invalid-utf8-input b/tests/pcre-invalid-utf8-input index f42e0dd..9da4b18 100755 --- a/tests/pcre-invalid-utf8-input +++ b/tests/pcre-invalid-utf8-input @@ -13,9 +13,12 @@ require_en_utf8_locale_ fail=0 -printf 'j\202\nj\n' > in || framework_failure_ +printf 'j\202j\nj\nk\202\n' > in || framework_failure_ LC_ALL=en_US.UTF-8 grep -P j in test $? -eq 0 || fail=1 +LC_ALL=en_US.UTF-8 grep -P 'k$' in +test $? 
-eq 1 || fail=1 + Exit $fail -- 1.9.3 --------------090703060506070000090907-- From debbugs-submit-bounces@debbugs.gnu.org Thu Sep 11 20:37:06 2014 Received: (at 18266) by debbugs.gnu.org; 12 Sep 2014 00:37:06 +0000 Received: from localhost ([127.0.0.1]:38661 helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.80) (envelope-from ) id 1XSErJ-0003AQ-Fx for submit@debbugs.gnu.org; Thu, 11 Sep 2014 20:37:05 -0400 Received: from ioooi.vinc17.net ([92.243.22.117]:47043) by debbugs.gnu.org with esmtp (Exim 4.80) (envelope-from ) id 1XSErG-00039B-3N for 18266@debbugs.gnu.org; Thu, 11 Sep 2014 20:37:03 -0400 Received: from smtp-xvii.vinc17.net (128.119.75.86.rev.sfr.net [86.75.119.128]) by ioooi.vinc17.net (Postfix) with ESMTPSA id 9A86F29A; Fri, 12 Sep 2014 02:37:00 +0200 (CEST) Received: by xvii.vinc17.org (Postfix, from userid 1000) id 17F5721A079; Fri, 12 Sep 2014 02:37:00 +0200 (CEST) Date: Fri, 12 Sep 2014 02:36:59 +0200 From: Vincent Lefevre To: Paul Eggert Subject: Re: handling bytes not part of the charset, and other garbage Message-ID: <20140912003659.GA18162@xvii.vinc17.org> References: <20140816140127.GA2252@nomada> <20140816162621.GM5034@xvii.vinc17.org> <20140816175637.GA6115@nomada> <53EFA4AC.7090308@cs.ucla.edu> <20140829054754.GC5210@nomada> <54008391.6030801@cs.ucla.edu> <20140901081822.GB3775@ypig.lip.ens-lyon.fr> <54042EF9.6000309@cs.ucla.edu> <20140911110700.GA20565@ypig.lip.ens-lyon.fr> <5411CC59.10407@cs.ucla.edu> MIME-Version: 1.0 Content-Type: text/plain; charset=iso-8859-1 Content-Disposition: inline Content-Transfer-Encoding: 8bit In-Reply-To: <5411CC59.10407@cs.ucla.edu> X-Mailer-Info: http://www.vinc17.net/mutt/ User-Agent: Mutt/1.5.23-6361-vl-r59709 (2014-07-25) X-Spam-Score: -2.5 (--) X-Debbugs-Envelope-To: 18266 Cc: 18266@debbugs.gnu.org, Santiago , 758105@bugs.debian.org X-BeenThere: debbugs-submit@debbugs.gnu.org X-Mailman-Version: 2.1.15 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: 
List-Subscribe: , Errors-To: debbugs-submit-bounces@debbugs.gnu.org Sender: "Debbugs-submit" X-Spam-Score: -2.5 (--) On 2014-09-11 09:22:49 -0700, Paul Eggert wrote: > Vincent Lefevre wrote: > > >There's no reason that '.' matches something that doesn't belong to > >the charset in C locale, but doesn't match in a UTF-8 locale. > > In the C locale on GNU/Linux, all byte values are members of the charset. I don't see any valid reason for that (the C locale corresponds to ANSI_X3.4-1968, which is 7-bit only, so that there is some inconsistency), except that it could be seen as more practical. But then, I would say that this should be the same for invalid byte sequences in a UTF-8 locale. -- Vincent Lefèvre - Web: 100% accessible validated (X)HTML - Blog: Work: CR INRIA - computer arithmetic / AriC project (LIP, ENS-Lyon) From debbugs-submit-bounces@debbugs.gnu.org Thu Sep 11 21:16:47 2014 Received: (at 18266) by debbugs.gnu.org; 12 Sep 2014 01:16:47 +0000 Received: from localhost ([127.0.0.1]:38669 helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.80) (envelope-from ) id 1XSFTi-00047p-06 for submit@debbugs.gnu.org; Thu, 11 Sep 2014 21:16:47 -0400 Received: from smtp.cs.ucla.edu ([131.179.128.62]:41590) by debbugs.gnu.org with esmtp (Exim 4.80) (envelope-from ) id 1XSFTf-00047e-Ak for 18266@debbugs.gnu.org; Thu, 11 Sep 2014 21:16:44 -0400 Received: from localhost (localhost.localdomain [127.0.0.1]) by smtp.cs.ucla.edu (Postfix) with ESMTP id ECEEAA60001; Thu, 11 Sep 2014 18:16:41 -0700 (PDT) X-Virus-Scanned: amavisd-new at smtp.cs.ucla.edu Received: from smtp.cs.ucla.edu ([127.0.0.1]) by localhost (smtp.cs.ucla.edu [127.0.0.1]) (amavisd-new, port 10024) with ESMTP id wlpmjCshWs3A; Thu, 11 Sep 2014 18:16:33 -0700 (PDT) Received: from [192.168.1.9] (pool-71-177-17-123.lsanca.dsl-w.verizon.net [71.177.17.123]) by smtp.cs.ucla.edu (Postfix) with ESMTPSA id 3D22CA6001D; Thu, 11 Sep 2014 18:16:33 -0700 (PDT) Message-ID: <5412496D.8050805@cs.ucla.edu> Date: Thu, 11 
Sep 2014 18:16:29 -0700 From: Paul Eggert Organization: UCLA Computer Science Department User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:31.0) Gecko/20100101 Thunderbird/31.1.1 MIME-Version: 1.0 To: Vincent Lefevre Subject: Re: bug#18266: handling bytes not part of the charset, and other garbage References: <20140816140127.GA2252@nomada> <20140816162621.GM5034@xvii.vinc17.org> <20140816175637.GA6115@nomada> <53EFA4AC.7090308@cs.ucla.edu> <20140829054754.GC5210@nomada> <54008391.6030801@cs.ucla.edu> <20140901081822.GB3775@ypig.lip.ens-lyon.fr> <54042EF9.6000309@cs.ucla.edu> <20140911110700.GA20565@ypig.lip.ens-lyon.fr> <5411CC59.10407@cs.ucla.edu> <20140912003659.GA18162@xvii.vinc17.org> In-Reply-To: <20140912003659.GA18162@xvii.vinc17.org> Content-Type: text/plain; charset=utf-8; format=flowed Content-Transfer-Encoding: 7bit X-Spam-Score: -4.8 (----) X-Debbugs-Envelope-To: 18266 Cc: 18266@debbugs.gnu.org, 758105@bugs.debian.org X-BeenThere: debbugs-submit@debbugs.gnu.org X-Mailman-Version: 2.1.15 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: debbugs-submit-bounces@debbugs.gnu.org Sender: "Debbugs-submit" X-Spam-Score: -4.8 (----) Vincent Lefevre wrote: > the C locale corresponds to ANSI_X3.4-1968, No it doesn't, at least not on any current platform I'm aware of. And POSIX does not require that. POSIX even allows the C locale to be multibyte, e.g., UTF-8. > I would say that this should be the same for invalid > byte sequences in a UTF-8 locale. One *could* design an encoding with that property, but it wouldn't be UTF-8; it would be something else. I don't know of any C library that does that to UTF-8. There are good arguments against doing it, e.g., one loses the property that one can concatenate character strings by concatenating their byte representations. Anyway I'm afraid we may be going off the deep end here. 
After all, grep can't impose its coding system design onto the operating system; it's more the other way around. From debbugs-submit-bounces@debbugs.gnu.org Thu Sep 11 21:41:30 2014 Received: (at 18266) by debbugs.gnu.org; 12 Sep 2014 01:41:30 +0000 Received: from localhost ([127.0.0.1]:38691 helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.80) (envelope-from ) id 1XSFrd-0004js-S1 for submit@debbugs.gnu.org; Thu, 11 Sep 2014 21:41:30 -0400 Received: from ioooi.vinc17.net ([92.243.22.117]:47059) by debbugs.gnu.org with esmtp (Exim 4.80) (envelope-from ) id 1XSFra-0004jf-GS for 18266@debbugs.gnu.org; Thu, 11 Sep 2014 21:41:27 -0400 Received: from smtp-xvii.vinc17.net (128.119.75.86.rev.sfr.net [86.75.119.128]) by ioooi.vinc17.net (Postfix) with ESMTPSA id 5E9C52CC; Fri, 12 Sep 2014 03:41:25 +0200 (CEST) Received: by xvii.vinc17.org (Postfix, from userid 1000) id ED3BF21A079; Fri, 12 Sep 2014 03:41:24 +0200 (CEST) Date: Fri, 12 Sep 2014 03:41:24 +0200 From: Vincent Lefevre To: Paul Eggert Subject: Re: bug#18266: handling bytes not part of the charset, and other garbage Message-ID: <20140912014124.GA4404@xvii.vinc17.org> References: <20140816175637.GA6115@nomada> <53EFA4AC.7090308@cs.ucla.edu> <20140829054754.GC5210@nomada> <54008391.6030801@cs.ucla.edu> <20140901081822.GB3775@ypig.lip.ens-lyon.fr> <54042EF9.6000309@cs.ucla.edu> <20140911110700.GA20565@ypig.lip.ens-lyon.fr> <5411CC59.10407@cs.ucla.edu> <20140912003659.GA18162@xvii.vinc17.org> <5412496D.8050805@cs.ucla.edu> MIME-Version: 1.0 Content-Type: text/plain; charset=iso-8859-1 Content-Disposition: inline Content-Transfer-Encoding: 8bit In-Reply-To: <5412496D.8050805@cs.ucla.edu> X-Mailer-Info: http://www.vinc17.net/mutt/ User-Agent: Mutt/1.5.23-6361-vl-r59709 (2014-07-25) X-Spam-Score: -2.5 (--) X-Debbugs-Envelope-To: 18266 Cc: 18266@debbugs.gnu.org, 758105@bugs.debian.org X-BeenThere: debbugs-submit@debbugs.gnu.org X-Mailman-Version: 2.1.15 Precedence: list List-Id: List-Unsubscribe: , List-Archive: 
List-Post: List-Help: List-Subscribe: , Errors-To: debbugs-submit-bounces@debbugs.gnu.org Sender: "Debbugs-submit" X-Spam-Score: -2.5 (--) On 2014-09-11 18:16:29 -0700, Paul Eggert wrote: > Vincent Lefevre wrote: > >the C locale corresponds to ANSI_X3.4-1968, > > No it doesn't, at least not on any current platform I'm aware of. It does on Debian: ypig% LC_ALL=C locale charmap ANSI_X3.4-1968 > >I would say that this should be the same for invalid > >byte sequences in a UTF-8 locale. > > One *could* design an encoding with that property, but it wouldn't be UTF-8; > it would be something else. I don't know of any C library that does that to > UTF-8. There are good arguments against doing it, e.g., one loses the > property that one can concatenate character strings by concatenating their > byte representations. I'm talking only about grep here. BTW, the current behavior breaks the sometimes used "grep ." solution to match non-empty lines. -- Vincent Lefèvre - Web: 100% accessible validated (X)HTML - Blog: Work: CR INRIA - computer arithmetic / AriC project (LIP, ENS-Lyon) From debbugs-submit-bounces@debbugs.gnu.org Thu Sep 11 21:42:51 2014 Received: (at 18266) by debbugs.gnu.org; 12 Sep 2014 01:42:51 +0000 Received: from localhost ([127.0.0.1]:38695 helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.80) (envelope-from ) id 1XSFsw-0004mD-MX for submit@debbugs.gnu.org; Thu, 11 Sep 2014 21:42:51 -0400 Received: from ioooi.vinc17.net ([92.243.22.117]:47060) by debbugs.gnu.org with esmtp (Exim 4.80) (envelope-from ) id 1XSFsu-0004m5-5p for 18266@debbugs.gnu.org; Thu, 11 Sep 2014 21:42:48 -0400 Received: from smtp-xvii.vinc17.net (128.119.75.86.rev.sfr.net [86.75.119.128]) by ioooi.vinc17.net (Postfix) with ESMTPSA id 748F62CC; Fri, 12 Sep 2014 03:42:47 +0200 (CEST) Received: by xvii.vinc17.org (Postfix, from userid 1000) id 3EE5C21A079; Fri, 12 Sep 2014 03:42:47 +0200 (CEST) Date: Fri, 12 Sep 2014 03:42:47 +0200 From: Vincent Lefevre To: Paul Eggert Subject: Re: 
bug#18266: Bug#758105: bug#18266: grep -P and invalid exits with error Message-ID: <20140912014247.GB4404@xvii.vinc17.org> References: <20140910004053.6D8A.27F6AC2D@kcn.ne.jp> <540F5C1F.2040008@cs.ucla.edu> <20140910083909.6495.27F6AC2D@kcn.ne.jp> <540F94B3.1040804@cs.ucla.edu> <540FF8E2.9080903@cs.ucla.edu> <20140910112235.GA17843@nomada> <20140911081510.GA4859@ypig.lip.ens-lyon.fr> <5411D6E5.4000402@cs.ucla.edu> MIME-Version: 1.0 Content-Type: text/plain; charset=iso-8859-1 Content-Disposition: inline Content-Transfer-Encoding: 8bit In-Reply-To: <5411D6E5.4000402@cs.ucla.edu> X-Mailer-Info: http://www.vinc17.net/mutt/ User-Agent: Mutt/1.5.23-6361-vl-r59709 (2014-07-25) X-Spam-Score: -2.5 (--) X-Debbugs-Envelope-To: 18266 Cc: 18266@debbugs.gnu.org, Santiago , 758105@bugs.debian.org, 761157@bugs.debian.org X-BeenThere: debbugs-submit@debbugs.gnu.org X-Mailman-Version: 2.1.15 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: debbugs-submit-bounces@debbugs.gnu.org Sender: "Debbugs-submit" X-Spam-Score: -2.5 (--) On 2014-09-11 10:07:49 -0700, Paul Eggert wrote: > Vincent Lefevre wrote: > >I've just reported a new Debian concerning the performance problem. > > It's not clear from http://bugs.debian.org/761157 that the performance > problem occurs only with -P, but I assume that's what is meant. It's specific to -P: 2.18-2 0.9s with -P, 0.4s without -P 2.20-3 11.6s with -P, 0.4s without -P > Since this is a performance bug with PCRE, I suggest moving the Debian bug > report to the Debian libpcre3 package. Grep cannot go back to the old way, > which could cause grep to crash, and the bug cannot be fixed in grep because > libpcre3 does not provide a fast way to search arbitrary data that may > include encoding errors. It really is a problem that requires changes to > libpcre3 to fix; grep cannot fix it. 
Fixing the performance problem in libpcre3 would indeed be better (even with the old version of grep, libpcre3 was twice as slow as grep, but this is less critical than a 13x slowdown). However a workaround in grep could be simpler. I've just opened a new bug and suggested several solutions: http://debbugs.gnu.org/cgi/bugreport.cgi?bug=18454 > In the meantime, in order to use 'grep' to search for strings in arbitrary > data, I suggest omitting the '-P'. This is a bit annoying because I sometimes use specific PCRE features. I could try to parse the arguments, detect where the pattern is used, and avoid -P if the pattern doesn't use specific PCRE features (at least for the most common forms). An additional advantage is that it could be twice as fast in most cases (see above). This could also be done in grep, as I suggested in my new bug report. > Also, I suggest using the C locale. This could be a solution, because in practice, I pipe the result to "less -FRX", but only grep has to use the C locale, so that the accented characters are correctly displayed by "less". However with some (rare?) patterns, it won't work because an accented character would no longer be seen as a single character. 
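A minimal sketch of the C-locale workaround discussed above, not taken from the thread. The wrapper name cgrep is hypothetical and GNU grep is assumed. Only grep runs in the C locale, so a pager later in the pipeline (such as "less -FRX") keeps the user's locale and still displays accented characters correctly.

```shell
# Hypothetical wrapper: run only grep in the C locale, so that every byte
# is treated as a single-byte character and no encoding errors arise.
cgrep() {
  LC_ALL=C grep "$@"
}

# A stray byte (octal 200) between "a" and "b" is matched by "." here;
# -c prints the number of matching lines.
printf 'a\200b\n' | cgrep -c '^a.b$'
```

The caveat mentioned above applies: in the C locale the UTF-8 encoding of "é" is two bytes (octal 303 251), so a pattern such as '^a.b$' would not match "aéb", while '^a..b$' would.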
-- Vincent Lefèvre - Web: 100% accessible validated (X)HTML - Blog: Work: CR INRIA - computer arithmetic / AriC project (LIP, ENS-Lyon) From debbugs-submit-bounces@debbugs.gnu.org Thu Sep 11 23:26:26 2014 Received: (at 18266) by debbugs.gnu.org; 12 Sep 2014 03:26:26 +0000 Received: from localhost ([127.0.0.1]:38738 helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.80) (envelope-from ) id 1XSHVB-0007O3-Cv for submit@debbugs.gnu.org; Thu, 11 Sep 2014 23:26:25 -0400 Received: from smtp.cs.ucla.edu ([131.179.128.62]:45522) by debbugs.gnu.org with esmtp (Exim 4.80) (envelope-from ) id 1XSHV9-0007Nr-0R for 18266@debbugs.gnu.org; Thu, 11 Sep 2014 23:26:23 -0400 Received: from localhost (localhost.localdomain [127.0.0.1]) by smtp.cs.ucla.edu (Postfix) with ESMTP id D2E7DA6001D; Thu, 11 Sep 2014 20:26:21 -0700 (PDT) X-Virus-Scanned: amavisd-new at smtp.cs.ucla.edu Received: from smtp.cs.ucla.edu ([127.0.0.1]) by localhost (smtp.cs.ucla.edu [127.0.0.1]) (amavisd-new, port 10024) with ESMTP id wqT9Lr2Q4v-M; Thu, 11 Sep 2014 20:26:13 -0700 (PDT) Received: from [192.168.1.9] (pool-71-177-17-123.lsanca.dsl-w.verizon.net [71.177.17.123]) by smtp.cs.ucla.edu (Postfix) with ESMTPSA id 194ACA60001; Thu, 11 Sep 2014 20:26:13 -0700 (PDT) Message-ID: <541267D4.7030605@cs.ucla.edu> Date: Thu, 11 Sep 2014 20:26:12 -0700 From: Paul Eggert Organization: UCLA Computer Science Department User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:31.0) Gecko/20100101 Thunderbird/31.1.1 MIME-Version: 1.0 To: Vincent Lefevre Subject: Re: bug#18266: handling bytes not part of the charset, and other garbage References: <20140816175637.GA6115@nomada> <53EFA4AC.7090308@cs.ucla.edu> <20140829054754.GC5210@nomada> <54008391.6030801@cs.ucla.edu> <20140901081822.GB3775@ypig.lip.ens-lyon.fr> <54042EF9.6000309@cs.ucla.edu> <20140911110700.GA20565@ypig.lip.ens-lyon.fr> <5411CC59.10407@cs.ucla.edu> <20140912003659.GA18162@xvii.vinc17.org> <5412496D.8050805@cs.ucla.edu> <20140912014124.GA4404@xvii.vinc17.org> 
In-Reply-To: <20140912014124.GA4404@xvii.vinc17.org> Content-Type: text/plain; charset=utf-8; format=flowed Content-Transfer-Encoding: 7bit X-Spam-Score: -4.8 (----) X-Debbugs-Envelope-To: 18266 Cc: 18266@debbugs.gnu.org, 758105@bugs.debian.org X-BeenThere: debbugs-submit@debbugs.gnu.org X-Mailman-Version: 2.1.15 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: debbugs-submit-bounces@debbugs.gnu.org Sender: "Debbugs-submit" X-Spam-Score: -4.8 (----) Vincent Lefevre wrote: > ypig% LC_ALL=C locale charmap > ANSI_X3.4-1968 That may be what the 'locale' command says, but bytes with the top bit on are considered to be valid single-byte characters. There are no encoding errors. So, in that sense it's not strict ASCII. > the current behavior breaks the sometimes used "grep ." solution > to match non-empty lines. "grep ." matches lines containing one or more characters. Encoding errors are not characters, at least, not as far as plain grep is concerned. Perhaps PCRE is different, and if libpcre worked with encoding errors we could simply use its way of matching them. But there doesn't seem to be a safe way to do that. 
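A minimal sketch related to the "grep ." discussion, not part of the original message. If the goal is only to select non-empty lines, inverting a match on the empty line avoids the question of what "." matches. The function name nonempty_lines is hypothetical; -a is assumed to be available (it is a GNU grep option) and keeps input with encoding errors from being reported as a binary file.

```shell
# Hypothetical helper: print every line that is not empty, whether it holds
# valid characters or bytes that are encoding errors in the current locale.
# '^$' matches only empty lines, and -v inverts the selection.
nonempty_lines() {
  grep -a -v '^$' "$@"
}
```

For example, "printf 'a\n\n\200\n' | nonempty_lines | wc -l" counts the lines that survive the filter; the empty middle line is the only one dropped.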
From debbugs-submit-bounces@debbugs.gnu.org Thu Sep 11 23:42:03 2014 Received: (at control) by debbugs.gnu.org; 12 Sep 2014 03:42:03 +0000 Received: from localhost ([127.0.0.1]:38747 helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.80) (envelope-from ) id 1XSHkI-0007mG-Tw for submit@debbugs.gnu.org; Thu, 11 Sep 2014 23:42:03 -0400 Received: from smtp.cs.ucla.edu ([131.179.128.62]:46022) by debbugs.gnu.org with esmtp (Exim 4.80) (envelope-from ) id 1XSHkG-0007lq-Iq for control@debbugs.gnu.org; Thu, 11 Sep 2014 23:42:01 -0400 Received: from localhost (localhost.localdomain [127.0.0.1]) by smtp.cs.ucla.edu (Postfix) with ESMTP id 29DDCA6001E for ; Thu, 11 Sep 2014 20:42:00 -0700 (PDT) X-Virus-Scanned: amavisd-new at smtp.cs.ucla.edu Received: from smtp.cs.ucla.edu ([127.0.0.1]) by localhost (smtp.cs.ucla.edu [127.0.0.1]) (amavisd-new, port 10024) with ESMTP id jFVPB3FK1C6n for ; Thu, 11 Sep 2014 20:41:51 -0700 (PDT) Received: from [192.168.1.9] (pool-71-177-17-123.lsanca.dsl-w.verizon.net [71.177.17.123]) by smtp.cs.ucla.edu (Postfix) with ESMTPSA id 9DA6EA60001 for ; Thu, 11 Sep 2014 20:41:51 -0700 (PDT) Message-ID: <54126B7F.9000402@cs.ucla.edu> Date: Thu, 11 Sep 2014 20:41:51 -0700 From: Paul Eggert Organization: UCLA Computer Science Department User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:31.0) Gecko/20100101 Thunderbird/31.1.1 MIME-Version: 1.0 To: control@debbugs.gnu.org Subject: 18455 is a duplicate of 18266 Content-Type: text/plain; charset=utf-8; format=flowed Content-Transfer-Encoding: 7bit X-Spam-Score: -4.8 (----) X-Debbugs-Envelope-To: control X-BeenThere: debbugs-submit@debbugs.gnu.org X-Mailman-Version: 2.1.15 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: debbugs-submit-bounces@debbugs.gnu.org Sender: "Debbugs-submit" X-Spam-Score: -4.8 (----) forcemerge 18266 18455 From debbugs-submit-bounces@debbugs.gnu.org Fri Sep 12 04:29:20 2014 Received: (at 18266) by debbugs.gnu.org; 12 
Sep 2014 08:29:20 +0000 Received: from localhost ([127.0.0.1]:38850 helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.80) (envelope-from ) id 1XSMEK-0000aD-2e for submit@debbugs.gnu.org; Fri, 12 Sep 2014 04:29:20 -0400 Received: from ioooi.vinc17.net ([92.243.22.117]:47137) by debbugs.gnu.org with esmtp (Exim 4.80) (envelope-from ) id 1XSMEH-0000a4-50 for 18266@debbugs.gnu.org; Fri, 12 Sep 2014 04:29:18 -0400 Received: from smtp-xvii.vinc17.net (128.119.75.86.rev.sfr.net [86.75.119.128]) by ioooi.vinc17.net (Postfix) with ESMTPSA id 8C8E73B; Fri, 12 Sep 2014 10:29:16 +0200 (CEST) Received: by xvii.vinc17.org (Postfix, from userid 1000) id 5391421A079; Fri, 12 Sep 2014 10:29:16 +0200 (CEST) Date: Fri, 12 Sep 2014 10:29:16 +0200 From: Vincent Lefevre To: Paul Eggert Subject: Re: bug#18266: handling bytes not part of the charset, and other garbage Message-ID: <20140912082916.GD4404@xvii.vinc17.org> References: <20140829054754.GC5210@nomada> <54008391.6030801@cs.ucla.edu> <20140901081822.GB3775@ypig.lip.ens-lyon.fr> <54042EF9.6000309@cs.ucla.edu> <20140911110700.GA20565@ypig.lip.ens-lyon.fr> <5411CC59.10407@cs.ucla.edu> <20140912003659.GA18162@xvii.vinc17.org> <5412496D.8050805@cs.ucla.edu> <20140912014124.GA4404@xvii.vinc17.org> <541267D4.7030605@cs.ucla.edu> MIME-Version: 1.0 Content-Type: text/plain; charset=iso-8859-1 Content-Disposition: inline Content-Transfer-Encoding: 8bit In-Reply-To: <541267D4.7030605@cs.ucla.edu> X-Mailer-Info: http://www.vinc17.net/mutt/ User-Agent: Mutt/1.5.23-6361-vl-r59709 (2014-07-25) X-Spam-Score: -2.5 (--) X-Debbugs-Envelope-To: 18266 Cc: 18266@debbugs.gnu.org, 758105@bugs.debian.org X-BeenThere: debbugs-submit@debbugs.gnu.org X-Mailman-Version: 2.1.15 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: debbugs-submit-bounces@debbugs.gnu.org Sender: "Debbugs-submit" X-Spam-Score: -2.5 (--) On 2014-09-11 20:26:12 -0700, Paul Eggert wrote: > Vincent Lefevre wrote: > > 
>ypig% LC_ALL=C locale charmap > >ANSI_X3.4-1968 > > That may be what the 'locale' command says, but bytes with the top bit on > are considered to be valid single-byte characters. There are no encoding > errors. So, in that sense it's not strict ASCII. Glibc regards it as ASCII: $ printf '\xe8' | LC_ALL=C iconv iconv: illegal input sequence at position 0 > >the current behavior breaks the sometimes used "grep ." solution > >to match non-empty lines. > > "grep ." matches lines containing one or more characters. Encoding errors > are not characters, at least, not as far as plain grep is concerned. I just mean that "grep ." is a method given by some people, that was working before UTF-8. -- Vincent Lefèvre - Web: 100% accessible validated (X)HTML - Blog: Work: CR INRIA - computer arithmetic / AriC project (LIP, ENS-Lyon) From debbugs-submit-bounces@debbugs.gnu.org Fri Sep 12 12:13:47 2014 Received: (at 18266) by debbugs.gnu.org; 12 Sep 2014 16:13:47 +0000 Received: from localhost ([127.0.0.1]:39594 helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.80) (envelope-from ) id 1XSTTn-0004eS-5s for submit@debbugs.gnu.org; Fri, 12 Sep 2014 12:13:47 -0400 Received: from mail-wi0-f171.google.com ([209.85.212.171]:55337) by debbugs.gnu.org with esmtp (Exim 4.80) (envelope-from ) id 1XSTTj-0004eD-JW; Fri, 12 Sep 2014 12:13:44 -0400 Received: by mail-wi0-f171.google.com with SMTP id bs8so910765wib.16 for ; Fri, 12 Sep 2014 09:13:42 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113; h=mime-version:sender:in-reply-to:references:from:date:message-id :subject:to:cc:content-type; bh=3E7qtEHc5yebH8BOWse8z10Ur6OoAPvYyHqImlKQAzw=; b=L5q3i1FB7//eoUqFKhIHqJOCfyYY7S0V5VNOK0rCh5aV2gSxdlB5BIUOauBFJ8NdyX w0dKlAeZTLscq91wfb9DMOMY3c9kP109YMOXO+5Ts43rJrvAXsCAEBJc1bMgqo41dtiN rAm9Yzp+KjoU6BYvr5GC3iXPc1KMrdIfpzrd6z8Zb8IeWnqhJYMYIP5hyd2gX6NVZ/Ly LXvgaLG+rXVoAvf29y7czfRyO1JyWi2lf7mCcj+9NfVvZSkwV/r0n1iRm+zTM9Om+0ki 
rM10A23+MkHsUVIQTDAsaHmBdRtfjtq0oM2keiQsuWU7yYayO9N8euAOqtSdvitHmYzg hoEQ== X-Received: by 10.180.78.226 with SMTP id e2mr3583722wix.68.1410538422637; Fri, 12 Sep 2014 09:13:42 -0700 (PDT) MIME-Version: 1.0 Received: by 10.194.41.202 with HTTP; Fri, 12 Sep 2014 09:13:22 -0700 (PDT) In-Reply-To: <5411F3A3.9050403@cs.ucla.edu> References: <20140910004053.6D8A.27F6AC2D@kcn.ne.jp> <540F5C1F.2040008@cs.ucla.edu> <20140910083909.6495.27F6AC2D@kcn.ne.jp> <540F94B3.1040804@cs.ucla.edu> <540FF8E2.9080903@cs.ucla.edu> <20140910112235.GA17843@nomada> <20140911081510.GA4859@ypig.lip.ens-lyon.fr> <5411D6E5.4000402@cs.ucla.edu> <5411F3A3.9050403@cs.ucla.edu> From: Jim Meyering Date: Fri, 12 Sep 2014 09:13:22 -0700 X-Google-Sender-Auth: UaHmQnW__CYnm1Y4UAMVyV3LzFw Message-ID: Subject: Re: bug#18266: Bug#758105: bug#18266: grep -P and invalid exits with error To: Paul Eggert Content-Type: text/plain; charset=ISO-8859-1 X-Spam-Score: -0.7 (/) X-Debbugs-Envelope-To: 18266 Cc: 761157 <761157@bugs.debian.org>, Santiago , 18266 <18266@debbugs.gnu.org>, 18266-done <18266-done@debbugs.gnu.org>, Vincent Lefevre , 758105 <758105@bugs.debian.org> X-BeenThere: debbugs-submit@debbugs.gnu.org X-Mailman-Version: 2.1.15 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: debbugs-submit-bounces@debbugs.gnu.org Sender: "Debbugs-submit" X-Spam-Score: -0.7 (/) On Thu, Sep 11, 2014 at 12:10 PM, Paul Eggert wrote: > On 09/11/2014 11:37 AM, Jim Meyering wrote: >> >> Would you mind adding a test to trigger that one? > > Ordinarily I would have done that already but this -P stuff is so buggy and > slow that I got discouraged. (If we keep having trouble with -P I may start > lobbying to remove it....) Anyway, I gave it a shot with the attached > further patch. Thank you. Looks perfect. I too rely on grep's -P, sometimes using PCRE features that are very hard to emulate using EREs. 
From debbugs-submit-bounces@debbugs.gnu.org Fri Sep 12 12:17:00 2014 Received: (at 18266) by debbugs.gnu.org; 12 Sep 2014 16:17:00 +0000 Received: from localhost ([127.0.0.1]:39599 helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.80) (envelope-from ) id 1XSTWs-0004jZ-VK for submit@debbugs.gnu.org; Fri, 12 Sep 2014 12:16:59 -0400 Received: from smtp.cs.ucla.edu ([131.179.128.62]:43177) by debbugs.gnu.org with esmtp (Exim 4.80) (envelope-from ) id 1XSTWq-0004jQ-4k for 18266@debbugs.gnu.org; Fri, 12 Sep 2014 12:16:57 -0400 Received: from localhost (localhost.localdomain [127.0.0.1]) by smtp.cs.ucla.edu (Postfix) with ESMTP id 2E471A6001D; Fri, 12 Sep 2014 09:16:55 -0700 (PDT) X-Virus-Scanned: amavisd-new at smtp.cs.ucla.edu Received: from smtp.cs.ucla.edu ([127.0.0.1]) by localhost (smtp.cs.ucla.edu [127.0.0.1]) (amavisd-new, port 10024) with ESMTP id c7qVwtUv9AmL; Fri, 12 Sep 2014 09:16:46 -0700 (PDT) Received: from [192.168.1.9] (pool-71-177-17-123.lsanca.dsl-w.verizon.net [71.177.17.123]) by smtp.cs.ucla.edu (Postfix) with ESMTPSA id 03936A6002D; Fri, 12 Sep 2014 09:16:46 -0700 (PDT) Message-ID: <54131C6D.4070503@cs.ucla.edu> Date: Fri, 12 Sep 2014 09:16:45 -0700 From: Paul Eggert Organization: UCLA Computer Science Department User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:31.0) Gecko/20100101 Thunderbird/31.1.1 MIME-Version: 1.0 To: Vincent Lefevre Subject: Re: bug#18266: handling bytes not part of the charset, and other garbage References: <20140829054754.GC5210@nomada> <54008391.6030801@cs.ucla.edu> <20140901081822.GB3775@ypig.lip.ens-lyon.fr> <54042EF9.6000309@cs.ucla.edu> <20140911110700.GA20565@ypig.lip.ens-lyon.fr> <5411CC59.10407@cs.ucla.edu> <20140912003659.GA18162@xvii.vinc17.org> <5412496D.8050805@cs.ucla.edu> <20140912014124.GA4404@xvii.vinc17.org> <541267D4.7030605@cs.ucla.edu> <20140912082916.GD4404@xvii.vinc17.org> In-Reply-To: <20140912082916.GD4404@xvii.vinc17.org> Content-Type: text/plain; charset=utf-8; format=flowed 
Content-Transfer-Encoding: 7bit X-Spam-Score: -4.5 (----) X-Debbugs-Envelope-To: 18266 Cc: 18266@debbugs.gnu.org, 758105@bugs.debian.org X-BeenThere: debbugs-submit@debbugs.gnu.org X-Mailman-Version: 2.1.15 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: debbugs-submit-bounces@debbugs.gnu.org Sender: "Debbugs-submit" X-Spam-Score: -4.5 (----) Vincent Lefevre wrote: > Glibc regards it as ASCII: You're right. Sorry, I was confused. FreeBSD, Solaris, and AIX work the way that I thought, though. Plus, in GNU regular expressions the pattern "." works the way that I thought with LC_ALL=C; my guess (without investigating this) is that this is because whoever wrote the regex code assumed the BSDish behavior. Arguably this is a glitch in the GNU regex code, in that for consistency "." should not match encoding errors in unibyte locales. Here's a pair of test cases to illustrate the glitch: $ printf '\200\n' | LC_ALL=en_US.utf8 grep '.' | wc 0 0 0 $ printf '\200\n' | LC_ALL=C grep '.' | wc 1 0 2 > I just mean that "grep ." is a method given by some people, that > was working before UTF-8. And it still works, if by "." one means "match one character". Unfortunately there is no POSIX regular expression that does what you're looking for (match either one character, or a single byte that is an encoding error). This is because POSIX says the behavior is undefined on encoding errors. The GNU syntax for regular expressions extends POSIX and does not dump core, but it still provides no way to write the pattern you're asking for, and the behavior is unspecified on encoding errors. Perhaps this should be improved by fixing the abovementioned glitch and by providing a syntax extension for matching encoding errors, though we'd need a volunteer to do that. 
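A sketch, not from the thread, of detecting encoding errors without any regular expression: converting UTF-8 to UTF-8 with iconv fails on invalid input. The function name is_valid_utf8 is hypothetical, and the iconv utility is assumed to exit with a nonzero status on an illegal input sequence, as glibc's does.

```shell
# Hypothetical helper: succeed only if the named file (or standard input
# when no file is given) is valid UTF-8. Both encodings are named
# explicitly, so the result does not depend on the current locale.
is_valid_utf8() {
  iconv -f UTF-8 -t UTF-8 "$@" >/dev/null 2>&1
}
```

This only reports whether an error exists somewhere in the input; unlike a regular-expression extension, it cannot say which lines contain the errors.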
The situation with libpcre is weirder: there's a pattern '\C' for matching a single byte even if it's an encoding error, but as far as I can tell there's no way to use regular expressions safely on arbitrary data containing encoding errors unless you're in unibyte mode (in which case '\C' provides no extra power). I.e., \C appears to be useless in any program for which undefined behavior is unacceptable. From debbugs-submit-bounces@debbugs.gnu.org Fri Sep 12 17:29:45 2014 Received: (at 18266) by debbugs.gnu.org; 12 Sep 2014 21:29:46 +0000 Received: from localhost ([127.0.0.1]:39682 helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.80) (envelope-from ) id 1XSYPY-0003uO-V1 for submit@debbugs.gnu.org; Fri, 12 Sep 2014 17:29:45 -0400 Received: from ioooi.vinc17.net ([92.243.22.117]:47292) by debbugs.gnu.org with esmtp (Exim 4.80) (envelope-from ) id 1XSYPV-0003uE-5N for 18266@debbugs.gnu.org; Fri, 12 Sep 2014 17:29:42 -0400 Received: from smtp-xvii.vinc17.net (128.119.75.86.rev.sfr.net [86.75.119.128]) by ioooi.vinc17.net (Postfix) with ESMTPSA id A5DD5131; Fri, 12 Sep 2014 23:29:39 +0200 (CEST) Received: by xvii.vinc17.org (Postfix, from userid 1000) id 58BF921A079; Fri, 12 Sep 2014 23:29:39 +0200 (CEST) Date: Fri, 12 Sep 2014 23:29:39 +0200 From: Vincent Lefevre To: Paul Eggert Subject: Re: bug#18266: handling bytes not part of the charset, and other garbage Message-ID: <20140912212939.GJ4404@xvii.vinc17.org> References: <20140901081822.GB3775@ypig.lip.ens-lyon.fr> <54042EF9.6000309@cs.ucla.edu> <20140911110700.GA20565@ypig.lip.ens-lyon.fr> <5411CC59.10407@cs.ucla.edu> <20140912003659.GA18162@xvii.vinc17.org> <5412496D.8050805@cs.ucla.edu> <20140912014124.GA4404@xvii.vinc17.org> <541267D4.7030605@cs.ucla.edu> <20140912082916.GD4404@xvii.vinc17.org> <54131C6D.4070503@cs.ucla.edu> MIME-Version: 1.0 Content-Type: text/plain; charset=iso-8859-1 Content-Disposition: inline Content-Transfer-Encoding: 8bit In-Reply-To: <54131C6D.4070503@cs.ucla.edu> X-Mailer-Info: 
http://www.vinc17.net/mutt/ User-Agent: Mutt/1.5.23-6361-vl-r59709 (2014-07-25) X-Spam-Score: -2.2 (--) X-Debbugs-Envelope-To: 18266 Cc: 18266@debbugs.gnu.org, 758105@bugs.debian.org X-BeenThere: debbugs-submit@debbugs.gnu.org X-Mailman-Version: 2.1.15 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: debbugs-submit-bounces@debbugs.gnu.org Sender: "Debbugs-submit" X-Spam-Score: -2.2 (--) On 2014-09-12 09:16:45 -0700, Paul Eggert wrote: > Vincent Lefevre wrote: > >I just mean that "grep ." is a method given by some people, that > >was working before UTF-8. > > And it still works, if by "." one means "match one character". No, by "working", I mean that "grep ." was matching any non-empty line. A non-empty line is anything that is not "\n", with valid characters and/or invalid byte sequences. > Unfortunately there is no POSIX regular expression that does what you're > looking for (match either one character, or a single byte that is an > encoding error). This is because POSIX says the behavior is undefined on > encoding errors. But since the behavior is undefined, a grep implementation is free to do anything it likes, such as make "." match invalid bytes. See below for details. > The GNU syntax for regular expressions extends POSIX and does not > dump core, but it still provides no way to write the pattern you're > asking for, and the behavior is unspecified on encoding errors. > Perhaps this should be improved by fixing the abovementioned glitch > and by providing a syntax extension for matching encoding errors, > though we'd need a volunteer to do that. I'm not sure that a syntax extension would really be useful. I think that an option to control what happens on encoding errors would be better and sufficient. For instance, a choice between the 4 following behaviors: 1. If an encoding error is encountered, grep returns an error. Some encoding errors may remain unnoticed, e.g. 
if -m is used and the max count has been reached (you can see the behavior of such an error as being similar to a file read error). The error may be signaled immediately, even when there is a match before. 2. An encoding error is never matched. I suppose that this is the current behavior in UTF-8. 3. An encoding error is regarded as a special character different from the other characters. In particular it will be matched by "." and "[^...]". Whether a sequence of invalid bytes is regarded as a single special character or several ones could be specified or not (in practice, there could be 2 possibilities: either regard each byte as a special character, or regard each longest valid prefix as a special character). The properties of this special character could be specified or not, concerning character classes (I would say that the character doesn't fall in any class, possibly except cntrl). 4. Like (3), but the character could be an existing one (such as \0). The idea behind this behavior is that the user may not really care, but wants grep to be fast. Now, unless \0 appears in the pattern under some form, replacing the encoding error by a null character would be equivalent to "(3) + the special character is in the cntrl character class". > The situation with libpcre is weirder: there's a pattern '\C' for > matching a single byte even if it's an encoding error, but as far as > I can tell there's no way to use regular expressions safely on > arbitrary data containing encoding errors unless you're in unibyte > mode (in which case '\C' provides no extra power). I.e., \C appears > to be useless in any program for which undefined behavior is > unacceptable. In the context of libpcre (which doesn't support encoding errors, contrary to Perl if I understand correctly), \C can still be used and be useful when there are no encoding errors. 
But note that the pcresyntax(3) man page says "best avoided", the pcrepattern(3) man page says that it can yield undefined behavior (but gives a complex example where it can be used), and the perlre(1) man page says that \C is deprecated. So, grep could say that \C is not supported. -- Vincent Lefèvre - Web: 100% accessible validated (X)HTML - Blog: Work: CR INRIA - computer arithmetic / AriC project (LIP, ENS-Lyon) From debbugs-submit-bounces@debbugs.gnu.org Fri Sep 12 17:39:48 2014 Received: (at 18266) by debbugs.gnu.org; 12 Sep 2014 21:39:48 +0000 Received: from localhost ([127.0.0.1]:39686 helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.80) (envelope-from ) id 1XSYZH-00049t-KH for submit@debbugs.gnu.org; Fri, 12 Sep 2014 17:39:47 -0400 Received: from smtp.cs.ucla.edu ([131.179.128.62]:58498) by debbugs.gnu.org with esmtp (Exim 4.80) (envelope-from ) id 1XSYZF-00049i-DP for 18266@debbugs.gnu.org; Fri, 12 Sep 2014 17:39:46 -0400 Received: from localhost (localhost.localdomain [127.0.0.1]) by smtp.cs.ucla.edu (Postfix) with ESMTP id 2E311A60018; Fri, 12 Sep 2014 14:39:44 -0700 (PDT) X-Virus-Scanned: amavisd-new at smtp.cs.ucla.edu Received: from smtp.cs.ucla.edu ([127.0.0.1]) by localhost (smtp.cs.ucla.edu [127.0.0.1]) (amavisd-new, port 10024) with ESMTP id 08ip3I-+VIWu; Fri, 12 Sep 2014 14:39:35 -0700 (PDT) Received: from penguin.cs.ucla.edu (Penguin.CS.UCLA.EDU [131.179.64.200]) by smtp.cs.ucla.edu (Postfix) with ESMTPSA id 8C06BA60010; Fri, 12 Sep 2014 14:39:35 -0700 (PDT) Message-ID: <54136817.5040309@cs.ucla.edu> Date: Fri, 12 Sep 2014 14:39:35 -0700 From: Paul Eggert Organization: UCLA Computer Science Department User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:31.0) Gecko/20100101 Thunderbird/31.1.0 MIME-Version: 1.0 To: Vincent Lefevre Subject: Re: bug#18266: handling bytes not part of the charset, and other garbage References: <20140901081822.GB3775@ypig.lip.ens-lyon.fr> <54042EF9.6000309@cs.ucla.edu> 
<20140911110700.GA20565@ypig.lip.ens-lyon.fr> <5411CC59.10407@cs.ucla.edu> <20140912003659.GA18162@xvii.vinc17.org> <5412496D.8050805@cs.ucla.edu> <20140912014124.GA4404@xvii.vinc17.org> <541267D4.7030605@cs.ucla.edu> <20140912082916.GD4404@xvii.vinc17.org> <54131C6D.4070503@cs.ucla.edu> <20140912212939.GJ4404@xvii.vinc17.org> In-Reply-To: <20140912212939.GJ4404@xvii.vinc17.org> Content-Type: text/plain; charset=utf-8; format=flowed Content-Transfer-Encoding: 7bit X-Spam-Score: -4.5 (----) X-Debbugs-Envelope-To: 18266 Cc: 18266@debbugs.gnu.org, 758105@bugs.debian.org X-BeenThere: debbugs-submit@debbugs.gnu.org X-Mailman-Version: 2.1.15 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: debbugs-submit-bounces@debbugs.gnu.org Sender: "Debbugs-submit" X-Spam-Score: -4.5 (----) On 09/12/2014 02:29 PM, Vincent Lefevre wrote: > an option to control what happens on encoding errors would be better > and sufficient. It might suffice for your use cases, but it's more complicated and less flexible than being able to match bytes within the regular expression. (Plus, someone would have to implement it, which is perhaps the biggest objection to either approach ....) But I take your point that \C is best avoided. This whole area is pretty hairy, I'm afraid. Speaking of hairy, why doesn't grep use PCRE_MULTILINE? Using PCRE_MULTILINE shouldn't be that hard, and should boost performance quite a bit in typical usage. Or am I being too optimistic here? 
From debbugs-submit-bounces@debbugs.gnu.org Fri Sep 12 18:23:33 2014 Received: (at 18266) by debbugs.gnu.org; 12 Sep 2014 22:23:33 +0000 Received: from localhost ([127.0.0.1]:39719 helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.80) (envelope-from ) id 1XSZFc-0005DT-9b for submit@debbugs.gnu.org; Fri, 12 Sep 2014 18:23:32 -0400 Received: from mail-we0-f170.google.com ([74.125.82.170]:36566) by debbugs.gnu.org with esmtp (Exim 4.80) (envelope-from ) id 1XSZFZ-0005DI-LD for 18266@debbugs.gnu.org; Fri, 12 Sep 2014 18:23:30 -0400 Received: by mail-we0-f170.google.com with SMTP id u57so1388551wes.1 for <18266@debbugs.gnu.org>; Fri, 12 Sep 2014 15:23:29 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113; h=mime-version:sender:in-reply-to:references:from:date:message-id :subject:to:cc:content-type; bh=nh4CjNt/ZcfPJZ7LfTCb8zVxEX+2ZGwxFiYK4CU+1nI=; b=MimPdJAmpN/ylAj3kYjBtuP46I1OLm9MMw6uBd2RYbR0V7SYb7x3nZGyoAQjoB+LMb 8nIoVL7KLCw+ETS0TkXJlJue7mGqtJDmZdkPOA+i2oRbNI973wxkGCCZalivh5vVpxK7 j1otBea27mQNLaGPuCSdAWCb+f9pkIY/3M62CC+DxBAEDVp3PMZJ/Jy3tmsCdS7LIqtg 8r7Di7HsZopW6GZBqPBip5ZfV2NoerYTNARHSv9wzaZP+c+kK1otjPBWduYHOMCiaJCp ODDSFWkKoqAvmDxY/2WybjYjeM49tSQrg+HBJ9c3N75lmthaC+LBZEYSsF9IdbMtzSUe Ap1w== X-Received: by 10.180.95.135 with SMTP id dk7mr5909337wib.68.1410560608829; Fri, 12 Sep 2014 15:23:28 -0700 (PDT) MIME-Version: 1.0 Received: by 10.194.41.202 with HTTP; Fri, 12 Sep 2014 15:23:08 -0700 (PDT) In-Reply-To: <54136817.5040309@cs.ucla.edu> References: <20140901081822.GB3775@ypig.lip.ens-lyon.fr> <54042EF9.6000309@cs.ucla.edu> <20140911110700.GA20565@ypig.lip.ens-lyon.fr> <5411CC59.10407@cs.ucla.edu> <20140912003659.GA18162@xvii.vinc17.org> <5412496D.8050805@cs.ucla.edu> <20140912014124.GA4404@xvii.vinc17.org> <541267D4.7030605@cs.ucla.edu> <20140912082916.GD4404@xvii.vinc17.org> <54131C6D.4070503@cs.ucla.edu> <20140912212939.GJ4404@xvii.vinc17.org> <54136817.5040309@cs.ucla.edu> From: Jim Meyering Date: Fri, 12 Sep 2014 
15:23:08 -0700 X-Google-Sender-Auth: 4rHeUK7At2EgtEXeFh2nL7s3Hqg Message-ID: Subject: Re: bug#18266: handling bytes not part of the charset, and other garbage To: Paul Eggert Content-Type: text/plain; charset=ISO-8859-1 X-Spam-Score: -0.7 (/) X-Debbugs-Envelope-To: 18266 Cc: 758105@bugs.debian.org, Vincent Lefevre , 18266@debbugs.gnu.org X-BeenThere: debbugs-submit@debbugs.gnu.org X-Mailman-Version: 2.1.15 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: debbugs-submit-bounces@debbugs.gnu.org Sender: "Debbugs-submit" X-Spam-Score: -0.7 (/) On Fri, Sep 12, 2014 at 2:39 PM, Paul Eggert wrote: > On 09/12/2014 02:29 PM, Vincent Lefevre wrote: > >> an option to control what happens on encoding errors would be better and >> sufficient. > > > It might suffice for your use cases, but it's more complicated and less > flexible than being able to match bytes within the regular expression. > (Plus, someone would have to implement it, which is perhaps the biggest > objection to either approach ....) But I take your point that \C is best > avoided. This whole area is pretty hairy, I'm afraid. > > Speaking of hairy, why doesn't grep use PCRE_MULTILINE? Using > PCRE_MULTILINE shouldn't be that hard, and should boost performance quite a > bit in typical usage. Or am I being too optimistic here? When I first saw that implementation, I assumed it was just a first-cut one. I see no reason not to use PCRE_MULTILINE, but haven't tried it, either. 
From debbugs-submit-bounces@debbugs.gnu.org Fri Sep 12 18:40:43 2014 Received: (at 18266) by debbugs.gnu.org; 12 Sep 2014 22:40:43 +0000 Received: from localhost ([127.0.0.1]:39723 helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.80) (envelope-from ) id 1XSZWE-0005dy-E8 for submit@debbugs.gnu.org; Fri, 12 Sep 2014 18:40:42 -0400 Received: from ioooi.vinc17.net ([92.243.22.117]:47352) by debbugs.gnu.org with esmtp (Exim 4.80) (envelope-from ) id 1XSZW7-0005dl-9E for 18266@debbugs.gnu.org; Fri, 12 Sep 2014 18:40:36 -0400 Received: from smtp-xvii.vinc17.net (128.119.75.86.rev.sfr.net [86.75.119.128]) by ioooi.vinc17.net (Postfix) with ESMTPSA id 530A73B; Sat, 13 Sep 2014 00:40:34 +0200 (CEST) Received: by xvii.vinc17.org (Postfix, from userid 1000) id F3F4D21A079; Sat, 13 Sep 2014 00:40:33 +0200 (CEST) Date: Sat, 13 Sep 2014 00:40:33 +0200 From: Vincent Lefevre To: Paul Eggert Subject: Re: bug#18266: handling bytes not part of the charset, and other garbage Message-ID: <20140912224033.GM4404@xvii.vinc17.org> References: <20140911110700.GA20565@ypig.lip.ens-lyon.fr> <5411CC59.10407@cs.ucla.edu> <20140912003659.GA18162@xvii.vinc17.org> <5412496D.8050805@cs.ucla.edu> <20140912014124.GA4404@xvii.vinc17.org> <541267D4.7030605@cs.ucla.edu> <20140912082916.GD4404@xvii.vinc17.org> <54131C6D.4070503@cs.ucla.edu> <20140912212939.GJ4404@xvii.vinc17.org> <54136817.5040309@cs.ucla.edu> MIME-Version: 1.0 Content-Type: text/plain; charset=iso-8859-1 Content-Disposition: inline Content-Transfer-Encoding: 8bit In-Reply-To: <54136817.5040309@cs.ucla.edu> X-Mailer-Info: http://www.vinc17.net/mutt/ User-Agent: Mutt/1.5.23-6361-vl-r59709 (2014-07-25) X-Spam-Score: -2.2 (--) X-Debbugs-Envelope-To: 18266 Cc: 18266@debbugs.gnu.org, 758105@bugs.debian.org X-BeenThere: debbugs-submit@debbugs.gnu.org X-Mailman-Version: 2.1.15 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: debbugs-submit-bounces@debbugs.gnu.org Sender: 
"Debbugs-submit" X-Spam-Score: -2.2 (--) On 2014-09-12 14:39:35 -0700, Paul Eggert wrote: > On 09/12/2014 02:29 PM, Vincent Lefevre wrote: > >an option to control what happens on encoding errors would be > >better and sufficient. > > It might suffice for your use cases, but it's more complicated and less > flexible than being able to match bytes within the regular expression. But IMHO, some solutions I proposed would be faster. I wonder whether anyone is interested in matching individual bytes in a file regarded as UTF-8 encoded. This seems weird. > Speaking of hairy, why doesn't grep use PCRE_MULTILINE? Using > PCRE_MULTILINE shouldn't be that hard, and should boost performance > quite a bit in typical usage. Or am I being too optimistic here? Perhaps in text files. In binary files, with the current solution, I don't think this matters as failures due to invalid bytes typically occur several times per line. -- Vincent Lefèvre - Web: 100% accessible validated (X)HTML - Blog: Work: CR INRIA - computer arithmetic / AriC project (LIP, ENS-Lyon) From debbugs-submit-bounces@debbugs.gnu.org Fri Sep 12 20:57:53 2014 Received: (at 18266) by debbugs.gnu.org; 13 Sep 2014 00:57:53 +0000 Received: from localhost ([127.0.0.1]:39738 helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.80) (envelope-from ) id 1XSbez-0000PS-0z for submit@debbugs.gnu.org; Fri, 12 Sep 2014 20:57:53 -0400 Received: from smtp.cs.ucla.edu ([131.179.128.62]:38145) by debbugs.gnu.org with esmtp (Exim 4.80) (envelope-from ) id 1XSbew-0000PJ-DA for 18266@debbugs.gnu.org; Fri, 12 Sep 2014 20:57:51 -0400 Received: from localhost (localhost.localdomain [127.0.0.1]) by smtp.cs.ucla.edu (Postfix) with ESMTP id D4485A60006; Fri, 12 Sep 2014 17:57:48 -0700 (PDT) X-Virus-Scanned: amavisd-new at smtp.cs.ucla.edu Received: from smtp.cs.ucla.edu ([127.0.0.1]) by localhost (smtp.cs.ucla.edu [127.0.0.1]) (amavisd-new, port 10024) with ESMTP id xlPSyOSYfNA4; Fri, 12 Sep 2014 17:57:43 -0700 (PDT) Received: from 
[192.168.1.9] (pool-71-177-17-123.lsanca.dsl-w.verizon.net [71.177.17.123]) by smtp.cs.ucla.edu (Postfix) with ESMTPSA id 8CF58A60010; Fri, 12 Sep 2014 17:57:43 -0700 (PDT) Message-ID: <54139683.5010302@cs.ucla.edu> Date: Fri, 12 Sep 2014 17:57:39 -0700 From: Paul Eggert Organization: UCLA Computer Science Department User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:31.0) Gecko/20100101 Thunderbird/31.1.1 MIME-Version: 1.0 To: Vincent Lefevre Subject: Re: bug#18266: handling bytes not part of the charset, and other garbage References: <20140911110700.GA20565@ypig.lip.ens-lyon.fr> <5411CC59.10407@cs.ucla.edu> <20140912003659.GA18162@xvii.vinc17.org> <5412496D.8050805@cs.ucla.edu> <20140912014124.GA4404@xvii.vinc17.org> <541267D4.7030605@cs.ucla.edu> <20140912082916.GD4404@xvii.vinc17.org> <54131C6D.4070503@cs.ucla.edu> <20140912212939.GJ4404@xvii.vinc17.org> <54136817.5040309@cs.ucla.edu> <20140912224033.GM4404@xvii.vinc17.org> In-Reply-To: <20140912224033.GM4404@xvii.vinc17.org> Content-Type: text/plain; charset=utf-8; format=flowed Content-Transfer-Encoding: 7bit X-Spam-Score: -4.5 (----) X-Debbugs-Envelope-To: 18266 Cc: 18266@debbugs.gnu.org, 758105@bugs.debian.org X-BeenThere: debbugs-submit@debbugs.gnu.org X-Mailman-Version: 2.1.15 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: debbugs-submit-bounces@debbugs.gnu.org Sender: "Debbugs-submit" X-Spam-Score: -4.5 (----) Vincent Lefevre wrote: > I wonder whether anyone is interested in matching individual bytes > in a file regarded as UTF-8 encoded. This seems weird. It's not weird at all. For example, suppose we invent the notation [[:error:]] to match encoding errors. Then the pattern '[[:error:]]' would match all encoding errors in a file, which could well be a useful thing. Currently, for example, the tz package has a Make rule 'check_character_set' that verifies that the source files are all properly encoded. It executes this shell command: ! 
grep -nv '^.*$' file names This relies on GNU grep's behavior that "." does not match an encoding error. But it's a command that is not obvious. It'd be simpler and clearer to write this: ! grep -n '[[:error:]]' file names if such a feature were available. From debbugs-submit-bounces@debbugs.gnu.org Fri Sep 12 21:17:47 2014 Received: (at 18266) by debbugs.gnu.org; 13 Sep 2014 01:17:47 +0000 Received: from localhost ([127.0.0.1]:39746 helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.80) (envelope-from ) id 1XSbyE-0000u0-G0 for submit@debbugs.gnu.org; Fri, 12 Sep 2014 21:17:46 -0400 Received: from ioooi.vinc17.net ([92.243.22.117]:47383) by debbugs.gnu.org with esmtp (Exim 4.80) (envelope-from ) id 1XSbyB-0000tp-Bx for 18266@debbugs.gnu.org; Fri, 12 Sep 2014 21:17:44 -0400 Received: from smtp-xvii.vinc17.net (128.119.75.86.rev.sfr.net [86.75.119.128]) by ioooi.vinc17.net (Postfix) with ESMTPSA id BEA21A75; Sat, 13 Sep 2014 03:17:41 +0200 (CEST) Received: by xvii.vinc17.org (Postfix, from userid 1000) id 3E6AD21A079; Sat, 13 Sep 2014 03:17:41 +0200 (CEST) Date: Sat, 13 Sep 2014 03:17:41 +0200 From: Vincent Lefevre To: Paul Eggert Subject: Re: bug#18266: handling bytes not part of the charset, and other garbage Message-ID: <20140913011740.GN4404@xvii.vinc17.org> References: <20140912003659.GA18162@xvii.vinc17.org> <5412496D.8050805@cs.ucla.edu> <20140912014124.GA4404@xvii.vinc17.org> <541267D4.7030605@cs.ucla.edu> <20140912082916.GD4404@xvii.vinc17.org> <54131C6D.4070503@cs.ucla.edu> <20140912212939.GJ4404@xvii.vinc17.org> <54136817.5040309@cs.ucla.edu> <20140912224033.GM4404@xvii.vinc17.org> <54139683.5010302@cs.ucla.edu> MIME-Version: 1.0 Content-Type: text/plain; charset=iso-8859-1 Content-Disposition: inline Content-Transfer-Encoding: 8bit In-Reply-To: <54139683.5010302@cs.ucla.edu> X-Mailer-Info: http://www.vinc17.net/mutt/ User-Agent: Mutt/1.5.23-6361-vl-r59709 (2014-07-25) X-Spam-Score: -2.2 (--) X-Debbugs-Envelope-To: 18266 Cc: 18266@debbugs.gnu.org, 
758105@bugs.debian.org X-BeenThere: debbugs-submit@debbugs.gnu.org X-Mailman-Version: 2.1.15 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: debbugs-submit-bounces@debbugs.gnu.org Sender: "Debbugs-submit" X-Spam-Score: -2.2 (--) On 2014-09-12 17:57:39 -0700, Paul Eggert wrote: > Currently, for example, the tz package has > a Make rule 'check_character_set' that verifies that the source files are > all properly encoded. It executes this shell command: > > ! grep -nv '^.*$' file names > > This relies on GNU grep's behavior that "." does not match an encoding > error. But it's a command that is not obvious. It'd be simpler and clearer > to write this: > > ! grep -n '[[:error:]]' file names > > if such a feature were available. But both of these solutions have the drawback of working only in UTF-8 locales. One may wonder whether grep is the right tool, as "iconv -f UTF-8 -t UTF-8" can do such a check in any locale. -- Vincent Lefèvre - Web: 100% accessible validated (X)HTML - Blog: Work: CR INRIA - computer arithmetic / AriC project (LIP, ENS-Lyon) From debbugs-submit-bounces@debbugs.gnu.org Fri Sep 12 22:08:53 2014 Received: (at 18266) by debbugs.gnu.org; 13 Sep 2014 02:08:53 +0000 Received: from localhost ([127.0.0.1]:39750 helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.80) (envelope-from ) id 1XSclg-00025h-NX for submit@debbugs.gnu.org; Fri, 12 Sep 2014 22:08:52 -0400 Received: from smtp.cs.ucla.edu ([131.179.128.62]:40345) by debbugs.gnu.org with esmtp (Exim 4.80) (envelope-from ) id 1XSclc-00025X-SS for 18266@debbugs.gnu.org; Fri, 12 Sep 2014 22:08:50 -0400 Received: from localhost (localhost.localdomain [127.0.0.1]) by smtp.cs.ucla.edu (Postfix) with ESMTP id F16AFA60018; Fri, 12 Sep 2014 19:08:47 -0700 (PDT) X-Virus-Scanned: amavisd-new at smtp.cs.ucla.edu Received: from smtp.cs.ucla.edu ([127.0.0.1]) by localhost (smtp.cs.ucla.edu [127.0.0.1]) (amavisd-new, port 10024) with ESMTP id 
unZ1yJR-hl5v; Fri, 12 Sep 2014 19:08:39 -0700 (PDT) Received: from [192.168.1.9] (pool-71-177-17-123.lsanca.dsl-w.verizon.net [71.177.17.123]) by smtp.cs.ucla.edu (Postfix) with ESMTPSA id 4335DA60006; Fri, 12 Sep 2014 19:08:39 -0700 (PDT) Message-ID: <5413A726.1080903@cs.ucla.edu> Date: Fri, 12 Sep 2014 19:08:38 -0700 From: Paul Eggert Organization: UCLA Computer Science Department User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:31.0) Gecko/20100101 Thunderbird/31.1.1 MIME-Version: 1.0 To: Vincent Lefevre Subject: Re: bug#18266: handling bytes not part of the charset, and other garbage References: <20140912003659.GA18162@xvii.vinc17.org> <5412496D.8050805@cs.ucla.edu> <20140912014124.GA4404@xvii.vinc17.org> <541267D4.7030605@cs.ucla.edu> <20140912082916.GD4404@xvii.vinc17.org> <54131C6D.4070503@cs.ucla.edu> <20140912212939.GJ4404@xvii.vinc17.org> <54136817.5040309@cs.ucla.edu> <20140912224033.GM4404@xvii.vinc17.org> <54139683.5010302@cs.ucla.edu> <20140913011740.GN4404@xvii.vinc17.org> In-Reply-To: <20140913011740.GN4404@xvii.vinc17.org> Content-Type: text/plain; charset=utf-8; format=flowed Content-Transfer-Encoding: 7bit X-Spam-Score: -4.5 (----) X-Debbugs-Envelope-To: 18266 Cc: 18266@debbugs.gnu.org, 758105@bugs.debian.org X-BeenThere: debbugs-submit@debbugs.gnu.org X-Mailman-Version: 2.1.15 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: debbugs-submit-bounces@debbugs.gnu.org Sender: "Debbugs-submit" X-Spam-Score: -4.5 (----) Vincent Lefevre wrote: > But both of these solutions have the drawback of working only in > UTF-8 locales. Not at all; '[[:error:]]' would match a single-byte encoding error in the current locale. The tz database is interested in UTF-8 so it sets the LC_ALL environment variable to a UTF-8 locale, but that setting shouldn't be required in general. Also, the tz database needs grep patterns that iconv doesn't support. 
For example, one rule is that commentary (which starts with #) can contain UTF-8 characters, but the ordinary data (before the #) is limited to a smaller set. This is captured by the command: grep -Env '^[ordinarycharset]*(#.*)?$' where 'ordinarycharset' is the set of ASCII characters in ordinary tz data. Here it's useful that '.' does not match encoding errors on GNU/Linux. From debbugs-submit-bounces@debbugs.gnu.org Fri Sep 12 22:11:16 2014 Received: (at 18266) by debbugs.gnu.org; 13 Sep 2014 02:11:17 +0000 Received: from localhost ([127.0.0.1]:39754 helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.80) (envelope-from ) id 1XSco0-00029o-Dh for submit@debbugs.gnu.org; Fri, 12 Sep 2014 22:11:16 -0400 Received: from smtp.cs.ucla.edu ([131.179.128.62]:40416) by debbugs.gnu.org with esmtp (Exim 4.80) (envelope-from ) id 1XScny-00029g-H1 for 18266@debbugs.gnu.org; Fri, 12 Sep 2014 22:11:15 -0400 Received: from localhost (localhost.localdomain [127.0.0.1]) by smtp.cs.ucla.edu (Postfix) with ESMTP id 12A16A60018; Fri, 12 Sep 2014 19:11:14 -0700 (PDT) X-Virus-Scanned: amavisd-new at smtp.cs.ucla.edu Received: from smtp.cs.ucla.edu ([127.0.0.1]) by localhost (smtp.cs.ucla.edu [127.0.0.1]) (amavisd-new, port 10024) with ESMTP id OOtOn4Op-TrW; Fri, 12 Sep 2014 19:11:09 -0700 (PDT) Received: from [192.168.1.9] (pool-71-177-17-123.lsanca.dsl-w.verizon.net [71.177.17.123]) by smtp.cs.ucla.edu (Postfix) with ESMTPSA id E95CDA60006; Fri, 12 Sep 2014 19:11:08 -0700 (PDT) Message-ID: <5413A7BC.2080801@cs.ucla.edu> Date: Fri, 12 Sep 2014 19:11:08 -0700 From: Paul Eggert Organization: UCLA Computer Science Department User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:31.0) Gecko/20100101 Thunderbird/31.1.1 MIME-Version: 1.0 To: Vincent Lefevre Subject: Re: bug#18266: handling bytes not part of the charset, and other garbage References: <20140912003659.GA18162@xvii.vinc17.org> <5412496D.8050805@cs.ucla.edu> <20140912014124.GA4404@xvii.vinc17.org> <541267D4.7030605@cs.ucla.edu> 
<20140912082916.GD4404@xvii.vinc17.org> <54131C6D.4070503@cs.ucla.edu> <20140912212939.GJ4404@xvii.vinc17.org> <54136817.5040309@cs.ucla.edu> <20140912224033.GM4404@xvii.vinc17.org> <54139683.5010302@cs.ucla.edu> <20140913011740.GN4404@xvii.vinc17.org> In-Reply-To: <20140913011740.GN4404@xvii.vinc17.org> Content-Type: multipart/mixed; boundary="------------070208020805080209040406" X-Spam-Score: -4.5 (----) X-Debbugs-Envelope-To: 18266 Cc: 18266@debbugs.gnu.org, 758105@bugs.debian.org X-BeenThere: debbugs-submit@debbugs.gnu.org X-Mailman-Version: 2.1.15 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: debbugs-submit-bounces@debbugs.gnu.org Sender: "Debbugs-submit" X-Spam-Score: -4.5 (----) This is a multi-part message in MIME format. --------------070208020805080209040406 Content-Type: text/plain; charset=utf-8; format=flowed Content-Transfer-Encoding: 7bit Come to think of it, grep -P misbehaves badly in multibyte locales that are not UTF-8. It should report an error and exit rather than output gibberish. I installed the attached patch to catch that. 
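The check described above can be sketched roughly as follows. The attached patch does this inside Pcompile with grep's own using_utf8 helper; the standalone sketch below instead assumes that nl_langinfo (CODESET) returning "UTF-8" is an acceptable stand-in for that helper. The names pcre_locale_kind, classify_locale and classify_current_locale are hypothetical and exist only for illustration.

```c
#include <assert.h>
#include <langinfo.h>
#include <locale.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical classification, not part of grep's sources.  */
enum pcre_locale_kind
{
  LOCALE_UNIBYTE,     /* one byte per character: -P works without PCRE_UTF8 */
  LOCALE_UTF8,        /* UTF-8: -P works with PCRE_UTF8 */
  LOCALE_UNSUPPORTED  /* any other multibyte encoding: diagnose and exit */
};

/* Classify a locale from its codeset name and its MB_CUR_MAX value.
   Assumption: comparing the codeset name with "UTF-8" approximates
   what grep's using_utf8 () reports.  */
enum pcre_locale_kind
classify_locale (char const *codeset, size_t mb_cur_max)
{
  assert (codeset != NULL);
  if (strcmp (codeset, "UTF-8") == 0)
    return LOCALE_UTF8;
  return mb_cur_max == 1 ? LOCALE_UNIBYTE : LOCALE_UNSUPPORTED;
}

/* Classify the locale selected by the environment (LC_ALL, LANG, ...).  */
enum pcre_locale_kind
classify_current_locale (void)
{
  setlocale (LC_ALL, "");
  return classify_locale (nl_langinfo (CODESET), MB_CUR_MAX);
}
```

A caller would report an error and exit with status 2 when classify_current_locale returns LOCALE_UNSUPPORTED, and set PCRE_UTF8 only for LOCALE_UTF8.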
--------------070208020805080209040406 Content-Type: text/plain; charset=UTF-8; name="0001-grep-diagnose-P-in-non-UTF-8-multibyte-locale.patch" Content-Transfer-Encoding: base64 Content-Disposition: attachment; filename="0001-grep-diagnose-P-in-non-UTF-8-multibyte-locale.patch" RnJvbSBjYWM5MWUzZTIzM2I3NjlkNjBkN2I1ZDZiYzBlOGFmYzY3YzBjNzEzIE1vbiBTZXAg MTcgMDA6MDA6MDAgMjAwMQpGcm9tOiBQYXVsIEVnZ2VydCA8ZWdnZXJ0QGNzLnVjbGEuZWR1 PgpEYXRlOiBGcmksIDEyIFNlcCAyMDE0IDE5OjA2OjI3IC0wNzAwClN1YmplY3Q6IFtQQVRD SF0gZ3JlcDogZGlhZ25vc2UgLVAgaW4gbm9uLVVURi04IG11bHRpYnl0ZSBsb2NhbGUKCiog c3JjL3BjcmVzZWFyY2guYyAoUGNvbXBpbGUpOgpsaWJwY3JlIHN1cHBvcnRzIG9ubHkgdW5p Ynl0ZSBhbmQgVVRGLTggbG9jYWxlcywKc28gcmVwb3J0IGFuIGVycm9yIGFuZCBleGl0IGlm IHVzZWQgaW4gb3RoZXIgbG9jYWxlcy4KKiBORVdTOiBNZW50aW9uIHRoaXMuCiogdGVzdHMv ZXVjLW1iOiBUZXN0IHRoaXMuCi0tLQogTkVXUyAgICAgICAgICAgICB8IDMgKysrCiBzcmMv cGNyZXNlYXJjaC5jIHwgOCArKysrKystLQogdGVzdHMvZXVjLW1iICAgICB8IDQgKysrKwog MyBmaWxlcyBjaGFuZ2VkLCAxMyBpbnNlcnRpb25zKCspLCAyIGRlbGV0aW9ucygtKQoKZGlm ZiAtLWdpdCBhL05FV1MgYi9ORVdTCmluZGV4IDM2MjRiNzYuLjM2YmI0OGYgMTAwNjQ0Ci0t LSBhL05FV1MKKysrIGIvTkVXUwpAQCAtMTksNiArMTksOSBAQCBHTlUgZ3JlcCBORVdTICAg ICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgLSotIG91dGxpbmUgLSotCiAgIFRo ZSBHUkVQX09QVElPTlMgZW52aXJvbm1lbnQgdmFyaWFibGUgaXMgbm93IG9ic29sZXNjZW50 LCBhbmQgZ3JlcAogICBub3cgd2FybnMgaWYgaXQgaXMgdXNlZC4gIFBsZWFzZSB1c2UgYW4g YWxpYXMgb3Igc2NyaXB0IGluc3RlYWQuCiAKKyAgSW4gbG9jYWxlcyB3aXRoIG11bHRpYnl0 ZSBjaGFyYWN0ZXIgZW5jb2RpbmdzIG90aGVyIHRoYW4gVVRGLTgsCisgIGdyZXAgLVAgbm93 IHJlcG9ydHMgYW4gZXJyb3IgYW5kIGV4aXRzIGluc3RlYWQgb2YgbWlzYmVoYXZpbmcuCisK ICogTm90ZXdvcnRoeSBjaGFuZ2VzIGluIHJlbGVhc2UgMi4yMCAoMjAxNC0wNi0wMykgW3N0 YWJsZV0KIAogKiogQnVnIGZpeGVzCmRpZmYgLS1naXQgYS9zcmMvcGNyZXNlYXJjaC5jIGIv c3JjL3BjcmVzZWFyY2guYwppbmRleCAxN2UwZTMyLi4zNDc1ZDRhIDEwMDY0NAotLS0gYS9z cmMvcGNyZXNlYXJjaC5jCisrKyBiL3NyYy9wY3Jlc2VhcmNoLmMKQEAgLTUyLDEzICs1Miwx NyBAQCBQY29tcGlsZSAoY2hhciBjb25zdCAqcGF0dGVybiwgc2l6ZV90IHNpemUpCiAgIGNo 
YXIgY29uc3QgKmVwOwogICBjaGFyICpyZSA9IHhubWFsbG9jICg0LCBzaXplICsgNyk7CiAg IGludCBmbGFncyA9IChQQ1JFX01VTFRJTElORQotICAgICAgICAgICAgICAgfCAobWF0Y2hf aWNhc2UgPyBQQ1JFX0NBU0VMRVNTIDogMCkKLSAgICAgICAgICAgICAgIHwgKHVzaW5nX3V0 ZjggKCkgPyBQQ1JFX1VURjggOiAwKSk7CisgICAgICAgICAgICAgICB8IChtYXRjaF9pY2Fz ZSA/IFBDUkVfQ0FTRUxFU1MgOiAwKSk7CiAgIGNoYXIgY29uc3QgKnBhdGxpbSA9IHBhdHRl cm4gKyBzaXplOwogICBjaGFyICpuID0gcmU7CiAgIGNoYXIgY29uc3QgKnA7CiAgIGNoYXIg Y29uc3QgKnBudWw7CiAKKyAgaWYgKHVzaW5nX3V0ZjggKCkpCisgICAgZmxhZ3MgfD0gUENS RV9VVEY4OworICBlbHNlIGlmIChNQl9DVVJfTUFYICE9IDEpCisgICAgZXJyb3IgKEVYSVRf VFJPVUJMRSwgMCwgXygiLVAgc3VwcG9ydHMgb25seSB1bmlieXRlIGFuZCBVVEYtOCBsb2Nh bGVzIikpOworCiAgIC8qIEZJWE1FOiBSZW1vdmUgdGhlc2UgcmVzdHJpY3Rpb25zLiAgKi8K ICAgaWYgKG1lbWNociAocGF0dGVybiwgJ1xuJywgc2l6ZSkpCiAgICAgZXJyb3IgKEVYSVRf VFJPVUJMRSwgMCwgXygidGhlIC1QIG9wdGlvbiBvbmx5IHN1cHBvcnRzIGEgc2luZ2xlIHBh dHRlcm4iKSk7CmRpZmYgLS1naXQgYS90ZXN0cy9ldWMtbWIgYi90ZXN0cy9ldWMtbWIKaW5k ZXggYWEyNTRjYS4uNmE5YTg0NSAxMDA3NTUKLS0tIGEvdGVzdHMvZXVjLW1iCisrKyBiL3Rl c3RzL2V1Yy1tYgpAQCAtNDAsNCArNDAsOCBAQCBtYWtlX2lucHV0IEJBQkFBQiA+IGV4cCB8 fCBmcmFtZXdvcmtfZmFpbHVyZV8KIGNvbXBhcmUgZXhwIG91dCB8fCBmYWlsPTEKIG1ha2Vf aW5wdXQgQkFCQUJBIHxldWNfZ3JlcCBBQjsgdGVzdCAkPyA9IDEgfHwgZmFpbD0xCiAKKyMg LVAgc3VwcG9ydHMgb25seSB1bmlieXRlIGFuZCBVVEYtOCBsb2NhbGVzLgorTENfQUxMPSRs b2NhbGUgZ3JlcCAtUCB4IC9kZXYvbnVsbAordGVzdCAkPyA9IDIgfHwgZmFpbD0xCisKIEV4 aXQgJGZhaWwKLS0gCjEuOS4zCgo= --------------070208020805080209040406-- From debbugs-submit-bounces@debbugs.gnu.org Mon Sep 15 01:32:46 2014 Received: (at 18266) by debbugs.gnu.org; 15 Sep 2014 05:32:46 +0000 Received: from localhost ([127.0.0.1]:41065 helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.80) (envelope-from ) id 1XTOu2-0008DX-DO for submit@debbugs.gnu.org; Mon, 15 Sep 2014 01:32:45 -0400 Received: from smtp.cs.ucla.edu ([131.179.128.62]:36026) by debbugs.gnu.org with esmtp (Exim 4.80) (envelope-from ) id 1XTOtu-0008DG-HF for 18266@debbugs.gnu.org; Mon, 15 Sep 2014 01:32:39 -0400 
Received: from localhost (localhost.localdomain [127.0.0.1]) by smtp.cs.ucla.edu (Postfix) with ESMTP id 5194CA60007; Sun, 14 Sep 2014 22:32:33 -0700 (PDT) X-Virus-Scanned: amavisd-new at smtp.cs.ucla.edu Received: from smtp.cs.ucla.edu ([127.0.0.1]) by localhost (smtp.cs.ucla.edu [127.0.0.1]) (amavisd-new, port 10024) with ESMTP id 2W6UgrhztrLH; Sun, 14 Sep 2014 22:32:29 -0700 (PDT) Received: from [192.168.1.9] (pool-71-177-17-123.lsanca.dsl-w.verizon.net [71.177.17.123]) by smtp.cs.ucla.edu (Postfix) with ESMTPSA id EEB6039E8015; Sun, 14 Sep 2014 22:32:28 -0700 (PDT) Message-ID: <541679EC.5000701@cs.ucla.edu> Date: Sun, 14 Sep 2014 22:32:28 -0700 From: Paul Eggert Organization: UCLA Computer Science Department User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:31.0) Gecko/20100101 Thunderbird/31.1.1 MIME-Version: 1.0 To: Vincent Lefevre Subject: Re: bug#18266: handling bytes not part of the charset, and other garbage References: <20140912003659.GA18162@xvii.vinc17.org> <5412496D.8050805@cs.ucla.edu> <20140912014124.GA4404@xvii.vinc17.org> <541267D4.7030605@cs.ucla.edu> <20140912082916.GD4404@xvii.vinc17.org> <54131C6D.4070503@cs.ucla.edu> <20140912212939.GJ4404@xvii.vinc17.org> <54136817.5040309@cs.ucla.edu> <20140912224033.GM4404@xvii.vinc17.org> <54139683.5010302@cs.ucla.edu> <20140913011740.GN4404@xvii.vinc17.org> <5413A7BC.2080801@cs.ucla.edu> In-Reply-To: <5413A7BC.2080801@cs.ucla.edu> Content-Type: multipart/mixed; boundary="------------080305090207030305040600" X-Spam-Score: -1.3 (-) X-Debbugs-Envelope-To: 18266 Cc: 18266@debbugs.gnu.org, 758105@bugs.debian.org X-BeenThere: debbugs-submit@debbugs.gnu.org X-Mailman-Version: 2.1.15 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: debbugs-submit-bounces@debbugs.gnu.org Sender: "Debbugs-submit" X-Spam-Score: -1.3 (-) This is a multi-part message in MIME format. 
--------------080305090207030305040600 Content-Type: text/plain; charset=utf-8; format=flowed Content-Transfer-Encoding: 7bit Attached are some proposed patches which should improve the performance of grep -P when applied to binary files, among other things. I have some other ideas for boosting performance further but thought I'd publish these first. Please give them a try if you have the time. I doubt whether this will "solve" the performance problem entirely with -P and encoding errors but at least it should be heading in the right direction. --------------080305090207030305040600 Content-Type: text/plain; charset=UTF-8; name="0001-grep-remove-refactor-unnecessary-code-about-line-spl.patch" Content-Transfer-Encoding: base64 Content-Disposition: attachment; filename*0="0001-grep-remove-refactor-unnecessary-code-about-line-spl.pa"; filename*1="tch" RnJvbSBhZDM0YjdkODU1NmU5ZmMyNzQ2OTA2NjZhYzZkZWQyYjY1NzZmZWIzIE1vbiBTZXAg MTcgMDA6MDA6MDAgMjAwMQpGcm9tOiBQYXVsIEVnZ2VydCA8ZWdnZXJ0QGNzLnVjbGEuZWR1 PgpEYXRlOiBTdW4sIDE0IFNlcCAyMDE0IDExOjQyOjA4IC0wNzAwClN1YmplY3Q6IFtQUk9Q T1NFRCBQQVRDSCAxLzZdIGdyZXA6IHJlbW92ZS9yZWZhY3RvciB1bm5lY2Vzc2FyeSBjb2Rl IGFib3V0CiBsaW5lIHNwbGl0dGluZwoKKiBzcmMvZ3JlcC5jIChkb19leGVjdXRlKTogUmVt b3ZlLiAgQ2FsbGVyIG5vdyB1c2VzICdleGVjdXRlJy4KKiBzcmMvcGNyZXNlYXJjaC5jIChQ ZXhlY3V0ZSk6IEltcHJvdmUgY29tbWVudCBhYm91dCB0aGlzLgotLS0KIHNyYy9ncmVwLmMg ICAgICAgfCA0NSArLS0tLS0tLS0tLS0tLS0tLS0tLS0tLS0tLS0tLS0tLS0tLS0tLS0tLS0t LS0KIHNyYy9wY3Jlc2VhcmNoLmMgfCAgNyArKysrKy0tCiAyIGZpbGVzIGNoYW5nZWQsIDYg aW5zZXJ0aW9ucygrKSwgNDYgZGVsZXRpb25zKC0pCgpkaWZmIC0tZ2l0IGEvc3JjL2dyZXAu YyBiL3NyYy9ncmVwLmMKaW5kZXggMWY4MDFlOS4uNzE5ZGZmMSAxMDA2NDQKLS0tIGEvc3Jj L2dyZXAuYworKysgYi9zcmMvZ3JlcC5jCkBAIC0xMDQ4LDQ5ICsxMDQ4LDYgQEAgcHJ0ZXh0 IChjaGFyIGNvbnN0ICpiZWcsIGNoYXIgY29uc3QgKmxpbSkKICAgb3V0bGVmdCAtPSBuOwog fQogCi0vKiBJbnZva2UgdGhlIG1hdGNoZXIsIEVYRUNVVEUsIG9uIGJ1ZmZlciBCVUYgb2Yg U0laRSBieXRlcy4gIElmIHRoZXJlCi0gICBpcyBubyBtYXRjaCwgcmV0dXJuIChzaXplX3Qp 
IC0xLiAgT3RoZXJ3aXNlLCBzZXQgKk1BVENIX1NJWkUgdG8gdGhlCi0gICBsZW5ndGggb2Yg dGhlIG1hdGNoIGFuZCByZXR1cm4gdGhlIG9mZnNldCBvZiB0aGUgc3RhcnQgb2YgdGhlIG1h dGNoLiAgKi8KLXN0YXRpYyBzaXplX3QKLWRvX2V4ZWN1dGUgKGNoYXIgY29uc3QgKmJ1Ziwg c2l6ZV90IHNpemUsIHNpemVfdCAqbWF0Y2hfc2l6ZSkKLXsKLSAgc2l6ZV90IHJlc3VsdDsK LSAgY29uc3QgY2hhciAqbGluZV9uZXh0OwotCi0gIC8qIFdpdGggdGhlIGN1cnJlbnQgaW1w bGVtZW50YXRpb24sIHVzaW5nIC0taWdub3JlLWNhc2Ugd2l0aCBhIG11bHRpLWJ5dGUKLSAg ICAgY2hhcmFjdGVyIHNldCBpcyB2ZXJ5IGluZWZmaWNpZW50IHdoZW4gYXBwbGllZCB0byBh IGxhcmdlIGJ1ZmZlcgotICAgICBjb250YWluaW5nIG1hbnkgbWF0Y2hlcy4gIFdlIGNhbiBh dm9pZCBtdWNoIG9mIHRoZSB3YXN0ZWQgZWZmb3J0Ci0gICAgIGJ5IG1hdGNoaW5nIGxpbmUt YnktbGluZS4KLQotICAgICBGSVhNRTogdGhpcyBpcyBqdXN0IGFuIHVnbHkgd29ya2Fyb3Vu ZCwgYW5kIGl0IGRvZXNuJ3QgcmVhbGx5Ci0gICAgIGJlbG9uZyBoZXJlLiAgQWxzbywgUENS RSBpcyBhbHdheXMgdXNpbmcgdGhpcyBzYW1lIHBlci1saW5lCi0gICAgIG1hdGNoaW5nIGFs Z29yaXRobS4gIEVpdGhlciB3ZSBmaXggLWksIG9yIHdlIHNob3VsZCByZWZhY3RvcgotICAg ICB0aGlzIGNvZGUtLS1mb3IgZXhhbXBsZSwgd2UgY291bGQgYWRkIGFub3RoZXIgZnVuY3Rp b24gcG9pbnRlcgotICAgICB0byBzdHJ1Y3QgbWF0Y2hlciB0byBzcGxpdCB0aGUgYnVmZmVy IHBhc3NlZCB0byBleGVjdXRlLiAgSXQgd291bGQKLSAgICAgcGVyZm9ybSB0aGUgbWVtY2hy IGlmIGxpbmUtYnktbGluZSBtYXRjaGluZyBpcyBuZWNlc3NhcnksIG9yIGp1c3QKLSAgICAg cmV0dXJuIGJ1ZiArIHNpemUgb3RoZXJ3aXNlLiAgKi8KLSAgaWYgKCEgKGV4ZWN1dGUgPT0g RmV4ZWN1dGUgfHwgZXhlY3V0ZSA9PSBQZXhlY3V0ZSkKLSAgICAgIHx8IE1CX0NVUl9NQVgg PT0gMSB8fCAhbWF0Y2hfaWNhc2UpCi0gICAgcmV0dXJuIGV4ZWN1dGUgKGJ1Ziwgc2l6ZSwg bWF0Y2hfc2l6ZSwgTlVMTCk7Ci0KLSAgZm9yIChsaW5lX25leHQgPSBidWY7IGxpbmVfbmV4 dCA8IGJ1ZiArIHNpemU7ICkKLSAgICB7Ci0gICAgICBjb25zdCBjaGFyICpsaW5lX2J1ZiA9 IGxpbmVfbmV4dDsKLSAgICAgIGNvbnN0IGNoYXIgKmxpbmVfZW5kID0gbWVtY2hyIChsaW5l X2J1ZiwgZW9sYnl0ZSwKLSAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAo YnVmICsgc2l6ZSkgLSBsaW5lX2J1Zik7Ci0gICAgICBpZiAobGluZV9lbmQgPT0gTlVMTCkK LSAgICAgICAgbGluZV9uZXh0ID0gbGluZV9lbmQgPSBidWYgKyBzaXplOwotICAgICAgZWxz ZQotICAgICAgICBsaW5lX25leHQgPSBsaW5lX2VuZCArIDE7Ci0KLSAgICAgIHJlc3VsdCA9 
IGV4ZWN1dGUgKGxpbmVfYnVmLCBsaW5lX25leHQgLSBsaW5lX2J1ZiwgbWF0Y2hfc2l6ZSwg TlVMTCk7Ci0gICAgICBpZiAocmVzdWx0ICE9IChzaXplX3QpIC0xKQotICAgICAgICByZXR1 cm4gKGxpbmVfYnVmIC0gYnVmKSArIHJlc3VsdDsKLSAgICB9Ci0KLSAgcmV0dXJuIChzaXpl X3QpIC0xOwotfQotCiAvKiBTY2FuIHRoZSBzcGVjaWZpZWQgcG9ydGlvbiBvZiB0aGUgYnVm ZmVyLCBtYXRjaGluZyBsaW5lcyAob3IKICAgIGJldHdlZW4gbWF0Y2hpbmcgbGluZXMgaWYg T1VUX0lOVkVSVCBpcyB0cnVlKS4gIFJldHVybiBhIGNvdW50IG9mCiAgICBsaW5lcyBwcmlu dGVkLiAqLwpAQCAtMTEwNCw3ICsxMDYxLDcgQEAgZ3JlcGJ1ZiAoY2hhciBjb25zdCAqYmVn LCBjaGFyIGNvbnN0ICpsaW0pCiAgIGZvciAocCA9IGJlZzsgcCA8IGxpbTsgcCA9IGVuZHAp CiAgICAgewogICAgICAgc2l6ZV90IG1hdGNoX3NpemU7Ci0gICAgICBzaXplX3QgbWF0Y2hf b2Zmc2V0ID0gZG9fZXhlY3V0ZSAocCwgbGltIC0gcCwgJm1hdGNoX3NpemUpOworICAgICAg c2l6ZV90IG1hdGNoX29mZnNldCA9IGV4ZWN1dGUgKHAsIGxpbSAtIHAsICZtYXRjaF9zaXpl LCBOVUxMKTsKICAgICAgIGlmIChtYXRjaF9vZmZzZXQgPT0gKHNpemVfdCkgLTEpCiAgICAg ICAgIHsKICAgICAgICAgICBpZiAoIW91dF9pbnZlcnQpCmRpZmYgLS1naXQgYS9zcmMvcGNy ZXNlYXJjaC5jIGIvc3JjL3BjcmVzZWFyY2guYwppbmRleCAzNDc1ZDRhLi4wYzUyMjBkIDEw MDY0NAotLS0gYS9zcmMvcGNyZXNlYXJjaC5jCisrKyBiL3NyYy9wY3Jlc2VhcmNoLmMKQEAg LTE0OSw4ICsxNDksMTEgQEAgUGV4ZWN1dGUgKGNoYXIgY29uc3QgKmJ1Ziwgc2l6ZV90IHNp emUsIHNpemVfdCAqbWF0Y2hfc2l6ZSwKICAgaW50IGUgPSBQQ1JFX0VSUk9SX05PTUFUQ0g7 CiAgIGNoYXIgY29uc3QgKmxpbmVfZW5kOwogCi0gIC8qIFBDUkUgY2FuJ3QgbGltaXQgdGhl IG1hdGNoaW5nIHRvIHNpbmdsZSBsaW5lcywgdGhlcmVmb3JlIHdlIGhhdmUgdG8KLSAgICAg bWF0Y2ggZWFjaCBsaW5lIGluIHRoZSBidWZmZXIgc2VwYXJhdGVseS4gICovCisgIC8qIHBj cmVfZXhlYyBtaXNoYW5kbGVzIG1hdGNoZXMgdGhhdCBjcm9zcyBsaW5lIGJvdW5kYXJpZXMu CisgICAgIFBDUkVfTVVMVElMSU5FIGlzbid0IGEgd2luLCBwYXJ0bHkgYmVjYXVzZSBpdCdz IGluY29tcGF0aWJsZSB3aXRoCisgICAgIC16LCBhbmQgcGFydGx5IGJlY2F1c2UgaXQgY2hl Y2tzIHRoZSBlbnRpcmUgaW5wdXQgYnVmZmVyIGFuZCBpcworICAgICB0aGVyZWZvcmUgc2xv dyBvbiBhIGxhcmdlIGJ1ZmZlciBjb250YWluaW5nIG1hbnkgbWF0Y2hlcy4KKyAgICAgQXZv aWQgdGhlc2UgcHJvYmxlbXMgYnkgbWF0Y2hpbmcgbGluZS1ieS1saW5lLiAgKi8KICAgZm9y ICg7IHAgPCBidWYgKyBzaXplOyBwID0gbGluZV9zdGFydCA9IGxpbmVfZW5kICsgMSkKICAg 
ICB7CiAgICAgICBsaW5lX2VuZCA9IG1lbWNociAocCwgZW9sYnl0ZSwgYnVmICsgc2l6ZSAt IHApOwotLSAKMS45LjMKCg== --------------080305090207030305040600 Content-Type: text/plain; charset=UTF-8; name="0002-grep-speed-up-P-on-files-containing-many-multibyte-e.patch" Content-Transfer-Encoding: base64 Content-Disposition: attachment; filename*0="0002-grep-speed-up-P-on-files-containing-many-multibyte-e.pa"; filename*1="tch" RnJvbSBiN2I3NzExZGQwNzJjMzM1YTQ1ZGJmMDkxMTViMTU5N2ZlZDJhZTc2IE1vbiBTZXAg MTcgMDA6MDA6MDAgMjAwMQpGcm9tOiBQYXVsIEVnZ2VydCA8ZWdnZXJ0QGNzLnVjbGEuZWR1 PgpEYXRlOiBTdW4sIDE0IFNlcCAyMDE0IDExOjQ0OjEyIC0wNzAwClN1YmplY3Q6IFtQUk9Q T1NFRCBQQVRDSCAyLzZdIGdyZXA6IHNwZWVkIHVwIC1QIG9uIGZpbGVzIGNvbnRhaW5pbmcg bWFueQogbXVsdGlieXRlIGVycm9ycwoKKiBzcmMvcGNyZXNlYXJjaC5jIChlbXB0eV9tYXRj aCk6IE5ldyB2YXIuCihQY29tcGlsZSk6IFNldCBpdC4KKFBleGVjdXRlKTogVXNlIGl0Lgot LS0KIHNyYy9wY3Jlc2VhcmNoLmMgfCAyNiArKysrKysrKysrKysrKysrKystLS0tLS0tLQog MSBmaWxlIGNoYW5nZWQsIDE4IGluc2VydGlvbnMoKyksIDggZGVsZXRpb25zKC0pCgpkaWZm IC0tZ2l0IGEvc3JjL3BjcmVzZWFyY2guYyBiL3NyYy9wY3Jlc2VhcmNoLmMKaW5kZXggMGM1 MjIwZC4uOTU4NzdlMyAxMDA2NDQKLS0tIGEvc3JjL3BjcmVzZWFyY2guYworKysgYi9zcmMv cGNyZXNlYXJjaC5jCkBAIC0zMyw2ICszMywxMCBAQCBzdGF0aWMgcGNyZSAqY3JlOwogLyog QWRkaXRpb25hbCBpbmZvcm1hdGlvbiBhYm91dCB0aGUgcGF0dGVybi4gICovCiBzdGF0aWMg cGNyZV9leHRyYSAqZXh0cmE7CiAKKy8qIFRhYmxlLCBpbmRleGVkIGJ5ICEgKGZsYWcgJiBQ Q1JFX05PVEJPTCksIG9mIHdoZXRoZXIgdGhlIGVtcHR5CisgICBzdHJpbmcgbWF0Y2hlcyB3 aGVuIHRoYXQgZmxhZyBpcyB1c2VkLiAgKi8KK3N0YXRpYyBpbnQgZW1wdHlfbWF0Y2hbMl07 CisKICMgaWZkZWYgUENSRV9TVFVEWV9KSVRfQ09NUElMRQogc3RhdGljIHBjcmVfaml0X3N0 YWNrICpqaXRfc3RhY2s7CiAjIGVsc2UKQEAgLTEyNCw2ICsxMjgsMTAgQEAgUGNvbXBpbGUg KGNoYXIgY29uc3QgKnBhdHRlcm4sIHNpemVfdCBzaXplKQogICAgICAgICAgICAgICAgXygi ZmFpbGVkIHRvIGFsbG9jYXRlIG1lbW9yeSBmb3IgdGhlIFBDUkUgSklUIHN0YWNrIikpOwog ICAgICAgcGNyZV9hc3NpZ25faml0X3N0YWNrIChleHRyYSwgTlVMTCwgaml0X3N0YWNrKTsK ICAgICB9CisKKyAgZW1wdHlfbWF0Y2hbZmFsc2VdID0gcGNyZV9leGVjIChjcmUsIGV4dHJh 
LCAiIiwgMCwgMCwgUENSRV9OT1RCT0wsIE5VTEwsIDApOworICBlbXB0eV9tYXRjaFt0cnVl XSA9IHBjcmVfZXhlYyAoY3JlLCBleHRyYSwgIiIsIDAsIDAsIDAsIE5VTEwsIDApOworCiAj IGVuZGlmCiAgIGZyZWUgKHJlKTsKICNlbmRpZiAvKiBIQVZFX0xJQlBDUkUgKi8KQEAgLTE0 NCw3ICsxNTIsNyBAQCBQZXhlY3V0ZSAoY2hhciBjb25zdCAqYnVmLCBzaXplX3Qgc2l6ZSwg c2l6ZV90ICptYXRjaF9zaXplLAogICBpbnQgc3ViW25zdWJdOwogCiAgIGNoYXIgY29uc3Qg KnAgPSBzdGFydF9wdHIgPyBzdGFydF9wdHIgOiBidWY7Ci0gIGludCBvcHRpb25zID0gcCA9 PSBidWYgfHwgcFstMV0gPT0gZW9sYnl0ZSA/IDAgOiBQQ1JFX05PVEJPTDsKKyAgYm9vbCBi b2wgPSBwWy0xXSA9PSBlb2xieXRlOwogICBjaGFyIGNvbnN0ICpsaW5lX3N0YXJ0ID0gYnVm OwogICBpbnQgZSA9IFBDUkVfRVJST1JfTk9NQVRDSDsKICAgY2hhciBjb25zdCAqbGluZV9l bmQ7CkBAIC0xNjQsMjMgKzE3MiwyNiBAQCBQZXhlY3V0ZSAoY2hhciBjb25zdCAqYnVmLCBz aXplX3Qgc2l6ZSwgc2l6ZV90ICptYXRjaF9zaXplLAogICAgICAgLyogVHJlYXQgZW5jb2Rp bmctZXJyb3IgYnl0ZXMgYXMgZGF0YSB0aGF0IGNhbm5vdCBtYXRjaC4gICovCiAgICAgICBm b3IgKDs7KQogICAgICAgICB7CisgICAgICAgICAgaW50IG9wdGlvbnMgPSBib2wgPyAwIDog UENSRV9OT1RCT0w7CiAgICAgICAgICAgaW50IHZhbGlkX2J5dGVzOwogICAgICAgICAgIGUg PSBwY3JlX2V4ZWMgKGNyZSwgZXh0cmEsIHAsIGxpbmVfZW5kIC0gcCwgMCwgb3B0aW9ucywg c3ViLCBuc3ViKTsKICAgICAgICAgICBpZiAoZSAhPSBQQ1JFX0VSUk9SX0JBRFVURjgpCiAg ICAgICAgICAgICBicmVhazsKICAgICAgICAgICB2YWxpZF9ieXRlcyA9IHN1YlswXTsKLSAg ICAgICAgICBlID0gcGNyZV9leGVjIChjcmUsIGV4dHJhLCBwLCB2YWxpZF9ieXRlcywgMCwK LSAgICAgICAgICAgICAgICAgICAgICAgICBvcHRpb25zIHwgUENSRV9OT19VVEY4X0NIRUNL IHwgUENSRV9OT1RFT0wsCi0gICAgICAgICAgICAgICAgICAgICAgICAgc3ViLCBuc3ViKTsK KyAgICAgICAgICBlID0gKHZhbGlkX2J5dGVzID09IDAKKyAgICAgICAgICAgICAgID8gZW1w dHlfbWF0Y2hbYm9sXQorICAgICAgICAgICAgICAgOiBwY3JlX2V4ZWMgKGNyZSwgZXh0cmEs IHAsIHZhbGlkX2J5dGVzLCAwLAorICAgICAgICAgICAgICAgICAgICAgICAgICAgIG9wdGlv bnMgfCBQQ1JFX05PX1VURjhfQ0hFQ0sgfCBQQ1JFX05PVEVPTCwKKyAgICAgICAgICAgICAg ICAgICAgICAgICAgICBzdWIsIG5zdWIpKTsKICAgICAgICAgICBpZiAoZSAhPSBQQ1JFX0VS Uk9SX05PTUFUQ0gpCiAgICAgICAgICAgICBicmVhazsKICAgICAgICAgICBwICs9IHZhbGlk X2J5dGVzICsgMTsKLSAgICAgICAgICBvcHRpb25zID0gUENSRV9OT1RCT0w7CisgICAgICAg 
ICAgYm9sID0gZmFsc2U7CiAgICAgICAgIH0KIAogICAgICAgaWYgKGUgIT0gUENSRV9FUlJP Ul9OT01BVENIKQogICAgICAgICBicmVhazsKLSAgICAgIG9wdGlvbnMgPSAwOworICAgICAg Ym9sID0gdHJ1ZTsKICAgICB9CiAKICAgaWYgKGUgPD0gMCkKQEAgLTE4OCw3ICsxOTksNyBA QCBQZXhlY3V0ZSAoY2hhciBjb25zdCAqYnVmLCBzaXplX3Qgc2l6ZSwgc2l6ZV90ICptYXRj aF9zaXplLAogICAgICAgc3dpdGNoIChlKQogICAgICAgICB7CiAgICAgICAgIGNhc2UgUENS RV9FUlJPUl9OT01BVENIOgotICAgICAgICAgIHJldHVybiAtMTsKKyAgICAgICAgICBicmVh azsKIAogICAgICAgICBjYXNlIFBDUkVfRVJST1JfTk9NRU1PUlk6CiAgICAgICAgICAgZXJy b3IgKEVYSVRfVFJPVUJMRSwgMCwgXygibWVtb3J5IGV4aGF1c3RlZCIpKTsKQEAgLTIwNSw3 ICsyMTYsNiBAQCBQZXhlY3V0ZSAoY2hhciBjb25zdCAqYnVmLCBzaXplX3Qgc2l6ZSwgc2l6 ZV90ICptYXRjaF9zaXplLAogICAgICAgICAgIGVycm9yIChFWElUX1RST1VCTEUsIDAsIF8o ImludGVybmFsIFBDUkUgZXJyb3I6ICVkIiksIGUpOwogICAgICAgICB9CiAKLSAgICAgIC8q IE5PVFJFQUNIRUQgKi8KICAgICAgIHJldHVybiAtMTsKICAgICB9CiAgIGVsc2UKLS0gCjEu OS4zCgo= --------------080305090207030305040600 Content-Type: text/plain; charset=UTF-8; name="0003-grep-use-bool-for-boolean-in-grep.c.patch" Content-Transfer-Encoding: base64 Content-Disposition: attachment; filename="0003-grep-use-bool-for-boolean-in-grep.c.patch" RnJvbSBmNGE5NWRmZjkwMjg0MDgyNmVlZDY5ZmNjNzIwNWRiNWIzZTg2NTczIE1vbiBTZXAg MTcgMDA6MDA6MDAgMjAwMQpGcm9tOiBQYXVsIEVnZ2VydCA8ZWdnZXJ0QGNzLnVjbGEuZWR1 PgpEYXRlOiBTYXQsIDEzIFNlcCAyMDE0IDE3OjU4OjUzIC0wNzAwClN1YmplY3Q6IFtQUk9Q T1NFRCBQQVRDSCAzLzZdIGdyZXA6IHVzZSBib29sIGZvciBib29sZWFuIGluIGdyZXAuYwoK KiBzcmMvZ3JlcC5jIChzaG93X3ZlcnNpb24sIHN1cHByZXNzX2Vycm9ycywgb25seV9tYXRj aGluZykKKGFsaWduX3RhYnMsIG1hdGNoX2ljYXNlLCBtYXRjaF93b3JkcywgbWF0Y2hfbGlu ZXMsIGVycnNlZW4pCih3cml0ZV9lcnJvcl9zZWVuLCBpc19kZXZpY2VfbW9kZSwgdXNhYmxl X3N0X3NpemUpCihmaWxlX2lzX2JpbmFyeSwgc2tpcHBlZF9maWxlLCByZXNldCwgZmlsbGJ1 Ziwgb3V0X3F1aWV0KQoob3V0X2xpbmUsIG91dF9ieXRlLCBjb3VudF9tYXRjaGVzLCBub19m aWxlbmFtZXMsIGxpbmVfYnVmZmVyZWQpCihkb25lX29uX21hdGNoLCBleGl0X29uX21hdGNo LCBwcmludF9saW5lX2hlYWQsIHBybGluZSwgZ3JlcCkKKGdyZXBkaXJlbnQsIGdyZXBmaWxl 
LCBncmVwZGVzYywgZ3JlcF9jb21tYW5kX2xpbmVfYXJnKQooZ2V0X25vbmRpZ2l0X29wdGlv biwgbWFpbik6IFVzZSBib29sIGZvciBib29sZWFuLgoocHJpbnRfbGluZV9oZWFkLCBwcmxp bmUpOiBVc2UgY2hhciBmb3IgYnl0ZS4KKiBzcmMvZ3JlcC5oOiBJbmNsdWRlIDxzdGRib29s Lmg+LCBhbmQgYWRqdXN0IGRlY2xzIHRvIG1hdGNoCmNoYW5nZXMgaW4gZ3JlcC5jLgotLS0K IHNyYy9ncmVwLmMgfCAyMzIgKysrKysrKysrKysrKysrKysrKysrKysrKysrKysrKy0tLS0t LS0tLS0tLS0tLS0tLS0tLS0tLS0tLS0tLQogc3JjL2dyZXAuaCB8ICAgOCArKy0KIDIgZmls ZXMgY2hhbmdlZCwgMTI0IGluc2VydGlvbnMoKyksIDExNiBkZWxldGlvbnMoLSkKCmRpZmYg LS1naXQgYS9zcmMvZ3JlcC5jIGIvc3JjL2dyZXAuYwppbmRleCA3MTlkZmYxLi4xZTBjYzZk IDEwMDY0NAotLS0gYS9zcmMvZ3JlcC5jCisrKyBiL3NyYy9ncmVwLmMKQEAgLTY1LDIwICs2 NSwyMCBAQCBzdGF0aWMgc3RydWN0IHN0YXQgb3V0X3N0YXQ7CiAvKiBpZiBub24temVybywg ZGlzcGxheSB1c2FnZSBpbmZvcm1hdGlvbiBhbmQgZXhpdCAqLwogc3RhdGljIGludCBzaG93 X2hlbHA7CiAKLS8qIElmIG5vbi16ZXJvLCBwcmludCB0aGUgdmVyc2lvbiBvbiBzdGFuZGFy ZCBvdXRwdXQgYW5kIGV4aXQuICAqLwotc3RhdGljIGludCBzaG93X3ZlcnNpb247CisvKiBQ cmludCB0aGUgdmVyc2lvbiBvbiBzdGFuZGFyZCBvdXRwdXQgYW5kIGV4aXQuICAqLworc3Rh dGljIGJvb2wgc2hvd192ZXJzaW9uOwogCi0vKiBJZiBub256ZXJvLCBzdXBwcmVzcyBkaWFn bm9zdGljcyBmb3Igbm9uZXhpc3RlbnQgb3IgdW5yZWFkYWJsZSBmaWxlcy4gICovCi1zdGF0 aWMgaW50IHN1cHByZXNzX2Vycm9yczsKKy8qIFN1cHByZXNzIGRpYWdub3N0aWNzIGZvciBu b25leGlzdGVudCBvciB1bnJlYWRhYmxlIGZpbGVzLiAgKi8KK3N0YXRpYyBib29sIHN1cHBy ZXNzX2Vycm9yczsKIAogLyogSWYgbm9uemVybywgdXNlIGNvbG9yIG1hcmtlcnMuICAqLwog c3RhdGljIGludCBjb2xvcl9vcHRpb247CiAKLS8qIElmIG5vbnplcm8sIHNob3cgb25seSB0 aGUgcGFydCBvZiBhIGxpbmUgbWF0Y2hpbmcgdGhlIGV4cHJlc3Npb24uICovCi1zdGF0aWMg aW50IG9ubHlfbWF0Y2hpbmc7CisvKiBTaG93IG9ubHkgdGhlIHBhcnQgb2YgYSBsaW5lIG1h dGNoaW5nIHRoZSBleHByZXNzaW9uLiAqLworc3RhdGljIGJvb2wgb25seV9tYXRjaGluZzsK IAogLyogSWYgbm9uemVybywgbWFrZSBzdXJlIGZpcnN0IGNvbnRlbnQgY2hhciBpbiBhIGxp bmUgaXMgb24gYSB0YWIgc3RvcC4gKi8KLXN0YXRpYyBpbnQgYWxpZ25fdGFiczsKK3N0YXRp YyBib29sIGFsaWduX3RhYnM7CiAKIC8qIFRoZSBncm91cCBzZXBhcmF0b3IgdXNlZCB3aGVu IGNvbnRleHQgaXMgcmVxdWVzdGVkLiAqLwogc3RhdGljIGNvbnN0IGNoYXIgKmdyb3VwX3Nl 
cGFyYXRvciA9IFNFUF9TVFJfR1JPVVA7CkBAIC0zNDcsOSArMzQ3LDkgQEAgc3RhdGljIHN0 cnVjdCBvcHRpb24gY29uc3QgbG9uZ19vcHRpb25zW10gPQogfTsKIAogLyogRGVmaW5lIGZs YWdzIGRlY2xhcmVkIGluIGdyZXAuaC4gKi8KLWludCBtYXRjaF9pY2FzZTsKLWludCBtYXRj aF93b3JkczsKLWludCBtYXRjaF9saW5lczsKK2Jvb2wgbWF0Y2hfaWNhc2U7Citib29sIG1h dGNoX3dvcmRzOworYm9vbCBtYXRjaF9saW5lczsKIHVuc2lnbmVkIGNoYXIgZW9sYnl0ZTsK IAogc3RhdGljIGNoYXIgY29uc3QgKm1hdGNoZXI7CkBAIC0zNTgsOCArMzU4LDggQEAgc3Rh dGljIGNoYXIgY29uc3QgKm1hdGNoZXI7CiAvKiBUaGUgaW5wdXQgZmlsZSBuYW1lLCBvciAo aWYgc3RhbmRhcmQgaW5wdXQpICItIiBvciBhIC0tbGFiZWwgYXJndW1lbnQuICAqLwogc3Rh dGljIGNoYXIgY29uc3QgKmZpbGVuYW1lOwogc3RhdGljIHNpemVfdCBmaWxlbmFtZV9wcmVm aXhfbGVuOwotc3RhdGljIGludCBlcnJzZWVuOwotc3RhdGljIGludCB3cml0ZV9lcnJvcl9z ZWVuOworc3RhdGljIGJvb2wgZXJyc2VlbjsKK3N0YXRpYyBib29sIHdyaXRlX2Vycm9yX3Nl ZW47CiAKIGVudW0gZGlyZWN0b3JpZXNfdHlwZQogICB7CkBAIC0zOTIsMjIgKzM5MiwyMiBA QCBzdGF0aWMgZW51bQogICAgIFNLSVBfREVWSUNFUwogICB9IGRldmljZXMgPSBSRUFEX0NP TU1BTkRfTElORV9ERVZJQ0VTOwogCi1zdGF0aWMgaW50IGdyZXBmaWxlIChpbnQsIGNoYXIg Y29uc3QgKiwgaW50LCBpbnQpOwotc3RhdGljIGludCBncmVwZGVzYyAoaW50LCBpbnQpOwor c3RhdGljIGJvb2wgZ3JlcGZpbGUgKGludCwgY2hhciBjb25zdCAqLCBib29sLCBib29sKTsK K3N0YXRpYyBib29sIGdyZXBkZXNjIChpbnQsIGJvb2wpOwogCiBzdGF0aWMgdm9pZCBkb3Nf YmluYXJ5ICh2b2lkKTsKIHN0YXRpYyB2b2lkIGRvc191bml4X2J5dGVfb2Zmc2V0cyAodm9p ZCk7CiBzdGF0aWMgc2l6ZV90IHVuZG9zc2lmeV9pbnB1dCAoY2hhciAqLCBzaXplX3QpOwog Ci1zdGF0aWMgaW50CitzdGF0aWMgYm9vbAogaXNfZGV2aWNlX21vZGUgKG1vZGVfdCBtKQog ewogICByZXR1cm4gU19JU0NIUiAobSkgfHwgU19JU0JMSyAobSkgfHwgU19JU1NPQ0sgKG0p IHx8IFNfSVNGSUZPIChtKTsKIH0KIAotLyogUmV0dXJuIG5vbnplcm8gaWYgU1QtPnN0X3Np emUgaXMgZGVmaW5lZC4gIEFzc3VtZSB0aGUgZmlsZSBpcyBub3QgYQorLyogUmV0dXJuIGlm IFNULT5zdF9zaXplIGlzIGRlZmluZWQuICBBc3N1bWUgdGhlIGZpbGUgaXMgbm90IGEKICAg IHN5bWJvbGljIGxpbmsuICAqLwotc3RhdGljIGludAorc3RhdGljIGJvb2wKIHVzYWJsZV9z dF9zaXplIChzdHJ1Y3Qgc3RhdCBjb25zdCAqc3QpCiB7CiAgIHJldHVybiBTX0lTUkVHIChz dC0+c3RfbW9kZSkgfHwgU19UWVBFSVNTSE0gKHN0KSB8fCBTX1RZUEVJU1RNTyAoc3QpOwpA 
QCAtNDI1LDcgKzQyNSw3IEBAIHN1cHByZXNzaWJsZV9lcnJvciAoY2hhciBjb25zdCAqbWVz ZywgaW50IGVycm51bSkKIHsKICAgaWYgKCEgc3VwcHJlc3NfZXJyb3JzKQogICAgIGVycm9y ICgwLCBlcnJudW0sICIlcyIsIG1lc2cpOwotICBlcnJzZWVuID0gMTsKKyAgZXJyc2VlbiA9 IHRydWU7CiB9CiAKIC8qIElmIHRoZXJlIGhhcyBhbHJlYWR5IGJlZW4gYSB3cml0ZSBlcnJv ciwgZG9uJ3QgYm90aGVyIGNsb3NpbmcKQEAgLTQzNywxMCArNDM3LDEwIEBAIGNsZWFuX3Vw X3N0ZG91dCAodm9pZCkKICAgICBjbG9zZV9zdGRvdXQgKCk7CiB9CiAKLS8qIFJldHVybiAx IGlmIGEgZmlsZSBpcyBrbm93biB0byBiZSBiaW5hcnkgZm9yIHRoZSBwdXJwb3NlIG9mICdn cmVwJy4KKy8qIFJldHVybiB0cnVlIGlmIGEgZmlsZSBpcyBrbm93biB0byBiZSBiaW5hcnkg Zm9yIHRoZSBwdXJwb3NlIG9mICdncmVwJy4KICAgIEJVRiwgb2Ygc2l6ZSBCVUZTSVpFLCBp cyB0aGUgaW5pdGlhbCBidWZmZXIgcmVhZCBmcm9tIHRoZSBmaWxlIHdpdGgKICAgIGRlc2Ny aXB0b3IgRkQgYW5kIHN0YXR1cyBTVC4gICovCi1zdGF0aWMgaW50CitzdGF0aWMgYm9vbAog ZmlsZV9pc19iaW5hcnkgKGNoYXIgY29uc3QgKmJ1Ziwgc2l6ZV90IGJ1ZnNpemUsIGludCBm ZCwgc3RydWN0IHN0YXQgY29uc3QgKnN0KQogewogICAjaWZuZGVmIFNFRUtfSE9MRQpAQCAt NDU1LDcgKzQ1NSw3IEBAIGZpbGVfaXNfYmluYXJ5IChjaGFyIGNvbnN0ICpidWYsIHNpemVf dCBidWZzaXplLCBpbnQgZmQsIHN0cnVjdCBzdGF0IGNvbnN0ICpzdCkKICAgLyogSWYgdGhl IGluaXRpYWwgYnVmZmVyIGNvbnRhaW5zIGEgbnVsbCBieXRlLCBndWVzcyB0aGF0IHRoZSBm aWxlCiAgICAgIGlzIGJpbmFyeS4gICovCiAgIGlmIChtZW1jaHIgKGJ1ZiwgJ1wwJywgYnVm c2l6ZSkpCi0gICAgcmV0dXJuIDE7CisgICAgcmV0dXJuIHRydWU7CiAKICAgLyogSWYgdGhl IGZpbGUgaGFzIGhvbGVzLCBpdCBtdXN0IGNvbnRhaW4gYSBudWxsIGJ5dGUgc29tZXdoZXJl LiAgKi8KICAgaWYgKFNFRUtfSE9MRSAhPSBTRUVLX0VORCAmJiB1c2FibGVfc3Rfc2l6ZSAo c3QpKQpAQCAtNDY1LDcgKzQ2NSw3IEBAIGZpbGVfaXNfYmluYXJ5IChjaGFyIGNvbnN0ICpi dWYsIHNpemVfdCBidWZzaXplLCBpbnQgZmQsIHN0cnVjdCBzdGF0IGNvbnN0ICpzdCkKICAg ICAgICAgewogICAgICAgICAgIGN1ciA9IGxzZWVrIChmZCwgMCwgU0VFS19DVVIpOwogICAg ICAgICAgIGlmIChjdXIgPCAwKQotICAgICAgICAgICAgcmV0dXJuIDA7CisgICAgICAgICAg ICByZXR1cm4gZmFsc2U7CiAgICAgICAgIH0KIAogICAgICAgLyogTG9vayBmb3IgYSBob2xl IGFmdGVyIHRoZSBjdXJyZW50IGxvY2F0aW9uLiAgKi8KQEAgLTQ3NSwxMiArNDc1LDEyIEBA IGZpbGVfaXNfYmluYXJ5IChjaGFyIGNvbnN0ICpidWYsIHNpemVfdCBidWZzaXplLCBpbnQg 
ZmQsIHN0cnVjdCBzdGF0IGNvbnN0ICpzdCkKICAgICAgICAgICBpZiAobHNlZWsgKGZkLCBj dXIsIFNFRUtfU0VUKSA8IDApCiAgICAgICAgICAgICBzdXBwcmVzc2libGVfZXJyb3IgKGZp bGVuYW1lLCBlcnJubyk7CiAgICAgICAgICAgaWYgKGhvbGVfc3RhcnQgPCBzdC0+c3Rfc2l6 ZSkKLSAgICAgICAgICAgIHJldHVybiAxOworICAgICAgICAgICAgcmV0dXJuIHRydWU7CiAg ICAgICAgIH0KICAgICB9CiAKICAgLyogR3Vlc3MgdGhhdCB0aGUgZmlsZSBkb2VzIG5vdCBj b250YWluIGJpbmFyeSBkYXRhLiAgKi8KLSAgcmV0dXJuIDA7CisgIHJldHVybiBmYWxzZTsK IH0KIAogLyogQ29udmVydCBTVFIgdG8gYSBub25uZWdhdGl2ZSBpbnRlZ2VyLCBzdG9yaW5n IHRoZSByZXN1bHQgaW4gKk9VVC4KQEAgLTUwMywxMSArNTAzLDExIEBAIGNvbnRleHRfbGVu Z3RoX2FyZyAoY2hhciBjb25zdCAqc3RyLCBpbnRtYXhfdCAqb3V0KQogICAgIH0KIH0KIAot LyogUmV0dXJuIG5vbnplcm8gaWYgdGhlIGZpbGUgd2l0aCBOQU1FIHNob3VsZCBiZSBza2lw cGVkLgotICAgSWYgQ09NTUFORF9MSU5FIGlzIG5vbnplcm8sIGl0IGlzIGEgY29tbWFuZC1s aW5lIGFyZ3VtZW50LgotICAgSWYgSVNfRElSIGlzIG5vbnplcm8sIGl0IGlzIGEgZGlyZWN0 b3J5LiAgKi8KLXN0YXRpYyBpbnQKLXNraXBwZWRfZmlsZSAoY2hhciBjb25zdCAqbmFtZSwg aW50IGNvbW1hbmRfbGluZSwgaW50IGlzX2RpcikKKy8qIFJldHVybiB0cnVlIGlmIHRoZSBm aWxlIHdpdGggTkFNRSBzaG91bGQgYmUgc2tpcHBlZC4KKyAgIElmIENPTU1BTkRfTElORSwg aXQgaXMgYSBjb21tYW5kLWxpbmUgYXJndW1lbnQuCisgICBJZiBJU19ESVIsIGl0IGlzIGEg ZGlyZWN0b3J5LiAgKi8KK3N0YXRpYyBib29sCitza2lwcGVkX2ZpbGUgKGNoYXIgY29uc3Qg Km5hbWUsIGJvb2wgY29tbWFuZF9saW5lLCBib29sIGlzX2RpcikKIHsKICAgcmV0dXJuIChp c19kaXIKICAgICAgICAgICA/IChkaXJlY3RvcmllcyA9PSBTS0lQX0RJUkVDVE9SSUVTCkBA IC01NDEsOSArNTQxLDkgQEAgc3RhdGljIG9mZl90IGFmdGVyX2xhc3RfbWF0Y2g7CS8qIFBv aW50ZXIgYWZ0ZXIgbGFzdCBtYXRjaGluZyBsaW5lIHRoYXQKICAgID8gKHZhbCkgXAogICAg OiAodmFsKSArICgoYWxpZ25tZW50KSAtIChzaXplX3QpICh2YWwpICUgKGFsaWdubWVudCkp KQogCi0vKiBSZXNldCB0aGUgYnVmZmVyIGZvciBhIG5ldyBmaWxlLCByZXR1cm5pbmcgemVy byBpZiB3ZSBzaG91bGQgc2tpcCBpdC4KKy8qIFJlc2V0IHRoZSBidWZmZXIgZm9yIGEgbmV3 IGZpbGUsIHJldHVybmluZyBmYWxzZSBpZiB3ZSBzaG91bGQgc2tpcCBpdC4KICAgIEluaXRp YWxpemUgb24gdGhlIGZpcnN0IHRpbWUgdGhyb3VnaC4gKi8KLXN0YXRpYyBpbnQKK3N0YXRp YyBib29sCiByZXNldCAoaW50IGZkLCBzdHJ1Y3Qgc3RhdCBjb25zdCAqc3QpCiB7CiAgIGlm 
ICghIHBhZ2VzaXplKQpAQCAtNTY5LDIyICs1NjksMjIgQEAgcmVzZXQgKGludCBmZCwgc3Ry dWN0IHN0YXQgY29uc3QgKnN0KQogICAgICAgICAgIGlmIChidWZvZmZzZXQgPCAwKQogICAg ICAgICAgICAgewogICAgICAgICAgICAgICBzdXBwcmVzc2libGVfZXJyb3IgKF8oImxzZWVr IGZhaWxlZCIpLCBlcnJubyk7Ci0gICAgICAgICAgICAgIHJldHVybiAwOworICAgICAgICAg ICAgICByZXR1cm4gZmFsc2U7CiAgICAgICAgICAgICB9CiAgICAgICAgIH0KICAgICB9Ci0g IHJldHVybiAxOworICByZXR1cm4gdHJ1ZTsKIH0KIAogLyogUmVhZCBuZXcgc3R1ZmYgaW50 byB0aGUgYnVmZmVyLCBzYXZpbmcgdGhlIHNwZWNpZmllZAogICAgYW1vdW50IG9mIG9sZCBz dHVmZi4gIFdoZW4gd2UncmUgZG9uZSwgJ2J1ZmJlZycgcG9pbnRzCiAgICB0byB0aGUgYmVn aW5uaW5nIG9mIHRoZSBidWZmZXIgY29udGVudHMsIGFuZCAnYnVmbGltJwotICAgcG9pbnRz IGp1c3QgYWZ0ZXIgdGhlIGVuZC4gIFJldHVybiB6ZXJvIGlmIHRoZXJlJ3MgYW4gZXJyb3Iu ICAqLwotc3RhdGljIGludAorICAgcG9pbnRzIGp1c3QgYWZ0ZXIgdGhlIGVuZC4gIFJldHVy biBmYWxzZSBpZiB0aGVyZSdzIGFuIGVycm9yLiAgKi8KK3N0YXRpYyBib29sCiBmaWxsYnVm IChzaXplX3Qgc2F2ZSwgc3RydWN0IHN0YXQgY29uc3QgKnN0KQogewogICBzaXplX3QgZmls bHNpemU7Ci0gIGludCBjYyA9IDE7CisgIGJvb2wgY2MgPSB0cnVlOwogICBjaGFyICpyZWFk YnVmOwogICBzaXplX3QgcmVhZHNpemU7CiAKQEAgLTY0Niw3ICs2NDYsMTAgQEAgZmlsbGJ1 ZiAoc2l6ZV90IHNhdmUsIHN0cnVjdCBzdGF0IGNvbnN0ICpzdCkKIAogICBmaWxsc2l6ZSA9 IHNhZmVfcmVhZCAoYnVmZGVzYywgcmVhZGJ1ZiwgcmVhZHNpemUpOwogICBpZiAoZmlsbHNp emUgPT0gU0FGRV9SRUFEX0VSUk9SKQotICAgIGZpbGxzaXplID0gY2MgPSAwOworICAgIHsK KyAgICAgIGZpbGxzaXplID0gMDsKKyAgICAgIGNjID0gZmFsc2U7CisgICAgfQogICBidWZv ZmZzZXQgKz0gZmlsbHNpemU7CiAgIGZpbGxzaXplID0gdW5kb3NzaWZ5X2lucHV0IChyZWFk YnVmLCBmaWxsc2l6ZSk7CiAgIGJ1ZmxpbSA9IHJlYWRidWYgKyBmaWxsc2l6ZTsKQEAgLTY2 MiwyMCArNjY1LDE5IEBAIHN0YXRpYyBlbnVtCiB9IGJpbmFyeV9maWxlczsJCS8qIEhvdyB0 byBoYW5kbGUgYmluYXJ5IGZpbGVzLiAgKi8KIAogc3RhdGljIGludCBmaWxlbmFtZV9tYXNr OwkvKiBJZiB6ZXJvLCBvdXRwdXQgbnVsbHMgYWZ0ZXIgZmlsZW5hbWVzLiAgKi8KLXN0YXRp YyBpbnQgb3V0X3F1aWV0OwkJLyogU3VwcHJlc3MgYWxsIG5vcm1hbCBvdXRwdXQuICovCitz dGF0aWMgYm9vbCBvdXRfcXVpZXQ7CQkvKiBTdXBwcmVzcyBhbGwgbm9ybWFsIG91dHB1dC4g Ki8KIHN0YXRpYyBib29sIG91dF9pbnZlcnQ7CQkvKiBQcmludCBub25tYXRjaGluZyBzdHVm 
Zi4gKi8KIHN0YXRpYyBpbnQgb3V0X2ZpbGU7CQkvKiBQcmludCBmaWxlbmFtZXMuICovCi1z dGF0aWMgaW50IG91dF9saW5lOwkJLyogUHJpbnQgbGluZSBudW1iZXJzLiAqLwotc3RhdGlj IGludCBvdXRfYnl0ZTsJCS8qIFByaW50IGJ5dGUgb2Zmc2V0cy4gKi8KK3N0YXRpYyBib29s IG91dF9saW5lOwkJLyogUHJpbnQgbGluZSBudW1iZXJzLiAqLworc3RhdGljIGJvb2wgb3V0 X2J5dGU7CQkvKiBQcmludCBieXRlIG9mZnNldHMuICovCiBzdGF0aWMgaW50bWF4X3Qgb3V0 X2JlZm9yZTsJLyogTGluZXMgb2YgbGVhZGluZyBjb250ZXh0LiAqLwogc3RhdGljIGludG1h eF90IG91dF9hZnRlcjsJLyogTGluZXMgb2YgdHJhaWxpbmcgY29udGV4dC4gKi8KLXN0YXRp YyBpbnQgY291bnRfbWF0Y2hlczsJLyogQ291bnQgbWF0Y2hpbmcgbGluZXMuICAqLworc3Rh dGljIGJvb2wgY291bnRfbWF0Y2hlczsJLyogQ291bnQgbWF0Y2hpbmcgbGluZXMuICAqLwog c3RhdGljIGludCBsaXN0X2ZpbGVzOwkJLyogTGlzdCBtYXRjaGluZyBmaWxlcy4gICovCi1z dGF0aWMgaW50IG5vX2ZpbGVuYW1lczsJLyogU3VwcHJlc3MgZmlsZSBuYW1lcy4gICovCitz dGF0aWMgYm9vbCBub19maWxlbmFtZXM7CS8qIFN1cHByZXNzIGZpbGUgbmFtZXMuICAqLwog c3RhdGljIGludG1heF90IG1heF9jb3VudDsJLyogU3RvcCBhZnRlciBvdXRwdXR0aW5nIHRo aXMgbWFueQogICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICBsaW5lcyBmcm9t IGFuIGlucHV0IGZpbGUuICAqLwotc3RhdGljIGludCBsaW5lX2J1ZmZlcmVkOyAgICAgICAv KiBJZiBub256ZXJvLCB1c2UgbGluZSBidWZmZXJpbmcsIGkuZS4KLSAgICAgICAgICAgICAg ICAgICAgICAgICAgICAgICAgICAgZmZsdXNoIGV2ZXJ5bGluZSBvdXQuICAqLworc3RhdGlj IGJvb2wgbGluZV9idWZmZXJlZDsJLyogVXNlIGxpbmUgYnVmZmVyaW5nLiAgKi8KIHN0YXRp YyBjaGFyICpsYWJlbCA9IE5VTEw7ICAgICAgLyogRmFrZSBmaWxlbmFtZSBmb3Igc3RkaW4g Ki8KIAogCkBAIC02ODksOCArNjkxLDggQEAgc3RhdGljIHVpbnRtYXhfdCB0b3RhbG5sOwkv KiBUb3RhbCBuZXdsaW5lIGNvdW50IGJlZm9yZSBsYXN0bmwuICovCiBzdGF0aWMgaW50bWF4 X3Qgb3V0bGVmdDsJLyogTWF4aW11bSBudW1iZXIgb2YgbGluZXMgdG8gYmUgb3V0cHV0LiAg Ki8KIHN0YXRpYyBpbnRtYXhfdCBwZW5kaW5nOwkvKiBQZW5kaW5nIGxpbmVzIG9mIG91dHB1 dC4KICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgQWx3YXlzIGtlcHQgMCBp ZiBvdXRfcXVpZXQgaXMgdHJ1ZS4gICovCi1zdGF0aWMgaW50IGRvbmVfb25fbWF0Y2g7CS8q IFN0b3Agc2Nhbm5pbmcgZmlsZSBvbiBmaXJzdCBtYXRjaC4gICovCi1zdGF0aWMgaW50IGV4 aXRfb25fbWF0Y2g7CS8qIEV4aXQgb24gZmlyc3QgbWF0Y2guICAqLworc3RhdGljIGJvb2wg 
ZG9uZV9vbl9tYXRjaDsJLyogU3RvcCBzY2FubmluZyBmaWxlIG9uIGZpcnN0IG1hdGNoLiAg Ki8KK3N0YXRpYyBib29sIGV4aXRfb25fbWF0Y2g7CS8qIEV4aXQgb24gZmlyc3QgbWF0Y2gu ICAqLwogCiAjaW5jbHVkZSAiZG9zYnVmLmMiCiAKQEAgLTc2OCwxNSArNzcwLDE1IEBAIHBy aW50X29mZnNldCAodWludG1heF90IHBvcywgaW50IG1pbl93aWR0aCwgY29uc3QgY2hhciAq Y29sb3IpCiAKIC8qIFByaW50IGEgd2hvbGUgbGluZSBoZWFkIChmaWxlbmFtZSwgbGluZSwg Ynl0ZSkuICAqLwogc3RhdGljIHZvaWQKLXByaW50X2xpbmVfaGVhZCAoY2hhciBjb25zdCAq YmVnLCBjaGFyIGNvbnN0ICpsaW0sIGludCBzZXApCitwcmludF9saW5lX2hlYWQgKGNoYXIg Y29uc3QgKmJlZywgY2hhciBjb25zdCAqbGltLCBjaGFyIHNlcCkKIHsKLSAgaW50IHBlbmRp bmdfc2VwID0gMDsKKyAgYm9vbCBwZW5kaW5nX3NlcCA9IGZhbHNlOwogCiAgIGlmIChvdXRf ZmlsZSkKICAgICB7CiAgICAgICBwcmludF9maWxlbmFtZSAoKTsKICAgICAgIGlmIChmaWxl bmFtZV9tYXNrKQotICAgICAgICBwZW5kaW5nX3NlcCA9IDE7CisgICAgICAgIHBlbmRpbmdf c2VwID0gdHJ1ZTsKICAgICAgIGVsc2UKICAgICAgICAgZnB1dGMgKDAsIHN0ZG91dCk7CiAg ICAgfQpAQCAtNzkyLDcgKzc5NCw3IEBAIHByaW50X2xpbmVfaGVhZCAoY2hhciBjb25zdCAq YmVnLCBjaGFyIGNvbnN0ICpsaW0sIGludCBzZXApCiAgICAgICBpZiAocGVuZGluZ19zZXAp CiAgICAgICAgIHByaW50X3NlcCAoc2VwKTsKICAgICAgIHByaW50X29mZnNldCAodG90YWxu bCwgNCwgbGluZV9udW1fY29sb3IpOwotICAgICAgcGVuZGluZ19zZXAgPSAxOworICAgICAg cGVuZGluZ19zZXAgPSB0cnVlOwogICAgIH0KIAogICBpZiAob3V0X2J5dGUpCkBAIC04MDIs NyArODA0LDcgQEAgcHJpbnRfbGluZV9oZWFkIChjaGFyIGNvbnN0ICpiZWcsIGNoYXIgY29u c3QgKmxpbSwgaW50IHNlcCkKICAgICAgIGlmIChwZW5kaW5nX3NlcCkKICAgICAgICAgcHJp bnRfc2VwIChzZXApOwogICAgICAgcHJpbnRfb2Zmc2V0IChwb3MsIDYsIGJ5dGVfbnVtX2Nv bG9yKTsKLSAgICAgIHBlbmRpbmdfc2VwID0gMTsKKyAgICAgIHBlbmRpbmdfc2VwID0gdHJ1 ZTsKICAgICB9CiAKICAgaWYgKHBlbmRpbmdfc2VwKQpAQCAtOTAzLDkgKzkwNSw5IEBAIHBy aW50X2xpbmVfdGFpbCAoY29uc3QgY2hhciAqYmVnLCBjb25zdCBjaGFyICpsaW0sIGNvbnN0 IGNoYXIgKmxpbmVfY29sb3IpCiB9CiAKIHN0YXRpYyB2b2lkCi1wcmxpbmUgKGNoYXIgY29u c3QgKmJlZywgY2hhciBjb25zdCAqbGltLCBpbnQgc2VwKQorcHJsaW5lIChjaGFyIGNvbnN0 ICpiZWcsIGNoYXIgY29uc3QgKmxpbSwgY2hhciBzZXApCiB7Ci0gIGludCBtYXRjaGluZzsK KyAgYm9vbCBtYXRjaGluZzsKICAgY29uc3QgY2hhciAqbGluZV9jb2xvcjsKICAgY29uc3Qg 
Y2hhciAqbWF0Y2hfY29sb3I7CiAKQEAgLTk0NSw3ICs5NDcsNyBAQCBwcmxpbmUgKGNoYXIg Y29uc3QgKmJlZywgY2hhciBjb25zdCAqbGltLCBpbnQgc2VwKQogCiAgIGlmIChmZXJyb3Ig KHN0ZG91dCkpCiAgICAgewotICAgICAgd3JpdGVfZXJyb3Jfc2VlbiA9IDE7CisgICAgICB3 cml0ZV9lcnJvcl9zZWVuID0gdHJ1ZTsKICAgICAgIGVycm9yIChFWElUX1RST1VCTEUsIDAs IF8oIndyaXRlIGVycm9yIikpOwogICAgIH0KIApAQCAtMTA5OCwxMiArMTEwMCwxNCBAQCBz dGF0aWMgaW50bWF4X3QKIGdyZXAgKGludCBmZCwgc3RydWN0IHN0YXQgY29uc3QgKnN0KQog ewogICBpbnRtYXhfdCBubGluZXMsIGk7Ci0gIGludCBub3RfdGV4dDsKKyAgYm9vbCBub3Rf dGV4dDsKICAgc2l6ZV90IHJlc2lkdWUsIHNhdmU7CiAgIGNoYXIgb2xkYzsKICAgY2hhciAq YmVnOwogICBjaGFyICpsaW07CiAgIGNoYXIgZW9sID0gZW9sYnl0ZTsKKyAgYm9vbCBkb25l X29uX21hdGNoXzAgPSBkb25lX29uX21hdGNoOworICBib29sIG91dF9xdWlldF8wID0gb3V0 X3F1aWV0OwogCiAgIGlmICghIHJlc2V0IChmZCwgc3QpKQogICAgIHJldHVybiAwOwpAQCAt MTEzMCw4ICsxMTM0LDggQEAgZ3JlcCAoaW50IGZkLCBzdHJ1Y3Qgc3RhdCBjb25zdCAqc3Qp CiAgICAgICAgICAgICAgICYmIGZpbGVfaXNfYmluYXJ5IChidWZiZWcsIGJ1ZmxpbSAtIGJ1 ZmJlZywgZmQsIHN0KSk7CiAgIGlmIChub3RfdGV4dCAmJiBiaW5hcnlfZmlsZXMgPT0gV0lU SE9VVF9NQVRDSF9CSU5BUllfRklMRVMpCiAgICAgcmV0dXJuIDA7Ci0gIGRvbmVfb25fbWF0 Y2ggKz0gbm90X3RleHQ7Ci0gIG91dF9xdWlldCArPSBub3RfdGV4dDsKKyAgZG9uZV9vbl9t YXRjaCB8PSBub3RfdGV4dDsKKyAgb3V0X3F1aWV0IHw9IG5vdF90ZXh0OwogCiAgIGZvciAo OzspCiAgICAgewpAQCAtMTIwOCwxNyArMTIxMiwxOCBAQCBncmVwIChpbnQgZmQsIHN0cnVj dCBzdGF0IGNvbnN0ICpzdCkKICAgICB9CiAKICBmaW5pc2hfZ3JlcDoKLSAgZG9uZV9vbl9t YXRjaCAtPSBub3RfdGV4dDsKLSAgb3V0X3F1aWV0IC09IG5vdF90ZXh0OworICBkb25lX29u X21hdGNoID0gZG9uZV9vbl9tYXRjaF8wOworICBvdXRfcXVpZXQgPSBvdXRfcXVpZXRfMDsK ICAgaWYgKChub3RfdGV4dCAmIH5vdXRfcXVpZXQpICYmIG5saW5lcyAhPSAwKQogICAgIHBy aW50ZiAoXygiQmluYXJ5IGZpbGUgJXMgbWF0Y2hlc1xuIiksIGZpbGVuYW1lKTsKICAgcmV0 dXJuIG5saW5lczsKIH0KIAotc3RhdGljIGludAotZ3JlcGRpcmVudCAoRlRTICpmdHMsIEZU U0VOVCAqZW50LCBpbnQgY29tbWFuZF9saW5lKQorc3RhdGljIGJvb2wKK2dyZXBkaXJlbnQg KEZUUyAqZnRzLCBGVFNFTlQgKmVudCwgYm9vbCBjb21tYW5kX2xpbmUpCiB7Ci0gIGludCBm b2xsb3csIGRpcmRlc2M7CisgIGJvb2wgZm9sbG93OworICBpbnQgZGlyZGVzYzsKICAgc3Ry 
dWN0IHN0YXQgKnN0ID0gZW50LT5mdHNfc3RhdHA7CiAgIGNvbW1hbmRfbGluZSAmPSBlbnQt PmZ0c19sZXZlbCA9PSBGVFNfUk9PVExFVkVMOwogCkBAIC0xMjI2LDcgKzEyMzEsNyBAQCBn cmVwZGlyZW50IChGVFMgKmZ0cywgRlRTRU5UICplbnQsIGludCBjb21tYW5kX2xpbmUpCiAg ICAgewogICAgICAgaWYgKGRpcmVjdG9yaWVzID09IFJFQ1VSU0VfRElSRUNUT1JJRVMgJiYg Y29tbWFuZF9saW5lKQogICAgICAgICBvdXRfZmlsZSAmPSB+ICgyICogIW5vX2ZpbGVuYW1l cyk7Ci0gICAgICByZXR1cm4gMTsKKyAgICAgIHJldHVybiB0cnVlOwogICAgIH0KIAogICBp ZiAoc2tpcHBlZF9maWxlIChlbnQtPmZ0c19uYW1lLCBjb21tYW5kX2xpbmUsCkBAIC0xMjM0 LDcgKzEyMzksNyBAQCBncmVwZGlyZW50IChGVFMgKmZ0cywgRlRTRU5UICplbnQsIGludCBj b21tYW5kX2xpbmUpCiAgICAgICAgICAgICAgICAgICAgICB8fCBlbnQtPmZ0c19pbmZvID09 IEZUU19ETlIpKSkKICAgICB7CiAgICAgICBmdHNfc2V0IChmdHMsIGVudCwgRlRTX1NLSVAp OwotICAgICAgcmV0dXJuIDE7CisgICAgICByZXR1cm4gdHJ1ZTsKICAgICB9CiAKICAgZmls ZW5hbWUgPSBlbnQtPmZ0c19wYXRoICsgZmlsZW5hbWVfcHJlZml4X2xlbjsKQEAgLTEyNDcs NyArMTI1Miw3IEBAIGdyZXBkaXJlbnQgKEZUUyAqZnRzLCBGVFNFTlQgKmVudCwgaW50IGNv bW1hbmRfbGluZSkKICAgICAgIGlmIChkaXJlY3RvcmllcyA9PSBSRUNVUlNFX0RJUkVDVE9S SUVTKQogICAgICAgICB7CiAgICAgICAgICAgb3V0X2ZpbGUgfD0gMiAqICFub19maWxlbmFt ZXM7Ci0gICAgICAgICAgcmV0dXJuIDE7CisgICAgICAgICAgcmV0dXJuIHRydWU7CiAgICAg ICAgIH0KICAgICAgIGZ0c19zZXQgKGZ0cywgZW50LCBGVFNfU0tJUCk7CiAgICAgICBicmVh azsKQEAgLTEyNTYsMTMgKzEyNjEsMTMgQEAgZ3JlcGRpcmVudCAoRlRTICpmdHMsIEZUU0VO VCAqZW50LCBpbnQgY29tbWFuZF9saW5lKQogICAgICAgaWYgKCFzdXBwcmVzc19lcnJvcnMp CiAgICAgICAgIGVycm9yICgwLCAwLCBfKCJ3YXJuaW5nOiAlczogJXMiKSwgZmlsZW5hbWUs CiAgICAgICAgICAgICAgICBfKCJyZWN1cnNpdmUgZGlyZWN0b3J5IGxvb3AiKSk7Ci0gICAg ICByZXR1cm4gMTsKKyAgICAgIHJldHVybiB0cnVlOwogCiAgICAgY2FzZSBGVFNfRE5SOgog ICAgIGNhc2UgRlRTX0VSUjoKICAgICBjYXNlIEZUU19OUzoKICAgICAgIHN1cHByZXNzaWJs ZV9lcnJvciAoZmlsZW5hbWUsIGVudC0+ZnRzX2Vycm5vKTsKLSAgICAgIHJldHVybiAxOwor ICAgICAgcmV0dXJuIHRydWU7CiAKICAgICBjYXNlIEZUU19ERUZBVUxUOgogICAgIGNhc2Ug RlRTX05TT0s6CkBAIC0xMjc5LDEyICsxMjg0LDEyIEBAIGdyZXBkaXJlbnQgKEZUUyAqZnRz LCBGVFNFTlQgKmVudCwgaW50IGNvbW1hbmRfbGluZSkKICAgICAgICAgICAgICAgaWYgKGZz 
dGF0YXQgKGZ0cy0+ZnRzX2N3ZF9mZCwgZW50LT5mdHNfYWNjcGF0aCwgJnN0MSwgZmxhZykg IT0gMCkKICAgICAgICAgICAgICAgICB7CiAgICAgICAgICAgICAgICAgICBzdXBwcmVzc2li bGVfZXJyb3IgKGZpbGVuYW1lLCBlcnJubyk7Ci0gICAgICAgICAgICAgICAgICByZXR1cm4g MTsKKyAgICAgICAgICAgICAgICAgIHJldHVybiB0cnVlOwogICAgICAgICAgICAgICAgIH0K ICAgICAgICAgICAgICAgc3QgPSAmc3QxOwogICAgICAgICAgICAgfQogICAgICAgICAgIGlm IChpc19kZXZpY2VfbW9kZSAoc3QtPnN0X21vZGUpKQotICAgICAgICAgICAgcmV0dXJuIDE7 CisgICAgICAgICAgICByZXR1cm4gdHJ1ZTsKICAgICAgICAgfQogICAgICAgYnJlYWs7CiAK QEAgLTEyOTQsNyArMTI5OSw3IEBAIGdyZXBkaXJlbnQgKEZUUyAqZnRzLCBGVFNFTlQgKmVu dCwgaW50IGNvbW1hbmRfbGluZSkKIAogICAgIGNhc2UgRlRTX1NMOgogICAgIGNhc2UgRlRT X1c6Ci0gICAgICByZXR1cm4gMTsKKyAgICAgIHJldHVybiB0cnVlOwogCiAgICAgZGVmYXVs dDoKICAgICAgIGFib3J0ICgpOwpAQCAtMTMwNiwyNCArMTMxMSwyNCBAQCBncmVwZGlyZW50 IChGVFMgKmZ0cywgRlRTRU5UICplbnQsIGludCBjb21tYW5kX2xpbmUpCiAgIHJldHVybiBn cmVwZmlsZSAoZGlyZGVzYywgZW50LT5mdHNfYWNjcGF0aCwgZm9sbG93LCBjb21tYW5kX2xp bmUpOwogfQogCi1zdGF0aWMgaW50Ci1ncmVwZmlsZSAoaW50IGRpcmRlc2MsIGNoYXIgY29u c3QgKm5hbWUsIGludCBmb2xsb3csIGludCBjb21tYW5kX2xpbmUpCitzdGF0aWMgYm9vbAor Z3JlcGZpbGUgKGludCBkaXJkZXNjLCBjaGFyIGNvbnN0ICpuYW1lLCBib29sIGZvbGxvdywg Ym9vbCBjb21tYW5kX2xpbmUpCiB7CiAgIGludCBkZXNjID0gb3BlbmF0X3NhZmVyIChkaXJk ZXNjLCBuYW1lLCBPX1JET05MWSB8IChmb2xsb3cgPyAwIDogT19OT0ZPTExPVykpOwogICBp ZiAoZGVzYyA8IDApCiAgICAgewogICAgICAgaWYgKGZvbGxvdyB8fCAoZXJybm8gIT0gRUxP T1AgJiYgZXJybm8gIT0gRU1MSU5LKSkKICAgICAgICAgc3VwcHJlc3NpYmxlX2Vycm9yIChm aWxlbmFtZSwgZXJybm8pOwotICAgICAgcmV0dXJuIDE7CisgICAgICByZXR1cm4gdHJ1ZTsK ICAgICB9CiAgIHJldHVybiBncmVwZGVzYyAoZGVzYywgY29tbWFuZF9saW5lKTsKIH0KIAot c3RhdGljIGludAotZ3JlcGRlc2MgKGludCBkZXNjLCBpbnQgY29tbWFuZF9saW5lKQorc3Rh dGljIGJvb2wKK2dyZXBkZXNjIChpbnQgZGVzYywgYm9vbCBjb21tYW5kX2xpbmUpCiB7CiAg IGludG1heF90IGNvdW50OwotICBpbnQgc3RhdHVzID0gMTsKKyAgYm9vbCBzdGF0dXMgPSB0 cnVlOwogICBzdHJ1Y3Qgc3RhdCBzdDsKIAogICAvKiBHZXQgdGhlIGZpbGUgc3RhdHVzLCBw b3NzaWJseSBmb3IgdGhlIHNlY29uZCB0aW1lLiAgVGhpcyBjYXRjaGVzCkBAIC0xMzM5LDcg 
KzEzNDQsNyBAQCBncmVwZGVzYyAoaW50IGRlc2MsIGludCBjb21tYW5kX2xpbmUpCiAgICAg fQogCiAgIGlmIChkZXNjICE9IFNURElOX0ZJTEVOTyAmJiBjb21tYW5kX2xpbmUKLSAgICAg ICYmIHNraXBwZWRfZmlsZSAoZmlsZW5hbWUsIDEsIFNfSVNESVIgKHN0LnN0X21vZGUpKSkK KyAgICAgICYmIHNraXBwZWRfZmlsZSAoZmlsZW5hbWUsIHRydWUsIFNfSVNESVIgKHN0LnN0 X21vZGUpICE9IDApKQogICAgIGdvdG8gY2xvc2VvdXQ7CiAKICAgaWYgKGRlc2MgIT0gU1RE SU5fRklMRU5PCkBAIC0xNDA0LDcgKzE0MDksNyBAQCBncmVwZGVzYyAoaW50IGRlc2MsIGlu dCBjb21tYW5kX2xpbmUpCiAgICAgewogICAgICAgaWYgKCEgc3VwcHJlc3NfZXJyb3JzKQog ICAgICAgICBlcnJvciAoMCwgMCwgXygiaW5wdXQgZmlsZSAlcyBpcyBhbHNvIHRoZSBvdXRw dXQiKSwgcXVvdGUgKGZpbGVuYW1lKSk7Ci0gICAgICBlcnJzZWVuID0gMTsKKyAgICAgIGVy cnNlZW4gPSB0cnVlOwogICAgICAgZ290byBjbG9zZW91dDsKICAgICB9CiAKQEAgLTE0NTYs MTggKzE0NjEsMTggQEAgZ3JlcGRlc2MgKGludCBkZXNjLCBpbnQgY29tbWFuZF9saW5lKQog ICByZXR1cm4gc3RhdHVzOwogfQogCi1zdGF0aWMgaW50CitzdGF0aWMgYm9vbAogZ3JlcF9j b21tYW5kX2xpbmVfYXJnIChjaGFyIGNvbnN0ICphcmcpCiB7CiAgIGlmIChTVFJFUSAoYXJn LCAiLSIpKQogICAgIHsKICAgICAgIGZpbGVuYW1lID0gbGFiZWwgPyBsYWJlbCA6IF8oIihz dGFuZGFyZCBpbnB1dCkiKTsKLSAgICAgIHJldHVybiBncmVwZGVzYyAoU1RESU5fRklMRU5P LCAxKTsKKyAgICAgIHJldHVybiBncmVwZGVzYyAoU1RESU5fRklMRU5PLCB0cnVlKTsKICAg ICB9CiAgIGVsc2UKICAgICB7CiAgICAgICBmaWxlbmFtZSA9IGFyZzsKLSAgICAgIHJldHVy biBncmVwZmlsZSAoQVRfRkRDV0QsIGFyZywgMSwgMSk7CisgICAgICByZXR1cm4gZ3JlcGZp bGUgKEFUX0ZEQ1dELCBhcmcsIHRydWUsIHRydWUpOwogICAgIH0KIH0KIApAQCAtMTcyMSwx NCArMTcyNiwxNSBAQCBzdGF0aWMgaW50CiBnZXRfbm9uZGlnaXRfb3B0aW9uIChpbnQgYXJn YywgY2hhciAqY29uc3QgKmFyZ3YsIGludG1heF90ICpkZWZhdWx0X2NvbnRleHQpCiB7CiAg IHN0YXRpYyBpbnQgcHJldl9kaWdpdF9vcHRpbmQgPSAtMTsKLSAgaW50IHRoaXNfZGlnaXRf b3B0aW5kLCB3YXNfZGlnaXQ7CisgIGludCB0aGlzX2RpZ2l0X29wdGluZDsKKyAgYm9vbCB3 YXNfZGlnaXQ7CiAgIGNoYXIgYnVmW0lOVF9CVUZTSVpFX0JPVU5EIChpbnRtYXhfdCkgKyA0 XTsKICAgY2hhciAqcCA9IGJ1ZjsKICAgaW50IG9wdDsKIAotICB3YXNfZGlnaXQgPSAwOwor ICB3YXNfZGlnaXQgPSBmYWxzZTsKICAgdGhpc19kaWdpdF9vcHRpbmQgPSBvcHRpbmQ7Ci0g IHdoaWxlICgxKQorICB3aGlsZSAodHJ1ZSkKICAgICB7CiAgICAgICBvcHQgPSBnZXRvcHRf 
bG9uZyAoYXJnYywgKGNoYXIgKiopIGFyZ3YsIHNob3J0X29wdGlvbnMsCiAgICAgICAgICAg ICAgICAgICAgICAgICAgbG9uZ19vcHRpb25zLCBOVUxMKTsKQEAgLTE3NTgsNyArMTc2NCw3 IEBAIGdldF9ub25kaWdpdF9vcHRpb24gKGludCBhcmdjLCBjaGFyICpjb25zdCAqYXJndiwg aW50bWF4X3QgKmRlZmF1bHRfY29udGV4dCkKICAgICAgICAgfQogICAgICAgKnArKyA9IG9w dDsKIAotICAgICAgd2FzX2RpZ2l0ID0gMTsKKyAgICAgIHdhc19kaWdpdCA9IHRydWU7CiAg ICAgICBwcmV2X2RpZ2l0X29wdGluZCA9IHRoaXNfZGlnaXRfb3B0aW5kOwogICAgICAgdGhp c19kaWdpdF9vcHRpbmQgPSBvcHRpbmQ7CiAgICAgfQpAQCAtMTg5MCw5ICsxODk2LDkgQEAg bWFpbiAoaW50IGFyZ2MsIGNoYXIgKiphcmd2KQogewogICBjaGFyICprZXlzOwogICBzaXpl X3Qga2V5Y2MsIG9sZGNjLCBrZXlhbGxvYzsKLSAgaW50IHdpdGhfZmlsZW5hbWVzOworICBi b29sIHdpdGhfZmlsZW5hbWVzLCBvazsKICAgc2l6ZV90IGNjOwotICBpbnQgb3B0LCBzdGF0 dXMsIHByZXBlbmRlZDsKKyAgaW50IG9wdCwgcHJlcGVuZGVkOwogICBpbnQgcHJldl9vcHRp bmQsIGxhc3RfcmVjdXJzaXZlOwogICBpbnQgZnJlYWRfZXJybm87CiAgIGludG1heF90IGRl ZmF1bHRfY29udGV4dDsKQEAgLTE5MDQsNyArMTkxMCw3IEBAIG1haW4gKGludCBhcmdjLCBj aGFyICoqYXJndikKIAogICBrZXlzID0gTlVMTDsKICAga2V5Y2MgPSAwOwotICB3aXRoX2Zp bGVuYW1lcyA9IDA7CisgIHdpdGhfZmlsZW5hbWVzID0gZmFsc2U7CiAgIGVvbGJ5dGUgPSAn XG4nOwogICBmaWxlbmFtZV9tYXNrID0gfjA7CiAKQEAgLTE5MTUsNyArMTkyMSw3IEBAIG1h aW4gKGludCBhcmdjLCBjaGFyICoqYXJndikKICAgLyogRGVmYXVsdCBiZWZvcmUvYWZ0ZXIg Y29udGV4dDogY2hhbmdlZCBieSAtQy8tTlVNIG9wdGlvbnMgKi8KICAgZGVmYXVsdF9jb250 ZXh0ID0gLTE7CiAgIC8qIENoYW5nZWQgYnkgLW8gb3B0aW9uICovCi0gIG9ubHlfbWF0Y2hp bmcgPSAwOworICBvbmx5X21hdGNoaW5nID0gZmFsc2U7CiAKICAgLyogSW50ZXJuYXRpb25h bGl6YXRpb24uICovCiAjaWYgZGVmaW5lZCBIQVZFX1NFVExPQ0FMRQpAQCAtMTk4Nyw4ICsx OTkzLDggQEAgbWFpbiAoaW50IGFyZ2MsIGNoYXIgKiphcmd2KQogICAgICAgICBicmVhazsK IAogICAgICAgY2FzZSAnSCc6Ci0gICAgICAgIHdpdGhfZmlsZW5hbWVzID0gMTsKLSAgICAg ICAgbm9fZmlsZW5hbWVzID0gMDsKKyAgICAgICAgd2l0aF9maWxlbmFtZXMgPSB0cnVlOwor ICAgICAgICBub19maWxlbmFtZXMgPSBmYWxzZTsKICAgICAgICAgYnJlYWs7CiAKICAgICAg IGNhc2UgJ0knOgpAQCAtMTk5Niw3ICsyMDAyLDcgQEAgbWFpbiAoaW50IGFyZ2MsIGNoYXIg Kiphcmd2KQogICAgICAgICBicmVhazsKIAogICAgICAgY2FzZSAnVCc6Ci0gICAgICAgIGFs 
aWduX3RhYnMgPSAxOworICAgICAgICBhbGlnbl90YWJzID0gdHJ1ZTsKICAgICAgICAgYnJl YWs7CiAKICAgICAgIGNhc2UgJ1UnOgpAQCAtMjAwOCw3ICsyMDE0LDcgQEAgbWFpbiAoaW50 IGFyZ2MsIGNoYXIgKiphcmd2KQogICAgICAgICBicmVhazsKIAogICAgICAgY2FzZSAnVic6 Ci0gICAgICAgIHNob3dfdmVyc2lvbiA9IDE7CisgICAgICAgIHNob3dfdmVyc2lvbiA9IHRy dWU7CiAgICAgICAgIGJyZWFrOwogCiAgICAgICBjYXNlICdhJzoKQEAgLTIwMTYsMTEgKzIw MjIsMTEgQEAgbWFpbiAoaW50IGFyZ2MsIGNoYXIgKiphcmd2KQogICAgICAgICBicmVhazsK IAogICAgICAgY2FzZSAnYic6Ci0gICAgICAgIG91dF9ieXRlID0gMTsKKyAgICAgICAgb3V0 X2J5dGUgPSB0cnVlOwogICAgICAgICBicmVhazsKIAogICAgICAgY2FzZSAnYyc6Ci0gICAg ICAgIGNvdW50X21hdGNoZXMgPSAxOworICAgICAgICBjb3VudF9tYXRjaGVzID0gdHJ1ZTsK ICAgICAgICAgYnJlYWs7CiAKICAgICAgIGNhc2UgJ2QnOgpAQCAtMjA2MywxMyArMjA2OSwx MyBAQCBtYWluIChpbnQgYXJnYywgY2hhciAqKmFyZ3YpCiAgICAgICAgIGJyZWFrOwogCiAg ICAgICBjYXNlICdoJzoKLSAgICAgICAgd2l0aF9maWxlbmFtZXMgPSAwOwotICAgICAgICBu b19maWxlbmFtZXMgPSAxOworICAgICAgICB3aXRoX2ZpbGVuYW1lcyA9IGZhbHNlOworICAg ICAgICBub19maWxlbmFtZXMgPSB0cnVlOwogICAgICAgICBicmVhazsKIAogICAgICAgY2Fz ZSAnaSc6CiAgICAgICBjYXNlICd5JzoJCQkvKiBGb3Igb2xkLXRpbWVycyAuIC4gLiAqLwot ICAgICAgICBtYXRjaF9pY2FzZSA9IDE7CisgICAgICAgIG1hdGNoX2ljYXNlID0gdHJ1ZTsK ICAgICAgICAgYnJlYWs7CiAKICAgICAgIGNhc2UgJ0wnOgpAQCAtMjA5NSwxNSArMjEwMSwx NSBAQCBtYWluIChpbnQgYXJnYywgY2hhciAqKmFyZ3YpCiAgICAgICAgIGJyZWFrOwogCiAg ICAgICBjYXNlICduJzoKLSAgICAgICAgb3V0X2xpbmUgPSAxOworICAgICAgICBvdXRfbGlu ZSA9IHRydWU7CiAgICAgICAgIGJyZWFrOwogCiAgICAgICBjYXNlICdvJzoKLSAgICAgICAg b25seV9tYXRjaGluZyA9IDE7CisgICAgICAgIG9ubHlfbWF0Y2hpbmcgPSB0cnVlOwogICAg ICAgICBicmVhazsKIAogICAgICAgY2FzZSAncSc6Ci0gICAgICAgIGV4aXRfb25fbWF0Y2gg PSAxOworICAgICAgICBleGl0X29uX21hdGNoID0gdHJ1ZTsKICAgICAgICAgZXhpdF9mYWls dXJlID0gMDsKICAgICAgICAgYnJlYWs7CiAKQEAgLTIxMTYsNyArMjEyMiw3IEBAIG1haW4g KGludCBhcmdjLCBjaGFyICoqYXJndikKICAgICAgICAgYnJlYWs7CiAKICAgICAgIGNhc2Ug J3MnOgotICAgICAgICBzdXBwcmVzc19lcnJvcnMgPSAxOworICAgICAgICBzdXBwcmVzc19l cnJvcnMgPSB0cnVlOwogICAgICAgICBicmVhazsKIAogICAgICAgY2FzZSAndic6CkBAIC0y 
MTI0LDExICsyMTMwLDExIEBAIG1haW4gKGludCBhcmdjLCBjaGFyICoqYXJndikKICAgICAg ICAgYnJlYWs7CiAKICAgICAgIGNhc2UgJ3cnOgotICAgICAgICBtYXRjaF93b3JkcyA9IDE7 CisgICAgICAgIG1hdGNoX3dvcmRzID0gdHJ1ZTsKICAgICAgICAgYnJlYWs7CiAKICAgICAg IGNhc2UgJ3gnOgotICAgICAgICBtYXRjaF9saW5lcyA9IDE7CisgICAgICAgIG1hdGNoX2xp bmVzID0gdHJ1ZTsKICAgICAgICAgYnJlYWs7CiAKICAgICAgIGNhc2UgJ1onOgpAQCAtMjE5 OSw3ICsyMjA1LDcgQEAgbWFpbiAoaW50IGFyZ2MsIGNoYXIgKiphcmd2KQogICAgICAgICBi cmVhazsKIAogICAgICAgY2FzZSBMSU5FX0JVRkZFUkVEX09QVElPTjoKLSAgICAgICAgbGlu ZV9idWZmZXJlZCA9IDE7CisgICAgICAgIGxpbmVfYnVmZmVyZWQgPSB0cnVlOwogICAgICAg ICBicmVhazsKIAogICAgICAgY2FzZSBMQUJFTF9PUFRJT046CkBAIC0yMjI2LDggKzIyMzIs OCBAQCBtYWluIChpbnQgYXJnYywgY2hhciAqKmFyZ3YpCiAgICAgbGlzdF9maWxlcyA9IDA7 CiAgIGlmIChleGl0X29uX21hdGNoIHwgbGlzdF9maWxlcykKICAgICB7Ci0gICAgICBjb3Vu dF9tYXRjaGVzID0gMDsKLSAgICAgIGRvbmVfb25fbWF0Y2ggPSAxOworICAgICAgY291bnRf bWF0Y2hlcyA9IGZhbHNlOworICAgICAgZG9uZV9vbl9tYXRjaCA9IHRydWU7CiAgICAgfQog ICBvdXRfcXVpZXQgPSBjb3VudF9tYXRjaGVzIHwgZG9uZV9vbl9tYXRjaDsKIApAQCAtMjI2 Nyw3ICsyMjczLDcgQEAgbWFpbiAoaW50IGFyZ2MsIGNoYXIgKiphcmd2KQogICAgICAgICB7 CiAgICAgICAgICAgLyogTm8ga2V5cyB3ZXJlIHNwZWNpZmllZCAoZS5nLiAtZiAvZGV2L251 bGwpLiAgTWF0Y2ggbm90aGluZy4gICovCiAgICAgICAgICAgb3V0X2ludmVydCBePSB0cnVl OwotICAgICAgICAgIG1hdGNoX2xpbmVzID0gbWF0Y2hfd29yZHMgPSAwOworICAgICAgICAg IG1hdGNoX2xpbmVzID0gbWF0Y2hfd29yZHMgPSBmYWxzZTsKICAgICAgICAgfQogICAgICAg ZWxzZQogICAgICAgICAvKiBTdHJpcCB0cmFpbGluZyBuZXdsaW5lLiAqLwpAQCAtMjMyMywy MSArMjMyOSwyMSBAQCBtYWluIChpbnQgYXJnYywgY2hhciAqKmFyZ3YpCiAKICAgaWYgKG9w dGluZCA8IGFyZ2MpCiAgICAgewotICAgICAgc3RhdHVzID0gMTsKKyAgICAgIG9rID0gdHJ1 ZTsKICAgICAgIGRvCi0gICAgICAgIHN0YXR1cyAmPSBncmVwX2NvbW1hbmRfbGluZV9hcmcg KGFyZ3Zbb3B0aW5kXSk7CisgICAgICAgIG9rICY9IGdyZXBfY29tbWFuZF9saW5lX2FyZyAo YXJndltvcHRpbmRdKTsKICAgICAgIHdoaWxlICgrK29wdGluZCA8IGFyZ2MpOwogICAgIH0K ICAgZWxzZSBpZiAoZGlyZWN0b3JpZXMgPT0gUkVDVVJTRV9ESVJFQ1RPUklFUyAmJiBwcmVw ZW5kZWQgPCBsYXN0X3JlY3Vyc2l2ZSkKICAgICB7CiAgICAgICAvKiBHcmVwIHRocm91Z2gg 
Ii4iLCBvbWl0dGluZyBsZWFkaW5nICIuLyIgZnJvbSBkaWFnbm9zdGljcy4gICovCiAgICAg ICBmaWxlbmFtZV9wcmVmaXhfbGVuID0gMjsKLSAgICAgIHN0YXR1cyA9IGdyZXBfY29tbWFu ZF9saW5lX2FyZyAoIi4iKTsKKyAgICAgIG9rID0gZ3JlcF9jb21tYW5kX2xpbmVfYXJnICgi LiIpOwogICAgIH0KICAgZWxzZQotICAgIHN0YXR1cyA9IGdyZXBfY29tbWFuZF9saW5lX2Fy ZyAoIi0iKTsKKyAgICBvayA9IGdyZXBfY29tbWFuZF9saW5lX2FyZyAoIi0iKTsKIAogICAv KiBXZSByZWdpc3RlciB2aWEgYXRleGl0KCkgdG8gdGVzdCBzdGRvdXQuICAqLwotICBleGl0 IChlcnJzZWVuID8gRVhJVF9UUk9VQkxFIDogc3RhdHVzKTsKKyAgZXhpdCAoZXJyc2VlbiA/ IEVYSVRfVFJPVUJMRSA6IG9rKTsKIH0KIC8qIHZpbTpzZXQgc2hpZnR3aWR0aD0yOiAqLwpk aWZmIC0tZ2l0IGEvc3JjL2dyZXAuaCBiL3NyYy9ncmVwLmgKaW5kZXggNDkzNTg3Mi4uNTQ5 NmViMiAxMDA2NDQKLS0tIGEvc3JjL2dyZXAuaAorKysgYi9zcmMvZ3JlcC5oCkBAIC0yMCwx MSArMjAsMTMgQEAKICNpZm5kZWYgR1JFUF9HUkVQX0gKICNkZWZpbmUgR1JFUF9HUkVQX0gg MQogCisjaW5jbHVkZSA8c3RkYm9vbC5oPgorCiAvKiBUaGUgZm9sbG93aW5nIGZsYWdzIGFy ZSBleHBvcnRlZCBmcm9tIGdyZXAgZm9yIHRoZSBtYXRjaGVycwogICAgdG8gbG9vayBhdC4g Ki8KLWV4dGVybiBpbnQgbWF0Y2hfaWNhc2U7CQkvKiAtaSAqLwotZXh0ZXJuIGludCBtYXRj aF93b3JkczsJCS8qIC13ICovCi1leHRlcm4gaW50IG1hdGNoX2xpbmVzOwkJLyogLXggKi8K K2V4dGVybiBib29sIG1hdGNoX2ljYXNlOwkvKiAtaSAqLworZXh0ZXJuIGJvb2wgbWF0Y2hf d29yZHM7CS8qIC13ICovCitleHRlcm4gYm9vbCBtYXRjaF9saW5lczsJLyogLXggKi8KIGV4 dGVybiB1bnNpZ25lZCBjaGFyIGVvbGJ5dGU7CS8qIC16ICovCiAKICNlbmRpZgotLSAKMS45 LjMKCg== --------------080305090207030305040600 Content-Type: text/plain; charset=UTF-8; name="0004-grep-treat-a-file-as-binary-if-its-prefix-contains-e.patch" Content-Transfer-Encoding: base64 Content-Disposition: attachment; filename*0="0004-grep-treat-a-file-as-binary-if-its-prefix-contains-e.pa"; filename*1="tch" RnJvbSBmNzg0YTczYTAxYjgyMzEwOWQ2NjBhYThkMjU2NTM1NjIzZTk4OTcxIE1vbiBTZXAg MTcgMDA6MDA6MDAgMjAwMQpGcm9tOiBQYXVsIEVnZ2VydCA8ZWdnZXJ0QGNzLnVjbGEuZWR1 PgpEYXRlOiBTdW4sIDE0IFNlcCAyMDE0IDEzOjQ5OjE4IC0wNzAwClN1YmplY3Q6IFtQUk9Q T1NFRCBQQVRDSCA0LzZdIGdyZXA6IHRyZWF0IGEgZmlsZSBhcyBiaW5hcnkgaWYgaXRzIHBy ZWZpeAogY29udGFpbnMgZW5jb2RpbmcgZXJyb3JzCgoqIE5FV1M6CiogZG9jL2dyZXAudGV4 
aSAoRmlsZSBhbmQgRGlyZWN0b3J5IFNlbGVjdGlvbik6CkRvY3VtZW50IHRoaXMuCiogc3Jj L2dyZXAuYyAoYnVmZmVyX2VuY29kaW5nLCBidWZmZXJfdGV4dGJpbik6IE5ldyBmdW5jdGlv bnMuCihmaWxlX3RleHRiaW4pOiBSZW5hbWUgZnJvbSBmaWxlX2lzX2JpbmFyeS4gIE5vdyBy ZXR1cm5zIDMtd2F5IHZhbHVlLgpBbGwgY2FsbGVycyBjaGFuZ2VkLgooZmlsZV90ZXh0Ymlu LCBncmVwKTogQ2hlY2sgdGhlIGlucHV0IG1vcmUgY2FyZWZ1bGx5IGZvciB0ZXh0IHZzCmJp bmFyeSBkYXRhLgooY29udGFpbnNfZW5jb2RpbmdfZXJyb3IpOiBSZW1vdmU7IHVzZSByZXBs YWNlZCBieSBidWZmZXJfZW5jb2RpbmcuCiogdGVzdHMvYmFja3JlZi1tdWx0aWJ5dGUtc2xv dzoKKiB0ZXN0cy9oaWdoLWJpdC1yYW5nZToKKiB0ZXN0cy9pbnZhbGlkLW11bHRpYnl0ZS1p bmZsb29wOgpVc2UgLWEsIHNpbmNlIHRoZSBpbnB1dCBpcyBub3cgY29uc2lkZXJlZCB0byBi ZSBiaW5hcnkuCiogdGVzdHMvaW52YWxpZC1tdWx0aWJ5dGUtaW5mbG9vcDogQWRkIGEgY2hl Y2sgZm9yIG5ldyBiZWhhdmlvci4KLS0tCiBORVdTICAgICAgICAgICAgICAgICAgICAgICAg ICAgIHwgICA0ICsrCiBkb2MvZ3JlcC50ZXhpICAgICAgICAgICAgICAgICAgIHwgICAzICst CiBzcmMvZ3JlcC5jICAgICAgICAgICAgICAgICAgICAgIHwgMTI2ICsrKysrKysrKysrKysr KysrKysrKysrKysrKy0tLS0tLS0tLS0tLS0KIHRlc3RzL2JhY2tyZWYtbXVsdGlieXRlLXNs b3cgICAgfCAgIDIgKy0KIHRlc3RzL2hpZ2gtYml0LXJhbmdlICAgICAgICAgICAgfCAgIDIg Ky0KIHRlc3RzL2ludmFsaWQtbXVsdGlieXRlLWluZmxvb3AgfCAgMTQgKysrKy0KIDYgZmls ZXMgY2hhbmdlZCwgMTA2IGluc2VydGlvbnMoKyksIDQ1IGRlbGV0aW9ucygtKQoKZGlmZiAt LWdpdCBhL05FV1MgYi9ORVdTCmluZGV4IDM2YmI0OGYuLjkzNzdkN2QgMTAwNjQ0Ci0tLSBh L05FV1MKKysrIGIvTkVXUwpAQCAtNiw2ICs2LDEwIEBAIEdOVSBncmVwIE5FV1MgICAgICAg ICAgICAgICAgICAgICAgICAgICAgICAgICAgICAtKi0gb3V0bGluZSAtKi0KIAogICBQZXJm b3JtYW5jZSBoYXMgaW1wcm92ZWQgZm9yIHZlcnkgbG9uZyBzdHJpbmdzIGluIHBhdHRlcm5z LgogCisgIElmIGEgZmlsZSBjb250YWlucyBkYXRhIGltcHJvcGVybHkgZW5jb2RlZCBmb3Ig dGhlIGN1cnJlbnQgbG9jYWxlLAorICBhbmQgdGhpcyBpcyBkaXNjb3ZlcmVkIGJlZm9yZSBh bnkgb2YgdGhlIGZpbGUncyBjb250ZW50cyBhcmUgb3V0cHV0LAorICBncmVwIG5vdyB0cmVh dHMgdGhlIGZpbGUgYXMgYmluYXJ5LgorCiAgIGdyZXAgLVAgbm8gbG9uZ2VyIHJlcG9ydHMg YW4gZXJyb3IgYW5kIGV4aXRzIHdoZW4gZ2l2ZW4gaW52YWxpZCBVVEYtOCBkYXRhLgogICBJ bnN0ZWFkLCBpdCBjb25zaWRlcnMgdGhlIGRhdGEgdG8gYmUgbm9uLW1hdGNoaW5nLgogCmRp 
ZmYgLS1naXQgYS9kb2MvZ3JlcC50ZXhpIGIvZG9jL2dyZXAudGV4aQppbmRleCBjOGU0YWNk Li4xNGJkNjllIDEwMDY0NAotLS0gYS9kb2MvZ3JlcC50ZXhpCisrKyBiL2RvYy9ncmVwLnRl eGkKQEAgLTU5Miw3ICs1OTIsOCBAQCB0aGlzIGlzIGVxdWl2YWxlbnQgdG8gdGhlIEBzYW1w ey0tYmluYXJ5LWZpbGVzPXRleHR9IG9wdGlvbi4KIEBpdGVtIC0tYmluYXJ5LWZpbGVzPUB2 YXJ7dHlwZX0KIEBvcGluZGV4IC0tYmluYXJ5LWZpbGVzCiBAY2luZGV4IGJpbmFyeSBmaWxl cwotSWYgYSBmaWxlJ3MgYWxsb2NhdGlvbiBtZXRhZGF0YSBvciBpdHMgZmlyc3QgZmV3IGJ5 dGVzCitJZiBhIGZpbGUncyBhbGxvY2F0aW9uIG1ldGFkYXRhLAorb3IgaWYgaXRzIGRhdGEg cmVhZCBiZWZvcmUgYSBsaW5lIGlzIHNlbGVjdGVkIGZvciBvdXRwdXQsCiBpbmRpY2F0ZSB0 aGF0IHRoZSBmaWxlIGNvbnRhaW5zIGJpbmFyeSBkYXRhLAogYXNzdW1lIHRoYXQgdGhlIGZp bGUgaXMgb2YgdHlwZSBAdmFye3R5cGV9LgogQnkgZGVmYXVsdCwgQHZhcnt0eXBlfSBpcyBA c2FtcHtiaW5hcnl9LApkaWZmIC0tZ2l0IGEvc3JjL2dyZXAuYyBiL3NyYy9ncmVwLmMKaW5k ZXggMWUwY2M2ZC4uY2NiYTFiNiAxMDA2NDQKLS0tIGEvc3JjL2dyZXAuYworKysgYi9zcmMv Z3JlcC5jCkBAIC00MzcsNTAgKzQzNyw3NCBAQCBjbGVhbl91cF9zdGRvdXQgKHZvaWQpCiAg ICAgY2xvc2Vfc3Rkb3V0ICgpOwogfQogCi0vKiBSZXR1cm4gdHJ1ZSBpZiBhIGZpbGUgaXMg a25vd24gdG8gYmUgYmluYXJ5IGZvciB0aGUgcHVycG9zZSBvZiAnZ3JlcCcuCisvKiBSZXR1 cm4gMSBpZiBCVUYgKG9mIHNpemUgU0laRSkgY29udGFpbnMgdGV4dCwgLTEgaWYgaXQgY29u dGFpbnMKKyAgIGJpbmFyeSBkYXRhLCBhbmQgMCBpZiB0aGUgYW5zd2VyIGRlcGVuZHMgb24g d2hhdCBjb21lcyBpbW1lZGlhdGVseQorICAgYWZ0ZXIgQlVGLiAgKi8KK3N0YXRpYyBpbnQK K2J1ZmZlcl90ZXh0YmluIChjaGFyIGNvbnN0ICpidWYsIHNpemVfdCBzaXplKQoreworICBt YnN0YXRlX3QgbWJzID0geyAwIH07CisgIHNpemVfdCBjaGFybGVuOworICBjaGFyIGJhZGJ5 dGUgPSBlb2xieXRlID8gJ1wwJyA6ICdcMjAwJzsKKyAgY2hhciBjb25zdCAqcDsKKworICBm b3IgKHAgPSBidWY7IHAgPCBidWYgKyBzaXplOyBwICs9IGNoYXJsZW4pCisgICAgeworICAg ICAgaWYgKCpwID09IGJhZGJ5dGUpCisgICAgICAgIHJldHVybiAtMTsKKyAgICAgIGNoYXJs ZW4gPSBtYnJsZW4gKHAsIGJ1ZiArIHNpemUgLSBwLCAmbWJzKTsKKyAgICAgIGlmICgoc2l6 ZV90KSAtMiA8PSBjaGFybGVuKQorICAgICAgICByZXR1cm4gY2hhcmxlbiA9PSAoc2l6ZV90 KSAtMiA/IDAgOiAtMTsKKyAgICAgIGNoYXJsZW4gKz0gIWNoYXJsZW47CisgICAgfQorCisg IHJldHVybiAxOworfQorCisvKiBSZXR1cm4gMSBpZiBhIGZpbGUgaXMga25vd24gdG8gYmUg 
dGV4dCBmb3IgdGhlIHB1cnBvc2Ugb2YgJ2dyZXAnLgorICAgUmV0dXJuIC0xIGlmIGl0IGlz IGtub3duIHRvIGJlIGJpbmFyeSwgMCBpZiB1bmtub3duLgogICAgQlVGLCBvZiBzaXplIEJV RlNJWkUsIGlzIHRoZSBpbml0aWFsIGJ1ZmZlciByZWFkIGZyb20gdGhlIGZpbGUgd2l0aAog ICAgZGVzY3JpcHRvciBGRCBhbmQgc3RhdHVzIFNULiAgKi8KLXN0YXRpYyBib29sCi1maWxl X2lzX2JpbmFyeSAoY2hhciBjb25zdCAqYnVmLCBzaXplX3QgYnVmc2l6ZSwgaW50IGZkLCBz dHJ1Y3Qgc3RhdCBjb25zdCAqc3QpCitzdGF0aWMgaW50CitmaWxlX3RleHRiaW4gKGNoYXIg Y29uc3QgKmJ1Ziwgc2l6ZV90IGJ1ZnNpemUsIGludCBmZCwgc3RydWN0IHN0YXQgY29uc3Qg KnN0KQogewogICAjaWZuZGVmIFNFRUtfSE9MRQogICBlbnVtIHsgU0VFS19IT0xFID0gU0VF S19FTkQgfTsKICAgI2VuZGlmCiAKLSAgLyogSWYgLXosIHRlc3Qgb25seSB3aGV0aGVyIHRo ZSBpbml0aWFsIGJ1ZmZlciBjb250YWlucyAnXDIwMCc7Ci0gICAgIGtub3dpbmcgYWJvdXQg aG9sZXMgd29uJ3QgaGVscC4gICovCi0gIGlmICghIGVvbGJ5dGUpCi0gICAgcmV0dXJuIG1l bWNociAoYnVmLCAnXDIwMCcsIGJ1ZnNpemUpICE9IDA7CisgIGludCB0ZXh0YmluID0gYnVm ZmVyX3RleHRiaW4gKGJ1ZiwgYnVmc2l6ZSk7CisgIGlmICh0ZXh0YmluIDwgMCkKKyAgICBy ZXR1cm4gdGV4dGJpbjsKIAotICAvKiBJZiB0aGUgaW5pdGlhbCBidWZmZXIgY29udGFpbnMg YSBudWxsIGJ5dGUsIGd1ZXNzIHRoYXQgdGhlIGZpbGUKLSAgICAgaXMgYmluYXJ5LiAgKi8K LSAgaWYgKG1lbWNociAoYnVmLCAnXDAnLCBidWZzaXplKSkKLSAgICByZXR1cm4gdHJ1ZTsK LQotICAvKiBJZiB0aGUgZmlsZSBoYXMgaG9sZXMsIGl0IG11c3QgY29udGFpbiBhIG51bGwg Ynl0ZSBzb21ld2hlcmUuICAqLwotICBpZiAoU0VFS19IT0xFICE9IFNFRUtfRU5EICYmIHVz YWJsZV9zdF9zaXplIChzdCkpCisgIGlmICh1c2FibGVfc3Rfc2l6ZSAoc3QpKQogICAgIHsK LSAgICAgIG9mZl90IGN1ciA9IGJ1ZnNpemU7Ci0gICAgICBpZiAoT19CSU5BUlkgfHwgZmQg PT0gU1RESU5fRklMRU5PKQotICAgICAgICB7Ci0gICAgICAgICAgY3VyID0gbHNlZWsgKGZk LCAwLCBTRUVLX0NVUik7Ci0gICAgICAgICAgaWYgKGN1ciA8IDApCi0gICAgICAgICAgICBy ZXR1cm4gZmFsc2U7Ci0gICAgICAgIH0KKyAgICAgIGlmIChzdC0+c3Rfc2l6ZSA8PSBidWZz aXplKQorICAgICAgICByZXR1cm4gMiAqIHRleHRiaW4gLSAxOwogCi0gICAgICAvKiBMb29r IGZvciBhIGhvbGUgYWZ0ZXIgdGhlIGN1cnJlbnQgbG9jYXRpb24uICAqLwotICAgICAgb2Zm X3QgaG9sZV9zdGFydCA9IGxzZWVrIChmZCwgY3VyLCBTRUVLX0hPTEUpOwotICAgICAgaWYg KDAgPD0gaG9sZV9zdGFydCkKKyAgICAgIC8qIElmIHRoZSBmaWxlIGhhcyBob2xlcywgaXQg 
bXVzdCBjb250YWluIGEgbnVsbCBieXRlIHNvbWV3aGVyZS4gICovCisgICAgICBpZiAoU0VF S19IT0xFICE9IFNFRUtfRU5EICYmIGVvbGJ5dGUpCiAgICAgICAgIHsKLSAgICAgICAgICBp ZiAobHNlZWsgKGZkLCBjdXIsIFNFRUtfU0VUKSA8IDApCi0gICAgICAgICAgICBzdXBwcmVz c2libGVfZXJyb3IgKGZpbGVuYW1lLCBlcnJubyk7Ci0gICAgICAgICAgaWYgKGhvbGVfc3Rh cnQgPCBzdC0+c3Rfc2l6ZSkKLSAgICAgICAgICAgIHJldHVybiB0cnVlOworICAgICAgICAg IG9mZl90IGN1ciA9IGJ1ZnNpemU7CisgICAgICAgICAgaWYgKE9fQklOQVJZIHx8IGZkID09 IFNURElOX0ZJTEVOTykKKyAgICAgICAgICAgIHsKKyAgICAgICAgICAgICAgY3VyID0gbHNl ZWsgKGZkLCAwLCBTRUVLX0NVUik7CisgICAgICAgICAgICAgIGlmIChjdXIgPCAwKQorICAg ICAgICAgICAgICAgIHJldHVybiAwOworICAgICAgICAgICAgfQorCisgICAgICAgICAgLyog TG9vayBmb3IgYSBob2xlIGFmdGVyIHRoZSBjdXJyZW50IGxvY2F0aW9uLiAgKi8KKyAgICAg ICAgICBvZmZfdCBob2xlX3N0YXJ0ID0gbHNlZWsgKGZkLCBjdXIsIFNFRUtfSE9MRSk7Cisg ICAgICAgICAgaWYgKDAgPD0gaG9sZV9zdGFydCkKKyAgICAgICAgICAgIHsKKyAgICAgICAg ICAgICAgaWYgKGxzZWVrIChmZCwgY3VyLCBTRUVLX1NFVCkgPCAwKQorICAgICAgICAgICAg ICAgIHN1cHByZXNzaWJsZV9lcnJvciAoZmlsZW5hbWUsIGVycm5vKTsKKyAgICAgICAgICAg ICAgaWYgKGhvbGVfc3RhcnQgPCBzdC0+c3Rfc2l6ZSkKKyAgICAgICAgICAgICAgICByZXR1 cm4gLTE7CisgICAgICAgICAgICB9CiAgICAgICAgIH0KICAgICB9CiAKLSAgLyogR3Vlc3Mg dGhhdCB0aGUgZmlsZSBkb2VzIG5vdCBjb250YWluIGJpbmFyeSBkYXRhLiAgKi8KLSAgcmV0 dXJuIGZhbHNlOworICByZXR1cm4gMDsKIH0KIAogLyogQ29udmVydCBTVFIgdG8gYSBub25u ZWdhdGl2ZSBpbnRlZ2VyLCBzdG9yaW5nIHRoZSByZXN1bHQgaW4gKk9VVC4KQEAgLTExMDAs NyArMTEyNCw3IEBAIHN0YXRpYyBpbnRtYXhfdAogZ3JlcCAoaW50IGZkLCBzdHJ1Y3Qgc3Rh dCBjb25zdCAqc3QpCiB7CiAgIGludG1heF90IG5saW5lcywgaTsKLSAgYm9vbCBub3RfdGV4 dDsKKyAgaW50IHRleHRiaW47CiAgIHNpemVfdCByZXNpZHVlLCBzYXZlOwogICBjaGFyIG9s ZGM7CiAgIGNoYXIgKmJlZzsKQEAgLTExMjksMTMgKzExNTMsMTggQEAgZ3JlcCAoaW50IGZk LCBzdHJ1Y3Qgc3RhdCBjb25zdCAqc3QpCiAgICAgICByZXR1cm4gMDsKICAgICB9CiAKLSAg bm90X3RleHQgPSAoKChiaW5hcnlfZmlsZXMgPT0gQklOQVJZX0JJTkFSWV9GSUxFUyAmJiAh b3V0X3F1aWV0KQotICAgICAgICAgICAgICAgfHwgYmluYXJ5X2ZpbGVzID09IFdJVEhPVVRf TUFUQ0hfQklOQVJZX0ZJTEVTKQotICAgICAgICAgICAgICAmJiBmaWxlX2lzX2JpbmFyeSAo 
YnVmYmVnLCBidWZsaW0gLSBidWZiZWcsIGZkLCBzdCkpOwotICBpZiAobm90X3RleHQgJiYg YmluYXJ5X2ZpbGVzID09IFdJVEhPVVRfTUFUQ0hfQklOQVJZX0ZJTEVTKQotICAgIHJldHVy biAwOwotICBkb25lX29uX21hdGNoIHw9IG5vdF90ZXh0OwotICBvdXRfcXVpZXQgfD0gbm90 X3RleHQ7CisgIGlmIChiaW5hcnlfZmlsZXMgPT0gVEVYVF9CSU5BUllfRklMRVMpCisgICAg dGV4dGJpbiA9IDE7CisgIGVsc2UKKyAgICB7CisgICAgICB0ZXh0YmluID0gZmlsZV90ZXh0 YmluIChidWZiZWcsIGJ1ZmxpbSAtIGJ1ZmJlZywgZmQsIHN0KTsKKyAgICAgIGlmICh0ZXh0 YmluIDwgMCkKKyAgICAgICAgeworICAgICAgICAgIGlmIChiaW5hcnlfZmlsZXMgPT0gV0lU SE9VVF9NQVRDSF9CSU5BUllfRklMRVMpCisgICAgICAgICAgICByZXR1cm4gMDsKKyAgICAg ICAgICBkb25lX29uX21hdGNoID0gb3V0X3F1aWV0ID0gdHJ1ZTsKKyAgICAgICAgfQorICAg IH0KIAogICBmb3IgKDs7KQogICAgIHsKQEAgLTExODcsOCArMTIxNiwxMyBAQCBncmVwIChp bnQgZmQsIHN0cnVjdCBzdGF0IGNvbnN0ICpzdCkKICAgICAgICAgfQogCiAgICAgICAvKiBE ZXRlY3Qgd2hldGhlciBsZWFkaW5nIGNvbnRleHQgaXMgYWRqYWNlbnQgdG8gcHJldmlvdXMg b3V0cHV0LiAgKi8KLSAgICAgIGlmIChiZWcgIT0gbGFzdG91dCkKLSAgICAgICAgbGFzdG91 dCA9IDA7CisgICAgICBpZiAobGFzdG91dCkKKyAgICAgICAgeworICAgICAgICAgIGlmICgh dGV4dGJpbikKKyAgICAgICAgICAgIHRleHRiaW4gPSAxOworICAgICAgICAgIGlmIChiZWcg IT0gbGFzdG91dCkKKyAgICAgICAgICAgIGxhc3RvdXQgPSAwOworICAgICAgICB9CiAKICAg ICAgIC8qIEhhbmRsZSBzb21lIGRldGFpbHMgYW5kIHJlYWQgbW9yZSBkYXRhIHRvIHNjYW4u ICAqLwogICAgICAgc2F2ZSA9IHJlc2lkdWUgKyBsaW0gLSBiZWc7CkBAIC0xMjAxLDYgKzEy MzUsMTYgQEAgZ3JlcCAoaW50IGZkLCBzdHJ1Y3Qgc3RhdCBjb25zdCAqc3QpCiAgICAgICAg ICAgc3VwcHJlc3NpYmxlX2Vycm9yIChmaWxlbmFtZSwgZXJybm8pOwogICAgICAgICAgIGdv dG8gZmluaXNoX2dyZXA7CiAgICAgICAgIH0KKworICAgICAgLyogSWYgdGhlIGZpbGUncyB0 ZXh0YmluIGhhcyBub3QgYmVlbiBkZXRlcm1pbmVkIHlldCwgYXNzdW1lCisgICAgICAgICBp dCdzIGJpbmFyeSBpZiB0aGUgbmV4dCBpbnB1dCBidWZmZXIgc3VnZ2VzdHMgc28uICAqLwor ICAgICAgaWYgKCEgdGV4dGJpbiAmJiBidWZmZXJfdGV4dGJpbiAoYnVmYmVnLCBidWZsaW0g LSBidWZiZWcpIDwgMCkKKyAgICAgICAgeworICAgICAgICAgIHRleHRiaW4gPSAtMTsKKyAg ICAgICAgICBpZiAoYmluYXJ5X2ZpbGVzID09IFdJVEhPVVRfTUFUQ0hfQklOQVJZX0ZJTEVT KQorICAgICAgICAgICAgcmV0dXJuIDA7CisgICAgICAgICAgZG9uZV9vbl9tYXRjaCA9IG91 
dF9xdWlldCA9IHRydWU7CisgICAgICAgIH0KICAgICB9CiAgIGlmIChyZXNpZHVlKQogICAg IHsKQEAgLTEyMTQsNyArMTI1OCw3IEBAIGdyZXAgKGludCBmZCwgc3RydWN0IHN0YXQgY29u c3QgKnN0KQogIGZpbmlzaF9ncmVwOgogICBkb25lX29uX21hdGNoID0gZG9uZV9vbl9tYXRj aF8wOwogICBvdXRfcXVpZXQgPSBvdXRfcXVpZXRfMDsKLSAgaWYgKChub3RfdGV4dCAmIH5v dXRfcXVpZXQpICYmIG5saW5lcyAhPSAwKQorICBpZiAodGV4dGJpbiA8IDAgJiYgIW91dF9x dWlldCAmJiBubGluZXMgIT0gMCkKICAgICBwcmludGYgKF8oIkJpbmFyeSBmaWxlICVzIG1h dGNoZXNcbiIpLCBmaWxlbmFtZSk7CiAgIHJldHVybiBubGluZXM7CiB9CmRpZmYgLS1naXQg YS90ZXN0cy9iYWNrcmVmLW11bHRpYnl0ZS1zbG93IGIvdGVzdHMvYmFja3JlZi1tdWx0aWJ5 dGUtc2xvdwppbmRleCBmZmViYjZiLi5kNDQ3YTRhIDEwMDc1NQotLS0gYS90ZXN0cy9iYWNr cmVmLW11bHRpYnl0ZS1zbG93CisrKyBiL3Rlc3RzL2JhY2tyZWYtbXVsdGlieXRlLXNsb3cK QEAgLTIxLDcgKzIxLDcgQEAgbWF4X3NlY29uZHM9JChMQ19BTEw9QyBwZXJsIC1sZSAndXNl IFRpbWU6OkhpUmVzIHF3KHRpbWUpOyBteSAkcyA9IHRpbWUoKTsKIAogZm9yIExPQyBpbiBl bl9VUy5VVEYtODsgZG8KICAgb3V0PW91dC0kTE9DCi0gIExDX0FMTD0kTE9DIHRpbWVvdXQg JHttYXhfc2Vjb25kc31zIGdyZXAgLUUgJ14oW2Etel0pLlwxJCcgaW4gPiAkb3V0IDI+JjEK KyAgTENfQUxMPSRMT0MgdGltZW91dCAke21heF9zZWNvbmRzfXMgZ3JlcCAtYUUgJ14oW2Et el0pLlwxJCcgaW4gPiAkb3V0IDI+JjEKICAgdGVzdCAkPyA9IDAgfHwgZmFpbD0xCiAgIGNv bXBhcmUgJG91dCBpbiB8fCBmYWlsPTEKIGRvbmUKZGlmZiAtLWdpdCBhL3Rlc3RzL2hpZ2gt Yml0LXJhbmdlIGIvdGVzdHMvaGlnaC1iaXQtcmFuZ2UKaW5kZXggNzRiNmU2NS4uNzZjMzMx MCAxMDA3NTUKLS0tIGEvdGVzdHMvaGlnaC1iaXQtcmFuZ2UKKysrIGIvdGVzdHMvaGlnaC1i aXQtcmFuZ2UKQEAgLTIxLDcgKzIxLDcgQEAKIGZhaWw9MAogCiBwcmludGYgJ1wyMDFcbicg PiBpbiB8fCBmcmFtZXdvcmtfZmFpbHVyZV8KLWdyZXAgIiQocHJpbnRmICdbXDIwMV0nKSIg aW4gPiBvdXQgfHwgZmFpbD0xCitncmVwIC1hICIkKHByaW50ZiAnW1wyMDFdJykiIGluID4g b3V0IHx8IGZhaWw9MQogCiBjb21wYXJlIG91dCBpbiB8fCBmYWlsPTEKIApkaWZmIC0tZ2l0 IGEvdGVzdHMvaW52YWxpZC1tdWx0aWJ5dGUtaW5mbG9vcCBiL3Rlc3RzL2ludmFsaWQtbXVs dGlieXRlLWluZmxvb3AKaW5kZXggYjI4YmM1My4uZDdjNjE2NSAxMDA3NTUKLS0tIGEvdGVz dHMvaW52YWxpZC1tdWx0aWJ5dGUtaW5mbG9vcAorKysgYi90ZXN0cy9pbnZhbGlkLW11bHRp Ynl0ZS1pbmZsb29wCkBAIC0xNCw3ICsxNCw3IEBAIGVuY29kZSBBQSA+IGlucHV0CiBmYWls 
PTAKIAogIyBCZWZvcmUgMi4xNSwgdGhpcyB3b3VsZCBpbmZsb29wLgotTENfQUxMPWVuX1VT LlVURi04IHRpbWVvdXQgMyBncmVwIC1GICQoZW5jb2RlIEEpIGlucHV0ID4gb3V0CitMQ19B TEw9ZW5fVVMuVVRGLTggdGltZW91dCAzIGdyZXAgLWFGICQoZW5jb2RlIEEpIGlucHV0ID4g b3V0CiBzdGF0dXM9JD8KIGlmIHRlc3QgJHN0YXR1cyAtZXEgMDsgdGhlbgogICBjb21wYXJl IGlucHV0IG91dApAQCAtMjQsNCArMjQsMTYgQEAgZWxzZQogICB0ZXN0ICRzdGF0dXMgLWVx IDIKIGZpIHx8IGZhaWw9MQogCitlY2hvICdCaW5hcnkgZmlsZSBpbnB1dCBtYXRjaGVzJyA+ YmluYXJ5LWZpbGUtbWF0Y2hlcworCitMQ19BTEw9ZW5fVVMuVVRGLTggdGltZW91dCAzIGdy ZXAgLUYgJChlbmNvZGUgQSkgaW5wdXQgPiBvdXQKK3N0YXR1cz0kPworaWYgdGVzdCAkc3Rh dHVzIC1lcSAwOyB0aGVuCisgIGNvbXBhcmUgYmluYXJ5LWZpbGUtbWF0Y2hlcyBvdXQKK2Vs aWYgdGVzdCAkc3RhdHVzIC1lcSAxOyB0aGVuCisgIGNvbXBhcmVfZGV2X251bGxfIC9kZXYv bnVsbCBvdXQKK2Vsc2UKKyAgdGVzdCAkc3RhdHVzIC1lcSAyCitmaSB8fCBmYWlsPTEKKwog RXhpdCAkZmFpbAotLSAKMS45LjMKCg== --------------080305090207030305040600 Content-Type: text/plain; charset=UTF-8; name="0005-grep-improve-performance-for-older-glibc.patch" Content-Transfer-Encoding: base64 Content-Disposition: attachment; filename="0005-grep-improve-performance-for-older-glibc.patch" RnJvbSBjYzg3ZDU4NTAyNWEyZmYzMTBiNmM1NTA5NzQ4MGQ2ZTk1MzU1N2JkIE1vbiBTZXAg MTcgMDA6MDA6MDAgMjAwMQpGcm9tOiBQYXVsIEVnZ2VydCA8ZWdnZXJ0QGNzLnVjbGEuZWR1 PgpEYXRlOiBTdW4sIDE0IFNlcCAyMDE0IDE0OjUyOjU2IC0wNzAwClN1YmplY3Q6IFtQUk9Q T1NFRCBQQVRDSCA1LzZdIGdyZXA6IGltcHJvdmUgcGVyZm9ybWFuY2UgZm9yIG9sZGVyIGds aWJjCgpnbGliYyBoYXMgYSBidWcgd2hlcmUgbWJybGVuIGFuZCBtYnJ0b3djIG1pc2hhbmRs ZSBsZW5ndGgtMCBpbnB1dHMuCldvcmtpbmcgYXJvdW5kIGl0IGluIGdudWxpYiBzbG93cyBn cmVwIGRvd24sIHNvIGRpc2FibGUgdGhlIHRlc3RzIGZvciBpdAphbmQgbWFrZSBzdXJlIGdy ZXAgd29ya3MgZXZlbiBpZiB0aGUgYnVnIGlzIHByZXNlbnQuCiogYm9vdHN0cmFwLmNvbmYg KGF2b2lkZWRfZ251bGliX21vZHVsZXMpOiBBZGQgbWJydG93Yy10ZXN0cy4KKiBjb25maWd1 cmUuYWMgKGdsX2N2X2Z1bmNfbWJydG93Y19lbXB0eV9pbnB1dCk6IEFzc3VtZSB5ZXMuCiog c3JjL3NlYXJjaHV0aWxzLmMgKG1iX25leHRfd2MpOiBEb24ndCBpbnZva2UgbWJydG93YyBv biBlbXB0eSBpbnB1dC4KLS0tCiBib290c3RyYXAuY29uZiAgICB8IDEgKwogY29uZmlndXJl 
LmFjICAgICAgfCA1ICsrKysrCiBzcmMvc2VhcmNodXRpbHMuYyB8IDMgKystCiAzIGZpbGVz IGNoYW5nZWQsIDggaW5zZXJ0aW9ucygrKSwgMSBkZWxldGlvbigtKQoKZGlmZiAtLWdpdCBh L2Jvb3RzdHJhcC5jb25mIGIvYm9vdHN0cmFwLmNvbmYKaW5kZXggZDgxNzFmNS4uNTBjMGFh YiAxMDA2NDQKLS0tIGEvYm9vdHN0cmFwLmNvbmYKKysrIGIvYm9vdHN0cmFwLmNvbmYKQEAg LTE3LDYgKzE3LDcgQEAKIAogYXZvaWRlZF9nbnVsaWJfbW9kdWxlcz0nCiAgIC0tYXZvaWQ9 bG9jay10ZXN0cworICAtLWF2b2lkPW1icnRvd2MtdGVzdHMKICcKIAogIyBnbnVsaWIgbW9k dWxlcyB1c2VkIGJ5IHRoaXMgcGFja2FnZS4KZGlmZiAtLWdpdCBhL2NvbmZpZ3VyZS5hYyBi L2NvbmZpZ3VyZS5hYwppbmRleCAzMzE1ODU1Li40ZDA2OWI4IDEwMDY0NAotLS0gYS9jb25m aWd1cmUuYWMKKysrIGIvY29uZmlndXJlLmFjCkBAIC04Myw2ICs4MywxMSBAQCBBQ19QUk9H X0NDCiBnbF9FQVJMWQogQUNfUFJPR19SQU5MSUIKIAorIyBncmVwIG5ldmVyIGludm9rZXMg bWJydG93YyBvciBtYnJsZW4gb24gZW1wdHkgaW5wdXQsCisjIHNvIGRvbid0IHdvcnJ5IGFi b3V0IHRoaXMgY29tbW9uIGJ1ZywKKyMgYXMgd29ya2luZyBhcm91bmQgaXQgd291bGQgbWVy ZWx5IHNsb3cgZ3JlcCBkb3duLgorZ2xfY3ZfZnVuY19tYnJ0b3djX2VtcHR5X2lucHV0PSdh c3N1bWUgeWVzJworCiBkbmwgQ2hlY2tzIGZvciB0eXBlZGVmcywgc3RydWN0dXJlcywgYW5k IGNvbXBpbGVyIGNoYXJhY3RlcmlzdGljcy4KIEFDX1RZUEVfU0laRV9UCiBBQ19DX0NPTlNU CmRpZmYgLS1naXQgYS9zcmMvc2VhcmNodXRpbHMuYyBiL3NyYy9zZWFyY2h1dGlscy5jCmlu ZGV4IDVlYjlhMTIuLjE4ZGQ1ODQgMTAwNjQ0Ci0tLSBhL3NyYy9zZWFyY2h1dGlscy5jCisr KyBiL3NyYy9zZWFyY2h1dGlscy5jCkBAIC0yODUsNSArMjg1LDYgQEAgbWJfbmV4dF93YyAo Y2hhciBjb25zdCAqY3VyLCBjaGFyIGNvbnN0ICplbmQpCiB7CiAgIHdjaGFyX3Qgd2M7CiAg IG1ic3RhdGVfdCBtYnMgPSB7IDAgfTsKLSAgcmV0dXJuIG1icnRvd2MgKCZ3YywgY3VyLCBl bmQgLSBjdXIsICZtYnMpIDwgKHNpemVfdCkgLTIgPyB3YyA6IFdFT0Y7CisgIHJldHVybiAo ZW5kIC0gY3VyICE9IDAgJiYgbWJydG93YyAoJndjLCBjdXIsIGVuZCAtIGN1ciwgJm1icykg PCAoc2l6ZV90KSAtMgorICAgICAgICAgID8gd2MgOiBXRU9GKTsKIH0KLS0gCjEuOS4zCgo= --------------080305090207030305040600 Content-Type: text/plain; charset=UTF-8; name="0006-grep-use-mbclen-cache-more-effectively.patch" Content-Transfer-Encoding: base64 Content-Disposition: attachment; filename="0006-grep-use-mbclen-cache-more-effectively.patch" 
RnJvbSAwNGM2NTUyYTI3OWM3YThkNTY1YWFlOWE3ZmZlYTBiNzUxNjg5MDUyIE1vbiBTZXAg MTcgMDA6MDA6MDAgMjAwMQpGcm9tOiBQYXVsIEVnZ2VydCA8ZWdnZXJ0QGNzLnVjbGEuZWR1 PgpEYXRlOiBTdW4sIDE0IFNlcCAyMDE0IDE3OjM3OjU4IC0wNzAwClN1YmplY3Q6IFtQUk9Q T1NFRCBQQVRDSCA2LzZdIGdyZXA6IHVzZSBtYmNsZW4gY2FjaGUgbW9yZSBlZmZlY3RpdmVs eQoKKiBzcmMvZ3JlcC5jIChidWZmZXJfdGV4dGJpbiwgY29udGFpbnNfZW5jb2RpbmdfZXJy b3IpOgpVc2UgbWJfY2xlbiBmb3Igc3BlZWQuCihidWZmZXJfdGV4dGJpbik6IEJ5cGFzcyBt Yl9jbGVuIGluIHVuaWJ5dGUgbG9jYWxlcy4KKG1haW4pOiBBbHdheXMgaW5pdGlhbGl6ZSB0 aGUgY2FjaGUsIHNpbmNlIGl0J3Mgc29tZXRpbWVzIHVzZWQgaW4KdW5pYnl0ZSBsb2NhbGVz IG5vdy4gIEluaXRpYWxpemUgaXQgYmVmb3JlIGNvbnRhaW5zX2VuY29kaW5nX2Vycm9yCm1p Z2h0IGJlIGNhbGxlZC4KKiBzcmMvc2VhcmNoLmggKFNFQVJDSF9JTkxJTkUpOiBOZXcgbWFj cm8uCihtYmNsZW5fY2FjaGUpOiBOb3cgZXh0ZXJuIGRlY2wuCihtYl9jbGVuKTogTmV3IGlu bGluZSBmdW5jdGlvbi4KKiBzcmMvc2VhcmNodXRpbHMuYyAoU0VBUkNIX0lOTElORSwgU1lT VEVNX0lOTElORSk6IERlZmluZS4KKG1iY2xlbl9jYWNoZSk6IE5vdyBleHRlcm4uCihidWls ZF9tYmNsZW5fY2FjaGUpOiBQdXQgMSBpbnRvIHRoZSBjYWNoZSB3aGVuIG1icmxlbiByZXR1 cm5zIDAuCihtYl9nb2JhY2spOiBVc2UgbWJfbGVuIGZvciBzcGVlZCwgYW5kIHJlbHkgb24g aXQgcmV0dXJuaW5nIG5vbnplcm8uCiogc3JjL3N5c3RlbS5oIChTWVNURU1fSU5MSU5FKTog TmV3IG1hY3JvLgoodG9fdWNoYXIpOiBVc2UgaXQuCi0tLQogc3JjL2dyZXAuYyAgICAgICAg fCAzOCArKysrKysrKysrKysrKysrKysrKystLS0tLS0tLS0tLS0tLS0tLQogc3JjL3NlYXJj aC5oICAgICAgfCAxOSArKysrKysrKysrKysrKysrKysrCiBzcmMvc2VhcmNodXRpbHMuYyB8 IDI2ICsrKysrKysrKysrKysrLS0tLS0tLS0tLS0tCiBzcmMvc3lzdGVtLmggICAgICB8ICA5 ICsrKysrKysrLQogNCBmaWxlcyBjaGFuZ2VkLCA2MiBpbnNlcnRpb25zKCspLCAzMCBkZWxl dGlvbnMoLSkKCmRpZmYgLS1naXQgYS9zcmMvZ3JlcC5jIGIvc3JjL2dyZXAuYwppbmRleCBj Y2JhMWI2Li43MmE4MTFlIDEwMDY0NAotLS0gYS9zcmMvZ3JlcC5jCisrKyBiL3NyYy9ncmVw LmMKQEAgLTQ0MywyMiArNDQzLDI3IEBAIGNsZWFuX3VwX3N0ZG91dCAodm9pZCkKIHN0YXRp YyBpbnQKIGJ1ZmZlcl90ZXh0YmluIChjaGFyIGNvbnN0ICpidWYsIHNpemVfdCBzaXplKQog ewotICBtYnN0YXRlX3QgbWJzID0geyAwIH07Ci0gIHNpemVfdCBjaGFybGVuOwogICBjaGFy IGJhZGJ5dGUgPSBlb2xieXRlID8gJ1wwJyA6ICdcMjAwJzsKLSAgY2hhciBjb25zdCAqcDsK 
IAotICBmb3IgKHAgPSBidWY7IHAgPCBidWYgKyBzaXplOyBwICs9IGNoYXJsZW4pCisgIGlm IChNQl9DVVJfTUFYIDw9IDEpCisgICAgcmV0dXJuIG1lbWNociAoYnVmLCBiYWRieXRlLCBz aXplKSA/IC0xIDogMTsKKyAgZWxzZQogICAgIHsKLSAgICAgIGlmICgqcCA9PSBiYWRieXRl KQotICAgICAgICByZXR1cm4gLTE7Ci0gICAgICBjaGFybGVuID0gbWJybGVuIChwLCBidWYg KyBzaXplIC0gcCwgJm1icyk7Ci0gICAgICBpZiAoKHNpemVfdCkgLTIgPD0gY2hhcmxlbikK LSAgICAgICAgcmV0dXJuIGNoYXJsZW4gPT0gKHNpemVfdCkgLTIgPyAwIDogLTE7Ci0gICAg ICBjaGFybGVuICs9ICFjaGFybGVuOwotICAgIH0KKyAgICAgIG1ic3RhdGVfdCBtYnMgPSB7 IDAgfTsKKyAgICAgIHNpemVfdCBjbGVuOworICAgICAgY2hhciBjb25zdCAqcDsKIAotICBy ZXR1cm4gMTsKKyAgICAgIGZvciAocCA9IGJ1ZjsgcCA8IGJ1ZiArIHNpemU7IHAgKz0gY2xl bikKKyAgICAgICAgeworICAgICAgICAgIGlmICgqcCA9PSBiYWRieXRlKQorICAgICAgICAg ICAgcmV0dXJuIC0xOworICAgICAgICAgIGNsZW4gPSBtYl9jbGVuIChwLCBidWYgKyBzaXpl IC0gcCwgJm1icyk7CisgICAgICAgICAgaWYgKChzaXplX3QpIC0yIDw9IGNsZW4pCisgICAg ICAgICAgICByZXR1cm4gY2xlbiA9PSAoc2l6ZV90KSAtMiA/IDAgOiAtMTsKKyAgICAgICAg fQorCisgICAgICByZXR1cm4gMTsKKyAgICB9CiB9CiAKIC8qIFJldHVybiAxIGlmIGEgZmls ZSBpcyBrbm93biB0byBiZSB0ZXh0IGZvciB0aGUgcHVycG9zZSBvZiAnZ3JlcCcuCkBAIC0x ODg3LDkgKzE4OTIsOSBAQCBjb250YWluc19lbmNvZGluZ19lcnJvciAoY2hhciBjb25zdCAq cGF0LCBzaXplX3QgcGF0bGVuKQogICBtYnN0YXRlX3QgbWJzID0geyAwIH07CiAgIHNpemVf dCBpLCBjaGFybGVuOwogCi0gIGZvciAoaSA9IDA7IGkgPCBwYXRsZW47IGkgKz0gY2hhcmxl biArIChjaGFybGVuID09IDApKQorICBmb3IgKGkgPSAwOyBpIDwgcGF0bGVuOyBpICs9IGNo YXJsZW4pCiAgICAgewotICAgICAgY2hhcmxlbiA9IG1icmxlbiAocGF0ICsgaSwgcGF0bGVu IC0gaSwgJm1icyk7CisgICAgICBjaGFybGVuID0gbWJfY2xlbiAocGF0ICsgaSwgcGF0bGVu IC0gaSwgJm1icyk7CiAgICAgICBpZiAoKHNpemVfdCkgLTIgPD0gY2hhcmxlbikKICAgICAg ICAgcmV0dXJuIHRydWU7CiAgICAgfQpAQCAtMjMzMiw2ICsyMzM3LDggQEAgbWFpbiAoaW50 IGFyZ2MsIGNoYXIgKiphcmd2KQogICBlbHNlCiAgICAgdXNhZ2UgKEVYSVRfVFJPVUJMRSk7 CiAKKyAgYnVpbGRfbWJjbGVuX2NhY2hlICgpOworCiAgIC8qIElmIGZncmVwIGluIGEgbXVs dGlieXRlIGxvY2FsZSwgdGhlbiB1c2UgZ3JlcCBpZiBlaXRoZXIKICAgICAgKDEpIGNhc2Ug aXMgaWdub3JlZCAod2hlcmUgZ3JlcCBpcyB0eXBpY2FsbHkgZmFzdGVyKSwgb3IKICAgICAg 
KDIpIHRoZSBwYXR0ZXJuIGhhcyBhbiBlbmNvZGluZyBlcnJvciAod2hlcmUgZmdyZXAgbWln aHQgbm90IHdvcmspLiAgKi8KQEAgLTIzNDksOSArMjM1Niw2IEBAIG1haW4gKGludCBhcmdj LCBjaGFyICoqYXJndikKICAgICAgIGV4ZWN1dGUgPSBFR2V4ZWN1dGU7CiAgICAgfQogCi0g IGlmIChNQl9DVVJfTUFYID4gMSkKLSAgICBidWlsZF9tYmNsZW5fY2FjaGUgKCk7Ci0KICAg Y29tcGlsZSAoa2V5cywga2V5Y2MpOwogICBmcmVlIChrZXlzKTsKIApkaWZmIC0tZ2l0IGEv c3JjL3NlYXJjaC5oIGIvc3JjL3NlYXJjaC5oCmluZGV4IDE0ODc3YmMuLjNmMTBhNDcgMTAw NjQ0Ci0tLSBhL3NyYy9zZWFyY2guaAorKysgYi9zcmMvc2VhcmNoLmgKQEAgLTM0LDYgKzM0 LDExIEBACiAjaW5jbHVkZSAia3dzZXQuaCIKICNpbmNsdWRlICJ4YWxsb2MuaCIKIAorX0dM X0lOTElORV9IRUFERVJfQkVHSU4KKyNpZm5kZWYgU0VBUkNIX0lOTElORQorIyBkZWZpbmUg U0VBUkNIX0lOTElORSBfR0xfSU5MSU5FCisjZW5kaWYKKwogLyogVGhpcyBtdXN0IGJlIGEg c2lnbmVkIHR5cGUuICBFYWNoIHZhbHVlIGlzIHRoZSBkaWZmZXJlbmNlIGluIHRoZSBzaXpl CiAgICBvZiBhIGNoYXJhY3RlciAoaW4gYnl0ZXMpIGluZHVjZWQgYnkgY29udmVydGluZyB0 byBsb3dlciBjYXNlLgogICAgVGhlIHZhc3QgbWFqb3JpdHkgb2YgdmFsdWVzIGFyZSAwLCBi dXQgYSBmZXcgYXJlIDEgb3IgLTEsIHNvCkBAIC00NSw2ICs1MCw3IEBAIGV4dGVybiB2b2lk IGt3c2luaXQgKGt3c2V0X3QgKik7CiAKIGV4dGVybiBjaGFyICptYnRvdXBwZXIgKGNoYXIg Y29uc3QgKiwgc2l6ZV90ICosIG1iX2xlbl9tYXBfdCAqKik7CiBleHRlcm4gdm9pZCBidWls ZF9tYmNsZW5fY2FjaGUgKHZvaWQpOworZXh0ZXJuIHNpemVfdCBtYmNsZW5fY2FjaGVbXTsK IGV4dGVybiBwdHJkaWZmX3QgbWJfZ29iYWNrIChjaGFyIGNvbnN0ICoqLCBjaGFyIGNvbnN0 ICosIGNoYXIgY29uc3QgKik7CiBleHRlcm4gd2ludF90IG1iX3ByZXZfd2MgKGNoYXIgY29u c3QgKiwgY2hhciBjb25zdCAqLCBjaGFyIGNvbnN0ICopOwogZXh0ZXJuIHdpbnRfdCBtYl9u ZXh0X3djIChjaGFyIGNvbnN0ICosIGNoYXIgY29uc3QgKik7CkBAIC02MSw0ICs2NywxNyBA QCBleHRlcm4gc2l6ZV90IEZleGVjdXRlIChjaGFyIGNvbnN0ICosIHNpemVfdCwgc2l6ZV90 ICosIGNoYXIgY29uc3QgKik7CiBleHRlcm4gdm9pZCBQY29tcGlsZSAoY2hhciBjb25zdCAq LCBzaXplX3QpOwogZXh0ZXJuIHNpemVfdCBQZXhlY3V0ZSAoY2hhciBjb25zdCAqLCBzaXpl X3QsIHNpemVfdCAqLCBjaGFyIGNvbnN0ICopOwogCisvKiBSZXR1cm4gdGhlIG51bWJlciBv ZiBieXRlcyBpbiB0aGUgY2hhcmFjdGVyIGF0IHRoZSBzdGFydCBvZiBTLCB3aGljaAorICAg aXMgb2Ygc2l6ZSBOLiAgTiBtdXN0IGJlIHBvc2l0aXZlLiAgTUJTIGlzIHRoZSBjb252ZXJz 
aW9uIHN0YXRlLgorICAgVGhpcyBhY3RzIGxpa2UgbWJybGVuLCBleGNlcHQgaXQgcmV0dXJu cyAxIHdoZW4gbWJybGVuIHdvdWxkIHJldHVybiAwLAorICAgYW5kIGl0IGlzIHR5cGljYWxs eSBmYXN0ZXIgYmVjYXVzZSBvZiB0aGUgY2FjaGUuICAqLworU0VBUkNIX0lOTElORSBzaXpl X3QKK21iX2NsZW4gKGNoYXIgY29uc3QgKnMsIHNpemVfdCBuLCBtYnN0YXRlX3QgKm1icykK K3sKKyAgc2l6ZV90IGxlbiA9IG1iY2xlbl9jYWNoZVt0b191Y2hhciAoKnMpXTsKKyAgcmV0 dXJuIGxlbiA9PSAoc2l6ZV90KSAtMiA/IG1icmxlbiAocywgbiwgbWJzKSA6IGxlbjsKK30K KworX0dMX0lOTElORV9IRUFERVJfRU5ECisKICNlbmRpZiAvKiBHUkVQX1NFQVJDSF9IICov CmRpZmYgLS1naXQgYS9zcmMvc2VhcmNodXRpbHMuYyBiL3NyYy9zZWFyY2h1dGlscy5jCmlu ZGV4IDE4ZGQ1ODQuLjllZGM3ODUgMTAwNjQ0Ci0tLSBhL3NyYy9zZWFyY2h1dGlscy5jCisr KyBiL3NyYy9zZWFyY2h1dGlscy5jCkBAIC0xNywxMiArMTcsMTYgQEAKICAgIDAyMTEwLTEz MDEsIFVTQS4gICovCiAKICNpbmNsdWRlIDxjb25maWcuaD4KLSNpbmNsdWRlIDxhc3NlcnQu aD4KKworI2RlZmluZSBTRUFSQ0hfSU5MSU5FIF9HTF9FWFRFUk5fSU5MSU5FCisjZGVmaW5l IFNZU1RFTV9JTkxJTkUgX0dMX0VYVEVSTl9JTkxJTkUKICNpbmNsdWRlICJzZWFyY2guaCIK IAorI2luY2x1ZGUgPGFzc2VydC5oPgorCiAjZGVmaW5lIE5DSEFSIChVQ0hBUl9NQVggKyAx KQogCi1zdGF0aWMgc2l6ZV90IG1iY2xlbl9jYWNoZVtOQ0hBUl07CitzaXplX3QgbWJjbGVu X2NhY2hlW05DSEFSXTsKIAogdm9pZAoga3dzaW5pdCAoa3dzZXRfdCAqa3dzZXQpCkBAIC0y MTgsNyArMjIyLDggQEAgYnVpbGRfbWJjbGVuX2NhY2hlICh2b2lkKQogICAgICAgY2hhciBj ID0gaTsKICAgICAgIHVuc2lnbmVkIGNoYXIgdWMgPSBpOwogICAgICAgbWJzdGF0ZV90IG1i cyA9IHsgMCB9OwotICAgICAgbWJjbGVuX2NhY2hlW3VjXSA9IG1icmxlbiAoJmMsIDEsICZt YnMpOworICAgICAgc2l6ZV90IGxlbiA9IG1icmxlbiAoJmMsIDEsICZtYnMpOworICAgICAg bWJjbGVuX2NhY2hlW3VjXSA9IGxlbiA/IGxlbiA6IDE7CiAgICAgfQogfQogCkBAIC0yNDQs MjAgKzI0OSwxNyBAQCBtYl9nb2JhY2sgKGNoYXIgY29uc3QgKiptYl9zdGFydCwgY2hhciBj b25zdCAqY3VyLCBjaGFyIGNvbnN0ICplbmQpCiAKICAgd2hpbGUgKHAgPCBjdXIpCiAgICAg ewotICAgICAgc2l6ZV90IG1iY2xlbiA9IG1iY2xlbl9jYWNoZVt0b191Y2hhciAoKnApXTsK LQotICAgICAgaWYgKG1iY2xlbiA9PSAoc2l6ZV90KSAtMikKLSAgICAgICAgbWJjbGVuID0g bWJybGVuIChwLCBlbmQgLSBwLCAmY3VyX3N0YXRlKTsKKyAgICAgIHNpemVfdCBjbGVuID0g bWJfY2xlbiAocCwgZW5kIC0gcCwgJmN1cl9zdGF0ZSk7CiAKLSAgICAgIGlmICghICgwIDwg 
bWJjbGVuICYmIG1iY2xlbiA8IChzaXplX3QpIC0yKSkKKyAgICAgIGlmICgoc2l6ZV90KSAt MiA8PSBjbGVuKQogICAgICAgICB7Ci0gICAgICAgICAgLyogQW4gaW52YWxpZCBzZXF1ZW5j ZSwgb3IgYSB0cnVuY2F0ZWQgbXVsdGlieXRlIGNoYXJhY3Rlciwgb3IKLSAgICAgICAgICAg ICBhIG51bGwgd2lkZSBjaGFyYWN0ZXIuICBUcmVhdCBpdCBhcyBhIHNpbmdsZSBieXRlIGNo YXJhY3Rlci4gICovCi0gICAgICAgICAgbWJjbGVuID0gMTsKKyAgICAgICAgICAvKiBBbiBp bnZhbGlkIHNlcXVlbmNlLCBvciBhIHRydW5jYXRlZCBtdWx0aWJ5dGUgY2hhcmFjdGVyLgor ICAgICAgICAgICAgIFRyZWF0IGl0IGFzIGEgc2luZ2xlIGJ5dGUgY2hhcmFjdGVyLiAgKi8K KyAgICAgICAgICBjbGVuID0gMTsKICAgICAgICAgICBtZW1zZXQgKCZjdXJfc3RhdGUsIDAs IHNpemVvZiBjdXJfc3RhdGUpOwogICAgICAgICB9CiAgICAgICBwMCA9IHA7Ci0gICAgICBw ICs9IG1iY2xlbjsKKyAgICAgIHAgKz0gY2xlbjsKICAgICB9CiAKICAgKm1iX3N0YXJ0ID0g cDsKZGlmZiAtLWdpdCBhL3NyYy9zeXN0ZW0uaCBiL3NyYy9zeXN0ZW0uaAppbmRleCA3ZGEx ZDhkLi5iYWMyNjIzIDEwMDY0NAotLS0gYS9zcmMvc3lzdGVtLmgKKysrIGIvc3JjL3N5c3Rl bS5oCkBAIC00OSwxNSArNDksMjIgQEAgZW51bSB7IEVYSVRfVFJPVUJMRSA9IDIgfTsKIAog I2luY2x1ZGUgInVubG9ja2VkLWlvLmgiCiAKK19HTF9JTkxJTkVfSEVBREVSX0JFR0lOCisj aWZuZGVmIFNZU1RFTV9JTkxJTkUKKyMgZGVmaW5lIFNZU1RFTV9JTkxJTkUgX0dMX0lOTElO RQorI2VuZGlmCisKICNkZWZpbmUgU1RSRVEoYSwgYikgKHN0cmNtcCAoYSwgYikgPT0gMCkK IAogLyogQ29udmVydCBhIHBvc3NpYmx5LXNpZ25lZCBjaGFyYWN0ZXIgdG8gYW4gdW5zaWdu ZWQgY2hhcmFjdGVyLiAgVGhpcyBpcwogICAgYSBiaXQgc2FmZXIgdGhhbiBjYXN0aW5nIHRv IHVuc2lnbmVkIGNoYXIsIHNpbmNlIGl0IGNhdGNoZXMgc29tZSB0eXBlCiAgICBlcnJvcnMg dGhhdCB0aGUgY2FzdCBkb2Vzbid0LiAgKi8KLXN0YXRpYyBpbmxpbmUgdW5zaWduZWQgY2hh cgorU1lTVEVNX0lOTElORSB1bnNpZ25lZCBjaGFyCiB0b191Y2hhciAoY2hhciBjaCkKIHsK ICAgcmV0dXJuIGNoOwogfQogCitfR0xfSU5MSU5FX0hFQURFUl9FTkQKKwogI2VuZGlmCi0t IAoxLjkuMwoK --------------080305090207030305040600-- From debbugs-submit-bounces@debbugs.gnu.org Tue Sep 16 21:28:32 2014 Received: (at 18266) by debbugs.gnu.org; 17 Sep 2014 01:28:32 +0000 Received: from localhost ([127.0.0.1]:42803 helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.80) (envelope-from ) id 1XU42p-0001Pc-Md for submit@debbugs.gnu.org; Tue, 16 Sep 2014 21:28:32 
-0400 Received: from smtp.cs.ucla.edu ([131.179.128.62]:57500) by debbugs.gnu.org with esmtp (Exim 4.80) (envelope-from ) id 1XU42m-0001PT-9m for 18266@debbugs.gnu.org; Tue, 16 Sep 2014 21:28:29 -0400 Received: from localhost (localhost.localdomain [127.0.0.1]) by smtp.cs.ucla.edu (Postfix) with ESMTP id EE945A60005; Tue, 16 Sep 2014 18:28:26 -0700 (PDT) X-Virus-Scanned: amavisd-new at smtp.cs.ucla.edu Received: from smtp.cs.ucla.edu ([127.0.0.1]) by localhost (smtp.cs.ucla.edu [127.0.0.1]) (amavisd-new, port 10024) with ESMTP id plibnO2AtMvs; Tue, 16 Sep 2014 18:28:22 -0700 (PDT) Received: from [192.168.1.9] (pool-71-177-17-123.lsanca.dsl-w.verizon.net [71.177.17.123]) by smtp.cs.ucla.edu (Postfix) with ESMTPSA id 38D6839E8014; Tue, 16 Sep 2014 18:28:22 -0700 (PDT) Message-ID: <5418E3B2.9090404@cs.ucla.edu> Date: Tue, 16 Sep 2014 18:28:18 -0700 From: Paul Eggert Organization: UCLA Computer Science Department User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:31.0) Gecko/20100101 Thunderbird/31.1.1 MIME-Version: 1.0 To: Vincent Lefevre Subject: Re: bug#18266: handling bytes not part of the charset, and other garbage References: <20140912003659.GA18162@xvii.vinc17.org> <5412496D.8050805@cs.ucla.edu> <20140912014124.GA4404@xvii.vinc17.org> <541267D4.7030605@cs.ucla.edu> <20140912082916.GD4404@xvii.vinc17.org> <54131C6D.4070503@cs.ucla.edu> <20140912212939.GJ4404@xvii.vinc17.org> <54136817.5040309@cs.ucla.edu> <20140912224033.GM4404@xvii.vinc17.org> <54139683.5010302@cs.ucla.edu> <20140913011740.GN4404@xvii.vinc17.org> <5413A7BC.2080801@cs.ucla.edu> <541679EC.5000701@cs.ucla.edu> In-Reply-To: <541679EC.5000701@cs.ucla.edu> Content-Type: multipart/mixed; boundary="------------030607050500070706010400" X-Spam-Score: -3.0 (---) X-Debbugs-Envelope-To: 18266 Cc: 18266@debbugs.gnu.org, 758105@bugs.debian.org X-BeenThere: debbugs-submit@debbugs.gnu.org X-Mailman-Version: 2.1.15 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , 
Errors-To: debbugs-submit-bounces@debbugs.gnu.org Sender: "Debbugs-submit" X-Spam-Score: -3.0 (---) This is a multi-part message in MIME format. --------------030607050500070706010400 Content-Type: text/plain; charset=utf-8; format=flowed Content-Transfer-Encoding: 7bit Paul Eggert wrote: > Attached are some proposed patches which should improve the performance > of grep -P when applied to binary files, among other things. I have > some other ideas for boosting performance further but thought I'd > publish these first. I pushed those patches, along with the attached further patches to fix up some porting glitches and bugs I encountered in subsequent testing. I plan to follow up soon on Bug#18454 with more performance-related patches in this area. --------------030607050500070706010400 Content-Type: text/plain; charset=UTF-8; name="0007-grep-avoid-false-alarms-for-mb_clen-and-to_uchar.patch" Content-Transfer-Encoding: base64 Content-Disposition: attachment; filename*0="0007-grep-avoid-false-alarms-for-mb_clen-and-to_uchar.patch" RnJvbSA1M2M1ZDlmZDUwYjY4OTViODg2YzFkMTlkMDg1MTU2MmZjMDNlMDBjIE1vbiBTZXAg MTcgMDA6MDA6MDAgMjAwMQpGcm9tOiBQYXVsIEVnZ2VydCA8ZWdnZXJ0QGNzLnVjbGEuZWR1 PgpEYXRlOiBUdWUsIDE2IFNlcCAyMDE0IDE3OjI5OjQwIC0wNzAwClN1YmplY3Q6IFtQQVRD SCAwNy8xMF0gZ3JlcDogYXZvaWQgZmFsc2UgYWxhcm1zIGZvciBtYl9jbGVuIGFuZCB0b191 Y2hhcgoKKiBjZmcubWsgKF9nbF9UU191bm1hcmtlZF9leHRlcm5fZnVuY3Rpb25zKTogTmV3 IHZhciwKdG8gYnlwYXNzIHRoZSB0aWdodF9zY29wZSBmYWxzZSBhbGFybXMgb24gbWJfY2xl biBhbmQgdG9fdWNoYXIuCi0tLQogY2ZnLm1rIHwgNCArKysrCiAxIGZpbGUgY2hhbmdlZCwg NCBpbnNlcnRpb25zKCspCgpkaWZmIC0tZ2l0IGEvY2ZnLm1rIGIvY2ZnLm1rCmluZGV4IDk0 N2QxODQuLjMzMTZiNWQgMTAwNjQ0Ci0tLSBhL2NmZy5taworKysgYi9jZmcubWsKQEAgLTI4 LDYgKzI4LDEwIEBAIGxvY2FsLWNoZWNrcy10by1za2lwID0JCQlcCiAjIFRvb2xzIHVzZWQg dG8gYm9vdHN0cmFwIHRoaXMgcGFja2FnZSwgdXNlZCBmb3IgImFubm91bmNlbWVudCIuCiBi b290c3RyYXAtdG9vbHMgPSBhdXRvY29uZixhdXRvbWFrZSxnbnVsaWIKIAorIyBUaGUgdGln aHRfc2NvcGUgdGVzdCBnZXRzIGNvbmZ1c2VkIGFib3V0IGlubGluZSBmdW5jdGlvbnMuCisj 
IGxpa2UgJ3RvX3VjaGFyJy4KK19nbF9UU191bm1hcmtlZF9leHRlcm5fZnVuY3Rpb25zID0g bWFpbiB1c2FnZSBtYl9jbGVuIHRvX3VjaGFyCisKICMgTm93IHRoYXQgd2UgaGF2ZSBiZXR0 ZXIgdGVzdHMsIG1ha2UgdGhpcyB0aGUgZGVmYXVsdC4KIGV4cG9ydCBWRVJCT1NFID0geWVz CiAKLS0gCjEuOS4zCgo= --------------030607050500070706010400 Content-Type: text/plain; charset=UTF-8; name="0008-grep-use-mbclen-cache-in-one-more-place.patch" Content-Transfer-Encoding: base64 Content-Disposition: attachment; filename="0008-grep-use-mbclen-cache-in-one-more-place.patch" RnJvbSA0OTNkZGVjMmU2MWQ0ODk1MzYwMDU3NTg5NmE1ZDNjZTFkMWE1ODJiIE1vbiBTZXAg MTcgMDA6MDA6MDAgMjAwMQpGcm9tOiBQYXVsIEVnZ2VydCA8ZWdnZXJ0QGNzLnVjbGEuZWR1 PgpEYXRlOiBNb24sIDE1IFNlcCAyMDE0IDIyOjI1OjIxIC0wNzAwClN1YmplY3Q6IFtQQVRD SCAwOC8xMF0gZ3JlcDogdXNlIG1iY2xlbiBjYWNoZSBpbiBvbmUgbW9yZSBwbGFjZQoKKiBz cmMvZ3JlcC5jIChmZ3JlcF90b19ncmVwX3BhdHRlcm4pOiBVc2UgbWJfY2xlbiBoZXJlLCB0 b28uCi0tLQogc3JjL2dyZXAuYyB8IDMgKy0tCiAxIGZpbGUgY2hhbmdlZCwgMSBpbnNlcnRp b24oKyksIDIgZGVsZXRpb25zKC0pCgpkaWZmIC0tZ2l0IGEvc3JjL2dyZXAuYyBiL3NyYy9n cmVwLmMKaW5kZXggNzJhODExZS4uZTQzNzliYyAxMDA2NDQKLS0tIGEvc3JjL2dyZXAuYwor KysgYi9zcmMvZ3JlcC5jCkBAIC0xOTEyLDggKzE5MTIsNyBAQCBmZ3JlcF90b19ncmVwX3Bh dHRlcm4gKHNpemVfdCBsZW4sIGNoYXIgY29uc3QgKmtleXMsCiAKICAgZm9yICg7IGxlbjsg a2V5cyArPSBuLCBsZW4gLT0gbikKICAgICB7Ci0gICAgICB3Y2hhcl90IHdjOwotICAgICAg biA9IG1icnRvd2MgKCZ3Yywga2V5cywgbGVuLCAmbWJfc3RhdGUpOworICAgICAgbiA9IG1i X2NsZW4gKGtleXMsIGxlbiwgJm1iX3N0YXRlKTsKICAgICAgIHN3aXRjaCAobikKICAgICAg ICAgewogICAgICAgICBjYXNlIChzaXplX3QpIC0yOgotLSAKMS45LjMKCg== --------------030607050500070706010400 Content-Type: text/plain; charset=UTF-8; name="0009-grep-port-P-speedup-to-hosts-lacking-PCRE_STUDY_JIT_.patch" Content-Transfer-Encoding: base64 Content-Disposition: attachment; filename*0="0009-grep-port-P-speedup-to-hosts-lacking-PCRE_STUDY_JIT_.pa"; filename*1="tch" RnJvbSAyMTlmMTA1OTZjMTdlMzhiMjcxNjY3M2ExNDBjMmIzODI3NTQ5ODYyIE1vbiBTZXAg MTcgMDA6MDA6MDAgMjAwMQpGcm9tOiBQYXVsIEVnZ2VydCA8ZWdnZXJ0QGNzLnVjbGEuZWR1 
PgpEYXRlOiBNb24sIDE1IFNlcCAyMDE0IDE3OjI3OjU4IC0wNzAwClN1YmplY3Q6IFtQQVRD SCAwOS8xMF0gZ3JlcDogcG9ydCAtUCBzcGVlZHVwIHRvIGhvc3RzIGxhY2tpbmcKIFBDUkVf U1RVRFlfSklUX0NPTVBJTEUKCiogc3JjL3BjcmVzZWFyY2guYyAoUGNvbXBpbGUpOiBEbyBu b3QgYXNzdW1lIHRoYXQKUENSRV9TVFVEWV9KSVRfQ09NUElMRSBpcyBkZWZpbmVkLgooZW1w dHlfbWF0Y2gpOiBEZWZpbmUgb24gYWxsIHBsYXRmb3Jtcy4KLS0tCiBzcmMvcGNyZXNlYXJj aC5jIHwgMTQgKysrKysrKy0tLS0tLS0KIDEgZmlsZSBjaGFuZ2VkLCA3IGluc2VydGlvbnMo KyksIDcgZGVsZXRpb25zKC0pCgpkaWZmIC0tZ2l0IGEvc3JjL3BjcmVzZWFyY2guYyBiL3Ny Yy9wY3Jlc2VhcmNoLmMKaW5kZXggOTU4NzdlMy4uY2U2NTc1OCAxMDA2NDQKLS0tIGEvc3Jj L3BjcmVzZWFyY2guYworKysgYi9zcmMvcGNyZXNlYXJjaC5jCkBAIC0zMywxMCArMzMsNiBA QCBzdGF0aWMgcGNyZSAqY3JlOwogLyogQWRkaXRpb25hbCBpbmZvcm1hdGlvbiBhYm91dCB0 aGUgcGF0dGVybi4gICovCiBzdGF0aWMgcGNyZV9leHRyYSAqZXh0cmE7CiAKLS8qIFRhYmxl LCBpbmRleGVkIGJ5ICEgKGZsYWcgJiBQQ1JFX05PVEJPTCksIG9mIHdoZXRoZXIgdGhlIGVt cHR5Ci0gICBzdHJpbmcgbWF0Y2hlcyB3aGVuIHRoYXQgZmxhZyBpcyB1c2VkLiAgKi8KLXN0 YXRpYyBpbnQgZW1wdHlfbWF0Y2hbMl07Ci0KICMgaWZkZWYgUENSRV9TVFVEWV9KSVRfQ09N UElMRQogc3RhdGljIHBjcmVfaml0X3N0YWNrICpqaXRfc3RhY2s7CiAjIGVsc2UKQEAgLTQ0 LDYgKzQwLDEwIEBAIHN0YXRpYyBwY3JlX2ppdF9zdGFjayAqaml0X3N0YWNrOwogIyBlbmRp ZgogI2VuZGlmCiAKKy8qIFRhYmxlLCBpbmRleGVkIGJ5ICEgKGZsYWcgJiBQQ1JFX05PVEJP TCksIG9mIHdoZXRoZXIgdGhlIGVtcHR5CisgICBzdHJpbmcgbWF0Y2hlcyB3aGVuIHRoYXQg ZmxhZyBpcyB1c2VkLiAgKi8KK3N0YXRpYyBpbnQgZW1wdHlfbWF0Y2hbMl07CisKIHZvaWQK IFBjb21waWxlIChjaGFyIGNvbnN0ICpwYXR0ZXJuLCBzaXplX3Qgc2l6ZSkKIHsKQEAgLTEy OSwxMSArMTI5LDExIEBAIFBjb21waWxlIChjaGFyIGNvbnN0ICpwYXR0ZXJuLCBzaXplX3Qg c2l6ZSkKICAgICAgIHBjcmVfYXNzaWduX2ppdF9zdGFjayAoZXh0cmEsIE5VTEwsIGppdF9z dGFjayk7CiAgICAgfQogCi0gIGVtcHR5X21hdGNoW2ZhbHNlXSA9IHBjcmVfZXhlYyAoY3Jl LCBleHRyYSwgIiIsIDAsIDAsIFBDUkVfTk9UQk9MLCBOVUxMLCAwKTsKLSAgZW1wdHlfbWF0 Y2hbdHJ1ZV0gPSBwY3JlX2V4ZWMgKGNyZSwgZXh0cmEsICIiLCAwLCAwLCAwLCBOVUxMLCAw KTsKLQogIyBlbmRpZgogICBmcmVlIChyZSk7CisKKyAgZW1wdHlfbWF0Y2hbZmFsc2VdID0g cGNyZV9leGVjIChjcmUsIGV4dHJhLCAiIiwgMCwgMCwgUENSRV9OT1RCT0wsIE5VTEwsIDAp 
OworICBlbXB0eV9tYXRjaFt0cnVlXSA9IHBjcmVfZXhlYyAoY3JlLCBleHRyYSwgIiIsIDAs IDAsIDAsIE5VTEwsIDApOwogI2VuZGlmIC8qIEhBVkVfTElCUENSRSAqLwogfQogCi0tIAox LjkuMwoK --------------030607050500070706010400 Content-Type: text/plain; charset=UTF-8; name="0010-grep-fix-P-speedup-bug-with-empty-match.patch" Content-Transfer-Encoding: base64 Content-Disposition: attachment; filename="0010-grep-fix-P-speedup-bug-with-empty-match.patch" RnJvbSA1MzBmZDc2NTkyMmIxNjY0M2M3ODY1MmVmMDM2MDI0ZmM0ZGQ3MmViIE1vbiBTZXAg MTcgMDA6MDA6MDAgMjAwMQpGcm9tOiBQYXVsIEVnZ2VydCA8ZWdnZXJ0QGNzLnVjbGEuZWR1 PgpEYXRlOiBNb24sIDE1IFNlcCAyMDE0IDE4OjMzOjE5IC0wNzAwClN1YmplY3Q6IFtQQVRD SCAxMC8xMF0gZ3JlcDogZml4IC1QIHNwZWVkdXAgYnVnIHdpdGggZW1wdHkgbWF0Y2gKCiog c3JjL3BjcmVzZWFyY2guYyAoTlNVQik6IE5ldyB0b3AtbGV2ZWwgY29uc3RhbnQsIHJlcGxh Y2luZwonbnN1Yicgd2l0aGluIFBleGVjdXRlLgooUGNvbXBpbGUsIFBleGVjdXRlKTogVXNl IGl0LgooUGV4ZWN1dGUpOiBEb24ndCBhc3N1bWUgc3ViWzFdIGlzIHplcm8gYWZ0ZXIgYSBQ Q1JFX0VSUk9SX0JBRFVURjgKbWF0Y2ggZmFpbHVyZS4KKiB0ZXN0cy9wY3JlLWludmFsaWQt dXRmOC1pbnB1dDogVGVzdCBmb3IgdGhpcyBidWcuCi0tLQogc3JjL3BjcmVzZWFyY2guYyAg ICAgICAgICAgICAgfCAzMiArKysrKysrKysrKysrKysrKysrLS0tLS0tLS0tLS0tLQogdGVz dHMvcGNyZS1pbnZhbGlkLXV0ZjgtaW5wdXQgfCAgNSArKysrKwogMiBmaWxlcyBjaGFuZ2Vk LCAyNCBpbnNlcnRpb25zKCspLCAxMyBkZWxldGlvbnMoLSkKCmRpZmYgLS1naXQgYS9zcmMv cGNyZXNlYXJjaC5jIGIvc3JjL3BjcmVzZWFyY2guYwppbmRleCBjZTY1NzU4Li5jNDFmN2Vm IDEwMDY0NAotLS0gYS9zcmMvcGNyZXNlYXJjaC5jCisrKyBiL3NyYy9wY3Jlc2VhcmNoLmMK QEAgLTQ0LDYgKzQ0LDEwIEBAIHN0YXRpYyBwY3JlX2ppdF9zdGFjayAqaml0X3N0YWNrOwog ICAgc3RyaW5nIG1hdGNoZXMgd2hlbiB0aGF0IGZsYWcgaXMgdXNlZC4gICovCiBzdGF0aWMg aW50IGVtcHR5X21hdGNoWzJdOwogCisvKiBUaGlzIG11c3QgYmUgYXQgbGVhc3QgMjsgZXZl cnl0aGluZyBhZnRlciB0aGF0IGlzIGZvciBwZXJmb3JtYW5jZQorICAgaW4gcGNyZV9leGVj LiAgKi8KK2VudW0geyBOU1VCID0gMzAwIH07CisKIHZvaWQKIFBjb21waWxlIChjaGFyIGNv bnN0ICpwYXR0ZXJuLCBzaXplX3Qgc2l6ZSkKIHsKQEAgLTEzMiw4ICsxMzYsMTAgQEAgUGNv bXBpbGUgKGNoYXIgY29uc3QgKnBhdHRlcm4sIHNpemVfdCBzaXplKQogIyBlbmRpZgogICBm 
cmVlIChyZSk7CiAKLSAgZW1wdHlfbWF0Y2hbZmFsc2VdID0gcGNyZV9leGVjIChjcmUsIGV4 dHJhLCAiIiwgMCwgMCwgUENSRV9OT1RCT0wsIE5VTEwsIDApOwotICBlbXB0eV9tYXRjaFt0 cnVlXSA9IHBjcmVfZXhlYyAoY3JlLCBleHRyYSwgIiIsIDAsIDAsIDAsIE5VTEwsIDApOwor ICBpbnQgc3ViW05TVUJdOworICBlbXB0eV9tYXRjaFtmYWxzZV0gPSBwY3JlX2V4ZWMgKGNy ZSwgZXh0cmEsICIiLCAwLCAwLAorICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAg IFBDUkVfTk9UQk9MLCBzdWIsIE5TVUIpOworICBlbXB0eV9tYXRjaFt0cnVlXSA9IHBjcmVf ZXhlYyAoY3JlLCBleHRyYSwgIiIsIDAsIDAsIDAsIHN1YiwgTlNVQik7CiAjZW5kaWYgLyog SEFWRV9MSUJQQ1JFICovCiB9CiAKQEAgLTE0NiwxMSArMTUyLDcgQEAgUGV4ZWN1dGUgKGNo YXIgY29uc3QgKmJ1Ziwgc2l6ZV90IHNpemUsIHNpemVfdCAqbWF0Y2hfc2l6ZSwKICAgZXJy b3IgKEVYSVRfVFJPVUJMRSwgMCwgXygiaW50ZXJuYWwgZXJyb3IiKSk7CiAgIHJldHVybiAt MTsKICNlbHNlCi0gIC8qIFRoaXMgYXJyYXkgbXVzdCBoYXZlIGF0IGxlYXN0IHR3byBlbGVt ZW50czsgZXZlcnl0aGluZyBhZnRlciB0aGF0Ci0gICAgIGlzIGp1c3QgZm9yIHBlcmZvcm1h bmNlIGltcHJvdmVtZW50IGluIHBjcmVfZXhlYy4gICovCi0gIGVudW0geyBuc3ViID0gMzAw IH07Ci0gIGludCBzdWJbbnN1Yl07Ci0KKyAgaW50IHN1YltOU1VCXTsKICAgY2hhciBjb25z dCAqcCA9IHN0YXJ0X3B0ciA/IHN0YXJ0X3B0ciA6IGJ1ZjsKICAgYm9vbCBib2wgPSBwWy0x XSA9PSBlb2xieXRlOwogICBjaGFyIGNvbnN0ICpsaW5lX3N0YXJ0ID0gYnVmOwpAQCAtMTc0 LDE1ICsxNzYsMTkgQEAgUGV4ZWN1dGUgKGNoYXIgY29uc3QgKmJ1Ziwgc2l6ZV90IHNpemUs IHNpemVfdCAqbWF0Y2hfc2l6ZSwKICAgICAgICAgewogICAgICAgICAgIGludCBvcHRpb25z ID0gYm9sID8gMCA6IFBDUkVfTk9UQk9MOwogICAgICAgICAgIGludCB2YWxpZF9ieXRlczsK LSAgICAgICAgICBlID0gcGNyZV9leGVjIChjcmUsIGV4dHJhLCBwLCBsaW5lX2VuZCAtIHAs IDAsIG9wdGlvbnMsIHN1YiwgbnN1Yik7CisgICAgICAgICAgZSA9IHBjcmVfZXhlYyAoY3Jl LCBleHRyYSwgcCwgbGluZV9lbmQgLSBwLCAwLCBvcHRpb25zLCBzdWIsIE5TVUIpOwogICAg ICAgICAgIGlmIChlICE9IFBDUkVfRVJST1JfQkFEVVRGOCkKICAgICAgICAgICAgIGJyZWFr OwogICAgICAgICAgIHZhbGlkX2J5dGVzID0gc3ViWzBdOwotICAgICAgICAgIGUgPSAodmFs aWRfYnl0ZXMgPT0gMAotICAgICAgICAgICAgICAgPyBlbXB0eV9tYXRjaFtib2xdCi0gICAg ICAgICAgICAgICA6IHBjcmVfZXhlYyAoY3JlLCBleHRyYSwgcCwgdmFsaWRfYnl0ZXMsIDAs Ci0gICAgICAgICAgICAgICAgICAgICAgICAgICAgb3B0aW9ucyB8IFBDUkVfTk9fVVRGOF9D 
SEVDSyB8IFBDUkVfTk9URU9MLAotICAgICAgICAgICAgICAgICAgICAgICAgICAgIHN1Yiwg bnN1YikpOworICAgICAgICAgIGlmICh2YWxpZF9ieXRlcyA9PSAwKQorICAgICAgICAgICAg eworICAgICAgICAgICAgICBzdWJbMV0gPSAwOworICAgICAgICAgICAgICBlID0gZW1wdHlf bWF0Y2hbYm9sXTsKKyAgICAgICAgICAgIH0KKyAgICAgICAgICBlbHNlCisgICAgICAgICAg ICBlID0gcGNyZV9leGVjIChjcmUsIGV4dHJhLCBwLCB2YWxpZF9ieXRlcywgMCwKKyAgICAg ICAgICAgICAgICAgICAgICAgICAgIG9wdGlvbnMgfCBQQ1JFX05PX1VURjhfQ0hFQ0sgfCBQ Q1JFX05PVEVPTCwKKyAgICAgICAgICAgICAgICAgICAgICAgICAgIHN1YiwgTlNVQik7CiAg ICAgICAgICAgaWYgKGUgIT0gUENSRV9FUlJPUl9OT01BVENIKQogICAgICAgICAgICAgYnJl YWs7CiAgICAgICAgICAgcCArPSB2YWxpZF9ieXRlcyArIDE7CmRpZmYgLS1naXQgYS90ZXN0 cy9wY3JlLWludmFsaWQtdXRmOC1pbnB1dCBiL3Rlc3RzL3BjcmUtaW52YWxpZC11dGY4LWlu cHV0CmluZGV4IDlkYTRiMTguLjc4YmQxY2YgMTAwNzU1Ci0tLSBhL3Rlc3RzL3BjcmUtaW52 YWxpZC11dGY4LWlucHV0CisrKyBiL3Rlc3RzL3BjcmUtaW52YWxpZC11dGY4LWlucHV0CkBA IC0yMSw0ICsyMSw5IEBAIHRlc3QgJD8gLWVxIDAgfHwgZmFpbD0xCiBMQ19BTEw9ZW5fVVMu VVRGLTggZ3JlcCAtUCAnayQnIGluCiB0ZXN0ICQ/IC1lcSAxIHx8IGZhaWw9MQogCitlY2hv IGsgPmV4cAorCitMQ19BTEw9ZW5fVVMuVVRGLTggZ3JlcCAtYW9QICdrKicgaW4gPm91dCB8 fCBmYWlsPTEKK2NvbXBhcmUgZXhwIG91dCB8fCBmYWlsPTEKKwogRXhpdCAkZmFpbAotLSAK MS45LjMKCg== --------------030607050500070706010400-- From unknown Thu Jun 19 14:04:20 2025 Received: (at fakecontrol) by fakecontrolmessage; To: internal_control@debbugs.gnu.org From: Debbugs Internal Request Subject: Internal Control Message-Id: bug archived. Date: Wed, 15 Oct 2014 11:24:03 +0000 User-Agent: Fakemail v42.6.9 # This is a fake control message. # # The action: # bug archived. thanks # This fakemail brought to you by your local debbugs # administrator