From unknown Sun Aug 17 09:09:24 2025 Content-Disposition: inline Content-Transfer-Encoding: quoted-printable MIME-Version: 1.0 X-Mailer: MIME-tools 5.509 (Entity 5.509) Content-Type: text/plain; charset=utf-8 From: bug#22239 <22239@debbugs.gnu.org> To: bug#22239 <22239@debbugs.gnu.org> Subject: Status: fgrep -i slow in 2.21 Reply-To: bug#22239 <22239@debbugs.gnu.org> Date: Sun, 17 Aug 2025 16:09:24 +0000 retitle 22239 fgrep -i slow in 2.21 reassign 22239 grep submitter 22239 Ond=C5=99ej C=C3=ADfka severity 22239 normal thanks From debbugs-submit-bounces@debbugs.gnu.org Fri Dec 25 17:45:36 2015 Received: (at submit) by debbugs.gnu.org; 25 Dec 2015 22:45:36 +0000 Received: from localhost ([127.0.0.1]:35804 helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.84) (envelope-from ) id 1aCb7A-00038I-8F for submit@debbugs.gnu.org; Fri, 25 Dec 2015 17:45:36 -0500 Received: from eggs.gnu.org ([208.118.235.92]:47337) by debbugs.gnu.org with esmtp (Exim 4.84) (envelope-from ) id 1aCb6Y-0002UU-Ad for submit@debbugs.gnu.org; Fri, 25 Dec 2015 17:44:58 -0500 Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71) (envelope-from ) id 1aCb6S-00029M-JI for submit@debbugs.gnu.org; Fri, 25 Dec 2015 17:44:53 -0500 X-Spam-Checker-Version: SpamAssassin 3.3.2 (2011-06-06) on eggs.gnu.org X-Spam-Level: X-Spam-Status: No, score=0.8 required=5.0 tests=BAYES_50 autolearn=disabled version=3.3.2 Received: from lists.gnu.org ([2001:4830:134:3::11]:39567) by eggs.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1aCb6S-00029B-GU for submit@debbugs.gnu.org; Fri, 25 Dec 2015 17:44:52 -0500 Received: from eggs.gnu.org ([2001:4830:134:3::10]:38308) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1aCb6R-0003PV-O3 for bug-grep@gnu.org; Fri, 25 Dec 2015 17:44:52 -0500 Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71) (envelope-from ) id 1aCb6M-00022K-OG for bug-grep@gnu.org; Fri, 25 Dec 2015 17:44:51 -0500 Received: from h1.cmg2.smtp.forpsi.com ([81.2.195.188]:36368) by eggs.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1aCb6M-0001xo-Et for bug-grep@gnu.org; Fri, 25 Dec 2015 17:44:46 -0500 Received: from mail-lb0-f181.google.com ([209.85.217.181]) by cmg2.smtp.forpsi.com with bizsmtp id xyki1r0013vQdcH01ykjdE; Fri, 25 Dec 2015 23:44:43 +0100 Received: by mail-lb0-f181.google.com with SMTP id bc4so72233506lbc.2 for ; Fri, 25 Dec 2015 14:44:43 -0800 (PST) MIME-Version: 1.0 X-Received: by 10.112.182.8 with SMTP id ea8mr15281798lbc.114.1451083482768; Fri, 25 Dec 2015 14:44:42 -0800 (PST) Received: by 10.25.16.22 with HTTP; Fri, 25 Dec 2015 14:44:42 -0800 (PST) Date: Fri, 25 Dec 2015 23:44:42 +0100 X-Gmail-Original-Message-ID: Message-ID: Subject: fgrep -i slow in 2.21 From: =?UTF-8?B?T25kxZllaiBDw61ma2E=?= To: bug-grep@gnu.org Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: quoted-printable X-detected-operating-system: by eggs.gnu.org: GNU/Linux 2.6.x X-detected-operating-system: by eggs.gnu.org: GNU/Linux 2.6.x X-Received-From: 2001:4830:134:3::11 X-Spam-Score: -5.0 (-----) X-Debbugs-Envelope-To: submit X-Mailman-Approved-At: Fri, 25 Dec 2015 17:45:34 -0500 X-BeenThere: debbugs-submit@debbugs.gnu.org X-Mailman-Version: 2.1.18 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: debbugs-submit-bounces@debbugs.gnu.org Sender: "Debbugs-submit" X-Spam-Score: -5.0 (-----) When running "grep -Fi -f list.txt" where list.txt has thousands of lines, it takes orders of magnitude longer to process the input file than without the -i. This was not the case in grep 2.16, where fgrep -i was about as fast as fgrep. -- Ond=C5=99ej C=C3=ADfka From debbugs-submit-bounces@debbugs.gnu.org Mon Apr 11 01:02:26 2016 Received: (at 22239) by debbugs.gnu.org; 11 Apr 2016 05:02:26 +0000 Received: from localhost ([127.0.0.1]:57855 helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1apTzW-0005Th-5F for submit@debbugs.gnu.org; Mon, 11 Apr 2016 01:02:26 -0400 Received: from zimbra.cs.ucla.edu ([131.179.128.68]:34649) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1apTzV-0005TV-49 for 22239@debbugs.gnu.org; Mon, 11 Apr 2016 01:02:25 -0400 Received: from localhost (localhost [127.0.0.1]) by zimbra.cs.ucla.edu (Postfix) with ESMTP id 8ACC1160FD3; Sun, 10 Apr 2016 22:02:19 -0700 (PDT) Received: from zimbra.cs.ucla.edu ([127.0.0.1]) by localhost (zimbra.cs.ucla.edu [127.0.0.1]) (amavisd-new, port 10032) with ESMTP id 4dnF-bDbcAJi; Sun, 10 Apr 2016 22:02:18 -0700 (PDT) Received: from localhost (localhost [127.0.0.1]) by zimbra.cs.ucla.edu (Postfix) with ESMTP id 8D3C8161250; Sun, 10 Apr 2016 22:02:18 -0700 (PDT) X-Virus-Scanned: amavisd-new at zimbra.cs.ucla.edu Received: from zimbra.cs.ucla.edu ([127.0.0.1]) by localhost (zimbra.cs.ucla.edu [127.0.0.1]) (amavisd-new, port 10026) with ESMTP id kkf8-j50IUYR; Sun, 10 Apr 2016 22:02:18 -0700 (PDT) Received: from [192.168.1.9] (unknown [100.32.155.148]) by zimbra.cs.ucla.edu (Postfix) with ESMTPSA id 70D12160FD3; Sun, 10 Apr 2016 22:02:18 -0700 (PDT) To: =?UTF-8?B?T25kxZllaiBDw61ma2E=?= From: Paul Eggert Subject: Re: fgrep -i slow in 2.21 Organization: UCLA Computer Science Department Message-ID: <570B2FDA.1060307@cs.ucla.edu> Date: Sun, 10 Apr 2016 22:02:18 -0700 User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:38.0) Gecko/20100101 Thunderbird/38.6.0 MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8; format=flowed Content-Transfer-Encoding: 7bit X-Spam-Score: -1.0 (-) X-Debbugs-Envelope-To: 22239 Cc: 22239@debbugs.gnu.org X-BeenThere: debbugs-submit@debbugs.gnu.org X-Mailman-Version: 2.1.18 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: debbugs-submit-bounces@debbugs.gnu.org Sender: "Debbugs-submit" X-Spam-Score: -1.0 (-) This is following up on: http://bugs.gnu.org/22239 Could you please supply some specific test cases to illustrate the problem? I'm not seeing it, but this could be because I'm not using your locale, or am using machine-generated patterns. I am using master grep in the C locale on Ubuntu 15.10, with a pattern file generated like this: seq -f 'pattern %g' 100000 >list.txt and running commands like this in the 'grep' source directory: src/grep -F -i -f list.txt $(git ls-files | grep -v gnulib) (or the same command without -i). From debbugs-submit-bounces@debbugs.gnu.org Mon Apr 11 01:02:51 2016 Received: (at control) by debbugs.gnu.org; 11 Apr 2016 05:02:51 +0000 Received: from localhost ([127.0.0.1]:57858 helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1apTzv-0005UL-Ch for submit@debbugs.gnu.org; Mon, 11 Apr 2016 01:02:51 -0400 Received: from zimbra.cs.ucla.edu ([131.179.128.68]:34664) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1apTzu-0005U9-E5 for control@debbugs.gnu.org; Mon, 11 Apr 2016 01:02:50 -0400 Received: from localhost (localhost [127.0.0.1]) by zimbra.cs.ucla.edu (Postfix) with ESMTP id 1E603160FD3 for ; Sun, 10 Apr 2016 22:02:45 -0700 (PDT) Received: from zimbra.cs.ucla.edu ([127.0.0.1]) by localhost (zimbra.cs.ucla.edu [127.0.0.1]) (amavisd-new, port 10032) with ESMTP id cH2_C1j6Nl9N for ; Sun, 10 Apr 2016 22:02:44 -0700 (PDT) Received: from localhost (localhost [127.0.0.1]) by zimbra.cs.ucla.edu (Postfix) with ESMTP id 7A61F161250 for ; Sun, 10 Apr 2016 22:02:44 -0700 (PDT) X-Virus-Scanned: amavisd-new at zimbra.cs.ucla.edu Received: from zimbra.cs.ucla.edu ([127.0.0.1]) by localhost (zimbra.cs.ucla.edu [127.0.0.1]) (amavisd-new, port 10026) with ESMTP id as7AH3msr8vG for ; Sun, 10 Apr 2016 22:02:44 -0700 (PDT) Received: from [192.168.1.9] (unknown [100.32.155.148]) by zimbra.cs.ucla.edu (Postfix) with ESMTPSA id 5FD69160FD3 for ; Sun, 10 Apr 2016 22:02:44 -0700 (PDT) To: control@debbugs.gnu.org From: Paul Eggert Subject: grep bug maintenance Organization: UCLA Computer Science Department Message-ID: <570B2FF4.4000206@cs.ucla.edu> Date: Sun, 10 Apr 2016 22:02:44 -0700 User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:38.0) Gecko/20100101 Thunderbird/38.6.0 MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8; format=flowed Content-Transfer-Encoding: 7bit X-Spam-Score: -1.0 (-) X-Debbugs-Envelope-To: control X-BeenThere: debbugs-submit@debbugs.gnu.org X-Mailman-Version: 2.1.18 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: debbugs-submit-bounces@debbugs.gnu.org Sender: "Debbugs-submit" X-Spam-Score: -1.0 (-) tags 22239 moreinfo From debbugs-submit-bounces@debbugs.gnu.org Mon Apr 11 03:14:50 2016 Received: (at 22239) by debbugs.gnu.org; 11 Apr 2016 07:14:51 +0000 Received: from localhost ([127.0.0.1]:57946 helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1apW3e-00022H-OO for submit@debbugs.gnu.org; Mon, 11 Apr 2016 03:14:50 -0400 Received: from h1.cmg2.smtp.forpsi.com ([81.2.195.188]:53342) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1apW3c-000224-Pz for 22239@debbugs.gnu.org; Mon, 11 Apr 2016 03:14:49 -0400 Received: from mail-lf0-f49.google.com ([209.85.215.49]) by cmg2.smtp.forpsi.com with bizsmtp id gvEg1s00614XGcZ01vEgYt; Mon, 11 Apr 2016 09:14:42 +0200 Received: by mail-lf0-f49.google.com with SMTP id j11so143676370lfb.1 for <22239@debbugs.gnu.org>; Mon, 11 Apr 2016 00:14:40 -0700 (PDT) X-Gm-Message-State: AD7BkJK1GP9nJyewyU/c0GWBYIlCPO+3Mzxanoh4Yb9HjPAG28GKEk5v87PFj0upfRjKx5qW4L3DyDGy0o1/zA== MIME-Version: 1.0 X-Received: by 10.25.139.194 with SMTP id n185mr6387540lfd.79.1460358880496; Mon, 11 Apr 2016 00:14:40 -0700 (PDT) Received: by 10.25.208.6 with HTTP; Mon, 11 Apr 2016 00:14:40 -0700 (PDT) In-Reply-To: <570B2FDA.1060307@cs.ucla.edu> References: <570B2FDA.1060307@cs.ucla.edu> Date: Mon, 11 Apr 2016 09:14:40 +0200 X-Gmail-Original-Message-ID: Message-ID: Subject: Re: fgrep -i slow in 2.21 From: =?UTF-8?B?T25kxZllaiBDw61ma2E=?= To: Paul Eggert Content-Type: text/plain; charset=UTF-8 X-Spam-Score: -0.7 (/) X-Debbugs-Envelope-To: 22239 Cc: 22239@debbugs.gnu.org X-BeenThere: debbugs-submit@debbugs.gnu.org X-Mailman-Version: 2.1.18 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: debbugs-submit-bounces@debbugs.gnu.org Sender: "Debbugs-submit" X-Spam-Score: -0.7 (/) You're probably right about the locale. I'm using cs_CZ.UTF-8. With LC_ALL=C, both variants run faster and the difference is insignificant. With cs_CZ.UTF-8, on my machine, your test case takes 2.322s with -i and 0.464s without -i. I tested on my Aspell dictionary dump, where the difference is more noticeable: aspell dump master | head -n 100000 >list.txt grep 2.21 with -i: 7.336s grep 2.21 without -i: 0.312s grep 2.16 with -i: 0.372s grep 2.16 without -i: 0.431s With LC_ALL=C, both versions are about as fast. On Mon, Apr 11, 2016 at 7:02 AM, Paul Eggert wrote: > This is following up on: > > http://bugs.gnu.org/22239 > > Could you please supply some specific test cases to illustrate the problem? > I'm not seeing it, but this could be because I'm not using your locale, or > am using machine-generated patterns. I am using master grep in the C locale > on Ubuntu 15.10, with a pattern file generated like this: > > seq -f 'pattern %g' 100000 >list.txt > > and running commands like this in the 'grep' source directory: > > src/grep -F -i -f list.txt $(git ls-files | grep -v gnulib) > > (or the same command without -i). From debbugs-submit-bounces@debbugs.gnu.org Mon Apr 11 03:34:09 2016 Received: (at control) by debbugs.gnu.org; 11 Apr 2016 07:34:09 +0000 Received: from localhost ([127.0.0.1]:57956 helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1apWML-0002Y2-Ik for submit@debbugs.gnu.org; Mon, 11 Apr 2016 03:34:09 -0400 Received: from zimbra.cs.ucla.edu ([131.179.128.68]:40354) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1apWMK-0002Wo-DS for control@debbugs.gnu.org; Mon, 11 Apr 2016 03:34:08 -0400 Received: from localhost (localhost [127.0.0.1]) by zimbra.cs.ucla.edu (Postfix) with ESMTP id E3711160FD3 for ; Mon, 11 Apr 2016 00:34:02 -0700 (PDT) Received: from zimbra.cs.ucla.edu ([127.0.0.1]) by localhost (zimbra.cs.ucla.edu [127.0.0.1]) (amavisd-new, port 10032) with ESMTP id EZBjZwcbGt1o for ; Mon, 11 Apr 2016 00:34:02 -0700 (PDT) Received: from localhost (localhost [127.0.0.1]) by zimbra.cs.ucla.edu (Postfix) with ESMTP id 376CD160FF2 for ; Mon, 11 Apr 2016 00:34:02 -0700 (PDT) X-Virus-Scanned: amavisd-new at zimbra.cs.ucla.edu Received: from zimbra.cs.ucla.edu ([127.0.0.1]) by localhost (zimbra.cs.ucla.edu [127.0.0.1]) (amavisd-new, port 10026) with ESMTP id uCEXOFUmMWQ4 for ; Mon, 11 Apr 2016 00:34:02 -0700 (PDT) Received: from [192.168.1.9] (unknown [100.32.155.148]) by zimbra.cs.ucla.edu (Postfix) with ESMTPSA id 1EF22160FD3 for ; Mon, 11 Apr 2016 00:34:02 -0700 (PDT) To: control@debbugs.gnu.org From: Paul Eggert Subject: 22239 has more info, so remove tag Organization: UCLA Computer Science Department Message-ID: <570B5369.5030107@cs.ucla.edu> Date: Mon, 11 Apr 2016 00:34:01 -0700 User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:38.0) Gecko/20100101 Thunderbird/38.6.0 MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8; format=flowed Content-Transfer-Encoding: 7bit X-Spam-Score: -1.0 (-) X-Debbugs-Envelope-To: control X-BeenThere: debbugs-submit@debbugs.gnu.org X-Mailman-Version: 2.1.18 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: debbugs-submit-bounces@debbugs.gnu.org Sender: "Debbugs-submit" X-Spam-Score: -1.0 (-) tags 22239 - moreinfo From debbugs-submit-bounces@debbugs.gnu.org Wed Dec 21 01:45:46 2016 Received: (at 22239) by debbugs.gnu.org; 21 Dec 2016 06:45:46 +0000 Received: from localhost ([127.0.0.1]:49598 helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1cJaen-0006tD-F5 for submit@debbugs.gnu.org; Wed, 21 Dec 2016 01:45:46 -0500 Received: from zimbra.cs.ucla.edu ([131.179.128.68]:39584) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1cJZH5-0004FH-FU; Wed, 21 Dec 2016 00:17:12 -0500 Received: from localhost (localhost [127.0.0.1]) by zimbra.cs.ucla.edu (Postfix) with ESMTP id 497A1160059; Tue, 20 Dec 2016 21:17:04 -0800 (PST) Received: from zimbra.cs.ucla.edu ([127.0.0.1]) by localhost (zimbra.cs.ucla.edu [127.0.0.1]) (amavisd-new, port 10032) with ESMTP id WYG9VHYcmlCw; Tue, 20 Dec 2016 21:17:02 -0800 (PST) Received: from localhost (localhost [127.0.0.1]) by zimbra.cs.ucla.edu (Postfix) with ESMTP id 4B27216005F; Tue, 20 Dec 2016 21:17:02 -0800 (PST) X-Virus-Scanned: amavisd-new at zimbra.cs.ucla.edu Received: from zimbra.cs.ucla.edu ([127.0.0.1]) by localhost (zimbra.cs.ucla.edu [127.0.0.1]) (amavisd-new, port 10026) with ESMTP id 2N8nxvMb7QRK; Tue, 20 Dec 2016 21:17:02 -0800 (PST) Received: from [192.168.1.9] (unknown [47.153.178.162]) by zimbra.cs.ucla.edu (Postfix) with ESMTPSA id D8F6C160059; Tue, 20 Dec 2016 21:17:01 -0800 (PST) Subject: Re: bug#22357: grep -f not only huge memory usage, but also huge time cost To: 22357-done@debbugs.gnu.org, 21763-done@debbugs.gnu.org References: <2051273399.9196271.1452605975684.JavaMail.zimbra@redhat.com> <20161209072419.GA30226@pog.tecnopolis.ca> From: Paul Eggert Organization: UCLA Computer Science Department Message-ID: <3a0eb353-96a4-80fd-5f3c-7fd77be096a0@cs.ucla.edu> Date: Tue, 20 Dec 2016 21:17:01 -0800 User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:45.0) Gecko/20100101 Thunderbird/45.5.1 MIME-Version: 1.0 In-Reply-To: <20161209072419.GA30226@pog.tecnopolis.ca> Content-Type: multipart/mixed; boundary="------------4476EE1989198F97E6DF6D47" X-Spam-Score: -0.1 (/) X-Debbugs-Envelope-To: 22239 X-Mailman-Approved-At: Wed, 21 Dec 2016 01:45:43 -0500 Cc: Norihiro Tanaka , sur-behoffski , "L.A. Walsh" , Jim Meyering , "Bennett, Steve" , JQK , arnold@skeeve.com, 22239@debbugs.gnu.org, Trevor Cordes , Paul Jackson , Bruno Haible , =?UTF-8?B?T25kxZllaiBDw61ma2E=?= , Bruce Dubbs X-BeenThere: debbugs-submit@debbugs.gnu.org X-Mailman-Version: 2.1.18 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: debbugs-submit-bounces@debbugs.gnu.org Sender: "Debbugs-submit" X-Spam-Score: -3.1 (---) This is a multi-part message in MIME format. --------------4476EE1989198F97E6DF6D47 Content-Type: text/plain; charset=utf-8; format=flowed Content-Transfer-Encoding: quoted-printable I installed the attached patches into grep master. These fix the performa= nce=20 regressions noted at the start of Bug#22357. I see that the related perfo= rmance=20 problems noted in Bug#21763 seem to be fixed too, I expect because of Nor= ihiro=20 Tanaka's recent changes, so I'll boldly close both bug reports. To some extent the attached patches restore the old behavior for grep -F,= when=20 grep is given two or more patterns. The patch doesn't change the underlyi= ng=20 algorithms; it merely uses a different heuristic to decide whether to use= the -F=20 matcher. Although I wouldn't be surprised if the attached patches hurt=20 performance in some cases, I didn't uncover any such cases in my performa= nce=20 testing, which I admit mostly consisted of running the examples in the=20 abovementioned bug reports. I'll leave Bug#22239 open, as I get the following performance figures=20 (user+system CPU time) for the Bug#22239 benchmark, where list.txt is cre= ated by=20 "aspell dump master | head -n 100000 >list.txt", and the grep commands al= l use=20 the operands "-F -f list.txt /etc/passwd" in the en_US.utf8 locale on Fed= ora 24=20 x86-64. no -i -i grep version 0.25 0.33 2.16 0.26 10.95 2.21 0.11 2.90* current master (including attached patches) In the C locale, the current grep master is always significantly faster t= han=20 grep 2.16 or 2.21 on the benchmark, so the only significant problem is th= e=20 number marked "*". I ran the benchmarks on an AMD Phenom II X4 910e. --------------4476EE1989198F97E6DF6D47 Content-Type: text/x-diff; name="0001-grep-simplify-line-counting-in-patterns.patch" Content-Transfer-Encoding: quoted-printable Content-Disposition: attachment; filename="0001-grep-simplify-line-counting-in-patterns.patch" =46rom 75bca3ca0a6ea87a66531ab7c0f8859e1d6c4300 Mon Sep 17 00:00:00 2001 From: Paul Eggert Date: Tue, 20 Dec 2016 08:56:06 -0800 Subject: [PATCH 1/3] grep: simplify line counting in patterns * src/grep.c (n_patterns): Rename from patfile_lineno, as it is now origin-zero. Now size_t, not uintmax_t. (count_nl_bytes, fl_add): Simplify to just buffer and size. All callers changed. --- src/grep.c | 49 ++++++++++++++++++++----------------------------- 1 file changed, 20 insertions(+), 29 deletions(-) diff --git a/src/grep.c b/src/grep.c index 30e0f54..76f83d5 100644 --- a/src/grep.c +++ b/src/grep.c @@ -104,46 +104,35 @@ static size_t n_fl_pair_slots; and any command-line argument that serves as a regular expression. *= / static size_t n_pattern_files; =20 -/* Given the concatenation of all patterns, one per line, be they - specified via -e, a lone command-line argument or -f, this is the - number of the first line of each entity, in that concatenation. +/* The number of patterns seen so far. It is advanced by fl_add and, when needed, used in pattern_file_name to derive a file-relative line number. */ -static uintmax_t patfile_lineno =3D 1; +static size_t n_patterns; =20 -/* Return the number of newline bytes in BUF starting at offset BEG - and up to and not including offset END. */ +/* Return the number of newline bytes in BUF with size SIZE. */ static size_t _GL_ATTRIBUTE_PURE -count_nl_bytes (char const *buf, size_t beg, size_t end) +count_nl_bytes (char const *buf, size_t size) { - char const *p =3D buf + beg; - char const *end_p =3D buf + end; - uintmax_t n =3D 0; - while (true) - { - p =3D memchr (p, '\n', end_p - p); - if (!p) - break; - p++; - n++; - } + char const *p =3D buf; + char const *end_p =3D buf + size; + size_t n =3D 0; + while ((p =3D memchr (p, '\n', end_p - p))) + p++, n++; return n; } =20 -/* Append a FILENAME,line-number pair to FL_PAIR. The line number we sa= ve - with FILENAME is the initial value of the global PATFILE_LINENO. - PATFILE_LINENO is then incremented by the number of newlines in BUF - from offset BEG up to but not including offset END. */ +/* Append a FILENAME,line-number pair to FL_PAIR, and update + pattern-related counts from the contents of BUF with SIZE bytes. */ static void -fl_add (char const *buf, size_t beg, size_t end, char const *filename) +fl_add (char const *buf, size_t size, char const *filename) { if (n_fl_pair_slots <=3D n_pattern_files) fl_pair =3D x2nrealloc (fl_pair, &n_fl_pair_slots, sizeof *fl_pair);= =20 - fl_pair[n_pattern_files].lineno =3D patfile_lineno; + fl_pair[n_pattern_files].lineno =3D n_patterns + 1; fl_pair[n_pattern_files].filename =3D filename; n_pattern_files++; - patfile_lineno +=3D count_nl_bytes (buf, beg, end); + n_patterns +=3D count_nl_bytes (buf, size); } =20 /* Map the line number, LINENO, of one of the input patterns to the @@ -2521,10 +2510,11 @@ main (int argc, char **argv) keyalloc =3D keycc + cc + 1; keys =3D x2realloc (keys, &keyalloc); } - memcpy (&keys[keycc], optarg, cc); + oldcc =3D keycc; + memcpy (keys + oldcc, optarg, cc); keycc +=3D cc; keys[keycc++] =3D '\n'; - fl_add (keys, keycc - cc - 1, keycc, ""); + fl_add (keys + oldcc, cc + 1, ""); break; =20 case 'f': @@ -2548,7 +2538,7 @@ main (int argc, char **argv) /* Append final newline if file ended in non-newline. */ if (oldcc !=3D keycc && keys[keycc - 1] !=3D '\n') keys[keycc++] =3D '\n'; - fl_add (keys, oldcc, keycc, xstrdup (optarg)); + fl_add (keys + oldcc, keycc - oldcc, optarg); break; =20 case 'h': @@ -2741,7 +2731,8 @@ main (int argc, char **argv) /* Make a copy so that it can be reallocated or freed later. */ keycc =3D strlen (argv[optind]); keys =3D xmemdup (argv[optind++], keycc + 1); - fl_add (keys, 0, keycc, ""); + fl_add (keys, keycc, ""); + n_patterns++; } else usage (EXIT_TROUBLE); --=20 2.7.4 --------------4476EE1989198F97E6DF6D47 Content-Type: text/x-diff; name="0002-grep-simplify-matcher-configuration.patch" Content-Transfer-Encoding: quoted-printable Content-Disposition: attachment; filename="0002-grep-simplify-matcher-configuration.patch" =46rom 214be13d33a26ab45231afe8e983d9a174aae282 Mon Sep 17 00:00:00 2001 From: Paul Eggert Date: Tue, 20 Dec 2016 08:56:06 -0800 Subject: [PATCH 2/3] grep: simplify matcher configuration * src/grep.c (matcher, compile): Remove static vars. (compile_fp_t): Now takes a 3rd syntax argument. (Gcomppile, Ecompile, Acompile, GAcompile, PAcompile): Remove. (struct matcher): Now nameless, since it is used only once. Make 'name' a bit shorter. New member 'syntax'. (matchers): Initialize it, and change removed functions to GEAcompile. (F_MATCHER_INDEX, G_MATCHER_INDEX): New constants. (setmatcher): New arg MATCHER, and return new matcher index. Avoid unnecessary call to strcmp. (main): Keep matcher as a local int, not a global pointer. * src/kwsearch.c (Fcompile): * src/pcresearch.c (Pcompile): Ignore the 3rd syntax argument. --- src/grep.c | 111 +++++++++++++++++++------------------------------= ------ src/kwsearch.c | 2 +- src/pcresearch.c | 2 +- src/search.h | 4 +- 4 files changed, 41 insertions(+), 78 deletions(-) diff --git a/src/grep.c b/src/grep.c index 76f83d5..574a380 100644 --- a/src/grep.c +++ b/src/grep.c @@ -491,8 +491,6 @@ bool match_words; bool match_lines; char eolbyte; =20 -static char const *matcher; - /* For error messages. */ /* The input file name, or (if standard input) null or a --label argumen= t. */ static char const *filename; @@ -577,9 +575,8 @@ static bool seek_failed; static bool seek_data_failed; =20 /* Functions we'll use to search. */ -typedef void (*compile_fp_t) (char const *, size_t); +typedef void (*compile_fp_t) (char const *, size_t, reg_syntax_t); typedef size_t (*execute_fp_t) (char const *, size_t, size_t *, char con= st *); -static compile_fp_t compile; static execute_fp_t execute; =20 static char const * @@ -2019,70 +2016,36 @@ if any error occurs and -q is not given, the exit= status is 2.\n")); =20 /* Pattern compilers and matchers. */ =20 -static void -Gcompile (char const *pattern, size_t size) -{ - GEAcompile (pattern, size, RE_SYNTAX_GREP); -} - -static void -Ecompile (char const *pattern, size_t size) -{ - GEAcompile (pattern, size, RE_SYNTAX_EGREP); -} - -static void -Acompile (char const *pattern, size_t size) -{ - GEAcompile (pattern, size, RE_SYNTAX_AWK); -} - -static void -GAcompile (char const *pattern, size_t size) -{ - GEAcompile (pattern, size, RE_SYNTAX_GNU_AWK); -} - -static void -PAcompile (char const *pattern, size_t size) -{ - GEAcompile (pattern, size, RE_SYNTAX_POSIX_AWK); -} - -struct matcher +static struct { - char const name[16]; + char const name[12]; + int syntax; /* used if compile =3D=3D GEAcompile */ compile_fp_t compile; execute_fp_t execute; +} const matchers[] =3D { + { "grep", RE_SYNTAX_GREP, GEAcompile, EGexecute }, + { "egrep", RE_SYNTAX_EGREP, GEAcompile, EGexecute }, + { "fgrep", 0, Fcompile, Fexecute, }, + { "awk", RE_SYNTAX_AWK, GEAcompile, EGexecute }, + { "gawk", RE_SYNTAX_GNU_AWK, GEAcompile, EGexecute }, + { "posixawk", RE_SYNTAX_POSIX_AWK, GEAcompile, EGexecute }, + { "perl", 0, Pcompile, Pexecute, }, }; -static struct matcher const matchers[] =3D { - { "grep", Gcompile, EGexecute }, - { "egrep", Ecompile, EGexecute }, - { "fgrep", Fcompile, Fexecute }, - { "awk", Acompile, EGexecute }, - { "gawk", GAcompile, EGexecute }, - { "posixawk", PAcompile, EGexecute }, - { "perl", Pcompile, Pexecute }, - { "", NULL, NULL }, -}; +/* Keep these in sync with the 'matchers' table. */ +enum { F_MATCHER_INDEX =3D 2, G_MATCHER_INDEX =3D 0 }; =20 -/* Set the matcher to M if available. Exit in case of conflicts or if - M is not available. */ -static void -setmatcher (char const *m) +/* Return the index of the matcher corresponding to M if available. + MATCHER is the index of the previous matcher, or -1 if none. + Exit in case of conflicts or if M is not available. */ +static int +setmatcher (char const *m, int matcher) { - struct matcher const *p; - - if (matcher && !STREQ (matcher, m)) - die (EXIT_TROUBLE, 0, _("conflicting matchers specified")); - - for (p =3D matchers; p->compile; p++) - if (STREQ (m, p->name)) + for (int i =3D 0; i < sizeof matchers / sizeof *matchers; i++) + if (STREQ (m, matchers[i].name)) { - matcher =3D p->name; - compile =3D p->compile; - execute =3D p->execute; - return; + if (0 <=3D matcher && matcher !=3D i) + die (EXIT_TROUBLE, 0, _("conflicting matchers specified")); + return i; } =20 die (EXIT_TROUBLE, 0, _("invalid matcher %s"), m); @@ -2367,6 +2330,7 @@ main (int argc, char **argv) { char *keys =3D NULL; size_t keycc =3D 0, oldcc, keyalloc =3D 0; + int matcher =3D -1; bool with_filenames =3D false; size_t cc; int opt, prepended; @@ -2409,9 +2373,6 @@ main (int argc, char **argv) error (0, 0, _("warning: GREP_OPTIONS is deprecated;" " please use an alias or script")); =20 - compile =3D matchers[0].compile; - execute =3D matchers[0].execute; - while (prev_optind =3D optind, (opt =3D get_nondigit_option (argc, argv, &default_context)) !=3D= -1) switch (opt) @@ -2440,23 +2401,23 @@ main (int argc, char **argv) break; =20 case 'E': - setmatcher ("egrep"); + matcher =3D setmatcher ("egrep", matcher); break; =20 case 'F': - setmatcher ("fgrep"); + matcher =3D setmatcher ("fgrep", matcher); break; =20 case 'P': - setmatcher ("perl"); + matcher =3D setmatcher ("perl", matcher); break; =20 case 'G': - setmatcher ("grep"); + matcher =3D setmatcher ("grep", matcher); break; =20 case 'X': /* undocumented on purpose */ - setmatcher (optarg); + matcher =3D setmatcher (optarg, matcher); break; =20 case 'H': @@ -2794,24 +2755,26 @@ main (int argc, char **argv) =20 initialize_unibyte_mask (); =20 + if (matcher < 0) + matcher =3D G_MATCHER_INDEX; + /* In a unibyte locale, switch from fgrep to grep if the pattern matches words (where grep is typically faster). In a multibyte locale, switch from fgrep to grep if either (1) the pattern has an encoding error (where fgrep might not work),= or (2) case is ignored and a fast fgrep ignore-case is not available. = */ - if (compile =3D=3D Fcompile + if (matcher =3D=3D F_MATCHER_INDEX && (MB_CUR_MAX <=3D 1 ? match_words : (contains_encoding_error (keys, keycc) || (match_icase && !fgrep_icase_available (keys, keycc)))))= { fgrep_to_grep_pattern (&keys, &keycc); - matcher =3D "grep"; - compile =3D Gcompile; - execute =3D EGexecute; + matcher =3D G_MATCHER_INDEX; } =20 - compile (keys, keycc); + execute =3D matchers[matcher].execute; + matchers[matcher].compile (keys, keycc, matchers[matcher].syntax); free (keys); /* We need one byte prior and one after. */ char eolbytes[3] =3D { 0, eolbyte, 0 }; diff --git a/src/kwsearch.c b/src/kwsearch.c index c3e69b3..7275973 100644 --- a/src/kwsearch.c +++ b/src/kwsearch.c @@ -34,7 +34,7 @@ wordchar (wint_t wc) static kwset_t kwset; =20 void -Fcompile (char const *pattern, size_t size) +Fcompile (char const *pattern, size_t size, reg_syntax_t ignored) { size_t total =3D size; =20 diff --git a/src/pcresearch.c b/src/pcresearch.c index 108baff..0e34861 100644 --- a/src/pcresearch.c +++ b/src/pcresearch.c @@ -88,7 +88,7 @@ static int empty_match[2]; #endif =20 void -Pcompile (char const *pattern, size_t size) +Pcompile (char const *pattern, size_t size, reg_syntax_t ignored) { #if !HAVE_LIBPCRE die (EXIT_TROUBLE, 0, diff --git a/src/search.h b/src/search.h index 4957a63..1ff5be2 100644 --- a/src/search.h +++ b/src/search.h @@ -57,11 +57,11 @@ extern void GEAcompile (char const *, size_t, reg_syn= tax_t); extern size_t EGexecute (char const *, size_t, size_t *, char const *); =20 /* kwsearch.c */ -extern void Fcompile (char const *, size_t); +extern void Fcompile (char const *, size_t, reg_syntax_t); extern size_t Fexecute (char const *, size_t, size_t *, char const *); =20 /* pcresearch.c */ -extern void Pcompile (char const *, size_t); +extern void Pcompile (char const *, size_t, reg_syntax_t); extern size_t Pexecute (char const *, size_t, size_t *, char const *); =20 /* Return the number of bytes in the character at the start of S, which --=20 2.7.4 --------------4476EE1989198F97E6DF6D47 Content-Type: text/x-diff; name="0003-grep-fix-performance-with-multiple-patterns.patch" Content-Transfer-Encoding: quoted-printable Content-Disposition: attachment; filename="0003-grep-fix-performance-with-multiple-patterns.patch" =46rom 290ca116c9172d97b2b026951fac722d3bd3ced9 Mon Sep 17 00:00:00 2001 From: Paul Eggert Date: Tue, 20 Dec 2016 18:04:39 -0800 Subject: [PATCH 3/3] grep: fix performance with multiple patterns Problem reported by Jaroslav Skarvada (Bug#22357). * NEWS: Document this and other recent performance fixes. * src/grep.c (E_MATCHER_INDEX): New constant. (all_single_byte_after_folding): New function, split out from fgrep_icase_available. (fgrep_icase_available): Use it. (try_fgrep_pattern): New function, which also uses it. (main): With two or more patterns, use try_fgrep_pattern to fix performance regression. The number "two" here is just a heuristic. --- NEWS | 11 ++++++ src/grep.c | 132 +++++++++++++++++++++++++++++++++++++++++++++++++++----= ------ 2 files changed, 123 insertions(+), 20 deletions(-) diff --git a/NEWS b/NEWS index aa0483b..2e7d1ae 100644 --- a/NEWS +++ b/NEWS @@ -8,6 +8,17 @@ GNU grep NEWS -*- out= line -*- /proc file system and standard output is /dev/null. [bug introduced in grep-2.27] =20 +** Bug fixes + + Fix performance regression with multiple patterns, e.g., for -Fi in + a multi-byte locale, or for -Fw in a single-byte locale. + [bugs introduced in grep-2.19, grep-2.22 and grep-2.26] + +** Improvements + + Improve performance for -E or -G pattern lists that are easily + converted to -F format. + =20 * Noteworthy changes in release 2.27 (2016-12-06) [stable] =20 diff --git a/src/grep.c b/src/grep.c index 574a380..f36654c 100644 --- a/src/grep.c +++ b/src/grep.c @@ -2032,7 +2032,7 @@ static struct { "perl", 0, Pcompile, Pexecute, }, }; /* Keep these in sync with the 'matchers' table. */ -enum { F_MATCHER_INDEX =3D 2, G_MATCHER_INDEX =3D 0 }; +enum { E_MATCHER_INDEX =3D 1, F_MATCHER_INDEX =3D 2, G_MATCHER_INDEX =3D= 0 }; =20 /* Return the index of the matcher corresponding to M if available. MATCHER is the index of the previous matcher, or -1 if none. @@ -2245,23 +2245,12 @@ contains_encoding_error (char const *pat, size_t = patlen) return false; } =20 -/* Return true if the fgrep pattern PAT, of size PATLEN, matches only - single-byte characters including case folding, and so is suitable - to be converted to a grep pattern. */ +/* Return true if the set of single-byte characters USED contains only + characters whose case counterparts are also single-byte. */ =20 static bool -fgrep_icase_available (char const *pat, size_t patlen) +all_single_byte_after_folding (bool used[UCHAR_MAX + 1]) { - bool used[UCHAR_MAX + 1] =3D { 0, }; - - for (size_t i =3D 0; i < patlen; i++) - { - unsigned char c =3D pat[i]; - if (localeinfo.sbctowc[c] =3D=3D WEOF) - return false; - used[c] =3D true; - } - for (int c =3D 0; c <=3D UCHAR_MAX; c++) if (used[c]) { @@ -2281,6 +2270,26 @@ fgrep_icase_available (char const *pat, size_t pat= len) return true; } =20 +/* Return true if the -F patterns PAT, of size PATLEN, match only + single-byte characters including case folding, and so can be + processed by the -F matcher even if -i is given. */ + +static bool +fgrep_icase_available (char const *pat, size_t patlen) +{ + bool used[UCHAR_MAX + 1] =3D { 0, }; + + for (size_t i =3D 0; i < patlen; i++) + { + unsigned char c =3D pat[i]; + if (localeinfo.sbctowc[c] =3D=3D WEOF) + return false; + used[c] =3D true; + } + + return all_single_byte_after_folding (used); +} + /* Change the pattern *KEYS_P, of size *LEN_P, from fgrep to grep style.= */ =20 static void @@ -2325,6 +2334,84 @@ fgrep_to_grep_pattern (char **keys_p, size_t *len_= p) *len_p =3D p - new_keys; } =20 +/* If it is easy, convert the MATCHER-style patterns KEYS (of size + *LEN_P) to -F style, update *LEN_P to a possibly-smaller value, and + return F_MATCHER_INDEX. If not, leave KEYS and *LEN_P alone and + return MATCHER. This function is conservative and sometimes misses + conversions, e.g., it does not convert the -E pattern "(a|a|[aa])" + to the -F pattern "a". */ + +static int +try_fgrep_pattern (int matcher, char *keys, size_t *len_p) +{ + int result =3D matcher; + size_t len =3D *len_p; + char *new_keys =3D xmalloc (len + 1); + char *p =3D new_keys; + mbstate_t mb_state =3D { 0 }; + size_t mblen_lim =3D match_icase ? 1 : -3; + bool used[UCHAR_MAX + 1] =3D { 0, }; + + while (len !=3D 0) + { + switch (*keys) + { + case '$': case '*': case '.': case '[': case '^': + goto fail; + + case '(': case '+': case '?': case '{': case '|': + if (matcher !=3D G_MATCHER_INDEX) + goto fail; + break; + + case '\\': + if (1 < len) + switch (keys[1]) + { + case '\n': + case 'B': case 'S': case 'W': case'\'': case '<': + case 'b': case 's': case 'w': case '`': case '>': + case '1': case '2': case '3': case '4': + case '5': case '6': case '7': case '8': case '9': + goto fail; + + case '(': case '+': case '?': case '{': case '|': + if (matcher =3D=3D G_MATCHER_INDEX) + goto fail; + /* Fall through. */ + default: + keys++, len--; + break; + } + break; + } + + { + size_t n =3D mb_clen (keys, len, &mb_state); + if (mblen_lim < n) + goto fail; + used[to_uchar (keys[0])] =3D true; + p =3D mempcpy (p, keys, n); + keys +=3D n; + len -=3D n; + } + } + + if (mblen_lim =3D=3D 1 && !all_single_byte_after_folding (used)) + goto fail; + + if (*len_p !=3D p - new_keys) + { + *len_p =3D p - new_keys; + memcpy (keys, new_keys, p - new_keys); + } + result =3D F_MATCHER_INDEX; + + fail: + free (new_keys); + return result; +} + int main (int argc, char **argv) { @@ -2758,11 +2845,11 @@ main (int argc, char **argv) if (matcher < 0) matcher =3D G_MATCHER_INDEX; =20 - /* In a unibyte locale, switch from fgrep to grep if - the pattern matches words (where grep is typically faster). - In a multibyte locale, switch from fgrep to grep if either - (1) the pattern has an encoding error (where fgrep might not work),= or - (2) case is ignored and a fast fgrep ignore-case is not available. = */ + /* In a single-byte locale, switch from -F to -G if it is a single + pattern that matches words, where -G is typically faster. In a + multi-byte locale, switch if the patterns have an encoding error + (where -F does not work) or if -i and the patterns will not work + for -iF. */ if (matcher =3D=3D F_MATCHER_INDEX && (MB_CUR_MAX <=3D 1 ? match_words @@ -2772,6 +2859,11 @@ main (int argc, char **argv) fgrep_to_grep_pattern (&keys, &keycc); matcher =3D G_MATCHER_INDEX; } + /* With two or more patterns, if -F works then switch from either -E + or -G, as -F is probably faster then. */ + else if ((matcher =3D=3D G_MATCHER_INDEX || matcher =3D=3D E_MATCHER_I= NDEX) + && 1 < n_patterns) + matcher =3D try_fgrep_pattern (matcher, keys, &keycc); =20 execute =3D matchers[matcher].execute; matchers[matcher].compile (keys, keycc, matchers[matcher].syntax); --=20 2.7.4 --------------4476EE1989198F97E6DF6D47-- From debbugs-submit-bounces@debbugs.gnu.org Wed Dec 21 18:23:19 2016 Received: (at 22239) by debbugs.gnu.org; 21 Dec 2016 23:23:19 +0000 Received: from localhost ([127.0.0.1]:50694 helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1cJqEB-0003Px-D4 for submit@debbugs.gnu.org; Wed, 21 Dec 2016 18:23:19 -0500 Received: from mailgw01.kcn.ne.jp ([61.86.7.208]:60561) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1cJpwN-0002xs-74 for 22239@debbugs.gnu.org; Wed, 21 Dec 2016 18:04:55 -0500 Received: from mxs02-s (mailgw2.kcn.ne.jp [61.86.15.234]) by mailgw01.kcn.ne.jp (Postfix) with ESMTP id F2A644A08BA for <22239@debbugs.gnu.org>; Thu, 22 Dec 2016 08:04:48 +0900 (JST) X-matriXscan-loop-detect: 2acf43152a12c75b80d7ac696e1f5be3400fafca Received: from mail04.kcn.ne.jp ([61.86.6.183]) by mxs02-s with ESMTP; Thu, 22 Dec 2016 08:04:46 +0900 (JST) Received: from [10.120.1.68] (i118-21-128-66.s30.a048.ap.plala.or.jp [118.21.128.66]) by mail04.kcn.ne.jp (Postfix) with ESMTPA id AFA8C12900BC; Thu, 22 Dec 2016 08:04:45 +0900 (JST) Date: Thu, 22 Dec 2016 08:04:42 +0900 From: Norihiro Tanaka To: Paul Eggert Subject: Re: bug#22357: grep -f not only huge memory usage, but also huge time cost In-Reply-To: <3a0eb353-96a4-80fd-5f3c-7fd77be096a0@cs.ucla.edu> References: <20161209072419.GA30226@pog.tecnopolis.ca> <3a0eb353-96a4-80fd-5f3c-7fd77be096a0@cs.ucla.edu> Message-Id: <20161222080441.6C45.27F6AC2D@kcn.ne.jp> MIME-Version: 1.0 Content-Type: text/plain; charset="US-ASCII" Content-Transfer-Encoding: 7bit X-Mailer: Becky! ver. 2.73 [ja] X-matriXscan-Sophos-AV: Clean X-matriXscan-Action: Approve X-matriXscan: Uncategorized X-Spam-Score: -0.1 (/) X-Debbugs-Envelope-To: 22239 X-Mailman-Approved-At: Wed, 21 Dec 2016 18:23:18 -0500 Cc: 21763-done@debbugs.gnu.org, Bruce Dubbs , 22357-done@debbugs.gnu.org, sur-behoffski , "L.A. Walsh" , Jim Meyering , "Bennett, Steve" , JQK , arnold@skeeve.com, 22239@debbugs.gnu.org, Trevor Cordes , Paul Jackson , Bruno Haible , Ond?ej Cifka X-BeenThere: debbugs-submit@debbugs.gnu.org X-Mailman-Version: 2.1.18 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: debbugs-submit-bounces@debbugs.gnu.org Sender: "Debbugs-submit" X-Spam-Score: -3.1 (---) On Tue, 20 Dec 2016 21:17:01 -0800 Paul Eggert wrote: > I installed the attached patches into grep master. These fix the > performance regressions noted at the start of Bug#22357. I see that > the related performance problems noted in Bug#21763 seem to be fixed > too, I expect because of Norihiro Tanaka's recent changes, so I'll > boldly close both bug reports. > > To some extent the attached patches restore the old behavior for grep > -F, when grep is given two or more patterns. The patch doesn't change > the underlying algorithms; it merely uses a different heuristic to > decide whether to use the -F matcher. Although I wouldn't be > surprised if the attached patches hurt performance in some cases, I > didn't uncover any such cases in my performance testing, which I > admit mostly consisted of running the examples in the abovementioned > bug reports. > > I'll leave Bug#22239 open, as I get the following performance figures > (user+system CPU time) for the Bug#22239 benchmark, where list.txt is > created by "aspell dump master | head -n 100000 >list.txt", and the > grep commands all use the operands "-F -f list.txt /etc/passwd" in > the en_US.utf8 locale on Fedora 24 x86-64. > > no -i -i grep version > 0.25 0.33 2.16 > 0.26 10.95 2.21 > 0.11 2.90* current master (including attached patches) > > In the C locale, the current grep master is always significantly > faster than grep 2.16 or 2.21 on the benchmark, so the only > significant problem is the number marked "*". I ran the benchmarks on > an AMD Phenom II X4 910e. Thanks. BTW, are you aware of extreme slowdown in the following cases after third patch? yes $(printf %040d 0) | head -10000000 >inp printf '0\n1\n' >pat env LC_ALL=C src/grep -w -f pat inp From debbugs-submit-bounces@debbugs.gnu.org Thu Dec 22 12:10:33 2016 Received: (at 22239) by debbugs.gnu.org; 22 Dec 2016 17:10:33 +0000 Received: from localhost ([127.0.0.1]:51773 helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1cK6sz-0001U4-GN for submit@debbugs.gnu.org; Thu, 22 Dec 2016 12:10:33 -0500 Received: from mail-it0-f67.google.com ([209.85.214.67]:33862) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1cK1Kc-0005gR-Id; Thu, 22 Dec 2016 06:14:43 -0500 Received: by mail-it0-f67.google.com with SMTP id 75so20466050ite.1; Thu, 22 Dec 2016 03:14:42 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20161025; h=mime-version:sender:in-reply-to:references:from:date:message-id :subject:to:cc; bh=v4PHNBVPCCcfR3sLcwmkMdsRlrXobKPk+KTY1gg7ODg=; b=YG0/WzDzgrbelPTNIe6ewU8AcYY9WTFANx5jG9ca8SgJkaSJhQ6RZeCtakrfEBcMqh Im1Hzn1bnm7laeo77Lg0oKwaduot+a2ZLinyELMUQWDzSf3sB+EVCUQxlCyBQzeEAJCP Eup7mYqhd+Dyh6BDmix1tNS2ZIs3AbPYeliO17+8uzZ1keR9JVjFr8803N+54FmQpK2n i9rFOudYA0oqYVEV3JC/qI56ioxIvDEfyGPzWlZ3oP+1+ygGcnAJYmDkYod4g+DxSDoA Xbky2aDIGC91wrXmjZcJV8BXPpLqaOzEtaA60c39/pEwv6k58FzMzKKqZgvdgUCi3L+g hGKQ== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:mime-version:sender:in-reply-to:references:from :date:message-id:subject:to:cc; bh=v4PHNBVPCCcfR3sLcwmkMdsRlrXobKPk+KTY1gg7ODg=; b=F7PtYvS3ZSVhhG5jPneMb6lsZBfjxpi9M5C+RulkGYhoO/aYPY19rmoXKt6fVTpqdC 5wDcytGJv3Px5gQtmpTDMzr3XrzvQOqreBPfkLPRIV8rRsJGPa/qhNPZQqFLXRch4LMj iWWF8bkA6NqpwsJAXl4800wjiisYPCFQS6oOJML/8sWhh4u3CBUBEiVhR2HxiHwx4Fjr TymMqaxC2NKzhVkXnM20+WuUjjigUJdLo/mZHOfYmKFJxE+lqVbj3OZWtHA+OfqJ51VK 3sGulo271zkKpB218QyHufoDv7TtG67VyqVqQmyAArI14erZg7/TMeM/cQd5iph9f2v7 aqBg== X-Gm-Message-State: AIkVDXILOFEcBGJb3d5rLKT7QpzZeRhWdexVol2IC84OdQrw4lHlKKUXVM4YqOsBAgfKYW3SBPVW2qpLK9pgpA== X-Received: by 10.36.78.15 with SMTP id r15mr9523534ita.55.1482405276909; Thu, 22 Dec 2016 03:14:36 -0800 (PST) MIME-Version: 1.0 Received: by 10.107.191.5 with HTTP; Thu, 22 Dec 2016 03:14:16 -0800 (PST) In-Reply-To: <20161222080441.6C45.27F6AC2D@kcn.ne.jp> References: <20161209072419.GA30226@pog.tecnopolis.ca> <3a0eb353-96a4-80fd-5f3c-7fd77be096a0@cs.ucla.edu> <20161222080441.6C45.27F6AC2D@kcn.ne.jp> From: Jim Meyering Date: Thu, 22 Dec 2016 03:14:16 -0800 X-Google-Sender-Auth: cW-7xuhDHxZh5UH6IaJ-pNfgnJo Message-ID: Subject: Re: bug#22357: grep -f not only huge memory usage, but also huge time cost To: Norihiro Tanaka Content-Type: text/plain; charset=UTF-8 X-Spam-Score: 3.5 (+++) X-Spam-Report: Spam detection software, running on the system "debbugs.gnu.org", has NOT identified this incoming email as spam. The original message has been attached to this so you can view it or label similar future email. If you have any questions, see the administrator of that system for details. Content preview: On Wed, Dec 21, 2016 at 3:04 PM, Norihiro Tanaka wrote: > yes $(printf %040d 0) | head -10000000 >inp > printf '0\n1\n' >pat > env LC_ALL=C src/grep -w -f pat inp Thanks for mentioning that. Here is some actual data (Fedora 25 on an i7-4770S): [...] Content analysis details: (3.5 points, 10.0 required) pts rule name description ---- ---------------------- -------------------------------------------------- 3.0 MANY_TO_CC Sent to 10+ recipients 0.5 RCVD_IN_SORBS_SPAM RBL: SORBS: sender is a spam source [209.85.214.67 listed in dnsbl.sorbs.net] -0.0 RCVD_IN_MSPIKE_H3 RBL: Good reputation (+3) [209.85.214.67 listed in wl.mailspike.net] 0.0 FREEMAIL_FROM Sender email is commonly abused enduser mail provider (meyering[at]gmail.com) 0.0 HEADER_FROM_DIFFERENT_DOMAINS From and EnvelopeFrom 2nd level mail domains are different -0.0 SPF_PASS SPF: sender matches SPF record -0.0 RCVD_IN_DNSWL_NONE RBL: Sender listed at http://www.dnswl.org/, no trust [209.85.214.67 listed in list.dnswl.org] 0.0 T_DKIM_INVALID DKIM-Signature header exists but is not valid -0.0 RCVD_IN_MSPIKE_WL Mailspike good senders 0.0 FREEMAIL_FORGED_FROMDOMAIN 2nd level domains in From and EnvelopeFrom freemail headers are different X-Debbugs-Envelope-To: 22239 X-Mailman-Approved-At: Thu, 22 Dec 2016 12:10:30 -0500 Cc: Paul Eggert , 21763-done@debbugs.gnu.org, Bruce Dubbs , 22357-done@debbugs.gnu.org, sur-behoffski , "L.A. Walsh" , "Bennett, Steve" , JQK , Aharon Robbins , 22239@debbugs.gnu.org, Trevor Cordes , Paul Jackson , Bruno Haible , Ond?ej Cifka X-BeenThere: debbugs-submit@debbugs.gnu.org X-Mailman-Version: 2.1.18 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: debbugs-submit-bounces@debbugs.gnu.org Sender: "Debbugs-submit" X-Spam-Score: 0.5 (/) On Wed, Dec 21, 2016 at 3:04 PM, Norihiro Tanaka wrote: > yes $(printf %040d 0) | head -10000000 >inp > printf '0\n1\n' >pat > env LC_ALL=C src/grep -w -f pat inp Thanks for mentioning that. Here is some actual data (Fedora 25 on an i7-4770S): for i in $(env ls -1dv /p/p/grep-*/bin/grep) src/grep; do env LC_ALL=C time -f "%e $i" $i -w -f pat inp; done 1.05 /p/p/grep-2.3/bin/grep 0.91 /p/p/grep-2.4.1/bin/grep 0.91 /p/p/grep-2.4.2/bin/grep 0.91 /p/p/grep-2.4/bin/grep 0.94 /p/p/grep-2.5.1/bin/grep 0.94 /p/p/grep-2.5.3/bin/grep 0.94 /p/p/grep-2.5.4/bin/grep 0.95 /p/p/grep-2.5/bin/grep 0.90 /p/p/grep-2.6.1/bin/grep 1.03 /p/p/grep-2.6.2/bin/grep 0.90 /p/p/grep-2.6.3/bin/grep 0.90 /p/p/grep-2.6/bin/grep 0.90 /p/p/grep-2.7/bin/grep 0.90 /p/p/grep-2.8/bin/grep 0.90 /p/p/grep-2.9/bin/grep 0.90 /p/p/grep-2.10/bin/grep 0.89 /p/p/grep-2.11/bin/grep 0.90 /p/p/grep-2.12/bin/grep 0.90 /p/p/grep-2.13/bin/grep 0.89 /p/p/grep-2.14/bin/grep 0.90 /p/p/grep-2.15/bin/grep 0.90 /p/p/grep-2.16/bin/grep 0.89 /p/p/grep-2.17/bin/grep 0.90 /p/p/grep-2.18/bin/grep 0.90 /p/p/grep-2.19/bin/grep 0.90 /p/p/grep-2.20/bin/grep 0.86 /p/p/grep-2.21/bin/grep 0.90 /p/p/grep-2.22/bin/grep 0.91 /p/p/grep-2.23/bin/grep 0.91 /p/p/grep-2.24/bin/grep 0.91 /p/p/grep-2.25/bin/grep 0.25 /p/p/grep-2.26/bin/grep 13.07 /p/p/grep-2.27/bin/grep 37.49 src/grep From debbugs-submit-bounces@debbugs.gnu.org Fri Dec 23 20:38:57 2016 Received: (at 22239) by debbugs.gnu.org; 24 Dec 2016 01:38:57 +0000 Received: from localhost ([127.0.0.1]:53076 helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1cKbIW-0001q2-7T for submit@debbugs.gnu.org; Fri, 23 Dec 2016 20:38:57 -0500 Received: from zimbra.cs.ucla.edu ([131.179.128.68]:41450) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1cKbIS-0001pc-09; Fri, 23 Dec 2016 20:38:53 -0500 Received: from localhost (localhost [127.0.0.1]) by zimbra.cs.ucla.edu (Postfix) with ESMTP id D41611600CB; Fri, 23 Dec 2016 17:38:45 -0800 (PST) Received: from zimbra.cs.ucla.edu ([127.0.0.1]) by localhost (zimbra.cs.ucla.edu [127.0.0.1]) (amavisd-new, port 10032) with ESMTP id bci7v12pNFPn; Fri, 23 Dec 2016 17:38:43 -0800 (PST) Received: from localhost (localhost [127.0.0.1]) by zimbra.cs.ucla.edu (Postfix) with ESMTP id 0CEB61600CD; Fri, 23 Dec 2016 17:38:43 -0800 (PST) X-Virus-Scanned: amavisd-new at zimbra.cs.ucla.edu Received: from zimbra.cs.ucla.edu ([127.0.0.1]) by localhost (zimbra.cs.ucla.edu [127.0.0.1]) (amavisd-new, port 10026) with ESMTP id v3s_MA36AOum; Fri, 23 Dec 2016 17:38:42 -0800 (PST) Received: from [192.168.1.9] (unknown [47.153.178.162]) by zimbra.cs.ucla.edu (Postfix) with ESMTPSA id B83EA1600CB; Fri, 23 Dec 2016 17:38:42 -0800 (PST) Subject: Re: bug#22239: bug#22357: grep -f not only huge memory usage, but also huge time cost To: Norihiro Tanaka References: <20161209072419.GA30226@pog.tecnopolis.ca> <3a0eb353-96a4-80fd-5f3c-7fd77be096a0@cs.ucla.edu> <20161222080441.6C45.27F6AC2D@kcn.ne.jp> From: Paul Eggert Organization: UCLA Computer Science Department Message-ID: <820176a7-a1e4-7b3f-f140-7d2329d25d63@cs.ucla.edu> Date: Fri, 23 Dec 2016 17:38:42 -0800 User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:45.0) Gecko/20100101 Thunderbird/45.5.1 MIME-Version: 1.0 In-Reply-To: <20161222080441.6C45.27F6AC2D@kcn.ne.jp> Content-Type: multipart/mixed; boundary="------------6B4F36F83DA3EA4416933FAA" X-Spam-Score: -3.1 (---) X-Debbugs-Envelope-To: 22239 Cc: 21763-done@debbugs.gnu.org, 22357-done@debbugs.gnu.org, sur-behoffski , "L.A. Walsh" , JQK , 22239@debbugs.gnu.org, Trevor Cordes , Bruno Haible , =?UTF-8?B?T25kxZllaiBDw61ma2E=?= X-BeenThere: debbugs-submit@debbugs.gnu.org X-Mailman-Version: 2.1.18 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: debbugs-submit-bounces@debbugs.gnu.org Sender: "Debbugs-submit" X-Spam-Score: -3.1 (---) This is a multi-part message in MIME format. --------------6B4F36F83DA3EA4416933FAA Content-Type: text/plain; charset=utf-8; format=flowed Content-Transfer-Encoding: quoted-printable Norihiro Tanaka wrote: > are you aware of extreme slowdown in the following cases after third pa= tch? > > yes $(printf %040d 0) | head -10000000 >inp > printf '0\n1\n' >pat > env LC_ALL=3DC src/grep -w -f pat inp No. Thanks, I hadn't considered that possibility. I looked into the slowd= own and=20 installed the attached patches, which cause 'grep' to run about as fast o= n this=20 test case as grep 2.25 (though not as fast as grep 2.26). The main fix is= in=20 patch 5. On my platform: -------grep version------ v2.25 v2.26 v2.27 master locale command 1.21 0.69 24.95 1.22 C grep -w -f pat inp 207.36 203.15 202.03 1.22 en_US.utf8 grep -w -f pat inp 1.21 0.69 25.95 0.85 C grep -w -f pat inp -F 66.33 68.07 67.21 1.22 en_US.utf8 grep -w -f pat inp -F All numbers are user+system CPU seconds on Fedora 24 x86-64 (AMD Phenom I= I X4=20 910e). "master" means after the attached patches are installed. Perhaps we can fiddle with the heuristics a bit so that v2.26 is not=20 significantly faster than the master in the C locale. --------------6B4F36F83DA3EA4416933FAA Content-Type: text/x-diff; name="0001-maint-rewrite-to-avoid-some-macros.patch" Content-Transfer-Encoding: quoted-printable Content-Disposition: attachment; filename="0001-maint-rewrite-to-avoid-some-macros.patch" =46rom 969447fdebabe2046ec4261837d5e8f12f75fba9 Mon Sep 17 00:00:00 2001 From: Paul Eggert Date: Fri, 23 Dec 2016 08:04:13 -0800 Subject: [PATCH 1/8] maint: rewrite to avoid some macros These days, the dangerous powers of C macros are not needed if constants or functions will do just as well. * src/grep.c (SEP_CHAR_SELECTED, SEP_CHAR_REJECTED, SEP_STR_GROUP) (INITIAL_BUFSIZE): * src/kwset.c (DEPTH_SIZE): Now constants, not macros. * src/kwset.c (link): Remove macro. Instead, rename local vars from 'link' to 'cur'. (malloc) [GREP]: Remove macro. All uses of malloc changed to xmalloc. Omit double-inclusion of xalloc.h. Do not depend on 'GREP'. (U): Now a function, not a macro. * src/kwset.c, src/searchutils.c (NCHAR): Move this macro to ... * src/system.h: ... here, and make it a constant. --- src/grep.c | 8 ++--- src/kwset.c | 95 ++++++++++++++++++++++++++-----------------------= ------ src/searchutils.c | 2 -- src/system.h | 1 + 4 files changed, 50 insertions(+), 56 deletions(-) diff --git a/src/grep.c b/src/grep.c index f36654c..3729ae0 100644 --- a/src/grep.c +++ b/src/grep.c @@ -50,9 +50,9 @@ #include "xalloc.h" #include "xstrtol.h" =20 -#define SEP_CHAR_SELECTED ':' -#define SEP_CHAR_REJECTED '-' -#define SEP_STR_GROUP "--" +enum { SEP_CHAR_SELECTED =3D ':' }; +enum { SEP_CHAR_REJECTED =3D '-' }; +char const SEP_STR_GROUP[] =3D "--"; =20 #define AUTHORS \ proper_name ("Mike Haertel"), \ @@ -797,7 +797,7 @@ skipped_file (char const *name, bool command_line, bo= ol is_dir) =20 static char *buffer; /* Base of buffer. */ static size_t bufalloc; /* Allocated buffer size, counting slop. */ -#define INITIAL_BUFSIZE 32768 /* Initial buffer size, not counting slop.= */ +enum { INITIAL_BUFSIZE =3D 32768 }; /* Initial buffer size, not counting= slop. */ static int bufdesc; /* File descriptor. */ static char *bufbeg; /* Beginning of user-visible stuff. */ static char *buflim; /* Limit of user-visible stuff. */ diff --git a/src/kwset.c b/src/kwset.c index 264ef22..506c6cd 100644 --- a/src/kwset.c +++ b/src/kwset.c @@ -42,19 +42,14 @@ #include "obstack.h" #include "xalloc.h" =20 -#define link kwset_link - -#ifdef GREP -# include "xalloc.h" -# undef malloc -# define malloc xmalloc -#endif - -#define NCHAR (UCHAR_MAX + 1) -#define obstack_chunk_alloc malloc +#define obstack_chunk_alloc xmalloc #define obstack_chunk_free free =20 -#define U(c) to_uchar (c) +static unsigned char +U (char ch) +{ + return to_uchar (ch); +} =20 /* Balanced tree of edges and labels leaving a given trie node. */ struct tree @@ -159,7 +154,7 @@ kwsalloc (char const *trans, bool reverse) =20 /* This upper bound is valid for CHAR_BIT >=3D 4 and exact for CHAR_BIT in { 4..11, 13, 15, 17, 19 }. */ -#define DEPTH_SIZE (CHAR_BIT + CHAR_BIT/2) +enum { DEPTH_SIZE =3D CHAR_BIT + CHAR_BIT / 2 }; =20 /* Add the given string to the contents of the keyword set. */ void @@ -181,46 +176,46 @@ kwsincr (kwset_t kwset, char const *text, size_t le= n) /* Descend the tree of outgoing links for this trie node, looking for the current character and keeping track of the path followed. */ - struct tree *link =3D trie->links; + struct tree *cur =3D trie->links; struct tree *links[DEPTH_SIZE]; enum { L, R } dirs[DEPTH_SIZE]; links[0] =3D (struct tree *) &trie->links; dirs[0] =3D L; int depth =3D 1; =20 - while (link && label !=3D link->label) + while (cur && label !=3D cur->label) { - links[depth] =3D link; - if (label < link->label) - dirs[depth++] =3D L, link =3D link->llink; + links[depth] =3D cur; + if (label < cur->label) + dirs[depth++] =3D L, cur =3D cur->llink; else - dirs[depth++] =3D R, link =3D link->rlink; + dirs[depth++] =3D R, cur =3D cur->rlink; } =20 /* The current character doesn't have an outgoing link at this trie node, so build a new trie node and install a link in the current trie node's tree. */ - if (!link) + if (!cur) { - link =3D obstack_alloc (&kwset->obstack, sizeof *link); - link->llink =3D NULL; - link->rlink =3D NULL; - link->trie =3D obstack_alloc (&kwset->obstack, sizeof *link->t= rie); - link->trie->accepting =3D 0; - link->trie->links =3D NULL; - link->trie->parent =3D trie; - link->trie->next =3D NULL; - link->trie->fail =3D NULL; - link->trie->depth =3D trie->depth + 1; - link->trie->shift =3D 0; - link->label =3D label; - link->balance =3D 0; + cur =3D obstack_alloc (&kwset->obstack, sizeof *cur); + cur->llink =3D NULL; + cur->rlink =3D NULL; + cur->trie =3D obstack_alloc (&kwset->obstack, sizeof *cur->tri= e); + cur->trie->accepting =3D 0; + cur->trie->links =3D NULL; + cur->trie->parent =3D trie; + cur->trie->next =3D NULL; + cur->trie->fail =3D NULL; + cur->trie->depth =3D trie->depth + 1; + cur->trie->shift =3D 0; + cur->label =3D label; + cur->balance =3D 0; =20 /* Install the new tree node in its parent. */ if (dirs[--depth] =3D=3D L) - links[depth]->llink =3D link; + links[depth]->llink =3D cur; else - links[depth]->rlink =3D link; + links[depth]->rlink =3D cur; =20 /* Back up the tree fixing the balance flags. */ while (depth && !links[depth]->balance) @@ -291,7 +286,7 @@ kwsincr (kwset_t kwset, char const *text, size_t len)= } } =20 - trie =3D link->trie; + trie =3D cur->trie; } =20 /* Mark the node we finally reached as accepting, encoding the @@ -326,7 +321,7 @@ static void treefails (struct tree const *tree, struct trie const *fail, struct trie *recourse, bool reverse) { - struct tree *link; + struct tree *cur; =20 if (!tree) return; @@ -338,16 +333,16 @@ treefails (struct tree const *tree, struct trie con= st *fail, node that has a descendant on the current label. */ while (fail) { - link =3D fail->links; - while (link && tree->label !=3D link->label) - if (tree->label < link->label) - link =3D link->llink; + cur =3D fail->links; + while (cur && tree->label !=3D cur->label) + if (tree->label < cur->label) + cur =3D cur->llink; else - link =3D link->rlink; - if (link) + cur =3D cur->rlink; + if (cur) { - tree->trie->fail =3D link->trie; - if (!reverse && link->trie->accepting && !tree->trie->acceptin= g) + tree->trie->fail =3D cur->trie; + if (!reverse && cur->trie->accepting && !tree->trie->accepting= ) tree->trie->accepting =3D SIZE_MAX; return; } @@ -641,18 +636,18 @@ static size_t memoff2_kwset (char const *s, size_t n, kwset_t kwset, struct kwsmatch *kwsmatch) { - struct tree const *link =3D kwset->trie->links; - struct tree const *clink =3D link->llink ? link->llink : link->rlink; + struct tree const *cur =3D kwset->trie->links; + struct tree const *clink =3D cur->llink ? cur->llink : cur->rlink; char const *mch =3D (clink - ? memchr2 (s, link->label, clink->label, n) - : memchr (s, link->label, n)); + ? memchr2 (s, cur->label, clink->label, n) + : memchr (s, cur->label, n)); if (! mch) return SIZE_MAX; else { size_t off =3D mch - s; - if (*mch =3D=3D link->label) - kwsmatch->index =3D link->trie->accepting / 2; + if (*mch =3D=3D cur->label) + kwsmatch->index =3D cur->trie->accepting / 2; else kwsmatch->index =3D clink->trie->accepting / 2; kwsmatch->offset[0] =3D off; diff --git a/src/searchutils.c b/src/searchutils.c index 73d6c1c..deaab60 100644 --- a/src/searchutils.c +++ b/src/searchutils.c @@ -22,8 +22,6 @@ #define SYSTEM_INLINE _GL_EXTERN_INLINE #include "search.h" =20 -#define NCHAR (UCHAR_MAX + 1) - kwset_t kwsinit (bool mb_trans) { diff --git a/src/system.h b/src/system.h index 6f4918d..c875275 100644 --- a/src/system.h +++ b/src/system.h @@ -37,6 +37,7 @@ #include =20 enum { EXIT_TROUBLE =3D 2 }; +enum { NCHAR =3D UCHAR_MAX + 1 }; =20 #include #define N_(String) gettext_noop(String) --=20 2.7.4 --------------6B4F36F83DA3EA4416933FAA Content-Type: text/x-diff; name="0002-grep-remove-C-label.patch" Content-Transfer-Encoding: quoted-printable Content-Disposition: attachment; filename="0002-grep-remove-C-label.patch" =46rom 19227eb98bc586a5ef3c2fb993a23c1182a4dab6 Mon Sep 17 00:00:00 2001 From: Paul Eggert Date: Fri, 23 Dec 2016 10:54:54 -0800 Subject: [PATCH 2/8] grep: remove C label * src/kwsearch.c (Fexecute): Remove label. --- src/kwsearch.c | 3 +-- 1 file changed, 1 insertion(+), 2 deletions(-) diff --git a/src/kwsearch.c b/src/kwsearch.c index 7275973..d04d34c 100644 --- a/src/kwsearch.c +++ b/src/kwsearch.c @@ -103,7 +103,7 @@ Fexecute (char const *buf, size_t size, size_t *match= _size, buf + size - beg + match_lines, &kwsmatch= , longest); if (offset =3D=3D (size_t) -1) - goto failure; + break; len =3D kwsmatch.size[0] - 2 * match_lines; if (mb_check && mb_goback (&mb_start, beg + offset, buf + size) !=3D= 0) { @@ -157,7 +157,6 @@ Fexecute (char const *buf, size_t size, size_t *match= _size, goto success; } /* for (beg in buf) */ =20 - failure: return -1; =20 success: --=20 2.7.4 --------------6B4F36F83DA3EA4416933FAA Content-Type: text/x-diff; name="0003-grep-simplify-Fexecute.patch" Content-Transfer-Encoding: quoted-printable Content-Disposition: attachment; filename="0003-grep-simplify-Fexecute.patch" =46rom f139976800435db91032f3b1d9435a166890be38 Mon Sep 17 00:00:00 2001 From: Paul Eggert Date: Fri, 23 Dec 2016 11:10:23 -0800 Subject: [PATCH 3/8] grep: simplify Fexecute * src/kwsearch.c (Fexecute): Avoid the need for a 'try' local or for a 'goto success'. Update mb_start to reflect newline found. --- src/kwsearch.c | 46 ++++++++++++++++++++++++---------------------- 1 file changed, 24 insertions(+), 22 deletions(-) diff --git a/src/kwsearch.c b/src/kwsearch.c index d04d34c..5596ebd 100644 --- a/src/kwsearch.c +++ b/src/kwsearch.c @@ -81,7 +81,7 @@ size_t Fexecute (char const *buf, size_t size, size_t *match_size, char const *start_ptr) { - char const *beg, *try, *end, *mb_start; + char const *beg, *end, *mb_start; size_t len; char eol =3D eolbyte; struct kwsmatch kwsmatch; @@ -131,30 +131,32 @@ Fexecute (char const *buf, size_t size, size_t *mat= ch_size, len +=3D start_ptr =3D=3D NULL; goto success_in_beg_and_len; } - if (match_words) - for (try =3D beg; ; ) + if (! match_words) + goto success; + + /* Succeed if the preceding and following characters are word + constituents. If the following character is not a word + constituent, keep trying with shorter matches. */ + char const *bol =3D memrchr (mb_start, eol, beg - mb_start); + if (bol) + mb_start =3D bol + 1; + if (! wordchar (mb_prev_wc (mb_start, beg, buf + size))) + for (;;) { - char const *bol =3D memrchr (buf, eol, beg - buf); - bol =3D bol ? bol + 1 : buf; - if (wordchar (mb_prev_wc (bol, try, buf + size))) - break; - if (wordchar (mb_next_wc (try + len, buf + size))) + if (! wordchar (mb_next_wc (beg + len, buf + size))) { - if (!len) - break; - offset =3D kwsexec (kwset, beg, --len, &kwsmatch, true);= - if (offset =3D=3D (size_t) -1) - break; - try =3D beg + offset; - len =3D kwsmatch.size[0]; + if (start_ptr) + goto success_in_beg_and_len; + else + goto success; } - else if (!start_ptr) - goto success; - else - goto success_in_beg_and_len; - } /* for (try) */ - else - goto success; + if (!len) + break; + offset =3D kwsexec (kwset, beg, --len, &kwsmatch, true); + if (offset !=3D 0) + break; + len =3D kwsmatch.size[0]; + } } /* for (beg in buf) */ =20 return -1; --=20 2.7.4 --------------6B4F36F83DA3EA4416933FAA Content-Type: text/x-diff; name="0004-grep-specialize-word-finding-functions.patch" Content-Transfer-Encoding: quoted-printable Content-Disposition: attachment; filename="0004-grep-specialize-word-finding-functions.patch" =46rom 740048e66e7c55a8e42f4f7e4c24256a61506f70 Mon Sep 17 00:00:00 2001 From: Paul Eggert Date: Fri, 23 Dec 2016 12:25:24 -0800 Subject: [PATCH 4/8] grep: specialize word-finding functions This improves performance a bit. * src/dfasearch.c, src/kwsearch.c (wordchar): Remove; now in searchutils.c. * src/grep.c (main): Call wordinit if -w. * src/search.h: Adjust. * src/searchutils.c: Include verify.h. (word_start): New static var. (wordchar): Move here from dfasearch.c and kwsearch.c. (wordinit, wordchars_count, wordchar_next, wordchar_prev): New functions. (mb_prev_wc, mb_next_wc): Remove. All callers changed to use the new functions instead. --- src/dfasearch.c | 11 ++----- src/grep.c | 1 + src/kwsearch.c | 11 ++----- src/search.h | 5 +-- src/searchutils.c | 91 +++++++++++++++++++++++++++++++++++++++++++------= ------ 5 files changed, 80 insertions(+), 39 deletions(-) diff --git a/src/dfasearch.c b/src/dfasearch.c index 24a36cd..87e1f7e 100644 --- a/src/dfasearch.c +++ b/src/dfasearch.c @@ -26,13 +26,6 @@ =20 struct localeinfo localeinfo; =20 -/* Whether -w considers WC to be a word constituent. */ -static bool -wordchar (wint_t wc) -{ - return wc =3D=3D L'_' || iswalnum (wc); -} - /* KWset compiled pattern. For Ecompile and Gcompile, we compile a list of strings, at least one of which is known to occur in any string matching the regexp. */ @@ -394,8 +387,8 @@ EGexecute (char const *buf, size_t size, size_t *matc= h_size, while (match <=3D best_match) { regoff_t shorter_len =3D 0; - if (!wordchar (mb_prev_wc (beg, match, end - 1)) - && !wordchar (mb_next_wc (match + len, end - 1))= ) + if (! wordchar_next (match + len, end - 1) + && ! wordchar_prev (beg, match, end - 1)) goto assess_pattern_match; if (len > 0) { diff --git a/src/grep.c b/src/grep.c index 3729ae0..f9d1d86 100644 --- a/src/grep.c +++ b/src/grep.c @@ -2651,6 +2651,7 @@ main (int argc, char **argv) break; =20 case 'w': + wordinit (); match_words =3D true; break; =20 diff --git a/src/kwsearch.c b/src/kwsearch.c index 5596ebd..b30dfd0 100644 --- a/src/kwsearch.c +++ b/src/kwsearch.c @@ -21,13 +21,6 @@ #include #include "search.h" =20 -/* Whether -w considers WC to be a word constituent. */ -static bool -wordchar (wint_t wc) -{ - return wc =3D=3D L'_' || iswalnum (wc); -} - /* KWset compiled pattern. For Ecompile and Gcompile, we compile a list of strings, at least one of which is known to occur in any string matching the regexp. */ @@ -140,10 +133,10 @@ Fexecute (char const *buf, size_t size, size_t *mat= ch_size, char const *bol =3D memrchr (mb_start, eol, beg - mb_start); if (bol) mb_start =3D bol + 1; - if (! wordchar (mb_prev_wc (mb_start, beg, buf + size))) + if (! wordchar_prev (mb_start, beg, buf + size)) for (;;) { - if (! wordchar (mb_next_wc (beg + len, buf + size))) + if (! wordchar_next (beg + len, buf + size)) { if (start_ptr) goto success_in_beg_and_len; diff --git a/src/search.h b/src/search.h index 1ff5be2..6fe1797 100644 --- a/src/search.h +++ b/src/search.h @@ -46,10 +46,11 @@ _GL_INLINE_HEADER_BEGIN typedef signed char mb_len_map_t; =20 /* searchutils.c */ +extern void wordinit (void); extern kwset_t kwsinit (bool); +extern size_t wordchar_next (char const *, char const *); +extern bool wordchar_prev (char const *, char const *, char const *); extern ptrdiff_t mb_goback (char const **, char const *, char const *); -extern wint_t mb_prev_wc (char const *, char const *, char const *); -extern wint_t mb_next_wc (char const *, char const *); =20 /* dfasearch.c */ extern struct localeinfo localeinfo; diff --git a/src/searchutils.c b/src/searchutils.c index deaab60..e0a1db3 100644 --- a/src/searchutils.c +++ b/src/searchutils.c @@ -22,6 +22,30 @@ #define SYSTEM_INLINE _GL_EXTERN_INLINE #include "search.h" =20 +#include + +/* For each byte B, word_start[B] is 1 if B is a single-byte character + that is a word constituent, 0 if B cannot start a word constituent, + and -1 if B might be or might not be the start of a word + constituent. */ +static wint_t word_start[NCHAR]; +verify (WEOF !=3D 0 && WEOF !=3D 1); + +/* Whether -w considers WC to be a word constituent. */ +static bool +wordchar (wint_t wc) +{ + return wc =3D=3D L'_' || iswalnum (wc); +} + +void +wordinit (void) +{ + for (int i =3D 0; i < NCHAR; i++) + word_start[i] =3D (localeinfo.sbclen[i] =3D=3D -2 ? WEOF + : wordchar (localeinfo.sbctowc[i])); +} + kwset_t kwsinit (bool mb_trans) { @@ -93,27 +117,56 @@ mb_goback (char const **mb_start, char const *cur, c= har const *end) return p =3D=3D cur ? 0 : cur - p0; } =20 -/* In the buffer BUF, return the wide character that is encoded just - before CUR. The buffer ends at END. Return WEOF if there is no - wide character just before CUR. */ -wint_t -mb_prev_wc (char const *buf, char const *cur, char const *end) +/* Examine the start of BUF (of size SIZE) for word constituents. + If COUNTALL, examine as many as possible; otherwise, examine at most = one. + Return the total number of bytes in the examined characters. */ +static size_t +wordchars_count (char const *buf, char const *end, bool countall) { - if (cur =3D=3D buf) - return WEOF; - char const *p =3D buf; - cur--; - cur -=3D mb_goback (&p, cur, end); - return mb_next_wc (cur, end); + size_t n =3D 0; + mbstate_t mbs =3D { 0 }; + while (n < end - buf) + { + wint_t ws =3D word_start[to_uchar (buf[n])]; + if (ws =3D=3D 0) + break; + else if (ws =3D=3D 1) + n++; + else + { + wchar_t wc =3D 0; + size_t wcbytes =3D mbrtowc (&wc, buf + n, end - buf - n, &mbs)= ; + if (!wordchar (wc)) + break; + n +=3D wcbytes + !wcbytes; + } + if (!countall) + break; + } + return n; } =20 -/* Return the wide character that is encoded at CUR. The buffer ends - at END. Return WEOF if there is no wide character encoded at CUR. *= / -wint_t -mb_next_wc (char const *cur, char const *end) +/* If BUF starts with a word constituent, return the number of bytes + used to represent it; otherwise, return zero. The buffer ends at END= =2E */ +size_t +wordchar_next (char const *buf, char const *end) { - wchar_t wc; - mbstate_t mbs =3D { 0 }; - return (end - cur !=3D 0 && mbrtowc (&wc, cur, end - cur, &mbs) < (siz= e_t) -2 - ? wc : WEOF); + return wordchars_count (buf, end, false); +} + +/* In the buffer BUF, return true if the character whose encoding + contains the byte before CUR is a word constituent. The buffer + ends at END. */ +bool +wordchar_prev (char const *buf, char const *cur, char const *end) +{ + if (buf =3D=3D cur) + return false; + cur--; + wint_t ws =3D word_start[to_uchar (*cur)]; + if (! localeinfo.multibyte) + return ws =3D=3D 1; + char const *p =3D buf; + cur -=3D mb_goback (&p, cur, end); + return wordchar_next (cur, end) !=3D 0; } --=20 2.7.4 --------------6B4F36F83DA3EA4416933FAA Content-Type: text/x-diff; name="0005-grep-speed-up-wf-in-C-locale.patch" Content-Transfer-Encoding: quoted-printable Content-Disposition: attachment; filename="0005-grep-speed-up-wf-in-C-locale.patch" =46rom e89a7e6d4be4669a8a73650c28bb1eb69399d703 Mon Sep 17 00:00:00 2001 From: Paul Eggert Date: Fri, 23 Dec 2016 12:43:46 -0800 Subject: [PATCH 5/8] grep: speed up -wf in C locale Problem reported by Norihiro Tanaka (Bug#22357#100). This patch improves the performance on that benchmark on my platform so that grep is now only about 2x slower than grep 2.26, which means it is considerably faster than grep 2.25 and earlier. * src/kwsearch.c (Fexecute): Use wordchars_size to boost performance for this case. * src/search.h, src/searchutils.c (wordchars_size): New function. --- src/kwsearch.c | 6 ++++++ src/search.h | 1 + src/searchutils.c | 9 +++++++++ 3 files changed, 16 insertions(+) diff --git a/src/kwsearch.c b/src/kwsearch.c index b30dfd0..6005b60 100644 --- a/src/kwsearch.c +++ b/src/kwsearch.c @@ -150,6 +150,12 @@ Fexecute (char const *buf, size_t size, size_t *matc= h_size, break; len =3D kwsmatch.size[0]; } + + /* No word match was found at BEG. Skip past word constituents, + since they cannot precede the next match and not skipping + them could make things much slower. */ + beg +=3D wordchars_size (beg, buf + size); + mb_start =3D beg; } /* for (beg in buf) */ =20 return -1; diff --git a/src/search.h b/src/search.h index 6fe1797..1def4d6 100644 --- a/src/search.h +++ b/src/search.h @@ -48,6 +48,7 @@ typedef signed char mb_len_map_t; /* searchutils.c */ extern void wordinit (void); extern kwset_t kwsinit (bool); +extern size_t wordchars_size (char const *, char const *); extern size_t wordchar_next (char const *, char const *); extern bool wordchar_prev (char const *, char const *, char const *); extern ptrdiff_t mb_goback (char const **, char const *, char const *); diff --git a/src/searchutils.c b/src/searchutils.c index e0a1db3..6f6ae0b 100644 --- a/src/searchutils.c +++ b/src/searchutils.c @@ -146,6 +146,15 @@ wordchars_count (char const *buf, char const *end, b= ool countall) return n; } =20 +/* Examine the start of BUF for the longest prefix containing just + word constituents. Return the total number of bytes in the prefix. + The buffer ends at END. */ +size_t +wordchars_size (char const *buf, char const *end) +{ + return wordchars_count (buf, end, true); +} + /* If BUF starts with a word constituent, return the number of bytes used to represent it; otherwise, return zero. The buffer ends at END= =2E */ size_t --=20 2.7.4 --------------6B4F36F83DA3EA4416933FAA Content-Type: text/x-diff; name="0006-grep-standardize-on-localeinfo.multibyte.patch" Content-Transfer-Encoding: quoted-printable Content-Disposition: attachment; filename="0006-grep-standardize-on-localeinfo.multibyte.patch" =46rom 4a9bbf519f3761a70c147597bcf967a27951b376 Mon Sep 17 00:00:00 2001 From: Paul Eggert Date: Fri, 23 Dec 2016 12:57:10 -0800 Subject: [PATCH 6/8] grep: standardize on localeinfo.multibyte * src/dfasearch.c (EGexecute): * src/grep.c (main): * src/kwsearch.c (Fexecute): * src/pcresearch.c (Pcompile): Prefer localeinfo.multibyte to (MB_CUR_MAX > 1). --- src/dfasearch.c | 2 +- src/grep.c | 2 +- src/kwsearch.c | 2 +- src/pcresearch.c | 2 +- 4 files changed, 4 insertions(+), 4 deletions(-) diff --git a/src/dfasearch.c b/src/dfasearch.c index 87e1f7e..7f68907 100644 --- a/src/dfasearch.c +++ b/src/dfasearch.c @@ -270,7 +270,7 @@ EGexecute (char const *buf, size_t size, size_t *matc= h_size, =20 if (exact_kwset_match) { - if (MB_CUR_MAX =3D=3D 1 || localeinfo.using_utf8) + if (!localeinfo.multibyte | localeinfo.using_utf8) goto success; if (mb_start < beg) mb_start =3D beg; diff --git a/src/grep.c b/src/grep.c index f9d1d86..1c45286 100644 --- a/src/grep.c +++ b/src/grep.c @@ -2852,7 +2852,7 @@ main (int argc, char **argv) (where -F does not work) or if -i and the patterns will not work for -iF. */ if (matcher =3D=3D F_MATCHER_INDEX - && (MB_CUR_MAX <=3D 1 + && (! localeinfo.multibyte ? match_words : (contains_encoding_error (keys, keycc) || (match_icase && !fgrep_icase_available (keys, keycc)))))= diff --git a/src/kwsearch.c b/src/kwsearch.c index 6005b60..7d11230 100644 --- a/src/kwsearch.c +++ b/src/kwsearch.c @@ -86,7 +86,7 @@ Fexecute (char const *buf, size_t size, size_t *match_s= ize, mb_check =3D longest =3D false; else { - mb_check =3D MB_CUR_MAX > 1 && !localeinfo.using_utf8; + mb_check =3D localeinfo.multibyte & !localeinfo.using_utf8; longest =3D mb_check | !!start_ptr | match_words; } =20 diff --git a/src/pcresearch.c b/src/pcresearch.c index 0e34861..245469c 100644 --- a/src/pcresearch.c +++ b/src/pcresearch.c @@ -110,7 +110,7 @@ Pcompile (char const *pattern, size_t size, reg_synta= x_t ignored) char const *p; char const *pnul; =20 - if (1 < MB_CUR_MAX) + if (localeinfo.multibyte) { if (! localeinfo.using_utf8) die (EXIT_TROUBLE, 0, _("-P supports only unibyte and UTF-8 loca= les")); --=20 2.7.4 --------------6B4F36F83DA3EA4416933FAA Content-Type: text/x-diff; name="0007-grep-improve-word-checking-with-UTF-8.patch" Content-Transfer-Encoding: quoted-printable Content-Disposition: attachment; filename="0007-grep-improve-word-checking-with-UTF-8.patch" =46rom 2c84095a777c2a20e99e92f406398501242ac131 Mon Sep 17 00:00:00 2001 From: Paul Eggert Date: Fri, 23 Dec 2016 16:16:01 -0800 Subject: [PATCH 7/8] grep: improve word checking with UTF-8 * src/searchutils.c: Do not include . (word_start): Remove, replacing with ... (sbwordchar): New static var. All uses changed. (wordchar_prev): Return size_t, not bool, as this generates slightly better code. Go back faster if UTF-8. --- src/search.h | 2 +- src/searchutils.c | 85 +++++++++++++++++++++++++++++++++----------------= ------ 2 files changed, 52 insertions(+), 35 deletions(-) diff --git a/src/search.h b/src/search.h index 1def4d6..b700ed5 100644 --- a/src/search.h +++ b/src/search.h @@ -50,7 +50,7 @@ extern void wordinit (void); extern kwset_t kwsinit (bool); extern size_t wordchars_size (char const *, char const *); extern size_t wordchar_next (char const *, char const *); -extern bool wordchar_prev (char const *, char const *, char const *); +extern size_t wordchar_prev (char const *, char const *, char const *); extern ptrdiff_t mb_goback (char const **, char const *, char const *); =20 /* dfasearch.c */ diff --git a/src/searchutils.c b/src/searchutils.c index 6f6ae0b..3ba3cdb 100644 --- a/src/searchutils.c +++ b/src/searchutils.c @@ -22,14 +22,9 @@ #define SYSTEM_INLINE _GL_EXTERN_INLINE #include "search.h" =20 -#include - -/* For each byte B, word_start[B] is 1 if B is a single-byte character - that is a word constituent, 0 if B cannot start a word constituent, - and -1 if B might be or might not be the start of a word - constituent. */ -static wint_t word_start[NCHAR]; -verify (WEOF !=3D 0 && WEOF !=3D 1); +/* For each byte B, sbwordchar[B] is true if B is a single-byte + character that is a word constituent, and is false otherwise. */ +static bool sbwordchar[NCHAR]; =20 /* Whether -w considers WC to be a word constituent. */ static bool @@ -42,8 +37,7 @@ void wordinit (void) { for (int i =3D 0; i < NCHAR; i++) - word_start[i] =3D (localeinfo.sbclen[i] =3D=3D -2 ? WEOF - : wordchar (localeinfo.sbctowc[i])); + sbwordchar[i] =3D wordchar (localeinfo.sbctowc[i]); } =20 kwset_t @@ -94,23 +88,46 @@ mb_goback (char const **mb_start, char const *cur, ch= ar const *end) { const char *p =3D *mb_start; const char *p0 =3D p; - mbstate_t cur_state; =20 - memset (&cur_state, 0, sizeof cur_state); + if (cur <=3D p) + return cur - p; =20 - while (p < cur) + if (localeinfo.using_utf8) { - size_t clen =3D mb_clen (p, end - p, &cur_state); - - if ((size_t) -2 <=3D clen) + p =3D cur; + + if (cur < end && (*cur & 0xc0) =3D=3D 0x80) + for (int i =3D 1; i <=3D 3; i++) + if ((cur[-i] & 0xc0) !=3D 0x80) + { + mbstate_t mbs =3D { 0 }; + size_t clen =3D mb_clen (cur - i, end - (cur - i), &mbs); + if (i < clen && clen < (size_t) -2) + { + p0 =3D cur - i; + p =3D p0 + clen; + } + break; + } + } + else + { + mbstate_t mbs =3D { 0 }; + do { - /* An invalid sequence, or a truncated multibyte character. - Treat it as a single byte character. */ - clen =3D 1; - memset (&cur_state, 0, sizeof cur_state); + size_t clen =3D mb_clen (p, end - p, &mbs); + + if ((size_t) -2 <=3D clen) + { + /* An invalid sequence, or a truncated multibyte character= =2E + Treat it as a single byte character. */ + clen =3D 1; + memset (&mbs, 0, sizeof mbs); + } + p0 =3D p; + p +=3D clen; } - p0 =3D p; - p +=3D clen; + while (p < cur); } =20 *mb_start =3D p; @@ -127,11 +144,11 @@ wordchars_count (char const *buf, char const *end, = bool countall) mbstate_t mbs =3D { 0 }; while (n < end - buf) { - wint_t ws =3D word_start[to_uchar (buf[n])]; - if (ws =3D=3D 0) - break; - else if (ws =3D=3D 1) + unsigned char b =3D buf[n]; + if (sbwordchar[b]) n++; + else if (localeinfo.sbclen[b] !=3D -2) + break; else { wchar_t wc =3D 0; @@ -163,19 +180,19 @@ wordchar_next (char const *buf, char const *end) return wordchars_count (buf, end, false); } =20 -/* In the buffer BUF, return true if the character whose encoding +/* In the buffer BUF, return nonzero if the character whose encoding contains the byte before CUR is a word constituent. The buffer ends at END. */ -bool +size_t wordchar_prev (char const *buf, char const *cur, char const *end) { if (buf =3D=3D cur) - return false; - cur--; - wint_t ws =3D word_start[to_uchar (*cur)]; - if (! localeinfo.multibyte) - return ws =3D=3D 1; + return 0; + unsigned char b =3D *--cur; + if (! localeinfo.multibyte + || (localeinfo.using_utf8 && localeinfo.sbclen[b] !=3D -2)) + return sbwordchar[b]; char const *p =3D buf; cur -=3D mb_goback (&p, cur, end); - return wordchar_next (cur, end) !=3D 0; + return wordchar_next (cur, end); } --=20 2.7.4 --------------6B4F36F83DA3EA4416933FAA Content-Type: text/x-diff; name="0008-grep-fix-comment-in-searchutils.c.patch" Content-Transfer-Encoding: quoted-printable Content-Disposition: attachment; filename="0008-grep-fix-comment-in-searchutils.c.patch" =46rom 2353c63efb1c471c752a1b6575f81dcafb67684b Mon Sep 17 00:00:00 2001 From: Paul Eggert Date: Fri, 23 Dec 2016 17:29:54 -0800 Subject: [PATCH 8/8] grep: fix comment in searchutils.c --- src/searchutils.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/src/searchutils.c b/src/searchutils.c index 3ba3cdb..1552ed7 100644 --- a/src/searchutils.c +++ b/src/searchutils.c @@ -77,7 +77,7 @@ kwsinit (bool mb_trans) start of a multibyte character or is an error-encoding byte. The buffer ends at END (i.e., one past the address of the buffer's last byte). If CUR is already at a boundary, return 0. If *MB_START is - greater than or equal to CUR, return the negative value CUR - *MB_STA= RT. + greater than CUR, return the negative value CUR - *MB_START. =20 When returning zero, set *MB_START to CUR. When returning a positive value, set *MB_START to the next boundary after CUR, or to --=20 2.7.4 --------------6B4F36F83DA3EA4416933FAA-- From debbugs-submit-bounces@debbugs.gnu.org Mon Dec 26 07:07:05 2016 Received: (at 22239) by debbugs.gnu.org; 26 Dec 2016 12:07:05 +0000 Received: from localhost ([127.0.0.1]:55358 helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1cLU3V-0005kW-Jv for submit@debbugs.gnu.org; Mon, 26 Dec 2016 07:07:05 -0500 Received: from mailgw01.kcn.ne.jp ([61.86.7.208]:55883) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1cLU3S-0005je-JZ for 22239@debbugs.gnu.org; Mon, 26 Dec 2016 07:07:03 -0500 Received: from mxs01-s (mailgw1.kcn.ne.jp [61.86.15.233]) by mailgw01.kcn.ne.jp (Postfix) with ESMTP id 727814A090F for <22239@debbugs.gnu.org>; Mon, 26 Dec 2016 21:06:55 +0900 (JST) X-matriXscan-loop-detect: 254e150c4fb1c825d91bf0f8eba1195456a39152 Received: from mail01.kcn.ne.jp ([61.86.6.180]) by mxs01-s with ESMTP; Mon, 26 Dec 2016 21:06:54 +0900 (JST) Received: from [10.120.1.67] (i118-21-128-66.s30.a048.ap.plala.or.jp [118.21.128.66]) by mail01.kcn.ne.jp (Postfix) with ESMTPA id C96BF5A82E8; Mon, 26 Dec 2016 21:06:53 +0900 (JST) Date: Mon, 26 Dec 2016 21:06:55 +0900 From: Norihiro Tanaka To: Paul Eggert Subject: Re: bug#22239: bug#22357: grep -f not only huge memory usage, but also huge time cost In-Reply-To: <820176a7-a1e4-7b3f-f140-7d2329d25d63@cs.ucla.edu> References: <20161222080441.6C45.27F6AC2D@kcn.ne.jp> <820176a7-a1e4-7b3f-f140-7d2329d25d63@cs.ucla.edu> Message-Id: <20161226210654.554A.27F6AC2D@kcn.ne.jp> MIME-Version: 1.0 Content-Type: text/plain; charset="US-ASCII" Content-Transfer-Encoding: 7bit X-Mailer: Becky! ver. 2.65.07 [ja] X-matriXscan-Sophos-AV: Clean X-matriXscan-Action: Approve X-matriXscan: Uncategorized X-Spam-Score: -3.1 (---) X-Debbugs-Envelope-To: 22239 Cc: 21763-done@debbugs.gnu.org, 22357-done@debbugs.gnu.org, sur-behoffski , "L.A. Walsh" , JQK , 22239@debbugs.gnu.org, Trevor Cordes , Bruno Haible , Ond?ej Cifka X-BeenThere: debbugs-submit@debbugs.gnu.org X-Mailman-Version: 2.1.18 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: debbugs-submit-bounces@debbugs.gnu.org Sender: "Debbugs-submit" X-Spam-Score: -3.1 (---) On Fri, 23 Dec 2016 17:38:42 -0800 Paul Eggert wrote: > No. Thanks, I hadn't considered that possibility. I looked into the > slowdown and installed the attached patches, which cause 'grep' to > run about as fast on this test case as grep 2.25 (though not as fast > as grep 2.26). The main fix is in patch 5. On my platform: Hmm, how about the following test cases, although it is extreame? $ cat pat 0 00 0 00 00 0 00 00 00 0 00 00 00 00 0 00 00 00 00 00 0 00 00 00 00 00 00 0 00 00 00 00 00 00 00 0 00 00 00 00 00 00 00 00 0 00 00 00 00 00 00 00 00 00 0 00 00 00 00 00 00 00 00 00 00 0 00 00 00 00 00 00 00 00 00 00 00 0 00 00 00 00 00 00 00 00 00 00 00 00 0 00 00 00 00 00 00 00 00 00 00 00 00 00 0 $ yes '00 00 00 00 00 00 00 00 00 00 00 00 00' | head -1000000 >inp $ time -p env LC_ALL=C src/grep -w -f pat inp From debbugs-submit-bounces@debbugs.gnu.org Mon Dec 26 08:01:45 2016 Received: (at 22239) by debbugs.gnu.org; 26 Dec 2016 13:01:45 +0000 Received: from localhost ([127.0.0.1]:55389 helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1cLUuP-0000TH-GO for submit@debbugs.gnu.org; Mon, 26 Dec 2016 08:01:45 -0500 Received: from mail-it0-f67.google.com ([209.85.214.67]:33337) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1cLUuM-0000Sp-Rl; Mon, 26 Dec 2016 08:01:43 -0500 Received: by mail-it0-f67.google.com with SMTP id c20so30789116itb.0; Mon, 26 Dec 2016 05:01:42 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20161025; h=mime-version:sender:in-reply-to:references:from:date:message-id :subject:to:cc; bh=guQGwp8QkhhQv1dXrud9cPRky2JCpUA4K9FhWsSBQwE=; b=YWTxxgV8zuyXZQ2ttNlXh4caomFJ7XoJPpnx8DMkAzCfWjg8hx9ObCkSw50HFmAdKU i3/6siocvaRxNJTgzJcct+M1zKB4+weVG/lD6XM32a3aAcHMyYEZkbsSP4Skjll0Ya0c nFoUZwrbVmn1/+gMvQ4Iu+uOzduVSEp3crgBaOSszlFwMHcSAFQlrlLTfAv9wzlh7kde Eis1H2ASkTnGyr4Fzm0hcu7dqJI4sxmBdkYXua0ePZL9WQPMAY25EWOwBChCrtUdaxWZ UitJA47nRFptP8otqNj3cc5Y8zzEDV6ZQehRhrO5QN9bF9NiV+XsnmRlBIexyWVF9V2I FufQ== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:mime-version:sender:in-reply-to:references:from :date:message-id:subject:to:cc; bh=guQGwp8QkhhQv1dXrud9cPRky2JCpUA4K9FhWsSBQwE=; b=F7zBe7zWXBdwk0QTKIFZRYIF4d7zljFJVwqnRvVQpgbhlH3xtxCkG+fl5++XymsQW+ hjhfwRG5wpWMLUDgu7tZt8IcC89ZCeZfJ68JoEbBPLVMDdn/tEw/A3AsU9hhv4/m2s8Y sTA96+33y4XzOt60d+NPwv6tbjrRRqSRrDNbqTuvB+z1AZTMJZeDhRQEeR9hX4Cr813J LCI+dch1ZZ0UWXoJxfZTZWMrRnXpWiTphcMDjZhVHMzKoi4mX99Cv3Og7Zfjp3I0gVQv N/5UDyHu+olh3DV3CS8orld2N/fbN+C8BSQfYaDh/MPUcbuPN+yv279+zB8yFfYQnt9U /lbA== X-Gm-Message-State: AIkVDXIZ0bFU9AxAcpKFcnMQ9hDImtMtq6iO/wLlHSfFJh4Eo60T9BD/3O1+cdSDXnxpbFYWUtJXHGXUPF1HGg== X-Received: by 10.36.17.133 with SMTP id 127mr25564240itf.31.1482757296852; Mon, 26 Dec 2016 05:01:36 -0800 (PST) MIME-Version: 1.0 Received: by 10.107.191.5 with HTTP; Mon, 26 Dec 2016 05:01:16 -0800 (PST) In-Reply-To: <20161226210654.554A.27F6AC2D@kcn.ne.jp> References: <20161222080441.6C45.27F6AC2D@kcn.ne.jp> <820176a7-a1e4-7b3f-f140-7d2329d25d63@cs.ucla.edu> <20161226210654.554A.27F6AC2D@kcn.ne.jp> From: Jim Meyering Date: Mon, 26 Dec 2016 14:01:16 +0100 X-Google-Sender-Auth: ARghcFI6yA-cCLNGhp7huJd2GoM Message-ID: Subject: Re: bug#22239: bug#22357: grep -f not only huge memory usage, but also huge time cost To: Norihiro Tanaka Content-Type: text/plain; charset=UTF-8 X-Spam-Score: 3.5 (+++) X-Spam-Report: Spam detection software, running on the system "debbugs.gnu.org", has NOT identified this incoming email as spam. The original message has been attached to this so you can view it or label similar future email. If you have any questions, see the administrator of that system for details. Content preview: On Mon, Dec 26, 2016 at 1:06 PM, Norihiro Tanaka wrote: > > On Fri, 23 Dec 2016 17:38:42 -0800 > Paul Eggert wrote: > >> No. Thanks, I hadn't considered that possibility. I looked into the >> slowdown and installed the attached patches, which cause 'grep' to >> run about as fast on this test case as grep 2.25 (though not as fast >> as grep 2.26). The main fix is in patch 5. On my platform: > > Hmm, how about the following test cases, although it is extreame? > > $ cat pat > 0 > 00 0 > 00 00 0 > 00 00 00 0 > 00 00 00 00 0 > 00 00 00 00 00 0 > 00 00 00 00 00 00 0 > 00 00 00 00 00 00 00 0 > 00 00 00 00 00 00 00 00 0 > 00 00 00 00 00 00 00 00 00 0 > 00 00 00 00 00 00 00 00 00 00 0 > 00 00 00 00 00 00 00 00 00 00 00 0 > 00 00 00 00 00 00 00 00 00 00 00 00 0 > 00 00 00 00 00 00 00 00 00 00 00 00 00 0 > $ yes '00 00 00 00 00 00 00 00 00 00 00 00 00' | > head -1000000 >inp > $ time -p env LC_ALL=C src/grep -w -f pat inp [...] Content analysis details: (3.5 points, 10.0 required) pts rule name description ---- ---------------------- -------------------------------------------------- 3.0 MANY_TO_CC Sent to 10+ recipients 0.5 RCVD_IN_SORBS_SPAM RBL: SORBS: sender is a spam source [209.85.214.67 listed in dnsbl.sorbs.net] -0.0 RCVD_IN_DNSWL_NONE RBL: Sender listed at http://www.dnswl.org/, no trust [209.85.214.67 listed in list.dnswl.org] -0.0 SPF_PASS SPF: sender matches SPF record 0.0 FREEMAIL_FROM Sender email is commonly abused enduser mail provider (meyering[at]gmail.com) 0.0 HEADER_FROM_DIFFERENT_DOMAINS From and EnvelopeFrom 2nd level mail domains are different -0.0 RCVD_IN_MSPIKE_H3 RBL: Good reputation (+3) [209.85.214.67 listed in wl.mailspike.net] -0.0 RCVD_IN_MSPIKE_WL Mailspike good senders 0.0 T_DKIM_INVALID DKIM-Signature header exists but is not valid 0.0 FREEMAIL_FORGED_FROMDOMAIN 2nd level domains in From and EnvelopeFrom freemail headers are different X-Debbugs-Envelope-To: 22239 Cc: Paul Eggert , 21763-done@debbugs.gnu.org, 22357-done@debbugs.gnu.org, sur-behoffski , "L.A. Walsh" , JQK , 22239@debbugs.gnu.org, Trevor Cordes , Bruno Haible , Ond?ej Cifka X-BeenThere: debbugs-submit@debbugs.gnu.org X-Mailman-Version: 2.1.18 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: debbugs-submit-bounces@debbugs.gnu.org Sender: "Debbugs-submit" X-Spam-Score: 3.5 (+++) X-Spam-Report: Spam detection software, running on the system "debbugs.gnu.org", has NOT identified this incoming email as spam. The original message has been attached to this so you can view it or label similar future email. If you have any questions, see the administrator of that system for details. Content preview: On Mon, Dec 26, 2016 at 1:06 PM, Norihiro Tanaka wrote: > > On Fri, 23 Dec 2016 17:38:42 -0800 > Paul Eggert wrote: > >> No. Thanks, I hadn't considered that possibility. I looked into the >> slowdown and installed the attached patches, which cause 'grep' to >> run about as fast on this test case as grep 2.25 (though not as fast >> as grep 2.26). The main fix is in patch 5. On my platform: > > Hmm, how about the following test cases, although it is extreame? > > $ cat pat > 0 > 00 0 > 00 00 0 > 00 00 00 0 > 00 00 00 00 0 > 00 00 00 00 00 0 > 00 00 00 00 00 00 0 > 00 00 00 00 00 00 00 0 > 00 00 00 00 00 00 00 00 0 > 00 00 00 00 00 00 00 00 00 0 > 00 00 00 00 00 00 00 00 00 00 0 > 00 00 00 00 00 00 00 00 00 00 00 0 > 00 00 00 00 00 00 00 00 00 00 00 00 0 > 00 00 00 00 00 00 00 00 00 00 00 00 00 0 > $ yes '00 00 00 00 00 00 00 00 00 00 00 00 00' | > head -1000000 >inp > $ time -p env LC_ALL=C src/grep -w -f pat inp [...] Content analysis details: (3.5 points, 10.0 required) pts rule name description ---- ---------------------- -------------------------------------------------- 0.5 RCVD_IN_SORBS_SPAM RBL: SORBS: sender is a spam source [209.85.214.67 listed in dnsbl.sorbs.net] -0.0 RCVD_IN_MSPIKE_H3 RBL: Good reputation (+3) [209.85.214.67 listed in wl.mailspike.net] -0.0 RCVD_IN_DNSWL_NONE RBL: Sender listed at http://www.dnswl.org/, no trust [209.85.214.67 listed in list.dnswl.org] 3.0 MANY_TO_CC Sent to 10+ recipients -0.0 SPF_PASS SPF: sender matches SPF record 0.0 FREEMAIL_FROM Sender email is commonly abused enduser mail provider (meyering[at]gmail.com) 0.0 HEADER_FROM_DIFFERENT_DOMAINS From and EnvelopeFrom 2nd level mail domains are different -0.0 RCVD_IN_MSPIKE_WL Mailspike good senders 0.0 T_DKIM_INVALID DKIM-Signature header exists but is not valid 0.0 FREEMAIL_FORGED_FROMDOMAIN 2nd level domains in From and EnvelopeFrom freemail headers are different On Mon, Dec 26, 2016 at 1:06 PM, Norihiro Tanaka wrote: > > On Fri, 23 Dec 2016 17:38:42 -0800 > Paul Eggert wrote: > >> No. Thanks, I hadn't considered that possibility. I looked into the >> slowdown and installed the attached patches, which cause 'grep' to >> run about as fast on this test case as grep 2.25 (though not as fast >> as grep 2.26). The main fix is in patch 5. On my platform: > > Hmm, how about the following test cases, although it is extreame? > > $ cat pat > 0 > 00 0 > 00 00 0 > 00 00 00 0 > 00 00 00 00 0 > 00 00 00 00 00 0 > 00 00 00 00 00 00 0 > 00 00 00 00 00 00 00 0 > 00 00 00 00 00 00 00 00 0 > 00 00 00 00 00 00 00 00 00 0 > 00 00 00 00 00 00 00 00 00 00 0 > 00 00 00 00 00 00 00 00 00 00 00 0 > 00 00 00 00 00 00 00 00 00 00 00 00 0 > 00 00 00 00 00 00 00 00 00 00 00 00 00 0 > $ yes '00 00 00 00 00 00 00 00 00 00 00 00 00' | > head -1000000 >inp > $ time -p env LC_ALL=C src/grep -w -f pat inp That took 7 seconds for me. Here is one that is O(N^2) in the length of the literal search string: (the search strings are sequences of '0's, with lengths from 10k.. 320k): $ n=10000; for i in $(seq 6); do env time -f "$n %e" src/grep -f <(printf %0${n}d 0) <<<1; ((n *= 2)); done 10000 0.44 20000 1.44 40000 5.41 80000 21.27 160000 97.27 320000 426.72 From debbugs-submit-bounces@debbugs.gnu.org Mon Dec 26 15:07:58 2016 Received: (at 22239) by debbugs.gnu.org; 26 Dec 2016 20:07:58 +0000 Received: from localhost ([127.0.0.1]:55969 helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1cLbYs-0006kU-Jt for submit@debbugs.gnu.org; Mon, 26 Dec 2016 15:07:58 -0500 Received: from zimbra.cs.ucla.edu ([131.179.128.68]:54010) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1cLbYq-0006k8-Ll; Mon, 26 Dec 2016 15:07:57 -0500 Received: from localhost (localhost [127.0.0.1]) by zimbra.cs.ucla.edu (Postfix) with ESMTP id ACAE9160079; Mon, 26 Dec 2016 12:07:50 -0800 (PST) Received: from zimbra.cs.ucla.edu ([127.0.0.1]) by localhost (zimbra.cs.ucla.edu [127.0.0.1]) (amavisd-new, port 10032) with ESMTP id ekOKOZrEI9Td; Mon, 26 Dec 2016 12:07:49 -0800 (PST) Received: from localhost (localhost [127.0.0.1]) by zimbra.cs.ucla.edu (Postfix) with ESMTP id C4C2E16007E; Mon, 26 Dec 2016 12:07:49 -0800 (PST) X-Virus-Scanned: amavisd-new at zimbra.cs.ucla.edu Received: from zimbra.cs.ucla.edu ([127.0.0.1]) by localhost (zimbra.cs.ucla.edu [127.0.0.1]) (amavisd-new, port 10026) with ESMTP id GMSYi0oqeaut; Mon, 26 Dec 2016 12:07:49 -0800 (PST) Received: from [192.168.1.9] (unknown [47.153.178.162]) by zimbra.cs.ucla.edu (Postfix) with ESMTPSA id 90DFD160079; Mon, 26 Dec 2016 12:07:49 -0800 (PST) Subject: Re: bug#21763: bug#22239: bug#22357: grep -f not only huge memory usage, but also huge time cost To: Norihiro Tanaka References: <20161222080441.6C45.27F6AC2D@kcn.ne.jp> <820176a7-a1e4-7b3f-f140-7d2329d25d63@cs.ucla.edu> <20161226210654.554A.27F6AC2D@kcn.ne.jp> From: Paul Eggert Organization: UCLA Computer Science Department Message-ID: <9236dc72-5ec4-20ed-56d5-454876c82bec@cs.ucla.edu> Date: Mon, 26 Dec 2016 12:07:49 -0800 User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:45.0) Gecko/20100101 Thunderbird/45.5.1 MIME-Version: 1.0 In-Reply-To: <20161226210654.554A.27F6AC2D@kcn.ne.jp> Content-Type: multipart/mixed; boundary="------------4C2E43CAACF24E1AF65CFC08" X-Spam-Score: -3.1 (---) X-Debbugs-Envelope-To: 22239 Cc: 21763-done@debbugs.gnu.org, 22357-done@debbugs.gnu.org, sur-behoffski , "L.A. Walsh" , JQK , 22239@debbugs.gnu.org, Trevor Cordes , Bruno Haible , Ond?ej Cifka X-BeenThere: debbugs-submit@debbugs.gnu.org X-Mailman-Version: 2.1.18 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: debbugs-submit-bounces@debbugs.gnu.org Sender: "Debbugs-submit" X-Spam-Score: -3.1 (---) This is a multi-part message in MIME format. --------------4C2E43CAACF24E1AF65CFC08 Content-Type: text/plain; charset=utf-8; format=flowed Content-Transfer-Encoding: quoted-printable Norihiro Tanaka wrote: > Hmm, how about the following test cases, although it is extreame? I don't think we need to worry about performance for the case when -w is = given,=20 and a pattern matches data that contains non-word characters. In practice= , such=20 cases are rare. I expect that most users would be surprised that -w can m= atch=20 non-word characters, and that users wouldn't object to -w rejecting such = matches=20 (if this wouldn't hurt performance significantly). While looking into this I did find a very small performance tweak for the= test=20 case, and installed the attached. --------------4C2E43CAACF24E1AF65CFC08 Content-Type: text/x-diff; name="0001-grep-minor-performance-tweak-for-pure-functions.patch" Content-Transfer-Encoding: quoted-printable Content-Disposition: attachment; filename*0="0001-grep-minor-performance-tweak-for-pure-functions.patch" =46rom 7ad36793e6bc43b98a37b1512b057a7a03ed10b2 Mon Sep 17 00:00:00 2001 From: Paul Eggert Date: Mon, 26 Dec 2016 09:42:29 -0800 Subject: [PATCH] grep: minor performance tweak for pure functions * src/search.h (wordchars_size, wordchar_next, wordchar_prev): Declare to be pure. --- src/search.h | 7 ++++--- 1 file changed, 4 insertions(+), 3 deletions(-) diff --git a/src/search.h b/src/search.h index 469fc9c..1aa2f94 100644 --- a/src/search.h +++ b/src/search.h @@ -48,9 +48,10 @@ typedef signed char mb_len_map_t; /* searchutils.c */ extern void wordinit (void); extern kwset_t kwsinit (bool); -extern size_t wordchars_size (char const *, char const *); -extern size_t wordchar_next (char const *, char const *); -extern size_t wordchar_prev (char const *, char const *, char const *); +extern size_t wordchars_size (char const *, char const *) _GL_ATTRIBUTE_= PURE; +extern size_t wordchar_next (char const *, char const *) _GL_ATTRIBUTE_P= URE; +extern size_t wordchar_prev (char const *, char const *, char const *) + _GL_ATTRIBUTE_PURE; extern ptrdiff_t mb_goback (char const **, char const *, char const *); =20 /* dfasearch.c */ --=20 2.7.4 --------------4C2E43CAACF24E1AF65CFC08-- From debbugs-submit-bounces@debbugs.gnu.org Tue Dec 27 19:21:07 2016 Received: (at 22239) by debbugs.gnu.org; 28 Dec 2016 00:21:07 +0000 Received: from localhost ([127.0.0.1]:57399 helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1cM1zP-0001oE-FC for submit@debbugs.gnu.org; Tue, 27 Dec 2016 19:21:07 -0500 Received: from mailgw01.kcn.ne.jp ([61.86.7.208]:53008) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1cM1zO-0001o2-2A for 22239@debbugs.gnu.org; Tue, 27 Dec 2016 19:21:07 -0500 Received: from mxs02-s (mailgw2.kcn.ne.jp [61.86.15.234]) by mailgw01.kcn.ne.jp (Postfix) with ESMTP id D817A4A08CA for <22239@debbugs.gnu.org>; Wed, 28 Dec 2016 09:21:02 +0900 (JST) X-matriXscan-loop-detect: 812ce1e62af099b3b94bb2ebeb20def189b2a5c4 Received: from mail06.kcn.ne.jp ([61.86.6.185]) by mxs02-s with ESMTP; Wed, 28 Dec 2016 09:21:01 +0900 (JST) Received: from [10.120.1.58] (i118-21-128-66.s30.a048.ap.plala.or.jp [118.21.128.66]) by mail06.kcn.ne.jp (Postfix) with ESMTPA id 2B3A91BF0092; Wed, 28 Dec 2016 09:21:01 +0900 (JST) Date: Wed, 28 Dec 2016 09:21:02 +0900 From: Norihiro Tanaka To: Paul Eggert Subject: Re: bug#21763: bug#22239: bug#22357: grep -f not only huge memory usage, but also huge time cost In-Reply-To: <9236dc72-5ec4-20ed-56d5-454876c82bec@cs.ucla.edu> References: <20161226210654.554A.27F6AC2D@kcn.ne.jp> <9236dc72-5ec4-20ed-56d5-454876c82bec@cs.ucla.edu> Message-Id: <20161228091446.CE2D.27F6AC2D@kcn.ne.jp> MIME-Version: 1.0 Content-Type: multipart/mixed; boundary="------_5863030800000000CE28_MULTIPART_MIXED_" Content-Transfer-Encoding: 7bit X-Mailer: Becky! ver. 2.65.07 [ja] X-matriXscan-Sophos-AV: Clean X-matriXscan-Action: Approve X-matriXscan: Uncategorized X-Spam-Score: -3.1 (---) X-Debbugs-Envelope-To: 22239 Cc: 21763-done@debbugs.gnu.org, 22357-done@debbugs.gnu.org, sur-behoffski , "L.A. Walsh" , JQK , 22239@debbugs.gnu.org, Trevor Cordes , Bruno Haible , Ond?ej Cifka X-BeenThere: debbugs-submit@debbugs.gnu.org X-Mailman-Version: 2.1.18 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: debbugs-submit-bounces@debbugs.gnu.org Sender: "Debbugs-submit" X-Spam-Score: -3.1 (---) --------_5863030800000000CE28_MULTIPART_MIXED_ Content-Type: text/plain; charset="US-ASCII" Content-Transfer-Encoding: 7bit On Mon, 26 Dec 2016 12:07:49 -0800 Paul Eggert wrote: > Norihiro Tanaka wrote: > > Hmm, how about the following test cases, although it is extreame? > > I don't think we need to worry about performance for the case when -w > is given, and a pattern matches data that contains non-word > characters. In practice, such cases are rare. I expect that most > users would be surprised that -w can match non-word characters, and > that users wouldn't object to -w rejecting such matches (if this > wouldn't hurt performance significantly). > > While looking into this I did find a very small performance tweak for > the test case, and installed the attached. Thanks. BTW, with multiple patterns in current master, former uses fgrep matcher, and later users grep matcher. I think that it is not reasonable. env LC_ALL=C grep -w -f pat inp env LC_ALL=C grep -F -w -f pat inp So I wrote the patch to use fgrep matcher for both. --------_5863030800000000CE28_MULTIPART_MIXED_ Content-Type: text/plain; charset="US-ASCII"; name="0001-grep-imorove-performance-with-multiple-patterns.patch" Content-Disposition: attachment; filename="0001-grep-imorove-performance-with-multiple-patterns.patch" Content-Transfer-Encoding: base64 RnJvbSAwOGQ0MjZkOGEwYmY4NjgzYjBlMzRlYWQyZmQzNTYxNjcxNDE0NzM1IE1vbiBTZXAgMTcg MDA6MDA6MDAgMjAwMQpGcm9tOiBOb3JpaGlybyBUYW5ha2EgPG5vcml0bmtAa2NuLm5lLmpwPgpE YXRlOiBXZWQsIDI4IERlYyAyMDE2IDA4OjU3OjU0ICswOTAwClN1YmplY3Q6IFtQQVRDSF0gZ3Jl cDogaW1vcm92ZSBwZXJmb3JtYW5jZSB3aXRoIG11bHRpcGxlIHBhdHRlcm5zCgoqIHNyYy9ncmVw LmMgKG1haW4pOiBBdm9pZCBmZ3JlcC10by1ncmVwIGNvbnZlcnNpb24gZm9yIHdvcmQgbWF0Y2hp bmcKd2l0aCBtdWx0aXBsZSBwYXR0ZXJucyBpbiBzaW5nbGUgYnl0ZSBsb2NhbGVzLgotLS0KIHNy Yy9ncmVwLmMgfCAgICAyICstCiAxIGZpbGVzIGNoYW5nZWQsIDEgaW5zZXJ0aW9ucygrKSwgMSBk ZWxldGlvbnMoLSkKCmRpZmYgLS1naXQgYS9zcmMvZ3JlcC5jIGIvc3JjL2dyZXAuYwppbmRleCBh ZWJhYjIwLi44Y2Y1MjJjIDEwMDY0NAotLS0gYS9zcmMvZ3JlcC5jCisrKyBiL3NyYy9ncmVwLmMK QEAgLTI4NjIsNyArMjg2Miw3IEBAIG1haW4gKGludCBhcmdjLCBjaGFyICoqYXJndikKICAgICAg Zm9yIC1pRi4gICovCiAgIGlmIChtYXRjaGVyID09IEZfTUFUQ0hFUl9JTkRFWAogICAgICAgJiYg KCEgbG9jYWxlaW5mby5tdWx0aWJ5dGUKLSAgICAgICAgICA/IG1hdGNoX3dvcmRzCisgICAgICAg ICAgPyAobl9wYXR0ZXJucyA9PSAxICYmIG1hdGNoX3dvcmRzKQogICAgICAgICAgIDogKGNvbnRh aW5zX2VuY29kaW5nX2Vycm9yIChrZXlzLCBrZXljYykKICAgICAgICAgICAgICB8fCAobWF0Y2hf aWNhc2UgJiYgIWZncmVwX2ljYXNlX2F2YWlsYWJsZSAoa2V5cywga2V5Y2MpKSkpKQogICAgIHsK LS0gCjEuNy4xCgo= --------_5863030800000000CE28_MULTIPART_MIXED_-- From debbugs-submit-bounces@debbugs.gnu.org Wed Dec 28 01:37:34 2016 Received: (at 22239) by debbugs.gnu.org; 28 Dec 2016 06:37:34 +0000 Received: from localhost ([127.0.0.1]:57562 helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1cM7ri-0002nZ-2D for submit@debbugs.gnu.org; Wed, 28 Dec 2016 01:37:34 -0500 Received: from zimbra.cs.ucla.edu ([131.179.128.68]:60530) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1cM7rg-0002n9-Hj; Wed, 28 Dec 2016 01:37:33 -0500 Received: from localhost (localhost [127.0.0.1]) by zimbra.cs.ucla.edu (Postfix) with ESMTP id 6B55D1600A5; Tue, 27 Dec 2016 22:37:26 -0800 (PST) Received: from zimbra.cs.ucla.edu ([127.0.0.1]) by localhost (zimbra.cs.ucla.edu [127.0.0.1]) (amavisd-new, port 10032) with ESMTP id BWnIFIAgBJi5; Tue, 27 Dec 2016 22:37:25 -0800 (PST) Received: from localhost (localhost [127.0.0.1]) by zimbra.cs.ucla.edu (Postfix) with ESMTP id B442D1600E8; Tue, 27 Dec 2016 22:37:25 -0800 (PST) X-Virus-Scanned: amavisd-new at zimbra.cs.ucla.edu Received: from zimbra.cs.ucla.edu ([127.0.0.1]) by localhost (zimbra.cs.ucla.edu [127.0.0.1]) (amavisd-new, port 10026) with ESMTP id 7fO7XX8cJpoy; Tue, 27 Dec 2016 22:37:25 -0800 (PST) Received: from [192.168.1.9] (unknown [47.153.178.162]) by zimbra.cs.ucla.edu (Postfix) with ESMTPSA id 842F01600A5; Tue, 27 Dec 2016 22:37:25 -0800 (PST) Subject: Re: bug#21763: bug#22239: bug#22357: grep -f not only huge memory usage, but also huge time cost To: Norihiro Tanaka References: <20161226210654.554A.27F6AC2D@kcn.ne.jp> <9236dc72-5ec4-20ed-56d5-454876c82bec@cs.ucla.edu> <20161228091446.CE2D.27F6AC2D@kcn.ne.jp> From: Paul Eggert Organization: UCLA Computer Science Department Message-ID: <7ab75385-62ba-8834-70df-f94784ab0526@cs.ucla.edu> Date: Tue, 27 Dec 2016 22:37:25 -0800 User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:45.0) Gecko/20100101 Thunderbird/45.5.1 MIME-Version: 1.0 In-Reply-To: <20161228091446.CE2D.27F6AC2D@kcn.ne.jp> Content-Type: text/plain; charset=utf-8; format=flowed Content-Transfer-Encoding: 7bit X-Spam-Score: -3.1 (---) X-Debbugs-Envelope-To: 22239 Cc: 21763-done@debbugs.gnu.org, 22357-done@debbugs.gnu.org, sur-behoffski , "L.A. Walsh" , JQK , 22239@debbugs.gnu.org, Trevor Cordes , Bruno Haible , Ond?ej Cifka X-BeenThere: debbugs-submit@debbugs.gnu.org X-Mailman-Version: 2.1.18 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: debbugs-submit-bounces@debbugs.gnu.org Sender: "Debbugs-submit" X-Spam-Score: -3.1 (---) Norihiro Tanaka wrote: > So I wrote the patch to use fgrep matcher for both. Thanks, I installed that after tweaking the commit message and omitting unnecessary parens. From debbugs-submit-bounces@debbugs.gnu.org Wed Dec 28 09:33:44 2016 Received: (at 22239) by debbugs.gnu.org; 28 Dec 2016 14:33:44 +0000 Received: from localhost ([127.0.0.1]:57688 helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1cMFIW-0007Z3-7p for submit@debbugs.gnu.org; Wed, 28 Dec 2016 09:33:44 -0500 Received: from mailgw06.kcn.ne.jp ([61.86.7.213]:33556) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1cMFIU-0007YZ-DV for 22239@debbugs.gnu.org; Wed, 28 Dec 2016 09:33:43 -0500 Received: from mxs02-s (mailgw2.kcn.ne.jp [61.86.15.234]) by mailgw06.kcn.ne.jp (Postfix) with ESMTP id 2BAEDC8007 for <22239@debbugs.gnu.org>; Wed, 28 Dec 2016 23:33:35 +0900 (JST) X-matriXscan-loop-detect: ef7ef3769c2ce2c32e65267f5bcc2fbac092238c Received: from mail05.kcn.ne.jp ([61.86.6.184]) by mxs02-s with ESMTP; Wed, 28 Dec 2016 23:33:33 +0900 (JST) Received: from [10.120.1.74] (i118-21-128-66.s30.a048.ap.plala.or.jp [118.21.128.66]) by mail05.kcn.ne.jp (Postfix) with ESMTPA id F1EE67D0098; Wed, 28 Dec 2016 23:33:32 +0900 (JST) Date: Wed, 28 Dec 2016 23:33:34 +0900 From: Norihiro Tanaka To: Paul Eggert Subject: Re: bug#21763: bug#22239: bug#22357: grep -f not only huge memory usage, but also huge time cost In-Reply-To: <7ab75385-62ba-8834-70df-f94784ab0526@cs.ucla.edu> References: <20161228091446.CE2D.27F6AC2D@kcn.ne.jp> <7ab75385-62ba-8834-70df-f94784ab0526@cs.ucla.edu> Message-Id: <20161228233333.5DE8.27F6AC2D@kcn.ne.jp> MIME-Version: 1.0 Content-Type: text/plain; charset="US-ASCII" Content-Transfer-Encoding: 7bit X-Mailer: Becky! ver. 2.65.07 [ja] X-matriXscan-Sophos-AV: Clean X-matriXscan-Action: Approve X-matriXscan: Uncategorized X-Spam-Score: -3.1 (---) X-Debbugs-Envelope-To: 22239 Cc: 21763-done@debbugs.gnu.org, 22357-done@debbugs.gnu.org, sur-behoffski , "L.A. Walsh" , JQK , 22239@debbugs.gnu.org, Trevor Cordes , Bruno Haible , Ond?ej Cifka X-BeenThere: debbugs-submit@debbugs.gnu.org X-Mailman-Version: 2.1.18 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: debbugs-submit-bounces@debbugs.gnu.org Sender: "Debbugs-submit" X-Spam-Score: -3.1 (---) On Tue, 27 Dec 2016 22:37:25 -0800 Paul Eggert wrote: > Norihiro Tanaka wrote: > > So I wrote the patch to use fgrep matcher for both. > > Thanks, I installed that after tweaking the commit message and omitting unnecessary parens. Thanks, I confirmed it. From debbugs-submit-bounces@debbugs.gnu.org Sun Jan 08 01:04:09 2017 Received: (at 22239) by debbugs.gnu.org; 8 Jan 2017 06:04:09 +0000 Received: from localhost ([127.0.0.1]:46434 helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1cQ6aO-0003HT-NX for submit@debbugs.gnu.org; Sun, 08 Jan 2017 01:04:09 -0500 Received: from mail-yw0-f182.google.com ([209.85.161.182]:34991) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1cQ6aM-0003Gx-7C for 22239@debbugs.gnu.org; Sun, 08 Jan 2017 01:04:06 -0500 Received: by mail-yw0-f182.google.com with SMTP id v81so300420021ywb.2 for <22239@debbugs.gnu.org>; Sat, 07 Jan 2017 22:04:06 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=ucla-edu.20150623.gappssmtp.com; s=20150623; h=mime-version:in-reply-to:references:from:date:message-id:subject:to; bh=kgUeT5RXyVk2M3G4j7nOi5S3JXP7g3guCs05jih9QJk=; b=dLK5AAsZUivg+GMfbI85TWNYExP/u7geNO8aWsmopA2o3Uxt/kU6Q4o/bVlrjqgfkW O3s1gVRaGa91s5kkh5NA7KORxx6BPUE6momOuG7EiW4o4y0MCTovvo3lNEdwcyUsp8Di JulxLl3xwPl38qiBIoQxwvl09A0j8djkIcbxhc+O7bs7Yy3c9EaT/Ks1xA8GoOwvaICX xBq+lL9uwjnO+rMmKfcrY+wmXnnkYMyFPz02DDDXW4BVc/aKhn1iKb3vzOc9guMzSVjs edk5e6erzCt19+7TtbbMe5jqL9vsOs/P0S4ZUl/qVYnUlDZYQ7xebL3flTpTfOhwr+6x qDXw== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:mime-version:in-reply-to:references:from:date :message-id:subject:to; bh=kgUeT5RXyVk2M3G4j7nOi5S3JXP7g3guCs05jih9QJk=; b=Nx4Yx6VcHAL00hMYIUx7+/Yw6lrDrqc9+TpTIg4PpT94CjDVqE7RtXV0j3UeDbU/SR m88XByJIYpiua9R5vl3s2cFfu37aCZF+D6OK19izcKF93amhXhNY9bwfDU0m4XS49ylw WIPFzEqA4CEyF9PuDSoV/BnzjzmzESnJXAt6I+UBQhhztAdkzyk6qpHyplCA1Sh5XU0y rnyt+UKvKkWfVyhrWqgmvxQYWdw5A7miFLF/haBtAq5jghmt/+/eMMW4wvakCCNW650z iHbWB8phQ2i+WVI+Seb73Xa3iO1IvL2oSghq7s7JxJ38f3gAjK5+F4ALMl8dZuJ+kF90 6HQw== X-Gm-Message-State: AIkVDXIrmouWT4KDPJV+DwTz247mdL3+LqV41zoErXBPww9gz9A8T8VhCV8MNA2Zo06mdrSxPQGAvxh7Iudf+4qO X-Received: by 10.13.253.71 with SMTP id n68mr89550722ywf.229.1483855440738; Sat, 07 Jan 2017 22:04:00 -0800 (PST) MIME-Version: 1.0 Received: by 10.37.206.15 with HTTP; Sat, 7 Jan 2017 22:04:00 -0800 (PST) In-Reply-To: <3a4d1fce-4375-6297-d866-13a656a217f3@cs.ucla.edu> References: <019a8326-023e-1c7d-bdd7-641beef6130b@cs.ucla.edu> <3a4d1fce-4375-6297-d866-13a656a217f3@cs.ucla.edu> From: Aaron Zhou Qian Date: Sat, 7 Jan 2017 22:04:00 -0800 Message-ID: Subject: Re: New Project To: Paul Eggert , 22239@debbugs.gnu.org Content-Type: multipart/alternative; boundary=94eb2c06c19a71ab9905458f04ab X-Spam-Score: 0.5 (/) X-Debbugs-Envelope-To: 22239 X-BeenThere: debbugs-submit@debbugs.gnu.org X-Mailman-Version: 2.1.18 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: debbugs-submit-bounces@debbugs.gnu.org Sender: "Debbugs-submit" X-Spam-Score: 0.5 (/) --94eb2c06c19a71ab9905458f04ab Content-Type: text/plain; charset=UTF-8 On Sat, Jan 7, 2017 at 9:29 AM, Paul Eggert wrote: > > Could you remind us about the latest status of your proposal compared to > Zev Weiss's? Does contain the latest thing > you have? Zev Weiss's latest version is at rep>. Comparing the two was the thing Jim Meyering asked for at < > http://bugs.gnu.org/22239#8>, and you can follow up by sending email to > 22239@debbugs.gnu.org. Yes that github link is the latest version. I haven't made any changes to that since last year September. Basically the main thread traverses the file tree and assign the file to be searched to each thread. There is also a dynamic buffer so that the output is identical to the original grep program. I tested the program on a server. On a directory containing 4 files, grep -r on that directory is 4 times faster. On a directory containing 8 files, grep -r is 6 times faster. On a directory containing 12 files, grep -r is 8.5 times faster. I think using multithreading is essentially different from not using multithreading, and we also don't use multithreading all the time for grep. When we're not using multithreading, i.e. when we pass in other options for grep, more functions would call those functions whose function signatures we changed. This is hard to keep track of, because the program is fairly complicated. If we had overloading in C++ I would overload those functions. But since we don't, I made it very clear in the code which functions are the counterparts of the original versions. I did this to contain any potential problems so that if there are any problems with multithreading it would not affect the sequential program, whereas if we interleave the two scenarios we might lose track of what's going on. At least this is what I initially thought. I saw that there were some recent commits by Zev together with Jim, for example: in commit 9365ed6536d4fabf42ec17fef1bbe5d78884f950 * src/grep.c (compile_fp_t): Now returns an opaque pointer (the compiled pattern). (execute_fp_t): Now passed the pointer returned by a compile_fp_t. All call sites updated accordingly. (compiled_pattern): New static variable. * src/dfasearch.c (GEAcompile): Return a void pointer (dummy NULL). (EGexecute): Receive a void pointer argument (unused). * src/kwsearch.c (Fcompile): Return a void pointer (dummy NULL). (Fexecute): Receive a void pointer argument (unused). * src/pcresearch.c (Pcompile): Return a void pointer (dummy NULL). (Pexecute): Receive a void pointer argument (unused). * src/search.h: Update compile/execute function prototypes. So we have different approaches. They are trying to add extra pointer arguments for the multithreading case. The pointer argument would be NULL in the case multithreading is not in effect. Whereas my approach is to replicate the functions so the counterparts of the original functions are used in the multithreading scenario. This was done in an attempt to reduce the complexity of each of the functions and make the program less monolithic. I leave you guys to decide. --94eb2c06c19a71ab9905458f04ab Content-Type: text/html; charset=UTF-8 Content-Transfer-Encoding: quoted-printable
On Sat, Jan 7, 2017 at 9:29 AM, Paul Eggert=C2=A0<e= ggert@cs.ucla.edu>=C2=A0wrote:
Could you= remind us about the latest status of your proposal compared to Zev Weiss&#= 39;s? Does <http://bugs.gnu.org/24689> contain the latest thing y= ou have? Zev Weiss's latest version is at <https://github.com= /zevweiss/grep>. Comparing the two was the thing Jim Meyering a= sked for at <http://bugs.gnu.org/22239#8>, and you can follow u= p by sending email to=C2=A022239@debbugs.gnu.org.

=C2= =A0

Yes that github link is the latest version. I= haven't made any changes to that since last year September.

Basically the main t= hread traverses the file tree and assign the file to be searched to each th= read. There is also a dynamic buffer so that the output is identical to the= original=C2=A0grep=C2=A0program.

I tested the program on a server. On a directory con= taining 4 files,=C2=A0grep=C2=A0-r on that = directory is 4 times faster. On a directory containing 8 files,=C2=A0grep=C2=A0-r is 6 times faster. On a directory co= ntaining 12 files,=C2=A0grep=C2=A0-r is 8.5= times faster.

I think using multithreading is essentially different from not = using multithreading, and we also don't use multithreading all the time= for grep. When we're not using multithreading, i.e. when we pass in ot= her options for grep, more functions =C2=A0would call those functions whose= function signatures we changed. This is hard to keep track of, because the= program is fairly complicated.=C2=A0

If w= e had overloading in C++ I would overload those functions. But since we don= 't, I made it very clear in the code which functions are the counterpar= ts of the original versions. I did this to contain any potential problems s= o that if there are any problems with multithreading it would not affect th= e sequential program, whereas if we interleave the two scenarios we might l= ose track of what's going on. At least this is what I initially thought= .

I saw that there were some recent commits by Zev= together with Jim, for example:

in=C2=A0

commit= =C2=A09365ed6536d4fabf42ec17fef1bbe5d78884f950

* src/grep.c (comp=
ile_fp_t): Now returns an opaque pointer (the
compiled pattern).
(execute_fp_t): Now passed the pointer returned by a compile_fp_t.
All call sites updated accordingly.
(compiled_pattern): New static variable.
* src/dfasearch.c (GEAcompile): Return a void pointer (dummy NULL).
(EGexecute): Receive a void pointer argument (unused).
* src/kwsearch.c (Fcompile): Return a void pointer (dummy NULL).
(Fexecute): Receive a void pointer argument (unused).
* src/pcresearch.c (Pcompile): Return a void pointer (dummy NULL).
(Pexecute): Receive a void pointer argument (unused).
* src/search.h: Update compile/execute function prototypes.
So we have different approaches. They are trying to add extra= pointer arguments for the multithreading case. The pointer argument would = be NULL in the case multithreading is not in effect. Whereas my approach i= s to replicate the functions so the counterparts of the original functions = are used in the multithreading scenario. This was done in an attempt to red= uce the complexity of each of the functions and make the program less monol= ithic. I leave you guys to decide.
--94eb2c06c19a71ab9905458f04ab-- From debbugs-submit-bounces@debbugs.gnu.org Tue Jan 17 11:18:43 2017 Received: (at 22239-done) by debbugs.gnu.org; 17 Jan 2017 16:18:43 +0000 Received: from localhost ([127.0.0.1]:56616 helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1cTWT4-0002Mq-NG for submit@debbugs.gnu.org; Tue, 17 Jan 2017 11:18:43 -0500 Received: from zimbra.cs.ucla.edu ([131.179.128.68]:46364) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1cTWT0-0002MV-QM for 22239-done@debbugs.gnu.org; Tue, 17 Jan 2017 11:18:40 -0500 Received: from localhost (localhost [127.0.0.1]) by zimbra.cs.ucla.edu (Postfix) with ESMTP id F164F1600A3; Tue, 17 Jan 2017 08:18:32 -0800 (PST) Received: from zimbra.cs.ucla.edu ([127.0.0.1]) by localhost (zimbra.cs.ucla.edu [127.0.0.1]) (amavisd-new, port 10032) with ESMTP id aCJ68aXGoTjN; Tue, 17 Jan 2017 08:18:31 -0800 (PST) Received: from localhost (localhost [127.0.0.1]) by zimbra.cs.ucla.edu (Postfix) with ESMTP id 37C9D1600B2; Tue, 17 Jan 2017 08:18:31 -0800 (PST) X-Virus-Scanned: amavisd-new at zimbra.cs.ucla.edu Received: from zimbra.cs.ucla.edu ([127.0.0.1]) by localhost (zimbra.cs.ucla.edu [127.0.0.1]) (amavisd-new, port 10026) with ESMTP id u5lGvxXRfZvg; Tue, 17 Jan 2017 08:18:31 -0800 (PST) Received: from Penguin.CS.UCLA.EDU (Penguin.CS.UCLA.EDU [131.179.64.200]) by zimbra.cs.ucla.edu (Postfix) with ESMTPSA id 148D31600A3; Tue, 17 Jan 2017 08:18:31 -0800 (PST) Subject: Re: bug#22239: fgrep -i slow in 2.21 To: =?UTF-8?B?T25kxZllaiBDw61ma2E=?= References: <570B2FDA.1060307@cs.ucla.edu> From: Paul Eggert Organization: UCLA Computer Science Department Message-ID: <1eaa6f92-1d02-ba65-ef6d-d19f9807e3f7@cs.ucla.edu> Date: Tue, 17 Jan 2017 08:18:30 -0800 User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:45.0) Gecko/20100101 Thunderbird/45.6.0 MIME-Version: 1.0 In-Reply-To: Content-Type: multipart/mixed; boundary="------------A785D4151340D592057B7B1F" X-Spam-Score: -3.2 (---) X-Debbugs-Envelope-To: 22239-done Cc: 22239-done@debbugs.gnu.org X-BeenThere: debbugs-submit@debbugs.gnu.org X-Mailman-Version: 2.1.18 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: debbugs-submit-bounces@debbugs.gnu.org Sender: "Debbugs-submit" X-Spam-Score: -3.2 (---) This is a multi-part message in MIME format. --------------A785D4151340D592057B7B1F Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: quoted-printable On 04/11/2016 12:14 AM, Ond=C5=99ej C=C3=ADfka wrote: > You're probably right about the locale. I'm using cs_CZ.UTF-8. With > LC_ALL=3DC, both variants run faster and the difference is > insignificant. > > With cs_CZ.UTF-8, on my machine, your test case takes 2.322s with -i > and 0.464s without -i. > > I tested on my Aspell dictionary dump, where the difference is more not= iceable: > > aspell dump master | head -n 100000 >list.txt > > grep 2.21 with -i: 7.336s > grep 2.21 without -i: 0.312s > grep 2.16 with -i: 0.372s > grep 2.16 without -i: 0.431s > > With LC_ALL=3DC, both versions are about as fast. I got some free time to look into this, and installed the attached set of patches; the 2nd one is the key one. In the en_EN.utf8 locale on my platform (Fedora 25 x86-64), I get the following user times for 'grep -Ff list.txt list.txt' where list.txt was generated as you describe: 0.444 grep 2.16 0.522 grep 2.16 -i 0.443 grep 2.21 13.048 grep 2.21 -i 0.096 grep current 0.101 grep current -i Since this patch causes grep to use Aho-Corasick more often, I expect it to hurt performance in some cases involving multiple patterns, but we can look into that as they turn up. In the meantime since the original bug seems to be fixed I am taking the liberty of closing the bug report. --------------A785D4151340D592057B7B1F Content-Type: text/plain; charset=UTF-8; name="0001-build-update-gnulib-submodule-to-latest.txt" Content-Transfer-Encoding: base64 Content-Disposition: attachment; filename="0001-build-update-gnulib-submodule-to-latest.txt" RnJvbSBiYTk3NmU3YTk4Y2QzYTI4ODIwZjRkOTk5MTc2YTY3MDRkNDExYWMxIE1vbiBTZXAg MTcgMDA6MDA6MDAgMjAwMQpGcm9tOiBQYXVsIEVnZ2VydCA8ZWdnZXJ0QGNzLnVjbGEuZWR1 PgpEYXRlOiBNb24sIDE2IEphbiAyMDE3IDEyOjAwOjE3IC0wODAwClN1YmplY3Q6IFtQQVRD SCAxLzRdIGJ1aWxkOiB1cGRhdGUgZ251bGliIHN1Ym1vZHVsZSB0byBsYXRlc3QKCi0tLQog Z251bGliIHwgMiArLQogMSBmaWxlIGNoYW5nZWQsIDEgaW5zZXJ0aW9uKCspLCAxIGRlbGV0 aW9uKC0pCgpkaWZmIC0tZ2l0IGEvZ251bGliIGIvZ251bGliCmluZGV4IGEzZmQ2ODMuLmZh ZGQ4MGEgMTYwMDAwCi0tLSBhL2dudWxpYgorKysgYi9nbnVsaWIKQEAgLTEgKzEgQEAKLVN1 YnByb2plY3QgY29tbWl0IGEzZmQ2ODNkZTNkZWNiYjU4YWI1ZmI1ZDMyYWQyZTYyZjc0ZmJm MTIKK1N1YnByb2plY3QgY29tbWl0IGZhZGQ4MGFlZjlmMWFlMzJhMTZmZTczODBiMTYzNWZl ZTE4ODc2Y2UKLS0gCjIuOS4zCgo= --------------A785D4151340D592057B7B1F Content-Type: text/plain; charset=UTF-8; name="0002-Improve-i-performance-in-typical-UTF-8-searches.txt" Content-Transfer-Encoding: base64 Content-Disposition: attachment; filename="0002-Improve-i-performance-in-typical-UTF-8-searches.txt" RnJvbSAxYTM2NzZkMWE0N2RhNjhlYmM4OTRhMDU2OWYxOWJiOTQ3MjExZGZhIE1vbiBTZXAg MTcgMDA6MDA6MDAgMjAwMQpGcm9tOiBQYXVsIEVnZ2VydCA8ZWdnZXJ0QGNzLnVjbGEuZWR1 PgpEYXRlOiBNb24sIDE2IEphbiAyMDE3IDEyOjEzOjUwIC0wODAwClN1YmplY3Q6IFtQQVRD SCAyLzRdIEltcHJvdmUgLWkgcGVyZm9ybWFuY2UgaW4gdHlwaWNhbCBVVEYtOCBzZWFyY2hl cwpNSU1FLVZlcnNpb246IDEuMApDb250ZW50LVR5cGU6IHRleHQvcGxhaW47IGNoYXJzZXQ9 VVRGLTgKQ29udGVudC1UcmFuc2Zlci1FbmNvZGluZzogOGJpdAoKQ3VycmVudGx5IOKAmGdy ZXAgLWkgaeKAmSBpcyBzbG93IGluIGEgVVRGLTggbG9jYWxlLCBiZWNhdXNlIOKAmGnigJkg aW4KdGhlIHBhdHRlcm4gbWF0Y2hlcyB0aGUgdHdvLWJ5dGUgY2hhcmFjdGVyICfEsScgKFUr MDEzMSwgTEFUSU4KU01BTEwgTEVUVEVSIERPVExFU1MgSSkgaW4gZGF0YSwgYW5kIGt3c2V0 IGhhbmRsZXMgb25seQpzaW5nbGUtYnl0ZSBjaGFyYWN0ZXIgdHJhbnNsYXRpb25zLCBzbyBn cmVwIGZhbGxzIGJhY2sgb24gYSBzbG93ZXIKREZBLWJhc2VkIHNlYXJjaCBmb3IgYWxsIHNl YXJjaGVzLiAgSW1wcm92ZSAtaSBwZXJmb3JtYW5jZSBpbiB0aGUKdHlwaWNhbCBjYXNlIGJ5 IHVzaW5nIGt3c2V0IHdoZW4gZGF0YSBhcmUgZnJlZSBvZiB0cm91Ymxlc29tZQpjaGFyYWN0 ZXJzIGxpa2UgJ8SxJywgZmFsbGluZyBiYWNrIG9uIHRoZSBERkEgb25seSB3aGVuIGRhdGEK Y29udGFpbiB0cm91Ymxlc29tZSBjaGFyYWN0ZXJzLgoqIHNyYy9kZmFzZWFyY2guYyAoR0VB Y29tcGlsZSk6Ciogc3JjL2dyZXAuYyAoY29tcGlsZV9mcF90KToKKiBzcmMva3dzZWFyY2gu YyAoRmNvbXBpbGUpOgoqIHNyYy9wY3Jlc2VhcmNoLmMgKFBjb21waWxlKToKUGF0dGVybiBh cmcgaXMgbm93IGNoYXIgKiwgbm90IGNoYXIgY29uc3QgKiwgc2luY2UgRmNvbXBpbGUKbm93 IHJlYWxsb2NhdGVzIGl0IHNvbWV0aW1lcy4KKiBzcmMvZ3JlcC5jIChhbGxfc2luZ2xlX2J5 dGVfYWZ0ZXJfZm9sZGluZyk6IFJlbW92ZS4KQWxsIGNhbGxlcnMgcmVtb3ZlZC4KKGZncmVw X2ljYXNlX2NoYXJsZW4pOiBOZXcgZnVuY3Rpb24uCihmZ3JlcF9pY2FzZV9hdmFpbGFibGUs IHRyeV9mZ3JlcF9wYXR0ZXJuKToKVXNlIGl0LCBmb3IgbW9yZS1nZW5lcm91cyBzZW1hbnRp Y3MuCihmZ3JlcF90b19ncmVwX3BhdHRlcm4pOiBOb3cgZXh0ZXJuLgoobWFpbik6IERvIG5v dCBmcmVlIGtleXMsIHNpbmNlIEZleGVjdXRlIG1heSB1c2UgdGhlbS4KKiBzcmMva3dzZWFy Y2guYyAoc3RydWN0IGt3c2VhcmNoKTogTmV3IHN0cnVjdC4KKEZjb21waWxlKTogUmV0dXJu IGl0LiAgSWYgLWksIGJlIG1vcmUgZ2VuZXJvdXMgYWJvdXQgcGF0dGVybnMuCihGZXhlY3V0 ZSk6IFVzZSBpdC4gIEZhbGwgYmFjayBvbiBERkEgd2hlbiB0aGUgZGF0YSBjb250YWluCnRy b3VibGVzb21lIGNoYXJhY3RlcnM7IHRoaXMgc2hvdWxkIGJlIHJhcmUgaW4gcHJhY3RpY2Uu Ciogc3JjL2t3c2V0LmMsIHNyYy9rd3NldC5oIChrd3N3b3Jkcyk6IE5ldyBmdW5jdGlvbi4K LS0tCiBzcmMvZGZhc2VhcmNoLmMgIHwgIDIgKy0KIHNyYy9ncmVwLmMgICAgICAgfCA5MyAr KysrKysrKysrKysrKysrKysrKysrKysrKysrKysrLS0tLS0tLS0tLS0tLS0tLS0tLS0tLS0t LQogc3JjL2t3c2VhcmNoLmMgICB8IDkyICsrKysrKysrKysrKysrKysrKysrKysrKysrKysr KysrKysrKysrKysrKysrKysrKysrKysrLS0KIHNyYy9rd3NldC5jICAgICAgfCAgNiArKysr CiBzcmMva3dzZXQuaCAgICAgIHwgIDEgKwogc3JjL3BjcmVzZWFyY2guYyB8ICAyICstCiBz cmMvc2VhcmNoLmggICAgIHwgIDcgKysrLS0KIDcgZmlsZXMgY2hhbmdlZCwgMTUzIGluc2Vy dGlvbnMoKyksIDUwIGRlbGV0aW9ucygtKQoKZGlmZiAtLWdpdCBhL3NyYy9kZmFzZWFyY2gu YyBiL3NyYy9kZmFzZWFyY2guYwppbmRleCA5YjMyZDQ4Li41NDJlN2YxIDEwMDY0NAotLS0g YS9zcmMvZGZhc2VhcmNoLmMKKysrIGIvc3JjL2RmYXNlYXJjaC5jCkBAIC0xMDQsNyArMTA0 LDcgQEAga3dzbXVzdHMgKHN0cnVjdCBkZmFfY29tcCAqZGMpCiB9CiAKIHZvaWQgKgotR0VB Y29tcGlsZSAoY2hhciBjb25zdCAqcGF0dGVybiwgc2l6ZV90IHNpemUsIHJlZ19zeW50YXhf dCBzeW50YXhfYml0cykKK0dFQWNvbXBpbGUgKGNoYXIgKnBhdHRlcm4sIHNpemVfdCBzaXpl LCByZWdfc3ludGF4X3Qgc3ludGF4X2JpdHMpCiB7CiAgIGNoYXIgKm1vdGlmOwogICBzdHJ1 Y3QgZGZhX2NvbXAgKmRjID0geGNhbGxvYyAoMSwgc2l6ZW9mICgqZGMpKTsKZGlmZiAtLWdp dCBhL3NyYy9ncmVwLmMgYi9zcmMvZ3JlcC5jCmluZGV4IDBhNjc0ZWMuLjgxNjU0YzMgMTAw NjQ0Ci0tLSBhL3NyYy9ncmVwLmMKKysrIGIvc3JjL2dyZXAuYwpAQCAtNTc1LDcgKzU3NSw3 IEBAIHN0YXRpYyBib29sIHNlZWtfZmFpbGVkOwogc3RhdGljIGJvb2wgc2Vla19kYXRhX2Zh aWxlZDsKIAogLyogRnVuY3Rpb25zIHdlJ2xsIHVzZSB0byBzZWFyY2guICovCi10eXBlZGVm IHZvaWQgKigqY29tcGlsZV9mcF90KSAoY2hhciBjb25zdCAqLCBzaXplX3QsIHJlZ19zeW50 YXhfdCk7Cit0eXBlZGVmIHZvaWQgKigqY29tcGlsZV9mcF90KSAoY2hhciAqLCBzaXplX3Qs IHJlZ19zeW50YXhfdCk7CiB0eXBlZGVmIHNpemVfdCAoKmV4ZWN1dGVfZnBfdCkgKHZvaWQg KiwgY2hhciBjb25zdCAqLCBzaXplX3QsIHNpemVfdCAqLAogICAgICAgICAgICAgICAgICAg ICAgICAgICAgICAgICBjaGFyIGNvbnN0ICopOwogc3RhdGljIGV4ZWN1dGVfZnBfdCBleGVj dXRlOwpAQCAtMjI1NCw1NCArMjI1NCw1OCBAQCBjb250YWluc19lbmNvZGluZ19lcnJvciAo Y2hhciBjb25zdCAqcGF0LCBzaXplX3QgcGF0bGVuKQogICByZXR1cm4gZmFsc2U7CiB9CiAK LS8qIFJldHVybiB0cnVlIGlmIHRoZSBzZXQgb2Ygc2luZ2xlLWJ5dGUgY2hhcmFjdGVycyBV U0VEIGNvbnRhaW5zIG9ubHkKLSAgIGNoYXJhY3RlcnMgd2hvc2UgY2FzZSBjb3VudGVycGFy dHMgYXJlIGFsc28gc2luZ2xlLWJ5dGUuICAqLworLyogUmV0dXJuIHRoZSBudW1iZXIgb2Yg Ynl0ZXMgaW4gdGhlIGluaXRpYWwgY2hhcmFjdGVyIG9mIFBBVCwgb2Ygc2l6ZQorICAgUEFU TEVOLCBpZiBGY29tcGlsZSBjYW4gaGFuZGxlIHRoYXQgY2hhcmFjdGVyLiAgUmV0dXJuIC0x IGlmCisgICBGY29tcGlsZSBjYW5ub3QgaGFuZGxlIGl0LiAgTUJTIGlzIHRoZSBtdWx0aWJ5 dGUgY29udmVyc2lvbiBzdGF0ZS4KIAotc3RhdGljIGJvb2wKLWFsbF9zaW5nbGVfYnl0ZV9h ZnRlcl9mb2xkaW5nIChib29sIHVzZWRbVUNIQVJfTUFYICsgMV0pCi17Ci0gIGZvciAoaW50 IGMgPSAwOyBjIDw9IFVDSEFSX01BWDsgYysrKQotICAgIGlmICh1c2VkW2NdKQotICAgICAg ewotICAgICAgICB3aW50X3Qgd2MgPSBsb2NhbGVpbmZvLnNiY3Rvd2NbY107Ci0gICAgICAg IHdjaGFyX3QgZm9sZGVkW0NBU0VfRk9MREVEX0JVRlNJWkVdOwotICAgICAgICBzaXplX3Qg bmZvbGRlZCA9IGNhc2VfZm9sZGVkX2NvdW50ZXJwYXJ0cyAod2MsIGZvbGRlZCk7Ci0KLSAg ICAgICAgZm9yIChzaXplX3QgaSA9IDA7IGkgPCBuZm9sZGVkOyBpKyspCi0gICAgICAgICAg ewotICAgICAgICAgICAgY2hhciBzW01CX0xFTl9NQVhdOwotICAgICAgICAgICAgbWJzdGF0 ZV90IG1iX3N0YXRlID0geyAwIH07Ci0gICAgICAgICAgICBpZiAoMSA8IHdjcnRvbWIgKHMs IGZvbGRlZFtpXSwgJm1iX3N0YXRlKSkKLSAgICAgICAgICAgICAgcmV0dXJuIGZhbHNlOwot ICAgICAgICAgIH0KLSAgICAgIH0KKyAgIEZjb21waWxlIGNhbiBoYW5kbGUgYSBjaGFyYWN0 ZXIgQyBpZiBDIGlzIHNpbmdsZS1ieXRlLCBvciBpZiBDIGhhcyBubworICAgY2FzZSBmb2xk ZWQgY291bnRlcnBhcnRzIGFuZCB0b3VwcGVyIHRyYW5zbGF0ZXMgbm9uZSBvZiBpdHMgYnl0 ZXMuICAqLwogCi0gIHJldHVybiB0cnVlOworc3RhdGljIGludAorZmdyZXBfaWNhc2VfY2hh cmxlbiAoY2hhciBjb25zdCAqcGF0LCBzaXplX3QgcGF0bGVuLCBtYnN0YXRlX3QgKm1icykK K3sKKyAgaW50IG4gPSBsb2NhbGVpbmZvLnNiY2xlblt0b191Y2hhciAoKnBhdCldOworICBp ZiAobiA8IDApCisgICAgeworICAgICAgd2NoYXJfdCB3YzsKKyAgICAgIHdjaGFyX3QgZm9s ZGVkW0NBU0VfRk9MREVEX0JVRlNJWkVdOworICAgICAgc2l6ZV90IHduID0gbWJydG93YyAo JndjLCBwYXQsIHBhdGxlbiwgbWJzKTsKKyAgICAgIGlmIChNQl9MRU5fTUFYIDwgd24gfHwg Y2FzZV9mb2xkZWRfY291bnRlcnBhcnRzICh3YywgZm9sZGVkKSkKKyAgICAgICAgcmV0dXJu IC0xOworICAgICAgZm9yIChpbnQgaSA9IHduOyAwIDwgLS1pOyApCisgICAgICAgIHsKKyAg ICAgICAgICB1bnNpZ25lZCBjaGFyIGMgPSBwYXRbaV07CisgICAgICAgICAgaWYgKHRvdXBw ZXIgKGMpICE9IGMpCisgICAgICAgICAgICByZXR1cm4gLTE7CisgICAgICAgIH0KKyAgICAg IG4gPSB3bjsKKyAgICB9CisgIHJldHVybiBuOwogfQogCi0vKiBSZXR1cm4gdHJ1ZSBpZiB0 aGUgLUYgcGF0dGVybnMgUEFULCBvZiBzaXplIFBBVExFTiwgbWF0Y2ggb25seQotICAgc2lu Z2xlLWJ5dGUgY2hhcmFjdGVycyBpbmNsdWRpbmcgY2FzZSBmb2xkaW5nLCBhbmQgc28gY2Fu IGJlCi0gICBwcm9jZXNzZWQgYnkgdGhlIC1GIG1hdGNoZXIgZXZlbiBpZiAtaSBpcyBnaXZl bi4gICovCisvKiBSZXR1cm4gdHJ1ZSBpZiB0aGUgLUYgcGF0dGVybnMgUEFULCBvZiBzaXpl IFBBVExFTiwgY29udGFpbiBvbmx5CisgICBzaW5nbGUtYnl0ZSBjaGFyYWN0ZXJzIG9yIGNo YXJhY3RlcnMgbm90IHN1YmplY3QgdG8gY2FzZSBmb2xkaW5nLAorICAgYW5kIHNvIGNhbiBi ZSBwcm9jZXNzZWQgYnkgRmNvbXBpbGUuICAqLwogCiBzdGF0aWMgYm9vbAogZmdyZXBfaWNh c2VfYXZhaWxhYmxlIChjaGFyIGNvbnN0ICpwYXQsIHNpemVfdCBwYXRsZW4pCiB7Ci0gIGJv b2wgdXNlZFtVQ0hBUl9NQVggKyAxXSA9IHsgMCwgfTsKKyAgbWJzdGF0ZV90IG1icyA9IHsw LH07CiAKLSAgZm9yIChzaXplX3QgaSA9IDA7IGkgPCBwYXRsZW47IGkrKykKKyAgZm9yIChz aXplX3QgaSA9IDA7IGkgPCBwYXRsZW47ICkKICAgICB7Ci0gICAgICB1bnNpZ25lZCBjaGFy IGMgPSBwYXRbaV07Ci0gICAgICBpZiAobG9jYWxlaW5mby5zYmN0b3djW2NdID09IFdFT0Yp CisgICAgICBpbnQgbiA9IGZncmVwX2ljYXNlX2NoYXJsZW4gKHBhdCArIGksIHBhdGxlbiAt IGksICZtYnMpOworICAgICAgaWYgKG4gPCAwKQogICAgICAgICByZXR1cm4gZmFsc2U7Ci0g ICAgICB1c2VkW2NdID0gdHJ1ZTsKKyAgICAgIGkgKz0gbjsKICAgICB9CiAKLSAgcmV0dXJu IGFsbF9zaW5nbGVfYnl0ZV9hZnRlcl9mb2xkaW5nICh1c2VkKTsKKyAgcmV0dXJuIHRydWU7 CiB9CiAKIC8qIENoYW5nZSB0aGUgcGF0dGVybiAqS0VZU19QLCBvZiBzaXplICpMRU5fUCwg ZnJvbSBmZ3JlcCB0byBncmVwIHN0eWxlLiAgKi8KIAotc3RhdGljIHZvaWQKK3ZvaWQKIGZn cmVwX3RvX2dyZXBfcGF0dGVybiAoY2hhciAqKmtleXNfcCwgc2l6ZV90ICpsZW5fcCkKIHsK ICAgc2l6ZV90IGxlbiA9ICpsZW5fcDsKQEAgLTIzNTgsOCArMjM2Miw2IEBAIHRyeV9mZ3Jl cF9wYXR0ZXJuIChpbnQgbWF0Y2hlciwgY2hhciAqa2V5cywgc2l6ZV90ICpsZW5fcCkKICAg Y2hhciAqbmV3X2tleXMgPSB4bWFsbG9jIChsZW4gKyAxKTsKICAgY2hhciAqcCA9IG5ld19r ZXlzOwogICBtYnN0YXRlX3QgbWJfc3RhdGUgPSB7IDAgfTsKLSAgc2l6ZV90IG1ibGVuX2xp bSA9IG1hdGNoX2ljYXNlID8gMSA6IC0zOwotICBib29sIHVzZWRbVUNIQVJfTUFYICsgMV0g PSB7IDAsIH07CiAKICAgd2hpbGUgKGxlbiAhPSAwKQogICAgIHsKQEAgLTIzOTYsMTkgKzIz OTgsMjcgQEAgdHJ5X2ZncmVwX3BhdHRlcm4gKGludCBtYXRjaGVyLCBjaGFyICprZXlzLCBz aXplX3QgKmxlbl9wKQogICAgICAgICB9CiAKICAgICAgIHsKLSAgICAgICAgc2l6ZV90IG4g PSBtYl9jbGVuIChrZXlzLCBsZW4sICZtYl9zdGF0ZSk7Ci0gICAgICAgIGlmIChtYmxlbl9s aW0gPCBuKQotICAgICAgICAgIGdvdG8gZmFpbDsKLSAgICAgICAgdXNlZFt0b191Y2hhciAo a2V5c1swXSldID0gdHJ1ZTsKKyAgICAgICAgc2l6ZV90IG47CisgICAgICAgIGlmIChtYXRj aF9pY2FzZSkKKyAgICAgICAgICB7CisgICAgICAgICAgICBpbnQgbmkgPSBmZ3JlcF9pY2Fz ZV9jaGFybGVuIChrZXlzLCBsZW4sICZtYl9zdGF0ZSk7CisgICAgICAgICAgICBpZiAobmkg PCAwKQorICAgICAgICAgICAgICBnb3RvIGZhaWw7CisgICAgICAgICAgICBuID0gbmk7Cisg ICAgICAgICAgfQorICAgICAgICBlbHNlCisgICAgICAgICAgeworICAgICAgICAgICAgbiA9 IG1iX2NsZW4gKGtleXMsIGxlbiwgJm1iX3N0YXRlKTsKKyAgICAgICAgICAgIGlmIChNQl9M RU5fTUFYIDwgbikKKyAgICAgICAgICAgICAgZ290byBmYWlsOworICAgICAgICAgIH0KKwog ICAgICAgICBwID0gbWVtcGNweSAocCwga2V5cywgbik7CiAgICAgICAgIGtleXMgKz0gbjsK ICAgICAgICAgbGVuIC09IG47CiAgICAgICB9CiAgICAgfQogCi0gIGlmIChtYmxlbl9saW0g PT0gMSAmJiAhYWxsX3NpbmdsZV9ieXRlX2FmdGVyX2ZvbGRpbmcgKHVzZWQpKQotICAgIGdv dG8gZmFpbDsKLQogICBpZiAoKmxlbl9wICE9IHAgLSBuZXdfa2V5cykKICAgICB7CiAgICAg ICAqbGVuX3AgPSBwIC0gbmV3X2tleXM7CkBAIC0yODc4LDcgKzI4ODgsNiBAQCBtYWluIChp bnQgYXJnYywgY2hhciAqKmFyZ3YpCiAgIGV4ZWN1dGUgPSBtYXRjaGVyc1ttYXRjaGVyXS5l eGVjdXRlOwogICBjb21waWxlZF9wYXR0ZXJuID0gbWF0Y2hlcnNbbWF0Y2hlcl0uY29tcGls ZSAoa2V5cywga2V5Y2MsCiAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAg ICAgICAgICAgICBtYXRjaGVyc1ttYXRjaGVyXS5zeW50YXgpOwotICBmcmVlIChrZXlzKTsK ICAgLyogV2UgbmVlZCBvbmUgYnl0ZSBwcmlvciBhbmQgb25lIGFmdGVyLiAgKi8KICAgY2hh ciBlb2xieXRlc1szXSA9IHsgMCwgZW9sYnl0ZSwgMCB9OwogICBzaXplX3QgbWF0Y2hfc2l6 ZTsKZGlmZiAtLWdpdCBhL3NyYy9rd3NlYXJjaC5jIGIvc3JjL2t3c2VhcmNoLmMKaW5kZXgg NWU5ZGYxNi4uMzBhMDI3YyAxMDA2NDQKLS0tIGEvc3JjL2t3c2VhcmNoLmMKKysrIGIvc3Jj L2t3c2VhcmNoLmMKQEAgLTIxLDggKzIxLDMzIEBACiAjaW5jbHVkZSA8Y29uZmlnLmg+CiAj aW5jbHVkZSAic2VhcmNoLmgiCiAKKy8qIEEgY29tcGlsZWQgLUYgcGF0dGVybiBsaXN0LiAg Ki8KKworc3RydWN0IGt3c2VhcmNoCit7CisgIC8qIFRoZSBrd3NldCBmb3IgdGhpcyBwYXR0 ZXJuIGxpc3QuICAqLworICBrd3NldF90IGt3c2V0OworCisgIC8qIFRoZSBudW1iZXIgb2Yg dXNlci1zcGVjaWZpZWQgcGF0dGVybnMuICBUaGlzIGlzIGxlc3MgdGhhbgorICAgICAna3dz d29yZHMgKGt3c2V0KScgd2hlbiBzb21lIGV4dHJhIG9uZS1jaGFyYWN0ZXIgd29yZHMgaGF2 ZSBiZWVuCisgICAgIGFwcGVuZGVkLCBvbmUgZm9yIGVhY2ggdHJvdWJsZXNvbWUgY2hhcmFj dGVyIHRoYXQgd2lsbCByZXF1aXJlIGEKKyAgICAgREZBIHNlYXJjaC4gICovCisgIHB0cmRp ZmZfdCB3b3JkczsKKworICAvKiBUaGUgdXNlcidzIHBhdHRlcm4gYW5kIGl0cyBzaXplIGlu IGJ5dGVzLiAgKi8KKyAgY2hhciAqcGF0dGVybjsKKyAgc2l6ZV90IHNpemU7CisKKyAgLyog VGhlIHVzZXIncyBwYXR0ZXJuIGNvbXBpbGVkIGFzIGEgcmVndWxhciBleHByZXNzaW9uLAor ICAgICBvciBudWxsIGlmIGl0IGhhcyBub3QgYmVlbiBjb21waWxlZC4gICovCisgIHZvaWQg KnJlOworfTsKKworLyogQ29tcGlsZSB0aGUgLUYgc3R5bGUgUEFUVEVSTiwgY29udGFpbmlu ZyBTSVpFIGJ5dGVzLiAgUmV0dXJuIGEKKyAgIGRlc2NyaXB0aW9uIG9mIHRoZSBjb21waWxl ZCBwYXR0ZXJuLiAgKi8KKwogdm9pZCAqCi1GY29tcGlsZSAoY2hhciBjb25zdCAqcGF0dGVy biwgc2l6ZV90IHNpemUsIHJlZ19zeW50YXhfdCBpZ25vcmVkKQorRmNvbXBpbGUgKGNoYXIg KnBhdHRlcm4sIHNpemVfdCBzaXplLCByZWdfc3ludGF4X3QgaWdub3JlZCkKIHsKICAga3dz ZXRfdCBrd3NldDsKICAgcHRyZGlmZl90IHRvdGFsID0gc2l6ZTsKQEAgLTc0LDExICs5OSw1 NSBAQCBGY29tcGlsZSAoY2hhciBjb25zdCAqcGF0dGVybiwgc2l6ZV90IHNpemUsIHJlZ19z eW50YXhfdCBpZ25vcmVkKQogICB3aGlsZSAocCk7CiAKICAgZnJlZSAoYnVmKTsKKyAgcHRy ZGlmZl90IHdvcmRzID0ga3dzd29yZHMgKGt3c2V0KTsKKworICBpZiAobWF0Y2hfaWNhc2Up CisgICAgeworICAgICAgLyogRm9yIGVhY2ggcGF0dGVybiBjaGFyYWN0ZXIgQyB0aGF0IGhh cyBhIGNhc2UgZm9sZGVkCisgICAgICAgICBjb3VudGVycGFydCBGIHRoYXQgaXMgbXVsdGli eXRlIGFuZCBzbyBjYW5ub3QgZWFzaWx5IGJlCisgICAgICAgICBpbXBsZW1lbnRlZCB2aWEg dHJhbnNsYXRpbmcgYSBzaW5nbGUgYnl0ZSwgYXBwZW5kIGEgcGF0dGVybgorICAgICAgICAg Y29udGFpbmluZyBqdXN0IEYuICBUaGF0IHdheSwgaWYgdGhlIGRhdGEgY29udGFpbnMgRiwg dGhlCisgICAgICAgICBtYXRjaGVyIGNhbiBmYWxsIGJhY2sgb24gREZBLiAgRm9yIGV4YW1w bGUsIGlmIEMgaXMgJ2knIGFuZAorICAgICAgICAgdGhlIGxvY2FsZSBpcyBlbl9VUy51dGY4 LCBhcHBlbmQgYSBwYXR0ZXJuIGNvbnRhaW5pbmcganVzdAorICAgICAgICAgdGhlIGNoYXJh Y3RlciBVKzAxMzEgKExBVElOIFNNQUxMIExFVFRFUiBET1RMRVNTIEkpLCBzbyB0aGF0Cisg ICAgICAgICBGZXhlY3V0ZSB3aWxsIHVzZSBhIERGQSBpZiB0aGUgZGF0YSBjb250YWluIFUr MDEzMS4gICovCisgICAgICBtYnN0YXRlX3QgbWJzID0geyAwIH07CisgICAgICBjaGFyIGNo ZWNrZWRbTkNIQVJdID0gezAsfTsKKyAgICAgIGZvciAocCA9IHBhdHRlcm47IHAgPCBwYXR0 ZXJuICsgc2l6ZTsgcCsrKQorICAgICAgICB7CisgICAgICAgICAgdW5zaWduZWQgY2hhciBj ID0gKnA7CisgICAgICAgICAgaWYgKGNoZWNrZWRbY10pCisgICAgICAgICAgICBjb250aW51 ZTsKKyAgICAgICAgICBjaGVja2VkW2NdID0gdHJ1ZTsKKworICAgICAgICAgIHdpbnRfdCB3 YyA9IGxvY2FsZWluZm8uc2JjdG93Y1tjXTsKKyAgICAgICAgICB3Y2hhcl90IGZvbGRlZFtD QVNFX0ZPTERFRF9CVUZTSVpFXTsKKworICAgICAgICAgIGZvciAoaW50IGkgPSBjYXNlX2Zv bGRlZF9jb3VudGVycGFydHMgKHdjLCBmb2xkZWQpOyAwIDw9IC0taTsgKQorICAgICAgICAg ICAgeworICAgICAgICAgICAgICBjaGFyIHNbTUJfTEVOX01BWF07CisgICAgICAgICAgICAg IGludCBuYnl0ZXMgPSB3Y3J0b21iIChzLCBmb2xkZWRbaV0sICZtYnMpOworICAgICAgICAg ICAgICBpZiAoMSA8IG5ieXRlcykKKyAgICAgICAgICAgICAgICBrd3NpbmNyIChrd3NldCwg cywgbmJ5dGVzKTsKKyAgICAgICAgICAgIH0KKyAgICAgICAgfQorICAgIH0KKwogICBrd3Nw cmVwIChrd3NldCk7CiAKLSAgcmV0dXJuIGt3c2V0OworICBzdHJ1Y3Qga3dzZWFyY2ggKmt3 c2VhcmNoID0geG1hbGxvYyAoc2l6ZW9mICprd3NlYXJjaCk7CisgIGt3c2VhcmNoLT5rd3Nl dCA9IGt3c2V0OworICBrd3NlYXJjaC0+d29yZHMgPSB3b3JkczsKKyAga3dzZWFyY2gtPnBh dHRlcm4gPSBwYXR0ZXJuOworICBrd3NlYXJjaC0+c2l6ZSA9IHNpemU7CisgIGt3c2VhcmNo LT5yZSA9IE5VTEw7CisgIHJldHVybiBrd3NlYXJjaDsKIH0KIAorLyogVXNlIHRoZSBjb21w aWxlZCBwYXR0ZXJuIFZDUCB0byBzZWFyY2ggdGhlIGJ1ZmZlciBCVUYgb2Ygc2l6ZSBTSVpF LgorICAgSWYgZm91bmQsIHJldHVybiB0aGUgb2Zmc2V0IG9mIHRoZSBmaXJzdCBtYXRjaCBh bmQgc3RvcmUgaXRzCisgICBzaXplIGludG8gKk1BVENIX1NJWkUuICBJZiBub3QgZm91bmQs IHJldHVybiBTSVpFX01BWC4KKyAgIElmIFNUQVJUX1BUUiBpcyBub25udWxsLCBzdGFydCBz ZWFyY2hpbmcgdGhlcmUuICAqLwogc2l6ZV90CiBGZXhlY3V0ZSAodm9pZCAqdmNwLCBjaGFy IGNvbnN0ICpidWYsIHNpemVfdCBzaXplLCBzaXplX3QgKm1hdGNoX3NpemUsCiAgICAgICAg ICAgY2hhciBjb25zdCAqc3RhcnRfcHRyKQpAQCAtOTAsNyArMTU5LDggQEAgRmV4ZWN1dGUg KHZvaWQgKnZjcCwgY2hhciBjb25zdCAqYnVmLCBzaXplX3Qgc2l6ZSwgc2l6ZV90ICptYXRj aF9zaXplLAogICBzaXplX3QgcmV0X3ZhbDsKICAgYm9vbCBtYl9jaGVjazsKICAgYm9vbCBs b25nZXN0OwotICBrd3NldF90IGt3c2V0ID0gdmNwOworICBzdHJ1Y3Qga3dzZWFyY2ggKmt3 c2VhcmNoID0gdmNwOworICBrd3NldF90IGt3c2V0ID0ga3dzZWFyY2gtPmt3c2V0OwogCiAg IGlmIChtYXRjaF9saW5lcykKICAgICBtYl9jaGVjayA9IGxvbmdlc3QgPSBmYWxzZTsKQEAg LTEwOCw2ICsxNzgsMjIgQEAgRmV4ZWN1dGUgKHZvaWQgKnZjcCwgY2hhciBjb25zdCAqYnVm LCBzaXplX3Qgc2l6ZSwgc2l6ZV90ICptYXRjaF9zaXplLAogICAgICAgaWYgKG9mZnNldCA8 IDApCiAgICAgICAgIGJyZWFrOwogICAgICAgbGVuID0ga3dzbWF0Y2guc2l6ZVswXSAtIDIg KiBtYXRjaF9saW5lczsKKworICAgICAgaWYgKGt3c2VhcmNoLT53b3JkcyA8PSBrd3NtYXRj aC5pbmRleCkKKyAgICAgICAgeworICAgICAgICAgIC8qIFRoZSBkYXRhIGNvbnRhaW4gYSBt dWx0aWJ5dGUgY2hhcmFjdGVyIHRoYXQgbWF0Y2hlcworICAgICAgICAgICAgIHNvbWUgcGF0 dGVybiBjaGFyYWN0ZXIgdGhhdCBpcyBhIGNhc2UgZm9sZGVkIGNvdW50ZXJwYXJ0LgorICAg ICAgICAgICAgIFNpbmNlIHRoZSBrd3NldCBjb2RlIGNhbm5vdCBoYW5kbGUgdGhpcyBjYXNl LCBmYWxsIGJhY2sKKyAgICAgICAgICAgICBvbiB0aGUgREZBIGNvZGUsIHdoaWNoIGNhbi4g ICovCisgICAgICAgICAgaWYgKCEga3dzZWFyY2gtPnJlKQorICAgICAgICAgICAgeworICAg ICAgICAgICAgICBmZ3JlcF90b19ncmVwX3BhdHRlcm4gKCZrd3NlYXJjaC0+cGF0dGVybiwg Jmt3c2VhcmNoLT5zaXplKTsKKyAgICAgICAgICAgICAga3dzZWFyY2gtPnJlID0gR0VBY29t cGlsZSAoa3dzZWFyY2gtPnBhdHRlcm4sIGt3c2VhcmNoLT5zaXplLAorICAgICAgICAgICAg ICAgICAgICAgICAgICAgICAgICAgICAgICAgICBSRV9TWU5UQVhfR1JFUCk7CisgICAgICAg ICAgICB9CisgICAgICAgICAgcmV0dXJuIEVHZXhlY3V0ZSAoa3dzZWFyY2gtPnJlLCBidWYs IHNpemUsIG1hdGNoX3NpemUsIHN0YXJ0X3B0cik7CisgICAgICAgIH0KKwogICAgICAgaWYg KG1iX2NoZWNrICYmIG1iX2dvYmFjayAoJm1iX3N0YXJ0LCBiZWcgKyBvZmZzZXQsIGJ1ZiAr IHNpemUpICE9IDApCiAgICAgICAgIHsKICAgICAgICAgICAvKiBXZSBoYXZlIG1hdGNoZWQg YSBzaW5nbGUgYnl0ZSB0aGF0IGlzIG5vdCBhdCB0aGUgYmVnaW5uaW5nIG9mIGEKZGlmZiAt LWdpdCBhL3NyYy9rd3NldC5jIGIvc3JjL2t3c2V0LmMKaW5kZXggMGI2ZmFiNC4uNzdlZjdj NyAxMDA2NDQKLS0tIGEvc3JjL2t3c2V0LmMKKysrIGIvc3JjL2t3c2V0LmMKQEAgLTMyOCw2 ICszMjgsMTIgQEAga3dzaW5jciAoa3dzZXRfdCBrd3NldCwgY2hhciBjb25zdCAqdGV4dCwg cHRyZGlmZl90IGxlbikKICAgICBrd3NldC0+bWF4ZCA9IHRyaWUtPmRlcHRoOwogfQogCitw dHJkaWZmX3QKK2t3c3dvcmRzIChrd3NldF90IGt3c2V0KQoreworICByZXR1cm4ga3dzZXQt PndvcmRzOworfQorCiAvKiBFbnF1ZXVlIHRoZSB0cmllIG5vZGVzIHJlZmVyZW5jZWQgZnJv bSB0aGUgZ2l2ZW4gdHJlZSBpbiB0aGUKICAgIGdpdmVuIHF1ZXVlLiAgKi8KIHN0YXRpYyB2 b2lkCmRpZmYgLS1naXQgYS9zcmMva3dzZXQuaCBiL3NyYy9rd3NldC5oCmluZGV4IGVmNzhk YjMuLjVjNzhlNTQgMTAwNjQ0Ci0tLSBhL3NyYy9rd3NldC5oCisrKyBiL3NyYy9rd3NldC5o CkBAIC0zNiw2ICszNiw3IEBAIHR5cGVkZWYgc3RydWN0IGt3c2V0ICprd3NldF90OwogCiBl eHRlcm4ga3dzZXRfdCBrd3NhbGxvYyAoY2hhciBjb25zdCAqLCBib29sKTsKIGV4dGVybiB2 b2lkIGt3c2luY3IgKGt3c2V0X3QsIGNoYXIgY29uc3QgKiwgcHRyZGlmZl90KTsKK2V4dGVy biBwdHJkaWZmX3Qga3dzd29yZHMgKGt3c2V0X3QpIF9HTF9BVFRSSUJVVEVfUFVSRTsKIGV4 dGVybiB2b2lkIGt3c3ByZXAgKGt3c2V0X3QpOwogZXh0ZXJuIHB0cmRpZmZfdCBrd3NleGVj IChrd3NldF90LCBjaGFyIGNvbnN0ICosIHB0cmRpZmZfdCwKICAgICAgICAgICAgICAgICAg ICAgICAgICAgc3RydWN0IGt3c21hdGNoICosIGJvb2wpCmRpZmYgLS1naXQgYS9zcmMvcGNy ZXNlYXJjaC5jIGIvc3JjL3BjcmVzZWFyY2guYwppbmRleCAyODE0NGI0Li43MDM0OThjIDEw MDY0NAotLS0gYS9zcmMvcGNyZXNlYXJjaC5jCisrKyBiL3NyYy9wY3Jlc2VhcmNoLmMKQEAg LTkxLDcgKzkxLDcgQEAgaml0X2V4ZWMgKHN0cnVjdCBwY3JlX2NvbXAgKnBjLCBjaGFyIGNv bnN0ICpzdWJqZWN0LCBpbnQgc2VhcmNoX2J5dGVzLAogI2VuZGlmCiAKIHZvaWQgKgotUGNv bXBpbGUgKGNoYXIgY29uc3QgKnBhdHRlcm4sIHNpemVfdCBzaXplLCByZWdfc3ludGF4X3Qg aWdub3JlZCkKK1Bjb21waWxlIChjaGFyICpwYXR0ZXJuLCBzaXplX3Qgc2l6ZSwgcmVnX3N5 bnRheF90IGlnbm9yZWQpCiB7CiAjaWYgIUhBVkVfTElCUENSRQogICBkaWUgKEVYSVRfVFJP VUJMRSwgMCwKZGlmZiAtLWdpdCBhL3NyYy9zZWFyY2guaCBiL3NyYy9zZWFyY2guaAppbmRl eCBjOGJkNWYyLi44NTMzZDU5IDEwMDY0NAotLS0gYS9zcmMvc2VhcmNoLmgKKysrIGIvc3Jj L3NlYXJjaC5oCkBAIC01NSwxOSArNTUsMjAgQEAgZXh0ZXJuIHNpemVfdCB3b3JkY2hhcl9w cmV2IChjaGFyIGNvbnN0ICosIGNoYXIgY29uc3QgKiwgY2hhciBjb25zdCAqKQogZXh0ZXJu IHB0cmRpZmZfdCBtYl9nb2JhY2sgKGNoYXIgY29uc3QgKiosIGNoYXIgY29uc3QgKiwgY2hh ciBjb25zdCAqKTsKIAogLyogZGZhc2VhcmNoLmMgKi8KLWV4dGVybiB2b2lkICpHRUFjb21w aWxlIChjaGFyIGNvbnN0ICosIHNpemVfdCwgcmVnX3N5bnRheF90KTsKK2V4dGVybiB2b2lk ICpHRUFjb21waWxlIChjaGFyICosIHNpemVfdCwgcmVnX3N5bnRheF90KTsKIGV4dGVybiBz aXplX3QgRUdleGVjdXRlICh2b2lkICosIGNoYXIgY29uc3QgKiwgc2l6ZV90LCBzaXplX3Qg KiwgY2hhciBjb25zdCAqKTsKIAogLyoga3dzZWFyY2guYyAqLwotZXh0ZXJuIHZvaWQgKkZj b21waWxlIChjaGFyIGNvbnN0ICosIHNpemVfdCwgcmVnX3N5bnRheF90KTsKK2V4dGVybiB2 b2lkICpGY29tcGlsZSAoY2hhciAqLCBzaXplX3QsIHJlZ19zeW50YXhfdCk7CiBleHRlcm4g c2l6ZV90IEZleGVjdXRlICh2b2lkICosIGNoYXIgY29uc3QgKiwgc2l6ZV90LCBzaXplX3Qg KiwgY2hhciBjb25zdCAqKTsKIAogLyogcGNyZXNlYXJjaC5jICovCi1leHRlcm4gdm9pZCAq UGNvbXBpbGUgKGNoYXIgY29uc3QgKiwgc2l6ZV90LCByZWdfc3ludGF4X3QpOworZXh0ZXJu IHZvaWQgKlBjb21waWxlIChjaGFyICosIHNpemVfdCwgcmVnX3N5bnRheF90KTsKIGV4dGVy biBzaXplX3QgUGV4ZWN1dGUgKHZvaWQgKiwgY2hhciBjb25zdCAqLCBzaXplX3QsIHNpemVf dCAqLCBjaGFyIGNvbnN0ICopOwogCiAvKiBncmVwLmMgKi8KIGV4dGVybiBzdHJ1Y3QgbG9j YWxlaW5mbyBsb2NhbGVpbmZvOworZXh0ZXJuIHZvaWQgZmdyZXBfdG9fZ3JlcF9wYXR0ZXJu IChjaGFyICoqLCBzaXplX3QgKik7CiAKIC8qIFJldHVybiB0aGUgbnVtYmVyIG9mIGJ5dGVz IGluIHRoZSBjaGFyYWN0ZXIgYXQgdGhlIHN0YXJ0IG9mIFMsIHdoaWNoCiAgICBpcyBvZiBz aXplIE4uICBOIG11c3QgYmUgcG9zaXRpdmUuICBNQlMgaXMgdGhlIGNvbnZlcnNpb24gc3Rh dGUuCi0tIAoyLjkuMwoK --------------A785D4151340D592057B7B1F Content-Type: text/plain; charset=UTF-8; name="0003-src-kwset.c-Fix-comment-typo.txt" Content-Transfer-Encoding: base64 Content-Disposition: attachment; filename="0003-src-kwset.c-Fix-comment-typo.txt" RnJvbSAxNWMxMjQxY2EwNGNjNDk4NDZjYzI1YThiZGFmZTJhZGZlOTE3NTBjIE1vbiBTZXAg MTcgMDA6MDA6MDAgMjAwMQpGcm9tOiBQYXVsIEVnZ2VydCA8ZWdnZXJ0QGNzLnVjbGEuZWR1 PgpEYXRlOiBNb24sIDE2IEphbiAyMDE3IDE3OjMxOjA2IC0wODAwClN1YmplY3Q6IFtQQVRD SCAzLzRdICogc3JjL2t3c2V0LmM6IEZpeCBjb21tZW50IHR5cG8uCgotLS0KIHNyYy9rd3Nl dC5jIHwgMiArLQogMSBmaWxlIGNoYW5nZWQsIDEgaW5zZXJ0aW9uKCspLCAxIGRlbGV0aW9u KC0pCgpkaWZmIC0tZ2l0IGEvc3JjL2t3c2V0LmMgYi9zcmMva3dzZXQuYwppbmRleCA3N2Vm N2M3Li5iYmI4MzdjIDEwMDY0NAotLS0gYS9zcmMva3dzZXQuYworKysgYi9zcmMva3dzZXQu YwpAQCAtMTksNyArMTksNyBAQAogCiAvKiBXcml0dGVuIEF1Z3VzdCAxOTg5IGJ5IE1pa2Ug SGFlcnRlbC4gICovCiAKLS8qIFRoZSBhbGdvcml0aG0gY2FsbGVkICJDb21tZW50el9XYWx0 ZXIiIGJlbG93IGJlYXJzIGEgc3RhcnRsaW5nIHJlc2VtYmxhbmNlCisvKiBUaGUgYWxnb3Jp dGhtIGNhbGxlZCAiQ29tbWVudHotV2FsdGVyIiBiZWxvdyBiZWFycyBhIHN0YXJ0bGluZyBy ZXNlbWJsYW5jZQogICAgdG8gdGhhdCBvZiBCZWF0ZSBDb21tZW50ei1XYWx0ZXIsIGFsdGhv dWdoIGl0IGlzIG5vdCBpZGVudGljYWwuCiAgICBTZWU6IENvbW1lbnR6LVdhbHRlciBCLiBB IHN0cmluZyBtYXRjaGluZyBhbGdvcml0aG0gZmFzdCBvbiB0aGUgYXZlcmFnZS4KICAgIExl Y3R1cmUgTm90ZXMgaW4gQ29tcHV0ZXIgU2NpZW5jZSA3MSAoMTk3OSksIDExOC0zMgotLSAK Mi45LjMKCg== --------------A785D4151340D592057B7B1F Content-Type: text/plain; charset=UTF-8; name="0004-NEWS-Fix-typo.txt" Content-Transfer-Encoding: base64 Content-Disposition: attachment; filename="0004-NEWS-Fix-typo.txt" RnJvbSBmNWVlZWM5NWQ5Y2NkMjJhNDc4NWNjOGVhOTEzNWRmY2ExMzg3Yjg5IE1vbiBTZXAg MTcgMDA6MDA6MDAgMjAwMQpGcm9tOiBQYXVsIEVnZ2VydCA8ZWdnZXJ0QGNzLnVjbGEuZWR1 PgpEYXRlOiBUdWUsIDE3IEphbiAyMDE3IDA4OjAwOjExIC0wODAwClN1YmplY3Q6IFtQQVRD SCA0LzRdICogTkVXUzogRml4IHR5cG8uCgotLS0KIE5FV1MgfCAyIC0tCiAxIGZpbGUgY2hh bmdlZCwgMiBkZWxldGlvbnMoLSkKCmRpZmYgLS1naXQgYS9ORVdTIGIvTkVXUwppbmRleCA3 NTMwNTg0Li4zNTI5ZjRlIDEwMDY0NAotLS0gYS9ORVdTCisrKyBiL05FV1MKQEAgLTksOCAr OSw2IEBAIEdOVSBncmVwIE5FV1MgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAg ICAtKi0gb3V0bGluZSAtKi0KICAgc3RhbmRhcmQgaW5wdXQgaXMgYSBwaXBlIGFuZCBzdGFu ZGFyZCBvdXRwdXQgaXMgaW4gYXBwZW5kIG1vZGUuCiAgIFtidWdzIGludHJvZHVjZWQgaW4g Z3JlcC0yLjI3XQogCi0qKiBCdWcgZml4ZXMKLQogICBGaXggcGVyZm9ybWFuY2UgcmVncmVz c2lvbiB3aXRoIG11bHRpcGxlIHBhdHRlcm5zLCBlLmcuLCBmb3IgLUZpIGluCiAgIGEgbXVs dGktYnl0ZSBsb2NhbGUsIG9yIGZvciAtRncgaW4gYSBzaW5nbGUtYnl0ZSBsb2NhbGUuCiAg IFtidWdzIGludHJvZHVjZWQgaW4gZ3JlcC0yLjE5LCBncmVwLTIuMjIgYW5kIGdyZXAtMi4y Nl0KLS0gCjIuOS4zCgo= --------------A785D4151340D592057B7B1F-- From unknown Sun Aug 17 09:09:24 2025 Received: (at fakecontrol) by fakecontrolmessage; To: internal_control@debbugs.gnu.org From: Debbugs Internal Request Subject: Internal Control Message-Id: bug archived. Date: Wed, 15 Feb 2017 12:24:04 +0000 User-Agent: Fakemail v42.6.9 # This is a fake control message. # # The action: # bug archived. thanks # This fakemail brought to you by your local debbugs # administrator