From debbugs-submit-bounces@debbugs.gnu.org Tue Apr 14 22:20:43 2020 Received: (at submit) by debbugs.gnu.org; 15 Apr 2020 02:20:43 +0000 Received: from localhost ([127.0.0.1]:35053 helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1jOXfM-0002Yx-28 for submit@debbugs.gnu.org; Tue, 14 Apr 2020 22:20:43 -0400 Received: from lists.gnu.org ([209.51.188.17]:43609) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1jOVtW-0004AB-HD for submit@debbugs.gnu.org; Tue, 14 Apr 2020 20:27:12 -0400 Received: from eggs.gnu.org ([2001:470:142:3::10]:53876) by lists.gnu.org with esmtp (Exim 4.90_1) (envelope-from ) id 1jOVtU-0003yN-QC for bug-grep@gnu.org; Tue, 14 Apr 2020 20:27:10 -0400 X-Spam-Checker-Version: SpamAssassin 3.3.2 (2011-06-06) on eggs.gnu.org X-Spam-Level: X-Spam-Status: No, score=0.8 required=5.0 tests=BAYES_50,FREEMAIL_FROM, RCVD_IN_DNSWL_NONE,URIBL_BLOCKED autolearn=disabled version=3.3.2 Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71) (envelope-from ) id 1jOVtQ-0003Io-SR for bug-grep@gnu.org; Tue, 14 Apr 2020 20:27:05 -0400 Received: from nh601-vm9.bullet.mail.ssk.yahoo.co.jp ([182.22.90.18]:30744) by eggs.gnu.org with smtp (Exim 4.71) (envelope-from ) id 1jOVtP-0003Hg-V3 for bug-grep@gnu.org; Tue, 14 Apr 2020 20:27:04 -0400 Received: from [182.22.66.105] by nh601.bullet.mail.ssk.yahoo.co.jp with NNFMP; 15 Apr 2020 00:26:56 -0000 Received: from [182.22.91.129] by t603.bullet.mail.ssk.yahoo.co.jp with NNFMP; 15 Apr 2020 00:26:56 -0000 Received: from [127.0.0.1] by omp602.mail.ssk.yahoo.co.jp with NNFMP; 15 Apr 2020 00:26:56 -0000 X-Yahoo-Newman-Property: ymail-3 X-Yahoo-Newman-Id: 737463.11452.bm@omp602.mail.ssk.yahoo.co.jp X-YMail-OSG: xQPxqAAVM1myqlixxifzURz9e1yDLSG30VuTvNRI.srf8JZPiK5gEaxsqgntMtG lT5DyX4dPNZXjKaC0z6G2Xg1tJwSDz303iGzdQc9Nm2tWbWwjLby5qSsK1ekGHhK3rlN4k3woLfv K.tIxMFEx7YtGAVW3iuAUQpOy3tFQSdv_RiKUUonORlK.o6epHS.MPmZXlGVg39GTN.4tjgXvXBZ rCb0pHyK23ERNeRAHFnSWcJosyJQI9E3nHTbdbJNagDpg7oFNlJlgqzQebksQWzr.1je0_qDB_Eq J57FrQZ8NoeBZ5XGYs50qeKQ7l8TMtUPdFR_khYxq6zaIqD6f_6NvdKPWBzDyU0HLt0Vk7AMQrNG 63RyB06yakuEGkINUIcuWvYXaa_DWDmPwIeWmZ8er0Jh359NnRXgnPCLHYQz73pFMHnwXr6eMVSM CCP4gNRhHGJxCyJJ7C0x8ioKtTCSuoR18onljo2lKFSt_vyDFLeGJXRYppjBcdRhiDIuNjtE1WBX aZhn90ZGWFTxJwZU2Ng6ZIRoGAlpoZ8PioR3DMkAVnJTZRKutQJJd7uh.imcLcRhr11Va3ppqLkT HVvtLTVaJPhh6w71B68OKVCo5KcwBOYDo_Jqx0QVA.BPnFy7PLpqjt9yp1j4Cq4GD2MQbhKDEIf0 01l0WXxsSMgquRzODPaOgGAc736xgc7fCrymKEE9SrH1CWnEXWA65qhntF_TDL15Kuv62wuGsu5C oVqu_sWdIfxZ4NG.KD7xtVPE6 Received: from jws704007.mail.kks.yahoo.co.jp by sendmailws520.mail.kks.yahoo.co.jp; Wed, 15 Apr 2020 09:26:56 +0000; 1586910416.133 DKIM-Signature: v=1; a=rsa-sha256; q=dns/txt; c=relaxed/relaxed; t=1586910416; s=yj20110701; d=yahoo.co.jp; h=Date:From:Reply-To:To:Message-ID:Subject:MIME-Version:Content-Type:Content-Transfer-Encoding:References; bh=SuMR0lFX59CKI/YSiW1e3+WvDIbueVZXjqD25gZA/YI=; b=b0hD/8mGHIjtcZ2GYVloRdVOyZeAroPoYz0MWrrWedMwlNBeU9h31xLdQNYMSnaD H/IlkijloiLHE5C3HA1gYl88HgRaw/QZXfk+YlQbUG2SUMlThmYFzaD2FmPUF+KEkW7 f7Cbn1m8wlN9pKd51e1JsSa3AcLVATzK+N627HvE= DomainKey-Signature: a=rsa-sha1; q=dns; c=nofws; s=yj20110701; d=yahoo.co.jp; h=Date:From:Reply-To:Message-ID:MIME-Version:Content-Type:Content-Transfer-Encoding:References; b=Iw6j1fkGQkm7QyHZafN7jER7jaLTAOs2aw8WMfr52XkVdK/S4NnxkPsfmJVxBkMb UjdHLQyUEZBtkyM1oPje1KqIvq1LNSuMM/08NjfbCjT5aOOvfPP7qyr9TwHU6Q5xeuW fYWi8Ko4ulqz3be9fWYQjKhUpKIkw8bXIWObYJMU=; Date: Wed, 15 Apr 2020 09:26:55 +0900 (JST) From: fryasu@yahoo.co.jp To: "bug-grep@gnu.org" Message-ID: <581950183.837648.1586910415729.JavaMail.yahoo@mail.yahoo.co.jp> Subject: Massive pattern list handling with -E format seems very slow since 2.28. MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: quoted-printable References: <581950183.837648.1586910415729.JavaMail.yahoo.ref@mail.yahoo.co.jp> Content-Length: 953 X-detected-operating-system: by eggs.gnu.org: FreeBSD 8.x [fuzzy] X-Received-From: 182.22.90.18 X-Spam-Score: 0.3 (/) X-Debbugs-Envelope-To: submit X-Mailman-Approved-At: Tue, 14 Apr 2020 22:20:38 -0400 X-BeenThere: debbugs-submit@debbugs.gnu.org X-Mailman-Version: 2.1.18 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Reply-To: fryasu@yahoo.co.jp Errors-To: debbugs-submit-bounces@debbugs.gnu.org Sender: "Debbugs-submit" X-Spam-Score: -0.7 (/) Hi, Massive pattern list handling with -E format seems very slow, since grep 2.28. Conversion from -E format to -F format may have problem about performance. When the processing time is measured by the script below, the result isobviously different between 2.28 and 2.27. ---- #!/bin/bash export LC_ALL=3DC rm -f grep-patterns.txt for i in {1..2000}; do =C2=A0=C2=A0=C2=A0=C2=A0 pat=3D$(echo -n "$i" | sha1sum | cut -f1 -d ' ') =C2=A0=C2=A0=C2=A0=C2=A0 echo -e "$pat$pat(\$|$pat)" >> grep-patterns.txt done echo executing grep... time grep -E -v -m1 -f grep-patterns.txt /dev/null ---- The following is the results in my PC with fedora's RPM. https://koji.fedoraproject.org/koji/packageinfo?packageID=3D1023 - result with grep 2.28 =C2=A0 real 0m11.087s / user 0m11.027s / sys 0m0.037s - result with grep 2.27 =C2=A0 real 0m0.144s / user 0m0.116s / sys 0m0.027s With also recent 3.4, result is same. I hope you find it useful. regards, From debbugs-submit-bounces@debbugs.gnu.org Thu Apr 16 02:57:11 2020 Received: (at 40634) by debbugs.gnu.org; 16 Apr 2020 06:57:11 +0000 Received: from localhost ([127.0.0.1]:37413 helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1jOySU-0006jL-Pc for submit@debbugs.gnu.org; Thu, 16 Apr 2020 02:57:11 -0400 Received: from mailgw03.kcn.ne.jp ([61.86.7.210]:42195) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1jOySS-0006j7-NS for 40634@debbugs.gnu.org; Thu, 16 Apr 2020 02:57:09 -0400 Received: from mxs02-s (mailgw2.kcn.ne.jp [61.86.15.234]) by mailgw03.kcn.ne.jp (Postfix) with ESMTP id 6222113F905 for <40634@debbugs.gnu.org>; Thu, 16 Apr 2020 15:57:01 +0900 (JST) X-matriXscan-loop-detect: c677f73f25cada95bc6f2d646e7e8c6d0b105a80 Received: from mail12.kcn.ne.jp ([61.86.6.130]) by mxs02-s with ESMTP; Thu, 16 Apr 2020 15:56:58 +0900 (JST) Received: from [10.120.1.110] (i118-21-128-66.s30.a048.ap.plala.or.jp [118.21.128.66]) by mail12.kcn.ne.jp (Postfix) with ESMTPA id 9ACD040A5B53; Thu, 16 Apr 2020 15:56:58 +0900 (JST) Date: Thu, 16 Apr 2020 15:56:58 +0900 From: Norihiro Tanaka To: 40634@debbugs.gnu.org Subject: Re: bug#40634: Massive pattern list handling with -E format seems very slow since 2.28. In-Reply-To: <581950183.837648.1586910415729.JavaMail.yahoo@mail.yahoo.co.jp> References: <581950183.837648.1586910415729.JavaMail.yahoo.ref@mail.yahoo.co.jp> <581950183.837648.1586910415729.JavaMail.yahoo@mail.yahoo.co.jp> Message-Id: <20200416155657.8753.27F6AC2D@kcn.ne.jp> MIME-Version: 1.0 Content-Type: text/plain; charset="US-ASCII" Content-Transfer-Encoding: 7bit X-Mailer: Becky! ver. 2.74.02 [ja] X-matriXscan-msec-AV: Clean X-matriXscan-Action: Approve X-matriXscan: Uncategorized X-Spam-Score: 0.0 (/) X-Debbugs-Envelope-To: 40634 Cc: fryasu@yahoo.co.jp X-BeenThere: debbugs-submit@debbugs.gnu.org X-Mailman-Version: 2.1.18 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: debbugs-submit-bounces@debbugs.gnu.org Sender: "Debbugs-submit" X-Spam-Score: -1.0 (-) + grep-2.2/src/grep -E -v -m1 -f grep-patterns.txt /dev/null grep-2.2/src/grep: invalid option -- 'm' Usage: grep [OPTION]... PATTERN [FILE]... Try `grep --help' for more information. real 0.00 user 0.00 sys 0.00 + grep-2.3/src/grep -E -v -m1 -f grep-patterns.txt /dev/null grep-2.3/src/grep: invalid option -- 'm' Usage: grep [OPTION]... PATTERN [FILE]... Try `grep --help' for more information. real 0.00 user 0.00 sys 0.00 + grep-2.4/src/grep -E -v -m1 -f grep-patterns.txt /dev/null grep-2.4/src/grep: invalid option -- 'm' Usage: grep [OPTION]... PATTERN [FILE]... Try `grep --help' for more information. real 0.00 user 0.00 sys 0.00 + grep-2.4.1/src/grep -E -v -m1 -f grep-patterns.txt /dev/null grep-2.4.1/src/grep: invalid option -- 'm' Usage: grep [OPTION]... PATTERN [FILE]... Try `grep --help' for more information. real 0.00 user 0.00 sys 0.00 + grep-2.4.2/src/grep -E -v -m1 -f grep-patterns.txt /dev/null grep-2.4.2/src/grep: invalid option -- 'm' Usage: grep [OPTION]... PATTERN [FILE]... Try `grep --help' for more information. real 0.00 user 0.00 sys 0.00 + grep-2.5.4/src/grep -E -v -m1 -f grep-patterns.txt /dev/null real 13.58 user 13.43 sys 0.14 + grep-2.6/src/grep -E -v -m1 -f grep-patterns.txt /dev/null real 3.95 user 3.51 sys 0.42 + grep-2.6.1/src/grep -E -v -m1 -f grep-patterns.txt /dev/null real 3.97 user 3.86 sys 0.11 + grep-2.6.2/src/grep -E -v -m1 -f grep-patterns.txt /dev/null real 3.98 user 3.69 sys 0.28 + grep-2.6.3/src/grep -E -v -m1 -f grep-patterns.txt /dev/null real 3.99 user 3.94 sys 0.04 + grep-2.7/src/grep -E -v -m1 -f grep-patterns.txt /dev/null real 3.93 user 3.68 sys 0.24 + grep-2.8/src/grep -E -v -m1 -f grep-patterns.txt /dev/null real 3.89 user 3.83 sys 0.05 + grep-2.9/src/grep -E -v -m1 -f grep-patterns.txt /dev/null real 3.92 user 3.63 sys 0.27 + grep-2.10/src/grep -E -v -m1 -f grep-patterns.txt /dev/null real 3.93 user 3.65 sys 0.27 + grep-2.11/src/grep -E -v -m1 -f grep-patterns.txt /dev/null real 3.98 user 3.89 sys 0.08 + grep-2.12/src/grep -E -v -m1 -f grep-patterns.txt /dev/null real 3.95 user 3.87 sys 0.06 + grep-2.13/src/grep -E -v -m1 -f grep-patterns.txt /dev/null real 3.97 user 3.85 sys 0.11 + grep-2.14/src/grep -E -v -m1 -f grep-patterns.txt /dev/null real 4.01 user 3.91 sys 0.09 + grep-2.15/src/grep -E -v -m1 -f grep-patterns.txt /dev/null real 4.05 user 3.99 sys 0.05 + grep-2.16/src/grep -E -v -m1 -f grep-patterns.txt /dev/null real 3.91 user 3.50 sys 0.40 + grep-2.17/src/grep -E -v -m1 -f grep-patterns.txt /dev/null real 3.98 user 3.94 sys 0.03 + grep-2.18/src/grep -E -v -m1 -f grep-patterns.txt /dev/null real 3.98 user 3.87 sys 0.10 + grep-2.19/src/grep -E -v -m1 -f grep-patterns.txt /dev/null real 0.54 user 0.50 sys 0.03 + grep-2.20/src/grep -E -v -m1 -f grep-patterns.txt /dev/null real 0.62 user 0.57 sys 0.04 + grep-2.21/src/grep -E -v -m1 -f grep-patterns.txt /dev/null real 0.43 user 0.41 sys 0.02 + grep-2.22/src/grep -E -v -m1 -f grep-patterns.txt /dev/null real 0.18 user 0.16 sys 0.01 + grep-2.23/src/grep -E -v -m1 -f grep-patterns.txt /dev/null real 0.18 user 0.13 sys 0.04 + grep-2.24/src/grep -E -v -m1 -f grep-patterns.txt /dev/null real 0.18 user 0.13 sys 0.04 + grep-2.25/src/grep -E -v -m1 -f grep-patterns.txt /dev/null real 0.18 user 0.16 sys 0.01 + grep-2.26/src/grep -E -v -m1 -f grep-patterns.txt /dev/null real 0.18 user 0.14 sys 0.04 + grep-2.27/src/grep -E -v -m1 -f grep-patterns.txt /dev/null real 0.17 user 0.15 sys 0.02 + grep-2.28/src/grep -E -v -m1 -f grep-patterns.txt /dev/null real 7.22 user 7.14 sys 0.07 + grep-3.0/src/grep -E -v -m1 -f grep-patterns.txt /dev/null real 7.26 user 7.11 sys 0.14 + grep-3.1/src/grep -E -v -m1 -f grep-patterns.txt /dev/null real 7.22 user 6.88 sys 0.33 + grep-3.2/src/grep -E -v -m1 -f grep-patterns.txt /dev/null real 7.20 user 7.04 sys 0.15 + grep-3.3/src/grep -E -v -m1 -f grep-patterns.txt /dev/null real 7.24 user 7.17 sys 0.07 + grep-3.4/src/grep -E -v -m1 -f grep-patterns.txt /dev/null real 7.13 user 6.49 sys 0.63 It seems to a lot of time is spent in dfa.c:replace(). It was added at d6df3873c7abc243683d0e8fccbfde4e76f23e53 in gnulib. From debbugs-submit-bounces@debbugs.gnu.org Thu Apr 16 12:31:44 2020 Received: (at 40634) by debbugs.gnu.org; 16 Apr 2020 16:31:44 +0000 Received: from localhost ([127.0.0.1]:39224 helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1jP7QV-0004XW-Tx for submit@debbugs.gnu.org; Thu, 16 Apr 2020 12:31:44 -0400 Received: from zimbra.cs.ucla.edu ([131.179.128.68]:36702) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1jP7QS-0004X4-10 for 40634@debbugs.gnu.org; Thu, 16 Apr 2020 12:31:41 -0400 Received: from localhost (localhost [127.0.0.1]) by zimbra.cs.ucla.edu (Postfix) with ESMTP id 066DA1600B2; Thu, 16 Apr 2020 09:31:34 -0700 (PDT) Received: from zimbra.cs.ucla.edu ([127.0.0.1]) by localhost (zimbra.cs.ucla.edu [127.0.0.1]) (amavisd-new, port 10032) with ESMTP id CAAqHgpiOtac; Thu, 16 Apr 2020 09:31:33 -0700 (PDT) Received: from localhost (localhost [127.0.0.1]) by zimbra.cs.ucla.edu (Postfix) with ESMTP id 5ED071600C7; Thu, 16 Apr 2020 09:31:33 -0700 (PDT) X-Virus-Scanned: amavisd-new at zimbra.cs.ucla.edu Received: from zimbra.cs.ucla.edu ([127.0.0.1]) by localhost (zimbra.cs.ucla.edu [127.0.0.1]) (amavisd-new, port 10026) with ESMTP id MiGrB6sXPbqZ; Thu, 16 Apr 2020 09:31:33 -0700 (PDT) Received: from [192.168.1.9] (cpe-23-242-74-103.socal.res.rr.com [23.242.74.103]) by zimbra.cs.ucla.edu (Postfix) with ESMTPSA id 3509B1600B2; Thu, 16 Apr 2020 09:31:33 -0700 (PDT) Subject: Re: bug#40634: Massive pattern list handling with -E format seems very slow since 2.28. To: Norihiro Tanaka , 40634@debbugs.gnu.org References: <581950183.837648.1586910415729.JavaMail.yahoo.ref@mail.yahoo.co.jp> <581950183.837648.1586910415729.JavaMail.yahoo@mail.yahoo.co.jp> <20200416155657.8753.27F6AC2D@kcn.ne.jp> From: Paul Eggert Organization: UCLA Computer Science Department Message-ID: <0f97b14a-bfd8-7c24-c3cf-b4d370589433@cs.ucla.edu> Date: Thu, 16 Apr 2020 09:31:32 -0700 User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:68.0) Gecko/20100101 Thunderbird/68.7.0 MIME-Version: 1.0 In-Reply-To: <20200416155657.8753.27F6AC2D@kcn.ne.jp> Content-Type: text/plain; charset=utf-8; format=flowed Content-Language: en-US Content-Transfer-Encoding: 7bit X-Spam-Score: -2.3 (--) X-Debbugs-Envelope-To: 40634 Cc: fryasu@yahoo.co.jp X-BeenThere: debbugs-submit@debbugs.gnu.org X-Mailman-Version: 2.1.18 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: debbugs-submit-bounces@debbugs.gnu.org Sender: "Debbugs-submit" X-Spam-Score: -3.3 (---) On 4/15/20 11:56 PM, Norihiro Tanaka wrote: > It seems to a lot of time is spent in dfa.c:replace(). > It was added at d6df3873c7abc243683d0e8fccbfde4e76f23e53 in gnulib. It would be pretty drastic to revert that patch. Is there some better way to move forward? From debbugs-submit-bounces@debbugs.gnu.org Thu Apr 16 18:53:24 2020 Received: (at 40634) by debbugs.gnu.org; 16 Apr 2020 22:53:24 +0000 Received: from localhost ([127.0.0.1]:39547 helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1jPDNs-0003Hz-Hf for submit@debbugs.gnu.org; Thu, 16 Apr 2020 18:53:24 -0400 Received: from mailgw07.kcn.ne.jp ([61.86.7.214]:50538) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1jPDNq-0003Hl-8z for 40634@debbugs.gnu.org; Thu, 16 Apr 2020 18:53:23 -0400 Received: from mxs02-s (mailgw2.kcn.ne.jp [61.86.15.234]) by mailgw07.kcn.ne.jp (Postfix) with ESMTP id 05338410C7 for <40634@debbugs.gnu.org>; Fri, 17 Apr 2020 07:53:15 +0900 (JST) X-matriXscan-loop-detect: 092dc6bad9f14809bb40e91634e4595cbfc2582f Received: from mail14.kcn.ne.jp ([61.86.6.132]) by mxs02-s with ESMTP; Fri, 17 Apr 2020 07:53:13 +0900 (JST) Received: from [10.120.1.110] (i118-21-128-66.s30.a048.ap.plala.or.jp [118.21.128.66]) by mail14.kcn.ne.jp (Postfix) with ESMTPA id 5EF22416C35C; Fri, 17 Apr 2020 07:53:13 +0900 (JST) Date: Fri, 17 Apr 2020 07:53:12 +0900 From: Norihiro Tanaka To: Paul Eggert Subject: Re: bug#40634: Massive pattern list handling with -E format seems very slow since 2.28. In-Reply-To: <0f97b14a-bfd8-7c24-c3cf-b4d370589433@cs.ucla.edu> References: <20200416155657.8753.27F6AC2D@kcn.ne.jp> <0f97b14a-bfd8-7c24-c3cf-b4d370589433@cs.ucla.edu> Message-Id: <20200417075312.8757.27F6AC2D@kcn.ne.jp> MIME-Version: 1.0 Content-Type: text/plain; charset="US-ASCII" Content-Transfer-Encoding: 7bit X-Mailer: Becky! ver. 2.74.02 [ja] X-matriXscan-msec-AV: Clean X-matriXscan-Action: Approve X-matriXscan: Uncategorized X-Spam-Score: 0.0 (/) X-Debbugs-Envelope-To: 40634 Cc: fryasu@yahoo.co.jp, 40634@debbugs.gnu.org X-BeenThere: debbugs-submit@debbugs.gnu.org X-Mailman-Version: 2.1.18 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: debbugs-submit-bounces@debbugs.gnu.org Sender: "Debbugs-submit" X-Spam-Score: -1.0 (-) On Thu, 16 Apr 2020 09:31:32 -0700 Paul Eggert wrote: > On 4/15/20 11:56 PM, Norihiro Tanaka wrote: > > > It seems to a lot of time is spent in dfa.c:replace(). > > It was added at d6df3873c7abc243683d0e8fccbfde4e76f23e53 in gnulib. > > It would be pretty drastic to revert that patch. Is there some better way to move forward? I have had no idea to solve the problem yet. If we revert it, bug#33357 will come back. From debbugs-submit-bounces@debbugs.gnu.org Thu Apr 16 19:00:39 2020 Received: (at 40634) by debbugs.gnu.org; 16 Apr 2020 23:00:39 +0000 Received: from localhost ([127.0.0.1]:39551 helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1jPDUt-0003T6-9z for submit@debbugs.gnu.org; Thu, 16 Apr 2020 19:00:39 -0400 Received: from zimbra.cs.ucla.edu ([131.179.128.68]:55290) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1jPDUr-0003Ss-26 for 40634@debbugs.gnu.org; Thu, 16 Apr 2020 19:00:37 -0400 Received: from localhost (localhost [127.0.0.1]) by zimbra.cs.ucla.edu (Postfix) with ESMTP id 826A8160066; Thu, 16 Apr 2020 16:00:30 -0700 (PDT) Received: from zimbra.cs.ucla.edu ([127.0.0.1]) by localhost (zimbra.cs.ucla.edu [127.0.0.1]) (amavisd-new, port 10032) with ESMTP id V_F0nEfyLGtB; Thu, 16 Apr 2020 16:00:29 -0700 (PDT) Received: from localhost (localhost [127.0.0.1]) by zimbra.cs.ucla.edu (Postfix) with ESMTP id D9311160085; Thu, 16 Apr 2020 16:00:29 -0700 (PDT) X-Virus-Scanned: amavisd-new at zimbra.cs.ucla.edu Received: from zimbra.cs.ucla.edu ([127.0.0.1]) by localhost (zimbra.cs.ucla.edu [127.0.0.1]) (amavisd-new, port 10026) with ESMTP id dcJibFV7nj8n; Thu, 16 Apr 2020 16:00:29 -0700 (PDT) Received: from [192.168.1.9] (cpe-23-242-74-103.socal.res.rr.com [23.242.74.103]) by zimbra.cs.ucla.edu (Postfix) with ESMTPSA id 779D0160066; Thu, 16 Apr 2020 16:00:29 -0700 (PDT) Subject: Re: bug#40634: Massive pattern list handling with -E format seems very slow since 2.28. To: Norihiro Tanaka References: <20200416155657.8753.27F6AC2D@kcn.ne.jp> <0f97b14a-bfd8-7c24-c3cf-b4d370589433@cs.ucla.edu> <20200417075312.8757.27F6AC2D@kcn.ne.jp> From: Paul Eggert Organization: UCLA Computer Science Department Message-ID: <6503eb8e-e6fd-b4dd-aab7-76bb6955d87b@cs.ucla.edu> Date: Thu, 16 Apr 2020 16:00:29 -0700 User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:68.0) Gecko/20100101 Thunderbird/68.7.0 MIME-Version: 1.0 In-Reply-To: <20200417075312.8757.27F6AC2D@kcn.ne.jp> Content-Type: text/plain; charset=utf-8; format=flowed Content-Language: en-US Content-Transfer-Encoding: 7bit X-Spam-Score: -2.3 (--) X-Debbugs-Envelope-To: 40634 Cc: fryasu@yahoo.co.jp, 40634@debbugs.gnu.org X-BeenThere: debbugs-submit@debbugs.gnu.org X-Mailman-Version: 2.1.18 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: debbugs-submit-bounces@debbugs.gnu.org Sender: "Debbugs-submit" X-Spam-Score: -3.3 (---) On 4/16/20 3:53 PM, Norihiro Tanaka wrote: > I have had no idea to solve the problem yet. If we revert it, bug#33357 > will come back. Yes, I'd rather not revert if we can help it. My own thought was to not analyze the regular expression if we discover that the input is empty. :-) From debbugs-submit-bounces@debbugs.gnu.org Thu Apr 16 20:35:47 2020 Received: (at 40634) by debbugs.gnu.org; 17 Apr 2020 00:35:47 +0000 Received: from localhost ([127.0.0.1]:39700 helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1jPEyx-0005oY-49 for submit@debbugs.gnu.org; Thu, 16 Apr 2020 20:35:47 -0400 Received: from mailgw03.kcn.ne.jp ([61.86.7.210]:43362) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1jPEyw-0005oL-0N for 40634@debbugs.gnu.org; Thu, 16 Apr 2020 20:35:46 -0400 Received: from mxs01-s (mailgw1.kcn.ne.jp [61.86.15.233]) by mailgw03.kcn.ne.jp (Postfix) with ESMTP id C4FA713F911 for <40634@debbugs.gnu.org>; Fri, 17 Apr 2020 09:35:38 +0900 (JST) X-matriXscan-loop-detect: d2b092ae01d2dc9213a2e5009a205f0c3153b875 Received: from mail10.kcn.ne.jp ([61.86.6.128]) by mxs01-s with ESMTP; Fri, 17 Apr 2020 09:35:38 +0900 (JST) Received: from [10.120.1.110] (i118-21-128-66.s30.a048.ap.plala.or.jp [118.21.128.66]) by mail10.kcn.ne.jp (Postfix) with ESMTPA id 047844014574; Fri, 17 Apr 2020 09:35:37 +0900 (JST) Date: Fri, 17 Apr 2020 09:35:36 +0900 From: Norihiro Tanaka To: Paul Eggert Subject: Re: bug#40634: Massive pattern list handling with -E format seems very slow since 2.28. In-Reply-To: <6503eb8e-e6fd-b4dd-aab7-76bb6955d87b@cs.ucla.edu> References: <20200417075312.8757.27F6AC2D@kcn.ne.jp> <6503eb8e-e6fd-b4dd-aab7-76bb6955d87b@cs.ucla.edu> Message-Id: <20200417093536.875E.27F6AC2D@kcn.ne.jp> MIME-Version: 1.0 Content-Type: text/plain; charset="US-ASCII" Content-Transfer-Encoding: 7bit X-Mailer: Becky! ver. 2.74.02 [ja] X-matriXscan-msec-AV: Clean X-matriXscan-Action: Approve X-matriXscan: Uncategorized X-Spam-Score: 0.0 (/) X-Debbugs-Envelope-To: 40634 Cc: fryasu@yahoo.co.jp, 40634@debbugs.gnu.org X-BeenThere: debbugs-submit@debbugs.gnu.org X-Mailman-Version: 2.1.18 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: debbugs-submit-bounces@debbugs.gnu.org Sender: "Debbugs-submit" X-Spam-Score: -1.0 (-) On Thu, 16 Apr 2020 16:00:29 -0700 Paul Eggert wrote: > On 4/16/20 3:53 PM, Norihiro Tanaka wrote: > > > I have had no idea to solve the problem yet. If we revert it, bug#33357 > > will come back. > > Yes, I'd rather not revert if we can help it. > > My own thought was to not analyze the regular expression if we discover that the input is empty. :-) Now, I have a idea, it is that we build indexes of epsilon nodes including in follows before remove epsilon nodes. From debbugs-submit-bounces@debbugs.gnu.org Thu Apr 16 21:24:54 2020 Received: (at 40634) by debbugs.gnu.org; 17 Apr 2020 01:24:54 +0000 Received: from localhost ([127.0.0.1]:39739 helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1jPFkT-0002gK-Nq for submit@debbugs.gnu.org; Thu, 16 Apr 2020 21:24:54 -0400 Received: from mailgw06.kcn.ne.jp ([61.86.7.213]:37208) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1jPFkR-0002g5-51 for 40634@debbugs.gnu.org; Thu, 16 Apr 2020 21:24:52 -0400 Received: from mxs02-s (mailgw2.kcn.ne.jp [61.86.15.234]) by mailgw06.kcn.ne.jp (Postfix) with ESMTP id 45560C8026 for <40634@debbugs.gnu.org>; Fri, 17 Apr 2020 10:24:44 +0900 (JST) X-matriXscan-loop-detect: 720f28862985e530c57a51f59961a2758d61e2bb Received: from mail10.kcn.ne.jp ([61.86.6.128]) by mxs02-s with ESMTP; Fri, 17 Apr 2020 10:24:43 +0900 (JST) Received: from [10.120.1.110] (i118-21-128-66.s30.a048.ap.plala.or.jp [118.21.128.66]) by mail10.kcn.ne.jp (Postfix) with ESMTPA id 15DA440E3754; Fri, 17 Apr 2020 10:24:43 +0900 (JST) Date: Fri, 17 Apr 2020 10:24:42 +0900 From: Norihiro Tanaka To: <40634@debbugs.gnu.org> Subject: Re: bug#40634: Massive pattern list handling with -E format seems very slow since 2.28. In-Reply-To: <20200417093536.875E.27F6AC2D@kcn.ne.jp> References: <6503eb8e-e6fd-b4dd-aab7-76bb6955d87b@cs.ucla.edu> <20200417093536.875E.27F6AC2D@kcn.ne.jp> Message-Id: <20200417102441.8766.27F6AC2D@kcn.ne.jp> MIME-Version: 1.0 Content-Type: multipart/mixed; boundary="------_5E99047C000000008756_MULTIPART_MIXED_" Content-Transfer-Encoding: 7bit X-Mailer: Becky! ver. 2.74.02 [ja] X-matriXscan-msec-AV: Clean X-matriXscan-Action: Approve X-matriXscan: Uncategorized X-Spam-Score: 0.0 (/) X-Debbugs-Envelope-To: 40634 Cc: fryasu@yahoo.co.jp, Paul Eggert , bug-gnulib@gnu.org X-BeenThere: debbugs-submit@debbugs.gnu.org X-Mailman-Version: 2.1.18 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: debbugs-submit-bounces@debbugs.gnu.org Sender: "Debbugs-submit" X-Spam-Score: -1.0 (-) --------_5E99047C000000008756_MULTIPART_MIXED_ Content-Type: text/plain; charset="US-ASCII" Content-Transfer-Encoding: 7bit On Fri, 17 Apr 2020 09:35:36 +0900 Norihiro Tanaka wrote: > > On Thu, 16 Apr 2020 16:00:29 -0700 > Paul Eggert wrote: > > > On 4/16/20 3:53 PM, Norihiro Tanaka wrote: > > > > > I have had no idea to solve the problem yet. If we revert it, bug#33357 > > > will come back. > > > > Yes, I'd rather not revert if we can help it. > > > > My own thought was to not analyze the regular expression if we discover that the input is empty. :-) > > Now, I have a idea, it is that we build indexes of epsilon nodes > including in follows before remove epsilon nodes. I wrote fix for the bug, but it will be slower then at grep 2.27 yet. --------_5E99047C000000008756_MULTIPART_MIXED_ Content-Type: text/plain; charset="US-ASCII"; name="0001-dfa-build-auxiliary-indexes-before-remove-epsilon-cl.patch" Content-Disposition: attachment; filename="0001-dfa-build-auxiliary-indexes-before-remove-epsilon-cl.patch" Content-Transfer-Encoding: base64 RnJvbSBlNTY4MGE0YzQzYjE4NWRmNmI4YzFiODA0MjNkNjYzZGU2ZjZlMzdlIE1vbiBTZXAgMTcg MDA6MDA6MDAgMjAwMQpGcm9tOiBOb3JpaGlybyBUYW5ha2EgPG5vcml0bmtAa2NuLm5lLmpwPgpE YXRlOiBGcmksIDE3IEFwciAyMDIwIDEwOjEyOjAxICswOTAwClN1YmplY3Q6IFtQQVRDSF0gZGZh OiBidWlsZCBhdXhpbGlhcnkgaW5kZXhlcyBiZWZvcmUgcmVtb3ZlIGVwc2lsb24gY2xvc3VyZQoK V2hlbiByZW1vdmUgZXBzaWxvbiBjbG9zdXJlLCBzbyBmYXIgc2VhcmNoZWQgbm9kZXMgaW5jbHVk aW5nIGVwc2lsb24gY2xvc3VyZQppbiBhbGwgbm9kZXMgc2VxdWVudGlhbGx5LCBidXQgaXQgaXMg c2xvdyBmb3Igc29tZSBjYXNlcy4gIE5vdyBidWlsZAphdXhpbGlhcnkgaW5kZXhlcyBiZWZvcmUg c2VhcmNoLgoKUHJvYmxlbSByZXBvcnRlZCBpbjogaHR0cHM6Ly9kZWJidWdzLmdudS5vcmcvY2dp L2J1Z3JlcG9ydC5jZ2k/YnVnPTQwNjM0CgoqIGxpYi9kZmEuYyAob3ZlcndyYXApOiBOZXcgZnVu Y3Rpb24uCihlcHNjbG9zdXJlKTogYnVpbGQgYXV4aWxpYXJ5IGluZGV4ZXMgYmVmb3JlIHJlbW92 ZSBlcHNpbG9uIGNsb3N1cmUuCi0tLQogbGliL2RmYS5jIHwgICA1MiArKysrKysrKysrKysrKysr KysrKysrKysrKysrKysrKysrKysrKysrKysrKystLS0tLS0tCiAxIGZpbGVzIGNoYW5nZWQsIDQ1 IGluc2VydGlvbnMoKyksIDcgZGVsZXRpb25zKC0pCgpkaWZmIC0tZ2l0IGEvbGliL2RmYS5jIGIv bGliL2RmYS5jCmluZGV4IDk5MzlkMjIuLjk1ODA2OWUgMTAwNjQ0Ci0tLSBhL2xpYi9kZmEuYwor KysgYi9saWIvZGZhLmMKQEAgLTIyMDEsNiArMjIwMSwyMiBAQCByZXBsYWNlIChwb3NpdGlvbl9z ZXQgKmRzdCwgaWR4X3QgZGVsLCBwb3NpdGlvbl9zZXQgKmFkZCwKICAgICB9CiB9CiAKK3N0YXRp YyBib29sCitvdmVyd3JhcCAocG9zaXRpb25fc2V0IGNvbnN0ICpzLCBpZHhfdCBjb25zdCAqZWxl bXMsIGlkeF90IG5lbGVtKQoreworICBpZHhfdCBpID0gMCwgaiA9IDA7CisKKyAgd2hpbGUgKGkg PCBzLT5uZWxlbSAmJiBqIDwgbmVsZW0pCisgICAgaWYgKHMtPmVsZW1zW2ldLmluZGV4IDwgZWxl bXNbal0pCisgICAgICBpKys7CisgICAgZWxzZSBpZiAocy0+ZWxlbXNbaV0uaW5kZXggPiBlbGVt c1tqXSkKKyAgICAgIGorKzsKKyAgICBlbHNlCisgICAgICByZXR1cm4gdHJ1ZTsKKworICByZXR1 cm4gZmFsc2U7Cit9CisKIC8qIEZpbmQgdGhlIGluZGV4IG9mIHRoZSBzdGF0ZSBjb3JyZXNwb25k aW5nIHRvIHRoZSBnaXZlbiBwb3NpdGlvbiBzZXQgd2l0aAogICAgdGhlIGdpdmVuIHByZWNlZGlu ZyBjb250ZXh0LCBvciBjcmVhdGUgYSBuZXcgc3RhdGUgaWYgdGhlcmUgaXMgbm8gc3VjaAogICAg c3RhdGUuICBDb250ZXh0IHRlbGxzIHdoZXRoZXIgd2UgZ290IGhlcmUgb24gYSBuZXdsaW5lIG9y IGxldHRlci4gICovCkBAIC0yMjk4LDE0ICsyMzE0LDMzIEBAIHN0YXRpYyB2b2lkCiBlcHNjbG9z dXJlIChzdHJ1Y3QgZGZhIGNvbnN0ICpkKQogewogICBwb3NpdGlvbl9zZXQgdG1wOworICBpZHhf dCAqY3VycnMsICpuZXh0czsKKyAgaWR4X3QgbmN1cnIgPSAwOworICBpZHhfdCBubmV4dCA9IDA7 CisKICAgYWxsb2NfcG9zaXRpb25fc2V0ICgmdG1wLCBkLT5ubGVhdmVzKTsKLSAgZm9yIChpZHhf dCBpID0gMDsgaSA8IGQtPnRpbmRleDsgaSsrKQotICAgIGlmIChkLT5mb2xsb3dzW2ldLm5lbGVt ID4gMCAmJiBkLT50b2tlbnNbaV0gPj0gTk9UQ0hBUgorICBjdXJycyA9IHhubWFsbG9jIChkLT50 aW5kZXgsIHNpemVvZiAqY3VycnMpOworICBuZXh0cyA9IHhubWFsbG9jIChkLT50aW5kZXgsIHNp emVvZiAqbmV4dHMpOworCisgIGZvciAoaWR4X3QgaSA9IDA7IGkgPCBkLT50aW5kZXg7IGkrKykg eworICAgICBpZiAoZC0+Zm9sbG93c1tpXS5uZWxlbSA+IDAgJiYgZC0+dG9rZW5zW2ldID49IE5P VENIQVIKICAgICAgICAgJiYgZC0+dG9rZW5zW2ldICE9IEJBQ0tSRUYgJiYgZC0+dG9rZW5zW2ld ICE9IEFOWUNIQVIKICAgICAgICAgJiYgZC0+dG9rZW5zW2ldICE9IE1CQ1NFVCAmJiBkLT50b2tl bnNbaV0gPCBDU0VUKQorICAgICAgY3VycnNbbmN1cnIrK10gPSBpOworICB9CisKKyAgZm9yIChp ZHhfdCBpID0gMCwgaiA9IDA7IGkgPCBkLT50aW5kZXg7IGkrKykKKyAgICB7CisgICAgICB3aGls ZSAoaiA8IG5jdXJyICYmIGN1cnJzW2pdIDwgaSkKKyAgICAgICAgaisrOworICAgICAgaWYgKG92 ZXJ3cmFwICgmZC0+Zm9sbG93c1tpXSwgY3VycnMsIG5jdXJyKSkKKyAgICAgICAgbmV4dHNbbm5l eHQrK10gPSBpOworICAgIH0KKworICBmb3IgKGlkeF90IGkgPSAwOyBpIDwgbmN1cnI7IGkrKykK ICAgICAgIHsKICAgICAgICAgdW5zaWduZWQgaW50IGNvbnN0cmFpbnQ7Ci0gICAgICAgIHN3aXRj aCAoZC0+dG9rZW5zW2ldKQorICAgICAgICBzd2l0Y2ggKGQtPnRva2Vuc1tjdXJyc1tpXV0pCiAg ICAgICAgICAgewogICAgICAgICAgIGNhc2UgQkVHTElORToKICAgICAgICAgICAgIGNvbnN0cmFp bnQgPSBCRUdMSU5FX0NPTlNUUkFJTlQ7CkBAIC0yMzMwLDEzICsyMzY1LDE2IEBAIGVwc2Nsb3N1 cmUgKHN0cnVjdCBkZmEgY29uc3QgKmQpCiAgICAgICAgICAgICBicmVhazsKICAgICAgICAgICB9 CiAKLSAgICAgICAgZGVsZXRlIChpLCAmZC0+Zm9sbG93c1tpXSk7CisgICAgICAgIGRlbGV0ZSAo aSwgJmQtPmZvbGxvd3NbY3VycnNbaV1dKTsKIAotICAgICAgICBmb3IgKGlkeF90IGogPSAwOyBq IDwgZC0+dGluZGV4OyBqKyspCi0gICAgICAgICAgaWYgKGkgIT0gaiAmJiBkLT5mb2xsb3dzW2pd Lm5lbGVtID4gMCkKLSAgICAgICAgICAgIHJlcGxhY2UgKCZkLT5mb2xsb3dzW2pdLCBpLCAmZC0+ Zm9sbG93c1tpXSwgY29uc3RyYWludCwgJnRtcCk7CisgICAgICAgIGZvciAoaWR4X3QgaiA9IDA7 IGogPCBubmV4dDsgaisrKQorICAgICAgICAgIHJlcGxhY2UgKCZkLT5mb2xsb3dzW25leHRzW2pd XSwgY3VycnNbaV0sICZkLT5mb2xsb3dzW2N1cnJzW2ldXSwKKyAgICAgICAgICAgICAgICAgIGNv bnN0cmFpbnQsICZ0bXApOwogICAgICAgfQorCiAgIGZyZWUgKHRtcC5lbGVtcyk7CisgIGZyZWUg KGN1cnJzKTsKKyAgZnJlZSAobmV4dHMpOwogfQogCiAvKiBSZXR1cm5zIHRoZSBzZXQgb2YgY29u dGV4dHMgZm9yIHdoaWNoIHRoZXJlIGlzIGF0IGxlYXN0IG9uZQotLSAKMS43LjEKCg== --------_5E99047C000000008756_MULTIPART_MIXED_-- From debbugs-submit-bounces@debbugs.gnu.org Fri Apr 17 11:22:42 2020 Received: (at 40634) by debbugs.gnu.org; 17 Apr 2020 15:22:42 +0000 Received: from localhost ([127.0.0.1]:41483 helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1jPSpG-0000kL-4Z for submit@debbugs.gnu.org; Fri, 17 Apr 2020 11:22:42 -0400 Received: from mailgw07.kcn.ne.jp ([61.86.7.214]:50037) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1jPSp9-0000k0-UB for 40634@debbugs.gnu.org; Fri, 17 Apr 2020 11:22:37 -0400 Received: from mxs01-s (mailgw1.kcn.ne.jp [61.86.15.233]) by mailgw07.kcn.ne.jp (Postfix) with ESMTP id 9AAC24121E for <40634@debbugs.gnu.org>; Sat, 18 Apr 2020 00:22:29 +0900 (JST) X-matriXscan-loop-detect: 9d99c22d486aec8f3651227b90c269183c5eb669 Received: from mail12.kcn.ne.jp ([61.86.6.130]) by mxs01-s with ESMTP; Sat, 18 Apr 2020 00:22:27 +0900 (JST) Received: from [10.120.1.110] (i118-21-128-66.s30.a048.ap.plala.or.jp [118.21.128.66]) by mail12.kcn.ne.jp (Postfix) with ESMTPA id F3F5D415ABF0; Sat, 18 Apr 2020 00:22:26 +0900 (JST) Date: Sat, 18 Apr 2020 00:22:26 +0900 From: Norihiro Tanaka To: 40634@debbugs.gnu.org Subject: Re: bug#40634: Massive pattern list handling with -E format seems very slow since 2.28. In-Reply-To: <20200417102441.8766.27F6AC2D@kcn.ne.jp> References: <20200417093536.875E.27F6AC2D@kcn.ne.jp> <20200417102441.8766.27F6AC2D@kcn.ne.jp> Message-Id: <20200418002153.8771.27F6AC2D@kcn.ne.jp> MIME-Version: 1.0 Content-Type: multipart/mixed; boundary="------_5E99C912000000008760_MULTIPART_MIXED_" Content-Transfer-Encoding: 7bit X-Mailer: Becky! ver. 2.74.02 [ja] X-matriXscan-msec-AV: Clean X-matriXscan-Action: Approve X-matriXscan: Uncategorized X-Spam-Score: 0.0 (/) X-Debbugs-Envelope-To: 40634 Cc: fryasu@yahoo.co.jp, Paul Eggert , bug-gnulib@gnu.org X-BeenThere: debbugs-submit@debbugs.gnu.org X-Mailman-Version: 2.1.18 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: debbugs-submit-bounces@debbugs.gnu.org Sender: "Debbugs-submit" X-Spam-Score: -1.0 (-) --------_5E99C912000000008760_MULTIPART_MIXED_ Content-Type: text/plain; charset="US-ASCII" Content-Transfer-Encoding: 7bit On Fri, 17 Apr 2020 10:24:42 +0900 Norihiro Tanaka wrote: > > On Fri, 17 Apr 2020 09:35:36 +0900 > Norihiro Tanaka wrote: > > > > > On Thu, 16 Apr 2020 16:00:29 -0700 > > Paul Eggert wrote: > > > > > On 4/16/20 3:53 PM, Norihiro Tanaka wrote: > > > > > > > I have had no idea to solve the problem yet. If we revert it, bug#33357 > > > > will come back. > > > > > > Yes, I'd rather not revert if we can help it. > > > > > > My own thought was to not analyze the regular expression if we discover that the input is empty. :-) > > > > Now, I have a idea, it is that we build indexes of epsilon nodes > > including in follows before remove epsilon nodes. > > > I wrote fix for the bug, but it will be slower then at grep 2.27 yet. It was improved previous patch. --------_5E99C912000000008760_MULTIPART_MIXED_ Content-Type: text/plain; charset="US-ASCII"; name="0001-dfa-build-auxiliary-indexes-before-remove-epsilon-cl_old.patch" Content-Disposition: attachment; filename="0001-dfa-build-auxiliary-indexes-before-remove-epsilon-cl_old.patch" Content-Transfer-Encoding: base64 RnJvbSBlNTY4MGE0YzQzYjE4NWRmNmI4YzFiODA0MjNkNjYzZGU2ZjZlMzdlIE1vbiBTZXAgMTcg MDA6MDA6MDAgMjAwMQpGcm9tOiBOb3JpaGlybyBUYW5ha2EgPG5vcml0bmtAa2NuLm5lLmpwPgpE YXRlOiBGcmksIDE3IEFwciAyMDIwIDEwOjEyOjAxICswOTAwClN1YmplY3Q6IFtQQVRDSF0gZGZh OiBidWlsZCBhdXhpbGlhcnkgaW5kZXhlcyBiZWZvcmUgcmVtb3ZlIGVwc2lsb24gY2xvc3VyZQoK V2hlbiByZW1vdmUgZXBzaWxvbiBjbG9zdXJlLCBzbyBmYXIgc2VhcmNoZWQgbm9kZXMgaW5jbHVk aW5nIGVwc2lsb24gY2xvc3VyZQppbiBhbGwgbm9kZXMgc2VxdWVudGlhbGx5LCBidXQgaXQgaXMg c2xvdyBmb3Igc29tZSBjYXNlcy4gIE5vdyBidWlsZAphdXhpbGlhcnkgaW5kZXhlcyBiZWZvcmUg c2VhcmNoLgoKUHJvYmxlbSByZXBvcnRlZCBpbjogaHR0cHM6Ly9kZWJidWdzLmdudS5vcmcvY2dp L2J1Z3JlcG9ydC5jZ2k/YnVnPTQwNjM0CgoqIGxpYi9kZmEuYyAob3ZlcndyYXApOiBOZXcgZnVu Y3Rpb24uCihlcHNjbG9zdXJlKTogYnVpbGQgYXV4aWxpYXJ5IGluZGV4ZXMgYmVmb3JlIHJlbW92 ZSBlcHNpbG9uIGNsb3N1cmUuCi0tLQogbGliL2RmYS5jIHwgICA1MiArKysrKysrKysrKysrKysr KysrKysrKysrKysrKysrKysrKysrKysrKysrKystLS0tLS0tCiAxIGZpbGVzIGNoYW5nZWQsIDQ1 IGluc2VydGlvbnMoKyksIDcgZGVsZXRpb25zKC0pCgpkaWZmIC0tZ2l0IGEvbGliL2RmYS5jIGIv bGliL2RmYS5jCmluZGV4IDk5MzlkMjIuLjk1ODA2OWUgMTAwNjQ0Ci0tLSBhL2xpYi9kZmEuYwor KysgYi9saWIvZGZhLmMKQEAgLTIyMDEsNiArMjIwMSwyMiBAQCByZXBsYWNlIChwb3NpdGlvbl9z ZXQgKmRzdCwgaWR4X3QgZGVsLCBwb3NpdGlvbl9zZXQgKmFkZCwKICAgICB9CiB9CiAKK3N0YXRp YyBib29sCitvdmVyd3JhcCAocG9zaXRpb25fc2V0IGNvbnN0ICpzLCBpZHhfdCBjb25zdCAqZWxl bXMsIGlkeF90IG5lbGVtKQoreworICBpZHhfdCBpID0gMCwgaiA9IDA7CisKKyAgd2hpbGUgKGkg PCBzLT5uZWxlbSAmJiBqIDwgbmVsZW0pCisgICAgaWYgKHMtPmVsZW1zW2ldLmluZGV4IDwgZWxl bXNbal0pCisgICAgICBpKys7CisgICAgZWxzZSBpZiAocy0+ZWxlbXNbaV0uaW5kZXggPiBlbGVt c1tqXSkKKyAgICAgIGorKzsKKyAgICBlbHNlCisgICAgICByZXR1cm4gdHJ1ZTsKKworICByZXR1 cm4gZmFsc2U7Cit9CisKIC8qIEZpbmQgdGhlIGluZGV4IG9mIHRoZSBzdGF0ZSBjb3JyZXNwb25k aW5nIHRvIHRoZSBnaXZlbiBwb3NpdGlvbiBzZXQgd2l0aAogICAgdGhlIGdpdmVuIHByZWNlZGlu ZyBjb250ZXh0LCBvciBjcmVhdGUgYSBuZXcgc3RhdGUgaWYgdGhlcmUgaXMgbm8gc3VjaAogICAg c3RhdGUuICBDb250ZXh0IHRlbGxzIHdoZXRoZXIgd2UgZ290IGhlcmUgb24gYSBuZXdsaW5lIG9y IGxldHRlci4gICovCkBAIC0yMjk4LDE0ICsyMzE0LDMzIEBAIHN0YXRpYyB2b2lkCiBlcHNjbG9z dXJlIChzdHJ1Y3QgZGZhIGNvbnN0ICpkKQogewogICBwb3NpdGlvbl9zZXQgdG1wOworICBpZHhf dCAqY3VycnMsICpuZXh0czsKKyAgaWR4X3QgbmN1cnIgPSAwOworICBpZHhfdCBubmV4dCA9IDA7 CisKICAgYWxsb2NfcG9zaXRpb25fc2V0ICgmdG1wLCBkLT5ubGVhdmVzKTsKLSAgZm9yIChpZHhf dCBpID0gMDsgaSA8IGQtPnRpbmRleDsgaSsrKQotICAgIGlmIChkLT5mb2xsb3dzW2ldLm5lbGVt ID4gMCAmJiBkLT50b2tlbnNbaV0gPj0gTk9UQ0hBUgorICBjdXJycyA9IHhubWFsbG9jIChkLT50 aW5kZXgsIHNpemVvZiAqY3VycnMpOworICBuZXh0cyA9IHhubWFsbG9jIChkLT50aW5kZXgsIHNp emVvZiAqbmV4dHMpOworCisgIGZvciAoaWR4X3QgaSA9IDA7IGkgPCBkLT50aW5kZXg7IGkrKykg eworICAgICBpZiAoZC0+Zm9sbG93c1tpXS5uZWxlbSA+IDAgJiYgZC0+dG9rZW5zW2ldID49IE5P VENIQVIKICAgICAgICAgJiYgZC0+dG9rZW5zW2ldICE9IEJBQ0tSRUYgJiYgZC0+dG9rZW5zW2ld ICE9IEFOWUNIQVIKICAgICAgICAgJiYgZC0+dG9rZW5zW2ldICE9IE1CQ1NFVCAmJiBkLT50b2tl bnNbaV0gPCBDU0VUKQorICAgICAgY3VycnNbbmN1cnIrK10gPSBpOworICB9CisKKyAgZm9yIChp ZHhfdCBpID0gMCwgaiA9IDA7IGkgPCBkLT50aW5kZXg7IGkrKykKKyAgICB7CisgICAgICB3aGls ZSAoaiA8IG5jdXJyICYmIGN1cnJzW2pdIDwgaSkKKyAgICAgICAgaisrOworICAgICAgaWYgKG92 ZXJ3cmFwICgmZC0+Zm9sbG93c1tpXSwgY3VycnMsIG5jdXJyKSkKKyAgICAgICAgbmV4dHNbbm5l eHQrK10gPSBpOworICAgIH0KKworICBmb3IgKGlkeF90IGkgPSAwOyBpIDwgbmN1cnI7IGkrKykK ICAgICAgIHsKICAgICAgICAgdW5zaWduZWQgaW50IGNvbnN0cmFpbnQ7Ci0gICAgICAgIHN3aXRj aCAoZC0+dG9rZW5zW2ldKQorICAgICAgICBzd2l0Y2ggKGQtPnRva2Vuc1tjdXJyc1tpXV0pCiAg ICAgICAgICAgewogICAgICAgICAgIGNhc2UgQkVHTElORToKICAgICAgICAgICAgIGNvbnN0cmFp bnQgPSBCRUdMSU5FX0NPTlNUUkFJTlQ7CkBAIC0yMzMwLDEzICsyMzY1LDE2IEBAIGVwc2Nsb3N1 cmUgKHN0cnVjdCBkZmEgY29uc3QgKmQpCiAgICAgICAgICAgICBicmVhazsKICAgICAgICAgICB9 CiAKLSAgICAgICAgZGVsZXRlIChpLCAmZC0+Zm9sbG93c1tpXSk7CisgICAgICAgIGRlbGV0ZSAo aSwgJmQtPmZvbGxvd3NbY3VycnNbaV1dKTsKIAotICAgICAgICBmb3IgKGlkeF90IGogPSAwOyBq IDwgZC0+dGluZGV4OyBqKyspCi0gICAgICAgICAgaWYgKGkgIT0gaiAmJiBkLT5mb2xsb3dzW2pd Lm5lbGVtID4gMCkKLSAgICAgICAgICAgIHJlcGxhY2UgKCZkLT5mb2xsb3dzW2pdLCBpLCAmZC0+ Zm9sbG93c1tpXSwgY29uc3RyYWludCwgJnRtcCk7CisgICAgICAgIGZvciAoaWR4X3QgaiA9IDA7 IGogPCBubmV4dDsgaisrKQorICAgICAgICAgIHJlcGxhY2UgKCZkLT5mb2xsb3dzW25leHRzW2pd XSwgY3VycnNbaV0sICZkLT5mb2xsb3dzW2N1cnJzW2ldXSwKKyAgICAgICAgICAgICAgICAgIGNv bnN0cmFpbnQsICZ0bXApOwogICAgICAgfQorCiAgIGZyZWUgKHRtcC5lbGVtcyk7CisgIGZyZWUg KGN1cnJzKTsKKyAgZnJlZSAobmV4dHMpOwogfQogCiAvKiBSZXR1cm5zIHRoZSBzZXQgb2YgY29u dGV4dHMgZm9yIHdoaWNoIHRoZXJlIGlzIGF0IGxlYXN0IG9uZQotLSAKMS43LjEKCg== --------_5E99C912000000008760_MULTIPART_MIXED_-- From debbugs-submit-bounces@debbugs.gnu.org Sat Apr 18 18:42:00 2020 Received: (at 40634) by debbugs.gnu.org; 18 Apr 2020 22:42:01 +0000 Received: from localhost ([127.0.0.1]:43736 helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1jPw9w-0006vu-HS for submit@debbugs.gnu.org; Sat, 18 Apr 2020 18:42:00 -0400 Received: from mailgw05.kcn.ne.jp ([61.86.7.212]:35356) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1jPw9u-0006ve-Je for 40634@debbugs.gnu.org; Sat, 18 Apr 2020 18:41:59 -0400 Received: from mxs02-s (mailgw2.kcn.ne.jp [61.86.15.234]) by mailgw05.kcn.ne.jp (Postfix) with ESMTP id 80E06207939A for <40634@debbugs.gnu.org>; Sun, 19 Apr 2020 07:41:51 +0900 (JST) X-matriXscan-loop-detect: 4bffb74d632da41de0fc358d1d520c0b2f7185ed Received: from mail11.kcn.ne.jp ([61.86.6.129]) by mxs02-s with ESMTP; Sun, 19 Apr 2020 07:41:50 +0900 (JST) Received: from [10.120.1.110] (i118-21-128-66.s30.a048.ap.plala.or.jp [118.21.128.66]) by mail11.kcn.ne.jp (Postfix) with ESMTPA id 38C8640E24E0; Sun, 19 Apr 2020 07:41:50 +0900 (JST) Date: Sun, 19 Apr 2020 07:41:49 +0900 From: Norihiro Tanaka To: 40634@debbugs.gnu.org, fryasu@yahoo.co.jp Subject: Re: bug#40634: Massive pattern list handling with -E format seems very slow since 2.28. In-Reply-To: <20200418002153.8771.27F6AC2D@kcn.ne.jp> References: <20200417102441.8766.27F6AC2D@kcn.ne.jp> <20200418002153.8771.27F6AC2D@kcn.ne.jp> Message-Id: <20200419074109.431A.27F6AC2D@kcn.ne.jp> MIME-Version: 1.0 Content-Type: multipart/mixed; boundary="------_5E9B81C6000000004311_MULTIPART_MIXED_" Content-Transfer-Encoding: 7bit X-Mailer: Becky! ver. 2.74.02 [ja] X-matriXscan-msec-AV: Clean X-matriXscan-Action: Approve X-matriXscan: Uncategorized X-Spam-Score: 0.0 (/) X-Debbugs-Envelope-To: 40634 Cc: Paul Eggert , bug-gnulib@gnu.org X-BeenThere: debbugs-submit@debbugs.gnu.org X-Mailman-Version: 2.1.18 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: debbugs-submit-bounces@debbugs.gnu.org Sender: "Debbugs-submit" X-Spam-Score: -1.0 (-) --------_5E9B81C6000000004311_MULTIPART_MIXED_ Content-Type: text/plain; charset="US-ASCII" Content-Transfer-Encoding: 7bit On Sat, 18 Apr 2020 00:22:26 +0900 Norihiro Tanaka wrote: > > On Fri, 17 Apr 2020 10:24:42 +0900 > Norihiro Tanaka wrote: > > > > > On Fri, 17 Apr 2020 09:35:36 +0900 > > Norihiro Tanaka wrote: > > > > > > > > On Thu, 16 Apr 2020 16:00:29 -0700 > > > Paul Eggert wrote: > > > > > > > On 4/16/20 3:53 PM, Norihiro Tanaka wrote: > > > > > > > > > I have had no idea to solve the problem yet. If we revert it, bug#33357 > > > > > will come back. > > > > > > > > Yes, I'd rather not revert if we can help it. > > > > > > > > My own thought was to not analyze the regular expression if we discover that the input is empty. :-) > > > > > > Now, I have a idea, it is that we build indexes of epsilon nodes > > > including in follows before remove epsilon nodes. > > > > > > I wrote fix for the bug, but it will be slower then at grep 2.27 yet. > > It was improved previous patch. Sorry, correct patch is here. --------_5E9B81C6000000004311_MULTIPART_MIXED_ Content-Type: text/plain; charset="US-ASCII"; name="0001-dfa-build-auxiliary-indexes-before-remove-epsilon-cl.patch" Content-Disposition: attachment; filename="0001-dfa-build-auxiliary-indexes-before-remove-epsilon-cl.patch" Content-Transfer-Encoding: base64 RnJvbSA4MzU3ZmE1NTFjOGE1YTRmMTQ1NDBiMjUwYmMyNDg1YzQ4MTJhMjM0IE1vbiBTZXAgMTcg MDA6MDA6MDAgMjAwMQpGcm9tOiBOb3JpaGlybyBUYW5ha2EgPG5vcml0bmtAa2NuLm5lLmpwPgpE YXRlOiBGcmksIDE3IEFwciAyMDIwIDEwOjEyOjAxICswOTAwClN1YmplY3Q6IFtQQVRDSF0gZGZh OiBidWlsZCBhdXhpbGlhcnkgaW5kZXhlcyBiZWZvcmUgcmVtb3ZlIGVwc2lsb24gY2xvc3VyZQoK V2hlbiByZW1vdmUgZXBzaWxvbiBjbG9zdXJlLCBzbyBmYXIgc2VhcmNoZWQgbm9kZXMgaW5jbHVk aW5nIGVwc2lsb24gY2xvc3VyZQppbiBhbGwgbm9kZXMgc2VxdWVudGlhbGx5LCBidXQgaXQgaXMg c2xvdyBmb3Igc29tZSBjYXNlcy4gIE5vdyBidWlsZAphdXhpbGlhcnkgaW5kZXhlcyBiZWZvcmUg c2VhcmNoLgoKUHJvYmxlbSByZXBvcnRlZCBpbjogaHR0cHM6Ly9kZWJidWdzLmdudS5vcmcvY2dp L2J1Z3JlcG9ydC5jZ2k/YnVnPTQwNjM0CgoqIGxpYi9kZmEuYyAoZXBzY2xvc3VyZSk6IGJ1aWxk IGF1eGlsaWFyeSBpbmRleGVzIGJlZm9yZSByZW1vdmUgZXBzaWxvbgpjbG9zdXJlLgotLS0KIGxp Yi9kZmEuYyB8ICAxMjkgKysrKysrKysrKysrKysrKysrKysrKysrKysrKysrKysrKysrKysrKysr KysrLS0tLS0tLS0tLS0tLS0tLQogMSBmaWxlcyBjaGFuZ2VkLCA5NSBpbnNlcnRpb25zKCspLCAz NCBkZWxldGlvbnMoLSkKCmRpZmYgLS1naXQgYS9saWIvZGZhLmMgYi9saWIvZGZhLmMKaW5kZXgg OTkzOWQyMi4uNWY5MGE5MiAxMDA2NDQKLS0tIGEvbGliL2RmYS5jCisrKyBiL2xpYi9kZmEuYwpA QCAtMjI5OCw0NSArMjI5OCwxMDYgQEAgc3RhdGljIHZvaWQKIGVwc2Nsb3N1cmUgKHN0cnVjdCBk ZmEgY29uc3QgKmQpCiB7CiAgIHBvc2l0aW9uX3NldCB0bXA7CisgIGlkeF90ICpjdXJycywgKmJh Y2tzOworICBpZHhfdCBuY3VyciA9IDA7CisgIHBvc2l0aW9uX3NldCAqcHJldnM7CisKICAgYWxs b2NfcG9zaXRpb25fc2V0ICgmdG1wLCBkLT5ubGVhdmVzKTsKKyAgY3VycnMgPSB4bm1hbGxvYyAo ZC0+bmxlYXZlcywgc2l6ZW9mICpjdXJycyk7CisgIGJhY2tzID0geG5tYWxsb2MgKGQtPnRpbmRl eCwgc2l6ZW9mICpiYWNrcyk7CisKICAgZm9yIChpZHhfdCBpID0gMDsgaSA8IGQtPnRpbmRleDsg aSsrKQotICAgIGlmIChkLT5mb2xsb3dzW2ldLm5lbGVtID4gMCAmJiBkLT50b2tlbnNbaV0gPj0g Tk9UQ0hBUgotICAgICAgICAmJiBkLT50b2tlbnNbaV0gIT0gQkFDS1JFRiAmJiBkLT50b2tlbnNb aV0gIT0gQU5ZQ0hBUgotICAgICAgICAmJiBkLT50b2tlbnNbaV0gIT0gTUJDU0VUICYmIGQtPnRv a2Vuc1tpXSA8IENTRVQpCi0gICAgICB7Ci0gICAgICAgIHVuc2lnbmVkIGludCBjb25zdHJhaW50 OwotICAgICAgICBzd2l0Y2ggKGQtPnRva2Vuc1tpXSkKLSAgICAgICAgICB7Ci0gICAgICAgICAg Y2FzZSBCRUdMSU5FOgotICAgICAgICAgICAgY29uc3RyYWludCA9IEJFR0xJTkVfQ09OU1RSQUlO VDsKLSAgICAgICAgICAgIGJyZWFrOwotICAgICAgICAgIGNhc2UgRU5ETElORToKLSAgICAgICAg ICAgIGNvbnN0cmFpbnQgPSBFTkRMSU5FX0NPTlNUUkFJTlQ7Ci0gICAgICAgICAgICBicmVhazsK LSAgICAgICAgICBjYXNlIEJFR1dPUkQ6Ci0gICAgICAgICAgICBjb25zdHJhaW50ID0gQkVHV09S RF9DT05TVFJBSU5UOwotICAgICAgICAgICAgYnJlYWs7Ci0gICAgICAgICAgY2FzZSBFTkRXT1JE OgotICAgICAgICAgICAgY29uc3RyYWludCA9IEVORFdPUkRfQ09OU1RSQUlOVDsKLSAgICAgICAg ICAgIGJyZWFrOwotICAgICAgICAgIGNhc2UgTElNV09SRDoKLSAgICAgICAgICAgIGNvbnN0cmFp bnQgPSBMSU1XT1JEX0NPTlNUUkFJTlQ7Ci0gICAgICAgICAgICBicmVhazsKLSAgICAgICAgICBj YXNlIE5PVExJTVdPUkQ6Ci0gICAgICAgICAgICBjb25zdHJhaW50ID0gTk9UTElNV09SRF9DT05T VFJBSU5UOwotICAgICAgICAgICAgYnJlYWs7Ci0gICAgICAgICAgZGVmYXVsdDoKLSAgICAgICAg ICAgIGNvbnN0cmFpbnQgPSBOT19DT05TVFJBSU5UOwotICAgICAgICAgICAgYnJlYWs7Ci0gICAg ICAgICAgfQorICAgIHsKKyAgICAgIGlmIChkLT5mb2xsb3dzW2ldLm5lbGVtID4gMCAmJiBkLT50 b2tlbnNbaV0gPj0gTk9UQ0hBUgorICAgICAgICAgICYmIGQtPnRva2Vuc1tpXSAhPSBBTllDSEFS ICYmIGQtPnRva2Vuc1tpXSAhPSBCRUcKKyAgICAgICAgICAmJiBkLT50b2tlbnNbaV0gIT0gQkFD S1JFRiAmJiBkLT50b2tlbnNbaV0gIT0gTUJDU0VUCisgICAgICAgICAgJiYgZC0+dG9rZW5zW2ld IDwgQ1NFVCkKKyAgICAgICAgeworICAgICAgICAgIGN1cnJzW25jdXJyXSA9IGk7CisgICAgICAg ICAgYmFja3NbaV0gPSBuY3VycisrOworICAgICAgICB9CisgICAgICBlbHNlCisgICAgICAgIGJh Y2tzW2ldID0gLTE7CisgICAgfQogCi0gICAgICAgIGRlbGV0ZSAoaSwgJmQtPmZvbGxvd3NbaV0p OworICBwcmV2cyA9IHhubWFsbG9jIChuY3Vyciwgc2l6ZW9mICpwcmV2cyk7CiAKLSAgICAgICAg Zm9yIChpZHhfdCBqID0gMDsgaiA8IGQtPnRpbmRleDsgaisrKQotICAgICAgICAgIGlmIChpICE9 IGogJiYgZC0+Zm9sbG93c1tqXS5uZWxlbSA+IDApCi0gICAgICAgICAgICByZXBsYWNlICgmZC0+ Zm9sbG93c1tqXSwgaSwgJmQtPmZvbGxvd3NbaV0sIGNvbnN0cmFpbnQsICZ0bXApOwotICAgICAg fQorICBmb3IgKGlkeF90IGkgPSAwOyBpIDwgbmN1cnI7IGkrKykKKyAgICB7CisgICAgICBwcmV2 c1tpXS5lbGVtcyA9IE5VTEw7CisgICAgICBwcmV2c1tpXS5uZWxlbSA9IDA7CisgICAgfQorCisg IGZvciAoaWR4X3QgaSA9IDA7IGkgPCBkLT50aW5kZXg7IGkrKykKKyAgICB7CisgICAgICBmb3Ig KGlkeF90IGogPSAwLCBrID0gMDsgaiA8IGQtPmZvbGxvd3NbaV0ubmVsZW0gJiYgayA8IG5jdXJy OykKKyAgICAgICAgeworICAgICAgICAgIGlmIChkLT5mb2xsb3dzW2ldLmVsZW1zW2pdLmluZGV4 IDwgY3VycnNba10pCisgICAgICAgICAgICBqKys7CisgICAgICAgICAgZWxzZSBpZiAoY3VycnNb a10gPCBkLT5mb2xsb3dzW2ldLmVsZW1zW2pdLmluZGV4KQorICAgICAgICAgICAgaysrOworICAg ICAgICAgIGVsc2UKKyAgICAgICAgICAgIHsKKyAgICAgICAgICAgICAgaWYgKGN1cnJzW2tdICE9 IGkpCisgICAgICAgICAgICAgICAgeworICAgICAgICAgICAgICAgICAgcG9zaXRpb24gcDsKKyAg ICAgICAgICAgICAgICAgIHAuaW5kZXggPSBpOworICAgICAgICAgICAgICAgICAgcC5jb25zdHJh aW50ID0gTk9fQ09OU1RSQUlOVDsKKyAgICAgICAgICAgICAgICAgIGlmIChwcmV2c1trXS5uZWxl bSA9PSAwKQorICAgICAgICAgICAgICAgICAgICBhbGxvY19wb3NpdGlvbl9zZXQgKCZwcmV2c1tr XSwgMSk7CisgICAgICAgICAgICAgICAgICBpbnNlcnQgKHAsICZwcmV2c1trXSk7CisgICAgICAg ICAgICAgICAgfQorICAgICAgICAgICAgICBqKys7CisgICAgICAgICAgICAgIGsrKzsKKyAgICAg ICAgICAgIH0KKyAgICAgICAgfQorICAgIH0KKworICBmb3IgKGlkeF90IGkgPSAwOyBpIDwgbmN1 cnI7IGkrKykKKyAgICB7CisgICAgICB1bnNpZ25lZCBpbnQgY29uc3RyYWludDsKKyAgICAgIHN3 aXRjaCAoZC0+dG9rZW5zW2N1cnJzW2ldXSkKKyAgICAgICAgeworICAgICAgICBjYXNlIEJFR0xJ TkU6CisgICAgICAgICAgY29uc3RyYWludCA9IEJFR0xJTkVfQ09OU1RSQUlOVDsKKyAgICAgICAg ICBicmVhazsKKyAgICAgICAgY2FzZSBFTkRMSU5FOgorICAgICAgICAgIGNvbnN0cmFpbnQgPSBF TkRMSU5FX0NPTlNUUkFJTlQ7CisgICAgICAgICAgYnJlYWs7CisgICAgICAgIGNhc2UgQkVHV09S RDoKKyAgICAgICAgICBjb25zdHJhaW50ID0gQkVHV09SRF9DT05TVFJBSU5UOworICAgICAgICAg IGJyZWFrOworICAgICAgICBjYXNlIEVORFdPUkQ6CisgICAgICAgICAgY29uc3RyYWludCA9IEVO RFdPUkRfQ09OU1RSQUlOVDsKKyAgICAgICAgICBicmVhazsKKyAgICAgICAgY2FzZSBMSU1XT1JE OgorICAgICAgICAgIGNvbnN0cmFpbnQgPSBMSU1XT1JEX0NPTlNUUkFJTlQ7CisgICAgICAgICAg YnJlYWs7CisgICAgICAgIGNhc2UgTk9UTElNV09SRDoKKyAgICAgICAgICBjb25zdHJhaW50ID0g Tk9UTElNV09SRF9DT05TVFJBSU5UOworICAgICAgICAgIGJyZWFrOworICAgICAgICBkZWZhdWx0 OgorICAgICAgICAgIGNvbnN0cmFpbnQgPSBOT19DT05TVFJBSU5UOworICAgICAgICAgIGJyZWFr OworICAgICAgICB9CisKKyAgICAgIGRlbGV0ZSAoY3VycnNbaV0sICZkLT5mb2xsb3dzW2N1cnJz W2ldXSk7CisKKyAgICAgIGZvciAoaWR4X3QgaiA9IDA7IGogPCBwcmV2c1tpXS5uZWxlbTsgaisr KQorICAgICAgICByZXBsYWNlICgmZC0+Zm9sbG93c1twcmV2c1tpXS5lbGVtc1tqXS5pbmRleF0s IGN1cnJzW2ldLAorICAgICAgICAgICAgICAgICAmZC0+Zm9sbG93c1tjdXJyc1tpXV0sIGNvbnN0 cmFpbnQsICZ0bXApOworICAgICAgZm9yIChpZHhfdCBqID0gMDsgaiA8IGQtPmZvbGxvd3NbY3Vy cnNbaV1dLm5lbGVtOyBqKyspCisgICAgICAgIGlmIChiYWNrc1tkLT5mb2xsb3dzW2N1cnJzW2ld XS5lbGVtc1tqXS5pbmRleF0gPj0gMCkKKyAgICAgICAgICByZXBsYWNlICgmcHJldnNbYmFja3Nb ZC0+Zm9sbG93c1tjdXJyc1tpXV0uZWxlbXNbal0uaW5kZXhdXSwKKyAgICAgICAgICAgICAgICAg ICBjdXJyc1tpXSwgJnByZXZzW2ldLCBOT19DT05TVFJBSU5ULCAmdG1wKTsKKyAgICB9CisKKyAg Zm9yIChpZHhfdCBpID0gMDsgaSA8IG5jdXJyOyBpKyspCisgICAgZnJlZSAocHJldnNbaV0uZWxl bXMpOworICBmcmVlIChwcmV2cyk7CiAgIGZyZWUgKHRtcC5lbGVtcyk7CisgIGZyZWUgKGN1cnJz KTsKKyAgZnJlZSAoYmFja3MpOwogfQogCiAvKiBSZXR1cm5zIHRoZSBzZXQgb2YgY29udGV4dHMg Zm9yIHdoaWNoIHRoZXJlIGlzIGF0IGxlYXN0IG9uZQotLSAKMS43LjEKCg== --------_5E9B81C6000000004311_MULTIPART_MIXED_-- From debbugs-submit-bounces@debbugs.gnu.org Sat Apr 18 22:10:37 2020 Received: (at 40634) by debbugs.gnu.org; 19 Apr 2020 02:10:37 +0000 Received: from localhost ([127.0.0.1]:43845 helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1jPzPp-0003pR-7i for submit@debbugs.gnu.org; Sat, 18 Apr 2020 22:10:37 -0400 Received: from mailgw07.kcn.ne.jp ([61.86.7.214]:40475) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1jPzPo-0003pE-1A for 40634@debbugs.gnu.org; Sat, 18 Apr 2020 22:10:36 -0400 Received: from mxs02-s (mailgw2.kcn.ne.jp [61.86.15.234]) by mailgw07.kcn.ne.jp (Postfix) with ESMTP id 4489741198 for <40634@debbugs.gnu.org>; Sun, 19 Apr 2020 11:10:29 +0900 (JST) X-matriXscan-loop-detect: a204fab572889166595e134212f7ff70eb9900fe Received: from mail14.kcn.ne.jp ([61.86.6.132]) by mxs02-s with ESMTP; Sun, 19 Apr 2020 11:10:26 +0900 (JST) Received: from [10.120.1.110] (i118-21-128-66.s30.a048.ap.plala.or.jp [118.21.128.66]) by mail14.kcn.ne.jp (Postfix) with ESMTPA id 5D6464121E1C; Sun, 19 Apr 2020 11:10:26 +0900 (JST) Date: Sun, 19 Apr 2020 11:10:26 +0900 From: Norihiro Tanaka To: 40634@debbugs.gnu.org Subject: Re: bug#40634: Massive pattern list handling with -E format seems very slow since 2.28. In-Reply-To: <20200419074109.431A.27F6AC2D@kcn.ne.jp> References: <20200418002153.8771.27F6AC2D@kcn.ne.jp> <20200419074109.431A.27F6AC2D@kcn.ne.jp> Message-Id: <20200419111025.4326.27F6AC2D@kcn.ne.jp> MIME-Version: 1.0 Content-Type: multipart/mixed; boundary="------_5E9BB1D8000000004318_MULTIPART_MIXED_" Content-Transfer-Encoding: 7bit X-Mailer: Becky! ver. 2.74.02 [ja] X-matriXscan-msec-AV: Clean X-matriXscan-Action: Approve X-matriXscan: Uncategorized X-Spam-Score: 0.0 (/) X-Debbugs-Envelope-To: 40634 Cc: fryasu@yahoo.co.jp, Paul Eggert , bug-gnulib@gnu.org X-BeenThere: debbugs-submit@debbugs.gnu.org X-Mailman-Version: 2.1.18 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: debbugs-submit-bounces@debbugs.gnu.org Sender: "Debbugs-submit" X-Spam-Score: -1.0 (-) --------_5E9BB1D8000000004318_MULTIPART_MIXED_ Content-Type: text/plain; charset="US-ASCII" Content-Transfer-Encoding: 7bit On Sun, 19 Apr 2020 07:41:49 +0900 Norihiro Tanaka wrote: > > On Sat, 18 Apr 2020 00:22:26 +0900 > Norihiro Tanaka wrote: > > > > > On Fri, 17 Apr 2020 10:24:42 +0900 > > Norihiro Tanaka wrote: > > > > > > > > On Fri, 17 Apr 2020 09:35:36 +0900 > > > Norihiro Tanaka wrote: > > > > > > > > > > > On Thu, 16 Apr 2020 16:00:29 -0700 > > > > Paul Eggert wrote: > > > > > > > > > On 4/16/20 3:53 PM, Norihiro Tanaka wrote: > > > > > > > > > > > I have had no idea to solve the problem yet. If we revert it, bug#33357 > > > > > > will come back. > > > > > > > > > > Yes, I'd rather not revert if we can help it. > > > > > > > > > > My own thought was to not analyze the regular expression if we discover that the input is empty. :-) > > > > > > > > Now, I have a idea, it is that we build indexes of epsilon nodes > > > > including in follows before remove epsilon nodes. > > > > > > > > > I wrote fix for the bug, but it will be slower then at grep 2.27 yet. > > > > It was improved previous patch. > > Sorry, correct patch is here. I made the previous patch even simpler. before: $ env LC_ALL=C time -p src/grep -E -v -m1 -f grep-patterns.txt /dev/null real 7.24 user 7.14 sys 0.09 after: $ env LC_ALL=C time -p src/grep -E -v -m1 -f grep-patterns.txt /dev/null real 0.62 user 0.52 sys 0.10 --------_5E9BB1D8000000004318_MULTIPART_MIXED_ Content-Type: text/plain; charset="US-ASCII"; name="0001-dfa-use-backword-set-in-removal-of-epsilon-closure.patch" Content-Disposition: attachment; filename="0001-dfa-use-backword-set-in-removal-of-epsilon-closure.patch" Content-Transfer-Encoding: base64 RnJvbSBmZmZlMzI2YTdjZDdkNDUyZjg2ZWE1MWNhZTZmZTA4MTY4YTEzOTFlIE1vbiBTZXAgMTcg MDA6MDA6MDAgMjAwMQpGcm9tOiBOb3JpaGlybyBUYW5ha2EgPG5vcml0bmtAa2NuLm5lLmpwPgpE YXRlOiBTdW4sIDE5IEFwciAyMDIwIDExOjAxOjE4ICswOTAwClN1YmplY3Q6IFtQQVRDSF0gZGZh OiB1c2UgYmFja3dvcmQgc2V0IGluIHJlbW92YWwgb2YgZXBzaWxvbiBjbG9zdXJlCgpXaGVuIHJl bW92ZSBlcHNpbG9uIGNsb3N1cmUsIHNvIGZhciBzZWFyY2hlZCBub2RlcyBpbmNsdWRpbmcgZXBz aWxvbiBjbG9zdXJlCmluIGFsbCBub2RlcyBzZXF1ZW50aWFsbHksIGJ1dCBpdCBpcyBzbG93IGZv ciBzb21lIGNhc2VzLiAgTm93IGJ1aWxkIGJhY2t3b3JkCnNldCBiZWZvcmUgc2VhcmNoLCBhbmQg b25seSBjaGVjayBwcmV2aW9zIHBvc2l0aW9uIHdpdGggdGhlIHNldC4KUHJvYmxlbSByZXBvcnRl ZCBpbjogaHR0cHM6Ly9kZWJidWdzLmdudS5vcmcvY2dpL2J1Z3JlcG9ydC5jZ2k/YnVnPTQwNjM0 CgoqIGxpYi9kZmEuYyAoc3RydWN0IGRmYSk6IE5ldyBtZW1iZXIgJ2Vwc2lsb24nLgooYWRkdG9r X21iKTogQ2hlY2sgd2hldGhlciBhIHBhdHRlcm4gaGFzIGVwc2lsb24gbm9kZSBvciBub3QuCihl cHNjbG9zdXJlKTogV2hlbiByZW1vdmUgZXBzaWxvbiBub2RlIGFuZCByZWNvbm5lY3QsIG9ubHkg Y2hlY2sgcHJldmlvcwpwb3NpdGlvbnMuCihkZmFhbmFseXplKTogQnVpbGQgYmFja3dvcmQgc2V0 LgotLS0KIGxpYi9kZmEuYyB8ICAgNjUgKysrKysrKysrKysrKysrKysrKysrKysrKysrKysrKysr KysrKysrKysrKysrKystLS0tLS0tLS0tLS0tCiAxIGZpbGVzIGNoYW5nZWQsIDUxIGluc2VydGlv bnMoKyksIDE0IGRlbGV0aW9ucygtKQoKZGlmZiAtLWdpdCBhL2xpYi9kZmEuYyBiL2xpYi9kZmEu YwppbmRleCA5OTM5ZDIyLi44MTFjNDFlIDEwMDY0NAotLS0gYS9saWIvZGZhLmMKKysrIGIvbGli L2RmYS5jCkBAIC00ODYsNiArNDg2LDcgQEAgc3RydWN0IGRmYQogICBib29sIGZhc3Q7CQkJLyog VGhlIERGQSBpcyBmYXN0LiAgKi8KICAgdG9rZW4gdXRmOF9hbnljaGFyX2NsYXNzZXNbOV07IC8q IFRvIGxvd2VyIEFOWUNIQVIgaW4gVVRGLTggbG9jYWxlcy4gICovCiAgIG1ic3RhdGVfdCBtYnM7 CQkvKiBNdWx0aWJ5dGUgY29udmVyc2lvbiBzdGF0ZS4gICovCisgIGJvb2wgZXBzaWxvbjsKIAog ICAvKiBUaGUgZm9sbG93aW5nIGFyZSB2YWxpZCBvbmx5IGlmIE1CX0NVUl9NQVggPiAxLiAgKi8K IApAQCAtMTYxMywxMyArMTYxNCwyMSBAQCBhZGR0b2tfbWIgKHN0cnVjdCBkZmEgKmRmYSwgdG9r ZW4gdCwgY2hhciBtYnByb3ApCiAgICAgICBkZmEtPnBhcnNlLmRlcHRoLS07CiAgICAgICBicmVh azsKIAotICAgIGNhc2UgQkFDS1JFRjoKLSAgICAgIGRmYS0+ZmFzdCA9IGZhbHNlOworICAgIGNh c2UgQkVHTElORToKKyAgICBjYXNlIEVORExJTkU6CisgICAgY2FzZSBCRUdXT1JEOgorICAgIGNh c2UgRU5EV09SRDoKKyAgICBjYXNlIExJTVdPUkQ6CisgICAgY2FzZSBOT1RMSU1XT1JEOgorICAg IGNhc2UgRU1QVFk6CisgICAgICBkZmEtPmVwc2lsb24gPSB0cnVlOwogICAgICAgRkFMTFRIUk9V R0g7CisKICAgICBkZWZhdWx0OgotICAgICAgZGZhLT5ubGVhdmVzKys7Ci0gICAgICBGQUxMVEhS T1VHSDsKLSAgICBjYXNlIEVNUFRZOgorICAgICAgaWYgKHQgPT0gQkFDS1JFRikKKyAgICAgICAg ZGZhLT5mYXN0ID0gZmFsc2U7CisgICAgICBpZiAodCAhPSBFTVBUWSkKKyAgICAgICAgZGZhLT5u bGVhdmVzKys7CiAgICAgICBkZmEtPnBhcnNlLmRlcHRoKys7CiAgICAgICBicmVhazsKICAgICB9 CkBAIC0yMjk1LDE0ICsyMzA0LDE1IEBAIHN0YXRlX2luZGV4IChzdHJ1Y3QgZGZhICpkLCBwb3Np dGlvbl9zZXQgY29uc3QgKnMsIGludCBjb250ZXh0KQogICAgY29uc3RyYWludC4gIFJlcGVhdCBl eGhhdXN0aXZlbHkgdW50aWwgbm8gZnVubnkgcG9zaXRpb25zIGFyZSBsZWZ0LgogICAgUy0+ZWxl bXMgbXVzdCBiZSBsYXJnZSBlbm91Z2ggdG8gaG9sZCB0aGUgcmVzdWx0LiAgKi8KIHN0YXRpYyB2 b2lkCi1lcHNjbG9zdXJlIChzdHJ1Y3QgZGZhIGNvbnN0ICpkKQorZXBzY2xvc3VyZSAoc3RydWN0 IGRmYSBjb25zdCAqZCwgcG9zaXRpb25fc2V0ICpiYWNrd29yZCkKIHsKICAgcG9zaXRpb25fc2V0 IHRtcDsKICAgYWxsb2NfcG9zaXRpb25fc2V0ICgmdG1wLCBkLT5ubGVhdmVzKTsKICAgZm9yIChp ZHhfdCBpID0gMDsgaSA8IGQtPnRpbmRleDsgaSsrKQogICAgIGlmIChkLT5mb2xsb3dzW2ldLm5l bGVtID4gMCAmJiBkLT50b2tlbnNbaV0gPj0gTk9UQ0hBUgogICAgICAgICAmJiBkLT50b2tlbnNb aV0gIT0gQkFDS1JFRiAmJiBkLT50b2tlbnNbaV0gIT0gQU5ZQ0hBUgotICAgICAgICAmJiBkLT50 b2tlbnNbaV0gIT0gTUJDU0VUICYmIGQtPnRva2Vuc1tpXSA8IENTRVQpCisgICAgICAgICYmIGQt PnRva2Vuc1tpXSAhPSBNQkNTRVQgJiYgZC0+dG9rZW5zW2ldIDwgQ1NFVAorICAgICAgICAmJiBk LT50b2tlbnNbaV0gIT0gQkVHKQogICAgICAgewogICAgICAgICB1bnNpZ25lZCBpbnQgY29uc3Ry YWludDsKICAgICAgICAgc3dpdGNoIChkLT50b2tlbnNbaV0pCkBAIC0yMzI1LDE2ICsyMzM1LDIx IEBAIGVwc2Nsb3N1cmUgKHN0cnVjdCBkZmEgY29uc3QgKmQpCiAgICAgICAgICAgY2FzZSBOT1RM SU1XT1JEOgogICAgICAgICAgICAgY29uc3RyYWludCA9IE5PVExJTVdPUkRfQ09OU1RSQUlOVDsK ICAgICAgICAgICAgIGJyZWFrOwotICAgICAgICAgIGRlZmF1bHQ6CisgICAgICAgICAgY2FzZSBF TVBUWToKICAgICAgICAgICAgIGNvbnN0cmFpbnQgPSBOT19DT05TVFJBSU5UOwogICAgICAgICAg ICAgYnJlYWs7CisgICAgICAgICAgZGVmYXVsdDoKKyAgICAgICAgICAgIGFib3J0ICgpOwogICAg ICAgICAgIH0KIAogICAgICAgICBkZWxldGUgKGksICZkLT5mb2xsb3dzW2ldKTsKIAotICAgICAg ICBmb3IgKGlkeF90IGogPSAwOyBqIDwgZC0+dGluZGV4OyBqKyspCi0gICAgICAgICAgaWYgKGkg IT0gaiAmJiBkLT5mb2xsb3dzW2pdLm5lbGVtID4gMCkKLSAgICAgICAgICAgIHJlcGxhY2UgKCZk LT5mb2xsb3dzW2pdLCBpLCAmZC0+Zm9sbG93c1tpXSwgY29uc3RyYWludCwgJnRtcCk7CisgICAg ICAgIGZvciAoaWR4X3QgaiA9IDA7IGogPCBiYWNrd29yZFtpXS5uZWxlbTsgaisrKQorICAgICAg ICAgIHJlcGxhY2UgKCZkLT5mb2xsb3dzW2JhY2t3b3JkW2ldLmVsZW1zW2pdLmluZGV4XSwgaSwg JmQtPmZvbGxvd3NbaV0sCisgICAgICAgICAgICAgICAgICAgY29uc3RyYWludCwgJnRtcCk7Cisg ICAgICAgIGZvciAoaWR4X3QgaiA9IDA7IGogPCBkLT5mb2xsb3dzW2ldLm5lbGVtOyBqKyspCisg ICAgICAgICAgcmVwbGFjZSAoJmJhY2t3b3JkW2QtPmZvbGxvd3NbaV0uZWxlbXNbal0uaW5kZXhd LCBpLCAmYmFja3dvcmRbaV0sCisgICAgICAgICAgICAgICAgICAgTk9fQ09OU1RSQUlOVCwgJnRt cCk7CiAgICAgICB9CiAgIGZyZWUgKHRtcC5lbGVtcyk7CiB9CkBAIC0yNjQxLDYgKzI2NTYsNyBA QCBkZmFhbmFseXplIChzdHJ1Y3QgZGZhICpkLCBib29sIHNlYXJjaGZsYWcpCiAgIC8qIEZpcnN0 cG9zIGFuZCBsYXN0cG9zIGVsZW1lbnRzLiAgKi8KICAgcG9zaXRpb24gKmZpcnN0cG9zID0gcG9z YWxsb2M7CiAgIHBvc2l0aW9uICpsYXN0cG9zID0gZmlyc3Rwb3MgKyBkLT5ubGVhdmVzOworICBw b3NpdGlvbl9zZXQgKmJhY2t3b3JkID0gTlVMTDsKICAgcG9zaXRpb24gcG9zOwogICBwb3NpdGlv bl9zZXQgdG1wOwogCkBAIC0yNjczLDYgKzI2ODksOSBAQCBkZmFhbmFseXplIChzdHJ1Y3QgZGZh ICpkLCBib29sIHNlYXJjaGZsYWcpCiAgIGFsbG9jX3Bvc2l0aW9uX3NldCAoJm1lcmdlZCwgZC0+ bmxlYXZlcyk7CiAgIGQtPmZvbGxvd3MgPSB4Y2FsbG9jIChkLT50aW5kZXgsIHNpemVvZiAqZC0+ Zm9sbG93cyk7CiAKKyAgaWYgKGQtPmVwc2lsb24pCisgICAgYmFja3dvcmQgPSB4Y2FsbG9jIChk LT50aW5kZXgsIHNpemVvZiAqYmFja3dvcmQpOworCiAgIGZvciAoaWR4X3QgaSA9IDA7IGkgPCBk LT50aW5kZXg7IGkrKykKICAgICB7CiAgICAgICBzd2l0Y2ggKGQtPnRva2Vuc1tpXSkKQEAgLTI3 MTAsNiArMjcyOSwxNyBAQCBkZmFhbmFseXplIChzdHJ1Y3QgZGZhICpkLCBib29sIHNlYXJjaGZs YWcpCiAgICAgICAgIGNhc2UgQ0FUOgogICAgICAgICAgIC8qIEV2ZXJ5IGVsZW1lbnQgaW4gdGhl IGZpcnN0cG9zIG9mIHRoZSBzZWNvbmQgYXJndW1lbnQgaXMgaW4gdGhlCiAgICAgICAgICAgICAg Zm9sbG93IG9mIGV2ZXJ5IGVsZW1lbnQgaW4gdGhlIGxhc3Rwb3Mgb2YgdGhlIGZpcnN0IGFyZ3Vt ZW50LiAgKi8KKyAgICAgICAgICBpZiAoZC0+ZXBzaWxvbikKKyAgICAgICAgICAgIHsKKyAgICAg ICAgICAgICAgdG1wLm5lbGVtID0gc3RrWy0yXS5ubGFzdHBvczsKKyAgICAgICAgICAgICAgdG1w LmVsZW1zID0gbGFzdHBvcyAtIHN0a1stMV0ubmxhc3Rwb3MgLSBzdGtbLTJdLm5sYXN0cG9zOwor ICAgICAgICAgICAgICBwb3NpdGlvbiAqcCA9IGZpcnN0cG9zIC0gc3RrWy0xXS5uZmlyc3Rwb3M7 CisgICAgICAgICAgICAgIGZvciAoaWR4X3QgaiA9IDA7IGogPCBzdGtbLTFdLm5maXJzdHBvczsg aisrKQorICAgICAgICAgICAgICAgIHsKKyAgICAgICAgICAgICAgICAgIG1lcmdlICgmdG1wLCAm YmFja3dvcmRbcFtqXS5pbmRleF0sICZtZXJnZWQpOworICAgICAgICAgICAgICAgICAgY29weSAo Jm1lcmdlZCwgJmJhY2t3b3JkW3Bbal0uaW5kZXhdKTsKKyAgICAgICAgICAgICAgICB9CisgICAg ICAgICAgICB9CiAgICAgICAgICAgewogICAgICAgICAgICAgdG1wLm5lbGVtID0gc3RrWy0xXS5u Zmlyc3Rwb3M7CiAgICAgICAgICAgICB0bXAuZWxlbXMgPSBmaXJzdHBvcyAtIHN0a1stMV0ubmZp cnN0cG9zOwpAQCAtMjc5OSw5ICsyODI5LDE1IEBAIGRmYWFuYWx5emUgKHN0cnVjdCBkZmEgKmQs IGJvb2wgc2VhcmNoZmxhZykKICNlbmRpZgogICAgIH0KIAotICAvKiBGb3IgZWFjaCBmb2xsb3cg c2V0IHRoYXQgaXMgdGhlIGZvbGxvdyBzZXQgb2YgYSByZWFsIHBvc2l0aW9uLCByZXBsYWNlCi0g ICAgIGl0IHdpdGggaXRzIGVwc2lsb24gY2xvc3VyZS4gICovCi0gIGVwc2Nsb3N1cmUgKGQpOwor ICBpZiAoZC0+ZXBzaWxvbikKKyAgICB7CisgICAgICAvKiBGb3IgZWFjaCBmb2xsb3cgc2V0IHRo YXQgaXMgdGhlIGZvbGxvdyBzZXQgb2YgYSByZWFsIHBvc2l0aW9uLAorICAgICAgICAgcmVwbGFj ZSBpdCB3aXRoIGl0cyBlcHNpbG9uIGNsb3N1cmUuICAqLworICAgICAgZXBzY2xvc3VyZSAoZCwg YmFja3dvcmQpOworCisgICAgICBmb3IgKGlkeF90IGkgPSAwOyBpIDwgZC0+dGluZGV4OyBpKysp CisgICAgICAgIGZyZWUgKGJhY2t3b3JkW2ldLmVsZW1zKTsKKyAgICB9CiAKICAgZGZhb3B0aW1p emUgKGQpOwogCkBAIC0yODYzLDYgKzI4OTksNyBAQCBkZmFhbmFseXplIChzdHJ1Y3QgZGZhICpk LCBib29sIHNlYXJjaGZsYWcpCiAgIGQtPm1pbl90cmNvdW50Kys7CiAgIGQtPnRyY291bnQgPSAw OwogCisgIGZyZWUgKGJhY2t3b3JkKTsKICAgZnJlZSAocG9zYWxsb2MpOwogICBmcmVlIChzdGth bGxvYyk7CiAgIGZyZWUgKG1lcmdlZC5lbGVtcyk7Ci0tIAoxLjcuMQoK --------_5E9BB1D8000000004318_MULTIPART_MIXED_-- From debbugs-submit-bounces@debbugs.gnu.org Mon Apr 20 10:54:41 2020 Received: (at 40634) by debbugs.gnu.org; 20 Apr 2020 14:54:41 +0000 Received: from localhost ([127.0.0.1]:48238 helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1jQXom-0003Pv-P6 for submit@debbugs.gnu.org; Mon, 20 Apr 2020 10:54:41 -0400 Received: from nh501-vm14.bullet.mail.kks.yahoo.co.jp ([183.79.56.144]:32274) by debbugs.gnu.org with smtp (Exim 4.84_2) (envelope-from ) id 1jQS2S-0000Z2-I6 for 40634@debbugs.gnu.org; Mon, 20 Apr 2020 04:44:26 -0400 Received: from [183.79.100.138] by nh501.bullet.mail.kks.yahoo.co.jp with NNFMP; 20 Apr 2020 08:44:16 -0000 Received: from [183.79.100.137] by t501.bullet.mail.kks.yahoo.co.jp with NNFMP; 20 Apr 2020 08:44:15 -0000 Received: from [127.0.0.1] by omp506.mail.kks.yahoo.co.jp with NNFMP; 20 Apr 2020 08:44:15 -0000 X-Yahoo-Newman-Property: ymail-3 X-Yahoo-Newman-Id: 976861.20404.bm@omp506.mail.kks.yahoo.co.jp X-YMail-OSG: uQSCt2sVM1l5VparOXpTd0eMu8xYFfIesUU_kqm1ruPNlNYzuJXuLKNSPKJ8BuG LWGJuWnB5jucWJDz0ICeOae0lPmsfOKwmQE_ZbxWy5SaaZhoECRgv3RnOxcJ_2Ga2XGsAVnuU.rb VrnCQniFdQZ.eKzw2wg9G_dXMaapiNRtXagZuzFWUQArO7ccVZSUf3lOVh49GCY10Q9paoDOkhkp R3gNeQO0dONsDk8JPHhV8Yk57Ce5lOxY3jzdZssAOPjaJo8lcSqbFzSXXcAaT5jsHCUIyBQGRGbM SpDjurjFYygT0k0VTnlGokKhYj1nbVMIJAi8lv2C9P_bd5NpNBF4svhRTAkPRioAXH0D6wiDnv_R hTlCb7gv_oB3.ti6VxihV2Z_gAdaUUpiabZAPuwBQ2NrfBRw7FpgELtBf5Lk_kQ3.VH_D4ZR6aGN vjU_NpgO12JeWZqli2Tk9X2LaqVsVIL.iNvSydI7iJi8YlxwtjJiNElODCgX6JMuQFHMVaDbde_9 MN2ca0wLjUr88z2Z5apUc1gagXYVxSZuW1XESLx0YGYTw9nLBDgtfPhTW8K8.hJ0Pnu.8n34AAsX XbFLt_ZDGNK6HG.OXbBqgdVPVbwi457SQm2WFJKMsBj9.3zFqVJGDyDwbRP.WsxVnGHLp1cp7Ywm pG24hOvvhOF7S3HpgTYpS0o3BViKYUVjvVo1K7SlJnFzSW2HLUPHvu9bmxJeIrS98A8zdPXfqzDf FXoDto9hG0nzKCXI- Received: from jws704108.mail.ssk.yahoo.co.jp by sendmailws611.mail.ssk.yahoo.co.jp; Mon, 20 Apr 2020 17:44:15 +0000; 1587372255.386 DKIM-Signature: v=1; a=rsa-sha256; q=dns/txt; c=relaxed/relaxed; t=1587372255; s=yj20110701; d=yahoo.co.jp; h=Date:From:Reply-To:To:Message-ID:In-Reply-To:References:Subject:MIME-Version:Content-Type:Content-Transfer-Encoding; bh=DK0Cc7mtg0oGka7XrSa5exxdFWa8CC4p/FQwdSM6Rrc=; b=YPOp0eLH/KPnj7a+opNNzeGocaJF0ChVwQz5t+VoLBygLwIEwbH/YKIUgUei/cbU 4KVUd/9CiAsUSS2d0CCYNeAX0i12aW918bXBaZblw6lyq5dtov8Kz64mk+7iajaOzIX woSVwAEch3ni7EnSis1zxIAQ0frxHNtuRDFmU6vQ= DomainKey-Signature: a=rsa-sha1; q=dns; c=nofws; s=yj20110701; d=yahoo.co.jp; h=Date:From:Reply-To:Message-ID:In-Reply-To:References:MIME-Version:Content-Type:Content-Transfer-Encoding; b=tPQzjf53bZC2OtVaQPh2Er5kW5KvdiCu/Ftk2HN3nFne3WJzGzNQXxR3WCJWc0zs TXQ4fRsMe4Hw2KCfqcMvV3d5VqAjJKfhludbO6YPxT8cbIMVDFbfIjidRyVTAY7LRaF ta0joJhrwZm1/NB4gj48+ndN4BIco9efmOrGUIy4=; Date: Mon, 20 Apr 2020 17:44:14 +0900 (JST) From: fryasu@yahoo.co.jp To: "40634@debbugs.gnu.org" <40634@debbugs.gnu.org> Message-ID: <542802674.1162749.1587372254848.JavaMail.yahoo@mail.yahoo.co.jp> In-Reply-To: <20200419111025.4326.27F6AC2D@kcn.ne.jp> References: <20200418002153.8771.27F6AC2D@kcn.ne.jp> <20200419074109.431A.27F6AC2D@kcn.ne.jp> <20200419111025.4326.27F6AC2D@kcn.ne.jp> Subject: Re: bug#40634: Massive pattern list handling with -E format seems very slow since 2.28. MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 7bit Content-Length: 260 X-Spam-Score: 0.0 (/) X-Debbugs-Envelope-To: 40634 X-Mailman-Approved-At: Mon, 20 Apr 2020 10:54:39 -0400 X-BeenThere: debbugs-submit@debbugs.gnu.org X-Mailman-Version: 2.1.18 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Reply-To: fryasu@yahoo.co.jp Errors-To: debbugs-submit-bounces@debbugs.gnu.org Sender: "Debbugs-submit" X-Spam-Score: -1.0 (-) Hi, I immediately tried the attached patch, and was able to confirmthat grep perfomance is improved. Our work to find thousands of patterns in a file with tens of thousand lines is going extremely smoothly. I'm glad for your quick response. Thank you. From debbugs-submit-bounces@debbugs.gnu.org Fri Sep 11 08:48:13 2020 Received: (at 40634) by debbugs.gnu.org; 11 Sep 2020 12:48:13 +0000 Received: from localhost ([127.0.0.1]:42875 helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1kGiTN-0006ph-4h for submit@debbugs.gnu.org; Fri, 11 Sep 2020 08:48:13 -0400 Received: from mail-wm1-f67.google.com ([209.85.128.67]:56043) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1kGiTH-0006pR-OT for 40634@debbugs.gnu.org; Fri, 11 Sep 2020 08:48:11 -0400 Received: by mail-wm1-f67.google.com with SMTP id a65so4309033wme.5 for <40634@debbugs.gnu.org>; Fri, 11 Sep 2020 05:48:07 -0700 (PDT) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:mime-version:references:in-reply-to:from:date :message-id:subject:to:cc; bh=Yvvs2VemeqLDPTIfwoP/7sSoOqHTKpjtJ18UZiLHrek=; b=QzaEPBlsGmwMFBvYxbCjYJ7c3/OVR8Ve0Tu5TJxiek0TUzHReFUNKGRStGu5v7/693 /p4VBXfa99IvDQo02lAoOOvcGGyrysb/2Gw26msV90dnTJRNkBdJoLW1BSDJSfjtkhn8 C3dQymO648dBhid13HKEfNldxrskzZh3fYi2/wHbqKUWRTA9isvAit5sF1FyXTT4vM3K FeTDCQUvE6XK/CMWr2M3TjYYETtMyAX6GWDpOPlDbGnY14Q67+a4+p9DEWAdGxfCdlUo 1eGL8Y6018xWvVR7KzmehWF5vZFXbFYi4o7HOK2bZ6wbbcxDO6dRRpdtkv/PuyHKj3og UU/A== X-Gm-Message-State: AOAM532keoEf4PH7BvOoFGSjTqWKCkCMHeMq/zmzNmwqyZ6I+JOKKu2Z ycAcDC+aqG2qkriTdiAE8wv10ouESe61VgeXspI= X-Google-Smtp-Source: ABdhPJx78j/HYahUlBKVU8dksVQfFd/3c57n1RBm3/Fkg5FaGtgOHu6f2aHfxWuY/1nVG+0nXaNfQYjT15er4fcnmB0= X-Received: by 2002:a05:600c:ce:: with SMTP id u14mr2168304wmm.137.1599828481900; Fri, 11 Sep 2020 05:48:01 -0700 (PDT) MIME-Version: 1.0 References: <20200418002153.8771.27F6AC2D@kcn.ne.jp> <20200419074109.431A.27F6AC2D@kcn.ne.jp> <20200419111025.4326.27F6AC2D@kcn.ne.jp> In-Reply-To: <20200419111025.4326.27F6AC2D@kcn.ne.jp> From: Jim Meyering Date: Fri, 11 Sep 2020 14:47:49 +0200 Message-ID: Subject: Re: bug#40634: Massive pattern list handling with -E format seems very slow since 2.28. To: Norihiro Tanaka , GNU grep developers Content-Type: text/plain; charset="UTF-8" X-Spam-Score: 0.5 (/) X-Debbugs-Envelope-To: 40634 Cc: fryasu@yahoo.co.jp, "bug-gnulib@gnu.org List" , Paul Eggert , 40634@debbugs.gnu.org X-BeenThere: debbugs-submit@debbugs.gnu.org X-Mailman-Version: 2.1.18 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: debbugs-submit-bounces@debbugs.gnu.org Sender: "Debbugs-submit" X-Spam-Score: -0.5 (/) On Sun, Apr 19, 2020 at 4:10 AM Norihiro Tanaka wrote: > On Sun, 19 Apr 2020 07:41:49 +0900 > Norihiro Tanaka wrote: > > On Sat, 18 Apr 2020 00:22:26 +0900 > > Norihiro Tanaka wrote: > > > > > > > > On Fri, 17 Apr 2020 10:24:42 +0900 > > > Norihiro Tanaka wrote: > > > > > > > > > > > On Fri, 17 Apr 2020 09:35:36 +0900 > > > > Norihiro Tanaka wrote: > > > > > > > > > > > > > > On Thu, 16 Apr 2020 16:00:29 -0700 > > > > > Paul Eggert wrote: > > > > > > > > > > > On 4/16/20 3:53 PM, Norihiro Tanaka wrote: > > > > > > > > > > > > > I have had no idea to solve the problem yet. If we revert it, bug#33357 > > > > > > > will come back. > > > > > > > > > > > > Yes, I'd rather not revert if we can help it. > > > > > > > > > > > > My own thought was to not analyze the regular expression if we discover that the input is empty. :-) > > > > > > > > > > Now, I have a idea, it is that we build indexes of epsilon nodes > > > > > including in follows before remove epsilon nodes. > > > > > > > > > > > > I wrote fix for the bug, but it will be slower then at grep 2.27 yet. > > > > > > It was improved previous patch. > > > > Sorry, correct patch is here. > > I made the previous patch even simpler. > > before: > > $ env LC_ALL=C time -p src/grep -E -v -m1 -f grep-patterns.txt /dev/null > real 7.24 > user 7.14 > sys 0.09 > > after: > > $ env LC_ALL=C time -p src/grep -E -v -m1 -f grep-patterns.txt /dev/null > real 0.62 > user 0.52 > sys 0.10 Thank you for this patch. I have rebased and made minor syntactic changes. I'll push it to gnulib soon, if not today, then by Monday. I am considering creating a test case in grep, but it feels too tight to be feasible: I would use a relative perf test, requiring that a passing test incur a perf cost of less than say 100x. Here's the beginnings of my attempt (note: this is just an outline -- obviously would not rely on having "time" in path or as a shell builtin): gen() { local n=$1 local i=1 while : ; do local pat=$(printf $i | sha1sum | cut -d' ' -f1) printf '%s\n' "$pat$pat(\$|$pat)" i=$(expr $i + 1) test $i = $n && break done } gen 4000 > pats-4000 head -400 pats-4000 > pats-400 # With fixed code, that a 10x input size increase (n=400 to 4000) # induces a 40x runtime increase: .05 -> 2.0s # Just prior to this change, it's 150x: 0.2 -> 30s env LC_ALL=C time -p src/grep -E -v -m1 -f pats-400 /dev/null env LC_ALL=C time -p src/grep -E -v -m1 -f pats-4000 /dev/null From debbugs-submit-bounces@debbugs.gnu.org Fri Sep 11 09:17:50 2020 Received: (at 40634) by debbugs.gnu.org; 11 Sep 2020 13:17:50 +0000 Received: from localhost ([127.0.0.1]:43027 helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1kGiw1-0001Lh-Na for submit@debbugs.gnu.org; Fri, 11 Sep 2020 09:17:50 -0400 Received: from mail-wm1-f66.google.com ([209.85.128.66]:37657) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1kGivz-0001LU-HR for 40634@debbugs.gnu.org; Fri, 11 Sep 2020 09:17:48 -0400 Received: by mail-wm1-f66.google.com with SMTP id a9so4787494wmm.2 for <40634@debbugs.gnu.org>; Fri, 11 Sep 2020 06:17:47 -0700 (PDT) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:mime-version:references:in-reply-to:from:date :message-id:subject:to:cc; bh=64250QNkjfBfsR2yWWs/87yHdcoFTgvba7f1zeoE3Lw=; b=d9PxVO7YF3WoMnWAyWP4zQbOg7CNsAHhbCyTfdrLH0nm4Cgq+zth78Tuu/aUFvil0H WZ4Rt3Tmu86bAjCnlXo+vln0QBIbtTWH+6ZChqHdNIZK2nyP+6QtV8HxwRBLZzlS8XLg tEtSv+aESL5EBSpqm2fZqN2dapyWRR0N+5DBZPsy01dYIdxXr7sQ3AKEGcnn4AR2fpBT rb2TGMQEgbkdviGsSsIiEOYr92HWTZjIekz+JFs+1qToBa4urfio3K5D7Pjjee3p8tYC tycnB1Kg4jJpMKFf1QTtCtSObbhMMYMjv/1LFLs3hK8P2jZttQI2ut+dDP2rIia96RgF m9HQ== X-Gm-Message-State: AOAM5300bQyB1IG6cPNKVuyofcV0P900y2zyOhf58FS3EhVJj8m88msH EkVOf9YuSjrv1RrHHrGxF7u1bIcgUujW+b3lV+8= X-Google-Smtp-Source: ABdhPJyuXc3eiS0+q0HW1NerUytXB7TuNE2+KzyUTdbAj63k0o6azKPqUmo3NqW1A9lUUiw7IELC3kBP0IDbKpjWChQ= X-Received: by 2002:a7b:c847:: with SMTP id c7mr2240511wml.149.1599830261662; Fri, 11 Sep 2020 06:17:41 -0700 (PDT) MIME-Version: 1.0 References: <20200418002153.8771.27F6AC2D@kcn.ne.jp> <20200419074109.431A.27F6AC2D@kcn.ne.jp> <20200419111025.4326.27F6AC2D@kcn.ne.jp> In-Reply-To: From: Jim Meyering Date: Fri, 11 Sep 2020 15:17:29 +0200 Message-ID: Subject: Re: bug#40634: Massive pattern list handling with -E format seems very slow since 2.28. To: Norihiro Tanaka , GNU grep developers Content-Type: multipart/mixed; boundary="00000000000072a11c05af0982ad" X-Spam-Score: 0.5 (/) X-Debbugs-Envelope-To: 40634 Cc: fryasu@yahoo.co.jp, "bug-gnulib@gnu.org List" , Paul Eggert , 40634@debbugs.gnu.org X-BeenThere: debbugs-submit@debbugs.gnu.org X-Mailman-Version: 2.1.18 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: debbugs-submit-bounces@debbugs.gnu.org Sender: "Debbugs-submit" X-Spam-Score: -0.5 (/) --00000000000072a11c05af0982ad Content-Type: text/plain; charset="UTF-8" On Fri, Sep 11, 2020 at 2:47 PM Jim Meyering wrote: > On Sun, Apr 19, 2020 at 4:10 AM Norihiro Tanaka wrote: > > On Sun, 19 Apr 2020 07:41:49 +0900 > > Norihiro Tanaka wrote: > > > On Sat, 18 Apr 2020 00:22:26 +0900 > > > Norihiro Tanaka wrote: > > > > > > > > > > > On Fri, 17 Apr 2020 10:24:42 +0900 > > > > Norihiro Tanaka wrote: > > > > > > > > > > > > > > On Fri, 17 Apr 2020 09:35:36 +0900 > > > > > Norihiro Tanaka wrote: > > > > > > > > > > > > > > > > > On Thu, 16 Apr 2020 16:00:29 -0700 > > > > > > Paul Eggert wrote: > > > > > > > > > > > > > On 4/16/20 3:53 PM, Norihiro Tanaka wrote: > > > > > > > > > > > > > > > I have had no idea to solve the problem yet. If we revert it, bug#33357 > > > > > > > > will come back. > > > > > > > > > > > > > > Yes, I'd rather not revert if we can help it. > > > > > > > > > > > > > > My own thought was to not analyze the regular expression if we discover that the input is empty. :-) > > > > > > > > > > > > Now, I have a idea, it is that we build indexes of epsilon nodes > > > > > > including in follows before remove epsilon nodes. > > > > > > > > > > > > > > > I wrote fix for the bug, but it will be slower then at grep 2.27 yet. > > > > > > > > It was improved previous patch. > > > > > > Sorry, correct patch is here. > > > > I made the previous patch even simpler. > > > > before: > > > > $ env LC_ALL=C time -p src/grep -E -v -m1 -f grep-patterns.txt /dev/null > > real 7.24 > > user 7.14 > > sys 0.09 > > > > after: > > > > $ env LC_ALL=C time -p src/grep -E -v -m1 -f grep-patterns.txt /dev/null > > real 0.62 > > user 0.52 > > sys 0.10 > > Thank you for this patch. I have rebased and made minor syntactic changes. > I'll push it to gnulib soon, if not today, then by Monday. > > I am considering creating a test case in grep, but it feels too tight > to be feasible: I would use a relative perf test, requiring that a > passing test incur a perf cost of less than say 100x. Here's the > beginnings of my attempt (note: this is just an outline -- obviously > would not rely on having "time" in path or as a shell builtin): > > gen() > { > local n=$1 > local i=1 > while : ; do > local pat=$(printf $i | sha1sum | cut -d' ' -f1) > printf '%s\n' "$pat$pat(\$|$pat)" > i=$(expr $i + 1) > test $i = $n && break > done > } > > gen 4000 > pats-4000 > head -400 pats-4000 > pats-400 > > # With fixed code, that a 10x input size increase (n=400 to 4000) > # induces a 40x runtime increase: .05 -> 2.0s > # Just prior to this change, it's 150x: 0.2 -> 30s > > env LC_ALL=C time -p src/grep -E -v -m1 -f pats-400 /dev/null > env LC_ALL=C time -p src/grep -E -v -m1 -f pats-4000 /dev/null And here is the adjusted patch: --00000000000072a11c05af0982ad Content-Type: application/octet-stream; name="dfa.c-epsilon-node-removal-speedup.patch" Content-Disposition: attachment; filename="dfa.c-epsilon-node-removal-speedup.patch" Content-Transfer-Encoding: base64 Content-ID: X-Attachment-Id: f_key9mchp0 RnJvbSBiY2Q3OTMwZGJhMmYwNzhkYTk5MjY2MGY2YTE0Y2RlOWU0MmI5NGM2IE1vbiBTZXAgMTcg MDA6MDA6MDAgMjAwMQpGcm9tOiBKaW0gTWV5ZXJpbmcgPG1leWVyaW5nQGZiLmNvbT4KRGF0ZTog RnJpLCAxMSBTZXAgMjAyMCAwMToyNToxMCAtMDcwMApTdWJqZWN0OiBbUEFUQ0hdIGRmYTogc3Bl ZWQgdXAgZXBzaWxvbi1ub2RlIHJlbW92YWwKCkJ1aWxkIGF1eGlsaWFyeSBpbmRpY2VzIGJlZm9y ZSByZW1vdmluZyBlcHNpbG9uIGNsb3N1cmUuCkJlZm9yZSwgd2hlbiByZW1vdmluZyBhbiBlcHNp bG9uIGNsb3N1cmUsIHdlIHdvdWxkIHNlYXJjaCBmb3Igbm9kZXMKaW5jbHVkaW5nIHRoYXQgZXBz aWxvbiBjbG9zdXJlIGluIGFsbCBub2RlcyBzZXF1ZW50aWFsbHksIGJ1dCB0aGF0CmNvdWxkIGJl IHZlcnkgc2xvdy4gIE5vdywgYnVpbGQgYXV4aWxpYXJ5IGluZGljZXMgYmVmb3JlIHNlYXJjaGlu Zy4KUmVwb3J0ZWQgaW46IGh0dHBzOi8vYnVncy5nbnUub3JnLzQwNjM0CgoqIGxpYi9kZmEuYyAo b3ZlcndyYXApOiBOZXcgZnVuY3Rpb24uCihlcHNjbG9zdXJlKTogQnVpbGQgYXV4aWxpYXJ5IGlu ZGljZXMgYmVmb3JlIHJlbW92aW5nIGFueQplcHNpbG9uIGNsb3N1cmU7IHVzZSB0aGVtIHRvIHNw ZWVkIHVwIHRoYXQgcHJvY2Vzcy4KLS0tCiBsaWIvZGZhLmMgfCAxMDcgKysrKysrKysrKysrKysr KysrKysrKysrKysrKysrKysrKysrKy0tLS0tLS0tLS0tLS0tLS0tCiAxIGZpbGUgY2hhbmdlZCwg NzMgaW5zZXJ0aW9ucygrKSwgMzQgZGVsZXRpb25zKC0pCgpkaWZmIC0tZ2l0IGEvbGliL2RmYS5j IGIvbGliL2RmYS5jCmluZGV4IDFmMDU4N2E3YS4uYzQzMDBkZmI1IDEwMDY0NAotLS0gYS9saWIv ZGZhLmMKKysrIGIvbGliL2RmYS5jCkBAIC0yMjAzLDYgKzIyMDMsMjIgQEAgcmVwbGFjZSAocG9z aXRpb25fc2V0ICpkc3QsIGlkeF90IGRlbCwgcG9zaXRpb25fc2V0ICphZGQsCiAgICAgfQogfQoK K3N0YXRpYyBib29sIF9HTF9BVFRSSUJVVEVfUFVSRQorb3ZlcndyYXAgKHBvc2l0aW9uX3NldCBj b25zdCAqcywgaWR4X3QgY29uc3QgKmVsZW1zLCBpZHhfdCBuZWxlbSkKK3sKKyAgaWR4X3QgaSA9 IDAsIGogPSAwOworCisgIHdoaWxlIChpIDwgcy0+bmVsZW0gJiYgaiA8IG5lbGVtKQorICAgIGlm IChzLT5lbGVtc1tpXS5pbmRleCA8IGVsZW1zW2pdKQorICAgICAgaSsrOworICAgIGVsc2UgaWYg KHMtPmVsZW1zW2ldLmluZGV4ID4gZWxlbXNbal0pCisgICAgICBqKys7CisgICAgZWxzZQorICAg ICAgcmV0dXJuIHRydWU7CisKKyAgcmV0dXJuIGZhbHNlOworfQorCiAvKiBGaW5kIHRoZSBpbmRl eCBvZiB0aGUgc3RhdGUgY29ycmVzcG9uZGluZyB0byB0aGUgZ2l2ZW4gcG9zaXRpb24gc2V0IHdp dGgKICAgIHRoZSBnaXZlbiBwcmVjZWRpbmcgY29udGV4dCwgb3IgY3JlYXRlIGEgbmV3IHN0YXRl IGlmIHRoZXJlIGlzIG5vIHN1Y2gKICAgIHN0YXRlLiAgQ29udGV4dCB0ZWxscyB3aGV0aGVyIHdl IGdvdCBoZXJlIG9uIGEgbmV3bGluZSBvciBsZXR0ZXIuICAqLwpAQCAtMjMwMCw0NSArMjMxNiw2 OCBAQCBzdGF0aWMgdm9pZAogZXBzY2xvc3VyZSAoc3RydWN0IGRmYSBjb25zdCAqZCkKIHsKICAg cG9zaXRpb25fc2V0IHRtcDsKKyAgaWR4X3QgKmN1cnJzLCAqbmV4dHM7CisgIGlkeF90IG5jdXJy ID0gMDsKKyAgaWR4X3Qgbm5leHQgPSAwOworCiAgIGFsbG9jX3Bvc2l0aW9uX3NldCAoJnRtcCwg ZC0+bmxlYXZlcyk7CisgIGN1cnJzID0geG5tYWxsb2MgKGQtPnRpbmRleCwgc2l6ZW9mICpjdXJy cyk7CisgIG5leHRzID0geG5tYWxsb2MgKGQtPnRpbmRleCwgc2l6ZW9mICpuZXh0cyk7CisKICAg Zm9yIChpZHhfdCBpID0gMDsgaSA8IGQtPnRpbmRleDsgaSsrKQotICAgIGlmIChkLT5mb2xsb3dz W2ldLm5lbGVtID4gMCAmJiBkLT50b2tlbnNbaV0gPj0gTk9UQ0hBUgotICAgICAgICAmJiBkLT50 b2tlbnNbaV0gIT0gQkFDS1JFRiAmJiBkLT50b2tlbnNbaV0gIT0gQU5ZQ0hBUgotICAgICAgICAm JiBkLT50b2tlbnNbaV0gIT0gTUJDU0VUICYmIGQtPnRva2Vuc1tpXSA8IENTRVQpCi0gICAgICB7 Ci0gICAgICAgIHVuc2lnbmVkIGludCBjb25zdHJhaW50OwotICAgICAgICBzd2l0Y2ggKGQtPnRv a2Vuc1tpXSkKLSAgICAgICAgICB7Ci0gICAgICAgICAgY2FzZSBCRUdMSU5FOgotICAgICAgICAg ICAgY29uc3RyYWludCA9IEJFR0xJTkVfQ09OU1RSQUlOVDsKLSAgICAgICAgICAgIGJyZWFrOwot ICAgICAgICAgIGNhc2UgRU5ETElORToKLSAgICAgICAgICAgIGNvbnN0cmFpbnQgPSBFTkRMSU5F X0NPTlNUUkFJTlQ7Ci0gICAgICAgICAgICBicmVhazsKLSAgICAgICAgICBjYXNlIEJFR1dPUkQ6 Ci0gICAgICAgICAgICBjb25zdHJhaW50ID0gQkVHV09SRF9DT05TVFJBSU5UOwotICAgICAgICAg ICAgYnJlYWs7Ci0gICAgICAgICAgY2FzZSBFTkRXT1JEOgotICAgICAgICAgICAgY29uc3RyYWlu dCA9IEVORFdPUkRfQ09OU1RSQUlOVDsKLSAgICAgICAgICAgIGJyZWFrOwotICAgICAgICAgIGNh c2UgTElNV09SRDoKLSAgICAgICAgICAgIGNvbnN0cmFpbnQgPSBMSU1XT1JEX0NPTlNUUkFJTlQ7 Ci0gICAgICAgICAgICBicmVhazsKLSAgICAgICAgICBjYXNlIE5PVExJTVdPUkQ6Ci0gICAgICAg ICAgICBjb25zdHJhaW50ID0gTk9UTElNV09SRF9DT05TVFJBSU5UOwotICAgICAgICAgICAgYnJl YWs7Ci0gICAgICAgICAgZGVmYXVsdDoKLSAgICAgICAgICAgIGNvbnN0cmFpbnQgPSBOT19DT05T VFJBSU5UOwotICAgICAgICAgICAgYnJlYWs7Ci0gICAgICAgICAgfQorICAgIHsKKyAgICAgIGlm IChkLT5mb2xsb3dzW2ldLm5lbGVtID4gMCAmJiBkLT50b2tlbnNbaV0gPj0gTk9UQ0hBUgorICAg ICAgICAgICYmIGQtPnRva2Vuc1tpXSAhPSBCQUNLUkVGICYmIGQtPnRva2Vuc1tpXSAhPSBBTllD SEFSCisgICAgICAgICAgJiYgZC0+dG9rZW5zW2ldICE9IE1CQ1NFVCAmJiBkLT50b2tlbnNbaV0g PCBDU0VUKQorICAgICAgICBjdXJyc1tuY3VycisrXSA9IGk7CisgICAgfQoKLSAgICAgICAgZGVs ZXRlIChpLCAmZC0+Zm9sbG93c1tpXSk7CisgIGZvciAoaWR4X3QgaSA9IDAsIGogPSAwOyBpIDwg ZC0+dGluZGV4OyBpKyspCisgICAgeworICAgICAgd2hpbGUgKGogPCBuY3VyciAmJiBjdXJyc1tq XSA8IGkpCisgICAgICAgIGorKzsKKyAgICAgIGlmIChvdmVyd3JhcCAoJmQtPmZvbGxvd3NbaV0s IGN1cnJzLCBuY3VycikpCisgICAgICAgIG5leHRzW25uZXh0KytdID0gaTsKKyAgICB9CisKKyAg Zm9yIChpZHhfdCBpID0gMDsgaSA8IG5jdXJyOyBpKyspCisgICAgeworICAgICAgdW5zaWduZWQg aW50IGNvbnN0cmFpbnQ7CisgICAgICBzd2l0Y2ggKGQtPnRva2Vuc1tjdXJyc1tpXV0pCisgICAg ICAgIHsKKyAgICAgICAgY2FzZSBCRUdMSU5FOgorICAgICAgICAgIGNvbnN0cmFpbnQgPSBCRUdM SU5FX0NPTlNUUkFJTlQ7CisgICAgICAgICAgYnJlYWs7CisgICAgICAgIGNhc2UgRU5ETElORToK KyAgICAgICAgICBjb25zdHJhaW50ID0gRU5ETElORV9DT05TVFJBSU5UOworICAgICAgICAgIGJy ZWFrOworICAgICAgICBjYXNlIEJFR1dPUkQ6CisgICAgICAgICAgY29uc3RyYWludCA9IEJFR1dP UkRfQ09OU1RSQUlOVDsKKyAgICAgICAgICBicmVhazsKKyAgICAgICAgY2FzZSBFTkRXT1JEOgor ICAgICAgICAgIGNvbnN0cmFpbnQgPSBFTkRXT1JEX0NPTlNUUkFJTlQ7CisgICAgICAgICAgYnJl YWs7CisgICAgICAgIGNhc2UgTElNV09SRDoKKyAgICAgICAgICBjb25zdHJhaW50ID0gTElNV09S RF9DT05TVFJBSU5UOworICAgICAgICAgIGJyZWFrOworICAgICAgICBjYXNlIE5PVExJTVdPUkQ6 CisgICAgICAgICAgY29uc3RyYWludCA9IE5PVExJTVdPUkRfQ09OU1RSQUlOVDsKKyAgICAgICAg ICBicmVhazsKKyAgICAgICAgZGVmYXVsdDoKKyAgICAgICAgICBjb25zdHJhaW50ID0gTk9fQ09O U1RSQUlOVDsKKyAgICAgICAgICBicmVhazsKKyAgICAgICAgfQorCisgICAgICBkZWxldGUgKGks ICZkLT5mb2xsb3dzW2N1cnJzW2ldXSk7CisKKyAgICAgIGZvciAoaWR4X3QgaiA9IDA7IGogPCBu bmV4dDsgaisrKQorICAgICAgICByZXBsYWNlICgmZC0+Zm9sbG93c1tuZXh0c1tqXV0sIGN1cnJz W2ldLCAmZC0+Zm9sbG93c1tjdXJyc1tpXV0sCisgICAgICAgICAgICAgICAgIGNvbnN0cmFpbnQs ICZ0bXApOworICAgIH0KCi0gICAgICAgIGZvciAoaWR4X3QgaiA9IDA7IGogPCBkLT50aW5kZXg7 IGorKykKLSAgICAgICAgICBpZiAoaSAhPSBqICYmIGQtPmZvbGxvd3Nbal0ubmVsZW0gPiAwKQot ICAgICAgICAgICAgcmVwbGFjZSAoJmQtPmZvbGxvd3Nbal0sIGksICZkLT5mb2xsb3dzW2ldLCBj b25zdHJhaW50LCAmdG1wKTsKLSAgICAgIH0KICAgZnJlZSAodG1wLmVsZW1zKTsKKyAgZnJlZSAo Y3VycnMpOworICBmcmVlIChuZXh0cyk7CiB9CgogLyogUmV0dXJucyB0aGUgc2V0IG9mIGNvbnRl eHRzIGZvciB3aGljaCB0aGVyZSBpcyBhdCBsZWFzdCBvbmUKLS0gCjIuMjguMC5yYzAKCg== --00000000000072a11c05af0982ad-- From debbugs-submit-bounces@debbugs.gnu.org Fri Sep 11 19:01:11 2020 Received: (at 40634) by debbugs.gnu.org; 11 Sep 2020 23:01:11 +0000 Received: from localhost ([127.0.0.1]:45785 helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1kGs2Z-0003xB-7U for submit@debbugs.gnu.org; Fri, 11 Sep 2020 19:01:11 -0400 Received: from zimbra.cs.ucla.edu ([131.179.128.68]:36664) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1kGs2W-0003wx-JD for 40634@debbugs.gnu.org; Fri, 11 Sep 2020 19:01:09 -0400 Received: from localhost (localhost [127.0.0.1]) by zimbra.cs.ucla.edu (Postfix) with ESMTP id 0D4F016007A; Fri, 11 Sep 2020 16:01:03 -0700 (PDT) Received: from zimbra.cs.ucla.edu ([127.0.0.1]) by localhost (zimbra.cs.ucla.edu [127.0.0.1]) (amavisd-new, port 10032) with ESMTP id 5zxuxXTxDDuG; Fri, 11 Sep 2020 16:01:01 -0700 (PDT) Received: from localhost (localhost [127.0.0.1]) by zimbra.cs.ucla.edu (Postfix) with ESMTP id BCAA216010A; Fri, 11 Sep 2020 16:01:01 -0700 (PDT) X-Virus-Scanned: amavisd-new at zimbra.cs.ucla.edu Received: from zimbra.cs.ucla.edu ([127.0.0.1]) by localhost (zimbra.cs.ucla.edu [127.0.0.1]) (amavisd-new, port 10026) with ESMTP id ahiDik701xCL; Fri, 11 Sep 2020 16:01:01 -0700 (PDT) Received: from [192.168.1.9] (cpe-75-82-69-226.socal.res.rr.com [75.82.69.226]) by zimbra.cs.ucla.edu (Postfix) with ESMTPSA id 83B4916007A; Fri, 11 Sep 2020 16:01:01 -0700 (PDT) Subject: Re: bug#40634: Massive pattern list handling with -E format seems very slow since 2.28. To: Jim Meyering , Norihiro Tanaka , GNU grep developers References: <20200418002153.8771.27F6AC2D@kcn.ne.jp> <20200419074109.431A.27F6AC2D@kcn.ne.jp> <20200419111025.4326.27F6AC2D@kcn.ne.jp> From: Paul Eggert Autocrypt: addr=eggert@cs.ucla.edu; prefer-encrypt=mutual; keydata= LS0tLS1CRUdJTiBQR1AgUFVCTElDIEtFWSBCTE9DSy0tLS0tCgptUUlOQkV5QWNtUUJFQURB QXlIMnhvVHU3cHBHNUQzYThGTVpFb243NGRDdmM0K3ExWEEySjJ0QnkycHdhVHFmCmhweHhk R0E5Smo1MFVKM1BENGJTVUVnTjh0TFowc2FuNDdsNVhUQUZMaTI0NTZjaVNsNW04c0thSGxH ZHQ5WG0KQUF0bVhxZVpWSVlYL1VGUzk2ZkR6ZjR4aEVtbS95N0xiWUVQUWRVZHh1NDd4QTVL aFRZcDVibHRGM1dZRHoxWQpnZDdneDA3QXV3cDdpdzdlTnZub0RUQWxLQWw4S1lEWnpiRE5D UUdFYnBZM2VmWkl2UGRlSStGV1FONFcra2doCnkrUDZhdTZQcklJaFlyYWV1YTdYRGRiMkxT MWVuM1NzbUUzUWpxZlJxSS9BMnVlOEpNd3N2WGUvV0szOEV6czYKeDc0aVRhcUkzQUZINmls QWhEcXBNbmQvbXNTRVNORnQ3NkRpTzFaS1FNcjlhbVZQa25qZlBtSklTcWRoZ0IxRApsRWR3 MzRzUk9mNlY4bVp3MHhmcVQ2UEtFNDZMY0ZlZnpzMGtiZzRHT1JmOHZqRzJTZjF0azVlVThN Qml5Ti9iClowM2JLTmpOWU1wT0REUVF3dVA4NGtZTGtYMndCeHhNQWhCeHdiRFZadWR6eERa SjFDMlZYdWpDT0pWeHEya2wKakJNOUVUWXVVR3FkNzVBVzJMWHJMdzYrTXVJc0hGQVlBZ1Jy NytLY3dEZ0JBZndoUEJZWDM0blNTaUhsbUxDKwpLYUhMZUNMRjVaSTJ2S20zSEVlQ1R0bE9n N3haRU9OZ3d6TCtmZEtvK0Q2U29DOFJSeEpLczhhM3NWZkk0dDZDCm5yUXp2SmJCbjZneGRn Q3U1aTI5SjFRQ1lyQ1l2cWwyVXlGUEFLK2RvOTkvMWpPWFQ0bTI4MzZqMXdBUkFRQUIKdENC UVlYVnNJRVZuWjJWeWRDQThaV2RuWlhKMFFHTnpMblZqYkdFdVpXUjFQb2tDVlFRVEFRZ0FQ d0liQXdZTApDUWdIQXdJR0ZRZ0NDUW9MQkJZQ0F3RUNIZ0VDRjRBV0lRUitONUtwMkt6MzFq TzhGWWp0bCtrT1lxcCtOQVVDClh5Vzlsd1VKRks0THN3QUtDUkR0bCtrT1lxcCtOS05WRC85 SE1zSTE2MDZuMFV1VFhId0lUc3lPakFJOVNET1QKK0MzRFV2NnFsTTVCSDJuV0FNVGlJaXlB NXVnbHNKdjkzb2kydk50RmYvUS9tLzFjblpXZ25WbkV4a3lMSTRFTgpTZDF1QnZyMC9sQ1Nk UGxQME1nNkdXU3BYTXUreDB2ZFQwQWFaTk9URTBGblB1b2xkYzNYRDc2QzJxZzhzWC9pCmF4 WFRLSHk5UCtCbEFxL0NzNy9weERRMEV6U24wVVNaMkMwbDV2djRQTXBBL3BpY25TNks2MDlK dkRHYU9SbXcKWmVYSVpxUU5aVitaUXMrVVl0Vm9ndURUcWJ5M0lVWTFJOEJsWEhScHRhajlB TW40VW9oL0NxcFFsVm9qb3lXbApIcWFGbm5KQktlRjBodko5U0F5YWx3dXpBakc3dlFXMDdN WW5jYU9GbTB3b2lLYmc1SkxPOEY0U0JUSWt1TzBECkNmOW5MQWF5NlZzQjRyendkRWZSd2pQ TFlBbjdNUjNmdkhDRXpmcmtsZFRyYWlCTzFUMGllREs4MEk3c0xmNnAKTWVDWUkxOXBVbHgw L05STUdDZGRpRklRZGZ0aEtXWEdSUzVMQXM4andCZjhINkc1UFdpblByRUlhb21JUDIxaQp2 dWhRRDA3YllxOUlpSWRlbGpqVWRIY0dJMGkvQjRNNTZaYWE4RmYzOGluaU9sckRZQ21ZV1I0 ZENXWml1UWVaCjNPZ3FlUXM5YTZqVHZnZERHVm1SVnFZK2p6azhQbGFIZmNvazhST2hGY0hL a2NmaHVCaEwyNWhsUklzaFJET0UKc2tYcUt3bnpyYnFnYTNHWFpYZnNYQW9GYnpOaExkTHY5 QStMSkFZU2tYUDYvNXFkVHBFTFZHb3N5SDg4NFZkYgpCcGtHSTA0b1lWcXVsYmtDRFFSTWdI SmtBUkFBcG9YcnZ4UDNESWZqQ05PdFhVL1Bkd01TaEtkWC9SbFNzNVBmCnVuVjF3YktQOGhl clhIcnZRZEZWcUVDYVRTeG1saHpiazhYMFBrWTlnY1ZhVTJPNDlUM3FzT2QxY0hlRjUyWUYK R0V0MExoc0JlTWpnTlg1dVoxVjc2cjhneWVWbEZwV1diMFNJd0pVQkhyRFhleEY2N3VwZVJi MnZkSEJqWUROZQp5U24rMEI3Z0ZFcXZWbVp1K0xhZHVkRHA2a1FMamF0RnZIUUhVU0dOc2hC bmtrY2FUYmlJOVBzdDBHQ2MyYWl6Cm5CaVBQQTJXUXhBUGxQUmgzT0dUc241VEhBRG1ianFZ NkZFTUxhc1ZYOERTQ2JsTXZMd05lTy84U3h6aUJpZGgKcUxwSkNxZFFSV0hrdTVYeGdJa0dl S096NU9MRHZYSFdKeWFmckVZamprUzZBazZCNXo2c3ZLbGlDbFduakhRYwpqbFB6eW9GRmdL VEVmY3FEeENqNFJZMEQwRGd0RkQwTmZ5ZU9pZHJTQi9TelRlMmh3cnlRRTNycFNpcW8rMGNH CmR6aDR5QUhLWUorVXJYWjRwOTNaaGpHZktEMXhsck5ZRGxXeVc5UEdtYnZxRnVEbWlJQVFm OVdEL3d6RWZJQ2MKK0YrdURESSt1WWtSeFVGcDkyeWttZGhERUZnMXlqWXNVOGlHVTY5YUh5 dmhxMzZ6NHpjdHZicWhSTnpPV0IxYgpWSi9kSU1EdnNFeEdjWFFWRElUN3NETlh2MHdFM2pL U0twcDdOREcxb1hVWEwrMitTRjk5S2p5NzUzQWJRU0FtCkg2MTdmeUJOd2hKV3ZRWWcrbVV2 UHBpR090c2VzOUVYVUkzbFM0djBNRWFQRzQzZmxFczFVUisxcnBGUVdWSG8KMXkxT08rc0FF UUVBQVlrQ1BBUVlBUWdBSmdJYkRCWWhCSDQza3FuWXJQZldNN3dWaU8yWDZRNWlxbjQwQlFK ZgpKYjJ6QlFrVXJndlBBQW9KRU8yWDZRNWlxbjQwY25NUC8xN0NnVWtYVDlhSUpyaVBNOHdi Y2VZcmNsNytiZFlFCmY3OVNsd1NiYkhON1I0Q29JSkZPbE45Uy8zNHR5cEdWWXZwZ21DSkRZ RlRCeHlQTzkyaU1YRGdBNCtjV0h6dDUKVDFhWU85aHNLaGg3dkR0Sys2UHJvWkdjKzA4Z1VU WEhoYjk3aE1NUWhrbkpsbmZqcFNFQzllbTkwNkZVK0k5MwpUMWZUR3VwbkJhM2FXY0s4ak0w SmFCR2J5MmhHMVMzb2xhRExTVHRCSU5OQlltdnVXUjlNS09oaHFEcmxrNWN3CkZESkxoNU5y WHRlRVkwOFdBemNMekczcGtyWFBIa0ZlTVF0ZnFrMGpMZEdHdkdDM05DSWtxWXJkTGhpUnZH cHIKdTM4QzI2UkVuNWY0STB2R0UzVmZJWEhlOFRNQ05tUXV0MU50TXVVbXBESXkxYUx4R3p1 cHRVaG5PSk4vL3IrVgpqRFBvaTNMT3lTTllwaHFlL2RNdWJzZlVyNm9oUDQxbUtGODFGdXdJ NGFtcUp0cnFJTDJ5cWF4M2EwcWxmd0N4ClhmdGllcUpjdWVrWCtlQ1BEQ0tyWU1YUjBGWWd3 cEcySVRaVUd0ckVqRVNsRTZEc2N4NzM0SEtkcjVPUklvY0wKVVVLRU9HZWlVNkRHaEdGZGI1 VHd1MFNuK3UxbVVQRE4wTSsrQ2RNdkNsSUU4a2xvNEc5MUVPSW11MVVwYjh4YwpPUFF3eGgx andxU3JVNVF3b05tU1llZ1FTSExwSVV1ckZ6MWlRVWgxdnBQWHpLaW5rV0VxdjRJcUExY2lM K0x5CnlTdUxrcDdNc0pwVlJNYldKQ05XT09TYmFING9EQko1ZEhNR2MzNXg1bW9zQ2s5MFBY a251RkREc1lIZkRvNXMKbWY5bG82WVh4N045Cj0zTGFJCi0tLS0tRU5EIFBHUCBQVUJMSUMg S0VZIEJMT0NLLS0tLS0K Organization: UCLA Computer Science Department Message-ID: <0417af9a-50a5-3462-6b38-393d80395085@cs.ucla.edu> Date: Fri, 11 Sep 2020 16:01:01 -0700 User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:68.0) Gecko/20100101 Thunderbird/68.10.0 MIME-Version: 1.0 In-Reply-To: Content-Type: multipart/mixed; boundary="------------03EFE1E641F433A807B7C247" Content-Language: en-US X-Spam-Score: -4.8 (----) X-Debbugs-Envelope-To: 40634 Cc: fryasu@yahoo.co.jp, Gnulib bugs , 40634@debbugs.gnu.org X-BeenThere: debbugs-submit@debbugs.gnu.org X-Mailman-Version: 2.1.18 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: debbugs-submit-bounces@debbugs.gnu.org Sender: "Debbugs-submit" X-Spam-Score: -5.8 (-----) This is a multi-part message in MIME format. --------------03EFE1E641F433A807B7C247 Content-Type: text/plain; charset=utf-8; format=flowed Content-Transfer-Encoding: 7bit > And here is the adjusted patch: Hold on, that looks like a cleanup of the April 18 patch posted here: https://bugs.gnu.org/40634#26 But there's a later patch dated April 19, which Norihiro Tanaka said should be more correct and simpler: https://bugs.gnu.org/40634#32 I'll try to take a look at the later patch. --------------03EFE1E641F433A807B7C247 Content-Type: text/x-patch; charset=UTF-8; name="0001-dfa-minor-improvements-of-previous-change.patch" Content-Transfer-Encoding: 7bit Content-Disposition: attachment; filename="0001-dfa-minor-improvements-of-previous-change.patch" >From a185ed4e6341b5c6aab3e3d950aafeae9eafe4cc Mon Sep 17 00:00:00 2001 From: Paul Eggert Date: Fri, 11 Sep 2020 15:46:14 -0700 Subject: [PATCH] dfa: minor improvements of previous change * lib/dfa.c (epsclosure): Use one xnmalloc call to allocate CURRS and NEXTS, to lessen pressure on the heap allocator. Assign unconditionally in a couple of places, to help branch prediction. Redo comparison order to help the compiler. Omit unnecessary index variable J. --- ChangeLog | 9 +++++++++ lib/dfa.c | 28 ++++++++++++++-------------- 2 files changed, 23 insertions(+), 14 deletions(-) diff --git a/ChangeLog b/ChangeLog index bf39cc512..da75e4757 100644 --- a/ChangeLog +++ b/ChangeLog @@ -1,3 +1,12 @@ +2020-09-11 Paul Eggert + + dfa: minor improvements of previous change + * lib/dfa.c (epsclosure): Use one xnmalloc call to allocate + CURRS and NEXTS, to lessen pressure on the heap allocator. + Assign unconditionally in a couple of places, to help branch + prediction. Redo comparison order to help the compiler. + Omit unnecessary index variable J. + 2020-09-10 Paul Eggert canonicalize: fix pointer indexing bugs diff --git a/lib/dfa.c b/lib/dfa.c index c4300dfb5..df6399e45 100644 --- a/lib/dfa.c +++ b/lib/dfa.c @@ -2203,6 +2203,8 @@ replace (position_set *dst, idx_t del, position_set *add, } } +/* Return true if any position in S has an index equal to any element + of ELEMS, which has NELEM elements. */ static bool _GL_ATTRIBUTE_PURE overwrap (position_set const *s, idx_t const *elems, idx_t nelem) { @@ -2316,28 +2318,27 @@ static void epsclosure (struct dfa const *d) { position_set tmp; - idx_t *currs, *nexts; idx_t ncurr = 0; idx_t nnext = 0; + idx_t tindex = d->tindex; alloc_position_set (&tmp, d->nleaves); - currs = xnmalloc (d->tindex, sizeof *currs); - nexts = xnmalloc (d->tindex, sizeof *nexts); - for (idx_t i = 0; i < d->tindex; i++) + idx_t *currs = xnmalloc (tindex, 2 * sizeof *currs); + for (idx_t i = 0; i < tindex; i++) { - if (d->follows[i].nelem > 0 && d->tokens[i] >= NOTCHAR - && d->tokens[i] != BACKREF && d->tokens[i] != ANYCHAR - && d->tokens[i] != MBCSET && d->tokens[i] < CSET) - currs[ncurr++] = i; + currs[ncurr] = i; + ncurr += (d->follows[i].nelem > 0 + && NOTCHAR <= d->tokens[i] && d->tokens[i] < CSET + && d->tokens[i] != BACKREF && d->tokens[i] != ANYCHAR + && d->tokens[i] != MBCSET); } - for (idx_t i = 0, j = 0; i < d->tindex; i++) + idx_t *nexts = currs + ncurr; + for (idx_t i = 0; i < tindex; i++) { - while (j < ncurr && currs[j] < i) - j++; - if (overwrap (&d->follows[i], currs, ncurr)) - nexts[nnext++] = i; + nexts[nnext] = i; + nnext += overwrap (&d->follows[i], currs, ncurr); } for (idx_t i = 0; i < ncurr; i++) @@ -2377,7 +2378,6 @@ epsclosure (struct dfa const *d) free (tmp.elems); free (currs); - free (nexts); } /* Returns the set of contexts for which there is at least one -- 2.17.1 --------------03EFE1E641F433A807B7C247-- From debbugs-submit-bounces@debbugs.gnu.org Sat Sep 12 02:42:12 2020 Received: (at 40634) by debbugs.gnu.org; 12 Sep 2020 06:42:12 +0000 Received: from localhost ([127.0.0.1]:46055 helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1kGzEh-0000Bh-Ue for submit@debbugs.gnu.org; Sat, 12 Sep 2020 02:42:12 -0400 Received: from mail-wm1-f51.google.com ([209.85.128.51]:39879) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1kGzEf-0000BU-VU for 40634@debbugs.gnu.org; Sat, 12 Sep 2020 02:42:10 -0400 Received: by mail-wm1-f51.google.com with SMTP id b79so6610831wmb.4 for <40634@debbugs.gnu.org>; Fri, 11 Sep 2020 23:42:09 -0700 (PDT) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:mime-version:references:in-reply-to:from:date :message-id:subject:to:cc; bh=CdlIqtgB2bfaaBR0toi0KBtEsdgzJrK53pGpBNeZN+4=; b=Mo/kpjU0HUKkHwzeaLIcs7lrVI3OEaH7h7QBTs4IbGKEoJ3qqGIq6liwOwSdFhNaGY cOS+1OUk04ShwL0sJBp8yTdn+Hk/qFzGVSFQI08N2Ci3eAO+oSJTusp/6xVJjcDDYP/2 NFOJacSXtP0t4KKNtBSHd/NAFHp3SmkS7Yn+246d1C97AxZ1VYgfdW3CZKjprDT7QOb8 f6sntADtEFrhgQ8GrivgwS9Z/uvtJoXCAMv9F5FnPP9q6CQosZ6CqgPQF2T5QOFs4k/G JomNTl5hC1eZxglDRDcHbUgpZIp3EQ36ieREO8WqnuWCgf7vVGWqqTxSZEcmn6CDR/7G 6K8w== X-Gm-Message-State: AOAM533hIxzsrYhMUy58CPAX0CEhD3AyTu02nq6BoHeS7StdnOIPcKe2 FG96AwyFxx0IgAaCoK/1zfNYJEhoE17VUlxCssY= X-Google-Smtp-Source: ABdhPJx0W03i4mrwGhzx+cE0Tz/G4mQsa8CUKu3DCKbtlzhBApRFcy67k9WEdQRulQg5qQzuU8rWzPhxZ/7aqz7Ff64= X-Received: by 2002:a7b:c847:: with SMTP id c7mr5697285wml.149.1599892924162; Fri, 11 Sep 2020 23:42:04 -0700 (PDT) MIME-Version: 1.0 References: <20200418002153.8771.27F6AC2D@kcn.ne.jp> <20200419074109.431A.27F6AC2D@kcn.ne.jp> <20200419111025.4326.27F6AC2D@kcn.ne.jp> <0417af9a-50a5-3462-6b38-393d80395085@cs.ucla.edu> In-Reply-To: <0417af9a-50a5-3462-6b38-393d80395085@cs.ucla.edu> From: Jim Meyering Date: Sat, 12 Sep 2020 08:41:51 +0200 Message-ID: Subject: Re: bug#40634: Massive pattern list handling with -E format seems very slow since 2.28. To: Paul Eggert Content-Type: text/plain; charset="UTF-8" X-Spam-Score: 0.5 (/) X-Debbugs-Envelope-To: 40634 Cc: fryasu@yahoo.co.jp, Gnulib bugs , 40634@debbugs.gnu.org, Norihiro Tanaka , GNU grep developers X-BeenThere: debbugs-submit@debbugs.gnu.org X-Mailman-Version: 2.1.18 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: debbugs-submit-bounces@debbugs.gnu.org Sender: "Debbugs-submit" X-Spam-Score: -0.5 (/) On Sat, Sep 12, 2020 at 1:01 AM Paul Eggert wrote: > > And here is the adjusted patch: > > Hold on, that looks like a cleanup of the April 18 patch posted here: > > https://bugs.gnu.org/40634#26 > > But there's a later patch dated April 19, which Norihiro Tanaka said should be > more correct and simpler: > > https://bugs.gnu.org/40634#32 > > I'll try to take a look at the later patch. Oh! Glad you spotted that. From debbugs-submit-bounces@debbugs.gnu.org Sun Sep 13 22:03:50 2020 Received: (at 40634) by debbugs.gnu.org; 14 Sep 2020 02:03:50 +0000 Received: from localhost ([127.0.0.1]:52217 helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1kHdqN-0002Gb-Bu for submit@debbugs.gnu.org; Sun, 13 Sep 2020 22:03:50 -0400 Received: from zimbra.cs.ucla.edu ([131.179.128.68]:53118) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1kHdqI-0002GK-I5 for 40634@debbugs.gnu.org; Sun, 13 Sep 2020 22:03:46 -0400 Received: from localhost (localhost [127.0.0.1]) by zimbra.cs.ucla.edu (Postfix) with ESMTP id 17864160066; Sun, 13 Sep 2020 19:03:37 -0700 (PDT) Received: from zimbra.cs.ucla.edu ([127.0.0.1]) by localhost (zimbra.cs.ucla.edu [127.0.0.1]) (amavisd-new, port 10032) with ESMTP id TTL4VJX9ijkC; Sun, 13 Sep 2020 19:03:34 -0700 (PDT) Received: from localhost (localhost [127.0.0.1]) by zimbra.cs.ucla.edu (Postfix) with ESMTP id 5CB4B16008E; Sun, 13 Sep 2020 19:03:34 -0700 (PDT) X-Virus-Scanned: amavisd-new at zimbra.cs.ucla.edu Received: from zimbra.cs.ucla.edu ([127.0.0.1]) by localhost (zimbra.cs.ucla.edu [127.0.0.1]) (amavisd-new, port 10026) with ESMTP id ZHzbTHZrwyGa; Sun, 13 Sep 2020 19:03:34 -0700 (PDT) Received: from [192.168.1.9] (cpe-75-82-69-226.socal.res.rr.com [75.82.69.226]) by zimbra.cs.ucla.edu (Postfix) with ESMTPSA id 09A24160066; Sun, 13 Sep 2020 19:03:34 -0700 (PDT) Subject: Re: bug#40634: Massive pattern list handling with -E format seems very slow since 2.28. To: Jim Meyering References: <20200418002153.8771.27F6AC2D@kcn.ne.jp> <20200419074109.431A.27F6AC2D@kcn.ne.jp> <20200419111025.4326.27F6AC2D@kcn.ne.jp> <0417af9a-50a5-3462-6b38-393d80395085@cs.ucla.edu> From: Paul Eggert Autocrypt: addr=eggert@cs.ucla.edu; prefer-encrypt=mutual; keydata= LS0tLS1CRUdJTiBQR1AgUFVCTElDIEtFWSBCTE9DSy0tLS0tCgptUUlOQkV5QWNtUUJFQURB QXlIMnhvVHU3cHBHNUQzYThGTVpFb243NGRDdmM0K3ExWEEySjJ0QnkycHdhVHFmCmhweHhk R0E5Smo1MFVKM1BENGJTVUVnTjh0TFowc2FuNDdsNVhUQUZMaTI0NTZjaVNsNW04c0thSGxH ZHQ5WG0KQUF0bVhxZVpWSVlYL1VGUzk2ZkR6ZjR4aEVtbS95N0xiWUVQUWRVZHh1NDd4QTVL aFRZcDVibHRGM1dZRHoxWQpnZDdneDA3QXV3cDdpdzdlTnZub0RUQWxLQWw4S1lEWnpiRE5D UUdFYnBZM2VmWkl2UGRlSStGV1FONFcra2doCnkrUDZhdTZQcklJaFlyYWV1YTdYRGRiMkxT MWVuM1NzbUUzUWpxZlJxSS9BMnVlOEpNd3N2WGUvV0szOEV6czYKeDc0aVRhcUkzQUZINmls QWhEcXBNbmQvbXNTRVNORnQ3NkRpTzFaS1FNcjlhbVZQa25qZlBtSklTcWRoZ0IxRApsRWR3 MzRzUk9mNlY4bVp3MHhmcVQ2UEtFNDZMY0ZlZnpzMGtiZzRHT1JmOHZqRzJTZjF0azVlVThN Qml5Ti9iClowM2JLTmpOWU1wT0REUVF3dVA4NGtZTGtYMndCeHhNQWhCeHdiRFZadWR6eERa SjFDMlZYdWpDT0pWeHEya2wKakJNOUVUWXVVR3FkNzVBVzJMWHJMdzYrTXVJc0hGQVlBZ1Jy NytLY3dEZ0JBZndoUEJZWDM0blNTaUhsbUxDKwpLYUhMZUNMRjVaSTJ2S20zSEVlQ1R0bE9n N3haRU9OZ3d6TCtmZEtvK0Q2U29DOFJSeEpLczhhM3NWZkk0dDZDCm5yUXp2SmJCbjZneGRn Q3U1aTI5SjFRQ1lyQ1l2cWwyVXlGUEFLK2RvOTkvMWpPWFQ0bTI4MzZqMXdBUkFRQUIKdENC UVlYVnNJRVZuWjJWeWRDQThaV2RuWlhKMFFHTnpMblZqYkdFdVpXUjFQb2tDVlFRVEFRZ0FQ d0liQXdZTApDUWdIQXdJR0ZRZ0NDUW9MQkJZQ0F3RUNIZ0VDRjRBV0lRUitONUtwMkt6MzFq TzhGWWp0bCtrT1lxcCtOQVVDClh5Vzlsd1VKRks0THN3QUtDUkR0bCtrT1lxcCtOS05WRC85 SE1zSTE2MDZuMFV1VFhId0lUc3lPakFJOVNET1QKK0MzRFV2NnFsTTVCSDJuV0FNVGlJaXlB NXVnbHNKdjkzb2kydk50RmYvUS9tLzFjblpXZ25WbkV4a3lMSTRFTgpTZDF1QnZyMC9sQ1Nk UGxQME1nNkdXU3BYTXUreDB2ZFQwQWFaTk9URTBGblB1b2xkYzNYRDc2QzJxZzhzWC9pCmF4 WFRLSHk5UCtCbEFxL0NzNy9weERRMEV6U24wVVNaMkMwbDV2djRQTXBBL3BpY25TNks2MDlK dkRHYU9SbXcKWmVYSVpxUU5aVitaUXMrVVl0Vm9ndURUcWJ5M0lVWTFJOEJsWEhScHRhajlB TW40VW9oL0NxcFFsVm9qb3lXbApIcWFGbm5KQktlRjBodko5U0F5YWx3dXpBakc3dlFXMDdN WW5jYU9GbTB3b2lLYmc1SkxPOEY0U0JUSWt1TzBECkNmOW5MQWF5NlZzQjRyendkRWZSd2pQ TFlBbjdNUjNmdkhDRXpmcmtsZFRyYWlCTzFUMGllREs4MEk3c0xmNnAKTWVDWUkxOXBVbHgw L05STUdDZGRpRklRZGZ0aEtXWEdSUzVMQXM4andCZjhINkc1UFdpblByRUlhb21JUDIxaQp2 dWhRRDA3YllxOUlpSWRlbGpqVWRIY0dJMGkvQjRNNTZaYWE4RmYzOGluaU9sckRZQ21ZV1I0 ZENXWml1UWVaCjNPZ3FlUXM5YTZqVHZnZERHVm1SVnFZK2p6azhQbGFIZmNvazhST2hGY0hL a2NmaHVCaEwyNWhsUklzaFJET0UKc2tYcUt3bnpyYnFnYTNHWFpYZnNYQW9GYnpOaExkTHY5 QStMSkFZU2tYUDYvNXFkVHBFTFZHb3N5SDg4NFZkYgpCcGtHSTA0b1lWcXVsYmtDRFFSTWdI SmtBUkFBcG9YcnZ4UDNESWZqQ05PdFhVL1Bkd01TaEtkWC9SbFNzNVBmCnVuVjF3YktQOGhl clhIcnZRZEZWcUVDYVRTeG1saHpiazhYMFBrWTlnY1ZhVTJPNDlUM3FzT2QxY0hlRjUyWUYK R0V0MExoc0JlTWpnTlg1dVoxVjc2cjhneWVWbEZwV1diMFNJd0pVQkhyRFhleEY2N3VwZVJi MnZkSEJqWUROZQp5U24rMEI3Z0ZFcXZWbVp1K0xhZHVkRHA2a1FMamF0RnZIUUhVU0dOc2hC bmtrY2FUYmlJOVBzdDBHQ2MyYWl6Cm5CaVBQQTJXUXhBUGxQUmgzT0dUc241VEhBRG1ianFZ NkZFTUxhc1ZYOERTQ2JsTXZMd05lTy84U3h6aUJpZGgKcUxwSkNxZFFSV0hrdTVYeGdJa0dl S096NU9MRHZYSFdKeWFmckVZamprUzZBazZCNXo2c3ZLbGlDbFduakhRYwpqbFB6eW9GRmdL VEVmY3FEeENqNFJZMEQwRGd0RkQwTmZ5ZU9pZHJTQi9TelRlMmh3cnlRRTNycFNpcW8rMGNH CmR6aDR5QUhLWUorVXJYWjRwOTNaaGpHZktEMXhsck5ZRGxXeVc5UEdtYnZxRnVEbWlJQVFm OVdEL3d6RWZJQ2MKK0YrdURESSt1WWtSeFVGcDkyeWttZGhERUZnMXlqWXNVOGlHVTY5YUh5 dmhxMzZ6NHpjdHZicWhSTnpPV0IxYgpWSi9kSU1EdnNFeEdjWFFWRElUN3NETlh2MHdFM2pL U0twcDdOREcxb1hVWEwrMitTRjk5S2p5NzUzQWJRU0FtCkg2MTdmeUJOd2hKV3ZRWWcrbVV2 UHBpR090c2VzOUVYVUkzbFM0djBNRWFQRzQzZmxFczFVUisxcnBGUVdWSG8KMXkxT08rc0FF UUVBQVlrQ1BBUVlBUWdBSmdJYkRCWWhCSDQza3FuWXJQZldNN3dWaU8yWDZRNWlxbjQwQlFK ZgpKYjJ6QlFrVXJndlBBQW9KRU8yWDZRNWlxbjQwY25NUC8xN0NnVWtYVDlhSUpyaVBNOHdi Y2VZcmNsNytiZFlFCmY3OVNsd1NiYkhON1I0Q29JSkZPbE45Uy8zNHR5cEdWWXZwZ21DSkRZ RlRCeHlQTzkyaU1YRGdBNCtjV0h6dDUKVDFhWU85aHNLaGg3dkR0Sys2UHJvWkdjKzA4Z1VU WEhoYjk3aE1NUWhrbkpsbmZqcFNFQzllbTkwNkZVK0k5MwpUMWZUR3VwbkJhM2FXY0s4ak0w SmFCR2J5MmhHMVMzb2xhRExTVHRCSU5OQlltdnVXUjlNS09oaHFEcmxrNWN3CkZESkxoNU5y WHRlRVkwOFdBemNMekczcGtyWFBIa0ZlTVF0ZnFrMGpMZEdHdkdDM05DSWtxWXJkTGhpUnZH cHIKdTM4QzI2UkVuNWY0STB2R0UzVmZJWEhlOFRNQ05tUXV0MU50TXVVbXBESXkxYUx4R3p1 cHRVaG5PSk4vL3IrVgpqRFBvaTNMT3lTTllwaHFlL2RNdWJzZlVyNm9oUDQxbUtGODFGdXdJ NGFtcUp0cnFJTDJ5cWF4M2EwcWxmd0N4ClhmdGllcUpjdWVrWCtlQ1BEQ0tyWU1YUjBGWWd3 cEcySVRaVUd0ckVqRVNsRTZEc2N4NzM0SEtkcjVPUklvY0wKVVVLRU9HZWlVNkRHaEdGZGI1 VHd1MFNuK3UxbVVQRE4wTSsrQ2RNdkNsSUU4a2xvNEc5MUVPSW11MVVwYjh4YwpPUFF3eGgx andxU3JVNVF3b05tU1llZ1FTSExwSVV1ckZ6MWlRVWgxdnBQWHpLaW5rV0VxdjRJcUExY2lM K0x5CnlTdUxrcDdNc0pwVlJNYldKQ05XT09TYmFING9EQko1ZEhNR2MzNXg1bW9zQ2s5MFBY a251RkREc1lIZkRvNXMKbWY5bG82WVh4N045Cj0zTGFJCi0tLS0tRU5EIFBHUCBQVUJMSUMg S0VZIEJMT0NLLS0tLS0K Organization: UCLA Computer Science Department Message-ID: <78d13c9d-0426-b913-66fc-d7d652a5500c@cs.ucla.edu> Date: Sun, 13 Sep 2020 19:03:33 -0700 User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:68.0) Gecko/20100101 Thunderbird/68.10.0 MIME-Version: 1.0 In-Reply-To: Content-Type: multipart/mixed; boundary="------------DD7B168845614FF54EE0C9E9" Content-Language: en-US X-Spam-Score: -2.3 (--) X-Debbugs-Envelope-To: 40634 Cc: fryasu@yahoo.co.jp, Gnulib bugs , 40634@debbugs.gnu.org, Norihiro Tanaka , GNU grep developers X-BeenThere: debbugs-submit@debbugs.gnu.org X-Mailman-Version: 2.1.18 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: debbugs-submit-bounces@debbugs.gnu.org Sender: "Debbugs-submit" X-Spam-Score: -1.0 (-) This is a multi-part message in MIME format. --------------DD7B168845614FF54EE0C9E9 Content-Type: text/plain; charset=utf-8; format=flowed Content-Transfer-Encoding: 7bit On 9/11/20 11:41 PM, Jim Meyering wrote: >> https://bugs.gnu.org/40634#32 >> >> I'll try to take a look at the later patch. > > Oh! Glad you spotted that. I took a look and the basic idea sounds good though I admit I did not check every detail. While looking into it I found some opportunities for improvements, plus I found what appear to be some longstanding bugs in the area, one of which causes a grep test failure on Solaris (and I suspect the bug is also on GNU/Linux but the grep tests don't catch it). I installed the attached patches into Gnulib, updated grep to point to the new Gnulib version, and added a note in grep's NEWS file about this. Patch 1 is what Norihiro Tanaka proposed in Bug#40634#32, except I edited the commit message. Patch 2 consists of minor cleanups and performance tweaks for Patch 1. (Patches 3 and 4 are omitted as they were installed by others into Gnulib at about the same time I was installing these.) Patch 5 fixes a dfa-heap-overrun failure on Solaris that appears to be a longstanding bug exposed by Patch 1 when running on Solaris. Patch 6 merely cleans up code near Patch 5. Patch 7 fixes the use of an uninitialized constraint, which I discovered while debugging Patch 5 under Valgrind; this also appears to be a longstandiung bug. Coming up with test cases for all these bugs would be pretty tricky, unfortunately. --------------DD7B168845614FF54EE0C9E9 Content-Type: text/x-patch; charset=UTF-8; name="0001-dfa-use-backward-set-in-removal-of-epsilon-closure.patch" Content-Transfer-Encoding: 7bit Content-Disposition: attachment; filename*0="0001-dfa-use-backward-set-in-removal-of-epsilon-closure.patc"; filename*1="h" >From da0e8454a8e68035ef4b87dbb9097f85df6ece27 Mon Sep 17 00:00:00 2001 From: Norihiro Tanaka Date: Sat, 12 Sep 2020 18:51:55 -0700 Subject: [PATCH 1/7] dfa: use backward set in removal of epsilon closure When removing in epsilon closure, the code searched all nodes sequentially, and this was slow for some cases. Build a backward set before search, and only check previous position with the set. Problem reported in . * lib/dfa.c (struct dfa): New member 'epsilon'. (addtok_mb): Check whether a pattern has epsilon node or not. (epsclosure): New arg BACKWORD; caller changed. When removing epsilon node and reconnecting, check only previous positions. Treat BEG as if it were character. (dfaanalyze): Build backward set. --- lib/dfa.c | 65 +++++++++++++++++++++++++++++++++++++++++++------------ 1 file changed, 51 insertions(+), 14 deletions(-) diff --git a/lib/dfa.c b/lib/dfa.c index 1f0587a7a..7c8eb6685 100644 --- a/lib/dfa.c +++ b/lib/dfa.c @@ -488,6 +488,7 @@ struct dfa bool fast; /* The DFA is fast. */ token utf8_anychar_classes[9]; /* To lower ANYCHAR in UTF-8 locales. */ mbstate_t mbs; /* Multibyte conversion state. */ + bool epsilon; /* The following are valid only if MB_CUR_MAX > 1. */ @@ -1615,13 +1616,21 @@ addtok_mb (struct dfa *dfa, token t, char mbprop) dfa->parse.depth--; break; - case BACKREF: - dfa->fast = false; + case BEGLINE: + case ENDLINE: + case BEGWORD: + case ENDWORD: + case LIMWORD: + case NOTLIMWORD: + case EMPTY: + dfa->epsilon = true; FALLTHROUGH; + default: - dfa->nleaves++; - FALLTHROUGH; - case EMPTY: + if (t == BACKREF) + dfa->fast = false; + if (t != EMPTY) + dfa->nleaves++; dfa->parse.depth++; break; } @@ -2297,14 +2306,15 @@ state_index (struct dfa *d, position_set const *s, int context) constraint. Repeat exhaustively until no funny positions are left. S->elems must be large enough to hold the result. */ static void -epsclosure (struct dfa const *d) +epsclosure (struct dfa const *d, position_set *backword) { position_set tmp; alloc_position_set (&tmp, d->nleaves); for (idx_t i = 0; i < d->tindex; i++) if (d->follows[i].nelem > 0 && d->tokens[i] >= NOTCHAR && d->tokens[i] != BACKREF && d->tokens[i] != ANYCHAR - && d->tokens[i] != MBCSET && d->tokens[i] < CSET) + && d->tokens[i] != MBCSET && d->tokens[i] < CSET + && d->tokens[i] != BEG) { unsigned int constraint; switch (d->tokens[i]) @@ -2327,16 +2337,21 @@ epsclosure (struct dfa const *d) case NOTLIMWORD: constraint = NOTLIMWORD_CONSTRAINT; break; - default: + case EMPTY: constraint = NO_CONSTRAINT; break; + default: + abort (); } delete (i, &d->follows[i]); - for (idx_t j = 0; j < d->tindex; j++) - if (i != j && d->follows[j].nelem > 0) - replace (&d->follows[j], i, &d->follows[i], constraint, &tmp); + for (idx_t j = 0; j < backword[i].nelem; j++) + replace (&d->follows[backword[i].elems[j].index], i, &d->follows[i], + constraint, &tmp); + for (idx_t j = 0; j < d->follows[i].nelem; j++) + replace (&backword[d->follows[i].elems[j].index], i, &backword[i], + NO_CONSTRAINT, &tmp); } free (tmp.elems); } @@ -2643,6 +2658,7 @@ dfaanalyze (struct dfa *d, bool searchflag) /* Firstpos and lastpos elements. */ position *firstpos = posalloc; position *lastpos = firstpos + d->nleaves; + position_set *backword = NULL; position pos; position_set tmp; @@ -2675,6 +2691,9 @@ dfaanalyze (struct dfa *d, bool searchflag) alloc_position_set (&merged, d->nleaves); d->follows = xcalloc (d->tindex, sizeof *d->follows); + if (d->epsilon) + backword = xcalloc (d->tindex, sizeof *backword); + for (idx_t i = 0; i < d->tindex; i++) { switch (d->tokens[i]) @@ -2712,6 +2731,17 @@ dfaanalyze (struct dfa *d, bool searchflag) case CAT: /* Every element in the firstpos of the second argument is in the follow of every element in the lastpos of the first argument. */ + if (d->epsilon) + { + tmp.nelem = stk[-2].nlastpos; + tmp.elems = lastpos - stk[-1].nlastpos - stk[-2].nlastpos; + position *p = firstpos - stk[-1].nfirstpos; + for (idx_t j = 0; j < stk[-1].nfirstpos; j++) + { + merge (&tmp, &backword[p[j].index], &merged); + copy (&merged, &backword[p[j].index]); + } + } { tmp.nelem = stk[-1].nfirstpos; tmp.elems = firstpos - stk[-1].nfirstpos; @@ -2801,9 +2831,15 @@ dfaanalyze (struct dfa *d, bool searchflag) #endif } - /* For each follow set that is the follow set of a real position, replace - it with its epsilon closure. */ - epsclosure (d); + if (d->epsilon) + { + /* For each follow set that is the follow set of a real position, + replace it with its epsilon closure. */ + epsclosure (d, backword); + + for (idx_t i = 0; i < d->tindex; i++) + free (backword[i].elems); + } dfaoptimize (d); @@ -2865,6 +2901,7 @@ dfaanalyze (struct dfa *d, bool searchflag) d->min_trcount++; d->trcount = 0; + free (backword); free (posalloc); free (stkalloc); free (merged.elems); -- 2.17.1 --------------DD7B168845614FF54EE0C9E9 Content-Type: text/x-patch; charset=UTF-8; name="0002-dfa-epsilon-closure-tweaks-Bug-40634.patch" Content-Transfer-Encoding: 7bit Content-Disposition: attachment; filename="0002-dfa-epsilon-closure-tweaks-Bug-40634.patch" >From 4e14bef83b3c8096efebe343069baf13277337bc Mon Sep 17 00:00:00 2001 From: Paul Eggert Date: Sat, 12 Sep 2020 18:51:55 -0700 Subject: [PATCH 2/7] dfa: epsilon-closure tweaks (Bug#40634) Rename BACKWORD to BACKWARD consistently. * lib/dfa.c (struct dfa): Reorder members to reduce fragmentation. (addtok_mb): Redo slightly to make it act more like a state machine. Check depth only when it increases. (epsclosure): Let the switch test the tokens. (dfaanalyze): Cache tindex. Simplify position loops. Prefer xcalloc to xnmalloc + explicit zeroing. Free BACKWARD only if it is not null, since we're testing that anyway. (dfaanalyze, build_state): Use merge2 instead of doing it by hand. --- ChangeLog | 27 ++++++++++++ lib/dfa.c | 124 ++++++++++++++++++++++++++---------------------------- 2 files changed, 87 insertions(+), 64 deletions(-) diff --git a/ChangeLog b/ChangeLog index bf39cc512..66aec61cb 100644 --- a/ChangeLog +++ b/ChangeLog @@ -1,3 +1,30 @@ +2020-09-12 Paul Eggert + + dfa: epsilon-closure tweaks (Bug#40634) + Rename BACKWORD to BACKWARD consistently. + * lib/dfa.c (struct dfa): Reorder members to reduce fragmentation. + (addtok_mb): Redo slightly to make it act more like a state machine. + Check depth only when it increases. + (epsclosure): Let the switch test the tokens. + (dfaanalyze): Cache tindex. Simplify position loops. + Prefer xcalloc to xnmalloc + explicit zeroing. Free BACKWARD + only if it is not null, since we're testing that anyway. + (dfaanalyze, build_state): Use merge2 instead of doing it by hand. + +2020-09-12 Norihiro Tanaka + + dfa: use backward set in removal of epsilon closure + When removing in epsilon closure, the code searched all nodes + sequentially, and this was slow for some cases. Build a backward + set before search, and only check previous position with the set. + Problem reported in . + * lib/dfa.c (struct dfa): New member 'epsilon'. + (addtok_mb): Check whether a pattern has epsilon node or not. + (epsclosure): New arg BACKWORD; caller changed. When removing + epsilon node and reconnecting, check only previous positions. + Treat BEG as if it were character. + (dfaanalyze): Build backward set. + 2020-09-10 Paul Eggert canonicalize: fix pointer indexing bugs diff --git a/lib/dfa.c b/lib/dfa.c index 7c8eb6685..0f0764887 100644 --- a/lib/dfa.c +++ b/lib/dfa.c @@ -482,13 +482,14 @@ struct dfa idx_t depth; /* Depth required of an evaluation stack used for depth-first traversal of the parse tree. */ - idx_t nleaves; /* Number of leaves on the parse tree. */ + idx_t nleaves; /* Number of non-EMPTY leaves + in the parse tree. */ idx_t nregexps; /* Count of parallel regexps being built with dfaparse. */ bool fast; /* The DFA is fast. */ + bool epsilon; /* Does a token match only the empty string? */ token utf8_anychar_classes[9]; /* To lower ANYCHAR in UTF-8 locales. */ mbstate_t mbs; /* Multibyte conversion state. */ - bool epsilon; /* The following are valid only if MB_CUR_MAX > 1. */ @@ -1616,26 +1617,31 @@ addtok_mb (struct dfa *dfa, token t, char mbprop) dfa->parse.depth--; break; + case EMPTY: + dfa->epsilon = true; + goto increment_depth; + + case BACKREF: + dfa->fast = false; + goto increment_nleaves; + case BEGLINE: case ENDLINE: case BEGWORD: case ENDWORD: case LIMWORD: case NOTLIMWORD: - case EMPTY: dfa->epsilon = true; FALLTHROUGH; - default: - if (t == BACKREF) - dfa->fast = false; - if (t != EMPTY) - dfa->nleaves++; + increment_nleaves: + dfa->nleaves++; + increment_depth: dfa->parse.depth++; + if (dfa->depth < dfa->parse.depth) + dfa->depth = dfa->parse.depth; break; } - if (dfa->parse.depth > dfa->depth) - dfa->depth = dfa->parse.depth; } static void addtok_wc (struct dfa *dfa, wint_t wc); @@ -2156,6 +2162,8 @@ merge (position_set const *s1, position_set const *s2, position_set *m) merge_constrained (s1, s2, -1, m); } +/* Merge into DST all the elements of SRC, possibly destroying + the contents of the temporary M. */ static void merge2 (position_set *dst, position_set const *src, position_set *m) { @@ -2300,25 +2308,26 @@ state_index (struct dfa *d, position_set const *s, int context) return i; } -/* Find the epsilon closure of a set of positions. If any position of the set +/* Find the epsilon closure of D's set of positions. If any position of the set contains a symbol that matches the empty string in some context, replace that position with the elements of its follow labeled with an appropriate constraint. Repeat exhaustively until no funny positions are left. - S->elems must be large enough to hold the result. */ + S->elems must be large enough to hold the result. BACKWARD is D's + backward set; use and update it too. */ static void -epsclosure (struct dfa const *d, position_set *backword) +epsclosure (struct dfa const *d, position_set *backward) { position_set tmp; alloc_position_set (&tmp, d->nleaves); for (idx_t i = 0; i < d->tindex; i++) - if (d->follows[i].nelem > 0 && d->tokens[i] >= NOTCHAR - && d->tokens[i] != BACKREF && d->tokens[i] != ANYCHAR - && d->tokens[i] != MBCSET && d->tokens[i] < CSET - && d->tokens[i] != BEG) + if (0 < d->follows[i].nelem) { unsigned int constraint; switch (d->tokens[i]) { + default: + continue; + case BEGLINE: constraint = BEGLINE_CONSTRAINT; break; @@ -2340,17 +2349,15 @@ epsclosure (struct dfa const *d, position_set *backword) case EMPTY: constraint = NO_CONSTRAINT; break; - default: - abort (); } delete (i, &d->follows[i]); - for (idx_t j = 0; j < backword[i].nelem; j++) - replace (&d->follows[backword[i].elems[j].index], i, &d->follows[i], + for (idx_t j = 0; j < backward[i].nelem; j++) + replace (&d->follows[backward[i].elems[j].index], i, &d->follows[i], constraint, &tmp); for (idx_t j = 0; j < d->follows[i].nelem; j++) - replace (&backword[d->follows[i].elems[j].index], i, &backword[i], + replace (&backward[d->follows[i].elems[j].index], i, &backward[i], NO_CONSTRAINT, &tmp); } free (tmp.elems); @@ -2658,7 +2665,6 @@ dfaanalyze (struct dfa *d, bool searchflag) /* Firstpos and lastpos elements. */ position *firstpos = posalloc; position *lastpos = firstpos + d->nleaves; - position_set *backword = NULL; position pos; position_set tmp; @@ -2676,10 +2682,11 @@ dfaanalyze (struct dfa *d, bool searchflag) position_set merged; /* Result of merging sets. */ addtok (d, CAT); + idx_t tindex = d->tindex; #ifdef DEBUG fprintf (stderr, "dfaanalyze:\n"); - for (idx_t i = 0; i < d->tindex; i++) + for (idx_t i = 0; i < tindex; i++) { fprintf (stderr, " %td:", i); prtok (d->tokens[i]); @@ -2689,12 +2696,11 @@ dfaanalyze (struct dfa *d, bool searchflag) d->searchflag = searchflag; alloc_position_set (&merged, d->nleaves); - d->follows = xcalloc (d->tindex, sizeof *d->follows); - - if (d->epsilon) - backword = xcalloc (d->tindex, sizeof *backword); + d->follows = xcalloc (tindex, sizeof *d->follows); + position_set *backward + = d->epsilon ? xcalloc (tindex, sizeof *backward) : NULL; - for (idx_t i = 0; i < d->tindex; i++) + for (idx_t i = 0; i < tindex; i++) { switch (d->tokens[i]) { @@ -2714,12 +2720,8 @@ dfaanalyze (struct dfa *d, bool searchflag) { tmp.elems = firstpos - stk[-1].nfirstpos; tmp.nelem = stk[-1].nfirstpos; - position *p = lastpos - stk[-1].nlastpos; - for (idx_t j = 0; j < stk[-1].nlastpos; j++) - { - merge (&tmp, &d->follows[p[j].index], &merged); - copy (&merged, &d->follows[p[j].index]); - } + for (position *p = lastpos - stk[-1].nlastpos; p < lastpos; p++) + merge2 (&d->follows[p->index], &tmp, &merged); } FALLTHROUGH; case QMARK: @@ -2729,28 +2731,27 @@ dfaanalyze (struct dfa *d, bool searchflag) break; case CAT: - /* Every element in the firstpos of the second argument is in the - follow of every element in the lastpos of the first argument. */ - if (d->epsilon) + /* Every element in the lastpos of the first argument is in + the backward set of every element in the firstpos of the + second argument. */ + if (backward) { tmp.nelem = stk[-2].nlastpos; tmp.elems = lastpos - stk[-1].nlastpos - stk[-2].nlastpos; - position *p = firstpos - stk[-1].nfirstpos; - for (idx_t j = 0; j < stk[-1].nfirstpos; j++) - { - merge (&tmp, &backword[p[j].index], &merged); - copy (&merged, &backword[p[j].index]); - } + for (position *p = firstpos - stk[-1].nfirstpos; + p < firstpos; p++) + merge2 (&backward[p->index], &tmp, &merged); } + + /* Every element in the firstpos of the second argument is in the + follow of every element in the lastpos of the first argument. */ { tmp.nelem = stk[-1].nfirstpos; tmp.elems = firstpos - stk[-1].nfirstpos; - position *p = lastpos - stk[-1].nlastpos - stk[-2].nlastpos; - for (idx_t j = 0; j < stk[-2].nlastpos; j++) - { - merge (&tmp, &d->follows[p[j].index], &merged); - copy (&merged, &d->follows[p[j].index]); - } + for (position *plim = lastpos - stk[-1].nlastpos, + *p = plim - stk[-2].nlastpos; + p < plim; p++) + merge2 (&d->follows[p->index], &tmp, &merged); } /* The firstpos of a CAT node is the firstpos of the first argument, @@ -2831,20 +2832,21 @@ dfaanalyze (struct dfa *d, bool searchflag) #endif } - if (d->epsilon) + if (backward) { /* For each follow set that is the follow set of a real position, replace it with its epsilon closure. */ - epsclosure (d, backword); + epsclosure (d, backward); - for (idx_t i = 0; i < d->tindex; i++) - free (backword[i].elems); + for (idx_t i = 0; i < tindex; i++) + free (backward[i].elems); + free (backward); } dfaoptimize (d); #ifdef DEBUG - for (idx_t i = 0; i < d->tindex; i++) + for (idx_t i = 0; i < tindex; i++) if (d->tokens[i] == BEG || d->tokens[i] < NOTCHAR || d->tokens[i] == BACKREF || d->tokens[i] == ANYCHAR || d->tokens[i] == MBCSET || d->tokens[i] >= CSET) @@ -2868,12 +2870,10 @@ dfaanalyze (struct dfa *d, bool searchflag) append (pos, &tmp); - d->separates = xnmalloc (d->tindex, sizeof *d->separates); + d->separates = xcalloc (tindex, sizeof *d->separates); - for (idx_t i = 0; i < d->tindex; i++) + for (idx_t i = 0; i < tindex; i++) { - d->separates[i] = 0; - if (prev_newline_dependent (d->constraints[i])) d->separates[i] |= CTX_NEWLINE; if (prev_letter_dependent (d->constraints[i])) @@ -2901,7 +2901,6 @@ dfaanalyze (struct dfa *d, bool searchflag) d->min_trcount++; d->trcount = 0; - free (backword); free (posalloc); free (stkalloc); free (merged.elems); @@ -3172,10 +3171,7 @@ build_state (state_num s, struct dfa *d, unsigned char uc) mergeit &= d->multibyte_prop[group.elems[j].index]; } if (mergeit) - { - merge (&d->states[0].elems, &group, &tmp); - copy (&tmp, &group); - } + merge2 (&group, &d->states[0].elems, &tmp); } /* Find out if the new state will want any context information, -- 2.17.1 --------------DD7B168845614FF54EE0C9E9 Content-Type: text/x-patch; charset=UTF-8; name="0005-dfa-fix-dfa-heap-overrun-failure.patch" Content-Transfer-Encoding: 7bit Content-Disposition: attachment; filename="0005-dfa-fix-dfa-heap-overrun-failure.patch" >From cafb61533f5bfb989698e3924f97471498b2422b Mon Sep 17 00:00:00 2001 From: Paul Eggert Date: Sun, 13 Sep 2020 18:20:01 -0700 Subject: [PATCH 5/7] dfa: fix dfa-heap-overrun failure * lib/dfa.c (reorder_tokens): When setting map[d->follows[i].elems[j].index], instead of incorrectly assuming that (i < d->follows[i].elems[j].index), use two loops, one to set the map array and the other to use it. The incorrect assumption caused some elements to be missed, and this in turn caused grep's dfa-heap-overrun test to fail on Solaris 10 sparc when compiled with GCC. I found this bug while investigating https://buildfarm.opencsw.org/buildbot/builders/ggrep-solaris10-sparc/builds/183 and I think the bug also occurs on GNU/Linux but with more-subtle symptoms. The bug predates the recent dfa.c changes; perhaps the recent changes make the bug more likely. --- ChangeLog | 15 +++++++++++++++ lib/dfa.c | 12 ++++++------ 2 files changed, 21 insertions(+), 6 deletions(-) diff --git a/ChangeLog b/ChangeLog index a5c14c45e..1b5720761 100644 --- a/ChangeLog +++ b/ChangeLog @@ -1,3 +1,18 @@ +2020-09-13 Paul Eggert + + dfa: fix dfa-heap-overrun failure + * lib/dfa.c (reorder_tokens): When setting + map[d->follows[i].elems[j].index], instead of incorrectly assuming + that (i < d->follows[i].elems[j].index), use two loops, one to set + the map array and the other to use it. The incorrect assumption + caused some elements to be missed, and this in turn caused grep's + dfa-heap-overrun test to fail on Solaris 10 sparc when compiled + with GCC. I found this bug while investigating + https://buildfarm.opencsw.org/buildbot/builders/ggrep-solaris10-sparc/builds/183 + and I think the bug also occurs on GNU/Linux but with more-subtle + symptoms. The bug predates the recent dfa.c changes; perhaps the + recent changes make the bug more likely. + 2020-09-13 Bruno Haible parse-datetime: Make the build rule work with parallel 'make'. diff --git a/lib/dfa.c b/lib/dfa.c index 0f0764887..4992bcaf2 100644 --- a/lib/dfa.c +++ b/lib/dfa.c @@ -2519,6 +2519,11 @@ reorder_tokens (struct dfa *d) else multibyte_prop = NULL; + for (idx_t i = 0; i < d->tindex; i++) + for (idx_t j = 0; j < d->follows[i].nelem; j++) + if (map[d->follows[i].elems[j].index] == -1) + map[d->follows[i].elems[j].index] = nleaves++; + for (idx_t i = 0; i < d->tindex; i++) { if (map[i] == -1) @@ -2537,12 +2542,7 @@ reorder_tokens (struct dfa *d) multibyte_prop[map[i]] = d->multibyte_prop[i]; for (idx_t j = 0; j < d->follows[i].nelem; j++) - { - if (map[d->follows[i].elems[j].index] == -1) - map[d->follows[i].elems[j].index] = nleaves++; - - d->follows[i].elems[j].index = map[d->follows[i].elems[j].index]; - } + d->follows[i].elems[j].index = map[d->follows[i].elems[j].index]; qsort (d->follows[i].elems, d->follows[i].nelem, sizeof *d->follows[i].elems, compare); -- 2.17.1 --------------DD7B168845614FF54EE0C9E9 Content-Type: text/x-patch; charset=UTF-8; name="0006-dfa-assume-C99-in-reorder_tokens.patch" Content-Transfer-Encoding: 7bit Content-Disposition: attachment; filename="0006-dfa-assume-C99-in-reorder_tokens.patch" >From 5332a21b1188224ecddbbe8b234b618d0b84437a Mon Sep 17 00:00:00 2001 From: Paul Eggert Date: Sun, 13 Sep 2020 18:27:07 -0700 Subject: [PATCH 6/7] dfa: assume C99 in reorder_tokens * lib/dfa.c (reorder_tokens): Assume C99 and simplify. --- ChangeLog | 3 +++ lib/dfa.c | 32 ++++++++++---------------------- 2 files changed, 13 insertions(+), 22 deletions(-) diff --git a/ChangeLog b/ChangeLog index 1b5720761..5f7a148e3 100644 --- a/ChangeLog +++ b/ChangeLog @@ -1,5 +1,8 @@ 2020-09-13 Paul Eggert + dfa: assume C99 in reorder_tokens + * lib/dfa.c (reorder_tokens): Assume C99 and simplify. + dfa: fix dfa-heap-overrun failure * lib/dfa.c (reorder_tokens): When setting map[d->follows[i].elems[j].index], instead of incorrectly assuming diff --git a/lib/dfa.c b/lib/dfa.c index 4992bcaf2..0fa9958fd 100644 --- a/lib/dfa.c +++ b/lib/dfa.c @@ -2494,39 +2494,27 @@ compare (const void *a, const void *b) static void reorder_tokens (struct dfa *d) { - idx_t nleaves; - ptrdiff_t *map; - token *tokens; - position_set *follows; - int *constraints; - char *multibyte_prop; - - nleaves = 0; - - map = xnmalloc (d->tindex, sizeof *map); - + idx_t nleaves = 0; + ptrdiff_t *map = xnmalloc (d->tindex, sizeof *map); map[0] = nleaves++; - for (idx_t i = 1; i < d->tindex; i++) map[i] = -1; - tokens = xnmalloc (d->nleaves, sizeof *tokens); - follows = xnmalloc (d->nleaves, sizeof *follows); - constraints = xnmalloc (d->nleaves, sizeof *constraints); - - if (d->localeinfo.multibyte) - multibyte_prop = xnmalloc (d->nleaves, sizeof *multibyte_prop); - else - multibyte_prop = NULL; + token *tokens = xnmalloc (d->nleaves, sizeof *tokens); + position_set *follows = xnmalloc (d->nleaves, sizeof *follows); + int *constraints = xnmalloc (d->nleaves, sizeof *constraints); + char *multibyte_prop = (d->localeinfo.multibyte + ? xnmalloc (d->nleaves, sizeof *multibyte_prop) + : NULL); for (idx_t i = 0; i < d->tindex; i++) for (idx_t j = 0; j < d->follows[i].nelem; j++) - if (map[d->follows[i].elems[j].index] == -1) + if (map[d->follows[i].elems[j].index] < 0) map[d->follows[i].elems[j].index] = nleaves++; for (idx_t i = 0; i < d->tindex; i++) { - if (map[i] == -1) + if (map[i] < 0) { free (d->follows[i].elems); d->follows[i].elems = NULL; -- 2.17.1 --------------DD7B168845614FF54EE0C9E9 Content-Type: text/x-patch; charset=UTF-8; name="0007-dfa-avoid-use-of-uninitialized-constraint.patch" Content-Transfer-Encoding: 7bit Content-Disposition: attachment; filename="0007-dfa-avoid-use-of-uninitialized-constraint.patch" >From 46bdd627ff522193134d31bdfd3ac4e4fddb5975 Mon Sep 17 00:00:00 2001 From: Paul Eggert Date: Sun, 13 Sep 2020 18:40:08 -0700 Subject: [PATCH 7/7] dfa: avoid use of uninitialized constraint * lib/dfa.c (merge_nfa_state): Do not initialize the constraint to zero here. (dfaoptimize): Do it here instead, via xcalloc. This prevents the use of an uninitialized constraint by later code when ! (flags[i] & OPT_QUEUED) means merge_nfa_state was not called to initialize the constraint. Problem found by running 'valgrind src/grep -E '(^| )*(a|b)*(c|d)*( |$)' < /dev/null' on Ubuntu 18.04.5 x86-64. --- ChangeLog | 9 +++++++++ lib/dfa.c | 4 +--- 2 files changed, 10 insertions(+), 3 deletions(-) diff --git a/ChangeLog b/ChangeLog index 5f7a148e3..395ac6baf 100644 --- a/ChangeLog +++ b/ChangeLog @@ -1,5 +1,14 @@ 2020-09-13 Paul Eggert + dfa: avoid use of uninitialized constraint + * lib/dfa.c (merge_nfa_state): Do not initialize the constraint + to zero here. + (dfaoptimize): Do it here instead, via xcalloc. This prevents the + use of an uninitialized constraint by later code when ! (flags[i] + & OPT_QUEUED) means merge_nfa_state was not called to initialize + the constraint. Problem found by running 'valgrind src/grep -E + '(^| )*(a|b)*(c|d)*( |$)' < /dev/null' on Ubuntu 18.04.5 x86-64. + dfa: assume C99 in reorder_tokens * lib/dfa.c (reorder_tokens): Assume C99 and simplify. diff --git a/lib/dfa.c b/lib/dfa.c index 0fa9958fd..746c7b568 100644 --- a/lib/dfa.c +++ b/lib/dfa.c @@ -2428,8 +2428,6 @@ merge_nfa_state (struct dfa *d, idx_t tindex, char *flags, position_set *follows = d->follows; idx_t nelem = 0; - d->constraints[tindex] = 0; - for (idx_t i = 0; i < follows[tindex].nelem; i++) { idx_t sindex = follows[tindex].elems[i].index; @@ -2581,7 +2579,7 @@ dfaoptimize (struct dfa *d) position_set *merged = &merged0; alloc_position_set (merged, d->nleaves); - d->constraints = xnmalloc (d->tindex, sizeof *d->constraints); + d->constraints = xcalloc (d->tindex, sizeof *d->constraints); for (idx_t i = 0; i < d->tindex; i++) if (flags[i] & OPT_QUEUED) -- 2.17.1 --------------DD7B168845614FF54EE0C9E9-- From debbugs-submit-bounces@debbugs.gnu.org Mon Sep 14 10:14:33 2020 Received: (at 40634) by debbugs.gnu.org; 14 Sep 2020 14:14:33 +0000 Received: from localhost ([127.0.0.1]:55669 helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1kHpFZ-00058Q-Eq for submit@debbugs.gnu.org; Mon, 14 Sep 2020 10:14:33 -0400 Received: from mail-wr1-f67.google.com ([209.85.221.67]:38412) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1kHpFY-00058A-3c for 40634@debbugs.gnu.org; Mon, 14 Sep 2020 10:14:32 -0400 Received: by mail-wr1-f67.google.com with SMTP id g4so18947534wrs.5 for <40634@debbugs.gnu.org>; Mon, 14 Sep 2020 07:14:32 -0700 (PDT) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:mime-version:references:in-reply-to:from:date :message-id:subject:to:cc; bh=HzXbwQ2o9R0ziV7C5XHrRyjHh2jp9oVk6Sqk1xjV+SE=; b=QW/zkxRiuTJspCuWabVWH6k4Fnk1JqLgB6TUPuLFSfvEb5vHuGfKT4kse6JXNvmRHp rryEcIFQLYb/BcvBZtHi2eaqbOF9kgiADNra+xRj9hH92zzXw7qT5HJSO2rRITccgy9h Rgn8E8eIB2JW9mpjNoE4dXLxCC29mb4OzRPRYPFfMsdfFFt6Qa0e/o9QLcp0+Xz6w1sr gkaytSQLOaJYRVTvWXjP7B1QfqS4ua9cxgS5vdXctTdfNAps5owcKziPaVvhImTvnaOH Puia3GLA2CJcDIkomjpS2EgZ3wOExf/nWt9MzOAEccf8lOalvo3Ni/x0MND3al/7CoE3 WRYA== X-Gm-Message-State: AOAM5313xe1E8wEJCVjxuiO46U7VcwbAuXt0DzOHagOYeL2OsmjyvH6K bJu3VWu6EfOKOAfZrKymV/jO0dNFGNWAvaj1lr8= X-Google-Smtp-Source: ABdhPJyPMaEZwEMv74Jo1TRwPof3SI+Uf7cEFSbhKU69rq2l2kPobpyaXLYanF5rFjdO1V5T7GhmiG6mjyhEa3fgCYA= X-Received: by 2002:adf:d4c1:: with SMTP id w1mr16183242wrk.108.1600092866513; Mon, 14 Sep 2020 07:14:26 -0700 (PDT) MIME-Version: 1.0 References: <20200418002153.8771.27F6AC2D@kcn.ne.jp> <20200419074109.431A.27F6AC2D@kcn.ne.jp> <20200419111025.4326.27F6AC2D@kcn.ne.jp> <0417af9a-50a5-3462-6b38-393d80395085@cs.ucla.edu> <78d13c9d-0426-b913-66fc-d7d652a5500c@cs.ucla.edu> In-Reply-To: <78d13c9d-0426-b913-66fc-d7d652a5500c@cs.ucla.edu> From: Jim Meyering Date: Mon, 14 Sep 2020 07:14:14 -0700 Message-ID: Subject: Re: bug#40634: Massive pattern list handling with -E format seems very slow since 2.28. To: Paul Eggert Content-Type: text/plain; charset="UTF-8" X-Spam-Score: 0.5 (/) X-Debbugs-Envelope-To: 40634 Cc: fryasu@yahoo.co.jp, Gnulib bugs , 40634@debbugs.gnu.org, Norihiro Tanaka , GNU grep developers X-BeenThere: debbugs-submit@debbugs.gnu.org X-Mailman-Version: 2.1.18 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: debbugs-submit-bounces@debbugs.gnu.org Sender: "Debbugs-submit" X-Spam-Score: -0.5 (/) On Sun, Sep 13, 2020 at 7:03 PM Paul Eggert wrote: > On 9/11/20 11:41 PM, Jim Meyering wrote: > >> https://bugs.gnu.org/40634#32 > >> > >> I'll try to take a look at the later patch. > > > > Oh! Glad you spotted that. > > I took a look and the basic idea sounds good though I admit I did not check > every detail. While looking into it I found some opportunities for improvements, > plus I found what appear to be some longstanding bugs in the area, one of which > causes a grep test failure on Solaris (and I suspect the bug is also on > GNU/Linux but the grep tests don't catch it). I installed the attached patches > into Gnulib, updated grep to point to the new Gnulib version, and added a note > in grep's NEWS file about this. > > Patch 1 is what Norihiro Tanaka proposed in Bug#40634#32, except I edited the > commit message. Patch 2 consists of minor cleanups and performance tweaks for > Patch 1. (Patches 3 and 4 are omitted as they were installed by others into > Gnulib at about the same time I was installing these.) Patch 5 fixes a > dfa-heap-overrun failure on Solaris that appears to be a longstanding bug > exposed by Patch 1 when running on Solaris. Patch 6 merely cleans up code near > Patch 5. Patch 7 fixes the use of an uninitialized constraint, which I > discovered while debugging Patch 5 under Valgrind; this also appears to be a > longstandiung bug. > > Coming up with test cases for all these bugs would be pretty tricky, unfortunately. Wow! Thank you! From debbugs-submit-bounces@debbugs.gnu.org Mon Sep 21 15:22:58 2020 Received: (at 40634-done) by debbugs.gnu.org; 21 Sep 2020 19:22:58 +0000 Received: from localhost ([127.0.0.1]:56295 helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1kKROs-0001qV-2M for submit@debbugs.gnu.org; Mon, 21 Sep 2020 15:22:58 -0400 Received: from zimbra.cs.ucla.edu ([131.179.128.68]:50856) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1kKROq-0001qJ-LR for 40634-done@debbugs.gnu.org; Mon, 21 Sep 2020 15:22:57 -0400 Received: from localhost (localhost [127.0.0.1]) by zimbra.cs.ucla.edu (Postfix) with ESMTP id 55532160100; Mon, 21 Sep 2020 12:22:51 -0700 (PDT) Received: from zimbra.cs.ucla.edu ([127.0.0.1]) by localhost (zimbra.cs.ucla.edu [127.0.0.1]) (amavisd-new, port 10032) with ESMTP id VZaBpaJES_Ur; Mon, 21 Sep 2020 12:22:50 -0700 (PDT) Received: from localhost (localhost [127.0.0.1]) by zimbra.cs.ucla.edu (Postfix) with ESMTP id A78331600FF; Mon, 21 Sep 2020 12:22:50 -0700 (PDT) X-Virus-Scanned: amavisd-new at zimbra.cs.ucla.edu Received: from zimbra.cs.ucla.edu ([127.0.0.1]) by localhost (zimbra.cs.ucla.edu [127.0.0.1]) (amavisd-new, port 10026) with ESMTP id yjso88tW9ldR; Mon, 21 Sep 2020 12:22:50 -0700 (PDT) Received: from [192.168.1.9] (cpe-23-243-218-95.socal.res.rr.com [23.243.218.95]) by zimbra.cs.ucla.edu (Postfix) with ESMTPSA id 710421600FC; Mon, 21 Sep 2020 12:22:50 -0700 (PDT) Subject: Re: bug#40634: Massive pattern list handling with -E format seems very slow since 2.28. From: Paul Eggert To: Jim Meyering References: <20200418002153.8771.27F6AC2D@kcn.ne.jp> <20200419074109.431A.27F6AC2D@kcn.ne.jp> <20200419111025.4326.27F6AC2D@kcn.ne.jp> <0417af9a-50a5-3462-6b38-393d80395085@cs.ucla.edu> <78d13c9d-0426-b913-66fc-d7d652a5500c@cs.ucla.edu> Autocrypt: addr=eggert@cs.ucla.edu; prefer-encrypt=mutual; keydata= LS0tLS1CRUdJTiBQR1AgUFVCTElDIEtFWSBCTE9DSy0tLS0tCgptUUlOQkV5QWNtUUJFQURB QXlIMnhvVHU3cHBHNUQzYThGTVpFb243NGRDdmM0K3ExWEEySjJ0QnkycHdhVHFmCmhweHhk R0E5Smo1MFVKM1BENGJTVUVnTjh0TFowc2FuNDdsNVhUQUZMaTI0NTZjaVNsNW04c0thSGxH ZHQ5WG0KQUF0bVhxZVpWSVlYL1VGUzk2ZkR6ZjR4aEVtbS95N0xiWUVQUWRVZHh1NDd4QTVL aFRZcDVibHRGM1dZRHoxWQpnZDdneDA3QXV3cDdpdzdlTnZub0RUQWxLQWw4S1lEWnpiRE5D UUdFYnBZM2VmWkl2UGRlSStGV1FONFcra2doCnkrUDZhdTZQcklJaFlyYWV1YTdYRGRiMkxT MWVuM1NzbUUzUWpxZlJxSS9BMnVlOEpNd3N2WGUvV0szOEV6czYKeDc0aVRhcUkzQUZINmls QWhEcXBNbmQvbXNTRVNORnQ3NkRpTzFaS1FNcjlhbVZQa25qZlBtSklTcWRoZ0IxRApsRWR3 MzRzUk9mNlY4bVp3MHhmcVQ2UEtFNDZMY0ZlZnpzMGtiZzRHT1JmOHZqRzJTZjF0azVlVThN Qml5Ti9iClowM2JLTmpOWU1wT0REUVF3dVA4NGtZTGtYMndCeHhNQWhCeHdiRFZadWR6eERa SjFDMlZYdWpDT0pWeHEya2wKakJNOUVUWXVVR3FkNzVBVzJMWHJMdzYrTXVJc0hGQVlBZ1Jy NytLY3dEZ0JBZndoUEJZWDM0blNTaUhsbUxDKwpLYUhMZUNMRjVaSTJ2S20zSEVlQ1R0bE9n N3haRU9OZ3d6TCtmZEtvK0Q2U29DOFJSeEpLczhhM3NWZkk0dDZDCm5yUXp2SmJCbjZneGRn Q3U1aTI5SjFRQ1lyQ1l2cWwyVXlGUEFLK2RvOTkvMWpPWFQ0bTI4MzZqMXdBUkFRQUIKdENC UVlYVnNJRVZuWjJWeWRDQThaV2RuWlhKMFFHTnpMblZqYkdFdVpXUjFQb2tDVlFRVEFRZ0FQ d0liQXdZTApDUWdIQXdJR0ZRZ0NDUW9MQkJZQ0F3RUNIZ0VDRjRBV0lRUitONUtwMkt6MzFq TzhGWWp0bCtrT1lxcCtOQVVDClh5Vzlsd1VKRks0THN3QUtDUkR0bCtrT1lxcCtOS05WRC85 SE1zSTE2MDZuMFV1VFhId0lUc3lPakFJOVNET1QKK0MzRFV2NnFsTTVCSDJuV0FNVGlJaXlB NXVnbHNKdjkzb2kydk50RmYvUS9tLzFjblpXZ25WbkV4a3lMSTRFTgpTZDF1QnZyMC9sQ1Nk UGxQME1nNkdXU3BYTXUreDB2ZFQwQWFaTk9URTBGblB1b2xkYzNYRDc2QzJxZzhzWC9pCmF4 WFRLSHk5UCtCbEFxL0NzNy9weERRMEV6U24wVVNaMkMwbDV2djRQTXBBL3BpY25TNks2MDlK dkRHYU9SbXcKWmVYSVpxUU5aVitaUXMrVVl0Vm9ndURUcWJ5M0lVWTFJOEJsWEhScHRhajlB TW40VW9oL0NxcFFsVm9qb3lXbApIcWFGbm5KQktlRjBodko5U0F5YWx3dXpBakc3dlFXMDdN WW5jYU9GbTB3b2lLYmc1SkxPOEY0U0JUSWt1TzBECkNmOW5MQWF5NlZzQjRyendkRWZSd2pQ TFlBbjdNUjNmdkhDRXpmcmtsZFRyYWlCTzFUMGllREs4MEk3c0xmNnAKTWVDWUkxOXBVbHgw L05STUdDZGRpRklRZGZ0aEtXWEdSUzVMQXM4andCZjhINkc1UFdpblByRUlhb21JUDIxaQp2 dWhRRDA3YllxOUlpSWRlbGpqVWRIY0dJMGkvQjRNNTZaYWE4RmYzOGluaU9sckRZQ21ZV1I0 ZENXWml1UWVaCjNPZ3FlUXM5YTZqVHZnZERHVm1SVnFZK2p6azhQbGFIZmNvazhST2hGY0hL a2NmaHVCaEwyNWhsUklzaFJET0UKc2tYcUt3bnpyYnFnYTNHWFpYZnNYQW9GYnpOaExkTHY5 QStMSkFZU2tYUDYvNXFkVHBFTFZHb3N5SDg4NFZkYgpCcGtHSTA0b1lWcXVsYmtDRFFSTWdI SmtBUkFBcG9YcnZ4UDNESWZqQ05PdFhVL1Bkd01TaEtkWC9SbFNzNVBmCnVuVjF3YktQOGhl clhIcnZRZEZWcUVDYVRTeG1saHpiazhYMFBrWTlnY1ZhVTJPNDlUM3FzT2QxY0hlRjUyWUYK R0V0MExoc0JlTWpnTlg1dVoxVjc2cjhneWVWbEZwV1diMFNJd0pVQkhyRFhleEY2N3VwZVJi MnZkSEJqWUROZQp5U24rMEI3Z0ZFcXZWbVp1K0xhZHVkRHA2a1FMamF0RnZIUUhVU0dOc2hC bmtrY2FUYmlJOVBzdDBHQ2MyYWl6Cm5CaVBQQTJXUXhBUGxQUmgzT0dUc241VEhBRG1ianFZ NkZFTUxhc1ZYOERTQ2JsTXZMd05lTy84U3h6aUJpZGgKcUxwSkNxZFFSV0hrdTVYeGdJa0dl S096NU9MRHZYSFdKeWFmckVZamprUzZBazZCNXo2c3ZLbGlDbFduakhRYwpqbFB6eW9GRmdL VEVmY3FEeENqNFJZMEQwRGd0RkQwTmZ5ZU9pZHJTQi9TelRlMmh3cnlRRTNycFNpcW8rMGNH CmR6aDR5QUhLWUorVXJYWjRwOTNaaGpHZktEMXhsck5ZRGxXeVc5UEdtYnZxRnVEbWlJQVFm OVdEL3d6RWZJQ2MKK0YrdURESSt1WWtSeFVGcDkyeWttZGhERUZnMXlqWXNVOGlHVTY5YUh5 dmhxMzZ6NHpjdHZicWhSTnpPV0IxYgpWSi9kSU1EdnNFeEdjWFFWRElUN3NETlh2MHdFM2pL U0twcDdOREcxb1hVWEwrMitTRjk5S2p5NzUzQWJRU0FtCkg2MTdmeUJOd2hKV3ZRWWcrbVV2 UHBpR090c2VzOUVYVUkzbFM0djBNRWFQRzQzZmxFczFVUisxcnBGUVdWSG8KMXkxT08rc0FF UUVBQVlrQ1BBUVlBUWdBSmdJYkRCWWhCSDQza3FuWXJQZldNN3dWaU8yWDZRNWlxbjQwQlFK ZgpKYjJ6QlFrVXJndlBBQW9KRU8yWDZRNWlxbjQwY25NUC8xN0NnVWtYVDlhSUpyaVBNOHdi Y2VZcmNsNytiZFlFCmY3OVNsd1NiYkhON1I0Q29JSkZPbE45Uy8zNHR5cEdWWXZwZ21DSkRZ RlRCeHlQTzkyaU1YRGdBNCtjV0h6dDUKVDFhWU85aHNLaGg3dkR0Sys2UHJvWkdjKzA4Z1VU WEhoYjk3aE1NUWhrbkpsbmZqcFNFQzllbTkwNkZVK0k5MwpUMWZUR3VwbkJhM2FXY0s4ak0w SmFCR2J5MmhHMVMzb2xhRExTVHRCSU5OQlltdnVXUjlNS09oaHFEcmxrNWN3CkZESkxoNU5y WHRlRVkwOFdBemNMekczcGtyWFBIa0ZlTVF0ZnFrMGpMZEdHdkdDM05DSWtxWXJkTGhpUnZH cHIKdTM4QzI2UkVuNWY0STB2R0UzVmZJWEhlOFRNQ05tUXV0MU50TXVVbXBESXkxYUx4R3p1 cHRVaG5PSk4vL3IrVgpqRFBvaTNMT3lTTllwaHFlL2RNdWJzZlVyNm9oUDQxbUtGODFGdXdJ NGFtcUp0cnFJTDJ5cWF4M2EwcWxmd0N4ClhmdGllcUpjdWVrWCtlQ1BEQ0tyWU1YUjBGWWd3 cEcySVRaVUd0ckVqRVNsRTZEc2N4NzM0SEtkcjVPUklvY0wKVVVLRU9HZWlVNkRHaEdGZGI1 VHd1MFNuK3UxbVVQRE4wTSsrQ2RNdkNsSUU4a2xvNEc5MUVPSW11MVVwYjh4YwpPUFF3eGgx andxU3JVNVF3b05tU1llZ1FTSExwSVV1ckZ6MWlRVWgxdnBQWHpLaW5rV0VxdjRJcUExY2lM K0x5CnlTdUxrcDdNc0pwVlJNYldKQ05XT09TYmFING9EQko1ZEhNR2MzNXg1bW9zQ2s5MFBY a251RkREc1lIZkRvNXMKbWY5bG82WVh4N045Cj0zTGFJCi0tLS0tRU5EIFBHUCBQVUJMSUMg S0VZIEJMT0NLLS0tLS0K Organization: UCLA Computer Science Department Message-ID: Date: Mon, 21 Sep 2020 12:22:50 -0700 User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:68.0) Gecko/20100101 Thunderbird/68.10.0 MIME-Version: 1.0 In-Reply-To: <78d13c9d-0426-b913-66fc-d7d652a5500c@cs.ucla.edu> Content-Type: text/plain; charset=utf-8; format=flowed Content-Language: en-US Content-Transfer-Encoding: 7bit X-Spam-Score: -2.3 (--) X-Debbugs-Envelope-To: 40634-done Cc: fryasu@yahoo.co.jp, Gnulib bugs , 40634-done@debbugs.gnu.org, Norihiro Tanaka , GNU grep developers X-BeenThere: debbugs-submit@debbugs.gnu.org X-Mailman-Version: 2.1.18 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: debbugs-submit-bounces@debbugs.gnu.org Sender: "Debbugs-submit" X-Spam-Score: -3.3 (---) The dust seems to have settled on this, so I'm closing the grep bug report to tidy things up. From unknown Sat Sep 13 02:39:22 2025 Received: (at fakecontrol) by fakecontrolmessage; To: internal_control@debbugs.gnu.org From: Debbugs Internal Request Subject: Internal Control Message-Id: bug archived. Date: Tue, 20 Oct 2020 11:24:08 +0000 User-Agent: Fakemail v42.6.9 # This is a fake control message. # # The action: # bug archived. thanks # This fakemail brought to you by your local debbugs # administrator