From unknown Sun Jun 22 07:42:25 2025
X-Loop: help-debbugs@gnu.org
Subject: bug#17448: [PATCH] grep: retry DFA superset after matched with multiple lines by it
Resent-From: Norihiro Tanaka
Original-Sender: "Debbugs-submit"
Resent-CC: bug-grep@gnu.org
Resent-Date: Fri, 09 May 2014 14:42:01 +0000
Resent-Message-ID:
Resent-Sender: help-debbugs@gnu.org
X-GNU-PR-Message: report 17448
X-GNU-PR-Package: grep
X-GNU-PR-Keywords: patch
To: 17448@debbugs.gnu.org
X-Debbugs-Original-To: bug-grep@gnu.org
Received: via spool by submit@debbugs.gnu.org id=B.13996464875395 (code B ref -1); Fri, 09 May 2014 14:42:01 +0000
Received: (at submit) by debbugs.gnu.org; 9 May 2014 14:41:27 +0000
Received: from localhost ([127.0.0.1]:57093 helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.80) (envelope-from ) id 1WilzL-0001Ow-Ae for submit@debbugs.gnu.org; Fri, 09 May 2014 10:41:27 -0400
Received: from eggs.gnu.org ([208.118.235.92]:52422) by debbugs.gnu.org with esmtp (Exim 4.80) (envelope-from ) id 1WilzH-0001Of-GZ for submit@debbugs.gnu.org; Fri, 09 May 2014 10:41:24 -0400
Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71) (envelope-from ) id 1Wilz3-00047t-LA for submit@debbugs.gnu.org; Fri, 09 May 2014 10:41:18 -0400
X-Spam-Checker-Version: SpamAssassin 3.3.2 (2011-06-06) on eggs.gnu.org
X-Spam-Level:
X-Spam-Status: No, score=0.8 required=5.0 tests=BAYES_50 autolearn=disabled version=3.3.2
Received: from lists.gnu.org ([2001:4830:134:3::11]:58330) by eggs.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1Wilz3-00047p-Hd for submit@debbugs.gnu.org; Fri, 09 May 2014 10:41:09 -0400
Received: from eggs.gnu.org ([2001:4830:134:3::10]:54144) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1Wilyv-0001PY-T6 for bug-grep@gnu.org; Fri, 09 May 2014 10:41:09 -0400
Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71) (envelope-from ) id 1Wilyn-00047B-4b for bug-grep@gnu.org; Fri, 09 May 2014 10:41:01 -0400
Received: from mailgw05.kcn.ne.jp ([61.86.7.212]:55033) by eggs.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1Wilym-00046y-Cm for bug-grep@gnu.org; Fri, 09 May 2014 10:40:52 -0400
Received: from imp03 (mailgw7.kcn.ne.jp [61.86.15.238]) by mailgw05.kcn.ne.jp (Postfix) with ESMTP id 0729167C17 for ; Fri, 9 May 2014 23:40:49 +0900 (JST)
Received: from mail04.kcn.ne.jp ([61.86.6.183]) by imp03 with bizsmtp id zqgp1n0023wvxAM01qgp4o; Fri, 09 May 2014 23:40:49 +0900
X-OrgRCPT: bug-grep@gnu.org
Received: from [10.120.1.47] (i118-21-128-66.s30.a048.ap.plala.or.jp [118.21.128.66]) by mail04.kcn.ne.jp (Postfix) with ESMTPA id AD1051290022 for ; Fri, 9 May 2014 23:40:48 +0900 (JST)
Date: Fri, 09 May 2014 23:40:48 +0900
From: Norihiro Tanaka
Message-Id: <20140509234020.75F0.27F6AC2D@kcn.ne.jp>
MIME-Version: 1.0
Content-Type: multipart/mixed; boundary="------_536CE4C20000000075EA_MULTIPART_MIXED_"
Content-Transfer-Encoding: 7bit
X-Mailer: Becky! ver. 2.65.07 [ja]
X-detected-operating-system: by eggs.gnu.org: Mac OS X 10.x
X-detected-operating-system: by eggs.gnu.org: Error: Malformed IPv6 address (bad octet value).
X-Received-From: 2001:4830:134:3::11
X-Spam-Score: -4.0 (----)
X-BeenThere: debbugs-submit@debbugs.gnu.org
X-Mailman-Version: 2.1.15
Precedence: list
List-Id:
List-Unsubscribe: ,
List-Archive:
List-Post:
List-Help:
List-Subscribe: ,
Errors-To: debbugs-submit-bounces@debbugs.gnu.org
Sender: "Debbugs-submit"
X-Spam-Score: -4.0 (----)

--------_536CE4C20000000075EA_MULTIPART_MIXED_
Content-Type: text/plain; charset="US-ASCII"
Content-Transfer-Encoding: 7bit

Currently, when the DFA superset matches across multiple lines, grep returns to KWset. However, the match is probably not a false one, because if the DFA superset matches across multiple lines, it very likely also matches within a single line there. Furthermore, if grep returns to KWset after the DFA superset has matched across multiple lines, dfafast won't work effectively.
This patch retries the DFA superset immediately after it matches across multiple lines. I confirmed the patch with the following test.

$ yes abcdabc | head -50000000 >k
$ env LC_ALL=C time -p src/grep '\(ab\)cd\1d' k

before: real 3.48 user 3.41 sys 0.06
after:  real 2.14 user 2.07 sys 0.06

Norihiro

--------_536CE4C20000000075EA_MULTIPART_MIXED_
Content-Type: text/plain; charset="US-ASCII"; name="0001-grep-retry-DFA-superset-after-matched-with-multiple-.patch"
Content-Disposition: attachment; filename="0001-grep-retry-DFA-superset-after-matched-with-multiple-.patch"
Content-Transfer-Encoding: base64

RnJvbSBmYWZiOTNkYjZjNjE4ZTY5ZGVkMTUzMTdiZDk1M2E5ODQ2M2QyMDBmIE1vbiBTZXAgMTcg MDA6MDA6MDAgMjAwMQpGcm9tOiBOb3JpaGlybyBUYW5ha2EgPG5vcml0bmtAa2NuLm5lLmpwPgpE YXRlOiBGcmksIDkgTWF5IDIwMTQgMTU6MjY6MzggKzA5MDAKU3ViamVjdDogW1BBVENIXSBncmVw OiByZXRyeSBERkEgc3VwZXJzZXQgYWZ0ZXIgbWF0Y2hlZCB3aXRoIG11bHRpcGxlIGxpbmVzIGJ5 CiBpdAoKKiBzcmMvZGZhc2VhcmNoLmMgKEVHZXhlY3V0ZSk6IERvIGl0LgotLS0KIHNyYy9kZmFz ZWFyY2guYyB8IDMyICsrKysrKysrKysrKysrKysrKystLS0tLS0tLS0tLS0tCiAxIGZpbGUgY2hh bmdlZCwgMTkgaW5zZXJ0aW9ucygrKSwgMTMgZGVsZXRpb25zKC0pCgpkaWZmIC0tZ2l0IGEvc3Jj L2RmYXNlYXJjaC5jIGIvc3JjL2RmYXNlYXJjaC5jCmluZGV4IDQyMDI2NjYuLjlmYjc0NDkgMTAw NjQ0Ci0tLSBhL3NyYy9kZmFzZWFyY2guYworKysgYi9zcmMvZGZhc2VhcmNoLmMKQEAgLTI4NCwy NiArMjg0LDMyIEBAIEVHZXhlY3V0ZSAoY2hhciBjb25zdCAqYnVmLCBzaXplX3Qgc2l6ZSwgc2l6 ZV90ICptYXRjaF9zaXplLAogICAgICAgICAgIC8qIFRyeSBtYXRjaGluZyB3aXRoIHRoZSBzdXBl cnNldCBvZiBERkEsIGlmIGl0J3MgZGVmaW5lZC4gICovCiAgICAgICAgICAgaWYgKHN1cGVyc2V0 ICYmICFleGFjdF9rd3NldF9tYXRjaCkKICAgICAgICAgICAgIHsKLSAgICAgICAgICAgICAgbmV4 dF9iZWcgPSBkZmFleGVjIChzdXBlcnNldCwgZGZhX2JlZywgKGNoYXIgKikgZW5kLCAxLAotICAg ICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICZjb3VudCwgTlVMTCk7Ci0gICAgICAgICAg ICAgIC8qIElmIHRoZXJlJ3Mgbm8gbWF0Y2gsIG9yIGlmIHdlJ3ZlIG1hdGNoZWQgdGhlIHNlbnRp bmVsLAotICAgICAgICAgICAgICAgICB3ZSdyZSBkb25lLiAgKi8KLSAgICAgICAgICAgICAgaWYg KG5leHRfYmVnID09IE5VTEwgfHwgbmV4dF9iZWcgPT0gZW5kKQotICAgICAgICAgICAgICAgIGNv
bnRpbnVlOwotCi0gICAgICAgICAgICAgIC8qIE5hcnJvdyBkb3duIHRvIHRoZSBsaW5lIHdlJ3Zl IGZvdW5kLiAgKi8KLSAgICAgICAgICAgICAgaWYgKGNvdW50ICE9IDApCisgICAgICAgICAgICAg IHdoaWxlICh0cnVlKQogICAgICAgICAgICAgICAgIHsKKyAgICAgICAgICAgICAgICAgIG5leHRf YmVnID0gZGZhZXhlYyAoc3VwZXJzZXQsIGRmYV9iZWcsIChjaGFyICopIGVuZCwgMSwKKyAgICAg ICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgJmNvdW50LCBOVUxMKTsKKyAgICAgICAg ICAgICAgICAgIC8qIElmIHRoZXJlJ3Mgbm8gbWF0Y2gsIG9yIGlmIHdlJ3ZlIG1hdGNoZWQgdGhl IHNlbnRpbmVsLAorICAgICAgICAgICAgICAgICAgICAgd2UncmUgZG9uZS4gICovCisgICAgICAg ICAgICAgICAgICBpZiAobmV4dF9iZWcgPT0gTlVMTCB8fCBuZXh0X2JlZyA9PSBlbmQpCisgICAg ICAgICAgICAgICAgICAgIGJyZWFrOworCisgICAgICAgICAgICAgICAgICBpZiAoY291bnQgPT0g MCkKKyAgICAgICAgICAgICAgICAgICAgYnJlYWs7CisgICAgICAgICAgICAgICAgICBjb3VudCA9 IDA7CisKICAgICAgICAgICAgICAgICAgIC8qIElmIGRmYWV4ZWMgbWF5IG1hdGNoIGluIG11bHRp cGxlIGxpbmVzLCB0cnkgdG8KICAgICAgICAgICAgICAgICAgICAgIG1hdGNoIGluIG9uZSBsaW5l LiAgKi8KLSAgICAgICAgICAgICAgICAgIGVuZCA9IG1lbXJjaHIgKGJ1ZiwgZW9sLCBuZXh0X2Jl ZyAtIGJ1Zik7Ci0gICAgICAgICAgICAgICAgICBlbmQrKzsKLSAgICAgICAgICAgICAgICAgIGNv bnRpbnVlOworICAgICAgICAgICAgICAgICAgYmVnID0gbWVtcmNociAoYnVmLCBlb2wsIG5leHRf YmVnIC0gYnVmKTsKKyAgICAgICAgICAgICAgICAgIGJlZyA9IGJlZyA/IGJlZyArIDEgOiBidWY7 CisgICAgICAgICAgICAgICAgICBkZmFfYmVnID0gYmVnOwogICAgICAgICAgICAgICAgIH0KKyAg ICAgICAgICAgICAgaWYgKG5leHRfYmVnID09IE5VTEwgfHwgbmV4dF9iZWcgPT0gZW5kKQorICAg ICAgICAgICAgICAgIGNvbnRpbnVlOworCisgICAgICAgICAgICAgIC8qIE5hcnJvdyBkb3duIHRv IHRoZSBsaW5lIHdlJ3ZlIGZvdW5kLiAgKi8KICAgICAgICAgICAgICAgZW5kID0gbWVtY2hyIChu ZXh0X2JlZywgZW9sLCBidWZsaW0gLSBuZXh0X2JlZyk7CiAgICAgICAgICAgICAgIGVuZCA9IGVu ZCA/IGVuZCArIDEgOiBidWZsaW07CiAgICAgICAgICAgICB9Ci0KICAgICAgICAgICAvKiBUcnkg bWF0Y2hpbmcgd2l0aCBERkEuICAqLwogICAgICAgICAgIG5leHRfYmVnID0gZGZhZXhlYyAoZGZh LCBkZmFfYmVnLCAoY2hhciAqKSBlbmQsIDAsICZjb3VudCwgJmJhY2tyZWYpOwogCi0tIAoxLjku MgoK --------_536CE4C20000000075EA_MULTIPART_MIXED_-- From unknown Sun Jun 22 07:42:25 2025 MIME-Version: 1.0 X-Mailer: MIME-tools 5.503 (Entity 5.503) 
X-Loop: help-debbugs@gnu.org
From: help-debbugs@gnu.org (GNU bug Tracking System)
To: Norihiro Tanaka
Subject: bug#17448: closed (Re: bug#17448: [PATCH] grep: retry DFA superset after matched with multiple lines by it)
Message-ID:
References: <536D4E3A.1010507@cs.ucla.edu> <20140509234020.75F0.27F6AC2D@kcn.ne.jp>
X-Gnu-PR-Message: they-closed 17448
X-Gnu-PR-Package: grep
X-Gnu-PR-Keywords: patch
Reply-To: 17448@debbugs.gnu.org
Date: Fri, 09 May 2014 21:54:02 +0000
Content-Type: multipart/mixed; boundary="----------=_1399672442-26225-1"

This is a multi-part message in MIME format...

------------=_1399672442-26225-1
Content-Disposition: inline
Content-Transfer-Encoding: 7bit
Content-Type: text/plain; charset="utf-8"

Your bug report

#17448: [PATCH] grep: retry DFA superset after matched with multiple lines by it

which was filed against the grep package, has been closed.

The explanation is attached below, along with your original report.
If you require more details, please reply to 17448@debbugs.gnu.org.

-- 
17448: http://debbugs.gnu.org/cgi/bugreport.cgi?bug=17448
GNU Bug Tracking System
Contact help-debbugs@gnu.org with problems

------------=_1399672442-26225-1
Content-Type: message/rfc822
Content-Disposition: inline
Content-Transfer-Encoding: 7bit

Received: (at 17448-done) by debbugs.gnu.org; 9 May 2014 21:53:13 +0000
Received: from localhost ([127.0.0.1]:57379 helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.80) (envelope-from ) id 1WisjB-0006nn-07 for submit@debbugs.gnu.org; Fri, 09 May 2014 17:53:13 -0400
Received: from smtp.cs.ucla.edu ([131.179.128.62]:48534) by debbugs.gnu.org with esmtp (Exim 4.80) (envelope-from ) id 1Wisj8-0006nY-KQ for 17448-done@debbugs.gnu.org; Fri, 09 May 2014 17:53:11 -0400
Received: from localhost (localhost.localdomain [127.0.0.1]) by smtp.cs.ucla.edu (Postfix) with ESMTP id 4DF71A60019; Fri, 9 May 2014 14:53:04 -0700 (PDT)
X-Virus-Scanned: amavisd-new at smtp.cs.ucla.edu
Received: from smtp.cs.ucla.edu ([127.0.0.1]) by localhost (smtp.cs.ucla.edu [127.0.0.1]) (amavisd-new, port 10024) with ESMTP id odnuHoaoQqip; Fri, 9 May 2014 14:52:59 -0700 (PDT)
Received: from [192.168.1.9] (pool-108-0-233-62.lsanca.fios.verizon.net [108.0.233.62]) by smtp.cs.ucla.edu (Postfix) with ESMTPSA id 8191BA60007; Fri, 9 May 2014 14:52:59 -0700 (PDT)
Message-ID: <536D4E3A.1010507@cs.ucla.edu>
Date: Fri, 09 May 2014 14:52:58 -0700
From: Paul Eggert
Organization: UCLA Computer Science Department
User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:24.0) Gecko/20100101 Thunderbird/24.5.0
MIME-Version: 1.0
To: Norihiro Tanaka , 17448-done@debbugs.gnu.org
Subject: Re: bug#17448: [PATCH] grep: retry DFA superset after matched with multiple lines by it
References: <20140509234020.75F0.27F6AC2D@kcn.ne.jp>
In-Reply-To: <20140509234020.75F0.27F6AC2D@kcn.ne.jp>
Content-Type: multipart/mixed; boundary="------------060000000304060801050602"
X-Spam-Score: -3.0 (---)
X-Debbugs-Envelope-To: 17448-done
X-BeenThere: debbugs-submit@debbugs.gnu.org
X-Mailman-Version: 2.1.15 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: debbugs-submit-bounces@debbugs.gnu.org Sender: "Debbugs-submit" X-Spam-Score: -3.0 (---) This is a multi-part message in MIME format. --------------060000000304060801050602 Content-Type: text/plain; charset=UTF-8; format=flowed Content-Transfer-Encoding: 7bit Thanks, I installed that, along with the attached patch which improves it slightly by avoiding a test of the returned value of memrchr in a context where memrchr cannot return NULL, plus I redid the while-control to make it a bit clearer when the loop terminates. --------------060000000304060801050602 Content-Type: text/plain; charset=UTF-8; name="0001-grep-minor-improvements-to-retry-DFA-superset-patch.patch" Content-Transfer-Encoding: base64 Content-Disposition: attachment; filename*0="0001-grep-minor-improvements-to-retry-DFA-superset-patch.pat"; filename*1="ch" RnJvbSBhZDZjYTliYjU0YWUyYTk4MTFjODg5Yzk5MGVhNjdkNjc5MTlhNGY0IE1vbiBTZXAg MTcgMDA6MDA6MDAgMjAwMQpGcm9tOiBQYXVsIEVnZ2VydCA8ZWdnZXJ0QGNzLnVjbGEuZWR1 PgpEYXRlOiBGcmksIDkgTWF5IDIwMTQgMTQ6NDg6NDcgLTA3MDAKU3ViamVjdDogW1BBVENI XSBncmVwOiBtaW5vciBpbXByb3ZlbWVudHMgdG8gcmV0cnktREZBLXN1cGVyc2V0IHBhdGNo CgoqIHNyYy9kZmFzZWFyY2guYyAoRUdleGVjdXRlKTogQXZvaWQgdW5uZWNlc3NhcnkgdGVz dCBpbiBhIGNvbnRleHQKd2hlcmUgbWVtcmNociBjYW5ub3QgcmV0dXJuIGEgbnVsbCBwb2lu dGVyLgotLS0KIHNyYy9kZmFzZWFyY2guYyB8IDI0ICsrKysrKysrKystLS0tLS0tLS0tLS0t LQogMSBmaWxlIGNoYW5nZWQsIDEwIGluc2VydGlvbnMoKyksIDE0IGRlbGV0aW9ucygtKQoK ZGlmZiAtLWdpdCBhL3NyYy9kZmFzZWFyY2guYyBiL3NyYy9kZmFzZWFyY2guYwppbmRleCA5 ZmI3NDQ5Li43N2I0ZTNlIDEwMDY0NAotLS0gYS9zcmMvZGZhc2VhcmNoLmMKKysrIGIvc3Jj L2RmYXNlYXJjaC5jCkBAIC0yODQsMjMgKzI4NCwxOCBAQCBFR2V4ZWN1dGUgKGNoYXIgY29u c3QgKmJ1Ziwgc2l6ZV90IHNpemUsIHNpemVfdCAqbWF0Y2hfc2l6ZSwKICAgICAgICAgICAv KiBUcnkgbWF0Y2hpbmcgd2l0aCB0aGUgc3VwZXJzZXQgb2YgREZBLCBpZiBpdCdzIGRlZmlu ZWQuICAqLwogICAgICAgICAgIGlmIChzdXBlcnNldCAmJiAhZXhhY3Rfa3dzZXRfbWF0Y2gp 
CiAgICAgICAgICAgICB7Ci0gICAgICAgICAgICAgIHdoaWxlICh0cnVlKQorICAgICAgICAg ICAgICAvKiBLZWVwIHVzaW5nIHRoZSBzdXBlcnNldCB3aGlsZSBpdCByZXBvcnRzIG11bHRp bGluZQorICAgICAgICAgICAgICAgICBwb3RlbnRpYWwgbWF0Y2hlczsgdGhpcyBpcyBtb3Jl IGxpa2VseSB0byBiZSBmYXN0CisgICAgICAgICAgICAgICAgIHRoYW4gZmFsbGluZyBiYWNr IHRvIEtXc2V0IHdvdWxkIGJlLiAgKi8KKyAgICAgICAgICAgICAgd2hpbGUgKChuZXh0X2Jl ZyA9IGRmYWV4ZWMgKHN1cGVyc2V0LCBkZmFfYmVnLCAoY2hhciAqKSBlbmQsIDEsCisgICAg ICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAmY291bnQsIE5VTEwpKQor ICAgICAgICAgICAgICAgICAgICAgJiYgbmV4dF9iZWcgIT0gZW5kCisgICAgICAgICAgICAg ICAgICAgICAmJiBjb3VudCAhPSAwKQogICAgICAgICAgICAgICAgIHsKLSAgICAgICAgICAg ICAgICAgIG5leHRfYmVnID0gZGZhZXhlYyAoc3VwZXJzZXQsIGRmYV9iZWcsIChjaGFyICop IGVuZCwgMSwKLSAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgJmNvdW50 LCBOVUxMKTsKLSAgICAgICAgICAgICAgICAgIC8qIElmIHRoZXJlJ3Mgbm8gbWF0Y2gsIG9y IGlmIHdlJ3ZlIG1hdGNoZWQgdGhlIHNlbnRpbmVsLAotICAgICAgICAgICAgICAgICAgICAg d2UncmUgZG9uZS4gICovCi0gICAgICAgICAgICAgICAgICBpZiAobmV4dF9iZWcgPT0gTlVM TCB8fCBuZXh0X2JlZyA9PSBlbmQpCi0gICAgICAgICAgICAgICAgICAgIGJyZWFrOwotCi0g ICAgICAgICAgICAgICAgICBpZiAoY291bnQgPT0gMCkKLSAgICAgICAgICAgICAgICAgICAg YnJlYWs7CisgICAgICAgICAgICAgICAgICAvKiBUcnkgdG8gbWF0Y2ggaW4ganVzdCBvbmUg bGluZS4gICovCiAgICAgICAgICAgICAgICAgICBjb3VudCA9IDA7Ci0KLSAgICAgICAgICAg ICAgICAgIC8qIElmIGRmYWV4ZWMgbWF5IG1hdGNoIGluIG11bHRpcGxlIGxpbmVzLCB0cnkg dG8KLSAgICAgICAgICAgICAgICAgICAgIG1hdGNoIGluIG9uZSBsaW5lLiAgKi8KICAgICAg ICAgICAgICAgICAgIGJlZyA9IG1lbXJjaHIgKGJ1ZiwgZW9sLCBuZXh0X2JlZyAtIGJ1Zik7 Ci0gICAgICAgICAgICAgICAgICBiZWcgPSBiZWcgPyBiZWcgKyAxIDogYnVmOworICAgICAg ICAgICAgICAgICAgYmVnKys7CiAgICAgICAgICAgICAgICAgICBkZmFfYmVnID0gYmVnOwog ICAgICAgICAgICAgICAgIH0KICAgICAgICAgICAgICAgaWYgKG5leHRfYmVnID09IE5VTEwg fHwgbmV4dF9iZWcgPT0gZW5kKQpAQCAtMzEwLDYgKzMwNSw3IEBAIEVHZXhlY3V0ZSAoY2hh ciBjb25zdCAqYnVmLCBzaXplX3Qgc2l6ZSwgc2l6ZV90ICptYXRjaF9zaXplLAogICAgICAg ICAgICAgICBlbmQgPSBtZW1jaHIgKG5leHRfYmVnLCBlb2wsIGJ1ZmxpbSAtIG5leHRfYmVn 
KTsKICAgICAgICAgICAgICAgZW5kID0gZW5kID8gZW5kICsgMSA6IGJ1ZmxpbTsKICAgICAg ICAgICAgIH0KKwogICAgICAgICAgIC8qIFRyeSBtYXRjaGluZyB3aXRoIERGQS4gICovCiAg ICAgICAgICAgbmV4dF9iZWcgPSBkZmFleGVjIChkZmEsIGRmYV9iZWcsIChjaGFyICopIGVu ZCwgMCwgJmNvdW50LCAmYmFja3JlZik7CiAKLS0gCjEuOS4wCgo=

--------------060000000304060801050602--
------------=_1399672442-26225-1--