From unknown Tue Jun 24 20:52:07 2025 X-Loop: help-debbugs@gnu.org Subject: bug#32750: [PATCH 2/2] dfa: optmization of alternation in NFA Resent-From: Norihiro Tanaka Original-Sender: "Debbugs-submit" Resent-CC: bug-grep@gnu.org Resent-Date: Mon, 17 Sep 2018 14:16:02 +0000 Resent-Message-ID: Resent-Sender: help-debbugs@gnu.org X-GNU-PR-Message: report 32750 X-GNU-PR-Package: grep X-GNU-PR-Keywords: patch To: 32750@debbugs.gnu.org X-Debbugs-Original-To: Received: via spool by submit@debbugs.gnu.org id=B.153719372020606 (code B ref -1); Mon, 17 Sep 2018 14:16:02 +0000 Received: (at submit) by debbugs.gnu.org; 17 Sep 2018 14:15:20 +0000 Received: from localhost ([127.0.0.1]:43225 helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1g1uJ3-0005MF-WB for submit@debbugs.gnu.org; Mon, 17 Sep 2018 10:15:20 -0400 Received: from eggs.gnu.org ([208.118.235.92]:42383) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1g1uJ2-0005M3-Hk for submit@debbugs.gnu.org; Mon, 17 Sep 2018 10:15:17 -0400 Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71) (envelope-from ) id 1g1uIs-00036x-PT for submit@debbugs.gnu.org; Mon, 17 Sep 2018 10:15:11 -0400 X-Spam-Checker-Version: SpamAssassin 3.3.2 (2011-06-06) on eggs.gnu.org X-Spam-Level: X-Spam-Status: No, score=0.8 required=5.0 tests=BAYES_50 autolearn=disabled version=3.3.2 Received: from lists.gnu.org ([2001:4830:134:3::11]:57846) by eggs.gnu.org with esmtps (TLS1.0:RSA_AES_256_CBC_SHA1:32) (Exim 4.71) (envelope-from ) id 1g1uIs-00036c-6s for submit@debbugs.gnu.org; Mon, 17 Sep 2018 10:15:06 -0400 Received: from eggs.gnu.org ([2001:4830:134:3::10]:35088) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1g1uIq-0001P6-MD for bug-grep@gnu.org; Mon, 17 Sep 2018 10:15:06 -0400 Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71) (envelope-from ) id 1g1uIn-00033d-D0 for bug-grep@gnu.org; Mon, 17 Sep 2018 10:15:04 -0400 Received: from mailgw05.kcn.ne.jp ([61.86.7.212]:50509) by eggs.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1g1uIm-0002yV-Jd for bug-grep@gnu.org; Mon, 17 Sep 2018 10:15:01 -0400 Received: from mxs02-s (mailgw2.kcn.ne.jp [61.86.15.234]) by mailgw05.kcn.ne.jp (Postfix) with ESMTP id 1B7E1880B7E for ; Mon, 17 Sep 2018 23:14:55 +0900 (JST) X-matriXscan-loop-detect: 9212771b8289570d209176f874d7fae50b1744e5 Received: from mail07.kcn.ne.jp ([61.86.6.186]) by mxs02-s with ESMTP; Mon, 17 Sep 2018 23:14:53 +0900 (JST) Received: from [10.120.1.63] (i118-21-128-66.s30.a048.ap.plala.or.jp [118.21.128.66]) by mail07.kcn.ne.jp (Postfix) with ESMTPA id 97263D50099 for ; Mon, 17 Sep 2018 23:14:53 +0900 (JST) Date: Mon, 17 Sep 2018 23:14:52 +0900 From: Norihiro Tanaka Message-Id: <20180917231311.1BE7.27F6AC2D@kcn.ne.jp> MIME-Version: 1.0 Content-Type: multipart/mixed; boundary="------_58832584000000008786_MULTIPART_MIXED_" Content-Transfer-Encoding: 7bit X-Mailer: Becky! ver. 2.73 [ja] X-matriXscan-Sophos-AV: Clean X-matriXscan-Action: Approve X-matriXscan: Uncategorized X-detected-operating-system: by eggs.gnu.org: GNU/Linux 3.x [fuzzy] X-detected-operating-system: by eggs.gnu.org: GNU/Linux 2.6.x X-Received-From: 2001:4830:134:3::11 X-Spam-Score: -4.0 (----) X-BeenThere: debbugs-submit@debbugs.gnu.org X-Mailman-Version: 2.1.18 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: debbugs-submit-bounces@debbugs.gnu.org Sender: "Debbugs-submit" X-Spam-Score: -5.0 (-----) --------_58832584000000008786_MULTIPART_MIXED_ Content-Type: text/plain; charset="US-ASCII" Content-Transfer-Encoding: 7bit Hi, Even when similer states exists in alternation, DFA treats them as separate items. It may complicate the transition in NFA and cause slowdown. This change assembles the states into one. For example, ab|ac is changed into a(b|c). This change speeds-up matching for many branched pattern. For example, grep speeds-up more than 30x in following case. seq 10000 | sed 's/$/ abcdefghijklmnopqrstuvwxyz/; s/$/./' >in time -p env LC_ALL=C grep -vf in in (before) real 63.43 user 61.67 system 1.65 (after) real 1.64 user 1.61 system 0.03 # If we do not add '.' at last in pattern, not dfa but kwset is used. grep also speeds-up about 3x in following case. time -p env LC_ALL=C grep -vf /usr/share/dict/linux.words /usr/share/dict/linux.words (before) real 2.48 user 2.09 system 0.38 (after) real 7.69 user 6.32 system 1.29 Thanks, Norihiro --------_58832584000000008786_MULTIPART_MIXED_ Content-Type: text/plain; charset="US-ASCII"; name="0001-dfa-simplify-initial-state.patch" Content-Disposition: attachment; filename="0001-dfa-simplify-initial-state.patch" Content-Transfer-Encoding: base64 RnJvbSAzMTkzMTkxNzMwZDZlY2IzYTBjNGUzOGI0NjE0ODRkZWFmODE5Zjg3IE1vbiBTZXAgMTcg MDA6MDA6MDAgMjAwMQpGcm9tOiBOb3JpaGlybyBUYW5ha2EgPG5vcml0bmtAa2NuLm5lLmpwPgpE YXRlOiBNb24sIDE3IFNlcCAyMDE4IDIyOjIwOjM3ICswOTAwClN1YmplY3Q6IFtQQVRDSCAxLzJd IGRmYTogc2ltcGxpZnkgaW5pdGlhbCBzdGF0ZQoKVG8gc2ltcGxpZnkgaW5pdGlhbCBzdGF0ZSBl bmFibGVzIHRvIGJlIGVhc3kgdG8gb3B0aW1pemF0aW9uIG9mIE5GQS4KCmRmYS5jIChlbnVtIHRv a2VuKTogQWRkIG5ldyBlbGVtZW50IEJFRy4KKHBydG9rKTogQWRqdXN0IGR1ZSB0byBhZGRpbmcg ZWxlbWVudCBCRUcuCihkZmFwYXJzZSk6IFB1dCBCRUcgYXQgYSBoZWFkIG9mIHRva2Vucy4KKHN0 YXRlX2luZGV4KTogQWRqdXN0IGR1ZSB0byBhZGRpbmcgZWxlbWVudCBCRUcuCihkZmFhbmFseXpl KTogQ29uY2F0ZW5hdGUgQkVHIHRvIG90aGVyIHRva2VucywgYW5kIHNpbXBsaWZ5IHRvIGJ1aWxk IGluaXRpYWwKc3RhdGUuCihkZmFtdXN0KTogQWRqdXN0IGR1ZSB0byBhZGRpbmcgZWxlbWVudCBC RUcuICBERkFNVVNUIGlnbm9yZXMgaXQuCi0tLQogbGliL2RmYS5jIHwgICAzNyArKysrKysrKysr KysrKysrKysrKystLS0tLS0tLS0tLS0tLS0tCiAxIGZpbGVzIGNoYW5nZWQsIDIxIGluc2VydGlv bnMoKyksIDE2IGRlbGV0aW9ucygtKQoKZGlmZiAtLWdpdCBhL2xpYi9kZmEuYyBiL2xpYi9kZmEu YwppbmRleCBlMzMwODRkLi5jMGIyNGZjIDEwMDY0NAotLS0gYS9saWIvZGZhLmMKKysrIGIvbGli L2RmYS5jCkBAIC0yMjMsMTIgKzIyMywxNSBAQCB0eXBlZGVmIHB0cmRpZmZfdCBzdGF0ZV9udW07 CiAvKiBQcmVkZWZpbmVkIHRva2VuIHZhbHVlcy4gICovCiBlbnVtCiB7Ci0gIEVORCA9IC0xLCAg ICAgICAgICAgICAgICAgICAgIC8qIEVORCBpcyBhIHRlcm1pbmFsIHN5bWJvbCB0aGF0IG1hdGNo ZXMgdGhlCisgIEVORCA9IC0yLCAgICAgICAgICAgICAgICAgICAgIC8qIEVORCBpcyBhIHRlcm1p bmFsIHN5bWJvbCB0aGF0IG1hdGNoZXMgdGhlCiAgICAgICAgICAgICAgICAgICAgICAgICAgICAg ICAgICAgIGVuZCBvZiBpbnB1dDsgYW55IHZhbHVlIG9mIEVORCBvciBsZXNzIGluCiAgICAgICAg ICAgICAgICAgICAgICAgICAgICAgICAgICAgIHRoZSBwYXJzZSB0cmVlIGlzIHN1Y2ggYSBzeW1i b2wuICBBY2NlcHRpbmcKICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgc3RhdGVz IG9mIHRoZSBERkEgYXJlIHRob3NlIHRoYXQgd291bGQgaGF2ZQogICAgICAgICAgICAgICAgICAg ICAgICAgICAgICAgICAgICBhIHRyYW5zaXRpb24gb24gRU5ELiAgKi8KIAorICBCRUcgPSAtMSwg ICAgICAgICAgICAgICAgICAgICAvKiBCRUcgaXMgYSBiZWdpbm5pbmcgc3ltYm9sIHRoYXQgbWF0 Y2hlcyB0aGUKKyAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgYmlnaW5uaW5nIG9m IGlucHV0LiAgKi8KKwogICAvKiBPcmRpbmFyeSBjaGFyYWN0ZXIgdmFsdWVzIGFyZSB0ZXJtaW5h bCBzeW1ib2xzIHRoYXQgbWF0Y2ggdGhlbXNlbHZlcy4gICovCiAKICAgRU1QVFkgPSBOT1RDSEFS LCAgICAgICAgICAgICAgLyogRU1QVFkgaXMgYSB0ZXJtaW5hbCBzeW1ib2wgdGhhdCBtYXRjaGVz CkBAIC02MzAsOSArNjMzLDkgQEAgbWJzX3RvX3djaGFyICh3aW50X3QgKnB3YywgY2hhciBjb25z dCAqcywgc2l6ZV90IG4sIHN0cnVjdCBkZmEgKmQpCiBzdGF0aWMgdm9pZAogcHJ0b2sgKHRva2Vu IHQpCiB7Ci0gIGlmICh0IDwgMCkKKyAgaWYgKHQgPD0gRU5EKQogICAgIGZwcmludGYgKHN0ZGVy ciwgIkVORCIpOwotICBlbHNlIGlmICh0IDwgTk9UQ0hBUikKKyAgZWxzZSBpZiAoMCA8PSB0ICYm IHQgPCBOT1RDSEFSKQogICAgIHsKICAgICAgIHVuc2lnbmVkIGludCBjaCA9IHQ7CiAgICAgICBm cHJpbnRmIChzdGRlcnIsICIweCUwMngiLCBjaCk7CkBAIC02NDIsNiArNjQ1LDkgQEAgcHJ0b2sg KHRva2VuIHQpCiAgICAgICBjaGFyIGNvbnN0ICpzOwogICAgICAgc3dpdGNoICh0KQogICAgICAg ICB7CisgICAgICAgIGNhc2UgQkVHOgorICAgICAgICAgIHMgPSAiQkVHIjsKKyAgICAgICAgICBi cmVhazsKICAgICAgICAgY2FzZSBFTVBUWToKICAgICAgICAgICBzID0gIkVNUFRZIjsKICAgICAg ICAgICBicmVhazsKQEAgLTE5NjcsNiArMTk3Myw5IEBAIGRmYXBhcnNlIChjaGFyIGNvbnN0ICpz LCBzaXplX3QgbGVuLCBzdHJ1Y3QgZGZhICpkKQogICBpZiAoIWQtPnN5bnRheC5zeW50YXhfYml0 c19zZXQpCiAgICAgZGZhZXJyb3IgKF8oIm5vIHN5bnRheCBzcGVjaWZpZWQiKSk7CiAKKyAgaWYg KCFkLT5ucmVnZXhwcykKKyAgICBhZGR0b2sgKGQsIEJFRyk7CisKICAgZC0+cGFyc2UudG9rID0g bGV4IChkKTsKICAgZC0+cGFyc2UuZGVwdGggPSBkLT5kZXB0aDsKIApAQCAtMjE4MCw3ICsyMTg5 LDcgQEAgc3RhdGVfaW5kZXggKHN0cnVjdCBkZmEgKmQsIHBvc2l0aW9uX3NldCBjb25zdCAqcywg aW50IGNvbnRleHQpCiAgIGZvciAoc3RhdGVfbnVtIGogPSAwOyBqIDwgcy0+bmVsZW07IGorKykK ICAgICB7CiAgICAgICBpbnQgYyA9IHMtPmVsZW1zW2pdLmNvbnN0cmFpbnQ7Ci0gICAgICBpZiAo ZC0+dG9rZW5zW3MtPmVsZW1zW2pdLmluZGV4XSA8IDApCisgICAgICBpZiAoZC0+dG9rZW5zW3Mt PmVsZW1zW2pdLmluZGV4XSA8PSBFTkQpCiAgICAgICAgIHsKICAgICAgICAgICBpZiAoc3VjY2Vl ZHNfaW5fY29udGV4dCAoYywgY29udGV4dCwgQ1RYX0FOWSkpCiAgICAgICAgICAgICBjb25zdHJh aW50IHw9IGM7CkBAIC0yMzgwLDYgKzIzODksOCBAQCBkZmFhbmFseXplIChzdHJ1Y3QgZGZhICpk LCBib29sIHNlYXJjaGZsYWcpCiAKICAgcG9zaXRpb25fc2V0IG1lcmdlZDsgICAgICAgICAgLyog UmVzdWx0IG9mIG1lcmdpbmcgc2V0cy4gICovCiAKKyAgYWRkdG9rIChkLCBDQVQpOworCiAjaWZk ZWYgREVCVUcKICAgZnByaW50ZiAoc3RkZXJyLCAiZGZhYW5hbHl6ZTpcbiIpOwogICBmb3IgKHNp emVfdCBpID0gMDsgaSA8IGQtPnRpbmRleDsgKytpKQpAQCAtMjU0MCwyNiArMjU1MSwyMCBAQCBk ZmFhbmFseXplIChzdHJ1Y3QgZGZhICpkLCBib29sIHNlYXJjaGZsYWcpCiAgICAgICB9CiAjZW5k aWYKIAotICAvKiBHZXQgdGhlIGVwc2lsb24gY2xvc3VyZSBvZiB0aGUgZmlyc3Rwb3Mgb2YgdGhl IHJlZ2V4cC4gIFRoZSByZXN1bHQgd2lsbAotICAgICBiZSB0aGUgc2V0IG9mIHBvc2l0aW9ucyBv ZiBzdGF0ZSAwLiAgKi8KLSAgbWVyZ2VkLm5lbGVtID0gMDsKLSAgZm9yIChzaXplX3QgaSA9IDA7 IGkgPCBzdGtbLTFdLm5maXJzdHBvczsgKytpKQotICAgIGluc2VydCAoZmlyc3Rwb3NbaV0sICZt ZXJnZWQpOwotCiAgIC8qIEZvciBlYWNoIGZvbGxvdyBzZXQgdGhhdCBpcyB0aGUgZm9sbG93IHNl dCBvZiBhIHJlYWwgcG9zaXRpb24sIHJlcGxhY2UKICAgICAgaXQgd2l0aCBpdHMgZXBzaWxvbiBj bG9zdXJlLiAgKi8KLSAgZXBzY2xvc3VyZSAoJm1lcmdlZCwgZCk7CisgIGVwc2Nsb3N1cmUgKCZk LT5mb2xsb3dzWzBdLCBkKTsKIAogICAvKiBDb250ZXh0IHdhbnRlZCBieSBzb21lIHBvc2l0aW9u LiAgKi8KLSAgaW50IHNlcGFyYXRlX2NvbnRleHRzID0gc3RhdGVfc2VwYXJhdGVfY29udGV4dHMg KCZtZXJnZWQpOworICBpbnQgc2VwYXJhdGVfY29udGV4dHMgPSBzdGF0ZV9zZXBhcmF0ZV9jb250 ZXh0cyAoJmQtPmZvbGxvd3NbMF0pOwogCiAgIC8qIEJ1aWxkIHRoZSBpbml0aWFsIHN0YXRlLiAg Ki8KICAgaWYgKHNlcGFyYXRlX2NvbnRleHRzICYgQ1RYX05FV0xJTkUpCi0gICAgc3RhdGVfaW5k ZXggKGQsICZtZXJnZWQsIENUWF9ORVdMSU5FKTsKKyAgICBzdGF0ZV9pbmRleCAoZCwgJmQtPmZv bGxvd3NbMF0sIENUWF9ORVdMSU5FKTsKICAgZC0+aW5pdHN0YXRlX25vdGJvbCA9IGQtPm1pbl90 cmNvdW50Ci0gICAgPSBzdGF0ZV9pbmRleCAoZCwgJm1lcmdlZCwgc2VwYXJhdGVfY29udGV4dHMg XiBDVFhfQU5ZKTsKKyAgICA9IHN0YXRlX2luZGV4IChkLCAmZC0+Zm9sbG93c1swXSwgc2VwYXJh dGVfY29udGV4dHMgXiBDVFhfQU5ZKTsKICAgaWYgKHNlcGFyYXRlX2NvbnRleHRzICYgQ1RYX0xF VFRFUikKLSAgICBkLT5taW5fdHJjb3VudCA9IHN0YXRlX2luZGV4IChkLCAmbWVyZ2VkLCBDVFhf TEVUVEVSKTsKKyAgICBkLT5taW5fdHJjb3VudCA9IHN0YXRlX2luZGV4IChkLCAmZC0+Zm9sbG93 c1swXSwgQ1RYX0xFVFRFUik7CiAgIGQtPm1pbl90cmNvdW50Kys7CiAgIGQtPnRyY291bnQgPSAw OwogCkBAIC0zNzgzLDcgKzM3ODgsNyBAQCBkZmFtdXN0IChzdHJ1Y3QgZGZhIGNvbnN0ICpkKQog ICBib29sIG5lZWRfZW5kbGluZSA9IGZhbHNlOwogICBib29sIGNhc2VfZm9sZF91bmlieXRlID0g ZC0+c3ludGF4LmNhc2VfZm9sZCAmJiBNQl9DVVJfTUFYID09IDE7CiAKLSAgZm9yIChzaXplX3Qg cmkgPSAwOyByaSA8IGQtPnRpbmRleDsgKytyaSkKKyAgZm9yIChzaXplX3QgcmkgPSAxOyByaSA8 IGQtPnRpbmRleCAtIDE7ICsrcmkpCiAgICAgewogICAgICAgdG9rZW4gdCA9IGQtPnRva2Vuc1ty aV07CiAgICAgICBzd2l0Y2ggKHQpCi0tIAoxLjcuMQoK --------_58832584000000008786_MULTIPART_MIXED_ Content-Type: text/plain; charset="US-ASCII"; name="0002-dfa-optmization-of-alternation-in-NFA.patch" Content-Disposition: attachment; filename="0002-dfa-optmization-of-alternation-in-NFA.patch" Content-Transfer-Encoding: base64 RnJvbSA0NzI4Mjg2MzI1ODM0ZGQwMjAyNmJmMTIzNGM5ZDQ4MWE1ZGU3ZWU1IE1vbiBTZXAgMTcg MDA6MDA6MDAgMjAwMQpGcm9tOiBOb3JpaGlybyBUYW5ha2EgPG5vcml0bmtAa2NuLm5lLmpwPgpE YXRlOiBNb24sIDE3IFNlcCAyMDE4IDIyOjQ2OjI1ICswOTAwClN1YmplY3Q6IFtQQVRDSCAyLzJd IGRmYTogb3B0bWl6YXRpb24gb2YgYWx0ZXJuYXRpb24gaW4gTkZBCgpFdmVuIHdoZW4gc2ltaWxl ciBzdGF0ZXMgZXhpc3RzIGluIGFsdGVybmF0aW9uLCBERkEgdHJlYXRzIHRoZW0gYXMKc2VwYXJh dGUgaXRlbXMuICBJdCBtYXkgY29tcGxpY2F0ZSB0aGUgdHJhbnNpdGlvbiBpbiBORkEgYW5kIGNh dXNlCnNsb3dkb3duLiAgVGhpcyBjaGFuZ2UgYXNzZW1ibGVzIHRoZSBzdGF0ZXMgaW50byBvbmUu ICBGb3IgZXhhbXBsZSwKYWJ8YWMgaXMgY2hhbmdlZCBpbnRvIGEoYnxjKS4gIFRoaXMgY2hhbmdl IHNwZWVkcy11cCBtYXRjaGluZyBmb3IKbWFueSBicmFuY2hlZCBwYXR0ZXJuLiAgRm9yIGV4YW1w bGUsIGdyZXAgc3BlZWRzLXVwIG1vcmUgdGhhbiAzMHgKaW4gZm9sbG93aW5nIGNhc2UuCgogIHNl cSAxMDAwMCB8IHNlZCAncy8kLyBhYmNkZWZnaGlqa2xtbm9wcXJzdHV2d3h5ei87IHMvJC8uLycg PmluCiAgdGltZSAtcCBlbnYgTENfQUxMPUMgZ3JlcCAtdmYgaW4gaW4KCmRmYS5jIChwcnVuZSk6 IE5ldyBmdW5jdGlvbi4KKG1lcmdlX25mYV9zdGF0ZSk6IE5ldyBmdW5jdGlvbi4gIEl0IG1lcmdl cyBzYW1lIE5GQSBzdGF0ZXMuCihkZmFvcHRpbWl6ZSk6IE5ldyBmdW5jdGlvbi4gIEl0IHNlZWtz IG1lcmdlZCBhbmQgcmVtb3ZlZCBub2Rlcy4KKGRmYWFuYWx5emUpOiBDYWxsIG5ldyBmdW5jdGlv bi4KKGRmYXV0Zjhub3NzKTogQ2hhbmdlIG5hbWUgZnJvbSBkZmFvcHRpbWl6ZSBiZWNhdXNlIG9m IGFkZGl0aW9uIG9mIG5ldwpmdW5jdGlvbi4KKGRmYWNvbXApOiBVcGRhdGUgY2FsbGVyLgotLS0K IGxpYi9kZmEuYyB8ICAxNDYgKysrKysrKysrKysrKysrKysrKysrKysrKysrKysrKysrKysrKysr KysrKysrKysrKysrKysrKysrKysrLQogMSBmaWxlcyBjaGFuZ2VkLCAxNDQgaW5zZXJ0aW9ucygr KSwgMiBkZWxldGlvbnMoLSkKCmRpZmYgLS1naXQgYS9saWIvZGZhLmMgYi9saWIvZGZhLmMKaW5k ZXggYzBiMjRmYy4uMzk5MTExMiAxMDA2NDQKLS0tIGEvbGliL2RmYS5jCisrKyBiL2xpYi9kZmEu YwpAQCAtMjEyMSw2ICsyMTIxLDIyIEBAIGRlbGV0ZSAoc2l6ZV90IGRlbCwgcG9zaXRpb25fc2V0 ICpzKQogICByZXR1cm4gMDsKIH0KIAorc3RhdGljIHZvaWQKK3BydW5lIChwb3NpdGlvbl9zZXQg KnMpCit7CisgIHNpemVfdCBkID0gMDsKKworICBmb3IgKHNpemVfdCBpID0gMDsgaSA8IHMtPm5l bGVtOyBpKyspCisgICAgeworICAgICAgaWYgKHMtPmVsZW1zW2ldLmNvbnN0cmFpbnQgPT0gMCkK KyAgICAgICAgY29udGludWU7CisKKyAgICAgIHMtPmVsZW1zW2QrK10gPSBzLT5lbGVtc1tpXTsK KyAgICB9CisKKyAgcy0+bmVsZW0gPSBkOworfQorCiAvKiBSZXBsYWNlIGEgcG9zaXRpb24gd2l0 aCB0aGUgZm9sbG93ZWQgc2V0LiAgKi8KIHN0YXRpYyB2b2lkCiByZXBsYWNlIChwb3NpdGlvbl9z ZXQgKmRzdCwgc2l6ZV90IGRlbCwgcG9zaXRpb25fc2V0ICphZGQsCkBAIC0yMzE0LDYgKzIzMzAs MTMwIEBAIHN0YXRlX3NlcGFyYXRlX2NvbnRleHRzIChwb3NpdGlvbl9zZXQgY29uc3QgKnMpCiAg IHJldHVybiBzZXBhcmF0ZV9jb250ZXh0czsKIH0KIAorZW51bQoreworICAvKiBTaW5nbGUgdG9r ZW4gaXMgcmVwZWF0ZWQuICBJdCBpcyBkaXN0aW5ndWlzaGVkIGZyb20gbm9uLXJlcGVhdGVkLiAg Ki8KKyAgT1BUX1JFUEVBVCA9ICgxIDw8IDApLAorCisgIC8qIE11bHRpcGxlIHRva2VucyBhcmUg cmVwZWF0ZWQuICBUaGlzIGZsYWcgaXMgb24gYXQgaGVhZCBvZiB0b2tlbnMuICBUaGUKKyAgICAg bm9kZSBpcyBub3QgbWVyZ2VkLiAgKi8KKyAgT1BUX0xQQVJFTiA9ICgxIDw8IDEpLAorCisgIC8q IE11bHRpcGxlIGJyYW5jaGVzIGFyZSBqb2luZWQuICBUaGUgbm9kZSBpcyBub3QgbWVyZ2VkLiAg Ki8KKyAgT1BUX1JQQVJFTiA9ICgxIDw8IDIpLAorCisgIC8qIFRoZSBub2RlIGlzIHdhbGtlZC4g IElmIHRoZSBub2RlIGlzIGZvdW5kIGluIHdhbGtpbmcgYWdhaW4sIE9QVF9SUEFSRU4KKyAgICAg ZmxhZyBpcyB0dXJuZWQgb24uICovCisgIE9QVF9XQUxLRUQgPSAoMSA8PCAzKSwKKworICAvKiBU aGUgbm9kZSBpcyBxdWV1ZWQuICBUaGUgbm9kZSBpcyBub3QgcXVldWVkIGFnYWluLiAgKi8KKyAg T1BUX1FVRVVFRCA9ICgxIDw8IDQpCit9OworCitzdGF0aWMgc2l6ZV90CittZXJnZV9uZmFfc3Rh dGUgKHN0cnVjdCBkZmEgKmQsIHNpemVfdCAqcXVldWUsIHNpemVfdCBucXVldWUsIHNpemVfdCB0 aW5kZXgsCisgICAgICAgICAgICAgICAgIGludCAqZmxhZ3MsIHBvc2l0aW9uX3NldCAqbWVyZ2Vk KQoreworICBzaXplX3Qgc2luZGV4OworICBzaXplX3QgZGluZGV4OworCisgIGZvciAoc2l6ZV90 IGkgPSAwOyBpIDwgZC0+Zm9sbG93c1t0aW5kZXhdLm5lbGVtOyBpKyspCisgICAgeworICAgICAg ZGluZGV4ID0gZC0+Zm9sbG93c1t0aW5kZXhdLmVsZW1zW2ldLmluZGV4OworCisgICAgICAvKiBT a2lwIHRoZSBub2RlIGFzIHBydW5lZCBpbiBmdXR1cmUuICAqLworICAgICAgaWYgKGQtPmZvbGxv d3NbdGluZGV4XS5lbGVtc1tpXS5jb25zdHJhaW50ID09IDApCisgICAgICAgIGNvbnRpbnVlOwor CisgICAgICBpZiAodGluZGV4IDwgZGluZGV4ICYmICEoZmxhZ3NbZGluZGV4XSAmIE9QVF9RVUVV RUQpKQorICAgICAgICB7CisgICAgICAgICAgcXVldWVbbnF1ZXVlKytdID0gZGluZGV4OworICAg ICAgICAgIGZsYWdzW2RpbmRleF0gfD0gT1BUX1FVRVVFRDsKKyAgICAgICAgfQorCisgICAgICBp ZiAoZmxhZ3NbZGluZGV4XSAmIChPUFRfTFBBUkVOIHwgT1BUX1JQQVJFTikpCisgICAgICAgIGNv bnRpbnVlOworCisgICAgICBmb3IgKHNpemVfdCBqID0gaSArIDE7IGogPCBkLT5mb2xsb3dzW3Rp bmRleF0ubmVsZW07IGorKykKKyAgICAgICAgeworICAgICAgICAgIHNpbmRleCA9IGQtPmZvbGxv d3NbdGluZGV4XS5lbGVtc1tqXS5pbmRleDsKKworICAgICAgICAgIGlmIChkLT5mb2xsb3dzW3Rp bmRleF0uZWxlbXNbal0uY29uc3RyYWludCA9PSAwKQorICAgICAgICAgICAgY29udGludWU7CisK KyAgICAgICAgICBpZiAoZmxhZ3Nbc2luZGV4XSAmIChPUFRfTFBBUkVOIHwgT1BUX1JQQVJFTikp CisgICAgICAgICAgICBjb250aW51ZTsKKworICAgICAgICAgIGlmIChkLT50b2tlbnNbc2luZGV4 XSAhPSBkLT50b2tlbnNbZGluZGV4XSkKKyAgICAgICAgICAgIGNvbnRpbnVlOworCisgICAgICAg ICAgaWYgKGQtPmZvbGxvd3NbdGluZGV4XS5lbGVtc1tpXS5jb25zdHJhaW50ICE9CisgICAgICAg ICAgICAgIGQtPmZvbGxvd3NbdGluZGV4XS5lbGVtc1tqXS5jb25zdHJhaW50KQorICAgICAgICAg ICAgY29udGludWU7CisKKyAgICAgICAgICBpZiAoKGZsYWdzW3NpbmRleF0gXiBmbGFnc1tkaW5k ZXhdKSAmIE9QVF9SRVBFQVQpCisgICAgICAgICAgICBjb250aW51ZTsKKworICAgICAgICAgIGlm IChmbGFnc1tzaW5kZXhdICYgT1BUX1JFUEVBVCkKKyAgICAgICAgICAgIGRlbGV0ZSAoc2luZGV4 LCAmZC0+Zm9sbG93c1tzaW5kZXhdKTsKKworICAgICAgICAgIG1lcmdlICgmZC0+Zm9sbG93c1tk aW5kZXhdLCAmZC0+Zm9sbG93c1tzaW5kZXhdLCBtZXJnZWQpOworICAgICAgICAgIGNvcHkgKG1l cmdlZCwgJmQtPmZvbGxvd3NbZGluZGV4XSk7CisKKyAgICAgICAgICAvKiBNYXJrIGl0IGFzIHBy dW5lZCBpbiBmdXR1cmUuICAqLworICAgICAgICAgIGQtPmZvbGxvd3NbdGluZGV4XS5lbGVtc1tq XS5jb25zdHJhaW50ID0gMDsKKyAgICAgICAgfQorICAgIH0KKworICBwcnVuZSAoJmQtPmZvbGxv d3NbdGluZGV4XSk7CisKKyAgcmV0dXJuIG5xdWV1ZTsKK30KKworc3RhdGljIHZvaWQKK2RmYW9w dGltaXplIChzdHJ1Y3QgZGZhICpkKQoreworICBpbnQgKmZsYWdzOworICBzaXplX3QgKnF1ZXVl OworICBzaXplX3QgbnF1ZXVlOworICBwb3NpdGlvbl9zZXQgbWVyZ2VkMDsKKyAgcG9zaXRpb25f c2V0ICptZXJnZWQ7CisKKyAgZmxhZ3MgPSB4bm1hbGxvYyAoZC0+dGluZGV4LCBzaXplb2YgKmZs YWdzKTsKKyAgcXVldWUgPSB4bm1hbGxvYyAoZC0+bmxlYXZlcywgc2l6ZW9mICpxdWV1ZSk7CisK KyAgZm9yIChzaXplX3QgaSA9IDA7IGkgPCBkLT50aW5kZXg7ICsraSkKKyAgICBmbGFnc1tpXSA9 IDA7CisKKyAgZm9yIChzaXplX3QgaSA9IDA7IGkgPCBkLT50aW5kZXg7ICsraSkKKyAgICB7Cisg ICAgICBmb3IgKHNpemVfdCBqID0gMDsgaiA8IGQtPmZvbGxvd3NbaV0ubmVsZW07IGorKykKKyAg ICAgICAgeworICAgICAgICAgIGlmIChkLT5mb2xsb3dzW2ldLmVsZW1zW2pdLmluZGV4ID09IGkp CisgICAgICAgICAgICBmbGFnc1tkLT5mb2xsb3dzW2ldLmVsZW1zW2pdLmluZGV4XSB8PSBPUFRf UkVQRUFUOworICAgICAgICAgIGVsc2UgaWYgKGQtPmZvbGxvd3NbaV0uZWxlbXNbal0uaW5kZXgg PCBpKQorICAgICAgICAgICAgZmxhZ3NbZC0+Zm9sbG93c1tpXS5lbGVtc1tqXS5pbmRleF0gfD0g T1BUX0xQQVJFTjsKKyAgICAgICAgICBlbHNlIGlmIChmbGFnc1tkLT5mb2xsb3dzW2ldLmVsZW1z W2pdLmluZGV4XSAmPQorICAgICAgICAgICAgICAgICAgIE9QVF9XQUxLRUQpCisgICAgICAgICAg ICBmbGFnc1tkLT5mb2xsb3dzW2ldLmVsZW1zW2pdLmluZGV4XSB8PSBPUFRfUlBBUkVOOworICAg ICAgICAgIGVsc2UKKyAgICAgICAgICAgIGZsYWdzW2QtPmZvbGxvd3NbaV0uZWxlbXNbal0uaW5k ZXhdIHw9IE9QVF9XQUxLRUQ7CisgICAgICAgIH0KKyAgICB9CisKKyAgbWVyZ2VkID0gJm1lcmdl ZDA7CisgIGFsbG9jX3Bvc2l0aW9uX3NldCAobWVyZ2VkLCBkLT5ubGVhdmVzKTsKKworICBucXVl dWUgPSAwOworICBxdWV1ZVtucXVldWUrK10gPSAwOworCisgIGZvciAoc2l6ZV90IGkgPSAwOyBp IDwgbnF1ZXVlOyBpKyspCisgICAgbnF1ZXVlID0gbWVyZ2VfbmZhX3N0YXRlIChkLCBxdWV1ZSwg bnF1ZXVlLCBxdWV1ZVtpXSwgZmxhZ3MsIG1lcmdlZCk7CisKKyAgZnJlZSAobWVyZ2VkLT5lbGVt cyk7CisgIGZyZWUgKHF1ZXVlKTsKKyAgZnJlZSAoZmxhZ3MpOworfQogCiAvKiBQZXJmb3JtIGJv dHRvbS11cCBhbmFseXNpcyBvbiB0aGUgcGFyc2UgdHJlZSwgY29tcHV0aW5nIHZhcmlvdXMgZnVu Y3Rpb25zLgogICAgTm90ZSB0aGF0IGF0IHRoaXMgcG9pbnQsIHdlJ3JlIHByZXRlbmRpbmcgY29u c3RydWN0cyBsaWtlIFw8IGFyZSByZWFsCkBAIC0yNTMzLDYgKzI2NzMsOCBAQCBkZmFhbmFseXpl IChzdHJ1Y3QgZGZhICpkLCBib29sIHNlYXJjaGZsYWcpCiAjZW5kaWYKICAgICB9CiAKKyAgZGZh b3B0aW1pemUgKGQpOworCiAjaWZkZWYgREVCVUcKICAgZm9yIChzaXplX3QgaSA9IDA7IGkgPCBk LT50aW5kZXg7ICsraSkKICAgICBpZiAoZC0+dG9rZW5zW2ldIDwgTk9UQ0hBUiB8fCBkLT50b2tl bnNbaV0gPT0gQkFDS1JFRgpAQCAtMzM1Myw3ICszNDk1LDcgQEAgZGZhX3N1cHBvcnRlZCAoc3Ry dWN0IGRmYSBjb25zdCAqZCkKIH0KIAogc3RhdGljIHZvaWQKLWRmYW9wdGltaXplIChzdHJ1Y3Qg ZGZhICpkKQorZGZhdXRmOG5vc3MgKHN0cnVjdCBkZmEgKmQpCiB7CiAgIGlmICghZC0+bG9jYWxl aW5mby51c2luZ191dGY4KQogICAgIHJldHVybjsKQEAgLTM0ODEsNyArMzYyMyw3IEBAIGRmYWNv bXAgKGNoYXIgY29uc3QgKnMsIHNpemVfdCBsZW4sIHN0cnVjdCBkZmEgKmQsIGJvb2wgc2VhcmNo ZmxhZykKIAogICBpZiAoZGZhX3N1cHBvcnRlZCAoZCkpCiAgICAgewotICAgICAgZGZhb3B0aW1p emUgKGQpOworICAgICAgZGZhdXRmOG5vc3MgKGQpOwogICAgICAgZGZhYW5hbHl6ZSAoZCwgc2Vh cmNoZmxhZyk7CiAgICAgfQogICBlbHNlCi0tIAoxLjcuMQoK --------_58832584000000008786_MULTIPART_MIXED_-- From unknown Tue Jun 24 20:52:07 2025 X-Loop: help-debbugs@gnu.org Subject: bug#32750: [PATCH 2/2] dfa: optmization of alternation in NFA Resent-From: Paul Eggert Original-Sender: "Debbugs-submit" Resent-CC: bug-grep@gnu.org Resent-Date: Tue, 18 Sep 2018 06:19:01 +0000 Resent-Message-ID: Resent-Sender: help-debbugs@gnu.org X-GNU-PR-Message: followup 32750 X-GNU-PR-Package: grep X-GNU-PR-Keywords: patch To: Norihiro Tanaka , 32750@debbugs.gnu.org Received: via spool by 32750-submit@debbugs.gnu.org id=B32750.153725150811845 (code B ref 32750); Tue, 18 Sep 2018 06:19:01 +0000 Received: (at 32750) by debbugs.gnu.org; 18 Sep 2018 06:18:27 +0000 Received: from localhost ([127.0.0.1]:43555 helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1g29L9-00034z-N3 for submit@debbugs.gnu.org; Tue, 18 Sep 2018 02:18:27 -0400 Received: from zimbra.cs.ucla.edu ([131.179.128.68]:46470) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1g29L7-00034j-Mq for 32750@debbugs.gnu.org; Tue, 18 Sep 2018 02:18:26 -0400 Received: from localhost (localhost [127.0.0.1]) by zimbra.cs.ucla.edu (Postfix) with ESMTP id CD100160514; Mon, 17 Sep 2018 23:18:19 -0700 (PDT) Received: from zimbra.cs.ucla.edu ([127.0.0.1]) by localhost (zimbra.cs.ucla.edu [127.0.0.1]) (amavisd-new, port 10032) with ESMTP id y2gAdyP594U6; Mon, 17 Sep 2018 23:18:19 -0700 (PDT) Received: from localhost (localhost [127.0.0.1]) by zimbra.cs.ucla.edu (Postfix) with ESMTP id 368D7160632; Mon, 17 Sep 2018 23:18:19 -0700 (PDT) X-Virus-Scanned: amavisd-new at zimbra.cs.ucla.edu Received: from zimbra.cs.ucla.edu ([127.0.0.1]) by localhost (zimbra.cs.ucla.edu [127.0.0.1]) (amavisd-new, port 10026) with ESMTP id OPj8AZckZjJe; Mon, 17 Sep 2018 23:18:19 -0700 (PDT) Received: from [192.168.1.9] (cpe-23-242-74-103.socal.res.rr.com [23.242.74.103]) by zimbra.cs.ucla.edu (Postfix) with ESMTPSA id 145F3160514; Mon, 17 Sep 2018 23:18:19 -0700 (PDT) References: <20180917231311.1BE7.27F6AC2D@kcn.ne.jp> From: Paul Eggert Organization: UCLA Computer Science Department Message-ID: <6278c6c7-345a-9d54-6243-e6257c43865c@cs.ucla.edu> Date: Mon, 17 Sep 2018 23:18:18 -0700 User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:52.0) Gecko/20100101 Thunderbird/52.9.1 MIME-Version: 1.0 In-Reply-To: <20180917231311.1BE7.27F6AC2D@kcn.ne.jp> Content-Type: text/plain; charset=utf-8; format=flowed Content-Language: en-US Content-Transfer-Encoding: 7bit X-Spam-Score: -2.3 (--) X-BeenThere: debbugs-submit@debbugs.gnu.org X-Mailman-Version: 2.1.18 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: debbugs-submit-bounces@debbugs.gnu.org Sender: "Debbugs-submit" X-Spam-Score: -3.3 (---) Thanks for the patch. A quick question: what does the identifier "dfautf8noss" stand for? I couldn't figure it out. From unknown Tue Jun 24 20:52:07 2025 X-Loop: help-debbugs@gnu.org Subject: bug#32750: [PATCH 2/2] dfa: optmization of alternation in NFA Resent-From: Norihiro Tanaka Original-Sender: "Debbugs-submit" Resent-CC: bug-grep@gnu.org Resent-Date: Tue, 18 Sep 2018 15:14:02 +0000 Resent-Message-ID: Resent-Sender: help-debbugs@gnu.org X-GNU-PR-Message: followup 32750 X-GNU-PR-Package: grep X-GNU-PR-Keywords: patch To: Paul Eggert Cc: 32750@debbugs.gnu.org Received: via spool by 32750-submit@debbugs.gnu.org id=B32750.153728362124985 (code B ref 32750); Tue, 18 Sep 2018 15:14:02 +0000 Received: (at 32750) by debbugs.gnu.org; 18 Sep 2018 15:13:41 +0000 Received: from localhost ([127.0.0.1]:44600 helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1g2Hh7-0006Uv-8R for submit@debbugs.gnu.org; Tue, 18 Sep 2018 11:13:41 -0400 Received: from mailgw01.kcn.ne.jp ([61.86.7.208]:49656) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1g2Hh5-0006Ue-An for 32750@debbugs.gnu.org; Tue, 18 Sep 2018 11:13:40 -0400 Received: from mxs01-s (mailgw1.kcn.ne.jp [61.86.15.233]) by mailgw01.kcn.ne.jp (Postfix) with ESMTP id 1E2CC4A0A1A for <32750@debbugs.gnu.org>; Wed, 19 Sep 2018 00:13:32 +0900 (JST) X-matriXscan-loop-detect: 9281c36bfafadac2eba595c6be028cdd4559a7ec Received: from mail04.kcn.ne.jp ([61.86.6.183]) by mxs01-s with ESMTP; Wed, 19 Sep 2018 00:13:31 +0900 (JST) Received: from [10.120.1.71] (i118-21-128-66.s30.a048.ap.plala.or.jp [118.21.128.66]) by mail04.kcn.ne.jp (Postfix) with ESMTPA id 5714812900BC; Wed, 19 Sep 2018 00:13:31 +0900 (JST) Date: Wed, 19 Sep 2018 00:13:30 +0900 From: Norihiro Tanaka In-Reply-To: <6278c6c7-345a-9d54-6243-e6257c43865c@cs.ucla.edu> References: <20180917231311.1BE7.27F6AC2D@kcn.ne.jp> <6278c6c7-345a-9d54-6243-e6257c43865c@cs.ucla.edu> Message-Id: <20180919001329.E2BE.27F6AC2D@kcn.ne.jp> MIME-Version: 1.0 Content-Type: text/plain; charset="US-ASCII" Content-Transfer-Encoding: 7bit X-Mailer: Becky! ver. 2.73 [ja] X-matriXscan-Sophos-AV: Clean X-matriXscan-Action: Approve X-matriXscan: Uncategorized X-Spam-Score: -0.0 (/) X-BeenThere: debbugs-submit@debbugs.gnu.org X-Mailman-Version: 2.1.18 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: debbugs-submit-bounces@debbugs.gnu.org Sender: "Debbugs-submit" X-Spam-Score: -1.0 (-) Paul Eggert wrote: > Thanks for the patch. A quick question: what does the identifier "dfautf8noss" stand for? I couldn't figure it out. It means "No use superset for utf8". I thought of various things for the name of the function, but I could not think of a good name. From unknown Tue Jun 24 20:52:07 2025 X-Loop: help-debbugs@gnu.org Subject: bug#32750: [PATCH 2/2] dfa: optmization of alternation in NFA Resent-From: Paul Jackson Original-Sender: "Debbugs-submit" Resent-CC: bug-grep@gnu.org Resent-Date: Tue, 18 Sep 2018 16:17:01 +0000 Resent-Message-ID: Resent-Sender: help-debbugs@gnu.org X-GNU-PR-Message: followup 32750 X-GNU-PR-Package: grep X-GNU-PR-Keywords: patch To: 32750@debbugs.gnu.org X-Debbugs-Original-To: bug-grep@gnu.org Received: via spool by submit@debbugs.gnu.org id=B.153728737331424 (code B ref -1); Tue, 18 Sep 2018 16:17:01 +0000 Received: (at submit) by debbugs.gnu.org; 18 Sep 2018 16:16:13 +0000 Received: from localhost ([127.0.0.1]:44645 helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1g2Ifd-0008Am-57 for submit@debbugs.gnu.org; Tue, 18 Sep 2018 12:16:13 -0400 Received: from eggs.gnu.org ([208.118.235.92]:36792) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1g2IfZ-0008AX-Tn for submit@debbugs.gnu.org; Tue, 18 Sep 2018 12:16:10 -0400 Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71) (envelope-from ) id 1g2IfT-0005TV-P7 for submit@debbugs.gnu.org; Tue, 18 Sep 2018 12:16:04 -0400 X-Spam-Checker-Version: SpamAssassin 3.3.2 (2011-06-06) on eggs.gnu.org X-Spam-Level: X-Spam-Status: No, score=-1.9 required=5.0 tests=BAYES_00,T_DKIM_INVALID autolearn=disabled version=3.3.2 Received: from lists.gnu.org ([2001:4830:134:3::11]:58647) by eggs.gnu.org with esmtps (TLS1.0:RSA_AES_256_CBC_SHA1:32) (Exim 4.71) (envelope-from ) id 1g2IfT-0005TL-J4 for submit@debbugs.gnu.org; Tue, 18 Sep 2018 12:16:03 -0400 Received: from eggs.gnu.org ([2001:4830:134:3::10]:57766) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1g2IfS-0006OR-Tp for bug-grep@gnu.org; Tue, 18 Sep 2018 12:16:03 -0400 Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71) (envelope-from ) id 1g2IfO-0005Qp-Gn for bug-grep@gnu.org; Tue, 18 Sep 2018 12:16:02 -0400 Received: from wout2-smtp.messagingengine.com ([64.147.123.25]:39521) by eggs.gnu.org with esmtps (TLS1.0:DHE_RSA_AES_256_CBC_SHA1:32) (Exim 4.71) (envelope-from ) id 1g2IfO-0005Pw-9f for bug-grep@gnu.org; Tue, 18 Sep 2018 12:15:58 -0400 Received: from compute1.internal (compute1.nyi.internal [10.202.2.41]) by mailout.west.internal (Postfix) with ESMTP id C6F304E5 for ; Tue, 18 Sep 2018 12:15:55 -0400 (EDT) Received: from web1 ([10.202.2.211]) by compute1.internal (MEProxy); Tue, 18 Sep 2018 12:15:55 -0400 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d= messagingengine.com; h=content-transfer-encoding:content-type :date:from:in-reply-to:message-id:mime-version:references :subject:to:x-me-sender:x-me-sender:x-sasl-enc; s=fm3; bh=LrCn8I j5nW6jELkpqgSLlqbChnavOGqkwYnxyiEuvgw=; b=Nvs29gb9z3DMHWrj73VBK3 2KFoiD0xu9/hT6NnaxojuPFISH8JWv1dWtUCP+4u+z38yZTf1tAEHnFyPIHOhaup cdDswjYVRWZDiCFSEMzczzhZ4q7TCdDXU5wbDgI9LB9OgSdnkwx/WTPD03VXtq/D E6qQ16irnaCX+7iNZprjGSMOjjhdUtzp5FQdgAfi4SAI9DX/DJoqrUE2uMx2XYoD 4ozmuyPJfmQFRVwVwcZ3XUR7JRIc2YNmprZ4EWR8uw8EqQ4D6J71APUtQYHFiMX6 iSumw2+UbTdiJt/nUOjkrPI1U3uVTwsDefknTApnNIYgvpTPCt+FYf43gYmGqtew == X-ME-Proxy: X-ME-Sender: Received: by mailuser.nyi.internal (Postfix, from userid 99) id F04D0941B4; Tue, 18 Sep 2018 12:15:54 -0400 (EDT) Message-Id: <1537287354.1159723.1512345000.62D22C94@webmail.messagingengine.com> From: Paul Jackson MIME-Version: 1.0 Content-Transfer-Encoding: 7bit Content-Type: text/plain; charset="utf-8" X-Mailer: MessagingEngine.com Webmail Interface - ajax-e556cd15 Date: Tue, 18 Sep 2018 11:15:54 -0500 References: <20180917231311.1BE7.27F6AC2D@kcn.ne.jp> <6278c6c7-345a-9d54-6243-e6257c43865c@cs.ucla.edu> <20180919001329.E2BE.27F6AC2D@kcn.ne.jp> In-Reply-To: <20180919001329.E2BE.27F6AC2D@kcn.ne.jp> X-detected-operating-system: by eggs.gnu.org: GNU/Linux 2.2.x-3.x [generic] [fuzzy] X-detected-operating-system: by eggs.gnu.org: GNU/Linux 2.6.x X-Received-From: 2001:4830:134:3::11 X-Spam-Score: -4.3 (----) X-BeenThere: debbugs-submit@debbugs.gnu.org X-Mailman-Version: 2.1.18 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: debbugs-submit-bounces@debbugs.gnu.org Sender: "Debbugs-submit" X-Spam-Score: -5.3 (-----) Norihiro Tanaka wrote: >> It means "No use superset for utf8". Would a comment be useful, such as in the following, (which may have to be reworded to be correct) ? /* Optimize DFA, but don't use superset (noss) for utf8 */ static void -dfaoptimize (struct dfa *d) +dfautf8noss (struct dfa *d) -- Paul Jackson pj@usa.net From unknown Tue Jun 24 20:52:07 2025 X-Loop: help-debbugs@gnu.org Subject: bug#32750: [PATCH 2/2] dfa: optmization of alternation in NFA Resent-From: Eric Blake Original-Sender: "Debbugs-submit" Resent-CC: bug-grep@gnu.org Resent-Date: Tue, 18 Sep 2018 16:25:02 +0000 Resent-Message-ID: Resent-Sender: help-debbugs@gnu.org X-GNU-PR-Message: followup 32750 X-GNU-PR-Package: grep X-GNU-PR-Keywords: patch To: Paul Jackson , 32750@debbugs.gnu.org Received: via spool by 32750-submit@debbugs.gnu.org id=B32750.153728786032213 (code B ref 32750); Tue, 18 Sep 2018 16:25:02 +0000 Received: (at 32750) by debbugs.gnu.org; 18 Sep 2018 16:24:20 +0000 Received: from localhost ([127.0.0.1]:44649 helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1g2InT-0008NV-Vl for submit@debbugs.gnu.org; Tue, 18 Sep 2018 12:24:20 -0400 Received: from mx1.redhat.com ([209.132.183.28]:53534) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1g2InS-0008NG-1O for 32750@debbugs.gnu.org; Tue, 18 Sep 2018 12:24:18 -0400 Received: from smtp.corp.redhat.com (int-mx12.intmail.prod.int.phx2.redhat.com [10.5.11.27]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by mx1.redhat.com (Postfix) with ESMTPS id 15B7E8553D; Tue, 18 Sep 2018 16:24:12 +0000 (UTC) Received: from [10.10.122.46] (ovpn-122-46.rdu2.redhat.com [10.10.122.46]) by smtp.corp.redhat.com (Postfix) with ESMTP id 8A19B87502; Tue, 18 Sep 2018 16:24:11 +0000 (UTC) References: <20180917231311.1BE7.27F6AC2D@kcn.ne.jp> <6278c6c7-345a-9d54-6243-e6257c43865c@cs.ucla.edu> <20180919001329.E2BE.27F6AC2D@kcn.ne.jp> <1537287354.1159723.1512345000.62D22C94@webmail.messagingengine.com> From: Eric Blake Organization: Red Hat, Inc. Message-ID: <06bb041a-c263-da0a-be5c-fe2693f95d22@redhat.com> Date: Tue, 18 Sep 2018 11:24:10 -0500 User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:60.0) Gecko/20100101 Thunderbird/60.0 MIME-Version: 1.0 In-Reply-To: <1537287354.1159723.1512345000.62D22C94@webmail.messagingengine.com> Content-Type: text/plain; charset=utf-8; format=flowed Content-Language: en-US Content-Transfer-Encoding: 7bit X-Scanned-By: MIMEDefang 2.84 on 10.5.11.27 X-Greylist: Sender IP whitelisted, not delayed by milter-greylist-4.5.16 (mx1.redhat.com [10.5.110.28]); Tue, 18 Sep 2018 16:24:12 +0000 (UTC) X-Spam-Score: -5.0 (-----) X-BeenThere: debbugs-submit@debbugs.gnu.org X-Mailman-Version: 2.1.18 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: debbugs-submit-bounces@debbugs.gnu.org Sender: "Debbugs-submit" X-Spam-Score: -6.0 (------) On 9/18/18 11:15 AM, Paul Jackson wrote: > Norihiro Tanaka wrote: >>> It means "No use superset for utf8". > > Would a comment be useful, such as in the following, > (which may have to be reworded to be correct) ? > > /* Optimize DFA, but don't use superset (noss) for utf8 */ > static void > -dfaoptimize (struct dfa *d) > +dfautf8noss (struct dfa *d) Or even some strategic spelling, as in: dfa_utf8_no_ss -- Eric Blake, Principal Software Engineer Red Hat, Inc. +1-919-301-3266 Virtualization: qemu.org | libvirt.org From unknown Tue Jun 24 20:52:07 2025 MIME-Version: 1.0 X-Mailer: MIME-tools 5.505 (Entity 5.505) X-Loop: help-debbugs@gnu.org From: help-debbugs@gnu.org (GNU bug Tracking System) To: Norihiro Tanaka Subject: bug#32750: closed (Re: bug#32750: [PATCH 2/2] dfa: optmization of alternation in NFA) Message-ID: References: <69bce139-3875-52bb-bc6d-edae41ac023c@cs.ucla.edu> <20180917231311.1BE7.27F6AC2D@kcn.ne.jp> X-Gnu-PR-Message: they-closed 32750 X-Gnu-PR-Package: grep X-Gnu-PR-Keywords: patch Reply-To: 32750@debbugs.gnu.org Date: Wed, 19 Sep 2018 04:13:02 +0000 Content-Type: multipart/mixed; boundary="----------=_1537330382-2711-1" This is a multi-part message in MIME format... ------------=_1537330382-2711-1 Content-Disposition: inline Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8" Your bug report #32750: [PATCH 2/2] dfa: optmization of alternation in NFA which was filed against the grep package, has been closed. The explanation is attached below, along with your original report. If you require more details, please reply to 32750@debbugs.gnu.org. --=20 32750: http://debbugs.gnu.org/cgi/bugreport.cgi?bug=3D32750 GNU Bug Tracking System Contact help-debbugs@gnu.org with problems ------------=_1537330382-2711-1 Content-Type: message/rfc822 Content-Disposition: inline Content-Transfer-Encoding: 7bit Received: (at 32750-done) by debbugs.gnu.org; 19 Sep 2018 04:12:33 +0000 Received: from localhost ([127.0.0.1]:45001 helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1g2Tqq-0000go-EU for submit@debbugs.gnu.org; Wed, 19 Sep 2018 00:12:33 -0400 Received: from zimbra.cs.ucla.edu ([131.179.128.68]:48022) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1g2Tqo-0000gZ-57 for 32750-done@debbugs.gnu.org; Wed, 19 Sep 2018 00:12:31 -0400 Received: from localhost (localhost [127.0.0.1]) by zimbra.cs.ucla.edu (Postfix) with ESMTP id 9DC2616161C; Tue, 18 Sep 2018 21:12:23 -0700 (PDT) Received: from zimbra.cs.ucla.edu ([127.0.0.1]) by localhost (zimbra.cs.ucla.edu [127.0.0.1]) (amavisd-new, port 10032) with ESMTP id JbybjFt9Uw9a; Tue, 18 Sep 2018 21:12:21 -0700 (PDT) Received: from localhost (localhost [127.0.0.1]) by zimbra.cs.ucla.edu (Postfix) with ESMTP id B5476161624; Tue, 18 Sep 2018 21:12:21 -0700 (PDT) X-Virus-Scanned: amavisd-new at zimbra.cs.ucla.edu Received: from zimbra.cs.ucla.edu ([127.0.0.1]) by localhost (zimbra.cs.ucla.edu [127.0.0.1]) (amavisd-new, port 10026) with ESMTP id 6IRJp6lTdqre; Tue, 18 Sep 2018 21:12:21 -0700 (PDT) Received: from [192.168.1.9] (cpe-23-242-74-103.socal.res.rr.com [23.242.74.103]) by zimbra.cs.ucla.edu (Postfix) with ESMTPSA id 7FCCC16161C; Tue, 18 Sep 2018 21:12:21 -0700 (PDT) Subject: Re: bug#32750: [PATCH 2/2] dfa: optmization of alternation in NFA To: Norihiro Tanaka References: <20180917231311.1BE7.27F6AC2D@kcn.ne.jp> <6278c6c7-345a-9d54-6243-e6257c43865c@cs.ucla.edu> <20180919001329.E2BE.27F6AC2D@kcn.ne.jp> From: Paul Eggert Organization: UCLA Computer Science Department Message-ID: <69bce139-3875-52bb-bc6d-edae41ac023c@cs.ucla.edu> Date: Tue, 18 Sep 2018 21:12:18 -0700 User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:52.0) Gecko/20100101 Thunderbird/52.9.1 MIME-Version: 1.0 In-Reply-To: <20180919001329.E2BE.27F6AC2D@kcn.ne.jp> Content-Type: multipart/mixed; boundary="------------31F832DF804FDC70D4569485" Content-Language: en-US X-Spam-Score: -2.3 (--) X-Debbugs-Envelope-To: 32750-done Cc: 32750-done@debbugs.gnu.org X-BeenThere: debbugs-submit@debbugs.gnu.org X-Mailman-Version: 2.1.18 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: debbugs-submit-bounces@debbugs.gnu.org Sender: "Debbugs-submit" X-Spam-Score: -3.3 (---) This is a multi-part message in MIME format. --------------31F832DF804FDC70D4569485 Content-Type: text/plain; charset=utf-8; format=flowed Content-Transfer-Encoding: 7bit Norihiro Tanaka wrote: > I thought of various things for the name of the function, but I could > not think of a good name. Yes, that's a tough one. I eventually came up with maybe_disable_superset_dfa. Perhaps we can think of an ever better one. I installed your patches into Gnulib, along with the attached relatively-minor fixups, and then propagated the result into grep by updating the Gnulib version that the grep master points to. As you may notice, nowadays I prefer using signed types like ptrdiff_t to unsigned ones like size_t, as the signed types have a significant advantage in overflow checking with gcc -fsanitize=undefined. Perhaps at some point we should change the other instances of size_t in dfa.c, but one step at a time. Thanks again for the performance improvement. --------------31F832DF804FDC70D4569485 Content-Type: text/x-patch; name="0001-dfa-reorder-enum-for-efficiency.patch" Content-Transfer-Encoding: 7bit Content-Disposition: attachment; filename="0001-dfa-reorder-enum-for-efficiency.patch" >From 5eae16cfb8d40ca75691605e8f2f4a40eab70c7a Mon Sep 17 00:00:00 2001 From: Paul Eggert Date: Tue, 18 Sep 2018 15:25:51 -0700 Subject: [PATCH 1/4] dfa: reorder enum for efficiency * lib/dfa.c (END): Now -1 again. Reorder other elements of the enumeration to make it easier for GCC to generate efficient code by using fewer comparisons to check for ranges of values. (atom): Take advantage of the reordering. --- ChangeLog | 9 ++++ lib/dfa.c | 130 +++++++++++++++++++++++++++++------------------------- 2 files changed, 78 insertions(+), 61 deletions(-) diff --git a/ChangeLog b/ChangeLog index 59581e3c5..64926503a 100644 --- a/ChangeLog +++ b/ChangeLog @@ -1,3 +1,12 @@ +2018-09-18 Paul Eggert + + dfa: reorder enum for efficiency + * lib/dfa.c (END): Now -1 again. Reorder other elements + of the enumeration to make it easier for GCC to generate + efficient code by using fewer comparisons to check for + ranges of values. + (atom): Take advantage of the reordering. + 2018-09-18 Norihiro Tanaka dfa: optimize alternation in NFA diff --git a/lib/dfa.c b/lib/dfa.c index 3991112ef..849645154 100644 --- a/lib/dfa.c +++ b/lib/dfa.c @@ -223,50 +223,24 @@ typedef ptrdiff_t state_num; /* Predefined token values. */ enum { - END = -2, /* END is a terminal symbol that matches the + END = -1, /* END is a terminal symbol that matches the end of input; any value of END or less in the parse tree is such a symbol. Accepting states of the DFA are those that would have - a transition on END. */ - - BEG = -1, /* BEG is a beginning symbol that matches the - biginning of input. */ + a transition on END. This is -1, not some + more-negative value, to tweak the speed of + comparisons to END. */ /* Ordinary character values are terminal symbols that match themselves. */ + /* CSET must come last in the following list of special tokens. Otherwise, + the list order matters only for performance. Related special tokens + should have nearby values so that code like (t == ANYCHAR || t == MBCSET + || CSET <= t) can be done with a single machine-level comparison. */ + EMPTY = NOTCHAR, /* EMPTY is a terminal symbol that matches the empty string. */ - BACKREF, /* BACKREF is generated by \ - or by any other construct that - is not completely handled. If the scanner - detects a transition on backref, it returns - a kind of "semi-success" indicating that - the match will have to be verified with - a backtracking matcher. */ - - BEGLINE, /* BEGLINE is a terminal symbol that matches - the empty string at the beginning of a - line. */ - - ENDLINE, /* ENDLINE is a terminal symbol that matches - the empty string at the end of a line. */ - - BEGWORD, /* BEGWORD is a terminal symbol that matches - the empty string at the beginning of a - word. */ - - ENDWORD, /* ENDWORD is a terminal symbol that matches - the empty string at the end of a word. */ - - LIMWORD, /* LIMWORD is a terminal symbol that matches - the empty string at the beginning or the - end of a word. */ - - NOTLIMWORD, /* NOTLIMWORD is a terminal symbol that - matches the empty string not at - the beginning or end of a word. */ - QMARK, /* QMARK is an operator of one argument that matches zero or one occurrences of its argument. */ @@ -296,16 +270,49 @@ enum RPAREN, /* RPAREN never appears in the parse tree. */ + WCHAR, /* Only returned by lex. wctok contains + the wide character representation. */ + ANYCHAR, /* ANYCHAR is a terminal symbol that matches a valid multibyte (or single byte) character. It is used only if MB_CUR_MAX > 1. */ + BEG, /* BEG is an initial symbol that matches the + beginning of input. */ + + BEGLINE, /* BEGLINE is a terminal symbol that matches + the empty string at the beginning of a + line. */ + + ENDLINE, /* ENDLINE is a terminal symbol that matches + the empty string at the end of a line. */ + + BEGWORD, /* BEGWORD is a terminal symbol that matches + the empty string at the beginning of a + word. */ + + ENDWORD, /* ENDWORD is a terminal symbol that matches + the empty string at the end of a word. */ + + LIMWORD, /* LIMWORD is a terminal symbol that matches + the empty string at the beginning or the + end of a word. */ + + NOTLIMWORD, /* NOTLIMWORD is a terminal symbol that + matches the empty string not at + the beginning or end of a word. */ + + BACKREF, /* BACKREF is generated by \ + or by any other construct that + is not completely handled. If the scanner + detects a transition on backref, it returns + a kind of "semi-success" indicating that + the match will have to be verified with + a backtracking matcher. */ + MBCSET, /* MBCSET is similar to CSET, but for multibyte characters. */ - WCHAR, /* Only returned by lex. wctok contains - the wide character representation. */ - CSET /* CSET and (and any value greater) is a terminal symbol that matches any of a class of characters. */ @@ -1803,7 +1810,30 @@ add_utf8_anychar (struct dfa *dfa) static void atom (struct dfa *dfa) { - if (dfa->parse.tok == WCHAR) + if ((0 <= dfa->parse.tok && dfa->parse.tok < NOTCHAR) + || dfa->parse.tok >= CSET + || dfa->parse.tok == BEG || dfa->parse.tok == BACKREF + || dfa->parse.tok == BEGLINE || dfa->parse.tok == ENDLINE + || dfa->parse.tok == BEGWORD || dfa->parse.tok == ENDWORD + || dfa->parse.tok == LIMWORD || dfa->parse.tok == NOTLIMWORD + || dfa->parse.tok == ANYCHAR || dfa->parse.tok == MBCSET) + { + if (dfa->parse.tok == ANYCHAR && dfa->localeinfo.using_utf8) + { + /* For UTF-8 expand the period to a series of CSETs that define a + valid UTF-8 character. This avoids using the slow multibyte + path. I'm pretty sure it would be both profitable and correct to + do it for any encoding; however, the optimization must be done + manually as it is done above in add_utf8_anychar. So, let's + start with UTF-8: it is the most used, and the structure of the + encoding makes the correctness more obvious. */ + add_utf8_anychar (dfa); + } + else + addtok (dfa, dfa->parse.tok); + dfa->parse.tok = lex (dfa); + } + else if (dfa->parse.tok == WCHAR) { if (dfa->lex.wctok == WEOF) addtok (dfa, BACKREF); @@ -1826,28 +1856,6 @@ atom (struct dfa *dfa) dfa->parse.tok = lex (dfa); } - else if (dfa->parse.tok == ANYCHAR && dfa->localeinfo.using_utf8) - { - /* For UTF-8 expand the period to a series of CSETs that define a valid - UTF-8 character. This avoids using the slow multibyte path. I'm - pretty sure it would be both profitable and correct to do it for - any encoding; however, the optimization must be done manually as - it is done above in add_utf8_anychar. So, let's start with - UTF-8: it is the most used, and the structure of the encoding - makes the correctness more obvious. */ - add_utf8_anychar (dfa); - dfa->parse.tok = lex (dfa); - } - else if ((0 <= dfa->parse.tok && dfa->parse.tok < NOTCHAR) - || dfa->parse.tok >= CSET || dfa->parse.tok == BACKREF - || dfa->parse.tok == BEGLINE || dfa->parse.tok == ENDLINE - || dfa->parse.tok == BEGWORD || dfa->parse.tok == ANYCHAR - || dfa->parse.tok == MBCSET || dfa->parse.tok == ENDWORD - || dfa->parse.tok == LIMWORD || dfa->parse.tok == NOTLIMWORD) - { - addtok (dfa, dfa->parse.tok); - dfa->parse.tok = lex (dfa); - } else if (dfa->parse.tok == LPAREN) { dfa->parse.tok = lex (dfa); -- 2.17.1 --------------31F832DF804FDC70D4569485 Content-Type: text/x-patch; name="0002-dfa-prune-states-as-we-go.patch" Content-Transfer-Encoding: 7bit Content-Disposition: attachment; filename="0002-dfa-prune-states-as-we-go.patch" >From cbf5523bd54a92c1d8ffeae4b629d2b82651f651 Mon Sep 17 00:00:00 2001 From: Paul Eggert Date: Tue, 18 Sep 2018 18:24:27 -0700 Subject: [PATCH 2/4] dfa: prune states as we go * lib/dfa.c (prune): Remove. (merge_nfa_state): Prune as we go instead of at the end. Prefer ptrdiff_t for indexes, as this helps the compiler a bit. Simplify constraint checking a bit. --- ChangeLog | 5 +++++ lib/dfa.c | 49 ++++++++++++++++--------------------------------- 2 files changed, 21 insertions(+), 33 deletions(-) diff --git a/ChangeLog b/ChangeLog index 64926503a..bad8eda6d 100644 --- a/ChangeLog +++ b/ChangeLog @@ -1,6 +1,11 @@ 2018-09-18 Paul Eggert + dfa: prune states as we go + * lib/dfa.c (prune): Remove. dfa: reorder enum for efficiency + (merge_nfa_state): Prune as we go instead of at the end. + Prefer ptrdiff_t for indexes, as this helps the compiler a bit. + * lib/dfa.c (END): Now -1 again. Reorder other elements of the enumeration to make it easier for GCC to generate efficient code by using fewer comparisons to check for diff --git a/lib/dfa.c b/lib/dfa.c index 849645154..73bd6ca09 100644 --- a/lib/dfa.c +++ b/lib/dfa.c @@ -2129,22 +2129,6 @@ delete (size_t del, position_set *s) return 0; } -static void -prune (position_set *s) -{ - size_t d = 0; - - for (size_t i = 0; i < s->nelem; i++) - { - if (s->elems[i].constraint == 0) - continue; - - s->elems[d++] = s->elems[i]; - } - - s->nelem = d; -} - /* Replace a position with the followed set. */ static void replace (position_set *dst, size_t del, position_set *add, @@ -2362,17 +2346,20 @@ static size_t merge_nfa_state (struct dfa *d, size_t *queue, size_t nqueue, size_t tindex, int *flags, position_set *merged) { - size_t sindex; - size_t dindex; + position_set *follows = d->follows; + ptrdiff_t nelem = 0; - for (size_t i = 0; i < d->follows[tindex].nelem; i++) + for (ptrdiff_t i = 0; i < follows[tindex].nelem; i++) { - dindex = d->follows[tindex].elems[i].index; + size_t dindex = follows[tindex].elems[i].index; /* Skip the node as pruned in future. */ - if (d->follows[tindex].elems[i].constraint == 0) + unsigned int iconstraint = follows[tindex].elems[i].constraint; + if (iconstraint == 0) continue; + follows[tindex].elems[nelem++] = follows[tindex].elems[i]; + if (tindex < dindex && !(flags[dindex] & OPT_QUEUED)) { queue[nqueue++] = dindex; @@ -2382,11 +2369,11 @@ merge_nfa_state (struct dfa *d, size_t *queue, size_t nqueue, size_t tindex, if (flags[dindex] & (OPT_LPAREN | OPT_RPAREN)) continue; - for (size_t j = i + 1; j < d->follows[tindex].nelem; j++) + for (size_t j = i + 1; j < follows[tindex].nelem; j++) { - sindex = d->follows[tindex].elems[j].index; + size_t sindex = follows[tindex].elems[j].index; - if (d->follows[tindex].elems[j].constraint == 0) + if (follows[tindex].elems[j].constraint != iconstraint) continue; if (flags[sindex] & (OPT_LPAREN | OPT_RPAREN)) @@ -2395,25 +2382,21 @@ merge_nfa_state (struct dfa *d, size_t *queue, size_t nqueue, size_t tindex, if (d->tokens[sindex] != d->tokens[dindex]) continue; - if (d->follows[tindex].elems[i].constraint != - d->follows[tindex].elems[j].constraint) - continue; - if ((flags[sindex] ^ flags[dindex]) & OPT_REPEAT) continue; if (flags[sindex] & OPT_REPEAT) - delete (sindex, &d->follows[sindex]); + delete (sindex, &follows[sindex]); - merge (&d->follows[dindex], &d->follows[sindex], merged); - copy (merged, &d->follows[dindex]); + merge (&follows[dindex], &follows[sindex], merged); + copy (merged, &follows[dindex]); /* Mark it as pruned in future. */ - d->follows[tindex].elems[j].constraint = 0; + follows[tindex].elems[j].constraint = 0; } } - prune (&d->follows[tindex]); + follows[tindex].nelem = nelem; return nqueue; } -- 2.17.1 --------------31F832DF804FDC70D4569485 Content-Type: text/x-patch; name="0003-dfa-tweak-allocation-performance.patch" Content-Disposition: attachment; filename="0003-dfa-tweak-allocation-performance.patch" Content-Transfer-Encoding: quoted-printable >From 28d7f171a7ac284c2377793559d55e887610fcc8 Mon Sep 17 00:00:00 2001 From: Paul Eggert Date: Tue, 18 Sep 2018 19:05:26 -0700 Subject: [PATCH 3/4] dfa: tweak allocation performance MIME-Version: 1.0 Content-Type: text/plain; charset=3DUTF-8 Content-Transfer-Encoding: 8bit * lib/dfa.c (merge_nfa_state, dfaoptimize): Prefer ptrdiff_t for indexes some more. Use char for flags, as it=E2=80=99s wide enough. Allocate queue and flags together, with one malloc call. No need to use xnmalloc since the multiplication and addition cannot overflow (it=E2=80=99s already been checked by earlier allocation). Prefer memset to open-coding. --- ChangeLog | 9 +++++++++ lib/dfa.c | 34 ++++++++++++++++------------------ 2 files changed, 25 insertions(+), 18 deletions(-) diff --git a/ChangeLog b/ChangeLog index bad8eda6d..9c4b12c60 100644 --- a/ChangeLog +++ b/ChangeLog @@ -1,5 +1,14 @@ 2018-09-18 Paul Eggert =20 + dfa: tweak allocation performance + * lib/dfa.c (merge_nfa_state, dfaoptimize): + Prefer ptrdiff_t for indexes some more. + Use char for flags, as it=E2=80=99s wide enough. + Allocate queue and flags together, with one malloc call. + No need to use xnmalloc since the multiplication and + addition cannot overflow (it=E2=80=99s already been checked by + earlier allocation). Prefer memset to open-coding. + dfa: prune states as we go * lib/dfa.c (prune): Remove. dfa: reorder enum for efficiency diff --git a/lib/dfa.c b/lib/dfa.c index 73bd6ca09..59e15195a 100644 --- a/lib/dfa.c +++ b/lib/dfa.c @@ -2342,9 +2342,9 @@ enum OPT_QUEUED =3D (1 << 4) }; =20 -static size_t -merge_nfa_state (struct dfa *d, size_t *queue, size_t nqueue, size_t tin= dex, - int *flags, position_set *merged) +static ptrdiff_t +merge_nfa_state (struct dfa *d, size_t *queue, ptrdiff_t nqueue, size_t = tindex, + char *flags, position_set *merged) { position_set *follows =3D d->follows; ptrdiff_t nelem =3D 0; @@ -2369,7 +2369,7 @@ merge_nfa_state (struct dfa *d, size_t *queue, size= _t nqueue, size_t tindex, if (flags[dindex] & (OPT_LPAREN | OPT_RPAREN)) continue; =20 - for (size_t j =3D i + 1; j < follows[tindex].nelem; j++) + for (ptrdiff_t j =3D i + 1; j < follows[tindex].nelem; j++) { size_t sindex =3D follows[tindex].elems[j].index; =20 @@ -2404,28 +2404,27 @@ merge_nfa_state (struct dfa *d, size_t *queue, si= ze_t nqueue, size_t tindex, static void dfaoptimize (struct dfa *d) { - int *flags; + char *flags; size_t *queue; - size_t nqueue; + ptrdiff_t nqueue; position_set merged0; position_set *merged; + ptrdiff_t extra_space =3D d->tindex + sizeof *queue - 1; + extra_space -=3D extra_space % sizeof *queue; =20 - flags =3D xnmalloc (d->tindex, sizeof *flags); - queue =3D xnmalloc (d->nleaves, sizeof *queue); - - for (size_t i =3D 0; i < d->tindex; ++i) - flags[i] =3D 0; + queue =3D xmalloc (d->nleaves * sizeof *queue + extra_space); + flags =3D (char *) (queue + d->nleaves); + memset (flags, 0, d->tindex * sizeof *flags); =20 - for (size_t i =3D 0; i < d->tindex; ++i) + for (size_t i =3D 0; i < d->tindex; i++) { - for (size_t j =3D 0; j < d->follows[i].nelem; j++) + for (ptrdiff_t j =3D 0; j < d->follows[i].nelem; j++) { if (d->follows[i].elems[j].index =3D=3D i) flags[d->follows[i].elems[j].index] |=3D OPT_REPEAT; else if (d->follows[i].elems[j].index < i) flags[d->follows[i].elems[j].index] |=3D OPT_LPAREN; - else if (flags[d->follows[i].elems[j].index] &=3D - OPT_WALKED) + else if (flags[d->follows[i].elems[j].index] &=3D OPT_WALKED) flags[d->follows[i].elems[j].index] |=3D OPT_RPAREN; else flags[d->follows[i].elems[j].index] |=3D OPT_WALKED; @@ -2438,12 +2437,11 @@ dfaoptimize (struct dfa *d) nqueue =3D 0; queue[nqueue++] =3D 0; =20 - for (size_t i =3D 0; i < nqueue; i++) + for (ptrdiff_t i =3D 0; i < nqueue; i++) nqueue =3D merge_nfa_state (d, queue, nqueue, queue[i], flags, merge= d); =20 free (merged->elems); free (queue); - free (flags); } =20 /* Perform bottom-up analysis on the parse tree, computing various funct= ions. @@ -3921,7 +3919,7 @@ dfamust (struct dfa const *d) bool need_endline =3D false; bool case_fold_unibyte =3D d->syntax.case_fold && MB_CUR_MAX =3D=3D 1; =20 - for (size_t ri =3D 1; ri < d->tindex - 1; ++ri) + for (size_t ri =3D 1; ri + 1 < d->tindex; ri++) { token t =3D d->tokens[ri]; switch (t) --=20 2.17.1 --------------31F832DF804FDC70D4569485 Content-Type: text/x-patch; name="0004-dfa-use-more-informative-function-name.patch" Content-Transfer-Encoding: 7bit Content-Disposition: attachment; filename="0004-dfa-use-more-informative-function-name.patch" >From bc7c0f010b3c6c4214e026bf678a62ef92f61a4f Mon Sep 17 00:00:00 2001 From: Paul Eggert Date: Tue, 18 Sep 2018 19:12:06 -0700 Subject: [PATCH 4/4] dfa: use more-informative function name * lib/dfa.c (maybe_disable_superset_dfa): Rename from dfautf8noss. Use change. --- ChangeLog | 4 ++++ lib/dfa.c | 6 ++++-- 2 files changed, 8 insertions(+), 2 deletions(-) diff --git a/ChangeLog b/ChangeLog index 9c4b12c60..fa7a4557d 100644 --- a/ChangeLog +++ b/ChangeLog @@ -1,5 +1,9 @@ 2018-09-18 Paul Eggert + dfa: use more-informative function name + * lib/dfa.c (maybe_disable_superset_dfa): + Rename from dfautf8noss. Use change. + dfa: tweak allocation performance * lib/dfa.c (merge_nfa_state, dfaoptimize): Prefer ptrdiff_t for indexes some more. diff --git a/lib/dfa.c b/lib/dfa.c index 59e15195a..760e060c3 100644 --- a/lib/dfa.c +++ b/lib/dfa.c @@ -3483,8 +3483,10 @@ dfa_supported (struct dfa const *d) return true; } +/* Disable use of the superset DFA is it is not likely to help + performance. */ static void -dfautf8noss (struct dfa *d) +maybe_disable_superset_dfa (struct dfa *d) { if (!d->localeinfo.using_utf8) return; @@ -3612,7 +3614,7 @@ dfacomp (char const *s, size_t len, struct dfa *d, bool searchflag) if (dfa_supported (d)) { - dfautf8noss (d); + maybe_disable_superset_dfa (d); dfaanalyze (d, searchflag); } else -- 2.17.1 --------------31F832DF804FDC70D4569485-- ------------=_1537330382-2711-1 Content-Type: message/rfc822 Content-Disposition: inline Content-Transfer-Encoding: 7bit Received: (at submit) by debbugs.gnu.org; 17 Sep 2018 14:15:20 +0000 Received: from localhost ([127.0.0.1]:43225 helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1g1uJ3-0005MF-WB for submit@debbugs.gnu.org; Mon, 17 Sep 2018 10:15:20 -0400 Received: from eggs.gnu.org ([208.118.235.92]:42383) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1g1uJ2-0005M3-Hk for submit@debbugs.gnu.org; Mon, 17 Sep 2018 10:15:17 -0400 Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71) (envelope-from ) id 1g1uIs-00036x-PT for submit@debbugs.gnu.org; Mon, 17 Sep 2018 10:15:11 -0400 X-Spam-Checker-Version: SpamAssassin 3.3.2 (2011-06-06) on eggs.gnu.org X-Spam-Level: X-Spam-Status: No, score=0.8 required=5.0 tests=BAYES_50 autolearn=disabled version=3.3.2 Received: from lists.gnu.org ([2001:4830:134:3::11]:57846) by eggs.gnu.org with esmtps (TLS1.0:RSA_AES_256_CBC_SHA1:32) (Exim 4.71) (envelope-from ) id 1g1uIs-00036c-6s for submit@debbugs.gnu.org; Mon, 17 Sep 2018 10:15:06 -0400 Received: from eggs.gnu.org ([2001:4830:134:3::10]:35088) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1g1uIq-0001P6-MD for bug-grep@gnu.org; Mon, 17 Sep 2018 10:15:06 -0400 Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71) (envelope-from ) id 1g1uIn-00033d-D0 for bug-grep@gnu.org; Mon, 17 Sep 2018 10:15:04 -0400 Received: from mailgw05.kcn.ne.jp ([61.86.7.212]:50509) by eggs.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1g1uIm-0002yV-Jd for bug-grep@gnu.org; Mon, 17 Sep 2018 10:15:01 -0400 Received: from mxs02-s (mailgw2.kcn.ne.jp [61.86.15.234]) by mailgw05.kcn.ne.jp (Postfix) with ESMTP id 1B7E1880B7E for ; Mon, 17 Sep 2018 23:14:55 +0900 (JST) X-matriXscan-loop-detect: 9212771b8289570d209176f874d7fae50b1744e5 Received: from mail07.kcn.ne.jp ([61.86.6.186]) by mxs02-s with ESMTP; Mon, 17 Sep 2018 23:14:53 +0900 (JST) Received: from [10.120.1.63] (i118-21-128-66.s30.a048.ap.plala.or.jp [118.21.128.66]) by mail07.kcn.ne.jp (Postfix) with ESMTPA id 97263D50099 for ; Mon, 17 Sep 2018 23:14:53 +0900 (JST) Date: Mon, 17 Sep 2018 23:14:52 +0900 From: Norihiro Tanaka To: Subject: [PATCH 2/2] dfa: optmization of alternation in NFA Message-Id: <20180917231311.1BE7.27F6AC2D@kcn.ne.jp> MIME-Version: 1.0 Content-Type: multipart/mixed; boundary="------_58832584000000008786_MULTIPART_MIXED_" Content-Transfer-Encoding: 7bit X-Mailer: Becky! ver. 2.73 [ja] X-matriXscan-Sophos-AV: Clean X-matriXscan-Action: Approve X-matriXscan: Uncategorized X-detected-operating-system: by eggs.gnu.org: GNU/Linux 3.x [fuzzy] X-detected-operating-system: by eggs.gnu.org: GNU/Linux 2.6.x X-Received-From: 2001:4830:134:3::11 X-Spam-Score: -4.0 (----) X-Debbugs-Envelope-To: submit X-BeenThere: debbugs-submit@debbugs.gnu.org X-Mailman-Version: 2.1.18 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: debbugs-submit-bounces@debbugs.gnu.org Sender: "Debbugs-submit" X-Spam-Score: -5.0 (-----) --------_58832584000000008786_MULTIPART_MIXED_ Content-Type: text/plain; charset="US-ASCII" Content-Transfer-Encoding: 7bit Hi, Even when similer states exists in alternation, DFA treats them as separate items. It may complicate the transition in NFA and cause slowdown. This change assembles the states into one. For example, ab|ac is changed into a(b|c). This change speeds-up matching for many branched pattern. For example, grep speeds-up more than 30x in following case. seq 10000 | sed 's/$/ abcdefghijklmnopqrstuvwxyz/; s/$/./' >in time -p env LC_ALL=C grep -vf in in (before) real 63.43 user 61.67 system 1.65 (after) real 1.64 user 1.61 system 0.03 # If we do not add '.' at last in pattern, not dfa but kwset is used. grep also speeds-up about 3x in following case. time -p env LC_ALL=C grep -vf /usr/share/dict/linux.words /usr/share/dict/linux.words (before) real 2.48 user 2.09 system 0.38 (after) real 7.69 user 6.32 system 1.29 Thanks, Norihiro --------_58832584000000008786_MULTIPART_MIXED_ Content-Type: text/plain; charset="US-ASCII"; name="0001-dfa-simplify-initial-state.patch" Content-Disposition: attachment; filename="0001-dfa-simplify-initial-state.patch" Content-Transfer-Encoding: base64 RnJvbSAzMTkzMTkxNzMwZDZlY2IzYTBjNGUzOGI0NjE0ODRkZWFmODE5Zjg3IE1vbiBTZXAgMTcg MDA6MDA6MDAgMjAwMQpGcm9tOiBOb3JpaGlybyBUYW5ha2EgPG5vcml0bmtAa2NuLm5lLmpwPgpE YXRlOiBNb24sIDE3IFNlcCAyMDE4IDIyOjIwOjM3ICswOTAwClN1YmplY3Q6IFtQQVRDSCAxLzJd IGRmYTogc2ltcGxpZnkgaW5pdGlhbCBzdGF0ZQoKVG8gc2ltcGxpZnkgaW5pdGlhbCBzdGF0ZSBl bmFibGVzIHRvIGJlIGVhc3kgdG8gb3B0aW1pemF0aW9uIG9mIE5GQS4KCmRmYS5jIChlbnVtIHRv a2VuKTogQWRkIG5ldyBlbGVtZW50IEJFRy4KKHBydG9rKTogQWRqdXN0IGR1ZSB0byBhZGRpbmcg ZWxlbWVudCBCRUcuCihkZmFwYXJzZSk6IFB1dCBCRUcgYXQgYSBoZWFkIG9mIHRva2Vucy4KKHN0 YXRlX2luZGV4KTogQWRqdXN0IGR1ZSB0byBhZGRpbmcgZWxlbWVudCBCRUcuCihkZmFhbmFseXpl KTogQ29uY2F0ZW5hdGUgQkVHIHRvIG90aGVyIHRva2VucywgYW5kIHNpbXBsaWZ5IHRvIGJ1aWxk IGluaXRpYWwKc3RhdGUuCihkZmFtdXN0KTogQWRqdXN0IGR1ZSB0byBhZGRpbmcgZWxlbWVudCBC RUcuICBERkFNVVNUIGlnbm9yZXMgaXQuCi0tLQogbGliL2RmYS5jIHwgICAzNyArKysrKysrKysr KysrKysrKysrKystLS0tLS0tLS0tLS0tLS0tCiAxIGZpbGVzIGNoYW5nZWQsIDIxIGluc2VydGlv bnMoKyksIDE2IGRlbGV0aW9ucygtKQoKZGlmZiAtLWdpdCBhL2xpYi9kZmEuYyBiL2xpYi9kZmEu YwppbmRleCBlMzMwODRkLi5jMGIyNGZjIDEwMDY0NAotLS0gYS9saWIvZGZhLmMKKysrIGIvbGli L2RmYS5jCkBAIC0yMjMsMTIgKzIyMywxNSBAQCB0eXBlZGVmIHB0cmRpZmZfdCBzdGF0ZV9udW07 CiAvKiBQcmVkZWZpbmVkIHRva2VuIHZhbHVlcy4gICovCiBlbnVtCiB7Ci0gIEVORCA9IC0xLCAg ICAgICAgICAgICAgICAgICAgIC8qIEVORCBpcyBhIHRlcm1pbmFsIHN5bWJvbCB0aGF0IG1hdGNo ZXMgdGhlCisgIEVORCA9IC0yLCAgICAgICAgICAgICAgICAgICAgIC8qIEVORCBpcyBhIHRlcm1p bmFsIHN5bWJvbCB0aGF0IG1hdGNoZXMgdGhlCiAgICAgICAgICAgICAgICAgICAgICAgICAgICAg ICAgICAgIGVuZCBvZiBpbnB1dDsgYW55IHZhbHVlIG9mIEVORCBvciBsZXNzIGluCiAgICAgICAg ICAgICAgICAgICAgICAgICAgICAgICAgICAgIHRoZSBwYXJzZSB0cmVlIGlzIHN1Y2ggYSBzeW1i b2wuICBBY2NlcHRpbmcKICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgc3RhdGVz IG9mIHRoZSBERkEgYXJlIHRob3NlIHRoYXQgd291bGQgaGF2ZQogICAgICAgICAgICAgICAgICAg ICAgICAgICAgICAgICAgICBhIHRyYW5zaXRpb24gb24gRU5ELiAgKi8KIAorICBCRUcgPSAtMSwg ICAgICAgICAgICAgICAgICAgICAvKiBCRUcgaXMgYSBiZWdpbm5pbmcgc3ltYm9sIHRoYXQgbWF0 Y2hlcyB0aGUKKyAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgYmlnaW5uaW5nIG9m IGlucHV0LiAgKi8KKwogICAvKiBPcmRpbmFyeSBjaGFyYWN0ZXIgdmFsdWVzIGFyZSB0ZXJtaW5h bCBzeW1ib2xzIHRoYXQgbWF0Y2ggdGhlbXNlbHZlcy4gICovCiAKICAgRU1QVFkgPSBOT1RDSEFS LCAgICAgICAgICAgICAgLyogRU1QVFkgaXMgYSB0ZXJtaW5hbCBzeW1ib2wgdGhhdCBtYXRjaGVz CkBAIC02MzAsOSArNjMzLDkgQEAgbWJzX3RvX3djaGFyICh3aW50X3QgKnB3YywgY2hhciBjb25z dCAqcywgc2l6ZV90IG4sIHN0cnVjdCBkZmEgKmQpCiBzdGF0aWMgdm9pZAogcHJ0b2sgKHRva2Vu IHQpCiB7Ci0gIGlmICh0IDwgMCkKKyAgaWYgKHQgPD0gRU5EKQogICAgIGZwcmludGYgKHN0ZGVy ciwgIkVORCIpOwotICBlbHNlIGlmICh0IDwgTk9UQ0hBUikKKyAgZWxzZSBpZiAoMCA8PSB0ICYm IHQgPCBOT1RDSEFSKQogICAgIHsKICAgICAgIHVuc2lnbmVkIGludCBjaCA9IHQ7CiAgICAgICBm cHJpbnRmIChzdGRlcnIsICIweCUwMngiLCBjaCk7CkBAIC02NDIsNiArNjQ1LDkgQEAgcHJ0b2sg KHRva2VuIHQpCiAgICAgICBjaGFyIGNvbnN0ICpzOwogICAgICAgc3dpdGNoICh0KQogICAgICAg ICB7CisgICAgICAgIGNhc2UgQkVHOgorICAgICAgICAgIHMgPSAiQkVHIjsKKyAgICAgICAgICBi cmVhazsKICAgICAgICAgY2FzZSBFTVBUWToKICAgICAgICAgICBzID0gIkVNUFRZIjsKICAgICAg ICAgICBicmVhazsKQEAgLTE5NjcsNiArMTk3Myw5IEBAIGRmYXBhcnNlIChjaGFyIGNvbnN0ICpz LCBzaXplX3QgbGVuLCBzdHJ1Y3QgZGZhICpkKQogICBpZiAoIWQtPnN5bnRheC5zeW50YXhfYml0 c19zZXQpCiAgICAgZGZhZXJyb3IgKF8oIm5vIHN5bnRheCBzcGVjaWZpZWQiKSk7CiAKKyAgaWYg KCFkLT5ucmVnZXhwcykKKyAgICBhZGR0b2sgKGQsIEJFRyk7CisKICAgZC0+cGFyc2UudG9rID0g bGV4IChkKTsKICAgZC0+cGFyc2UuZGVwdGggPSBkLT5kZXB0aDsKIApAQCAtMjE4MCw3ICsyMTg5 LDcgQEAgc3RhdGVfaW5kZXggKHN0cnVjdCBkZmEgKmQsIHBvc2l0aW9uX3NldCBjb25zdCAqcywg aW50IGNvbnRleHQpCiAgIGZvciAoc3RhdGVfbnVtIGogPSAwOyBqIDwgcy0+bmVsZW07IGorKykK ICAgICB7CiAgICAgICBpbnQgYyA9IHMtPmVsZW1zW2pdLmNvbnN0cmFpbnQ7Ci0gICAgICBpZiAo ZC0+dG9rZW5zW3MtPmVsZW1zW2pdLmluZGV4XSA8IDApCisgICAgICBpZiAoZC0+dG9rZW5zW3Mt PmVsZW1zW2pdLmluZGV4XSA8PSBFTkQpCiAgICAgICAgIHsKICAgICAgICAgICBpZiAoc3VjY2Vl ZHNfaW5fY29udGV4dCAoYywgY29udGV4dCwgQ1RYX0FOWSkpCiAgICAgICAgICAgICBjb25zdHJh aW50IHw9IGM7CkBAIC0yMzgwLDYgKzIzODksOCBAQCBkZmFhbmFseXplIChzdHJ1Y3QgZGZhICpk LCBib29sIHNlYXJjaGZsYWcpCiAKICAgcG9zaXRpb25fc2V0IG1lcmdlZDsgICAgICAgICAgLyog UmVzdWx0IG9mIG1lcmdpbmcgc2V0cy4gICovCiAKKyAgYWRkdG9rIChkLCBDQVQpOworCiAjaWZk ZWYgREVCVUcKICAgZnByaW50ZiAoc3RkZXJyLCAiZGZhYW5hbHl6ZTpcbiIpOwogICBmb3IgKHNp emVfdCBpID0gMDsgaSA8IGQtPnRpbmRleDsgKytpKQpAQCAtMjU0MCwyNiArMjU1MSwyMCBAQCBk ZmFhbmFseXplIChzdHJ1Y3QgZGZhICpkLCBib29sIHNlYXJjaGZsYWcpCiAgICAgICB9CiAjZW5k aWYKIAotICAvKiBHZXQgdGhlIGVwc2lsb24gY2xvc3VyZSBvZiB0aGUgZmlyc3Rwb3Mgb2YgdGhl IHJlZ2V4cC4gIFRoZSByZXN1bHQgd2lsbAotICAgICBiZSB0aGUgc2V0IG9mIHBvc2l0aW9ucyBv ZiBzdGF0ZSAwLiAgKi8KLSAgbWVyZ2VkLm5lbGVtID0gMDsKLSAgZm9yIChzaXplX3QgaSA9IDA7 IGkgPCBzdGtbLTFdLm5maXJzdHBvczsgKytpKQotICAgIGluc2VydCAoZmlyc3Rwb3NbaV0sICZt ZXJnZWQpOwotCiAgIC8qIEZvciBlYWNoIGZvbGxvdyBzZXQgdGhhdCBpcyB0aGUgZm9sbG93IHNl dCBvZiBhIHJlYWwgcG9zaXRpb24sIHJlcGxhY2UKICAgICAgaXQgd2l0aCBpdHMgZXBzaWxvbiBj bG9zdXJlLiAgKi8KLSAgZXBzY2xvc3VyZSAoJm1lcmdlZCwgZCk7CisgIGVwc2Nsb3N1cmUgKCZk LT5mb2xsb3dzWzBdLCBkKTsKIAogICAvKiBDb250ZXh0IHdhbnRlZCBieSBzb21lIHBvc2l0aW9u LiAgKi8KLSAgaW50IHNlcGFyYXRlX2NvbnRleHRzID0gc3RhdGVfc2VwYXJhdGVfY29udGV4dHMg KCZtZXJnZWQpOworICBpbnQgc2VwYXJhdGVfY29udGV4dHMgPSBzdGF0ZV9zZXBhcmF0ZV9jb250 ZXh0cyAoJmQtPmZvbGxvd3NbMF0pOwogCiAgIC8qIEJ1aWxkIHRoZSBpbml0aWFsIHN0YXRlLiAg Ki8KICAgaWYgKHNlcGFyYXRlX2NvbnRleHRzICYgQ1RYX05FV0xJTkUpCi0gICAgc3RhdGVfaW5k ZXggKGQsICZtZXJnZWQsIENUWF9ORVdMSU5FKTsKKyAgICBzdGF0ZV9pbmRleCAoZCwgJmQtPmZv bGxvd3NbMF0sIENUWF9ORVdMSU5FKTsKICAgZC0+aW5pdHN0YXRlX25vdGJvbCA9IGQtPm1pbl90 cmNvdW50Ci0gICAgPSBzdGF0ZV9pbmRleCAoZCwgJm1lcmdlZCwgc2VwYXJhdGVfY29udGV4dHMg XiBDVFhfQU5ZKTsKKyAgICA9IHN0YXRlX2luZGV4IChkLCAmZC0+Zm9sbG93c1swXSwgc2VwYXJh dGVfY29udGV4dHMgXiBDVFhfQU5ZKTsKICAgaWYgKHNlcGFyYXRlX2NvbnRleHRzICYgQ1RYX0xF VFRFUikKLSAgICBkLT5taW5fdHJjb3VudCA9IHN0YXRlX2luZGV4IChkLCAmbWVyZ2VkLCBDVFhf TEVUVEVSKTsKKyAgICBkLT5taW5fdHJjb3VudCA9IHN0YXRlX2luZGV4IChkLCAmZC0+Zm9sbG93 c1swXSwgQ1RYX0xFVFRFUik7CiAgIGQtPm1pbl90cmNvdW50Kys7CiAgIGQtPnRyY291bnQgPSAw OwogCkBAIC0zNzgzLDcgKzM3ODgsNyBAQCBkZmFtdXN0IChzdHJ1Y3QgZGZhIGNvbnN0ICpkKQog ICBib29sIG5lZWRfZW5kbGluZSA9IGZhbHNlOwogICBib29sIGNhc2VfZm9sZF91bmlieXRlID0g ZC0+c3ludGF4LmNhc2VfZm9sZCAmJiBNQl9DVVJfTUFYID09IDE7CiAKLSAgZm9yIChzaXplX3Qg cmkgPSAwOyByaSA8IGQtPnRpbmRleDsgKytyaSkKKyAgZm9yIChzaXplX3QgcmkgPSAxOyByaSA8 IGQtPnRpbmRleCAtIDE7ICsrcmkpCiAgICAgewogICAgICAgdG9rZW4gdCA9IGQtPnRva2Vuc1ty aV07CiAgICAgICBzd2l0Y2ggKHQpCi0tIAoxLjcuMQoK --------_58832584000000008786_MULTIPART_MIXED_ Content-Type: text/plain; charset="US-ASCII"; name="0002-dfa-optmization-of-alternation-in-NFA.patch" Content-Disposition: attachment; filename="0002-dfa-optmization-of-alternation-in-NFA.patch" Content-Transfer-Encoding: base64 RnJvbSA0NzI4Mjg2MzI1ODM0ZGQwMjAyNmJmMTIzNGM5ZDQ4MWE1ZGU3ZWU1IE1vbiBTZXAgMTcg MDA6MDA6MDAgMjAwMQpGcm9tOiBOb3JpaGlybyBUYW5ha2EgPG5vcml0bmtAa2NuLm5lLmpwPgpE YXRlOiBNb24sIDE3IFNlcCAyMDE4IDIyOjQ2OjI1ICswOTAwClN1YmplY3Q6IFtQQVRDSCAyLzJd IGRmYTogb3B0bWl6YXRpb24gb2YgYWx0ZXJuYXRpb24gaW4gTkZBCgpFdmVuIHdoZW4gc2ltaWxl ciBzdGF0ZXMgZXhpc3RzIGluIGFsdGVybmF0aW9uLCBERkEgdHJlYXRzIHRoZW0gYXMKc2VwYXJh dGUgaXRlbXMuICBJdCBtYXkgY29tcGxpY2F0ZSB0aGUgdHJhbnNpdGlvbiBpbiBORkEgYW5kIGNh dXNlCnNsb3dkb3duLiAgVGhpcyBjaGFuZ2UgYXNzZW1ibGVzIHRoZSBzdGF0ZXMgaW50byBvbmUu ICBGb3IgZXhhbXBsZSwKYWJ8YWMgaXMgY2hhbmdlZCBpbnRvIGEoYnxjKS4gIFRoaXMgY2hhbmdl IHNwZWVkcy11cCBtYXRjaGluZyBmb3IKbWFueSBicmFuY2hlZCBwYXR0ZXJuLiAgRm9yIGV4YW1w bGUsIGdyZXAgc3BlZWRzLXVwIG1vcmUgdGhhbiAzMHgKaW4gZm9sbG93aW5nIGNhc2UuCgogIHNl cSAxMDAwMCB8IHNlZCAncy8kLyBhYmNkZWZnaGlqa2xtbm9wcXJzdHV2d3h5ei87IHMvJC8uLycg PmluCiAgdGltZSAtcCBlbnYgTENfQUxMPUMgZ3JlcCAtdmYgaW4gaW4KCmRmYS5jIChwcnVuZSk6 IE5ldyBmdW5jdGlvbi4KKG1lcmdlX25mYV9zdGF0ZSk6IE5ldyBmdW5jdGlvbi4gIEl0IG1lcmdl cyBzYW1lIE5GQSBzdGF0ZXMuCihkZmFvcHRpbWl6ZSk6IE5ldyBmdW5jdGlvbi4gIEl0IHNlZWtz IG1lcmdlZCBhbmQgcmVtb3ZlZCBub2Rlcy4KKGRmYWFuYWx5emUpOiBDYWxsIG5ldyBmdW5jdGlv bi4KKGRmYXV0Zjhub3NzKTogQ2hhbmdlIG5hbWUgZnJvbSBkZmFvcHRpbWl6ZSBiZWNhdXNlIG9m IGFkZGl0aW9uIG9mIG5ldwpmdW5jdGlvbi4KKGRmYWNvbXApOiBVcGRhdGUgY2FsbGVyLgotLS0K IGxpYi9kZmEuYyB8ICAxNDYgKysrKysrKysrKysrKysrKysrKysrKysrKysrKysrKysrKysrKysr KysrKysrKysrKysrKysrKysrKysrLQogMSBmaWxlcyBjaGFuZ2VkLCAxNDQgaW5zZXJ0aW9ucygr KSwgMiBkZWxldGlvbnMoLSkKCmRpZmYgLS1naXQgYS9saWIvZGZhLmMgYi9saWIvZGZhLmMKaW5k ZXggYzBiMjRmYy4uMzk5MTExMiAxMDA2NDQKLS0tIGEvbGliL2RmYS5jCisrKyBiL2xpYi9kZmEu YwpAQCAtMjEyMSw2ICsyMTIxLDIyIEBAIGRlbGV0ZSAoc2l6ZV90IGRlbCwgcG9zaXRpb25fc2V0 ICpzKQogICByZXR1cm4gMDsKIH0KIAorc3RhdGljIHZvaWQKK3BydW5lIChwb3NpdGlvbl9zZXQg KnMpCit7CisgIHNpemVfdCBkID0gMDsKKworICBmb3IgKHNpemVfdCBpID0gMDsgaSA8IHMtPm5l bGVtOyBpKyspCisgICAgeworICAgICAgaWYgKHMtPmVsZW1zW2ldLmNvbnN0cmFpbnQgPT0gMCkK KyAgICAgICAgY29udGludWU7CisKKyAgICAgIHMtPmVsZW1zW2QrK10gPSBzLT5lbGVtc1tpXTsK KyAgICB9CisKKyAgcy0+bmVsZW0gPSBkOworfQorCiAvKiBSZXBsYWNlIGEgcG9zaXRpb24gd2l0 aCB0aGUgZm9sbG93ZWQgc2V0LiAgKi8KIHN0YXRpYyB2b2lkCiByZXBsYWNlIChwb3NpdGlvbl9z ZXQgKmRzdCwgc2l6ZV90IGRlbCwgcG9zaXRpb25fc2V0ICphZGQsCkBAIC0yMzE0LDYgKzIzMzAs MTMwIEBAIHN0YXRlX3NlcGFyYXRlX2NvbnRleHRzIChwb3NpdGlvbl9zZXQgY29uc3QgKnMpCiAg IHJldHVybiBzZXBhcmF0ZV9jb250ZXh0czsKIH0KIAorZW51bQoreworICAvKiBTaW5nbGUgdG9r ZW4gaXMgcmVwZWF0ZWQuICBJdCBpcyBkaXN0aW5ndWlzaGVkIGZyb20gbm9uLXJlcGVhdGVkLiAg Ki8KKyAgT1BUX1JFUEVBVCA9ICgxIDw8IDApLAorCisgIC8qIE11bHRpcGxlIHRva2VucyBhcmUg cmVwZWF0ZWQuICBUaGlzIGZsYWcgaXMgb24gYXQgaGVhZCBvZiB0b2tlbnMuICBUaGUKKyAgICAg bm9kZSBpcyBub3QgbWVyZ2VkLiAgKi8KKyAgT1BUX0xQQVJFTiA9ICgxIDw8IDEpLAorCisgIC8q IE11bHRpcGxlIGJyYW5jaGVzIGFyZSBqb2luZWQuICBUaGUgbm9kZSBpcyBub3QgbWVyZ2VkLiAg Ki8KKyAgT1BUX1JQQVJFTiA9ICgxIDw8IDIpLAorCisgIC8qIFRoZSBub2RlIGlzIHdhbGtlZC4g IElmIHRoZSBub2RlIGlzIGZvdW5kIGluIHdhbGtpbmcgYWdhaW4sIE9QVF9SUEFSRU4KKyAgICAg ZmxhZyBpcyB0dXJuZWQgb24uICovCisgIE9QVF9XQUxLRUQgPSAoMSA8PCAzKSwKKworICAvKiBU aGUgbm9kZSBpcyBxdWV1ZWQuICBUaGUgbm9kZSBpcyBub3QgcXVldWVkIGFnYWluLiAgKi8KKyAg T1BUX1FVRVVFRCA9ICgxIDw8IDQpCit9OworCitzdGF0aWMgc2l6ZV90CittZXJnZV9uZmFfc3Rh dGUgKHN0cnVjdCBkZmEgKmQsIHNpemVfdCAqcXVldWUsIHNpemVfdCBucXVldWUsIHNpemVfdCB0 aW5kZXgsCisgICAgICAgICAgICAgICAgIGludCAqZmxhZ3MsIHBvc2l0aW9uX3NldCAqbWVyZ2Vk KQoreworICBzaXplX3Qgc2luZGV4OworICBzaXplX3QgZGluZGV4OworCisgIGZvciAoc2l6ZV90 IGkgPSAwOyBpIDwgZC0+Zm9sbG93c1t0aW5kZXhdLm5lbGVtOyBpKyspCisgICAgeworICAgICAg ZGluZGV4ID0gZC0+Zm9sbG93c1t0aW5kZXhdLmVsZW1zW2ldLmluZGV4OworCisgICAgICAvKiBT a2lwIHRoZSBub2RlIGFzIHBydW5lZCBpbiBmdXR1cmUuICAqLworICAgICAgaWYgKGQtPmZvbGxv d3NbdGluZGV4XS5lbGVtc1tpXS5jb25zdHJhaW50ID09IDApCisgICAgICAgIGNvbnRpbnVlOwor CisgICAgICBpZiAodGluZGV4IDwgZGluZGV4ICYmICEoZmxhZ3NbZGluZGV4XSAmIE9QVF9RVUVV RUQpKQorICAgICAgICB7CisgICAgICAgICAgcXVldWVbbnF1ZXVlKytdID0gZGluZGV4OworICAg ICAgICAgIGZsYWdzW2RpbmRleF0gfD0gT1BUX1FVRVVFRDsKKyAgICAgICAgfQorCisgICAgICBp ZiAoZmxhZ3NbZGluZGV4XSAmIChPUFRfTFBBUkVOIHwgT1BUX1JQQVJFTikpCisgICAgICAgIGNv bnRpbnVlOworCisgICAgICBmb3IgKHNpemVfdCBqID0gaSArIDE7IGogPCBkLT5mb2xsb3dzW3Rp bmRleF0ubmVsZW07IGorKykKKyAgICAgICAgeworICAgICAgICAgIHNpbmRleCA9IGQtPmZvbGxv d3NbdGluZGV4XS5lbGVtc1tqXS5pbmRleDsKKworICAgICAgICAgIGlmIChkLT5mb2xsb3dzW3Rp bmRleF0uZWxlbXNbal0uY29uc3RyYWludCA9PSAwKQorICAgICAgICAgICAgY29udGludWU7CisK KyAgICAgICAgICBpZiAoZmxhZ3Nbc2luZGV4XSAmIChPUFRfTFBBUkVOIHwgT1BUX1JQQVJFTikp CisgICAgICAgICAgICBjb250aW51ZTsKKworICAgICAgICAgIGlmIChkLT50b2tlbnNbc2luZGV4 XSAhPSBkLT50b2tlbnNbZGluZGV4XSkKKyAgICAgICAgICAgIGNvbnRpbnVlOworCisgICAgICAg ICAgaWYgKGQtPmZvbGxvd3NbdGluZGV4XS5lbGVtc1tpXS5jb25zdHJhaW50ICE9CisgICAgICAg ICAgICAgIGQtPmZvbGxvd3NbdGluZGV4XS5lbGVtc1tqXS5jb25zdHJhaW50KQorICAgICAgICAg ICAgY29udGludWU7CisKKyAgICAgICAgICBpZiAoKGZsYWdzW3NpbmRleF0gXiBmbGFnc1tkaW5k ZXhdKSAmIE9QVF9SRVBFQVQpCisgICAgICAgICAgICBjb250aW51ZTsKKworICAgICAgICAgIGlm IChmbGFnc1tzaW5kZXhdICYgT1BUX1JFUEVBVCkKKyAgICAgICAgICAgIGRlbGV0ZSAoc2luZGV4 LCAmZC0+Zm9sbG93c1tzaW5kZXhdKTsKKworICAgICAgICAgIG1lcmdlICgmZC0+Zm9sbG93c1tk aW5kZXhdLCAmZC0+Zm9sbG93c1tzaW5kZXhdLCBtZXJnZWQpOworICAgICAgICAgIGNvcHkgKG1l cmdlZCwgJmQtPmZvbGxvd3NbZGluZGV4XSk7CisKKyAgICAgICAgICAvKiBNYXJrIGl0IGFzIHBy dW5lZCBpbiBmdXR1cmUuICAqLworICAgICAgICAgIGQtPmZvbGxvd3NbdGluZGV4XS5lbGVtc1tq XS5jb25zdHJhaW50ID0gMDsKKyAgICAgICAgfQorICAgIH0KKworICBwcnVuZSAoJmQtPmZvbGxv d3NbdGluZGV4XSk7CisKKyAgcmV0dXJuIG5xdWV1ZTsKK30KKworc3RhdGljIHZvaWQKK2RmYW9w dGltaXplIChzdHJ1Y3QgZGZhICpkKQoreworICBpbnQgKmZsYWdzOworICBzaXplX3QgKnF1ZXVl OworICBzaXplX3QgbnF1ZXVlOworICBwb3NpdGlvbl9zZXQgbWVyZ2VkMDsKKyAgcG9zaXRpb25f c2V0ICptZXJnZWQ7CisKKyAgZmxhZ3MgPSB4bm1hbGxvYyAoZC0+dGluZGV4LCBzaXplb2YgKmZs YWdzKTsKKyAgcXVldWUgPSB4bm1hbGxvYyAoZC0+bmxlYXZlcywgc2l6ZW9mICpxdWV1ZSk7CisK KyAgZm9yIChzaXplX3QgaSA9IDA7IGkgPCBkLT50aW5kZXg7ICsraSkKKyAgICBmbGFnc1tpXSA9 IDA7CisKKyAgZm9yIChzaXplX3QgaSA9IDA7IGkgPCBkLT50aW5kZXg7ICsraSkKKyAgICB7Cisg ICAgICBmb3IgKHNpemVfdCBqID0gMDsgaiA8IGQtPmZvbGxvd3NbaV0ubmVsZW07IGorKykKKyAg ICAgICAgeworICAgICAgICAgIGlmIChkLT5mb2xsb3dzW2ldLmVsZW1zW2pdLmluZGV4ID09IGkp CisgICAgICAgICAgICBmbGFnc1tkLT5mb2xsb3dzW2ldLmVsZW1zW2pdLmluZGV4XSB8PSBPUFRf UkVQRUFUOworICAgICAgICAgIGVsc2UgaWYgKGQtPmZvbGxvd3NbaV0uZWxlbXNbal0uaW5kZXgg PCBpKQorICAgICAgICAgICAgZmxhZ3NbZC0+Zm9sbG93c1tpXS5lbGVtc1tqXS5pbmRleF0gfD0g T1BUX0xQQVJFTjsKKyAgICAgICAgICBlbHNlIGlmIChmbGFnc1tkLT5mb2xsb3dzW2ldLmVsZW1z W2pdLmluZGV4XSAmPQorICAgICAgICAgICAgICAgICAgIE9QVF9XQUxLRUQpCisgICAgICAgICAg ICBmbGFnc1tkLT5mb2xsb3dzW2ldLmVsZW1zW2pdLmluZGV4XSB8PSBPUFRfUlBBUkVOOworICAg ICAgICAgIGVsc2UKKyAgICAgICAgICAgIGZsYWdzW2QtPmZvbGxvd3NbaV0uZWxlbXNbal0uaW5k ZXhdIHw9IE9QVF9XQUxLRUQ7CisgICAgICAgIH0KKyAgICB9CisKKyAgbWVyZ2VkID0gJm1lcmdl ZDA7CisgIGFsbG9jX3Bvc2l0aW9uX3NldCAobWVyZ2VkLCBkLT5ubGVhdmVzKTsKKworICBucXVl dWUgPSAwOworICBxdWV1ZVtucXVldWUrK10gPSAwOworCisgIGZvciAoc2l6ZV90IGkgPSAwOyBp IDwgbnF1ZXVlOyBpKyspCisgICAgbnF1ZXVlID0gbWVyZ2VfbmZhX3N0YXRlIChkLCBxdWV1ZSwg bnF1ZXVlLCBxdWV1ZVtpXSwgZmxhZ3MsIG1lcmdlZCk7CisKKyAgZnJlZSAobWVyZ2VkLT5lbGVt cyk7CisgIGZyZWUgKHF1ZXVlKTsKKyAgZnJlZSAoZmxhZ3MpOworfQogCiAvKiBQZXJmb3JtIGJv dHRvbS11cCBhbmFseXNpcyBvbiB0aGUgcGFyc2UgdHJlZSwgY29tcHV0aW5nIHZhcmlvdXMgZnVu Y3Rpb25zLgogICAgTm90ZSB0aGF0IGF0IHRoaXMgcG9pbnQsIHdlJ3JlIHByZXRlbmRpbmcgY29u c3RydWN0cyBsaWtlIFw8IGFyZSByZWFsCkBAIC0yNTMzLDYgKzI2NzMsOCBAQCBkZmFhbmFseXpl IChzdHJ1Y3QgZGZhICpkLCBib29sIHNlYXJjaGZsYWcpCiAjZW5kaWYKICAgICB9CiAKKyAgZGZh b3B0aW1pemUgKGQpOworCiAjaWZkZWYgREVCVUcKICAgZm9yIChzaXplX3QgaSA9IDA7IGkgPCBk LT50aW5kZXg7ICsraSkKICAgICBpZiAoZC0+dG9rZW5zW2ldIDwgTk9UQ0hBUiB8fCBkLT50b2tl bnNbaV0gPT0gQkFDS1JFRgpAQCAtMzM1Myw3ICszNDk1LDcgQEAgZGZhX3N1cHBvcnRlZCAoc3Ry dWN0IGRmYSBjb25zdCAqZCkKIH0KIAogc3RhdGljIHZvaWQKLWRmYW9wdGltaXplIChzdHJ1Y3Qg ZGZhICpkKQorZGZhdXRmOG5vc3MgKHN0cnVjdCBkZmEgKmQpCiB7CiAgIGlmICghZC0+bG9jYWxl aW5mby51c2luZ191dGY4KQogICAgIHJldHVybjsKQEAgLTM0ODEsNyArMzYyMyw3IEBAIGRmYWNv bXAgKGNoYXIgY29uc3QgKnMsIHNpemVfdCBsZW4sIHN0cnVjdCBkZmEgKmQsIGJvb2wgc2VhcmNo ZmxhZykKIAogICBpZiAoZGZhX3N1cHBvcnRlZCAoZCkpCiAgICAgewotICAgICAgZGZhb3B0aW1p emUgKGQpOworICAgICAgZGZhdXRmOG5vc3MgKGQpOwogICAgICAgZGZhYW5hbHl6ZSAoZCwgc2Vh cmNoZmxhZyk7CiAgICAgfQogICBlbHNlCi0tIAoxLjcuMQoK --------_58832584000000008786_MULTIPART_MIXED_-- ------------=_1537330382-2711-1-- From unknown Tue Jun 24 20:52:07 2025 X-Loop: help-debbugs@gnu.org Subject: bug#32750: [PATCH 2/2] dfa: optmization of alternation in NFA Resent-From: Jim Meyering Original-Sender: "Debbugs-submit" Resent-CC: bug-grep@gnu.org Resent-Date: Wed, 19 Sep 2018 05:14:02 +0000 Resent-Message-ID: Resent-Sender: help-debbugs@gnu.org X-GNU-PR-Message: followup 32750 X-GNU-PR-Package: grep X-GNU-PR-Keywords: patch To: Norihiro Tanaka Cc: 32750@debbugs.gnu.org Received: via spool by 32750-submit@debbugs.gnu.org id=B32750.15373340389232 (code B ref 32750); Wed, 19 Sep 2018 05:14:02 +0000 Received: (at 32750) by debbugs.gnu.org; 19 Sep 2018 05:13:58 +0000 Received: from localhost ([127.0.0.1]:45018 helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1g2UoI-0002Oo-IV for submit@debbugs.gnu.org; Wed, 19 Sep 2018 01:13:58 -0400 Received: from mail-wm1-f46.google.com ([209.85.128.46]:35171) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1g2UoG-0002Oa-Gi for 32750@debbugs.gnu.org; Wed, 19 Sep 2018 01:13:57 -0400 Received: by mail-wm1-f46.google.com with SMTP id o18-v6so5265221wmc.0 for <32750@debbugs.gnu.org>; Tue, 18 Sep 2018 22:13:56 -0700 (PDT) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:mime-version:references:in-reply-to:from:date :message-id:subject:to:cc; bh=RWywzXoxRzFbT4qAIvxskPkIK+YLrmMVHWXS6N2GdlE=; b=UfFdVCOmGb/gkcneZXF0DYglLDDPyPo/cxzEk+/WdtDLbPJX0Pml01dvAk+isIEF1y 2TDPOYDUfF/d3EtINYjHDaZqOpvrq/hpZFr2v3SwpGqq+L3f/03MtOSA9+21XFzEGp1Y GKn6GaJNBlT5TkuN4jMY39hzwbLY5VDpgkUvXZVa7peRrqzXfWZ0xk8/2t4LexS6Xs9f U2p3UpbKXg8wc1gW3hMYA3r03ELpmu+AbLxX5m8fHgaKS8v3+x+kbMEMmMo3QZxAOQnx vBHU8ENpfai8/vIJmRxKeoj8oGG+d7EkefpqjMh/c8t5puM30LAA0dDsR5hUCckhyM7K BnGA== X-Gm-Message-State: APzg51CwRBYQAeYGDS/WGuDiwywiqbCOgwxk73kuy0l0+8jRafJLe8+h cZkXEVXcFJYVfbpvEobNv4xHGL35O1ZtEwnNG1o= X-Google-Smtp-Source: ANB0VdZhqShFDpbTMQqToY2ZgUNAsQcx6zrWpBMIbURX60pBfla+YrE4VcCbVpEGgVKMYevd/FlBsC2jEFOpTtpGvrE= X-Received: by 2002:a1c:f913:: with SMTP id x19-v6mr19248129wmh.63.1537334030815; Tue, 18 Sep 2018 22:13:50 -0700 (PDT) MIME-Version: 1.0 References: <20180917231311.1BE7.27F6AC2D@kcn.ne.jp> In-Reply-To: <20180917231311.1BE7.27F6AC2D@kcn.ne.jp> From: Jim Meyering Date: Tue, 18 Sep 2018 22:13:38 -0700 Message-ID: Content-Type: text/plain; charset="UTF-8" X-Spam-Score: 0.5 (/) X-BeenThere: debbugs-submit@debbugs.gnu.org X-Mailman-Version: 2.1.18 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: debbugs-submit-bounces@debbugs.gnu.org Sender: "Debbugs-submit" X-Spam-Score: -0.5 (/) On Mon, Sep 17, 2018 at 7:16 AM Norihiro Tanaka wrote: > Even when similer states exists in alternation, DFA treats them as > separate items. It may complicate the transition in NFA and cause > slowdown. This change assembles the states into one. > > For example, ab|ac is changed into a(b|c). > > This change speeds-up matching for many branched pattern. For > example, grep speeds-up more than 30x in following case. > > seq 10000 | sed 's/$/ abcdefghijklmnopqrstuvwxyz/; s/$/./' >in > time -p env LC_ALL=C grep -vf in in > > (before) real 63.43 user 61.67 system 1.65 > (after) real 1.64 user 1.61 system 0.03 > > # If we do not add '.' at last in pattern, not dfa but kwset is used. > > grep also speeds-up about 3x in following case. > > time -p env LC_ALL=C grep -vf /usr/share/dict/linux.words /usr/share/dict/linux.words > > (before) real 2.48 user 2.09 system 0.38 > (after) real 7.69 user 6.32 system 1.29 Thank you for the patches. However, the before/after numbers you show here suggest that the "after" code takes more than triple of the time of "before". Also, when I compared grep compiled at 123620af88f55c3e0cc9f0aed7311c72f625bc82 (latest, including your changes) and that compiled at the prior commit, 9c11510507ebcd31671f10d9b88532f8e6657ad2, I find that the new version takes over 30 seconds, while the prior one took about 20 seconds. FTR, I used gcc version 9.0.0 20180912, compiling with -O3. From unknown Tue Jun 24 20:52:07 2025 X-Loop: help-debbugs@gnu.org Subject: bug#32750: [PATCH 2/2] dfa: optmization of alternation in NFA Resent-From: Paul Eggert Original-Sender: "Debbugs-submit" Resent-CC: bug-grep@gnu.org Resent-Date: Wed, 19 Sep 2018 07:49:02 +0000 Resent-Message-ID: Resent-Sender: help-debbugs@gnu.org X-GNU-PR-Message: followup 32750 X-GNU-PR-Package: grep X-GNU-PR-Keywords: patch To: Jim Meyering , Norihiro Tanaka Cc: 32750@debbugs.gnu.org Received: via spool by 32750-submit@debbugs.gnu.org id=B32750.153734333225371 (code B ref 32750); Wed, 19 Sep 2018 07:49:02 +0000 Received: (at 32750) by debbugs.gnu.org; 19 Sep 2018 07:48:52 +0000 Received: from localhost ([127.0.0.1]:45094 helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1g2XEB-0006b9-LV for submit@debbugs.gnu.org; Wed, 19 Sep 2018 03:48:51 -0400 Received: from zimbra.cs.ucla.edu ([131.179.128.68]:36968) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1g2XEA-0006as-3H for 32750@debbugs.gnu.org; Wed, 19 Sep 2018 03:48:50 -0400 Received: from localhost (localhost [127.0.0.1]) by zimbra.cs.ucla.edu (Postfix) with ESMTP id AEE0716162D; Wed, 19 Sep 2018 00:48:43 -0700 (PDT) Received: from zimbra.cs.ucla.edu ([127.0.0.1]) by localhost (zimbra.cs.ucla.edu [127.0.0.1]) (amavisd-new, port 10032) with ESMTP id mOiIk4pOGINh; Wed, 19 Sep 2018 00:48:42 -0700 (PDT) Received: from localhost (localhost [127.0.0.1]) by zimbra.cs.ucla.edu (Postfix) with ESMTP id E647C16162E; Wed, 19 Sep 2018 00:48:42 -0700 (PDT) X-Virus-Scanned: amavisd-new at zimbra.cs.ucla.edu Received: from zimbra.cs.ucla.edu ([127.0.0.1]) by localhost (zimbra.cs.ucla.edu [127.0.0.1]) (amavisd-new, port 10026) with ESMTP id E930lEvyH8n6; Wed, 19 Sep 2018 00:48:42 -0700 (PDT) Received: from [192.168.1.9] (cpe-23-242-74-103.socal.res.rr.com [23.242.74.103]) by zimbra.cs.ucla.edu (Postfix) with ESMTPSA id C176816162D; Wed, 19 Sep 2018 00:48:42 -0700 (PDT) References: <20180917231311.1BE7.27F6AC2D@kcn.ne.jp> From: Paul Eggert Organization: UCLA Computer Science Department Message-ID: <42232272-8810-7976-a287-ebc735a9b7f9@cs.ucla.edu> Date: Wed, 19 Sep 2018 00:48:42 -0700 User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:52.0) Gecko/20100101 Thunderbird/52.9.1 MIME-Version: 1.0 In-Reply-To: Content-Type: text/plain; charset=utf-8; format=flowed Content-Language: en-US Content-Transfer-Encoding: 7bit X-Spam-Score: -2.3 (--) X-BeenThere: debbugs-submit@debbugs.gnu.org X-Mailman-Version: 2.1.18 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: debbugs-submit-bounces@debbugs.gnu.org Sender: "Debbugs-submit" X-Spam-Score: -3.3 (---) Jim Meyering wrote: >> seq 10000 | sed 's/$/ abcdefghijklmnopqrstuvwxyz/; s/$/./' >in >> time -p env LC_ALL=C grep -vf in in >> >> (before) real 63.43 user 61.67 system 1.65 >> (after) real 1.64 user 1.61 system 0.03 >> >> # If we do not add '.' at last in pattern, not dfa but kwset is used. >> >> grep also speeds-up about 3x in following case. >> >> time -p env LC_ALL=C grep -vf /usr/share/dict/linux.words /usr/share/dict/linux.words >> >> (before) real 2.48 user 2.09 system 0.38 >> (after) real 7.69 user 6.32 system 1.29 > Thank you for the patches. > However, the before/after numbers you show here suggest that the > "after" code takes more than triple of the time of "before". > > Also, when I compared grep compiled at > 123620af88f55c3e0cc9f0aed7311c72f625bc82 (latest, including your > changes) and that compiled at the prior commit, > 9c11510507ebcd31671f10d9b88532f8e6657ad2, I find that the new version > takes over 30 seconds, while the prior one took about 20 seconds. Is that last pair of times for the second benchmark he gave? I confess to being lazy and not trying that benchmark, as I was on an Ubuntu system that didn't have the linux.words file. Did you try the first benchmark? On my Ubuntu 18.04 x86-64 (Xeon E3-1225 V2) platform, I got (before) real 55.51 user 51.53 sys 3.98, (after) real 0.64 user 0.60 sys 0.04, so the change is a big performance win there. On the other hand, I just now did the 2nd benchmark with a copy of a Fedora 28 linux.words file, and got (before) real 8.06 user 6.20 sys 1.85, (after) real 21.69 user 21.21 sys 0.47, so it's about three times slower. Ouch. From unknown Tue Jun 24 20:52:07 2025 X-Loop: help-debbugs@gnu.org Subject: bug#32750: [PATCH 2/2] dfa: optmization of alternation in NFA Resent-From: Norihiro Tanaka Original-Sender: "Debbugs-submit" Resent-CC: bug-grep@gnu.org Resent-Date: Wed, 19 Sep 2018 15:25:01 +0000 Resent-Message-ID: Resent-Sender: help-debbugs@gnu.org X-GNU-PR-Message: followup 32750 X-GNU-PR-Package: grep X-GNU-PR-Keywords: patch To: Jim Meyering Cc: 32750@debbugs.gnu.org Received: via spool by 32750-submit@debbugs.gnu.org id=B32750.153737067517419 (code B ref 32750); Wed, 19 Sep 2018 15:25:01 +0000 Received: (at 32750) by debbugs.gnu.org; 19 Sep 2018 15:24:35 +0000 Received: from localhost ([127.0.0.1]:45871 helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1g2eLC-0004Ws-Or for submit@debbugs.gnu.org; Wed, 19 Sep 2018 11:24:34 -0400 Received: from mailgw01.kcn.ne.jp ([61.86.7.208]:40528) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1g2eLA-0004WW-AW for 32750@debbugs.gnu.org; Wed, 19 Sep 2018 11:24:33 -0400 Received: from mxs01-s (mailgw1.kcn.ne.jp [61.86.15.233]) by mailgw01.kcn.ne.jp (Postfix) with ESMTP id 390324A0846 for <32750@debbugs.gnu.org>; Thu, 20 Sep 2018 00:24:25 +0900 (JST) X-matriXscan-loop-detect: 08a2db7e71a703906d2cc4b51c202d3c153d7071 Received: from mail04.kcn.ne.jp ([61.86.6.183]) by mxs01-s with ESMTP; Thu, 20 Sep 2018 00:24:24 +0900 (JST) Received: from [10.120.1.71] (i118-21-128-66.s30.a048.ap.plala.or.jp [118.21.128.66]) by mail04.kcn.ne.jp (Postfix) with ESMTPA id F2AC212900BC; Thu, 20 Sep 2018 00:24:23 +0900 (JST) Date: Thu, 20 Sep 2018 00:24:22 +0900 From: Norihiro Tanaka In-Reply-To: References: <20180917231311.1BE7.27F6AC2D@kcn.ne.jp> Message-Id: <20180920002420.712D.27F6AC2D@kcn.ne.jp> MIME-Version: 1.0 Content-Type: multipart/mixed; boundary="------_5BA26964000000007132_MULTIPART_MIXED_" Content-Transfer-Encoding: 7bit X-Mailer: Becky! ver. 2.73 [ja] X-matriXscan-Sophos-AV: Clean X-matriXscan-Action: Approve X-matriXscan: Uncategorized X-Spam-Score: -0.0 (/) X-BeenThere: debbugs-submit@debbugs.gnu.org X-Mailman-Version: 2.1.18 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: debbugs-submit-bounces@debbugs.gnu.org Sender: "Debbugs-submit" X-Spam-Score: -1.0 (-) --------_5BA26964000000007132_MULTIPART_MIXED_ Content-Type: text/plain; charset="US-ASCII" Content-Transfer-Encoding: 7bit On Tue, 18 Sep 2018 22:13:38 -0700 Jim Meyering wrote: > Also, when I compared grep compiled at > 123620af88f55c3e0cc9f0aed7311c72f625bc82 (latest, including your > changes) and that compiled at the prior commit, > 9c11510507ebcd31671f10d9b88532f8e6657ad2, I find that the new version > takes over 30 seconds, while the prior one took about 20 seconds. > > FTR, I used gcc version 9.0.0 20180912, compiling with -O3. Thanks for your investigation. Sorry, I forgot to send the patch. We need the patch to optimize MERGE function to speed-up for some cases. Thanks, Norihiro --------_5BA26964000000007132_MULTIPART_MIXED_ Content-Type: application/octet-stream; name="0001-dfa-optimization-for-state-merge.patch" Content-Disposition: attachment; filename="0001-dfa-optimization-for-state-merge.patch" Content-Transfer-Encoding: base64 RnJvbSBiMzdjODc4NTQ4YTdhZDUzZDk0YTIwODlhYTIwMGFjYWIyMTVmYTBjIE1vbiBTZXAgMTcg MDA6MDA6MDAgMjAwMQpGcm9tOiBOb3JpaGlybyBUYW5ha2EgPG5vcml0bmtAa2NuLm5lLmpwPgpE YXRlOiBUaHUsIDIwIFNlcCAyMDE4IDAwOjE4OjExICswOTAwClN1YmplY3Q6IFtQQVRDSF0gZGZh OiBvcHRpbWl6YXRpb24gZm9yIHN0YXRlIG1lcmdlCgpkZmEuYyAobWVyZ2UyKTogQWRkIG5ldyBm dW5jdGlvbi4KKG1lcmdlX25mYV9zdGF0ZSk6IFVwZGF0ZSBjYWxsZXIuCi0tLQogbGliL2RmYS5j IHwgICAxOCArKysrKysrKysrKysrKysrLS0KIDEgZmlsZXMgY2hhbmdlZCwgMTYgaW5zZXJ0aW9u cygrKSwgMiBkZWxldGlvbnMoLSkKCmRpZmYgLS1naXQgYS9saWIvZGZhLmMgYi9saWIvZGZhLmMK aW5kZXggMTNiMmEwYi4uZDA0YmNiZSAxMDA2NDQKLS0tIGEvbGliL2RmYS5jCisrKyBiL2xpYi9k ZmEuYwpAQCAtMjEwMiw2ICsyMTAyLDIxIEBAIG1lcmdlIChwb3NpdGlvbl9zZXQgY29uc3QgKnMx LCBwb3NpdGlvbl9zZXQgY29uc3QgKnMyLCBwb3NpdGlvbl9zZXQgKm0pCiAgIG1lcmdlX2NvbnN0 cmFpbmVkIChzMSwgczIsIC0xLCBtKTsKIH0KIAorc3RhdGljIHZvaWQKK21lcmdlMiAocG9zaXRp b25fc2V0ICpkc3QsIHBvc2l0aW9uX3NldCBjb25zdCAqc3JjLCBwb3NpdGlvbl9zZXQgKm0pCit7 CisgIGlmIChzcmMtPm5lbGVtIDwgNCkKKyAgICB7CisgICAgICBmb3IgKHB0cmRpZmZfdCBpID0g MDsgaSA8IHNyYy0+bmVsZW07ICsraSkKKyAgICAgICAgaW5zZXJ0IChzcmMtPmVsZW1zW2ldLCBk c3QpOworICAgIH0KKyAgIGVsc2UKKyAgICB7CisgICAgICBtZXJnZSAoc3JjLCBkc3QsIG0pOwor ICAgICAgY29weSAobSwgZHN0KTsKKyAgICB9Cit9CisKIC8qIERlbGV0ZSBhIHBvc2l0aW9uIGZy b20gYSBzZXQuICBSZXR1cm4gdGhlIG5vbnplcm8gY29uc3RyYWludCBvZiB0aGUKICAgIGRlbGV0 ZWQgcG9zaXRpb24sIG9yIHplcm8gaWYgdGhlcmUgd2FzIG5vIHN1Y2ggcG9zaXRpb24uICAqLwog c3RhdGljIHVuc2lnbmVkIGludApAQCAtMjM4OCw4ICsyNDAzLDcgQEAgbWVyZ2VfbmZhX3N0YXRl IChzdHJ1Y3QgZGZhICpkLCBzaXplX3QgKnF1ZXVlLCBwdHJkaWZmX3QgbnF1ZXVlLCBzaXplX3Qg dGluZGV4LAogICAgICAgICAgIGlmIChmbGFnc1tzaW5kZXhdICYgT1BUX1JFUEVBVCkKICAgICAg ICAgICAgIGRlbGV0ZSAoc2luZGV4LCAmZm9sbG93c1tzaW5kZXhdKTsKIAotICAgICAgICAgIG1l cmdlICgmZm9sbG93c1tkaW5kZXhdLCAmZm9sbG93c1tzaW5kZXhdLCBtZXJnZWQpOwotICAgICAg ICAgIGNvcHkgKG1lcmdlZCwgJmZvbGxvd3NbZGluZGV4XSk7CisgICAgICAgICAgbWVyZ2UyICgm Zm9sbG93c1tkaW5kZXhdLCAmZm9sbG93c1tzaW5kZXhdLCBtZXJnZWQpOwogCiAgICAgICAgICAg LyogTWFyayBpdCBhcyBwcnVuZWQgaW4gZnV0dXJlLiAgKi8KICAgICAgICAgICBmb2xsb3dzW3Rp bmRleF0uZWxlbXNbal0uY29uc3RyYWludCA9IDA7Ci0tIAoxLjcuMQoK --------_5BA26964000000007132_MULTIPART_MIXED_-- From unknown Tue Jun 24 20:52:07 2025 X-Loop: help-debbugs@gnu.org Subject: bug#32750: [PATCH 2/2] dfa: optmization of alternation in NFA Resent-From: Paul Eggert Original-Sender: "Debbugs-submit" Resent-CC: bug-grep@gnu.org Resent-Date: Wed, 19 Sep 2018 15:46:01 +0000 Resent-Message-ID: Resent-Sender: help-debbugs@gnu.org X-GNU-PR-Message: followup 32750 X-GNU-PR-Package: grep X-GNU-PR-Keywords: patch To: Norihiro Tanaka , Jim Meyering Cc: 32750@debbugs.gnu.org Received: via spool by 32750-submit@debbugs.gnu.org id=B32750.153737195419888 (code B ref 32750); Wed, 19 Sep 2018 15:46:01 +0000 Received: (at 32750) by debbugs.gnu.org; 19 Sep 2018 15:45:54 +0000 Received: from localhost ([127.0.0.1]:45876 helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1g2efq-0005Ai-Ha for submit@debbugs.gnu.org; Wed, 19 Sep 2018 11:45:54 -0400 Received: from zimbra.cs.ucla.edu ([131.179.128.68]:56640) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1g2efo-0005AO-5f for 32750@debbugs.gnu.org; Wed, 19 Sep 2018 11:45:53 -0400 Received: from localhost (localhost [127.0.0.1]) by zimbra.cs.ucla.edu (Postfix) with ESMTP id D37C8160EC0; Wed, 19 Sep 2018 08:45:45 -0700 (PDT) Received: from zimbra.cs.ucla.edu ([127.0.0.1]) by localhost (zimbra.cs.ucla.edu [127.0.0.1]) (amavisd-new, port 10032) with ESMTP id pTUWO38c9AK5; Wed, 19 Sep 2018 08:45:44 -0700 (PDT) Received: from localhost (localhost [127.0.0.1]) by zimbra.cs.ucla.edu (Postfix) with ESMTP id 8FA781613F9; Wed, 19 Sep 2018 08:45:44 -0700 (PDT) X-Virus-Scanned: amavisd-new at zimbra.cs.ucla.edu Received: from zimbra.cs.ucla.edu ([127.0.0.1]) by localhost (zimbra.cs.ucla.edu [127.0.0.1]) (amavisd-new, port 10026) with ESMTP id mgx3qJ2SXmm1; Wed, 19 Sep 2018 08:45:44 -0700 (PDT) Received: from [192.168.1.9] (cpe-23-242-74-103.socal.res.rr.com [23.242.74.103]) by zimbra.cs.ucla.edu (Postfix) with ESMTPSA id 6857D160EC0; Wed, 19 Sep 2018 08:45:44 -0700 (PDT) References: <20180917231311.1BE7.27F6AC2D@kcn.ne.jp> <20180920002420.712D.27F6AC2D@kcn.ne.jp> From: Paul Eggert Organization: UCLA Computer Science Department Message-ID: <9ac51bef-427d-6c60-e647-dab253622d8c@cs.ucla.edu> Date: Wed, 19 Sep 2018 08:45:44 -0700 User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:52.0) Gecko/20100101 Thunderbird/52.9.1 MIME-Version: 1.0 In-Reply-To: <20180920002420.712D.27F6AC2D@kcn.ne.jp> Content-Type: multipart/mixed; boundary="------------7628C1A21B1F2C85457AE687" Content-Language: en-US X-Spam-Score: -2.3 (--) X-BeenThere: debbugs-submit@debbugs.gnu.org X-Mailman-Version: 2.1.18 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: debbugs-submit-bounces@debbugs.gnu.org Sender: "Debbugs-submit" X-Spam-Score: -3.3 (---) This is a multi-part message in MIME format. --------------7628C1A21B1F2C85457AE687 Content-Type: text/plain; charset=utf-8; format=flowed Content-Transfer-Encoding: 7bit Norihiro Tanaka wrote: > Sorry, I forgot to send the patch. We need the patch to optimize MERGE > function to speed-up for some cases. Thanks, that improved the performance of the 'grep -vf linux.words linux.words' benchmark from (before the recent changes) real 8.06 user 6.20 sys 1.85 to (after) real 2.57 user 2.11 sys 0.45. I installed it (with minor tweaks to the ChangeLog) into gnulib master and updated the grep master accordingly. I'll CC: this email to bug-gnulib to give them a heads-up, attaching the revised patch. --------------7628C1A21B1F2C85457AE687 Content-Type: text/x-patch; name="0001-dfa-optimization-for-state-merge.patch" Content-Transfer-Encoding: 7bit Content-Disposition: attachment; filename="0001-dfa-optimization-for-state-merge.patch" >From 617a6097466d818e425609eff9ed85039aea25db Mon Sep 17 00:00:00 2001 From: Norihiro Tanaka Date: Wed, 19 Sep 2018 08:35:49 -0700 Subject: [PATCH] dfa: optimization for state merge * lib/dfa.c (merge2): New function. (merge_nfa_state): Use it. --- ChangeLog | 6 ++++++ lib/dfa.c | 18 ++++++++++++++++-- 2 files changed, 22 insertions(+), 2 deletions(-) diff --git a/ChangeLog b/ChangeLog index 181039c1a..b9d7b30d6 100644 --- a/ChangeLog +++ b/ChangeLog @@ -1,3 +1,9 @@ +2018-09-19 Norihiro Tanaka + + dfa: optimization for state merge + * lib/dfa.c (merge2): New function. + (merge_nfa_state): Use it. + 2018-09-18 Jim Meyering dfa: trivial comment fix: s/is/if/ diff --git a/lib/dfa.c b/lib/dfa.c index 13b2a0baf..d04bcbe64 100644 --- a/lib/dfa.c +++ b/lib/dfa.c @@ -2102,6 +2102,21 @@ merge (position_set const *s1, position_set const *s2, position_set *m) merge_constrained (s1, s2, -1, m); } +static void +merge2 (position_set *dst, position_set const *src, position_set *m) +{ + if (src->nelem < 4) + { + for (ptrdiff_t i = 0; i < src->nelem; ++i) + insert (src->elems[i], dst); + } + else + { + merge (src, dst, m); + copy (m, dst); + } +} + /* Delete a position from a set. Return the nonzero constraint of the deleted position, or zero if there was no such position. */ static unsigned int @@ -2388,8 +2403,7 @@ merge_nfa_state (struct dfa *d, size_t *queue, ptrdiff_t nqueue, size_t tindex, if (flags[sindex] & OPT_REPEAT) delete (sindex, &follows[sindex]); - merge (&follows[dindex], &follows[sindex], merged); - copy (merged, &follows[dindex]); + merge2 (&follows[dindex], &follows[sindex], merged); /* Mark it as pruned in future. */ follows[tindex].elems[j].constraint = 0; -- 2.17.1 --------------7628C1A21B1F2C85457AE687--