From unknown Tue Jun 17 20:17:19 2025
Content-Disposition: inline
Content-Transfer-Encoding: quoted-printable
MIME-Version: 1.0
X-Mailer: MIME-tools 5.509 (Entity 5.509)
Content-Type: text/plain; charset=utf-8
From: bug#16912 <16912@debbugs.gnu.org>
To: bug#16912 <16912@debbugs.gnu.org>
Subject: Status: [PATCH] no longer use CSET for non-UTF8 locale in DFA engine
Reply-To: bug#16912 <16912@debbugs.gnu.org>
Date: Wed, 18 Jun 2025 03:17:19 +0000

retitle 16912 [PATCH] no longer use CSET for non-UTF8 locale in DFA engine
reassign 16912 grep
submitter 16912 Norihiro Tanaka
severity 16912 normal
tag 16912 patch
thanks

From debbugs-submit-bounces@debbugs.gnu.org Sat Mar 01 04:48:43 2014
Received: (at submit) by debbugs.gnu.org; 1 Mar 2014 09:48:43 +0000
Received: from localhost ([127.0.0.1]:45320 helo=debbugs.gnu.org)
 by debbugs.gnu.org with esmtp (Exim 4.80) (envelope-from )
 id 1WJgXC-0002cS-Qn for submit@debbugs.gnu.org; Sat, 01 Mar 2014 04:48:43 -0500
Received: from pbsg500.nifty.com ([202.248.238.70]:54926)
 by debbugs.gnu.org with esmtp (Exim 4.80) (envelope-from )
 id 1WJgX6-0002cB-IO for submit@debbugs.gnu.org; Sat, 01 Mar 2014 04:48:39 -0500
Received: from [10.120.1.49] (i118-21-128-66.s30.a048.ap.plala.or.jp [118.21.128.66])
 (authenticated) by pbsg500.nifty.com with ESMTP id s219mNL5020442
 for ; Sat, 1 Mar 2014 18:48:23 +0900
X-Nifty-SrcIP: [118.21.128.66]
Date: Sat, 01 Mar 2014 18:48:22 +0900
From: Norihiro Tanaka
To: submit@debbugs.gnu.org
Subject: [PATCH] no longer use CSET for non-UTF8 locale in DFA engine
Message-Id: <20140301184821.6A22.27F6AC2D@kcn.ne.jp>
MIME-Version: 1.0
Content-Type: multipart/mixed; boundary="------_52D2369D000000001525_MULTIPART_MIXED_"
Content-Transfer-Encoding: 7bit
X-Mailer: Becky! ver. 2.64.06 [ja]
X-Spam-Score: 1.6 (+)
X-Spam-Report: Spam detection software, running on the system "debbugs.gnu.org", has identified this incoming email as possible spam.
 Content analysis details: (1.6 points, 10.0 required)
  pts rule name              description
 ---- ---------------------- --------------------------------------------------
 -0.0 SPF_HELO_PASS          SPF: HELO matches SPF record
 -0.0 SPF_PASS               SPF: sender matches SPF record
 -0.0 RP_MATCHES_RCVD        Envelope sender domain matches handover relay domain
  1.6 OBFU_TEXT_ATTACH       BODY: Text attachment with non-text MIME type

--------_52D2369D000000001525_MULTIPART_MIXED_
Content-Type: text/plain; charset="US-ASCII"
Content-Transfer-Encoding: 7bit

Package: grep
Tags: patch

I overlooked an important point about the trivial_case_ignore
optimization.  After trivial_case_ignore has rewritten the pattern, the
kwset engine can still be used.  Without trivial_case_ignore, however,
kwset is never used, because kwsmusts does nothing when
MB_CUR_MAX > 1 && match_icase.

This patch reverts the removal of trivial_case_ignore and fixes the 200x
slowdown in non-UTF8 locales with another approach: it always replaces a
CSET with an alternation (OR), so the DFA engine no longer uses CSET in
non-UTF8 locales.  The trivial_case_ignore optimization also applies
again, which gives a speed-up of more than 20x in non-UTF8 locales.
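For readers following the thread, here is a minimal sketch of the pattern rewrite that trivial_case_ignore performs, so that /foo/i becomes /[fF][oO][oO]/ and match_icase can be cleared. It is not the code from the attached patch: the name expand_case_ascii is hypothetical, and the sketch assumes ASCII-only input, whereas the patch converts multibyte characters with mbrtowc and wcrtomb. Like the patch, it declines to rewrite patterns that contain a backslash or an opening bracket.

```c
#include <assert.h>
#include <ctype.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical, ASCII-only illustration of the rewrite done by
   trivial_case_ignore.  Rewrite each letter C in KEYS (LEN bytes, no
   trailing NUL required) as the bracket expression [Cc].  On success,
   store a malloc'd, NUL-terminated result in *NEW_KEYS and its length
   in *NEW_LEN, and return true.  Return false, storing nothing, if KEYS
   contains '\\' or '[', contains a non-ASCII byte, or if memory cannot
   be allocated.  */
bool
expand_case_ascii (size_t len, char const *keys,
                   size_t *new_len, char **new_keys)
{
  if (memchr (keys, '\\', len) || memchr (keys, '[', len))
    return false;

  /* Worst case: every byte is a letter and grows to 4 bytes, "[Cc]".  */
  if (len > (SIZE_MAX - 1) / 4)
    return false;
  char *out = malloc (4 * len + 1);
  if (!out)
    return false;

  char *p = out;
  for (size_t i = 0; i < len; i++)
    {
      unsigned char c = keys[i];
      if (c >= 0x80)
        {
          /* Assumption of this sketch: multibyte input is not handled.  */
          free (out);
          return false;
        }
      if (isalpha (c))
        {
          *p++ = '[';
          *p++ = c;
          *p++ = isupper (c) ? tolower (c) : toupper (c);
          *p++ = ']';
        }
      else
        *p++ = c;
    }

  assert ((size_t) (p - out) <= 4 * len);
  *p = '\0';
  *new_len = p - out;
  *new_keys = out;
  return true;
}
```

A caller would use the rewritten keys in place of the original pattern and turn off case-insensitive matching only when the function returns true; otherwise the original pattern and the -i handling stay as they were.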
(I tested it with euc-jp) Norihiro --------_52D2369D000000001525_MULTIPART_MIXED_ Content-Type: application/octet-stream; name="patch.txt" Content-Disposition: attachment; filename="patch.txt" Content-Transfer-Encoding: base64 RnJvbSA5Y2Y2Njg4YmNlMWUzYWViM2JhNjE0MzlkYWM0MWRkM2RjYzliNDA3IE1vbiBTZXAgMTcg MDA6MDA6MDAgMjAwMQpGcm9tOiBOb3JpaGlybyBUYW5ha2EgPG5vcml0bmtAa2NuLm5lLmpwPgpE YXRlOiBTYXQsIDEgTWFyIDIwMTQgMTg6NDU6MDQgKzA5MDAKU3ViamVjdDogW1BBVENIIDEvMl0g bm8gbG9uZ2VyIHVzZSBDU0VUIGZvciBub24tVVRGOCBsb2NhbGVzIGluIERGQSBlbmdpbmUKCi0t LQogc3JjL2RmYS5jIHwgMTAgKysrKystLS0tLQogMSBmaWxlIGNoYW5nZWQsIDUgaW5zZXJ0aW9u cygrKSwgNSBkZWxldGlvbnMoLSkKCmRpZmYgLS1naXQgYS9zcmMvZGZhLmMgYi9zcmMvZGZhLmMK aW5kZXggODBiYjgwNy4uZWVkMTAyMCAxMDA2NDQKLS0tIGEvc3JjL2RmYS5jCisrKyBiL3NyYy9k ZmEuYwpAQCAtMTA0NSw3ICsxMDQ1LDggQEAgcGFyc2VfYnJhY2tldF9leHAgKHZvaWQpCiAgICAg ICAgICAgICAgICAgICBpZiAoIXByZWQpCiAgICAgICAgICAgICAgICAgICAgIGRmYWVycm9yIChf KCJpbnZhbGlkIGNoYXJhY3RlciBjbGFzcyIpKTsKIAotICAgICAgICAgICAgICAgICAgaWYgKE1C X0NVUl9NQVggPiAxICYmICFwcmVkLT5zaW5nbGVfYnl0ZV9vbmx5KQorICAgICAgICAgICAgICAg ICAgaWYgKE1CX0NVUl9NQVggPiAxCisgICAgICAgICAgICAgICAgICAgICAgJiYgKCF1c2luZ191 dGY4ICgpIHx8ICFwcmVkLT5zaW5nbGVfYnl0ZV9vbmx5KSkKICAgICAgICAgICAgICAgICAgICAg ewogICAgICAgICAgICAgICAgICAgICAgIC8qIFN0b3JlIHRoZSBjaGFyYWN0ZXIgY2xhc3MgYXMg d2N0eXBlX3QuICAqLwogICAgICAgICAgICAgICAgICAgICAgIHdjdHlwZV90IHd0ID0gd2N0eXBl IChjbGFzcyk7CkBAIC0xMTUyLDIxICsxMTUzLDIxIEBAIHBhcnNlX2JyYWNrZXRfZXhwICh2b2lk KQogICAgICAgaWYgKGNhc2VfZm9sZCkKICAgICAgICAgewogICAgICAgICAgIHdpbnRfdCBmb2xk ZWQgPSB0b3dsb3dlciAod2MpOwotICAgICAgICAgIGlmIChmb2xkZWQgIT0gd2MgJiYgIXNldGJp dF93YyAoZm9sZGVkLCBjY2wpKQorICAgICAgICAgIGlmIChmb2xkZWQgIT0gd2MgJiYgKCF1c2lu Z191dGY4ICgpIHx8ICFzZXRiaXRfd2MgKGZvbGRlZCwgY2NsKSkpCiAgICAgICAgICAgICB7CiAg ICAgICAgICAgICAgIFJFQUxMT0NfSUZfTkVDRVNTQVJZICh3b3JrX21iYy0+Y2hhcnMsIGNoYXJz X2FsLAogICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgd29ya19tYmMtPm5jaGFy cyArIDEpOwogICAgICAgICAgICAgICB3b3JrX21iYy0+Y2hhcnNbd29ya19tYmMtPm5jaGFycysr 
XSA9IGZvbGRlZDsKICAgICAgICAgICAgIH0KICAgICAgICAgICBmb2xkZWQgPSB0b3d1cHBlciAo d2MpOwotICAgICAgICAgIGlmIChmb2xkZWQgIT0gd2MgJiYgIXNldGJpdF93YyAoZm9sZGVkLCBj Y2wpKQorICAgICAgICAgIGlmIChmb2xkZWQgIT0gd2MgJiYgKCF1c2luZ191dGY4ICgpIHx8ICFz ZXRiaXRfd2MgKGZvbGRlZCwgY2NsKSkpCiAgICAgICAgICAgICB7CiAgICAgICAgICAgICAgIFJF QUxMT0NfSUZfTkVDRVNTQVJZICh3b3JrX21iYy0+Y2hhcnMsIGNoYXJzX2FsLAogICAgICAgICAg ICAgICAgICAgICAgICAgICAgICAgICAgICAgd29ya19tYmMtPm5jaGFycyArIDEpOwogICAgICAg ICAgICAgICB3b3JrX21iYy0+Y2hhcnNbd29ya19tYmMtPm5jaGFycysrXSA9IGZvbGRlZDsKICAg ICAgICAgICAgIH0KICAgICAgICAgfQotICAgICAgaWYgKCFzZXRiaXRfd2MgKHdjLCBjY2wpKQor ICAgICAgaWYgKCF1c2luZ191dGY4ICgpIHx8ICFzZXRiaXRfd2MgKHdjLCBjY2wpKQogICAgICAg ICB7CiAgICAgICAgICAgUkVBTExPQ19JRl9ORUNFU1NBUlkgKHdvcmtfbWJjLT5jaGFycywgY2hh cnNfYWwsCiAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgIHdvcmtfbWJjLT5uY2hhcnMg KyAxKTsKQEAgLTE2MDAsNyArMTYwMSw2IEBAIGFkZHRvayAodG9rZW4gdCkKIAogICAgICAgLyog VVRGLTggYWxsb3dzIHRyZWF0aW5nIGEgc2ltcGxlLCBub24taW52ZXJ0ZWQgTUJDU0VUIGxpa2Ug YSBDU0VULiAgKi8KICAgICAgIGlmICh3b3JrX21iYy0+aW52ZXJ0Ci0gICAgICAgICAgfHwgKCF1 c2luZ191dGY4ICgpICYmIHdvcmtfbWJjLT5jc2V0ICE9IC0xKQogICAgICAgICAgIHx8IHdvcmtf bWJjLT5uY2hhcnMgIT0gMAogICAgICAgICAgIHx8IHdvcmtfbWJjLT5uY2hfY2xhc3NlcyAhPSAw CiAgICAgICAgICAgfHwgd29ya19tYmMtPm5yYW5nZXMgIT0gMAotLSAKMS44LjUuMgoKCkZyb20g ZDlmOWE1YWVkYWQ5ZTgxY2Y1NmI2YzRlMzc1ZTVmMzk1ZTRlMjEzYSBNb24gU2VwIDE3IDAwOjAw OjAwIDIwMDEKRnJvbTogTm9yaWhpcm8gVGFuYWthIDxub3JpdG5rQGtjbi5uZS5qcD4KRGF0ZTog U2F0LCAxIE1hciAyMDE0IDE4OjQ1OjE4ICswOTAwClN1YmplY3Q6IFtQQVRDSCAyLzJdIHJldmVy dDogcmVtb3ZlIHRyaXZpYWxfY2FzZV9pZ25vcmUKCi0tLQogc3JjL21haW4uYyB8IDEwNiArKysr KysrKysrKysrKysrKysrKysrKysrKysrKysrKysrKysrKysrKysrKysrKysrKysrKysrKysrKysr CiAxIGZpbGUgY2hhbmdlZCwgMTA2IGluc2VydGlvbnMoKykKCmRpZmYgLS1naXQgYS9zcmMvbWFp bi5jIGIvc3JjL21haW4uYwppbmRleCA3YzA0ZDc3Li4wOTBmYzRmIDEwMDY0NAotLS0gYS9zcmMv bWFpbi5jCisrKyBiL3NyYy9tYWluLmMKQEAgLTE4NjcsNiArMTg2Nyw4MyBAQCBwYXJzZV9ncmVw 
X2NvbG9ycyAodm9pZCkKICAgICAgIHJldHVybjsKIH0KIAorLyogSWYgdGhlIG5ld2xpbmUtc2Vw YXJhdGVkIHJlZ3VsYXIgZXhwcmVzc2lvbnMsIEtFWVMgKHdpdGggbGVuZ3RoLCBMRU4KKyAgIGFu ZCBubyB0cmFpbGluZyBOVUwgYnl0ZSksIGFyZSBhbWVuYWJsZSB0byB0cmFuc2Zvcm1hdGlvbiBp bnRvCisgICBvdGhlcndpc2UgZXF1aXZhbGVudCBjYXNlLWlnbm9yaW5nIG9uZXMsIHBlcmZvcm0g dGhlIHRyYW5zZm9ybWF0aW9uLAorICAgcHV0IHRoZSByZXN1bHQgaW50byBtYWxsb2MnZCBtZW1v cnksICpORVdfS0VZUyB3aXRoIGxlbmd0aCAqTkVXX0xFTiwKKyAgIGFuZCByZXR1cm4gdHJ1ZS4g IE90aGVyd2lzZSwgcmV0dXJuIGZhbHNlLiAgKi8KKyNkZWZpbmUgTUJSVE9XQyhwd2MsIHMsIG4s IHBzKSBcCisgIChNQl9DVVJfTUFYID09IDEgPyBcCisgICAoKihwd2MpID0gYnRvd2MgKCoodW5z aWduZWQgY2hhciAqKSAocykpLCAxKSA6IFwKKyAgIG1icnRvd2MgKChwd2MpLCAocyksIChuKSwg KHBzKSkpCisjZGVmaW5lIFdDUlRPTUIocywgd2MsIHBzKSBcCisgIChNQl9DVVJfTUFYID09IDEg PyBcCisgICAoKihzKSA9IHdjdG9iICgod2ludF90KSAod2MpKSwgMSkgOiBcCisgICB3Y3J0b21i ICgocyksICh3YyksIChwcykpKQorCitzdGF0aWMgYm9vbAordHJpdmlhbF9jYXNlX2lnbm9yZSAo c2l6ZV90IGxlbiwgY2hhciBjb25zdCAqa2V5cywKKyAgICAgICAgICAgICAgICAgICAgIHNpemVf dCAqbmV3X2xlbiwgY2hhciAqKm5ld19rZXlzKQoreworICAvKiBGSVhNRTogY29uc2lkZXIgcmVt b3ZpbmcgdGhlIGZvbGxvd2luZyByZXN0cmljdGlvbjoKKyAgICAgUmVqZWN0IGlmIEtFWVMgY29u dGFpbiBBU0NJSSAnXFwnIG9yICdbJy4gICovCisgIGlmIChtZW1jaHIgKGtleXMsICdcXCcsIGxl bikgfHwgbWVtY2hyIChrZXlzLCAnWycsIGxlbikpCisgICAgcmV0dXJuIGZhbHNlOworCisgIC8q IFdvcnN0IGNhc2UgaXMgdGhhdCBlYWNoIGJ5dGUgQiBvZiBLRVlTIGlzIEFTQ0lJIGFscGhhYmV0 aWMgYW5kIGVhY2gKKyAgICAgb3RoZXJfY2FzZShCKSBjaGFyYWN0ZXIsIEMsIG9jY3VwaWVzIE1C X0NVUl9NQVggYnl0ZXMsIHNvIGVhY2ggQgorICAgICBtYXBzIHRvIFtCQ10sIHdoaWNoIHJlcXVp cmVzIE1CX0NVUl9NQVggKyAzIGJ5dGVzLiAgICovCisgICpuZXdfa2V5cyA9IHhubWFsbG9jIChN Ql9DVVJfTUFYICsgMywgbGVuICsgMSk7CisgIGNoYXIgKnAgPSAqbmV3X2tleXM7CisKKyAgbWJz dGF0ZV90IG1iX3N0YXRlOworICBtZW1zZXQgKCZtYl9zdGF0ZSwgMCwgc2l6ZW9mIG1iX3N0YXRl KTsKKyAgd2hpbGUgKGxlbikKKyAgICB7CisgICAgICB3Y2hhcl90IHdjOworICAgICAgaW50IG4g PSBNQlJUT1dDICgmd2MsIGtleXMsIGxlbiwgJm1iX3N0YXRlKTsKKworICAgICAgLyogRm9yIGFu 
IGludmFsaWQsIGluY29tcGxldGUgb3IgTCdcMCcsIHNraXAgdGhpcyBvcHRpbWl6YXRpb24uICAq LworICAgICAgaWYgKG4gPD0gMCkKKyAgICAgICAgeworICAgICAgICBza2lwX2Nhc2VfaWdub3Jl X29wdGltaXphdGlvbjoKKyAgICAgICAgICBmcmVlICgqbmV3X2tleXMpOworICAgICAgICAgIHJl dHVybiBmYWxzZTsKKyAgICAgICAgfQorCisgICAgICBjaGFyIGNvbnN0ICpvcmlnID0ga2V5czsK KyAgICAgIGtleXMgKz0gbjsKKyAgICAgIGxlbiAtPSBuOworCisgICAgICBpZiAoIWlzd2FscGhh ICh3YykpCisgICAgICAgIHsKKyAgICAgICAgICBtZW1jcHkgKHAsIG9yaWcsIG4pOworICAgICAg ICAgIHAgKz0gbjsKKyAgICAgICAgfQorICAgICAgZWxzZQorICAgICAgICB7CisgICAgICAgICAg KnArKyA9ICdbJzsKKyAgICAgICAgICBtZW1jcHkgKHAsIG9yaWcsIG4pOworICAgICAgICAgIHAg Kz0gbjsKKworICAgICAgICAgIHdjaGFyX3Qgd2MyID0gaXN3dXBwZXIgKHdjKSA/IHRvd2xvd2Vy ICh3YykgOiB0b3d1cHBlciAod2MpOworICAgICAgICAgIGNoYXIgYnVmW01CX0NVUl9NQVhdOwor ICAgICAgICAgIGludCBuMiA9IFdDUlRPTUIgKGJ1Ziwgd2MyLCAmbWJfc3RhdGUpOworICAgICAg ICAgIGlmIChuMiA8PSAwKQorICAgICAgICAgICAgZ290byBza2lwX2Nhc2VfaWdub3JlX29wdGlt aXphdGlvbjsKKyAgICAgICAgICBhc3NlcnQgKG4yIDw9IE1CX0NVUl9NQVgpOworICAgICAgICAg IG1lbWNweSAocCwgYnVmLCBuMik7CisgICAgICAgICAgcCArPSBuMjsKKworICAgICAgICAgICpw KysgPSAnXSc7CisgICAgICAgIH0KKyAgICB9CisKKyAgKm5ld19sZW4gPSBwIC0gKm5ld19rZXlz OworCisgIHJldHVybiB0cnVlOworfQorCiBpbnQKIG1haW4gKGludCBhcmdjLCBjaGFyICoqYXJn dikKIHsKQEAgLTIyNjEsNiArMjMzOCwzNSBAQCBtYWluIChpbnQgYXJnYywgY2hhciAqKmFyZ3Yp CiAgIGVsc2UKICAgICB1c2FnZSAoRVhJVF9UUk9VQkxFKTsKIAorICAvKiBBcyBjdXJyZW50bHkg aW1wbGVtZW50ZWQsIGNhc2UtaW5zZW5zaXRpdmUgbWF0Y2hpbmcgaXMgZXhwZW5zaXZlIGluCisg ICAgIG11bHRpLWJ5dGUgbG9jYWxlcyBiZWNhdXNlIG9mIGEgZmV3IG91dGxpZXIgbG9jYWxlcyBp biB3aGljaCBzb21lCisgICAgIGNoYXJhY3RlcnMgY2hhbmdlIHNpemUgd2hlbiBjb252ZXJ0ZWQg dG8gdXBwZXIgb3IgbG93ZXIgY2FzZS4gIFRvCisgICAgIGFjY29tbW9kYXRlIHRob3NlLCB3ZSBy ZXZlcnQgdG8gc2VhcmNoaW5nIHRoZSBpbnB1dCBvbmUgbGluZSBhdCBhCisgICAgIHRpbWUsIHJh dGhlciB0aGFuIHVzaW5nIHRoZSBtdWNoIG1vcmUgZWZmaWNpZW50IGJ1ZmZlciBzZWFyY2guCisg ICAgIEhvd2V2ZXIsIGlmIHdlIGhhdmUgYSByZWd1bGFyIGV4cHJlc3Npb24sIC9mb28vaSwgd2Ug 
Y2FuIGNvbnZlcnQKKyAgICAgaXQgdG8gYW4gZXF1aXZhbGVudCBjYXNlLWluc2Vuc2l0aXZlIC9b ZkZdW29PXVtvT10vLCBhbmQgdGh1cworICAgICBhdm9pZCB0aGUgZXhwZW5zaXZlIHJlYWQtYW5k LXByb2Nlc3MtYS1saW5lLWF0LWEtdGltZSByZXF1aXJlbWVudC4KKyAgICAgT3B0aW1pemUtYXdh eSB0aGUgIi1pIiBvcHRpb24sIHdoZW4gcG9zc2libGUsIGNvbnZlcnRpbmcgZWFjaAorICAgICBj YW5kaWRhdGUgYWxwaGEsIEMsIGluIHRoZSByZWdleHAgdG8gW0NjXS4gICovCisgIGlmIChtYXRj aF9pY2FzZSkKKyAgICB7CisgICAgICBzaXplX3QgbmV3X2tleWNjOworICAgICAgY2hhciAqbmV3 X2tleXM7CisgICAgICAvKiBJdCBpcyBub3QgcG9zc2libGUgd2l0aCAtRiwgbm90IHVzZWZ1bCB3 aXRoIC1QIChwY3JlKSBhbmQgdGhlcmUgaXMgbm8KKyAgICAgICAgIHBvaW50IHdoZW4gdGhlcmUg aXMgbm8gcmVnZXhwLiAgSXQgYWxzbyBkZXBlbmRzIG9uIHdoaWNoIGNvbnN0cnVjdHMKKyAgICAg ICAgIGFwcGVhciBpbiB0aGUgcmVnZXhwLiAgU2VlIHRyaXZpYWxfY2FzZV9pZ25vcmUgZm9yIHRo b3NlIGRldGFpbHMuICAqLworICAgICAgaWYgKGtleWNjCisgICAgICAgICAgJiYgISAobWF0Y2hl cgorICAgICAgICAgICAgICAgICYmIChTVFJFUSAobWF0Y2hlciwgImZncmVwIikgfHwgU1RSRVEg KG1hdGNoZXIsICJwY3JlIikpKQorICAgICAgICAgICYmIHRyaXZpYWxfY2FzZV9pZ25vcmUgKGtl eWNjLCBrZXlzLCAmbmV3X2tleWNjLCAmbmV3X2tleXMpKQorICAgICAgICB7CisgICAgICAgICAg bWF0Y2hfaWNhc2UgPSAwOworICAgICAgICAgIGZyZWUgKGtleXMpOworICAgICAgICAgIGtleXMg PSBuZXdfa2V5czsKKyAgICAgICAgICBrZXljYyA9IG5ld19rZXljYzsKKyAgICAgICAgfQorICAg IH0KKwogI2lmIE1CU19TVVBQT1JUCiAgIGlmIChNQl9DVVJfTUFYID4gMSkKICAgICBidWlsZF9t YmNsZW5fY2FjaGUgKCk7Ci0tIAoxLjguNS4yCgo= --------_52D2369D000000001525_MULTIPART_MIXED_ Content-Type: application/octet-stream; name="tests.txt" Content-Disposition: attachment; filename="tests.txt" Content-Transfer-Encoding: base64 CkJlZm9yZSB0aGUgcGF0Y2g6CiQgZW52IExDX0FMTD1lbl9VUy5VVEYtOCB0aW1lIHNyYy9ncmVw IC1pIO+8pu+8r++8r++8ou+8oe+8siAuLi9rCkNvbW1hbmQgZXhpdGVkIHdpdGggbm9uLXplcm8g c3RhdHVzIDEKMC44OHVzZXIgMC4zMXN5c3RlbSAwOjAxLjIyZWxhcHNlZCA5NyVDUFUgKDBhdmd0 ZXh0KzBhdmdkYXRhIDMwNzJtYXhyZXNpZGVudClrCjBpbnB1dHMrMG91dHB1dHMgKDBtYWpvcisy MTZtaW5vcilwYWdlZmF1bHRzIDBzd2FwcwokIGVudiBMQ19BTEw9amFfSlAuZXVjSlAgdGltZSBz 
cmMvZ3JlcCAtaSDvvKbvvK/vvK/vvKLvvKHvvLIgLi4vawpDb21tYW5kIGV4aXRlZCB3aXRoIG5v bi16ZXJvIHN0YXR1cyAxCjIwLjg0dXNlciA2LjM4c3lzdGVtIDA6MjcuODFlbGFwc2VkIDk3JUNQ VSAoMGF2Z3RleHQrMGF2Z2RhdGEgMzIzMm1heHJlc2lkZW50KWsKMGlucHV0cyswb3V0cHV0cyAo MG1ham9yKzEyNzQwbWlub3IpcGFnZWZhdWx0cyAwc3dhcHMKCkFmdGVyIHRoZSBwYXRjaDoKJCBl bnYgTENfQUxMPWVuX1VTLlVURi04IHRpbWUgc3JjL2dyZXAgLWkg77ym77yv77yv77yi77yh77yy IC4uL2sKQ29tbWFuZCBleGl0ZWQgd2l0aCBub24temVybyBzdGF0dXMgMQowLjEzdXNlciAwLjMw c3lzdGVtIDA6MDAuNDNlbGFwc2VkIDk5JUNQVSAoMGF2Z3RleHQrMGF2Z2RhdGEgMzMxMm1heHJl c2lkZW50KWsKMGlucHV0cyswb3V0cHV0cyAoMG1ham9yKzIzMm1pbm9yKXBhZ2VmYXVsdHMgMHN3 YXBzCiQgZW52IExDX0FMTD1qYV9KUC5ldWNKUCB0aW1lIHNyYy9ncmVwIC1pIO+8pu+8r++8r++8 ou+8oe+8siAuLi9rCkNvbW1hbmQgZXhpdGVkIHdpdGggbm9uLXplcm8gc3RhdHVzIDEKMC4xMHVz ZXIgMC4yNXN5c3RlbSAwOjAwLjM2ZWxhcHNlZCA5OCVDUFUgKDBhdmd0ZXh0KzBhdmdkYXRhIDMz MjhtYXhyZXNpZGVudClrCjBpbnB1dHMrMG91dHB1dHMgKDBtYWpvcisyMzRtaW5vcilwYWdlZmF1 bHRzIDBzd2FwcwoK --------_52D2369D000000001525_MULTIPART_MIXED_-- From debbugs-submit-bounces@debbugs.gnu.org Sat Mar 01 19:13:19 2014 Received: (at 16912) by debbugs.gnu.org; 2 Mar 2014 00:13:19 +0000 Received: from localhost ([127.0.0.1]:47137 helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.80) (envelope-from ) id 1WJu1u-0003MR-Hj for submit@debbugs.gnu.org; Sat, 01 Mar 2014 19:13:18 -0500 Received: from smtp.cs.ucla.edu ([131.179.128.62]:37243) by debbugs.gnu.org with esmtp (Exim 4.80) (envelope-from ) id 1WJu1r-0003MH-Nl for 16912@debbugs.gnu.org; Sat, 01 Mar 2014 19:13:16 -0500 Received: from localhost (localhost.localdomain [127.0.0.1]) by smtp.cs.ucla.edu (Postfix) with ESMTP id 0ADC039E8015; Sat, 1 Mar 2014 16:13:15 -0800 (PST) X-Virus-Scanned: amavisd-new at smtp.cs.ucla.edu Received: from smtp.cs.ucla.edu ([127.0.0.1]) by localhost (smtp.cs.ucla.edu [127.0.0.1]) (amavisd-new, port 10024) with ESMTP id qDArdkwY5POS; Sat, 1 Mar 2014 16:13:14 -0800 (PST) Received: from [192.168.1.9] (pool-108-0-233-62.lsanca.fios.verizon.net [108.0.233.62]) 
 by smtp.cs.ucla.edu (Postfix) with ESMTPSA id 6FE6439E8008;
 Sat, 1 Mar 2014 16:13:14 -0800 (PST)
Message-ID: <5312779A.3010300@cs.ucla.edu>
Date: Sat, 01 Mar 2014 16:13:14 -0800
From: Paul Eggert
Organization: UCLA Computer Science Department
User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:24.0) Gecko/20100101 Thunderbird/24.3.0
MIME-Version: 1.0
To: Norihiro Tanaka , 16912@debbugs.gnu.org
Subject: Re: bug#16912: [PATCH] no longer use CSET for non-UTF8 locale in DFA engine
References: <20140301184821.6A22.27F6AC2D@kcn.ne.jp>
In-Reply-To: <20140301184821.6A22.27F6AC2D@kcn.ne.jp>
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 7bit
X-Spam-Score: -2.3 (--)
X-Debbugs-Envelope-To: 16912
X-BeenThere: debbugs-submit@debbugs.gnu.org
X-Mailman-Version: 2.1.15
Precedence: list
List-Id:
List-Unsubscribe: ,
List-Archive:
List-Post:
List-Help:
List-Subscribe: ,
Errors-To: debbugs-submit-bounces@debbugs.gnu.org
Sender: "Debbugs-submit"

Thanks for looking into this.  Unfortunately the combination of the two
patches causes "make check" to fail, because it reintroduces a titlecase
bug.  I can draft a further patch for that, but in the meantime can you
look at a few other things?

First, why does the first patch add those four using_utf8 calls to
parse_bracket_exp?  Isn't that optimization valid regardless of whether
the multibyte encoding is UTF-8?

Second, the comment "UTF-8 allows treating a simple, non-inverted MBCSET
like a CSET." no longer seems to match the code, since addtok no longer
invokes using_utf8.

Third, could you please draft a proper commit message?  The format is
something like this:

   grep: minor tuning for mb_case_map_apply

   * src/kwsearch.c (mb_case_map_apply): Avoid unnecessary widening of
   size_t to intmax_t.  Avoid unnecessary reinitialization of k.

That is, a first line of the form "program: short description".  Then an
empty line.  Then a ChangeLog entry in standard GNU format.

I'll take a look at the second patch later.

From debbugs-submit-bounces@debbugs.gnu.org Sat Mar 01 20:23:38 2014
Received: (at 16912) by debbugs.gnu.org; 2 Mar 2014 01:23:38 +0000
Received: from localhost ([127.0.0.1]:47206 helo=debbugs.gnu.org)
 by debbugs.gnu.org with esmtp (Exim 4.80) (envelope-from )
 id 1WJv7x-00059j-FZ for submit@debbugs.gnu.org; Sat, 01 Mar 2014 20:23:37 -0500
Received: from pbsg500.nifty.com ([202.248.238.70]:41343)
 by debbugs.gnu.org with esmtp (Exim 4.80) (envelope-from )
 id 1WJv7u-00059W-Dt for 16912@debbugs.gnu.org; Sat, 01 Mar 2014 20:23:36 -0500
Received: from [10.120.1.51] (i118-21-128-66.s30.a048.ap.plala.or.jp [118.21.128.66])
 (authenticated) by pbsg500.nifty.com with ESMTP id s221NRQf008996;
 Sun, 2 Mar 2014 10:23:27 +0900
X-Nifty-SrcIP: [118.21.128.66]
Date: Sun, 02 Mar 2014 10:23:28 +0900
From: Norihiro Tanaka
To: Paul Eggert
Subject: Re: bug#16912: [PATCH] no longer use CSET for non-UTF8 locale in DFA engine
In-Reply-To: <5312779A.3010300@cs.ucla.edu>
References: <20140301184821.6A22.27F6AC2D@kcn.ne.jp> <5312779A.3010300@cs.ucla.edu>
Message-Id: <20140302102328.25CC.27F6AC2D@kcn.ne.jp>
MIME-Version: 1.0
Content-Type: text/plain; charset="US-ASCII"
Content-Transfer-Encoding: 7bit
X-Mailer: Becky! ver. 2.64.06 [ja]
X-Spam-Score: -0.0 (/)
X-Debbugs-Envelope-To: 16912
Cc: 16912@debbugs.gnu.org
X-BeenThere: debbugs-submit@debbugs.gnu.org
X-Mailman-Version: 2.1.15
Precedence: list
List-Id:
List-Unsubscribe: ,
List-Archive:
List-Post:
List-Help:
List-Subscribe: ,
Errors-To: debbugs-submit-bounces@debbugs.gnu.org
Sender: "Debbugs-submit"

Hi Paul

Thank you for checking the patch.

> First, why does the first patch add those four using_utf8 calls to
> parse_bracket_exp?  Isn't that optimization valid regardless of
> whether the multibyte encoding is UTF-8?

The optimization in addtok that changes an MBCSET into a CSET takes
effect only in UTF-8 locales: in non-UTF8 locales, even if
work_mbc->cset is set, the token is still treated as an MBCSET rather
than a CSET.  So unless the CSET is replaced by an alternation (OR), dfa
keeps the MBCSET to the end and returns backref.  I want to avoid that.

However I don't understand why the optimization isn't completed on
non-UTF8 locale only.  Can you explain it?

Norihiro

From debbugs-submit-bounces@debbugs.gnu.org Sun Mar 02 04:48:26 2014
Received: (at 16912) by debbugs.gnu.org; 2 Mar 2014 09:48:26 +0000
Received: from localhost ([127.0.0.1]:47402 helo=debbugs.gnu.org)
 by debbugs.gnu.org with esmtp (Exim 4.80) (envelope-from )
 id 1WK30U-0002AL-3P for submit@debbugs.gnu.org; Sun, 02 Mar 2014 04:48:26 -0500
Received: from pbsg500.nifty.com ([202.248.238.70]:32927)
 by debbugs.gnu.org with esmtp (Exim 4.80) (envelope-from )
 id 1WK30P-0002A5-6N for 16912@debbugs.gnu.org; Sun, 02 Mar 2014 04:48:24 -0500
Received: from [10.120.1.51] (i118-21-128-66.s30.a048.ap.plala.or.jp [118.21.128.66])
 (authenticated) by pbsg500.nifty.com with ESMTP id s229lxkn004830
 for <16912@debbugs.gnu.org>; Sun, 2 Mar 2014 18:48:00 +0900
X-Nifty-SrcIP: [118.21.128.66]
Date: Sun, 02 Mar 2014 18:47:59 +0900
From: Norihiro Tanaka
To: 16912@debbugs.gnu.org
Subject: Re: bug#16912: [PATCH] no longer use CSET for non-UTF8 locale in DFA engine
In-Reply-To: <20140302102328.25CC.27F6AC2D@kcn.ne.jp>
References: <5312779A.3010300@cs.ucla.edu> <20140302102328.25CC.27F6AC2D@kcn.ne.jp>
Message-Id: <20140302184759.B657.27F6AC2D@kcn.ne.jp>
MIME-Version: 1.0
Content-Type: multipart/mixed; boundary="------_5312F82500000000B64E_MULTIPART_MIXED_"
Content-Transfer-Encoding: 7bit
X-Mailer: Becky! ver. 2.64.06 [ja]
X-Spam-Score: 2.6 (++)
X-Spam-Report: Spam detection software, running on the system "debbugs.gnu.org", has identified this incoming email as possible spam.
 Content analysis details: (2.6 points, 10.0 required)
  pts rule name              description
 ---- ---------------------- --------------------------------------------------
  1.2 RCVD_IN_BL_SPAMCOP_NET RBL: Received via a relay in bl.spamcop.net [Blocked - see ]
 -0.0 SPF_HELO_PASS          SPF: HELO matches SPF record
 -0.0 SPF_PASS               SPF: sender matches SPF record
 -0.0 RP_MATCHES_RCVD        Envelope sender domain matches handover relay domain
  1.3 OBFU_TEXT_ATTACH       BODY: Text attachment with non-text MIME type

--------_5312F82500000000B64E_MULTIPART_MIXED_
Content-Type: text/plain; charset="US-ASCII"
Content-Transfer-Encoding: 7bit

I have made several changes to the patch.

First, I fixed the titlecase bug.

Second, I changed it to prefer a CSET over replacement by OR, in order
to reduce the number of states.

Third, I revised the comments in the source code and put draft commit
messages in the patch.

Norihiro

--------_5312F82500000000B64E_MULTIPART_MIXED_
Content-Type: application/octet-stream; name="patch.txt"
Content-Disposition: attachment; filename="patch.txt"
Content-Transfer-Encoding: base64

RnJvbSBmYzUwZDcxMWRiY2Q3ZmNmODNmMTExZTYxYjFjODY2NGQ0ZTgwNDAzIE1vbiBTZXAgMTcg
MDA6MDA6MDAgMjAwMQpGcm9tOiBOb3JpaGlybyBUYW5ha2EgPG5vcml0bmtAa2NuLm5lLmpwPgpE
YXRlOiBTdW4sIDIgTWFyIDIwMTQgMTc6MDg6MzYgKzA5MDAKU3ViamVjdDogW1BBVENIIDEvMl0g
Z3JlcDogb3B0aW1pemF0aW9uIG9mIGJyYWNrZXQgZXhwcmVzc2lvbiBmb3Igbm9uLVVURjggbG9j
YWxlcwoKSWYgTUJDU0VUIGlzIG5vbi1pbnZlcnRlZCBhbmQgZG9lc24ndCBpbmNsdWRlIG5laXRo
ZXIgY2hhcmFjdGVyIGNsYXNzZXMKaW5jbHVkaW5nIG11bHRpYnl0ZSBjaGFyYWN0ZXJzLCByYW5n
ZSBleHByZXNzaW9ucywgZXF1aXZhbGVuY2UgY2xhc3Nlcwpub3IgY29sbGF0aW5nIGVsZW1lbnRz
LCByZXBsYWNlIGl0IHRvIGEgc2ltcGxlIENTRVQuCgoqIHNyYy9kZmEuYyAoYWRkdG9rKTogQXBw
bHkgcmVwbGFjZW1lbnQgb2YgTUJDU0VUIHRvIHNpbXBsZSBDU0VUIGZvciBub3QKb25seSBVVEY4
IGxvY2FsZSBidXQgbm9uLVVURjggbG9jYWxlcy4KLS0tCiBzcmMvZGZhLmMgfCA4ICsrKystLS0t
CiAxIGZpbGUgY2hhbmdlZCwgNCBpbnNlcnRpb25zKCspLCA0IGRlbGV0aW9ucygtKQoKZGlmZiAt
LWdpdCBhL3NyYy9kZmEuYyBiL3NyYy9kZmEuYwppbmRleCA4MGJiODA3Li41ODVhNTk5IDEwMDY0 NAotLS0gYS9zcmMvZGZhLmMKKysrIGIvc3JjL2RmYS5jCkBAIC0xNTk4LDEwICsxNTk4LDExIEBA IGFkZHRvayAodG9rZW4gdCkKICAgICAgICAgICB3b3JrX21iYy0+bmNoYXJzID0gMDsKICAgICAg ICAgfQogCi0gICAgICAvKiBVVEYtOCBhbGxvd3MgdHJlYXRpbmcgYSBzaW1wbGUsIG5vbi1pbnZl cnRlZCBNQkNTRVQgbGlrZSBhIENTRVQuICAqLworICAgICAgLyogSWYgdGhlIE1CQ1NFVCBpcyBu b24taW52ZXJ0ZWQgYW5kIGRvZXNuJ3QgaW5jbHVkZSBuZWl0aGVyCisgICAgICAgICBjaGFyYWN0 ZXIgY2xhc3NlcyBpbmNsdWRpbmcgbXVsdGlieXRlIGNoYXJhY3RlcnMsIHJhbmdlCisgICAgICAg ICBleHByZXNzaW9ucywgZXF1aXZhbGVuY2UgY2xhc3NlcyBub3IgY29sbGF0aW5nIGVsZW1lbnRz LAorICAgICAgICAgaXQgY2FuIGJlIHJlcGxhY2VkIHRvIGEgc2ltcGxlIENTRVQuICovCiAgICAg ICBpZiAod29ya19tYmMtPmludmVydAotICAgICAgICAgIHx8ICghdXNpbmdfdXRmOCAoKSAmJiB3 b3JrX21iYy0+Y3NldCAhPSAtMSkKLSAgICAgICAgICB8fCB3b3JrX21iYy0+bmNoYXJzICE9IDAK ICAgICAgICAgICB8fCB3b3JrX21iYy0+bmNoX2NsYXNzZXMgIT0gMAogICAgICAgICAgIHx8IHdv cmtfbWJjLT5ucmFuZ2VzICE9IDAKICAgICAgICAgICB8fCB3b3JrX21iYy0+bmVxdWl2cyAhPSAw IHx8IHdvcmtfbWJjLT5uY29sbF9lbGVtcyAhPSAwKQpAQCAtMTYxNiw3ICsxNjE3LDYgQEAgYWRk dG9rICh0b2tlbiB0KQogICAgICAgICAgICAgIHRoYXQgdGhlIG1iY3NldCBpcyBlbXB0eSBub3cu ICBEbyBub3RoaW5nIGluIHRoYXQgY2FzZS4gICovCiAgICAgICAgICAgaWYgKHdvcmtfbWJjLT5j c2V0ICE9IC0xKQogICAgICAgICAgICAgewotICAgICAgICAgICAgICBhc3NlcnQgKHVzaW5nX3V0 ZjggKCkpOwogICAgICAgICAgICAgICBhZGR0b2sgKENTRVQgKyB3b3JrX21iYy0+Y3NldCk7CiAg ICAgICAgICAgICAgIGlmIChuZWVkX29yKQogICAgICAgICAgICAgICAgIGFkZHRvayAoT1IpOwot LSAKMS44LjUuMgoKCkZyb20gOWYyOTQ3MTM4NTBmYzBiZGY3NDhiMTYwZjJmYTI3OTk1YzFmZjVm MSBNb24gU2VwIDE3IDAwOjAwOjAwIDIwMDEKRnJvbTogTm9yaWhpcm8gVGFuYWthIDxub3JpdG5r QGtjbi5uZS5qcD4KRGF0ZTogU3VuLCAyIE1hciAyMDE0IDE3OjE5OjQ3ICswOTAwClN1YmplY3Q6 IFtQQVRDSCAyLzJdIGdyZXA6IHJldmVydCByZW1vdmFsIG9mIHRyaXZpYWxfY2FzZV9pZ25vcmUu CgpSZXZpdmUgdHJpdmlhbF9jYXNlX2lnbm9yZSBmdW5jdGlvbiBpbiBvcmRlciB0byBiZSBhYmxl IHRvIHVzZSBrd3NldC4KCiogc3JjL21haW4uYyAodHJpdmlhbF9jYXNlX2lnbm9yZSk6IE5ldyBm 
dW5jdGlvbi4KLS0tCiBzcmMvbWFpbi5jIHwgMTIwICsrKysrKysrKysrKysrKysrKysrKysrKysr KysrKysrKysrKysrKysrKysrKysrKysrKysrKysrKysrKysKIDEgZmlsZSBjaGFuZ2VkLCAxMjAg aW5zZXJ0aW9ucygrKQoKZGlmZiAtLWdpdCBhL3NyYy9tYWluLmMgYi9zcmMvbWFpbi5jCmluZGV4 IDdjMDRkNzcuLjJlZTU4NWEgMTAwNjQ0Ci0tLSBhL3NyYy9tYWluLmMKKysrIGIvc3JjL21haW4u YwpAQCAtMTg2Nyw2ICsxODY3LDk3IEBAIHBhcnNlX2dyZXBfY29sb3JzICh2b2lkKQogICAgICAg cmV0dXJuOwogfQogCisvKiBJZiB0aGUgbmV3bGluZS1zZXBhcmF0ZWQgcmVndWxhciBleHByZXNz aW9ucywgS0VZUyAod2l0aCBsZW5ndGgsIExFTgorICAgYW5kIG5vIHRyYWlsaW5nIE5VTCBieXRl KSwgYXJlIGFtZW5hYmxlIHRvIHRyYW5zZm9ybWF0aW9uIGludG8KKyAgIG90aGVyd2lzZSBlcXVp dmFsZW50IGNhc2UtaWdub3Jpbmcgb25lcywgcGVyZm9ybSB0aGUgdHJhbnNmb3JtYXRpb24sCisg ICBwdXQgdGhlIHJlc3VsdCBpbnRvIG1hbGxvYydkIG1lbW9yeSwgKk5FV19LRVlTIHdpdGggbGVu Z3RoICpORVdfTEVOLAorICAgYW5kIHJldHVybiB0cnVlLiAgT3RoZXJ3aXNlLCByZXR1cm4gZmFs c2UuICAqLworI2RlZmluZSBNQlJUT1dDKHB3YywgcywgbiwgcHMpIFwKKyAgKE1CX0NVUl9NQVgg PT0gMSA/IFwKKyAgICgqKHB3YykgPSBidG93YyAoKih1bnNpZ25lZCBjaGFyICopIChzKSksIDEp IDogXAorICAgbWJydG93YyAoKHB3YyksIChzKSwgKG4pLCAocHMpKSkKKyNkZWZpbmUgV0NSVE9N QihzLCB3YywgcHMpIFwKKyAgKE1CX0NVUl9NQVggPT0gMSA/IFwKKyAgICgqKHMpID0gd2N0b2Ig KCh3aW50X3QpICh3YykpLCAxKSA6IFwKKyAgIHdjcnRvbWIgKChzKSwgKHdjKSwgKHBzKSkpCisK K3N0YXRpYyBib29sCit0cml2aWFsX2Nhc2VfaWdub3JlIChzaXplX3QgbGVuLCBjaGFyIGNvbnN0 ICprZXlzLAorICAgICAgICAgICAgICAgICAgICAgc2l6ZV90ICpuZXdfbGVuLCBjaGFyICoqbmV3 X2tleXMpCit7CisgIC8qIEZJWE1FOiBjb25zaWRlciByZW1vdmluZyB0aGUgZm9sbG93aW5nIHJl c3RyaWN0aW9uOgorICAgICBSZWplY3QgaWYgS0VZUyBjb250YWluIEFTQ0lJICdcXCcgb3IgJ1sn LiAgKi8KKyAgaWYgKG1lbWNociAoa2V5cywgJ1xcJywgbGVuKSB8fCBtZW1jaHIgKGtleXMsICdb JywgbGVuKSkKKyAgICByZXR1cm4gZmFsc2U7CisKKyAgLyogV29yc3QgY2FzZSBpcyB0aGF0IGVh Y2ggYnl0ZSBCIG9mIEtFWVMgaXMgQVNDSUkgYWxwaGFiZXRpYyBhbmQgZWFjaAorICAgICBvdGhl cl9jYXNlKEIpIGNoYXJhY3RlciwgQywgb2NjdXBpZXMgTUJfQ1VSX01BWCBieXRlcywgc28gZWFj aCBCCisgICAgIG1hcHMgdG8gW0JDXSwgd2hpY2ggcmVxdWlyZXMgTUJfQ1VSX01BWCArIDMgYnl0 
ZXMuICAgKi8KKyAgKm5ld19rZXlzID0geG5tYWxsb2MgKE1CX0NVUl9NQVggKyAzLCBsZW4gKyAx KTsKKyAgY2hhciAqcCA9ICpuZXdfa2V5czsKKworICBtYnN0YXRlX3QgbWJfc3RhdGU7CisgIG1l bXNldCAoJm1iX3N0YXRlLCAwLCBzaXplb2YgbWJfc3RhdGUpOworICB3aGlsZSAobGVuKQorICAg IHsKKyAgICAgIHdjaGFyX3Qgd2M7CisgICAgICBpbnQgbiA9IE1CUlRPV0MgKCZ3Yywga2V5cywg bGVuLCAmbWJfc3RhdGUpOworCisgICAgICAvKiBGb3IgYW4gaW52YWxpZCwgaW5jb21wbGV0ZSBv ciBMJ1wwJywgc2tpcCB0aGlzIG9wdGltaXphdGlvbi4gICovCisgICAgICBpZiAobiA8PSAwKQor ICAgICAgICB7CisgICAgICAgIHNraXBfY2FzZV9pZ25vcmVfb3B0aW1pemF0aW9uOgorICAgICAg ICAgIGZyZWUgKCpuZXdfa2V5cyk7CisgICAgICAgICAgcmV0dXJuIGZhbHNlOworICAgICAgICB9 CisKKyAgICAgIGNoYXIgY29uc3QgKm9yaWcgPSBrZXlzOworICAgICAga2V5cyArPSBuOworICAg ICAgbGVuIC09IG47CisKKyAgICAgIGlmICghaXN3YWxwaGEgKHdjKSkKKyAgICAgICAgeworICAg ICAgICAgIG1lbWNweSAocCwgb3JpZywgbik7CisgICAgICAgICAgcCArPSBuOworICAgICAgICB9 CisgICAgICBlbHNlCisgICAgICAgIHsKKyAgICAgICAgICAqcCsrID0gJ1snOworICAgICAgICAg IG1lbWNweSAocCwgb3JpZywgbik7CisgICAgICAgICAgcCArPSBuOworCisgICAgICAgICAgd2lu dF90IGZvbGRlZCA9IHRvd2xvd2VyICh3Yyk7CisgICAgICAgICAgaWYgKGZvbGRlZCAhPSB3YykK KyAgICAgICAgICAgIHsKKyAgICAgICAgICAgICAgY2hhciBidWZbTUJfQ1VSX01BWF07CisgICAg ICAgICAgICAgIGludCBuMiA9IFdDUlRPTUIgKGJ1ZiwgZm9sZGVkLCAmbWJfc3RhdGUpOworICAg ICAgICAgICAgICBpZiAobjIgPD0gMCkKKyAgICAgICAgICAgICAgICBnb3RvIHNraXBfY2FzZV9p Z25vcmVfb3B0aW1pemF0aW9uOworICAgICAgICAgICAgICBhc3NlcnQgKG4yIDw9IE1CX0NVUl9N QVgpOworICAgICAgICAgICAgICBtZW1jcHkgKHAsIGJ1ZiwgbjIpOworICAgICAgICAgICAgICBw ICs9IG4yOworICAgICAgICAgICAgfQorICAgICAgICAgIGZvbGRlZCA9IHRvd3VwcGVyICh3Yyk7 CisgICAgICAgICAgaWYgKGZvbGRlZCAhPSB3YykKKyAgICAgICAgICAgIHsKKyAgICAgICAgICAg ICAgY2hhciBidWZbTUJfQ1VSX01BWF07CisgICAgICAgICAgICAgIGludCBuMiA9IFdDUlRPTUIg KGJ1ZiwgZm9sZGVkLCAmbWJfc3RhdGUpOworICAgICAgICAgICAgICBpZiAobjIgPD0gMCkKKyAg ICAgICAgICAgICAgICBnb3RvIHNraXBfY2FzZV9pZ25vcmVfb3B0aW1pemF0aW9uOworICAgICAg ICAgICAgICBhc3NlcnQgKG4yIDw9IE1CX0NVUl9NQVgpOworICAgICAgICAgICAgICBtZW1jcHkg 
KHAsIGJ1ZiwgbjIpOworICAgICAgICAgICAgICBwICs9IG4yOworICAgICAgICAgICAgfQorCisg ICAgICAgICAgKnArKyA9ICddJzsKKyAgICAgICAgfQorICAgIH0KKworICAqbmV3X2xlbiA9IHAg LSAqbmV3X2tleXM7CisKKyAgcmV0dXJuIHRydWU7Cit9CisKIGludAogbWFpbiAoaW50IGFyZ2Ms IGNoYXIgKiphcmd2KQogewpAQCAtMjI2MSw2ICsyMzUyLDM1IEBAIG1haW4gKGludCBhcmdjLCBj aGFyICoqYXJndikKICAgZWxzZQogICAgIHVzYWdlIChFWElUX1RST1VCTEUpOwogCisgIC8qIEFz IGN1cnJlbnRseSBpbXBsZW1lbnRlZCwgY2FzZS1pbnNlbnNpdGl2ZSBtYXRjaGluZyBpcyBleHBl bnNpdmUgaW4KKyAgICAgbXVsdGktYnl0ZSBsb2NhbGVzIGJlY2F1c2Ugb2YgYSBmZXcgb3V0bGll ciBsb2NhbGVzIGluIHdoaWNoIHNvbWUKKyAgICAgY2hhcmFjdGVycyBjaGFuZ2Ugc2l6ZSB3aGVu IGNvbnZlcnRlZCB0byB1cHBlciBvciBsb3dlciBjYXNlLiAgVG8KKyAgICAgYWNjb21tb2RhdGUg dGhvc2UsIHdlIHJldmVydCB0byBzZWFyY2hpbmcgdGhlIGlucHV0IG9uZSBsaW5lIGF0IGEKKyAg ICAgdGltZSwgcmF0aGVyIHRoYW4gdXNpbmcgdGhlIG11Y2ggbW9yZSBlZmZpY2llbnQgYnVmZmVy IHNlYXJjaC4KKyAgICAgSG93ZXZlciwgaWYgd2UgaGF2ZSBhIHJlZ3VsYXIgZXhwcmVzc2lvbiwg L2Zvby9pLCB3ZSBjYW4gY29udmVydAorICAgICBpdCB0byBhbiBlcXVpdmFsZW50IGNhc2UtaW5z ZW5zaXRpdmUgL1tmRl1bb09dW29PXS8sIGFuZCB0aHVzCisgICAgIGF2b2lkIHRoZSBleHBlbnNp dmUgcmVhZC1hbmQtcHJvY2Vzcy1hLWxpbmUtYXQtYS10aW1lIHJlcXVpcmVtZW50LgorICAgICBP cHRpbWl6ZS1hd2F5IHRoZSAiLWkiIG9wdGlvbiwgd2hlbiBwb3NzaWJsZSwgY29udmVydGluZyBl YWNoCisgICAgIGNhbmRpZGF0ZSBhbHBoYSwgQywgaW4gdGhlIHJlZ2V4cCB0byBbQ2NdLiAgKi8K KyAgaWYgKG1hdGNoX2ljYXNlKQorICAgIHsKKyAgICAgIHNpemVfdCBuZXdfa2V5Y2M7CisgICAg ICBjaGFyICpuZXdfa2V5czsKKyAgICAgIC8qIEl0IGlzIG5vdCBwb3NzaWJsZSB3aXRoIC1GLCBu b3QgdXNlZnVsIHdpdGggLVAgKHBjcmUpIGFuZCB0aGVyZSBpcyBubworICAgICAgICAgcG9pbnQg d2hlbiB0aGVyZSBpcyBubyByZWdleHAuICBJdCBhbHNvIGRlcGVuZHMgb24gd2hpY2ggY29uc3Ry dWN0cworICAgICAgICAgYXBwZWFyIGluIHRoZSByZWdleHAuICBTZWUgdHJpdmlhbF9jYXNlX2ln bm9yZSBmb3IgdGhvc2UgZGV0YWlscy4gICovCisgICAgICBpZiAoa2V5Y2MKKyAgICAgICAgICAm JiAhIChtYXRjaGVyCisgICAgICAgICAgICAgICAgJiYgKFNUUkVRIChtYXRjaGVyLCAiZmdyZXAi KSB8fCBTVFJFUSAobWF0Y2hlciwgInBjcmUiKSkpCisgICAgICAgICAgJiYgdHJpdmlhbF9jYXNl 
X2lnbm9yZSAoa2V5Y2MsIGtleXMsICZuZXdfa2V5Y2MsICZuZXdfa2V5cykpCisgICAgICAgIHsK KyAgICAgICAgICBtYXRjaF9pY2FzZSA9IDA7CisgICAgICAgICAgZnJlZSAoa2V5cyk7CisgICAg ICAgICAga2V5cyA9IG5ld19rZXlzOworICAgICAgICAgIGtleWNjID0gbmV3X2tleWNjOworICAg ICAgICB9CisgICAgfQorCiAjaWYgTUJTX1NVUFBPUlQKICAgaWYgKE1CX0NVUl9NQVggPiAxKQog ICAgIGJ1aWxkX21iY2xlbl9jYWNoZSAoKTsKLS0gCjEuOC41LjIKCg== --------_5312F82500000000B64E_MULTIPART_MIXED_-- From debbugs-submit-bounces@debbugs.gnu.org Mon Mar 03 01:13:30 2014 Received: (at 16912) by debbugs.gnu.org; 3 Mar 2014 06:13:30 +0000 Received: from localhost ([127.0.0.1]:48433 helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.80) (envelope-from ) id 1WKM82-0003j2-2n for submit@debbugs.gnu.org; Mon, 03 Mar 2014 01:13:30 -0500 Received: from smtp.cs.ucla.edu ([131.179.128.62]:60736) by debbugs.gnu.org with esmtp (Exim 4.80) (envelope-from ) id 1WKM7z-0003it-Db for 16912@debbugs.gnu.org; Mon, 03 Mar 2014 01:13:27 -0500 Received: from localhost (localhost.localdomain [127.0.0.1]) by smtp.cs.ucla.edu (Postfix) with ESMTP id 729C239E8018; Sun, 2 Mar 2014 22:13:26 -0800 (PST) X-Virus-Scanned: amavisd-new at smtp.cs.ucla.edu Received: from smtp.cs.ucla.edu ([127.0.0.1]) by localhost (smtp.cs.ucla.edu [127.0.0.1]) (amavisd-new, port 10024) with ESMTP id u7ovOT5kAU5U; Sun, 2 Mar 2014 22:13:26 -0800 (PST) Received: from [192.168.1.9] (pool-108-0-233-62.lsanca.fios.verizon.net [108.0.233.62]) by smtp.cs.ucla.edu (Postfix) with ESMTPSA id 1A70C39E8012; Sun, 2 Mar 2014 22:13:26 -0800 (PST) Message-ID: <53141D85.1000602@cs.ucla.edu> Date: Sun, 02 Mar 2014 22:13:25 -0800 From: Paul Eggert Organization: UCLA Computer Science Department User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:24.0) Gecko/20100101 Thunderbird/24.3.0 MIME-Version: 1.0 To: Norihiro Tanaka Subject: Re: bug#16912: [PATCH] no longer use CSET for non-UTF8 locale in DFA engine References: <20140301184821.6A22.27F6AC2D@kcn.ne.jp> <5312779A.3010300@cs.ucla.edu> 
<20140302102328.25CC.27F6AC2D@kcn.ne.jp> In-Reply-To: <20140302102328.25CC.27F6AC2D@kcn.ne.jp> Content-Type: text/plain; charset=UTF-8; format=flowed Content-Transfer-Encoding: 7bit X-Spam-Score: -2.3 (--) X-Debbugs-Envelope-To: 16912 Cc: 16912@debbugs.gnu.org X-BeenThere: debbugs-submit@debbugs.gnu.org X-Mailman-Version: 2.1.15 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: debbugs-submit-bounces@debbugs.gnu.org Sender: "Debbugs-submit" X-Spam-Score: -2.3 (--) Norihiro Tanaka wrote: > However I don't understand why the optimization isn't completed on > non-UTF8 locale only. Can you explain it? Sorry, no; there's a lot about that code I don't yet understand. From debbugs-submit-bounces@debbugs.gnu.org Mon Mar 03 02:07:48 2014 Received: (at 16912-done) by debbugs.gnu.org; 3 Mar 2014 07:07:48 +0000 Received: from localhost ([127.0.0.1]:48470 helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.80) (envelope-from ) id 1WKMyZ-00058j-AA for submit@debbugs.gnu.org; Mon, 03 Mar 2014 02:07:47 -0500 Received: from smtp.cs.ucla.edu ([131.179.128.62]:34234) by debbugs.gnu.org with esmtp (Exim 4.80) (envelope-from ) id 1WKMyV-00058Z-PA for 16912-done@debbugs.gnu.org; Mon, 03 Mar 2014 02:07:44 -0500 Received: from localhost (localhost.localdomain [127.0.0.1]) by smtp.cs.ucla.edu (Postfix) with ESMTP id 07408A60001; Sun, 2 Mar 2014 23:07:43 -0800 (PST) X-Virus-Scanned: amavisd-new at smtp.cs.ucla.edu Received: from smtp.cs.ucla.edu ([127.0.0.1]) by localhost (smtp.cs.ucla.edu [127.0.0.1]) (amavisd-new, port 10024) with ESMTP id BjSEL--njcRD; Sun, 2 Mar 2014 23:07:41 -0800 (PST) Received: from [192.168.1.9] (pool-108-0-233-62.lsanca.fios.verizon.net [108.0.233.62]) by smtp.cs.ucla.edu (Postfix) with ESMTPSA id 9276839E8008; Sun, 2 Mar 2014 23:07:41 -0800 (PST) Message-ID: <53142A3D.80202@cs.ucla.edu> Date: Sun, 02 Mar 2014 23:07:41 -0800 From: Paul Eggert Organization: UCLA Computer Science Department 
User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:24.0) Gecko/20100101 Thunderbird/24.3.0 MIME-Version: 1.0 To: Norihiro Tanaka , 16912-done@debbugs.gnu.org Subject: Re: bug#16912: [PATCH] no longer use CSET for non-UTF8 locale in DFA engine References: <5312779A.3010300@cs.ucla.edu> <20140302102328.25CC.27F6AC2D@kcn.ne.jp> <20140302184759.B657.27F6AC2D@kcn.ne.jp> In-Reply-To: <20140302184759.B657.27F6AC2D@kcn.ne.jp> Content-Type: multipart/mixed; boundary="------------080400040706000805040500" X-Spam-Score: -2.3 (--) X-Debbugs-Envelope-To: 16912-done X-BeenThere: debbugs-submit@debbugs.gnu.org X-Mailman-Version: 2.1.15 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: debbugs-submit-bounces@debbugs.gnu.org Sender: "Debbugs-submit" X-Spam-Score: -2.3 (--) This is a multi-part message in MIME format. --------------080400040706000805040500 Content-Type: text/plain; charset=UTF-8; format=flowed Content-Transfer-Encoding: 7bit Thanks, I tweaked the ChangeLog entries a bit and pushed that. I also pushed the attached patch, which fixes some new bugs and some bugs that were reintroduced by the revival of trivial_case_ignore. I wish we didn't need that function, as it is a bit of a kludge. 
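
For readers unfamiliar with the function discussed above: the comment in the attached patch describes trivial_case_ignore as rewriting a pattern such as /foo/i into the case-sensitive /[fF][oO][oO]/, and giving up when the pattern contains a backslash or "[". The following is only a minimal ASCII-only sketch of that idea, not the code in src/main.c. The name fold_ascii_pattern is hypothetical, and the real function also has to deal with multibyte characters through mbrtowc and wcrtomb, which this sketch ignores.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper (ASCII only): rewrite each letter C of PATTERN
   as "[Cc]" so that the result can be matched case-sensitively.
   Return a malloc'd NUL-terminated string, or NULL if PATTERN contains
   '\\' or '[' (rewriting those blindly could change the meaning) or if
   allocation fails.  */
char *
fold_ascii_pattern (char const *pattern)
{
  assert (pattern != NULL);
  size_t len = strlen (pattern);

  if (memchr (pattern, '\\', len) || memchr (pattern, '[', len))
    return NULL;

  /* Worst case: every byte is a letter and grows to the 4 bytes "[Cc]".  */
  char *out = malloc (4 * len + 1);
  if (!out)
    return NULL;

  char *p = out;
  for (size_t i = 0; i < len; i++)
    {
      unsigned char c = pattern[i];
      bool is_letter = ('a' <= c && c <= 'z') || ('A' <= c && c <= 'Z');
      if (is_letter)
        {
          *p++ = '[';
          *p++ = c;
          *p++ = c ^ 0x20;      /* The other case of an ASCII letter.  */
          *p++ = ']';
        }
      else
        *p++ = c;
    }
  *p = '\0';
  return out;
}
```

Under these assumptions, calling fold_ascii_pattern ("foo") yields "[fF][oO][oO]", and a pattern such as "a[b]" is rejected with NULL so that the caller can fall back to ordinary case-insensitive matching.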
--------------080400040706000805040500 Content-Type: text/plain; charset=UTF-8; name="0001-grep-fix-some-unlikely-bugs-in-trivial_case_ignore.patch" Content-Transfer-Encoding: base64 Content-Disposition: attachment; filename*0="0001-grep-fix-some-unlikely-bugs-in-trivial_case_ignore.patc"; filename*1="h" RnJvbSBkNjNhYmU3ZWU5OTBlZTg0ZGFjMzY2NTY0ZTEyZDBjMWI0MTAyMzgyIE1vbiBTZXAg MTcgMDA6MDA6MDAgMjAwMQpGcm9tOiBQYXVsIEVnZ2VydCA8ZWdnZXJ0QGNzLnVjbGEuZWR1 PgpEYXRlOiBTdW4sIDIgTWFyIDIwMTQgMjM6MDI6MjIgLTA4MDAKU3ViamVjdDogW1BBVENI XSBncmVwOiBmaXggc29tZSB1bmxpa2VseSBidWdzIGluIHRyaXZpYWxfY2FzZV9pZ25vcmUK Ciogc3JjL21haW4uYyAoTUJSVE9XQywgV0NSVE9NQik6IFJlZm9ybWF0IGFzIHBlciB1c3Vh bCBHTlUgc3R5bGUuCih0cml2aWFsX2Nhc2VfaWdub3JlKTogRG9uJ3Qgb3ZlcnJ1biBidWZm ZXIgaW4gdGhlIHVudXN1YWwgY2FzZQp3aGVuIGEgY2hhcmFjdGVyIGhhcyBib3RoIGxvd2Vy Y2FzZSBhbmQgdXBwZXJjYXNlIGNvdW50ZXJwYXJ0cy4KRG9uJ3QgcmVseSBvbiB1bmRlZmlu ZWQgYmVoYXZpb3Igd2hlbiBhc3NpZ25pbmcgb3V0LW9mLXJhbmdlIHZhbHVlCnRvIGFuICdp bnQnLiAgU2ltcGxpZnkgYnkgYXZvaWRpbmcgdW5uZWNlc3NhcnkgYnVmZmVyIGNvcGllcy4K V29yayBldmVuIHdpdGggc2hpZnQgZW5jb2RpbmdzLCBieSB1c2luZyBtYnNpbml0IHRvCmRp c2FibGUgdGhlIG9wdGltaXphdGlvbiBpZiB3ZSBhcmUgbm90IGluIHRoZSBpbml0aWFsIHN0 YXRlCndoZW4gd2UgcmVwbGFjZSBCIGJ5IFtCQ0RdLgotLS0KIHNyYy9tYWluLmMgfCA3MiAr KysrKysrKysrKysrKysrKysrKysrKysrKysrLS0tLS0tLS0tLS0tLS0tLS0tLS0tLS0tLS0t LS0tLS0tLQogMSBmaWxlIGNoYW5nZWQsIDMzIGluc2VydGlvbnMoKyksIDM5IGRlbGV0aW9u cygtKQoKZGlmZiAtLWdpdCBhL3NyYy9tYWluLmMgYi9zcmMvbWFpbi5jCmluZGV4IDJlZTU4 NWEuLjE0YjdiZTIgMTAwNjQ0Ci0tLSBhL3NyYy9tYWluLmMKKysrIGIvc3JjL21haW4uYwpA QCAtMTg2NywxOSArMTg2NywyMCBAQCBwYXJzZV9ncmVwX2NvbG9ycyAodm9pZCkKICAgICAg IHJldHVybjsKIH0KIAorI2RlZmluZSBNQlJUT1dDKHB3YywgcywgbiwgcHMpIFwKKyAgKE1C X0NVUl9NQVggPT0gMSBcCisgICA/ICgqKHB3YykgPSBidG93YyAoKih1bnNpZ25lZCBjaGFy ICopIChzKSksIDEpIFwKKyAgIDogbWJydG93YyAocHdjLCBzLCBuLCBwcykpCisjZGVmaW5l IFdDUlRPTUIocywgd2MsIHBzKSBcCisgIChNQl9DVVJfTUFYID09IDEgXAorICAgPyAoKihz KSA9IHdjdG9iICgod2ludF90KSAod2MpKSwgMSkgXAorICAgOiB3Y3J0b21iIChzLCB3Yywg 
cHMpKQorCiAvKiBJZiB0aGUgbmV3bGluZS1zZXBhcmF0ZWQgcmVndWxhciBleHByZXNzaW9u cywgS0VZUyAod2l0aCBsZW5ndGgsIExFTgogICAgYW5kIG5vIHRyYWlsaW5nIE5VTCBieXRl KSwgYXJlIGFtZW5hYmxlIHRvIHRyYW5zZm9ybWF0aW9uIGludG8KICAgIG90aGVyd2lzZSBl cXVpdmFsZW50IGNhc2UtaWdub3Jpbmcgb25lcywgcGVyZm9ybSB0aGUgdHJhbnNmb3JtYXRp b24sCiAgICBwdXQgdGhlIHJlc3VsdCBpbnRvIG1hbGxvYydkIG1lbW9yeSwgKk5FV19LRVlT IHdpdGggbGVuZ3RoICpORVdfTEVOLAogICAgYW5kIHJldHVybiB0cnVlLiAgT3RoZXJ3aXNl LCByZXR1cm4gZmFsc2UuICAqLwotI2RlZmluZSBNQlJUT1dDKHB3YywgcywgbiwgcHMpIFwK LSAgKE1CX0NVUl9NQVggPT0gMSA/IFwKLSAgICgqKHB3YykgPSBidG93YyAoKih1bnNpZ25l ZCBjaGFyICopIChzKSksIDEpIDogXAotICAgbWJydG93YyAoKHB3YyksIChzKSwgKG4pLCAo cHMpKSkKLSNkZWZpbmUgV0NSVE9NQihzLCB3YywgcHMpIFwKLSAgKE1CX0NVUl9NQVggPT0g MSA/IFwKLSAgICgqKHMpID0gd2N0b2IgKCh3aW50X3QpICh3YykpLCAxKSA6IFwKLSAgIHdj cnRvbWIgKChzKSwgKHdjKSwgKHBzKSkpCiAKIHN0YXRpYyBib29sCiB0cml2aWFsX2Nhc2Vf aWdub3JlIChzaXplX3QgbGVuLCBjaGFyIGNvbnN0ICprZXlzLApAQCAtMTg5MCwyMSArMTg5 MSwyMyBAQCB0cml2aWFsX2Nhc2VfaWdub3JlIChzaXplX3QgbGVuLCBjaGFyIGNvbnN0ICpr ZXlzLAogICBpZiAobWVtY2hyIChrZXlzLCAnXFwnLCBsZW4pIHx8IG1lbWNociAoa2V5cywg J1snLCBsZW4pKQogICAgIHJldHVybiBmYWxzZTsKIAotICAvKiBXb3JzdCBjYXNlIGlzIHRo YXQgZWFjaCBieXRlIEIgb2YgS0VZUyBpcyBBU0NJSSBhbHBoYWJldGljIGFuZCBlYWNoCi0g ICAgIG90aGVyX2Nhc2UoQikgY2hhcmFjdGVyLCBDLCBvY2N1cGllcyBNQl9DVVJfTUFYIGJ5 dGVzLCBzbyBlYWNoIEIKLSAgICAgbWFwcyB0byBbQkNdLCB3aGljaCByZXF1aXJlcyBNQl9D VVJfTUFYICsgMyBieXRlcy4gICAqLwotICAqbmV3X2tleXMgPSB4bm1hbGxvYyAoTUJfQ1VS X01BWCArIDMsIGxlbiArIDEpOworICAvKiBXb3JzdCBjYXNlIGlzIHRoYXQgZWFjaCBieXRl IEIgb2YgS0VZUyBpcyBBU0NJSSBhbHBoYWJldGljIGFuZAorICAgICB0aGUgdHdvIHR3byBv dGhlcl9jYXNlKEIpIGNoYXJhY3RlcnMsIEMgYW5kIEQsIGVhY2ggb2NjdXBpZXMKKyAgICAg TUJfQ1VSX01BWCBieXRlcywgc28gZWFjaCBCIG1hcHMgdG8gW0JDRF0sIHdoaWNoIHJlcXVp cmVzIDIgKgorICAgICBNQl9DVVJfTUFYICsgMyBieXRlczsgdGhpcyBpcyBib3VuZGVkIGFi b3ZlIGJ5IHRoZSBjb25zdGFudAorICAgICBleHByZXNzaW9uIDIgKiBNQl9MRU5fTUFYICsg My4gICovCisgICpuZXdfa2V5cyA9IHhubWFsbG9jIChsZW4gKyAxLCAyICogTUJfTEVOX01B 
WCArIDMpOwogICBjaGFyICpwID0gKm5ld19rZXlzOwogCi0gIG1ic3RhdGVfdCBtYl9zdGF0 ZTsKLSAgbWVtc2V0ICgmbWJfc3RhdGUsIDAsIHNpemVvZiBtYl9zdGF0ZSk7CisgIG1ic3Rh dGVfdCBtYl9zdGF0ZSA9IHsgMCB9OwogICB3aGlsZSAobGVuKQogICAgIHsKKyAgICAgIGJv b2wgaW5pdGlhbF9zdGF0ZSA9IG1ic2luaXQgKCZtYl9zdGF0ZSkgIT0gMDsKICAgICAgIHdj aGFyX3Qgd2M7Ci0gICAgICBpbnQgbiA9IE1CUlRPV0MgKCZ3Yywga2V5cywgbGVuLCAmbWJf c3RhdGUpOworICAgICAgc2l6ZV90IG4gPSBNQlJUT1dDICgmd2MsIGtleXMsIGxlbiwgJm1i X3N0YXRlKTsKIAogICAgICAgLyogRm9yIGFuIGludmFsaWQsIGluY29tcGxldGUgb3IgTCdc MCcsIHNraXAgdGhpcyBvcHRpbWl6YXRpb24uICAqLwotICAgICAgaWYgKG4gPD0gMCkKKyAg ICAgIGlmICgoc2l6ZV90KSAtMiA8PSBuKQogICAgICAgICB7CiAgICAgICAgIHNraXBfY2Fz ZV9pZ25vcmVfb3B0aW1pemF0aW9uOgogICAgICAgICAgIGZyZWUgKCpuZXdfa2V5cyk7CkBA IC0xOTE1LDM5ICsxOTE4LDMwIEBAIHRyaXZpYWxfY2FzZV9pZ25vcmUgKHNpemVfdCBsZW4s IGNoYXIgY29uc3QgKmtleXMsCiAgICAgICBrZXlzICs9IG47CiAgICAgICBsZW4gLT0gbjsK IAotICAgICAgaWYgKCFpc3dhbHBoYSAod2MpKQorICAgICAgd2ludF90IGxjID0gdG93bG93 ZXIgKHdjKTsKKyAgICAgIHdpbnRfdCB1YyA9IHRvd3VwcGVyICh3Yyk7CisgICAgICBpZiAo bGMgPT0gd2MgJiYgdWMgPT0gd2MpCiAgICAgICAgIHsKICAgICAgICAgICBtZW1jcHkgKHAs IG9yaWcsIG4pOwogICAgICAgICAgIHAgKz0gbjsKICAgICAgICAgfQorICAgICAgZWxzZSBp ZiAoISBpbml0aWFsX3N0YXRlKQorICAgICAgICBnb3RvIHNraXBfY2FzZV9pZ25vcmVfb3B0 aW1pemF0aW9uOwogICAgICAgZWxzZQogICAgICAgICB7CiAgICAgICAgICAgKnArKyA9ICdb JzsKICAgICAgICAgICBtZW1jcHkgKHAsIG9yaWcsIG4pOwogICAgICAgICAgIHAgKz0gbjsK IAotICAgICAgICAgIHdpbnRfdCBmb2xkZWQgPSB0b3dsb3dlciAod2MpOwotICAgICAgICAg IGlmIChmb2xkZWQgIT0gd2MpCi0gICAgICAgICAgICB7Ci0gICAgICAgICAgICAgIGNoYXIg YnVmW01CX0NVUl9NQVhdOwotICAgICAgICAgICAgICBpbnQgbjIgPSBXQ1JUT01CIChidWYs IGZvbGRlZCwgJm1iX3N0YXRlKTsKLSAgICAgICAgICAgICAgaWYgKG4yIDw9IDApCi0gICAg ICAgICAgICAgICAgZ290byBza2lwX2Nhc2VfaWdub3JlX29wdGltaXphdGlvbjsKLSAgICAg ICAgICAgICAgYXNzZXJ0IChuMiA8PSBNQl9DVVJfTUFYKTsKLSAgICAgICAgICAgICAgbWVt Y3B5IChwLCBidWYsIG4yKTsKLSAgICAgICAgICAgICAgcCArPSBuMjsKLSAgICAgICAgICAg IH0KLSAgICAgICAgICBmb2xkZWQgPSB0b3d1cHBlciAod2MpOwotICAgICAgICAgIGlmIChm 
b2xkZWQgIT0gd2MpCi0gICAgICAgICAgICB7Ci0gICAgICAgICAgICAgIGNoYXIgYnVmW01C X0NVUl9NQVhdOwotICAgICAgICAgICAgICBpbnQgbjIgPSBXQ1JUT01CIChidWYsIGZvbGRl ZCwgJm1iX3N0YXRlKTsKLSAgICAgICAgICAgICAgaWYgKG4yIDw9IDApCi0gICAgICAgICAg ICAgICAgZ290byBza2lwX2Nhc2VfaWdub3JlX29wdGltaXphdGlvbjsKLSAgICAgICAgICAg ICAgYXNzZXJ0IChuMiA8PSBNQl9DVVJfTUFYKTsKLSAgICAgICAgICAgICAgbWVtY3B5IChw LCBidWYsIG4yKTsKLSAgICAgICAgICAgICAgcCArPSBuMjsKLSAgICAgICAgICAgIH0KKyAg ICAgICAgICBzaXplX3QgbGNieXRlcyA9IFdDUlRPTUIgKHAsIGxjLCAmbWJfc3RhdGUpOwor ICAgICAgICAgIGlmIChsY2J5dGVzID09IChzaXplX3QpIC0xKQorICAgICAgICAgICAgZ290 byBza2lwX2Nhc2VfaWdub3JlX29wdGltaXphdGlvbjsKKyAgICAgICAgICBwICs9IGxjYnl0 ZXM7CisKKyAgICAgICAgICBzaXplX3QgdWNieXRlcyA9IFdDUlRPTUIgKHAsIHVjLCAmbWJf c3RhdGUpOworICAgICAgICAgIGlmICh1Y2J5dGVzID09IChzaXplX3QpIC0xIHx8ICEgbWJz aW5pdCAoJm1iX3N0YXRlKSkKKyAgICAgICAgICAgIGdvdG8gc2tpcF9jYXNlX2lnbm9yZV9v cHRpbWl6YXRpb247CisgICAgICAgICAgcCArPSB1Y2J5dGVzOwogCiAgICAgICAgICAgKnAr KyA9ICddJzsKICAgICAgICAgfQotLSAKMS44LjUuMwoK --------------080400040706000805040500-- From debbugs-submit-bounces@debbugs.gnu.org Tue Mar 04 10:50:12 2014 Received: (at 16912) by debbugs.gnu.org; 4 Mar 2014 15:50:12 +0000 Received: from localhost ([127.0.0.1]:50483 helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.80) (envelope-from ) id 1WKrbf-0004qe-C7 for submit@debbugs.gnu.org; Tue, 04 Mar 2014 10:50:11 -0500 Received: from mail-qc0-f173.google.com ([209.85.216.173]:57360) by debbugs.gnu.org with esmtp (Exim 4.80) (envelope-from ) id 1WKrbY-0004qQ-4A for 16912@debbugs.gnu.org; Tue, 04 Mar 2014 10:50:05 -0500 Received: by mail-qc0-f173.google.com with SMTP id r5so4320415qcx.18 for <16912@debbugs.gnu.org>; Tue, 04 Mar 2014 07:50:03 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113; h=sender:message-id:date:from:user-agent:mime-version:to:cc:subject :references:in-reply-to:content-type:content-transfer-encoding; bh=EWH290Byc1PT3KzkSc88SlFNijZOZwyjoOpP3CmhYVg=; 
b=RdVOTpaqwRFvyMnobwvtWAL0n+1yAKru2mlnl9aeOrLD34h8dQ4kFbPxON1KMlPA1C kxlJ/Ap7YOeGXRsZ2OCOK1fe1NLUVXJ5LTpshtWiXKei+TxzdecHP6Ax2T9srakPW5lS vsgU5be3rrR2CmgeL1+MGNghwwEQv7kcB9LHM7OjKcsrwXHqZtX1LoVV54Tqn25iRDMl YPKKUSiKx0bJkuhtLfh5OLLEKWNXqWbbg3N8xjKPw+DEVswpEgx4L+ayusVpR7+3Do22 hKIvN5imrinrxn8PF6WSkRLIFiZhQrvBanPJkzcNYbGT3gScSB4YQQY4c+ddTVGrdptE sG5A== X-Received: by 10.140.34.99 with SMTP id k90mr238753qgk.15.1393948203333; Tue, 04 Mar 2014 07:50:03 -0800 (PST) Received: from yakj.usersys.redhat.com (net-37-117-154-249.cust.vodafonedsl.it. [37.117.154.249]) by mx.google.com with ESMTPSA id l6sm50589323qac.8.2014.03.04.07.50.01 for (version=TLSv1 cipher=ECDHE-RSA-RC4-SHA bits=128/128); Tue, 04 Mar 2014 07:50:02 -0800 (PST) Message-ID: <5315F627.6050802@gnu.org> Date: Tue, 04 Mar 2014 16:49:59 +0100 From: Paolo Bonzini User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:24.0) Gecko/20100101 Thunderbird/24.2.0 MIME-Version: 1.0 To: Paul Eggert , Norihiro Tanaka Subject: Re: bug#16912: [PATCH] no longer use CSET for non-UTF8 locale in DFA engine References: <20140301184821.6A22.27F6AC2D@kcn.ne.jp> <5312779A.3010300@cs.ucla.edu> <20140302102328.25CC.27F6AC2D@kcn.ne.jp> <53141D85.1000602@cs.ucla.edu> In-Reply-To: <53141D85.1000602@cs.ucla.edu> X-Enigmail-Version: 1.6 Content-Type: text/plain; charset=UTF-8; format=flowed Content-Transfer-Encoding: 7bit X-Spam-Score: -0.7 (/) X-Debbugs-Envelope-To: 16912 Cc: 16912@debbugs.gnu.org X-BeenThere: debbugs-submit@debbugs.gnu.org X-Mailman-Version: 2.1.15 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: debbugs-submit-bounces@debbugs.gnu.org Sender: "Debbugs-submit" X-Spam-Score: -0.7 (/)

On 03/03/2014 07:13, Paul Eggert wrote:
> Norihiro Tanaka wrote:
>> However I don't understand why the optimization isn't completed on
>> non-UTF8 locale only. Can you explain it?
>
> Sorry, no; there's a lot about that code I don't yet understand.

IIRC it's because a CSET matches any byte, while the corresponding MBCSET only matches that byte if it is a single-byte character. So for example, say "\x83A" is a two-byte character. The CSET "A" will match it but the corresponding MBCSET will not. This can happen in the Shift-JIS encoding. Paolo From debbugs-submit-bounces@debbugs.gnu.org Tue Mar 04 18:12:53 2014 Received: (at 16912) by debbugs.gnu.org; 4 Mar 2014 23:12:53 +0000 Received: from localhost ([127.0.0.1]:50873 helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.80) (envelope-from ) id 1WKyW4-0004qr-UM for submit@debbugs.gnu.org; Tue, 04 Mar 2014 18:12:53 -0500 Received: from pbsg501.nifty.com ([202.248.238.71]:63057) by debbugs.gnu.org with esmtp (Exim 4.80) (envelope-from ) id 1WKyW0-0004qf-Jj for 16912@debbugs.gnu.org; Tue, 04 Mar 2014 18:12:51 -0500 Received: from [10.120.1.49] (i118-21-128-66.s30.a048.ap.plala.or.jp [118.21.128.66]) (authenticated) by pbsg501.nifty.com with ESMTP id s24NCf2L029219; Wed, 5 Mar 2014 08:12:41 +0900 X-Nifty-SrcIP: [118.21.128.66] Date: Wed, 05 Mar 2014 08:12:42 +0900 From: Norihiro Tanaka To: Paolo Bonzini Subject: Re: bug#16912: [PATCH] no longer use CSET for non-UTF8 locale in DFA engine In-Reply-To: <5315F627.6050802@gnu.org> References: <53141D85.1000602@cs.ucla.edu> <5315F627.6050802@gnu.org> Message-Id: <20140305081241.7834.27F6AC2D@kcn.ne.jp> MIME-Version: 1.0 Content-Type: text/plain; charset="US-ASCII" Content-Transfer-Encoding: 7bit X-Mailer: Becky! ver. 2.64.06 [ja] X-Spam-Score: 1.2 (+) X-Spam-Report: Spam detection software, running on the system "debbugs.gnu.org", has identified this incoming email as possible spam. The original message has been attached to this so you can view it (if it isn't spam) or label similar future email. If you have any questions, see the administrator of that system for details. 
Content preview: Paul Eggert wrote: > IIRC it's because a CSET matches any byte, while the corresponding > MBCSET only matches that byte if it is a single-byte character. > So for example, say "\x82\x61" is a two-byte character. The CSET "A" > will match it but the corresponding MBCSET will not. > > This can happen in the Shift-JIS encoding. [...] Content analysis details: (1.2 points, 10.0 required) pts rule name description ---- ---------------------- -------------------------------------------------- 1.2 RCVD_IN_BL_SPAMCOP_NET RBL: Received via a relay in bl.spamcop.net [Blocked - see ] -0.0 SPF_HELO_PASS SPF: HELO matches SPF record -0.0 SPF_PASS SPF: sender matches SPF record -0.0 RP_MATCHES_RCVD Envelope sender domain matches handover relay domain X-Debbugs-Envelope-To: 16912 Cc: Paul Eggert , 16912@debbugs.gnu.org X-BeenThere: debbugs-submit@debbugs.gnu.org X-Mailman-Version: 2.1.15 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: debbugs-submit-bounces@debbugs.gnu.org Sender: "Debbugs-submit" X-Spam-Score: 1.2 (+)

Paolo Bonzini wrote:
> IIRC it's because a CSET matches any byte, while the corresponding
> MBCSET only matches that byte if it is a single-byte character.
> So for example, say "\x82\x61" is a two-byte character. The CSET "A"
> will match it but the corresponding MBCSET will not.
>
> This can happen in the Shift-JIS encoding.

At first I also thought of such a case. But it is probably not a problem, because
the DFA will never come across a CSET on the second byte in Shift_JIS.

"grep -i A" -> [Aa] -> CSET
"grep -i $"\x82A" -> [$"\x82\x82A"$"\x82\x82"] -> \x82 A CAT \x82 \x82 CAT OR

The latter will never be \x82 [A\x82] -> \x82 CSET CAT.

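
The concern raised in this exchange can be illustrated with a small stand-alone sketch: in Shift_JIS the byte 0x61 (ASCII "a") may also be the trailing byte of a two-byte character such as \x82\x61, so a matcher that looks at bytes without regard to character boundaries would report a hit that a character-aware matcher would not. The code below is not taken from dfa.c and does not claim to show how grep handles this. The function names are hypothetical, and the lead-byte ranges (0x81 to 0x9F and 0xE0 to 0xFC) are a simplifying assumption about the encoding.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Simplifying assumption: bytes 0x81..0x9F and 0xE0..0xFC start a
   two-byte Shift_JIS character; every other byte stands alone.  */
bool
sjis_is_lead_byte (unsigned char b)
{
  return (0x81 <= b && b <= 0x9F) || (0xE0 <= b && b <= 0xFC);
}

/* Count occurrences of byte C anywhere in BUF, ignoring character
   boundaries.  */
size_t
count_bytewise (unsigned char const *buf, size_t len, unsigned char c)
{
  assert (buf != NULL || len == 0);
  size_t hits = 0;
  for (size_t i = 0; i < len; i++)
    hits += buf[i] == c;
  return hits;
}

/* Count occurrences of C only where it forms a complete single-byte
   character, skipping over both bytes of each two-byte character.  */
size_t
count_charwise (unsigned char const *buf, size_t len, unsigned char c)
{
  assert (buf != NULL || len == 0);
  size_t hits = 0;
  size_t i = 0;
  while (i < len)
    {
      if (sjis_is_lead_byte (buf[i]) && i + 1 < len)
        i += 2;                 /* Skip lead byte and trailing byte.  */
      else
        {
          hits += buf[i] == c;
          i++;
        }
    }
  return hits;
}
```

For the three-byte input \x82 \x61 \n, count_bytewise finds one 'a' while count_charwise finds none; for the plain input "a\n" both find one.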
From debbugs-submit-bounces@debbugs.gnu.org Wed Mar 05 03:00:06 2014 Received: (at 16912) by debbugs.gnu.org; 5 Mar 2014 08:00:06 +0000 Received: from localhost ([127.0.0.1]:51035 helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.80) (envelope-from ) id 1WL6kH-0003JU-5P for submit@debbugs.gnu.org; Wed, 05 Mar 2014 03:00:05 -0500 Received: from mail-qa0-f45.google.com ([209.85.216.45]:44736) by debbugs.gnu.org with esmtp (Exim 4.80) (envelope-from ) id 1WL6kC-0003IW-Fs for 16912@debbugs.gnu.org; Wed, 05 Mar 2014 03:00:00 -0500 Received: by mail-qa0-f45.google.com with SMTP id hw13so649431qab.4 for <16912@debbugs.gnu.org>; Tue, 04 Mar 2014 23:59:59 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113; h=sender:message-id:date:from:user-agent:mime-version:to:cc:subject :references:in-reply-to:content-type:content-transfer-encoding; bh=vsqz6zYeq1WY5CaU0wCW8PZQM57mOY/3abvI0tx5L/E=; b=E9C++D4mwypJwUzBRG+mByLKbOckS0EaxGqCBT7AMwTqrbzYesf8PpEzI0BMUVWmiS x/3AtymqRiV0aYh9rte9KYqF0Z4xyclkId3gDNNSLwa/bgxalYd/Plwl5WJcmuUxPZ+e 7xsjKH4ydYc9U4hQQF8kytA9JUScTSqV+kZ3DdciAP8NBNrl4+tMbe12YGY4x6JQNpt/ ZLtQcWeCY2bRdyjd1zpdwM5Pjiy9WQAI73WrOgjSRkbC6p9XUnEQYo1p9jShctuZbbCC htzb91CLy0TsWB/HcnksgX46bf73brJW8T6KEHvnq9fbQiiW4KZP9PevRKev7SjEyUGm BDOg== X-Received: by 10.224.112.6 with SMTP id u6mr4934059qap.78.1394006399752; Tue, 04 Mar 2014 23:59:59 -0800 (PST) Received: from yakj.usersys.redhat.com (nat-pool-mxp-t.redhat.com. 
[209.132.186.18]) by mx.google.com with ESMTPSA id q10sm5384944qaj.13.2014.03.04.23.59.57 for (version=TLSv1 cipher=ECDHE-RSA-RC4-SHA bits=128/128); Tue, 04 Mar 2014 23:59:58 -0800 (PST) Message-ID: <5316D978.2060002@gnu.org> Date: Wed, 05 Mar 2014 08:59:52 +0100 From: Paolo Bonzini User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:24.0) Gecko/20100101 Thunderbird/24.2.0 MIME-Version: 1.0 To: Norihiro Tanaka Subject: Re: bug#16912: [PATCH] no longer use CSET for non-UTF8 locale in DFA engine References: <53141D85.1000602@cs.ucla.edu> <5315F627.6050802@gnu.org> <20140305081241.7834.27F6AC2D@kcn.ne.jp> In-Reply-To: <20140305081241.7834.27F6AC2D@kcn.ne.jp> X-Enigmail-Version: 1.6 Content-Type: text/plain; charset=UTF-8; format=flowed Content-Transfer-Encoding: 8bit X-Spam-Score: -0.7 (/) X-Debbugs-Envelope-To: 16912 Cc: Paul Eggert , 16912@debbugs.gnu.org X-BeenThere: debbugs-submit@debbugs.gnu.org X-Mailman-Version: 2.1.15 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: debbugs-submit-bounces@debbugs.gnu.org Sender: "Debbugs-submit" X-Spam-Score: -0.7 (/)

On 05/03/2014 00:12, Norihiro Tanaka wrote:
> At first I also thought of such a case. But it is probably not a problem, because
> the DFA will never come across a CSET on the second byte in Shift_JIS.
>
> "grep -i A" -> [Aa] -> CSET
> "grep -i $"\x82A" -> [$"\x82\x82A"$"\x82\x82"] -> \x82 A CAT \x82 \x82 CAT OR
>
> The latter will never be \x82 [A\x82] -> \x82 CSET CAT.

What about these two commands:

grep [a]
grep -i A

Would they match \x82\x61 ("B", U+0FF22) with your patch? And without it?

Paolo

From debbugs-submit-bounces@debbugs.gnu.org Wed Mar 05 08:41:52 2014 Received: (at 16912) by debbugs.gnu.org; 5 Mar 2014 13:41:52 +0000 Received: from localhost ([127.0.0.1]:51170 helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.80) (envelope-from ) id 1WLC51-0006eD-8Q for submit@debbugs.gnu.org; Wed, 05 Mar 2014 08:41:51 -0500 Received: from pbsg501.nifty.com ([202.248.238.71]:19842) by debbugs.gnu.org with esmtp (Exim 4.80) (envelope-from ) id 1WLC4v-0006dw-3w for 16912@debbugs.gnu.org; Wed, 05 Mar 2014 08:41:48 -0500 Received: from [10.120.1.41] (i118-21-128-66.s30.a048.ap.plala.or.jp [118.21.128.66]) (authenticated) by pbsg501.nifty.com with ESMTP id s25DfUQx024779; Wed, 5 Mar 2014 22:41:31 +0900 X-Nifty-SrcIP: [118.21.128.66] Date: Wed, 05 Mar 2014 22:41:31 +0900 From: Norihiro Tanaka To: Paolo Bonzini Subject: Re: bug#16912: [PATCH] no longer use CSET for non-UTF8 locale in DFA engine In-Reply-To: <5316D978.2060002@gnu.org> References: <20140305081241.7834.27F6AC2D@kcn.ne.jp> <5316D978.2060002@gnu.org> Message-Id: <20140305224128.7885.27F6AC2D@kcn.ne.jp> MIME-Version: 1.0 Content-Type: text/plain; charset="US-ASCII" Content-Transfer-Encoding: 7bit X-Mailer: Becky! ver. 2.64.06 [ja] X-Spam-Score: -0.0 (/) X-Debbugs-Envelope-To: 16912 Cc: Paul Eggert , 16912@debbugs.gnu.org X-BeenThere: debbugs-submit@debbugs.gnu.org X-Mailman-Version: 2.1.15 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: debbugs-submit-bounces@debbugs.gnu.org Sender: "Debbugs-submit" X-Spam-Score: -0.0 (/)

Paolo Bonzini wrote:
> What about these two commands:
>
> grep [a]
> grep -i A
>
> Would they match \x82\x61 ("B", U+0FF22) with your patch? And without it?

No match in any case, with or without the patch.

--
Before the patch:

$ locale -a | grep sjis
ja_JP.sjis
$ printf "\x82\x61\n" | env LC_ALL=ja_JP.sjis src/grep -i 'A'
dfaanalyze:
0:A 1:a 2:OR 3:END 4:CAT
$ printf "\x82\x61\n" | env LC_ALL=ja_JP.sjis src/grep '[a]'
dfaanalyze:
0:MBCSET 1:END 2:CAT

After the patch:

$ locale -a | grep sjis
ja_JP.sjis
$ printf "\x82\x61\n" | env LC_ALL=ja_JP.sjis src/grep -i 'A'
dfaanalyze:
0:CSET 1:END 2:CAT
$ printf "\x82\x61\n" | env LC_ALL=ja_JP.sjis src/grep '[a]'
dfaanalyze:
0:CSET 1:END 2:CAT
--

Norihiro

From debbugs-submit-bounces@debbugs.gnu.org Wed Mar 05 09:32:06 2014 Received: (at 16912) by debbugs.gnu.org; 5 Mar 2014 14:32:07 +0000 Received: from localhost ([127.0.0.1]:51198 helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.80) (envelope-from ) id 1WLCrd-00088J-BZ for submit@debbugs.gnu.org; Wed, 05 Mar 2014 09:32:06 -0500 Received: from mail-ee0-f47.google.com ([74.125.83.47]:52896) by debbugs.gnu.org with esmtp (Exim 4.80) (envelope-from ) id 1WLCrW-00087u-It for 16912@debbugs.gnu.org; Wed, 05 Mar 2014 09:31:59 -0500 Received: by mail-ee0-f47.google.com with SMTP id b15so492197eek.20 for <16912@debbugs.gnu.org>; Wed, 05 Mar 2014 06:31:57 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113; h=sender:message-id:date:from:user-agent:mime-version:to:cc:subject :references:in-reply-to:content-type:content-transfer-encoding; bh=kVuMsUAtx6PoqbcYT86gjkRSMzJnH4do617W0InDHq4=; b=EeiE/fEQ9kURh3JAOYmD/n0+/h6fIwHGyZiwzbU9pQwZwJmXykNpMAmVUJDK5PjZKA P5dcGRRw0ae3ZHJxgAPPvZ6kDHJV46hdCkkDuEkhrVs+QBV0/l/wpg31aNZgxHegJXMs La+RiJDGxiVKwf44icC+LyrkQMPHiGKQn5EGUaMqKbsn1Rorkjy2tIVervCZVi+RygRb F0FrQJEWU1/YzcSQFU5kFCCMXixSuyE27jVOMxF0dctIri8JuMqMwWSZtTErQ34wRX1Y RUWq+ZI3/iDd0/QE5gVbHxFZ2VCFUXRTNK0OtukrbJ2bCnQDnWgXPHlOvIrkKoDeng9H MoQQ== X-Received: by 10.14.205.130 with SMTP id j2mr6225667eeo.76.1394029917448; Wed, 05 Mar 2014 06:31:57 -0800 (PST) Received: from yakj.usersys.redhat.com (nat-pool-mxp-t.redhat.com.
[209.132.186.18]) by mx.google.com with ESMTPSA id o5sm9659221eeg.8.2014.03.05.06.31.55 for (version=TLSv1 cipher=ECDHE-RSA-RC4-SHA bits=128/128); Wed, 05 Mar 2014 06:31:56 -0800 (PST) Message-ID: <5317355A.5060406@gnu.org> Date: Wed, 05 Mar 2014 15:31:54 +0100 From: Paolo Bonzini User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:24.0) Gecko/20100101 Thunderbird/24.2.0 MIME-Version: 1.0 To: Norihiro Tanaka Subject: Re: bug#16912: [PATCH] no longer use CSET for non-UTF8 locale in DFA engine References: <20140305081241.7834.27F6AC2D@kcn.ne.jp> <5316D978.2060002@gnu.org> <20140305224128.7885.27F6AC2D@kcn.ne.jp> In-Reply-To: <20140305224128.7885.27F6AC2D@kcn.ne.jp> X-Enigmail-Version: 1.6 Content-Type: text/plain; charset=UTF-8; format=flowed Content-Transfer-Encoding: 7bit X-Spam-Score: -0.7 (/) X-Debbugs-Envelope-To: 16912 Cc: Paul Eggert , 16912@debbugs.gnu.org X-BeenThere: debbugs-submit@debbugs.gnu.org X-Mailman-Version: 2.1.15 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: debbugs-submit-bounces@debbugs.gnu.org Sender: "Debbugs-submit" X-Spam-Score: -0.7 (/)

On 05/03/2014 14:41, Norihiro Tanaka wrote:
> Paolo Bonzini wrote:
>
>> What about these two commands:
>>
>> grep [a]
>> grep -i A
>>
>> Would they match \x82\x61 ("B", U+0FF22) with your patch? And without it?
>
> No match in any case, with or without the patch.

Right, it's handled by SKIP_REMAINS_MB_IF_INITIAL_STATE. dfa.c never stops surprising me. Great catch!

Paolo

> --
> Before the patch:
>
> $ locale -a | grep sjis
> ja_JP.sjis
> $ printf "\x82\x61\n" | env LC_ALL=ja_JP.sjis src/grep -i 'A'
> dfaanalyze:
> 0:A 1:a 2:OR 3:END 4:CAT
> $ printf "\x82\x61\n" | env LC_ALL=ja_JP.sjis src/grep '[a]'
> dfaanalyze:
> 0:MBCSET 1:END 2:CAT
>
> After the patch:
>
> $ locale -a | grep sjis
> ja_JP.sjis
> $ printf "\x82\x61\n" | env LC_ALL=ja_JP.sjis src/grep -i 'A'
> dfaanalyze:
> 0:CSET 1:END 2:CAT
> $ printf "\x82\x61\n" | env LC_ALL=ja_JP.sjis src/grep '[a]'
> dfaanalyze:
> 0:CSET 1:END 2:CAT
> --
>
> Norihiro
>
>

From debbugs-submit-bounces@debbugs.gnu.org Wed Mar 05 18:47:50 2014 Received: (at 16912) by debbugs.gnu.org; 5 Mar 2014 23:47:50 +0000 Received: from localhost ([127.0.0.1]:52112 helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.80) (envelope-from ) id 1WLLXR-0008Op-Li for submit@debbugs.gnu.org; Wed, 05 Mar 2014 18:47:50 -0500 Received: from pbsg501.nifty.com ([202.248.238.71]:33409) by debbugs.gnu.org with esmtp (Exim 4.80) (envelope-from ) id 1WLLXN-0008Oa-9I for 16912@debbugs.gnu.org; Wed, 05 Mar 2014 18:47:47 -0500 Received: from [10.120.1.13] (i118-21-128-66.s30.a048.ap.plala.or.jp [118.21.128.66]) (authenticated) by pbsg501.nifty.com with ESMTP id s25Nlen1028791; Thu, 6 Mar 2014 08:47:41 +0900 X-Nifty-SrcIP: [118.21.128.66] Date: Thu, 06 Mar 2014 08:47:41 +0900 From: Norihiro Tanaka To: Paolo Bonzini Subject: Re: bug#16912: [PATCH] no longer use CSET for non-UTF8 locale in DFA engine In-Reply-To: <5317355A.5060406@gnu.org> References: <20140305224128.7885.27F6AC2D@kcn.ne.jp> <5317355A.5060406@gnu.org> Message-Id: <20140306084739.6F00.27F6AC2D@kcn.ne.jp> MIME-Version: 1.0 Content-Type: text/plain; charset="US-ASCII" Content-Transfer-Encoding: 7bit X-Mailer: Becky! ver.
2.64.06 [ja] X-Spam-Score: -0.0 (/) X-Debbugs-Envelope-To: 16912 Cc: Paul Eggert , 16912@debbugs.gnu.org X-BeenThere: debbugs-submit@debbugs.gnu.org X-Mailman-Version: 2.1.15 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: debbugs-submit-bounces@debbugs.gnu.org Sender: "Debbugs-submit" X-Spam-Score: -0.0 (/) Paolo Bonzini wrote: > Right, it's handled by SKIP_REMAINS_MB_IF_INITIAL_STATE. Yes. It's handled by SKIP_REMAINS_MB_IF_INITIAL_STATE, so no problem. Norihiro From unknown Tue Jun 17 20:17:19 2025 Received: (at fakecontrol) by fakecontrolmessage; To: internal_control@debbugs.gnu.org From: Debbugs Internal Request Subject: Internal Control Message-Id: bug archived. Date: Thu, 03 Apr 2014 11:24:03 +0000 User-Agent: Fakemail v42.6.9 # This is a fake control message. # # The action: # bug archived. thanks # This fakemail brought to you by your local debbugs # administrator