From debbugs-submit-bounces@debbugs.gnu.org Mon Dec 23 17:39:43 2013 Received: (at submit) by debbugs.gnu.org; 23 Dec 2013 22:39:43 +0000 Received: from localhost ([127.0.0.1]:36821 helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.80) (envelope-from ) id 1VvEA2-00019J-W5 for submit@debbugs.gnu.org; Mon, 23 Dec 2013 17:39:43 -0500 Received: from eggs.gnu.org ([208.118.235.92]:42455) by debbugs.gnu.org with esmtp (Exim 4.80) (envelope-from ) id 1VvEA0-00019A-U1 for submit@debbugs.gnu.org; Mon, 23 Dec 2013 17:39:41 -0500 Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71) (envelope-from ) id 1VvE9z-00011c-Eb for submit@debbugs.gnu.org; Mon, 23 Dec 2013 17:39:40 -0500 X-Spam-Checker-Version: SpamAssassin 3.3.2 (2011-06-06) on eggs.gnu.org X-Spam-Level: X-Spam-Status: No, score=0.8 required=5.0 tests=BAYES_50,FREEMAIL_FROM, T_DKIM_INVALID autolearn=disabled version=3.3.2 Received: from lists.gnu.org ([2001:4830:134:3::11]:46932) by eggs.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1VvE9z-00011Y-Bx for submit@debbugs.gnu.org; Mon, 23 Dec 2013 17:39:39 -0500 Received: from eggs.gnu.org ([2001:4830:134:3::10]:44156) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1VvE9y-0006qK-2d for bug-grep@gnu.org; Mon, 23 Dec 2013 17:39:39 -0500 Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71) (envelope-from ) id 1VvE9w-00010s-Ne for bug-grep@gnu.org; Mon, 23 Dec 2013 17:39:37 -0500 Received: from mail-pb0-x22b.google.com ([2607:f8b0:400e:c01::22b]:43258) by eggs.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1VvE9w-00010c-Au for bug-grep@gnu.org; Mon, 23 Dec 2013 17:39:36 -0500 Received: by mail-pb0-f43.google.com with SMTP id rq2so5765606pbb.2 for ; Mon, 23 Dec 2013 14:39:35 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113; h=mime-version:sender:from:date:message-id:subject:to:content-type; bh=HqK6N9iAz6KLNGc+4jbaK1kRt+eX3LqpTy+vXbnOsLY=; 
b=iEZk31buf6NGzV/yimeAPBqfKC+uiFXQerHoauLqCCJj6E0JZd8HcQXaIpgJP6Zp0H WcMFvJ3I4EUvO95bl2AmQ/qMZZKPW1TU6hKM1pD9iqauyvrjda5jJsMZsqC7zX2fNsBh EHjhIalNBMOP9cyQLxJiUYQQ1MzMq8sKQgSrPz5BQaS7YvbiLh7N0RJ77XGuj4Lf/PHu EfU2BEVPJZpWpgEshrVgBBxiEAECMme7Pfu4eDoYK6iQEVGZlEp/5et1P219yZH3od+g eRcgAQcliXgzpX6YlTg158S9lYuPpsmIAVBLVlJ6a0ISeMH7+gIeNg8rhIr+80z4SH7G lt8A== X-Received: by 10.67.5.233 with SMTP id cp9mr5966270pad.147.1387838375140; Mon, 23 Dec 2013 14:39:35 -0800 (PST) MIME-Version: 1.0 Received: by 10.68.6.66 with HTTP; Mon, 23 Dec 2013 14:39:14 -0800 (PST) From: Jim Meyering Date: Mon, 23 Dec 2013 14:39:14 -0800 X-Google-Sender-Auth: oyyliFtta9ux04JoDHxXFMb2Wpk Message-ID: Subject: [PATCH] grep: make --ignore-case (-i) faster (sometimes 10x) in multibyte locales To: bug-grep@gnu.org Content-Type: multipart/mixed; boundary=047d7b15b1e55c767704ee3b4dc2 X-detected-operating-system: by eggs.gnu.org: Error: Malformed IPv6 address (bad octet value). X-detected-operating-system: by eggs.gnu.org: Error: Malformed IPv6 address (bad octet value). X-Received-From: 2001:4830:134:3::11 X-Spam-Score: -4.0 (----) X-Debbugs-Envelope-To: submit X-BeenThere: debbugs-submit@debbugs.gnu.org X-Mailman-Version: 2.1.15 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: debbugs-submit-bounces@debbugs.gnu.org Sender: "Debbugs-submit" X-Spam-Score: -4.0 (----) --047d7b15b1e55c767704ee3b4dc2 Content-Type: text/plain; charset=ISO-8859-1 FYI, here is a quick and clean/safe performance improvement for grep -i. I expect to push this commit right after the upcoming bug-fix release. Currently, this optimization is enabled when the search string is ASCII and contains neither of '\' (backslash) nor '['. I expect to eliminate the latter two constraints in a follow-on commit including tests to exercise all of the corner cases. 
Happy holidays, Jim --047d7b15b1e55c767704ee3b4dc2 Content-Type: text/plain; charset=US-ASCII; name="k.txt" Content-Disposition: attachment; filename="k.txt" Content-Transfer-Encoding: base64 X-Attachment-Id: f_hpkayddn0 RnJvbSBjYTY0ZmU2M2Q5NDA0YjY1NzU0YjZmNmIwN2FkYzE5MTBkZGFlNWE0IE1vbiBTZXAgMTcg MDA6MDA6MDAgMjAwMQpGcm9tOiBKaW0gTWV5ZXJpbmcgPG1leWVyaW5nQGZiLmNvbT4KRGF0ZTog U3VuLCAyNCBOb3YgMjAxMyAxODo0OTozMSAtMDgwMApTdWJqZWN0OiBbUEFUQ0hdIGdyZXA6IG1h a2UgLS1pZ25vcmUtY2FzZSAoLWkpIGZhc3RlciAoc29tZXRpbWVzIDEweCkgaW4KIG11bHRpYnl0 ZSBsb2NhbGVzCgpUaGVzZSBkYXlzLCBuZWFybHkgZXZlcnlvbmUgdXNlcyBhIG11bHRpYnl0ZSBs b2NhbGUsIGFuZCBncmVwIGlzIG9mdGVuCnVzZWQgd2l0aCB0aGUgLS1pZ25vcmUtY2FzZSAoLWkp IG9wdGlvbiwgYnV0IHRoYXQgb3B0aW9uIGltcG9zZXMgYSB2ZXJ5CmhpZ2ggY29zdCBpbiBvcmRl ciB0byBoYW5kbGUgc29tZSB1bnVzdWFsIGNhc2VzIGluIGp1c3QgYSBmZXcgbXVsdGlieXRlCmxv Y2FsZXMuICBUaGlzIGNoYW5nZSBnZXRzIG1vc3Qgb2YgdGhlIHBlcmZvcm1hbmNlIG9mIHVzaW5n IExDX0FMTD1DCndpdGhvdXQgZWxpbWluYXRpbmcgdGhlIGFiaWxpdHkgdG8gc2VhcmNoIGZvciBt dWx0aWJ5dGUgc3RyaW5ncy4KCldpdGggdGhlIGZvbGxvd2luZyBleGFtcGxlLCBJIHNlZSBhbiAx MXggc3BlZWQtdXAgd2l0aCBhIDIuM0dIeiBpNyBhbmQgYW4gU1NEOgpHZW5lcmF0ZSBhIDEwTS1s aW5lIGZpbGUsIHdpdGggZWFjaCBsaW5lIGNvbnNpc3Rpbmcgb2YgNDAgJ2onczoKCiAgICB5ZXMg ampqampqampqampqampqampqampqampqampqampqampqampqampqaiB8IGhlYWQgLTEwMDAwMDAw ID4gawoKVGltZSBzZWFyY2hpbmcgaXQgZm9yIHRoZSBzaW1wbGUvbm9leGlzdGVudCBzdHJpbmcg ImZvb2JhciIsCmZpcnN0IHdpdGggdGhpcyBwYXRjaCAoYmVzdC1vZi01IHRyaWFscyk6CgogICAg TENfQUxMPWVuX1VTLlVURi04IGVudiB0aW1lIHNyYy9ncmVwIC1pIGZvb2JhciBrCiAgICAgICAg MS4xMCByZWFsICAgICAgICAgMS4wMyB1c2VyICAgICAgICAgMC4wNyBzeXMKCkJhY2sgb3V0IHRo YXQgY29tbWl0ICh0ZW1wb3JhcmlseSksIHJlY29tcGlsZSwgYW5kIHJlcnVuIHRoZSBleHBlcmlt ZW50OgoKICAgIGdpdCBsb2cgLTEgLXB8cGF0Y2ggLVIgLXAxOyBtYWtlCiAgICBMQ19BTEw9ZW5f VVMuVVRGLTggZW52IHRpbWUgc3JjL2dyZXAgLWkgZm9vYmFyIGsKICAgICAgICAyNS4xMSByZWFs ICAgICAgICAxNy40NSB1c2VyICAgICAgICAgMC4xNyBzeXMKClRoZSB0cmljayBpcyB0byByZWFs aXplIHRoYXQgZm9yIHNvbWUgc2VhcmNoIHN0cmluZ3MsIGl0IGlzIGVhc3kKdG8gY29udmVydCB0 
byBhbiBlcXVpdmFsZW50IG9uZSB0aGF0IGlzIGhhbmRsZWQgbXVjaCBtb3JlIGVmZmljaWVudGx5 LgpFLmcuLCBjb252ZXJ0IHRoaXMgY29tbWFuZDoKCiAgZ3JlcCAtaSBmb29iYXIgawoKdG8gdGhp czoKCiAgZ3JlcCAnW2ZGXVtvT11bb09dW2JCXVthQV1bclJdJyBrCgpUaGF0IGFsbG93cyB0aGUg bWF0Y2hlciB0byBzZWFyY2ggaW4gYnVmZmVyIG1vZGUsIHJhdGhlciB0aGFuIGhhdmluZyB0bwpl eHRyYWN0L2Nhc2UtY29udmVydC9zZWFyY2ggZWFjaCBsaW5lIHNlcGFyYXRlbHkuICBDdXJyZW50 bHksIHdlIHBlcmZvcm0KdGhpcyBjb252ZXJzaW9uIG9ubHkgd2hlbiBzZWFyY2ggc3RyaW5ncyBh cmUgYWxsIEFTQ0lJIGFuZCBjb250YWluIG5laXRoZXIKJ1wnIG5vciAnWycuICBTZWUgdGhlIGNv bW1lbnRzIGZvciBtb3JlIGRldGFpbC4KCiogc3JjL21haW4uYyAodHJpdmlhbF9jYXNlX2NvbnZl cnQpOiBOZXcgZnVuY3Rpb24uCihtYWluKTogV2hlbiBwb3NzaWJsZSwgdHJhbnNmb3JtIHRoZSBy ZWdleHAgc28gd2UgY2FuIGRyb3AgdGhlIC1pLgotLS0KIHNyYy9tYWluLmMgfCA3NiArKysrKysr KysrKysrKysrKysrKysrKysrKysrKysrKysrKysrKysrKysrKysrKysrKysrKysrKysrKysrLQog MSBmaWxlIGNoYW5nZWQsIDc1IGluc2VydGlvbnMoKyksIDEgZGVsZXRpb24oLSkKCmRpZmYgLS1n aXQgYS9zcmMvbWFpbi5jIGIvc3JjL21haW4uYwppbmRleCAyMTY4YTBhLi44YjdiNDkyIDEwMDY0 NAotLS0gYS9zcmMvbWFpbi5jCisrKyBiL3NyYy9tYWluLmMKQEAgLTE2NDQsMTMgKzE2NDQsMTQg QEAgaWYgYW55IGVycm9yIG9jY3VycyBhbmQgLXEgaXMgbm90IGdpdmVuLCB0aGUgZXhpdCBzdGF0 dXMgaXMgMi5cbiIpKTsKICAgZXhpdCAoc3RhdHVzKTsKIH0KCitzdGF0aWMgY2hhciBjb25zdCAq bWF0Y2hlcjsKKwogLyogSWYgTSBpcyBOVUxMLCBpbml0aWFsaXplIHRoZSBtYXRjaGVyIHRvIHRo ZSBkZWZhdWx0LiAgT3RoZXJ3aXNlIHNldCB0aGUKICAgIG1hdGNoZXIgdG8gTSBpZiBhdmFpbGFi bGUuICBFeGl0IGluIGNhc2Ugb2YgY29uZmxpY3RzIG9yIGlmIE0gaXMgbm90CiAgICBhdmFpbGFi bGUuICAqLwogc3RhdGljIHZvaWQKIHNldG1hdGNoZXIgKGNoYXIgY29uc3QgKm0pCiB7Ci0gIHN0 YXRpYyBjaGFyIGNvbnN0ICptYXRjaGVyOwogICB1bnNpZ25lZCBpbnQgaTsKCiAgIGlmICghbSkK QEAgLTE4NjUsNiArMTg2Niw1MCBAQCBwYXJzZV9ncmVwX2NvbG9ycyAodm9pZCkKICAgICAgIHJl dHVybjsKIH0KCisvKiBJZiB0aGUgcmVnZXhwIEtFWVMsIGlzIGFtZW5hYmxlIHRvIHRyaXZpYWwg Y2FzZSBjb252ZXJzaW9uLAorICAgY3JlYXRlIGEgbmV3IHJlZ2V4cCwgKCpORVdfS0VZUywgKk5F V19LRVlDQykgdGhhdCBpcyBlcXVpdmFsZW50LAorICAgYnV0IG1hdGNoZXMgY2FzZS1pbnNlbnNp 
dGl2ZWx5LiAgKi8KK3N0YXRpYyBib29sCit0cml2aWFsX2Nhc2VfY29udmVydCAoc2l6ZV90IGtl eWNjLCBjaGFyIGNvbnN0ICprZXlzLAorICAgICAgICAgICAgICAgICAgICAgIHNpemVfdCAqbmV3 X2tleWNjLCBjaGFyICoqbmV3X2tleXMpCit7CisgIC8qIEZJWE1FOiBjb25zaWRlciByZW1vdmlu ZyB0aGUgZm9sbG93aW5nIHJlc3RyaWN0aW9uOgorICAgICBSZWplY3QgaWYgS0VZUyBjb250YWlu cyAnXFwnIG9yICdbJy4gICovCisgIGlmIChtZW1jaHIgKGtleXMsICdcXCcsIGtleWNjKSB8fCBt ZW1jaHIgKGtleXMsICdbJywga2V5Y2MpKQorICAgIHJldHVybiBmYWxzZTsKKworICAvKiBXb3Jz dCBjYXNlIGlzIHRoYXQgZXZlcnkgYnl0ZSBvZiBrZXlzIHdpbGwgYmUgYWxwaGEsCisgICAgIHNv IGV2ZXJ5IGJ5dGUgQiB3aWxsIG1hcCB0byB0aGUgc2VxdWVuY2Ugb2YgNCBieXRlcyBbQmJdLiAg Ki8KKyAgKm5ld19rZXlzID0geG5tYWxsb2MgKDQsIGtleWNjICsgMSk7CisgIGNoYXIgKnAgPSAq bmV3X2tleXM7CisgIHdoaWxlICgqa2V5cykKKyAgICB7CisgICAgICAvKiBGSVhNRTogY29uc2lk ZXIgcmVtb3ZpbmcgdGhpcyBhc2NpaS1vbmx5IHJlc3RyaWN0aW9uLiAgKi8KKyAgICAgIGlmICgh aXNhc2NpaSAoKmtleXMpKQorICAgICAgICB7CisgICAgICAgICAgZnJlZSAoKm5ld19rZXlzKTsK KyAgICAgICAgICByZXR1cm4gZmFsc2U7CisgICAgICAgIH0KKyAgICAgIGlmICghaXNhbHBoYSAo KmtleXMpKQorICAgICAgICB7CisgICAgICAgICAgKnArKyA9ICprZXlzOworICAgICAgICB9Cisg ICAgICBlbHNlCisgICAgICAgIHsKKyAgICAgICAgICAqcCsrID0gJ1snOworICAgICAgICAgICpw KysgPSAqa2V5czsKKyAgICAgICAgICAqcCsrID0gaXNsb3dlciAoKmtleXMpID8gdG91cHBlciAo KmtleXMpIDogdG9sb3dlciAoKmtleXMpOworICAgICAgICAgICpwKysgPSAnXSc7CisgICAgICAg IH0KKworICAgICAgKytrZXlzOworICAgIH0KKworICAqbmV3X2tleWNjID0gcCAtICpuZXdfa2V5 czsKKworICByZXR1cm4gdHJ1ZTsKK30KKwogaW50CiBtYWluIChpbnQgYXJnYywgY2hhciAqKmFy Z3YpCiB7CkBAIC0yMjYzLDYgKzIzMDgsMzUgQEAgbWFpbiAoaW50IGFyZ2MsIGNoYXIgKiphcmd2 KQogICBlbHNlCiAgICAgdXNhZ2UgKEVYSVRfVFJPVUJMRSk7CgorICAvKiBBcyBjdXJyZW50bHkg aW1wbGVtZW50ZWQsIGNhc2UtaW5zZW5zaXRpdmUgbWF0Y2hpbmcgaXMgZXhwZW5zaXZlIGluCisg ICAgIG11bHRpLWJ5dGUgbG9jYWxlcyBiZWNhdXNlIG9mIGEgZmV3IG91dGxpZXIgbG9jYWxlcyBp biB3aGljaCBzb21lCisgICAgIGNoYXJhY3RlcnMgY2hhbmdlIHNpemUgd2hlbiBjb252ZXJ0ZWQg dG8gdXBwZXIgb3IgbG93ZXIgY2FzZS4gIFRvCisgICAgIGFjY29tbW9kYXRlIHRob3NlLCB3ZSBy 
ZXZlcnQgdG8gc2VhcmNoaW5nIHRoZSBpbnB1dCBvbmUgbGluZSBhdCBhCisgICAgIHRpbWUsIHJh dGhlciB0aGFuIHVzaW5nIHRoZSBtdWNoIG1vcmUgZWZmaWNpZW50IGJ1ZmZlciBzZWFyY2guCisg ICAgIEhvd2V2ZXIsIGlmIHdlIGhhdmUgYSBwbGFpbiBhc2NpaSBzZWFyY2ggc3RyaW5nLCAvZm9v Lywgd2UgY2FuCisgICAgIGNvbnZlcnQgaXQgdG8gYW4gZXF1aXZhbGVudCBjYXNlLWluc2Vuc2l0 aXZlIC9bZkZdW29PXVtvT10vLCBhbmQgdGh1cworICAgICBhdm9pZCB0aGUgZXhwZW5zaXZlIHJl YWQtYW5kLXByb2Nlc3MtYS1saW5lLWF0LWEtdGltZSByZXF1aXJlbWVudC4KKyAgICAgT3B0aW1p emUtYXdheSB0aGUgIi1pIiBvcHRpb24sIHdoZW4gcG9zc2libGUsIGNvbnZlcnRpbmcgZWFjaAor ICAgICBjYW5kaWRhdGUgYWxwaGEsIEMsIGluIHRoZSByZWdleHAgdG8gW0NjXS4gICovCisgIGlm IChtYXRjaF9pY2FzZSkKKyAgICB7CisgICAgICBzaXplX3QgbmV3X2tleWNjOworICAgICAgY2hh ciAqbmV3X2tleXM7CisgICAgICAvKiBJdCBpcyBub3QgcG9zc2libGUgd2l0aCAtRiwgbm90IHVz ZWZ1bCB3aXRoIC1QIChwY3JlKSBhbmQgdGhlcmUgaXMgbm8KKyAgICAgICAgIHBvaW50IHdoZW4g dGhlcmUgaXMgbm8gcmVnZXhwLiAgSXQgYWxzbyBkZXBlbmRzIG9uIHdoaWNoIGNvbnN0cnVjdHMK KyAgICAgICAgIGFwcGVhciBpbiB0aGUgcmVnZXhwLiAgU2VlIHRyaXZpYWxfY2FzZV9jb252ZXJ0 IGZvciB0aG9zZSBkZXRhaWxzLiAgKi8KKyAgICAgIGlmIChrZXljYworICAgICAgICAgICYmICEg KG1hdGNoZXIKKyAgICAgICAgICAgICAgICAmJiAoU1RSRVEgKG1hdGNoZXIsICJmZ3JlcCIpIHx8 IFNUUkVRIChtYXRjaGVyLCAicGNyZSIpKSkKKyAgICAgICAgICAmJiB0cml2aWFsX2Nhc2VfY29u dmVydCAoa2V5Y2MsIGtleXMsICZuZXdfa2V5Y2MsICZuZXdfa2V5cykpCisgICAgICAgIHsKKyAg ICAgICAgICBtYXRjaF9pY2FzZSA9IDA7CisgICAgICAgICAgZnJlZSAoa2V5cyk7CisgICAgICAg ICAga2V5cyA9IG5ld19rZXlzOworICAgICAgICAgIGtleWNjID0gbmV3X2tleWNjOworICAgICAg ICB9CisgICAgfQorCiAgIGNvbXBpbGUgKGtleXMsIGtleWNjKTsKICAgZnJlZSAoa2V5cyk7Cgot LSAKMS44LjUucmMyLjYuZ2M2ZjFiOTIKCg== --047d7b15b1e55c767704ee3b4dc2-- From debbugs-submit-bounces@debbugs.gnu.org Mon Dec 23 17:52:32 2013 Received: (at 16232) by debbugs.gnu.org; 23 Dec 2013 22:52:32 +0000 Received: from localhost ([127.0.0.1]:36862 helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.80) (envelope-from ) id 1VvEMS-0001Wk-35 for submit@debbugs.gnu.org; Mon, 23 Dec 2013 17:52:32 -0500 Received: from mx1.redhat.com 
([209.132.183.28]:10919) by debbugs.gnu.org with esmtp (Exim 4.80) (envelope-from ) id 1VvEMO-0001WV-S8 for 16232@debbugs.gnu.org; Mon, 23 Dec 2013 17:52:30 -0500 Received: from int-mx09.intmail.prod.int.phx2.redhat.com (int-mx09.intmail.prod.int.phx2.redhat.com [10.5.11.22]) by mx1.redhat.com (8.14.4/8.14.4) with ESMTP id rBNMqRJe021069 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=OK); Mon, 23 Dec 2013 17:52:27 -0500 Received: from [10.3.113.56] (ovpn-113-56.phx2.redhat.com [10.3.113.56]) by int-mx09.intmail.prod.int.phx2.redhat.com (8.14.4/8.14.4) with ESMTP id rBNMqRkL028931; Mon, 23 Dec 2013 17:52:27 -0500 Message-ID: <52B8BEAA.2090808@redhat.com> Date: Mon, 23 Dec 2013 15:52:26 -0700 From: Eric Blake Organization: Red Hat, Inc. User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:24.0) Gecko/20100101 Thunderbird/24.2.0 MIME-Version: 1.0 To: Jim Meyering , 16232@debbugs.gnu.org Subject: Re: bug#16232: [PATCH] grep: make --ignore-case (-i) faster (sometimes 10x) in multibyte locales References: In-Reply-To: X-Enigmail-Version: 1.6 OpenPGP: url=http://people.redhat.com/eblake/eblake.gpg Content-Type: multipart/signed; micalg=pgp-sha256; protocol="application/pgp-signature"; boundary="aJs6afNMFvkuDCxdbD19ku29tkBMd31Ku" X-Scanned-By: MIMEDefang 2.68 on 10.5.11.22 X-Spam-Score: -5.6 (-----) X-Debbugs-Envelope-To: 16232 X-BeenThere: debbugs-submit@debbugs.gnu.org X-Mailman-Version: 2.1.15 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: debbugs-submit-bounces@debbugs.gnu.org Sender: "Debbugs-submit" X-Spam-Score: -5.6 (-----) This is an OpenPGP/MIME signed message (RFC 4880 and 3156) --aJs6afNMFvkuDCxdbD19ku29tkBMd31Ku Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: quoted-printable

On 12/23/2013 03:39 PM, Jim Meyering wrote:
> FYI, here is a quick and clean/safe performance improvement for grep -i.
> I expect to push this commit right after the upcoming bug-fix release.
> Currently, this optimization is enabled when the search string is
> ASCII and contains neither of '\' (backslash) nor '['.  I expect to
> eliminate the latter two constraints in a follow-on commit including
> tests to exercise all of the corner cases.
>
> +
> +  /* Worst case is that every byte of keys will be alpha,
> +     so every byte B will map to the sequence of 4 bytes [Bb].  */

Umm, is this always true?  Consider the UTF-8 Turkish locale, where
single-byte i has a multi-byte uppercase (and conversely the single-byte
I has a multi-byte lowercase) - that is, 'i' and 'I' are not case pairs.

> +      else
> +        {
> +          *p++ = '[';
> +          *p++ = *keys;
> +          *p++ = islower (*keys) ? toupper (*keys) : tolower (*keys);

This performs the ASCII-only toupper/tolower, rather than the proper
locale-sensitive case conversion, which probably makes this patch
misbehave for LC_ALL=tr_TR.UTF-8.

>
> +  /* As currently implemented, case-insensitive matching is expensive in
> +     multi-byte locales because of a few outlier locales in which some
> +     characters change size when converted to upper or lower case.  To
> +     accommodate those, we revert to searching the input one line at a
> +     time, rather than using the much more efficient buffer search.
> +     However, if we have a plain ascii search string, /foo/, we can
> +     convert it to an equivalent case-insensitive /[fF][oO][oO]/, and thus
> +     avoid the expensive read-and-process-a-line-at-a-time requirement.
> +     Optimize-away the "-i" option, when possible, converting each
> +     candidate alpha, C, in the regexp to [Cc].  */

In other words, this comment describes the very flaw of the outlier
tr_TR locale that you are now violating with your optimization.
Without your patch, this is correct behavior:

$ echo i | LC_ALL=tr_TR.UTF-8 grep -i I
$ echo i | LC_ALL=tr_TR.UTF-8 grep -i i
i

-- 
Eric Blake   eblake redhat com    +1-919-301-3266
Libvirt virtualization library http://libvirt.org

--aJs6afNMFvkuDCxdbD19ku29tkBMd31Ku Content-Type: application/pgp-signature; name="signature.asc" Content-Description: OpenPGP digital signature Content-Disposition: attachment; filename="signature.asc" -----BEGIN PGP SIGNATURE----- Version: GnuPG v1 Comment: Public key at http://people.redhat.com/eblake/eblake.gpg Comment: Using GnuPG with Thunderbird - http://www.enigmail.net/ iQEcBAEBCAAGBQJSuL6qAAoJEKeha0olJ0NqJ9QH/01Jt5P304DCEN34bMVM+7/g 1pCLh1nYiWwHWJFX2knnqyDGKSVIkqMMmmfHYcHy9Eg4RU4BmbQ2n7VHcgXt3T01 +gqIr+r1Guf8o94I6zRv/unwBYjtTCiatoBfa/HFSczMHexutIwMFC87xVu68mXy 5GOnRHjO97FKEjdIgeS+8V9aVGLZ8p9TBPQ4BkSWJKSfCw3Mho9Oz/PX1+l02kya +xr/On+cE55i3xId1nbTI/mdX8WMk1FnaOIDCJ5b+tmiGwQkEcWxDLklO2UELa8X AEOg28SvjiPbFRLc24MWkxtThvkBMDNUxnPFSIuh5vSizD5cQfEqmNarutKmVmk= =xLNN -----END PGP SIGNATURE----- --aJs6afNMFvkuDCxdbD19ku29tkBMd31Ku--

From debbugs-submit-bounces@debbugs.gnu.org Mon Dec 23 18:12:51 2013 Received: (at 16232) by debbugs.gnu.org; 23 Dec 2013 23:12:51 +0000 Received: from localhost ([127.0.0.1]:36913 helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.80) (envelope-from ) id 1VvEg6-00024z-KD for submit@debbugs.gnu.org; Mon, 23 Dec 2013 18:12:50 -0500 Received: from mail-pd0-f179.google.com ([209.85.192.179]:43853) by debbugs.gnu.org with esmtp (Exim 4.80) (envelope-from ) id 1VvEg3-00024o-Pk for 16232@debbugs.gnu.org; Mon, 23 Dec 2013 18:12:48 -0500 Received: by mail-pd0-f179.google.com with SMTP id r10so5658610pdi.38 for <16232@debbugs.gnu.org>; Mon, 23 Dec 2013 15:12:47 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113; h=mime-version:sender:in-reply-to:references:from:date:message-id :subject:to:cc:content-type; bh=OeoX7wfKYquuOovKYajnhTRa5pG9is1JLop5LLmwYpY=; 
b=bqpopkz9rHHUDut0A51EBKPK1qxcS6rkWSY5ySxOS2lZfrtZOmTB0HWkSdg/F6Fgac eOy5fUMqYalPqzlqIxaAMsvGGdArot1ltRQ/QfqchwFN2yB3S51wczZhaCPGJixGa75p vWADzAAbTIATzN/aiqHfWqJt/AfZ6NZx7WKmUQuYOO+ld96vUHyLJ9eWV4Iqc9GLXzso 2UL8E4ZtsObgPvpiwIfMk1G/UerBdQ0dtiNGHeeUKqdjONB+kglvBtI5E/a57O9CxfgQ k3OKUohKLwrvhwGFuMChE5SGuvZRZp6z9IE95ciuvKymPczzr3/4gefjfQoqjrP/Uc3k +DsA== X-Received: by 10.66.145.166 with SMTP id sv6mr29100445pab.31.1387840367082; Mon, 23 Dec 2013 15:12:47 -0800 (PST) MIME-Version: 1.0 Received: by 10.68.6.66 with HTTP; Mon, 23 Dec 2013 15:12:26 -0800 (PST) In-Reply-To: <52B8BEAA.2090808@redhat.com> References: <52B8BEAA.2090808@redhat.com> From: Jim Meyering Date: Mon, 23 Dec 2013 15:12:26 -0800 X-Google-Sender-Auth: -Mvjy54hvPypY10mt9xJ4Vto7Qs Message-ID: Subject: Re: bug#16232: [PATCH] grep: make --ignore-case (-i) faster (sometimes 10x) in multibyte locales To: Eric Blake Content-Type: text/plain; charset=ISO-8859-1 X-Spam-Score: 0.0 (/) X-Debbugs-Envelope-To: 16232 Cc: 16232@debbugs.gnu.org X-BeenThere: debbugs-submit@debbugs.gnu.org X-Mailman-Version: 2.1.15 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: debbugs-submit-bounces@debbugs.gnu.org Sender: "Debbugs-submit" X-Spam-Score: 0.0 (/) On Mon, Dec 23, 2013 at 2:52 PM, Eric Blake wrote: > On 12/23/2013 03:39 PM, Jim Meyering wrote: >> FYI, here is a quick and clean/safe performance improvement for grep -i. >> I expect to push this commit right after the upcoming bug-fix release. >> Currently, this optimization is enabled when the search string is >> ASCII and contains neither of '\' (backslash) nor '['. I expect to >> eliminate the latter two constraints in a follow-on commit including >> tests to exercise all of the corner cases. >> > >> + >> + /* Worst case is that every byte of keys will be alpha, >> + so every byte B will map to the sequence of 4 bytes [Bb]. */ > > Umm, is this always true? 
Consider the UTF-8 Turkish locale, where Hi Eric, Thanks for the review. Did you miss the "isascii" check in the new trivial_case_convert function? If you can describe circumstances in which the new patch malfunctions, please do, but everything you wrote seems to rely on a false assumption. E.g., your turkish-I example works fine with my patch. From debbugs-submit-bounces@debbugs.gnu.org Mon Dec 23 18:30:26 2013 Received: (at 16232) by debbugs.gnu.org; 23 Dec 2013 23:30:26 +0000 Received: from localhost ([127.0.0.1]:36941 helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.80) (envelope-from ) id 1VvEx7-0002Zt-RH for submit@debbugs.gnu.org; Mon, 23 Dec 2013 18:30:26 -0500 Received: from mx1.redhat.com ([209.132.183.28]:22337) by debbugs.gnu.org with esmtp (Exim 4.80) (envelope-from ) id 1VvEx3-0002Ze-Ec for 16232@debbugs.gnu.org; Mon, 23 Dec 2013 18:30:22 -0500 Received: from int-mx02.intmail.prod.int.phx2.redhat.com (int-mx02.intmail.prod.int.phx2.redhat.com [10.5.11.12]) by mx1.redhat.com (8.14.4/8.14.4) with ESMTP id rBNNUJYj030564 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=OK); Mon, 23 Dec 2013 18:30:20 -0500 Received: from [10.3.113.56] (ovpn-113-56.phx2.redhat.com [10.3.113.56]) by int-mx02.intmail.prod.int.phx2.redhat.com (8.13.8/8.13.8) with ESMTP id rBNNUJtA026872; Mon, 23 Dec 2013 18:30:19 -0500 Message-ID: <52B8C78B.7030401@redhat.com> Date: Mon, 23 Dec 2013 16:30:19 -0700 From: Eric Blake Organization: Red Hat, Inc. 
User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:24.0) Gecko/20100101 Thunderbird/24.2.0 MIME-Version: 1.0 To: Jim Meyering Subject: Re: bug#16232: [PATCH] grep: make --ignore-case (-i) faster (sometimes 10x) in multibyte locales References: <52B8BEAA.2090808@redhat.com> In-Reply-To: X-Enigmail-Version: 1.6 OpenPGP: url=http://people.redhat.com/eblake/eblake.gpg Content-Type: multipart/signed; micalg=pgp-sha256; protocol="application/pgp-signature"; boundary="68UjOxSTssIEM12DC4ntTfp6GOxxxT7pS" X-Scanned-By: MIMEDefang 2.67 on 10.5.11.12 X-Spam-Score: -5.6 (-----) X-Debbugs-Envelope-To: 16232 Cc: 16232@debbugs.gnu.org X-BeenThere: debbugs-submit@debbugs.gnu.org X-Mailman-Version: 2.1.15 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: debbugs-submit-bounces@debbugs.gnu.org Sender: "Debbugs-submit" X-Spam-Score: -5.6 (-----) This is an OpenPGP/MIME signed message (RFC 4880 and 3156) --68UjOxSTssIEM12DC4ntTfp6GOxxxT7pS Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: quoted-printable

On 12/23/2013 04:12 PM, Jim Meyering wrote:

> Did you miss the "isascii" check in the new trivial_case_convert function?

No.  But even with that check in place:

> If you can describe circumstances in which the new patch malfunctions,
> please do,
> but everything you wrote seems to rely on a false assumption.

No, it's a quite real complaint - your patch is broken for tr_TR.

> E.g., your turkish-I example works fine with my patch.

isascii('i') is true, but converting 'i' to '[iI]' is incorrect in the
tr_TR locale.  Rather, the conversion must be to '[iİ]'; similarly, 'I'
would be translated to '[Iı]'.  Neither of those conversions fit in 4
bytes (since dotted-capital-I and dotless-lower-i are both multi-byte
characters).

Need help easily finding those characters on a non-Turkish keyboard?
I used:

$ echo iI | LC_ALL=tr_TR.UTF-8 sed 's/\(.\)\(.\)/\U\1\L\2/'

At any rate, prior to your patch, lower dotless i in the buffer gives an
insensitive match to upper dotless I in the pattern:

$ echo ı | LC_ALL=tr_TR.UTF-8 grep -i I || echo no match
ı

After your patch:

$ echo ı | LC_ALL=tr_TR.UTF-8 src/grep -i I || echo no match
no match

Oops, you failed to match lower dotless i insensitively against upper
dotless I, because upper dotless I is ascii, but you incorrectly
converted it into the wrong pattern.

-- 
Eric Blake   eblake redhat com    +1-919-301-3266
Libvirt virtualization library http://libvirt.org

--68UjOxSTssIEM12DC4ntTfp6GOxxxT7pS Content-Type: application/pgp-signature; name="signature.asc" Content-Description: OpenPGP digital signature Content-Disposition: attachment; filename="signature.asc" -----BEGIN PGP SIGNATURE----- Version: GnuPG v1 Comment: Public key at http://people.redhat.com/eblake/eblake.gpg Comment: Using GnuPG with Thunderbird - http://www.enigmail.net/ iQEcBAEBCAAGBQJSuMeLAAoJEKeha0olJ0NqfpIIAJZ0VhFbsMIfKMnCAayylcHC ZTy2GgvGIHw02T/hAE45bh1nynUupsFq3WzUWx9CYY83leqyNuayd4DYYQbo+Xf0 gSsjfuodv39liBaoV+RH7i/P/RSzk6X4Ny02PUB0p4470vqUy+bQLLclssVCXu8I ju9QUkW7GnKg1c/C0CppbZT5vJNfVFuOQW083HAZ3PhRovo4xExvG29sBUmSFL4a rLHEgbpqpkpnE7G6ct5BYGBgbvKvT0ojLZRftUEOK46x65H5Mj3M3KIu4cp66Zgu 6foslPgdKWgin74QqWhbPW3/x3WkhGFMfJyQhLpBkbtUzUaTJ8UuUbLBOP6qmMY= =LQKq -----END PGP SIGNATURE----- --68UjOxSTssIEM12DC4ntTfp6GOxxxT7pS--

From debbugs-submit-bounces@debbugs.gnu.org Mon Dec 23 18:54:56 2013 Received: (at 16232) by debbugs.gnu.org; 23 Dec 2013 23:54:56 +0000 Received: from localhost ([127.0.0.1]:36977 helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.80) (envelope-from ) id 1VvFKp-0003Ex-8D for submit@debbugs.gnu.org; Mon, 23 Dec 2013 18:54:56 -0500 Received: from mail-pd0-f173.google.com ([209.85.192.173]:65171) by debbugs.gnu.org with esmtp (Exim 4.80) (envelope-from ) id 1VvFKi-0003Eh-M4 for 16232@debbugs.gnu.org; Mon, 23 
Dec 2013 18:54:49 -0500 Received: by mail-pd0-f173.google.com with SMTP id p10so5639655pdj.18 for <16232@debbugs.gnu.org>; Mon, 23 Dec 2013 15:54:47 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113; h=mime-version:sender:in-reply-to:references:from:date:message-id :subject:to:cc:content-type:content-transfer-encoding; bh=Kc7fl03XKjcnYiza3wgRaxYsMSQSA7Kff5Uy0bjjDqg=; b=k3yHn1P0WK1tkvjVcBlnw0QNTHPNIQHYJ7GZqYAQVTRPrAY2ALnNJYg2jRnFydCSXp hUUW+95/XNChkIjThq0CavOVujH4ubIXKN+LLPty53OD0fs3KQmxf42JeEY0kZV3uLjq RR7tq1l5vl4aXXtoSXHatsI/mBsfUQCQJlq6uPw5NaruA4vw1he4HRfSFd7XCDtfv1Wq 17eG/aLApVoM0hIiQR0ozJoh+pC25nzov6PfaI4ivf8f/hzvymK2+xexVaHFwlpSsuFz yE6RUQKKx50wSSfSUCON8sybRil1nDheUZgYOXGnpu8dYwhGzst9op0b0TI7+ZqaAo+f T0uw== X-Received: by 10.68.209.193 with SMTP id mo1mr29403942pbc.38.1387842887590; Mon, 23 Dec 2013 15:54:47 -0800 (PST) MIME-Version: 1.0 Received: by 10.68.6.66 with HTTP; Mon, 23 Dec 2013 15:54:27 -0800 (PST) In-Reply-To: <52B8C78B.7030401@redhat.com> References: <52B8BEAA.2090808@redhat.com> <52B8C78B.7030401@redhat.com> From: Jim Meyering Date: Mon, 23 Dec 2013 15:54:27 -0800 X-Google-Sender-Auth: OU2FbesjfuL9S_CBK6N4IZF7nEs Message-ID: Subject: Re: bug#16232: [PATCH] grep: make --ignore-case (-i) faster (sometimes 10x) in multibyte locales To: Eric Blake Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: quoted-printable X-Spam-Score: 0.0 (/) X-Debbugs-Envelope-To: 16232 Cc: 16232 <16232@debbugs.gnu.org> X-BeenThere: debbugs-submit@debbugs.gnu.org X-Mailman-Version: 2.1.15 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: debbugs-submit-bounces@debbugs.gnu.org Sender: "Debbugs-submit" X-Spam-Score: 0.0 (/)

On Mon, Dec 23, 2013 at 3:30 PM, Eric Blake wrote:
> On 12/23/2013 04:12 PM, Jim Meyering wrote:
>> Did you miss the "isascii" check in the new trivial_case_convert function?
>
> No.
But even with that check in place:
>
>> If you can describe circumstances in which the new patch malfunctions,
>> please do,
>> but everything you wrote seems to rely on a false assumption.
>
> No, it's a quite real complaint - your patch is broken for tr_TR.
>
>> E.g., your turkish-I example works fine with my patch.
>
> isascii('i') is true, but converting 'i' to '[iI]' is incorrect in the
> tr_TR locale.  Rather, the conversion must be to '[iİ]'; similarly, 'I'
> would be translated to '[Iı]'.  Neither of those conversions fit in 4
> bytes (since dotted-capital-I and dotless-lower-i are both multi-byte
> characters).
>
> Need help easily finding those characters on a non-Turkish keyboard?  I
> used:
> $ echo iI | LC_ALL=tr_TR.UTF-8 sed 's/\(.\)\(.\)/\U\1\L\2/'
>
> At any rate, prior to your patch, lower dotless i in the buffer gives an
> insensitive match to upper dotless I in the pattern:
>
> $ echo ı | LC_ALL=tr_TR.UTF-8 grep -i I || echo no match
> ı
>
> After your patch:
>
> $ echo ı | LC_ALL=tr_TR.UTF-8 src/grep -i I || echo no match
> no match
>
> Oops, you failed to match lower dotless i insensitively against upper
> dotless I, because upper dotless I is ascii, but you incorrectly
> converted it into the wrong pattern.

Thanks for dotting those 'i's.  While there is no risk of buffer
overrun, there would definitely be a problem with the tr_TR locale.
I will resolve it by removing the isascii check and performing
multibyte case conversion to form each [cC] pair.  Of course,
that will mean removing the "4 * byte-length-of-search-string"
buffer size limitation.

I will also add tests based on your examples.
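
The fix outlined in the message above (pair each character with its
locale-specific case counterpart rather than its ASCII one, and size the
buffer for multibyte output) could be sketched roughly as follows.  This
is an illustration under stated assumptions, not the code from the
follow-up patch: the name bracket_case_pairs is hypothetical, the locale
encoding is assumed to be stateless (UTF-8, for example), and patterns
containing '\' or '[' are still rejected, as in the first patch.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>
#include <wchar.h>
#include <wctype.h>

/* Hypothetical helper, not taken from the patch in this thread.
   Rewrite the NUL-terminated pattern KEYS so that every character C
   with a distinct other-case form becomes the bracket expression [Cc],
   using the wide-character case mappings of the current locale.
   Return a malloc'd string, or NULL if KEYS contains '\\' or '[',
   holds an invalid multibyte sequence, or memory is exhausted.  */
static char *
bracket_case_pairs (char const *keys)
{
  size_t len = strlen (keys);
  if (memchr (keys, '\\', len) || memchr (keys, '[', len))
    return NULL;

  /* Each input character yields at most '[', two encoded characters
     and ']'.  There are at most LEN characters, so this bound replaces
     the fixed "4 bytes per input byte" assumption.  */
  size_t cap = len * (2 * MB_CUR_MAX + 2) + 1;
  char *out = malloc (cap);
  if (!out)
    return NULL;

  mbstate_t in_state, out_state;
  memset (&in_state, 0, sizeof in_state);
  memset (&out_state, 0, sizeof out_state);

  char *p = out;
  char const *end = keys + len;
  while (keys < end)
    {
      wchar_t wc;
      size_t n = mbrtowc (&wc, keys, end - keys, &in_state);
      if (n == (size_t) -1 || n == (size_t) -2 || n == 0)
        {
          free (out);
          return NULL;
        }

      /* Ask the locale, not ASCII rules, for the other-case form.  */
      wchar_t other = iswlower (wc) ? towupper (wc) : towlower (wc);

      if (other == wc)
        {
          /* No case counterpart: copy the character unchanged.  */
          memcpy (p, keys, n);
          p += n;
        }
      else
        {
          *p++ = '[';
          memcpy (p, keys, n);
          p += n;
          size_t m = wcrtomb (p, other, &out_state);
          if (m == (size_t) -1)
            {
              free (out);
              return NULL;
            }
          p += m;
          *p++ = ']';
        }
      keys += n;
    }

  assert ((size_t) (p - out) < cap);
  *p = '\0';
  return out;
}
```

A caller would normally run setlocale (LC_ALL, "") from <locale.h> first,
so that the mappings follow the user's locale.  Under tr_TR.UTF-8, and
assuming the C library implements the Turkish mappings described in the
review, 'i' would then be paired with dotted capital I rather than with
'I', which is the behavior the review asks for.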
From debbugs-submit-bounces@debbugs.gnu.org Tue Jan 07 22:56:47 2014 Received: (at 16232) by debbugs.gnu.org; 8 Jan 2014 03:56:47 +0000 Received: from localhost ([127.0.0.1]:41840 helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.80) (envelope-from ) id 1W0kG6-0002h1-3D for submit@debbugs.gnu.org; Tue, 07 Jan 2014 22:56:46 -0500 Received: from mail-pb0-f51.google.com ([209.85.160.51]:36447) by debbugs.gnu.org with esmtp (Exim 4.80) (envelope-from ) id 1W0kG2-0002gq-Ot for 16232@debbugs.gnu.org; Tue, 07 Jan 2014 22:56:44 -0500 Received: by mail-pb0-f51.google.com with SMTP id up15so1014643pbc.24 for <16232@debbugs.gnu.org>; Tue, 07 Jan 2014 19:56:41 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113; h=mime-version:sender:in-reply-to:references:from:date:message-id :subject:to:cc:content-type; bh=7WrkV/+b+60rxy7hkl3nGC4e1LRSWvE3hMZhYl0Pe7A=; b=FyFY6fOOGuL9sr4eOcxjGMNh/opxVGO0hW/etepeb0x9e5bmTlHpKJCysORbT092hi vXxL0r8n94B0wwtc5n8BJnwpZWrCU60z0I/G/f+mTkvnWn2M4vpusGJb5QjwFa5bAHCD JiPejrxroXGcy0geD7O438J/Ozmvq1xnOHiyvuA21pHT6fjEwOLmRwkkstLMqNt0YPjO lEBVt36vhwh41SxqwFWpHAzWTGsvLDX6YgNE1vfsTzBTFGCVCv6vyoxWQOYFBnFBWzen Lb5zQ6jhfUukOAiPMzNNm7TPrI+TjqI50AW31B/JFY7V6PdyjCPWa0fLbHDnzS6xtYI+ 80pw== X-Received: by 10.68.209.193 with SMTP id mo1mr140924062pbc.38.1389153401739; Tue, 07 Jan 2014 19:56:41 -0800 (PST) MIME-Version: 1.0 Received: by 10.68.157.202 with HTTP; Tue, 7 Jan 2014 19:56:21 -0800 (PST) In-Reply-To: References: <52B8BEAA.2090808@redhat.com> <52B8C78B.7030401@redhat.com> From: Jim Meyering Date: Tue, 7 Jan 2014 19:56:21 -0800 X-Google-Sender-Auth: RAJfywvdVACyoAZ-AZpENtD2VMg Message-ID: Subject: Re: bug#16232: [PATCH] grep: make --ignore-case (-i) faster (sometimes 10x) in multibyte locales To: Eric Blake Content-Type: multipart/mixed; boundary=047d7b15a1e50cbf1304ef6d7be4 X-Spam-Score: -0.7 (/) X-Debbugs-Envelope-To: 16232 Cc: 16232 <16232@debbugs.gnu.org> X-BeenThere: debbugs-submit@debbugs.gnu.org 
X-Mailman-Version: 2.1.15 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: debbugs-submit-bounces@debbugs.gnu.org Sender: "Debbugs-submit" X-Spam-Score: -0.7 (/) --047d7b15a1e50cbf1304ef6d7be4 Content-Type: text/plain; charset=ISO-8859-1 On Mon, Dec 23, 2013 at 3:54 PM, Jim Meyering wrote: ... > Thanks for dotting those 'i's. While there is no risk of buffer > overrun, there would definitely be a problem with the tr_TR locale. > I will resolve it by removing the isascii check and performing > multibyte case conversion to form each [cC] pair. Of course, > that will mean removing the "4 * byte-length-of-search-string" > buffer size limitation. > > I will also add tests based on your examples. Here is the improved patch. --047d7b15a1e50cbf1304ef6d7be4 Content-Type: text/plain; charset=UTF-8; name="k.txt" Content-Disposition: attachment; filename="k.txt" Content-Transfer-Encoding: base64 X-Attachment-Id: f_hq61yaci0 RnJvbSAyZDIyNzNmMzA3MDdiZGM4OTY3M2Y4NzkwNmY1OTdkMTU0NjZjZTIyIE1vbiBTZXAgMTcg MDA6MDA6MDAgMjAwMQpGcm9tOiBKaW0gTWV5ZXJpbmcgPG1leWVyaW5nQGZiLmNvbT4KRGF0ZTog U3VuLCAyNCBOb3YgMjAxMyAxODo0OTozMSAtMDgwMApTdWJqZWN0OiBbUEFUQ0hdIGdyZXA6IG1h a2UgLS1pZ25vcmUtY2FzZSAoLWkpIGZhc3RlciAoc29tZXRpbWVzIDEweCkgaW4KIG11bHRpYnl0 ZSBsb2NhbGVzCgpUaGVzZSBkYXlzLCBuZWFybHkgZXZlcnlvbmUgdXNlcyBhIG11bHRpYnl0ZSBs b2NhbGUsIGFuZCBncmVwIGlzIG9mdGVuCnVzZWQgd2l0aCB0aGUgLS1pZ25vcmUtY2FzZSAoLWkp IG9wdGlvbiwgYnV0IHRoYXQgb3B0aW9uIGltcG9zZXMgYSB2ZXJ5CmhpZ2ggY29zdCBpbiBvcmRl ciB0byBoYW5kbGUgc29tZSB1bnVzdWFsIGNhc2VzIGluIGp1c3QgYSBmZXcgbXVsdGlieXRlCmxv Y2FsZXMuICBUaGlzIGNoYW5nZSBnZXRzIG1vc3Qgb2YgdGhlIHBlcmZvcm1hbmNlIG9mIHVzaW5n IExDX0FMTD1DCndpdGhvdXQgZWxpbWluYXRpbmcgdGhlIGFiaWxpdHkgdG8gc2VhcmNoIGZvciBt dWx0aWJ5dGUgc3RyaW5ncy4KCldpdGggdGhlIGZvbGxvd2luZyBleGFtcGxlLCBJIHNlZSBhbiAx MXggc3BlZWQtdXAgd2l0aCBhIDIuM0dIeiBpNzoKR2VuZXJhdGUgYSAxME0tbGluZSBmaWxlLCB3 aXRoIGVhY2ggbGluZSBjb25zaXN0aW5nIG9mIDQwICdqJ3M6CgogICAgeWVzIGpqampqampqampq 
ampqampqampqampqampqampqampqampqampqamogfCBoZWFkIC0xMDAwMDAwMCA+IGsKClRpbWUg c2VhcmNoaW5nIGl0IGZvciB0aGUgc2ltcGxlL25vZXhpc3RlbnQgc3RyaW5nICJmb29iYXIiLApm aXJzdCB3aXRoIHRoaXMgcGF0Y2ggKGJlc3Qtb2YtNSB0cmlhbHMpOgoKICAgIExDX0FMTD1lbl9V Uy5VVEYtOCBlbnYgdGltZSBzcmMvZ3JlcCAtaSBmb29iYXIgawogICAgICAgIDEuMTAgcmVhbCAg ICAgICAgIDEuMDMgdXNlciAgICAgICAgIDAuMDcgc3lzCgpCYWNrIG91dCB0aGF0IGNvbW1pdCAo dGVtcG9yYXJpbHkpLCByZWNvbXBpbGUsIGFuZCByZXJ1biB0aGUgZXhwZXJpbWVudDoKCiAgICBn aXQgbG9nIC0xIC1wfHBhdGNoIC1SIC1wMTsgbWFrZQogICAgTENfQUxMPWVuX1VTLlVURi04IGVu diB0aW1lIHNyYy9ncmVwIC1pIGZvb2JhciBrCiAgICAgICAgMTIuNTAgcmVhbCAgICAgICAgMTIu NDEgdXNlciAgICAgICAgIDAuMDggc3lzCgpUaGUgdHJpY2sgaXMgdG8gcmVhbGl6ZSB0aGF0IGZv ciBzb21lIHNlYXJjaCBzdHJpbmdzLCBpdCBpcyBlYXN5CnRvIGNvbnZlcnQgdG8gYW4gZXF1aXZh bGVudCBvbmUgdGhhdCBpcyBoYW5kbGVkIG11Y2ggbW9yZSBlZmZpY2llbnRseS4KRS5nLiwgY29u dmVydCB0aGlzIGNvbW1hbmQ6CgogIGdyZXAgLWkgZm9vYmFyIGsKCnRvIHRoaXM6CgogIGdyZXAg J1tmRl1bb09dW29PXVtiQl1bYUFdW3JSXScgawoKVGhhdCBhbGxvd3MgdGhlIG1hdGNoZXIgdG8g c2VhcmNoIGluIGJ1ZmZlciBtb2RlLCByYXRoZXIgdGhhbiBoYXZpbmcgdG8KZXh0cmFjdC9jYXNl LWNvbnZlcnQvc2VhcmNoIGVhY2ggbGluZSBzZXBhcmF0ZWx5LiAgQ3VycmVudGx5LCB3ZSBwZXJm b3JtCnRoaXMgY29udmVyc2lvbiBvbmx5IHdoZW4gc2VhcmNoIHN0cmluZ3MgY29udGFpbiBuZWl0 aGVyICdcJyBub3IgJ1snLgpTZWUgdGhlIGNvbW1lbnRzIGZvciBtb3JlIGRldGFpbC4KCiogc3Jj L21haW4uYyAodHJpdmlhbF9jYXNlX2lnbm9yZSk6IE5ldyBmdW5jdGlvbi4KKG1haW4pOiBXaGVu IHBvc3NpYmxlLCB0cmFuc2Zvcm0gdGhlIHJlZ2V4cCBzbyB3ZSBjYW4gZHJvcCB0aGUgLWkuCiog dGVzdHMvdHVya2lzaC1leWVzOiBOZXcgZmlsZS4KKiB0ZXN0cy9NYWtlZmlsZS5hbSAoVEVTVFMp OiBVc2UgaXQuCi0tLQogc3JjL21haW4uYyAgICAgICAgIHwgMTExICsrKysrKysrKysrKysrKysr KysrKysrKysrKysrKysrKysrKysrKysrKysrKysrKysrKystCiB0ZXN0cy9NYWtlZmlsZS5hbSAg fCAgIDEgKwogdGVzdHMvdHVya2lzaC1leWVzIHwgIDQ0ICsrKysrKysrKysrKysrKysrKysrKwog MyBmaWxlcyBjaGFuZ2VkLCAxNTUgaW5zZXJ0aW9ucygrKSwgMSBkZWxldGlvbigtKQogY3JlYXRl IG1vZGUgMTAwNzU1IHRlc3RzL3R1cmtpc2gtZXllcwoKZGlmZiAtLWdpdCBhL3NyYy9tYWluLmMg 
Yi9zcmMvbWFpbi5jCmluZGV4IDQ0MDkwYmUuLmJmZDA5ODIgMTAwNjQ0Ci0tLSBhL3NyYy9tYWlu LmMKKysrIGIvc3JjL21haW4uYwpAQCAtMjcsNiArMjcsNyBAQAogI2luY2x1ZGUgPGZjbnRsLmg+ CiAjaW5jbHVkZSA8aW50dHlwZXMuaD4KICNpbmNsdWRlIDxzdGRpby5oPgorI2luY2x1ZGUgPGFz c2VydC5oPgogI2luY2x1ZGUgInN5c3RlbS5oIgoKICNpbmNsdWRlICJhcmdtYXRjaC5oIgpAQCAt MTY0NCwxMyArMTY0NSwxNCBAQCBpZiBhbnkgZXJyb3Igb2NjdXJzIGFuZCAtcSBpcyBub3QgZ2l2 ZW4sIHRoZSBleGl0IHN0YXR1cyBpcyAyLlxuIikpOwogICBleGl0IChzdGF0dXMpOwogfQoKK3N0 YXRpYyBjaGFyIGNvbnN0ICptYXRjaGVyOworCiAvKiBJZiBNIGlzIE5VTEwsIGluaXRpYWxpemUg dGhlIG1hdGNoZXIgdG8gdGhlIGRlZmF1bHQuICBPdGhlcndpc2Ugc2V0IHRoZQogICAgbWF0Y2hl ciB0byBNIGlmIGF2YWlsYWJsZS4gIEV4aXQgaW4gY2FzZSBvZiBjb25mbGljdHMgb3IgaWYgTSBp cyBub3QKICAgIGF2YWlsYWJsZS4gICovCiBzdGF0aWMgdm9pZAogc2V0bWF0Y2hlciAoY2hhciBj b25zdCAqbSkKIHsKLSAgc3RhdGljIGNoYXIgY29uc3QgKm1hdGNoZXI7CiAgIHVuc2lnbmVkIGlu dCBpOwoKICAgaWYgKCFtKQpAQCAtMTg2NSw2ICsxODY3LDg0IEBAIHBhcnNlX2dyZXBfY29sb3Jz ICh2b2lkKQogICAgICAgcmV0dXJuOwogfQoKKyNkZWZpbmUgTUJSVE9XQyhwd2MsIHMsIG4sIHBz KSBcCisgIChNQl9DVVJfTUFYID09IDEgPyBcCisgICAoKihwd2MpID0gYnRvd2MgKCoodW5zaWdu ZWQgY2hhciAqKSAocykpLCAxKSA6IFwKKyAgIG1icnRvd2MgKChwd2MpLCAocyksIChuKSwgKHBz KSkpCisKKyNkZWZpbmUgV0NSVE9NQihzLCB3YywgcHMpIFwKKyAgKE1CX0NVUl9NQVggPT0gMSA/ IFwKKyAgICgqKHMpID0gd2N0b2IgKCh3aW50X3QpICh3YykpLCAxKSA6IFwKKyAgIHdjcnRvbWIg KChzKSwgKHdjKSwgKHBzKSkpCisKKy8qIElmIHRoZSBuZXdsaW5lLXNlcGFyYXRlZCByZWd1bGFy IGV4cHJlc3Npb25zLCBLRVlTICh3aXRoIGxlbmd0aCwgTEVOCisgICBhbmQgbm8gdHJhaWxpbmcg TlVMIGJ5dGUpLCBhcmUgYW1lbmFibGUgdG8gdHJhbnNmb3JtYXRpb24gaW50bworICAgb3RoZXJ3 aXNlIGVxdWl2YWxlbnQgY2FzZS1pZ25vcmluZyBvbmVzLCBwZXJmb3JtIHRoZSB0cmFuc2Zvcm1h dGlvbiwKKyAgIHB1dCB0aGUgcmVzdWx0IGludG8gbWFsbG9jJ2QgbWVtb3J5LCAqTkVXX0tFWVMg d2l0aCBsZW5ndGggKk5FV19MRU4sCisgICBhbmQgcmV0dXJuIHRydWUuICBPdGhlcndpc2UsIHJl dHVybiBmYWxzZS4gICovCitzdGF0aWMgYm9vbAordHJpdmlhbF9jYXNlX2lnbm9yZSAoc2l6ZV90 IGxlbiwgY2hhciBjb25zdCAqa2V5cywKKyAgICAgICAgICAgICAgICAgICAgIHNpemVfdCAqbmV3 
X2xlbiwgY2hhciAqKm5ld19rZXlzKQoreworICAvKiBGSVhNRTogY29uc2lkZXIgcmVtb3Zpbmcg dGhlIGZvbGxvd2luZyByZXN0cmljdGlvbjoKKyAgICAgUmVqZWN0IGlmIEtFWVMgY29udGFpbiBB U0NJSSAnXFwnIG9yICdbJy4gICovCisgIGlmIChtZW1jaHIgKGtleXMsICdcXCcsIGxlbikgfHwg bWVtY2hyIChrZXlzLCAnWycsIGxlbikpCisgICAgcmV0dXJuIGZhbHNlOworCisgIC8qIFdvcnN0 IGNhc2UgaXMgdGhhdCBlYWNoIGJ5dGUgQiBvZiBLRVlTIGlzIEFTQ0lJIGFscGhhYmV0aWMgYW5k IGVhY2gKKyAgICAgb3RoZXJfY2FzZShCKSBjaGFyYWN0ZXIsIEMsIG9jY3VwaWVzIE1CX0NVUl9N QVggYnl0ZXMsIHNvIGVhY2ggQgorICAgICBtYXBzIHRvIFtCQ10sIHdoaWNoIHJlcXVpcmVzIE1C X0NVUl9NQVggKyAzIGJ5dGVzLiAgICovCisgICpuZXdfa2V5cyA9IHhubWFsbG9jIChNQl9DVVJf TUFYICsgMywgbGVuICsgMSk7CisgIGNoYXIgKnAgPSAqbmV3X2tleXM7CisKKyAgbWJzdGF0ZV90 IG1iX3N0YXRlOworICBtZW1zZXQgKCZtYl9zdGF0ZSwgMCwgc2l6ZW9mIG1iX3N0YXRlKTsKKyAg d2hpbGUgKGxlbikKKyAgICB7CisgICAgICB3Y2hhcl90IHdjOworICAgICAgaW50IG4gPSBNQlJU T1dDICgmd2MsIGtleXMsIGxlbiwgJm1iX3N0YXRlKTsKKworICAgICAgLyogRm9yIGFuIGludmFs aWQsIGluY29tcGxldGUgb3IgTCdcMCcsIHNraXAgdGhpcyBvcHRpbWl6YXRpb24uICAqLworICAg ICAgaWYgKG4gPD0gMCkKKyAgICAgICAgeworICAgICAgICBza2lwX2Nhc2VfaWdub3JlX29wdGlt aXphdGlvbjoKKyAgICAgICAgICBmcmVlICgqbmV3X2tleXMpOworICAgICAgICAgIHJldHVybiBm YWxzZTsKKyAgICAgICAgfQorCisgICAgICBjaGFyIGNvbnN0ICpvcmlnID0ga2V5czsKKyAgICAg IGtleXMgKz0gbjsKKyAgICAgIGxlbiAtPSBuOworCisgICAgICBpZiAoIWlzd2FscGhhICh3Yykp CisgICAgICAgIHsKKyAgICAgICAgICBtZW1jcHkgKHAsIG9yaWcsIG4pOworICAgICAgICAgIHAg Kz0gbjsKKyAgICAgICAgfQorICAgICAgZWxzZQorICAgICAgICB7CisgICAgICAgICAgKnArKyA9 ICdbJzsKKyAgICAgICAgICBtZW1jcHkgKHAsIG9yaWcsIG4pOworICAgICAgICAgIHAgKz0gbjsK KworICAgICAgICAgIHdjaGFyX3Qgd2MyID0gaXN3dXBwZXIgKHdjKSA/IHRvd2xvd2VyICh3Yykg OiB0b3d1cHBlciAod2MpOworICAgICAgICAgIGNoYXIgYnVmW01CX0NVUl9NQVhdOworICAgICAg ICAgIGludCBuMiA9IFdDUlRPTUIgKGJ1Ziwgd2MyLCAmbWJfc3RhdGUpOworICAgICAgICAgIGlm IChuMiA8PSAwKQorICAgICAgICAgICAgZ290byBza2lwX2Nhc2VfaWdub3JlX29wdGltaXphdGlv bjsKKyAgICAgICAgICBhc3NlcnQgKG4yIDw9IE1CX0NVUl9NQVgpOworICAgICAgICAgIG1lbWNw 
eSAocCwgYnVmLCBuMik7CisgICAgICAgICAgcCArPSBuMjsKKworICAgICAgICAgICpwKysgPSAn XSc7CisgICAgICAgIH0KKyAgICB9CisKKyAgKm5ld19sZW4gPSBwIC0gKm5ld19rZXlzOworCisg IHJldHVybiB0cnVlOworfQorCiBpbnQKIG1haW4gKGludCBhcmdjLCBjaGFyICoqYXJndikKIHsK QEAgLTIyNjMsNiArMjM0MywzNSBAQCBtYWluIChpbnQgYXJnYywgY2hhciAqKmFyZ3YpCiAgIGVs c2UKICAgICB1c2FnZSAoRVhJVF9UUk9VQkxFKTsKCisgIC8qIEFzIGN1cnJlbnRseSBpbXBsZW1l bnRlZCwgY2FzZS1pbnNlbnNpdGl2ZSBtYXRjaGluZyBpcyBleHBlbnNpdmUgaW4KKyAgICAgbXVs dGktYnl0ZSBsb2NhbGVzIGJlY2F1c2Ugb2YgYSBmZXcgb3V0bGllciBsb2NhbGVzIGluIHdoaWNo IHNvbWUKKyAgICAgY2hhcmFjdGVycyBjaGFuZ2Ugc2l6ZSB3aGVuIGNvbnZlcnRlZCB0byB1cHBl ciBvciBsb3dlciBjYXNlLiAgVG8KKyAgICAgYWNjb21tb2RhdGUgdGhvc2UsIHdlIHJldmVydCB0 byBzZWFyY2hpbmcgdGhlIGlucHV0IG9uZSBsaW5lIGF0IGEKKyAgICAgdGltZSwgcmF0aGVyIHRo YW4gdXNpbmcgdGhlIG11Y2ggbW9yZSBlZmZpY2llbnQgYnVmZmVyIHNlYXJjaC4KKyAgICAgSG93 ZXZlciwgaWYgd2UgaGF2ZSBhIHJlZ3VsYXIgZXhwcmVzc2lvbiwgL2Zvby9pLCB3ZSBjYW4gY29u dmVydAorICAgICBpdCB0byBhbiBlcXVpdmFsZW50IGNhc2UtaW5zZW5zaXRpdmUgL1tmRl1bb09d W29PXS8sIGFuZCB0aHVzCisgICAgIGF2b2lkIHRoZSBleHBlbnNpdmUgcmVhZC1hbmQtcHJvY2Vz cy1hLWxpbmUtYXQtYS10aW1lIHJlcXVpcmVtZW50LgorICAgICBPcHRpbWl6ZS1hd2F5IHRoZSAi LWkiIG9wdGlvbiwgd2hlbiBwb3NzaWJsZSwgY29udmVydGluZyBlYWNoCisgICAgIGNhbmRpZGF0 ZSBhbHBoYSwgQywgaW4gdGhlIHJlZ2V4cCB0byBbQ2NdLiAgKi8KKyAgaWYgKG1hdGNoX2ljYXNl KQorICAgIHsKKyAgICAgIHNpemVfdCBuZXdfa2V5Y2M7CisgICAgICBjaGFyICpuZXdfa2V5czsK KyAgICAgIC8qIEl0IGlzIG5vdCBwb3NzaWJsZSB3aXRoIC1GLCBub3QgdXNlZnVsIHdpdGggLVAg KHBjcmUpIGFuZCB0aGVyZSBpcyBubworICAgICAgICAgcG9pbnQgd2hlbiB0aGVyZSBpcyBubyBy ZWdleHAuICBJdCBhbHNvIGRlcGVuZHMgb24gd2hpY2ggY29uc3RydWN0cworICAgICAgICAgYXBw ZWFyIGluIHRoZSByZWdleHAuICBTZWUgdHJpdmlhbF9jYXNlX2lnbm9yZSBmb3IgdGhvc2UgZGV0 YWlscy4gICovCisgICAgICBpZiAoa2V5Y2MKKyAgICAgICAgICAmJiAhIChtYXRjaGVyCisgICAg ICAgICAgICAgICAgJiYgKFNUUkVRIChtYXRjaGVyLCAiZmdyZXAiKSB8fCBTVFJFUSAobWF0Y2hl ciwgInBjcmUiKSkpCisgICAgICAgICAgJiYgdHJpdmlhbF9jYXNlX2lnbm9yZSAoa2V5Y2MsIGtl 
eXMsICZuZXdfa2V5Y2MsICZuZXdfa2V5cykpCisgICAgICAgIHsKKyAgICAgICAgICBtYXRjaF9p Y2FzZSA9IDA7CisgICAgICAgICAgZnJlZSAoa2V5cyk7CisgICAgICAgICAga2V5cyA9IG5ld19r ZXlzOworICAgICAgICAgIGtleWNjID0gbmV3X2tleWNjOworICAgICAgICB9CisgICAgfQorCiAg IGNvbXBpbGUgKGtleXMsIGtleWNjKTsKICAgZnJlZSAoa2V5cyk7CgpkaWZmIC0tZ2l0IGEvdGVz dHMvTWFrZWZpbGUuYW0gYi90ZXN0cy9NYWtlZmlsZS5hbQppbmRleCBmNDU4MGI1Li5hMzdhODE0 IDEwMDY0NAotLS0gYS90ZXN0cy9NYWtlZmlsZS5hbQorKysgYi90ZXN0cy9NYWtlZmlsZS5hbQpA QCAtOTQsNiArOTQsNyBAQCBURVNUUyA9CQkJCQkJXAogICBzdGF0dXMJCQkJCVwKICAgc3Vycm9n YXRlLXBhaXIJCQkJXAogICBzeW1saW5rCQkJCQlcCisgIHR1cmtpc2gtZXllcwkJCQkJXAogICB0 dXJraXNoLUkJCQkJCVwKICAgdHVya2lzaC1JLXdpdGhvdXQtZG90CQkJCVwKICAgd2Fybi1jaGFy LWNsYXNzZXMJCQkJXApkaWZmIC0tZ2l0IGEvdGVzdHMvdHVya2lzaC1leWVzIGIvdGVzdHMvdHVy a2lzaC1leWVzCm5ldyBmaWxlIG1vZGUgMTAwNzU1CmluZGV4IDAwMDAwMDAuLjMyM2ViMzUKLS0t IC9kZXYvbnVsbAorKysgYi90ZXN0cy90dXJraXNoLWV5ZXMKQEAgLTAsMCArMSw0NCBAQAorIyEv YmluL3NoCisjIEVuc3VyZSB0aGF0IGNhc2UtaW5zZW5zaXRpdmUgbWF0Y2hpbmcgd29ya3Mgd2l0 aCBhbGwgVHVya2lzaCBpJ3MKKworIyBDb3B5cmlnaHQgKEMpIDIwMTQgRnJlZSBTb2Z0d2FyZSBG b3VuZGF0aW9uLCBJbmMuCisKKyMgVGhpcyBwcm9ncmFtIGlzIGZyZWUgc29mdHdhcmU6IHlvdSBj YW4gcmVkaXN0cmlidXRlIGl0IGFuZC9vciBtb2RpZnkKKyMgaXQgdW5kZXIgdGhlIHRlcm1zIG9m IHRoZSBHTlUgR2VuZXJhbCBQdWJsaWMgTGljZW5zZSBhcyBwdWJsaXNoZWQgYnkKKyMgdGhlIEZy ZWUgU29mdHdhcmUgRm91bmRhdGlvbiwgZWl0aGVyIHZlcnNpb24gMyBvZiB0aGUgTGljZW5zZSwg b3IKKyMgKGF0IHlvdXIgb3B0aW9uKSBhbnkgbGF0ZXIgdmVyc2lvbi4KKworIyBUaGlzIHByb2dy YW0gaXMgZGlzdHJpYnV0ZWQgaW4gdGhlIGhvcGUgdGhhdCBpdCB3aWxsIGJlIHVzZWZ1bCwKKyMg YnV0IFdJVEhPVVQgQU5ZIFdBUlJBTlRZOyB3aXRob3V0IGV2ZW4gdGhlIGltcGxpZWQgd2FycmFu dHkgb2YKKyMgTUVSQ0hBTlRBQklMSVRZIG9yIEZJVE5FU1MgRk9SIEEgUEFSVElDVUxBUiBQVVJQ T1NFLiAgU2VlIHRoZQorIyBHTlUgR2VuZXJhbCBQdWJsaWMgTGljZW5zZSBmb3IgbW9yZSBkZXRh aWxzLgorCisjIFlvdSBzaG91bGQgaGF2ZSByZWNlaXZlZCBhIGNvcHkgb2YgdGhlIEdOVSBHZW5l cmFsIFB1YmxpYyBMaWNlbnNlCisjIGFsb25nIHdpdGggdGhpcyBwcm9ncmFtLiAgSWYgbm90LCBz 
ZWUgPGh0dHA6Ly93d3cuZ251Lm9yZy9saWNlbnNlcy8+LgorCisuICIke3NyY2Rpcj0ufS9pbml0 LnNoIjsgcGF0aF9wcmVwZW5kXyAuLi9zcmMKKworcmVxdWlyZV9jb21waWxlZF9pbl9NQl9zdXBw b3J0CisKK2ZhaWw9MAorCitMPXRyX1RSLlVURi04CisKKyMgQ2hlY2sgZm9yIGEgYnJva2VuIHRy X1RSLlVURi04IGxvY2FsZSBkZWZpbml0aW9uLgorIyBJbiB0aGlzIGxvY2FsZSwgJ2knIGlzIG5v dCBhIGxvd2VyLWNhc2UgJ0knLgorZWNobyBJIHwgTENfQUxMPSRMIGdyZXAgLWkgaSA+IC9kZXYv bnVsbCBcCisgICAgJiYgc2tpcF8gInlvdXIgJEwgbG9jYWxlIGFwcGVhcnMgdG8gYmUgYnJva2Vu IgorCisjIEVuc3VyZSB0aGF0IHRoaXMgbWF0Y2hlczoKKyMgcHJpbnRmICdJOsSwIMSxOmlcbid8 TENfQUxMPXRyX1RSLnV0ZjggZ3JlcCAtaSAnxLE6aSBJOsSwJworST0kKHByaW50ZiAnXDMwNFwy NjAnKSAjIGNhcGl0YWwgSSB3aXRoIGRvdAoraT0kKHByaW50ZiAnXDMwNFwyNjEnKSAjIGxvd2Vy Y2FzZSBkb3RsZXNzIGkKKworZGF0YT0kKCAgICAgIHByaW50ZiAiSTokSSAkaTppIikKK3NlYXJj aF9zdHI9JChwcmludGYgIiRpOmkgSTokSSIpCitwcmludGYgIiRkYXRhXG4iID4gaW4gfHwgZnJh bWV3b3JrX2ZhaWx1cmVfCisKK0xDX0FMTD0kTCBncmVwIC1pICJeJHNlYXJjaF9zdHJcJCIgaW4g PiBvdXQgfHwgZmFpbD0xCitjb21wYXJlIG91dCBpbiB8fCBmYWlsPTEKKworRXhpdCAkZmFpbAot LSAKMS44LjUuMi4yMjkuZzQ0NDg0NjYKCg== --047d7b15a1e50cbf1304ef6d7be4-- From debbugs-submit-bounces@debbugs.gnu.org Fri Jan 10 00:19:47 2014 Received: (at 16232) by debbugs.gnu.org; 10 Jan 2014 05:19:47 +0000 Received: from localhost ([127.0.0.1]:44596 helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.80) (envelope-from ) id 1W1UVW-0000Cj-GE for submit@debbugs.gnu.org; Fri, 10 Jan 2014 00:19:47 -0500 Received: from mail-pd0-f178.google.com ([209.85.192.178]:56234) by debbugs.gnu.org with esmtp (Exim 4.80) (envelope-from ) id 1W1UVT-0000CU-UU for 16232@debbugs.gnu.org; Fri, 10 Jan 2014 00:19:45 -0500 Received: by mail-pd0-f178.google.com with SMTP id y10so4138339pdj.9 for <16232@debbugs.gnu.org>; Thu, 09 Jan 2014 21:19:43 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113; h=mime-version:sender:in-reply-to:references:from:date:message-id :subject:to:cc:content-type; bh=xCNk0DoW4V5kwp/+3/JiuLZ880iXktgfQOkowJ9u1Mk=; 
b=PL3P/Oa5AsloCw9YB96rgzwsR09a5tJ2Nw7rh6egPytJbJ6EpOJ0lN7doanhoHFxZe SRIRd7ZdQ55CyX0V25HvhL+mIXK6cRVmU3hnhEbxNWpspf0KKsme4drRcRIT5ZKrgqNi BGZ6SC/xqrH3XwdGXHAqleYfizycNUHbYqeuxz4FP2iG5710PCMkZK85z3pGaZCqfxiu Toyo8NCuPSaylDlje+Oi3ixnJbjPADu+Bp0aqaQdcp+4kh8aDj104U8VdJSZyWWVDUNW MysodHryMwFKuKgscG3AJ5ewKVVc1fy7oxwVuFdDVn9gcVKfzCTWGTiKQ39k6phLGZf5 7Dzg== X-Received: by 10.68.192.131 with SMTP id hg3mr8622371pbc.136.1389331182924; Thu, 09 Jan 2014 21:19:42 -0800 (PST) MIME-Version: 1.0 Received: by 10.68.157.202 with HTTP; Thu, 9 Jan 2014 21:19:21 -0800 (PST) In-Reply-To: References: <52B8BEAA.2090808@redhat.com> <52B8C78B.7030401@redhat.com> From: Jim Meyering Date: Thu, 9 Jan 2014 21:19:21 -0800 X-Google-Sender-Auth: 1bl0RZwJxX6lRbtXWVqkr1FG9JA Message-ID: Subject: Re: bug#16232: [PATCH] grep: make --ignore-case (-i) faster (sometimes 10x) in multibyte locales To: Eric Blake Content-Type: multipart/mixed; boundary=047d7b6d80b4a2694d04ef96df7a X-Spam-Score: 0.0 (/) X-Debbugs-Envelope-To: 16232 Cc: 16232 <16232@debbugs.gnu.org> X-BeenThere: debbugs-submit@debbugs.gnu.org X-Mailman-Version: 2.1.15 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: debbugs-submit-bounces@debbugs.gnu.org Sender: "Debbugs-submit" X-Spam-Score: 0.0 (/) --047d7b6d80b4a2694d04ef96df7a Content-Type: text/plain; charset=ISO-8859-1 On Tue, Jan 7, 2014 at 7:56 PM, Jim Meyering wrote: > Here is the improved patch. I've added a NEWS entry. 
Pushing tomorrow: --047d7b6d80b4a2694d04ef96df7a Content-Type: text/plain; charset=UTF-8; name="k.txt" Content-Disposition: attachment; filename="k.txt" Content-Transfer-Encoding: base64 X-Attachment-Id: f_hq8zv55f1 RnJvbSA5NzMxOGY1ZTU5YTFlZjZmZWI4YTM3ODQzNGEwMDkzMmEzZmMxZTBiIE1vbiBTZXAgMTcg MDA6MDA6MDAgMjAwMQpGcm9tOiBKaW0gTWV5ZXJpbmcgPG1leWVyaW5nQGZiLmNvbT4KRGF0ZTog U3VuLCAyNCBOb3YgMjAxMyAxODo0OTozMSAtMDgwMApTdWJqZWN0OiBbUEFUQ0hdIGdyZXA6IG1h a2UgLS1pZ25vcmUtY2FzZSAoLWkpIGZhc3RlciAoc29tZXRpbWVzIDEweCkgaW4KIG11bHRpYnl0 ZSBsb2NhbGVzCgpUaGVzZSBkYXlzLCBuZWFybHkgZXZlcnlvbmUgdXNlcyBhIG11bHRpYnl0ZSBs b2NhbGUsIGFuZCBncmVwIGlzIG9mdGVuCnVzZWQgd2l0aCB0aGUgLS1pZ25vcmUtY2FzZSAoLWkp IG9wdGlvbiwgYnV0IHRoYXQgb3B0aW9uIGltcG9zZXMgYSB2ZXJ5CmhpZ2ggY29zdCBpbiBvcmRl ciB0byBoYW5kbGUgc29tZSB1bnVzdWFsIGNhc2VzIGluIGp1c3QgYSBmZXcgbXVsdGlieXRlCmxv Y2FsZXMuICBUaGlzIGNoYW5nZSBnZXRzIG1vc3Qgb2YgdGhlIHBlcmZvcm1hbmNlIG9mIHVzaW5n IExDX0FMTD1DCndpdGhvdXQgZWxpbWluYXRpbmcgdGhlIGFiaWxpdHkgdG8gc2VhcmNoIGZvciBt dWx0aWJ5dGUgc3RyaW5ncy4KCldpdGggdGhlIGZvbGxvd2luZyBleGFtcGxlLCBJIHNlZSBhbiAx MXggc3BlZWQtdXAgd2l0aCBhIDIuM0dIeiBpNzoKR2VuZXJhdGUgYSAxME0tbGluZSBmaWxlLCB3 aXRoIGVhY2ggbGluZSBjb25zaXN0aW5nIG9mIDQwICdqJ3M6CgogICAgeWVzIGpqampqampqampq ampqampqampqampqampqampqampqampqampqamogfCBoZWFkIC0xMDAwMDAwMCA+IGsKClRpbWUg c2VhcmNoaW5nIGl0IGZvciB0aGUgc2ltcGxlL25vZXhpc3RlbnQgc3RyaW5nICJmb29iYXIiLApm aXJzdCB3aXRoIHRoaXMgcGF0Y2ggKGJlc3Qtb2YtNSB0cmlhbHMpOgoKICAgIExDX0FMTD1lbl9V Uy5VVEYtOCBlbnYgdGltZSBzcmMvZ3JlcCAtaSBmb29iYXIgawogICAgICAgIDEuMTAgcmVhbCAg ICAgICAgIDEuMDMgdXNlciAgICAgICAgIDAuMDcgc3lzCgpCYWNrIG91dCB0aGF0IGNvbW1pdCAo dGVtcG9yYXJpbHkpLCByZWNvbXBpbGUsIGFuZCByZXJ1biB0aGUgZXhwZXJpbWVudDoKCiAgICBn aXQgbG9nIC0xIC1wfHBhdGNoIC1SIC1wMTsgbWFrZQogICAgTENfQUxMPWVuX1VTLlVURi04IGVu diB0aW1lIHNyYy9ncmVwIC1pIGZvb2JhciBrCiAgICAgICAgMTIuNTAgcmVhbCAgICAgICAgMTIu NDEgdXNlciAgICAgICAgIDAuMDggc3lzCgpUaGUgdHJpY2sgaXMgdG8gcmVhbGl6ZSB0aGF0IGZv ciBzb21lIHNlYXJjaCBzdHJpbmdzLCBpdCBpcyBlYXN5CnRvIGNvbnZlcnQgdG8gYW4gZXF1aXZh 
bGVudCBvbmUgdGhhdCBpcyBoYW5kbGVkIG11Y2ggbW9yZSBlZmZpY2llbnRseS4KRS5nLiwgY29u dmVydCB0aGlzIGNvbW1hbmQ6CgogIGdyZXAgLWkgZm9vYmFyIGsKCnRvIHRoaXM6CgogIGdyZXAg J1tmRl1bb09dW29PXVtiQl1bYUFdW3JSXScgawoKVGhhdCBhbGxvd3MgdGhlIG1hdGNoZXIgdG8g c2VhcmNoIGluIGJ1ZmZlciBtb2RlLCByYXRoZXIgdGhhbiBoYXZpbmcgdG8KZXh0cmFjdC9jYXNl LWNvbnZlcnQvc2VhcmNoIGVhY2ggbGluZSBzZXBhcmF0ZWx5LiAgQ3VycmVudGx5LCB3ZSBwZXJm b3JtCnRoaXMgY29udmVyc2lvbiBvbmx5IHdoZW4gc2VhcmNoIHN0cmluZ3MgY29udGFpbiBuZWl0 aGVyICdcJyBub3IgJ1snLgpTZWUgdGhlIGNvbW1lbnRzIGZvciBtb3JlIGRldGFpbC4KCiogc3Jj L21haW4uYyAodHJpdmlhbF9jYXNlX2lnbm9yZSk6IE5ldyBmdW5jdGlvbi4KKG1haW4pOiBXaGVu IHBvc3NpYmxlLCB0cmFuc2Zvcm0gdGhlIHJlZ2V4cCBzbyB3ZSBjYW4gZHJvcCB0aGUgLWkuCiog dGVzdHMvdHVya2lzaC1leWVzOiBOZXcgZmlsZS4KKiB0ZXN0cy9NYWtlZmlsZS5hbSAoVEVTVFMp OiBVc2UgaXQuCiogTkVXUyAoSW1wcm92ZW1lbnRzKTogTWVudGlvbiBpdC4KLS0tCiBORVdTICAg ICAgICAgICAgICAgfCAgIDUgKysrCiBzcmMvbWFpbi5jICAgICAgICAgfCAxMTEgKysrKysrKysr KysrKysrKysrKysrKysrKysrKysrKysrKysrKysrKysrKysrKysrKysrKy0KIHRlc3RzL01ha2Vm aWxlLmFtICB8ICAgMSArCiB0ZXN0cy90dXJraXNoLWV5ZXMgfCAgNDQgKysrKysrKysrKysrKysr KysrKysrCiA0IGZpbGVzIGNoYW5nZWQsIDE2MCBpbnNlcnRpb25zKCspLCAxIGRlbGV0aW9uKC0p CiBjcmVhdGUgbW9kZSAxMDA3NTUgdGVzdHMvdHVya2lzaC1leWVzCgpkaWZmIC0tZ2l0IGEvTkVX UyBiL05FV1MKaW5kZXggNjg1OWNhMC4uNmU0NjY4NCAxMDA2NDQKLS0tIGEvTkVXUworKysgYi9O RVdTCkBAIC0yLDYgKzIsMTEgQEAgR05VIGdyZXAgTkVXUyAgICAgICAgICAgICAgICAgICAgICAg ICAgICAgICAgICAgIC0qLSBvdXRsaW5lIC0qLQoKICogTm90ZXdvcnRoeSBjaGFuZ2VzIGluIHJl bGVhc2UgPy4/ICg/Pz8/LT8/LT8/KSBbP10KCisqKiBJbXByb3ZlbWVudHMKKworICBncmVwIC1p IGluIGEgbXVsdGlieXRlIGxvY2FsZSBpcyBub3cgdHlwaWNhbGx5IDEwIHRpbWVzIGZhc3Rlcgor ICBmb3IgcGF0dGVybnMgdGhhdCBkbyBub3QgY29udGFpbiBcIG9yIFsuCisKCiAqIE5vdGV3b3J0 aHkgY2hhbmdlcyBpbiByZWxlYXNlIDIuMTYgKDIwMTQtMDEtMDEpIFtzdGFibGVdCgpkaWZmIC0t Z2l0IGEvc3JjL21haW4uYyBiL3NyYy9tYWluLmMKaW5kZXggNDQwOTBiZS4uYmZkMDk4MiAxMDA2 NDQKLS0tIGEvc3JjL21haW4uYworKysgYi9zcmMvbWFpbi5jCkBAIC0yNyw2ICsyNyw3IEBACiAj 
aW5jbHVkZSA8ZmNudGwuaD4KICNpbmNsdWRlIDxpbnR0eXBlcy5oPgogI2luY2x1ZGUgPHN0ZGlv Lmg+CisjaW5jbHVkZSA8YXNzZXJ0Lmg+CiAjaW5jbHVkZSAic3lzdGVtLmgiCgogI2luY2x1ZGUg ImFyZ21hdGNoLmgiCkBAIC0xNjQ0LDEzICsxNjQ1LDE0IEBAIGlmIGFueSBlcnJvciBvY2N1cnMg YW5kIC1xIGlzIG5vdCBnaXZlbiwgdGhlIGV4aXQgc3RhdHVzIGlzIDIuXG4iKSk7CiAgIGV4aXQg KHN0YXR1cyk7CiB9Cgorc3RhdGljIGNoYXIgY29uc3QgKm1hdGNoZXI7CisKIC8qIElmIE0gaXMg TlVMTCwgaW5pdGlhbGl6ZSB0aGUgbWF0Y2hlciB0byB0aGUgZGVmYXVsdC4gIE90aGVyd2lzZSBz ZXQgdGhlCiAgICBtYXRjaGVyIHRvIE0gaWYgYXZhaWxhYmxlLiAgRXhpdCBpbiBjYXNlIG9mIGNv bmZsaWN0cyBvciBpZiBNIGlzIG5vdAogICAgYXZhaWxhYmxlLiAgKi8KIHN0YXRpYyB2b2lkCiBz ZXRtYXRjaGVyIChjaGFyIGNvbnN0ICptKQogewotICBzdGF0aWMgY2hhciBjb25zdCAqbWF0Y2hl cjsKICAgdW5zaWduZWQgaW50IGk7CgogICBpZiAoIW0pCkBAIC0xODY1LDYgKzE4NjcsODQgQEAg cGFyc2VfZ3JlcF9jb2xvcnMgKHZvaWQpCiAgICAgICByZXR1cm47CiB9CgorI2RlZmluZSBNQlJU T1dDKHB3YywgcywgbiwgcHMpIFwKKyAgKE1CX0NVUl9NQVggPT0gMSA/IFwKKyAgICgqKHB3Yykg PSBidG93YyAoKih1bnNpZ25lZCBjaGFyICopIChzKSksIDEpIDogXAorICAgbWJydG93YyAoKHB3 YyksIChzKSwgKG4pLCAocHMpKSkKKworI2RlZmluZSBXQ1JUT01CKHMsIHdjLCBwcykgXAorICAo TUJfQ1VSX01BWCA9PSAxID8gXAorICAgKCoocykgPSB3Y3RvYiAoKHdpbnRfdCkgKHdjKSksIDEp IDogXAorICAgd2NydG9tYiAoKHMpLCAod2MpLCAocHMpKSkKKworLyogSWYgdGhlIG5ld2xpbmUt c2VwYXJhdGVkIHJlZ3VsYXIgZXhwcmVzc2lvbnMsIEtFWVMgKHdpdGggbGVuZ3RoLCBMRU4KKyAg IGFuZCBubyB0cmFpbGluZyBOVUwgYnl0ZSksIGFyZSBhbWVuYWJsZSB0byB0cmFuc2Zvcm1hdGlv biBpbnRvCisgICBvdGhlcndpc2UgZXF1aXZhbGVudCBjYXNlLWlnbm9yaW5nIG9uZXMsIHBlcmZv cm0gdGhlIHRyYW5zZm9ybWF0aW9uLAorICAgcHV0IHRoZSByZXN1bHQgaW50byBtYWxsb2MnZCBt ZW1vcnksICpORVdfS0VZUyB3aXRoIGxlbmd0aCAqTkVXX0xFTiwKKyAgIGFuZCByZXR1cm4gdHJ1 ZS4gIE90aGVyd2lzZSwgcmV0dXJuIGZhbHNlLiAgKi8KK3N0YXRpYyBib29sCit0cml2aWFsX2Nh c2VfaWdub3JlIChzaXplX3QgbGVuLCBjaGFyIGNvbnN0ICprZXlzLAorICAgICAgICAgICAgICAg ICAgICAgc2l6ZV90ICpuZXdfbGVuLCBjaGFyICoqbmV3X2tleXMpCit7CisgIC8qIEZJWE1FOiBj b25zaWRlciByZW1vdmluZyB0aGUgZm9sbG93aW5nIHJlc3RyaWN0aW9uOgorICAgICBSZWplY3Qg 
aWYgS0VZUyBjb250YWluIEFTQ0lJICdcXCcgb3IgJ1snLiAgKi8KKyAgaWYgKG1lbWNociAoa2V5 cywgJ1xcJywgbGVuKSB8fCBtZW1jaHIgKGtleXMsICdbJywgbGVuKSkKKyAgICByZXR1cm4gZmFs c2U7CisKKyAgLyogV29yc3QgY2FzZSBpcyB0aGF0IGVhY2ggYnl0ZSBCIG9mIEtFWVMgaXMgQVND SUkgYWxwaGFiZXRpYyBhbmQgZWFjaAorICAgICBvdGhlcl9jYXNlKEIpIGNoYXJhY3RlciwgQywg b2NjdXBpZXMgTUJfQ1VSX01BWCBieXRlcywgc28gZWFjaCBCCisgICAgIG1hcHMgdG8gW0JDXSwg d2hpY2ggcmVxdWlyZXMgTUJfQ1VSX01BWCArIDMgYnl0ZXMuICAgKi8KKyAgKm5ld19rZXlzID0g eG5tYWxsb2MgKE1CX0NVUl9NQVggKyAzLCBsZW4gKyAxKTsKKyAgY2hhciAqcCA9ICpuZXdfa2V5 czsKKworICBtYnN0YXRlX3QgbWJfc3RhdGU7CisgIG1lbXNldCAoJm1iX3N0YXRlLCAwLCBzaXpl b2YgbWJfc3RhdGUpOworICB3aGlsZSAobGVuKQorICAgIHsKKyAgICAgIHdjaGFyX3Qgd2M7Cisg ICAgICBpbnQgbiA9IE1CUlRPV0MgKCZ3Yywga2V5cywgbGVuLCAmbWJfc3RhdGUpOworCisgICAg ICAvKiBGb3IgYW4gaW52YWxpZCwgaW5jb21wbGV0ZSBvciBMJ1wwJywgc2tpcCB0aGlzIG9wdGlt aXphdGlvbi4gICovCisgICAgICBpZiAobiA8PSAwKQorICAgICAgICB7CisgICAgICAgIHNraXBf Y2FzZV9pZ25vcmVfb3B0aW1pemF0aW9uOgorICAgICAgICAgIGZyZWUgKCpuZXdfa2V5cyk7Cisg ICAgICAgICAgcmV0dXJuIGZhbHNlOworICAgICAgICB9CisKKyAgICAgIGNoYXIgY29uc3QgKm9y aWcgPSBrZXlzOworICAgICAga2V5cyArPSBuOworICAgICAgbGVuIC09IG47CisKKyAgICAgIGlm ICghaXN3YWxwaGEgKHdjKSkKKyAgICAgICAgeworICAgICAgICAgIG1lbWNweSAocCwgb3JpZywg bik7CisgICAgICAgICAgcCArPSBuOworICAgICAgICB9CisgICAgICBlbHNlCisgICAgICAgIHsK KyAgICAgICAgICAqcCsrID0gJ1snOworICAgICAgICAgIG1lbWNweSAocCwgb3JpZywgbik7Cisg ICAgICAgICAgcCArPSBuOworCisgICAgICAgICAgd2NoYXJfdCB3YzIgPSBpc3d1cHBlciAod2Mp ID8gdG93bG93ZXIgKHdjKSA6IHRvd3VwcGVyICh3Yyk7CisgICAgICAgICAgY2hhciBidWZbTUJf Q1VSX01BWF07CisgICAgICAgICAgaW50IG4yID0gV0NSVE9NQiAoYnVmLCB3YzIsICZtYl9zdGF0 ZSk7CisgICAgICAgICAgaWYgKG4yIDw9IDApCisgICAgICAgICAgICBnb3RvIHNraXBfY2FzZV9p Z25vcmVfb3B0aW1pemF0aW9uOworICAgICAgICAgIGFzc2VydCAobjIgPD0gTUJfQ1VSX01BWCk7 CisgICAgICAgICAgbWVtY3B5IChwLCBidWYsIG4yKTsKKyAgICAgICAgICBwICs9IG4yOworCisg ICAgICAgICAgKnArKyA9ICddJzsKKyAgICAgICAgfQorICAgIH0KKworICAqbmV3X2xlbiA9IHAg 
LSAqbmV3X2tleXM7CisKKyAgcmV0dXJuIHRydWU7Cit9CisKIGludAogbWFpbiAoaW50IGFyZ2Ms IGNoYXIgKiphcmd2KQogewpAQCAtMjI2Myw2ICsyMzQzLDM1IEBAIG1haW4gKGludCBhcmdjLCBj aGFyICoqYXJndikKICAgZWxzZQogICAgIHVzYWdlIChFWElUX1RST1VCTEUpOwoKKyAgLyogQXMg Y3VycmVudGx5IGltcGxlbWVudGVkLCBjYXNlLWluc2Vuc2l0aXZlIG1hdGNoaW5nIGlzIGV4cGVu c2l2ZSBpbgorICAgICBtdWx0aS1ieXRlIGxvY2FsZXMgYmVjYXVzZSBvZiBhIGZldyBvdXRsaWVy IGxvY2FsZXMgaW4gd2hpY2ggc29tZQorICAgICBjaGFyYWN0ZXJzIGNoYW5nZSBzaXplIHdoZW4g Y29udmVydGVkIHRvIHVwcGVyIG9yIGxvd2VyIGNhc2UuICBUbworICAgICBhY2NvbW1vZGF0ZSB0 aG9zZSwgd2UgcmV2ZXJ0IHRvIHNlYXJjaGluZyB0aGUgaW5wdXQgb25lIGxpbmUgYXQgYQorICAg ICB0aW1lLCByYXRoZXIgdGhhbiB1c2luZyB0aGUgbXVjaCBtb3JlIGVmZmljaWVudCBidWZmZXIg c2VhcmNoLgorICAgICBIb3dldmVyLCBpZiB3ZSBoYXZlIGEgcmVndWxhciBleHByZXNzaW9uLCAv Zm9vL2ksIHdlIGNhbiBjb252ZXJ0CisgICAgIGl0IHRvIGFuIGVxdWl2YWxlbnQgY2FzZS1pbnNl bnNpdGl2ZSAvW2ZGXVtvT11bb09dLywgYW5kIHRodXMKKyAgICAgYXZvaWQgdGhlIGV4cGVuc2l2 ZSByZWFkLWFuZC1wcm9jZXNzLWEtbGluZS1hdC1hLXRpbWUgcmVxdWlyZW1lbnQuCisgICAgIE9w dGltaXplLWF3YXkgdGhlICItaSIgb3B0aW9uLCB3aGVuIHBvc3NpYmxlLCBjb252ZXJ0aW5nIGVh Y2gKKyAgICAgY2FuZGlkYXRlIGFscGhhLCBDLCBpbiB0aGUgcmVnZXhwIHRvIFtDY10uICAqLwor ICBpZiAobWF0Y2hfaWNhc2UpCisgICAgeworICAgICAgc2l6ZV90IG5ld19rZXljYzsKKyAgICAg IGNoYXIgKm5ld19rZXlzOworICAgICAgLyogSXQgaXMgbm90IHBvc3NpYmxlIHdpdGggLUYsIG5v dCB1c2VmdWwgd2l0aCAtUCAocGNyZSkgYW5kIHRoZXJlIGlzIG5vCisgICAgICAgICBwb2ludCB3 aGVuIHRoZXJlIGlzIG5vIHJlZ2V4cC4gIEl0IGFsc28gZGVwZW5kcyBvbiB3aGljaCBjb25zdHJ1 Y3RzCisgICAgICAgICBhcHBlYXIgaW4gdGhlIHJlZ2V4cC4gIFNlZSB0cml2aWFsX2Nhc2VfaWdu b3JlIGZvciB0aG9zZSBkZXRhaWxzLiAgKi8KKyAgICAgIGlmIChrZXljYworICAgICAgICAgICYm ICEgKG1hdGNoZXIKKyAgICAgICAgICAgICAgICAmJiAoU1RSRVEgKG1hdGNoZXIsICJmZ3JlcCIp IHx8IFNUUkVRIChtYXRjaGVyLCAicGNyZSIpKSkKKyAgICAgICAgICAmJiB0cml2aWFsX2Nhc2Vf aWdub3JlIChrZXljYywga2V5cywgJm5ld19rZXljYywgJm5ld19rZXlzKSkKKyAgICAgICAgewor ICAgICAgICAgIG1hdGNoX2ljYXNlID0gMDsKKyAgICAgICAgICBmcmVlIChrZXlzKTsKKyAgICAg 
ICAgICBrZXlzID0gbmV3X2tleXM7CisgICAgICAgICAga2V5Y2MgPSBuZXdfa2V5Y2M7CisgICAg ICAgIH0KKyAgICB9CisKICAgY29tcGlsZSAoa2V5cywga2V5Y2MpOwogICBmcmVlIChrZXlzKTsK CmRpZmYgLS1naXQgYS90ZXN0cy9NYWtlZmlsZS5hbSBiL3Rlc3RzL01ha2VmaWxlLmFtCmluZGV4 IGY0NTgwYjUuLmEzN2E4MTQgMTAwNjQ0Ci0tLSBhL3Rlc3RzL01ha2VmaWxlLmFtCisrKyBiL3Rl c3RzL01ha2VmaWxlLmFtCkBAIC05NCw2ICs5NCw3IEBAIFRFU1RTID0JCQkJCQlcCiAgIHN0YXR1 cwkJCQkJXAogICBzdXJyb2dhdGUtcGFpcgkJCQlcCiAgIHN5bWxpbmsJCQkJCVwKKyAgdHVya2lz aC1leWVzCQkJCQlcCiAgIHR1cmtpc2gtSQkJCQkJXAogICB0dXJraXNoLUktd2l0aG91dC1kb3QJ CQkJXAogICB3YXJuLWNoYXItY2xhc3NlcwkJCQlcCmRpZmYgLS1naXQgYS90ZXN0cy90dXJraXNo LWV5ZXMgYi90ZXN0cy90dXJraXNoLWV5ZXMKbmV3IGZpbGUgbW9kZSAxMDA3NTUKaW5kZXggMDAw MDAwMC4uMzIzZWIzNQotLS0gL2Rldi9udWxsCisrKyBiL3Rlc3RzL3R1cmtpc2gtZXllcwpAQCAt MCwwICsxLDQ0IEBACisjIS9iaW4vc2gKKyMgRW5zdXJlIHRoYXQgY2FzZS1pbnNlbnNpdGl2ZSBt YXRjaGluZyB3b3JrcyB3aXRoIGFsbCBUdXJraXNoIGkncworCisjIENvcHlyaWdodCAoQykgMjAx NCBGcmVlIFNvZnR3YXJlIEZvdW5kYXRpb24sIEluYy4KKworIyBUaGlzIHByb2dyYW0gaXMgZnJl ZSBzb2Z0d2FyZTogeW91IGNhbiByZWRpc3RyaWJ1dGUgaXQgYW5kL29yIG1vZGlmeQorIyBpdCB1 bmRlciB0aGUgdGVybXMgb2YgdGhlIEdOVSBHZW5lcmFsIFB1YmxpYyBMaWNlbnNlIGFzIHB1Ymxp c2hlZCBieQorIyB0aGUgRnJlZSBTb2Z0d2FyZSBGb3VuZGF0aW9uLCBlaXRoZXIgdmVyc2lvbiAz IG9mIHRoZSBMaWNlbnNlLCBvcgorIyAoYXQgeW91ciBvcHRpb24pIGFueSBsYXRlciB2ZXJzaW9u LgorCisjIFRoaXMgcHJvZ3JhbSBpcyBkaXN0cmlidXRlZCBpbiB0aGUgaG9wZSB0aGF0IGl0IHdp bGwgYmUgdXNlZnVsLAorIyBidXQgV0lUSE9VVCBBTlkgV0FSUkFOVFk7IHdpdGhvdXQgZXZlbiB0 aGUgaW1wbGllZCB3YXJyYW50eSBvZgorIyBNRVJDSEFOVEFCSUxJVFkgb3IgRklUTkVTUyBGT1Ig QSBQQVJUSUNVTEFSIFBVUlBPU0UuICBTZWUgdGhlCisjIEdOVSBHZW5lcmFsIFB1YmxpYyBMaWNl bnNlIGZvciBtb3JlIGRldGFpbHMuCisKKyMgWW91IHNob3VsZCBoYXZlIHJlY2VpdmVkIGEgY29w eSBvZiB0aGUgR05VIEdlbmVyYWwgUHVibGljIExpY2Vuc2UKKyMgYWxvbmcgd2l0aCB0aGlzIHBy b2dyYW0uICBJZiBub3QsIHNlZSA8aHR0cDovL3d3dy5nbnUub3JnL2xpY2Vuc2VzLz4uCisKKy4g IiR7c3JjZGlyPS59L2luaXQuc2giOyBwYXRoX3ByZXBlbmRfIC4uL3NyYworCityZXF1aXJlX2Nv 
bXBpbGVkX2luX01CX3N1cHBvcnQKKworZmFpbD0wCisKK0w9dHJfVFIuVVRGLTgKKworIyBDaGVj ayBmb3IgYSBicm9rZW4gdHJfVFIuVVRGLTggbG9jYWxlIGRlZmluaXRpb24uCisjIEluIHRoaXMg bG9jYWxlLCAnaScgaXMgbm90IGEgbG93ZXItY2FzZSAnSScuCitlY2hvIEkgfCBMQ19BTEw9JEwg Z3JlcCAtaSBpID4gL2Rldi9udWxsIFwKKyAgICAmJiBza2lwXyAieW91ciAkTCBsb2NhbGUgYXBw ZWFycyB0byBiZSBicm9rZW4iCisKKyMgRW5zdXJlIHRoYXQgdGhpcyBtYXRjaGVzOgorIyBwcmlu dGYgJ0k6xLAgxLE6aVxuJ3xMQ19BTEw9dHJfVFIudXRmOCBncmVwIC1pICfEsTppIEk6xLAnCitJ PSQocHJpbnRmICdcMzA0XDI2MCcpICMgY2FwaXRhbCBJIHdpdGggZG90CitpPSQocHJpbnRmICdc MzA0XDI2MScpICMgbG93ZXJjYXNlIGRvdGxlc3MgaQorCitkYXRhPSQoICAgICAgcHJpbnRmICJJ OiRJICRpOmkiKQorc2VhcmNoX3N0cj0kKHByaW50ZiAiJGk6aSBJOiRJIikKK3ByaW50ZiAiJGRh dGFcbiIgPiBpbiB8fCBmcmFtZXdvcmtfZmFpbHVyZV8KKworTENfQUxMPSRMIGdyZXAgLWkgIl4k c2VhcmNoX3N0clwkIiBpbiA+IG91dCB8fCBmYWlsPTEKK2NvbXBhcmUgb3V0IGluIHx8IGZhaWw9 MQorCitFeGl0ICRmYWlsCi0tIAoxLjguNS4yLjIyOS5nNDQ0ODQ2NgoK --047d7b6d80b4a2694d04ef96df7a-- From debbugs-submit-bounces@debbugs.gnu.org Fri Jan 10 20:49:35 2014 Received: (at 16232) by debbugs.gnu.org; 11 Jan 2014 01:49:35 +0000 Received: from localhost ([127.0.0.1]:45785 helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.80) (envelope-from ) id 1W1nhe-0005wY-Ed for submit@debbugs.gnu.org; Fri, 10 Jan 2014 20:49:35 -0500 Received: from mail5.vodafone.ie ([213.233.128.176]:59855) by debbugs.gnu.org with esmtp (Exim 4.80) (envelope-from ) id 1W1nhc-0005wO-O1 for 16232@debbugs.gnu.org; Fri, 10 Jan 2014 20:49:33 -0500 X-IronPort-Anti-Spam-Filtered: true X-IronPort-Anti-Spam-Result: ApQBAIii0FJtThai/2dsb2JhbAANTINDg1S2YIEngxoBAQQjDwFWGA0CBRYLAgIJAwIBAgFFBgEMCAEBiAUIpyl2mnYXgSmQVIFIBJ8DjRiBPg Received: from unknown (HELO [192.168.1.79]) ([109.78.22.162]) by mail3.vodafone.ie with ESMTP; 11 Jan 2014 01:49:31 +0000 Message-ID: <52D0A32A.8050604@draigBrady.com> Date: Sat, 11 Jan 2014 01:49:30 +0000 From: =?UTF-8?B?UMOhZHJhaWcgQnJhZHk=?= User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:17.0) Gecko/20130110 Thunderbird/17.0.2 MIME-Version: 
1.0
To: 16232@debbugs.gnu.org, Jim Meyering
Subject: bug#16232: [PATCH] grep: make --ignore-case (-i) faster (sometimes 10x) in multibyte locales
References: CA+8g5KGyCLeOOK7HMz5iCWy4Pfu+vZ-BuDMKT=LnJ6ZzuDdTwQ@mail.gmail.com
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Cool so it does this transformation:

  sed 's/./[\L&\U&]/g'

Though multi byte case handling has all sorts of edge cases (pardon the pun),
and it may not be always valid to treat each character independently?
For example see some of the tests in:
http://git.sv.gnu.org/gitweb/?p=gnulib.git;a=blob;f=tests/unicase/test-ulc-casecmp.c;hb=HEAD

I wonder might this faster path be restricted to a safer but very common input subset of:

  (MB_CUR_MAX == 1 || (in_utf8 && *c < 0x80))

Also are the following printfs in the test redundant?

> +data=$( printf "I:$I $i:i")
> +search_str=$(printf "$i:i I:$I")

nice improvement!

Pádraig.
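
A minimal C sketch of the restricted fast path suggested above, for illustration only. It is not code from the patch: the function name ascii_case_ignore is hypothetical, it accepts only 7-bit ASCII patterns, and it assumes that pairing each letter with its ASCII other-case form is valid. Per the earlier discussion in this thread, that assumption does not hold in the tr_TR locale, so a real implementation would need a further locale check. As in trivial_case_ignore, patterns containing '\' or '[' are rejected.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper, not part of the patch: expand each ASCII letter
   C in KEYS (length LEN, no trailing NUL required) to the bracket pair
   [Cc], copying all other bytes unchanged.  Return false, leaving the
   outputs untouched, if KEYS contains '\\', '[', a NUL byte, or any
   byte >= 0x80, or if allocation fails.  Assumes an ASCII execution
   character set and ignores locale-specific case rules.  */
static bool
ascii_case_ignore (size_t len, char const *keys,
                   size_t *new_len, char **new_keys)
{
  assert (keys && new_len && new_keys);

  /* First pass: reject anything outside the safe subset.  */
  for (size_t i = 0; i < len; i++)
    {
      unsigned char c = keys[i];
      if (c == '\\' || c == '[' || c == '\0' || c >= 0x80)
        return false;
    }

  /* Worst case: every byte is a letter and becomes 4 bytes, "[cC]".  */
  char *buf = malloc (4 * len + 1);
  if (!buf)
    return false;

  /* Second pass: emit the transformed pattern.  */
  char *p = buf;
  for (size_t i = 0; i < len; i++)
    {
      char c = keys[i];
      bool lower = 'a' <= c && c <= 'z';
      bool upper = 'A' <= c && c <= 'Z';
      if (lower || upper)
        {
          *p++ = '[';
          *p++ = c;
          *p++ = lower ? c - 'a' + 'A' : c - 'A' + 'a';
          *p++ = ']';
        }
      else
        *p++ = c;
    }

  *new_len = p - buf;
  *new_keys = buf;
  return true;
}
```

For the six-byte pattern foobar this produces the 24-byte pattern [fF][oO][oO][bB][aA][rR]; the caller owns the malloc'd result and must free it.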
From debbugs-submit-bounces@debbugs.gnu.org Fri Jan 10 23:52:25 2014
MIME-Version: 1.0
In-Reply-To: <52D0A32A.8050604@draigBrady.com>
References: <52D0A32A.8050604@draigBrady.com>
From: Jim Meyering
Date: Fri, 10 Jan 2014 20:52:00 -0800
Subject: Re: bug#16232: [PATCH] grep: make --ignore-case (-i) faster (sometimes 10x) in multibyte locales
To: Pádraig Brady
Content-Type: multipart/mixed; boundary=e89a8fb208d49b809604efaa9b3f
Cc: 16232 <16232@debbugs.gnu.org>

--e89a8fb208d49b809604efaa9b3f
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

On Fri, Jan 10, 2014 at 5:49 PM, Pádraig Brady wrote:
> Cool so it does this transformation:
>
>   sed 's/./[\L&\U&]/g'
>
> Though multi byte case handling has all sorts of edge cases (pardon the pun),
> and it may not be always valid to treat each character independently?
> For example see some of the tests in:
> http://git.sv.gnu.org/gitweb/?p=gnulib.git;a=blob;f=tests/unicase/test-ulc-casecmp.c;hb=HEAD

It seems you're right.
Since it's a many-to-one mapping in some cases, simply using one
lower case character and one upper case version won't cover all
possibilities.

> I wonder might this faster path be restricted to a safer but very common input subset of:
>
>   (MB_CUR_MAX == 1 || (in_utf8 && *c < 0x80))

That sounds like a good approach.
Now I need another test case, to demonstrate that the current code can
cause trouble.

> Also are the following printfs in the test redundant?
>
>> +data=$( printf "I:$I $i:i")
>> +search_str=$(printf "$i:i I:$I")

Good catch.
Those were vestiges of pre-factoring code, where they were needed.
Here's the patch to fix that part, in your name: --e89a8fb208d49b809604efaa9b3f Content-Type: text/plain; charset=US-ASCII; name="k.txt" Content-Disposition: attachment; filename="k.txt" Content-Transfer-Encoding: base64 X-Attachment-Id: f_hqae8wfq0 RnJvbSA5N2QzNDMwYzc1YTlkZDgyZDg3MWVjYTE3MGIxM2MxZjhkODk1ZmFkIE1vbiBTZXAgMTcg MDA6MDA6MDAgMjAwMQpGcm9tOiA9P1VURi04P3E/UD1DMz1BMWRyYWlnPTIwQnJhZHk/PSA8UEBk cmFpZ0JyYWR5LmNvbT4KRGF0ZTogRnJpLCAxMCBKYW4gMjAxNCAyMDo0Mjo1MyAtMDgwMApTdWJq ZWN0OiBbUEFUQ0hdIHRlc3RzOiByZW1vdmUgc3VwZXJmbHVvdXMgdXNlcyBvZiBwcmludGYKCiog dGVzdHMvdHVya2lzaC1leWVzOiBSZW1vdmUgdW5uZWNlc3NhcnkgdXNlcyBvZiBwcmludGYuCi0t LQogdGVzdHMvdHVya2lzaC1leWVzIHwgNCArKy0tCiAxIGZpbGUgY2hhbmdlZCwgMiBpbnNlcnRp b25zKCspLCAyIGRlbGV0aW9ucygtKQoKZGlmZiAtLWdpdCBhL3Rlc3RzL3R1cmtpc2gtZXllcyBi L3Rlc3RzL3R1cmtpc2gtZXllcwppbmRleCAzMjNlYjM1Li42ODMwMWU3IDEwMDc1NQotLS0gYS90 ZXN0cy90dXJraXNoLWV5ZXMKKysrIGIvdGVzdHMvdHVya2lzaC1leWVzCkBAIC0zNCw4ICszNCw4 IEBAIGVjaG8gSSB8IExDX0FMTD0kTCBncmVwIC1pIGkgPiAvZGV2L251bGwgXAogST0kKHByaW50 ZiAnXDMwNFwyNjAnKSAjIGNhcGl0YWwgSSB3aXRoIGRvdAogaT0kKHByaW50ZiAnXDMwNFwyNjEn KSAjIGxvd2VyY2FzZSBkb3RsZXNzIGkKCi1kYXRhPSQoICAgICAgcHJpbnRmICJJOiRJICRpOmki KQotc2VhcmNoX3N0cj0kKHByaW50ZiAiJGk6aSBJOiRJIikKKyAgICAgIGRhdGE9Ikk6JEkgJGk6 aSIKK3NlYXJjaF9zdHI9IiRpOmkgSTokSSIKIHByaW50ZiAiJGRhdGFcbiIgPiBpbiB8fCBmcmFt ZXdvcmtfZmFpbHVyZV8KCiBMQ19BTEw9JEwgZ3JlcCAtaSAiXiRzZWFyY2hfc3RyXCQiIGluID4g b3V0IHx8IGZhaWw9MQotLSAKMS44LjUuMi4yMjkuZzQ0NDg0NjYKCg== --e89a8fb208d49b809604efaa9b3f-- From debbugs-submit-bounces@debbugs.gnu.org Sat Jan 11 00:40:50 2014 Received: (at 16232) by debbugs.gnu.org; 11 Jan 2014 05:40:50 +0000 Received: from localhost ([127.0.0.1]:45967 helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.80) (envelope-from ) id 1W1rJR-0004I5-Nn for submit@debbugs.gnu.org; Sat, 11 Jan 2014 00:40:50 -0500 Received: from mail-pd0-f177.google.com ([209.85.192.177]:34981) by debbugs.gnu.org with esmtp (Exim 4.80) (envelope-from ) id 1W1rJN-0004Hr-RG for 
16232@debbugs.gnu.org; Sat, 11 Jan 2014 00:40:46 -0500 Received: by mail-pd0-f177.google.com with SMTP id q10so5368943pdj.22 for <16232@debbugs.gnu.org>; Fri, 10 Jan 2014 21:40:45 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113; h=mime-version:sender:in-reply-to:references:from:date:message-id :subject:to:cc:content-type; bh=BYOF/f6lD66T7DroVtDeg94RSVW6redcW8pgW1h54ho=; b=Mbb8LP9OvWe4SHbD7xNiANxjNa15Otunx62Y5ZBb7gutFPKPuBYd+trRNnshcBd8KR C+JWWbgsXXP8FFXlCbVzHthwLFJ48oPfme87aJfcKBXjJ0Vjz3eeQIeJ9iUgbRWQ2K7q gOYOlN/OVmzFM8d/9rnbH1HtobVwitImLCJRq7LuX19mcyGF/1Znrsv3JpRp3ZiVuWx4 JX5KGT8Ia89/gjwc6IHaExVSfaivQRaTicoKsYwl/pf/XV0t6GD86Fz42uDGlopU2Pg9 ZoIzEWJEdlXyJ9B2yinXH3GaVhMqFrG+kX/E/0YDScOdtsZ05yaZQp1MSCxMgtdvICRh mGzQ== X-Received: by 10.66.240.4 with SMTP id vw4mr16476613pac.26.1389418844990; Fri, 10 Jan 2014 21:40:44 -0800 (PST) MIME-Version: 1.0 Received: by 10.68.157.202 with HTTP; Fri, 10 Jan 2014 21:40:24 -0800 (PST) In-Reply-To: References: <52D0A32A.8050604@draigBrady.com> From: Jim Meyering Date: Fri, 10 Jan 2014 21:40:24 -0800 X-Google-Sender-Auth: ye-Aay0E044qHYeT8axmED55qF8 Message-ID: Subject: Re: bug#16232: [PATCH] grep: make --ignore-case (-i) faster (sometimes 10x) in multibyte locales To: =?ISO-8859-1?Q?P=E1draig_Brady?= Content-Type: text/plain; charset=ISO-8859-1 X-Spam-Score: 0.0 (/) X-Debbugs-Envelope-To: 16232 Cc: 16232 <16232@debbugs.gnu.org> X-BeenThere: debbugs-submit@debbugs.gnu.org X-Mailman-Version: 2.1.15 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: debbugs-submit-bounces@debbugs.gnu.org Sender: "Debbugs-submit" X-Spam-Score: 0.0 (/) On Fri, Jan 10, 2014 at 8:52 PM, Jim Meyering wrote: >> I wonder might this faster path be restricted to a safer but very common input subset of: >> >> (MB_CUR_MAX == 1 || (in_utf8 && *c < 0x80)) > > That sounds like a good approach. 
> Now I need another test case, to demonstrate that the current code can > cause trouble. Hmm... after thinking about this for a while and actually trying to break the current code (did not find a way to demonstrate a regression), I have concluded that the current approach is no worse than the prior one of matching a case-mapped regexp vs. each case-mapped input line. That's not to say that it's perfect, of course. The "LATIN SMALL LETTER J WITH CARON, COMBINING DOT BELOW" example from gnulib's test-ulc-casecmp.c is a great example: this matches: printf '\x6A\xCC\x8C\xCC\xA3\n'|src/grep -i "$(printf '\x6A\xCC\x8C\xCC\xA3')" but this does not, yet probably should: printf '\xC7\xB0\xCC\xA3\n'|src/grep -i "$(printf '\x6A\xCC\x8C\xCC\xA3')" Can you see a way to demonstrate a regression? Thanks again, Jim From debbugs-submit-bounces@debbugs.gnu.org Sat Jan 11 06:33:53 2014 Received: (at 16232) by debbugs.gnu.org; 11 Jan 2014 11:33:53 +0000 Received: from localhost ([127.0.0.1]:46050 helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.80) (envelope-from ) id 1W1wp7-0005pr-AT for submit@debbugs.gnu.org; Sat, 11 Jan 2014 06:33:53 -0500 Received: from mail2.vodafone.ie ([213.233.128.44]:22810) by debbugs.gnu.org with esmtp (Exim 4.80) (envelope-from ) id 1W1wp4-0005pf-BY for 16232@debbugs.gnu.org; Sat, 11 Jan 2014 06:33:51 -0500 X-IronPort-Anti-Spam-Filtered: true X-IronPort-Anti-Spam-Result: ApMBAO0q0VJtT1jC/2dsb2JhbAANTL4BgRuDGQEBAQQyAUYQCw0LCRYPCQMCAQIBRQYNAQUCAQGIBadfm1wXjwcHhDcBA58DjRiBPg Received: from unknown (HELO [192.168.1.79]) ([109.79.88.194]) by mail2.vodafone.ie with ESMTP; 11 Jan 2014 11:33:48 +0000 Message-ID: <52D12C1B.4020106@draigBrady.com> Date: Sat, 11 Jan 2014 11:33:47 +0000 From: =?ISO-8859-1?Q?P=E1draig_Brady?= User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:17.0) Gecko/20130110 Thunderbird/17.0.2 MIME-Version: 1.0 To: Jim Meyering Subject: Re: bug#16232: [PATCH] grep: make --ignore-case (-i) faster (sometimes 10x) in multibyte locales 
References: <52D0A32A.8050604@draigBrady.com> In-Reply-To: X-Enigmail-Version: 1.6 Content-Type: text/plain; charset=ISO-8859-1 Content-Transfer-Encoding: 8bit X-Spam-Score: 0.0 (/) X-Debbugs-Envelope-To: 16232 Cc: 16232 <16232@debbugs.gnu.org> X-BeenThere: debbugs-submit@debbugs.gnu.org X-Mailman-Version: 2.1.15 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: debbugs-submit-bounces@debbugs.gnu.org Sender: "Debbugs-submit" X-Spam-Score: 0.0 (/) On 01/11/2014 05:40 AM, Jim Meyering wrote: > On Fri, Jan 10, 2014 at 8:52 PM, Jim Meyering wrote: >>> I wonder might this faster path be restricted to a safer but very common input subset of: >>> >>> (MB_CUR_MAX == 1 || (in_utf8 && *c < 0x80)) >> >> That sounds like a good approach. >> Now I need another test case, to demonstrate that the current code can >> cause trouble. > > Hmm... after thinking about this for a while and actually trying to > break the current code (did not find a way to demonstrate a regression), > I have concluded that the current approach is no worse than the prior > one of matching a case-mapped regexp vs. each case-mapped input line. > > That's not to say that it's perfect, of course. > The "LATIN SMALL LETTER J WITH CARON, COMBINING DOT BELOW" example > from gnulib's test-ulc-casecmp.c is a great example: this matches: > > printf '\x6A\xCC\x8C\xCC\xA3\n'|src/grep -i "$(printf > '\x6A\xCC\x8C\xCC\xA3')" > > but this does not, yet probably should: > > printf '\xC7\xB0\xCC\xA3\n'|src/grep -i "$(printf '\x6A\xCC\x8C\xCC\xA3')" > > Can you see a way to demonstrate a regression? Oh right, it doesn't handle these cases already. Fair enough I don't see a regression then. +1 Pádraig. 
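A minimal shell sketch of the per-character rewrite discussed in this exchange (sed 's/./[\L&\U&]/g') together with the proposed ASCII-only restriction. It is an illustration under stated assumptions, not grep's implementation: bracketize and is_ascii are hypothetical helper names, the rewrite relies on GNU sed's \L and \U escapes, and the byte test only approximates the "*c < 0x80" half of the proposed condition.

```shell
# Hypothetical helper (not a grep function): wrap each ASCII letter of a
# literal pattern in a two-element bracket expression, e.g. f -> [fF].
# Assumes GNU sed, whose \L and \U escapes change the case of "&".
bracketize() {
  printf '%s\n' "$1" | LC_ALL=C sed 's/[[:alpha:]]/[\L&\U&]/g'
}

# Hypothetical guard: succeed only if the pattern has no byte >= 0x80.
# Deleting every ASCII byte must leave nothing behind.
is_ascii() {
  [ -z "$(printf '%s' "$1" | LC_ALL=C tr -d '\000-\177')" ]
}

pattern='foo1'
if is_ascii "$pattern"; then
  expanded=$(bracketize "$pattern")
else
  expanded=$pattern   # leave non-ASCII patterns to the ordinary -i path
fi
printf '%s\n' "$expanded"

# The expanded pattern matches a mixed-case line without -i.
if printf 'FoO1\n' | LC_ALL=C grep -q "$expanded"; then
  matched=yes
else
  matched=no
fi
printf 'matched without -i: %s\n' "$matched"
```

With GNU sed the expanded pattern is [fF][oO][oO]1. A pattern such as the "j with caron" sequence from the gnulib test contains bytes above 0x7F, so the guard rejects it and the rewrite is skipped, which is the point of restricting the fast path to the common ASCII case.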
From debbugs-submit-bounces@debbugs.gnu.org Sat Jan 11 09:16:05 2014 Received: (at 16232) by debbugs.gnu.org; 11 Jan 2014 14:16:05 +0000 Received: from localhost ([127.0.0.1]:46357 helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.80) (envelope-from ) id 1W1zM4-0004Tq-Nw for submit@debbugs.gnu.org; Sat, 11 Jan 2014 09:16:05 -0500 Received: from mail2.vodafone.ie ([213.233.128.44]:60596) by debbugs.gnu.org with esmtp (Exim 4.80) (envelope-from ) id 1W1zM0-0004QR-8T for 16232@debbugs.gnu.org; Sat, 11 Jan 2014 09:16:01 -0500 X-IronPort-Anti-Spam-Filtered: true X-IronPort-Anti-Spam-Result: ApQBANZR0VJtT1jC/2dsb2JhbAANTINDg1S2aoEcgxkBAQEEIw8BRhAJAg0LAgIFFgsCAgkDAgECAUUGDQEFAgEBBYgACIxkmnt2mmEXgSmNXgeCb4FIBJ8DjRiBPg Received: from unknown (HELO [192.168.1.79]) ([109.79.88.194]) by mail2.vodafone.ie with ESMTP; 11 Jan 2014 14:15:58 +0000 Message-ID: <52D1521E.1030206@draigBrady.com> Date: Sat, 11 Jan 2014 14:15:58 +0000 From: =?UTF-8?B?UMOhZHJhaWcgQnJhZHk=?= User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:17.0) Gecko/20130110 Thunderbird/17.0.2 MIME-Version: 1.0 To: Jim Meyering Subject: Re: bug#16232: [PATCH] grep: make --ignore-case (-i) faster (sometimes 10x) in multibyte locales References: <52D0A32A.8050604@draigBrady.com> <52D12C1B.4020106@draigBrady.com> In-Reply-To: <52D12C1B.4020106@draigBrady.com> X-Enigmail-Version: 1.6 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit X-Spam-Score: 0.0 (/) X-Debbugs-Envelope-To: 16232 Cc: 16232 <16232@debbugs.gnu.org> X-BeenThere: debbugs-submit@debbugs.gnu.org X-Mailman-Version: 2.1.15 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: debbugs-submit-bounces@debbugs.gnu.org Sender: "Debbugs-submit" X-Spam-Score: 0.0 (/) On 01/11/2014 11:33 AM, Pádraig Brady wrote: > On 01/11/2014 05:40 AM, Jim Meyering wrote: >> On Fri, Jan 10, 2014 at 8:52 PM, Jim Meyering wrote: >>>> I wonder might this faster path be restricted to a safer but very 
common input subset of: >>>> >>>> (MB_CUR_MAX == 1 || (in_utf8 && *c < 0x80)) >>> >>> That sounds like a good approach. >>> Now I need another test case, to demonstrate that the current code can >>> cause trouble. >> >> Hmm... after thinking about this for a while and actually trying to >> break the current code (did not find a way to demonstrate a regression), >> I have concluded that the current approach is no worse than the prior >> one of matching a case-mapped regexp vs. each case-mapped input line. >> >> That's not to say that it's perfect, of course. >> The "LATIN SMALL LETTER J WITH CARON, COMBINING DOT BELOW" example >> from gnulib's test-ulc-casecmp.c is a great example: this matches: >> >> printf '\x6A\xCC\x8C\xCC\xA3\n'|src/grep -i "$(printf >> '\x6A\xCC\x8C\xCC\xA3')" >> >> but this does not, yet probably should: >> >> printf '\xC7\xB0\xCC\xA3\n'|src/grep -i "$(printf '\x6A\xCC\x8C\xCC\xA3')" >> >> Can you see a way to demonstrate a regression? > > Oh right, it doesn't handle these cases already. > Fair enough I don't see a regression then. This is also a good summary of stuff to consider with case: http://www.unicode.org/faq/casemap_charprop.html So picking another case situation from there: "in the Greek script, capital sigma (U+03A3) is the uppercase form of both the regular (U+03C2) and final (U+03C3) lowercase sigma." One can see that sed handles this: $ printf '\u03C2\u03C3\n' | sed 's/.*/&\U&/' ςσΣΣ $ printf '\u03A3\n' | sed 's/.*/&\L&/' Σσ Though I was surprised the grep (2.14) didn't match any combo of these $ printf '\u03C2\u03C3\n' | grep -Fi "$(printf \u03A3)" $ printf '\u03A3\n' | grep -Fi "$(printf \u03C2)" $ printf '\u03A3\n' | grep -Fi "$(printf \u03C3)" Not a regression of course. cheers, Pádraig. 
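The sigma case can be made concrete with a byte-level sketch. It assumes UTF-8 input and deliberately uses fixed-string matching in the C locale, so the result depends neither on an installed locale nor on grep's own -i logic; the variable and function names are made up for illustration. It shows why a set built from one character and a single opposite-case partner cannot cover a three-way equivalence. For reference, U+03C3 is the ordinary small sigma and U+03C2 is the final form.

```shell
# UTF-8 encodings, written as octal escapes so any POSIX printf works.
cap_sigma=$(printf '\316\243')     # U+03A3 GREEK CAPITAL LETTER SIGMA
small_sigma=$(printf '\317\203')   # U+03C3 GREEK SMALL LETTER SIGMA
final_sigma=$(printf '\317\202')   # U+03C2 GREEK SMALL LETTER FINAL SIGMA

# Illustrative stand-ins for a case-insensitive match on capital sigma:
# a pair (the character plus one partner) versus the full three-way set.
match_pair() {
  LC_ALL=C grep -F -q -e "$cap_sigma" -e "$small_sigma"
}
match_triple() {
  LC_ALL=C grep -F -q -e "$cap_sigma" -e "$small_sigma" -e "$final_sigma"
}

if printf '%s\n' "$final_sigma" | match_pair; then pair=match; else pair=miss; fi
if printf '%s\n' "$final_sigma" | match_triple; then triple=match; else triple=miss; fi

printf 'pair set vs final sigma:   %s\n' "$pair"
printf 'triple set vs final sigma: %s\n' "$triple"
```

The pair misses a line containing only the final sigma, while the three-element set matches it. Any scheme that maps each pattern character to exactly two alternatives has the same blind spot.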
From debbugs-submit-bounces@debbugs.gnu.org Sat Jan 11 12:57:19 2014 Received: (at 16232) by debbugs.gnu.org; 11 Jan 2014 17:57:19 +0000 Received: from localhost ([127.0.0.1]:47018 helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.80) (envelope-from ) id 1W22oA-0002Un-3H for submit@debbugs.gnu.org; Sat, 11 Jan 2014 12:57:18 -0500 Received: from mail-pb0-f41.google.com ([209.85.160.41]:62149) by debbugs.gnu.org with esmtp (Exim 4.80) (envelope-from ) id 1W22o7-0002Uc-E2 for 16232@debbugs.gnu.org; Sat, 11 Jan 2014 12:57:16 -0500 Received: by mail-pb0-f41.google.com with SMTP id jt11so5701256pbb.28 for <16232@debbugs.gnu.org>; Sat, 11 Jan 2014 09:57:14 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113; h=mime-version:sender:in-reply-to:references:from:date:message-id :subject:to:cc:content-type:content-transfer-encoding; bh=MWWEqegvwC4wnKc3IKhj6vOOUbFKjCMthXyc7Jd6Obw=; b=MYDWu2bc/1ZCuuW//6jDA3mbkUa89+yaHvssoApJaG+zlNlbxZOD3qQWQf7AaMPBru tbGFrRpuVU44j+67xaA6BYWKj1VhkNSDBG0zHMoUrJ7qecBA9UC05jrtkaCoVbvx/zgz wRwrqG346zJv3MWcZgZ1ZD2vFf4NZDSjMrqq6Bb50KmQO7h2CNvlhwex6yenHFSGx9tj 1BG38/WIgIMhvVeINrhLSzUpk7p9xX8rNMKBpu8NctW6+NaFb31DueUWG3R7CvZpBLsu dXwFhUyA7oco6wQLXAaHgGsJoCrh/LDtasdDDdkkTYwyaKJ4nb8qUN3LODplCbD9KCYf rY2g== X-Received: by 10.66.192.74 with SMTP id he10mr19720325pac.126.1389463034189; Sat, 11 Jan 2014 09:57:14 -0800 (PST) MIME-Version: 1.0 Received: by 10.68.157.202 with HTTP; Sat, 11 Jan 2014 09:56:53 -0800 (PST) In-Reply-To: <52D1521E.1030206@draigBrady.com> References: <52D0A32A.8050604@draigBrady.com> <52D12C1B.4020106@draigBrady.com> <52D1521E.1030206@draigBrady.com> From: Jim Meyering Date: Sat, 11 Jan 2014 09:56:53 -0800 X-Google-Sender-Auth: gyhKiyUnA5SLq6XrGySqSHhC1-8 Message-ID: Subject: Re: bug#16232: [PATCH] grep: make --ignore-case (-i) faster (sometimes 10x) in multibyte locales To: =?ISO-8859-1?Q?P=E1draig_Brady?= Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: quoted-printable 
X-Spam-Score: -0.7 (/) X-Debbugs-Envelope-To: 16232 Cc: 16232 <16232@debbugs.gnu.org> X-BeenThere: debbugs-submit@debbugs.gnu.org X-Mailman-Version: 2.1.15 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: debbugs-submit-bounces@debbugs.gnu.org Sender: "Debbugs-submit" X-Spam-Score: -0.7 (/) On Sat, Jan 11, 2014 at 6:15 AM, Pádraig Brady wrote: > On 01/11/2014 11:33 AM, Pádraig Brady wrote: >> On 01/11/2014 05:40 AM, Jim Meyering wrote: >>> On Fri, Jan 10, 2014 at 8:52 PM, Jim Meyering wrote: >>>>> I wonder might this faster path be restricted to a safer but very common input subset of: >>>>> >>>>> (MB_CUR_MAX == 1 || (in_utf8 && *c < 0x80)) >>>> >>>> That sounds like a good approach. >>>> Now I need another test case, to demonstrate that the current code can >>>> cause trouble. >>> >>> Hmm... after thinking about this for a while and actually trying to >>> break the current code (did not find a way to demonstrate a regression), >>> I have concluded that the current approach is no worse than the prior >>> one of matching a case-mapped regexp vs. each case-mapped input line. >>> >>> That's not to say that it's perfect, of course. >>> The "LATIN SMALL LETTER J WITH CARON, COMBINING DOT BELOW" example >>> from gnulib's test-ulc-casecmp.c is a great example: this matches: >>> >>> printf '\x6A\xCC\x8C\xCC\xA3\n'|src/grep -i "$(printf >>> '\x6A\xCC\x8C\xCC\xA3')" >>> >>> but this does not, yet probably should: >>> >>> printf '\xC7\xB0\xCC\xA3\n'|src/grep -i "$(printf '\x6A\xCC\x8C\xCC\xA3')" >>> >>> Can you see a way to demonstrate a regression? >> >> Oh right, it doesn't handle these cases already. >> Fair enough I don't see a regression then.
> > This is also a good summary of stuff to consider with case: > http://www.unicode.org/faq/casemap_charprop.html > > So picking another case situation from there: > "in the Greek script, capital sigma (U+03A3) is the uppercase form of both > the regular (U+03C2) and final (U+03C3) lowercase sigma." > > One can see that sed handles this: > $ printf '\u03C2\u03C3\n' | sed 's/.*/&\U&/' > ςσΣΣ > $ printf '\u03A3\n' | sed 's/.*/&\L&/' > Σσ > > Though I was surprised the grep (2.14) didn't match any combo of these > $ printf '\u03C2\u03C3\n' | grep -Fi "$(printf \u03A3)" > $ printf '\u03A3\n' | grep -Fi "$(printf \u03C2)" > $ printf '\u03A3\n' | grep -Fi "$(printf \u03C3)" > > Not a regression of course. Thank you for the reference and the fine examples. I'll add the latter as a known-failing test. From debbugs-submit-bounces@debbugs.gnu.org Sat Jan 11 23:36:54 2014 Received: (at 16232) by debbugs.gnu.org; 12 Jan 2014 04:36:54 +0000 Received: from localhost ([127.0.0.1]:47310 helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.80) (envelope-from ) id 1W2Cn8-0003hR-9A for submit@debbugs.gnu.org; Sat, 11 Jan 2014 23:36:54 -0500 Received: from mail-pb0-f47.google.com ([209.85.160.47]:62031) by debbugs.gnu.org with esmtp (Exim 4.80) (envelope-from ) id 1W2Cn6-0003hJ-2U for 16232@debbugs.gnu.org; Sat, 11 Jan 2014 23:36:52 -0500 Received: by mail-pb0-f47.google.com with SMTP id um1so5983517pbc.6 for <16232@debbugs.gnu.org>; Sat, 11 Jan 2014 20:36:51 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113; h=mime-version:sender:in-reply-to:references:from:date:message-id :subject:to:cc:content-type:content-transfer-encoding; bh=1Mf842lrnZiuGrQQPkAbwAU3SqkjjBl4xkJ2MwAMklc=; b=CfvKUQEHZPreC6FXclwX+myFhWTrFEQVeRxXjxd1cCWLIS5Kj9uLd8UhMc7bnYdEFZ Vjyf6i7u4nqhaOEveTx+AwVn9leA45+MW6DoCrGELMuwmq1I1281iHXau1zhDtw4WfHW krxdk22l8QKQ4AxYtV3K+BhJDJ5qYWMmGshuxjo3UciKqRJoXShZ0IAZ5FRsILLMWscf
k5+nEmypxj7QIv9nylxugskS7UhBpGup7HksopZa0woTCayiyNpW1fO7wdSrF9+NAh1k xLtXWhUISndD6A0YlcO18+SVusEG3uYnISRFGzndITcy++g6hyl8r5W3eEJvVed7zxaE o84w== X-Received: by 10.66.136.131 with SMTP id qa3mr21497593pab.77.1389501411103; Sat, 11 Jan 2014 20:36:51 -0800 (PST) MIME-Version: 1.0 Received: by 10.68.157.202 with HTTP; Sat, 11 Jan 2014 20:36:31 -0800 (PST) In-Reply-To: <52D1521E.1030206@draigBrady.com> References: <52D0A32A.8050604@draigBrady.com> <52D12C1B.4020106@draigBrady.com> <52D1521E.1030206@draigBrady.com> From: Jim Meyering Date: Sat, 11 Jan 2014 20:36:31 -0800 X-Google-Sender-Auth: YtICEf-R95v4Zd1yRr2e0-Vunok Message-ID: Subject: Re: bug#16232: [PATCH] grep: make --ignore-case (-i) faster (sometimes 10x) in multibyte locales To: =?ISO-8859-1?Q?P=E1draig_Brady?= Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: quoted-printable X-Spam-Score: -0.7 (/) X-Debbugs-Envelope-To: 16232 Cc: 16232 <16232@debbugs.gnu.org> X-BeenThere: debbugs-submit@debbugs.gnu.org X-Mailman-Version: 2.1.15 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: debbugs-submit-bounces@debbugs.gnu.org Sender: "Debbugs-submit" X-Spam-Score: -0.7 (/) On Sat, Jan 11, 2014 at 6:15 AM, Pádraig Brady wrote: > On 01/11/2014 11:33 AM, Pádraig Brady wrote: ... > This is also a good summary of stuff to consider with case: > http://www.unicode.org/faq/casemap_charprop.html > > So picking another case situation from there: > "in the Greek script, capital sigma (U+03A3) is the uppercase form of both > the regular (U+03C2) and final (U+03C3) lowercase sigma."
> > One can see that sed handles this: > $ printf '\u03C2\u03C3\n' | sed 's/.*/&\U&/' > ςσΣΣ > $ printf '\u03A3\n' | sed 's/.*/&\L&/' > Σσ > > Though I was surprised the grep (2.14) didn't match any combo of these > $ printf '\u03C2\u03C3\n' | grep -Fi "$(printf \u03A3)" > $ printf '\u03A3\n' | grep -Fi "$(printf \u03C2)" > $ printf '\u03A3\n' | grep -Fi "$(printf \u03C3)" Actually, if you quote the argument to the latter printf, two of those do match, both with -F and without: $ printf '\u03C2\u03C3\n' | grep -Fi "$(printf '\u03A3')" ςσ $ printf '\u03A3\n' | grep -Fi "$(printf '\u03C2')" $ printf '\u03A3\n' | grep -Fi "$(printf '\u03C3')" Σ From debbugs-submit-bounces@debbugs.gnu.org Sun Jan 12 07:56:27 2014 Received: (at 16232) by debbugs.gnu.org; 12 Jan 2014 12:56:27 +0000 Received: from localhost ([127.0.0.1]:47458 helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.80) (envelope-from ) id 1W2KaY-0002HV-Tp for submit@debbugs.gnu.org; Sun, 12 Jan 2014 07:56:27 -0500 Received: from mail3.vodafone.ie ([213.233.128.45]:63294) by debbugs.gnu.org with esmtp (Exim 4.80) (envelope-from ) id 1W2KaX-0002HM-2D for 16232@debbugs.gnu.org; Sun, 12 Jan 2014 07:56:25 -0500 X-IronPort-Anti-Spam-Filtered: true X-IronPort-Anti-Spam-Result: ApQBAJeQ0lJtTiy7/2dsb2JhbAANTYNDg1S2aYEcgxkBAQEEIw8BRhAJAg0LAgIFFgsCAgkDAgECAUUGDQEFAgEBBYgACI1Gmnt2mlEXgSmNXgeCb4FIAQOfA40YgT4 Received: from unknown (HELO [192.168.1.79]) ([109.78.44.187]) by mail3.vodafone.ie with ESMTP; 12 Jan 2014 12:56:23 +0000 Message-ID: <52D290F7.6050002@draigBrady.com> Date: Sun, 12 Jan 2014 12:56:23 +0000 From: =?UTF-8?B?UMOhZHJhaWcgQnJhZHk=?= User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:17.0) Gecko/20130110 Thunderbird/17.0.2 MIME-Version: 1.0 To: Jim Meyering Subject: Re: bug#16232: [PATCH] grep: make --ignore-case (-i) faster (sometimes 10x) in multibyte locales References: <52D0A32A.8050604@draigBrady.com> <52D12C1B.4020106@draigBrady.com>
<52D1521E.1030206@draigBrady.com> In-Reply-To: X-Enigmail-Version: 1.6 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit X-Spam-Score: 0.0 (/) X-Debbugs-Envelope-To: 16232 Cc: 16232 <16232@debbugs.gnu.org> X-BeenThere: debbugs-submit@debbugs.gnu.org X-Mailman-Version: 2.1.15 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: debbugs-submit-bounces@debbugs.gnu.org Sender: "Debbugs-submit" X-Spam-Score: 0.0 (/) On 01/12/2014 04:36 AM, Jim Meyering wrote: > On Sat, Jan 11, 2014 at 6:15 AM, Pádraig Brady wrote: >> On 01/11/2014 11:33 AM, Pádraig Brady wrote: > ... >> This is also a good summary of stuff to consider with case: >> http://www.unicode.org/faq/casemap_charprop.html >> >> So picking another case situation from there: >> "in the Greek script, capital sigma (U+03A3) is the uppercase form of both >> the regular (U+03C2) and final (U+03C3) lowercase sigma." >> >> One can see that sed handles this: >> $ printf '\u03C2\u03C3\n' | sed 's/.*/&\U&/' >> ςσΣΣ >> $ printf '\u03A3\n' | sed 's/.*/&\L&/' >> Σσ >> >> Though I was surprised the grep (2.14) didn't match any combo of these >> $ printf '\u03C2\u03C3\n' | grep -Fi "$(printf \u03A3)" >> $ printf '\u03A3\n' | grep -Fi "$(printf \u03C2)" >> $ printf '\u03A3\n' | grep -Fi "$(printf \u03C3)" > > Actually, if you quote the argument to the latter printf, two of those > do match, both with -F and without: > > $ printf '\u03C2\u03C3\n' | grep -Fi "$(printf '\u03A3')" > ςσ > $ printf '\u03A3\n' | grep -Fi "$(printf '\u03C2')" > $ printf '\u03A3\n' | grep -Fi "$(printf '\u03C3')" > Σ Oops right. So that's still no regression with the new scheme since grep is 1:1 here for Σ and σ. thanks, Pádraig. 
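The quoting slip noticed in this exchange is easy to reproduce. An unquoted backslash is consumed by the shell before printf runs, so printf never sees an escape sequence. A minimal sketch follows; the octal form is used for the quoted case only because \u support varies between printf implementations, and the variable names are illustrative.

```shell
# Unquoted: the shell removes the backslash, so printf receives "u03A3".
unquoted=$(printf \u03A3)
printf 'unquoted argument yields: %s\n' "$unquoted"

# Quoted: printf itself sees the escape. Octal \316\243 is the UTF-8
# encoding of U+03A3 and is understood by any POSIX printf.
quoted=$(printf '\316\243')
hex=$(printf '%s' "$quoted" | od -An -tx1 | tr -d ' \n')
printf 'quoted argument yields bytes: %s\n' "$hex"
```

The unquoted form produces the five ASCII characters u03A3, so the first set of grep commands searched for that literal text, which is why none of them matched.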
From debbugs-submit-bounces@debbugs.gnu.org Wed Feb 19 09:22:29 2014 Received: (at 16232) by debbugs.gnu.org; 19 Feb 2014 14:22:29 +0000 Received: from localhost ([127.0.0.1]:59675 helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.80) (envelope-from ) id 1WG82f-0005za-2D for submit@debbugs.gnu.org; Wed, 19 Feb 2014 09:22:29 -0500 Received: from pbsg500.nifty.com ([202.248.238.70]:38545) by debbugs.gnu.org with esmtp (Exim 4.80) (envelope-from ) id 1WG82Z-0005zK-Nj for 16232@debbugs.gnu.org; Wed, 19 Feb 2014 09:22:26 -0500 Received: from [10.120.1.67] (i118-21-128-66.s30.a048.ap.plala.or.jp [118.21.128.66]) (authenticated) by pbsg500.nifty.com with ESMTP id s1JEM7O6025239; Wed, 19 Feb 2014 23:22:08 +0900 X-Nifty-SrcIP: [118.21.128.66] Date: Wed, 19 Feb 2014 23:22:08 +0900 From: Norihiro Tanaka To: Padraig Brady Subject: Re: bug#16232: [PATCH] grep: make --ignore-case (-i) faster (sometimes 10x) in multibyte locales In-Reply-To: <52D290F7.6050002@draigBrady.com> References: <52D290F7.6050002@draigBrady.com> Message-Id: <20140219232205.23D3.27F6AC2D@kcn.ne.jp> MIME-Version: 1.0 Content-Type: text/plain; charset="US-ASCII" Content-Transfer-Encoding: 7bit X-Mailer: Becky! ver. 2.64.06 [ja] X-Spam-Score: -0.6 (/) X-Debbugs-Envelope-To: 16232 Cc: 16232 <16232@debbugs.gnu.org>, Jim Meyering X-BeenThere: debbugs-submit@debbugs.gnu.org X-Mailman-Version: 2.1.15 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: debbugs-submit-bounces@debbugs.gnu.org Sender: "Debbugs-submit" X-Spam-Score: -0.6 (/) Hi, The patch may cause a slowdown, because MBCSET is processed not by the DFA engine but by the regexp engine. I tested performance on grep-2.17 and on a version with the patch reverted. The latter is 100x faster.
yes $(printf '%078dm' 0)|head -10000 > in grep-2.17 original: $ for i in $(seq 10); do env LC_ALL=ja_JP.eucJP time src/grep -i n in; done Command exited with non-zero status 1 5.92user 1.69system 0:07.73elapsed 98%CPU (0avgtext+0avgdata 3856maxresident)k 0inputs+0outputs (0major+422minor)pagefaults 0swaps Command exited with non-zero status 1 5.59user 1.87system 0:07.58elapsed 98%CPU (0avgtext+0avgdata 3872maxresident)k 0inputs+0outputs (0major+423minor)pagefaults 0swaps Command exited with non-zero status 1 6.06user 1.58system 0:07.81elapsed 97%CPU (0avgtext+0avgdata 3872maxresident)k 0inputs+0outputs (0major+423minor)pagefaults 0swaps Command exited with non-zero status 1 5.73user 1.66system 0:07.52elapsed 98%CPU (0avgtext+0avgdata 3856maxresident)k 0inputs+0outputs (0major+422minor)pagefaults 0swaps Command exited with non-zero status 1 6.42user 1.19system 0:07.86elapsed 96%CPU (0avgtext+0avgdata 3872maxresident)k 0inputs+0outputs (0major+423minor)pagefaults 0swaps Command exited with non-zero status 1 6.15user 1.56system 0:08.34elapsed 92%CPU (0avgtext+0avgdata 3888maxresident)k 0inputs+0outputs (0major+424minor)pagefaults 0swaps Command exited with non-zero status 1 6.97user 0.61system 0:07.77elapsed 97%CPU (0avgtext+0avgdata 3856maxresident)k 0inputs+0outputs (0major+422minor)pagefaults 0swaps Command exited with non-zero status 1 7.00user 0.57system 0:07.71elapsed 98%CPU (0avgtext+0avgdata 3872maxresident)k 0inputs+0outputs (0major+423minor)pagefaults 0swaps Command exited with non-zero status 1 7.16user 0.25system 0:07.56elapsed 97%CPU (0avgtext+0avgdata 3872maxresident)k 0inputs+0outputs (0major+423minor)pagefaults 0swaps Command exited with non-zero status 1 7.04user 0.39system 0:07.60elapsed 97%CPU (0avgtext+0avgdata 3856maxresident)k 0inputs+0outputs (0major+422minor)pagefaults 0swaps After revert the patch: $ for i in $(seq 10); do env LC_ALL=ja_JP.eucJP time src/grep -i n in; done Command exited with non-zero status 1 0.07user 0.02system 
0:00.10elapsed 92%CPU (0avgtext+0avgdata 3072maxresident)k 0inputs+0outputs (0major+232minor)pagefaults 0swaps Command exited with non-zero status 1 0.03user 0.01system 0:00.05elapsed 101%CPU (0avgtext+0avgdata 3072maxresident)k 0inputs+0outputs (0major+218minor)pagefaults 0swaps Command exited with non-zero status 1 0.04user 0.01system 0:00.06elapsed 90%CPU (0avgtext+0avgdata 3072maxresident)k 0inputs+0outputs (0major+218minor)pagefaults 0swaps Command exited with non-zero status 1 0.03user 0.02system 0:00.05elapsed 103%CPU (0avgtext+0avgdata 3056maxresident)k 0inputs+0outputs (0major+217minor)pagefaults 0swaps Command exited with non-zero status 1 0.04user 0.01system 0:00.06elapsed 86%CPU (0avgtext+0avgdata 3088maxresident)k 0inputs+0outputs (0major+219minor)pagefaults 0swaps Command exited with non-zero status 1 0.04user 0.01system 0:00.06elapsed 91%CPU (0avgtext+0avgdata 3056maxresident)k 0inputs+0outputs (0major+217minor)pagefaults 0swaps Command exited with non-zero status 1 0.04user 0.01system 0:00.06elapsed 87%CPU (0avgtext+0avgdata 3056maxresident)k 0inputs+0outputs (0major+217minor)pagefaults 0swaps Command exited with non-zero status 1 0.03user 0.02system 0:00.05elapsed 105%CPU (0avgtext+0avgdata 3072maxresident)k 0inputs+0outputs (0major+218minor)pagefaults 0swaps Command exited with non-zero status 1 0.04user 0.00system 0:00.06elapsed 90%CPU (0avgtext+0avgdata 3072maxresident)k 0inputs+0outputs (0major+218minor)pagefaults 0swaps Command exited with non-zero status 1 0.04user 0.01system 0:00.06elapsed 90%CPU (0avgtext+0avgdata 3056maxresident)k 0inputs+0outputs (0major+217minor)pagefaults 0swaps Norihiro From debbugs-submit-bounces@debbugs.gnu.org Wed Feb 19 13:30:38 2014 Received: (at 16232) by debbugs.gnu.org; 19 Feb 2014 18:30:38 +0000 Received: from localhost ([127.0.0.1]:60536 helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.80) (envelope-from ) id 1WGBuo-0004dL-7g for submit@debbugs.gnu.org; Wed, 19 Feb 2014 13:30:38 -0500 Received: 
from smtp.cs.ucla.edu ([131.179.128.62]:54122) by debbugs.gnu.org with esmtp (Exim 4.80) (envelope-from ) id 1WGBuk-0004cq-Pl for 16232@debbugs.gnu.org; Wed, 19 Feb 2014 13:30:35 -0500 Received: from localhost (localhost.localdomain [127.0.0.1]) by smtp.cs.ucla.edu (Postfix) with ESMTP id E7C09A60017; Wed, 19 Feb 2014 10:30:28 -0800 (PST) X-Virus-Scanned: amavisd-new at smtp.cs.ucla.edu Received: from smtp.cs.ucla.edu ([127.0.0.1]) by localhost (smtp.cs.ucla.edu [127.0.0.1]) (amavisd-new, port 10024) with ESMTP id 8mJwGt173HPK; Wed, 19 Feb 2014 10:30:28 -0800 (PST) Received: from penguin.cs.ucla.edu (Penguin.CS.UCLA.EDU [131.179.64.200]) by smtp.cs.ucla.edu (Postfix) with ESMTPSA id 902FDA60008; Wed, 19 Feb 2014 10:30:28 -0800 (PST) Message-ID: <5304F840.9010908@cs.ucla.edu> Date: Wed, 19 Feb 2014 10:30:24 -0800 From: Paul Eggert Organization: UCLA Computer Science Department User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:24.0) Gecko/20100101 Thunderbird/24.3.0 MIME-Version: 1.0 To: Norihiro Tanaka , Padraig Brady Subject: Re: bug#16232: [PATCH] grep: make --ignore-case (-i) faster (sometimes 10x) in multibyte locales References: <52D290F7.6050002@draigBrady.com> <20140219232205.23D3.27F6AC2D@kcn.ne.jp> In-Reply-To: <20140219232205.23D3.27F6AC2D@kcn.ne.jp> Content-Type: text/plain; charset=ISO-8859-1; format=flowed Content-Transfer-Encoding: 7bit X-Spam-Score: -2.9 (--) X-Debbugs-Envelope-To: 16232 Cc: 16232 <16232@debbugs.gnu.org> X-BeenThere: debbugs-submit@debbugs.gnu.org X-Mailman-Version: 2.1.15 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: debbugs-submit-bounces@debbugs.gnu.org Sender: "Debbugs-submit" X-Spam-Score: -2.9 (--) On 02/19/2014 06:22 AM, Norihiro Tanaka wrote: > I tested performance on grep-2.17 and the version which the patch is reverted. > Latter is 100x faster. While we're on the topic, can someone please say why that patch was done as a preprocessor for the regex code? 
I normally would think that the way to speed up regex processing is to improve dfa.c and/or the regex library, as that should speed up all programs that use these modules, not just grep. From debbugs-submit-bounces@debbugs.gnu.org Wed Feb 19 14:17:47 2014 Received: (at 16232) by debbugs.gnu.org; 19 Feb 2014 19:17:47 +0000 Received: from localhost ([127.0.0.1]:60601 helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.80) (envelope-from ) id 1WGCeQ-0005uX-Fg for submit@debbugs.gnu.org; Wed, 19 Feb 2014 14:17:47 -0500 Received: from mail-pa0-f50.google.com ([209.85.220.50]:35502) by debbugs.gnu.org with esmtp (Exim 4.80) (envelope-from ) id 1WGCeN-0005uI-1D for 16232@debbugs.gnu.org; Wed, 19 Feb 2014 14:17:44 -0500 Received: by mail-pa0-f50.google.com with SMTP id kp14so815363pab.9 for <16232@debbugs.gnu.org>; Wed, 19 Feb 2014 11:17:37 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113; h=mime-version:sender:in-reply-to:references:from:date:message-id :subject:to:cc:content-type; bh=BkHaEMKPoGJF/BGVmRcxdkJQIllI93y6++EPyRxpWkY=; b=VeCRHDXT/5bGriAKkBp2je44n0+8Wgw/sF5ToWqNSgDk/zdgdnoXHFPICQolJpuU/x +7fkZLI0vgdVId+pe5ZIfzQjn8rcA9zlNf+SH9ri0b6mjn2XNJG9Acst69IMWgeRhcXV CU27QIi+ptGJKo7iv8hGAaGv/yUkAgS898oVxPna7b+L6I8zgBWNkbiht1mxoMDJxohk Y+11clB6ZgM6Xvdv0+O4MkBLhG5p4ZllBcokq87TmWq4J7v5paFhL4XOk5S7bWLYwnsh NBR6MBx6GigsOznQx6Mdque2H5BOAfGC9BGn91lpwzQUhzE+BzsMav/1IZUzL8oQX32q pu+Q== X-Received: by 10.66.138.40 with SMTP id qn8mr4161903pab.154.1392837457056; Wed, 19 Feb 2014 11:17:37 -0800 (PST) MIME-Version: 1.0 Received: by 10.68.201.231 with HTTP; Wed, 19 Feb 2014 11:17:16 -0800 (PST) In-Reply-To: <20140219232205.23D3.27F6AC2D@kcn.ne.jp> References: <52D290F7.6050002@draigBrady.com> <20140219232205.23D3.27F6AC2D@kcn.ne.jp> From: Jim Meyering Date: Wed, 19 Feb 2014 11:17:16 -0800 X-Google-Sender-Auth: SHunOpJna4giPjNTCrXSODfWUpU Message-ID: Subject: Re: bug#16232: [PATCH] grep: make --ignore-case (-i) faster (sometimes 10x) in 
multibyte locales To: Norihiro Tanaka Content-Type: multipart/mixed; boundary=047d7b15a41fdbaa9504f2c73da0 X-Spam-Score: -0.7 (/) X-Debbugs-Envelope-To: 16232 Cc: 16232 <16232@debbugs.gnu.org>, Padraig Brady X-BeenThere: debbugs-submit@debbugs.gnu.org X-Mailman-Version: 2.1.15 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: debbugs-submit-bounces@debbugs.gnu.org Sender: "Debbugs-submit" X-Spam-Score: -0.7 (/) --047d7b15a41fdbaa9504f2c73da0 Content-Type: text/plain; charset=ISO-8859-1 On Wed, Feb 19, 2014 at 6:22 AM, Norihiro Tanaka wrote: > for i in $(seq 10); do env LC_ALL=ja_JP.eucJP time src/grep -i n in; done Wow. You're right. With the attached patch, I see a speedup of more than 130x in this case: (fyi, the "time" output is slightly different, because I have installed GNU time) grep-2.17$ for i in $(seq 5); do env LC_ALL=ja_JP.eucJP time grep -i n in; done 2.78 real 2.78 user 0.00 sys 2.73 real 2.73 user 0.00 sys 2.75 real 2.75 user 0.00 sys 2.73 real 2.73 user 0.00 sys 2.74 real 2.74 user 0.00 sys 2.17+patch$ for i in $(seq 5); do env LC_ALL=ja_JP.eucJP time src/grep -i n in; done 0.02 real 0.02 user 0.00 sys 0.02 real 0.02 user 0.00 sys 0.02 real 0.02 user 0.00 sys 0.02 real 0.02 user 0.00 sys 0.02 real 0.02 user 0.00 sys I haven't investigated the "why" yet, but expect that I will make grep-2.18 with just this one performance-improving patch.
Thank you, Norihiro, Jim --047d7b15a41fdbaa9504f2c73da0 Content-Type: text/plain; charset=US-ASCII; name="k.txt" Content-Disposition: attachment; filename="k.txt" Content-Transfer-Encoding: base64 X-Attachment-Id: f_hruze9ny0 RnJvbSA2NWNiYTU1NmE2MjAxNjk1MmRjNzZlOWE5NDNkNDlhMmUyYjI3ZWMzIE1vbiBTZXAgMTcg MDA6MDA6MDAgMjAwMQpGcm9tOiBOb3JpaGlybyBUYW5ha2EgPG5vcml0bmtAa2NuLm5lLmpwPgpE YXRlOiBXZWQsIDE5IEZlYiAyMDE0IDExOjE0OjUyIC0wODAwClN1YmplY3Q6IFtQQVRDSF0gZ3Jl cDogbWFrZSAtaSB1cCB0byAxMzB4IGZhc3RlciBpbiBhIG11bHRpLWJ5dGUgbG9jYWxlCgpUaGlz IHJldmVydHMgdGhlIGdyZXAtc291cmNlIGNoYW5nZXMgb2YgY29tbWl0IHYyLjE2LTQtZzk3MzE4 ZjUsCiJncmVwOiBtYWtlIC0taWdub3JlLWNhc2UgKC1pKSBmYXN0ZXIgKHNvbWV0aW1lcyAxMHgp IGluIG11bHRpYnl0ZQpsb2NhbGVzIiwgYnV0IGxlYXZlcyB0aGF0IGNvbW1pdCdzIGFkZGVkIHRl c3RzLgoqIHNyYy9tYWluLmMgKHRyaXZpYWxfY2FzZV9jb252ZXJ0KTogUmVtb3ZlIGZ1bmN0aW9u LgoobWFpbik6IFJlbW92ZSByZWdleHAgcHJlcHJvY2Vzc2luZyBmb3IgLWkuCiogTkVXUyAoSW1w cm92ZW1lbnRzKTogTWVudGlvbiB0aGlzLgotLS0KIE5FV1MgICAgICAgfCAgIDQgKysrCiBzcmMv bWFpbi5jIHwgMTExICstLS0tLS0tLS0tLS0tLS0tLS0tLS0tLS0tLS0tLS0tLS0tLS0tLS0tLS0t LS0tLS0tLS0tLS0tLS0tLS0KIDIgZmlsZXMgY2hhbmdlZCwgNSBpbnNlcnRpb25zKCspLCAxMTAg ZGVsZXRpb25zKC0pCgpkaWZmIC0tZ2l0IGEvTkVXUyBiL05FV1MKaW5kZXggNjc4NWE5Ni4uYzZk NzhkMCAxMDA2NDQKLS0tIGEvTkVXUworKysgYi9ORVdTCkBAIC0yLDYgKzIsMTAgQEAgR05VIGdy ZXAgTkVXUyAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgIC0qLSBvdXRsaW5lIC0q LQoKICogTm90ZXdvcnRoeSBjaGFuZ2VzIGluIHJlbGVhc2UgPy4/ICg/Pz8/LT8/LT8/KSBbP10K CisqKiBJbXByb3ZlbWVudHMKKworICBncmVwIC1pIGluIGEgbXVsdGlieXRlIGxvY2FsZSBtYXkg YmUgb3ZlciAxMzAgdGltZXMgZmFzdGVyIHRoYW4gaW4gMi4xNworCgogKiBOb3Rld29ydGh5IGNo YW5nZXMgaW4gcmVsZWFzZSAyLjE3ICgyMDE0LTAyLTE3KSBbc3RhYmxlXQoKZGlmZiAtLWdpdCBh L3NyYy9tYWluLmMgYi9zcmMvbWFpbi5jCmluZGV4IGJkMjAyOTcuLjI4ZDM3MjAgMTAwNjQ0Ci0t LSBhL3NyYy9tYWluLmMKKysrIGIvc3JjL21haW4uYwpAQCAtMjcsNyArMjcsNiBAQAogI2luY2x1 ZGUgPGZjbnRsLmg+CiAjaW5jbHVkZSA8aW50dHlwZXMuaD4KICNpbmNsdWRlIDxzdGRpby5oPgot I2luY2x1ZGUgPGFzc2VydC5oPgogI2luY2x1ZGUgInN5c3RlbS5oIgoKICNpbmNsdWRlICJhcmdt 
YXRjaC5oIgpAQCAtMTY0MiwxNCArMTY0MSwxMyBAQCBpZiBhbnkgZXJyb3Igb2NjdXJzIGFuZCAt cSBpcyBub3QgZ2l2ZW4sIHRoZSBleGl0IHN0YXR1cyBpcyAyLlxuIikpOwogICBleGl0IChzdGF0 dXMpOwogfQoKLXN0YXRpYyBjaGFyIGNvbnN0ICptYXRjaGVyOwotCiAvKiBJZiBNIGlzIE5VTEws IGluaXRpYWxpemUgdGhlIG1hdGNoZXIgdG8gdGhlIGRlZmF1bHQuICBPdGhlcndpc2Ugc2V0IHRo ZQogICAgbWF0Y2hlciB0byBNIGlmIGF2YWlsYWJsZS4gIEV4aXQgaW4gY2FzZSBvZiBjb25mbGlj dHMgb3IgaWYgTSBpcyBub3QKICAgIGF2YWlsYWJsZS4gICovCiBzdGF0aWMgdm9pZAogc2V0bWF0 Y2hlciAoY2hhciBjb25zdCAqbSkKIHsKKyAgc3RhdGljIGNoYXIgY29uc3QgKm1hdGNoZXI7CiAg IHVuc2lnbmVkIGludCBpOwoKICAgaWYgKCFtKQpAQCAtMTg2NCw4NCArMTg2Miw2IEBAIHBhcnNl X2dyZXBfY29sb3JzICh2b2lkKQogICAgICAgcmV0dXJuOwogfQoKLSNkZWZpbmUgTUJSVE9XQyhw d2MsIHMsIG4sIHBzKSBcCi0gIChNQl9DVVJfTUFYID09IDEgPyBcCi0gICAoKihwd2MpID0gYnRv d2MgKCoodW5zaWduZWQgY2hhciAqKSAocykpLCAxKSA6IFwKLSAgIG1icnRvd2MgKChwd2MpLCAo cyksIChuKSwgKHBzKSkpCi0KLSNkZWZpbmUgV0NSVE9NQihzLCB3YywgcHMpIFwKLSAgKE1CX0NV Ul9NQVggPT0gMSA/IFwKLSAgICgqKHMpID0gd2N0b2IgKCh3aW50X3QpICh3YykpLCAxKSA6IFwK LSAgIHdjcnRvbWIgKChzKSwgKHdjKSwgKHBzKSkpCi0KLS8qIElmIHRoZSBuZXdsaW5lLXNlcGFy YXRlZCByZWd1bGFyIGV4cHJlc3Npb25zLCBLRVlTICh3aXRoIGxlbmd0aCwgTEVOCi0gICBhbmQg bm8gdHJhaWxpbmcgTlVMIGJ5dGUpLCBhcmUgYW1lbmFibGUgdG8gdHJhbnNmb3JtYXRpb24gaW50 bwotICAgb3RoZXJ3aXNlIGVxdWl2YWxlbnQgY2FzZS1pZ25vcmluZyBvbmVzLCBwZXJmb3JtIHRo ZSB0cmFuc2Zvcm1hdGlvbiwKLSAgIHB1dCB0aGUgcmVzdWx0IGludG8gbWFsbG9jJ2QgbWVtb3J5 LCAqTkVXX0tFWVMgd2l0aCBsZW5ndGggKk5FV19MRU4sCi0gICBhbmQgcmV0dXJuIHRydWUuICBP dGhlcndpc2UsIHJldHVybiBmYWxzZS4gICovCi1zdGF0aWMgYm9vbAotdHJpdmlhbF9jYXNlX2ln bm9yZSAoc2l6ZV90IGxlbiwgY2hhciBjb25zdCAqa2V5cywKLSAgICAgICAgICAgICAgICAgICAg IHNpemVfdCAqbmV3X2xlbiwgY2hhciAqKm5ld19rZXlzKQotewotICAvKiBGSVhNRTogY29uc2lk ZXIgcmVtb3ZpbmcgdGhlIGZvbGxvd2luZyByZXN0cmljdGlvbjoKLSAgICAgUmVqZWN0IGlmIEtF WVMgY29udGFpbiBBU0NJSSAnXFwnIG9yICdbJy4gICovCi0gIGlmIChtZW1jaHIgKGtleXMsICdc XCcsIGxlbikgfHwgbWVtY2hyIChrZXlzLCAnWycsIGxlbikpCi0gICAgcmV0dXJuIGZhbHNlOwot 
Ci0gIC8qIFdvcnN0IGNhc2UgaXMgdGhhdCBlYWNoIGJ5dGUgQiBvZiBLRVlTIGlzIEFTQ0lJIGFs cGhhYmV0aWMgYW5kIGVhY2gKLSAgICAgb3RoZXJfY2FzZShCKSBjaGFyYWN0ZXIsIEMsIG9jY3Vw aWVzIE1CX0NVUl9NQVggYnl0ZXMsIHNvIGVhY2ggQgotICAgICBtYXBzIHRvIFtCQ10sIHdoaWNo IHJlcXVpcmVzIE1CX0NVUl9NQVggKyAzIGJ5dGVzLiAgICovCi0gICpuZXdfa2V5cyA9IHhubWFs bG9jIChNQl9DVVJfTUFYICsgMywgbGVuICsgMSk7Ci0gIGNoYXIgKnAgPSAqbmV3X2tleXM7Ci0K LSAgbWJzdGF0ZV90IG1iX3N0YXRlOwotICBtZW1zZXQgKCZtYl9zdGF0ZSwgMCwgc2l6ZW9mIG1i X3N0YXRlKTsKLSAgd2hpbGUgKGxlbikKLSAgICB7Ci0gICAgICB3Y2hhcl90IHdjOwotICAgICAg aW50IG4gPSBNQlJUT1dDICgmd2MsIGtleXMsIGxlbiwgJm1iX3N0YXRlKTsKLQotICAgICAgLyog Rm9yIGFuIGludmFsaWQsIGluY29tcGxldGUgb3IgTCdcMCcsIHNraXAgdGhpcyBvcHRpbWl6YXRp b24uICAqLwotICAgICAgaWYgKG4gPD0gMCkKLSAgICAgICAgewotICAgICAgICBza2lwX2Nhc2Vf aWdub3JlX29wdGltaXphdGlvbjoKLSAgICAgICAgICBmcmVlICgqbmV3X2tleXMpOwotICAgICAg ICAgIHJldHVybiBmYWxzZTsKLSAgICAgICAgfQotCi0gICAgICBjaGFyIGNvbnN0ICpvcmlnID0g a2V5czsKLSAgICAgIGtleXMgKz0gbjsKLSAgICAgIGxlbiAtPSBuOwotCi0gICAgICBpZiAoIWlz d2FscGhhICh3YykpCi0gICAgICAgIHsKLSAgICAgICAgICBtZW1jcHkgKHAsIG9yaWcsIG4pOwot ICAgICAgICAgIHAgKz0gbjsKLSAgICAgICAgfQotICAgICAgZWxzZQotICAgICAgICB7Ci0gICAg ICAgICAgKnArKyA9ICdbJzsKLSAgICAgICAgICBtZW1jcHkgKHAsIG9yaWcsIG4pOwotICAgICAg ICAgIHAgKz0gbjsKLQotICAgICAgICAgIHdjaGFyX3Qgd2MyID0gaXN3dXBwZXIgKHdjKSA/IHRv d2xvd2VyICh3YykgOiB0b3d1cHBlciAod2MpOwotICAgICAgICAgIGNoYXIgYnVmW01CX0NVUl9N QVhdOwotICAgICAgICAgIGludCBuMiA9IFdDUlRPTUIgKGJ1Ziwgd2MyLCAmbWJfc3RhdGUpOwot ICAgICAgICAgIGlmIChuMiA8PSAwKQotICAgICAgICAgICAgZ290byBza2lwX2Nhc2VfaWdub3Jl X29wdGltaXphdGlvbjsKLSAgICAgICAgICBhc3NlcnQgKG4yIDw9IE1CX0NVUl9NQVgpOwotICAg ICAgICAgIG1lbWNweSAocCwgYnVmLCBuMik7Ci0gICAgICAgICAgcCArPSBuMjsKLQotICAgICAg ICAgICpwKysgPSAnXSc7Ci0gICAgICAgIH0KLSAgICB9Ci0KLSAgKm5ld19sZW4gPSBwIC0gKm5l d19rZXlzOwotCi0gIHJldHVybiB0cnVlOwotfQotCiBpbnQKIG1haW4gKGludCBhcmdjLCBjaGFy ICoqYXJndikKIHsKQEAgLTIzMzYsMzUgKzIyNTYsNiBAQCBtYWluIChpbnQgYXJnYywgY2hhciAq 
KmFyZ3YpCiAgIGVsc2UKICAgICB1c2FnZSAoRVhJVF9UUk9VQkxFKTsKCi0gIC8qIEFzIGN1cnJl bnRseSBpbXBsZW1lbnRlZCwgY2FzZS1pbnNlbnNpdGl2ZSBtYXRjaGluZyBpcyBleHBlbnNpdmUg aW4KLSAgICAgbXVsdGktYnl0ZSBsb2NhbGVzIGJlY2F1c2Ugb2YgYSBmZXcgb3V0bGllciBsb2Nh bGVzIGluIHdoaWNoIHNvbWUKLSAgICAgY2hhcmFjdGVycyBjaGFuZ2Ugc2l6ZSB3aGVuIGNvbnZl cnRlZCB0byB1cHBlciBvciBsb3dlciBjYXNlLiAgVG8KLSAgICAgYWNjb21tb2RhdGUgdGhvc2Us IHdlIHJldmVydCB0byBzZWFyY2hpbmcgdGhlIGlucHV0IG9uZSBsaW5lIGF0IGEKLSAgICAgdGlt ZSwgcmF0aGVyIHRoYW4gdXNpbmcgdGhlIG11Y2ggbW9yZSBlZmZpY2llbnQgYnVmZmVyIHNlYXJj aC4KLSAgICAgSG93ZXZlciwgaWYgd2UgaGF2ZSBhIHJlZ3VsYXIgZXhwcmVzc2lvbiwgL2Zvby9p LCB3ZSBjYW4gY29udmVydAotICAgICBpdCB0byBhbiBlcXVpdmFsZW50IGNhc2UtaW5zZW5zaXRp dmUgL1tmRl1bb09dW29PXS8sIGFuZCB0aHVzCi0gICAgIGF2b2lkIHRoZSBleHBlbnNpdmUgcmVh ZC1hbmQtcHJvY2Vzcy1hLWxpbmUtYXQtYS10aW1lIHJlcXVpcmVtZW50LgotICAgICBPcHRpbWl6 ZS1hd2F5IHRoZSAiLWkiIG9wdGlvbiwgd2hlbiBwb3NzaWJsZSwgY29udmVydGluZyBlYWNoCi0g ICAgIGNhbmRpZGF0ZSBhbHBoYSwgQywgaW4gdGhlIHJlZ2V4cCB0byBbQ2NdLiAgKi8KLSAgaWYg KG1hdGNoX2ljYXNlKQotICAgIHsKLSAgICAgIHNpemVfdCBuZXdfa2V5Y2M7Ci0gICAgICBjaGFy ICpuZXdfa2V5czsKLSAgICAgIC8qIEl0IGlzIG5vdCBwb3NzaWJsZSB3aXRoIC1GLCBub3QgdXNl ZnVsIHdpdGggLVAgKHBjcmUpIGFuZCB0aGVyZSBpcyBubwotICAgICAgICAgcG9pbnQgd2hlbiB0 aGVyZSBpcyBubyByZWdleHAuICBJdCBhbHNvIGRlcGVuZHMgb24gd2hpY2ggY29uc3RydWN0cwot ICAgICAgICAgYXBwZWFyIGluIHRoZSByZWdleHAuICBTZWUgdHJpdmlhbF9jYXNlX2lnbm9yZSBm b3IgdGhvc2UgZGV0YWlscy4gICovCi0gICAgICBpZiAoa2V5Y2MKLSAgICAgICAgICAmJiAhICht YXRjaGVyCi0gICAgICAgICAgICAgICAgJiYgKFNUUkVRIChtYXRjaGVyLCAiZmdyZXAiKSB8fCBT VFJFUSAobWF0Y2hlciwgInBjcmUiKSkpCi0gICAgICAgICAgJiYgdHJpdmlhbF9jYXNlX2lnbm9y ZSAoa2V5Y2MsIGtleXMsICZuZXdfa2V5Y2MsICZuZXdfa2V5cykpCi0gICAgICAgIHsKLSAgICAg ICAgICBtYXRjaF9pY2FzZSA9IDA7Ci0gICAgICAgICAgZnJlZSAoa2V5cyk7Ci0gICAgICAgICAg a2V5cyA9IG5ld19rZXlzOwotICAgICAgICAgIGtleWNjID0gbmV3X2tleWNjOwotICAgICAgICB9 Ci0gICAgfQotCiAjaWYgTUJTX1NVUFBPUlQKICAgaWYgKE1CX0NVUl9NQVggPiAxKQogICAgIGJ1 aWxkX21iY2xlbl9jYWNoZSAoKTsKLS0gCjEuOS4wCgo= 
--047d7b15a41fdbaa9504f2c73da0-- From debbugs-submit-bounces@debbugs.gnu.org Wed Feb 19 14:51:57 2014 Received: (at 16232) by debbugs.gnu.org; 19 Feb 2014 19:51:57 +0000 Received: from localhost ([127.0.0.1]:60639 helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.80) (envelope-from ) id 1WGDBV-0006sQ-6Q for submit@debbugs.gnu.org; Wed, 19 Feb 2014 14:51:57 -0500 Received: from mail-pa0-f46.google.com ([209.85.220.46]:59255) by debbugs.gnu.org with esmtp (Exim 4.80) (envelope-from ) id 1WGDBR-0006s9-TX for 16232@debbugs.gnu.org; Wed, 19 Feb 2014 14:51:54 -0500 Received: by mail-pa0-f46.google.com with SMTP id rd3so844643pab.33 for <16232@debbugs.gnu.org>; Wed, 19 Feb 2014 11:51:47 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113; h=mime-version:sender:in-reply-to:references:from:date:message-id :subject:to:cc:content-type; bh=ugIrJfRop22Vh8nNu6zFC24uDZYsd5jkdcnmDZeqCp8=; b=W8CfIk0iHKF1iTUIpRrp+H8vIWWuEYOQDQYFBt8l4yqZn/cixqFxDbFL2eod+4lG4q WfkL+hf+GUzTNzIM8T6yIxNLh73LDUEQMNU+jO/4Q6ts/LVM3vPOJUzQRYroZqJuQ3qQ 3KjPucblQvHebet3d0vVwxKZJmcnn6j+sAwGOLIxr1vna2im6rnvpVLkbaxFx0tnqcnq nF9zjR90kwNHSxTREklKMazajZWAwnysbf+ojMIr+3Csq1d9pEzwdFdAqb3yI2JjoDoQ JgQbT1M/f6HbbZBKsHSgTkRdiT5bJIr6lD0g+OgWSdthUMWF5yrkgpSCupoYMStWWDMh RrqQ== X-Received: by 10.68.209.193 with SMTP id mo1mr42456003pbc.38.1392839507889; Wed, 19 Feb 2014 11:51:47 -0800 (PST) MIME-Version: 1.0 Received: by 10.68.201.231 with HTTP; Wed, 19 Feb 2014 11:51:27 -0800 (PST) In-Reply-To: <5304F840.9010908@cs.ucla.edu> References: <52D290F7.6050002@draigBrady.com> <20140219232205.23D3.27F6AC2D@kcn.ne.jp> <5304F840.9010908@cs.ucla.edu> From: Jim Meyering Date: Wed, 19 Feb 2014 11:51:27 -0800 X-Google-Sender-Auth: McZeGubGkHPg35f76tHvZMt6jME Message-ID: Subject: Re: bug#16232: [PATCH] grep: make --ignore-case (-i) faster (sometimes 10x) in multibyte locales To: Paul Eggert Content-Type: text/plain; charset=ISO-8859-1 X-Spam-Score: -0.7 (/) X-Debbugs-Envelope-To: 16232 Cc: 
16232 <16232@debbugs.gnu.org>, Padraig Brady , Norihiro Tanaka X-BeenThere: debbugs-submit@debbugs.gnu.org X-Mailman-Version: 2.1.15 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: debbugs-submit-bounces@debbugs.gnu.org Sender: "Debbugs-submit" X-Spam-Score: -0.7 (/) On Wed, Feb 19, 2014 at 10:30 AM, Paul Eggert wrote: > While we're on the topic, can someone please say why that patch was done as > a preprocessor for the regex code? I normally would think that the way to > speed up regex processing is to improve dfa.c and/or the regex library, as > that should speed up all programs that use these modules, not just grep. Hi Paul, My patch was a first crack at dealing with an inefficiency that was grep-specific, so it made sense to apply as a grep-specific preprocessing phase. Norihiro found an additional improvement, that, it so happens, would have had an even bigger impact if my patch had not been applied first. From debbugs-submit-bounces@debbugs.gnu.org Wed Feb 19 14:56:03 2014 Received: (at submit) by debbugs.gnu.org; 19 Feb 2014 19:56:03 +0000 Received: from localhost ([127.0.0.1]:60643 helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.80) (envelope-from ) id 1WGDFT-0006zG-4R for submit@debbugs.gnu.org; Wed, 19 Feb 2014 14:56:03 -0500 Received: from eggs.gnu.org ([208.118.235.92]:43667) by debbugs.gnu.org with esmtp (Exim 4.80) (envelope-from ) id 1WGDFQ-0006yj-Oe for submit@debbugs.gnu.org; Wed, 19 Feb 2014 14:56:01 -0500 Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71) (envelope-from ) id 1WGDFC-00030W-5k for submit@debbugs.gnu.org; Wed, 19 Feb 2014 14:55:55 -0500 X-Spam-Checker-Version: SpamAssassin 3.3.2 (2011-06-06) on eggs.gnu.org X-Spam-Level: X-Spam-Status: No, score=0.8 required=5.0 tests=BAYES_50 autolearn=disabled version=3.3.2 Received: from lists.gnu.org ([2001:4830:134:3::11]:45684) by eggs.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 
1WGDFC-00030S-2e for submit@debbugs.gnu.org; Wed, 19 Feb 2014 14:55:46 -0500 Received: from eggs.gnu.org ([2001:4830:134:3::10]:45299) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1WGDF4-0003cO-P8 for bug-grep@gnu.org; Wed, 19 Feb 2014 14:55:46 -0500 Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71) (envelope-from ) id 1WGDEx-0002tn-3F for bug-grep@gnu.org; Wed, 19 Feb 2014 14:55:38 -0500 Received: from smtp.cs.ucla.edu ([131.179.128.62]:50434) by eggs.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1WGDEw-0002tL-O8 for bug-grep@gnu.org; Wed, 19 Feb 2014 14:55:30 -0500 Received: from localhost (localhost.localdomain [127.0.0.1]) by smtp.cs.ucla.edu (Postfix) with ESMTP id 99FDCA60021 for ; Wed, 19 Feb 2014 11:55:29 -0800 (PST) X-Virus-Scanned: amavisd-new at smtp.cs.ucla.edu Received: from smtp.cs.ucla.edu ([127.0.0.1]) by localhost (smtp.cs.ucla.edu [127.0.0.1]) (amavisd-new, port 10024) with ESMTP id D4g+65Pyi1PF for ; Wed, 19 Feb 2014 11:55:29 -0800 (PST) Received: from penguin.cs.ucla.edu (Penguin.CS.UCLA.EDU [131.179.64.200]) by smtp.cs.ucla.edu (Postfix) with ESMTPSA id 41EA4A60008 for ; Wed, 19 Feb 2014 11:55:29 -0800 (PST) Message-ID: <53050C31.8000606@cs.ucla.edu> Date: Wed, 19 Feb 2014 11:55:29 -0800 From: Paul Eggert Organization: UCLA Computer Science Department User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:24.0) Gecko/20100101 Thunderbird/24.3.0 MIME-Version: 1.0 To: bug-grep@gnu.org Subject: Re: bug#16232: [PATCH] grep: make --ignore-case (-i) faster (sometimes 10x) in multibyte locales References: <52D290F7.6050002@draigBrady.com> <20140219232205.23D3.27F6AC2D@kcn.ne.jp> In-Reply-To: Content-Type: text/plain; charset=ISO-8859-1; format=flowed Content-Transfer-Encoding: 7bit X-detected-operating-system: by eggs.gnu.org: GNU/Linux 2.6.x X-detected-operating-system: by eggs.gnu.org: Error: Malformed IPv6 address (bad octet value). 
X-Received-From: 2001:4830:134:3::11 X-Spam-Score: -4.0 (----) X-Debbugs-Envelope-To: submit X-BeenThere: debbugs-submit@debbugs.gnu.org X-Mailman-Version: 2.1.15 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: debbugs-submit-bounces@debbugs.gnu.org Sender: "Debbugs-submit" X-Spam-Score: -4.0 (----) On 02/19/2014 11:17 AM, Jim Meyering wrote: > I haven't investigated the "why" yet, but expect that I will make > grep-2.18 with just this one performance-improving patch. Can you make room for one more patch? I have a test case and bugfix for grep's mishandling of regular expressions like [^^-~] in unibyte locales like the C locale on most platforms; this is due to the range-expression bug that Arnold reported recently. I can send out the patch later today if you like. The bug has been present since grep 2.8, so I can understand if you'd rather wait to fix it until later. From debbugs-submit-bounces@debbugs.gnu.org Wed Feb 19 15:01:08 2014 Received: (at 16232) by debbugs.gnu.org; 19 Feb 2014 20:01:08 +0000 Received: from localhost ([127.0.0.1]:60651 helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.80) (envelope-from ) id 1WGDKN-00079I-Bl for submit@debbugs.gnu.org; Wed, 19 Feb 2014 15:01:07 -0500 Received: from mail-pa0-f44.google.com ([209.85.220.44]:64251) by debbugs.gnu.org with esmtp (Exim 4.80) (envelope-from ) id 1WGDKL-00078l-9O for 16232@debbugs.gnu.org; Wed, 19 Feb 2014 15:01:05 -0500 Received: by mail-pa0-f44.google.com with SMTP id kq14so855493pab.3 for <16232@debbugs.gnu.org>; Wed, 19 Feb 2014 12:00:59 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113; h=mime-version:sender:in-reply-to:references:from:date:message-id :subject:to:cc:content-type; bh=g9IXDZ1/jo+XBkh9g/CBpHrVKUX5QNorCQDtobv9aCk=; b=f4VLNFnLrJ6k1e3ELfcndo4EiwAX0JnCDXogxeUMFJHWiLhP+I2qIscDUGJhl3C5CF cJwrh3T7NZR5fjzr1yrLl/ivrXVhtlWtSS8g92kKmMRUypNaG1290keDvUCc+KnctUaQ 
AwlMthehEYa0HNAzaNz5xxSEIoUAX2mHpx+4gDKzCoSUahIxAguTbrVwtNbtfwvEO+wI QPQdXVoP152dfeoKbzV1KUwUBaE74T5ZHDWQYBg9yQLdRO3Gx1X+0l2bDKtX22KfSLny niiatsf+TBeaMJj9A5FX86zcfXWdX/L/gKd4a+nc5v/6cp/d6TK984k8fQ0H5EBkyUlj NsHw== X-Received: by 10.66.164.104 with SMTP id yp8mr16677738pab.25.1392840059028; Wed, 19 Feb 2014 12:00:59 -0800 (PST) MIME-Version: 1.0 Received: by 10.68.201.231 with HTTP; Wed, 19 Feb 2014 12:00:38 -0800 (PST) In-Reply-To: <53050C31.8000606@cs.ucla.edu> References: <52D290F7.6050002@draigBrady.com> <20140219232205.23D3.27F6AC2D@kcn.ne.jp> <53050C31.8000606@cs.ucla.edu> From: Jim Meyering Date: Wed, 19 Feb 2014 12:00:38 -0800 X-Google-Sender-Auth: 3UyNnH9kf71k8Zreuc4bJmdTPRc Message-ID: Subject: Re: bug#16232: [PATCH] grep: make --ignore-case (-i) faster (sometimes 10x) in multibyte locales To: Paul Eggert Content-Type: text/plain; charset=ISO-8859-1 X-Spam-Score: -0.7 (/) X-Debbugs-Envelope-To: 16232 Cc: 16232 <16232@debbugs.gnu.org> X-BeenThere: debbugs-submit@debbugs.gnu.org X-Mailman-Version: 2.1.15 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: debbugs-submit-bounces@debbugs.gnu.org Sender: "Debbugs-submit" X-Spam-Score: -0.7 (/) On Wed, Feb 19, 2014 at 11:55 AM, Paul Eggert wrote: > On 02/19/2014 11:17 AM, Jim Meyering wrote: >> >> I haven't investigated the "why" yet, but expect that I will make >> grep-2.18 with just this one performance-improving patch. > > Can you make room for one more patch? I have a test case and bugfix for > grep's mishandling of regular expressions like [^^-~] in unibyte locales > like the C locale on most platforms; this is due to the range-expression bug > that Arnold reported recently. I can send out the patch later today if you > like. The bug has been present since grep 2.8, so I can understand if you'd > rather wait to fix it until later. Hi Paul, I'll be very happy to consider it. 
From debbugs-submit-bounces@debbugs.gnu.org Wed Feb 19 22:45:30 2014 Received: (at 16232) by debbugs.gnu.org; 20 Feb 2014 03:45:30 +0000 Received: from localhost ([127.0.0.1]:32806 helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.80) (envelope-from ) id 1WGKZl-0003TY-I2 for submit@debbugs.gnu.org; Wed, 19 Feb 2014 22:45:30 -0500 Received: from mail-pa0-f49.google.com ([209.85.220.49]:41578) by debbugs.gnu.org with esmtp (Exim 4.80) (envelope-from ) id 1WGKZi-0003TF-6l for 16232@debbugs.gnu.org; Wed, 19 Feb 2014 22:45:27 -0500 Received: by mail-pa0-f49.google.com with SMTP id hz1so1319419pad.36 for <16232@debbugs.gnu.org>; Wed, 19 Feb 2014 19:45:20 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113; h=mime-version:sender:in-reply-to:references:from:date:message-id :subject:to:cc:content-type; bh=r/KbOJm16KKS362EoJDzViTLuEVwYKHZDO5FWKeo3pE=; b=ws66lY9cEzd7mfmQButsY2Rbp6p6bal2KM5pdZwR2r3GwcydQe9IpajSebH39nP+Wf 3slmQ5AzHyxmhV1rIipbe9Lwp4DE8I+HGmvzVjMzAlwLdWf7tycjlGVGQ2Pmi+wGMaYl yncVl3cAYRV1mLSW+/tTAxbUUy8mM4f253qwlyOAuUXsYnJ4VNCad7HG6tdr8vhmdWE/ hEP0frbZXPS7FqsK+Lg1UHUjmblyiNVt7eCm9H3ukfs0GkvgKZU8y9Z4VTpgSPuc6mQE Wl9CaRizwkN5142RrEdFeibcE054LIfYnSd+9qbVghHv1MCjvGjHBrbARFc8091CJpZ9 bZ2Q== X-Received: by 10.66.164.104 with SMTP id yp8mr18714732pab.25.1392867920051; Wed, 19 Feb 2014 19:45:20 -0800 (PST) MIME-Version: 1.0 Received: by 10.68.201.231 with HTTP; Wed, 19 Feb 2014 19:44:59 -0800 (PST) In-Reply-To: References: <52D290F7.6050002@draigBrady.com> <20140219232205.23D3.27F6AC2D@kcn.ne.jp> From: Jim Meyering Date: Wed, 19 Feb 2014 19:44:59 -0800 X-Google-Sender-Auth: 6Xi7nh1DEYjW5-qCLDISLlzZ7vI Message-ID: Subject: Re: bug#16232: [PATCH] grep: make --ignore-case (-i) faster (sometimes 10x) in multibyte locales To: Norihiro Tanaka Content-Type: multipart/mixed; boundary=047d7b86f760982a4004f2ce5567 X-Spam-Score: -0.7 (/) X-Debbugs-Envelope-To: 16232 Cc: 16232 <16232@debbugs.gnu.org>, Padraig Brady X-BeenThere: 
debbugs-submit@debbugs.gnu.org X-Mailman-Version: 2.1.15 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: debbugs-submit-bounces@debbugs.gnu.org Sender: "Debbugs-submit" X-Spam-Score: -0.7 (/) --047d7b86f760982a4004f2ce5567 Content-Type: text/plain; charset=ISO-8859-1 Hmm... it's not as clear-cut as I first thought. (I built 2.17+ the above patch and put it in a directory named grep-2.18) The following times 2.16, 2.17 and 2.17+patch two ways: $ yes jjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjj | head -10000000 > k $ for i in 16 17 18; do echo $i; env LC_ALL=en_US.UTF-8 time /p/p/grep-2.$i/bin/grep -i foobar k; done 16 15.96 real 14.57 user 0.12 sys 17 1.13 real 1.07 user 0.06 sys 18 1.96 real 1.89 user 0.06 sys The above search takes more than 70% longer with the proposed patch. Contrast that with performance in the non-UTF8 ja_JP.eucJP locale: $ yes $(printf '%078dm' 0)|head -10000 > in $ for i in 16 17 18; do echo $i; env LC_ALL=ja_JP.eucJP time /p/p/grep-2.$i/bin/grep -i n in; done 16 0.03 real 0.02 user 0.00 sys 17 2.98 real 2.96 user 0.00 sys 18 0.02 real 0.02 user 0.00 sys Using the jjj+foobar example, but with only 100k lines, we see there was a 200x performance regression going from grep-2.16 to 2.17: $ yes jjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjj | head -100000 > k $ for i in 16 17 18; do echo $i; env LC_ALL=ja_JP.eucJP time /p/p/grep-2.$i/bin/grep -i foobar k; done 16 0.15 real 0.14 user 0.00 sys 17 27.74 real 27.72 user 0.01 sys 18 0.11 real 0.11 user 0.00 sys Obviously, I want to retain all of 2.17's performance gain in UTF-8 locales, while avoiding the 200x penalty in multi-byte non-UTF8 locales like ja_JP.eucJP. So I have prepared a better patch. 
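The locale gate discussed in this message (keep the -i rewrite for UTF-8, skip it for other multibyte encodings such as ja_JP.eucJP) can be sketched in C as below. This is only an illustration under stated assumptions: the names codeset_is_utf8 and current_locale_is_utf8 are hypothetical and are not grep's using_utf8; the sketch assumes a POSIX system that provides nl_langinfo and assumes the codeset is spelled exactly "UTF-8".

```c
#include <langinfo.h>
#include <string.h>

/* Hypothetical helper: report whether a codeset name denotes UTF-8.
   Assumes the platform spells the codeset exactly "UTF-8".  */
static int
codeset_is_utf8 (char const *codeset)
{
  return codeset != NULL && strcmp (codeset, "UTF-8") == 0;
}

/* Hypothetical wrapper: query the current locale's codeset once and
   cache the answer, so repeated checks stay cheap.  The caller is
   assumed to have called setlocale (LC_ALL, "") beforehand; without
   that call the program runs in the "C" locale.  */
static int
current_locale_is_utf8 (void)
{
  static int cached = -1;
  if (cached == -1)
    cached = codeset_is_utf8 (nl_langinfo (CODESET));
  return cached;
}
```

A caller would test current_locale_is_utf8 () before attempting the pattern rewrite and fall back to the ordinary -i path when it returns 0.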
With the two attached commits (on top of 2.17), I get these timings, i.e., the same 200x improvement with ja_JP.eucJP, and no regression with en_US.UTF8) $ for i in 16 17 18; do printf "$i: "; env LC_ALL=ja_JP.eucJP time /p/p/grep-2.$i/bin/grep -i foobar k; done 16: 0.14 real 0.14 user 0.00 sys 17: 27.97 real 27.95 user 0.01 sys 18: 0.12 real 0.12 user 0.00 sys $ for i in 16 17 18; do printf "$i: "; env LC_ALL=en_US.UTF-8 time /p/p/grep-2.$i/bin/grep -i foobar k; done 16: 0.13 real 0.12 user 0.00 sys 17: 0.01 real 0.01 user 0.00 sys 18: 0.01 real 0.01 user 0.00 sys --047d7b86f760982a4004f2ce5567 Content-Type: text/plain; charset=US-ASCII; name="k.txt" Content-Disposition: attachment; filename="k.txt" Content-Transfer-Encoding: base64 X-Attachment-Id: f_hrvhkaty1 RnJvbSAwN2E0ZjY5ZGE3MDFhYmZkZWUwNDdmMjZjNjAzMDAyYzIwZDRjN2Q0IE1vbiBTZXAgMTcg MDA6MDA6MDAgMjAwMQpGcm9tOiBKaW0gTWV5ZXJpbmcgPG1leWVyaW5nQGZiLmNvbT4KRGF0ZTog V2VkLCAxOSBGZWIgMjAxNCAxOToyMjoyNCAtMDgwMApTdWJqZWN0OiBbUEFUQ0ggMS8yXSBtYWlu dDogZmFjdG9yIG91dCB1c2luZ191dGY4IGZ1bmN0aW9uIGZvciB1c2UgaW4gbWFpbi5jCgoqIHNy Yy9zZWFyY2h1dGlscy5jIChpc19tYl9taWRkbGUpOiBVc2UgdXNpbmdfdXRmOCByYXRoZXIgdGhh bgpyb2xsaW5nIG91ciBvd24uCih1c2luZ191dGY4KTogTmV3IGZ1bmN0aW9uIChjb3B5IG9mIHRo ZSBvbmUgaW4gZGZhLmMpLgoqIHNyYy9zZWFyY2guaCAodXNpbmdfdXRmOCk6IERlY2xhcmUgaXQu Ci0tLQogc3JjL3NlYXJjaC5oICAgICAgfCAgMiArKwogc3JjL3NlYXJjaHV0aWxzLmMgfCAyNiAr KysrKysrKysrKysrKysrKysrLS0tLS0tLQogMiBmaWxlcyBjaGFuZ2VkLCAyMSBpbnNlcnRpb25z KCspLCA3IGRlbGV0aW9ucygtKQoKZGlmZiAtLWdpdCBhL3NyYy9zZWFyY2guaCBiL3NyYy9zZWFy Y2guaAppbmRleCAxMmQwODIyLi4xNjdlMGU3IDEwMDY0NAotLS0gYS9zcmMvc2VhcmNoLmgKKysr IGIvc3JjL3NlYXJjaC5oCkBAIC04MCw0ICs4MCw2IEBAIG1iX2Nhc2VfbWFwX2FwcGx5IChtYl9s ZW5fbWFwX3QgY29uc3QgKm1hcCwgc2l6ZV90ICpvZmYsIHNpemVfdCAqbGVuKQogICAgIH0KIH0K CitpbnQgdXNpbmdfdXRmOCAodm9pZCk7CisKICNlbmRpZiAvKiBHUkVQX1NFQVJDSF9IICovCmRp ZmYgLS1naXQgYS9zcmMvc2VhcmNodXRpbHMuYyBiL3NyYy9zZWFyY2h1dGlscy5jCmluZGV4IDM0 
Nzg0MTcuLjUxYmJhNTkgMTAwNjQ0Ci0tLSBhL3NyYy9zZWFyY2h1dGlscy5jCisrKyBiL3NyYy9z ZWFyY2h1dGlscy5jCkBAIC0yMzQsMTMgKzIzNCw4IEBAIGlzX21iX21pZGRsZSAoY29uc3QgY2hh ciAqKmdvb2QsIGNvbnN0IGNoYXIgKmJ1ZiwgY29uc3QgY2hhciAqZW5kLAogICBjb25zdCBjaGFy ICpwID0gKmdvb2Q7CiAgIGNvbnN0IGNoYXIgKnByZXYgPSBwOwogICBtYnN0YXRlX3QgY3VyX3N0 YXRlOwotI2lmIEhBVkVfTEFOR0lORk9fQ09ERVNFVAotICBzdGF0aWMgaW50IGlzX3V0ZjggPSAt MTsKLQotICBpZiAoaXNfdXRmOCA9PSAtMSkKLSAgICBpc191dGY4ID0gU1RSRVEgKG5sX2xhbmdp bmZvIChDT0RFU0VUKSwgIlVURi04Iik7CgotICBpZiAoaXNfdXRmOCAmJiBidWYgLSBwID4gTUJf Q1VSX01BWCkKKyAgaWYgKHVzaW5nX3V0ZjggKCkgJiYgYnVmIC0gcCA+IE1CX0NVUl9NQVgpCiAg ICAgewogICAgICAgZm9yIChwID0gYnVmOyBidWYgLSBwID4gTUJfQ1VSX01BWDsgcC0tKQogICAg ICAgICBpZiAobWJjbGVuX2NhY2hlW3RvX3VjaGFyICgqcCldICE9IChzaXplX3QpIC0xKQpAQCAt MjQ5LDcgKzI0NCw2IEBAIGlzX21iX21pZGRsZSAoY29uc3QgY2hhciAqKmdvb2QsIGNvbnN0IGNo YXIgKmJ1ZiwgY29uc3QgY2hhciAqZW5kLAogICAgICAgaWYgKGJ1ZiAtIHAgPT0gTUJfQ1VSX01B WCkKICAgICAgICAgcCA9IGJ1ZjsKICAgICB9Ci0jZW5kaWYKCiAgIG1lbXNldCAoJmN1cl9zdGF0 ZSwgMCwgc2l6ZW9mIGN1cl9zdGF0ZSk7CgpAQCAtMjgzLDMgKzI3NywyMSBAQCBpc19tYl9taWRk bGUgKGNvbnN0IGNoYXIgKipnb29kLCBjb25zdCBjaGFyICpidWYsIGNvbnN0IGNoYXIgKmVuZCwK ICAgcmV0dXJuIDAgPCBtYXRjaF9sZW4gJiYgbWF0Y2hfbGVuIDwgbWJybGVuIChwLCBlbmQgLSBw LCAmY3VyX3N0YXRlKTsKIH0KICNlbmRpZiAvKiBNQlNfU1VQUE9SVCAqLworCisvKiBVVEYtOCBl bmNvZGluZyBhbGxvd3Mgc29tZSBvcHRpbWl6YXRpb25zIHRoYXQgd2UgY2FuJ3Qgb3RoZXJ3aXNl CisgICBhc3N1bWUgaW4gYSBtdWx0aWJ5dGUgZW5jb2RpbmcuICAqLworaW50Cit1c2luZ191dGY4 ICh2b2lkKQoreworICBzdGF0aWMgaW50IHV0ZjggPSAtMTsKKyAgaWYgKHV0ZjggPT0gLTEpCisg ICAgeworI2lmIGRlZmluZWQgSEFWRV9MQU5HSU5GT19DT0RFU0VUICYmIE1CU19TVVBQT1JUCisg ICAgICB1dGY4ID0gKFNUUkVRIChubF9sYW5naW5mbyAoQ09ERVNFVCksICJVVEYtOCIpKTsKKyNl bHNlCisgICAgICB1dGY4ID0gMDsKKyNlbmRpZgorICAgIH0KKworICByZXR1cm4gdXRmODsKK30K LS0gCjEuOS4wCgoKRnJvbSAyNGMxYzMxM2U0YmZhMTdhYjEyNzVhNWQ3MGUwY2MxOGM0YWExYjM1 IE1vbiBTZXAgMTcgMDA6MDA6MDAgMjAwMQpGcm9tOiBKaW0gTWV5ZXJpbmcgPG1leWVyaW5nQGZi 
LmNvbT4KRGF0ZTogV2VkLCAxOSBGZWIgMjAxNCAxOTozMTo0MyAtMDgwMApTdWJqZWN0OiBbUEFU Q0ggMi8yXSBncmVwIC1pOiBhdm9pZCAyMDB4IHBlcmYuIHJlZ3Jlc3Npb24gaW4gbXVsdGlieXRl CiBub24tVVRGOCBsb2NhbGVzCgoqIHNyYy9tYWluLmMgKHRyaXZpYWxfY2FzZV9pZ25vcmUpOiBQ ZXJmb3JtIHRoaXMgb3B0aW1pemF0aW9uCm9ubHkgZm9yIFVURjggbG9jYWxlcy4gIFRoaXMgcmVj dGlmaWVzIGEgMjAweCBwZXJmb3JtYW5jZQpyZWdyZXNzaW9uIGluIG11bHRpLWJ5dGUgbm9uLVVU RjggbG9jYWxlcyBsaWtlIGphX0pQLmV1Y0pQLgpSZXBvcnRlZCBieSBOb3JpaGlybyBUYW5ha2Eg aW4gaHR0cDovL2RlYmJ1Z3MuZ251Lm9yZy8xNjIzMiM1MAoqIE5FV1MgKEJ1ZyBmaXhlcyk6IE1l bnRpb24gaXQuCi0tLQogTkVXUyAgICAgICB8IDUgKysrKysKIHNyYy9tYWluLmMgfCAzICsrKwog MiBmaWxlcyBjaGFuZ2VkLCA4IGluc2VydGlvbnMoKykKCmRpZmYgLS1naXQgYS9ORVdTIGIvTkVX UwppbmRleCA2Nzg1YTk2Li40OWExN2IwIDEwMDY0NAotLS0gYS9ORVdTCisrKyBiL05FV1MKQEAg LTIsNiArMiwxMSBAQCBHTlUgZ3JlcCBORVdTICAgICAgICAgICAgICAgICAgICAgICAgICAgICAg ICAgICAgLSotIG91dGxpbmUgLSotCgogKiBOb3Rld29ydGh5IGNoYW5nZXMgaW4gcmVsZWFzZSA/ Lj8gKD8/Pz8tPz8tPz8pIFs/XQoKKyoqIEJ1ZyBmaXhlcworCisgIGdyZXAgLWkgaW4gYSBtdWx0 aWJ5dGUsIG5vbi1VVEY4IGxvY2FsZSBjb3VsZCBiZSB1cCB0byAyMDAgdGltZXMgc2xvd2VyCisg IHRoYW4gaW4gMi4xNi4gIFtidWcgaW50cm9kdWNlZCBpbiBncmVwLTIuMTddCisKCiAqIE5vdGV3 b3J0aHkgY2hhbmdlcyBpbiByZWxlYXNlIDIuMTcgKDIwMTQtMDItMTcpIFtzdGFibGVdCgpkaWZm IC0tZ2l0IGEvc3JjL21haW4uYyBiL3NyYy9tYWluLmMKaW5kZXggYmQyMDI5Ny4uZDYzYzEzMyAx MDA2NDQKLS0tIGEvc3JjL21haW4uYworKysgYi9zcmMvbWFpbi5jCkBAIC0xODg4LDYgKzE4ODgs OSBAQCB0cml2aWFsX2Nhc2VfaWdub3JlIChzaXplX3QgbGVuLCBjaGFyIGNvbnN0ICprZXlzLAog ICBpZiAobWVtY2hyIChrZXlzLCAnXFwnLCBsZW4pIHx8IG1lbWNociAoa2V5cywgJ1snLCBsZW4p KQogICAgIHJldHVybiBmYWxzZTsKCisgIGlmICggISB1c2luZ191dGY4ICgpKQorICAgIHJldHVy biBmYWxzZTsKKwogICAvKiBXb3JzdCBjYXNlIGlzIHRoYXQgZWFjaCBieXRlIEIgb2YgS0VZUyBp cyBBU0NJSSBhbHBoYWJldGljIGFuZCBlYWNoCiAgICAgIG90aGVyX2Nhc2UoQikgY2hhcmFjdGVy LCBDLCBvY2N1cGllcyBNQl9DVVJfTUFYIGJ5dGVzLCBzbyBlYWNoIEIKICAgICAgbWFwcyB0byBb QkNdLCB3aGljaCByZXF1aXJlcyBNQl9DVVJfTUFYICsgMyBieXRlcy4gICAqLwotLSAKMS45LjAK Cg== --047d7b86f760982a4004f2ce5567-- From 
debbugs-submit-bounces@debbugs.gnu.org Thu Feb 20 08:39:40 2014 Received: (at 16232) by debbugs.gnu.org; 20 Feb 2014 13:39:40 +0000 Received: from localhost ([127.0.0.1]:33176 helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.80) (envelope-from ) id 1WGTqm-000482-3Z for submit@debbugs.gnu.org; Thu, 20 Feb 2014 08:39:40 -0500 Received: from pbsg501.nifty.com ([202.248.238.71]:52955) by debbugs.gnu.org with esmtp (Exim 4.80) (envelope-from ) id 1WGTqh-00047q-K6 for 16232@debbugs.gnu.org; Thu, 20 Feb 2014 08:39:36 -0500 Received: from [10.120.1.54] (i118-21-128-66.s30.a048.ap.plala.or.jp [118.21.128.66]) (authenticated) by pbsg501.nifty.com with ESMTP id s1KDdI4t020601; Thu, 20 Feb 2014 22:39:18 +0900 X-Nifty-SrcIP: [118.21.128.66] Date: Thu, 20 Feb 2014 22:39:18 +0900 From: Norihiro TANAKA To: Jim Meyering Subject: Re: bug#16232: [PATCH] grep: make --ignore-case (-i) faster (sometimes 10x) in multibyte locales In-Reply-To: References: Message-Id: <20140220223917.6ECE.27F6AC2D@kcn.ne.jp> MIME-Version: 1.0 Content-Type: text/plain; charset="US-ASCII" Content-Transfer-Encoding: 7bit X-Mailer: Becky! ver. 2.64.06 [ja] X-Spam-Score: -0.6 (/) X-Debbugs-Envelope-To: 16232 Cc: 16232 <16232@debbugs.gnu.org>, Padraig Brady X-BeenThere: debbugs-submit@debbugs.gnu.org X-Mailman-Version: 2.1.15 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: debbugs-submit-bounces@debbugs.gnu.org Sender: "Debbugs-submit" X-Spam-Score: -0.6 (/) Hi Jim, Your patch is probably right. However, I think that the true cause of the 100x slowdown is that the DFA engine is slower than the regex engine for case-insensitive matching in a non-UTF-8 locale. In a multibyte locale, grep prefers the DFA engine for a case-insensitive "a", but prefers the regex engine for the character class "[Aa]". 
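For readers following the "a" versus "[Aa]" distinction, the kind of pattern rewrite under discussion can be sketched in C as below. This is a simplified, ASCII-only illustration and not grep's trivial_case_ignore: the function name expand_ascii_case is hypothetical, multibyte characters are rejected rather than converted, and the refusal to handle patterns containing a backslash or '[' mirrors the restriction stated at the start of this thread.

```c
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper, not grep's trivial_case_ignore: rewrite an ASCII
   pattern so that each letter C becomes the bracket expression [Cc],
   e.g. "foo" becomes "[fF][oO][oO]".  Returns a malloc'd NUL-terminated
   string that the caller must free, or NULL when the pattern contains a
   backslash, a '[', a non-ASCII byte, or when allocation fails.  */
static char *
expand_ascii_case (char const *pattern)
{
  size_t len = strlen (pattern);

  /* Skip patterns with constructs this simple rewrite cannot handle.  */
  if (memchr (pattern, '\\', len) || memchr (pattern, '[', len))
    return NULL;

  /* Worst case: every byte is a letter and grows to the 4 bytes "[Cc]".  */
  char *out = malloc (4 * len + 1);
  if (!out)
    return NULL;

  char *p = out;
  for (size_t i = 0; i < len; i++)
    {
      unsigned char c = pattern[i];
      if (c > 127)
        {
          /* Non-ASCII input: give up instead of guessing a case mapping.  */
          free (out);
          return NULL;
        }
      if (isalpha (c))
        {
          *p++ = '[';
          *p++ = c;
          *p++ = isupper (c) ? tolower (c) : toupper (c);
          *p++ = ']';
        }
      else
        *p++ = c;
    }
  *p = '\0';
  return out;
}
```

With the rewritten pattern, the search can run without the case-insensitive flag, which is why the choice of matching engine for "a" with -i and for "[Aa]" can differ.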
Norihiro From debbugs-submit-bounces@debbugs.gnu.org Thu Feb 20 12:13:48 2014 Received: (at 16232) by debbugs.gnu.org; 20 Feb 2014 17:13:48 +0000 Received: from localhost ([127.0.0.1]:33846 helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.80) (envelope-from ) id 1WGXBz-0001yg-HZ for submit@debbugs.gnu.org; Thu, 20 Feb 2014 12:13:47 -0500 Received: from mail-pb0-f46.google.com ([209.85.160.46]:35175) by debbugs.gnu.org with esmtp (Exim 4.80) (envelope-from ) id 1WGXBw-0001yO-Rq for 16232@debbugs.gnu.org; Thu, 20 Feb 2014 12:13:45 -0500 Received: by mail-pb0-f46.google.com with SMTP id um1so2157142pbc.19 for <16232@debbugs.gnu.org>; Thu, 20 Feb 2014 09:13:38 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113; h=mime-version:sender:in-reply-to:references:from:date:message-id :subject:to:cc:content-type; bh=OzeY4hEOMmalHtqauhx3xgxD6MnVQcoqRQAY3r/gDf8=; b=IEnl0k7UVBVYhGL+7qkcUl6a+2BWmo4OWF3TA7QxHufgbK8tI3AO017VUSQsNQVXeK qdbAu1Fmve0bZERzZWi3AMCqYYCeOk6A51s5+JmjYzL3edf2mUZG3SH9S8R6V/cuW4fP T/9rggAO835M8QZva3PPhgLT1KFYbhIY3b0R64h1ZW55HA6UehbaVodfRXfPElK2RtlG aeCxWt2c8zUmtMq4OhKnwgdcG7lcE0t/YDOlxMAJQ4Qk3ARd0/LXBCRXbJ32tjwV8DXj /gw/TYW4Yh9W66GQn4UsMruyoE+qnsiXTr4GMm9UJa5HcqtDUxYgo/70+Jrnl+TSvi+b sG6w== X-Received: by 10.66.192.74 with SMTP id he10mr3387037pac.126.1392916418825; Thu, 20 Feb 2014 09:13:38 -0800 (PST) MIME-Version: 1.0 Received: by 10.68.201.231 with HTTP; Thu, 20 Feb 2014 09:13:18 -0800 (PST) In-Reply-To: References: <52D290F7.6050002@draigBrady.com> <20140219232205.23D3.27F6AC2D@kcn.ne.jp> <53050C31.8000606@cs.ucla.edu> From: Jim Meyering Date: Thu, 20 Feb 2014 09:13:18 -0800 X-Google-Sender-Auth: IY2RfAtV7W0neHx4qG09EEOUHN8 Message-ID: Subject: Re: bug#16232: [PATCH] grep: make --ignore-case (-i) faster (sometimes 10x) in multibyte locales To: Paul Eggert Content-Type: multipart/mixed; boundary=047d7bdc9ebc58a20e04f2d9a054 X-Spam-Score: -0.7 (/) X-Debbugs-Envelope-To: 16232 Cc: 16232 <16232@debbugs.gnu.org> 
X-BeenThere: debbugs-submit@debbugs.gnu.org X-Mailman-Version: 2.1.15 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: debbugs-submit-bounces@debbugs.gnu.org Sender: "Debbugs-submit" X-Spam-Score: -0.7 (/) --047d7bdc9ebc58a20e04f2d9a054 Content-Type: text/plain; charset=ISO-8859-1 Hi Paul, In case your bug fix looks safe/small, and is ready, ... I'm hoping to release 2.18 today, with the attached commits. Changes since yesterday: comment/log tweaks, and I've hoisted the using_utf8 test in trivial_case_ignore to precede the two memchr tests. --047d7bdc9ebc58a20e04f2d9a054 Content-Type: text/plain; charset=US-ASCII; name="k.txt" Content-Disposition: attachment; filename="k.txt" Content-Transfer-Encoding: base64 X-Attachment-Id: f_hrwaedmj0 RnJvbSAwN2E0ZjY5ZGE3MDFhYmZkZWUwNDdmMjZjNjAzMDAyYzIwZDRjN2Q0IE1vbiBTZXAgMTcg MDA6MDA6MDAgMjAwMQpGcm9tOiBKaW0gTWV5ZXJpbmcgPG1leWVyaW5nQGZiLmNvbT4KRGF0ZTog V2VkLCAxOSBGZWIgMjAxNCAxOToyMjoyNCAtMDgwMApTdWJqZWN0OiBbUEFUQ0ggMS8yXSBtYWlu dDogZmFjdG9yIG91dCB1c2luZ191dGY4IGZ1bmN0aW9uIGZvciB1c2UgaW4gbWFpbi5jCgoqIHNy Yy9zZWFyY2h1dGlscy5jIChpc19tYl9taWRkbGUpOiBVc2UgdXNpbmdfdXRmOCByYXRoZXIgdGhh bgpyb2xsaW5nIG91ciBvd24uCih1c2luZ191dGY4KTogTmV3IGZ1bmN0aW9uIChjb3B5IG9mIHRo ZSBvbmUgaW4gZGZhLmMpLgoqIHNyYy9zZWFyY2guaCAodXNpbmdfdXRmOCk6IERlY2xhcmUgaXQu Ci0tLQogc3JjL3NlYXJjaC5oICAgICAgfCAgMiArKwogc3JjL3NlYXJjaHV0aWxzLmMgfCAyNiAr KysrKysrKysrKysrKysrKysrLS0tLS0tLQogMiBmaWxlcyBjaGFuZ2VkLCAyMSBpbnNlcnRpb25z KCspLCA3IGRlbGV0aW9ucygtKQoKZGlmZiAtLWdpdCBhL3NyYy9zZWFyY2guaCBiL3NyYy9zZWFy Y2guaAppbmRleCAxMmQwODIyLi4xNjdlMGU3IDEwMDY0NAotLS0gYS9zcmMvc2VhcmNoLmgKKysr IGIvc3JjL3NlYXJjaC5oCkBAIC04MCw0ICs4MCw2IEBAIG1iX2Nhc2VfbWFwX2FwcGx5IChtYl9s ZW5fbWFwX3QgY29uc3QgKm1hcCwgc2l6ZV90ICpvZmYsIHNpemVfdCAqbGVuKQogICAgIH0KIH0K CitpbnQgdXNpbmdfdXRmOCAodm9pZCk7CisKICNlbmRpZiAvKiBHUkVQX1NFQVJDSF9IICovCmRp ZmYgLS1naXQgYS9zcmMvc2VhcmNodXRpbHMuYyBiL3NyYy9zZWFyY2h1dGlscy5jCmluZGV4IDM0 
Nzg0MTcuLjUxYmJhNTkgMTAwNjQ0Ci0tLSBhL3NyYy9zZWFyY2h1dGlscy5jCisrKyBiL3NyYy9z ZWFyY2h1dGlscy5jCkBAIC0yMzQsMTMgKzIzNCw4IEBAIGlzX21iX21pZGRsZSAoY29uc3QgY2hh ciAqKmdvb2QsIGNvbnN0IGNoYXIgKmJ1ZiwgY29uc3QgY2hhciAqZW5kLAogICBjb25zdCBjaGFy ICpwID0gKmdvb2Q7CiAgIGNvbnN0IGNoYXIgKnByZXYgPSBwOwogICBtYnN0YXRlX3QgY3VyX3N0 YXRlOwotI2lmIEhBVkVfTEFOR0lORk9fQ09ERVNFVAotICBzdGF0aWMgaW50IGlzX3V0ZjggPSAt MTsKLQotICBpZiAoaXNfdXRmOCA9PSAtMSkKLSAgICBpc191dGY4ID0gU1RSRVEgKG5sX2xhbmdp bmZvIChDT0RFU0VUKSwgIlVURi04Iik7CgotICBpZiAoaXNfdXRmOCAmJiBidWYgLSBwID4gTUJf Q1VSX01BWCkKKyAgaWYgKHVzaW5nX3V0ZjggKCkgJiYgYnVmIC0gcCA+IE1CX0NVUl9NQVgpCiAg ICAgewogICAgICAgZm9yIChwID0gYnVmOyBidWYgLSBwID4gTUJfQ1VSX01BWDsgcC0tKQogICAg ICAgICBpZiAobWJjbGVuX2NhY2hlW3RvX3VjaGFyICgqcCldICE9IChzaXplX3QpIC0xKQpAQCAt MjQ5LDcgKzI0NCw2IEBAIGlzX21iX21pZGRsZSAoY29uc3QgY2hhciAqKmdvb2QsIGNvbnN0IGNo YXIgKmJ1ZiwgY29uc3QgY2hhciAqZW5kLAogICAgICAgaWYgKGJ1ZiAtIHAgPT0gTUJfQ1VSX01B WCkKICAgICAgICAgcCA9IGJ1ZjsKICAgICB9Ci0jZW5kaWYKCiAgIG1lbXNldCAoJmN1cl9zdGF0 ZSwgMCwgc2l6ZW9mIGN1cl9zdGF0ZSk7CgpAQCAtMjgzLDMgKzI3NywyMSBAQCBpc19tYl9taWRk bGUgKGNvbnN0IGNoYXIgKipnb29kLCBjb25zdCBjaGFyICpidWYsIGNvbnN0IGNoYXIgKmVuZCwK ICAgcmV0dXJuIDAgPCBtYXRjaF9sZW4gJiYgbWF0Y2hfbGVuIDwgbWJybGVuIChwLCBlbmQgLSBw LCAmY3VyX3N0YXRlKTsKIH0KICNlbmRpZiAvKiBNQlNfU1VQUE9SVCAqLworCisvKiBVVEYtOCBl bmNvZGluZyBhbGxvd3Mgc29tZSBvcHRpbWl6YXRpb25zIHRoYXQgd2UgY2FuJ3Qgb3RoZXJ3aXNl CisgICBhc3N1bWUgaW4gYSBtdWx0aWJ5dGUgZW5jb2RpbmcuICAqLworaW50Cit1c2luZ191dGY4 ICh2b2lkKQoreworICBzdGF0aWMgaW50IHV0ZjggPSAtMTsKKyAgaWYgKHV0ZjggPT0gLTEpCisg ICAgeworI2lmIGRlZmluZWQgSEFWRV9MQU5HSU5GT19DT0RFU0VUICYmIE1CU19TVVBQT1JUCisg ICAgICB1dGY4ID0gKFNUUkVRIChubF9sYW5naW5mbyAoQ09ERVNFVCksICJVVEYtOCIpKTsKKyNl bHNlCisgICAgICB1dGY4ID0gMDsKKyNlbmRpZgorICAgIH0KKworICByZXR1cm4gdXRmODsKK30K LS0gCjEuOS4wCgoKRnJvbSA2MDUzYzM4OGQ0ZjU2ZmFlMmI2MzlmNTY2ZjJiZDBmOTgzMGYwMjc2 IE1vbiBTZXAgMTcgMDA6MDA6MDAgMjAwMQpGcm9tOiBKaW0gTWV5ZXJpbmcgPG1leWVyaW5nQGZi 
LmNvbT4KRGF0ZTogV2VkLCAxOSBGZWIgMjAxNCAxOTozMTo0MyAtMDgwMApTdWJqZWN0OiBbUEFU Q0ggMi8yXSBncmVwIC1pOiBhdm9pZCAyMDB4IHBlcmYuIHJlZ3Jlc3Npb24gaW4gbXVsdGlieXRl CiBub24tVVRGOCBsb2NhbGVzCgoqIHNyYy9tYWluLmMgKHRyaXZpYWxfY2FzZV9pZ25vcmUpOiBQ ZXJmb3JtIHRoaXMgb3B0aW1pemF0aW9uIG9ubHkKZm9yIFVURjggbG9jYWxlcy4gIFRoaXMgcmVj dGlmaWVzIGEgMjAweCBwZXJmb3JtYW5jZSByZWdyZXNzaW9uIGluCm11bHRpLWJ5dGUgbm9uLVVU RjggbG9jYWxlcyBsaWtlIGphX0pQLmV1Y0pQLiAgVGhlIHJlZ3Jlc3Npb24gd2FzCmludHJvZHVj ZWQgYnkgdGhlIDEweCBVVEY4L2dyZXAtaSBzcGVlZHVwLCBjb21taXQgdjIuMTYtNC1nOTczMThm NS4KUmVwb3J0ZWQgYnkgTm9yaWhpcm8gVGFuYWthIGluIGh0dHA6Ly9kZWJidWdzLmdudS5vcmcv MTYyMzIjNTAKKiBORVdTIChCdWcgZml4ZXMpOiBNZW50aW9uIGl0LgotLS0KIE5FV1MgICAgICAg fCA1ICsrKysrCiBzcmMvbWFpbi5jIHwgNSArKysrKwogMiBmaWxlcyBjaGFuZ2VkLCAxMCBpbnNl cnRpb25zKCspCgpkaWZmIC0tZ2l0IGEvTkVXUyBiL05FV1MKaW5kZXggNjc4NWE5Ni4uNDlhMTdi MCAxMDA2NDQKLS0tIGEvTkVXUworKysgYi9ORVdTCkBAIC0yLDYgKzIsMTEgQEAgR05VIGdyZXAg TkVXUyAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgIC0qLSBvdXRsaW5lIC0qLQoK ICogTm90ZXdvcnRoeSBjaGFuZ2VzIGluIHJlbGVhc2UgPy4/ICg/Pz8/LT8/LT8/KSBbP10KCisq KiBCdWcgZml4ZXMKKworICBncmVwIC1pIGluIGEgbXVsdGlieXRlLCBub24tVVRGOCBsb2NhbGUg Y291bGQgYmUgdXAgdG8gMjAwIHRpbWVzIHNsb3dlcgorICB0aGFuIGluIDIuMTYuICBbYnVnIGlu dHJvZHVjZWQgaW4gZ3JlcC0yLjE3XQorCgogKiBOb3Rld29ydGh5IGNoYW5nZXMgaW4gcmVsZWFz ZSAyLjE3ICgyMDE0LTAyLTE3KSBbc3RhYmxlXQoKZGlmZiAtLWdpdCBhL3NyYy9tYWluLmMgYi9z cmMvbWFpbi5jCmluZGV4IGJkMjAyOTcuLmNhN2M3YjMgMTAwNjQ0Ci0tLSBhL3NyYy9tYWluLmMK KysrIGIvc3JjL21haW4uYwpAQCAtMTg4Myw2ICsxODgzLDExIEBAIHN0YXRpYyBib29sCiB0cml2 aWFsX2Nhc2VfaWdub3JlIChzaXplX3QgbGVuLCBjaGFyIGNvbnN0ICprZXlzLAogICAgICAgICAg ICAgICAgICAgICAgc2l6ZV90ICpuZXdfbGVuLCBjaGFyICoqbmV3X2tleXMpCiB7CisgIC8qIFBl cmZvcm0gdGhpcyB0cmFuc2xhdGlvbiBvbmx5IGZvciBVVEYtOC4gIE90aGVyd2lzZSwgdGhpcyB3 b3VsZCBpbmR1Y2UKKyAgICAgYSAxMDAtMjAweCBwZXJmb3JtYW5jZSBwZW5hbHR5IGZvciBub24t VVRGOCBtdWx0aWJ5dGUgbG9jYWxlcy4gICovCisgIGlmICggISB1c2luZ191dGY4ICgpKQorICAg 
IHJldHVybiBmYWxzZTsKKwogICAvKiBGSVhNRTogY29uc2lkZXIgcmVtb3ZpbmcgdGhlIGZvbGxv d2luZyByZXN0cmljdGlvbjoKICAgICAgUmVqZWN0IGlmIEtFWVMgY29udGFpbiBBU0NJSSAnXFwn IG9yICdbJy4gICovCiAgIGlmIChtZW1jaHIgKGtleXMsICdcXCcsIGxlbikgfHwgbWVtY2hyIChr ZXlzLCAnWycsIGxlbikpCi0tIAoxLjkuMAoK --047d7bdc9ebc58a20e04f2d9a054-- From debbugs-submit-bounces@debbugs.gnu.org Thu Feb 20 12:24:41 2014 Received: (at 16232) by debbugs.gnu.org; 20 Feb 2014 17:24:41 +0000 Received: from localhost ([127.0.0.1]:33871 helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.80) (envelope-from ) id 1WGXMX-0002JR-0o for submit@debbugs.gnu.org; Thu, 20 Feb 2014 12:24:41 -0500 Received: from frenzy.freefriends.org ([66.54.153.139]:33639 helo=freefriends.org) by debbugs.gnu.org with esmtp (Exim 4.80) (envelope-from ) id 1WGXMT-0002JG-NW for 16232@debbugs.gnu.org; Thu, 20 Feb 2014 12:24:38 -0500 X-Envelope-From: arnold@skeeve.com Received: from freefriends.org (localhost [127.0.0.1]) by freefriends.org (8.14.8/8.14.8) with ESMTP id s1KHMnPZ013168; Thu, 20 Feb 2014 10:22:49 -0700 Received: (from arnold@localhost) by freefriends.org (8.14.8/8.14.8/submit) id s1KHMn0j013167; Thu, 20 Feb 2014 17:22:49 GMT From: arnold@skeeve.com Message-Id: <201402201722.s1KHMn0j013167@freefriends.org> X-Authentication-Warning: frenzy.freefriends.org: arnold set sender to arnold@skeeve.com using -f Date: Thu, 20 Feb 2014 10:22:49 -0700 To: jim@meyering.net, eggert@cs.ucla.edu Subject: Re: bug#16232: [PATCH] grep: make --ignore-case (-i) faster (sometimes 10x) in multibyte locales References: <52D290F7.6050002@draigBrady.com> <20140219232205.23D3.27F6AC2D@kcn.ne.jp> <53050C31.8000606@cs.ucla.edu> In-Reply-To: User-Agent: Heirloom mailx 12.4 7/29/08 MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 7bit X-Spam-Score: 0.0 (/) X-Debbugs-Envelope-To: 16232 Cc: 16232@debbugs.gnu.org X-BeenThere: debbugs-submit@debbugs.gnu.org X-Mailman-Version: 2.1.15 Precedence: list List-Id: List-Unsubscribe: , 
List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: debbugs-submit-bounces@debbugs.gnu.org Sender: "Debbugs-submit" X-Spam-Score: 0.0 (/) Jim Meyering wrote: > Hi Paul, > > In case your bug fix looks safe/small, and is ready, ... > > I'm hoping to release 2.18 today, with the attached commits. > Changes since yesterday: comment/log tweaks, and I've hoisted the using_utf8 > test in trivial_case_ignore to precede the two memchr tests. Hi Jim. Why copy the using_utf8() routine out of dfa.c? Why not just link to it instead? If it's static, make it extern... That way if the logic ever changes then it only has to be changed in one place. Just a thought. :-) Arnold From debbugs-submit-bounces@debbugs.gnu.org Thu Feb 20 13:35:04 2014 Received: (at 16232) by debbugs.gnu.org; 20 Feb 2014 18:35:04 +0000 Received: from localhost ([127.0.0.1]:33915 helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.80) (envelope-from ) id 1WGYSd-0004Pl-CC for submit@debbugs.gnu.org; Thu, 20 Feb 2014 13:35:04 -0500 Received: from smtp.cs.ucla.edu ([131.179.128.62]:37466) by debbugs.gnu.org with esmtp (Exim 4.80) (envelope-from ) id 1WGYSY-0004PE-Sm for 16232@debbugs.gnu.org; Thu, 20 Feb 2014 13:35:00 -0500 Received: from localhost (localhost.localdomain [127.0.0.1]) by smtp.cs.ucla.edu (Postfix) with ESMTP id 2CFBFA6003F; Thu, 20 Feb 2014 10:34:53 -0800 (PST) X-Virus-Scanned: amavisd-new at smtp.cs.ucla.edu Received: from smtp.cs.ucla.edu ([127.0.0.1]) by localhost (smtp.cs.ucla.edu [127.0.0.1]) (amavisd-new, port 10024) with ESMTP id dScEZ4qh5pus; Thu, 20 Feb 2014 10:34:52 -0800 (PST) Received: from penguin.cs.ucla.edu (Penguin.CS.UCLA.EDU [131.179.64.200]) by smtp.cs.ucla.edu (Postfix) with ESMTPSA id C12C7A60035; Thu, 20 Feb 2014 10:34:51 -0800 (PST) Message-ID: <53064AC8.8000303@cs.ucla.edu> Date: Thu, 20 Feb 2014 10:34:48 -0800 From: Paul Eggert Organization: UCLA Computer Science Department User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:24.0) Gecko/20100101 
Thunderbird/24.3.0 MIME-Version: 1.0 To: Jim Meyering Subject: Re: bug#16232: [PATCH] grep: make --ignore-case (-i) faster (sometimes 10x) in multibyte locales References: <52D290F7.6050002@draigBrady.com> <20140219232205.23D3.27F6AC2D@kcn.ne.jp> <53050C31.8000606@cs.ucla.edu> In-Reply-To: Content-Type: multipart/mixed; boundary="------------080908070305020103090603" X-Spam-Score: -2.9 (--) X-Debbugs-Envelope-To: 16232 Cc: 16232 <16232@debbugs.gnu.org> X-BeenThere: debbugs-submit@debbugs.gnu.org X-Mailman-Version: 2.1.15 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: debbugs-submit-bounces@debbugs.gnu.org Sender: "Debbugs-submit" X-Spam-Score: -2.9 (--) This is a multi-part message in MIME format. --------------080908070305020103090603 Content-Type: text/plain; charset=ISO-8859-1; format=flowed Content-Transfer-Encoding: 7bit On 02/20/2014 09:13 AM, Jim Meyering wrote: > In case your bug fix looks safe/small, and is ready, ... Attached. I have some other fixes too, which I'll try to get out the door today (though I can't promise that). --------------080908070305020103090603 Content-Type: text/x-patch; name="0001-tests-test-in-unibyte-locales.patch" Content-Transfer-Encoding: 7bit Content-Disposition: attachment; filename="0001-tests-test-in-unibyte-locales.patch" >From 80250e3fae3a333160014bed7613a5cc9e42413a Mon Sep 17 00:00:00 2001 From: Paul Eggert Date: Wed, 19 Feb 2014 18:58:42 -0800 Subject: [PATCH 1/2] tests: test [^^-^] in unibyte locales This is a bug in the current dfa.c, which was reintroduced by the recent reversion from RRI. * tests/unibyte-negated-circumflex: New file. * tests/Makefile.am (TESTS): Add it. * tests/init.cfg (require_unibyte_locale): New function. 
--- tests/Makefile.am | 1 + tests/init.cfg | 16 ++++++++++++++++ tests/unibyte-negated-circumflex | 27 +++++++++++++++++++++++++++ 3 files changed, 44 insertions(+) create mode 100755 tests/unibyte-negated-circumflex diff --git a/tests/Makefile.am b/tests/Makefile.am index e2967fa..331467a 100644 --- a/tests/Makefile.am +++ b/tests/Makefile.am @@ -75,6 +75,7 @@ TESTS = \ multibyte-white-space \ empty-line-mb \ unibyte-bracket-expr \ + unibyte-negated-circumflex \ high-bit-range \ options \ pcre \ diff --git a/tests/init.cfg b/tests/init.cfg index 2e8330b..ee5d537 100644 --- a/tests/init.cfg +++ b/tests/init.cfg @@ -87,6 +87,22 @@ require_compiled_in_MB_support() || skip_ this test requires MBS support } +require_unibyte_locale() +{ + path_prepend_ . + for loc in C en_US; do + for encoding in '' .iso88591 .iso885915 .ISO8859-1 .ISO8859-15; do + locale=$loc$encoding + MB_CUR_MAX=$(get-mb-cur-max $locale 2>/dev/null) && + test "$MB_CUR_MAX" -eq 1 && + LC_ALL=$locale && + export LC_ALL && + return + done + done + skip_ 'no unibyte locale found' +} + expensive_() { if test "$RUN_EXPENSIVE_TESTS" != yes; then diff --git a/tests/unibyte-negated-circumflex b/tests/unibyte-negated-circumflex new file mode 100755 index 0000000..b6d747c --- /dev/null +++ b/tests/unibyte-negated-circumflex @@ -0,0 +1,27 @@ +#!/bin/sh +# Exercise a bug where [^^-^] was treated as if it were [^-^]. + +# Copyright 2014 Free Software Foundation, Inc. + +# This program is free software: you can redistribute it and/or modify +# it under the terms of the GNU General Public License as published by +# the Free Software Foundation, either version 3 of the License, or +# (at your option) any later version. + +# This program is distributed in the hope that it will be useful, +# but WITHOUT ANY WARRANTY; without even the implied warranty of +# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the +# GNU General Public License for more details. 
+ +# You should have received a copy of the GNU General Public License +# along with this program. If not, see . + +. "${srcdir=.}/init.sh"; path_prepend_ ../src +require_unibyte_locale + +fail=0 + +echo a >in || framework_failure_ +grep '[^^-^]' in >out || fail=1 +compare out in || fail=1 +Exit $fail -- 1.8.5.3 --------------080908070305020103090603 Content-Type: text/x-patch; name="0002-grep-fix-bug-with-patterns-like-in-unibyte-locales.patch" Content-Transfer-Encoding: 7bit Content-Disposition: attachment; filename*0="0002-grep-fix-bug-with-patterns-like-in-unibyte-locales.patc"; filename*1="h" >From c1fa72bd324aac44a1d99b83bb585dbcf291041d Mon Sep 17 00:00:00 2001 From: Paul Eggert Date: Thu, 20 Feb 2014 10:17:47 -0800 Subject: [PATCH 2/2] grep: fix bug with patterns like [^^-~] in unibyte locales * NEWS: Document this. * src/dfa.c (parse_bracket_exp): Escape patterns like [^^-~], or Awk patterns like [\^-\]], so that they are not misinterpreted by the system regex library. Check for system regex failure due to memory exhaustion. --- NEWS | 5 +++++ src/dfa.c | 41 ++++++++++++++++++++++------------------- 2 files changed, 27 insertions(+), 19 deletions(-) diff --git a/NEWS b/NEWS index c6d78d0..8639ce1 100644 --- a/NEWS +++ b/NEWS @@ -2,6 +2,11 @@ GNU grep NEWS -*- outline -*- * Noteworthy changes in release ?.? (????-??-??) [?] +** Bug fixes + + grep no longer mishandles patterns like [^^-~] in unibyte locales. + [bug introduced in grep-2.8] + ** Improvements grep -i in a multibyte locale may be over 130 times faster than in 2.17 diff --git a/src/dfa.c b/src/dfa.c index a133e03..9266f6f 100644 --- a/src/dfa.c +++ b/src/dfa.c @@ -1108,28 +1108,31 @@ parse_bracket_exp (void) { /* Defer to the system regex library about the meaning of range expressions. 
*/ - regex_t re; - char pattern[6] = { '[', 0, '-', 0, ']', 0 }; - char subject[2] = { 0, 0 }; - c1 = c; - if (case_fold) - { - c1 = tolower (c1); - c2 = tolower (c2); - } - - pattern[1] = c1; - pattern[3] = c2; - regcomp (&re, pattern, REG_NOSUB); - for (c = 0; c < NOTCHAR; ++c) + struct re_pattern_buffer re = { 0 }; + char const *compile_msg; +#if 199901 <= __STDC_VERSION__ + char pattern[] = { '[', '\\', c, '-', '\\', c2, ']' }; +#else + char pattern[] = { '[', '\\', 0, '-', '\\', 0, ']' }; + pattern[2] = c; + pattern[5] = c2; +#endif + re_set_syntax (syntax_bits | RE_BACKSLASH_ESCAPE_IN_LISTS); + compile_msg = re_compile_pattern (pattern, sizeof pattern, &re); + if (compile_msg) + dfaerror (compile_msg); + for (c = 0; c < NOTCHAR; c++) { - if ((case_fold && isupper (c))) - continue; - subject[0] = c; - if (regexec (&re, subject, 0, NULL, 0) != REG_NOMATCH) - setbit_case_fold_c (c, ccl); + char subject = c; + switch (re_match (&re, &subject, 1, 0, NULL)) + { + case 1: setbit (c, ccl); break; + case -1: break; + default: xalloc_die (); + } } regfree (&re); + re_set_syntax (syntax_bits); } colon_warning_state |= 8; -- 1.8.5.3 --------------080908070305020103090603-- From debbugs-submit-bounces@debbugs.gnu.org Thu Feb 20 15:42:37 2014 Received: (at 16232) by debbugs.gnu.org; 20 Feb 2014 20:42:37 +0000 Received: from localhost ([127.0.0.1]:33941 helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.80) (envelope-from ) id 1WGaS3-0007so-T4 for submit@debbugs.gnu.org; Thu, 20 Feb 2014 15:42:36 -0500 Received: from mail-pd0-f169.google.com ([209.85.192.169]:44529) by debbugs.gnu.org with esmtp (Exim 4.80) (envelope-from ) id 1WGaS1-0007sX-Eg for 16232@debbugs.gnu.org; Thu, 20 Feb 2014 15:42:34 -0500 Received: by mail-pd0-f169.google.com with SMTP id v10so2318711pde.28 for <16232@debbugs.gnu.org>; Thu, 20 Feb 2014 12:42:27 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113; 
h=mime-version:sender:in-reply-to:references:from:date:message-id :subject:to:cc:content-type; bh=kBlaJD3RzFyjxZVLl3eVDaGUpTsYQ1H/HDHF68yISvY=; b=S8ntsiAVK5Tt98lD2WZC9tvVv6EyOHJBuvMcmxZdIO6nGF5J8Tw2gYACUgrSyp/lkt pwqtu+NRPMob/fI1OmBK5B6uwcV5AnNSN6rezlf7NiRxVxZn3Tx1o05HETfNgK95dQie qBWeSy28Qb1g+e67o4zWyJHgVwm2m9IGmFYjtDqVHUTGAvCntEUkOUjwSfloBJg2h+Jr Fujve+LVWiTGcGY7iduzH0eGkqOZj2LiI04E25qRqFYd4gPOtRpMaZaDTTKTs7CvCoVu BpHqyOMslSWltcxMNQtIbuEuQHptqCi4nGv50VugoH4NNBlV6959rVc9t98TK6tx7TFg 1PuQ== X-Received: by 10.66.192.74 with SMTP id he10mr4506586pac.126.1392928947434; Thu, 20 Feb 2014 12:42:27 -0800 (PST) MIME-Version: 1.0 Received: by 10.68.201.231 with HTTP; Thu, 20 Feb 2014 12:42:06 -0800 (PST) In-Reply-To: <201402201722.s1KHMn0j013167@freefriends.org> References: <52D290F7.6050002@draigBrady.com> <20140219232205.23D3.27F6AC2D@kcn.ne.jp> <53050C31.8000606@cs.ucla.edu> <201402201722.s1KHMn0j013167@freefriends.org> From: Jim Meyering Date: Thu, 20 Feb 2014 12:42:06 -0800 X-Google-Sender-Auth: HttPv1IyvNPrnozYptd0YzfzlR8 Message-ID: Subject: Re: bug#16232: [PATCH] grep: make --ignore-case (-i) faster (sometimes 10x) in multibyte locales To: Aharon Robbins Content-Type: multipart/mixed; boundary=047d7bdc9ebc1c0b8a04f2dc8bac X-Spam-Score: 0.0 (/) X-Debbugs-Envelope-To: 16232 Cc: 16232 <16232@debbugs.gnu.org>, Paul Eggert X-BeenThere: debbugs-submit@debbugs.gnu.org X-Mailman-Version: 2.1.15 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: debbugs-submit-bounces@debbugs.gnu.org Sender: "Debbugs-submit" X-Spam-Score: 0.0 (/) --047d7bdc9ebc1c0b8a04f2dc8bac Content-Type: text/plain; charset=ISO-8859-1 On Thu, Feb 20, 2014 at 9:22 AM, wrote: > Hi Jim. > > Why copy the using_utf8() routine out of dfa.c? Why not just link > to it instead? If it's static, make it extern... That way if the > logic ever changes then it only has to be changed in one place. 
Hi Arnold, That was due to my reflex of avoiding unnecessary change to dfa.c, but in this case, it is definitely better to do as you suggest, not just to avoid code duplication, but also for run-time efficiency: with two copies of the function, there would have been two calls to nl_langinfo per run; with only that one copy, we save a call, too. Revised commits attached. Thanks, Jim --047d7bdc9ebc1c0b8a04f2dc8bac Content-Type: text/plain; charset=US-ASCII; name="k.txt" Content-Disposition: attachment; filename="k.txt" Content-Transfer-Encoding: base64 X-Attachment-Id: f_hrwhwm5r0 RnJvbSA5YTVjNmM4NTY4OTJmZGU1ZGYwNzY2NmQ0YmI2NjQxYTA1ZjMzNzEyIE1vbiBTZXAgMTcg MDA6MDA6MDAgMjAwMQpGcm9tOiBKaW0gTWV5ZXJpbmcgPG1leWVyaW5nQGZiLmNvbT4KRGF0ZTog V2VkLCAxOSBGZWIgMjAxNCAxOToyMjoyNCAtMDgwMApTdWJqZWN0OiBbUEFUQ0ggMS8yXSBtYWlu dDogZ2l2ZSBkZmEuYydzIHVzaW5nX3V0ZjggZnVuY3Rpb24gZXh0ZXJuYWwgc2NvcGUKCiogc3Jj L2RmYS5jICh1c2luZ191dGY4KTogUmVtb3ZlICJzdGF0aWMgaW5saW5lIi4KKiBzcmMvZGZhLmgg KHVzaW5nX3V0ZjgpOiBEZWNsYXJlIGl0LgoqIHNyYy9zZWFyY2h1dGlscy5jIChpc19tYl9taWRk bGUpOiBVc2UgdXNpbmdfdXRmOCByYXRoZXIgdGhhbgpyb2xsaW5nIG91ciBvd24uCi0tLQogc3Jj L2RmYS5jICAgICAgICAgfCAyICstCiBzcmMvZGZhLmggICAgICAgICB8IDIgKysKIHNyYy9zZWFy Y2h1dGlscy5jIHwgOSArKy0tLS0tLS0KIDMgZmlsZXMgY2hhbmdlZCwgNSBpbnNlcnRpb25zKCsp LCA4IGRlbGV0aW9ucygtKQoKZGlmZiAtLWdpdCBhL3NyYy9kZmEuYyBiL3NyYy9kZmEuYwppbmRl eCBhMTMzZTAzLi5iYTllN2EyIDEwMDY0NAotLS0gYS9zcmMvZGZhLmMKKysrIGIvc3JjL2RmYS5j CkBAIC03NTMsNyArNzUzLDcgQEAgc2V0Yml0X2Nhc2VfZm9sZF9jIChpbnQgYiwgY2hhcmNsYXNz IGMpCgogLyogVVRGLTggZW5jb2RpbmcgYWxsb3dzIHNvbWUgb3B0aW1pemF0aW9ucyB0aGF0IHdl IGNhbid0IG90aGVyd2lzZQogICAgYXNzdW1lIGluIGEgbXVsdGlieXRlIGVuY29kaW5nLiAgKi8K LXN0YXRpYyBpbmxpbmUgaW50CitpbnQKIHVzaW5nX3V0ZjggKHZvaWQpCiB7CiAgIHN0YXRpYyBp bnQgdXRmOCA9IC0xOwpkaWZmIC0tZ2l0IGEvc3JjL2RmYS5oIGIvc3JjL2RmYS5oCmluZGV4IGJh Y2Q0ODkuLjdlMDY3NGYgMTAwNjQ0Ci0tLSBhL3NyYy9kZmEuaAorKysgYi9zcmMvZGZhLmgKQEAg LTk5LDMgKzk5LDUgQEAgZXh0ZXJuIHZvaWQgZGZhd2FybiAoY29uc3QgY2hhciAqKTsKICAgIHRh 
a2VzIGEgc2luZ2xlIGFyZ3VtZW50LCBhIE5VTC10ZXJtaW5hdGVkIHN0cmluZyBkZXNjcmliaW5n IHRoZSBlcnJvci4KICAgIFRoZSB1c2VyIG11c3Qgc3VwcGx5IGEgZGZhZXJyb3IuICAqLwogZXh0 ZXJuIF9Ob3JldHVybiB2b2lkIGRmYWVycm9yIChjb25zdCBjaGFyICopOworCitleHRlcm4gaW50 IHVzaW5nX3V0ZjggKHZvaWQpOwpkaWZmIC0tZ2l0IGEvc3JjL3NlYXJjaHV0aWxzLmMgYi9zcmMv c2VhcmNodXRpbHMuYwppbmRleCAzNDc4NDE3Li43MzYzNzAxIDEwMDY0NAotLS0gYS9zcmMvc2Vh cmNodXRpbHMuYworKysgYi9zcmMvc2VhcmNodXRpbHMuYwpAQCAtMTksNiArMTksNyBAQAogI2lu Y2x1ZGUgPGNvbmZpZy5oPgogI2luY2x1ZGUgPGFzc2VydC5oPgogI2luY2x1ZGUgInNlYXJjaC5o IgorI2luY2x1ZGUgImRmYS5oIgogI2lmIEhBVkVfTEFOR0lORk9fQ09ERVNFVAogIyBpbmNsdWRl IDxsYW5naW5mby5oPgogI2VuZGlmCkBAIC0yMzQsMTMgKzIzNSw4IEBAIGlzX21iX21pZGRsZSAo Y29uc3QgY2hhciAqKmdvb2QsIGNvbnN0IGNoYXIgKmJ1ZiwgY29uc3QgY2hhciAqZW5kLAogICBj b25zdCBjaGFyICpwID0gKmdvb2Q7CiAgIGNvbnN0IGNoYXIgKnByZXYgPSBwOwogICBtYnN0YXRl X3QgY3VyX3N0YXRlOwotI2lmIEhBVkVfTEFOR0lORk9fQ09ERVNFVAotICBzdGF0aWMgaW50IGlz X3V0ZjggPSAtMTsKLQotICBpZiAoaXNfdXRmOCA9PSAtMSkKLSAgICBpc191dGY4ID0gU1RSRVEg KG5sX2xhbmdpbmZvIChDT0RFU0VUKSwgIlVURi04Iik7CgotICBpZiAoaXNfdXRmOCAmJiBidWYg LSBwID4gTUJfQ1VSX01BWCkKKyAgaWYgKHVzaW5nX3V0ZjggKCkgJiYgYnVmIC0gcCA+IE1CX0NV Ul9NQVgpCiAgICAgewogICAgICAgZm9yIChwID0gYnVmOyBidWYgLSBwID4gTUJfQ1VSX01BWDsg cC0tKQogICAgICAgICBpZiAobWJjbGVuX2NhY2hlW3RvX3VjaGFyICgqcCldICE9IChzaXplX3Qp IC0xKQpAQCAtMjQ5LDcgKzI0NSw2IEBAIGlzX21iX21pZGRsZSAoY29uc3QgY2hhciAqKmdvb2Qs IGNvbnN0IGNoYXIgKmJ1ZiwgY29uc3QgY2hhciAqZW5kLAogICAgICAgaWYgKGJ1ZiAtIHAgPT0g TUJfQ1VSX01BWCkKICAgICAgICAgcCA9IGJ1ZjsKICAgICB9Ci0jZW5kaWYKCiAgIG1lbXNldCAo JmN1cl9zdGF0ZSwgMCwgc2l6ZW9mIGN1cl9zdGF0ZSk7CgotLSAKMS45LjAKCgpGcm9tIDUyOTVk MWQ1MjhhZmFiYmExNWVkMDcxMDIxMWJhMjQ4NTRjMGM3YWIgTW9uIFNlcCAxNyAwMDowMDowMCAy MDAxCkZyb206IEppbSBNZXllcmluZyA8bWV5ZXJpbmdAZmIuY29tPgpEYXRlOiBXZWQsIDE5IEZl YiAyMDE0IDE5OjMxOjQzIC0wODAwClN1YmplY3Q6IFtQQVRDSCAyLzJdIGdyZXAgLWk6IGF2b2lk IDIwMHggcGVyZi4gcmVncmVzc2lvbiBpbiBtdWx0aWJ5dGUKIG5vbi1VVEY4IGxvY2FsZXMKCiog 
c3JjL21haW4uYzogSW5jbHVkZSBkZmEuaC4KKHRyaXZpYWxfY2FzZV9pZ25vcmUpOiBQZXJmb3Jt IHRoaXMgb3B0aW1pemF0aW9uIG9ubHkgZm9yIFVURjggbG9jYWxlcy4KVGhpcyByZWN0aWZpZXMg YSAyMDB4IHBlcmZvcm1hbmNlIHJlZ3Jlc3Npb24gaW4gbXVsdGktYnl0ZSBub24tVVRGOApsb2Nh bGVzIGxpa2UgamFfSlAuZXVjSlAuICBUaGUgcmVncmVzc2lvbiB3YXMgaW50cm9kdWNlZCBieSB0 aGUgMTB4ClVURjgvZ3JlcC1pIHNwZWVkdXAsIGNvbW1pdCB2Mi4xNi00LWc5NzMxOGY1LgoqIE5F V1MgKEJ1ZyBmaXhlcyk6IE1lbnRpb24gaXQuClJlcG9ydGVkIGJ5IE5vcmloaXJvIFRhbmFrYSBp biBodHRwOi8vZGViYnVncy5nbnUub3JnLzE2MjMyIzUwCi0tLQogTkVXUyAgICAgICB8IDUgKysr KysKIHNyYy9tYWluLmMgfCA2ICsrKysrKwogMiBmaWxlcyBjaGFuZ2VkLCAxMSBpbnNlcnRpb25z KCspCgpkaWZmIC0tZ2l0IGEvTkVXUyBiL05FV1MKaW5kZXggNjc4NWE5Ni4uNDlhMTdiMCAxMDA2 NDQKLS0tIGEvTkVXUworKysgYi9ORVdTCkBAIC0yLDYgKzIsMTEgQEAgR05VIGdyZXAgTkVXUyAg ICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgIC0qLSBvdXRsaW5lIC0qLQoKICogTm90 ZXdvcnRoeSBjaGFuZ2VzIGluIHJlbGVhc2UgPy4/ICg/Pz8/LT8/LT8/KSBbP10KCisqKiBCdWcg Zml4ZXMKKworICBncmVwIC1pIGluIGEgbXVsdGlieXRlLCBub24tVVRGOCBsb2NhbGUgY291bGQg YmUgdXAgdG8gMjAwIHRpbWVzIHNsb3dlcgorICB0aGFuIGluIDIuMTYuICBbYnVnIGludHJvZHVj ZWQgaW4gZ3JlcC0yLjE3XQorCgogKiBOb3Rld29ydGh5IGNoYW5nZXMgaW4gcmVsZWFzZSAyLjE3 ICgyMDE0LTAyLTE3KSBbc3RhYmxlXQoKZGlmZiAtLWdpdCBhL3NyYy9tYWluLmMgYi9zcmMvbWFp bi5jCmluZGV4IGJkMjAyOTcuLjU2ZWM2YjMgMTAwNjQ0Ci0tLSBhL3NyYy9tYWluLmMKKysrIGIv c3JjL21haW4uYwpAQCAtMzQsNiArMzQsNyBAQAogI2luY2x1ZGUgImMtY3R5cGUuaCIKICNpbmNs dWRlICJjbG9zZW91dC5oIgogI2luY2x1ZGUgImNvbG9yaXplLmgiCisjaW5jbHVkZSAiZGZhLmgi CiAjaW5jbHVkZSAiZXJyb3IuaCIKICNpbmNsdWRlICJleGNsdWRlLmgiCiAjaW5jbHVkZSAiZXhp dGZhaWwuaCIKQEAgLTE4ODMsNiArMTg4NCwxMSBAQCBzdGF0aWMgYm9vbAogdHJpdmlhbF9jYXNl X2lnbm9yZSAoc2l6ZV90IGxlbiwgY2hhciBjb25zdCAqa2V5cywKICAgICAgICAgICAgICAgICAg ICAgIHNpemVfdCAqbmV3X2xlbiwgY2hhciAqKm5ld19rZXlzKQogeworICAvKiBQZXJmb3JtIHRo aXMgdHJhbnNsYXRpb24gb25seSBmb3IgVVRGLTguICBPdGhlcndpc2UsIHRoaXMgd291bGQgaW5k dWNlCisgICAgIGEgMTAwLTIwMHggcGVyZm9ybWFuY2UgcGVuYWx0eSBmb3Igbm9uLVVURjggbXVs 
dGlieXRlIGxvY2FsZXMuICAqLworICBpZiAoICEgdXNpbmdfdXRmOCAoKSkKKyAgICByZXR1cm4g ZmFsc2U7CisKICAgLyogRklYTUU6IGNvbnNpZGVyIHJlbW92aW5nIHRoZSBmb2xsb3dpbmcgcmVz dHJpY3Rpb246CiAgICAgIFJlamVjdCBpZiBLRVlTIGNvbnRhaW4gQVNDSUkgJ1xcJyBvciAnWycu ICAqLwogICBpZiAobWVtY2hyIChrZXlzLCAnXFwnLCBsZW4pIHx8IG1lbWNociAoa2V5cywgJ1sn LCBsZW4pKQotLSAKMS45LjAKCg== --047d7bdc9ebc1c0b8a04f2dc8bac-- From debbugs-submit-bounces@debbugs.gnu.org Thu Feb 20 19:41:26 2014 Received: (at 16232) by debbugs.gnu.org; 21 Feb 2014 00:41:26 +0000 Received: from localhost ([127.0.0.1]:34095 helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.80) (envelope-from ) id 1WGeBC-0006SL-EG for submit@debbugs.gnu.org; Thu, 20 Feb 2014 19:41:26 -0500 Received: from mail-pa0-f44.google.com ([209.85.220.44]:58421) by debbugs.gnu.org with esmtp (Exim 4.80) (envelope-from ) id 1WGeBA-0006Rt-ME for 16232@debbugs.gnu.org; Thu, 20 Feb 2014 19:41:25 -0500 Received: by mail-pa0-f44.google.com with SMTP id kq14so2668995pab.3 for <16232@debbugs.gnu.org>; Thu, 20 Feb 2014 16:41:18 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113; h=mime-version:sender:in-reply-to:references:from:date:message-id :subject:to:cc:content-type; bh=6RtbrirL6ijaoF4m1VzGg/0sFny+5PTawxkQwxSzVXc=; b=RtjVaNfYZtiqy+9aHVCSWui5q7VCA7CybCVxt1h8/kRYzNqbp+W+vx0n3hesreUQpT PnyYdilqAAggu69Mik8X0NZ9Q6BUzCQf5Ld/9QbQI6vWoNn+A+mqz0mbFvdIN6E5YIIy NWkwWgu81HVc/fTNkDMexHD29bO0r5LC0xsD35g1koc0VTTt6iRPhtSffOKjfiAwnssp 6OW70s8lFALd7hqrRAKqouBbsZ9OMukDFTTyp3gO9/aZyK3g3CMllaTUghhteAyP9fIf Hz8phwGWqkOGBd+cyvSQvMJUocX1Xt+jn6jNXlZYDpkmG9eSZakKE6CN9avpEtrbkdEc ybfw== X-Received: by 10.66.164.104 with SMTP id yp8mr5454750pab.25.1392943278259; Thu, 20 Feb 2014 16:41:18 -0800 (PST) MIME-Version: 1.0 Received: by 10.68.201.231 with HTTP; Thu, 20 Feb 2014 16:40:58 -0800 (PST) In-Reply-To: <53064AC8.8000303@cs.ucla.edu> References: <52D290F7.6050002@draigBrady.com> <20140219232205.23D3.27F6AC2D@kcn.ne.jp> <53050C31.8000606@cs.ucla.edu> 
<53064AC8.8000303@cs.ucla.edu> From: Jim Meyering Date: Thu, 20 Feb 2014 16:40:58 -0800 X-Google-Sender-Auth: fSUd00-jBbxa4mEdSsfigmDJqoI Message-ID: Subject: Re: bug#16232: [PATCH] grep: make --ignore-case (-i) faster (sometimes 10x) in multibyte locales To: Paul Eggert Content-Type: text/plain; charset=ISO-8859-1 X-Spam-Score: -0.7 (/) X-Debbugs-Envelope-To: 16232 Cc: 16232 <16232@debbugs.gnu.org> X-BeenThere: debbugs-submit@debbugs.gnu.org X-Mailman-Version: 2.1.15 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: debbugs-submit-bounces@debbugs.gnu.org Sender: "Debbugs-submit" X-Spam-Score: -0.7 (/) On Thu, Feb 20, 2014 at 10:34 AM, Paul Eggert wrote: > On 02/20/2014 09:13 AM, Jim Meyering wrote: >> >> In case your bug fix looks safe/small, and is ready, ... > > Attached. I have some other fixes too, which I'll try to get out the door > today (though I can't promise that). Thank you, Paul. Those looked perfect. I reversed the order, putting the otherwise-failing test after its fix, and pushed both. 
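For readers following the thread: the earlier point about keeping a single using_utf8 (so that nl_langinfo is consulted only once per run) can be illustrated with a minimal sketch. This is not the code from dfa.c. The names using_utf8_sketch and codeset_queries are hypothetical and exist only for this illustration, and the sketch assumes a POSIX system that provides nl_langinfo and CODESET.

```c
#include <assert.h>
#include <langinfo.h>
#include <string.h>

/* Hypothetical counter, present only so that the caching is observable.  */
static int codeset_queries = 0;

/* Return nonzero if the current locale's codeset is UTF-8.
   The locale is queried on the first call only; later calls
   return the cached answer.  (Illustrative name, not grep's.)  */
int
using_utf8_sketch (void)
{
  static int utf8 = -1;         /* -1 means "not yet determined" */

  if (utf8 == -1)
    {
      char const *codeset = nl_langinfo (CODESET);
      assert (codeset != NULL);
      codeset_queries++;
      utf8 = strcmp (codeset, "UTF-8") == 0;
    }

  return utf8;
}
```

Because the cache is a function-local static, each separate copy of such a function would carry its own cache and perform its own locale query, which is presumably why sharing one extern definition saves a call.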
From debbugs-submit-bounces@debbugs.gnu.org Thu Feb 20 19:45:51 2014 Received: (at 16232) by debbugs.gnu.org; 21 Feb 2014 00:45:51 +0000 Received: from localhost ([127.0.0.1]:34099 helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.80) (envelope-from ) id 1WGeFS-0006bf-4E for submit@debbugs.gnu.org; Thu, 20 Feb 2014 19:45:50 -0500 Received: from mail-pb0-f48.google.com ([209.85.160.48]:35736) by debbugs.gnu.org with esmtp (Exim 4.80) (envelope-from ) id 1WGeFO-0006bG-HM for 16232@debbugs.gnu.org; Thu, 20 Feb 2014 19:45:47 -0500 Received: by mail-pb0-f48.google.com with SMTP id rr13so2668531pbb.7 for <16232@debbugs.gnu.org>; Thu, 20 Feb 2014 16:45:40 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113; h=mime-version:sender:in-reply-to:references:from:date:message-id :subject:to:cc:content-type; bh=eJ57Ai4CGkzeuSYSDuZLJSiBF8iCbA/cFRRi9WxAgqU=; b=xtR235GXe/Uxr0cgG/Zir88jtgO34HxKpJLLxJbFQUPe6kPkJ9mq3LMDaKJ4WOxV3Q Ff7/uXlNGVt8s44+EabWj74ClspOnLND//htrNlacp/5jf8QpFaWJ3IZCVgXrbwlYCzi IKZy5KzvJqBQ2PmtA7Jho0Gtm36cDE7anH9Ka3yCyPIq0kpRR8EVFwsC03Bb7CJlgEvN m1sEy/hEALplfyfqnV2xHDByGt/HFyGsGpk/56utCK7bkOZ/J3VPLUPYYe8lQEZP2TAc RygOj8fSAhq/cPctLK/APYVwlsYyOIbLMX3bKmE5KH9qTxdwsOnzPXA6T5hjwr0YI+5P 1rOg== X-Received: by 10.66.145.166 with SMTP id sv6mr5643914pab.31.1392943540525; Thu, 20 Feb 2014 16:45:40 -0800 (PST) MIME-Version: 1.0 Received: by 10.68.201.231 with HTTP; Thu, 20 Feb 2014 16:45:20 -0800 (PST) In-Reply-To: References: <52D290F7.6050002@draigBrady.com> <20140219232205.23D3.27F6AC2D@kcn.ne.jp> <53050C31.8000606@cs.ucla.edu> <201402201722.s1KHMn0j013167@freefriends.org> From: Jim Meyering Date: Thu, 20 Feb 2014 16:45:20 -0800 X-Google-Sender-Auth: IcEobqhhvX2a54dyUWZJjs32yWI Message-ID: Subject: Re: bug#16232: [PATCH] grep: make --ignore-case (-i) faster (sometimes 10x) in multibyte locales To: Aharon Robbins Content-Type: multipart/mixed; boundary=047d7b6783f6ed596804f2dff029 X-Spam-Score: -0.7 (/) X-Debbugs-Envelope-To: 
16232 Cc: 16232 <16232@debbugs.gnu.org>, Paul Eggert X-BeenThere: debbugs-submit@debbugs.gnu.org X-Mailman-Version: 2.1.15 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: debbugs-submit-bounces@debbugs.gnu.org Sender: "Debbugs-submit" X-Spam-Score: -0.7 (/) --047d7b6783f6ed596804f2dff029 Content-Type: text/plain; charset=ISO-8859-1 On Thu, Feb 20, 2014 at 12:42 PM, Jim Meyering wrote: > Revised commits attached. One more revision. This is a big enough deal (and subtle enough) that I thought I really should add a test for it. Did that, so now it's 3 commits: I'm going to push these in an hour or so, then see if I have time to make the release this evening. --047d7b6783f6ed596804f2dff029 Content-Type: text/plain; charset=US-ASCII; name="k.txt" Content-Disposition: attachment; filename="k.txt" Content-Transfer-Encoding: base64 X-Attachment-Id: f_hrwqjbg91 RnJvbSAxMWNlODA4NjExMDkzNjE1NzBjYmVkZGE2YTk2NjI2NDM2N2Y3Yzc2IE1vbiBTZXAgMTcg MDA6MDA6MDAgMjAwMQpGcm9tOiBKaW0gTWV5ZXJpbmcgPG1leWVyaW5nQGZiLmNvbT4KRGF0ZTog V2VkLCAxOSBGZWIgMjAxNCAxOToyMjoyNCAtMDgwMApTdWJqZWN0OiBbUEFUQ0ggMS8zXSBtYWlu dDogZ2l2ZSBkZmEuYydzIHVzaW5nX3V0ZjggZnVuY3Rpb24gZXh0ZXJuYWwgc2NvcGUKCiogc3Jj L2RmYS5jICh1c2luZ191dGY4KTogUmVtb3ZlICJzdGF0aWMgaW5saW5lIi4KKiBzcmMvZGZhLmgg KHVzaW5nX3V0ZjgpOiBEZWNsYXJlIGl0LgoqIHNyYy9zZWFyY2h1dGlscy5jIChpc19tYl9taWRk bGUpOiBVc2UgdXNpbmdfdXRmOCByYXRoZXIgdGhhbgpyb2xsaW5nIG91ciBvd24uCi0tLQogc3Jj L2RmYS5jICAgICAgICAgfCAyICstCiBzcmMvZGZhLmggICAgICAgICB8IDIgKysKIHNyYy9zZWFy Y2h1dGlscy5jIHwgOSArKy0tLS0tLS0KIDMgZmlsZXMgY2hhbmdlZCwgNSBpbnNlcnRpb25zKCsp LCA4IGRlbGV0aW9ucygtKQoKZGlmZiAtLWdpdCBhL3NyYy9kZmEuYyBiL3NyYy9kZmEuYwppbmRl eCA5MjY2ZjZmLi44OTA2ZWQzIDEwMDY0NAotLS0gYS9zcmMvZGZhLmMKKysrIGIvc3JjL2RmYS5j CkBAIC03NTMsNyArNzUzLDcgQEAgc2V0Yml0X2Nhc2VfZm9sZF9jIChpbnQgYiwgY2hhcmNsYXNz IGMpCgogLyogVVRGLTggZW5jb2RpbmcgYWxsb3dzIHNvbWUgb3B0aW1pemF0aW9ucyB0aGF0IHdl 
IGNhbid0IG90aGVyd2lzZQogICAgYXNzdW1lIGluIGEgbXVsdGlieXRlIGVuY29kaW5nLiAgKi8K LXN0YXRpYyBpbmxpbmUgaW50CitpbnQKIHVzaW5nX3V0ZjggKHZvaWQpCiB7CiAgIHN0YXRpYyBp bnQgdXRmOCA9IC0xOwpkaWZmIC0tZ2l0IGEvc3JjL2RmYS5oIGIvc3JjL2RmYS5oCmluZGV4IGJh Y2Q0ODkuLjdlMDY3NGYgMTAwNjQ0Ci0tLSBhL3NyYy9kZmEuaAorKysgYi9zcmMvZGZhLmgKQEAg LTk5LDMgKzk5LDUgQEAgZXh0ZXJuIHZvaWQgZGZhd2FybiAoY29uc3QgY2hhciAqKTsKICAgIHRh a2VzIGEgc2luZ2xlIGFyZ3VtZW50LCBhIE5VTC10ZXJtaW5hdGVkIHN0cmluZyBkZXNjcmliaW5n IHRoZSBlcnJvci4KICAgIFRoZSB1c2VyIG11c3Qgc3VwcGx5IGEgZGZhZXJyb3IuICAqLwogZXh0 ZXJuIF9Ob3JldHVybiB2b2lkIGRmYWVycm9yIChjb25zdCBjaGFyICopOworCitleHRlcm4gaW50 IHVzaW5nX3V0ZjggKHZvaWQpOwpkaWZmIC0tZ2l0IGEvc3JjL3NlYXJjaHV0aWxzLmMgYi9zcmMv c2VhcmNodXRpbHMuYwppbmRleCAzNDc4NDE3Li43MzYzNzAxIDEwMDY0NAotLS0gYS9zcmMvc2Vh cmNodXRpbHMuYworKysgYi9zcmMvc2VhcmNodXRpbHMuYwpAQCAtMTksNiArMTksNyBAQAogI2lu Y2x1ZGUgPGNvbmZpZy5oPgogI2luY2x1ZGUgPGFzc2VydC5oPgogI2luY2x1ZGUgInNlYXJjaC5o IgorI2luY2x1ZGUgImRmYS5oIgogI2lmIEhBVkVfTEFOR0lORk9fQ09ERVNFVAogIyBpbmNsdWRl IDxsYW5naW5mby5oPgogI2VuZGlmCkBAIC0yMzQsMTMgKzIzNSw4IEBAIGlzX21iX21pZGRsZSAo Y29uc3QgY2hhciAqKmdvb2QsIGNvbnN0IGNoYXIgKmJ1ZiwgY29uc3QgY2hhciAqZW5kLAogICBj b25zdCBjaGFyICpwID0gKmdvb2Q7CiAgIGNvbnN0IGNoYXIgKnByZXYgPSBwOwogICBtYnN0YXRl X3QgY3VyX3N0YXRlOwotI2lmIEhBVkVfTEFOR0lORk9fQ09ERVNFVAotICBzdGF0aWMgaW50IGlz X3V0ZjggPSAtMTsKLQotICBpZiAoaXNfdXRmOCA9PSAtMSkKLSAgICBpc191dGY4ID0gU1RSRVEg KG5sX2xhbmdpbmZvIChDT0RFU0VUKSwgIlVURi04Iik7CgotICBpZiAoaXNfdXRmOCAmJiBidWYg LSBwID4gTUJfQ1VSX01BWCkKKyAgaWYgKHVzaW5nX3V0ZjggKCkgJiYgYnVmIC0gcCA+IE1CX0NV Ul9NQVgpCiAgICAgewogICAgICAgZm9yIChwID0gYnVmOyBidWYgLSBwID4gTUJfQ1VSX01BWDsg cC0tKQogICAgICAgICBpZiAobWJjbGVuX2NhY2hlW3RvX3VjaGFyICgqcCldICE9IChzaXplX3Qp IC0xKQpAQCAtMjQ5LDcgKzI0NSw2IEBAIGlzX21iX21pZGRsZSAoY29uc3QgY2hhciAqKmdvb2Qs IGNvbnN0IGNoYXIgKmJ1ZiwgY29uc3QgY2hhciAqZW5kLAogICAgICAgaWYgKGJ1ZiAtIHAgPT0g TUJfQ1VSX01BWCkKICAgICAgICAgcCA9IGJ1ZjsKICAgICB9Ci0jZW5kaWYKCiAgIG1lbXNldCAo 
JmN1cl9zdGF0ZSwgMCwgc2l6ZW9mIGN1cl9zdGF0ZSk7CgotLSAKMS45LjAKCgpGcm9tIGM3Yzhi Y2RlZmU3YmU1ZjU5YTI0MmVlYTYzZGY3ZjY0ZWFjYjZhMDkgTW9uIFNlcCAxNyAwMDowMDowMCAy MDAxCkZyb206IEppbSBNZXllcmluZyA8bWV5ZXJpbmdAZmIuY29tPgpEYXRlOiBXZWQsIDE5IEZl YiAyMDE0IDE5OjMxOjQzIC0wODAwClN1YmplY3Q6IFtQQVRDSCAyLzNdIGdyZXAgLWk6IGF2b2lk IGEgcGVyZm9ybWFuY2UgcmVncmVzc2lvbiBpbiBtdWx0aWJ5dGUKIG5vbi1VVEY4IGxvY2FsZXMK Ciogc3JjL21haW4uYzogSW5jbHVkZSBkZmEuaC4KKHRyaXZpYWxfY2FzZV9pZ25vcmUpOiBQZXJm b3JtIHRoaXMgb3B0aW1pemF0aW9uIG9ubHkgZm9yIFVURjggbG9jYWxlcy4KVGhpcyByZWN0aWZp ZXMgYSAxMDAtMjAweCBwZXJmb3JtYW5jZSByZWdyZXNzaW9uIGluIG5vbi1VVEY4IG11bHRpLWJ5 dGUKbG9jYWxlcyBsaWtlIGphX0pQLmV1Y0pQLiAgVGhlIHJlZ3Jlc3Npb24gd2FzIGludHJvZHVj ZWQgYnkgdGhlIDEweApVVEY4L2dyZXAtaSBzcGVlZHVwLCBjb21taXQgdjIuMTYtNC1nOTczMThm NS4KKiBORVdTIChCdWcgZml4ZXMpOiBNZW50aW9uIGl0LgpSZXBvcnRlZCBieSBOb3JpaGlybyBU YW5ha2EgaW4gaHR0cDovL2RlYmJ1Z3MuZ251Lm9yZy8xNjIzMiM1MAotLS0KIE5FV1MgICAgICAg fCAzICsrKwogc3JjL21haW4uYyB8IDYgKysrKysrCiAyIGZpbGVzIGNoYW5nZWQsIDkgaW5zZXJ0 aW9ucygrKQoKZGlmZiAtLWdpdCBhL05FV1MgYi9ORVdTCmluZGV4IDZjYmJjNDYuLmRmNTYzMmIg MTAwNjQ0Ci0tLSBhL05FV1MKKysrIGIvTkVXUwpAQCAtNyw2ICs3LDkgQEAgR05VIGdyZXAgTkVX UyAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgIC0qLSBvdXRsaW5lIC0qLQogICBn cmVwIG5vIGxvbmdlciBtaXNoYW5kbGVzIHBhdHRlcm5zIGxpa2UgW15eLX5dIGluIHVuaWJ5dGUg bG9jYWxlcy4KICAgW2J1ZyBpbnRyb2R1Y2VkIGluIGdyZXAtMi44XQoKKyAgZ3JlcCAtaSBpbiBh IG11bHRpYnl0ZSwgbm9uLVVURjggbG9jYWxlIGNvdWxkIGJlIHVwIHRvIDIwMCB0aW1lcyBzbG93 ZXIKKyAgdGhhbiBpbiAyLjE2LiAgW2J1ZyBpbnRyb2R1Y2VkIGluIGdyZXAtMi4xN10KKwoKICog Tm90ZXdvcnRoeSBjaGFuZ2VzIGluIHJlbGVhc2UgMi4xNyAoMjAxNC0wMi0xNykgW3N0YWJsZV0K CmRpZmYgLS1naXQgYS9zcmMvbWFpbi5jIGIvc3JjL21haW4uYwppbmRleCBiZDIwMjk3Li41NmVj NmIzIDEwMDY0NAotLS0gYS9zcmMvbWFpbi5jCisrKyBiL3NyYy9tYWluLmMKQEAgLTM0LDYgKzM0 LDcgQEAKICNpbmNsdWRlICJjLWN0eXBlLmgiCiAjaW5jbHVkZSAiY2xvc2VvdXQuaCIKICNpbmNs dWRlICJjb2xvcml6ZS5oIgorI2luY2x1ZGUgImRmYS5oIgogI2luY2x1ZGUgImVycm9yLmgiCiAj 
aW5jbHVkZSAiZXhjbHVkZS5oIgogI2luY2x1ZGUgImV4aXRmYWlsLmgiCkBAIC0xODgzLDYgKzE4 ODQsMTEgQEAgc3RhdGljIGJvb2wKIHRyaXZpYWxfY2FzZV9pZ25vcmUgKHNpemVfdCBsZW4sIGNo YXIgY29uc3QgKmtleXMsCiAgICAgICAgICAgICAgICAgICAgICBzaXplX3QgKm5ld19sZW4sIGNo YXIgKipuZXdfa2V5cykKIHsKKyAgLyogUGVyZm9ybSB0aGlzIHRyYW5zbGF0aW9uIG9ubHkgZm9y IFVURi04LiAgT3RoZXJ3aXNlLCB0aGlzIHdvdWxkIGluZHVjZQorICAgICBhIDEwMC0yMDB4IHBl cmZvcm1hbmNlIHBlbmFsdHkgZm9yIG5vbi1VVEY4IG11bHRpYnl0ZSBsb2NhbGVzLiAgKi8KKyAg aWYgKCAhIHVzaW5nX3V0ZjggKCkpCisgICAgcmV0dXJuIGZhbHNlOworCiAgIC8qIEZJWE1FOiBj b25zaWRlciByZW1vdmluZyB0aGUgZm9sbG93aW5nIHJlc3RyaWN0aW9uOgogICAgICBSZWplY3Qg aWYgS0VZUyBjb250YWluIEFTQ0lJICdcXCcgb3IgJ1snLiAgKi8KICAgaWYgKG1lbWNociAoa2V5 cywgJ1xcJywgbGVuKSB8fCBtZW1jaHIgKGtleXMsICdbJywgbGVuKSkKLS0gCjEuOS4wCgoKRnJv bSBkNWJmY2MyZGFhMTIzZmEwZTg2NjA5MDkwNTJmN2NhMmVjNmI3NjQ5IE1vbiBTZXAgMTcgMDA6 MDA6MDAgMjAwMQpGcm9tOiBKaW0gTWV5ZXJpbmcgPG1leWVyaW5nQGZiLmNvbT4KRGF0ZTogVGh1 LCAyMCBGZWIgMjAxNCAxNjowNjoxMyAtMDgwMApTdWJqZWN0OiBbUEFUQ0ggMy8zXSB0ZXN0czog dGVzdCBmb3IgdGhlIG5vbi1VVEY4IG11bHRpLWJ5dGUgcGVyZm9ybWFuY2UKIHJlZ3Jlc3Npb24K ClRlc3QgZm9yIHRoZSBqdXN0LWZpeGVkIHBlcmZvcm1hbmNlIHJlZ3Jlc3Npb24uCldpdGggYSAx MDAtMjAweCBkaWZmZXJlbnRpYWwsIGl0IGlzIHJlYXNvbmFibGUgdG8gZXhwZWN0IHRoYXQKYSB2 ZXJ5IHNsb3cgc3lzdGVtIHdpbGwgYmUgYWJsZSB0byBjb21wbGV0ZSB0aGUgZGVzaWduYXRlZAp0 YXNrIGluIGEgZmV3IHNlY29uZHMsIHdoaWxlIHdpdGggdGhlIGJ1ZywgZXZlbiBhIHZlcnkgZmFz dApzeXN0ZW0gd291bGQgZXhjZWVkIHRoZSB0aW1lb3V0LgoqIHRlc3RzL21iLW5vbi1VVEY4LXBl cmZvcm1hbmNlOiBOZXcgZmlsZS4KKiB0ZXN0cy9NYWtlZmlsZS5hbSAoVEVTVFMpOiBBZGQgaXQu CiogdGVzdHMvaW5pdC5jZmcgKHJlcXVpcmVfSlBfRVVDX2xvY2FsZV8pOiBOZXcgZnVuY3Rpb24u Ci0tLQogdGVzdHMvTWFrZWZpbGUuYW0gICAgICAgICAgICAgfCAgMSArCiB0ZXN0cy9pbml0LmNm ZyAgICAgICAgICAgICAgICB8IDE2ICsrKysrKysrKysrKysrKysKIHRlc3RzL21iLW5vbi1VVEY4 LXBlcmZvcm1hbmNlIHwgMzIgKysrKysrKysrKysrKysrKysrKysrKysrKysrKysrKysKIDMgZmls ZXMgY2hhbmdlZCwgNDkgaW5zZXJ0aW9ucygrKQogY3JlYXRlIG1vZGUgMTAwNzU1IHRlc3RzL21i 
LW5vbi1VVEY4LXBlcmZvcm1hbmNlCgpkaWZmIC0tZ2l0IGEvdGVzdHMvTWFrZWZpbGUuYW0gYi90 ZXN0cy9NYWtlZmlsZS5hbQppbmRleCAzMzE0NjdhLi40ZmZlYTg1IDEwMDY0NAotLS0gYS90ZXN0 cy9NYWtlZmlsZS5hbQorKysgYi90ZXN0cy9NYWtlZmlsZS5hbQpAQCAtNzIsNiArNzIsNyBAQCBU RVNUUyA9CQkJCQkJXAogICBraGFkYWZ5CQkJCQlcCiAgIGxvbmctbGluZS12cy0yR2lCLXJlYWQJ CQlcCiAgIG1heC1jb3VudC12cy1jb250ZXh0CQkJCVwKKyAgbWItbm9uLVVURjgtcGVyZm9ybWFu Y2UJCQlcCiAgIG11bHRpYnl0ZS13aGl0ZS1zcGFjZQkJCQlcCiAgIGVtcHR5LWxpbmUtbWIJCQkJ CVwKICAgdW5pYnl0ZS1icmFja2V0LWV4cHIJCQkJXApkaWZmIC0tZ2l0IGEvdGVzdHMvaW5pdC5j ZmcgYi90ZXN0cy9pbml0LmNmZwppbmRleCBlZTVkNTM3Li41NTU1ZTJkIDEwMDY0NAotLS0gYS90 ZXN0cy9pbml0LmNmZworKysgYi90ZXN0cy9pbml0LmNmZwpAQCAtMTAzLDYgKzEwMywyMiBAQCBy ZXF1aXJlX3VuaWJ5dGVfbG9jYWxlKCkKICAgc2tpcF8gJ25vIHVuaWJ5dGUgbG9jYWxlIGZvdW5k JwogfQoKK3JlcXVpcmVfSlBfRVVDX2xvY2FsZV8oKQoreworICBsb2NhbCBsb2NhbGU9amFfSlAu ZXVjSlAKKyAgcGF0aF9wcmVwZW5kXyAuCisgIGNhc2UgJChnZXQtbWItY3VyLW1heCAkbG9jYWxl KSBpbgorICAgIFsyM10pCisgICAgICAgIExDX0FMTD0kbG9jYWxlICYmCisgICAgICAgIGV4cG9y dCBMQ19BTEwgJiYKKyAgICAgICAgcmV0dXJuCisgICAgICAgIDs7CisgICAgKikgOzsKKyAgZXNh YworCisgIHNraXBfICIkbG9jIGxvY2FsZSBub3QgZm91bmQiCit9CisKIGV4cGVuc2l2ZV8oKQog ewogICBpZiB0ZXN0ICIkUlVOX0VYUEVOU0lWRV9URVNUUyIgIT0geWVzOyB0aGVuCmRpZmYgLS1n aXQgYS90ZXN0cy9tYi1ub24tVVRGOC1wZXJmb3JtYW5jZSBiL3Rlc3RzL21iLW5vbi1VVEY4LXBl cmZvcm1hbmNlCm5ldyBmaWxlIG1vZGUgMTAwNzU1CmluZGV4IDAwMDAwMDAuLjI4MmYwYzQKLS0t IC9kZXYvbnVsbAorKysgYi90ZXN0cy9tYi1ub24tVVRGOC1wZXJmb3JtYW5jZQpAQCAtMCwwICsx LDMyIEBACisjIS9iaW4vc2gKKyMgZ3JlcC0yLjE3IHdvdWxkIHRha2UgbmVhcmx5IDIwMHggbG9u Z2VyIHRvIHJ1biB0aGUgY29tbWFuZCBiZWxvdy4KKworIyBDb3B5cmlnaHQgMjAxNCBGcmVlIFNv ZnR3YXJlIEZvdW5kYXRpb24sIEluYy4KKworIyBUaGlzIHByb2dyYW0gaXMgZnJlZSBzb2Z0d2Fy ZTogeW91IGNhbiByZWRpc3RyaWJ1dGUgaXQgYW5kL29yIG1vZGlmeQorIyBpdCB1bmRlciB0aGUg dGVybXMgb2YgdGhlIEdOVSBHZW5lcmFsIFB1YmxpYyBMaWNlbnNlIGFzIHB1Ymxpc2hlZCBieQor IyB0aGUgRnJlZSBTb2Z0d2FyZSBGb3VuZGF0aW9uLCBlaXRoZXIgdmVyc2lvbiAzIG9mIHRoZSBM 
aWNlbnNlLCBvcgorIyAoYXQgeW91ciBvcHRpb24pIGFueSBsYXRlciB2ZXJzaW9uLgorCisjIFRo aXMgcHJvZ3JhbSBpcyBkaXN0cmlidXRlZCBpbiB0aGUgaG9wZSB0aGF0IGl0IHdpbGwgYmUgdXNl ZnVsLAorIyBidXQgV0lUSE9VVCBBTlkgV0FSUkFOVFk7IHdpdGhvdXQgZXZlbiB0aGUgaW1wbGll ZCB3YXJyYW50eSBvZgorIyBNRVJDSEFOVEFCSUxJVFkgb3IgRklUTkVTUyBGT1IgQSBQQVJUSUNV TEFSIFBVUlBPU0UuICBTZWUgdGhlCisjIEdOVSBHZW5lcmFsIFB1YmxpYyBMaWNlbnNlIGZvciBt b3JlIGRldGFpbHMuCisKKyMgWW91IHNob3VsZCBoYXZlIHJlY2VpdmVkIGEgY29weSBvZiB0aGUg R05VIEdlbmVyYWwgUHVibGljIExpY2Vuc2UKKyMgYWxvbmcgd2l0aCB0aGlzIHByb2dyYW0uICBJ ZiBub3QsIHNlZSA8aHR0cDovL3d3dy5nbnUub3JnL2xpY2Vuc2VzLz4uCisKKy4gIiR7c3JjZGly PS59L2luaXQuc2giOyBwYXRoX3ByZXBlbmRfIC4uL3NyYworcmVxdWlyZV90aW1lb3V0XworCitm YWlsPTAKKworcmVxdWlyZV9KUF9FVUNfbG9jYWxlXworCit5ZXMgJChwcmludGYgJyUwNzhkJyAw KSB8IGhlYWQgLTUwMDAwID4gaW4gfHwgZnJhbWV3b3JrX2ZhaWx1cmVfCisKKyMgRXhwZWN0IG5v IG1hdGNoLCBpLmUuLCBleGl0IHN0YXR1cyBvZiAxLiAgQW55dGhpbmcgZWxzZSBpcyBhbiBlcnJv ci4KK3RpbWVvdXQgNCBncmVwIC1pIGZvb2JhciBpbjsgc3Q9JD8KK3Rlc3QgJHN0ID0gMSB8fCBm YWlsPTEKKworRXhpdCAkZmFpbAotLSAKMS45LjAKCg== --047d7b6783f6ed596804f2dff029-- From debbugs-submit-bounces@debbugs.gnu.org Thu Feb 20 19:55:27 2014 Received: (at 16232) by debbugs.gnu.org; 21 Feb 2014 00:55:27 +0000 Received: from localhost ([127.0.0.1]:34108 helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.80) (envelope-from ) id 1WGeOi-0006tS-Vt for submit@debbugs.gnu.org; Thu, 20 Feb 2014 19:55:26 -0500 Received: from mail-pb0-f41.google.com ([209.85.160.41]:59641) by debbugs.gnu.org with esmtp (Exim 4.80) (envelope-from ) id 1WGeOe-0006t8-Mv for 16232@debbugs.gnu.org; Thu, 20 Feb 2014 19:55:21 -0500 Received: by mail-pb0-f41.google.com with SMTP id up15so2698171pbc.0 for <16232@debbugs.gnu.org>; Thu, 20 Feb 2014 16:55:14 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113; h=mime-version:sender:in-reply-to:references:from:date:message-id :subject:to:cc:content-type; bh=HcizmNBO2iXYBn1Xd/3IbFmpK/CSnattxhXW+AKDFNs=; 
b=EvENdDGTsswB3NO4M3VUCX2YPH9Q+wm0S7CrEr5bkZr12QPfFsRwFiE2QpA6hUulTx qWXSTQ6jDeO4oFLXCyw6rwVqJOuFtLvd8rJPAOI8OT02TQCT66fgkyH9CQCsu8QxwF/i xnOpQbeNNU6sMXGEmrNYLUSwIt5r7kvZR+Ca5jduKcpiBLhb+Ovc7QagxTOihnTsv5R0 9pc658FnAVJmU9mMgXBWmPnPVLfxQjHBsbP70D81LbTQ95qrpsRsSjgQCJRi8VrcbIHg sQko+DxnbTTMK6ZJ6KPQHmezx2RGxczSFmGB2ia9n4Eu0Ssr37/vv82nQZjkWClF+xWT 4rFg== X-Received: by 10.67.5.233 with SMTP id cp9mr5410899pad.147.1392944114377; Thu, 20 Feb 2014 16:55:14 -0800 (PST) MIME-Version: 1.0 Received: by 10.68.201.231 with HTTP; Thu, 20 Feb 2014 16:54:54 -0800 (PST) In-Reply-To: References: <52D290F7.6050002@draigBrady.com> <20140219232205.23D3.27F6AC2D@kcn.ne.jp> <53050C31.8000606@cs.ucla.edu> <201402201722.s1KHMn0j013167@freefriends.org> From: Jim Meyering Date: Thu, 20 Feb 2014 16:54:54 -0800 X-Google-Sender-Auth: lERXARsYsH4aSt5QIEdgiKZ0dTo Message-ID: Subject: Re: bug#16232: [PATCH] grep: make --ignore-case (-i) faster (sometimes 10x) in multibyte locales To: Aharon Robbins Content-Type: text/plain; charset=ISO-8859-1 X-Spam-Score: -0.7 (/) X-Debbugs-Envelope-To: 16232 Cc: 16232 <16232@debbugs.gnu.org>, Paul Eggert X-BeenThere: debbugs-submit@debbugs.gnu.org X-Mailman-Version: 2.1.15 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: debbugs-submit-bounces@debbugs.gnu.org Sender: "Debbugs-submit" X-Spam-Score: -0.7 (/) PS. if anyone has really slow hardware on which they can exercise this test, I'd appreciate your letting me know if you can make this new test fail. I chose the "4-second" limit presuming that there are few systems slow enough that they'd take longer than that. Technically, you could probably test with grep-2.16, if that's easier (but not 2.17). 
Run these commands:

  grep --version
  yes $(printf '%078d' 0)|head -50000 > k
  env LC_ALL=ja_JP.eucJP time grep -i foobar k

[use src/grep in place of "grep" above, if you're testing just-built
grep -- but in that case, you can just run "make check"]
and let me know if it takes longer than 4 seconds.

Ideally, I would compare timings of two commands, one with the offending
locale, and the other with LC_ALL=C. That would eliminate the scaling issue.

From debbugs-submit-bounces@debbugs.gnu.org Fri Mar 07 21:48:27 2014 Received: (at 16232-done) by debbugs.gnu.org; 8 Mar 2014 02:48:27 +0000 Received: from localhost ([127.0.0.1]:55416 helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.80) (envelope-from ) id 1WM7JL-0006jL-AK for submit@debbugs.gnu.org; Fri, 07 Mar 2014 21:48:27 -0500 Received: from smtp.cs.ucla.edu ([131.179.128.62]:41372) by debbugs.gnu.org with esmtp (Exim 4.80) (envelope-from ) id 1WM7JI-0006jD-UT for 16232-done@debbugs.gnu.org; Fri, 07 Mar 2014 21:48:25 -0500 Received: from localhost (localhost.localdomain [127.0.0.1]) by smtp.cs.ucla.edu (Postfix) with ESMTP id 1B07C39E8013 for <16232-done@debbugs.gnu.org>; Fri, 7 Mar 2014 18:48:24 -0800 (PST) X-Virus-Scanned: amavisd-new at smtp.cs.ucla.edu Received: from smtp.cs.ucla.edu ([127.0.0.1]) by localhost (smtp.cs.ucla.edu [127.0.0.1]) (amavisd-new, port 10024) with ESMTP id SmSphFL-cbiy for <16232-done@debbugs.gnu.org>; Fri, 7 Mar 2014 18:48:23 -0800 (PST) Received: from [192.168.1.9] (pool-108-0-233-62.lsanca.fios.verizon.net [108.0.233.62]) by smtp.cs.ucla.edu (Postfix) with ESMTPSA id C123239E8011 for <16232-done@debbugs.gnu.org>; Fri, 7 Mar 2014 18:48:23 -0800 (PST) Message-ID: <531A84F7.8010608@cs.ucla.edu> Date: Fri, 07 Mar 2014 18:48:23 -0800 From: Paul Eggert Organization: UCLA Computer Science Department User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:24.0) Gecko/20100101 Thunderbird/24.3.0 MIME-Version: 1.0 To: 16232-done@debbugs.gnu.org Subject: Re: [PATCH] grep: make --ignore-case (-i)
faster (sometimes 10x) in multibyte locales Content-Type: text/plain; charset=UTF-8; format=flowed Content-Transfer-Encoding: 7bit X-Spam-Score: -2.3 (--) X-Debbugs-Envelope-To: 16232-done X-BeenThere: debbugs-submit@debbugs.gnu.org X-Mailman-Version: 2.1.15 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: debbugs-submit-bounces@debbugs.gnu.org Sender: "Debbugs-submit" X-Spam-Score: -2.3 (--) This patch has been installed, so I'm taking the liberty of closing the bug report. From unknown Mon Jun 23 16:44:58 2025 Received: (at fakecontrol) by fakecontrolmessage; To: internal_control@debbugs.gnu.org From: Debbugs Internal Request Subject: Internal Control Message-Id: bug archived. Date: Sat, 05 Apr 2014 11:24:03 +0000 User-Agent: Fakemail v42.6.9 # This is a fake control message. # # The action: # bug archived. thanks # This fakemail brought to you by your local debbugs # administrator