From unknown Wed Sep 24 09:10:15 2025
Content-Disposition: inline
Content-Transfer-Encoding: quoted-printable
MIME-Version: 1.0
X-Mailer: MIME-tools 5.509 (Entity 5.509)
Content-Type: text/plain; charset=utf-8
From: bug#16927 <16927@debbugs.gnu.org>
To: bug#16927 <16927@debbugs.gnu.org>
Subject: Status: [PATCH] grep: avoid to add same character to a bracket expression
Reply-To: bug#16927 <16927@debbugs.gnu.org>
Date: Wed, 24 Sep 2025 16:10:15 +0000

retitle 16927 [PATCH] grep: avoid to add same character to a bracket expression
reassign 16927 grep
submitter 16927 Norihiro Tanaka
severity 16927 normal
tag 16927 patch
thanks


From debbugs-submit-bounces@debbugs.gnu.org Mon Mar 03 08:13:26 2014
Received: (at submit) by debbugs.gnu.org; 3 Mar 2014 13:13:26 +0000
Received: from localhost ([127.0.0.1]:48624 helo=debbugs.gnu.org)
 by debbugs.gnu.org with esmtp (Exim 4.80)
 (envelope-from )
 id 1WKSgP-0000TG-J4
 for submit@debbugs.gnu.org; Mon, 03 Mar 2014 08:13:25 -0500
Received: from pbsg500.nifty.com ([202.248.238.70]:49998)
 by debbugs.gnu.org with esmtp (Exim 4.80)
 (envelope-from )
 id 1WKSgL-0000T3-1g
 for submit@debbugs.gnu.org; Mon, 03 Mar 2014 08:13:23 -0500
Received: from [10.120.1.42] (i118-21-128-66.s30.a048.ap.plala.or.jp [118.21.128.66])
 (authenticated)
 by pbsg500.nifty.com with ESMTP id s23DD0JK005823
 for ; Mon, 3 Mar 2014 22:13:01 +0900
X-Nifty-SrcIP: [118.21.128.66]
Date: Mon, 03 Mar 2014 22:13:00 +0900
From: Norihiro Tanaka
To: submit@debbugs.gnu.org
Subject: [PATCH] grep: avoid to add same character to a bracket expression
Message-Id: <20140303221254.D29E.27F6AC2D@kcn.ne.jp>
MIME-Version: 1.0
Content-Type: multipart/mixed; boundary="------_53147BE200000000D29A_MULTIPART_MIXED_"
Content-Transfer-Encoding: 7bit
X-Mailer: Becky! ver. 2.64.06 [ja]
X-Spam-Score: 4.8 (++++)
X-Spam-Report: Spam detection software, running on the system "debbugs.gnu.org", has identified this incoming email as possible spam.
 The original message has been attached to this so you can view it (if it isn't spam) or label similar future email. If you have any questions, see the administrator of that system for details.
 Content preview: Package: grep Tags: patch The patch avoids adding the same character twice to a bracket expression in trivial_case_ignore. That may produce a shorter token sequence in multibyte locales. For example, FULLWIDTH LATIN SMALL LETTER A (ef bd 81) is transformed as shown below, because multibyte characters in a CSET are expanded to OR expressions in the DFA. [...]
 Content analysis details: (4.8 points, 10.0 required)
  pts rule name              description
 ---- ---------------------- --------------------------------------------------
  1.2 RCVD_IN_BL_SPAMCOP_NET RBL: Received via a relay in bl.spamcop.net
                             [Blocked - see ]
 -0.0 SPF_HELO_PASS          SPF: HELO matches SPF record
 -0.0 SPF_PASS               SPF: sender matches SPF record
 -0.0 RP_MATCHES_RCVD        Envelope sender domain matches handover relay domain
  3.6 OBFU_TEXT_ATTACH       BODY: Text attachment with non-text MIME type
X-Debbugs-Envelope-To: submit
X-BeenThere: debbugs-submit@debbugs.gnu.org
X-Mailman-Version: 2.1.15
Precedence: list
List-Id:
List-Unsubscribe: ,
List-Archive:
List-Post:
List-Help:
List-Subscribe: ,
Errors-To: debbugs-submit-bounces@debbugs.gnu.org
Sender: "Debbugs-submit"

--------_53147BE200000000D29A_MULTIPART_MIXED_
Content-Type: text/plain; charset="US-ASCII"
Content-Transfer-Encoding: 7bit

Package: grep
Tags: patch

The patch avoids adding the same character twice to a bracket
expression in trivial_case_ignore. That may produce a shorter token
sequence in multibyte locales.

For example, FULLWIDTH LATIN SMALL LETTER A (ef bd 81) is transformed
as shown below, because multibyte characters in a CSET are expanded to
OR expressions in the DFA.

Before the patch:

  [aaA] (where each character is fullwidth)
  EF BD CAT 81 CAT EF BD CAT 81 CAT OR EF BC CAT A1 CAT OR

After the patch:

  [aA] (where each character is fullwidth)
  EF BD CAT 81 CAT EF BC CAT A1 CAT OR

--------_53147BE200000000D29A_MULTIPART_MIXED_
Content-Type: application/octet-stream; name="patch.txt"
Content-Disposition: attachment; filename="patch.txt"
Content-Transfer-Encoding: base64

RnJvbSA2NmI5MWVkMTdmN2UyYmY0NDRiZDM1MDE5ZDM4MDMxZWQyM2VlNzg2IE1vbiBTZXAgMTcg
MDA6MDA6MDAgMjAwMQpGcm9tOiBOb3JpaGlybyBUYW5ha2EgPG5vcml0bmtAa2NuLm5lLmpwPgpE
YXRlOiBNb24sIDMgTWFyIDIwMTQgMjE6NDM6NDggKzA5MDAKU3ViamVjdDogW1BBVENIXSBncmVw
OiBhdm9pZCB0byBhZGQgc2FtZSBjaGFyYWN0ZXIgdG8gYSBicmFja2V0IGV4cHJlc3Npb24KCiog
c3JjL21haW4uYyAodHJpdmlhbF9pZ25vcmVfY2FzZSk6IE9ubHkgd2hlbiB1cHBlcmNhc2UgYW5k
L29yCmxvd2VyY2FzZSBpcyBkaWZmZXJlbnQgZnJvbSBvcmlnaW5hbCBjaGFyYWN0ZXIsIGFkZCBp
dCB0byBuZXcgcGF0dGVybi4KLS0tCiBzcmMvbWFpbi5jIHwgMjMgKysrKysrKysrKysrKystLS0t
LS0tLS0KIDEgZmlsZSBjaGFuZ2VkLCAxNCBpbnNlcnRpb25zKCspLCA5IGRlbGV0aW9ucygtKQoK
ZGlmZiAtLWdpdCBhL3NyYy9tYWluLmMgYi9zcmMvbWFpbi5jCmluZGV4IDE0YjdiZTIuLjFlN2M3
OWQgMTAwNjQ0Ci0tLSBhL3NyYy9tYWluLmMKKysrIGIvc3JjL21haW4uYwpAQCAtMTkzMywxNSAr
MTkzMywyMCBAQCB0cml2aWFsX2Nhc2VfaWdub3JlIChzaXplX3QgbGVuLCBjaGFyIGNvbnN0ICpr
ZXlzLAogICAgICAgICAgIG1lbWNweSAocCwgb3JpZywgbik7CiAgICAgICAgICAgcCArPSBuOwog
Ci0gICAgICAgICAgc2l6ZV90IGxjYnl0ZXMgPSBXQ1JUT01CIChwLCBsYywgJm1iX3N0YXRlKTsK
LSAgICAgICAgICBpZiAobGNieXRlcyA9PSAoc2l6ZV90KSAtMSkKLSAgICAgICAgICAgIGdvdG8g
c2tpcF9jYXNlX2lnbm9yZV9vcHRpbWl6YXRpb247Ci0gICAgICAgICAgcCArPSBsY2J5dGVzOwot
Ci0gICAgICAgICAgc2l6ZV90IHVjYnl0ZXMgPSBXQ1JUT01CIChwLCB1YywgJm1iX3N0YXRlKTsK
LSAgICAgICAgICBpZiAodWNieXRlcyA9PSAoc2l6ZV90KSAtMSB8fCAhIG1ic2luaXQgKCZtYl9z
dGF0ZSkpCi0gICAgICAgICAgICBnb3RvIHNraXBfY2FzZV9pZ25vcmVfb3B0aW1pemF0aW9uOwot
ICAgICAgICAgIHAgKz0gdWNieXRlczsKKyAgICAgICAgICBpZiAobGMgIT0gd2MpCisgICAgICAg
ICAgICB7CisgICAgICAgICAgICAgIHNpemVfdCBsY2J5dGVzID0gV0NSVE9NQiAocCwgbGMsICZt
Yl9zdGF0ZSk7CisgICAgICAgICAgICAgIGlmIChsY2J5dGVzID09IChzaXplX3QpIC0xKQorICAg
ICAgICAgICAgICAgIGdvdG8gc2tpcF9jYXNlX2lnbm9yZV9vcHRpbWl6YXRpb247CisgICAgICAg
ICAgICAgIHAgKz0gbGNieXRlczsKKyAgICAgICAgICAgIH0KKyAgICAgICAgICBpZiAodWMgIT0g
d2MpCisgICAgICAgICAgICB7CisgICAgICAgICAgICAgIHNpemVfdCB1Y2J5dGVzID0gV0NSVE9N
QiAocCwgdWMsICZtYl9zdGF0ZSk7CisgICAgICAgICAgICAgIGlmICh1Y2J5dGVzID09IChzaXpl
X3QpIC0xIHx8ICEgbWJzaW5pdCAoJm1iX3N0YXRlKSkKKyAgICAgICAgICAgICAgICBnb3RvIHNr
aXBfY2FzZV9pZ25vcmVfb3B0aW1pemF0aW9uOworICAgICAgICAgICAgICBwICs9IHVjYnl0ZXM7
CisgICAgICAgICAgICB9CiAKICAgICAgICAgICAqcCsrID0gJ10nOwogICAgICAgICB9Ci0tIAox
LjguNS4yCgo=
--------_53147BE200000000D29A_MULTIPART_MIXED_--


From debbugs-submit-bounces@debbugs.gnu.org Mon Mar 03 10:31:23 2014
Received: (at 16927-done) by debbugs.gnu.org; 3 Mar 2014 15:31:24 +0000
Received: from localhost ([127.0.0.1]:49252 helo=debbugs.gnu.org)
 by debbugs.gnu.org with esmtp (Exim 4.80)
 (envelope-from )
 id 1WKUpv-0005OU-H4
 for submit@debbugs.gnu.org; Mon, 03 Mar 2014 10:31:23 -0500
Received: from smtp.cs.ucla.edu ([131.179.128.62]:52022)
 by debbugs.gnu.org with esmtp (Exim 4.80)
 (envelope-from )
 id 1WKUps-0005OM-Sh
 for 16927-done@debbugs.gnu.org; Mon, 03 Mar 2014 10:31:21 -0500
Received: from localhost (localhost.localdomain [127.0.0.1])
 by smtp.cs.ucla.edu (Postfix) with ESMTP id 256B839E8015;
 Mon, 3 Mar 2014 07:31:20 -0800 (PST)
X-Virus-Scanned: amavisd-new at smtp.cs.ucla.edu
Received: from smtp.cs.ucla.edu ([127.0.0.1])
 by localhost (smtp.cs.ucla.edu [127.0.0.1]) (amavisd-new, port 10024)
 with ESMTP id TIpcJTW57ZMQ; Mon, 3 Mar 2014 07:31:19 -0800 (PST)
Received: from [192.168.1.9] (pool-108-0-233-62.lsanca.fios.verizon.net [108.0.233.62])
 by smtp.cs.ucla.edu (Postfix) with ESMTPSA id CA75E39E8008;
 Mon, 3 Mar 2014 07:31:19 -0800 (PST)
Message-ID: <5314A047.6000905@cs.ucla.edu>
Date: Mon, 03 Mar 2014 07:31:19 -0800
From: Paul Eggert
Organization: UCLA Computer Science Department
User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:24.0) Gecko/20100101 Thunderbird/24.3.0
MIME-Version: 1.0
To: Norihiro Tanaka , 16927-done@debbugs.gnu.org
Subject: Re: bug#16927: [PATCH] grep: avoid to add same character to a bracket expression
References: <20140303221254.D29E.27F6AC2D@kcn.ne.jp>
In-Reply-To: <20140303221254.D29E.27F6AC2D@kcn.ne.jp>
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 7bit
X-Spam-Score: -2.3 (--)
X-Debbugs-Envelope-To: 16927-done
X-BeenThere: debbugs-submit@debbugs.gnu.org
X-Mailman-Version: 2.1.15
Precedence: list
List-Id:
List-Unsubscribe: ,
List-Archive:
List-Post:
List-Help:
List-Subscribe: ,
Errors-To: debbugs-submit-bounces@debbugs.gnu.org
Sender: "Debbugs-submit"

Thanks, I installed that.


From unknown Wed Sep 24 09:10:15 2025
Received: (at fakecontrol) by fakecontrolmessage;
To: internal_control@debbugs.gnu.org
From: Debbugs Internal Request
Subject: Internal Control
Message-Id: bug archived.
Date: Tue, 01 Apr 2014 11:24:03 +0000
User-Agent: Fakemail v42.6.9

# This is a fake control message.
#
# The action:
# bug archived.
thanks
# This fakemail brought to you by your local debbugs
# administrator
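
The idea in the report (emit the lowercase or uppercase form only when it
differs from the character already written) can be sketched in isolation.
The following is a minimal sketch, not grep's trivial_case_ignore: the
function name case_bracket and its buffer contract are hypothetical, and it
works on wide characters directly instead of converting to multibyte
sequences with WCRTOMB as the patch does.

```c
#include <assert.h>
#include <stddef.h>
#include <wchar.h>
#include <wctype.h>

/* Hypothetical helper (not part of grep): write into OUT a bracket
   expression that matches WC in either case, and return the number of
   wide characters written, excluding the terminating L'\0'.
   OUT must have room for 6 wide characters: '[', up to three
   characters, ']', and the terminator.  */
size_t
case_bracket (wchar_t wc, wchar_t *out)
{
  assert (out != NULL);

  wchar_t lc = towlower (wc);
  wchar_t uc = towupper (wc);
  size_t n = 0;

  out[n++] = L'[';
  out[n++] = wc;          /* the original character always comes first */
  if (lc != wc)           /* skip the lowercase form if it is a repeat */
    out[n++] = lc;
  if (uc != wc)           /* skip the uppercase form if it is a repeat */
    out[n++] = uc;
  out[n++] = L']';
  out[n] = L'\0';
  return n;
}
```

Without the two comparisons, a lowercase input would always yield three
characters inside the brackets, with the original repeated once; that
repeat is what produces the extra OR branch in the "before" token sequence
of the report.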