From: Norihiro Tanaka
To: bug-grep@gnu.org
Date: Sat, 18 Oct 2014 21:39:37 +0900
Subject: bug#18762: [PATCH] dfa: don't consider RE_DOT_NEWLINE and RE_DOT_NOT_NULL in matching with a bracket expression

RE_DOT_NEWLINE and RE_DOT_NOT_NULL affect only '.' in regex. In DFA,
however, they affect MBCSET as well as '.'. This patch makes DFA behave
the same way as regex.

BTW, grep and gawk currently never use match_mb_charset, the function
this patch fixes.

Attachment: 0001-dfa-don-t-consider-RE_DOT_NEWLINE-and-RE_DOT_NOT_NUL.patch

From 10876f05010e2df3a0705e95a1be62cbb990fbfa Mon Sep 17 00:00:00 2001
From: Norihiro Tanaka <noritnk@kcn.ne.jp>
Date: Wed, 15 Oct 2014 08:24:23 +0900
Subject: [PATCH] dfa: don't consider RE_DOT_NEWLINE and RE_DOT_NOT_NULL in
 matching with a bracket expression

RE_DOT_NEWLINE and RE_DOT_NOT_NULL should be apply to a dot only
which matches any character.  So don't consider RE_DOT_NEWLINE and
RE_DOT_NOT_NULL in matching with a bracket expression.

* src/dfa.c (match_mb_charset): Remove RE_DOT_NEWLINE and RE_DOT_NOT_NULL.
---
 src/dfa.c | 12 +-----------
 1 file changed, 1 insertion(+), 11 deletions(-)

diff --git a/src/dfa.c b/src/dfa.c
index 58a4b83..a4c48b5 100644
--- a/src/dfa.c
+++ b/src/dfa.c
@@ -2998,17 +2998,7 @@ match_mb_charset (struct dfa *d, state_num s, position pos,
   int context;
 
   /* Check syntax bits.  */
-  if (wc == (wchar_t) eolbyte)
-    {
-      if (!(syntax_bits & RE_DOT_NEWLINE))
-        return 0;
-    }
-  else if (wc == (wchar_t) '\0')
-    {
-      if (syntax_bits & RE_DOT_NOT_NULL)
-        return 0;
-    }
-  else if (wc == WEOF)
+  if (wc == WEOF)
     return 0;
 
   context = wchar_context (wc);
-- 
2.1.1
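
The regex-side rule that this patch adopts can be checked directly against glibc's GNU regex interface (re_syntax_options, re_compile_pattern, re_search, available with _GNU_SOURCE). The sketch below is an illustration, not grep or dfa.c code: it assumes glibc and the default C locale, and gnu_match and the two SYNTAX_* constants are hypothetical names. Whether a non-matching list such as [^a] may match a newline is controlled by a separate bit, RE_HAT_LISTS_NOT_NEWLINE, which RE_SYNTAX_POSIX_EXTENDED leaves clear.

```c
#define _GNU_SOURCE           /* expose the GNU regex interface in <regex.h> */
#include <regex.h>
#include <string.h>

/* Hypothetical helper (not part of grep or dfa.c): compile PATTERN with
   the GNU syntax bits SYNTAX and report whether it matches anywhere in
   the LEN-byte buffer TEXT, which may contain NUL bytes.
   Returns 1 for a match, 0 for no match, -1 if compilation fails.  */
static int
gnu_match (const char *pattern, reg_syntax_t syntax,
           const char *text, int len)
{
  struct re_pattern_buffer buf;
  regoff_t pos;

  memset (&buf, 0, sizeof buf);   /* no fastmap, no translate table */
  re_syntax_options = syntax;     /* re_compile_pattern reads this global */
  if (re_compile_pattern (pattern, strlen (pattern), &buf) != NULL)
    return -1;

  /* Try every start offset from 0 to LEN; no registers are needed.  */
  pos = re_search (&buf, text, len, 0, len, NULL);
  regfree (&buf);
  return pos >= 0;
}

/* RE_DOT_NEWLINE cleared: '.' must not match a newline.  */
static const reg_syntax_t SYNTAX_NO_DOT_NEWLINE
  = RE_SYNTAX_POSIX_EXTENDED & ~RE_DOT_NEWLINE;

/* RE_DOT_NOT_NULL set (explicitly): '.' must not match a NUL byte.  */
static const reg_syntax_t SYNTAX_DOT_NOT_NULL
  = RE_SYNTAX_POSIX_EXTENDED | RE_DOT_NOT_NULL;
```

With these definitions, gnu_match (".", SYNTAX_NO_DOT_NEWLINE, "\n", 1) should return 0 while the same call with "[^a]" should return 1; on a lone NUL byte, SYNTAX_DOT_NOT_NULL separates '.' and [^a] in the same way. Passing an explicit length to re_search is what allows the buffer to contain a NUL.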

From: Jim Meyering
To: Norihiro Tanaka
Cc: 18762@debbugs.gnu.org
Date: Sat, 18 Oct 2014 10:06:33 -0700
Subject: bug#18762: [PATCH] dfa: don't consider RE_DOT_NEWLINE and RE_DOT_NOT_NULL in matching with a bracket expression

On Sat, Oct 18, 2014 at 5:39 AM, Norihiro Tanaka wrote:
> RE_DOT_NEWLINE and RE_DOT_NOT_NULL affect only '.' in regex. In DFA,
> however, they affect MBCSET as well as '.'. This patch makes DFA behave
> the same way as regex.
>
> BTW, grep and gawk currently never use match_mb_charset, the function
> this patch fixes.

Thank you for the patch. It is clearly correct.
However, it presents a puzzle: does your patch induce any semantic
change in grep? I.e., is this a bug fix, or simply the removal of
code that would have no effect?

So far, I have been unable to construct a case for which it induces
a semantic change. On the other hand, this does eliminate a few
comparisons, so there may be a small performance improvement.

From: Norihiro Tanaka
To: Jim Meyering
Cc: 18762@debbugs.gnu.org
Date: Sun, 19 Oct 2014 08:30:06 +0900
Subject: bug#18762: [PATCH] dfa: don't consider RE_DOT_NEWLINE and RE_DOT_NOT_NULL in matching with a bracket expression

Thanks for the review.

This is a potential bug fix. However, grep and gawk never call
match_mb_charset, because DFA treats MBCSET like BACKREF via the
following code whenever `backref' is provided. Therefore the fix never
induces any semantic change in grep or gawk.

    if (d->states[s].has_mbcset && backref)
      {
        *backref = 1;
        goto done;
      }

Essentially, the function could be removed. However, if we regard DFA
as a library, we should keep it.
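
The division of labour described above (the fast matcher bails out through *backref and its caller falls back to a complete matcher) can be sketched as follows. This is a hypothetical toy, not dfa.c or grep code: fast_contains stands in for the DFA, a byte >= 0x80 stands in for a construct like MBCSET that the fast path declines to decide, and slow_contains stands in for the regex matcher.

```c
#include <stddef.h>
#include <string.h>

/* Fast path (hypothetical): does BUF contain the byte C?  If BACKREF is
   non-null and a non-ASCII byte is reached first, give up and set
   *BACKREF, much as dfa.c sets *backref on reaching an MBCSET state.
   If BACKREF is null, non-ASCII bytes are handled like any other byte.  */
static int
fast_contains (const unsigned char *buf, size_t len, unsigned char c,
               int *backref)
{
  for (size_t i = 0; i < len; i++)
    {
      if (buf[i] >= 0x80 && backref)
        {
          *backref = 1;
          return 0;   /* not an answer: the caller must fall back */
        }
      if (buf[i] == c)
        return 1;
    }
  return 0;
}

/* Slow but complete path (hypothetical stand-in for the regex matcher).  */
static int
slow_contains (const unsigned char *buf, size_t len, unsigned char c)
{
  return memchr (buf, c, len) != NULL;
}

/* Caller: try the fast path first, and defer to the slow path when asked.  */
static int
contains (const unsigned char *buf, size_t len, unsigned char c)
{
  int backref = 0;
  int r = fast_contains (buf, len, c, &backref);
  return backref ? slow_contains (buf, len, c) : r;
}
```

As in the dfa.c snippet, the fallback happens only when the caller supplies a backref pointer; a caller that passes NULL gets the fast path's own answer, analogous to dfa.c handling MBCSET itself through match_mb_charset.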

From: Jim Meyering
To: Norihiro Tanaka
Cc: 18762@debbugs.gnu.org
Date: Sat, 18 Oct 2014 17:16:13 -0700
Subject: bug#18762: [PATCH] dfa: don't consider RE_DOT_NEWLINE and RE_DOT_NOT_NULL in matching with a bracket expression

On Sat, Oct 18, 2014 at 4:30 PM, Norihiro Tanaka wrote:
> Thanks for the review.
>
> This is a potential bug fix. However, grep and gawk never call
> match_mb_charset, because DFA treats MBCSET like BACKREF via the
> following code whenever `backref' is provided.

dfa.c's match_mb_charset function *is* used, e.g., in a
command like this one:

    printf '\0' |src/grep -aE '^\s?$'

However, as I mentioned, so far I have been unable to construct
a combination of syntax_bits settings and input/RE pairs that
induces a change in behavior.

> Therefore the fix never
> induces any semantic change in grep or gawk.
>
>     if (d->states[s].has_mbcset && backref)
>       {
>         *backref = 1;
>         goto done;
>       }
>
> Essentially, the function could be removed. However, if we regard DFA
> as a library, we should keep it.

From: Norihiro Tanaka
To: Jim Meyering
Cc: 18762@debbugs.gnu.org
Date: Sun, 19 Oct 2014 11:07:52 +0900
Subject: bug#18762: [PATCH] dfa: don't consider RE_DOT_NEWLINE and RE_DOT_NOT_NULL in matching with a bracket expression

Jim Meyering wrote:
> dfa.c's match_mb_charset function *is* used, e.g., in a
> command like this one:
>
>     printf '\0' |src/grep -aE '^\s?$'

Wow, that isn't good. I think `fails' should behave the same as
`trans', including the following part; the only difference should be
that `fails' checks the acceptance conditions. match_mb_charset()
should be avoided as far as possible, as it doesn't support collating
symbols and equivalence classes.

> /* Falling back to the glibc matcher in this case gives
>    better performance (up to 25% better on [a-z], for
>    example) and enables support for collating symbols and
>    equivalence classes.  */
> if (d->states[s].has_mbcset && backref)
>   {
>     *backref = 1;
>     goto done;
>   }

Attachment: 0001-dfa-fall-MBCSET-back-to-the-glibc-matcher-in-transit.patch (base64-encoded)

RnJvbSAzZDAwNmI2MDUzMzdiMmIyYWE2ZjIzNTU3YTE5NDlmNmY1ZGU0NjEzIE1vbiBTZXAgMTcg MDA6MDA6MDAgMjAwMQpGcm9tOiBOb3JpaGlybyBUYW5ha2EgPG5vcml0bmtAa2NuLm5lLmpwPgpE YXRlOiBTdW4sIDE5IE9jdCAyMDE0IDEwOjQwOjE4ICswOTAwClN1YmplY3Q6IFtQQVRDSF0gZGZh OiBmYWxsIE1CQ1NFVCBiYWNrIHRvIHRoZSBnbGliYyBtYXRjaGVyIGluIHRyYW5zaXRpb24gYXQK IGFjY2VwdGFibGUgcG9zaXRpb24KCkRGQSBkb2Vzbid0IHN1cHBvcnQgY29sbGF0aW5nIHN5bWJv bHMgYW5kIGVxdWl2YWxlbmNlIGNsYXNzZXMsIHNvIGZhbGwKTUJDU0VUIGJhY2sgdG8gdGhlIGds aWJjIG1hdGNoZXIgaW4gdHJhbnNpdGlvbiBhdCBhY2NlcHRhYmxlIHBvc2l0aW9uLgpCVFcgYXQg b3RoZXIgcG9zaXRpb25zLCBoYXMgYWxyZWFkeSBmYWxsZW4gaXQgYmFjay4KCiogc3JjL2RmYS5j IChkZmFleGVjX21haW4pOiBEbyBpdC4KLS0tCiBzcmMvZGZhLmMgfCAyMCArKysrKysrKysrLS0t LS0tLS0tLQogMSBmaWxlIGNoYW5nZWQsIDEwIGluc2VydGlvbnMoKyksIDEwIGRlbGV0aW9ucygt KQoKZGlmZiAtLWdpdCBhL3NyYy9kZmEuYyBiL3NyYy9kZmEuYwppbmRleCA1OGE0YjgzLi5kZTgz Njg5IDEwMDY0NAotLS0gYS9zcmMvZGZhLmMKKysrIGIvc3JjL2RmYS5jCkBAIC0zMzM4LDIwICsz MzM4LDIwIEBAIGRmYWV4ZWNfbWFpbiAoc3RydWN0IGRmYSAqZCwgY2hhciBjb25zdCAqYmVnaW4s IGNoYXIgKmVuZCwKICAgICAgICAgICAgICAgICAgIGNvbnRpbnVlOwogICAgICAgICAgICAgICAg ICAgIH0KIAotICAgICAgICAgICAgICAvKiBGYWxsaW5nIGJhY2sgdG8gdGhlIGdsaWJjIG1hdGNo ZXIgaW4gdGhpcyBjYXNlIGdpdmVzCi0gICAgICAgICAgICAgICAgIGJldHRlciBwZXJmb3JtYW5j ZSAodXAgdG8gMjUlIGJldHRlciBvbiBbYS16XSwgZm9yCi0gICAgICAgICAgICAgICAgIGV4YW1w bGUpIGFuZCBlbmFibGVzIHN1cHBvcnQgZm9yIGNvbGxhdGluZyBzeW1ib2xzIGFuZAotICAgICAg ICAgICAgICAgZXF1aXZhbGVuY2UgY2xhc3Nlcy4gICovCi0gICAgICAgICAgICAgIGlmIChkLT5z dGF0ZXNbc10uaGFzX21iY3NldCAmJiBiYWNrcmVmKQotICAgICAgICAgICAgICAgIHsKLSAgICAg
ICAgICAgICAgICAgKmJhY2tyZWYgPSAxOwotICAgICAgICAgICAgICAgICAgZ290byBkb25lOwot ICAgICAgICAgICAgICAgIH0KLQogICAgICAgICAgICAgICAvKiBUaGUgZm9sbG93aW5nIGNvZGUg aXMgdXNlZCB0d2ljZS4KICAgICAgICAgICAgICAgICBVc2UgYSBtYWNybyB0byBhdm9pZCB0aGUg cmlzayB0aGF0IHRoZXkgZGl2ZXJnZS4gICovCiAjZGVmaW5lIFN0YXRlX3RyYW5zaXRpb24oKSAg ICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgXAogICBkbyB7ICAg ICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAg ICAgICAgXAorICAgICAgICAgICAgICAvKiBGYWxsaW5nIGJhY2sgdG8gdGhlIGdsaWJjIG1hdGNo ZXIgaW4gdGhpcyBjYXNlIGdpdmVzICAgXAorICAgICAgICAgICAgICAgICBiZXR0ZXIgcGVyZm9y bWFuY2UgKHVwIHRvIDI1JSBiZXR0ZXIgb24gW2Etel0sIGZvciAgICAgXAorICAgICAgICAgICAg ICAgICBleGFtcGxlKSBhbmQgZW5hYmxlcyBzdXBwb3J0IGZvciBjb2xsYXRpbmcgc3ltYm9scyBh bmQgXAorICAgICAgICAgICAgICAgICBlcXVpdmFsZW5jZSBjbGFzc2VzLiAgKi8gICAgICAgICAg ICAgICAgICAgICAgICAgICAgICAgXAorICAgICAgICAgICAgICBpZiAoZC0+c3RhdGVzW3NdLmhh c19tYmNzZXQgJiYgYmFja3JlZikgICAgICAgICAgICAgICAgICAgXAorICAgICAgICAgICAgICAg IHsgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAg XAorICAgICAgICAgICAgICAgICAgKmJhY2tyZWYgPSAxOyAgICAgICAgICAgICAgICAgICAgICAg ICAgICAgICAgICAgICAgICAgXAorICAgICAgICAgICAgICAgICAgZ290byBkb25lOyAgICAgICAg ICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgXAorICAgICAgICAgICAgICAgIH0g ICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgXAor ICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAg ICAgICAgICAgICAgICAgXAogICAgICAgICAgICAgICAvKiBDYW4gbWF0Y2ggd2l0aCBhIG11bHRp Ynl0ZSBjaGFyYWN0ZXIgKGFuZCBtdWx0aS1jaGFyYWN0ZXIgXAogICAgICAgICAgICAgICAgICBj b2xsYXRpbmcgZWxlbWVudCkuICBUcmFuc2l0aW9uIHRhYmxlIG1pZ2h0IGJlIHVwZGF0ZWQuICAq LyBcCiAgICAgICAgICAgICAgIHMgPSB0cmFuc2l0X3N0YXRlIChkLCBzLCAmcCwgKHVuc2lnbmVk IGNoYXIgKikgZW5kKTsgICAgICBcCi0tIAoyLjEuMQoK
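
According to its commit message, the attached patch makes transitions at accepting positions fall back to the glibc matcher as well. It does this by moving the has_mbcset/backref check into the State_transition() macro, whose own comment says the code "is used twice" and is a macro "to avoid the risk that they diverge". The toy below sketches that design under heavy simplification. struct toy_state, TOY_TRANSITION and toy_walk are hypothetical; the accepting-state branch loosely plays the role of the `fails' path and the other branch that of the `trans' path.

```c
#include <stdbool.h>
#include <stddef.h>

/* Toy DFA state (hypothetical; far simpler than dfa.c's).  */
struct toy_state
{
  bool accepting;    /* a match may end here */
  bool has_mbcset;   /* needs the full regex matcher */
  int next;          /* successor state for any input byte */
};

/* The check-and-step sequence that both paths share.  Because the
   fallback test lives inside the macro, neither path can skip it.
   It expects S, STATES and BACKREF in scope and a 'done' label in the
   enclosing function.  */
#define TOY_TRANSITION()                        \
  do {                                          \
    if (states[s].has_mbcset && backref)        \
      {                                         \
        *backref = 1;                           \
        goto done;                              \
      }                                         \
    s = states[s].next;                         \
  } while (0)

/* Walk LEN input bytes from state 0.  Returns 1 if an accepting state
   was reached, 0 if not, or -1 after setting *BACKREF.  */
static int
toy_walk (const struct toy_state *states, size_t len, int *backref)
{
  int s = 0;
  bool matched = false;

  if (backref)
    *backref = 0;
  for (size_t i = 0; i < len; i++)
    {
      if (states[s].accepting)
        {
          /* Path 1 ("fails"-like): note the accepting position,
             then take the shared transition.  */
          matched = true;
          TOY_TRANSITION ();
        }
      else
        /* Path 2 ("trans"-like).  */
        TOY_TRANSITION ();
    }
  return matched || states[s].accepting;

 done:
  return -1;
}
```

The do { ... } while (0) wrapper makes each use read as a single statement, so it is safe as the body of an unbraced if or else; the goto means the macro is usable only in a function that defines a done label. With the check inside the macro, an accepting state that contains an MBCSET now sets *backref. That is the case which, per the final commit log, previously slipped through to match_mb_charset.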

From: help-debbugs@gnu.org (GNU bug Tracking System)
To: Norihiro Tanaka
Date: Mon, 20 Oct 2014 01:26:02 +0000
Subject: bug#18762: closed (Re: bug#18762: [PATCH] dfa: don't consider RE_DOT_NEWLINE and RE_DOT_NOT_NULL in matching with a bracket expression)

Your bug report

#18762: [PATCH] dfa: don't consider RE_DOT_NEWLINE and RE_DOT_NOT_NULL in matching with a bracket expression

which was filed against the grep package, has been closed.

The explanation is attached below, along with your original report.
If you require more details, please reply to 18762@debbugs.gnu.org.

-- 
18762: http://debbugs.gnu.org/cgi/bugreport.cgi?bug=18762
GNU Bug Tracking System
Contact help-debbugs@gnu.org with problems

From: Jim Meyering
To: Norihiro Tanaka
Cc: 18762-done@debbugs.gnu.org
Date: Sun, 19 Oct 2014 18:24:56 -0700
Subject: Re: bug#18762: [PATCH] dfa: don't consider RE_DOT_NEWLINE and RE_DOT_NOT_NULL in matching with a bracket expression

On Sat, Oct 18, 2014 at 7:07 PM, Norihiro Tanaka wrote:
> Jim Meyering wrote:
>> dfa.c's match_mb_charset function *is* used, e.g., in a
>> command like this one:
>>
>>     printf '\0' |src/grep -aE '^\s?$'
>
> Wow, that isn't good. I think `fails' should behave the same as
> `trans', including the following part; the only difference should be
> that `fails' checks the acceptance conditions. match_mb_charset()
> should be avoided as far as possible, as it doesn't support collating
> symbols and equivalence classes.
>
>> /* Falling back to the glibc matcher in this case gives
>>    better performance (up to 25% better on [a-z], for
>>    example) and enables support for collating symbols and
>>    equivalence classes.  */
>> if (d->states[s].has_mbcset && backref)
>>   {
>>     *backref = 1;
>>     goto done;
>>   }

Nice change.
I've adjusted the commit log and added the test above, since
no other code even exercised the now-inaccessible function.
I will push it tomorrow.

Attachment: 0001-dfa-process-all-MBCSET-constructs-via-glibc-s-matche.patch (base64-encoded)

RnJvbSBmNWNkMTkxYTYyNGYzMzIzN2Q3NjE4ZDFjMjQ4MjlhZDUwMWJjNWMwIE1vbiBTZXAgMTcg MDA6MDA6MDAgMjAwMQpGcm9tOiBOb3JpaGlybyBUYW5ha2EgPG5vcml0bmtAa2NuLm5lLmpwPgpE YXRlOiBTdW4sIDE5IE9jdCAyMDE0IDEwOjQwOjE4ICswOTAwClN1YmplY3Q6IFtQQVRDSF0gZGZh OiBwcm9jZXNzIGFsbCBNQkNTRVQgY29uc3RydWN0cyB2aWEgZ2xpYmMncyBtYXRjaGVyCgpUaGUg REZBIG1hdGNoZXIgZG9lcyBub3Qgc3VwcG9ydCBjb2xsYXRpbmcgc3ltYm9scyBvciBlcXVpdmFs ZW5jZQpjbGFzc2VzLCBzbyBlbnN1cmUgdGhhdCBhbnkgTUJDU0VUIHJlZmVyZW5jZSBpcyBoYW5k bGVkIGJ5IHRoZSBnbGliYwptYXRjaGVyLiAgZGZhLmMgYWxyZWFkeSBoYW5kbGVkIHRoaXMgaW4g b25lIGNhc2UsIGJ1dCBub3QgdGhlIG90aGVyLApzbyB0aGF0IGEgY29tbWFuZCBsaWtlICJwcmlu dGYgJ1wwJyB8c3JjL2dyZXAgLWFFICdeXHM/JCciIHdvdWxkCm1pc3Rha2VubHkgZW5kIHVwIHVz aW5nIGRmYS5jJ3MgbWF0Y2hfbWJfY2hhcnNldCBmdW5jdGlvbiByYXRoZXIKdGhhbiBnbGliYydz IG1hdGNoZXIuCgoqIHNyYy9kZmEuYyAoZGZhZXhlY19tYWluKTogTW92ZSB0aGF0IGNvZGUgaW50 byB0aGUKU3RhdGVfdHJhbnNpdGlvbiBtYWNyby4gIFRoaXMgcmVuZGVycyB0aGUgbWF0Y2hfbWJf Y2hhcnNldAp1bnVzZWQgYnkgZ3JlcC4KKiB0ZXN0cy9tdWx0aWJ5dGUtd2hpdGUtc3BhY2U6IEFk ZCBhIHRlc3QgdG8gZXhlcmNpc2UgdGhlCmp1c3QtcmVuZGVyZWQtaW5hY2Nlc3NpYmxlIGNvZGUg cGF0aC4KLS0tCiBzcmMvZGZhLmMgICAgICAgICAgICAgICAgICAgfCAyMCArKysrKysrKysrLS0t LS0tLS0tLQogdGVzdHMvbXVsdGlieXRlLXdoaXRlLXNwYWNlIHwgMTAgKysrKysrKysrKwogMiBm aWxlcyBjaGFuZ2VkLCAyMCBpbnNlcnRpb25zKCspLCAxMCBkZWxldGlvbnMoLSkKCmRpZmYgLS1n aXQgYS9zcmMvZGZhLmMgYi9zcmMvZGZhLmMKaW5kZXggNThhNGI4My4uZGU4MzY4OSAxMDA2NDQK LS0tIGEvc3JjL2RmYS5jCisrKyBiL3NyYy9kZmEuYwpAQCAtMzMzOCwyMCArMzMzOCwyMCBAQCBk ZmFleGVjX21haW4gKHN0cnVjdCBkZmEgKmQsIGNoYXIgY29uc3QgKmJlZ2luLCBjaGFyICplbmQs CiAgICAgICAgICAgICAgICAgICBjb250aW51ZTsKICAgICAgICAgICAgICAgIH0KCi0gICAgICAg
ICAgICAgICAvKiBGYWxsaW5nIGJhY2sgdG8gdGhlIGdsaWJjIG1hdGNoZXIgaW4gdGhpcyBjYXNl IGdpdmVzCi0gICAgICAgICAgICAgICAgIGJldHRlciBwZXJmb3JtYW5jZSAodXAgdG8gMjUlIGJl dHRlciBvbiBbYS16XSwgZm9yCi0gICAgICAgICAgICAgICAgIGV4YW1wbGUpIGFuZCBlbmFibGVz IHN1cHBvcnQgZm9yIGNvbGxhdGluZyBzeW1ib2xzIGFuZAotICAgICAgICAgICAgICAgICBlcXVp dmFsZW5jZSBjbGFzc2VzLiAgKi8KLSAgICAgICAgICAgICAgaWYgKGQtPnN0YXRlc1tzXS5oYXNf bWJjc2V0ICYmIGJhY2tyZWYpCi0gICAgICAgICAgICAgICAgewotICAgICAgICAgICAgICAgICAg KmJhY2tyZWYgPSAxOwotICAgICAgICAgICAgICAgICAgZ290byBkb25lOwotICAgICAgICAgICAg ICAgIH0KLQogICAgICAgICAgICAgICAvKiBUaGUgZm9sbG93aW5nIGNvZGUgaXMgdXNlZCB0d2lj ZS4KICAgICAgICAgICAgICAgICAgVXNlIGEgbWFjcm8gdG8gYXZvaWQgdGhlIHJpc2sgdGhhdCB0 aGV5IGRpdmVyZ2UuICAqLwogI2RlZmluZSBTdGF0ZV90cmFuc2l0aW9uKCkgICAgICAgICAgICAg ICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgXAogICBkbyB7ICAgICAgICAgICAgICAg ICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgXAorICAg ICAgICAgICAgICAvKiBGYWxsaW5nIGJhY2sgdG8gdGhlIGdsaWJjIG1hdGNoZXIgaW4gdGhpcyBj YXNlIGdpdmVzICAgXAorICAgICAgICAgICAgICAgICBiZXR0ZXIgcGVyZm9ybWFuY2UgKHVwIHRv IDI1JSBiZXR0ZXIgb24gW2Etel0sIGZvciAgICAgXAorICAgICAgICAgICAgICAgICBleGFtcGxl KSBhbmQgZW5hYmxlcyBzdXBwb3J0IGZvciBjb2xsYXRpbmcgc3ltYm9scyBhbmQgXAorICAgICAg ICAgICAgICAgICBlcXVpdmFsZW5jZSBjbGFzc2VzLiAgKi8gICAgICAgICAgICAgICAgICAgICAg ICAgICAgICAgXAorICAgICAgICAgICAgICBpZiAoZC0+c3RhdGVzW3NdLmhhc19tYmNzZXQgJiYg YmFja3JlZikgICAgICAgICAgICAgICAgICAgXAorICAgICAgICAgICAgICAgIHsgICAgICAgICAg ICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgXAorICAgICAgICAg ICAgICAgICAgKmJhY2tyZWYgPSAxOyAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAg ICAgICAgXAorICAgICAgICAgICAgICAgICAgZ290byBkb25lOyAgICAgICAgICAgICAgICAgICAg ICAgICAgICAgICAgICAgICAgICAgICAgXAorICAgICAgICAgICAgICAgIH0gICAgICAgICAgICAg ICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgXAorICAgICAgICAgICAg ICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAg 
ICAgXAogICAgICAgICAgICAgICAvKiBDYW4gbWF0Y2ggd2l0aCBhIG11bHRpYnl0ZSBjaGFyYWN0 ZXIgKGFuZCBtdWx0aS1jaGFyYWN0ZXIgXAogICAgICAgICAgICAgICAgICBjb2xsYXRpbmcgZWxl bWVudCkuICBUcmFuc2l0aW9uIHRhYmxlIG1pZ2h0IGJlIHVwZGF0ZWQuICAqLyBcCiAgICAgICAg ICAgICAgIHMgPSB0cmFuc2l0X3N0YXRlIChkLCBzLCAmcCwgKHVuc2lnbmVkIGNoYXIgKikgZW5k KTsgICAgICBcCmRpZmYgLS1naXQgYS90ZXN0cy9tdWx0aWJ5dGUtd2hpdGUtc3BhY2UgYi90ZXN0 cy9tdWx0aWJ5dGUtd2hpdGUtc3BhY2UKaW5kZXggYzliM2QxZi4uNTgxNjY0MyAxMDA3NTUKLS0t IGEvdGVzdHMvbXVsdGlieXRlLXdoaXRlLXNwYWNlCisrKyBiL3Rlc3RzL211bHRpYnl0ZS13aGl0 ZS1zcGFjZQpAQCAtNzMsNCArNzMsMTQgQEAgZm9yIGkgaW4gJHV0Zjhfc3BhY2VfY2hhcmFjdGVy czsgZG8KICAgICAgIHx8IHsgd2Fybl8gIiRpIHZzLiBcXFMgRkFJTEVEIjsgZmFpbD0xOyB9CiBk b25lCgorCisjIFRoaXMgaXMgYSBzZXBhcmF0ZSB0ZXN0LCBvbmx5IG5vbWluYWxseSByZWxhdGVk IHRvIFxzLgorIyBJdCBpcyBzb2xlbHkgdG8gZ2V0IGNvdmVyYWdlIG9mIGEgY29kZSBwYXRoIChl eGVyY2lzaW5nIGRmYS5jJ3MKKyMgbWF0Y2hfbWJfY2hhcnNldCBmdW5jdGlvbikgdGhhdCB3b3Vs ZCBoYXZlIG90aGVyd2lzZSBiZWVuIHVudG91Y2hlZC4KKyMgSG93ZXZlciwgYXMgb2YgdGhlIGNo YW5nZS1zZXQgYWRkaW5nIHRoaXMgbmV3IHRlc3QsIG1hdGNoX21iX2NoYXJzZXQKKyMgaXMgdW5y ZWFjaGFibGUgdmlhIGdyZXAuCitwcmludGYgJ1wwJyB8IGdyZXAgLWFFICdeXHM/JCcgPiBvdXQg Mj4mMQordGVzdCAkPyA9IDEgfHwgZmFpbD0xCitjb21wYXJlIC9kZXYvbnVsbCBvdXQKKwogRXhp dCAkZmFpbAotLSAKMi4wLjAuNDIxLmc3ODZhODlkCgo=
1XfTIj-00050w-3c for submit@debbugs.gnu.org; Sat, 18 Oct 2014 08:40:13 -0400 X-Spam-Checker-Version: SpamAssassin 3.3.2 (2011-06-06) on eggs.gnu.org X-Spam-Level: X-Spam-Status: No, score=0.8 required=5.0 tests=BAYES_50 autolearn=disabled version=3.3.2 Received: from lists.gnu.org ([208.118.235.17]:60507) by eggs.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1XfTIj-00050r-1O for submit@debbugs.gnu.org; Sat, 18 Oct 2014 08:40:05 -0400 Received: from eggs.gnu.org ([2001:4830:134:3::10]:53093) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1XfTIb-0002i9-EM for bug-grep@gnu.org; Sat, 18 Oct 2014 08:40:04 -0400 Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71) (envelope-from ) id 1XfTIT-0004n5-IR for bug-grep@gnu.org; Sat, 18 Oct 2014 08:39:57 -0400 Received: from mailgw01.kcn.ne.jp ([61.86.7.208]:42446) by eggs.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1XfTIT-0004lu-98 for bug-grep@gnu.org; Sat, 18 Oct 2014 08:39:49 -0400 Received: from imp02 (mailgw6.kcn.ne.jp [61.86.15.232]) by mailgw01.kcn.ne.jp (Postfix) with ESMTP id 9A467802E9 for ; Sat, 18 Oct 2014 21:39:44 +0900 (JST) Received: from mail06.kcn.ne.jp ([61.86.6.185]) by imp02 with bizsmtp id 4cfk1p00L3zXHqt01cfkkF; Sat, 18 Oct 2014 21:39:44 +0900 X-OrgRCPT: bug-grep@gnu.org Received: from [10.120.1.30] (i118-21-128-66.s30.a048.ap.plala.or.jp [118.21.128.66]) by mail06.kcn.ne.jp (Postfix) with ESMTPA id 4847E1BF00B0 for ; Sat, 18 Oct 2014 21:39:44 +0900 (JST) Date: Sat, 18 Oct 2014 21:39:37 +0900 From: Norihiro Tanaka To: bug-grep@gnu.org Subject: [PATCH] dfa: don't consider RE_DOT_NEWLINE and RE_DOT_NOT_NULL in matching with a bracket expression Message-Id: <20141018213936.42B6.27F6AC2D@kcn.ne.jp> MIME-Version: 1.0 Content-Type: multipart/mixed; boundary="------_54425BB00000000042AA_MULTIPART_MIXED_" Content-Transfer-Encoding: 7bit X-Mailer: Becky! ver. 
2.65.07 [ja] X-detected-operating-system: by eggs.gnu.org: GNU/Linux 3.x X-detected-operating-system: by eggs.gnu.org: GNU/Linux 2.6.x X-Received-From: 208.118.235.17 X-Spam-Score: -4.0 (----) X-Debbugs-Envelope-To: submit X-BeenThere: debbugs-submit@debbugs.gnu.org X-Mailman-Version: 2.1.15 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: debbugs-submit-bounces@debbugs.gnu.org Sender: "Debbugs-submit" X-Spam-Score: -4.0 (----) --------_54425BB00000000042AA_MULTIPART_MIXED_ Content-Type: text/plain; charset="US-ASCII" Content-Transfer-Encoding: 7bit RE_DOT_NEWLINE and RE_DOT_NOT_NULL apply only to '.' in regex. In DFA, however, they apply to MBCSET (bracket expressions) as well as to '.'. This patch makes DFA behave like regex. BTW, at the moment neither grep nor gawk ever uses the match_mb_charset function that this patch fixes. --------_54425BB00000000042AA_MULTIPART_MIXED_ Content-Type: text/plain; charset="US-ASCII"; name="0001-dfa-don-t-consider-RE_DOT_NEWLINE-and-RE_DOT_NOT_NUL.patch" Content-Disposition: attachment; filename="0001-dfa-don-t-consider-RE_DOT_NEWLINE-and-RE_DOT_NOT_NUL.patch" Content-Transfer-Encoding: base64 RnJvbSAxMDg3NmYwNTAxMGUyZGYzYTA3MDVlOTVhMWJlNjJjYmI5OTBmYmZhIE1vbiBTZXAgMTcg MDA6MDA6MDAgMjAwMQpGcm9tOiBOb3JpaGlybyBUYW5ha2EgPG5vcml0bmtAa2NuLm5lLmpwPgpE YXRlOiBXZWQsIDE1IE9jdCAyMDE0IDA4OjI0OjIzICswOTAwClN1YmplY3Q6IFtQQVRDSF0gZGZh OiBkb24ndCBjb25zaWRlciBSRV9ET1RfTkVXTElORSBhbmQgUkVfRE9UX05PVF9OVUxMIGluCiBt YXRjaGluZyB3aXRoIGEgYnJhY2tldCBleHByZXNzaW9uCgpSRV9ET1RfTkVXTElORSBhbmQgUkVf RE9UX05PVF9OVUxMIHNob3VsZCBiZSBhcHBseSB0byBhIGRvdCBvbmx5CndoaWNoIG1hdGNoZXMg YW55IGNoYXJhY3Rlci4gIFNvIGRvbid0IGNvbnNpZGVyIFJFX0RPVF9ORVdMSU5FIGFuZApSRV9E T1RfTk9UX05VTEwgaW4gbWF0Y2hpbmcgd2l0aCBhIGJyYWNrZXQgZXhwcmVzc2lvbi4KCiogc3Jj L2RmYS5jIChtYXRjaF9tYl9jaGFyc2V0KTogUmVtb3ZlIFJFX0RPVF9ORVdMSU5FIGFuZCBSRV9E T1RfTk9UX05VTEwuCi0tLQogc3JjL2RmYS5jIHwgMTIgKy0tLS0tLS0tLS0tCiAxIGZpbGUgY2hh
bmdlZCwgMSBpbnNlcnRpb24oKyksIDExIGRlbGV0aW9ucygtKQoKZGlmZiAtLWdpdCBhL3NyYy9k ZmEuYyBiL3NyYy9kZmEuYwppbmRleCA1OGE0YjgzLi5hNGM0OGI1IDEwMDY0NAotLS0gYS9zcmMv ZGZhLmMKKysrIGIvc3JjL2RmYS5jCkBAIC0yOTk4LDE3ICsyOTk4LDcgQEAgbWF0Y2hfbWJfY2hh cnNldCAoc3RydWN0IGRmYSAqZCwgc3RhdGVfbnVtIHMsIHBvc2l0aW9uIHBvcywKICAgaW50IGNv bnRleHQ7CiAKICAgLyogQ2hlY2sgc3ludGF4IGJpdHMuICAqLwotICBpZiAod2MgPT0gKHdjaGFy X3QpIGVvbGJ5dGUpCi0gICAgewotICAgICAgaWYgKCEoc3ludGF4X2JpdHMgJiBSRV9ET1RfTkVX TElORSkpCi0gICAgICAgIHJldHVybiAwOwotICAgIH0KLSAgZWxzZSBpZiAod2MgPT0gKHdjaGFy X3QpICdcMCcpCi0gICAgewotICAgICAgaWYgKHN5bnRheF9iaXRzICYgUkVfRE9UX05PVF9OVUxM KQotICAgICAgICByZXR1cm4gMDsKLSAgICB9Ci0gIGVsc2UgaWYgKHdjID09IFdFT0YpCisgIGlm ICh3YyA9PSBXRU9GKQogICAgIHJldHVybiAwOwogCiAgIGNvbnRleHQgPSB3Y2hhcl9jb250ZXh0 ICh3Yyk7Ci0tIAoyLjEuMQoK --------_54425BB00000000042AA_MULTIPART_MIXED_-- ------------=_1413768362-24656-1-- From unknown Tue Aug 19 10:09:48 2025 X-Loop: help-debbugs@gnu.org Subject: bug#18762: [PATCH] dfa: don't consider RE_DOT_NEWLINE and RE_DOT_NOT_NULL in matching with a bracket expression Resent-From: Jim Meyering Original-Sender: "Debbugs-submit" Resent-CC: bug-grep@gnu.org Resent-Date: Mon, 20 Oct 2014 03:17:02 +0000 Resent-Message-ID: Resent-Sender: help-debbugs@gnu.org X-GNU-PR-Message: followup 18762 X-GNU-PR-Package: grep X-GNU-PR-Keywords: patch To: Norihiro Tanaka Cc: 18762-done@debbugs.gnu.org Received: via spool by 18762-done@debbugs.gnu.org id=D18762.14137750003206 (code D ref 18762); Mon, 20 Oct 2014 03:17:02 +0000 Received: (at 18762-done) by debbugs.gnu.org; 20 Oct 2014 03:16:40 +0000 Received: from localhost ([127.0.0.1]:56213 helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.80) (envelope-from ) id 1Xg3SZ-0000pd-OD for submit@debbugs.gnu.org; Sun, 19 Oct 2014 23:16:40 -0400 Received: from mail-wi0-f175.google.com ([209.85.212.175]:52493) by debbugs.gnu.org with esmtp (Exim 4.80) (envelope-from ) id 1Xg3SW-0000pJ-Gy for 18762-done@debbugs.gnu.org; Sun, 19 Oct 2014 23:16:37 -0400 
Received: by mail-wi0-f175.google.com with SMTP id d1so5921181wiv.14 for <18762-done@debbugs.gnu.org>; Sun, 19 Oct 2014 20:16:30 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113; h=mime-version:sender:in-reply-to:references:from:date:message-id :subject:to:cc:content-type; bh=FOkayOUDXHwqL0m6xHjMYSJAc80o/oA5gojhk8amaPI=; b=R+1MXY5cgX9kyaMiLo3gv8ZQg4q2fFjQ0TKfjQ07T3tHDLzgBG2zklzDMvfvxDq/RS io5Chv7mHeLA8b7/V2J6LnlqBilxoVj6HDZQ/z2BMfWFzMW7WKTpbRvSWuWG8p8c3Uiy 4qmeyNfNWjTISV4iTY2xe6nJ3RgDdKdoSS8uF9uoH/Y4PzwQoZP1gKDf+aaJIA71c6vG Uqrh/CrgI4opzcBL2K9K6PR1kxfnzkMJbjOW6QLeKmCi6M5uACQAk71sa9i7qSM1ofwN +UMTxbjwi8W+KOO4CCkuI1mKQ+C8mp3IkjuRfbaBeWDKb5VAce/QLA/W1SA+OnGMCTuX /DTQ== X-Received: by 10.180.208.42 with SMTP id mb10mr16179055wic.49.1413774990776; Sun, 19 Oct 2014 20:16:30 -0700 (PDT) MIME-Version: 1.0 Received: by 10.194.86.131 with HTTP; Sun, 19 Oct 2014 20:16:09 -0700 (PDT) In-Reply-To: References: <20141019083006.920A.27F6AC2D@kcn.ne.jp> <20141019110734.9229.27F6AC2D@kcn.ne.jp> From: Jim Meyering Date: Sun, 19 Oct 2014 20:16:09 -0700 X-Google-Sender-Auth: voDJcCZE0oxxrFwiKbqz4gZ-ZUo Message-ID: Content-Type: multipart/mixed; boundary=001a11c382fa1e2fff0505d2243d X-Spam-Score: -0.7 (/) X-BeenThere: debbugs-submit@debbugs.gnu.org X-Mailman-Version: 2.1.15 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: debbugs-submit-bounces@debbugs.gnu.org Sender: "Debbugs-submit" X-Spam-Score: -0.7 (/) --001a11c382fa1e2fff0505d2243d Content-Type: text/plain; charset=ISO-8859-1 On Sun, Oct 19, 2014 at 6:24 PM, Jim Meyering wrote: > On Sat, Oct 18, 2014 at 7:07 PM, Norihiro Tanaka wrote: >> Jim Meyering wrote: >>> dfa.c's match_mb_charset function *is* used, e.g., in a >>> command like this one: >>> >>> printf '\0' |src/grep -aE '^\s?$' >> >> Wow, just it isn't good. 
I think the behavior of `fails' should be >> the same as that of `trans', except that `fails' checks accepting conditions, including the >> following part. match_mb_charset() should be avoided as far as possible, >> as it doesn't support collating symbols and equivalence classes. >> >>> /* Falling back to the glibc matcher in this case gives >>> better performance (up to 25% better on [a-z], for >>> example) and enables support for collating symbols and >>> equivalence classes. */ >>> if (d->states[s].has_mbcset && backref) >>> { >>> *backref = 1; >>> goto done; >>> } > > Nice change. I've adjusted the commit log and added the test > above, since no other code even exercised the > now-inaccessible function. I will push it tomorrow. By the way, I've also adjusted your preceding patch (see attached), and will push it tomorrow, too. --001a11c382fa1e2fff0505d2243d Content-Type: application/octet-stream; name="0001-dfa-remove-two-erroneous-clauses-from-a-now-unused-f.patch" Content-Disposition: attachment; filename="0001-dfa-remove-two-erroneous-clauses-from-a-now-unused-f.patch" Content-Transfer-Encoding: base64 X-Attachment-Id: f_i1h93y8i1 RnJvbSBiMjQ5MDgwMmRlZmUzYzNiZjdlZjAwMzZhNDUxNWQwMDZhMDhhNzY5IE1vbiBTZXAgMTcg MDA6MDA6MDAgMjAwMQpGcm9tOiBOb3JpaGlybyBUYW5ha2EgPG5vcml0bmtAa2NuLm5lLmpwPgpE YXRlOiBXZWQsIDE1IE9jdCAyMDE0IDA4OjI0OjIzICswOTAwClN1YmplY3Q6IFtQQVRDSF0gZGZh OiByZW1vdmUgdHdvIGVycm9uZW91cyBjbGF1c2VzIGZyb20gYSBub3ctdW51c2VkIGZ1bmN0aW9u CgpSRV9ET1RfTkVXTElORSBhbmQgUkVfRE9UX05PVF9OVUxMIGFwcGx5IG9ubHkgdG8gYSBkb3Qg dGhhdAptYXRjaGVzIGFueSBjaGFyYWN0ZXIuICBEbyBub3QgY29uc2lkZXIgdGhlbSB3aGVuIG1h dGNoaW5nCndpdGggYSBicmFja2V0IGV4cHJlc3Npb24uCgoqIHNyYy9kZmEuYyAobWF0Y2hfbWJf Y2hhcnNldCk6IFJlbW92ZSB0ZXN0cyBmb3IgUkVfRE9UX05FV0xJTkUKYW5kIFJFX0RPVF9OT1Rf TlVMTC4KLS0tCiBzcmMvZGZhLmMgfCAxMiArLS0tLS0tLS0tLS0KIDEgZmlsZSBjaGFuZ2VkLCAx IGluc2VydGlvbigrKSwgMTEgZGVsZXRpb25zKC0pCgpkaWZmIC0tZ2l0IGEvc3JjL2RmYS5jIGIv c3JjL2RmYS5jCmluZGV4IGRlODM2ODkuLjgwNTEwYTggMTAwNjQ0Ci0tLSBhL3NyYy9kZmEuYwor
KysgYi9zcmMvZGZhLmMKQEAgLTI5OTgsMTcgKzI5OTgsNyBAQCBtYXRjaF9tYl9jaGFyc2V0IChz dHJ1Y3QgZGZhICpkLCBzdGF0ZV9udW0gcywgcG9zaXRpb24gcG9zLAogICBpbnQgY29udGV4dDsK CiAgIC8qIENoZWNrIHN5bnRheCBiaXRzLiAgKi8KLSAgaWYgKHdjID09ICh3Y2hhcl90KSBlb2xi eXRlKQotICAgIHsKLSAgICAgIGlmICghKHN5bnRheF9iaXRzICYgUkVfRE9UX05FV0xJTkUpKQot ICAgICAgICByZXR1cm4gMDsKLSAgICB9Ci0gIGVsc2UgaWYgKHdjID09ICh3Y2hhcl90KSAnXDAn KQotICAgIHsKLSAgICAgIGlmIChzeW50YXhfYml0cyAmIFJFX0RPVF9OT1RfTlVMTCkKLSAgICAg ICAgcmV0dXJuIDA7Ci0gICAgfQotICBlbHNlIGlmICh3YyA9PSBXRU9GKQorICBpZiAod2MgPT0g V0VPRikKICAgICByZXR1cm4gMDsKCiAgIGNvbnRleHQgPSB3Y2hhcl9jb250ZXh0ICh3Yyk7Ci0t IAoyLjAuMC40MjEuZzc4NmE4OWQKCg== --001a11c382fa1e2fff0505d2243d-- From unknown Tue Aug 19 10:09:48 2025 X-Loop: help-debbugs@gnu.org Subject: bug#18762: [PATCH] dfa: don't consider RE_DOT_NEWLINE and RE_DOT_NOT_NULL in matching with a bracket expression Resent-From: Norihiro Tanaka Original-Sender: "Debbugs-submit" Resent-CC: bug-grep@gnu.org Resent-Date: Mon, 20 Oct 2014 14:16:01 +0000 Resent-Message-ID: Resent-Sender: help-debbugs@gnu.org X-GNU-PR-Message: followup 18762 X-GNU-PR-Package: grep X-GNU-PR-Keywords: patch To: Jim Meyering Cc: 18762-done@debbugs.gnu.org Received: via spool by 18762-done@debbugs.gnu.org id=D18762.141381455115915 (code D ref 18762); Mon, 20 Oct 2014 14:16:01 +0000 Received: (at 18762-done) by debbugs.gnu.org; 20 Oct 2014 14:15:51 +0000 Received: from localhost ([127.0.0.1]:57345 helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.80) (envelope-from ) id 1XgDkV-00048c-3v for submit@debbugs.gnu.org; Mon, 20 Oct 2014 10:15:51 -0400 Received: from mailgw04.kcn.ne.jp ([61.86.7.211]:51798) by debbugs.gnu.org with esmtp (Exim 4.80) (envelope-from ) id 1XgDkR-00048L-LP for 18762-done@debbugs.gnu.org; Mon, 20 Oct 2014 10:15:49 -0400 Received: from imp01 (mailgw5.kcn.ne.jp [61.86.15.231]) by mailgw04.kcn.ne.jp (Postfix) with ESMTP id 6C8F56C1C6C for <18762-done@debbugs.gnu.org>; Mon, 20 Oct 2014 23:15:38 +0900 (JST) Received: from 
mail07.kcn.ne.jp ([61.86.6.186]) by imp01 with bizsmtp id 5SFe1p00940oyB901SFegA; Mon, 20 Oct 2014 23:15:38 +0900 X-OrgRCPT: 18762-done@debbugs.gnu.org Received: from [10.120.1.60] (i118-21-128-66.s30.a048.ap.plala.or.jp [118.21.128.66]) by mail07.kcn.ne.jp (Postfix) with ESMTPA id 0BF0ED5009D; Mon, 20 Oct 2014 23:15:37 +0900 (JST) Date: Mon, 20 Oct 2014 23:15:36 +0900 From: Norihiro Tanaka In-Reply-To: References: Message-Id: <20141020231534.100E.27F6AC2D@kcn.ne.jp> MIME-Version: 1.0 Content-Type: text/plain; charset="US-ASCII" Content-Transfer-Encoding: 7bit X-Mailer: Becky! ver. 2.65.07 [ja] X-Spam-Score: -1.4 (-) X-BeenThere: debbugs-submit@debbugs.gnu.org X-Mailman-Version: 2.1.15 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: debbugs-submit-bounces@debbugs.gnu.org Sender: "Debbugs-submit" X-Spam-Score: -1.4 (-) Jim Meyering wrote: > > Nice change. I've adjusted the commit log and added the test > > above, since no other code even exercised the > > now-inaccessible function. I will push it tomorrow. > > By the way, I've also adjusted your preceding patch (see attached), > and will push it tomorrow, too. Thanks for the adjustments and for adding the test.
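The regex behavior that this thread aligns DFA with can be checked directly. What follows is a minimal sketch, not code from grep or dfa.c. It assumes glibc's GNU regex API (re_set_syntax, re_compile_pattern, re_search, enabled by _GNU_SOURCE) and the default C locale, because it never calls setlocale. RE_SYNTAX_POSIX_EXTENDED is used only as a convenient base that already sets both RE_DOT_NEWLINE and RE_DOT_NOT_NULL. The helper name search_with_syntax is hypothetical. The sketch shows that these two bits change what '.' accepts, while a bracket expression such as [^x] still matches a newline or a NUL byte.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <regex.h>
#include <string.h>

/* Hypothetical helper (not part of grep or glibc): compile PATTERN under
   the GNU syntax bits SYNTAX, then search the LEN-byte buffer STR, which
   may contain NUL bytes.  Returns the offset of the first match, -1 if
   there is no match, or -2 if the pattern fails to compile.  */
static int
search_with_syntax (reg_syntax_t syntax, const char *pattern,
                    const char *str, int len)
{
  struct re_pattern_buffer buf;
  memset (&buf, 0, sizeof buf);   /* no fastmap, no translate table */

  re_set_syntax (syntax);         /* re_compile_pattern reads this global */
  if (re_compile_pattern (pattern, strlen (pattern), &buf) != NULL)
    return -2;

  /* Search forward from offset 0 across the whole buffer; no registers.  */
  int pos = re_search (&buf, str, len, 0, len, NULL);
  regfree (&buf);
  return pos;
}
```

For example, search_with_syntax (RE_SYNTAX_POSIX_EXTENDED & ~RE_DOT_NEWLINE, "a.b", "a\nb", 3) finds no match, but the same call with "a[^x]b" matches at offset 0. With the patch applied, match_mb_charset in dfa.c likewise no longer consults these two bits when matching a bracket expression.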