From unknown Sun Jun 22 17:14:32 2025
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8
From: bug#43862 <43862@debbugs.gnu.org>
To: bug#43862 <43862@debbugs.gnu.org>
Subject: Status: [PATCH] grep: set RE_NO_SUB for calling regex only to check syntax
Reply-To: bug#43862 <43862@debbugs.gnu.org>
Date: Mon, 23 Jun 2025 00:14:32 +0000

retitle 43862 [PATCH] grep: set RE_NO_SUB for calling regex only to check syntax
reassign 43862 grep
submitter 43862 Norihiro Tanaka
severity 43862 normal
tag 43862 patch
thanks

From debbugs-submit-bounces@debbugs.gnu.org Thu Oct 08 05:40:53 2020
Date: Thu, 08 Oct 2020 18:40:36 +0900
From: Norihiro Tanaka
Subject: [PATCH] grep: set RE_NO_SUB for calling regex only to check syntax
Message-Id: <20201008184034.2F0C.27F6AC2D@kcn.ne.jp>
MIME-Version: 1.0
Content-Type: multipart/mixed; boundary="------_5F7EDD43000000002F00_MULTIPART_MIXED_"
Content-Transfer-Encoding: 7bit

--------_5F7EDD43000000002F00_MULTIPART_MIXED_
Content-Type: text/plain; charset="US-ASCII"
Content-Transfer-Encoding: 7bit

We can set RE_NO_SUB for calling regex only to check syntax. It brings
performance gains in cases to have a lot of enormous epsilon nodes.
$ printf '(%020000d)\n' | sed 's/0/|/g' >pat (before) $ time -p env LC_ALL=C src/grep -Ef pat /dev/null real 6.15 user 4.62 sys 1.52 (after) $ time -p env LC_ALL=C src/grep -Ef pat /dev/null real 0.66 user 0.19 sys 0.46 --------_5F7EDD43000000002F00_MULTIPART_MIXED_ Content-Type: text/plain; charset="US-ASCII"; name="0001-grep-set-RE_NO_SUB-for-calling-regex-only-to-check-s.patch" Content-Disposition: attachment; filename="0001-grep-set-RE_NO_SUB-for-calling-regex-only-to-check-s.patch" Content-Transfer-Encoding: base64 RnJvbSAwZWY0MzI5YzliNGE1Nzg1YzU0ZGZhMWQzNmFhYzJiYjcyODkzMTk4IE1vbiBTZXAgMTcg MDA6MDA6MDAgMjAwMQpGcm9tOiBOb3JpaGlybyBUYW5ha2EgPG5vcml0bmtAa2NuLm5lLmpwPgpE YXRlOiBUaHUsIDggT2N0IDIwMjAgMTg6MjA6MTMgKzA5MDAKU3ViamVjdDogW1BBVENIXSBncmVw OiBzZXQgUkVfTk9fU1VCIGZvciBjYWxsaW5nIHJlZ2V4IG9ubHkgdG8gY2hlY2sgc3ludGF4Cgoq IHNyYy9kZmFzZWFyY2guYyAocmVnZXhfY29tcGlsZSk6IE5ldyBwYXJhbWV0ZXIuIEFsbCBjYWxs ZXJzIGNoYW5nZWQuCihHRUFjb21waWxlKTogTW92ZSBzZXR0aW5nIHN5bnRheCBmb3IgcmVnZXgg aW50byByZWdleF9jb21waWxlKCkgZnVuY3Rpb24uCi0tLQogc3JjL2RmYXNlYXJjaC5jIHwgICAx NiArKysrKysrKysrKystLS0tCiAxIGZpbGVzIGNoYW5nZWQsIDEyIGluc2VydGlvbnMoKyksIDQg ZGVsZXRpb25zKC0pCgpkaWZmIC0tZ2l0IGEvc3JjL2RmYXNlYXJjaC5jIGIvc3JjL2RmYXNlYXJj aC5jCmluZGV4IDgxMmEwZGMuLjhlZGUwZWMgMTAwNjQ0Ci0tLSBhL3NyYy9kZmFzZWFyY2guYwor KysgYi9zcmMvZGZhc2VhcmNoLmMKQEAgLTE0NSw3ICsxNDUsOCBAQCBwb3NzaWJsZV9iYWNrcmVm c19pbl9wYXR0ZXJuIChjaGFyIGNvbnN0ICprZXlzLCBwdHJkaWZmX3QgbGVuLCBib29sIGJzX3Nh ZmUpCiAKIHN0YXRpYyBib29sCiByZWdleF9jb21waWxlIChzdHJ1Y3QgZGZhX2NvbXAgKmRjLCBj aGFyIGNvbnN0ICpwLCBwdHJkaWZmX3QgbGVuLAotICAgICAgICAgICAgICAgcHRyZGlmZl90IHBj b3VudCwgcHRyZGlmZl90IGxpbmVubywgYm9vbCBzeW50YXhfb25seSkKKyAgICAgICAgICAgICAg IHB0cmRpZmZfdCBwY291bnQsIHB0cmRpZmZfdCBsaW5lbm8sIHJlZ19zeW50YXhfdCBzeW50YXhf Yml0cywKKyAgICAgICAgICAgICAgIGJvb2wgc3ludGF4X29ubHkpCiB7CiAgIHN0cnVjdCByZV9w YXR0ZXJuX2J1ZmZlciBwYXQwOwogICBzdHJ1Y3QgcmVfcGF0dGVybl9idWZmZXIgKnBhdCA9IHN5 bnRheF9vbmx5ID8gJnBhdDAgOiAmZGMtPnBhdHRlcm5zW3Bjb3VudF07CkBAIC0xNTcsNiArMTU4 
LDExIEBAIHJlZ2V4X2NvbXBpbGUgKHN0cnVjdCBkZmFfY29tcCAqZGMsIGNoYXIgY29uc3QgKnAs IHB0cmRpZmZfdCBsZW4sCiAKICAgcGF0LT50cmFuc2xhdGUgPSBOVUxMOwogCisgIGlmIChzeW50 YXhfb25seSkKKyAgICByZV9zZXRfc3ludGF4IChzeW50YXhfYml0cyB8IFJFX05PX1NVQik7Cisg IGVsc2UKKyAgICByZV9zZXRfc3ludGF4IChzeW50YXhfYml0cyk7CisKICAgY2hhciBjb25zdCAq ZXJyID0gcmVfY29tcGlsZV9wYXR0ZXJuIChwLCBsZW4sIHBhdCk7CiAgIGlmICghZXJyKQogICAg IHJldHVybiB0cnVlOwpAQCAtMTg5LDcgKzE5NSw2IEBAIEdFQWNvbXBpbGUgKGNoYXIgKnBhdHRl cm4sIHNpemVfdCBzaXplLCByZWdfc3ludGF4X3Qgc3ludGF4X2JpdHMsCiAKICAgaWYgKG1hdGNo X2ljYXNlKQogICAgIHN5bnRheF9iaXRzIHw9IFJFX0lDQVNFOwotICByZV9zZXRfc3ludGF4IChz eW50YXhfYml0cyk7CiAgIGludCBkZmFvcHRzID0gZW9sYnl0ZSA/IDAgOiBERkFfRU9MX05VTDsK ICAgZGZhc3ludGF4IChkYy0+ZGZhLCAmbG9jYWxlaW5mbywgc3ludGF4X2JpdHMsIGRmYW9wdHMp OwogICBib29sIGJzX3NhZmUgPSAhbG9jYWxlaW5mby5tdWx0aWJ5dGUgfCBsb2NhbGVpbmZvLnVz aW5nX3V0Zjg7CkBAIC0yNDIsNyArMjQ3LDEwIEBAIEdFQWNvbXBpbGUgKGNoYXIgKnBhdHRlcm4s IHNpemVfdCBzaXplLCByZWdfc3ludGF4X3Qgc3ludGF4X2JpdHMsCiAgICAgICAgICAgZGMtPnBh dHRlcm5zKys7CiAgICAgICAgIH0KIAotICAgICAgaWYgKCFyZWdleF9jb21waWxlIChkYywgcCwg bGVuLCBkYy0+cGNvdW50LCBsaW5lbm8sICFiYWNrcmVmKSkKKyAgICAgIHJlX3NldF9zeW50YXgg KHN5bnRheF9iaXRzKTsKKworICAgICAgaWYgKCFyZWdleF9jb21waWxlIChkYywgcCwgbGVuLCBk Yy0+cGNvdW50LCBsaW5lbm8sIHN5bnRheF9iaXRzLAorICAgICAgICAgICAgICAgICAgICAgICAg ICAhYmFja3JlZikpCiAgICAgICAgIGNvbXBpbGF0aW9uX2ZhaWxlZCA9IHRydWU7CiAKICAgICAg IHAgPSBzZXAgKyAxOwpAQCAtMzE3LDcgKzMyNSw3IEBAIEdFQWNvbXBpbGUgKGNoYXIgKnBhdHRl cm4sIHNpemVfdCBzaXplLCByZWdfc3ludGF4X3Qgc3ludGF4X2JpdHMsCiAgICAgICAgICAgZGMt PnBhdHRlcm5zLS07CiAgICAgICAgICAgZGMtPnBjb3VudCsrOwogCi0gICAgICAgICAgaWYgKCFy ZWdleF9jb21waWxlIChkYywgYnVmLCBidWZsZW4sIDAsIC0xLCBmYWxzZSkpCisgICAgICAgICAg aWYgKCFyZWdleF9jb21waWxlIChkYywgYnVmLCBidWZsZW4sIDAsIC0xLCBzeW50YXhfYml0cywg ZmFsc2UpKQogICAgICAgICAgICAgYWJvcnQgKCk7CiAgICAgICAgIH0KIAotLSAKMS43LjEKCg== --------_5F7EDD43000000002F00_MULTIPART_MIXED_-- From debbugs-submit-bounces@debbugs.gnu.org Mon Oct 12 19:08:47 2020 Received: (at 43862) by 
debbugs.gnu.org; 12 Oct 2020 23:08:47 +0000
MIME-Version: 1.0
References: <20201008184034.2F0C.27F6AC2D@kcn.ne.jp>
In-Reply-To: <20201008184034.2F0C.27F6AC2D@kcn.ne.jp>
From: Jim Meyering
Date: Mon, 12 Oct 2020 16:08:28 -0700
Subject: Re: bug#43862: [PATCH] grep: set RE_NO_SUB for calling regex only to check syntax
To: Norihiro Tanaka
Cc: 43862@debbugs.gnu.org
Content-Type: text/plain; charset="UTF-8"

On Thu, Oct 8, 2020 at 2:41 AM Norihiro Tanaka wrote:
>
> We can set RE_NO_SUB for calling regex only to check syntax. It brings
> performance gains in cases to have a lot of enormous epsilon nodes.
>
>
> $ printf '(%020000d)\n' | sed 's/0/|/g' >pat
>
> (before)
> $ time -p env LC_ALL=C src/grep -Ef pat /dev/null
> real 6.15
> user 4.62
> sys 1.52
>
> (after)
> $ time -p env LC_ALL=C src/grep -Ef pat /dev/null
> real 0.66
> user 0.19
> sys 0.46

Thank you.

FYI, when running similar commands with and without your patch (with
an eye to adding a test), I ran this one (with your patch). It shows
that using 80,000 terms caused grep to consume 32GB of memory before
being OOM-killed:

$ printf '(%080000d)\n' | sed 's/0/|/g' | env time src/grep -Ef- /dev/null
Command terminated by signal 9
6.42user 19.98system 0:57.91elapsed 45%CPU (0avgtext+0avgdata 32024460maxresident)k
6504inputs+0outputs (92major+12003644minor)pagefaults 0swaps
[Exit 137 (KILL)]

I will come back to this later this week.
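
For readers unfamiliar with the printf/sed idiom used in the commands above: printf with no argument pads a zero to the given width, and sed turns every 0 into |, so the pattern is an opening parenthesis, N vertical bars, and a closing parenthesis. A minimal C sketch of the same construction follows. The function name make_alternation_pattern is hypothetical and is not part of grep.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper (not part of grep): build "(" followed by N
   vertical bars and ")", i.e. the text that
     printf '(%0Nd)\n' | sed 's/0/|/g'
   writes (without the trailing newline).  Returns a malloc'd,
   NUL-terminated string, or NULL if allocation fails.  */
char *
make_alternation_pattern (size_t n)
{
  assert (n <= SIZE_MAX - 3);   /* the size computation must not overflow */

  char *s = malloc (n + 3);     /* '(' + n bars + ')' + NUL */
  if (!s)
    return NULL;

  s[0] = '(';
  memset (s + 1, '|', n);
  s[n + 1] = ')';
  s[n + 2] = '\0';
  return s;
}
```

With n = 20000 this gives the pattern from the original report, and with n = 80000 the one that was OOM-killed above.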
From debbugs-submit-bounces@debbugs.gnu.org Sun Nov 01 14:40:15 2020
MIME-Version: 1.0
References: <20201008184034.2F0C.27F6AC2D@kcn.ne.jp>
From: Jim Meyering
Date: Sun, 1 Nov 2020 11:39:55 -0800
Subject: Re: bug#43862: [PATCH] grep: set RE_NO_SUB for calling regex only to check syntax
To: Norihiro Tanaka
Cc: 43862-done@debbugs.gnu.org
Content-Type: text/plain; charset="UTF-8"

On Mon, Oct 12, 2020 at 4:08 PM Jim Meyering wrote:
> On Thu, Oct 8, 2020 at 2:41 AM Norihiro Tanaka wrote:
> >
> > We can set RE_NO_SUB for calling regex only to check syntax. It brings
> > performance gains in cases to have a lot of enormous epsilon nodes.
> >
> >
> > $ printf '(%020000d)\n' | sed 's/0/|/g' >pat
> >
> > (before)
> > $ time -p env LC_ALL=C src/grep -Ef pat /dev/null
> > real 6.15
> > user 4.62
> > sys 1.52
> >
> > (after)
> > $ time -p env LC_ALL=C src/grep -Ef pat /dev/null
> > real 0.66
> > user 0.19
> > sys 0.46
>
> Thank you.
>
> FYI, when running similar commands with and without your patch (with
> an eye to adding a test), I ran this one (with your patch). It shows
> that using 80,000 terms caused grep to consume 32GB of memory before
> being OOM-killed:
>
> $ printf '(%080000d)\n' | sed 's/0/|/g' | env time src/grep -Ef- /dev/null
> Command terminated by signal 9
> 6.42user 19.98system 0:57.91elapsed 45%CPU (0avgtext+0avgdata
> 32024460maxresident)k
> 6504inputs+0outputs (92major+12003644minor)pagefaults 0swaps
> [Exit 137 (KILL)]
>
> I will come back to this later this week.

We must accept the fact that extreme regular expressions will cause
resource exhaustion like that when processed by classical regex_*
functions. This is yet another good reason to prefer PCRE and to use
grep's -P option. In that case, it fails like this:

$ printf '(%080000d)\n' | sed 's/0/|/g' |grep -Pf- /dev/null
grep: regular expression is too large

I have just pushed your patch, but without adding a test.
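
The idea discussed in this thread (adding RE_NO_SUB to the syntax bits when a pattern is compiled only to validate it) can be sketched in isolation as below. This is an illustrative sketch under stated assumptions, not the code from the patch: it assumes glibc's GNU regex interface (re_set_syntax, re_compile_pattern, RE_NO_SUB, which need _GNU_SOURCE), and the helper name pattern_syntax_ok is hypothetical. Note that re_set_syntax changes process-wide state, so a caller that later compiles patterns for matching would need to set the syntax again.

```c
#ifndef _GNU_SOURCE
# define _GNU_SOURCE 1          /* glibc: expose the GNU regex interface */
#endif
#include <assert.h>
#include <regex.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical helper (not grep's regex_compile): return true if the
   LEN-byte pattern P compiles under SYNTAX_BITS.  The compiled pattern
   is discarded, so subexpression (register) information is never
   needed and RE_NO_SUB can be added to the syntax.  */
bool
pattern_syntax_ok (char const *p, size_t len, reg_syntax_t syntax_bits)
{
  assert (p != NULL);

  /* A zeroed buffer means: no preallocated storage, no fastmap,
     no translate table.  */
  struct re_pattern_buffer pat;
  memset (&pat, 0, sizeof pat);

  /* Global setting used by the next re_compile_pattern call.  */
  re_set_syntax (syntax_bits | RE_NO_SUB);

  /* Returns NULL on success, otherwise an error message string.  */
  char const *err = re_compile_pattern (p, len, &pat);

  regfree (&pat);               /* safe whether or not compilation succeeded */
  return err == NULL;
}
```

For example, pattern_syntax_ok ("(a|b)", 5, RE_SYNTAX_POSIX_EGREP) would be expected to succeed, while an unterminated group such as "(a" would be rejected.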
From debbugs-submit-bounces@debbugs.gnu.org Sun Nov 01 23:21:57 2020
Date: Mon, 02 Nov 2020 13:21:42 +0900
From: Norihiro Tanaka
To: Jim Meyering
Cc: 43862-done@debbugs.gnu.org
Subject: Re: bug#43862: [PATCH] grep: set RE_NO_SUB for calling regex only to check syntax
Message-Id: <20201102132141.B4A0.27F6AC2D@kcn.ne.jp>
MIME-Version: 1.0
Content-Type: text/plain; charset="US-ASCII"
Content-Transfer-Encoding: 7bit

On Sun, 1 Nov 2020 11:39:55 -0800
Jim Meyering wrote:

> We must accept the fact that extreme regular expressions will cause
> resource exhaustion like that when processed by classical regex_*
> functions. This is yet another good reason to prefer PCRE and to use
> grep's -P option. In that case, it fails like this:
>
> $ printf '(%080000d)\n' | sed 's/0/|/g' |grep -Pf- /dev/null
> grep: regular expression is too large
>
> I have just pushed your patch, but without adding a test.

I also investigated the slowdown, and I reached the same view as you.
The regex consumes a lot of memory for patterns with enormous epsilon
closures.

From unknown Sun Jun 22 17:14:32 2025
Received: (at fakecontrol) by fakecontrolmessage;
To: internal_control@debbugs.gnu.org
From: Debbugs Internal Request
Subject: Internal Control
Message-Id: bug archived.
Date: Mon, 30 Nov 2020 12:24:07 +0000
User-Agent: Fakemail v42.6.9

# This is a fake control message.
#
# The action:
# bug archived.
thanks
# This fakemail brought to you by your local debbugs
# administrator