From unknown Sun Jun 22 17:18:15 2025 X-Loop: help-debbugs@gnu.org Subject: bug#43862: [PATCH] grep: set RE_NO_SUB for calling regex only to check syntax Resent-From: Norihiro Tanaka Original-Sender: "Debbugs-submit" Resent-CC: bug-grep@gnu.org Resent-Date: Thu, 08 Oct 2020 09:41:01 +0000 Resent-Message-ID: Resent-Sender: help-debbugs@gnu.org X-GNU-PR-Message: report 43862 X-GNU-PR-Package: grep X-GNU-PR-Keywords: patch To: 43862@debbugs.gnu.org X-Debbugs-Original-To: Received: via spool by submit@debbugs.gnu.org id=B.160215005323167 (code B ref -1); Thu, 08 Oct 2020 09:41:01 +0000 Received: (at submit) by debbugs.gnu.org; 8 Oct 2020 09:40:53 +0000 Received: from localhost ([127.0.0.1]:58744 helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1kQSPt-00061b-19 for submit@debbugs.gnu.org; Thu, 08 Oct 2020 05:40:53 -0400 Received: from lists.gnu.org ([209.51.188.17]:50940) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1kQSPp-00061R-Pv for submit@debbugs.gnu.org; Thu, 08 Oct 2020 05:40:51 -0400 Received: from eggs.gnu.org ([2001:470:142:3::10]:55160) by lists.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1kQSPp-0005rj-Hd for bug-grep@gnu.org; Thu, 08 Oct 2020 05:40:49 -0400 Received: from mailgw05.kcn.ne.jp ([61.86.7.212]:49696) by eggs.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1kQSPm-00070j-UW for bug-grep@gnu.org; Thu, 08 Oct 2020 05:40:49 -0400 Received: from mxs02-s (mailgw2.kcn.ne.jp [61.86.15.234]) by mailgw05.kcn.ne.jp (Postfix) with ESMTP id 8315AC0E69AF for ; Thu, 8 Oct 2020 18:40:39 +0900 (JST) X-matriXscan-loop-detect: f790de5ea4798bcf2b8cfc8a1181b52527836af8 Received: from mail14.kcn.ne.jp ([61.86.6.132]) by mxs02-s with ESMTP; Thu, 08 Oct 2020 18:40:38 +0900 (JST) Received: from [10.120.1.105] (i118-21-128-66.s30.a048.ap.plala.or.jp [118.21.128.66]) by mail14.kcn.ne.jp (Postfix) with ESMTPA id 
D119640A940C for ; Thu, 8 Oct 2020 18:40:37 +0900 (JST) Date: Thu, 08 Oct 2020 18:40:36 +0900 From: Norihiro Tanaka Message-Id: <20201008184034.2F0C.27F6AC2D@kcn.ne.jp> MIME-Version: 1.0 Content-Type: multipart/mixed; boundary="------_5F7EDD43000000002F00_MULTIPART_MIXED_" Content-Transfer-Encoding: 7bit X-Mailer: Becky! ver. 2.75.01 [ja] X-matriXscan-msec-AV: Clean X-matriXscan-Action: Approve X-matriXscan: Uncategorized Received-SPF: pass client-ip=61.86.7.212; envelope-from=noritnk@kcn.ne.jp; helo=mailgw05.kcn.ne.jp X-detected-operating-system: by eggs.gnu.org: First seen = 2020/10/08 05:40:39 X-ACL-Warn: Detected OS = Linux 3.11 and newer [fuzzy] X-Spam_score_int: -18 X-Spam_score: -1.9 X-Spam_bar: - X-Spam_report: (-1.9 / 5.0 requ) BAYES_00=-1.9, SPF_HELO_NONE=0.001, SPF_PASS=-0.001 autolearn=ham autolearn_force=no X-Spam_action: no action X-Spam-Score: -1.3 (-) X-BeenThere: debbugs-submit@debbugs.gnu.org X-Mailman-Version: 2.1.18 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: debbugs-submit-bounces@debbugs.gnu.org Sender: "Debbugs-submit" X-Spam-Score: -2.3 (--) --------_5F7EDD43000000002F00_MULTIPART_MIXED_ Content-Type: text/plain; charset="US-ASCII" Content-Transfer-Encoding: 7bit We can set RE_NO_SUB for calling regex only to check syntax. It brings performance gains in cases to have a lot of enormous epsilon nodes. 
$ printf '(%020000d)\n' | sed 's/0/|/g' >pat (before) $ time -p env LC_ALL=C src/grep -Ef pat /dev/null real 6.15 user 4.62 sys 1.52 (after) $ time -p env LC_ALL=C src/grep -Ef pat /dev/null real 0.66 user 0.19 sys 0.46 --------_5F7EDD43000000002F00_MULTIPART_MIXED_ Content-Type: text/plain; charset="US-ASCII"; name="0001-grep-set-RE_NO_SUB-for-calling-regex-only-to-check-s.patch" Content-Disposition: attachment; filename="0001-grep-set-RE_NO_SUB-for-calling-regex-only-to-check-s.patch" Content-Transfer-Encoding: base64 RnJvbSAwZWY0MzI5YzliNGE1Nzg1YzU0ZGZhMWQzNmFhYzJiYjcyODkzMTk4IE1vbiBTZXAgMTcg MDA6MDA6MDAgMjAwMQpGcm9tOiBOb3JpaGlybyBUYW5ha2EgPG5vcml0bmtAa2NuLm5lLmpwPgpE YXRlOiBUaHUsIDggT2N0IDIwMjAgMTg6MjA6MTMgKzA5MDAKU3ViamVjdDogW1BBVENIXSBncmVw OiBzZXQgUkVfTk9fU1VCIGZvciBjYWxsaW5nIHJlZ2V4IG9ubHkgdG8gY2hlY2sgc3ludGF4Cgoq IHNyYy9kZmFzZWFyY2guYyAocmVnZXhfY29tcGlsZSk6IE5ldyBwYXJhbWV0ZXIuIEFsbCBjYWxs ZXJzIGNoYW5nZWQuCihHRUFjb21waWxlKTogTW92ZSBzZXR0aW5nIHN5bnRheCBmb3IgcmVnZXgg aW50byByZWdleF9jb21waWxlKCkgZnVuY3Rpb24uCi0tLQogc3JjL2RmYXNlYXJjaC5jIHwgICAx NiArKysrKysrKysrKystLS0tCiAxIGZpbGVzIGNoYW5nZWQsIDEyIGluc2VydGlvbnMoKyksIDQg ZGVsZXRpb25zKC0pCgpkaWZmIC0tZ2l0IGEvc3JjL2RmYXNlYXJjaC5jIGIvc3JjL2RmYXNlYXJj aC5jCmluZGV4IDgxMmEwZGMuLjhlZGUwZWMgMTAwNjQ0Ci0tLSBhL3NyYy9kZmFzZWFyY2guYwor KysgYi9zcmMvZGZhc2VhcmNoLmMKQEAgLTE0NSw3ICsxNDUsOCBAQCBwb3NzaWJsZV9iYWNrcmVm c19pbl9wYXR0ZXJuIChjaGFyIGNvbnN0ICprZXlzLCBwdHJkaWZmX3QgbGVuLCBib29sIGJzX3Nh ZmUpCiAKIHN0YXRpYyBib29sCiByZWdleF9jb21waWxlIChzdHJ1Y3QgZGZhX2NvbXAgKmRjLCBj aGFyIGNvbnN0ICpwLCBwdHJkaWZmX3QgbGVuLAotICAgICAgICAgICAgICAgcHRyZGlmZl90IHBj b3VudCwgcHRyZGlmZl90IGxpbmVubywgYm9vbCBzeW50YXhfb25seSkKKyAgICAgICAgICAgICAg IHB0cmRpZmZfdCBwY291bnQsIHB0cmRpZmZfdCBsaW5lbm8sIHJlZ19zeW50YXhfdCBzeW50YXhf Yml0cywKKyAgICAgICAgICAgICAgIGJvb2wgc3ludGF4X29ubHkpCiB7CiAgIHN0cnVjdCByZV9w YXR0ZXJuX2J1ZmZlciBwYXQwOwogICBzdHJ1Y3QgcmVfcGF0dGVybl9idWZmZXIgKnBhdCA9IHN5 bnRheF9vbmx5ID8gJnBhdDAgOiAmZGMtPnBhdHRlcm5zW3Bjb3VudF07CkBAIC0xNTcsNiArMTU4 
LDExIEBAIHJlZ2V4X2NvbXBpbGUgKHN0cnVjdCBkZmFfY29tcCAqZGMsIGNoYXIgY29uc3QgKnAs IHB0cmRpZmZfdCBsZW4sCiAKICAgcGF0LT50cmFuc2xhdGUgPSBOVUxMOwogCisgIGlmIChzeW50 YXhfb25seSkKKyAgICByZV9zZXRfc3ludGF4IChzeW50YXhfYml0cyB8IFJFX05PX1NVQik7Cisg IGVsc2UKKyAgICByZV9zZXRfc3ludGF4IChzeW50YXhfYml0cyk7CisKICAgY2hhciBjb25zdCAq ZXJyID0gcmVfY29tcGlsZV9wYXR0ZXJuIChwLCBsZW4sIHBhdCk7CiAgIGlmICghZXJyKQogICAg IHJldHVybiB0cnVlOwpAQCAtMTg5LDcgKzE5NSw2IEBAIEdFQWNvbXBpbGUgKGNoYXIgKnBhdHRl cm4sIHNpemVfdCBzaXplLCByZWdfc3ludGF4X3Qgc3ludGF4X2JpdHMsCiAKICAgaWYgKG1hdGNo X2ljYXNlKQogICAgIHN5bnRheF9iaXRzIHw9IFJFX0lDQVNFOwotICByZV9zZXRfc3ludGF4IChz eW50YXhfYml0cyk7CiAgIGludCBkZmFvcHRzID0gZW9sYnl0ZSA/IDAgOiBERkFfRU9MX05VTDsK ICAgZGZhc3ludGF4IChkYy0+ZGZhLCAmbG9jYWxlaW5mbywgc3ludGF4X2JpdHMsIGRmYW9wdHMp OwogICBib29sIGJzX3NhZmUgPSAhbG9jYWxlaW5mby5tdWx0aWJ5dGUgfCBsb2NhbGVpbmZvLnVz aW5nX3V0Zjg7CkBAIC0yNDIsNyArMjQ3LDEwIEBAIEdFQWNvbXBpbGUgKGNoYXIgKnBhdHRlcm4s IHNpemVfdCBzaXplLCByZWdfc3ludGF4X3Qgc3ludGF4X2JpdHMsCiAgICAgICAgICAgZGMtPnBh dHRlcm5zKys7CiAgICAgICAgIH0KIAotICAgICAgaWYgKCFyZWdleF9jb21waWxlIChkYywgcCwg bGVuLCBkYy0+cGNvdW50LCBsaW5lbm8sICFiYWNrcmVmKSkKKyAgICAgIHJlX3NldF9zeW50YXgg KHN5bnRheF9iaXRzKTsKKworICAgICAgaWYgKCFyZWdleF9jb21waWxlIChkYywgcCwgbGVuLCBk Yy0+cGNvdW50LCBsaW5lbm8sIHN5bnRheF9iaXRzLAorICAgICAgICAgICAgICAgICAgICAgICAg ICAhYmFja3JlZikpCiAgICAgICAgIGNvbXBpbGF0aW9uX2ZhaWxlZCA9IHRydWU7CiAKICAgICAg IHAgPSBzZXAgKyAxOwpAQCAtMzE3LDcgKzMyNSw3IEBAIEdFQWNvbXBpbGUgKGNoYXIgKnBhdHRl cm4sIHNpemVfdCBzaXplLCByZWdfc3ludGF4X3Qgc3ludGF4X2JpdHMsCiAgICAgICAgICAgZGMt PnBhdHRlcm5zLS07CiAgICAgICAgICAgZGMtPnBjb3VudCsrOwogCi0gICAgICAgICAgaWYgKCFy ZWdleF9jb21waWxlIChkYywgYnVmLCBidWZsZW4sIDAsIC0xLCBmYWxzZSkpCisgICAgICAgICAg aWYgKCFyZWdleF9jb21waWxlIChkYywgYnVmLCBidWZsZW4sIDAsIC0xLCBzeW50YXhfYml0cywg ZmFsc2UpKQogICAgICAgICAgICAgYWJvcnQgKCk7CiAgICAgICAgIH0KIAotLSAKMS43LjEKCg== --------_5F7EDD43000000002F00_MULTIPART_MIXED_-- From unknown Sun Jun 22 17:18:15 2025 X-Loop: help-debbugs@gnu.org Subject: bug#43862: [PATCH] grep: 
set RE_NO_SUB for calling regex only to check syntax Resent-From: Jim Meyering Original-Sender: "Debbugs-submit" Resent-CC: bug-grep@gnu.org Resent-Date: Mon, 12 Oct 2020 23:09:02 +0000 Resent-Message-ID: Resent-Sender: help-debbugs@gnu.org X-GNU-PR-Message: followup 43862 X-GNU-PR-Package: grep X-GNU-PR-Keywords: patch To: Norihiro Tanaka Cc: 43862@debbugs.gnu.org Received: via spool by 43862-submit@debbugs.gnu.org id=B43862.16025441272212 (code B ref 43862); Mon, 12 Oct 2020 23:09:02 +0000 Received: (at 43862) by debbugs.gnu.org; 12 Oct 2020 23:08:47 +0000 Received: from localhost ([127.0.0.1]:44328 helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1kS6vv-0000Zc-JZ for submit@debbugs.gnu.org; Mon, 12 Oct 2020 19:08:47 -0400 Received: from mail-wr1-f66.google.com ([209.85.221.66]:38284) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1kS6vu-0000ZP-6A for 43862@debbugs.gnu.org; Mon, 12 Oct 2020 19:08:46 -0400 Received: by mail-wr1-f66.google.com with SMTP id n18so21443732wrs.5 for <43862@debbugs.gnu.org>; Mon, 12 Oct 2020 16:08:46 -0700 (PDT) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:mime-version:references:in-reply-to:from:date :message-id:subject:to:cc; bh=hcz7vMMkwOJhbOOpWX90Q3m/gVJlY+x7//bdOMxHbL0=; b=Kc+5PWAT8KK6gUopZ1u+1GU8/m4aTjnZXnG+AQzhBJ/U0gY/DT4HKGEdi4iJlAJsPe imdLRuyTmBHParwsw0anzqpV8zKJMh3jlCCg1fWWxsm7SgB8xvoPOz6k8izvT5UZ7kfm L8HJjS5/5SoQxue5PEI1NRA3s998bKqiZrtQHm8XBGWFGMF4bDiOdWkJ8ktygHWJWanA kY3XGduPr7ntndV2Lnr3ToAk0fB6vnaWksOV9aJjwL+W4QysmVqefMUh7UnxPG26E8yV RZlEq48s5wFpV7auFbFIObLYTy6LyCsqW2ieI7UG/Rvc10iKk5/pVQ7Mw6+iR4eRaKDP xBFw== X-Gm-Message-State: AOAM532LO+n7S/emWW4INgBFmTUzIdRzYVfuAKJDajlykEHCzKMRUwZ/ fG+oo5a1ppFwnWadr2EHjHLNnNKewctIqhHNwjcZsdGDbys= X-Google-Smtp-Source: ABdhPJyowESIEfzSxeiyC1eZvL/jFxYdgjhl5FEI+g1ZUPwEK/DQAgeRe8cl32rdzdPT7hp4tV4P42tHE21PHM80LHA= X-Received: by 2002:adf:f792:: with SMTP id 
q18mr5106735wrp.333.1602544120683; Mon, 12 Oct 2020 16:08:40 -0700 (PDT) MIME-Version: 1.0 References: <20201008184034.2F0C.27F6AC2D@kcn.ne.jp> In-Reply-To: <20201008184034.2F0C.27F6AC2D@kcn.ne.jp> From: Jim Meyering Date: Mon, 12 Oct 2020 16:08:28 -0700 Message-ID: Content-Type: text/plain; charset="UTF-8" X-Spam-Score: 0.5 (/) X-BeenThere: debbugs-submit@debbugs.gnu.org X-Mailman-Version: 2.1.18 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: debbugs-submit-bounces@debbugs.gnu.org Sender: "Debbugs-submit" X-Spam-Score: -0.5 (/) On Thu, Oct 8, 2020 at 2:41 AM Norihiro Tanaka wrote: > > We can set RE_NO_SUB for calling regex only to check syntax. It brings > performance gains in cases to have a lot of enormous epsilon nodes. > > > $ printf '(%020000d)\n' | sed 's/0/|/g' >pat > > (before) > $ time -p env LC_ALL=C src/grep -Ef pat /dev/null > real 6.15 > user 4.62 > sys 1.52 > > (after) > $ time -p env LC_ALL=C src/grep -Ef pat /dev/null > real 0.66 > user 0.19 > sys 0.46 Thank you. FYI, when running similar commands with and without your patch (with an eye to adding a test), I ran this one (with your patch). It shows that using 80,000 terms caused grep to consume 32GB of memory before being OOM-killed: $ printf '(%080000d)\n' | sed 's/0/|/g' | env time src/grep -Ef- /dev/null Command terminated by signal 9 6.42user 19.98system 0:57.91elapsed 45%CPU (0avgtext+0avgdata 32024460maxresident)k 6504inputs+0outputs (92major+12003644minor)pagefaults 0swaps [Exit 137 (KILL)] I will come back to this later this week. 
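For anyone reproducing these timings, the pattern generator quoted above can be wrapped in a small helper so the number of terms is a parameter. This is only a minimal sketch: the function name make_alt_pattern is hypothetical and is not part of grep or its test suite, and the zero is passed to printf explicitly instead of relying on the missing-argument default.

```shell
#!/bin/sh
# make_alt_pattern N: print "(" followed by N "|" characters and ")".
# Same construction as: printf '(%020000d)\n' | sed 's/0/|/g'
# but with the field width taken from the first argument.
# (Hypothetical helper name, used here for illustration only.)
make_alt_pattern() {
  # "%0Nd" applied to 0 prints N zeros; sed then turns each zero into "|".
  printf "(%0${1}d)\n" 0 | sed 's/0/|/g'
}

# A tiny instance shows the shape: 5 bars separating 6 empty alternatives.
make_alt_pattern 5   # prints (|||||)
```

With this helper, `make_alt_pattern 20000 >pat` should produce the same file as the original report's command, and 80000 corresponds to the case that was OOM-killed.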
From unknown Sun Jun 22 17:18:15 2025 MIME-Version: 1.0 X-Mailer: MIME-tools 5.505 (Entity 5.505) X-Loop: help-debbugs@gnu.org From: help-debbugs@gnu.org (GNU bug Tracking System) To: Norihiro Tanaka Subject: bug#43862: closed (Re: bug#43862: [PATCH] grep: set RE_NO_SUB for calling regex only to check syntax) Message-ID: References: <20201008184034.2F0C.27F6AC2D@kcn.ne.jp> X-Gnu-PR-Message: they-closed 43862 X-Gnu-PR-Package: grep X-Gnu-PR-Keywords: patch Reply-To: 43862@debbugs.gnu.org Date: Sun, 01 Nov 2020 19:41:02 +0000 Content-Type: multipart/mixed; boundary="----------=_1604259662-7489-1" This is a multi-part message in MIME format... ------------=_1604259662-7489-1 Content-Disposition: inline Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8" Your bug report #43862: [PATCH] grep: set RE_NO_SUB for calling regex only to check syntax which was filed against the grep package, has been closed. The explanation is attached below, along with your original report. If you require more details, please reply to 43862@debbugs.gnu.org. 
--=20 43862: http://debbugs.gnu.org/cgi/bugreport.cgi?bug=3D43862 GNU Bug Tracking System Contact help-debbugs@gnu.org with problems ------------=_1604259662-7489-1 Content-Type: message/rfc822 Content-Disposition: inline Content-Transfer-Encoding: 7bit Received: (at 43862-done) by debbugs.gnu.org; 1 Nov 2020 19:40:15 +0000 Received: from localhost ([127.0.0.1]:38594 helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1kZJD5-0001ve-2w for submit@debbugs.gnu.org; Sun, 01 Nov 2020 14:40:15 -0500 Received: from mail-pg1-f193.google.com ([209.85.215.193]:33986) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1kZJD3-0001vP-G1 for 43862-done@debbugs.gnu.org; Sun, 01 Nov 2020 14:40:13 -0500 Received: by mail-pg1-f193.google.com with SMTP id t14so8943894pgg.1 for <43862-done@debbugs.gnu.org>; Sun, 01 Nov 2020 11:40:13 -0800 (PST) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:mime-version:references:in-reply-to:from:date :message-id:subject:to:cc; bh=ZP8n3UpejrsotxASw19jzakKH0HguYADsI7EdwJgDVU=; b=TzCIc0MU1Gbjw60uVgvEcAr7PgeOSk2K9leS/k7LhZdRCnoFMTyciwsalvExqNSPh4 jnEzFngBVx8EwxWlejPbRZBs83A3Jvu5kXwcEAK3WguNooe99frdPDtiZFJB2ewS2q2w 7Iv/EAb3gsA2/b0qO7RX0h6KIcRhUObWsa60Q/6JwkLxC2UWPfFRWg7zZay2THwB4Gmi EQYPdb9liRI7kfQVGYur0y4J2B01+RRVwNjcQLs04HEl/s0Z4tAEO1U7PdquYhO4WNdb yNEX+q0NLR2II+UyIOke23YktNmoK/Uua+Rbs/UFssDoqMZINOoNv70whHMzW+73td6e 1vJA== X-Gm-Message-State: AOAM532LkN29uGkwRGME+P665zOgmGh6uHQ1py7eihTkV8+NatW+j8fh 9fm60ho+KmHdUjK21QfwBLcJGofRzB/JAiK+GcCgDhIUoEs= X-Google-Smtp-Source: ABdhPJzIC1NtRkfPuxJqIU6Ld4JgvK85rnQE1dACGUHdDQ3r8D+bYbWWblWHfsz51UaRiY4081/cuValMRmnyPy1GtQ= X-Received: by 2002:a17:90a:318d:: with SMTP id j13mr14172930pjb.209.1604259607556; Sun, 01 Nov 2020 11:40:07 -0800 (PST) MIME-Version: 1.0 References: <20201008184034.2F0C.27F6AC2D@kcn.ne.jp> In-Reply-To: From: Jim Meyering Date: Sun, 1 Nov 2020 11:39:55 -0800 Message-ID: Subject: Re: 
bug#43862: [PATCH] grep: set RE_NO_SUB for calling regex only to check syntax To: Norihiro Tanaka Content-Type: text/plain; charset="UTF-8" X-Spam-Score: 0.5 (/) X-Debbugs-Envelope-To: 43862-done Cc: 43862-done@debbugs.gnu.org X-BeenThere: debbugs-submit@debbugs.gnu.org X-Mailman-Version: 2.1.18 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: debbugs-submit-bounces@debbugs.gnu.org Sender: "Debbugs-submit" X-Spam-Score: -0.5 (/) On Mon, Oct 12, 2020 at 4:08 PM Jim Meyering wrote: > On Thu, Oct 8, 2020 at 2:41 AM Norihiro Tanaka wrote: > > > > We can set RE_NO_SUB for calling regex only to check syntax. It brings > > performance gains in cases to have a lot of enormous epsilon nodes. > > > > > > $ printf '(%020000d)\n' | sed 's/0/|/g' >pat > > > > (before) > > $ time -p env LC_ALL=C src/grep -Ef pat /dev/null > > real 6.15 > > user 4.62 > > sys 1.52 > > > > (after) > > $ time -p env LC_ALL=C src/grep -Ef pat /dev/null > > real 0.66 > > user 0.19 > > sys 0.46 > > Thank you. > > FYI, when running similar commands with and without your patch (with > an eye to adding a test), I ran this one (with your patch). It shows > that using 80,000 terms caused grep to consume 32GB of memory before > being OOM-killed: > > $ printf '(%080000d)\n' | sed 's/0/|/g' | env time src/grep -Ef- /dev/null > Command terminated by signal 9 > 6.42user 19.98system 0:57.91elapsed 45%CPU (0avgtext+0avgdata > 32024460maxresident)k > 6504inputs+0outputs (92major+12003644minor)pagefaults 0swaps > [Exit 137 (KILL)] > > I will come back to this later this week. We must accept the fact that extreme regular expressions will cause resource exhaustion like that when processed by classical regex_* functions. This is yet another good reason to prefer PCRE and to use grep's -P option. 
In that case, it fails like this: $ printf '(%080000d)\n' | sed 's/0/|/g' |grep -Pf- /dev/null grep: regular expression is too large I have just pushed your patch, but without adding a test.
------------=_1604259662-7489-1--
From unknown Sun Jun 22 17:18:15 2025 X-Loop: help-debbugs@gnu.org
Subject: bug#43862: [PATCH] grep: set RE_NO_SUB for calling regex only to check syntax Resent-From: Norihiro Tanaka Original-Sender: "Debbugs-submit" Resent-CC: bug-grep@gnu.org Resent-Date: Mon, 02 Nov 2020 04:22:01 +0000 Resent-Message-ID: Resent-Sender: help-debbugs@gnu.org X-GNU-PR-Message: followup 43862 X-GNU-PR-Package: grep X-GNU-PR-Keywords: patch To: Jim Meyering Cc: 43862-done@debbugs.gnu.org Received: via spool by 43862-done@debbugs.gnu.org id=D43862.16042909178551 (code D ref 43862); Mon, 02 Nov 2020 04:22:01 +0000 Received: (at 43862-done) by debbugs.gnu.org; 2 Nov 2020 04:21:57 +0000 Received: from localhost ([127.0.0.1]:39189 helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1kZRLw-0002Dr-Vx for submit@debbugs.gnu.org; Sun, 01 Nov 2020 23:21:57 -0500 Received: from mailgw02.kcn.ne.jp ([61.86.7.209]:34303) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1kZRLt-0002DZ-Nk for 43862-done@debbugs.gnu.org; Sun, 01 Nov 2020 23:21:55 -0500 Received: from mxs02-s (mailgw2.kcn.ne.jp [61.86.15.234]) by mailgw02.kcn.ne.jp (Postfix) with ESMTP id F2265BFA33 for <43862-done@debbugs.gnu.org>; Mon, 2 Nov 2020 13:21:45 +0900 (JST) X-matriXscan-loop-detect: 3eeae2c2de9d056d9e57c960c55e5af073dcea9e Received: from mail12.kcn.ne.jp ([61.86.6.130]) by mxs02-s with ESMTP; Mon, 02 Nov 2020 13:21:43 +0900 (JST) Received: from [10.120.1.105] (i118-21-128-66.s30.a048.ap.plala.or.jp [118.21.128.66]) by mail12.kcn.ne.jp (Postfix) with ESMTPA id 6A1384014563; Mon, 2 Nov 2020 13:21:43 +0900 (JST) Date: Mon, 02 Nov 2020 13:21:42 +0900 From: Norihiro Tanaka In-Reply-To: References: Message-Id: <20201102132141.B4A0.27F6AC2D@kcn.ne.jp> MIME-Version: 1.0 Content-Type: text/plain; charset="US-ASCII" Content-Transfer-Encoding: 7bit X-Mailer: Becky! ver. 
2.75.01 [ja] X-matriXscan-msec-AV: Clean X-matriXscan-Action: Approve X-matriXscan: Uncategorized X-Spam-Score: 0.0 (/) X-BeenThere: debbugs-submit@debbugs.gnu.org X-Mailman-Version: 2.1.18 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: debbugs-submit-bounces@debbugs.gnu.org Sender: "Debbugs-submit" X-Spam-Score: -1.0 (-) On Sun, 1 Nov 2020 11:39:55 -0800 Jim Meyering wrote: > We must accept the fact that extreme regular expressions will cause > resource exhaustion like that when processed by classical regex_* > functions. This is yet another good reason to prefer PCRE and to use > grep's -P option. In that case, it fails like this: > > $ printf '(%080000d)\n' | sed 's/0/|/g' |grep -Pf- /dev/null > grep: regular expression is too large > > I have just pushed your patch, but without adding a test. I also investigated the slowdown, and I reached the same view as you. The regex consumes a lot of memory for patterns with enormous epsilon closures.
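
The difference between the two matchers can be observed at a small size, without going near the OOM case. The following is a minimal sketch, assuming the grep found in PATH accepts empty alternatives with -E (GNU grep does) and was built with PCRE support; if -P is unavailable, that line simply reports status 2. The helper name try_matcher is hypothetical.

```shell
#!/bin/sh
# try_matcher FLAG N: build a pattern of N "|" characters inside
# parentheses, compile it with "grep FLAG" against /dev/null, and
# print grep's exit status. (Hypothetical helper for illustration.)
try_matcher() {
  pat=$(mktemp) || return 1
  printf "(%0${2}d)\n" 0 | sed 's/0/|/g' >"$pat"
  # grep exit status: 1 = pattern compiled but nothing matched,
  # greater than 1 = an error such as "regular expression is too large".
  grep "$1" -f "$pat" /dev/null >/dev/null 2>&1
  status=$?
  rm -f "$pat"
  echo "$status"
}

# Small sizes are safe to run. 80000 terms with -E is the case reported
# in this thread to consume 32GB, so raise N with care.
echo "-E: $(try_matcher -E 100)"
echo "-P: $(try_matcher -P 100)"
```

Since the input is /dev/null, a status of 1 only says that the pattern compiled; it says nothing about matching speed.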