From unknown Sun Jun 15 13:01:21 2025
Subject: bug#51727: add an optional flag to -P to disable JIT
From: Carlo Marcelo Arenas Belón
To: 51727@debbugs.gnu.org
X-Debbugs-Original-To: bug-grep@gnu.org
Date: Tue, 9 Nov 2021 11:04:31 -0800
X-GNU-PR-Message: report 51727
X-GNU-PR-Package: grep
MIME-Version: 1.0
Content-Type: multipart/mixed; boundary="fYzImzj8SfRAFUen"
Content-Disposition: inline
Content-Transfer-Encoding: 8bit

--fYzImzj8SfRAFUen
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline

Severity: wishlist

There are times, when the expression is too simple or will not be used too
often to justify the extra time in -P that is required for JIT compilation.
Make it simpler for users to pass flags to the PCRE backend, and start
with a flag to disable JIT (enabled by default)

--fYzImzj8SfRAFUen
Content-Type: text/x-patch; charset=utf-8
Content-Disposition: inline; filename="0001-pcre-add-a-flag-to-disable-JIT.patch"
Content-Transfer-Encoding: 8bit

>From caeca5e806fe1b2e368833f05bb4cfb75763d1b3 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Carlo=20Marcelo=20Arenas=20Bel=C3=B3n?=
Date: Sat, 16 Oct 2021 01:38:11 -0700
Subject: [PATCH] pcre: add a flag to disable JIT
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Mainly useful for performance testing.

Signed-off-by: Carlo Marcelo Arenas Belón
---
 doc/grep.texi     |  4 +++-
 src/grep.c        | 13 +++++++++++--
 src/grep.h        |  1 +
 src/pcresearch.c  |  6 ++++--
 tests/Makefile.am |  1 +
 tests/pcre-nojit  | 22 ++++++++++++++++++++++
 6 files changed, 42 insertions(+), 5 deletions(-)
 create mode 100755 tests/pcre-nojit

diff --git a/doc/grep.texi b/doc/grep.texi
index e5b9fd8..8fb41ac 100644
--- a/doc/grep.texi
+++ b/doc/grep.texi
@@ -1138,7 +1138,7 @@ Interpret patterns as fixed strings, not regular expressions.
 (@option{-F} is specified by POSIX.)
 
 @item -P
-@itemx --perl-regexp
+@itemx --perl-regexp[=@var{FLAG}]
 @opindex -P
 @opindex --perl-regexp
 @cindex matching Perl-compatible regular expressions
@@ -1146,6 +1146,8 @@ Interpret patterns as Perl-compatible regular expressions (PCREs).
 PCRE support is here to stay, but consider this option experimental when
 combined with the @option{-z} (@option{--null-data}) option, and note
 that @samp{grep@ -P} may warn of unimplemented features.
+The optional flag 'no-jit' could be used to disable JIT, and only use the
+slower PCRE's interpreter.
 @xref{Other Options}.
 
 @end table
diff --git a/src/grep.c b/src/grep.c
index a55194c..44e21b7 100644
--- a/src/grep.c
+++ b/src/grep.c
@@ -508,7 +508,7 @@ static struct option const long_options[] =
   {"extended-regexp", no_argument, NULL, 'E'},
   {"fixed-regexp", no_argument, NULL, 'F'},
   {"fixed-strings", no_argument, NULL, 'F'},
-  {"perl-regexp", no_argument, NULL, 'P'},
+  {"perl-regexp", optional_argument, NULL, 'P'},
   {"after-context", required_argument, NULL, 'A'},
   {"before-context", required_argument, NULL, 'B'},
   {"binary-files", required_argument, NULL, BINARY_FILES_OPTION},
@@ -563,6 +563,7 @@ bool match_icase;
 bool match_words;
 bool match_lines;
 char eolbyte;
+bool pcre_jit = true;
 
 /* For error messages. */
 /* The input file name, or (if standard input) null or a --label argument. */
@@ -1987,7 +1988,8 @@ Pattern selection and interpretation:\n"), getprogname ());
   -E, --extended-regexp     PATTERNS are extended regular expressions\n\
   -F, --fixed-strings       PATTERNS are strings\n\
   -G, --basic-regexp        PATTERNS are basic regular expressions\n\
-  -P, --perl-regexp         PATTERNS are Perl regular expressions\n"));
+  -P, --perl-regexp[=FLAG]  PATTERNS are Perl regular expressions\n\
+                            FLAG is 'no-jit' (JIT enabled by default)\n"));
   /* -X is deliberately undocumented. */
   printf (_("\
   -e, --regexp=PATTERNS     use PATTERNS for matching\n\
@@ -2545,6 +2547,13 @@ main (int argc, char **argv)
 
       case 'P':
         matcher = setmatcher ("perl", matcher);
+        if (optarg)
+          {
+            if (STREQ (optarg, "no-jit"))
+              pcre_jit = false;
+            else
+              die (EXIT_TROUBLE, 0, _("unknown PCRE flag"));
+          }
         break;
 
       case 'G':
diff --git a/src/grep.h b/src/grep.h
index 04c15dd..263e98c 100644
--- a/src/grep.h
+++ b/src/grep.h
@@ -29,6 +29,7 @@ extern bool match_icase; /* -i */
 extern bool match_words; /* -w */
 extern bool match_lines; /* -x */
 extern char eolbyte; /* -z */
+extern bool pcre_jit; /* --perl-regexp=no-jit */
 
 extern char const *pattern_file_name (idx_t, idx_t *);
 
diff --git a/src/pcresearch.c b/src/pcresearch.c
index 09f92c8..988d753 100644
--- a/src/pcresearch.c
+++ b/src/pcresearch.c
@@ -180,7 +180,9 @@ Pcompile (char *pattern, idx_t size, reg_syntax_t ignored, bool exact)
   if (!pc->cre)
     die (EXIT_TROUBLE, 0, "%s", ep);
 
-  int pcre_study_flags = PCRE_STUDY_EXTRA_NEEDED | PCRE_STUDY_JIT_COMPILE;
+  int pcre_study_flags = PCRE_STUDY_EXTRA_NEEDED;
+  if (pcre_jit)
+    pcre_study_flags |= PCRE_STUDY_JIT_COMPILE;
   pc->extra = pcre_study (pc->cre, pcre_study_flags, &ep);
   if (ep)
     die (EXIT_TROUBLE, 0, "%s", ep);
@@ -191,7 +193,7 @@ Pcompile (char *pattern, idx_t size, reg_syntax_t ignored, bool exact)
 
   /* The PCRE documentation says that a 32 KiB stack is the default. */
   if (e)
-    pc->jit_stack_size = 32 << 10;
+    pc->jit_stack_size = (pcre_jit) ? 32 << 10 : 0;
 #endif
 
   free (re);
diff --git a/tests/Makefile.am b/tests/Makefile.am
index c84cdc0..cd83e00 100644
--- a/tests/Makefile.am
+++ b/tests/Makefile.am
@@ -135,6 +135,7 @@ TESTS = \
   null-byte \
   options \
   pcre \
+  pcre-nojit \
   pcre-abort \
   pcre-context \
   pcre-count \
diff --git a/tests/pcre-nojit b/tests/pcre-nojit
new file mode 100755
index 0000000..e752f33
--- /dev/null
+++ b/tests/pcre-nojit
@@ -0,0 +1,22 @@
+#! /bin/sh
+# Simple PCRE tests with JIT disabled.
+#
+# Copyright (C) 2001, 2006, 2009-2021 Free Software Foundation, Inc.
+#
+# Copying and distribution of this file, with or without modification,
+# are permitted in any medium without royalty provided the copyright
+# notice and this notice are preserved.
+
+. "${srcdir=.}/init.sh"; path_prepend_ ../src
+require_pcre_
+
+fail=0
+
+echo | grep --perl-regex=no-jit '\s*$' || fail=1
+echo | grep -z --perl-regex=no-jit '\s$' || fail=1
+echo '.ab' | returns_ 1 grep --perl-regex=no-jit -wx ab || fail=1
+echo x | grep --perl-regex=no-jit -z '[^a]' || fail=1
+printf 'x\n\0' | returns_ 1 grep -z --perl-regex=no-jit 'x$' || fail=1
+printf 'a\nb\0' | grep -zx --perl-regex=no-jit a && fail=1
+
+Exit $fail
-- 
2.34.0.rc1.349.g8f33748433

--fYzImzj8SfRAFUen--

From unknown Sun Jun 15 13:01:21 2025
Subject: bug#51727: add an optional flag to -P to disable JIT
From: Paul Eggert
To: Carlo Marcelo Arenas Belón
Cc: 51727@debbugs.gnu.org
Date: Tue, 9 Nov 2021 16:40:56 -0800
X-GNU-PR-Message: followup 51727
X-GNU-PR-Package: grep
Organization: UCLA Computer Science Department
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

On 11/9/21 11:04, Carlo Marcelo Arenas Belón wrote:
> Severity: wishlist
>
> There are times, when the expression is too simple or will not be used too
> often to justify the extra time in -P that is required for JIT compilation.

How much extra time are we talking about? I would expect users would
bother thinking about this flag only when it significantly helps
performance, which sorta implies large inputs, which means the
expression will be used often, which means users won't want to use nojit.

Anyway, if we do this sort of thing I suggest waiting for PCRE2 and
doing it then. Also, our flags should use the same spelling as PCRE2
(which suggests that we use 'no-jit' rather than 'nojit'). And what
other PCRE2 flags might users want to specify?

From unknown Sun Jun 15 13:01:21 2025
Subject: bug#51727: add an optional flag to -P to disable JIT
From: Carlo Arenas
To: Paul Eggert
Cc: 51727@debbugs.gnu.org
Date: Wed, 10 Nov 2021 04:11:00 -0800
X-GNU-PR-Message: followup 51727
X-GNU-PR-Package: grep
MIME-Version: 1.0
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: 8bit

On Tue, Nov 9, 2021 at 4:40 PM Paul Eggert wrote:
>
> On 11/9/21 11:04, Carlo Marcelo Arenas Belón wrote:
> > Severity: wishlist
> >
> > There are times, when the expression is too simple or will not be used too
> > often to justify the extra time in -P that is required for JIT compilation.
>
> How much extra time are we talking about? I would expect users would
> bother thinking about this flag only when it significantly helps
> performance, which sorta implies large inputs, which means the
> expression will be used often, which means users won't want to use nojit.

good point; I have to admit its main use, for me at least, was to
actually see how much time I am saving by using jit, and to avoid
hitting buggy jit code paths I might have introduced myself ;), which
is what the commit message kind of implies.

The main point though, was to allow the user the flexibility to decide
for themselves, from any option, while keeping a reasonable default
(which is why keeping JIT enabled by default was done here as well).

> Anyway, if we do this sort of thing I suggest waiting for PCRE2 and
> doing it then. Also, our flags should use the same spelling as PCRE2
> (which suggests that we use 'no-jit' rather than 'nojit').

fair enough, I will make sure to use no-jit if I didn't already.

> And what
> other PCRE2 flags might users want to specify?

a couple that I think might be interesting, is to enable extended
mode, so that users that have really complex expressions can write
them as multiline strings with comments, and one to tell PCRE to skip
the UTF-8 validation if your content is known to be safe, which could
increase performance by more than ~10% if I recall correctly from what
we did in git (which was more aggressive as well and probably didn't
get the results I would expect from grep)

Carlo

From unknown Sun Jun 15 13:01:21 2025
Subject: bug#51727: add an optional flag to -P to disable JIT
From: Paul Eggert
To: Carlo Arenas
Cc: 51727@debbugs.gnu.org
Date: Fri, 12 Nov 2021 15:27:48 -0800
Message-ID: <067ef28e-970c-1d78-3a9b-4a0e12ddeacd@cs.ucla.edu>
X-GNU-PR-Message: followup 51727
X-GNU-PR-Package: grep
Organization: UCLA Computer Science Department
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 7bit

On 11/10/21 04:11, Carlo Arenas wrote:
> On Tue, Nov 9, 2021 at 4:40 PM Paul Eggert wrote:

> its main use, for me at least, was to
> actually see how much time I am saving by using jit, and to avoid
> hitting buggy jit code paths I might have introduced myself ;), which
> is what the commit message kind of implies.

This sounds esoteric enough that we needn't support/document it for
'grep' users. (Their lives are complicated enough already....)

> enable extended
> mode, so that users that have really complex expressions can write
> them as multiline strings with comments,

This wouldn't be just a PCRE thing; even ordinary BREs and EREs could
benefit from being able to have multiline regexps, in which a newline
means "|". The GNU regular expression compiler supports this. Also, we'd
need to disable grep's ordinary activity of reordering and removing
duplicates in regular-expression patterns (in the current syntax where a
newline always starts a new pattern). So I expect this wouldn't be a new
-P suboption; it'd be a more-general option that applies to all regexp
syntax options (though it'd be a no-op for -F).

> and one to tell PCRE to skip
> the UTF-8 validation if your content is known to be safe

... and let grep dump core (or worse) otherwise? I'm not sure I would
like to head in that direction....
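
For reference, a minimal sketch of the current syntax mentioned in the
last message, where a newline in the pattern argument always starts a
new pattern. It assumes a POSIX-conforming grep with -E; the helper
name match_any and the sample input are hypothetical and used only for
illustration.

```shell
# match_any is a hypothetical helper used only for this illustration.
# Its argument is one string holding newline-separated EREs; grep treats
# each line of that string as a separate pattern.
match_any() {
  grep -E -e "$1"
}

# Two patterns, separated by a literal newline.
patterns='foo
bar'

# A line is selected if it matches either pattern, so "baz" is dropped.
printf 'foo\nbar\nbaz\n' | match_any "$patterns"
```

Because each line of the argument is already its own pattern, a single
regexp written across several lines (as in an extended mode with
comments) presumably could not be passed this way without a separate,
more general option.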