From: Paul Eggert
To: bug-grep@gnu.org
Date: Mon, 3 Apr 2023 14:48:30 -0700
Subject: PCRE2-related workarounds that GNU grep might need

Recent commits in Git do the following to work around bugs in PCRE2.
Quite possibly GNU grep -P should do the same, when in a UTF-8 locale.

* Disable PCRE2_UCP unless PCRE2 10.35 or higher.

* If ignoring case and PCRE2_MATCH_INVALID_UTF is defined, then
enable PCRE2_NO_START_OPTIMIZE unless PCRE2 10.36 or higher.
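The two rules above boil down to a small decision on the PCRE2 version. The C sketch below expresses them as plain booleans so that it compiles without pcre2.h. It assumes the caller is already in a UTF-8 locale and supplies the library version as MAJOR.MINOR (in real code this might come from the PCRE2_MAJOR and PCRE2_MINOR macros). The struct and function names are hypothetical and are not taken from Git or grep.

```c
#include <stdbool.h>

/* Hypothetical summary of which options the Git-style rules select.  */
struct pcre2_workarounds
{
  bool use_ucp;            /* whether PCRE2_UCP may be passed */
  bool no_start_optimize;  /* whether to add PCRE2_NO_START_OPTIMIZE */
};

/* Return true if version MAJOR.MINOR is at least WANT_MAJOR.WANT_MINOR.  */
static bool
version_at_least (int major, int minor, int want_major, int want_minor)
{
  return major > want_major || (major == want_major && minor >= want_minor);
}

/* Apply both rules, assuming a UTF-8 locale.  HAVE_MATCH_INVALID_UTF
   stands in for "#ifdef PCRE2_MATCH_INVALID_UTF".  */
static struct pcre2_workarounds
choose_workarounds (int major, int minor, bool ignore_case,
                    bool have_match_invalid_utf)
{
  struct pcre2_workarounds w;

  /* Rule 1: disable PCRE2_UCP unless PCRE2 10.35 or higher.  */
  w.use_ucp = version_at_least (major, minor, 10, 35);

  /* Rule 2: when ignoring case and PCRE2_MATCH_INVALID_UTF exists,
     turn off the start optimization unless PCRE2 10.36 or higher.  */
  w.no_start_optimize = (ignore_case && have_match_invalid_utf
                         && !version_at_least (major, minor, 10, 36));
  return w;
}
```

With PCRE2 10.35 and case-insensitive matching, this sketch keeps PCRE2_UCP but still asks for PCRE2_NO_START_OPTIMIZE; from 10.36 on, neither workaround applies.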

From: Carlo Arenas
To: Paul Eggert
Cc: 62657@debbugs.gnu.org
Date: Mon, 3 Apr 2023 23:17:18 -0700
Subject: Re: bug#62657: PCRE2-related workarounds that GNU grep might need

On Mon, Apr 3, 2023 at 2:50 PM Paul Eggert wrote:
>
> * Disable PCRE2_UCP unless PCRE2 10.35 or higher.

this is because of a bug in JIT, alternatively JIT could be disabled

> * If ignoring case and PCRE2_MATCH_INVALID_UTF is defined, then
> enable PCRE2_NO_START_OPTIMIZE unless PCRE2 10.36 or higher.

this one is only triggered when PCRE2_MULTILINE is used, which is not
the case for GNU grep
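If the UCP problem is a JIT bug, the first workaround can be recast as a choice about whether to call pcre2_jit_compile, and the second can be gated on PCRE2_MULTILINE. A hypothetical sketch of both predicates follows, assuming (as the first rule implies) that the JIT bug affects releases before 10.35. The function names are illustrative only.

```c
#include <stdbool.h>

/* Alternative to rule 1: keep PCRE2_UCP, but skip JIT compilation
   when UCP is requested and the library predates 10.35.  */
static bool
should_jit (int major, int minor, bool use_ucp)
{
  bool jit_fixed = major > 10 || (major == 10 && minor >= 35);
  return !use_ucp || jit_fixed;
}

/* Rule 2, narrowed: the start-optimization bug is triggered only
   when PCRE2_MULTILINE is in use.  */
static bool
needs_no_start_optimize (int major, int minor, bool ignore_case,
                         bool have_match_invalid_utf, bool multiline)
{
  bool fixed = major > 10 || (major == 10 && minor >= 36);
  return ignore_case && have_match_invalid_utf && multiline && !fixed;
}
```

Disabling JIT trades speed for unchanged matching behavior, whereas dropping PCRE2_UCP changes which characters classes like \w and \d match. A caller that never sets PCRE2_MULTILINE, which per this reply includes GNU grep, always gets false from needs_no_start_optimize.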

From: Paul Eggert
To: Carlo Arenas
Cc: 62657@debbugs.gnu.org
Date: Mon, 3 Apr 2023 23:23:20 -0700
Subject: Re: bug#62657: PCRE2-related workarounds that GNU grep might need

On 2023-04-03 23:17, Carlo Arenas wrote:
> On Mon, Apr 3, 2023 at 2:50 PM Paul Eggert wrote:
>>
>> * Disable PCRE2_UCP unless PCRE2 10.35 or higher.
>
> this is because of a bug in JIT, alternatively JIT could be disabled

Oh, that might be better as it doesn't affect behavior (just performance).

>> * If ignoring case and PCRE2_MATCH_INVALID_UTF is defined, then
>> enable PCRE2_NO_START_OPTIMIZE unless PCRE2 10.36 or higher.
>
> this one is only triggered when PCRE2_MULTILINE is used, which is not
> the case for GNU grep

Thanks for letting us know.

From: Carlo Arenas
To: Paul Eggert
Cc: 62657@debbugs.gnu.org
Date: Tue, 4 Apr 2023 00:34:20 -0700
Subject: Re: bug#62657: PCRE2-related workarounds that GNU grep might need

On Mon, Apr 3, 2023 at 11:23 PM Paul Eggert wrote:
>
> On 2023-04-03 23:17, Carlo Arenas wrote:
> > On Mon, Apr 3, 2023 at 2:50 PM Paul Eggert wrote:
> >>
> >> * Disable PCRE2_UCP unless PCRE2 10.35 or higher.
> >
> > this is because of a bug in JIT, alternatively JIT could be disabled
>
> Oh, that might be better as it doesn't affect behavior (just performance).

Also, unlike `git`, GNU grep doesn't use the fastpath JIT API and skip
UTF validation, so this crash can only be triggered in 10.34 and not
older versions, even with JIT and PCRE2_UCP enabled.

Carlo
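This last point narrows the affected range for grep-style callers. The sketch below encodes it as a hypothetical predicate. The fast_path_no_check flag is an assumed stand-in for "uses the fast-path JIT API and skips UTF validation" (in PCRE2 terms, presumably pcre2_jit_match together with PCRE2_NO_UTF_CHECK), and the pre-10.35 range for such callers is inferred from the Git rule rather than stated here.

```c
#include <stdbool.h>

/* Hypothetical predicate: can the PCRE2_UCP + JIT crash discussed in
   this thread occur?  FAST_PATH_NO_CHECK is true for callers that use
   the JIT fast path and skip UTF validation (as Git does); GNU grep
   does neither.  */
static bool
ucp_jit_crash_possible (int major, int minor, bool use_ucp, bool use_jit,
                        bool fast_path_no_check)
{
  /* The crash needs both UCP and JIT; all PCRE2 releases are 10.x,
     so any other major version is treated as unaffected.  */
  if (!use_ucp || !use_jit || major != 10)
    return false;

  if (fast_path_no_check)
    return minor < 35;   /* Git-style callers: any release before 10.35 */
  return minor == 34;    /* grep-style callers: only 10.34 */
}
```

Under this reading, a grep-specific workaround would need to skip JIT only when the library reports 10.34. At run time the version string is available from pcre2_config (PCRE2_CONFIG_VERSION, buf), which may differ from the headers grep was compiled against.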