From unknown Sat Jun 14 03:53:24 2025
From: Carlo Marcelo Arenas Belón
To: 62983@debbugs.gnu.org
X-Debbugs-Original-To: bug-grep@gnu.org
Subject: bug#62983: workaround PCRE2 bug affecting at least \D and \W
Date: Thu, 20 Apr 2023 19:04:18 -0700
X-GNU-PR-Message: report 62983
X-GNU-PR-Package: grep
MIME-Version: 1.0
Content-Type: multipart/mixed; boundary="h7otyrncashcyzty"
Content-Disposition: inline
Content-Transfer-Encoding: 8bit

--h7otyrncashcyzty
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline

All versions of PCRE2 that include PCRE2_MATCH_INVALID_UTF had a bug on
its JIT implementation that results in failure to match for the negative
perl classes, and seems to be easier to replicate when the matching
character is a multibyte one.

Disable that flag and use the original fallback instead.

Alternatively JIT could be disabled instead, but the option selected has
less of an impact on performance.

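As a rough illustration of the decision described above, the patch attached to this message only sets PCRE2_MATCH_INVALID_UTF when the pattern text contains neither \D nor \W. A minimal sketch of that check follows; the function name is hypothetical and does not appear in grep, and only the two strstr tests are taken from the patch.

```c
#include <stdbool.h>
#include <string.h>

/* Hypothetical helper (the name is not from grep): report whether
   PATTERN is free of the two escapes the attached patch singles out.
   The scan is purely textual, so a literal backslash followed by D
   (written \\D in the pattern) is rejected as well, which only errs
   on the safe side.  */
static bool
pattern_avoids_negated_classes (char const *pattern)
{
  return !strstr (pattern, "\\D") && !strstr (pattern, "\\W");
}
```

A textual scan like this cannot see every spelling of a negated class: a pattern such as [^\d] passes the check, so the sketch should be read as a description of the patch's heuristic rather than as a complete test.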
Carlo

--h7otyrncashcyzty
Content-Type: text/plain; charset=utf-8
Content-Disposition: attachment; filename="0001-pcre-workaround-bug-affecting-W-or-D.patch"
Content-Transfer-Encoding: 8bit

>From 9194c8e9f9ca7315c2e8c25a7986d0690fb31d7c Mon Sep 17 00:00:00 2001
From: Carlo Marcelo Arenas Belón
Date: Thu, 20 Apr 2023 18:37:20 -0700
Subject: [PATCH] pcre: workaround bug affecting \W or \D

PCRE2 has a bug when using PCRE2_MATCH_INVALID_UTF that would randomly
fail to match patterns using \W or \D.

* NEWS: mention this
* src/pcresearch.c: do not use the problematic flag in all broken
  versions of PCRE2
* tests: add new pcre-utf8-bug224 test
---
 NEWS                   |  5 +++++
 src/pcresearch.c       | 23 ++++++++++++++---------
 tests/Makefile.am      |  1 +
 tests/pcre-utf8-bug224 | 31 +++++++++++++++++++++++++++++++
 4 files changed, 51 insertions(+), 9 deletions(-)
 create mode 100755 tests/pcre-utf8-bug224

diff --git a/NEWS b/NEWS
index f16c576..8e371dc 100644
--- a/NEWS
+++ b/NEWS
@@ -15,6 +15,11 @@ GNU grep NEWS                                    -*- outline -*-
   when running on 32-bit x86 and ARM hosts using glibc 2.34+.
   [bug introduced in grep 3.9]
 
+  grep no longer fails to match patterns with \D or \W when linked to
+  PCRE2 10.34 or newer.
+  [bug introduced in grep 3.8]
+
+
 ** Changes in behavior
 
   grep --version now prints a line describing the version of PCRE2 it uses.
diff --git a/src/pcresearch.c b/src/pcresearch.c
index 1f82932..6ef0d2e 100644
--- a/src/pcresearch.c
+++ b/src/pcresearch.c
@@ -58,6 +58,9 @@ struct pcre_comp
   /* Table, indexed by ! (flag & PCRE2_NOTBOL), of whether the empty
      string matches when that flag is used.  */
   int empty_match[2];
+
+  /* Flags */
+  unsigned binary_safe:1;
 };
 
 /* Memory allocation functions for PCRE.  */
@@ -130,16 +133,11 @@ jit_exec (struct pcre_comp *pc, char const *subject, idx_t search_bytes,
     }
 }
 
-/* Return true if E is an error code for bad UTF-8, and if pcre2_match
-   could return E because PCRE lacks PCRE2_MATCH_INVALID_UTF.  */
+/* Return true if E is an error code for bad UTF-8 */
 static bool
 bad_utf8_from_pcre2 (int e)
 {
-#ifdef PCRE2_MATCH_INVALID_UTF
-  return false;
-#else
   return PCRE2_ERROR_UTF8_ERR21 <= e && e <= PCRE2_ERROR_UTF8_ERR1;
-#endif
 }
 
 /* Compile the -P style PATTERN, containing SIZE bytes that are
@@ -157,6 +155,7 @@ Pcompile (char *pattern, idx_t size, reg_syntax_t ignored, bool exact)
     = pcre2_general_context_create (private_malloc, private_free, NULL);
   pcre2_compile_context *ccontext = pcre2_compile_context_create (gcontext);
 
+  pc->binary_safe = false;
   if (localeinfo.multibyte)
     {
       uint32_t unicode;
@@ -181,8 +180,14 @@ Pcompile (char *pattern, idx_t size, reg_syntax_t ignored, bool exact)
       flags |= PCRE2_NEVER_BACKSLASH_C;
 #endif
 #ifdef PCRE2_MATCH_INVALID_UTF
-      /* Consider invalid UTF-8 as a barrier, instead of error.  */
-      flags |= PCRE2_MATCH_INVALID_UTF;
+      /* workaround PCRE2 bug
+         https://github.com/PCRE2Project/pcre2/issues/224 */
+#if PCRE2_MAJOR == 10 && PCRE2_MINOR <= 42
+      pc->binary_safe = !strstr (pattern, "\\D") && !strstr (pattern, "\\W");
+      if (pc->binary_safe)
+      /* Consider invalid UTF-8 as a barrier, instead of error.  */
+        flags |= PCRE2_MATCH_INVALID_UTF;
+#endif
 #endif
     }
 
@@ -313,7 +318,7 @@ Pexecute (void *vcp, char const *buf, idx_t size, idx_t *match_size,
 
           e = jit_exec (pc, subject, line_end - subject,
                         search_offset, options);
-          if (!bad_utf8_from_pcre2 (e))
+          if (pc->binary_safe || !bad_utf8_from_pcre2 (e))
             break;
 
           idx_t valid_bytes = pcre2_get_startchar (pc->data);
diff --git a/tests/Makefile.am b/tests/Makefile.am
index 7718f24..9b4422e 100644
--- a/tests/Makefile.am
+++ b/tests/Makefile.am
@@ -155,6 +155,7 @@ TESTS = \
   pcre-jitstack \
   pcre-o \
   pcre-utf8 \
+  pcre-utf8-bug224 \
   pcre-utf8-w \
   pcre-w \
   pcre-wx-backref \
diff --git a/tests/pcre-utf8-bug224 b/tests/pcre-utf8-bug224
new file mode 100755
index 0000000..739e7b5
--- /dev/null
+++ b/tests/pcre-utf8-bug224
@@ -0,0 +1,31 @@
+#!/bin/sh
+# Ensure \D and \W matches multibyte characters in UTF mode
+#
+# Copyright (C) 2023 Free Software Foundation, Inc.
+#
+# Copying and distribution of this file, with or without modification,
+# are permitted in any medium without royalty provided the copyright
+# notice and this notice are preserved.
+
+. "${srcdir=.}/init.sh"; path_prepend_ ../src
+require_en_utf8_locale_
+LC_ALL=en_US.UTF-8
+export LC_ALL
+require_pcre_
+
+echo . | grep -qP '(*UTF).' 2>/dev/null \
+  || skip_ 'PCRE unicode support is compiled out'
+
+fail=0
+
+# 'ñ' (U+00F1)
+printf '\302\221\n' > in || framework_failure_
+grep -P '\D' in > out || fail=1
+compare in out || fail=1
+
+# “𝄞” (U+1D11E)
+printf '\360\235\204\236\n' > in || framework_failure_
+grep -P '\W' in > out || fail=1
+compare in out || fail=1
+
+Exit $fail
-- 
2.39.2 (Apple Git-143)

--h7otyrncashcyzty--

From unknown Sat Jun 14 03:53:24 2025
From: Jim Meyering
To: Carlo Marcelo Arenas Belón
Cc: 62983@debbugs.gnu.org
Subject: bug#62983: workaround PCRE2 bug affecting at least \D and \W
Date: Thu, 20 Apr 2023 19:33:25 -0700
X-GNU-PR-Message: followup 62983
X-GNU-PR-Package: grep
MIME-Version: 1.0
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: 8bit

On Thu, Apr 20, 2023 at 7:05 PM Carlo Marcelo Arenas Belón wrote:
> All versions of PCRE2 that include PCRE2_MATCH_INVALID_UTF had a bug on
> its JIT implementation that results in failure to match for the negative
> perl classes, and seems to be easier to replicate when the matching
> character is a multibyte one.
>
> Disable that flag and use the original fallback instead.
>
> Alternatively JIT could be disabled instead, but the option selected has
> less of an impact on performance.

Thanks for the patch! Is there any PCRE-upstream discussion about this?
If so, I'd like to reference that from your commit log.

From unknown Sat Jun 14 03:53:24 2025
From: Jim Meyering
To: Carlo Marcelo Arenas Belón
Cc: 62983@debbugs.gnu.org
Subject: bug#62983: workaround PCRE2 bug affecting at least \D and \W
Date: Thu, 20 Apr 2023 19:35:05 -0700
X-GNU-PR-Message: followup 62983
X-GNU-PR-Package: grep
MIME-Version: 1.0
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: 8bit

On Thu, Apr 20, 2023 at 7:33 PM Jim Meyering wrote:
>
> On Thu, Apr 20, 2023 at 7:05 PM Carlo Marcelo Arenas Belón
> wrote:
> > All versions of PCRE2 that include PCRE2_MATCH_INVALID_UTF had a bug on
> > its JIT implementation that results in failure to match for the negative
> > perl classes, and seems to be easier to replicate when the matching
> > character is a multibyte one.
> >
> > Disable that flag and use the original fallback instead.
> >
> > Alternatively JIT could be disabled instead, but the option selected has
> > less of an impact on performance.
>
> Thanks for the patch! Is there any PCRE-upstream discussion about this?
> If so, I'd like to reference that from your commit log.

Oh! I see it in the test file:
https://github.com/PCRE2Project/pcre2/issues/224

From unknown Sat Jun 14 03:53:24 2025
From: Paul Eggert
Organization: UCLA Computer Science Department
To: Carlo Marcelo Arenas Belón
Cc: 62983@debbugs.gnu.org
Subject: bug#62983: workaround PCRE2 bug affecting at least \D and \W
Date: Fri, 21 Apr 2023 11:42:50 -0700
X-GNU-PR-Message: followup 62983
X-GNU-PR-Package: grep
MIME-Version: 1.0
Content-Type: multipart/mixed; boundary="------------LRtUWsM1TxZJ3GWjVmeLEwal"

This is a multi-part message in MIME format.
--------------LRtUWsM1TxZJ3GWjVmeLEwal
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 8bit

On 2023-04-20 19:04, Carlo Marcelo Arenas Belón wrote:
> All versions of PCRE2 that include PCRE2_MATCH_INVALID_UTF had a bug on
> its JIT implementation that results in failure to match for the negative
> perl classes, and seems to be easier to replicate when the matching
> character is a multibyte one.

Unfortunately that is a little vague. I expect the issue is not limited
to \D and \W, as there are other ways to specify negative Perl classes.
And if the bug merely seems to be easier to replicate with multibyte
characters, it sounds like we may have issues even when matching ASCII
characters in a UTF-8 locale.

Furthermore, I'm leery of optimizing for PCRE2 10.42 and earlier. We
should focus our optimization efforts on future PCRE2 versions, and not
worry about optimizing earlier versions where optimizations complicate
maintenance for a declining benefit, and are likely to provoke bugs in
older versions that as time passes will be harder to debug.

> Alternatively JIT could be disabled instead, but the option selected has
> less of an impact on performance.

Disabling JIT sounds better, as correctness trumps performance. Until
the bug is fixed (or at least better-understood so that we have a
workaround we can trust), how about the attached patch instead?
--------------LRtUWsM1TxZJ3GWjVmeLEwal
Content-Type: text/x-patch; charset=UTF-8;
 name="0001-grep-use-PCRE2-JIT-only-in-unibyte-locales.patch"
Content-Disposition: attachment;
 filename="0001-grep-use-PCRE2-JIT-only-in-unibyte-locales.patch"
Content-Transfer-Encoding: 8bit

>From 4ec71b63f9ac0bb27b60e1c9802edcba868099e8 Mon Sep 17 00:00:00 2001
From: Paul Eggert <eggert@cs.ucla.edu>
Date: Fri, 21 Apr 2023 11:31:12 -0700
Subject: [PATCH] grep: use PCRE2 JIT only in unibyte locales

* src/pcresearch.c (Pcompile): Call pcre2_jit_compile only
if in a unibyte locale, to work around a PCRE2 JIT bug.
---
 NEWS             |  4 ++++
 src/pcresearch.c | 17 +++++++++++------
 2 files changed, 15 insertions(+), 6 deletions(-)

diff --git a/NEWS b/NEWS
index f16c576..b9b8cda 100644
--- a/NEWS
+++ b/NEWS
@@ -11,6 +11,10 @@ GNU grep NEWS                                    -*- outline -*-
   Unicode interpretations.
   [bug introduced in grep 3.10]
 
+  With -P, patterns like \D and \W now work again in a UTF-8 locale,
+  when linked to PCRE2 10.34 or newer.
+  [bug introduced in grep 3.8]
+
   grep no longer fails on files dated after the year 2038,
   when running on 32-bit x86 and ARM hosts using glibc 2.34+.
   [bug introduced in grep 3.9]
diff --git a/src/pcresearch.c b/src/pcresearch.c
index e82bf86..4086bbc 100644
--- a/src/pcresearch.c
+++ b/src/pcresearch.c
@@ -243,13 +243,18 @@ Pcompile (char *pattern, idx_t size, reg_syntax_t ignored, bool exact)
   pc->mcontext = NULL;
   pc->data = pcre2_match_data_create_from_pattern (pc->cre, gcontext);
 
-  /* Ignore any failure return from pcre2_jit_compile, as that merely
-     means JIT won't be used during matching.  */
-  pcre2_jit_compile (pc->cre, PCRE2_JIT_COMPLETE);
+  /* Do not use PCRE2 JIT in multibyte locales <https://bugs.gnu.org/62983>.
+     FIXME: when the PCRE2 bug is fixed or a reliable workaround found.  */
+  if (!localeinfo.multibyte)
+    {
+      /* Ignore any failure return from pcre2_jit_compile, as that merely
+         means JIT won't be used during matching.  */
+      pcre2_jit_compile (pc->cre, PCRE2_JIT_COMPLETE);
 
-  /* The PCRE documentation says that a 32 KiB stack is the default.  */
-  pc->jit_stack = NULL;
-  pc->jit_stack_size = 32 << 10;
+      /* The PCRE documentation says that a 32 KiB stack is the default.  */
+      pc->jit_stack = NULL;
+      pc->jit_stack_size = 32 << 10;
+    }
 
   pc->empty_match[false] = pcre_exec (pc, "", 0, 0, PCRE2_NOTBOL);
   pc->empty_match[true] = pcre_exec (pc, "", 0, 0, 0);
-- 
2.39.2

--------------LRtUWsM1TxZJ3GWjVmeLEwal--

From unknown Sat Jun 14 03:53:24 2025
From: Carlo Marcelo Arenas Belón
To: Paul Eggert
Cc: 62983@debbugs.gnu.org
Subject: bug#62983: workaround PCRE2 bug affecting at least \D and \W
Date: Fri, 21 Apr 2023 13:20:53 -0700
X-GNU-PR-Message: followup 62983
X-GNU-PR-Package: grep
MIME-Version: 1.0
Content-Type: multipart/mixed; boundary="kp62zerfaxgdsxut"
Content-Disposition: inline
Content-Transfer-Encoding: 8bit

--kp62zerfaxgdsxut
Content-Type: text/plain; charset=utf-8
Content-Disposition: inline
Content-Transfer-Encoding: 8bit

On Fri, Apr 21, 2023 at 11:42:50AM -0700, Paul Eggert wrote:
> On 2023-04-20 19:04, Carlo Marcelo Arenas Belón wrote:
> > All versions of PCRE2 that include PCRE2_MATCH_INVALID_UTF had a bug on
> > its JIT implementation that results in failure to match for the negative
> > perl classes, and seems to be easier to replicate when the matching
> > character is a multibyte one.
>
> Unfortunately that is a little vague. I expect the issue is not limited to
> \D and \W, as there are other ways to specify negative Perl classes.

Correct, it should also affect at least \S, but I haven't been able to
trigger it there.

The bug was that an uninitialized value was being used in the JIT code that
supports the PCRE2_MATCH_INVALID_UTF mode, which is why I said "randomly"
in the commit message.

If you want to be strict, how about the attached patch instead?

> And if
> the bug merely seems to be easier to replicate with multibyte characters, it
> sounds like we may have issues even when matching ASCII characters in a
> UTF-8 locale.

Which the current workaround addresses, since you need both PCRE2_JIT and
PCRE2_MATCH_INVALID_UTF to trigger it, and the subject encoding is
irrelevant for the logic to decide if PCRE2_MATCH_INVALID_UTF gets enabled
or not.

> Furthermore, I'm leery of optimizing for PCRE2 10.42 and earlier. We should
> focus our optimization efforts on future PCRE2 versions, and not worry about
> optimizing earlier versions where optimizations complicate maintenance for a
> declining benefit, and are likely to provoke bugs in older versions that as
> time passes will be harder to debug.

Not sure I understand your concern here, but if it is about disabling JIT
instead, then the possibility of introducing bugs is even bigger since it
affects all versions of PCRE2 (not only 10.34 or newer).

> > Alternatively JIT could be disabled instead, but the option selected has
> > less of an impact on performance.
>
> Disabling JIT sounds better, as correctness trumps performance. Until the
> bug is fixed (or at least better-understood so that we have a workaround we
> can trust), how about the attached patch instead?

The bug has been fixed already [1], and will be included in the next release.

There might be additional changes as spelled out in that discussion, and
indeed the change to the proposed solution proactively helps with one of
those. It is very unlikely, but some systems might include nonzero values
in the tables for characters over 127, and that might trigger a similar
problem that is yet to be fixed.

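To make the argument above concrete, here is a minimal sketch of the two decisions involved: whether the linked PCRE2 is new enough to trust PCRE2_MATCH_INVALID_UTF, and whether a match result should send the caller back to the older skip-invalid-bytes fallback. The function and enum names are hypothetical. The numeric error values are assumed to mirror PCRE2_ERROR_UTF8_ERR1 and PCRE2_ERROR_UTF8_ERR21 from pcre2.h and are copied here only so the sketch builds without the library. The version test is a variant that also accepts a major version above 10, which the attached v2 patch does not do.

```c
#include <stdbool.h>

/* Assumed values, copied here so the sketch builds without pcre2.h;
   a real build would take them from the header.  */
enum
{
  SKETCH_ERROR_UTF8_ERR1 = -3,   /* stands in for PCRE2_ERROR_UTF8_ERR1 */
  SKETCH_ERROR_UTF8_ERR21 = -23  /* stands in for PCRE2_ERROR_UTF8_ERR21 */
};

/* Hypothetical: true if the PCRE2 release is newer than 10.42, that is,
   one expected to carry the upstream fix.  */
static bool
invalid_utf_flag_is_trusted (int major, int minor)
{
  return major > 10 || (major == 10 && minor > 42);
}

/* Same range test as bad_utf8_from_pcre2 in the attached patch.  */
static bool
is_bad_utf8_error (int e)
{
  return SKETCH_ERROR_UTF8_ERR21 <= e && e <= SKETCH_ERROR_UTF8_ERR1;
}

/* Hypothetical: true if the caller should skip the invalid bytes and
   retry, which is only needed when the flag is not in use.  */
static bool
needs_utf8_fallback (bool flag_in_use, int e)
{
  return !flag_in_use && is_bad_utf8_error (e);
}
```

With these definitions, a build against 10.42 leaves the flag off and relies on the fallback for any bad UTF-8 error code, while a build against a later release sets the flag and never takes the fallback path.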
Carlo

[1] https://github.com/PCRE2Project/pcre2/commit/2c08b619dc973beacc474dcb67cda8cd366200ce

--kp62zerfaxgdsxut
Content-Type: text/x-patch; charset=utf-8
Content-Disposition: inline
Content-Transfer-Encoding: 8bit

>From 919d4aa016dd979a52b9e5fd3b0ba1d1cf833ac8 Mon Sep 17 00:00:00 2001
From: Carlo Marcelo Arenas Belón
Date: Thu, 20 Apr 2023 18:37:20 -0700
Subject: [PATCH v2] pcre: workaround bug affecting PCRE2_MATCH_INVALID_UTF

PCRE2 has a bug when using PCRE2_MATCH_INVALID_UTF that would randomly
fail to match patterns using perl negative classes (like \W or \D).

* NEWS: mention this
* src/pcresearch.c: restrict impact of the bug
  do not use the problematic flag in all broken versions of PCRE2
  only generate locale tables for non Unicode
* tests: add new pcre-utf8-bug224 test with replications for \[W|D]
---
 NEWS                   |  5 +++++
 src/pcresearch.c       | 22 ++++++++++++++--------
 tests/Makefile.am      |  1 +
 tests/pcre-utf8-bug224 | 31 +++++++++++++++++++++++++++++++
 4 files changed, 51 insertions(+), 8 deletions(-)
 create mode 100755 tests/pcre-utf8-bug224

diff --git a/NEWS b/NEWS
index f16c576..3552db1 100644
--- a/NEWS
+++ b/NEWS
@@ -15,6 +15,11 @@ GNU grep NEWS                                    -*- outline -*-
   when running on 32-bit x86 and ARM hosts using glibc 2.34+.
   [bug introduced in grep 3.9]
 
+  grep no longer fails to match patterns which relied on negative perl
+  classes like \D or \W when linked with PCRE2 10.34 or newer.
+  [bug introduced in grep 3.8]
+
+
 ** Changes in behavior
 
   grep --version now prints a line describing the version of PCRE2 it uses.
diff --git a/src/pcresearch.c b/src/pcresearch.c
index e867f49..a64b65b 100644
--- a/src/pcresearch.c
+++ b/src/pcresearch.c
@@ -58,6 +58,9 @@ struct pcre_comp
   /* Table, indexed by ! (flag & PCRE2_NOTBOL), of whether the empty
      string matches when that flag is used.  */
   int empty_match[2];
+
+  /* Flags */
+  unsigned binary_safe:1;
 };
 
 /* Memory allocation functions for PCRE.  */
@@ -130,16 +133,11 @@ jit_exec (struct pcre_comp *pc, char const *subject, idx_t search_bytes,
     }
 }
 
-/* Return true if E is an error code for bad UTF-8, and if pcre2_match
-   could return E because PCRE lacks PCRE2_MATCH_INVALID_UTF.  */
+/* Return true if E is an error code for bad UTF-8 */
 static bool
 bad_utf8_from_pcre2 (int e)
 {
-#ifdef PCRE2_MATCH_INVALID_UTF
-  return false;
-#else
   return PCRE2_ERROR_UTF8_ERR21 <= e && e <= PCRE2_ERROR_UTF8_ERR1;
-#endif
 }
 
 /* Compile the -P style PATTERN, containing SIZE bytes that are
@@ -157,6 +155,7 @@ Pcompile (char *pattern, idx_t size, reg_syntax_t ignored, bool exact)
     = pcre2_general_context_create (private_malloc, private_free, NULL);
   pcre2_compile_context *ccontext = pcre2_compile_context_create (gcontext);
 
+  pc->binary_safe = false;
   if (localeinfo.multibyte)
     {
       uint32_t unicode;
@@ -181,8 +180,13 @@ Pcompile (char *pattern, idx_t size, reg_syntax_t ignored, bool exact)
       flags |= PCRE2_NEVER_BACKSLASH_C;
 #endif
 #ifdef PCRE2_MATCH_INVALID_UTF
+      /* workaround PCRE2 bug
+         https://github.com/PCRE2Project/pcre2/issues/224 */
+#if PCRE2_MAJOR == 10 && PCRE2_MINOR > 42
+      pc->binary_safe = true;
       /* Consider invalid UTF-8 as a barrier, instead of error.  */
       flags |= PCRE2_MATCH_INVALID_UTF;
+#endif
 #endif
     }
 
@@ -226,7 +230,9 @@ Pcompile (char *pattern, idx_t size, reg_syntax_t ignored, bool exact)
       size = re_size;
     }
 
-  pcre2_set_character_tables (ccontext, pcre2_maketables (gcontext));
+  if (!localeinfo.multibyte)
+    pcre2_set_character_tables (ccontext, pcre2_maketables (gcontext));
+
   pc->cre = pcre2_compile ((PCRE2_SPTR) pattern, size, flags,
                            &ec, &e, ccontext);
   if (!pc->cre)
@@ -313,7 +319,7 @@ Pexecute (void *vcp, char const *buf, idx_t size, idx_t *match_size,
 
           e = jit_exec (pc, subject, line_end - subject,
                         search_offset, options);
-          if (!bad_utf8_from_pcre2 (e))
+          if (pc->binary_safe || !bad_utf8_from_pcre2 (e))
             break;
 
           idx_t valid_bytes = pcre2_get_startchar (pc->data);
diff --git a/tests/Makefile.am b/tests/Makefile.am
index 7718f24..9b4422e 100644
--- a/tests/Makefile.am
+++ b/tests/Makefile.am
@@ -155,6 +155,7 @@ TESTS = \
   pcre-jitstack \
   pcre-o \
   pcre-utf8 \
+  pcre-utf8-bug224 \
   pcre-utf8-w \
   pcre-w \
   pcre-wx-backref \
diff --git a/tests/pcre-utf8-bug224 b/tests/pcre-utf8-bug224
new file mode 100755
index 0000000..549cc43
--- /dev/null
+++ b/tests/pcre-utf8-bug224
@@ -0,0 +1,31 @@
+#!/bin/sh
+# Ensure negative perl classes matches multibyte characters in UTF mode
+#
+# Copyright (C) 2023 Free Software Foundation, Inc.
+#
+# Copying and distribution of this file, with or without modification,
+# are permitted in any medium without royalty provided the copyright
+# notice and this notice are preserved.
+
+. "${srcdir=.}/init.sh"; path_prepend_ ../src
+require_en_utf8_locale_
+LC_ALL=en_US.UTF-8
+export LC_ALL
+require_pcre_
+
+echo . | grep -qP '(*UTF).' 
2>/dev/null \ + || skip_ 'PCRE unicode support is compiled out' + +fail=3D0 + +# '=C3=B1' (U+00F1) +printf '\302\221\n' > in || framework_failure_ +grep -P '\D' in > out || fail=3D1 +compare in out || fail=3D1 + +# =E2=80=9C=F0=9D=84=9E=E2=80=9D (U+1D11E) +printf '\360\235\204\236\n' > in || framework_failure_ +grep -P '\W' in > out || fail=3D1 +compare in out || fail=3D1 + +Exit $fail --=20 2.39.2 (Apple Git-143) --kp62zerfaxgdsxut-- From unknown Sat Jun 14 03:53:24 2025 X-Loop: help-debbugs@gnu.org Subject: bug#62983: workaround PCRE2 bug affecting at least \D and \W Resent-From: Jim Meyering Original-Sender: "Debbugs-submit" Resent-CC: bug-grep@gnu.org Resent-Date: Sat, 29 Apr 2023 06:56:02 +0000 Resent-Message-ID: Resent-Sender: help-debbugs@gnu.org X-GNU-PR-Message: followup 62983 X-GNU-PR-Package: grep X-GNU-PR-Keywords: To: Carlo Marcelo Arenas =?UTF-8?Q?Bel=C3=B3n?= Cc: Paul Eggert , 62983@debbugs.gnu.org Received: via spool by 62983-submit@debbugs.gnu.org id=B62983.16827513065968 (code B ref 62983); Sat, 29 Apr 2023 06:56:02 +0000 Received: (at 62983) by debbugs.gnu.org; 29 Apr 2023 06:55:06 +0000 Received: from localhost ([127.0.0.1]:35057 helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1pseU6-0001Y9-4K for submit@debbugs.gnu.org; Sat, 29 Apr 2023 02:55:06 -0400 Received: from mail-lj1-f174.google.com ([209.85.208.174]:45257) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1pseU3-0001XZ-G7 for 62983@debbugs.gnu.org; Sat, 29 Apr 2023 02:55:04 -0400 Received: by mail-lj1-f174.google.com with SMTP id 38308e7fff4ca-2a8b082d6feso5327931fa.2 for <62983@debbugs.gnu.org>; Fri, 28 Apr 2023 23:55:03 -0700 (PDT) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20221208; t=1682751297; x=1685343297; h=cc:to:subject:message-id:date:from:in-reply-to:references :mime-version:x-gm-message-state:from:to:cc:subject:date:message-id :reply-to; bh=iNbhEgysq8IS3aW395mPDVpd3MDiYb42t3blmdrCy4E=; 
 b=Cjob8Dj91Q1m15td3L4N1z7XfiS2+LZykW9okCK6j3U46uSWlqjRSjleenoMpTQSt7
 fmVDlPHGN9c3EfCb3tTBf7APZ2pfB8zjORmqARXniC+1w+uRnbi9hfsFZokKQnsC22VI
 Ql2ouYooIULfo6tv3X3jWjB5wMpzlaKG7n2mdD0HAHK+7ie9Pa7BiqN/qZh22qG03v8u
 2MqhTrto2jLOREtx6DSzk+pXMEHgw340zrFOZQk/K8EBzCk3IvSVWd8wfJcDBZAhRfAa
 N7+jKrGHUruGIm9bEh5eUFR51NlPYIvFgW3pYsWGl0mgR4aBo2cetw+74qaBX6FTL9oy
 BJ9A==
X-Gm-Message-State: AC+VfDzBEGBzZuFZDLIUvNzB8DcHXR7ZZqk6G7h4DWZmUY+kZBJg6wVh
 zeBepnhwxq98MBB49+7F3FUGYHY1J8uUVF81kzI=
X-Google-Smtp-Source: ACHHUZ5HAqEoOQVSpANvW5Tn8kEwkV1MNGz00xlIOrsZBj+7eDH3ZWv/ZwQEYgE1Z6VC4aKh/SqDYiVKnDNauh0zZUs=
X-Received: by 2002:a2e:8501:0:b0:2a9:f8fd:49ff with SMTP id j1-20020a2e8501000000b002a9f8fd49ffmr2175720lji.17.1682751297436; Fri, 28 Apr 2023 23:54:57 -0700 (PDT)
MIME-Version: 1.0
References:
In-Reply-To:
From: Jim Meyering
Date: Sat, 29 Apr 2023 08:54:44 +0200
Message-ID:
Content-Type: multipart/mixed; boundary="000000000000545b5d05fa74110f"
X-Spam-Score: 0.2 (/)
X-BeenThere: debbugs-submit@debbugs.gnu.org
X-Mailman-Version: 2.1.18
Precedence: list
List-Id:
List-Unsubscribe: ,
List-Archive:
List-Post:
List-Help:
List-Subscribe: ,
Errors-To: debbugs-submit-bounces@debbugs.gnu.org
Sender: "Debbugs-submit"
X-Spam-Score: -0.8 (/)

--000000000000545b5d05fa74110f
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable

On Fri, Apr 21, 2023 at 10:22 PM Carlo Marcelo Arenas Belón wrote:
> On Fri, Apr 21, 2023 at 11:42:50AM -0700, Paul Eggert wrote:
> > On 2023-04-20 19:04, Carlo Marcelo Arenas Belón wrote:
> > > All versions of PCRE2 that include PCRE2_MATCH_INVALID_UTF had a bug on
> > > its JIT implementation that results in failure to match for the negative
> > > perl classes, and seems to be easier to replicate when the matching
> > > character is a multibyte one.
> >
> > Unfortunately that is a little vague. I expect the issue is not limited to
> > \D and \W, as there are other ways to specify negative Perl classes.
>
> Correct, it should also affect at least \S, but hadn't been able to trigger
> it there.
>
> The bug was that an uninitialized value was being used in the JIT code that
> supports the PCRE2_MATCH_INVALID_UTF mode, which is why I said "randomly" in
> the commit message.
>
> If you want to be strict, how about the attached patch instead?
>
> > And if
> > the bug merely seems to be easier to replicate with multibyte characters, it
> > sounds like we may have issues even when matching ASCII characters in a
> > UTF-8 locale.
>
> Which the current workaround addresses, since you need both PCRE2_JIT and
> PCRE2_MATCH_INVALID_UTF to trigger it, and the subject encoding is irrelevant
> for the logic to decide if PCRE2_MATCH_INVALID_UTF gets enabled or not.
>
> > Furthermore, I'm leery of optimizing for PCRE2 10.42 and earlier. We should
> > focus our optimization efforts on future PCRE2 versions, and not worry about
> > optimizing earlier versions where optimizations complicate maintenance for a
> > declining benefit, and are likely to provoke bugs in older versions that as
> > time passes will be harder to debug.
>
> Not sure I understand your concern here, but if it is about disabling JIT
> instead, then the possibility of introducing bugs is even bigger since it
> affects all versions of PCRE2 (not only 10.34 or newer).
>
> > > Alternatively JIT could be disabled instead, but the option selected has
> > > less of an impact on performance.
> >
> > Disabling JIT sounds better, as correctness trumps performance. Until the
> > bug is fixed (or at least better-understood so that we have a workaround we
> > can trust), how about the attached patch instead?
>
> The bug has been fixed already, and the fix will be included in the next
> release. There might be additional changes as spelled out in that discussion,
> and indeed the change to the proposed solution proactively helps with one of
> those.
>
> It is very unlikely, but some systems might include nonzero values in the
> tables for characters over 127 and that might trigger a similar problem that
> is yet to be fixed.
>
> Carlo
>
> [1] https://github.com/PCRE2Project/pcre2/commit/2c08b619dc973beacc474dcb67cda8cd366200ce

Thanks, Carlo.
I've made some small adjustments and tidied up the ChangeLog in the attached.
Hope to push it by Sunday.

There's enough going on via gnulib that I'll likely make yet another
snapshot with the very latest.
Also, there remain Solaris sparc and i386 gnulib test failures:

https://buildfarm.opencsw.org/buildbot/builders/ggrep-solaris10-sparc/builds/336
  FAIL: test-c-stack.sh
  FAIL: test-year2038
https://buildfarm.opencsw.org/buildbot/builders/ggrep-solaris10-i386/builds/334
  FAIL: test-year2038

--000000000000545b5d05fa74110f
Content-Type: application/octet-stream; name="grep-pcre2.diff"
Content-Disposition: attachment; filename="grep-pcre2.diff"
Content-Transfer-Encoding: base64
Content-ID:
X-Attachment-Id: f_lh1mnrd10

RnJvbSA5Mzk3Yzc0ZmNlODhlZWYxN2RkMDBhN2M3Yjg4OWQwNDk1ZjQ1YjUxIE1vbiBTZXAgMTcg
MDA6MDA6MDAgMjAwMQpGcm9tOiA9P1VURi04P3E/Q2FybG89MjBNYXJjZWxvPTIwQXJlbmFzPTIw
QmVsPUMzPUIzbj89IDxjYXJlbmFzQGdtYWlsLmNvbT4KRGF0ZTogVGh1LCAyMCBBcHIgMjAyMyAx
ODozNzoyMCAtMDcwMApTdWJqZWN0OiBbUEFUQ0hdIHBjcmU6IHdvcmsgYXJvdW5kIGEgUENSRTJf
TUFUQ0hfSU5WQUxJRF9VVEYgYnVnCgpQQ1JFMiBoYXMgYSBidWcgd2hlbiB1c2luZyBQQ1JFMl9N
QVRDSF9JTlZBTElEX1VURjogaXQgd291bGQKc29tZXRpbWVzIGZhaWwgdG8gbWF0Y2ggcGF0dGVy
bnMgdXNpbmcgcGVybCBuZWdhdGl2ZSBjbGFzc2VzCmxpa2UgXFcgYW5kIFxELgoKKiBORVdTIChC
dWcgZml4ZXMpOiBNZW50aW9uIGl0LgoqIHNyYy9wY3JlMnNlYXJjaC5jOiByZXN0cmljIGltcGFj
dCBvZiB0aGUgYnVnCkRvIG5vdCB1c2UgdGhlIHByb2JsZW1hdGljIGZsYWcgd2l0aCBicm9rZW4g
dmVyc2lvbnMgb2YgUENSRTIuCkdlbmVyYXRlIGxvY2FsZSB0YWJsZXMgb25seSBmb3Igc2luZ2xl
LWJ5dGUgbG9jYWxlcy4KKiB0ZXN0cy9NYWtlZmlsZS5hbSAoVEVTVFMpOiBBZGQgdGhlIGZpbGUg
bmFtZQoqIHRlc3RzL3BjcmUtdXRmOC1idWcyMjQ6IE5ldyBmaWxlLCB0byB0ZXN0IGZvciB0aGlz
LgotLS0KIE5FV1MgICAgICAgICAgICAgICAgICAgfCAgNSArKysrKwogc3JjL3BjcmVzZWFyY2gu YyAgICAgICB8IDIyICsrKysrKysrKysrKysrLS0tLS0tLS0KIHRlc3RzL01ha2VmaWxlLmFtICAg ICAgfCAgMSArCiB0ZXN0cy9wY3JlLXV0ZjgtYnVnMjI0IHwgMzEgKysrKysrKysrKysrKysrKysr KysrKysrKysrKysrKwogNCBmaWxlcyBjaGFuZ2VkLCA1MSBpbnNlcnRpb25zKCspLCA4IGRlbGV0 aW9ucygtKQogY3JlYXRlIG1vZGUgMTAwNzU1IHRlc3RzL3BjcmUtdXRmOC1idWcyMjQKCmRpZmYg LS1naXQgYS9ORVdTIGIvTkVXUwppbmRleCBjMTU3NjRjLi45N2E5MTNjIDEwMDY0NAotLS0gYS9O RVdTCisrKyBiL05FV1MKQEAgLTE1LDYgKzE1LDExIEBAIEdOVSBncmVwIE5FV1MgICAgICAgICAg ICAgICAgICAgICAgICAgICAgICAgICAgICAtKi0gb3V0bGluZSAtKi0KICAgd2hlbiBydW5uaW5n IG9uIDMyLWJpdCB4ODYgYW5kIEFSTSBob3N0cyB1c2luZyBnbGliYyAyLjM0Ky4KICAgW2J1ZyBp bnRyb2R1Y2VkIGluIGdyZXAgMy45XQoKKyAgZ3JlcCBubyBsb25nZXIgZmFpbHMgdG8gbWF0Y2gg cGF0dGVybnMgdXNpbmcgbmVnYXRlZCBwZXJsCisgIGNsYXNzZXMgbGlrZSBcRCBvciBcVyB3aGVu IGxpbmtlZCB3aXRoIFBDUkUyIDEwLjM0IG9yIG5ld2VyLgorICBbYnVnIGludHJvZHVjZWQgaW4g Z3JlcCAzLjhdCisKKwogKiogQ2hhbmdlcyBpbiBiZWhhdmlvcgoKICAgZ3JlcCAtLXZlcnNpb24g bm93IHByaW50cyBhIGxpbmUgZGVzY3JpYmluZyB0aGUgdmVyc2lvbiBvZiBQQ1JFMiBpdCB1c2Vz LgpkaWZmIC0tZ2l0IGEvc3JjL3BjcmVzZWFyY2guYyBiL3NyYy9wY3Jlc2VhcmNoLmMKaW5kZXgg ZTg2N2Y0OS4uNjhlYzZkZSAxMDA2NDQKLS0tIGEvc3JjL3BjcmVzZWFyY2guYworKysgYi9zcmMv cGNyZXNlYXJjaC5jCkBAIC01OCw2ICs1OCw5IEBAIHN0cnVjdCBwY3JlX2NvbXAKICAgLyogVGFi bGUsIGluZGV4ZWQgYnkgISAoZmxhZyAmIFBDUkUyX05PVEJPTCksIG9mIHdoZXRoZXIgdGhlIGVt cHR5CiAgICAgIHN0cmluZyBtYXRjaGVzIHdoZW4gdGhhdCBmbGFnIGlzIHVzZWQuICAqLwogICBp bnQgZW1wdHlfbWF0Y2hbMl07CisKKyAgLyogRmxhZ3MgKi8KKyAgdW5zaWduZWQgYmluYXJ5X3Nh ZmU6MTsKIH07CgogLyogTWVtb3J5IGFsbG9jYXRpb24gZnVuY3Rpb25zIGZvciBQQ1JFLiAgKi8K QEAgLTEzMCwxNiArMTMzLDExIEBAIGppdF9leGVjIChzdHJ1Y3QgcGNyZV9jb21wICpwYywgY2hh ciBjb25zdCAqc3ViamVjdCwgaWR4X3Qgc2VhcmNoX2J5dGVzLAogICAgIH0KIH0KCi0vKiBSZXR1 cm4gdHJ1ZSBpZiBFIGlzIGFuIGVycm9yIGNvZGUgZm9yIGJhZCBVVEYtOCwgYW5kIGlmIHBjcmUy X21hdGNoCi0gICBjb3VsZCByZXR1cm4gRSBiZWNhdXNlIFBDUkUgbGFja3MgUENSRTJfTUFUQ0hf 
SU5WQUxJRF9VVEYuICAqLworLyogUmV0dXJuIHRydWUgaWYgRSBpcyBhbiBlcnJvciBjb2RlIGZv ciBiYWQgVVRGLTggKi8KIHN0YXRpYyBib29sCiBiYWRfdXRmOF9mcm9tX3BjcmUyIChpbnQgZSkK IHsKLSNpZmRlZiBQQ1JFMl9NQVRDSF9JTlZBTElEX1VURgotICByZXR1cm4gZmFsc2U7Ci0jZWxz ZQogICByZXR1cm4gUENSRTJfRVJST1JfVVRGOF9FUlIyMSA8PSBlICYmIGUgPD0gUENSRTJfRVJS T1JfVVRGOF9FUlIxOwotI2VuZGlmCiB9CgogLyogQ29tcGlsZSB0aGUgLVAgc3R5bGUgUEFUVEVS TiwgY29udGFpbmluZyBTSVpFIGJ5dGVzIHRoYXQgYXJlCkBAIC0xNTcsNiArMTU1LDcgQEAgUGNv bXBpbGUgKGNoYXIgKnBhdHRlcm4sIGlkeF90IHNpemUsIHJlZ19zeW50YXhfdCBpZ25vcmVkLCBi b29sIGV4YWN0KQogICAgID0gcGNyZTJfZ2VuZXJhbF9jb250ZXh0X2NyZWF0ZSAocHJpdmF0ZV9t YWxsb2MsIHByaXZhdGVfZnJlZSwgTlVMTCk7CiAgIHBjcmUyX2NvbXBpbGVfY29udGV4dCAqY2Nv bnRleHQgPSBwY3JlMl9jb21waWxlX2NvbnRleHRfY3JlYXRlIChnY29udGV4dCk7CgorICBwYy0+ YmluYXJ5X3NhZmUgPSBmYWxzZTsKICAgaWYgKGxvY2FsZWluZm8ubXVsdGlieXRlKQogICAgIHsK ICAgICAgIHVpbnQzMl90IHVuaWNvZGU7CkBAIC0xODEsOCArMTgwLDEzIEBAIFBjb21waWxlIChj aGFyICpwYXR0ZXJuLCBpZHhfdCBzaXplLCByZWdfc3ludGF4X3QgaWdub3JlZCwgYm9vbCBleGFj dCkKICAgICAgIGZsYWdzIHw9IFBDUkUyX05FVkVSX0JBQ0tTTEFTSF9DOwogI2VuZGlmCiAjaWZk ZWYgUENSRTJfTUFUQ0hfSU5WQUxJRF9VVEYKKyAgICAgIC8qIHdvcmthcm91bmQgUENSRTIgYnVn CisgICAgICAgICBodHRwczovL2dpdGh1Yi5jb20vUENSRTJQcm9qZWN0L3BjcmUyL2lzc3Vlcy8y MjQgKi8KKyNpZiAxMCA8IFBDUkUyX01BSk9SIHx8IChQQ1JFMl9NQUpPUiA9PSAxMCAmJiA0MiA8 IFBDUkUyX01JTk9SKQorICAgICAgcGMtPmJpbmFyeV9zYWZlID0gdHJ1ZTsKICAgICAgIC8qIENv bnNpZGVyIGludmFsaWQgVVRGLTggYXMgYSBiYXJyaWVyLCBpbnN0ZWFkIG9mIGVycm9yLiAgKi8K ICAgICAgIGZsYWdzIHw9IFBDUkUyX01BVENIX0lOVkFMSURfVVRGOworI2VuZGlmCiAjZW5kaWYK ICAgICB9CgpAQCAtMjI2LDcgKzIzMCw5IEBAIFBjb21waWxlIChjaGFyICpwYXR0ZXJuLCBpZHhf dCBzaXplLCByZWdfc3ludGF4X3QgaWdub3JlZCwgYm9vbCBleGFjdCkKICAgICAgIHNpemUgPSBy ZV9zaXplOwogICAgIH0KCi0gIHBjcmUyX3NldF9jaGFyYWN0ZXJfdGFibGVzIChjY29udGV4dCwg cGNyZTJfbWFrZXRhYmxlcyAoZ2NvbnRleHQpKTsKKyAgaWYgKCFsb2NhbGVpbmZvLm11bHRpYnl0 ZSkKKyAgICBwY3JlMl9zZXRfY2hhcmFjdGVyX3RhYmxlcyAoY2NvbnRleHQsIHBjcmUyX21ha2V0 
YWJsZXMgKGdjb250ZXh0KSk7CisKICAgcGMtPmNyZSA9IHBjcmUyX2NvbXBpbGUgKChQQ1JFMl9T UFRSKSBwYXR0ZXJuLCBzaXplLCBmbGFncywKICAgICAgICAgICAgICAgICAgICAgICAgICAgICZl YywgJmUsIGNjb250ZXh0KTsKICAgaWYgKCFwYy0+Y3JlKQpAQCAtMzEzLDcgKzMxOSw3IEBAIFBl eGVjdXRlICh2b2lkICp2Y3AsIGNoYXIgY29uc3QgKmJ1ZiwgaWR4X3Qgc2l6ZSwgaWR4X3QgKm1h dGNoX3NpemUsCgogICAgICAgICAgIGUgPSBqaXRfZXhlYyAocGMsIHN1YmplY3QsIGxpbmVfZW5k IC0gc3ViamVjdCwKICAgICAgICAgICAgICAgICAgICAgICAgIHNlYXJjaF9vZmZzZXQsIG9wdGlv bnMpOwotICAgICAgICAgIGlmICghYmFkX3V0ZjhfZnJvbV9wY3JlMiAoZSkpCisgICAgICAgICAg aWYgKHBjLT5iaW5hcnlfc2FmZSB8fCAhYmFkX3V0ZjhfZnJvbV9wY3JlMiAoZSkpCiAgICAgICAg ICAgICBicmVhazsKCiAgICAgICAgICAgaWR4X3QgdmFsaWRfYnl0ZXMgPSBwY3JlMl9nZXRfc3Rh cnRjaGFyIChwYy0+ZGF0YSk7CmRpZmYgLS1naXQgYS90ZXN0cy9NYWtlZmlsZS5hbSBiL3Rlc3Rz L01ha2VmaWxlLmFtCmluZGV4IDc3MThmMjQuLjliNDQyMmUgMTAwNjQ0Ci0tLSBhL3Rlc3RzL01h a2VmaWxlLmFtCisrKyBiL3Rlc3RzL01ha2VmaWxlLmFtCkBAIC0xNTUsNiArMTU1LDcgQEAgVEVT VFMgPQkJCQkJCVwKICAgcGNyZS1qaXRzdGFjawkJCQkJXAogICBwY3JlLW8JCQkJCVwKICAgcGNy ZS11dGY4CQkJCQlcCisgIHBjcmUtdXRmOC1idWcyMjQJCQkJXAogICBwY3JlLXV0ZjgtdwkJCQkJ XAogICBwY3JlLXcJCQkJCVwKICAgcGNyZS13eC1iYWNrcmVmCQkJCVwKZGlmZiAtLWdpdCBhL3Rl c3RzL3BjcmUtdXRmOC1idWcyMjQgYi90ZXN0cy9wY3JlLXV0ZjgtYnVnMjI0Cm5ldyBmaWxlIG1v ZGUgMTAwNzU1CmluZGV4IDAwMDAwMDAuLmU3ZTBkY2QKLS0tIC9kZXYvbnVsbAorKysgYi90ZXN0 cy9wY3JlLXV0ZjgtYnVnMjI0CkBAIC0wLDAgKzEsMzEgQEAKKyMhL2Jpbi9zaAorIyBFbnN1cmUg bmVnYXRlZCBwZXJsIGNsYXNzZXMgbWF0Y2ggbXVsdGlieXRlIGNoYXJhY3RlcnMgaW4gVVRGIG1v ZGUKKyMKKyMgQ29weXJpZ2h0IChDKSAyMDIzIEZyZWUgU29mdHdhcmUgRm91bmRhdGlvbiwgSW5j LgorIworIyBDb3B5aW5nIGFuZCBkaXN0cmlidXRpb24gb2YgdGhpcyBmaWxlLCB3aXRoIG9yIHdp dGhvdXQgbW9kaWZpY2F0aW9uLAorIyBhcmUgcGVybWl0dGVkIGluIGFueSBtZWRpdW0gd2l0aG91 dCByb3lhbHR5IHByb3ZpZGVkIHRoZSBjb3B5cmlnaHQKKyMgbm90aWNlIGFuZCB0aGlzIG5vdGlj ZSBhcmUgcHJlc2VydmVkLgorCisuICIke3NyY2Rpcj0ufS9pbml0LnNoIjsgcGF0aF9wcmVwZW5k XyAuLi9zcmMKK3JlcXVpcmVfZW5fdXRmOF9sb2NhbGVfCitMQ19BTEw9ZW5fVVMuVVRGLTgKK2V4 
cG9ydCBMQ19BTEwKK3JlcXVpcmVfcGNyZV8KKworZWNobyAuIHwgZ3JlcCAtcVAgJygqVVRGKS4n IDI+L2Rldi9udWxsIFwKKyAgfHwgc2tpcF8gJ1BDUkUgdW5pY29kZSBzdXBwb3J0IGlzIGNvbXBp bGVkIG91dCcKKworZmFpbD0wCisKKyMgJ8OxJyAoVSswMEYxKQorcHJpbnRmICdcMzAyXDIyMVxu JyA+IGluIHx8IGZyYW1ld29ya19mYWlsdXJlXworZ3JlcCAtUCAnXEQnIGluID4gb3V0IHx8IGZh aWw9MQorY29tcGFyZSBpbiBvdXQgfHwgZmFpbD0xCisKKyMg4oCc8J2EnuKAnSAoVSsxRDExRSkK K3ByaW50ZiAnXDM2MFwyMzVcMjA0XDIzNlxuJyA+IGluIHx8IGZyYW1ld29ya19mYWlsdXJlXwor Z3JlcCAtUCAnXFcnIGluID4gb3V0IHx8IGZhaWw9MQorY29tcGFyZSBpbiBvdXQgfHwgZmFpbD0x CisKK0V4aXQgJGZhaWwKLS0gCjIuNDAuMC4zNjMuZzljNjk5MGNjYTIKCg== --000000000000545b5d05fa74110f-- From unknown Sat Jun 14 03:53:24 2025 X-Loop: help-debbugs@gnu.org Subject: bug#62983: workaround PCRE2 bug affecting at least \D and \W Resent-From: Carlo Arenas Original-Sender: "Debbugs-submit" Resent-CC: bug-grep@gnu.org Resent-Date: Sat, 29 Apr 2023 18:46:02 +0000 Resent-Message-ID: Resent-Sender: help-debbugs@gnu.org X-GNU-PR-Message: followup 62983 X-GNU-PR-Package: grep X-GNU-PR-Keywords: To: Jim Meyering Cc: Paul Eggert , 62983@debbugs.gnu.org Received: via spool by 62983-submit@debbugs.gnu.org id=B62983.16827939279208 (code B ref 62983); Sat, 29 Apr 2023 18:46:02 +0000 Received: (at 62983) by debbugs.gnu.org; 29 Apr 2023 18:45:27 +0000 Received: from localhost ([127.0.0.1]:36639 helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1pspZX-0002OS-Ho for submit@debbugs.gnu.org; Sat, 29 Apr 2023 14:45:27 -0400 Received: from mail-wm1-f48.google.com ([209.85.128.48]:50346) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1pspZS-0002OA-NK for 62983@debbugs.gnu.org; Sat, 29 Apr 2023 14:45:26 -0400 Received: by mail-wm1-f48.google.com with SMTP id 5b1f17b1804b1-3f18dacd392so5284905e9.0 for <62983@debbugs.gnu.org>; Sat, 29 Apr 2023 11:45:22 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20221208; t=1682793917; x=1685385917; 
h=content-transfer-encoding:cc:to:subject:message-id:date:from :in-reply-to:references:mime-version:from:to:cc:subject:date :message-id:reply-to; bh=W3Rzn5soDEd/nmYFzDEU+iyvKxefqW19KGfFDQ3SzZQ=; b=TvYR2sDPLNQ1ZmSupcGSjxUS/fgldkI3kRLP+tasW19Zl7266/aAT5EjwSM7r8JSw1 PDZXjAS8uo7p0reQkrWSQcer9kT/ofclMdnMeStvCQOTi3TC+20E1lggyPIT4UHQNQfQ b0Iv6PEL2C+ki6J66eZkM4EriSnxS/Qv2IKBbqhoCaVgUiNOx5LKjfr52y9pT8BPDSRc 4gGHAZD6v4RbR/yN/q06tDNkXljwg/XAk/IitONfJryvH4vDQN2mFJrFRxX3lf5MQQh+ W9Loqhses/FN/QUmPnh1lsnoPOQ6guObFRVaz3jvYyf0GZ2cXB4/wvDQ5UFwYycerE1p Hejw== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20221208; t=1682793917; x=1685385917; h=content-transfer-encoding:cc:to:subject:message-id:date:from :in-reply-to:references:mime-version:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=W3Rzn5soDEd/nmYFzDEU+iyvKxefqW19KGfFDQ3SzZQ=; b=ELjRxDYvxiwmNa3F4WcYKurE/lbSRn09/V5kjwKLJ3jI3H8XflebLuuhyO9+HT5K7Y YUPitBJ8lnifoetBGyJo6yTcPTYnwJTIFTWx+5iWbllKj6D8Bb/MOciCUSP9Kh5FIH7V 0Dp4dB8hcFQ7fYRR1HpF9eaYtVZazwXGjYnYNS+OZn2MPhb8kLKudH03c6tGOKTPnOZK QZgr72uqnZwX2kKqPqWRFqFS1BRhul7pmgA+44SSG2BTgy+0thllzTPwo7OU/xc21Zxe jsdNi3mo7DoeJzLkg7p0Jj87dZqf5HOPf8t74VUEU9zwwK4RZNNi1YUPyKt9J+HPlODE Rh4A== X-Gm-Message-State: AC+VfDyyXe0xziL7fGU81PsCgMdTGIxhQS87+UybVcjZ+s1oSX9qRtjP GeB0J/X8V5kjvSeiV/4AMNCqMhxdSCULcfLv6xX883/wgWQ= X-Google-Smtp-Source: ACHHUZ7hpP+TeBaBHkMlYWECYCUIF3FehJLw+QmmpNNXLaiBev9xs8DVEGkVeVoPD/Wq+mGD+EjUaa8OX/jqYQ/P3mw= X-Received: by 2002:a5d:42c2:0:b0:2ef:ae4e:3549 with SMTP id t2-20020a5d42c2000000b002efae4e3549mr6342094wrr.55.1682793916720; Sat, 29 Apr 2023 11:45:16 -0700 (PDT) MIME-Version: 1.0 References: In-Reply-To: From: Carlo Arenas Date: Sat, 29 Apr 2023 11:45:05 -0700 Message-ID: Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: quoted-printable X-Spam-Score: 0.0 (/) X-BeenThere: debbugs-submit@debbugs.gnu.org X-Mailman-Version: 2.1.18 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: 
List-Help:
List-Subscribe: ,
Errors-To: debbugs-submit-bounces@debbugs.gnu.org
Sender: "Debbugs-submit"
X-Spam-Score: -1.0 (-)

Just some nitpicking, but could we use single quotes around the '𝄞'
character in pcre-utf8-bug224 instead of double quotes?

Carlo

From unknown Sat Jun 14 03:53:24 2025
X-Loop: help-debbugs@gnu.org
Subject: bug#62983: workaround PCRE2 bug affecting at least \D and \W
Resent-From: Paul Eggert
Original-Sender: "Debbugs-submit"
Resent-CC: bug-grep@gnu.org
Resent-Date: Sun, 30 Apr 2023 01:16:02 +0000
Resent-Message-ID:
Resent-Sender: help-debbugs@gnu.org
X-GNU-PR-Message: followup 62983
X-GNU-PR-Package: grep
X-GNU-PR-Keywords:
To: Jim Meyering , Carlo Marcelo Arenas =?UTF-8?Q?Bel=C3=B3n?=
Cc: 62983@debbugs.gnu.org
Received: via spool by 62983-submit@debbugs.gnu.org id=B62983.168281732020188 (code B ref 62983); Sun, 30 Apr 2023 01:16:02 +0000
Received: (at 62983) by debbugs.gnu.org; 30 Apr 2023 01:15:20 +0000
Received: from localhost ([127.0.0.1]:36872 helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1psvep-0005FX-D4 for submit@debbugs.gnu.org; Sat, 29 Apr 2023 21:15:19 -0400
Received: from mail.cs.ucla.edu ([131.179.128.66]:48052) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1psvej-0005FB-5E for 62983@debbugs.gnu.org; Sat, 29 Apr 2023 21:15:17 -0400
Received: from localhost (localhost [127.0.0.1]) by mail.cs.ucla.edu (Postfix) with ESMTP id E07E83C097AFA; Sat, 29 Apr 2023 18:15:06 -0700 (PDT)
Received: from mail.cs.ucla.edu ([127.0.0.1]) by localhost (mail.cs.ucla.edu [127.0.0.1]) (amavisd-new, port 10032) with ESMTP id 8UYn7HZAoJPL; Sat, 29 Apr 2023 18:15:06 -0700 (PDT)
Received: from localhost (localhost [127.0.0.1]) by mail.cs.ucla.edu (Postfix) with ESMTP id 253D13C09FA01; Sat, 29 Apr 2023 18:15:06 -0700 (PDT)
DKIM-Filter: OpenDKIM Filter v2.10.3 mail.cs.ucla.edu 253D13C09FA01
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=cs.ucla.edu;
s=9D0B346E-2AEB-11ED-9476-E14B719DCE6C; t=1682817306; bh=fGeuYqvstp6kY9rB0wnVS7lb5Ixmu1Na3Df2rVODfvQ=; h=Message-ID:Date:MIME-Version:To:From; b=S+LniBPwtmAlf7ZIEEcPrJ6+iqatxBAItZnF1s9/5nwokHkQ0V8g7fNhG34++raV3 o4bL5DVRgX+KeYX4Hdw9Eq6OZuorIdtNbWMA3XPoZPKfGBSQ7ZCunLTmpJvz336lbF svSt4sOtsFeNm0WVd9RSL+HIr3G/dn1ABNh9nTNs6bNjEAwzZWz2pHrOhaEkoryMbr e52VNHJS5R2ddPwiRCX7YJHfX/UUl6WuMnlrqlq2lzE2Ro+59lac+sIIFjVMq6b5in OzqtZBc3UhO8MFlOlfMr/GuSCoqYP9Az74yLbum78J/mVF8xmgxdQpdGuMKl6A3LW2 pZA71Pu5OGMhg== X-Virus-Scanned: amavisd-new at mail.cs.ucla.edu Received: from mail.cs.ucla.edu ([127.0.0.1]) by localhost (mail.cs.ucla.edu [127.0.0.1]) (amavisd-new, port 10026) with ESMTP id EpIouVdBQ2Ck; Sat, 29 Apr 2023 18:15:06 -0700 (PDT) Received: from [192.168.1.9] (cpe-172-91-119-151.socal.res.rr.com [172.91.119.151]) by mail.cs.ucla.edu (Postfix) with ESMTPSA id E4F993C097AFA; Sat, 29 Apr 2023 18:15:05 -0700 (PDT) Content-Type: multipart/mixed; boundary="------------qkHhYN5701DawvdeoXS0hW11" Message-ID: <0ab0e249-b81e-4e07-6050-33e5f9f5bd1c@cs.ucla.edu> Date: Sat, 29 Apr 2023 18:15:05 -0700 MIME-Version: 1.0 User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:102.0) Gecko/20100101 Thunderbird/102.10.0 Content-Language: en-US References: From: Paul Eggert Organization: UCLA Computer Science Department In-Reply-To: X-Spam-Score: -1.1 (-) X-BeenThere: debbugs-submit@debbugs.gnu.org X-Mailman-Version: 2.1.18 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: debbugs-submit-bounces@debbugs.gnu.org Sender: "Debbugs-submit" X-Spam-Score: -2.1 (--) This is a multi-part message in MIME format. --------------qkHhYN5701DawvdeoXS0hW11 Content-Type: text/plain; charset=UTF-8; format=flowed Content-Transfer-Encoding: 7bit On 2023-04-28 23:54, Jim Meyering wrote: > I've made some small adjustments and tidied up the ChangeLog in the attached. One question about that patch (both original and as revised). 
Why do we need a new binary_safe slot in struct pcre_comp? Shouldn't the binary_safe stuff be done at compile-time rather than run-time? Proposed revised patch attached. It also tweaks commentary slightly, and uses a more uniform style in the test comments (something like what Carlo suggested, but a bit wordier since it names the characters). --------------qkHhYN5701DawvdeoXS0hW11 Content-Type: text/x-patch; charset=UTF-8; name="0001-pcre-work-around-a-PCRE2_MATCH_INVALID_UTF-bug.patch" Content-Disposition: attachment; filename*0="0001-pcre-work-around-a-PCRE2_MATCH_INVALID_UTF-bug.patch" Content-Transfer-Encoding: base64 RnJvbSAxNTU5NGVlMmM2MTFmMzVjYTYyNTFjMzNiMzM3Yjk0MjcwZDZlNTM2IE1vbiBTZXAg MTcgMDA6MDA6MDAgMjAwMQpGcm9tOiA9P1VURi04P3E/Q2FybG89MjBNYXJjZWxvPTIwQXJl bmFzPTIwQmVsPUMzPUIzbj89IDxjYXJlbmFzQGdtYWlsLmNvbT4KRGF0ZTogVGh1LCAyMCBB cHIgMjAyMyAxODozNzoyMCAtMDcwMApTdWJqZWN0OiBbUEFUQ0hdIHBjcmU6IHdvcmsgYXJv dW5kIGEgUENSRTJfTUFUQ0hfSU5WQUxJRF9VVEYgYnVnCgpQQ1JFMiBoYXMgYSBidWcgd2hl biB1c2luZyBQQ1JFMl9NQVRDSF9JTlZBTElEX1VURjogaXQgd291bGQKc29tZXRpbWVzIGZh aWwgdG8gbWF0Y2ggcGF0dGVybnMgdXNpbmcgbmVnYXRpdmUgY2xhc3NlcwpsaWtlIFxXIGFu ZCBcRC4KCiogTkVXUyAoQnVnIGZpeGVzKTogTWVudGlvbiBpdC4KKiBzcmMvcGNyZTJzZWFy Y2guYzogUmVzdHJpY3QgaW1wYWN0IG9mIHRoZSBidWcuCkRvIG5vdCB1c2UgdGhlIHByb2Js ZW1hdGljIGZsYWcgd2l0aCBicm9rZW4gdmVyc2lvbnMgb2YgUENSRTIuCkFsc28sIGdlbmVy YXRlIGxvY2FsZSB0YWJsZXMgb25seSBmb3Igc2luZ2xlLWJ5dGUgbG9jYWxlcywKYXMgdGhl IFBDUkUyIGRvY3VtZW50YXRpb24gcmVjb21tZW5kcyB0aGlzLgoqIHRlc3RzL01ha2VmaWxl LmFtIChURVNUUyk6IEFkZCB0aGUgZmlsZSBuYW1lCiogdGVzdHMvcGNyZS11dGY4LWJ1ZzIy NDogTmV3IGZpbGUsIHRvIHRlc3QgZm9yIHRoaXMuCi0tLQogTkVXUyAgICAgICAgICAgICAg ICAgICB8ICA1ICsrKysrCiBzcmMvcGNyZXNlYXJjaC5jICAgICAgIHwgMjkgKysrKysrKysr KysrKysrKystLS0tLS0tLS0tLS0KIHRlc3RzL01ha2VmaWxlLmFtICAgICAgfCAgMSArCiB0 ZXN0cy9wY3JlLXV0ZjgtYnVnMjI0IHwgMzEgKysrKysrKysrKysrKysrKysrKysrKysrKysr KysrKwogNCBmaWxlcyBjaGFuZ2VkLCA1NCBpbnNlcnRpb25zKCspLCAxMiBkZWxldGlvbnMo 
LSkKIGNyZWF0ZSBtb2RlIDEwMDc1NSB0ZXN0cy9wY3JlLXV0ZjgtYnVnMjI0CgpkaWZmIC0t Z2l0IGEvTkVXUyBiL05FV1MKaW5kZXggYzE1NzY0Yy4uNTNjMTQyNyAxMDA2NDQKLS0tIGEv TkVXUworKysgYi9ORVdTCkBAIC0xNSw2ICsxNSwxMSBAQCBHTlUgZ3JlcCBORVdTICAgICAg ICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgLSotIG91dGxpbmUgLSotCiAgIHdoZW4g cnVubmluZyBvbiAzMi1iaXQgeDg2IGFuZCBBUk0gaG9zdHMgdXNpbmcgZ2xpYmMgMi4zNCsu CiAgIFtidWcgaW50cm9kdWNlZCBpbiBncmVwIDMuOV0KIAorICBncmVwIC1QIG5vIGxvbmdl ciBmYWlscyB0byBtYXRjaCBwYXR0ZXJucyB1c2luZyBuZWdhdGVkIGNsYXNzZXMKKyAgbGlr ZSBcRCBvciBcVyB3aGVuIGxpbmtlZCB3aXRoIFBDUkUyIDEwLjM0IG9yIG5ld2VyLgorICBb YnVnIGludHJvZHVjZWQgaW4gZ3JlcCAzLjhdCisKKwogKiogQ2hhbmdlcyBpbiBiZWhhdmlv cgogCiAgIGdyZXAgLS12ZXJzaW9uIG5vdyBwcmludHMgYSBsaW5lIGRlc2NyaWJpbmcgdGhl IHZlcnNpb24gb2YgUENSRTIgaXQgdXNlcy4KZGlmZiAtLWdpdCBhL3NyYy9wY3Jlc2VhcmNo LmMgYi9zcmMvcGNyZXNlYXJjaC5jCmluZGV4IGU4NjdmNDkuLjQ0MjYyYWMgMTAwNjQ0Ci0t LSBhL3NyYy9wY3Jlc2VhcmNoLmMKKysrIGIvc3JjL3BjcmVzZWFyY2guYwpAQCAtMzksNiAr MzksMTUgQEAKICMgZGVmaW5lIFBDUkUyX0VYVFJBX0FTQ0lJX0JTRCAwCiAjZW5kaWYKIAor LyogVXNlIFBDUkUyX01BVENIX0lOVkFMSURfVVRGIGlmIHN1cHBvcnRlZCBhbmQgbm90IGJ1 Z2d5OworICAgc2VlIDxodHRwczovL2dpdGh1Yi5jb20vUENSRTJQcm9qZWN0L3BjcmUyL2lz c3Vlcy8yMjQ+LgorICAgQXNzdW1lIHRoZSBidWcgd2lsbCBiZSBmaXhlZCBhZnRlciBQQ1JF MiAxMC40Mi4gICovCisjaWYgZGVmaW5lZCBQQ1JFMl9NQVRDSF9JTlZBTElEX1VURiAmJiAx MCA8IFBDUkUyX01BSk9SICsgKDQyIDwgUENSRTJfTUlOT1IpCitlbnVtIHsgTUFUQ0hfSU5W QUxJRF9VVEYgPSBQQ1JFMl9NQVRDSF9JTlZBTElEX1VURiB9OworI2Vsc2UKK2VudW0geyBN QVRDSF9JTlZBTElEX1VURiA9IDAgfTsKKyNlbmRpZgorCiBzdHJ1Y3QgcGNyZV9jb21wCiB7 CiAgIC8qIEdlbmVyYWwgY29udGV4dCBmb3IgUENSRSBvcGVyYXRpb25zLiAgKi8KQEAgLTEz MCwxNiArMTM5LDExIEBAIGppdF9leGVjIChzdHJ1Y3QgcGNyZV9jb21wICpwYywgY2hhciBj b25zdCAqc3ViamVjdCwgaWR4X3Qgc2VhcmNoX2J5dGVzLAogICAgIH0KIH0KIAotLyogUmV0 dXJuIHRydWUgaWYgRSBpcyBhbiBlcnJvciBjb2RlIGZvciBiYWQgVVRGLTgsIGFuZCBpZiBw Y3JlMl9tYXRjaAotICAgY291bGQgcmV0dXJuIEUgYmVjYXVzZSBQQ1JFIGxhY2tzIFBDUkUy X01BVENIX0lOVkFMSURfVVRGLiAgKi8KKy8qIFJldHVybiB0cnVlIGlmIEUgaXMgYW4gZXJy 
b3IgY29kZSBmb3IgYmFkIFVURi04LiAgKi8KIHN0YXRpYyBib29sCiBiYWRfdXRmOF9mcm9t X3BjcmUyIChpbnQgZSkKIHsKLSNpZmRlZiBQQ1JFMl9NQVRDSF9JTlZBTElEX1VURgotICBy ZXR1cm4gZmFsc2U7Ci0jZWxzZQogICByZXR1cm4gUENSRTJfRVJST1JfVVRGOF9FUlIyMSA8 PSBlICYmIGUgPD0gUENSRTJfRVJST1JfVVRGOF9FUlIxOwotI2VuZGlmCiB9CiAKIC8qIENv bXBpbGUgdGhlIC1QIHN0eWxlIFBBVFRFUk4sIGNvbnRhaW5pbmcgU0laRSBieXRlcyB0aGF0 IGFyZQpAQCAtMTY4LDYgKzE3Miw5IEBAIFBjb21waWxlIChjaGFyICpwYXR0ZXJuLCBpZHhf dCBzaXplLCByZWdfc3ludGF4X3QgaWdub3JlZCwgYm9vbCBleGFjdCkKIAogICAgICAgZmxh Z3MgfD0gUENSRTJfVVRGOwogCisgICAgICAvKiBJZiBzdXBwb3J0ZWQsIGNvbnNpZGVyIGlu dmFsaWQgVVRGLTggYXMgYSBiYXJyaWVyIG5vdCBhbiBlcnJvci4gICovCisgICAgICBmbGFn cyB8PSBNQVRDSF9JTlZBTElEX1VURjsKKwogICAgICAgLyogSWYgUENSRTJfRVhUUkFfQVND SUlfQlNEIGlzIGF2YWlsYWJsZSwgdXNlIFBDUkUyX1VDUAogICAgICAgICAgc28gdGhhdCBc ZCBkb2VzIG5vdCBoYXZlIHRoZSB1bmRlc2lyYWJsZSBlZmZlY3Qgb2YgbWF0Y2hpbmcKICAg ICAgICAgIG5vbi1BU0NJSSBkaWdpdHMuICBPdGhlcndpc2UgKGkuZS4sIHdpdGggUENSRTIg MTAuNDIgYW5kIGVhcmxpZXIpLApAQCAtMTc5LDEwICsxODYsNiBAQCBQY29tcGlsZSAoY2hh ciAqcGF0dGVybiwgaWR4X3Qgc2l6ZSwgcmVnX3N5bnRheF90IGlnbm9yZWQsIGJvb2wgZXhh Y3QpCiAjaWYgMAogICAgICAgLyogRG8gbm90IG1hdGNoIGluZGl2aWR1YWwgY29kZSB1bml0 cyBidXQgb25seSBVVEYtOC4gICovCiAgICAgICBmbGFncyB8PSBQQ1JFMl9ORVZFUl9CQUNL U0xBU0hfQzsKLSNlbmRpZgotI2lmZGVmIFBDUkUyX01BVENIX0lOVkFMSURfVVRGCi0gICAg ICAvKiBDb25zaWRlciBpbnZhbGlkIFVURi04IGFzIGEgYmFycmllciwgaW5zdGVhZCBvZiBl cnJvci4gICovCi0gICAgICBmbGFncyB8PSBQQ1JFMl9NQVRDSF9JTlZBTElEX1VURjsKICNl bmRpZgogICAgIH0KIApAQCAtMjI2LDcgKzIyOSw5IEBAIFBjb21waWxlIChjaGFyICpwYXR0 ZXJuLCBpZHhfdCBzaXplLCByZWdfc3ludGF4X3QgaWdub3JlZCwgYm9vbCBleGFjdCkKICAg ICAgIHNpemUgPSByZV9zaXplOwogICAgIH0KIAotICBwY3JlMl9zZXRfY2hhcmFjdGVyX3Rh YmxlcyAoY2NvbnRleHQsIHBjcmUyX21ha2V0YWJsZXMgKGdjb250ZXh0KSk7CisgIGlmICgh bG9jYWxlaW5mby5tdWx0aWJ5dGUpCisgICAgcGNyZTJfc2V0X2NoYXJhY3Rlcl90YWJsZXMg KGNjb250ZXh0LCBwY3JlMl9tYWtldGFibGVzIChnY29udGV4dCkpOworCiAgIHBjLT5jcmUg PSBwY3JlMl9jb21waWxlICgoUENSRTJfU1BUUikgcGF0dGVybiwgc2l6ZSwgZmxhZ3MsCiAg 
ICAgICAgICAgICAgICAgICAgICAgICAgICAmZWMsICZlLCBjY29udGV4dCk7CiAgIGlmICgh cGMtPmNyZSkKQEAgLTMxMyw3ICszMTgsNyBAQCBQZXhlY3V0ZSAodm9pZCAqdmNwLCBjaGFy IGNvbnN0ICpidWYsIGlkeF90IHNpemUsIGlkeF90ICptYXRjaF9zaXplLAogCiAgICAgICAg ICAgZSA9IGppdF9leGVjIChwYywgc3ViamVjdCwgbGluZV9lbmQgLSBzdWJqZWN0LAogICAg ICAgICAgICAgICAgICAgICAgICAgc2VhcmNoX29mZnNldCwgb3B0aW9ucyk7Ci0gICAgICAg ICAgaWYgKCFiYWRfdXRmOF9mcm9tX3BjcmUyIChlKSkKKyAgICAgICAgICBpZiAoTUFUQ0hf SU5WQUxJRF9VVEYgfHwgIWJhZF91dGY4X2Zyb21fcGNyZTIgKGUpKQogICAgICAgICAgICAg YnJlYWs7CiAKICAgICAgICAgICBpZHhfdCB2YWxpZF9ieXRlcyA9IHBjcmUyX2dldF9zdGFy dGNoYXIgKHBjLT5kYXRhKTsKZGlmZiAtLWdpdCBhL3Rlc3RzL01ha2VmaWxlLmFtIGIvdGVz dHMvTWFrZWZpbGUuYW0KaW5kZXggNzcxOGYyNC4uOWI0NDIyZSAxMDA2NDQKLS0tIGEvdGVz dHMvTWFrZWZpbGUuYW0KKysrIGIvdGVzdHMvTWFrZWZpbGUuYW0KQEAgLTE1NSw2ICsxNTUs NyBAQCBURVNUUyA9CQkJCQkJXAogICBwY3JlLWppdHN0YWNrCQkJCQlcCiAgIHBjcmUtbwkJ CQkJXAogICBwY3JlLXV0ZjgJCQkJCVwKKyAgcGNyZS11dGY4LWJ1ZzIyNAkJCQlcCiAgIHBj cmUtdXRmOC13CQkJCQlcCiAgIHBjcmUtdwkJCQkJXAogICBwY3JlLXd4LWJhY2tyZWYJCQkJ XApkaWZmIC0tZ2l0IGEvdGVzdHMvcGNyZS11dGY4LWJ1ZzIyNCBiL3Rlc3RzL3BjcmUtdXRm OC1idWcyMjQKbmV3IGZpbGUgbW9kZSAxMDA3NTUKaW5kZXggMDAwMDAwMC4uNzg4NDUwMQot LS0gL2Rldi9udWxsCisrKyBiL3Rlc3RzL3BjcmUtdXRmOC1idWcyMjQKQEAgLTAsMCArMSwz MSBAQAorIyEvYmluL3NoCisjIEVuc3VyZSBuZWdhdGVkIFBlcmwgY2xhc3NlcyBtYXRjaCBt dWx0aWJ5dGUgY2hhcmFjdGVycyBpbiBVVEYgbW9kZS4KKyMKKyMgQ29weXJpZ2h0IChDKSAy MDIzIEZyZWUgU29mdHdhcmUgRm91bmRhdGlvbiwgSW5jLgorIworIyBDb3B5aW5nIGFuZCBk aXN0cmlidXRpb24gb2YgdGhpcyBmaWxlLCB3aXRoIG9yIHdpdGhvdXQgbW9kaWZpY2F0aW9u LAorIyBhcmUgcGVybWl0dGVkIGluIGFueSBtZWRpdW0gd2l0aG91dCByb3lhbHR5IHByb3Zp ZGVkIHRoZSBjb3B5cmlnaHQKKyMgbm90aWNlIGFuZCB0aGlzIG5vdGljZSBhcmUgcHJlc2Vy dmVkLgorCisuICIke3NyY2Rpcj0ufS9pbml0LnNoIjsgcGF0aF9wcmVwZW5kXyAuLi9zcmMK K3JlcXVpcmVfZW5fdXRmOF9sb2NhbGVfCitMQ19BTEw9ZW5fVVMuVVRGLTgKK2V4cG9ydCBM Q19BTEwKK3JlcXVpcmVfcGNyZV8KKworZWNobyAuIHwgZ3JlcCAtcVAgJygqVVRGKS4nIDI+ L2Rldi9udWxsIFwKKyAgfHwgc2tpcF8gJ1BDUkUgdW5pY29kZSBzdXBwb3J0IGlzIGNvbXBp 
bGVkIG91dCcKKworZmFpbD0wCisKKyMgJ8OxJyAtIFUrMDBGMSBMQVRJTiBTTUFMTCBMRVRU RVIgTiBXSVRIIFRJTERFCitwcmludGYgJ1wzMDJcMjIxXG4nID4gaW4gfHwgZnJhbWV3b3Jr X2ZhaWx1cmVfCitncmVwIC1QICdcRCcgaW4gPiBvdXQgfHwgZmFpbD0xCitjb21wYXJlIGlu IG91dCB8fCBmYWlsPTEKKworIyAn8J2EnicgLSBVKzFEMTFFIE1VU0lDQUwgU1lNQk9MIEcg Q0xFRgorcHJpbnRmICdcMzYwXDIzNVwyMDRcMjM2XG4nID4gaW4gfHwgZnJhbWV3b3JrX2Zh aWx1cmVfCitncmVwIC1QICdcVycgaW4gPiBvdXQgfHwgZmFpbD0xCitjb21wYXJlIGluIG91 dCB8fCBmYWlsPTEKKworRXhpdCAkZmFpbAotLSAKMi40MC4wCgo= --------------qkHhYN5701DawvdeoXS0hW11-- From unknown Sat Jun 14 03:53:24 2025 X-Loop: help-debbugs@gnu.org Subject: bug#62983: workaround PCRE2 bug affecting at least \D and \W Resent-From: Jim Meyering Original-Sender: "Debbugs-submit" Resent-CC: bug-grep@gnu.org Resent-Date: Sun, 30 Apr 2023 07:02:02 +0000 Resent-Message-ID: Resent-Sender: help-debbugs@gnu.org X-GNU-PR-Message: followup 62983 X-GNU-PR-Package: grep X-GNU-PR-Keywords: To: Paul Eggert Cc: Carlo Marcelo Arenas =?UTF-8?Q?Bel=C3=B3n?= , 62983@debbugs.gnu.org Received: via spool by 62983-submit@debbugs.gnu.org id=B62983.16828381173674 (code B ref 62983); Sun, 30 Apr 2023 07:02:02 +0000 Received: (at 62983) by debbugs.gnu.org; 30 Apr 2023 07:01:57 +0000 Received: from localhost ([127.0.0.1]:37108 helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1pt14G-0000xC-Ot for submit@debbugs.gnu.org; Sun, 30 Apr 2023 03:01:57 -0400 Received: from mail-lf1-f50.google.com ([209.85.167.50]:62612) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1pt145-0000wi-DI for 62983@debbugs.gnu.org; Sun, 30 Apr 2023 03:01:55 -0400 Received: by mail-lf1-f50.google.com with SMTP id 2adb3069b0e04-4f00d3f98deso17597189e87.0 for <62983@debbugs.gnu.org>; Sun, 30 Apr 2023 00:01:45 -0700 (PDT) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20221208; t=1682838099; x=1685430099; h=content-transfer-encoding:cc:to:subject:message-id:date:from 
:in-reply-to:references:mime-version:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=BaCub5jJ1swq1p51II8Ku4X1HRVwGkPak5IxDgLuhaE=; b=cc//IPITbZNcxfTeyHf9jKE9srCEyzrJEPawedOr81btdFD/MDVRI9MfiravSavT+7 ZR8cnqnJly/DVtFAyKtNTHWY4drCYiZckVvvlOcknctmnLTxbIpbOn6MgyGh+mZ0j3mQ 4Z/9MV7xNhBd+cXmuHc2gW0j3PT6T3WJAU6B09m+Ko51MuC2vifyDb4A3D6Ab1nVN54a 3gZPk6lkscRY1MSNH1dvqqslwrNFD5KSD6mMQU/D7yqX0tZLoHepTgc5P0iqcE8oE/gH jXPQbjzFAbC3FApXmBbwfhrD81Rf/IetmK6BeS/0rRqFiwLd3k9mKskGc++KVIgqfCld JYqg== X-Gm-Message-State: AC+VfDwU3RgaI4ptd4fnHhIYW8Q2Q12O4UIJ/dZQVfLl2o/wObUzHsb6 mTHfokLzD8hvpfYJLYwTebmHZ/H1yOBE9B+TfRo= X-Google-Smtp-Source: ACHHUZ6su6C2CXQLGCO7F5Hnb9gnt5KHOcWcPSLy0uDUEa9USagXoCs5mV7ndLOj8HkZlV13zm8i3bgPGfe3w4OMlRE= X-Received: by 2002:a05:6512:32c1:b0:4ef:ef1d:a987 with SMTP id f1-20020a05651232c100b004efef1da987mr3602683lfg.25.1682838099071; Sun, 30 Apr 2023 00:01:39 -0700 (PDT) MIME-Version: 1.0 References: <0ab0e249-b81e-4e07-6050-33e5f9f5bd1c@cs.ucla.edu> In-Reply-To: <0ab0e249-b81e-4e07-6050-33e5f9f5bd1c@cs.ucla.edu> From: Jim Meyering Date: Sun, 30 Apr 2023 09:01:26 +0200 Message-ID: Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: quoted-printable X-Spam-Score: 0.3 (/) X-BeenThere: debbugs-submit@debbugs.gnu.org X-Mailman-Version: 2.1.18 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: debbugs-submit-bounces@debbugs.gnu.org Sender: "Debbugs-submit" X-Spam-Score: -0.8 (/)

On Sun, Apr 30, 2023 at 3:15 AM Paul Eggert wrote:
>
> On 2023-04-28 23:54, Jim Meyering wrote:
> > I've made some small adjustments and tidied up the ChangeLog in the attached.
>
> One question about that patch (both original and as revised). Why do we
> need a new binary_safe slot in struct pcre_comp? Shouldn't the
> binary_safe stuff be done at compile-time rather than run-time?
>
> Proposed revised patch attached.
> It also tweaks commentary slightly, and
> uses a more uniform style in the test comments (something like what
> Carlo suggested, but a bit wordier since it names the characters).

Thanks, Paul. I prefer that. Pushed.
I've also pushed an update to use the latest from gnulib.