GNU bug report logs - #47264
RFE: pcre2 support

Package: grep

Reported by: Jaroslav Skarvada <jskarvad <at> redhat.com>

Date: Fri, 19 Mar 2021 15:23:01 UTC

Severity: wishlist

Merged with 22345, 40395

Done: Paul Eggert <eggert <at> cs.ucla.edu>

Bug is archived. No further changes may be made.

From: Paul Eggert <eggert <at> cs.ucla.edu>
To: Carlo Arenas <carenas <at> gmail.com>
Cc: 47264 <at> debbugs.gnu.org
Subject: bug#47264: [PATCH] pcre: migrate to pcre2
Date: Mon, 8 Nov 2021 11:53:47 -0800

On 11/8/21 01:47, Carlo Arenas wrote:
> On Sun, Nov 7, 2021 at 4:30 PM Paul Eggert <eggert <at> cs.ucla.edu> wrote:

> Let me know how to help otherwise.

The main thing from my point of view is that I'd like to know what those 
other bugs are. I can split out the patch into pieces if I know what to 
look for.

> I didn't want anyone hitting on those old PCRE2 bugs though with this
> first release, hence why the configure rule is there for now (even if
> I am likely going to remove it for the next version)

If it's a PCRE2 bug we can ask people to fix it in their PCRE2 library.

Possibly we should continue to support PCRE1 as a configure-time option; 
that would assuage concerns about bugs in PCRE2. More work for us, though.

> \C is supported with -P in the PCRE version now though, is removing that ok?

I guess I don't see the harm of supporting \C; why disable it?

>> If memory serves grep currently takes care to not pass invalid UTF-8 in
>> the buffer or pattern. Does PCRE2_MATCH_INVALID_UTF make this work obsolete?
> 
> not sure I understand what you mean

I guess I was thinking about an older grep version.

Currently grep compiles with PCRE_UTF8 and checks for PCRE_ERROR_BADUTF8
returns from pcre_exec. It relies on the count of bytes that this
pcre_exec returns in sub[0] before calling pcre_exec again with
PCRE_NO_UTF8_CHECK. So, effectively it's using pcre_exec to check that a
buffer contains valid UTF-8.

I don't see how this works with the proposed patch. It uses sub[0] but I 
don't see how it's set. What am I missing?

One more thing I just noticed: this test:

  if (PCRE2_ERROR_UTF8_ERR1 <= e || e < PCRE2_ERROR_UTF8_ERR21)

is logically equivalent to the following (which is clearer to me):

  if (! (PCRE2_ERROR_UTF8_ERR21 <= e && e < PCRE2_ERROR_UTF8_ERR1))

Shouldn't that be the following instead?

  if (! (PCRE2_ERROR_UTF8_ERR21 <= e && e <= PCRE2_ERROR_UTF8_ERR1))
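
The difference between the three tests above can be checked mechanically. The following is only a sketch: it uses local stand-in constants rather than including pcre2.h, and it assumes PCRE2_ERROR_UTF8_ERR1 is -3 and PCRE2_ERROR_UTF8_ERR21 is -23 (so ERR21 < ERR1). The function names are hypothetical and are not part of grep or the patch.

```c
#include <assert.h>
#include <stdbool.h>

/* Stand-ins for the pcre2.h macros so this compiles without PCRE2.
   Assumed values: PCRE2_ERROR_UTF8_ERR1 is -3 and
   PCRE2_ERROR_UTF8_ERR21 is -23, so ERR21 < ERR1.  */
enum { UTF8_ERR1 = -3, UTF8_ERR21 = -23 };

/* The test as written in the patch.  */
bool
patch_test (int e)
{
  return UTF8_ERR1 <= e || e < UTF8_ERR21;
}

/* The same test after applying De Morgan's law.  */
bool
rewritten_test (int e)
{
  return ! (UTF8_ERR21 <= e && e < UTF8_ERR1);
}

/* The suggested fix: true only when E lies outside the closed range
   UTF8_ERR21 .. UTF8_ERR1.  */
bool
suggested_test (int e)
{
  return ! (UTF8_ERR21 <= e && e <= UTF8_ERR1);
}

/* Count the codes in LO .. HI on which F and G disagree.  */
int
count_disagreements (bool (*f) (int), bool (*g) (int), int lo, int hi)
{
  int n = 0;
  for (int e = lo; e <= hi; e++)
    n += f (e) != g (e);
  return n;
}

/* Hypothetical self-check over a range that covers all the codes.  */
void
check_tests (void)
{
  /* The first two forms agree everywhere.  */
  assert (count_disagreements (patch_test, rewritten_test, -40, 10) == 0);
  /* The suggested form differs at exactly one code, namely ERR1.  */
  assert (count_disagreements (patch_test, suggested_test, -40, 10) == 1);
  assert (patch_test (UTF8_ERR1) && ! suggested_test (UTF8_ERR1));
}
```

Under those assumed values, the only code on which the patch's test and the suggested one disagree is PCRE2_ERROR_UTF8_ERR1 itself, which the half-open comparison treats as lying outside the UTF-8 error range.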




This bug report was last modified 3 years and 184 days ago.
