GNU bug report logs - #62983
workaround PCRE2 bug affecting at least \D and \W

Package: grep;

Reported by: Carlo Marcelo Arenas Belón <carenas <at> gmail.com>

Date: Fri, 21 Apr 2023 02:05:01 UTC

Severity: normal

From: Paul Eggert <eggert <at> cs.ucla.edu>
To: Carlo Marcelo Arenas Belón <carenas <at> gmail.com>
Cc: 62983 <at> debbugs.gnu.org
Subject: bug#62983: workaround PCRE2 bug affecting at least \D and \W
Date: Fri, 21 Apr 2023 11:42:50 -0700
[Message part 1 (text/plain, inline)]
On 2023-04-20 19:04, Carlo Marcelo Arenas Belón wrote:
> All versions of PCRE2 that include PCRE2_MATCH_INVALID_UTF had a bug on
> its JIT implementation that results in failure to match for the negative
> perl classes, and seems to be easier to replicate when the matching
> character is a multibyte one.

Unfortunately that is a little vague. I expect the issue is not limited 
to \D and \W, as there are other ways to specify negative Perl classes. 
And if the bug merely seems to be easier to replicate with multibyte 
characters, it sounds like we may have issues even when matching ASCII 
characters in a UTF-8 locale.

Furthermore, I'm leery of optimizing for PCRE2 10.42 and earlier. We 
should focus our optimization efforts on future PCRE2 versions, and not 
worry about optimizing earlier versions where optimizations complicate 
maintenance for a declining benefit, and are likely to provoke bugs in 
older versions that as time passes will be harder to debug.


> Alternatively JIT could be disabled instead, but the option selected has
> less of an impact on performance.

Disabling JIT sounds better, as correctness trumps performance. Until 
the bug is fixed (or at least better understood, so that we have a 
workaround we can trust), how about the attached patch instead?
[0001-grep-use-PCRE2-JIT-only-in-unibyte-locales.patch (text/x-patch, attachment)]
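
For readers without the attachment, here is a rough sketch of the idea in the patch's title (use the JIT only in unibyte locales). This is not the patch itself: the helper names locale_is_unibyte and should_use_pcre2_jit are hypothetical, and treating MB_CUR_MAX == 1 as the unibyte test is an assumption made for illustration.

```c
#include <locale.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical helper: true when the current locale encodes every
   character in a single byte.  MB_CUR_MAX is the maximum number of
   bytes per character in the current LC_CTYPE locale.  */
static bool
locale_is_unibyte (void)
{
  return MB_CUR_MAX == 1;
}

/* Hypothetical policy: request the PCRE2 JIT only when the locale is
   unibyte, so that the interpreter handles multibyte input, where
   the negative-class bug was reported.  */
static bool
should_use_pcre2_jit (bool unibyte)
{
  return unibyte;
}

/* In a real matcher, the decision would guard the JIT request,
   roughly like this (assumes a compiled pcre2_code *re):

     if (should_use_pcre2_jit (locale_is_unibyte ()))
       pcre2_jit_compile (re, PCRE2_JIT_COMPLETE);

   When the JIT is skipped, pcre2_match still works and simply runs
   the interpreter, trading speed for correctness.  */
```

The caller is expected to have run setlocale (LC_ALL, "") first, so that MB_CUR_MAX reflects the user's locale rather than the default "C" locale.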

This bug report was last modified 2 years and 46 days ago.

GNU bug tracking system
Copyright (C) 1999 Darren O. Benham, 1997,2003 nCipher Corporation Ltd, 1994-97 Ian Jackson.