GNU bug report logs - #60690
[PATCH v2] grep: correctly identify utf-8 characters with \{b,w} in -P
On 4/3/23 23:56, Carlo Arenas wrote:
> On Mon, Apr 3, 2023 at 2:38 PM Paul Eggert <eggert <at> cs.ucla.edu> wrote:
>>
>> on March 23 Git disabled
>> the use of PCRE2_UCP in PCRE2 10.34 or earlier[6], due to a PCRE2 bug
>> that can cause a crash when PCRE2_UCP is used[7]. A bug fix[8] should
>> appear in the next PCRE2 release.
>
> Presume PCRE2 is a typo and should have been "git" here?

No, I was talking about what options Git uses when it calls PCRE2
functions. In other words, this is about whether GNU 'grep -P' should be
compatible with 'git grep -P' (as well as with Perl and with pcregrep),
when interpreting \d and similar constructs.

This is an evolving area. Git master is fiddling with flags and options,
and so is GNU grep master, and so is PCRE2, and there are bugs. If
you're running bleeding-edge versions of this code you'll get different
behavior than if you're running grep 3.8, pcregrep 8.45, Perl 5.36, and
git 2.39.2 (which is what Fedora 37 has).

What I'm fearing is that we may evolve into mutually incompatible
interpretations of how Perl regular expressions deal with UTF-8 text.
That'd be a recipe for confusion down the road.
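
As a rough illustration of the kind of divergence discussed above, the sketch below uses Python's standard re module, not PCRE2, grep, or Git. It is only an analogy: Python's default Unicode matching for str patterns stands in for a PCRE2 build that uses PCRE2_UCP, and the re.ASCII flag stands in for one that does not. The sample strings are made up for the example.

```python
import re

# Hypothetical sample input: U+0663 ARABIC-INDIC DIGIT THREE, a space,
# then an ASCII digit.
digits_text = "\u0663 3"

# Unicode-aware classes (Python's default for str patterns).
# Used here only as a stand-in for matching with PCRE2_UCP enabled.
unicode_digits = re.findall(r"\d", digits_text)

# ASCII-only classes. Used here as a stand-in for matching without PCRE2_UCP.
ascii_digits = re.findall(r"\d", digits_text, flags=re.ASCII)

# The same split shows up for \w (and therefore for \b word boundaries).
word_text = "na\u00efve"  # "naive" spelled with U+00EF
unicode_words = re.findall(r"\w+", word_text)
ascii_words = re.findall(r"\w+", word_text, flags=re.ASCII)

print(unicode_digits, ascii_digits)
print(unicode_words, ascii_words)
```

Two tools that pick different defaults here would report different matches for the same pattern and the same UTF-8 input, which is the incompatibility the message worries about.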
This bug report was last modified 2 years and 70 days ago.