GNU bug report logs - #60690
[PATCH v2] grep: correctly identify utf-8 characters with \{b,w} in -P


Package: grep

Reported by: Ævar Arnfjörð Bjarmason <avarab <at> gmail.com>

Date: Mon, 9 Jan 2023 12:19:01 UTC

Severity: normal

Tags: patch

Merged with 62552, 62605



Message #46 received at 60690 <at> debbugs.gnu.org (full text, mbox):

From: Junio C Hamano <gitster <at> pobox.com>
To: Paul Eggert <eggert <at> cs.ucla.edu>
Cc: demerphq <at> gmail.com, Philip.Hazel <at> gmail.com, 60690 <at> debbugs.gnu.org,
 mega lith01 <megalith01 <at> gmail.com>, Carlo Arenas <carenas <at> gmail.com>,
 Ævar Arnfjörð Bjarmason <avarab <at> gmail.com>,
 pcre-dev <at> exim.org,
 Tukusej’s Sirs <tukusejssirs <at> protonmail.com>,
 git <at> vger.kernel.org
Subject: Re: bug#60690: -P '\d' in GNU and git grep
Date: Wed, 05 Apr 2023 12:37:49 -0700

Paul Eggert <eggert <at> cs.ucla.edu> writes:

> Here are two ways forward to fix this incompatibility (there are other
> possibilities of course):
>
> (A) GNU grep adds a --no-ucp option that acts like 10.43 pcre2grep
> --no-ucp, and git grep -P follows suit. That is, both GNU and git grep
> act like 10.43 pcre2grep -u, in that they enable PCRE2_UTF, and also
> enable PCRE2_UCP unless --no-ucp is given. This would cause \d to
> match non-ASCII digits unless --no-ucp is given.
>
> (B) GNU grep -P and git grep -P mimic pcre2grep in both -u and
> --no-ucp. That is, they would both do 8-bit-only by default, and use
> PCRE2_UTF only when -u or --utf is given, and use PCRE2_UCP only when
> --no-ucp is absent. This would cause \d to match non-ASCII digits only
> when -u is given but --no-ucp is not.
>
> Under either (A) or (B), future pcre2grep -u, GNU grep -P, and git
> grep -P would be consistent.
>
> I mildly prefer (B) but (A) would also work. (One advantage of (B) is
> that it should be faster....)
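
As a rough illustration of the two proposals quoted above (not code from
grep, git, or pcre2grep), the option logic could be sketched like this.
The names Pcre2Opt, options_a, and options_b are hypothetical; the two
flag members merely stand in for PCRE2_UTF and PCRE2_UCP. For (B) the
sketch assumes PCRE2_UCP is only ever set together with -u, which is how
the quoted sentence about \d reads.

```python
from enum import Flag


class Pcre2Opt(Flag):
    """Hypothetical stand-ins for the PCRE2 compile options discussed."""
    NONE = 0
    UTF = 1  # stands in for PCRE2_UTF
    UCP = 2  # stands in for PCRE2_UCP


def options_a(no_ucp: bool = False) -> Pcre2Opt:
    """Proposal (A): UTF always on; UCP on unless --no-ucp is given."""
    opts = Pcre2Opt.UTF
    if not no_ucp:
        opts |= Pcre2Opt.UCP
    return opts


def options_b(utf: bool = False, no_ucp: bool = False) -> Pcre2Opt:
    """Proposal (B): 8-bit by default; UTF only with -u/--utf.

    Assumption: UCP is enabled only when -u is given and --no-ucp is not.
    """
    opts = Pcre2Opt.NONE
    if utf:
        opts |= Pcre2Opt.UTF
        if not no_ucp:
            opts |= Pcre2Opt.UCP
    return opts


def digit_class_is_unicode(opts: Pcre2Opt) -> bool:
    """\\d matches non-ASCII digits only when the UCP stand-in is set."""
    return Pcre2Opt.UCP in opts


if __name__ == "__main__":
    # Print the option set each proposal yields for every flag combination.
    for no_ucp in (False, True):
        print("A", f"no_ucp={no_ucp}", options_a(no_ucp))
        for utf in (False, True):
            print("B", f"utf={utf}", f"no_ucp={no_ucp}", options_b(utf, no_ucp))
```

Under this reading, the default behavior is where the proposals differ:
(A) makes \d Unicode-aware unless the user opts out, while (B) keeps \d
ASCII-only unless the user opts in with -u.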

For "git grep -P", I would like to hear from Carlo and Ævar; I agree
that both (A) and (B) would be workable solutions, and have a slight
preference for a solution that does not add more options that take
effect only when -P is given, simply because such options are
cumbersome to document and explain, but that is a very minor point.

Thanks.




This bug report was last modified 2 years and 70 days ago.


