GNU bug report logs - #60690
[PATCH v2] grep: correctly identify utf-8 characters with \{b,w} in -P


Package: grep;

Reported by: Ævar Arnfjörð Bjarmason <avarab <at> gmail.com>

Date: Mon, 9 Jan 2023 12:19:01 UTC

Severity: normal

Tags: patch

Merged with 62552, 62605



Message #52 received at 60690 <at> debbugs.gnu.org:

From: Paul Eggert <eggert <at> cs.ucla.edu>
To: Jim Meyering <jim <at> meyering.net>
Cc: demerphq <at> gmail.com, Philip.Hazel <at> gmail.com, 60690 <at> debbugs.gnu.org,
 mega lith01 <megalith01 <at> gmail.com>, Carlo Arenas <carenas <at> gmail.com>,
 Ævar Arnfjörð Bjarmason <avarab <at> gmail.com>,
 git <at> vger.kernel.org, Junio C Hamano <gitster <at> pobox.com>,
 Tukusej’s Sirs <tukusejssirs <at> protonmail.com>,
 pcre-dev <at> exim.org
Subject: Re: bug#60690: -P '\d' in GNU and git grep
Date: Wed, 5 Apr 2023 13:03:51 -0700
On 2023-04-05 12:40, Jim Meyering wrote:
> (C)  preserve grep -P's tradition of \d matching only 0..9, and once
> grep uses 10.43 or newer, \b and \w will also work as desired.

If I understand you correctly, (C) would mean that GNU grep -P,
git grep -P, and pcre2grep -u would all use PCRE2_UTF | PCRE2_UCP, and
would also use the extra option PCRE2_EXTRA_ASCII_BSD that is planned
for PCRE2 10.43.

This would require changes to bleeding-edge pcre2grep -u (since it would 
need to add PCRE2_EXTRA_ASCII_BSD unless --no-ucp is also given), and to 
git grep -P (which would need to add PCRE2_UCP and 
PCRE2_EXTRA_ASCII_BSD, when libpcre2 is new enough to #define 
PCRE2_EXTRA_ASCII_BSD).

This option works for me as well. In fact it's the least work for me 
since I already implemented it in bleeding-edge GNU grep (so it works 
this way already :-).
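To make the three behaviours discussed above concrete, here is a rough analogy that uses Python's standard re module, not PCRE2, so character tables and edge cases may differ from PCRE2's. As assumptions of this sketch: Python's default matching for str patterns stands in for PCRE2_UTF | PCRE2_UCP, the re.ASCII flag stands in for UTF mode without PCRE2_UCP, and a scoped (?a:\d) group stands in for the effect PCRE2_EXTRA_ASCII_BSD is meant to have on \d. The helper found, the RESULTS table, and the sample strings are hypothetical and exist only for illustration.

```python
import re

# Illustrative sample strings (hypothetical, not taken from the bug report).
ARABIC_INDIC_THREE = "\u0663"  # U+0663: a Unicode decimal digit (category Nd), not 0..9
NAME = "\u00c6var"             # "Ævar": begins with the non-ASCII letter U+00C6


def found(pattern: str, text: str, flags: int = 0) -> bool:
    """Hypothetical helper: True if pattern matches anywhere in text."""
    return re.search(pattern, text, flags) is not None


# Stand-in for PCRE2_UTF | PCRE2_UCP: Python str patterns use Unicode
# semantics for \d, \w and \b by default.
ucp_digit = found(r"\d", ARABIC_INDIC_THREE)  # \d accepts any Nd digit
ucp_word = found(r"^\w+$", NAME)              # Æ counts as a word character
ucp_boundary = found(r"\bvar", NAME)          # no \b between Æ and "var"

# Stand-in for UTF mode without PCRE2_UCP: re.ASCII limits \d, \w, \b to ASCII.
ascii_digit = found(r"\d", ARABIC_INDIC_THREE, re.ASCII)
ascii_word = found(r"^\w+$", NAME, re.ASCII)      # Æ is not an ASCII word char
ascii_boundary = found(r"\bvar", NAME, re.ASCII)  # so a \b appears before "var"

# Stand-in for option (C): Unicode \w and \b, but \d restricted to 0..9.
# Python has no PCRE2_EXTRA_ASCII_BSD, so the ASCII flag is scoped to the
# digit escape with (?a:...), which requires Python 3.7 or later.
c_digit = found(r"(?a:\d)", ARABIC_INDIC_THREE)
c_word = found(r"^\w+$", NAME)
c_boundary = found(r"\bvar", NAME)

# Columns: \d on U+0663, ^\w+$ on "Ævar", \bvar on "Ævar"
RESULTS = {
    "UCP": (ucp_digit, ucp_word, ucp_boundary),
    "ASCII": (ascii_digit, ascii_word, ascii_boundary),
    "option (C)": (c_digit, c_word, c_boundary),
}

if __name__ == "__main__":
    for label, row in RESULTS.items():
        print(f"{label:<11}", row)
```

In the ASCII-only row, \bvar matches inside "Ævar" because Æ is treated as a non-word character, which is the kind of \b/\w problem PCRE2_UCP addresses. The option (C) row keeps \w and \b Unicode-aware while \d still rejects U+0663, mirroring the tradition of \d matching only 0..9.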




GNU bug tracking system
Copyright (C) 1999 Darren O. Benham, 1997,2003 nCipher Corporation Ltd, 1994-97 Ian Jackson.