GNU bug report logs - #47264
RFE: pcre2 support

Package: grep

Reported by: Jaroslav Skarvada <jskarvad <at> redhat.com>

Date: Fri, 19 Mar 2021 15:23:01 UTC

Severity: wishlist

Merged with 22345, 40395

Done: Paul Eggert <eggert <at> cs.ucla.edu>

Bug is archived. No further changes may be made.


Message #47 received at 47264-done <at> debbugs.gnu.org:

From: Paul Eggert <eggert <at> cs.ucla.edu>
To: Carlo Marcelo Arenas Belón <carenas <at> gmail.com>
Cc: Jaroslav Skarvada <jskarvad <at> redhat.com>, 47264-done <at> debbugs.gnu.org
Subject: Re: bug#47264: [PATCH v2] pcre: migrate to pcre2
Date: Sun, 14 Nov 2021 12:45:29 -0800
On 11/9/21 02:58, Carlo Marcelo Arenas Belón wrote:
> Sadly, hadn't been able to generate a release,

Does this mean you're having trouble running 'make dist'? If so, what's 
the trouble?


> it seems to be ready for some broader testing, specially if the
> attached patch is applied on top of a 10.37 release (tested that way
> in OpenBSD i386)

OK, thanks, I installed it into the Savannah master copy of GNU grep, 
except that I didn't rename m4/pcre.m4 to m4/pcre2.m4, or rename the 
macros to use PCRE2. This made the change easier to audit. Revised patch 
0001 attached.

Also, I followed up with several related patches (also attached as 
0002-0012). Please take a look at them and let us know of any problems. 
In the attached patch "grep: prefer signed integers" I followed the 
usual grep approach of preferring signed to unsigned integers (e.g., 
idx_t to size_t) when either will do; this lets us debug better with 
-fsanitize=undefined to catch integer overflow.
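
The signed-versus-unsigned point can be illustrated with a small sketch. It is not taken from the attached patch: idx_sketch_t is a hypothetical stand-in for Gnulib's idx_t, the function names are invented for illustration, and __builtin_add_overflow is a GCC/Clang extension rather than standard C.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for Gnulib's idx_t: a signed type used for
   sizes and indexes.  Not the real definition from the grep sources.  */
typedef ptrdiff_t idx_sketch_t;

/* Unsigned subtraction wraps silently: 2 - 3 yields SIZE_MAX.  This is
   well defined in C, so -fsanitize=undefined has nothing to report.  */
static size_t
remaining_unsigned (size_t buffer_size, size_t used)
{
  return buffer_size - used;
}

/* The same arithmetic on a signed type yields -1, which is easy to
   spot in a debugger or to reject with a simple "< 0" test.  */
static idx_sketch_t
remaining_signed (idx_sketch_t buffer_size, idx_sketch_t used)
{
  return buffer_size - used;
}

/* Signed addition that overflows is undefined behavior, which is what
   lets the sanitizer flag it at run time.  This helper shows an
   explicit check instead (assumption: GCC or Clang builtin).
   Returns true and stores the sum in *RESULT if N + INCR fits.  */
static bool
grow_checked (idx_sketch_t n, idx_sketch_t incr, idx_sketch_t *result)
{
  return !__builtin_add_overflow (n, incr, result);
}
```

With these definitions, remaining_unsigned (2, 3) returns SIZE_MAX while remaining_signed (2, 3) returns -1, and an unchecked signed "n + incr" that overflows would be reported when the program is built with -fsanitize=undefined.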

One issue I discovered: PCRE2_EXTRA_MATCH_WORD (which is used by 
pcre2grep -w) is incompatible with 'grep -w'. For example, 'echo a%%a | 
grep -Pw %%' outputs nothing, whereas 'echo a%%a | pcre2grep -w %%' 
outputs 'a%%a'. I think the GNU grep behavior (which is the same as plain 
'grep -w' on both Linux and OpenBSD) is more intuitive here. Do you 
happen to know why PCRE behaves the way it does? Is that worth a PCRE2 
bug report? Anyway, the attached patches avoid using 
PCRE2_EXTRA_MATCH_WORD for that reason.
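
The difference can be modeled without either program. The sketch below is only a model for literal, non-empty patterns, under two assumptions: that 'grep -w' requires the characters on either side of the match to be non-word characters (or line edges), and that PCRE2_EXTRA_MATCH_WORD behaves like wrapping the pattern in \b(?:...)\b as the PCRE2 documentation describes. All function names are invented for illustration; neither grep nor PCRE2 is implemented this way.

```c
#include <ctype.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Word constituents: letters, digits, and underscore.  */
static bool
is_word_char (char c)
{
  return c == '_' || isalnum ((unsigned char) c);
}

/* True if POS in LINE (of length LEN) is a \b-style boundary, i.e.
   exactly one of the two neighboring characters is a word character.
   Positions outside the line count as non-word.  */
static bool
at_word_boundary (char const *line, size_t len, size_t pos)
{
  bool before = pos > 0 && is_word_char (line[pos - 1]);
  bool after = pos < len && is_word_char (line[pos]);
  return before != after;
}

/* Model of 'grep -w' for a literal NEEDLE: some occurrence must have
   no word character immediately before or after it.  */
static bool
grep_w_model (char const *line, char const *needle)
{
  size_t len = strlen (line), nlen = strlen (needle);
  if (nlen == 0)
    return false;               /* The model covers non-empty needles only.  */
  for (char const *p = line; (p = strstr (p, needle)); p++)
    {
      size_t start = (size_t) (p - line), end = start + nlen;
      bool before = start > 0 && is_word_char (line[start - 1]);
      bool after = end < len && is_word_char (line[end]);
      if (!before && !after)
        return true;
    }
  return false;
}

/* Model of \b(?:NEEDLE)\b: some occurrence must start and end at a
   word/non-word transition.  */
static bool
match_word_model (char const *line, char const *needle)
{
  size_t len = strlen (line), nlen = strlen (needle);
  if (nlen == 0)
    return false;
  for (char const *p = line; (p = strstr (p, needle)); p++)
    {
      size_t start = (size_t) (p - line), end = start + nlen;
      if (at_word_boundary (line, len, start)
          && at_word_boundary (line, len, end))
        return true;
    }
  return false;
}
```

For the line "a%%a" and the pattern "%%", grep_w_model returns false because the match is flanked by the word character 'a', while match_word_model returns true because each edge of "%%" sits on a word/non-word transition. For an ordinary word such as "foo" in "a foo b" the two models agree.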


> * no more version restrictions (should work with >~10.20)

I tested with 10.20 and found one more glitch (it doesn't have 
PCRE2_SIZE_MAX), which is fixed by the attached patch "grep: port to 
PCRE2 10.20".
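
A common way to cope with a macro that older library headers lack is a guarded fallback definition. The following is a minimal sketch of that idea, not a copy of the attached patch. It assumes PCRE2_SIZE is size_t, as pcre2.h defines it, so SIZE_MAX is a suitable substitute; largest_pcre2_size is a hypothetical helper added only to show the macro in use.

```c
#include <stddef.h>   /* size_t */
#include <stdint.h>   /* SIZE_MAX */

/* Real code would include <pcre2.h> here, after defining
   PCRE2_CODE_UNIT_WIDTH.  It is left out so that this sketch compiles
   without the library installed.  */

/* Fallback for PCRE2 headers that do not define PCRE2_SIZE_MAX.
   Assumption: PCRE2_SIZE is size_t, so its maximum is SIZE_MAX.  */
#ifndef PCRE2_SIZE_MAX
# define PCRE2_SIZE_MAX SIZE_MAX
#endif

/* Hypothetical helper: the largest value a PCRE2 length or offset
   can hold under the assumption above.  */
static size_t
largest_pcre2_size (void)
{
  return PCRE2_SIZE_MAX;
}
```

Because the definition is guarded by #ifndef, newer headers that already provide PCRE2_SIZE_MAX are left untouched.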


> Pending:
> * what to do with the current support of \C (enabled for now)

Let's open another bug report for that; I'm still a bit fuzzy about what 
the pros and cons are.


> * merge of non critical bugfix in #51710[1]

I plan to follow up in that bug report.

Marking this bug as done. Thanks again for working on this.
[0001-grep-migrate-to-pcre2.patch (text/x-patch, attachment)]
[0002-maint-minor-rewording-and-reindenting.patch (text/x-patch, attachment)]
[0003-grep-Don-t-limit-jitstack_max-to-INT_MAX.patch (text/x-patch, attachment)]
[0004-grep-improve-pcre2_get_error_message-comments.patch (text/x-patch, attachment)]
[0005-grep-speed-up-fix-bad-UTF8-check-with-P.patch (text/x-patch, attachment)]
[0006-grep-prefer-signed-integers.patch (text/x-patch, attachment)]
[0007-grep-use-PCRE2_EXTRA_MATCH_LINE.patch (text/x-patch, attachment)]
[0008-grep-simplify-JIT-setup.patch (text/x-patch, attachment)]
[0009-grep-improve-memory-exhaustion-checking-with-P.patch (text/x-patch, attachment)]
[0010-grep-use-ximalloc-not-xcalloc.patch (text/x-patch, attachment)]
[0011-grep-fix-minor-P-memory-leak.patch (text/x-patch, attachment)]
[0012-grep-port-to-PCRE2-10.20.patch (text/x-patch, attachment)]

This bug report was last modified 3 years and 184 days ago.

