GNU bug report logs - #47264
RFE: pcre2 support
Message #47 received at 47264-done <at> debbugs.gnu.org (full text, mbox):
On 11/9/21 02:58, Carlo Marcelo Arenas Belón wrote:
> Sadly, hadn't been able to generate a release,
Does this mean you're having trouble running 'make dist'? If so, what's
the trouble?
> it seems to be ready for some broader testing, especially if the
> attached patch is applied on top of a 10.37 release (tested that way
> on OpenBSD i386)
OK, thanks, I installed it into the Savannah master copy of GNU grep,
except that I didn't rename m4/pcre.m4 to m4/pcre2.m4, or rename the
macros to use PCRE2. This made the change easier to audit. Revised patch
0001 attached.
Also, I followed up with several related patches (also attached as
0002-0012). Please take a look at them and let us know of any problems.
In the attached patch "grep: prefer signed integers" I followed the
usual grep approach of preferring signed to unsigned integers (e.g.,
idx_t to size_t) when either will do; this lets us debug better with
-fsanitize=undefined to catch integer overflow.
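To illustrate the point about signed integers, here is a minimal sketch and
not code from the attached patch. The typedef only mimics Gnulib's idx_t, and
grow_unsigned and grow_signed are hypothetical names. Unsigned addition wraps
around silently, which is defined behavior, so -fsanitize=undefined has
nothing to report. A bare 'n + incr' on the signed type would be undefined on
overflow, which is exactly what the sanitizer can trap. The signed helper
below checks explicitly so that it stays well defined.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Assumption: stands in for Gnulib's idx_t, a signed type for sizes.  */
typedef ptrdiff_t idx_t;

/* Unsigned arithmetic wraps modulo SIZE_MAX + 1.  The result is defined
   but possibly tiny, and no sanitizer flags it.  */
size_t
grow_unsigned (size_t n, size_t incr)
{
  return n + incr;
}

/* Signed variant for nonnegative sizes.  Writing 'n + incr' directly
   would be undefined on overflow (and caught by -fsanitize=undefined),
   so test against PTRDIFF_MAX first and report failure instead.  */
bool
grow_signed (idx_t n, idx_t incr, idx_t *result)
{
  assert (0 <= n && 0 <= incr);
  if (incr > PTRDIFF_MAX - n)
    return false;
  *result = n + incr;
  return true;
}
```

With these definitions, grow_unsigned (SIZE_MAX, 1) quietly yields 0, whereas
grow_signed (PTRDIFF_MAX, 1, &r) reports failure and leaves r untouched.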
One issue I discovered: PCRE2_EXTRA_MATCH_WORD (which is used by
pcre2grep -w) is incompatible with 'grep -w'. For example, 'echo a%%a |
grep -Pw %%' outputs nothing, whereas 'echo a%%a | pcre2grep -w %%'
outputs 'a%%a'. I think the GNU grep behavior (which is the same as with
'grep -w', either on Linux or OpenBSD) is more intuitive here: do you
happen to know why PCRE behaves the way it does? Is that worth a PCRE2
bug report? Anyway, the attached patches avoid using
PCRE2_EXTRA_MATCH_WORD for that reason.
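The difference can be modeled without PCRE2 at all. The sketch below is a
simplification under stated assumptions: it searches for a literal string
with strstr instead of a regular expression, treats ASCII letters, digits and
underscore as word constituents, and uses hypothetical function names. It
contrasts a 'grep -w' style test (the characters on either side of the match
must not be word constituents) with a test that requires a \b-like word
boundary at each end of the match, which is my understanding of what
PCRE2_EXTRA_MATCH_WORD amounts to.

```c
#include <assert.h>
#include <ctype.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* True if line[i] exists and is a letter, digit, or underscore.  */
static bool
word_at (const char *line, ptrdiff_t i, ptrdiff_t len)
{
  return (0 <= i && i < len
          && (line[i] == '_' || isalnum ((unsigned char) line[i])));
}

/* 'grep -w' style: some occurrence of LIT must be neither preceded
   nor followed by a word constituent.  */
bool
match_w_grep (const char *line, const char *lit)
{
  assert (line && lit);
  ptrdiff_t len = strlen (line), n = strlen (lit);
  if (n == 0)
    return false;
  for (const char *p = strstr (line, lit); p; p = strstr (p + 1, lit))
    {
      ptrdiff_t s = p - line, e = s + n;
      if (!word_at (line, s - 1, len) && !word_at (line, e, len))
        return true;
    }
  return false;
}

/* \b style: a word/non-word transition is required at both the start
   and the end of some occurrence of LIT.  */
bool
match_w_boundary (const char *line, const char *lit)
{
  assert (line && lit);
  ptrdiff_t len = strlen (line), n = strlen (lit);
  if (n == 0)
    return false;
  for (const char *p = strstr (line, lit); p; p = strstr (p + 1, lit))
    {
      ptrdiff_t s = p - line, e = s + n;
      bool at_start = word_at (line, s - 1, len) != word_at (line, s, len);
      bool at_end = word_at (line, e - 1, len) != word_at (line, e, len);
      if (at_start && at_end)
        return true;
    }
  return false;
}
```

For the line "a%%a" and the literal "%%", match_w_grep rejects the line
because 'a' adjoins the match on both sides, while match_w_boundary accepts
it because each end of "%%" sits on a transition between 'a' and '%'. The two
agree on ordinary words such as "bar" in "foo bar".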
> * no more version restrictions (should work with >~10.20)
I tested with 10.00 and found one more glitch (it doesn't have
PCRE2_SIZE_MAX), which is fixed by the attached patch "grep: port to
PCRE2 10.20".
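A common way to cope with a macro that older headers lack is a guarded
fallback definition. The fragment below is only a sketch of that idiom and
may differ from what the attached patch does. It assumes that PCRE2_SIZE is
size_t, so SIZE_MAX is a suitable stand-in, and fits_pcre2_size is a
hypothetical helper added for illustration. Real code would include pcre2.h
(after defining PCRE2_CODE_UNIT_WIDTH) before the guard.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Real code would have '#include <pcre2.h>' here; it is omitted so
   that this sketch compiles without the library installed.  */

/* Fallback for PCRE2 headers that do not define PCRE2_SIZE_MAX.
   Assumption: PCRE2_SIZE is size_t, so its maximum is SIZE_MAX.  */
#ifndef PCRE2_SIZE_MAX
# define PCRE2_SIZE_MAX SIZE_MAX
#endif

static_assert (PCRE2_SIZE_MAX <= UINTMAX_MAX,
               "PCRE2_SIZE must fit in uintmax_t");

/* Hypothetical helper: can a length of N code units be passed to a
   PCRE2 function that takes a PCRE2_SIZE argument?  */
bool
fits_pcre2_size (uintmax_t n)
{
  return n <= PCRE2_SIZE_MAX;
}
```

Because the definition is guarded by #ifndef, newer headers that already
provide PCRE2_SIZE_MAX are left alone.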
> Pending:
> * what to do with the current support of \C (enabled for now)
Let's open another bug report for that; I'm still a bit fuzzy about what
the pros and cons are.
> * merge of non critical bugfix in #51710[1]
I plan to follow up in that bug report.
Marking this bug as done. Thanks again for working on this.
[0001-grep-migrate-to-pcre2.patch (text/x-patch, attachment)]
[0002-maint-minor-rewording-and-reindenting.patch (text/x-patch, attachment)]
[0003-grep-Don-t-limit-jitstack_max-to-INT_MAX.patch (text/x-patch, attachment)]
[0004-grep-improve-pcre2_get_error_message-comments.patch (text/x-patch, attachment)]
[0005-grep-speed-up-fix-bad-UTF8-check-with-P.patch (text/x-patch, attachment)]
[0006-grep-prefer-signed-integers.patch (text/x-patch, attachment)]
[0007-grep-use-PCRE2_EXTRA_MATCH_LINE.patch (text/x-patch, attachment)]
[0008-grep-simplify-JIT-setup.patch (text/x-patch, attachment)]
[0009-grep-improve-memory-exhaustion-checking-with-P.patch (text/x-patch, attachment)]
[0010-grep-use-ximalloc-not-xcalloc.patch (text/x-patch, attachment)]
[0011-grep-fix-minor-P-memory-leak.patch (text/x-patch, attachment)]
[0012-grep-port-to-PCRE2-10.20.patch (text/x-patch, attachment)]
This bug report was last modified 3 years and 184 days ago.