GNU bug report logs -
#62267
grep-3.9 bug: \d matches multibyte digits
Reported by: Jim Meyering <jim <at> meyering.net>
Date: Sun, 19 Mar 2023 00:07:01 UTC
Severity: normal
Done: Jim Meyering <jim <at> meyering.net>
Bug is archived. No further changes may be made.
On 2023-03-18 23:33, Jim Meyering wrote:
> By the way, have you ever used \D? I think I have not.
No, I'm not much of a Perl user these days (last seriously used it in
the 1990s...).
> - char *new_keys = xnmalloc (len / 2 + 1, 5);
> + char *new_keys = xnmalloc (len / 2 + 1, 6);
This could be xnmalloc (len + 1, 3).
Or if you want to show the work, you can replace it with something like:
    int origlen = sizeof "\\D" - 1;
    int repllen = sizeof "[^0-9]" - 1;
    int expansion = repllen / origlen + (repllen % origlen != 0);
    char *new_keys = xnmalloc (len + 1, expansion);
(Isn't memory allocation fun? :-)
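To make the sizing argument concrete, here is a minimal standalone sketch of the same idea. It is not grep's actual code: the function name expand_backslash_D is hypothetical, plain malloc stands in for gnulib's xnmalloc, and the scan is deliberately naive (it does not treat bracket expressions specially). It only illustrates why (len + 1) * expansion bytes are always enough: each 2-byte \D grows to at most 6 bytes, so no input byte accounts for more than 3 output bytes, and the +1 leaves room for the trailing NUL.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper, not grep's actual code: return a newly allocated,
   NUL-terminated copy of KEYS (length LEN) with each \D replaced by
   [^0-9].  Return NULL on allocation failure or size overflow.  */
char *
expand_backslash_D (char const *keys, size_t len)
{
  size_t origlen = sizeof "\\D" - 1;      /* 2 bytes */
  size_t repllen = sizeof "[^0-9]" - 1;   /* 6 bytes */
  /* Ceiling of repllen / origlen: worst-case growth per input byte.  */
  size_t expansion = repllen / origlen + (repllen % origlen != 0);

  /* Reject sizes where (len + 1) * expansion would overflow.  */
  if (len >= SIZE_MAX / expansion)
    return NULL;
  char *out = malloc ((len + 1) * expansion);
  if (!out)
    return NULL;

  char *p = out;
  for (size_t i = 0; i < len; i++)
    {
      if (keys[i] == '\\' && i + 1 < len)
        {
          if (keys[i + 1] == 'D')
            {
              memcpy (p, "[^0-9]", repllen);
              p += repllen;
            }
          else
            {
              /* Copy any other escape pair verbatim, so \\D is untouched.  */
              *p++ = keys[i];
              *p++ = keys[i + 1];
            }
          i++;   /* Skip the second byte of the escape pair.  */
        }
      else
        *p++ = keys[i];
    }
  *p = '\0';
  return out;
}
```

The caller owns the returned buffer and must free it. A pattern made up entirely of \D pairs is the worst case and fills len * 3 bytes, which still leaves the slack from the +1 for the terminator.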
> Doesn't Perl have the same issue?
Oh, you're right. Not being a Perl expert, all I did was run this:
    echo '٠١٢٣٤٥٦٧٨٩' | perl -ne 'print if /\d/'
and I observed no output. However, I now see that I need to use perl's
-C option too, to get the kind of regular-expression behavior that plain
grep has.
Looking at the source code again, how about if we move the PCRE-specific
changes from src/grep.c to src/pcresearch.c, which is where they really
belong, and more importantly use the bleeding-edge
PCRE2_EXTRA_ASCII_BSD macro if available?
Something like the attached patch, say. This patch doesn't take your \D
fixes (or the above suggestions) into account.
[0001-grep-forward-port-to-PCRE2-10.43.patch (text/x-patch, attachment)]
This bug report was last modified 2 years and 142 days ago.