GNU bug report logs - #62267
grep-3.9 bug: \d matches multibyte digits

Package: grep;

Reported by: Jim Meyering <jim <at> meyering.net>

Date: Sun, 19 Mar 2023 00:07:01 UTC

Severity: normal

Done: Jim Meyering <jim <at> meyering.net>

Bug is archived. No further changes may be made.


Message #17 received at 62267 <at> debbugs.gnu.org (full text, mbox):

From: Paul Eggert <eggert <at> cs.ucla.edu>
To: Jim Meyering <jim <at> meyering.net>
Cc: 62267 <at> debbugs.gnu.org
Subject: Re: bug#62267: grep-3.9 bug: \d matches multibyte digits
Date: Sun, 19 Mar 2023 01:28:38 -0700
On 2023-03-18 23:33, Jim Meyering wrote:
> By the way, have you ever used \D? I think I have not.

No, I'm not much of a Perl user these days (last seriously used it in 
the 1990s...).

> -  char *new_keys = xnmalloc (len / 2 + 1, 5);
> +  char *new_keys = xnmalloc (len / 2 + 1, 6);

This could be xnmalloc (len + 1, 3).

Or if you want to show the work, you can replace it with something like:

   int origlen = sizeof "\\D" - 1;
   int repllen = sizeof "[^0-9]" - 1;
   int expansion = repllen / origlen + (repllen % origlen != 0);
   char *new_keys = xnmalloc (len + 1, expansion);

(Isn't memory allocation fun? :-)
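To make the sizing argument concrete, here is a minimal sketch of the rewrite step under stated assumptions. It is not grep's actual code: expand_digit_escapes is a hypothetical helper, plain malloc with an explicit overflow check stands in for xnmalloc, and the sketch ignores bracket expressions, which a real implementation would need to handle. The worst case is a pattern made entirely of "\D", where every 2 input bytes become 6 output bytes, hence the factor of 3.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper (not from grep): return a newly allocated copy of
   KEYS[0..LEN-1] with \d rewritten as [0-9] and \D as [^0-9].
   Simplification: escapes inside bracket expressions get no special
   treatment.  Returns NULL on overflow or allocation failure.  */
static char *
expand_digit_escapes (char const *keys, size_t len)
{
  /* Worst case: "\D" (2 bytes) becomes "[^0-9]" (6 bytes).  */
  size_t origlen = sizeof "\\D" - 1;
  size_t repllen = sizeof "[^0-9]" - 1;
  /* Ceiling of repllen / origlen, i.e. 3 here.  */
  size_t expansion = repllen / origlen + (repllen % origlen != 0);

  /* Stand-in for the overflow check that xnmalloc would perform.  */
  if (len >= SIZE_MAX / expansion)
    return NULL;
  size_t cap = (len + 1) * expansion;
  char *out = malloc (cap);
  if (!out)
    return NULL;

  size_t n = 0;
  for (size_t i = 0; i < len; i++)
    {
      if (keys[i] == '\\' && i + 1 < len)
        {
          /* Consume the escaped character along with the backslash, so
             an escaped backslash followed by 'd' is left alone.  */
          char c = keys[++i];
          char const *repl = (c == 'd' ? "[0-9]"
                              : c == 'D' ? "[^0-9]"
                              : NULL);
          if (repl)
            {
              size_t rl = strlen (repl);
              memcpy (out + n, repl, rl);
              n += rl;
            }
          else
            {
              out[n++] = '\\';
              out[n++] = c;
            }
        }
      else
        out[n++] = keys[i];
    }

  /* The output plus its terminating NUL must fit in the buffer.  */
  assert (n < cap);
  out[n] = '\0';
  return out;
}
```

With this sketch, the 5-byte input a\d\D yields a[0-9][^0-9], and the caller frees the returned buffer.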


> Doesn't Perl have the same issue?

Oh, you're right. Not being a Perl expert, all I did was run this:

  echo '٠١٢٣٤٥٦٧٨٩' | perl -ne 'print if /\d/'

and I observed no output. However, I now see that I need to use perl's 
-C option too, to get the kind of regular-expression behavior that plain 
grep has.


Looking at the source code again, how about if we move the PCRE-specific 
changes from src/grep.c to src/pcresearch.c, which is where they really 
belong, and more importantly use the bleeding-edge 
PCRE2_EXTRA_ASCII_BSD macro if available?

Something like the attached patch, say. This patch doesn't take your \D 
fixes (or the above suggestions) into account.
[0001-grep-forward-port-to-PCRE2-10.43.patch (text/x-patch, attachment)]

This bug report was last modified 2 years and 143 days ago.
