I was not happy to discover that with grep-3.9 and -P, \d can match
multibyte digits like the Arabic ones:

  $ LC_ALL=en_US.UTF-8 grep -Po '\d+' <<< '٠١٢٣٤٥٦٧٨٩'
  ٠١٢٣٤٥٦٧٨٩

grep -P has never before done that.
Of course, in the C/POSIX locale, there is no such match:

  $ LC_ALL=C grep -Po '\d+' <<< '٠١٢٣٤٥٦٧٨٩'
  [1]

TL;DR, with the attached fix, grep preprocesses each affected regexp,
changing each eligible "\d" to "[0-9]". Consider this a short-term fix.
Longer term (subject to pcre2 releases), we may instead simply add a
"(?aD)" prefix. If you really want to match non-ASCII digits, use \p{Nd}.

For background, see the PCRE2 documentation:

  https://www.pcre.org/current/doc/html/pcre2pattern.html
  https://www.pcre.org/current/doc/html/pcre2syntax.html

which say this:

  By default, \d, \s, and \w match only ASCII characters, even in UTF-8
  mode or in the 16-bit and 32-bit libraries. However, if locale-specific
  matching is happening, \s and \w may also match characters with code
  points in the range 128-255. If the PCRE2_UCP option is set, the
  behaviour of these escape sequences is changed to use Unicode
  properties and they match many more characters.

Per upstream pcre2-10.40-112-g6277357, (?aD) does what we want:

  PCRE2_EXTRA_ASCII_BSD: This option forces \d to match only ASCII
  digits, even when PCRE2_UCP is set. It can be changed within a pattern
  by means of the (?aD) option setting.

I used pcre2grep (built from master) to demonstrate how we may
eventually use "(?aD)" under the covers:

  $ LC_ALL=en_US.UTF-8 ./pcre2grep --color -u '(?aD)\d' <<< '٠١٢٣٤٥٦٧٨٩'
  [Exit 1]
  $ LC_ALL=en_US.UTF-8 ./pcre2grep --color -u '(?aD)^\d+$' <<< '٠١٢٣٤٥٦٧٨٩'
  ٠١٢٣٤٥٦٧٨٩

For the record, https://github.com/PCRE2Project/pcre2 currently declares
10.42 to be the latest, while there's a commit suggesting it's 10.43.
The difference is important: 10.43 has support for (?aD), while 10.42
does not.

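To make the short-term fix concrete, here is a minimal Python sketch of the
idea of rewriting each eligible "\d" as "[0-9]" before compiling a pattern.
It is not grep's actual C code: the helper name expand_backslash_d is
hypothetical, and the notion of "eligible" used here (an unescaped \d,
emitted as a bare 0-9 range when it sits inside a bracket expression) is an
assumption made for illustration. The sketch deliberately ignores \Q...\E
quoting, POSIX classes such as [[:alpha:]], and a literal "]" placed first
in a bracket expression.

```python
import re


def expand_backslash_d(pattern: str) -> str:
    """Rewrite each unescaped \\d as an explicit ASCII digit range.

    Hypothetical helper for illustration only; simplified bracket handling.
    """
    out = []
    in_bracket = False
    i = 0
    while i < len(pattern):
        c = pattern[i]
        if c == "\\" and i + 1 < len(pattern):
            nxt = pattern[i + 1]
            if nxt == "d":
                # Inside [...] a nested bracket would be wrong, so emit 0-9.
                out.append("0-9" if in_bracket else "[0-9]")
            else:
                # Copy any other escape (including \\) through untouched.
                out.append(c + nxt)
            i += 2
            continue
        if c == "[" and not in_bracket:
            in_bracket = True
        elif c == "]" and in_bracket:
            in_bracket = False
        out.append(c)
        i += 1
    return "".join(out)


arabic = "٠١٢٣٤٥٦٧٨٩"
rewritten = expand_backslash_d(r"\d+")
print(rewritten)                                          # [0-9]+
print(re.fullmatch(r"\d+", arabic) is not None)           # True
print(re.fullmatch(rewritten, arabic) is not None)        # False
print(re.fullmatch(rewritten, "0123456789") is not None)  # True
```

Python's re is used here only as a convenient Unicode-aware engine to show
the before/after difference; the rewritten pattern no longer depends on how
the engine interprets \d.
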
Incidentally, you can demonstrate this in python3, too:

  $ LC_ALL=en_US.UTF-8 python3 \
      -c "import re; print(re.match(r'\d+', '٠١٢٣٤٥٦٧٨٩'))"
  <re.Match object; span=(0, 10), match='٠١٢٣٤٥٦٧٨٩'>

Use flags=re.ASCII to get the often-desired behavior:

  $ LC_ALL=en_US.UTF-8 python3 \
      -c "import re; print(re.match(r'\d+', '٠١٢٣٤٥٦٧٨٩', flags=re.ASCII))"
  None

This is cause for a new snapshot today and, soon thereafter, the release
of grep-3.10.