GNU bug report logs -
#60708
pcre: improve support for linking with a library without unicode
Previous Next
Full log
View this message in rfc822 format
On Wed, Jan 11, 2023 at 6:29 PM Paul Eggert <eggert <at> cs.ucla.edu> wrote:
>
> Oh, I think see your point, but doesn't this mean that even my code was
> too trusting? It should be something like this:
>
> if (localeinfo.multibyte)
> {
> uint32_t unicode;
> if (! (localeinfo.using_utf8
> && 0 <= pcre2_config (PCRE2_CONFIG_UNICODE, &unicode)
> && unicode))
> die (EXIT_TROUBLE, 0, _("-P supports only unibyte and UTF-8
> locales"));
> ...
>
> That is, we're better off diagnosing the problem and not attempting to
> use pcre2 if the result will be wrong (or even result in undefined
> behavior). The problem is unlikely to occur so it's good to be
> conservative here.
Maybe we are not clear on what the "problem" is.
The issue the original code was trying to avoid was to set PCRE_UTF if
the library doesn't have Unicode support, as that would block grep
with a PCRE error (as shown in the commit message), and which also
disabled some tests as it couldn't be differentiated with a failure in
grep because -P wasn't supported. Your suggested code doesn't address
that, it merely changes the error message with one that would be IMHO
even less clear and worsens the problem.
Using a non Unicode PCRE library is perfectly fine, and there is no
"undefined behavior" risk, and indeed `grep -P` without the UTF flag
is exactly what the alternate path uses and what is recommended for
speed, so?
Carlo
This bug report was last modified 2 years and 154 days ago.
Previous Next
GNU bug tracking system
Copyright (C) 1999 Darren O. Benham,
1997,2003 nCipher Corporation Ltd,
1994-97 Ian Jackson.