GNU bug report logs - #39970
guix commands broken on Azerbaijani 'az_AZ' and Turkish 'tr_TR' locales
On 12.03.2020 12:02, pelzflorian (Florian Pelz) wrote:
>
> Guile’s behavior that i is not among [a-z] has been confirmed as
> unexpected by a friend of mine who is a native Turkish speaker. It
> differs from the behavior of current glibc:
>
> florian@florianmacbook ~$ cat iyiyim.c
> #include <locale.h>
> #include <regex.h>
> #include <stdio.h>
> #define STR "iyiyım"
> int main (void)
> {
>   regex_t only_letters;
>   regmatch_t match;
>   /* Apply LANG/LC_ALL; without this call the program stays in the "C" locale. */
>   setlocale (LC_ALL, "");
>   int r = regcomp (&only_letters, "[a-z]+", REG_EXTENDED);
>   if (r != 0)
>     {
>       printf ("This error does not happen.\n");
>       return 1;
>     }
>   /* Unanchored search: succeeds if any substring of STR is in [a-z]+. */
>   r = regexec (&only_letters, STR, 1, &match, 0);
>   if (r == 0)
>     printf ("The string " STR " matched!\n");
>   else
>     printf ("No match for " STR ".\n");
>   regfree (&only_letters);
>   return 0;
> }
> florian@florianmacbook ~$ gcc -o iyiyim iyiyim.c
> florian@florianmacbook ~$ LANG=tr_TR.utf8 ./iyiyim
> The string iyiyım matched!
>
> Apparently Guile uses a bundled regular expression library rather than
> glibc's. I can try making Guile use a newer Gnulib for its regular
> expressions; maybe that helps. Shall I file a separate bug for Guile?
>
I'm also a native Turkish speaker, and yes, that seems like a clear bug.
By the way, Turkish doesn't have q, w, or x. So if [a-z] were interpreted
according to the locale's alphabet, it would fail to match those letters.
I suppose that doesn't matter for the patch you used, but it might have
been part of the original problem.
The dotless lowercase ı / dotted uppercase İ mostly bites programmers in
case conversion: in Turkish, the uppercase of i is İ and the lowercase of
I is ı. There was even an exploit against GitHub related to this:
https://eng.getwisdom.io/hacking-github-with-unicode-dotless-i/
- Taylan