#39970 - guix commands broken on Azerbaijani 'az_AZ' and Turkish 'tr_TR' locales

GNU bug report logs - #39970
guix commands broken on Azerbaijani 'az_AZ' and Turkish 'tr_TR' locales

Package: guix;

Reported by: "pelzflorian (Florian Pelz)" <pelzflorian <at> pelzflorian.de>

Date: Sat, 7 Mar 2020 12:02:01 UTC

Severity: normal

Message #40 received at 39970 <at> debbugs.gnu.org:

From: Taylan Kammer <taylan.kammer <at> gmail.com>
To: "pelzflorian (Florian Pelz)" <pelzflorian <at> pelzflorian.de>, Ludovic Courtès <ludo <at> gnu.org>
Cc: 39970 <at> debbugs.gnu.org
Subject: Re: bug#39970: guix commands broken on Azerbaijani 'az_AZ' and Turkish 'tr_TR' locales
Date: Wed, 5 May 2021 09:04:32 +0200

On 12.03.2020 12:02, pelzflorian (Florian Pelz) wrote:
>
> Guile’s behavior that i is not among [a-z] has been confirmed as
> unexpected by a natively Turkish friend of mine.  It is different from
> the behavior of current glibc:
>
> florian <at> florianmacbook ~$ cat iyiyim.c
> #include <regex.h>
> #include <stdio.h>
> #include <stdlib.h>
> #define STR "iyiyım"
> int main (int argc,
>           char** argv)
> {
>   regex_t only_letters;
>   int r = regcomp (&only_letters, "[a-z]+", REG_EXTENDED);
>   if (r != 0)
>     printf ("This error does not happen.\n");
>   r = regexec (&only_letters, STR, 1, malloc (sizeof (regmatch_t)), 0);
>   if (r == 0)
>     printf ("The string " STR " matched!\n");
>   else
>     printf ("No match for " STR ".\n");
> }
> florian <at> florianmacbook ~$ gcc -o iyiyim iyiyim.c
> florian <at> florianmacbook ~$ LANG=tr_TR.utf8 ./iyiyim
> The string iyiyım matched!
>
> Apparently Guile uses a bundled regular expression library rather than
> glibc.  I can try making Guile use a newer GNUlib for its regular
> expressions, maybe that helps.  Shall I file a separate bug for Guile?
>

Also native Turkish speaker here, and yeah that seems like a clear bug.

By the way, Turkish doesn't have q, w, or x.  So if [a-z] is
interpreted by locale, it would fail to match those letters.  I suppose
that doesn't matter for the patch you guys used but it might have been
part of the original problem.

The dotless lowercase i / dotted uppercase I mostly bites programmers in
case conversion.  The uppercase of i is İ and the lowercase of I is ı.
There was even an exploit in GitHub related to this:

https://eng.getwisdom.io/hacking-github-with-unicode-dotless-i/

- Taylan
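
The case-conversion pitfall described in this message can be sketched in C. This is only a minimal illustration, not code from Guix, Guile, or glibc: the helper names `tr_toupper`, `tr_tolower`, and `ascii_toupper` are hypothetical, they work on single Unicode code points, and they cover only the two Turkish/Azerbaijani special cases plus plain ASCII letters.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helpers for illustration only; not an API of Guix,
   Guile, or glibc.  Each maps one Unicode code point. */

#define SMALL_I          0x0069u  /* i */
#define CAPITAL_I        0x0049u  /* I */
#define DOTTED_CAPITAL_I 0x0130u  /* İ */
#define DOTLESS_SMALL_I  0x0131u  /* ı */

/* Locale-independent ASCII upcasing: i becomes I. */
uint32_t ascii_toupper(uint32_t cp)
{
    if (cp >= 0x61u && cp <= 0x7Au)   /* a..z */
        return cp - 0x20u;
    return cp;
}

/* Turkish/Azerbaijani upcasing: i becomes İ, ı becomes I. */
uint32_t tr_toupper(uint32_t cp)
{
    if (cp == SMALL_I)
        return DOTTED_CAPITAL_I;
    if (cp == DOTLESS_SMALL_I)
        return CAPITAL_I;
    return ascii_toupper(cp);
}

/* Turkish/Azerbaijani downcasing: I becomes ı, İ becomes i. */
uint32_t tr_tolower(uint32_t cp)
{
    if (cp == CAPITAL_I)
        return DOTLESS_SMALL_I;
    if (cp == DOTTED_CAPITAL_I)
        return SMALL_I;
    if (cp >= 0x41u && cp <= 0x5Au)   /* A..Z */
        return cp + 0x20u;
    return cp;
}

/* Shows where the two rule sets disagree: upcasing i under Turkish
   rules does not yield the ASCII I that ASCII-only code expects. */
void tr_case_demo(void)
{
    assert(ascii_toupper(SMALL_I) == CAPITAL_I);
    assert(tr_toupper(SMALL_I) == DOTTED_CAPITAL_I);
    assert(tr_toupper(SMALL_I) != ascii_toupper(SMALL_I));
}
```

Code that compares identifiers after case conversion would get different answers depending on which of these rule sets is in effect, which is the kind of mismatch the message says bites programmers.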

This bug report was last modified 4 years and 104 days ago.
