GNU bug report logs - #57507
Regular expression matching depends on locale encoding


Package: guile

Reported by: Jean Abou Samra <jean <at> abou-samra.fr>

Date: Wed, 31 Aug 2022 16:55:02 UTC

Severity: normal



Message #14 received at 57507 <at> debbugs.gnu.org (full text, mbox):

From: Jean Abou Samra <jean <at> abou-samra.fr>
To: Ludovic Courtès <ludo <at> gnu.org>
Cc: 57507 <at> debbugs.gnu.org
Subject: Re: bug#57507: Regular expression matching depends on locale encoding
Date: Mon, 5 Sep 2022 20:39:26 +0200
On 05/09/2022 at 09:48, Ludovic Courtès wrote:
> Hi Jean,
>
> Jean Abou Samra <jean <at> abou-samra.fr> writes:
>
>> Regular expressions do funky things with Unicode if a non-Unicode-aware
>> locale is set. Yet, they're purely string operations, so I don't think
>> it's expected that they depend on the locale encoding.
> This is the expected behavior: first because (ice-9 regex) is
> implemented in terms of the libc regex functions, as Dale put it (but that
> could be thought of as an implementation detail), and second because things
> such as character classes are necessarily locale-dependent (this has
> bitten us in the past, for instance with <https://bugs.gnu.org/35785>).
>
> I hope that makes sense.



OK, thanks, but in this case, it should be clearly stated as a limitation
in the (ice-9 regex) documentation IMHO. If you don't know what constraints
there are on the implementation, there is no reason to expect this. Would it
help if I submitted a patch for that?





This bug report was last modified 2 years and 209 days ago.


