GNU bug report logs - #57507
Regular expression matching depends on locale encoding

Package: guile

Reported by: Jean Abou Samra <jean <at> abou-samra.fr>

Date: Wed, 31 Aug 2022 16:55:02 UTC

Severity: normal


Message #11 received at 57507 <at> debbugs.gnu.org:

From: Ludovic Courtès <ludo <at> gnu.org>
To: Jean Abou Samra <jean <at> abou-samra.fr>
Cc: 57507 <at> debbugs.gnu.org
Subject: Re: bug#57507: Regular expression matching depends on locale encoding
Date: Mon, 05 Sep 2022 09:48:36 +0200

Hi Jean,

Jean Abou Samra <jean <at> abou-samra.fr> skribis:

> Regular expressions do funky things with Unicode if a non-Unicode-aware
> locale is set. Yet, they're purely string operations, so I don't think
> it's expected that they depend on the locale encoding.

This is the expected behavior: first, because (ice-9 regex) is
implemented in terms of the libc regex functions, as Dale put it (though
that could be considered an implementation detail), and second, because
things such as character classes are necessarily locale-dependent (this
has bitten us in the past, for instance with <https://bugs.gnu.org/35785>).

I hope that makes sense.

Thanks,
Ludo’.
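
Editorial illustration (not part of the original message): the point about
locale-dependent character classes can be sketched directly against the
POSIX regex functions in libc. The helper name matches_in_locale is
hypothetical and chosen only for this example; it is not a Guile or libc
API. It sets the locale, compiles a pattern with regcomp, and reports
whether regexec finds a match.

```c
#include <locale.h>
#include <regex.h>
#include <stddef.h>

/* Hypothetical helper for illustration only.
   Returns 1 if `text` matches the POSIX extended regex `pattern` while
   LC_ALL is set to `locale_name`, 0 if it does not match, and -1 if the
   locale is unavailable or the pattern fails to compile.
   Note: it changes the process-wide locale and does not restore it. */
int matches_in_locale(const char *locale_name, const char *pattern,
                      const char *text)
{
    regex_t re;
    int result;

    /* setlocale returns NULL when the requested locale is not installed. */
    if (setlocale(LC_ALL, locale_name) == NULL)
        return -1;

    /* The pattern is compiled under the locale that was just selected. */
    if (regcomp(&re, pattern, REG_EXTENDED | REG_NOSUB) != 0)
        return -1;

    result = (regexec(&re, text, 0, NULL, 0) == 0) ? 1 : 0;
    regfree(&re);
    return result;
}
```

With the "C" locale, matches_in_locale("C", "^[[:alpha:]]+$", "cafe")
returns 1, while the same pattern applied to "caf\xc3\xa9" (the UTF-8
bytes of "café") returns 0, because the bytes above 0x7F are not
alphabetic in that locale. In a UTF-8 locale, if one is installed on the
system, the second call would be expected to match instead, which is the
locale dependence described above.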




This bug report was last modified 2 years and 209 days ago.

