GNU bug report logs - #78439
Accent insensitive grep


Package: grep

Reported by: "Avid Seeker" <avidseeker <at> disroot.org>

Date: Thu, 15 May 2025 07:47:02 UTC

Severity: normal



Message #5 received at submit <at> debbugs.gnu.org (full text, mbox):

From: "Avid Seeker" <avidseeker <at> disroot.org>
To: <bug-grep <at> gnu.org>
Subject: Accent insensitive grep
Date: Thu, 15 May 2025 05:49:00 +0000

Reiterating the question on SO <https://stackoverflow.com/questions/20937864/> about applying an
accent-insensitive grep to text (e.g., every accented form of the letter 'e' should be treated as an ASCII 'e').
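
To make the desired behavior concrete, here is a minimal sketch written in Python rather than grep. It assumes that stripping Unicode combining marks after NFD normalization is an acceptable definition of "accent-insensitive"; that is a swapped-in technique for illustration, not how grep or glibc equivalence classes work. The names fold_accents and accent_insensitive_grep are hypothetical.

```python
import re
import unicodedata


def fold_accents(text):
    """Decompose to NFD, then drop the combining marks (the accents)."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def accent_insensitive_grep(pattern, lines):
    """Return the lines whose accent-folded form matches the folded pattern.

    Hypothetical helper: only suitable for simple patterns, since the
    pattern itself is folded before being compiled as a regex.
    """
    regex = re.compile(fold_accents(pattern))
    return [line for line in lines if regex.search(fold_accents(line))]


sample = ["café", "cafe", "résumé", "tea"]
matches = accent_insensitive_grep("cafe", sample)
print(matches)
```

With this definition, both "café" and "cafe" are selected by the pattern "cafe", while the original lines are returned unmodified.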

The response by Adam Katz mentions:
> You should not expect equivalence classes to be portable as they are too arcane.

What's the stance of the grep developers on this? Are equivalence classes the
right tool for this? I see that they depend on LC_COLLATE, in
which case it would be possible to set up a custom locale that matches
digraphs.

In the example he gave, he also mentions:
> This matches all words like aei... [but won't match] æi... it's quite
> likely that digraphs are beyond the reach of even the best equivalence
> class map.
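
The digraph limitation is not specific to equivalence classes. As a rough illustration in Python (an assumption-laden sketch, unrelated to grep's internals), Unicode NFD normalization leaves 'æ' untouched because it has no canonical decomposition, so digraphs need an explicit table. DIGRAPH_MAP below is a hand-picked, hypothetical mapping, not something taken from any locale definition.

```python
import unicodedata

# Assumed, hand-picked table for illustration; extend as needed.
DIGRAPH_MAP = {"æ": "ae", "Æ": "AE", "œ": "oe", "Œ": "OE"}


def fold(text):
    """Expand listed digraphs, then strip accents via NFD decomposition."""
    expanded = "".join(DIGRAPH_MAP.get(ch, ch) for ch in text)
    decomposed = unicodedata.normalize("NFD", expanded)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


# NFD alone does not split the ligature into two letters.
nfd_only = unicodedata.normalize("NFD", "æi")
folded = fold("æi")
print(nfd_only, folded)
```

Any approach along these lines has to decide for itself which digraphs to expand, which is the same maintenance burden a custom locale would carry.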

Is there a way to set up a locale without having to recompile glibc, or
are these locale values hardcoded into programs using glibc?

Thanks,
Avid




This bug report was last modified 32 days ago.


