GNU bug report logs - #78439
Accent insensitive grep


Package: grep;

Reported by: "Avid Seeker" <avidseeker <at> disroot.org>

Date: Thu, 15 May 2025 07:47:02 UTC

Severity: normal

To reply to this bug, email your comments to 78439 AT debbugs.gnu.org.



Report forwarded to bug-grep <at> gnu.org:
bug#78439; Package grep. (Thu, 15 May 2025 07:47:03 GMT)

Acknowledgement sent to "Avid Seeker" <avidseeker <at> disroot.org>:
New bug report received and forwarded. Copy sent to bug-grep <at> gnu.org. (Thu, 15 May 2025 07:47:03 GMT)

Message #5 received at submit <at> debbugs.gnu.org:

From: "Avid Seeker" <avidseeker <at> disroot.org>
To: <bug-grep <at> gnu.org>
Subject: Accent insensitive grep
Date: Thu, 15 May 2025 05:49:00 +0000
Reiterating the question on SO <https://stackoverflow.com/questions/20937864/> about applying an
accent-insensitive grep to text (e.g., all accented forms of the letter 'e' should be treated as an ASCII 'e').
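One way to get the behavior described above without relying on locale equivalence classes is to fold accents away before matching. The Python sketch below is a swapped-in technique, not something grep does: it assumes that Unicode NFD normalization followed by dropping combining marks (category Mn) is an acceptable definition of "accent-insensitive", and the names `fold_accents` and `accent_insensitive_grep` are hypothetical.

```python
import re
import unicodedata


def fold_accents(text: str) -> str:
    """Decompose to NFD, then drop combining marks (Unicode category Mn)."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if unicodedata.category(ch) != "Mn")


def accent_insensitive_grep(pattern: str, lines):
    """Yield each line whose accent-folded form matches the folded pattern.

    Hypothetical helper: matching happens on folded copies, so only whole
    lines are reported, not match offsets in the original text.
    """
    regex = re.compile(fold_accents(pattern))
    for line in lines:
        if regex.search(fold_accents(line)):
            yield line


if __name__ == "__main__":
    sample = ["café", "cafe", "coffee", "Encyclopædia"]
    print(list(accent_insensitive_grep("cafe", sample)))  # ['café', 'cafe']
    print(fold_accents("Encyclopædia"))  # Encyclopædia (æ has no decomposition)
```

Note that 'æ' passes through unchanged: it has no canonical decomposition, so this kind of folding runs into the same digraph limitation discussed in the thread.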

The response by Adam Katz mentions:
> You should not expect equivalence classes to be portable as they are too arcane.

What's the stance of grep developers on this? Are equivalence classes the
right tool to approach this? I see that they depend on LC_COLLATE, in
which case it would be possible to set up a custom locale that matches
digraphs.

In the example he gave, he also mentions:
> This matches all words like aei... [but won't match] æi... it's quite
> likely that digraphs are beyond the reach of even the best equivalence
> class map.
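The digraph limitation also shows up in plain Unicode folding: 'æ', 'œ' and 'ß' have no canonical decomposition, so NFD leaves them alone. A minimal sketch, assuming a hand-written expansion table is acceptable, expands them explicitly before stripping combining marks. The `DIGRAPHS` table and the `fold` and `matches` functions are hypothetical, the table is illustrative rather than complete, and this is a Python-side workaround, not a grep or glibc feature.

```python
import re
import unicodedata

# Hypothetical, deliberately incomplete expansion table for letters that
# have no canonical decomposition, so NFD alone would leave them untouched.
DIGRAPHS = str.maketrans({"æ": "ae", "Æ": "AE", "œ": "oe", "Œ": "OE", "ß": "ss"})


def fold(text: str) -> str:
    """NFD-decompose, expand the listed digraphs, then drop combining marks."""
    expanded = unicodedata.normalize("NFD", text).translate(DIGRAPHS)
    return "".join(ch for ch in expanded if unicodedata.category(ch) != "Mn")


def matches(pattern: str, line: str) -> bool:
    """True if the folded pattern is found anywhere in the folded line."""
    return re.search(fold(pattern), fold(line)) is not None


if __name__ == "__main__":
    print(fold("Encyclopædia"))  # Encyclopaedia
    print(matches("aei", "æi"))  # True
```

Expanding after NFD rather than before means precomposed forms such as 'ǽ' (U+01FD, which decomposes to 'æ' plus a combining acute) still reduce to 'ae'.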

Is there a way to set up a locale without having to recompile glibc, or
are these locale values hardcoded into programs that use glibc?

Thanks,
Avid




Information forwarded to bug-grep <at> gnu.org:
bug#78439; Package grep. (Thu, 15 May 2025 16:20:04 GMT)

Message #8 received at 78439 <at> debbugs.gnu.org:

From: Paul Eggert <eggert <at> cs.ucla.edu>
To: Avid Seeker <avidseeker <at> disroot.org>
Cc: 78439 <at> debbugs.gnu.org
Subject: Re: bug#78439: Accent insensitive grep
Date: Thu, 15 May 2025 09:19:14 -0700
On 2025-05-14 22:49, Avid Seeker via Bug reports for GNU grep wrote:

> are equivalence classes the
> right tool to approach this?

They're supposed to be, yes ...

> I see that they depend on LC_COLLATE, in
> which case it would be possible to setup a custom locale that matches
> digraphs.

... though you're venturing into uncharted territory here. Please let us 
know of any monsters you find.

> Is there a way to setup a locale without having to recompile glibc

Yes, use localedef.





This bug report was last modified 32 days ago.



GNU bug tracking system
Copyright (C) 1999 Darren O. Benham, 1997,2003 nCipher Corporation Ltd, 1994-97 Ian Jackson.