GNU bug report logs - #56747
28.1.90; Char fold search doesn't work

Package: emacs;

Reported by: Damien Cassou <damien <at> cassou.me>

Date: Sun, 24 Jul 2022 17:29:01 UTC

Severity: normal

Found in version 28.1.90

Done: Eli Zaretskii <eliz <at> gnu.org>

Bug is archived. No further changes may be made.

From: Eli Zaretskii <eliz <at> gnu.org>
To: Damien Cassou <damien <at> cassou.me>
Cc: 56747 <at> debbugs.gnu.org
Subject: bug#56747: 28.1.90; Char fold search doesn't work
Date: Mon, 25 Jul 2022 15:01:31 +0300
> From: Damien Cassou <damien <at> cassou.me>
> Cc: 56747 <at> debbugs.gnu.org
> Date: Sun, 24 Jul 2022 21:43:10 +0200
> 
> Eli Zaretskii <eliz <at> gnu.org> writes:
> > Which part of the manual led you to expect the above behavior?
> 
> This page (info "(emacs) Lax Search") says:
> 
>   In addition, ‘a’ matches other characters that resemble it, or have it
>   as part of their graphical representation, such as U+249C
>   PARENTHESIZED LATIN SMALL LETTER A and U+2100 ACCOUNT OF (which looks
>   like a small ‘a’ over ‘c’)
> 
> Those 2 characters are the ones I tried so I was expecting to make it
> work.

Ah, you are right.  I wasn't reading the text closely enough.

> > By default, Emacs only folds "canonically-equivalent" characters, and
> > those two aren't equivalent to 'a'.
> 
> Then I don't understand what the manual is saying. Can you please
> explain?

It's a documentation bug: these 2 pairs are by default not handled as
equivalent.  The reasons are to some extent heuristic: since the
table of the equivalent character sequences is produced mechanically,
allowing such "too lax" equivalences would lead to surprising false
matches; see bug#20975 for one example.  (These surprising results are
in part due to our simplistic implementation, whereby we convert
the set of equivalent sequences to a regexp.)  So we decided to play
it safe, and not allow 'a' to match a character whose Unicode
decomposition is "(a)", because 'a' is not the first character of the
decomposition.  We do allow the sequence "(a)" to match ⒜ (but not
vice versa!), and we do allow 'a' to match 'ⓐ' (because 'a' is the
only character in the decomposition of the latter).
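
The distinction above can be illustrated with a minimal Python sketch
that looks only at the Unicode decomposition data.  It is a simplified
model of the rule as described in this message, not the algorithm in
char-fold.el; the helper names (decomposition_text,
single_char_folds_to, sequence_folds_to) are hypothetical and exist
only for this illustration.

```python
import unicodedata


def decomposition_text(ch):
    """Return the Unicode decomposition of CH as a string.

    Any formatting tag such as <compat> or <circle> is dropped.
    Returns "" when CH has no decomposition.
    """
    fields = unicodedata.decomposition(ch).split()
    codepoints = [f for f in fields if not f.startswith("<")]
    return "".join(chr(int(cp, 16)) for cp in codepoints)


def single_char_folds_to(base, ch):
    """Assumed rule: BASE matches CH only if BASE is the whole decomposition."""
    return decomposition_text(ch) == base


def sequence_folds_to(seq, ch):
    """Assumed rule: a multi-character SEQ matches CH if it equals the decomposition."""
    return len(seq) > 1 and decomposition_text(ch) == seq


if __name__ == "__main__":
    # U+249C PARENTHESIZED LATIN SMALL LETTER A, U+24D0 CIRCLED LATIN
    # SMALL LETTER A, U+2100 ACCOUNT OF
    for ch in ("\u249c", "\u24d0", "\u2100"):
        print(f"U+{ord(ch):04X} decomposes to {decomposition_text(ch)!r}; "
              f"'a' alone matches: {single_char_folds_to('a', ch)}")
```

Under this model, 'a' alone matches only U+24D0, whose decomposition
is just "a", while U+249C is matched by the full sequence "(a)" and
not by 'a'.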

The result of these heuristics is somewhat inconsistent from the user's POV,
which is why we have a facility to customize it.

So I've now updated the manual to quote only examples that really
work.

> By the way, you are doing an amazing job with Emacs! Thank you so much
> Eli.

Thanks, but please don't forget Lars and others involved in the
development.



