GNU bug report logs - #13041
24.2; diacritic-fold-search


Package: emacs;

Reported by: perin <at> acm.org

Date: Fri, 30 Nov 2012 18:31:02 UTC

Severity: wishlist

Found in version 24.2

Fixed in version 25.1

Done: Michael Albinus <michael.albinus <at> gmx.de>

Bug is archived. No further changes may be made.

From: Juri Linkov <juri <at> jurta.org>
To: martin rudalics <rudalics <at> gmx.at>
Cc: Kenichi Handa <handa <at> gnu.org>, 13041 <at> debbugs.gnu.org, perin <at> panix.com, perin <at> acm.org
Subject: bug#13041: 24.2; diacritic-fold-search
Date: Sun, 09 Dec 2012 01:07:12 +0200
> This means that when you type the second "f" you might get a match
> before the present one.  Consider a buffer containing the two lines
> suﬀer
> suffer
>
> Typing "suf" as search string would go to "suffer".  Adding an "f" to
> the search string now would go back to "suﬀer" (or not).
Going back looks like backtracking in the regexp search.

OTOH, instead of accepting only full matches, as Chromium does, we could
do what GEdit and OpenOffice do and match the whole ligature character
on a partial match (i.e. match "ﬀ" when the search string is just "f").

This has the drawback that highlighting the whole character for a
partial match looks wrong, but perhaps no one can do better.
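Both behaviours can be sketched in Python, assuming that a ligature is folded to its NFKC compatibility form (so U+FB00 "ﬀ" becomes "ff"). `fold_with_map` and `search` are hypothetical helpers for illustration, not Emacs or browser code, and only forward literal search is handled.

```python
import unicodedata


def fold_with_map(text):
    """Fold each character to its NFKC form (e.g. U+FB00 'ﬀ' -> 'ff') and
    record, for every folded character, the index of the original character
    it came from."""
    folded = []
    origin = []
    for i, ch in enumerate(text):
        expansion = unicodedata.normalize("NFKC", ch)
        folded.append(expansion)
        origin.extend([i] * len(expansion))
    return "".join(folded), origin


def search(text, needle, allow_partial):
    """Return the first match as (start, end) indices into TEXT, or None.

    allow_partial=False: a match must start and end on original character
    boundaries, so "f" cannot match half of "ﬀ" (full matches only).
    allow_partial=True: a match may cover part of a ligature and is widened
    to the whole original character."""
    if not needle:
        return None
    folded, origin = fold_with_map(text)
    start = folded.find(needle)
    while start != -1:
        end = start + len(needle)  # exclusive end, in folded indices
        first, last = origin[start], origin[end - 1]
        starts_on_boundary = start == 0 or origin[start - 1] != first
        ends_on_boundary = end == len(folded) or origin[end] != last
        if allow_partial or (starts_on_boundary and ends_on_boundary):
            return first, last + 1
        start = folded.find(needle, start + 1)
    return None


buf = "su\ufb00er\nsuffer"  # first line uses U+FB00 LATIN SMALL LIGATURE FF
print(search(buf, "suf", allow_partial=False))   # (6, 9): only the plain "suffer"
print(search(buf, "suff", allow_partial=False))  # (0, 3): match jumps back to the ligature
print(search(buf, "suf", allow_partial=True))    # (0, 3): whole ligature highlighted
```

With `allow_partial=False`, extending "suf" to "suff" moves the match from the second line back to the first, which is the effect in the quoted example; with `allow_partial=True` the match stays on the first line, but its highlight covers the entire "ﬀ".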

>> http://www.unicode.org/Public/UNIDATA/CaseFolding.txt
>> http://www.unicode.org/Public/UNIDATA/SpecialCasing.txt
>
> Case folding "ß" to "SS" (upper case "S") is not what I had in mind.  I
> was talking about the (weak?) equivalence of "ß" and "ss" (lower case
> "s"), which is much more important when searching, all the more so
> because many German words that were formerly written with "ß" are now
> written with "ss".

Yes, this is what I meant too.  Surprisingly,
http://www.unicode.org/Public/UNIDATA/CaseFolding.txt
defines the equivalence of "ß" and "ss" (lower case "s")
rather than folding "ß" to "SS".  The following line in CaseFolding.txt:

00DF; F; 0073 0073; # LATIN SMALL LETTER SHARP S

maps 00DF (LATIN SMALL LETTER SHARP S) to the two characters
0073 0073 (LATIN SMALL LETTER S), preserving lower case.
Maybe this is a bug in the Unicode data?
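Applying entries like this one can be sketched in Python. `parse_case_folding` and `fold` are hypothetical helpers, and SAMPLE is only a three-line excerpt in the CaseFolding.txt format, not the full file. Following the file's usage notes, full case folding uses the entries with status C and F, and simple folding uses C and S. Python's built-in `str.casefold()` implements Unicode full case folding and also maps "ß" to "ss".

```python
def parse_case_folding(lines, full=True):
    """Build a {code point: folded string} table from CaseFolding.txt lines.

    Full folding uses the C and F entries (a character may expand, as
    ß -> ss); simple folding uses the C and S entries (length preserved)."""
    wanted = {"C", "F"} if full else {"C", "S"}
    table = {}
    for line in lines:
        data = line.split("#", 1)[0].strip()  # drop the trailing comment
        if not data:
            continue
        code, status, mapping = (f.strip() for f in data.split(";")[:3])
        if status in wanted:
            table[int(code, 16)] = "".join(chr(int(cp, 16)) for cp in mapping.split())
    return table


def fold(text, table):
    """Fold TEXT with TABLE; characters without an entry are kept as they are."""
    return "".join(table.get(ord(ch), ch) for ch in text)


# A three-entry excerpt in the CaseFolding.txt format, not the full file.
SAMPLE = [
    "0041; C; 0061; # LATIN CAPITAL LETTER A",
    "0053; C; 0073; # LATIN CAPITAL LETTER S",
    "00DF; F; 0073 0073; # LATIN SMALL LETTER SHARP S",
]

print(fold("Straße", parse_case_folding(SAMPLE)))              # strasse
print(fold("Straße", parse_case_folding(SAMPLE, full=False)))  # straße
# str.casefold() applies Unicode full case folding to every character.
print("Straße".casefold() == "STRASSE".casefold())            # True
```

Because 00DF has only an F entry, simple folding leaves "ß" unchanged, while full folding turns it into "ss".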




This bug report was last modified 8 years and 342 days ago.



GNU bug tracking system
Copyright (C) 1999 Darren O. Benham, 1997,2003 nCipher Corporation Ltd, 1994-97 Ian Jackson.