GNU bug report logs - #13041
24.2; diacritic-fold-search
Reported by: perin <at> acm.org
Date: Fri, 30 Nov 2012 18:31:02 UTC
Severity: wishlist
Found in version 24.2
Fixed in version 25.1
Done: Michael Albinus <michael.albinus <at> gmx.de>
Bug is archived. No further changes may be made.
> This means that when you type the second "f" you might get a match
> before the present one. Consider a buffer containing the two lines
> suﬀer
> suffer
>
> Typing "suf" as search string would go to "suffer". Adding an "f" to
> the search string now would go back to "suﬀer" (or not).
Going back looks like backtracking in the regexp search.
OTOH, instead of accepting only a full match of the ligature,
as Chromium does, we could do what GEdit and OpenOffice do and
match the whole ligature character on a partial match
(i.e. match "ﬀ" when the search string is just "f").
This has the problem of highlighting the whole character for
a partial match, which looks wrong, but perhaps no one can do better.
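
To make the two behaviours concrete, here is a toy model in Python. It is
only a sketch, not Emacs or isearch code: the names fold_with_map and search
are made up for illustration, the fold is assumed to be Unicode NFKD
decomposition, and the search string is assumed to be plain ASCII already.

```python
import unicodedata


def fold_with_map(text):
    """Expand each character with NFKD and remember its original index."""
    folded, origin = [], []
    for i, ch in enumerate(text):
        for piece in unicodedata.normalize("NFKD", ch):
            folded.append(piece)
            origin.append(i)
    return "".join(folded), origin


def search(text, needle, allow_partial):
    """Return (start, end) in the original text of the first match, or None.

    allow_partial=False accepts a match only if it covers whole original
    characters (the "full match" approach).  allow_partial=True accepts a
    match that ends or starts inside an expanded character and reports the
    whole character (the "whole ligature" approach).
    """
    folded, origin = fold_with_map(text)
    pos = folded.find(needle)
    while pos != -1:
        last = pos + len(needle) - 1
        starts_clean = pos == 0 or origin[pos - 1] != origin[pos]
        ends_clean = last == len(folded) - 1 or origin[last + 1] != origin[last]
        if allow_partial or (starts_clean and ends_clean):
            return origin[pos], origin[last] + 1
        pos = folded.find(needle, pos + 1)
    return None


# Two lines: the first uses the U+FB00 ligature, the second plain "ff".
buffer_text = "su\ufb00er\nsuffer"

full_suf = search(buffer_text, "suf", allow_partial=False)
full_suff = search(buffer_text, "suff", allow_partial=False)
partial_suf = search(buffer_text, "suf", allow_partial=True)
```

With full matches only, "suf" lands on the second line (6, 9) and adding an
"f" moves the match back to the first line (0, 3), which is the jump
described in the quoted text. With partial matches allowed, "suf" already
matches on the first line, but the reported range (0, 3) covers the whole
ligature character.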
>> http://www.unicode.org/Public/UNIDATA/CaseFolding.txt
>> http://www.unicode.org/Public/UNIDATA/SpecialCasing.txt
>
> Case folding "ß" to "SS" (upper case "S") is not what I had in mind. I
> was talking about the (weak?) equivalence of "ß" and "ss" (lower case
> "s") which is much more important when searching. In particular so,
> because many German words that were earlier written with an "ß" are now
> written with "ss".
Yes, this is what I meant too. It is surprising, but
http://www.unicode.org/Public/UNIDATA/CaseFolding.txt
defines the equivalence of "ß" and "ss" (lower case "s")
instead of case-folding. The following line in CaseFolding.txt:
00DF; F; 0073 0073; # LATIN SMALL LETTER SHARP S
maps 00DF (LATIN SMALL LETTER SHARP S) to the two characters
0073 0073 (LATIN SMALL LETTER S, twice), keeping the lower case.
Maybe this is a bug in Unicode data?
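
For comparison, the same mapping can be checked outside Emacs. The sketch
below parses the CaseFolding.txt line quoted above and compares it with
Python's str.casefold, which is documented as following the Unicode case
folding algorithm. The helper name parse_casefolding_line is made up for
illustration, and the field layout (code; status; mapping; # name) is
taken from that one line only.

```python
def parse_casefolding_line(line):
    """Split one CaseFolding.txt data line into (source, status, target)."""
    data = line.split("#", 1)[0]  # drop the trailing comment
    code, status, mapping = [field.strip() for field in data.split(";")[:3]]
    source = chr(int(code, 16))
    target = "".join(chr(int(cp, 16)) for cp in mapping.split())
    return source, status, target


source, status, target = parse_casefolding_line(
    "00DF; F; 0073 0073; # LATIN SMALL LETTER SHARP S"
)

# Folding and upper-casing are separate operations in Python as well.
folded_sharp_s = "\u00df".casefold()   # case folding
upper_sharp_s = "\u00df".upper()       # upper-casing
folded_upper = upper_sharp_s.casefold()
```

Here target is "ss" and folded_sharp_s is also "ss", while upper_sharp_s is
"SS". Folding the upper-cased form gives "ss" again, so "ß", "ss" and "SS"
all compare equal once both sides are folded.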
This bug report was last modified 8 years and 342 days ago.