GNU bug report logs - #17130
24.4.50; Deficient Unicode case folding


Package: emacs;

Reported by: Nathan Trapuzzano <nbtrap <at> nbtrap.com>

Date: Fri, 28 Mar 2014 12:08:02 UTC

Severity: wishlist

Found in version 24.4.50

Done: Lars Ingebrigtsen <larsi <at> gnus.org>

Bug is archived. No further changes may be made.




From: Nathan Trapuzzano <nbtrap <at> nbtrap.com>
To: Eli Zaretskii <eliz <at> gnu.org>
Cc: 17130 <at> debbugs.gnu.org
Subject: bug#17130: 24.4.50; Deficient Unicode case folding
Date: Sat, 29 Mar 2014 10:03:32 -0400
Eli Zaretskii <eliz <at> gnu.org> writes:

>> Reading through the manual section on case tables, it seems that this
>> could be supported via the extra "canonicalize" slot:
>> 
>>     CANONICALIZE
>>       The canonicalize table maps all of a set of case-related
>>       characters into a particular member of that set.
>
> Not efficiently, no.  E.g., how will you find ς from σ, using this
> method?

σ, ς, and Σ would all have σ in the CANONICALIZE slot, since they all
fold to σ.  (By the way, ς should upcase to Σ--that much I know the case
tables can handle.)
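To make the proposal concrete, here is a minimal Python sketch of a CANONICALIZE-style table. It is a simplified, hypothetical model, not Emacs's char-table code, and the names `CANON`, `canon`, `fold`, `fold_equal` and `fold_find` are illustrative only. σ, ς, and Σ all map to σ. Every other character falls back to per-character lowercasing, which on its own leaves ς distinct from σ. Case-insensitive search in this toy model canonicalizes both the pattern and the text before comparing.

```python
# Hypothetical CANONICALIZE-style table: character -> representative
# of its case-equivalence set. Only the sigma family is listed here.
CANON = {
    "\u03a3": "\u03c3",  # Σ GREEK CAPITAL LETTER SIGMA     -> σ
    "\u03c3": "\u03c3",  # σ GREEK SMALL LETTER SIGMA       -> σ
    "\u03c2": "\u03c3",  # ς GREEK SMALL LETTER FINAL SIGMA -> σ
}

def canon(ch):
    """Canonical representative of a single character."""
    if ch in CANON:
        return CANON[ch]
    lowered = ch.lower()
    # Keep the mapping one character to one character so that match
    # indices in the folded text are also valid in the original text.
    return lowered if len(lowered) == 1 else ch

def fold(s):
    """Canonicalize every character of a string."""
    return "".join(canon(ch) for ch in s)

def fold_equal(a, b):
    """Case-insensitive equality via the canonical forms."""
    return fold(a) == fold(b)

def fold_find(pattern, text):
    """Index of the first case-insensitive match of pattern in text, or -1."""
    return fold(text).find(fold(pattern))

upper_word = "\u039f\u0394\u039f\u03a3"  # ΟΔΟΣ
lower_word = "\u03bf\u03b4\u03bf\u03c2"  # οδος, ending in final sigma

print("\u03c2".lower() == "\u03c3")        # False: lowercasing alone keeps ς distinct
print(fold_equal(upper_word, lower_word))  # True
print(fold_find("\u03c3", lower_word))     # 3: σ matches the final ς
```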

> Besides, don't we also need to know that ς can only be present at the
> end of a word?

Don't think so.  AFAIK, Unicode says nothing about ordering except when
it comes to combining characters.  But even if it did prescribe such a
rule, I don't think it would have anything to do with case folding.

>> If this isn't already used for Unicode case folding, what _is_ it used
>> for?
>
> It is used for case-insensitive regexp matching, see search.c.

Right, but what I'm asking is: if Emacs doesn't do Unicode case folding,
what is the purpose of the CANONICALIZE slot except as a kind of
placeholder that gets autofilled?  Are there other kinds of case
folding--other than traditional upper/lower and Unicode--that I'm not
aware of?  I understand that Emacs autofills the CANONICALIZE slot from
the other slots, but only when the CANONICALIZE slot is not already set
to non-nil.  What if the CANONICALIZE slot on ς were set to σ?  I think
that's all that would have to happen for the Unicode folding to work.
It seems the machinery is already in place.
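The autofill behaviour described above, where only CANONICALIZE entries that are not already set get filled, can be modelled as follows. This is again a hypothetical Python sketch. `autofill_canon` derives missing entries from a lowercase (DOWN) mapping alone, a simplification of "the other slots" that Emacs actually uses. `equivalence_classes` inverts the result by grouping characters on their canonical value, which is one way to recover every member of a set, such as ς from σ.

```python
from collections import defaultdict

def autofill_canon(down, preset):
    """Build a canonicalize map: keep every preset entry, fill the rest from DOWN.

    Hypothetical simplification: an autofilled entry is just the
    character's lowercase form taken from DOWN.
    """
    canon = dict(preset)             # explicitly set entries win
    for ch, lower in down.items():
        canon.setdefault(ch, lower)  # autofill only the unset slots
    return canon

def equivalence_classes(canon):
    """Invert a canonicalize map: representative -> set of its members."""
    classes = defaultdict(set)
    for ch, rep in canon.items():
        classes[rep].add(ch)
        classes[rep].add(rep)        # a representative belongs to its own set
    return classes

# Toy DOWN mapping for Σ, σ, ς: ς is already lowercase, so it maps to itself.
DOWN = {"\u03a3": "\u03c3", "\u03c3": "\u03c3", "\u03c2": "\u03c2"}

plain = autofill_canon(DOWN, {})                    # ς left to autofill
fixed = autofill_canon(DOWN, {"\u03c2": "\u03c3"})  # ς explicitly set to σ

print(sorted(equivalence_classes(plain)["\u03c3"]))  # ['Σ', 'σ']
print(sorted(equivalence_classes(fixed)["\u03c3"]))  # ['Σ', 'ς', 'σ']
```

In this toy model the single preset entry for ς is enough to merge it into σ's set, while the autofill leaves all other entries untouched.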







GNU bug tracking system
Copyright (C) 1999 Darren O. Benham, 1997,2003 nCipher Corporation Ltd, 1994-97 Ian Jackson.