GNU bug report logs - #69968
Case-folding of Mathematical Alphanumeric Symbols

Package: emacs;

Reported by: Juri Linkov <juri <at> linkov.net>

Date: Sat, 23 Mar 2024 20:41:02 UTC

Severity: normal

Done: Juri Linkov <juri <at> linkov.net>

Bug is archived. No further changes may be made.

To add a comment to this bug, you must first unarchive it, by sending
a message to control AT debbugs.gnu.org, with unarchive 69968 in the body.
You can then email your comments to 69968 AT debbugs.gnu.org in the normal way.

Report forwarded to bug-gnu-emacs <at> gnu.org:
bug#69968; Package emacs. (Sat, 23 Mar 2024 20:41:03 GMT)

Acknowledgement sent to Juri Linkov <juri <at> linkov.net>:
New bug report received and forwarded. Copy sent to bug-gnu-emacs <at> gnu.org. (Sat, 23 Mar 2024 20:41:03 GMT)

Message #5 received at submit <at> debbugs.gnu.org (full text, mbox):

From: Juri Linkov <juri <at> linkov.net>
To: bug-gnu-emacs <at> gnu.org
Subject: Case-folding of Mathematical Alphanumeric Symbols
Date: Sat, 23 Mar 2024 20:27:45 +0200

I wonder why case-folding is not supported for letters from
the Unicode block "Mathematical Alphanumeric Symbols":
https://en.wikipedia.org/wiki/Mathematical_Alphanumeric_Symbols

Is it because the Unicode standard doesn't provide information
about their case-folding?  And indeed they are missing from
https://unicode.org/Public/UNIDATA/CaseFolding.txt

But OTOH, I can't find the file CaseFolding.txt in admin/unidata.
This means Emacs doesn't use this file?

Then should we add more case-folding information explicitly
for this Unicode block?

Case-folding is already supported for some characters from other
Unicode blocks, such as FULLWIDTH LATIN CAPITAL LETTERs,
CIRCLED LATIN CAPITAL LETTERs, etc.
But e.g. PARENTHESIZED LATIN CAPITAL LETTERs are missing too.
What is worse is that in Emacs ⒜ doesn't even have word syntax
like its counterpart 🄐.

Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#69968; Package emacs. (Sun, 24 Mar 2024 06:41:01 GMT)

Message #8 received at 69968 <at> debbugs.gnu.org (full text, mbox):

From: Eli Zaretskii <eliz <at> gnu.org>
To: Juri Linkov <juri <at> linkov.net>
Cc: 69968 <at> debbugs.gnu.org
Subject: Re: bug#69968: Case-folding of Mathematical Alphanumeric Symbols
Date: Sun, 24 Mar 2024 08:27:39 +0200

> From: Juri Linkov <juri <at> linkov.net>
> Date: Sat, 23 Mar 2024 20:27:45 +0200
> 
> I wonder why case-folding is not supported for letters from
> the Unicode block "Mathematical Alphanumeric Symbols":
> https://en.wikipedia.org/wiki/Mathematical_Alphanumeric_Symbols

These are not letters, they are symbols.  And letter-case is not
defined for symbols.

> Is it because the Unicode standard doesn't provide information
> about their case-folding?  And indeed they are missing from
> https://unicode.org/Public/UNIDATA/CaseFolding.txt

Unicode doesn't consider them letters.

> But OTOH, I can't find the file CaseFolding.txt in admin/unidata.
> This means Emacs doesn't use this file?

We don't.  We use the case-conversion information in UnicodeData.txt,
as it tells us everything we need to know.
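
For illustration, the case mappings that Emacs derives from
UnicodeData.txt can be inspected with `get-char-code-property'.  This
is only a minimal sketch: the helper name `my-case-info' is
hypothetical, and the values returned depend on the Unicode data
shipped with your Emacs.

```elisp
(defun my-case-info (char)
  "Return a plist with CHAR's Unicode name, category and simple case mappings.
The values come from the tables Emacs generates from UnicodeData.txt."
  (list :name (get-char-code-property char 'name)
        :category (get-char-code-property char 'general-category)
        :uppercase (get-char-code-property char 'uppercase)
        :lowercase (get-char-code-property char 'lowercase)))

;; FULLWIDTH LATIN CAPITAL LETTER A: UnicodeData.txt gives it a
;; lowercase mapping.
(my-case-info #xFF21)

;; MATHEMATICAL BOLD CAPITAL A: compare its :lowercase entry with the
;; one above.
(my-case-info #x1D400)
```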

> Then should we add more case-folding information explicitly
> for this Unicode block?

What is the rationale for doing so?  It's against Unicode, so we need
to have a good reason, as this will have to be maintained by hand, and
also because some users might be surprised.

> Case-folding is already supported for some characters from other
> Unicode blocks such e.g. FULLWIDTH LATIN CAPITAL LETTERs,
> CIRCLED LATIN CAPITAL LETTERs, etc.

That's because UnicodeData.txt defines their letter-case conversions.

> But e.g. PARENTHESIZED LATIN CAPITAL LETTERs are missing too.
> What is worse is that in Emacs ⒜ doesn't have even a word syntax
> like its counterpart 🄐.

I think the fact that 🄐 has the word syntax might be a mistake.  These
are both symbols, so why would we want them to have the word syntax?
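
The syntax class of each character can be checked directly.  A
minimal sketch, assuming the standard syntax table is the one of
interest; the helper name `my-standard-syntax-class' is hypothetical.
The result is a one-character string such as "w" (word), "_" (symbol)
or "." (punctuation).

```elisp
(defun my-standard-syntax-class (char)
  "Return CHAR's syntax class in the standard syntax table as a string."
  (with-syntax-table (standard-syntax-table)
    (string (char-syntax char))))

(my-standard-syntax-class #x249C)   ; ⒜ PARENTHESIZED LATIN SMALL LETTER A
(my-standard-syntax-class #x1F110)  ; 🄐 PARENTHESIZED LATIN CAPITAL LETTER A
```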

Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#69968; Package emacs. (Sun, 24 Mar 2024 17:22:03 GMT)

Message #11 received at 69968 <at> debbugs.gnu.org (full text, mbox):

From: Juri Linkov <juri <at> linkov.net>
To: Eli Zaretskii <eliz <at> gnu.org>
Cc: 69968 <at> debbugs.gnu.org
Subject: Re: bug#69968: Case-folding of Mathematical Alphanumeric Symbols
Date: Sun, 24 Mar 2024 19:09:10 +0200

>> I wonder why case-folding is not supported for letters from
>> the Unicode block "Mathematical Alphanumeric Symbols":
>> https://en.wikipedia.org/wiki/Mathematical_Alphanumeric_Symbols
>
> These are not letters, they are symbols.  And letter-case is not
> defined for symbols.

𝘋𝘰 𝘺𝘰𝘶 𝘳𝘦𝘢𝘭𝘭𝘺 𝘵𝘩𝘪𝘯𝘬 𝘵𝘩𝘪𝘴 𝘵𝘦𝘹𝘵 𝘪𝘴 𝘯𝘰𝘵 𝘸𝘳𝘪𝘵𝘵𝘦𝘯 𝘸𝘪𝘵𝘩 𝙡𝙚𝙩𝙩𝙚𝙧𝙨?

>> Is it because the Unicode standard doesn't provide information
>> about their case-folding?  And indeed they are missing from
>> https://unicode.org/Public/UNIDATA/CaseFolding.txt
>
> Unicode doesn't consider them letters.

OK, if Unicode doesn't consider them letters,
let's stick to the Unicode standard.

>> But OTOH, I can't find the file CaseFolding.txt in admin/unidata.
>> This means Emacs doesn't use this file?
>
> We don't.  We use the case-conversion information in UnicodeData.txt,
> as it tells us everything we need to know.

Thanks, I didn't remember that case-conversion is in UnicodeData.txt.
I checked admin/unidata/UnicodeData.txt and indeed there is
no case-conversion for Mathematical Alphanumeric Symbols.

>> Then should we add more case-folding information explicitly
>> for this Unicode block?
>
> What is the rationale for doing so?  It's against Unicode, so we need
> to have a good reason, as this will have to be maintained by hand, and
> also because some users might be surprised.

I don't think users would be surprised: those who don't need to
change case simply don't use case-changing functions.  But those
who expect the case to change will indeed be surprised when it
doesn't.

>> Case-folding is already supported for some characters from other
>> Unicode blocks such e.g. FULLWIDTH LATIN CAPITAL LETTERs,
>> CIRCLED LATIN CAPITAL LETTERs, etc.
>
> That's because UnicodeData.txt defines their letter-case conversions.

Ok, then it's very strange that the Unicode standard doesn't define
letter-case conversions for other letters.  But what can we do.

>> But e.g. PARENTHESIZED LATIN CAPITAL LETTERs are missing too.
>> What is worse is that in Emacs ⒜ doesn't have even a word syntax
>> like its counterpart 🄐.
>
> I think the fact that 🄐 has the word syntax might be a mistake.  These
> are both symbols, so why would we want them to have the word syntax?

Because they look like letters with diacritics.

Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#69968; Package emacs. (Sun, 24 Mar 2024 18:09:01 GMT)

Message #14 received at 69968 <at> debbugs.gnu.org (full text, mbox):

From: Eli Zaretskii <eliz <at> gnu.org>
To: Juri Linkov <juri <at> linkov.net>
Cc: 69968 <at> debbugs.gnu.org
Subject: Re: bug#69968: Case-folding of Mathematical Alphanumeric Symbols
Date: Sun, 24 Mar 2024 19:45:14 +0200

> From: Juri Linkov <juri <at> linkov.net>
> Cc: 69968 <at> debbugs.gnu.org
> Date: Sun, 24 Mar 2024 19:09:10 +0200
> 
> >> I wonder why case-folding is not supported for letters from
> >> the Unicode block "Mathematical Alphanumeric Symbols":
> >> https://en.wikipedia.org/wiki/Mathematical_Alphanumeric_Symbols
> >
> > These are not letters, they are symbols.  And letter-case is not
> > defined for symbols.
> 
> 𝘋𝘰 𝘺𝘰𝘶 𝘳𝘦𝘢𝘭𝘭𝘺 𝘵𝘩𝘪𝘯𝘬 𝘵𝘩𝘪𝘴 𝘵𝘦𝘹𝘵 𝘪𝘴 𝘯𝘰𝘵 𝘸𝘳𝘪𝘵𝘵𝘦𝘯 𝘸𝘪𝘵𝘩 𝙡𝙚𝙩𝙩𝙚𝙧𝙨?

What does that prove?  The fact that the glyphs look like normal
letters doesn't mean they are.  For example, ℵ and ℶ are not the Hebrew
letters they look like (and they have left-to-right directionality).
Similarly, 𞸀, 𞸁 and other mathematical symbols in that block aren't
Arabic letters, and in particular don't shape like Arabic letters.

> >> Case-folding is already supported for some characters from other
> >> Unicode blocks such e.g. FULLWIDTH LATIN CAPITAL LETTERs,
> >> CIRCLED LATIN CAPITAL LETTERs, etc.
> >
> > That's because UnicodeData.txt defines their letter-case conversions.
> 
> Ok, then it's very strange that the Unicode standard doesn't define
> letter-case conversions for other letters.  But what can we do.

We can define case-conversions for them if we decide to do so.
Moreover, Lisp programs which for some reason need that can do that
themselves, even if by default there are no case-conversions defined
for them.  The question is when and why is this needed?

> >> But e.g. PARENTHESIZED LATIN CAPITAL LETTERs are missing too.
> >> What is worse is that in Emacs ⒜ doesn't have even a word syntax
> >> like its counterpart 🄐.
> >
> > I think the fact that 🄐 has the word syntax might be a mistake.  These
> > are both symbols, so why would we want them to have the word syntax?
> 
> Because they look like letters with diacritics.

Not sure I agree.

Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#69968; Package emacs. (Mon, 25 Mar 2024 07:49:03 GMT)

Message #17 received at 69968 <at> debbugs.gnu.org (full text, mbox):

From: Juri Linkov <juri <at> linkov.net>
To: Eli Zaretskii <eliz <at> gnu.org>
Cc: 69968 <at> debbugs.gnu.org
Subject: Re: bug#69968: Case-folding of Mathematical Alphanumeric Symbols
Date: Mon, 25 Mar 2024 09:37:10 +0200

>> >> I wonder why case-folding is not supported for letters from
>> >> the Unicode block "Mathematical Alphanumeric Symbols":
>> >> https://en.wikipedia.org/wiki/Mathematical_Alphanumeric_Symbols
>> >
>> > These are not letters, they are symbols.  And letter-case is not
>> > defined for symbols.
>>
>> 𝘋𝘰 𝘺𝘰𝘶 𝘳𝘦𝘢𝘭𝘭𝘺 𝘵𝘩𝘪𝘯𝘬 𝘵𝘩𝘪𝘴 𝘵𝘦𝘹𝘵 𝘪𝘴 𝘯𝘰𝘵 𝘸𝘳𝘪𝘵𝘵𝘦𝘯 𝘸𝘪𝘵𝘩 𝙡𝙚𝙩𝙩𝙚𝙧𝙨?
>
> What does that prove?  The fact that the glyphs look like normal
> letters doesn't mean they are.  Like ℵ and ℶ are not Hebrew letters
> they look like (and have left-to-right directionality).  And similarly
> with 𞸀, 𞸁 and other mathematical symbols in that block aren't Arabic
> letters, and in particular don't shape like Arabic letters.

I agree that these characters were intended to be used only
as mathematical symbols.  The problem is that often these symbols
are abused as letters to apply more styles in applications that
don't support styles.  There are special sites such as
https://www.textconverter.net/
that convert ASCII text to styled Unicode characters.

I don't use such sites, but I once tried copying such text to Emacs
and discovered that Isearch already nicely supports searching
for these characters via char-fold.  So it was a surprise that,
unlike char-fold, case-fold doesn't ignore case for them
while searching.
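
The difference can be probed outside Isearch with
`char-fold-to-regexp'.  A minimal sketch; the helper name
`my-char-fold-match-p' is hypothetical, and `case-fold-search' is
bound to nil so that only character folding is in effect.

```elisp
(require 'char-fold)

(defun my-char-fold-match-p (pattern text)
  "Return non-nil if PATTERN matches somewhere in TEXT under character folding."
  (let ((case-fold-search nil))  ; keep case-folding out of the picture
    (string-match-p (char-fold-to-regexp pattern) text)))

;; Does ASCII "a" find MATHEMATICAL BOLD SMALL A (U+1D41A)?
(my-char-fold-match-p "a" (string #x1D41A))

;; And MATHEMATICAL BOLD CAPITAL A (U+1D400)?  Here case would matter too.
(my-char-fold-match-p "a" (string #x1D400))
```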

>> >> Case-folding is already supported for some characters from other
>> >> Unicode blocks such e.g. FULLWIDTH LATIN CAPITAL LETTERs,
>> >> CIRCLED LATIN CAPITAL LETTERs, etc.
>> >
>> > That's because UnicodeData.txt defines their letter-case conversions.
>>
>> Ok, then it's very strange that the Unicode standard doesn't define
>> letter-case conversions for other letters.  But what can we do.
>
> We can define case-conversions for them if we decide to do so.
> Moreover, Lisp programs which for some reason need that can do that
> themselves, even if by default there are no case-conversions defined
> for them.  The question is when and why is this needed?

Probably case-conversions for them could be added later only
when there is more support for such symbols in Emacs:
for example, after creating an input method to input them,
or better a command that will convert the region of ASCII chars,
etc.

Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#69968; Package emacs. (Mon, 25 Mar 2024 16:00:04 GMT)

Message #20 received at 69968 <at> debbugs.gnu.org (full text, mbox):

From: Eli Zaretskii <eliz <at> gnu.org>
To: Juri Linkov <juri <at> linkov.net>
Cc: 69968 <at> debbugs.gnu.org
Subject: Re: bug#69968: Case-folding of Mathematical Alphanumeric Symbols
Date: Mon, 25 Mar 2024 14:37:49 +0200

> From: Juri Linkov <juri <at> linkov.net>
> Cc: 69968 <at> debbugs.gnu.org
> Date: Mon, 25 Mar 2024 09:37:10 +0200
> 
> >> Ok, then it's very strange that the Unicode standard doesn't define
> >> letter-case conversions for other letters.  But what can we do.
> >
> > We can define case-conversions for them if we decide to do so.
> > Moreover, Lisp programs which for some reason need that can do that
> > themselves, even if by default there are no case-conversions defined
> > for them.  The question is when and why is this needed?
> 
> Probably case-conversions for them could be added later only
> when there is more support for such symbols in Emacs:
> for example, after creating an input method to input them,
> or better a command that will convert the region of ASCII chars,
> etc.

I agree that case-conversions for these characters would make more
sense as part of a larger package which would allow using these
characters as letters.  In any case, making a lower-case character L
and upper-case character U a case-pair is simple:

  (let ((tbl (standard-case-table)))
    (set-case-syntax-pair U L tbl))

The above makes the change global, but it can also be made
buffer-local; see "Case Tables" in the ELisp manual for more
details.
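
A buffer-local variant might look like the following.  This is a
minimal sketch, assuming the bold Latin letters of the Mathematical
Alphanumeric Symbols block (U+1D400..U+1D419 for A-Z and
U+1D41A..U+1D433 for a-z); the function names are hypothetical.  Note
that `set-case-syntax-pair' also gives both characters word syntax in
the standard syntax table, so that part of the change is still global.

```elisp
(defun my-math-bold-case-table ()
  "Return a copy of the standard case table that also pairs math bold A-Z/a-z."
  (let ((tbl (copy-case-table (standard-case-table))))
    (dotimes (i 26)
      ;; U is a MATHEMATICAL BOLD CAPITAL letter, L the matching SMALL one.
      (set-case-syntax-pair (+ #x1D400 i) (+ #x1D41A i) tbl))
    tbl))

(defun my-use-math-bold-case-table ()
  "Install the extended case table in the current buffer only."
  (interactive)
  (set-case-table (my-math-bold-case-table)))
```

After calling `my-use-math-bold-case-table' in a buffer, `upcase',
`downcase' and case-insensitive search in that buffer should treat
the bold pairs like ordinary letters, while other buffers keep the
standard case table.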

I guess we can now close this bug?  Or is there anything else to do
here?

Reply sent to Juri Linkov <juri <at> linkov.net>:
You have taken responsibility. (Mon, 25 Mar 2024 17:22:02 GMT)

Notification sent to Juri Linkov <juri <at> linkov.net>:
bug acknowledged by developer. (Mon, 25 Mar 2024 17:22:02 GMT)

Message #25 received at 69968-done <at> debbugs.gnu.org (full text, mbox):

From: Juri Linkov <juri <at> linkov.net>
To: Eli Zaretskii <eliz <at> gnu.org>
Cc: 69968-done <at> debbugs.gnu.org
Subject: Re: bug#69968: Case-folding of Mathematical Alphanumeric Symbols
Date: Mon, 25 Mar 2024 19:18:37 +0200

> I agree that case-conversions for these characters would make more
> sense as part of a larger package which would allow using these
> characters as letters.  In any case, making a lower-case character L
> and upper-case character U a case-pair is simple:
>
>   (let ((tbl (standard-case-table)))
>     (set-case-syntax-pair U L tbl))
>
> The above makes the change global, but it can also be made
> buffer-locally; see "Case Tables" in the ELisp manual for more
> details.
>
> I guess we can now close this bug?  Or is there anything else to do
> here?

Thanks for the explanations, so I'm closing this now.

bug archived. Request was from Debbugs Internal Request <help-debbugs <at> gnu.org> to internal_control <at> debbugs.gnu.org. (Tue, 23 Apr 2024 11:25:15 GMT)

GNU bug tracking system
Copyright (C) 1999 Darren O. Benham, 1997,2003 nCipher Corporation Ltd, 1994-97 Ian Jackson.