GNU bug report logs - #39659
27.0.60; inappropriate han script definition in char-script-table


Package: emacs

Reported by: ynyaaa <at> gmail.com

Date: Tue, 18 Feb 2020 13:52:01 UTC

Severity: normal

Found in version 27.0.60


From: ynyaaa <at> gmail.com
To: Eli Zaretskii <eliz <at> gnu.org>
Cc: Kenichi Handa <handa <at> gnu.org>, 39659 <at> debbugs.gnu.org
Subject: bug#39659: 27.0.60; inappropriate han script definition in char-script-table
Date: Wed, 19 Feb 2020 18:53:07 +0900
Eli Zaretskii <eliz <at> gnu.org> writes:

>> From: ynyaaa <at> gmail.com
>> Date: Tue, 18 Feb 2020 22:50:57 +0900
>> 
>> 'han' script is defined in char-script-table as:
>> 	2E80-2FDF	han
>> 	3200-9FFF	han
>> 	F900-FAFF	han
>> 	FE30-FE4F	han
>> 	1F200-1F2FF	han
>> 	20000-2A6DF	han
>> 	2A700-2EBEF	han
>> 	2F800-2FA1F	han
>> 
>> It is better to set values as:
>> 	3200-33FF	cjk-misc
>> 	4DC0-4DFF	cjk-misc
>> 	FE30-FE4F	cjk-misc
>> 	1F200-1F2FF	cjk-misc
>> 
>> If enclosed CJK Ideographs should be 'han' script,
>> enclosed Hanguls should be 'hangul' script,
>> enclosed Katakana should be 'kana' script,
>> and enclosed Numbers should be 'symbol' script.
>
> Please provide some rationale for the differences, just saying
> "better" and "should" doesn't explain why you think the changes are
> for the good.
>
> CC'ing Handa-san, who I hope will have some comments on this.
>
> Thanks.

Because they are not Han characters.
I think that composite characters (enclosed or squared forms) are not
Han characters; they are symbols.

For comparison, enclosed Latin letters are already treated as 'symbol' script:
	249C-24B5	PARENTHESIZED LATIN SMALL LETTER *
	24B6-24CF	CIRCLED LATIN CAPITAL LETTER *
	24D0-24E9	CIRCLED LATIN SMALL LETTER *
	1F110-1F129	PARENTHESIZED LATIN CAPITAL LETTER *
	1F130-1F149	SQUARED LATIN CAPITAL LETTER *
	1F150-1F169	NEGATIVE CIRCLED LATIN CAPITAL LETTER *
	1F170-1F189	NEGATIVE SQUARED LATIN CAPITAL LETTER *
	1F12A		TORTOISE SHELL BRACKETED LATIN CAPITAL LETTER S
	1F12B		CIRCLED ITALIC LATIN CAPITAL LETTER C
	1F12C		CIRCLED ITALIC LATIN CAPITAL LETTER R
	1F18A		CROSSED NEGATIVE SQUARED LATIN CAPITAL LETTER P
	1F1A5		SQUARED LATIN SMALL LETTER D
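A minimal sketch for cross-checking the ranges listed above against the Unicode character names, written in Python rather than Emacs Lisp. It only confirms that each code point's name carries the stated prefix; it says nothing about what Emacs stores in char-script-table. The names ENCLOSED_LATIN_RANGES and mismatches are hypothetical, chosen for this illustration.

```python
import unicodedata

# (first, last, expected name prefix), copied from the ranges listed above.
ENCLOSED_LATIN_RANGES = [
    (0x249C, 0x24B5, "PARENTHESIZED LATIN SMALL LETTER "),
    (0x24B6, 0x24CF, "CIRCLED LATIN CAPITAL LETTER "),
    (0x24D0, 0x24E9, "CIRCLED LATIN SMALL LETTER "),
    (0x1F110, 0x1F129, "PARENTHESIZED LATIN CAPITAL LETTER "),
    (0x1F130, 0x1F149, "SQUARED LATIN CAPITAL LETTER "),
    (0x1F150, 0x1F169, "NEGATIVE CIRCLED LATIN CAPITAL LETTER "),
    (0x1F170, 0x1F189, "NEGATIVE SQUARED LATIN CAPITAL LETTER "),
]


def mismatches(ranges):
    """Return (code point, name) pairs whose name lacks the expected prefix."""
    bad = []
    for first, last, prefix in ranges:
        for cp in range(first, last + 1):
            # An unassigned code point yields "" and is reported as a mismatch.
            name = unicodedata.name(chr(cp), "")
            if not name.startswith(prefix):
                bad.append((cp, name))
    return bad


if __name__ == "__main__":
    for cp, name in mismatches(ENCLOSED_LATIN_RANGES):
        print(f"{cp:04X}\t{name or '(unassigned)'}")
```

An empty result means every code point in the listed ranges has the expected name in the Unicode database bundled with the Python interpreter in use.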

If the script were instead set to han, hangul, or kana for composite characters
that contain han, hangul, or kana characters, the script values would be as follows:

CodePoint	Script	Comment
3200-321E	hangul	enclosed hangul
321F		-	unassigned
3220-3247	han	enclosed han
3248-324F	symbol	enclosed number
3250		symbol	combined latin
3251-325F	symbol	enclosed number
3260-327E	hangul	enclosed hangul
327F		symbol	symbol
3280-32B0	han	enclosed han
32B1-32BF	symbol	enclosed number
32C0-32CB	han	square character with han
32CC-32CF	symbol	square character with latin
32D0-32FE	kana	enclosed kana
32FF		han	square character with han
3300-3357	kana	square character with kana
3358-3370	han	square character with han
3371-337A	symbol	square character with latin
337B-337F	han	square character with han
3380-33DF	symbol	square character with latin
33E0-33FE	han	square character with han
33FF		symbol	square character with latin

4DC0-4DFF	symbol	symbol

FE30-FE44	symbol	symbol for vertical
FE45-FE46	symbol	symbol
FE47-FE48	symbol	symbol for vertical
FE49-FE4F	symbol	symbol

1F200-1F202	kana	enclosed/square character with kana
...		-	unassigned
1F210-1F212	han	enclosed han
1F213		kana	enclosed kana
1F214-1F248	han	enclosed han
...		-	unassigned
1F250-1F251	han	enclosed han
...		-	unassigned
1F260-1F265	symbol	symbol
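
The 3200-33FF rows of the table above can be read as a sorted range table. The following is a rough Python sketch of such a lookup, not Emacs code: it does not model the real char-script-table (a char-table inside Emacs), and the names PROPOSED and proposed_script are hypothetical. The rows are transcribed from the table as given, and 321F is left out because it is unassigned.

```python
from bisect import bisect_right

# (first, last, script) rows transcribed from the 3200-33FF part of the table.
PROPOSED = [
    (0x3200, 0x321E, "hangul"),  # enclosed hangul
    (0x3220, 0x3247, "han"),     # enclosed han
    (0x3248, 0x324F, "symbol"),  # enclosed number
    (0x3250, 0x3250, "symbol"),  # combined latin
    (0x3251, 0x325F, "symbol"),  # enclosed number
    (0x3260, 0x327E, "hangul"),  # enclosed hangul
    (0x327F, 0x327F, "symbol"),  # symbol
    (0x3280, 0x32B0, "han"),     # enclosed han
    (0x32B1, 0x32BF, "symbol"),  # enclosed number
    (0x32C0, 0x32CB, "han"),     # square character with han
    (0x32CC, 0x32CF, "symbol"),  # square character with latin
    (0x32D0, 0x32FE, "kana"),    # enclosed kana
    (0x32FF, 0x32FF, "han"),     # square character with han
    (0x3300, 0x3357, "kana"),    # square character with kana
    (0x3358, 0x3370, "han"),     # square character with han
    (0x3371, 0x337A, "symbol"),  # square character with latin
    (0x337B, 0x337F, "han"),     # square character with han
    (0x3380, 0x33DF, "symbol"),  # square character with latin
    (0x33E0, 0x33FE, "han"),     # square character with han
    (0x33FF, 0x33FF, "symbol"),  # square character with latin
]

# Range starts, kept in ascending order so that bisect can be used.
_STARTS = [first for first, _, _ in PROPOSED]


def proposed_script(cp):
    """Return the script proposed for code point cp, or None if not covered."""
    i = bisect_right(_STARTS, cp) - 1
    if i < 0:
        return None
    _, last, script = PROPOSED[i]
    return script if cp <= last else None


if __name__ == "__main__":
    for cp in (0x3200, 0x321F, 0x3231, 0x32D0, 0x33FF):
        print(f"{cp:04X}\t{proposed_script(cp)}")
```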




This bug report was last modified 5 years and 162 days ago.
