GNU bug report logs - #6866
mule-cmds.el just _assumes_ all of Taiwan uses Big5 and not UTF-8


Package: emacs

Reported by: jidanni <at> jidanni.org

Date: Mon, 16 Aug 2010 12:03:02 UTC

Severity: normal

Done: Eli Zaretskii <eliz <at> gnu.org>

Bug is archived. No further changes may be made.

To add a comment to this bug, you must first unarchive it, by sending
a message to control AT debbugs.gnu.org, with unarchive 6866 in the body.
You can then email your comments to 6866 AT debbugs.gnu.org in the normal way.


Report forwarded to owner <at> debbugs.gnu.org, bug-gnu-emacs <at> gnu.org:
bug#6866; Package emacs. (Mon, 16 Aug 2010 12:03:02 GMT)

Acknowledgement sent to jidanni <at> jidanni.org:
New bug report received and forwarded. Copy sent to bug-gnu-emacs <at> gnu.org. (Mon, 16 Aug 2010 12:03:02 GMT)

Message #5 received at submit <at> debbugs.gnu.org:

From: jidanni <at> jidanni.org
To: bug-gnu-emacs <at> gnu.org
Subject: mule-cmds.el just _assumes_ all of Taiwan uses Big5 and not UTF-8
Date: Mon, 16 Aug 2010 19:09:48 +0800
I demand an explanation.
$ zgrep TW mule-cmds.el
    ("zh_TW" . "Chinese-Big5")
You guys just *assume* that all TW people still use Big5.
One can do LC_ALL=zh_TW.UTF-8 until he is blue in the face, but still
  current-language-environment is a variable defined in `mule-cmds.el'.
  Its value is "Chinese-BIG5"
emacs-version "24.0.50.1"




Information forwarded to owner <at> debbugs.gnu.org, bug-gnu-emacs <at> gnu.org:
bug#6866; Package emacs. (Mon, 16 Aug 2010 12:22:02 GMT)

Message #8 received at 6866 <at> debbugs.gnu.org:

From: Eli Zaretskii <eliz <at> gnu.org>
To: jidanni <at> jidanni.org
Cc: 6866 <at> debbugs.gnu.org
Subject: Re: bug#6866: mule-cmds.el just _assumes_ all of Taiwan uses Big5 and not UTF-8
Date: Mon, 16 Aug 2010 08:22:31 -0400
> From: jidanni <at> jidanni.org
> Date: Mon, 16 Aug 2010 19:09:48 +0800
> Cc: 
> 
> I demand an explanation.
> $ zgrep TW mule-cmds.el
>     ("zh_TW" . "Chinese-Big5")
> You guys just *assume* that all TW people still use Big5.

That's because they do.  Case closed.

> One can do LC_ALL=zh_TW.UTF-8 until he is blue in the face

How dare you??!!!




Information forwarded to owner <at> debbugs.gnu.org, bug-gnu-emacs <at> gnu.org:
bug#6866; Package emacs. (Mon, 16 Aug 2010 12:24:02 GMT)

Message #11 received at 6866 <at> debbugs.gnu.org:

From: Jason Rumney <jasonr <at> gnu.org>
To: jidanni <at> jidanni.org
Cc: 6866 <at> debbugs.gnu.org
Subject: Re: bug#6866: mule-cmds.el just _assumes_ all of Taiwan uses Big5 and not UTF-8
Date: Mon, 16 Aug 2010 20:23:58 +0800
 On 16/8/2010 7:09 PM, jidanni <at> jidanni.org wrote:
> I demand an explanation.
> $ zgrep TW mule-cmds.el
>      ("zh_TW" . "Chinese-Big5")
> You guys just *assume* that all TW people still use Big5.
> One can do LC_ALL=zh_TW.UTF-8 until he is blue in the face, but still
>    current-language-environment is a variable defined in `mule-cmds.el'.
>    Its value is "Chinese-BIG5"
> emacs-version "24.0.50.1"

Please explain what bug you think this caused.






Information forwarded to owner <at> debbugs.gnu.org, bug-gnu-emacs <at> gnu.org:
bug#6866; Package emacs. (Mon, 16 Aug 2010 12:58:01 GMT)

Message #14 received at 6866 <at> debbugs.gnu.org:

From: jidanni <at> jidanni.org
To: jasonr <at> gnu.org
Cc: 6866 <at> debbugs.gnu.org
Subject: Re: bug#6866: mule-cmds.el just _assumes_ all of Taiwan uses Big5 and not UTF-8
Date: Mon, 16 Aug 2010 20:58:43 +0800
>>>>> "JR" == Jason Rumney <jasonr <at> gnu.org> writes:
JR> Please explain what bug you think this caused.
http://news.gmane.org/group/gmane.emacs.w3m/thread=8661





Information forwarded to owner <at> debbugs.gnu.org, bug-gnu-emacs <at> gnu.org:
bug#6866; Package emacs. (Mon, 16 Aug 2010 13:17:01 GMT)

Message #17 received at 6866 <at> debbugs.gnu.org:

From: Jason Rumney <jasonr <at> gnu.org>
To: jidanni <at> jidanni.org
Cc: 6866 <at> debbugs.gnu.org
Subject: Re: bug#6866: mule-cmds.el just _assumes_ all of Taiwan uses Big5 and not UTF-8
Date: Mon, 16 Aug 2010 21:17:39 +0800
 On 16/8/2010 8:58 PM, jidanni <at> jidanni.org wrote:
>>>>>> "JR" == Jason Rumney<jasonr <at> gnu.org>  writes:
> JR>  Please explain what bug you think this caused.
> http://news.gmane.org/group/gmane.emacs.w3m/thread=8661

It appears from that thread that there is a bug in w3m. It is not 
apparent that it is related to your report here though, as the expected 
behavior is that a Japanese search engine should only be chosen if the 
current-language matches "Japanese", which this clearly does not.





Information forwarded to owner <at> debbugs.gnu.org, bug-gnu-emacs <at> gnu.org:
bug#6866; Package emacs. (Mon, 16 Aug 2010 13:39:02 GMT)

Message #20 received at 6866 <at> debbugs.gnu.org:

From: jidanni <at> jidanni.org
To: jasonr <at> gnu.org
Cc: 6866 <at> debbugs.gnu.org
Subject: Re: bug#6866: mule-cmds.el just _assumes_ all of Taiwan uses Big5 and not UTF-8
Date: Mon, 16 Aug 2010 21:39:10 +0800
Well anyway, for our locale me and my friends all use zh_TW.UTF-8 and
stopped using zh_TW.big5 years ago. So at least it looks very dumb there
in mule-cmds.el that the zh_CN people can use UTF-8, but the HK and TW
are locked in the dark ages:

    ("zh_HK" . "Chinese-Big5")
    ("zh_TW" . "Chinese-Big5")
    ("zh_CN.UTF-8" . "Chinese-GBK")
    ("zh_CN" . "Chinese-GB")

The only big5 thing I apparently sometimes still use is
$ GET http://jidanni.org/comp/configuration/.emacs | grep -i b5
    (setq default-input-method 'chinese-py-punct-b5))));no 'utf' ones




Information forwarded to owner <at> debbugs.gnu.org, bug-gnu-emacs <at> gnu.org:
bug#6866; Package emacs. (Mon, 16 Aug 2010 14:08:01 GMT)

Message #23 received at 6866 <at> debbugs.gnu.org:

From: Jason Rumney <jasonr <at> gnu.org>
To: jidanni <at> jidanni.org
Cc: 6866 <at> debbugs.gnu.org
Subject: Re: bug#6866: mule-cmds.el just _assumes_ all of Taiwan uses Big5 and not UTF-8
Date: Mon, 16 Aug 2010 22:07:52 +0800
 On 16/8/2010 9:39 PM, jidanni <at> jidanni.org wrote:
> Well anyway, for our locale me and my friends all use zh_TW.UTF-8 and
> stopped using zh_TW.big5 years ago. So at least it looks very dumb there
> in mule-cmds.el that the zh_CN people can use UTF-8, but the HK and TW
> are locked in the dark ages:
>
>      ("zh_HK" . "Chinese-Big5")
>      ("zh_TW" . "Chinese-Big5")
>      ("zh_CN.UTF-8" . "Chinese-GBK")
>      ("zh_CN" . "Chinese-GB")

GBK is a backwards compatible extension of GB with more characters.  I'm 
not sure that Big5 has an equivalent.  In all these cases, the character 
set is used to select preferences for fonts, input methods and other 
language sensitive things, and has nothing to do with UTF-8 (which is 
used as a preference for file encoding when specified).
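
As a rough illustration of the "backwards compatible extension" point, the following Python sketch compares GB2312 and GBK. It relies on the codecs bundled with CPython, which is an assumption made for the example; Emacs uses its own charset tables. The helper name `encodable` and the two sample characters are chosen purely for illustration.

```python
def encodable(text: str, codec: str) -> bool:
    """Return True if every character of text can be encoded in codec."""
    try:
        text.encode(codec)
        return True
    except UnicodeEncodeError:
        return False


# A character present in both sets gets identical bytes in GB2312 and GBK,
# which is what "backwards compatible" means here.
shared = "中"
same_bytes = shared.encode("gb2312") == shared.encode("gbk")

# A traditional form that GB2312 lacks but GBK adds.
extra = "龍"
only_in_gbk = (not encodable(extra, "gb2312")) and encodable(extra, "gbk")

print(same_bytes, only_in_gbk)
```

Neither codec says anything about UTF-8; both are simply alternative byte representations of the same characters.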





Information forwarded to owner <at> debbugs.gnu.org, bug-gnu-emacs <at> gnu.org:
bug#6866; Package emacs. (Mon, 16 Aug 2010 15:27:02 GMT)

Message #26 received at 6866 <at> debbugs.gnu.org:

From: jidanni <at> jidanni.org
To: jasonr <at> gnu.org
Cc: 6866 <at> debbugs.gnu.org
Subject: Re: bug#6866: mule-cmds.el just _assumes_ all of Taiwan uses Big5 and not UTF-8
Date: Mon, 16 Aug 2010 23:27:51 +0800
>>>>> "JR" == Jason Rumney <jasonr <at> gnu.org> writes:

JR> In all these cases, the character set is used to select preferences
JR> for fonts, input methods and other language sensitive things, and
JR> has nothing to do with UTF-8 (which is used as a preference for file
JR> encoding when specified).

Then it is a sad choice of the name of a character set being used for
other purposes. Many users will say: didn't I make a big effort years
ago to totally convert my environment? Why do I still have traces of
big5 hanging around?

Perhaps there should be a more neutral name used. Since it seems what
you are calling Chinese-Big5 does not have much to do with
http://en.wikipedia.org/wiki/Traditional_Chinese#Computer_encoding
after all.




Information forwarded to owner <at> debbugs.gnu.org, bug-gnu-emacs <at> gnu.org:
bug#6866; Package emacs. (Mon, 16 Aug 2010 16:27:02 GMT)

Message #29 received at 6866 <at> debbugs.gnu.org:

From: Werner LEMBERG <wl <at> gnu.org>
To: jasonr <at> gnu.org
Cc: 6866 <at> debbugs.gnu.org, jidanni <at> jidanni.org
Subject: Re: bug#6866: mule-cmds.el just _assumes_ all of Taiwan uses Big5 and not UTF-8
Date: Mon, 16 Aug 2010 18:26:54 +0200 (CEST)
> GBK is a backwards compatible extension of GB with more characters.
> I'm not sure that Big5 has an equivalent.

Such an extension exists; it is called Big5-plus.  However, AFAIK,
nobody has ever used it, and today it is obsolete since Unicode is
much better.


    Werner




Reply sent to Eli Zaretskii <eliz <at> gnu.org>:
You have taken responsibility. (Mon, 16 Aug 2010 17:40:03 GMT)

Notification sent to jidanni <at> jidanni.org:
bug acknowledged by developer. (Mon, 16 Aug 2010 17:40:03 GMT)

Message #34 received at 6866-done <at> debbugs.gnu.org:

From: Eli Zaretskii <eliz <at> gnu.org>
To: jidanni <at> jidanni.org
Cc: 6866-done <at> debbugs.gnu.org, jasonr <at> gnu.org
Subject: Re: bug#6866: mule-cmds.el just _assumes_ all of Taiwan uses Big5 and not UTF-8
Date: Mon, 16 Aug 2010 20:37:30 +0300
> From: jidanni <at> jidanni.org
> Date: Mon, 16 Aug 2010 21:39:10 +0800
> Cc: 6866 <at> debbugs.gnu.org
> 
> Well anyway, for our locale me and my friends all use zh_TW.UTF-8 and
> stopped using zh_TW.big5 years ago. So at least it looks very dumb there
> in mule-cmds.el that the zh_CN people can use UTF-8, but the HK and TW
> are locked in the dark ages:
> 
>     ("zh_HK" . "Chinese-Big5")
>     ("zh_TW" . "Chinese-Big5")
>     ("zh_CN.UTF-8" . "Chinese-GBK")
>     ("zh_CN" . "Chinese-GB")

Are you sure you understand what this database is used for in Emacs?

The function within mule-cmds.el which uses this data has this
comment:

    ;; locale-language-names specify both lang-env and coding.
    ;; But, what specified in locale-preferred-coding-systems
    ;; has higher priority.

Thus, if you specify UTF-8 as the preferred encoding (e.g., via
LC_ALL), it overrules the Big5 default.
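
The precedence described in that comment can be modelled roughly as follows. This is a simplified Python sketch, not the Emacs implementation: the two tables are hypothetical stand-ins that hold only the entries quoted in this thread, the default coding names attached to them are assumptions, and `resolve` is an illustrative name.

```python
import re

# Hypothetical stand-in for locale-language-names: locale prefix ->
# (language environment, default coding). More specific prefixes come first.
LOCALE_LANGUAGE_NAMES = [
    ("zh_HK", ("Chinese-Big5", "big5")),
    ("zh_TW", ("Chinese-Big5", "big5")),
    ("zh_CN.UTF-8", ("Chinese-GBK", "gbk")),
    ("zh_CN", ("Chinese-GB", "gb2312")),
]

# Hypothetical stand-in for locale-preferred-coding-systems: a pattern on
# the locale name -> coding that overrides the language default.
LOCALE_PREFERRED_CODING_SYSTEMS = [
    (re.compile(r"utf-?8", re.IGNORECASE), "utf-8"),
]


def resolve(locale: str):
    """Return (language environment, coding) for a locale name."""
    lang_env, coding = None, None
    for prefix, (env, default_coding) in LOCALE_LANGUAGE_NAMES:
        if locale.lower().startswith(prefix.lower()):
            lang_env, coding = env, default_coding
            break
    # The preferred-coding table has higher priority than the default.
    for pattern, preferred in LOCALE_PREFERRED_CODING_SYSTEMS:
        if pattern.search(locale):
            coding = preferred
            break
    return lang_env, coding


print(resolve("zh_TW"))
print(resolve("zh_TW.UTF-8"))
```

In this model the language environment for `zh_TW.UTF-8` stays "Chinese-Big5" while the coding becomes UTF-8, which matches the behaviour described above: the name picks language-related defaults, and the locale's codeset decides the encoding.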

> The only big5 thing I apparently sometimes still use is
> $ GET http://jidanni.org/comp/configuration/.emacs | grep -i b5
>     (setq default-input-method 'chinese-py-punct-b5))));no 'utf' ones

You are confused: an input method can produce Big5 characters, but
that won't prevent Emacs from encoding them in UTF-8 if that's your
preference.

I'm closing this bug.
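
The distinction between the characters an input method produces and the bytes a file ends up with can be shown with a small Python sketch. It uses CPython's bundled `big5` and `utf-8` codecs as an assumption for the example; `transcode` is an illustrative helper, not anything from Emacs.

```python
def transcode(data: bytes, source: str, target: str) -> bytes:
    """Decode bytes from one encoding and re-encode them in another."""
    return data.decode(source).encode(target)


text = "中文"

# The same two characters, written out under two different encodings.
big5_bytes = text.encode("big5")
utf8_bytes = transcode(big5_bytes, "big5", "utf-8")

# The byte sequences differ, but both decode to the same characters.
print(big5_bytes.hex(), utf8_bytes.hex())
print(big5_bytes.decode("big5") == utf8_bytes.decode("utf-8"))
```

Characters that happen to belong to the Big5 repertoire are still ordinary characters, so nothing stops them from being saved as UTF-8.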




Information forwarded to owner <at> debbugs.gnu.org, bug-gnu-emacs <at> gnu.org:
bug#6866; Package emacs. (Mon, 16 Aug 2010 17:44:02 GMT)

Message #37 received at 6866 <at> debbugs.gnu.org:

From: Eli Zaretskii <eliz <at> gnu.org>
To: jidanni <at> jidanni.org
Cc: 6866 <at> debbugs.gnu.org, jasonr <at> gnu.org
Subject: Re: bug#6866: mule-cmds.el just _assumes_ all of Taiwan uses Big5 and not UTF-8
Date: Mon, 16 Aug 2010 20:41:44 +0300
> From: jidanni <at> jidanni.org
> Date: Mon, 16 Aug 2010 23:27:51 +0800
> Cc: 6866 <at> debbugs.gnu.org
> 
> Many users will say: didn't I make a big effort years ago to totally
> convert my environment? Why do I still have traces of big5 hanging
> around?

Users should not look into the code unless they actually read it (as
opposed to grepping it for some random string) and understand what the
code does.

> Perhaps there should be a more neutral name used. Since it seems what
> you are calling Chinese-Big5 does not have much to do with
> http://en.wikipedia.org/wiki/Traditional_Chinese#Computer_encoding
> after all.

It _is_ a name of an encoding, just the Emacs name.  It just isn't
used in the way you thought.




bug archived. Request was from Debbugs Internal Request <help-debbugs <at> gnu.org> to internal_control <at> debbugs.gnu.org. (Tue, 14 Sep 2010 11:24:03 GMT)

This bug report was last modified 14 years and 342 days ago.


