GNU bug report logs - #13936
Default to UTF-8 for most Emacs source files


Package: emacs

Reported by: Paul Eggert <eggert <at> cs.ucla.edu>

Date: Tue, 12 Mar 2013 21:23:01 UTC

Severity: wishlist

Tags: patch

Done: Paul Eggert <eggert <at> cs.ucla.edu>

Bug is archived. No further changes may be made.



Message #38 received at 13936 <at> debbugs.gnu.org (full text, mbox):

From: Paul Eggert <eggert <at> cs.ucla.edu>
To: Kenichi Handa <handa <at> gnu.org>
Cc: 13936 <at> debbugs.gnu.org
Subject: Re: bug#13936: Default to UTF-8 for most Emacs source files
Date: Wed, 20 Mar 2013 09:43:38 -0700

On 03/20/13 01:18, Kenichi Handa wrote:
> Among CJK files, I think K(orean) files can be in UTF-8
> without problem.

It's easy enough to convert the K files to UTF-8 too, and I'll propose
a patch to do that in a follow-up email.

> Are there any people familiar with Korean situation?

Sorry, I don't know.

For what it's worth, when I use Emacs to convert TUTORIAL.ko to
UTF-8 and back, the result is identical to the original, so no
information is lost by making that change.  (This is not true
for TUTORIAL.ja.)
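
The round-trip check described above can be approximated outside Emacs.
The following is a minimal sketch in Python, not the procedure used in
this message: it assumes Python's euc_kr codec as a stand-in for the
file's actual Emacs coding system, and the helper name round_trips is
hypothetical.

```python
def round_trips(data: bytes, encoding: str) -> bool:
    """Return True if converting `data` from `encoding` to UTF-8 and
    back reproduces the original bytes exactly."""
    try:
        # Legacy encoding -> UTF-8 bytes.
        utf8 = data.decode(encoding).encode("utf-8")
        # UTF-8 bytes -> legacy encoding again.
        back = utf8.decode("utf-8").encode(encoding)
    except UnicodeError:
        # Undecodable or unencodable input cannot round-trip.
        return False
    return back == data


# Assumption: a short Korean sample stands in for a real file's contents.
sample = "한글".encode("euc_kr")
print(round_trips(sample, "euc_kr"))
```

A file read in binary mode can be passed in the same way. A False
result means that the conversion would not reproduce the original
bytes, which is the situation reported for TUTORIAL.ja.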

I have another question.  Shouldn't it be OK to convert Elisp source
files such as leim/quail/japanese.el to UTF-8 as well?  Emacs
internally converts their text to UTF-8 while compiling them, so the
corresponding .elc files are in UTF-8 already, and there should be no
functional difference if we convert the .el files to UTF-8.

Converting these files to UTF-8 would fix an inconsistency in Emacs
behavior.  For example, if I visit the file leim/quail/japanese.el I see
this definition:

  (defvar quail-japanese-use-double-n nil
    "If non-nil, use type \"nn\" to insert ん.")

where the character 'ん' is displayed using code point 0x2473 in
charset japanese-jisx0208.  But if I *use* the above definition string,
by typing "C-h v quail-japanese-use-double-n RET", the help string
that I see has been translated to UTF-8, so Emacs displays that
character using code point 0x3093 in charset unicode instead.  It
would be better if the runtime behavior matched the source code, and
an easy way to do that would be to convert the source code to UTF-8.
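
The two code points mentioned above can be cross-checked outside Emacs.
This is a rough sketch in Python, assuming that the euc_jp codec encodes
a JIS X 0208 character as its two JIS bytes with the high bit set; the
helper name jisx0208_code_point is hypothetical and is not an Emacs
function.

```python
def jisx0208_code_point(ch: str) -> int:
    """Return the JIS X 0208 code point of `ch`, derived from EUC-JP."""
    raw = ch.encode("euc_jp")
    # A JIS X 0208 character is two bytes in EUC-JP, both >= 0xA1.
    if len(raw) != 2 or raw[0] < 0xA1 or raw[1] < 0xA1:
        raise ValueError(f"{ch!r} is not a JIS X 0208 character")
    # Clear the high bit of each byte to recover the 7-bit JIS code.
    return ((raw[0] - 0x80) << 8) | (raw[1] - 0x80)


ch = "ん"
print(hex(jisx0208_code_point(ch)))  # 0x2473, charset japanese-jisx0208
print(hex(ord(ch)))                  # 0x3093, charset unicode
```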

Here is the list of the remaining .el files that I'd like to convert
to UTF-8:

	leim/quail/cyril-jis.el
	leim/quail/hanja-jis.el
	leim/quail/japanese.el
	leim/quail/py-punct.el
	leim/quail/pypunct-b5.el
	lisp/international/ja-dic-cnv.el
	lisp/international/ja-dic-utl.el
	lisp/international/kinsoku.el
	lisp/international/kkc.el
	lisp/international/titdic-cnv.el
	lisp/language/japan-util.el
	lisp/language/japanese.el
	lisp/term/x-win.el

x-win.el is a special case, since it has two "Kana: Fixme:" lines
talking about problems when converting to UTF-8 -- evidently these are
issues in our current setup anyway since Emacs converts the text to UTF-8
before compiling it.




This bug report was last modified 12 years and 101 days ago.
