GNU bug report logs - #8308
23.3; Use utf-8 for writing abbrev file
Reported by: Leo <sdl.web <at> gmail.com>
Date: Mon, 21 Mar 2011 06:23:01 UTC
Severity: minor
Found in version 23.3
Fixed in version 24.1.
Done: Leo <sdl.web <at> gmail.com>
Bug is archived. No further changes may be made.
To add a comment to this bug, you must first unarchive it, by sending
a message to control AT debbugs.gnu.org, with unarchive 8308 in the body.
You can then email your comments to 8308 AT debbugs.gnu.org in the normal way.
Report forwarded to owner <at> debbugs.gnu.org, monnier <at> iro.umontreal.ca, bug-gnu-emacs <at> gnu.org: bug#8308; Package emacs. (Mon, 21 Mar 2011 06:23:01 GMT)
Acknowledgement sent to Leo <sdl.web <at> gmail.com>: New bug report received and forwarded. Copy sent to monnier <at> iro.umontreal.ca, bug-gnu-emacs <at> gnu.org. (Mon, 21 Mar 2011 06:23:01 GMT)
Message #5 received at submit <at> debbugs.gnu.org (full text, mbox):
Is it OK to change the encoding for abbrev file to utf-8?
=== modified file 'lisp/abbrev.el'
--- a/lisp/abbrev.el 2011-03-21 05:49:12 +0000
+++ b/lisp/abbrev.el 2011-03-21 06:20:36 +0000
@@ -225,9 +225,9 @@
                     abbrev-file-name)))
   (or (and file (> (length file) 0))
       (setq file abbrev-file-name))
-  (let ((coding-system-for-write 'emacs-mule))
+  (let ((coding-system-for-write 'utf-8))
     (with-temp-file file
-      (insert ";;-*-coding: emacs-mule;-*-\n")
+      (insert ";;-*-coding: utf-8;-*-\n")
       (dolist (table
                ;; We sort the table in order to ease the automatic
                ;; merging of different versions of the user's abbrevs
Leo
Information forwarded to owner <at> debbugs.gnu.org, bug-gnu-emacs <at> gnu.org: bug#8308; Package emacs. (Mon, 21 Mar 2011 09:01:02 GMT)
Message #8 received at 8308 <at> debbugs.gnu.org (full text, mbox):
> From: Leo <sdl.web <at> gmail.com>
> Date: Mon, 21 Mar 2011 14:22:24 +0800
> Cc:
>
> Is it OK to change the encoding for abbrev file to utf-8?
What will that do to characters that are not unified into the range of
valid Unicode code points?
Can you tell what is the purpose of this change?
Information forwarded to owner <at> debbugs.gnu.org, bug-gnu-emacs <at> gnu.org: bug#8308; Package emacs. (Mon, 21 Mar 2011 10:02:01 GMT)
Message #11 received at submit <at> debbugs.gnu.org (full text, mbox):
On 2011-03-21 17:00 +0800, Eli Zaretskii wrote:
>> From: Leo <sdl.web <at> gmail.com>
>> Date: Mon, 21 Mar 2011 14:22:24 +0800
>> Cc:
>>
>> Is it OK to change the encoding for abbrev file to utf-8?
>
> What will that do to characters that are not unified into the range of
> valid Unicode code points?
That's a valid concern. But
,----
| M -- emacs-mule
|
| Emacs 21 internal format used in buffer and string.
| Type: emacs-mule (Emacs 21 internal encoding)
| EOL type: Automatic selection from:
| [emacs-mule-unix emacs-mule-dos emacs-mule-mac]
| This coding system can encode all emacs-mule charsets.
|
| [back]
`----
,----[ (info "(elisp)Text Representations") ]
| (1) This internal representation is based on one of the encodings
| defined by the Unicode Standard, called "UTF-8", for representing any
| Unicode codepoint, but Emacs extends UTF-8 to represent the additional
| codepoints it uses for raw 8-bit bytes and characters not unified with
| Unicode.
`----
Would you agree to use utf-8-emacs instead, which covers all characters?
>
> Can you tell what is the purpose of this change?
To make the abbrev file editable in other editors.
Leo
Information forwarded to owner <at> debbugs.gnu.org, bug-gnu-emacs <at> gnu.org: bug#8308; Package emacs. (Mon, 21 Mar 2011 10:55:01 GMT)
Message #14 received at submit <at> debbugs.gnu.org (full text, mbox):
> From: Leo <sdl.web <at> gmail.com>
> Date: Mon, 21 Mar 2011 18:01:17 +0800
> Cc:
>
> Would you agree to use utf-8-emacs instead, which covers all characters?
That's better, but the characters outside Unicode are still going to
do bad things to any software except Emacs. AFAIK, emacs-mule is a
superset of iso-2022 in the same way as utf-8-emacs is a superset of
utf-8.
> > Can you tell what is the purpose of this change?
>
> To make the abbrev file editable in other editors.
If we are really keen on making the abbrev files editable to other
editors, we should make sure they are encoded in some encoding that
these other editors will understand. That probably calls for using
utf-8 for everything that's covered by Unicode, and using other
appropriate encodings for characters outside Unicode.
Information forwarded to owner <at> debbugs.gnu.org, bug-gnu-emacs <at> gnu.org: bug#8308; Package emacs. (Mon, 21 Mar 2011 11:21:01 GMT)
Message #17 received at submit <at> debbugs.gnu.org (full text, mbox):
On 21.03.2011 11:54, Eli Zaretskii wrote:
>> From: Leo <sdl.web <at> gmail.com>
>> Date: Mon, 21 Mar 2011 18:01:17 +0800
>> Cc:
>>
>> Would you agree to use utf-8-emacs instead, which covers all characters?
>
> That's better, but the characters outside Unicode are still going to
> do bad things to any software except Emacs. AFAIK, emacs-mule is a
> superset of iso-2022 in the same way as utf-8-emacs is a superset of
> utf-8.
>
>>> Can you tell what is the purpose of this change?
>>
>> To make the abbrev file editable in other editors.
>
> If we are really keen on making the abbrev files editable to other
> editors, we should make sure they are encoded in some encoding that
> these other editors will understand. That probably calls for using
> utf-8 for everything that's covered by Unicode, and using other
> appropriate encodings for characters outside Unicode.
>
>
>
>
Hi,
this sounds interesting to me, since AFAIU not just other editors are at stake,
but also auto-generated abbrevs produced by programs.
These might be subject-specific, covering terms from medicine, law, etc.
One could offer modes with preloaded abbrevs matching the subject being written about.
Regards,
Andreas
Information forwarded to owner <at> debbugs.gnu.org, bug-gnu-emacs <at> gnu.org: bug#8308; Package emacs. (Mon, 21 Mar 2011 14:51:02 GMT)
Message #20 received at 8308 <at> debbugs.gnu.org (full text, mbox):
> Is it OK to change the encoding for abbrev file to utf-8?
> === modified file 'lisp/abbrev.el'
> --- a/lisp/abbrev.el 2011-03-21 05:49:12 +0000
> +++ b/lisp/abbrev.el 2011-03-21 06:20:36 +0000
> @@ -225,9 +225,9 @@
> abbrev-file-name)))
> (or (and file (> (length file) 0))
> (setq file abbrev-file-name))
> - (let ((coding-system-for-write 'emacs-mule))
> + (let ((coding-system-for-write 'utf-8))
> (with-temp-file file
> - (insert ";;-*-coding: emacs-mule;-*-\n")
> + (insert ";;-*-coding: utf-8;-*-\n")
> (dolist (table
> ;; We sort the table in order to ease the automatic
> ;; merging of different versions of the user's abbrevs
Sounds good in general, but I'm wondering whether we should worry about
the presence of abbrevs which include bytes (aka eight-bit-chars).
Using `utf-8-emacs' should fix those issues, but would then bump into
the problem that such abbrev files wouldn't be compatible with Emacs-22.
Stefan
Information forwarded to owner <at> debbugs.gnu.org, bug-gnu-emacs <at> gnu.org: bug#8308; Package emacs. (Mon, 21 Mar 2011 15:39:01 GMT)
Message #23 received at submit <at> debbugs.gnu.org (full text, mbox):
On 2011-03-21 22:50 +0800, Stefan Monnier wrote:
> Sounds good in general, but I'm wondering whether we should worry about
> the presence of abbrevs which include bytes (aka eight-bit-chars).
> Using `utf-8-emacs' should fix those issues, but would then bump into
> the problem that such abbrev files wouldn't be compatible with Emacs-22.
I think we should just use utf-8-emacs. What do other people think?
Leo
Information forwarded to owner <at> debbugs.gnu.org, bug-gnu-emacs <at> gnu.org: bug#8308; Package emacs. (Mon, 21 Mar 2011 18:19:02 GMT)
Message #26 received at submit <at> debbugs.gnu.org (full text, mbox):
On 21.03.2011 15:50, Stefan Monnier wrote:
>> Is it OK to change the encoding for abbrev file to utf-8?
>> === modified file 'lisp/abbrev.el'
>> --- a/lisp/abbrev.el 2011-03-21 05:49:12 +0000
>> +++ b/lisp/abbrev.el 2011-03-21 06:20:36 +0000
>> @@ -225,9 +225,9 @@
>> abbrev-file-name)))
>> (or (and file (> (length file) 0))
>> (setq file abbrev-file-name))
>> - (let ((coding-system-for-write 'emacs-mule))
>> + (let ((coding-system-for-write 'utf-8))
>> (with-temp-file file
>> - (insert ";;-*-coding: emacs-mule;-*-\n")
>> + (insert ";;-*-coding: utf-8;-*-\n")
>> (dolist (table
>> ;; We sort the table in order to ease the automatic
>> ;; merging of different versions of the user's abbrevs
>
> Sounds good in general, but I'm wondering whether we should worry about
> the presence of abbrevs which include bytes (aka eight-bit-chars).
> Using `utf-8-emacs' should fix those issues, but would then bump into
> the problem that such abbrev files wouldn't be compatible with Emacs-22.
>
>
> Stefan
>
Hi,
so maybe don't hard-code it, but rather use a variable?
Andreas
Information forwarded to owner <at> debbugs.gnu.org, bug-gnu-emacs <at> gnu.org: bug#8308; Package emacs. (Mon, 21 Mar 2011 18:46:02 GMT)
Message #29 received at submit <at> debbugs.gnu.org (full text, mbox):
> From: Leo <sdl.web <at> gmail.com>
> Date: Mon, 21 Mar 2011 23:37:41 +0800
> Cc:
>
> I think we should just use utf-8-emacs.
Why do you think so?
Information forwarded to owner <at> debbugs.gnu.org, bug-gnu-emacs <at> gnu.org: bug#8308; Package emacs. (Mon, 21 Mar 2011 18:54:02 GMT)
Message #32 received at submit <at> debbugs.gnu.org (full text, mbox):
> Date: Mon, 21 Mar 2011 19:24:16 +0100
> From: Andreas Röhler <andreas.roehler <at> easy-emacs.de>
> Cc:
>
> so maybe not hard-code it, rather have a variable?
A constant encoding will never DTRT in all cases.
Information forwarded to owner <at> debbugs.gnu.org, bug-gnu-emacs <at> gnu.org: bug#8308; Package emacs. (Tue, 22 Mar 2011 01:02:02 GMT)
Message #35 received at submit <at> debbugs.gnu.org (full text, mbox):
On 2011-03-22 02:45 +0800, Eli Zaretskii wrote:
>> I think we should just use utf-8-emacs.
>
> Why do you think so?
By the time 24.1 is released, 1-2 years from now, there will be two
major stable releases that work with utf-8-emacs, which should be
backward-compatible enough. But I don't know, so I'll forget about this
bug and let the gurus figure it out.
Leo
Information forwarded to owner <at> debbugs.gnu.org, bug-gnu-emacs <at> gnu.org: bug#8308; Package emacs. (Tue, 22 Mar 2011 02:49:02 GMT)
Message #38 received at submit <at> debbugs.gnu.org (full text, mbox):
>>> I think we should just use utf-8-emacs.
>> Why do you think so?
> By the time 24.1 is released, it will be 1-2 years from now and there
> will be two major stable releases that work with utf-8-emacs, which are
> backward-compatible enough. But I don't know so I'll forget about this
> bug and let the gurus figure it out.
I think it might be OK to do it for Emacs-25, but since Emacs-22 can't
handle utf-8-emacs, I think it's a bit early to switch to it in
Emacs-24. If utf-8 is sufficient, OTOH it's the best choice. So maybe
we should check the buffer first to see if utf-8 is safe, and only fall
back to emacs-mule if utf-8 is not safe.
Stefan
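The check-then-fall-back idea described above could be sketched roughly as follows. This is only an illustration: the helper name `my-abbrev-coding-system-for` is hypothetical and not part of abbrev.el, and it assumes that `unencodable-char-position` is an adequate test for whether utf-8 is "safe" for the text.

```elisp
;; Hypothetical helper, not part of abbrev.el: choose a coding system
;; for TEXT, preferring utf-8 and falling back to emacs-mule.
(defun my-abbrev-coding-system-for (text)
  "Return `utf-8' if TEXT is encodable in utf-8, else `emacs-mule'."
  (with-temp-buffer
    ;; Temp buffers are multibyte, so raw eight-bit chars survive here.
    (insert text)
    (if (unencodable-char-position (point-min) (point-max) 'utf-8)
        'emacs-mule
      'utf-8)))

;; Plain text is expected to select utf-8.
(my-abbrev-coding-system-for "btw -> by the way")
```

Keeping the decision in one small function makes it easy to try different inputs before wiring the same test into the code that writes the file.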
Information forwarded to owner <at> debbugs.gnu.org, bug-gnu-emacs <at> gnu.org: bug#8308; Package emacs. (Tue, 22 Mar 2011 03:48:02 GMT)
Message #41 received at submit <at> debbugs.gnu.org (full text, mbox):
On 2011-03-22 10:48 +0800, Stefan Monnier wrote:
> I think it might be OK to do it for Emacs-25, but since Emacs-22 can't
> handle utf-8-emacs, I think it's a bit early to switch to it in
> Emacs-24. If utf-8 is sufficient, OTOH it's the best choice. So maybe
> we should check the buffer first to see if utf-8 is safe, and only fall
> back to emacs-mule if utf-8 is not safe.
I think defaulting to utf-8 is good; it is sufficient for most people.
Any comments on the following patch? I don't know how to get a
character that utf-8 cannot encode into the abbrevs, so the patch is
only partially tested.
=== modified file 'lisp/abbrev.el'
--- lisp/abbrev.el 2011-01-25 04:08:28 +0000
+++ lisp/abbrev.el 2011-03-22 03:30:52 +0000
@@ -225,21 +225,29 @@
                     abbrev-file-name)))
   (or (and file (> (length file) 0))
       (setq file abbrev-file-name))
-  (let ((coding-system-for-write 'emacs-mule))
-    (with-temp-file file
-      (insert ";;-*-coding: emacs-mule;-*-\n")
+  (let ((coding-system-for-write 'utf-8))
+    (with-temp-buffer
       (dolist (table
-               ;; We sort the table in order to ease the automatic
-               ;; merging of different versions of the user's abbrevs
-               ;; file. This is useful, for example, for when the
-               ;; user keeps their home directory in a revision
-               ;; control system, and is therefore keeping multiple
-               ;; slightly-differing copies loosely synchronized.
-               (sort (copy-sequence abbrev-table-name-list)
-                     (lambda (s1 s2)
-                       (string< (symbol-name s1)
-                                (symbol-name s2)))))
-        (insert-abbrev-table-description table nil)))))
+               ;; We sort the table in order to ease the automatic
+               ;; merging of different versions of the user's abbrevs
+               ;; file. This is useful, for example, for when the
+               ;; user keeps their home directory in a revision
+               ;; control system, and is therefore keeping multiple
+               ;; slightly-differing copies loosely synchronized.
+               (sort (copy-sequence abbrev-table-name-list)
+                     (lambda (s1 s2)
+                       (string< (symbol-name s1)
+                                (symbol-name s2)))))
+        (insert-abbrev-table-description table nil))
+      (when (unencodable-char-position (point-min) (point-max) 'utf-8)
+        (setq coding-system-for-write
+              (if (> emacs-major-version 24)
+                  'utf-8-emacs
+                ;; For compatibility with Emacs 22
+                'emacs-mule)))
+      (goto-char (point-min))
+      (insert (format ";;-*-coding: %s;-*-\n" coding-system-for-write))
+      (write-region nil nil file nil 0))))

 (defun add-mode-abbrev (arg)
   "Define mode-specific abbrev for last word(s) before point.
Information forwarded to owner <at> debbugs.gnu.org, bug-gnu-emacs <at> gnu.org: bug#8308; Package emacs. (Tue, 22 Mar 2011 05:25:02 GMT)
Message #44 received at submit <at> debbugs.gnu.org (full text, mbox):
> I think defaulting to utf-8 is good; it is sufficient for most people.
> Any comments on the following patch? I don't know how to get a
> character that utf-8 cannot encode into the abbrevs, so the patch is
> only partially tested.
(unibyte-string 129) returns a string containing an unencodable char.
So you can test with it.
The patch looks good,
Stefan
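For experimenting with the test input suggested above, a small sketch along these lines may help. The helper name `my-first-non-utf-8-index` is hypothetical, and the example converts the byte with `string-to-multibyte` on the assumption that the check is meant to see a raw eight-bit character inside multibyte text, as it would in a multibyte buffer.

```elisp
;; Hypothetical helper for experimenting; not part of abbrev.el.
(defun my-first-non-utf-8-index (string)
  "Return the index of the first char in STRING that utf-8 cannot encode.
Return nil when every character is encodable."
  (unencodable-char-position 0 (length string) 'utf-8 nil string))

;; Two ASCII characters followed by the raw byte 129, kept as an
;; eight-bit character in a multibyte string.
(my-first-non-utf-8-index
 (concat "ab" (string-to-multibyte (unibyte-string 129))))
```

Indices here are zero-based string indices, whereas the buffer form of `unencodable-char-position` used in the patch reports buffer positions.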
Information forwarded to owner <at> debbugs.gnu.org, bug-gnu-emacs <at> gnu.org: bug#8308; Package emacs. (Tue, 22 Mar 2011 10:43:03 GMT)
Message #47 received at submit <at> debbugs.gnu.org (full text, mbox):
On 2011-03-22 13:24 +0800, Stefan Monnier wrote:
> (unibyte-string 129) returns a string containing an unencodable char.
> So you can test with it.
I still cannot get any byte into the abbrevs. For example,
(unibyte-string 129) returns byte \201 but when it is written to the
abbrev file by write-abbrev-file, it is changed to \ 2 0 1, so utf-8
appears sufficient even for bytes.
Leo
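The escaping effect described above can be reproduced in isolation with a sketch like the following. It assumes that the abbrev writer prints expansions with `prin1`-style output into a multibyte buffer; the helper name `my-printed-form` is hypothetical, and the input is made multibyte explicitly with `string-to-multibyte` so that the example does not depend on how unibyte strings are treated.

```elisp
;; Hypothetical helper: capture what `prin1' inserts for OBJECT.
(defun my-printed-form (object)
  "Return the text `prin1' writes for OBJECT into a multibyte buffer."
  (with-temp-buffer
    (prin1 object (current-buffer))
    (buffer-string)))

;; Print a string holding the raw byte 129, then check whether the
;; printed text (not the original string) is encodable in utf-8.
(let ((printed (my-printed-form
                (string-to-multibyte (unibyte-string 129)))))
  (with-temp-buffer
    (insert printed)
    (unencodable-char-position (point-min) (point-max) 'utf-8)))
```

The printed text consists of ASCII characters only (a backslash followed by the octal digits), which is why the utf-8 check finds nothing to complain about.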
Information forwarded to owner <at> debbugs.gnu.org, bug-gnu-emacs <at> gnu.org: bug#8308; Package emacs. (Tue, 22 Mar 2011 18:28:01 GMT)
Message #50 received at submit <at> debbugs.gnu.org (full text, mbox):
>> (unibyte-string 129) returns a string containing an unencodable char.
>> So you can test with it.
> I still cannot get any byte into the abbrevs. For example,
> (unibyte-string 129) returns byte \201 but when it is written to the
> abbrev file by write-abbrev-file, it is changed to \ 2 0 1, so utf-8
> appears sufficient even for bytes.
Good. In any case your unencodable-foo test would trigger if there were
eight-bit-chars in there, so it works correctly in this respect.
Please install your patch.
Stefan
Reply sent to Leo <sdl.web <at> gmail.com>: You have taken responsibility. (Wed, 23 Mar 2011 00:43:02 GMT)
Notification sent to Leo <sdl.web <at> gmail.com>: bug acknowledged by developer. (Wed, 23 Mar 2011 00:43:02 GMT)
Message #55 received at 8308-done <at> debbugs.gnu.org (full text, mbox):
Version: 24.1.
bug archived. Request was from Debbugs Internal Request <help-debbugs <at> gnu.org> to internal_control <at> debbugs.gnu.org. (Wed, 20 Apr 2011 11:24:04 GMT)
GNU bug tracking system
Copyright (C) 1999 Darren O. Benham,
1997,2003 nCipher Corporation Ltd,
1994-97 Ian Jackson.