GNU bug report logs - #2354
23.0.90; Emacs fails to detect utf-8 encoding with language environment Latin-1

Package: emacs

Reported by: David Engster <deng <at> randomsample.de>

Date: Tue, 17 Feb 2009 10:45:02 UTC

Severity: normal

Merged with 2497

Done: Eli Zaretskii <eliz <at> gnu.org>

Bug is archived. No further changes may be made.

To add a comment to this bug, you must first unarchive it, by sending
a message to control AT debbugs.gnu.org, with unarchive 2354 in the body.
You can then email your comments to 2354 AT debbugs.gnu.org in the normal way.


Report forwarded to bug-submit-list <at> lists.donarmstrong.com, Emacs Bugs <bug-gnu-emacs <at> gnu.org>:
bug#2354; Package emacs. (Tue, 17 Feb 2009 10:45:02 GMT)

Acknowledgement sent to David Engster <deng <at> randomsample.de>:
New bug report received and forwarded. Copy sent to Emacs Bugs <bug-gnu-emacs <at> gnu.org>. (Tue, 17 Feb 2009 10:45:03 GMT)

Message #5 received at submit <at> emacsbugs.donarmstrong.com:

From: David Engster <deng <at> randomsample.de>
To: bug-gnu-emacs <at> gnu.org
Subject: 23.0.90; Emacs fails to detect utf-8 encoding with language environment Latin-1
Date: Tue, 17 Feb 2009 11:35:11 +0100

This is what I believe to be a regression in CVS Emacs since the
23.0.90 pretest. I'm using a fresh CVS checkout from 2009-02-17,
compiled with 'make bootstrap'.

You can reproduce it as follows:

1. emacs -Q
2. M-x set-language-environment RET Latin-1 RET
3. In some buffer write:

 (ucs-insert "2500")

4. Evaluate it, so that the Unicode character is inserted into the buffer.
5. Save the file and choose utf-8 as the encoding.
6. Kill the buffer.
7. Load the file you just saved.

Result: Emacs displays "â\224\200" for the unicode character.

Expected behaviour: Emacs should detect the utf-8 encoding and display
the correct character.
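
For reference, the odd-looking result follows from the bytes involved. The sketch below (in Python rather than Emacs Lisp) encodes U+2500 as UTF-8 and then misreads the three bytes as Latin-1. The `emacs_style` helper is hypothetical: it only mimics the way Emacs shows characters in the 0x80-0x9F range as octal escapes, and it is not an Emacs API.

```python
# U+2500 is the character inserted by (ucs-insert "2500").
ch = "\u2500"

# UTF-8 encodes it as three bytes: E2 94 80.
utf8_bytes = ch.encode("utf-8")

# Reading those bytes as Latin-1 maps each byte to one character.
misread = utf8_bytes.decode("latin-1")


def emacs_style(text):
    """Hypothetical helper: render 0x80-0x9F characters as octal escapes."""
    return "".join(
        "\\%03o" % ord(c) if 0x80 <= ord(c) <= 0x9F else c
        for c in text
    )


shown = emacs_style(misread)
print(utf8_bytes.hex())  # e29480
print(shown)             # â\224\200
```

Byte 0xE2 is "â" in Latin-1, while 0x94 and 0x80 are octal 224 and 200, which matches the string reported above.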

Please note that this has worked without problems with the Emacs
23.0.90 pretest, so it must be due to some change(s) since then in CVS.

In GNU Emacs 23.0.90.1 (i686-pc-linux-gnu, GTK+ Version 2.12.11)
 of 2009-02-17 on void
Windowing system distributor `The X.Org Foundation', version 11.0.10402000
configured using `configure  '--prefix=/usr/local/emacs''

Important settings:
  value of $LC_ALL: nil
  value of $LC_COLLATE: nil
  value of $LC_CTYPE: nil
  value of $LC_MESSAGES: nil
  value of $LC_MONETARY: nil
  value of $LC_NUMERIC: nil
  value of $LC_TIME: nil
  value of $LANG: nil
  value of $XMODIFIERS: nil
  locale-coding-system: nil
  default-enable-multibyte-characters: t

Major mode: Lisp Interaction

Minor modes in effect:
  tooltip-mode: t
  tool-bar-mode: t
  mouse-wheel-mode: t
  menu-bar-mode: t
  file-name-shadow-mode: t
  global-font-lock-mode: t
  font-lock-mode: t
  blink-cursor-mode: t
  global-auto-composition-mode: t
  auto-composition-mode: t
  auto-encryption-mode: t
  auto-compression-mode: t
  line-number-mode: t
  transient-mark-mode: t

Recent input:
M-x r e p o <tab> r <tab> C-g M-x s e t - l a n <tab> 
<return> L a t i n w <backspace> - w <return> <backspace> 
1 <return> M-x r e p o <tab> r <tab> <return>

Recent messages:
For information about GNU Emacs and the GNU system, type C-h C-a.
Making completion list...
Quit
Making completion list...





Information forwarded to bug-submit-list <at> lists.donarmstrong.com, Emacs Bugs <bug-gnu-emacs <at> gnu.org>:
bug#2354; Package emacs. (Tue, 17 Feb 2009 16:55:04 GMT)

Acknowledgement sent to Juanma Barranquero <lekktu <at> gmail.com>:
Extra info received and forwarded to list. Copy sent to Emacs Bugs <bug-gnu-emacs <at> gnu.org>. (Tue, 17 Feb 2009 16:55:04 GMT)

Message #10 received at 2354 <at> emacsbugs.donarmstrong.com:

From: Juanma Barranquero <lekktu <at> gmail.com>
To: David Engster <deng <at> randomsample.de>
Cc: 2354 <at> debbugs.gnu.org
Subject: Re: bug#2354: 23.0.90; Emacs fails to detect utf-8 encoding with language environment Latin-1
Date: Tue, 17 Feb 2009 17:45:59 +0100

On Tue, Feb 17, 2009 at 11:35, David Engster <deng <at> randomsample.de> wrote:

> You can reproduce it as follows:
>
> 1. emacs -Q
> 2. M-x set-language-environment RET Latin-1 RET
> 3. In some buffer write:
>
>  (ucs-insert "2500")
>
> 4. Eval it, so that the unicode character is inserted into the buffer.
> 5. Save the file and choose utf-8 as encoding.
> 6. Kill the buffer.
> 7. Load the file you just saved.
>
> Result: Emacs displays "â\224\200" for the unicode character.

I cannot reproduce it on Windows with the current trunk. The file's
coding is correctly detected as UTF-8.

    Juanma




Information forwarded to bug-submit-list <at> lists.donarmstrong.com, Emacs Bugs <bug-gnu-emacs <at> gnu.org>:
bug#2354; Package emacs. (Tue, 17 Feb 2009 18:10:04 GMT)

Acknowledgement sent to David Engster <deng <at> randomsample.de>:
Extra info received and forwarded to list. Copy sent to Emacs Bugs <bug-gnu-emacs <at> gnu.org>. (Tue, 17 Feb 2009 18:10:04 GMT)

Message #15 received at 2354 <at> emacsbugs.donarmstrong.com:

From: David Engster <deng <at> randomsample.de>
To: Juanma Barranquero <lekktu <at> gmail.com>
Cc: 2354 <at> debbugs.gnu.org
Subject: Re: bug#2354: 23.0.90; Emacs fails to detect utf-8 encoding with language environment Latin-1
Date: Tue, 17 Feb 2009 19:04:42 +0100

Juanma Barranquero <lekktu <at> gmail.com> writes:
> On Tue, Feb 17, 2009 at 11:35, David Engster <deng <at> randomsample.de> wrote:
>
>> You can reproduce it as follows:
>>
>> 1. emacs -Q
>> 2. M-x set-language-environment RET Latin-1 RET
>> 3. In some buffer write:
>>
>>  (ucs-insert "2500")
>>
>> 4. Eval it, so that the unicode character is inserted into the buffer.
>> 5. Save the file and choose utf-8 as encoding.
>> 6. Kill the buffer.
>> 7. Load the file you just saved.
>>
>> Result: Emacs displays "â\224\200" for the unicode character.
>
> I cannot reproduce it on Windows with the current trunk. The file's
> coding is correctly detected as UTF-8.

Thank you for looking into this. I have now tested this again on a
different machine, also running GNU/Linux (Ubuntu 8.10), with the same
result. FWIW, I think I have tracked this issue down to the following
commit to src/coding.c:

revision 1.413
date: 2009-02-09 01:42:37 +0100;  author: handa;  state: Exp;  lines: +1 -1;  commitid: WAhpeD8cqX926HBt;
(detect_coding_charset): Fix previous change.

With revision 1.412 of coding.c, the error disappears for me.

-David




Merged 2354 2497. Request was from Jason Rumney <jasonr <at> gnu.org> to control <at> emacsbugs.donarmstrong.com. (Sat, 28 Feb 2009 01:35:07 GMT)

Reply sent to Eli Zaretskii <eliz <at> gnu.org>:
You have taken responsibility. (Sat, 28 Feb 2009 12:30:04 GMT)

Notification sent to David Engster <deng <at> randomsample.de>:
bug acknowledged by developer. (Sat, 28 Feb 2009 12:30:04 GMT)

Message #22 received at 2354-done <at> emacsbugs.donarmstrong.com:

From: Eli Zaretskii <eliz <at> gnu.org>
To: 2497-done <at> debbugs.gnu.org, 2354-done <at> debbugs.gnu.org
Subject: Re: bug#2497: 23.0.91; Fails to read UTF-8 on Win2k
Date: Sat, 28 Feb 2009 14:21:08 +0200

> From: David Engster <deng <at> randomsample.de>
> Date: Fri, 27 Feb 2009 18:46:12 +0100
> Cc: emacs-pretest-bug <at> gnu.org, 2497 <at> emacsbugs.donarmstrong.com
> 
> Uwe Siart <uwe.siart <at> tum.de> writes:
> > I'm using the windows port of 23.0.91 on Win2k SP4 and I found that it
> > fails to read utf-8 encoded files correctly. When visiting a file in
> > utf-8 encoding all characters above 255 are screwed up and "C-h C RET"
> > indicates iso-latin1-dos for saving the file. This has not been an
> > issue in 23.0.90.
> 
> Maybe this is a duplicate of what I reported in
> 
> http://debbugs.gnu.org/cgi/bugreport.cgi?bug=2354
> 
> As I write later in that bug report, I think I could track down this
> issue to the change in revision 1.413 of src/coding.c. Maybe you could
> try if the same applies to your problem.

Should be fixed by this change:

2009-02-28  Eli Zaretskii  <eliz <at> gnu.org>

	* coding.c (detect_coding_charset): Fix change from 2008-10-21.
	Also, check iso-latin-*, not only iso-8859-*.
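
The ChangeLog entry above concerns charset detection in coding.c, whose code is not shown in this report. As a generic illustration only, and not the logic of detect_coding_charset, the Python sketch below shows why a detector has to validate UTF-8 before falling back to Latin-1: every byte sequence is valid Latin-1, so a Latin-1 check that runs first always succeeds. The function name `guess_encoding` is hypothetical.

```python
def guess_encoding(data: bytes) -> str:
    """Illustrative detector: strict UTF-8 first, Latin-1 as the fallback."""
    try:
        # Strict decoding raises UnicodeDecodeError on malformed UTF-8.
        data.decode("utf-8")
        return "utf-8"
    except UnicodeDecodeError:
        # Latin-1 never fails, because every byte maps to a character.
        return "latin-1"


# The three UTF-8 bytes of U+2500 are valid UTF-8.
print(guess_encoding(b"\xe2\x94\x80"))  # utf-8

# A lone 0xE9 (Latin-1 "é") is not valid UTF-8, so the fallback applies.
print(guess_encoding(b"caf\xe9"))       # latin-1
```

With this ordering, a file saved as in the steps of bug#2354 would be classified as UTF-8 even when Latin-1 is the preferred encoding.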





Reply sent to Eli Zaretskii <eliz <at> gnu.org>:
You have taken responsibility. (Sat, 28 Feb 2009 12:30:04 GMT)

Notification sent to uwe.siart <at> tum.de:
bug acknowledged by developer. (Sat, 28 Feb 2009 12:30:04 GMT)

bug archived. Request was from Debbugs Internal Request <help-debbugs <at> gnu.org> to internal_control <at> emacsbugs.donarmstrong.com. (Wed, 01 Apr 2009 14:24:09 GMT)

This bug report was last modified 16 years and 87 days ago.


GNU bug tracking system
Copyright (C) 1999 Darren O. Benham, 1997,2003 nCipher Corporation Ltd, 1994-97 Ian Jackson.