From unknown Mon Jun 23 13:13:25 2025
X-Loop: owner@emacsbugs.donarmstrong.com
Subject: bug#2354: 23.0.90; Emacs fails to detect utf-8 encoding with language environment Latin-1
Reply-To: David Engster , 2354@debbugs.gnu.org
Resent-From: David Engster
Resent-To: bug-submit-list@lists.donarmstrong.com
Resent-CC: Emacs Bugs
Resent-Date: Tue, 17 Feb 2009 10:45:02 +0000
Resent-Message-ID:
Resent-Sender: help-debbugs@gnu.org
X-Emacs-PR-Message: report 2354
X-Emacs-PR-Package: emacs
X-Emacs-PR-Keywords:
Received: via spool by submit@emacsbugs.donarmstrong.com id=B.123486694028167 (code B ref -1); Tue, 17 Feb 2009 10:45:02 +0000
Received: (at submit) by emacsbugs.donarmstrong.com; 17 Feb 2009 10:35:40 +0000
X-Spam-Checker-Version: SpamAssassin 3.2.5-bugs.debian.org_2005_01_02 (2008-06-10) on rzlab.ucr.edu
X-Spam-Level:
X-Spam-Bayes: score:0.5 Bayes not run. spammytokens:Tokens not available. hammytokens:Tokens not available.
X-Spam-Status: No, score=0.1 required=4.0 tests=FOURLA autolearn=no version=3.2.5-bugs.debian.org_2005_01_02
Received: from lists.gnu.org (lists.gnu.org [199.232.76.165]) by rzlab.ucr.edu (8.13.8/8.13.8/Debian-3) with ESMTP id n1HAZVl4028159 for ; Tue, 17 Feb 2009 02:35:33 -0800
Received: from mailman by lists.gnu.org with tmda-scanned (Exim 4.43) id 1LZNIY-0003fK-Sq for bug-gnu-emacs@gnu.org; Tue, 17 Feb 2009 05:35:30 -0500
Received: from exim by lists.gnu.org with spam-scanned (Exim 4.43) id 1LZNIR-0003VJ-Qq for bug-gnu-emacs@gnu.org; Tue, 17 Feb 2009 05:35:25 -0500
Received: from [199.232.76.173] (port=51423 helo=monty-python.gnu.org) by lists.gnu.org with esmtp (Exim 4.43) id 1LZNIQ-0003UY-NL for bug-gnu-emacs@gnu.org; Tue, 17 Feb 2009 05:35:22 -0500
Received: from m61s02.vlinux.de ([83.151.21.164]:48492) by monty-python.gnu.org with esmtps (TLS-1.0:RSA_AES_256_CBC_SHA1:32) (Exim 4.60) (envelope-from ) id 1LZNIQ-0007hp-1t for bug-gnu-emacs@gnu.org; Tue, 17 Feb 2009 05:35:22 -0500
Received: from dslb-082-083-056-080.pools.arcor-ip.net
([82.83.56.80] helo=void) by m61s02.vlinux.de with esmtpsa (TLS-1.0:DHE_RSA_AES_128_CBC_SHA1:16) (Exim 4.63) (envelope-from ) id 1LZNKC-0001iN-Ng for bug-gnu-emacs@gnu.org; Tue, 17 Feb 2009 11:37:12 +0100
From: David Engster
To: bug-gnu-emacs@gnu.org
User-Agent: Gnus/5.110011 (No Gnus v0.11) Emacs/23.0.90 (gnu/linux)
Date: Tue, 17 Feb 2009 11:35:11 +0100
Message-ID: <87y6w5jqqo.fsf@engster.org>
MIME-Version: 1.0
Content-Type: text/plain; charset=iso-8859-1
Content-Transfer-Encoding: quoted-printable
X-detected-operating-system: by monty-python.gnu.org: GNU/Linux 2.6 (newer, 3)

This is what I believe to be a regression in CVS Emacs since the 23.0.90
pretest. I'm using a fresh CVS checkout from 2009-02-17, compiled with
'make bootstrap'.

You can reproduce it as follows:

1. emacs -Q
2. M-x set-language-environment RET Latin-1 RET
3. In some buffer write:

   (ucs-insert "2500")

4. Eval it, so that the unicode character is inserted into the buffer.
5. Save the file and choose utf-8 as encoding.
6. Kill the buffer.
7. Load the file you just saved.

Result: Emacs displays "â\224\200" for the unicode character.

Expected behaviour: Emacs should detect utf-8 encoding and display
correct character.

Please note that this has worked without problems with the Emacs 23.0.90
pretest, so it must be due to some change(s) since then in CVS.

In GNU Emacs 23.0.90.1 (i686-pc-linux-gnu, GTK+ Version 2.12.11)
 of 2009-02-17 on void
Windowing system distributor `The X.Org Foundation', version 11.0.10402000
configured using `configure '--prefix=/usr/local/emacs''

Important settings:
  value of $LC_ALL: nil
  value of $LC_COLLATE: nil
  value of $LC_CTYPE: nil
  value of $LC_MESSAGES: nil
  value of $LC_MONETARY: nil
  value of $LC_NUMERIC: nil
  value of $LC_TIME: nil
  value of $LANG: nil
  value of $XMODIFIERS: nil
  locale-coding-system: nil
  default-enable-multibyte-characters: t

Major mode: Lisp Interaction

Minor modes in effect:
  tooltip-mode: t
  tool-bar-mode: t
  mouse-wheel-mode: t
  menu-bar-mode: t
  file-name-shadow-mode: t
  global-font-lock-mode: t
  font-lock-mode: t
  blink-cursor-mode: t
  global-auto-composition-mode: t
  auto-composition-mode: t
  auto-encryption-mode: t
  auto-compression-mode: t
  line-number-mode: t
  transient-mark-mode: t

Recent input:
M-x r e p o r C-g M-x s e t - l a n
L a t i n w - w
1 M-x r e p o r

Recent messages:
For information about GNU Emacs and the GNU system, type C-h C-a.
Making completion list...
Quit
Making completion list...

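The string in the "Result:" line of the report above is what one would expect if the three UTF-8 bytes of U+2500 were decoded byte by byte as Latin-1. A minimal Python sketch of that arithmetic follows. It is not Emacs code: the helper name `show_as_latin1` is hypothetical, and rendering bytes 0x80-0x9F as octal escapes is an assumption made only to match the look of the reported string.

```python
def show_as_latin1(data: bytes) -> str:
    """Decode bytes as Latin-1, writing C1 control bytes (0x80-0x9F) as octal escapes.

    Hypothetical helper for illustration; it imitates the appearance of the
    string quoted in the report, not Emacs's actual display code.
    """
    out = []
    for byte in data:
        if 0x80 <= byte <= 0x9F:
            # C1 controls have no glyph, so show them as \ooo (assumption).
            out.append("\\%03o" % byte)
        else:
            out.append(bytes([byte]).decode("latin-1"))
    return "".join(out)


# U+2500 BOX DRAWINGS LIGHT HORIZONTAL, the character from step 3 of the report.
utf8_bytes = "\u2500".encode("utf-8")
misread = show_as_latin1(utf8_bytes)      # file read as Latin-1
correct = utf8_bytes.decode("utf-8")      # file read as UTF-8

print(" ".join("%02x" % b for b in utf8_bytes))
print(misread)
print(correct == "\u2500")
```

The first byte, 0xE2, is "â" in Latin-1, while 0x94 and 0x80 fall in the C1 control range and come out as \224 and \200.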
From unknown Mon Jun 23 13:13:25 2025
X-Loop: owner@emacsbugs.donarmstrong.com
Subject: bug#2354: 23.0.90; Emacs fails to detect utf-8 encoding with language environment Latin-1
Reply-To: Juanma Barranquero , 2354@debbugs.gnu.org
Resent-From: Juanma Barranquero
Resent-To: bug-submit-list@lists.donarmstrong.com
Resent-CC: Emacs Bugs
Resent-Date: Tue, 17 Feb 2009 16:55:04 +0000
Resent-Sender: help-debbugs@gnu.org
X-Emacs-PR-Message: followup 2354
X-Emacs-PR-Package: emacs
MIME-Version: 1.0
In-Reply-To: <87y6w5jqqo.fsf@engster.org>
References: <87y6w5jqqo.fsf@engster.org>
Date: Tue, 17 Feb 2009 17:45:59 +0100
From: Juanma Barranquero
To: David Engster
Cc: 2354@debbugs.gnu.org
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: quoted-printable

On Tue, Feb 17, 2009 at 11:35, David Engster wrote:

> You can reproduce it as follows:
>
> 1. emacs -Q
> 2. M-x set-language-environment RET Latin-1 RET
> 3. In some buffer write:
>
>    (ucs-insert "2500")
>
> 4. Eval it, so that the unicode character is inserted into the buffer.
> 5. Save the file and choose utf-8 as encoding.
> 6. Kill the buffer.
> 7. Load the file you just saved.
>
> Result: Emacs displays "â\224\200" for the unicode character.

I cannot reproduce it on Windows with the current trunk. The file's
coding is correctly detected as UTF-8.

    Juanma

From unknown Mon Jun 23 13:13:25 2025
X-Loop: owner@emacsbugs.donarmstrong.com
Subject: bug#2354: 23.0.90; Emacs fails to detect utf-8 encoding with language environment Latin-1
Reply-To: David Engster , 2354@debbugs.gnu.org
Resent-From: David Engster
Resent-To: bug-submit-list@lists.donarmstrong.com
Resent-CC: Emacs Bugs
Resent-Date: Tue, 17 Feb 2009 18:10:04 +0000
Resent-Sender: help-debbugs@gnu.org
X-Emacs-PR-Message: followup 2354
X-Emacs-PR-Package: emacs
From: David Engster
To: Juanma Barranquero
Cc: 2354@debbugs.gnu.org
In-Reply-To: (Juanma Barranquero's message of "Tue, 17 Feb 2009 17:45:59 +0100")
References: <87y6w5jqqo.fsf@engster.org>
User-Agent: Gnus/5.110011 (No Gnus v0.11) Emacs/23.0.90 (gnu/linux)
Date: Tue, 17 Feb 2009 19:04:42 +0100
MIME-Version: 1.0
Content-Type: text/plain; charset=iso-8859-1
Content-Transfer-Encoding: quoted-printable

Juanma Barranquero writes:
> On Tue, Feb 17, 2009 at 11:35, David Engster wrote:
>
>> You can reproduce it as follows:
>>
>> 1. emacs -Q
>> 2. M-x set-language-environment RET Latin-1 RET
>> 3. In some buffer write:
>>
>>    (ucs-insert "2500")
>>
>> 4. Eval it, so that the unicode character is inserted into the buffer.
>> 5. Save the file and choose utf-8 as encoding.
>> 6. Kill the buffer.
>> 7. Load the file you just saved.
>>
>> Result: Emacs displays "â\224\200" for the unicode character.
>
> I cannot reproduce it on Windows with the current trunk. The file's
> coding is correctly detected as UTF-8.

Thank you for looking into this. I tested this now again on a different
machine, but also running GNU/Linux (Ubuntu 8.10), with the same result.

FWIW, I think I could track down this issue to the following commit for
src/coding.c:

revision 1.413
date: 2009-02-09 01:42:37 +0100; author: handa; state: Exp; lines: +1 -1; commitid: WAhpeD8cqX926HBt;
(detect_coding_charset): Fix previous change.

With revision 1.412 of coding.c, the error disappears for me.

-David

From jasonrumney@gmail.com Fri Feb 27 17:33:30 2009
Received: (at control) by emacsbugs.donarmstrong.com; 28 Feb 2009 01:33:31 +0000
Sender: Jason Rumney
Message-ID: <49A89443.9080202@gnu.org>
Date: Sat, 28 Feb 2009 09:32:51 +0800
From: Jason Rumney
User-Agent: Thunderbird 2.0.0.19 (Windows/20081209)
MIME-Version: 1.0
To: David Engster , 2497@debbugs.gnu.org
Subject: Re: bug#2497: 23.0.91; Fails to read UTF-8 on Win2k
References: <877i3c55tg.fsf@tum.de> <87d4d3u61n.fsf@engster.org>
In-Reply-To: <87d4d3u61n.fsf@engster.org>
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
Content-Transfer-Encoding: 7bit

merge 2354 2497

David Engster wrote:
> Maybe this is a duplicate of what I reported in
>
> http://debbugs.gnu.org/cgi/bugreport.cgi?bug=2354
>

It seems so, yes.

From unknown Mon Jun 23 13:13:25 2025
MIME-Version: 1.0
X-Mailer: MIME-tools 5.420 (Entity 5.420)
X-Loop: owner@emacsbugs.donarmstrong.com
From: help-debbugs@gnu.org (Emacs bug Tracking System)
To: David Engster
Subject: bug#2354 closed by Eli Zaretskii (Re: bug#2497: 23.0.91; Fails to read UTF-8 on Win2k)
References: <87y6w5jqqo.fsf@engster.org>
X-Emacs-PR-Message: they-closed 2354
X-Emacs-PR-Package: emacs
Reply-To: 2354@debbugs.gnu.org
Date: Sat, 28 Feb 2009 12:30:04 +0000
Content-Type: multipart/mixed; boundary="----------=_1235824204-12677-1"

This is a multi-part message in MIME format...

------------=_1235824204-12677-1
Content-Disposition: inline
Content-Transfer-Encoding: quoted-printable
Content-Type: text/plain; charset="utf-8"

This is an automatic notification regarding your bug report
which was filed against the emacs package:

#2354: 23.0.90; Emacs fails to detect utf-8 encoding with language environment Latin-1

It has been closed by Eli Zaretskii.

Their explanation is attached below along with your original report.
If this explanation is unsatisfactory and you have not received a
better one in a separate message then please contact Eli Zaretskii by
replying to this email.

-- 
2354: http://debbugs.gnu.org/cgi/bugreport.cgi?bug=2354
Emacs Bug Tracking System
Contact help-debbugs@gnu.org with problems

------------=_1235824204-12677-1
Content-Type: message/rfc822
Content-Disposition: inline
Content-Transfer-Encoding: 7bit

Received: (at 2354-done) by emacsbugs.donarmstrong.com; 28 Feb 2009 12:21:20 +0000
Date: Sat, 28 Feb 2009 14:21:08 +0200
From: Eli Zaretskii
Subject: Re: bug#2497: 23.0.91; Fails to read UTF-8 on Win2k
In-reply-to: <87d4d3u61n.fsf@engster.org>
To: 2497-done@debbugs.gnu.org, 2354-done@debbugs.gnu.org
Reply-to: Eli Zaretskii
References: <877i3c55tg.fsf@tum.de> <87d4d3u61n.fsf@engster.org>

> From: David Engster
> Date: Fri, 27 Feb 2009 18:46:12 +0100
> Cc: emacs-pretest-bug@gnu.org, 2497@emacsbugs.donarmstrong.com
>
> Uwe Siart writes:
> > I'm using the windows port of 23.0.91 on Win2k SP4 and I found that it
> > fails to read utf-8 encoded files correctly. When visiting a file in
> > utf-8 encoding all characters above 255 are screwed up and "C-h C RET"
> > indicates iso-latin1-dos for saving the file. This has not been an
> > issue in 23.0.90.
>
> Maybe this is a duplicate of what I reported in
>
> http://debbugs.gnu.org/cgi/bugreport.cgi?bug=2354
>
> As I write later in that bug report, I think I could track down this
> issue to the change in revision 1.413 of src/coding.c. Maybe you could
> try if the same applies to your problem.

Should be fixed by this change:

    2009-02-28  Eli Zaretskii

        * coding.c (detect_coding_charset): Fix change from 2008-10-21.
        Also, check iso-latin-*, not only iso-8859-*.

------------=_1235824204-12677-1
Content-Type: message/rfc822
Content-Disposition: inline
Content-Transfer-Encoding: 7bit

Received: (at submit) by emacsbugs.donarmstrong.com; 17 Feb 2009 10:35:40 +0000
From: David Engster
To: bug-gnu-emacs@gnu.org
Subject: 23.0.90; Emacs fails to detect utf-8 encoding with language environment Latin-1
Date: Tue, 17 Feb 2009 11:35:11 +0100
Message-ID: <87y6w5jqqo.fsf@engster.org>

[...]

------------=_1235824204-12677-1--

From unknown Mon Jun 23 13:13:25 2025
MIME-Version: 1.0
X-Mailer: MIME-tools 5.420 (Entity 5.420)
X-Loop: owner@emacsbugs.donarmstrong.com
From: help-debbugs@gnu.org (Emacs bug Tracking System)
To: uwe.siart@tum.de
Subject: bug#2497 closed by Eli Zaretskii (Re: bug#2497: 23.0.91; Fails to read UTF-8 on Win2k)
References: <877i3c55tg.fsf@tum.de>
X-Emacs-PR-Message: they-closed 2497
X-Emacs-PR-Package: emacs
Reply-To: 2497@debbugs.gnu.org
Date: Sat, 28 Feb 2009 12:30:04 +0000
Content-Type: multipart/mixed; boundary="----------=_1235824204-12677-3"

This is a multi-part message in MIME format...

------------=_1235824204-12677-3
Content-Disposition: inline
Content-Transfer-Encoding: quoted-printable
Content-Type: text/plain; charset="utf-8"

This is an automatic notification regarding your bug report
which was filed against the emacs package:

#2354: 23.0.91; Fails to read UTF-8 on Win2k

It has been closed by Eli Zaretskii.

Their explanation is attached below along with your original report.
If this explanation is unsatisfactory and you have not received a
better one in a separate message then please contact Eli Zaretskii by
replying to this email.

-- 
2354: http://debbugs.gnu.org/cgi/bugreport.cgi?bug=2354
Emacs Bug Tracking System
Contact help-debbugs@gnu.org with problems

------------=_1235824204-12677-3
Content-Type: message/rfc822
Content-Disposition: inline
Content-Transfer-Encoding: 7bit

Received: (at 2354-done) by emacsbugs.donarmstrong.com; 28 Feb 2009 12:21:20 +0000
Date: Sat, 28 Feb 2009 14:21:08 +0200
From: Eli Zaretskii
Subject: Re: bug#2497: 23.0.91; Fails to read UTF-8 on Win2k
In-reply-to: <87d4d3u61n.fsf@engster.org>
To: 2497-done@debbugs.gnu.org, 2354-done@debbugs.gnu.org
Reply-to: Eli Zaretskii
References: <877i3c55tg.fsf@tum.de> <87d4d3u61n.fsf@engster.org>

[...]

------------=_1235824204-12677-3
Content-Type: message/rfc822
Content-Disposition: inline
Content-Transfer-Encoding: 7bit

Received: (at submit) by emacsbugs.donarmstrong.com; 27 Feb 2009 14:10:38 +0000
Date: Fri, 27 Feb 2009 15:10:19 +0100
Message-Id: <877i3c55tg.fsf@tum.de>
From: Uwe Siart
To: emacs-pretest-bug@gnu.org
Subject: 23.0.91; Fails to read UTF-8 on Win2k
Reply-to: uwe.siart@tum.de
X-detected-operating-system: by monty-python.gnu.org: GNU/Linux 2.4-2.6

I'm using the windows port of 23.0.91 on Win2k SP4 and I found that it
fails to read utf-8 encoded files correctly. When visiting a file in
utf-8 encoding all characters above 255 are screwed up and "C-h C RET"
indicates iso-latin1-dos for saving the file. This has not been an
issue in 23.0.90.

-- 
Uwe

In GNU Emacs 23.0.91.1 (i386-mingw-nt5.0.2195)
 of 2009-02-27 on SOFT-MJASON
Windowing system distributor `Microsoft Corp.', version 5.0.2195
configured using `configure --with-gcc (3.4)'

Important settings:
  value of $LC_ALL: nil
  value of $LC_COLLATE: nil
  value of $LC_CTYPE: nil
  value of $LC_MESSAGES: nil
  value of $LC_MONETARY: nil
  value of $LC_NUMERIC: nil
  value of $LC_TIME: nil
  value of $LANG: DEU
  value of $XMODIFIERS: nil
  locale-coding-system: cp1252
  default-enable-multibyte-characters: t

Major mode: Lisp Interaction

Minor modes in effect:
  iswitchb-mode: t
  display-time-mode: t
  auto-insert-mode: t
  diff-auto-refine-mode: t
  delete-selection-mode: t
  pc-selection-mode: t
  tooltip-mode: t
  mouse-wheel-mode: t
  file-name-shadow-mode: t
  global-font-lock-mode: t
  font-lock-mode: t
  global-auto-composition-mode: t
  auto-composition-mode: t
  auto-encryption-mode: t
  auto-compression-mode: t
  column-number-mode: t
  line-number-mode: t
  transient-mark-mode: t

Recent input:
M-x r e p o r t

Recent messages:
Loading time...done
Loading iswitchb...done
For information about GNU Emacs and the GNU system, type C-h C-a.
Making completion list... [2 times]

------------=_1235824204-12677-3--
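Both reports describe a UTF-8 file ending up read as Latin-1. As a generic illustration of why detection order matters, here is a small Python sketch. It is not a description of `detect_coding_charset` in coding.c; the function `guess_encoding` and its candidate lists are hypothetical. The point it shows is that every byte sequence is valid Latin-1, so a detector that naively tries Latin-1 first never gets as far as UTF-8.

```python
def guess_encoding(data, candidates):
    """Return the first candidate encoding that decodes `data` without error.

    Hypothetical first-match detector, for illustration only.
    """
    for name in candidates:
        try:
            data.decode(name)
        except UnicodeDecodeError:
            continue
        return name
    raise ValueError("no candidate encoding fits")


# The UTF-8 bytes of U+2500, as saved in the first report.
sample = "\u2500".encode("utf-8")

# Latin-1 accepts any bytes, so trying it first hides the UTF-8 match.
latin1_first = guess_encoding(sample, ["latin-1", "utf-8"])

# Trying strict UTF-8 first identifies the file correctly.
utf8_first = guess_encoding(sample, ["utf-8", "latin-1"])

# A genuine Latin-1 file is still recognised, because its lone high byte
# is not a valid UTF-8 sequence and the detector falls through.
genuine_latin1 = guess_encoding("caf\u00e9".encode("latin-1"), ["utf-8", "latin-1"])

print(latin1_first, utf8_first, genuine_latin1)
```

Trying the stricter encoding first is one common way to avoid the misdetection; whether the fix recorded in the ChangeLog entry works this way is not stated in the thread.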