From unknown Mon Jun 23 18:26:18 2025
From: bug#2497 <2497@debbugs.gnu.org>
To: bug#2497 <2497@debbugs.gnu.org>
Subject: Status: 23.0.91; Fails to read UTF-8 on Win2k
Reply-To: bug#2497 <2497@debbugs.gnu.org>
Date: Tue, 24 Jun 2025 01:26:18 +0000

retitle 2497 23.0.91; Fails to read UTF-8 on Win2k
reassign 2497 emacs
submitter 2497 uwe.siart@tum.de
severity 2497 normal
thanks

From uwe.siart@tum.de Fri Feb 27 06:10:38 2009
Date: Fri, 27 Feb 2009 15:10:19 +0100
Message-Id: <877i3c55tg.fsf@tum.de>
From: Uwe Siart
To: emacs-pretest-bug@gnu.org
Subject: 23.0.91; Fails to read UTF-8 on Win2k
Reply-to: uwe.siart@tum.de

I'm using the windows port of 23.0.91 on Win2k SP4 and I found that it
fails to read utf-8 encoded files correctly. When visiting a file in
utf-8 encoding all characters above 255 are screwed up and "C-h C RET"
indicates iso-latin1-dos for saving the file. This has not been an
issue in 23.0.90.

-- 
Uwe

In GNU Emacs 23.0.91.1 (i386-mingw-nt5.0.2195)
 of 2009-02-27 on SOFT-MJASON
Windowing system distributor `Microsoft Corp.', version 5.0.2195
configured using `configure --with-gcc (3.4)'

Important settings:
  value of $LC_ALL: nil
  value of $LC_COLLATE: nil
  value of $LC_CTYPE: nil
  value of $LC_MESSAGES: nil
  value of $LC_MONETARY: nil
  value of $LC_NUMERIC: nil
  value of $LC_TIME: nil
  value of $LANG: DEU
  value of $XMODIFIERS: nil
  locale-coding-system: cp1252
  default-enable-multibyte-characters: t

Major mode: Lisp Interaction

Minor modes in effect:
  iswitchb-mode: t
  display-time-mode: t
  auto-insert-mode: t
  diff-auto-refine-mode: t
  delete-selection-mode: t
  pc-selection-mode: t
  tooltip-mode: t
  mouse-wheel-mode: t
  file-name-shadow-mode: t
  global-font-lock-mode: t
  font-lock-mode: t
  global-auto-composition-mode: t
  auto-composition-mode: t
  auto-encryption-mode: t
  auto-compression-mode: t
  column-number-mode: t
  line-number-mode: t
  transient-mark-mode: t

Recent input:
M-x r e p o r t

Recent messages:
Loading time...done
Loading iswitchb...done
For information about GNU Emacs and the GNU system, type C-h C-a.
Making completion list... [2 times]

From eliz@gnu.org Fri Feb 27 08:03:22 2009
Date: Fri, 27 Feb 2009 18:03:16 +0200
From: Eli Zaretskii
Subject: Re: bug#2497: 23.0.91; Fails to read UTF-8 on Win2k
In-reply-to: <877i3c55tg.fsf@tum.de>
To: uwe.siart@tum.de, 2497@debbugs.gnu.org
Reply-to: Eli Zaretskii

> Date: Fri, 27 Feb 2009 15:10:19 +0100
> From: Uwe Siart
> Cc:
>
> I'm using the windows port of 23.0.91 on Win2k SP4 and I found that it
> fails to read utf-8 encoded files correctly. When visiting a file in
> utf-8 encoding all characters above 255 are screwed up and "C-h C RET"
> indicates iso-latin1-dos for saving the file.

Does it work with "C-x RET c utf-8 RET" immediately prior to
"C-x C-f"?

If it does, then the problem is with guessing the encoding, not with
decoding it.

Also, what is the default value of buffer-file-coding-system, and was
it the same in 23.0.90?

From lekktu@gmail.com Fri Feb 27 08:11:48 2009
In-Reply-To: <877i3c55tg.fsf@tum.de>
Date: Fri, 27 Feb 2009 17:11:38 +0100
Subject: Re: bug#2497: 23.0.91; Fails to read UTF-8 on Win2k
From: Juanma Barranquero
To: uwe.siart@tum.de
Cc: 2497@debbugs.gnu.org

On Fri, Feb 27, 2009 at 15:10, Uwe Siart wrote:

> I'm using the windows port of 23.0.91 on Win2k SP4 and I found that it
> fails to read utf-8 encoded files correctly. When visiting a file in
> utf-8 encoding all characters above 255 are screwed up and "C-h C RET"
> indicates iso-latin1-dos for saving the file. This has not been an
> issue in 23.0.90.

Do you have a specific example of a UTF-8 coded file that was detected
as UTF-8 in 23.0.90 and it is detected as Latin-1 in 23.0.91?

For example, I create a UTF-8 file (without UTF-8 byte-order-mark
"signature") with just the following contents:

  cañón

And 23.0.90 also thinks it is Latin-1.

That said, if you need UTF-8 to be given more priority than Latin-1,
etc, you can use `set-coding-system-priority' in your .emacs.

    Juanma

From lekktu@gmail.com Fri Feb 27 08:16:56 2009
Date: Fri, 27 Feb 2009 17:16:51 +0100
Subject: Re: bug#2497: 23.0.91; Fails to read UTF-8 on Win2k
From: Juanma Barranquero
To: uwe.siart@tum.de
Cc: 2497@debbugs.gnu.org

On Fri, Feb 27, 2009 at 17:11, Juanma Barranquero wrote:

>   cañón
>
> And 23.0.90 also thinks it is Latin-1.

Just to be clear: of course "cañón" is Latin-1. What I mean is that
emacs 23.0.90 also reads the byte representation of "cañón" in UTF-8,
that is:

  0000000 63 61 c3 b1 c3 b3 6e

and interprets it as Latin-1:

  caÃ±Ã³n

    Juanma

From uwe.siart@tum.de Fri Feb 27 08:23:53 2009
From: Uwe Siart
To: Juanma Barranquero
Cc: 2497@debbugs.gnu.org
Subject: Re: bug#2497: 23.0.91; Fails to read UTF-8 on Win2k
Reply-To: uwe.siart@tum.de
Date: Fri, 27 Feb 2009 17:23:43 +0100
In-Reply-To: (Juanma Barranquero's message of "Fri, 27 Feb 2009 17:11:38 +0100")
Message-Id: <873adzg86o.fsf@tum.de>
User-Agent: Gnus/5.13 (Gnus v5.13) Emacs/23.0.90 (windows-nt)

Juanma Barranquero writes:

> On Fri, Feb 27, 2009 at 15:10, Uwe Siart wrote:
>
>> I'm using the windows port of 23.0.91 on Win2k SP4 and I found that it
>> fails to read utf-8 encoded files correctly. When visiting a file in
>> utf-8 encoding all characters above 255 are screwed up and "C-h C RET"
>> indicates iso-latin1-dos for saving the file. This has not been an
>> issue in 23.0.90.
>
> Do you have a specific example of a UTF-8 coded file that was detected
> as UTF-8 in 23.0.90 and it is detected as Latin-1 in 23.0.91?

Yes. My .gnus.el:

I hope, the webserver delivers it in utf-8 encoding.

> For example, I create a UTF-8 file (without UTF-8 byte-order-mark
> "signature") with just the following contents:
>
>   cañón
>
> And 23.0.90 also thinks it is Latin-1.

Maybe because it can be encoded in latin-1. That would be ok for me. But
my .gnus.el contains symbols (arrows for the summary buffer) that are
definitely not included in latin-1 but 23.0.91 recognises latin-1.

-- 
Uwe

From uwe.siart@tum.de Fri Feb 27 08:28:13 2009
From: Uwe Siart
To: Juanma Barranquero
Cc: 2497@debbugs.gnu.org
Subject: Re: bug#2497: 23.0.91; Fails to read UTF-8 on Win2k
Reply-To: uwe.siart@tum.de
Date: Fri, 27 Feb 2009 17:27:56 +0100
In-Reply-To: (Juanma Barranquero's message of "Fri, 27 Feb 2009 17:16:51 +0100")
Message-Id: <87y6vretf7.fsf@tum.de>
User-Agent: Gnus/5.13 (Gnus v5.13) Emacs/23.0.90 (windows-nt)

Juanma Barranquero writes:

> Just to be clear: of course "cañón" is Latin-1. What I mean is that
> emacs 23.0.90 also reads the byte representation of "cañón" in UTF-8,
> that is:
>
>   0000000 63 61 c3 b1 c3 b3 6e
>
> and interprets it as Latin-1: caÃ±Ã³n

I tried this out in 23.0.90 in the following way:

- mark "cañón" from your mail
- create empty file with 'touch t.txt'
- visit t.txt and yank cañón
- save t.txt
- visit t.txt

and get correct result (cañón not caÃ±Ã³n)

-- 
Uwe

From lekktu@gmail.com Fri Feb 27 08:32:35 2009
In-Reply-To: <87y6vretf7.fsf@tum.de>
Date: Fri, 27 Feb 2009 17:32:31 +0100
Subject: Re: bug#2497: 23.0.91; Fails to read UTF-8 on Win2k
From: Juanma Barranquero
To: uwe.siart@tum.de
Cc: 2497@debbugs.gnu.org

On Fri, Feb 27, 2009 at 17:27, Uwe Siart wrote:

> I tried this out in 23.0.90 in the following way:
>
> - mark "cañón" from your mail
> - create empty file with 'touch t.txt'
> - visit t.txt and yank cañón
> - save t.txt
> - visit t.txt
>
> and get correct result (cañón not caÃ±Ã³n)

Of course: you've created a file t.txt encoded in Latin-1, not UTF-8.

    Juanma

From lekktu@gmail.com Fri Feb 27 08:38:46 2009
In-Reply-To: <873adzg86o.fsf@tum.de>
Date: Fri, 27 Feb 2009 17:38:37 +0100
Subject: Re: bug#2497: 23.0.91; Fails to read UTF-8 on Win2k
From: Juanma Barranquero
To: uwe.siart@tum.de
Cc: 2497@debbugs.gnu.org

On Fri, Feb 27, 2009 at 17:23, Uwe Siart wrote:

> Yes. My .gnus.el:

Aha, yes, the bug is reproducible.

> I hope, the webserver delivers it in utf-8 encoding.

Yes. Emacs 23.0.90 opens it as utf-8, as does Notepad2.

> But
> my .gnus.el contains symbols (arrows for the summary buffer) that are
> definitely not included in latin-1 but 23.0.91 recognises latin-1.

    Juanma

From uwe.siart@tum.de Fri Feb 27 08:48:31 2009
From: Uwe Siart
To: Eli Zaretskii
Cc: 2497@debbugs.gnu.org
Subject: Re: bug#2497: 23.0.91; Fails to read UTF-8 on Win2k
Reply-To: uwe.siart@tum.de
Date: Fri, 27 Feb 2009 17:48:15 +0100
In-Reply-To: (Eli Zaretskii's message of "Fri, 27 Feb 2009 18:03:16 +0200")
Message-Id: <87ljrromgg.fsf@tum.de>
User-Agent: Gnus/5.13 (Gnus v5.13) Emacs/23.0.90 (windows-nt)

Eli Zaretskii writes:

>> Date: Fri, 27 Feb 2009 15:10:19 +0100
>> From: Uwe Siart
>> Cc:
>>
>> I'm using the windows port of 23.0.91 on Win2k SP4 and I found that it
>> fails to read utf-8 encoded files correctly. When visiting a file in
>> utf-8 encoding all characters above 255 are screwed up and "C-h C RET"
>> indicates iso-latin1-dos for saving the file.
>
> Does it work with "C-x RET c utf-8 RET" immediately prior to
> "C-x C-f"?

It works with "C-x RET c utf-8 RET" immediately prior to "C-x C-f".

> If it does, then the problem is with guessing the encoding, not with
> decoding it.

That's also my impression.

> Also, what is the default value of buffer-file-coding-system, and was
> it the same in 23.0.90?

iso-latin-1-dos in 23.0.90 and in 23.0.91.

-- 
Uwe

From geb-bug-gnu-emacs@m.gmane.org Fri Feb 27 09:02:44 2009
To: bug-gnu-emacs@gnu.org
From: Leo
Subject: Re: bug#2497: 23.0.91; Fails to read UTF-8 on Win2k
Date: Fri, 27 Feb 2009 17:02:19 +0000
Organization: University of Cambridge
User-Agent: Gnus/5.13 (Gnus v5.13) Emacs/23.0.60 (gnu/linux)

On 2009-02-27 16:11 +0000, Juanma Barranquero wrote:

> On Fri, Feb 27, 2009 at 15:10, Uwe Siart wrote:
>
>> I'm using the windows port of 23.0.91 on Win2k SP4 and I found that it
>> fails to read utf-8 encoded files correctly. When visiting a file in
>> utf-8 encoding all characters above 255 are screwed up and "C-h C RET"
>> indicates iso-latin1-dos for saving the file. This has not been an
>> issue in 23.0.90.
>
> Do you have a specific example of a UTF-8 coded file that was detected
> as UTF-8 in 23.0.90 and it is detected as Latin-1 in 23.0.91?
>
> For example, I create a UTF-8 file (without UTF-8 byte-order-mark
> "signature") with just the following contents:
>
>   cañón
>
> And 23.0.90 also thinks it is Latin-1.
>
> That said, if you need UTF-8 to be given more priority than Latin-1,
> etc, you can use `set-coding-system-priority' in your .emacs.
>
>     Juanma

I have the following code in my .emacs when I changed to w32 last
June. So the problem might exist longer.

    ;;; FIXME: find out why GNU/Linux does not need this
    (prefer-coding-system 'utf-8)

I just tested some Chinese files. Without that line, all of them are
being opened in latin-1 encoding and are unreadable.

Tested in GNU Emacs 23.0.91.1 (i386-mingw-nt5.1.2600) of 2009-02-26

-- 
.: Leo :. [ sdl.web AT gmail.com ] .: I use Emacs :.

From david@engster.org Fri Feb 27 09:46:41 2009
From: David Engster
To: uwe.siart@tum.de
Cc: 2497@debbugs.gnu.org, emacs-pretest-bug@gnu.org
Subject: Re: bug#2497: 23.0.91; Fails to read UTF-8 on Win2k
In-Reply-To: <877i3c55tg.fsf@tum.de> (Uwe Siart's message of "Fri, 27 Feb 2009 15:10:19 +0100")
User-Agent: Gnus/5.110011 (No Gnus v0.11) Emacs/23.0.60 (gnu/linux)
Date: Fri, 27 Feb 2009 18:46:12 +0100
Message-ID: <87d4d3u61n.fsf@engster.org>

Uwe Siart writes:

> I'm using the windows port of 23.0.91 on Win2k SP4 and I found that it
> fails to read utf-8 encoded files correctly. When visiting a file in
> utf-8 encoding all characters above 255 are screwed up and "C-h C RET"
> indicates iso-latin1-dos for saving the file. This has not been an
> issue in 23.0.90.

Maybe this is a duplicate of what I reported in

http://debbugs.gnu.org/cgi/bugreport.cgi?bug=2354

As I write later in that bug report, I think I could track down this
issue to the change in revision 1.413 of src/coding.c. Maybe you could
try if the same applies to your problem.

-David

From eliz@gnu.org Fri Feb 27 10:19:41 2009
Date: Fri, 27 Feb 2009 20:19:04 +0200
From: Eli Zaretskii
Subject: Re: bug#2497: 23.0.91; Fails to read UTF-8 on Win2k
In-reply-to: <87ljrromgg.fsf@tum.de>
To: uwe.siart@tum.de
Cc: 2497@debbugs.gnu.org
Reply-to: Eli Zaretskii

> From: Uwe Siart
> Cc: 2497@emacsbugs.donarmstrong.com
> Date: Fri, 27 Feb 2009 17:48:15 +0100
>
> It works with "C-x RET c utf-8 RET" immediately prior to "C-x C-f".
>
> > If it does, then the problem is with guessing the encoding, not with
> > decoding it.
>
> That's also my impression.
>
> > Also, what is the default value of buffer-file-coding-system, and was
> > it the same in 23.0.90?
>
> iso-latin-1-dos in 23.0.90 and in 23.0.91.

Then you shouldn't expect Emacs to guess UTF-8 encoding correctly in
every single instance. Distinguishing between UTF-8 and Latin-1 is
generally impossible with the current state of the art of coded
character sets support in Emacs. It might work in certain cases, but
that's sheer luck.

One way to work around that in your specific case, without changing
your global defaults, is to add a `coding:' cookie to your .gnus.el
file.
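
A minimal sketch of the `coding:' cookie suggested above, assuming the
file really is saved as UTF-8. The tag sits in a comment on the first
line of the file. The variable name and the arrow glyph are
hypothetical stand-ins for whatever non-Latin-1 content the real
.gnus.el holds; they are not taken from the reporter's file.

```elisp
;; -*- coding: utf-8 -*-
;;
;; With the tag above on the first line, Emacs decodes this file as
;; UTF-8 when it reads it, instead of guessing between UTF-8 and
;; Latin-1 from the bytes alone.

;; Hypothetical example of content that Latin-1 cannot represent.
(defvar my-summary-arrow "→"
  "Illustrative arrow glyph; the name is an assumption, not a Gnus variable.")
```

The same tag can instead go in a "Local Variables" block at the end of
the file. The global alternatives mentioned earlier in this thread,
(prefer-coding-system 'utf-8) or `set-coding-system-priority' in
.emacs, change the detection priority for every file rather than for
this one only.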
From eliz@gnu.org Fri Feb 27 10:20:20 2009
Date: Fri, 27 Feb 2009 20:19:47 +0200
From: Eli Zaretskii
Subject: Re: bug#2497: 23.0.91; Fails to read UTF-8 on Win2k
To: Juanma Barranquero, 2497@debbugs.gnu.org
Cc: uwe.siart@tum.de
Reply-to: Eli Zaretskii

> Date: Fri, 27 Feb 2009 17:38:37 +0100
> From: Juanma Barranquero
> Cc: 2497@emacsbugs.donarmstrong.com
>
> On Fri, Feb 27, 2009 at 17:23, Uwe Siart wrote:
>
> > Yes. My .gnus.el:
>
> Aha, yes, the bug is reproducible.

Which bug?

From uwe.siart@tum.de Fri Feb 27 12:35:38 2009
X-Spam-Status: No, score=-3.0 required=4.0 tests=HAS_BUG_NUMBER autolearn=ham version=3.2.5-bugs.debian.org_2005_01_02 Received: from mailrelay1.lrz-muenchen.de (mailrelay1.lrz-muenchen.de [129.187.254.106]) by rzlab.ucr.edu (8.13.8/8.13.8/Debian-3) with ESMTP id n1RKZYhI023728 for <2497@emacsbugs.donarmstrong.com>; Fri, 27 Feb 2009 12:35:36 -0800 Received: from PEGASUS ([129.187.100.25] [129.187.100.25]) by mailout.lrz-muenchen.de with ESMTP; Fri, 27 Feb 2009 21:35:06 +0100 From: Uwe Siart To: Eli Zaretskii Cc: 2497@debbugs.gnu.org Subject: Re: bug#2497: 23.0.91; Fails to read UTF-8 on Win2k References: <877i3c55tg.fsf@tum.de> <87ljrromgg.fsf@tum.de> Reply-To: uwe.siart@tum.de Date: Fri, 27 Feb 2009 21:35:08 +0100 In-Reply-To: (Eli Zaretskii's message of "Fri, 27 Feb 2009 20:19:04 +0200") Message-Id: <87ljrr7h4z.fsf@tum.de> User-Agent: Gnus/5.13 (Gnus v5.13) Emacs/23.0.90 (windows-nt) MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Eli Zaretskii writes: >> From: Uwe Siart >> iso-latin-1-dos in 23.0.90 and in 23.0.91. > > Then you shouldn't expect Emacs to guess UTF-8 encoding correctly in > every single instance. Distinguishing between UTF-8 and Latin-1 is > generally impossible with the current state of the art of coded > character sets support in Emacs. It might work in certain cases, but > that's sheer luck. I do not have the background knowledge to join in this conversation but I just observed that it worked correctly for years now (even with CVS Emacsen prior to the 22.1 release) and that it stopped working in 23.0.91. If it appears that this is not a bug then I will take the measures you suggested and set a utf-8 cookie in all files concerned. -- Uwe From lekktu@gmail.com Fri Feb 27 12:38:26 2009 Received: (at 2497) by emacsbugs.donarmstrong.com; 27 Feb 2009 20:38:26 +0000 X-Spam-Checker-Version: SpamAssassin 3.2.5-bugs.debian.org_2005_01_02 (2008-06-10) on rzlab.ucr.edu X-Spam-Level: X-Spam-Bayes: score:0.5 Bayes not run. 
spammytokens:Tokens not available. hammytokens:Tokens not available. X-Spam-Status: No, score=-3.0 required=4.0 tests=HAS_BUG_NUMBER autolearn=ham version=3.2.5-bugs.debian.org_2005_01_02 Received: from mail-ew0-f176.google.com (mail-ew0-f176.google.com [209.85.219.176]) by rzlab.ucr.edu (8.13.8/8.13.8/Debian-3) with ESMTP id n1RKcJc8023755 for <2497@emacsbugs.donarmstrong.com>; Fri, 27 Feb 2009 12:38:20 -0800 Received: by ewy24 with SMTP id 24so1613688ewy.1 for <2497@emacsbugs.donarmstrong.com>; Fri, 27 Feb 2009 12:38:13 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=gamma; h=domainkey-signature:mime-version:received:in-reply-to:references :date:message-id:subject:from:to:cc:content-type :content-transfer-encoding; bh=fHrLO8VLGqOK3ptn3zR4XlLKhUtUHERNoCvzt/20ytc=; b=Xr8pv7C4J8QPAeJLUW6J3s1Gij4/BXP+rZkAY83Ie3kgYUKcK0c/MCoO+TAkxeEKMR Gk4hWnoa62/Rw67K+IqY+Boc8m6uRV4A7Ae3pQjuJMlNniDRsM+RM+Pm2unOdaBKaFOE l4o0sNSgUxR0jXo10SaGgmYPHXhlGWlTQh5U8= DomainKey-Signature: a=rsa-sha1; c=nofws; d=gmail.com; s=gamma; h=mime-version:in-reply-to:references:date:message-id:subject:from:to :cc:content-type:content-transfer-encoding; b=EIhyBJIFdm+JcJ8P3kWlW0y9eUbGlOcIa/UTAF3x/GXv5p7wh7FNR2hqdCj3/Cy/vV 71UkRrlLTtxk2DB7iZuG+s71OcbN7cpp0zxeLaWBDQMWMMblM3p5Q63WzqVVpDGRuEWp c1JgX6Mzh2bKy1V4qd9vrUiAKxxoOcI9+btw8= MIME-Version: 1.0 Received: by 10.210.45.17 with SMTP id s17mr2244434ebs.93.1235767093695; Fri, 27 Feb 2009 12:38:13 -0800 (PST) In-Reply-To: References: <877i3c55tg.fsf@tum.de> <873adzg86o.fsf@tum.de> Date: Fri, 27 Feb 2009 21:38:13 +0100 Message-ID: Subject: Re: bug#2497: 23.0.91; Fails to read UTF-8 on Win2k From: Juanma Barranquero To: Eli Zaretskii Cc: 2497@debbugs.gnu.org, uwe.siart@tum.de Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 7bit On Fri, Feb 27, 2009 at 19:19, Eli Zaretskii wrote: >> Aha, yes, the bug is reproducible. > > Which bug? 
I mean, the fact that the given .gnus.el file was read as utf-8-dos in 23.0.90 and as iso-latin1-dos in 23.0.91 (with characters that are not latin-1). Juanma From uwe.siart@tum.de Fri Feb 27 13:16:03 2009 Received: (at submit) by emacsbugs.donarmstrong.com; 27 Feb 2009 21:16:03 +0000 X-Spam-Checker-Version: SpamAssassin 3.2.5-bugs.debian.org_2005_01_02 (2008-06-10) on rzlab.ucr.edu X-Spam-Level: X-Spam-Bayes: score:0.5 Bayes not run. spammytokens:Tokens not available. hammytokens:Tokens not available. X-Spam-Status: No, score=-3.0 required=4.0 tests=HAS_BUG_NUMBER autolearn=unavailable version=3.2.5-bugs.debian.org_2005_01_02 Received: from fencepost.gnu.org (fencepost.gnu.org [140.186.70.10]) by rzlab.ucr.edu (8.13.8/8.13.8/Debian-3) with ESMTP id n1RLG190001498 for ; Fri, 27 Feb 2009 13:16:02 -0800 Received: from mx10.gnu.org ([199.232.76.166]:50923) by fencepost.gnu.org with esmtp (Exim 4.67) (envelope-from ) id 1LdA1c-0002zG-TF for emacs-pretest-bug@gnu.org; Fri, 27 Feb 2009 16:13:40 -0500 Received: from Debian-exim by monty-python.gnu.org with spam-scanned (Exim 4.60) (envelope-from ) id 1LdA3q-0001h5-UZ for emacs-pretest-bug@gnu.org; Fri, 27 Feb 2009 16:15:59 -0500 Received: from mailrelay1.lrz-muenchen.de ([129.187.254.106]:43135) by monty-python.gnu.org with esmtp (Exim 4.60) (envelope-from ) id 1LdA3q-0001gV-CG for emacs-pretest-bug@gnu.org; Fri, 27 Feb 2009 16:15:58 -0500 Received: from PEGASUS ([129.187.41.149] [129.187.41.149]) by mailout.lrz-muenchen.de with ESMTP; Fri, 27 Feb 2009 22:15:34 +0100 From: Uwe Siart To: 2497@debbugs.gnu.org Cc: emacs-pretest-bug@gnu.org Subject: Re: bug#2497: 23.0.91; Fails to read UTF-8 on Win2k References: <877i3c55tg.fsf@tum.de> <87d4d3u61n.fsf@engster.org> Reply-To: uwe.siart@tum.de Date: Fri, 27 Feb 2009 22:15:36 +0100 In-Reply-To: <87d4d3u61n.fsf@engster.org> (David Engster's message of "Fri, 27 Feb 2009 18:46:12 +0100") Message-Id: <874oyf1szr.fsf@tum.de> User-Agent: Gnus/5.13 (Gnus v5.13) Emacs/23.0.91 
(windows-nt) MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii X-detected-operating-system: by monty-python.gnu.org: GNU/Linux 2.4-2.6 David Engster writes: > Maybe this is a duplicate of what I reported in > > http://debbugs.gnu.org/cgi/bugreport.cgi?bug=2354 > > As I write later in that bug report, I think I could track down this > issue to the change in revision 1.413 of src/coding.c. Maybe you could > try if the same applies to your problem. At least I can reproduce it and it seems to be the very same thing that I stumbled across. But due to lack of detailed knowledge about coding recognition I'm unable to join the discussion whether this is a bug or not. It's just that I felt more comfortable about the previous state. So far I got things back to work with ;; -*- coding:utf-8-dos; -*- as the first line of my .gnus.el :-) -- Uwe From rms@gnu.org Fri Feb 27 15:36:35 2009 Received: (at submit) by emacsbugs.donarmstrong.com; 27 Feb 2009 23:36:35 +0000 X-Spam-Checker-Version: SpamAssassin 3.2.5-bugs.debian.org_2005_01_02 (2008-06-10) on rzlab.ucr.edu X-Spam-Level: X-Spam-Bayes: score:0.5 Bayes not run. spammytokens:Tokens not available. hammytokens:Tokens not available. 
X-Spam-Status: No, score=-3.0 required=4.0 tests=HAS_BUG_NUMBER autolearn=ham version=3.2.5-bugs.debian.org_2005_01_02 Received: from fencepost.gnu.org (fencepost.gnu.org [140.186.70.10]) by rzlab.ucr.edu (8.13.8/8.13.8/Debian-3) with ESMTP id n1RNaRIh012225; Fri, 27 Feb 2009 15:36:28 -0800 Received: from rms by fencepost.gnu.org with local (Exim 4.67) (envelope-from ) id 1LdCDY-0000uj-6X; Fri, 27 Feb 2009 18:34:08 -0500 Content-Type: text/plain; charset=ISO-8859-15 From: Richard M Stallman To: uwe.siart@tum.de, 2497@debbugs.gnu.org CC: emacs-pretest-bug@gnu.org In-reply-to: <877i3c55tg.fsf@tum.de> (message from Uwe Siart on Fri, 27 Feb 2009 15:10:19 +0100) Subject: Re: bug#2497: 23.0.91; Fails to read UTF-8 on Windows2k Reply-to: rms@gnu.org References: <877i3c55tg.fsf@tum.de> Message-Id: Date: Fri, 27 Feb 2009 18:34:08 -0500 Please don't call that system "Win"--that name implies praise. From jasonrumney@gmail.com Fri Feb 27 17:29:40 2009 Received: (at 2497) by emacsbugs.donarmstrong.com; 28 Feb 2009 01:29:40 +0000 X-Spam-Checker-Version: SpamAssassin 3.2.5-bugs.debian.org_2005_01_02 (2008-06-10) on rzlab.ucr.edu X-Spam-Level: X-Spam-Bayes: score:0.5 Bayes not run. spammytokens:Tokens not available. hammytokens:Tokens not available. 
X-Spam-Status: No, score=-2.0 required=4.0 tests=GMAIL,HAS_BUG_NUMBER autolearn=ham version=3.2.5-bugs.debian.org_2005_01_02 Received: from el-out-1112.google.com (el-out-1112.google.com [209.85.162.183]) by rzlab.ucr.edu (8.13.8/8.13.8/Debian-3) with ESMTP id n1S1TbX3018553 for <2497@emacsbugs.donarmstrong.com>; Fri, 27 Feb 2009 17:29:38 -0800 Received: by el-out-1112.google.com with SMTP id o28so1188708ele.22 for <2497@emacsbugs.donarmstrong.com>; Fri, 27 Feb 2009 17:29:36 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=gamma; h=domainkey-signature:received:received:sender:message-id:date:from :user-agent:mime-version:to:cc:subject:references:in-reply-to :content-type:content-transfer-encoding; bh=KX8iBvr9vzQqjO8LzShubc+Uw3NQAf6ZXmlu9pBZP2U=; b=NHUPFtTtbXm1soyDkbGapBioI2lHVWiJEc9ejxAvLMs1T8yV6+ppT9dChbL4r0F6pG VeXR/9Lk0NrPwFla+kvD2ADbv8PyyeMBq8V235wAOLOyR1CWFtliCvq18i9p2gP+tsCM 7PFyRV5XLsqZxtT05kvJjF4SWz7NgbHk58sMw= DomainKey-Signature: a=rsa-sha1; c=nofws; d=gmail.com; s=gamma; h=sender:message-id:date:from:user-agent:mime-version:to:cc:subject :references:in-reply-to:content-type:content-transfer-encoding; b=pqb+hxLOeYbH1+WKXWTv9g6UqqB/NSmiWluGFsnQ57J8mbwTkqZpBEzphGe+VmRCLs psOzQT0LAEgFnsIi5Ky85EBpAyHVdIr0KVGsyGv3QGiI5pnqcRKnYr2Fze/Q/7ApB3oZ yDYf+dKZQoudk9gR3nOVI/WpuC0iDr4o4Nlg8= Received: by 10.110.52.5 with SMTP id z5mr1246015tiz.26.1235784575412; Fri, 27 Feb 2009 17:29:35 -0800 (PST) Received: from ?192.168.249.26? 
([124.13.6.185]) by mx.google.com with ESMTPS id a14sm3218960tia.27.2009.02.27.17.29.32 (version=TLSv1/SSLv3 cipher=RC4-MD5); Fri, 27 Feb 2009 17:29:33 -0800 (PST) Sender: Jason Rumney Message-ID: <49A89361.7040006@f2s.com> Date: Sat, 28 Feb 2009 09:29:05 +0800 From: Jason Rumney User-Agent: Thunderbird 2.0.0.19 (Windows/20081209) MIME-Version: 1.0 To: Eli Zaretskii , 2497@debbugs.gnu.org CC: Juanma Barranquero , uwe.siart@tum.de Subject: Re: bug#2497: 23.0.91; Fails to read UTF-8 on Win2k References: <877i3c55tg.fsf@tum.de> <873adzg86o.fsf@tum.de> In-Reply-To: Content-Type: text/plain; charset=ISO-8859-1; format=flowed Content-Transfer-Encoding: 7bit Eli Zaretskii wrote: >> Date: Fri, 27 Feb 2009 17:38:37 +0100 >> From: Juanma Barranquero >> Cc: 2497@emacsbugs.donarmstrong.com >> >> On Fri, Feb 27, 2009 at 17:23, Uwe Siart wrote: >> >> >>> Yes. My .gnus.el: >>> >> Aha, yes, the bug is reproducible. >> > > Which bug? > The one where the OP's .gnus.el contains characters which were correctly detected as UTF-8 in 23.0.90, but now appear as \200\224 octal escapes, as the file is incorrectly detected as Latin-1. From jasonrumney@gmail.com Fri Feb 27 17:33:30 2009 Received: (at 2497) by emacsbugs.donarmstrong.com; 28 Feb 2009 01:33:30 +0000 X-Spam-Checker-Version: SpamAssassin 3.2.5-bugs.debian.org_2005_01_02 (2008-06-10) on rzlab.ucr.edu X-Spam-Level: X-Spam-Bayes: score:0.5 Bayes not run. spammytokens:Tokens not available. hammytokens:Tokens not available. 
X-Spam-Status: No, score=-3.0 required=4.0 tests=HAS_BUG_NUMBER autolearn=ham version=3.2.5-bugs.debian.org_2005_01_02 Received: from wf-out-1314.google.com (wf-out-1314.google.com [209.85.200.168]) by rzlab.ucr.edu (8.13.8/8.13.8/Debian-3) with ESMTP id n1S1XLBJ020072; Fri, 27 Feb 2009 17:33:23 -0800 Received: by wf-out-1314.google.com with SMTP id 24so1627758wfg.13 for ; Fri, 27 Feb 2009 17:33:21 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=gamma; h=domainkey-signature:received:received:sender:message-id:date:from :user-agent:mime-version:to:subject:references:in-reply-to :content-type:content-transfer-encoding; bh=cQXLo9yPykRIeugxzGSb2Lic4XwRqMqlEB6lmjWySR8=; b=RUU3blHrr3wCQO7GU4gKJHr7yhsffwWyE/l7Xt4CH4JHiumBRkrGmzE7DSqz5NYMOX VtYoSHb8kbVpLncoGarLlj97rZyuthVN49o1eAGaK7PwBN3c2jEjSXqyDEssyQ/vRDCL QxIYGnj2eRkao11Dy+IWuRsUW5aK3mwXM6mXs= DomainKey-Signature: a=rsa-sha1; c=nofws; d=gmail.com; s=gamma; h=sender:message-id:date:from:user-agent:mime-version:to:subject :references:in-reply-to:content-type:content-transfer-encoding; b=WpnrJ8eStBH8GnLOxovbgldfKEdBR1cI+VVfK1ca0GY4hWpeBoDKfDQNFnXxuhlwjW wSAEeP3Wycjy97HeFB4YsJmqB+/jDZauZhzAFLunP4ZLolYktojdBo3ZFNthdRsVz47e TWnbZ+sHHfJbRePMggoa340f144glyDf4hQUk= Received: by 10.110.47.9 with SMTP id u9mr4724313tiu.4.1235784800874; Fri, 27 Feb 2009 17:33:20 -0800 (PST) Received: from ?192.168.249.26? 
([124.13.6.185]) by mx.google.com with ESMTPS id a14sm2903034tia.7.2009.02.27.17.33.18 (version=TLSv1/SSLv3 cipher=RC4-MD5); Fri, 27 Feb 2009 17:33:19 -0800 (PST) Sender: Jason Rumney Message-ID: <49A89443.9080202@gnu.org> Date: Sat, 28 Feb 2009 09:32:51 +0800 From: Jason Rumney User-Agent: Thunderbird 2.0.0.19 (Windows/20081209) MIME-Version: 1.0 To: David Engster , 2497@debbugs.gnu.org Subject: Re: bug#2497: 23.0.91; Fails to read UTF-8 on Win2k References: <877i3c55tg.fsf@tum.de> <87d4d3u61n.fsf@engster.org> In-Reply-To: <87d4d3u61n.fsf@engster.org> Content-Type: text/plain; charset=ISO-8859-1; format=flowed Content-Transfer-Encoding: 7bit merge 2354 2497 David Engster wrote: > Maybe this is a duplicate of what I reported in > > http://debbugs.gnu.org/cgi/bugreport.cgi?bug=2354 > It seems so, yes. From monnier@iro.umontreal.ca Fri Feb 27 20:40:11 2009 Received: (at 2497) by emacsbugs.donarmstrong.com; 28 Feb 2009 04:40:12 +0000 X-Spam-Checker-Version: SpamAssassin 3.2.5-bugs.debian.org_2005_01_02 (2008-06-10) on rzlab.ucr.edu X-Spam-Level: X-Spam-Bayes: score:0.5 Bayes not run. spammytokens:Tokens not available. hammytokens:Tokens not available. 
X-Spam-Status: No, score=-0.4 required=4.0 tests=FOURLA,HAS_BUG_NUMBER, XIRONPORT autolearn=ham version=3.2.5-bugs.debian.org_2005_01_02 Received: from ironport2-out.teksavvy.com (ironport2-out.pppoe.ca [206.248.154.182]) by rzlab.ucr.edu (8.13.8/8.13.8/Debian-3) with ESMTP id n1S4e7ff014940 for <2497@emacsbugs.donarmstrong.com>; Fri, 27 Feb 2009 20:40:09 -0800 X-IronPort-Anti-Spam-Filtered: true X-IronPort-Anti-Spam-Result: ArIFACtPqEnO+JhN/2dsb2JhbACBWdVZhBQGg2s X-IronPort-AV: E=Sophos;i="4.38,280,1233550800"; d="scan'208";a="34517798" Received: from 206-248-152-77.dsl.teksavvy.com (HELO pastel.home) ([206.248.152.77]) by ironport2-out.teksavvy.com with ESMTP; 27 Feb 2009 23:40:02 -0500 Received: by pastel.home (Postfix, from userid 20848) id 9F9D09170; Fri, 27 Feb 2009 23:40:01 -0500 (EST) From: Stefan Monnier To: Eli Zaretskii Cc: 2497@debbugs.gnu.org, uwe.siart@tum.de Subject: Re: bug#2497: 23.0.91; Fails to read UTF-8 on Win2k Message-ID: References: <877i3c55tg.fsf@tum.de> <87ljrromgg.fsf@tum.de> Date: Fri, 27 Feb 2009 23:40:01 -0500 In-Reply-To: (Eli Zaretskii's message of "Fri, 27 Feb 2009 20:19:04 +0200") User-Agent: Gnus/5.13 (Gnus v5.13) Emacs/23.0.91 (gnu/linux) MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii >> It works with "C-x RET c utf-8 RET" immediately prior to "C-x C-f". >> > If it does, then the problem is with guessing the encoding, not with >> > decoding it. >> That's also my impression. >> > Also, what is the default value of buffer-file-coding-system, and was >> > it the same in 23.0.90? >> iso-latin-1-dos in 23.0.90 and in 23.0.91. > Then you shouldn't expect Emacs to guess UTF-8 encoding correctly in > every single instance. Distinguishing between UTF-8 and Latin-1 is The guessing shouldn't give priority to buffer-file-coding-system. Instead we have the set-coding-system-priority instead. And IIUC utf-8 should always have a pretty high priority since false positives are fairly rare. So this still looks like a real bug. 
Stefan From uwe.siart@tum.de Sat Feb 28 00:18:08 2009 Received: (at 2497) by emacsbugs.donarmstrong.com; 28 Feb 2009 08:18:08 +0000 X-Spam-Checker-Version: SpamAssassin 3.2.5-bugs.debian.org_2005_01_02 (2008-06-10) on rzlab.ucr.edu X-Spam-Level: X-Spam-Bayes: score:0.5 Bayes not run. spammytokens:Tokens not available. hammytokens:Tokens not available. X-Spam-Status: No, score=-2.9 required=4.0 tests=FOURLA,HAS_BUG_NUMBER autolearn=ham version=3.2.5-bugs.debian.org_2005_01_02 Received: from mailrelay1.lrz-muenchen.de (mailrelay1.lrz-muenchen.de [129.187.254.106]) by rzlab.ucr.edu (8.13.8/8.13.8/Debian-3) with ESMTP id n1S8I3iq020962 for <2497@emacsbugs.donarmstrong.com>; Sat, 28 Feb 2009 00:18:05 -0800 Received: from PEGASUS ([129.187.98.247] [129.187.98.247]) by mailout.lrz-muenchen.de with ESMTP; Sat, 28 Feb 2009 09:17:32 +0100 From: Uwe Siart To: Stefan Monnier Cc: Eli Zaretskii , 2497@debbugs.gnu.org Subject: Re: bug#2497: 23.0.91; Fails to read UTF-8 on Win2k References: <877i3c55tg.fsf@tum.de> <87ljrromgg.fsf@tum.de> Reply-To: uwe.siart@tum.de Date: Sat, 28 Feb 2009 09:17:35 +0100 In-Reply-To: (Stefan Monnier's message of "Fri, 27 Feb 2009 23:40:01 -0500") Message-Id: <87zlg7t1pc.fsf@tum.de> User-Agent: Gnus/5.13 (Gnus v5.13) Emacs/23.0.91 (windows-nt) MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Stefan Monnier writes: > The guessing shouldn't give priority to buffer-file-coding-system. > Instead we have the set-coding-system-priority instead. And IIUC utf-8 > should always have a pretty high priority since false positives are > fairly rare. So this still looks like a real bug. Here I would like to note that I never had false positives in the past (before 23.0.91) but I do have false positives now. Therefore I'm inclined to call it a bug. 
-- Uwe From uwe.siart@tum.de Sat Feb 28 01:48:03 2009 Received: (at submit) by emacsbugs.donarmstrong.com; 28 Feb 2009 09:48:03 +0000 X-Spam-Checker-Version: SpamAssassin 3.2.5-bugs.debian.org_2005_01_02 (2008-06-10) on rzlab.ucr.edu X-Spam-Level: X-Spam-Bayes: score:0.5 Bayes not run. spammytokens:Tokens not available. hammytokens:Tokens not available. X-Spam-Status: No, score=-3.0 required=4.0 tests=HAS_BUG_NUMBER autolearn=unavailable version=3.2.5-bugs.debian.org_2005_01_02 Received: from fencepost.gnu.org (fencepost.gnu.org [140.186.70.10]) by rzlab.ucr.edu (8.13.8/8.13.8/Debian-3) with ESMTP id n1S9m0bX019167 for ; Sat, 28 Feb 2009 01:48:01 -0800 Received: from mail.gnu.org ([199.232.76.166]:46974 helo=mx10.gnu.org) by fencepost.gnu.org with esmtp (Exim 4.67) (envelope-from ) id 1LdLlM-0008V9-OQ for emacs-pretest-bug@gnu.org; Sat, 28 Feb 2009 04:45:40 -0500 Received: from Debian-exim by monty-python.gnu.org with spam-scanned (Exim 4.60) (envelope-from ) id 1LdLnZ-0002I7-Sw for emacs-pretest-bug@gnu.org; Sat, 28 Feb 2009 04:48:00 -0500 Received: from mailrelay1.lrz-muenchen.de ([129.187.254.106]:50008) by monty-python.gnu.org with esmtp (Exim 4.60) (envelope-from ) id 1LdLnV-0002Gi-83; Sat, 28 Feb 2009 04:47:53 -0500 Received: from PEGASUS ([129.187.100.84] [129.187.100.84]) by mailout.lrz-muenchen.de with ESMTP; Sat, 28 Feb 2009 10:47:42 +0100 From: Uwe Siart To: rms@gnu.org Cc: 2497@debbugs.gnu.org, emacs-pretest-bug@gnu.org Subject: Re: bug#2497: 23.0.91; Fails to read UTF-8 on Windows2k References: <877i3c55tg.fsf@tum.de> Reply-To: uwe.siart@tum.de Date: Sat, 28 Feb 2009 10:47:44 +0100 In-Reply-To: (Richard M. 
Stallman's message of "Fri, 27 Feb 2009 18:34:08 -0500") Message-Id: <87ocwmevun.fsf@tum.de> User-Agent: Gnus/5.13 (Gnus v5.13) Emacs/23.0.91 (windows-nt) MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii X-detected-operating-system: by monty-python.gnu.org: GNU/Linux 2.4-2.6 Richard M Stallman writes: > Please don't call that system "Win"--that name implies praise. How right you are. Forgive me my trespasses. In my own defence I have to say that I never thought of W2k as the "system". My system is Emacs and I'm very comfortable with it. W2k is its boot loader. The boot loader does not become noticeable too much. I never understood, however, why this boot loader takes up a whole CD. -- Uwe From david@engster.org Sat Feb 28 02:14:28 2009 Received: (at 2497) by emacsbugs.donarmstrong.com; 28 Feb 2009 10:14:28 +0000 X-Spam-Checker-Version: SpamAssassin 3.2.5-bugs.debian.org_2005_01_02 (2008-06-10) on rzlab.ucr.edu X-Spam-Level: X-Spam-Bayes: score:0.5 Bayes not run. spammytokens:Tokens not available. hammytokens:Tokens not available. 
X-Spam-Status: No, score=-1.7 required=4.0 tests=FOURLA,HAS_BUG_NUMBER, IMPRONONCABLE_1,MURPHY_WRONG_WORD2 autolearn=ham version=3.2.5-bugs.debian.org_2005_01_02 Received: from m61s02.vlinux.de (m61s02.vlinux.de [83.151.21.164]) by rzlab.ucr.edu (8.13.8/8.13.8/Debian-3) with ESMTP id n1SAENKA028887 for <2497@emacsbugs.donarmstrong.com>; Sat, 28 Feb 2009 02:14:25 -0800 Received: from dslc-082-082-164-201.pools.arcor-ip.net ([82.82.164.201] helo=void) by m61s02.vlinux.de with esmtpsa (TLS1.0:DHE_RSA_AES_128_CBC_SHA1:16) (Exim 4.69) (envelope-from ) id 1LdMFU-00060w-VZ; Sat, 28 Feb 2009 11:16:49 +0100 From: David Engster To: uwe.siart@tum.de Cc: 2497@debbugs.gnu.org, Stefan Monnier Subject: Re: bug#2497: 23.0.91; Fails to read UTF-8 on Win2k In-Reply-To: <87zlg7t1pc.fsf@tum.de> (Uwe Siart's message of "Sat, 28 Feb 2009 09:17:35 +0100") References: <877i3c55tg.fsf@tum.de> <87ljrromgg.fsf@tum.de> <87zlg7t1pc.fsf@tum.de> User-Agent: Gnus/5.110011 (No Gnus v0.11) Emacs/23.0.91 (gnu/linux) Mail-Copies-To: never Date: Sat, 28 Feb 2009 11:14:16 +0100 Message-ID: <87tz6e3m2v.fsf@engster.org> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Uwe Siart writes: > Stefan Monnier writes: > >> The guessing shouldn't give priority to buffer-file-coding-system. >> Instead we have the set-coding-system-priority instead. And IIUC utf-8 >> should always have a pretty high priority since false positives are >> fairly rare. So this still looks like a real bug. > > Here I would like to note that I never had false positives in the past > (before 23.0.91) but I do have false positives now. Therefore I'm > inclined to call it a bug. I second this - this has worked for years without problems, and suddenly it fails to detect UTF-8 with a Latin-1 environment. 
I once again confirmed that this behaviour can be tracked down to this change in detect_coding_charset in coding.c (revision 1.413): --- coding.c 7 Feb 2009 10:49:39 -0000 1.412 +++ coding.c 9 Feb 2009 00:42:37 -0000 1.413 @@ -5101,7 +5101,7 @@ valids = AREF (attrs, coding_attr_charset_valids); name = CODING_ID_NAME (coding->id); if (VECTORP (Vlatin_extra_code_table) - && strcmp ((char *) SDATA (SYMBOL_NAME (name)), "iso-8859-")) + && strcmp ((char *) SDATA (SYMBOL_NAME (name)), "iso-8859-") == 0) check_latin_extra = 1; if (! NILP (CODING_ATTR_ASCII_COMPAT (attrs))) src += head_ascii; I'm inclined to say that this change is wrong, since strcmp will only return 0 if two strings are exactly equal. In this case though, the string "iso-8859-" is compared to "iso-8859-1" (in my case), so it returns 1 and therefore check_latin_extra is not set. -David From eliz@gnu.org Sat Feb 28 02:50:10 2009 Received: (at 2497) by emacsbugs.donarmstrong.com; 28 Feb 2009 10:50:10 +0000 X-Spam-Checker-Version: SpamAssassin 3.2.5-bugs.debian.org_2005_01_02 (2008-06-10) on rzlab.ucr.edu X-Spam-Level: X-Spam-Bayes: score:0.5 Bayes not run. spammytokens:Tokens not available. hammytokens:Tokens not available. 
X-Spam-Status: No, score=-2.9 required=4.0 tests=FOURLA,HAS_BUG_NUMBER autolearn=ham version=3.2.5-bugs.debian.org_2005_01_02 Received: from mtaout1.012.net.il (mtaout1.012.net.il [84.95.2.1]) by rzlab.ucr.edu (8.13.8/8.13.8/Debian-3) with ESMTP id n1SAo6rZ009236 for <2497@emacsbugs.donarmstrong.com>; Sat, 28 Feb 2009 02:50:07 -0800 Received: from conversion-daemon.i-mtaout1.012.net.il by i-mtaout1.012.net.il (HyperSendmail v2007.08) id <0KFR00500V7YH500@i-mtaout1.012.net.il> for 2497@emacsbugs.donarmstrong.com; Sat, 28 Feb 2009 12:50:38 +0200 (IST) Received: from HOME-C4E4A596F7 ([77.127.167.119]) by i-mtaout1.012.net.il (HyperSendmail v2007.08) with ESMTPA id <0KFR004ZXVG7P6D0@i-mtaout1.012.net.il>; Sat, 28 Feb 2009 12:50:33 +0200 (IST) Date: Sat, 28 Feb 2009 12:49:58 +0200 From: Eli Zaretskii Subject: Re: bug#2497: 23.0.91; Fails to read UTF-8 on Win2k In-reply-to: X-012-Sender: halo1@inter.net.il To: Stefan Monnier , Kenichi Handa Cc: 2497@debbugs.gnu.org, uwe.siart@tum.de Reply-to: Eli Zaretskii Message-id: References: <877i3c55tg.fsf@tum.de> <87ljrromgg.fsf@tum.de> > From: Stefan Monnier > Cc: 2497@emacsbugs.donarmstrong.com, uwe.siart@tum.de > Date: Fri, 27 Feb 2009 23:40:01 -0500 > > >> It works with "C-x RET c utf-8 RET" immediately prior to "C-x C-f". > >> > If it does, then the problem is with guessing the encoding, not with > >> > decoding it. > >> That's also my impression. > >> > Also, what is the default value of buffer-file-coding-system, and was > >> > it the same in 23.0.90? > >> iso-latin-1-dos in 23.0.90 and in 23.0.91. > > Then you shouldn't expect Emacs to guess UTF-8 encoding correctly in > > every single instance. Distinguishing between UTF-8 and Latin-1 is > > The guessing shouldn't give priority to buffer-file-coding-system. > Instead we have the set-coding-system-priority instead. Please give me some credit: I said ``the _default_value_ of buffer-file-coding-system''. That default tells volumes about the coding-system priorities. 
> And IIUC utf-8 should always have a pretty high priority With today's CVS on a Windows XP machine I get this: M-: (coding-system-priority-list) RET => (iso-latin-1 utf-8 iso-2022-7bit iso-2022-7bit-lock iso-2022-8bit-ss2 emacs-mule raw-text iso-2022-jp in-is13194-devanagari chinese-iso-8bit utf-8-auto utf-8-with-signature utf-16 utf-16be-with-signature utf-16le-with-signature utf-16be utf-16le japanese-shift-jis undecided) So UTF-8 is indeed ``pretty high'', but lower than the locale's default. > So this still looks like a real bug. Perhaps it is, but I didn't know Emacs 23 can reliably distinguish between Latin-1 and UTF-8, even when UTF-8 sequences are present in the text. Can we do that reliably? Perhaps Handa-san can shed some light on this. From eliz@gnu.org Sat Feb 28 04:10:33 2009 Received: (at 2497) by emacsbugs.donarmstrong.com; 28 Feb 2009 12:10:33 +0000 X-Spam-Checker-Version: SpamAssassin 3.2.5-bugs.debian.org_2005_01_02 (2008-06-10) on rzlab.ucr.edu X-Spam-Level: X-Spam-Bayes: score:0.5 Bayes not run. spammytokens:Tokens not available. hammytokens:Tokens not available. 
X-Spam-Status: No, score=-1.6 required=4.0 tests=FOURLA,HAS_BUG_NUMBER, IMPRONONCABLE_1,MURPHY_DRUGS_REL8,MURPHY_WRONG_WORD1,MURPHY_WRONG_WORD2 autolearn=ham version=3.2.5-bugs.debian.org_2005_01_02 Received: from mtaout1.012.net.il (mtaout1.012.net.il [84.95.2.1]) by rzlab.ucr.edu (8.13.8/8.13.8/Debian-3) with ESMTP id n1SCAScP008573 for <2497@emacsbugs.donarmstrong.com>; Sat, 28 Feb 2009 04:10:30 -0800 Received: from conversion-daemon.i-mtaout1.012.net.il by i-mtaout1.012.net.il (HyperSendmail v2007.08) id <0KFR00K00YXKPN00@i-mtaout1.012.net.il> for 2497@emacsbugs.donarmstrong.com; Sat, 28 Feb 2009 14:09:38 +0200 (IST) Received: from HOME-C4E4A596F7 ([77.127.167.119]) by i-mtaout1.012.net.il (HyperSendmail v2007.08) with ESMTPA id <0KFR004VLZ400VA0@i-mtaout1.012.net.il>; Sat, 28 Feb 2009 14:09:37 +0200 (IST) Date: Sat, 28 Feb 2009 14:09:04 +0200 From: Eli Zaretskii Subject: Re: bug#2497: 23.0.91; Fails to read UTF-8 on Win2k In-reply-to: <87tz6e3m2v.fsf@engster.org> X-012-Sender: halo1@inter.net.il To: David Engster , 2497@debbugs.gnu.org Cc: uwe.siart@tum.de Reply-to: Eli Zaretskii Message-id: References: <877i3c55tg.fsf@tum.de> <87ljrromgg.fsf@tum.de> <87zlg7t1pc.fsf@tum.de> <87tz6e3m2v.fsf@engster.org> > From: David Engster > Date: Sat, 28 Feb 2009 11:14:16 +0100 > Cc: 2497@emacsbugs.donarmstrong.com > > I once again confirmed that this behaviour can be tracked down to this > change in detect_coding_charset in coding.c (revision 1.413): > > --- coding.c 7 Feb 2009 10:49:39 -0000 1.412 > +++ coding.c 9 Feb 2009 00:42:37 -0000 1.413 > @@ -5101,7 +5101,7 @@ > valids = AREF (attrs, coding_attr_charset_valids); > name = CODING_ID_NAME (coding->id); > if (VECTORP (Vlatin_extra_code_table) > - && strcmp ((char *) SDATA (SYMBOL_NAME (name)), "iso-8859-")) > + && strcmp ((char *) SDATA (SYMBOL_NAME (name)), "iso-8859-") == 0) > check_latin_extra = 1; > if (! 
NILP (CODING_ATTR_ASCII_COMPAT (attrs))) > src += head_ascii; > > I'm inclined to say that this change is wrong, since strcmp will only > return 0 if two strings are exactly equal. In this case though, the > string "iso-8859-" is compared to "iso-8859-1" (in my case), so it > returns 1 and therefore check_latin_extra is not set. You are right. But in my case, it was not enough to test for "iso-8859-", as the symbol's name was "iso-latin-1", not "iso-8859-1". I installed the patch below, that does seem to fix the problem with the OP's .gnus.el, although I don't know how general that problem is, nor whether Emacs is capable of distinguishing UTF-8 from Latin-N in general. 2009-02-28 Eli Zaretskii * coding.c (detect_coding_charset): Fix change from 2008-10-21. Also, check iso-latin-*, not only iso-8859-*. Index: src/coding.c =================================================================== RCS file: /cvsroot/emacs/emacs/src/coding.c,v retrieving revision 1.419 diff -u -r1.419 coding.c --- src/coding.c 22 Feb 2009 15:48:03 -0000 1.419 +++ src/coding.c 28 Feb 2009 12:01:18 -0000 @@ -5103,7 +5103,10 @@ valids = AREF (attrs, coding_attr_charset_valids); name = CODING_ID_NAME (coding->id); if (VECTORP (Vlatin_extra_code_table) - && strcmp ((char *) SDATA (SYMBOL_NAME (name)), "iso-8859-") == 0) + && (strncmp ((char *) SDATA (SYMBOL_NAME (name)), + "iso-8859-", sizeof ("iso-8859-") - 1) == 0 + || strncmp ((char *) SDATA (SYMBOL_NAME (name)), + "iso-latin-", sizeof ("iso-latin-") - 1) == 0)) check_latin_extra = 1; if (! NILP (CODING_ATTR_ASCII_COMPAT (attrs))) src += head_ascii; From uwe.siart@tum.de Sat Feb 28 04:16:46 2009 Received: (at 2497) by emacsbugs.donarmstrong.com; 28 Feb 2009 12:16:46 +0000 X-Spam-Checker-Version: SpamAssassin 3.2.5-bugs.debian.org_2005_01_02 (2008-06-10) on rzlab.ucr.edu X-Spam-Level: X-Spam-Bayes: score:0.5 Bayes not run. spammytokens:Tokens not available. hammytokens:Tokens not available. 
X-Spam-Status: No, score=-3.0 required=4.0 tests=HAS_BUG_NUMBER autolearn=ham version=3.2.5-bugs.debian.org_2005_01_02 Received: from mailrelay1.lrz-muenchen.de (mailrelay1.lrz-muenchen.de [129.187.254.106]) by rzlab.ucr.edu (8.13.8/8.13.8/Debian-3) with ESMTP id n1SCGf5S009950 for <2497@emacsbugs.donarmstrong.com>; Sat, 28 Feb 2009 04:16:43 -0800 Received: from PEGASUS ([129.187.51.177] [129.187.51.177]) by mailout.lrz-muenchen.de with ESMTP; Sat, 28 Feb 2009 13:16:10 +0100 From: Uwe Siart To: Eli Zaretskii Cc: Stefan Monnier , Kenichi Handa , 2497@debbugs.gnu.org Subject: Re: bug#2497: 23.0.91; Fails to read UTF-8 on Win2k References: <877i3c55tg.fsf@tum.de> <87ljrromgg.fsf@tum.de> Reply-To: uwe.siart@tum.de Date: Sat, 28 Feb 2009 13:16:08 +0100 In-Reply-To: (Eli Zaretskii's message of "Sat, 28 Feb 2009 12:49:58 +0200") Message-Id: <87ab86ah9z.fsf@tum.de> User-Agent: Gnus/5.13 (Gnus v5.13) Emacs/23.0.91 (windows-nt) MIME-Version: 1.0 Content-Type: text/plain; charset=iso-8859-1 Content-Transfer-Encoding: quoted-printable Eli Zaretskii writes: >> From: Stefan Monnier >> So this still looks like a real bug. > > Perhaps it is, but I didn't know Emacs 23 can reliably distinguish > between Latin-1 and UTF-8, even when UTF-8 sequences are present in > the text. Can we do that reliably? Perhaps Handa-san can shed some > light on this. Finding a solution to do it reliably would of course be the best. Assumed this is not possible right now we should distinguish between »high reliability« and »poor reliability«. From my perception it has been much more reliable earlier so (as a user with limited viewpoint) I vote for reverting the change. -- Uwe From eliz@gnu.org Sat Feb 28 04:21:20 2009 Received: (at 2497-done) by emacsbugs.donarmstrong.com; 28 Feb 2009 12:21:20 +0000 X-Spam-Checker-Version: SpamAssassin 3.2.5-bugs.debian.org_2005_01_02 (2008-06-10) on rzlab.ucr.edu X-Spam-Level: X-Spam-Bayes: score:0.5 Bayes not run. spammytokens:Tokens not available.
hammytokens:Tokens not available. X-Spam-Status: No, score=-2.9 required=4.0 tests=FOURLA,HAS_BUG_NUMBER autolearn=ham version=3.2.5-bugs.debian.org_2005_01_02 Received: from mtaout2.012.net.il (mtaout2.012.net.il [84.95.2.4]) by rzlab.ucr.edu (8.13.8/8.13.8/Debian-3) with ESMTP id n1SCLBmc011258; Sat, 28 Feb 2009 04:21:12 -0800 Received: from conversion-daemon.i_mtaout2.012.net.il by i_mtaout2.012.net.il (HyperSendmail v2004.12) id <0KFR00H00ZLJR800@i_mtaout2.012.net.il>; Sat, 28 Feb 2009 14:21:42 +0200 (IST) Received: from HOME-C4E4A596F7 ([77.127.167.119]) by i_mtaout2.012.net.il (HyperSendmail v2004.12) with ESMTPA id <0KFR0028OZO478C2@i_mtaout2.012.net.il>; Sat, 28 Feb 2009 14:21:42 +0200 (IST) Date: Sat, 28 Feb 2009 14:21:08 +0200 From: Eli Zaretskii Subject: Re: bug#2497: 23.0.91; Fails to read UTF-8 on Win2k In-reply-to: <87d4d3u61n.fsf@engster.org> X-012-Sender: halo1@inter.net.il To: 2497-done@debbugs.gnu.org, 2354-done@debbugs.gnu.org Reply-to: Eli Zaretskii Message-id: References: <877i3c55tg.fsf@tum.de> <87d4d3u61n.fsf@engster.org> > From: David Engster > Date: Fri, 27 Feb 2009 18:46:12 +0100 > Cc: emacs-pretest-bug@gnu.org, 2497@emacsbugs.donarmstrong.com > > Uwe Siart writes: > > I'm using the windows port of 23.0.91 on Win2k SP4 and I found that it > > fails to read utf-8 encoded files correctly. When visiting a file in > > utf-8 encoding all characters above 255 are screwed up and "C-h C RET" > > indicates iso-latin1-dos for saving the file. This has not been an > > issue in 23.0.90. > > Maybe this is a duplicate of what I reported in > > http://debbugs.gnu.org/cgi/bugreport.cgi?bug=2354 > > As I write later in that bug report, I think I could track down this > issue to the change in revision 1.413 of src/coding.c. Maybe you could > try if the same applies to your problem. Should be fixed by this change: 2009-02-28 Eli Zaretskii * coding.c (detect_coding_charset): Fix change from 2008-10-21. Also, check iso-latin-*, not only iso-8859-*. 
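[Editorial illustration, not part of the archived thread.] The difference between the broken test and the fixed one can be shown in isolation. This is only a minimal sketch, not the code in coding.c: has_prefix, is_latin_name and is_latin_name_broken are hypothetical helpers, and the coding-system name is assumed to be available as a plain C string rather than as a Lisp symbol name.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical helper (not part of coding.c): return nonzero if S
   begins with PREFIX.  strncmp looks only at the first
   strlen (PREFIX) bytes, so whatever follows the prefix is ignored.  */
static int
has_prefix (const char *s, const char *prefix)
{
  assert (s != NULL && prefix != NULL);
  return strncmp (s, prefix, strlen (prefix)) == 0;
}

/* Sketch of the fixed condition: the name qualifies if it starts
   with either "iso-8859-" or "iso-latin-".  */
static int
is_latin_name (const char *name)
{
  return has_prefix (name, "iso-8859-") || has_prefix (name, "iso-latin-");
}

/* The broken condition, for comparison: strcmp returns 0 only for
   an exact match, so it is nonzero for a name such as "iso-8859-1".  */
static int
is_latin_name_broken (const char *name)
{
  return strcmp (name, "iso-8859-") == 0;
}
```

With these definitions, is_latin_name accepts both "iso-8859-1" and "iso-latin-1", whereas is_latin_name_broken accepts only the literal string "iso-8859-", which is why check_latin_extra was never set before the fix.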
From jasonrumney@gmail.com Sat Feb 28 06:16:56 2009 Received: (at 2497) by emacsbugs.donarmstrong.com; 28 Feb 2009 14:16:56 +0000 X-Spam-Checker-Version: SpamAssassin 3.2.5-bugs.debian.org_2005_01_02 (2008-06-10) on rzlab.ucr.edu X-Spam-Level: X-Spam-Bayes: score:0.5 Bayes not run. spammytokens:Tokens not available. hammytokens:Tokens not available. X-Spam-Status: No, score=-3.0 required=4.0 tests=HAS_BUG_NUMBER, MURPHY_DRUGS_REL8 autolearn=ham version=3.2.5-bugs.debian.org_2005_01_02 Received: from ti-out-0910.google.com (ti-out-0910.google.com [209.85.142.187]) by rzlab.ucr.edu (8.13.8/8.13.8/Debian-3) with ESMTP id n1SEGqPf010405 for <2497@emacsbugs.donarmstrong.com>; Sat, 28 Feb 2009 06:16:54 -0800 Received: by ti-out-0910.google.com with SMTP id 28so2141851tif.1 for <2497@emacsbugs.donarmstrong.com>; Sat, 28 Feb 2009 06:16:52 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=gamma; h=domainkey-signature:received:received:sender:message-id:date:from :user-agent:mime-version:to:cc:subject:references:in-reply-to :content-type:content-transfer-encoding; bh=4IGXtfOszmsE36WUrwXWPeXslG1aB9yhk837n8HJFi0=; b=MEsSUHEfslhmu00y4O8JcjXFrESECDtZ13CeCVUHE8s6Dtz97HhcKGWZbAzT+G/oec jaZD3w3oucWZ3JBAbtaa9pov61zvxifUke50QIqA8qBYpbLd88iRVn1OFcitnmlaLRua +zVuSlzSJ4t7uOXWc/gmCbgqMberqBTMu2hNQ= DomainKey-Signature: a=rsa-sha1; c=nofws; d=gmail.com; s=gamma; h=sender:message-id:date:from:user-agent:mime-version:to:cc:subject :references:in-reply-to:content-type:content-transfer-encoding; b=VjlsYIKUXulnu5Fta7lTB57FTjCxZQkE+hKxVpQMmBRAuMpsPfq6uaZ6tQXnCBzWsy XnqZkngtOHqJH4vYEkLPIW25pJzpSah8s0uPcMhIQkcNJpZcg3Sdb0n6naMEwARmb8rj HtYrPOYlNXzhYoKDIGbCgZ93GjQXp1yfNhcfA= Received: by 10.110.41.20 with SMTP id o20mr5560501tio.31.1235830612044; Sat, 28 Feb 2009 06:16:52 -0800 (PST) Received: from ?192.168.249.26? 
([124.13.6.185]) by mx.google.com with ESMTPS id b7sm5081354tic.15.2009.02.28.06.16.49 (version=TLSv1/SSLv3 cipher=RC4-MD5); Sat, 28 Feb 2009 06:16:50 -0800 (PST) Sender: Jason Rumney Message-ID: <49A94736.5000206@f2s.com> Date: Sat, 28 Feb 2009 22:16:22 +0800 From: Jason Rumney User-Agent: Thunderbird 2.0.0.19 (Windows/20081209) MIME-Version: 1.0 To: Eli Zaretskii , 2497@debbugs.gnu.org CC: David Engster , uwe.siart@tum.de Subject: Re: bug#2497: 23.0.91; Fails to read UTF-8 on Win2k References: <877i3c55tg.fsf@tum.de> <87ljrromgg.fsf@tum.de> <87zlg7t1pc.fsf@tum.de> <87tz6e3m2v.fsf@engster.org> In-Reply-To: Content-Type: text/plain; charset=ISO-8859-1; format=flowed Content-Transfer-Encoding: 7bit Eli Zaretskii wrote: > You are right. But in my case, it was not enough to test for > "iso-8859-", as the symbol's name was "iso-latin-1", not "iso-8859-1". > > I installed the patch below, that does seem to fix the problem with > the OP's .gnus.el, although I don't know how general that problem is, > nor whether Emacs is capable of distinguishing UTF-8 from Latin-N in > general. > I installed a further change for the case where latin-extra-code-table is not a vector. But I don't understand why we have this table, and why the default value allows the 6 C1 control codes PU1, PU2, STS, CCH, MW and SPA to appear in latin text without breaking the auto detection. Are these control characters really that common? From david@engster.org Sat Feb 28 06:31:58 2009 Received: (at 2497) by emacsbugs.donarmstrong.com; 28 Feb 2009 14:31:59 +0000 X-Spam-Checker-Version: SpamAssassin 3.2.5-bugs.debian.org_2005_01_02 (2008-06-10) on rzlab.ucr.edu X-Spam-Level: X-Spam-Bayes: score:0.5 Bayes not run. spammytokens:Tokens not available. hammytokens:Tokens not available. 
X-Spam-Status: No, score=-1.8 required=4.0 tests=HAS_BUG_NUMBER, IMPRONONCABLE_1,MURPHY_DRUGS_REL8,MURPHY_WRONG_WORD2 autolearn=ham version=3.2.5-bugs.debian.org_2005_01_02 Received: from m61s02.vlinux.de (m61s02.vlinux.de [83.151.21.164]) by rzlab.ucr.edu (8.13.8/8.13.8/Debian-3) with ESMTP id n1SEVsci014484 for <2497@emacsbugs.donarmstrong.com>; Sat, 28 Feb 2009 06:31:56 -0800 Received: from dslc-082-082-164-201.pools.arcor-ip.net ([82.82.164.201] helo=void) by m61s02.vlinux.de with esmtpsa (TLS1.0:DHE_RSA_AES_128_CBC_SHA1:16) (Exim 4.69) (envelope-from ) id 1LdQGi-0007C5-G1; Sat, 28 Feb 2009 15:34:20 +0100 From: David Engster To: Eli Zaretskii Cc: 2497@debbugs.gnu.org, uwe.siart@tum.de Subject: Re: bug#2497: 23.0.91; Fails to read UTF-8 on Win2k In-Reply-To: (Eli Zaretskii's message of "Sat, 28 Feb 2009 14:09:04 +0200") References: <877i3c55tg.fsf@tum.de> <87ljrromgg.fsf@tum.de> <87zlg7t1pc.fsf@tum.de> <87tz6e3m2v.fsf@engster.org> User-Agent: Gnus/5.110011 (No Gnus v0.11) Emacs/23.0.91 (gnu/linux) Date: Sat, 28 Feb 2009 15:31:47 +0100 Message-ID: <87hc2e3a5o.fsf@engster.org> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Eli Zaretskii writes: >> From: David Engster >> I'm inclined to say that this change is wrong, since strcmp will only >> return 0 if two strings are exactly equal. In this case though, the >> string "iso-8859-" is compared to "iso-8859-1" (in my case), so it >> returns 1 and therefore check_latin_extra is not set. > > You are right. But in my case, it was not enough to test for > "iso-8859-", as the symbol's name was "iso-latin-1", not "iso-8859-1". > > I installed the patch below, that does seem to fix the problem with > the OP's .gnus.el, although I don't know how general that problem is, > nor whether Emacs is capable of distinguishing UTF-8 from Latin-N in > general. I can confirm this patch fixes my original bug report (#2354). Thanks! 
-David From rms@gnu.org Sat Feb 28 10:10:36 2009 Received: (at submit) by emacsbugs.donarmstrong.com; 28 Feb 2009 18:10:37 +0000 X-Spam-Checker-Version: SpamAssassin 3.2.5-bugs.debian.org_2005_01_02 (2008-06-10) on rzlab.ucr.edu X-Spam-Level: X-Spam-Bayes: score:0.5 Bayes not run. spammytokens:Tokens not available. hammytokens:Tokens not available. X-Spam-Status: No, score=-3.0 required=4.0 tests=HAS_BUG_NUMBER autolearn=ham version=3.2.5-bugs.debian.org_2005_01_02 Received: from fencepost.gnu.org (fencepost.gnu.org [140.186.70.10]) by rzlab.ucr.edu (8.13.8/8.13.8/Debian-3) with ESMTP id n1SIAR5j010842; Sat, 28 Feb 2009 10:10:29 -0800 Received: from rms by fencepost.gnu.org with local (Exim 4.67) (envelope-from ) id 1LdTba-0006iK-Cn; Sat, 28 Feb 2009 13:08:06 -0500 Content-Type: text/plain; charset=ISO-8859-15 From: Richard M Stallman To: uwe.siart@tum.de, 2497@debbugs.gnu.org CC: emacs-pretest-bug@gnu.org, 2497@debbugs.gnu.org In-reply-to: <87ocwmevun.fsf@tum.de> (message from Uwe Siart on Sat, 28 Feb 2009 10:47:44 +0100) Subject: Re: bug#2497: 23.0.91; Fails to read UTF-8 on Windows2k Reply-to: rms@gnu.org References: <87ocwmevun.fsf@tum.de> Message-Id: Date: Sat, 28 Feb 2009 13:08:06 -0500 How right you are. Forgive me my trespasses. Only Emacs can forgive you, but I am confident that it will. In my own defence I have to say that I never thought of W2k as the "system". My system is Emacs and I'm very comfortable with it. W2k is its boot loader. Why not switch to a free boot loader then? From monnier@iro.umontreal.ca Sat Feb 28 14:00:53 2009 Received: (at 2497) by emacsbugs.donarmstrong.com; 28 Feb 2009 22:00:53 +0000 X-Spam-Checker-Version: SpamAssassin 3.2.5-bugs.debian.org_2005_01_02 (2008-06-10) on rzlab.ucr.edu X-Spam-Level: X-Spam-Bayes: score:0.5 Bayes not run. spammytokens:Tokens not available. hammytokens:Tokens not available. 
X-Spam-Status: No, score=-0.4 required=4.0 tests=FOURLA,HAS_BUG_NUMBER, XIRONPORT autolearn=ham version=3.2.5-bugs.debian.org_2005_01_02 Received: from ironport2-out.teksavvy.com (ironport2-out.teksavvy.com [206.248.154.182]) by rzlab.ucr.edu (8.13.8/8.13.8/Debian-3) with ESMTP id n1SM0nYX008133 for <2497@emacsbugs.donarmstrong.com>; Sat, 28 Feb 2009 14:00:50 -0800 X-IronPort-Anti-Spam-Filtered: true X-IronPort-Anti-Spam-Result: ApsEAOpCqUnO+JhN/2dsb2JhbACBWdUghBoGg3c X-IronPort-AV: E=Sophos;i="4.38,282,1233550800"; d="scan'208";a="34538253" Received: from 206-248-152-77.dsl.teksavvy.com (HELO pastel.home) ([206.248.152.77]) by ironport2-out.teksavvy.com with ESMTP; 28 Feb 2009 17:00:43 -0500 Received: by pastel.home (Postfix, from userid 20848) id 1E7EA7FE9; Sat, 28 Feb 2009 17:00:43 -0500 (EST) From: Stefan Monnier To: uwe.siart@tum.de Cc: Eli Zaretskii , 2497@debbugs.gnu.org Subject: Re: bug#2497: 23.0.91; Fails to read UTF-8 on Win2k Message-ID: References: <877i3c55tg.fsf@tum.de> <87ljrromgg.fsf@tum.de> <87zlg7t1pc.fsf@tum.de> Date: Sat, 28 Feb 2009 17:00:43 -0500 In-Reply-To: <87zlg7t1pc.fsf@tum.de> (Uwe Siart's message of "Sat, 28 Feb 2009 09:17:35 +0100") User-Agent: Gnus/5.13 (Gnus v5.13) Emacs/23.0.91 (gnu/linux) MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii >> The guessing shouldn't give priority to buffer-file-coding-system. >> Instead we have the set-coding-system-priority instead. And IIUC utf-8 >> should always have a pretty high priority since false positives are >> fairly rare. So this still looks like a real bug. > Here I would like to note that I never had false positives in the past > (before 23.0.91) but I do have false positives now. Therefore I'm > inclined to call it a bug. To clear things up: by "false positives" I meant text that Emacs thinks is valid utf-8 whereas it's really using some other coding system. 
Stefan From monnier@iro.umontreal.ca Sat Feb 28 14:04:44 2009 Received: (at 2497) by emacsbugs.donarmstrong.com; 28 Feb 2009 22:04:44 +0000 X-Spam-Checker-Version: SpamAssassin 3.2.5-bugs.debian.org_2005_01_02 (2008-06-10) on rzlab.ucr.edu X-Spam-Level: X-Spam-Bayes: score:0.5 Bayes not run. spammytokens:Tokens not available. hammytokens:Tokens not available. X-Spam-Status: No, score=-0.5 required=4.0 tests=HAS_BUG_NUMBER,XIRONPORT autolearn=ham version=3.2.5-bugs.debian.org_2005_01_02 Received: from ironport2-out.teksavvy.com (ironport2-out.pppoe.ca [206.248.154.182]) by rzlab.ucr.edu (8.13.8/8.13.8/Debian-3) with ESMTP id n1SM4fsU008212 for <2497@emacsbugs.donarmstrong.com>; Sat, 28 Feb 2009 14:04:42 -0800 X-IronPort-Anti-Spam-Filtered: true X-IronPort-Anti-Spam-Result: ApsEABdEqUnO+JhN/2dsb2JhbACBWNUghBoGg3c X-IronPort-AV: E=Sophos;i="4.38,282,1233550800"; d="scan'208";a="34538338" Received: from 206-248-152-77.dsl.teksavvy.com (HELO pastel.home) ([206.248.152.77]) by ironport2-out.teksavvy.com with ESMTP; 28 Feb 2009 17:04:35 -0500 Received: by pastel.home (Postfix, from userid 20848) id 873147FE9; Sat, 28 Feb 2009 17:04:35 -0500 (EST) From: Stefan Monnier To: Eli Zaretskii Cc: Kenichi Handa , 2497@debbugs.gnu.org, uwe.siart@tum.de Subject: Re: bug#2497: 23.0.91; Fails to read UTF-8 on Win2k Message-ID: References: <877i3c55tg.fsf@tum.de> <87ljrromgg.fsf@tum.de> Date: Sat, 28 Feb 2009 17:04:35 -0500 In-Reply-To: (Eli Zaretskii's message of "Sat, 28 Feb 2009 12:49:58 +0200") User-Agent: Gnus/5.13 (Gnus v5.13) Emacs/23.0.91 (gnu/linux) MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii >> The guessing shouldn't give priority to buffer-file-coding-system. >> Instead we have the set-coding-system-priority instead. > Please give me some credit: I said ``the _default_value_ of > buffer-file-coding-system''. That default tells volumes about the > coding-system priorities. 
I'm sorry for my bad wording: what I wrote was only meant to describe the way the code is currently expected to work (AFAIK). > M-: (coding-system-priority-list) RET > => (iso-latin-1 utf-8 iso-2022-7bit iso-2022-7bit-lock iso-2022-8bit-ss2 emacs-mule raw-text iso-2022-jp in-is13194-devanagari chinese-iso-8bit utf-8-auto utf-8-with-signature utf-16 utf-16be-with-signature utf-16le-with-signature utf-16be utf-16le japanese-shift-jis undecided) > So UTF-8 is indeed ``pretty high'', but lower than the locale's > default. That seems to be the source of the problem. utf-8 should always come before latin-1 in that list, since utf-8 streams that are valid latin-1 streams are not uncommon, whereas latin-1 streams that are valid utf-8 streams are extremely rare. Stefan From handa@m17n.org Mon Mar 2 03:43:28 2009 Received: (at 2497) by emacsbugs.donarmstrong.com; 2 Mar 2009 11:43:28 +0000 X-Spam-Checker-Version: SpamAssassin 3.2.5-bugs.debian.org_2005_01_02 (2008-06-10) on rzlab.ucr.edu X-Spam-Level: X-Spam-Bayes: score:0.5 Bayes not run. spammytokens:Tokens not available. hammytokens:Tokens not available. 
X-Spam-Status: No, score=-2.0 required=4.0 tests=FVGT_m_MULTI_ODD, HAS_BUG_NUMBER,IMPRONONCABLE_2 autolearn=ham version=3.2.5-bugs.debian.org_2005_01_02 Received: from mx1.aist.go.jp (mx1.aist.go.jp [150.29.246.133]) by rzlab.ucr.edu (8.13.8/8.13.8/Debian-3) with ESMTP id n22BhO18002481 for <2497@emacsbugs.donarmstrong.com>; Mon, 2 Mar 2009 03:43:25 -0800 Received: from rqsmtp2.aist.go.jp (rqsmtp2.aist.go.jp [150.29.254.123]) by mx1.aist.go.jp with ESMTP id n22BhMFJ005580; Mon, 2 Mar 2009 20:43:22 +0900 (JST) env-from (handa@m17n.org) Received: from smtp2.aist.go.jp by rqsmtp2.aist.go.jp with ESMTP id n22BhMKR017824; Mon, 2 Mar 2009 20:43:22 +0900 (JST) env-from (handa@m17n.org) Received: by smtp2.aist.go.jp with ESMTP id n22BhKWb006691; Mon, 2 Mar 2009 20:43:20 +0900 (JST) env-from (handa@m17n.org) Received: from handa by etlken with local (Exim 4.69) (envelope-from ) id 1Le6Yw-0006th-Om; Mon, 02 Mar 2009 20:43:58 +0900 From: Kenichi Handa To: Eli Zaretskii CC: monnier@iro.umontreal.ca, 2497@debbugs.gnu.org, uwe.siart@tum.de In-reply-to: (message from Eli Zaretskii on Sat, 28 Feb 2009 12:49:58 +0200) Subject: Re: bug#2497: 23.0.91; Fails to read UTF-8 on Win2k References: <877i3c55tg.fsf@tum.de> <87ljrromgg.fsf@tum.de> MIME-version: 1.0 Content-type: text/plain; charset=utf-8 Content-Transfer-Encoding: 8bit Message-Id: Date: Mon, 02 Mar 2009 20:43:58 +0900 In article , Eli Zaretskii writes: > M-: (coding-system-priority-list) RET >>> (iso-latin-1 utf-8 iso-2022-7bit iso-2022-7bit-lock iso-2022-8bit-ss2 emacs-mule raw-text iso-2022-jp in-is13194-devanagari chinese-iso-8bit utf-8-auto utf-8-with-signature utf-16 utf-16be-with-signature utf-16le-with-signature utf-16be utf-16le japanese-shift-jis undecided) > So UTF-8 is indeed ``pretty high'', but lower than the locale's > default. > > So this still looks like a real bug. 
> Perhaps it is, but I didn't know Emacs 23 can reliably distinguish > between Latin-1 and UTF-8, even when UTF-8 sequences are present in > the text. Can we do that reliably? Perhaps Handa-san can shed some > light on this. The coding system iso-latin-1 is for the character set iso-8859-1, and the code-space of iso-8859-1 is 0x00..0xFF (without gap, i.e. including 0x80..0x9F) (see /usr/share/i18n/charmaps/ISO-8859-1.gz). So, if we follow it strictly, any byte sequence can be a correct iso-8859-1 stream, and it means that when iso-latin-1 has the highest priority, all files are detected as iso-latin-1. So, as far as we strictly follow the definition of iso-8859-1... In article , Stefan Monnier writes: > That seems to be the source of the problem. utf-8 should always come > before latin-1 in that list, since utf-8 streams that are valid latin-1 > streams are not uncommon, whereas latin-1 streams that are valid utf-8 > streams are extremely rare. I think that is the only solution. In article <87ab86ah9z.fsf@tum.de>, Uwe Siart writes: > Assumed this is not possible right now we should distinguish between > »high reliability« and »poor reliability«. From my perception it has > been much more reliable earlier so (as a user with limited viewpoint) > I vote for reverting the change. In Emacs 22, the coding system iso-latin-1 was defined as a variant of iso-2022-based coding system, and thus 0x80..0x9F were not valid bytes (except for 0x91 etc. in latin-extra-code-table). So, some UTF-8 texts were not detected as iso-latin-1. To recover that behaviour, we can define iso-latin-1 as before by doing this: (define-coding-system 'iso-latin-1 "Emacs 22 iso-latin-1." :mnemonic ?1 :coding-type 'iso-2022 :charset-list '(ascii latin-iso8859-1) :ascii-compatible-p t :mime-charset 'iso-8859-1 :designation [ascii latin-iso8859-1 nil nil]) But, even with that, some valid UTF-8 texts will still be detected as iso-latin-1.
So I don't think this is the solution of "high reliability". --- Kenichi Handa handa@m17n.org From monnier@iro.umontreal.ca Mon Mar 2 07:25:54 2009 Received: (at 2497) by emacsbugs.donarmstrong.com; 2 Mar 2009 15:25:55 +0000 X-Spam-Checker-Version: SpamAssassin 3.2.5-bugs.debian.org_2005_01_02 (2008-06-10) on rzlab.ucr.edu X-Spam-Level: X-Spam-Bayes: score:0.5 Bayes not run. spammytokens:Tokens not available. hammytokens:Tokens not available. X-Spam-Status: No, score=-0.5 required=4.0 tests=HAS_BUG_NUMBER,XIRONPORT autolearn=ham version=3.2.5-bugs.debian.org_2005_01_02 Received: from ironport2-out.teksavvy.com (ironport2-out.teksavvy.com [206.248.154.182]) by rzlab.ucr.edu (8.13.8/8.13.8/Debian-3) with ESMTP id n22FPp9L028323 for <2497@emacsbugs.donarmstrong.com>; Mon, 2 Mar 2009 07:25:52 -0800 X-IronPort-Anti-Spam-Filtered: true X-IronPort-Anti-Spam-Result: AhIFAJOJq0lFxIQh/2dsb2JhbACBWdR6hBoGg3k X-IronPort-AV: E=Sophos;i="4.38,289,1233550800"; d="scan'208";a="34578400" Received: from 69-196-132-33.dsl.teksavvy.com (HELO pastel.home) ([69.196.132.33]) by ironport2-out.teksavvy.com with ESMTP; 02 Mar 2009 10:25:45 -0500 Received: by pastel.home (Postfix, from userid 20848) id 253A87FE9; Mon, 2 Mar 2009 10:25:45 -0500 (EST) From: Stefan Monnier To: Kenichi Handa Cc: Eli Zaretskii , 2497@debbugs.gnu.org, uwe.siart@tum.de Subject: Re: bug#2497: 23.0.91; Fails to read UTF-8 on Win2k Message-ID: References: <877i3c55tg.fsf@tum.de> <87ljrromgg.fsf@tum.de> Date: Mon, 02 Mar 2009 10:25:45 -0500 In-Reply-To: (Kenichi Handa's message of "Mon, 02 Mar 2009 20:43:58 +0900") User-Agent: Gnus/5.13 (Gnus v5.13) Emacs/23.0.91 (gnu/linux) MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii >> That seems to be the source of the problem. utf-8 should always come >> before latin-1 in that list, since utf-8 streams that are valid latin-1 >> streams are not uncommon, whereas latin-1 streams that are valid utf-8 >> streams are extremely rare. > I think that is the only solution. 
Not only is it the only solution, but it's a solution on which we already agreed several years ago. So, again, the bug is in the ordering, and we have to figure out which code ends up putting latin-1 before utf-8 in the coding system priority. Stefan From eliz@gnu.org Mon Mar 2 11:26:09 2009 Received: (at 2497) by emacsbugs.donarmstrong.com; 2 Mar 2009 19:26:10 +0000 X-Spam-Checker-Version: SpamAssassin 3.2.5-bugs.debian.org_2005_01_02 (2008-06-10) on rzlab.ucr.edu X-Spam-Level: X-Spam-Bayes: score:0.5 Bayes not run. spammytokens:Tokens not available. hammytokens:Tokens not available. X-Spam-Status: No, score=-2.9 required=4.0 tests=FOURLA,HAS_BUG_NUMBER autolearn=ham version=3.2.5-bugs.debian.org_2005_01_02 Received: from mtaout6.012.net.il (mtaout6.012.net.il [84.95.2.16]) by rzlab.ucr.edu (8.13.8/8.13.8/Debian-3) with ESMTP id n22JQ6J7025265 for <2497@emacsbugs.donarmstrong.com>; Mon, 2 Mar 2009 11:26:08 -0800 Received: from conversion-daemon.i-mtaout6.012.net.il by i-mtaout6.012.net.il (HyperSendmail v2007.08) id <0KFW0050084GPC00@i-mtaout6.012.net.il> for 2497@emacsbugs.donarmstrong.com; Mon, 02 Mar 2009 21:26:40 +0200 (IST) Received: from HOME-C4E4A596F7 ([84.229.248.57]) by i-mtaout6.012.net.il (HyperSendmail v2007.08) with ESMTPA id <0KFW00JES8OD4SP0@i-mtaout6.012.net.il>; Mon, 02 Mar 2009 21:26:38 +0200 (IST) Date: Mon, 02 Mar 2009 21:25:58 +0200 From: Eli Zaretskii Subject: Re: bug#2497: 23.0.91; Fails to read UTF-8 on Win2k In-reply-to: X-012-Sender: halo1@inter.net.il To: Stefan Monnier Cc: handa@m17n.org, 2497@debbugs.gnu.org, uwe.siart@tum.de Reply-to: Eli Zaretskii Message-id: References: <877i3c55tg.fsf@tum.de> <87ljrromgg.fsf@tum.de> > From: Stefan Monnier > Cc: Eli Zaretskii , 2497@emacsbugs.donarmstrong.com, uwe.siart@tum.de > Date: Mon, 02 Mar 2009 10:25:45 -0500 > > So, again, the bug is in the ordering Actually, the OP was complaining that, even with this ordering, Emacs 23 did TRT for him, and that a recent change broke that.
That bug is fixed now, I believe, so you are talking about a more general problem. > we have to figure out which code ends up putting latin-1 before utf-8 in > the coding system priority. Well, I think this is fairly easy: set-locale-environment does it. Observe: (defun set-locale-environment (&optional locale-name frame) "Set up multi-lingual environment for using LOCALE-NAME. This sets the language environment, the coding system priority, the default input method and sometimes other things. ... (let ((language-name (locale-name-match locale locale-language-names)) (charset-language-name (locale-name-match locale locale-charset-language-names)) (default-eol-type (coding-system-eol-type default-buffer-file-coding-system)) (coding-system (or (locale-name-match locale locale-preferred-coding-systems) (when locale (if (string-match "\\.\\([^@]+\\)" locale) (locale-charset-to-coding-system (match-string 1 locale))))))) ... (when (and (not frame) coding-system (not (coding-system-equal coding-system locale-coding-system))) >>>>> (prefer-coding-system coding-system) ;; Fixme: perhaps prefer-coding-system should set this too. ;; But it's not the time to do such a fundamental change. (setq default-sendmail-coding-system coding-system) (setq locale-coding-system coding-system)))) Even the doc string says that the coding system priority is set according to the locale's native encoding. From monnier@IRO.UMontreal.CA Tue Mar 3 08:34:56 2009 Received: (at 2497) by emacsbugs.donarmstrong.com; 3 Mar 2009 16:34:56 +0000 X-Spam-Checker-Version: SpamAssassin 3.2.5-bugs.debian.org_2005_01_02 (2008-06-10) on rzlab.ucr.edu X-Spam-Level: X-Spam-Bayes: score:0.5 Bayes not run. spammytokens:Tokens not available. hammytokens:Tokens not available. 
X-Spam-Status: No, score=-3.0 required=4.0 tests=HAS_BUG_NUMBER autolearn=ham version=3.2.5-bugs.debian.org_2005_01_02 Received: from chene.dit.umontreal.ca (chene.dit.umontreal.ca [132.204.246.20]) by rzlab.ucr.edu (8.13.8/8.13.8/Debian-3) with ESMTP id n23GYmwd008341 for <2497@emacsbugs.donarmstrong.com>; Tue, 3 Mar 2009 08:34:50 -0800 Received: from alfajor.home (vpn-132-204-232-170.acd.umontreal.ca [132.204.232.170]) by chene.dit.umontreal.ca (8.14.1/8.14.1) with ESMTP id n23GYknd005904; Tue, 3 Mar 2009 11:34:46 -0500 Received: by alfajor.home (Postfix, from userid 20848) id 0FA93A22E6; Tue, 3 Mar 2009 11:34:46 -0500 (EST) From: Stefan Monnier To: Eli Zaretskii Cc: handa@m17n.org, 2497@debbugs.gnu.org, uwe.siart@tum.de Subject: Re: bug#2497: 23.0.91; Fails to read UTF-8 on Win2k Message-ID: References: <877i3c55tg.fsf@tum.de> <87ljrromgg.fsf@tum.de> Date: Tue, 03 Mar 2009 11:34:45 -0500 In-Reply-To: (Eli Zaretskii's message of "Mon, 02 Mar 2009 21:25:58 +0200") User-Agent: Gnus/5.13 (Gnus v5.13) Emacs/23.0.90 (gnu/linux) MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii X-NAI-Spam-Score: 0 X-NAI-Spam-Rules: 1 Rules triggered RV3222=0 >> So, again, the bug is in the ordering > Actually, the OP was complaining that, even with this ordering, Emacs > 23 did TRT for him, and that a recent change broke that. That bug is > fixed now, I believe, so you are talking about a more general problem. Yes. I didn't realize that the reason why it worked before is because we were lucky. >> we have to figure out which code ends up putting latin-1 before utf-8 in >> the coding system priority. > Well, I think this is fairly easy: set-locale-environment does it. > Observe: > (defun set-locale-environment (&optional locale-name frame) [...] >>>>>> (prefer-coding-system coding-system) [...] > Even the doc string says that the coding system priority is set > according to the locale's native encoding. Indeed, thanks for spotting it.
Can someone change this code so it doesn't move utf-8 from first to second place? Stefan From unknown Mon Jun 23 18:26:18 2025 Received: (at fakecontrol) by fakecontrolmessage; To: internal_control@debbugs.gnu.org From: $requester Subject: Internal Control Message-Id: bug archived. Date: Wed, 01 Apr 2009 14:24:09 +0000 User-Agent: Fakemail v42.6.9 # A New Hope # A log time ago, in a galaxy far, far away # something happened. # # Magically this resulted in the following # action being taken, but this fake control # message doesn't tell you why it happened # # The action: # bug archived. thanks # This fakemail brought to you by your local debbugs # administrator
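[Editorial illustration, not part of the archived thread.] The argument for trying utf-8 before latin-1 rests on an asymmetry: every byte sequence is a valid iso-8859-1 stream, while accented Latin-1 text almost never happens to be well-formed UTF-8. The sketch below shows that asymmetry with a stand-alone validity check. It is an assumption-laden illustration only: is_valid_utf8 is a hypothetical function, not Emacs's detector, and the byte ranges follow the well-formedness table of RFC 3629.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical check (not Emacs code): return 1 if BUF[0..LEN) is
   well-formed UTF-8 in the sense of RFC 3629, else 0.  A check for
   iso-8859-1 would need no code at all, because every byte value
   0x00..0xFF is valid there.  */
static int
is_valid_utf8 (const unsigned char *buf, size_t len)
{
  size_t i = 0;

  assert (buf != NULL || len == 0);
  while (i < len)
    {
      unsigned char c = buf[i];
      unsigned char lo = 0x80, hi = 0xBF; /* allowed range of 2nd byte */
      size_t need, k;

      if (c < 0x80)
        {
          i++;                  /* plain ASCII */
          continue;
        }
      else if (c >= 0xC2 && c <= 0xDF)
        need = 1;               /* 2-byte sequence */
      else if (c >= 0xE0 && c <= 0xEF)
        {
          need = 2;             /* 3-byte sequence */
          if (c == 0xE0)
            lo = 0xA0;          /* reject overlong forms */
          if (c == 0xED)
            hi = 0x9F;          /* reject surrogates */
        }
      else if (c >= 0xF0 && c <= 0xF4)
        {
          need = 3;             /* 4-byte sequence */
          if (c == 0xF0)
            lo = 0x90;          /* reject overlong forms */
          if (c == 0xF4)
            hi = 0x8F;          /* stay at or below U+10FFFF */
        }
      else
        return 0;               /* stray continuation or invalid lead */

      if (len - i - 1 < need)
        return 0;               /* sequence truncated at end of input */
      if (buf[i + 1] < lo || buf[i + 1] > hi)
        return 0;
      for (k = 2; k <= need; k++)
        if ((buf[i + k] & 0xC0) != 0x80)
          return 0;
      i += need + 1;
    }
  return 1;
}
```

For example, the Latin-1 bytes for "café" end in a lone 0xE9, which is rejected here, while the UTF-8 encoding of the same word (ending in 0xC3 0xA9) is accepted. That second byte string is also a valid Latin-1 stream, which is why a detector that asks about latin-1 first never gets as far as utf-8.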