From unknown Mon Jun 23 22:03:56 2025
From: bug#20623 <20623@debbugs.gnu.org>
To: bug#20623 <20623@debbugs.gnu.org>
Subject: Status: XML and HTML files with encoding/charset="utf-8" declaration lose BOM; Coding system is reset from utf-8-with-signature to utf-8 on save
Reply-To: bug#20623 <20623@debbugs.gnu.org>
Date: Tue, 24 Jun 2025 05:03:56 +0000

retitle 20623 XML and HTML files with encoding/charset="utf-8" declaration lose BOM; Coding system is reset from utf-8-with-signature to utf-8 on save
reassign 20623 emacs
submitter 20623 Simon Ledergerber
severity 20623 normal
thanks

From debbugs-submit-bounces@debbugs.gnu.org Thu May 21 14:52:41 2015
Message-ID: <555E2912.7060509@gmx.net>
Date: Thu, 21 May 2015 20:50:58 +0200
From: Simon Ledergerber
To: bug-gnu-emacs@gnu.org
Subject: XML and HTML files with encoding/charset="utf-8" declaration loose BOM; Coding system is reset from utf-8-with-signature to utf-8 on save

Hi

When I was editing XHTML and HTML files, I wanted to make sure the BOM
was written out to the file in order to make it easier for the browser
to detect the UTF-8 encoding. Therefore I changed the coding system for
the file buffer to utf-8-with-signature-dos (since I am working on a
Windows system) before saving the file.

After some time I was surprised because the browser (IE11) didn't
report UTF-8 as the file's encoding. Having checked the hexdump of my
(X)HTML file, I saw the BOM was definitely missing.

Obviously, when a "UTF-8" string appears in the <meta charset="utf-8">
(even if commented out, see later below) or <?xml version="1.0"
encoding="utf-8"?> declaration, Emacs switches the file coding system to
utf-8 when it saves the file, even if utf-8-with-signature was
specified explicitly before. This appears to me to be a bug, because
there is no longer any way to restore the BOM using Emacs.

I was not sure whether my bug is related to bug #8282, so I decided to
report it (again).

My Emacs version is: 24.5.1 (x86_64-unknown-cygwin) of 2015-04-10 on
Windows 8.1 x64. I am running Emacs in text mode only, inside a Cygwin
console.

This is my .emacs.d/init.el:

  (line-number-mode)
  (column-number-mode)
  (setq-default fill-column 80)
  (setq-default buffer-file-coding-system 'utf-8-dos)
  (setq-default indent-tabs-mode nil)

With XML the problem can be reproduced in the most basic way by the
following steps:

- Create a new file with C-x C-f in the current directory. Name it
  test.txt for example.
- Switch to fundamental mode with M-x fundamental-mode.
- Type the text ''
- Now save the file and check again: The encoding system for the buffer
  has changed to utf-8-dos and the BOM has disappeared from the file!

Now the steps for HTML:

- Create a new file test1.txt in the current directory.
- Fill it with the following simple and yet incomplete HTML5 document:
  Test
- Change the coding system to utf-8-with-signature-dos and save the file.
- Verify that the coding system for the buffer is correct and the BOM is
  really written: Yes, it is.
- Insert the following *comment* between and :
  <!-- <meta charset="utf-8"> -->
- Save the file and verify: The coding system has changed to utf-8-dos
  and the BOM has vanished, even if it is just a comment and has no
  effect!

Regards
Simon

P. S. Information as reported by M-x report-emacs-bug:

In GNU Emacs 24.5.1 (x86_64-unknown-cygwin)
 of 2015-04-10 on desktop-new
Configured using:
 `configure
 --srcdir=/home/kbrown/src/cygemacs/emacs-24.5-1.x86_64/src/emacs-24.5
 --prefix=/usr --exec-prefix=/usr --localstatedir=/var
 --sysconfdir=/etc --docdir=/usr/share/doc/emacs
 --htmldir=/usr/share/doc/emacs/html -C --with-x=no 'CFLAGS=-ggdb -O2
 -pipe -Wimplicit-function-declaration
 -fdebug-prefix-map=/home/kbrown/src/cygemacs/emacs-24.5-1.x86_64/build=/usr/src/debug/emacs-24.5-1
 -fdebug-prefix-map=/home/kbrown/src/cygemacs/emacs-24.5-1.x86_64/src/emacs-24.5=/usr/src/debug/emacs-24.5-1'
 CPPFLAGS= LDFLAGS='

Important settings:
  value of $LANG: en_US.UTF-8
  locale-coding-system: utf-8-unix

Major mode: Help

Minor modes in effect:
  tooltip-mode: t
  electric-indent-mode: t
  menu-bar-mode: t
  file-name-shadow-mode: t
  global-font-lock-mode: t
  font-lock-mode: t
  auto-composition-mode: t
  auto-encryption-mode: t
  auto-compression-mode: t
  buffer-read-only: t
  column-number-mode: t
  line-number-mode: t
  transient-mark-mode: t

Recent messages:
Beginning of buffer [3 times]
Saving file /cygdrive/c/users/.../html_basics/basic.xhtml...
Wrote /cygdrive/c/users/.../html_basics/basic.xhtml
Mark set [2 times]
Auto-saving...done
Mark set [2 times]
Saving file /cygdrive/c/users/.../html_basics/basic.xhtml...
Wrote /cygdrive/c/users/.../html_basics/basic.xhtml
No docstring slot for help-mode-setup
No docstring slot for help-mode-finish

Load-path shadows:
None found.

Features:
(shadow sort gnus-util mail-extr emacsbug message format-spec rfc822 mml
mml-sec mm-decode mm-bodies mm-encode mail-parse rfc2231 mailabbrev
gmm-utils mailheader sendmail rfc2047 rfc2045 ietf-drums mm-util
help-fns mail-prsvr mail-utils misearch multi-isearch mule-diag
help-mode easymenu regexp-opt sgml-mode xterm time-date tooltip electric
uniquify ediff-hook vc-hooks lisp-float-type tabulated-list newcomment
lisp-mode prog-mode register page menu-bar rfn-eshadow timer select
mouse jit-lock font-lock syntax facemenu font-core frame cham georgian
utf-8-lang misc-lang vietnamese tibetan thai tai-viet lao korean
japanese hebrew greek romanian slovak czech european ethiopic indian
cyrillic chinese case-table epa-hook jka-cmpr-hook help simple abbrev
minibuffer nadvice loaddefs button faces cus-face macroexp files
text-properties overlay sha1 md5 base64 format env code-pages mule
custom widget hashtable-print-readable backquote make-network-process
dbusbind gfilenotify multi-tty emacs)

Memory information:
((conses 16 81797 4691)
 (symbols 48 17091 0)
 (miscs 40 73 387)
 (strings 32 11233 4887)
 (string-bytes 1 291872)
 (vectors 16 7587)
 (vector-slots 8 342125 27930)
 (floats 8 57 393)
 (intervals 56 834 26)
 (buffers 960 21))

From debbugs-submit-bounces@debbugs.gnu.org Thu May 21 15:48:47 2015
Date: Thu, 21 May 2015 22:48:31 +0300
From: Eli Zaretskii <eliz@gnu.org>
Subject: Re: bug#20623: XML and HTML files with encoding/charset="utf-8" declaration loose BOM; Coding system is reset from utf-8-with-signature to utf-8 on save
In-reply-to: <555E2912.7060509@gmx.net>
To: Simon Ledergerber <sledergerber@gmx.net>
Message-id: <83iobl67ao.fsf@gnu.org>
References: <555E2912.7060509@gmx.net>
Cc: 20623@debbugs.gnu.org
Reply-To: Eli Zaretskii <eliz@gnu.org>

> Date: Thu, 21 May 2015 20:50:58 +0200
> From: Simon Ledergerber <sledergerber@gmx.net>
>
> When I was editing XHTML and HTML files, I wanted to make sure the BOM
> was written out to the file in order to make it easier for the browser
> to detect the UTF-8 encoding. Therefore I changed the coding system for
> the file buffer to utf-8-with-signature-dos (since I am working on a
> Windows System) before saving the file.
>
> After some time I got surprised because the browser (IE11), didn't
> report UTF-8 as the file's encoding. Having checked the hexdump of my
> (X)HTML file, I saw the BOM was definitely missing.
>
> Obviously, when a "UTF-8" string appears in the <meta charset="utf-8">
> (even if commented out, see later below) or <?xml version="1.0"
> encoding="utf-8"?> declaration, Emacs switches the file coding system to
> utf-8, when it saves the file, even if utf-8-with-signature was
> specified explicitly before. This appears to me as a bug, because there
> is no way anymore to restore the BOM using Emacs.

What would you expect Emacs to do instead? It just obeys the stated
encoding, which says nothing about the BOM. How can Emacs know when
to use utf-8 and when utf-8-with-signature?
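
The point that the declared encoding "says nothing about the BOM" can be checked at the byte level outside Emacs. The following is only a minimal sketch in Python using the standard library; the helper name detect_bom and its labels are illustrative assumptions, not part of Emacs or of any tool mentioned in this thread. It shows that the same XML declaration encoded with and without the EF BB BF signature decodes to identical text, so only the leading bytes of the file tell the two apart.

```python
import codecs

# Byte-order marks to look for; the UTF-8 signature is EF BB BF.
# The labels are illustrative, not Emacs coding system names.
BOMS = [
    (codecs.BOM_UTF8, "UTF-8"),
    (codecs.BOM_UTF16_LE, "UTF-16 LE"),
    (codecs.BOM_UTF16_BE, "UTF-16 BE"),
]

def detect_bom(data: bytes):
    """Return a label for the BOM that starts `data`, or None if there is none."""
    for bom, label in BOMS:
        if data.startswith(bom):
            return label
    return None

declaration = '<?xml version="1.0" encoding="utf-8"?>\r\n'
with_bom = declaration.encode("utf-8-sig")  # signature followed by UTF-8 bytes
without_bom = declaration.encode("utf-8")   # UTF-8 bytes only

print(detect_bom(with_bom))     # UTF-8
print(detect_bom(without_bom))  # None
# The "utf-8-sig" codec skips a leading BOM when one is present,
# so both byte strings decode to the same text.
print(with_bom.decode("utf-8-sig") == without_bom.decode("utf-8-sig"))  # True
```

Reading the first three bytes of a saved file and passing them to detect_bom gives the same answer as inspecting a hexdump by hand.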
From debbugs-submit-bounces@debbugs.gnu.org Fri May 22 03:11:45 2015
Date: Fri, 22 May 2015 10:11:31 +0300
From: Eli Zaretskii <eliz@gnu.org>
Subject: Re: bug#20623: XML and HTML files with encoding/charset="utf-8" declaration loose BOM; Coding system is reset from utf-8-with-signature to utf-8 on save
In-reply-to: <555E44EB.6070604@gmx.net>
To: Simon Ledergerber <sledergerber@gmx.net>
Message-id: <83egm95boc.fsf@gnu.org>
References: <555E2912.7060509@gmx.net> <83iobl67ao.fsf@gnu.org> <555E44EB.6070604@gmx.net>
Cc: 20623@debbugs.gnu.org
Reply-To: Eli Zaretskii <eliz@gnu.org>

[Please don't remove the bug address from the CC list, so that this
discussion is recorded in the bug data base.]

> Date: Thu, 21 May 2015 22:49:47 +0200
> From: Simon Ledergerber <sledergerber@gmx.net>
>
> From the documentation I understand that utf-8 is without BOM and
> utf-8-with-signature is with BOM. Maybe I am wrong and should rather
> understand that utf-8 is auto-detect. But then there is something like
> utf-8-without-signature missing to specify explicitly that no BOM is
> desired.
>
> In my opinion, it is correct when Emacs prefers utf-8 over
> utf-8-with-signature when it opens a file without BOM that can still be
> recognized as UTF-8.
>
> However when a file is opened with a BOM already present, it should
> stick to the utf-8-with-signature coding system, because the BOM "EF BB
> BF" unambiguously marks the file as UTF-8. (For UTF-16 for example,
> there is a different BOM byte pattern. There are other coding systems
> which do not have a BOM at all.)

What do you mean by "stick to"? When I try visiting an XML file that
is encoded with BOM, Emacs decodes the file correctly, and the value
of buffer-file-coding-system is utf-8-with-signature. Isn't that what
you want? If that's what you want, but it doesn't happen for you,
please try in "emacs -Q". It's possible that the default you set:

  (setq-default buffer-file-coding-system 'utf-8-dos)

is the reason for what you see. (I don't understand why you need such
a default, and it sounds like a bad idea to me.)

> By doing C-x <RET> f and then saving it with C-x C-s, I expect to be
> able to change the coding system. For example, if I specify utf-8-dos,
> the BOM should be removed, if one was present, and CR LF should be
> inserted for EOL. On the other side, if I choose
> utf-8-with-signature-unix, a BOM should be written and LF be taken for
> EOL. (The conversion between DOS and Unix works, just the BOM is the
> problem.)
>
> I have found a link, where this topic was already discussed, but it
> didn't help me further:
> http://superuser.com/questions/41254/make-emacs-not-remove-the-bom-from-xml-files
>
> In that post Vebjorn Ljosa asked exactly the question I have. Richard
> Hoskins replies with the answer to change the coding system with C-x
> <RET> r utf-8-with-signature. Unfortunately, it didn't work for me -
> after doing a change in the file and saving, it got back to utf-8
> automatically - that's why I have filed the bug.

That's not how you force a file to be saved in a specific encoding.
You should do this instead:

  C-x RET c utf-8-with-signature RET C-x C-s

The "C-x RET c" prefix forces the next Emacs operation to use the
specified encoding. In this case, Emacs will ask for confirmation,
because the encoding you specified is different from what the XML
comment says.
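
What the coding system names quoted above are expected to mean for the bytes on disk (BOM or no BOM, CR LF or LF) can be sketched outside Emacs. This is only an illustration in Python under stated assumptions: encode_like and the VARIANTS table are hypothetical helpers that mimic the naming scheme, not the way Emacs implements its coding systems.

```python
import codecs

# For each coding system name from the thread:
# (write a BOM?, end-of-line sequence). Hypothetical table for illustration.
VARIANTS = {
    "utf-8-unix": (False, "\n"),
    "utf-8-dos": (False, "\r\n"),
    "utf-8-with-signature-unix": (True, "\n"),
    "utf-8-with-signature-dos": (True, "\r\n"),
}

def encode_like(text: str, coding_system: str) -> bytes:
    """Encode text (with LF line ends) as the named variant would store it."""
    write_bom, eol = VARIANTS[coding_system]
    body = text.replace("\n", eol).encode("utf-8")
    return (codecs.BOM_UTF8 if write_bom else b"") + body

text = '<?xml version="1.0" encoding="utf-8"?>\n'
for name in VARIANTS:
    encoded = encode_like(text, name)
    # Show the first and last bytes, where BOM and EOL differences appear.
    print(name, encoded[:3].hex(), encoded[-2:].hex())
```

The DOS/Unix part of each name only changes the line ends and the "with-signature" part only adds the three leading bytes; the text in between is the same UTF-8 in all four cases.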
From debbugs-submit-bounces@debbugs.gnu.org Fri May 22 09:21:23 2015
Message-ID: <555F2D3C.6090608@gmx.net>
Date: Fri, 22 May 2015 15:21:00 +0200
From: Simon Ledergerber <sledergerber@gmx.net>
To: Eli Zaretskii <eliz@gnu.org>
Subject: Re: bug#20623: XML and HTML files with encoding/charset="utf-8" declaration loose BOM; Coding system is reset from utf-8-with-signature to utf-8 on save
References: <555E2912.7060509@gmx.net> <83iobl67ao.fsf@gnu.org> <555E44EB.6070604@gmx.net> <83egm95boc.fsf@gnu.org>
In-Reply-To: <83egm95boc.fsf@gnu.org>
Cc: 20623@debbugs.gnu.org

Hello Eli

I have done some more research to answer your questions. You will find
the details of my statement at the end of this mail.

On 22.05.2015 09:11, Eli Zaretskii wrote:
> [Please don't remove the bug address from the CC list, so that this
> discussion is recorded in the bug data base.]
>
>> Date: Thu, 21 May 2015 22:49:47 +0200
>> From: Simon Ledergerber <sledergerber@gmx.net>
>>
>> From the documentation I understand that utf-8 is without BOM and
>> utf-8-with-signature is with BOM. Maybe I am wrong and should rather
>> understand that utf-8 is auto-detect. But then there is something like
>> utf-8-without-signature missing to specify explicitly that no BOM is
>> desired.
>>
>> In my opinion, it is correct when Emacs prefers utf-8 over
>> utf-8-with-signature when it opens a file without BOM that can still be
>> recognized as UTF-8.
>>
>> However when a file is opened with a BOM already present, it should
>> stick to the utf-8-with-signature coding system, because the BOM "EF BB
>> BF" unambiguously marks the file as UTF-8. (For UTF-16 for example,
>> there is a different BOM byte pattern. There are other coding systems
>> which do not have a BOM at all.)
> What do you mean by "stick to"? When I try visiting an XML file that
> is encoded with BOM, Emacs decodes the file correctly, and the value
> of buffer-file-coding-system is utf-8-with-signature. Isn't that what
> you want? If that's what you want, but it doesn't happen for you,
> please try in "emacs -Q". It's possible that the default you set:
>
> (setq-default buffer-file-coding-system 'utf-8-dos)
>
> is the reason for what you see. (I don't understand why you need such
> a default, and it sounds like a bad idea to me.)

You're right. When I open a file that was really saved with BOM, Emacs
detects its encoding correctly, i. e. utf-8-with-signature-dos. But when
I change the content and save with C-x C-s, the encoding changes to
utf-8-dos and the BOM gets lost. Even when I start Emacs with -Q. This
is the actual bug.

>
>> By doing C-x <RET> f and then saving it with C-x C-s, I expect to be
>> able to change the coding system. For example, if I specify utf-8-dos,
>> the BOM should be removed, if one was present, and CR LF should be
>> inserted for EOL. On the other side, if I choose
>> utf-8-with-signature-unix, a BOM should be written and LF be taken for
>> EOL. (The conversion between DOS and Unix works, just the BOM is the
>> problem.)
>>
>> I have found a link, where this topic was already discussed, but it
>> didn't help me further:
>> http://superuser.com/questions/41254/make-emacs-not-remove-the-bom-from-xml-files
>>
>> In that post Vebjorn Ljosa asked exactly the question I have. Richard
>> Hoskins replies with the answer to change the coding system with C-x
>> <RET> r utf-8-with-signature. Unfortunately, it didn't work for me -
>> after doing a change in the file and saving, it got back to utf-8
>> automatically - that's why I have filed the bug.
> That's not how you force a file to be saved in a specific encoding.
> You should do this instead:
>
> C-x RET c utf-8-with-signature RET C-x C-s
>
> The "C-x RET c" prefix forces the next Emacs operation to use the
> specified encoding. In this case, Emacs will ask for confirmation,
> because the encoding you specified is different from what the XML
> comment says.
>

This is true and it worked for me.
Please see below for further explanations.

Summary:

- C-x RET c utf-8-with-signature RET C-x C-s is a good workaround,
  because it really forces the file to be written with BOM. In order to
  have an effect however, the file must be dirty, i.e. there must be a
  pending change. But before the command completes in this case, the
  prompt "Selected encoding utf-8-with-signature-dos disagrees with
  utf-8-dos specified by file contents. Really save (else edit coding
  cookies and try again)? (yes or no)" appears. I think this is what you
  mean by your sentence: "In this case, Emacs will ask for confirmation,
  because the encoding you specified is different from what the XML
  comment says."
- But consider the following: The encoding in the XML declaration or in
  the HTML <meta charset="utf-8"> just specifies UTF-8 (or another
  encoding). It doesn't say anything about the presence or absence of
  the BOM. Therefore an editor detecting and deciding about the file's
  encoding should not rely on this information only.
- When such a file, which was saved successfully with BOM, is closed and
  reopened again, Emacs detects its encoding correctly, say
  utf-8-with-signature-dos.
- However, when I change the file content and save it again just with
  C-x C-s (without C-x RET c ... first!), then it changes back to
  utf-8-dos. Yes, even if I start emacs with -Q! (That's the point.)
- I do not fully understand the criterion for and the magic behind how
  Emacs chooses the file encoding when I do C-x C-s. But I was able to
  reproduce it several times by applying the procedures given in the bug
  report, even when -Q is on. As we have already stated above, this could
  be because Emacs favors (and forces) utf-8 whenever it sees something
  like XML or HTML that might be UTF-8-encoded.

-> Conclusion: C-x RET c utf-8-with-signature RET C-x C-s is a good way
to force the file to be written as I want. But what I still do not
understand: When I open a file with BOM and Emacs recognizes that, why
does it change the encoding silently to drop the BOM when I save
normally with C-x C-s - and this even without giving me a notice or
warning?

From debbugs-submit-bounces@debbugs.gnu.org Fri May 22 11:22:56 2015
From: Stefan Monnier <monnier@iro.umontreal.ca>
To: Eli Zaretskii <eliz@gnu.org>
Subject: Re: bug#20623: XML and HTML files with encoding/charset="utf-8" declaration loose BOM; Coding system is reset from utf-8-with-signature to utf-8 on save
Message-ID: <jwvoalcfxkp.fsf-monnier+bug#20623@gnu.org>
References: <555E2912.7060509@gmx.net> <83iobl67ao.fsf@gnu.org>
Date: Fri, 22 May 2015 11:22:27 -0400
In-Reply-To: <83iobl67ao.fsf@gnu.org> (Eli Zaretskii's message of "Thu, 21 May 2015 22:48:31 +0300")
Cc: Simon Ledergerber <sledergerber@gmx.net>, 20623@debbugs.gnu.org

> What would you expect Emacs to do instead? It just obeys the stated
> encoding, which says nothing about the BOM. How can Emacs know when
> to use utf-8 and when utf-8-with-signature?

To the extent that Emacs has seen the BOM when opening the file, it
would make sense for Emacs to try and preserve this detail.

IOW the utf-8 annotation in the XML metadata shouldn't mean "use the
utf-8 coding system" but "use a coding system compatible with utf-8".
So if the coding system is already compatible with utf-8
(e.g. utf-8-with-signature), we should simply keep using that rather
than switch to the utf-8 coding-system.


        Stefan

From debbugs-submit-bounces@debbugs.gnu.org Fri May 22 11:27:11 2015
Date: Fri, 22 May 2015 18:26:57 +0300
From: Eli Zaretskii <eliz@gnu.org>
Subject: Re: bug#20623: XML and HTML files with encoding/charset="utf-8" declaration loose BOM; Coding system is reset from utf-8-with-signature to utf-8 on save
In-reply-to: <jwvoalcfxkp.fsf-monnier+bug#20623@gnu.org>
To: Stefan Monnier <monnier@iro.umontreal.ca>
Message-id: <83iobk4oqm.fsf@gnu.org>
References: <555E2912.7060509@gmx.net> <83iobl67ao.fsf@gnu.org> <jwvoalcfxkp.fsf-monnier+bug#20623@gnu.org>
Cc: sledergerber@gmx.net, 20623@debbugs.gnu.org
Reply-To: Eli Zaretskii <eliz@gnu.org>

> From: Stefan Monnier <monnier@iro.umontreal.ca>
> Cc: Simon Ledergerber <sledergerber@gmx.net>, 20623@debbugs.gnu.org
> Date: Fri, 22 May 2015 11:22:27 -0400
>
> > What would you expect Emacs to do instead? It just obeys the stated
> > encoding, which says nothing about the BOM. How can Emacs know when
> > to use utf-8 and when utf-8-with-signature?
>
> To the extent that Emacs has seen the BOM when opening the file, it
> would make sense for Emacs to try and preserve this detail.

It does.

From debbugs-submit-bounces@debbugs.gnu.org Fri May 22 17:51:21 2015
From: Stefan Monnier <monnier@iro.umontreal.ca>
To: Eli Zaretskii <eliz@gnu.org>
Subject: Re: bug#20623: XML and HTML files with encoding/charset="utf-8" declaration loose BOM; Coding system is reset from utf-8-with-signature to utf-8 on save
Message-ID: <jwvpp5si8nn.fsf-monnier+emacsbugs@gnu.org>
References: <555E2912.7060509@gmx.net> <83iobl67ao.fsf@gnu.org> <jwvoalcfxkp.fsf-monnier+bug#20623@gnu.org> <83iobk4oqm.fsf@gnu.org>
Date: Fri, 22 May 2015 17:51:07 -0400
In-Reply-To: <83iobk4oqm.fsf@gnu.org> (Eli Zaretskii's message of "Fri, 22 May 2015 18:26:57 +0300")
Cc: sledergerber@gmx.net, 20623@debbugs.gnu.org

>> > What would you expect Emacs to do instead? It just obeys the stated
>> > encoding, which says nothing about the BOM. How can Emacs know when
>> > to use utf-8 and when utf-8-with-signature?
>> To the extent that Emacs has seen the BOM when opening the file, it
>> would make sense for Emacs to try and preserve this detail.
> It does.

While there are cases where it does, this bug report is about a case
where it doesn't, IIUC.


        Stefan

From debbugs-submit-bounces@debbugs.gnu.org Sat May 23 02:44:28 2015
Date: Sat, 23 May 2015 09:44:12 +0300
From: Eli Zaretskii <eliz@gnu.org>
Subject: Re: bug#20623: XML and HTML files with encoding/charset="utf-8" declaration loose BOM; Coding system is reset from utf-8-with-signature to utf-8 on save
In-reply-to: <jwvpp5si8nn.fsf-monnier+emacsbugs@gnu.org>
To: Stefan Monnier <monnier@iro.umontreal.ca>
Message-id: <83mw0v3i9v.fsf@gnu.org>
References: <555E2912.7060509@gmx.net> <83iobl67ao.fsf@gnu.org> <jwvoalcfxkp.fsf-monnier+bug#20623@gnu.org> <83iobk4oqm.fsf@gnu.org> <jwvpp5si8nn.fsf-monnier+emacsbugs@gnu.org>
Cc: sledergerber@gmx.net, 20623@debbugs.gnu.org
X-BeenThere: debbugs-submit@debbugs.gnu.org X-Mailman-Version: 2.1.15 Precedence: list Reply-To: Eli Zaretskii <eliz@gnu.org> List-Id: <debbugs-submit.debbugs.gnu.org> List-Unsubscribe: <https://debbugs.gnu.org/cgi-bin/mailman/options/debbugs-submit>, <mailto:debbugs-submit-request@debbugs.gnu.org?subject=unsubscribe> List-Archive: <https://debbugs.gnu.org/cgi-bin/mailman/private/debbugs-submit/> List-Post: <mailto:debbugs-submit@debbugs.gnu.org> List-Help: <mailto:debbugs-submit-request@debbugs.gnu.org?subject=help> List-Subscribe: <https://debbugs.gnu.org/cgi-bin/mailman/listinfo/debbugs-submit>, <mailto:debbugs-submit-request@debbugs.gnu.org?subject=subscribe> Errors-To: debbugs-submit-bounces@debbugs.gnu.org Sender: "Debbugs-submit" <debbugs-submit-bounces@debbugs.gnu.org> X-Spam-Score: 1.0 (+) > From: Stefan Monnier <monnier@iro.umontreal.ca> > Cc: sledergerber@gmx.net, 20623@debbugs.gnu.org > Date: Fri, 22 May 2015 17:51:07 -0400 > > >> > What would you expect Emacs to do instead? It just obeys the stated > >> > encoding, which says nothing about the BOM. How can Emacs know when > >> > to use utf-8 and when utf-8-with-signature? > >> To the extent that Emacs has seen the BOM when opening the file, it > >> would make sense for Emacs to try and preserve this detail. > > It does. > > While there are cases where it does, this bug report is about a case > where it doesn't, IIUC. AFAIU, that happened because the user has this in ~/.emacs: (setq-default buffer-file-coding-system 'utf-8-dos) IMO, this bad customization should be removed, and then the problem will go away. 
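For reference, the coding systems named in this exchange can be compared directly from Lisp. The following is only a minimal sketch: `my-coding-system-summary` is a hypothetical helper name introduced here for illustration, not an Emacs function, and it assumes an Emacs in which `utf-8-dos` and `utf-8-with-signature-dos` are defined.

```elisp
;; Hypothetical helper for illustration; not part of Emacs.
(defun my-coding-system-summary (cs)
  "Return a plist describing the base, BOM flag and EOL type of CS."
  (let ((base (coding-system-base cs)))
    (list :base base
          ;; Non-nil when the coding system writes a byte order mark.
          :bom (coding-system-get base :bom)
          ;; 0 = LF (unix), 1 = CRLF (dos), 2 = CR (mac);
          ;; a vector means the EOL type is not yet decided.
          :eol (coding-system-eol-type cs))))

;; Evaluate these with M-: or in *scratch* and compare the results.
(my-coding-system-summary 'utf-8-dos)
(my-coding-system-summary 'utf-8-with-signature-dos)
```

Comparing the two results side by side should show whether the default set in ~/.emacs differs from the visited file's coding system only in the BOM flag.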
From debbugs-submit-bounces@debbugs.gnu.org Sat May 23 13:11:33 2015 Received: (at 20623) by debbugs.gnu.org; 23 May 2015 17:11:33 +0000 Received: from localhost ([127.0.0.1]:54123 helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.80) (envelope-from <debbugs-submit-bounces@debbugs.gnu.org>) id 1YwCxN-0008OD-4n for submit@debbugs.gnu.org; Sat, 23 May 2015 13:11:32 -0400 Received: from mout.gmx.net ([212.227.15.15]:56581) by debbugs.gnu.org with esmtp (Exim 4.80) (envelope-from <sledergerber@gmx.net>) id 1YwCxH-0008Nq-7u for 20623@debbugs.gnu.org; Sat, 23 May 2015 13:11:27 -0400 Received: from [192.168.1.100] ([77.56.185.142]) by mail.gmx.com (mrgmx001) with ESMTPSA (Nemesis) id 0MEXHd-1YuF0E2zUH-00FkAL; Sat, 23 May 2015 19:11:15 +0200 MIME-Version: 1.0 To: Eli Zaretskii <eliz@gnu.org>, Stefan Monnier <monnier@iro.umontreal.ca> From: Simon Ledergerber <sledergerber@gmx.net> Subject: RE: bug#20623: XML and HTML files with encoding/charset="utf-8" declaration loose BOM; Coding system is reset from utf-8-with-signature to utf-8 on save Date: Sat, 23 May 2015 19:11:15 +0200 In-Reply-To: <83mw0v3i9v.fsf@gnu.org> References: <555E2912.7060509@gmx.net> <83iobl67ao.fsf@gnu.org> <jwvoalcfxkp.fsf-monnier+bug#20623@gnu.org> <83iobk4oqm.fsf@gnu.org> <jwvpp5si8nn.fsf-monnier+emacsbugs@gnu.org> <83mw0v3i9v.fsf@gnu.org> Content-Type: multipart/alternative; boundary="_3B118A15-6F26-4E0C-953C-6EEA7CE91C7C_" Message-ID: <0LpKKr-1Zb1Pi3ZTR-00fE69@mail.gmx.com> X-Provags-ID: V03:K0:2Gr+jLSvrzGgm6BXcfumIEO+DN+BZlqRSe+nEkLUNveQKTqCvmI wIGFzERmimZqPgJz7+i/Hs+GBxAQCotz1KUG6MXYyxQ+a5yR/hz3SIKCuogU0CMm3hz28o3 4giF5iWzzk8Pn4UuktCSyKbk/P+PKwB65Ej7RvVb9rFma9IU6CylJ9WYzXVAiHwLeHl3kx4 Rc4sYMyugCu66xmhKTE0Q== X-UI-Out-Filterresults: notjunk:1; X-Spam-Score: -0.0 (/) X-Debbugs-Envelope-To: 20623 Cc: 20623@debbugs.gnu.org X-BeenThere: debbugs-submit@debbugs.gnu.org X-Mailman-Version: 2.1.15 Precedence: list List-Id: <debbugs-submit.debbugs.gnu.org> List-Unsubscribe: 
<https://debbugs.gnu.org/cgi-bin/mailman/options/debbugs-submit>, <mailto:debbugs-submit-request@debbugs.gnu.org?subject=unsubscribe> List-Archive: <https://debbugs.gnu.org/cgi-bin/mailman/private/debbugs-submit/> List-Post: <mailto:debbugs-submit@debbugs.gnu.org> List-Help: <mailto:debbugs-submit-request@debbugs.gnu.org?subject=help> List-Subscribe: <https://debbugs.gnu.org/cgi-bin/mailman/listinfo/debbugs-submit>, <mailto:debbugs-submit-request@debbugs.gnu.org?subject=subscribe> Errors-To: debbugs-submit-bounces@debbugs.gnu.org Sender: "Debbugs-submit" <debbugs-submit-bounces@debbugs.gnu.org> X-Spam-Score: -0.0 (/) --_3B118A15-6F26-4E0C-953C-6EEA7CE91C7C_ Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8" As already mentioned in my last post, even when I started Emacs with the option -Q, which should opt out my customizations, it made no difference. So naturally, the source of the problem will be somewhere else. -----Original Message----- From: "Eli Zaretskii" <eliz@gnu.org> Sent: 23.05.2015 08:44 To: "Stefan Monnier" <monnier@iro.umontreal.ca> Cc: "sledergerber@gmx.net" <sledergerber@gmx.net>; "20623@debbugs.gnu.org" <20623@debbugs.gnu.org> Subject: Re: bug#20623: XML and HTML files with encoding/charset="utf-8" declaration loose BOM; Coding system is reset from utf-8-with-signature to utf-8 on save > From: Stefan Monnier <monnier@iro.umontreal.ca> > Cc: sledergerber@gmx.net, 20623@debbugs.gnu.org > Date: Fri, 22 May 2015 17:51:07 -0400 > > >> > What would you expect Emacs to do instead? It just obeys the stated > >> > encoding, which says nothing about the BOM. How can Emacs know when > >> > to use utf-8 and when utf-8-with-signature? > >> To the extent that Emacs has seen the BOM when opening the file, it > >> would make sense for Emacs to try and preserve this detail. > > It does. > > While there are cases where it does, this bug report is about a case > where it doesn't, IIUC.
AFAIU, that happened because the user has this in ~/.emacs: (setq-default buffer-file-coding-system 'utf-8-dos) IMO, this bad customization should be removed, and then the problem will go away. --_3B118A15-6F26-4E0C-953C-6EEA7CE91C7C_-- From debbugs-submit-bounces@debbugs.gnu.org Sat May 23 13:21:11 2015 Received: (at 20623) by debbugs.gnu.org; 23 May 2015 17:21:11 +0000 Received: from localhost ([127.0.0.1]:54128 helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.80) (envelope-from <debbugs-submit-bounces@debbugs.gnu.org>) id 1YwD6h-0000A9-0c for submit@debbugs.gnu.org; Sat, 23 May 2015 13:21:11 -0400 Received: from mtaout22.012.net.il ([80.179.55.172]:64141) by debbugs.gnu.org with esmtp (Exim 4.80) (envelope-from <eliz@gnu.org>) id 1YwD6b-00009N-IJ for 20623@debbugs.gnu.org; Sat, 23 May 2015 13:21:05 -0400 Received: from conversion-daemon.a-mtaout22.012.net.il by a-mtaout22.012.net.il (HyperSendmail v2007.08) id <0NOT00400B86G200@a-mtaout22.012.net.il> for 20623@debbugs.gnu.org; Sat, 23 May 2015 20:20:54 +0300 (IDT) Received: from HOME-C4E4A596F7 ([87.69.4.28]) by a-mtaout22.012.net.il (HyperSendmail v2007.08) with ESMTPA id
<0NOT0046XC6UH920@a-mtaout22.012.net.il>; Sat, 23 May 2015 20:20:54 +0300 (IDT) Date: Sat, 23 May 2015 20:20:56 +0300 From: Eli Zaretskii <eliz@gnu.org> Subject: Re: bug#20623: XML and HTML files with encoding/charset="utf-8" declaration loose BOM; Coding system is reset from utf-8-with-signature to utf-8 on save In-reply-to: <0LpKKr-1Zb1Pi3ZTR-00fE69@mail.gmx.com> X-012-Sender: halo1@inter.net.il To: Simon Ledergerber <sledergerber@gmx.net> Message-id: <83lhgf1a87.fsf@gnu.org> References: <555E2912.7060509@gmx.net> <83iobl67ao.fsf@gnu.org> <jwvoalcfxkp.fsf-monnier+bug#20623@gnu.org> <83iobk4oqm.fsf@gnu.org> <jwvpp5si8nn.fsf-monnier+emacsbugs@gnu.org> <83mw0v3i9v.fsf@gnu.org> <0LpKKr-1Zb1Pi3ZTR-00fE69@mail.gmx.com> X-Spam-Score: 1.0 (+) X-Debbugs-Envelope-To: 20623 Cc: monnier@iro.umontreal.ca, 20623@debbugs.gnu.org X-BeenThere: debbugs-submit@debbugs.gnu.org X-Mailman-Version: 2.1.15 Precedence: list Reply-To: Eli Zaretskii <eliz@gnu.org> List-Id: <debbugs-submit.debbugs.gnu.org> List-Unsubscribe: <https://debbugs.gnu.org/cgi-bin/mailman/options/debbugs-submit>, <mailto:debbugs-submit-request@debbugs.gnu.org?subject=unsubscribe> List-Archive: <https://debbugs.gnu.org/cgi-bin/mailman/private/debbugs-submit/> List-Post: <mailto:debbugs-submit@debbugs.gnu.org> List-Help: <mailto:debbugs-submit-request@debbugs.gnu.org?subject=help> List-Subscribe: <https://debbugs.gnu.org/cgi-bin/mailman/listinfo/debbugs-submit>, <mailto:debbugs-submit-request@debbugs.gnu.org?subject=subscribe> Errors-To: debbugs-submit-bounces@debbugs.gnu.org Sender: "Debbugs-submit" <debbugs-submit-bounces@debbugs.gnu.org> X-Spam-Score: 1.0 (+) > Cc: <20623@debbugs.gnu.org> > From: Simon Ledergerber <sledergerber@gmx.net> > Date: Sat, 23 May 2015 19:11:15 +0200 > > As already mentioned in my last post, even when I started Emacs with the option > -Q, which should opt out my customizations, it made no difference. So > naturally, the source of the problem will be somewhere else. Doesn't happen to me. 
So please post the file you used and the exact sequence of steps, starting from "emacs -Q", to reproduce the problem. Thanks. From debbugs-submit-bounces@debbugs.gnu.org Wed Oct 12 17:45:41 2016 Received: (at 20623) by debbugs.gnu.org; 12 Oct 2016 21:45:41 +0000 Received: from localhost ([127.0.0.1]:53517 helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from <debbugs-submit-bounces@debbugs.gnu.org>) id 1buRLJ-0002Gr-C2 for submit@debbugs.gnu.org; Wed, 12 Oct 2016 17:45:41 -0400 Received: from clientmail.realize.ch ([46.140.89.53]:2660) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from <alain.schneble@realize.ch>) id 1buRLH-0002GX-8S for 20623@debbugs.gnu.org; Wed, 12 Oct 2016 17:45:39 -0400 Received: from rintintin.hq.realize.ch.lan.rit (Unknown [192.168.0.105]) by clientmail.realize.ch with ESMTP ; Wed, 12 Oct 2016 23:45:17 +0200 Received: from myngb (192.168.66.65) by rintintin.hq.realize.ch.lan.rit (192.168.0.105) with Microsoft SMTP Server (TLS) id 15.0.516.32; Wed, 12 Oct 2016 23:45:04 +0200 From: Alain Schneble <a.s@realize.ch> To: Simon Ledergerber <sledergerber@gmx.net> Subject: Re: bug#20623: XML and HTML files with encoding/charset="utf-8" declaration loose BOM; Coding system is reset from utf-8-with-signature to utf-8 on save References: <555E2912.7060509@gmx.net> <83iobl67ao.fsf@gnu.org> <555E44EB.6070604@gmx.net> <83egm95boc.fsf@gnu.org> <555F2D3C.6090608@gmx.net> Date: Wed, 12 Oct 2016 23:44:57 +0200 In-Reply-To: <555F2D3C.6090608@gmx.net> (Simon Ledergerber's message of "Fri, 22 May 2015 15:21:00 +0200") Message-ID: <8660oxdyxy.fsf@realize.ch> User-Agent: Gnus/5.13 (Gnus v5.13) Emacs/25.1.50 (windows-nt) MIME-Version: 1.0 Content-Type: text/plain X-ClientProxiedBy: rintintin.hq.realize.ch.lan.rit (192.168.0.105) To rintintin.hq.realize.ch.lan.rit (192.168.0.105) X-Spam-Score: -0.4 (/) X-Debbugs-Envelope-To: 20623 Cc: Eli Zaretskii <eliz@gnu.org>, Stefan Monnier <monnier@iro.umontreal.ca>, 20623@debbugs.gnu.org
X-BeenThere: debbugs-submit@debbugs.gnu.org X-Mailman-Version: 2.1.18 Precedence: list List-Id: <debbugs-submit.debbugs.gnu.org> List-Unsubscribe: <https://debbugs.gnu.org/cgi-bin/mailman/options/debbugs-submit>, <mailto:debbugs-submit-request@debbugs.gnu.org?subject=unsubscribe> List-Archive: <https://debbugs.gnu.org/cgi-bin/mailman/private/debbugs-submit/> List-Post: <mailto:debbugs-submit@debbugs.gnu.org> List-Help: <mailto:debbugs-submit-request@debbugs.gnu.org?subject=help> List-Subscribe: <https://debbugs.gnu.org/cgi-bin/mailman/listinfo/debbugs-submit>, <mailto:debbugs-submit-request@debbugs.gnu.org?subject=subscribe> Errors-To: debbugs-submit-bounces@debbugs.gnu.org Sender: "Debbugs-submit" <debbugs-submit-bounces@debbugs.gnu.org> X-Spam-Score: -0.4 (/)

I'm joining this discussion and would like to report a recipe to
reproduce this issue on Windows:

- emacs -Q
- C-x C-f utf-8-bom-test.xml
- Enter the following text in the new buffer:
  <?xml version="1.0" encoding="utf-8"?>
  <root></root>
- C-x RET c utf-8-with-signature-dos C-x C-s yes RET
- C-x k RET
- C-x C-f utf-8-bom-test.xml
- M-: buffer-file-coding-system
  => utf-8-with-signature-dos
- Change buffer content, e.g. add some text to the root element:
  <?xml version="1.0" encoding="utf-8"?>
  <root>test</root>
- C-x C-s
- M-: buffer-file-coding-system
  => utf-8-dos (expected coding system: utf-8-with-signature-dos)

As it was already mentioned in this thread, just by visiting the file,
then changing and saving the buffer, the BOM gets lost.

This is due to select-safe-coding-system (called by
choose_write_coding_system) fully trusting the coding system identified
by find-auto-coding. So far so good. The latter eventually calls
auto-coding-functions which in turn calls the built-in
sgml-xml-auto-coding-function which I think should take into account
some context to enrich the derived coding system with a signature if
needed.
Similar to what select-safe-coding-system does to enrich the coding with the proper eol-type. Does that make sense to you? If so, I'll try to come up with a patch that enhances sgml-xml-auto-coding-function to take into account buffer-file-coding-system (buffer + default value) in case it carries the same text-conversion but different signature. The proposed "auto coding" shall inherit the signature in this case. Thanks for any help. Alain From debbugs-submit-bounces@debbugs.gnu.org Mon Dec 04 11:54:27 2017 Received: (at 20623) by debbugs.gnu.org; 4 Dec 2017 16:54:27 +0000 Received: from localhost ([127.0.0.1]:45734 helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from <debbugs-submit-bounces@debbugs.gnu.org>) id 1eLu0f-0005Zz-Nz for submit@debbugs.gnu.org; Mon, 04 Dec 2017 11:54:27 -0500 Received: from eggs.gnu.org ([208.118.235.92]:41019) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from <rgm@gnu.org>) id 1eLu0e-0005Zm-Pb for 20623@debbugs.gnu.org; Mon, 04 Dec 2017 11:54:24 -0500 Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71) (envelope-from <rgm@gnu.org>) id 1eLu0W-0004mX-Ih for 20623@debbugs.gnu.org; Mon, 04 Dec 2017 11:54:17 -0500 X-Spam-Checker-Version: SpamAssassin 3.3.2 (2011-06-06) on eggs.gnu.org X-Spam-Level: X-Spam-Status: No, score=-1.9 required=5.0 tests=BAYES_00,T_RP_MATCHES_RCVD autolearn=disabled version=3.3.2 Received: from fencepost.gnu.org ([2001:4830:134:3::e]:57927) by eggs.gnu.org with esmtp (Exim 4.71) (envelope-from <rgm@gnu.org>) id 1eLu0N-0004jt-0q; Mon, 04 Dec 2017 11:54:07 -0500 Received: from rgm by fencepost.gnu.org with local (Exim 4.82) (envelope-from <rgm@gnu.org>) id 1eLu0J-0000AF-Tr; Mon, 04 Dec 2017 11:54:04 -0500 From: Glenn Morris <rgm@gnu.org> To: Alain Schneble <a.s@realize.ch> Subject: Re: bug#20623: XML and HTML files with encoding/charset="utf-8" declaration loose BOM; Coding system is reset from utf-8-with-signature to utf-8 on save References: 
<555E2912.7060509@gmx.net> <83iobl67ao.fsf@gnu.org> <555E44EB.6070604@gmx.net> <83egm95boc.fsf@gnu.org> <555F2D3C.6090608@gmx.net> <8660oxdyxy.fsf@realize.ch> X-Spook: Red Cross underground Storm e-bomb ANZUS keyhole X-Ran: /$1y9:b/@IBq_f7-m$>><NZ=rbVe/8VF#yBrhhg&a4]6&>eM)%"b#Gn#MHD=z2rzu1]d}k X-Hue: red X-Attribution: GM Date: Mon, 04 Dec 2017 11:54:03 -0500 In-Reply-To: <8660oxdyxy.fsf@realize.ch> (Alain Schneble's message of "Wed, 12 Oct 2016 23:44:57 +0200") Message-ID: <457eu2h1sk.fsf@fencepost.gnu.org> User-Agent: Gnus (www.gnus.org), GNU Emacs (www.gnu.org/software/emacs/) MIME-Version: 1.0 Content-Type: text/plain X-detected-operating-system: by eggs.gnu.org: GNU/Linux 2.2.x-3.x [generic] X-Received-From: 2001:4830:134:3::e X-Spam-Score: -5.0 (-----) X-Debbugs-Envelope-To: 20623 Cc: Simon Ledergerber <sledergerber@gmx.net>, Eli Zaretskii <eliz@gnu.org>, Stefan Monnier <monnier@iro.umontreal.ca>, 20623@debbugs.gnu.org X-BeenThere: debbugs-submit@debbugs.gnu.org X-Mailman-Version: 2.1.18 Precedence: list List-Id: <debbugs-submit.debbugs.gnu.org> List-Unsubscribe: <https://debbugs.gnu.org/cgi-bin/mailman/options/debbugs-submit>, <mailto:debbugs-submit-request@debbugs.gnu.org?subject=unsubscribe> List-Archive: <https://debbugs.gnu.org/cgi-bin/mailman/private/debbugs-submit/> List-Post: <mailto:debbugs-submit@debbugs.gnu.org> List-Help: <mailto:debbugs-submit-request@debbugs.gnu.org?subject=help> List-Subscribe: <https://debbugs.gnu.org/cgi-bin/mailman/listinfo/debbugs-submit>, <mailto:debbugs-submit-request@debbugs.gnu.org?subject=subscribe> Errors-To: debbugs-submit-bounces@debbugs.gnu.org Sender: "Debbugs-submit" <debbugs-submit-bounces@debbugs.gnu.org> X-Spam-Score: -5.0 (-----) Now reported with "fix this or get removed from the distribution" severity at <https://bugs.debian.org/883434>. 
From debbugs-submit-bounces@debbugs.gnu.org Mon Dec 04 12:39:05 2017 Received: (at 20623) by debbugs.gnu.org; 4 Dec 2017 17:39:05 +0000 Received: from localhost ([127.0.0.1]:45772 helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from <debbugs-submit-bounces@debbugs.gnu.org>) id 1eLuht-0006dl-2J for submit@debbugs.gnu.org; Mon, 04 Dec 2017 12:39:05 -0500 Received: from pmta21.teksavvy.com ([76.10.157.36]:40008) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from <monnier@iro.umontreal.ca>) id 1eLuhr-0006dH-A0 for 20623@debbugs.gnu.org; Mon, 04 Dec 2017 12:39:03 -0500 X-IronPort-Anti-Spam-Filtered: true X-IronPort-Anti-Spam-Result: =?us-ascii?q?A2HSGwBUhyVa/7mYWxdcHAEBAQQBAQoBA?= =?us-ascii?q?YM8Zm6DW4pSjxOBfZcVggEthRgChTRDFAEBAQEBAQEBAQNoKIJrS1gBAQEBAQE?= =?us-ascii?q?jAg1eAQQBeQULCw0BJgcLFBgxii0IEKlxIQKKNwEBAQEBAQQBAQEBHwWFUYZqh?= =?us-ascii?q?QkBhW8gBZMahgmJSYd0mRcogUCFdIx8AosJNiOBTTIaCDCCZIJOAxwZgWwjik0?= =?us-ascii?q?BAQE?= X-IPAS-Result: =?us-ascii?q?A2HSGwBUhyVa/7mYWxdcHAEBAQQBAQoBAYM8Zm6DW4pSjxO?= =?us-ascii?q?BfZcVggEthRgChTRDFAEBAQEBAQEBAQNoKIJrS1gBAQEBAQEjAg1eAQQBeQULC?= =?us-ascii?q?w0BJgcLFBgxii0IEKlxIQKKNwEBAQEBAQQBAQEBHwWFUYZqhQkBhW8gBZMahgm?= =?us-ascii?q?JSYd0mRcogUCFdIx8AosJNiOBTTIaCDCCZIJOAxwZgWwjik0BAQE?= X-IronPort-AV: E=Sophos;i="5.45,359,1508817600"; d="scan'208";a="10727083" Received: from unknown (HELO pastel.home) ([23.91.152.185]) by smtp.teksavvy.com with ESMTP; 04 Dec 2017 12:38:57 -0500 Received: by pastel.home (Postfix, from userid 20848) id 2B35D61367; Mon, 4 Dec 2017 12:38:57 -0500 (EST) From: Stefan Monnier <monnier@iro.umontreal.ca> To: Glenn Morris <rgm@gnu.org> Subject: Re: bug#20623: XML and HTML files with encoding/charset="utf-8" declaration loose BOM; Coding system is reset from utf-8-with-signature to utf-8 on save Message-ID: <jwvk1y2ct0w.fsf-monnier+emacsbugs@gnu.org> References: <555E2912.7060509@gmx.net> <83iobl67ao.fsf@gnu.org> <555E44EB.6070604@gmx.net> <83egm95boc.fsf@gnu.org> 
<555F2D3C.6090608@gmx.net> <8660oxdyxy.fsf@realize.ch> <457eu2h1sk.fsf@fencepost.gnu.org> Date: Mon, 04 Dec 2017 12:38:57 -0500 In-Reply-To: <457eu2h1sk.fsf@fencepost.gnu.org> (Glenn Morris's message of "Mon, 04 Dec 2017 11:54:03 -0500") User-Agent: Gnus/5.13 (Gnus v5.13) Emacs/27.0.50 (gnu/linux) MIME-Version: 1.0 Content-Type: text/plain X-Spam-Score: 0.3 (/) X-Debbugs-Envelope-To: 20623 Cc: Simon Ledergerber <sledergerber@gmx.net>, Eli Zaretskii <eliz@gnu.org>, Alain Schneble <a.s@realize.ch>, 20623@debbugs.gnu.org X-BeenThere: debbugs-submit@debbugs.gnu.org X-Mailman-Version: 2.1.18 Precedence: list List-Id: <debbugs-submit.debbugs.gnu.org> List-Unsubscribe: <https://debbugs.gnu.org/cgi-bin/mailman/options/debbugs-submit>, <mailto:debbugs-submit-request@debbugs.gnu.org?subject=unsubscribe> List-Archive: <https://debbugs.gnu.org/cgi-bin/mailman/private/debbugs-submit/> List-Post: <mailto:debbugs-submit@debbugs.gnu.org> List-Help: <mailto:debbugs-submit-request@debbugs.gnu.org?subject=help> List-Subscribe: <https://debbugs.gnu.org/cgi-bin/mailman/listinfo/debbugs-submit>, <mailto:debbugs-submit-request@debbugs.gnu.org?subject=subscribe> Errors-To: debbugs-submit-bounces@debbugs.gnu.org Sender: "Debbugs-submit" <debbugs-submit-bounces@debbugs.gnu.org> X-Spam-Score: 0.3 (/) > Now reported with "fix this or get removed from the distribution" > severity at <https://bugs.debian.org/883434>. I'm curious to see if the OP's "grave" severity settings will stick. "Grave" is defined in https://www.debian.org/Bugs/Developer#severities as: makes the package in question unusable or mostly so, or causes data loss, or introduces a security hole allowing access to the accounts of users who use the package. The only part that could arguably apply is "causes data loss", but even that is stretching the meaning of those words, I think. This said, we should indeed fix this bug. 
Not sure how to Do It Right, but at least this specific problem should be
fixable with a patch along the lines of the one below (guaranteed 100%
untested).

        Stefan

diff --git a/lisp/international/mule.el b/lisp/international/mule.el
index 019e65b2c6..5c0675aa2f 100644
--- a/lisp/international/mule.el
+++ b/lisp/international/mule.el
@@ -1885,6 +1885,12 @@ auto-coding-alist-lookup
         (setq alist (cdr alist))))
     coding-system))
 
+(defun mule--coding-system-compatible-p (cs new-cs)
+  "Return non-nil if CS is one of the coding-systems described by NEW-CS."
+  (let ((base (coding-system-base cs)))
+    (or (eq base new-cs)
+        (eq base (intern (concat new-cs "-with-signature"))))))
+
 (put 'enable-character-translation 'permanent-local t)
 (put 'enable-character-translation 'safe-local-variable 'booleanp)
 
@@ -2038,8 +2044,12 @@ find-auto-coding
                            (save-excursion
                              (goto-char (point-min))
                              (funcall (pop funcs) size)))))
-          (if coding-system
-              (cons coding-system 'auto-coding-functions)))))
+          (and coding-system
+               ;; Don't override utf-8-with-signature with utf-8
+               ;; or latin-1-mac with latin-1 (bug#20623).
+               (not (mule--coding-system-compatible-p
+                     buffer-file-coding-system coding-system))
+               (cons coding-system 'auto-coding-functions)))))
 
 (defun set-auto-coding (filename size)
   "Return coding system for a file FILENAME of which SIZE bytes follow point.
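The compatibility check proposed in the patch above can also be tried on its own, outside mule.el. What follows is a minimal sketch under stated assumptions, not the code that was installed in Emacs: `my-coding-system-compatible-p` is a hypothetical name, both arguments are assumed to be coding-system symbols, and both are reduced to their base so that EOL variants are ignored. Since `concat` only accepts sequences, the sketch converts the symbol with `symbol-name` before appending the suffix.

```elisp
;; Hypothetical helper for illustration; the name is made up and this
;; is not the code that was installed in Emacs.
(defun my-coding-system-compatible-p (cs new-cs)
  "Return non-nil if CS is NEW-CS or its \"-with-signature\" variant.
CS and NEW-CS are coding-system symbols; EOL variants are ignored."
  (let ((base (coding-system-base cs))
        (new-base (coding-system-base new-cs)))
    (or (eq base new-base)
        ;; `concat' only accepts sequences, so convert the symbol to a
        ;; string first.  `intern-soft' avoids creating new symbols.
        (eq base (intern-soft
                  (concat (symbol-name new-base) "-with-signature"))))))

;; Buffer visited with a BOM, XML declaration says "utf-8":
(my-coding-system-compatible-p 'utf-8-with-signature-dos 'utf-8)
;; Buffer in an unrelated coding system:
(my-coding-system-compatible-p 'iso-latin-1 'utf-8)
```

With a predicate of this shape, a caller could keep the buffer's existing coding system whenever the declared one is merely the BOM-less variant of it, which is the behavior the bug report asks for.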
From debbugs-submit-bounces@debbugs.gnu.org Mon Dec 04 12:43:25 2017
Received: (at control) by debbugs.gnu.org; 4 Dec 2017 17:43:25 +0000
Subject: control message for bug 20623
To: <control@debbugs.gnu.org>
Message-Id: <E1eLulv-0001dE-VB@fencepost.gnu.org>
From: Glenn Morris <rgm@gnu.org>
Date: Mon, 04 Dec 2017 12:43:15 -0500

# lern2spel
retitle 20623 XML and HTML files with encoding/charset="utf-8" declaration lose BOM; Coding system is reset from utf-8-with-signature to utf-8 on save

From debbugs-submit-bounces@debbugs.gnu.org Mon Dec 04 15:28:53 2017
Received: (at 20623) by debbugs.gnu.org; 4 Dec 2017 20:28:53 +0000
Date: Mon, 04 Dec 2017 22:28:20 +0200
Message-Id: <837eu2xmor.fsf@gnu.org>
From: Eli Zaretskii <eliz@gnu.org>
To: Stefan Monnier <monnier@iro.umontreal.ca>
In-reply-to: <jwvk1y2ct0w.fsf-monnier+emacsbugs@gnu.org> (message from Stefan
 Monnier on Mon, 04 Dec 2017 12:38:57 -0500)
Subject: Re: bug#20623: XML and HTML files with encoding/charset="utf-8"
 declaration loose BOM; Coding system is reset from utf-8-with-signature to
 utf-8 on save
References: <555E2912.7060509@gmx.net> <83iobl67ao.fsf@gnu.org>
 <555E44EB.6070604@gmx.net> <83egm95boc.fsf@gnu.org>
 <555F2D3C.6090608@gmx.net> <8660oxdyxy.fsf@realize.ch>
 <457eu2h1sk.fsf@fencepost.gnu.org>
 <jwvk1y2ct0w.fsf-monnier+emacsbugs@gnu.org>
Cc: rgm@gnu.org, a.s@realize.ch, 20623@debbugs.gnu.org, sledergerber@gmx.net
Reply-To: Eli Zaretskii <eliz@gnu.org>

> From: Stefan Monnier <monnier@iro.umontreal.ca>
> Cc: Alain Schneble <a.s@realize.ch>, Simon Ledergerber <sledergerber@gmx.net>,
>  20623@debbugs.gnu.org, Eli Zaretskii <eliz@gnu.org>
> Date: Mon, 04 Dec 2017 12:38:57 -0500
>
> This said, we should indeed fix this bug.

Agreed.

> Not sure how to Do It Right but at least this specific problem should be
> fixable with a patch along the lines of the one below (guaranteed 100%
> untested).

Isn't it better to fix this in sgml-xml-auto-coding-function? That's
where the root cause is, AFAIU.

And I don't understand the comment about latin-1-mac: I don't think we
have such problems in Emacs. The -with-signature variety is
different, because it is not about EOL format.

Thanks.

From debbugs-submit-bounces@debbugs.gnu.org Mon Dec 04 16:08:22 2017
Received: (at 20623) by debbugs.gnu.org; 4 Dec 2017 21:08:22 +0000
From: Stefan Monnier <monnier@iro.umontreal.ca>
To: Eli Zaretskii <eliz@gnu.org>
Subject: Re: bug#20623: XML and HTML files with encoding/charset="utf-8"
 declaration loose BOM; Coding system is reset from utf-8-with-signature to
 utf-8 on save
Message-ID: <jwvr2sab43f.fsf-monnier+emacsbugs@gnu.org>
References: <555E2912.7060509@gmx.net> <83iobl67ao.fsf@gnu.org>
 <555E44EB.6070604@gmx.net> <83egm95boc.fsf@gnu.org>
 <555F2D3C.6090608@gmx.net> <8660oxdyxy.fsf@realize.ch>
 <457eu2h1sk.fsf@fencepost.gnu.org>
 <jwvk1y2ct0w.fsf-monnier+emacsbugs@gnu.org> <837eu2xmor.fsf@gnu.org>
Date: Mon, 04 Dec 2017 16:08:14 -0500
In-Reply-To: <837eu2xmor.fsf@gnu.org> (Eli Zaretskii's message of "Mon, 04 Dec
 2017 22:28:20 +0200")
Cc: rgm@gnu.org, a.s@realize.ch, 20623@debbugs.gnu.org, sledergerber@gmx.net

> Isn't it better to fix this in sgml-xml-auto-coding-function? That's
> where the root cause is, AFAIU.

I'd expect the same problem would affect all other uses.

> And I don't understand the comment about latin-1-mac: I don't think we
> have such problems in Emacs. The -with-signature variety is
> different, because it is not about EOL format.

You might be right, but I don't know where/how this is handled.


        Stefan

From debbugs-submit-bounces@debbugs.gnu.org Sun Dec 10 14:17:37 2017
Received: (at 20623) by debbugs.gnu.org; 10 Dec 2017 19:17:37 +0000
Date: Sun, 10 Dec 2017 21:17:00 +0200
Message-Id: <838teatmtv.fsf@gnu.org>
From: Eli Zaretskii <eliz@gnu.org>
To: Stefan Monnier <monnier@iro.umontreal.ca>
In-reply-to: <jwvr2sab43f.fsf-monnier+emacsbugs@gnu.org> (message from Stefan
 Monnier on Mon, 04 Dec 2017 16:08:14 -0500)
Subject: Re: bug#20623: XML and HTML files with encoding/charset="utf-8"
 declaration loose BOM; Coding system is reset from utf-8-with-signature to
 utf-8 on save
References: <555E2912.7060509@gmx.net> <83iobl67ao.fsf@gnu.org>
 <555E44EB.6070604@gmx.net> <83egm95boc.fsf@gnu.org>
 <555F2D3C.6090608@gmx.net> <8660oxdyxy.fsf@realize.ch>
 <457eu2h1sk.fsf@fencepost.gnu.org>
 <jwvk1y2ct0w.fsf-monnier+emacsbugs@gnu.org> <837eu2xmor.fsf@gnu.org>
 <jwvr2sab43f.fsf-monnier+emacsbugs@gnu.org>
Cc: rgm@gnu.org, a.s@realize.ch, 20623@debbugs.gnu.org, sledergerber@gmx.net
Reply-To: Eli Zaretskii <eliz@gnu.org>

> From: Stefan Monnier <monnier@iro.umontreal.ca>
> Cc: rgm@gnu.org, a.s@realize.ch, sledergerber@gmx.net, 20623@debbugs.gnu.org
> Date: Mon, 04 Dec 2017 16:08:14 -0500
>
> > Isn't it better to fix this in sgml-xml-auto-coding-function? That's
> > where the root cause is, AFAIU.
>
> I'd expect the same problem would affect all other uses.

Not sure what you meant by "all other uses". Could you please
elaborate?

> > And I don't understand the comment about latin-1-mac: I don't think we
> > have such problems in Emacs. The -with-signature variety is
> > different, because it is not about EOL format.
>
> You might be right, but I don't know where/how this is handled.

I would like to propose the following alternative patch, which accepts
utf-8-with-signature and utf-8-hfs as variants of utf-8 for the
purposes of encoding of XML files. Comments? Do we want a similar
treatment for UTF-16? (That doesn't seem to be required by the bug
report, and UTF-16 in XML files is non-standard anyway. But what
about HTML?)

diff --git a/lisp/international/mule.el b/lisp/international/mule.el
index 857fa80..5ff1acf 100644
--- a/lisp/international/mule.el
+++ b/lisp/international/mule.el
@@ -2493,7 +2493,17 @@ sgml-xml-auto-coding-function
               (let* ((match (match-string 1))
                      (sym (intern (downcase match))))
                 (if (coding-system-p sym)
-                    sym
+                    ;; If the encoding tag is UTF-8 and the buffer's
+                    ;; encoding is one of the variants of UTF-8, use the
+                    ;; buffer's encoding. This allows, e.g., saving an
+                    ;; XML file as UTF-8 with BOM when the tag says UTF-8.
+                    (if (and (coding-system-equal 'utf-8
+                                                  (coding-system-type sym))
+                             (coding-system-equal sym
+                                                  (coding-system-type
+                                                   buffer-file-coding-system)))
+                        buffer-file-coding-system
+                      sym)
                   (message "Warning: unknown coding system \"%s\"" match)
                   nil))
           ;; Files without an encoding tag should be UTF-8. But users
@@ -2506,7 +2516,8 @@ sgml-xml-auto-coding-function
                             (coding-system-base
                              (detect-coding-region (point-min) size t)))))
               ;; Pure ASCII always comes back as undecided.
-              (if (memq detected '(utf-8 undecided))
+              (if (memq detected
+                        '(utf-8 'utf-8-with-signature 'utf-8-hfs undecided))
                   'utf-8
                 (warn "File contents detected as %s.
   Consider adding an encoding attribute to the xml declaration,

From debbugs-submit-bounces@debbugs.gnu.org Fri Dec 15 04:09:05 2017
Received: (at 20623-done) by debbugs.gnu.org; 15 Dec 2017 09:09:05 +0000
Date: Fri, 15 Dec 2017 11:08:50 +0200
Message-Id: <83fu8ctl25.fsf@gnu.org>
From: Eli Zaretskii <eliz@gnu.org>
To: monnier@iro.umontreal.ca
In-reply-to: <838teatmtv.fsf@gnu.org> (message from Eli Zaretskii on Sun, 10
 Dec 2017 21:17:00 +0200)
Subject: Re: bug#20623: XML and HTML files with encoding/charset="utf-8"
 declaration loose BOM; Coding system is reset from utf-8-with-signature to
 utf-8 on save
References: <555E2912.7060509@gmx.net> <83iobl67ao.fsf@gnu.org>
 <555E44EB.6070604@gmx.net> <83egm95boc.fsf@gnu.org>
 <555F2D3C.6090608@gmx.net> <8660oxdyxy.fsf@realize.ch>
 <457eu2h1sk.fsf@fencepost.gnu.org>
 <jwvk1y2ct0w.fsf-monnier+emacsbugs@gnu.org> <837eu2xmor.fsf@gnu.org>
 <jwvr2sab43f.fsf-monnier+emacsbugs@gnu.org> <838teatmtv.fsf@gnu.org>
Cc: sledergerber@gmx.net, a.s@realize.ch, 20623-done@debbugs.gnu.org
Reply-To: Eli Zaretskii <eliz@gnu.org>

> Date: Sun, 10 Dec 2017 21:17:00 +0200
> From: Eli Zaretskii <eliz@gnu.org>
> Cc: a.s@realize.ch, 20623@debbugs.gnu.org, sledergerber@gmx.net
>
> I would like to propose the following alternative patch, which accepts
> utf-8-with-signature and utf-8-hfs as variants of utf-8 for the
> purposes of encoding of XML files. Comments? Do we want a similar
> treatment for UTF-16? (That doesn't seem to be required by the bug
> report, and UTF-16 in XML files is non-standard anyway. But what
> about HTML?)

No further comments, so I've pushed the change and I'm marking this bug
done.
From unknown Mon Jun 23 22:03:56 2025
Received: (at fakecontrol) by fakecontrolmessage;
To: internal_control@debbugs.gnu.org
From: Debbugs Internal Request <help-debbugs@gnu.org>
Subject: Internal Control
Message-Id: bug archived.
Date: Fri, 12 Jan 2018 12:24:04 +0000
User-Agent: Fakemail v42.6.9

# This is a fake control message.
#
# The action:
# bug archived.
thanks
# This fakemail brought to you by your local debbugs
# administrator

From debbugs-submit-bounces@debbugs.gnu.org Wed Aug 01 13:48:38 2018
Received: (at control) by debbugs.gnu.org; 1 Aug 2018 17:48:38 +0000
Subject: control message for bug 20623
To: <control@debbugs.gnu.org>
Message-Id: <E1fkvEc-0007bF-E3@fencepost.gnu.org>
From: Glenn Morris <rgm@gnu.org>
Date: Wed, 01 Aug 2018 13:48:30 -0400

# 889f07c
unarchive 20623
fixed 20623 26.1

From debbugs-submit-bounces@debbugs.gnu.org Wed Aug 01 14:07:41 2018
Received: (at 20623) by debbugs.gnu.org; 1 Aug 2018 18:07:41 +0000
From: Glenn Morris <rgm@gnu.org>
To: Eli Zaretskii <eliz@gnu.org>
Subject: Re: bug#20623: XML and HTML files with encoding/charset="utf-8"
 declaration lose BOM; Coding system is reset from utf-8-with-signature to
 utf-8 on save
References: <555E2912.7060509@gmx.net> <83iobl67ao.fsf@gnu.org>
 <555E44EB.6070604@gmx.net> <83egm95boc.fsf@gnu.org>
 <555F2D3C.6090608@gmx.net> <8660oxdyxy.fsf@realize.ch>
 <457eu2h1sk.fsf@fencepost.gnu.org>
 <jwvk1y2ct0w.fsf-monnier+emacsbugs@gnu.org> <837eu2xmor.fsf@gnu.org>
 <jwvr2sab43f.fsf-monnier+emacsbugs@gnu.org> <838teatmtv.fsf@gnu.org>
Date: Wed, 01 Aug 2018 14:07:28 -0400
In-Reply-To: <838teatmtv.fsf@gnu.org> (Eli Zaretskii's message of "Sun, 10 Dec
 2017 21:17:00 +0200")
Message-ID: <s7o9em2c67.fsf_-_@fencepost.gnu.org>
Cc: sledergerber@gmx.net, a.s@realize.ch,
 Stefan Monnier <monnier@iro.umontreal.ca>, 20623@debbugs.gnu.org

The HTML (not XML) case specified in the original report
("Now the steps for HTML" in https://debbugs.gnu.org/20623#5) and in
https://bugs.debian.org/883434 seems unfixed.

From debbugs-submit-bounces@debbugs.gnu.org Wed Aug 01 14:41:45 2018
Received: (at 20623) by debbugs.gnu.org; 1 Aug 2018 18:41:45 +0000
Date: Wed, 01 Aug 2018 21:41:15 +0300
Message-Id: <83lg9qlyk4.fsf@gnu.org>
From: Eli Zaretskii <eliz@gnu.org>
To: Glenn Morris <rgm@gnu.org>
In-reply-to: <s7o9em2c67.fsf_-_@fencepost.gnu.org> (message from Glenn Morris
 on Wed, 01 Aug 2018 14:07:28 -0400)
Subject: Re: bug#20623: XML and HTML files with encoding/charset="utf-8"
 declaration lose BOM; Coding system is reset from utf-8-with-signature to
 utf-8 on save
References: <555E2912.7060509@gmx.net> <83iobl67ao.fsf@gnu.org>
 <555E44EB.6070604@gmx.net> <83egm95boc.fsf@gnu.org>
 <555F2D3C.6090608@gmx.net> <8660oxdyxy.fsf@realize.ch>
 <457eu2h1sk.fsf@fencepost.gnu.org>
 <jwvk1y2ct0w.fsf-monnier+emacsbugs@gnu.org> <837eu2xmor.fsf@gnu.org>
 <jwvr2sab43f.fsf-monnier+emacsbugs@gnu.org> <838teatmtv.fsf@gnu.org>
 <s7o9em2c67.fsf_-_@fencepost.gnu.org>
Cc: sledergerber@gmx.net, a.s@realize.ch, monnier@iro.umontreal.ca,
 20623@debbugs.gnu.org

> From: Glenn Morris <rgm@gnu.org>
> Cc: Stefan Monnier <monnier@iro.umontreal.ca>, 20623@debbugs.gnu.org,
>  a.s@realize.ch, sledergerber@gmx.net
> Date: Wed, 01 Aug 2018 14:07:28 -0400
>
> The HTML (not XML) case specified in the original report
> ("Now the steps for HTML" in https://debbugs.gnu.org/20623#5) and in
> https://bugs.debian.org/883434 seems unfixed.

Should it be?

From debbugs-submit-bounces@debbugs.gnu.org Tue Aug 07 15:15:12 2018
Received: (at 20623) by debbugs.gnu.org; 7 Aug 2018 19:15:12 +0000
From: Glenn Morris <rgm@gnu.org>
To: Eli Zaretskii <eliz@gnu.org>
Subject: Re: bug#20623: XML and HTML files with encoding/charset="utf-8"
 declaration lose BOM; Coding system is reset from utf-8-with-signature to
 utf-8 on save
References: <555E2912.7060509@gmx.net> <83iobl67ao.fsf@gnu.org>
 <555E44EB.6070604@gmx.net> <83egm95boc.fsf@gnu.org>
 <555F2D3C.6090608@gmx.net> <8660oxdyxy.fsf@realize.ch>
 <457eu2h1sk.fsf@fencepost.gnu.org>
 <jwvk1y2ct0w.fsf-monnier+emacsbugs@gnu.org> <837eu2xmor.fsf@gnu.org>
 <jwvr2sab43f.fsf-monnier+emacsbugs@gnu.org> <838teatmtv.fsf@gnu.org>
 <s7o9em2c67.fsf_-_@fencepost.gnu.org> <83lg9qlyk4.fsf@gnu.org>
Date: Tue, 07 Aug 2018 15:14:58 -0400
In-Reply-To: <83lg9qlyk4.fsf@gnu.org> (Eli Zaretskii's message of "Wed, 01 Aug
 2018 21:41:15 +0300")
Message-ID: <unin4mrnt9.fsf@fencepost.gnu.org>
Cc: sledergerber@gmx.net, a.s@realize.ch, monnier@iro.umontreal.ca,
 20623@debbugs.gnu.org

Eli Zaretskii wrote:

>> The HTML (not XML) case specified in the original report
>> ("Now the steps for HTML" in https://debbugs.gnu.org/20623#5) and in
>> https://bugs.debian.org/883434 seems unfixed.
>
> Should it be?

I think this is a bug that should be fixed, yes (if that is the question).
From debbugs-submit-bounces@debbugs.gnu.org Tue Aug 07 15:15:55 2018
Received: (at control) by debbugs.gnu.org; 7 Aug 2018 19:15:55 +0000
Subject: control message for bug 20623
To: <control@debbugs.gnu.org>
Message-Id: <E1fn7SN-0007vR-8q@fencepost.gnu.org>
From: Glenn Morris <rgm@gnu.org>
Date: Tue, 07 Aug 2018 15:15:47 -0400

found 20623 26.1

From debbugs-submit-bounces@debbugs.gnu.org Wed Aug 08 05:47:56 2018
Received: (at 20623) by debbugs.gnu.org; 8 Aug 2018 09:47:56 +0000
Date: Wed, 8 Aug 2018 11:47:48 +0200
From: Vincent Lefevre <vincent@vinc17.net>
To: Stefan Monnier <monnier@iro.umontreal.ca>
Subject: Re: bug#20623: XML and HTML files with encoding/charset="utf-8"
 declaration loose BOM; Coding system is reset from utf-8-with-signature to
 utf-8 on save
Message-ID: <20180808094748.GA26509@zira.vinc17.org>
References: <555E2912.7060509@gmx.net> <83iobl67ao.fsf@gnu.org>
 <555E44EB.6070604@gmx.net> <83egm95boc.fsf@gnu.org>
 <555F2D3C.6090608@gmx.net> <8660oxdyxy.fsf@realize.ch>
 <457eu2h1sk.fsf@fencepost.gnu.org>
 <jwvk1y2ct0w.fsf-monnier+emacsbugs@gnu.org>
In-Reply-To: <jwvk1y2ct0w.fsf-monnier+emacsbugs@gnu.org>
Cc: Glenn Morris <rgm@gnu.org>, Eli Zaretskii <eliz@gnu.org>,
 Alain Schneble <a.s@realize.ch>, 20623@debbugs.gnu.org,
 Simon Ledergerber <sledergerber@gmx.net>

On 2017-12-04 12:38:57 -0500, Stefan Monnier wrote:
> > Now reported with "fix this or get removed from the distribution"
> > severity at <https://bugs.debian.org/883434>.
>
> I'm curious to see if the OP's "grave" severity settings will stick.
> "Grave" is defined in https://www.debian.org/Bugs/Developer#severities as:
>
>     makes the package in question unusable or mostly so, or causes data
>     loss, or introduces a security hole allowing access to the accounts
>     of users who use the package.
>
> The only part that could arguably apply is "causes data loss", but even
> that is stretching the meaning of those words, I think.

Actually there's the issue that the coding system (in the Emacs sense)
is changed, but also the fact that this change is invisible to the user
(mainly because the BOM is usually not visible), which makes the issue
even worse. Basically, this is invisible data corruption. Even though
only three bytes are removed, this introduces breakage in other
applications, and it can take the user a long time to find the cause.

Emacs should not change the coding system when not needed, and when it
needs to, it must make sure to have a confirmation from the user.

-- 
Vincent Lefèvre <vincent@vinc17.net> - Web: <https://www.vinc17.net/>
100% accessible validated (X)HTML - Blog: <https://www.vinc17.net/blog/>
Work: CR INRIA - computer arithmetic / AriC project (LIP, ENS-Lyon)

From debbugs-submit-bounces@debbugs.gnu.org Wed Aug 08 10:45:31 2018
Received: (at 20623) by debbugs.gnu.org; 8 Aug 2018 14:45:31 +0000
From: Stefan Monnier <monnier@IRO.UMontreal.CA>
To: Vincent Lefevre <vincent@vinc17.net>
Subject: Re: bug#20623: XML and HTML files with encoding/charset="utf-8"
 declaration loose BOM; Coding system is reset from utf-8-with-signature to
 utf-8 on save
Message-ID: <jwvh8k4c416.fsf-monnier+emacsbugs@gnu.org>
References:
<555E2912.7060509@gmx.net> <83iobl67ao.fsf@gnu.org> <555E44EB.6070604@gmx.net> <83egm95boc.fsf@gnu.org> <555F2D3C.6090608@gmx.net> <8660oxdyxy.fsf@realize.ch> <457eu2h1sk.fsf@fencepost.gnu.org> <jwvk1y2ct0w.fsf-monnier+emacsbugs@gnu.org> <20180808094748.GA26509@zira.vinc17.org> Date: Wed, 08 Aug 2018 10:45:24 -0400 In-Reply-To: <20180808094748.GA26509@zira.vinc17.org> (Vincent Lefevre's message of "Wed, 8 Aug 2018 11:47:48 +0200") User-Agent: Gnus/5.13 (Gnus v5.13) Emacs/27.0.50 (gnu/linux) MIME-Version: 1.0 Content-Type: text/plain X-NAI-Spam-Flag: NO X-NAI-Spam-Level: X-NAI-Spam-Threshold: 5 X-NAI-Spam-Score: 0.2 X-NAI-Spam-Rules: 3 Rules triggered LNG_SB_1=0.2, EDT_SA_DN_PASS=0, RV6347=0 X-NAI-Spam-Version: 2.3.0.9418 : core <6347> : inlines <6803> : streams <1794912> : uri <2685906> X-Spam-Score: -2.3 (--) X-Debbugs-Envelope-To: 20623 Cc: Glenn Morris <rgm@gnu.org>, Eli Zaretskii <eliz@gnu.org>, Alain Schneble <a.s@realize.ch>, 20623@debbugs.gnu.org, Simon Ledergerber <sledergerber@gmx.net> X-BeenThere: debbugs-submit@debbugs.gnu.org X-Mailman-Version: 2.1.18 Precedence: list List-Id: <debbugs-submit.debbugs.gnu.org> List-Unsubscribe: <https://debbugs.gnu.org/cgi-bin/mailman/options/debbugs-submit>, <mailto:debbugs-submit-request@debbugs.gnu.org?subject=unsubscribe> List-Archive: <https://debbugs.gnu.org/cgi-bin/mailman/private/debbugs-submit/> List-Post: <mailto:debbugs-submit@debbugs.gnu.org> List-Help: <mailto:debbugs-submit-request@debbugs.gnu.org?subject=help> List-Subscribe: <https://debbugs.gnu.org/cgi-bin/mailman/listinfo/debbugs-submit>, <mailto:debbugs-submit-request@debbugs.gnu.org?subject=subscribe> Errors-To: debbugs-submit-bounces@debbugs.gnu.org Sender: "Debbugs-submit" <debbugs-submit-bounces@debbugs.gnu.org> X-Spam-Score: -3.3 (---) > Actually there's the issue that the coding system (in Emacs sense) > is changed, but also the fact that this change is invisible to the > user (mainly because the BOM is usually not visible), which makes > the 
issue even worse. Basically, this is invisible data corruption. > Even though only two bytes are removed, this introduces breakage in > other applications, and it can take much time to the user to find > the cause. > > Emacs should not change the coding system when not needed, and when > it needs to, it must make sure to have a confirmation from the user. FWIW, I agree: I don't think it qualifies as Debian's definition of "grave", but there is no doubt that it's a bug and that we should fix it. Stefan From debbugs-submit-bounces@debbugs.gnu.org Sat Aug 11 05:15:41 2018 Received: (at 20623-done) by debbugs.gnu.org; 11 Aug 2018 09:15:41 +0000 Received: from localhost ([127.0.0.1]:47783 helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from <debbugs-submit-bounces@debbugs.gnu.org>) id 1foPzo-0005vS-Tl for submit@debbugs.gnu.org; Sat, 11 Aug 2018 05:15:41 -0400 Received: from eggs.gnu.org ([208.118.235.92]:37210) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from <eliz@gnu.org>) id 1foPzo-0005vH-0V for 20623-done@debbugs.gnu.org; Sat, 11 Aug 2018 05:15:40 -0400 Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71) (envelope-from <eliz@gnu.org>) id 1foPze-0007sk-R9 for 20623-done@debbugs.gnu.org; Sat, 11 Aug 2018 05:15:34 -0400 X-Spam-Checker-Version: SpamAssassin 3.3.2 (2011-06-06) on eggs.gnu.org X-Spam-Level: X-Spam-Status: No, score=0.8 required=5.0 tests=BAYES_50 autolearn=disabled version=3.3.2 Received: from fencepost.gnu.org ([2001:4830:134:3::e]:36347) by eggs.gnu.org with esmtp (Exim 4.71) (envelope-from <eliz@gnu.org>) id 1foPze-0007sd-Mq; Sat, 11 Aug 2018 05:15:30 -0400 Received: from [176.228.60.248] (port=2187 helo=home-c4e4a596f7) by fencepost.gnu.org with esmtpsa (TLS1.2:RSA_AES_256_CBC_SHA1:256) (Exim 4.82) (envelope-from <eliz@gnu.org>) id 1foPzd-0007lz-Pm; Sat, 11 Aug 2018 05:15:30 -0400 Date: Sat, 11 Aug 2018 12:15:31 +0300 Message-Id: <83a7ptmfgs.fsf@gnu.org> From: Eli Zaretskii <eliz@gnu.org> To: 
Vincent Lefevre <vincent@vinc17.net> In-reply-to: <20180808094748.GA26509@zira.vinc17.org> (message from Vincent Lefevre on Wed, 8 Aug 2018 11:47:48 +0200) Subject: Re: bug#20623: XML and HTML files with encoding/charset="utf-8" declaration loose BOM; Coding system is reset from utf-8-with-signature to utf-8 on save References: <555E2912.7060509@gmx.net> <83iobl67ao.fsf@gnu.org> <555E44EB.6070604@gmx.net> <83egm95boc.fsf@gnu.org> <555F2D3C.6090608@gmx.net> <8660oxdyxy.fsf@realize.ch> <457eu2h1sk.fsf@fencepost.gnu.org> <jwvk1y2ct0w.fsf-monnier+emacsbugs@gnu.org> <20180808094748.GA26509@zira.vinc17.org> X-detected-operating-system: by eggs.gnu.org: GNU/Linux 2.2.x-3.x [generic] X-Received-From: 2001:4830:134:3::e X-Spam-Score: -5.0 (-----) X-Debbugs-Envelope-To: 20623-done Cc: rgm@gnu.org, a.s@realize.ch, monnier@iro.umontreal.ca, 20623-done@debbugs.gnu.org, sledergerber@gmx.net X-BeenThere: debbugs-submit@debbugs.gnu.org X-Mailman-Version: 2.1.18 Precedence: list List-Id: <debbugs-submit.debbugs.gnu.org> List-Unsubscribe: <https://debbugs.gnu.org/cgi-bin/mailman/options/debbugs-submit>, <mailto:debbugs-submit-request@debbugs.gnu.org?subject=unsubscribe> List-Archive: <https://debbugs.gnu.org/cgi-bin/mailman/private/debbugs-submit/> List-Post: <mailto:debbugs-submit@debbugs.gnu.org> List-Help: <mailto:debbugs-submit-request@debbugs.gnu.org?subject=help> List-Subscribe: <https://debbugs.gnu.org/cgi-bin/mailman/listinfo/debbugs-submit>, <mailto:debbugs-submit-request@debbugs.gnu.org?subject=subscribe> Errors-To: debbugs-submit-bounces@debbugs.gnu.org Sender: "Debbugs-submit" <debbugs-submit-bounces@debbugs.gnu.org> X-Spam-Score: -6.0 (------) > Date: Wed, 8 Aug 2018 11:47:48 +0200 > From: Vincent Lefevre <vincent@vinc17.net> > Cc: Glenn Morris <rgm@gnu.org>, Simon Ledergerber <sledergerber@gmx.net>, > Eli Zaretskii <eliz@gnu.org>, Alain Schneble <a.s@realize.ch>, > 20623@debbugs.gnu.org > > On 2017-12-04 12:38:57 -0500, Stefan Monnier wrote: > > > Now reported with 
"fix this or get removed from the distribution" > > > severity at <https://bugs.debian.org/883434>. > > > > I'm curious to see if the OP's "grave" severity settings will stick. > > "Grave" is defined in https://www.debian.org/Bugs/Developer#severities as: > > > > makes the package in question unusable or mostly so, or causes data > > loss, or introduces a security hole allowing access to the accounts > > of users who use the package. > > > > The only part that could arguably apply is "causes data loss", but even > > that is stretching the meaning of those words, I think. > > Actually there's the issue that the coding system (in Emacs sense) > is changed, but also the fact that this change is invisible to the > user (mainly because the BOM is usually not visible), which makes > the issue even worse. Basically, this is invisible data corruption. > Even though only two bytes are removed, this introduces breakage in > other applications, and it can take much time to the user to find > the cause. > > Emacs should not change the coding system when not needed, and when > it needs to, it must make sure to have a confirmation from the user. I agree with the last paragraph, so I've now fixed the remaining issue of this bug (with HTML files) on the emacs-26 branch. However, I would respectfully request that in the future bug reports be accurate and fair in the assigned severity, and in particular make sure that the severity matches the actual behavior as judged objectively. In this case, I cannot but express my extreme surprise to see such a minor issue described as "grave". The alleged data loss is minor, if it exists at all (the BOM is not data important for the user, nor data whose loss cannot be easily repaired). The unspecified "breakage in other applications" cannot be considered without the missing details, but in general I'd be surprised to hear about modern applications (browsers?) 
that really need a BOM in UTF-8 encoded HTML files to the degree that the lack of BOM causes them to "break" in some way; if they do, it could arguably be a bug in those applications. Bottom line: artificially and unreasonably increasing the severity level doesn't help the motivation to fix the bug, and if anything, has the opposite effect of ignoring the source of the bug report as not serious. I'm sure we don't want that, certainly not for bugs reported by Debian. Thanks. From debbugs-submit-bounces@debbugs.gnu.org Sat Aug 11 06:13:45 2018 Received: (at 20623) by debbugs.gnu.org; 11 Aug 2018 10:13:45 +0000 Received: from localhost ([127.0.0.1]:47816 helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from <debbugs-submit-bounces@debbugs.gnu.org>) id 1foQu0-0007QP-R8 for submit@debbugs.gnu.org; Sat, 11 Aug 2018 06:13:45 -0400 Received: from joooj.vinc17.net ([155.133.131.76]:44104) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from <vincent@vinc17.net>) id 1foQtz-0007QH-1A for 20623@debbugs.gnu.org; Sat, 11 Aug 2018 06:13:43 -0400 Received: from smtp-zira.vinc17.net (unknown [37.171.204.47]) by joooj.vinc17.net (Postfix) with ESMTPSA id 8F5082A7; Sat, 11 Aug 2018 12:13:41 +0200 (CEST) Received: by zira.vinc17.org (Postfix, from userid 1000) id 11948C2008E; Sat, 11 Aug 2018 12:13:41 +0200 (CEST) Date: Sat, 11 Aug 2018 12:13:41 +0200 From: Vincent Lefevre <vincent@vinc17.net> To: Eli Zaretskii <eliz@gnu.org> Subject: Re: bug#20623: XML and HTML files with encoding/charset="utf-8" declaration loose BOM; Coding system is reset from utf-8-with-signature to utf-8 on save Message-ID: <20180811101341.GA4800@zira.vinc17.org> References: <555E2912.7060509@gmx.net> <83iobl67ao.fsf@gnu.org> <555E44EB.6070604@gmx.net> <83egm95boc.fsf@gnu.org> <555F2D3C.6090608@gmx.net> <8660oxdyxy.fsf@realize.ch> <457eu2h1sk.fsf@fencepost.gnu.org> <jwvk1y2ct0w.fsf-monnier+emacsbugs@gnu.org> <20180808094748.GA26509@zira.vinc17.org> <83a7ptmfgs.fsf@gnu.org> 
MIME-Version: 1.0
Content-Type: text/plain; charset=iso-8859-1
Content-Disposition: inline
Content-Transfer-Encoding: 8bit
In-Reply-To: <83a7ptmfgs.fsf@gnu.org>
X-Mailer-Info: https://www.vinc17.net/mutt/
User-Agent: Mutt/1.10.1+58 (10c1ac4b) vl-108074 (2018-07-29)
X-Spam-Score: 0.0 (/)
X-Debbugs-Envelope-To: 20623
Cc: rgm@gnu.org, a.s@realize.ch, monnier@iro.umontreal.ca, 20623@debbugs.gnu.org, sledergerber@gmx.net
X-BeenThere: debbugs-submit@debbugs.gnu.org
X-Mailman-Version: 2.1.18
Precedence: list
List-Id: <debbugs-submit.debbugs.gnu.org>
List-Unsubscribe: <https://debbugs.gnu.org/cgi-bin/mailman/options/debbugs-submit>,
 <mailto:debbugs-submit-request@debbugs.gnu.org?subject=unsubscribe>
List-Archive: <https://debbugs.gnu.org/cgi-bin/mailman/private/debbugs-submit/>
List-Post: <mailto:debbugs-submit@debbugs.gnu.org>
List-Help: <mailto:debbugs-submit-request@debbugs.gnu.org?subject=help>
List-Subscribe: <https://debbugs.gnu.org/cgi-bin/mailman/listinfo/debbugs-submit>,
 <mailto:debbugs-submit-request@debbugs.gnu.org?subject=subscribe>
Errors-To: debbugs-submit-bounces@debbugs.gnu.org
Sender: "Debbugs-submit" <debbugs-submit-bounces@debbugs.gnu.org>
X-Spam-Score: -1.0 (-)

On 2018-08-11 12:15:31 +0300, Eli Zaretskii wrote:
> In this case, I cannot but express my extreme surprise to see such a
> minor issue described as "grave". The alleged data loss is minor, if
> it exists at all (the BOM is not data important for the user,

You're completely wrong. The presence or absence of a BOM is very
important for some applications, such as Firefox (not to determine the
charset, but the MIME type of local files).

> nor data whose loss cannot be easily repaired).

It can be repaired, but the problems are that the user doesn't know
what's going on and that this breaks things. If some package removed
the execute permission of some utility in /bin, this would also be a
grave bug, though it could easily be repaired.

> The unspecified "breakage in > other applications" cannot be considered without the missing details, > but in general I'd be surprised to hear about modern applications > (browsers?) that really need a BOM in UTF-8 encoded HTML files to the > degree that the lack of BOM causes them to "break" in some way; if > they do, it could arguably be a bug in those applications. Firefox. And that's actually the way I detected the bug, after hours of trying to find why it was behaving in an inconsistent way. -- Vincent Lefèvre <vincent@vinc17.net> - Web: <https://www.vinc17.net/> 100% accessible validated (X)HTML - Blog: <https://www.vinc17.net/blog/> Work: CR INRIA - computer arithmetic / AriC project (LIP, ENS-Lyon) From debbugs-submit-bounces@debbugs.gnu.org Sat Aug 11 06:45:27 2018 Received: (at 20623) by debbugs.gnu.org; 11 Aug 2018 10:45:27 +0000 Received: from localhost ([127.0.0.1]:47836 helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from <debbugs-submit-bounces@debbugs.gnu.org>) id 1foROh-0001nb-Fz for submit@debbugs.gnu.org; Sat, 11 Aug 2018 06:45:27 -0400 Received: from eggs.gnu.org ([208.118.235.92]:49447) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from <eliz@gnu.org>) id 1foROg-0001nP-7C for 20623@debbugs.gnu.org; Sat, 11 Aug 2018 06:45:26 -0400 Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71) (envelope-from <eliz@gnu.org>) id 1foROW-0002VX-EG for 20623@debbugs.gnu.org; Sat, 11 Aug 2018 06:45:20 -0400 X-Spam-Checker-Version: SpamAssassin 3.3.2 (2011-06-06) on eggs.gnu.org X-Spam-Level: X-Spam-Status: No, score=-1.9 required=5.0 tests=BAYES_00 autolearn=disabled version=3.3.2 Received: from fencepost.gnu.org ([2001:4830:134:3::e]:37056) by eggs.gnu.org with esmtp (Exim 4.71) (envelope-from <eliz@gnu.org>) id 1foROW-0002VP-BS; Sat, 11 Aug 2018 06:45:16 -0400 Received: from [176.228.60.248] (port=4028 helo=home-c4e4a596f7) by fencepost.gnu.org with esmtpsa (TLS1.2:RSA_AES_256_CBC_SHA1:256) (Exim 4.82) 
(envelope-from <eliz@gnu.org>) id 1foROV-0003bc-80; Sat, 11 Aug 2018 06:45:15 -0400 Date: Sat, 11 Aug 2018 13:45:17 +0300 Message-Id: <83zhxtkwqq.fsf@gnu.org> From: Eli Zaretskii <eliz@gnu.org> To: Vincent Lefevre <vincent@vinc17.net> In-reply-to: <20180811101341.GA4800@zira.vinc17.org> (message from Vincent Lefevre on Sat, 11 Aug 2018 12:13:41 +0200) Subject: Re: bug#20623: XML and HTML files with encoding/charset="utf-8" declaration loose BOM; Coding system is reset from utf-8-with-signature to utf-8 on save References: <555E2912.7060509@gmx.net> <83iobl67ao.fsf@gnu.org> <555E44EB.6070604@gmx.net> <83egm95boc.fsf@gnu.org> <555F2D3C.6090608@gmx.net> <8660oxdyxy.fsf@realize.ch> <457eu2h1sk.fsf@fencepost.gnu.org> <jwvk1y2ct0w.fsf-monnier+emacsbugs@gnu.org> <20180808094748.GA26509@zira.vinc17.org> <83a7ptmfgs.fsf@gnu.org> <20180811101341.GA4800@zira.vinc17.org> X-detected-operating-system: by eggs.gnu.org: GNU/Linux 2.2.x-3.x [generic] X-Received-From: 2001:4830:134:3::e X-Spam-Score: -5.0 (-----) X-Debbugs-Envelope-To: 20623 Cc: rgm@gnu.org, a.s@realize.ch, monnier@iro.umontreal.ca, 20623@debbugs.gnu.org, sledergerber@gmx.net X-BeenThere: debbugs-submit@debbugs.gnu.org X-Mailman-Version: 2.1.18 Precedence: list List-Id: <debbugs-submit.debbugs.gnu.org> List-Unsubscribe: <https://debbugs.gnu.org/cgi-bin/mailman/options/debbugs-submit>, <mailto:debbugs-submit-request@debbugs.gnu.org?subject=unsubscribe> List-Archive: <https://debbugs.gnu.org/cgi-bin/mailman/private/debbugs-submit/> List-Post: <mailto:debbugs-submit@debbugs.gnu.org> List-Help: <mailto:debbugs-submit-request@debbugs.gnu.org?subject=help> List-Subscribe: <https://debbugs.gnu.org/cgi-bin/mailman/listinfo/debbugs-submit>, <mailto:debbugs-submit-request@debbugs.gnu.org?subject=subscribe> Errors-To: debbugs-submit-bounces@debbugs.gnu.org Sender: "Debbugs-submit" <debbugs-submit-bounces@debbugs.gnu.org> X-Spam-Score: -6.0 (------) > Date: Sat, 11 Aug 2018 12:13:41 +0200 > From: Vincent Lefevre 
<vincent@vinc17.net>
> Cc: monnier@iro.umontreal.ca, rgm@gnu.org, sledergerber@gmx.net,
> a.s@realize.ch, 20623@debbugs.gnu.org
>
> On 2018-08-11 12:15:31 +0300, Eli Zaretskii wrote:
> > In this case, I cannot but express my extreme surprise to see such a
> > minor issue described as "grave". The alleged data loss is minor, if
> > it exists at all (the BOM is not data important for the user,
>
> You're completely wrong. The presence of BOM or not is very important
> for some applications, such as Firefox (not to determine the charset,
> but the MIME type of local files).

Please provide the details, including the use case, if possible. I'm
still in the dark regarding the importance of the BOM in UTF-8 encoded
HTML stuff.

> It can be repaired, but the problems are the user doesn't know
> what's going on and this breaks things.

I agree about the user not knowing, but that doesn't yet qualify as
"data loss", which has a widely accepted meaning.

> If some package removed the execute permission of some utility in
> /bin, this would also be a grave bug, though it can easily been
> repaired.

Well, I disagree about the "grave" part, because that means the
package is unusable, causes data loss, or introduces a security hole
allowing access to the user account. None of that is true in the case
in point.

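For readers following the thread: the BOM under discussion is the UTF-8 signature, the byte sequence EF BB BF at the very start of a file, which most editors do not display. The following is only a minimal sketch, in Python rather than Emacs Lisp, of how one might check from outside Emacs whether a save dropped the signature. The function names are hypothetical and are not part of Emacs or of any tool mentioned in this report.

```python
import codecs


def has_utf8_bom(data: bytes) -> bool:
    """Return True if the byte string starts with the UTF-8 signature (EF BB BF)."""
    return data.startswith(codecs.BOM_UTF8)


def bom_dropped(before: bytes, after: bytes) -> bool:
    """Return True if `before` carried a UTF-8 signature and `after` does not.

    `before` and `after` are assumed to be the raw contents of the same
    file read in binary mode before and after it was saved.
    """
    return has_utf8_bom(before) and not has_utf8_bom(after)
```

A possible use is to read the file with open(path, "rb") before and after saving it in the editor and pass both byte strings to bom_dropped; comparing raw bytes avoids relying on how any particular editor reports the coding system.
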
From debbugs-submit-bounces@debbugs.gnu.org Sat Aug 11 08:45:23 2018 Received: (at 20623) by debbugs.gnu.org; 11 Aug 2018 12:45:23 +0000 Received: from localhost ([127.0.0.1]:47877 helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from <debbugs-submit-bounces@debbugs.gnu.org>) id 1foTGl-0006dt-AK for submit@debbugs.gnu.org; Sat, 11 Aug 2018 08:45:23 -0400 Received: from pruche.dit.umontreal.ca ([132.204.246.22]:37689) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from <monnier@iro.umontreal.ca>) id 1foTGj-0006dl-GZ for 20623@debbugs.gnu.org; Sat, 11 Aug 2018 08:45:22 -0400 Received: from fmsmemgm.homelinux.net (lechon.iro.umontreal.ca [132.204.27.242]) by pruche.dit.umontreal.ca (8.14.7/8.14.1) with ESMTP id w7BCjG93024937; Sat, 11 Aug 2018 08:45:17 -0400 Received: by fmsmemgm.homelinux.net (Postfix, from userid 20848) id E7029AE1F5; Sat, 11 Aug 2018 08:45:15 -0400 (EDT) From: Stefan Monnier <monnier@IRO.UMontreal.CA> To: Eli Zaretskii <eliz@gnu.org> Subject: Re: bug#20623: XML and HTML files with encoding/charset="utf-8" declaration loose BOM; Coding system is reset from utf-8-with-signature to utf-8 on save Message-ID: <jwv600h6pm8.fsf-monnier+bug#20623@gnu.org> References: <555E2912.7060509@gmx.net> <83iobl67ao.fsf@gnu.org> <555E44EB.6070604@gmx.net> <83egm95boc.fsf@gnu.org> <555F2D3C.6090608@gmx.net> <8660oxdyxy.fsf@realize.ch> <457eu2h1sk.fsf@fencepost.gnu.org> <jwvk1y2ct0w.fsf-monnier+emacsbugs@gnu.org> <837eu2xmor.fsf@gnu.org> <jwvr2sab43f.fsf-monnier+emacsbugs@gnu.org> <838teatmtv.fsf@gnu.org> Date: Sat, 11 Aug 2018 08:45:15 -0400 In-Reply-To: <838teatmtv.fsf@gnu.org> (Eli Zaretskii's message of "Sun, 10 Dec 2017 21:17:00 +0200") User-Agent: Gnus/5.13 (Gnus v5.13) Emacs/27.0.50 (gnu/linux) MIME-Version: 1.0 Content-Type: text/plain X-NAI-Spam-Flag: NO X-NAI-Spam-Level: X-NAI-Spam-Threshold: 5 X-NAI-Spam-Score: 0.2 X-NAI-Spam-Rules: 3 Rules triggered LNG_SB_1=0.2, EDT_SA_DN_PASS=0, RV6349=0 X-NAI-Spam-Version: 2.3.0.9418 : 
core <6349> : inlines <6809> : streams <1795190> : uri <2687364> X-Spam-Score: -2.3 (--) X-Debbugs-Envelope-To: 20623 Cc: rgm@gnu.org, a.s@realize.ch, 20623@debbugs.gnu.org, sledergerber@gmx.net X-BeenThere: debbugs-submit@debbugs.gnu.org X-Mailman-Version: 2.1.18 Precedence: list List-Id: <debbugs-submit.debbugs.gnu.org> List-Unsubscribe: <https://debbugs.gnu.org/cgi-bin/mailman/options/debbugs-submit>, <mailto:debbugs-submit-request@debbugs.gnu.org?subject=unsubscribe> List-Archive: <https://debbugs.gnu.org/cgi-bin/mailman/private/debbugs-submit/> List-Post: <mailto:debbugs-submit@debbugs.gnu.org> List-Help: <mailto:debbugs-submit-request@debbugs.gnu.org?subject=help> List-Subscribe: <https://debbugs.gnu.org/cgi-bin/mailman/listinfo/debbugs-submit>, <mailto:debbugs-submit-request@debbugs.gnu.org?subject=subscribe> Errors-To: debbugs-submit-bounces@debbugs.gnu.org Sender: "Debbugs-submit" <debbugs-submit-bounces@debbugs.gnu.org> X-Spam-Score: -3.3 (---) >> > Isn't it better to fix this in sgml-xml-auto-coding-function? That's >> > where the root cause is, AFAIU. >> I'd expect the same problem would affect all other uses. > Not sure what you meant by "all other uses". Could you please > elaborate? Your commit ec6f588940e51013435408a456c10d33ddf98fb2 answers that question: at least sgml-html-meta-auto-coding-function is one of those "other uses". > > And I don't understand the comment about latin-1-mac: I don't think we > > have such problems in Emacs. The -with-signature variety is > > different, because it is not about EOL format. > You might be right, but I don't know where/how this is handled. I still don't know where the EOL part is handled. 
Stefan From debbugs-submit-bounces@debbugs.gnu.org Sat Aug 11 09:54:17 2018 Received: (at 20623) by debbugs.gnu.org; 11 Aug 2018 13:54:17 +0000 Received: from localhost ([127.0.0.1]:47884 helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from <debbugs-submit-bounces@debbugs.gnu.org>) id 1foULQ-0008Br-QC for submit@debbugs.gnu.org; Sat, 11 Aug 2018 09:54:17 -0400 Received: from eggs.gnu.org ([208.118.235.92]:51523) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from <eliz@gnu.org>) id 1foULP-0008Be-7m for 20623@debbugs.gnu.org; Sat, 11 Aug 2018 09:54:15 -0400 Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71) (envelope-from <eliz@gnu.org>) id 1foULJ-0003mO-13 for 20623@debbugs.gnu.org; Sat, 11 Aug 2018 09:54:09 -0400 X-Spam-Checker-Version: SpamAssassin 3.3.2 (2011-06-06) on eggs.gnu.org X-Spam-Level: X-Spam-Status: No, score=-1.9 required=5.0 tests=BAYES_00 autolearn=disabled version=3.3.2 Received: from fencepost.gnu.org ([2001:4830:134:3::e]:39244) by eggs.gnu.org with esmtp (Exim 4.71) (envelope-from <eliz@gnu.org>) id 1foULC-0003iU-RS; Sat, 11 Aug 2018 09:54:02 -0400 Received: from [176.228.60.248] (port=4107 helo=home-c4e4a596f7) by fencepost.gnu.org with esmtpsa (TLS1.2:RSA_AES_256_CBC_SHA1:256) (Exim 4.82) (envelope-from <eliz@gnu.org>) id 1foULC-00005K-2R; Sat, 11 Aug 2018 09:54:02 -0400 Date: Sat, 11 Aug 2018 16:54:04 +0300 Message-Id: <83va8hko03.fsf@gnu.org> From: Eli Zaretskii <eliz@gnu.org> To: Stefan Monnier <monnier@IRO.UMontreal.CA> In-reply-to: <jwv600h6pm8.fsf-monnier+bug#20623@gnu.org> (message from Stefan Monnier on Sat, 11 Aug 2018 08:45:15 -0400) Subject: Re: bug#20623: XML and HTML files with encoding/charset="utf-8" declaration loose BOM; Coding system is reset from utf-8-with-signature to utf-8 on save References: <555E2912.7060509@gmx.net> <83iobl67ao.fsf@gnu.org> <555E44EB.6070604@gmx.net> <83egm95boc.fsf@gnu.org> <555F2D3C.6090608@gmx.net> <8660oxdyxy.fsf@realize.ch> 
 <457eu2h1sk.fsf@fencepost.gnu.org> <jwvk1y2ct0w.fsf-monnier+emacsbugs@gnu.org> <837eu2xmor.fsf@gnu.org> <jwvr2sab43f.fsf-monnier+emacsbugs@gnu.org> <838teatmtv.fsf@gnu.org> <jwv600h6pm8.fsf-monnier+bug#20623@gnu.org>
X-detected-operating-system: by eggs.gnu.org: GNU/Linux 2.2.x-3.x [generic]
X-Received-From: 2001:4830:134:3::e
X-Spam-Score: -5.0 (-----)
X-Debbugs-Envelope-To: 20623
Cc: rgm@gnu.org, a.s@realize.ch, 20623@debbugs.gnu.org, sledergerber@gmx.net
X-BeenThere: debbugs-submit@debbugs.gnu.org
X-Mailman-Version: 2.1.18
Precedence: list
List-Id: <debbugs-submit.debbugs.gnu.org>
List-Unsubscribe: <https://debbugs.gnu.org/cgi-bin/mailman/options/debbugs-submit>,
 <mailto:debbugs-submit-request@debbugs.gnu.org?subject=unsubscribe>
List-Archive: <https://debbugs.gnu.org/cgi-bin/mailman/private/debbugs-submit/>
List-Post: <mailto:debbugs-submit@debbugs.gnu.org>
List-Help: <mailto:debbugs-submit-request@debbugs.gnu.org?subject=help>
List-Subscribe: <https://debbugs.gnu.org/cgi-bin/mailman/listinfo/debbugs-submit>,
 <mailto:debbugs-submit-request@debbugs.gnu.org?subject=subscribe>
Errors-To: debbugs-submit-bounces@debbugs.gnu.org
Sender: "Debbugs-submit" <debbugs-submit-bounces@debbugs.gnu.org>
X-Spam-Score: -6.0 (------)

> From: Stefan Monnier <monnier@IRO.UMontreal.CA>
> Cc: rgm@gnu.org, a.s@realize.ch, 20623@debbugs.gnu.org, sledergerber@gmx.net
> Date: Sat, 11 Aug 2018 08:45:15 -0400
>
> > > And I don't understand the comment about latin-1-mac: I don't think we
> > > have such problems in Emacs. The -with-signature variety is
> > > different, because it is not about EOL format.
> > You might be right, but I don't know where/how this is handled.
>
> I still don't know where the EOL part is handled.

If you tell me what you mean by "handled" in this context, I might be
able to help you understand where that happens.

From debbugs-submit-bounces@debbugs.gnu.org Sat Aug 11 11:41:11 2018 Received: (at 20623) by debbugs.gnu.org; 11 Aug 2018 15:41:11 +0000 Received: from localhost ([127.0.0.1]:48308 helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from <debbugs-submit-bounces@debbugs.gnu.org>) id 1foW0o-0002OP-Mz for submit@debbugs.gnu.org; Sat, 11 Aug 2018 11:41:11 -0400 Received: from joooj.vinc17.net ([155.133.131.76]:44152) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from <vincent@vinc17.net>) id 1foW0l-0002OF-N2 for 20623@debbugs.gnu.org; Sat, 11 Aug 2018 11:41:04 -0400 Received: from smtp-zira.vinc17.net (unknown [37.171.204.47]) by joooj.vinc17.net (Postfix) with ESMTPSA id 5C0B82A7; Sat, 11 Aug 2018 17:41:02 +0200 (CEST) Received: by zira.vinc17.org (Postfix, from userid 1000) id C961EC2008E; Sat, 11 Aug 2018 17:41:01 +0200 (CEST) Date: Sat, 11 Aug 2018 17:41:01 +0200 From: Vincent Lefevre <vincent@vinc17.net> To: Eli Zaretskii <eliz@gnu.org> Subject: Re: bug#20623: XML and HTML files with encoding/charset="utf-8" declaration loose BOM; Coding system is reset from utf-8-with-signature to utf-8 on save Message-ID: <20180811154101.GB4800@zira.vinc17.org> References: <555E44EB.6070604@gmx.net> <83egm95boc.fsf@gnu.org> <555F2D3C.6090608@gmx.net> <8660oxdyxy.fsf@realize.ch> <457eu2h1sk.fsf@fencepost.gnu.org> <jwvk1y2ct0w.fsf-monnier+emacsbugs@gnu.org> <20180808094748.GA26509@zira.vinc17.org> <83a7ptmfgs.fsf@gnu.org> <20180811101341.GA4800@zira.vinc17.org> <83zhxtkwqq.fsf@gnu.org> MIME-Version: 1.0 Content-Type: text/plain; charset=iso-8859-1 Content-Disposition: inline Content-Transfer-Encoding: 8bit In-Reply-To: <83zhxtkwqq.fsf@gnu.org> X-Mailer-Info: https://www.vinc17.net/mutt/ User-Agent: Mutt/1.10.1+58 (10c1ac4b) vl-108074 (2018-07-29) X-Spam-Score: 0.0 (/) X-Debbugs-Envelope-To: 20623 Cc: rgm@gnu.org, a.s@realize.ch, monnier@iro.umontreal.ca, 20623@debbugs.gnu.org, sledergerber@gmx.net X-BeenThere: debbugs-submit@debbugs.gnu.org 
X-Mailman-Version: 2.1.18 Precedence: list List-Id: <debbugs-submit.debbugs.gnu.org> List-Unsubscribe: <https://debbugs.gnu.org/cgi-bin/mailman/options/debbugs-submit>, <mailto:debbugs-submit-request@debbugs.gnu.org?subject=unsubscribe> List-Archive: <https://debbugs.gnu.org/cgi-bin/mailman/private/debbugs-submit/> List-Post: <mailto:debbugs-submit@debbugs.gnu.org> List-Help: <mailto:debbugs-submit-request@debbugs.gnu.org?subject=help> List-Subscribe: <https://debbugs.gnu.org/cgi-bin/mailman/listinfo/debbugs-submit>, <mailto:debbugs-submit-request@debbugs.gnu.org?subject=subscribe> Errors-To: debbugs-submit-bounces@debbugs.gnu.org Sender: "Debbugs-submit" <debbugs-submit-bounces@debbugs.gnu.org> X-Spam-Score: -1.0 (-) On 2018-08-11 13:45:17 +0300, Eli Zaretskii wrote: > > Date: Sat, 11 Aug 2018 12:13:41 +0200 > > From: Vincent Lefevre <vincent@vinc17.net> > > Cc: monnier@iro.umontreal.ca, rgm@gnu.org, sledergerber@gmx.net, > > a.s@realize.ch, 20623@debbugs.gnu.org > > > > On 2018-08-11 12:15:31 +0300, Eli Zaretskii wrote: > > > In this case, I cannot but express my extreme surprise to see such a > > > minor issue described as "grave". The alleged data loss is minor, if > > > it exists at all (the BOM is not data important for the user, > > > > You're completely wrong. The presence of BOM or not is very important > > for some applications, such as Firefox (not to determine the charset, > > but the MIME type of local files). > > Please provide the details, including the use case, if possible. I'm > still in the dark regarding the importance of the BOM in UTF-8 encoded > HTML stuff. https://bugzilla.mozilla.org/show_bug.cgi?id=1422889 for HTML. Wontfix because of: https://mimesniff.spec.whatwg.org/#mime-type-sniffing-algorithm For text/plain only (but this is another example that BOM can matter in practice), there's https://bugzilla.mozilla.org/show_bug.cgi?id=1071816 (which is a bug that should be fixed). 
> > It can be repaired, but the problems are the user doesn't know > > what's going on and this breaks things. > > I agree about the user not knowing, but that doesn't yet qualify as > "data loss", which has an widely accepted meaning. This is data corruption, which is a form of data loss, because some information is lost in the process (I recall that Emacs does not provide any information to the user about this transformation). -- Vincent Lefèvre <vincent@vinc17.net> - Web: <https://www.vinc17.net/> 100% accessible validated (X)HTML - Blog: <https://www.vinc17.net/blog/> Work: CR INRIA - computer arithmetic / AriC project (LIP, ENS-Lyon) From debbugs-submit-bounces@debbugs.gnu.org Sat Aug 11 12:27:47 2018 Received: (at 20623) by debbugs.gnu.org; 11 Aug 2018 16:27:47 +0000 Received: from localhost ([127.0.0.1]:48325 helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from <debbugs-submit-bounces@debbugs.gnu.org>) id 1foWju-0003TQ-2N for submit@debbugs.gnu.org; Sat, 11 Aug 2018 12:27:46 -0400 Received: from eggs.gnu.org ([208.118.235.92]:46931) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from <eliz@gnu.org>) id 1foWjs-0003TC-JM for 20623@debbugs.gnu.org; Sat, 11 Aug 2018 12:27:40 -0400 Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71) (envelope-from <eliz@gnu.org>) id 1foWjj-0003xn-Gr for 20623@debbugs.gnu.org; Sat, 11 Aug 2018 12:27:35 -0400 X-Spam-Checker-Version: SpamAssassin 3.3.2 (2011-06-06) on eggs.gnu.org X-Spam-Level: X-Spam-Status: No, score=-0.0 required=5.0 tests=BAYES_20 autolearn=disabled version=3.3.2 Received: from fencepost.gnu.org ([2001:4830:134:3::e]:40813) by eggs.gnu.org with esmtp (Exim 4.71) (envelope-from <eliz@gnu.org>) id 1foWjj-0003xO-DK; Sat, 11 Aug 2018 12:27:31 -0400 Received: from [176.228.60.248] (port=2585 helo=home-c4e4a596f7) by fencepost.gnu.org with esmtpsa (TLS1.2:RSA_AES_256_CBC_SHA1:256) (Exim 4.82) (envelope-from <eliz@gnu.org>) id 1foWji-0004Kp-As; Sat, 11 Aug 2018 
12:27:30 -0400 Date: Sat, 11 Aug 2018 19:27:33 +0300 Message-Id: <83pnyolvgq.fsf@gnu.org> From: Eli Zaretskii <eliz@gnu.org> To: Vincent Lefevre <vincent@vinc17.net> In-reply-to: <20180811154101.GB4800@zira.vinc17.org> (message from Vincent Lefevre on Sat, 11 Aug 2018 17:41:01 +0200) Subject: Re: bug#20623: XML and HTML files with encoding/charset="utf-8" declaration loose BOM; Coding system is reset from utf-8-with-signature to utf-8 on save References: <555E44EB.6070604@gmx.net> <83egm95boc.fsf@gnu.org> <555F2D3C.6090608@gmx.net> <8660oxdyxy.fsf@realize.ch> <457eu2h1sk.fsf@fencepost.gnu.org> <jwvk1y2ct0w.fsf-monnier+emacsbugs@gnu.org> <20180808094748.GA26509@zira.vinc17.org> <83a7ptmfgs.fsf@gnu.org> <20180811101341.GA4800@zira.vinc17.org> <83zhxtkwqq.fsf@gnu.org> <20180811154101.GB4800@zira.vinc17.org> X-detected-operating-system: by eggs.gnu.org: GNU/Linux 2.2.x-3.x [generic] X-Received-From: 2001:4830:134:3::e X-Spam-Score: -5.0 (-----) X-Debbugs-Envelope-To: 20623 Cc: rgm@gnu.org, a.s@realize.ch, monnier@iro.umontreal.ca, 20623@debbugs.gnu.org, sledergerber@gmx.net X-BeenThere: debbugs-submit@debbugs.gnu.org X-Mailman-Version: 2.1.18 Precedence: list List-Id: <debbugs-submit.debbugs.gnu.org> List-Unsubscribe: <https://debbugs.gnu.org/cgi-bin/mailman/options/debbugs-submit>, <mailto:debbugs-submit-request@debbugs.gnu.org?subject=unsubscribe> List-Archive: <https://debbugs.gnu.org/cgi-bin/mailman/private/debbugs-submit/> List-Post: <mailto:debbugs-submit@debbugs.gnu.org> List-Help: <mailto:debbugs-submit-request@debbugs.gnu.org?subject=help> List-Subscribe: <https://debbugs.gnu.org/cgi-bin/mailman/listinfo/debbugs-submit>, <mailto:debbugs-submit-request@debbugs.gnu.org?subject=subscribe> Errors-To: debbugs-submit-bounces@debbugs.gnu.org Sender: "Debbugs-submit" <debbugs-submit-bounces@debbugs.gnu.org> X-Spam-Score: -1.0 (-) > Date: Sat, 11 Aug 2018 17:41:01 +0200 > From: Vincent Lefevre <vincent@vinc17.net> > Cc: monnier@iro.umontreal.ca, rgm@gnu.org, 
sledergerber@gmx.net, > a.s@realize.ch, 20623@debbugs.gnu.org > > > > You're completely wrong. The presence of BOM or not is very important > > > for some applications, such as Firefox (not to determine the charset, > > > but the MIME type of local files). > > > > Please provide the details, including the use case, if possible. I'm > > still in the dark regarding the importance of the BOM in UTF-8 encoded > > HTML stuff. > > https://bugzilla.mozilla.org/show_bug.cgi?id=1422889 > > for HTML. Wontfix because of: > > https://mimesniff.spec.whatwg.org/#mime-type-sniffing-algorithm > > For text/plain only (but this is another example that BOM can matter > in practice), there's > > https://bugzilla.mozilla.org/show_bug.cgi?id=1071816 > > (which is a bug that should be fixed). Maybe I'm missing something, but none of these issues describes the situation in this bug report, namely: an HTML file with an explicit charset= tag, with or without a BOM. In fact, the first of these issues happens only in files that _do_ have a BOM, so you could say that Emacs did you a favor by removing it ;-) > > I agree about the user not knowing, but that doesn't yet qualify as > > "data loss", which has an widely accepted meaning. > > This is data corruption, which is a form of data loss, because some > information is lost in the process (I recall that Emacs does not > provide any information to the user about this transformation). That is the most inclusive interpretation of "data loss" I've ever seen. "Some information is lost" is nowhere near what "grave bug" means by "data loss", so I don't think "grave" applies here. Anyway, the Emacs issue is now fixed. 
From debbugs-submit-bounces@debbugs.gnu.org Sat Aug 11 20:04:14 2018 Received: (at 20623) by debbugs.gnu.org; 12 Aug 2018 00:04:14 +0000 Received: from localhost ([127.0.0.1]:48481 helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from <debbugs-submit-bounces@debbugs.gnu.org>) id 1fodri-0001oR-5f for submit@debbugs.gnu.org; Sat, 11 Aug 2018 20:04:14 -0400 Received: from chene.dit.umontreal.ca ([132.204.246.20]:59715) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from <monnier@iro.umontreal.ca>) id 1fodrf-0001oJ-AN for 20623@debbugs.gnu.org; Sat, 11 Aug 2018 20:04:13 -0400 Received: from fmsmemgm.homelinux.net (lechon.iro.umontreal.ca [132.204.27.242]) by chene.dit.umontreal.ca (8.14.7/8.14.1) with ESMTP id w7C046Pt008682; Sat, 11 Aug 2018 20:04:07 -0400 Received: by fmsmemgm.homelinux.net (Postfix, from userid 20848) id 7A5B2AE1F5; Sat, 11 Aug 2018 20:04:05 -0400 (EDT) From: Stefan Monnier <monnier@IRO.UMontreal.CA> To: Eli Zaretskii <eliz@gnu.org> Subject: Re: bug#20623: XML and HTML files with encoding/charset="utf-8" declaration loose BOM; Coding system is reset from utf-8-with-signature to utf-8 on save Message-ID: <jwvtvo05u5w.fsf-monnier+emacs@gnu.org> References: <555E2912.7060509@gmx.net> <83iobl67ao.fsf@gnu.org> <555E44EB.6070604@gmx.net> <83egm95boc.fsf@gnu.org> <555F2D3C.6090608@gmx.net> <8660oxdyxy.fsf@realize.ch> <457eu2h1sk.fsf@fencepost.gnu.org> <jwvk1y2ct0w.fsf-monnier+emacsbugs@gnu.org> <837eu2xmor.fsf@gnu.org> <jwvr2sab43f.fsf-monnier+emacsbugs@gnu.org> <838teatmtv.fsf@gnu.org> <jwv600h6pm8.fsf-monnier+bug#20623@gnu.org> <83va8hko03.fsf@gnu.org> Date: Sat, 11 Aug 2018 20:04:05 -0400 In-Reply-To: <83va8hko03.fsf@gnu.org> (Eli Zaretskii's message of "Sat, 11 Aug 2018 16:54:04 +0300") User-Agent: Gnus/5.13 (Gnus v5.13) Emacs/27.0.50 (gnu/linux) MIME-Version: 1.0 Content-Type: text/plain X-NAI-Spam-Flag: NO X-NAI-Spam-Level: X-NAI-Spam-Threshold: 5 X-NAI-Spam-Score: 0.2 X-NAI-Spam-Rules: 3 Rules triggered 
LNG_SB_1=0.2, EDT_SA_DN_PASS=0, RV6349=0 X-NAI-Spam-Version: 2.3.0.9418 : core <6349> : inlines <6809> : streams <1795235> : uri <2687567> X-Spam-Score: -2.3 (--) X-Debbugs-Envelope-To: 20623 Cc: rgm@gnu.org, a.s@realize.ch, 20623@debbugs.gnu.org, sledergerber@gmx.net X-BeenThere: debbugs-submit@debbugs.gnu.org X-Mailman-Version: 2.1.18 Precedence: list List-Id: <debbugs-submit.debbugs.gnu.org> List-Unsubscribe: <https://debbugs.gnu.org/cgi-bin/mailman/options/debbugs-submit>, <mailto:debbugs-submit-request@debbugs.gnu.org?subject=unsubscribe> List-Archive: <https://debbugs.gnu.org/cgi-bin/mailman/private/debbugs-submit/> List-Post: <mailto:debbugs-submit@debbugs.gnu.org> List-Help: <mailto:debbugs-submit-request@debbugs.gnu.org?subject=help> List-Subscribe: <https://debbugs.gnu.org/cgi-bin/mailman/listinfo/debbugs-submit>, <mailto:debbugs-submit-request@debbugs.gnu.org?subject=subscribe> Errors-To: debbugs-submit-bounces@debbugs.gnu.org Sender: "Debbugs-submit" <debbugs-submit-bounces@debbugs.gnu.org> X-Spam-Score: -3.3 (---) >> > > And I don't understand the comment about latin-1-mac: I don't think we >> > > have such problems in Emacs. The -with-signature variety is >> > > different, because it is not about EOL format. >> > You might be right, but I don't know where/how this is handled. >> I still don't know where the EOL part is handled. > If you tell me what do you mean by "handled" in this context, I might > be able to help you understand where that happens. You say that the code I wrote is not needed to make sure an existing latin-1-mac setting isn't overwritten by a latin-1 guess. I expect this is indeed true (otherwise I think we'd have had bug-reports about it), but I don't know where that is handled. 
Stefan From debbugs-submit-bounces@debbugs.gnu.org Sat Aug 11 20:11:55 2018 Received: (at 20623) by debbugs.gnu.org; 12 Aug 2018 00:11:55 +0000 Received: from localhost ([127.0.0.1]:48490 helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from <debbugs-submit-bounces@debbugs.gnu.org>) id 1fodz9-0001zj-9G for submit@debbugs.gnu.org; Sat, 11 Aug 2018 20:11:55 -0400 Received: from pruche.dit.umontreal.ca ([132.204.246.22]:46228) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from <monnier@iro.umontreal.ca>) id 1fodz7-0001zZ-G7 for 20623@debbugs.gnu.org; Sat, 11 Aug 2018 20:11:54 -0400 Received: from fmsmemgm.homelinux.net (lechon.iro.umontreal.ca [132.204.27.242]) by pruche.dit.umontreal.ca (8.14.7/8.14.1) with ESMTP id w7C0BnM5013442; Sat, 11 Aug 2018 20:11:50 -0400 Received: by fmsmemgm.homelinux.net (Postfix, from userid 20848) id 11E70AE1F5; Sat, 11 Aug 2018 20:11:49 -0400 (EDT) From: Stefan Monnier <monnier@IRO.UMontreal.CA> To: Vincent Lefevre <vincent@vinc17.net> Subject: Re: bug#20623: XML and HTML files with encoding/charset="utf-8" declaration loose BOM; Coding system is reset from utf-8-with-signature to utf-8 on save Message-ID: <jwvo9e85twz.fsf-monnier+emacsbugs@gnu.org> References: <555E44EB.6070604@gmx.net> <83egm95boc.fsf@gnu.org> <555F2D3C.6090608@gmx.net> <8660oxdyxy.fsf@realize.ch> <457eu2h1sk.fsf@fencepost.gnu.org> <jwvk1y2ct0w.fsf-monnier+emacsbugs@gnu.org> <20180808094748.GA26509@zira.vinc17.org> <83a7ptmfgs.fsf@gnu.org> <20180811101341.GA4800@zira.vinc17.org> <83zhxtkwqq.fsf@gnu.org> <20180811154101.GB4800@zira.vinc17.org> Date: Sat, 11 Aug 2018 20:11:49 -0400 In-Reply-To: <20180811154101.GB4800@zira.vinc17.org> (Vincent Lefevre's message of "Sat, 11 Aug 2018 17:41:01 +0200") User-Agent: Gnus/5.13 (Gnus v5.13) Emacs/27.0.50 (gnu/linux) MIME-Version: 1.0 Content-Type: text/plain X-NAI-Spam-Flag: NO X-NAI-Spam-Level: X-NAI-Spam-Threshold: 5 X-NAI-Spam-Score: 0.1 X-NAI-Spam-Rules: 3 Rules triggered TRK_NCM1=0.1, 
EDT_SA_DN_PASS=0, RV6349=0 X-NAI-Spam-Version: 2.3.0.9418 : core <6349> : inlines <6809> : streams <1795235> : uri <2687570> X-Spam-Score: -2.3 (--) X-Debbugs-Envelope-To: 20623 Cc: rgm@gnu.org, Eli Zaretskii <eliz@gnu.org>, a.s@realize.ch, 20623@debbugs.gnu.org, sledergerber@gmx.net X-BeenThere: debbugs-submit@debbugs.gnu.org X-Mailman-Version: 2.1.18 Precedence: list List-Id: <debbugs-submit.debbugs.gnu.org> List-Unsubscribe: <https://debbugs.gnu.org/cgi-bin/mailman/options/debbugs-submit>, <mailto:debbugs-submit-request@debbugs.gnu.org?subject=unsubscribe> List-Archive: <https://debbugs.gnu.org/cgi-bin/mailman/private/debbugs-submit/> List-Post: <mailto:debbugs-submit@debbugs.gnu.org> List-Help: <mailto:debbugs-submit-request@debbugs.gnu.org?subject=help> List-Subscribe: <https://debbugs.gnu.org/cgi-bin/mailman/listinfo/debbugs-submit>, <mailto:debbugs-submit-request@debbugs.gnu.org?subject=subscribe> Errors-To: debbugs-submit-bounces@debbugs.gnu.org Sender: "Debbugs-submit" <debbugs-submit-bounces@debbugs.gnu.org> X-Spam-Score: -3.3 (---) >> > > In this case, I cannot but express my extreme surprise to see such a >> > > minor issue described as "grave". The alleged data loss is minor, if >> > > it exists at all (the BOM is not data important for the user, >> > You're completely wrong. The presence of BOM or not is very important >> > for some applications, such as Firefox (not to determine the charset, >> > but the MIME type of local files). >> Please provide the details, including the use case, if possible. I'm >> still in the dark regarding the importance of the BOM in UTF-8 encoded >> HTML stuff. > https://bugzilla.mozilla.org/show_bug.cgi?id=1422889 I don't see any data loss there. Stefan PS: We can all cook up contrived scenarios where this bug leads to a serious loss of data. 
But in that case a problem in C-n which makes it move to the wrong column would also qualify as "grave" because I can just as well construct a contrived scenario where such a bug leads to a serious loss of data. From debbugs-submit-bounces@debbugs.gnu.org Sat Aug 11 20:58:57 2018 Received: (at 20623) by debbugs.gnu.org; 12 Aug 2018 00:58:57 +0000 Received: from localhost ([127.0.0.1]:48507 helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from <debbugs-submit-bounces@debbugs.gnu.org>) id 1foeif-00039Q-6P for submit@debbugs.gnu.org; Sat, 11 Aug 2018 20:58:57 -0400 Received: from joooj.vinc17.net ([155.133.131.76]:44206) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from <vincent@vinc17.net>) id 1foeid-00039H-0K for 20623@debbugs.gnu.org; Sat, 11 Aug 2018 20:58:55 -0400 Received: from smtp-zira.vinc17.net (unknown [37.168.239.54]) by joooj.vinc17.net (Postfix) with ESMTPSA id BEEF82A7; Sun, 12 Aug 2018 02:58:53 +0200 (CEST) Received: by zira.vinc17.org (Postfix, from userid 1000) id 19B7FC20135; Sun, 12 Aug 2018 02:58:53 +0200 (CEST) Date: Sun, 12 Aug 2018 02:58:53 +0200 From: Vincent Lefevre <vincent@vinc17.net> To: Stefan Monnier <monnier@IRO.UMontreal.CA> Subject: Re: bug#20623: XML and HTML files with encoding/charset="utf-8" declaration loose BOM; Coding system is reset from utf-8-with-signature to utf-8 on save Message-ID: <20180812005853.GD4800@zira.vinc17.org> References: <555F2D3C.6090608@gmx.net> <8660oxdyxy.fsf@realize.ch> <457eu2h1sk.fsf@fencepost.gnu.org> <jwvk1y2ct0w.fsf-monnier+emacsbugs@gnu.org> <20180808094748.GA26509@zira.vinc17.org> <83a7ptmfgs.fsf@gnu.org> <20180811101341.GA4800@zira.vinc17.org> <83zhxtkwqq.fsf@gnu.org> <20180811154101.GB4800@zira.vinc17.org> <jwvo9e85twz.fsf-monnier+emacsbugs@gnu.org> MIME-Version: 1.0 Content-Type: text/plain; charset=iso-8859-1 Content-Disposition: inline Content-Transfer-Encoding: 8bit In-Reply-To: <jwvo9e85twz.fsf-monnier+emacsbugs@gnu.org> X-Mailer-Info: 
https://www.vinc17.net/mutt/ User-Agent: Mutt/1.10.1+58 (10c1ac4b) vl-108074 (2018-07-29) X-Spam-Score: 0.0 (/) X-Debbugs-Envelope-To: 20623 Cc: rgm@gnu.org, Eli Zaretskii <eliz@gnu.org>, a.s@realize.ch, 20623@debbugs.gnu.org, sledergerber@gmx.net X-BeenThere: debbugs-submit@debbugs.gnu.org X-Mailman-Version: 2.1.18 Precedence: list List-Id: <debbugs-submit.debbugs.gnu.org> List-Unsubscribe: <https://debbugs.gnu.org/cgi-bin/mailman/options/debbugs-submit>, <mailto:debbugs-submit-request@debbugs.gnu.org?subject=unsubscribe> List-Archive: <https://debbugs.gnu.org/cgi-bin/mailman/private/debbugs-submit/> List-Post: <mailto:debbugs-submit@debbugs.gnu.org> List-Help: <mailto:debbugs-submit-request@debbugs.gnu.org?subject=help> List-Subscribe: <https://debbugs.gnu.org/cgi-bin/mailman/listinfo/debbugs-submit>, <mailto:debbugs-submit-request@debbugs.gnu.org?subject=subscribe> Errors-To: debbugs-submit-bounces@debbugs.gnu.org Sender: "Debbugs-submit" <debbugs-submit-bounces@debbugs.gnu.org> X-Spam-Score: -1.0 (-) On 2018-08-11 20:11:49 -0400, Stefan Monnier wrote: > >> Please provide the details, including the use case, if possible. I'm > >> still in the dark regarding the importance of the BOM in UTF-8 encoded > >> HTML stuff. > > https://bugzilla.mozilla.org/show_bug.cgi?id=1422889 > > I don't see any data loss there. Because it is not there, it is in Emacs. What the Mozilla bug shows is that the presence of BOM or not is important and yields very different behavior. 
-- Vincent Lefèvre <vincent@vinc17.net> - Web: <https://www.vinc17.net/> 100% accessible validated (X)HTML - Blog: <https://www.vinc17.net/blog/> Work: CR INRIA - computer arithmetic / AriC project (LIP, ENS-Lyon) From debbugs-submit-bounces@debbugs.gnu.org Sat Aug 11 21:34:28 2018 Received: (at 20623) by debbugs.gnu.org; 12 Aug 2018 01:34:28 +0000 Received: from localhost ([127.0.0.1]:48533 helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from <debbugs-submit-bounces@debbugs.gnu.org>) id 1fofH1-0006C5-Rz for submit@debbugs.gnu.org; Sat, 11 Aug 2018 21:34:28 -0400 Received: from joooj.vinc17.net ([155.133.131.76]:44230) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from <vincent@vinc17.net>) id 1fofH0-0006By-Ny for 20623@debbugs.gnu.org; Sat, 11 Aug 2018 21:34:27 -0400 Received: from smtp-zira.vinc17.net (unknown [37.168.239.54]) by joooj.vinc17.net (Postfix) with ESMTPSA id DFC452A7; Sun, 12 Aug 2018 03:34:25 +0200 (CEST) Received: by zira.vinc17.org (Postfix, from userid 1000) id 76959C20135; Sun, 12 Aug 2018 03:34:25 +0200 (CEST) Date: Sun, 12 Aug 2018 03:34:25 +0200 From: Vincent Lefevre <vincent@vinc17.net> To: Eli Zaretskii <eliz@gnu.org> Subject: Re: bug#20623: XML and HTML files with encoding/charset="utf-8" declaration loose BOM; Coding system is reset from utf-8-with-signature to utf-8 on save Message-ID: <20180812013425.GE4800@zira.vinc17.org> References: <555F2D3C.6090608@gmx.net> <8660oxdyxy.fsf@realize.ch> <457eu2h1sk.fsf@fencepost.gnu.org> <jwvk1y2ct0w.fsf-monnier+emacsbugs@gnu.org> <20180808094748.GA26509@zira.vinc17.org> <83a7ptmfgs.fsf@gnu.org> <20180811101341.GA4800@zira.vinc17.org> <83zhxtkwqq.fsf@gnu.org> <20180811154101.GB4800@zira.vinc17.org> <83pnyolvgq.fsf@gnu.org> MIME-Version: 1.0 Content-Type: text/plain; charset=iso-8859-1 Content-Disposition: inline Content-Transfer-Encoding: 8bit In-Reply-To: <83pnyolvgq.fsf@gnu.org> X-Mailer-Info: https://www.vinc17.net/mutt/ User-Agent: Mutt/1.10.1+58 (10c1ac4b) 
vl-108074 (2018-07-29) X-Spam-Score: 0.0 (/) X-Debbugs-Envelope-To: 20623 Cc: rgm@gnu.org, a.s@realize.ch, monnier@iro.umontreal.ca, 20623@debbugs.gnu.org, sledergerber@gmx.net X-BeenThere: debbugs-submit@debbugs.gnu.org X-Mailman-Version: 2.1.18 Precedence: list List-Id: <debbugs-submit.debbugs.gnu.org> List-Unsubscribe: <https://debbugs.gnu.org/cgi-bin/mailman/options/debbugs-submit>, <mailto:debbugs-submit-request@debbugs.gnu.org?subject=unsubscribe> List-Archive: <https://debbugs.gnu.org/cgi-bin/mailman/private/debbugs-submit/> List-Post: <mailto:debbugs-submit@debbugs.gnu.org> List-Help: <mailto:debbugs-submit-request@debbugs.gnu.org?subject=help> List-Subscribe: <https://debbugs.gnu.org/cgi-bin/mailman/listinfo/debbugs-submit>, <mailto:debbugs-submit-request@debbugs.gnu.org?subject=subscribe> Errors-To: debbugs-submit-bounces@debbugs.gnu.org Sender: "Debbugs-submit" <debbugs-submit-bounces@debbugs.gnu.org> X-Spam-Score: -1.0 (-) On 2018-08-11 19:27:33 +0300, Eli Zaretskii wrote: > Maybe I'm missing something, but none of these issues describes the > situation in this bug report, namely: an HTML file with an explicit > charset= tag, with or without a BOM. In fact, the first of these > issues happens only in files that _do_ have a BOM, so you could say > that Emacs did you a favor by removing it ;-) In theory yes, but in practice, one does not want that when doing file-loading tests. Otherwise the tests become meaningless. This is just like a spellchecker that automatically corrects spelling mistakes without the user's knowledge (even when it is right): if the goal is to write something about a spelling mistake, the text becomes meaningless. Or like characters being changed automatically to improve typography (as some blog software does when posting, with no preview), which can make the text meaningless, e.g. when it is code. > Anyway, the Emacs issue is now fixed. OK, thanks. 
-- Vincent Lefèvre <vincent@vinc17.net> - Web: <https://www.vinc17.net/> 100% accessible validated (X)HTML - Blog: <https://www.vinc17.net/blog/> Work: CR INRIA - computer arithmetic / AriC project (LIP, ENS-Lyon) From debbugs-submit-bounces@debbugs.gnu.org Sun Aug 12 15:08:07 2018 Received: (at 20623) by debbugs.gnu.org; 12 Aug 2018 19:08:07 +0000 Received: from localhost ([127.0.0.1]:49021 helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from <debbugs-submit-bounces@debbugs.gnu.org>) id 1fovih-0001wT-7a for submit@debbugs.gnu.org; Sun, 12 Aug 2018 15:08:07 -0400 Received: from eggs.gnu.org ([208.118.235.92]:45183) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from <eliz@gnu.org>) id 1fovif-0001vz-Gi for 20623@debbugs.gnu.org; Sun, 12 Aug 2018 15:08:05 -0400 Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71) (envelope-from <eliz@gnu.org>) id 1foviZ-0000RY-Fw for 20623@debbugs.gnu.org; Sun, 12 Aug 2018 15:08:00 -0400 X-Spam-Checker-Version: SpamAssassin 3.3.2 (2011-06-06) on eggs.gnu.org X-Spam-Level: X-Spam-Status: No, score=-1.9 required=5.0 tests=BAYES_00 autolearn=disabled version=3.3.2 Received: from fencepost.gnu.org ([2001:4830:134:3::e]:39047) by eggs.gnu.org with esmtp (Exim 4.71) (envelope-from <eliz@gnu.org>) id 1foviT-0000QE-Hk; Sun, 12 Aug 2018 15:07:53 -0400 Received: from [176.228.60.248] (port=4609 helo=home-c4e4a596f7) by fencepost.gnu.org with esmtpsa (TLS1.2:RSA_AES_256_CBC_SHA1:256) (Exim 4.82) (envelope-from <eliz@gnu.org>) id 1foviS-0002VT-NF; Sun, 12 Aug 2018 15:07:53 -0400 Date: Sun, 12 Aug 2018 22:07:57 +0300 Message-Id: <83wosvjtde.fsf@gnu.org> From: Eli Zaretskii <eliz@gnu.org> To: Stefan Monnier <monnier@IRO.UMontreal.CA> In-reply-to: <jwvtvo05u5w.fsf-monnier+emacs@gnu.org> (message from Stefan Monnier on Sat, 11 Aug 2018 20:04:05 -0400) Subject: Re: bug#20623: XML and HTML files with encoding/charset="utf-8" declaration loose BOM; Coding system is reset from utf-8-with-signature 
to utf-8 on save References: <555E2912.7060509@gmx.net> <83iobl67ao.fsf@gnu.org> <555E44EB.6070604@gmx.net> <83egm95boc.fsf@gnu.org> <555F2D3C.6090608@gmx.net> <8660oxdyxy.fsf@realize.ch> <457eu2h1sk.fsf@fencepost.gnu.org> <jwvk1y2ct0w.fsf-monnier+emacsbugs@gnu.org> <837eu2xmor.fsf@gnu.org> <jwvr2sab43f.fsf-monnier+emacsbugs@gnu.org> <838teatmtv.fsf@gnu.org> <jwv600h6pm8.fsf-monnier+bug#20623@gnu.org> <83va8hko03.fsf@gnu.org> <jwvtvo05u5w.fsf-monnier+emacs@gnu.org> X-detected-operating-system: by eggs.gnu.org: GNU/Linux 2.2.x-3.x [generic] X-Received-From: 2001:4830:134:3::e X-Spam-Score: -5.0 (-----) X-Debbugs-Envelope-To: 20623 Cc: rgm@gnu.org, a.s@realize.ch, 20623@debbugs.gnu.org, sledergerber@gmx.net X-BeenThere: debbugs-submit@debbugs.gnu.org X-Mailman-Version: 2.1.18 Precedence: list List-Id: <debbugs-submit.debbugs.gnu.org> List-Unsubscribe: <https://debbugs.gnu.org/cgi-bin/mailman/options/debbugs-submit>, <mailto:debbugs-submit-request@debbugs.gnu.org?subject=unsubscribe> List-Archive: <https://debbugs.gnu.org/cgi-bin/mailman/private/debbugs-submit/> List-Post: <mailto:debbugs-submit@debbugs.gnu.org> List-Help: <mailto:debbugs-submit-request@debbugs.gnu.org?subject=help> List-Subscribe: <https://debbugs.gnu.org/cgi-bin/mailman/listinfo/debbugs-submit>, <mailto:debbugs-submit-request@debbugs.gnu.org?subject=subscribe> Errors-To: debbugs-submit-bounces@debbugs.gnu.org Sender: "Debbugs-submit" <debbugs-submit-bounces@debbugs.gnu.org> X-Spam-Score: -6.0 (------) > From: Stefan Monnier <monnier@IRO.UMontreal.CA> > Cc: rgm@gnu.org, a.s@realize.ch, 20623@debbugs.gnu.org, sledergerber@gmx.net > Date: Sat, 11 Aug 2018 20:04:05 -0400 > > You say that the code I wrote is not needed to make sure an existing > latin-1-mac setting isn't overwritten by a latin-1 guess. I expect this > is indeed true (otherwise I think we'd have had bug-reports about it), > but I don't know where that is handled. 
It is handled inside select-safe-coding-system, which first invokes find-auto-coding to decide which encoding is appropriate (and as part of that, looks at XML or HTML charset information declared by the text), and then, if the encoding it got doesn't specify the EOL conversion, it uses the EOL conversion from the buffer's encoding or from the appropriate defaults. Since XML/HTML charset tags never specify the EOL conversion, it follows that Emacs will never override the EOL conversion of the buffer, it will only use the charset for "text conversion". I hope this answers your question. From debbugs-submit-bounces@debbugs.gnu.org Tue Aug 21 12:55:21 2018 Received: (at control) by debbugs.gnu.org; 21 Aug 2018 16:55:21 +0000 Received: from localhost ([127.0.0.1]:56988 helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from <debbugs-submit-bounces@debbugs.gnu.org>) id 1fs9w8-0003WE-Qf for submit@debbugs.gnu.org; Tue, 21 Aug 2018 12:55:20 -0400 Received: from eggs.gnu.org ([208.118.235.92]:35099) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from <rgm@gnu.org>) id 1fs9w7-0003Vq-E0 for control@debbugs.gnu.org; Tue, 21 Aug 2018 12:55:19 -0400 Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71) (envelope-from <rgm@gnu.org>) id 1fs9w1-0001h7-1d for control@debbugs.gnu.org; Tue, 21 Aug 2018 12:55:13 -0400 X-Spam-Checker-Version: SpamAssassin 3.3.2 (2011-06-06) on eggs.gnu.org X-Spam-Level: X-Spam-Status: No, score=-1.9 required=5.0 tests=BAYES_00 autolearn=disabled version=3.3.2 Received: from fencepost.gnu.org ([2001:4830:134:3::e]:37912) by eggs.gnu.org with esmtp (Exim 4.71) (envelope-from <rgm@gnu.org>) id 1fs9vq-0001Ru-Kf for control@debbugs.gnu.org; Tue, 21 Aug 2018 12:55:07 -0400 Received: from rgm by fencepost.gnu.org with local (Exim 4.82) (envelope-from <rgm@gnu.org>) id 1fs9vq-0006Cs-EU for control@debbugs.gnu.org; Tue, 21 Aug 2018 12:55:02 -0400 Subject: control message for bug 20623 To: 
<control@debbugs.gnu.org> X-Mailer: mail (GNU Mailutils 2.99.98) Message-Id: <E1fs9vq-0006Cs-EU@fencepost.gnu.org> From: Glenn Morris <rgm@gnu.org> Date: Tue, 21 Aug 2018 12:55:02 -0400 X-detected-operating-system: by eggs.gnu.org: GNU/Linux 2.2.x-3.x [generic] X-Received-From: 2001:4830:134:3::e X-Spam-Score: -5.0 (-----) X-Debbugs-Envelope-To: control X-BeenThere: debbugs-submit@debbugs.gnu.org X-Mailman-Version: 2.1.18 Precedence: list List-Id: <debbugs-submit.debbugs.gnu.org> List-Unsubscribe: <https://debbugs.gnu.org/cgi-bin/mailman/options/debbugs-submit>, <mailto:debbugs-submit-request@debbugs.gnu.org?subject=unsubscribe> List-Archive: <https://debbugs.gnu.org/cgi-bin/mailman/private/debbugs-submit/> List-Post: <mailto:debbugs-submit@debbugs.gnu.org> List-Help: <mailto:debbugs-submit-request@debbugs.gnu.org?subject=help> List-Subscribe: <https://debbugs.gnu.org/cgi-bin/mailman/listinfo/debbugs-submit>, <mailto:debbugs-submit-request@debbugs.gnu.org?subject=subscribe> Errors-To: debbugs-submit-bounces@debbugs.gnu.org Sender: "Debbugs-submit" <debbugs-submit-bounces@debbugs.gnu.org> X-Spam-Score: -6.0 (------) # ec6f588 fixed 20623 26.2 From unknown Mon Jun 23 22:03:56 2025 Received: (at fakecontrol) by fakecontrolmessage; To: internal_control@debbugs.gnu.org From: Debbugs Internal Request <help-debbugs@gnu.org> Subject: Internal Control Message-Id: bug archived. Date: Wed, 19 Sep 2018 11:24:04 +0000 User-Agent: Fakemail v42.6.9 # This is a fake control message. # # The action: # bug archived. thanks # This fakemail brought to you by your local debbugs # administrator
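Illustrative note on the thread above: Eli Zaretskii's description of how the EOL conversion is inherited, and why the -with-signature variety is not covered by that mechanism, can be sketched as a toy model. This is only a sketch in Python under stated assumptions; it is not Emacs code and it does not describe the fix in ec6f588. The helper names (split_eol, choose_coding, has_utf8_bom) are hypothetical, and the coding-system names are simply the ones used in the discussion.

```python
import codecs

# EOL suffixes as they appear in coding-system names in the thread.
EOL_SUFFIXES = ("-unix", "-dos", "-mac")


def split_eol(coding):
    """Split a coding-system name into (base, eol_suffix or None)."""
    for suffix in EOL_SUFFIXES:
        if coding.endswith(suffix):
            return coding[: -len(suffix)], suffix
    return coding, None


def choose_coding(declared, buffer_coding):
    """Toy model (hypothetical): take the text conversion from the
    declared charset and inherit the EOL conversion from the buffer
    when the declaration does not name one."""
    base, eol = split_eol(declared)
    if eol is None:
        _, eol = split_eol(buffer_coding)
    return base + (eol or "")


def has_utf8_bom(data):
    """Return True if the byte string starts with the UTF-8 BOM."""
    return data.startswith(codecs.BOM_UTF8)


# The EOL part of the buffer's coding survives a charset declaration.
print(choose_coding("latin-1", "latin-1-mac"))  # latin-1-mac

# The signature is not an EOL property, so this model drops it.
print(choose_coding("utf-8", "utf-8-with-signature-unix"))  # utf-8-unix

# What the dropped signature means on disk: three leading bytes.
print(has_utf8_bom("x".encode("utf-8-sig")))  # True
print(has_utf8_bom("x".encode("utf-8")))      # False
```

In this model the EOL suffix is carried over because a charset declaration never names one, while the signature lives in the base name and is replaced along with it. That matches the symptom reported in the subject line, under the assumption that the names behave as plain strings.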