GNU bug report logs - #20623
XML and HTML files with encoding/charset="utf-8" declaration lose BOM; Coding system is reset from utf-8-with-signature to utf-8 on save

Package: emacs;

Reported by: Simon Ledergerber <sledergerber <at> gmx.net>

Date: Thu, 21 May 2015 18:53:02 UTC

Severity: normal

Found in version 26.1

Fixed in version 26.2

Done: Eli Zaretskii <eliz <at> gnu.org>

Bug is archived. No further changes may be made.

Message #118 received at 20623 <at> debbugs.gnu.org (full text, mbox):

From: Eli Zaretskii <eliz <at> gnu.org>
To: Stefan Monnier <monnier <at> IRO.UMontreal.CA>
Cc: rgm <at> gnu.org, a.s <at> realize.ch, 20623 <at> debbugs.gnu.org, sledergerber <at> gmx.net
Subject: Re: bug#20623: XML and HTML files with encoding/charset="utf-8" declaration lose BOM; Coding system is reset from utf-8-with-signature to utf-8 on save
Date: Sun, 12 Aug 2018 22:07:57 +0300

> From: Stefan Monnier <monnier <at> IRO.UMontreal.CA>
> Cc: rgm <at> gnu.org, a.s <at> realize.ch, 20623 <at> debbugs.gnu.org, sledergerber <at> gmx.net
> Date: Sat, 11 Aug 2018 20:04:05 -0400
> 
> You say that the code I wrote is not needed to make sure an existing
> latin-1-mac setting isn't overwritten by a latin-1 guess.  I expect this
> is indeed true (otherwise I think we'd have had bug-reports about it),
> but I don't know where that is handled.

It is handled inside select-safe-coding-system, which first invokes
find-auto-coding to decide which encoding is appropriate (and as part
of that, looks at XML or HTML charset information declared by the
text), and then, if the encoding it got doesn't specify the EOL
conversion, it uses the EOL conversion from the buffer's encoding or
from the appropriate defaults.

Since XML/HTML charset tags never specify the EOL conversion, it
follows that Emacs will never override the EOL conversion of the
buffer; it will only use the charset for "text conversion".
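
The merging rule described above can be illustrated with a small toy
model. This is only a sketch in Python, not Emacs code: the function
names (split_eol, merge_coding_systems), the suffix-based parsing of
coding system names, and the "unix" fallback are all assumptions made
for illustration, and the real logic in select-safe-coding-system is
more involved.

```python
# Toy model of the rule described above; NOT how Emacs implements it.
# Assumption: a coding system name may end in -unix, -dos or -mac,
# and that suffix stands for its EOL conversion.
EOL_SUFFIXES = ("unix", "dos", "mac")


def split_eol(coding_system):
    """Split e.g. 'latin-1-mac' into ('latin-1', 'mac').

    The EOL part is None when the name carries no EOL suffix.
    """
    base, sep, tail = coding_system.rpartition("-")
    if sep and tail in EOL_SUFFIXES:
        return base, tail
    return coding_system, None


def merge_coding_systems(declared, buffer_coding, default_eol="unix"):
    """Take the text conversion from `declared` and fill in a missing
    EOL conversion from the buffer's coding system, else a default.

    `default_eol` is a hypothetical stand-in for "the appropriate
    defaults" mentioned in the message.
    """
    text, eol = split_eol(declared)
    if eol is None:
        _, eol = split_eol(buffer_coding)
    if eol is None:
        eol = default_eol
    return f"{text}-{eol}"
```

Under these assumptions, merge_coding_systems("latin-1", "latin-1-mac")
gives "latin-1-mac": the declared charset supplies only the text
conversion, and the buffer's existing EOL conversion is kept.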

I hope this answers your question.

This bug report was last modified 6 years and 333 days ago.

GNU bug tracking system
Copyright (C) 1999 Darren O. Benham, 1997,2003 nCipher Corporation Ltd, 1994-97 Ian Jackson.
