GNU bug report logs - #20623
XML and HTML files with encoding/charset="utf-8" declaration lose BOM; Coding system is reset from utf-8-with-signature to utf-8 on save


Package: emacs;

Reported by: Simon Ledergerber <sledergerber <at> gmx.net>

Date: Thu, 21 May 2015 18:53:02 UTC

Severity: normal

Found in version 26.1

Fixed in version 26.2

Done: Eli Zaretskii <eliz <at> gnu.org>

Bug is archived. No further changes may be made.



Message #35 received at 20623 <at> debbugs.gnu.org (full text, mbox):

From: Alain Schneble <a.s <at> realize.ch>
To: Simon Ledergerber <sledergerber <at> gmx.net>
Cc: Eli Zaretskii <eliz <at> gnu.org>, Stefan Monnier <monnier <at> iro.umontreal.ca>,
 20623 <at> debbugs.gnu.org
Subject: Re: bug#20623: XML and HTML files with
 encoding/charset="utf-8" declaration lose BOM;
 Coding system is reset from utf-8-with-signature to utf-8 on save
Date: Wed, 12 Oct 2016 23:44:57 +0200

I'm joining this discussion and would like to report a recipe to
reproduce this issue on Windows:

- emacs -Q
- C-x C-f utf-8-bom-test.xml
- Enter the following text in the new buffer:
<?xml version="1.0" encoding="utf-8"?>
<root></root>
- C-x RET c utf-8-with-signature-dos RET C-x C-s yes RET
- C-x k RET
- C-x C-f utf-8-bom-test.xml
- M-: buffer-file-coding-system
  => utf-8-with-signature-dos
- Change buffer content, e.g. add some text to the root element:
<?xml version="1.0" encoding="utf-8"?>
<root>test</root>
- C-x C-s
- M-: buffer-file-coding-system
  => utf-8-dos
  (expected coding system: utf-8-with-signature-dos)
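
The recipe above can also be approximated non-interactively.  The
following is only a sketch under stated assumptions: the helper names
`my-file-has-utf8-bom-p' and `my-bom-repro' and the temporary file are
illustrative and not part of Emacs or of this report, and the initial
file is written via `coding-system-for-write' instead of C-x RET c.
It prints what it observes rather than assuming a particular result.

```elisp
;; Sketch only: `my-file-has-utf8-bom-p' and `my-bom-repro' are
;; illustrative names, not part of Emacs or of this bug report.

(defun my-file-has-utf8-bom-p (file)
  "Return non-nil if FILE starts with the UTF-8 BOM bytes EF BB BF."
  (with-temp-buffer
    (set-buffer-multibyte nil)
    ;; Read only the first three bytes, without any decoding.
    (insert-file-contents-literally file nil 0 3)
    (equal (string-to-list (buffer-string)) '(#xEF #xBB #xBF))))

(defun my-bom-repro ()
  "Write, visit, edit and save a BOM-prefixed XML file; report results."
  (let ((file (make-temp-file "utf-8-bom-test" nil ".xml")))
    ;; Create the file with a BOM and DOS line endings.
    (let ((coding-system-for-write 'utf-8-with-signature-dos))
      (write-region
       "<?xml version=\"1.0\" encoding=\"utf-8\"?>\n<root></root>\n"
       nil file))
    (with-current-buffer (find-file-noselect file)
      (message "after visiting: %s" buffer-file-coding-system)
      ;; Change the buffer content, as in the recipe.
      (goto-char (point-min))
      (search-forward "<root>")
      (insert "test")
      (save-buffer)
      (message "after saving:   %s" buffer-file-coding-system)
      (kill-buffer))
    (message "BOM still on disk: %s" (my-file-has-utf8-bom-p file))
    (delete-file file)))
```

Evaluating (my-bom-repro) in an "emacs -Q" session, or loading the file
with "emacs -Q --batch" and calling the function, should show whether
the coding system reported after saving still carries the signature.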

As already mentioned in this thread, simply visiting the file, then
changing and saving the buffer, is enough to lose the BOM.  This is
because select-safe-coding-system (called by
choose_write_coding_system) fully trusts the coding system identified
by find-auto-coding.  So far so good.  The latter eventually runs the
functions listed in auto-coding-functions, among them the built-in
sgml-xml-auto-coding-function, which I think should take some context
into account and enrich the derived coding system with a signature if
needed, similar to what select-safe-coding-system does to enrich the
coding system with the proper eol-type.
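
To make the relation between the two coding systems from the recipe
concrete, they can be inspected with standard accessors.  This is a
small illustrative sketch, not part of the report; it relies on the
assumption that the signature is recorded under the :bom property of
the coding system.

```elisp
;; Compare the coding system seen after visiting with the one seen
;; after saving.  Both should report the same coding type and eol-type;
;; they are expected to differ only in the :bom property and the base.
(dolist (cs '(utf-8-with-signature-dos utf-8-dos))
  (message "%-26s type=%s bom=%s eol=%s base=%s"
           cs
           (coding-system-type cs)
           (coding-system-get cs :bom)
           (coding-system-eol-type cs)
           (coding-system-base cs)))
```

The eol-type is a separate dimension from the signature, which is why
it can be re-applied on top of whatever base coding system an auto
coding function proposes.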

Does that make sense to you?  If so, I'll try to come up with a patch
that enhances sgml-xml-auto-coding-function to take
buffer-file-coding-system (buffer-local and default value) into
account when it specifies the same text conversion but a different
signature.  In that case, the proposed "auto coding" would inherit the
signature.
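
The idea could look roughly like the following.  This is a hypothetical
sketch only, not the patch itself: `my-inherit-signature' is an
invented helper name, and limiting the check to the utf-8 coding type
is an assumption made for illustration.

```elisp
;; Hypothetical helper, not part of Emacs: prefer the buffer's own
;; UTF-8 variant (with or without signature) over a plain proposal.
(defun my-inherit-signature (proposed current)
  "Return the coding system to use instead of PROPOSED.
If PROPOSED and CURRENT are both of coding type `utf-8', return the
base of CURRENT so that its signature (BOM) setting is kept.
Otherwise return PROPOSED unchanged.  The eol-type is left for the
caller to apply."
  (if (and current
           (eq (coding-system-type proposed) 'utf-8)
           (eq (coding-system-type current) 'utf-8))
      (coding-system-base current)
    proposed))

;; Example calls:
;; (my-inherit-signature 'utf-8 'utf-8-with-signature-dos)
;; (my-inherit-signature 'utf-8 'iso-latin-1-unix)
```

With such a helper, the first call would keep the signature variant,
while the second would leave the proposed utf-8 untouched because the
buffer's coding system is of a different type.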

Thanks for any help.
Alain





This bug report was last modified 6 years and 279 days ago.


