GNU bug report logs - #20623
XML and HTML files with encoding/charset="utf-8" declaration lose BOM; Coding system is reset from utf-8-with-signature to utf-8 on save

Previous Next

Package: emacs;

Reported by: Simon Ledergerber <sledergerber <at> gmx.net>

Date: Thu, 21 May 2015 18:53:02 UTC

Severity: normal

Found in version 26.1

Fixed in version 26.2

Done: Eli Zaretskii <eliz <at> gnu.org>

Bug is archived. No further changes may be made.

Full log


View this message in rfc822 format

From: Stefan Monnier <monnier <at> IRO.UMontreal.CA>
To: Vincent Lefevre <vincent <at> vinc17.net>
Cc: Glenn Morris <rgm <at> gnu.org>, Eli Zaretskii <eliz <at> gnu.org>, Alain Schneble <a.s <at> realize.ch>, 20623 <at> debbugs.gnu.org, Simon Ledergerber <sledergerber <at> gmx.net>
Subject: bug#20623: XML and HTML files with encoding/charset="utf-8" declaration loose BOM; Coding system is reset from utf-8-with-signature to utf-8 on save
Date: Wed, 08 Aug 2018 10:45:24 -0400
> Actually there's the issue that the coding system (in Emacs sense)
> is changed, but also the fact that this change is invisible to the
> user (mainly because the BOM is usually not visible), which makes
> the issue even worse. Basically, this is invisible data corruption.
> Even though only two bytes are removed, this introduces breakage in
> other applications, and it can take much time to the user to find
> the cause.
>
> Emacs should not change the coding system when not needed, and when
> it needs to, it must make sure to have a confirmation from the user.

FWIW, I agree:  I don't think it qualifies as Debian's definition of
"grave", but there is no doubt that it's a bug and that we should
fix it.


        Stefan




This bug report was last modified 6 years and 279 days ago.

Previous Next


GNU bug tracking system
Copyright (C) 1999 Darren O. Benham, 1997,2003 nCipher Corporation Ltd, 1994-97 Ian Jackson.