GNU bug report logs - #20623
XML and HTML files with encoding/charset="utf-8" declaration lose BOM; Coding system is reset from utf-8-with-signature to utf-8 on save


Package: emacs;

Reported by: Simon Ledergerber <sledergerber <at> gmx.net>

Date: Thu, 21 May 2015 18:53:02 UTC

Severity: normal

Found in version 26.1

Fixed in version 26.2

Done: Eli Zaretskii <eliz <at> gnu.org>

Bug is archived. No further changes may be made.



Message #103 received at 20623 <at> debbugs.gnu.org (full text, mbox):

From: Eli Zaretskii <eliz <at> gnu.org>
To: Vincent Lefevre <vincent <at> vinc17.net>
Cc: rgm <at> gnu.org, a.s <at> realize.ch, monnier <at> iro.umontreal.ca,
 20623 <at> debbugs.gnu.org, sledergerber <at> gmx.net
Subject: Re: bug#20623: XML and HTML files with encoding/charset="utf-8"
 declaration lose BOM; Coding system is reset from utf-8-with-signature to
 utf-8 on save
Date: Sat, 11 Aug 2018 19:27:33 +0300
> Date: Sat, 11 Aug 2018 17:41:01 +0200
> From: Vincent Lefevre <vincent <at> vinc17.net>
> Cc: monnier <at> iro.umontreal.ca, rgm <at> gnu.org, sledergerber <at> gmx.net,
> 	a.s <at> realize.ch, 20623 <at> debbugs.gnu.org
> 
> > > You're completely wrong. The presence of BOM or not is very important
> > > for some applications, such as Firefox (not to determine the charset,
> > > but the MIME type of local files).
> > 
> > Please provide the details, including the use case, if possible.  I'm
> > still in the dark regarding the importance of the BOM in UTF-8 encoded
> > HTML stuff.
> 
>   https://bugzilla.mozilla.org/show_bug.cgi?id=1422889
> 
> for HTML. Wontfix because of:
> 
>   https://mimesniff.spec.whatwg.org/#mime-type-sniffing-algorithm
> 
> For text/plain only (but this is another example that BOM can matter
> in practice), there's
> 
>   https://bugzilla.mozilla.org/show_bug.cgi?id=1071816
> 
> (which is a bug that should be fixed).

Maybe I'm missing something, but none of these issues describes the
situation in this bug report, namely: an HTML file with an explicit
charset= tag, with or without a BOM.  In fact, the first of these
issues happens only in files that _do_ have a BOM, so you could say
that Emacs did you a favor by removing it ;-)
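
To make the sniffing point concrete, here is a minimal Python sketch of a small, simplified subset of the WHATWG "rules for identifying an unknown MIME type". It models only a few HTML rows and the UTF-8 BOM row. The function name sniff_subset and the constant names are hypothetical; this is not Firefox's code. It shows why a leading BOM can turn a local HTML file into text/plain: the HTML rows skip only ASCII whitespace, so a BOM in front of "<!DOCTYPE html>" prevents a match, and the BOM row matches instead.

```python
from typing import Optional

# Hypothetical helper: a heavily simplified subset of the WHATWG
# "rules for identifying an unknown MIME type". Only a few HTML rows and
# the UTF-8 BOM row are modelled; the real table is much longer.
UTF8_BOM = b"\xef\xbb\xbf"
WHITESPACE = b"\t\n\x0c\r "        # leading bytes the HTML rows may skip
TAG_TERMINATORS = b" >"            # byte that must follow an HTML prefix
HTML_PREFIXES = (b"<!DOCTYPE HTML", b"<HTML", b"<HEAD", b"<BODY")


def sniff_subset(data: bytes) -> Optional[str]:
    """Return "text/html", "text/plain", or None if no modelled row matches."""
    # HTML rows: skip leading ASCII whitespace only (not a BOM), then
    # compare case-insensitively and require a tag-terminating byte.
    i = 0
    while i < len(data) and data[i] in WHITESPACE:
        i += 1
    rest = data[i:]
    for prefix in HTML_PREFIXES:
        n = len(prefix)
        if len(rest) > n and rest[:n].upper() == prefix and rest[n] in TAG_TERMINATORS:
            return "text/html"
    # BOM row: a UTF-8 BOM at the very start is reported as text/plain.
    if data.startswith(UTF8_BOM):
        return "text/plain"
    return None
```

For example, sniff_subset(b'<!DOCTYPE html>\n<html>...') gives "text/html", while the same bytes preceded by b"\xef\xbb\xbf" give "text/plain". The real algorithm has many more rows and conditions, so treat this only as an illustration of the ordering issue.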

> > I agree about the user not knowing, but that doesn't yet qualify as
> > "data loss", which has a widely accepted meaning.
> 
> This is data corruption, which is a form of data loss, because some
> information is lost in the process (I recall that Emacs does not
> provide any information to the user about this transformation).

That is the most inclusive interpretation of "data loss" I've ever
seen.  "Some information is lost" is nowhere near what "grave bug"
means by "data loss", so I don't think "grave" applies here.

Anyway, the Emacs issue is now fixed.




This bug report was last modified 6 years and 279 days ago.



GNU bug tracking system
Copyright (C) 1999 Darren O. Benham, 1997,2003 nCipher Corporation Ltd, 1994-97 Ian Jackson.