GNU bug report logs - #68971
Innocent file renders crazy

Package: emacs

Reported by: Dan Jacobson <jidanni <at> jidanni.org>

Date: Wed, 7 Feb 2024 14:19:01 UTC

Severity: normal

Tags: notabug

Done: Eli Zaretskii <eliz <at> gnu.org>

Bug is archived. No further changes may be made.

To add a comment to this bug, you must first unarchive it, by sending
a message to control AT debbugs.gnu.org, with unarchive 68971 in the body.
You can then email your comments to 68971 AT debbugs.gnu.org in the normal way.


Report forwarded to bug-gnu-emacs <at> gnu.org:
bug#68971; Package emacs. (Wed, 07 Feb 2024 14:19:01 GMT)

Acknowledgement sent to Dan Jacobson <jidanni <at> jidanni.org>:
New bug report received and forwarded. Copy sent to bug-gnu-emacs <at> gnu.org. (Wed, 07 Feb 2024 14:19:01 GMT)

Message #5 received at submit <at> debbugs.gnu.org:

From: Dan Jacobson <jidanni <at> jidanni.org>
To: bug-gnu-emacs <at> gnu.org
Subject: Innocent file renders crazy
Date: Wed, 07 Feb 2024 22:17:58 +0800
[Message part 1 (text/plain, inline)]
There is something crazy about this attached file that causes emacs to
display tons of weird characters.
$ md5sum metadata.html
42c875bae87988bbbd4db481b873bc1a metadata.html
$ emacs -Q metadata.html #crazy!
GNU Emacs 29.1
[metadata.html (text/html, attachment)]

Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#68971; Package emacs. (Wed, 07 Feb 2024 15:01:01 GMT)

Message #8 received at 68971 <at> debbugs.gnu.org:

From: Eli Zaretskii <eliz <at> gnu.org>
To: Dan Jacobson <jidanni <at> jidanni.org>
Cc: 68971 <at> debbugs.gnu.org
Subject: Re: bug#68971: Innocent file renders crazy
Date: Wed, 07 Feb 2024 17:00:13 +0200
tags 68971 notabug
thanks

> Date: Wed, 07 Feb 2024 22:17:58 +0800
> From: Dan Jacobson <jidanni <at> jidanni.org>
> 
> There is something crazy about this attached file that causes emacs to
> display tons of weird characters.
> $ md5sum metadata.html
> 42c875bae87988bbbd4db481b873bc1a metadata.html
> $ emacs -Q metadata.html #crazy!
> GNU Emacs 29.1

It's this part:

  <html lang="en"><head><META http-equiv="Content-Type"
  content="text/html; charset=utf-16"><font face="calibri"><title>
                      ^^^^^^^^^^^^^^

UTF-16 encodes each character below 0x10000 with 2 bytes, so you get
this gibberish if you try to display plain ASCII text as if it were
UTF-16.

This is not a bug.
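
The effect described above can be reproduced outside Emacs. The following is a minimal sketch in Python, not a description of what Emacs does internally. It takes the first ten ASCII bytes of the quoted header and decodes them as UTF-16. The little-endian byte order is an assumption made only for illustration; the file itself does not say which order a reader should pick.

```python
# The first ten bytes of the quoted header, as plain ASCII.
raw = b"<html lang"

# Assumption: little-endian byte order, chosen only for illustration.
# Each pair of ASCII bytes is consumed as one 16-bit code unit, so
# b"<h" (0x3C 0x68) becomes the single character U+683C.
wrong = raw.decode("utf-16-le")

print(len(raw), len(wrong))            # ten bytes collapse into five characters
print([hex(ord(ch)) for ch in wrong])  # none of them is an ASCII character
print(raw.decode("ascii"))             # the same bytes, decoded as what they are
```

Because every ASCII byte is nonzero, each decoded character lands well outside the ASCII range, which is why the buffer fills with unrelated glyphs.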




Added tag(s) notabug. Request was from Eli Zaretskii <eliz <at> gnu.org> to control <at> debbugs.gnu.org. (Wed, 07 Feb 2024 15:01:02 GMT)

Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#68971; Package emacs. (Wed, 07 Feb 2024 21:47:01 GMT)

Message #13 received at 68971 <at> debbugs.gnu.org:

From: Dan Jacobson <jidanni <at> jidanni.org>
To: Eli Zaretskii <eliz <at> gnu.org>
Cc: 68971 <at> debbugs.gnu.org
Subject: Re: bug#68971: Innocent file renders crazy
Date: Thu, 08 Feb 2024 05:46:35 +0800
OK, you are entirely right. It is all the file's fault and not emacs's.

But on the other hand I wouldn't get far telling the Google Chrome team
they should stop overriding charset declarations just to make things
render well.

In the end it's the emacs users who end up not being able to read the
document.

Maybe have some warning "wrong charset detected, proceed? [y,n,(a)utofix...]"

Else well, all the other users in the room are proceeding with their
homework assignment, except Ralph, who uses emacs, which has gibberish
on its screen, with no warnings.




Reply sent to Eli Zaretskii <eliz <at> gnu.org>:
You have taken responsibility. (Thu, 08 Feb 2024 06:04:02 GMT)

Notification sent to Dan Jacobson <jidanni <at> jidanni.org>:
bug acknowledged by developer. (Thu, 08 Feb 2024 06:04:02 GMT)

Message #18 received at 68971-done <at> debbugs.gnu.org:

From: Eli Zaretskii <eliz <at> gnu.org>
To: Dan Jacobson <jidanni <at> jidanni.org>
Cc: 68971-done <at> debbugs.gnu.org
Subject: Re: bug#68971: Innocent file renders crazy
Date: Thu, 08 Feb 2024 08:03:01 +0200
> From: Dan Jacobson <jidanni <at> jidanni.org>
> Cc: 68971 <at> debbugs.gnu.org
> Date: Thu, 08 Feb 2024 05:46:35 +0800
> 
> OK, you are entirely right. It is all the file's fault and not emacs's.
> 
> But on the other hand I wouldn't get far telling the Google Chrome team
> they should stop overriding charset declarations just to make things
> render well.
> 
> In the end it's the emacs users who end up not being able to read the
> document.
> 
> Maybe have some warning "wrong charset detected, proceed? [y,n,(a)utofix...]"

How can Emacs know, up front, that the charset is wrong?  In general,
when a file claims some specific charset or encoding, Emacs believes
that and obeys.  The "gibberish" is in the eyes of the beholder; Emacs
doesn't really understand human-readable text, and so doesn't know
whether what it presents is legible text or garbage caused by wrong
decoding.

> Else well, all the other users in the room are proceeding with their
> homework assignment, except Ralph, who uses emacs, which has gibberish
> on its screen, with no warnings.

What I did when I saw gibberish was to visit the file literally (as in
"M-x find-file-literally"), then, when I saw it was plain ASCII,
looked at its preamble, where I saw UTF-16, which explained why "C-x C-f"
shows gibberish.  So when something like this happens, my suggestion
is:

  . M-x find-file-literally
  . look at the literal display: if it is readable, you can just
    proceed with your homework assignment
  . alternatively, force Emacs to visit with the correct encoding, as
    in "C-x RET c utf-8 RET C-x C-f metadata.html RET"

The "utf-8" part above was a guess, based on looking at the file when
visited literally; you may need to guess again if the results are not
good enough.  See the node "Text Coding" in the Emacs user manual for
more about these facilities.

And with that, I'm closing this bug.
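
The triage steps in the message above (look at the raw bytes, read the declared charset, then decode with a better guess) can also be sketched outside Emacs. This is an illustrative Python sketch only. The helper names `declared_charset` and `is_plain_ascii` are hypothetical, and `sample` is a stand-in built from the header quoted earlier in this thread, not the attached file itself.

```python
import re

# Hypothetical helper: pull the charset name out of an HTML meta declaration.
CHARSET_RE = re.compile(rb"charset\s*=\s*[\"']?([A-Za-z0-9_.:-]+)", re.IGNORECASE)


def declared_charset(raw: bytes):
    """Return the declared charset in lower case, or None if none is found."""
    match = CHARSET_RE.search(raw)
    return match.group(1).decode("ascii").lower() if match else None


def is_plain_ascii(raw: bytes) -> bool:
    """True when every byte is below 0x80, i.e. the data is plain ASCII."""
    return all(byte < 0x80 for byte in raw)


# Stand-in for the file's first bytes, copied from the header quoted above.
sample = (
    b'<html lang="en"><head><META http-equiv="Content-Type"\n'
    b'content="text/html; charset=utf-16"><font face="calibri"><title>'
)

print(declared_charset(sample))  # what the file claims to be
print(is_plain_ascii(sample))    # what the bytes actually look like

# The equivalent of forcing the encoding: decode with the guess, not the claim.
print(sample.decode("utf-8"))
```

As in the message, "utf-8" is a guess made after looking at the raw bytes. A check like this only flags a suspicious combination (an ASCII-only file declaring UTF-16); it cannot tell in general whether decoded text is legible.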




bug archived. Request was from Debbugs Internal Request <help-debbugs <at> gnu.org> to internal_control <at> debbugs.gnu.org. (Thu, 07 Mar 2024 12:24:09 GMT)

This bug report was last modified 1 year and 133 days ago.


GNU bug tracking system
Copyright (C) 1999 Darren O. Benham, 1997,2003 nCipher Corporation Ltd, 1994-97 Ian Jackson.