GNU bug report logs - #68971: Innocent file renders crazy
Reported by: Dan Jacobson <jidanni <at> jidanni.org>
Date: Wed, 7 Feb 2024 14:19:01 UTC
Severity: normal
Tags: notabug
Done: Eli Zaretskii <eliz <at> gnu.org>
Bug is archived. No further changes may be made.
To add a comment to this bug, you must first unarchive it, by sending
a message to control AT debbugs.gnu.org, with unarchive 68971 in the body.
You can then email your comments to 68971 AT debbugs.gnu.org in the normal way.
Report forwarded to bug-gnu-emacs <at> gnu.org: bug#68971; Package emacs. (Wed, 07 Feb 2024 14:19:01 GMT)
Acknowledgement sent to Dan Jacobson <jidanni <at> jidanni.org>: New bug report received and forwarded. Copy sent to bug-gnu-emacs <at> gnu.org. (Wed, 07 Feb 2024 14:19:01 GMT)
Message #5 received at submit <at> debbugs.gnu.org (full text, mbox):
There is something crazy about this attached file that causes emacs to
display tons of weird characters.
$ md5sum metadata.html
42c875bae87988bbbd4db481b873bc1a metadata.html
$ emacs -Q metadata.html #crazy!
GNU Emacs 29.1
[metadata.html (text/html, attachment)]
Information forwarded to bug-gnu-emacs <at> gnu.org: bug#68971; Package emacs. (Wed, 07 Feb 2024 15:01:01 GMT)
Message #8 received at 68971 <at> debbugs.gnu.org (full text, mbox):
tags 68971 notabug
thanks
> Date: Wed, 07 Feb 2024 22:17:58 +0800
> From: Dan Jacobson <jidanni <at> jidanni.org>
>
> There is something crazy about this attached file that causes emacs to
> display tons of weird characters.
> $ md5sum metadata.html
> 42c875bae87988bbbd4db481b873bc1a metadata.html
> $ emacs -Q metadata.html #crazy!
> GNU Emacs 29.1
It's this part:
<html lang="en"><head><META http-equiv="Content-Type"
content="text/html; charset=utf-16"><font face="calibri"><title>
                    ^^^^^^^^^^^^^^
UTF-16 encodes each character below 0x10000 with 2 bytes, so you get
this gibberish if you try to display plain ASCII text as if it were
UTF-16.
This is not a bug.
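
The effect described above can be reproduced outside Emacs. The following is a minimal Python sketch, not a description of how Emacs decodes files: the sample bytes are taken from the quoted header, and little-endian byte order is an assumption made here so the result is deterministic.

```python
# Plain ASCII bytes, one byte per character, as they would sit on disk.
raw = b'<html lang="en"><head>'

# Correct reading: each byte is one character.
right = raw.decode("ascii")

# Reading that obeys a wrong "charset=utf-16" declaration: every pair of
# bytes is fused into one 16-bit code unit (little-endian assumed here).
wrong = raw.decode("utf-16-le")

print(len(right), len(wrong))            # half as many characters when paired up
print([hex(ord(c)) for c in wrong[:2]])  # '<' + 'h' -> 0x683c, 't' + 'm' -> 0x6d74
```

Because every ASCII byte is below 0x80, the fused code units all land in ranges such as the CJK ideographs rather than raising a decoding error, which is why the result is silent gibberish rather than a failure.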
Added tag(s) notabug. Request was from Eli Zaretskii <eliz <at> gnu.org> to control <at> debbugs.gnu.org. (Wed, 07 Feb 2024 15:01:02 GMT)
Information forwarded to bug-gnu-emacs <at> gnu.org: bug#68971; Package emacs. (Wed, 07 Feb 2024 21:47:01 GMT)
Message #13 received at 68971 <at> debbugs.gnu.org (full text, mbox):
OK, you are entirely right. It is all the file's fault and not emacs's.
But on the other hand I wouldn't get far telling the Google Chrome team
they should stop overriding charset declarations just to make things
render good.
In the end it's the emacs users who end up not being able to read the
document.
Maybe have some warning "wrong charset detected, proceed? [y,n,(a)utofix...]"
Else well, all the other users in the room are proceeding with their
homework assignment, except Ralph, who uses emacs, which has gibberish
on its screen, with no warnings.
Reply sent to Eli Zaretskii <eliz <at> gnu.org>: You have taken responsibility. (Thu, 08 Feb 2024 06:04:02 GMT)
Notification sent to Dan Jacobson <jidanni <at> jidanni.org>: bug acknowledged by developer. (Thu, 08 Feb 2024 06:04:02 GMT)
Message #18 received at 68971-done <at> debbugs.gnu.org (full text, mbox):
> From: Dan Jacobson <jidanni <at> jidanni.org>
> Cc: 68971 <at> debbugs.gnu.org
> Date: Thu, 08 Feb 2024 05:46:35 +0800
>
> OK, you are entirely right. It is all the file's fault and not emacs's.
>
> But on the other hand I wouldn't get far telling the Google Chrome team
> they should stop overriding charset declarations just to make things
> render good.
>
> In the end it's the emacs users who end up not being able to read the
> document.
>
> Maybe have some warning "wrong charset detected, proceed? [y,n,(a)utofix...]"
How can Emacs know, up front, that the charset is wrong? In general,
when a file claims some specific charset or encoding, Emacs believes
that and obeys. The "gibberish" is in the eyes of the beholder; Emacs
doesn't really understand human-readable text, and so doesn't know
whether what it presents is legible text or garbage caused by wrong
decoding.
> Else well, all the other users in the room are proceeding with their
> homework assignment, except Ralph, who uses emacs, which has gibberish
> on its screen, with no warnings.
What I did when I saw gibberish was to visit the file literally (as in
"M-x find-file-literally"), then, when I saw it was plain ASCII,
looked at its preamble, where I saw UTF-16, which explained why "C-x C-f"
shows gibberish. So when something like this happens, my suggestion
is:
. M-x find-file-literally
. look at the literal display: if it is readable, you can just
proceed with your homework assignment
. alternatively, force Emacs to visit with the correct encoding, as
in "C-x RET c utf-8 RET C-x C-f metadata.html RET"
The "utf-8" part above was a guess, based on looking at the file when
visited literally; you may need to guess again if the results are not
good enough. See the node "Text Coding" in the Emacs user manual for
more about these facilities.
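
The same look-at-the-raw-bytes-then-guess procedure can be sketched in Python for use outside Emacs. This is only an illustration under stated assumptions: the helper names `inspect_raw` and `decode_with_guess` and the list of guesses are hypothetical, and nothing here reflects what Emacs does internally.

```python
def inspect_raw(data: bytes) -> str:
    """Rough analogue of visiting the file literally: classify the raw bytes."""
    if data.isascii():
        return "ascii"
    try:
        data.decode("utf-8")
        return "utf-8"
    except UnicodeDecodeError:
        return "unknown"


def decode_with_guess(data: bytes, guesses=("utf-8", "latin-1")):
    """Try each guessed encoding in turn; return the first that decodes cleanly."""
    for enc in guesses:
        try:
            return enc, data.decode(enc)
        except UnicodeDecodeError:
            continue
    raise ValueError("no guessed encoding decoded cleanly")


# Sample bytes modelled on the header quoted earlier in this report.
sample = (b'<html lang="en"><head><META http-equiv="Content-Type" '
          b'content="text/html; charset=utf-16">')

print(inspect_raw(sample))           # the bytes are plain ASCII despite the declaration
enc, text = decode_with_guess(sample)
print(enc, text[:16])
```

A clean decode only shows that the bytes are valid in that encoding, not that the text is legible, so a human still has to look at the result and guess again if needed.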
And with that, I'm closing this bug.
bug archived. Request was from Debbugs Internal Request <help-debbugs <at> gnu.org> to internal_control <at> debbugs.gnu.org. (Thu, 07 Mar 2024 12:24:09 GMT)
This bug report was last modified 1 year and 133 days ago.