GNU bug report logs -
#22222
24.4; eww ignores charset=utf-8 in text/plain
Reported by: trentbuck <at> gmail.com (Trent W. Buck)
Date: Tue, 22 Dec 2015 04:13:02 UTC
Severity: normal
Found in version 24.4
Done: Lars Ingebrigtsen <larsi <at> gnus.org>
Bug is archived. No further changes may be made.
To add a comment to this bug, you must first unarchive it, by sending
a message to control AT debbugs.gnu.org, with unarchive 22222 in the body.
You can then email your comments to 22222 AT debbugs.gnu.org in the normal way.
Report forwarded to bug-gnu-emacs <at> gnu.org: bug#22222; Package emacs. (Tue, 22 Dec 2015 04:13:02 GMT)
Acknowledgement sent to trentbuck <at> gmail.com (Trent W. Buck): New bug report received and forwarded. Copy sent to bug-gnu-emacs <at> gnu.org. (Tue, 22 Dec 2015 04:13:02 GMT)
Message #5 received at submit <at> debbugs.gnu.org:
In #emacs on Freenode, "snotglob" reported:
    When I open this doc with eww
    https://raw.githubusercontent.com/howardabrams/pdx-emacs-hackers/master/workshops/keyboard-macros.org
    it appears that some apostrophes are shown as \342\200\231
I can reproduce his fault.
I'm using UTF-8 in all the places it can be used.
That URL has this header field:
Content-Type: text/plain; charset=utf-8
While eww-render correctly extracts the charset,
it doesn't use it:
(eww-display-raw)
compare:
(eww-display-html charset url nil point)
I have not actually tested HEAD,
but from reading the source, the issue appears to still be present:
http://git.savannah.gnu.org/cgit/emacs.git/tree/lisp/net/eww.el#n347
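To make the symptom concrete: \342\200\231 is the octal rendering of the three bytes E2 80 99, which are the UTF-8 encoding of U+2019 (right single quotation mark). The sketch below is an illustration in Python, not eww code; the sample text and the helper names show_undecoded and charset_of are made up for this example. It shows the same bytes once displayed without decoding and once decoded with the charset taken from the Content-Type header quoted above.

```python
from email.message import Message

# Hypothetical sample: "don" + U+2019 + "t" as UTF-8 bytes on the wire.
body = "don\u2019t".encode("utf-8")


def show_undecoded(raw: bytes) -> str:
    """Render non-ASCII bytes as octal escapes, the way Emacs shows raw bytes."""
    return "".join(chr(b) if b < 0x80 else "\\%03o" % b for b in raw)


def charset_of(content_type: str) -> str:
    """Extract the charset parameter from a Content-Type header value."""
    msg = Message()
    msg["Content-Type"] = content_type
    # Falling back to UTF-8 is an assumption made for this sketch only.
    return msg.get_content_charset() or "utf-8"


header = "text/plain; charset=utf-8"
undecoded = show_undecoded(body)           # charset extracted but never applied
decoded = body.decode(charset_of(header))  # charset applied to the body

print(undecoded)
print(decoded)
```

The first line printed contains the \342\200\231 sequence from the report; the second contains the apostrophe the author intended.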
In GNU Emacs 24.4.1 (x86_64-pc-linux-gnu)
of 2014-10-26 on trouble, modified by Debian
System Description: Debian GNU/Linux 8.0 (jessie)
Configured using:
`configure --build x86_64-linux-gnu --prefix=/usr
--sharedstatedir=/var/lib --libexecdir=/usr/lib
--localstatedir=/var/lib --infodir=/usr/share/info
--mandir=/usr/share/man --with-pop=yes
--enable-locallisppath=/etc/emacs24:/etc/emacs:/usr/local/share/emacs/24.4/site-lisp:/usr/local/share/emacs/site-lisp:/usr/share/emacs/24.4/site-lisp:/usr/share/emacs/site-lisp
--build x86_64-linux-gnu --prefix=/usr --sharedstatedir=/var/lib
--libexecdir=/usr/lib --localstatedir=/var/lib
--infodir=/usr/share/info --mandir=/usr/share/man --with-pop=yes
--enable-locallisppath=/etc/emacs24:/etc/emacs:/usr/local/share/emacs/24.4/site-lisp:/usr/local/share/emacs/site-lisp:/usr/share/emacs/24.4/site-lisp:/usr/share/emacs/site-lisp
--with-x=no --without-gconf --without-gsettings 'CFLAGS=-g -O2
-fstack-protector-strong -Wformat -Werror=format-security -Wall'
CPPFLAGS=-D_FORTIFY_SOURCE=2 LDFLAGS=-Wl,-z,relro'
Important settings:
value of $LC_COLLATE: C
value of $LANG: en_AU.utf8
locale-coding-system: utf-8-unix
Major mode: Fundamental
Minor modes in effect:
xterm-mouse-mode: t
ido-everywhere: t
savehist-mode: t
icomplete-mode: t
show-paren-mode: t
delete-selection-mode: t
tooltip-mode: t
file-name-shadow-mode: t
global-font-lock-mode: t
font-lock-mode: t
auto-composition-mode: t
auto-encryption-mode: t
auto-compression-mode: t
column-number-mode: t
line-number-mode: t
transient-mark-mode: t
Recent input:
Recent messages:
Loading /usr/share/emacs/site-lisp/magit/magit-install.el (source)...done
Loading /etc/emacs/site-start.d/50magit.el (source)...done
Loading /etc/emacs/site-start.d/50pylint.el (source)...
Loading pylint...done
Loading /etc/emacs/site-start.d/50pylint.el (source)...done
Loading /etc/emacs/site-start.d/50python-docutils.el (source)...done
Loading /etc/emacs/site-start.d/50w3m-el-snapshot.el (source)...done
Loading /etc/emacs/site-start.d/51debian-el.el (source)...done
Loading term/xterm...done
For information about GNU Emacs and the GNU system, type C-h C-a.
Load-path shadows:
/usr/share/emacs/24.4/site-lisp/debian-startup hides /usr/share/emacs/site-lisp/debian-startup
/usr/share/emacs/site-lisp/rst hides /usr/share/emacs/24.4/lisp/textmodes/rst
/usr/share/emacs24/site-lisp/dictionaries-common/ispell hides /usr/share/emacs/24.4/lisp/textmodes/ispell
/usr/share/emacs24/site-lisp/dictionaries-common/flyspell hides /usr/share/emacs/24.4/lisp/textmodes/flyspell
Features:
(shadow sort gnus-util mail-extr emacsbug message format-spec rfc822 mml
easymenu mml-sec mm-decode mm-bodies mm-encode mail-parse rfc2231
mailabbrev gmm-utils mailheader xterm advice sendmail rfc2047 rfc2045
ietf-drums mm-util help-fns mail-prsvr mail-utils jka-compr edmacro
kmacro cl-loaddefs cl-lib disp-table xt-mouse ido savehist icomplete
time-date paren delsel saveplace debian-el debian-el-loaddefs w3m-load
pylint compile comint regexp-opt ansi-color ring tool-bar 50magit
emacs-goodies-el emacs-goodies-custom emacs-goodies-loaddefs easy-mmode
dpkg-dev-el dpkg-dev-el-loaddefs tooltip electric uniquify ediff-hook
vc-hooks lisp-float-type tabulated-list newcomment lisp-mode prog-mode
register page menu-bar rfn-eshadow timer select mouse jit-lock font-lock
syntax facemenu font-core frame cham georgian utf-8-lang misc-lang
vietnamese tibetan thai tai-viet lao korean japanese hebrew greek
romanian slovak czech european ethiopic indian cyrillic chinese
case-table epa-hook jka-cmpr-hook help simple abbrev minibuffer nadvice
loaddefs button faces cus-face macroexp files text-properties overlay
sha1 md5 base64 format env code-pages mule custom widget
hashtable-print-readable backquote make-network-process dbusbind
gfilenotify multi-tty emacs)
Memory information:
((conses 16 102548 4949)
(symbols 48 19418 0)
(miscs 40 41 71)
(strings 32 25459 5386)
(string-bytes 1 684956)
(vectors 16 9215)
(vector-slots 8 355333 14722)
(floats 8 69 208)
(intervals 56 261 0)
(buffers 960 11)
(heap 1024 8764 721))
Information forwarded to bug-gnu-emacs <at> gnu.org: bug#22222; Package emacs. (Tue, 22 Dec 2015 17:03:02 GMT)
Message #8 received at 22222 <at> debbugs.gnu.org:
> From: trentbuck <at> gmail.com (Trent W. Buck)
> Date: Tue, 22 Dec 2015 15:11:56 +1100
>
> In #emacs on Freenode, "snotglob" reported that
>
> When I open this doc with eww
> https://raw.githubusercontent.com/howardabrams/pdx-emacs-hackers/master/workshops/keyboard-macros.org
> it appears that some apostrophes are shown as \342\200\231
>
> I can reproduce his fault.
It's not that EWW ignored the charset, but it did pretty strange
things with its value, and with decoding of the text in general.
I believe I fixed that in commit 9b0f182 on the emacs-25 branch.
Lars, I'd appreciate if you could eyeball the changeset and tell if I
broke some use case. I couldn't wrap my head around the code there
that decoded stuff (like encoding with UTF-8 and then decoding with
something that must NOT be UTF-8 -- how can that do anything useful?),
so I just rewrote that as best I could. If I broke something, please
show the broken use case, and I will look into it.
Thanks.
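As a rough illustration of what an encode-then-decode round trip does in general (Python, not the eww code; cp1252 is an arbitrary stand-in for "something that is not UTF-8"): re-encoding decoded text recovers the original bytes, and decoding those bytes with a different charset reinterprets them. Presumably that is only useful when the first decoding was the wrong one and a different charset is being tried.

```python
# Text that was decoded correctly as UTF-8.
apostrophe = "\u2019"

# Encoding with UTF-8 recovers the bytes that came over the wire.
raw = apostrophe.encode("utf-8")

# Decoding the same bytes with a non-UTF-8 charset reinterprets them.
# cp1252 is chosen here purely as an example charset.
reinterpreted = raw.decode("cp1252")

# The step is reversible as long as the second charset maps every byte involved.
restored = reinterpreted.encode("cp1252").decode("utf-8")

print(raw)
print(reinterpreted)
print(restored == apostrophe)
```

When the text was already decoded correctly, as here, the round trip turns one apostrophe into three unrelated characters.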
Information forwarded to bug-gnu-emacs <at> gnu.org: bug#22222; Package emacs. (Wed, 23 Dec 2015 00:33:02 GMT)
Message #11 received at 22222 <at> debbugs.gnu.org:
Eli Zaretskii wrote:
>> When I open this doc with eww
>> https://raw.githubusercontent.com/howardabrams/pdx-emacs-hackers/master/workshops/keyboard-macros.org
>> it appears that some apostrophes are shown as \342\200\231
>
> I believe I fixed that in commit 9b0f182 on the emacs-25 branch.
FWIW, I can confirm that
WORKS: git checkout 9b0f182 && make && src/emacs -eval '(eww "https://raw.githubusercontent.com/howardabrams/pdx-emacs-hackers/master/workshops/keyboard-macros.org")'
FAILS: git checkout 9b0f182~ && make && src/emacs -eval '(eww "https://raw.githubusercontent.com/howardabrams/pdx-emacs-hackers/master/workshops/keyboard-macros.org")'
Information forwarded to bug-gnu-emacs <at> gnu.org: bug#22222; Package emacs. (Wed, 23 Dec 2015 16:11:02 GMT)
Message #14 received at 22222 <at> debbugs.gnu.org:
> Date: Wed, 23 Dec 2015 11:31:54 +1100
> From: "Trent W. Buck" <trentbuck <at> gmail.com>
> Cc: Lars Magne Ingebrigtsen <larsi <at> gnus.org>, 22222 <at> debbugs.gnu.org
>
> Eli Zaretskii wrote:
> >> When I open this doc with eww
> >> https://raw.githubusercontent.com/howardabrams/pdx-emacs-hackers/master/workshops/keyboard-macros.org
> >> it appears that some apostrophes are shown as \342\200\231
> >
> > I believe I fixed that in commit 9b0f182 on the emacs-25 branch.
>
> FWIW, I can confirm that
>
> WORKS: git checkout 9b0f182 && make && src/emacs -eval '(eww "https://raw.githubusercontent.com/howardabrams/pdx-emacs-hackers/master/workshops/keyboard-macros.org")'
> FAILS: git checkout 9b0f182~ && make && src/emacs -eval '(eww "https://raw.githubusercontent.com/howardabrams/pdx-emacs-hackers/master/workshops/keyboard-macros.org")'
Thanks for testing.
Reply sent to Lars Ingebrigtsen <larsi <at> gnus.org>: You have taken responsibility. (Thu, 24 Dec 2015 19:40:02 GMT)
Notification sent to trentbuck <at> gmail.com (Trent W. Buck): bug acknowledged by developer. (Thu, 24 Dec 2015 19:40:02 GMT)
Message #19 received at 22222-done <at> debbugs.gnu.org:
Eli Zaretskii <eliz <at> gnu.org> writes:
> Lars, I'd appreciate if you could eyeball the changeset and tell if I
> broke some use case. I couldn't wrap my head around the code there
> that decoded stuff (like encoding with UTF-8 and then decoding with
> something that must NOT be UTF-8 -- how can that do anything useful?),
> so I just rewrote that as best I could. If I broke something, please
> show the broken use case, and I will look into it.
I think it looks OK after your fixes (thanks). That code has had a lot
of back-and-forth after it was adapted to allow the user to
interactively change the charset, which probably explains the
awkwardness of the code.
--
(domestic pets only, the antidote for overdose, milk.)
bloggy blog: http://lars.ingebrigtsen.no
Bug archived. Request was from Debbugs Internal Request <help-debbugs <at> gnu.org> to internal_control <at> debbugs.gnu.org. (Fri, 22 Jan 2016 12:24:04 GMT)