GNU bug report logs - #22222
24.4; eww ignores charset=utf-8 in text/plain


Package: emacs;

Reported by: trentbuck <at> gmail.com (Trent W. Buck)

Date: Tue, 22 Dec 2015 04:13:02 UTC

Severity: normal

Found in version 24.4

Done: Lars Ingebrigtsen <larsi <at> gnus.org>

Bug is archived. No further changes may be made.

To add a comment to this bug, you must first unarchive it, by sending
a message to control AT debbugs.gnu.org, with unarchive 22222 in the body.
You can then email your comments to 22222 AT debbugs.gnu.org in the normal way.




Report forwarded to bug-gnu-emacs <at> gnu.org:
bug#22222; Package emacs. (Tue, 22 Dec 2015 04:13:02 GMT)

Acknowledgement sent to trentbuck <at> gmail.com (Trent W. Buck):
New bug report received and forwarded. Copy sent to bug-gnu-emacs <at> gnu.org. (Tue, 22 Dec 2015 04:13:02 GMT)

Message #5 received at submit <at> debbugs.gnu.org:

From: trentbuck <at> gmail.com (Trent W. Buck)
To: bug-gnu-emacs <at> gnu.org
Subject: 24.4; eww ignores charset=utf-8 in text/plain
Date: Tue, 22 Dec 2015 15:11:56 +1100

In #emacs on Freenode, "snotglob" reported that

    When I open this doc with eww
    https://raw.githubusercontent.com/howardabrams/pdx-emacs-hackers/master/workshops/keyboard-macros.org
    it appears that some apostrophes are shown as \342\200\231

I can reproduce his fault.
I'm using UTF-8 in all the places it can be used.
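For readers unfamiliar with the notation: `\342\200\231` is Emacs's octal display of the three bytes 0xE2 0x80 0x99, which are the UTF-8 encoding of U+2019 RIGHT SINGLE QUOTATION MARK (a typographic apostrophe). The minimal Emacs Lisp sketch below uses only the built-ins `string-to-multibyte` and `decode-coding-string` (the `my-` names are hypothetical) to show why undecoded bytes are displayed this way while decoded ones are not.

```elisp
;; Minimal sketch; the `my-' names are hypothetical.
;; The three bytes of U+2019 RIGHT SINGLE QUOTATION MARK, as they
;; arrive in the HTTP body.  A string literal made only of ASCII and
;; octal escapes is read as a unibyte string of raw bytes.
(defconst my-raw-apostrophe "\342\200\231")

;; Not decoded: each byte >= 128 becomes a separate raw-byte
;; (`eight-bit') character, which Emacs displays as an octal escape.
(defconst my-undecoded (string-to-multibyte my-raw-apostrophe))

;; Decoded with the charset declared in Content-Type: the three bytes
;; collapse into the single character U+2019.
(defconst my-decoded (decode-coding-string my-raw-apostrophe 'utf-8))

(message "undecoded: %d characters, decoded: %d character"
         (length my-undecoded) (length my-decoded))
```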

That URL has this header field:

    Content-Type: text/plain; charset=utf-8

While eww-render correctly extracts the charset,
it doesn't pass it on when displaying plain text:

    (eww-display-raw)

compare:

    (eww-display-html charset url nil point)
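The plain-text path would need to decode the body with the declared charset before displaying it, much as the HTML path receives `charset`. Below is a minimal sketch of that idea, assuming the body is available as a unibyte string and using `mail-header-parse-content-type` from mail-parse.el to read the header. The helper name `my-decode-http-body` and its UTF-8 fallback are hypothetical choices, not the code from the eventual fix.

```elisp
;; Minimal sketch; `my-decode-http-body' is a hypothetical helper,
;; not the code from the fix discussed in this report.
(require 'mail-parse)   ; for `mail-header-parse-content-type'

(defun my-decode-http-body (content-type body)
  "Decode the unibyte string BODY using the charset in CONTENT-TYPE.
CONTENT-TYPE is a header value such as \"text/plain; charset=utf-8\".
Fall back to `utf-8' when no known charset is declared."
  (let* ((parsed (mail-header-parse-content-type content-type))
         ;; PARSED has the form ("text/plain" (charset . "utf-8")).
         (name (cdr (assq 'charset (cdr parsed))))
         (coding (and name (intern (downcase name)))))
    ;; Only use the declared charset if Emacs has a coding system by
    ;; that name; otherwise assume UTF-8.
    (unless (and coding (coding-system-p coding))
      (setq coding 'utf-8))
    (decode-coding-string body coding)))

;; Usage: the apostrophe from the report decodes to one character.
(message "%S" (my-decode-http-body "text/plain; charset=utf-8"
                                   "it\342\200\231s"))
```

Falling back to UTF-8 instead of signaling an error is a choice made for this sketch only; a browser could equally sniff the content or let the user pick a coding system.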

I have not actually tested HEAD,
but from reading the source, the issue appears to still be present:

    http://git.savannah.gnu.org/cgit/emacs.git/tree/lisp/net/eww.el#n347

In GNU Emacs 24.4.1 (x86_64-pc-linux-gnu)
 of 2014-10-26 on trouble, modified by Debian
System Description:	Debian GNU/Linux 8.0 (jessie)

Configured using:
 `configure --build x86_64-linux-gnu --prefix=/usr
 --sharedstatedir=/var/lib --libexecdir=/usr/lib
 --localstatedir=/var/lib --infodir=/usr/share/info
 --mandir=/usr/share/man --with-pop=yes
 --enable-locallisppath=/etc/emacs24:/etc/emacs:/usr/local/share/emacs/24.4/site-lisp:/usr/local/share/emacs/site-lisp:/usr/share/emacs/24.4/site-lisp:/usr/share/emacs/site-lisp
 --build x86_64-linux-gnu --prefix=/usr --sharedstatedir=/var/lib
 --libexecdir=/usr/lib --localstatedir=/var/lib
 --infodir=/usr/share/info --mandir=/usr/share/man --with-pop=yes
 --enable-locallisppath=/etc/emacs24:/etc/emacs:/usr/local/share/emacs/24.4/site-lisp:/usr/local/share/emacs/site-lisp:/usr/share/emacs/24.4/site-lisp:/usr/share/emacs/site-lisp
 --with-x=no --without-gconf --without-gsettings 'CFLAGS=-g -O2
 -fstack-protector-strong -Wformat -Werror=format-security -Wall'
 CPPFLAGS=-D_FORTIFY_SOURCE=2 LDFLAGS=-Wl,-z,relro'

Important settings:
  value of $LC_COLLATE: C
  value of $LANG: en_AU.utf8
  locale-coding-system: utf-8-unix

Major mode: Fundamental

Minor modes in effect:
  xterm-mouse-mode: t
  ido-everywhere: t
  savehist-mode: t
  icomplete-mode: t
  show-paren-mode: t
  delete-selection-mode: t
  tooltip-mode: t
  file-name-shadow-mode: t
  global-font-lock-mode: t
  font-lock-mode: t
  auto-composition-mode: t
  auto-encryption-mode: t
  auto-compression-mode: t
  column-number-mode: t
  line-number-mode: t
  transient-mark-mode: t

Recent input:


Recent messages:
Loading /usr/share/emacs/site-lisp/magit/magit-install.el (source)...done
Loading /etc/emacs/site-start.d/50magit.el (source)...done
Loading /etc/emacs/site-start.d/50pylint.el (source)...
Loading pylint...done
Loading /etc/emacs/site-start.d/50pylint.el (source)...done
Loading /etc/emacs/site-start.d/50python-docutils.el (source)...done
Loading /etc/emacs/site-start.d/50w3m-el-snapshot.el (source)...done
Loading /etc/emacs/site-start.d/51debian-el.el (source)...done
Loading term/xterm...done
For information about GNU Emacs and the GNU system, type C-h C-a.

Load-path shadows:
/usr/share/emacs/24.4/site-lisp/debian-startup hides /usr/share/emacs/site-lisp/debian-startup
/usr/share/emacs/site-lisp/rst hides /usr/share/emacs/24.4/lisp/textmodes/rst
/usr/share/emacs24/site-lisp/dictionaries-common/ispell hides /usr/share/emacs/24.4/lisp/textmodes/ispell
/usr/share/emacs24/site-lisp/dictionaries-common/flyspell hides /usr/share/emacs/24.4/lisp/textmodes/flyspell

Features:
(shadow sort gnus-util mail-extr emacsbug message format-spec rfc822 mml
easymenu mml-sec mm-decode mm-bodies mm-encode mail-parse rfc2231
mailabbrev gmm-utils mailheader xterm advice sendmail rfc2047 rfc2045
ietf-drums mm-util help-fns mail-prsvr mail-utils jka-compr edmacro
kmacro cl-loaddefs cl-lib disp-table xt-mouse ido savehist icomplete
time-date paren delsel saveplace debian-el debian-el-loaddefs w3m-load
pylint compile comint regexp-opt ansi-color ring tool-bar 50magit
emacs-goodies-el emacs-goodies-custom emacs-goodies-loaddefs easy-mmode
dpkg-dev-el dpkg-dev-el-loaddefs tooltip electric uniquify ediff-hook
vc-hooks lisp-float-type tabulated-list newcomment lisp-mode prog-mode
register page menu-bar rfn-eshadow timer select mouse jit-lock font-lock
syntax facemenu font-core frame cham georgian utf-8-lang misc-lang
vietnamese tibetan thai tai-viet lao korean japanese hebrew greek
romanian slovak czech european ethiopic indian cyrillic chinese
case-table epa-hook jka-cmpr-hook help simple abbrev minibuffer nadvice
loaddefs button faces cus-face macroexp files text-properties overlay
sha1 md5 base64 format env code-pages mule custom widget
hashtable-print-readable backquote make-network-process dbusbind
gfilenotify multi-tty emacs)

Memory information:
((conses 16 102548 4949)
 (symbols 48 19418 0)
 (miscs 40 41 71)
 (strings 32 25459 5386)
 (string-bytes 1 684956)
 (vectors 16 9215)
 (vector-slots 8 355333 14722)
 (floats 8 69 208)
 (intervals 56 261 0)
 (buffers 960 11)
 (heap 1024 8764 721))




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#22222; Package emacs. (Tue, 22 Dec 2015 17:03:02 GMT)

Message #8 received at 22222 <at> debbugs.gnu.org:

From: Eli Zaretskii <eliz <at> gnu.org>
To: trentbuck <at> gmail.com (Trent W. Buck),
 Lars Magne Ingebrigtsen <larsi <at> gnus.org>
Cc: 22222 <at> debbugs.gnu.org
Subject: Re: bug#22222: 24.4; eww ignores charset=utf-8 in text/plain
Date: Tue, 22 Dec 2015 19:03:17 +0200

> From: trentbuck <at> gmail.com (Trent W. Buck)
> Date: Tue, 22 Dec 2015 15:11:56 +1100
> 
> In #emacs on Freenode, "snotglob" reported that
> 
>     When I open this doc with eww
>     https://raw.githubusercontent.com/howardabrams/pdx-emacs-hackers/master/workshops/keyboard-macros.org
>     it appears that some apostrophes are shown as \342\200\231
> 
> I can reproduce his fault.

It's not that EWW ignored the charset, but it did pretty strange
things with its value, and with decoding of the text in general.

I believe I fixed that in commit 9b0f182 on the emacs-25 branch.

Lars, I'd appreciate if you could eyeball the changeset and tell if I
broke some use case.  I couldn't wrap my head around the code there
that decoded stuff (like encoding with UTF-8 and then decoding with
something that must NOT be UTF-8 -- how can that do anything useful?),
so I just rewrote that as best I could.  If I broke something, please
show the broken use case, and I will look into it.

Thanks.
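Eli's question about encoding with UTF-8 and then decoding with a different coding system can be made concrete. In this minimal Emacs Lisp sketch (hypothetical `my-` names; Latin-1 is an assumed stand-in for the non-UTF-8 coding system, and this is not the original EWW code), the round trip turns a single apostrophe into three unrelated characters.

```elisp
;; Minimal sketch; the `my-' names are hypothetical, and Latin-1
;; stands in for "something that must NOT be UTF-8".
(defconst my-original (string #x2019)
  "One correctly decoded character: RIGHT SINGLE QUOTATION MARK.")

;; Encoding with UTF-8 yields three bytes: \342 \200 \231.
(defconst my-utf8-bytes (encode-coding-string my-original 'utf-8))

;; Decoding those bytes as Latin-1 maps each byte to the character
;; with the same code, so one apostrophe becomes three characters.
(defconst my-mangled (decode-coding-string my-utf8-bytes 'latin-1))

(message "%d character in, %d characters out"
         (length my-original) (length my-mangled))
```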




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#22222; Package emacs. (Wed, 23 Dec 2015 00:33:02 GMT)

Message #11 received at 22222 <at> debbugs.gnu.org:

From: "Trent W. Buck" <trentbuck <at> gmail.com>
To: Eli Zaretskii <eliz <at> gnu.org>
Cc: Lars Magne Ingebrigtsen <larsi <at> gnus.org>, 22222 <at> debbugs.gnu.org
Subject: Re: bug#22222: 24.4; eww ignores charset=utf-8 in text/plain
Date: Wed, 23 Dec 2015 11:31:54 +1100

Eli Zaretskii wrote:
>>     When I open this doc with eww
>>     https://raw.githubusercontent.com/howardabrams/pdx-emacs-hackers/master/workshops/keyboard-macros.org
>>     it appears that some apostrophes are shown as \342\200\231
>
> I believe I fixed that in commit 9b0f182 on the emacs-25 branch.

FWIW, I can confirm that

WORKS: git checkout 9b0f182  && make && src/emacs -eval '(eww "https://raw.githubusercontent.com/howardabrams/pdx-emacs-hackers/master/workshops/keyboard-macros.org")'
FAILS: git checkout 9b0f182~ && make && src/emacs -eval '(eww "https://raw.githubusercontent.com/howardabrams/pdx-emacs-hackers/master/workshops/keyboard-macros.org")'




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#22222; Package emacs. (Wed, 23 Dec 2015 16:11:02 GMT)

Message #14 received at 22222 <at> debbugs.gnu.org:

From: Eli Zaretskii <eliz <at> gnu.org>
To: "Trent W. Buck" <trentbuck <at> gmail.com>
Cc: larsi <at> gnus.org, 22222 <at> debbugs.gnu.org
Subject: Re: bug#22222: 24.4; eww ignores charset=utf-8 in text/plain
Date: Wed, 23 Dec 2015 18:10:59 +0200

> Date: Wed, 23 Dec 2015 11:31:54 +1100
> From: "Trent W. Buck" <trentbuck <at> gmail.com>
> Cc: Lars Magne Ingebrigtsen <larsi <at> gnus.org>, 22222 <at> debbugs.gnu.org
> 
> Eli Zaretskii wrote:
> >>     When I open this doc with eww
> >>     https://raw.githubusercontent.com/howardabrams/pdx-emacs-hackers/master/workshops/keyboard-macros.org
> >>     it appears that some apostrophes are shown as \342\200\231
> >
> > I believe I fixed that in commit 9b0f182 on the emacs-25 branch.
> 
> FWIW, I can confirm that
> 
> WORKS: git checkout 9b0f182  && make && src/emacs -eval '(eww "https://raw.githubusercontent.com/howardabrams/pdx-emacs-hackers/master/workshops/keyboard-macros.org")'
> FAILS: git checkout 9b0f182~ && make && src/emacs -eval '(eww "https://raw.githubusercontent.com/howardabrams/pdx-emacs-hackers/master/workshops/keyboard-macros.org")'

Thanks for testing.




Reply sent to Lars Ingebrigtsen <larsi <at> gnus.org>:
You have taken responsibility. (Thu, 24 Dec 2015 19:40:02 GMT)

Notification sent to trentbuck <at> gmail.com (Trent W. Buck):
bug acknowledged by developer. (Thu, 24 Dec 2015 19:40:02 GMT)

Message #19 received at 22222-done <at> debbugs.gnu.org:

From: Lars Ingebrigtsen <larsi <at> gnus.org>
To: Eli Zaretskii <eliz <at> gnu.org>
Cc: "Trent W. Buck" <trentbuck <at> gmail.com>, 22222-done <at> debbugs.gnu.org
Subject: Re: bug#22222: 24.4; eww ignores charset=utf-8 in text/plain
Date: Thu, 24 Dec 2015 20:39:20 +0100

Eli Zaretskii <eliz <at> gnu.org> writes:

> Lars, I'd appreciate if you could eyeball the changeset and tell if I
> broke some use case.  I couldn't wrap my head around the code there
> that decoded stuff (like encoding with UTF-8 and then decoding with
> something that must NOT be UTF-8 -- how can that do anything useful?),
> so I just rewrote that as best I could.  If I broke something, please
> show the broken use case, and I will look into it.

I think it looks OK after your fixes (thanks).  That code has had a lot
of back-and-forth after it was adapted to allow the user to
interactively change the charset, which probably explains the
awkwardness of the code.

-- 
(domestic pets only, the antidote for overdose, milk.)
   bloggy blog: http://lars.ingebrigtsen.no




bug archived. Request was from Debbugs Internal Request <help-debbugs <at> gnu.org> to internal_control <at> debbugs.gnu.org. (Fri, 22 Jan 2016 12:24:04 GMT)




GNU bug tracking system
Copyright (C) 1999 Darren O. Benham, 1997,2003 nCipher Corporation Ltd, 1994-97 Ian Jackson.