GNU bug report logs - #464
23.0.60; [Regression] Implicit utf-8 no longer correctly decoded in gnus


Packages: emacs, gnus;

Reported by: James Cloos <cloos <at> jhcloos.com>

Date: Sun, 22 Jun 2008 06:45:03 UTC

Severity: normal

Done: Lars Magne Ingebrigtsen <larsi <at> gnus.org>

Bug is archived. No further changes may be made.

To add a comment to this bug, you must first unarchive it, by sending
a message to control AT debbugs.gnu.org, with unarchive 464 in the body.
You can then email your comments to 464 AT debbugs.gnu.org in the normal way.



Report forwarded to bug-submit-list <at> lists.donarmstrong.com, Emacs Bugs <bug-gnu-emacs <at> gnu.org>:
bug#464; Package emacs.

Acknowledgement sent to James Cloos <cloos <at> jhcloos.com>:
New bug report received and forwarded. Copy sent to Emacs Bugs <bug-gnu-emacs <at> gnu.org>.

Message #5 received at submit <at> emacsbugs.donarmstrong.com:

From: James Cloos <cloos <at> jhcloos.com>
To: emacs-pretest-bug <at> gnu.org
Subject: 23.0.60; [Regression] Implicit utf-8 no longer correctly decoded in gnus
Date: Sun, 22 Jun 2008 02:37:34 -0400
Please write in English if possible, because the Emacs maintainers
usually do not have translators to read other languages for them.

Your bug report will be posted to the emacs-pretest-bug <at> gnu.org mailing list.

Please describe exactly what actions triggered the bug
and the precise symptoms of the bug:

For the last few weeks gnus no longer correctly displays utf-8 data in
articles or mime-parts which do not explicitly declare themselves to be
utf-8.  Before, this just worked.

It is very common with commit messages that the mime block for the
message has no local headers, or they don’t use mime at all, and also do
not specify a charset in the main headers.  This used to display
correctly, but no longer does so.

As an example, ‘ and ’ display as ^X and ^Y (in the escape-glyph face).

Using C-u g shows that the correct octets are there.

Saving the article or mime block saves the incorrect data, whereas
caching the article (with *) saves the correct octet-stream.  Viewing
the cached article outside of gnus works (at least, of course, in a
UTF-8 locale...).
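One possible reading of the ^X/^Y symptom, offered only as a hypothesis: ‘ (U+2018) and ’ (U+2019) are encoded in UTF-8 as E2 80 98 and E2 80 99. If each decoded character were cut down to its low 8 bits, the results would be the control bytes 0x18 and 0x19, which Emacs shows in caret notation as ^X and ^Y. The Python sketch below only illustrates that arithmetic. The function names are made up for this example and are not part of Emacs or Gnus.

```python
def caret_notation(byte: int) -> str:
    """Render one byte, showing C0 controls and DEL in caret notation (0x18 -> '^X')."""
    if byte < 0x20 or byte == 0x7F:
        # Caret notation flips bit 6: 0x18 ^ 0x40 == 0x58 == ord('X')
        return "^" + chr(byte ^ 0x40)
    return chr(byte)


def low_byte_rendering(text: str) -> str:
    """Hypothetical failure mode: keep only the low 8 bits of each code point."""
    return "".join(caret_notation(ord(ch) & 0xFF) for ch in text)


quotes = "\u2018\u2019"  # LEFT and RIGHT SINGLE QUOTATION MARK
print(quotes.encode("utf-8").hex(" "))  # -> e2 80 98 e2 80 99
print(low_byte_rendering(quotes))       # -> ^X^Y
```

The octets are intact and only the character step goes wrong. That fits the observation that the raw article looks correct while the displayed article does not.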


If Emacs crashed, and you have the Emacs process in the gdb debugger,
please include the output from the following gdb commands:
    `bt full' and `xbacktrace'.
If you would like to further debug the crash, please read the file
/usr/share/emacs/23.0.60/etc/DEBUG for instructions.


In GNU Emacs 23.0.60.1 (i686-pc-linux-gnu, X toolkit, Xaw3d scroll bars)
 of 2008-06-21 on lugabout
Windowing system distributor `The X.Org Foundation', version 11.0.10599001
configured using `configure  '--prefix=/usr' '--host=i686-pc-linux-gnu' '--mandir=/usr/share/man' '--infodir=/usr/share/info' '--datadir=/usr/share' '--sysconfdir=/etc' '--localstatedir=/var/lib' '--program-suffix=-emacs-23' '--infodir=/usr/share/info/emacs-23' '--without-carbon' '--with-sound' '--with-x' '--with-toolkit-scroll-bars' '--with-gif' '--with-jpeg' '--with-png' '--with-rsvg' '--with-tiff' '--with-xpm' '--enable-font-backend' '--with-freetype' '--with-xft' '--with-libotf' '--with-m17n-flt' '--with-x-toolkit=athena' '--without-hesiod' '--with-kerberos' '--with-kerberos5' '--with-gpm' '--with-dbus' '--build=i686-pc-linux-gnu' 'build_alias=i686-pc-linux-gnu' 'host_alias=i686-pc-linux-gnu' 'CC=i686-pc-linux-gnu-gcc' 'CFLAGS=-march=pentium3 -O2' 'LDFLAGS= -Wl,--as-needed ''

Important settings:
  value of $LC_ALL: nil
  value of $LC_COLLATE: C
  value of $LC_CTYPE: nil
  value of $LC_MESSAGES: nil
  value of $LC_MONETARY: nil
  value of $LC_NUMERIC: nil
  value of $LC_TIME: C
  value of $LANG: en_US.UTF-8
  value of $XMODIFIERS: nil
  locale-coding-system: utf-8-unix
  default-enable-multibyte-characters: t

Major mode: Group

Minor modes in effect:
  gnus-undo-mode: t
  show-paren-mode: t
  display-time-mode: t
  tooltip-mode: t
  mouse-wheel-mode: t
  file-name-shadow-mode: t
  global-font-lock-mode: t
  font-lock-mode: t
  global-auto-composition-mode: t
  auto-composition-mode: t
  auto-encryption-mode: t
  auto-compression-mode: t
  column-number-mode: t
  line-number-mode: t

Recent messages:
Auto-saving...done
Making completion list... [2 times]
Quit
Type C-x 1 to remove help window.  
Making completion list...




Information forwarded to bug-submit-list <at> lists.donarmstrong.com, Emacs Bugs <bug-gnu-emacs <at> gnu.org>:
bug#464; Package emacs.

Acknowledgement sent to Stefan Monnier <monnier <at> iro.umontreal.ca>:
Extra info received and forwarded to list. Copy sent to Emacs Bugs <bug-gnu-emacs <at> gnu.org>.

Message #10 received at 464 <at> emacsbugs.donarmstrong.com:

From: Stefan Monnier <monnier <at> iro.umontreal.ca>
To: 464 <at> debbugs.gnu.org
Subject: Re: bug#464: 23.0.60; [Regression] Implicit utf-8 no longer correctly decoded in gnus
Date: Sun, 22 Jun 2008 10:06:08 -0400
> For the last few weeks gnus no longer correctly displays utf-8 data in
> articles or mime-parts which do not explicitly declare themselves to be
> utf-8.  Before, this just worked.

Can you try to pin down the change that broke it?
Using the usual bisection algorithm on dates.


        Stefan
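Bisection on dates amounts to a binary search over build dates. The sketch below is a minimal version that assumes a hypothetical `is_good(day)` check. That check would build Emacs from that day's sources and report whether the article displays correctly. The sketch also assumes the regression is monotonic: every build before the culprit works and every build from it onward fails. The simulated culprit date is arbitrary and is not the real one.

```python
from datetime import date, timedelta
from typing import Callable


def first_bad_date(good: date, bad: date, is_good: Callable[[date], bool]) -> date:
    """Return the earliest failing build date between a known-good and a known-bad date.

    Invariant: the build from `good` works and the build from `bad` fails.
    """
    if not good < bad:
        raise ValueError("good must be earlier than bad")
    while (bad - good).days > 1:
        mid = good + timedelta(days=(bad - good).days // 2)
        if is_good(mid):
            good = mid
        else:
            bad = mid
    return bad


# Hypothetical stand-in for "build Emacs as of this date and read the article";
# the culprit date here is arbitrary, chosen only to exercise the search.
def simulated_check(day: date) -> bool:
    return day < date(2008, 5, 20)


print(first_bad_date(date(2008, 5, 1), date(2008, 6, 21), simulated_check))
```

Each iteration halves the window, so an N-day window needs about log2(N) builds. With hour-long compiles, that count is the real cost of the search.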




Information forwarded to bug-submit-list <at> lists.donarmstrong.com, Emacs Bugs <bug-gnu-emacs <at> gnu.org>:
bug#464; Package emacs.

Acknowledgement sent to James Cloos <cloos <at> jhcloos.com>:
Extra info received and forwarded to list. Copy sent to Emacs Bugs <bug-gnu-emacs <at> gnu.org>.

Message #15 received at 464 <at> emacsbugs.donarmstrong.com:

From: James Cloos <cloos <at> jhcloos.com>
To: Stefan Monnier <monnier <at> iro.umontreal.ca>
Cc: 464 <at> debbugs.gnu.org
Subject: Re: bug#464: 23.0.60; [Regression] Implicit utf-8 no longer correctly decoded in gnus
Date: Mon, 23 Jun 2008 03:58:12 -0400
>>>>> "Stefan" == Stefan Monnier <monnier <at> iro.umontreal.ca> writes:

>> For the last few weeks gnus no longer correctly displays utf-8 data
>> in articles or mime-parts which do not explicitly declare themselves
>> to be utf-8.  Before, this just worked.

Stefan> Can you try to pin down the change that broke it?
Stefan> Using the usual bisection algorithm on dates.

The best I can say right now is that the last compile that worked was
before the font-backend branch merged, and the first that didn't work
was after that merge.

Gnus takes about an hour to start up, and emacs almost as long to
compile, so I'll have to set up a separate user account to do the
bisecting.  It'll take some time.

-JimC
-- 
James Cloos <cloos <at> jhcloos.com>         OpenPGP: 1024D/ED7DAEA6




Information forwarded to bug-submit-list <at> lists.donarmstrong.com, Emacs Bugs <bug-gnu-emacs <at> gnu.org>:
bug#464; Package emacs.

Acknowledgement sent to Stefan Monnier <monnier <at> iro.umontreal.ca>:
Extra info received and forwarded to list. Copy sent to Emacs Bugs <bug-gnu-emacs <at> gnu.org>.

Message #20 received at 464 <at> emacsbugs.donarmstrong.com:

From: Stefan Monnier <monnier <at> iro.umontreal.ca>
To: James Cloos <cloos <at> jhcloos.com>
Cc: 464 <at> debbugs.gnu.org
Subject: Re: bug#464: 23.0.60; [Regression] Implicit utf-8 no longer correctly decoded in gnus
Date: Mon, 23 Jun 2008 20:36:53 -0400
>>> For the last few weeks gnus no longer correctly displays utf-8 data
>>> in articles or mime-parts which do not explicitly declare themselves
>>> to be utf-8.  Before, this just worked.

Stefan> Can you try to pin down the change that broke it?
Stefan> Using the usual bisection algorithm on dates.

> The best I can say right now is that the last compile that worked was
> before the font-backend branch merged, and the first that didn't work
> was after that merge.

> Gnus takes about an hour to start up, and emacs almost as long to
> compile, so I'll have to set up a separate user account to do the
> bisecting.  It'll take some time.

If you can't bisect, then can you try and provide a simple recipe?


        Stefan


PS: Of course, utf-8 email that is not labelled as such is invalid,
according to the MIME RFCs, so Gnus's current behavior might be
considered a feature.
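Under RFC 2045 (section 5.2) and RFC 2046 (section 4.1.2), a text body with no charset parameter is US-ASCII. A strictly conforming reader therefore treats unlabelled 8-bit octets as errors. The Python sketch below contrasts that strict reading with the lenient behavior the report expected. The function name and the UTF-8-then-Latin-1 fallback order are assumptions made for illustration and are not what Gnus does.

```python
from typing import Optional


def decode_text_body(body: bytes, charset: Optional[str], lenient: bool) -> str:
    """Decode a text/plain body that may lack a charset= parameter."""
    if charset is not None:
        return body.decode(charset, errors="replace")
    if not lenient:
        # Strict MIME reading: no charset means US-ASCII, so 8-bit octets are errors
        return body.decode("us-ascii", errors="replace")
    try:
        # Assumption: unlabelled 8-bit text is most often UTF-8 in practice
        return body.decode("utf-8")
    except UnicodeDecodeError:
        # Hypothetical last resort; Latin-1 maps every byte, so it cannot fail
        return body.decode("latin-1")


octets = "\u2018quoted\u2019".encode("utf-8")
print(decode_text_body(octets, None, lenient=False))  # contains U+FFFD replacement characters
print(decode_text_body(octets, None, lenient=True))   # -> ‘quoted’
```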





Information forwarded to bug-submit-list <at> lists.donarmstrong.com, Emacs Bugs <bug-gnu-emacs <at> gnu.org>:
bug#464; Package emacs.

Acknowledgement sent to James Cloos <cloos <at> jhcloos.com>:
Extra info received and forwarded to list. Copy sent to Emacs Bugs <bug-gnu-emacs <at> gnu.org>.

Message #25 received at 464 <at> emacsbugs.donarmstrong.com:

From: James Cloos <cloos <at> jhcloos.com>
To: Stefan Monnier <monnier <at> iro.umontreal.ca>
Cc: 464 <at> debbugs.gnu.org
Subject: Re: bug#464: 23.0.60; [Regression] Implicit utf-8 no longer correctly decoded in gnus
Date: Mon, 23 Jun 2008 21:33:12 -0400
Stefan> If you can't bisect, then can you try and provide a simple recipe?

The git logs mailed to xorg-commit AT lists.freedesktop.org and several
mails sent to the unicode list are where I most noticed this.

The xorg commit list is on gmane at gmane.comp.freedesktop.xorg.cvs,
so reading that in gnus should do it.  (Check out my recent commit
messages there.)

Hmm.  I just tested the gmane group on my webserver (debian sid,
emacs-snapshot-nox (GNU Emacs 23.0.60.1 (i486-pc-linux-gnu) of
2008-06-21 on elegiac, modified by Debian) running over ssh) and
it worked.

The bug seems to be specific to X terminals.

Seeing the above, I started the server and tried in a tty frame via
emacsclient -t.  It worked correctly when viewing a cached mail.

So nnimap may also be needed to trigger the bug????

The next time I see it in a fresh mail I'll open a tty frame and see
whether it shows up there, too.

-JimC




Information forwarded to bug-submit-list <at> lists.donarmstrong.com, Emacs Bugs <bug-gnu-emacs <at> gnu.org>:
bug#464; Package emacs.

Acknowledgement sent to James Cloos <cloos <at> jhcloos.com>:
Extra info received and forwarded to list. Copy sent to Emacs Bugs <bug-gnu-emacs <at> gnu.org>.

Message #30 received at 464 <at> emacsbugs.donarmstrong.com:

From: James Cloos <cloos <at> jhcloos.com>
To: 464 <at> debbugs.gnu.org
Cc: Stefan Monnier <monnier <at> iro.umontreal.ca>, ding <at> gnus.org
Subject: Re: bug#464: 23.0.60; [Regression] Implicit utf-8 no longer correctly decoded in gnus
Date: Thu, 03 Jul 2008 17:08:09 -0400
I figured out the difference.

If the mail has a transfer encoding of base64 it works correctly.  If it
is 8bit the decoding fails.  I haven't hit on a utf-8 quoted-printable
so am not yet sure whether those work, but I suspect they would.

This suggests the unibyte vs multibyte change that occurred a few weeks
back is the culprit.

The raw message probably needs to be multibyte iff the encoding is 8bit
and the charset is anything which might use more than 8 bits per
character, including at least the utf encodings of the UCS and the
various CJK character sets.
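The rule proposed above can be written as a small predicate. In this minimal Python sketch, the function name and the charset set are hypothetical, and the set is not exhaustive. The sketch also treats a missing charset as possibly multi-octet, which is an assumption that matches the unlabelled mail in this report.

```python
from typing import Optional

# Hypothetical, non-exhaustive set of charsets whose characters may need more than one octet
MULTI_OCTET_CHARSETS = {
    "utf-8",
    "euc-jp", "shift_jis",
    "gb2312", "gbk", "big5",
    "euc-kr",
}


def needs_multibyte(transfer_encoding: str, charset: Optional[str]) -> bool:
    """Sketch of the proposed rule: 8bit bodies in a possibly multi-octet charset."""
    if transfer_encoding.strip().lower() != "8bit":
        return False
    # Assumption: an unlabelled body might be UTF-8, so treat it as multi-octet
    return charset is None or charset.strip().lower() in MULTI_OCTET_CHARSETS
```

Under this rule, base64 and quoted-printable bodies never qualify, because their encoded form is pure ASCII until it is decoded.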

I'll try out (imap-disable-multibyte)¹ and (set-buffer-multibyte) to see
whether those make any difference on such emails.

-JimC

1] Incidentally, it seems odd that (imap-disable-multibyte)'s help text
   says that it will:  "Enable multibyte in the current buffer."

-- 
James Cloos <cloos <at> jhcloos.com>         OpenPGP: 1024D/ED7DAEA6




Information forwarded to bug-submit-list <at> lists.donarmstrong.com, Emacs Bugs <bug-gnu-emacs <at> gnu.org>:
bug#464; Package emacs.

Acknowledgement sent to James Cloos <cloos <at> jhcloos.com>:
Extra info received and forwarded to list. Copy sent to Emacs Bugs <bug-gnu-emacs <at> gnu.org>.

Message #35 received at 464 <at> emacsbugs.donarmstrong.com:

From: James Cloos <cloos <at> jhcloos.com>
To: 464 <at> debbugs.gnu.org
Cc: Stefan Monnier <monnier <at> iro.umontreal.ca>, ding <at> gnus.org
Subject: Re: bug#464: 23.0.60; [Regression] Implicit utf-8 no longer correctly decoded in gnus
Date: Thu, 03 Jul 2008 17:29:49 -0400
|> I haven't hit on a utf-8 quoted-printable so am not yet sure whether
|> those work, but I suspect they would.

Just hit a qp and, indeed, they do work.

So it is in fact only mail with Content-Transfer-Encoding: 8bit which
now fails, but which used to work as of four to eight weeks ago, or so.
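One way to check that the octets themselves survive every transfer encoding is to decode the same message three ways. The sketch below uses Python's standard `email` package only as an independent illustration, not as a model of Gnus internals. It builds a message with no charset= parameter in each of the three transfer encodings and confirms that undoing the transfer encoding recovers identical octets each time. The helper names are hypothetical.

```python
import base64
import email
import quopri

TEXT = "\u2018quoted\u2019 text\n"  # ‘quoted’ text


def build_message(cte: str) -> bytes:
    """Build a minimal text/plain message with no charset= parameter."""
    octets = TEXT.encode("utf-8")
    if cte == "base64":
        body = base64.encodebytes(octets)
    elif cte == "quoted-printable":
        body = quopri.encodestring(octets)
    else:  # "8bit": the UTF-8 octets are sent as-is
        body = octets
    headers = (
        "Subject: test\n"
        "MIME-Version: 1.0\n"
        "Content-Type: text/plain\n"
        f"Content-Transfer-Encoding: {cte}\n"
        "\n"
    )
    return headers.encode("ascii") + body


def body_octets(raw_message: bytes) -> bytes:
    """Undo the transfer encoding only; no charset decoding happens here."""
    msg = email.message_from_bytes(raw_message)
    # decode=True reverses base64 and quoted-printable; 8bit octets pass through
    return msg.get_payload(decode=True)


for cte in ("8bit", "base64", "quoted-printable"):
    same = body_octets(build_message(cte)) == TEXT.encode("utf-8")
    print(f"{cte:>16}: identical octets = {same}")
```

Here the octets are identical in every case. A failure confined to 8bit would therefore point at the later step that turns octets into characters, which is consistent with the unibyte/multibyte suspicion raised earlier in the thread.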

-JimC
-- 
James Cloos <cloos <at> jhcloos.com>         OpenPGP: 1024D/ED7DAEA6




bug reassigned from package `emacs' to `emacs,gnus'. Request was from Glenn Morris <rgm <at> gnu.org> to control <at> emacsbugs.donarmstrong.com. (Mon, 01 Dec 2008 22:45:03 GMT)

Reply sent to Lars Magne Ingebrigtsen <larsi <at> gnus.org>:
You have taken responsibility. (Thu, 30 Sep 2010 17:34:01 GMT)

Notification sent to James Cloos <cloos <at> jhcloos.com>:
bug acknowledged by developer. (Thu, 30 Sep 2010 17:34:02 GMT)

Message #42 received at 464-close <at> debbugs.gnu.org:

From: Lars Magne Ingebrigtsen <larsi <at> gnus.org>
To: James Cloos <cloos <at> jhcloos.com>
Cc: Stefan Monnier <monnier <at> iro.umontreal.ca>, 464-close <at> debbugs.gnu.org,
	ding <at> gnus.org
Subject: Re: bug#464: 23.0.60; [Regression] Implicit utf-8 no longer correctly
	decoded in gnus
Date: Thu, 30 Sep 2010 19:36:24 +0200
James Cloos <cloos <at> jhcloos.com> writes:

> So it is in fact only mail with Content-Transfer-Encoding: 8bit which
> now fails, but which used to work as of four to eight weeks ago, or so.

I'm unable to reproduce this bug with Emacs 24 -- utf-8, 8bit
Content-Transfer-Encoding and no charset= works for me.

-- 
(domestic pets only, the antidote for overdose, milk.)
  larsi <at> gnus.org * Lars Magne Ingebrigtsen




bug archived. Request was from Debbugs Internal Request <help-debbugs <at> gnu.org> to internal_control <at> debbugs.gnu.org. (Fri, 29 Oct 2010 11:24:04 GMT)

This bug report was last modified 14 years and 233 days ago.



GNU bug tracking system
Copyright (C) 1999 Darren O. Benham, 1997,2003 nCipher Corporation Ltd, 1994-97 Ian Jackson.