GNU bug report logs - #33268
26.1.50; Can't decode the utf-8 text file

Package: emacs;

Reported by: Zhang Haijun <ccsmile2008 <at> outlook.com>

Date: Mon, 5 Nov 2018 07:45:01 UTC

Severity: normal

Tags: notabug

Found in version 26.1.50

Done: Eli Zaretskii <eliz <at> gnu.org>

Bug is archived. No further changes may be made.

To add a comment to this bug, you must first unarchive it, by sending
a message to control AT debbugs.gnu.org, with unarchive 33268 in the body.
You can then email your comments to 33268 AT debbugs.gnu.org in the normal way.

Report forwarded to bug-gnu-emacs <at> gnu.org:
bug#33268; Package emacs. (Mon, 05 Nov 2018 07:45:02 GMT)

Acknowledgement sent to Zhang Haijun <ccsmile2008 <at> outlook.com>:
New bug report received and forwarded. Copy sent to bug-gnu-emacs <at> gnu.org. (Mon, 05 Nov 2018 07:45:02 GMT)

Message #5 received at submit <at> debbugs.gnu.org:

From: Zhang Haijun <ccsmile2008 <at> outlook.com>
To: "bug-gnu-emacs <at> gnu.org" <bug-gnu-emacs <at> gnu.org>
Subject: 26.1.50; Can't decode the utf-8 text file
Date: Sun, 4 Nov 2018 08:23:15 +0000
[Message part 1 (text/plain, inline)]
Open the attached text file with "emacs -Q". There are many
unrecognized characters (like \342\200\230). The encoding info of
the buffer follows.

-------------------------------------------------
= -- no-conversion (alias: binary)

Do no conversion.

When you visit a file with this coding, the file is read into a
unibyte buffer as is, thus each byte of a file is treated as a
character.
Type: raw-text (text with random binary characters)
EOL type: LF
--------------------------------------------------

But if I run the command revert-buffer, there are no unrecognized
characters, and the encoding info of the buffer becomes:

---------------------------------------------------
U -- utf-8-unix (alias: mule-utf-8-unix cp65001-unix)

UTF-8 (no signature (BOM))
Type: utf-8 (UTF-8: Emacs internal multibyte form)
EOL type: LF
This coding system encodes the following charsets:
   unicode
---------------------------------------------------
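The octal escapes shown in the no-conversion buffer are raw bytes that were never decoded. A short Python check (illustrative only, not part of the original report) shows that the three bytes \342\200\230 form a single valid UTF-8 sequence:

```python
import unicodedata

# The three bytes displayed as \342\200\230 (octal notation) in the undecoded buffer
raw = bytes([0o342, 0o200, 0o230])
print(raw.hex())                 # e28098

# Decoded as UTF-8, they form one character
ch = raw.decode("utf-8")
print(f"U+{ord(ch):04X}")        # U+2018
print(unicodedata.name(ch))      # LEFT SINGLE QUOTATION MARK
```

So the file content itself is ordinary UTF-8; only the choice of coding system differs between the two visits.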



In GNU Emacs 26.1.50 (build 4, x86_64-pc-linux-gnu, GTK+ Version 3.22.26)
  of 2018-11-04 built on centos7.home
Repository revision: 7cadb328092e354225149bbc74c2ddaf4b49b638
Windowing system distributor 'The X.Org Foundation', version 11.0.11905000
Recent messages:
For information about GNU Emacs and the GNU system, type C-h C-a.
Quit [3 times]
user-error: Beginning of history; no preceding item
funcall-interactively: End of buffer

Configured using:
  'configure --prefix=/home/jun/apps/emacs-26 --without-makeinfo
  --with-x-toolkit=gtk3 --with-modules'

Configured features:
XPM JPEG TIFF GIF PNG RSVG IMAGEMAGICK SOUND DBUS GSETTINGS GLIB NOTIFY
LIBSELINUX GNUTLS LIBXML2 FREETYPE XFT ZLIB TOOLKIT_SCROLL_BARS GTK3 X11
XDBE XIM MODULES THREADS

Important settings:
   value of $LANG: en_US.UTF-8
   value of $XMODIFIERS: @im=fcitx
   locale-coding-system: utf-8-unix

Major mode: Text

Minor modes in effect:
   diff-auto-refine-mode: t
   tooltip-mode: t
   global-eldoc-mode: t
   electric-indent-mode: t
   mouse-wheel-mode: t
   tool-bar-mode: t
   menu-bar-mode: t
   file-name-shadow-mode: t
   global-font-lock-mode: t
   font-lock-mode: t
   blink-cursor-mode: t
   auto-composition-mode: t
   auto-encryption-mode: t
   auto-compression-mode: t
   line-number-mode: t
   transient-mark-mode: t

Load-path shadows:
None found.

Features:
(shadow sort mail-extr emacsbug message rmc puny seq byte-opt gv
bytecomp byte-compile cconv cl-loaddefs cl-lib dired dired-loaddefs
format-spec rfc822 mml mml-sec password-cache epa derived epg epg-config
gnus-util rmail rmail-loaddefs mm-decode mm-bodies mm-encode mail-parse
rfc2231 mailabbrev gmm-utils mailheader sendmail rfc2047 rfc2045
ietf-drums mm-util mail-prsvr mail-utils vc-git diff-mode easymenu
easy-mmode elec-pair time-date mule-util tooltip eldoc electric uniquify
ediff-hook vc-hooks lisp-float-type mwheel term/x-win x-win
term/common-win x-dnd tool-bar dnd fontset image regexp-opt fringe
tabulated-list replace newcomment text-mode elisp-mode lisp-mode
prog-mode register page menu-bar rfn-eshadow isearch timer select
scroll-bar mouse jit-lock font-lock syntax facemenu font-core
term/tty-colors frame cl-generic cham georgian utf-8-lang misc-lang
vietnamese tibetan thai tai-viet lao korean japanese eucjp-ms cp51932
hebrew greek romanian slovak czech european ethiopic indian cyrillic
chinese composite charscript charprop case-table epa-hook jka-cmpr-hook
help simple abbrev obarray minibuffer cl-preloaded nadvice loaddefs
button faces cus-face macroexp files text-properties overlay sha1 md5
base64 format env code-pages mule custom widget hashtable-print-readable
backquote threads dbusbind inotify dynamic-setting system-font-setting
font-render-setting move-toolbar gtk x-toolkit x multi-tty
make-network-process emacs)

Memory information:
((conses 16 98466 14078)
  (symbols 48 20784 1)
  (miscs 40 42 154)
  (strings 32 29892 1511)
  (string-bytes 1 791061)
  (vectors 16 14723)
  (vector-slots 8 510132 7238)
  (floats 8 51 372)
  (intervals 56 222 0)
  (buffers 992 13)
  (heap 1024 24474 3071))
[emacs-26.txt (text/plain, attachment)]

Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#33268; Package emacs. (Wed, 07 Nov 2018 16:13:01 GMT)

Message #8 received at 33268 <at> debbugs.gnu.org:

From: Eli Zaretskii <eliz <at> gnu.org>
To: Zhang Haijun <ccsmile2008 <at> outlook.com>
Cc: 33268 <at> debbugs.gnu.org
Subject: Re: bug#33268: 26.1.50; Can't decode the utf-8 text file
Date: Wed, 07 Nov 2018 18:11:52 +0200
tags 33268 notabug
close 33268
thanks

> From: Zhang Haijun <ccsmile2008 <at> outlook.com>
> Date: Sun, 4 Nov 2018 08:23:15 +0000
> Accept-Language: zh-CN, en-US
> 
> Open the attached text file with "emacs -Q". There are many
> unrecognized characters (like \342\200\230). The encoding info of
> the buffer follows.

This has been discussed on emacs-devel, see

  http://lists.gnu.org/archive/html/emacs-devel/2018-11/msg00065.html

As explained there, this is not a bug: it is the intended behavior
when a file includes null bytes.
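A simplified sketch of this kind of detection, in Python and purely for illustration: the rule "any NUL byte means binary" and the helper names `looks_binary`, `render_raw` and `visit` are assumptions, not Emacs's actual implementation. It shows how the same UTF-8 bytes end up displayed as octal escapes once a NUL byte causes the data to be treated as binary:

```python
def looks_binary(data: bytes) -> bool:
    # Assumed heuristic: a single NUL byte is enough to treat the data as binary
    return b"\x00" in data


def render_raw(data: bytes) -> str:
    # No decoding: keep ASCII bytes as-is, show every other byte as a backslash-octal escape
    return "".join(chr(b) if b < 0x80 else f"\\{b:o}" for b in data)


def visit(data: bytes) -> str:
    # Hypothetical stand-in for choosing a coding system when a file is visited
    if looks_binary(data):
        return render_raw(data)  # behaves like no-conversion (raw-text)
    return data.decode("utf-8")


text = "it\u2018s".encode("utf-8")
print(visit(text))               # it‘s
print(render_raw(text))          # it\342\200\230s
binary_view = visit(text + b"\x00")
```

In this sketch the NUL check runs before any UTF-8 decoding is attempted, which is why otherwise valid UTF-8 text is shown byte by byte as soon as a single null byte is present.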




Added tag(s) notabug. Request was from Eli Zaretskii <eliz <at> gnu.org> to control <at> debbugs.gnu.org. (Wed, 07 Nov 2018 16:13:02 GMT)

bug closed, send any further explanations to 33268 <at> debbugs.gnu.org and Zhang Haijun <ccsmile2008 <at> outlook.com> Request was from Eli Zaretskii <eliz <at> gnu.org> to control <at> debbugs.gnu.org. (Wed, 07 Nov 2018 16:13:02 GMT)

bug archived. Request was from Debbugs Internal Request <help-debbugs <at> gnu.org> to internal_control <at> debbugs.gnu.org. (Thu, 06 Dec 2018 12:24:05 GMT)


GNU bug tracking system
Copyright (C) 1999 Darren O. Benham, 1997,2003 nCipher Corporation Ltd, 1994-97 Ian Jackson.