#23814 - 24.5; bug of hz coding-system

GNU bug report logs - #23814
24.5; bug of hz coding-system

Package: emacs;

Reported by: ynyaaa <at> gmail.com

Date: Tue, 21 Jun 2016 12:23:02 UTC

Severity: normal

Found in version 24.5

Fixed in version 26.1

Done: Glenn Morris <rgm <at> gnu.org>

Bug is archived. No further changes may be made.

From: handa <handa <at> gnu.org>
To: ynyaaa <at> gmail.com
Cc: eliz <at> gnu.org, 23814 <at> debbugs.gnu.org
Subject: bug#23814: 24.5; bug of hz coding-system
Date: Sun, 14 Aug 2016 20:22:25 +0900

[Message part 1 (text/plain, inline)]

Hi, sorry for the late response.  I've just noticed that my reply mail
didn't go out successfully.  I'm trying to re-send it.  I wrote:

> In article <871t2dz22d.fsf <at> gmail.com>, ynyaaa <at> gmail.com writes:
> > If there are unencodable characters, encodable characters may be broken.
> > In this example, the second ?\x4E00 character disappears.
>
> > (set-language-environment 'Chinese-GB)
> > (decode-coding-string (encode-coding-string "\x4E00\x00B7\x4E00" 'hz) 'hz)
> >>> "\x4E00\e\x3048\x6070\x70B3\x11213D\300\273"
>
> How to treat unencodable characters on encoding is a difficult problem.
> As HZ is designed for 7-bit environments, I think it's important to keep
> the output 7-bit on encoding.  So, the new code uses \uXXXX for those
> characters.  Another way is to use a UTF-8 sequence for them, so that we
> can decode it back.  Which do you think is better?
>
> > To avoid this behavior, there are some solutions.
> > (a) While decoding, replace "~{...~}" with "\e$A...\e(B"
> >     and decode with iso-2022-7bit.
> > (b) Like (a), replace "~{...~}" with "\e$A...\e(B" while decoding
> >     and insert "\e$)A" at the beginning of the temp buffer
> >     and decode with iso-2022-8bit-ss2.
> >     (8bit data are decoded as euc-cn.)
> > (c) While encoding, use euc-cn instead of iso-2022-7bit
> >     and translate each consecutive 8bit data to 7bit data
> >     prefixed by "~{" and postfixed by "~}".
>
> I adopted method (a) for decoding, and fixed bugs in the encoding code.
>
> > By the way, RFC1843 describes:
> >   The escape sequence '~\n' is a line-continuation marker to be
> >   consumed with no output produced.
>
> The variable decode-hz-line-continuation controls this feature.  I don't
> remember why the default is nil (i.e. do not decode ~\n); perhaps some
> Chinese people I was discussing HZ support with while implementing it
> suggested that.
>
> Attached is the full china-util.el (not a diff).
>
> ---
> K. Handa
> handa <at> gnu.org
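
Method (a) above can be sketched roughly as follows. This is only an illustration of the idea, not the code in the attached china-util.el: the function name `my-hz-decode-sketch' is hypothetical, the input is assumed to be a unibyte HZ string, and "~\n" is always consumed here regardless of decode-hz-line-continuation.

```elisp
;; Sketch of method (a): rewrite HZ shifts into ISO-2022 designations,
;; then let iso-2022-7bit do the actual GB2312 decoding.
;; `my-hz-decode-sketch' is a hypothetical name used for illustration.
(defun my-hz-decode-sketch (bytes)
  "Decode the unibyte HZ string BYTES via iso-2022-7bit."
  (with-temp-buffer
    (set-buffer-multibyte nil)
    (insert bytes)
    (goto-char (point-min))
    (while (search-forward "~" nil t)
      (let ((ch (char-after)))
        (cond
         ;; "~~" stands for a literal "~": drop the second tilde.
         ((eq ch ?~) (delete-char 1))
         ;; "~\n" is a line continuation: drop both bytes.
         ((eq ch ?\n) (delete-region (1- (point)) (1+ (point))))
         ;; "~{" starts GB mode: replace it with ESC $ A.
         ((eq ch ?{)
          (delete-region (1- (point)) (1+ (point)))
          (insert "\e$A")
          ;; Step over two-byte GB codes until the closing "~}".
          (while (and (not (eobp)) (not (looking-at "~}")))
            (forward-char (min 2 (- (point-max) (point)))))
          (when (looking-at "~}")
            (delete-char 2))
          ;; Leave GB mode: designate ASCII again with ESC ( B.
          (insert "\e(B")))))
    (decode-coding-string (buffer-string) 'iso-2022-7bit)))

;; Example call; the result is expected to agree with what
;; (decode-coding-string "~{R;~} and ~~" 'hz) returns in a fixed Emacs.
(my-hz-decode-sketch "~{R;~} and ~~")
```

Stepping over the GB text two bytes at a time, rather than searching for "~}" directly, avoids misreading a GB code whose second byte happens to be "~".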

[china-util.el (application/emacs-lisp, attachment)]

This bug report was last modified 8 years and 140 days ago.
