GNU bug report logs - #23814
24.5; bug of hz coding-system

Previous Next

Package: emacs;

Reported by: ynyaaa <at> gmail.com

Date: Tue, 21 Jun 2016 12:23:02 UTC

Severity: normal

Found in version 24.5

Fixed in version 26.1

Done: Glenn Morris <rgm <at> gnu.org>

Bug is archived. No further changes may be made.

Full log


View this message in rfc822 format

From: handa <handa <at> gnu.org>
To: ynyaaa <at> gmail.com
Cc: eliz <at> gnu.org, 23814 <at> debbugs.gnu.org
Subject: bug#23814: 24.5; bug of hz coding-system
Date: Sun, 14 Aug 2016 20:22:25 +0900
[Message part 1 (text/plain, inline)]
Hi, sorry for the late response.  I've just noticed that my reply mail
didn't go out successfully.  I'm trying to re-send it.

I wrote:

> In article <871t2dz22d.fsf <at> gmail.com>, ynyaaa <at> gmail.com writes:
> > If there are unencodable characters, encodable characters may be broken.
> > In this example, the second ?\x4E00 character disappears.
> >     (set-language-environment 'Chinese-GB)
> >     (decode-coding-string (encode-coding-string "\x4E00\x00B7\x4E00" 'hz) 'hz)
> >>> "\x4E00\e\x3048\x6070\x70B3\x11213D\300\273"
> 
> How to treat unencodable characters on encoding is a difficult problem.
> As HZ is designed for 7-bit environment, I think it's important to keep
> 7-bit on encoding.  So, the new code uses \uXXXX for those characters.
> Another way is to use UTF-8 sequence for them, then we can decode it
> back.  Which, do yo think, is better?
> 
> > To avoid this behavior, there are some solutions.
> > (a) While decoding, replace "~{...~}" with "\e$A...\e(B"
> >     and decode with iso-2022-7bit.
> > (b) Like (a), replace "~{...~}" with "\e$A...\e(B" while decoding
> >     and insert "\e$)A" at the beginning of the temp buffer
> >     and decode with iso-2022-8bit-ss2.
> >     (8bit data are decoded as euc-cn.)
> > (c) While encoding, use euc-cn instead of iso-2022-7bit
> >     and translate each consecutive 8bit data to 7bit data
> >     prefixed by "~{" and postfixed by "~}".
> 
> I adopted the (a) method for decoding, and fix bugs encoding code.
> 
> > By the way, RFC1843 describes:
> >     The escape sequence '~\n' is a line-continuation marker to be
> >     consumed with no output produced.
> 
> The variable decode-hz-line-continuation controls this feature.  I don't
> remember why the default is nil (i.e. do not decode ~\n), perhaps some
> Chinese people I was discussing with on implementing HZ support
> suggested that.
> 
> Attched is the full china-util.el (not a diff).
> 
> ---
> K. Handa
> handa <at> gnu.org

[china-util.el (application/emacs-lisp, attachment)]

This bug report was last modified 8 years and 85 days ago.

Previous Next


GNU bug tracking system
Copyright (C) 1999 Darren O. Benham, 1997,2003 nCipher Corporation Ltd, 1994-97 Ian Jackson.