GNU bug report logs -
#23814
24.5; bug of hz coding-system
Reported by: ynyaaa <at> gmail.com
Date: Tue, 21 Jun 2016 12:23:02 UTC
Severity: normal
Found in version 24.5
Fixed in version 26.1
Done: Glenn Morris <rgm <at> gnu.org>
Bug is archived. No further changes may be made.
Hi, sorry for the late response. I've just noticed that my reply
didn't go out successfully, so I'm re-sending it.
I wrote:
> In article <871t2dz22d.fsf <at> gmail.com>, ynyaaa <at> gmail.com writes:
> > If there are unencodable characters, encodable characters may be broken.
> > In this example, the second ?\x4E00 character disappears.
> > (set-language-environment 'Chinese-GB)
> > (decode-coding-string (encode-coding-string "\x4E00\x00B7\x4E00" 'hz) 'hz)
> >>> "\x4E00\e\x3048\x6070\x70B3\x11213D\300\273"
>
> How to treat unencodable characters when encoding is a difficult problem.
> As HZ is designed for 7-bit environments, I think it's important to keep
> the output 7-bit when encoding. So, the new code uses \uXXXX for those
> characters. Another way is to use UTF-8 sequences for them, so that we can
> decode them back. Which do you think is better?
>
> > To avoid this behavior, there are some solutions.
> > (a) While decoding, replace "~{...~}" with "\e$A...\e(B"
> > and decode with iso-2022-7bit.
> > (b) Like (a), replace "~{...~}" with "\e$A...\e(B" while decoding
> > and insert "\e$)A" at the beginning of the temp buffer
> > and decode with iso-2022-8bit-ss2.
> > (8bit data are decoded as euc-cn.)
> > (c) While encoding, use euc-cn instead of iso-2022-7bit
> > and translate each consecutive 8bit data to 7bit data
> > prefixed by "~{" and postfixed by "~}".
>
> I adopted method (a) for decoding, and fixed bugs in the encoding code.
>
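A minimal sketch of method (a), for illustration only: it is not the code from the attached china-util.el. The function name `my-hz-decode-string` is hypothetical, and the sketch assumes 7-bit input that contains no "~~" or "~\n" escapes.

```elisp
;; Hypothetical helper for illustration; not taken from china-util.el.
(defun my-hz-decode-string (hz-string)
  "Decode HZ-STRING by rewriting each ~{...~} run as ISO-2022 escapes.
Assumes 7-bit input with no tilde-tilde or tilde-newline escapes."
  (decode-coding-string
   (replace-regexp-in-string
    ;; A GB run is a sequence of two-byte pairs.  The first byte of a
    ;; pair is never a tilde, so the closing marker is unambiguous.
    "~{\\(\\(?:[!-}][!-~]\\)*\\)~}"
    ;; ESC $ A designates GB 2312 to G0; ESC ( B switches back to ASCII.
    "\e$A\\1\e(B"
    hz-string t)
   'iso-2022-7bit))
```

For example, `(my-hz-decode-string "a~{R;~}b")` should put the character U+4E00 between "a" and "b", given that GB 2312 encodes it as 0xD2BB (7-bit form "R;").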
> > By the way, RFC1843 describes:
> > The escape sequence '~\n' is a line-continuation marker to be
> > consumed with no output produced.
>
> The variable decode-hz-line-continuation controls this feature. I don't
> remember why the default is nil (i.e. do not decode ~\n); perhaps some
> of the Chinese people I discussed the HZ implementation with
> suggested that.
>
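A hedged sketch of turning the line-continuation handling on for a single call by binding the variable named above. The wrapper name `my-hz-decode-joining-lines` is hypothetical, and the exact result depends on the china-util.el version in use.

```elisp
(require 'china-util)  ; defines `decode-hz-line-continuation'

;; Hypothetical wrapper for illustration.
(defun my-hz-decode-joining-lines (hz-string)
  "Decode HZ-STRING with ~\\n line-continuation handling turned on."
  ;; The variable is special (defvar'd in china-util.el), so this
  ;; binding is dynamic and visible to the HZ decoder.
  (let ((decode-hz-line-continuation t))
    (decode-coding-string hz-string 'hz)))
```

Binding the variable with `let` rather than `setq` keeps the default (nil) unchanged for other callers.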
> Attached is the full china-util.el (not a diff).
>
> ---
> K. Handa
> handa <at> gnu.org
[china-util.el (application/emacs-lisp, attachment)]
This bug report was last modified 8 years and 85 days ago.
GNU bug tracking system
Copyright (C) 1999 Darren O. Benham,
1997,2003 nCipher Corporation Ltd,
1994-97 Ian Jackson.