GNU bug report logs - #23814
24.5; bug of hz coding-system
Reported by: ynyaaa <at> gmail.com
Date: Tue, 21 Jun 2016 12:23:02 UTC
Severity: normal
Found in version 24.5
Fixed in version 26.1
Done: Glenn Morris <rgm <at> gnu.org>
Bug is archived. No further changes may be made.
handa <handa <at> gnu.org> writes:
> In article <87twffigzv.fsf <at> gmail.com>, ynyaaa <at> gmail.com writes:
>
>> But I found other bugs in the decoding of the "~" escape.
>> "~~" and "~{!!~}" should be encoded and decoded as follows.
>> "~~" -> "~~~~" -> "~~"
>> "~{!!~}" -> "~~{!!~~}" -> "~{!!~}"
>
>> In fact they are encoded properly, but decoded incorrectly.
>> (decode-coding-string (encode-coding-string "~~" 'hz) 'hz)
>> => "~"
>> (decode-coding-string (encode-coding-string "~{!!~}" 'hz) 'hz)
>> => #("\x3000" 0 1 (charset chinese-gb2312))
>
> Thank you for finding those bugs. Could you please try the attached
> patch instead?
>
> ---
> K. Handa
> handa <at> gnu.org
If the string contains unencodable characters, the encodable characters
around them may be corrupted.
In this example, the second ?\x4E00 character disappears.
(set-language-environment 'Chinese-GB)
(decode-coding-string (encode-coding-string "\x4E00\x00B7\x4E00" 'hz) 'hz)
=> "\x4E00\e\x3048\x6070\x70B3\x11213D\300\273"
There are several possible ways to avoid this behavior.
(a) While decoding, replace "~{...~}" with "\e$A...\e(B"
and decode with iso-2022-7bit.
(b) Like (a), replace "~{...~}" with "\e$A...\e(B" while decoding
and insert "\e$)A" at the beginning of the temp buffer
and decode with iso-2022-8bit-ss2.
(8bit data are decoded as euc-cn.)
(c) While encoding, use euc-cn instead of iso-2022-7bit
and translate each run of consecutive 8-bit bytes to 7-bit bytes,
prefixed by "~{" and suffixed by "~}".
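As a rough illustration of (c), here is a minimal sketch in Python rather than Emacs Lisp. It is not the china-util.el code: the name hz_encode is made up for this example, Python's "gb2312" codec stands in for euc-cn, and replacing unencodable characters with "?" is an assumption of the sketch.

```python
import re


def hz_encode(text: str) -> bytes:
    """Sketch of approach (c): encode via EUC-CN, then fold 8-bit runs to 7-bit."""
    # Step 1: EUC-CN encoding. Characters outside GB2312 become "?" (assumption),
    # so they cannot disturb the bytes of their neighbours.
    euc = text.encode("gb2312", errors="replace")
    # Step 2: escape literal "~" first. In EUC-CN every byte of a two-byte
    # character has the high bit set, so 0x7E here is always an ASCII tilde.
    euc = euc.replace(b"~", b"~~")
    # Step 3: wrap each run of 8-bit bytes in "~{" ... "~}" and clear the MSB.
    return re.sub(
        rb"[\x80-\xff]+",
        lambda m: b"~{" + bytes(b & 0x7F for b in m.group()) + b"~}",
        euc,
    )


print(hz_encode("~~"))
print(hz_encode("~{!!~}"))
print(hz_encode("\u4e00\U0001F600\u4e00"))
```

In this sketch the unencodable character in the last call turns into a single "?", and both ?\x4E00 characters on either side of it still get their own "~{...~}" segment.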
By the way, RFC 1843 says:
The escape sequence '~\n' is a line-continuation marker to be
consumed with no output produced.
This form should return "AB", but currently returns "A\nB".
(decode-coding-string "A~\nB" 'hz)
=> "A\nB"
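The escapes discussed in this report ("~~", "~{...~}" and "~\n") can be sketched as a small state machine. Again this is Python, not the Emacs implementation: hz_decode is a hypothetical name, the "gb2312" codec stands in for euc-cn, and keeping an unknown "~x" sequence verbatim is an assumption of the sketch.

```python
def hz_decode(data: bytes) -> str:
    """Sketch of an HZ decoder handling "~~", "~{...~}" and "~\\n"."""
    out = []
    i, n = 0, len(data)
    gb_mode = False
    while i < n:
        if gb_mode:
            # GB mode is read two bytes at a time, so a "~" that is the
            # second byte of a character is never mistaken for an escape.
            pair = data[i:i + 2]
            if pair == b"~}":
                gb_mode = False
            elif len(pair) == 2:
                # Set the MSB of both bytes and decode the pair as EUC-CN.
                euc = bytes(b | 0x80 for b in pair)
                out.append(euc.decode("gb2312", errors="replace"))
            # A dangling single byte at the end of input is dropped (assumption).
            i += 2
        elif data[i:i + 1] == b"~" and i + 1 < n:
            nxt = data[i + 1:i + 2]
            if nxt == b"~":
                out.append("~")      # "~~" stands for one literal tilde
            elif nxt == b"{":
                gb_mode = True       # "~{" switches to GB mode
            elif nxt == b"\n":
                pass                 # "~\n" is a line continuation: no output
            else:
                # Not a defined escape; kept verbatim (assumption).
                out.append("~" + nxt.decode("ascii", errors="replace"))
            i += 2
        else:
            out.append(chr(data[i]))
            i += 1
    return "".join(out)


print(repr(hz_decode(b"A~\nB")))
print(repr(hz_decode(b"~~~~")))
print(repr(hz_decode(b"~~{!!~~}")))
```

Because "~~" is consumed as a unit before the following byte is examined, the "{" in "~~{" is treated as plain ASCII and does not start a GB segment.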
> diff --git a/lisp/language/china-util.el b/lisp/language/china-util.el
> index e531640..9abdae1 100644
> --- a/lisp/language/china-util.el
> +++ b/lisp/language/china-util.el
> @@ -95,7 +95,12 @@ decode-hz-region
> (goto-char (point-min))
> (while (search-forward "~" nil t)
> (setq ch (following-char))
> - (if (or (= ch ?\n) (= ch ?~)) (delete-char -1)))
> + (if (= ch ?{)
> + (search-forward "~}" nil 'move)
> + (when (or (= ch ?\n) (= ch ?~))
> + (delete-char -1)
> + (put-text-property (point) (1+ (point)) 'hz-decoded t)
> + (forward-char 1))))
>
> ;; "^zW...\n" -> Chinese GB2312
> ;; "~{...~}" -> Chinese GB2312
> @@ -104,6 +109,8 @@ decode-hz-region
> (while (re-search-forward hz/zw-start-gb nil t)
> (setq pos (match-beginning 0)
> ch (char-after pos))
> + (if (and (= ch ?~) (get-text-property pos 'hz-decoded))
> + (forward-char 1)
> ;; Record the first position to start conversion.
> (or beg (setq beg pos))
> (end-of-line)
> @@ -122,9 +129,10 @@ decode-hz-region
> t)
> (delete-char -2))
> (setq end (point))
> - (translate-region pos (point) hz-set-msb-table))))
> + (translate-region pos (point) hz-set-msb-table)))))
> (if beg
> (decode-coding-region beg end 'euc-china)))
> + (remove-text-properties (point-min) (point-max) '(hz-decoded nil))
> (- (point-max) (point-min)))))
>
> ;;;###autoload
> @@ -142,6 +150,7 @@ encode-hz-region
> (save-restriction
> (narrow-to-region beg end)
>
> + (put-text-property beg end 'charset 'chinese-gb2312)
> ;; "~" -> "~~"
> (goto-char (point-min))
> (while (search-forward "~" nil t) (insert ?~))
This bug report was last modified 8 years and 140 days ago.