From: ynyaaa@gmail.com
To: bug-gnu-emacs@gnu.org
Subject: 24.5; bug of hz coding-system
Date: Tue, 21 Jun 2016 21:22:32 +0900
Message-ID: <877fdiu3xz.fsf@gmail.com>

The hz coding-system should encode chinese-gb2312 characters,
but it may fail to encode text without a charset property.

current-language-environment
=>"Japanese"

;; wrong
(encode-coding-string "\x4E00" 'hz)
=>"\e$B0l~}"

;; correct
(encode-coding-string (propertize "\x4E00" 'charset 'chinese-gb2312) 'hz)
=>"~{R;~}"

When the second byte of a chinese-gb2312 character equals ?~,
the hz coding-system may fail to decode it.

(encode-coding-string (propertize "\x670D" 'charset 'chinese-gb2312) 'hz)
=>"~{7~~}"

;; wrong
(decode-coding-string "~{7~~}" 'hz)
=>"\300\267"


In GNU Emacs 24.5.1 (i686-pc-mingw32)
 of 2015-04-11 on LEG570
Windowing system distributor `Microsoft Corp.', version 6.0.6002
Configured using:
 `configure --prefix=/c/usr --host=i686-pc-mingw32'

Important settings:
  value of $LANG: JPN
  locale-coding-system: cp932

Major mode: Lisp Interaction

Minor modes in effect:
  tooltip-mode: t
  electric-indent-mode: t
  mouse-wheel-mode: t
  tool-bar-mode: t
  menu-bar-mode: t
  file-name-shadow-mode: t
  global-font-lock-mode: t
  font-lock-mode: t
  blink-cursor-mode: t
  auto-composition-mode: t
  auto-encryption-mode: t
  auto-compression-mode: t
  line-number-mode: t
  transient-mark-mode: t

Recent messages:

Load-path shadows:
None found.

Features:
(network-stream starttls tls mailalias smtpmail auth-source eieio
byte-opt bytecomp byte-compile cl-extra cl-loaddefs cl-lib cconv
eieio-core password-cache rect warnings china-util misearch
multi-isearch pp shadow sort gnus-util mail-extr emacsbug message
format-spec rfc822 mml mml-sec mm-decode mm-bodies mm-encode mail-parse
rfc2231 mailabbrev gmm-utils mailheader sendmail rfc2047 rfc2045
ietf-drums mm-util mail-prsvr mail-utils help-mode easymenu advice
help-fns time-date japan-util tooltip electric uniquify ediff-hook
vc-hooks lisp-float-type mwheel dos-w32 ls-lisp w32-common-fns
disp-table w32-win w32-vars tool-bar dnd fontset image regexp-opt
fringe tabulated-list newcomment lisp-mode prog-mode register page
menu-bar rfn-eshadow timer select scroll-bar mouse jit-lock font-lock
syntax facemenu font-core frame cham georgian utf-8-lang misc-lang
vietnamese tibetan thai tai-viet lao korean japanese hebrew greek
romanian slovak czech european ethiopic indian cyrillic chinese
case-table epa-hook jka-cmpr-hook help simple abbrev minibuffer nadvice
loaddefs button faces cus-face macroexp files text-properties overlay
sha1 md5 base64 format env code-pages mule custom
widget hashtable-print-readable backquote make-network-process
w32notify w32 multi-tty emacs)

Memory information:
((conses 8 94845 27098)
 (symbols 32 19573 0)
 (miscs 32 77 279)
 (strings 16 16482 13821)
 (string-bytes 1 462365)
 (vectors 8 12746)
 (vector-slots 4 519456 11240)
 (floats 8 62 556)
 (intervals 28 606 13)
 (buffers 508 18))


From: Eli Zaretskii
To: ynyaaa@gmail.com
Cc: 23814@debbugs.gnu.org
Subject: Re: bug#23814: 24.5; bug of hz coding-system
Date: Tue, 21 Jun 2016 15:58:39 +0300
Message-ID: <83inx27l6o.fsf@gnu.org>

> From: ynyaaa@gmail.com
> Date: Tue, 21 Jun 2016 21:22:32 +0900
>
> The hz coding-system should encode chinese-gb2312 characters,
> but it may fail to encode text without a charset property.

This is by design, and mentioned in the doc string of that
coding-system. Since Emacs is Unicode based, the _only_ way of having
"chinese-gb2312 characters" is by using that text property.

IOW, I don't think this is a bug.

From: ynyaaa@gmail.com
To: Eli Zaretskii
Cc: 23814@debbugs.gnu.org
Subject: Re: bug#23814: 24.5; bug of hz coding-system
Date: Wed, 22 Jun 2016 22:47:00 +0900
Message-ID: <8760t1l4iz.fsf@gmail.com>

Eli Zaretskii writes:

> This is by design, and mentioned in the doc string of that
> coding-system. Since Emacs is Unicode based, the _only_ way of having
> "chinese-gb2312 characters" is by using that text property.

`encode-hz-region' uses the `iso-2022-7bit' coding-system internally;
replacing it with the coding-system below will work.

(define-coding-system 'iso-2022-cn-gb
  "ISO 2022 based 7bit encoding only for Chinese GB2312."
  :coding-type 'iso-2022
  :mnemonic ?C
  :charset-list '(ascii chinese-gb2312)
  :designation [(ascii chinese-gb2312) nil nil nil]
  :flags '(ascii-at-eol ascii-at-cntl designation 7-bit safe)
  )


From: Eli Zaretskii
To: ynyaaa@gmail.com
Cc: 23814@debbugs.gnu.org
Subject: Re: bug#23814: 24.5; bug of hz coding-system
Date: Wed, 22 Jun 2016 18:28:15 +0300
Message-ID: <83r3bp450w.fsf@gnu.org>

> From: ynyaaa@gmail.com
> Cc: 23814@debbugs.gnu.org
> Date: Wed, 22 Jun 2016 22:47:00 +0900
>
> Eli Zaretskii writes:
>
> > This is by design, and mentioned in the doc string of that
> > coding-system. Since Emacs is Unicode based, the _only_ way of having
> > "chinese-gb2312 characters" is by using that text property.
>
> `encode-hz-region' uses the `iso-2022-7bit' coding-system internally;
> replacing it with the coding-system below will work.
>
> (define-coding-system 'iso-2022-cn-gb
>   "ISO 2022 based 7bit encoding only for Chinese GB2312."
>   :coding-type 'iso-2022
>   :mnemonic ?C
>   :charset-list '(ascii chinese-gb2312)
>   :designation [(ascii chinese-gb2312) nil nil nil]
>   :flags '(ascii-at-eol ascii-at-cntl designation 7-bit safe)
>   )

What advantages does this change have?

From: ynyaaa@gmail.com
To: Eli Zaretskii
Cc: 23814@debbugs.gnu.org
Subject: Re: bug#23814: 24.5; bug of hz coding-system
Date: Thu, 23 Jun 2016 02:04:18 +0900
Message-ID: <87k2hhnoj1.fsf@gmail.com>

Eli Zaretskii writes:

>> `encode-hz-region' uses the `iso-2022-7bit' coding-system internally;
>> replacing it with the coding-system below will work.
>>
>> (define-coding-system 'iso-2022-cn-gb
>>   "ISO 2022 based 7bit encoding only for Chinese GB2312."
>>   :coding-type 'iso-2022
>>   :mnemonic ?C
>>   :charset-list '(ascii chinese-gb2312)
>>   :designation [(ascii chinese-gb2312) nil nil nil]
>>   :flags '(ascii-at-eol ascii-at-cntl designation 7-bit safe)
>>   )
>
> What advantages does this change have?

`iso-2022-7bit' may encode the same character as different strings,
while `iso-2022-cn-gb' encodes the same character as the same string.

(mapcar (lambda (cs) (encode-coding-string
                      (propertize "\x4e00" 'charset cs)
                      'iso-2022-7bit))
        '(chinese-gb2312 japanese-jisx0208 korean-ksc5601
          chinese-cns11643-1))
=>("\e$AR;\e(B"
   "\e$B0l\e(B"
   "\e$(Cli\e(B"
   "\e$(GD!\e(B")

(mapcar (lambda (cs) (encode-coding-string
                      (propertize "\x4e00" 'charset cs)
                      'iso-2022-cn-gb))
        '(chinese-gb2312 japanese-jisx0208 korean-ksc5601
          chinese-cns11643-1))
=>("\e$AR;\e(B"
   "\e$AR;\e(B"
   "\e$AR;\e(B"
   "\e$AR;\e(B")

`encode-hz-region' expects `chinese-gb2312' characters to be encoded
with "\e$A" sequences, and replaces them with "~{".
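
The rewriting step described in the last paragraph can be sketched in Emacs Lisp as follows. `my-iso2022-gb-to-hz' is a hypothetical helper written for illustration only, not the code in china-util.el. It assumes its input is the unibyte output of a 7-bit ISO-2022 encoder and rewrites only ESC $ A ... ESC ( B runs as HZ "~{...~}", doubling ASCII "~" along the way. Any other designation, such as the ESC $ B (JIS X 0208) sequence in the first `mapcar' result above, passes through untouched, which illustrates why the HZ conversion only works when GB2312 text reaches it as "\e$A" sequences. (Judging from the "\e$B0l~}" output in the original report, the real function appears to rewrite every "\e(B" regardless of the preceding designation; the sketch does not reproduce that.)

```elisp
;; Sketch only: `my-iso2022-gb-to-hz' is a hypothetical helper, not
;; part of china-util.el.  STR is assumed to be the unibyte output of
;; a 7-bit ISO-2022 encoder in which GB2312 runs start with ESC $ A
;; and end with ESC ( B.
(defun my-iso2022-gb-to-hz (str)
  "Rewrite ESC $ A ... ESC ( B runs in STR as HZ ~{ ... ~}."
  (let ((i 0)
        (len (length str))
        (gb nil)                        ; non-nil inside a GB2312 run
        (out nil))
    (while (< i len)
      (cond
       ;; ESC $ A designates GB2312: emit the HZ shift "~{".
       ((and (not gb) (string-prefix-p "\e$A" (substring str i)))
        (push "~{" out)
        (setq gb t i (+ i 3)))
       ;; ESC ( B inside a GB2312 run returns to ASCII: emit "~}".
       ((and gb (string-prefix-p "\e(B" (substring str i)))
        (push "~}" out)
        (setq gb nil i (+ i 3)))
       ;; A literal "~" in ASCII mode must be doubled.  Inside a GB2312
       ;; run, "~" (0x7E) is an ordinary data byte and is not doubled.
       ((and (not gb) (eq (aref str i) ?~))
        (push "~~" out)
        (setq i (1+ i)))
       ;; Anything else, including other designations such as ESC $ B
       ;; (JIS X 0208), is copied verbatim.
       (t
        (push (substring str i (1+ i)) out)
        (setq i (1+ i)))))
    ;; Close a GB2312 run that is still open at the end of STR.
    (when gb (push "~}" out))
    (apply #'concat (nreverse out))))
```

For example, "\e$A7~\e(B" becomes "~{7~~}", the same hz output for U+670D shown in the original report: the "~" inside the GB2312 run is the character's second byte, and only the final "~}" is the shift back to ASCII.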

From: Eli Zaretskii
To: ynyaaa@gmail.com, Kenichi Handa
Cc: 23814@debbugs.gnu.org
Subject: Re: bug#23814: 24.5; bug of hz coding-system
Date: Wed, 22 Jun 2016 20:26:53 +0300
Message-ID: <83h9cl3zj6.fsf@gnu.org>

> From: ynyaaa@gmail.com
> Cc: 23814@debbugs.gnu.org
> Date: Thu, 23 Jun 2016 02:04:18 +0900
>
> Eli Zaretskii writes:
>
> >> `encode-hz-region' uses the `iso-2022-7bit' coding-system internally;
> >> replacing it with the coding-system below will work.
> >>
> >> (define-coding-system 'iso-2022-cn-gb
> >>   "ISO 2022 based 7bit encoding only for Chinese GB2312."
> >>   :coding-type 'iso-2022
> >>   :mnemonic ?C
> >>   :charset-list '(ascii chinese-gb2312)
> >>   :designation [(ascii chinese-gb2312) nil nil nil]
> >>   :flags '(ascii-at-eol ascii-at-cntl designation 7-bit safe)
> >>   )
> >
> > What advantages does this change have?
>
> `iso-2022-7bit' may encode the same character as different strings,
> while `iso-2022-cn-gb' encodes the same character as the same string.
>
> (mapcar (lambda (cs) (encode-coding-string
>                       (propertize "\x4e00" 'charset cs)
>                       'iso-2022-7bit))
>         '(chinese-gb2312 japanese-jisx0208 korean-ksc5601
>           chinese-cns11643-1))
> =>("\e$AR;\e(B"
>    "\e$B0l\e(B"
>    "\e$(Cli\e(B"
>    "\e$(GD!\e(B")
>
> (mapcar (lambda (cs) (encode-coding-string
>                       (propertize "\x4e00" 'charset cs)
>                       'iso-2022-cn-gb))
>         '(chinese-gb2312 japanese-jisx0208 korean-ksc5601
>           chinese-cns11643-1))
> =>("\e$AR;\e(B"
>    "\e$AR;\e(B"
>    "\e$AR;\e(B"
>    "\e$AR;\e(B")
>
> `encode-hz-region' expects `chinese-gb2312' characters to be encoded
> with "\e$A" sequences, and replaces them with "~{".

I understand, but as I said, I think this is by design, and should not
be changed. However, maybe I'm missing something, so I'll CC
Handa-san and ask him to comment on this proposal and the issue in
general.

From: Eli Zaretskii
To: handa@gnu.org
Cc: ynyaaa@gmail.com, 23814@debbugs.gnu.org
Subject: Re: bug#23814: 24.5; bug of hz coding-system
Date: Sat, 09 Jul 2016 14:20:19 +0300
Message-ID: <83d1mngirw.fsf@gnu.org>

Ping! Could you please comment on this issue?

> Date: Wed, 22 Jun 2016 20:26:53 +0300
> From: Eli Zaretskii
> Cc: 23814@debbugs.gnu.org
>
> > From: ynyaaa@gmail.com
> > Cc: 23814@debbugs.gnu.org
> > Date: Thu, 23 Jun 2016 02:04:18 +0900
> >
> > Eli Zaretskii writes:
> >
> > >> `encode-hz-region' uses the `iso-2022-7bit' coding-system internally;
> > >> replacing it with the coding-system below will work.
> > >>
> > >> (define-coding-system 'iso-2022-cn-gb
> > >>   "ISO 2022 based 7bit encoding only for Chinese GB2312."
> > >>   :coding-type 'iso-2022
> > >>   :mnemonic ?C
> > >>   :charset-list '(ascii chinese-gb2312)
> > >>   :designation [(ascii chinese-gb2312) nil nil nil]
> > >>   :flags '(ascii-at-eol ascii-at-cntl designation 7-bit safe)
> > >>   )
> > >
> > > What advantages does this change have?
> >
> > `iso-2022-7bit' may encode the same character as different strings,
> > while `iso-2022-cn-gb' encodes the same character as the same string.
> >
> > (mapcar (lambda (cs) (encode-coding-string
> >                       (propertize "\x4e00" 'charset cs)
> >                       'iso-2022-7bit))
> >         '(chinese-gb2312 japanese-jisx0208 korean-ksc5601
> >           chinese-cns11643-1))
> > =>("\e$AR;\e(B"
> >    "\e$B0l\e(B"
> >    "\e$(Cli\e(B"
> >    "\e$(GD!\e(B")
> >
> > (mapcar (lambda (cs) (encode-coding-string
> >                       (propertize "\x4e00" 'charset cs)
> >                       'iso-2022-cn-gb))
> >         '(chinese-gb2312 japanese-jisx0208 korean-ksc5601
> >           chinese-cns11643-1))
> > =>("\e$AR;\e(B"
> >    "\e$AR;\e(B"
> >    "\e$AR;\e(B"
> >    "\e$AR;\e(B")
> >
> > `encode-hz-region' expects `chinese-gb2312' characters to be encoded
> > with "\e$A" sequences, and replaces them with "~{".
>
> I understand, but as I said, I think this is by design, and should not
> be changed. However, maybe I'm missing something, so I'll CC
> Handa-san and ask him to comment on this proposal and the issue in
> general.

From: handa
To: Eli Zaretskii
Cc: ynyaaa@gmail.com, 23814@debbugs.gnu.org
Subject: Re: bug#23814: 24.5; bug of hz coding-system
Date: Wed, 13 Jul 2016 23:12:47 +0900
Message-ID: <87wpkphbj4.fsf@gnu.org>

In article <83d1mngirw.fsf@gnu.org>, Eli Zaretskii writes:

> Ping! Could you please comment on this issue?

Sorry, I've overlooked that mail.

> > > >> `encode-hz-region' uses the `iso-2022-7bit' coding-system internally;
> > > >> replacing it with the coding-system below will work.
> > > >>
> > > >> (define-coding-system 'iso-2022-cn-gb
> > > >>   "ISO 2022 based 7bit encoding only for Chinese GB2312."
> > > >>   :coding-type 'iso-2022
> > > >>   :mnemonic ?C
> > > >>   :charset-list '(ascii chinese-gb2312)
> > > >>   :designation [(ascii chinese-gb2312) nil nil nil]
> > > >>   :flags '(ascii-at-eol ascii-at-cntl designation 7-bit safe)
> > > >>   )

Right. But, as there are already so many iso-2022 based coding systems,
I'd like to avoid adding a new one just for encode-hz-region. I think
the attached patch is sufficient. Could you please try it? It also
fixes the problem of incorrect decoding of "~{7~~}".

---
K. Handa
handa@gnu.org

diff --git a/lisp/language/china-util.el b/lisp/language/china-util.el
index e531640..9735bd6 100644
--- a/lisp/language/china-util.el
+++ b/lisp/language/china-util.el
@@ -95,7 +95,9 @@ decode-hz-region
         (goto-char (point-min))
         (while (search-forward "~" nil t)
           (setq ch (following-char))
-          (if (or (= ch ?\n) (= ch ?~)) (delete-char -1)))
+          (if (= ch ?{)
+              (search-forward "~}" nil 'move)
+            (if (or (= ch ?\n) (= ch ?~)) (delete-char -1))))
 
         ;; "^zW...\n" -> Chinese GB2312
         ;; "~{...~}" -> Chinese GB2312
@@ -141,7 +143,7 @@ encode-hz-region
     (save-excursion
       (save-restriction
         (narrow-to-region beg end)
-
+        (put-text-property beg end 'charset 'chinese-gb2312)
         ;; "~" -> "~~"
         (goto-char (point-min))
         (while (search-forward "~" nil t) (insert ?~))
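
The decoding hunk of the patch above addresses the second problem in the original report. In HZ, a GB2312 character is written as two 7-bit bytes, and the second byte may be "~" (0x7E, i.e. 0xFE with the high bit cleared), so "~{7~~}" is "~{", the character bytes "7~", then "~}". The old first pass in `decode-hz-region' deleted the "~" of every "~~" across the whole region, so that data byte was lost before the "~{...~}" run was parsed; the patched loop jumps over such runs instead. A minimal sketch of the same rule as a single left-to-right scan, assuming the goal is to rewrite HZ into 7-bit ISO-2022 (ESC $ A / ESC ( B). `my-hz-to-iso2022-gb' is a hypothetical name, and this is not the china-util.el implementation.

```elisp
;; Sketch only: `my-hz-to-iso2022-gb' is a hypothetical helper, not
;; part of china-util.el.  It converts HZ text into 7-bit ISO-2022
;; (ESC $ A ... ESC ( B) in a single left-to-right scan.
(defun my-hz-to-iso2022-gb (str)
  "Rewrite HZ-encoded STR using ESC $ A / ESC ( B designations."
  (let ((i 0)
        (len (length str))
        (gb nil)                        ; non-nil inside "~{ ... ~}"
        (out nil))
    (while (< i len)
      (cond
       ;; "~}" at a character boundary ends the GB2312 run.
       ((and gb (string-prefix-p "~}" (substring str i)))
        (push "\e(B" out)
        (setq gb nil i (+ i 2)))
       ;; Inside the run, copy a whole two-byte character.  Its second
       ;; byte may be "~", so "~~" here must NOT be collapsed.
       (gb
        (push (substring str i (min len (+ i 2))) out)
        (setq i (+ i 2)))
       ;; ASCII mode: "~{" starts a GB2312 run.
       ((string-prefix-p "~{" (substring str i))
        (push "\e$A" out)
        (setq gb t i (+ i 2)))
       ;; ASCII mode: "~~" is a literal tilde.
       ((string-prefix-p "~~" (substring str i))
        (push "~" out)
        (setq i (+ i 2)))
       ;; ASCII mode: "~" followed by a newline is a line continuation.
       ((string-prefix-p "~\n" (substring str i))
        (setq i (+ i 2)))
       (t
        (push (substring str i (1+ i)) out)
        (setq i (1+ i)))))
    ;; Return to ASCII if the input ended inside a GB2312 run.
    (when gb (push "\e(B" out))
    (apply #'concat (nreverse out))))
```

Here "~{7~~}" becomes "\e$A7~\e(B". Passing that result to `decode-coding-string' with `iso-2022-7bit' should then give U+670D, although that last step is not exercised in the sketch.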
4.82) (envelope-from ) id 1bR11L-0000NI-SS; Sat, 23 Jul 2016 13:47:30 -0400 Date: Sat, 23 Jul 2016 20:47:27 +0300 Message-Id: <83eg6kutzk.fsf@gnu.org> From: Eli Zaretskii To: ynyaaa@gmail.com In-reply-to: <87wpkphbj4.fsf@gnu.org> (message from handa on Wed, 13 Jul 2016 23:12:47 +0900) Subject: Re: bug#23814: 24.5; bug of hz coding-system References: <87wpkphbj4.fsf@gnu.org> X-detected-operating-system: by eggs.gnu.org: GNU/Linux 2.2.x-3.x [generic] X-Received-From: 2001:4830:134:3::e X-Spam-Score: -6.3 (------) X-Debbugs-Envelope-To: 23814 Cc: handa , 23814@debbugs.gnu.org X-BeenThere: debbugs-submit@debbugs.gnu.org X-Mailman-Version: 2.1.18 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Reply-To: Eli Zaretskii Errors-To: debbugs-submit-bounces@debbugs.gnu.org Sender: "Debbugs-submit" X-Spam-Score: -6.3 (------) Ping! Could you please try this patch and see if it solves the problem? > From: handa > Cc: ynyaaa@gmail.com, 23814@debbugs.gnu.org > Date: Wed, 13 Jul 2016 23:12:47 +0900 > > > > > >> `encode-hz-region' uses `iso-2022-7bit' coding-system internally, > > > > >> replacing it with the coding-system below will work. > > > > >> > > > > >> (define-coding-system 'iso-2022-cn-gb > > > > >> "ISO 2022 based 7bit encoding only for Chinese GB2312." > > > > >> :coding-type 'iso-2022 > > > > >> :mnemonic ?C > > > > >> :charset-list '(ascii chinese-gb2312) > > > > >> :designation [(ascii chinese-gb2312) nil nil nil] > > > > >> :flags '(ascii-at-eol ascii-at-cntl designation 7-bit safe) > > > > >> ) > > Right. But, as there are already so many iso-2022 based coding systems, > I'd like to avoid adding a new one just for encode-hz-region. I think > the attached patch is sufficient. Could you please try it? It also > fixes the problem of incorrect decoding of "~{7~~}". > > --- > K.
Handa > handa@gnu.org > > diff --git a/lisp/language/china-util.el b/lisp/language/china-util.el > index e531640..9735bd6 100644 > --- a/lisp/language/china-util.el > +++ b/lisp/language/china-util.el > @@ -95,7 +95,9 @@ decode-hz-region > (goto-char (point-min)) > (while (search-forward "~" nil t) > (setq ch (following-char)) > - (if (or (= ch ?\n) (= ch ?~)) (delete-char -1))) > + (if (= ch ?{) > + (search-forward "~}" nil 'move) > + (if (or (= ch ?\n) (= ch ?~)) (delete-char -1)))) > > ;; "^zW...\n" -> Chinese GB2312 > ;; "~{...~}" -> Chinese GB2312 > @@ -141,7 +143,7 @@ encode-hz-region > (save-excursion > (save-restriction > (narrow-to-region beg end) > - > + (put-text-property beg end 'charset 'chinese-gb2312) > ;; "~" -> "~~" > (goto-char (point-min)) > (while (search-forward "~" nil t) (insert ?~)) > > From debbugs-submit-bounces@debbugs.gnu.org Sun Jul 24 04:21:25 2016 Received: (at 23814) by debbugs.gnu.org; 24 Jul 2016 08:21:25 +0000 Received: from localhost ([127.0.0.1]:35221 helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1bREf6-00029I-U8 for submit@debbugs.gnu.org; Sun, 24 Jul 2016 04:21:25 -0400 Received: from mail-pf0-f195.google.com ([209.85.192.195]:33507) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1bREf5-000295-8u for 23814@debbugs.gnu.org; Sun, 24 Jul 2016 04:21:23 -0400 Received: by mail-pf0-f195.google.com with SMTP id i6so9938073pfe.0 for <23814@debbugs.gnu.org>; Sun, 24 Jul 2016 01:21:23 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113; h=from:to:cc:subject:date:message-id:mime-version; bh=dsX+Rhtta1B4O4vpckpj9xU/jtkiLl2u0iAiN62hPBE=; b=EuXmGbvFUQ/igTmuBMvHVOnPBScOTCnTGYMiS9isVi6DL1+CFHbvzM4ncwiGv2hODV JZDm4MU0qcIvgQprtbbrHOauauhT0arbZYmC0wiCsO7EYbTTDY97+faGv037VEuFQ8IP sbzH4xZTWkA4IXPR5z/EOyX+Wu9KjvD6TFPgJ5jxfH/DphuNBcdTxaD8Z/4hg8EboNjj +T5bg8w+nnlgIsEMoZmZPCpqezjOju3j48xO0vYo+BmNKunibVXhs8rkOkBo1kMkB2KD 
FA/mitonLE/cgEiJJAQXMXuCSAy9GAZaKBEMrH3vGfoedJGo3royeKXskUN5t36oP9bj /svg== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20130820; h=x-gm-message-state:from:to:cc:subject:date:message-id:mime-version; bh=dsX+Rhtta1B4O4vpckpj9xU/jtkiLl2u0iAiN62hPBE=; b=WsJgwYZJCXQxTW5pnX9wLrLOM/UygudUxqaRRaduAFszByN4q/2ZQeC7tWZLzNmhSw fwijwKWmFQlg1MNYy4Ktb748hfPg12V6J7mP7YATlptJ2rGnEw5Copk6J1eOf/azKxFp jX1vomMGyxFzcqqINeSOPuWajWsuXzxTz8zvgzaHsXVpAxvi1B+ohMR6JHEMvEzU9K00 xt9XJv1nf80RDm8ok6rdu68y9rEOudBiEDMlmh0soaIUNwop+ySw72whckpT3Q1PJk/J q5Cax2/IahZcZLW1xdMDgIOenPqEMV55XZHnPnnf4rCtddhpd3dFnNYgW8OGifYCNFmd RvXA== X-Gm-Message-State: AEkoousM4pdZTAAA2FjEKUoMPvnHKFcWq1jqCRwiPUeramxpi8hHHAUySsEdJ7cAdI1htg== X-Received: by 10.98.7.200 with SMTP id 69mr20200292pfh.33.1469348477428; Sun, 24 Jul 2016 01:21:17 -0700 (PDT) Received: from PNUT-PC (east49-p99.eaccess.hi-ho.ne.jp. [219.105.5.100]) by smtp.gmail.com with ESMTPSA id g5sm31426447pfg.0.2016.07.24.01.21.15 (version=TLS1_2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128/128); Sun, 24 Jul 2016 01:21:16 -0700 (PDT) From: ynyaaa@gmail.com To: Eli Zaretskii Subject: Re: bug#23814: 24.5; bug of hz coding-system Date: Sun, 24 Jul 2016 17:21:08 +0900 Message-ID: <87twffigzv.fsf@gmail.com> MIME-Version: 1.0 Content-Type: text/plain X-Spam-Score: -0.7 (/) X-Debbugs-Envelope-To: 23814 Cc: handa , 23814@debbugs.gnu.org X-BeenThere: debbugs-submit@debbugs.gnu.org X-Mailman-Version: 2.1.18 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: debbugs-submit-bounces@debbugs.gnu.org Sender: "Debbugs-submit" X-Spam-Score: -0.7 (/) Eli Zaretskii writes: > Ping! Could you please try this patch and see if it solves the > problem? The patch seems to make better results. But I found other bugs about decodings of "~" escape. "~~" and "~{!!~}" should be encoded and decoded as below. 
"~~" -> "~~~~" -> "~~" "~{!!~}" -> "~~{!!~~}" -> "~{!!~}" In reality they are encoded properly, but decoded the wrong way. (decode-coding-string (encode-coding-string "~~" 'hz) 'hz) => "~" (decode-coding-string (encode-coding-string "~{!!~}" 'hz) 'hz) => #("\x3000" 0 1 (charset chinese-gb2312)) These behaviors are not affected by the patch. >> diff --git a/lisp/language/china-util.el b/lisp/language/china-util.el >> index e531640..9735bd6 100644 >> --- a/lisp/language/china-util.el >> +++ b/lisp/language/china-util.el >> @@ -95,7 +95,9 @@ decode-hz-region >> (goto-char (point-min)) >> (while (search-forward "~" nil t) >> (setq ch (following-char)) >> - (if (or (= ch ?\n) (= ch ?~)) (delete-char -1))) >> + (if (= ch ?{) >> + (search-forward "~}" nil 'move) >> + (if (or (= ch ?\n) (= ch ?~)) (delete-char -1)))) >> >> ;; "^zW...\n" -> Chinese GB2312 >> ;; "~{...~}" -> Chinese GB2312 >> @@ -141,7 +143,7 @@ encode-hz-region >> (save-excursion >> (save-restriction >> (narrow-to-region beg end) >> - >> + (put-text-property beg end 'charset 'chinese-gb2312) >> ;; "~" -> "~~" >> (goto-char (point-min)) >> (while (search-forward "~" nil t) (insert ?~)) >> >> From debbugs-submit-bounces@debbugs.gnu.org Tue Jul 26 11:09:53 2016 Received: (at 23814) by debbugs.gnu.org; 26 Jul 2016 15:09:53 +0000 Received: from localhost ([127.0.0.1]:38260 helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1bS3zQ-0007mt-Fr for submit@debbugs.gnu.org; Tue, 26 Jul 2016 11:09:53 -0400 Received: from eggs.gnu.org ([208.118.235.92]:54390) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1bS3zL-0007mc-00 for 23814@debbugs.gnu.org; Tue, 26 Jul 2016 11:09:47 -0400 Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71) (envelope-from ) id 1bS3zE-00082x-Vh for 23814@debbugs.gnu.org; Tue, 26 Jul 2016 11:09:37 -0400 X-Spam-Checker-Version: SpamAssassin 3.3.2 (2011-06-06) on eggs.gnu.org X-Spam-Level: X-Spam-Status: No, score=-1.3 required=5.0
tests=BAYES_40,RP_MATCHES_RCVD autolearn=disabled version=3.3.2 Received: from fencepost.gnu.org ([2001:4830:134:3::e]:41082) by eggs.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1bS3zB-00081Z-6A; Tue, 26 Jul 2016 11:09:33 -0400 Received: from fl1-61-203-105-252.iba.mesh.ad.jp ([61.203.105.252]:57872 helo=shatin) by fencepost.gnu.org with esmtpsa (TLS1.2:RSA_AES_128_CBC_SHA1:128) (Exim 4.82) (envelope-from ) id 1bS3zA-000123-Ds; Tue, 26 Jul 2016 11:09:32 -0400 Received: from handa by shatin with local (Exim 4.86_2) (envelope-from ) id 1bS3z3-0001xQ-2H; Wed, 27 Jul 2016 00:09:25 +0900 From: handa To: ynyaaa@gmail.com Subject: Re: bug#23814: 24.5; bug of hz coding-system In-Reply-To: <87twffigzv.fsf@gmail.com> (ynyaaa@gmail.com) Date: Wed, 27 Jul 2016 00:09:24 +0900 Message-ID: <87r3agbfmj.fsf@gnu.org> MIME-Version: 1.0 Content-Type: text/plain X-detected-operating-system: by eggs.gnu.org: GNU/Linux 2.2.x-3.x [generic] X-Received-From: 2001:4830:134:3::e X-Spam-Score: -6.3 (------) X-Debbugs-Envelope-To: 23814 Cc: eliz@gnu.org, 23814@debbugs.gnu.org X-BeenThere: debbugs-submit@debbugs.gnu.org X-Mailman-Version: 2.1.18 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: debbugs-submit-bounces@debbugs.gnu.org Sender: "Debbugs-submit" X-Spam-Score: -6.3 (------) In article <87twffigzv.fsf@gmail.com>, ynyaaa@gmail.com writes: > But I found other bugs about decodings of "~" escape. > "~~" and "~{!!~}" should be encoded and decoded as below. > "~~" -> "~~~~" -> "~~" > "~{!!~}" -> "~~{!!~~}" -> "~{!!~}" > In reality they are encoded properly, but decoded the wrong way. > (decode-coding-string (encode-coding-string "~~" 'hz) 'hz) >>> "~" > (decode-coding-string (encode-coding-string "~{!!~}" 'hz) 'hz) >>> #("\x3000" 0 1 (charset chinese-gb2312)) Thank you for finding those bugs. Could you please try the attached patch instead? --- K.
Handa handa@gnu.org diff --git a/lisp/language/china-util.el b/lisp/language/china-util.el index e531640..9abdae1 100644 --- a/lisp/language/china-util.el +++ b/lisp/language/china-util.el @@ -95,7 +95,12 @@ decode-hz-region (goto-char (point-min)) (while (search-forward "~" nil t) (setq ch (following-char)) - (if (or (= ch ?\n) (= ch ?~)) (delete-char -1))) + (if (= ch ?{) + (search-forward "~}" nil 'move) + (when (or (= ch ?\n) (= ch ?~)) + (delete-char -1) + (put-text-property (point) (1+ (point)) 'hz-decoded t) + (forward-char 1)))) ;; "^zW...\n" -> Chinese GB2312 ;; "~{...~}" -> Chinese GB2312 @@ -104,6 +109,8 @@ decode-hz-region (while (re-search-forward hz/zw-start-gb nil t) (setq pos (match-beginning 0) ch (char-after pos)) + (if (and (= ch ?~) (get-text-property pos 'hz-decoded)) + (forward-char 1) ;; Record the first position to start conversion. (or beg (setq beg pos)) (end-of-line) @@ -122,9 +129,10 @@ decode-hz-region t) (delete-char -2)) (setq end (point)) - (translate-region pos (point) hz-set-msb-table)))) + (translate-region pos (point) hz-set-msb-table))))) (if beg (decode-coding-region beg end 'euc-china))) + (remove-text-properties (point-min) (point-max) '(hz-decoded nil)) (- (point-max) (point-min))))) ;;;###autoload @@ -142,6 +150,7 @@ encode-hz-region (save-restriction (narrow-to-region beg end) + (put-text-property beg end 'charset 'chinese-gb2312) ;; "~" -> "~~" (goto-char (point-min)) (while (search-forward "~" nil t) (insert ?~)) From debbugs-submit-bounces@debbugs.gnu.org Thu Jul 28 21:05:31 2016 Received: (at 23814) by debbugs.gnu.org; 29 Jul 2016 01:05:31 +0000 Received: from localhost ([127.0.0.1]:49818 helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1bSwF0-0001ut-Oo for submit@debbugs.gnu.org; Thu, 28 Jul 2016 21:05:30 -0400 Received: from mail-pa0-f67.google.com ([209.85.220.67]:33386) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1bSwEx-0001ud-Ai for 23814@debbugs.gnu.org; 
Thu, 28 Jul 2016 21:05:29 -0400 Received: by mail-pa0-f67.google.com with SMTP id q2so4316525pap.0 for <23814@debbugs.gnu.org>; Thu, 28 Jul 2016 18:05:27 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113; h=from:to:cc:subject:date:message-id:mime-version; bh=xQWk9CbWpMDw2WLrfDJ+3v2Rr+jJeApcYC6IMQvsTIM=; b=bGndvQzxxtpIWR9cpEjnqxNxhXwe9scKNKzoYGuwmTKynuMJwfIiLMpKo0IkVnB29F 44DU+qU7+VZWH/EYh3INNShqdAUmMLbyS6j1orV4rz60xV+h3kB6KjR41yg4rM4WZI4z MCNzYl3BZYbjjKbvC9djb+2h9CuRcj1dpN5dfE2t6yhlnSvsWnLyquUK0I4ASZ1jFPIJ sRDfFeUQlMSXrEJ8/PPTbmY305i/SXidTQmlfZVaBfs7pVy+qj7R30dfueqn+LIvpCrI x5ppZ+PQrBeNUOVhCmwfPJKz7y36escqhUbedbhdUcjpqVefR49uOJlLf1BBL5OuzI8t QRUA== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20130820; h=x-gm-message-state:from:to:cc:subject:date:message-id:mime-version; bh=xQWk9CbWpMDw2WLrfDJ+3v2Rr+jJeApcYC6IMQvsTIM=; b=Wi4TolwrUTZ1pUlu7srkoU7HM1tks1T+yyARvZG9CGNfJ28fdCAG2A16MwjSK2ND4i MeJX4xW4HUApiv1yjTaHyW5M+n5QG5G7883nA9fFEOTknwimMDHB5+CMJMmoX09Rit1r Totcwm291fwRl+lW5znPmANnDJ/5m7f9p7Khuq3ZgCj+QZtFqkEhJAz1RIsHdXiBG4YD UXwVWPtsLr97uM4ppjLmmirUNKX4LD7290MUhuEQunEvT37u5ft+GMdaQtSpu1n7eiuF L8/0rYT7RUXiagG7jvT8xxFsj9XqlyntoMsVPrvqFeQbaJCeHiws/C4bLZgHfjnc44wq JhsQ== X-Gm-Message-State: AEkoouuVJJZ6KV0+Sapy/cbaNb1/pAqtZV0ksdia/p+9TCeVYcleys89Oe0WmBglhlyvdg== X-Received: by 10.66.172.237 with SMTP id bf13mr65082966pac.42.1469754321242; Thu, 28 Jul 2016 18:05:21 -0700 (PDT) Received: from PNUT-PC (east49-p99.eaccess.hi-ho.ne.jp. 
[219.105.5.100]) by smtp.gmail.com with ESMTPSA id q26sm19874134pfj.53.2016.07.28.18.05.19 (version=TLS1_2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128/128); Thu, 28 Jul 2016 18:05:20 -0700 (PDT) From: ynyaaa@gmail.com To: handa Subject: Re: bug#23814: 24.5; bug of hz coding-system Date: Fri, 29 Jul 2016 10:05:14 +0900 Message-ID: <871t2dz22d.fsf@gmail.com> MIME-Version: 1.0 Content-Type: text/plain X-Spam-Score: -0.7 (/) X-Debbugs-Envelope-To: 23814 Cc: eliz@gnu.org, 23814@debbugs.gnu.org X-BeenThere: debbugs-submit@debbugs.gnu.org X-Mailman-Version: 2.1.18 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: debbugs-submit-bounces@debbugs.gnu.org Sender: "Debbugs-submit" X-Spam-Score: -0.7 (/) handa writes: > In article <87twffigzv.fsf@gmail.com>, ynyaaa@gmail.com writes: > >> But I found other bugs about decodings of "~" escape. >> "~~" and "~{!!~}" should be encoded and decoded as below. >> "~~" -> "~~~~" -> "~~" >> "~{!!~}" -> "~~{!!~~}" -> "~{!!~}" > >> In reality they are encoded properly, but decoded the wrong way. >> (decode-coding-string (encode-coding-string "~~" 'hz) 'hz) >>>> "~" >> (decode-coding-string (encode-coding-string "~{!!~}" 'hz) 'hz) >>>> #("\x3000" 0 1 (charset chinese-gb2312)) > > Thank you for finding those bugs. Could you please try the attached > patch instead? > > --- > K. Handa > handa@gnu.org If there are unencodable characters, encodable characters may be broken. In this example, the second ?\x4E00 character disappears. (set-language-environment 'Chinese-GB) (decode-coding-string (encode-coding-string "\x4E00\x00B7\x4E00" 'hz) 'hz) => "\x4E00\e\x3048\x6070\x70B3\x11213D\300\273" To avoid this behavior, there are some solutions. (a) While decoding, replace "~{...~}" with "\e$A...\e(B" and decode with iso-2022-7bit. (b) Like (a), replace "~{...~}" with "\e$A...\e(B" while decoding and insert "\e$)A" at the beginning of the temp buffer and decode with iso-2022-8bit-ss2.
(8bit data are decoded as euc-cn.) (c) While encoding, use euc-cn instead of iso-2022-7bit and translate each consecutive 8bit data to 7bit data prefixed by "~{" and postfixed by "~}". By the way, RFC1843 describes: The escape sequence '~\n' is a line-continuation marker to be consumed with no output produced. This form should return "AB". (decode-coding-string "A~\nB" 'hz) => "A\nB" > diff --git a/lisp/language/china-util.el b/lisp/language/china-util.el > index e531640..9abdae1 100644 > --- a/lisp/language/china-util.el > +++ b/lisp/language/china-util.el > @@ -95,7 +95,12 @@ decode-hz-region > (goto-char (point-min)) > (while (search-forward "~" nil t) > (setq ch (following-char)) > - (if (or (= ch ?\n) (= ch ?~)) (delete-char -1))) > + (if (= ch ?{) > + (search-forward "~}" nil 'move) > + (when (or (= ch ?\n) (= ch ?~)) > + (delete-char -1) > + (put-text-property (point) (1+ (point)) 'hz-decoded t) > + (forward-char 1)))) > > ;; "^zW...\n" -> Chinese GB2312 > ;; "~{...~}" -> Chinese GB2312 > @@ -104,6 +109,8 @@ decode-hz-region > (while (re-search-forward hz/zw-start-gb nil t) > (setq pos (match-beginning 0) > ch (char-after pos)) > + (if (and (= ch ?~) (get-text-property pos 'hz-decoded)) > + (forward-char 1) > ;; Record the first position to start conversion.
> (or beg (setq beg pos)) > (end-of-line) > @@ -122,9 +129,10 @@ decode-hz-region > t) > (delete-char -2)) > (setq end (point)) > - (translate-region pos (point) hz-set-msb-table)))) > + (translate-region pos (point) hz-set-msb-table))))) > (if beg > (decode-coding-region beg end 'euc-china))) > + (remove-text-properties (point-min) (point-max) '(hz-decoded nil)) > (- (point-max) (point-min))))) > > ;;;###autoload > @@ -142,6 +150,7 @@ encode-hz-region > (save-restriction > (narrow-to-region beg end) > > + (put-text-property beg end 'charset 'chinese-gb2312) > ;; "~" -> "~~" > (goto-char (point-min)) > (while (search-forward "~" nil t) (insert ?~)) From debbugs-submit-bounces@debbugs.gnu.org Sun Aug 14 07:22:44 2016 Received: (at 23814) by debbugs.gnu.org; 14 Aug 2016 11:22:44 +0000 Received: from localhost ([127.0.0.1]:56807 helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1bYtV6-0002Zx-2Q for submit@debbugs.gnu.org; Sun, 14 Aug 2016 07:22:44 -0400 Received: from eggs.gnu.org ([208.118.235.92]:52700) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1bYtV4-0002Zi-T9 for 23814@debbugs.gnu.org; Sun, 14 Aug 2016 07:22:43 -0400 Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71) (envelope-from ) id 1bYtUy-0007GL-Q9 for 23814@debbugs.gnu.org; Sun, 14 Aug 2016 07:22:37 -0400 X-Spam-Checker-Version: SpamAssassin 3.3.2 (2011-06-06) on eggs.gnu.org X-Spam-Level: X-Spam-Status: No, score=0.3 required=5.0 tests=BAYES_50,RP_MATCHES_RCVD autolearn=disabled version=3.3.2 Received: from fencepost.gnu.org ([2001:4830:134:3::e]:44307) by eggs.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1bYtUu-0007G0-W0; Sun, 14 Aug 2016 07:22:33 -0400 Received: from fl1-122-134-89-8.iba.mesh.ad.jp ([122.134.89.8]:45590 helo=shatin) by fencepost.gnu.org with esmtpsa (TLS1.2:RSA_AES_128_CBC_SHA1:128) (Exim 4.82) (envelope-from ) id 1bYtUs-00038G-Pu; Sun, 14 Aug 2016 07:22:31 -0400 Received: from handa by shatin with local 
(Exim 4.86_2) (envelope-from ) id 1bYtUn-0002CH-RZ; Sun, 14 Aug 2016 20:22:25 +0900 From: handa To: ynyaaa@gmail.com Subject: Re: bug#23814: 24.5; bug of hz coding-system In-Reply-To: <871t2dz22d.fsf@gmail.com> (ynyaaa@gmail.com) Date: Sun, 14 Aug 2016 20:22:25 +0900 Message-ID: <87bn0vzjbi.fsf@gnu.org> MIME-Version: 1.0 Content-Type: multipart/mixed; boundary="=-=-=" X-detected-operating-system: by eggs.gnu.org: GNU/Linux 2.2.x-3.x [generic] X-Received-From: 2001:4830:134:3::e X-Spam-Score: -5.6 (-----) X-Debbugs-Envelope-To: 23814 Cc: eliz@gnu.org, 23814@debbugs.gnu.org X-BeenThere: debbugs-submit@debbugs.gnu.org X-Mailman-Version: 2.1.18 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: debbugs-submit-bounces@debbugs.gnu.org Sender: "Debbugs-submit" X-Spam-Score: -5.6 (-----) --=-=-= Content-Type: text/plain Hi, sorry for the late response. I've just noticed that my reply mail didn't go out successfully. I'm trying to re-send it. I wrote: > In article <871t2dz22d.fsf@gmail.com>, ynyaaa@gmail.com writes: > > If there are unencodable characters, encodable characters may be broken. > > In this example, the second ?\x4E00 character disappears. > > (set-language-environment 'Chinese-GB) > > (decode-coding-string (encode-coding-string "\x4E00\x00B7\x4E00" 'hz) 'hz) > >>> "\x4E00\e\x3048\x6070\x70B3\x11213D\300\273" > > How to treat unencodable characters on encoding is a difficult problem. > As HZ is designed for 7-bit environment, I think it's important to keep > 7-bit on encoding. So, the new code uses \uXXXX for those characters. > Another way is to use UTF-8 sequence for them, then we can decode it > back. Which, do you think, is better? > > > To avoid this behavior, there are some solutions. > > (a) While decoding, replace "~{...~}" with "\e$A...\e(B" > > and decode with iso-2022-7bit.
> > (b) Like (a), replace "~{...~}" with "\e$A...\e(B" while decoding > > and insert "\e$)A" at the beginning of the temp buffer > > and decode with iso-2022-8bit-ss2. > > (8bit data are decoded as euc-cn.) > > (c) While encoding, use euc-cn instead of iso-2022-7bit > > and translate each consecutive 8bit data to 7bit data > > prefixed by "~{" and postfixed by "~}". > > I adopted method (a) for decoding, and fixed bugs in the encoding code. > > > By the way, RFC1843 describes: > > The escape sequence '~\n' is a line-continuation marker to be > > consumed with no output produced. > > The variable decode-hz-line-continuation controls this feature. I don't > remember why the default is nil (i.e. do not decode ~\n), perhaps some > Chinese people I was discussing with on implementing HZ support > suggested that. > > Attached is the full china-util.el (not a diff). > > --- > K. Handa > handa@gnu.org --=-=-= Content-Type: application/emacs-lisp; charset=utf-8 Content-Disposition: attachment; filename=china-util.el Content-Transfer-Encoding: quoted-printable ;;; china-util.el --- utilities for Chinese -*- coding: utf-8 -*- ;; Copyright (C) 1995, 2001-2016 Free Software Foundation, Inc. ;; Copyright (C) 1995, 1996, 1997, 1998, 1999, 2000, 2001, 2002, 2003, 2004, ;; 2005, 2006, 2007, 2008, 2009, 2010, 2011 ;; National Institute of Advanced Industrial Science and Technology (AIST) ;; Registration Number H14PRO021 ;; Copyright (C) 2003 ;; National Institute of Advanced Industrial Science and Technology (AIST) ;; Registration Number H13PRO009 ;; Keywords: mule, multilingual, Chinese ;; This file is part of GNU Emacs. ;; GNU Emacs is free software: you can redistribute it and/or modify ;; it under the terms of the GNU General Public License as published by ;; the Free Software Foundation, either version 3 of the License, or ;; (at your option) any later version.
;; GNU Emacs is distributed in the hope that it will be useful, ;; but WITHOUT ANY WARRANTY; without even the implied warranty of ;; MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the ;; GNU General Public License for more details. ;; You should have received a copy of the GNU General Public License ;; along with GNU Emacs. If not, see . ;;; Commentary: ;;; Code: ;; Hz/ZW/EUC-TW encoding stuff ;; HZ is an encoding method for Chinese character set GB2312 widely ;; used on the Internet. It is very similar to the 7-bit environment of ;; ISO-2022. The difference is that HZ uses the sequences "~{" and ;; "~}" for designating GB2312 and ASCII respectively; hence, it ;; doesn't use the ESC (0x1B) code. ;; ZW is another encoding method for Chinese character set GB2312. It ;; encodes Chinese characters line by line by starting each line with ;; the sequence "zW". Like HZ, it uses only 7-bit codes. ;; EUC-TW is similar to EUC-KS or EUC-JP. Its main character set is ;; plane 1 of CNS 11643; characters of planes 2 to 7 are accessed with ;; a single shift escape followed by three bytes: the first gives the ;; plane, the second and third the character code. Note that characters ;; of plane 1 are (redundantly) accessible with a single shift escape ;; also. ;; ISO-2022 escape sequence to designate GB2312. (defvar iso2022-gb-designation "\e$A") ;; HZ escape sequence to designate GB2312. (defvar hz-gb-designation "~{") ;; ISO-2022 escape sequence to designate ASCII. (defvar iso2022-ascii-designation "\e(B") ;; HZ escape sequence to designate ASCII. (defvar hz-ascii-designation "~}") ;; Regexp of ZW sequence to start GB2312. (defvar zw-start-gb "^zW") ;; Regexp for start of GB2312 in an encoding mixture of HZ and ZW.
(defvar hz/zw-start-gb (concat hz-gb-designation "\\|" zw-start-gb "\\|[^\0-\177]")) (defvar decode-hz-line-continuation nil "Flag to tell if we should respect the line continuation convention of HZ.") (defconst hz-set-msb-table (eval-when-compile (let ((chars nil) (i 0)) (while (< i 33) (push i chars) (setq i (1+ i))) (while (< i 127) (push (decode-char 'eight-bit (+ i 128)) chars) (setq i (1+ i))) (apply 'string (nreverse chars))))) ;;;###autoload (defun decode-hz-region (beg end) "Decode HZ/ZW encoded text in the current region. Return the length of resulting text." (interactive "r") (save-excursion (save-restriction (let (pos ch) (narrow-to-region beg end) ;; We, at first, convert HZ/ZW to `iso-2022-7bit', ;; then decode it. ;; "~\n" -> "", "~~" -> "~" (goto-char (point-min)) (while (search-forward "~" nil t) (setq ch (following-char)) (cond ((= ch ?{) (delete-region (1- (point)) (1+ (point))) (setq pos (point)) (insert iso2022-gb-designation) (if (looking-at "\\([!-}][!-~]\\)*") (goto-char (match-end 0))) (if (looking-at hz-ascii-designation) (delete-region (match-beginning 0) (match-end 0))) (insert iso2022-ascii-designation) (decode-coding-region pos (point) 'iso-2022-7bit)) ((= ch ?~) (delete-char 1)) ((and (= ch ?\n) decode-hz-line-continuation) (delete-region (1- (point)) (1+ (point)))) (t (forward-char 1))))) (- (point-max) (point-min))))) ;;;###autoload (defun decode-hz-buffer () "Decode HZ/ZW encoded text in the current buffer." (interactive) (decode-hz-region (point-min) (point-max))) (defvar hz-category-table nil) ;;;###autoload (defun encode-hz-region (beg end) "Encode the text in the current region to HZ. Return the length of resulting text."
(interactive "r") (unless hz-category-table (setq hz-category-table (make-category-table)) (with-category-table hz-category-table (define-category ?c "hz encodable") (map-charset-chars #'modify-category-entry 'ascii ?c) (map-charset-chars #'modify-category-entry 'chinese-gb2312 ?c))) (save-excursion (save-restriction (narrow-to-region beg end) (with-category-table hz-category-table ;; ~ -> ~~ (goto-char (point-min)) (while (search-forward "~" nil t) (insert ?~)) ;; ESC -> ESC ESC (goto-char (point-min)) (while (search-forward "\e" nil t) (insert ?\e)) ;; Non-ASCII-GB2312 -> \uXXXX (goto-char (point-min)) (while (re-search-forward "\\Cc" nil t) (let ((ch (preceding-char))) (delete-char -1) (insert (format "\\u%04X" ch)))) ;; Prefer chinese-gb2312 for Chinese characters. (put-text-property (point-min) (point-max) 'charset 'chinese-gb2312) (encode-coding-region (point-min) (point-max) 'iso-2022-7bit) ;; ESC $ A ... ESC ( B -> ~{ ... ~} ;; ESC ESC -> ESC (goto-char (point-min)) (while (search-forward "\e" nil t) (if (= (following-char) ?\e) ;; ESC ESC -> ESC (delete-char 1) (forward-char -1) (if (looking-at iso2022-gb-designation) (progn (delete-region (match-beginning 0) (match-end 0)) (insert hz-gb-designation) (search-forward iso2022-ascii-designation nil 'move) (delete-region (match-beginning 0) (match-end 0)) (insert hz-ascii-designation)))))) (- (point-max) (point-min))))) ;;;###autoload (defun encode-hz-buffer () "Encode the text in the current buffer to HZ."
(interactive) (encode-hz-region (point-min) (point-max))) ;;;###autoload (defun post-read-decode-hz (len) (let ((pos (point)) (buffer-modified-p (buffer-modified-p)) last-coding-system-used) (prog1 (decode-hz-region pos (+ pos len)) (set-buffer-modified-p buffer-modified-p)))) ;;;###autoload (defun pre-write-encode-hz (from to) (let ((buf (current-buffer))) (set-buffer (generate-new-buffer " *temp*")) (if (stringp from) (insert from) (insert-buffer-substring buf from to)) (let (last-coding-system-used) (encode-hz-region 1 (point-max))) nil)) ;; (provide 'china-util) ;;; china-util.el ends here --=-=-=-- From debbugs-submit-bounces@debbugs.gnu.org Wed Aug 17 02:33:43 2016 Received: (at 23814) by debbugs.gnu.org; 17 Aug 2016 06:33:43 +0000 Received: from localhost ([127.0.0.1]:59693 helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1bZuQ3-0002Kt-EZ for submit@debbugs.gnu.org; Wed, 17 Aug 2016 02:33:43 -0400 Received: from mail-pf0-f195.google.com ([209.85.192.195]:35239) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1bZuQ1-0002Kg-Pq for 23814@debbugs.gnu.org; Wed, 17 Aug 2016 02:33:42 -0400 Received: by mail-pf0-f195.google.com with SMTP id h186so7017396pfg.2 for <23814@debbugs.gnu.org>; Tue, 16 Aug 2016 23:33:41 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113; h=from:to:cc:subject:date:message-id:mime-version; bh=Eu5DywaLGLDKvRYIymzDitYfBmUwIkj13JMy3cwHRLA=; b=cqPPudDwO9s5rU6D8IIuL25VMo4Nowl+4FZ/SrvZbthSfyj5X/oOJnmzVNBWwvzjmE IigXVRrm+DUDsv9JASumOYRU2tTjcnRnwspDkFw1Nl/BjDjMmyxXEoFPvHDspf9lqjpS BsohWF5jBksMbJQYHdJBuGP0frklaXeFr56Dnecd61PoTMHWnO5t656FZGyRDMiWJG9t IkfGKYETD41t59GSXe3nAumgmUqLZ65clqVh5uw6F4XqW3bvb1GXgmQ8op7L72cDucdm 9tk7wGZnt75TPMrlOiZcgfZRjfbI8fMfo1hY762lsZf26Qg4EzH+B6TYNzWUXM5Or+Lo dzKQ== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20130820; h=x-gm-message-state:from:to:cc:subject:date:message-id:mime-version; 
bh=Eu5DywaLGLDKvRYIymzDitYfBmUwIkj13JMy3cwHRLA=; b=dMVcqpNoKEUEx23k8MWUJEPpTLyHKhL8JCl/gsahn8ooIHItV6OF97FWulgXKwZfJE cFBkZBYsWVW7o5KiCq6ISHwbHKgBpYNbs5sCyRRrWlqL08kMtNkGaCIqrieokdxB32kL 16dwNYRlIR0LOKb4caVhwxXkO/KAzfzgPePksitSz9NEz1JV1g4OkeClT5pPJk7gxHpb QPqy7lk8Hgs64R0w82OeczS7NXOg1qpQpGC0jpADRjQr8Y6nDh6SpasxKbjjefYXD7ip BXZF8GYknbnmVezc7RUJd5fBJJHiVHlMySOuANsmG0kbjwIoUeDV1zJbmPKKm5V9u5ld UwAQ== X-Gm-Message-State: AEkoouvVfjX0Edxw9jQE5IPzvtzGgLK0GmIHSF4ip5QqOYY0qq3nIbulm0hPtrNaUH/vUw== X-Received: by 10.98.65.139 with SMTP id g11mr14802208pfd.140.1471415615944; Tue, 16 Aug 2016 23:33:35 -0700 (PDT) Received: from PNUT-PC (east49-p99.eaccess.hi-ho.ne.jp. [219.105.5.100]) by smtp.gmail.com with ESMTPSA id 75sm43921471pfw.92.2016.08.16.23.33.33 (version=TLS1_2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128/128); Tue, 16 Aug 2016 23:33:35 -0700 (PDT) From: ynyaaa@gmail.com To: handa Subject: Re: bug#23814: 24.5; bug of hz coding-system Date: Wed, 17 Aug 2016 15:33:29 +0900 Message-ID: <87oa4rdhvq.fsf@gmail.com> MIME-Version: 1.0 Content-Type: text/plain X-Spam-Score: -0.7 (/) X-Debbugs-Envelope-To: 23814 Cc: eliz@gnu.org, 23814@debbugs.gnu.org X-BeenThere: debbugs-submit@debbugs.gnu.org X-Mailman-Version: 2.1.18 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: debbugs-submit-bounces@debbugs.gnu.org Sender: "Debbugs-submit" X-Spam-Score: -0.7 (/) Hi, I tried new china-util.el. It works very well. handa writes: > Hi, sorry for the late response. I've just noticed that my reply mail > didn't go out successfully. I'm trying to re-send it. >> How to treat unencodable characters on encoding is a difficult problem. >> As HZ is designed for 7-bit environment, I think it's important to keep >> 7-bit on encoding. So, the new code uses \uXXXX for those characters. >> Another way is to use UTF-8 sequence for them, then we can decode it >> back. Which, do you think, is better?
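The 7-bit escaping of unencodable characters discussed here can be sketched in Emacs Lisp as follows. This is a minimal illustration, not the code committed to china-util.el: `my-hz-escape-char` and `my-hz-unescape` are hypothetical helper names, and the \U branch for characters above U+FFFF follows the later change described in this thread. Reading an escape back with the Lisp reader shows why the fixed-width \u form cannot carry non-BMP characters: the reader consumes exactly four hex digits after \u.

```elisp
;; Hypothetical helper (not part of china-util.el): turn a character
;; that HZ cannot encode into a 7-bit escape.  The Lisp reader takes
;; exactly four hex digits after \u, so characters above U+FFFF need
;; the eight-digit \U form.
(defun my-hz-escape-char (ch)
  "Return a 7-bit escape string for character CH."
  (if (> ch #xFFFF)
      (format "\\U%08X" ch)
    (format "\\u%04X" ch)))

;; Hypothetical helper: read an escape back by wrapping it in double
;; quotes and handing it to the Lisp reader.
(defun my-hz-unescape (escape)
  "Read ESCAPE as the body of a Lisp string literal and return the string."
  (car (read-from-string (concat "\"" escape "\""))))
```

For example, `(my-hz-unescape "\\u12345")` yields a two-character string, U+1234 followed by "5", which is the ambiguity that motivates switching to \UXXXXXXXX for characters outside the BMP.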
I prefer 7bit encoding to use only 7bit data, too. As for elisp, "\u12345" is treated as "\u1234\ 5". From debbugs-submit-bounces@debbugs.gnu.org Wed Aug 17 10:43:39 2016 Received: (at 23814) by debbugs.gnu.org; 17 Aug 2016 14:43:39 +0000 Received: from localhost ([127.0.0.1]:60367 helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1ba247-0002VN-GH for submit@debbugs.gnu.org; Wed, 17 Aug 2016 10:43:39 -0400 Received: from eggs.gnu.org ([208.118.235.92]:48617) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1ba245-0002VB-O1 for 23814@debbugs.gnu.org; Wed, 17 Aug 2016 10:43:34 -0400 Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71) (envelope-from ) id 1ba23z-0000ml-DZ for 23814@debbugs.gnu.org; Wed, 17 Aug 2016 10:43:28 -0400 X-Spam-Checker-Version: SpamAssassin 3.3.2 (2011-06-06) on eggs.gnu.org X-Spam-Level: X-Spam-Status: No, score=-0.5 required=5.0 tests=BAYES_40,RP_MATCHES_RCVD autolearn=disabled version=3.3.2 Received: from fencepost.gnu.org ([2001:4830:134:3::e]:41002) by eggs.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1ba23u-0000m3-NQ; Wed, 17 Aug 2016 10:43:22 -0400 Received: from fl1-122-134-89-8.iba.mesh.ad.jp ([122.134.89.8]:44844 helo=shatin) by fencepost.gnu.org with esmtpsa (TLS1.2:RSA_AES_128_CBC_SHA1:128) (Exim 4.82) (envelope-from ) id 1ba23s-00047V-UV; Wed, 17 Aug 2016 10:43:21 -0400 Received: from handa by shatin with local (Exim 4.86_2) (envelope-from ) id 1ba23m-000497-0f; Wed, 17 Aug 2016 23:43:14 +0900 From: handa To: ynyaaa@gmail.com Subject: Re: bug#23814: 24.5; bug of hz coding-system In-Reply-To: <87oa4rdhvq.fsf@gmail.com> (ynyaaa@gmail.com) Date: Wed, 17 Aug 2016 23:43:13 +0900 Message-ID: <87bn0rjw1q.fsf@gnu.org> MIME-Version: 1.0 Content-Type: text/plain X-detected-operating-system: by eggs.gnu.org: GNU/Linux 2.2.x-3.x [generic] X-Received-From: 2001:4830:134:3::e X-Spam-Score: -5.6 (-----) X-Debbugs-Envelope-To: 23814 Cc: eliz@gnu.org, 
23814@debbugs.gnu.org X-BeenThere: debbugs-submit@debbugs.gnu.org X-Mailman-Version: 2.1.18 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: debbugs-submit-bounces@debbugs.gnu.org Sender: "Debbugs-submit" X-Spam-Score: -5.6 (-----)

In article <87oa4rdhvq.fsf@gmail.com>, ynyaaa@gmail.com writes:

> Hi, I tried new china-util.el. It works very well.

Thank you for testing it.

> I prefer 7bit encoding to use only 7bit data, too.
> As for elisp, "\u12345" is treated as "\u1234\ 5".

Ah, ok, I changed to encode characters not in BMP to \UXXXXXXXX.

I've just committed the attached change.

---
K. Handa
handa@gnu.org

2016-08-17 handa

	* lisp/language/china-util.el (decode-hz-region): Pay attention to
	"~~}" sequence at the end of Chinese character range.
	(hz-category-table): New variable.
	(encode-hz-region): Convert non-encodable characters to \u... and
	\U...  Preserve ESC on encoding.  Put `chinese-gb2312' `charset'
	text property in advance to force iso-2022-encoding to select
	chinese-gb2312 designation.

diff --git a/lisp/language/china-util.el b/lisp/language/china-util.el
index e531640..6505fb8 100644
--- a/lisp/language/china-util.el
+++ b/lisp/language/china-util.el
@@ -88,43 +88,34 @@ decode-hz-region
       (let (pos ch)
         (narrow-to-region beg end)

-        ;; We, at first, convert HZ/ZW to `euc-china',
+        ;; We, at first, convert HZ/ZW to `iso-2022-7bit',
         ;; then decode it.

-        ;; "~\n" -> "\n", "~~" -> "~"
+        ;; "~\n" -> "", "~~" -> "~"
         (goto-char (point-min))
         (while (search-forward "~" nil t)
           (setq ch (following-char))
-          (if (or (= ch ?\n) (= ch ?~)) (delete-char -1)))
+          (cond ((= ch ?{)
+                 (delete-region (1- (point)) (1+ (point)))
+                 (setq pos (point))
+                 (insert iso2022-gb-designation)
+                 (if (looking-at "\\([!-}][!-~]\\)*")
+                     (goto-char (match-end 0)))
+                 (if (looking-at hz-ascii-designation)
+                     (delete-region (match-beginning 0) (match-end 0)))
+                 (insert iso2022-ascii-designation)
+                 (decode-coding-region pos (point) 'iso-2022-7bit))
+
+                ((= ch ?~)
+                 (delete-char 1))
+
+                ((and (= ch ?\n)
+                      decode-hz-line-continuation)
+                 (delete-region (1- (point)) (1+ (point))))
+
+                (t
+                 (forward-char 1)))))

-        ;; "^zW...\n" -> Chinese GB2312
-        ;; "~{...~}" -> Chinese GB2312
-        (goto-char (point-min))
-        (setq beg nil)
-        (while (re-search-forward hz/zw-start-gb nil t)
-          (setq pos (match-beginning 0)
-                ch (char-after pos))
-          ;; Record the first position to start conversion.
-          (or beg (setq beg pos))
-          (end-of-line)
-          (setq end (point))
-          (if (>= ch 128)                 ; 8bit GB2312
-              nil
-            (goto-char pos)
-            (delete-char 2)
-            (setq end (- end 2))
-            (if (= ch ?z)                 ; ZW -> euc-china
-                (progn
-                  (translate-region (point) end hz-set-msb-table)
-                  (goto-char end))
-              (if (search-forward hz-ascii-designation
-                                  (if decode-hz-line-continuation nil end)
-                                  t)
-                  (delete-char -2))
-              (setq end (point))
-              (translate-region pos (point) hz-set-msb-table))))
-        (if beg
-            (decode-coding-region beg end 'euc-china)))
       (- (point-max) (point-min)))))

 ;;;###autoload
@@ -133,33 +124,57 @@ decode-hz-buffer
   (interactive)
   (decode-hz-region (point-min) (point-max)))

+(defvar hz-category-table nil)
+
 ;;;###autoload
 (defun encode-hz-region (beg end)
   "Encode the text in the current region to HZ.
 Return the length of resulting text."
   (interactive "r")
+  (unless hz-category-table
+    (setq hz-category-table (make-category-table))
+    (with-category-table hz-category-table
+      (define-category ?c "hz encodable")
+      (map-charset-chars #'modify-category-entry 'ascii ?c)
+      (map-charset-chars #'modify-category-entry 'chinese-gb2312 ?c)))
   (save-excursion
     (save-restriction
       (narrow-to-region beg end)
+      (with-category-table hz-category-table
+        ;; ~ -> ~~
+        (goto-char (point-min))
+        (while (search-forward "~" nil t) (insert ?~))
+
+        ;; ESC -> ESC ESC
+        (goto-char (point-min))
+        (while (search-forward "\e" nil t) (insert ?\e))

-      ;; "~" -> "~~"
-      (goto-char (point-min))
-      (while (search-forward "~" nil t) (insert ?~))
-
-      ;; Chinese GB2312 -> "~{...~}"
-      (goto-char (point-min))
-      (if (re-search-forward "\\cc" nil t)
-          (let (pos)
-            (goto-char (setq pos (match-beginning 0)))
-            (encode-coding-region pos (point-max) 'iso-2022-7bit)
-            (goto-char pos)
-            (while (search-forward iso2022-gb-designation nil t)
-              (delete-char -3)
-              (insert hz-gb-designation))
-            (goto-char pos)
-            (while (search-forward iso2022-ascii-designation nil t)
-              (delete-char -3)
-              (insert hz-ascii-designation))))
+        ;; Non-ASCII-GB2312 -> \uXXXX
+        (goto-char (point-min))
+        (while (re-search-forward "\\Cc" nil t)
+          (let ((ch (preceding-char)))
+            (delete-char -1)
+            (insert (format (if (< ch #x10000) "\\u%04X" "\\U%08X") ch))))
+
+        ;; Prefer chinese-gb2312 for Chinese characters.
+        (put-text-property (point-min) (point-max) 'charset 'chinese-gb2312)
+        (encode-coding-region (point-min) (point-max) 'iso-2022-7bit)
+
+        ;; ESC $ B ... ESC ( B -> ~{ ... ~}
+        ;; ESC ESC -> ESC
+        (goto-char (point-min))
+        (while (search-forward "\e" nil t)
+          (if (= (following-char) ?\e)
+              ;; ESC ESC -> ESC
+              (delete-char 1)
+            (forward-char -1)
+            (if (looking-at iso2022-gb-designation)
+                (progn
+                  (delete-region (match-beginning 0) (match-end 0))
+                  (insert hz-gb-designation)
+                  (search-forward iso2022-ascii-designation nil 'move)
+                  (delete-region (match-beginning 0) (match-end 0))
+                  (insert hz-ascii-designation))))))
       (- (point-max) (point-min)))))

 ;;;###autoload

From debbugs-submit-bounces@debbugs.gnu.org Wed Aug 17 11:28:19 2016 Received: (at 23814) by debbugs.gnu.org; 17 Aug 2016 15:28:19 +0000 Received: from localhost ([127.0.0.1]:60413 helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1ba2lP-0003dv-2n for submit@debbugs.gnu.org; Wed, 17 Aug 2016 11:28:19 -0400 Received: from eggs.gnu.org ([208.118.235.92]:58824) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1ba2lN-0003dj-1n for 23814@debbugs.gnu.org; Wed, 17 Aug 2016 11:28:17 -0400 Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71) (envelope-from ) id 1ba2lG-0001bk-Qw for 23814@debbugs.gnu.org; Wed, 17 Aug 2016 11:28:11 -0400 X-Spam-Checker-Version: SpamAssassin 3.3.2 (2011-06-06) on eggs.gnu.org X-Spam-Level: X-Spam-Status: No, score=-2.4 required=5.0 tests=BAYES_00,RP_MATCHES_RCVD autolearn=disabled version=3.3.2 Received: from fencepost.gnu.org ([2001:4830:134:3::e]:41512) by eggs.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1ba2lB-0001bI-3k; Wed, 17 Aug 2016 11:28:05 -0400 Received: from 84.94.185.246.cable.012.net.il ([84.94.185.246]:1477 helo=home-c4e4a596f7) by fencepost.gnu.org with esmtpsa (TLS1.2:RSA_AES_128_CBC_SHA1:128) (Exim 4.82) (envelope-from ) id 1ba2l9-0000bk-87; Wed, 17 Aug 2016 11:28:03 -0400 Date: Wed, 17 Aug 2016 18:28:06 +0300 Message-Id: <83wpjfe7p5.fsf@gnu.org> From: Eli Zaretskii To: handa In-reply-to: <87bn0rjw1q.fsf@gnu.org> (message from handa on Wed, 17 Aug 2016 23:43:13
+0900) Subject: Re: bug#23814: 24.5; bug of hz coding-system References: <87bn0rjw1q.fsf@gnu.org> X-detected-operating-system: by eggs.gnu.org: GNU/Linux 2.2.x-3.x [generic] X-Received-From: 2001:4830:134:3::e X-Spam-Score: -5.6 (-----) X-Debbugs-Envelope-To: 23814 Cc: ynyaaa@gmail.com, 23814@debbugs.gnu.org X-BeenThere: debbugs-submit@debbugs.gnu.org X-Mailman-Version: 2.1.18 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Reply-To: Eli Zaretskii Errors-To: debbugs-submit-bounces@debbugs.gnu.org Sender: "Debbugs-submit" X-Spam-Score: -5.6 (-----)

> From: handa
> Cc: eliz@gnu.org, 23814@debbugs.gnu.org
> Date: Wed, 17 Aug 2016 23:43:13 +0900
>
> In article <87oa4rdhvq.fsf@gmail.com>, ynyaaa@gmail.com writes:
>
> > Hi, I tried new china-util.el. It works very well.
>
> Thank you for testing it.
>
> > I prefer 7bit encoding to use only 7bit data, too.
> > As for elisp, "\u12345" is treated as "\u1234\ 5".
>
> Ah, ok, I changed to encode characters not in BMP to \UXXXXXXXX.
>
> I've just committed the attached change.

Thanks. Please close the bug if satisfied with the solution.

From debbugs-submit-bounces@debbugs.gnu.org Wed Mar 01 15:36:23 2017 Received: (at control) by debbugs.gnu.org; 1 Mar 2017 20:36:23 +0000 Received: from localhost ([127.0.0.1]:34911 helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1cjAz1-0001Pm-Ib for submit@debbugs.gnu.org; Wed, 01 Mar 2017 15:36:23 -0500 Received: from eggs.gnu.org ([208.118.235.92]:36526) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1cjAyz-0001PX-Us for control@debbugs.gnu.org; Wed, 01 Mar 2017 15:36:22 -0500 Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71) (envelope-from ) id 1cjAyu-0007aq-3X for control@debbugs.gnu.org; Wed, 01 Mar 2017 15:36:16 -0500 X-Spam-Checker-Version: SpamAssassin 3.3.2 (2011-06-06) on eggs.gnu.org X-Spam-Level: X-Spam-Status: No, score=-1.9 required=5.0 tests=BAYES_00,RP_MATCHES_RCVD autolearn=disabled version=3.3.2 Received: from fencepost.gnu.org ([2001:4830:134:3::e]:44958) by eggs.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1cjAyu-0007ai-0u for control@debbugs.gnu.org; Wed, 01 Mar 2017 15:36:16 -0500 Received: from rgm by fencepost.gnu.org with local (Exim 4.82) (envelope-from ) id 1cjAyt-0000gX-N7 for control@debbugs.gnu.org; Wed, 01 Mar 2017 15:36:15 -0500 Subject: control message for bug 23814 To: X-Mailer: mail (GNU Mailutils 2.99.98) Message-Id: From: Glenn Morris Date: Wed, 01 Mar 2017 15:36:15 -0500 X-detected-operating-system: by eggs.gnu.org: GNU/Linux 2.2.x-3.x [generic] X-Received-From: 2001:4830:134:3::e X-Spam-Score: -5.0 (-----) X-Debbugs-Envelope-To: control X-BeenThere: debbugs-submit@debbugs.gnu.org X-Mailman-Version: 2.1.18 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: debbugs-submit-bounces@debbugs.gnu.org Sender: "Debbugs-submit" X-Spam-Score: -5.0 (-----)

# 7faabf0
close 23814 26.1

From unknown Sun Jun 22 04:00:12 2025 Received: (at fakecontrol) by fakecontrolmessage; To:
internal_control@debbugs.gnu.org From: Debbugs Internal Request Subject: Internal Control Message-Id: bug archived. Date: Thu, 30 Mar 2017 11:24:04 +0000 User-Agent: Fakemail v42.6.9

# This is a fake control message.
#
# The action:
# bug archived.
thanks
# This fakemail brought to you by your local debbugs
# administrator