From unknown Mon Jun 23 20:20:48 2025 X-Loop: help-debbugs@gnu.org Subject: bug#10919: emacs-mule/utf-8 difference Resent-From: Tiphaine Turpin Original-Sender: debbugs-submit-bounces@debbugs.gnu.org Resent-CC: bug-gnu-emacs@gnu.org Resent-Date: Thu, 01 Mar 2012 15:41:03 +0000 Resent-Message-ID: Resent-Sender: help-debbugs@gnu.org X-GNU-PR-Message: report 10919 X-GNU-PR-Package: emacs X-GNU-PR-Keywords: To: 10919@debbugs.gnu.org X-Debbugs-Original-To: bug-gnu-emacs@gnu.org Received: via spool by submit@debbugs.gnu.org id=B.133061646214952 (code B ref -1); Thu, 01 Mar 2012 15:41:03 +0000 Received: (at submit) by debbugs.gnu.org; 1 Mar 2012 15:41:02 +0000 Received: from localhost ([127.0.0.1]:57680 helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.72) (envelope-from ) id 1S387p-0003sq-Vv for submit@debbugs.gnu.org; Thu, 01 Mar 2012 10:41:02 -0500 Received: from eggs.gnu.org ([208.118.235.92]:59911) by debbugs.gnu.org with esmtp (Exim 4.72) (envelope-from ) id 1S387c-0003sP-AF for submit@debbugs.gnu.org; Thu, 01 Mar 2012 10:40:49 -0500 Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71) (envelope-from ) id 1S3877-0000u1-SY for submit@debbugs.gnu.org; Thu, 01 Mar 2012 10:40:22 -0500 X-Spam-Checker-Version: SpamAssassin 3.3.2 (2011-06-06) on eggs.gnu.org X-Spam-Level: X-Spam-Status: No, score=-1.9 required=5.0 tests=BAYES_00 autolearn=unavailable version=3.3.2 Received: from lists.gnu.org ([208.118.235.17]:39075) by eggs.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1S3877-0000tw-PQ for submit@debbugs.gnu.org; Thu, 01 Mar 2012 10:40:17 -0500 Received: from eggs.gnu.org ([208.118.235.92]:34494) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1S3871-0007Bo-LX for bug-gnu-emacs@gnu.org; Thu, 01 Mar 2012 10:40:17 -0500 Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71) (envelope-from ) id 1S386r-0000my-L1 for bug-gnu-emacs@gnu.org; Thu, 01 Mar 2012 10:40:11 -0500 Received: from 
mail1-relais-roc.national.inria.fr ([192.134.164.82]:30753) by eggs.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1S386r-0000lv-FA for bug-gnu-emacs@gnu.org; Thu, 01 Mar 2012 10:40:01 -0500 X-IronPort-AV: E=Sophos;i="4.73,511,1325458800"; d="scan'208";a="146983133" Received: from chercheurs2-217.saclay.inria.fr (HELO [193.55.250.217]) ([193.55.250.217]) by mail1-relais-roc.national.inria.fr with ESMTP/TLS/DHE-RSA-CAMELLIA256-SHA; 01 Mar 2012 16:39:57 +0100 Message-ID: <4F4F984D.2000901@inria.fr> Date: Thu, 01 Mar 2012 16:39:57 +0100 From: Tiphaine Turpin User-Agent: Mozilla/5.0 (X11; U; Linux x86_64; en-US; rv:1.9.2.24) Gecko/20111109 Thunderbird/3.1.16 MIME-Version: 1.0 Content-Type: text/plain; charset=ISO-8859-1; format=flowed Content-Transfer-Encoding: 7bit X-detected-operating-system: by eggs.gnu.org: Genre and OS details not recognized. X-detected-operating-system: by eggs.gnu.org: GNU/Linux 2.6 (newer, 3) X-Received-From: 208.118.235.17 X-Spam-Score: -1.9 (-) X-BeenThere: debbugs-submit@debbugs.gnu.org X-Mailman-Version: 2.1.13 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Sender: debbugs-submit-bounces@debbugs.gnu.org Errors-To: debbugs-submit-bounces@debbugs.gnu.org X-Spam-Score: -1.9 (-) Hi, I have a problem regarding coding systems: I'm using process-send-string to send substrings of a buffer through a socket, after setting the process encoding and decoding systems to emacs-mule. I expect the number of bytes written to match the byte-length of the substring as obtained by position-bytes, since the specification of position-bytes in emacs-devel is to always work with the emacs-mule encoding. From emacs-devel: "The byte sequence of a buffer after decoded is always in emacs-mule (in emacs-unicode-2 branch, it's utf-8). So, changing buffer-file-coding-system or any other coding-system-related variables doesn't affects position-bytes." 
However, this is not the case with 3-byte UTF-8 characters: position-bytes counts them as 3 bytes, but process-send-string writes 4 bytes. Setting the process coding systems for the socket to utf-8 solves the problem, but I don't think it will with other coding systems, even if I used buffer-file-coding-system instead, since position-bytes does not use it. What is the actual expected behavior here, and how can I make this correct? Regards, Tiphaine Turpin From unknown Mon Jun 23 20:20:48 2025 X-Loop: help-debbugs@gnu.org Subject: bug#10919: emacs-mule/utf-8 difference Resent-From: Tiphaine Turpin Original-Sender: debbugs-submit-bounces@debbugs.gnu.org Resent-CC: bug-gnu-emacs@gnu.org Resent-Date: Thu, 01 Mar 2012 15:50:01 +0000 Resent-Message-ID: Resent-Sender: help-debbugs@gnu.org X-GNU-PR-Message: followup 10919 X-GNU-PR-Package: emacs X-GNU-PR-Keywords: To: 10919@debbugs.gnu.org Received: via spool by 10919-submit@debbugs.gnu.org id=B10919.133061695015737 (code B ref 10919); Thu, 01 Mar 2012 15:50:01 +0000 Received: (at 10919) by debbugs.gnu.org; 1 Mar 2012 15:49:10 +0000 Received: from localhost ([127.0.0.1]:57698 helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.72) (envelope-from ) id 1S38Fh-00045G-Gg for submit@debbugs.gnu.org; Thu, 01 Mar 2012 10:49:09 -0500 Received: from mail4-relais-sop.national.inria.fr ([192.134.164.105]:30805) by debbugs.gnu.org with esmtp (Exim 4.72) (envelope-from ) id 1S38FW-00044r-3Q for 10919@debbugs.gnu.org; Thu, 01 Mar 2012 10:48:59 -0500 X-IronPort-AV: E=Sophos;i="4.73,511,1325458800"; d="scan'208";a="133865751" Received: from chercheurs2-217.saclay.inria.fr (HELO [193.55.250.217]) ([193.55.250.217]) by mail4-relais-sop.national.inria.fr with ESMTP/TLS/DHE-RSA-CAMELLIA256-SHA; 01 Mar 2012 16:48:31 +0100 Message-ID: <4F4F9A4E.50506@inria.fr> Date: Thu, 01 Mar 2012 16:48:30 +0100 From: Tiphaine Turpin User-Agent: Mozilla/5.0 (X11; U; Linux x86_64; en-US; rv:1.9.2.24) Gecko/20111109 Thunderbird/3.1.16 
MIME-Version: 1.0 References: <4F4F984D.2000901@inria.fr> In-Reply-To: <4F4F984D.2000901@inria.fr> Content-Type: text/plain; charset=ISO-8859-1; format=flowed Content-Transfer-Encoding: 7bit X-Spam-Score: -6.9 (------) X-BeenThere: debbugs-submit@debbugs.gnu.org X-Mailman-Version: 2.1.13 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Sender: debbugs-submit-bounces@debbugs.gnu.org Errors-To: debbugs-submit-bounces@debbugs.gnu.org X-Spam-Score: -6.9 (------) I just found a solution which seems to work: using emacs-internal instead of emacs-mule. So it seems to be just a documentation problem (or a problem with my reading of it). Tiphaine On 01/03/2012 16:39, Tiphaine Turpin wrote: > Hi, > > I have a problem regarding coding systems: > > I'm using process-send-string to send substrings of a buffer through a > socket, after setting the process encoding and decoding systems to > emacs-mule. > I expect the number of bytes written to match the byte-length of the > substring as obtained by position-bytes, since the specification of > position-bytes in emacs-devel is to always work with the emacs-mule > encoding. From emacs-devel: > > "The byte sequence of a buffer after decoded is always in emacs-mule > (in emacs-unicode-2 branch, it's utf-8). So, changing > buffer-file-coding-system or any other coding-system-related variables > doesn't affects position-bytes." > > However, this is not the case with 3bytes utf8 characters: > position-bytes counts them as 3 bytes, but process-send-string wirtes > 4 bytes. > > Setting the process coding systems for the socket to utf-8 solves the > problem, but I don't think it will with other coding systems, even if > I used buffer-file-coding-system instead, since position-bytes does > not use it. > > What is the real expected behavior of these things, and how to make > this correct ? 
> > Regards, > > Tiphaine Turpin > From unknown Mon Jun 23 20:20:48 2025 X-Loop: help-debbugs@gnu.org Subject: bug#10919: emacs-mule/utf-8 difference Resent-From: Stefan Monnier Original-Sender: debbugs-submit-bounces@debbugs.gnu.org Resent-CC: bug-gnu-emacs@gnu.org Resent-Date: Thu, 01 Mar 2012 17:46:02 +0000 Resent-Message-ID: Resent-Sender: help-debbugs@gnu.org X-GNU-PR-Message: followup 10919 X-GNU-PR-Package: emacs X-GNU-PR-Keywords: To: Tiphaine Turpin Cc: 10919@debbugs.gnu.org Received: via spool by 10919-submit@debbugs.gnu.org id=B10919.133062395425894 (code B ref 10919); Thu, 01 Mar 2012 17:46:02 +0000 Received: (at 10919) by debbugs.gnu.org; 1 Mar 2012 17:45:54 +0000 Received: from localhost ([127.0.0.1]:57772 helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.72) (envelope-from ) id 1S3A4f-0006jO-BD for submit@debbugs.gnu.org; Thu, 01 Mar 2012 12:45:54 -0500 Received: from chene.dit.umontreal.ca ([132.204.246.20]:54913) by debbugs.gnu.org with esmtp (Exim 4.72) (envelope-from ) id 1S3A4F-0006im-O0 for 10919@debbugs.gnu.org; Thu, 01 Mar 2012 12:45:40 -0500 Received: from faina.iro.umontreal.ca (lechon.iro.umontreal.ca [132.204.27.242]) by chene.dit.umontreal.ca (8.14.1/8.14.1) with ESMTP id q21Hj0M8007559; Thu, 1 Mar 2012 12:45:00 -0500 Received: by faina.iro.umontreal.ca (Postfix, from userid 20848) id A11C3130005; Thu, 1 Mar 2012 12:45:00 -0500 (EST) From: Stefan Monnier Message-ID: References: <4F4F984D.2000901@inria.fr> <4F4F9A4E.50506@inria.fr> Date: Thu, 01 Mar 2012 12:45:00 -0500 In-Reply-To: <4F4F9A4E.50506@inria.fr> (Tiphaine Turpin's message of "Thu, 01 Mar 2012 16:48:30 +0100") User-Agent: Gnus/5.13 (Gnus v5.13) Emacs/24.0.92 (gnu/linux) MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: quoted-printable X-NAI-Spam-Flag: NO X-NAI-Spam-Threshold: 5 X-NAI-Spam-Score: 0 X-NAI-Spam-Rules: 1 Rules triggered RV4148=0 X-NAI-Spam-Version: 2.2.0.9309 : core <4148> : streams <733755> : uri <1075034> 
X-Spam-Score: -3.5 (---) X-BeenThere: debbugs-submit@debbugs.gnu.org X-Mailman-Version: 2.1.13 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Sender: debbugs-submit-bounces@debbugs.gnu.org Errors-To: debbugs-submit-bounces@debbugs.gnu.org X-Spam-Score: -3.5 (---) > I just found a solution which seems to work: using emacs-internal instead of > emacs-mule. So it seems to be just a documentation problem (or a problem > with my reading of it). emacs-mule was used internally in Emacs<23; the internal representation is now a variant of utf-8. So position-bytes in Emacs<23 should be consistent with emacs-mule, but in Emacs≥23 it is only consistent with emacs-internal (or utf-8). Stefan From unknown Mon Jun 23 20:20:48 2025 MIME-Version: 1.0 X-Mailer: MIME-tools 5.428 (Entity 5.428) X-Loop: help-debbugs@gnu.org From: help-debbugs@gnu.org (GNU bug Tracking System) To: Tiphaine Turpin Subject: bug#10919: closed (Re: bug#10919: emacs-mule/utf-8 difference) Message-ID: References: <83399scil3.fsf@gnu.org> <4F4F984D.2000901@inria.fr> X-Gnu-PR-Message: they-closed 10919 X-Gnu-PR-Package: emacs Reply-To: 10919@debbugs.gnu.org Date: Thu, 01 Mar 2012 17:54:02 +0000 Content-Type: multipart/mixed; boundary="----------=_1330624442-26574-1" This is a multi-part message in MIME format... ------------=_1330624442-26574-1 Content-Disposition: inline Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8" Your bug report #10919: emacs-mule/utf-8 difference which was filed against the emacs package, has been closed. The explanation is attached below, along with your original report. If you require more details, please reply to 10919@debbugs.gnu.org. 
-- 10919: http://debbugs.gnu.org/cgi/bugreport.cgi?bug=10919 GNU Bug Tracking System Contact help-debbugs@gnu.org with problems ------------=_1330624442-26574-1 Content-Type: message/rfc822 Content-Disposition: inline Content-Transfer-Encoding: 7bit Received: (at 10919-done) by debbugs.gnu.org; 1 Mar 2012 17:53:21 +0000 Received: from localhost ([127.0.0.1]:57776 helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.72) (envelope-from ) id 1S3ABt-0006tc-3X for submit@debbugs.gnu.org; Thu, 01 Mar 2012 12:53:21 -0500 Received: from mtaout22.012.net.il ([80.179.55.172]:57894) by debbugs.gnu.org with esmtp (Exim 4.72) (envelope-from ) id 1S3ABf-0006sx-Et for 10919-done@debbugs.gnu.org; Thu, 01 Mar 2012 12:53:08 -0500 Received: from conversion-daemon.a-mtaout22.012.net.il by a-mtaout22.012.net.il (HyperSendmail v2007.08) id <0M0700300W7HC200@a-mtaout22.012.net.il> for 10919-done@debbugs.gnu.org; Thu, 01 Mar 2012 19:52:40 +0200 (IST) Received: from HOME-C4E4A596F7 ([84.228.20.191]) by a-mtaout22.012.net.il (HyperSendmail v2007.08) with ESMTPA id <0M07002ARWBQKCJ0@a-mtaout22.012.net.il>; Thu, 01 Mar 2012 19:52:39 +0200 (IST) Date: Thu, 01 Mar 2012 19:54:48 +0200 From: Eli Zaretskii Subject: Re: bug#10919: emacs-mule/utf-8 difference In-reply-to: <4F4F984D.2000901@inria.fr> X-012-Sender: halo1@inter.net.il To: Tiphaine Turpin Message-id: <83399scil3.fsf@gnu.org> References: <4F4F984D.2000901@inria.fr> X-Spam-Score: -1.2 (-) X-Debbugs-Envelope-To: 10919-done Cc: 10919-done@debbugs.gnu.org X-BeenThere: debbugs-submit@debbugs.gnu.org X-Mailman-Version: 2.1.13 Precedence: list Reply-To: Eli Zaretskii List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Sender: debbugs-submit-bounces@debbugs.gnu.org Errors-To: debbugs-submit-bounces@debbugs.gnu.org X-Spam-Score: -1.2 (-) > Date: Thu, 01 Mar 2012 16:39:57 +0100 > From: Tiphaine Turpin > > From emacs-devel: > > "The byte sequence of a buffer after decoded is always in emacs-mule (in > 
emacs-unicode-2 branch, it's utf-8). This is very old info. The emacs-unicode-2 branch was merged with the mainline when Emacs 23.1 was released. > So, changing > buffer-file-coding-system or any other coding-system-related variables > doesn't affects position-bytes." > > However, this is not the case with 3bytes utf8 characters: > position-bytes counts them as 3 bytes, but process-send-string wirtes 4 > bytes. process-send-string _encodes_ the string, it does not send the internal representation of the string in the buffer. Using process-send-string is like writing the string to a disk file: Emacs encodes it before sending or writing. Therefore, buffer-file-coding-system _does_ affect what is being sent. I'm closing this non-bug. ------------=_1330624442-26574-1--
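The replies in this thread distinguish between the bytes of the buffer's internal representation (what position-bytes measures) and the bytes produced when a string is encoded for output (what process-send-string writes). A minimal Emacs Lisp sketch of that comparison follows, assuming Emacs 23 or later. The helper names `my/region-byte-length` and `my/encoded-byte-length` are hypothetical and exist only for this illustration; `position-bytes` and `encode-coding-string` are the standard Emacs functions.

```elisp
;; Sketch only: the two `my/...' helpers are hypothetical names,
;; not part of Emacs.

(defun my/region-byte-length (beg end)
  "Return the internal byte length of the buffer text from BEG to END."
  (- (position-bytes end) (position-bytes beg)))

(defun my/encoded-byte-length (string coding-system)
  "Return how many bytes STRING occupies when encoded with CODING-SYSTEM."
  (length (encode-coding-string string coding-system)))

;; Compare the internal size of "a" followed by U+20AC (EURO SIGN)
;; with its encoded size under three coding systems.
(with-temp-buffer
  (insert "a\u20ac")
  (let ((text (buffer-substring-no-properties (point-min) (point-max))))
    (list (my/region-byte-length (point-min) (point-max))
          (my/encoded-byte-length text 'utf-8)
          (my/encoded-byte-length text 'emacs-internal)
          (my/encoded-byte-length text 'emacs-mule))))
```

The first three numbers in the resulting list should agree (one byte for `a` plus three for U+20AC). The last is the encoded size under emacs-mule, which the report above describes as differing from the position-bytes count. Following the reporter's own finding, passing emacs-internal (or utf-8) to set-process-coding-system for the socket's process is what keeps the written byte count in line with position-bytes.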