GNU bug report logs - #10919
emacs-mule/utf-8 difference


Package: emacs

Reported by: Tiphaine Turpin <tiphaine.turpin <at> inria.fr>

Date: Thu, 1 Mar 2012 15:41:03 UTC

Severity: normal

Done: Eli Zaretskii <eliz <at> gnu.org>

Bug is archived. No further changes may be made.




Report forwarded to bug-gnu-emacs <at> gnu.org:
bug#10919; Package emacs. (Thu, 01 Mar 2012 15:41:04 GMT)

Acknowledgement sent to Tiphaine Turpin <tiphaine.turpin <at> inria.fr>:
New bug report received and forwarded. Copy sent to bug-gnu-emacs <at> gnu.org. (Thu, 01 Mar 2012 15:41:04 GMT)

Message #5 received at submit <at> debbugs.gnu.org:

From: Tiphaine Turpin <tiphaine.turpin <at> inria.fr>
To: bug-gnu-emacs <at> gnu.org
Subject: emacs-mule/utf-8 difference
Date: Thu, 01 Mar 2012 16:39:57 +0100
Hi,

I have a problem regarding coding systems:

I'm using process-send-string to send substrings of a buffer through a 
socket, after setting the process encoding and decoding systems to 
emacs-mule.
I expect the number of bytes written to match the byte-length of the 
substring as obtained by position-bytes, since the specification of 
position-bytes in emacs-devel is to always work with the emacs-mule 
encoding. From emacs-devel:

"The byte sequence of a buffer after decoded is always in emacs-mule (in 
emacs-unicode-2 branch, it's utf-8).  So, changing 
buffer-file-coding-system or any other coding-system-related variables 
doesn't affects position-bytes."

However, this is not the case with 3-byte UTF-8 characters: 
position-bytes counts them as 3 bytes, but process-send-string writes 4 
bytes.

Setting the process coding systems for the socket to utf-8 solves the 
problem, but I don't think it will with other coding systems, even if I 
used buffer-file-coding-system instead, since position-bytes does not 
use it.

What is the expected behavior here, and how can I make this 
correct?

Regards,

Tiphaine Turpin





Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#10919; Package emacs. (Thu, 01 Mar 2012 15:50:01 GMT)

Message #8 received at 10919 <at> debbugs.gnu.org:

From: Tiphaine Turpin <tiphaine.turpin <at> inria.fr>
To: 10919 <at> debbugs.gnu.org
Subject: Re: emacs-mule/utf-8 difference
Date: Thu, 01 Mar 2012 16:48:30 +0100
I just found a solution which seems to work: using emacs-internal 
instead of emacs-mule. So it seems to be just a documentation problem 
(or a problem with my reading of it).

Tiphaine






Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#10919; Package emacs. (Thu, 01 Mar 2012 17:46:02 GMT)

Message #11 received at 10919 <at> debbugs.gnu.org:

From: Stefan Monnier <monnier <at> IRO.UMontreal.CA>
To: Tiphaine Turpin <tiphaine.turpin <at> inria.fr>
Cc: 10919 <at> debbugs.gnu.org
Subject: Re: bug#10919: emacs-mule/utf-8 difference
Date: Thu, 01 Mar 2012 12:45:00 -0500
> I just found a solution which seems to work: using emacs-internal instead of
> emacs-mule. So it seems to be just a documentation problem (or a problem
> with my reading of it).

emacs-mule was the internal representation in Emacs<23; now the internal
representation is a variant of utf-8.
So position-bytes in Emacs<23 should be consistent with emacs-mule, but
in Emacs≥23 it is only consistent with emacs-internal (or utf-8).


        Stefan
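
To see this consistency concretely, the following sketch compares the
byte length derived from position-bytes with the length of the same text
encoded under several coding systems. The helper name
my/region-byte-counts and the sample text are assumptions made for
illustration; they are not part of Emacs or of the original report. In
Emacs≥23 the `internal' count should agree with emacs-internal (and with
utf-8 for ordinary Unicode text), while emacs-mule may differ.

```elisp
;; Hypothetical helper for illustration; not an Emacs built-in.
(defun my/region-byte-counts (beg end)
  "Return an alist of byte counts for the text between BEG and END.
The `internal' entry is computed from `position-bytes'.  Every other
entry is the length of the text after encoding it with the coding
system named by the key."
  (let ((text (buffer-substring-no-properties beg end)))
    (cons (cons 'internal (- (position-bytes end) (position-bytes beg)))
          (mapcar (lambda (cs)
                    ;; `encode-coding-string' returns a unibyte string,
                    ;; so its length is the number of encoded bytes.
                    (cons cs (length (encode-coding-string text cs))))
                  '(emacs-internal utf-8 emacs-mule)))))

;; Sample text: U+20AC (EURO SIGN) takes 3 bytes in UTF-8.
(with-temp-buffer
  (insert "x\u20acy")
  (message "%S" (my/region-byte-counts (point-min) (point-max))))
```

Comparing the entries of the printed alist shows which coding systems
match what position-bytes reports on the running Emacs version.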




Reply sent to Eli Zaretskii <eliz <at> gnu.org>:
You have taken responsibility. (Thu, 01 Mar 2012 17:54:02 GMT)

Notification sent to Tiphaine Turpin <tiphaine.turpin <at> inria.fr>:
bug acknowledged by developer. (Thu, 01 Mar 2012 17:54:02 GMT)

Message #16 received at 10919-done <at> debbugs.gnu.org:

From: Eli Zaretskii <eliz <at> gnu.org>
To: Tiphaine Turpin <tiphaine.turpin <at> inria.fr>
Cc: 10919-done <at> debbugs.gnu.org
Subject: Re: bug#10919: emacs-mule/utf-8 difference
Date: Thu, 01 Mar 2012 19:54:48 +0200
> Date: Thu, 01 Mar 2012 16:39:57 +0100
> From: Tiphaine Turpin <tiphaine.turpin <at> inria.fr>
> 
> From emacs-devel:
> 
> "The byte sequence of a buffer after decoded is always in emacs-mule (in 
> emacs-unicode-2 branch, it's utf-8).

This is very old info.  The emacs-unicode-2 branch was merged with the
mainline when Emacs 23.1 was released.

> So, changing 
> buffer-file-coding-system or any other coding-system-related variables 
> doesn't affects position-bytes."
> 
> However, this is not the case with 3-byte UTF-8 characters: 
> position-bytes counts them as 3 bytes, but process-send-string writes 4 
> bytes.

process-send-string _encodes_ the string, it does not send the
internal representation of the string in the buffer.  Using
process-send-string is like writing the string to a disk file: Emacs
encodes it before sending or writing.

Therefore, buffer-file-coding-system _does_ affect what is being sent.

I'm closing this non-bug.
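
Since process-send-string encodes before writing, a byte count that has
to match what goes over the socket is probably best computed from the
encoded string rather than from position-bytes. A minimal sketch,
assuming the process has a non-nil text coding system for encoding and
that no extra conversion happens beyond it; both helper names are
hypothetical and not part of Emacs:

```elisp
;; Hypothetical helpers for illustration; not Emacs built-ins.
(defun my/encoded-byte-length (string coding-system)
  "Return the number of bytes STRING occupies once encoded with CODING-SYSTEM."
  ;; The result of `encode-coding-string' is unibyte: one element per byte.
  (length (encode-coding-string string coding-system)))

(defun my/process-send-length (process string)
  "Estimate how many bytes `process-send-string' writes for STRING.
`process-coding-system' returns (DECODING . ENCODING); the ENCODING
half is the one applied to text sent to PROCESS."
  (my/encoded-byte-length string (cdr (process-coding-system process))))

;; Compare the internal size with two encoded sizes of the same string.
(let ((s "x\u20acy"))
  (message "internal=%d utf-8=%d emacs-mule=%d"
           (string-bytes s)
           (my/encoded-byte-length s 'utf-8)
           (my/encoded-byte-length s 'emacs-mule)))
```

With the process coding system set to emacs-internal, as found earlier
in this thread, the encoded length and the position-bytes difference
should agree.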




bug archived. Request was from Debbugs Internal Request <help-debbugs <at> gnu.org> to internal_control <at> debbugs.gnu.org. (Fri, 30 Mar 2012 11:24:02 GMT)


