GNU bug report logs - #10919
emacs-mule/utf-8 difference

Package: emacs

Reported by: Tiphaine Turpin <tiphaine.turpin <at> inria.fr>

Date: Thu, 1 Mar 2012 15:41:03 UTC

Severity: normal

Done: Eli Zaretskii <eliz <at> gnu.org>

Bug is archived. No further changes may be made.


From: help-debbugs <at> gnu.org (GNU bug Tracking System)
To: Tiphaine Turpin <tiphaine.turpin <at> inria.fr>
Subject: bug#10919: closed (Re: bug#10919: emacs-mule/utf-8 difference)
Date: Thu, 01 Mar 2012 17:54:02 +0000
[Message part 1 (text/plain, inline)]
Your bug report

#10919: emacs-mule/utf-8 difference

which was filed against the emacs package, has been closed.

The explanation is attached below, along with your original report.
If you require more details, please reply to 10919 <at> debbugs.gnu.org.

-- 
10919: http://debbugs.gnu.org/cgi/bugreport.cgi?bug=10919
GNU Bug Tracking System
Contact help-debbugs <at> gnu.org with problems
[Message part 2 (message/rfc822, inline)]
From: Eli Zaretskii <eliz <at> gnu.org>
To: Tiphaine Turpin <tiphaine.turpin <at> inria.fr>
Cc: 10919-done <at> debbugs.gnu.org
Subject: Re: bug#10919: emacs-mule/utf-8 difference
Date: Thu, 01 Mar 2012 19:54:48 +0200
> Date: Thu, 01 Mar 2012 16:39:57 +0100
> From: Tiphaine Turpin <tiphaine.turpin <at> inria.fr>
> 
> From emacs-devel:
> 
> "The byte sequence of a buffer after decoding is always in emacs-mule (in 
> emacs-unicode-2 branch, it's utf-8).

This is very old info.  The emacs-unicode-2 branch was merged with the
mainline when Emacs 23.1 was released.

> So, changing 
> buffer-file-coding-system or any other coding-system-related variables 
> doesn't affect position-bytes."
> 
> However, this is not the case with 3-byte UTF-8 characters: 
> position-bytes counts them as 3 bytes, but process-send-string writes 4 
> bytes.

process-send-string _encodes_ the string, it does not send the
internal representation of the string in the buffer.  Using
process-send-string is like writing the string to a disk file: Emacs
encodes it before sending or writing.

Therefore, buffer-file-coding-system _does_ affect what is being sent.
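
To illustrate the distinction drawn above, here is a minimal Emacs Lisp
sketch, not part of the original exchange. It compares the size of a string
in the internal representation with its size after explicit encoding. The
helper name my-encoded-byte-length is hypothetical; string-bytes and
encode-coding-string are standard Emacs functions. The sketch assumes
Emacs 23 or later.

```elisp
;; Hypothetical helper, not part of Emacs: count the bytes that STRING
;; occupies once it has been encoded in CODING-SYSTEM.
(defun my-encoded-byte-length (string coding-system)
  "Return the number of bytes in STRING after encoding it in CODING-SYSTEM."
  ;; encode-coding-string returns a unibyte string, so its length
  ;; is a byte count rather than a character count.
  (length (encode-coding-string string coding-system)))

(let ((s "\u20ac"))                            ; one character: EURO SIGN
  (list (length s)                             ; number of characters
        (string-bytes s)                       ; bytes in the internal representation
        (my-encoded-byte-length s 'utf-8)      ; bytes after encoding in utf-8
        (my-encoded-byte-length s 'utf-16le)   ; bytes after encoding in utf-16le
        (my-encoded-byte-length s 'emacs-mule))) ; bytes after encoding in emacs-mule
```

Because the internal representation in Emacs 23 and later is a superset of
UTF-8, the second and third numbers should agree for this character, while
the other encodings may give different counts. To predict how many bytes a
process will receive, it is probably safest to encode the substring with
the process's coding system and measure the result, rather than to rely on
position-bytes.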

I'm closing this non-bug.

[Message part 3 (message/rfc822, inline)]
From: Tiphaine Turpin <tiphaine.turpin <at> inria.fr>
To: bug-gnu-emacs <at> gnu.org
Subject: emacs-mule/utf-8 difference
Date: Thu, 01 Mar 2012 16:39:57 +0100
Hi,

I have a problem regarding coding systems:

I'm using process-send-string to send substrings of a buffer through a 
socket, after setting the process encoding and decoding systems to 
emacs-mule.
I expect the number of bytes written to match the byte-length of the 
substring as obtained by position-bytes, since the specification of 
position-bytes in emacs-devel is to always work with the emacs-mule 
encoding. From emacs-devel:

"The byte sequence of a buffer after decoding is always in emacs-mule (in 
emacs-unicode-2 branch, it's utf-8).  So, changing 
buffer-file-coding-system or any other coding-system-related variables 
doesn't affect position-bytes."

However, this is not the case with 3-byte UTF-8 characters: 
position-bytes counts them as 3 bytes, but process-send-string writes 4 
bytes.

Setting the process coding systems for the socket to utf-8 solves the 
problem, but I don't think it will with other coding systems, even if I 
used buffer-file-coding-system instead, since position-bytes does not 
use it.

What is the actual expected behavior of these functions, and how can I 
make this work correctly?

Regards,

Tiphaine Turpin




This bug report was last modified 13 years and 88 days ago.


GNU bug tracking system
Copyright (C) 1999 Darren O. Benham, 1997,2003 nCipher Corporation Ltd, 1994-97 Ian Jackson.