GNU bug report logs - #28179
Fix use of string-to-multibyte in ispell.el
Reported by: Reuben Thomas <rrt <at> sc3d.org>
Date: Tue, 22 Aug 2017 00:53:01 UTC
Severity: minor
Done: Reuben Thomas <rrt <at> sc3d.org>
Bug is archived. No further changes may be made.
Message #17 received at 28179 <at> debbugs.gnu.org (full text, mbox):
On 22/08/17 18:23, Eli Zaretskii wrote:
>> Cc: 28179 <at> debbugs.gnu.org
>> From: Reuben Thomas <rrt <at> sc3d.org>
>> Date: Tue, 22 Aug 2017 18:04:11 +0100
>>
>> Are you sure we don't need to ensure ispell-get-decoded-string always
>> returns a multibyte string? What if decode-coding-string returns a
>> pure ASCII string, which is therefore unibyte?
>>
>> This is multibyte too, no? The Emacs manual says:
>>
>> Rather, Emacs uses a variable-length internal representation of
>> characters, that stores each character as a sequence of 1 to 5 8-bit
>> bytes, depending on the magnitude of its codepoint(1). For example, any
>> ASCII character takes up only 1 byte, a Latin-1 character takes up 2
>> bytes, etc. We call this representation of text “multibyte”.
> This is a misunderstanding, caused by the overloaded meaning of
> "multibyte string". The way I meant it, it has to do with the
> internal flag marking a string either unibyte or multibyte. Observe:
>
> (multibyte-string-p "abcd") => nil
>
> but
>
> (multibyte-string-p (decode-coding-string "abcd" 'utf-8)) => t
So here, running decode-coding-string on a plain ASCII string returns a
multibyte string.
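
The distinction between the internal flag and the characters themselves can be checked with a minimal sketch like the one below. The function name `my-multibyte-flags` is hypothetical and exists only for illustration; the sketch assumes its argument is a pure ASCII string literal read by the Lisp reader.

```elisp
(defun my-multibyte-flags (string)
  "Return the multibyte flag of STRING before and after conversion.
Hypothetical helper for illustration only; not part of ispell.el."
  (let ((converted (string-to-multibyte string))
        (decoded (decode-coding-string string 'utf-8)))
    (list
     ;; Flag of the string as given (nil for an ASCII literal).
     (multibyte-string-p string)
     ;; `string-to-multibyte' keeps the characters and sets the flag.
     (multibyte-string-p converted)
     ;; Decoding without NOCOPY, as in the example quoted above.
     (multibyte-string-p decoded)
     ;; The characters compare equal regardless of the flag.
     (string= string converted))))

(my-multibyte-flags "abcd")
```

For an ASCII literal the first element is nil and the remaining three are t, which matches the two results quoted in the thread: the flag changes while the text does not.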
> ispell-decode-string, which you replaced with its body. The call to
> string-to-multibyte worked on the result of decoding, not instead of
> the decoding. So actually the call to string-to-multibyte was not
> replaced, it was removed.
Yes, that call seemed to be unnecessary.
> Is the issue more clear now?
I now understand the two meanings of "multibyte", but I don't understand
how my patch is deficient. I even tried:
(multibyte-string-p (decode-coding-string "abcde" 'utf-8 t)) ; returns t; also if I use 'us-ascii
So in fact even when the string isn't copied (as in my patch, where I
also use a third argument of t to decode-coding-string) it appears to be
changed to a multibyte string.
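
If a multibyte result must be guaranteed rather than observed, one option is to keep the string-to-multibyte call around the decoding step. The sketch below is not the code from the patch or from ispell.el; `my-decode-to-multibyte` is a hypothetical name. It relies on two documented behaviours: the NOCOPY argument of decode-coding-string only says that returning STRING itself is acceptable when decoding is trivial, and string-to-multibyte returns its argument unchanged when it is already multibyte.

```elisp
(defun my-decode-to-multibyte (string coding-system)
  "Decode STRING with CODING-SYSTEM and return a string flagged multibyte.
Hypothetical helper for illustration only; not part of ispell.el."
  ;; NOCOPY is t, so the decoder may hand back STRING itself when the
  ;; decoding is trivial.  Wrapping the result in `string-to-multibyte'
  ;; sets the flag in that case and is a no-op otherwise.
  (string-to-multibyte (decode-coding-string string coding-system t)))

(multibyte-string-p (my-decode-to-multibyte "abcde" 'utf-8))
(multibyte-string-p (my-decode-to-multibyte "abcde" 'us-ascii))
```

Both calls should report a multibyte string without depending on what the decoder does for pure ASCII input when NOCOPY is set.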
--
https://rrt.sc3d.org
This bug report was last modified 7 years and 268 days ago.
GNU bug tracking system
Copyright (C) 1999 Darren O. Benham,
1997,2003 nCipher Corporation Ltd,
1994-97 Ian Jackson.