From unknown Fri Jun 20 07:19:13 2025 Content-Disposition: inline Content-Transfer-Encoding: quoted-printable MIME-Version: 1.0 X-Mailer: MIME-tools 5.509 (Entity 5.509) Content-Type: text/plain; charset=utf-8 From: bug#28179 <28179@debbugs.gnu.org> To: bug#28179 <28179@debbugs.gnu.org> Subject: Status: Fix use of string-to-multibyte in ispell.el Reply-To: bug#28179 <28179@debbugs.gnu.org> Date: Fri, 20 Jun 2025 14:19:13 +0000 retitle 28179 Fix use of string-to-multibyte in ispell.el reassign 28179 emacs submitter 28179 Reuben Thomas severity 28179 minor thanks From debbugs-submit-bounces@debbugs.gnu.org Mon Aug 21 20:52:10 2017 Received: (at submit) by debbugs.gnu.org; 22 Aug 2017 00:52:10 +0000 Received: from localhost ([127.0.0.1]:48366 helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1djxQP-0007uR-Nt for submit@debbugs.gnu.org; Mon, 21 Aug 2017 20:52:10 -0400 Received: from eggs.gnu.org ([208.118.235.92]:36279) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1djxQO-0007uF-F3 for submit@debbugs.gnu.org; Mon, 21 Aug 2017 20:52:08 -0400 Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71) (envelope-from ) id 1djxQI-0000Lw-7B for submit@debbugs.gnu.org; Mon, 21 Aug 2017 20:52:03 -0400 X-Spam-Checker-Version: SpamAssassin 3.3.2 (2011-06-06) on eggs.gnu.org X-Spam-Level: X-Spam-Status: No, score=0.0 required=5.0 tests=BAYES_20,T_DKIM_INVALID autolearn=disabled version=3.3.2 Received: from lists.gnu.org ([2001:4830:134:3::11]:48243) by eggs.gnu.org with esmtps (TLS1.0:RSA_AES_256_CBC_SHA1:32) (Exim 4.71) (envelope-from ) id 1djxQI-0000Li-2p for submit@debbugs.gnu.org; Mon, 21 Aug 2017 20:52:02 -0400 Received: from eggs.gnu.org ([2001:4830:134:3::10]:55473) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1djxQG-0001gm-RT for bug-gnu-emacs@gnu.org; Mon, 21 Aug 2017 20:52:01 -0400 Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71) (envelope-from ) id 
1djxQF-0000Gg-RR for bug-gnu-emacs@gnu.org; Mon, 21 Aug 2017 20:52:00 -0400 Received: from fencepost.gnu.org ([2001:4830:134:3::e]:47999) by eggs.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1djxQF-0000GW-NJ for bug-gnu-emacs@gnu.org; Mon, 21 Aug 2017 20:51:59 -0400 Received: from eggs.gnu.org ([2001:4830:134:3::10]:48082) by fencepost.gnu.org with esmtps (TLS1.0:RSA_AES_256_CBC_SHA1:256) (Exim 4.82) (envelope-from ) id 1djxQF-0006BV-FB for bug-emacs@gnu.org; Mon, 21 Aug 2017 20:51:59 -0400 Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71) (envelope-from ) id 1djxQE-0000Cn-CY for bug-emacs@gnu.org; Mon, 21 Aug 2017 20:51:59 -0400 Received: from mail-wr0-x231.google.com ([2a00:1450:400c:c0c::231]:36051) by eggs.gnu.org with esmtps (TLS1.0:RSA_AES_128_CBC_SHA1:16) (Exim 4.71) (envelope-from ) id 1djxQE-00008b-29 for bug-emacs@gnu.org; Mon, 21 Aug 2017 20:51:58 -0400 Received: by mail-wr0-x231.google.com with SMTP id f8so75882715wrf.3 for ; Mon, 21 Aug 2017 17:51:57 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=sc3d.org; s=google; h=to:from:subject:organization:message-id:date:user-agent :mime-version:content-language; bh=Xs9XsuaY4O2ZSR009h6peti2wxg/ZUtSuEmRGYh448s=; b=taDX+8A8HilVx/hJAI2jgWIOuCh+jV6VnapX1PxHo12ZcKnfF0Vj563T/jfQyRi6o2 EpggSLQwGu1Q+KvkMSF0nIrBBIER2lk7oHx7CVxN3odrv3tbKt2YafSZSXWCM4gblBoL Vt/iH6ciJgtdTDdLwJiAio4FByyrAahyH+iGY= X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:to:from:subject:organization:message-id:date :user-agent:mime-version:content-language; bh=Xs9XsuaY4O2ZSR009h6peti2wxg/ZUtSuEmRGYh448s=; b=I/Ln+BCIZketYNnSMxXNB7JP47lZVXEOWA9AgQTg7h9EwNUiY1yn87U3ZC9o1VKqFD p4j1JV1YkCuld9Kx6+KBIFRS8Fjsd3E4j+TgOinwZavoIBUQpuJj4s4Z1aj/brs4xwcZ 1rEW6tTWSs83cbiPBNAz9IG3ApjuqNOs/rBbC/O3ORBeWY+01J5tCQU2tH+tQ4LKjEQT pHFvWVKi1vLCxLr2bfoIyDFdV6LxZjsHSs35p62y6UYPxILQcLgYpRwNPl9BuWJVKpFO 
S8jLDBNa/f3vZZAo4zQnpr48uEzE6ZOSqIQHlVVJ3yIquwhWxBvx+Itd7Y3WFP3idYgR WQng== X-Gm-Message-State: AHYfb5gMGW76ZiaH/BDF/hk/AxsB0gx7oLbdvT1KpXkqxmf2z1Df+hsm 96dOzqHMClUVEoqp+TEIFw== X-Received: by 10.223.182.171 with SMTP id j43mr5989661wre.118.1503363116185; Mon, 21 Aug 2017 17:51:56 -0700 (PDT) Received: from ?IPv6:2a02:c7d:51cb:c700:e890:1fc2:4566:9a1? ([2a02:c7d:51cb:c700:e890:1fc2:4566:9a1]) by smtp.gmail.com with ESMTPSA id q7sm5219183wrd.89.2017.08.21.17.51.54 for (version=TLS1_2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128/128); Mon, 21 Aug 2017 17:51:54 -0700 (PDT) To: bug-emacs From: Reuben Thomas Subject: Fix use of string-to-multibyte in ispell.el Organization: =?UTF-8?B?U0PCs0Q=?= Message-ID: Date: Tue, 22 Aug 2017 01:51:53 +0100 User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:52.0) Gecko/20100101 Thunderbird/52.2.1 MIME-Version: 1.0 Content-Type: multipart/mixed; boundary="------------E075EB8D65C87906C458FB26" Content-Language: en-US X-detected-operating-system: by eggs.gnu.org: Genre and OS details not recognized. X-detected-operating-system: by eggs.gnu.org: GNU/Linux 2.2.x-3.x [generic] X-detected-operating-system: by eggs.gnu.org: GNU/Linux 2.6.x X-Received-From: 2001:4830:134:3::11 X-Spam-Score: -4.1 (----) X-Debbugs-Envelope-To: submit X-BeenThere: debbugs-submit@debbugs.gnu.org X-Mailman-Version: 2.1.18 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: debbugs-submit-bounces@debbugs.gnu.org Sender: "Debbugs-submit" X-Spam-Score: -4.1 (----) This is a multi-part message in MIME format. --------------E075EB8D65C87906C458FB26 Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: 7bit The attached patch removes a use of string-to-multibyte in ispell.el. It turned out to be possible to simplify the surrounding code quite a bit too. 
-- https://rrt.sc3d.org --------------E075EB8D65C87906C458FB26 Content-Type: text/x-patch; name="0001-Avoid-using-string-to-multibyte-in-ispell.el.patch" Content-Transfer-Encoding: 7bit Content-Disposition: attachment; filename="0001-Avoid-using-string-to-multibyte-in-ispell.el.patch" >From 077584030546f97c06d1f8c907475a04115e6c58 Mon Sep 17 00:00:00 2001 From: Reuben Thomas Date: Tue, 22 Aug 2017 01:46:27 +0100 Subject: [PATCH] Avoid using string-to-multibyte in ispell.el * lisp/textmodes/ispell.el (ispell-get-decoded-string): Use decode-coding-string instead. --- lisp/textmodes/ispell.el | 16 +++------------- 1 file changed, 3 insertions(+), 13 deletions(-) diff --git a/lisp/textmodes/ispell.el b/lisp/textmodes/ispell.el index e67e603..87a3b7a 100644 --- a/lisp/textmodes/ispell.el +++ b/lisp/textmodes/ispell.el @@ -1485,25 +1485,15 @@ ispell-current-personal-dictionary "The name of the current personal dictionary, or nil for the default. This is passed to the Ispell process using the `-p' switch.") -(defun ispell-decode-string (str) - "Decodes multibyte character strings." - (decode-coding-string str (ispell-get-coding-system))) - ;; Return a string decoded from Nth element of the current dictionary. (defun ispell-get-decoded-string (n) "Get the decoded string in slot N of the descriptor of the current dict." 
(let* ((slot (or (assoc ispell-current-dictionary ispell-local-dictionary-alist) (assoc ispell-current-dictionary ispell-dictionary-alist) - (error "No data for dictionary \"%s\", neither in `ispell-local-dictionary-alist' nor in `ispell-dictionary-alist'" - ispell-current-dictionary))) - (str (nth n slot))) - (when (and (> (length str) 0) - (not (multibyte-string-p str))) - (setq str (ispell-decode-string str)) - (or (multibyte-string-p str) - (setq str (string-to-multibyte str)))) - str)) + (error "No data for dictionary \"%s\" in `ispell-local-dictionary-alist' or `ispell-dictionary-alist'" + ispell-current-dictionary)))) + (decode-coding-string (nth n slot) (ispell-get-coding-system) t))) (defun ispell-get-casechars () (ispell-get-decoded-string 1)) -- 2.7.4 --------------E075EB8D65C87906C458FB26-- From debbugs-submit-bounces@debbugs.gnu.org Tue Aug 22 12:39:04 2017 Received: (at 28179) by debbugs.gnu.org; 22 Aug 2017 16:39:04 +0000 Received: from localhost ([127.0.0.1]:49776 helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1dkCCl-0001sj-Tj for submit@debbugs.gnu.org; Tue, 22 Aug 2017 12:39:04 -0400 Received: from eggs.gnu.org ([208.118.235.92]:37947) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1dkCCk-0001sG-7H for 28179@debbugs.gnu.org; Tue, 22 Aug 2017 12:39:02 -0400 Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71) (envelope-from ) id 1dkCCb-0005uX-UX for 28179@debbugs.gnu.org; Tue, 22 Aug 2017 12:38:57 -0400 X-Spam-Checker-Version: SpamAssassin 3.3.2 (2011-06-06) on eggs.gnu.org X-Spam-Level: X-Spam-Status: No, score=-0.0 required=5.0 tests=BAYES_20,RP_MATCHES_RCVD autolearn=disabled version=3.3.2 Received: from fencepost.gnu.org ([2001:4830:134:3::e]:33622) by eggs.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1dkCCb-0005uR-QZ; Tue, 22 Aug 2017 12:38:53 -0400 Received: from 84.94.185.246.cable.012.net.il ([84.94.185.246]:2750 helo=home-c4e4a596f7) by fencepost.gnu.org with 
esmtpsa (TLS1.2:RSA_AES_256_CBC_SHA1:256) (Exim 4.82) (envelope-from ) id 1dkCCb-0007aY-8F; Tue, 22 Aug 2017 12:38:53 -0400 Date: Tue, 22 Aug 2017 19:38:49 +0300 Message-Id: <83r2w3a76u.fsf@gnu.org> From: Eli Zaretskii To: Reuben Thomas In-reply-to: (message from Reuben Thomas on Tue, 22 Aug 2017 01:51:53 +0100) Subject: Re: bug#28179: Fix use of string-to-multibyte in ispell.el References: X-detected-operating-system: by eggs.gnu.org: GNU/Linux 2.2.x-3.x [generic] X-Received-From: 2001:4830:134:3::e X-Spam-Score: -5.0 (-----) X-Debbugs-Envelope-To: 28179 Cc: 28179@debbugs.gnu.org X-BeenThere: debbugs-submit@debbugs.gnu.org X-Mailman-Version: 2.1.18 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Reply-To: Eli Zaretskii Errors-To: debbugs-submit-bounces@debbugs.gnu.org Sender: "Debbugs-submit" X-Spam-Score: -5.0 (-----) > From: Reuben Thomas > Date: Tue, 22 Aug 2017 01:51:53 +0100 > > ;; Return a string decoded from Nth element of the current dictionary. > (defun ispell-get-decoded-string (n) > "Get the decoded string in slot N of the descriptor of the current dict." > (let* ((slot (or > (assoc ispell-current-dictionary ispell-local-dictionary-alist) > (assoc ispell-current-dictionary ispell-dictionary-alist) > - (error "No data for dictionary \"%s\", neither in `ispell-local-dictionary-alist' nor in `ispell-dictionary-alist'" > - ispell-current-dictionary))) > - (str (nth n slot))) > - (when (and (> (length str) 0) > - (not (multibyte-string-p str))) > - (setq str (ispell-decode-string str)) > - (or (multibyte-string-p str) > - (setq str (string-to-multibyte str)))) Are you sure we don't need to ensure ispell-get-decoded-string always returns a multibyte string? What if decode-coding-string returns a pure ASCII string, which is therefore unibyte? IOW, it looks like you didn't replace the call to string-to-multibyte with something equivalent, you simply removed that call. 
If you analyzed the code and concluded that this call is redundant, please tell the details of your analysis. Thanks. From debbugs-submit-bounces@debbugs.gnu.org Tue Aug 22 13:04:24 2017 Received: (at 28179) by debbugs.gnu.org; 22 Aug 2017 17:04:24 +0000 Received: from localhost ([127.0.0.1]:49791 helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1dkCbH-0002Sb-Mm for submit@debbugs.gnu.org; Tue, 22 Aug 2017 13:04:23 -0400 Received: from mail-wr0-f173.google.com ([209.85.128.173]:35960) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1dkCbF-0002SQ-Lc for 28179@debbugs.gnu.org; Tue, 22 Aug 2017 13:04:22 -0400 Received: by mail-wr0-f173.google.com with SMTP id f8so93212659wrf.3 for <28179@debbugs.gnu.org>; Tue, 22 Aug 2017 10:04:21 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=sc3d.org; s=google; h=subject:to:cc:references:from:organization:message-id:date :user-agent:mime-version:in-reply-to:content-language; bh=R8U2f6RkX9kCTUsH9j5gqxcMATjYxQS7sQM+4DHg3vM=; b=nRICo+5j3r+2mfsu+3Fhd8B5LZBVRQ+3noYrcZ5+qTe+mtm6QRoqMfix6fbfy3w8i9 wIhiEyLeorfVKI7jY0heiT0NVO3N6LIBs2/yiy0+x1oipxyvGxnJI9rDilVpqw6ZIRwU xj4vdqWDB7woV8adgITsNqyiH39Ydt71BVLP0= X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:subject:to:cc:references:from:organization :message-id:date:user-agent:mime-version:in-reply-to :content-language; bh=R8U2f6RkX9kCTUsH9j5gqxcMATjYxQS7sQM+4DHg3vM=; b=mVhZQ7i5CXzo0p5d5OXstDNa8qfYnJsGEMTK6rc0TTzQTXnZACSYEKzoL/iXTRPpvg 8ePhALLSQ5rV/6GHmlrWAVTELTqPdoqE+tcXWLQVrvwQrvZzQP9ACLinBpMJs9cr80Hk 7vo8Y6MhDkWI+YpMbxIFLkuAdJ6KVRQmyc36pYe2JE8/BnYDMOveKLncLnpaaa6S9kT/ ZbPzuaji65K3jxO63/35QDOvIxoPMRJrk/J5ugCEDVrfr1Ooz8O8dM+EfeNroluxMka3 aB6wk+Dq499rydR3ghXnPJb0rnQl5O9Acnk6XHKWwF+PswQgoreDpvSw4BK1RKPddUa9 HPTg== X-Gm-Message-State: AHYfb5gXUO3v3+5pkN7EuJx62aLA54jFWmMkyFJ/IZOcPpv3BYT9frjm KpKG741JZFRoPdC+vt46+g== X-Received: by 10.223.154.8 with SMTP id 
z8mr897541wrb.122.1503421455468; Tue, 22 Aug 2017 10:04:15 -0700 (PDT) Received: from ?IPv6:2a02:c7d:51cb:c700:cc4b:943a:5ca8:76c7? ([2a02:c7d:51cb:c700:cc4b:943a:5ca8:76c7]) by smtp.gmail.com with ESMTPSA id t12sm16206536wme.14.2017.08.22.10.04.12 (version=TLS1_2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128/128); Tue, 22 Aug 2017 10:04:13 -0700 (PDT) Subject: Re: bug#28179: Fix use of string-to-multibyte in ispell.el To: Eli Zaretskii References: <83r2w3a76u.fsf@gnu.org> From: Reuben Thomas Organization: =?UTF-8?B?U0PCs0Q=?= Message-ID: Date: Tue, 22 Aug 2017 18:04:11 +0100 User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:52.0) Gecko/20100101 Thunderbird/52.2.1 MIME-Version: 1.0 In-Reply-To: <83r2w3a76u.fsf@gnu.org> Content-Type: multipart/alternative; boundary="------------9D7B087607D6BCF8D608BCBA" Content-Language: en-US X-Spam-Score: -2.3 (--) X-Debbugs-Envelope-To: 28179 Cc: 28179@debbugs.gnu.org X-BeenThere: debbugs-submit@debbugs.gnu.org X-Mailman-Version: 2.1.18 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: debbugs-submit-bounces@debbugs.gnu.org Sender: "Debbugs-submit" X-Spam-Score: -2.3 (--) This is a multi-part message in MIME format. --------------9D7B087607D6BCF8D608BCBA Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: quoted-printable On 22/08/17 17:38, Eli Zaretskii wrote: >> From: Reuben Thomas >> Date: Tue, 22 Aug 2017 01:51:53 +0100 >> >> ;; Return a string decoded from Nth element of the current dictionary. >> (defun ispell-get-decoded-string (n) >> "Get the decoded string in slot N of the descriptor of the current dict." 
>> (let* ((slot (or >> (assoc ispell-current-dictionary ispell-local-dictionary-alist) >> (assoc ispell-current-dictionary ispell-dictionary-alist) >> - (error "No data for dictionary \"%s\", neither in `ispell-local-dictionary-alist' nor in `ispell-dictionary-alist'" >> - ispell-current-dictionary))) >> - (str (nth n slot))) >> - (when (and (> (length str) 0) >> - (not (multibyte-string-p str))) >> - (setq str (ispell-decode-string str)) >> - (or (multibyte-string-p str) >> - (setq str (string-to-multibyte str)))) > Are you sure we don't need to ensure ispell-get-decoded-string always > returns a multibyte string? What if decode-coding-string returns a > pure ASCII string, which is therefore unibyte? This is multibyte too, no? The Emacs manual says: > Rather, Emacs uses a variable-length internal representation of > characters, that stores each character as a sequence of 1 to 5 8-bit > bytes, depending on the magnitude of its codepoint(1). For example, *any > ASCII character takes up only 1 byte*, a Latin-1 character takes up 2 > bytes, etc. We call this representation of text “multibyte”. (My bold.) The reason I am using decode-coding-string is because that is what the obsolescence message in subr.el says to use. If I've overlooked something, then it would be nice to know what I've missed in the documentation, or whether the documentation could be improved. -- https://rrt.sc3d.org --------------9D7B087607D6BCF8D608BCBA Content-Type: text/html; charset=utf-8 Content-Transfer-Encoding: 8bit On 22/08/17 17:38, Eli Zaretskii wrote:
--------------9D7B087607D6BCF8D608BCBA-- From debbugs-submit-bounces@debbugs.gnu.org Tue Aug 22 13:23:58 2017 Received: (at 28179) by debbugs.gnu.org; 22 Aug 2017 17:23:58 +0000 Received: from localhost ([127.0.0.1]:49817 helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1dkCuE-0002vv-5r for submit@debbugs.gnu.org; Tue, 22 Aug 2017 13:23:58 -0400 Received: from eggs.gnu.org ([208.118.235.92]:50788) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1dkCuD-0002vi-42 for 28179@debbugs.gnu.org; Tue, 22 Aug 2017 13:23:57 -0400 Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71) (envelope-from ) id 1dkCu4-0000C0-Ua for 28179@debbugs.gnu.org; Tue, 22 Aug 2017 13:23:52 -0400 X-Spam-Checker-Version: SpamAssassin 3.3.2 (2011-06-06) on eggs.gnu.org X-Spam-Level: X-Spam-Status: No, score=-0.0 required=5.0 tests=BAYES_40,RP_MATCHES_RCVD autolearn=disabled version=3.3.2 Received: from fencepost.gnu.org ([2001:4830:134:3::e]:34192) by eggs.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1dkCu4-0000Bw-Qh; Tue, 22 Aug 2017 13:23:48 -0400 Received: from 84.94.185.246.cable.012.net.il ([84.94.185.246]:2782 helo=home-c4e4a596f7) by fencepost.gnu.org with esmtpsa (TLS1.2:RSA_AES_256_CBC_SHA1:256) (Exim 4.82) (envelope-from ) id 1dkCu4-0007wt-6v; Tue, 22 Aug 2017 13:23:48 -0400 Date: Tue, 22 Aug 2017 20:23:44 +0300 Message-Id: <83lgmba53z.fsf@gnu.org> From: Eli Zaretskii To: Reuben Thomas In-reply-to: (message from Reuben Thomas on Tue, 22 Aug 2017 18:04:11 +0100) Subject: Re: bug#28179: Fix use of string-to-multibyte in ispell.el References: <83r2w3a76u.fsf@gnu.org> MIME-version: 1.0 Content-type: text/plain; charset=utf-8 Content-Transfer-Encoding: 8bit X-detected-operating-system: by eggs.gnu.org: GNU/Linux 2.2.x-3.x [generic] X-Received-From: 2001:4830:134:3::e X-Spam-Score: -5.0 (-----) X-Debbugs-Envelope-To: 28179 Cc: 28179@debbugs.gnu.org X-BeenThere: debbugs-submit@debbugs.gnu.org X-Mailman-Version: 2.1.18 
Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Reply-To: Eli Zaretskii Errors-To: debbugs-submit-bounces@debbugs.gnu.org Sender: "Debbugs-submit" X-Spam-Score: -5.0 (-----) > Cc: 28179@debbugs.gnu.org > From: Reuben Thomas > Date: Tue, 22 Aug 2017 18:04:11 +0100 > > Are you sure we don't need to ensure ispell-get-decoded-string always > returns a multibyte string? What if decode-coding-string returns a > pure ASCII string, which is therefore unibyte? > > This is multibyte too, no? The Emacs manual says: > > Rather, Emacs uses a variable-length internal representation of > characters, that stores each character as a sequence of 1 to 5 8-bit > bytes, depending on the magnitude of its codepoint(1). For example, any > ASCII character takes up only 1 byte, a Latin-1 character takes up 2 > bytes, etc. We call this representation of text “multibyte”. This is a misunderstanding, caused by the overloaded meaning of "multibyte string". The way I meant it, it has to do with the internal flag marking a string either unibyte or multibyte. Observe: (multibyte-string-p "abcd") => nil but (multibyte-string-p (decode-coding-string "abcd" 'utf-8)) => t although (string= "abcd" (decode-coding-string "abcd" 'utf-8)) => t > The reason I am using decode-coding-string is because that is what the obsolescence message in subr.el > says to use. Yes, but the code already used decode-coding-string, in the function ispell-decode-string, which you replaced with its body. The call to string-to-multibyte worked on the result of decoding, not instead of the decoding. So actually the call to string-to-multibyte was not replaced, it was removed. > If I've overlooked something, then it would be nice to know what I've missed in the documentation, or whether > the documentation could be improved. Is the issue more clear now? 
From debbugs-submit-bounces@debbugs.gnu.org Wed Aug 23 06:59:56 2017 Received: (at 28179) by debbugs.gnu.org; 23 Aug 2017 10:59:56 +0000 Received: from localhost ([127.0.0.1]:50431 helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1dkTO8-0003kv-5H for submit@debbugs.gnu.org; Wed, 23 Aug 2017 06:59:56 -0400 Received: from mail-wr0-f180.google.com ([209.85.128.180]:35429) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1dkTO4-0003kg-VC for 28179@debbugs.gnu.org; Wed, 23 Aug 2017 06:59:54 -0400 Received: by mail-wr0-f180.google.com with SMTP id k46so4373972wre.2 for <28179@debbugs.gnu.org>; Wed, 23 Aug 2017 03:59:52 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=sc3d.org; s=google; h=subject:references:to:from:organization:message-id:date:user-agent :mime-version:in-reply-to:content-transfer-encoding:content-language; bh=JFtMyb1ABUUg0SU2mRKzYbUV13zpTZDgxG41NjOMZb4=; b=y/O5E12TlPtz3/5GUib4BGEUlZywskwduV8xhVNlGppBhynHx1v7xTVheLT4MNHMYI rhVVXgE5BDWxj/r44s992fX5CNVwEm7Xk6nD/5Ov9rGILMrx3bFQt/3TeIFvtYqkDfn8 lF9VvlLOt453eq1rdRFCtNflE3pfAUJz9+2Ys= X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:subject:references:to:from:organization :message-id:date:user-agent:mime-version:in-reply-to :content-transfer-encoding:content-language; bh=JFtMyb1ABUUg0SU2mRKzYbUV13zpTZDgxG41NjOMZb4=; b=BGSPTAfVxNFQEDPRTkwA+PNjWBTBuPyl4w/OdC+aDo8LHnF5q5bTZG20QXZEu7/0aA zUdDQCN803gIu+7gO05yavp9n3+AkB0gz47l6zNksMjSuZKZMeXfbv2F5CLB2gw6uwpS /7usZ28rCcSzl3hjzF61GtVFD9/whJOVQywDr/a0HTPawHmgL6eOPEacdJNvrj/3zpK9 nfhTRw+wAV2HQJK9AQ3pSXkUf45blQXQjwpI8Dvsd5kUXeL0FlDEE2tbH//opovAoitE 4aydCFkALL5amscIsGdJlWCGX9zJgPkwhB8/56YqKJVx+q8H2RzrWf3Y/4O0gOEutThe xtnw== X-Gm-Message-State: AHYfb5idiSGuScUWVGWVrP7WGuQE+8AYK8xX8GbCSPjthcEUwfBy5PdJ zExEBiOWmZ4Vkfc4u4FNLQ== X-Received: by 10.223.160.209 with SMTP id n17mr1259220wrn.190.1503485986531; Wed, 23 Aug 2017 03:59:46 -0700 (PDT) 
Received: from ?IPv6:2a02:c7d:51cb:c700:24d0:341f:23de:4c9e? ([2a02:c7d:51cb:c700:24d0:341f:23de:4c9e]) by smtp.gmail.com with ESMTPSA id d17sm1593797wrc.78.2017.08.23.03.59.44 (version=TLS1_2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128/128); Wed, 23 Aug 2017 03:59:44 -0700 (PDT) Subject: Fwd: Re: bug#28179: Fix use of string-to-multibyte in ispell.el References: To: Eli Zaretskii , 28179@debbugs.gnu.org From: Reuben Thomas Organization: =?UTF-8?B?U0PCs0Q=?= X-Forwarded-Message-Id: Message-ID: <0df1f5ab-e99b-b473-549c-5a40045ab71a@sc3d.org> Date: Wed, 23 Aug 2017 11:59:41 +0100 User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:52.0) Gecko/20100101 Thunderbird/52.2.1 MIME-Version: 1.0 In-Reply-To: Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: 8bit Content-Language: en-GB X-Spam-Score: -2.3 (--) X-Debbugs-Envelope-To: 28179 X-BeenThere: debbugs-submit@debbugs.gnu.org X-Mailman-Version: 2.1.18 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: debbugs-submit-bounces@debbugs.gnu.org Sender: "Debbugs-submit" X-Spam-Score: -2.3 (--) On 22/08/17 18:23, Eli Zaretskii wrote: >> Cc: 28179@debbugs.gnu.org >> From: Reuben Thomas >> Date: Tue, 22 Aug 2017 18:04:11 +0100 >> >> Are you sure we don't need to ensure ispell-get-decoded-string always >> returns a multibyte string? What if decode-coding-string returns a >> pure ASCII string, which is therefore unibyte? >> >> This is multibyte too, no? The Emacs manual says: >> >> Rather, Emacs uses a variable-length internal representation of >> characters, that stores each character as a sequence of 1 to 5 8-bit >> bytes, depending on the magnitude of its codepoint(1). For example, any >> ASCII character takes up only 1 byte, a Latin-1 character takes up 2 >> bytes, etc. We call this representation of text “multibyte”. > This is a misunderstanding, caused by the overloaded meaning of > "multibyte string". 
The way I meant it, it has to do with the > internal flag marking a string either unibyte or multibyte. Observe: > > (multibyte-string-p "abcd") => nil > > but > > (multibyte-string-p (decode-coding-string "abcd" 'utf-8)) => t So here, running decode-coding-string on a plain ASCII string returns a multibyte string. > ispell-decode-string, which you replaced with its body. The call to > string-to-multibyte worked on the result of decoding, not instead of > the decoding. So actually the call to string-to-multibyte was not > replaced, it was removed. Yes, that call seemed to be unnecessary. > Is the issue more clear now? I now understand the two meanings of "multibyte", but I don't understand how my patch is deficient. I tried even: (multibyte-string-p (decode-coding-string "abcde" 'utf-8 t)) ; returns t; also if I use 'us-ascii So in fact even when the string isn't copied (as in my patch, where I also use a third argument of t to decode-coding-string) it appears to be changed to a multibyte string. 
-- https://rrt.sc3d.org From debbugs-submit-bounces@debbugs.gnu.org Thu Aug 24 12:59:36 2017 Received: (at 28179) by debbugs.gnu.org; 24 Aug 2017 16:59:36 +0000 Received: from localhost ([127.0.0.1]:53065 helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1dkvTj-00059h-Qy for submit@debbugs.gnu.org; Thu, 24 Aug 2017 12:59:35 -0400 Received: from eggs.gnu.org ([208.118.235.92]:49794) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1dkvTi-00059V-Fn for 28179@debbugs.gnu.org; Thu, 24 Aug 2017 12:59:34 -0400 Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71) (envelope-from ) id 1dkvTb-0004tJ-Tg for 28179@debbugs.gnu.org; Thu, 24 Aug 2017 12:59:29 -0400 X-Spam-Checker-Version: SpamAssassin 3.3.2 (2011-06-06) on eggs.gnu.org X-Spam-Level: X-Spam-Status: No, score=-1.9 required=5.0 tests=BAYES_00,RP_MATCHES_RCVD autolearn=disabled version=3.3.2 Received: from fencepost.gnu.org ([2001:4830:134:3::e]:42426) by eggs.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1dkvTb-0004t1-QP; Thu, 24 Aug 2017 12:59:27 -0400 Received: from 84.94.185.246.cable.012.net.il ([84.94.185.246]:4488 helo=home-c4e4a596f7) by fencepost.gnu.org with esmtpsa (TLS1.2:RSA_AES_256_CBC_SHA1:256) (Exim 4.82) (envelope-from ) id 1dkvTa-0002I3-7O; Thu, 24 Aug 2017 12:59:27 -0400 Date: Thu, 24 Aug 2017 19:59:08 +0300 Message-Id: <83lgm89a1v.fsf@gnu.org> From: Eli Zaretskii To: Reuben Thomas In-reply-to: <0df1f5ab-e99b-b473-549c-5a40045ab71a@sc3d.org> (message from Reuben Thomas on Wed, 23 Aug 2017 11:59:41 +0100) Subject: Re: Fwd: Re: bug#28179: Fix use of string-to-multibyte in ispell.el References: <0df1f5ab-e99b-b473-549c-5a40045ab71a@sc3d.org> X-detected-operating-system: by eggs.gnu.org: GNU/Linux 2.2.x-3.x [generic] X-Received-From: 2001:4830:134:3::e X-Spam-Score: -5.0 (-----) X-Debbugs-Envelope-To: 28179 Cc: 28179@debbugs.gnu.org X-BeenThere: debbugs-submit@debbugs.gnu.org X-Mailman-Version: 2.1.18 Precedence: list List-Id: 
List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Reply-To: Eli Zaretskii Errors-To: debbugs-submit-bounces@debbugs.gnu.org Sender: "Debbugs-submit" X-Spam-Score: -5.0 (-----) > From: Reuben Thomas > Date: Wed, 23 Aug 2017 11:59:41 +0100 > > I now understand the two meanings of "multibyte", but I don't understand > how my patch is deficient. I didn't say it was deficient, I asked whether you verified that either (a) the result is always multibyte, or (b) that we don't need to worry about it being multibyte if it is pure-ASCII. > So in fact even when the string isn't copied (as in my patch, where I > also use a third argument of t to decode-coding-string) it appears to be > changed to a multibyte string. Fine, if you are sure, go ahead and push. Thanks. From debbugs-submit-bounces@debbugs.gnu.org Thu Aug 24 13:32:49 2017 Received: (at 28179) by debbugs.gnu.org; 24 Aug 2017 17:32:49 +0000 Received: from localhost ([127.0.0.1]:53077 helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1dkvzt-0005we-0v for submit@debbugs.gnu.org; Thu, 24 Aug 2017 13:32:49 -0400 Received: from mail-oi0-f67.google.com ([209.85.218.67]:36381) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1dkvzp-0005wP-7g for 28179@debbugs.gnu.org; Thu, 24 Aug 2017 13:32:45 -0400 Received: by mail-oi0-f67.google.com with SMTP id t88so117053oij.3 for <28179@debbugs.gnu.org>; Thu, 24 Aug 2017 10:32:45 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20161025; h=mime-version:sender:in-reply-to:references:from:date:message-id :subject:to:cc; bh=2ntu86ObcbQPCxsqAWQzY0ZL6YpurYXsKCJHcV1S2jU=; b=PWH+7HGFR93SsbUE+glC2Y5wmZG75pYLdVYTnTZ4na47ueRZz+ytzbhjwVlnUXfh1r QUSa/oQZbeWZP65dsKnlymsktTnxWEaMURVys+H1nNmkDTUDk46rnLvBuulX2obr9ABz OQR3r5hpWNh4cIY2bECtfA90eAOmwaz1bY562vEbBUHojD14aWifFgpLa+CLyv0IbZzS 7pezlyDrURwnYkvN6uEbrOj4oEqN/cv7bBWw4tFmb3XvrkSX9gLpwCvy8voAqIJXQyCy 
US0QlNgXst4YMD9a6bdj218SXc9ZEwDHuT1RIUB3pstEw68AAapda/J+DvAVZ3JEB1f4 OGTA== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:mime-version:sender:in-reply-to:references:from :date:message-id:subject:to:cc; bh=2ntu86ObcbQPCxsqAWQzY0ZL6YpurYXsKCJHcV1S2jU=; b=R1y6M/R6kdQWdcp5odo+NERgNW4dT9NxUG7f/bcpfL4jJSW2V8zVxXPWXgP0XgekDn fOZqVrKHb8AanmB5NKiXb+HpBW2zcs03+8pt22mzFIS3THB5BrbtUzqgKTDpOC705hDE OeUJh4KUwIvVqbGdK68Z1KodDuqfUT36wMVidWCHu71k3E1+yGNeco1ZDU5V5C2bpKdp PhgPqRK+tR9iCZH1ETeCciubJv8IdbE1V5MZR5Y0Mc0H2Zo6loWVJqemEGJUk/mXSrgt CirEE31q56eDRFxL1Kah3YLYH3v6+iOuMofZhS/yOd0zmYukA5yFypi70xK0ERzlszLQ rTwg== X-Gm-Message-State: AHYfb5h9Kx6+Muy41rKPlIgjaXTtzHFQiEdQPC0ZePgp6AW7Khh+3TvA rE5VnXivuyzJozgVoYGGFN4XRuM2NQ== X-Received: by 10.202.67.7 with SMTP id q7mr8584052oia.62.1503595959180; Thu, 24 Aug 2017 10:32:39 -0700 (PDT) MIME-Version: 1.0 Received: by 10.74.18.129 with HTTP; Thu, 24 Aug 2017 10:32:38 -0700 (PDT) In-Reply-To: <83lgm89a1v.fsf@gnu.org> References: <0df1f5ab-e99b-b473-549c-5a40045ab71a@sc3d.org> <83lgm89a1v.fsf@gnu.org> From: Noam Postavsky Date: Thu, 24 Aug 2017 13:32:38 -0400 X-Google-Sender-Auth: U2MLHb7fb3kOe_XNbpvyNmHNypk Message-ID: Subject: Re: bug#28179: Fwd: Re: bug#28179: Fix use of string-to-multibyte in ispell.el To: Eli Zaretskii Content-Type: text/plain; charset="UTF-8" X-Spam-Score: 0.7 (/) X-Debbugs-Envelope-To: 28179 Cc: 28179@debbugs.gnu.org, Reuben Thomas X-BeenThere: debbugs-submit@debbugs.gnu.org X-Mailman-Version: 2.1.18 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: debbugs-submit-bounces@debbugs.gnu.org Sender: "Debbugs-submit" X-Spam-Score: 0.7 (/) On Thu, Aug 24, 2017 at 12:59 PM, Eli Zaretskii wrote: >> From: Reuben Thomas >> So in fact even when the string isn't copied (as in my patch, where I >> also use a third argument of t to decode-coding-string) it appears to be >> changed to a multibyte string. 
> > Fine, if you are sure, go ahead and push. But please, think of the children^H^H^H^H^H^H^H^H readers (of your patch)! Put this information in the commit message. From debbugs-submit-bounces@debbugs.gnu.org Thu Aug 24 13:45:49 2017 Received: (at 28179) by debbugs.gnu.org; 24 Aug 2017 17:45:50 +0000 Received: from localhost ([127.0.0.1]:53099 helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1dkwCT-0006FF-NK for submit@debbugs.gnu.org; Thu, 24 Aug 2017 13:45:49 -0400 Received: from mail-wm0-f43.google.com ([74.125.82.43]:35614) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1dkwCR-0006F3-VY for 28179@debbugs.gnu.org; Thu, 24 Aug 2017 13:45:48 -0400 Received: by mail-wm0-f43.google.com with SMTP id b189so1171276wmd.0 for <28179@debbugs.gnu.org>; Thu, 24 Aug 2017 10:45:47 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=sc3d.org; s=google; h=subject:to:cc:references:from:organization:message-id:date :user-agent:mime-version:in-reply-to:content-transfer-encoding :content-language; bh=WdKSZcRykexCb2azYCuIqUNd4ynsvmONGw06iOml0go=; b=smEMhSlGiFsk2alXmJGKqacYgOYUTmi0Wi0sxvUCkcSS2cO2ZvD/8oB31VEt+gVXFt UH4a0/O2B0q3LbwvNvAV6ucZ8sb2HnX3atQZ/Aa108EWzBOhP7LaB4IWoHR6qeWCneM0 b4EzHrsbQuqVPaDp18cSDQ3ETTnhrZsybGQU0= X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:subject:to:cc:references:from:organization :message-id:date:user-agent:mime-version:in-reply-to :content-transfer-encoding:content-language; bh=WdKSZcRykexCb2azYCuIqUNd4ynsvmONGw06iOml0go=; b=qxyDWWnM/23v89xFlHc/2TWSlfEvuyJWfutYcWqw/Umcn2mEBlNjleepIY6mjPdbb0 JjIwxvN/998UGy7kZPfHrMI9CSCKHSuGRAGPg55pdUqkdi2YaxIY4cgAUMwEC/ZwGVo/ l8Jcq2QPkfO78AwO70IEZzN8uaYL9J1t8p4AT+d+XzkoWY6LdXMVs/H6e+dGX3tmjsjj O9a7UmIue+saoq+MbuKm6ZZX3pCaCPj+VTzYj5CYzmPswkRPY1Z7NHPUSyzmlDqQ2S8M pepesQYckE5smEdD5g1VtHDA0RWVic82758b1BteujpB7Xak230Fodu49U6YQ7zqWNag O6xA== X-Gm-Message-State: 
AHYfb5gHEmrv8X4axXG/M4QnYwE9LaCywzsUkIdVAPZureFr18WFGwCz WQlxVhA2nBAlwWEWgod2Dg== X-Received: by 10.28.63.13 with SMTP id m13mr4937749wma.63.1503596741436; Thu, 24 Aug 2017 10:45:41 -0700 (PDT) Received: from ?IPv6:2a02:c7d:51cb:c700:e8f6:5609:eba7:e5e? ([2a02:c7d:51cb:c700:e8f6:5609:eba7:e5e]) by smtp.gmail.com with ESMTPSA id 27sm5930391wru.62.2017.08.24.10.45.40 (version=TLS1_2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128/128); Thu, 24 Aug 2017 10:45:40 -0700 (PDT) Subject: Re: Fwd: Re: bug#28179: Fix use of string-to-multibyte in ispell.el To: Eli Zaretskii References: <0df1f5ab-e99b-b473-549c-5a40045ab71a@sc3d.org> <83lgm89a1v.fsf@gnu.org> From: Reuben Thomas Organization: =?UTF-8?B?U0PCs0Q=?= Message-ID: <4d1a7990-f076-9e22-39df-6edeef17ef7b@sc3d.org> Date: Thu, 24 Aug 2017 18:45:33 +0100 User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:52.0) Gecko/20100101 Thunderbird/52.2.1 MIME-Version: 1.0 In-Reply-To: <83lgm89a1v.fsf@gnu.org> Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: quoted-printable Content-Language: en-GB X-Spam-Score: -0.0 (/) X-Debbugs-Envelope-To: 28179 Cc: 28179@debbugs.gnu.org X-BeenThere: debbugs-submit@debbugs.gnu.org X-Mailman-Version: 2.1.18 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: debbugs-submit-bounces@debbugs.gnu.org Sender: "Debbugs-submit" X-Spam-Score: -0.0 (/) On 24/08/17 17:59, Eli Zaretskii wrote: >> From: Reuben Thomas >> Date: Wed, 23 Aug 2017 11:59:41 +0100 >> >> I now understand the two meanings of "multibyte", but I don't understand >> how my patch is deficient. > I didn't say it was deficient, Sorry, I was unclear. I meant, precisely, I don't see why you think my patch's code returns a string that is not multibyte. > I asked whether you verified that > either (a) the result is always multibyte I believe I showed this is the case.
>
>> So in fact even when the string isn't copied (as in my patch, where I
>> also use a third argument of t to decode-coding-string) it appears to be
>> changed to a multibyte string.
> Fine, if you are sure, go ahead and push.
>

The reason I am asking again is because you first said:

> What if decode-coding-string returns a pure ASCII string, which is
> therefore unibyte?

and then later you said:

> The way I meant it, it has to do with the internal flag marking a
> string either unibyte or multibyte. Observe:
> (multibyte-string-p "abcd") => nil
>
> but
>
> (multibyte-string-p (decode-coding-string "abcd" 'utf-8)) => t

In other words:

1. As far as I can tell from the above (and my own confirmatory experiments and reading of the documentation), a pure ASCII string can be multibyte (it's a matter of the multibyte flag, not the number of bytes used to store each character).

2. decode-coding-string always returns a multibyte string.

Since these two observations seemed to mean that you contradicted yourself, I was checking whether in fact I had misunderstood (so that for example one of my two observations above is wrong), or if your original understanding was incomplete (so that in fact your question about decode-coding-string is therefore misguided, because it can return a pure ASCII unibyte string (in the coding sense) which is nonetheless a multibyte string (in the sense that multibyte-string-p on it returns t).

Sorry about the miscommunication. In any case, I think the code is correct, your original question was misguided, and I shall push, with, as Noam requested in another message, an explanation of my assumptions. No need to reply further unless you think there really is a problem!
-- https://rrt.sc3d.org From debbugs-submit-bounces@debbugs.gnu.org Thu Aug 24 14:22:08 2017 Received: (at 28179) by debbugs.gnu.org; 24 Aug 2017 18:22:08 +0000 Received: from localhost ([127.0.0.1]:53153 helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1dkwlc-0000X8-0O for submit@debbugs.gnu.org; Thu, 24 Aug 2017 14:22:08 -0400 Received: from eggs.gnu.org ([208.118.235.92]:46979) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1dkwlb-0000Ww-3d for 28179@debbugs.gnu.org; Thu, 24 Aug 2017 14:22:07 -0400 Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71) (envelope-from ) id 1dkwlR-0003EM-Ae for 28179@debbugs.gnu.org; Thu, 24 Aug 2017 14:22:01 -0400 X-Spam-Checker-Version: SpamAssassin 3.3.2 (2011-06-06) on eggs.gnu.org X-Spam-Level: X-Spam-Status: No, score=0.8 required=5.0 tests=BAYES_50,RP_MATCHES_RCVD autolearn=disabled version=3.3.2 Received: from fencepost.gnu.org ([2001:4830:134:3::e]:43911) by eggs.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1dkwlR-0003EC-7z; Thu, 24 Aug 2017 14:21:57 -0400 Received: from 84.94.185.246.cable.012.net.il ([84.94.185.246]:1568 helo=home-c4e4a596f7) by fencepost.gnu.org with esmtpsa (TLS1.2:RSA_AES_256_CBC_SHA1:256) (Exim 4.82) (envelope-from ) id 1dkwlO-0004r7-KO; Thu, 24 Aug 2017 14:21:57 -0400 Date: Thu, 24 Aug 2017 21:20:46 +0300 Message-Id: <83a82o969t.fsf@gnu.org> From: Eli Zaretskii To: Reuben Thomas In-reply-to: <4d1a7990-f076-9e22-39df-6edeef17ef7b@sc3d.org> (message from Reuben Thomas on Thu, 24 Aug 2017 18:45:33 +0100) Subject: Re: Fwd: Re: bug#28179: Fix use of string-to-multibyte in ispell.el References: <0df1f5ab-e99b-b473-549c-5a40045ab71a@sc3d.org> <83lgm89a1v.fsf@gnu.org> <4d1a7990-f076-9e22-39df-6edeef17ef7b@sc3d.org> X-detected-operating-system: by eggs.gnu.org: GNU/Linux 2.2.x-3.x [generic] X-Received-From: 2001:4830:134:3::e X-Spam-Score: -5.0 (-----) X-Debbugs-Envelope-To: 28179 Cc: 28179@debbugs.gnu.org X-BeenThere:
debbugs-submit@debbugs.gnu.org X-Mailman-Version: 2.1.18 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Reply-To: Eli Zaretskii Errors-To: debbugs-submit-bounces@debbugs.gnu.org Sender: "Debbugs-submit" X-Spam-Score: -5.0 (-----) > Cc: 28179@debbugs.gnu.org > From: Reuben Thomas > Date: Thu, 24 Aug 2017 18:45:33 +0100 > > The reason I am asking again is because you first said: > > > What if decode-coding-string returns a pure ASCII string, which is > > therefore unibyte? > > and then later you said: > > > The way I meant it, it has to do with the internal flag marking a > > string either unibyte or multibyte. Observe: > > (multibyte-string-p "abcd") => nil > > > > but > > > > (multibyte-string-p (decode-coding-string "abcd" 'utf-8)) => t That example may be conclusive for UTF-8, but is it conclusive for _any_ encoding? I don't know. E.g., what about the ISO-2022 based encodings, where all the bytes are (AFAIR) pure ASCII? > 1. As far as I can tell from the above (and my own confirmatory > experiments and reading of the documentation), a pure ASCII string can > be multibyte (it's a matter of the multibyte flag, not the number of > bytes used to store each character). > > 2. decode-coding-string always returns a multibyte string. Can you show me why 2 is always correct? It might be, I simply don't know. All I know is that in general relying on plain-ASCII strings to be always multibyte in any given situation is risky, we were bitten by that a few times. But maybe it's not an issue in this case. Which is why I was asking you whether you have sufficient basis to believe this to be so in this case. 
> Since these two observations seemed to mean that you contradicted > yourself, I was checking whether in fact I had misunderstood (so that > for example one of my two observations above is wrong), or if your > original understanding was incomplete (so that in fact your question > about decode-coding-string is therefore misguided, because it can return > a pure ASCII unibyte string (in the coding sense) which is nonetheless a > multibyte string (in the sense that multibyte-string-p on it returns t). I only used decode-coding-string because I remembered it as an easy way of creating a multibyte ASCII string, when the coding-system is UTF-8, that's all. There was no contradiction in what I said, at least not an intended one. From debbugs-submit-bounces@debbugs.gnu.org Thu Aug 24 14:50:27 2017 Received: (at 28179) by debbugs.gnu.org; 24 Aug 2017 18:50:27 +0000 Received: from localhost ([127.0.0.1]:53165 helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1dkxD1-0001Ba-JV for submit@debbugs.gnu.org; Thu, 24 Aug 2017 14:50:27 -0400 Received: from mail-oi0-f54.google.com ([209.85.218.54]:34923) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1dkxCz-0001BM-2g for 28179@debbugs.gnu.org; Thu, 24 Aug 2017 14:50:25 -0400 Received: by mail-oi0-f54.google.com with SMTP id k77so3073071oib.2 for <28179@debbugs.gnu.org>; Thu, 24 Aug 2017 11:50:24 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=sc3d.org; s=google; h=mime-version:in-reply-to:references:from:date:message-id:subject:to :cc; bh=5sM02P8aYmDdhN14lfDeHxagIMj7/wT/l5Ohbw6W35A=; b=d//P/jQ4Iy/YOAj8IpK3uTMo54sSKbtoaFk6uJ+9VM4UTaYdkJbjXea5JsSxhXFzEs Qry7fP3pQa5hPtZ2yoLwfkwTrJ+PEfEng6UkoyQyieAznXUS00kPg9Z1daQ5b58+GAie afjltKDNZGjzahE2uQw7V1zuuivTZEDDi3Ac8= X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:mime-version:in-reply-to:references:from:date :message-id:subject:to:cc; 
bh=5sM02P8aYmDdhN14lfDeHxagIMj7/wT/l5Ohbw6W35A=; b=Z+UA0qkGud/xkOfzwGM3CaAeH8A+ReiJdIg01+2sXaP5dtgn0M5CNR9PMdB/D4/j7J 9TyLoHOmIEnkM0YPZi5B2vY+v+UwKJvWQCmhacojy/07+J8aFGrWEAAXxrmfTbI0UHzH ixBTJOWbggiHtm5eFJHe+4NWJhOG5kVTu2DNp4Wsk7kJinp3CvHamKvc5Yp7BWAZXLeX WmB6lbkXB9AZRsLoA83DjRd16OdZs2RYADAWkEloTtJtnG6mWztpV30aelLH8vJ9rYm5 3MLwrXCs5uLebYaltEjbc8kNGA1qK69XL862N9zsTVPqRqnprJkXPEkZ682pi8DQirHz eqKg== X-Gm-Message-State: AHYfb5jxywnGYKLPTl2fmnyukDi+gjb0z8rrmCYecCZRz6cupkSk/Otk kyiyq77+3n+DeyWgCSMQkPC2age/oLzYkBM= X-Received: by 10.202.252.199 with SMTP id a190mr9806604oii.268.1503600618043; Thu, 24 Aug 2017 11:50:18 -0700 (PDT) MIME-Version: 1.0 Received: by 10.157.60.154 with HTTP; Thu, 24 Aug 2017 11:50:17 -0700 (PDT) In-Reply-To: <83a82o969t.fsf@gnu.org> References: <0df1f5ab-e99b-b473-549c-5a40045ab71a@sc3d.org> <83lgm89a1v.fsf@gnu.org> <4d1a7990-f076-9e22-39df-6edeef17ef7b@sc3d.org> <83a82o969t.fsf@gnu.org> From: Reuben Thomas Date: Thu, 24 Aug 2017 19:50:17 +0100 Message-ID: Subject: Re: Fwd: Re: bug#28179: Fix use of string-to-multibyte in ispell.el To: Eli Zaretskii Content-Type: text/plain; charset="UTF-8" X-Spam-Score: 0.5 (/) X-Debbugs-Envelope-To: 28179 Cc: 28179@debbugs.gnu.org X-BeenThere: debbugs-submit@debbugs.gnu.org X-Mailman-Version: 2.1.18 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: debbugs-submit-bounces@debbugs.gnu.org Sender: "Debbugs-submit" X-Spam-Score: 0.5 (/) On 24 August 2017 at 19:20, Eli Zaretskii wrote: >> Cc: 28179@debbugs.gnu.org >> From: Reuben Thomas >> Date: Thu, 24 Aug 2017 18:45:33 +0100 >> >> The reason I am asking again is because you first said: >> >> > What if decode-coding-string returns a pure ASCII string, which is >> > therefore unibyte? >> >> and then later you said: >> >> > The way I meant it, it has to do with the internal flag marking a >> > string either unibyte or multibyte. 
Observe: >> > (multibyte-string-p "abcd") => nil >> > >> > but >> > >> > (multibyte-string-p (decode-coding-string "abcd" 'utf-8)) => t > > That example may be conclusive for UTF-8, but is it conclusive for > _any_ encoding? I don't know. E.g., what about the ISO-2022 based > encodings, where all the bytes are (AFAIR) pure ASCII? (multibyte-string-p (decode-coding-string "abcd" 'iso-2022-jp)) => t I still don't understand what you're getting at: the bytes in "abcd" are pure ASCII, whatever coding system one is decoding from. > Can you show me why 2 is always correct? It might be, I simply don't > know. All I know is that in general relying on plain-ASCII strings to > be always multibyte in any given situation is risky, we were bitten by > that a few times. But maybe it's not an issue in this case. Which is > why I was asking you whether you have sufficient basis to believe this > to be so in this case. I don't know. As I said before, the make-obsolete notice for string-to-multibyte says "use `decode-coding-string'". If it is as tricky as you suggest it might be, then the notice should be updated to point to more detailed guidance. The relevant commit is: commit f74d496478cd57f252817bd7437fe1b7972ce01f Author: Stefan Monnier Date: Mon Jan 30 13:02:18 2017 -0500 * lisp/subr.el (string-make-unibyte, string-make-multibyte): Obsolete. diff --git a/lisp/subr.el b/lisp/subr.el index a6ba05c..a204577 100644 --- a/lisp/subr.el +++ b/lisp/subr.el @@ -1417,8 +1417,10 @@ posn-object-width-height ;; bug#23850 (make-obsolete 'string-to-unibyte "use `encode-coding-string'." "26.1") (make-obsolete 'string-as-unibyte "use `encode-coding-string'." "26.1") +(make-obsolete 'string-make-unibyte "use `encode-coding-string'." "26.1") (make-obsolete 'string-to-multibyte "use `decode-coding-string'." "26.1") (make-obsolete 'string-as-multibyte "use `decode-coding-string'." "26.1") +(make-obsolete 'string-make-multibyte "use `decode-coding-string'." 
"26.1") I'm going to close this bug; if better documentation is needed, both for the obsolescence of string-to-multibyte and for multibyte strings in general, that's a new bug. -- https://rrt.sc3d.org From debbugs-submit-bounces@debbugs.gnu.org Thu Aug 24 14:52:30 2017 Received: (at 28179-done) by debbugs.gnu.org; 24 Aug 2017 18:52:30 +0000 Received: from localhost ([127.0.0.1]:53169 helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1dkxF0-0001ER-1v for submit@debbugs.gnu.org; Thu, 24 Aug 2017 14:52:30 -0400 Received: from mail-wm0-f54.google.com ([74.125.82.54]:38081) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1dkxEx-0001ED-QD for 28179-done@debbugs.gnu.org; Thu, 24 Aug 2017 14:52:28 -0400 Received: by mail-wm0-f54.google.com with SMTP id z132so2029871wmg.1 for <28179-done@debbugs.gnu.org>; Thu, 24 Aug 2017 11:52:27 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=sc3d.org; s=google; h=to:from:subject:organization:message-id:date:user-agent :mime-version:content-transfer-encoding:content-language; bh=bxEIsgl8zBBLDAlr+5MM8fRVUbfHo2oWvqgA4J8mdkk=; b=xoNZpXWad/5tqJtEodW8c9u615ctZXsKshaYUPpoXs0Xz7gvBl/3/sppYKldvuEoUb WjuahD0Zf28bqOyNgtgoGORq9FU6Q9bJsclbpTNZUh+qUuIYPo9T5yLGFu6t5YdcxrUt 5CZH5ERH5l7qBrWB+1oxLCKRf139EvwRK9g1s= X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:to:from:subject:organization:message-id:date :user-agent:mime-version:content-transfer-encoding:content-language; bh=bxEIsgl8zBBLDAlr+5MM8fRVUbfHo2oWvqgA4J8mdkk=; b=qeKIlZ3RkkDkqPwfREhDwI/xQirs+lmISgk6jyml+d8fwHs8FLBv7gX86l8DgHup++ /FhhA31tBSOAck4BBr0NvCB+Q7Cdqm094S8eozONIByLgx8zWU83ag1JGgCy7iMkDSo+ +EzW8PfcLVmVCUfapESzXWRweN88cPn/VAyM30oSt+tZCIwVtK3JvRWCXZK3I7zfzmBc /2LwfEwfPHEWOJQihcarh2o51iO4ZaSnZCddUcXByJ6uRrgONMes0DuD8RCNlLL4nzmk VftihE5Jf3G5yNu51fE6nxfpgTWHgHhEI61V0264AGhtL3MjrOpT63BNqbC8lEieO/UW PJ+A== X-Gm-Message-State: 
AHYfb5iHlm9ytAp4nmpM0uYxDo6qmH7vGh+B/8kvGXtM/b67EOauiSEl 3cLHqS3DdWeYMnfTPrWkTQ== X-Received: by 10.28.41.129 with SMTP id p123mr101229wmp.148.1503600741422; Thu, 24 Aug 2017 11:52:21 -0700 (PDT) Received: from ?IPv6:2a02:c7d:51cb:c700:e8f6:5609:eba7:e5e? ([2a02:c7d:51cb:c700:e8f6:5609:eba7:e5e]) by smtp.gmail.com with ESMTPSA id b186sm5398163wma.24.2017.08.24.11.52.19 for <28179-done@debbugs.gnu.org> (version=TLS1_2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128/128); Thu, 24 Aug 2017 11:52:19 -0700 (PDT) To: 28179-done@debbugs.gnu.org From: Reuben Thomas Subject: Closing bug: patch is installed Organization: =?UTF-8?B?U0PCs0Q=?= Message-ID: Date: Thu, 24 Aug 2017 19:52:11 +0100 User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:52.0) Gecko/20100101 Thunderbird/52.2.1 MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: 7bit Content-Language: en-GB X-Spam-Score: -0.0 (/) X-Debbugs-Envelope-To: 28179-done X-BeenThere: debbugs-submit@debbugs.gnu.org X-Mailman-Version: 2.1.18 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: debbugs-submit-bounces@debbugs.gnu.org Sender: "Debbugs-submit" X-Spam-Score: -0.0 (/) I've installed the patch, so closing the bug. There are some lingering doubts over the correctness of the use of decode-coding-string, but they're outside the scope of this bug. 
-- https://rrt.sc3d.org From debbugs-submit-bounces@debbugs.gnu.org Thu Aug 24 15:03:01 2017 Received: (at 28179) by debbugs.gnu.org; 24 Aug 2017 19:03:01 +0000 Received: from localhost ([127.0.0.1]:53176 helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1dkxPB-0001U4-0N for submit@debbugs.gnu.org; Thu, 24 Aug 2017 15:03:01 -0400 Received: from eggs.gnu.org ([208.118.235.92]:55558) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1dkxP7-0001To-Qd for 28179@debbugs.gnu.org; Thu, 24 Aug 2017 15:02:59 -0400 Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71) (envelope-from ) id 1dkxOz-0005Kl-Js for 28179@debbugs.gnu.org; Thu, 24 Aug 2017 15:02:52 -0400 X-Spam-Checker-Version: SpamAssassin 3.3.2 (2011-06-06) on eggs.gnu.org X-Spam-Level: X-Spam-Status: No, score=-1.9 required=5.0 tests=BAYES_00,RP_MATCHES_RCVD autolearn=disabled version=3.3.2 Received: from fencepost.gnu.org ([2001:4830:134:3::e]:44703) by eggs.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1dkxOz-0005KZ-Gc; Thu, 24 Aug 2017 15:02:49 -0400 Received: from 84.94.185.246.cable.012.net.il ([84.94.185.246]:1587 helo=home-c4e4a596f7) by fencepost.gnu.org with esmtpsa (TLS1.2:RSA_AES_256_CBC_SHA1:256) (Exim 4.82) (envelope-from ) id 1dkxOv-00037w-4u; Thu, 24 Aug 2017 15:02:49 -0400 Date: Thu, 24 Aug 2017 22:02:20 +0300 Message-Id: <83378g94cj.fsf@gnu.org> From: Eli Zaretskii To: Reuben Thomas In-reply-to: (message from Reuben Thomas on Thu, 24 Aug 2017 19:50:17 +0100) Subject: Re: Fwd: Re: bug#28179: Fix use of string-to-multibyte in ispell.el References: <0df1f5ab-e99b-b473-549c-5a40045ab71a@sc3d.org> <83lgm89a1v.fsf@gnu.org> <4d1a7990-f076-9e22-39df-6edeef17ef7b@sc3d.org> <83a82o969t.fsf@gnu.org> X-detected-operating-system: by eggs.gnu.org: GNU/Linux 2.2.x-3.x [generic] X-Received-From: 2001:4830:134:3::e X-Spam-Score: -5.0 (-----) X-Debbugs-Envelope-To: 28179 Cc: 28179@debbugs.gnu.org X-BeenThere: debbugs-submit@debbugs.gnu.org 
X-Mailman-Version: 2.1.18 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Reply-To: Eli Zaretskii Errors-To: debbugs-submit-bounces@debbugs.gnu.org Sender: "Debbugs-submit" X-Spam-Score: -5.0 (-----) > From: Reuben Thomas > Date: Thu, 24 Aug 2017 19:50:17 +0100 > Cc: 28179@debbugs.gnu.org > > >> > (multibyte-string-p (decode-coding-string "abcd" 'utf-8)) => t > > > > That example may be conclusive for UTF-8, but is it conclusive for > > _any_ encoding? I don't know. E.g., what about the ISO-2022 based > > encodings, where all the bytes are (AFAIR) pure ASCII? > > (multibyte-string-p (decode-coding-string "abcd" 'iso-2022-jp)) => t That's not what I meant, but never mind. I only replied to tell there was no contradiction in my previous messages, and no confusion on my part, that's all. Thanks. From unknown Fri Jun 20 07:19:13 2025 Received: (at fakecontrol) by fakecontrolmessage; To: internal_control@debbugs.gnu.org From: Debbugs Internal Request Subject: Internal Control Message-Id: bug archived. Date: Fri, 22 Sep 2017 11:24:04 +0000 User-Agent: Fakemail v42.6.9 # This is a fake control message. # # The action: # bug archived. thanks # This fakemail brought to you by your local debbugs # administrator