From: bug#53236 <53236@debbugs.gnu.org>
Subject: Status: 26.1; encode-coding-string does not encode the string as expected
Date: Thu, 19 Jun 2025 06:04:35 +0000

retitle 53236 26.1; encode-coding-string does not encode the string as expected
reassign 53236 emacs
submitter 53236 Markus Triska
severity 53236 normal
thanks

From: Markus Triska
To: bug-gnu-emacs@gnu.org
Subject: 26.1; encode-coding-string does not encode the string as expected
Date: Thu, 13 Jan 2022 20:45:57 +0100
Message-ID: <8735lra07e.fsf@metalevel.at>

Dear all,

please consider the UTF-8 encoding of the Unicode codepoint 0x80, which
is formed by two bytes. In hexadecimal notation, they are: 0xC2 0x80.

We can use decode-coding-string to verify that this byte sequence is
decoded to 0x80 when specifying utf-8, which works exactly as expected:

(decode-coding-string "\xC2\x80" 'utf-8)

This yields "\200", which is the same as "\x80", as verified via:

(string= "\200" "\x80") --> t

Correspondingly, I expect (encode-coding-string "\200" 'utf-8) to yield
a string equivalent to "\xC2\x80", but that seems not to be the case. I get:

(encode-coding-string "\200" 'utf-8) --> "\200"

And therefore, unexpectedly:

(string= (encode-coding-string "\200" 'utf-8) "\xC2\x80") --> nil

It appears that encode-coding-string does not encode the string in UTF-8
as expected. Is there any way to obtain the desired encoding with
encode-coding-string, i.e., the UTF-8-encoded string "\xC2\x80"?

Thank you and all the best!
Markus


In GNU Emacs 26.1 (build 3, x86_64-pc-linux-gnu, X toolkit, Xaw scroll bars)
 of 2019-04-09 built on mt-laptop
Windowing system distributor 'The X.Org Foundation', version 11.0.12004000
System Description: Ubuntu 19.04

Configured features:
XPM JPEG GIF PNG SOUND GSETTINGS NOTIFY GNUTLS LIBXML2 FREETYPE XFT ZLIB
TOOLKIT_SCROLL_BARS LUCID X11 THREADS

Important settings:
  value of $LC_MONETARY: en_GB.UTF-8
  value of $LC_NUMERIC: en_GB.UTF-8
  value of $LC_TIME: en_GB.UTF-8
  value of $LANG: en_US.UTF-8
  value of $XMODIFIERS: @im=ibus
  locale-coding-system: utf-8-unix

From: Philipp Stephani
To: Markus Triska
Cc: 53236@debbugs.gnu.org
Subject: Re: bug#53236: 26.1; encode-coding-string does not encode the string as expected
Date: Thu, 13 Jan 2022 21:23:33 +0100
In-Reply-To: <8735lra07e.fsf@metalevel.at>

On Thu, 13 Jan 2022 at 21:14, Markus Triska wrote:
>
> Dear all,
>
> please consider the UTF-8 encoding of the Unicode codepoint 0x80, which
> is formed by two bytes. In hexadecimal notation, they are: 0xC2 0x80.
>
> We can use decode-coding-string to verify that this byte sequence is
> decoded to 0x80 when specifying utf-8, which works exactly as expected:
>
> (decode-coding-string "\xC2\x80" 'utf-8)
>
> This yields "\200", which is the same as "\x80", as verified via:
>
> (string= "\200" "\x80") --> t

There are two possible interpretations of "\200":
1. The unibyte string containing the byte #x80
2. The multibyte string containing the Unicode character U+0080

The string literal "\200" gives you the former, while
(decode-coding-string "\xC2\x80" 'utf-8) gives you the latter. In fact,

(string= (decode-coding-string "\xC2\x80" 'utf-8) "\200") ⇒ nil

but

(string= (decode-coding-string "\xC2\x80" 'utf-8) "\u0080") ⇒ t

>
> Correspondingly, I expect (encode-coding-string "\200" 'utf-8) to yield
> a string equivalent to "\xC2\x80", but that seems not to be the case. I get:
>
> (encode-coding-string "\200" 'utf-8) --> "\200"

Here "\200" gives you the unibyte string that contains the byte #x80.
That can't be encoded as UTF-8 (since UTF-8 encodes Unicode scalar
values, not raw bytes), so it's left alone. However,

(encode-coding-string "\u0080" 'utf-8) ⇒ "\302\200"

There's some background in the chapter "Text representations" in the
ELisp manual.

HTH


From: Eli Zaretskii
To: Markus Triska
Cc: 53236@debbugs.gnu.org
Subject: Re: bug#53236: 26.1; encode-coding-string does not encode the string as expected
Date: Fri, 14 Jan 2022 08:55:30 +0200
Message-Id: <838rvi3ixp.fsf@gnu.org>
In-Reply-To: <8735lra07e.fsf@metalevel.at> (message from Markus Triska on Thu, 13 Jan 2022 20:45:57 +0100)

> From: Markus Triska
> Date: Thu, 13 Jan 2022 20:45:57 +0100
>
> Correspondingly, I expect (encode-coding-string "\200" 'utf-8) to yield
> a string equivalent to "\xC2\x80", but that seems not to be the case. I get:
>
> (encode-coding-string "\200" 'utf-8) --> "\200"
>
> And therefore, unexpectedly:
>
> (string= (encode-coding-string "\200" 'utf-8) "\xC2\x80") --> nil

"\200" is a unibyte string, and encoding unibyte strings returns those
strings without changing them.

This is not a bug, just a dark corner of encoding/decoding stuff.

From: Andreas Schwab
To: Eli Zaretskii
Cc: 53236@debbugs.gnu.org, Markus Triska
Subject: Re: bug#53236: 26.1; encode-coding-string does not encode the string as expected
Date: Fri, 14 Jan 2022 11:00:17 +0100
Message-ID: <87sftq63im.fsf@igel.home>
In-Reply-To: <838rvi3ixp.fsf@gnu.org>

On Jan 14 2022, Eli Zaretskii wrote:

>> From: Markus Triska
>> Date: Thu, 13 Jan 2022 20:45:57 +0100
>>
>> Correspondingly, I expect (encode-coding-string "\200" 'utf-8) to yield
>> a string equivalent to "\xC2\x80", but that seems not to be the case. I get:
>>
>> (encode-coding-string "\200" 'utf-8) --> "\200"
>>
>> And therefore, unexpectedly:
>>
>> (string= (encode-coding-string "\200" 'utf-8) "\xC2\x80") --> nil
>
> "\200" is a unibyte string, and encoding unibyte strings returns those
> strings without changing them.
>
> This is not a bug, just a dark corner of encoding/decoding stuff.

Or a dark corner of the string syntax.

ELISP> (multibyte-string-p "\200")
nil
ELISP> (multibyte-string-p "\x80")
nil
ELISP> (multibyte-string-p "\x0080")
t
ELISP> (encode-coding-string "\x0080" 'utf-8)
"\302\200"

-- 
Andreas Schwab, schwab@linux-m68k.org
GPG Key fingerprint = 7578 EB47 D4E5 4D69 2510 2552 DF73 E780 A9DA AEC1
"And now for something completely different."


From: Markus Triska
To: control@debbugs.gnu.org
Subject: Re: bug#53236: 26.1; encode-coding-string does not encode the string as expected
Date: Sat, 15 Jan 2022 07:40:56 +0100
Message-ID: <878rvhbix3.fsf@metalevel.at>
In-Reply-To: <87sftq63im.fsf@igel.home>

close 53236


From: Debbugs Internal Request
To: internal_control@debbugs.gnu.org
Subject: Internal Control
Date: Sat, 12 Feb 2022 12:24:07 +0000

# The action:
# bug archived.
thanks
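
The distinction drawn in the replies (a unibyte string holding the raw byte #x80 versus a multibyte string holding the character U+0080) can be checked with the short sketch below. It assumes Emacs 26 or later and uses only built-in functions (multibyte-string-p, string, decode-coding-string, encode-coding-string). The example- variable names are hypothetical, and decoding the raw byte as ISO Latin-1 is one possible way, not taken from the thread, to turn byte #x80 into the character U+0080 before encoding it.

```elisp
;; Sketch, assuming Emacs 26 or later; only built-in functions are used.
;; The `example-' names are hypothetical, chosen for illustration.

;; The literal "\200" (like "\x80") is a unibyte string: one raw byte #x80.
(defconst example-raw-byte "\200")

;; Each of these is a multibyte string holding the character U+0080.
(defconst example-char-escape "\u0080")      ; Unicode escape in the literal
(defconst example-char-string (string #x80)) ; built from the character code
(defconst example-char-decoded               ; byte #x80 interpreted as Latin-1
  (decode-coding-string example-raw-byte 'iso-latin-1))

;; Encoding a unibyte string leaves its bytes unchanged.
(defconst example-encoded-raw
  (encode-coding-string example-raw-byte 'utf-8))

;; Encoding the character U+0080 yields the two UTF-8 bytes #xC2 #x80.
(defconst example-encoded-char
  (encode-coding-string example-char-decoded 'utf-8))
```

After evaluating it, (multibyte-string-p example-raw-byte) returns nil, (multibyte-string-p example-char-escape) returns t, and (string= example-encoded-char "\xC2\x80") returns t, which is the UTF-8 result the original report asked for.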