From unknown Wed Jun 18 23:09:03 2025 Content-Disposition: inline Content-Transfer-Encoding: quoted-printable MIME-Version: 1.0 X-Mailer: MIME-tools 5.509 (Entity 5.509) Content-Type: text/plain; charset=utf-8 From: bug#74922 <74922@debbugs.gnu.org> To: bug#74922 <74922@debbugs.gnu.org> Subject: Status: 29.4; copy_string_contents doesn't always produce a valid utf-8 Reply-To: bug#74922 <74922@debbugs.gnu.org> Date: Thu, 19 Jun 2025 06:09:03 +0000

retitle 74922 29.4; copy_string_contents doesn't always produce a valid utf-8
reassign 74922 emacs
submitter 74922 Evgeny Kurnevsky
severity 74922 normal
thanks

From debbugs-submit-bounces@debbugs.gnu.org Tue Dec 17 01:08:51 2024 Received: (at submit) by debbugs.gnu.org; 17 Dec 2024 06:08:51 +0000 Received: from localhost ([127.0.0.1]:57238 helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1tNQlH-0003WD-5C for submit@debbugs.gnu.org; Tue, 17 Dec 2024 01:08:51 -0500 Received: from lists.gnu.org ([209.51.188.17]:54500) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1tNQlD-0003W2-Kp for submit@debbugs.gnu.org; Tue, 17 Dec 2024 01:08:48 -0500 Received: from eggs.gnu.org ([2001:470:142:3::10]) by lists.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1tNQlC-00086I-UX for bug-gnu-emacs@gnu.org; Tue, 17 Dec 2024 01:08:46 -0500 Received: from mail-ed1-x52c.google.com ([2a00:1450:4864:20::52c]) by eggs.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_128_GCM_SHA256:128) (Exim 4.90_1) (envelope-from ) id 1tNQlB-0002xE-8W for bug-gnu-emacs@gnu.org; Tue, 17 Dec 2024 01:08:46 -0500 Received: by mail-ed1-x52c.google.com with SMTP id 4fb4d7f45d1cf-5d0d32cd31aso6035532a12.0 for ; Mon, 16 Dec 2024 22:08:43 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20230601; t=1734415721; x=1735020521; darn=gnu.org; h=to:subject:message-id:date:from:mime-version:from:to:cc:subject :date:message-id:reply-to; 
bh=taNMoIz8ymKDr57HpX9zHMLO/SXJ35ueIf2rwlrFY14=; b=fgFV9qsGlBU7gsOJMtGw/pL49oWzZYsG2RcUYWO7Bphd7ehvu1rXpN3j4lcha6MPJE /BhVW28JIWMlTTUltorNnKtYv3h5e2ktNEZlTaVAMMD8Gu9/7x8ph1GGKWWknz4WksgE 8epg6CNTkWXkIx/ySzbRmXEguwg/FUt6+NwH0TZyzAtWi1sb/dEZJ9lXl6E82UJQ0avu /pOjhIaJmi+dWti7Pmjh7nDsJspj5jPWPi0X+49I5KEisTMp8f2NqZeNuz2dN91NWYkT peBrrkAzNWUjL1Yb/eq7Wd3mhkO5EbRZ3KVvTYC9iOq1RWcKUeQ1FHZYP9qhuZ9slNyj TvfA== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1734415721; x=1735020521; h=to:subject:message-id:date:from:mime-version:x-gm-message-state :from:to:cc:subject:date:message-id:reply-to; bh=taNMoIz8ymKDr57HpX9zHMLO/SXJ35ueIf2rwlrFY14=; b=CaKZ+bdF2NZF5uJO/9bKQ/D0M5Gml/ZlqJ+EPEggo21mxVKlwNeLAafLox0yUTDTpC +uKZMf/0ll9zXBSrczFFRNLI0MAN3KkU+akHbBNadHWJq8B5j8Z6vVVSvVTVB7qf6at0 nXoDsADNWQdsV5XDWdCsfTGK76wIscI9pEyRhe6W5uZFjJ9s7OoLJqYsy0ZQp3VL1jU+ NhKEPO2t7wEGrJD9IRt6j2GEpJmsrB90oGTQXhUKj2HrgDTpDPdrIBGt2iBVhPoVOEC4 xyBXLR6HPpHQjHhqyTX6FWuoZP+/Gs9sdf3OuidUcqmryg9KYjjlnKpyF1/P/tKKokHG s2sA== X-Gm-Message-State: AOJu0Yz66ObUbWrp4jF6Q0OtPltLrWuyC8TmFNKJiZiRMiP5YNzlUFyI xevI7yYW5x2sEnf6EkN01AkRlQKd3gL2WVY9iCVTiiyyMcVROLa73PqnUdluuLZYWKBz4Zeir5b ucx9DpoQL4cprYQaIbZJDjutox+ljnrdy4zo= X-Gm-Gg: ASbGncvV8a7DQAZ5BFw/mlhWrCXHlfim9Rvo2FNChQtOLLT9kXvusP4ESm9XrwDq+nx 3hoYgeampc3+l9wsvOtWW+J+fQrTyD9ZoFLkT51STIIKFDhM1O5hiQ/ivR47exsn5Yy0cqg== X-Google-Smtp-Source: AGHT+IHNFtOxnCWGTutSAonw+UOUdH4TDT0g6vBgtwvGxAIXSnh8uA3EiR+jUasIuPRB/yadhf9cJobH/CX0Quahgl8= X-Received: by 2002:a05:6402:34cf:b0:5d1:1064:326a with SMTP id 4fb4d7f45d1cf-5d63c33e9e0mr35706800a12.15.1734415721138; Mon, 16 Dec 2024 22:08:41 -0800 (PST) MIME-Version: 1.0 From: Evgeny Kurnevsky Date: Tue, 17 Dec 2024 06:08:30 +0000 Message-ID: Subject: 29.4; copy_string_contents doesn't always produce a valid utf-8 To: bug-gnu-emacs@gnu.org Content-Type: multipart/alternative; boundary="000000000000f3900306297120a6" Received-SPF: pass client-ip=2a00:1450:4864:20::52c; 
envelope-from=kurnevsky@gmail.com; helo=mail-ed1-x52c.google.com X-Spam_score_int: -20 X-Spam_score: -2.1 X-Spam_bar: -- X-Spam_report: (-2.1 / 5.0 requ) BAYES_00=-1.9, DKIM_SIGNED=0.1, DKIM_VALID=-0.1, DKIM_VALID_AU=-0.1, DKIM_VALID_EF=-0.1, FREEMAIL_FROM=0.001, HTML_MESSAGE=0.001, RCVD_IN_DNSWL_NONE=-0.0001, SPF_HELO_NONE=0.001, SPF_PASS=-0.001 autolearn=ham autolearn_force=no X-Spam_action: no action X-Spam-Score: -1.3 (-) X-Debbugs-Envelope-To: submit X-BeenThere: debbugs-submit@debbugs.gnu.org X-Mailman-Version: 2.1.18 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: debbugs-submit-bounces@debbugs.gnu.org Sender: "Debbugs-submit" X-Spam-Score: -2.3 (--)

--000000000000f3900306297120a6 Content-Type: text/plain; charset="UTF-8"

According to the docs and comment inside module_copy_string_contents it should always produce a valid utf-8 string that can be used in dynamic modules, but it seems it's not always the case. I encountered an emacs crash when using emacs-module-rs because it always expects a valid utf-8 for strings. To reproduce you can call:

  (some-function-from-dynamic-library (encode-coding-string (f-read-text "wg-private-pc.age") 'utf-8 t))

The file is https://github.com/kurnevsky/nixfiles/raw/0b3de016dac551398627a55788b80d4809afcbf9/secrets/wg-private-pc.age

See https://github.com/ubolonton/emacs-module-rs/issues/58 for additional details.

--000000000000f3900306297120a6 Content-Type: text/html; charset="UTF-8" Content-Transfer-Encoding: quoted-printable
--000000000000f3900306297120a6-- From debbugs-submit-bounces@debbugs.gnu.org Tue Dec 17 08:18:19 2024 Received: (at 74922) by debbugs.gnu.org; 17 Dec 2024 13:18:19 +0000 Received: from localhost ([127.0.0.1]:58007 helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1tNXSt-0007OO-2N for submit@debbugs.gnu.org; Tue, 17 Dec 2024 08:18:19 -0500 Received: from eggs.gnu.org ([209.51.188.92]:48724) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1tNXSq-0007O7-UG for 74922@debbugs.gnu.org; Tue, 17 Dec 2024 08:18:17 -0500 Received: from fencepost.gnu.org ([2001:470:142:3::e]) by eggs.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1tNXSk-0001bK-8w; Tue, 17 Dec 2024 08:18:10 -0500 DKIM-Signature: v=1; a=rsa-sha256; q=dns/txt; c=relaxed/relaxed; d=gnu.org; s=fencepost-gnu-org; h=References:Subject:In-Reply-To:To:From:Date: mime-version; bh=caoD6v6UAalds8EnUn3paA7xu0xbzRn9Q5HxiYIZQaE=; b=PHzy5UWfeTdp qJVDztSvY0q9wQpIStPIeWNulKBqSyhKJ9c7Kr+ylLiZGJZynaUUtO9cZ7hlu00aE74puuKROrJ4n I9tBfgPA2fQ6wYfQvnnfZWXgjqEGlbZ30uwaxILzAcrvDn7wshvEFypBoDWiWq79ZqHpCvlakhJd/ 0kDSoLSbGUgYWLwIKrn2PSzb7+SVBfgroZmsNxJrbeuyDLrtpZQZgknf7kGeYNNs5Jg1q9ta98han q6gSczvTUj5h3FdXiw3UZCCk3l6YLdVVz405MhDxPktPvFlvNKdW0iIeEJ8oyADM5nmvPSV/uGBKX W7qrlBeggDPGvquTcy9uxA==; Date: Tue, 17 Dec 2024 15:18:07 +0200 Message-Id: <86msguo3cg.fsf@gnu.org> From: Eli Zaretskii To: Evgeny Kurnevsky In-Reply-To: (message from Evgeny Kurnevsky on Tue, 17 Dec 2024 06:08:30 +0000) Subject: Re: bug#74922: 29.4; copy_string_contents doesn't always produce a valid utf-8 References: X-Spam-Score: -2.3 (--) X-Debbugs-Envelope-To: 74922 Cc: 74922@debbugs.gnu.org X-BeenThere: debbugs-submit@debbugs.gnu.org X-Mailman-Version: 2.1.18 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: debbugs-submit-bounces@debbugs.gnu.org Sender: "Debbugs-submit" X-Spam-Score: -3.3 (---) > From: Evgeny 
Kurnevsky > Date: Tue, 17 Dec 2024 06:08:30 +0000 > > According to the docs and comment inside module_copy_string_contents it should always produce a valid > utf-8 string that can be used in dynamic modules, but it seems it's not always the case. I encountered an > emacs crash when using emacs-module-rs because it always expects a valid utf-8 for strings. To reproduce > you can call: > > (some-function-from-dynamic-library (encode-coding-string (f-read-text "wg-private-pc.age") 'utf-8 t)) > > The file is > https://github.com/kurnevsky/nixfiles/raw/0b3de016dac551398627a55788b80d4809afcbf9/secrets/wg-private-pc.age This string includes raw bytes, it isn't a text string, as far as I could see. It definitely isn't UTF-8 encoded text. What did you expect to happen with it when you copy such a string from Emacs? > See https://github.com/ubolonton/emacs-module-rs/issues/58 for additional details. Can't say there are too many details there... From debbugs-submit-bounces@debbugs.gnu.org Tue Dec 17 08:33:24 2024 Received: (at 74922) by debbugs.gnu.org; 17 Dec 2024 13:33:24 +0000 Received: from localhost ([127.0.0.1]:58080 helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1tNXhP-0008BY-IE for submit@debbugs.gnu.org; Tue, 17 Dec 2024 08:33:23 -0500 Received: from mail-wr1-f46.google.com ([209.85.221.46]:52378) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1tNXhK-0008B7-O6 for 74922@debbugs.gnu.org; Tue, 17 Dec 2024 08:33:15 -0500 Received: by mail-wr1-f46.google.com with SMTP id ffacd0b85a97d-3862ca8e0bbso4539225f8f.0 for <74922@debbugs.gnu.org>; Tue, 17 Dec 2024 05:33:14 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20230601; t=1734442329; x=1735047129; darn=debbugs.gnu.org; h=to:subject:message-id:date:from:in-reply-to:references:mime-version :from:to:cc:subject:date:message-id:reply-to; bh=bxejSS1suD1NvAFa9thWQZcBxF/91+OSXa9Gz1RxWyc=; 
b=mDK9Z3ky0nIS56JqU111cXquGJPXHW2UMz4scXVdpzmAdkKCckrfjDEGoMIWCr5a2S yT9hiBtxW5qXp4BOfA6cKttqB3kBA6jZgRm1yTXbwWhGNGhSF78CY6vE/BqFTzdZtAg+ m2dygjAaucuV6NCeOBNnS04dbGTp/e3yeJd3jePhqbbh6cs3BdRuUR0Kx+QLeXKiGVZ9 1lbTBfI0tP//+OCmar1IBALchPFMR0uFJlM3y00qbgoNQ0awKsoarGbKUL8AIVPS8X3o 9x44XuXjU5JoCSyE+qwEjxZkJ2BjIlKkbutn/saBP7vDsHhk+gvg1+TSjKDHuhcd51gE bXow== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1734442329; x=1735047129; h=to:subject:message-id:date:from:in-reply-to:references:mime-version :x-gm-message-state:from:to:cc:subject:date:message-id:reply-to; bh=bxejSS1suD1NvAFa9thWQZcBxF/91+OSXa9Gz1RxWyc=; b=viBIgDIzJZyHW71+3OW1hrbN+6LHrIWoCQEC4DIcuUiyQO9St2Q/mC+ggItuvXlegF zKGIGKb1MeNPaNx04UG1ctcUWi04X8pn9OANLMxnv0CUin8QQ9jimEYfGPOGW8YxOWnF bxTJW/F2V9LJIgEnlljkKnrFn5wL8WFHDdcf1yVAcYJVwBmFja96+kBmFZ/tQmMY4qHl bfnu+UYiFXLjsDpJrpuVz2k0VokH3VmK1b/A6nIZmOwjIG65Pnv2QTaNF2g+Jsk3YE1F M/3cSgZlgU4y8vfpjP59RnKs/h09VC5Ey1js8spcfVB+szoIxXauKWLMCOxFOdrw8Ma/ h4FQ== X-Gm-Message-State: AOJu0YyPYmq0yC/Q1P6iCJlMbHBhCw3p2+rZcmkctxBd594u0SOovibS Ee+9+gXTjVLiDxIir8bYN2qcC2Sv42bpMoT13xoBXAw8/clVesikKE3/SlpRe4c93uwG4RDzgPa EsbGeHOpvOHdkbQZalY546YOqu0X1gNdo X-Gm-Gg: ASbGncvkfR98QWJ9x9851ofhZ+kIJ9qJSfPhg9OF769rnucy9pi7ZlPWTiuTA9kChaZ /4fAijHo9KH20wALNPnQhJZU9PkwpKNdPVVhnFX4PDDV1Ae2bjrlYApUSpJXU5kX0qp93uw== X-Google-Smtp-Source: AGHT+IFcXngYXTyQKNPrJ7ZB2SZHp33Ekc3WY8iZnv7NlEIJk8wQgwwyTV/MnWP3ZNwttaXuIkldHpUifoIXF5sGdmI= X-Received: by 2002:a05:6000:4b0e:b0:382:4ab4:b3e5 with SMTP id ffacd0b85a97d-3886fe7c5ebmr16668316f8f.0.1734442328413; Tue, 17 Dec 2024 05:32:08 -0800 (PST) MIME-Version: 1.0 References: <86msguo3cg.fsf@gnu.org> In-Reply-To: From: Evgeny Kurnevsky Date: Tue, 17 Dec 2024 13:31:57 +0000 Message-ID: Subject: Fwd: bug#74922: 29.4; copy_string_contents doesn't always produce a valid utf-8 To: 74922@debbugs.gnu.org Content-Type: multipart/alternative; boundary="000000000000de5eba0629775208" X-Spam-Score: 0.7 (/) X-Debbugs-Envelope-To: 74922 
X-BeenThere: debbugs-submit@debbugs.gnu.org X-Mailman-Version: 2.1.18 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: debbugs-submit-bounces@debbugs.gnu.org Sender: "Debbugs-submit" X-Spam-Score: -0.3 (/)

--000000000000de5eba0629775208 Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: quoted-printable

Yes, that's a binary file that is not an utf-8 string. From the comment in module_copy_string_contents implementation I guessed that in such cases emacs should signal an error, but instead it just passes this invalid string to the dynamic library which caused this bug in emacs-module-rs (see https://ubolonton.github.io/emacs-module-rs/latest/type-conversions.html#strings ). So if it's expected then maybe it should be explicitly said in the docs of copy_string_contents here https://www.gnu.org/software/emacs/manual/html_node/elisp/Module-Values.html ? It just says that it stores the utf-8 encoded text which makes an impression that it's an always valid utf-8 string.

On Tue, Dec 17, 2024 at 1:18 PM Eli Zaretskii wrote:

> > From: Evgeny Kurnevsky
> > Date: Tue, 17 Dec 2024 06:08:30 +0000
> >
> > According to the docs and comment inside module_copy_string_contents it should always produce a valid
> > utf-8 string that can be used in dynamic modules, but it seems it's not always the case. I encountered an
> > emacs crash when using emacs-module-rs because it always expects a valid utf-8 for strings. To reproduce
> > you can call:
> >
> > (some-function-from-dynamic-library (encode-coding-string (f-read-text "wg-private-pc.age") 'utf-8 t))
> >
> > The file is
> > https://github.com/kurnevsky/nixfiles/raw/0b3de016dac551398627a55788b80d4809afcbf9/secrets/wg-private-pc.age
>
> This string includes raw bytes, it isn't a text string, as far as I
> could see. It definitely isn't UTF-8 encoded text. What did you
> expect to happen with it when you copy such a string from Emacs?
>
> > See https://github.com/ubolonton/emacs-module-rs/issues/58 for additional details.
>
> Can't say there are too many details there...

--000000000000de5eba0629775208 Content-Type: text/html; charset="UTF-8" Content-Transfer-Encoding: quoted-printable
--000000000000de5eba0629775208-- From debbugs-submit-bounces@debbugs.gnu.org Tue Dec 17 09:24:40 2024 Received: (at 74922) by debbugs.gnu.org; 17 Dec 2024 14:24:40 +0000 Received: from localhost ([127.0.0.1]:58223 helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1tNYV4-0002F9-UO for submit@debbugs.gnu.org; Tue, 17 Dec 2024 09:24:39 -0500 Received: from eggs.gnu.org ([209.51.188.92]:59390) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1tNYUv-0002Ek-AM for 74922@debbugs.gnu.org; Tue, 17 Dec 2024 09:24:31 -0500 Received: from fencepost.gnu.org ([2001:470:142:3::e]) by eggs.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1tNYUp-000413-8H; Tue, 17 Dec 2024 09:24:23 -0500 DKIM-Signature: v=1; a=rsa-sha256; q=dns/txt; c=relaxed/relaxed; d=gnu.org; s=fencepost-gnu-org; h=References:Subject:In-Reply-To:To:From:Date: mime-version; bh=20ro7N241Oa++eVcX0skEXHwB77tYt/jmub6b6t2VFA=; b=ad4wj91heifu bohSuZi/vbdmP0sbg9jU9OLWuq+LzooRdmr3cqsoaqUQ1M85XGrJdHTPTKCHo0vuU5J8lSgr8V4tf IVJYqAQqFAgjw9fOqkwA3DZ9jL/W9ArPUa7GbWJTMjkGDLnRKEpLtp49nKpjflxSD8nMD0B2/1xVL vfVbr6Xh8ALIv69QRXu3C7oDxpJQ0QEp5Es/ZPtMiX/ymrOFElgctNgVLNkii5ZR06FfNfl3hh7AD X7lpRnREnw3MbfNirQXcq0H0/Upj016wEd1EvXQfITwCU9utjw/ye22+kPt/AMq6v3z5Qpvnjpl+B 1j62+8x7XU9srdqrjCZY+w==; Date: Tue, 17 Dec 2024 16:24:13 +0200 Message-Id: <8634imo0aa.fsf@gnu.org> From: Eli Zaretskii To: Evgeny Kurnevsky In-Reply-To: (message from Evgeny Kurnevsky on Tue, 17 Dec 2024 13:31:57 +0000) Subject: Re: bug#74922: Fwd: bug#74922: 29.4; copy_string_contents doesn't always produce a valid utf-8 References: <86msguo3cg.fsf@gnu.org> X-Spam-Score: -1.6 (-) X-Debbugs-Envelope-To: 74922 Cc: 74922@debbugs.gnu.org X-BeenThere: debbugs-submit@debbugs.gnu.org X-Mailman-Version: 2.1.18 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: debbugs-submit-bounces@debbugs.gnu.org Sender: "Debbugs-submit" 
X-Spam-Score: -2.6 (--)

> From: Evgeny Kurnevsky
> Date: Tue, 17 Dec 2024 13:31:57 +0000
>
> Yes, that's a binary file that is not an utf-8 string. From the comment in module_copy_string_contents
> implementation I guessed that in such cases emacs should signal an error, but instead it just passes this
> invalid string to the dynamic library which caused this bug in emacs-module-rs (see
> https://ubolonton.github.io/emacs-module-rs/latest/type-conversions.html#strings ). So if it's expected then
> maybe it should be explicitly said in the docs of copy_string_contents here
> https://www.gnu.org/software/emacs/manual/html_node/elisp/Module-Values.html ? It just says that it stores
> the utf-8 encoded text which makes an impression that it's an always valid utf-8 string.

I could look into the internals, but I actually wonder why the module doesn't check the text before relying on such subtle behaviors. We didn't document the fact that it signals an error for a reason.

So: why cannot the module code or the application which uses it test up front that the string it copies is human-readable text, not some binary junk?
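
For illustration only, an up-front check of the kind asked about above could look roughly like the Rust sketch below. It makes several assumptions: the helper names `checked_text` and `lossy_text` are hypothetical and are not part of emacs-module-rs or of the Emacs module API, and the byte slices stand in for whatever a module has already copied out of Emacs. Only `std::str::from_utf8` and `String::from_utf8_lossy` from the Rust standard library are used.

```rust
use std::borrow::Cow;
use std::str::Utf8Error;

/// Hypothetical helper: validate the bytes before treating them as text.
/// Returns an error instead of assuming the input is valid UTF-8.
fn checked_text(bytes: &[u8]) -> Result<&str, Utf8Error> {
    std::str::from_utf8(bytes)
}

/// Hypothetical helper: accept any bytes, replacing each invalid
/// sequence with U+FFFD (the Unicode replacement character).
fn lossy_text(bytes: &[u8]) -> Cow<'_, str> {
    String::from_utf8_lossy(bytes)
}

fn main() {
    // Stand-ins for data copied from Emacs: one valid UTF-8 string and
    // one byte sequence containing a raw 0xff byte, which is never valid.
    let text = "héllo".as_bytes();
    let junk: &[u8] = &[0x61, 0xff, 0x62];

    match checked_text(text) {
        Ok(s) => println!("valid text: {s}"),
        Err(e) => println!("rejected: {e}"),
    }
    match checked_text(junk) {
        Ok(s) => println!("valid text: {s}"),
        Err(e) => println!("rejected: {e}"),
    }
    println!("lossy conversion: {}", lossy_text(junk));
}
```

The checked variant scans the whole input once, which is the cost mentioned elsewhere in this thread; whether that cost is acceptable by default is a design decision for the binding, not something this sketch settles.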
From debbugs-submit-bounces@debbugs.gnu.org Tue Dec 17 09:47:47 2024 Received: (at 74922) by debbugs.gnu.org; 17 Dec 2024 14:47:47 +0000 Received: from localhost ([127.0.0.1]:58271 helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1tNYrR-0003Wy-RW for submit@debbugs.gnu.org; Tue, 17 Dec 2024 09:47:46 -0500 Received: from mail-ej1-f51.google.com ([209.85.218.51]:51326) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1tNYrN-0003Wm-4i for 74922@debbugs.gnu.org; Tue, 17 Dec 2024 09:47:42 -0500 Received: by mail-ej1-f51.google.com with SMTP id a640c23a62f3a-aa67f31a858so994207866b.2 for <74922@debbugs.gnu.org>; Tue, 17 Dec 2024 06:47:41 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20230601; t=1734446800; x=1735051600; darn=debbugs.gnu.org; h=cc:to:subject:message-id:date:from:in-reply-to:references :mime-version:from:to:cc:subject:date:message-id:reply-to; bh=/XLe4OqjeyI0qc00q/3u89dy/Yxw+RtxC+vYjVeBzQY=; b=ayrTZPJsWWl4MXSl9nFW50hnin0k5bSskEXIj0aUFQOlpP6b4PmFr30x3P9rtY/GqU Tem4E83MAk2HUvbGqTVuzOR5Jjy51pYMYDU+Z5xxQzaf1eIPnFS+Ihxqncg+5Vn9s7So ja1TE0eZgv+UxP7Va5/qdNSFMr8MOgpjHWC18+gpWLkjV4+PdqqYX+I9PUccMStIclAx 98duaWsUsXZzbjiDQKbq5L0dB102Du6225A4Okw6E8DHoHjYASSGFzFtGNHe8TG6dZxA ktw2i8ABJG+mylq8VDyOBBJ7xxgosr3XDDcXmgUFn097JDljVfaWbxcLk3dvFqAo7eTD +XJA== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1734446800; x=1735051600; h=cc:to:subject:message-id:date:from:in-reply-to:references :mime-version:x-gm-message-state:from:to:cc:subject:date:message-id :reply-to; bh=/XLe4OqjeyI0qc00q/3u89dy/Yxw+RtxC+vYjVeBzQY=; b=Yd2oiziUHdnMT2UDriHlzwLVhQPyAV2G4OY1zQ8AiLOtRF7mjpWdoG35w0hpAS6WYB OVTIy/70bmvdr7sSc4VXKFK7UEXsWVuaSNW4ou9HMV4WrXcLZygzhxw8YOf4UCPoDD0q g6xj+luctQCSrZ18JwRTNyLQz94IGbE8VYmChmy9z2E3u+fOnoZE2kkazhxWchRsn5zw uRp9e/hNQm0PZ/wLiWSuMfzFo+kxskBZlxzoQyDgJojdx1hLVZGx950bd4JmxfKsEoW7 
l+Rm3qcH7WTcHV9ZRNLeJNz7O2USVIfpEPuFDIIbeDrhf8Std+KseCHT6wdbzWtc3CQ0 9Muw== X-Gm-Message-State: AOJu0YxmzdH9VfY635yPtM8kOlJsxvWZPSAsjWbhFGcOItNUgvCdaVYq F+LMYhHaitZy1T+j9uvIoBSNTgwtLMAQqHnXGX1i+iHChZ3nxwKcyW/kssPo6wT/pD57bv84KhT 63nn4BfCXFkE+CmbhK1gg8DheKF0= X-Gm-Gg: ASbGncvtJuMYyhfQp12Acs+Z6AGDdRCJYrGmnxOk+LuSBr+IjApsB+VwVJDzVkEHtsw FPd3bDSj37Gy+G/kt5PLU0a2fyGeEWm8KvJBFIWP7o26/Zg4vyjy1rPG0HWTSWoEcRQpIRA== X-Google-Smtp-Source: AGHT+IEHAy0nGGog7mMtkJz3SuU/ru7vDz47mb0LHhP7O4wLVn104nZe9zLcTKwAEXXRyO/yRZi/3YodEOgO58eFUZM= X-Received: by 2002:a17:907:2d89:b0:aa6:832b:8d71 with SMTP id a640c23a62f3a-aab778d9dbemr1376575066b.2.1734446799901; Tue, 17 Dec 2024 06:46:39 -0800 (PST) MIME-Version: 1.0 References: <86msguo3cg.fsf@gnu.org> <8634imo0aa.fsf@gnu.org> In-Reply-To: <8634imo0aa.fsf@gnu.org> From: Evgeny Kurnevsky Date: Tue, 17 Dec 2024 14:46:28 +0000 Message-ID: Subject: Re: bug#74922: Fwd: bug#74922: 29.4; copy_string_contents doesn't always produce a valid utf-8 To: Eli Zaretskii Content-Type: multipart/alternative; boundary="00000000000063de2e0629785d3e" X-Spam-Score: 0.8 (/) X-Debbugs-Envelope-To: 74922 Cc: 74922@debbugs.gnu.org X-BeenThere: debbugs-submit@debbugs.gnu.org X-Mailman-Version: 2.1.18 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: debbugs-submit-bounces@debbugs.gnu.org Sender: "Debbugs-submit" X-Spam-Score: -0.2 (/) --00000000000063de2e0629785d3e Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: quoted-printable It can definitely do it, but I guess in emacs-module-rs it's not done by default because of performance implications - it might be quite costly to check every string in some cases, and it wasn't really clear if emacs can pass an invalid string. So currently this case causes undefined behavior there which results in emacs crash. 
On Tue, Dec 17, 2024 at 2:24 PM Eli Zaretskii wrote:

> > From: Evgeny Kurnevsky
> > Date: Tue, 17 Dec 2024 13:31:57 +0000
> >
> > Yes, that's a binary file that is not an utf-8 string. From the comment in module_copy_string_contents
> > implementation I guessed that in such cases emacs should signal an error, but instead it just passes this
> > invalid string to the dynamic library which caused this bug in emacs-module-rs (see
> > https://ubolonton.github.io/emacs-module-rs/latest/type-conversions.html#strings ). So if it's expected then
> > maybe it should be explicitly said in the docs of copy_string_contents here
> > https://www.gnu.org/software/emacs/manual/html_node/elisp/Module-Values.html ? It just says that it stores
> > the utf-8 encoded text which makes an impression that it's an always valid utf-8 string.
>
> I could look into the internals, but I actually wonder why the module
> doesn't check the text before relying on such subtle behaviors. We
> didn't document the fact that it signals an error for a reason.
>
> So: why cannot the module code or the application which uses it test
> up front that the string it copies is human-readable text, not some
> binary junk?

-- 
Best regards, Evgeny Kurnevsky.

--00000000000063de2e0629785d3e Content-Type: text/html; charset="UTF-8" Content-Transfer-Encoding: quoted-printable
--00000000000063de2e0629785d3e-- From debbugs-submit-bounces@debbugs.gnu.org Tue Dec 17 10:10:46 2024 Received: (at 74922) by debbugs.gnu.org; 17 Dec 2024 15:10:46 +0000 Received: from localhost ([127.0.0.1]:59877 helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1tNZDh-00058l-Uv for submit@debbugs.gnu.org; Tue, 17 Dec 2024 10:10:46 -0500 Received: from eggs.gnu.org ([209.51.188.92]:53934) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1tNZDg-00058X-AE for 74922@debbugs.gnu.org; Tue, 17 Dec 2024 10:10:44 -0500 Received: from fencepost.gnu.org ([2001:470:142:3::e]) by eggs.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1tNZDb-00030i-0n; Tue, 17 Dec 2024 10:10:39 -0500 DKIM-Signature: v=1; a=rsa-sha256; q=dns/txt; c=relaxed/relaxed; d=gnu.org; s=fencepost-gnu-org; h=References:Subject:In-Reply-To:To:From:Date: mime-version; bh=U2qP28LipYAo/fRPHgtKDz8o72yu9lXevF3poo+iEzM=; b=nWCd5QrrMuKa CyCRQ24Y6R2eyObISUhaEi4Lg6eGKPLgdUiQyWIolfLk/K36ExU+IZUyFqP80s0DscJww7flmqIoO SkA89ikfOM1xf6pN1dxTLZOkuFAdcPqrj0hFiVVqS0mv4/uFZuJEzvoZ46tmJP1g0z/S0O7YL+QjR J3TW6XxNVA99koXPHo9MItV9eKwnw2CYYznDNMtQw+i28tMnDG6rYYp3Sw3rpVkvs0NDRTnYCp9KP UfuAhp5kd3K/wDVmG/AxZkbfdRmr88jtDT/jSZsG3WNyjXEULy5PIZcHOyGZ4nCISL9Cko0ad45sL Ec4BcJoQyLfuiS2umRJR5A==; Date: Tue, 17 Dec 2024 17:10:36 +0200 Message-Id: <86zfkumjkj.fsf@gnu.org> From: Eli Zaretskii To: Evgeny Kurnevsky In-Reply-To: (message from Evgeny Kurnevsky on Tue, 17 Dec 2024 14:46:28 +0000) Subject: Re: bug#74922: Fwd: bug#74922: 29.4; copy_string_contents doesn't always produce a valid utf-8 References: <86msguo3cg.fsf@gnu.org> <8634imo0aa.fsf@gnu.org> X-Spam-Score: -2.3 (--) X-Debbugs-Envelope-To: 74922 Cc: 74922@debbugs.gnu.org X-BeenThere: debbugs-submit@debbugs.gnu.org X-Mailman-Version: 2.1.18 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: debbugs-submit-bounces@debbugs.gnu.org 
Sender: "Debbugs-submit" X-Spam-Score: -3.3 (---) > From: Evgeny Kurnevsky > Date: Tue, 17 Dec 2024 14:46:28 +0000 > Cc: 74922@debbugs.gnu.org > > It can definitely do it, but I guess in emacs-module-rs it's not done by default because of performance > implications - it might be quite costly to check every string in some cases, and it wasn't really clear if emacs > can pass an invalid string. So currently this case causes undefined behavior there which results in emacs > crash. What do Rust programs do when they are told to read random files? This is the same situation, basically. And what would the module do if copy_string_contents *did* signal an error? From debbugs-submit-bounces@debbugs.gnu.org Sat Dec 21 07:09:44 2024 Received: (at 74922) by debbugs.gnu.org; 21 Dec 2024 12:09:44 +0000 Received: from localhost ([127.0.0.1]:45571 helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1tOyIi-0006YE-0i for submit@debbugs.gnu.org; Sat, 21 Dec 2024 07:09:44 -0500 Received: from eggs.gnu.org ([209.51.188.92]:58878) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1tOyIg-0006Y0-Ak for 74922@debbugs.gnu.org; Sat, 21 Dec 2024 07:09:43 -0500 Received: from fencepost.gnu.org ([2001:470:142:3::e]) by eggs.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1tOyIb-00020N-3j; Sat, 21 Dec 2024 07:09:37 -0500 DKIM-Signature: v=1; a=rsa-sha256; q=dns/txt; c=relaxed/relaxed; d=gnu.org; s=fencepost-gnu-org; h=References:Subject:In-Reply-To:To:From:Date: mime-version; bh=eyVS7eyMyQ+b5R8/bH/+7fhTxJhNcSIP4vIsnvXvkqo=; b=sKfxhISAZt/L 7Few+MZVnWAPUq9IOSgARgzC5CV+PpLleWwxvDj2A9Qi61exxo06BbDKTVRLAfgNCTfKbFoHBXA0W SiCA4PfY6AJExA23rRkaEv/MhtaWSOGZbDYNQOEEhdM109j7B8EsYEJqErvAY0WYwrgTOdzjZyNwo ZQ8HzlUrHinEuqQNSXcM1aBGgR/WufL0ue+w+chA42gJ2xoKl5+WZ62grLileWSZhm11ObhgkuXsO AhdqY0h3cHMizQOO3sfP6RvcykMmtBNsh1iedL7ScEiO1A4mJNF3ISWpfjtxNkTq4xP3Fs6f0YUCO VgGiRIHs3/WDyfl2JIcHEw==; Date: Sat, 21 Dec 2024 
14:09:24 +0200 Message-Id: <86ttax6xvv.fsf@gnu.org> From: Eli Zaretskii To: kurnevsky@gmail.com In-Reply-To: <86zfkumjkj.fsf@gnu.org> (message from Eli Zaretskii on Tue, 17 Dec 2024 17:10:36 +0200) Subject: Re: bug#74922: Fwd: bug#74922: 29.4; copy_string_contents doesn't always produce a valid utf-8 References: <86msguo3cg.fsf@gnu.org> <8634imo0aa.fsf@gnu.org> <86zfkumjkj.fsf@gnu.org> X-Spam-Score: -2.3 (--) X-Debbugs-Envelope-To: 74922 Cc: 74922@debbugs.gnu.org X-BeenThere: debbugs-submit@debbugs.gnu.org X-Mailman-Version: 2.1.18 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: debbugs-submit-bounces@debbugs.gnu.org Sender: "Debbugs-submit" X-Spam-Score: -3.3 (---) > Cc: 74922@debbugs.gnu.org > Date: Tue, 17 Dec 2024 17:10:36 +0200 > From: Eli Zaretskii > > > From: Evgeny Kurnevsky > > Date: Tue, 17 Dec 2024 14:46:28 +0000 > > Cc: 74922@debbugs.gnu.org > > > > It can definitely do it, but I guess in emacs-module-rs it's not done by default because of performance > > implications - it might be quite costly to check every string in some cases, and it wasn't really clear if emacs > > can pass an invalid string. So currently this case causes undefined behavior there which results in emacs > > crash. > > What do Rust programs do when they are told to read random files? > This is the same situation, basically. > > And what would the module do if copy_string_contents *did* signal an > error? I think I know what happened: you called copy_string_contents with a unibyte string. In that case, copy_string_contents will return you the original string without doing anything. The code in copy_string_contents that signals an error relies on the fact that encoding the input string yields nil if the input includes non-Unicode characters. But that cannot be established with unibyte strings, because a unibyte string doesn't hold characters, it holds raw bytes. 
What you should do is make sure the string passed to
copy_string_contents is a multibyte string. If I do that, i.e.

  (switch-to-buffer "foo")
  (set-buffer-multibyte t)
  (insert-file-contents "/path/to/wg-private-pc.age")
  (setq str1 (buffer-string))

and then call copy_string_contents with the resulting string str1, I
get the result you expected.

You need to realize that copy_string_contents is a variant of
text-encoding routines: it encodes the input multibyte string in
UTF-8. The encoding routines in Emacs always return unibyte strings
without doing anything, because a unibyte string is already encoded,
or at least is supposed to be encoded.

And before you ask: no, copy_string_contents cannot by itself signal
an error if passed a unibyte string, because a unibyte string can
legitimately be a valid UTF-8 string. So in this case,
copy_string_contents relies on the caller to make sure the input is
valid UTF-8.

From debbugs-submit-bounces@debbugs.gnu.org Sat Jan 04 06:39:36 2025
Received: (at 74922-done) by debbugs.gnu.org; 4 Jan 2025 11:39:36 +0000
Received: from localhost ([127.0.0.1]:53633 helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1tU2VD-0002wo-PB for submit@debbugs.gnu.org; Sat, 04 Jan 2025 06:39:36 -0500
Received: from eggs.gnu.org ([2001:470:142:3::10]:45674) by debbugs.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.84_2) (envelope-from ) id 1tU2VB-0002wZ-7e for 74922-done@debbugs.gnu.org; Sat, 04 Jan 2025 06:39:33 -0500
Received: from fencepost.gnu.org ([2001:470:142:3::e]) by eggs.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1tU2V5-0000qW-VD; Sat, 04 Jan 2025 06:39:27 -0500
DKIM-Signature: v=1; a=rsa-sha256; q=dns/txt; c=relaxed/relaxed; d=gnu.org; s=fencepost-gnu-org; h=References:Subject:In-Reply-To:To:From:Date: mime-version; bh=FB7oxxBbQuS3v59RRJeGs3NmJDs52Y/bfHMG9ANXDIM=; b=EAttuw16YFXL
cWWZK5puNgMsM47uh1MDVbUS+dXcD92yC1jN28lFTqYt4SBWeCOFJbqUmot7cvgC+MiqvSOydTNkI J03HtMOwmYdrwzen+e+5FVs7XOMj/tY9sinCVhThXfu0v71dH2AKD5u690uAAaxlMZSqWe1yQ4VZ5 a3VHDLkLaX3TFrTW6+YJc1o41rsp5kFfHNQuO/Gh64BTfD2Th4bO2ME4k2cnAi8edeFSwcRHzr9Ml Oc5nMjhT0SDXH8m2TCXv8Nv5zEXgbBmASsCLpdf9VmVx6a8srs2bn/+aIC6q41yTGMKlhLsS0BKiR /IHjMCvkrN3zip+8Gy3Sgg==;
Date: Sat, 04 Jan 2025 13:39:25 +0200
Message-Id: <86o70merki.fsf@gnu.org>
From: Eli Zaretskii
To: kurnevsky@gmail.com
In-Reply-To: <86ttax6xvv.fsf@gnu.org> (message from Eli Zaretskii on Sat, 21 Dec 2024 14:09:24 +0200)
Subject: Re: bug#74922: Fwd: bug#74922: 29.4; copy_string_contents doesn't always produce a valid utf-8
References: <86msguo3cg.fsf@gnu.org> <8634imo0aa.fsf@gnu.org> <86zfkumjkj.fsf@gnu.org> <86ttax6xvv.fsf@gnu.org>
X-Spam-Score: -2.3 (--)
X-Debbugs-Envelope-To: 74922-done
Cc: 74922-done@debbugs.gnu.org
X-BeenThere: debbugs-submit@debbugs.gnu.org
X-Mailman-Version: 2.1.18
Precedence: list
List-Id:
List-Unsubscribe: ,
List-Archive:
List-Post:
List-Help:
List-Subscribe: ,
Errors-To: debbugs-submit-bounces@debbugs.gnu.org
Sender: "Debbugs-submit"
X-Spam-Score: -3.3 (---)

> Cc: 74922@debbugs.gnu.org
> Date: Sat, 21 Dec 2024 14:09:24 +0200
> From: Eli Zaretskii
>
> > Cc: 74922@debbugs.gnu.org
> > Date: Tue, 17 Dec 2024 17:10:36 +0200
> > From: Eli Zaretskii
> >
> > > From: Evgeny Kurnevsky
> > > Date: Tue, 17 Dec 2024 14:46:28 +0000
> > > Cc: 74922@debbugs.gnu.org
> > >
> > > It can definitely do it, but I guess in emacs-module-rs it's not done by default because of performance
> > > implications - it might be quite costly to check every string in some cases, and it wasn't really clear if emacs
> > > can pass an invalid string. So currently this case causes undefined behavior there which results in emacs
> > > crash.
> >
> > What do Rust programs do when they are told to read random files?
> > This is the same situation, basically.
> >
> > And what would the module do if copy_string_contents *did* signal an
> > error?
>
> I think I know what happened: you called copy_string_contents with a
> unibyte string. In that case, copy_string_contents will return you
> the original string without doing anything. The code in
> copy_string_contents that signals an error relies on the fact that
> encoding the input string yields nil if the input includes non-Unicode
> characters. But that cannot be established with unibyte strings,
> because a unibyte string doesn't hold characters, it holds raw bytes.
>
> What you should do is make sure the string passed to
> copy_string_contents is a multibyte string. If I do that, i.e.
>
> (switch-to-buffer "foo")
> (set-buffer-multibyte t)
> (insert-file-contents "/path/to/wg-private-pc.age")
> (setq str1 (buffer-string))
>
> and then call copy_string_contents with the resulting string str1, I
> get the result you expected.
>
> You need to realize that copy_string_contents is a variant of
> text-encoding routines: it encodes the input multibyte string in
> UTF-8. The encoding routines in Emacs always return unibyte strings
> without doing anything, because a unibyte string is already encoded,
> or at least is supposed to be encoded.
>
> And before you ask: no, copy_string_contents cannot by itself signal
> an error if passed a unibyte string, because a unibyte string can
> legitimately be a valid UTF-8 string. So in this case,
> copy_string_contents relies on the caller to make sure the input is
> valid UTF-8.

I believe the above explains the problem and the solution, so I'm now
closing this bug.

From unknown Wed Jun 18 23:09:03 2025
Received: (at fakecontrol) by fakecontrolmessage;
To: internal_control@debbugs.gnu.org
From: Debbugs Internal Request
Subject: Internal Control
Message-Id: bug archived.
Date: Sat, 01 Feb 2025 12:24:08 +0000
User-Agent: Fakemail v42.6.9

# This is a fake control message.
#
# The action:
# bug archived.
thanks
# This fakemail brought to you by your local debbugs
# administrator