GNU bug report logs - #74922
29.4; copy_string_contents doesn't always produce a valid utf-8

Package: emacs

Reported by: Evgeny Kurnevsky <kurnevsky <at> gmail.com>

Date: Tue, 17 Dec 2024 06:09:01 UTC

Severity: normal

Found in version 29.4

Done: Eli Zaretskii <eliz <at> gnu.org>

Bug is archived. No further changes may be made.

From: help-debbugs <at> gnu.org (GNU bug Tracking System)
To: Evgeny Kurnevsky <kurnevsky <at> gmail.com>
Subject: bug#74922: closed (Re: bug#74922: Fwd: bug#74922: 29.4;
 copy_string_contents doesn't always produce a valid utf-8)
Date: Sat, 04 Jan 2025 11:40:02 +0000

[Message part 1 (text/plain, inline)]

Your bug report

#74922: 29.4; copy_string_contents doesn't always produce a valid utf-8

which was filed against the emacs package, has been closed.

The explanation is attached below, along with your original report.
If you require more details, please reply to 74922 <at> debbugs.gnu.org.

-- 
74922: https://debbugs.gnu.org/cgi/bugreport.cgi?bug=74922
GNU Bug Tracking System
Contact help-debbugs <at> gnu.org with problems

[Message part 2 (message/rfc822, inline)]

From: Eli Zaretskii <eliz <at> gnu.org>
To: kurnevsky <at> gmail.com
Cc: 74922-done <at> debbugs.gnu.org
Subject: Re: bug#74922: Fwd: bug#74922: 29.4;
 copy_string_contents doesn't always produce a valid utf-8
Date: Sat, 04 Jan 2025 13:39:25 +0200

> Cc: 74922 <at> debbugs.gnu.org
> Date: Sat, 21 Dec 2024 14:09:24 +0200
> From: Eli Zaretskii <eliz <at> gnu.org>
> 
> > Cc: 74922 <at> debbugs.gnu.org
> > Date: Tue, 17 Dec 2024 17:10:36 +0200
> > From: Eli Zaretskii <eliz <at> gnu.org>
> > 
> > > From: Evgeny Kurnevsky <kurnevsky <at> gmail.com>
> > > Date: Tue, 17 Dec 2024 14:46:28 +0000
> > > Cc: 74922 <at> debbugs.gnu.org
> > > 
> > > It can definitely do that, but I guess emacs-module-rs doesn't do it by default because of the performance
> > > implications: checking every string can be quite costly in some cases, and it wasn't really clear whether Emacs
> > > can pass an invalid string. So currently this case causes undefined behavior there, which results in an Emacs
> > > crash.
> > 
> > What do Rust programs do when they are told to read random files?
> > This is the same situation, basically.
> > 
> > And what would the module do if copy_string_contents *did* signal an
> > error?
> 
> I think I know what happened: you called copy_string_contents with a
> unibyte string.  In that case, copy_string_contents returns the
> original string unchanged.  The code in
> copy_string_contents that signals an error relies on the fact that
> encoding the input string yields nil if the input includes non-Unicode
> characters. But that cannot be established with unibyte strings,
> because a unibyte string doesn't hold characters, it holds raw bytes.
> 
> What you should do is make sure the string passed to
> copy_string_contents is a multibyte string.  If I do that, i.e.
> 
>   (switch-to-buffer "foo")
>   (set-buffer-multibyte t)
>   (insert-file-contents "/path/to/wg-private-pc.age")
>   (setq str1 (buffer-string))
> 
> and then call copy_string_contents with the resulting string str1, I
> get the result you expected.
> 
> You need to realize that copy_string_contents is a variant of
> text-encoding routines: it encodes the input multibyte string in
> UTF-8.  The encoding routines in Emacs always return unibyte strings
> unchanged, because a unibyte string is already encoded, or at least
> is supposed to be encoded.
> 
> And before you ask: no, copy_string_contents cannot by itself signal
> an error if passed a unibyte string, because a unibyte string can
> legitimately be a valid UTF-8 string. So in this case,
> copy_string_contents relies on the caller to make sure the input is
> valid UTF-8.

I believe the above explains the problem and the solution, so I'm now
closing this bug.
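
For module authors, the caller-side check this implies might look like
the following minimal Rust sketch.  It assumes the bytes filled in by
copy_string_contents are already available as a slice, without the
trailing NUL.  checked_str and lossy_str are hypothetical helper names
used only for illustration; they are not part of emacs-module-rs or of
the Emacs module API.

```rust
use std::borrow::Cow;
use std::str::Utf8Error;

/// Hypothetical helper: validate bytes received from Emacs before
/// treating them as a Rust `&str`. Returns an error instead of
/// invoking undefined behavior when the bytes are not valid UTF-8.
fn checked_str(bytes: &[u8]) -> Result<&str, Utf8Error> {
    std::str::from_utf8(bytes)
}

/// Hypothetical helper: a lenient alternative that replaces each
/// invalid sequence with U+FFFD instead of failing.
fn lossy_str(bytes: &[u8]) -> Cow<'_, str> {
    String::from_utf8_lossy(bytes)
}

fn main() {
    // Valid UTF-8, as produced when Emacs encodes a multibyte string.
    let valid = "héllo".as_bytes();
    assert_eq!(checked_str(valid), Ok("héllo"));

    // Raw bytes such as a unibyte string holding binary data.
    let invalid: &[u8] = &[b'a', b'g', b'e', 0xff, 0xfe];
    match checked_str(invalid) {
        Ok(s) => println!("valid: {s}"),
        Err(e) => println!("invalid UTF-8 after {} bytes", e.valid_up_to()),
    }
    println!("lossy: {}", lossy_str(invalid));
}
```

The checked variant costs one pass over the bytes, which is the
performance trade-off mentioned earlier in this thread; the lossy
variant never fails but silently alters binary input.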

[Message part 3 (message/rfc822, inline)]

From: Evgeny Kurnevsky <kurnevsky <at> gmail.com>
To: bug-gnu-emacs <at> gnu.org
Subject: 29.4; copy_string_contents doesn't always produce a valid utf-8
Date: Tue, 17 Dec 2024 06:08:30 +0000

[Message part 4 (text/plain, inline)]

According to the docs and the comment inside module_copy_string_contents, it
should always produce a valid UTF-8 string that can be used in dynamic
modules, but that is not always the case. I encountered an Emacs
crash when using emacs-module-rs, because it always expects valid UTF-8
for strings. To reproduce, you can call:

(some-function-from-dynamic-library
 (encode-coding-string (f-read-text "wg-private-pc.age") 'utf-8 t))

The file is
https://github.com/kurnevsky/nixfiles/raw/0b3de016dac551398627a55788b80d4809afcbf9/secrets/wg-private-pc.age

See https://github.com/ubolonton/emacs-module-rs/issues/58 for additional
details.

This bug report was last modified 235 days ago.
