GNU bug report logs - #74922
29.4; copy_string_contents doesn't always produce a valid utf-8
Reported by: Evgeny Kurnevsky <kurnevsky <at> gmail.com>
Date: Tue, 17 Dec 2024 06:09:01 UTC
Severity: normal
Found in version 29.4
Done: Eli Zaretskii <eliz <at> gnu.org>
Bug is archived. No further changes may be made.
[Message part 1 (text/plain, inline)]
Your bug report
#74922: 29.4; copy_string_contents doesn't always produce a valid utf-8
which was filed against the emacs package, has been closed.
The explanation is attached below, along with your original report.
If you require more details, please reply to 74922 <at> debbugs.gnu.org.
--
74922: https://debbugs.gnu.org/cgi/bugreport.cgi?bug=74922
GNU Bug Tracking System
Contact help-debbugs <at> gnu.org with problems
[Message part 2 (message/rfc822, inline)]
> Cc: 74922 <at> debbugs.gnu.org
> Date: Sat, 21 Dec 2024 14:09:24 +0200
> From: Eli Zaretskii <eliz <at> gnu.org>
>
> > Cc: 74922 <at> debbugs.gnu.org
> > Date: Tue, 17 Dec 2024 17:10:36 +0200
> > From: Eli Zaretskii <eliz <at> gnu.org>
> >
> > > From: Evgeny Kurnevsky <kurnevsky <at> gmail.com>
> > > Date: Tue, 17 Dec 2024 14:46:28 +0000
> > > Cc: 74922 <at> debbugs.gnu.org
> > >
> > > It can definitely do it, but I guess emacs-module-rs doesn't do it by default because of the performance
> > > implications: it might be quite costly to check every string in some cases, and it wasn't really clear whether
> > > Emacs can pass an invalid string. So currently this case causes undefined behavior there, which results in an
> > > Emacs crash.
> >
> > What do Rust programs do when they are told to read random files?
> > This is the same situation, basically.
> >
> > And what would the module do if copy_string_contents *did* signal an
> > error?
>
> I think I know what happened: you called copy_string_contents with a
> unibyte string. In that case, copy_string_contents will return you
> the original string without doing anything. The code in
> copy_string_contents that signals an error relies on the fact that
> encoding the input string yields nil if the input includes non-Unicode
> characters. But that cannot be established with unibyte strings,
> because a unibyte string doesn't hold characters, it holds raw bytes.
>
> What you should do is make sure the string passed to
> copy_string_contents is a multibyte string. If I do that, i.e.
>
> (switch-to-buffer "foo")
> (set-buffer-multibyte t)
> (insert-file-contents "/path/to/wg-private-pc.age")
> (setq str1 (buffer-string))
>
> and then call copy_string_contents with the resulting string str1, I
> get the result you expected.
>
> You need to realize that copy_string_contents is a variant of the
> text-encoding routines: it encodes the input multibyte string in
> UTF-8. The encoding routines in Emacs always return a unibyte input
> string as-is, without doing anything, because a unibyte string is
> already encoded, or at least is supposed to be encoded.
>
> And before you ask: no, copy_string_contents cannot by itself signal
> an error if passed a unibyte string, because a unibyte string can
> legitimately be a valid UTF-8 string. So in this case,
> copy_string_contents relies on the caller to make sure the input is
> valid UTF-8.
I believe the above explains the problem and the solution, so I'm now
closing this bug.
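The behavior explained above boils down to one decision: a multibyte string is encoded to UTF-8 and rejected if it contains raw bytes, while a unibyte string is handed back unchanged. The Rust sketch below is a toy model of that decision only, not Emacs source code. The types `LispString`, `MbChar`, `CopyError` and the function `copy_string_contents_model` are hypothetical, and the buffer-size protocol and the actual Lisp error that Emacs signals are left out.

```rust
// Toy model (NOT Emacs source code) of the behavior described in this bug:
// multibyte strings are encoded to UTF-8 and rejected if they contain raw
// bytes; unibyte strings are returned unchanged, without validation.

/// One element of a hypothetical multibyte string: a Unicode character or a
/// raw byte (Emacs stores raw bytes in multibyte text as non-Unicode chars).
enum MbChar {
    Unicode(char),
    RawByte(u8),
}

/// Hypothetical stand-in for a Lisp string.
enum LispString {
    Unibyte(Vec<u8>),       // raw bytes, assumed to be already encoded
    Multibyte(Vec<MbChar>), // characters
}

#[derive(Debug, PartialEq)]
enum CopyError {
    RawByteInMultibyte(u8),
}

/// Models only the encode-or-pass-through decision; the real function also
/// handles the buffer-size protocol and signals a Lisp error.
fn copy_string_contents_model(s: &LispString) -> Result<Vec<u8>, CopyError> {
    match s {
        // Unibyte: already "encoded", so the bytes come back as-is.
        LispString::Unibyte(bytes) => Ok(bytes.clone()),
        // Multibyte: encode each character; a raw byte has no UTF-8 form.
        LispString::Multibyte(chars) => {
            let mut out = String::new();
            for c in chars {
                match c {
                    MbChar::Unicode(ch) => out.push(*ch),
                    MbChar::RawByte(b) => return Err(CopyError::RawByteInMultibyte(*b)),
                }
            }
            Ok(out.into_bytes())
        }
    }
}

fn main() {
    // 0xFF never occurs in UTF-8, yet the unibyte string passes through.
    let unibyte = LispString::Unibyte(vec![b'a', 0xFF, b'b']);
    let bytes = copy_string_contents_model(&unibyte).unwrap();
    println!("unibyte result is valid UTF-8: {}", std::str::from_utf8(&bytes).is_ok());

    // The same data as a multibyte string is rejected.
    let multibyte = LispString::Multibyte(vec![
        MbChar::Unicode('a'),
        MbChar::RawByte(0xFF),
        MbChar::Unicode('b'),
    ]);
    println!("multibyte result: {:?}", copy_string_contents_model(&multibyte));
}
```

In this model the unibyte input comes back as bytes that are not valid UTF-8, while the multibyte version of the same data produces an error. In the reproduction from the original report, the argument comes from `encode-coding-string`, which returns a unibyte string, so it takes the first path.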
[Message part 3 (message/rfc822, inline)]
[Message part 4 (text/plain, inline)]
According to the documentation and the comment inside
module_copy_string_contents, it should always produce a valid UTF-8 string
that can be used in dynamic modules, but this is not always the case. I
encountered an Emacs crash when using emacs-module-rs, because it always
expects strings to be valid UTF-8. To reproduce, call:
  (some-function-from-dynamic-library
   (encode-coding-string (f-read-text "wg-private-pc.age") 'utf-8 t))
The file is
https://github.com/kurnevsky/nixfiles/raw/0b3de016dac551398627a55788b80d4809afcbf9/secrets/wg-private-pc.age
See https://github.com/ubolonton/emacs-module-rs/issues/58 for additional
details.
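Because copy_string_contents passes unibyte strings through unchecked, a module that may receive them can validate the copied bytes itself. The Rust sketch below shows such a checked conversion. `buffer_to_string` is a hypothetical helper, not an emacs-module-rs API, and it assumes the buffer is exactly as filled by copy_string_contents, i.e. ending with the terminating null byte that the C API writes. `String::from_utf8` scans the bytes once and returns an error on invalid input; `String::from_utf8_unchecked` skips that scan, but calling it on invalid UTF-8 is undefined behavior, which is the trade-off mentioned earlier in the thread.

```rust
use std::string::FromUtf8Error;

/// Hypothetical helper (not an emacs-module-rs API): converts a buffer filled
/// by copy_string_contents into a Rust String. Assumes `buf` ends with the
/// null terminator the C API writes; that byte is removed before validation.
fn buffer_to_string(mut buf: Vec<u8>) -> Result<String, FromUtf8Error> {
    if buf.last() == Some(&0) {
        buf.pop(); // drop the terminating null byte
    }
    // Checked conversion: one linear scan, and an Err on invalid input.
    // String::from_utf8_unchecked would skip the scan, but calling it on
    // bytes that are not valid UTF-8 is undefined behavior.
    String::from_utf8(buf)
}

fn main() {
    println!("{:?}", buffer_to_string(b"hello\0".to_vec()));

    // 0xC3 starts a two-byte sequence, but 0x28 is not a continuation byte.
    match buffer_to_string(vec![0xC3, 0x28, 0x00]) {
        Ok(s) => println!("ok: {s}"),
        Err(e) => println!(
            "invalid UTF-8 after {} valid bytes",
            e.utf8_error().valid_up_to()
        ),
    }
}
```

Returning a `Result` gives the caller something to act on (for example, reporting an error back to Lisp) instead of continuing with invalid data; how a real binding surfaces that error is outside the scope of this sketch.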
GNU bug tracking system
Copyright (C) 1999 Darren O. Benham,
1997,2003 nCipher Corporation Ltd,
1994-97 Ian Jackson.