GNU bug report logs -
#25397
guile-2.2 regression in utf8 support in scm_puts scm_lfwrite scm_c_put_string
Message #14 received at 25397 <at> debbugs.gnu.org (full text, mbox):
On Tue 10 Jan 2017 04:34, Linas Vepstas <linasvepstas <at> gmail.com> writes:
> void *wrap_puts(void* p)
> {
>   char *wtf = p;
>
>   SCM port = scm_current_output_port ();
>
>   scm_puts("the port-encoding is=", port);
>   scm_puts(scm_to_utf8_string(scm_port_encoding(port)), port);
>
>   scm_puts("\nThe string to display is =", port);
>   scm_puts (wtf, port);
>
>   scm_puts("\nWas expecting to see this=", port);
>   SCM str = scm_from_utf8_string(wtf);
>   scm_display(str, port);
>   scm_puts("\n\n", port);
>
>   return NULL;
> }
So, there are a few questions here. scm_puts and scm_lfwrite are not
documented, so we need to do basic science on them to see what they are
supposed to do.
Firstly, is scm_puts() a textual interface or a binary interface?
I.e. does it write a sequence of characters or a sequence of bytes?
If I look at uses of scm_puts in Guile sources, it seems clear that it's
a textual interface. That is to say, at all points, the intention seems
to be to write characters on a Guile port. All of the uses are of
strings. Please do a "git grep" on your source to see if your
perceptions correspond.
Now the question is, what encoding is the argument in? If the port is
UTF-16, that byte string should be decoded to characters, and that
character sequence encoded to UTF-16.
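
The decode-then-encode step described above can be sketched in plain C, without libguile. This is only an illustration under stated assumptions: the input bytes are assumed to be UTF-8, and the names utf8_decode and utf8_to_utf16 are hypothetical helpers, not Guile functions. The decoder is simplified and does not reject overlong forms or surrogate code points.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Decode one UTF-8 sequence at s (n bytes available) into *cp.
   Returns the number of bytes consumed, or 0 on malformed input.
   Simplified: overlong forms and surrogates are not rejected.  */
static size_t
utf8_decode (const unsigned char *s, size_t n, uint32_t *cp)
{
  size_t len;

  if (n == 0)
    return 0;
  if (s[0] < 0x80)
    {
      *cp = s[0];
      return 1;
    }
  else if ((s[0] & 0xE0) == 0xC0) { *cp = s[0] & 0x1F; len = 2; }
  else if ((s[0] & 0xF0) == 0xE0) { *cp = s[0] & 0x0F; len = 3; }
  else if ((s[0] & 0xF8) == 0xF0) { *cp = s[0] & 0x07; len = 4; }
  else
    return 0;                   /* stray continuation or invalid lead byte */

  if (n < len)
    return 0;                   /* truncated sequence */
  for (size_t i = 1; i < len; i++)
    {
      if ((s[i] & 0xC0) != 0x80)
        return 0;               /* expected a 10xxxxxx continuation byte */
      *cp = (*cp << 6) | (s[i] & 0x3F);
    }
  return len;
}

/* Decode UTF-8 bytes to characters, then encode those characters as
   UTF-16 code units.  `out` must have room for n units.  Returns the
   number of units written, or (size_t) -1 on malformed input.  */
size_t
utf8_to_utf16 (const unsigned char *in, size_t n, uint16_t *out)
{
  size_t i = 0, j = 0;

  assert (in != NULL && out != NULL);
  while (i < n)
    {
      uint32_t cp;
      size_t used = utf8_decode (in + i, n - i, &cp);

      if (used == 0 || cp > 0x10FFFF)
        return (size_t) -1;
      i += used;

      if (cp < 0x10000)
        out[j++] = (uint16_t) cp;       /* one code unit */
      else
        {
          cp -= 0x10000;                /* surrogate pair */
          out[j++] = (uint16_t) (0xD800 | (cp >> 10));
          out[j++] = (uint16_t) (0xDC00 | (cp & 0x3FF));
        }
    }
  return j;
}
```

With this sketch, the two UTF-8 bytes C3 A9 ("é") become the single UTF-16 code unit 0x00E9. Writing the raw bytes to a UTF-16 port would not produce that.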
All of the scm_puts calls in Guile are of one-byte characters with
codepoints less than 128, so when doing some port refactoring I chose to
interpret the argument as latin1.
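
For illustration, here is a minimal sketch of what a latin1 interpretation does to bytes that were actually UTF-8, when the port encoding is UTF-8. It is plain C and does not use libguile. The name latin1_to_utf8 is hypothetical, and this is a model of the behaviour described above, not Guile's port code.

```c
#include <assert.h>
#include <stddef.h>

/* Treat each input byte as one ISO-8859-1 (latin1) code point and
   encode it as UTF-8.  `out` must have room for 2 * n bytes.
   Returns the number of bytes written.  */
size_t
latin1_to_utf8 (const unsigned char *in, size_t n, unsigned char *out)
{
  size_t j = 0;

  assert (in != NULL && out != NULL);
  for (size_t i = 0; i < n; i++)
    {
      unsigned char b = in[i];

      if (b < 0x80)
        out[j++] = b;           /* ASCII: identical in both encodings */
      else
        {
          /* U+0080..U+00FF need two bytes in UTF-8.  */
          out[j++] = (unsigned char) (0xC0 | (b >> 6));
          out[j++] = (unsigned char) (0x80 | (b & 0x3F));
        }
    }
  return j;
}
```

ASCII input passes through unchanged, which is why the calls inside Guile are unaffected. The UTF-8 bytes C3 A9 ("é"), however, come out as C3 83 C2 A9, which displays as "Ã©".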
FTR, in Guile 2.0, this was effectively a binary interface. Guile 2.0's
scm_lfwrite interpreted the incoming bytes as ISO-8859-1 codepoints for
the purposes of updating line and column, but scm_puts and scm_lfwrite
just wrote out the bytes to the port directly, regardless of the
encoding. That was the wrong thing.
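
A simplified model of the column accounting described above, as opposed to counting one column per UTF-8 character, might look like the following. Both function names are hypothetical, and this is not the Guile 2.0 source: it ignores tabs, wide characters, and combining characters, and only shows how the two counts diverge on non-ASCII input.

```c
#include <assert.h>
#include <stddef.h>

/* Column reached after writing n bytes from column 0, counting every
   byte as one ISO-8859-1 character.  A newline resets the column.  */
size_t
column_per_byte (const unsigned char *s, size_t n)
{
  size_t col = 0;

  assert (s != NULL);
  for (size_t i = 0; i < n; i++)
    col = (s[i] == '\n') ? 0 : col + 1;
  return col;
}

/* Same, but counting one column per UTF-8 encoded character:
   continuation bytes (10xxxxxx) do not advance the column.  */
size_t
column_per_utf8_char (const unsigned char *s, size_t n)
{
  size_t col = 0;

  assert (s != NULL);
  for (size_t i = 0; i < n; i++)
    {
      if (s[i] == '\n')
        col = 0;
      else if ((s[i] & 0xC0) != 0x80)
        col++;
    }
  return col;
}
```

For the three bytes of "né" in UTF-8 (6E C3 A9), the per-byte count ends at column 3 while the per-character count ends at column 2.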
Are you arguing that the byte string given to scm_puts should be decoded
from UTF-8? That would be OK.
Andy
This bug report was last modified 8 years and 106 days ago.