#25397 - guile-2.2 regression in utf8 support in scm_puts scm_lfwrite scm_c_put_string

GNU bug report logs - #25397
guile-2.2 regression in utf8 support in scm_puts scm_lfwrite scm_c_put_string

Package: guile;

Reported by: linasvepstas <at> gmail.com

Date: Sun, 8 Jan 2017 18:17:01 UTC

Severity: normal

Message #14 received at 25397 <at> debbugs.gnu.org (full text, mbox):

From: Andy Wingo <wingo <at> pobox.com>
To: Linas Vepstas <linasvepstas <at> gmail.com>
Cc: 25397 <at> debbugs.gnu.org
Subject: Re: bug#25397: guile-2.2 regression in utf8 support in scm_puts scm_lfwrite scm_c_put_string
Date: Wed, 01 Mar 2017 16:45:26 +0100

On Tue 10 Jan 2017 04:34, Linas Vepstas <linasvepstas <at> gmail.com> writes:

> void *wrap_puts(void* p)
> {
>    char *wtf = p;
>
>    SCM port = scm_current_output_port ();
>
>    scm_puts("the port-encoding is=", port);
>    scm_puts(scm_to_utf8_string(scm_port_encoding(port)), port);
>
>    scm_puts("\nThe string to display is =", port);
>    scm_puts (wtf, port);
>
>    scm_puts("\nWas expecting to see this=", port);
>    SCM str = scm_from_utf8_string(wtf);
>    scm_display(str, port);
>    scm_puts("\n\n", port);
>
>    return NULL;
> }

So, there are a few questions here. scm_puts and scm_lfwrite are not
documented, so we need to do basic science on them to see what they are
supposed to do.

Firstly, is scm_puts() a textual interface or a binary interface? I.e.
does it write a sequence of characters or a sequence of bytes?

If I look at uses of scm_puts in Guile sources, it seems clear that it's
a textual interface. That is to say, at all points, the intention seems
to be to write characters on a Guile port. All of the uses are of
strings. Please do a "git grep" on your source to see if your
perceptions correspond.

Now the question is, what encoding is the argument in? If the port is
UTF-16, that byte string should be decoded to characters, and that
character sequence encoded to UTF-16. All of the scm_puts calls in Guile
are of one-byte characters with codepoints less than 128, so when doing
some port refactoring I chose to interpret the argument as latin1.

FTR, in Guile 2.0, this was effectively a binary interface. Guile 2.0's
scm_lfwrite interpreted the incoming bytes as ISO-8859-1 codepoints for
the purposes of updating line and column, but scm_puts and scm_lfwrite
just wrote out the bytes to the port directly, regardless of the
encoding. That was the wrong thing.

Are you arguing that the byte string given to scm_puts should be decoded
from UTF-8? That would be OK.

Andy
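
For illustration, here is a minimal standalone sketch of what the latin1 interpretation described above would do to a UTF-8 argument when the port encoding is UTF-8. It does not use libguile, and latin1_to_utf8 is a hypothetical helper written for this note, not a Guile function. The assumption is that each input byte is treated as one ISO-8859-1 codepoint and then encoded for the port, so a two-byte UTF-8 sequence comes out as four bytes.

```c
#include <stddef.h>

/* Hypothetical helper, not part of libguile.  Treats each input byte as
   an ISO-8859-1 (latin1) codepoint and encodes that codepoint as UTF-8.
   `out` must have room for 2 * n + 1 bytes.  Returns the number of bytes
   written, not counting the terminating NUL.  */
static size_t
latin1_to_utf8 (const unsigned char *in, size_t n, unsigned char *out)
{
  size_t j = 0;

  for (size_t i = 0; i < n; i++)
    {
      unsigned char c = in[i];

      if (c < 0x80)
        {
          /* ASCII range: one byte, unchanged.  */
          out[j++] = c;
        }
      else
        {
          /* U+0080..U+00FF: two-byte UTF-8 sequence.  */
          out[j++] = (unsigned char) (0xC0 | (c >> 6));
          out[j++] = (unsigned char) (0x80 | (c & 0x3F));
        }
    }

  out[j] = '\0';
  return j;
}
```

Feeding this helper the UTF-8 bytes C3 A9 (U+00E9) yields C3 83 C2 A9, which a UTF-8 terminal would show as two characters rather than one. Pure ASCII input passes through unchanged, which is consistent with the observation that Guile's own scm_puts calls, all below codepoint 128, are unaffected by the choice.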

This bug report was last modified 8 years and 167 days ago.

