GNU bug report logs - #18520
string ports should not have an encoding


Package: guile

Reported by: David Kastrup <dak <at> gnu.org>

Date: Sun, 21 Sep 2014 23:35:02 UTC

Severity: wishlist



Message #50 received at 18520 <at> debbugs.gnu.org (full text, mbox):

From: ludo <at> gnu.org (Ludovic Courtès)
To: David Kastrup <dak <at> gnu.org>
Cc: 18520 <at> debbugs.gnu.org
Subject: Re: bug#18520: string ports should not have an encoding
Date: Tue, 23 Sep 2014 18:01:28 +0200

David Kastrup <dak <at> gnu.org> writes:

> They result in code like
>
>   // we do our own utf8 encoding and verification in the parser, so we
>   // use the no-conversion equivalent of latin1
>   SCM str = scm_from_latin1_string (c_str ());
>   scm_dynwind_begin ((scm_t_dynwind_flags)0);
>   // Why doesn't scm_set_port_encoding_x work here?
>   scm_dynwind_fluid (ly_lily_module_constant ("%default-port-encoding"), SCM_BOOL_F);
>   str_port_ = scm_open_input_string (str);
>   scm_dynwind_end ();
>   scm_set_port_filename_x (str_port_, ly_string2scm (name_));
> }

So here ‘c_str’ returns a char * that is a UTF-8-encoded string, right?

In that case, it should be enough to do:

  /* Get a Scheme string from its UTF-8 representation.  */
  str = scm_from_utf8_string (c_str ());

  /* Create an input string port.  ‘read-char’ & co. will return each
     character from STR, one at a time.  */
  str_port = scm_open_input_string (str);

  scm_set_port_filename_x (str_port, file);

As long as textual I/O procedures are used on ‘str_port’, there’s no
need to worry about its encoding.

Now, to be able to use ‘ftell’ and assume it returns the position as a
number of bytes in the UTF-8 sequence, something like this should work
(for Guile 2.0; in 2.2 nothing special is needed):

  /* Get a Scheme string from its UTF-8 representation.  */
  str = scm_from_utf8_string (c_str ());

  scm_dynwind_begin (0);

  /* Make sure the following string port uses UTF-8 as the internal
     encoding of its buffer.  */
  scm_dynwind_fluid (scm_c_public_ref ("guile", "%default-port-encoding"),
                     scm_from_latin1_string ("UTF-8"));

  /* Create an input string port.  ‘read-char’ & co. will return each
     character from STR, one at a time.  */
  str_port = scm_open_input_string (str);
  scm_dynwind_end ();

  scm_set_port_filename_x (str_port, file);
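
To illustrate why the buffer encoding matters for ‘ftell’: once the text
contains non-ASCII characters, a character index and a byte offset into
the UTF-8 sequence are no longer the same number.  Here is a minimal
sketch in plain C, independent of Guile.  The name ‘utf8_byte_offset’ is
hypothetical and not part of any library, and the code assumes its input
is valid UTF-8:

```c
#include <stddef.h>

/* Hypothetical helper, not part of libguile.  Return the byte offset
   of the character with index CHAR_INDEX in the NUL-terminated UTF-8
   string S, or the byte length of S if S has fewer characters.
   Assumes S is valid UTF-8.  */
size_t
utf8_byte_offset (const char *s, size_t char_index)
{
  size_t bytes = 0;
  size_t chars = 0;

  while (s[bytes] != '\0')
    {
      /* A byte of the form 10xxxxxx continues the previous character;
         any other byte starts a new one.  */
      if (((unsigned char) s[bytes] & 0xC0) != 0x80)
        {
          if (chars == char_index)
            break;
          chars++;
        }
      bytes++;
    }

  return bytes;
}
```

For the string "aéb", the character at index 2 (‘b’) sits at byte
offset 3, because ‘é’ takes two bytes in UTF-8.  A port whose buffer is
not UTF-8 would report a different position there.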

Does this help for LilyPond?

Ludo’.




This bug report was last modified 10 years and 258 days ago.


