#18520 - string ports should not have an encoding

GNU bug report logs - #18520
string ports should not have an encoding

Package: guile

Reported by: David Kastrup <dak <at> gnu.org>

Date: Sun, 21 Sep 2014 23:35:02 UTC

Severity: wishlist

Message #29 received at 18520 <at> debbugs.gnu.org (full text, mbox):

From: David Kastrup <dak <at> gnu.org>
To: ludo <at> gnu.org (Ludovic Courtès)
Cc: 18520 <at> debbugs.gnu.org
Subject: Re: bug#18520: string ports should not have an encoding
Date: Tue, 23 Sep 2014 00:12:58 +0200

ludo <at> gnu.org (Ludovic Courtès) writes:

> David Kastrup <dak <at> gnu.org> skribis:
>>
>> For error messages, yes.  For associating a position in a string with a
>> previously parsed closure, no.
>
> But wouldn’t a line/column pair be as suitable as a unique identifier as
> the position in the file?

As long as the reencoded UTF-8 is byte-identical to the original.  At
the current point of time, we flag non-UTF-8 sequences with a warning
and continue.  People complained previously about things like Latin-1
characters (most likely to occur in comments or lyrics where they cause
little or well-identifiable havoc) leading to unceremonious aborts
without identifiable cause.

At any rate, the current behavior does not make sense.  Guile 2.0 might
refuse to turn a string into a port, and for Guile 2.2 the port encoding
may be used to have a UTF-8 rendition of the string characters be
interpreted in another encoding (like latin-1) but not the other way
round.  Both versions make only some half-baked sense.

Most resulting problems can probably be worked around in some manner,
but string ports are actually the main stringbuf-like mechanism that
Scheme has (dynamically growing strings that are more compact than a
list of characters).  Wedging a compulsory code conversion into it that
is mirrored in the port positions seems like a distraction.

> Also, if the result of ‘ftell’ is used as a unique identifier, does it
> really matter whether it’s an offset measured in bytes or in
> character?

In the LilyPond lexer, stuff is usually measured with byte offsets.
Yes, one can certainly parse the UTF-8 character distances and hope to
arrive at the same results as the UTF-8 reencoding.  But the point of
GUILE's character set support was not really to make everything more
complicated, was it?

-- 
David Kastrup
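
The two concerns in this message (byte offsets versus character offsets, and re-encoded UTF-8 not being byte-identical to the original) can be sketched roughly as follows. This is an illustration in Python rather than Guile; the sample strings and the helper `char_to_byte_offset` are hypothetical and not taken from LilyPond or Guile, and the replacement-on-error decoding only approximates "flag with a warning and continue".

```python
def char_to_byte_offset(text: str, char_index: int) -> int:
    """Return the UTF-8 byte offset corresponding to a character index.

    Hypothetical helper: it re-encodes the prefix, which is the extra work
    a byte-offset-based lexer would need if a port reported positions in
    characters.
    """
    return len(text[:char_index].encode("utf-8"))


# Hypothetical input with one non-ASCII character (U+00EF, two bytes in UTF-8).
source = "na\u00efve = 1"

char_pos = source.index("=")                   # position counted in characters
byte_pos = source.encode("utf-8").index(b"=")  # position counted in bytes

# The two offsets disagree as soon as a multi-byte character precedes them.
offsets_differ = char_pos != byte_pos
converted = char_to_byte_offset(source, char_pos)

# A stray Latin-1 byte is not valid UTF-8. Decoding it leniently and then
# re-encoding does not reproduce the original bytes, so byte offsets taken
# from the original no longer line up with the re-encoded text.
raw = "na\u00efve".encode("latin-1")             # b'na\xefve'
decoded = raw.decode("utf-8", errors="replace")  # invalid byte becomes U+FFFD
reencoded = decoded.encode("utf-8")
byte_identical = reencoded == raw
```

With this input, `char_pos` and `byte_pos` differ by one, and `byte_identical` is false because the replacement character occupies three bytes where the original had one.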

This bug report was last modified 10 years and 315 days ago.

