GNU bug report logs - #18520
string ports should not have an encoding


Package: guile;

Reported by: David Kastrup <dak <at> gnu.org>

Date: Sun, 21 Sep 2014 23:35:02 UTC

Severity: wishlist



Message #29 received at 18520 <at> debbugs.gnu.org:

From: David Kastrup <dak <at> gnu.org>
To: ludo <at> gnu.org (Ludovic Courtès)
Cc: 18520 <at> debbugs.gnu.org
Subject: Re: bug#18520: string ports should not have an encoding
Date: Tue, 23 Sep 2014 00:12:58 +0200
ludo <at> gnu.org (Ludovic Courtès) writes:

> David Kastrup <dak <at> gnu.org> skribis:
>>
>> For error messages, yes.  For associating a position in a string with a
>> previously parsed closure, no.
>
> But wouldn’t a line/column pair be as suitable as a unique identifier as
> the position in the file?

As long as the reencoded UTF-8 is byte-identical to the original.  At
present, we flag non-UTF-8 sequences with a warning and continue.
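
To see why lenient decoding breaks a byte-identical round trip, here is a minimal Python sketch. It assumes that each invalid byte is replaced with U+FFFD (REPLACEMENT CHARACTER). That is only one possible recovery strategy and not necessarily what LilyPond or Guile does. The sample input is made up.

```python
# Sketch only. Assumption: invalid UTF-8 bytes are replaced with U+FFFD;
# this is one possible recovery strategy, not necessarily LilyPond's or Guile's.

def reencoded(raw: bytes) -> bytes:
    """Decode raw bytes leniently as UTF-8, then encode the result as UTF-8 again."""
    text = raw.decode("utf-8", errors="replace")  # each invalid byte -> U+FFFD
    return text.encode("utf-8")

# Hypothetical input: 0xE9 is a Latin-1 "e acute" and is not valid UTF-8 here.
original = b"% caf\xe9 comment\n{ c4 }"
roundtrip = reencoded(original)

# U+FFFD occupies 3 bytes in UTF-8, so the one-byte 0xE9 grows by 2 and
# every byte offset after it shifts accordingly.
print(len(original), len(roundtrip))                # 21 23
print(original.index(b"{"), roundtrip.index(b"{"))  # 15 17
```

Pure ASCII or already valid UTF-8 input survives unchanged. A single stray Latin-1 byte, however, is enough to shift every later byte position.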

People complained previously about things like Latin-1 characters (most
likely to occur in comments or lyrics where they cause little or
well-identifiable havoc) leading to unceremonious aborts without
identifiable cause.

At any rate, the current behavior does not make sense.  Guile 2.0 might
refuse to turn a string into a port, and in Guile 2.2 the port encoding
may be used to interpret a UTF-8 rendition of the string's characters in
another encoding (such as Latin-1), but not the other way round.
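
As a rough analogy, here is the same effect in Python. Python codecs stand in for a port encoding here; this is not Guile's port API. Reading the UTF-8 rendition of a string's characters back through a Latin-1 decoder turns every byte into its own character, so both the text and its length change. The sample string is purely illustrative.

```python
# Analogy only: Python codecs stand in for a Guile port encoding here.
s = "caf\u00e9"                          # "cafe" with e-acute: 4 characters
utf8_bytes = s.encode("utf-8")           # b'caf\xc3\xa9': 5 bytes
misread = utf8_bytes.decode("latin-1")   # Latin-1 maps every byte to one character

print(ascii(misread))        # 'caf\xc3\xa9'
print(len(s), len(misread))  # 4 5
```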

Both versions make only half-baked sense.  Most resulting problems
can probably be worked around in some manner, but string ports are
actually the main stringbuf-like mechanism that Scheme has (dynamically
growing strings that are more compact than a list of characters).
Wedging a compulsory code conversion into them, one that is mirrored in
the port positions, seems like a distraction.

> Also, if the result of ‘ftell’ is used as a unique identifier, does it
> really matter whether it’s an offset measured in bytes or in
> characters?

In the LilyPond lexer, stuff is usually measured with byte offsets.
Yes, one can certainly parse the UTF-8 character distances and hope to
arrive at the same results as the UTF-8 reencoding.
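
The bookkeeping needed to reconcile the two kinds of offsets can be sketched in Python as follows. The helper names char_to_byte_offset and byte_to_char_offset are hypothetical. The sketch assumes the input is valid UTF-8. With non-ASCII text, the two offsets diverge.

```python
def char_to_byte_offset(text: str, char_pos: int) -> int:
    """Byte offset, within text's UTF-8 encoding, of character position char_pos."""
    return len(text[:char_pos].encode("utf-8"))

def byte_to_char_offset(data: bytes, byte_pos: int) -> int:
    """Character position of byte offset byte_pos in valid UTF-8 data.

    Raises UnicodeDecodeError if byte_pos falls inside a multi-byte character.
    """
    return len(data[:byte_pos].decode("utf-8"))

text = "Gr\u00fc\u00dfe { c4 }"   # u-umlaut and sharp s take 2 bytes each in UTF-8
data = text.encode("utf-8")

brace_char = text.index("{")
brace_byte = data.index(b"{")
print(brace_char, brace_byte)                               # 6 8
print(char_to_byte_offset(text, brace_char) == brace_byte)  # True
print(byte_to_char_offset(data, brace_byte) == brace_char)  # True
```

Each multi-byte character before a position widens the gap. For that reason, byte offsets from a byte-oriented lexer and character offsets from a decoding port cannot be compared directly without this conversion.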

But the point of Guile's character set support was not really to make
everything more complicated, was it?

-- 
David Kastrup







GNU bug tracking system
Copyright (C) 1999 Darren O. Benham, 1997,2003 nCipher Corporation Ltd, 1994-97 Ian Jackson.