GNU bug report logs - #22901
drain-input doesn't decode


Package: guile;

Reported by: Zefram <zefram <at> fysh.org>

Date: Fri, 4 Mar 2016 03:11:01 UTC

Severity: normal

Done: Taylan Kammer <taylan.kammer <at> gmail.com>

Bug is archived. No further changes may be made.




From: help-debbugs <at> gnu.org (GNU bug Tracking System)
To: Zefram <zefram <at> fysh.org>
Subject: bug#22901: closed (drain-input doesn't decode)
Date: Wed, 19 May 2021 11:42:01 +0000
[Message part 1 (text/plain, inline)]
Your bug report

#22901: drain-input doesn't decode

which was filed against the guile package, has been closed.

The explanation is attached below, along with your original report.
If you require more details, please reply to 22901 <at> debbugs.gnu.org.

-- 
22901: http://debbugs.gnu.org/cgi/bugreport.cgi?bug=22901
GNU Bug Tracking System
Contact help-debbugs <at> gnu.org with problems
[Message part 2 (message/rfc822, inline)]
From: Taylan Kammer <taylan.kammer <at> gmail.com>
To: 22901-done <at> debbugs.gnu.org
Subject: drain-input doesn't decode
Date: Wed, 19 May 2021 13:41:26 +0200
Closing this since it's 5 years old and fixed in Guile 2.1 and higher.

-- 
Taylan

[Message part 3 (message/rfc822, inline)]
From: Zefram <zefram <at> fysh.org>
To: bug-guile <at> gnu.org
Subject: drain-input doesn't decode
Date: Fri, 4 Mar 2016 03:09:44 +0000
The documentation for drain-input says that it returns a string of
characters, implying that the result is equivalent to what you'd get
from calling read-char some number of times.  In fact it differs in a
significant respect: whereas read-char decodes input octets according to
the port's selected encoding, drain-input ignores the selected encoding
and always decodes according to ISO-8859-1 (thus preserving the octet
values in character form).

$ echo -n $'\1a\2b\3c' | guile-2.0 -c '(set-port-encoding! (current-input-port) "UCS-2BE") (write (port-encoding (current-input-port))) (newline) (write (map char->integer (let r ((l '\''())) (let ((c (read-char (current-input-port)))) (if (eof-object? c) (reverse l) (r (cons c l))))))) (newline)'
"UCS-2BE"
(353 610 867)
$ echo -n $'\1a\2b\3c' | guile-2.0 -c '(set-port-encoding! (current-input-port) "UCS-2BE") (write (port-encoding (current-input-port))) (newline) (peek-char (current-input-port)) (write (map char->integer (string->list (drain-input (current-input-port))))) (newline)'
"UCS-2BE"
(1 97 2 98 3 99)

The practical upshot is that the input returned by drain-input can't
be used in the same way as regular input from read-char.  It can still
be used if the code doing the reading is totally aware of the encoding,
so that it can perform the decoding manually, but this seems a failure
of abstraction.  The value returned by drain-input ought to be coherent
with the abstraction level at which it is specified.
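Below is a minimal sketch of the manual decoding mentioned above. It assumes Guile 2.0.9 or later, where the (ice-9 iconv) module provides string->bytevector and bytevector->string, and it assumes the caller knows the port's real encoding. Because drain-input on 2.0 maps each buffered octet to the character with the same code point, re-encoding the result as ISO-8859-1 recovers the original octets, which can then be decoded properly. The name redecode-drained is hypothetical and is not part of Guile.

```scheme
;; Sketch for Guile 2.0.x, where drain-input returns one character per
;; buffered octet (ISO-8859-1) regardless of the port's encoding.
(use-modules (ice-9 iconv))   ; string->bytevector, bytevector->string

;; Hypothetical helper: re-decode a drained string in ENCODING.
;; Encoding each character as ISO-8859-1 reproduces the original octets,
;; which are then decoded the way read-char would have decoded them.
;; Assumes the drained octets end on a character boundary.
(define (redecode-drained str encoding)
  (bytevector->string (string->bytevector str "ISO-8859-1") encoding))

;; The octets from the report, in the form drain-input returned them.
(define drained
  (list->string (map integer->char '(1 97 2 98 3 99))))

(write (map char->integer (string->list (redecode-drained drained "UCS-2BE"))))
(newline)   ; prints (353 610 867), matching the read-char transcript
```

This only works when the caller tracks the encoding itself, which is exactly the leak of abstraction the report objects to.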

I can see that there is a reason for drain-input to avoid performing
decoding: the problem that occurs if the buffer ends in the middle
of a character.  If drain-input is to return decoded characters then
presumably in this case it would have to read further octets beyond the
buffer contents, in an unbuffered manner, until it reaches a character
boundary.  If this is too unpalatable, perhaps drain-input should be
permitted only on ports configured for single-octet character encodings.
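The boundary problem can be made concrete for a fixed-width encoding such as UCS-2BE. The following minimal sketch uses a hypothetical helper, split-at-char-boundary, which is not a Guile API. It decodes only the complete two-octet units in a drained buffer and returns the trailing partial unit separately. A port that wanted drain-input to return decoded characters would have to read further octets before it could decode that last character. The sketch assumes fixed-width code units. It does not handle variable-width encodings such as UTF-8, nor surrogate pairs in UTF-16.

```scheme
(use-modules (rnrs bytevectors)  ; make-bytevector, bytevector-copy!
             (ice-9 iconv))      ; bytevector->string

;; Hypothetical helper: split BV after the last complete code unit of
;; WIDTH octets.  Returns two values: the decoded complete prefix, and a
;; bytevector holding the leftover octets of an incomplete character.
(define (split-at-char-boundary bv width encoding)
  (let* ((len   (bytevector-length bv))
         (whole (- len (modulo len width)))   ; octets in complete units
         (head  (make-bytevector whole))
         (tail  (make-bytevector (- len whole))))
    (bytevector-copy! bv 0 head 0 whole)
    (bytevector-copy! bv whole tail 0 (- len whole))
    (values (bytevector->string head encoding) tail)))

;; A buffer that ends one octet into the third UCS-2BE character.
(call-with-values
    (lambda () (split-at-char-boundary #vu8(1 97 2 98 3) 2 "UCS-2BE"))
  (lambda (decoded leftover)
    (write (map char->integer (string->list decoded)))  ; prints (353 610)
    (newline)
    (write leftover)                                      ; prints #vu8(3)
    (newline)))
```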

If, on the other hand, it is decided to endorse the current non-decoding
behaviour, then the break of abstraction needs to be documented.

-zefram






GNU bug tracking system
Copyright (C) 1999 Darren O. Benham, 1997,2003 nCipher Corporation Ltd, 1994-97 Ian Jackson.