GNU bug report logs - #22901
drain-input doesn't decode

Package: guile

Reported by: Zefram <zefram <at> fysh.org>

Date: Fri, 4 Mar 2016 03:11:01 UTC

Severity: normal

Done: Taylan Kammer <taylan.kammer <at> gmail.com>

Bug is archived. No further changes may be made.


Report forwarded to bug-guile <at> gnu.org:
bug#22901; Package guile. (Fri, 04 Mar 2016 03:11:01 GMT)

Acknowledgement sent to Zefram <zefram <at> fysh.org>:
New bug report received and forwarded. Copy sent to bug-guile <at> gnu.org. (Fri, 04 Mar 2016 03:11:01 GMT)

Message #5 received at submit <at> debbugs.gnu.org:

From: Zefram <zefram <at> fysh.org>
To: bug-guile <at> gnu.org
Subject: drain-input doesn't decode
Date: Fri, 4 Mar 2016 03:09:44 +0000
The documentation for drain-input says that it returns a string of
characters, implying that the result is equivalent to what you'd get
from calling read-char some number of times.  In fact it differs in a
significant respect: whereas read-char decodes input octets according to
the port's selected encoding, drain-input ignores the selected encoding
and always decodes according to ISO-8859-1 (thus preserving the octet
values in character form).

$ echo -n $'\1a\2b\3c' | guile-2.0 -c '(set-port-encoding! (current-input-port) "UCS-2BE") (write (port-encoding (current-input-port))) (newline) (write (map char->integer (let r ((l '\''())) (let ((c (read-char (current-input-port)))) (if (eof-object? c) (reverse l) (r (cons c l))))))) (newline)'
"UCS-2BE"
(353 610 867)
$ echo -n $'\1a\2b\3c' | guile-2.0 -c '(set-port-encoding! (current-input-port) "UCS-2BE") (write (port-encoding (current-input-port))) (newline) (peek-char (current-input-port)) (write (map char->integer (string->list (drain-input (current-input-port))))) (newline)'
"UCS-2BE"
(1 97 2 98 3 99)
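
The two results above differ only in how the same six octets are grouped into characters. As a language-neutral illustration (written in Python, not Guile, and not a description of how Guile implements either procedure), the sketch below decodes those octets once as big-endian 16-bit units and once as ISO-8859-1. The variable names are purely illustrative.

```python
# The six octets produced by: echo -n $'\1a\2b\3c'
data = b"\x01a\x02b\x03c"

# Big-endian 16-bit decoding: each pair of octets forms one code point,
# e.g. 0x01 * 256 + 0x61 = 353.
as_ucs2be = [ord(ch) for ch in data.decode("utf-16-be")]

# ISO-8859-1 maps every octet to the code point with the same value,
# so the octet values come back unchanged.
as_latin1 = [ord(ch) for ch in data.decode("iso-8859-1")]

print(as_ucs2be)  # [353, 610, 867]
print(as_latin1)  # [1, 97, 2, 98, 3, 99]
```

The first list matches the read-char output and the second matches the drain-input output reported above.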

The practical upshot is that the input returned by drain-input can't
be used in the same way as regular input from read-char.  It can still
be used if the code doing the reading is totally aware of the encoding,
so that it can perform the decoding manually, but this seems a failure
of abstraction.  The value returned by drain-input ought to be coherent
with the abstraction level at which it is specified.

I can see that there is a reason for drain-input to avoid performing
decoding: the problem that occurs if the buffer ends in the middle
of a character.  If drain-input is to return decoded characters then
presumably in this case it would have to read further octets beyond the
buffer contents, in an unbuffered manner, until it reaches a character
boundary.  If this is too unpalatable, perhaps drain-input should be
permitted only on ports configured for single-octet character encodings.
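
The boundary problem described in this paragraph can be sketched as follows, again in Python rather than Guile. The buffer split after three octets and all names are hypothetical, chosen only to show a buffer ending in the middle of a character. An incremental decoder returns whole characters only and holds back the trailing partial one until more octets arrive, which is roughly what a decoding drain-input would have to arrange.

```python
import codecs

data = b"\x01a\x02b\x03c"
# Assumed buffer boundary: three octets buffered, so the second
# 16-bit character is split across the boundary.
buffered, rest = data[:3], data[3:]

decoder = codecs.getincrementaldecoder("utf-16-be")()

# Only one complete character is available from the buffered octets;
# the dangling octet 0x02 is kept inside the decoder's state.
drained = decoder.decode(buffered)

# Supplying the remaining octets completes the split character.
remainder = decoder.decode(rest, final=True)

print([ord(ch) for ch in drained])    # [353]
print([ord(ch) for ch in remainder])  # [610, 867]
```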

If, on the other hand, it is decided to endorse the current non-decoding
behaviour, then the break of abstraction needs to be documented.

-zefram




Information forwarded to bug-guile <at> gnu.org:
bug#22901; Package guile. (Mon, 20 Jun 2016 16:14:01 GMT)

Message #8 received at 22901 <at> debbugs.gnu.org:

From: Andy Wingo <wingo <at> pobox.com>
To: Zefram <zefram <at> fysh.org>
Cc: 22901 <at> debbugs.gnu.org
Subject: Re: bug#22901: drain-input doesn't decode
Date: Mon, 20 Jun 2016 18:12:50 +0200
Thanks for the test case!  FWIW, this is fixed in Guile 2.1.3.  I am not
sure what we should do about Guile 2.0.  I guess we should make it do
the documented thing though!

Andy




Information forwarded to bug-guile <at> gnu.org:
bug#22901; Package guile. (Sun, 26 Feb 2017 17:47:02 GMT)

Message #11 received at 22901 <at> debbugs.gnu.org:

From: Matt Wette <matt.wette <at> gmail.com>
To: 22901 <at> debbugs.gnu.org
Subject: drain-input doesn't decode
Date: Sun, 26 Feb 2017 09:46:14 -0800
I put together a test and tried on 2.1.7 - my test fails.  See attached.

  (pass-if "encoded input"
    (let ((fn (test-file))
          (nc "utf-8")
          (st "\u03b2\u03b1\u03b4 \u03b1\u03c3\u03c3 am I.")
          ;;(st "hello, world\n")
          )
      (let ((p1 (open-output-file fn #:encoding nc)))
        ;;(display st p1)
        (string-for-each (lambda (ch) (write-char ch p1)) st)
        (close p1))
      (let* ((p0 (open-input-file fn #:encoding nc))
             (s0 (begin (unread-char (read-char p0) p0) (drain-input p0))))
        (simple-format #t "~S\n" s0)
        (equal? s0 st))))

[port-di.test (application/octet-stream, attachment)]

Information forwarded to bug-guile <at> gnu.org:
bug#22901; Package guile. (Sun, 26 Feb 2017 17:59:01 GMT)

Message #14 received at 22901 <at> debbugs.gnu.org:

From: Matt Wette <matt.wette <at> gmail.com>
To: 22901 <at> debbugs.gnu.org
Subject: Re: drain-input doesn't decode
Date: Sun, 26 Feb 2017 09:58:42 -0800
My bad.  The failure was on guile-2.0.13.  It seems to work on guile-2.1.7:

mwette$ guile-2.1.7-dev3/meta/guile port-di.test
"βαδ ασσ am I."
PASS: drain-input: encoded input

Information forwarded to bug-guile <at> gnu.org:
bug#22901; Package guile. (Sun, 16 May 2021 17:56:01 GMT)

Message #17 received at 22901 <at> debbugs.gnu.org:

From: Taylan Kammer <taylan.kammer <at> gmail.com>
To: 22901 <at> debbugs.gnu.org, Zefram <zefram <at> fysh.org>,
 Andy Wingo <wingo <at> pobox.com>
Subject: drain-input doesn't decode
Date: Sun, 16 May 2021 19:55:07 +0200
Are we still maintaining 2.0, or can this issue be closed?

-- 
Taylan




Reply sent to Taylan Kammer <taylan.kammer <at> gmail.com>:
You have taken responsibility. (Wed, 19 May 2021 11:42:01 GMT)

Notification sent to Zefram <zefram <at> fysh.org>:
bug acknowledged by developer. (Wed, 19 May 2021 11:42:01 GMT)

Message #22 received at 22901-done <at> debbugs.gnu.org:

From: Taylan Kammer <taylan.kammer <at> gmail.com>
To: 22901-done <at> debbugs.gnu.org
Subject: drain-input doesn't decode
Date: Wed, 19 May 2021 13:41:26 +0200
Closing this since it's 5 years old and fixed in Guile 2.1 and higher.

-- 
Taylan




bug archived. Request was from Debbugs Internal Request <help-debbugs <at> gnu.org> to internal_control <at> debbugs.gnu.org. (Thu, 17 Jun 2021 11:24:04 GMT)
