From unknown Sat Jun 14 14:26:01 2025
Content-Type: text/plain; charset=utf-8
From: bug#11197 <11197@debbugs.gnu.org>
To: bug#11197 <11197@debbugs.gnu.org>
Subject: Status: problems with string ports and unicode
Reply-To: bug#11197 <11197@debbugs.gnu.org>
Date: Sat, 14 Jun 2025 21:26:01 +0000

retitle 11197 problems with string ports and unicode
reassign 11197 guile
submitter 11197 Klaus Stehle
severity 11197 normal

thanks

From debbugs-submit-bounces@debbugs.gnu.org Sat Apr 07 16:08:02 2012
Date: Sat, 7 Apr 2012 22:07:01 +0200 (CEST)
From: Klaus Stehle
To: bug-guile@gnu.org
Subject: problems with string ports and unicode
MIME-Version: 1.0
Content-Type: MULTIPART/MIXED; BOUNDARY="42362114-1364899799-1333829221=:6107"

--42362114-1364899799-1333829221=:6107
Content-Type: TEXT/PLAIN; charset=UTF-8

Hi,

;;;; a very very short example script to describe the problem:

;; open a string port with unicode characters >= 0x0100
(define p (open-input-string "čtyří"))

Put the line into a script and start guile. You will see the output:

=> Backtrace:

That's all, and guile will hang in an eternal loop.
If you enter the line interactively into the REPL, everything works
properly and you can read all characters with (read-char p).

;;;; another very short script, which is possibly the same problem:

;; open a string port and unread a unicode character >= 0x0100
(define p (open-input-string "ibenik"))
(unread-char #\Š p)

Running these two lines as a script generates an error message:

=> ERROR: In procedure unread-char:
=> ERROR: Throw to key `encoding-error' with args `("scm_ungetc" "conversion to port encoding failed" 84 #f #\540)'.

If you enter the lines interactively into the REPL, everything works
properly and you can read all characters with (read-char p).
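
A possible way to sidestep the unread-char failure on Guile 2.0.x is to
create the string port while %default-port-encoding is bound to "UTF-8",
as suggested in the replies to this report. The following is only a
minimal sketch under that assumption: the helper names
open-utf8-input-string and read-all-chars are hypothetical and are not
part of Guile, and the source file is assumed to be saved as UTF-8.

```scheme
;;; -*- coding: utf-8 -*-
;; Sketch only: assumes Guile 2.0.x and a UTF-8 encoded source file.

;; Hypothetical helper: open an input string port whose encoding is
;; UTF-8, independent of the locale-derived default port encoding.
(define (open-utf8-input-string s)
  (with-fluids ((%default-port-encoding "UTF-8"))
    (open-input-string s)))

;; Hypothetical helper: read every remaining character of PORT and
;; return them as a string.
(define (read-all-chars port)
  (let loop ((c (read-char port)) (acc '()))
    (if (eof-object? c)
        (list->string (reverse acc))
        (loop (read-char port) (cons c acc)))))

(define p (open-utf8-input-string "ibenik"))

;; Push a character above U+00FF back onto the port.
(unread-char #\Š p)

;; Compare inside Scheme rather than printing the text, so the result
;; does not depend on the encoding of the terminal port.
(write (string=? (read-all-chars p) "Šibenik"))
(newline)
```

The comparison is written out as a boolean on purpose: printing the
string itself would also involve the encoding of the current output
port, which is a separate question from the one reported here.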

Cheers,
Klaus Stehle

----------------------------
guile --version
guile (GNU Guile) 2.0.5

uname -srm
Linux 2.6.32-5-amd64 x86_64

echo $LANG
de_DE.UTF-8

--42362114-1364899799-1333829221=:6107--

From debbugs-submit-bounces@debbugs.gnu.org Mon Apr 09 17:13:31 2012
From: ludo@gnu.org (Ludovic Courtès)
To: Klaus Stehle
Cc: 11197@debbugs.gnu.org
Subject: Re: bug#11197: problems with string ports and unicode
Date: Mon, 09 Apr 2012 23:12:29 +0200
In-Reply-To: (Klaus Stehle's message of "Sat, 7 Apr 2012 22:07:01 +0200 (CEST)")
Message-ID: <87ty0sa9tu.fsf@gnu.org>
Content-Type: text/plain; charset=utf-8

Hi,

It may be that your string ports are created with a non-Unicode-capable
encoding.  Try something like:

  (define p
    (with-fluids ((%default-port-encoding "UTF-8"))
      (open-input-string "čtyří")))

More details in the manual (info "(guile) String Ports").

How does it work for you?

Ludo’.

From debbugs-submit-bounces@debbugs.gnu.org Wed Apr 11 12:12:31 2012
From: Mark H Weaver
To: ludo@gnu.org (Ludovic Courtès)
Cc: 11197@debbugs.gnu.org, Klaus Stehle
Subject: Re: bug#11197: problems with string ports and unicode
References: <87ty0sa9tu.fsf@gnu.org>
Date: Wed, 11 Apr 2012 12:08:09 -0400
In-Reply-To: <87ty0sa9tu.fsf@gnu.org> ("Ludovic Courtès"'s message of "Mon, 09 Apr 2012 23:12:29 +0200")
Message-ID: <87ty0q8d5h.fsf@netris.org>
Content-Type: text/plain; charset=utf-8

ludo@gnu.org (Ludovic Courtès) writes:

> It may be that your string ports are created with a non-Unicode-capable
> encoding.  Try something like:
>
>   (define p
>     (with-fluids ((%default-port-encoding "UTF-8"))
>       (open-input-string "čtyří")))

IMO, this should not be needed.  Port encodings should only be relevant
when reading from ports involving byte strings, such as file ports or
socket ports.  The encoding used by Scheme strings is a purely internal
matter; from the user's perspective, Scheme strings are simply a
sequence of Unicode code points.

What _is_ needed is a file coding declaration near the top of the source
file, e.g. "coding: utf-8" (see "Character Encoding of Source Files" in
the manual).

I tried that and it still fails for me.  I think this is a genuine bug.

    Mark

From debbugs-submit-bounces@debbugs.gnu.org Wed Apr 11 12:26:26 2012
From: ludo@gnu.org (Ludovic Courtès)
To: Mark H Weaver
Cc: 11197@debbugs.gnu.org, Klaus Stehle
Subject: Re: bug#11197: problems with string ports and unicode
References: <87ty0sa9tu.fsf@gnu.org> <87ty0q8d5h.fsf@netris.org>
Date: Wed, 11 Apr 2012 18:25:10 +0200
In-Reply-To: <87ty0q8d5h.fsf@netris.org> (Mark H. Weaver's message of "Wed, 11 Apr 2012 12:08:09 -0400")
Message-ID: <87zkaip76h.fsf@gnu.org>
Content-Type: text/plain; charset=utf-8

Hi Mark,

Mark H Weaver skribis:

> ludo@gnu.org (Ludovic Courtès) writes:
>> It may be that your string ports are created with a non-Unicode-capable
>> encoding.  Try something like:
>>
>>   (define p
>>     (with-fluids ((%default-port-encoding "UTF-8"))
>>       (open-input-string "čtyří")))
>
> IMO, this should not be needed.  Port encodings should only be relevant
> when reading from ports involving byte strings, such as file ports or
> socket ports.  The encoding used by Scheme strings is a purely internal
> matter; from the user's perspective, Scheme strings are simply a
> sequence of Unicode code points.

Note that “UTF-8” above has nothing to do with Guile’s internal string
representation; it’s just one of the many encodings that can represent
“čtyří”.

> What _is_ needed is a file coding declaration near the top of the source
> file, e.g. "coding: utf-8" (see "Character Encoding of Source Files" in
> the manual).

Yes.  And you actually need both–i.e., the ‘coding’ cookie won’t
magically make string ports use that encoding.

> I tried that and it still fails for me.

What fails exactly?

Thanks,
Ludo’.

From debbugs-submit-bounces@debbugs.gnu.org Wed Apr 11 13:57:42 2012
From: Mark H Weaver
To: ludo@gnu.org (Ludovic Courtès)
Cc: 11197@debbugs.gnu.org, Klaus Stehle
Subject: Re: bug#11197: problems with string ports and unicode
References: <87ty0sa9tu.fsf@gnu.org> <87ty0q8d5h.fsf@netris.org> <87zkaip76h.fsf@gnu.org>
Date: Wed, 11 Apr 2012 13:53:21 -0400
In-Reply-To: <87zkaip76h.fsf@gnu.org> ("Ludovic Courtès"'s message of "Wed, 11 Apr 2012 18:25:10 +0200")
Message-ID: <87lim288a6.fsf@netris.org>
Content-Type: text/plain; charset=utf-8

Hi Ludovic,

ludo@gnu.org (Ludovic Courtès) writes:

> Mark H Weaver skribis:
>> ludo@gnu.org (Ludovic Courtès) writes:
>>> It may be that your string ports are created with a non-Unicode-capable
>>> encoding.  Try something like:
>>>
>>>   (define p
>>>     (with-fluids ((%default-port-encoding "UTF-8"))
>>>       (open-input-string "čtyří")))
>>
>> IMO, this should not be needed.  Port encodings should only be relevant
>> when reading from ports involving byte strings, such as file ports or
>> socket ports.  The encoding used by Scheme strings is a purely internal
>> matter; from the user's perspective, Scheme strings are simply a
>> sequence of Unicode code points.
>
> Note that “UTF-8” above has nothing to do with Guile’s internal string
> representation; it’s just one of the many encodings that can represent
> “čtyří”.

Okay, now I understand.  The problem is that internally, string ports
are implemented by converting the string into a stream of bytes in the
string port's encoding, and then the string port reads those bytes.

Nonetheless, it is very unfortunate that this internal implementation
detail "leaks" out into user code.  SRFI-6 says nothing about port
encodings, and portable code written for SRFI-6 will fail on Guile
unless the string is constrained to whatever the default port encoding
happens to be.

Conceptually, a string port is a textual port, not a binary port.  You
should be able to hand it an arbitrary string and read those characters
from it, as described in SRFI-6, without setting Guile-specific fluid
variables.  Similarly, you should be able to write arbitrary characters
to a string-output-port.

IMO, string ports should use UTF-8 as their initial port encoding, since
we know that UTF-8 can represent any Guile string.  This will allow
portable use of string ports.

I realize that this would change the existing behavior of programs that
use binary I/O on string ports, but as things stand right now, portable
SRFI-6 code is broken on Guile.

What do you think?

>> What _is_ needed is a file coding declaration near the top of the source
>> file, e.g. "coding: utf-8" (see "Character Encoding of Source Files" in
>> the manual).
>
> Yes.  And you actually need both–i.e., the ‘coding’ cookie won’t
> magically make string ports use that encoding.
>
>> I tried that and it still fails for me.
>
> What fails exactly?

It fails ungracefully (goes into an infinite loop while trying to print
the backtrace) without the %default-port-encoding setting.  It works
when I add both the %default-port-encoding setting and the coding
declaration.
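
For reference, the working combination described above (coding
declaration plus %default-port-encoding) can be sketched as one complete
script. This is a minimal sketch assuming Guile 2.0.x and a source file
saved as UTF-8; the variable names in, out and chars are only
illustrative, and the same binding is assumed to be what a string output
port needs as well.

```scheme
;;; -*- coding: utf-8 -*-
;; The coding declaration above tells Guile's reader how this source
;; file is encoded.  It does not set the encoding of string ports.

;; Input side: bind the default port encoding while the port is created.
(define in
  (with-fluids ((%default-port-encoding "UTF-8"))
    (open-input-string "čtyří")))

;; Read the characters back one at a time, as with (read-char p) above.
(define chars
  (let loop ((c (read-char in)) (acc '()))
    (if (eof-object? c)
        (reverse acc)
        (loop (read-char in) (cons c acc)))))

;; Output side: the port encoding is fixed when the port is created,
;; so the same binding is used around open-output-string.
(define out
  (with-fluids ((%default-port-encoding "UTF-8"))
    (open-output-string)))
(write-char #\ř out)

;; Both comparisons are expected to hold if the round trip works.
(write (string=? (list->string chars) "čtyří"))
(newline)
(write (string=? (get-output-string out) "ř"))
(newline)
```

Leaving out either piece reproduces the situation discussed in this
thread: without the coding declaration the string literal itself may be
misread, and without the fluid binding the port may be created with an
encoding that cannot represent the text.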

     Thanks,
       Mark

From debbugs-submit-bounces@debbugs.gnu.org Wed Apr 11 17:02:34 2012
From: ludo@gnu.org (Ludovic Courtès)
To: Mark H Weaver
Cc: 11197@debbugs.gnu.org, Klaus Stehle
Subject: Re: bug#11197: problems with string ports and unicode
References: <87ty0sa9tu.fsf@gnu.org> <87ty0q8d5h.fsf@netris.org> <87zkaip76h.fsf@gnu.org> <87lim288a6.fsf@netris.org>
Date: Wed, 11 Apr 2012 23:01:16 +0200
In-Reply-To: <87lim288a6.fsf@netris.org> (Mark H. Weaver's message of "Wed, 11 Apr 2012 13:53:21 -0400")
Message-ID: <87wr5mj84j.fsf@gnu.org>
MIME-Version: 1.0
Content-Type: multipart/mixed; boundary="=-=-="

--=-=-=
Content-Type: text/plain; charset=utf-8

Hi Mark,

Mark H Weaver skribis:

> Okay, now I understand.  The problem is that internally, string ports
> are implemented by converting the string into a stream of bytes in the
> string port's encoding, and then the string port reads those bytes.

Exactly.

[...]

> Conceptually, a string port is a textual port, not a binary port.

But not in Guile, where there’s no distinction between textual and
binary ports.  One can write code like:

  scheme@(guile-user)> (define (string->utf16 s)
                         (let ((p (with-fluids ((%default-port-encoding "UTF-16BE"))
                                    (open-input-string s))))
                           (get-bytevector-all p)))
  scheme@(guile-user)> (string->utf16 "hello")
  $4 = #vu8(0 104 0 101 0 108 0 108 0 111)
  scheme@(guile-user)> (use-modules(rnrs bytevectors))
  scheme@(guile-user)> (utf16->string $4)
  $5 = "hello"

> You should be able to hand it an arbitrary string and read those
> characters from it, as described in SRFI-6, without setting
> Guile-specific fluid variables.  Similarly, you should be able to
> write arbitrary characters to a string-output-port.

The SRFI-6 issue could be addressed with:

--=-=-=
Content-Type: text/x-patch
Content-Disposition: inline

diff --git a/module/srfi/srfi-6.scm b/module/srfi/srfi-6.scm
index 098b586..ba946ec 100644
--- a/module/srfi/srfi-6.scm
+++ b/module/srfi/srfi-6.scm
@@ -1,6 +1,6 @@
 ;;; srfi-6.scm --- Basic String Ports
 
-;; Copyright (C) 2001, 2002, 2003, 2006 Free Software Foundation, Inc.
+;; Copyright (C) 2001, 2002, 2003, 2006, 2012 Free Software Foundation, Inc.
 ;;
 ;; This library is free software; you can redistribute it and/or
 ;; modify it under the terms of the GNU Lesser General Public
@@ -23,10 +23,16 @@
 ;;; Code:
 
 (define-module (srfi srfi-6)
-  #:re-export (open-input-string open-output-string get-output-string))
+  #:export (open-input-string open-output-string)
+  #:re-export (get-output-string))
 
-;; Currently, guile provides these functions by default, so no action
-;; is needed, and this file is just a placeholder.
+(define (open-input-string s)
+  (with-fluids ((%default-port-encoding "UTF-8"))
+    ((@ (guile) open-input-string) s)))
+
+(define (open-output-string)
+  (with-fluids ((%default-port-encoding "UTF-8"))
+    ((@ (guile) open-output-string))))
 
 (cond-expand-provide (current-module) '(srfi-6))
 

--=-=-=
Content-Type: text/plain; charset=utf-8

It wouldn’t completely solve the problem.

> IMO, string ports should use UTF-8 as their initial port encoding, since
> we know that UTF-8 can represent any Guile string.  This will allow
> portable use of string ports.

The change was submitted and briefly discussed at .  I think the
rationale was mostly backward compatibility (in 1.8 people could mix
Latin-1 textual and binary I/O), consistency with how other ports
behave, and the ability to change the default encoding of string ports.

> I realize that this would change the existing behavior of programs that
> use binary I/O on string ports, but as things stand right now, portable
> SRFI-6 code is broken on Guile.
>
> What do you think?

In hindsight, UTF-8 does seem like a better default than the locale port
encoding (which is what %default-port-encoding is, by default), but it
does remain useful to specify a different encoding.

>>> What _is_ needed is a file coding declaration near the top of the source
>>> file, e.g. "coding: utf-8" (see "Character Encoding of Source Files" in
>>> the manual).
>>
>> Yes.  And you actually need both–i.e., the ‘coding’ cookie won’t
>> magically make string ports use that encoding.
>>
>>> I tried that and it still fails for me.
>>
>> What fails exactly?
>
> It fails ungracefully (goes into an infinite loop while trying to print
> the backtrace) without the %default-port-encoding setting.

Indeed, it’s stuck in a deadlock:

--8<---------------cut here---------------start------------->8---
(gdb) bt
#0  0x00007ffff75e1204 in __lll_lock_wait () from /nix/store/vxycd107wjbhcj720hzkw2px7s7kr724-glibc-2.12.2/lib/libpthread.so.0
#1  0x00007ffff75dc4d4 in _L_lock_999 () from /nix/store/vxycd107wjbhcj720hzkw2px7s7kr724-glibc-2.12.2/lib/libpthread.so.0
#2  0x00007ffff75dc2ea in pthread_mutex_lock () from /nix/store/vxycd107wjbhcj720hzkw2px7s7kr724-glibc-2.12.2/lib/libpthread.so.0
#3  0x00007ffff7b30499 in scm_dynwind_pthread_mutex_lock (mutex=0x7ffff7dd28c0) at threads.c:1962
#4  0x00007ffff7b2bb0e in scm_mkstrport (pos=0x2, str=0x4, modes=327680, caller=) at strports.c:287
#5  0x00007ffff7aac20b in display_backtrace_body (a=0x7fffffffc1a0) at backtrace.c:487
#6  0x00007ffff7b46c7b in vm_regular_engine (vm=0x6f61f0, program=0x7f5d50, argv=0x6fa3b0, nargs=-1) at vm-i-system.c:895
#7  0x00007ffff7ac039e in scm_call_3 (proc=0x7f5d50, arg1=, arg2=, arg3=) at eval.c:500
#8  0x00007ffff7b32504 in scm_internal_catch (tag=, body=, body_data=, handler=, handler_data=) at throw.c:222
#9  0x00007ffff7aabbba in scm_display_backtrace_with_highlights (stack=, port=, first=, depth=, highlights=) at backtrace.c:558
#10 0x00007ffff7ab725e in print_exception_and_backtrace (error_port=0x6f6170, tag=0x66d4c0, args=0x8e6ea0) at continuations.c:490
#11 pre_unwind_handler (error_port=0x6f6170, tag=0x66d4c0, args=0x8e6ea0) at continuations.c:534
#12 0x00007ffff7b46c7b in vm_regular_engine (vm=0x6f61f0, program=0x7f3ce0, argv=0x6fa300, nargs=-1) at vm-i-system.c:895
#13 0x00007ffff7b4846e in scm_call_with_vm (vm=0x6f61f0, proc=0x7f3ce0, args=) at vm.c:878
#14 0x00007ffff7b296db in scm_to_stringn (str=0x8dba80, lenp=0x7fffffffc4e8, encoding=, handler=SCM_FAILED_CONVERSION_ERROR) at strings.c:2102
#15 0x00007ffff7b2bb73 in scm_mkstrport (pos=0x2, str=0x8dba80, modes=196608, caller=) at strports.c:312
--8<---------------cut here---------------end--------------->8---

This could be fixed by calling ‘scm_new_port_table_entry’ after having
prepared the backing buffer, but the problem is that ‘pt->encoding’ is
needed before.

Thoughts?

Ludo’.

--=-=-=--

From debbugs-submit-bounces@debbugs.gnu.org Wed Jun 20 17:02:15 2012
From: ludo@gnu.org (Ludovic Courtès)
To: Mark H Weaver
Cc: 11197@debbugs.gnu.org, Klaus Stehle
Subject: Re: bug#11197: problems with string ports and unicode
References: <87ty0sa9tu.fsf@gnu.org> <87ty0q8d5h.fsf@netris.org> <87zkaip76h.fsf@gnu.org> <87lim288a6.fsf@netris.org> <87wr5mj84j.fsf@gnu.org>
Date: Wed, 20 Jun 2012 22:58:39 +0200
In-Reply-To: <87wr5mj84j.fsf@gnu.org> ("Ludovic Courtès"'s message of "Wed, 11 Apr 2012 23:01:16 +0200")
Message-ID: <87a9zxog3k.fsf@gnu.org>
Content-Type: text/plain; charset=utf-8

Hi,

ludo@gnu.org (Ludovic Courtès) skribis:

> @@ -23,10 +23,16 @@
>  ;;; Code:
>
>  (define-module (srfi srfi-6)
> -  #:re-export (open-input-string open-output-string get-output-string))
> +  #:export (open-input-string open-output-string)
> +  #:re-export (get-output-string))
>
> -;; Currently, guile provides these functions by default, so no action
> -;; is needed, and this file is just a placeholder.
> +(define (open-input-string s)
> +  (with-fluids ((%default-port-encoding "UTF-8"))
> +    ((@ (guile) open-input-string) s)))
> +
> +(define (open-output-string)
> +  (with-fluids ((%default-port-encoding "UTF-8"))
> +    ((@ (guile) open-output-string))))

I’ve applied it as commit ecb48dccbac6b8fdd969f50a23351ef7f4b91ce5.

Thanks,
Ludo’.

From debbugs-submit-bounces@debbugs.gnu.org Wed Jun 20 17:06:35 2012
From: ludo@gnu.org (Ludovic Courtès)
To: Mark H Weaver
Cc: 11197-done@debbugs.gnu.org, Klaus Stehle
Subject: Re: bug#11197: problems with string ports and unicode
References: <87ty0sa9tu.fsf@gnu.org> <87ty0q8d5h.fsf@netris.org> <87zkaip76h.fsf@gnu.org> <87lim288a6.fsf@netris.org> <87wr5mj84j.fsf@gnu.org>
Date: Wed, 20 Jun 2012 23:03:02 +0200
In-Reply-To: <87wr5mj84j.fsf@gnu.org> ("Ludovic Courtès"'s message of "Wed, 11 Apr 2012 23:01:16 +0200")
Message-ID: <87obodn1bt.fsf@gnu.org>
Content-Type: text/plain; charset=utf-8

Hi,

ludo@gnu.org (Ludovic Courtès) skribis:

> Indeed, it’s stuck in a deadlock:
>
> (gdb) bt
> #0  0x00007ffff75e1204 in __lll_lock_wait () from /nix/store/vxycd107wjbhcj720hzkw2px7s7kr724-glibc-2.12.2/lib/libpthread.so.0
> #1  0x00007ffff75dc4d4 in _L_lock_999 () from /nix/store/vxycd107wjbhcj720hzkw2px7s7kr724-glibc-2.12.2/lib/libpthread.so.0
> #2  0x00007ffff75dc2ea in pthread_mutex_lock () from /nix/store/vxycd107wjbhcj720hzkw2px7s7kr724-glibc-2.12.2/lib/libpthread.so.0
> #3  0x00007ffff7b30499 in scm_dynwind_pthread_mutex_lock (mutex=0x7ffff7dd28c0) at threads.c:1962
> #4  0x00007ffff7b2bb0e in scm_mkstrport (pos=0x2, str=0x4, modes=327680, caller=) at strports.c:287
> #5  0x00007ffff7aac20b in display_backtrace_body (a=0x7fffffffc1a0) at backtrace.c:487
> #6  0x00007ffff7b46c7b in vm_regular_engine (vm=0x6f61f0, program=0x7f5d50, argv=0x6fa3b0, nargs=-1) at vm-i-system.c:895
> #7  0x00007ffff7ac039e in scm_call_3 (proc=0x7f5d50, arg1=, arg2=, arg3=) at eval.c:500
> #8  0x00007ffff7b32504 in scm_internal_catch (tag=, body=, body_data=, handler=, handler_data=) at throw.c:222
> #9  0x00007ffff7aabbba in scm_display_backtrace_with_highlights (stack=, port=, first=, depth=, highlights=)
>     at backtrace.c:558
> #10 0x00007ffff7ab725e in print_exception_and_backtrace (error_port=0x6f6170, tag=0x66d4c0, args=0x8e6ea0) at continuations.c:490
> #11 pre_unwind_handler (error_port=0x6f6170, tag=0x66d4c0, args=0x8e6ea0) at continuations.c:534
> #12 0x00007ffff7b46c7b in vm_regular_engine (vm=0x6f61f0, program=0x7f3ce0, argv=0x6fa300, nargs=-1) at vm-i-system.c:895
> #13 0x00007ffff7b4846e in scm_call_with_vm (vm=0x6f61f0, proc=0x7f3ce0, args=) at vm.c:878
> #14 0x00007ffff7b296db in scm_to_stringn (str=0x8dba80, lenp=0x7fffffffc4e8, encoding=, handler=SCM_FAILED_CONVERSION_ERROR) at strings.c:2102
> #15 0x00007ffff7b2bb73 in scm_mkstrport (pos=0x2, str=0x8dba80, modes=196608, caller=) at strports.c:312
>
> This could be fixed by calling ‘scm_new_port_table_entry’ after having
> prepared the backing buffer, but the problem is that ‘pt->encoding’ is
> needed before.

Fixed in 03fcf93bff9f02a3d12ab86be4e67b996310aad4 (not particularly
elegant, but I couldn’t think of a better way.)  The test in that commit
captures the initial problem.

I’m marking this bug as “done”.  If you would like to discuss string
port encodings, separate binary/textual ports, or any other significant
change, you’re welcome to do so on guile-devel@gnu.org, of course.

Thanks!

Ludo’.

From unknown Sat Jun 14 14:26:01 2025
To: internal_control@debbugs.gnu.org
From: Debbugs Internal Request
Subject: Internal Control
Message-Id: bug archived.
Date: Thu, 19 Jul 2012 11:24:08 +0000
User-Agent: Fakemail v42.6.9

# This is a fake control message.
#
# The action:
# bug archived.
thanks
# This fakemail brought to you by your local debbugs
# administrator