From unknown Sat Jun 21 05:03:50 2025 X-Loop: help-debbugs@gnu.org Subject: bug#62290: Error when handling invalid unicode with suspendable ports Resent-From: Christopher Baines Original-Sender: "Debbugs-submit" Resent-CC: bug-guile@gnu.org Resent-Date: Mon, 20 Mar 2023 09:13:01 +0000 Resent-Message-ID: Resent-Sender: help-debbugs@gnu.org X-GNU-PR-Message: report 62290 X-GNU-PR-Package: guile X-GNU-PR-Keywords: To: 62290@debbugs.gnu.org X-Debbugs-Original-To: bug-guile@gnu.org Received: via spool by submit@debbugs.gnu.org id=B.167930352821039 (code B ref -1); Mon, 20 Mar 2023 09:13:01 +0000 Received: (at submit) by debbugs.gnu.org; 20 Mar 2023 09:12:08 +0000 Received: from localhost ([127.0.0.1]:53686 helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1peBYm-0005TG-Ev for submit@debbugs.gnu.org; Mon, 20 Mar 2023 05:12:08 -0400 Received: from lists.gnu.org ([209.51.188.17]:37714) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1peBYl-0005T9-3U for submit@debbugs.gnu.org; Mon, 20 Mar 2023 05:12:07 -0400 Received: from eggs.gnu.org ([2001:470:142:3::10]) by lists.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1peBYk-0001d8-DY for bug-guile@gnu.org; Mon, 20 Mar 2023 05:12:06 -0400 Received: from mira.cbaines.net ([212.71.252.8]) by eggs.gnu.org with esmtp (Exim 4.90_1) (envelope-from ) id 1peBYg-0003OU-7c for bug-guile@gnu.org; Mon, 20 Mar 2023 05:12:05 -0400 Received: from localhost (unknown [IPv6:2a02:8010:68c1:0:54d1:d5d4:280e:f699]) by mira.cbaines.net (Postfix) with ESMTPSA id AA1EB16F1F for ; Mon, 20 Mar 2023 09:11:59 +0000 (GMT) Received: from felis (localhost [127.0.0.1]) by localhost (OpenSMTPD) with ESMTP id 077924cf for ; Mon, 20 Mar 2023 09:11:58 +0000 (UTC) User-agent: mu4e 1.8.13; emacs 28.2 From: Christopher Baines Date: Mon, 20 Mar 2023 09:09:14 +0000 Message-ID: <874jqf6b35.fsf@cbaines.net> MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 
Content-Transfer-Encoding: quoted-printable Received-SPF: pass client-ip=212.71.252.8; envelope-from=mail@cbaines.net; helo=mira.cbaines.net X-Spam_score_int: -18 X-Spam_score: -1.9 X-Spam_bar: - X-Spam_report: (-1.9 / 5.0 requ) BAYES_00=-1.9, SPF_HELO_PASS=-0.001, SPF_PASS=-0.001 autolearn=ham autolearn_force=no X-Spam_action: no action X-Spam-Score: -1.4 (-) X-BeenThere: debbugs-submit@debbugs.gnu.org X-Mailman-Version: 2.1.18 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: debbugs-submit-bounces@debbugs.gnu.org Sender: "Debbugs-submit" X-Spam-Score: -2.4 (--) Here's a simple reproducer: (use-modules (ice-9 binary-ports) (ice-9 suspendable-ports) (rnrs bytevectors)) (define (test) (let* ((sequence '(#xf4 #xa4 #xbd #xa4)) (p (open-bytevector-input-port (u8-list->bytevector sequence)))) (set-port-encoding! p "UTF-8") (set-port-conversion-strategy! p 'substitute) (peek (read-char p)))) (test) (install-suspendable-ports!) (test) If you run it, it outputs #\=EF=BF=BD as expected the first time, but then = using suspendable ports, it raises an exception. The behaviour should be the same. ;;; (#\=EF=BF=BD) Backtrace: In ice-9/boot-9.scm: 1752:10 8 (with-exception-handler _ _ #:unwind? _ # _) In unknown file: 7 (apply-smob/0 #) In ice-9/boot-9.scm: 724:2 6 (call-with-prompt ("prompt") # =E2=80=A6) In ice-9/eval.scm: 619:8 5 (_ #(#(#))) In ice-9/boot-9.scm: 2836:4 4 (save-module-excursion #) 4388:12 3 (_) In /home/chris/Projects/Guile/guile/bad-unicode.scm: 12:10 2 (test) In ice-9/suspendable-ports.scm: 591:33 1 (read-char _) 499:12 0 (peek-char-and-next-cur/utf8 _ _ _ _) ice-9/suspendable-ports.scm:499:12: In procedure peek-char-and-next-cur/utf= 8: In procedure integer->char: Argument 1 out of range: 1199972 From unknown Sat Jun 21 05:03:50 2025 X-Loop: help-debbugs@gnu.org Subject: bug#62290: [PATCH] Fix some invalid unicode handling issues with suspendable ports. 
References: <874jqf6b35.fsf@cbaines.net>
In-Reply-To: <874jqf6b35.fsf@cbaines.net>
To: 62290@debbugs.gnu.org
From: Christopher Baines
Date: Mon, 20 Mar 2023 09:15:13 +0000
Message-Id: <20230320091513.10817-1-mail@cbaines.net>
X-Mailer: git-send-email 2.39.1
MIME-Version: 1.0
Content-Transfer-Encoding: 8bit

Based on the implementation in ports.c. I don't understand what this
code is really doing, but the suspendable ports implementation differs
from the similar C code for a couple of inequalities.

* module/ice-9/suspendable-ports.scm (decode-utf8, bad-utf8-len): Flip a
couple of inequalities.
* test-suite/tests/ports.test ("string ports"): Add additional invalid
UTF-8 test case.
---
 module/ice-9/suspendable-ports.scm | 8 ++++----
 test-suite/tests/ports.test        | 7 +++++++
 2 files changed, 11 insertions(+), 4 deletions(-)

diff --git a/module/ice-9/suspendable-ports.scm b/module/ice-9/suspendable-ports.scm
index a823f1d37..9fac1df62 100644
--- a/module/ice-9/suspendable-ports.scm
+++ b/module/ice-9/suspendable-ports.scm
@@ -419,7 +419,7 @@
                (= (logand u8_2 #xc0) #x80)
                (case u8_0
                  ((#xe0) (>= u8_1 #xa0))
-                 ((#xed) (>= u8_1 #x9f))
+                 ((#xed) (<= u8_1 #x9f))
                  (else #t)))
           (kt (integer->char
                (logior (ash (logand u8_0 #x0f) 12)
@@ -436,7 +436,7 @@
                (= (logand u8_3 #xc0) #x80)
                (case u8_0
                  ((#xf0) (>= u8_1 #x90))
-                 ((#xf4) (>= u8_1 #x8f))
+                 ((#xf4) (<= u8_1 #x8f))
                  (else #t)))
           (kt (integer->char
                (logior (ash (logand u8_0 #x07) 18)
@@ -462,7 +462,7 @@
      ((< buffering 2) 1)
      ((not (= (logand (ref 1) #xc0) #x80)) 1)
      ((and (eq? first-byte #xe0) (< (ref 1) #xa0)) 1)
-     ((and (eq? first-byte #xed) (< (ref 1) #x9f)) 1)
+     ((and (eq? first-byte #xed) (> (ref 1) #x9f)) 1)
      ((< buffering 3) 2)
      ((not (= (logand (ref 2) #xc0) #x80)) 2)
      (else 0)))
@@ -471,7 +471,7 @@
      ((< buffering 2) 1)
      ((not (= (logand (ref 1) #xc0) #x80)) 1)
      ((and (eq? first-byte #xf0) (< (ref 1) #x90)) 1)
-     ((and (eq? first-byte #xf4) (< (ref 1) #x8f)) 1)
+     ((and (eq? first-byte #xf4) (> (ref 1) #x8f)) 1)
      ((< buffering 3) 2)
      ((not (= (logand (ref 2) #xc0) #x80)) 2)
      ((< buffering 4) 3)
diff --git a/test-suite/tests/ports.test b/test-suite/tests/ports.test
index 66e10e3dd..1b30e1a68 100644
--- a/test-suite/tests/ports.test
+++ b/test-suite/tests/ports.test
@@ -1059,6 +1059,13 @@
      eof))
 
   (test-decoding-error (#xf0 #x88 #x88 #x88) "UTF-8"
+    (error ;; 2nd byte should be in the 90..BF range
+     error ;; 88: not a valid starting byte
+     error ;; 88: not a valid starting byte
+     error ;; 88: not a valid starting byte
+     eof))
+
+  (test-decoding-error (#xf4 #xa4 #xbd #xa4) "UTF-8"
     (error ;; 2nd byte should be in the 90..BF range
      error ;; 88: not a valid starting byte
      error ;; 88: not a valid starting byte
-- 
2.39.1

From unknown Sat Jun 21 05:03:50 2025
MIME-Version: 1.0
X-Mailer: MIME-tools 5.505 (Entity 5.505)
From: help-debbugs@gnu.org (GNU bug Tracking System)
To: Christopher Baines
Subject: bug#62290: closed (Re: bug#62290: Error when handling invalid unicode with suspendable ports)
References: <87pm932h4b.fsf_-_@gnu.org> <874jqf6b35.fsf@cbaines.net>
Reply-To: 62290@debbugs.gnu.org
Date: Mon, 20 Mar 2023 22:28:02 +0000
Content-Type: multipart/mixed; boundary="----------=_1679351282-30303-1"

This is a multi-part message in MIME format...

------------=_1679351282-30303-1
Content-Disposition: inline
Content-Type: text/plain; charset="utf-8"

Your bug report

#62290: Error when handling invalid unicode with suspendable ports

which was filed against the guile package, has been closed.

The explanation is attached below, along with your original report.
If you require more details, please reply to 62290@debbugs.gnu.org.
-- 
62290: https://debbugs.gnu.org/cgi/bugreport.cgi?bug=62290
GNU Bug Tracking System
Contact help-debbugs@gnu.org with problems

------------=_1679351282-30303-1
Content-Type: message/rfc822
Content-Disposition: inline
Content-Transfer-Encoding: 7bit

From: Ludovic Courtès
To: Christopher Baines
Cc: 62290-done@debbugs.gnu.org
Subject: Re: bug#62290: Error when handling invalid unicode with suspendable ports
References: <874jqf6b35.fsf@cbaines.net> <20230320091513.10817-1-mail@cbaines.net>
Date: Mon, 20 Mar 2023 23:27:32 +0100
In-Reply-To: <20230320091513.10817-1-mail@cbaines.net> (Christopher Baines's message of "Mon, 20 Mar 2023 09:15:13 +0000")
Message-ID: <87pm932h4b.fsf_-_@gnu.org>
User-Agent: Gnus/5.13 (Gnus v5.13) Emacs/28.2 (gnu/linux)
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8

Hello,

Christopher Baines wrote:

> Based on the implementation in ports.c. I don't understand what this
> code is really doing, but the suspendable ports implementation differs
> from the similar C code for a couple of inequalities.
>
> * module/ice-9/suspendable-ports.scm (decode-utf8, bad-utf8-len): Flip a
> couple of inequalities.
> * test-suite/tests/ports.test ("string ports"): Add additional invalid
> UTF-8 test case.

Pushed as cba2e7e3fec3c781230570f5d1ef070625eeeda8.

Thanks for documenting the problem and providing a perfect patch!

Ludo’.

------------=_1679351282-30303-1--
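
A worked check of the numbers in this thread, offered as a sketch rather than as Guile's actual code. It assembles the value that the four bytes from the reproducer would denote if no range check rejected them, and applies the second-byte checks as they read after the patch. The procedure names raw-utf8-4 and valid-second-byte? are hypothetical and exist only for this illustration; the real logic lives in decode-utf8 and bad-utf8-len in module/ice-9/suspendable-ports.scm.

```scheme
;; Hypothetical helper, not part of Guile: combine the payload bits of a
;; 4-byte UTF-8 sequence with no validity checks at all.
(define (raw-utf8-4 u8_0 u8_1 u8_2 u8_3)
  (logior (ash (logand u8_0 #x07) 18)
          (ash (logand u8_1 #x3f) 12)
          (ash (logand u8_2 #x3f) 6)
          (logand u8_3 #x3f)))

;; Hypothetical helper: is U8_1 acceptable as the second byte after the
;; lead byte U8_0?  The comparisons mirror the post-patch `case' forms.
(define (valid-second-byte? u8_0 u8_1)
  (and (= (logand u8_1 #xc0) #x80)      ; must be a continuation byte
       (case u8_0
         ((#xe0) (>= u8_1 #xa0))        ; below A0 would be an overlong form
         ((#xed) (<= u8_1 #x9f))        ; above 9F would be a surrogate
         ((#xf0) (>= u8_1 #x90))        ; below 90 would be an overlong form
         ((#xf4) (<= u8_1 #x8f))        ; above 8F would exceed U+10FFFF
         (else #t))))

(display (raw-utf8-4 #xf4 #xa4 #xbd #xa4))   ; prints 1199972
(newline)
(display (> (raw-utf8-4 #xf4 #xa4 #xbd #xa4) #x10ffff))   ; prints #t
(newline)
(display (valid-second-byte? #xf4 #xa4))     ; prints #f
(newline)
```

With the pre-patch comparison (>= u8_1 #x8f), the byte #xa4 passed the check for lead byte #xf4, so decoding went on to call integer->char on 1199972, which lies above #x10ffff. That is the value reported in the error message.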