GNU bug report logs - #62290
Error when handling invalid unicode with suspendable ports

Previous Next

Package: guile;

Reported by: Christopher Baines <mail <at> cbaines.net>

Date: Mon, 20 Mar 2023 09:13:01 UTC

Severity: normal

Done: Ludovic Courtès <ludo <at> gnu.org>

Bug is archived. No further changes may be made.

Full log


View this message in rfc822 format

From: Christopher Baines <mail <at> cbaines.net>
To: 62290 <at> debbugs.gnu.org
Subject: bug#62290: [PATCH] Fix some invalid unicode handling issues with suspendable ports.
Date: Mon, 20 Mar 2023 09:15:13 +0000
Based on the implementation in ports.c.  I don't understand what this
code is really doing, but the suspendable ports implementation differs
from the similar C code for a couple of inequalities.

* module/ice-9/suspendable-ports.scm (decode-utf8, bad-utf8-len): Flip a
couple of inequalities.
* test-suite/tests/ports.test ("string ports"): Add additional invalid
UTF-8 test case.
---
 module/ice-9/suspendable-ports.scm | 8 ++++----
 test-suite/tests/ports.test        | 7 +++++++
 2 files changed, 11 insertions(+), 4 deletions(-)

diff --git a/module/ice-9/suspendable-ports.scm b/module/ice-9/suspendable-ports.scm
index a823f1d37..9fac1df62 100644
--- a/module/ice-9/suspendable-ports.scm
+++ b/module/ice-9/suspendable-ports.scm
@@ -419,7 +419,7 @@
                (= (logand u8_2 #xc0) #x80)
                (case u8_0
                  ((#xe0) (>= u8_1 #xa0))
-                 ((#xed) (>= u8_1 #x9f))
+                 ((#xed) (<= u8_1 #x9f))
                  (else #t)))
           (kt (integer->char
                (logior (ash (logand u8_0 #x0f) 12)
@@ -436,7 +436,7 @@
                (= (logand u8_3 #xc0) #x80)
                (case u8_0
                  ((#xf0) (>= u8_1 #x90))
-                 ((#xf4) (>= u8_1 #x8f))
+                 ((#xf4) (<= u8_1 #x8f))
                  (else #t)))
           (kt (integer->char
                (logior (ash (logand u8_0 #x07) 18)
@@ -462,7 +462,7 @@
      ((< buffering 2) 1)
      ((not (= (logand (ref 1) #xc0) #x80)) 1)
      ((and (eq? first-byte #xe0) (< (ref 1) #xa0)) 1)
-     ((and (eq? first-byte #xed) (< (ref 1) #x9f)) 1)
+     ((and (eq? first-byte #xed) (> (ref 1) #x9f)) 1)
      ((< buffering 3) 2)
      ((not (= (logand (ref 2) #xc0) #x80)) 2)
      (else 0)))
@@ -471,7 +471,7 @@
      ((< buffering 2) 1)
      ((not (= (logand (ref 1) #xc0) #x80)) 1)
      ((and (eq? first-byte #xf0) (< (ref 1) #x90)) 1)
-     ((and (eq? first-byte #xf4) (< (ref 1) #x8f)) 1)
+     ((and (eq? first-byte #xf4) (> (ref 1) #x8f)) 1)
      ((< buffering 3) 2)
      ((not (= (logand (ref 2) #xc0) #x80)) 2)
      ((< buffering 4) 3)
diff --git a/test-suite/tests/ports.test b/test-suite/tests/ports.test
index 66e10e3dd..1b30e1a68 100644
--- a/test-suite/tests/ports.test
+++ b/test-suite/tests/ports.test
@@ -1059,6 +1059,13 @@
        eof))
 
     (test-decoding-error (#xf0 #x88 #x88 #x88) "UTF-8"
+      (error                ;; 2nd byte should be in the 90..BF range
+       error                ;; 88: not a valid starting byte
+       error                ;; 88: not a valid starting byte
+       error                ;; 88: not a valid starting byte
+       eof))
+
+    (test-decoding-error (#xf4 #xa4 #xbd #xa4) "UTF-8"
       (error                ;; 2nd byte should be in the 90..BF range
        error                ;; 88: not a valid starting byte
        error                ;; 88: not a valid starting byte
-- 
2.39.1





This bug report was last modified 2 years and 64 days ago.

Previous Next


GNU bug tracking system
Copyright (C) 1999 Darren O. Benham, 1997,2003 nCipher Corporation Ltd, 1994-97 Ian Jackson.