GNU bug report logs - #26059
utf16->string and utf32->string don't conform to R6RS

Package: guile;

Reported by: taylanbayirli <at> gmail.com ("Taylan Ulrich Bayırlı/Kammer")

Date: Sat, 11 Mar 2017 16:21:02 UTC

Severity: normal

Done: taylanbayirli <at> gmail.com (Taylan Ulrich Bayırlı/Kammer)

Bug is archived. No further changes may be made.


Report forwarded to bug-guile <at> gnu.org:
bug#26059; Package guile. (Sat, 11 Mar 2017 16:21:02 GMT)

Acknowledgement sent to taylanbayirli <at> gmail.com ("Taylan Ulrich Bayırlı/Kammer"):
New bug report received and forwarded. Copy sent to bug-guile <at> gnu.org. (Sat, 11 Mar 2017 16:21:02 GMT)

Message #5 received at submit <at> debbugs.gnu.org:

From: taylanbayirli <at> gmail.com ("Taylan Ulrich Bayırlı/Kammer")
To: bug-guile <at> gnu.org
Subject: utf16->string and utf32->string don't conform to R6RS
Date: Sat, 11 Mar 2017 17:26:42 +0100

See page 10 of the R6RS Libraries document.  The R6RS procedures differ
from Guile's as follows:

- R6RS supports reading a BOM.

- R6RS mandates an endianness argument to specify the behavior in the
  absence of a BOM.

- R6RS allows an optional third argument 'endianness-mandatory' to
  explicitly ignore any possible BOM.
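
As a minimal sketch of the three points above (this is not the patch
itself): the helper names utf16-bom-endianness and
effective-utf16-endianness are made up for illustration, and the code
only computes which endianness the R6RS rules would select, without
decoding anything.

```scheme
(use-modules (rnrs bytevectors))

;; Hypothetical helper: return 'big or 'little if BV starts with a
;; UTF-16 BOM, and #f otherwise (including when BV is too short).
(define (utf16-bom-endianness bv)
  (and (>= (bytevector-length bv) 2)
       (let ((b0 (bytevector-u8-ref bv 0))
             (b1 (bytevector-u8-ref bv 1)))
         (cond ((and (= b0 #xFE) (= b1 #xFF)) 'big)
               ((and (= b0 #xFF) (= b1 #xFE)) 'little)
               (else #f)))))

;; Hypothetical helper: the endianness the R6RS rules would pick.
;; A BOM overrides ENDIANNESS unless MANDATORY? is true.
(define* (effective-utf16-endianness bv endianness #:optional mandatory?)
  (if mandatory?
      endianness
      (or (utf16-bom-endianness bv) endianness)))

;; Little-endian BOM followed by "A".
(define bv (u8-list->bytevector '(#xFF #xFE #x41 #x00)))

(effective-utf16-endianness bv 'big)     ; => little (BOM wins)
(effective-utf16-endianness bv 'big #t)  ; => big (BOM ignored)
(effective-utf16-endianness (u8-list->bytevector '(#x00 #x41)) 'big)
                                         ; => big (no BOM, default used)
```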

Here's a quick patch on top of master to implement the R6RS procedures
in terms of the Guile procedures and export them with a rename from
(rnrs bytevectors).


===File /home/taylan/src/guile/guile-master/0001-Fix-R6RS-utf16-string-and-utf32-string.patch===
From f51cd1d4884caafb1ed0072cd77c0e3145f34576 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Taylan=20Ulrich=20Bay=C4=B1rl=C4=B1/Kammer?=
 <taylanbayirli <at> gmail.com>
Date: Fri, 10 Mar 2017 22:36:55 +0100
Subject: [PATCH] Fix R6RS utf16->string and utf32->string.

* module/rnrs/bytevectors.scm (read-bom16, read-bom32): New procedures.
(r6rs-utf16->string, r6rs-utf32->string): Ditto.
---
 module/rnrs/bytevectors.scm | 52 ++++++++++++++++++++++++++++++++++++++++++++-
 1 file changed, 51 insertions(+), 1 deletion(-)

diff --git a/module/rnrs/bytevectors.scm b/module/rnrs/bytevectors.scm
index 9744359f0..997a8c9cb 100644
--- a/module/rnrs/bytevectors.scm
+++ b/module/rnrs/bytevectors.scm
@@ -69,7 +69,9 @@
            bytevector-ieee-double-native-set!
 
            string->utf8 string->utf16 string->utf32
-           utf8->string utf16->string utf32->string))
+           utf8->string
+           (r6rs-utf16->string . utf16->string)
+           (r6rs-utf32->string . utf32->string)))
 
 
 (load-extension (string-append "libguile-" (effective-version))
@@ -80,4 +82,52 @@
       `(quote ,sym)
       (error "unsupported endianness" sym)))
 
+(define (read-bom16 bv)
+  (let ((c0 (bytevector-u8-ref bv 0))
+        (c1 (bytevector-u8-ref bv 1)))
+    (cond
+     ((and (= c0 #xFE) (= c1 #xFF))
+      'big)
+     ((and (= c0 #xFF) (= c1 #xFE))
+      'little)
+     (else
+      #f))))
+
+(define r6rs-utf16->string
+  (case-lambda
+    ((bv default-endianness)
+     (let ((bom-endianness (read-bom16 bv)))
+       (if (not bom-endianness)
+           (utf16->string bv default-endianness)
+           (substring/shared (utf16->string bv bom-endianness) 1))))
+    ((bv endianness endianness-mandatory?)
+     (if endianness-mandatory?
+         (utf16->string bv endianness)
+         (r6rs-utf16->string bv endianness)))))
+
+(define (read-bom32 bv)
+  (let ((c0 (bytevector-u8-ref bv 0))
+        (c1 (bytevector-u8-ref bv 1))
+        (c2 (bytevector-u8-ref bv 2))
+        (c3 (bytevector-u8-ref bv 3)))
+    (cond
+     ((and (= c0 #x00) (= c1 #x00) (= c2 #xFE) (= c3 #xFF))
+      'big)
+     ((and (= c0 #xFF) (= c1 #xFE) (= c2 #x00) (= c3 #x00))
+      'little)
+     (else
+      #f))))
+
+(define r6rs-utf32->string
+  (case-lambda
+    ((bv default-endianness)
+     (let ((bom-endianness (read-bom32 bv)))
+       (if (not bom-endianness)
+           (utf32->string bv default-endianness)
+           (substring/shared (utf32->string bv bom-endianness) 1))))
+    ((bv endianness endianness-mandatory?)
+     (if endianness-mandatory?
+         (utf32->string bv endianness)
+         (r6rs-utf32->string bv endianness)))))
+
 ;;; bytevector.scm ends here
-- 
2.11.0

============================================================
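
One edge case that may be worth checking: read-bom16 and read-bom32 in
the patch index the first bytes unconditionally, so a bytevector shorter
than the BOM (an empty one, for example) would presumably raise an
out-of-range error instead of decoding to an empty string.  A guarded
variant might look like the following sketch; safe-read-bom32 is a
hypothetical name and not part of the patch.

```scheme
(use-modules (rnrs bytevectors))

;; Hypothetical guarded variant of read-bom32: returns #f for inputs
;; shorter than four bytes instead of indexing past the end.
(define (safe-read-bom32 bv)
  (and (>= (bytevector-length bv) 4)
       (let ((bytes (map (lambda (i) (bytevector-u8-ref bv i))
                         '(0 1 2 3))))
         (cond ((equal? bytes '(#x00 #x00 #xFE #xFF)) 'big)
               ((equal? bytes '(#xFF #xFE #x00 #x00)) 'little)
               (else #f)))))

(safe-read-bom32 (make-bytevector 0))                           ; => #f
(safe-read-bom32 (u8-list->bytevector '(#xFF #xFE #x00 #x00)))  ; => little
```

The same length check with a two-byte threshold would apply to the
UTF-16 case.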




bug closed, send any further explanations to 26059 <at> debbugs.gnu.org and taylanbayirli <at> gmail.com ("Taylan Ulrich Bayırlı/Kammer") Request was from taylanbayirli <at> gmail.com (Taylan Ulrich Bayırlı/Kammer) to control <at> debbugs.gnu.org. (Sat, 11 Mar 2017 18:20:01 GMT)

Information forwarded to bug-guile <at> gnu.org:
bug#26059; Package guile. (Sat, 11 Mar 2017 18:25:01 GMT)

Message #10 received at 26059 <at> debbugs.gnu.org:

From: taylanbayirli <at> gmail.com (Taylan Ulrich Bayırlı/Kammer)
To: 26059 <at> debbugs.gnu.org
Subject: Sorry about the duplicate.
Date: Sat, 11 Mar 2017 19:30:59 +0100

Please ignore this, as it's a duplicate of #26058.




Information forwarded to bug-guile <at> gnu.org:
bug#26059; Package guile. (Mon, 13 Mar 2017 12:59:02 GMT)

Message #13 received at 26059-close <at> debbugs.gnu.org:

From: Andy Wingo <wingo <at> pobox.com>
To: 26059-close <at> debbugs.gnu.org
Subject: Re: bug#26059: Sorry about the duplicate.
Date: Mon, 13 Mar 2017 13:58:33 +0100

On Sat 11 Mar 2017 19:30, taylanbayirli <at> gmail.com (Taylan Ulrich "Bayırlı/Kammer") writes:

> Please ignore this, as it's a duplicate of #26058.




bug archived. Request was from Debbugs Internal Request <help-debbugs <at> gnu.org> to internal_control <at> debbugs.gnu.org. (Tue, 11 Apr 2017 11:24:05 GMT)
