Package: guile;
Reported by: "Thompson, David" <dthompson2 <at> worcester.edu>
Date: Sat, 12 Apr 2025 13:16:02 UTC
Severity: normal
Tags: patch
To reply to this bug, email your comments to 77762 AT debbugs.gnu.org.
Toggle the display of automated, internal messages from the tracker.
View this report as an mbox folder, status mbox, maintainer mbox
bug-guile <at> gnu.org
:bug#77762
; Package guile
.
(Sat, 12 Apr 2025 13:16:03 GMT) Full text and rfc822 format available."Thompson, David" <dthompson2 <at> worcester.edu>
:bug-guile <at> gnu.org
.
(Sat, 12 Apr 2025 13:16:03 GMT) Full text and rfc822 format available.Message #5 received at submit <at> debbugs.gnu.org (full text, mbox):
From: "Thompson, David" <dthompson2 <at> worcester.edu> To: "bug-guile <at> gnu.org" <bug-guile <at> gnu.org> Subject: [PATCH] web: Add JSON module. Date: Sat, 12 Apr 2025 09:14:31 -0400
[Message part 1 (text/plain, inline)]
Attached is a patch that adds a new (web json) module. Some may remember that I submitted a patch back in 2015 (time flies, eh?) for an (ice-9 json) module that never made it in. Well, 10 years is a long time and Guile still doesn't have a built-in JSON module. Third party libraries like guile-json and guile-sjson are available, the latter being an adaptation of my original patch and the former remaining the go-to library used by larger Guile projects like Guix. There's also SRFI-180 (which sounds like a cool surfing trick!) which was published in 2020 but the API is, in my opinion, overly complicated due to generators and other things. Anyway, JSON continues to be *the* data interchange format of the web and Guile really ought to have a simple API that can read/write JSON to/from a port using only Scheme data types that have read syntax (i.e. no hash tables like guile-json). This minimal, practical API is what my patch provides. I've tried my best to make it as efficient as possible. I've settled on the following JSON<->Scheme data type mapping which is nearly identical to SRFI-180 with the exception of object keys: - true and false are #t and #f - null is the symbol 'null - numbers are either exact integers (fixnums and bignums) or inexact reals (flonums, NaNs and infinities excluded) - strings are strings - arrays are vectors - objects are association lists with string keys (SRFI-180 chose symbols but JSON uses strings so strings feel the most honest) Thanks in advance for the review, - Dave
[0001-web-Add-JSON-module.patch (text/x-patch, attachment)]
bug-guile <at> gnu.org
:bug#77762
; Package guile
.
(Sat, 12 Apr 2025 16:02:03 GMT) Full text and rfc822 format available.Message #8 received at 77762 <at> debbugs.gnu.org (full text, mbox):
From: Tomas Volf <~@wolfsden.cz> To: "Thompson, David" <dthompson2 <at> worcester.edu> Cc: 77762 <at> debbugs.gnu.org Subject: Re: bug#77762: [PATCH] web: Add JSON module. Date: Sat, 12 Apr 2025 18:01:17 +0200
> - null is the symbol 'null Out of curiosity, what are your thoughts about using #nil instead? Tomas -- There are only two hard things in Computer Science: cache invalidation, naming things and off-by-one errors.
bug-guile <at> gnu.org
:bug#77762
; Package guile
.
(Sat, 12 Apr 2025 16:05:02 GMT) Full text and rfc822 format available.Message #11 received at 77762 <at> debbugs.gnu.org (full text, mbox):
From: "Thompson, David" <dthompson2 <at> worcester.edu> To: Tomas Volf <~@wolfsden.cz> Cc: 77762 <at> debbugs.gnu.org Subject: Re: bug#77762: [PATCH] web: Add JSON module. Date: Sat, 12 Apr 2025 12:04:14 -0400
On Sat, Apr 12, 2025 at 12:01 PM Tomas Volf <~@wolfsden.cz> wrote: > > > - null is the symbol 'null > > Out of curiosity, what are your thoughts about using #nil instead? My original patch from 10 years ago did this. Mark Weaver then explained to me that #nil was added specifically for the purpose of supporting Emacs Lisp on the Guile VM and shouldn't be used in Scheme code. The symbol 'null works well because there is no symbol type in JSON so there's no ambiguity. - Dave
bug-guile <at> gnu.org
:bug#77762
; Package guile
.
(Sun, 13 Apr 2025 01:40:01 GMT) Full text and rfc822 format available.Message #14 received at 77762 <at> debbugs.gnu.org (full text, mbox):
From: Arun Isaac <arunisaac <at> systemreboot.net> To: "Thompson, David" <dthompson2 <at> worcester.edu>, 77762 <at> debbugs.gnu.org Subject: Re: bug#77762: [PATCH] web: Add JSON module. Date: Sun, 13 Apr 2025 02:39:25 +0100
Hi David, Thanks for the patch. I'm just trying to understand: Why do we need a JSON module in guile itself? Isn't guile-json, the way it is as an external library, good enough? > API that can read/write JSON to/from a port using only Scheme data > types that have read syntax (i.e. no hash tables like guile-json). guile-json does not use hash tables. guile-json uses association lists. I believe the decision to use hash tables was reversed many years ago. Cheers! Arun
bug-guile <at> gnu.org
:bug#77762
; Package guile
.
(Sun, 13 Apr 2025 02:15:03 GMT) Full text and rfc822 format available.Message #17 received at 77762 <at> debbugs.gnu.org (full text, mbox):
From: "Thompson, David" <dthompson2 <at> worcester.edu> To: Arun Isaac <arunisaac <at> systemreboot.net> Cc: 77762 <at> debbugs.gnu.org Subject: Re: bug#77762: [PATCH] web: Add JSON module. Date: Sat, 12 Apr 2025 22:13:40 -0400
Hi Arun, On Sat, Apr 12, 2025 at 9:39 PM Arun Isaac <arunisaac <at> systemreboot.net> wrote: > > Thanks for the patch. I'm just trying to understand: Why do we need a > JSON module in guile itself? Isn't guile-json, the way it is as an > external library, good enough? I mean, Guile doesn't *need* to include anything, but JSON is one of the most common serialization formats around. Most language implementations ship with a JSON library. Guile already has a suite of web modules and so this fits right in with those. Over the years I've noticed many users, especially new users, asking why Guile doesn't ship with JSON support. I don't think you should have to install an external library for something so common. In my own projects I prefer to just check a JSON module into the source tree so I don't have to add an additional dependency. guile-json is a good library with a bunch of bells and whistles, but I think most people just want basic read/write procedures and would be happy to take them for granted with their Guile installation. > > API that can read/write JSON to/from a port using only Scheme data > > types that have read syntax (i.e. no hash tables like guile-json). > > guile-json does not use hash tables. guile-json uses association lists. > I believe the decision to use hash tables was reversed many years ago. Heh, shows how long it's been since I last used guile-json. Glad that's changed. :) - Dave
bug-guile <at> gnu.org
:bug#77762
; Package guile
.
(Mon, 14 Apr 2025 04:22:02 GMT) Full text and rfc822 format available.Message #20 received at 77762 <at> debbugs.gnu.org (full text, mbox):
From: Arun Isaac <arunisaac <at> systemreboot.net> To: "Thompson, David" <dthompson2 <at> worcester.edu> Cc: 77762 <at> debbugs.gnu.org Subject: Re: bug#77762: [PATCH] web: Add JSON module. Date: Mon, 14 Apr 2025 05:21:16 +0100
Hi David, > I mean, Guile doesn't *need* to include anything, but JSON is one of > the most common serialization formats around. Most language > implementations ship with a JSON library. Guile already has a suite > of web modules and so this fits right in with those. Over the years > I've noticed many users, especially new users, asking why Guile > doesn't ship with JSON support. I don't think you should have to > install an external library for something so common. In my own > projects I prefer to just check a JSON module into the source tree so > I don't have to add an additional dependency. guile-json is a good > library with a bunch of bells and whistles, but I think most people > just want basic read/write procedures and would be happy to take them > for granted with their Guile installation. Thanks for the explanation. That sounds reasonable to me. But, I am a bit biased in favour of keeping guile small and portable. :-) Anyway, I'll let the guile maintainers make a call here. >> > API that can read/write JSON to/from a port using only Scheme data >> > types that have read syntax (i.e. no hash tables like guile-json). >> >> guile-json does not use hash tables. guile-json uses association lists. >> I believe the decision to use hash tables was reversed many years ago. > > Heh, shows how long it's been since I last used guile-json. Glad > that's changed. :) And, the reversal of that decision was thanks to your objection, as a matter of fact! :-) https://yhetil.org/guile-user/CAJ=RwfZgUg6-bCoYJR-v48ywCROEMu94NrSGrNKjzJB0foCfTQ <at> mail.gmail.com/ https://yhetil.org/guile-user/CA+XASoVmEy1Rt7w=pM21JkYL6dYJeGtW0_HjpnUa_eV5pGcvkQ <at> mail.gmail.com/ Cheers! Arun
bug-guile <at> gnu.org
:bug#77762
; Package guile
.
(Mon, 14 Apr 2025 06:32:02 GMT) Full text and rfc822 format available.Message #23 received at 77762 <at> debbugs.gnu.org (full text, mbox):
From: Maxim Cournoyer <maxim.cournoyer <at> gmail.com> To: "Thompson, David" <dthompson2 <at> worcester.edu> Cc: 77762 <at> debbugs.gnu.org Subject: Re: bug#77762: [PATCH] web: Add JSON module. Date: Mon, 14 Apr 2025 15:30:55 +0900
Hi David, "Thompson, David" <dthompson2 <at> worcester.edu> writes: > Attached is a patch that adds a new (web json) module. Some may > remember that I submitted a patch back in 2015 (time flies, eh?) for > an (ice-9 json) module that never made it in. Well, 10 years is a long > time and Guile still doesn't have a built-in JSON module. Third party > libraries like guile-json and guile-sjson are available, the latter > being an adaptation of my original patch and the former remaining the > go-to library used by larger Guile projects like Guix. There's also > SRFI-180 (which sounds like a cool surfing trick!) which was published > in 2020 but the API is, in my opinion, overly complicated due to > generators and other things. Anyway, JSON continues to be *the* data > interchange format of the web and Guile really ought to have a simple > API that can read/write JSON to/from a port using only Scheme data > types that have read syntax (i.e. no hash tables like guile-json). > This minimal, practical API is what my patch provides. I've tried my > best to make it as efficient as possible. > > I've settled on the following JSON<->Scheme data type mapping which is > nearly identical to SRFI-180 with the exception of object keys: > > - true and false are #t and #f > - null is the symbol 'null > - numbers are either exact integers (fixnums and bignums) or inexact > reals (flonums, NaNs and infinities excluded) > - strings are strings > - arrays are vectors > - objects are association lists with string keys (SRFI-180 chose > symbols but JSON uses strings so strings feel the most honest) > > Thanks in advance for the review, First of all, let me say thank you for working on that! I agree that this would be most welcome in core Guile, for the reasons you mention. [...] > +@example > +@verbatim > +{ > + "name": "Eva Luator", > + "age": 24, > + "schemer": true, > + "hobbies": [ > + "hacking", > + "cycling", > + "surfing" > + ] > +} > +@end verbatim > +@end example > + > +can be represented with the following Scheme expression: > + > +@example > +@verbatim > +'(("name" . "Eva Luator") > + ("age" . 24) > + ("schemer" . #t) > + ("hobbies" . #("hacking" "cycling" "surfing"))) > +@end verbatim > +@end example Is there particular reason for using vectors instead of plain list to represent JSON arrays? The later would be more idiomatic unless there are technical reasons (perhaps performance?). > +Strings, exact integers, inexact reals (excluding NaNs and infinities), > +@code{#t}, @code{#f}, the symbol @code{null}, vectors, and association > +lists may be serialized as JSON. Association lists serialize as JSON > +objects and vectors serialize as JSON arrays. The keys of association > +lists @emph{must} be strings. > + > +@deffn {Scheme Procedure} read-json [port] > + > +Parse a JSON-encoded value from @var{port} and return its Scheme > +representation. If @var{port} is unspecified, the current input port is > +used. > + > +@example > +@verbatim > +(call-with-input-string "[true,false,null,42,\"foo\"]" read-json) > +;; => #(#t #f null 42 "foo") > + > +(call-with-input-string "{\"foo\":1,\"bar\":2}" read-json) > +;; => (("foo" . 1) ("bar" . 2)) > +@end verbatim > +@end example > + > +@end deffn > + > +@deftp {Exception Type} &json-read-error > +An exception type denoting JSON read errors. > +@end deftp > > +@deffn {Scheme Procedure} write-json exp [port] > + > +Serialize the expression @var{exp} as JSON-encoded text to @var{port}. > +If @var{port} is unspecified, the current output port is used. > + > +@example > +@verbatim > +(with-output-to-string (lambda () (write-json #(#t #f null 42 "foo")))) > +;; => "[true,false,null,42,\"foo\"]" > + > +(with-output-to-string (lambda () (write-json '(("foo" . 1) ("bar" . 2))))) > +;; => "{\"foo\":1,\"bar\":2}" > +@end verbatim > +@end example > + > +@end deffn > + > +@deftp {Exception Type} &json-write-error > +An exception type denoting JSON write errors. > +@end deftp I think it could be a bit nicer if the deffn of read-json and write-json explicitly mentioned that upon error an exception of type X is raised. > + > @node Web Client > @subsection Web Client > > diff --git a/module/web/json.scm b/module/web/json.scm > new file mode 100644 > index 000000000..41aac0e90 > --- /dev/null > +++ b/module/web/json.scm > @@ -0,0 +1,308 @@ > +;;;; json.scm --- JSON reader/writer (ECMA-404) > +;;;; Copyright (C) 2025 Free Software Foundation, Inc. > +;;;; > +;;;; This library is free software; you can redistribute it and/or > +;;;; modify it under the terms of the GNU Lesser General Public > +;;;; License as published by the Free Software Foundation; either > +;;;; version 3 of the License, or (at your option) any later version. > +;;;; > +;;;; This library is distributed in the hope that it will be useful, > +;;;; but WITHOUT ANY WARRANTY; without even the implied warranty of > +;;;; MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU > +;;;; Lesser General Public License for more details. > +;;;; > +;;;; You should have received a copy of the GNU Lesser General Public > +;;;; License along with this library; if not, write to the Free Software > +;;;; Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA The FSF has gone office-less, so the above address is now incorrect [0]. The up-to-date template for the copyright notice (header) reads [1]: --8<---------------cut here---------------start------------->8--- This program is free software: you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation, either version 3 of the License, or (at your option) any later version. This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details. You should have received a copy of the GNU General Public License along with this program. If not, see <https://www.gnu.org/licenses/>. --8<---------------cut here---------------end--------------->8--- [0] https://www.fsf.org/blogs/community/fsf-office-closing-party [1] https://www.gnu.org/licenses/gpl-howto.html > + > +(define-module (web json) > + #:use-module (ice-9 exceptions) > + #:use-module (ice-9 match) > + #:use-module (ice-9 textual-ports) > + #:export (&json-read-error > + read-json > + > + &json-write-error > + write-json)) > + > +(define-exception-type &json-read-error &error > + make-json-read-error > + json-read-error?) > + > +(define* (read-json #:optional (port (current-input-port))) > + "Parse a JSON-encoded value from @var{port} and return its Scheme > +representation. If @var{port} is unspecified, the current input port is > +used." > + (define (fail message) > + (raise-exception > + (make-exception (make-json-read-error) > + (make-exception-with-origin 'read-json) > + (make-exception-with-message message) > + (make-exception-with-irritants (list port))))) Hm, I wonder what (list port) looks like in the irritants when the exception is reported; is it useful? Shouldn't it show instead the problematic value? > + (define (consume-whitespace) > + (case (peek-char port) > + ((#\space #\tab #\return #\newline) Should a match + ((? char-whitespace?)) predicate pattern be used here instead, or similar? Or perhaps the above is faster and more self-contained, which can be a good thing. > + (read-char port) > + (consume-whitespace)) > + (else (values)))) > + (define-syntax-rule (define-keyword-reader name str val) > + (define (name) > + (if (string=? (get-string-n port (string-length str)) str) > + val > + (fail "invalid keyword")))) > + (define-keyword-reader read-true "true" #t) > + (define-keyword-reader read-false "false" #f) > + (define-keyword-reader read-null "null" 'null) > + (define (read-hex-digit) > + (case (peek-char port) > + ((#\0 #\1 #\2 #\3 #\4 #\5 #\6 #\7 #\8 #\9) > + (- (char->integer (read-char port)) (char->integer #\0))) > + ((#\a #\b #\c #\d #\e #\f) > + (+ 10 (- (char->integer (read-char port)) (char->integer #\a)))) > + ((#\A #\B #\C #\D #\E #\F) > + (+ 10 (- (char->integer (read-char port)) (char->integer #\A)))) > + (else (fail "invalid hex digit")))) > + (define (read-utf16-character) > + (let* ((a (read-hex-digit)) > + (b (read-hex-digit)) > + (c (read-hex-digit)) > + (d (read-hex-digit))) > + (integer->char (+ (* a (expt 16 3)) (* b (expt 16 2)) (* c 16) d)))) > + (define (read-escape-character) > + (case (read-char port) > + ((#\") #\") > + ((#\\) #\\) > + ((#\/) #\/) > + ((#\b) #\backspace) > + ((#\f) #\page) > + ((#\n) #\newline) > + ((#\r) #\return) > + ((#\t) #\tab) > + ((#\u) (read-utf16-character)) > + (else (fail "invalid escape character")))) > + (define (read-string) > + (read-char port) > + (list->string > + (let lp () > + (match (read-char port) > + ((? eof-object?) (fail "EOF while reading string")) > + (#\" '()) > + (#\\ (cons (read-escape-character) (lp))) > + (char (cons char (lp))))))) > + (define (read-digit-maybe) > + (case (peek-char port) > + ((#\0 #\1 #\2 #\3 #\4 #\5 #\6 #\7 #\8 #\9) > + (- (char->integer (read-char port)) > + (char->integer #\0))) > + (else #f))) > + (define (read-integer) > + (let ((x (read-digit-maybe))) > + (and x > + (let lp ((x x)) > + (match (read-digit-maybe) > + (#f x) > + (y (lp (+ (* x 10) y)))))))) Perhaps the above should be named read-integer-maybe, since it may return #f? > + (define (read-fraction) > + (case (peek-char port) > + ((#\.) > + (read-char port) > + (let lp ((mag 10)) > + (let ((n (read-digit-maybe))) > + (if n (+ (/ n mag) (lp (* mag 10))) 0)))) > + (else 0))) Should the above be named 'read-decimal' ? Does a decimal number in JSON always start with '.' and not with 0. ? I was a bit puzzled on what 'mag' may mean here, I guess 'magnitude' although there doesn't appear to have a clear terminology for it. > + (define (read-exponent) > + (case (peek-char port) > + ((#\e #\E) > + (read-char port) > + (case (peek-char port) > + ((#\-) > + (read-char port) > + (expt 10 (- (read-integer)))) > + ((#\+) > + (read-char port) > + (expt 10 (read-integer))) > + (else > + (expt 10 (read-integer))))) > + (else 1))) > + (define (read-positive-number) > + (let ((n (read-integer))) > + (and n > + (let* ((f (read-fraction)) > + (e (read-exponent)) > + (x (* (+ n f) e))) > + (if (exact-integer? x) x (exact->inexact x)))))) This may return #f. Should it fail instead, or be named read-positive-number-maybe ? > + (define (read-negative-number) > + (read-char port) > + (let ((x (read-positive-number))) > + (if x (- x) (fail "invalid number")))) Not symmetrical with the above: this one would fail if an integer couldn't be read in read-positive-number. > + (define (read-leading-zero-number) > + (read-char port) > + (case (peek-char port) > + ;; Extraneous zeroes are not allowed. A single leading zero > + ;; can only be followed by a decimal point. > + ((#\0 #\1 #\2 #\3 #\4 #\5 #\6 #\7 #\8 #\9 #\e #\E) > + (fail "extraneous leading zero")) Why not check for (not #\.) explicitly? That'd be clearer and would cover all cases (even crazy unexpected ones). > + ;; Fractional number. > + ((#\.) > + (let* ((d (read-fraction)) > + (e (read-exponent))) > + (exact->inexact (* d e)))) > + ;; Just plain zero. > + (else 0))) > + (define (read-key+value-pair) > + (let ((key (read-string))) > + (consume-whitespace) > + (case (read-char port) > + ((#\:) > + (consume-whitespace) > + (cons key (read-value))) > + (else (fail "invalid key/value pair delimiter"))))) > + (define (read-object) > + (read-char port) > + (consume-whitespace) > + (case (peek-char port) > + ;; Empty object. > + ((#\}) > + (read-char port) > + '()) > + (else > + ;; Read first key/value pair, then all subsequent pairs delimited > + ;; by commas. > + (cons (read-key+value-pair) > + (let lp () > + (consume-whitespace) > + (case (peek-char port) > + ((#\,) > + (read-char port) > + (consume-whitespace) > + (cons (read-key+value-pair) (lp))) > + ;; End of object. > + ((#\}) > + (read-char port) > + '()) > + (else (fail "invalid object delimiter")))))))) > + (define (read-array) > + (read-char port) > + (consume-whitespace) > + (case (peek-char port) > + ;; Empty array. > + ((#\]) > + (read-char port) > + #()) > + (else > + (list->vector As mentioned above, just a plain list would be more Schemey, no? What does the vector type buys us? A user wanting a vector could always call list->vector themselves, and otherwise we save some computation. > + ;; Read the first element, then all subsequent elements > + ;; delimited by commas. > + (cons (read-value) > + (let lp () > + (consume-whitespace) > + (case (peek-char port) > + ;; Elements are comma delimited. > + ((#\,) > + (read-char port) > + (consume-whitespace) > + (cons (read-value) (lp))) > + ;; End of array. > + ((#\]) > + (read-char port) > + '()) > + (else (fail "invalid array delimiter"))))))))) > + (define (read-value) > + (consume-whitespace) > + (case (peek-char port) > + ((#\") (read-string)) > + ((#\{) (read-object)) > + ((#\[) (read-array)) > + ((#\t) (read-true)) > + ((#\f) (read-false)) > + ((#\n) (read-null)) > + ((#\-) (read-negative-number)) > + ((#\0) (read-leading-zero-number)) > + ((#\1 #\2 #\3 #\4 #\5 #\6 #\7 #\8 #\9) (read-positive-number)) > + (else (fail "invalid value")))) > + (read-value)) > + > +(define-exception-type &json-write-error &error > + make-json-write-error > + json-write-error?) > + > +(define* (write-json exp #:optional (port (current-output-port))) > + "Serialize the expression @var{exp} as JSON-encoded text to @var{port}. > +If @var{port} is unspecified, the current output port is used." > + (define (fail message x) > + (raise-exception > + (make-exception (make-json-write-error) > + (make-exception-with-origin 'write-json) > + (make-exception-with-message message) > + (make-exception-with-irritants (list x))))) > + (define (write-char/escape char) > + (match char > + (#\" (put-string port "\\\"")) > + (#\\ (put-string port "\\\\")) > + (#\/ (put-string port "\\/")) > + (#\backspace (put-string port "\\b")) > + (#\page (put-string port "\\f")) > + (#\newline (put-string port "\\n")) > + (#\return (put-string port "\\r")) > + (#\tab (put-string port "\\t")) > + (_ (put-char port char)))) > + (define (write-string str) > + (let ((in (open-input-string str))) Looks like the above 'in' binding is not used. > + (put-char port #\") > + (string-for-each write-char/escape str) > + (put-char port #\"))) > + (define (write-pair x) > + (match x > + (((? string? key) . value) > + (write-string key) > + (put-char port #\:) > + (write-value value)) > + (_ (fail "invalid key/value pair" x)))) > + (define (write-object obj) > + (put-char port #\{) > + (match obj > + ((head . rest) > + (write-pair head) > + (let lp ((obj rest)) > + (match obj > + (() (values)) Any reason to return (values) instead of some dummy #t to denote 'no-op' ?. > + ((head . rest) > + (put-char port #\,) > + (write-pair head) > + (lp rest)) > + (_ (fail "invalid object" obj)))))) > + (put-char port #\})) > + (define (write-array v) > + (put-char port #\[) > + (match (vector-length v) > + (0 (values)) > + (n > + (write-value (vector-ref v 0)) > + (do ((i 1 (1+ i))) > + ((= i n)) > + (put-char port #\,) > + (write-value (vector-ref v i))))) I suppose the above is more efficient than a for-each loop? I'd be curious to see it profiled, if you still have data. At least now I see than for > 100k, vector-ref is faster than list-ref, which probably explains why you went with vectors (could still be an implementation detail with the list->vector call left in the writer though, in my opinion). > + (put-char port #\])) > + (define (write-number x) > + (if (or (exact-integer? x) > + (and (real? x) > + (inexact? x) > + ;; NaNs and infinities are not allowed. > + (not (or (nan? x) (inf? x))))) > + ;; Scheme's string representations of exact integers and floats > + ;; are compatible with JSON. > + (put-string port (number->string x)) > + (fail "invalid number" x))) > + (define (write-value x) > + (match x > + (#t (put-string port "true")) > + (#f (put-string port "false")) > + ('null (put-string port "null")) > + (() (put-string port "{}")) > + ((? pair?) (write-object x)) > + ((? vector?) (write-array x)) > + ((? string?) (write-string x)) > + ((? number?) (write-number x)) > + (_ (fail "invalid value" x)))) > + (write-value exp)) Phew. That's a pretty low-level parser! I hope it's fast, otherwise it seems it'd be more concise/fun/maintainable to devise a PEG-based one, which appears to be doable for JSON, from what I've read. Perhaps sprinkle with a few performance-related comments where such concerns impacted the design choices, so that we can remember and retest/reverify these in the future when Guile evolves. > diff --git a/test-suite/Makefile.am b/test-suite/Makefile.am > index 6014b1f1f..00afea142 100644 > --- a/test-suite/Makefile.am > +++ b/test-suite/Makefile.am > @@ -73,6 +73,7 @@ SCM_TESTS = tests/00-initial-env.test \ > tests/iconv.test \ > tests/import.test \ > tests/interp.test \ > + tests/json.test \ > tests/keywords.test \ > tests/list.test \ > tests/load.test \ > diff --git a/test-suite/tests/json.test b/test-suite/tests/json.test > new file mode 100644 > index 000000000..f92eeccec > --- /dev/null > +++ b/test-suite/tests/json.test > @@ -0,0 +1,154 @@ > +;;;; json.test --- test JSON reader/writer -*- scheme -*- > +;;;; > +;;;; Copyright (C) 2015 Free Software Foundation, Inc. > +;;;; > +;;;; This library is free software; you can redistribute it and/or > +;;;; modify it under the terms of the GNU Lesser General Public > +;;;; License as published by the Free Software Foundation; either > +;;;; version 3 of the License, or (at your option) any later version. > +;;;; > +;;;; This library is distributed in the hope that it will be useful, > +;;;; but WITHOUT ANY WARRANTY; without even the implied warranty of > +;;;; MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU > +;;;; Lesser General Public License for more details. > +;;;; > +;;;; You should have received a copy of the GNU Lesser General Public > +;;;; License along with this library; if not, write to the Free Software > +;;;; Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA > + > +(define-module (test-suite test-json) > + #:use-module (test-suite lib) > + #:use-module (web json)) > + > +;;; > +;;; Reader > +;;; > + > +(define (read-json-string str) > + (call-with-input-string str read-json)) > + > +(define (json-read=? str x) > + (= x (read-json-string str))) > + > +(define (json-read-eq? str x) > + (eq? x (read-json-string str))) > + > +(define (json-read-equal? str x) > + (equal? x (read-json-string str))) > + > +(define (json-read-string=? str x) > + (string=? x (read-json-string str))) > + > +(with-test-prefix "read-json" > + ;; Keywords > + (pass-if (json-read-eq? "true" #t)) > + (pass-if (json-read-eq? "false" #f)) > + (pass-if (json-read-eq? "null" 'null)) > + ;; Numbers > + (pass-if (json-read=? "0" 0)) > + (pass-if (json-read=? "-0" 0)) > + (pass-if (json-read=? "0.0" 0.0)) > + (pass-if (json-read=? "-0.0" -0.0)) > + (pass-if (json-read=? "0.1" 0.1)) > + (pass-if (json-read=? "1.234" 1.234)) > + (pass-if (json-read=? "1" 1)) > + (pass-if (json-read=? "-1" -1)) > + (pass-if (json-read=? "1.1" 1.1)) > + (pass-if (json-read=? "1e2" 1e2)) > + (pass-if (json-read=? "1.1e2" 1.1e2)) > + (pass-if (json-read=? "1.1e-2" 1.1e-2)) > + (pass-if (json-read=? "1.1e+2" 1.1e2)) > + ;; Extraneous zeroes in fraction > + (pass-if (json-read=? "1.000" 1)) > + (pass-if (json-read=? "1.5000" 1.5)) > + ;; Extraneous zeroes in exponent > + (pass-if (json-read=? "1.1e000" 1.1)) > + (pass-if (json-read=? "1.1e-02" 1.1e-2)) > + (pass-if (json-read=? "1.1e+02" 1.1e2)) > + ;; Strings > + (pass-if (json-read-string=? "\"foo\"" "foo")) > + ;; Escape codes > + (pass-if (json-read-string=? "\"\\\"\"" "\"")) > + (pass-if (json-read-string=? "\"\\\\\"" "\\")) > + (pass-if (json-read-string=? "\"\\/\"" "/")) > + (pass-if (json-read-string=? "\"\\b\"" "\b")) > + (pass-if (json-read-string=? "\"\\f\"" "\f")) > + (pass-if (json-read-string=? "\"\\n\"" "\n")) > + (pass-if (json-read-string=? "\"\\r\"" "\r")) > + (pass-if (json-read-string=? "\"\\t\"" "\t")) > + ;; Unicode in hexadecimal format > + (pass-if (json-read-string=? "\"\\u12ab\"" "\u12ab")) > + ;; Objects > + (pass-if (json-read-equal? "{}" '())) > + (pass-if (json-read-equal? "{ \"foo\": \"bar\", \"baz\": \"frob\"}" > + '(("foo" . "bar") ("baz" . "frob")))) > + ;; Nested objects > + (pass-if (json-read-equal? "{\"foo\":{\"bar\":\"baz\"}}" > + '(("foo" . (("bar" . "baz")))))) > + ;; Arrays > + (pass-if (json-read-equal? "[]" #())) > + (pass-if (json-read-equal? "[1, 2, \"foo\"]" > + #(1 2 "foo"))) > + ;; Nested arrays > + (pass-if (json-read-equal? "[1, 2, [\"foo\", \"bar\"]]" > + #(1 2 #("foo" "bar")))) > + ;; Arrays and objects nested within each other > + (pass-if (json-read-equal? "{\"foo\":[{\"bar\":true},{\"baz\":[1,2,3]}]}" > + '(("foo" . #((("bar" . #t)) > + (("baz" . #(1 2 3)))))))) > + ;; Leading whitespace > + (pass-if (json-read-eq? "\t\r\n true" #t))) > +;;; > +;;; Writer > +;;; > + > +(define (write-json-string exp) > + (call-with-output-string > + (lambda (port) > + (write-json exp port)))) > + > +(define (json-write-string=? exp str) > + (string=? str (write-json-string exp))) > + > +(with-test-prefix "write-json" > + ;; Keywords > + (pass-if (json-write-string=? #t "true")) > + (pass-if (json-write-string=? #f "false")) > + (pass-if (json-write-string=? 'null "null")) > + ;; Numbers > + (pass-if (json-write-string=? 0 "0")) > + (pass-if (json-write-string=? 0.0 "0.0")) > + (pass-if (json-write-string=? 0.1 "0.1")) > + (pass-if (json-write-string=? 1 "1")) > + (pass-if (json-write-string=? -1 "-1")) > + (pass-if (json-write-string=? 1.1 "1.1")) > + ;; Strings > + (pass-if (json-write-string=? "foo" "\"foo\"")) > + ;; Escape codes > + (pass-if (json-write-string=? "\"" "\"\\\"\"")) > + (pass-if (json-write-string=? "\\" "\"\\\\\"")) > + (pass-if (json-write-string=? "/" "\"\\/\"")) > + (pass-if (json-write-string=? "\b" "\"\\b\"")) > + (pass-if (json-write-string=? "\f" "\"\\f\"")) > + (pass-if (json-write-string=? "\n" "\"\\n\"")) > + (pass-if (json-write-string=? "\r" "\"\\r\"")) > + (pass-if (json-write-string=? "\t" "\"\\t\"")) > + ;; Objects > + (pass-if (json-write-string=? '() "{}")) > + (pass-if (json-write-string=? '(("foo" . "bar") ("baz" . "frob")) > + "{\"foo\":\"bar\",\"baz\":\"frob\"}")) > + ;; Nested objects > + (pass-if (json-write-string=? '(("foo" . (("bar" . "baz")))) > + "{\"foo\":{\"bar\":\"baz\"}}")) > + ;; Arrays > + (pass-if (json-write-string=? #() "[]")) > + (pass-if (json-write-string=? #(1 2 "foo") > + "[1,2,\"foo\"]")) > + ;; Nested arrays > + (pass-if (json-write-string=? #(1 2 #("foo" "bar")) > + "[1,2,[\"foo\",\"bar\"]]")) > + ;; Arrays and objects nested in each other > + (pass-if (json-write-string=? '(("foo" . #((("bar" . #t)) > + (("baz" . #(1 2)))))) > + "{\"foo\":[{\"bar\":true},{\"baz\":[1,2]}]}"))) Neat. Nitpick: perhaps add a trailing '.' after each stand-alone comments, to follow existing conventions. I hope my armchair commentary is of some use :-). Thanks again for working on a JSON parser/writer for Guile. -- Thanks, Maxim
bug-guile <at> gnu.org
:bug#77762
; Package guile
.
(Mon, 14 Apr 2025 12:17:02 GMT) Full text and rfc822 format available.Message #26 received at 77762 <at> debbugs.gnu.org (full text, mbox):
From: "Thompson, David" <dthompson2 <at> worcester.edu> To: Maxim Cournoyer <maxim.cournoyer <at> gmail.com> Cc: 77762 <at> debbugs.gnu.org Subject: Re: bug#77762: [PATCH] web: Add JSON module. Date: Mon, 14 Apr 2025 08:16:13 -0400
[Message part 1 (text/plain, inline)]
Hi Maxim, Thanks for the review! On Mon, Apr 14, 2025 at 2:31 AM Maxim Cournoyer <maxim.cournoyer <at> gmail.com> wrote: > > > +@example > > +@verbatim > > +{ > > + "name": "Eva Luator", > > + "age": 24, > > + "schemer": true, > > + "hobbies": [ > > + "hacking", > > + "cycling", > > + "surfing" > > + ] > > +} > > +@end verbatim > > +@end example > > + > > +can be represented with the following Scheme expression: > > + > > +@example > > +@verbatim > > +'(("name" . "Eva Luator") > > + ("age" . 24) > > + ("schemer" . #t) > > + ("hobbies" . #("hacking" "cycling" "surfing"))) > > +@end verbatim > > +@end example > > Is there particular reason for using vectors instead of plain list to > represent JSON arrays? The later would be more idiomatic unless there > are technical reasons (perhaps performance?). Back in 2015 I chose lists out of convenience because lists are such a fundamental data structure in Scheme. It is very tempting! However, there are a couple of issues: 1) O(n) access time for lists is not great when there's a perfectly suitable O(1) data structure with read syntax sitting right there and 2) It creates ambiguity when lists are also used to represent objects. My previous patch used the symbol '@ as a sentinel to mark objects, but it was kinda gross and I grew to dislike it more and more over the years. SRFI-180 uses vectors, Racket's JSON library uses vectors, etc. and I think they made the right call. > > +Strings, exact integers, inexact reals (excluding NaNs and infinities), > > +@code{#t}, @code{#f}, the symbol @code{null}, vectors, and association > > +lists may be serialized as JSON. Association lists serialize as JSON > > +objects and vectors serialize as JSON arrays. The keys of association > > +lists @emph{must} be strings. > > + > > +@deffn {Scheme Procedure} read-json [port] > > + > > +Parse a JSON-encoded value from @var{port} and return its Scheme > > +representation. If @var{port} is unspecified, the current input port is > > +used. > > + > > +@example > > +@verbatim > > +(call-with-input-string "[true,false,null,42,\"foo\"]" read-json) > > +;; => #(#t #f null 42 "foo") > > + > > +(call-with-input-string "{\"foo\":1,\"bar\":2}" read-json) > > +;; => (("foo" . 1) ("bar" . 2)) > > +@end verbatim > > +@end example > > + > > +@end deffn > > + > > +@deftp {Exception Type} &json-read-error > > +An exception type denoting JSON read errors. > > +@end deftp > > > > +@deffn {Scheme Procedure} write-json exp [port] > > + > > +Serialize the expression @var{exp} as JSON-encoded text to @var{port}. > > +If @var{port} is unspecified, the current output port is used. > > + > > +@example > > +@verbatim > > +(with-output-to-string (lambda () (write-json #(#t #f null 42 "foo")))) > > +;; => "[true,false,null,42,\"foo\"]" > > + > > +(with-output-to-string (lambda () (write-json '(("foo" . 1) ("bar" . 2))))) > > +;; => "{\"foo\":1,\"bar\":2}" > > +@end verbatim > > +@end example > > + > > +@end deffn > > + > > +@deftp {Exception Type} &json-write-error > > +An exception type denoting JSON write errors. > > +@end deftp > > I think it could be a bit nicer if the deffn of read-json and write-json > explicitly mentioned that upon error an exception of type X is raised. Sure, makes sense to mention that. > > + > > @node Web Client > > @subsection Web Client > > > > diff --git a/module/web/json.scm b/module/web/json.scm > > new file mode 100644 > > index 000000000..41aac0e90 > > --- /dev/null > > +++ b/module/web/json.scm > > @@ -0,0 +1,308 @@ > > +;;;; json.scm --- JSON reader/writer (ECMA-404) > > +;;;; Copyright (C) 2025 Free Software Foundation, Inc. > > +;;;; > > +;;;; This library is free software; you can redistribute it and/or > > +;;;; modify it under the terms of the GNU Lesser General Public > > +;;;; License as published by the Free Software Foundation; either > > +;;;; version 3 of the License, or (at your option) any later version. > > +;;;; > > +;;;; This library is distributed in the hope that it will be useful, > > +;;;; but WITHOUT ANY WARRANTY; without even the implied warranty of > > +;;;; MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU > > +;;;; Lesser General Public License for more details. > > +;;;; > > +;;;; You should have received a copy of the GNU Lesser General Public > > +;;;; License along with this library; if not, write to the Free Software > > +;;;; Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA > > The FSF has gone office-less, so the above address is now incorrect [0]. > The up-to-date template for the copyright notice (header) reads [1]: > > --8<---------------cut here---------------start------------->8--- > This program is free software: you can redistribute it and/or modify > it under the terms of the GNU General Public License as published by > the Free Software Foundation, either version 3 of the License, or > (at your option) any later version. > > This program is distributed in the hope that it will be useful, but > WITHOUT ANY WARRANTY; without even the implied warranty of > MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU > General Public License for more details. > > You should have received a copy of the GNU General Public License > along with this program. If not, see > <https://www.gnu.org/licenses/>. > --8<---------------cut here---------------end--------------->8--- > > [0] https://www.fsf.org/blogs/community/fsf-office-closing-party > [1] https://www.gnu.org/licenses/gpl-howto.html Ah, right. Good catch. I knew about this change but wasn't thinking about the copyright header at the time. I even walked through Franklin St. recently and thought about the FSF no longer having an office. I always thought it was a bit silly to include a mailing address in a license header because offices are not forever. > > + > > +(define-module (web json) > > + #:use-module (ice-9 exceptions) > > + #:use-module (ice-9 match) > > + #:use-module (ice-9 textual-ports) > > + #:export (&json-read-error > > + read-json > > + > > + &json-write-error > > + write-json)) > > + > > +(define-exception-type &json-read-error &error > > + make-json-read-error > > + json-read-error?) > > + > > +(define* (read-json #:optional (port (current-input-port))) > > + "Parse a JSON-encoded value from @var{port} and return its Scheme > > +representation. If @var{port} is unspecified, the current input port is > > +used." > > + (define (fail message) > > + (raise-exception > > + (make-exception (make-json-read-error) > > + (make-exception-with-origin 'read-json) > > + (make-exception-with-message message) > > + (make-exception-with-irritants (list port))))) > > Hm, I wonder what (list port) looks like in the irritants when the > exception is reported; is it useful? Shouldn't it show instead the > problematic value? By including the port in the irritants it allows the exception handler to use 'port-line' and 'port-column' to show where in the JSON document the error occurred (for ports with such information like file ports). > > + (define (consume-whitespace) > > + (case (peek-char port) > > + ((#\space #\tab #\return #\newline) > > Should a match + ((? char-whitespace?)) predicate pattern be used here > instead, or similar? Or perhaps the above is faster and more > self-contained, which can be a good thing. ECMA-404 states that these 4 characters are the only acceptable whitespace characters: "Whitespace is any sequence of one or more of the following code points: character tabulation (U+0009), line feed (U+000A), carriage return (U+000D), and space (U+0020)." > > + (read-char port) > > + (consume-whitespace)) > > + (else (values)))) > > + (define-syntax-rule (define-keyword-reader name str val) > > + (define (name) > > + (if (string=? (get-string-n port (string-length str)) str) > > + val > > + (fail "invalid keyword")))) > > + (define-keyword-reader read-true "true" #t) > > + (define-keyword-reader read-false "false" #f) > > + (define-keyword-reader read-null "null" 'null) > > + (define (read-hex-digit) > > + (case (peek-char port) > > + ((#\0 #\1 #\2 #\3 #\4 #\5 #\6 #\7 #\8 #\9) > > + (- (char->integer (read-char port)) (char->integer #\0))) > > + ((#\a #\b #\c #\d #\e #\f) > > + (+ 10 (- (char->integer (read-char port)) (char->integer #\a)))) > > + ((#\A #\B #\C #\D #\E #\F) > > + (+ 10 (- (char->integer (read-char port)) (char->integer #\A)))) > > + (else (fail "invalid hex digit")))) > > + (define (read-utf16-character) > > + (let* ((a (read-hex-digit)) > > + (b (read-hex-digit)) > > + (c (read-hex-digit)) > > + (d (read-hex-digit))) > > + (integer->char (+ (* a (expt 16 3)) (* b (expt 16 2)) (* c 16) d)))) > > + (define (read-escape-character) > > + (case (read-char port) > > + ((#\") #\") > > + ((#\\) #\\) > > + ((#\/) #\/) > > + ((#\b) #\backspace) > > + ((#\f) #\page) > > + ((#\n) #\newline) > > + ((#\r) #\return) > > + ((#\t) #\tab) > > + ((#\u) (read-utf16-character)) > > + (else (fail "invalid escape character")))) > > + (define (read-string) > > + (read-char port) > > + (list->string > > + (let lp () > > + (match (read-char port) > > + ((? eof-object?) (fail "EOF while reading string")) > > + (#\" '()) > > + (#\\ (cons (read-escape-character) (lp))) > > + (char (cons char (lp))))))) > > + (define (read-digit-maybe) > > + (case (peek-char port) > > + ((#\0 #\1 #\2 #\3 #\4 #\5 #\6 #\7 #\8 #\9) > > + (- (char->integer (read-char port)) > > + (char->integer #\0))) > > + (else #f))) > > + (define (read-integer) > > + (let ((x (read-digit-maybe))) > > + (and x > > + (let lp ((x x)) > > + (match (read-digit-maybe) > > + (#f x) > > + (y (lp (+ (* x 10) y)))))))) > > Perhaps the above should be named read-integer-maybe, since it may > return #f? Yeah, good idea. > > + (define (read-fraction) > > + (case (peek-char port) > > + ((#\.) > > + (read-char port) > > + (let lp ((mag 10)) > > + (let ((n (read-digit-maybe))) > > + (if n (+ (/ n mag) (lp (* mag 10))) 0)))) > > + (else 0))) > > Should the above be named 'read-decimal' ? Does a decimal number in > JSON always start with '.' and not with 0. ? I was a bit puzzled on > what 'mag' may mean here, I guess 'magnitude' although there doesn't > appear to have a clear terminology for it. It's called 'read-fraction' because it reads the fractional part of the number. I considered 'read-decimal' but I think it's less descriptive. This procedure is called from two places, after the respective caller has already read the digits before the decimal point. 'mag' does indeed mean 'magnitude', as each digit read increases the order of magnitude of the resulting number. I borrowed this name from Hoot's 'read' implementation, which means it's probably in Guile's pure-Scheme 'read' implementation as well. > > + (define (read-exponent) > > + (case (peek-char port) > > + ((#\e #\E) > > + (read-char port) > > + (case (peek-char port) > > + ((#\-) > > + (read-char port) > > + (expt 10 (- (read-integer)))) > > + ((#\+) > > + (read-char port) > > + (expt 10 (read-integer))) > > + (else > > + (expt 10 (read-integer))))) > > + (else 1))) > > + (define (read-positive-number) > > + (let ((n (read-integer))) > > + (and n > > + (let* ((f (read-fraction)) > > + (e (read-exponent)) > > + (x (* (+ n f) e))) > > + (if (exact-integer? x) x (exact->inexact x)))))) > > This may return #f. Should it fail instead, or be named > read-positive-number-maybe ? Adding 'maybe' to the name is good. It's called from two places. One caller knows that it's not going to return #f because it has peeked a valid digit from the port beforehand, the other does a check and fails upon #f. > > + (define (read-negative-number) > > + (read-char port) > > + (let ((x (read-positive-number))) > > + (if x (- x) (fail "invalid number")))) > > Not symmetrical with the above: this one would fail if an integer > couldn't be read in read-positive-number. This is the intended behavior to ensure that '-' is not parsed as a valid number. There needs to be at least one digit. > > + (define (read-leading-zero-number) > > + (read-char port) > > + (case (peek-char port) > > + ;; Extraneous zeroes are not allowed. A single leading zero > > + ;; can only be followed by a decimal point. > > + ((#\0 #\1 #\2 #\3 #\4 #\5 #\6 #\7 #\8 #\9 #\e #\E) > > + (fail "extraneous leading zero")) > > Why not check for (not #\.) explicitly? That'd be clearer and would > cover all cases (even crazy unexpected ones). The following clause checks for '.'. We need to distinguish between 3 types of input here: - invalid extraneous zeroes like '09' (ECMA-404 says this is not allowed) - fractional notation like '0.123' - plain '0' If I leave out this clause, we can no longer distinguish errors from plain zeroes. There's likely other ways to express it but this feels straightforward enough. However, it was a great idea to take another look at this code because there is a bug! '0e3' is a valid JSON number that this code rejects. It should parse to 0. My updated patch fixes this and adds a test case. > > + ;; Fractional number. > > + ((#\.) > > + (let* ((d (read-fraction)) > > + (e (read-exponent))) > > + (exact->inexact (* d e)))) > > + ;; Just plain zero. > > + (else 0))) > > + (define (read-key+value-pair) > > + (let ((key (read-string))) > > + (consume-whitespace) > > + (case (read-char port) > > + ((#\:) > > + (consume-whitespace) > > + (cons key (read-value))) > > + (else (fail "invalid key/value pair delimiter"))))) > > + (define (read-object) > > + (read-char port) > > + (consume-whitespace) > > + (case (peek-char port) > > + ;; Empty object. > > + ((#\}) > > + (read-char port) > > + '()) > > + (else > > + ;; Read first key/value pair, then all subsequent pairs delimited > > + ;; by commas. > > + (cons (read-key+value-pair) > > + (let lp () > > + (consume-whitespace) > > + (case (peek-char port) > > + ((#\,) > > + (read-char port) > > + (consume-whitespace) > > + (cons (read-key+value-pair) (lp))) > > + ;; End of object. > > + ((#\}) > > + (read-char port) > > + '()) > > + (else (fail "invalid object delimiter")))))))) > > + (define (read-array) > > + (read-char port) > > + (consume-whitespace) > > + (case (peek-char port) > > + ;; Empty array. > > + ((#\]) > > + (read-char port) > > + #()) > > + (else > > + (list->vector > > As mentioned above, just a plain list would be more Schemey, no? What > does the vector type buys us? A user wanting a vector could always call > list->vector themselves, and otherwise we save some computation. As stated above, vectors are the more natural analog to JSON arrays and being distinct from pairs they resolve some ambiguity in the Scheme representation that would exist otherwise. > > + ;; Read the first element, then all subsequent elements > > + ;; delimited by commas. > > + (cons (read-value) > > + (let lp () > > + (consume-whitespace) > > + (case (peek-char port) > > + ;; Elements are comma delimited. > > + ((#\,) > > + (read-char port) > > + (consume-whitespace) > > + (cons (read-value) (lp))) > > + ;; End of array. > > + ((#\]) > > + (read-char port) > > + '()) > > + (else (fail "invalid array delimiter"))))))))) > > + (define (read-value) > > + (consume-whitespace) > > + (case (peek-char port) > > + ((#\") (read-string)) > > + ((#\{) (read-object)) > > + ((#\[) (read-array)) > > + ((#\t) (read-true)) > > + ((#\f) (read-false)) > > + ((#\n) (read-null)) > > + ((#\-) (read-negative-number)) > > + ((#\0) (read-leading-zero-number)) > > + ((#\1 #\2 #\3 #\4 #\5 #\6 #\7 #\8 #\9) (read-positive-number)) > > + (else (fail "invalid value")))) > > + (read-value)) > > + > > +(define-exception-type &json-write-error &error > > + make-json-write-error > > + json-write-error?) > > + > > +(define* (write-json exp #:optional (port (current-output-port))) > > + "Serialize the expression @var{exp} as JSON-encoded text to @var{port}. > > +If @var{port} is unspecified, the current output port is used." > > + (define (fail message x) > > + (raise-exception > > + (make-exception (make-json-write-error) > > + (make-exception-with-origin 'write-json) > > + (make-exception-with-message message) > > + (make-exception-with-irritants (list x))))) > > + (define (write-char/escape char) > > + (match char > > + (#\" (put-string port "\\\"")) > > + (#\\ (put-string port "\\\\")) > > + (#\/ (put-string port "\\/")) > > + (#\backspace (put-string port "\\b")) > > + (#\page (put-string port "\\f")) > > + (#\newline (put-string port "\\n")) > > + (#\return (put-string port "\\r")) > > + (#\tab (put-string port "\\t")) > > + (_ (put-char port char)))) > > + (define (write-string str) > > + (let ((in (open-input-string str))) > > Looks like the above 'in' binding is not used. Vestigial code, my favorite. Good eye! Removed. > > + (put-char port #\") > > + (string-for-each write-char/escape str) > > + (put-char port #\"))) > > + (define (write-pair x) > > + (match x > > + (((? string? key) . value) > > + (write-string key) > > + (put-char port #\:) > > + (write-value value)) > > + (_ (fail "invalid key/value pair" x)))) > > + (define (write-object obj) > > + (put-char port #\{) > > + (match obj > > + ((head . rest) > > + (write-pair head) > > + (let lp ((obj rest)) > > + (match obj > > + (() (values)) > > Any reason to return (values) instead of some dummy #t to denote 'no-op' > ?. The loop is evaluated for effect and thus there is nothing to return so returning 0 values makes sense. This is something I picked up from Andy Wingo. > > + ((head . rest) > > + (put-char port #\,) > > + (write-pair head) > > + (lp rest)) > > + (_ (fail "invalid object" obj)))))) > > + (put-char port #\})) > > + (define (write-array v) > > + (put-char port #\[) > > + (match (vector-length v) > > + (0 (values)) > > + (n > > + (write-value (vector-ref v 0)) > > + (do ((i 1 (1+ i))) > > + ((= i n)) > > + (put-char port #\,) > > + (write-value (vector-ref v i))))) > > I suppose the above is more efficient than a for-each loop? I'd be > curious to see it profiled, if you still have data. At least now I see > than for > 100k, vector-ref is faster than list-ref, which probably > explains why you went with vectors (could still be an implementation > detail with the list->vector call left in the writer though, in my > opinion). Yes, it is more efficient because it avoids closure allocation. Also, I just don't see any reason to import (rnrs base) or (srfi srfi-43) to get vector-for-each when looping over a vector is trivial. I didn't profile anything. > > + (put-char port #\])) > > + (define (write-number x) > > + (if (or (exact-integer? x) > > + (and (real? x) > > + (inexact? x) > > + ;; NaNs and infinities are not allowed. > > + (not (or (nan? x) (inf? x))))) > > + ;; Scheme's string representations of exact integers and floats > > + ;; are compatible with JSON. > > + (put-string port (number->string x)) > > + (fail "invalid number" x))) > > + (define (write-value x) > > + (match x > > + (#t (put-string port "true")) > > + (#f (put-string port "false")) > > + ('null (put-string port "null")) > > + (() (put-string port "{}")) > > + ((? pair?) (write-object x)) > > + ((? vector?) (write-array x)) > > + ((? string?) (write-string x)) > > + ((? number?) (write-number x)) > > + (_ (fail "invalid value" x)))) > > + (write-value exp)) > > Phew. That's a pretty low-level parser! I hope it's fast, otherwise it > seems it'd be more concise/fun/maintainable to devise a PEG-based one, > which appears to be doable for JSON, from what I've read. Perhaps > sprinkle with a few performance-related comments where such concerns > impacted the design choices, so that we can remember and retest/reverify > these in the future when Guile evolves. JSON is a pretty simple format and thus I think a hand-rolled parser is appropriate. It's much simpler than 'read', anyway. I suppose it would be more concise, but "PEG parser" and "fun" do not go together for me. At ~300 lines of quite simple code (I did not go hog wild on macros or fancy abstractions nor did I sacrifice readable code for performance) I don't think there is much concern regarding maintenance. If someone wants to experiment to see how a PEG parser compares, though, feel free. > > diff --git a/test-suite/Makefile.am b/test-suite/Makefile.am > > index 6014b1f1f..00afea142 100644 > > --- a/test-suite/Makefile.am > > +++ b/test-suite/Makefile.am > > @@ -73,6 +73,7 @@ SCM_TESTS = tests/00-initial-env.test \ > > tests/iconv.test \ > > tests/import.test \ > > tests/interp.test \ > > + tests/json.test \ > > tests/keywords.test \ > > tests/list.test \ > > tests/load.test \ > > diff --git a/test-suite/tests/json.test b/test-suite/tests/json.test > > new file mode 100644 > > index 000000000..f92eeccec > > --- /dev/null > > +++ b/test-suite/tests/json.test > > @@ -0,0 +1,154 @@ > > +;;;; json.test --- test JSON reader/writer -*- scheme -*- > > +;;;; > > +;;;; Copyright (C) 2015 Free Software Foundation, Inc. > > +;;;; > > +;;;; This library is free software; you can redistribute it and/or > > +;;;; modify it under the terms of the GNU Lesser General Public > > +;;;; License as published by the Free Software Foundation; either > > +;;;; version 3 of the License, or (at your option) any later version. > > +;;;; > > +;;;; This library is distributed in the hope that it will be useful, > > +;;;; but WITHOUT ANY WARRANTY; without even the implied warranty of > > +;;;; MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU > > +;;;; Lesser General Public License for more details. > > +;;;; > > +;;;; You should have received a copy of the GNU Lesser General Public > > +;;;; License along with this library; if not, write to the Free Software > > +;;;; Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA > > + > > +(define-module (test-suite test-json) > > + #:use-module (test-suite lib) > > + #:use-module (web json)) > > + > > +;;; > > +;;; Reader > > +;;; > > + > > +(define (read-json-string str) > > + (call-with-input-string str read-json)) > > + > > +(define (json-read=? str x) > > + (= x (read-json-string str))) > > + > > +(define (json-read-eq? str x) > > + (eq? x (read-json-string str))) > > + > > +(define (json-read-equal? str x) > > + (equal? x (read-json-string str))) > > + > > +(define (json-read-string=? str x) > > + (string=? x (read-json-string str))) > > + > > +(with-test-prefix "read-json" > > + ;; Keywords > > + (pass-if (json-read-eq? "true" #t)) > > + (pass-if (json-read-eq? "false" #f)) > > + (pass-if (json-read-eq? "null" 'null)) > > + ;; Numbers > > + (pass-if (json-read=? "0" 0)) > > + (pass-if (json-read=? "-0" 0)) > > + (pass-if (json-read=? "0.0" 0.0)) > > + (pass-if (json-read=? "-0.0" -0.0)) > > + (pass-if (json-read=? "0.1" 0.1)) > > + (pass-if (json-read=? "1.234" 1.234)) > > + (pass-if (json-read=? "1" 1)) > > + (pass-if (json-read=? "-1" -1)) > > + (pass-if (json-read=? "1.1" 1.1)) > > + (pass-if (json-read=? "1e2" 1e2)) > > + (pass-if (json-read=? "1.1e2" 1.1e2)) > > + (pass-if (json-read=? "1.1e-2" 1.1e-2)) > > + (pass-if (json-read=? "1.1e+2" 1.1e2)) > > + ;; Extraneous zeroes in fraction > > + (pass-if (json-read=? "1.000" 1)) > > + (pass-if (json-read=? "1.5000" 1.5)) > > + ;; Extraneous zeroes in exponent > > + (pass-if (json-read=? "1.1e000" 1.1)) > > + (pass-if (json-read=? "1.1e-02" 1.1e-2)) > > + (pass-if (json-read=? "1.1e+02" 1.1e2)) > > + ;; Strings > > + (pass-if (json-read-string=? "\"foo\"" "foo")) > > + ;; Escape codes > > + (pass-if (json-read-string=? "\"\\\"\"" "\"")) > > + (pass-if (json-read-string=? "\"\\\\\"" "\\")) > > + (pass-if (json-read-string=? "\"\\/\"" "/")) > > + (pass-if (json-read-string=? "\"\\b\"" "\b")) > > + (pass-if (json-read-string=? "\"\\f\"" "\f")) > > + (pass-if (json-read-string=? "\"\\n\"" "\n")) > > + (pass-if (json-read-string=? "\"\\r\"" "\r")) > > + (pass-if (json-read-string=? "\"\\t\"" "\t")) > > + ;; Unicode in hexadecimal format > > + (pass-if (json-read-string=? "\"\\u12ab\"" "\u12ab")) > > + ;; Objects > > + (pass-if (json-read-equal? "{}" '())) > > + (pass-if (json-read-equal? "{ \"foo\": \"bar\", \"baz\": \"frob\"}" > > + '(("foo" . "bar") ("baz" . "frob")))) > > + ;; Nested objects > > + (pass-if (json-read-equal? "{\"foo\":{\"bar\":\"baz\"}}" > > + '(("foo" . (("bar" . "baz")))))) > > + ;; Arrays > > + (pass-if (json-read-equal? "[]" #())) > > + (pass-if (json-read-equal? "[1, 2, \"foo\"]" > > + #(1 2 "foo"))) > > + ;; Nested arrays > > + (pass-if (json-read-equal? "[1, 2, [\"foo\", \"bar\"]]" > > + #(1 2 #("foo" "bar")))) > > + ;; Arrays and objects nested within each other > > + (pass-if (json-read-equal? "{\"foo\":[{\"bar\":true},{\"baz\":[1,2,3]}]}" > > + '(("foo" . #((("bar" . #t)) > > + (("baz" . #(1 2 3)))))))) > > + ;; Leading whitespace > > + (pass-if (json-read-eq? "\t\r\n true" #t))) > > > +;;; > > +;;; Writer > > +;;; > > + > > +(define (write-json-string exp) > > + (call-with-output-string > > + (lambda (port) > > + (write-json exp port)))) > > + > > +(define (json-write-string=? exp str) > > + (string=? str (write-json-string exp))) > > + > > +(with-test-prefix "write-json" > > + ;; Keywords > > + (pass-if (json-write-string=? #t "true")) > > + (pass-if (json-write-string=? #f "false")) > > + (pass-if (json-write-string=? 'null "null")) > > + ;; Numbers > > + (pass-if (json-write-string=? 0 "0")) > > + (pass-if (json-write-string=? 0.0 "0.0")) > > + (pass-if (json-write-string=? 0.1 "0.1")) > > + (pass-if (json-write-string=? 1 "1")) > > + (pass-if (json-write-string=? -1 "-1")) > > + (pass-if (json-write-string=? 1.1 "1.1")) > > + ;; Strings > > + (pass-if (json-write-string=? "foo" "\"foo\"")) > > + ;; Escape codes > > + (pass-if (json-write-string=? "\"" "\"\\\"\"")) > > + (pass-if (json-write-string=? "\\" "\"\\\\\"")) > > + (pass-if (json-write-string=? "/" "\"\\/\"")) > > + (pass-if (json-write-string=? "\b" "\"\\b\"")) > > + (pass-if (json-write-string=? "\f" "\"\\f\"")) > > + (pass-if (json-write-string=? "\n" "\"\\n\"")) > > + (pass-if (json-write-string=? "\r" "\"\\r\"")) > > + (pass-if (json-write-string=? "\t" "\"\\t\"")) > > + ;; Objects > > + (pass-if (json-write-string=? '() "{}")) > > + (pass-if (json-write-string=? '(("foo" . "bar") ("baz" . "frob")) > > + "{\"foo\":\"bar\",\"baz\":\"frob\"}")) > > + ;; Nested objects > > + (pass-if (json-write-string=? '(("foo" . (("bar" . "baz")))) > > + "{\"foo\":{\"bar\":\"baz\"}}")) > > + ;; Arrays > > + (pass-if (json-write-string=? #() "[]")) > > + (pass-if (json-write-string=? #(1 2 "foo") > > + "[1,2,\"foo\"]")) > > + ;; Nested arrays > > + (pass-if (json-write-string=? #(1 2 #("foo" "bar")) > > + "[1,2,[\"foo\",\"bar\"]]")) > > + ;; Arrays and objects nested in each other > > + (pass-if (json-write-string=? '(("foo" . #((("bar" . #t)) > > + (("baz" . #(1 2)))))) > > + "{\"foo\":[{\"bar\":true},{\"baz\":[1,2]}]}"))) > > Neat. Nitpick: perhaps add a trailing '.' after each stand-alone > comments, to follow existing conventions. Sure thing. > I hope my armchair commentary is of some use :-). It was! Always nice to have another pair of eyes on some code you've stared at for too long to notice all the little issues that remain. Thank you! > Thanks again for working on a JSON parser/writer for Guile. :) Updated patch attached. - Dave
[0001-web-Add-JSON-module.patch (text/x-patch, attachment)]
bug-guile <at> gnu.org
:bug#77762
; Package guile
.
(Sun, 20 Apr 2025 13:49:05 GMT) Full text and rfc822 format available.Message #29 received at 77762 <at> debbugs.gnu.org (full text, mbox):
From: Maxim Cournoyer <maxim.cournoyer <at> gmail.com> To: "Thompson, David" <dthompson2 <at> worcester.edu> Cc: 77762 <at> debbugs.gnu.org Subject: Re: bug#77762: [PATCH] web: Add JSON module. Date: Sun, 20 Apr 2025 22:48:00 +0900
Hi David, "Thompson, David" <dthompson2 <at> worcester.edu> writes: > Hi Maxim, > > Thanks for the review! My pleasure! It's a small thing I can do to try to help nudge builtin JSON support in Guile, which I agree has its place in 2025. In general, I'd like Guile to compete a bit more with Python in the out-of-the-box library capabilities. Especially since outside of Guix installing Guile libraries is probably not that fun (I'd suspect many Guile libraries to be missing or outdated on various distributions), so a more useful core has even more value there. [...] >> Is there particular reason for using vectors instead of plain list to >> represent JSON arrays? The later would be more idiomatic unless there >> are technical reasons (perhaps performance?). > > Back in 2015 I chose lists out of convenience because lists are such a > fundamental data structure in Scheme. It is very tempting! However, > there are a couple of issues: 1) O(n) access time for lists is not > great when there's a perfectly suitable O(1) data structure with read > syntax sitting right there and 2) It creates ambiguity when lists are > also used to represent objects. My previous patch used the symbol '@ > as a sentinel to mark objects, but it was kinda gross and I grew to > dislike it more and more over the years. SRFI-180 uses vectors, > Racket's JSON library uses vectors, etc. and I think they made the > right call. I see. It makes sense, thanks for explaining. [...] >> Hm, I wonder what (list port) looks like in the irritants when the >> exception is reported; is it useful? Shouldn't it show instead the >> problematic value? > > By including the port in the irritants it allows the exception handler > to use 'port-line' and 'port-column' to show where in the JSON > document the error occurred (for ports with such information like file > ports). OK! >> > + (define (consume-whitespace) >> > + (case (peek-char port) >> > + ((#\space #\tab #\return #\newline) >> >> Should a match + ((? char-whitespace?)) predicate pattern be used here >> instead, or similar? Or perhaps the above is faster and more >> self-contained, which can be a good thing. > > ECMA-404 states that these 4 characters are the only acceptable > whitespace characters: > > "Whitespace is any sequence of one or more of the following code > points: character tabulation (U+0009), line feed (U+000A), carriage > return (U+000D), and space (U+0020)." Makes sense. [...] >> > + (define (read-fraction) >> > + (case (peek-char port) >> > + ((#\.) >> > + (read-char port) >> > + (let lp ((mag 10)) >> > + (let ((n (read-digit-maybe))) >> > + (if n (+ (/ n mag) (lp (* mag 10))) 0)))) >> > + (else 0))) >> >> Should the above be named 'read-decimal' ? Does a decimal number in >> JSON always start with '.' and not with 0. ? I was a bit puzzled on >> what 'mag' may mean here, I guess 'magnitude' although there doesn't >> appear to have a clear terminology for it. > > It's called 'read-fraction' because it reads the fractional part of > the number. I considered 'read-decimal' but I think it's less > descriptive. This procedure is called from two places, after the > respective caller has already read the digits before the decimal > point. 'mag' does indeed mean 'magnitude', as each digit read > increases the order of magnitude of the resulting number. I borrowed > this name from Hoot's 'read' implementation, which means it's probably > in Guile's pure-Scheme 'read' implementation as well. OK, thanks for explaining. I realized later in my review that these are very dependent on when they are called (they are very specific or narrow, more than their name might suggest). It's fine. [...] > The following clause checks for '.'. We need to distinguish between 3 > types of input here: > > - invalid extraneous zeroes like '09' (ECMA-404 says this is not allowed) > - fractional notation like '0.123' > - plain '0' > > If I leave out this clause, we can no longer distinguish errors from > plain zeroes. There's likely other ways to express it but this feels > straightforward enough. > > However, it was a great idea to take another look at this code because > there is a bug! '0e3' is a valid JSON number that this code rejects. > It should parse to 0. My updated patch fixes this and adds a test > case. Good catch! [...] >> > + (define (write-object obj) >> > + (put-char port #\{) >> > + (match obj >> > + ((head . rest) >> > + (write-pair head) >> > + (let lp ((obj rest)) >> > + (match obj >> > + (() (values)) >> >> Any reason to return (values) instead of some dummy #t to denote 'no-op' >> ?. > > The loop is evaluated for effect and thus there is nothing to return > so returning 0 values makes sense. This is something I picked up from > Andy Wingo. Interesting. >> > + ((head . rest) >> > + (put-char port #\,) >> > + (write-pair head) >> > + (lp rest)) >> > + (_ (fail "invalid object" obj)))))) >> > + (put-char port #\})) >> > + (define (write-array v) >> > + (put-char port #\[) >> > + (match (vector-length v) >> > + (0 (values)) >> > + (n >> > + (write-value (vector-ref v 0)) >> > + (do ((i 1 (1+ i))) >> > + ((= i n)) >> > + (put-char port #\,) >> > + (write-value (vector-ref v i))))) >> >> I suppose the above is more efficient than a for-each loop? I'd be >> curious to see it profiled, if you still have data. At least now I see >> than for > 100k, vector-ref is faster than list-ref, which probably >> explains why you went with vectors (could still be an implementation >> detail with the list->vector call left in the writer though, in my >> opinion). > > Yes, it is more efficient because it avoids closure allocation. Also, > I just don't see any reason to import (rnrs base) or (srfi srfi-43) to > get vector-for-each when looping over a vector is trivial. I didn't > profile anything. Yeah, I was thinking in terms of plain lists, didn't know we didn't iterators for vectors in the core (I've seldom used vectors). Sounds reasonable. >> > + (put-char port #\])) >> > + (define (write-number x) >> > + (if (or (exact-integer? x) >> > + (and (real? x) >> > + (inexact? x) >> > + ;; NaNs and infinities are not allowed. >> > + (not (or (nan? x) (inf? x))))) >> > + ;; Scheme's string representations of exact integers and floats >> > + ;; are compatible with JSON. >> > + (put-string port (number->string x)) >> > + (fail "invalid number" x))) >> > + (define (write-value x) >> > + (match x >> > + (#t (put-string port "true")) >> > + (#f (put-string port "false")) >> > + ('null (put-string port "null")) >> > + (() (put-string port "{}")) >> > + ((? pair?) (write-object x)) >> > + ((? vector?) (write-array x)) >> > + ((? string?) (write-string x)) >> > + ((? number?) (write-number x)) >> > + (_ (fail "invalid value" x)))) >> > + (write-value exp)) >> >> Phew. That's a pretty low-level parser! I hope it's fast, otherwise it >> seems it'd be more concise/fun/maintainable to devise a PEG-based one, >> which appears to be doable for JSON, from what I've read. Perhaps >> sprinkle with a few performance-related comments where such concerns >> impacted the design choices, so that we can remember and retest/reverify >> these in the future when Guile evolves. > > JSON is a pretty simple format and thus I think a hand-rolled parser > is appropriate. It's much simpler than 'read', anyway. I suppose it > would be more concise, but "PEG parser" and "fun" do not go together > for me. At ~300 lines of quite simple code (I did not go hog wild on > macros or fancy abstractions nor did I sacrifice readable code for > performance) I don't think there is much concern regarding > maintenance. If someone wants to experiment to see how a PEG parser > compares, though, feel free. Someone could always make that experiment in the future and suggest a replacement, if the performance is as good or better and the code simpler. LGTM. Reviewed-by: Maxim Cournoyer <maxim.cournoyer <at> gmail> -- Thanks, Maxim
bug-guile <at> gnu.org
:bug#77762
; Package guile
.
(Fri, 02 May 2025 12:05:02 GMT) Full text and rfc822 format available.Message #32 received at 77762 <at> debbugs.gnu.org (full text, mbox):
From: Tomas Volf <~@wolfsden.cz> To: "Thompson, David" <dthompson2 <at> worcester.edu> Cc: 77762 <at> debbugs.gnu.org Subject: Re: bug#77762: [PATCH] web: Add JSON module. Date: Fri, 02 May 2025 14:03:56 +0200
"Thompson, David" <dthompson2 <at> worcester.edu> writes: > On Sat, Apr 12, 2025 at 12:01 PM Tomas Volf <~@wolfsden.cz> wrote: >> >> > - null is the symbol 'null >> >> Out of curiosity, what are your thoughts about using #nil instead? > > My original patch from 10 years ago did this. Mark Weaver then > explained to me that #nil was added specifically for the purpose of > supporting Emacs Lisp on the Guile VM and shouldn't be used in Scheme > code. The symbol 'null works well because there is no symbol type in > JSON so there's no ambiguity. Thanks for the explanation. I asked not due to perceived ambiguity, but due to how #nil interacts with conditions. It would be handy to be able to write (when (assq-ref data 'foo) ...) instead of (when (not (eq? 'null (or (assq-ref data 'foo) 'null))) ...) Tomas -- There are only two hard things in Computer Science: cache invalidation, naming things and off-by-one errors.
GNU bug tracking system
Copyright (C) 1999 Darren O. Benham,
1997,2003 nCipher Corporation Ltd,
1994-97 Ian Jackson.