Package: guix-patches;
Reported by: Philip McGrath <philip <at> philipmcgrath.com>
Date: Sun, 14 Nov 2021 12:43:01 UTC
Severity: normal
Tags: patch
Done: Liliana Marie Prikler <liliana.prikler <at> gmail.com>
Bug is archived.
Message #1094 received at 51838 <at> debbugs.gnu.org:
From: Philip McGrath <philip <at> philipmcgrath.com>
To: Liliana Marie Prikler <liliana.prikler <at> gmail.com>, 51838 <at> debbugs.gnu.org
Cc: Timothy Sample <samplet <at> ngyro.com>, Pierre Langlois <pierre.langlois <at> gmx.com>, Jelle Licht <jlicht <at> fsfe.org>
Subject: Re: [PATCH v6 03/41] guix: node-build-system: Add JSON utilities.
Date: Fri, 31 Dec 2021 00:22:48 -0500
Hi Liliana,

On 12/30/21 11:56, Liliana Marie Prikler wrote:
> Am Donnerstag, dem 30.12.2021 um 02:38 -0500 schrieb Philip McGrath:
>> This commit adds several utility functions for non-destructive
>> transformation of the JSON representation used by (guix build json),
>> particularly for purely functional update of JSON objects.  They
>> should eventually be exported, but most are private for now to allow
>> for more experience and consideration before committing to the API.
>> The design was largely inspired by the 'racket/dict' and
>> 'racket/hash' libraries.  Liliana Marie Prikler proposed
>> 'with-atomic-json-file-replacement'.
> Given that this is a fair amount of procedures that you're proposing,
> I think a new file would be appropriate.  Perhaps (guix build
> json-utils)?  Adding that should IIUC not cause a world rebuild, so
> we could do that on master.

I agree that these functions ultimately belong in their own file, and I'd even had the name (guix build json-utils) in mind. I put them in (guix build node-build-system) for now because, if they were in (guix build json-utils), they would have to be exported, at which point their API would have to be relatively stable, and I didn't want designing them to block, or to be rushed by, the rest of this series. Now, maybe consensus on the json-utils will turn out to be the easy part! But my high-level question is: in your view, do any of the points I'm about to respond to block this patch series?

On 12/30/21 13:18, Liliana Marie Prikler wrote:
> Having argued for these procedures to be moved into their own file in
> a separate mail, now it's time to bikeshed stylistic choices.
>
> Am Donnerstag, dem 30.12.2021 um 02:38 -0500 schrieb Philip McGrath:
>> +(define (jsobject-ref js key failure-result)
>> +  "Return the value associated with KEY in the json object JS.  If KEY is not
>> +found and FAILURE-RESULT is a procedure, it is called in tail position with
>> +zero arguments.  Otherwise, FAILURE-RESULT is returned."
>> +  ;; TODO: `failure-result` should be optional, but should the default
>> +  ;; `failure-result` be #f (like `assoc-ref`), a thunk raising an exception,
>> +  ;; '(@), or something else?  Keep it mandatory until we discuss and decide.
>> +  (match js
>> +    (('@ . alist)
>> +     (match (assoc key alist)
>> +       (#f
>> +        (if (procedure? failure-result)
>> +            (failure-result)
>> +            failure-result))
>> +       ((_ . value)
>> +        value)))))
> We can safely replace failure-result by Guile's DEFAULT and leave
> error handling to the user.

I don't care whether we call it DEFAULT or FAILURE-RESULT. I agree that it should not raise an exception by default. My current thinking is that '(@) would be a good default DEFAULT: it is useful for the common pattern of traversing or transforming nested objects, and, as you point out at the end of this email, explicitly typing #f (the other useful possibility) looks much more like normal Scheme than explicitly typing '(@).

In my experience with [1] and [2] (the purely-functional dictionary libraries I use most often), the special case for a procedure as DEFAULT is useful. I feel less strongly about it because it's relatively easy to work around for JSON, since you can pick some non-JSON signal value, but it also seems to have especially little downside for JSON, since (guix build json) will never have a procedure in a valid JSON object. In addition to raising exceptions and other control flow, it's useful for default values that are expensive to produce.

[1]: https://docs.racket-lang.org/reference/hashtables.html
[2]: https://docs.racket-lang.org/reference/dicts.html
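For concreteness, a minimal sketch of how the DEFAULT (or FAILURE-RESULT) argument behaves, assuming 'jsobject-ref' as quoted above and a made-up package object:

(define pkg
  '(@ ("name" . "foo")
      ("dependencies" . (@ ("bar" . "^1.0.0")))))

(jsobject-ref pkg "dependencies" '(@))
;; => (@ ("bar" . "^1.0.0"))
(jsobject-ref pkg "devDependencies" '(@))
;; => (@), which is convenient when traversing nested objects
(jsobject-ref pkg "devDependencies"
              (lambda () (error "devDependencies: missing")))
;; the thunk is called in tail position, so this raises an error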
>
>> +(define (alist-pop alist key)
>> +  "Return two values: the first pair in ALIST with the given KEY in its
>> +'car' (or #f, if no such pair exists) and an association list like (and
>> +potentially sharing storage with) ALIST, but with no entry for KEY."
>> +  (match (assoc key alist)
>> +    ;; If key isn't present, we don't need to do any allocation
>> +    (#f
>> +     (values #f alist))
>> +    (found
>> +     (values found
>> +             ;; Because we have `found`, we can find it more
>> +             ;; efficiently this time with `eq?`.  We avoid using
>> +             ;; `delq` because it would copy pairs in a shared
>> +             ;; tail.  We assume a sufficiently smart compiler to
>> +             ;; handle "tail recursion modulo cons" (vid. e.g. Indiana
>> +             ;; University Technical Report No. 19, Friedman & Wise
>> +             ;; 1975) at least as efficiently as a hand-written
>> +             ;; tail-recursive implementation with an accumulator.
>> +             (let loop ((alist alist))
>> +               (match alist
>> +                 ;; We know that `found` is present,
>> +                 ;; so no need to check for '()
>> +                 ((this . alist)
>> +                  (if (eq? this found)
>> +                      alist
>> +                      (cons this (loop alist))))))))))
> I think this can more efficiently be done in a "single" loop.
>
>   (let loop ((rest alist)
>              (previous '()))
>     (match rest
>       (() (values #f alist))
>       ((first . rest)
>        (if (eq? (car first) key)
>            (values first (reverse! previous rest))
>            (loop rest (cons first previous))))))

I'll admit to a Racket bias, but, having just eliminated the use of 'assoc-set!', I'm loath to start mutating pairs (even correctly). To quote a bit from the SRFI-1 spec for 'append-reverse!', "note that this pattern of iterative computation followed by a reverse can frequently be rewritten as a recursion, dispensing with the reverse and append-reverse steps, and shifting temporary, intermediate storage from the heap to the stack, which is typically a win for reasons of cache locality and eager storage reclamation." (See how 'set-cdr!' can crash safe Chez Scheme! <https://github.com/cisco/ChezScheme/issues/599>) IIUC, using SRFI-1's 'span' would lead to the same situation.

> Also, I don't think your version is tail-recursive.  (loop alist) is
> not in tail position from what I can tell.

Yes, "tail recursion modulo cons" refers to a compiler optimization for functions which are _not_ tail recursive. For full details, see the Friedman & Wise 1975 tech report I cited at <https://legacy.cs.indiana.edu/ftp/techreports/TR19.pdf> (or various other articles), but, as briefly as I can: the optimization rests on the observation that many recursive functions, like the classic definition of 'map':

(define (map f lst)
  (match lst
    (() '())
    ((this . lst)
     (cons (f this) (map f lst)))))

are nearly tail-recursive, and the only real work remaining to be done in the continuation of the recursive call is to fill in the cdr of the pair. Thus, a compiler can safely transform this code into a truly tail-recursive implementation:

(define (map f lst)
  (match lst
    (() '())
    ((this . lst)
     (define ret (list (f this)))
     (let loop ((dest ret) (lst lst))
       (match lst
         ((this . lst)
          (define new (list (f this)))
          (set-cdr! dest new)
          (loop new lst))
         (() ret))))))

Unlike the Proper Implementation of Tail Calls (so-called "tail-call optimization"), handling "tail recursion modulo cons" truly is an optimization: it does not change the space complexity of the function. But it can allow the compiler to generate whatever code it thinks will work best with its collector/allocator and continuation/"call stack" implementation. (The optimization applies to constructors in general, not just 'cons', and a compiler can safely apply it to values that are immutable from the perspective of the source language.)
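To illustrate what the shared-tail behavior of 'alist-pop' buys, a minimal sketch assuming the definition quoted above, together with Guile's SRFI-71 'let*' (the entries are made up):

(use-modules (srfi srfi-71))

(let* ((alist (list '("a" . 1) '("b" . 2) '("c" . 3)))
       (popped rest (alist-pop alist "a")))
  (list popped (eq? rest (cdr alist))))
;; => (("a" . 1) #t): the remaining entries share storage with ALIST
;; rather than being copied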
>
>> +;; Sadly, Guile's implementation of (@ (srfi srfi-1) alist-delete)
>> +;; performs unnecessary allocation, e.g. this currently evaluates to #f:
>> +;;
>> +;;     (let ((alist `(("a" . 1)("b" . 2)("c" . 3))))
>> +;;       (eq? alist (alist-delete "x" alist)))
>> +;;
>> +;; These functions generally choose to allocate a new outer pair (with the '@
>> +;; tag), even though in unusual cases the resulting object might not have
>> +;; changed, for the sake of simplicity and to avoid retaining a reference to
>> +;; the original alist longer than necessary.  But that is O(1) allocation that
>> +;; could only rarely be avoided: `alist-delete` would allocate O(n) pairs,
>> +;; which would only be necessary in the worst case.
>> +(define (alist-delete* alist key)
>> +  "Return an association list like (and potentially sharing storage with)
>> +ALIST, but with no entry for KEY."
>> +  (define-values (_popped remaining)
>> +    (alist-pop alist key))
>> +  remaining)
> That's a pretty long comment around something that could be done with
> call-with-values or SRFI-71 let.  I think one of these two should be
> preferred.
>
> Note that both our versions of alist-pop only pop the first key (as
> they should).  This means that alist-delete* should really be called
> alist-delete-1 as in "remove the first pair in ALIST belonging to
> KEY".  For the larger JSON handling below, this makes no difference
> however.

Here I was using '*' to mean "a slightly altered version of", as with 'letrec' and 'letrec*', but, yes, since the other functions defined here use '*' to mean "zero or more times", the name is confusing: I think I'd just call it 'alist-delete' and not import (srfi srfi-1)'s version.

The comment may be unnecessarily long ... the essence of what I was trying to explain is that, in all of these implementations, I've tried to avoid unnecessary allocation. Being able to rely on 'alist-delete', and more generally 'alist-pop', not to needlessly copy the "spine" of the list lets later functions use them unconditionally.

Why would you prefer 'call-with-values' or SRFI-71 over 'define-values'? The style guide against which I'm used to working [3] generally prefers internal definitions, to avoid rightward drift.

[3]: https://docs.racket-lang.org/style/Choosing_the_Right_Construct.html#%28part._.Definitions%29

>> +(define (alist-set alist key value)
>> +  "Return an association list like ALIST, but with KEY mapped to VALUE,
>> +replacing any existing mapping for KEY."
>> +  (acons key value (alist-delete* alist key)))
> Is order relevant here?  Because we could just as well reimplement our
> alist-delete* loop and cons the replacement onto the rest.  WDYT?

Relying on order for JSON objects is non-interoperable, per RFC 8259 § 4. I'm not intending for these alist procedures to be exported, so I'm not trying to handle any more general case than that, as I explain in the comments at the top of the file. I'm not sure what the advantage would be to reimplementing the 'alist-delete' loop here.
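On the ordering question, a small sketch, assuming 'alist-set' and 'alist-delete*' as quoted above (the entries are made up):

(alist-set '(("a" . 1) ("b" . 2)) "b" 3)
;; => (("b" . 3) ("a" . 1)): the updated entry moves to the front,
;; which is fine because consumers must not rely on object order

(let ((alist '(("a" . 1) ("b" . 2))))
  (eq? alist (alist-delete* alist "x")))
;; => #t: no allocation when KEY is absent, unlike SRFI-1's alist-delete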
>
>> +(define (jsobject-set js key value)
>> +  "Return a json object like JS, but with KEY mapped to VALUE, replacing any
>> +existing mapping for KEY."
>> +  (cons '@ (match js
>> +             (('@ . alist)
>> +              (alist-set alist key value)))))
> I think it'd be wiser to put the cons inside the match.

I don't care very much either way.

>> +(define jsobject-set*
>> +  (case-lambda
>> +    "Return a json object like JS, but functionally extended by mapping each
>> +KEY to each VALUE, replacing any existing mapping for each KEY.  The update
>> +takes place from left to right, so later mappings overwrite earlier mappings
>> +for the same KEY."
>> +    ((js)
>> +     js)
>> +    ((js key value)
>> +     (jsobject-set js key value))
>> +    ((js . args)
>> +     (cons '@ (match js
>> +                (('@ . alist)
>> +                 (let loop ((alist alist)
>> +                            (args args))
>> +                   (match args
>> +                     (()
>> +                      alist)
>> +                     ((key value . args)
>> +                      (loop (alist-set alist key value)
>> +                            args))))))))))
> I'm not sure if I like this "syntax".  I think I'd prefer
>   (jsobject-set* obj (FIELD1 VALUE1) (FIELD2 VALUE2) ...)
> with FIELD1, FIELD2 being identifiers.  WDYT?

So you would make 'jsobject-set*' a macro? When you say, "with FIELD1, FIELD2 being identifiers", do you mean that the macro should convert them to strings at compile time? While, if I were designing a JSON representation, I'd much prefer to use symbols for the object keys, I think it would be confusing to use strings everywhere else but magic symbols here.

I based this function on 'hash-set*' [4], 'dict-set*' [5], and numerous similar functions in the Racket world, so I have a high degree of confidence in the usability of the interface.

[4]: https://docs.racket-lang.org/reference/hashtables.html#%28def._%28%28lib._racket%2Fprivate%2Fbase..rkt%29._hash-set%2A%29%29
[5]: https://docs.racket-lang.org/reference/dicts.html#%28def._%28%28lib._racket%2Fdict..rkt%29._dict-set%2A%29%29

>> +(define (alist-update alist key failure-result updater)
>> +  "Return an association list like ALIST, but with KEY mapped to the result
>> +of applying UPDATER to the value to which KEY is mapped in ALIST.  When ALIST
>> +does not have an existing mapping for KEY, FAILURE-RESULT is used as with
>> +'jsobject-ref' to obtain the argument for UPDATER."
>> +  ;; Often, `updater` will be a lambda expression, so making it the last
>> +  ;; argument may help to make the code legible, and the most likely
>> +  ;; `failure-result` arguments are all shorter than the keyword
>> +  ;; `#:failure-result`.  Plus, making `failure-result` mandatory helps make
>> +  ;; `alist-update` consistent with `alist-update*`.
> Which alist-update* are you referring to here?  Either way, the
> failure-result to default argument from above applies, but we could
> keyword it.

Ah, I guess that comment should read, "Plus, making 'default' mandatory helps make 'jsobject-update' consistent with 'jsobject-update*'."

>> +  (define-values (popped tail-alist)
>> +    (alist-pop alist key))
>> +  (acons key
>> +         (updater (match popped
>> +                    (#f
>> +                     (if (procedure? failure-result)
>> +                         (failure-result)
>> +                         failure-result))
>> +                    ((_ . value)
>> +                     value)))
>> +         tail-alist))
> SRFI-71 let says hi.  Also the ordering question applies.  I'm
> starting to think we should implement alist-pop, alist-set and
> alist-update in terms of a single more powerful function producing
> three values (or SRFI-1 span).

Same question again re 'define-values'. My intent in creating 'alist-pop' was to have a primitive that would work for both 'alist-update' and 'alist-delete', and thereby 'alist-set'. Returning the prefix and the tail separately would involve either extra allocation or mutating pairs. Since order never matters in this context, why pay that price?
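Returning to the 'jsobject-set*' calling convention discussed above, a minimal sketch assuming the definitions as quoted (keys and values are made up):

(jsobject-set* '(@ ("name" . "foo") ("version" . "1.0.0"))
               "version" "1.0.1"
               "license" "MIT")
;; => (@ ("license" . "MIT") ("version" . "1.0.1") ("name" . "foo"))
;; i.e. one flat argument list of alternating keys and values, as with
;; Racket's 'hash-set*'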
>> +(define* (jsobject-union #:key
>> +                          (combine (lambda (a b) b))
>> +                          (combine/key (lambda (k a b) (combine a b)))
>> +                          #:rest json-objects)
>> +  "Combine the given JSON-OBJECTS into a single json object.  The JSON-OBJECTS
>> +are merged from left to right by adding each key/value pair of each object to
>> +the aggregate object in turn.  When one of the JSON-OBJECTS contains a mapping
>> +from some key KEY to a value VAL such that the aggregate object already
>> +contains a mapping from KEY to a value VAL0, the aggregate object is
>> +functionally updated to instead map KEY to the value of (COMBINE/KEY KEY VAL0
>> +VAL).  The default COMBINE/KEY tail-calls (COMBINE VAL0 VAL), and the default
>> +COMBINE simply returns its second argument, so, by default, mappings in later
>> +JSON-OBJECTS supersede those in earlier ones."
>> +  (match (filter (lambda (v)
>> +                   (not (or (keyword? v)
>> +                            (procedure? v))))
>> +                 json-objects)
>> +    (()
>> +     '(@))
>> +    (((and js0 ('@ . _)))
>> +     js0)
>> +    ((('@ . alist0) ('@ . alist*) ...)
>> +     (cons '@ (fold (lambda (alist1 alist0)
>> +                      (if (null? alist0)
>> +                          alist1
>> +                          (fold (lambda (k+v alist0)
>> +                                  (match k+v
>> +                                    ((k . v)
>> +                                     (define-values (popped tail-alist)
>> +                                       (alist-pop alist0 k))
>> +                                     (match popped
>> +                                       (#f
>> +                                        (cons k+v tail-alist))
>> +                                       ((_ . v0)
>> +                                        (acons k
>> +                                               (combine/key k v0 v)
>> +                                               tail-alist))))))
>> +                                alist0
>> +                                alist1)))
>> +                    alist0
>> +                    alist*)))))
> Same default argument.  Cons inside.
> I think having a single combine function taking (k a b) would be less
> confusing than having two.  Is there a rationale for the form you
> chose?

I based this function in particular on 'hash-union' from 'racket/hash' [6], which uses these keywords. (But in 'hash-union', collisions trigger an exception by default, and it requires at least one argument, because otherwise it would be unclear what key-comparison function the result should use.) Having '#:combine' in addition to '#:combine/key' is ultimately just a convenience, but it is quite a nice convenience in practice. It is quite rare, in my experience, for a '#:combine' function to actually depend on the key: it might depend on the type of the value, but, very often, it unconditionally applies to all keys. Using '#:combine' is particularly nice when using a combination function that already exists, like 'append' or '+'.

[6]: https://docs.racket-lang.org/reference/hashtables.html#%28def._%28%28lib._racket%2Fhash..rkt%29._hash-union%29%29
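As a usage sketch for that keyword interface, assuming 'jsobject-union' as quoted above (the objects and version strings are made up):

;; By default, mappings in later objects supersede earlier ones.
(jsobject-union '(@ ("a" . "^1.0.0") ("b" . "^2.0.0"))
                '(@ ("b" . "^3.0.0") ("c" . "^4.0.0")))
;; => a json object mapping "a" to "^1.0.0", "b" to "^3.0.0" and "c" to
;; "^4.0.0" (pair order is unspecified, per the RFC 8259 point above)

;; '#:combine' with a combination function that already exists:
(jsobject-union '(@ ("downloads" . 1) ("stars" . 5))
                '(@ ("downloads" . 2))
                #:combine +)
;; => a json object mapping "downloads" to 3 and "stars" to 5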
>> +    (with-atomic-json-file-replacement "package.json"
>> +      (lambda (pkg-meta)
>> +        (jsobject-update*
>> +         pkg-meta
>> +         "devDependencies" '(@) resolve-dependencies
>> +         "dependencies" '(@) (lambda (deps)
>> +                               (resolve-dependencies
>> +                                (jsobject-union
>> +                                 (jsobject-ref pkg-meta "peerDependencies" '(@))
>> +                                 deps))))))
>>      #t)
> We should probably add a function to our js utils that "generates an
> empty object", because '(@) is quite confusing to see in these
> circumstances.  Otherwise LGTM with the aforementioned caveats.

I'm not sure what to call it: it would have to be short, or people (me, at least) might end up writing '(@) anyway. Also, IIUC Guile doesn't actually prevent you from mutating quoted constant pairs, so the function would have to allocate a fresh pair to be robust. It's a somewhat odd idea, but how about this?

(define-syntax |{}| (identifier-syntax '(@)))

It's as short as '(@), it looks like the JSON notation for the empty object, and IIUC people could only use it to mess up occurrences of '(@) within the same compilation unit, which we can't stop them from doing anyway.

Alternatively, if we give up the thunk special case for 'default' values, we could do:

(define jsobject-update
  (case-lambda
    ((js key updater)
     (jsobject-update js key '(@) updater))
    ((js key default updater)
     ...)))

(define jsobject-update*
  (case-lambda
    ...
    ((js . args)
     (match js
       (('@ . alist)
        (cons '@ (let loop ((alist alist)
                            (args args))
                   (match args
                     (()
                      alist)
                     ((key (? procedure? updater) . args)
                      (loop (alist-update alist key '(@) updater)
                            args))
                     ((key default updater . args)
                      (loop (alist-update alist key default updater)
                            args))))))))))

-Philip