GNU bug report logs - #51838
[PATCH 00/11] guix: node-build-system: Support compiling add-ons with node-gyp.


Package: guix-patches;

Reported by: Philip McGrath <philip <at> philipmcgrath.com>

Date: Sun, 14 Nov 2021 12:43:01 UTC

Severity: normal

Tags: patch

Done: Liliana Marie Prikler <liliana.prikler <at> gmail.com>

Bug is archived. No further changes may be made.



Message #1094 received at 51838 <at> debbugs.gnu.org (full text, mbox):

From: Philip McGrath <philip <at> philipmcgrath.com>
To: Liliana Marie Prikler <liliana.prikler <at> gmail.com>, 51838 <at> debbugs.gnu.org
Cc: Timothy Sample <samplet <at> ngyro.com>,
 Pierre Langlois <pierre.langlois <at> gmx.com>, Jelle Licht <jlicht <at> fsfe.org>
Subject: Re: [PATCH v6 03/41] guix: node-build-system: Add JSON utilities.
Date: Fri, 31 Dec 2021 00:22:48 -0500

Hi Liliana,

On 12/30/21 11:56, Liliana Marie Prikler wrote:
> On Thursday, 2021-12-30 at 02:38 -0500, Philip McGrath wrote:
>> This commit adds several utility functions for non-destructive
>> transformation of the JSON representation used by (guix build json),
>> particularly for purely functional update of JSON objects.  They should
>> eventually be exported, but most are private for now to allow for more
>> experience and consideration before committing to the API.  The design
>> was largely inspired by the 'racket/dict' and 'racket/hash' libraries.
>> Liliana Marie Prikler proposed 'with-atomic-json-file-replacement'.
> Given that this is a fair amount of procedures that you're proposing, I
> think a new file would be appropriate.  Perhaps (guix build json-
> utils)?  Adding that should IIUC not cause a world rebuild, so we could
> do that on master.
>

I agree that these functions ultimately belong in their own file, and 
I'd even had the name (guix build json-utils) in mind.

I put them in (guix build node-build-system) for now because, if they 
were in (guix build json-utils), they would have to be exported, at 
which point their API would have to be relatively stable, and I didn't 
want designing them to block, or to be rushed by, the rest of this 
series. Now, maybe consensus on the json-utils will turn out to be the 
easy part! But my high-level question is, in your view, do any of the 
points I'm about to respond to block this patch series?

On 12/30/21 13:18, Liliana Marie Prikler wrote:
> Having argued for these procedures to be moved into their own file in a
> separate mail, now it's time to bikeshed stylistic choices.
> 
> On Thursday, 2021-12-30 at 02:38 -0500, Philip McGrath wrote:
>> +(define (jsobject-ref js key failure-result)
>> +  "Return the value associated with KEY in the json object JS.  If KEY
>> +is not found and FAILURE-RESULT is a procedure, it is called in tail
>> +position with zero arguments.  Otherwise, FAILURE-RESULT is returned."
>> +  ;; TODO: `failure-result` should be optional, but should the default
>> +  ;; `failure-result` be #f (like `assoc-ref`), a thunk raising an
>> +  ;; exception, '(@), or something else?  Keep it mandatory until we
>> +  ;; discuss and decide.
>> +  (match js
>> +    (('@ . alist)
>> +     (match (assoc key alist)
>> +       (#f
>> +        (if (procedure? failure-result)
>> +            (failure-result)
>> +            failure-result))
>> +       ((_ . value)
>> +        value)))))
> We can safely replace failure-result by Guile's DEFAULT and leave error
> handling to the user.

I don't care whether we call it DEFAULT or FAILURE-RESULT.

I agree that it should not raise an exception by default. My current 
thinking is that '(@) would be a good default DEFAULT: it is useful for 
the common pattern of traversing or transforming nested objects, and, as 
you point out at the end of this email, explicitly typing #f (the other 
useful possibility) looks much more like normal Scheme than explicitly 
typing '(@).

In my experience with [1] and [2] (the purely-functional dictionary 
libraries I use most often), the special case for a procedure as DEFAULT 
is useful. I feel less strongly about it because it's relatively easy to 
work around for JSON, since you can pick some non-JSON signal value, but 
it also seems to have especially little downside for JSON, since (guix 
build json) will never have a procedure in a valid JSON object. In 
addition to raising exceptions and other control flow, it's useful for 
default values that are expensive to produce.

[1]: https://docs.racket-lang.org/reference/hashtables.html
[2]: https://docs.racket-lang.org/reference/dicts.html
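To make the traversal point concrete, here is a small sketch using 'jsobject-ref' exactly as quoted above, with '(@) as the default for a missing key (the nested "package.json"-style object is a made-up example):

```scheme
(use-modules (ice-9 match))

;; `jsobject-ref` as quoted above.
(define (jsobject-ref js key failure-result)
  (match js
    (('@ . alist)
     (match (assoc key alist)
       (#f (if (procedure? failure-result)
               (failure-result)
               failure-result))
       ((_ . value) value)))))

;; With '(@) as the default, a lookup through a missing intermediate
;; object degrades gracefully instead of failing mid-chain:
(define pkg '(@ ("scripts" . (@ ("test" . "mocha")))))

(jsobject-ref (jsobject-ref pkg "scripts" '(@)) "test" #f)  ; => "mocha"
(jsobject-ref (jsobject-ref pkg "missing" '(@)) "test" #f)  ; => #f
```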

> 
>> +(define (alist-pop alist key)
>> +  "Return two values: the first pair in ALIST with the given KEY in its
>> +'car' (or #f, if no such pair exists) and an association list like (and
>> +potentially sharing storage with) ALIST, but with no entry for KEY."
>> +  (match (assoc key alist)
>> +    ;; If key isn't present, we don't need to do any allocation
>> +    (#f
>> +     (values #f alist))
>> +    (found
>> +     (values found
>> +             ;; Because we have `found`, we can find it more
>> +             ;; efficiently this time with `eq?`. We avoid using
>> +             ;; `delq` because it would copy pairs in a shared
>> +             ;; tail. We assume a sufficiently smart compiler to
>> +             ;; handle "tail recursion modulo cons" (vid. e.g. Indiana
>> +             ;; University Technical Report No. 19, Friedman & Wise
>> +             ;; 1975) at least as efficiently as a hand-written
>> +             ;; tail-recursive implementation with an accumulator.
>> +             (let loop ((alist alist))
>> +               (match alist
>> +                 ;; We know that `found` is present,
>> +                 ;; so no need to check for '()
>> +                 ((this . alist)
>> +                  (if (eq? this found)
>> +                      alist
>> +                      (cons this (loop alist))))))))))
> I think this can be done more efficiently in a "single" loop.
> 
>    (let loop ((rest alist)
>               (previous '()))
>      (match rest
>        (() (values #f alist))
>        ((first . rest)
>         (if (eq? (car first) key)
>             (values first (reverse! previous rest))
>             (loop rest (cons first previous))))))
> 

I'll admit to a Racket bias, but, having just eliminated the use of 
'assoc-set!', I'm loath to start mutating pairs (even correctly). To 
quote a bit from the SRFI-1 spec for 'append-reverse!', "note that this 
pattern of iterative computation followed by a reverse can frequently be 
rewritten as a recursion, dispensing with the reverse and append-reverse 
steps, and shifting temporary, intermediate storage from the heap to the 
stack, which is typically a win for reasons of cache locality and eager 
storage reclamation." (See how 'set-cdr!' can crash safe Chez Scheme! 
<https://github.com/cisco/ChezScheme/issues/599>)

IIUC, using SRFI-1's 'span' would lead to the same situation.
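As a concrete check of the no-copying claim, here is a sketch exercising 'alist-pop' exactly as quoted above:

```scheme
(use-modules (ice-9 match))

;; `alist-pop` as quoted above.
(define (alist-pop alist key)
  (match (assoc key alist)
    ;; If key isn't present, we don't need to do any allocation.
    (#f (values #f alist))
    (found
     (values found
             (let loop ((alist alist))
               (match alist
                 ((this . alist)
                  (if (eq? this found)
                      alist
                      (cons this (loop alist))))))))))

(define alist '(("a" . 1) ("b" . 2) ("c" . 3)))

;; Key absent: the very same list comes back, with no allocation.
(call-with-values (lambda () (alist-pop alist "x"))
  (lambda (popped remaining)
    (eq? remaining alist)))          ; => #t

;; Key at the head: the shared tail is returned as-is, not copied.
(call-with-values (lambda () (alist-pop alist "a"))
  (lambda (popped remaining)
    (eq? remaining (cdr alist))))    ; => #t
```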

> Also, I don't think your version is tail-recursive.  (loop alist) is
> not in tail position from what I can tell.

Yes, "tail recursion modulo cons" refers to a compiler optimization for 
functions which are _not_ tail recursive. For full details, see the 
Friedman & Wise 1975 tech report I cited at 
<https://legacy.cs.indiana.edu/ftp/techreports/TR19.pdf> (or various 
other articles), but, as briefly as I can: The optimization rests on the 
observation that many recursive functions, like the classic definition 
of 'map':

    (define (map f lst)
      (match lst
        (()
         '())
        ((this . lst)
         (cons (f this)
               (map f lst)))))

are nearly tail-recursive, and the only real work remaining to be done 
in the continuation of the recursive call is to fill in the cdr of the 
pair. Thus, a compiler can safely transform this code into a truly 
tail-recursive implementation:

    (define (map f lst)
      (match lst
        (()
         '())
        ((this . lst)
         (define ret (list (f this)))
         (let loop ((dest ret)
                    (lst lst))
           (match lst
             ((this . lst)
              (define new (list (f this)))
              (set-cdr! dest new)
              (loop new lst))
             (()
              ret))))))

Unlike the Proper Implementation of Tail Calls (so-called "tail-call 
optimization"), handling "tail recursion modulo cons" truly is an 
optimization: it does not change the space complexity of the function. 
But it can allow the compiler to generate whatever code it thinks will 
work best with its collector/allocator and continuation/"call stack" 
implementation.

(The optimization applies to constructors in general, not just 'cons', 
and a compiler can safely apply it to values that are immutable from the 
perspective of the source language.)
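To see that the transformation preserves behavior, both definitions of 'map' above can be run side by side (renamed here, hypothetically, to 'map/rec' and 'map/trmc' to avoid shadowing Guile's built-in 'map'):

```scheme
(use-modules (ice-9 match))

;; The classic, non-tail-recursive definition.
(define (map/rec f lst)
  (match lst
    (() '())
    ((this . lst)
     (cons (f this) (map/rec f lst)))))

;; The hand-applied "tail recursion modulo cons" transformation: allocate
;; the pair first, then fill in its cdr from a genuinely tail-recursive loop.
(define (map/trmc f lst)
  (match lst
    (() '())
    ((this . lst)
     (define ret (list (f this)))
     (let loop ((dest ret) (lst lst))
       (match lst
         ((this . lst)
          (define new (list (f this)))
          (set-cdr! dest new)
          (loop new lst))
         (() ret))))))

(map/rec  1+ '(1 2 3))  ; => (2 3 4)
(map/trmc 1+ '(1 2 3))  ; => (2 3 4)
```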

> 
>> +;; Sadly, Guile's implementation of (@ (srfi srfi-1) alist-delete)
>> +;; performs unnecessary allocation, e.g. this currently evaluates to #f:
>> +;;
>> +;;     (let ((alist `(("a" . 1)("b" . 2)("c" . 3))))
>> +;;       (eq? alist (alist-delete "x" alist)))
>> +;;
>> +;; These functions generally choose to allocate a new outer pair (with
>> +;; the '@ tag), even though in unusual cases the resulting object might
>> +;; not have changed, for the sake of simplicity and to avoid retaining a
>> +;; reference to the original alist longer than necessary.  But that is
>> +;; O(1) allocation that could only rarely be avoided: `alist-delete`
>> +;; would allocate O(n) pairs, which would only be necessary in the worst
>> +;; case.
>> +(define (alist-delete* alist key)
>> +  "Return an association list like (and potentially sharing storage
>> +with) ALIST, but with no entry for KEY."
>> +  (define-values (_popped remaining)
>> +    (alist-pop alist key))
>> +  remaining)
> That's a pretty long comment around something that could be done with
> call-with-values or SRFI-71 let.  I think one of these two should be
> preferred.
> 
> Note that both our versions of alist-pop only pop the first key (as
> they should).  This means that alist-delete* should really be called
> alist-delete-1 as in "remove the first pair in ALIST belonging to KEY".
> For the larger JSON handling below, this makes no difference however.

Here I was using '*' to mean "a slightly altered version of", as with 
'letrec' and 'letrec*', but, yes, since the other functions defined here 
use '*' to mean "zero or more times", the name is confusing: I think I'd 
just call it 'alist-delete' and not import (srfi srfi-1)'s version.

The comment may be unnecessarily long ... the essence of what I was 
trying to explain is that, in all of these implementations, I've tried 
to avoid unnecessary allocation. Being able to rely on 'alist-delete', 
and more generally 'alist-pop', not to needlessly copy the "spine" of 
the list lets later functions use them unconditionally.

Why would you prefer 'call-with-values' or SRFI-71 over 'define-values'? 
The style guide against which I'm used to working [3] generally prefers 
internal definitions, to avoid rightward drift.

[3]: 
https://docs.racket-lang.org/style/Choosing_the_Right_Construct.html#%28part._.Definitions%29
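For comparison, the two binding styles under discussion look like this (a sketch; 'alist-pop' is reproduced from the patch so the snippet stands alone, and the names 'alist-delete/defs' and 'alist-delete/let' are made up for illustration):

```scheme
(use-modules (ice-9 match)
             (srfi srfi-71))  ; let with multiple-value bindings

;; `alist-pop` as quoted earlier in this thread.
(define (alist-pop alist key)
  (match (assoc key alist)
    (#f (values #f alist))
    (found
     (values found
             (let loop ((alist alist))
               (match alist
                 ((this . alist)
                  (if (eq? this found)
                      alist
                      (cons this (loop alist))))))))))

;; Internal-definition style, as in the patch:
(define (alist-delete/defs alist key)
  (define-values (popped remaining)
    (alist-pop alist key))
  remaining)

;; SRFI-71 style, as suggested in review; here neither form
;; causes rightward drift:
(define (alist-delete/let alist key)
  (let ((popped remaining (alist-pop alist key)))
    remaining))

(alist-delete/defs '(("a" . 1) ("b" . 2)) "a")  ; => (("b" . 2))
(alist-delete/let  '(("a" . 1) ("b" . 2)) "a")  ; => (("b" . 2))
```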

>> +(define (alist-set alist key value)
>> +  "Return an association list like ALIST, but with KEY mapped to VALUE,
>> +replacing any existing mapping for KEY."
>> +  (acons key value (alist-delete* alist key)))
> Is order relevant here?  Because we could just as well reimplement our
> alist-delete* loop and cons the replacement onto the rest.  WDYT?

Relying on order for JSON objects is non-interoperable, per RFC 8259 § 
4. I'm not intending for these alist procedures to be exported, so I'm 
not trying to handle any more general case than that, as I explain in 
the comments at the top of the file.

I'm not sure what the advantage would be to reimplementing the 
'alist-delete' loop here.

> 
>> +(define (jsobject-set js key value)
>> +  "Return a json object like JS, but with KEY mapped to VALUE,
>> +replacing any existing mapping for KEY."
>> +  (cons '@ (match js
>> +             (('@ . alist)
>> +              (alist-set alist key value)))))
> I think it'd be wiser to put the cons inside the match.
> 

I don't care very much either way.

>> +(define jsobject-set*
>> +  (case-lambda
>> +    "Return a json object like JS, but functionally extended by mapping
>> +each KEY to each VALUE, replacing any existing mapping for each KEY.
>> +The update takes place from left to right, so later mappings overwrite
>> +earlier mappings for the same KEY."
>> +    ((js)
>> +     js)
>> +    ((js key value)
>> +     (jsobject-set js key value))
>> +    ((js . args)
>> +     (cons '@ (match js
>> +                (('@ . alist)
>> +                 (let loop ((alist alist)
>> +                            (args args))
>> +                   (match args
>> +                     (()
>> +                      alist)
>> +                     ((key value . args)
>> +                      (loop (alist-set alist key value)
>> +                            args))))))))))
> I'm not sure if I like this "syntax".  I think I'd prefer
>    (jsobject-set* obj (FIELD1 VALUE1) (FIELD2 VALUE2) ...)
> with FIELD1, FIELD2 being identifiers
> WDYT?

So you would make 'jsobject-set*' a macro? When you say, "with FIELD1, 
FIELD2 being identifiers", do you mean that the macro should convert 
them to strings at compile-time? While, if I were designing a JSON 
representation, I'd much prefer to use symbols for the object keys, I 
think it would be confusing to use strings everywhere else but magic 
symbols here.

I based this function on 'hash-set*' [4], 'dict-set*' [5], and numerous 
similar functions in the Racket world, so I have a high degree of 
confidence in the usability of the interface.

[4]: 
https://docs.racket-lang.org/reference/hashtables.html#%28def._%28%28lib._racket%2Fprivate%2Fbase..rkt%29._hash-set%2A%29%29
[5]: 
https://docs.racket-lang.org/reference/dicts.html#%28def._%28%28lib._racket%2Fdict..rkt%29._dict-set%2A%29%29
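For reference, here is the quoted 'jsobject-set*' in action, together with the helpers it relies on (all reproduced from the patch; key order in the result is an implementation detail and, per RFC 8259 § 4, not significant):

```scheme
(use-modules (ice-9 match))

;; Helpers as quoted earlier in this thread.
(define (alist-pop alist key)
  (match (assoc key alist)
    (#f (values #f alist))
    (found
     (values found
             (let loop ((alist alist))
               (match alist
                 ((this . alist)
                  (if (eq? this found)
                      alist
                      (cons this (loop alist))))))))))

(define (alist-delete* alist key)
  (define-values (_popped remaining)
    (alist-pop alist key))
  remaining)

(define (alist-set alist key value)
  (acons key value (alist-delete* alist key)))

(define (jsobject-set js key value)
  (cons '@ (match js
             (('@ . alist)
              (alist-set alist key value)))))

(define jsobject-set*
  (case-lambda
    ((js) js)
    ((js key value) (jsobject-set js key value))
    ((js . args)
     (cons '@ (match js
                (('@ . alist)
                 (let loop ((alist alist) (args args))
                   (match args
                     (() alist)
                     ((key value . args)
                      (loop (alist-set alist key value) args))))))))))

;; Updates apply left to right, so the later mapping for "a" wins,
;; as with Racket's hash-set*:
(jsobject-set* '(@ ("a" . 1)) "b" 2 "a" 3)
;; => (@ ("a" . 3) ("b" . 2))
```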


>> +(define (alist-update alist key failure-result updater)
>> +  "Return an association list like ALIST, but with KEY mapped to the
>> +result of applying UPDATER to the value to which KEY is mapped in
>> +ALIST.  When ALIST does not have an existing mapping for KEY,
>> +FAILURE-RESULT is used as with 'jsobject-ref' to obtain the argument
>> +for UPDATER."
>> +  ;; Often, `updater` will be a lambda expression, so making it the
>> +  ;; last argument may help to make the code legible, and the most
>> +  ;; likely `failure-result` arguments are all shorter than the keyword
>> +  ;; `#:failure-result`.  Plus, making `failure-result` mandatory helps
>> +  ;; make `alist-update` consistent with `alist-update*`.
> Which alist-update* are you referring to here?  Either way, the
> failure-result to default argument from above applies, but we could
> keyword it.

Ah, I guess you should read that as, "Plus, making 'default' mandatory 
helps make 
'jsobject-update' consistent with 'jsobject-update*'."

>> +  (define-values (popped tail-alist)
>> +    (alist-pop alist key))
>> +  (acons key
>> +         (updater (match popped
>> +                    (#f
>> +                     (if (procedure? failure-result)
>> +                         (failure-result)
>> +                         failure-result))
>> +                    ((_ . value)
>> +                     value)))
>> +         tail-alist))
> SRFI-71 let says hi.  Also the ordering question applies.  I'm starting
> to think we should implement alist-pop, alist-set and alist-update in
> terms of a single more powerful function producing three values (or
> SRFI-1 span).

Same question again re 'define-values'.

My intent in creating 'alist-pop' was to have a primitive that would 
work for both 'alist-update' and 'alist-delete', and thereby 
'alist-set'. Returning the prefix and the tail separately would involve 
either extra allocation or mutating pairs. Since order never matters in 
this context, why pay that price?
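To show how the single primitive serves both roles, here is a sketch of 'alist-update' built on 'alist-pop' (both reproduced from the patch):

```scheme
(use-modules (ice-9 match))

;; `alist-pop` as quoted earlier in this thread.
(define (alist-pop alist key)
  (match (assoc key alist)
    (#f (values #f alist))
    (found
     (values found
             (let loop ((alist alist))
               (match alist
                 ((this . alist)
                  (if (eq? this found)
                      alist
                      (cons this (loop alist))))))))))

;; `alist-update` as quoted above.
(define (alist-update alist key failure-result updater)
  (define-values (popped tail-alist)
    (alist-pop alist key))
  (acons key
         (updater (match popped
                    (#f (if (procedure? failure-result)
                            (failure-result)
                            failure-result))
                    ((_ . value) value)))
         tail-alist))

;; Existing key: the old value is fed to the updater.
(alist-update '(("a" . 1) ("b" . 2)) "a" 0 1+)  ; => (("a" . 2) ("b" . 2))
;; Missing key: FAILURE-RESULT supplies the updater's argument.
(alist-update '(("b" . 2)) "a" 0 1+)            ; => (("a" . 1) ("b" . 2))
```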

>> +(define* (jsobject-union #:key
>> +                         (combine (lambda (a b) b))
>> +                         (combine/key (lambda (k a b) (combine a b)))
>> +                         #:rest json-objects)
>> +  "Combine the given JSON-OBJECTS into a single json object.  The
>> +JSON-OBJECTS are merged from left to right by adding each key/value
>> +pair of each object to the aggregate object in turn.  When one of the
>> +JSON-OBJECTS contains a mapping from some key KEY to a value VAL such
>> +that the aggregate object already contains a mapping from KEY to a
>> +value VAL0, the aggregate object is functionally updated to instead map
>> +KEY to the value of (COMBINE/KEY KEY VAL0 VAL).  The default
>> +COMBINE/KEY tail-calls (COMBINE VAL0 VAL), and the default COMBINE
>> +simply returns its second argument, so, by default, mappings in later
>> +JSON-OBJECTS supersede those in earlier ones."
>> +  (match (filter (lambda (v)
>> +                   (not (or (keyword? v)
>> +                            (procedure? v))))
>> +                 json-objects)
>> +    (()
>> +     '(@))
>> +    (((and js0 ('@ . _)))
>> +     js0)
>> +    ((('@ . alist0) ('@ . alist*) ...)
>> +     (cons '@ (fold (lambda (alist1 alist0)
>> +                      (if (null? alist0)
>> +                          alist1
>> +                          (fold (lambda (k+v alist0)
>> +                                  (match k+v
>> +                                    ((k . v)
>> +                                     (define-values (popped tail-alist)
>> +                                       (alist-pop alist0 k))
>> +                                     (match popped
>> +                                       (#f
>> +                                        (cons k+v tail-alist))
>> +                                       ((_ . v0)
>> +                                        (acons k
>> +                                               (combine/key k v0 v)
>> +                                               tail-alist))))))
>> +                                alist0
>> +                                alist1)))
>> +                    alist0
>> +                    alist*)))))
> Same default argument.  Cons inside.
> I think having a single combine function taking (k a b) would be less
> confusing than having two.  Is there a rationale for the form you
> chose?

I based this function in particular on 'hash-union' from 'racket/hash' 
[6], which uses these keywords. (But in 'hash-union', collisions trigger 
an exception by default, and it requires at least one argument, because 
otherwise it would be unclear what key-comparison function the result 
should use.)

Having '#:combine' in addition to '#:combine/key' is ultimately just a 
convenience, but it is quite a nice convenience in practice. It is quite 
rare, in my experience, for a '#:combine' function to actually depend on 
the key: it might depend on the type of the value, but, very often, it 
unconditionally applies to all keys. Using '#:combine' is particularly 
nice when using a combination function that already exists, like 
'append' or '+'.

[6]: 
https://docs.racket-lang.org/reference/hashtables.html#%28def._%28%28lib._racket%2Fhash..rkt%29._hash-union%29%29
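To illustrate the convenience, here is the quoted 'jsobject-union' (with 'alist-pop' reproduced so the snippet runs standalone) merging a collision with an off-the-shelf combiner:

```scheme
(use-modules (ice-9 match)
             (srfi srfi-1))  ; fold

;; `alist-pop` as quoted earlier in this thread.
(define (alist-pop alist key)
  (match (assoc key alist)
    (#f (values #f alist))
    (found
     (values found
             (let loop ((alist alist))
               (match alist
                 ((this . alist)
                  (if (eq? this found)
                      alist
                      (cons this (loop alist))))))))))

;; `jsobject-union` as quoted above.  Note that with define*, #:rest also
;; receives the keyword arguments themselves, hence the filter.
(define* (jsobject-union #:key
                         (combine (lambda (a b) b))
                         (combine/key (lambda (k a b) (combine a b)))
                         #:rest json-objects)
  (match (filter (lambda (v)
                   (not (or (keyword? v) (procedure? v))))
                 json-objects)
    (() '(@))
    (((and js0 ('@ . _))) js0)
    ((('@ . alist0) ('@ . alist*) ...)
     (cons '@ (fold (lambda (alist1 alist0)
                      (if (null? alist0)
                          alist1
                          (fold (lambda (k+v alist0)
                                  (match k+v
                                    ((k . v)
                                     (define-values (popped tail-alist)
                                       (alist-pop alist0 k))
                                     (match popped
                                       (#f (cons k+v tail-alist))
                                       ((_ . v0)
                                        (acons k (combine/key k v0 v)
                                               tail-alist))))))
                                alist0
                                alist1)))
                    alist0
                    alist*)))))

;; By default, later objects win:
(jsobject-union '(@ ("a" . 1)) '(@ ("a" . 2)))
;; => (@ ("a" . 2))

;; An existing procedure like + works directly as #:combine:
(jsobject-union #:combine + '(@ ("a" . 1) ("b" . 2)) '(@ ("a" . 10)))
;; => (@ ("a" . 11) ("b" . 2))
```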

>> +  (with-atomic-json-file-replacement "package.json"
>> +    (lambda (pkg-meta)
>> +      (jsobject-update*
>> +       pkg-meta
>> +       "devDependencies" '(@) resolve-dependencies
>> +       "dependencies" '(@) (lambda (deps)
>> +                             (resolve-dependencies
>> +                              (jsobject-union
>> +                               (jsobject-ref pkg-meta "peerDependencies" '(@))
>> +                               deps))))))
>>     #t)
> We should probably add a function to our js utils that "generates an
> empty object", because '(@) is quite confusing to see in these
> circumstances.  Otherwise LGTM with the aforementioned caveats.

I'm not sure what to call it: it would have to be short, or people (me, 
at least) might end up writing '(@) anyway. Also, IIUC Guile doesn't 
actually prevent you from mutating quoted constant pairs, so the function 
would have to allocate a fresh pair to be robust.

It's a somewhat odd idea, but how about this?

    (define-syntax |{}| (identifier-syntax '(@)))

It's as short as '(@), it looks like the JSON notation for the empty 
object, and IIUC people could only use it to mess up occurrences of '(@) 
within the same compilation unit, which we can't stop them from doing 
anyway.

Alternatively, if we give up the thunk special case for 'default' 
values, we could do:

    (define jsobject-update
      (case-lambda
        ((js key updater)
         (jsobject-update js key '(@) updater))
        ((js key default updater)
         ...)))
    (define jsobject-update*
      (case-lambda
        ...
        ((js . args)
         (match js
           (('@ . alist)
            (cons '@ (let loop ((alist alist)
                                (args args))
                       (match args
                         (()
                          alist)
                         ((key (? procedure? updater) . args)
                          (loop (alist-update alist key '(@) updater)
                                args))
                         ((key default updater . args)
                          (loop (alist-update alist key default updater)
                                args))))))))))

-Philip







GNU bug tracking system
Copyright (C) 1999 Darren O. Benham, 1997,2003 nCipher Corporation Ltd, 1994-97 Ian Jackson.