GNU bug report logs - #51838
[PATCH 00/11] guix: node-build-system: Support compiling add-ons with node-gyp.

Previous Next

Package: guix-patches;

Reported by: Philip McGrath <philip <at> philipmcgrath.com>

Date: Sun, 14 Nov 2021 12:43:01 UTC

Severity: normal

Tags: patch

Done: Liliana Marie Prikler <liliana.prikler <at> gmail.com>

Bug is archived. No further changes may be made.

Full log


View this message in rfc822 format

From: Philip McGrath <philip <at> philipmcgrath.com>
To: Liliana Marie Prikler <liliana.prikler <at> gmail.com>, 51838 <at> debbugs.gnu.org
Cc: Timothy Sample <samplet <at> ngyro.com>, Pierre Langlois <pierre.langlois <at> gmx.com>, Jelle Licht <jlicht <at> fsfe.org>, Leo Famulari <leo <at> famulari.name>
Subject: [bug#51838] [PATCH v6 03/41] guix: node-build-system: Add JSON utilities.
Date: Fri, 7 Jan 2022 23:13:54 -0500
Hi,

On 12/31/21 05:18, Liliana Marie Prikler wrote:
> Am Freitag, dem 31.12.2021 um 00:22 -0500 schrieb Philip McGrath:
>>>> +(define (alist-pop alist key)
>>>> +  "Return two values: the first pair in ALIST with the given KEY
>>>> in its
>>>> +'car' (or #f, if no such pair exists) and an assosciation list
>>>> like (and
>>>> +potentially sharing storage with) ALIST, but with no entry for
>>>> KEY."
>>>> +  (match (assoc key alist)
>>>> +    ;; If key isn't present, we don't need to do any allocation
>>>> +    (#f
>>>> +     (values #f alist))
>>>> +    (found
>>>> +     (values found
>>>> +             ;; Because we have `found`, we can find it more
>>>> +             ;; efficiently this time with `eq?`. We avoid using
>>>> +             ;; `delq` because it would copy pairs in a shared
>>>> +             ;; tail. We assume a sufficiently smart compiler to
>>>> +             ;; handle "tail recursion modulo cons" (vid. e.g.
>>>> Indiana
>>>> +             ;; University Technical Report No. 19, Friedman &
>>>> Wise
>>>> +             ;; 1975) at least as efficiently as a hand-written
>>>> +             ;; tail-recursive implementation with an
>>>> accumulator.
>>>> +             (let loop ((alist alist))
>>>> +               (match alist
>>>> +                 ;; We know that `found` is present,
>>>> +                 ;; so no need to check for '()
>>>> +                 ((this . alist)
>>>> +                  (if (eq? this found)
>>>> +                      alist
>>>> +                      (cons this (loop alist))))))))))
>>> I think this can be more efficiently be done in a "single" loop.
>>>
>>>     (let loop ((rest alist)
>>>                (previous '()))
>>>       (match rest
>>>         (() (values #f alist))
>>>         ((first . rest)
>>>          (if (eq? (car first) key)
>>>              (values first (reverse! previous rest))
>>>              (loop rest (cons first previous))))))
>>>
>>
>> I'll admit to a Racket bias, but, having just eliminated the use of
>> 'assoc-set!', I'm loathe to start mutating pairs (even correctly). To
>> quote a bit from the SRFI-1 spec for 'append-reverse!', "note that
>> this pattern of iterative computation followed by a reverse can
>> frequently be rewritten as a recursion, dispensing with the reverse
>> and append-reverse steps, and shifting temporary, intermediate
>> storage from the heap to the stack, which is typically a win for
>> reasons of cache locality and eager storage reclamation." (See how
>> 'set-cdr!' can crash safe Chez Scheme!
>> <https://github.com/cisco/ChezScheme/issues/599>)
>>
>> IIUC, using SRFI-1's 'span' would lead to the same situation.
> For the record, we can use the non-destructive append and reverse here
> at the expense of more copying.  If done in terms of SRFI-1 span, we
> would not need reverse as far as I understand.
> 
>>> Also, I don't think your version is tail-recursive.  (loop alist)
>>> is not in tail position from what I can tell.
>>
>> Yes, "tail recursion modulo cons" refers to a compiler optimization
>> for functions which are _not_ tail recursive. For full details, see
>> the Friedman & Wise 1975 tech report I cited at
>> <https://legacy.cs.indiana.edu/ftp/techreports/TR19.pdf> (or various
>> other articles), but, as briefly as I can: The optimization rests on
>> the observation that many recursive functions, like the classic
>> definition of 'map':
>>
>>       (define (map f lst)
>>         (match lst
>>           (()
>>            '())
>>           ((this . lst)
>>            (cons (f this)
>>                  (map f lst)))))
>>
>> are nearly tail-recursive, and the only real work remaining to be
>> done in the continuation of the recursive call is to fill in the cdr
>> of the pair. Thus, a compiler can safely transform this code into a
>> truly tail-recursive implementation:
>>
>>       (define (map f lst)
>>         (match lst
>>           (()
>>            '())
>>           ((this . lst)
>>            (define ret (list (f this)))
>>            (let loop ((dest ret)
>>                       (lst lst))
>>              (match lst
>>                ((this . lst)
>>                 (define new (list (f this)))
>>                 (set-cdr! dest new)
>>                 (loop new lst))
>>                (()
>>                 ret))))))
>>
>> Unlike the Proper Implementation of Tail Calls (so-called "tail-call
>> optimization"), handling "tail recursion modulo cons" truly is an
>> optimization: it does not change the space complexity of the
>> function. But it can allow the compiler to generate whatever code it
>> thinks will work best with its collector/allocator and
>> continuation/"call stack" implementation.
>>
>> (The optimizations applies to constructors in general, not just
>> 'cons', and a compiler can safely apply it to values that are
>> immutable from the perspective of the source language.)
> I'm not aware to which extent Guile implements tail recursion modulo
> cons and I'd argue neither are you until you dig down into disassembly.
> I think it's better here to avoid patterns from Racket that would feel
> foreign to Guilers, particularly if you have to explain them with
> reference to a paper (we already get hate for referring to Wingo's fold
> for XML handling).

In a sense, "tail recursion modulo cons" was a red herring here. The 
essential requirement for implementing 'alist-pop' or 'map' as I did is 
that the language implementation be "safe for space", i.e. not have 
"stack overflow"s: Guile meets that requirement. [1]

In a safe-for-space language, the naturally recursive implementations 
and the implementations with explicit, non-destructive accumulators both 
allocate O(n) temporary storage. The difference is that the explicit 
accumulator versions allocate temporary pairs on the heap, while the 
naturally recursive version allocates its temporary space on the "stack" 
(i.e. additional frames of the (non-reified) continuation), which is 
generally, and specifically for Guile per [1], much better (though a 
sufficiently smart generational garbage collector with bump-pointer 
allocation in the nursery could mitigate the difference somewhat).

All of that relies just on the guarantees of Guile as a safe-for-space 
language. The optimization in "tail recursion modulo cons" is that a 
compiler could, if it chose to expend its effort this way, make the 
naturally recursive implementations work without the O(n) temporary 
"stack" storage by transforming transforming the non-tail recursion into 
tail recursion. In essence, it could achieve a similar effect to an 
explicit accumulator plus 'reverse!' without the many downsides (some of 
which [1] discusses).

But the naturally recursive implementation is preferable even if the 
optimization does not apply.

> 
> In principle, what you're supposing is that a sufficiently smart
> compiler could rewrite
> 
>    (let ((before after (span PRED mylist))) (append before after))
> 
> to (list-copy mylist), which as far as I'm aware Guile currently
> doesn't.  It could be argued that it would start doing so once I cast
> some magic incantations, but I wouldn't count on it without reading the
> disassembly.

In some sense that's true, but your example would require a lot of 
interprocedural analysis, not just a directly visible pattern with 
well-known primitives using analysis that has been well known since the 
'70s. But, again, the optimization isn't really relevant.
>>> Is order relevant here?  Because we could just as well reimplement
>>> our alist-delete* loop and cons the replacement onto the rest.
>>> WDYT?
>>
>> Relying on order for JSON objects is non-interoperable, per RFC 8259
>> §4. I'm not intending for these alist procedures to be exported, so
>> I'm not trying to handle any more general case than that, as I
>> explain in the comments at the top of the file.
>>
>> I'm not sure what the advantage would be to reimplementing the
>> 'alist-delete' loop here.
> Fair enough, the question was however not so much what is required per
> RFC, but rather if there is a natural feel of order to package.json
> that we ought not disturb.  Particularly, putting dependencies before
> name and version could be confusing to whoever needs to debug a delete-
> dependencies phase gone wrong.

I haven't noticed a consistent convention in "package.json" files (which 
IIUC may not be entirely hand-written).

For debugging, the biggest problem is that (guix build json) doesn't add 
any linebreaks or indentation.

If I were changing it, I'd want it to write object keys in 'string<?' 
order and to raise an exception if given duplicate keys.

-Philip

[1]: 
https://www.gnu.org/software/guile/docs/docs-2.2/guile-ref/Stack-Overflow.html




This bug report was last modified 3 years and 195 days ago.

Previous Next


GNU bug tracking system
Copyright (C) 1999 Darren O. Benham, 1997,2003 nCipher Corporation Ltd, 1994-97 Ian Jackson.