GNU bug report logs -
#70007
[PATCH] native JSON encoder
Previous Next
Full log
Message #17 received at 70007 <at> debbugs.gnu.org (full text, mbox):
> From: Mattias Engdegård <mattias.engdegard <at> gmail.com>
> Date: Wed, 27 Mar 2024 16:49:53 +0100
> Cc: 70007 <at> debbugs.gnu.org
>
> Here is an updated patch. It now ignores duplicated keys in objects represented by alists and plists, just like the old encoder. (I didn't include this in the first draft out of fear it would be slow and complicated, but it turned out just to be complicated.)
>
> The performance is still acceptable, which means at least 2x the speed of the Jansson-based encoder.
Thanks. A few initial comments and questions, based on a very cursory
reading.
> +/* JSON encoding context */
This is not our comment style.
> +typedef struct {
> + char *buf;
> + ptrdiff_t size; /* number of bytes in buf */
> + ptrdiff_t capacity; /* allocated size of buf */
> + ptrdiff_t chars_delta; /* size - {number of Unicode chars in buf} */
When you say "Unicode chars", what do you mean? characters or bytes?
If characters, then why do you need to qualify them with "Unicode"?
> +struct symset_tbl
> +{
> + /* Table used by the containing object if any, so that we can easily
> + all tables if an error occurs. */
> + struct symset_tbl *up;
> + /* Table of symbols (2**bits entries), Qunbound where unused. */
> + Lisp_Object entries[];
^^
Is this portable enough?
> +static struct symset_tbl *
> +alloc_symset_table (int bits)
> +{
> + struct symset_tbl *st = xmalloc (sizeof *st + (sizeof *st->entries << bits));
> + int size = 1 << bits;
I'd add an assertion here that BITS is not large enough to produce zero.
> +/* Enlarge the table used by a symset. */
^^
Two spaces there, please.
> +static NO_INLINE void
> +symset_expand (symset_t *ss)
> +{
> + struct symset_tbl *old_table = ss->table;
> + int oldbits = ss->bits;
> + int oldsize = 1 << oldbits;
I'd add an assertion here about the magnitude of BITS.
> + while (p < end)
> + {
> + unsigned char c = *p;
> + if (json_plain_char[c])
> + {
> + json_out_byte (jo, c);
> + p++;
> + }
> + else if (c > 0x7f)
> + {
> + if (STRING_MULTIBYTE (str))
> + {
> + int n;
> + if (c <= 0xc1)
> + string_not_unicode (str);
> + if (c <= 0xdf)
> + n = 2;
> + else if (c <= 0xef)
> + {
> + int v = (((c & 0x0f) << 12)
> + + ((p[1] & 0x3f) << 6) + (p[2] & 0x3f));
> + if (char_surrogate_p (v))
> + string_not_unicode (str);
> + n = 3;
> + }
> + else if (c <= 0xf7)
> + {
> + int v = (((c & 0x07) << 18)
> + + ((p[1] & 0x3f) << 12)
> + + ((p[2] & 0x3f) << 6)
> + + (p[3] & 0x3f));
> + if (v > MAX_UNICODE_CHAR)
> + string_not_unicode (str);
> + n = 4;
> + }
> + else
> + string_not_unicode (str);
> + json_out_str (jo, (const char *)p, n);
> + jo->chars_delta += n - 1;
> + p += n;
> + }
> + else
> + string_not_unicode (str);
This rejects unibyte non-ASCII strings, AFAU, in which case I suggest
to think whether we really want that. E.g., why is it wrong to encode
a string to UTF-8, and then send it to JSON?
> +static void
> +json_out_float (json_out_t *jo, Lisp_Object f)
> +{
> + double x = XFLOAT_DATA (f);
> + if (isinf (x) || isnan (x))
> + signal_error ("not a finite number", f);
Is JSON unable to handle Inf and NaN?
> +static Lisp_Object
> +json_out_string_result (json_out_t *jo)
> +{
> + /* FIXME: should this be a unibyte or multibyte string?
> + Right now we make a multibyte string for test compatibility,
> + but we are really encoding so unibyte would make more sense. */
I indeed think this should be a unibyte string, because otherwise
writing it to a file or a process will/might encode it, which would be
wrong.
> + json_out_t jo = {
> + .maxdepth = 25,
Is this arbitrary, or is it what JSON expects? If arbitrary, should
it be customizable? should it be documented?
> + /* FIXME: Do we really need to do all this work below to insert a string?
> + Is there no function already written? I must be missing something. */
There is no function. All the insert_from_* functions in insdel.c do
something similar.
Btw, shouldn't json-insert call treesit_record_change? Yuan?
This bug report was last modified 249 days ago.
Previous Next
GNU bug tracking system
Copyright (C) 1999 Darren O. Benham,
1997,2003 nCipher Corporation Ltd,
1994-97 Ian Jackson.