GNU bug report logs - #70007
[PATCH] native JSON encoder


Package: emacs;

Reported by: Mattias Engdegård <mattias.engdegard <at> gmail.com>

Date: Tue, 26 Mar 2024 15:35:01 UTC

Severity: normal

Tags: patch

Done: Mattias Engdegård <mattias.engdegard <at> gmail.com>

Bug is archived. No further changes may be made.


From: Mattias Engdegård <mattias.engdegard <at> gmail.com>
To: Eli Zaretskii <eliz <at> gnu.org>
Cc: Yuan Fu <casouri <at> gmail.com>, 70007 <at> debbugs.gnu.org
Subject: bug#70007: [PATCH] native JSON encoder
Date: Wed, 27 Mar 2024 19:57:24 +0100
Eli, thank you for your comments!

27 mars 2024 kl. 18.40 skrev Eli Zaretskii <eliz <at> gnu.org>:

>> +/* JSON encoding context */
> 
> This is not our comment style.

I'll go through the code and clean up all comments.

>> +typedef struct {
>> +  char *buf;
>> +  ptrdiff_t size;		/* number of bytes in buf */
>> +  ptrdiff_t capacity;		/* allocated size of buf */
>> +  ptrdiff_t chars_delta;        /* size - {number of Unicode chars in buf} */
> 
> When you say "Unicode chars", what do you mean? characters or bytes?
> If characters, then why do you need to qualify them with "Unicode"?

Characters. Will clarify.
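For what it's worth, the field can be pinned down precisely: since the buffer holds UTF-8, the byte count minus the character count equals the number of UTF-8 continuation bytes. A hypothetical sketch of that relationship (not the patch's code):

```c
#include <stddef.h>

/* Compute size - {number of characters} for a UTF-8 buffer: every
   continuation byte (0b10xxxxxx) adds one byte without adding a
   character.  Hypothetical helper, not part of the patch.  */
static ptrdiff_t
utf8_chars_delta (const char *buf, ptrdiff_t size)
{
  ptrdiff_t delta = 0;
  for (ptrdiff_t i = 0; i < size; i++)
    if (((unsigned char) buf[i] & 0xC0) == 0x80)
      delta++;
  return delta;
}
```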

>> +  Lisp_Object entries[];
>                        ^^
> Is this portable enough?

Something I'd like to know, too. We rely on C99 in many other aspects. Are there still compilers that are important to us but don't get this right?

10 years ago this was apparently an issue for IBM XL C 12.1, but modern versions are based on Clang. We could take our chances here; obviously we'll change it if someone complains but it seems unlikely. What do you think?
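For reference, the construct in question is a C99 flexible array member, where the trailing array's storage is allocated along with the struct. A generic sketch of the pattern (not Emacs code; the names are made up):

```c
#include <stdlib.h>
#include <stddef.h>

/* Generic C99 flexible-array-member pattern, mirroring the
   Lisp_Object entries[] field under discussion.  */
struct int_vec
{
  size_t count;
  int entries[];   /* flexible array member: must be the last field */
};

static struct int_vec *
int_vec_make (size_t n)
{
  /* Allocate the header plus room for n trailing elements.  */
  struct int_vec *v = malloc (sizeof *v + n * sizeof v->entries[0]);
  if (v)
    v->count = n;
  return v;
}
```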

> I'd add an assertion here that BITS is not large enough to produce zero.

I'll deal with that in some way or another.

> This rejects unibyte non-ASCII strings, AFAIU, in which case I suggest
> to think whether we really want that.  E.g., why is it wrong to encode
> a string to UTF-8, and then send it to JSON?

The way I see it, that would break the JSON abstraction: it transports strings of Unicode characters, not strings of bytes. A user who for some reason has a string of bytes that encode Unicode characters can just decode it first, which also establishes that the bytes really are valid text. It's not the JSON encoder's job to decode the user's strings.

(It would also be a pain to deal with and risks slowing down the string serialiser even if it's a case that never happens.)

> Is JSON unable to handle Inf and NaN?

That's right.
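Indeed, RFC 8259 has no representation for infinities or NaN, so an encoder has to reject non-finite numbers before formatting them. A minimal sketch of such a guard (hypothetical, not the patch's code):

```c
#include <math.h>
#include <stdbool.h>

/* JSON (RFC 8259) can only represent finite numbers, so Inf and
   NaN must be rejected up front.  Hypothetical helper.  */
static bool
json_number_ok (double x)
{
  return isfinite (x);
}
```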

>> +  /* FIXME: should this be a unibyte or multibyte string?
>> +     Right now we make a multibyte string for test compatibility,
>> +     but we are really encoding so unibyte would make more sense.  */
> 
> I indeed think this should be a unibyte string, because otherwise
> writing it to a file or a process will/might encode it, which would be
> wrong.

I would prefer that, too, but used multibyte for compatibility with the old code and so that its tests pass.
It should probably be a separate change if we decide that unibyte is better here.

>> +  json_out_t jo = {
>> +    .maxdepth = 25,
> 
> Is this arbitrary, or is it what JSON expects?  If arbitrary, should
> it be customizable? should it be documented?

It's semi-arbitrary but reasonable: the JSON_checker at json.org uses a maximum depth of 20 by default, and many implementations use its test suite. RFC 8259 states that the maximum depth is implementation-dependent.

It's hardly worth making this into a user-adjustable parameter, but I'll clarify the code.
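For illustration, the guard amounts to a depth counter checked at every nested array or object, along these lines (a sketch under the assumed limit of 25, not the patch itself):

```c
/* Hypothetical depth guard: refuse another nesting level once the
   limit is reached, as the encoder does with maxdepth = 25.  */
enum { MAXDEPTH = 25 };

static int
check_depth (int depth)
{
  /* Return 0 if one more nesting level is allowed, -1 otherwise.  */
  return depth < MAXDEPTH ? 0 : -1;
}
```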

>> +  /* FIXME: Do we really need to do all this work below to insert a string?
>> +     Is there no function already written?  I must be missing something.  */
> 
> There is no function.  All the insert_from_* functions in insdel.c do
> something similar.

Thank you for confirming that. Looks like we could use some abstraction then.







