GNU bug report logs - #70007
[PATCH] native JSON encoder


Package: emacs

Reported by: Mattias Engdegård <mattias.engdegard <at> gmail.com>

Date: Tue, 26 Mar 2024 15:35:01 UTC

Severity: normal

Tags: patch

Done: Mattias Engdegård <mattias.engdegard <at> gmail.com>

Bug is archived. No further changes may be made.


From: Mattias Engdegård <mattias.engdegard <at> gmail.com>
To: Eli Zaretskii <eliz <at> gnu.org>
Cc: 70007 <at> debbugs.gnu.org
Subject: bug#70007: [PATCH] native JSON encoder
Date: Wed, 27 Mar 2024 13:46:17 +0100
On 26 March 2024 at 17:46, Eli Zaretskii <eliz <at> gnu.org> wrote:

>> - The old code incorrectly accepted strings with non-Unicode characters (raw bytes). There is no reason to do this; JSON is UTF-8 only.
> 
> Would it complicate the code not to reject raw bytes?  I'd like to
> avoid incompatibilities if it's practical.  Also, Emacs traditionally
> doesn't reject raw bytes, leaving that to the application or the user.

Actually I may have misrepresented the behaviour of the old encoder. It doesn't accept arbitrary raw bytes, only byte sequences that happen to form valid UTF-8. It's quite strange, and I don't think this was ever intended; it is just a consequence of the implementation.

This means that it accepts an already encoded unibyte UTF-8 string:

  (json-serialize "\303\251") -> "\"é\""

which is doubly odd since it's supposed to be encoding, but it ends up decoding the characters instead.
Even worse, it accepts mixtures of encoded and decoded chars:

  (json-serialize "é\303\251") -> "\"éé\""

which is just bonkers.
So while we could try to replicate this 'interesting' behaviour it would definitely complicate the code and be of questionable use.
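
As a rough illustration of the behaviour described above, here is a toy model in Python. It is not the Emacs implementation: the list-of-items string representation and the names `lenient_serialize` and `strict_serialize` are assumptions made up for this sketch. The lenient variant flattens everything to bytes and accepts the result whenever it happens to be valid UTF-8, which reproduces both examples; the strict variant rejects any raw byte.

```python
import json

# Toy model (hypothetical): a "string" is a list whose items are either
# a Unicode character (a str of length 1) or a raw byte (an int, 0x80..0xFF).

def lenient_serialize(items):
    """Flatten to bytes; accept if the bytes happen to form valid UTF-8."""
    raw = b"".join(
        bytes([x]) if isinstance(x, int) else x.encode("utf-8")
        for x in items
    )
    text = raw.decode("utf-8")  # raises UnicodeDecodeError on invalid input
    return json.dumps(text, ensure_ascii=False)

def strict_serialize(items):
    """Reject any raw byte; only real Unicode characters are allowed."""
    if any(isinstance(x, int) for x in items):
        raise ValueError("raw byte in string")
    return json.dumps("".join(items), ensure_ascii=False)

# Already-encoded bytes for "é" are silently decoded by the lenient model.
print(lenient_serialize([0xC3, 0xA9]))
# A mixture of a decoded character and encoded bytes is accepted too.
print(lenient_serialize(["é", 0xC3, 0xA9]))
```

In this model a lone `0xC3` is still rejected by the lenient variant, matching the observation that only byte sequences forming valid UTF-8 get through.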

The JSON spec is quite clear that it's UTF-8 only. The only useful deviation that I can think of would be to allow unpaired surrogates (WTF-8) to pass through for transmission of Windows file names, but that would be an extension -- the old encoder doesn't permit those.
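
To make the surrogate point concrete, the following Python sketch shows how a strict UTF-8 encoder treats an unpaired surrogate and how a WTF-8-style extension would let it through. It uses Python's standard `surrogatepass` error handler as a stand-in; this only matches WTF-8 for lone surrogates and says nothing about how Emacs would implement such an extension. The helper name `encode_wtf8_like` is hypothetical.

```python
LONE_SURROGATE = "\ud800"  # an unpaired high surrogate

def encode_strict(text):
    """Strict UTF-8: unpaired surrogates are an error."""
    return text.encode("utf-8")

def encode_wtf8_like(text):
    """Pass lone surrogates through as their generalized 3-byte form.

    Assumption: the input contains no surrogate pairs; real WTF-8
    requires pairs to be encoded as one supplementary code point.
    """
    return text.encode("utf-8", errors="surrogatepass")

try:
    encode_strict(LONE_SURROGATE)
except UnicodeEncodeError as err:
    print("strict encoder rejects it:", err.reason)

encoded = encode_wtf8_like(LONE_SURROGATE)
print(encoded)  # the three bytes ED A0 80
# The same handler decodes the bytes back to the original lone surrogate.
print(encoded.decode("utf-8", errors="surrogatepass") == LONE_SURROGATE)
```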




