GNU bug report logs -
#70007
[PATCH] native JSON encoder
On 26 Mar 2024 at 17:46, Eli Zaretskii <eliz <at> gnu.org> wrote:
>> - The old code incorrectly accepted strings with non-Unicode characters (raw bytes). There is no reason to do this; JSON is UTF-8 only.
>
> Would it complicate the code not to reject raw bytes? I'd like to
> avoid incompatibilities if it's practical. Also, Emacs traditionally
> doesn't reject raw bytes, leaving that to the application or the user.
Actually, I may have misrepresented the behaviour of the old encoder. It doesn't accept arbitrary raw bytes, only byte sequences that happen to form valid UTF-8. This is quite strange, and I don't think it was ever intended; it's just a consequence of the implementation.
This means that it accepts an already encoded unibyte UTF-8 string:
(json-serialize "\303\251") -> "\"é\""
which is doubly odd, since the function is supposed to be encoding but ends up decoding the characters instead.
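To see why "\303\251" slips through, here is a small Python sketch of the byte-level view (Python standing in for the Emacs internals, which aren't shown in this thread):

```python
# The octal escapes \303 \251 denote the two bytes 0xC3 0xA9,
# which happen to be exactly the valid UTF-8 encoding of U+00E9 (é).
raw = bytes([0o303, 0o251])      # b"\xc3\xa9"

# The old encoder effectively performed this decode step while
# it was supposed to be *encoding*:
decoded = raw.decode("utf-8")
print(decoded)                   # prints: é
```

A unibyte string whose bytes don't form valid UTF-8 (e.g. a lone 0xC3) would not be accepted, which is why the behaviour only shows up for already-encoded strings.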
Even worse, it accepts mixtures of encoded and decoded chars:
(json-serialize "é\303\251") -> "\"éé\""
which is just bonkers.
So while we could try to replicate this 'interesting' behaviour, it would definitely complicate the code and be of questionable use.
The JSON spec is quite clear that it's UTF-8 only. The only useful deviation that I can think of would be to allow unpaired surrogates (WTF-8) to pass through for transmission of Windows file names, but that would be an extension -- the old encoder doesn't permit those.
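For illustration, here is a hedged Python sketch of what letting unpaired surrogates pass through (WTF-8) would mean. The surrogatepass error handler is a Python facility used purely to demonstrate the byte pattern; it is not part of the old encoder or the proposed patch:

```python
lone = "\ud83d"                  # an unpaired high surrogate

# Strict UTF-8 rejects lone surrogates outright:
try:
    lone.encode("utf-8")
    raise AssertionError("strict UTF-8 should have rejected this")
except UnicodeEncodeError:
    pass

# WTF-8 instead encodes the surrogate with the generalized
# UTF-8 three-byte pattern, yielding 0xED 0xA0 0xBD:
wtf8 = lone.encode("utf-8", "surrogatepass")
print(wtf8)                      # prints: b'\xed\xa0\xbd'
```

Windows file names are sequences of 16-bit units that need not be valid UTF-16, which is why such a pass-through extension could be useful for transmitting them.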