GNU bug report logs -
#70007
[PATCH] native JSON encoder
Message #23 received at 70007 <at> debbugs.gnu.org (full text, mbox):
> From: Mattias Engdegård <mattias.engdegard <at> gmail.com>
> Date: Wed, 27 Mar 2024 19:57:24 +0100
> Cc: Yuan Fu <casouri <at> gmail.com>,
> 70007 <at> debbugs.gnu.org
>
> Eli, thank you for your comments!
Thanks for working on this in the first place.
> > This rejects unibyte non-ASCII strings, AFAIU, in which case I suggest
> > to think whether we really want that. E.g., why is it wrong to encode
> > a string to UTF-8, and then send it to JSON?
>
> The way I see it, that would break the JSON abstraction: it transports strings of Unicode characters, not strings of bytes.
What's the difference? AFAIU, JSON expects UTF-8 encoded strings, and
whether that is used as a sequence of bytes or a sequence of
characters is in the eyes of the beholder: the bytestream is the same,
only the interpretation changes. So I'm not sure I understand how
this would break the assumption.
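For concreteness, the two views can be illustrated outside Emacs; a hedged sketch in Python (an analogy only — the behaviour under discussion is that of the proposed Emacs `json-serialize`, which this does not show):

```python
import json

s = "å"                 # a string of characters (one code point, U+00E5)
b = s.encode("utf-8")   # the same text as raw UTF-8 bytes: b'\xc3\xa5'

# A character string is accepted by the serializer:
print(json.dumps(s, ensure_ascii=False))   # prints "å"

# An already-encoded byte string is rejected, mirroring how the
# proposed patch rejects unibyte non-ASCII strings:
try:
    json.dumps(b)
except TypeError:
    print("byte strings rejected")
```

The byte stream written to the wire is identical in both views; only whether the caller or the serializer performed the UTF-8 encoding differs.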
> A user who for some reason has a string of bytes that encode Unicode characters can just decode it in order to prove it to us. It's not the JSON encoder's job to decode the user's strings.
I didn't suggest to decode the input string, not at all. I suggested
to allow unibyte strings, and process them just like you process
pure-ASCII strings, leaving it to the caller to make sure the string
has only valid UTF-8 sequences. Forcing callers to decode such
strings is IMO too harsh and largely unjustified.
> (It would also be a pain to deal with and risks slowing down the string serialiser even if it's a case that never happens.)
I don't understand why. Once again, I'm just talking about passing
the bytes through as you do with ASCII characters.
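A minimal sketch of that pass-through idea, again in Python rather than the C of the actual patch (`serialize_string` is a hypothetical helper, and escaping of `"` and `\` is elided for brevity):

```python
def serialize_string(data: bytes) -> bytes:
    """Sketch of the suggestion: emit the bytes of an already-encoded
    string verbatim, exactly as ASCII bytes are emitted, leaving UTF-8
    validity to the caller. Not the patch's actual implementation."""
    out = bytearray(b'"')
    for byte in data:
        # ASCII and non-ASCII bytes alike are copied through unchanged.
        out.append(byte)
    out.append(ord('"'))
    return bytes(out)

serialize_string("å".encode("utf-8"))  # b'"\xc3\xa5"'
```

If the caller's bytes are valid UTF-8, the output is valid JSON, and the serializer never needed to decode anything.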