GNU bug report logs - #70007
[PATCH] native JSON encoder


Package: emacs;

Reported by: Mattias Engdegård <mattias.engdegard <at> gmail.com>

Date: Tue, 26 Mar 2024 15:35:01 UTC

Severity: normal

Tags: patch

Done: Mattias Engdegård <mattias.engdegard <at> gmail.com>

Bug is archived. No further changes may be made.



Message #23 received at 70007 <at> debbugs.gnu.org:

From: Eli Zaretskii <eliz <at> gnu.org>
To: Mattias Engdegård <mattias.engdegard <at> gmail.com>
Cc: casouri <at> gmail.com, 70007 <at> debbugs.gnu.org
Subject: Re: bug#70007: [PATCH] native JSON encoder
Date: Wed, 27 Mar 2024 21:05:54 +0200
> From: Mattias Engdegård <mattias.engdegard <at> gmail.com>
> Date: Wed, 27 Mar 2024 19:57:24 +0100
> Cc: Yuan Fu <casouri <at> gmail.com>,
>  70007 <at> debbugs.gnu.org
> 
> Eli, thank you for your comments!

Thanks for working on this in the first place.

> > This rejects unibyte non-ASCII strings, AFAIU, in which case I suggest
> > to think whether we really want that.  E.g., why is it wrong to encode
> > a string to UTF-8, and then send it to JSON?
> 
> The way I see it, that would break the JSON abstraction: it transports strings of Unicode characters, not strings of bytes.

What's the difference?  AFAIU, JSON expects UTF-8 encoded strings, and
whether that is used as a sequence of bytes or a sequence of
characters is in the eyes of the beholder: the bytestream is the same,
only the interpretation changes.  So I'm not sure I understand how
this would break the assumption.

> A user who for some reason has a string of bytes that encode Unicode characters can just decode it in order to prove it to us. It's not the JSON encoder's job to decode the user's strings.

I didn't suggest to decode the input string, not at all.  I suggested
to allow unibyte strings, and process them just like you process
pure-ASCII strings, leaving it to the caller to make sure the string
has only valid UTF-8 sequences.  Forcing callers to decode such
strings is IMO too harsh and largely unjustified.
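Eli's proposal moves responsibility for validity to the caller. To make concrete what "only valid UTF-8 sequences" entails, here is a minimal C sketch of a strict UTF-8 check following RFC 3629 (no overlong forms, no UTF-16 surrogates, nothing above U+10FFFF). The function name `utf8_valid` is hypothetical; it is not taken from the patch or from the Emacs sources.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper (not part of the patch): return true if s[0..n)
   is well-formed UTF-8 per RFC 3629.  */
static bool
utf8_valid (const unsigned char *s, size_t n)
{
  size_t i = 0;
  while (i < n)
    {
      unsigned char c = s[i];
      size_t len;
      unsigned long cp;
      if (c < 0x80)
        { i++; continue; }                 /* plain ASCII byte */
      else if (c >= 0xC2 && c <= 0xDF)
        { len = 2; cp = c & 0x1F; }
      else if (c >= 0xE0 && c <= 0xEF)
        { len = 3; cp = c & 0x0F; }
      else if (c >= 0xF0 && c <= 0xF4)
        { len = 4; cp = c & 0x07; }
      else
        return false;   /* 0x80..0xC1 and 0xF5..0xFF never start a sequence */
      if (n - i < len)
        return false;   /* truncated sequence */
      for (size_t k = 1; k < len; k++)
        {
          if ((s[i + k] & 0xC0) != 0x80)
            return false;   /* expected a continuation byte */
          cp = (cp << 6) | (s[i + k] & 0x3F);
        }
      /* Reject overlong 3- and 4-byte forms, surrogates, and values
         beyond U+10FFFF.  */
      if ((len == 3 && cp < 0x800)
          || (len == 4 && (cp < 0x10000 || cp > 0x10FFFF))
          || (cp >= 0xD800 && cp <= 0xDFFF))
        return false;
      i += len;
    }
  return true;
}
```

Running a pass like this over every unibyte string is presumably the kind of extra cost Mattias is wary of adding to the string serializer.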

> (It would also be a pain to deal with and risks slowing down the string serialiser even if it's a case that never happens.)

I don't understand why.  Once again, I'm just talking about passing
the bytes through as you do with ASCII characters.
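A minimal sketch of the pass-through Eli describes, assuming a byte-oriented serializer that writes into a caller-supplied buffer. JSON (RFC 8259) requires escaping only the quotation mark, the backslash, and U+0000 through U+001F, so every other byte, including each byte of a multibyte UTF-8 sequence (all of which are 0x80 or above), can be copied unchanged. The name `json_escape_bytes` and its buffer contract are hypothetical, not the patch's actual code.

```c
#include <stddef.h>

/* Hypothetical sketch: write S[0..N) to OUT as a NUL-terminated JSON
   string literal, escaping only what JSON requires and copying all other
   bytes (ASCII or not) verbatim.  The input is NOT checked for valid
   UTF-8; that is left to the caller.  OUT needs room for 6 * N + 3 bytes
   (worst case: every byte becomes a six-byte \u00XX escape, plus two
   quotes and the terminating NUL).  Returns the length excluding NUL.  */
static size_t
json_escape_bytes (const unsigned char *s, size_t n, char *out)
{
  static const char hex[] = "0123456789abcdef";
  size_t o = 0;
  out[o++] = '"';
  for (size_t i = 0; i < n; i++)
    {
      unsigned char c = s[i];
      if (c == '"' || c == '\\')
        {
          out[o++] = '\\';
          out[o++] = c;
        }
      else if (c < 0x20)
        {
          /* Generic \u00XX form; a real encoder may prefer \n, \t, etc.  */
          out[o++] = '\\';
          out[o++] = 'u';
          out[o++] = '0';
          out[o++] = '0';
          out[o++] = hex[c >> 4];
          out[o++] = hex[c & 0xF];
        }
      else
        out[o++] = c;   /* ASCII and UTF-8 bytes take the same path */
    }
  out[o++] = '"';
  out[o] = '\0';
  return o;
}
```

Because the non-ASCII bytes need no special handling, unibyte and pure-ASCII input share one loop; whether the encoder should additionally validate those bytes is the point under discussion.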






