GNU bug report logs - #61726
[PATCH] Eglot: Support positionEncoding capability


Package: emacs;

Reported by: Augusto Stoffel <arstoffel <at> gmail.com>

Date: Thu, 23 Feb 2023 08:06:01 UTC

Severity: normal

Tags: patch

Done: João Távora <joaotavora <at> gmail.com>

Bug is archived. No further changes may be made.



Message #137 received at 61726 <at> debbugs.gnu.org (full text, mbox):

From: Eli Zaretskii <eliz <at> gnu.org>
To: Augusto Stoffel <arstoffel <at> gmail.com>
Cc: 61726 <at> debbugs.gnu.org, joaotavora <at> gmail.com
Subject: Re: bug#61726: [PATCH] Eglot: Support positionEncoding capability
Date: Fri, 24 Feb 2023 17:19:20 +0200
> From: Augusto Stoffel <arstoffel <at> gmail.com>
> Cc: João Távora <joaotavora <at> gmail.com>,
>   61726 <at> debbugs.gnu.org
> Date: Fri, 24 Feb 2023 15:45:30 +0100
> 
> On Fri, 24 Feb 2023 at 15:51, Eli Zaretskii wrote:
> 
> > This is a misunderstanding: I didn't mean to say that we should send
> > invalid UTF-8 sequences.  I meant something else.  Quote from the rest
> > of my message:
> >
> >> > and `json-serialize' rightfully emits an error.
> >> 
> >> There's no "rightfully" here.  It's our decision to signal an error in
> >> this case.
> 
> In fact, our decision was to follow the JSON specification, which says
> UTF-8 is the only allowed exchange encoding:
> https://www.rfc-editor.org/rfc/rfc8259#section-8.1
> 
> >> Substituting some innocent character for the unencodable
> >> ones would be an entirely legitimate alternative.
> 
> So actually the answer here is no.  You can save arbitrary bytes in a
> file on your laptop and call it data.json.  But if you pass some data to
> someone else and promise it's in JSON format, then it _must_ be UTF-8
> encoded.

I'm bewildered by my apparent inability to explain what I mean.  So
let me try with an example.  Suppose the buffer text in question is

   abcde\201xyz

where \201 is a raw byte.  I'm saying that, instead of signaling an
error, we could send to the server the string

   abcde xyz

where the \201 byte was replaced by the SPC character.  The latter
string is, of course, a perfectly correct UTF-8 sequence, and so doesn't
violate any specs.

The SPC character as a replacement is, of course, just one example.
We could instead use '?' or U+FFFD REPLACEMENT CHARACTER, or anything
else, and all of those replacements can be encoded in UTF-8 without
any problems.
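
The substitution described above can be sketched in a few lines.  This
is an illustration only, written in Python rather than Emacs Lisp, and
it is not the code of Eglot or `json-serialize'.  The function name
`sanitize' and its `replacement' parameter are hypothetical.  It assumes
the input arrives as a byte string in which \201 (octal, i.e. 0x81) is
a stray byte that is not valid UTF-8 on its own.

```python
import json


def sanitize(data: bytes, replacement: str = "\ufffd") -> str:
    """Decode DATA as UTF-8, substituting REPLACEMENT for each invalid run.

    Hypothetical helper for illustration; each undecodable run of bytes
    reported by the decoder becomes one copy of REPLACEMENT.
    """
    pieces = []
    pos = 0
    while pos < len(data):
        chunk = data[pos:]
        try:
            pieces.append(chunk.decode("utf-8"))
            break
        except UnicodeDecodeError as err:
            # err.start and err.end are offsets into CHUNK, not into DATA.
            pieces.append(chunk[:err.start].decode("utf-8"))
            pieces.append(replacement)
            pos += err.end
    return "".join(pieces)


raw = b"abcde\x81xyz"  # \201 octal == 0x81, a lone continuation byte

as_space = sanitize(raw, " ")
as_question = sanitize(raw, "?")
as_fffd = sanitize(raw)  # U+FFFD REPLACEMENT CHARACTER

# Every variant serializes to JSON and encodes to UTF-8 without error.
payload = json.dumps({"text": as_space}).encode("utf-8")
```

Whichever replacement is chosen, the resulting string contains only
characters that UTF-8 can encode, so the serialized payload stays within
what RFC 8259 section 8.1 requires.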

Did I make myself clear now?




This bug report was last modified 2 years and 139 days ago.


