#30076 - [PATCH] web: Recognize JSON content type as text.

GNU bug report logs - #30076
[PATCH] web: Recognize JSON content type as text.

Package: guile;

Reported by: Arun Isaac <arunisaac <at> systemreboot.net>

Date: Thu, 11 Jan 2018 05:33:01 UTC

Severity: normal

Tags: patch


From: Mark H Weaver <mhw <at> netris.org>
To: Arun Isaac <arunisaac <at> systemreboot.net>
Cc: 30076 <at> debbugs.gnu.org
Subject: bug#30076: [PATCH] web: Recognize JSON content type as text.
Date: Tue, 30 Jan 2018 22:31:04 -0500

Hi Arun,

Arun Isaac <arunisaac <at> systemreboot.net> writes:

> * module/web/response.scm (text-content-type?): Recognize JSON content
> type as text.

While this would seem reasonable at first glance, it seems to me that
this will result in JSON texts with non-ASCII characters being
mishandled in many cases.

Within Guile, 'text-content-type?' is currently used in two places:

* 'decode-response-body' in (web client), and
* 'response-body-port' in (web response).

In both places, if 'text-content-type?' returns true, the encoding of
the response is assumed to be "ISO-8859-1" if not otherwise specified by
an explicit 'charset' parameter. This is what RFC 2616 specifies for
text/plain, although RFC 6657 would change the default to US-ASCII, as
it was in RFC 2046, and maybe we should look into that.

However, things are quite different for the application/json MIME type,
as specified in RFCs 4627 and 7159. Those RFCs specify that JSON text
"SHALL" (i.e. MUST) be encoded in Unicode (UTF-8, UTF-16 or UTF-32),
that the default encoding is UTF-8, and furthermore that no charset
parameter is defined for application/json. So, we can expect at least
some conforming implementations to omit the 'charset' parameter, and yet
in that case we must assume that the encoding is Unicode, and most
definitely not ISO-8859-1.
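The defaulting rules described above can be sketched as follows. This is only an illustration in Python, not Guile's code: the function name 'effective_charset' and its signature are hypothetical, and treating every "text/*" type as ISO-8859-1 by default is a simplification of what 'text-content-type?' actually accepts.

```python
def effective_charset(media_type, charset=None):
    """Return the encoding to assume for a response body.

    Hypothetical helper: an explicit 'charset' parameter always wins;
    otherwise the default depends on the media type.
    """
    if charset is not None:
        return charset
    media_type = media_type.lower()
    if media_type == "application/json":
        # RFCs 4627 and 7159: JSON is Unicode, UTF-8 by default.
        return "utf-8"
    if media_type.startswith("text/"):
        # RFC 2616 default for text types (simplified here to all text/*).
        return "iso-8859-1"
    # Anything else is treated as binary in this sketch.
    return None


# What goes wrong if JSON falls through to the text default:
body = '{"name": "é"}'.encode("utf-8")
mishandled = body.decode("iso-8859-1")   # '{"name": "Ã©"}'
correct = body.decode(effective_charset("application/json"))
```

Decoding the UTF-8 body as ISO-8859-1 does not raise an error; it silently yields the wrong characters, which is why the mishandling would be easy to miss.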

RFC 4627 makes the additional interesting observation (in section 3,
"Encoding") that since the first two characters of JSON text will always
be ASCII, and since UTF-8/UTF-16/UTF-32 are the only valid encodings for
JSON text, we can reliably determine the encoding by looking at the
pattern of nul bytes in the first four octets:

  00 00 00 xx  UTF-32BE
  00 xx 00 xx  UTF-16BE
  xx 00 00 00  UTF-32LE
  xx 00 xx 00  UTF-16LE
  xx xx xx xx  UTF-8

Given that any of the encodings above is possible, and that there is no
'charset' parameter defined for "application/json", it seems to me that
we have no choice but to be prepared to auto-detect the encoding, as
described in RFC 4627 section 3, if the 'charset' parameter is missing.

What do you think?

      Mark
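For reference, the nul-byte table from RFC 4627 section 3 quoted in the message can be sketched as below. This is a Python illustration rather than a proposed Guile patch; the name 'detect_json_encoding' is hypothetical, and the handling of inputs shorter than four octets and of patterns outside the table is an assumption of this sketch, not something the message specifies.

```python
def detect_json_encoding(octets):
    """Guess the Unicode encoding of JSON text from its first four octets.

    Hypothetical helper following the table in RFC 4627 section 3.
    Returns a Python codec name.
    """
    if len(octets) < 4:
        # Assumption: an RFC 4627 JSON text is an object or array, so
        # anything shorter than four octets can only be UTF-8.
        return "utf-8"
    # True where the octet is nul, False otherwise.
    nul = tuple(b == 0 for b in octets[:4])
    table = {
        (True, True, True, False): "utf-32-be",    # 00 00 00 xx
        (True, False, True, False): "utf-16-be",   # 00 xx 00 xx
        (False, True, True, True): "utf-32-le",    # xx 00 00 00
        (False, True, False, True): "utf-16-le",   # xx 00 xx 00
        (False, False, False, False): "utf-8",     # xx xx xx xx
    }
    if nul not in table:
        # Assumption: reject patterns the RFC table does not list.
        raise ValueError("not a valid JSON text encoding pattern")
    return table[nul]


# Round trip: encode a small JSON text each way and detect it again.
sample = '{"name": "é"}'
detected = {
    enc: detect_json_encoding(sample.encode(enc))
    for enc in ("utf-8", "utf-16-be", "utf-16-le", "utf-32-be", "utf-32-le")
}
```

The explicit "-be"/"-le" codec names are used so that Python does not prepend a byte order mark, which would change the first four octets.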

This bug report was last modified 7 years and 191 days ago.

GNU bug tracking system
Copyright (C) 1999 Darren O. Benham, 1997,2003 nCipher Corporation Ltd, 1994-97 Ian Jackson.
