From: Helmut Eller
To: bug-gnu-emacs@gnu.org
Subject: 26.0.50; JSON strings with utf-16 escape codes
Date: Mon, 24 Oct 2016 20:06:18 +0200

json-read-from-string doesn't parse strings correctly if the \u
syntax is used to write UTF-16 surrogates:

  (equal (json-read-from-string "\"\\uD834\\uDD1E\"") "\U0001D11E")
  => nil

The correct result is t.  To quote RFC 7159[*]:

   To escape an extended character that is not in the Basic Multilingual
   Plane, the character is represented as a 12-character sequence,
   encoding the UTF-16 surrogate pair.  So, for example, a string
   containing only the G clef character (U+1D11E) may be represented as
   "\uD834\uDD1E".
[*] https://tools.ietf.org/html/rfc7159#section-7


In GNU Emacs 26.0.50.2 (x86_64-unknown-linux-gnu, GTK+ Version 3.14.5)
 of 2016-10-24 built on caladan
Repository revision: 26ccd19269c040ad5960a7567aa5fc88f142c709
Windowing system distributor 'The X.Org Foundation', version 11.0.11604000
System Description: Debian GNU/Linux 8.5 (jessie)

Configured using:
 'configure --with-xpm=no --with-jpeg=no --with-gif=no --with-tiff=no'

Configured features:
PNG SOUND DBUS GSETTINGS NOTIFY GNUTLS LIBXML2 FREETYPE XFT ZLIB
TOOLKIT_SCROLL_BARS GTK3 X11

Important settings:
  value of $LANG: C.UTF-8
  locale-coding-system: utf-8-unix
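As a worked illustration of the RFC 7159 rule quoted in the report, the following Emacs Lisp sketch computes the 12-character escape for a code point outside the Basic Multilingual Plane. The function name `my-json-escape-non-bmp` is hypothetical and not part of json.el; the sketch assumes Emacs 24.4 or later, where `<=` accepts more than two arguments.

```elisp
(require 'cl-lib)

;; Hypothetical helper, not part of json.el: build the "\uHHHH\uLLLL"
;; escape that RFC 7159, section 7, describes for non-BMP characters.
(defun my-json-escape-non-bmp (char)
  "Return the 12-character JSON escape for CHAR, a non-BMP code point."
  (cl-assert (<= #x10000 char #x10FFFF))
  (let* ((offset (- char #x10000))               ; a 20-bit value
         (high (+ #xD800 (ash offset -10)))       ; upper 10 bits
         (low (+ #xDC00 (logand offset #x3FF))))  ; lower 10 bits
    (format "\\u%04X\\u%04X" high low)))

(my-json-escape-non-bmp #x1D11E)  ; G clef => "\\uD834\\uDD1E"
```

The high surrogate carries the upper ten bits of the offset and the low surrogate the lower ten, which is why a valid pair always starts with D8-DB followed by DC-DF.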

From: Philipp Stephani
To: Helmut Eller, 24784@debbugs.gnu.org
Subject: Re: bug#24784: 26.0.50; JSON strings with utf-16 escape codes
Date: Mon, 24 Oct 2016 19:57:19 +0000

Helmut Eller wrote on Mon, 24 Oct 2016 at 20:58:

>
> json-read-from-string doesn't parse strings correctly if the \u
> syntax is used to write UTF-16 surrogates:
>
>  (equal (json-read-from-string "\"\\uD834\\uDD1E\"") "\U0001D11E")
>  => nil
>
> The correct result is t.  To quote RFC 7159[*]:
>
>    To escape an extended character that is not in the Basic Multilingual
>    Plane, the character is represented as a 12-character sequence,
>    encoding the UTF-16 surrogate pair.  So, for example, a string
>    containing only the G clef character (U+1D11E) may be represented as
>    "\uD834\uDD1E".
>
> [*] https://tools.ietf.org/html/rfc7159#section-7
>

Thanks for reporting, I've attached a patch.

[Attachment: 0001-Fix-encoding-of-JSON-surrogate-pairs.txt]

From 6c630bd5b001243d6b7115380088909a7a180ddb Mon Sep 17 00:00:00 2001
From: Philipp Stephani <phst@google.com>
Date: Mon, 24 Oct 2016 21:54:51 +0200
Subject: [PATCH] Fix encoding of JSON surrogate pairs

JSON requires that such pairs be treated as UTF-16 surrogate pairs, not
individual code points; cf. Bug #24784.

* lisp/json.el (json-read-escaped-char): Fix decoding of surrogate
pairs.
(json--decode-utf-16-surrogates): New defsubst.

* test/lisp/json-tests.el (test-json-read-string): Add test for
surrogate pairs.
(test-json-encode-string): Add test for non-BMP character encoding.
---
 lisp/json.el            | 13 +++++++++++++
 test/lisp/json-tests.el |  7 +++++--
 2 files changed, 18 insertions(+), 2 deletions(-)

diff --git a/lisp/json.el b/lisp/json.el
index fdac8d9..5bfdfd4 100644
--- a/lisp/json.el
+++ b/lisp/json.el
@@ -363,6 +363,10 @@ json-special-chars
 
 ;; String parsing
 
+(defsubst json--decode-utf-16-surrogates (high low)
+  "Return the code point represented by the UTF-16 surrogates HIGH and LOW."
+  (+ (lsh (- high #xD800) 10) (- low #xDC00) #x10000))
+
 (defun json-read-escaped-char ()
   "Read the JSON string escaped character at point."
   ;; Skip over the '\'
@@ -372,6 +376,15 @@ json-read-escaped-char
     (cond
      (special (cdr special))
      ((not (eq char ?u)) char)
+     ;; Special-case UTF-16 surrogate pairs,
+     ;; cf. https://tools.ietf.org/html/rfc7159#section-7
+     ((looking-at
+       (rx (group (any "Dd") (any "89ABab") (= 2 (any "0-9A-Fa-f")))
+           "\\u" (group (any "Dd") (any "C-Fc-f") (= 2 (any "0-9A-Fa-f")))))
+      (json-advance 10)
+      (json--decode-utf-16-surrogates
+       (string-to-number (match-string 1) 16)
+       (string-to-number (match-string 2) 16)))
      ((looking-at "[0-9A-Fa-f][0-9A-Fa-f][0-9A-Fa-f][0-9A-Fa-f]")
       (let ((hex (match-string 0)))
         (json-advance 4)
diff --git a/test/lisp/json-tests.el b/test/lisp/json-tests.el
index 78cebb4..8958000 100644
--- a/test/lisp/json-tests.el
+++ b/test/lisp/json-tests.el
@@ -167,14 +167,17 @@ json-tests--with-temp-buffer
     (should (equal (json-read-string) "abcαβγ")))
   (json-tests--with-temp-buffer "\"\\nasd\\u0444\\u044b\\u0432fgh\\t\""
     (should (equal (json-read-string) "\nasdфывfgh\t")))
+  ;; Bug#24784
+  (json-tests--with-temp-buffer "\"\\uD834\\uDD1E\""
+    (should (equal (json-read-string) "\U0001D11E")))
   (json-tests--with-temp-buffer "foo"
     (should-error (json-read-string) :type 'json-string-format)))
 
 (ert-deftest test-json-encode-string ()
   (should (equal (json-encode-string "foo") "\"foo\""))
   (should (equal (json-encode-string "a\n\fb") "\"a\\n\\fb\""))
-  (should (equal (json-encode-string "\nasdфыв\u001f\u007ffgh\t")
-                 "\"\\nasdфыв\\u001f\u007ffgh\\t\"")))
+  (should (equal (json-encode-string "\nasdфыв𝄞\u001f\u007ffgh\t")
+                 "\"\\nasdфыв𝄞\\u001f\u007ffgh\\t\"")))
 
 (ert-deftest test-json-encode-key ()
   (should (equal (json-encode-key "foo") "\"foo\""))
-- 
2.10.1
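A sketch checking that the arithmetic in `json--decode-utf-16-surrogates` inverts the RFC 7159 encoding for every code point outside the BMP. The names `my-decode-surrogates` and `my-surrogate-round-trip-ok-p` are hypothetical; the sketch uses `ash` where the patch uses `lsh`, which gives the same result for the non-negative operands a valid high surrogate produces.

```elisp
(require 'cl-lib)

;; Same formula as json--decode-utf-16-surrogates in the patch.
(defun my-decode-surrogates (high low)
  "Return the code point encoded by the surrogate pair HIGH, LOW."
  (+ (ash (- high #xD800) 10) (- low #xDC00) #x10000))

(defun my-surrogate-round-trip-ok-p ()
  "Return non-nil if every non-BMP code point survives encode, then decode."
  (cl-loop for char from #x10000 to #x10FFFF
           for offset = (- char #x10000)
           always (= char (my-decode-surrogates
                           (+ #xD800 (ash offset -10))
                           (+ #xDC00 (logand offset #x3FF))))))

(my-decode-surrogates #xD834 #xDD1E)  ; => 119070 (#x1D11E)
```

The exhaustive loop covers about a million code points, so it takes a moment when run interpreted.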

From: Dmitry Gutov
To: Philipp Stephani, Helmut Eller, 24784@debbugs.gnu.org
Subject: Re: bug#24784: 26.0.50; JSON strings with utf-16 escape codes
Date: Tue, 25 Oct 2016 02:19:18 +0300

Philipp,

Thanks. Some comments:

On 24.10.2016 22:57, Philipp Stephani wrote:

> +(defsubst json--decode-utf-16-surrogates (high low)

IIRC, there might be no actual benefit from making it a defsubst. If
someone could benchmark it, I'd like to see the result.

> +     ;; Special-case UTF-16 surrogate pairs,
> +     ;; cf. https://tools.ietf.org/html/rfc7159#section-7
> +     ((looking-at
> +       (rx (group (any "Dd") (any "89ABab") (= 2 (any "0-9A-Fa-f")))
> +           "\\u" (group (any "Dd") (any "C-Fc-f") (= 2 (any "0-9A-Fa-f")))))
> +      (json-advance 10)
> +      (json--decode-utf-16-surrogates
> +       (string-to-number (match-string 1) 16)
> +       (string-to-number (match-string 2) 16)))

Shouldn't this go below the UTF-8 case, as the less-frequent one?

>  (ert-deftest test-json-encode-string ()
>    (should (equal (json-encode-string "foo") "\"foo\""))
>    (should (equal (json-encode-string "a\n\fb") "\"a\\n\\fb\""))
> -  (should (equal (json-encode-string "\nasdфыв\u001f\u007ffgh\t")
> -                 "\"\\nasdфыв\\u001f\u007ffgh\\t\"")))
> +  (should (equal (json-encode-string "\nasdфыв𝄞\u001f\u007ffgh\t")
> +                 "\"\\nasdфыв𝄞\\u001f\u007ffgh\\t\"")))

Why are we testing string encoding here?

From: Helmut Eller
To: Dmitry Gutov
Cc: Philipp Stephani, 24784@debbugs.gnu.org
Subject: Re: bug#24784: 26.0.50; JSON strings with utf-16 escape codes
Date: Wed, 26 Oct 2016 18:39:57 +0200

On Tue, Oct 25 2016, Dmitry Gutov wrote:

> On 24.10.2016 22:57, Philipp Stephani wrote:
>
>> +(defsubst json--decode-utf-16-surrogates (high low)
>
> IIRC, there might be no actual benefit from making it a defsubst. If
> someone could benchmark it, I'd like to see the result.

I guess it doesn't hurt, but I also doubt that it makes a measurable
difference, as UTF-16 surrogates are rarely needed.

>
>> +     ;; Special-case UTF-16 surrogate pairs,
>> +     ;; cf. https://tools.ietf.org/html/rfc7159#section-7
>> +     ((looking-at
>> +       (rx (group (any "Dd") (any "89ABab") (= 2 (any "0-9A-Fa-f")))
>> +           "\\u" (group (any "Dd") (any "C-Fc-f") (= 2 (any "0-9A-Fa-f")))))
>> +      (json-advance 10)
>> +      (json--decode-utf-16-surrogates
>> +       (string-to-number (match-string 1) 16)
>> +       (string-to-number (match-string 2) 16)))
>
> Shouldn't this go below the UTF-8 case, as the less-frequent one?

There's also an opportunity to detect unpaired surrogates, e.g.:

(defun json-read-escaped-char ()
  "Read the JSON string escaped character at point."
  ;; Skip over the '\'
  (json-advance)
  (let* ((char (json-pop))
         (special (assq char json-special-chars)))
    (cond
     (special (cdr special))
     ((not (eq char ?u)) char)
     ((looking-at "[0-9A-Fa-f]\\{4\\}")
      (let* ((code (string-to-number (match-string 0) 16)))
        (json-advance 4)
        (cond ((<= #xD800 code #xDBFF) ; UTF-16 high surrogate
               (cond ((looking-at "\\\\u\\([Dd][C-Fc-f][0-9A-Fa-f]\\{2\\}\\)")
                      (let ((low (string-to-number (match-string 1) 16)))
                        (json-advance 6)
                        (json--decode-utf-16-surrogates code low)))
                     (t
                      ;; Expected low surrogate missing
                      (signal 'json-string-escape (list (point))))))
              ((<= #xDC00 code #xDFFF)
               ;; Unexpected low surrogate
               (signal 'json-string-escape (list (point))))
              (t code))))
     (t
      (signal 'json-string-escape (list (point)))))))
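A self-contained sketch of the unpaired-surrogate checks proposed above, applied to a whole string instead of a buffer position inside json.el. `my-json-decode-u-escapes` is a hypothetical helper: it only handles \uXXXX escapes, copies every other character unchanged (so it is not a full JSON string parser), and signals a plain `error` rather than `json-string-escape`.

```elisp
;; Hypothetical helper, not part of json.el.
(defun my-json-decode-u-escapes (s)
  "Return S with each \\uXXXX escape replaced by the character it encodes.
A high surrogate must be followed by a \\u escape holding a low
surrogate; a lone high or low surrogate signals an error."
  (with-temp-buffer
    (insert s)
    (goto-char (point-min))
    (let ((case-fold-search nil)   ; match \u but not \U
          (chars '()))
      (while (not (eobp))
        (if (not (looking-at "\\\\u\\([0-9A-Fa-f]\\{4\\}\\)"))
            ;; Not an escape: copy the character through unchanged.
            (progn (push (char-after) chars)
                   (forward-char 1))
          (let ((code (string-to-number (match-string 1) 16)))
            (goto-char (match-end 0))
            (cond
             ((<= #xD800 code #xDBFF)   ; high surrogate: needs a low one
              (if (looking-at "\\\\u\\([Dd][C-Fc-f][0-9A-Fa-f]\\{2\\}\\)")
                  (let ((low (string-to-number (match-string 1) 16)))
                    (goto-char (match-end 0))
                    (push (+ (ash (- code #xD800) 10) (- low #xDC00) #x10000)
                          chars))
                (error "Expected low surrogate at position %d" (point))))
             ((<= #xDC00 code #xDFFF)   ; low surrogate without a high one
              (error "Unexpected low surrogate at position %d" (point)))
             (t (push code chars))))))
      (apply #'string (nreverse chars)))))
```

For example, `(my-json-decode-u-escapes "\\uD834\\uDD1E")` yields a one-character string holding U+1D11E, while `"\\uD834x"` and `"\\uDD1E"` both signal an error.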

From: Drew Adams
To: Helmut Eller, Dmitry Gutov
Cc: Philipp Stephani, 24784@debbugs.gnu.org
Subject: RE: bug#24784: 26.0.50; JSON strings with utf-16 escape codes
Date: Wed, 26 Oct 2016 10:37:41 -0700 (PDT)

> >> +(defsubst json--decode-utf-16-surrogates (high low)
> >
> > IIRC, there might be no actual benefit from making it a defsubst.
> > If someone could benchmark it, I'd like to see the result.
>
> I guess it doesn't hurt but I also doubt that it makes a measurable
> difference as utf-16 surrogates are rarely needed.

IMO, we should never, ever use defsubst nowadays, unless you can come
up with a REALLY good rationale.  Every time defsubst is used, it
throws a monkey wrench in the ability of users to extend and modify
Emacs behavior.  And nothing is really gained.

Just one opinion.

From: Philipp Stephani
To: Dmitry Gutov, Helmut Eller, 24784@debbugs.gnu.org
Subject: Re: bug#24784: 26.0.50; JSON strings with utf-16 escape codes
Date: Sat, 31 Dec 2016 16:53:27 +0000

Dmitry Gutov wrote on Tue, 25 Oct 2016 at 01:19:

> Philipp,
>
> Thanks. Some comments:
>
> On 24.10.2016 22:57, Philipp Stephani wrote:
>
> > +(defsubst json--decode-utf-16-surrogates (high low)
>
> IIRC, there might be no actual benefit from making it a defsubst. If
> someone could benchmark it, I'd like to see the result.
>

Agreed; converted to defun. I only used defsubst because some other
helper functions also used defsubst.

>
> > +     ;; Special-case UTF-16 surrogate pairs,
> > +     ;; cf. https://tools.ietf.org/html/rfc7159#section-7
> > +     ((looking-at
> > +       (rx (group (any "Dd") (any "89ABab") (= 2 (any "0-9A-Fa-f")))
> > +           "\\u" (group (any "Dd") (any "C-Fc-f") (= 2 (any "0-9A-Fa-f")))))
> > +      (json-advance 10)
> > +      (json--decode-utf-16-surrogates
> > +       (string-to-number (match-string 1) 16)
> > +       (string-to-number (match-string 2) 16)))
>
> Shouldn't this go below the UTF-8 case, as the less-frequent one?
>

No, the case below is more general and therefore has to come last.

>
> >  (ert-deftest test-json-encode-string ()
> >    (should (equal (json-encode-string "foo") "\"foo\""))
> >    (should (equal (json-encode-string "a\n\fb") "\"a\\n\\fb\""))
> > -  (should (equal (json-encode-string "\nasdфыв\u001f\u007ffgh\t")
> > -                 "\"\\nasdфыв\\u001f\u007ffgh\\t\"")))
> > +  (should (equal (json-encode-string "\nasdфыв𝄞\u001f\u007ffgh\t")
> > +                 "\"\\nasdфыв𝄞\\u001f\u007ffgh\\t\"")))
>
> Why are we testing string encoding here?
>

It's not 100% related to the patch, but I think it can be included for
symmetry reasons (testing encoding as well as decoding).
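A sketch of why the surrogate-pair clause has to precede the general four-digit clause: the general regexp also matches a high surrogate on its own, so if it were tried first it would consume "D834" and the pair clause would never run. The two regexps are copied from the patch; `my-json-surrogate-pair-re`, `my-json-single-escape-re` and `my-json-classify-escape` are hypothetical names, and the function only classifies the text that follows \u rather than decoding it.

```elisp
(require 'rx)

;; Regexps copied from the patch; the variable names are hypothetical.
(defconst my-json-surrogate-pair-re
  (rx (group (any "Dd") (any "89ABab") (= 2 (any "0-9A-Fa-f")))
      "\\u" (group (any "Dd") (any "C-Fc-f") (= 2 (any "0-9A-Fa-f")))))

(defconst my-json-single-escape-re
  "[0-9A-Fa-f][0-9A-Fa-f][0-9A-Fa-f][0-9A-Fa-f]")

(defun my-json-classify-escape (text)
  "Classify TEXT, the characters after a \\u escape, like the patched `cond'.
The specific surrogate-pair clause is tried before the general one."
  (with-temp-buffer
    (insert text)
    (goto-char (point-min))
    (cond ((looking-at my-json-surrogate-pair-re)
           (list 'pair (match-string 1) (match-string 2)))
          ((looking-at my-json-single-escape-re)
           (list 'single (match-string 0)))
          (t 'invalid))))

(my-json-classify-escape "D834\\uDD1E")  ; => (pair "D834" "DD1E")
(my-json-classify-escape "D834")         ; => (single "D834")
```

The second call shows the gap Helmut's variant closes: with the patch as posted, a lone high surrogate falls through to the general clause and is returned as a single code unit instead of being rejected.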
--047d7b86cf3eeacec30544f72823-- From debbugs-submit-bounces@debbugs.gnu.org Sat Dec 31 19:45:44 2016 Received: (at 24784) by debbugs.gnu.org; 1 Jan 2017 00:45:44 +0000 Received: from localhost ([127.0.0.1]:33565 helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1cNUHQ-00045a-9F for submit@debbugs.gnu.org; Sat, 31 Dec 2016 19:45:44 -0500 Received: from mail-lf0-f49.google.com ([209.85.215.49]:32843) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1cNUHM-00045L-Ev for 24784@debbugs.gnu.org; Sat, 31 Dec 2016 19:45:41 -0500 Received: by mail-lf0-f49.google.com with SMTP id c13so254833621lfg.0 for <24784@debbugs.gnu.org>; Sat, 31 Dec 2016 16:45:39 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20161025; h=sender:subject:to:references:from:message-id:date:user-agent :mime-version:in-reply-to:content-language:content-transfer-encoding; bh=x6h3AqGzAtpk9YOZRn8bv2CXju/S1ji7N8g0FftgwjE=; b=gcfvNfyDeGzeYbIFI0aTkiWgQmafr+awSSSSdVZhDmOqNYZpi2sZd4THT7S9IILiQ9 h+XgFEPzBd4SCpMiNbFHL+dvLFENtR+y/YF5LJQd6vn0+Lrtf8J283smS/qTyRGTcx2q wviIKVmaMsng+5c68Ndb6DRNfGLgYWEJLJh/l47tq8foOPndxKzATBtWYLVJfrmq3ksM Aw+z99Mlnh3SBFfbYG5GK8/lnJPJamVRhryYTUuT5qWLtCP9OMPQPb/TdMckOsgBESh1 DVwR48jzZ5uGLLkFnkg8S+XlXNnAjzRuEhcEHJTdaeKS33yyuNkf22fpQ4bmgpOIGepN MbZg== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:sender:subject:to:references:from:message-id :date:user-agent:mime-version:in-reply-to:content-language :content-transfer-encoding; bh=x6h3AqGzAtpk9YOZRn8bv2CXju/S1ji7N8g0FftgwjE=; b=XsYROUUlSLT/VcqMQDqjCjqTgjJjUBVGMsSoi8uXz32r3CuogznINfDMkgjTcXGo84 8dT6QGMiR5cJRUTH2jjUm5LeLL48nkJr6KXBI5QmFq1S4OofQ/3mzZiIcor8SLUHP85v xhhshpVZe0nxyOpSAcRY8yI6TUsTD64PFoX6HQ2fUX1z5LIleuvtTqlJW2eCbciF7j2j YjJnr9SqJxD0rvuw94GX1/GHnSgDFIaGYM5S7J+uWVPzx5GEE9auu/GrqgeYx002yyTk yuKP0nIoa3o1qau1M0cLRXyeYeOOrn4+Jj2Q+Xgb9ORWrSRcPkefsu9iVBn67rCRbdTk KkSg== X-Gm-Message-State: 
AIkVDXKY5kkzvpXlw/tzHGzeae/xYLmMCU0u5gWbSSTq+0P8wSCevmllOPMHOeh3IP+mcw== X-Received: by 10.46.21.2 with SMTP id s2mr17869898ljd.19.1483231533831; Sat, 31 Dec 2016 16:45:33 -0800 (PST) Received: from [192.168.1.174] ([178.252.127.239]) by smtp.googlemail.com with ESMTPSA id n124sm12851730lfn.5.2016.12.31.16.45.32 (version=TLS1_2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128/128); Sat, 31 Dec 2016 16:45:33 -0800 (PST) Subject: Re: bug#24784: 26.0.50; JSON strings with utf-16 escape codes To: Philipp Stephani , Helmut Eller , 24784@debbugs.gnu.org References: <63b3b672-f91c-f1b9-46e5-8f6dd8636714@yandex.ru> From: Dmitry Gutov Message-ID: Date: Sun, 1 Jan 2017 03:45:31 +0300 User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:50.0) Gecko/20100101 Thunderbird/50.0 MIME-Version: 1.0 In-Reply-To: Content-Type: text/plain; charset=utf-8; format=flowed Content-Language: en-US Content-Transfer-Encoding: 7bit X-Spam-Score: 3.4 (+++) X-Spam-Report: Spam detection software, running on the system "debbugs.gnu.org", has NOT identified this incoming email as spam. The original message has been attached to this so you can view it or label similar future email. If you have any questions, see the administrator of that system for details. Content preview: On 31.12.2016 19:53, Philipp Stephani wrote: > Agreed; converted to defun. I've only used defsubst because some other > helper functions also used defsubst. Thanks. Those others can probably be changed as well. [...] 
Content analysis details: (3.4 points, 10.0 required) pts rule name description ---- ---------------------- -------------------------------------------------- 3.6 RCVD_IN_SORBS_WEB RBL: SORBS: sender is an abusable web server [178.252.127.239 listed in dnsbl.sorbs.net] -0.7 RCVD_IN_DNSWL_LOW RBL: Sender listed at http://www.dnswl.org/, low trust [209.85.215.49 listed in list.dnswl.org] -0.0 SPF_PASS SPF: sender matches SPF record 0.0 HEADER_FROM_DIFFERENT_DOMAINS From and EnvelopeFrom 2nd level mail domains are different 0.0 FREEMAIL_FROM Sender email is commonly abused enduser mail provider (dgutov[at]yandex.ru) -0.0 RCVD_IN_MSPIKE_H3 RBL: Good reputation (+3) [209.85.215.49 listed in wl.mailspike.net] 0.5 RCVD_IN_SORBS_SPAM RBL: SORBS: sender is a spam source [209.85.215.49 listed in dnsbl.sorbs.net] 0.0 T_DKIM_INVALID DKIM-Signature header exists but is not valid 0.0 FREEMAIL_FORGED_FROMDOMAIN 2nd level domains in From and EnvelopeFrom freemail headers are different -0.0 RCVD_IN_MSPIKE_WL Mailspike good senders X-Debbugs-Envelope-To: 24784 X-BeenThere: debbugs-submit@debbugs.gnu.org X-Mailman-Version: 2.1.18 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: debbugs-submit-bounces@debbugs.gnu.org Sender: "Debbugs-submit" X-Spam-Score: 3.4 (+++) X-Spam-Report: Spam detection software, running on the system "debbugs.gnu.org", has NOT identified this incoming email as spam. The original message has been attached to this so you can view it or label similar future email. If you have any questions, see the administrator of that system for details. Content preview: On 31.12.2016 19:53, Philipp Stephani wrote: > Agreed; converted to defun. I've only used defsubst because some other > helper functions also used defsubst. Thanks. Those others can probably be changed as well. [...] 
Content analysis details: (3.4 points, 10.0 required) pts rule name description ---- ---------------------- -------------------------------------------------- 0.5 RCVD_IN_SORBS_SPAM RBL: SORBS: sender is a spam source [209.85.215.49 listed in dnsbl.sorbs.net] 3.6 RCVD_IN_SORBS_WEB RBL: SORBS: sender is an abusable web server [178.252.127.239 listed in dnsbl.sorbs.net] -0.0 RCVD_IN_MSPIKE_H3 RBL: Good reputation (+3) [209.85.215.49 listed in wl.mailspike.net] -0.7 RCVD_IN_DNSWL_LOW RBL: Sender listed at http://www.dnswl.org/, low trust [209.85.215.49 listed in list.dnswl.org] -0.0 SPF_PASS SPF: sender matches SPF record 0.0 HEADER_FROM_DIFFERENT_DOMAINS From and EnvelopeFrom 2nd level mail domains are different 0.0 FREEMAIL_FROM Sender email is commonly abused enduser mail provider (raaahh[at]gmail.com) 0.0 T_DKIM_INVALID DKIM-Signature header exists but is not valid 0.0 FREEMAIL_FORGED_FROMDOMAIN 2nd level domains in From and EnvelopeFrom freemail headers are different -0.0 RCVD_IN_MSPIKE_WL Mailspike good senders

On 31.12.2016 19:53, Philipp Stephani wrote:

> Agreed; converted to defun. I've only used defsubst because some other
> helper functions also used defsubst.

Thanks. Those others can probably be changed as well.

> No, the below case is more general and therefore has to come last.

Makes sense.

> It's not 100% related to the patch, but I think it can be included for
> symmetry reasons (testing encoding as well as decoding).

Of course. These are testing utf-8 encoding, though, right? It would be
better if you split them to a separate commit, I think.
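The decoding side that the patch adds could be covered by a test alongside the encoding test discussed here. The sketch below assumes an Emacs whose json.el already includes this fix, so that `json-read-from-string` returns a single character for a `\uD834\uDD1E` escape. The test name is hypothetical, and json-tests.el may organize this differently.

```elisp
(require 'ert)
(require 'json)

;; Hypothetical test name, chosen for illustration only.
(ert-deftest my-json-read-utf-16-surrogates ()
  "Check that surrogate-pair escapes decode to a single character."
  ;; U+1D11E MUSICAL SYMBOL G CLEF is D834 DD1E in UTF-16.
  (should (equal (json-read-from-string "\"\\uD834\\uDD1E\"")
                 (string #x1D11E)))
  ;; The hex digits of both escapes are case-insensitive.
  (should (equal (json-read-from-string "\"\\ud834\\udd1e\"")
                 (string #x1D11E)))
  ;; Ordinary BMP escapes still go through the general four-digit clause.
  (should (equal (json-read-from-string "\"caf\\u00e9\"")
                 (string ?c ?a ?f #xE9))))
```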
From debbugs-submit-bounces@debbugs.gnu.org Sun Jan 01 07:26:26 2017 Received: (at 24784-done) by debbugs.gnu.org; 1 Jan 2017 12:26:26 +0000 Received: from localhost ([127.0.0.1]:33711 helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1cNfDW-0005nr-5V for submit@debbugs.gnu.org; Sun, 01 Jan 2017 07:26:26 -0500 Received: from mail-wm0-f44.google.com ([74.125.82.44]:38740) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1cNfDU-0005nd-5z for 24784-done@debbugs.gnu.org; Sun, 01 Jan 2017 07:26:24 -0500 Received: by mail-wm0-f44.google.com with SMTP id k184so193404128wme.1 for <24784-done@debbugs.gnu.org>; Sun, 01 Jan 2017 04:26:24 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20161025; h=mime-version:references:in-reply-to:from:date:message-id:subject:to; bh=vv/FbNEewgxXcMNKUWA6e7p9My3mJU5W5P26NTT6n7Y=; b=JkO0UmHrm6WF6bKzbaTY9Izegt2k+JeV9SGG1NUak4IDluyA04q2kk2jrnY7niMs+m acXnp/2RgfyNZ78KA/QLlvQ0iqXXPh54pg8Q9ywVlIOuf61ytJooTbnO5gDTamyHtIvI C3ZOhGMxRrrM0iwzMRDBTYE/5r7BGBF+plHtdXbJNS1XZ82dDmMaxBFNY9fcgOgpSYz5 5sSvK99QiFBiG7QZOoQ1JSvuAKRtO2ahVjMc+sMTUx2g14gGaArM8fBX8KE2JmHhz1XS M+frs+V/6yL2UWx+IhQdocvKNF79S4ydEME877sh5x0j8y77hiN16uhduyk366UApp/1 7XLQ== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:mime-version:references:in-reply-to:from:date :message-id:subject:to; bh=vv/FbNEewgxXcMNKUWA6e7p9My3mJU5W5P26NTT6n7Y=; b=o1KsvzLhPUgVtpdRtW7Poha91G723ePe922n/N03P8wFh6bNLJWajeWdvTB1n3kBGV SRxMZS2AATg8yegoZE8iaQhi6OAfHWIqQ5wGESvjp0clm0T7qAcctq83pNiSkneZi+lR XRGExfGu/R42CRSdJ3eyhaiAuIkHTrBnhycH8hzpy2938h8i+06qpHJDNfBwQBzYmmv8 Rw4LC35nA9gTtSxcVqRORhbSpp3yDm5tJAXmfQ8DFzghafDEuK2bICKsVNY6pzlcZm2g V91Ty7PDplkYhncdvvtk6fZziwqQ6knppLp5mZ+rKbQCdfpTjs154AmlcryJ+WRjWBmp eC9Q== X-Gm-Message-State: AIkVDXJW9ndIY7BhHMcQVkJFM//3UHUyKegSHLJK1ZGwwBTA1l3etjxzfWJW7BfwSNZrZ2IHY8cZR4bqvoIBVQ== X-Received: by 10.28.129.81 with SMTP id 
c78mr48896219wmd.94.1483273578467; Sun, 01 Jan 2017 04:26:18 -0800 (PST) MIME-Version: 1.0 References: <63b3b672-f91c-f1b9-46e5-8f6dd8636714@yandex.ru> In-Reply-To: From: Philipp Stephani Date: Sun, 01 Jan 2017 12:26:07 +0000 Message-ID: Subject: Re: bug#24784: 26.0.50; JSON strings with utf-16 escape codes To: Dmitry Gutov , Helmut Eller , 24784-done@debbugs.gnu.org Content-Type: multipart/alternative; boundary=001a11424262bfdab50545078afd X-Spam-Score: 0.7 (/) X-Debbugs-Envelope-To: 24784-done X-BeenThere: debbugs-submit@debbugs.gnu.org X-Mailman-Version: 2.1.18 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: debbugs-submit-bounces@debbugs.gnu.org Sender: "Debbugs-submit" X-Spam-Score: 0.7 (/)

--001a11424262bfdab50545078afd
Content-Type: text/plain; charset=UTF-8

Dmitry Gutov wrote on Sun, 1 Jan 2017 at 01:45:

> On 31.12.2016 19:53, Philipp Stephani wrote:
>
> > Agreed; converted to defun. I've only used defsubst because some other
> > helper functions also used defsubst.
>
> Thanks. Those others can probably be changed as well.
>

Yes, but rather not in this commit.

>
> > No, the below case is more general and therefore has to come last.
>
> Makes sense.
>
> > It's not 100% related to the patch, but I think it can be included for
> > symmetry reasons (testing encoding as well as decoding).
>
> Of course. These are testing utf-8 encoding, though, right? It would be
> better if you split them to a separate commit, I think.
>

OK, I've removed it from this patch and pushed it as 93be35e038.

--001a11424262bfdab50545078afd
Content-Type: text/html; charset=UTF-8
Content-Transfer-Encoding: quoted-printable


--001a11424262bfdab50545078afd-- From unknown Sun Jun 22 22:44:40 2025 Received: (at fakecontrol) by fakecontrolmessage; To: internal_control@debbugs.gnu.org From: Debbugs Internal Request Subject: Internal Control Message-Id: bug archived. Date: Mon, 30 Jan 2017 12:24:05 +0000 User-Agent: Fakemail v42.6.9 # This is a fake control message. # # The action: # bug archived. thanks # This fakemail brought to you by your local debbugs # administrator