From: Andrew Gierth
To: 38269@debbugs.gnu.org
Subject: bug#38269: SSAX incorrect handling of &gt; in CDATA
Date: Tue, 19 Nov 2019 13:41:54 +0000
Message-ID: <87zhgsyost.fsf@news-spur.riddles.org.uk>
X-GNU-PR-Package: guile

The bug:

> (xml->sxml "<e><![CDATA[&gt;]]></e>")
$2 = (*TOP* (e ">"))

The expected result is (*TOP* (e "&gt;")).

In upstream/SSAX.scm:

; procedure+: ssax:read-cdata-body PORT STR-HANDLER SEED
[...]
; Within a CDATA section all characters are taken at their face value,
; with only three exceptions:
[..]
;       &gt; is treated as an embedded #\> character

This handling of &gt; is contrary to the XML specification, in which
there are no special character sequences inside CDATA except newline and
the "]]>" closing tag. I have confirmed this by checking other XML
parsers. The code seems to be based on a wild misreading of another
section of the specification that does not apply here. (And
unfortunately, the W3C validation suite for XML happens not to contain
any instances of &gt; inside CDATA.)

I believe the fix should be as simple as removing the entire (#\&) case
from the function (and fixing the test cases).

This bug seems to exist in all versions of SSAX.

-- 
Andrew.
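
For reference, the cross-check against another XML parser can be
sketched as follows. This is only an illustration: the report does not
say which parsers were checked, so the choice of Python's standard
xml.etree.ElementTree module is an assumption, as are the variable
names. It parses an element whose CDATA section contains the four
characters "&gt;" and, for contrast, the same sequence outside CDATA.

```python
import xml.etree.ElementTree as ET

# Assumed test document: one element whose only content is a CDATA
# section holding the literal characters "&gt;".
cdata_doc = "<e><![CDATA[&gt;]]></e>"

# Same character sequence outside CDATA, where it is an entity reference.
plain_doc = "<e>&gt;</e>"

# Inside CDATA nothing is expanded, so the text comes back verbatim.
cdata_text = ET.fromstring(cdata_doc).text

# Outside CDATA the parser replaces the reference with ">".
plain_text = ET.fromstring(plain_doc).text

print(repr(cdata_text))  # '&gt;'
print(repr(plain_text))  # '>'
```

The first result corresponds to the (*TOP* (e "&gt;")) that the report
expects from xml->sxml; the second shows that the replacement with ">"
belongs only to character data outside CDATA.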