From: bug#38269 <38269@debbugs.gnu.org>
To: bug#38269 <38269@debbugs.gnu.org>
Subject: Status: SSAX incorrect handling of &gt; in CDATA
Date: Sat, 14 Jun 2025 10:50:10 +0000

retitle 38269 SSAX incorrect handling of &gt; in CDATA
reassign 38269 guile
submitter 38269 Andrew Gierth
severity 38269 normal
thanks

From: Andrew Gierth
To: bug-guile@gnu.org
Subject: SSAX incorrect handling of &gt; in CDATA
Message-ID: <87zhgsyost.fsf@news-spur.riddles.org.uk>
Date: Tue, 19 Nov 2019 13:41:54 +0000

The bug:

> (xml->sxml "<e><![CDATA[&gt;]]></e>")
$2 = (*TOP* (e ">"))

The expected result is (*TOP* (e "&gt;")).

In upstream/SSAX.scm:

; procedure+: ssax:read-cdata-body PORT STR-HANDLER SEED
[...]
; Within a CDATA section all characters are taken at their face value,
; with only three exceptions:
[..]
; &gt; is treated as an embedded #\> character

This handling of &gt; is contrary to the XML specification, in which
there are no special character sequences inside CDATA except newline
and the "]]>" closing tag. I have confirmed this by checking other XML
parsers.

The code seems to be based on a wild misreading of another section of
the specification that does not apply here. (And unfortunately, the W3C
validation suite for XML happens not to contain any instances of &gt;
inside CDATA.)

I believe the fix should be as simple as removing the entire (#\&) case
from the function (and fixing the test cases).

This bug seems to exist in all versions of SSAX.

-- 
Andrew.
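
The report mentions confirming the behaviour against other XML parsers. A minimal sketch of such a cross-check follows, assuming Python's standard-library xml.etree.ElementTree (an expat-based parser); it is not part of the original report, and the helper name cdata_text and the test document are hypothetical, chosen only for illustration. It parses an element whose content is a CDATA section holding the four characters "&gt;" and returns the element's text, so the result can be compared with what xml->sxml produces.

```python
import xml.etree.ElementTree as ET

# Hypothetical test document: element <e> whose only content is a CDATA
# section containing the four literal characters "&gt;".
DOC = "<e><![CDATA[&gt;]]></e>"


def cdata_text(xml_source: str) -> str:
    """Parse xml_source and return the text content of the root element."""
    root = ET.fromstring(xml_source)
    return root.text or ""


if __name__ == "__main__":
    # Inside CDATA the sequence is kept as-is rather than decoded.
    print(repr(cdata_text(DOC)))
    # Outside CDATA the same sequence is an entity reference and is decoded.
    print(repr(cdata_text("<e>&gt;</e>")))
```

With this parser the CDATA case is expected to yield the literal text "&gt;", while the same sequence in ordinary character data is decoded to ">", which matches the expected result stated in the report.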