GNU bug report logs - #38269
SSAX incorrect handling of &gt; in CDATA

Package: guile

Reported by: Andrew Gierth <andrew <at> tao11.riddles.org.uk>

Date: Tue, 19 Nov 2019 14:50:01 UTC

Severity: normal

From: Andrew Gierth <andrew <at> tao11.riddles.org.uk>
To: 38269 <at> debbugs.gnu.org
Subject: bug#38269: SSAX incorrect handling of &gt; in CDATA
Date: Tue, 19 Nov 2019 13:41:54 +0000

The bug:

> (xml->sxml "<e><![CDATA[&gt;]]></e>")
$2 = (*TOP* (e ">"))

The expected result is (*TOP* (e "&gt;")).

In upstream/SSAX.scm:

; procedure+: 	ssax:read-cdata-body PORT STR-HANDLER SEED
[...]
; Within a CDATA section all characters are taken at their face value,
; with only three exceptions:
[...]
;	&gt; is treated as an embedded #\> character

This handling of &gt; is contrary to the XML specification, in which
there are no special character sequences inside CDATA except newline and
the "]]>" closing tag. I have confirmed this by checking other XML
parsers. The code seems to be based on a wild misreading of another
section of the specification that does not apply here. (And
unfortunately, the W3C validation suite for XML happens not to contain
any instances of &gt; inside CDATA.)
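
As one way to repeat the cross-check against another parser, here is a minimal sketch using Python's standard-library xml.etree.ElementTree. This is an arbitrary choice for illustration, not necessarily one of the parsers checked for this report, and the variable names are hypothetical.

```python
import xml.etree.ElementTree as ET

# Same document as in the report: "&gt;" inside a CDATA section.
# A conforming parser should return the text unchanged.
cdata_text = ET.fromstring("<e><![CDATA[&gt;]]></e>").text

# For contrast: outside CDATA, "&gt;" is an entity reference and is decoded.
plain_text = ET.fromstring("<e>&gt;</e>").text

print(repr(cdata_text))  # '&gt;'
print(repr(plain_text))  # '>'
```

The first result matches the expected (*TOP* (e "&gt;")) above; only the second form, outside CDATA, should yield a bare ">".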

I believe the fix should be as simple as removing the entire (#\&) case
from the function (and fixing the test cases).

This bug seems to exist in all versions of SSAX.

-- 
Andrew.




This bug report was last modified 5 years and 208 days ago.


GNU bug tracking system
Copyright (C) 1999 Darren O. Benham, 1997,2003 nCipher Corporation Ltd, 1994-97 Ian Jackson.