GNU bug report logs - #26533
26.0.50; xml-parse-region's symbol-qname argument is ignored


Package: emacs;

Reported by: Christopher Wellons <wellons <at> nullprogram.com>

Date: Sun, 16 Apr 2017 12:49:01 UTC

Severity: normal

Found in version 26.0.50

Done: David Engster <deng <at> randomsample.de>

Bug is archived. No further changes may be made.

To add a comment to this bug, you must first unarchive it, by sending
a message to control AT debbugs.gnu.org, with unarchive 26533 in the body.
You can then email your comments to 26533 AT debbugs.gnu.org in the normal way.



Report forwarded to bug-gnu-emacs <at> gnu.org:
bug#26533; Package emacs. (Sun, 16 Apr 2017 12:49:02 GMT)

Acknowledgement sent to Christopher Wellons <wellons <at> nullprogram.com>:
New bug report received and forwarded. Copy sent to bug-gnu-emacs <at> gnu.org. (Sun, 16 Apr 2017 12:49:02 GMT)

Message #5 received at submit <at> debbugs.gnu.org:

From: Christopher Wellons <wellons <at> nullprogram.com>
To: bug-gnu-emacs <at> gnu.org
Subject: 26.0.50; xml-parse-region's symbol-qname argument is ignored
Date: Sun, 16 Apr 2017 08:36:07 -0400

A bug was introduced in aea67018 that causes the special "symbol-qnames"
value for PARSE-NS to be ignored, as if it were nil. This information is
discarded by the change to xml-parse-attlist, so functions further down
the line see the argument as if it was set to nil.

Here's an example of the bug:

    (with-temp-buffer
      (insert "<root a:b='c'></root>")
      (let ((xml-default-ns ()))
        (xml-parse-region nil nil nil nil 'symbol-qnames)))

Prior to this commit (Emacs 25.1 and earlier) the result is:

    ((root ((b . "c"))))

After this commit:

    ((root ((a:b . "c"))))

This is the same as PARSE-NS being set to nil.




Reply sent to David Engster <deng <at> randomsample.de>:
You have taken responsibility. (Mon, 17 Apr 2017 15:34:02 GMT)

Notification sent to Christopher Wellons <wellons <at> nullprogram.com>:
bug acknowledged by developer. (Mon, 17 Apr 2017 15:34:02 GMT)

Message #10 received at 26533-done <at> debbugs.gnu.org:

From: David Engster <deng <at> randomsample.de>
To: Christopher Wellons <wellons <at> nullprogram.com>
Cc: 26533-done <at> debbugs.gnu.org
Subject: Re: bug#26533: 26.0.50; xml-parse-region's symbol-qname argument is ignored
Date: Mon, 17 Apr 2017 17:33:07 +0200

Christopher Wellons writes:
> A bug was introduced in aea67018 that causes the special "symbol-qnames"
> value for PARSE-NS to be ignored, as if it were nil. This information is
> discarded by the change to xml-parse-attlist, so functions further down
> the line see the argument as if it was set to nil.
>
> Here's an example of the bug:
>
>     (with-temp-buffer
>       (insert "<root a:b='c'></root>")
>       (let ((xml-default-ns ()))
>         (xml-parse-region nil nil nil nil 'symbol-qnames)))
>
> Prior to this commit (Emacs 25.1 and earlier) the result is:
>
>     ((root ((b . "c"))))
>
> After this commit:
>
>     ((root ((a:b . "c"))))
>
> This is the same as PARSE-NS being set to nil.

Thanks for the report.

You are right that the fix for bug #23440 was not correct. I have now
pushed what is hopefully a better version to master.

Note however that your test above has two problems: First, it's invalid
XML since you're using an undeclared prefix (so the parser should rather
throw an error, but I'm not eager to make the xml parser more strict, as
there's a lot of invalid XML in the wild). Second, I don't understand
why you let-bind `xml-default-ns' to nil. This will break namespace
expansion, and it will actually do this for the whole Emacs session if
xml.el gets autoloaded during the above.
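
As a rough illustration of both points, here is a minimal sketch of the
same kind of call on input whose prefix is declared, with
`xml-default-ns' left at its standard value. The namespace URI and the
helper name `my-parse-declared' are made up for this example. How the
expanded names come out depends on xml.el's expansion rules, so no
output is shown.

```elisp
(require 'xml)

(defun my-parse-declared ()
  "Parse a snippet whose `a' prefix is declared (hypothetical helper)."
  (with-temp-buffer
    ;; The namespace URI below is an assumption chosen for illustration.
    (insert "<a:root xmlns:a='http://example.com/ns/'><a:item/></a:root>")
    ;; `xml-default-ns' is not rebound; only PARSE-NS is supplied.
    (xml-parse-region (point-min) (point-max) nil nil 'symbol-qnames)))
```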

-David




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#26533; Package emacs. (Mon, 17 Apr 2017 16:30:02 GMT)

Message #13 received at 26533-done <at> debbugs.gnu.org:

From: Christopher Wellons <wellons <at> nullprogram.com>
To: David Engster <deng <at> randomsample.de>
Cc: 26533-done <at> debbugs.gnu.org
Subject: Re: bug#26533: 26.0.50; xml-parse-region's symbol-qname argument is ignored
Date: Mon, 17 Apr 2017 12:29:15 -0400

Thanks, David! Your fix works fine as far as I can tell.

I'm using this trick in Elfeed (a syndication feed reader) as a fast
method to strip all namespaces from the XML as it's being parsed. As you
said, there's a lot of invalid XML in the wild. I've found it works a
lot better to ignore namespaces and strictness, instead extracting the
required information heuristically as long as it's reasonably close.
Otherwise there would be a whole lot more feeds that wouldn't work well,
or at all, in Elfeed.

I had noticed with symbol-qnames that xml-parse-region drops unknown
namespaces. Since this information comes from an alist, that seemed like
reasonable behavior and I assumed it was intentional -- though signaling
an error would also be reasonable. To tightly control which namespaces
are stripped, I bind xml-default-ns to my own alist for that call. This
feels like the natural and lispy way to use this function.

The file that binds xml-default-ns requires the xml package explicitly,
so there's no risk of it autoloading while it's bound. Though that's an
interesting consequence I hadn't considered before. I _have_ seen
similar issues with accept-process-output when arbitrary process events
are handled while the stack is in an unusual state.
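
The approach described above might look roughly like the following
sketch. The helper name `my-xml-parse-stripped' is hypothetical and this
is not Elfeed's actual code; it only assumes that xml.el is loaded
before the binding takes effect and that `xml-default-ns' is bound to an
empty alist for the duration of the call.

```elisp
(require 'xml)  ; load xml.el up front so the `let' never wraps an autoload

(defun my-xml-parse-stripped (string)
  "Parse STRING as XML, dropping namespace prefixes (hypothetical helper)."
  (with-temp-buffer
    (insert string)
    ;; With an empty alist no prefix is known, so each qualified name
    ;; is reduced to its local part.
    (let ((xml-default-ns ()))
      (xml-parse-region (point-min) (point-max) nil nil 'symbol-qnames))))

(my-xml-parse-stripped "<root a:b='c'></root>")
```

On Emacs 25.1, or on a build that includes the fix, the input from the
original report should come back as ((root ((b . "c")))). A non-empty
alist in place of () would presumably limit which prefixes are
recognized, which is the "tight control" mentioned above.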




bug archived. Request was from Debbugs Internal Request <help-debbugs <at> gnu.org> to internal_control <at> debbugs.gnu.org. (Tue, 16 May 2017 11:24:04 GMT)

This bug report was last modified 8 years and 33 days ago.



GNU bug tracking system
Copyright (C) 1999 Darren O. Benham, 1997,2003 nCipher Corporation Ltd, 1994-97 Ian Jackson.