From stan@derbycityprints.com Tue Nov 17 14:12:47 2009
Date: Tue, 17 Nov 2009 17:12:37 -0500
From: MON KEY
To: bug-gnu-emacs@gnu.org
Subject: `xml-parse-file' returns incorrect results strings after `>' before `<' when CR\LF TAB+

`xml-parse-file' fails to return correct results when a C-j (CR\LF)
followed by one or more TABs (TAB+) appears after a tag's trailing `>'
and before the next tag's leading `<'.

IOW the following:

,----
| CR\LF
| TAB TAB TAB
`----

Returns (:NOTE with my pp-ing to help clarify the problem):

,----
| (ELEMENT nil
|  ((attr1 . "a1")
|   (attr2 . "a2")
|   (attr3 . "a3")
|   (attr4 . "a4")
|   (attr5 . "a5") "
|  " ;; <-i.e. (mapconcat #'char-to-string '(32 10 9 9 9) "")
|  (NEXT-NODE nil (...
`----

Is it fair/safe to assume that where these types of sequences occur
they are not part of the XML and can be removed with a regexp? E.g.:

,----
| (while (search-forward-regexp "\"\)\n[\[:blank:]]+\"\)" nil t)
|   (replace-match ""))
`----

or perhaps:

,----
| (defun cln-xml<-parsed (fname &optional insertp intrp)
|   "Strip nonsensical strings created by xml-parse-file because of
| CR\LF TAB+ following tags/elements.
| FNAME is an XML filename path to parse and clean.
| When INSERTP is non-nil or called-interactively insert pretty printed lisp
| representation of XML file at point.  Does not move point."
|   (interactive "fXML file to parse: \ni\np")
|   (let (get-xml)
|     (setq get-xml
|           (with-temp-buffer
|             (prin1 (xml-parse-file fname) (current-buffer))
|             (goto-char (point-min))
|             (while (search-forward-regexp
|                     "\\( \"\n[\[:blank:]]+\\)\"\\(\\(\\()\\)\\|\\( (\\)\\)\\)" nil t)
|               ;;^^1^^^^^^^^^^^^^^^^^^^^^^^^^2^^3^^^^^^^^^^^^4^^^^^^^^^^^^
|               (replace-match "\\2"))
|             (pp-buffer)
|             (buffer-substring-no-properties (point-min) (point-max))))
|     (if (or insertp intrp)
|         (save-excursion
|           (newline)
|           (princ get-xml (current-buffer)))
|       get-xml)))
`----

:SEE-ALSO (URL `http://lists.gnu.org/archive/html/bug-gnu-emacs/2001-11/msg00052.html')

s_P

From debbugs-submit-bounces@debbugs.gnu.org Sun Jul 01 07:27:17 2012
Date: Sun, 01 Jul 2012 19:22:33 +0800
From: Chong Yidong
To: MON KEY
Cc: 4950@debbugs.gnu.org, bug-gnu-emacs@gnu.org
Subject: Re: bug#4950: `xml-parse-file' returns incorrect results strings after `>' before `<' when CR\LF TAB+
Message-ID: <874npr6806.fsf@gnu.org>

MON KEY writes:

> CR\LF
> TAB TAB TAB
>
> Returns (:NOTE with my pp-ing to help clarify the problem):
>
> (ELEMENT nil
>  ((attr1 . "a1")
>   (attr2 . "a2")
>   (attr3 . "a3")
>   (attr4 . "a4")
>   (attr5 . "a5") "
>  " ;; <-i.e. (mapconcat #'char-to-string '(32 10 9 9 9) "")
>  (NEXT-NODE nil (...
>
> Is it fair/safe to assume that where these types of sequences occur
> they are not part of the XML and can be removed with a regexp?

No.  XML 1.0 Recommendation, Section 2.10 White Space Handling:

  "An XML processor MUST always pass all characters in a document that
  are not markup through to the application."

From debbugs-submit-bounces@debbugs.gnu.org Sun Jul 01 07:29:07 2012
Date: Sun, 01 Jul 2012 19:24:30 +0800
From: Chong Yidong
To: control@debbugs.gnu.org
Subject: tags 4950 + notabug
Message-ID: <877gunzpu9.fsf@gnu.org>

tags 4950 + notabug
close 4950
thanks

From unknown Sun Aug 17 22:12:10 2025
Date: Mon, 30 Jul 2012 11:24:05 +0000
From: Debbugs Internal Request
To: internal_control@debbugs.gnu.org
Subject: Internal Control

bug archived.
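Since the parser must pass that whitespace through, dropping it is an
application-level choice made on the parsed tree rather than on its
printed form.  A minimal sketch, assuming the (NAME ATTRIBUTES . CHILDREN)
node shape shown in the report; `my-xml-blank-string-p' and
`my-xml-drop-blank-strings' are hypothetical helpers, not part of xml.el:

```elisp
;; Hypothetical helpers (not part of xml.el).  They assume each node is
;; shaped like (NAME ATTRIBUTES . CHILDREN), as in the output quoted above.

(defun my-xml-blank-string-p (obj)
  "Return non-nil if OBJ is a string made only of spaces, tabs, CRs or LFs."
  (and (stringp obj)
       (string-match-p "\\`[ \t\n\r]*\\'" obj)))

(defun my-xml-drop-blank-strings (node)
  "Return a copy of NODE with whitespace-only string children removed.
Non-blank text is kept exactly as the parser returned it.  Objects that
are not lists (for example text strings) are returned unchanged."
  (if (not (consp node))
      node
    (let ((name (car node))
          (attrs (cadr node))
          (children (cddr node)))
      (cons name
            (cons attrs
                  ;; First drop blank strings, then recurse into what is left.
                  (mapcar #'my-xml-drop-blank-strings
                          (delq nil
                                (mapcar (lambda (child)
                                          (unless (my-xml-blank-string-p child)
                                            child))
                                        children))))))))

;; Usage sketch: `xml-parse-file' returns a list of top-level nodes, so
;; (mapcar #'my-xml-drop-blank-strings (xml-parse-file "some.xml"))
```

Working on the list structure avoids running regexps over `prin1' output,
where patterns like the ones in the report can also match quotes and
newlines inside attribute values or character data.  Whether blank text
is insignificant remains the application's call; Section 2.10 also lets a
document ask for preservation with xml:space="preserve".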