GNU bug report logs - #61514
30.0.50; sadistically long xml line hangs emacs
Reported by: "Mark A. Hershberger" <mah <at> everybody.org>
Date: Tue, 14 Feb 2023 21:05:02 UTC
Severity: normal
Found in version 30.0.50
Done: Gregory Heytings <gregory <at> heytings.org>
Bug is archived. No further changes may be made.
> Date: Tue, 14 Feb 2023 16:02:04 -0500
> From: "Mark A. Hershberger" via "Bug reports for GNU Emacs,
> the Swiss army knife of text editors" <bug-gnu-emacs <at> gnu.org>
>
>
> There seems to be a regression between 28 and 30 with how emacs handles
> long lines.
No, there's no regression with long lines. There's an existing bug in
our regexp routines and/or nxml. See below.
> Bottom line: Emacs 30 is handling files with long lines worse than Emacs
> 28.
This conclusion is incorrect, or at least inaccurate. Emacs 28.2 has
the same problem as Emacs 30. Take that a.xml file, truncate it after
250000 characters, then visit it with Emacs 28.2 -- you will see that
Emacs 28.2 freezes exactly like Emacs 30 does.
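For anyone who wants to repeat that experiment, here is a minimal sketch of one way to produce the truncated file. The helper name bug61514-truncate-file and the output file name a-truncated.xml are made up for illustration; neither is part of Emacs or of the original report. The sketch assumes the file decodes cleanly with the default coding system.

```elisp
;; Hypothetical helper, not part of Emacs: write the first LIMIT
;; characters of file SRC into file DEST.
(defun bug61514-truncate-file (src dest limit)
  "Copy the first LIMIT characters of file SRC into file DEST."
  (with-temp-buffer
    ;; A temp buffer stays in fundamental-mode, so nxml-mode is
    ;; never turned on while the file is being cut down.
    (insert-file-contents src)
    ;; Buffer positions start at (point-min), so the text following
    ;; the LIMIT-th character begins at (point-min) + LIMIT.
    (when (> (buffer-size) limit)
      (delete-region (+ (point-min) limit) (point-max)))
    (write-region (point-min) (point-max) dest)))

;; Assumed file names; a.xml is the file attached to the report.
;; (bug61514-truncate-file "a.xml" "a-truncated.xml" 250000)
```

Visiting the resulting file in each Emacs version should then show whether both versions behave the same way on it.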
The problem is in the combination of nxml-mode and some subtle
bug/misfeature in our regexp routines. Specifically, when we overflow
the fail stack, we fail to recover in this case, and seem to infloop
inside re_match_2_internal, or maybe recover very inefficiently (I
waited for almost 1 hour before giving up). The call which causes the
loop is in xmltok.el, in the indicated line:
  (defun xmltok-scan-attributes ()
    (let ((recovering nil)
          (atts-needing-normalization nil))
      (while (cond ((or (looking-at (xmltok-attribute regexp))
                        ;; use non-greedy group
                        (when (looking-at (concat "[^<>\n]+?"  <<<<<<<<<<<<<<<<<
                                                  (xmltok-attribute regexp)))
                          (unless recovering
                            (xmltok-add-error "Malformed attribute"
                                              (point)
                                              (save-excursion
                                                (goto-char (xmltok-attribute start
                                                                             name))
                                                (skip-chars-backward "\r\n\t ")
                                                (point))))
                          t))
The regexp that causes this is as follows:
"[^<>\n]+?\\(\\(?:\\(xmlns\\)\\|[_[:alpha:]][-._[:alnum:]]*\\)\\(:[_[:alpha:]][-._[:alnum:]]*\\)?\\)[ \r\t\n]*=\\(?:[ \r\t\n]*\\('[^<'&\r\n\t]*\\([&\r\n\t][^<']*\\)?'\\|\"[^<\"&\r\n\t]*\\([&\r\n\t][^<\"]*\\)?\"\\)\\(?:\\([ \r\t\n]*>\\)\\|\\(?:\\([ \r\t\n]*/\\)\\(>\\)?\\)\\|\\([ \r\t\n]+\\)\\)\\)?"
This bug report was last modified 2 years and 147 days ago.