GNU bug report logs -
#61514
30.0.50; sadistically long xml line hangs emacs
Reported by: "Mark A. Hershberger" <mah <at> everybody.org>
Date: Tue, 14 Feb 2023 21:05:02 UTC
Severity: normal
Found in version 30.0.50
Done: Gregory Heytings <gregory <at> heytings.org>
Bug is archived. No further changes may be made.
Message #101 received at 61514 <at> debbugs.gnu.org (full text, mbox):
> From: Stefan Monnier <monnier <at> iro.umontreal.ca>
> Cc: "Mark A. Hershberger" <mah <at> everybody.org>, 61514 <at> debbugs.gnu.org
> Date: Sun, 19 Feb 2023 18:48:43 -0500
>
> > The problem is in the combination of nxml-mode and some subtle
> > bug/misfeature in our regexp routines. Specifically, when we overflow
> > the fail stack, we fail to recover in this case, and seem to infloop
> > inside re_match_2_internal, or maybe recover very inefficiently (I
> > waited for almost 1 hour before giving up). The call which causes the
> > loop is in xmltok.el, in the indicated line:
> >
> > (defun xmltok-scan-attributes ()
> >   (let ((recovering nil)
> >         (atts-needing-normalization nil))
> >     (while (cond ((or (looking-at (xmltok-attribute regexp))
> >                       ;; use non-greedy group
> >                       (when (looking-at (concat "[^<>\n]+?"   <<<<<<<<<<<<<<<<<
> >                                                 (xmltok-attribute regexp)))
> >                         (unless recovering
> >                           (xmltok-add-error "Malformed attribute"
> >                                             (point)
> >                                             (save-excursion
> >                                               (goto-char (xmltok-attribute start
> >                                                                            name))
> >                                               (skip-chars-backward "\r\n\t ")
> >                                               (point))))
> >                         t))
> >
> > The regexp that causes this is as follows:
> >
> > "[^<>\n]+?\\(\\(?:\\(xmlns\\)\\|[_[:alpha:]][-._[:alnum:]]*\\)\\(:[_[:alpha:]][-._[:alnum:]]*\\)?\\)[ \r\t\n]*=\\(?:[ \r\t\n]*\\('[^<'&\r\n\t]*\\([&\r\n\t][^<']*\\)?'\\|\"[^<\"&\r\n\t]*\\([&\r\n\t][^<\"]*\\)?\"\\)\\(?:\\([ \r\t\n]*>\\)\\|\\(?:\\([ \r\t\n]*/\\)\\(>\\)?\\)\\|\\([ \r\t\n]+\\)\\)\\)?"
>
> IIUC the above describes the code where we're stuck inf-looping inside
> `looking-at`?
Not inflooping, but very slowly backtracking, or so it seems.
> Is it the same place where the regexp-stack overflow happens (and with
> the same regexp)?
It's (almost) the same place, but not the same regexp. The regexp
which causes the stack-overflow message (which is emitted from
set-auto-mode, before entering redisplay) is this:
"\\(\\(?:\\(xmlns\\)\\|[_[:alpha:]][-._[:alnum:]]*\\)\\(:[_[:alpha:]][-._[:alnum:]]*\\)?\\)[ \r\t\n]*=\\(?:[ \r\t\n]*\\('[^<'&\r\n\t]*\\([&\r\n\t][^<']*\\)?'\\|\"[^<\"&\r\n\t]*\\([&\r\n\t][^<\"]*\\)?\"\\)\\(?:\\([ \r\t\n]*>\\)\\|\\(?:\\([ \r\t\n]*/\\)\\(>\\)?\\)\\|\\([ \r\t\n]+\\)\\)\\)?"
As you can see, the prepended "[^<>\n]+?" in the regexp which "hangs"
makes all the difference. So the looking-at which fails reasonably
quickly is the first call to looking-at above, whereas the one that
"hangs" is the second one. Maybe this points to a way out of this
misery?
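
To make the difference between the two calls concrete, here is a
minimal sketch that times looking-at with and without the lazy prefix.
It is not the nxml-mode code: my-demo-attribute-regexp is a hypothetical,
much-simplified stand-in for (xmltok-attribute regexp), and
my-demo-time-looking-at is a helper invented for this illustration. The
input is kept short (2000 characters, no "=") so that neither call
should get anywhere near the fail-stack limit.

```elisp
;; Hypothetical, simplified stand-in for (xmltok-attribute regexp):
;; a name, optional whitespace, "=", and a double-quoted value.
(defconst my-demo-attribute-regexp
  "[_[:alpha:]][-._[:alnum:]]*[ \r\t\n]*=\"[^<\"]*\""
  "Simplified attribute regexp, for illustration only.")

(defun my-demo-time-looking-at (regexp text)
  "Return (MATCHED . SECONDS) for `looking-at' REGEXP at the start of TEXT.
MATCHED is the value returned by `looking-at'; SECONDS is wall-clock time."
  (with-temp-buffer
    (insert text)
    (goto-char (point-min))
    (let* ((start (float-time))
           (matched (looking-at regexp)))
      (cons matched (- (float-time) start)))))

;; Compare the plain regexp with the one that has "[^<>\n]+?" prepended,
;; on one long line that contains no "=" and therefore cannot match.
(let ((line (make-string 2000 ?x)))
  (list (my-demo-time-looking-at my-demo-attribute-regexp line)
        (my-demo-time-looking-at
         (concat "[^<>\n]+?" my-demo-attribute-regexp) line)))
```

Both calls should return nil as the match result. Without the prefix,
the matcher tries the attribute pattern once, at point. With the lazy
prefix, each failure makes the prefix consume one more character and
the attribute pattern is retried from the new position, so in a
backtracking matcher the work would be expected to grow roughly with
the square of the line length rather than linearly. Whether that alone
accounts for the hour-long "hang" on the original file is an
assumption, not something this sketch establishes.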
This bug report was last modified 2 years and 147 days ago.
GNU bug tracking system
Copyright (C) 1999 Darren O. Benham,
1997,2003 nCipher Corporation Ltd,
1994-97 Ian Jackson.