GNU bug report logs - #61514
30.0.50; sadistically long xml line hangs emacs


Package: emacs;

Reported by: "Mark A. Hershberger" <mah <at> everybody.org>

Date: Tue, 14 Feb 2023 21:05:02 UTC

Severity: normal

Found in version 30.0.50

Done: Gregory Heytings <gregory <at> heytings.org>

Bug is archived. No further changes may be made.



Message #101 received at 61514 <at> debbugs.gnu.org:

From: Eli Zaretskii <eliz <at> gnu.org>
To: Stefan Monnier <monnier <at> iro.umontreal.ca>
Cc: mah <at> everybody.org, 61514 <at> debbugs.gnu.org
Subject: Re: bug#61514: 30.0.50; sadistically long xml line hangs emacs
Date: Mon, 20 Feb 2023 14:19:18 +0200
> From: Stefan Monnier <monnier <at> iro.umontreal.ca>
> Cc: "Mark A. Hershberger" <mah <at> everybody.org>,  61514 <at> debbugs.gnu.org
> Date: Sun, 19 Feb 2023 18:48:43 -0500
> 
> > The problem is in the combination of nxml-mode and some subtle
> > bug/misfeature in our regexp routines.  Specifically, when we overflow
> > the fail stack, we fail to recover in this case, and seem to infloop
> > inside re_match_2_internal, or maybe recover very inefficiently (I
> > waited for almost 1 hour before giving up).  The call which causes the
> > loop is in xmltok.el, in the indicated line:
> >
> > (defun xmltok-scan-attributes ()
> >   (let ((recovering nil)
> > 	(atts-needing-normalization nil))
> >     (while (cond ((or (looking-at (xmltok-attribute regexp))
> > 		      ;; use non-greedy group
> > 		      (when (looking-at (concat "[^<>\n]+?"  <<<<<<<<<<<<<<<<<
> > 						(xmltok-attribute regexp)))
> > 			(unless recovering
> > 			  (xmltok-add-error "Malformed attribute"
> > 					    (point)
> > 					    (save-excursion
> > 					      (goto-char (xmltok-attribute start
> > 									   name))
> > 					      (skip-chars-backward "\r\n\t ")
> > 					      (point))))
> > 			t))
> >
> > The regexp that causes this is as follows:
> >
> >   "[^<>\n]+?\\(\\(?:\\(xmlns\\)\\|[_[:alpha:]][-._[:alnum:]]*\\)\\(:[_[:alpha:]][-._[:alnum:]]*\\)?\\)[ \r\t\n]*=\\(?:[ \r\t\n]*\\('[^<'&\r\n\t]*\\([&\r\n\t][^<']*\\)?'\\|\"[^<\"&\r\n\t]*\\([&\r\n\t][^<\"]*\\)?\"\\)\\(?:\\([ \r\t\n]*>\\)\\|\\(?:\\([ \r\t\n]*/\\)\\(>\\)?\\)\\|\\([ \r\t\n]+\\)\\)\\)?"
> 
> IIUC the above describes the code where we're stuck inf-looping inside
> `looking-at`?

Not inflooping, but very slowly backtracking, or so it seems.

> Is it the same place where the regexp-stack overflow happens (and with
> the same regexp)?

It's (almost) the same place, but not the same regexp.  The regexp
which causes the stack-overflow message (which is emitted from
set-auto-mode, before entering redisplay) is this:

  "\\(\\(?:\\(xmlns\\)\\|[_[:alpha:]][-._[:alnum:]]*\\)\\(:[_[:alpha:]][-._[:alnum:]]*\\)?\\)[ \r\t\n]*=\\(?:[ \r\t\n]*\\('[^<'&\r\n\t]*\\([&\r\n\t][^<']*\\)?'\\|\"[^<\"&\r\n\t]*\\([&\r\n\t][^<\"]*\\)?\"\\)\\(?:\\([ \r\t\n]*>\\)\\|\\(?:\\([ \r\t\n]*/\\)\\(>\\)?\\)\\|\\([ \r\t\n]+\\)\\)\\)?"

As you can see, the prepended "[^<>\n]+?" in the regexp which "hangs"
makes all the difference.  So the looking-at which fails reasonably
quickly is the first call to looking-at above, whereas the one that
"hangs" is the second one.  Maybe this points out a way out of this
misery?
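A rough way to see why the prepended lazy "[^<>\n]+?" matters so much is to count matcher steps in a simplified model. The sketch below is written in Python, not Emacs Lisp, and it does not reproduce Emacs's regexp engine or its fail stack. It assumes a cut-down attribute pattern (a name, optional blanks, then "=") and a hypothetical input line made of one long run of name characters with no "=". The helpers match_attr, match_with_lazy_prefix and count_steps are hypothetical. Anchored at point, as with looking-at, the plain pattern tries one start position and backtracks through the name once, so its cost is linear in the line length. With the lazy prefix in front, every one-character extension of the prefix starts a fresh attempt at the attribute pattern, so the total cost becomes quadratic. This only illustrates the shape of the extra work, not the timings reported above.

```python
# Simplified step-counting model of anchored matching (as with looking-at).
# Assumption: the attribute regexp is reduced to  NAME[ \t]*=  where NAME is
# [_A-Za-z][-._A-Za-z0-9]*.  This is NOT a model of Emacs's regexp engine.
import string

NAME_START = set(string.ascii_letters + "_")
NAME_CHAR = NAME_START | set(string.digits + "-.")


def match_attr(s, i, steps):
    """Try NAME, optional blanks, "=" at position i; return end index or None.

    steps is a one-element list used as a mutable counter of character tests.
    """
    steps[0] += 1
    if i >= len(s) or s[i] not in NAME_START:
        return None
    # Greedy star: consume as many name characters as possible.
    j = i + 1
    while j < len(s):
        steps[0] += 1
        if s[j] not in NAME_CHAR:
            break
        j += 1
    # Backtrack: give name characters back one at a time, looking for "=".
    for end in range(j, i, -1):
        k = end
        while k < len(s) and s[k] in " \t":
            steps[0] += 1
            k += 1
        steps[0] += 1
        if k < len(s) and s[k] == "=":
            return k + 1
    return None


def match_with_lazy_prefix(s, steps):
    """Model of a lazy [^<>newline]+? prefix followed by the attribute pattern."""
    p = 0
    while p < len(s):
        steps[0] += 1
        if s[p] in "<>\n":
            return None  # the prefix may not cross these characters
        p += 1  # lazy: prefix now covers s[:p]; try the attribute right away
        end = match_attr(s, p, steps)
        if end is not None:
            return end
    return None


def count_steps(n):
    """Return (plain, lazy) step counts on a hypothetical n-character line."""
    line = "a" * n  # long run of name characters, no "=" anywhere
    plain, lazy = [0], [0]
    assert match_attr(line, 0, plain) is None
    assert match_with_lazy_prefix(line, lazy) is None
    return plain[0], lazy[0]


for n in (10, 100, 1000):
    print(n, count_steps(n))
```

In this model the plain pattern costs 2n steps and the prefixed one n*n + 1, e.g. 2000 versus 1000001 for a 1000-character line, which is consistent with the first looking-at failing quickly and the second one appearing to hang.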




This bug report was last modified 2 years and 147 days ago.



GNU bug tracking system
Copyright (C) 1999 Darren O. Benham, 1997,2003 nCipher Corporation Ltd, 1994-97 Ian Jackson.