GNU bug report logs - #61514
30.0.50; sadistically long xml line hangs emacs


Package: emacs;

Reported by: "Mark A. Hershberger" <mah <at> everybody.org>

Date: Tue, 14 Feb 2023 21:05:02 UTC

Severity: normal

Found in version 30.0.50

Done: Gregory Heytings <gregory <at> heytings.org>

Bug is archived. No further changes may be made.



Message #140 received at 61514 <at> debbugs.gnu.org:

From: Stefan Monnier <monnier <at> iro.umontreal.ca>
To: Eli Zaretskii <eliz <at> gnu.org>
Cc: mah <at> everybody.org, 61514 <at> debbugs.gnu.org
Subject: Re: bug#61514: 30.0.50; sadistically long xml line hangs emacs
Date: Mon, 20 Feb 2023 09:59:30 -0500
Eli Zaretskii [2023-02-20 15:54:52] wrote:

>> From: Stefan Monnier <monnier <at> iro.umontreal.ca>
>> Cc: mah <at> everybody.org,  61514 <at> debbugs.gnu.org
>> Date: Mon, 20 Feb 2023 08:19:26 -0500
>> 
>> >   "\\(\\(?:\\(xmlns\\)\\|[_[:alpha:]][-._[:alnum:]]*\\)\\(:[_[:alpha:]][-._[:alnum:]]*\\)?\\)[ \r\t\n]*=\\(?:[ \r\t\n]*\\('[^<'&\r\n\t]*\\([&\r\n\t][^<']*\\)?'\\|\"[^<\"&\r\n\t]*\\([&\r\n\t][^<\"]*\\)?\"\\)\\(?:\\([ \r\t\n]*>\\)\\|\\(?:\\([ \r\t\n]*/\\)\\(>\\)?\\)\\|\\([ \r\t\n]+\\)\\)\\)?"
>> >
>> > As you can see, the prepended "[^<>\n]+?" in the regexp which "hangs"
>> > makes all the difference.  So the looking-at which fails reasonably
>> > quickly is the first call to looking-at above, whereas the one that
>> > "hangs" is the second one.
>> 
>> Yes, it makes a lot of sense now.
>> 
>> > Maybe this points out a way out of this misery?
>> 
>> I think it does.  E.g. there's a chance that using "[^<>\n]+?\\<"
>> instead of "[^<>\n]+?"  avoids the hang
>
> It does, thanks.
>
>> (not sure if it's the right thing to do for all the regexps that can
>> be returned by `xmltok-attribute`, tho).
>
> How would we go about finding out?  Because other than that, changing
> the regexp solves this nasty problem, and all the tests in
> test/lisp/nxml/ still pass.

I did find out: we'll always get the same regexp here, so it's OK.

It turns out that (xmltok-attribute regexp) doesn't mean to return "the
something of `regexp`" but to return "the regexp named
`xmltok-attribute`".

`xmltok-attribute` is a funny macro built by `xmltok-defregexp`.
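A rough way to see why the `\<` anchor helps: with the lazy prefix `[^<>\n]+?`, the attribute regexp is retried after each one-character extension of the prefix, and every retry rescans what is left of the current name, so one long name costs quadratic work. The Python below is only a simplified cost model under stated assumptions, not Emacs's matcher: `name_scan`, `is_word_start` and `prefix_cost` are hypothetical helpers, the POSIX classes are approximated with `str.isalpha`/`str.isalnum`, and `\<` is approximated as "a word character not preceded by one" (Emacs actually consults the syntax table).

```python
# Simplified cost model, not Emacs's matcher: the lazy prefix [^<>\n]+?
# grows one character at a time, and after each step the attribute
# regexp is retried; we count only the characters its name part scans.


def name_scan(s: str, i: int) -> int:
    """Chars matched by [_[:alpha:]][-._[:alnum:]]* at s[i], using
    str.isalpha/str.isalnum as stand-ins for the POSIX classes."""
    if i >= len(s) or not (s[i].isalpha() or s[i] == "_"):
        return 0
    n = 1
    while i + n < len(s) and (s[i + n].isalnum() or s[i + n] in "-._"):
        n += 1
    return n


def is_word_start(s: str, i: int) -> bool:
    """Rough stand-in for \\<: a word char not preceded by a word char."""
    return s[i].isalnum() and (i == 0 or not s[i - 1].isalnum())


def prefix_cost(s: str, word_start_only: bool = False) -> int:
    """Total name-scanning work over every stopping point of the prefix."""
    total = 0
    for k in range(1, len(s)):  # prefix is s[:k]; retry the regexp at s[k]
        if s[k - 1] in "<>\n":  # [^<>\n] cannot extend over these chars
            break
        if word_start_only and not is_word_start(s, k):
            continue  # \< fails immediately, so no name is scanned
        total += name_scan(s, k)
    return total


print(prefix_cost(" " + "a" * 1000))                        # 500500
print(prefix_cost(" " + "a" * 1000, word_start_only=True))  # 1000
```

With `\<`, a retry can only begin where a word starts, so the 1000-character run is scanned once instead of once per prefix length.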

>> And for the stack overflow I haven't yet found its origin.
>
> Not sure what the mystery is here.  AFAIU, we look for the closing
> ">", don't find it, and then start looking for fewer and fewer non-'>'
> characters followed by '>'.  Isn't that what happens here?

Right, but the stack overflows always come from repetitions where
our `mutually_exclusive_p` test fails.  Let's see:

    \\(\\(?:\\(xmlns\\)\\|[_[:alpha:]][-._[:alnum:]]*\\)\\(:[_[:alpha:]][-._[:alnum:]]*\\)?\\)[ \r\t\n]*=

The first two `*` should be non-backtracking because they repeat
[-._[:alnum:]], which is mutually exclusive with what follows (either `:`,
whitespace, or `=`).  Similarly, the third `*` should be
non-backtracking because its body can't match the `=` that must follow.
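What making a `*` non-backtracking buys can be modelled with an explicit failure stack, which is how a backtracking matcher remembers where to resume. In this Python sketch (`star_then` and `NAME_CHARS` are hypothetical names; it is an illustration, not the code in regex-emacs.c), a greedy `body*` followed by one character from `follow` pushes one failure point per iteration, so the stack grows with the length of the run; a possessive loop pushes none. Dropping those failure points is only safe when `body` and `follow` are disjoint, which is roughly the property `mutually_exclusive_p` checks.

```python
import string

# Hypothetical model of a backtracking matcher for "body* then one char of
# follow", with an explicit failure stack (not the code in regex-emacs.c).
NAME_CHARS = set(string.ascii_letters + string.digits + "-._")


def star_then(s, body, follow, possessive=False):
    """Return (matched, peak failure-stack depth) for body* + follow at s[0].

    Each greedy iteration normally pushes a failure point saying "retry
    with the loop stopped here".  A possessive loop pushes none, which is
    only safe when body and follow are disjoint.
    """
    stack = []
    peak = 0
    i = 0
    while i < len(s) and s[i] in body:
        if not possessive:
            stack.append(i)
            peak = max(peak, len(stack))
        i += 1
    while True:
        if i < len(s) and s[i] in follow:
            return True, peak
        if not stack:
            return False, peak
        i = stack.pop()  # backtrack: give back the last iteration


print(star_then("a" * 1000, NAME_CHARS, {"="}))                   # (False, 1000)
print(star_then("a" * 1000, NAME_CHARS, {"="}, possessive=True))  # (False, 0)
```

When the sets overlap the shortcut is wrong: `star_then("abc", NAME_CHARS, {"c"})` matches by giving back the final `c`, while the possessive version fails.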

    \\(?:[ \r\t\n]*

There isn't enough whitespace in the input, so even if this can
backtrack, it shouldn't be the source of our current problems.

    \\('[^<'&\r\n\t]*\\([&\r\n\t][^<']*\\)?'

Neither `*` here should backtrack.

    \\|\"[^<\"&\r\n\t]*\\([&\r\n\t][^<\"]*\\)?\"\\)

Same here.

    \\(?:\\([ \r\t\n]*>\\)\\|\\(?:\\([ \r\t\n]*/\\)\\(>\\)?\\)\\|\\([ \r\t\n]+\\)\\)\\)?"

And here we're back to only repeating whitespace.
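The walk-through above boils down to a first-character disjointness test on each loop. Here is a set-based sketch of that test in Python; `not_in`, `mutually_exclusive` and `LOOPS` are hypothetical names, the POSIX classes are approximated by their ASCII members, and the real `mutually_exclusive_p` works on the compiled pattern rather than on character sets. For contrast, the table also includes the prepended lazy prefix, whose body overlaps with the first character of an attribute name.

```python
import string

# ASCII stand-ins for the POSIX classes (an assumption: Emacs's
# [:alpha:] and [:alnum:] also cover non-ASCII letters).
ALPHA = set(string.ascii_letters)
ALNUM = ALPHA | set(string.digits)
WS = set(" \r\t\n")
ASCII = {chr(c) for c in range(128)}
NAME_TAIL = ALNUM | set("-._")


def not_in(chars):
    """Model a negated class [^...] over the ASCII alphabet."""
    return ASCII - set(chars)


def mutually_exclusive(body, follow):
    """True when no character can both extend the loop and start what
    follows it, so the loop never needs to give characters back."""
    return body.isdisjoint(follow)


# label: (chars the loop body accepts, chars that can start what follows)
LOOPS = {
    "first name tail": (NAME_TAIL, {":", "="} | WS),
    "second name tail": (NAME_TAIL, {"="} | WS),
    "space before =": (WS, {"="}),
    "'...' head": (not_in("<'&\r\n\t"), set("&\r\n\t'")),
    "'...' tail": (not_in("<'"), {"'"}),
    '"..." head': (not_in('<"&\r\n\t'), set('&\r\n\t"')),
    '"..." tail': (not_in('<"'), {'"'}),
    # The prepended lazy prefix overlaps with the start of a name.
    "lazy prefix": (not_in("<>\n"), ALPHA | {"_"}),
}

for label, (body, follow) in LOOPS.items():
    print(f"{label:18} {mutually_exclusive(body, follow)}")
```

Every loop from the walk-through comes out as mutually exclusive; only the lazy prefix does not.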

What am I missing?


        Stefan





This bug report was last modified 2 years and 147 days ago.



GNU bug tracking system
Copyright (C) 1999 Darren O. Benham, 1997,2003 nCipher Corporation Ltd, 1994-97 Ian Jackson.