GNU bug report logs - #22146: Stack overflow in reftex-parse-all
Reported by: Nils Kanning <nils <at> kanning.de>
Date: Fri, 11 Dec 2015 21:19:02 UTC
Severity: normal
Done: Tassilo Horn <tsdh <at> gnu.org>
Bug is archived. No further changes may be made.
To add a comment to this bug, you must first unarchive it, by sending
a message to control AT debbugs.gnu.org, with unarchive 22146 in the body.
You can then email your comments to 22146 AT debbugs.gnu.org in the normal way.
Report forwarded to bug-auctex <at> gnu.org: bug#22146; Package auctex. (Fri, 11 Dec 2015 21:19:02 GMT)
Acknowledgement sent to Nils Kanning <nils <at> kanning.de>: New bug report received and forwarded. Copy sent to bug-auctex <at> gnu.org. (Fri, 11 Dec 2015 21:19:02 GMT)
Message #5 received at submit <at> debbugs.gnu.org (full text, mbox):
Dear all,
I encounter the error "Stack overflow in regexp matcher" if I apply the
function reftex-parse-all to the following file:
\documentclass{article}
\begin{document}
[\}
foo foo foo foo foo foo foo foo foo foo foo foo foo foo foo foo foo
...
foo foo foo foo foo foo foo foo foo foo foo foo foo foo foo foo foo
\end{document}
Here ... stands for 500 copies of the line above. The file can be
compiled successfully with pdflatex.
I am using the Debian packages emacs24 24.5+1-3 and auctex 11.88-1.1.
The same error also occurs with different versions.
In case you are wondering, the non-matching brackets [\} appear for
example in physics as so-called "super Lie brackets".
Best
Nils
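A minimal sketch for rebuilding the test file in a scratch buffer, so that the regexps quoted in this thread can be tried against it. The function name `my-bug22146-make-test-buffer` and the default of 502 "foo" lines (the two lines shown plus the 500 elided copies) are assumptions made for illustration; nothing here is part of RefTeX or AUCTeX.

```elisp
;; Hypothetical helper for experiments; not part of RefTeX or AUCTeX.
(defun my-bug22146-make-test-buffer (&optional copies)
  "Return a fresh buffer with the test document from the report.
COPIES is the number of \"foo\" lines to insert (default 502)."
  (let ((buf (generate-new-buffer "bug22146.tex"))
        (line "foo foo foo foo foo foo foo foo foo foo foo foo foo foo foo foo foo\n"))
    (with-current-buffer buf
      (insert "\\documentclass{article}\n"
              "\\begin{document}\n"
              "[\\}\n")
      (dotimes (_ (or copies 502))
        (insert line))
      (insert "\\end{document}\n")
      ;; Leave point at the start, where the searches in the thread begin.
      (goto-char (point-min)))
    buf))
```

Evaluating `(switch-to-buffer (my-bug22146-make-test-buffer))` displays the buffer. Whether a given search then overflows depends on the Emacs version and the buffer size, as the rest of the thread shows.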
Information forwarded to bug-auctex <at> gnu.org: bug#22146; Package auctex. (Fri, 11 Dec 2015 23:38:02 GMT)
Message #8 received at 22146 <at> debbugs.gnu.org (full text, mbox):
Hi Nils,
2015-12-11 22:03 GMT+01:00 Nils Kanning <nils <at> kanning.de>:
> Dear all,
>
> I encounter the error "Stack overflow in regexp matcher" if I apply the
> function reftex-parse-all to the following file:
>
> \documentclass{article}
> \begin{document}
> [\}
> foo foo foo foo foo foo foo foo foo foo foo foo foo foo foo foo foo
> ...
> foo foo foo foo foo foo foo foo foo foo foo foo foo foo foo foo foo
> \end{document}
>
> Here ... stands for 500 copies of the line above. The file can be
> compiled successfully with pdflatex.
>
> I am using the Debian packages emacs24 24.5+1-3 and auctex 11.88-1.1.
> The same error also occurs with different versions.
We've recently released AUCTeX 11.89, you can grab it from ELPA ;-)
> In case you are wondering, the non-matching brackets [\} appear for
> example in physics as so-called "super Lie brackets".
Thanks for taking the time to report this bug, I can reproduce it.
The culprit is the
(re-search-forward (reftex-everything-regexp) nil t)
within the `reftex-with-special-syntax' macro in
`reftex-parse-from-file' function. Actually in the code it is called
as
(re-search-forward regexp nil t)
I wrote the above line for ease of reproduction.
Of course `reftex-everything-regexp' is an undocumented function that
returns one of two undocumented variables. I can reproduce the bug
only if `reftex-support-index' is non-nil, so in the end the culprit
is `reftex-everything-regexp'.
There is a simple workaround you can put in your code: close the
brackets, also in a comment is fine:
[\} % ]
But I don't know how to really fix the bug, it's too late for me to
decrypt that regexp now. Having a clue of what it should *exactly*
match would be of great help.
Bye,
Mosè
Information forwarded to bug-auctex <at> gnu.org: bug#22146; Package auctex. (Sat, 12 Dec 2015 07:55:02 GMT)
Message #11 received at 22146 <at> debbugs.gnu.org (full text, mbox):
Mosè Giordano <mose <at> gnu.org> writes:
Hi Nils & Mosè,
>> In case you are wondering, the non-matching brackets [\} appear for
>> example in physics as so-called "super Lie brackets".
>
> Thanks for taking the time to report this bug, I can reproduce it.
I can reproduce it with Emacs 24.5 but not with Emacs 25.0.50.16 from
the current emacs-25 branch. So maybe it has been fixed after the last
release (either the regex, or the Emacs regex matcher has become
better).
If anyone of you could try with Emacs 25 and report back, that'd be
great.
> There is a simple workaround you can put in your code: close the
> brackets, also in a comment is fine:
>
> [\} % ]
>
> But I don't know how to really fix the bug, it's too late for me to
> decrypt that regexp now. Having a clue of what it should *exactly*
> match would be of great help.
Obviously, it should match everything that's interesting to reftex. :-)
But seriously, have a look at the bottom of `reftex-compile-variables'
where `reftex-everything-regexp' is composed from several other regexes
with more local scopes, e.g., one for matching labels, one for sections,
one for index entries, etc.
Bye,
Tassilo
Information forwarded to bug-auctex <at> gnu.org: bug#22146; Package auctex. (Sat, 12 Dec 2015 09:14:01 GMT)
Message #14 received at 22146 <at> debbugs.gnu.org (full text, mbox):
Hi Mosè and Tassilo,
thanks a lot for your quick replies! With that workaround I can
continue editing :-)
Nils
Information forwarded to bug-auctex <at> gnu.org: bug#22146; Package auctex. (Sat, 12 Dec 2015 09:43:01 GMT)
Message #17 received at 22146 <at> debbugs.gnu.org (full text, mbox):
Hi Tassilo,
2015-12-12 8:54 GMT+01:00 Tassilo Horn <tsdh <at> gnu.org>:
> Mosè Giordano <mose <at> gnu.org> writes:
>
> Hi Nils & Mosè,
>
>>> In case you are wondering, the non-matching brackets [\} appear for
>>> example in physics as so-called "super Lie brackets".
>>
>> Thanks for taking the time to report this bug, I can reproduce it.
>
> I can reproduce it with Emacs 24.5 but not with Emacs 25.0.50.16 from
> the current emacs-25 branch. So maybe it has been fixed after the last
> release (either the regex, or the Emacs regex matcher has become
> better).
`reftex-compile-variables' hasn't been changed in the last two years,
so maybe the Emacs regexp matcher has been improved. But the error may
indicate a malformed, or too greedy, regexp.
> If anyone of you could try with Emacs 25 and report back, that'd be
> great.
>
>> There is a simple workaround you can put in your code: close the
>> brackets, also in a comment is fine:
>>
>> [\} % ]
>>
>> But I don't know how to really fix the bug, it's too late for me to
>> decrypt that regexp now. Having a clue of what it should *exactly*
>> match would be of great help.
>
> Obviously, it should match everything that's interesting to reftex. :-)
>
> But seriously, have a look at the bottom of `reftex-compile-variables'
> where `reftex-everything-regexp' is composed from several other regexes
> with more local scopes, e.g., one for matching labels, one for sections,
> one for index entries, etc.
Yes, I saw it, but it was too late to really study it ;-) I can see
there are some labeling and sectioning commands, but there are also
many groups matching something less intuitive at a first glance. I'll
try to have a more thorough look in the next days.
I'm not completely sure how to fix a "Stack overflow in regexp
matcher" error: the problem is that it matches nothing or too much?
My understanding is the former, but the culprit could be all those
\n\r, which cause the regexp to try to match the whole buffer, even if
it fails in the end.
Bye,
Mosè
Information forwarded to bug-auctex <at> gnu.org: bug#22146; Package auctex. (Sat, 12 Dec 2015 10:06:02 GMT)
Message #20 received at 22146 <at> debbugs.gnu.org (full text, mbox):
Mosè Giordano <mose <at> gnu.org> writes:
> I'm not completely sure how to fix a "Stack overflow in regexp
> matcher" error: the problem is that it matches nothing or too much?
It has incomplete matches in too many different ways. Something like
.*.*
is a trivial example of stuff that can usually match in too many
different ways. A nice regexp is one where each character of an
incomplete match is only a candidate for a single component of the
regexp.
An ugly regexp is one where adding one character to the match repeatedly
grows the possible match/pattern correspondences by more than one.
--
David Kastrup
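The growth described above can be made concrete by counting correspondences. The sketch below only illustrates the combinatorics: it counts how many ways a run of characters can be divided among consecutive `.*` components, and it does not model how the Emacs matcher actually explores or prunes those possibilities. `my-count-splits` is a hypothetical name.

```elisp
(defun my-count-splits (length parts)
  "Count the ways LENGTH characters can be divided among PARTS
consecutive `.*' components, each taking zero or more characters.
PARTS should be at least 1."
  (if (<= parts 1)
      1
    (let ((total 0))
      ;; The first component may take 0..LENGTH characters; the
      ;; remaining components share whatever is left.
      (dotimes (taken (1+ length))
        (setq total (+ total (my-count-splits (- length taken)
                                              (1- parts)))))
      total)))
```

For ten characters, `(my-count-splits 10 2)` gives 11 and `(my-count-splits 10 3)` gives 66, whereas a single component always gives 1, which is the "nice" case where each character has only one possible owner.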
Information forwarded to bug-auctex <at> gnu.org: bug#22146; Package auctex. (Sun, 13 Dec 2015 11:55:02 GMT)
Message #23 received at 22146 <at> debbugs.gnu.org (full text, mbox):
Hi David,
2015-12-12 11:05 GMT+01:00 David Kastrup <dak <at> gnu.org>:
> Mosè Giordano <mose <at> gnu.org> writes:
>
>> I'm not completely sure how to fix a "Stack overflow in regexp
>> matcher" error: the problem is that it matches nothing or too much?
>
> It has incomplete matches in too many different ways. Something like
>
> .*.*
>
> is a trivial example of stuff that can usually match in too many
> different ways. A nice regexp is one where each character of an
> incomplete match is only a candidate for a single component of the
> regexp.
>
> An ugly regexp is one where adding one character to the match repeatedly
> grows the possible match/pattern correspondences by more than one.
Thanks for the explanation, but in this concrete case I can reproduce
the bug in Emacs 24.5 evaluating
(re-search-forward "\\[[^]]*\\<label")
at the beginning of the buffer suggested by Nils. So, it doesn't seem
there is an incomplete match, but simply "\\[[^]]*" is too greedy.
This also explains why closing the bracket works around the bug.
Tassilo, does `reftex-parse-all' fail with larger buffers in Emacs 25?
If so, we can make the regexp less greedy by specifying the maximum
number of non-closing brackets characters, something like
"\\[[^]]\{0,1000\}\\<label"
I don't know if this can somehow affect performances.
Bye,
Mosè
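To compare candidate regexps without the error aborting the evaluation, the search can be wrapped in `condition-case`. A minimal sketch, assuming a hypothetical helper name `my-try-regexp`; it reports whatever error the search signals and does not try to identify the stack overflow specifically.

```elisp
(defun my-try-regexp (regexp)
  "Search for REGEXP from the start of the current buffer.
Return the end position of the first match, nil if there is none,
or (failed MESSAGE) if the search signals an error."
  (save-excursion
    (goto-char (point-min))
    (condition-case err
        (re-search-forward regexp nil t)
      ;; Covers the matcher overflow as well as invalid regexps.
      (error (list 'failed (error-message-string err))))))
```

In the test buffer from the report, `(my-try-regexp "\\[[^]]*\\<label")` would return a `(failed ...)` list on setups where the overflow occurs, and nil where the search merely finds nothing.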
Information forwarded to bug-auctex <at> gnu.org: bug#22146; Package auctex. (Sun, 13 Dec 2015 15:12:02 GMT)
Message #26 received at 22146 <at> debbugs.gnu.org (full text, mbox):
Mosè Giordano <mose <at> gnu.org> writes:
>
> (re-search-forward "\\[[^]]*\\<label")
>
> at the beginning of the buffer suggested by Nils. So, it doesn't seem
> there is an incomplete match, but simply "\\[[^]]*" is too greedy.
This expression contains only a single explicit wildcard. A stack
overflow for it most certainly is a bug. Is this really fixed in 25.1
or is there just a larger stack? Maybe make the test case significantly
larger and check that still no stack overflow occurs.
--
David Kastrup
Information forwarded to bug-auctex <at> gnu.org: bug#22146; Package auctex. (Sun, 13 Dec 2015 16:56:01 GMT)
Message #29 received at 22146 <at> debbugs.gnu.org (full text, mbox):
David Kastrup <dak <at> gnu.org> writes:
>> (re-search-forward "\\[[^]]*\\<label")
>>
>> at the beginning of the buffer suggested by Nils. So, it doesn't
>> seem there is an incomplete match, but simply "\\[[^]]*" is too
>> greedy.
>
> This expression contains only a single explicit wildcard. A stack
> overflow for it most certainly is a bug. Is this really fixed in 25.1
> or is there just a larger stack?
I also get a stack overflow with this starting with a buffer size of
about 800 lines. But I can run
(re-search-forward reftex-everything-regexp nil t)
without problems even after adding 100.000 more "foo foo foo..." lines
to the test file...
Ah, `reftex-everything-regexp' doesn't include that problematic regex
anymore.
Now it contains "\\[[^[]]*\\<label"
instead of "\\[[^]]*\\<label".
Well, that new regexp does not have this problem because it is wrong and
doesn't match keyval style labels anymore. That problem was
introduced by:
--8<---------------cut here---------------start------------->8---
commit 32a488344057f210b51f4618feb3a85799eef0c5
Author: Nils Ackermann <nils <at> ackermath.info>
Date: Tue Jun 16 09:24:47 2015 +0200
Improve reftex-label-regexps default value
* lisp/textmodes/reftex-vars.el (reftex-label-regexps): Make
keyvals label regexp more strict to better cope with unbalanced
brackets common in math documents.
--8<---------------cut here---------------end--------------->8---
I just now changed the regexp (`reftex-label-regexps') to
"\\[[^][]*\\<label[[:space:]]*=[[:space:]]*{?\\(?1:[^],}]+\\)}?"
(the stuff after \\<label has been there before, too) which works again
in the sense it matches keyval style labels but unfortunately also
causes the stack overflow.
Bye,
Tassilo
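The difference between the two character classes is easy to miss. In `[^[]]*` the set is `[^[]` (any character except `[`) followed by `]*`, whereas `[^][]` is a single set that excludes both brackets. The sketch below checks both regexps from this message against a keyval-style optional argument. The names `my-broken-label-re`, `my-fixed-label-re` and `my-keyval-label`, as well as the sample strings, are made up for illustration.

```elisp
(defconst my-broken-label-re "\\[[^[]]*\\<label"
  "The regexp reported as wrong: \"[\", one non-\"[\" char, then any \"]\".")

(defconst my-fixed-label-re
  "\\[[^][]*\\<label[[:space:]]*=[[:space:]]*{?\\(?1:[^],}]+\\)}?"
  "The corrected regexp quoted in the message; group 1 is the label.")

(defun my-keyval-label (regexp string)
  "Return group 1 (or else the whole match) of REGEXP in STRING, or nil."
  (when (string-match regexp string)
    (or (match-string 1 string)
        (match-string 0 string))))
```

With these definitions, `(my-keyval-label my-fixed-label-re "[caption=Foo, label=lst:foo]")` returns `"lst:foo"`, while the same call with `my-broken-label-re` returns nil because more than one character separates `[` from `label`.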
Information forwarded to bug-auctex <at> gnu.org: bug#22146; Package auctex. (Sun, 13 Dec 2015 17:13:02 GMT)
Message #32 received at 22146 <at> debbugs.gnu.org (full text, mbox):
Hi Tassilo,
2015-12-13 17:55 GMT+01:00 Tassilo Horn <tsdh <at> gnu.org>:
> David Kastrup <dak <at> gnu.org> writes:
>
>>> (re-search-forward "\\[[^]]*\\<label")
>>>
>>> at the beginning of the buffer suggested by Nils. So, it doesn't
>>> seem there is an incomplete match, but simply "\\[[^]]*" is too
>>> greedy.
>>
>> This expression contains only a single explicit wildcard. A stack
>> overflow for it most certainly is a bug. Is this really fixed in 25.1
>> or is there just a larger stack?
>
> I also get a stack overflow with this starting with a buffer size of
> about 800 lines. But I can run
>
> (re-search-forward reftex-everything-regexp nil t)
>
> without problems even after adding 100.000 more "foo foo foo..." lines
> to the test file...
Uh, in Emacs 24.5
(re-search-forward "\\[[^]]*\\<label" nil t)
throws the stack overflow error, indeed I tested also this
possibility. So are you saying that in Emacs 25.1 the noerror option
prevents stack overflow? Anyway, how about my suggestion to use
"\\[[^]]\{0,1000\}\\<label"
? 1000 characters are 12.5 80-column-wide lines, I think it's large enough.
Bye,
Mosè
Information forwarded to bug-auctex <at> gnu.org: bug#22146; Package auctex. (Sun, 13 Dec 2015 17:47:02 GMT)
Message #35 received at 22146 <at> debbugs.gnu.org (full text, mbox):
Mosè Giordano <mose <at> gnu.org> writes:
>>>> (re-search-forward "\\[[^]]*\\<label")
>>>>
>>>> at the beginning of the buffer suggested by Nils. So, it doesn't
>>>> seem there is an incomplete match, but simply "\\[[^]]*" is too
>>>> greedy.
>>>
>>> This expression contains only a single explicit wildcard. A stack
>>> overflow for it most certainly is a bug. Is this really fixed in
>>> 25.1 or is there just a larger stack?
>>
>> I also get a stack overflow with this starting with a buffer size of
>> about 800 lines. But I can run
>>
>> (re-search-forward reftex-everything-regexp nil t)
>>
>> without problems even after adding 100.000 more "foo foo foo..."
>> lines to the test file...
>
> Uh, in Emacs 24.5
>
> (re-search-forward "\\[[^]]*\\<label" nil t)
>
> throws the stack overflow error, indeed I tested also this
> possibility. So are you saying that in Emacs 25.1 the noerror option
> prevents stack overflow?
No, in Emacs 25 the regexp was broken in a way that its meaning was not
"[ and then many times something different than ]" but "[, then not
another [, and then zero or many times a ]". Now that I fixed the
regexp again, Emacs 25 breaks in the same way.
> Anyway, how about my suggestion to use
>
> "\\[[^]]\{0,1000\}\\<label"
>
> ? 1000 characters are 12.5 80-column-wide lines, I think it's large enough.
Isn't the syntax "\\[[^]]\\{0,1000\\}\\<label"? I tried going up to a
maximum of 30000 and it still didn't overflow, so I think that's an
appropriate fix (say, maybe 2000 instead of 1000 to be extra safe).
Feel free to change `reftex-label-regexps' accordingly.
Interestingly, "x\\{0,40000\\}" gives an error:
(invalid-regexp "Invalid content of \\{\\}")
So there seems to be a (pretty random) limit...
Bye,
Tassilo
Information forwarded to bug-auctex <at> gnu.org: bug#22146; Package auctex. (Sun, 13 Dec 2015 17:53:02 GMT)
Message #38 received at 22146 <at> debbugs.gnu.org (full text, mbox):
Tassilo Horn <tsdh <at> gnu.org> writes:
> Mosè Giordano <mose <at> gnu.org> writes:
> Isn't the syntax "\\[[^]]\\{0,1000\\}\\<label"? I tried going up to a
> maximum of 30000 and it still didn't overflow, so I think that's an
> appropriate fix (say, maybe 2000 instead of 1000 to be extra safe).
> Feel free to change `reftex-label-regexps' accordingly.
>
> Interestingly, "x\\{0,40000\\}" gives an error:
>
> (invalid-regexp "Invalid content of \\{\\}")
>
> So there seems to be a (pretty random) limit...
I'd guess that the regexp library translates this into a "short", so the
limit would be SHRT_MAX, namely 32767.
--
David Kastrup
Information forwarded to bug-auctex <at> gnu.org: bug#22146; Package auctex. (Sun, 13 Dec 2015 18:01:02 GMT)
Message #41 received at 22146 <at> debbugs.gnu.org (full text, mbox):
2015-12-13 18:46 GMT+01:00 Tassilo Horn <tsdh <at> gnu.org>:
> Mosè Giordano <mose <at> gnu.org> writes:
>
>>>>> (re-search-forward "\\[[^]]*\\<label")
>>>>>
>>>>> at the beginning of the buffer suggested by Nils. So, it doesn't
>>>>> seem there is an incomplete match, but simply "\\[[^]]*" is too
>>>>> greedy.
>>>>
>>>> This expression contains only a single explicit wildcard. A stack
>>>> overflow for it most certainly is a bug. Is this really fixed in
>>>> 25.1 or is there just a larger stack?
>>>
>>> I also get a stack overflow with this starting with a buffer size of
>>> about 800 lines. But I can run
>>>
>>> (re-search-forward reftex-everything-regexp nil t)
>>>
>>> without problems even after adding 100.000 more "foo foo foo..."
>>> lines to the test file...
>>
>> Uh, in Emacs 24.5
>>
>> (re-search-forward "\\[[^]]*\\<label" nil t)
>>
>> throws the stack overflow error, indeed I tested also this
>> possibility. So are you saying that in Emacs 25.1 the noerror option
>> prevents stack overflow?
>
> No, in Emacs 25 the regexp was broken in a way that its meaning was not
> "[ and then many times something different than ]" but "[, then not
> another [, and then zero or many times a ]". Now that I fixed the
> regexp again, Emacs 25 breaks in the same way.
>
>> Anyway, how about my suggestion to use
>>
>> "\\[[^]]\{0,1000\}\\<label"
>>
>> ? 1000 characters are 12.5 80-column-wide lines, I think it's large enough.
>
> Isn't the syntax "\\[[^]]\\{0,1000\\}\\<label"?
Uh, yes, sorry.
> I tried going up to a
> maximum of 30000 and it still didn't overflow,
It does for me, from 16667 to be precise.
> so I think that's an
> appropriate fix (say, maybe 2000 instead of 1000 to be extra safe).
> Feel free to change `reftex-label-regexps' accordingly.
I don't have write access to Emacs repo :-)
Bye,
Mosè
Information forwarded to bug-auctex <at> gnu.org: bug#22146; Package auctex. (Sun, 13 Dec 2015 18:49:02 GMT)
Message #44 received at 22146 <at> debbugs.gnu.org (full text, mbox):
Mosè Giordano <mose <at> gnu.org> writes:
>> I tried going up to a maximum of 30000 and it still didn't overflow,
>
> It does for me, from 16667 to be precise.
Interesting.
>> so I think that's an appropriate fix (say, maybe 2000 instead of 1000
>> to be extra safe). Feel free to change `reftex-label-regexps'
>> accordingly.
>
> I don't have write access to Emacs repo :-)
Good excuse for not having to touch reftex. :-)
I've changed it now to use \{0,2000\} instead of *.
Bye,
Tassilo
Reply sent to Tassilo Horn <tsdh <at> gnu.org>: You have taken responsibility. (Sun, 13 Dec 2015 18:56:01 GMT)
Notification sent to Nils Kanning <nils <at> kanning.de>: bug acknowledged by developer. (Sun, 13 Dec 2015 18:56:02 GMT)
Message #49 received at 22146-done <at> debbugs.gnu.org (full text, mbox):
David Kastrup <dak <at> gnu.org> writes:
>> Isn't the syntax "\\[[^]]\\{0,1000\\}\\<label"? I tried going up to
>> a maximum of 30000 and it still didn't overflow, so I think that's an
>> appropriate fix (say, maybe 2000 instead of 1000 to be extra safe).
>> Feel free to change `reftex-label-regexps' accordingly.
>>
>> Interestingly, "x\\{0,40000\\}" gives an error:
>>
>> (invalid-regexp "Invalid content of \\{\\}")
>>
>> So there seems to be a (pretty random) limit...
>
> I'd guess that the regexp library translates this into a "short", so
> the limit would be SHRT_MAX, namely 32767.
(re-search-forward "x\\{0,32767\\}") => works
(re-search-forward "x\\{0,32768\\}") => error
Yup, of course you're right.
Anyway, I'm closing this bug with the recent emacs/reftex commit, which
looks for a label=... at most 2000 chars after the opening [.
Bye,
Tassilo
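The shape of the fix can be sketched as a regexp builder that takes the bound as a parameter. This assumes the bound simply replaces `*` in the bracket-scanning part of the regexp quoted earlier in the thread; `my-bounded-label-re` is a hypothetical name, and the exact regexp committed to RefTeX may differ. In an Emacs Lisp string the interval needs doubled backslashes, as in `\\{0,2000\\}`.

```elisp
(defun my-bounded-label-re (max)
  "Return a regexp for \"[\", at most MAX non-bracket chars, then \"label\".
MAX must not exceed the repetition limit of the running Emacs
\(32767 in the versions tested in this thread)."
  (format "\\[[^][]\\{0,%d\\}\\<label" max))
```

For example, `(string-match (my-bounded-label-re 2000) "[caption=Foo, label=lst:foo]")` finds the label, while a bound smaller than the distance between `[` and `label` makes the match fail instead of scanning further into the buffer.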
bug archived. Request was from Debbugs Internal Request <help-debbugs <at> gnu.org> to internal_control <at> debbugs.gnu.org. (Mon, 11 Jan 2016 12:24:04 GMT)