GNU bug report logs - #13802: stack overflow in mm-add-meta-html-tag
Reported by: Thien-Thi Nguyen <ttn <at> gnuvola.org>
Date: Sun, 24 Feb 2013 09:18:02 UTC
Severity: normal
Done: Lars Ingebrigtsen <larsi <at> gnus.org>
Bug is archived. No further changes may be made.
To add a comment to this bug, you must first unarchive it, by sending
a message to control AT debbugs.gnu.org, with unarchive 13802 in the body.
You can then email your comments to 13802 AT debbugs.gnu.org in the normal way.
Report forwarded to bug-gnu-emacs <at> gnu.org: bug#13802; Package emacs. (Sun, 24 Feb 2013 09:18:02 GMT)
Acknowledgement sent to Thien-Thi Nguyen <ttn <at> gnuvola.org>: New bug report received and forwarded. Copy sent to bug-gnu-emacs <at> gnu.org. (Sun, 24 Feb 2013 09:18:02 GMT)
Message #5 received at submit <at> debbugs.gnu.org (full text, mbox):
[Message part 1 (text/plain, inline)]
I see a "Stack overflow in regexp matcher" error traceable back to
lisp/gnus/mm-decode.el func ‘mm-add-meta-html-tag’ fragment:
(re-search-forward "\
<meta\\s-+http-equiv=[\"']?content-type[\"']?\\s-+content=[\"']\
text/\\(\\sw+\\)\\(?:\;\\s-*charset=\\(.+\\)\\)?[\"'][^>]*>" nil t)
To allow the user (not me) to continue, i kludged the form to be:
(ignore-errors
(re-search-forward "..." nil t))
that is, wrapping w/ ‘ignore-errors’. Is there a better solution?
One idea (untested) is to replace the ".+" (used to match the charset)
with a more specific pattern. Perhaps "[^<>]+" or "\\sw+"?
Thinking more systematically, maybe Emacs should add a condition
‘stack-overflow/regexp’ (or something like that) such that code can
‘condition-case’ for it and try a fallback path.
--
Thien-Thi Nguyen ..................................... GPG key: 4C807502
. NB: ttn at glug dot org is not me .
. (and has not been since 2007 or so) .
. ACCEPT NO SUBSTITUTES .
........... please send technical questions to mailing lists ...........
[Message part 2 (application/pgp-signature, inline)]
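The report above asks for a way to `condition-case' on a matcher overflow and take a fallback path. As a minimal sketch, and assuming the overflow is raised as an ordinary `error' rather than a dedicated condition (the report suggests no such condition exists), a wrapper could catch it and report failure. The name `my-safe-re-search-forward' is hypothetical and not part of Gnus or Emacs.

```elisp
(defun my-safe-re-search-forward (regexp &optional bound)
  "Search forward for REGEXP up to BOUND, never signaling.
Return the position after the match, or nil if there is no match
or if the regexp matcher signals an error of any kind."
  (condition-case err
      ;; Third argument t: return nil instead of signaling on no match.
      (re-search-forward regexp bound t)
    ;; `error' also covers other failures such as `invalid-regexp'.
    (error
     (message "Regexp search gave up: %s" (error-message-string err))
     nil)))
```

Unlike a bare `ignore-errors', this keeps the error text visible in *Messages*, so the caller can still notice that the fallback path was taken.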
Information forwarded to bug-gnu-emacs <at> gnu.org, bugs <at> gnus.org: bug#13802; Package emacs,gnus. (Mon, 25 Feb 2013 00:36:01 GMT)
Message #8 received at 13802 <at> debbugs.gnu.org (full text, mbox):
> I see a "Stack overflow in regexp matcher" error traceable back to
> lisp/gnus/mm-decode.el func ‘mm-add-meta-html-tag’ fragment:
>
> (re-search-forward "\
> <meta\\s-+http-equiv=[\"']?content-type[\"']?\\s-+content=[\"']\
> text/\\(\\sw+\\)\\(?:\;\\s-*charset=\\(.+\\)\\)?[\"'][^>]*>" nil t)
>
> To allow the user (not me) to continue, i kludged the form to be:
>
> (ignore-errors
> (re-search-forward "..." nil t))
>
> that is, wrapping w/ ‘ignore-errors’. Is there a better solution?
`sgml-html-meta-auto-coding-function' uses a similar regexp
that doesn't fail with stack overflow. You could get some ideas
from this regexp and sync the regexp in `mm-add-meta-html-tag' with it.
Information forwarded to bug-gnu-emacs <at> gnu.org, bugs <at> gnus.org: bug#13802; Package emacs,gnus. (Mon, 25 Feb 2013 02:06:02 GMT)
Message #11 received at 13802 <at> debbugs.gnu.org (full text, mbox):
> I see a "Stack overflow in regexp matcher" error traceable back to
> lisp/gnus/mm-decode.el func ‘mm-add-meta-html-tag’ fragment:
> (re-search-forward "\
> <meta\\s-+http-equiv=[\"']?content-type[\"']?\\s-+content=[\"']\
> text/\\(\\sw+\\)\\(?:\;\\s-*charset=\\(.+\\)\\)?[\"'][^>]*>" nil t)
Hmm... I don't see any obvious reason for a stack overflow unless the
text has some very long lines or a lot of space between elements.
> One idea (untested) is to replace the ".+" (used to match the charset)
> with a more specific pattern. Perhaps "[^<>]+" or "\\sw+"?
I don't think that would help. To avoid such overflow, you need to
reduce the backtracking, i.e. reduce the number of cases where two
options are possible according to the simplistic regexp-optimizer.
\s<CHAR> pattern is actually very poor in this respect, because the
optimizer can't know anything about the chars that this matches (since
it depends on text-properties).
The flip side is that replacing \\s- with [ \t\n] might help (this way,
the optimizer will see that the + repetition does not need backtracking
since a char cannot both match a loop iteration and the "after the
loop" content).
Similarly using [^;'\"]+ instead of \\sw+ would help, and maybe replacing
.+ with [^'\"\n]+ would help as well.
> Thinking more systematically, maybe Emacs should add a condition
> ‘stack-overflow/regexp’ (or something like that) such that code can
> ‘condition-case’ for it and try a fallback path.
In reality, such overflow should only ever happen if you have backrefs
in your regexp.
Stefan
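The substitutions suggested in the message above can be sketched as one regexp: [ \t\n] in place of \\s-, [^;'\"]+ in place of \\sw+, and [^'\"\n]+ in place of .+ for the charset. This is only an illustration of that advice, not a tested fix for the reported overflow; the constant name `my-meta-content-type-regexp' is hypothetical.

```elisp
(defconst my-meta-content-type-regexp
  (concat
   ;; Explicit whitespace sets instead of the syntax class \\s-.
   "<meta[ \t\n]+http-equiv=[\"']?content-type[\"']?[ \t\n]+content=[\"']"
   ;; Group 1: the text/ subtype, stopping at `;' or a quote.
   "text/\\([^;'\"]+\\)"
   ;; Group 2 (optional): the charset, stopping at a quote or newline.
   "\\(?:;[ \t\n]*charset=\\([^'\"\n]+\\)\\)?"
   "[\"'][^>]*>")
  "Hypothetical variant of the regexp in `mm-add-meta-html-tag'.")

;; Usage: match against a string, ignoring case as the original code does.
(let ((case-fold-search t)
      (html "<meta http-equiv=\"Content-Type\" content=\"text/html; charset=utf-8\">"))
  (when (string-match my-meta-content-type-regexp html)
    (list (match-string 1 html) (match-string 2 html))))
```

Each repetition here excludes the character that must follow it, which is the property the message says lets the optimizer skip backtracking.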
Information forwarded to bug-gnu-emacs <at> gnu.org, bugs <at> gnus.org: bug#13802; Package emacs,gnus. (Sat, 06 Jul 2013 16:12:02 GMT)
Message #14 received at 13802 <at> debbugs.gnu.org (full text, mbox):
Thien-Thi Nguyen <ttn <at> gnuvola.org> writes:
> I see a "Stack overflow in regexp matcher" error traceable back to
> lisp/gnus/mm-decode.el func ‘mm-add-meta-html-tag’ fragment:
>
> (re-search-forward "\
> <meta\\s-+http-equiv=[\"']?content-type[\"']?\\s-+content=[\"']\
> text/\\(\\sw+\\)\\(?:\;\\s-*charset=\\(.+\\)\\)?[\"'][^>]*>" nil t)
Do you know what text it is that triggers this bug?
--
(domestic pets only, the antidote for overdose, milk.)
bloggy blog http://lars.ingebrigtsen.no/
Information forwarded to bug-gnu-emacs <at> gnu.org, bugs <at> gnus.org: bug#13802; Package emacs,gnus. (Fri, 31 Jan 2014 00:40:01 GMT)
Message #17 received at 13802 <at> debbugs.gnu.org (full text, mbox):
Juri Linkov <juri <at> jurta.org> writes:
>> I see a "Stack overflow in regexp matcher" error traceable back to
>> lisp/gnus/mm-decode.el func ‘mm-add-meta-html-tag’ fragment:
>>
>> (re-search-forward "\
>> <meta\\s-+http-equiv=[\"']?content-type[\"']?\\s-+content=[\"']\
>> text/\\(\\sw+\\)\\(?:\;\\s-*charset=\\(.+\\)\\)?[\"'][^>]*>" nil t)
>>
>> To allow the user (not me) to continue, i kludged the form to be:
>>
>> (ignore-errors
>> (re-search-forward "..." nil t))
>>
>> that is, wrapping w/ ‘ignore-errors’. Is there a better solution?
>
> `sgml-html-meta-auto-coding-function' uses a similar regexp
> that doesn't fail with stack overflow. You could get some ideas
> from this regexp and sync the regexp in `mm-add-meta-html-tag' with it.
I've adapted the regexp from that function in the patch below, but since
I don't have a test case, I'm not really sure about committing it.
Thien-Thi, could you post the message that triggers this error, or the
relevant bits of it?
diff --git a/lisp/mm-decode.el b/lisp/mm-decode.el
index 17c8fb1..eaf9de4 100644
--- a/lisp/mm-decode.el
+++ b/lisp/mm-decode.el
@@ -1405,9 +1405,7 @@ Return t if meta tag is added or replaced."
<meta http-equiv=\"Content-Type\" content=\"text/html; charset=%s\">" charset))
(let ((case-fold-search t))
(goto-char (point-min))
- (if (re-search-forward "\
-<meta\\s-+http-equiv=[\"']?content-type[\"']?\\s-+content=[\"']\
-text/\\(\\sw+\\)\\(?:\;\\s-*charset=\\([^\"'>]+\\)\\)?[^>]*>" nil t)
+ (if (re-search-forward "<meta\\s-+\\http-equiv=[\"']?content-type[\"']?\\s-+content=[\"']text/\\(\\sw+\\)\\(?:;\\s-*?charset=[\"']?\\(.+?\\)\\)[\"'\\s-/>]" nil t)
(if (and (not force-charset)
(match-beginning 2)
(string-match "\\`html\\'" (match-string 1)))
--
(domestic pets only, the antidote for overdose, milk.)
bloggy blog http://lars.ingebrigtsen.no/
Information forwarded to bug-gnu-emacs <at> gnu.org, bugs <at> gnus.org: bug#13802; Package emacs,gnus. (Fri, 31 Jan 2014 06:08:02 GMT)
Message #20 received at 13802 <at> debbugs.gnu.org (full text, mbox):
[Message part 1 (text/plain, inline)]
() Lars Ingebrigtsen <larsi <at> gnus.org>
() Thu, 30 Jan 2014 16:38:49 -0800
    Juri Linkov <juri <at> jurta.org> writes:
    > `sgml-html-meta-auto-coding-function' uses a similar regexp that
    > doesn't fail with stack overflow. You could get some ideas from
    > this regexp and sync the regexp in `mm-add-meta-html-tag' with it.
    I've adapted the regexp from that function in the patch below, but
    since I don't have a test case, I'm not really sure about committing
    it.
    Thien-Thi, could you post the message that triggers this error, or
    the relevant bits of it?
I'd like to, but no longer have immediate access to that particular
message -- it might take a day or two to excavate (if at all). However,
i do remember it was all on one line (no newlines, machine generated).
    diff [...]
    - ORIGINAL-HAIRY-REGEXP
    + ANOTHER-HAIRY-REGEXP
Maybe this would be a good time to substitute a symbolic regexp?
--
Thien-Thi Nguyen
GPG key: 4C807502
(if you're human and you know it)
read my lisp: (responsep (questions 'technical)
(not (via 'mailing-list)))
=> nil
[Message part 2 (application/pgp-signature, inline)]
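The "symbolic regexp" suggested in the message above could look like the following `rx' form. It is a sketch only: it uses explicit character sets rather than the syntax classes of the original regexp, the constant name `my-meta-content-type-rx' is hypothetical, and it has not been checked against the message that triggered the overflow.

```elisp
(require 'rx)

(defconst my-meta-content-type-rx
  (rx "<meta" (+ (any " \t\n"))
      "http-equiv=" (opt (any "\"'")) "content-type" (opt (any "\"'"))
      (+ (any " \t\n"))
      "content=" (any "\"'")
      ;; Group 1: the text/ subtype, e.g. "html".
      "text/" (group (+ (not (any ";'\""))))
      ;; Group 2 (optional): the charset value.
      (opt ";" (* (any " \t\n"))
           "charset=" (group (+ (not (any "'\"\n")))))
      (any "\"'") (* (not (any ">"))) ">")
  "Hypothetical symbolic form of a Content-Type meta tag regexp.")

;; Usage: `rx' expands to an ordinary regexp string at load time.
(let ((case-fold-search t)
      (html "<meta http-equiv=\"Content-Type\" content=\"text/html; charset=utf-8\">"))
  (when (string-match my-meta-content-type-rx html)
    (list (match-string 1 html) (match-string 2 html))))
```

The symbolic form makes the two capture groups and the optional charset clause easier to see than in the backslash-heavy string.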
Information forwarded to bug-gnu-emacs <at> gnu.org, bugs <at> gnus.org: bug#13802; Package emacs,gnus. (Tue, 01 Mar 2016 14:14:02 GMT)
Message #23 received at 13802 <at> debbugs.gnu.org (full text, mbox):
Lars Ingebrigtsen <larsi <at> gnus.org> writes:
> I've adapted the regexp from that function in the patch below, but since
> I don't have a test case, I'm not really sure about committing it.
>
> Thien-Thi, could you post the message that triggers this error, or the
> relevant bits of it?
[...]
> - (if (re-search-forward "\
> -<meta\\s-+http-equiv=[\"']?content-type[\"']?\\s-+content=[\"']\
> -text/\\(\\sw+\\)\\(?:\;\\s-*charset=\\([^\"'>]+\\)\\)?[^>]*>" nil t)
> + (if (re-search-forward "<meta\\s-+\\http-equiv=[\"']?content-type[\"']?\\s-+content=[\"']text/\\(\\sw+\\)\\(?:;\\s-*?charset=[\"']?\\(.+?\\)\\)[\"'\\s-/>]" nil t)
> (if (and (not force-charset)
Since we have no test case for this, and I haven't seen any other
reports in this area, I'm not applying my patch, and I'm closing this
report. If you see this again, please reopen.
--
(domestic pets only, the antidote for overdose, milk.)
bloggy blog: http://lars.ingebrigtsen.no
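Since the report was closed for lack of a test case, a small probe like the one below might help anyone who sees the error again. It runs the regexp quoted in the original report over a given string and returns a marker symbol if the matcher signals. The function name `my-probe-meta-search' is hypothetical, and whether any particular synthetic input actually overflows is not known.

```elisp
(defun my-probe-meta-search (html)
  "Run the regexp from the original report over the string HTML.
Return the position after the match, nil if there is no match, or
the symbol `signaled-error' if the matcher signals an error."
  (with-temp-buffer
    (insert html)
    (goto-char (point-min))
    (let ((case-fold-search t))
      (condition-case nil
          (re-search-forward
           (concat
            "<meta\\s-+http-equiv=[\"']?content-type[\"']?\\s-+content=[\"']"
            "text/\\(\\sw+\\)\\(?:;\\s-*charset=\\(.+\\)\\)?[\"'][^>]*>")
           nil t)
        (error 'signaled-error)))))

;; Usage: a one-line, machine-generated-looking input, as recalled in
;; the thread.  The result for such input is deliberately not asserted.
(my-probe-meta-search
 (concat "<meta http-equiv=\"Content-Type\" content=\"text/html; charset=utf-8\">"
         (make-string 200000 ?x)))
```

A return value of `signaled-error' for some input would give a candidate test case to attach when reopening the report.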
bug closed, send any further explanations to 13802 <at> debbugs.gnu.org and Thien-Thi Nguyen <ttn <at> gnuvola.org>. Request was from Lars Ingebrigtsen <larsi <at> gnus.org> to control <at> debbugs.gnu.org. (Tue, 01 Mar 2016 14:14:04 GMT)
bug archived. Request was from Debbugs Internal Request <help-debbugs <at> gnu.org> to internal_control <at> debbugs.gnu.org. (Wed, 30 Mar 2016 11:24:03 GMT)
This bug report was last modified 9 years and 86 days ago.
GNU bug tracking system
Copyright (C) 1999 Darren O. Benham,
1997,2003 nCipher Corporation Ltd,
1994-97 Ian Jackson.