GNU bug report logs - #13802
stack overflow in mm-add-meta-html-tag

Packages: emacs, gnus;

Reported by: Thien-Thi Nguyen <ttn <at> gnuvola.org>

Date: Sun, 24 Feb 2013 09:18:02 UTC

Severity: normal

Done: Lars Ingebrigtsen <larsi <at> gnus.org>

Bug is archived. No further changes may be made.

To add a comment to this bug, you must first unarchive it, by sending
a message to control AT debbugs.gnu.org, with unarchive 13802 in the body.
You can then email your comments to 13802 AT debbugs.gnu.org in the normal way.

Report forwarded to bug-gnu-emacs <at> gnu.org:
bug#13802; Package emacs. (Sun, 24 Feb 2013 09:18:02 GMT)

Acknowledgement sent to Thien-Thi Nguyen <ttn <at> gnuvola.org>:
New bug report received and forwarded. Copy sent to bug-gnu-emacs <at> gnu.org. (Sun, 24 Feb 2013 09:18:02 GMT)

Message #5 received at submit <at> debbugs.gnu.org:

From: Thien-Thi Nguyen <ttn <at> gnuvola.org>
To: bug-gnu-emacs <at> gnu.org
Subject: stack overflow in mm-add-meta-html-tag
Date: Sun, 24 Feb 2013 10:17:53 +0100
[Message part 1 (text/plain, inline)]
I see a "Stack overflow in regexp matcher" error traceable back to
lisp/gnus/mm-decode.el func ‘mm-add-meta-html-tag’ fragment:

  (re-search-forward "\
  <meta\\s-+http-equiv=[\"']?content-type[\"']?\\s-+content=[\"']\
  text/\\(\\sw+\\)\\(?:\;\\s-*charset=\\(.+\\)\\)?[\"'][^>]*>" nil t)

To allow the user (not me) to continue, i kludged the form to be:

  (ignore-errors
    (re-search-forward "..." nil t))

that is, wrapping w/ ‘ignore-errors’.  Is there a better solution?

One idea (untested) is to replace the ".+" (used to match the charset)
with a more specific pattern.  Perhaps "[^<>]+" or "\\sw+"?

Thinking more systematically, maybe Emacs should add a condition
‘stack-overflow/regexp’ (or something like that) such that code can
‘condition-case’ for it and try a fallback path.
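Short of a dedicated condition, a narrower fallback than ‘ignore-errors’ can already be written. The sketch below assumes that the overflow is signaled as a plain ‘error’ whose message is exactly "Stack overflow in regexp matcher" (the text quoted in this report); the function name is hypothetical. Only that one error is treated as a failed search; anything else, such as an invalid regexp, is re-signaled.

```elisp
;; Hypothetical helper: like (re-search-forward REGEXP BOUND t), but a
;; regexp-matcher stack overflow is reported as "no match" instead of
;; an error.  All other errors propagate unchanged.
(defun my-re-search-forward-no-overflow (regexp &optional bound)
  "Search forward for REGEXP (up to BOUND), returning nil on failure.
A \"Stack overflow in regexp matcher\" error is treated like a
failed search; any other error is re-signaled, unlike with a
blanket `ignore-errors'."
  (condition-case err
      (re-search-forward regexp bound t)
    (error
     (if (equal (error-message-string err)
                "Stack overflow in regexp matcher")
         nil                              ; give up on this search only
       (signal (car err) (cdr err))))))   ; propagate everything else
```

Matching on the message string is fragile; a dedicated error symbol, as proposed above, would let ‘condition-case’ select the overflow directly.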

-- 
Thien-Thi Nguyen ..................................... GPG key: 4C807502
.                  NB: ttn at glug dot org is not me                   .
.                 (and has not been since 2007 or so)                  .
.                        ACCEPT NO SUBSTITUTES                         .
........... please send technical questions to mailing lists ...........
[Message part 2 (application/pgp-signature, inline)]

Information forwarded to bug-gnu-emacs <at> gnu.org, bugs <at> gnus.org:
bug#13802; Package emacs,gnus. (Mon, 25 Feb 2013 00:36:01 GMT)

Message #8 received at 13802 <at> debbugs.gnu.org:

From: Juri Linkov <juri <at> jurta.org>
To: Thien-Thi Nguyen <ttn <at> gnuvola.org>
Cc: 13802 <at> debbugs.gnu.org
Subject: Re: bug#13802: stack overflow in mm-add-meta-html-tag
Date: Mon, 25 Feb 2013 02:20:39 +0200
> I see a "Stack overflow in regexp matcher" error traceable back to
> lisp/gnus/mm-decode.el func ‘mm-add-meta-html-tag’ fragment:
>
>   (re-search-forward "\
>   <meta\\s-+http-equiv=[\"']?content-type[\"']?\\s-+content=[\"']\
>   text/\\(\\sw+\\)\\(?:\;\\s-*charset=\\(.+\\)\\)?[\"'][^>]*>" nil t)
>
> To allow the user (not me) to continue, i kludged the form to be:
>
>   (ignore-errors
>     (re-search-forward "..." nil t))
>
> that is, wrapping w/ ‘ignore-errors’.  Is there a better solution?

`sgml-html-meta-auto-coding-function' uses a similar regexp
that doesn't fail with stack overflow.  You could get some ideas
from this regexp and sync the regexp in `mm-add-meta-html-tag' with it.




Information forwarded to bug-gnu-emacs <at> gnu.org, bugs <at> gnus.org:
bug#13802; Package emacs,gnus. (Mon, 25 Feb 2013 02:06:02 GMT)

Message #11 received at 13802 <at> debbugs.gnu.org:

From: Stefan Monnier <monnier <at> iro.umontreal.ca>
To: Thien-Thi Nguyen <ttn <at> gnuvola.org>
Cc: 13802 <at> debbugs.gnu.org
Subject: Re: bug#13802: stack overflow in mm-add-meta-html-tag
Date: Sun, 24 Feb 2013 21:04:21 -0500
> I see a "Stack overflow in regexp matcher" error traceable back to
> lisp/gnus/mm-decode.el func ‘mm-add-meta-html-tag’ fragment:

>   (re-search-forward "\
>   <meta\\s-+http-equiv=[\"']?content-type[\"']?\\s-+content=[\"']\
>   text/\\(\\sw+\\)\\(?:\;\\s-*charset=\\(.+\\)\\)?[\"'][^>]*>" nil t)

Hmm... I don't see any obvious reason for a stack overflow unless the
text has some very long lines or a lot of space between elements.

> One idea (untested) is to replace the ".+" (used to match the charset)
> with a more specific pattern.  Perhaps "[^<>]+" or "\\sw+"?

I don't think that would help.  To avoid such overflow, you need to
reduce the backtracking, i.e. reduce the number of cases where two
options are possible according to the simplistic regexp-optimizer.
The \s<CHAR> pattern is actually very poor in this respect, because the
optimizer can't know anything about the chars that this matches (since
it depends on text-properties).
The flip side is that replacing \\s- with [ \t\n] might help (this way,
the optimizer will see that the + repetition does not need backtracking
since a char cannot both match a loop iteration and the "after the
loop" content).
Similarly, using [^;'\"]+ instead of \\sw+ would help, and maybe replacing
.+ with [^'\"\n]+ would help as well.

> Thinking more systematically, maybe Emacs should add a condition
> ‘stack-overflow/regexp’ (or something like that) such that code can
> ‘condition-case’ for it and try a fallback path.

In reality, such overflow should only ever happen if you have backrefs
in your regexp.


        Stefan




Information forwarded to bug-gnu-emacs <at> gnu.org, bugs <at> gnus.org:
bug#13802; Package emacs,gnus. (Sat, 06 Jul 2013 16:12:02 GMT)

Message #14 received at 13802 <at> debbugs.gnu.org:

From: Lars Ingebrigtsen <larsi <at> gnus.org>
To: Thien-Thi Nguyen <ttn <at> gnuvola.org>
Cc: 13802 <at> debbugs.gnu.org
Subject: Re: bug#13802: stack overflow in mm-add-meta-html-tag
Date: Sat, 06 Jul 2013 18:11:00 +0200
Thien-Thi Nguyen <ttn <at> gnuvola.org> writes:

> I see a "Stack overflow in regexp matcher" error traceable back to
> lisp/gnus/mm-decode.el func ‘mm-add-meta-html-tag’ fragment:
>
>   (re-search-forward "\
>   <meta\\s-+http-equiv=[\"']?content-type[\"']?\\s-+content=[\"']\
>   text/\\(\\sw+\\)\\(?:\;\\s-*charset=\\(.+\\)\\)?[\"'][^>]*>" nil t)

Do you know what text it is that triggers this bug?

-- 
(domestic pets only, the antidote for overdose, milk.)
  bloggy blog http://lars.ingebrigtsen.no/




Information forwarded to bug-gnu-emacs <at> gnu.org, bugs <at> gnus.org:
bug#13802; Package emacs,gnus. (Fri, 31 Jan 2014 00:40:01 GMT)

Message #17 received at 13802 <at> debbugs.gnu.org:

From: Lars Ingebrigtsen <larsi <at> gnus.org>
To: Juri Linkov <juri <at> jurta.org>
Cc: Thien-Thi Nguyen <ttn <at> gnuvola.org>, 13802 <at> debbugs.gnu.org
Subject: Re: bug#13802: stack overflow in mm-add-meta-html-tag
Date: Thu, 30 Jan 2014 16:38:49 -0800
Juri Linkov <juri <at> jurta.org> writes:

>> I see a "Stack overflow in regexp matcher" error traceable back to
>> lisp/gnus/mm-decode.el func ‘mm-add-meta-html-tag’ fragment:
>>
>>   (re-search-forward "\
>>   <meta\\s-+http-equiv=[\"']?content-type[\"']?\\s-+content=[\"']\
>>   text/\\(\\sw+\\)\\(?:\;\\s-*charset=\\(.+\\)\\)?[\"'][^>]*>" nil t)
>>
>> To allow the user (not me) to continue, i kludged the form to be:
>>
>>   (ignore-errors
>>     (re-search-forward "..." nil t))
>>
>> that is, wrapping w/ ‘ignore-errors’.  Is there a better solution?
>
> `sgml-html-meta-auto-coding-function' uses a similar regexp
> that doesn't fail with stack overflow.  You could get some ideas
> from this regexp and sync the regexp in `mm-add-meta-html-tag' with it.

I've adapted the regexp from that function in the patch below, but since
I don't have a test case, I'm not really sure about committing it.

Thien-Thi, could you post the message that triggers this error, or the
relevant bits of it?

diff --git a/lisp/mm-decode.el b/lisp/mm-decode.el
index 17c8fb1..eaf9de4 100644
--- a/lisp/mm-decode.el
+++ b/lisp/mm-decode.el
@@ -1405,9 +1405,7 @@ Return t if meta tag is added or replaced."
 <meta http-equiv=\"Content-Type\" content=\"text/html; charset=%s\">" charset))
       (let ((case-fold-search t))
 	(goto-char (point-min))
-	(if (re-search-forward "\
-<meta\\s-+http-equiv=[\"']?content-type[\"']?\\s-+content=[\"']\
-text/\\(\\sw+\\)\\(?:\;\\s-*charset=\\([^\"'>]+\\)\\)?[^>]*>" nil t)
+	(if (re-search-forward "<meta\\s-+\\http-equiv=[\"']?content-type[\"']?\\s-+content=[\"']text/\\(\\sw+\\)\\(?:;\\s-*?charset=[\"']?\\(.+?\\)\\)[\"'\\s-/>]" nil t)
 	    (if (and (not force-charset)
 		     (match-beginning 2)
 		     (string-match "\\`html\\'" (match-string 1)))


-- 
(domestic pets only, the antidote for overdose, milk.)
  bloggy blog http://lars.ingebrigtsen.no/




Information forwarded to bug-gnu-emacs <at> gnu.org, bugs <at> gnus.org:
bug#13802; Package emacs,gnus. (Fri, 31 Jan 2014 06:08:02 GMT)

Message #20 received at 13802 <at> debbugs.gnu.org:

From: Thien-Thi Nguyen <ttn <at> gnuvola.org>
To: Lars Ingebrigtsen <larsi <at> gnus.org>
Cc: Juri Linkov <juri <at> jurta.org>, 13802 <at> debbugs.gnu.org
Subject: Re: bug#13802: stack overflow in mm-add-meta-html-tag
Date: Fri, 31 Jan 2014 07:10:28 +0100
[Message part 1 (text/plain, inline)]
() Lars Ingebrigtsen <larsi <at> gnus.org>
() Thu, 30 Jan 2014 16:38:49 -0800

   Juri Linkov <juri <at> jurta.org> writes:

   > `sgml-html-meta-auto-coding-function' uses a similar regexp that
   > doesn't fail with stack overflow.  You could get some ideas from
   > this regexp and sync the regexp in `mm-add-meta-html-tag' with it.

   I've adapted the regexp from that function in the patch below, but
   since I don't have a test case, I'm not really sure about committing
   it.

   Thien-Thi, could you post the message that triggers this error, or
   the relevant bits of it?

I'd like to, but no longer have immediate access to that particular
message -- it might take a day or two to excavate (if at all).  However,
i do remember it was all on one line (no newlines, machine generated).

   diff [...]
   - ORIGINAL-HAIRY-REGEXP
   + ANOTHER-HAIRY-REGEXP

Maybe this would be a good time to substitute a symbolic regexp?
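A symbolic version could use Emacs's built-in ‘rx’ macro. The sketch below assumes that is the kind of notation meant; it keeps the structure of the original regexp and uses the narrower character classes Stefan suggested earlier in the thread. The constant name is hypothetical and the form is untested against the triggering message.

```elisp
(require 'rx)

;; Hypothetical name.  The `rx' macro expands to an ordinary regexp
;; string at load time, so searching with it costs nothing extra.
(defconst my-meta-content-type-rx
  (rx "<meta" (+ (any " \t\n"))
      "http-equiv=" (opt (any "\"'")) "content-type" (opt (any "\"'"))
      (+ (any " \t\n"))
      "content=" (any "\"'") "text/"
      (group (+ (not (any ";'\""))))                  ; subtype, e.g. "html"
      (opt ";" (* (any " \t\n"))
           "charset=" (group (+ (not (any "'\"\n"))))) ; optional charset
      (any "\"'") (* (not (any ">"))) ">"))
```

Each piece of the pattern is now named and can be commented, which makes a later change such as the one in the patch above much easier to review.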

-- 
Thien-Thi Nguyen
   GPG key: 4C807502
   (if you're human and you know it)
      read my lisp: (responsep (questions 'technical)
                               (not (via 'mailing-list)))
                     => nil
[Message part 2 (application/pgp-signature, inline)]

Information forwarded to bug-gnu-emacs <at> gnu.org, bugs <at> gnus.org:
bug#13802; Package emacs,gnus. (Tue, 01 Mar 2016 14:14:02 GMT)

Message #23 received at 13802 <at> debbugs.gnu.org:

From: Lars Ingebrigtsen <larsi <at> gnus.org>
To: Juri Linkov <juri <at> jurta.org>
Cc: 13802 <at> debbugs.gnu.org, Thien-Thi Nguyen <ttn <at> gnuvola.org>
Subject: Re: bug#13802: stack overflow in mm-add-meta-html-tag
Date: Tue, 01 Mar 2016 16:58:27 +1100
Lars Ingebrigtsen <larsi <at> gnus.org> writes:

> I've adapted the regexp from that function in the patch below, but since
> I don't have a test case, I'm not really sure about committing it.
>
> Thien-Thi, could you post the message that triggers this error, or the
> relevant bits of it?

[...]

> -	(if (re-search-forward "\
> -<meta\\s-+http-equiv=[\"']?content-type[\"']?\\s-+content=[\"']\
> -text/\\(\\sw+\\)\\(?:\;\\s-*charset=\\([^\"'>]+\\)\\)?[^>]*>" nil t)
> +	(if (re-search-forward "<meta\\s-+\\http-equiv=[\"']?content-type[\"']?\\s-+content=[\"']text/\\(\\sw+\\)\\(?:;\\s-*?charset=[\"']?\\(.+?\\)\\)[\"'\\s-/>]" nil t)
>  	    (if (and (not force-charset)

Since we have no test case for this, and I haven't seen any other
reports in this area, I'm not applying my patch, and I'm closing this
report.  If you see this again, please reopen.

-- 
(domestic pets only, the antidote for overdose, milk.)
   bloggy blog: http://lars.ingebrigtsen.no




bug closed, send any further explanations to 13802 <at> debbugs.gnu.org and Thien-Thi Nguyen <ttn <at> gnuvola.org> Request was from Lars Ingebrigtsen <larsi <at> gnus.org> to control <at> debbugs.gnu.org. (Tue, 01 Mar 2016 14:14:04 GMT)

bug archived. Request was from Debbugs Internal Request <help-debbugs <at> gnu.org> to internal_control <at> debbugs.gnu.org. (Wed, 30 Mar 2016 11:24:03 GMT)


GNU bug tracking system
Copyright (C) 1999 Darren O. Benham, 1997,2003 nCipher Corporation Ltd, 1994-97 Ian Jackson.