GNU bug report logs -
#24751
26.0.50; Regex stack overflow not detected properly (gets "Variable binding depth exceeds max-specpdl-size")
Previous Next
Reported by: npostavs <at> users.sourceforge.net
Date: Fri, 21 Oct 2016 03:54:01 UTC
Severity: normal
Tags: fixed, patch
Found in version 26.0.50
Fixed in version 26.1
Done: npostavs <at> users.sourceforge.net
Bug is archived. No further changes may be made.
To add a comment to this bug, you must first unarchive it, by sending
a message to control AT debbugs.gnu.org, with unarchive 24751 in the body.
You can then email your comments to 24751 AT debbugs.gnu.org in the normal way.
Toggle the display of automated, internal messages from the tracker.
Report forwarded
to
bug-gnu-emacs <at> gnu.org
:
bug#24751
; Package
emacs
.
(Fri, 21 Oct 2016 03:54:02 GMT)
Full text and
rfc822 format available.
Acknowledgement sent
to
npostavs <at> users.sourceforge.net
:
New bug report received and forwarded. Copy sent to
bug-gnu-emacs <at> gnu.org
.
(Fri, 21 Oct 2016 03:54:02 GMT)
Full text and
rfc822 format available.
Message #5 received at submit <at> debbugs.gnu.org (full text, mbox):
[Message part 1 (text/plain, inline)]
This is a followup from Bug #24358, since that thread filled up with
messages about the crash in 25.1, so I'm opening a new bug for the wrong
regex stack overflow check (this affects only the master branch).
To reproduce the error run (re-search-forward ".*\\(\n.*\\)*" nil t) in
a buffer with contents of the attached bug-24358-regex-max-specpdl.txt.
[bug-24358-regex-max-specpdl.txt (text/plain, attachment)]
[Message part 3 (text/plain, inline)]
Here is my analysis from
https://debbugs.gnu.org/cgi/bugreport.cgi?bug=24358#27 again:
[Message part 4 (message/rfc822, inline)]
> (I'm also on GNU/Linux, Arch) I get the same max-specpdl-size error
> with 25.1.50 [this refers to the master branch which is now 26.0.50
> (emacs-25 was 25.0.xx at the time)]. With emacs version 24.5 (and
> below) I get (error "Stack overflow in regexp matcher") [as expected].
The problem is that re_max_failures is set to 40001 (instead of the
original 40000) in main()[1], which is a problem because of the
GROW_FAIL_STACK uses (re_max_failures * TYPICAL_FAILURE_SIZE) as a cap
on the amount to allocate, but ((fail_stack).size * sizeof
(fail_stack_elt_t)) to calculate current allocation.
Since TYPICAL_FAILURE_SIZE = 20 and sizeof (fail_stack_elt_t) = 8, and
it seems that (fail_stack).size grows in increments of 3, when
(fail_stack).avail is 99999 and (fail_stack).size reaches 100002:
(fail_stack).size * sizeof (fail_stack_elt_t) => 800016
re_max_failures * TYPICAL_FAILURE_SIZE => 800020
ENSURE_FAIL_STACK(3) then loops indefinitely reallocating a stack of
size 800020 again and again until the record_xmalloc fails to
grow_specdl() (thus the "max-specpdl-size" error).
----------
So we we might want to fix the re_max_failures setting in main, but it
doesn't quite make sense to me that GROW_FAIL_STACK relies on
re_max_failures being a multiple of (sizeof (fail_stack_elt_t)). At the
definition of TYPICAL_FAILURE_SIZE we have
/* Estimate the size of data pushed by a typical failure stack entry.
An estimate is all we need, because all we use this for
is to choose a limit for how big to make the failure stack. */
/* BEWARE, the value `20' is hard-coded in emacs.c:main(). */
#define TYPICAL_FAILURE_SIZE 20
Why do we use an "estimate" here? What's wrong with just using
(re_max_failures * sizeof (fail_stack_elt_t)) as the limit? Or should
the limit actually be (re_max_failures * TYPICAL_FAILURE_SIZE * sizeof
(fail_stack_elt_t))?
-----------
827 long lim = rlim.rlim_cur;
(gdb) p rlim
$1 = {
rlim_cur = 8388608,
rlim_max = 18446744073709551615
}
(gdb) next
833 int ratio = 20 * sizeof (char *);
(gdb)
834 ratio += ratio / 3;
(gdb)
837 int extra = 200000;
(gdb) p ratio
$2 = 213
[...]
(gdb) display ((newlim - extra) / ratio)
1: ((newlim - extra) / ratio) = 40000
(gdb) next
856 newlim += pagesize - 1;
1: ((newlim - extra) / ratio) = 40000
(gdb)
857 if (0 <= rlim.rlim_max && rlim.rlim_max < newlim)
1: ((newlim - extra) / ratio) = 40019
(gdb)
859 newlim -= newlim % pagesize;
1: ((newlim - extra) / ratio) = 40019
(gdb)
861 if (pagesize <= newlim - lim)
1: ((newlim - extra) / ratio) = 40001
(gdb) undisplay 1
(gdb) next
863 rlim.rlim_cur = newlim;
(gdb)
864 if (setrlimit (RLIMIT_STACK, &rlim) == 0)
(gdb)
865 lim = newlim;
(gdb)
870 re_max_failures = lim < extra ? 0 : min (lim - extra, SIZE_MAX) / ratio;
(gdb)
875 stack_bottom = &stack_bottom_variable;
(gdb) p re_max_failures
$3 = 40001
-----------
[1]: This was the case since 9d356f62 2016-05-27 "Robustify stack-size calculation"
Information forwarded
to
bug-gnu-emacs <at> gnu.org
:
bug#24751
; Package
emacs
.
(Fri, 04 Nov 2016 08:22:02 GMT)
Full text and
rfc822 format available.
Message #8 received at 24751 <at> debbugs.gnu.org (full text, mbox):
> From: npostavs <at> users.sourceforge.net
> Date: Thu, 20 Oct 2016 23:54:05 -0400
>
> So we we might want to fix the re_max_failures setting in main, but it
> doesn't quite make sense to me that GROW_FAIL_STACK relies on
> re_max_failures being a multiple of (sizeof (fail_stack_elt_t)). At the
> definition of TYPICAL_FAILURE_SIZE we have
>
> /* Estimate the size of data pushed by a typical failure stack entry.
> An estimate is all we need, because all we use this for
> is to choose a limit for how big to make the failure stack. */
> /* BEWARE, the value `20' is hard-coded in emacs.c:main(). */
> #define TYPICAL_FAILURE_SIZE 20
>
> Why do we use an "estimate" here? What's wrong with just using
> (re_max_failures * sizeof (fail_stack_elt_t)) as the limit? Or should
> the limit actually be (re_max_failures * TYPICAL_FAILURE_SIZE * sizeof
> (fail_stack_elt_t))?
I think it should be the latter, indeed.
Can you propose a patch along those lines that would remove the
infloop in ENSURE_FAIL_STACK?
Thanks.
Information forwarded
to
bug-gnu-emacs <at> gnu.org
:
bug#24751
; Package
emacs
.
(Sat, 05 Nov 2016 19:34:02 GMT)
Full text and
rfc822 format available.
Message #11 received at 24751 <at> debbugs.gnu.org (full text, mbox):
Eli Zaretskii <eliz <at> gnu.org> writes:
>> From: npostavs <at> users.sourceforge.net
>> Date: Thu, 20 Oct 2016 23:54:05 -0400
>>
>> So we we might want to fix the re_max_failures setting in main, but it
>> doesn't quite make sense to me that GROW_FAIL_STACK relies on
>> re_max_failures being a multiple of (sizeof (fail_stack_elt_t)). At the
>> definition of TYPICAL_FAILURE_SIZE we have
>>
>> /* Estimate the size of data pushed by a typical failure stack entry.
>> An estimate is all we need, because all we use this for
>> is to choose a limit for how big to make the failure stack. */
>> /* BEWARE, the value `20' is hard-coded in emacs.c:main(). */
>> #define TYPICAL_FAILURE_SIZE 20
>>
>> Why do we use an "estimate" here? What's wrong with just using
>> (re_max_failures * sizeof (fail_stack_elt_t)) as the limit? Or should
>> the limit actually be (re_max_failures * TYPICAL_FAILURE_SIZE * sizeof
>> (fail_stack_elt_t))?
>
> I think it should be the latter, indeed.
>
> Can you propose a patch along those lines that would remove the
> infloop in ENSURE_FAIL_STACK?
>
> Thanks.
The below seems to work, but effectively increases the size of the
failure stack (so the sample file size has to be increased 8-fold to get
a regex stack overflow). Strangely, changing the value in the
definition of re_max_failures doesn't seem to have any effect, it stays
40000 regardless. I am quite confused.
diff --git i/src/regex.c w/src/regex.c
index 1c6c9e5..163c5b4 100644
--- i/src/regex.c
+++ w/src/regex.c
@@ -1320,19 +1320,22 @@ WEAK_ALIAS (__re_set_syntax, re_set_syntax)
#define GROW_FAIL_STACK(fail_stack) \
(((fail_stack).size * sizeof (fail_stack_elt_t) \
- >= re_max_failures * TYPICAL_FAILURE_SIZE) \
+ >= re_max_failures * sizeof (fail_stack_elt_t) \
+ * TYPICAL_FAILURE_SIZE) \
? 0 \
: ((fail_stack).stack \
= REGEX_REALLOCATE_STACK ((fail_stack).stack, \
(fail_stack).size * sizeof (fail_stack_elt_t), \
- min (re_max_failures * TYPICAL_FAILURE_SIZE, \
+ min (re_max_failures * sizeof (fail_stack_elt_t) \
+ * TYPICAL_FAILURE_SIZE, \
((fail_stack).size * sizeof (fail_stack_elt_t) \
* FAIL_STACK_GROWTH_FACTOR))), \
\
(fail_stack).stack == NULL \
? 0 \
: ((fail_stack).size \
- = (min (re_max_failures * TYPICAL_FAILURE_SIZE, \
+ = (min (re_max_failures * sizeof (fail_stack_elt_t) \
+ * TYPICAL_FAILURE_SIZE, \
((fail_stack).size * sizeof (fail_stack_elt_t) \
* FAIL_STACK_GROWTH_FACTOR)) \
/ sizeof (fail_stack_elt_t)), \
Information forwarded
to
bug-gnu-emacs <at> gnu.org
:
bug#24751
; Package
emacs
.
(Sun, 06 Nov 2016 15:46:02 GMT)
Full text and
rfc822 format available.
Message #14 received at 24751 <at> debbugs.gnu.org (full text, mbox):
> From: npostavs <at> users.sourceforge.net
> Cc: 24751 <at> debbugs.gnu.org
> Date: Sat, 05 Nov 2016 15:34:29 -0400
>
> >> #define TYPICAL_FAILURE_SIZE 20
> >>
> >> Why do we use an "estimate" here? What's wrong with just using
> >> (re_max_failures * sizeof (fail_stack_elt_t)) as the limit? Or should
> >> the limit actually be (re_max_failures * TYPICAL_FAILURE_SIZE * sizeof
> >> (fail_stack_elt_t))?
> >
> > I think it should be the latter, indeed.
> >
> > Can you propose a patch along those lines that would remove the
> > infloop in ENSURE_FAIL_STACK?
> >
> > Thanks.
>
> The below seems to work
Thanks.
I think the patch can be simplified, where we now multiply by the size
of fail_stack_elt_t and then divide by it: simply remove both the
multiplication and the division. That will make the code easier to
read, and will make the units of each variable clear, something that I
think is at the heart of this issue.
> but effectively increases the size of the failure stack (so the
> sample file size has to be increased 8-fold to get a regex stack
> overflow).
Which IMO is exactly TRT, since re_max_failures was computed given the
runtime stack size of 8MB, so having it bail out after merely 800KB
doesn't sound right to me, don't you agree?
> Strangely, changing the value in the definition of re_max_failures
> doesn't seem to have any effect, it stays 40000 regardless. I am
> quite confused.
I don't think I follow. Can you tell what you tried to change, and
where did you see the lack of any effect?
Information forwarded
to
bug-gnu-emacs <at> gnu.org
:
bug#24751
; Package
emacs
.
(Sun, 13 Nov 2016 05:40:01 GMT)
Full text and
rfc822 format available.
Message #17 received at 24751 <at> debbugs.gnu.org (full text, mbox):
[Message part 1 (text/plain, inline)]
Eli Zaretskii <eliz <at> gnu.org> writes:
>
> I think the patch can be simplified, where we now multiply by the size
> of fail_stack_elt_t and then divide by it: simply remove both the
> multiplication and the division. That will make the code easier to
> read, and will make the units of each variable clear, something that I
> think is at the heart of this issue.
Ah, right.
[v2-0001-Fix-computation-of-regex-stack-limit.patch (text/plain, attachment)]
[Message part 3 (text/plain, inline)]
>
>> but effectively increases the size of the failure stack (so the
>> sample file size has to be increased 8-fold to get a regex stack
>> overflow).
>
> Which IMO is exactly TRT, since re_max_failures was computed given the
> runtime stack size of 8MB, so having it bail out after merely 800KB
> doesn't sound right to me, don't you agree?
Yes, I suppose we should also try to make use of the stack, rather than
calling malloc, right? Something like this:
diff --git i/src/regex.c w/src/regex.c
index d23ba01..dcabde5 100644
--- i/src/regex.c
+++ w/src/regex.c
@@ -447,7 +447,11 @@ init_syntax_once (void)
#else /* not REGEX_MALLOC */
# ifdef emacs
-# define REGEX_USE_SAFE_ALLOCA USE_SAFE_ALLOCA
+# define REGEX_USE_SAFE_ALLOCA \
+ ptrdiff_t sa_avail = re_max_failures \
+ * TYPICAL_FAILURE_SIZE * sizeof (fail_stack_elt_t); \
+ ptrdiff_t sa_count = SPECPDL_INDEX (); bool sa_must_free = false
+
# define REGEX_SAFE_FREE() SAFE_FREE ()
# define REGEX_ALLOCATE SAFE_ALLOCA
# else
>
>> Strangely, changing the value in the definition of re_max_failures
>> doesn't seem to have any effect, it stays 40000 regardless. I am
>> quite confused.
>
> I don't think I follow. Can you tell what you tried to change, and
> where did you see the lack of any effect?
Modifying the initial value of re_max_failures as in the below patch,
has no effect on the size of the file at which stack regex overflow
happens (I confirmed it gets compiled by adding a #warning on the
previous line). Printing re_max_failures in gdb still shows 40000.
diff --git i/src/regex.c w/src/regex.c
index d23ba01..d9170c0 100644
--- i/src/regex.c
+++ w/src/regex.c
@@ -1249,7 +1249,7 @@ WEAK_ALIAS (__re_set_syntax, re_set_syntax)
whose default stack limit is 2mb. In order for a larger
value to work reliably, you have to try to make it accord
with the process stack limit. */
-size_t re_max_failures = 40000;
+size_t re_max_failures = 20;
# else
size_t re_max_failures = 4000;
# endif
Actually I find Emacs still compiles if I removed that line completely,
there's just a compile warning saying
regex.o: In function `re_match_2_internal':
/home/npostavs/src/emacs/emacs-master/lib-src/../src/regex.c:5529: warning: the 're_max_failures' variable is obsolete and will go away.
I guess there's some kind of definition of it in libc?
Information forwarded
to
bug-gnu-emacs <at> gnu.org
:
bug#24751
; Package
emacs
.
(Sun, 13 Nov 2016 16:13:01 GMT)
Full text and
rfc822 format available.
Message #20 received at 24751 <at> debbugs.gnu.org (full text, mbox):
> From: npostavs <at> users.sourceforge.net
> Cc: 24751 <at> debbugs.gnu.org
> Date: Sun, 13 Nov 2016 00:39:39 -0500
>
> > I think the patch can be simplified, where we now multiply by the size
> > of fail_stack_elt_t and then divide by it: simply remove both the
> > multiplication and the division. That will make the code easier to
> > read, and will make the units of each variable clear, something that I
> > think is at the heart of this issue.
>
> Ah, right.
Thanks, LGTM.
> >> but effectively increases the size of the failure stack (so the
> >> sample file size has to be increased 8-fold to get a regex stack
> >> overflow).
> >
> > Which IMO is exactly TRT, since re_max_failures was computed given the
> > runtime stack size of 8MB, so having it bail out after merely 800KB
> > doesn't sound right to me, don't you agree?
>
> Yes, I suppose we should also try to make use of the stack, rather than
> calling malloc, right? Something like this:
>
> diff --git i/src/regex.c w/src/regex.c
> index d23ba01..dcabde5 100644
> --- i/src/regex.c
> +++ w/src/regex.c
> @@ -447,7 +447,11 @@ init_syntax_once (void)
> #else /* not REGEX_MALLOC */
>
> # ifdef emacs
> -# define REGEX_USE_SAFE_ALLOCA USE_SAFE_ALLOCA
> +# define REGEX_USE_SAFE_ALLOCA \
> + ptrdiff_t sa_avail = re_max_failures \
> + * TYPICAL_FAILURE_SIZE * sizeof (fail_stack_elt_t); \
> + ptrdiff_t sa_count = SPECPDL_INDEX (); bool sa_must_free = false
> +
Yes. And please also add a comment there saying that this replaces
USE_SAFE_ALLOCA.
> -size_t re_max_failures = 40000;
> +size_t re_max_failures = 20;
> # else
> size_t re_max_failures = 4000;
> # endif
>
>
> Actually I find Emacs still compiles if I removed that line completely,
> there's just a compile warning saying
>
> regex.o: In function `re_match_2_internal':
> /home/npostavs/src/emacs/emacs-master/lib-src/../src/regex.c:5529: warning: the 're_max_failures' variable is obsolete and will go away.
>
> I guess there's some kind of definition of it in libc?
Most probably. You should be able to see that using "nm -A". If
that's indeed so, I think we should rename that variable to something
like emacs_re_max_failures, to avoid stomping on the libc variable..
Information forwarded
to
bug-gnu-emacs <at> gnu.org
:
bug#24751
; Package
emacs
.
(Tue, 15 Nov 2016 03:08:02 GMT)
Full text and
rfc822 format available.
Message #23 received at 24751 <at> debbugs.gnu.org (full text, mbox):
Eli Zaretskii <eliz <at> gnu.org> writes:
>>
>> Yes, I suppose we should also try to make use of the stack, rather than
>> calling malloc, right? Something like this:
>>
>> diff --git i/src/regex.c w/src/regex.c
>> index d23ba01..dcabde5 100644
>> --- i/src/regex.c
>> +++ w/src/regex.c
>> @@ -447,7 +447,11 @@ init_syntax_once (void)
>> #else /* not REGEX_MALLOC */
>>
>> # ifdef emacs
>> -# define REGEX_USE_SAFE_ALLOCA USE_SAFE_ALLOCA
>> +# define REGEX_USE_SAFE_ALLOCA \
>> + ptrdiff_t sa_avail = re_max_failures \
>> + * TYPICAL_FAILURE_SIZE * sizeof (fail_stack_elt_t); \
>> + ptrdiff_t sa_count = SPECPDL_INDEX (); bool sa_must_free = false
>> +
>
> Yes. And please also add a comment there saying that this replaces
> USE_SAFE_ALLOCA.
Actually, we should avoid increasing this limit if the stack wasn't
increased, right? Here's what I came up with, I think it doesn't cover
Cygwin/Windows though.
diff --git c/src/emacs.c i/src/emacs.c
index b74df21..d4655c8 100644
--- c/src/emacs.c
+++ i/src/emacs.c
@@ -831,8 +831,8 @@ main (int argc, char **argv)
re_max_failures, then add 33% to cover the size of the
smaller stacks that regex.c successively allocates and
discards on its way to the maximum. */
- int ratio = 20 * sizeof (char *);
- ratio += ratio / 3;
+ int min_ratio = 20 * sizeof (char *);
+ int ratio = min_ratio + min_ratio / 3;
/* Extra space to cover what we're likely to use for other reasons. */
int extra = 200000;
@@ -869,6 +869,7 @@ main (int argc, char **argv)
/* Don't let regex.c overflow the stack. */
re_max_failures = lim < extra ? 0 : min (lim - extra, SIZE_MAX) / ratio;
+ emacs_re_safe_alloca = re_max_failures * min_ratio;
}
#endif /* HAVE_SETRLIMIT and RLIMIT_STACK and not CYGWIN */
diff --git c/src/regex.c i/src/regex.c
index d23ba01..56cffa1 100644
--- c/src/regex.c
+++ i/src/regex.c
@@ -447,7 +447,13 @@ init_syntax_once (void)
#else /* not REGEX_MALLOC */
# ifdef emacs
-# define REGEX_USE_SAFE_ALLOCA USE_SAFE_ALLOCA
+/* This may be adjusted in main(), if the stack is successfully grown. */
+ptrdiff_t emacs_re_safe_alloca = MAX_ALLOCA;
+/* Like USE_SAFE_ALLOCA, but use emacs_re_safe_alloca. */
+# define REGEX_USE_SAFE_ALLOCA \
+ ptrdiff_t sa_avail = emacs_re_safe_alloca; \
+ ptrdiff_t sa_count = SPECPDL_INDEX (); bool sa_must_free = false
+
# define REGEX_SAFE_FREE() SAFE_FREE ()
# define REGEX_ALLOCATE SAFE_ALLOCA
# else
diff --git c/src/regex.h i/src/regex.h
index 4922440..45cbe0a 100644
--- c/src/regex.h
+++ i/src/regex.h
@@ -187,6 +187,11 @@
/* Roughly the maximum number of failure points on the stack. */
extern size_t re_max_failures;
+#ifdef emacs
+/* Amount of memory that we can safely stack allocate. */
+extern ptrdiff_t emacs_re_safe_alloca;
+#endif
+
/* Define combinations of the above bits for the standard possibilities.
(The [[[ comments delimit what gets put into the Texinfo file, so
>>
>>
>> Actually I find Emacs still compiles if I removed that line completely,
>> there's just a compile warning saying
>>
>> regex.o: In function `re_match_2_internal':
>> /home/npostavs/src/emacs/emacs-master/lib-src/../src/regex.c:5529: warning: the 're_max_failures' variable is obsolete and will go away.
>>
>> I guess there's some kind of definition of it in libc?
>
> Most probably. You should be able to see that using "nm -A". If
> that's indeed so, I think we should rename that variable to something
> like emacs_re_max_failures, to avoid stomping on the libc variable..
Ah, right:
$ nm -A /usr/lib/libc.so.6 | grep re_max_failures
/usr/lib/libc.so.6:0000000000000000 n __evoke_link_warning_re_max_failures
/usr/lib/libc.so.6:00000000003981d8 D re_max_failures
Information forwarded
to
bug-gnu-emacs <at> gnu.org
:
bug#24751
; Package
emacs
.
(Tue, 15 Nov 2016 16:13:02 GMT)
Full text and
rfc822 format available.
Message #26 received at 24751 <at> debbugs.gnu.org (full text, mbox):
> From: npostavs <at> users.sourceforge.net
> Cc: 24751 <at> debbugs.gnu.org
> Date: Mon, 14 Nov 2016 22:08:18 -0500
>
> Actually, we should avoid increasing this limit if the stack wasn't
> increased, right? Here's what I came up with, I think it doesn't cover
> Cygwin/Windows though.
>
> diff --git c/src/emacs.c i/src/emacs.c
> index b74df21..d4655c8 100644
> --- c/src/emacs.c
> +++ i/src/emacs.c
> @@ -831,8 +831,8 @@ main (int argc, char **argv)
> re_max_failures, then add 33% to cover the size of the
> smaller stacks that regex.c successively allocates and
> discards on its way to the maximum. */
> - int ratio = 20 * sizeof (char *);
> - ratio += ratio / 3;
> + int min_ratio = 20 * sizeof (char *);
> + int ratio = min_ratio + min_ratio / 3;
>
> /* Extra space to cover what we're likely to use for other reasons. */
> int extra = 200000;
> @@ -869,6 +869,7 @@ main (int argc, char **argv)
>
> /* Don't let regex.c overflow the stack. */
> re_max_failures = lim < extra ? 0 : min (lim - extra, SIZE_MAX) / ratio;
> + emacs_re_safe_alloca = re_max_failures * min_ratio;
> }
> #endif /* HAVE_SETRLIMIT and RLIMIT_STACK and not CYGWIN */
Right, but I have 2 comments:
. we shouldn't set re_max_failures to zero if the amount of stack is
less than 'extra', since in that case we will allocate the failure
stack off the heap;
. emacs_re_safe_alloca should have its minimum value MAX_ALLOCA, not
zero, because SAFE_ALLOCA can still be used in regex.c, even
though the failure stack will be malloc'ed.
Thanks.
Information forwarded
to
bug-gnu-emacs <at> gnu.org
:
bug#24751
; Package
emacs
.
(Wed, 16 Nov 2016 01:06:01 GMT)
Full text and
rfc822 format available.
Message #29 received at 24751 <at> debbugs.gnu.org (full text, mbox):
>> @@ -869,6 +869,7 @@ main (int argc, char **argv)
>>
>> /* Don't let regex.c overflow the stack. */
>> re_max_failures = lim < extra ? 0 : min (lim - extra, SIZE_MAX) / ratio;
>> + emacs_re_safe_alloca = re_max_failures * min_ratio;
>> }
>> #endif /* HAVE_SETRLIMIT and RLIMIT_STACK and not CYGWIN */
>
> . we shouldn't set re_max_failures to zero if the amount of stack is
> less than 'extra', since in that case we will allocate the failure
> stack off the heap;
Then what should we set it to? Maybe we shouldn't modify it at all,
since the stack isn't actually a limiting factor?
Information forwarded
to
bug-gnu-emacs <at> gnu.org
:
bug#24751
; Package
emacs
.
(Wed, 16 Nov 2016 16:26:02 GMT)
Full text and
rfc822 format available.
Message #32 received at 24751 <at> debbugs.gnu.org (full text, mbox):
> From: npostavs <at> users.sourceforge.net
> Cc: 24751 <at> debbugs.gnu.org
> Date: Tue, 15 Nov 2016 20:06:29 -0500
>
> >> @@ -869,6 +869,7 @@ main (int argc, char **argv)
> >>
> >> /* Don't let regex.c overflow the stack. */
> >> re_max_failures = lim < extra ? 0 : min (lim - extra, SIZE_MAX) / ratio;
> >> + emacs_re_safe_alloca = re_max_failures * min_ratio;
> >> }
> >> #endif /* HAVE_SETRLIMIT and RLIMIT_STACK and not CYGWIN */
> >
> > . we shouldn't set re_max_failures to zero if the amount of stack is
> > less than 'extra', since in that case we will allocate the failure
> > stack off the heap;
>
> Then what should we set it to? Maybe we shouldn't modify it at all,
> since the stack isn't actually a limiting factor?
Yes, I think this is the best solution.
Thanks.
Information forwarded
to
bug-gnu-emacs <at> gnu.org
:
bug#24751
; Package
emacs
.
(Wed, 16 Nov 2016 23:25:01 GMT)
Full text and
rfc822 format available.
Message #35 received at 24751 <at> debbugs.gnu.org (full text, mbox):
Eli Zaretskii <eliz <at> gnu.org> writes:
>> From: npostavs <at> users.sourceforge.net
>> Cc: 24751 <at> debbugs.gnu.org
>> Date: Tue, 15 Nov 2016 20:06:29 -0500
>>
>> >> @@ -869,6 +869,7 @@ main (int argc, char **argv)
>> >>
>> >> /* Don't let regex.c overflow the stack. */
>> >> re_max_failures = lim < extra ? 0 : min (lim - extra, SIZE_MAX) / ratio;
>> >> + emacs_re_safe_alloca = re_max_failures * min_ratio;
>> >> }
>> >> #endif /* HAVE_SETRLIMIT and RLIMIT_STACK and not CYGWIN */
>> >
>> > . we shouldn't set re_max_failures to zero if the amount of stack is
>> > less than 'extra', since in that case we will allocate the failure
>> > stack off the heap;
>>
>> Then what should we set it to? Maybe we shouldn't modify it at all,
>> since the stack isn't actually a limiting factor?
>
> Yes, I think this is the best solution.
>
One more question, is this comment (around line 1198) now obsolete? (if
not, it sounds like we might still have some serious problems)
/* Define MATCH_MAY_ALLOCATE unless we need to make sure that the
searching and matching functions should not call alloca. On some
systems, alloca is implemented in terms of malloc, and if we're
using the relocating allocator routines, then malloc could cause a
relocation, which might (if the strings being searched are in the
ralloc heap) shift the data out from underneath the regexp
routines.
Here's another reason to avoid allocation: Emacs
processes input from X in a signal handler; processing X input may
call malloc; if input arrives while a matching routine is calling
malloc, then we're scrod. But Emacs can't just block input while
calling matching routines; then we don't notice interrupts when
they come in. So, Emacs blocks input around all regexp calls
except the matching calls, which it leaves unprotected, in the
faith that they will not malloc. */
Also this one (around line 430)
/* Should we use malloc or alloca? If REGEX_MALLOC is not defined, we
use `alloca' instead of `malloc'. This is because using malloc in
re_search* or re_match* could cause memory leaks when C-g is used in
Emacs; also, malloc is slower and causes storage fragmentation. On
the other hand, malloc is more portable, and easier to debug.
Because we sometimes use alloca, some routines have to be macros,
not functions -- `alloca'-allocated space disappears at the end of the
function it is called in. */
Information forwarded
to
bug-gnu-emacs <at> gnu.org
:
bug#24751
; Package
emacs
.
(Thu, 17 Nov 2016 16:22:02 GMT)
Full text and
rfc822 format available.
Message #38 received at 24751 <at> debbugs.gnu.org (full text, mbox):
> From: npostavs <at> users.sourceforge.net
> Cc: 24751 <at> debbugs.gnu.org
> Date: Wed, 16 Nov 2016 18:25:22 -0500
>
> One more question, is this comment (around line 1198) now obsolete? (if
> not, it sounds like we might still have some serious problems)
>
> /* Define MATCH_MAY_ALLOCATE unless we need to make sure that the
> searching and matching functions should not call alloca. On some
> systems, alloca is implemented in terms of malloc, and if we're
> using the relocating allocator routines, then malloc could cause a
> relocation, which might (if the strings being searched are in the
> ralloc heap) shift the data out from underneath the regexp
> routines.
>
> Here's another reason to avoid allocation: Emacs
> processes input from X in a signal handler; processing X input may
> call malloc; if input arrives while a matching routine is calling
> malloc, then we're scrod. But Emacs can't just block input while
> calling matching routines; then we don't notice interrupts when
> they come in. So, Emacs blocks input around all regexp calls
> except the matching calls, which it leaves unprotected, in the
> faith that they will not malloc. */
The second part is obsolete: we no longer do anything significant from
a signal handler, we just set a flag.
The first part is not obsolete, but its reasoning is backwards:
SAFE_ALLOCA indeed can call malloc, but it could only cause relocation
if REGEX_MALLOC is defined (and ralloc.c is compiled in). And when
you define REGEX_MALLOC, MATCH_MAY_ALLOCATE is undefined. So the text
there should be revised.
> Also this one (around line 430)
>
> /* Should we use malloc or alloca? If REGEX_MALLOC is not defined, we
> use `alloca' instead of `malloc'. This is because using malloc in
> re_search* or re_match* could cause memory leaks when C-g is used in
> Emacs; also, malloc is slower and causes storage fragmentation. On
> the other hand, malloc is more portable, and easier to debug.
>
> Because we sometimes use alloca, some routines have to be macros,
> not functions -- `alloca'-allocated space disappears at the end of the
> function it is called in. */
This is correct AFAIU, but perhaps it's worth adding that even if
SAFE_ALLOCA decides to call malloc, it takes care to set up
unwind-protect scheme that will free the allocated memory upon C-g (or
any other throw-type op), and avoid leaking memory.
Thanks.
Information forwarded
to
bug-gnu-emacs <at> gnu.org
:
bug#24751
; Package
emacs
.
(Sat, 19 Nov 2016 10:03:02 GMT)
Full text and
rfc822 format available.
Message #41 received at 24751 <at> debbugs.gnu.org (full text, mbox):
One other comment: the value of 'extra' in emacs.c looks too small to
me. If we want to safely allocate the failure stack off the run-time
C stack, we should make 'extra' large enough to cover a typical GC,
which can take as many as 30K stack frames. So I'd enlarge 'extra' to
be something like 1500000.
Information forwarded
to
bug-gnu-emacs <at> gnu.org
:
bug#24751
; Package
emacs
.
(Sun, 01 Jan 2017 18:33:01 GMT)
Full text and
rfc822 format available.
Message #44 received at 24751 <at> debbugs.gnu.org (full text, mbox):
Eli Zaretskii <eliz <at> gnu.org> writes:
>>
>> /* Define MATCH_MAY_ALLOCATE unless we need to make sure that the
>> searching and matching functions should not call alloca. On some
>> systems, alloca is implemented in terms of malloc, and if we're
>> using the relocating allocator routines, then malloc could cause a
>> relocation, which might (if the strings being searched are in the
>> ralloc heap) shift the data out from underneath the regexp
>> routines.
>>
>> [...]
>
> The first part is not obsolete, but its reasoning is backwards:
> SAFE_ALLOCA indeed can call malloc, but it could only cause relocation
> if REGEX_MALLOC is defined (and ralloc.c is compiled in). And when
> you define REGEX_MALLOC, MATCH_MAY_ALLOCATE is undefined. So the text
> there should be revised.
Is there ever any case where REGEX_MALLOC is defined? I can't see where
it happens. I don't understand why you say relocation is dependent on
REGEX_MALLOC, I thought only REL_ALLOC affects that.
And since we added r_alloc_inhibit_buffer_relocation around the regex
calls, aren't all these concerns about relocation obsolete?
Information forwarded
to
bug-gnu-emacs <at> gnu.org
:
bug#24751
; Package
emacs
.
(Sun, 01 Jan 2017 18:43:02 GMT)
Full text and
rfc822 format available.
Message #47 received at 24751 <at> debbugs.gnu.org (full text, mbox):
> From: npostavs <at> users.sourceforge.net
> Cc: 24751 <at> debbugs.gnu.org
> Date: Sun, 01 Jan 2017 13:33:35 -0500
>
> Eli Zaretskii <eliz <at> gnu.org> writes:
> >>
> >> /* Define MATCH_MAY_ALLOCATE unless we need to make sure that the
> >> searching and matching functions should not call alloca. On some
> >> systems, alloca is implemented in terms of malloc, and if we're
> >> using the relocating allocator routines, then malloc could cause a
> >> relocation, which might (if the strings being searched are in the
> >> ralloc heap) shift the data out from underneath the regexp
> >> routines.
> >>
> >> [...]
> >
> > The first part is not obsolete, but its reasoning is backwards:
> > SAFE_ALLOCA indeed can call malloc, but it could only cause relocation
> > if REGEX_MALLOC is defined (and ralloc.c is compiled in). And when
> > you define REGEX_MALLOC, MATCH_MAY_ALLOCATE is undefined. So the text
> > there should be revised.
>
> Is there ever any case where REGEX_MALLOC is defined? I can't see where
> it happens.
I don't understand the question. You can compile regex.c with the
"-DREGEX_MALLOC" option whenever you like. We don't do that, but as
long as the code which supports that is in regex.c, the comment goes
with it.
> I don't understand why you say relocation is dependent on
> REGEX_MALLOC, I thought only REL_ALLOC affects that.
REL_ALLOC determines whether ralloc.c is compiled in, which I
mentioned above.
> And since we added r_alloc_inhibit_buffer_relocation around the regex
> calls, aren't all these concerns about relocation obsolete?
The calls to r_alloc_inhibit_buffer_relocation are outside of regex.c,
so the comments in regex.c don't know anything about that.
Information forwarded
to
bug-gnu-emacs <at> gnu.org
:
bug#24751
; Package
emacs
.
(Sun, 01 Jan 2017 18:57:01 GMT)
Full text and
rfc822 format available.
Message #50 received at 24751 <at> debbugs.gnu.org (full text, mbox):
Eli Zaretskii <eliz <at> gnu.org> writes:
>> From: npostavs <at> users.sourceforge.net
>> Cc: 24751 <at> debbugs.gnu.org
>> Date: Sun, 01 Jan 2017 13:33:35 -0500
>>
>> Eli Zaretskii <eliz <at> gnu.org> writes:
>> >>
>> >> /* Define MATCH_MAY_ALLOCATE unless we need to make sure that the
>> >> searching and matching functions should not call alloca. On some
>> >> systems, alloca is implemented in terms of malloc, and if we're
>> >> using the relocating allocator routines, then malloc could cause a
>> >> relocation, which might (if the strings being searched are in the
>> >> ralloc heap) shift the data out from underneath the regexp
>> >> routines.
>> >>
>> >> [...]
>> >
>> > The first part is not obsolete, but its reasoning is backwards:
>> > SAFE_ALLOCA indeed can call malloc, but it could only cause relocation
>> > if REGEX_MALLOC is defined (and ralloc.c is compiled in). And when
>> > you define REGEX_MALLOC, MATCH_MAY_ALLOCATE is undefined. So the text
>> > there should be revised.
>>
>> Is there ever any case where REGEX_MALLOC is defined? I can't see where
>> it happens.
>
> I don't understand the question. You can compile regex.c with the
> "-DREGEX_MALLOC" option whenever you like. We don't do that,
^^^^^^^^^^^^^^^^
Thanks, that's what I was wondering about in my question.
>
>> I don't understand why you say relocation is dependent on
>> REGEX_MALLOC, I thought only REL_ALLOC affects that.
>
> REL_ALLOC determines whether ralloc.c is compiled in, which I
> mentioned above.
But if REL_ALLOC is defined, then SAFE_ALLOCA could cause relocation
(via malloc) regardless of whether REGEX_MALLOC is defined or not, no?
Information forwarded
to
bug-gnu-emacs <at> gnu.org
:
bug#24751
; Package
emacs
.
(Sun, 01 Jan 2017 20:08:02 GMT)
Full text and
rfc822 format available.
Message #53 received at 24751 <at> debbugs.gnu.org (full text, mbox):
> From: npostavs <at> users.sourceforge.net
> Cc: 24751 <at> debbugs.gnu.org
> Date: Sun, 01 Jan 2017 13:57:05 -0500
>
> >> I don't understand why you say relocation is dependent on
> >> REGEX_MALLOC, I thought only REL_ALLOC affects that.
> >
> > REL_ALLOC determines whether ralloc.c is compiled in, which I
> > mentioned above.
>
> But if REL_ALLOC is defined, then SAFE_ALLOCA could cause relocation
> (via malloc) regardless of whether REGEX_MALLOC is defined or not, no?
Relocation as side effect of calling malloc only happens with buffer
text. This is not what the comment in question alludes to. It
alludes to this:
/* Define how to allocate the failure stack. */
#if defined REL_ALLOC && defined REGEX_MALLOC
# define REGEX_ALLOCATE_STACK(size) \
r_alloc (&failure_stack_ptr, (size))
# define REGEX_REALLOCATE_STACK(source, osize, nsize) \
r_re_alloc (&failure_stack_ptr, (nsize))
# define REGEX_FREE_STACK(ptr) \
r_alloc_free (&failure_stack_ptr)
#else /* not using relocating allocator */
# define REGEX_ALLOCATE_STACK(size) REGEX_ALLOCATE (size)
# define REGEX_REALLOCATE_STACK(source, o, n) REGEX_REALLOCATE (source, o, n)
# define REGEX_FREE_STACK(ptr) REGEX_FREE (ptr)
#endif /* not using relocating allocator */
This calls ralloc.c functions directly for allocating/reallocating the
failure stack, when both REL_ALLOC and REGEX_MALLOC are defined. So
the relocation in question is that of the failure stack, which won't
happen if we call malloc, even if REL_ALLOC is defined, because only
buffer text can be relocated when ralloc.c is called from malloc.
Information forwarded
to
bug-gnu-emacs <at> gnu.org
:
bug#24751
; Package
emacs
.
(Mon, 02 Jan 2017 04:49:02 GMT)
Full text and
rfc822 format available.
Message #56 received at 24751 <at> debbugs.gnu.org (full text, mbox):
[Message part 1 (text/plain, inline)]
Eli Zaretskii <eliz <at> gnu.org> writes:
>>
>> /* Define MATCH_MAY_ALLOCATE unless we need to make sure that the
>> searching and matching functions should not call alloca. On some
>> systems, alloca is implemented in terms of malloc, and if we're
>> using the relocating allocator routines, then malloc could cause a
>> relocation, which might (if the strings being searched are in the
>> ralloc heap) shift the data out from underneath the regexp
>> routines.
>
>
> The first part is not obsolete, but its reasoning is backwards:
> SAFE_ALLOCA indeed can call malloc, but it could only cause relocation
> if REGEX_MALLOC is defined (and ralloc.c is compiled in). And when
> you define REGEX_MALLOC, MATCH_MAY_ALLOCATE is undefined. So the text
> there should be revised.
Everything you've said makes sense after your last message, but I'm
still unable to put it together and come up with a revised comment.
Could you make a suggestion? Here are the rest of the changes.
[v1-0001-Use-expanded-stack-during-regex-matches.patch (text/plain, attachment)]
Information forwarded
to
bug-gnu-emacs <at> gnu.org
:
bug#24751
; Package
emacs
.
(Mon, 02 Jan 2017 15:25:01 GMT)
Full text and
rfc822 format available.
Message #59 received at 24751 <at> debbugs.gnu.org (full text, mbox):
> From: npostavs <at> users.sourceforge.net
> Cc: 24751 <at> debbugs.gnu.org
> Date: Sun, 01 Jan 2017 23:49:46 -0500
>
> Everything you've said makes sense after your last message, but I'm
> still unable to put it together and come up with a revised comment.
> Could you make a suggestion?
How about the below?
--- src/regex.c~0 2016-12-11 06:39:19.000000000 +0200
+++ src/regex.c 2017-01-02 12:40:44.266517100 +0200
@@ -1195,24 +1195,28 @@
gettext_noop ("Range striding over charsets") /* REG_ERANGEX */
};
-/* Avoiding alloca during matching, to placate r_alloc. */
+/* Whether to allocate memory during matching. */
-/* Define MATCH_MAY_ALLOCATE unless we need to make sure that the
- searching and matching functions should not call alloca. On some
- systems, alloca is implemented in terms of malloc, and if we're
- using the relocating allocator routines, then malloc could cause a
- relocation, which might (if the strings being searched are in the
- ralloc heap) shift the data out from underneath the regexp
- routines.
-
- Here's another reason to avoid allocation: Emacs
- processes input from X in a signal handler; processing X input may
- call malloc; if input arrives while a matching routine is calling
- malloc, then we're scrod. But Emacs can't just block input while
- calling matching routines; then we don't notice interrupts when
- they come in. So, Emacs blocks input around all regexp calls
- except the matching calls, which it leaves unprotected, in the
- faith that they will not malloc. */
+/* Define MATCH_MAY_ALLOCATE to allow the searching and matching
+ functions allocate memory for the failure stack and registers.
+ Normally should be defined, because otherwise searching and
+ matching routines will have much smaller memory resources at their
+ disposal, and therefore might fail to handle complex regexps.
+ Therefore undefine MATCH_MAY_ALLOCATE only in the following
+ exceptional situations:
+
+ . When running on a system where memory is at premium.
+ . When alloca cannot be used at all, perhaps due to bugs in
+ its implementation, or its being unavailable, or due to a
+ very small stack size. This requires to define REGEX_MALLOC
+ to use malloc instead, which in turn could lead to memory
+ leaks if search is interrupted by a signal. (For these
+ reasons, defining REGEX_MALLOC when building Emacs
+ automatically undefines MATCH_MAY_ALLOCATE, but outside
+ Emacs you may not care about memory leaks.) If you want to
+ prevent the memory leaks, undefine MATCH_MAY_ALLOCATE.
+ . When code that calls the searching and matching functions
+ cannot allow memory allocation, for whatever reasons. */
/* Normally, this is fine. */
#define MATCH_MAY_ALLOCATE
Information forwarded
to
bug-gnu-emacs <at> gnu.org
:
bug#24751
; Package
emacs
.
(Mon, 02 Jan 2017 18:30:03 GMT)
Full text and
rfc822 format available.
Message #62 received at 24751 <at> debbugs.gnu.org (full text, mbox):
[Message part 1 (text/plain, inline)]
Eli Zaretskii <eliz <at> gnu.org> writes:
>> From: npostavs <at> users.sourceforge.net
>> Cc: 24751 <at> debbugs.gnu.org
>> Date: Sun, 01 Jan 2017 23:49:46 -0500
>>
>> Everything you've said makes sense after your last message, but I'm
>> still unable to put it together and come up with a revised comment.
>> Could you make a suggestion?
>
> How about the below?
Looks good, I've incorporated it into the patch:
[v3-0001-Fix-computation-of-regex-stack-limit.patch (text/plain, attachment)]
[v3-0002-Use-expanded-stack-during-regex-matches.patch (text/plain, attachment)]
Information forwarded
to
bug-gnu-emacs <at> gnu.org
:
bug#24751
; Package
emacs
.
(Mon, 02 Jan 2017 19:23:01 GMT)
Full text and
rfc822 format available.
Message #65 received at 24751 <at> debbugs.gnu.org (full text, mbox):
> From: npostavs <at> users.sourceforge.net
> Cc: 24751 <at> debbugs.gnu.org
> Date: Mon, 02 Jan 2017 13:30:26 -0500
>
> > How about the below?
>
> Looks good, I've incorporated it into the patch:
Fine with me, thanks.
Added tag(s) patch.
Request was from
npostavs <at> users.sourceforge.net
to
control <at> debbugs.gnu.org
.
(Mon, 02 Jan 2017 21:59:02 GMT)
Full text and
rfc822 format available.
Information forwarded
to
bug-gnu-emacs <at> gnu.org
:
bug#24751
; Package
emacs
.
(Sun, 08 Jan 2017 23:49:02 GMT)
Full text and
rfc822 format available.
Message #70 received at 24751 <at> debbugs.gnu.org (full text, mbox):
tags 24751 fixed
close 24751 26.1
quit
I've pushed to master.
1: 2017-01-08 18:45:52 -0500 9a19f26cd796c7321f659a8dbea5296b0eeea51d
Fix computation of regex stack limit
2: 2017-01-08 18:45:52 -0500 13c6f1d185d301aad2f6d756c148acb2edd0889f
Use expanded stack during regex matches
Added tag(s) fixed.
Request was from
npostavs <at> users.sourceforge.net
to
control <at> debbugs.gnu.org
.
(Sun, 08 Jan 2017 23:49:02 GMT)
Full text and
rfc822 format available.
bug marked as fixed in version 26.1, send any further explanations to
24751 <at> debbugs.gnu.org and npostavs <at> users.sourceforge.net
Request was from
npostavs <at> users.sourceforge.net
to
control <at> debbugs.gnu.org
.
(Sun, 08 Jan 2017 23:49:02 GMT)
Full text and
rfc822 format available.
bug archived.
Request was from
Debbugs Internal Request <help-debbugs <at> gnu.org>
to
internal_control <at> debbugs.gnu.org
.
(Mon, 06 Feb 2017 12:24:03 GMT)
Full text and
rfc822 format available.
This bug report was last modified 8 years and 195 days ago.
Previous Next
GNU bug tracking system
Copyright (C) 1999 Darren O. Benham,
1997,2003 nCipher Corporation Ltd,
1994-97 Ian Jackson.