GNU bug report logs - #61514
30.0.50; sadistically long xml line hangs emacs
Reported by: "Mark A. Hershberger" <mah <at> everybody.org>
Date: Tue, 14 Feb 2023 21:05:02 UTC
Severity: normal
Found in version 30.0.50
Done: Gregory Heytings <gregory <at> heytings.org>
Bug is archived. No further changes may be made.
>> Looking at the history of that variable, which is in fact a
>> compile-time constant, I see that it was initially (May 1995) set to
>> 200000. A few months later (Nov 1995) it was set to 20000, and reduced
>> again (apparently because of bug reports) to 8000 and to 4000 (both in
>> Jun 1996). Two months later it was again set to 20000 (Aug 1996), and
>> a year later to 40000 (Dec 1997). It kept that value since then. As
>> these changes (and this bug report) demonstrate, it is not possible to
>> give that variable a "one size fits all" value.
>
> Note that the stack is allocated with `SAFE_ALLOCA` and used to be
> allocated with just `alloca`. So the constant was probably reduced
> (back in the 90s) in response to reports of segfaults due to C stack
> overflows.
>
Indeed. But now that we use SAFE_ALLOCA, we fall back to malloc when there
is not enough room for an alloca, so the constant seems even more
arbitrary.
>
> Nowadays we should be hopefully(?) safe from such segfaults since
> `SAFE_ALLOCA` only uses `alloca` for smallish allocations.
>
That's not the case in regex-emacs.c: REGEX_USE_SAFE_ALLOCA sets sa_avail
to emacs_re_safe_alloca (~6 MiB) instead of its default MAX_ALLOCA value
(16 KiB).
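To make the difference concrete, here is a minimal sketch of the budget
check being discussed. It is not the actual Emacs macro: the names
demo_budget, demo_fits_on_stack and DEMO_MAX_ALLOCA are hypothetical, and
the sketch only models the decision (stack or heap), not the allocation
itself. The assumption is that a request is served with alloca while it
fits in the remaining budget, and with malloc otherwise.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the default budget (16 KiB) mentioned above.  */
enum { DEMO_MAX_ALLOCA = 16 * 1024 };

/* Hypothetical model of the "sa_avail" counter: the number of bytes
   that may still be taken from the C stack.  */
struct demo_budget
{
  ptrdiff_t avail;
};

/* Return true if a request of SIZE bytes fits in the remaining stack
   budget, and deduct it from the budget.  Return false otherwise, in
   which case a real implementation would fall back to the heap.  */
static bool
demo_fits_on_stack (struct demo_budget *b, size_t size)
{
  assert (b != NULL);
  if (b->avail >= 0 && size <= (size_t) b->avail)
    {
      b->avail -= (ptrdiff_t) size;
      return true;
    }
  return false;
}
```

With a 16 KiB budget, a 40000-byte request is rejected and would go to
malloc. With a budget of about 6 MiB, the same request fits and would be
served from the C stack, which is why the "smallish allocations" argument
does not carry over to regex-emacs.c.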
>
> This really needs a comment (at least one referring to this bug report).
> I think the idea is that we hope the regexp will need at most one stack
> entry per character, so the above means that we're willing to limit the
> regexp search to about 1kB of text, which sounds fair given it's
> supposed to match just a single XML attribute.
>
Indeed, thanks!
>> + DEFVAR_INT ("regexp-max-failures", Vregexp_max_failures,
>> + doc: /* Maximum number of failures points in a regexp search. */);
>> + Vregexp_max_failures = max_regexp_max_failures;
>
> This name is misleading. It suggests it's talking about how many times
> we fail, whereas the reality is that it's about the number of pending
> branches in the search space (which the source code calls "failure
> points" because it's info to be used in case the current branch fails to
> match). It could also be described as the number of "pending
> continuations" or "stacked failure continuations" or some wording like
> that.
>
> But for the var name itself, how 'bout `regexp-max-backtracking-depth`?
>
Indeed again, and thanks again!
Updated patch attached.
[Make-the-backtracking-depth-of-regexp-searches-modif.patch (text/x-diff, attachment)]
This bug report was last modified 2 years and 147 days ago.