GNU bug report logs - #61514
30.0.50; sadistically long xml line hangs emacs


Package: emacs;

Reported by: "Mark A. Hershberger" <mah <at> everybody.org>

Date: Tue, 14 Feb 2023 21:05:02 UTC

Severity: normal

Found in version 30.0.50

Done: Gregory Heytings <gregory <at> heytings.org>

Bug is archived. No further changes may be made.



Message #98 received at 61514 <at> debbugs.gnu.org (full text, mbox):

From: Gregory Heytings <gregory <at> heytings.org>
To: Stefan Monnier <monnier <at> iro.umontreal.ca>
Cc: Eli Zaretskii <eliz <at> gnu.org>, 61514 <at> debbugs.gnu.org, mah <at> everybody.org
Subject: Re: bug#61514: 30.0.50; sadistically long xml line hangs emacs
Date: Mon, 20 Feb 2023 11:28:44 +0000
>> Looking at the history of that variable, which is in fact a 
>> compile-time constant, I see that it was initially (May 1995) set to 
>> 200000.  A few months later (Nov 1995) it was set to 20000, and reduced 
>> again (apparently because of bug reports) to 8000 and to 4000 (both in 
>> Jun 1996).  Two months later it was again set to 20000 (Aug 1996), and 
>> a year later to 40000 (Dec 1997).  It kept that value since then.  As 
>> these changes (and this bug report) demonstrate, it is not possible to 
>> give that variable a "one size fits all" value.
>
> Note that the stack is allocated with `SAFE_ALLOCA` and used to be 
> allocated with just `alloca`.  So the constant was probably reduced 
> (back in the 90s) in response to reports of segfaults due to C stack 
> overflows.
>

Indeed.  But now that we use SAFE_ALLOCA, we fall back to malloc when 
there is not enough room for an alloca, so the constant seems even more 
arbitrary.

>
> Nowadays we should be hopefully(?) safe from such segfaults since 
> `SAFE_ALLOCA` only uses `alloca` for smallish allocations.
>

That's not the case in regex-emacs.c: REGEX_USE_SAFE_ALLOCA sets sa_avail 
to emacs_re_safe_alloca (~6 MiB) instead of its default MAX_ALLOCA value 
(16 KiB).
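
To make the threshold point concrete, here is a minimal sketch of the 
"alloca if it fits the budget, else malloc" pattern.  It is not the 
SAFE_ALLOCA macro from the Emacs sources: every name below 
(SKETCH_ALLOCA, SKETCH_ALLOCA_BUDGET, sketch_sum_bytes) is hypothetical, 
the budget merely reuses the 16 KiB figure mentioned above, and 
<alloca.h> is assumed to be available (glibc/BSD, not ISO C).

```c
#include <alloca.h>   /* assumption: glibc/BSD header, not ISO C */
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical per-call stack budget, standing in for sa_avail.  */
enum { SKETCH_ALLOCA_BUDGET = 16 * 1024 };

/* Set PTR to SIZE bytes: from the stack if SIZE fits in AVAIL, else
   from the heap.  ON_HEAP records which path was taken.  This has to
   be a macro, because alloca memory belongs to the calling frame.  */
#define SKETCH_ALLOCA(ptr, size, avail, on_heap)        \
  do {                                                  \
    if ((size) <= (avail))                              \
      {                                                 \
        (ptr) = alloca (size);                          \
        (avail) -= (size);                              \
        (on_heap) = false;                              \
      }                                                 \
    else                                                \
      {                                                 \
        (ptr) = malloc (size);                          \
        (on_heap) = true;                               \
      }                                                 \
  } while (0)

/* Fill a scratch buffer of N bytes with 0, 1, 2, ... (mod 256) and
   return their sum.  *USED_HEAP tells the caller where the buffer
   lived.  Returns 0 if the heap allocation failed.  */
static unsigned long
sketch_sum_bytes (size_t n, size_t budget, bool *used_heap)
{
  unsigned char *buf;
  size_t avail = budget;
  bool on_heap;
  unsigned long sum = 0;

  SKETCH_ALLOCA (buf, n, avail, on_heap);
  *used_heap = on_heap;
  if (!buf)
    return 0;

  for (size_t i = 0; i < n; i++)
    buf[i] = (unsigned char) (i & 0xff);
  for (size_t i = 0; i < n; i++)
    sum += buf[i];

  if (on_heap)
    free (buf);   /* stack memory is released on return instead */
  return sum;
}
```

With a 16 KiB budget a 4-byte request stays on the stack and a 
100000-byte request goes to malloc; raising the budget to several MiB, 
as described above for regex-emacs.c, moves that crossover accordingly.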

>
> This really needs a comment (at least one referring to this bug report). 
> I think the idea is that we hope the regexp will need at most one stack 
> entry per character, so the above means that we're willing to limit the 
> regexp search to about 1kB of text, which sounds fair given it's 
> supposed to match just a single XML attribute.
>

Indeed, thanks!

>> +  DEFVAR_INT ("regexp-max-failures", Vregexp_max_failures,
>> +	      doc: /* Maximum number of failures points in a regexp search.  */);
>> +  Vregexp_max_failures = max_regexp_max_failures;
>
> This name is misleading.  It suggests it's talking about how many times 
> we fail, whereas the reality is that it's about the number of pending 
> branches in the search space (which the source code calls "failure 
> points" because it's info to be used in case the current branch fails to 
> match).  It could also be described as the number of "pending 
> continuations" or "stacked failure continuations" or some wording like 
> that.
>
> But for the var name itself, how 'bout `regexp-max-backtracking-depth`?
>

Indeed again, and thanks again!
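
For readers unfamiliar with the term, a toy sketch of what "failure 
points" and a cap on their number mean.  This is not code from 
regex-emacs.c: the matcher below is a hypothetical illustration that 
only understands literal characters, '.', and a postfix '*', and must 
match the whole text.  All names (sketch_match, sketch_failure_point, 
SKETCH_STACK_CAPACITY) are made up.  Each greedy repetition pushes one 
pending branch, so the stack depth grows by one entry per character 
consumed by a '*'.

```c
#include <stdbool.h>
#include <stddef.h>

/* A pending branch: where to resume in the pattern and in the text
   if the branch currently being explored fails.  */
typedef struct { size_t pat; size_t txt; } sketch_failure_point;

enum sketch_result { SKETCH_NO_MATCH, SKETCH_MATCH, SKETCH_STACK_OVERFLOW };

enum { SKETCH_STACK_CAPACITY = 1024 };

/* Match all of TXT against PAT, keeping at most MAX_DEPTH pending
   branches (clamped to SKETCH_STACK_CAPACITY).  */
static enum sketch_result
sketch_match (const char *pat, const char *txt, size_t max_depth)
{
  sketch_failure_point stack[SKETCH_STACK_CAPACITY];
  size_t depth = 0, p = 0, t = 0;

  if (max_depth > SKETCH_STACK_CAPACITY)
    max_depth = SKETCH_STACK_CAPACITY;

  for (;;)
    {
      if (pat[p] == '\0')
        {
          if (txt[t] == '\0')
            return SKETCH_MATCH;
          /* Pattern exhausted but text is not: fall through and fail.  */
        }
      else if (pat[p + 1] == '*')
        {
          if (txt[t] != '\0' && (pat[p] == '.' || pat[p] == txt[t]))
            {
              /* Greedy repetition: before consuming one more
                 character, record the alternative "stop repeating
                 here and continue after the star".  */
              if (depth == max_depth)
                return SKETCH_STACK_OVERFLOW;
              stack[depth++] = (sketch_failure_point) { p + 2, t };
              t++;
              continue;
            }
          p += 2;   /* no more repetitions possible */
          continue;
        }
      else if (txt[t] != '\0' && (pat[p] == '.' || pat[p] == txt[t]))
        {
          p++;
          t++;
          continue;
        }

      /* The current branch failed: resume the most recent pending
         branch, or give up if none is left.  */
      if (depth == 0)
        return SKETCH_NO_MATCH;
      depth--;
      p = stack[depth].pat;
      t = stack[depth].txt;
    }
}
```

Matching "a*b" against "aaab" needs three pending branches, so it 
succeeds with a limit of 16 and reports an overflow with a limit of 2.  
The limit bounds the depth of the stack, not how many times a branch 
fails, which is why a name mentioning "backtracking depth" describes it 
better.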

Updated patch attached.
[Make-the-backtracking-depth-of-regexp-searches-modif.patch (text/x-diff, attachment)]

This bug report was last modified 2 years and 147 days ago.


