GNU bug report logs - #72692
Emacs 31.05 (40eecd594ac) get SIGSEGV on Linux (Linux 6.6.45 Kde Wayland)

Previous Next

Package: emacs;

Reported by: Eval EXEC <execvy <at> gmail.com>

Date: Sun, 18 Aug 2024 08:31:01 UTC

Severity: normal

Done: Eli Zaretskii <eliz <at> gnu.org>

Bug is archived. No further changes may be made.

Full log


View this message in rfc822 format

From: Pip Cet <pipcet <at> protonmail.com>
To: Eli Zaretskii <eliz <at> gnu.org>
Cc: execvy <at> gmail.com, 72692 <at> debbugs.gnu.org, juri <at> linkov.net
Subject: bug#72692: Emacs 31.05 (40eecd594ac) get SIGSEGV on Linux (Linux 6.6.45 Kde Wayland)
Date: Mon, 26 Aug 2024 19:04:13 +0000
"Eli Zaretskii" <eliz <at> gnu.org> writes:

>> Date: Mon, 26 Aug 2024 05:52:11 +0000
>> From: Pip Cet <pipcet <at> protonmail.com>
>> Cc: Juri Linkov <juri <at> linkov.net>, execvy <at> gmail.com, 72692 <at> debbugs.gnu.org
>>
>> >> -      f->face_change = true;
>> >
>> > Sorry about that.  I've now reverted that change; we will have to find
>> > a cheaper solution for the original problem.
>>
>> This patch, moving down the setting of f->face_change to when we
>> definitely know it's required, seems to avoid the
>> always-in-the-foreground problem, but I'm not sure it fixes Juri's
>> performance issue:
>
> If all we want is to set the face_change flag without causing
> performance issues, then the patch below does it, and AFAIU should
> prevent the segfaults in this bug (please check, as I'm unable to
> trigger the segfaults):

Will do.

> diff --git a/src/xfaces.c b/src/xfaces.c
> index 684b6cc..1baa364 100644
> --- a/src/xfaces.c
> +++ b/src/xfaces.c
> @@ -733,9 +733,13 @@ recompute_basic_faces (struct frame *f)
>  {
>    if (FRAME_FACE_CACHE (f))
>      {
> +      bool non_basic_faces_cached =
> +	FRAME_FACE_CACHE (f)->used >= BASIC_FACE_ID_SENTINEL;

There's an off by one here, I think: FRAME_FACE_CACHE (f)->used is a
count, and it'll usually be equal to BASIC_FACE_ID_SENTINEL after the
basic faces have been realized, greater (never equal) if non-basic faces
have been realized, and 0 if no faces have been realized.

So, in fact, your patch is equivalent to checking that FRAME_FACE_CACHE
(f)->used != 0, and only setting face_change if it was.

That makes sense to me, but it doesn't match the variable names or your
description.

> The idea here is that if all we have in the frame's face cache are the
> basic faces, then they are all ASCII faces, and therefore the issue
> with non-ASCII faces still holding the freed fontset ID should not
> arise.

See above, not what your code actually does.  Good thing, too, because
that would change things only if no non-ASCII face were to be in use.

> The performance problem was caused by init_iterator calling
> recompute_basic_faces every time and then shooting itself in the foot
> by immediately setting the face_change flag,

And it's easily fixed by clearing the face_change flag after
recompute_basic_faces runs, rather than before, which is the more
important part of my first patch.  (The less important part is a mere
performance optimization).

> but this is now avoided
> because init_iterator only calls recompute_basic_faces if the frame's
> face cache is empty.  Most other places that call
> recompute_basic_faces also do that when the cache is empty (in which
> case we don't have to worry about non-ASCII faces), but some call it
> unconditionally, and the above is for those.

??? I don't see any such callers in my backtraces of Emacs with or
without the patch, but I do see significant performance differences, so
this must be wrong.

> Btw, thinking again about the original problem, I don't think I
> understand why free_face_fontset could cause a segfault.  All that
> function does is remove the fontset from the cache and put nil into
> its slot.  Thereafter trying to access the fontset by its ID will
> yield nil.

It's font_for_char that segfaults, or fontset_font, depending on whether
assertions are enabled.

The code path is this, in the case with assertions:

  fontset = FONTSET_FROM_ID (face->fontset);  /* fontset is nil */
  eassert (!BASE_FONTSET_P (fontset));

Since

#define BASE_FONTSET_P(fontset) (NILP (FONTSET_BASE (fontset)))

and

#define FONTSET_BASE(fontset) XCHAR_TABLE (fontset)->extras[3]

we end up looking at XCHAR_TABLE (Qnil)->extras[3], which segfaults
because nil is not a char table.

The code path without assertions is similar, it just that the
XCHAR_TABLE happens further down the call stack, in fontset_find_font.
See the original backtrace.

> So how can this cause trouble?  The two ways I can think
> of are:
>
>   . Some code that calls FONTSET_FROM_ID and does not check that the
>     result is nil.  We indeed have a couple of places like that, but
>     it should be easy to fix them.  However, can this cause a
>     segfault?

Yes.

>   . While some face holds to a (stale) fontset ID, a new fontset is
>     created and stored in the cache at the slot corresponding to that
>     stale ID.  So accessing this ID will yield a wrong, but still
>     valid fontset.  Can this cause a segfault?

No, just cause us to silently use the wrong fontset, leading to other
bugs.

> Are there any other scenarios?  (And in your reproducer, the second
> scenario above was AFAIU disabled by never decreasing next_fontset_id,
> so only the first one could cause some trouble?)

Correct. The first scenario causes segfaults, the second scenario causes
silently misbehaving code. We need to fix both, of course.

>> However, I would still prefer a solution which doesn't, even for a short
>> period of time, leave the non-ASCII faces' ->ascii_face pointing to a
>> face which has been freed, and their ->fontset identifying a fontset
>> which might be nil.
>
> This goes against the design of the faces and their interaction with
> the display code, I think.

If the design indeed relies on dangling pointers to freed memory
surviving said freeing (it doesn't), that would be a very bad design
indeed, particularly since this fact isn't documented.  The correct
thing to do here is to ensure all references to ascii_face are removed,
by freeing the non-ASCII faces, before we remove the ascii_face itself.

And please, let's document such "design decisions" (actually, bugs) if
you refuse to let anyone fix them.  Having a pointer in a structure
which might point to random memory is a bug unless there is a very loud
comment advising us of this fact.

>> Is it true that all non-ASCII faces share their ASCII face's hash and
>> thus live in the same collision chain?  We could walk that and unrealize
>> them all when the ASCII face is unrealized, then.
>>
>> Something like this:
>
> If all we want is to free all the faces that belong to a certain frame
> and share the same ascii_face, why not simply walk the frame's face
> cache looking for such faces, and free each one of them?

Because there are 1009 buckets, and we only need to walk half of one.
Otherwise the effects are identical.

Pip





This bug report was last modified 257 days ago.

Previous Next


GNU bug tracking system
Copyright (C) 1999 Darren O. Benham, 1997,2003 nCipher Corporation Ltd, 1994-97 Ian Jackson.