GNU bug report logs - #72692
Emacs 31.05 (40eecd594ac) get SIGSEGV on Linux (Linux 6.6.45 Kde Wayland)

Previous Next

Package: emacs;

Reported by: Eval EXEC <execvy <at> gmail.com>

Date: Sun, 18 Aug 2024 08:31:01 UTC

Severity: normal

Done: Eli Zaretskii <eliz <at> gnu.org>

Bug is archived. No further changes may be made.

Full log


Message #136 received at 72692 <at> debbugs.gnu.org (full text, mbox):

From: Eli Zaretskii <eliz <at> gnu.org>
To: Pip Cet <pipcet <at> protonmail.com>
Cc: execvy <at> gmail.com, 72692 <at> debbugs.gnu.org, juri <at> linkov.net
Subject: Re: bug#72692: Emacs 31.05 (40eecd594ac) get SIGSEGV on Linux (Linux
 6.6.45 Kde Wayland)
Date: Tue, 27 Aug 2024 14:44:11 +0300
> Date: Mon, 26 Aug 2024 19:04:13 +0000
> From: Pip Cet <pipcet <at> protonmail.com>
> Cc: execvy <at> gmail.com, 72692 <at> debbugs.gnu.org, juri <at> linkov.net
> 
> "Eli Zaretskii" <eliz <at> gnu.org> writes:
> 
> > --- a/src/xfaces.c
> > +++ b/src/xfaces.c
> > @@ -733,9 +733,13 @@ recompute_basic_faces (struct frame *f)
> >  {
> >    if (FRAME_FACE_CACHE (f))
> >      {
> > +      bool non_basic_faces_cached =
> > +	FRAME_FACE_CACHE (f)->used >= BASIC_FACE_ID_SENTINEL;
> 
> There's an off by one here, I think

Yes, it should be > instead of >= .

> FRAME_FACE_CACHE (f)->used is a
> count, and it'll usually be equal to BASIC_FACE_ID_SENTINEL after the
> basic faces have been realized, greater (never equal) if non-basic faces
> have been realized, and 0 if no faces have been realized.
> 
> So, in fact, your patch is equivalent to checking that FRAME_FACE_CACHE
> (f)->used != 0, and only setting face_change if it was.

As explained in my previous message, the test for having non-basic
faces in the cache is because when we have only the basic faces, the
problem with sharing the fontsets, and with having non-ASCII faces
that point to released ASCII faces, cannot happen, and therefore
there's no need to do anything.

It is true that 'used != 0' should also do the trick, but I realized
that testing against BASIC_FACE_ID_SENTINEL is stronger, because it
also prevents setting the flag when we have some faces in the cache,
but they are all basic.  Setting the face_change flag when we don't
need that will cause us to unnecessarily re-realize the basic faces,
so refraining from that prevents performance degradation.

> > The idea here is that if all we have in the frame's face cache are the
> > basic faces, then they are all ASCII faces, and therefore the issue
> > with non-ASCII faces still holding the freed fontset ID should not
> > arise.
> 
> See above, not what your code actually does.

Because of the >= thing? that's easily fixed, of course.

> Good thing, too, because
> that would change things only if no non-ASCII face were to be in use.

I don't understand what you are saying with this sentence.

> > The performance problem was caused by init_iterator calling
> > recompute_basic_faces every time and then shooting itself in the foot
> > by immediately setting the face_change flag,
> 
> And it's easily fixed by clearing the face_change flag after
> recompute_basic_faces runs, rather than before

I don't follow: if we clear the flag after recompute_basic_faces
exits, the faces will not be re-realized by next redisplay, and we
might be back at the problem of having non-ASCII faces whose ASCII
parents were removed from the cache and re-realize.  How will this be
useful?  Or what am I missing?

> > but this is now avoided
> > because init_iterator only calls recompute_basic_faces if the frame's
> > face cache is empty.  Most other places that call
> > recompute_basic_faces also do that when the cache is empty (in which
> > case we don't have to worry about non-ASCII faces), but some call it
> > unconditionally, and the above is for those.
> 
> ??? I don't see any such callers in my backtraces of Emacs with or
> without the patch

There are 2 such callers in frame.c, one of them was in the backtrace
you show in https://debbugs.gnu.org/cgi/bugreport.cgi?bug=72692#98,
viz.:

  Thread 1 "emacs" hit Breakpoint 3, free_realized_face (f=0x11d1cc8, face=0x11e22b0) at xfaces.c:4612
  4612				  fprintf (stderr, "Freeing fontset %d that's still in use by %p (ASCII face %p==%p)!\n", face->fontset,
  (gdb) p face->fontset
  $1 = 83
  (gdb) bt
  #0  free_realized_face (f=0x11d1cc8, face=0x11e22b0) at xfaces.c:4612
  #1  0x000000000055fe5d in realize_face (cache=0x12731e0, attrs=0x7fffffffb5f0, former_face_id=0) at xfaces.c:6117
  #2  0x000000000055faf7 in realize_default_face (f=0x11d1cc8) at xfaces.c:6030
  #3  0x000000000055f023 in realize_basic_faces (f=0x11d1cc8) at xfaces.c:5882
  #4  0x000000000055084f in recompute_basic_faces (f=0x11d1cc8) at xfaces.c:737
  #5  0x0000000000445160 in gui_set_alpha_background (f=0x11d1cc8, arg=XIL(0x7ffff33a80c7), oldval=XIL(0x7ffff33a80c7)) at frame.c:5237

There's also one caller in fontset.c (which calls Fclear_face_cache
before recompute_basic_faces, but Fclear_face_cache might decide not
to clear the cache).

> but I do see significant performance differences, so this must be
> wrong.

What performance difference did you see, and what is "this" that you
think must be wrong?

> > Btw, thinking again about the original problem, I don't think I
> > understand why free_face_fontset could cause a segfault.  All that
> > function does is remove the fontset from the cache and put nil into
> > its slot.  Thereafter trying to access the fontset by its ID will
> > yield nil.
> 
> It's font_for_char that segfaults, or fontset_font, depending on whether
> assertions are enabled.
> 
> The code path is this, in the case with assertions:
> 
>   fontset = FONTSET_FROM_ID (face->fontset);  /* fontset is nil */
>   eassert (!BASE_FONTSET_P (fontset));
> 
> Since
> 
> #define BASE_FONTSET_P(fontset) (NILP (FONTSET_BASE (fontset)))
> 
> and
> 
> #define FONTSET_BASE(fontset) XCHAR_TABLE (fontset)->extras[3]
> 
> we end up looking at XCHAR_TABLE (Qnil)->extras[3], which segfaults
> because nil is not a char table.
> 
> The code path without assertions is similar, it just that the
> XCHAR_TABLE happens further down the call stack, in fontset_find_font.
> See the original backtrace.

OK, so adding protection against fontset being nil, where we currently
lack that, should take care of these cases, right?

> >> However, I would still prefer a solution which doesn't, even for a short
> >> period of time, leave the non-ASCII faces' ->ascii_face pointing to a
> >> face which has been freed, and their ->fontset identifying a fontset
> >> which might be nil.
> >
> > This goes against the design of the faces and their interaction with
> > the display code, I think.
> 
> If the design indeed relies on dangling pointers to freed memory
> surviving said freeing (it doesn't), that would be a very bad design
> indeed, particularly since this fact isn't documented.

Faces on this level (i.e. those cached in the frame's face cache and
referenced by their face ID) don't matter in Emacs until they are used
to display something.  The design, AFAIU it, is to make sure the
cached faces are in good shape when we need them for display.

> The correct thing to do here is to ensure all references to
> ascii_face are removed, by freeing the non-ASCII faces, before we
> remove the ascii_face itself.

Maybe.  I'm not yet sure, primarily because we use the current code
for so many decades.

> And please, let's document such "design decisions" (actually, bugs) if
> you refuse to let anyone fix them.  Having a pointer in a structure
> which might point to random memory is a bug unless there is a very loud
> comment advising us of this fact.

If I come to the conclusion that these are indeed bugs, I will try to
fix them, of course.




This bug report was last modified 257 days ago.

Previous Next


GNU bug tracking system
Copyright (C) 1999 Darren O. Benham, 1997,2003 nCipher Corporation Ltd, 1994-97 Ian Jackson.