Package: emacs;
Reported by: Eval EXEC <execvy <at> gmail.com>
Date: Sun, 18 Aug 2024 08:31:01 UTC
Severity: normal
Done: Eli Zaretskii <eliz <at> gnu.org>
Bug is archived. No further changes may be made.
Message #145 received at 72692 <at> debbugs.gnu.org (full text, mbox):
From: Pip Cet <pipcet <at> protonmail.com> To: Eli Zaretskii <eliz <at> gnu.org> Cc: execvy <at> gmail.com, 72692 <at> debbugs.gnu.org, juri <at> linkov.net Subject: Re: bug#72692: Emacs 31.05 (40eecd594ac) get SIGSEGV on Linux (Linux 6.6.45 Kde Wayland) Date: Tue, 27 Aug 2024 19:23:50 +0000
"Eli Zaretskii" <eliz <at> gnu.org> writes: >> Date: Mon, 26 Aug 2024 19:04:13 +0000 >> From: Pip Cet <pipcet <at> protonmail.com> >> Cc: execvy <at> gmail.com, 72692 <at> debbugs.gnu.org, juri <at> linkov.net >> >> "Eli Zaretskii" <eliz <at> gnu.org> writes: >> >> > --- a/src/xfaces.c >> > +++ b/src/xfaces.c >> > @@ -733,9 +733,13 @@ recompute_basic_faces (struct frame *f) >> > { >> > if (FRAME_FACE_CACHE (f)) >> > { >> > + bool non_basic_faces_cached = >> > + FRAME_FACE_CACHE (f)->used >= BASIC_FACE_ID_SENTINEL; >> >> There's an off by one here, I think > > Yes, it should be > instead of >= . Ah, okay. I really wasn't sure. >> FRAME_FACE_CACHE (f)->used is a >> count, and it'll usually be equal to BASIC_FACE_ID_SENTINEL after the >> basic faces have been realized, greater (never equal) if non-basic faces >> have been realized, and 0 if no faces have been realized. >> >> So, in fact, your patch is equivalent to checking that FRAME_FACE_CACHE >> (f)->used != 0, and only setting face_change if it was. > > As explained in my previous message, the test for having non-basic > faces in the cache is because when we have only the basic faces, the > problem with sharing the fontsets, and with having non-ASCII faces > that point to released ASCII faces, cannot happen, and therefore > there's no need to do anything. Indeed, but that adds a perilous dependency on whether we only have the basic faces (no non-ASCII faces in use) or not. > It is true that 'used != 0' should also do the trick, but I realized Indeed, it does. > that testing against BASIC_FACE_ID_SENTINEL is stronger, because it > also prevents setting the flag when we have some faces in the cache, > but they are all basic. Setting the face_change flag when we don't > need that will cause us to unnecessarily re-realize the basic faces, > so refraining from that prevents performance degradation. So it's a performance optimization, and one which helps only users who don't use non-ASCII during redisplay. >> > The performance problem was caused by init_iterator calling >> > recompute_basic_faces every time and then shooting itself in the foot >> > by immediately setting the face_change flag, >> >> And it's easily fixed by clearing the face_change flag after >> recompute_basic_faces runs, rather than before > > I don't follow: if we clear the flag after recompute_basic_faces > exits, the faces will not be re-realized by next redisplay, and we > might be back at the problem of having non-ASCII faces whose ASCII > parents were removed from the cache and re-realize. How will this be > useful? Or what am I missing? 'recompute_basic_faces' never realizes non-ASCII faces, and the face cache was empty before we called it, so it's safe to leave the flag off. The performance problem was caused by setting the flag unnecessarily in these circumstances. >> > but this is now avoided >> > because init_iterator only calls recompute_basic_faces if the frame's >> > face cache is empty. Most other places that call >> > recompute_basic_faces also do that when the cache is empty (in which >> > case we don't have to worry about non-ASCII faces), but some call it >> > unconditionally, and the above is for those. >> >> ??? I don't see any such callers in my backtraces of Emacs with or >> without the patch > > There are 2 such callers in frame.c, one of them was in the backtrace > you show in https://debbugs.gnu.org/cgi/bugreport.cgi?bug=72692#98, > viz.: Yes, but that was the reproducer, not code that ran into performance issues. Setting the background alpha is hardly performance critical :-) > There's also one caller in fontset.c (which calls Fclear_face_cache > before recompute_basic_faces, but Fclear_face_cache might decide not > to clear the cache). > >> but I do see significant performance differences, so this must be >> wrong. > > What performance difference did you see, and what is "this" that you > think must be wrong? "the above is for those". It's not those callers that cause the performance issue. >> > Btw, thinking again about the original problem, I don't think I >> > understand why free_face_fontset could cause a segfault. All that >> > function does is remove the fontset from the cache and put nil into >> > its slot. Thereafter trying to access the fontset by its ID will >> > yield nil. >> >> It's font_for_char that segfaults, or fontset_font, depending on whether >> assertions are enabled. >> >> The code path is this, in the case with assertions: >> >> fontset = FONTSET_FROM_ID (face->fontset); /* fontset is nil */ >> eassert (!BASE_FONTSET_P (fontset)); >> >> Since >> >> #define BASE_FONTSET_P(fontset) (NILP (FONTSET_BASE (fontset))) >> >> and >> >> #define FONTSET_BASE(fontset) XCHAR_TABLE (fontset)->extras[3] >> >> we end up looking at XCHAR_TABLE (Qnil)->extras[3], which segfaults >> because nil is not a char table. >> >> The code path without assertions is similar, it just that the >> XCHAR_TABLE happens further down the call stack, in fontset_find_font. >> See the original backtrace. > > OK, so adding protection against fontset being nil, where we currently > lack that, should take care of these cases, right? It'll turn a segfaulting bug into a wrong-behavior bug. >> >> However, I would still prefer a solution which doesn't, even for a short >> >> period of time, leave the non-ASCII faces' ->ascii_face pointing to a >> >> face which has been freed, and their ->fontset identifying a fontset >> >> which might be nil. >> > >> > This goes against the design of the faces and their interaction with >> > the display code, I think. >> >> If the design indeed relies on dangling pointers to freed memory >> surviving said freeing (it doesn't), that would be a very bad design >> indeed, particularly since this fact isn't documented. > > Faces on this level (i.e. those cached in the frame's face cache and > referenced by their face ID) don't matter in Emacs until they are used > to display something. The design, AFAIU it, is to make sure the > cached faces are in good shape when we need them for display. Indeed, and dangling pointers hardly qualify as good shape. >> The correct thing to do here is to ensure all references to >> ascii_face are removed, by freeing the non-ASCII faces, before we >> remove the ascii_face itself. > > Maybe. I'm not yet sure, primarily because we use the current code > for so many decades. Okay, I can live with "maybe". >> And please, let's document such "design decisions" (actually, bugs) if >> you refuse to let anyone fix them. Having a pointer in a structure >> which might point to random memory is a bug unless there is a very loud >> comment advising us of this fact. > > If I come to the conclusion that these are indeed bugs, I will try to > fix them, of course. I still think #72802 is clearly a bug and you said you didn't want it fixed, so forgive my confusion on this matter. Pip
GNU bug tracking system
Copyright (C) 1999 Darren O. Benham,
1997,2003 nCipher Corporation Ltd,
1994-97 Ian Jackson.