GNU bug report logs - #38748
28.0.50; crash on MacOS 10.15.2


Package: emacs;

Reported by: Andrii Kolomoiets <andreyk.mad <at> gmail.com>

Date: Thu, 26 Dec 2019 09:49:01 UTC

Severity: normal

Merged with 38822

Found in versions 27.0.60, 28.0.50

Fixed in version 27.1

Done: Lars Ingebrigtsen <larsi <at> gnus.org>

Bug is archived. No further changes may be made.



Message #119 received at 38748 <at> debbugs.gnu.org (full text, mbox):

From: Eli Zaretskii <eliz <at> gnu.org>
To: Pip Cet <pipcet <at> gmail.com>
Cc: rpluim <at> gmail.com, andreyk.mad <at> gmail.com, alan <at> idiocy.org,
 jguenther <at> gmail.com, 38748 <at> debbugs.gnu.org
Subject: Re: bug#38748: 28.0.50; crash on MacOS 10.15.2
Date: Fri, 10 Jan 2020 10:27:45 +0200
> From: Pip Cet <pipcet <at> gmail.com>
> Date: Fri, 10 Jan 2020 07:32:07 +0000
> Cc: rpluim <at> gmail.com, alan <at> idiocy.org, jguenther <at> gmail.com, 
> 	andreyk.mad <at> gmail.com, 38748 <at> debbugs.gnu.org
> 
> > The backtrace shows a deeply recursive GC; it doesn't show any other
> > function being deeply recursive.  So I'm not sure I understand which
> > tail-recursive function you had in mind.  Can you elaborate?
> 
> I can. I think we're looking at two bugs: the first is the simple
> use-after-free of XFRAME (frame)->output_data.ns where `frame' is a
> dead frame. I've confirmed on GNU/Linux that mark_frame is called for
> a frame for which x_free_frame_resources has already been called, if
> there's a global variable still referencing the frame. I think the
> same thing happens on macOS.

This one doesn't depend on the initialization of 'ok' in
face_inherited_attr in any way, does it?
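
To make the first bug concrete, here is a minimal sketch of the pattern being described: a frame object stays reachable after its window-system resources have been released, so the marking code must not dereference the stale pointer.  Everything named toy_* is hypothetical and stands in for the real structures; this is not the code in alloc.c, and not necessarily the fix that was applied.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-ins, not the real Emacs structures.  */
struct toy_output { int font_count; };

struct toy_frame {
  struct toy_output *output_data;  /* NULL once the frame is dead */
  bool marked;
};

/* Release the window-system resources.  Clearing the pointer is what
   lets later code tell a dead frame from a live one.  */
void toy_free_frame_resources(struct toy_frame *f)
{
  free(f->output_data);
  f->output_data = NULL;
}

bool toy_frame_live_p(const struct toy_frame *f)
{
  return f->output_data != NULL;
}

/* Mark the frame itself unconditionally (it is still reachable), but
   only look inside output_data when the frame is live.  Returns the
   number of window-system objects that would be visited.  */
int toy_mark_frame(struct toy_frame *f)
{
  f->marked = true;
  if (!toy_frame_live_p(f))
    return 0;
  return f->output_data->font_count;
}
```

Without the liveness check, toy_mark_frame would read through a freed pointer whenever a global variable still references a deleted frame, which is the use-after-free scenario quoted above.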

> 1. I think face_inherited_attr is being optimized to tail-call itself
> rather than calling itself in a new stack frame; thus, it loops
> indefinitely for a faulty face setup which would otherwise lead to an
> immediate crash.
> 1b. That optimization only works without the harmless initialization of "ok".
> 
> 2. Our initial face setup is faulty in the sense above.
> 
> 3. Something happens on a secondary thread which causes our face setup
> to become non-faulty, possibly during GC.

What do you mean by "secondary thread"?  And how can GC modify Lisp
data structures?  That would be a terrible bug.

In any case, the full backtrace shows no trace of a face_inherited_attr
call anywhere in the call stack, so if there is indeed infinite
recursion in that function, it was somehow exited long before GC
runs.

As for the tail-recursion part: do you see any sign of that in the
disassembly posted by Robert?  I didn't, but maybe I missed
something.  And such subtleties should only rear their ugly heads in
optimized code, whereas we already know that an unoptimized build
crashes in the same way.
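
For readers following the tail-call hypothesis, a small sketch may help.  It assumes a toy face with a single attribute and a single parent pointer; the toy_* names are hypothetical and this is not the real face_inherited_attr.  The first function calls itself in tail position, so an optimizing compiler is allowed (not required) to turn the call into a jump.  On a cyclic inheritance chain the unoptimized form would exhaust the stack, while the optimized form would spin forever.  The second function shows one way to bound the walk.

```c
#include <stddef.h>

/* Hypothetical toy face, not Emacs's face representation.  */
struct toy_face {
  int attr;                        /* 0 means "unspecified" */
  const struct toy_face *inherit;  /* NULL means no parent */
};

/* The self-call is the last action, i.e. it is in tail position.
   Do not call this on a cyclic chain.  */
int toy_inherited_attr(const struct toy_face *f)
{
  if (f == NULL)
    return 0;
  if (f->attr != 0)
    return f->attr;
  return toy_inherited_attr(f->inherit);
}

/* Bounded variant: give up after MAX_DEPTH links and return -1, so a
   cyclic chain is reported instead of looping or overflowing.  */
int toy_inherited_attr_bounded(const struct toy_face *f, int max_depth)
{
  for (int depth = 0; f != NULL; f = f->inherit, depth++)
    {
      if (depth >= max_depth)
        return -1;
      if (f->attr != 0)
        return f->attr;
    }
  return 0;
}
```

Whether a given compiler actually performs the transformation depends on the optimization level and the surrounding code, which is why the disassembly is the place to check.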

I still think the shortest way to finding the culprit here is to
patiently and painfully go over the last_marked array, deciphering
the Lisp objects we marked, until we succeed in identifying the Lisp
data structure which got corrupted.  Once we succeed in identifying
that data structure, it should be relatively easy to find what
corrupts it and where.  This may mean a lot of inconvenient drudgery,
exacerbated by the fact that having a functional GDB on macOS is not
easy, but I don't think we have a better way at this point.
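
As an illustration of the technique, the sketch below models a "last marked" log as a fixed-size ring buffer and shows how to read it from the most recent entry backwards.  The size, the element type, and every toy_* name are assumptions made for the example; the real last_marked array lives in alloc.c and holds Lisp objects.

```c
#include <stddef.h>

#define TOY_LAST_MARKED_SIZE 8   /* hypothetical; kept small for the demo */

static const void *toy_last_marked[TOY_LAST_MARKED_SIZE];
static size_t toy_last_marked_index;   /* next slot to be written */

/* Called once per marked object: overwrite the oldest entry.  */
void toy_record_marked(const void *obj)
{
  toy_last_marked[toy_last_marked_index] = obj;
  toy_last_marked_index = (toy_last_marked_index + 1) % TOY_LAST_MARKED_SIZE;
}

/* Return the object marked AGE steps ago (0 is the most recent).
   Returns NULL if AGE is outside the buffer or the slot was never
   written.  */
const void *toy_marked_ago(size_t age)
{
  if (age >= TOY_LAST_MARKED_SIZE)
    return NULL;
  size_t i = (toy_last_marked_index + TOY_LAST_MARKED_SIZE - 1 - age)
             % TOY_LAST_MARKED_SIZE;
  return toy_last_marked[i];
}
```

In a debugger session the same walk is done by hand: start at the slot just before the current index, step backwards with wraparound, and decode each object until the chain of references leading to the corrupted structure becomes visible.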




This bug report was last modified 4 years and 300 days ago.


