GNU bug report logs -
#76180
[feature/igc] Remaining known tracing issues
Currently, I'm aware of three known issues which would potentially
result in live objects not being traced, which may lead to crashes that
are very hard to debug. These are:
1. For bytecode objects that violate the stack limit prosaically called
HORRIBLE_ESTIMATE, part of the excessively large bytecode stack will not
be scanned if GC interrupts them during execution.
2. Nativecomp code is sometimes compiled with -fomit-frame-pointer in
effect and uses %rbp as a general-purpose register: if said nativecomp
code calls SETJMP directly, and setjmp() "scrambles" the frame pointer
register, as it unfortunately does on most systems, the "scrambled"
general-purpose value will not be traced and will not pin or keep alive
its referenced object.
3. setjmp buffers in general are allocated using igc_alloc_handler,
which does not scan the setjmp buffer conservatively. As it is
unpredictable which callee-saved registers might contain live
references, we need to scan the setjmp buffer conservatively. That
still won't scan the stack pointer or frame pointer if they are
"scrambled", so we must also ensure the frame pointer is never omitted
in compiled code.
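To illustrate issues 2 and 3, here is a minimal C sketch of what a
conservative scan of a saved-register area does, and why a "scrambled"
value escapes it. Everything in it is an assumption made for
illustration: count_ambiguous_refs and scramble are hypothetical helpers,
not Emacs or MPS functions, and the XOR-and-rotate transform is only
loosely modelled on the kind of pointer mangling some C libraries apply
in setjmp; the guard value and rotation count are made up.

```c
#include <limits.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper (not an Emacs or MPS function): count the words
   in [base, base + size) whose value lies in [lo, hi), i.e. the words
   a conservative scan would treat as references into that range.  */
size_t
count_ambiguous_refs (const void *base, size_t size,
                      uintptr_t lo, uintptr_t hi)
{
  const unsigned char *p = base;
  size_t hits = 0;

  for (size_t off = 0; off + sizeof (uintptr_t) <= size;
       off += sizeof (uintptr_t))
    {
      uintptr_t word;
      /* memcpy sidesteps alignment and aliasing problems.  */
      memcpy (&word, p + off, sizeof word);
      if (lo <= word && word < hi)
        hits++;
    }
  return hits;
}

/* Illustrative stand-in for register "scrambling": XOR with a guard
   value, then rotate left by 17 bits.  The exact transform used by a
   given libc is an assumption here.  */
uintptr_t
scramble (uintptr_t value, uintptr_t guard)
{
  const unsigned bits = sizeof (uintptr_t) * CHAR_BIT;
  uintptr_t x = value ^ guard;
  return (x << 17) | (x >> (bits - 17));
}
```

Scanning a buffer that holds the plain value 0x1008 against the range
[0x1000, 0x2000) finds one ambiguous reference; after the same slot is
overwritten with scramble (0x1008, guard), the scan finds none, so the
referenced object would be neither pinned nor kept alive.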
Note that there are good reasons we're not seeing frequent crashes due
to these bugs. Issue 1 rarely triggers because most bytecode objects are
small (and nativecomp is often in use). Issue 2 affects only nativecomp
functions calling setjmp directly, not those calling intermediate Emacs
C code (which saves the %rbp register on the stack where MPS would see
it). Issue 3 most likely requires aggressive link-time optimization
options to cause bugs.
My idea is to propose three stop-gap patches for these issues as soon as
this has a bug number; more satisfying patches will require help from
bytecode/bytecode GC experts (Mattias), the nativecomp maintainer
(Andrea), and a potential rewrite to always keep jump buffers on the C
stack (Paul).
My hope is that once these stop-gap patches are installed, it's a good
point to call again for general testing of the feature/igc branch, under
certain restrictions and with specific instructions to reduce the
possibility of data loss due to crashes (e.g. run in GDB, set a
breakpoint on kill, don't use TTY signals, xwidgets, unusual toolkits).
I hope that "PGTK" and "WIDE_EMACS_INT" can be removed from the list of
build constellations expected to be unstable ASAP.
The branch will continue to contain bugs and we can't take any
responsibility, but at least avoiding those known bugs seems to result
in a usable work environment here. It might also be a good opportunity
to win back some of the early testers who were frustrated by stability
issues, by letting them know that we've fixed some of those issues.
Thoughts?
Pip
This bug report was last modified 175 days ago.