GNU bug report logs - #33014
26.1.50; 27.0.50; Fatal error after re-evaluating a thread's function

Package: emacs;

Reported by: Gemini Lasswell <gazally <at> runbox.com>

Date: Thu, 11 Oct 2018 05:32:01 UTC

Severity: normal

Tags: fixed

Found in version 26.1.50

Fixed in version 27.1

Done: Gemini Lasswell <gazally <at> runbox.com>

Bug is archived. No further changes may be made.


Report forwarded to bug-gnu-emacs <at> gnu.org:
bug#33014; Package emacs. (Thu, 11 Oct 2018 05:32:01 GMT)

Acknowledgement sent to Gemini Lasswell <gazally <at> runbox.com>:
New bug report received and forwarded. Copy sent to bug-gnu-emacs <at> gnu.org. (Thu, 11 Oct 2018 05:32:01 GMT)

Message #5 received at submit <at> debbugs.gnu.org:

From: Gemini Lasswell <gazally <at> runbox.com>
To: bug-gnu-emacs <at> gnu.org
Subject: 26.1.50; 27.0.50; Fatal error after re-evaluating a thread's function
Date: Wed, 10 Oct 2018 22:30:29 -0700
When I run some byte-compiled code which creates some threads, and then,
while a thread is blocked, interactively evaluate the function which
was used to create that thread, Emacs has a fatal error or segmentation
fault when the thread becomes unblocked.

To reproduce:

  Build Emacs from master with this patch, in which I've pasted some
  excerpts from my current project onto the end of lisp/thread.el.  It's
  going to be like ERT but designed to run benchmarks instead of tests,
  but right now all it does is to create a buffer and three threads, set
  the threads up to communicate with each other, log their progress to
  *Messages*, and update the buffer when they finish:

[0001-Reproduce-Bswitch-segfault.patch (text/plain, attachment)]
  Run Emacs with -Q, and then type:
    M-x erb-summary-run RET
    s

  Wait several seconds for the second to the last line in the buffer to
  change to "Finished".  There will also be an echo area message about
  erb--status being set to done.

  Navigate to lisp/thread.el, select everything from the definition of
  erb--benchmark-monitor to the end of the file, and use:
    M-x eval-region RET

  Return to the buffer created by erb-summary-run, and type 's' again.

Result:

lisp.h:2241: Emacs fatal error: assertion failed: HASH_TABLE_P (a)

My suspicion is that the garbage collector is freeing something needed
by the blocked thread.  Setting gc-cons-threshold to 500M before doing
the steps above stops the error from happening.

Here's the backtrace.  While trying to sort out how to reproduce this, I
also saw it segfault in Ffuncall, in styled_format, and in the Bswitch
case of exec_byte_code just past where this error occurs, when it tries
to access h->count.

Thread 7 (Thread 0x7f1cd4dec700 (LWP 21837)):
#0  terminate_due_to_signal (sig=sig@entry=6,
    backtrace_limit=backtrace_limit@entry=2147483647) at emacs.c:369
#1  0x00000000005a4d99 in die (msg=msg@entry=0x678d52 "HASH_TABLE_P (a)",
    file=file@entry=0x6768a5 "lisp.h", line=line@entry=2241) at alloc.c:7094
#2  0x00000000006122b5 in XHASH_TABLE (a=...) at lisp.h:2241
#3  exec_byte_code (bytestr=..., vector=..., maxdepth=..., args_template=...,
    nargs=nargs@entry=0, args=<optimized out>,
    args@entry=0x16eac38 <bss_sbrk_buffer+9926040>) at bytecode.c:1403
#4  0x00000000005cb972 in funcall_lambda (fun=..., nargs=nargs@entry=0,
    arg_vector=0x16eac38 <bss_sbrk_buffer+9926040>,
    arg_vector@entry=0x158ec58 <bss_sbrk_buffer+8500664>) at eval.c:3057
#5  0x00000000005c818b in Ffuncall (nargs=nargs@entry=1,
    args=args@entry=0x158ec50 <bss_sbrk_buffer+8500656>) at eval.c:2870
#6  0x000000000064443b in invoke_thread_function () at thread.c:684
#7  0x00000000005c728f in internal_condition_case (
    bfun=bfun@entry=0x644400 <invoke_thread_function>, handlers=...,
    handlers@entry=XIL(0xc3c0), hfun=hfun@entry=0x644320 <record_thread_error>)
    at eval.c:1373
#8  0x0000000000644dd1 in run_thread (state=0x158ec30 <bss_sbrk_buffer+8500624>)
    at thread.c:723
#9  0x00007f1cebf602a7 in start_thread ()
   from /nix/store/hwwqshlmazzjzj7yhrkyjydxamvvkfd3-glibc-2.26-131/lib/libpthread.so.0
#10 0x00007f1ceb5fd57f in clone ()
   from /nix/store/hwwqshlmazzjzj7yhrkyjydxamvvkfd3-glibc-2.26-131/lib/libc.so.6

Thread 7 (Thread 0x7f1cd4dec700 (LWP 21837)):
"erb--benchmark-monitor-func" (0x158ec58)


In GNU Emacs 27.0.50 (build 1, x86_64-pc-linux-gnu, GTK+ Version 3.22.28)
 of 2018-10-09 built on sockeye
Repository revision: 708444efad7a2ce1e309532898b844527e2d9c64
Windowing system distributor 'The X.Org Foundation', version 11.0.11906000
System Description: NixOS 18.03.git.bd06547 (Impala)

Recent messages:
For information about GNU Emacs and the GNU system, type C-h C-a.

Configured using:
 'configure --prefix=/home/gem/src/emacs/master/bin --with-modules
 --with-x-toolkit=gtk3 --with-xft --config-cache
 --enable-checking=yes,glyphs --enable-check-lisp-object-type'

Configured features:
XPM JPEG TIFF GIF PNG RSVG SOUND DBUS GSETTINGS GLIB NOTIFY LIBSELINUX
GNUTLS LIBXML2 FREETYPE XFT ZLIB TOOLKIT_SCROLL_BARS GTK3 X11 XDBE XIM
MODULES THREADS GMP

Important settings:
  value of $EMACSLOADPATH:
  value of $LANG: en_US.UTF-8
  locale-coding-system: utf-8-unix

Major mode: Lisp Interaction

Minor modes in effect:
  tooltip-mode: t
  global-eldoc-mode: t
  eldoc-mode: t
  electric-indent-mode: t
  mouse-wheel-mode: t
  tool-bar-mode: t
  menu-bar-mode: t
  file-name-shadow-mode: t
  global-font-lock-mode: t
  font-lock-mode: t
  blink-cursor-mode: t
  auto-composition-mode: t
  auto-encryption-mode: t
  auto-compression-mode: t
  line-number-mode: t
  transient-mark-mode: t

Load-path shadows:
None found.

Features:
(shadow sort mail-extr emacsbug message rmc puny seq byte-opt gv
bytecomp byte-compile cconv dired dired-loaddefs format-spec rfc822 mml
easymenu mml-sec password-cache epa derived epg epg-config gnus-util
rmail rmail-loaddefs time-date mm-decode mm-bodies mm-encode mail-parse
rfc2231 mailabbrev gmm-utils mailheader cl-loaddefs cl-lib sendmail
rfc2047 rfc2045 ietf-drums mm-util mail-prsvr mail-utils elec-pair
mule-util tooltip eldoc electric uniquify ediff-hook vc-hooks
lisp-float-type mwheel term/x-win x-win term/common-win x-dnd tool-bar
dnd fontset image regexp-opt fringe tabulated-list replace newcomment
text-mode elisp-mode lisp-mode prog-mode register page menu-bar
rfn-eshadow isearch timer select scroll-bar mouse jit-lock font-lock
syntax facemenu font-core term/tty-colors frame cl-generic cham georgian
utf-8-lang misc-lang vietnamese tibetan thai tai-viet lao korean
japanese eucjp-ms cp51932 hebrew greek romanian slovak czech european
ethiopic indian cyrillic chinese composite charscript charprop
case-table epa-hook jka-cmpr-hook help simple abbrev obarray minibuffer
cl-preloaded nadvice loaddefs button faces cus-face macroexp files
text-properties overlay sha1 md5 base64 format env code-pages mule
custom widget hashtable-print-readable backquote threads dbusbind
inotify dynamic-setting system-font-setting font-render-setting
move-toolbar gtk x-toolkit x multi-tty make-network-process emacs)

Memory information:
((conses 16 94967 9472)
 (symbols 48 20045 1)
 (strings 32 28456 1769)
 (string-bytes 1 816313)
 (vectors 16 14265)
 (vector-slots 8 504082 12268)
 (floats 8 47 70)
 (intervals 56 213 0)
 (buffers 992 11))

Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#33014; Package emacs. (Fri, 12 Oct 2018 08:13:02 GMT)

Message #8 received at 33014 <at> debbugs.gnu.org:

From: Eli Zaretskii <eliz <at> gnu.org>
To: Gemini Lasswell <gazally <at> runbox.com>
Cc: 33014 <at> debbugs.gnu.org
Subject: Re: bug#33014: 26.1.50; 27.0.50;
 Fatal error after re-evaluating a thread's function
Date: Fri, 12 Oct 2018 11:12:17 +0300
> From: Gemini Lasswell <gazally <at> runbox.com>
> Date: Wed, 10 Oct 2018 22:30:29 -0700
> 
> When I run some byte-compiled code which creates some threads, and then,
> while a thread is blocked, interactively evaluate the function which
> was used to create that thread, Emacs has a fatal error or segmentation
> fault when the thread becomes unblocked.

Can you please make a smaller stand-alone test case, which doesn't
require patching Emacs?  That will make it much easier to try
reproducing the problem.

> Thread 7 (Thread 0x7f1cd4dec700 (LWP 21837)):
> #0  terminate_due_to_signal (sig=sig@entry=6,
>     backtrace_limit=backtrace_limit@entry=2147483647) at emacs.c:369
> #1  0x00000000005a4d99 in die (msg=msg@entry=0x678d52 "HASH_TABLE_P (a)",
>     file=file@entry=0x6768a5 "lisp.h", line=line@entry=2241) at alloc.c:7094
> #2  0x00000000006122b5 in XHASH_TABLE (a=...) at lisp.h:2241
> #3  exec_byte_code (bytestr=..., vector=..., maxdepth=..., args_template=...,
>     nargs=nargs@entry=0, args=<optimized out>,
>     args@entry=0x16eac38 <bss_sbrk_buffer+9926040>) at bytecode.c:1403
> #4  0x00000000005cb972 in funcall_lambda (fun=..., nargs=nargs@entry=0,
>     arg_vector=0x16eac38 <bss_sbrk_buffer+9926040>,
>     arg_vector@entry=0x158ec58 <bss_sbrk_buffer+8500664>) at eval.c:3057
> #5  0x00000000005c818b in Ffuncall (nargs=nargs@entry=1,
>     args=args@entry=0x158ec50 <bss_sbrk_buffer+8500656>) at eval.c:2870
> #6  0x000000000064443b in invoke_thread_function () at thread.c:684
> #7  0x00000000005c728f in internal_condition_case (
>     bfun=bfun@entry=0x644400 <invoke_thread_function>, handlers=...,
>     handlers@entry=XIL(0xc3c0), hfun=hfun@entry=0x644320 <record_thread_error>)
>     at eval.c:1373
> #8  0x0000000000644dd1 in run_thread (state=0x158ec30 <bss_sbrk_buffer+8500624>)

Can you show the Lisp backtrace of this thread?  Also, what is the
offending object 'a' in this frame:

> #2  0x00000000006122b5 in XHASH_TABLE (a=...) at lisp.h:2241

and what was its parent object in the calling frame?

Thanks.




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#33014; Package emacs. (Fri, 12 Oct 2018 20:04:02 GMT)

Message #11 received at 33014 <at> debbugs.gnu.org:

From: Gemini Lasswell <gazally <at> runbox.com>
To: Eli Zaretskii <eliz <at> gnu.org>
Cc: 33014 <at> debbugs.gnu.org
Subject: Re: bug#33014: 26.1.50; 27.0.50;
 Fatal error after re-evaluating a thread's function
Date: Fri, 12 Oct 2018 13:02:56 -0700
Eli Zaretskii <eliz <at> gnu.org> writes:

> Can you please make a smaller stand-alone test case, which doesn't
> require patching Emacs?  That will make it much easier to try
> reproducing the problem.

I've tried to do that without success.  The bug won't reproduce if I put
all the code added to thread.el by the patch into its own file and load
it with C-u M-x byte-compile-file, and it also doesn't work to put the
resulting .elc on my load-path and load it with require.

I've determined today that having -O2 in CFLAGS is necessary to
reproduce the bug, and that -O1 or -O0 won't do it.

> Can you show the Lisp backtrace of this thread?  Also, what is the
> offending object 'a' in this frame:

The Lisp backtrace is really short:

Thread 7 (Thread 0x7f1cd4dec700 (LWP 21837)):
"erb--benchmark-monitor-func" (0x158ec58)

>> #2  0x00000000006122b5 in XHASH_TABLE (a=...) at lisp.h:2241
>
> and what was its parent object in the calling frame?

Those are both optimized out with -O2.  I recompiled bytecode.c with
"volatile" on the declaration of jmp_table, and got this:

(gdb) up 3
#3  exec_byte_code (bytestr=..., vector=..., maxdepth=..., args_template=..., 
    nargs=nargs@entry=0, args=<optimized out>,
    args@entry=0x16eacf8 <bss_sbrk_buffer+9926232>) at bytecode.c:1403
1403	            struct Lisp_Hash_Table *h = XHASH_TABLE (jmp_table);
(gdb) p jmp_table
$1 = make_number(514)
(gdb) p *top
$3 = XIL(0x42b4d0)
(gdb) pp *top
remove

Then I started looking at other variables in exec_byte_code, and found
this which didn't look right:

(gdb) p *vectorp
$13 = XIL(0x7f4934009523)
(gdb) pr
(((help-menu "Help" keymap (emacs-tutorial menu-item "Emacs Tutorial" help-with-tutorial :help "Lear
?\207" [yank-menu kill-ring buffer-read-only gui-backend-selection-exists-p CLIPBOARD featurep ns] 2
\205^Q^@ÅÆ!\207" [visual-line-mode word-wrap truncate-lines 0 nil toggle-truncate-lines -1] 2 nil ni

(I've truncated the result of printing *vectorp since each line is over
5000 characters long.)

Since that looked like it was unlikely to be the original value of
*vectorp, I started a new debugging session and stepped through
Thread 7's call to exec_byte_code for erb--benchmark-monitor-func, and
determined that *vectorp's initial value was erb--status-updates, which
matches the first element of the constants vector in
(symbol-function 'erb--benchmark-monitor-func).

The value of vectorp was 0x16eac38 so I set a watchpoint on
  *(EMACS_INT *) 0x16eac38
and continued, and then during the execution of eval-region
it triggered here:

Thread 1 "monitor" hit Hardware watchpoint 7: *(EMACS_INT *) 0x16eac38

Old value = 60897760
New value = 24075314
setup_on_free_list (v=v@entry=0x16eac30 <bss_sbrk_buffer+9926032>,
    nbytes=nbytes@entry=272) at alloc.c:3060
3060	  total_free_vector_slots += nbytes / word_size;
(gdb) bt 10
#0  setup_on_free_list (v=v@entry=0x16eac30 <bss_sbrk_buffer+9926032>,
    nbytes=nbytes@entry=272) at alloc.c:3060
#1  0x00000000005a9a24 in sweep_vectors () at alloc.c:3297
#2  0x00000000005adb2e in gc_sweep () at alloc.c:6872
#3  garbage_collect_1 (end=<optimized out>) at alloc.c:5860
#4  Fgarbage_collect () at alloc.c:5989
#5  0x00000000005ca478 in maybe_gc () at lisp.h:4804
#6  Ffuncall (nargs=4, args=args@entry=0x7fff210a3bc8) at eval.c:2838
#7  0x0000000000611e00 in exec_byte_code (bytestr=..., vector=..., maxdepth=...,
    args_template=..., nargs=nargs@entry=2, args=<optimized out>,
    args@entry=0x9bd128 <pure+781288>) at bytecode.c:632
#8  0x00000000005cdd32 in funcall_lambda (fun=XIL(0x7fff210a3bc8),
    nargs=nargs@entry=2, arg_vector=0x9bd128 <pure+781288>,
    arg_vector@entry=0x7fff210a3f00) at eval.c:3057
#9  0x00000000005ca54b in Ffuncall (nargs=3, args=args@entry=0x7fff210a3ef8)
    at eval.c:2870
(More stack frames follow...)

Note that just as was happening when we were working through bug#32357,
the thread names which gdb prints are wrong, which I verified with:

(gdb) p current_thread
$21 = (struct thread_state *) 0xd73480 <main_thread>
(gdb) p current_thread->name
$22 = XIL(0)

Am I correct that the next step is to figure out why the garbage
collector is not marking this vector?  Presumably it's no longer
attached to the function definition for erb--benchmark-monitor-func by
the time the garbage collector runs, but it's supposed to be found by
mark_stack when called from mark_one_thread for Thread 7, right?




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#33014; Package emacs. (Sat, 13 Oct 2018 06:24:01 GMT)

Message #14 received at 33014 <at> debbugs.gnu.org:

From: Eli Zaretskii <eliz <at> gnu.org>
To: Gemini Lasswell <gazally <at> runbox.com>
Cc: 33014 <at> debbugs.gnu.org
Subject: Re: bug#33014: 26.1.50; 27.0.50;
 Fatal error after re-evaluating a thread's function
Date: Sat, 13 Oct 2018 09:23:38 +0300
> From: Gemini Lasswell <gazally <at> runbox.com>
> Cc: 33014 <at> debbugs.gnu.org
> Date: Fri, 12 Oct 2018 13:02:56 -0700
> 
> I've tried to do that without success.  The bug won't reproduce if I put
> all the code added to thread.el by the patch into its own file and load
> it with C-u M-x byte-compile-file, and it also doesn't work to put the
> resulting .elc on my load-path and load it with require.

Did you try loading it as a .el file?

Anyway, it's too bad that the reproduction is so Heisenbug-like.  It
probably won't reproduce on my system anyway.

> I've determined today that having -O2 in CFLAGS is necessary to
> reproduce the bug, and that -O1 or -O0 won't do it.

One more reason why reproduction elsewhere is probably hard.

> The Lisp backtrace is really short:
> 
> Thread 7 (Thread 0x7f1cd4dec700 (LWP 21837)):
> "erb--benchmark-monitor-func" (0x158ec58)

If you succeed in reproducing this when this code is loaded
uncompiled, the backtrace might be more helpful.

> >> #2  0x00000000006122b5 in XHASH_TABLE (a=...) at lisp.h:2241
> >
> > and what was its parent object in the calling frame?
> 
> Those are both optimized out with -O2.  I recompiled bytecode.c with
> "volatile" on the declaration of jmp_table, and got this:
> 
> (gdb) up 3
> #3  exec_byte_code (bytestr=..., vector=..., maxdepth=..., args_template=..., 
>     nargs=nargs@entry=0, args=<optimized out>,
>     args@entry=0x16eacf8 <bss_sbrk_buffer+9926232>) at bytecode.c:1403
> 1403	            struct Lisp_Hash_Table *h = XHASH_TABLE (jmp_table);
> (gdb) p jmp_table
> $1 = make_number(514)
> (gdb) p *top
> $3 = XIL(0x42b4d0)
> (gdb) pp *top
> remove

Which one of these is the one that triggers the assertion violation?

> Thread 1 "monitor" hit Hardware watchpoint 7: *(EMACS_INT *) 0x16eac38
> 
> Old value = 60897760
> New value = 24075314
> setup_on_free_list (v=v@entry=0x16eac30 <bss_sbrk_buffer+9926032>,
>     nbytes=nbytes@entry=272) at alloc.c:3060
> 3060	  total_free_vector_slots += nbytes / word_size;
> (gdb) bt 10
> #0  setup_on_free_list (v=v@entry=0x16eac30 <bss_sbrk_buffer+9926032>,
>     nbytes=nbytes@entry=272) at alloc.c:3060
> #1  0x00000000005a9a24 in sweep_vectors () at alloc.c:3297
> #2  0x00000000005adb2e in gc_sweep () at alloc.c:6872
> #3  garbage_collect_1 (end=<optimized out>) at alloc.c:5860
> #4  Fgarbage_collect () at alloc.c:5989
> #5  0x00000000005ca478 in maybe_gc () at lisp.h:4804
> #6  Ffuncall (nargs=4, args=args@entry=0x7fff210a3bc8) at eval.c:2838
> #7  0x0000000000611e00 in exec_byte_code (bytestr=..., vector=..., maxdepth=...,
>     args_template=..., nargs=nargs@entry=2, args=<optimized out>,
>     args@entry=0x9bd128 <pure+781288>) at bytecode.c:632
> #8  0x00000000005cdd32 in funcall_lambda (fun=XIL(0x7fff210a3bc8),
>     nargs=nargs@entry=2, arg_vector=0x9bd128 <pure+781288>,
>     arg_vector@entry=0x7fff210a3f00) at eval.c:3057
> #9  0x00000000005ca54b in Ffuncall (nargs=3, args=args@entry=0x7fff210a3ef8)
>     at eval.c:2870
> (More stack frames follow...)

Can you show the Lisp backtrace for the above?

> Note that just as was happening when we were working through bug#32357,
> the thread names which gdb prints are wrong, which I verified with:

Looks like a bug in pthreads version of sys_thread_create: it calls
prctl with first arg PR_SET_NAME, but my reading of the documentation
is that such a call gives the name to the _calling_ thread, which is
not the thread just created.  We should instead call
pthread_setname_np, I think (but I'm not an expert on pthreads).
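
A minimal sketch of the PR_SET_NAME point, assuming Linux; it is not taken from the Emacs sources, and rename_and_read_back is a hypothetical helper. It sets a name with prctl and then reads back the name of the calling thread, which shows that the caller itself is the one renamed. Naming a different thread would need a call that takes a thread handle, such as glibc's nonstandard pthread_setname_np (thread, name).

```c
#include <assert.h>
#include <string.h>
#include <sys/prctl.h>

/* Hypothetical helper, Linux only.  Sets a thread name with PR_SET_NAME
   and reads the calling thread's name back with PR_GET_NAME.
   NAME may be at most 16 bytes including the terminating NUL; longer
   names are truncated by the kernel.  Returns 0 on success.  */
int rename_and_read_back(const char *name, char out[16])
{
    assert(name != NULL && out != NULL);

    /* PR_SET_NAME takes no thread argument: it acts on the caller.  */
    if (prctl(PR_SET_NAME, (unsigned long) name, 0, 0, 0) != 0)
        return -1;

    /* PR_GET_NAME likewise reports the caller's own name.  */
    return prctl(PR_GET_NAME, (unsigned long) out, 0, 0, 0);
}
```

If the thread that creates a new thread makes this call, it renames itself rather than the new thread, which would be consistent with the wrong thread names gdb printed.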

> Am I correct that the next step is to figure out why the garbage
> collector is not marking this vector?  Presumably it's no longer
> attached to the function definition for erb--benchmark-monitor-func by
> the time the garbage collector runs, but it's supposed to be found by
> mark_stack when called from mark_one_thread for Thread 7, right?

Is this vector the byte-code of erb--benchmark-monitor-func?  If so,
how come it is no longer attached to the function, as long as the
function does exist?

And if this vector isn't the byte-code of erb--benchmark-monitor-func,
then what is it?

IMO, we cannot reason about what GC does or doesn't do until we
understand what data structure it processes, and what is the relation
of that data structure to the symbols in your program and in Emacs.

Thanks.




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#33014; Package emacs. (Sat, 13 Oct 2018 17:18:01 GMT)

Message #17 received at 33014 <at> debbugs.gnu.org:

From: Gemini Lasswell <gazally <at> runbox.com>
To: Eli Zaretskii <eliz <at> gnu.org>
Cc: 33014 <at> debbugs.gnu.org
Subject: Re: bug#33014: 26.1.50; 27.0.50;
 Fatal error after re-evaluating a thread's function
Date: Sat, 13 Oct 2018 10:17:10 -0700
Eli Zaretskii <eliz <at> gnu.org> writes:

> Did you try loading it as a .el file?

Yes, but I couldn't reproduce the bug.

>> The Lisp backtrace is really short:
>> 
>> Thread 7 (Thread 0x7f1cd4dec700 (LWP 21837)):
>> "erb--benchmark-monitor-func" (0x158ec58)
>
> If you succeed in reproducing this when this code is loaded
> uncompiled, the backtrace might be more helpful.

The assertion happens in the Bswitch case of exec_byte_code when it's
running erb--benchmark-monitor-func, and there's only one 'switch' in
the disassembled bytecode, see line 16:

[bytecode.txt (text/plain, attachment)]
>> (gdb) p jmp_table
>> $1 = make_number(514)
>> (gdb) p *top
>> $3 = XIL(0x42b4d0)
>> (gdb) pp *top
>> remove
>
> Which one of these is the one that triggers the assertion violation?

jmp_table.  The assertion violation is at line 1403 in bytecode.c:

            struct Lisp_Hash_Table *h = XHASH_TABLE (jmp_table);
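
Why a jmp_table of make_number(514) trips this check can be pictured with a toy tagged value. This is only a sketch: the toy_ names are hypothetical, the layout is not how Emacs represents Lisp_Object, and the checked accessor returns NULL where a checking build would abort with the assertion shown in the report.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

enum toy_tag { TOY_FIXNUM, TOY_HASH_TABLE };

struct toy_hash_table { size_t count; };

/* A value that carries its own type tag.  */
struct toy_value {
    enum toy_tag tag;
    union {
        long fixnum;
        struct toy_hash_table *table;
    } u;
};

struct toy_value toy_make_fixnum(long n)
{
    struct toy_value v = { .tag = TOY_FIXNUM, .u.fixnum = n };
    return v;
}

struct toy_value toy_make_table(struct toy_hash_table *t)
{
    assert(t != NULL);
    struct toy_value v = { .tag = TOY_HASH_TABLE, .u.table = t };
    return v;
}

/* Type predicate, in the spirit of HASH_TABLE_P.  */
bool toy_hash_table_p(struct toy_value v)
{
    return v.tag == TOY_HASH_TABLE;
}

/* Checked accessor: hands out the table only if the tag matches,
   and returns NULL otherwise.  */
struct toy_hash_table *toy_xhash_table(struct toy_value v)
{
    return toy_hash_table_p(v) ? v.u.table : NULL;
}
```

If the slot that once held the jump table is reused for an integer, the predicate fails and the accessor refuses the value instead of reading a count field out of unrelated memory.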

>> Thread 1 "monitor" hit Hardware watchpoint 7: *(EMACS_INT *) 0x16eac38
>> 
>> Old value = 60897760
>> New value = 24075314
>> setup_on_free_list (v=v@entry=0x16eac30 <bss_sbrk_buffer+9926032>,
>>     nbytes=nbytes@entry=272) at alloc.c:3060
>> 3060	  total_free_vector_slots += nbytes / word_size;
>> (gdb) bt 10
>> #0  setup_on_free_list (v=v@entry=0x16eac30 <bss_sbrk_buffer+9926032>,
>>     nbytes=nbytes@entry=272) at alloc.c:3060
>> #1  0x00000000005a9a24 in sweep_vectors () at alloc.c:3297
>> #2  0x00000000005adb2e in gc_sweep () at alloc.c:6872
>> #3  garbage_collect_1 (end=<optimized out>) at alloc.c:5860
>> #4  Fgarbage_collect () at alloc.c:5989
>> #5  0x00000000005ca478 in maybe_gc () at lisp.h:4804
>> #6  Ffuncall (nargs=4, args=args@entry=0x7fff210a3bc8) at eval.c:2838
>> #7  0x0000000000611e00 in exec_byte_code (bytestr=..., vector=..., maxdepth=...,
>>     args_template=..., nargs=nargs@entry=2, args=<optimized out>,
>>     args@entry=0x9bd128 <pure+781288>) at bytecode.c:632
>> #8  0x00000000005cdd32 in funcall_lambda (fun=XIL(0x7fff210a3bc8),
>>     nargs=nargs@entry=2, arg_vector=0x9bd128 <pure+781288>,
>>     arg_vector@entry=0x7fff210a3f00) at eval.c:3057
>> #9  0x00000000005ca54b in Ffuncall (nargs=3, args=args@entry=0x7fff210a3ef8)
>>     at eval.c:2870
>> (More stack frames follow...)
>
> Can you show the Lisp backtrace for the above?

(gdb) xbacktrace
"Automatic GC" (0x0)
"string-match" (0x210a3bd0)
"completion-pcm--string->pattern" (0x210a3f00)
"completion-pcm--find-all-completions" (0x210a43a0)
"completion-pcm-try-completion" (0x210a4668)
0x1723c30 PVEC_COMPILED
"completion--some" (0x210a4b60)
"completion--nth-completion" (0x210a4e68)
"completion-try-completion" (0x210a50f0)
"execute-extended-command--shorter" (0x210a5390)
"execute-extended-command" (0x210a5760)
"funcall-interactively" (0x210a5758)
"call-interactively" (0x210a5a90)
"command-execute" (0x210a5d48)

>> Am I correct that the next step is to figure out why the garbage
>> collector is not marking this vector?  Presumably it's no longer
>> attached to the function definition for erb--benchmark-monitor-func by
>> the time the garbage collector runs, but it's supposed to be found by
>> mark_stack when called from mark_one_thread for Thread 7, right?
>
> Is this vector the byte-code of erb--benchmark-monitor-func?  If so,
> how come it is no longer attached to the function, as long as the
> function does exist?

This vector is the constants vector for the byte-code of
erb--benchmark-monitor-func.

When eval-region evaluates the defun for erb--benchmark-monitor-func, it
replaces the symbol's function definition, so it removes that reference
to the byte-code.  AFAIK the only other reference to the byte-code
is on the stack of Thread 7, which is running the byte-code.
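
The situation described above can be pictured with a toy mark-and-sweep collector. This is a minimal sketch and not the code in alloc.c: the toy_ names are hypothetical, the "stack" is just an array of words, and the function cell is a single word. It shows that once the function cell points at the new definition, the old object survives only if some scanned stack word still holds its address.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Toy heap object; stands in for the old constants vector.  */
struct toy_object {
    bool marked;
    bool freed;
};

/* Conservative scan: any word equal to the address of a heap object
   is treated as a reference and marks that object.  */
void toy_mark_words(const uintptr_t *words, size_t nwords,
                    struct toy_object *heap, size_t nobjects)
{
    for (size_t i = 0; i < nwords; i++)
        for (size_t j = 0; j < nobjects; j++)
            if (words[i] == (uintptr_t) &heap[j])
                heap[j].marked = true;
}

/* Free every live object that was not marked, then clear all marks.
   Returns the number of objects freed in this pass.  */
size_t toy_sweep(struct toy_object *heap, size_t nobjects)
{
    size_t nfreed = 0;
    for (size_t j = 0; j < nobjects; j++) {
        if (!heap[j].marked && !heap[j].freed) {
            heap[j].freed = true;
            nfreed++;
        }
        heap[j].marked = false;
    }
    return nfreed;
}

/* One collection: mark from the function cell (a precise root) and
   from the words of one thread's stack, then sweep.  */
size_t toy_collect(uintptr_t function_cell,
                   const uintptr_t *stack, size_t nwords,
                   struct toy_object *heap, size_t nobjects)
{
    assert(heap != NULL && stack != NULL);
    toy_mark_words(&function_cell, 1, heap, nobjects);
    toy_mark_words(stack, nwords, heap, nobjects);
    return toy_sweep(heap, nobjects);
}
```

With the old object's address present in the stack array nothing is freed; with that word missing, for example because the value lives only in a register that was never written to the stack, the old object is swept while its code may still be running.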

Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#33014; Package emacs. (Sat, 13 Oct 2018 18:05:02 GMT)

Message #20 received at 33014 <at> debbugs.gnu.org:

From: Eli Zaretskii <eliz <at> gnu.org>
To: Gemini Lasswell <gazally <at> runbox.com>
Cc: 33014 <at> debbugs.gnu.org
Subject: Re: bug#33014: 26.1.50; 27.0.50;
 Fatal error after re-evaluating a thread's function
Date: Sat, 13 Oct 2018 21:04:18 +0300
> From: Gemini Lasswell <gazally <at> runbox.com>
> Cc: 33014 <at> debbugs.gnu.org
> Date: Sat, 13 Oct 2018 10:17:10 -0700
> 
> When eval-region evaluates the defun for erb--benchmark-monitor-func, it
> replaces the symbol's function definition, so it removes that reference
> to the byte-code.  AFAIK the only other reference to the byte-code
> is on the stack of Thread 7, which is running the byte-code.

So you are saying that the call to mark_stack inside mark_one_thread
doesn't do its job well enough?  AFAIU, it's supposed to scan the
stack of each and every thread, and mark Lisp objects referenced from
those stacks.

How do we know there's a reference to that vector on thread 7's stack?
Could it be that there is no reference at all?




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#33014; Package emacs. (Sun, 14 Oct 2018 19:30:02 GMT)

Message #23 received at 33014 <at> debbugs.gnu.org:

From: Gemini Lasswell <gazally <at> runbox.com>
To: Eli Zaretskii <eliz <at> gnu.org>
Cc: 33014 <at> debbugs.gnu.org
Subject: Re: bug#33014: 26.1.50; 27.0.50;
 Fatal error after re-evaluating a thread's function
Date: Sun, 14 Oct 2018 12:29:42 -0700
Eli Zaretskii <eliz <at> gnu.org> writes:

> How do we know there's a reference to that vector on thread 7's stack?
> Could it be that there is no reference at all?

Yes it could be that the reference is getting optimized out.  I asked
gdb for more detail on the stack frames for exec_byte_code and
funcall_lambda, and the arguments referring to the byte-code object and
its components do appear to be optimized out, see below.  I also tried
adding 'volatile' to the declaration of the local variable 'fun' in
Ffuncall, and that made the bug go away.

Is there anything else I should be looking at before concluding that
this is the problem?  And if it is, what is the best way to fix it?


lisp.h:2241: Emacs fatal error: assertion failed: HASH_TABLE_P (a)
[Switching to Thread 0x7f6eca4b9700 (LWP 17729)]

Thread 7 "builder 0" hit Breakpoint 1, terminate_due_to_signal (sig=sig@entry=6,
    backtrace_limit=backtrace_limit@entry=2147483647) at emacs.c:369
369	{
(gdb) bt
#0  terminate_due_to_signal (sig=sig@entry=6,
    backtrace_limit=backtrace_limit@entry=2147483647) at emacs.c:369
#1  0x00000000005a7159 in die (msg=msg@entry=0x67b132 "HASH_TABLE_P (a)",
    file=file@entry=0x678c85 "lisp.h", line=line@entry=2241) at alloc.c:7094
#2  0x0000000000614685 in XHASH_TABLE (a=...) at lisp.h:2241
#3  exec_byte_code (bytestr=..., vector=..., maxdepth=..., args_template=...,
    nargs=nargs@entry=0, args=<optimized out>,
    args@entry=0x16eac38 <bss_sbrk_buffer+9926040>) at bytecode.c:1403
#4  0x00000000005cdd32 in funcall_lambda (fun=XIL(0x7f6eca4b8470),
    nargs=nargs@entry=0, arg_vector=0x16eac38 <bss_sbrk_buffer+9926040>,
    arg_vector@entry=0x1574c58 <bss_sbrk_buffer+8394168>) at eval.c:3057
#5  0x00000000005ca54b in Ffuncall (nargs=nargs@entry=1,
    args=args@entry=0x1574c50 <bss_sbrk_buffer+8394160>) at eval.c:2870
#6  0x000000000064680b in invoke_thread_function () at thread.c:684
#7  0x00000000005c964f in internal_condition_case (
    bfun=bfun@entry=0x6467d0 <invoke_thread_function>, handlers=...,
    handlers@entry=XIL(0xc3c0), hfun=hfun@entry=0x6466f0 <record_thread_error>)
    at eval.c:1373
#8  0x00000000006471a1 in run_thread (state=0x1574c30 <bss_sbrk_buffer+8394128>)
    at thread.c:723
#9  0x00007f6eea5cb5a7 in start_thread ()
   from /nix/store/fg4yq8i8wd08xg3fy58l6q73cjy8hjr2-glibc-2.27/lib/libpthread.so.0
#10 0x00007f6ee9c6622f in clone ()
   from /nix/store/fg4yq8i8wd08xg3fy58l6q73cjy8hjr2-glibc-2.27/lib/libc.so.6
(gdb) set print frame-arguments all
(gdb) info frame 3
Stack frame at 0x7f6eca4b87d0:
 rip = 0x614685 in exec_byte_code (bytecode.c:1403); saved rip = 0x5cdd32
 called by frame at 0x7f6eca4b87d0, caller of frame at 0x7f6eca4b87d0
 source language c.
 Arglist at 0x7f6eca4b87c0, args: bytestr=<optimized out>, 
    vector=<optimized out>, maxdepth=<optimized out>, 
    args_template=<optimized out>, nargs=nargs@entry=0, args=<optimized out>,
    args@entry=0x16eac38 <bss_sbrk_buffer+9926040>
 Locals at 0x7f6eca4b87c0, Previous frame's sp is 0x7f6eca4b87c8
 Saved registers:
  rbx at 0x7f6eca4b8798, rbp at 0x7f6eca4b87c0, r12 at 0x7f6eca4b87a0,
  r13 at 0x7f6eca4b87a8, r14 at 0x7f6eca4b87b0, r15 at 0x7f6eca4b87b8
(gdb) info frame 4
Stack frame at 0x7f6eca4b87d0:
 rip = 0x5cdd32 in funcall_lambda (eval.c:3057); saved rip = 0x64680b
 tail call frame, caller of frame at 0x7f6eca4b87d0
 source language c.
 Arglist at unknown address.
 Locals at unknown address, Previous frame's sp is 0x7f6eca4b87d0




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#33014; Package emacs. (Sun, 14 Oct 2018 19:47:01 GMT)

Message #26 received at 33014 <at> debbugs.gnu.org:

From: Andreas Schwab <schwab <at> linux-m68k.org>
To: Eli Zaretskii <eliz <at> gnu.org>
Cc: Gemini Lasswell <gazally <at> runbox.com>, 33014 <at> debbugs.gnu.org
Subject: Re: bug#33014: 26.1.50; 27.0.50;
 Fatal error after re-evaluating a thread's function
Date: Sun, 14 Oct 2018 21:46:35 +0200
On Oct 13 2018, Eli Zaretskii <eliz <at> gnu.org> wrote:

> So you are saying that the call to mark_stack inside mark_one_thread
> doesn't do its job well enough?  AFAIU, it's supposed to scan the
> stack of each and every thread, and mark Lisp objects referenced from
> those stacks.
>
> How do we know there's a reference to that vector on thread 7's stack?
> Could it be that there is no reference at all?

Do we actually mark the registers of the threads as gc roots?

Andreas.

-- 
Andreas Schwab, schwab <at> linux-m68k.org
GPG Key fingerprint = 7578 EB47 D4E5 4D69 2510  2552 DF73 E780 A9DA AEC1
"And now for something completely different."




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#33014; Package emacs. (Mon, 15 Oct 2018 02:38:02 GMT)

Message #29 received at 33014 <at> debbugs.gnu.org:

From: Eli Zaretskii <eliz <at> gnu.org>
To: Gemini Lasswell <gazally <at> runbox.com>
Cc: 33014 <at> debbugs.gnu.org
Subject: Re: bug#33014: 26.1.50; 27.0.50;
 Fatal error after re-evaluating a thread's function
Date: Mon, 15 Oct 2018 05:37:26 +0300
> From: Gemini Lasswell <gazally <at> runbox.com>
> Cc: 33014 <at> debbugs.gnu.org
> Date: Sun, 14 Oct 2018 12:29:42 -0700
> 
> Eli Zaretskii <eliz <at> gnu.org> writes:
> 
> > How do we know there's a reference to that vector on thread 7's stack?
> > Could it be that there is no reference at all?
> 
> Yes it could be that the reference is getting optimized out.  I asked
> gdb for more detail on the stack frames for exec_byte_code and
> funcall_lambda, and the arguments referring to the byte-code object and
> its components do appear to be optimized out, see below.  I also tried
> adding 'volatile' to the declaration of the local variable 'fun' in
> Ffuncall, and that made the bug go away.

"Optimized out" is GDB's way of saying it's confused by the complex
way a variable's location changes as the program counter advances.  It
doesn't mean the variable is lost, just that GDB lost track of it.

> Is there anything else I should be looking at before concluding that
> this is the problem?  And if it is, what is the best way to fix it?

There's the question Andreas asked, we should look into that, I think.




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#33014; Package emacs. (Mon, 15 Oct 2018 15:00:02 GMT) Full text and rfc822 format available.

Message #32 received at 33014 <at> debbugs.gnu.org (full text, mbox):

From: Eli Zaretskii <eliz <at> gnu.org>
To: Andreas Schwab <schwab <at> linux-m68k.org>
Cc: gazally <at> runbox.com, 33014 <at> debbugs.gnu.org
Subject: Re: bug#33014: 26.1.50; 27.0.50;
 Fatal error after re-evaluating a thread's function
Date: Mon, 15 Oct 2018 17:59:26 +0300
> From: Andreas Schwab <schwab <at> linux-m68k.org>
> Cc: Gemini Lasswell <gazally <at> runbox.com>,  33014 <at> debbugs.gnu.org
> Date: Sun, 14 Oct 2018 21:46:35 +0200
> 
> Do we actually mark the registers of the threads as gc roots?

According to my reading of the code, we do.  Each time a running
thread is about to release the global lock, we call
flush_stack_call_func, which is supposed to flush relevant registers
to the stack of that thread.  And mark_one_thread marks the stack of
each thread, so it should be able to see the Lisp objects on that
stack.
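
The flush-then-scan idea described above can be sketched as a toy model.
This is only an illustration under stated assumptions: the names
toy_thread, toy_flush_registers and toy_stack_references are
hypothetical, the "registers" and "stack" are plain arrays, and none of
this is the actual code in alloc.c or thread.c.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

enum { TOY_NREGS = 4, TOY_STACK_WORDS = 16 };

/* Hypothetical model of one thread's machine state; not an Emacs type.  */
struct toy_thread
{
  uintptr_t regs[TOY_NREGS];         /* values that live only in registers */
  uintptr_t stack[TOY_STACK_WORDS];  /* words that a stack scan can see */
  size_t depth;                      /* number of stack words in use */
};

/* Spill every register onto the stack, as a thread would do just
   before giving up the global lock.  */
static void
toy_flush_registers (struct toy_thread *t)
{
  assert (t->depth + TOY_NREGS <= TOY_STACK_WORDS);
  for (size_t i = 0; i < TOY_NREGS; i++)
    t->stack[t->depth++] = t->regs[i];
}

/* Conservative scan: any stack word equal to OBJ counts as a reference.  */
static bool
toy_stack_references (const struct toy_thread *t, uintptr_t obj)
{
  for (size_t i = 0; i < t->depth; i++)
    if (t->stack[i] == obj)
      return true;
  return false;
}
```

In this model an object held only in a register is invisible to the
scan until the flush has run, and a value that was overwritten in its
register before the flush never reaches the stack at all.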

In this case, the function whose bytecode seems to be GC'ed is the
thread function itself.  That function is also marked, as part of
marking the thread object itself, although, of course, re-evaluating
the function will redefine the function.  But, if my reading of
exec_byte_code is correct, the bytecode should be on the stack and in
registers while we execute it, so even though the bytecode gets
disconnected from the function, it is still reachable from the stack,
and should have been marked...

Could this be some bug in the implementation of __builtin_unwind_init
etc., which causes it not to save some registers under some
conditions?  Gemini, what version of the compiler are you using?




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#33014; Package emacs. (Mon, 15 Oct 2018 16:24:03 GMT) Full text and rfc822 format available.

Message #35 received at 33014 <at> debbugs.gnu.org (full text, mbox):

From: Gemini Lasswell <gazally <at> runbox.com>
To: Eli Zaretskii <eliz <at> gnu.org>
Cc: 33014 <at> debbugs.gnu.org, Andreas Schwab <schwab <at> linux-m68k.org>
Subject: Re: bug#33014: 26.1.50; 27.0.50;
 Fatal error after re-evaluating a thread's function
Date: Mon, 15 Oct 2018 09:22:46 -0700
Eli Zaretskii <eliz <at> gnu.org> writes:

> Could this be some bug in the implementation of __builtin_unwind_init
> etc., which causes it not to save some registers under some
> conditions?  Gemini, what version of the compiler are you using?

7.3.0





Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#33014; Package emacs. (Mon, 15 Oct 2018 16:42:01 GMT) Full text and rfc822 format available.

Message #38 received at 33014 <at> debbugs.gnu.org (full text, mbox):

From: Eli Zaretskii <eliz <at> gnu.org>
To: Gemini Lasswell <gazally <at> runbox.com>
Cc: 33014 <at> debbugs.gnu.org, schwab <at> linux-m68k.org
Subject: Re: bug#33014: 26.1.50; 27.0.50;
 Fatal error after re-evaluating a thread's function
Date: Mon, 15 Oct 2018 19:41:43 +0300
> From: Gemini Lasswell <gazally <at> runbox.com>
> Cc: Andreas Schwab <schwab <at> linux-m68k.org>,  33014 <at> debbugs.gnu.org
> Date: Mon, 15 Oct 2018 09:22:46 -0700
> 
> Eli Zaretskii <eliz <at> gnu.org> writes:
> 
> > Could this be some bug in the implementation of __builtin_unwind_init
> > etc., which causes it not to save some registers under some
> > conditions?  Gemini, what version of the compiler are you using?
> 
> 7.3.0

Then it's unlikely, I think.




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#33014; Package emacs. (Tue, 16 Oct 2018 18:48:02 GMT) Full text and rfc822 format available.

Message #41 received at 33014 <at> debbugs.gnu.org (full text, mbox):

From: Gemini Lasswell <gazally <at> runbox.com>
To: Eli Zaretskii <eliz <at> gnu.org>
Cc: 33014 <at> debbugs.gnu.org, Andreas Schwab <schwab <at> linux-m68k.org>
Subject: Re: bug#33014: 26.1.50; 27.0.50;
 Fatal error after re-evaluating a thread's function
Date: Tue, 16 Oct 2018 11:46:36 -0700
Eli Zaretskii <eliz <at> gnu.org> writes:

> In this case, the function whose bytecode seems to be GC'ed is the
> thread function itself.  That function is also marked, as part of
> marking the thread object itself, although, of course, re-evaluating
> the function will redefine the function.  But, if my reading of
> exec_byte_code is correct, the bytecode should be on the stack and in
> registers while we execute it, so even though the bytecode gets
> disconnected from the function, it is still reachable from the stack,
> and should have been marked...

My knowledge of what gcc does and how the code it generates works is
superficial, but I don't see why an optimizer would find it necessary to
save the following values:

- The value of 'fun' in Ffuncall after it is used as an argument for
  funcall_lambda.

- The value of 'fun' in funcall_lambda after it is used to calculate
  the arguments to exec_byte_code.

- The value of 'vector' in exec_byte_code after the calculation of
  vectorp.

These three are the only variables that I see in Thread 7 from which the
garbage collector could find the constants vector which it's not
finding.  If gcc's optimizer puts them all in registers instead of on
the stack because it knows it won't need them later, those registers
will be overwritten with other values by recursive calls before
flush_stack_call_func is called.  Here's the backtrace of where Thread 7
is stopped while Thread 1 is running garbage collection, in which the
three frames I'm talking about above are 10, 11 and 12:

(gdb) thread apply 7 bt

Thread 7 (Thread 0x7fecdacdd700 (LWP 5509)):
#0  0x00007fecf771f592 in pthread_cond_wait@@GLIBC_2.3.2 () from /nix/store/fg4yq8i8wd08xg3fy58l6q73cjy8hjr2-glibc-2.27/lib/libpthread.so.0
#1  0x0000000000648389 in sys_cond_wait (cond=cond <at> entry=0x16e9c48 <bss_sbrk_buffer+9921960>, mutex=mutex <at> entry=0xd73440 <global_lock>) at systhread.c:163
#2  0x0000000000647747 in condition_wait_callback (arg=0x16e9c30 <bss_sbrk_buffer+9921936>) at thread.c:410
#3  0x00000000005a9608 in flush_stack_call_func (func=func <at> entry=0x647630 <condition_wait_callback>, arg=<optimized out>) at alloc.c:5021
#4  0x0000000000646e1d in Fcondition_wait (cond=<optimized out>) at thread.c:449
#5  0x00000000005cc49e in funcall_subr (subr=0xcdc1c0 <Scondition_wait>, numargs=numargs <at> entry=1, args=args <at> entry=0x7fecdacdc1c0) at eval.c:2931
#6  0x00000000005ca661 in Ffuncall (nargs=2, args=args <at> entry=0x7fecdacdc1b8) at eval.c:2856
#7  0x0000000000611e00 in exec_byte_code (bytestr=<optimized out>, vector=<optimized out>, maxdepth=<optimized out>, args_template=<optimized out>, nargs=nargs <at> entry=1, 
    args=<optimized out>, args <at> entry=0x11bde68 <bss_sbrk_buffer+4499400>) at bytecode.c:632
#8  0x00000000005cdd32 in funcall_lambda (fun=XIL(0x7fecdacdc1b8), nargs=nargs <at> entry=1, arg_vector=0x11bde68 <bss_sbrk_buffer+4499400>, arg_vector <at> entry=0x7fecdacdc470)
    at eval.c:3057
#9  0x00000000005ca54b in Ffuncall (nargs=2, args=args <at> entry=0x7fecdacdc468) at eval.c:2870
#10 0x0000000000611e00 in exec_byte_code (bytestr=<optimized out>, vector=<optimized out>, maxdepth=<optimized out>, args_template=<optimized out>, nargs=nargs <at> entry=0, 
    args=<optimized out>, args <at> entry=0x16eac38 <bss_sbrk_buffer+9926040>) at bytecode.c:632
#11 0x00000000005cdd32 in funcall_lambda (fun=XIL(0x7fecdacdc468), nargs=nargs <at> entry=0, arg_vector=0x16eac38 <bss_sbrk_buffer+9926040>, 
    arg_vector <at> entry=0x1578c58 <bss_sbrk_buffer+8410552>) at eval.c:3057
#12 0x00000000005ca54b in Ffuncall (nargs=nargs <at> entry=1, args=args <at> entry=0x1578c50 <bss_sbrk_buffer+8410544>) at eval.c:2870
#13 0x000000000064680b in invoke_thread_function () at thread.c:684
#14 0x00000000005c964f in internal_condition_case (bfun=bfun <at> entry=0x6467d0 <invoke_thread_function>, handlers=<optimized out>, handlers <at> entry=XIL(0xc3c0), 
    hfun=hfun <at> entry=0x6466f0 <record_thread_error>) at eval.c:1373
#15 0x00000000006471a1 in run_thread (state=0x1578c30 <bss_sbrk_buffer+8410512>) at thread.c:723
#16 0x00007fecf77195a7 in start_thread () from /nix/store/fg4yq8i8wd08xg3fy58l6q73cjy8hjr2-glibc-2.27/lib/libpthread.so.0
#17 0x00007fecf6db422f in clone () from /nix/store/fg4yq8i8wd08xg3fy58l6q73cjy8hjr2-glibc-2.27/lib/libc.so.6

gdb shows a value for fun in frame 11, but when I try to print
XIL(0x7fecdacdc468) it complains about it being an invalid lisp object,
and then the result of "info frame 11" shows some similar values,
so I'm thinking gdb is confused:

(gdb) thread apply 7 info frame 11

Thread 7 (Thread 0x7fecdacdd700 (LWP 5509)):
Stack frame at 0x7fecdacdc7d0:
 rip = 0x5cdd32 in funcall_lambda (eval.c:3057); saved rip = 0x64680b
 tail call frame, caller of frame at 0x7fecdacdc7d0
 source language c.
 Arglist at unknown address.
 Locals at unknown address, Previous frame's sp is 0x7fecdacdc7d0

I haven't figured out how to get gdb to print the Lisp backtrace of one
thread while execution is stopped in a different one.  But I expect
Thread 7's Lisp backtrace looks like this:

condition-wait
thread-queue-get
erb--benchmark-monitor-func





Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#33014; Package emacs. (Tue, 16 Oct 2018 19:26:01 GMT) Full text and rfc822 format available.

Message #44 received at 33014 <at> debbugs.gnu.org (full text, mbox):

From: Eli Zaretskii <eliz <at> gnu.org>
To: Gemini Lasswell <gazally <at> runbox.com>
Cc: 33014 <at> debbugs.gnu.org, schwab <at> linux-m68k.org
Subject: Re: bug#33014: 26.1.50; 27.0.50;
 Fatal error after re-evaluating a thread's function
Date: Tue, 16 Oct 2018 22:25:21 +0300
> From: Gemini Lasswell <gazally <at> runbox.com>
> Cc: Andreas Schwab <schwab <at> linux-m68k.org>,  33014 <at> debbugs.gnu.org
> Date: Tue, 16 Oct 2018 11:46:36 -0700
> 
> My knowledge of what gcc does and how the code it generates works is
> superficial, but I don't see why an optimizer would find it necessary to
> save the following values:
> 
> - The value of 'fun' in Ffuncall after it is used as an argument for
>   funcall_lambda.
> 
> - The value of 'fun' in funcall_lambda after it is used to calculate
>   the arguments to exec_byte_code.
> 
> - The value of 'vector' in exec_byte_code after the calculation of
>   vectorp.

There are calling frames as well.  For GC to pay attention to a Lisp
object, it is enough to have that object _somewhere_ on the stack.

Anyway, are you saying that stack marking doesn't work in optimized
code?  We've been using this technique for the last 17 years without
problems; why would the fact that we have more than one thread change
that?  The same arguments you submit are valid for a single-threaded
Emacs, right?

I think the chances of something like what you describe happening here
are small, and we shouldn't throw in the towel so quickly.  I don't
think we've exhausted all the other possibilities, not yet.

> gdb shows a value for fun in frame 11, but when I try to print
> XIL(0x7fecdacdc468) it complains about it being an invalid lisp object,
> and then the result of "info frame 11" shows some similar values,
> so I'm thinking gdb is confused:

It's quite possible that GDB is not confused, and you've found some
evidence of the problem.

How did you try to print XIL(0x7fecdacdc468)?  Maybe we should take a
good look at this object.

> I haven't figured out how to get gdb to print the Lisp backtrace of one
> thread while execution is stopped in a different one.

You can't, AFAIR.  The code that helps us produce a Lisp backtrace
doesn't work in that case.




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#33014; Package emacs. (Tue, 16 Oct 2018 19:39:02 GMT) Full text and rfc822 format available.

Message #47 received at 33014 <at> debbugs.gnu.org (full text, mbox):

From: Eli Zaretskii <eliz <at> gnu.org>
To: gazally <at> runbox.com
Cc: 33014 <at> debbugs.gnu.org, schwab <at> linux-m68k.org
Subject: Re: bug#33014: 26.1.50; 27.0.50;
 Fatal error after re-evaluating a thread's function
Date: Tue, 16 Oct 2018 22:38:05 +0300
> Date: Tue, 16 Oct 2018 22:25:21 +0300
> From: Eli Zaretskii <eliz <at> gnu.org>
> Cc: 33014 <at> debbugs.gnu.org, schwab <at> linux-m68k.org
> 
> > - The value of 'fun' in Ffuncall after it is used as an argument for
> >   funcall_lambda.
> > 
> > - The value of 'fun' in funcall_lambda after it is used to calculate
> >   the arguments to exec_byte_code.
> > 
> > - The value of 'vector' in exec_byte_code after the calculation of
> >   vectorp.
> 
> There are calling frames as well.  For GC to pay attention to a Lisp
> object, it is enough to have that object _somewhere_ on the stack.

And btw, 'fun' is not the object we should be tracing in this case.
We should be tracing the bytecode that is being run, either the entire
vector or some of its elements.  AFAIU, that is the bytecode of the
thread function, which I think is the one called here:

> #11 0x00000000005cdd32 in funcall_lambda (fun=XIL(0x7fecdacdc468), nargs=nargs <at> entry=0, arg_vector=0x16eac38 <bss_sbrk_buffer+9926040>, 
>     arg_vector <at> entry=0x1578c58 <bss_sbrk_buffer+8410552>) at eval.c:3057
> #12 0x00000000005ca54b in Ffuncall (nargs=nargs <at> entry=1, args=args <at> entry=0x1578c50 <bss_sbrk_buffer+8410544>) at eval.c:2870
> #13 0x000000000064680b in invoke_thread_function () at thread.c:684




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#33014; Package emacs. (Wed, 17 Oct 2018 16:22:02 GMT) Full text and rfc822 format available.

Message #50 received at 33014 <at> debbugs.gnu.org (full text, mbox):

From: Eli Zaretskii <eliz <at> gnu.org>
To: Gemini Lasswell <gazally <at> runbox.com>
Cc: 33014 <at> debbugs.gnu.org, schwab <at> linux-m68k.org
Subject: Re: bug#33014: 26.1.50; 27.0.50;
 Fatal error after re-evaluating a thread's function
Date: Wed, 17 Oct 2018 19:21:01 +0300
> From: Gemini Lasswell <gazally <at> runbox.com>
> Cc: Andreas Schwab <schwab <at> linux-m68k.org>,  33014 <at> debbugs.gnu.org
> Date: Tue, 16 Oct 2018 11:46:36 -0700
> 
> My knowledge of what gcc does and how the code it generates works is
> superficial, but I don't see why an optimizer would find it necessary to
> save the following values:
> 
> - The value of 'fun' in Ffuncall after it is used as an argument for
>   funcall_lambda.
> 
> - The value of 'fun' in funcall_lambda after it is used to calculate
>   the arguments to exec_byte_code.
> 
> - The value of 'vector' in exec_byte_code after the calculation of
>   vectorp.

After thinking about this a bit, I don't really agree with the last
one: the compiler could indeed stop tracking 'vector', but not
XVECTOR (vector)->contents, and we are interested in the latter.

One other thought is that, if worse comes to worst, we may consider
disallowing redefinition of a function that is currently being
executed (in another thread).

However, I'm still not convinced we are there.  Can we establish which
element(s) of the bytecode vector are GC'ed in this scenario?




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#33014; Package emacs. (Thu, 18 Oct 2018 01:08:02 GMT) Full text and rfc822 format available.

Message #53 received at 33014 <at> debbugs.gnu.org (full text, mbox):

From: Gemini Lasswell <gazally <at> runbox.com>
To: Eli Zaretskii <eliz <at> gnu.org>
Cc: 33014 <at> debbugs.gnu.org, schwab <at> linux-m68k.org
Subject: Re: bug#33014: 26.1.50; 27.0.50;
 Fatal error after re-evaluating a thread's function
Date: Wed, 17 Oct 2018 18:07:39 -0700
Eli Zaretskii <eliz <at> gnu.org> writes:

> After thinking about this a bit, I don't really agree with the last
> one: the compiler could indeed stop tracking 'vector', but not
> XVECTOR (vector)->contents, and we are interested in the latter.

If the compiler stops tracking 'vector', and the garbage collector frees
it, doesn't that cause XVECTOR (vector)->contents to be overwritten?  In
the debugging session in my second message in this thread I had a
hardware watchpoint on what vectorp was pointing at and it went off in
setup_on_free_list.
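
A toy free list shows why a stale pointer into a freed vector's
contents would see overwritten data.  This is a sketch under
assumptions, not the allocator in alloc.c: toy_vector, toy_pool,
toy_free and TOY_END_OF_LIST are hypothetical names, and storing the
free-list link in the first element is a choice made for this model.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

enum { TOY_SLOTS = 4, TOY_NVECTORS = 2 };

/* Hypothetical stand-in for a vector's storage; not an Emacs type.  */
struct toy_vector
{
  uintptr_t contents[TOY_SLOTS];
};

#define TOY_END_OF_LIST UINTPTR_MAX

static struct toy_vector toy_pool[TOY_NVECTORS];

/* Index in toy_pool of the first free vector, or TOY_END_OF_LIST.  */
static uintptr_t toy_free_head = TOY_END_OF_LIST;

/* Put V on the free list.  The link to the previous head is stored in
   V's own first element, so whatever was there is destroyed.  */
static void
toy_free (struct toy_vector *v)
{
  assert (toy_pool <= v && v < toy_pool + TOY_NVECTORS);
  v->contents[0] = toy_free_head;
  toy_free_head = (uintptr_t) (v - toy_pool);
}
```

A pointer to contents that was saved before toy_free still points at
valid memory in this model, but the first element it reads is now a
free-list link rather than the original value.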

> However, I'm still not convinced we are there.  Can we establish which
> element(s) of the bytecode vector are GC'ed in this scenario?

I'll see if I can figure that out.

Is there an easy way to print the function binding of a Lisp symbol from
gdb?




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#33014; Package emacs. (Thu, 18 Oct 2018 17:05:02 GMT) Full text and rfc822 format available.

Message #56 received at 33014 <at> debbugs.gnu.org (full text, mbox):

From: Eli Zaretskii <eliz <at> gnu.org>
To: Gemini Lasswell <gazally <at> runbox.com>
Cc: 33014 <at> debbugs.gnu.org, schwab <at> linux-m68k.org
Subject: Re: bug#33014: 26.1.50; 27.0.50;
 Fatal error after re-evaluating a thread's function
Date: Thu, 18 Oct 2018 20:04:03 +0300
> From: Gemini Lasswell <gazally <at> runbox.com>
> Cc: 33014 <at> debbugs.gnu.org,  schwab <at> linux-m68k.org
> Date: Wed, 17 Oct 2018 18:07:39 -0700
> 
> Eli Zaretskii <eliz <at> gnu.org> writes:
> 
> > After thinking about this a bit, I don't really agree with the last
> > one: the compiler could indeed stop tracking 'vector', but not
> > XVECTOR (vector)->contents, and we are interested in the latter.
> 
> If the compiler stops tracking 'vector', and the garbage collector frees
> it, doesn't that cause XVECTOR (vector)->contents to be overwritten?

Hmmm... could be.

> > However, I'm still not convinced we are there.  Can we establish which
> > element(s) of the bytecode vector are GC'ed in this scenario?
> 
> I'll see if I can figure that out.
> 
> Is there an easy way to print the function binding of a Lisp symbol from
> gdb?

Not sure what you mean by "the function binding" in this context.
I hope something like the following will do:

  (gdb) p fun
  (gdb) xpr

Let me know if this isn't what you meant.

Thanks.




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#33014; Package emacs. (Fri, 19 Oct 2018 00:23:01 GMT) Full text and rfc822 format available.

Message #59 received at 33014 <at> debbugs.gnu.org (full text, mbox):

From: Gemini Lasswell <gazally <at> runbox.com>
To: Eli Zaretskii <eliz <at> gnu.org>
Cc: 33014 <at> debbugs.gnu.org, schwab <at> linux-m68k.org
Subject: Re: bug#33014: 26.1.50; 27.0.50;
 Fatal error after re-evaluating a thread's function
Date: Thu, 18 Oct 2018 17:22:36 -0700
Eli Zaretskii <eliz <at> gnu.org> writes:

> Anyway, are you saying that stack marking doesn't work in optimized
> code?  We've been using this technique for the last 17 years without
> problems; why would the fact that we have more than one thread change
> that?  The same arguments you submit are valid for a single-threaded
> Emacs, right?

Apparently so.  I set up a single-threaded situation where I could
redefine a function while exec_byte_code was running it, and got a
segfault.  I've gained some insights from debugging this version of the
bug which I will put into a separate email.
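
The situation can be modelled with a toy mark-and-sweep collector.
This is a sketch under assumptions, not Emacs's garbage collector:
toy_code and toy_collect are hypothetical names, and the roots are
passed in explicitly so that the effect of a missing stack reference is
easy to see.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical heap object standing in for a byte-code function.  */
struct toy_code
{
  bool marked;
  bool freed;
};

/* Toy mark-and-sweep.  ROOTS lists every pointer the collector can
   see; any object in HEAP that no root points to is freed.  Returns
   the number of objects freed by this call.  */
static size_t
toy_collect (struct toy_code *heap, size_t nheap,
             struct toy_code *const *roots, size_t nroots)
{
  size_t nfreed = 0;

  for (size_t i = 0; i < nheap; i++)
    heap[i].marked = false;

  /* Mark phase: only objects named by a root survive.  */
  for (size_t i = 0; i < nroots; i++)
    if (roots[i])
      roots[i]->marked = true;

  /* Sweep phase.  */
  for (size_t i = 0; i < nheap; i++)
    if (!heap[i].marked && !heap[i].freed)
      {
        heap[i].freed = true;
        nfreed++;
      }
  return nfreed;
}
```

If a symbol's function cell is redefined while the old definition is
still running, the old object survives only when the root set also
contains the interpreter's own pointer to it.  With the function cell
as the only root, the old object is swept.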

Here are steps which consistently reproduce it for me:

Save the following code to the file 'repro.el', and then run
emacs -Q (I'm using master built with -O2 in CFLAGS):

;;;  -*- lexical-binding: t -*-
(defvar my-var "ok")
(defun my-loop-1 ()
  (let ((val 0))
    (while t
      (insert "Now in recursive edit\n")
      (recursive-edit)
      (insert (format "Leaving recursive edit: %s\n" my-var))
      (let ((things '(a b c d e)))
        (cond
         ((= val 0) (message "foo: %s" (last things)))
         ((= val 1) (message "bar: %s" things))
         ((= val 2) (message "baz: %s" (car things)))
         (t (message "bop: %s" (nth 2 things))))
        (setq val (mod (1+ val) 3))))))

(defun my-loop ()
  (interactive)
  (redraw-display)
  (my-loop-1))

(defun my-gc-1 ()
  (garbage-collect))

(defun my-gc ()
  (interactive)
  (my-gc-1))

(provide 'repro)

Then, from emacs -Q:

C-x C-f repro.el RET
C-u M-x byte-compile-file RET repro.el RET
C-x b RET
M-x my-loop RET
C-x b RET
M-x eval-buffer RET
C-x b RET
M-x my-gc RET
C-M-c

Result:

Thread 1 "emacs" received signal SIGSEGV, Segmentation fault.
0x00000000005bca1b in styled_format (nargs=2, args=0x7ffffffeffc8, 
    message=<optimized out>) at editfns.c:3129
3129	      unsigned char format_char = *format++;
(gdb) bt
#0  0x00000000005bca1b in styled_format (nargs=2, args=0x7ffffffeffc8, message=<optimized out>) at editfns.c:3129
#1  0x00000000005ca771 in Ffuncall (nargs=3, args=args <at> entry=0x7ffffffeffc0) at eval.c:2859
#2  0x0000000000611f00 in exec_byte_code (bytestr=..., vector=..., maxdepth=..., args_template=..., nargs=nargs <at> entry=0, args=<optimized out>, args <at> entry=0x31bda38)
    at bytecode.c:632
#3  0x00000000005cde82 in funcall_lambda (fun=XIL(0x7ffffffeffc0), nargs=nargs <at> entry=0, arg_vector=0x31bda38, arg_vector <at> entry=0x7fffffff0240) at eval.c:3060
#4  0x00000000005ca65b in Ffuncall (nargs=1, args=args <at> entry=0x7fffffff0238) at eval.c:2873
#5  0x0000000000611f00 in exec_byte_code (bytestr=..., vector=..., maxdepth=..., args_template=..., nargs=nargs <at> entry=0, args=<optimized out>, args <at> entry=0x31bdaf8)
    at bytecode.c:632
#6  0x00000000005cde82 in funcall_lambda (fun=XIL(0x7fffffff0238), nargs=nargs <at> entry=0, arg_vector=0x31bdaf8, arg_vector <at> entry=0x7fffffff0640) at eval.c:3060
#7  0x00000000005ca65b in Ffuncall (nargs=nargs <at> entry=1, args=args <at> entry=0x7fffffff0638) at eval.c:2873
#8  0x00000000005c6653 in Ffuncall_interactively (nargs=1, args=0x7fffffff0638) at callint.c:253
#9  0x00000000005ca771 in Ffuncall (nargs=nargs <at> entry=2, args=args <at> entry=0x7fffffff0630) at eval.c:2859
#10 0x00000000005cab2c in Fapply (nargs=nargs <at> entry=3, args=args <at> entry=0x7fffffff0630) at eval.c:2432
#11 0x00000000005c6de1 in Fcall_interactively (function=..., record_flag=..., keys=...) at callint.c:340
#12 0x00000000005cc5d7 in funcall_subr (subr=0xcd63c0 <Scall_interactively>, numargs=numargs <at> entry=3, args=args <at> entry=0x7fffffff07c0) at eval.c:2939
#13 0x00000000005ca771 in Ffuncall (nargs=4, args=args <at> entry=0x7fffffff07b8) at eval.c:2859
#14 0x0000000000611f00 in exec_byte_code (bytestr=..., vector=..., maxdepth=..., args_template=..., nargs=nargs <at> entry=2, args=<optimized out>, 
    args <at> entry=0x9c3cc8 <pure+808136>) at bytecode.c:632
#15 0x00000000005cde82 in funcall_lambda (fun=XIL(0x7fffffff07b8), nargs=nargs <at> entry=2, arg_vector=0x9c3cc8 <pure+808136>, arg_vector <at> entry=0x7fffffff0aa8) at eval.c:3060
#16 0x00000000005ca65b in Ffuncall (nargs=3, args=args <at> entry=0x7fffffff0aa0) at eval.c:2873
#17 0x0000000000611f00 in exec_byte_code (bytestr=..., vector=..., maxdepth=..., args_template=..., nargs=nargs <at> entry=3, args=<optimized out>, 
    args <at> entry=0x9c3978 <pure+807288>) at bytecode.c:632
#18 0x00000000005cde82 in funcall_lambda (fun=XIL(0x7fffffff0aa0), nargs=nargs <at> entry=3, arg_vector=0x9c3978 <pure+807288>, arg_vector <at> entry=0x7fffffff0e90) at eval.c:3060
#19 0x00000000005ca65b in Ffuncall (nargs=nargs <at> entry=4, args=args <at> entry=0x7fffffff0e88) at eval.c:2873
#20 0x00000000005c6653 in Ffuncall_interactively (nargs=4, args=0x7fffffff0e88) at callint.c:253
#21 0x00000000005ca771 in Ffuncall (nargs=nargs <at> entry=5, args=0x7fffffff0e80) at eval.c:2859
#22 0x00000000005caa3a in Fapply (nargs=nargs <at> entry=3, args=args <at> entry=0x7fffffff1030) at eval.c:2479
#23 0x00000000005c6de1 in Fcall_interactively (function=..., record_flag=..., keys=...) at callint.c:340
#24 0x00000000005cc5d7 in funcall_subr (subr=0xcd63c0 <Scall_interactively>, numargs=numargs <at> entry=3, args=args <at> entry=0x7fffffff11c0) at eval.c:2939
#25 0x00000000005ca771 in Ffuncall (nargs=4, args=args <at> entry=0x7fffffff11b8) at eval.c:2859
#26 0x0000000000611f00 in exec_byte_code (bytestr=..., vector=..., maxdepth=..., args_template=..., nargs=nargs <at> entry=1, args=<optimized out>, 
    args <at> entry=0x9c3cc8 <pure+808136>) at bytecode.c:632
#27 0x00000000005cde82 in funcall_lambda (fun=XIL(0x7fffffff11b8), nargs=nargs <at> entry=1, arg_vector=0x9c3cc8 <pure+808136>, arg_vector <at> entry=0x7fffffff1478) at eval.c:3060
#28 0x00000000005ca65b in Ffuncall (nargs=nargs <at> entry=2, args=args <at> entry=0x7fffffff1470) at eval.c:2873
#29 0x00000000005ca83a in call1 (fn=..., fn <at> entry=XIL(0x3ff0), arg1=...) at eval.c:2710
#30 0x000000000054f597 in command_loop_1 () at keyboard.c:1451
#31 0x00000000005c975f in internal_condition_case (bfun=bfun <at> entry=0x54f080 <command_loop_1>, handlers=..., handlers <at> entry=XIL(0x53a0), 
    hfun=hfun <at> entry=0x541d60 <cmd_error>) at eval.c:1373
#32 0x000000000053db88 in command_loop_2 (ignore=..., ignore <at> entry=XIL(0)) at keyboard.c:1079
#33 0x00000000005c9683 in internal_catch (tag=..., func=func <at> entry=0x53db60 <command_loop_2>, arg=..., arg <at> entry=XIL(0)) at eval.c:1136
#34 0x000000000053ddeb in command_loop () at keyboard.c:1058
#35 0x0000000000541864 in recursive_edit_1 () at keyboard.c:703
#36 0x0000000000541c23 in Frecursive_edit () at keyboard.c:774
#37 0x000000000041e727 in main (argc=<optimized out>, argv=<optimized out>) at emacs.c:1731

Lisp Backtrace:
"format" (0xfffeffc8)
"my-loop-1" (0xffff0240)
"my-loop" (0xffff0640)
"funcall-interactively" (0xffff0638)
"call-interactively" (0xffff07c0)
"command-execute" (0xffff0aa8)
"execute-extended-command" (0xffff0e90)
"funcall-interactively" (0xffff0e88)
"call-interactively" (0xffff11c0)
"command-execute" (0xffff1478)




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#33014; Package emacs. (Fri, 19 Oct 2018 00:41:02 GMT) Full text and rfc822 format available.

Message #62 received at 33014 <at> debbugs.gnu.org (full text, mbox):

From: Gemini Lasswell <gazally <at> runbox.com>
To: Eli Zaretskii <eliz <at> gnu.org>
Cc: 33014 <at> debbugs.gnu.org, schwab <at> linux-m68k.org
Subject: Re: bug#33014: 26.1.50; 27.0.50;
 Fatal error after re-evaluating a thread's function
Date: Thu, 18 Oct 2018 17:39:54 -0700
Eli Zaretskii <eliz <at> gnu.org> writes:

>   (gdb) p fun
>   (gdb) xpr
>
> Let me know if this isn't what you meant.

I meant something like 'pv', as in:

(gdb) pv emacs-version
"27.0.50"

but which I could use to find out what the bytecode object for
erb--benchmark-monitor-func is.
      




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#33014; Package emacs. (Fri, 19 Oct 2018 08:39:01 GMT) Full text and rfc822 format available.

Message #65 received at 33014 <at> debbugs.gnu.org (full text, mbox):

From: Eli Zaretskii <eliz <at> gnu.org>
To: Gemini Lasswell <gazally <at> runbox.com>
Cc: 33014 <at> debbugs.gnu.org, schwab <at> linux-m68k.org
Subject: Re: bug#33014: 26.1.50; 27.0.50;
 Fatal error after re-evaluating a thread's function
Date: Fri, 19 Oct 2018 11:38:11 +0300
> From: Gemini Lasswell <gazally <at> runbox.com>
> Cc: 33014 <at> debbugs.gnu.org,  schwab <at> linux-m68k.org
> Date: Thu, 18 Oct 2018 17:39:54 -0700
> 
> Eli Zaretskii <eliz <at> gnu.org> writes:
> 
> >   (gdb) p fun
> >   (gdb) xpr
> >
> > Let me know if this isn't what you meant.
> 
> I meant something like 'pv', as in:
> 
> (gdb) pv emacs-version
> "27.0.50"
> 
> but which I could use to find out what the bytecode object for
> erb--benchmark-monitor-func is.

But a function doesn't have to be byte-compiled, in which case there's
no bytecode.  Look at the implementation of funcall, and you will see
how Emacs deals with this.  It seemed to me that xpr reflects that, in
that it shows you what object to look at for a given function symbol.

If you are sure the function is already compiled, then funcall_lambda
will show you how it invokes exec_byte_code for such a function, and
you will see there how to access the bytecode of such a function.

HTH

P.S. Patches to .gdbinit to provide such functionality in a new
command, called, say, "xfunc", will be most welcome, of course.




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#33014; Package emacs. (Fri, 19 Oct 2018 08:46:02 GMT) Full text and rfc822 format available.

Message #68 received at 33014 <at> debbugs.gnu.org (full text, mbox):

From: Eli Zaretskii <eliz <at> gnu.org>
To: Gemini Lasswell <gazally <at> runbox.com>
Cc: 33014 <at> debbugs.gnu.org
Subject: Re: bug#33014: 26.1.50; 27.0.50;
 Fatal error after re-evaluating a thread's function
Date: Fri, 19 Oct 2018 11:44:35 +0300
> From: Gemini Lasswell <gazally <at> runbox.com>
> Cc: 33014 <at> debbugs.gnu.org,  schwab <at> linux-m68k.org
> Date: Thu, 18 Oct 2018 17:22:36 -0700
> 
> Eli Zaretskii <eliz <at> gnu.org> writes:
> 
> > Anyway, are you saying that stack marking doesn't work in optimized
> > code?  We've been using this technique for the last 17 years without
> > problems; why would the fact that we have more than one thread change
> > that?  The same arguments you submit are valid for a single-threaded
> > Emacs, right?
> 
> Apparently so.  I set up a single-threaded situation where I could
> redefine a function while exec_byte_code was running it, and got a
> segfault.  I've gained some insights from debugging this version of the
> bug which I will put into a separate email.

If this is the case, then I think we should protect the definition of
a running function from GC, in some way, either by making sure it is
referenced by some stack-based Lisp object, even in heavily optimized
code (e.g., by using 'volatile' qualifiers); or by some other method
that will ensure that definition is marked and not swept.
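
The 'volatile' option could look roughly like the sketch below.  It is
an illustration only, under assumptions: toy_object, toy_body and
toy_funcall are hypothetical names, not the real Lisp_Object or
Ffuncall, and whether this is enough in the real code is not settled
here.  The C rule relied on is that accesses to a volatile object must
actually be performed, so the initializing store cannot be dropped; with
GCC that in practice means the value is written to a slot in the
function's stack frame.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for a tagged Lisp object word.  */
typedef intptr_t toy_object;

/* Pretend to run the code object; a real interpreter could trigger a
   garbage collection somewhere inside this call.  */
static toy_object
toy_body (toy_object code)
{
  return code + 1;
}

static toy_object
toy_funcall (toy_object function)
{
  /* Because 'fun' is volatile, the store below must really happen, so
     a copy of the reference is written to memory instead of living
     only in a register that the callee may reuse.  */
  volatile toy_object fun = function;

  return toy_body (fun);
}
```

The qualifier does not change what the function computes; it only
constrains how the compiler may treat the variable.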




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#33014; Package emacs. (Fri, 19 Oct 2018 19:33:02 GMT) Full text and rfc822 format available.

Message #71 received at 33014 <at> debbugs.gnu.org (full text, mbox):

From: Gemini Lasswell <gazally <at> runbox.com>
To: Eli Zaretskii <eliz <at> gnu.org>
Cc: 33014 <at> debbugs.gnu.org, schwab <at> linux-m68k.org
Subject: Re: bug#33014: 26.1.50; 27.0.50;
 Fatal error after re-evaluating a thread's function
Date: Fri, 19 Oct 2018 12:32:32 -0700
[Message part 1 (text/plain, inline)]
Gemini Lasswell <gazally <at> runbox.com> writes:

> I set up a single-threaded situation where I could redefine a function
> while exec_byte_code was running it, and got a segfault.  I've gained
> some insights from debugging this version of the bug which I will put
> into a separate email.

Here's a gdb transcript going through the single-threaded version of
this bug.  In this transcript I use the file 'repro.el', which is
attached to the end of this message and is the same as the one in my
last message.

Start gdb with a breakpoint at Fredraw_display:

$ gdb --args ./emacs -Q
...
(gdb) b Fredraw_display
(gdb) r

In Emacs, find the file repro.el and load it with byte-compile-file,
then go back to *scratch* and run my-loop:

  C-x C-f repro.el RET
  C-u M-x byte-compile-file RET repro.el RET
  C-x b RET
  M-x my-loop RET

This gets me to the gdb prompt, at a point in execution where the next
function called will be my-loop-1, so I set a breakpoint in
funcall_lambda, where I can see the bytecode object for my-loop-1 (I
edited out the bytestring):

  Thread 1 "emacs" hit Breakpoint 3, Fredraw_display () at dispnew.c:3027
  3027	{
  (gdb) br funcall_lambda
  Breakpoint 4 at 0x5cdb00: file eval.c, line 3016.
  (gdb) c
  Continuing.

  Thread 1 "emacs" hit Breakpoint 4, funcall_lambda (fun=XIL(0x31c0235),
      nargs=nargs <at> entry=0, arg_vector=arg_vector <at> entry=0x7fffffff01c0)
      at eval.c:3016
  3016	{
  (gdb) clear
  Deleted breakpoint 4
  (gdb) p fun
  $1 = XIL(0x1630fc5)
  (gdb) pr
  #[0 "..." [my-var 0 "Now in recursive edit
  " recursive-edit format "Leaving recursive edit: %s
  " (a b c d e) message "foo: %s" last 1 "bar: %s" 2 "baz: %s" "bop: %s" mod 3] 6]

Then I skip ahead into exec_byte_code:

  (gdb) br exec_byte_code
  Breakpoint 5 at 0x611bb0: file bytecode.c, line 342.
  (gdb) c
  Continuing.

  Thread 1 "emacs" hit Breakpoint 5, exec_byte_code (bytestr=XIL(0x3571d24),
      vector=XIL(0x31c0195), maxdepth=make_number(4),
      args_template=args_template@entry=XIL(0), nargs=nargs@entry=0,
      args=args@entry=0x0) at bytecode.c:342
  342	{

Here's what's in the register $rbp, and the constants vector:

  (gdb) clear
  Deleted breakpoint 5
  (gdb) p $rbp
  $2 = (void *) 0xb0201
  (gdb) pr
  #<INVALID_LISP_OBJECT 0x000b0201>
  (gdb) p vector
  $3 = XIL(0x1630f35)
  (gdb) pr
  [my-var 0 "Now in recursive edit
  " recursive-edit format "Leaving recursive edit: %s
  " (a b c d e) message "foo: %s" last 1 "bar: %s" 2 "baz: %s" "bop: %s" mod 3]

Skip ahead, to get to where exec_byte_code has a value for vectorp:

  (gdb) n 12
  366	  USE_SAFE_ALLOCA;
  (gdb) p vectorp
  $4 = (Lisp_Object *) 0x1630f38 <bss_sbrk_buffer+9164248>
  (gdb) p *vectorp
  $5 = XIL(0x2327d80)
  (gdb) pr
  my-var
  (gdb) break mark_vectorlike if ptr->contents == $4
  Breakpoint 6 at 0x5ad400: file alloc.c, line 6036.
  (gdb) c
  Continuing.

The idea is to break when garbage collection finds the constants vector.
(I first tried setting a conditional breakpoint in mark_object, which
made garbage collection either hang or take more time than I had
patience for.)

In Emacs type C-x b RET.  This causes a gc and a breakpoint hit:

  Thread 1 "emacs" hit Breakpoint 6, mark_vectorlike (ptr=0x31c0190) at alloc.c:6036
  6036	  eassert (!VECTOR_MARKED_P (ptr));

  (gdb) bt 20
  #0  mark_vectorlike (ptr=0x1630f30 <bss_sbrk_buffer+9164240>) at alloc.c:6036
  #1  0x00000000005aca9c in mark_object (arg=...) at alloc.c:6430
  #2  0x00000000005ad45e in mark_vectorlike (
      ptr=0x1611fd0 <bss_sbrk_buffer+9037424>) at alloc.c:6046
  #3  0x00000000005aca9c in mark_object (arg=...) at alloc.c:6430
  #4  0x00000000005acdf4 in mark_object (arg=...) at alloc.c:6477
  #5  0x00000000005acae4 in mark_object (arg=...) at alloc.c:6434
  #6  0x00000000005ad45e in mark_vectorlike (
      ptr=0x15a8e00 <bss_sbrk_buffer+8606880>) at alloc.c:6046
  #7  0x00000000005ad45e in mark_vectorlike (
      ptr=0x15a9c30 <bss_sbrk_buffer+8610512>) at alloc.c:6046
  #8  0x00000000005aca9c in mark_object (arg=...) at alloc.c:6430
  #9  0x00000000005ad45e in mark_vectorlike (
      ptr=0x15a7c30 <bss_sbrk_buffer+8602320>) at alloc.c:6046
  #10 0x00000000005aca9c in mark_object (arg=...) at alloc.c:6430
  #11 0x00000000005ad45e in mark_vectorlike (
      ptr=0x15a6e80 <bss_sbrk_buffer+8598816>) at alloc.c:6046
  #12 0x00000000005aca9c in mark_object (arg=...) at alloc.c:6430
  #13 0x00000000005acdf4 in mark_object (arg=...) at alloc.c:6477
  #14 0x00000000005acaa5 in mark_object (arg=...) at alloc.c:6431
  #15 0x00000000005ad45e in mark_vectorlike (
      ptr=0x15fbed0 <bss_sbrk_buffer+8947056>) at alloc.c:6046
  #16 0x00000000005aca9c in mark_object (arg=...) at alloc.c:6430
  #17 0x00000000005ad45e in mark_vectorlike (
      ptr=0x15fbf50 <bss_sbrk_buffer+8947184>) at alloc.c:6046
  #18 0x00000000005aca9c in mark_object (arg=...) at alloc.c:6430
  #19 0x00000000005ad45e in mark_vectorlike (
      ptr=0x15fcc80 <bss_sbrk_buffer+8950560>) at alloc.c:6046
  (More stack frames follow...)

  Lisp Backtrace:
  "Automatic GC" (0x0)
  "eldoc-pre-command-refresh-echo-area" (0xfffefbb0)
  "recursive-edit" (0xfffeffd8)
  "my-loop-1" (0xffff0250)
  "my-loop" (0xffff0650)
  "funcall-interactively" (0xffff0648)
  "call-interactively" (0xffff07d0)
  "command-execute" (0xffff0ab8)
  "execute-extended-command" (0xffff0ea0)
  "funcall-interactively" (0xffff0e98)
  "call-interactively" (0xffff11d0)
  "command-execute" (0xffff1488)

There are 279 frames in the backtrace, and mark_stack and mark_memory
aren't there.  So I'm guessing the constants vector is getting found via
the function definition of 'my-loop-1'.  Keep going:

  (gdb) c
  Continuing.

Now in Emacs do this:
  M-x eval-buffer RET
  C-x b RET
  M-x my-gc RET

Execution does not stop at the breakpoint.  In Emacs type C-M-c.
Result:

  Thread 1 "emacs" received signal SIGSEGV, Segmentation fault.
  0x00000000005bca1b in styled_format (nargs=2, args=0x7ffffffeffd8,
      message=<optimized out>) at editfns.c:3129
  3129	      unsigned char format_char = *format++;

What's happened to the constants vector and its contents?

  (gdb) p $3
  $6 = XIL(0x1630f35)
  (gdb) pr
  #<INVALID_LISP_OBJECT 0x01630f35>
  (gdb) p *$4
  $7 = XIL(0x2327d80)
  (gdb) pr
  my-var
  (gdb) p *($4+5)
  $8 = XIL(0x359a6f4)
  (gdb) pr
  #<INVALID_LISP_OBJECT 0x0359a6f4>
  (gdb) p *($4+4)
  $9 = XIL(0x6390)
  (gdb) pr
  format

Looks like the constants vector was freed.  Its contents haven't been
overwritten (yet), but the format string has been freed, leading to the
crash in styled_format.

While I was developing this method of reproducing this bug, I went
through this exercise without lexical-binding set in repro.el.  In that
version, the register $rbp when exec_byte_code is called contains the
bytecode Lisp_Object (instead of the non-Lisp-object value it contains
in the transcript above), and the first thing exec_byte_code does is
save it on the stack (presumably because the System V AMD64 ABI calling
convention says that called functions which use $rbp should save and
restore it).

Here's the beginning of the disassembly of exec_byte_code from
  "objdump -S bytecode.o":

  0000000000000020 <exec_byte_code>:
     executing BYTESTR.  */

  Lisp_Object
  exec_byte_code (Lisp_Object bytestr, Lisp_Object vector, Lisp_Object maxdepth,
  		Lisp_Object args_template, ptrdiff_t nargs, Lisp_Object *args)
  {
        20:	55                   	push   %rbp
        21:	48 89 e5             	mov    %rsp,%rbp
        24:	41 57                	push   %r15
        26:	41 56                	push   %r14
        28:	41 55                	push   %r13
        2a:	41 54                	push   %r12
        2c:	49 89 ce             	mov    %rcx,%r14
        2f:	53                   	push   %rbx

So in the non-lexical-binding case the bytecode Lisp_Object is written
to the stack by the first instruction in exec_byte_code, and then during
the execution of 'my-gc' the breakpoint in mark_vectorlike stops at a
point with a much shorter backtrace which includes mark_stack and
mark_memory, and mark_memory's pp is pointing to the location on the
stack where $rbp was written. The bytecode object and constants vector
are consequently not freed, and no segfault happens.

I don't follow everything going on in the disassembly of funcall_lambda,
but I did figure out (by comparison with a debug session in the
multithreaded situation) that the different values in $rbp when
funcall_lambda calls exec_byte_code depend on the different code paths
following the test of whether the first element of the bytecode object
vector (the "args template" as funcall_lambda's comment calls it) is an
integer, which in turn depends on whether my-loop-1 was compiled with
lexical-binding on.

Here is 'repro.el':

[repro.el (text/plain, inline)]
;;;  -*- lexical-binding: t -*-
(defvar my-var "ok")
(defun my-loop-1 ()
  (let ((val 0))
    (while t
      (insert "Now in recursive edit\n")
      (recursive-edit)
      (insert (format "Leaving recursive edit: %s\n" my-var))
      (let ((things '(a b c d e)))
	(cond ;
         ((= val 0) (message "foo: %s" (last things)))
         ((= val 1) (message "bar: %s" things))
         ((= val 2) (message "baz: %s" (car things)))
         (t (message "bop: %s" (nth 2 things))))
	(setq val (mod (1+ val) 3))))))

(defun my-loop ()
  (interactive)
  (redraw-display)
  (my-loop-1))

(defun my-gc-1 ()
  (garbage-collect))

(defun my-gc ()
  (interactive)
  (my-gc-1))

(provide 'repro)

Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#33014; Package emacs. (Fri, 19 Oct 2018 20:06:03 GMT) Full text and rfc822 format available.

Message #74 received at 33014 <at> debbugs.gnu.org (full text, mbox):

From: Gemini Lasswell <gazally <at> runbox.com>
To: Eli Zaretskii <eliz <at> gnu.org>
Cc: 33014 <at> debbugs.gnu.org
Subject: Re: bug#33014: 26.1.50; 27.0.50;
 Fatal error after re-evaluating a thread's function
Date: Fri, 19 Oct 2018 13:05:19 -0700
[Message part 1 (text/plain, inline)]
Eli Zaretskii <eliz <at> gnu.org> writes:

>> > Anyway, are you saying that stack marking doesn't work in optimized
>> > code?  We've been using this technique for the last 17 years without
>> > problems; why would the fact that we have more than one thread change
>> > that?  The same arguments you submit are valid for a single-threaded
>> > Emacs, right?
>> 
>> Apparently so.  I set up a single-threaded situation where I could
>> redefine a function while exec_byte_code was running it, and got a
>> segfault.  I've gained some insights from debugging this version of the
>> bug which I will put into a separate email.
>
> If this is the case, then I think we should protect the definition of
> a running function from GC, in some way, either by making sure it is
> referenced by some stack-based Lisp object, even in heavily optimized
> code (e.g., by using 'volatile' qualifiers); or by some other method
> that will ensure that definition is marked and not swept.

Maybe code optimizers have improved over the last 17 years?

I have patched Emacs with a 'volatile' on the definition of 'fun' in
Ffuncall, and so far haven't managed to reproduce the bug with it:

[0001-src-eval.c-Ffuncall-Make-local-variable-fun-volatile.patch (text/plain, inline)]
From a1fc2dfd392e0ba8754159d855da231a56ca275b Mon Sep 17 00:00:00 2001
From: Gemini Lasswell <gazally <at> runbox.com>
Date: Sun, 14 Oct 2018 12:12:04 -0700
Subject: [PATCH] * src/eval.c (Ffuncall): Make local variable 'fun' volatile
 (bug#33014)

---
 src/eval.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/src/eval.c b/src/eval.c
index 5e25caaa84..75b30f9c7d 100644
--- a/src/eval.c
+++ b/src/eval.c
@@ -2817,8 +2817,8 @@ Thus, (funcall \\='cons \\='x \\='y) returns (x . y).
 usage: (funcall FUNCTION &rest ARGUMENTS)  */)
   (ptrdiff_t nargs, Lisp_Object *args)
 {
-  Lisp_Object fun, original_fun;
-  Lisp_Object funcar;
+  Lisp_Object volatile fun;
+  Lisp_Object original_fun, funcar;
   ptrdiff_t numargs = nargs - 1;
   Lisp_Object val;
   ptrdiff_t count;
-- 
2.16.4

[Message part 3 (text/plain, inline)]
I'll go back now to working on my benchmarking project which I hope
someday will make it easy to see if that 'volatile' causes measurable
harm to performance.  I'll also keep using 'eval-region' and 'eval-buffer'
while I have threads running byte-compiled functions which get redefined
by doing that, and report back here if I encounter this bug again.

Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#33014; Package emacs. (Sat, 20 Oct 2018 06:42:02 GMT) Full text and rfc822 format available.

Message #77 received at 33014 <at> debbugs.gnu.org (full text, mbox):

From: Eli Zaretskii <eliz <at> gnu.org>
To: Gemini Lasswell <gazally <at> runbox.com>
Cc: 33014 <at> debbugs.gnu.org
Subject: Re: bug#33014: 26.1.50; 27.0.50;
 Fatal error after re-evaluating a thread's function
Date: Sat, 20 Oct 2018 09:41:07 +0300
> From: Gemini Lasswell <gazally <at> runbox.com>
> Cc: 33014 <at> debbugs.gnu.org
> Date: Fri, 19 Oct 2018 13:05:19 -0700
> 
> > If this is the case, then I think we should protect the definition of
> > a running function from GC, in some way, either by making sure it is
> > referenced by some stack-based Lisp object, even in heavily optimized
> > code (e.g., by using 'volatile' qualifiers); or by some other method
> > that will ensure that definition is marked and not swept.
> 
> Maybe code optimizers have improved over the last 17 years?

I think a much more significant factor is that modern processors have
many more registers to use.

> I have patched Emacs with a 'volatile' on the definition of 'fun' in
> Ffuncall, and so far haven't managed to reproduce the bug with it:

Thanks.  This needs a comment for why we do something strange like
that, but otherwise, if no one has better ideas in a week's time,
let's install this.




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#33014; Package emacs. (Sat, 20 Oct 2018 08:24:02 GMT) Full text and rfc822 format available.

Message #80 received at 33014 <at> debbugs.gnu.org (full text, mbox):

From: Andreas Schwab <schwab <at> linux-m68k.org>
To: Eli Zaretskii <eliz <at> gnu.org>
Cc: Gemini Lasswell <gazally <at> runbox.com>, 33014 <at> debbugs.gnu.org
Subject: Re: bug#33014: 26.1.50; 27.0.50;
 Fatal error after re-evaluating a thread's function
Date: Sat, 20 Oct 2018 10:23:37 +0200
On Okt 20 2018, Eli Zaretskii <eliz <at> gnu.org> wrote:

>> Maybe code optimizers have improved over the last 17 years?
>
> I think a much more significant factor is that modern processors have
> many more registers to use.

I think an important factor is that they pass arguments in registers, so
it is more likely that the original value of an argument is lost.

Andreas.

-- 
Andreas Schwab, schwab <at> linux-m68k.org
GPG Key fingerprint = 7578 EB47 D4E5 4D69 2510  2552 DF73 E780 A9DA AEC1
"And now for something completely different."




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#33014; Package emacs. (Sat, 20 Oct 2018 10:21:01 GMT) Full text and rfc822 format available.

Message #83 received at 33014 <at> debbugs.gnu.org (full text, mbox):

From: Eli Zaretskii <eliz <at> gnu.org>
To: Andreas Schwab <schwab <at> linux-m68k.org>
Cc: gazally <at> runbox.com, 33014 <at> debbugs.gnu.org
Subject: Re: bug#33014: 26.1.50; 27.0.50;
 Fatal error after re-evaluating a thread's function
Date: Sat, 20 Oct 2018 13:20:19 +0300
> From: Andreas Schwab <schwab <at> linux-m68k.org>
> Cc: Gemini Lasswell <gazally <at> runbox.com>,  33014 <at> debbugs.gnu.org
> Date: Sat, 20 Oct 2018 10:23:37 +0200
> 
> On Okt 20 2018, Eli Zaretskii <eliz <at> gnu.org> wrote:
> 
> >> Maybe code optimizers have improved over the last 17 years?
> >
> > I think a much more significant factor is that modern processors have
> > many more registers to use.
> 
> I think an important factor is that they pass arguments in registers, so
> it is more likely that the original value of an argument is lost.

Agreed.  That's part of what I meant by "have many more registers".




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#33014; Package emacs. (Sat, 20 Oct 2018 11:31:01 GMT) Full text and rfc822 format available.

Message #86 received at 33014 <at> debbugs.gnu.org (full text, mbox):

From: Andreas Schwab <schwab <at> linux-m68k.org>
To: Eli Zaretskii <eliz <at> gnu.org>
Cc: gazally <at> runbox.com, 33014 <at> debbugs.gnu.org
Subject: Re: bug#33014: 26.1.50; 27.0.50;
 Fatal error after re-evaluating a thread's function
Date: Sat, 20 Oct 2018 13:30:16 +0200
On Okt 20 2018, Eli Zaretskii <eliz <at> gnu.org> wrote:

>> From: Andreas Schwab <schwab <at> linux-m68k.org>
>> Cc: Gemini Lasswell <gazally <at> runbox.com>,  33014 <at> debbugs.gnu.org
>> Date: Sat, 20 Oct 2018 10:23:37 +0200
>> 
>> On Okt 20 2018, Eli Zaretskii <eliz <at> gnu.org> wrote:
>> 
>> >> Maybe code optimizers have improved over the last 17 years?
>> >
>> > I think a much more significant factor is that modern processors have
>> > many more registers to use.
>> 
>> I think an important factor is that they pass arguments in registers, so
>> it is more likely that the original value of an argument is lost.
>
> Agreed.  That's part of what I meant by "have many more registers".

You can pass by register even without more of them, and the likelihood
that the original value is lost is even higher then.

Andreas.

-- 
Andreas Schwab, schwab <at> linux-m68k.org
GPG Key fingerprint = 7578 EB47 D4E5 4D69 2510  2552 DF73 E780 A9DA AEC1
"And now for something completely different."




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#33014; Package emacs. (Mon, 29 Oct 2018 18:25:02 GMT) Full text and rfc822 format available.

Message #89 received at 33014 <at> debbugs.gnu.org (full text, mbox):

From: Gemini Lasswell <gazally <at> runbox.com>
To: Eli Zaretskii <eliz <at> gnu.org>
Cc: 33014 <at> debbugs.gnu.org
Subject: Re: bug#33014: 26.1.50; 27.0.50;
 Fatal error after re-evaluating a thread's function
Date: Mon, 29 Oct 2018 11:24:10 -0700
Eli Zaretskii <eliz <at> gnu.org> writes:

>> I have patched Emacs with a 'volatile' on the definition of 'fun' in
>> Ffuncall, and so far haven't managed to reproduce the bug with it:
>
> Thanks.  This needs a comment for why we do something strange like
> that, but otherwise, if no one has better ideas in a week's time,
> let's install this.

Pushed to master, along with a new test which should fail if some future
optimizing compiler removes the reference from the stack in spite of the
'volatile'.




Added tag(s) fixed. Request was from Gemini Lasswell <gazally <at> runbox.com> to control <at> debbugs.gnu.org. (Mon, 29 Oct 2018 18:25:02 GMT) Full text and rfc822 format available.

bug marked as fixed in version 27.1, send any further explanations to 33014 <at> debbugs.gnu.org and Gemini Lasswell <gazally <at> runbox.com> Request was from Gemini Lasswell <gazally <at> runbox.com> to control <at> debbugs.gnu.org. (Mon, 29 Oct 2018 18:25:02 GMT) Full text and rfc822 format available.

Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#33014; Package emacs. (Mon, 29 Oct 2018 18:57:02 GMT) Full text and rfc822 format available.

Message #96 received at 33014 <at> debbugs.gnu.org (full text, mbox):

From: Stefan Monnier <monnier <at> iro.umontreal.ca>
To: Gemini Lasswell <gazally <at> runbox.com>
Cc: Eli Zaretskii <eliz <at> gnu.org>, schwab <at> linux-m68k.org, 33014 <at> debbugs.gnu.org
Subject: Re: bug#33014: 26.1.50; 27.0.50;
 Fatal error after re-evaluating a thread's function
Date: Mon, 29 Oct 2018 14:56:11 -0400
>> > After thinking about this a bit, I don't really agree with the last
>> > one: the compiler could indeed stop tracking 'vector', but not
>> > XVECTOR (vector)->contents, and we are interested in the latter.
>> If the compiler stops tracking 'vector', and the garbage collector frees
>> it, doesn't that cause XVECTOR (vector)->contents to be overwritten?
> Hmmm... could be.

Indeed, the conservative GC doesn't try to handle "pointers into the
middle of objects", so a pointer to `XVECTOR (vector)->contents` won't
be sufficient to keep `vector` alive.
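
A minimal sketch of that point, under the assumption that the collector
only recognizes a word when it equals an object's start address.  The
names toy_vector, toy_vec, toy_mark_maybe_pointer and
toy_vec_kept_alive_by are hypothetical and only stand in for the real
data structures:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Toy vector: a header followed by its contents.  */
struct toy_vector
{
  bool marked;
  size_t size;
  intptr_t contents[4];
};

struct toy_vector toy_vec = { false, 4, { 10, 20, 30, 40 } };

/* Mark toy_vec only if WORD is exactly its start address.  A
   pointer into the middle of the object is not recognized.  */
void
toy_mark_maybe_pointer (uintptr_t word)
{
  if (word == (uintptr_t) &toy_vec)
    toy_vec.marked = true;
}

/* Return true if scanning the single word WORD keeps toy_vec alive.  */
bool
toy_vec_kept_alive_by (uintptr_t word)
{
  /* The contents must not sit at offset 0, otherwise a pointer to
     them would coincide with the start address.  */
  assert (offsetof (struct toy_vector, contents) > 0);
  toy_vec.marked = false;
  toy_mark_maybe_pointer (word);
  return toy_vec.marked;
}
```

In this model a word holding &toy_vec keeps the vector alive, while a
word holding toy_vec.contents (the analogue of a saved vectorp) does
not.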

> From: Gemini Lasswell <gazally <at> runbox.com>
> Date: Sun, 14 Oct 2018 12:12:04 -0700
> Subject: [PATCH] * src/eval.c (Ffuncall): Make local variable 'fun' volatile
>  (bug#33014)

Shouldn't we do that in exec_byte_code instead (it probably doesn't
matter that much in the end, but I think conceptually that would be the
more correct place)?

E.g. if you change your test to

    (defun eval-tests-33014-redefine ()
      "Remove the Lisp reference to the byte-compiled object."
      (aset (symbol-function #'eval-tests-33014-func) 1 nil)
      (aset (symbol-function #'eval-tests-33014-func) 2 nil))

you won't get a crash, but only because these `aset` calls will fail (bytecode
objects are luckily read-only).  Moving the volatile thingies to
exec_byte_code should let our code work correctly against the above test
even if we changed aset to allow modifying bytecode objects.


        Stefan




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#33014; Package emacs. (Mon, 29 Oct 2018 19:42:01 GMT) Full text and rfc822 format available.

Message #99 received at 33014 <at> debbugs.gnu.org (full text, mbox):

From: Eli Zaretskii <eliz <at> gnu.org>
To: Gemini Lasswell <gazally <at> runbox.com>
Cc: 33014 <at> debbugs.gnu.org
Subject: Re: bug#33014: 26.1.50; 27.0.50;
 Fatal error after re-evaluating a thread's function
Date: Mon, 29 Oct 2018 21:41:04 +0200
> From: Gemini Lasswell <gazally <at> runbox.com>
> Cc: 33014 <at> debbugs.gnu.org
> Date: Mon, 29 Oct 2018 11:24:10 -0700
> 
> Pushed to master, along with a new test which should fail if some future
> optimizing compiler removes the reference from the stack in spite of the
> 'volatile'.

Thanks.

Any reason not to cherry-pick this to emacs-26?




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#33014; Package emacs. (Wed, 31 Oct 2018 04:50:01 GMT) Full text and rfc822 format available.

Message #102 received at 33014 <at> debbugs.gnu.org (full text, mbox):

From: Paul Eggert <eggert <at> cs.ucla.edu>
To: Eli Zaretskii <eliz <at> gnu.org>
Cc: Gemini Lasswell <gazally <at> runbox.com>, 33014 <at> debbugs.gnu.org,
 Andreas Schwab <schwab <at> linux-m68k.org>,
 Stefan Monnier <monnier <at> iro.umontreal.ca>
Subject: Re: bug#33014: 26.1.50; 27.0.50; Fatal error after re-evaluating a
 thread's function
Date: Tue, 30 Oct 2018 21:49:46 -0700
[Message part 1 (text/plain, inline)]
> Any reason not to cherry-pick this to emacs-26?

No, once we fix it up. Although adding 'volatile' happened to work for Gemini's 
compiler, it won't suffice in general as the C standard does not require 
volatile variables to survive their last access (which is what the patch was 
assuming). Furthermore, Fbyte_code bypasses that patch, so the bug could still 
occur even if 'volatile' cured the bug in the more-common code path.

A simple way to ensure that the constant vector survives GC is to have 
exec_byte_code put the vector into a GC-visible slot. As it happens, there's a 
spare slot that we can appropriate, so this won't cost us stack (or heap) space. 
I installed the first attached patch into master to do that, and backported the 
patch series into emacs-26 via the last two attached patches.
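
The general idea of a GC-visible slot can be sketched as follows.  This
is a toy model, not the attached patch: toy_roots, toy_collect and
toy_run_with_gc are hypothetical names, and the explicit root array
only stands in for whatever slot the collector is guaranteed to scan.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct toy_vector
{
  bool marked;
  bool freed;
};

/* Explicit root slots that the toy collector always scans.  */
enum { TOY_MAX_ROOTS = 8 };
struct toy_vector *toy_roots[TOY_MAX_ROOTS];
size_t toy_root_count;

/* Mark everything referenced from the root slots, then "free" OBJ
   if it ended up unmarked.  */
void
toy_collect (struct toy_vector *obj)
{
  obj->marked = false;
  for (size_t i = 0; i < toy_root_count; i++)
    if (toy_roots[i])
      toy_roots[i]->marked = true;
  if (!obj->marked)
    obj->freed = true;
}

/* Simulate running code whose constants vector is CONSTANTS while a
   collection happens and nothing else refers to the vector.  If
   PROTECT, store the vector in a root slot first.  Return true if
   the vector is still usable afterwards.  */
bool
toy_run_with_gc (struct toy_vector *constants, bool protect)
{
  size_t saved_count = toy_root_count;
  if (protect)
    {
      assert (toy_root_count < TOY_MAX_ROOTS);
      toy_roots[toy_root_count++] = constants;
    }
  toy_collect (constants);
  toy_root_count = saved_count;	/* Release the slot on exit.  */
  return !constants->freed;
}
```

Unlike a reference that merely happens to be left in a register or on
the C stack, the slot is written explicitly, so in this model survival
does not depend on what the compiler chooses to keep around.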

Thanks, Gemini, for the good work in debugging this problem and writing that 
test case. GC bugs can be nasty.
[0001-Improve-fix-for-Bug-33014.patch (text/x-patch, attachment)]
[0001-Refer-to-bytecode-constant-vectors-Bug-33014.patch (text/x-patch, attachment)]
[0002-Add-regression-test-for-Bug-33014.patch (text/x-patch, attachment)]

Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#33014; Package emacs. (Wed, 31 Oct 2018 15:35:02 GMT) Full text and rfc822 format available.

Message #105 received at 33014 <at> debbugs.gnu.org (full text, mbox):

From: Eli Zaretskii <eliz <at> gnu.org>
To: Paul Eggert <eggert <at> cs.ucla.edu>
Cc: gazally <at> runbox.com, 33014 <at> debbugs.gnu.org, schwab <at> linux-m68k.org,
 monnier <at> iro.umontreal.ca
Subject: Re: bug#33014: 26.1.50; 27.0.50; Fatal error after re-evaluating a
 thread's function
Date: Wed, 31 Oct 2018 17:33:56 +0200
> Cc: Gemini Lasswell <gazally <at> runbox.com>, 33014 <at> debbugs.gnu.org,
>  Stefan Monnier <monnier <at> iro.umontreal.ca>,
>  Andreas Schwab <schwab <at> linux-m68k.org>
> From: Paul Eggert <eggert <at> cs.ucla.edu>
> Date: Tue, 30 Oct 2018 21:49:46 -0700
> 
> > Any reason not to cherry-pick this to emacs-26?
> 
> No, once we fix it up. Although adding 'volatile' happened to work for Gemini's 
> compiler, it won't suffice in general as the C standard does not require 
> volatile variables to survive their last access (which is what the patch was 
> assuming). Furthermore, Fbyte_code bypasses that patch, so the bug could still 
> occur even if 'volatile' cured the bug in the more-common code path.
> 
> A simple way to ensure that the constant vector survives GC is to have 
> exec_byte_code put the vector into a GC-visible slot. As it happens, there's a 
> spare slot that we can appropriate, so this won't cost us stack (or heap) space. 
> I installed the first attached patch into master to do that, and backported the 
> patch series into emacs-26 via the last two attached patches.

Thanks.




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#33014; Package emacs. (Thu, 01 Nov 2018 23:16:01 GMT) Full text and rfc822 format available.

Message #108 received at 33014 <at> debbugs.gnu.org (full text, mbox):

From: Gemini Lasswell <gazally <at> runbox.com>
To: Paul Eggert <eggert <at> cs.ucla.edu>
Cc: Eli Zaretskii <eliz <at> gnu.org>, Andreas Schwab <schwab <at> linux-m68k.org>,
 33014 <at> debbugs.gnu.org, Stefan Monnier <monnier <at> iro.umontreal.ca>
Subject: Re: bug#33014: 26.1.50; 27.0.50;
 Fatal error after re-evaluating a thread's function
Date: Thu, 01 Nov 2018 16:15:21 -0700
Paul Eggert <eggert <at> cs.ucla.edu> writes:

> A simple way to ensure that the constant vector survives GC is to have
> exec_byte_code put the vector into a GC-visible slot. As it happens,
> there's a spare slot that we can appropriate, so this won't cost us
> stack (or heap) space. I installed the first attached patch into
> master to do that, and backported the patch series into emacs-26 via
> the last two attached patches.

Thanks.  I ran through all my methods of reproducing this bug on both
emacs-26 and master, and saw no more problems.




bug archived. Request was from Debbugs Internal Request <help-debbugs <at> gnu.org> to internal_control <at> debbugs.gnu.org. (Fri, 30 Nov 2018 12:24:03 GMT) Full text and rfc822 format available.

This bug report was last modified 6 years and 196 days ago.


GNU bug tracking system
Copyright (C) 1999 Darren O. Benham, 1997,2003 nCipher Corporation Ltd, 1994-97 Ian Jackson.