GNU bug report logs - #33014
26.1.50; 27.0.50; Fatal error after re-evaluating a thread's function

Package: emacs;

Reported by: Gemini Lasswell <gazally <at> runbox.com>

Date: Thu, 11 Oct 2018 05:32:01 UTC

Severity: normal

Tags: fixed

Found in version 26.1.50

Fixed in version 27.1

Done: Gemini Lasswell <gazally <at> runbox.com>

Bug is archived. No further changes may be made.


Message #11 received at 33014 <at> debbugs.gnu.org (full text, mbox):

From: Gemini Lasswell <gazally <at> runbox.com>
To: Eli Zaretskii <eliz <at> gnu.org>
Cc: 33014 <at> debbugs.gnu.org
Subject: Re: bug#33014: 26.1.50; 27.0.50;
 Fatal error after re-evaluating a thread's function
Date: Fri, 12 Oct 2018 13:02:56 -0700

Eli Zaretskii <eliz <at> gnu.org> writes:

> Can you please make a smaller stand-alone test case, which doesn't
> require patching Emacs?  That will make it much easier to try
> reproducing the problem.

I've tried to do that without success.  The bug doesn't reproduce if I
put all the code added to thread.el by the patch into its own file and
load it with C-u M-x byte-compile-file, nor does it reproduce if I put
the resulting .elc on my load-path and load it with require.

I've determined today that having -O2 in CFLAGS is necessary to
reproduce the bug; it does not reproduce with -O1 or -O0.

> Can you show the Lisp backtrace of this thread?  Also, what is the
> offending object 'a' in this frame:

The Lisp backtrace is really short:

Thread 7 (Thread 0x7f1cd4dec700 (LWP 21837)):
"erb--benchmark-monitor-func" (0x158ec58)

>> #2  0x00000000006122b5 in XHASH_TABLE (a=...) at lisp.h:2241
>
> and what was its parent object in the calling frame?

Those are both optimized out with -O2.  I recompiled bytecode.c with
"volatile" on the declaration of jmp_table, and got this:

(gdb) up 3
#3  exec_byte_code (bytestr=..., vector=..., maxdepth=..., args_template=..., 
    nargs=nargs@entry=0, args=<optimized out>, 
    args@entry=0x16eacf8 <bss_sbrk_buffer+9926232>) at bytecode.c:1403
1403	            struct Lisp_Hash_Table *h = XHASH_TABLE (jmp_table);
(gdb) p jmp_table
$1 = make_number(514)
(gdb) p *top
$3 = XIL(0x42b4d0)
(gdb) pp *top
remove
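
The jmp_table value above is a fixnum where the code expects a hash
table, so the accessor in frame #2 is presumably failing its type check.
The following is a minimal sketch of that kind of check using a toy
tagging scheme.  Every toy_* name and every tag value here is
hypothetical and chosen for illustration; none of it is the actual
lisp.h layout.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Toy tagged word: the low 3 bits carry a type tag.  The tag values
   below are made up for illustration and are not those of lisp.h.  */
typedef uintptr_t toy_value;

enum { TOY_TAG_BITS = 3, TOY_INT = 2, TOY_VECTORLIKE = 5 };
#define TOY_TAG_MASK ((uintptr_t) 7)

struct toy_hash_table { int count; };

/* Box an integer: shift it past the tag bits and add the integer tag.  */
static toy_value
toy_make_number (intptr_t n)
{
  return ((uintptr_t) n << TOY_TAG_BITS) | TOY_INT;
}

/* Box a table pointer.  The pointer must be 8-byte aligned so that its
   low 3 bits are free to hold the tag.  */
static toy_value
toy_make_table (struct toy_hash_table *h)
{
  assert (((uintptr_t) h & TOY_TAG_MASK) == 0);
  return (uintptr_t) h | TOY_VECTORLIKE;
}

/* Unbox a table pointer.  This toy version returns NULL when the tag is
   wrong, where a checking build would more likely abort.  */
static struct toy_hash_table *
toy_xhash_table (toy_value v)
{
  if ((v & TOY_TAG_MASK) != TOY_VECTORLIKE)
    return NULL;
  return (struct toy_hash_table *) (v & ~TOY_TAG_MASK);
}
```

In this model, toy_xhash_table (toy_make_number (514)) yields NULL
because the tag bits say "integer", which is the same category of
mismatch gdb shows for jmp_table.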

Then I started looking at other variables in exec_byte_code, and found
this, which didn't look right:

(gdb) p *vectorp
$13 = XIL(0x7f4934009523)
(gdb) pr
(((help-menu "Help" keymap (emacs-tutorial menu-item "Emacs Tutorial" help-with-tutorial :help "Lear
?\207" [yank-menu kill-ring buffer-read-only gui-backend-selection-exists-p CLIPBOARD featurep ns] 2
\205^Q^@ÅÆ!\207" [visual-line-mode word-wrap truncate-lines 0 nil toggle-truncate-lines -1] 2 nil ni

(I've truncated the result of printing *vectorp since each line is over
5000 characters long.)

Since that looked like it was unlikely to be the original value of
*vectorp, I started a new debugging session and stepped through
Thread 7's call to exec_byte_code for erb--benchmark-monitor-func, and
determined that *vectorp's initial value was erb--status-updates, which
matches the first element of the constants vector in
(symbol-function 'erb--benchmark-monitor-func).

The value of vectorp was 0x16eac38 so I set a watchpoint on
  *(EMACS_INT *) 0x16eac38
and continued, and then during the execution of eval-region
it triggered here:

Thread 1 "monitor" hit Hardware watchpoint 7: *(EMACS_INT *) 0x16eac38

Old value = 60897760
New value = 24075314
setup_on_free_list (v=v@entry=0x16eac30 <bss_sbrk_buffer+9926032>, 
    nbytes=nbytes@entry=272) at alloc.c:3060
3060	  total_free_vector_slots += nbytes / word_size;
(gdb) bt 10
#0  setup_on_free_list (v=v@entry=0x16eac30 <bss_sbrk_buffer+9926032>, 
    nbytes=nbytes@entry=272) at alloc.c:3060
#1  0x00000000005a9a24 in sweep_vectors () at alloc.c:3297
#2  0x00000000005adb2e in gc_sweep () at alloc.c:6872
#3  garbage_collect_1 (end=<optimized out>) at alloc.c:5860
#4  Fgarbage_collect () at alloc.c:5989
#5  0x00000000005ca478 in maybe_gc () at lisp.h:4804
#6  Ffuncall (nargs=4, args=args@entry=0x7fff210a3bc8) at eval.c:2838
#7  0x0000000000611e00 in exec_byte_code (bytestr=..., vector=..., maxdepth=..., 
    args_template=..., nargs=nargs@entry=2, args=<optimized out>, 
    args@entry=0x9bd128 <pure+781288>) at bytecode.c:632
#8  0x00000000005cdd32 in funcall_lambda (fun=XIL(0x7fff210a3bc8), 
    nargs=nargs@entry=2, arg_vector=0x9bd128 <pure+781288>, 
    arg_vector@entry=0x7fff210a3f00) at eval.c:3057
#9  0x00000000005ca54b in Ffuncall (nargs=3, args=args@entry=0x7fff210a3ef8)
    at eval.c:2870
(More stack frames follow...)

Note that just as was happening when we were working through bug#32357,
the thread names which gdb prints are wrong, which I verified with:

(gdb) p current_thread
$21 = (struct thread_state *) 0xd73480 <main_thread>
(gdb) p current_thread->name
$22 = XIL(0)

Am I correct that the next step is to figure out why the garbage
collector is not marking this vector?  Presumably it's no longer
attached to the function definition for erb--benchmark-monitor-func by
the time the garbage collector runs, but it's supposed to be found by
mark_stack when called from mark_one_thread for Thread 7, right?
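
To make the question concrete, here is a toy model of the failure mode I
suspect.  It is only a sketch under stated assumptions, not the code in
alloc.c: the toy_* names, the fixed-size heap, and the free marker are
all hypothetical.  A vector referenced only from one thread's stack
survives a collection when that stack is scanned, and has its first word
overwritten by the sweep when it is not.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_NVECTORS 4
#define TOY_FREE_MARKER ((uintptr_t) 0xDEAD)

struct toy_vector
{
  int marked;
  uintptr_t first_slot;   /* stands in for the watched word */
};

static struct toy_vector toy_heap[TOY_NVECTORS];

/* One stack to scan: a base address and a length in words.  */
struct toy_stack { const uintptr_t *bottom; size_t nwords; };

/* Conservative scan: any word equal to the address of a heap vector
   marks that vector as live.  */
static void
toy_mark_stack (const uintptr_t *bottom, size_t nwords)
{
  assert (bottom != NULL || nwords == 0);
  for (size_t i = 0; i < nwords; i++)
    for (size_t j = 0; j < TOY_NVECTORS; j++)
      if (bottom[i] == (uintptr_t) &toy_heap[j])
        toy_heap[j].marked = 1;
}

/* Sweep: clear the mark on live vectors; for unmarked ones, overwrite
   the first slot as a stand-in for free-list setup.  */
static void
toy_sweep (void)
{
  for (size_t j = 0; j < TOY_NVECTORS; j++)
    {
      if (toy_heap[j].marked)
        toy_heap[j].marked = 0;
      else
        toy_heap[j].first_slot = TOY_FREE_MARKER;
    }
}

/* Mark from every stack in STACKS, then sweep.  */
static void
toy_collect (const struct toy_stack *stacks, size_t nstacks)
{
  for (size_t s = 0; s < nstacks; s++)
    toy_mark_stack (stacks[s].bottom, stacks[s].nwords);
  toy_sweep ();
}
```

If toy_collect is given both the main stack and a worker stack that
holds the address of toy_heap[1], that vector keeps its first_slot.  If
the worker stack is left out, or the address never made it onto the
scanned range, first_slot becomes TOY_FREE_MARKER, which is the kind of
change the watchpoint caught.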




This bug report was last modified 6 years and 197 days ago.


GNU bug tracking system
Copyright (C) 1999 Darren O. Benham, 1997,2003 nCipher Corporation Ltd, 1994-97 Ian Jackson.