GNU bug report logs - #42832
28.0.50; "Bus error" when compiling Emacs now on Debian bullseye


Package: emacs;

Reported by: Lars Ingebrigtsen <larsi <at> gnus.org>

Date: Wed, 12 Aug 2020 17:13:01 UTC

Severity: normal

Found in version 28.0.50

Done: Paul Eggert <eggert <at> cs.ucla.edu>

Bug is archived. No further changes may be made.

To add a comment to this bug, you must first unarchive it, by sending
a message to control AT debbugs.gnu.org, with unarchive 42832 in the body.
You can then email your comments to 42832 AT debbugs.gnu.org in the normal way.


Report forwarded to bug-gnu-emacs <at> gnu.org:
bug#42832; Package emacs. (Wed, 12 Aug 2020 17:13:02 GMT) Full text and rfc822 format available.

Acknowledgement sent to Lars Ingebrigtsen <larsi <at> gnus.org>:
New bug report received and forwarded. Copy sent to bug-gnu-emacs <at> gnu.org. (Wed, 12 Aug 2020 17:13:02 GMT) Full text and rfc822 format available.

Message #5 received at submit <at> debbugs.gnu.org (full text, mbox):

From: Lars Ingebrigtsen <larsi <at> gnus.org>
To: bug-gnu-emacs <at> gnu.org
Subject: 28.0.50; "Bus error" when compiling Emacs now on Debian bullseye
Date: Wed, 12 Aug 2020 19:12:06 +0200
I'm getting this on one of my machines:

/bin/bash: line 1: 2759815 Bus error               EMACSLOADPATH= '../src/emacs' -batch --no-site-file --no-site-lisp --eval '(setq load-prefer-newer t)' -f batch-byte-compile cedet/semantic/bovine/c-by.el
make[3]: *** [Makefile:295: cedet/semantic/bovine/c-by.elc] Error 135
make[3]: *** Waiting for unfinished jobs....
make[2]: *** [Makefile:318: compile-main] Error 2
make[1]: *** [Makefile:411: lisp] Error 2
make: *** [Makefile:1126: bootstrap] Error 2

It's reproducible in that I always get this when I say "make", but if I
instead say

./src/emacs -batch --no-site-file --no-site-lisp --eval '(setq load-prefer-newer t)' -f batch-byte-compile cedet/semantic/bovine/c-by.el

everything works as it should, and it makes the .elc file.

So I'm not sure how to debug this...

On my laptop (which is also Debian bullseye), I'm not seeing any problems.
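
As an aside on the make output above: a POSIX shell such as bash reports a command killed by signal N with exit status 128 + N, so "Error 135" corresponds to signal 7. The sketch below only illustrates that arithmetic; the function name is made up for this example, and the mapping from number to signal name is platform-specific (7 is SIGBUS on x86-64 Linux).

```python
import signal


def signal_from_shell_status(status):
    """Return the signal number encoded in a shell exit status, or None.

    Assumes the shell convention "128 + signal number"; a program that
    deliberately exits with a status above 128 would be misread.
    """
    if status > 128:
        return status - 128
    return None


signum = signal_from_shell_status(135)
print(signum)

# Signal numbering differs between platforms, so look the name up locally.
try:
    print(signal.Signals(signum).name)
except ValueError:
    print("no signal with that number on this platform")
```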


In GNU Emacs 28.0.50 (build 51, x86_64-pc-linux-gnu, GTK+ Version 3.24.20, cairo version 1.16.0)
 of 2020-08-09 built on xo
Repository revision: 1a845a672dc73c8e98e6cb9bb734616e168e60ba
Repository branch: master
Windowing system distributor 'The X.Org Foundation', version 11.0.12008000
System Description: Debian GNU/Linux bullseye/sid


-- 
(domestic pets only, the antidote for overdose, milk.)
   bloggy blog: http://lars.ingebrigtsen.no





Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#42832; Package emacs. (Wed, 12 Aug 2020 18:23:01 GMT)

Message #8 received at 42832 <at> debbugs.gnu.org:

From: Lars Ingebrigtsen <larsi <at> gnus.org>
To: 42832 <at> debbugs.gnu.org
Subject: Re: bug#42832: 28.0.50; "Bus error" when compiling Emacs now on
 Debian bullseye
Date: Wed, 12 Aug 2020 20:22:13 +0200
Additional data point:

It's totally repeatable with "make -j2" and up, but with single-threaded
compilation, everything works fine.

-- 
(domestic pets only, the antidote for overdose, milk.)
   bloggy blog: http://lars.ingebrigtsen.no





Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#42832; Package emacs. (Wed, 12 Aug 2020 18:32:02 GMT)

Message #11 received at 42832 <at> debbugs.gnu.org:

From: Lars Ingebrigtsen <larsi <at> gnus.org>
To: 42832 <at> debbugs.gnu.org
Subject: Re: bug#42832: 28.0.50; "Bus error" when compiling Emacs now on
 Debian bullseye
Date: Wed, 12 Aug 2020 20:30:48 +0200
I got a core dump, and gdb says it starts with:

#0  raise (sig=<optimized out>) at ../sysdeps/unix/sysv/linux/raise.c:50
#1  0x0000564c625c0ad5 in terminate_due_to_signal
    (sig=sig <at> entry=7, backtrace_limit=backtrace_limit <at> entry=40) at emacs.c:408
#2  0x0000564c625c0f6b in handle_fatal_signal (sig=sig <at> entry=7)
    at sysdep.c:1782
#3  0x0000564c626bbd9d in deliver_thread_signal
    (sig=7, handler=0x564c625c0f60 <handle_fatal_signal>) at sysdep.c:1756
#4  0x0000564c626bbe89 in deliver_fatal_thread_signal (sig=<optimized out>)
    at sysdep.c:1794
#5  0x00007f13103bc140 in <signal handler called> ()
    at /lib/x86_64-linux-gnu/libpthread.so.0
#6  vector_marked_p (v=0xc000000018000000) at alloc.c:3859
#7  mark_object (arg=<optimized out>) at alloc.c:6607
#8  0x0000564c626ffd7e in mark_vectorlike (header=0x564c63246f10)
    at alloc.c:6280
#9  0x0000564c626ffd7e in mark_vectorlike (header=header <at> entry=0x7f130c63a1a8)
    at alloc.c:6280
#10 0x0000564c626ff68c in mark_hash_table (ptr=0x7f130c63a1a8) at alloc.c:6651
#11 mark_object (arg=<optimized out>) at alloc.c:6651
#12 0x0000564c626ffce7 in mark_memory (end=<optimized out>, 
    end <at> entry=0x7ffc912d7c20, start=<optimized out>) at alloc.c:4842
#13 mark_stack (bottom=<optimized out>, end=end <at> entry=0x7ffc912b1840 "")
    at alloc.c:5039
#14 0x0000564c62782e61 in mark_one_thread (thread=0x564c62b50460 <main_thread>)
    at thread.c:630
#15 mark_threads_callback (ignore=<optimized out>) at thread.c:661
#16 0x0000564c627006b7 in garbage_collect () at alloc.c:6068
#17 0x0000564c62700f91 in maybe_garbage_collect () at alloc.c:5975
#18 0x0000564c6271d1d5 in maybe_gc () at lisp.h:5053
#19 Ffuncall (nargs=3, args=args <at> entry=0x7ffc912b1980) at eval.c:2779
#20 0x0000564c62757768 in exec_byte_code
    (bytestr=<optimized out>, vector=<optimized out>, maxdepth=<optimized out>, args_template=<optimized out>, nargs=<optimized out>, args=<optimized out>)
    at bytecode.c:632
#21 0x0000564c6271d157 in Ffuncall (nargs=2, args=args <at> entry=0x7ffc912b23c0)
    at eval.c:2809
#22 0x0000564c62757768 in exec_byte_code
    (bytestr=<optimized out>, vector=<optimized out>, maxdepth=<optimized out>, args_template=<optimized out>, nargs=<optimized out>, args=<optimized out>)
    at bytecode.c:632
#23 0x0000564c6271d157 in Ffuncall (nargs=3, args=args <at> entry=0x7ffc912b26d0)
    at eval.c:2809
#24 0x0000564c62757768 in exec_byte_code
    (bytestr=<optimized out>, vector=<optimized out>, maxdepth=<optimized out>, args_template=<optimized out>, nargs=<optimized out>, args=<optimized out>)
    at bytecode.c:632
#25 0x0000564c6271d157 in Ffuncall (nargs=3, args=args <at> entry=0x7ffc912b2a38)
    at eval.c:2809
#26 0x0000564c62757768 in exec_byte_code
    (bytestr=<optimized out>, vector=<optimized out>, maxdepth=<optimized out>, args_template=<optimized out>, nargs=<optimized out>, args=<optimized out>)
    at bytecode.c:632
#27 0x0000564c6271d157 in Ffuncall (nargs=2, args=args <at> entry=0x7ffc912b3450)
    at eval.c:2809
#28 0x0000564c62757768 in exec_byte_code
    (bytestr=<optimized out>, vector=<optimized out>, maxdepth=<optimized out>, args_template=<optimized out>, nargs=<optimized out>, args=<optimized out>)
    at bytecode.c:632
#29 0x0000564c6271d157 in Ffuncall (nargs=3, args=args <at> entry=0x7ffc912b3760)
    at eval.c:2809
#30 0x0000564c62757768 in exec_byte_code
    (bytestr=<optimized out>, vector=<optimized out>, maxdepth=<optimized out>, args_template=<optimized out>, nargs=<optimized out>, args=<optimized out>)
    at bytecode.c:632
#31 0x0000564c6271d157 in Ffuncall (nargs=3, args=args <at> entry=0x7ffc912b3ac8)
    at eval.c:2809
#32 0x0000564c62757768 in exec_byte_code
    (bytestr=<optimized out>, vector=<optimized out>, maxdepth=<optimized out>, args_template=<optimized out>, nargs=<optimized out>, args=<optimized out>)
    at bytecode.c:632
#33 0x0000564c6271d157 in Ffuncall (nargs=2, args=args <at> entry=0x7ffc912b44e0)
    at eval.c:2809
#34 0x0000564c62757768 in exec_byte_code
    (bytestr=<optimized out>, vector=<optimized out>, maxdepth=<optimized out>, args_template=<optimized out>, nargs=<optimized out>, args=<optimized out>)
    at bytecode.c:632
#35 0x0000564c6271d157 in Ffuncall (nargs=3, args=args <at> entry=0x7ffc912b47f0)
    at eval.c:2809
#36 0x0000564c62757768 in exec_byte_code
    (bytestr=<optimized out>, vector=<optimized out>, maxdepth=<optimized out>, args_template=<optimized out>, nargs=<optimized out>, args=<optimized out>)
    at bytecode.c:632
#37 0x0000564c6271d157 in Ffuncall (nargs=3, args=args <at> entry=0x7ffc912b4b58)
    at eval.c:2809
#38 0x0000564c62757768 in exec_byte_code
    (bytestr=<optimized out>, vector=<optimized out>, maxdepth=<optimized out>, args_template=<optimized out>, nargs=<optimized out>, args=<optimized out>)
    at bytecode.c:632
#39 0x0000564c6271d157 in Ffuncall (nargs=2, args=args <at> entry=0x7ffc912b5570)
    at eval.c:2809
#40 0x0000564c62757768 in exec_byte_code
    (bytestr=<optimized out>, vector=<optimized out>, maxdepth=<optimized out>, args_template=<optimized out>, nargs=<optimized out>, args=<optimized out>)
    at bytecode.c:632
#41 0x0000564c6271d157 in Ffuncall (nargs=3, args=args <at> entry=0x7ffc912b5880)
    at eval.c:2809
#42 0x0000564c62757768 in exec_byte_code
    (bytestr=<optimized out>, vector=<optimized out>, maxdepth=<optimized out>, args_template=<optimized out>, nargs=<optimized out>, args=<optimized out>)
    at bytecode.c:632
#43 0x0000564c6271d157 in Ffuncall (nargs=3, args=args <at> entry=0x7ffc912b5be8)
    at eval.c:2809
#44 0x0000564c62757768 in exec_byte_code
    (bytestr=<optimized out>, vector=<optimized out>, maxdepth=<optimized out>, args_template=<optimized out>, nargs=<optimized out>, args=<optimized out>)
    at bytecode.c:632
#45 0x0000564c6271d157 in Ffuncall (nargs=2, args=args <at> entry=0x7ffc912b6600)
    at eval.c:2809
#46 0x0000564c62757768 in exec_byte_code
    (bytestr=<optimized out>, vector=<optimized out>, maxdepth=<optimized out>, args_template=<optimized out>, nargs=<optimized out>, args=<optimized out>)
    at bytecode.c:632
#47 0x0000564c6271d157 in Ffuncall (nargs=3, args=args <at> entry=0x7ffc912b6910)
    at eval.c:2809
#48 0x0000564c62757768 in exec_byte_code
    (bytestr=<optimized out>, vector=<optimized out>, maxdepth=<optimized out>, args_template=<optimized out>, nargs=<optimized out>, args=<optimized out>)
    at bytecode.c:632
#49 0x0000564c6271d157 in Ffuncall (nargs=3, args=args <at> entry=0x7ffc912b6c78)
    at eval.c:2809
#50 0x0000564c62757768 in exec_byte_code
    (bytestr=<optimized out>, vector=<optimized out>, maxdepth=<optimized out>, args_template=<optimized out>, nargs=<optimized out>, args=<optimized out>)
    at bytecode.c:632
#51 0x0000564c6271d157 in Ffuncall (nargs=2, args=args <at> entry=0x7ffc912b7690)
    at eval.c:2809
#52 0x0000564c62757768 in exec_byte_code
    (bytestr=<optimized out>, vector=<optimized out>, maxdepth=<optimized out>, args_template=<optimized out>, nargs=<optimized out>, args=<optimized out>)
    at bytecode.c:632
#53 0x0000564c6271d157 in Ffuncall (nargs=3, args=args <at> entry=0x7ffc912b79a0)
    at eval.c:2809
#54 0x0000564c62757768 in exec_byte_code
    (bytestr=<optimized out>, vector=<optimized out>, maxdepth=<optimized out>, args_template=<optimized out>, nargs=<optimized out>, args=<optimized out>)
    at bytecode.c:632
#55 0x0000564c6271d157 in Ffuncall (nargs=3, args=args <at> entry=0x7ffc912b7d08)
    at eval.c:2809
#56 0x0000564c62757768 in exec_byte_code
    (bytestr=<optimized out>, vector=<optimized out>, maxdepth=<optimized out>, args_template=<optimized out>, nargs=<optimized out>, args=<optimized out>)
    at bytecode.c:632
#57 0x0000564c6271d157 in Ffuncall (nargs=2, args=args <at> entry=0x7ffc912b8720)
    at eval.c:2809
#58 0x0000564c62757768 in exec_byte_code
    (bytestr=<optimized out>, vector=<optimized out>, maxdepth=<optimized out>, args_template=<optimized out>, nargs=<optimized out>, args=<optimized out>)
    at bytecode.c:632
#59 0x0000564c6271d157 in Ffuncall (nargs=3, args=args <at> entry=0x7ffc912b8a30)
    at eval.c:2809
#60 0x0000564c62757768 in exec_byte_code
    (bytestr=<optimized out>, vector=<optimized out>, maxdepth=<optimized out>, args_template=<optimized out>, nargs=<optimized out>, args=<optimized out>)
    at bytecode.c:632
#61 0x0000564c6271d157 in Ffuncall (nargs=3, args=args <at> entry=0x7ffc912b8d98)
    at eval.c:2809
#62 0x0000564c62757768 in exec_byte_code
    (bytestr=<optimized out>, vector=<optimized out>, maxdepth=<optimized out>, args_template=<optimized out>, nargs=<optimized out>, args=<optimized out>)
    at bytecode.c:632
#63 0x0000564c6271d157 in Ffuncall (nargs=2, args=args <at> entry=0x7ffc912b97b0)
...

-- 
(domestic pets only, the antidote for overdose, milk.)
   bloggy blog: http://lars.ingebrigtsen.no





Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#42832; Package emacs. (Wed, 12 Aug 2020 18:36:01 GMT)

Message #14 received at 42832 <at> debbugs.gnu.org:

From: Eli Zaretskii <eliz <at> gnu.org>
To: Lars Ingebrigtsen <larsi <at> gnus.org>
Cc: 42832 <at> debbugs.gnu.org
Subject: Re: bug#42832: 28.0.50;
 "Bus error" when compiling Emacs now on Debian bullseye
Date: Wed, 12 Aug 2020 21:34:57 +0300
> From: Lars Ingebrigtsen <larsi <at> gnus.org>
> Date: Wed, 12 Aug 2020 20:22:13 +0200
> 
> It's totally repeatable with "make -j2" and up, but with single-threaded
> compilation, everything works fine.

Are you sure it isn't a hardware problem on that machine?




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#42832; Package emacs. (Wed, 12 Aug 2020 18:43:02 GMT)

Message #17 received at 42832 <at> debbugs.gnu.org:

From: Lars Ingebrigtsen <larsi <at> gnus.org>
To: Eli Zaretskii <eliz <at> gnu.org>
Cc: 42832 <at> debbugs.gnu.org
Subject: Re: bug#42832: 28.0.50; "Bus error" when compiling Emacs now on
 Debian bullseye
Date: Wed, 12 Aug 2020 20:41:39 +0200
Eli Zaretskii <eliz <at> gnu.org> writes:

>> It's totally repeatable with "make -j2" and up, but with single-threaded
>> compilation, everything works fine.
>
> Are you sure it isn't a hardware problem on that machine?

Nope, it could well be.  But there's been no other problems on the
machine, and the problem is so repeatable...

I'll try rebooting it, though, and see whether that has any effect.

-- 
(domestic pets only, the antidote for overdose, milk.)
   bloggy blog: http://lars.ingebrigtsen.no




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#42832; Package emacs. (Wed, 12 Aug 2020 18:51:02 GMT)

Message #20 received at 42832 <at> debbugs.gnu.org:

From: Eli Zaretskii <eliz <at> gnu.org>
To: Lars Ingebrigtsen <larsi <at> gnus.org>
Cc: 42832 <at> debbugs.gnu.org
Subject: Re: bug#42832: 28.0.50;
 "Bus error" when compiling Emacs now on Debian bullseye
Date: Wed, 12 Aug 2020 21:50:10 +0300
> From: Lars Ingebrigtsen <larsi <at> gnus.org>
> Date: Wed, 12 Aug 2020 20:30:48 +0200
> 
> I got a core dump, and gdb says it starts with:
> 
> #0  raise (sig=<optimized out>) at ../sysdeps/unix/sysv/linux/raise.c:50
> #1  0x0000564c625c0ad5 in terminate_due_to_signal
>     (sig=sig <at> entry=7, backtrace_limit=backtrace_limit <at> entry=40) at emacs.c:408
> #2  0x0000564c625c0f6b in handle_fatal_signal (sig=sig <at> entry=7)
>     at sysdep.c:1782
> #3  0x0000564c626bbd9d in deliver_thread_signal
>     (sig=7, handler=0x564c625c0f60 <handle_fatal_signal>) at sysdep.c:1756
> #4  0x0000564c626bbe89 in deliver_fatal_thread_signal (sig=<optimized out>)
>     at sysdep.c:1794
> #5  0x00007f13103bc140 in <signal handler called> ()
>     at /lib/x86_64-linux-gnu/libpthread.so.0
> #6  vector_marked_p (v=0xc000000018000000) at alloc.c:3859
                       ^^^^^^^^^^^^^^^^^^^^
If you repeat this, do you get the same value in 'v' as above
(assuming it always crashes with the same backtrace)?




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#42832; Package emacs. (Wed, 12 Aug 2020 18:59:02 GMT)

Message #23 received at 42832 <at> debbugs.gnu.org:

From: Lars Ingebrigtsen <larsi <at> gnus.org>
To: Eli Zaretskii <eliz <at> gnu.org>
Cc: 42832 <at> debbugs.gnu.org
Subject: Re: bug#42832: 28.0.50; "Bus error" when compiling Emacs now on
 Debian bullseye
Date: Wed, 12 Aug 2020 20:58:11 +0200
Eli Zaretskii <eliz <at> gnu.org> writes:

>> I got a core dump, and gdb says it starts with:
>> 
>> #0  raise (sig=<optimized out>) at ../sysdeps/unix/sysv/linux/raise.c:50
>> #1  0x0000564c625c0ad5 in terminate_due_to_signal
>>     (sig=sig <at> entry=7, backtrace_limit=backtrace_limit <at> entry=40) at emacs.c:408
>> #2  0x0000564c625c0f6b in handle_fatal_signal (sig=sig <at> entry=7)
>>     at sysdep.c:1782
>> #3  0x0000564c626bbd9d in deliver_thread_signal
>>     (sig=7, handler=0x564c625c0f60 <handle_fatal_signal>) at sysdep.c:1756
>> #4  0x0000564c626bbe89 in deliver_fatal_thread_signal (sig=<optimized out>)
>>     at sysdep.c:1794
>> #5  0x00007f13103bc140 in <signal handler called> ()
>>     at /lib/x86_64-linux-gnu/libpthread.so.0
>> #6  vector_marked_p (v=0xc000000018000000) at alloc.c:3859
>                        ^^^^^^^^^^^^^^^^^^^^
> If you repeat this, do you get the same value in 'v' as above
> (assuming it always crashes with the same backtrace)?

Yes, I get the same backtrace every time:

#0  raise (sig=<optimized out>) at ../sysdeps/unix/sysv/linux/raise.c:50
#1  0x0000559941742ad5 in terminate_due_to_signal
    (sig=sig <at> entry=7, backtrace_limit=backtrace_limit <at> entry=40) at emacs.c:408
#2  0x0000559941742f6b in handle_fatal_signal (sig=sig <at> entry=7)
    at sysdep.c:1782
#3  0x000055994183dd9d in deliver_thread_signal
    (sig=7, handler=0x559941742f60 <handle_fatal_signal>) at sysdep.c:1756
#4  0x000055994183de89 in deliver_fatal_thread_signal (sig=<optimized out>)
    at sysdep.c:1794
#5  0x00007f7880ea8140 in <signal handler called> ()
    at /lib/x86_64-linux-gnu/libpthread.so.0
#6  vector_marked_p (v=0xc000000018000000) at alloc.c:3859
#7  mark_object (arg=<optimized out>) at alloc.c:6607
#8  0x0000559941881d7e in mark_vectorlike (header=0x559942facf10)
    at alloc.c:6280
#9  0x0000559941881d7e in mark_vectorlike (header=header <at> entry=0x7f787d1261a8)
    at alloc.c:6280

I've also now rebooted, and I'm still getting the same "bus error".

I've tried building Emacs with -O0, and I don't get the error then.

-- 
(domestic pets only, the antidote for overdose, milk.)
   bloggy blog: http://lars.ingebrigtsen.no




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#42832; Package emacs. (Wed, 12 Aug 2020 19:27:02 GMT)

Message #26 received at 42832 <at> debbugs.gnu.org:

From: Andreas Schwab <schwab <at> linux-m68k.org>
To: Lars Ingebrigtsen <larsi <at> gnus.org>
Cc: 42832 <at> debbugs.gnu.org
Subject: Re: bug#42832: 28.0.50; "Bus error" when compiling Emacs now on
 Debian bullseye
Date: Wed, 12 Aug 2020 21:26:16 +0200
On Aug 12 2020, Lars Ingebrigtsen wrote:

> I'm getting this on one of my machines:
>
> /bin/bash: line 1: 2759815 Bus error               EMACSLOADPATH= '../src/emacs' -batch --no-site-file --no-site-lisp --eval '(setq load-prefer-newer t)' -f batch-byte-compile cedet/semantic/bovine/c-by.el
> make[3]: *** [Makefile:295: cedet/semantic/bovine/c-by.elc] Error 135
> make[3]: *** Waiting for unfinished jobs....
> make[2]: *** [Makefile:318: compile-main] Error 2
> make[1]: *** [Makefile:411: lisp] Error 2
> make: *** [Makefile:1126: bootstrap] Error 2
>
> It's reproducible in that I always get this when I say "make", but if I
> instead say

A bus error usually means an mmaped file got truncated so that the
mapping now extends beyond the end of the file.  Emacs uses mmap to map
the pdmp file.
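
The general mechanism described above can be reproduced outside Emacs. The following is a minimal sketch, not a diagnosis of this bug: it assumes a Linux or similar POSIX system, uses a throwaway temporary file rather than a pdmp file, and runs the faulting code in a child process so that the parent can observe how it died. The names CHILD and run_truncated_mmap_child are made up for this example.

```python
import signal
import subprocess
import sys
import textwrap

# Code run in a child process: map a one-page file, shrink the file to
# zero bytes, then touch the mapping again.
CHILD = textwrap.dedent("""
    import mmap, os, tempfile
    fd, path = tempfile.mkstemp()
    os.unlink(path)                       # the file lives on through fd only
    os.write(fd, b'x' * mmap.PAGESIZE)
    m = mmap.mmap(fd, mmap.PAGESIZE, access=mmap.ACCESS_READ)
    assert m[0] == ord('x')               # still backed by the file: fine
    os.ftruncate(fd, 0)                   # file shrinks underneath the mapping
    m[0]                                  # page now lies past end of file
""")


def run_truncated_mmap_child():
    """Run CHILD and return its return code (negative means killed by a signal)."""
    proc = subprocess.run([sys.executable, "-c", CHILD],
                          stderr=subprocess.DEVNULL)
    return proc.returncode


rc = run_truncated_mmap_child()
print("child return code:", rc)
print("killed by SIGBUS:", rc == -signal.SIGBUS)
```

The child is killed at the final read, because the page it touches is no longer backed by any part of the file.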

Andreas.

-- 
Andreas Schwab, schwab <at> linux-m68k.org
GPG Key fingerprint = 7578 EB47 D4E5 4D69 2510  2552 DF73 E780 A9DA AEC1
"And now for something completely different."




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#42832; Package emacs. (Wed, 12 Aug 2020 19:29:02 GMT)

Message #29 received at 42832 <at> debbugs.gnu.org:

From: Lars Ingebrigtsen <larsi <at> gnus.org>
To: Eli Zaretskii <eliz <at> gnu.org>
Cc: 42832 <at> debbugs.gnu.org, Paul Eggert <eggert <at> cs.ucla.edu>
Subject: Re: bug#42832: 28.0.50; "Bus error" when compiling Emacs now on
 Debian bullseye
Date: Wed, 12 Aug 2020 21:28:01 +0200
I did some bisecting here, and if I 

git checkout 0d0aad213f941efc0fa0ec032e37dc9c2b08c9fb

(i.e., go back to the version of Emacs just before the recent pdumper
hash table stuff), then Emacs builds without this bus error.

So I've added Paul to the Cc's.  :-)

-- 
(domestic pets only, the antidote for overdose, milk.)
   bloggy blog: http://lars.ingebrigtsen.no





Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#42832; Package emacs. (Wed, 12 Aug 2020 19:35:02 GMT)

Message #32 received at 42832 <at> debbugs.gnu.org:

From: Lars Ingebrigtsen <larsi <at> gnus.org>
To: Eli Zaretskii <eliz <at> gnu.org>
Cc: 42832 <at> debbugs.gnu.org, Paul Eggert <eggert <at> cs.ucla.edu>
Subject: Re: bug#42832: 28.0.50; "Bus error" when compiling Emacs now on
 Debian bullseye
Date: Wed, 12 Aug 2020 21:33:53 +0200
Lars Ingebrigtsen <larsi <at> gnus.org> writes:

> I did some bisecting here, and if I 
>
> git checkout 0d0aad213f941efc0fa0ec032e37dc9c2b08c9fb
>
> (i.e., go back to the version of Emacs just before the recent pdumper
> hash table stuff), then Emacs builds without this bus error.

Yup, that checkout works, but 16a16645f524c62f7906036b0e383e4247b58de7
has the bus error.

Which is:

commit 16a16645f524c62f7906036b0e383e4247b58de7
Author:     Pip Cet <pipcet <at> gmail.com>
AuthorDate: Tue Aug 11 02:16:53 2020 -0700
Commit:     Paul Eggert <eggert <at> cs.ucla.edu>
CommitDate: Tue Aug 11 02:27:43 2020 -0700

    Rehash hash tables eagerly after loading a dump
    
-- 
(domestic pets only, the antidote for overdose, milk.)
   bloggy blog: http://lars.ingebrigtsen.no




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#42832; Package emacs. (Wed, 12 Aug 2020 20:41:01 GMT)

Message #35 received at 42832 <at> debbugs.gnu.org:

From: Paul Eggert <eggert <at> cs.ucla.edu>
To: Lars Ingebrigtsen <larsi <at> gnus.org>
Cc: Eli Zaretskii <eliz <at> gnu.org>, 42832 <at> debbugs.gnu.org
Subject: Re: bug#42832: 28.0.50; "Bus error" when compiling Emacs now on
 Debian bullseye
Date: Wed, 12 Aug 2020 13:40:23 -0700
On 8/12/20 12:33 PM, Lars Ingebrigtsen wrote:

> Yup, that checkout works, but 16a16645f524c62f7906036b0e383e4247b58de7
> has the bus error.
> 
> commit 16a16645f524c62f7906036b0e383e4247b58de7
> Author:     Pip Cet <pipcet <at> gmail.com>
> AuthorDate: Tue Aug 11 02:16:53 2020 -0700
> Commit:     Paul Eggert <eggert <at> cs.ucla.edu>
> CommitDate: Tue Aug 11 02:27:43 2020 -0700
> 
>      Rehash hash tables eagerly after loading a dump
>      
> 

A quick workaround might be to revert that particular commit; could you try the 
attached patch? It passes "make check" for me.

Obviously it'd be better to have a real fix. I've asked Pip Cet to take a look.
[0001-Revert-2020-08-11T09-16-53Z-pipcet-gmail.com.patch (text/x-patch, attachment)]

Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#42832; Package emacs. (Wed, 12 Aug 2020 20:48:02 GMT)

Message #38 received at 42832 <at> debbugs.gnu.org:

From: Lars Ingebrigtsen <larsi <at> gnus.org>
To: Paul Eggert <eggert <at> cs.ucla.edu>
Cc: Eli Zaretskii <eliz <at> gnu.org>, 42832 <at> debbugs.gnu.org
Subject: Re: bug#42832: 28.0.50; "Bus error" when compiling Emacs now on
 Debian bullseye
Date: Wed, 12 Aug 2020 22:47:03 +0200
Paul Eggert <eggert <at> cs.ucla.edu> writes:

> A quick workaround might be to revert that particular commit; could
> you try the attached patch? It passes "make check" for me.

Yup; with that patch applied, the bus error goes away.

But it's an odd problem -- I've tried building on three machines now,
and it only fails on one.  The machine it fails on and one it works on
are both Debian bullseye, both with the same compiler version, etc.  And
on the one machine it does fail on, it only fails when saying "make -j2"
or higher.

So for all I know, there is some kind of very strange hardware error on
that machine...  although that's looking kinda unlikely now.

-- 
(domestic pets only, the antidote for overdose, milk.)
   bloggy blog: http://lars.ingebrigtsen.no




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#42832; Package emacs. (Wed, 12 Aug 2020 21:44:02 GMT)

Message #41 received at 42832 <at> debbugs.gnu.org:

From: Pip Cet <pipcet <at> gmail.com>
To: Lars Ingebrigtsen <larsi <at> gnus.org>
Cc: 42832 <at> debbugs.gnu.org, Paul Eggert <eggert <at> cs.ucla.edu>
Subject: Re: bug#42832: 28.0.50; "Bus error" when compiling Emacs now on
 Debian bullseye
Date: Wed, 12 Aug 2020 21:42:50 +0000
On Wed, Aug 12, 2020 at 8:48 PM Lars Ingebrigtsen <larsi <at> gnus.org> wrote:
> Paul Eggert <eggert <at> cs.ucla.edu> writes:
>
> > A quick workaround might be to revert that particular commit; could
> > you try the attached patch? It passes "make check" for me.
>
> Yup; with that patch applied, the bus error goes away.

That is strange.

> But it's an odd problem -- I've tried building on three machines now,
> and it only fails on one.  The machine it fails on and one it works on
> are both Debian bullseye, both with the same compiler version, etc.

Same sysctl settings, too? In particular, address randomization
appears to be enabled, does this also happen if you disable it (echo 0
| sudo tee /proc/sys/kernel/randomize_va_space) ?

> And
> on the one machine it does fail on, it only fails when saying "make -j2"
> or higher.
>
> So for all I know, there is some kind of very strange hardware error on
> that machine...  although that's looking kinda unlikely now.

I'm thinking it might "simply" be a very timing-sensitive issue, which
would exonerate me :-)
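
For comparing machines, the randomize_va_space setting mentioned above can be read without changing it. A small sketch, assuming a Linux system that exposes the file under /proc; the function name and the wording of the descriptions are made up for this example, while the meaning of the values 0, 1 and 2 follows the kernel's sysctl documentation.

```python
from pathlib import Path

# Values of kernel.randomize_va_space as documented for Linux.
ASLR_MODES = {
    0: "disabled",
    1: "randomize mmap base, stack and VDSO",
    2: "as 1, plus heap (brk) randomization",
}


def describe_aslr(path="/proc/sys/kernel/randomize_va_space"):
    """Return a short description of the ASLR setting stored at path."""
    try:
        value = int(Path(path).read_text().strip())
    except (OSError, ValueError):
        return "unknown (file missing or unreadable)"
    return ASLR_MODES.get(value, f"unrecognized value {value}")


print(describe_aslr())
```

Running it on both the failing and the working machine gives a quick answer to the "same sysctl settings?" question for this one knob.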




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#42832; Package emacs. (Wed, 12 Aug 2020 21:55:01 GMT)

Message #44 received at 42832 <at> debbugs.gnu.org:

From: Lars Ingebrigtsen <larsi <at> gnus.org>
To: Pip Cet <pipcet <at> gmail.com>
Cc: 42832 <at> debbugs.gnu.org, Paul Eggert <eggert <at> cs.ucla.edu>
Subject: Re: bug#42832: 28.0.50; "Bus error" when compiling Emacs now on
 Debian bullseye
Date: Wed, 12 Aug 2020 23:54:24 +0200
Pip Cet <pipcet <at> gmail.com> writes:

> Same sysctl settings, too? In particular, address randomization
> appears to be enabled, does this also happen if you disable it (echo 0
> | sudo tee /proc/sys/kernel/randomize_va_space) ?

Tried that now and rebuilt -- bus error in the same place, I think:

#0  raise (sig=<optimized out>) at ../sysdeps/unix/sysv/linux/raise.c:50
#1  0x0000555555597ad5 in terminate_due_to_signal
    (sig=sig <at> entry=7, backtrace_limit=backtrace_limit <at> entry=40) at emacs.c:408
#2  0x0000555555597f6b in handle_fatal_signal (sig=sig <at> entry=7)
    at sysdep.c:1782
#3  0x0000555555692d9d in deliver_thread_signal
    (sig=7, handler=0x555555597f60 <handle_fatal_signal>) at sysdep.c:1756
#4  0x0000555555692e89 in deliver_fatal_thread_signal (sig=<optimized out>)
    at sysdep.c:1794
#5  0x00007ffff5726140 in <signal handler called> ()
    at /lib/x86_64-linux-gnu/libpthread.so.0
#6  vector_marked_p (v=0xc000000018000000) at alloc.c:3859
#7  mark_object (arg=<optimized out>) at alloc.c:6607
#8  0x00005555556d6d7e in mark_vectorlike (header=0x555555c34f10)
    at alloc.c:6280
#9  0x00005555556d6d7e in mark_vectorlike (header=header <at> entry=0x7ffff19a41a8)
    at alloc.c:6280
#10 0x00005555556d668c in mark_hash_table (ptr=0x7ffff19a41a8) at alloc.c:6651
#11 mark_object (arg=<optimized out>) at alloc.c:6651
#12 0x00005555556d6ce7 in mark_memory (end=<optimized out>, 
    end <at> entry=0x7ffffffee950, start=<optimized out>) at alloc.c:4842


-- 
(domestic pets only, the antidote for overdose, milk.)
   bloggy blog: http://lars.ingebrigtsen.no




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#42832; Package emacs. (Wed, 12 Aug 2020 22:01:01 GMT)

Message #47 received at 42832 <at> debbugs.gnu.org:

From: Lars Ingebrigtsen <larsi <at> gnus.org>
To: Pip Cet <pipcet <at> gmail.com>
Cc: 42832 <at> debbugs.gnu.org, Paul Eggert <eggert <at> cs.ucla.edu>
Subject: Re: bug#42832: 28.0.50; "Bus error" when compiling Emacs now on
 Debian bullseye
Date: Thu, 13 Aug 2020 00:00:39 +0200
Pip Cet <pipcet <at> gmail.com> writes:

>> So for all I know, there is some kind of very strange hardware error on
>> that machine...  although that's looking kinda unlikely now.
>
> I'm thinking it might "simply" be a very timing-sensitive issue, which
> would exonerate me :-)

:-)

I think we should just leave it as is for now, and see whether anybody
else sees this problem, too (and perhaps we'll find some commonalities
between the setups).  I'll just switch my test builds to a different
machine for now.

-- 
(domestic pets only, the antidote for overdose, milk.)
   bloggy blog: http://lars.ingebrigtsen.no




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#42832; Package emacs. (Thu, 13 Aug 2020 10:07:02 GMT)

Message #50 received at 42832 <at> debbugs.gnu.org:

From: Pip Cet <pipcet <at> gmail.com>
To: Lars Ingebrigtsen <larsi <at> gnus.org>
Cc: 42832 <at> debbugs.gnu.org, Paul Eggert <eggert <at> cs.ucla.edu>
Subject: Re: bug#42832: 28.0.50; "Bus error" when compiling Emacs now on
 Debian bullseye
Date: Thu, 13 Aug 2020 10:05:57 +0000
On Wed, Aug 12, 2020 at 9:54 PM Lars Ingebrigtsen <larsi <at> gnus.org> wrote:
> Pip Cet <pipcet <at> gmail.com> writes:
>
> > Same sysctl settings, too? In particular, address randomization
> > appears to be enabled, does this also happen if you disable it (echo 0
> > | sudo tee /proc/sys/kernel/randomize_va_space) ?
>
> Tried that now and rebuilt -- bus error in the same place, I think:
>
> #0  raise (sig=<optimized out>) at ../sysdeps/unix/sysv/linux/raise.c:50
> #1  0x0000555555597ad5 in terminate_due_to_signal
>     (sig=sig <at> entry=7, backtrace_limit=backtrace_limit <at> entry=40) at emacs.c:408
> #2  0x0000555555597f6b in handle_fatal_signal (sig=sig <at> entry=7)
>     at sysdep.c:1782
> #3  0x0000555555692d9d in deliver_thread_signal
>     (sig=7, handler=0x555555597f60 <handle_fatal_signal>) at sysdep.c:1756
> #4  0x0000555555692e89 in deliver_fatal_thread_signal (sig=<optimized out>)
>     at sysdep.c:1794
> #5  0x00007ffff5726140 in <signal handler called> ()
>     at /lib/x86_64-linux-gnu/libpthread.so.0
> #6  vector_marked_p (v=0xc000000018000000) at alloc.c:3859
> #7  mark_object (arg=<optimized out>) at alloc.c:6607
> #8  0x00005555556d6d7e in mark_vectorlike (header=0x555555c34f10)
>     at alloc.c:6280

I'm trying to reproduce your build environment vaguely, and while the
addresses don't match up perfectly that does indeed appear to be an
eagerly-rehashed hash table's ->hash vector.

This is a shot in the dark, but in my case, the table containing that
address is Vdbus_registered_objects_table. Can you check whether
that's true in your case, too? Something like "p
globals.f_Vdbus_registered_objects_table" in gdb (using the core dump
should be fine) should either produce 0x555555c34f15 or something
else. If it is dbus, can you try compiling without it?
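
A note on why the expected value ends in ...f15 while the backtrace shows header=0x555555c34f10: a Lisp_Object pointer carries a type tag in its low bits. The sketch below assumes the usual x86-64 build with three low tag bits and a vectorlike tag of 5, as defined in lisp.h around this time; the function name is made up for this example, and the numbers should be read as an illustration of the arithmetic only.

```python
# Assumed from lisp.h for a default x86-64 build (USE_LSB_TAG):
GCTYPEBITS = 3                      # number of low bits used for the type tag
TAG_MASK = (1 << GCTYPEBITS) - 1    # 0b111
LISP_VECTORLIKE = 5                 # tag value for vectorlike objects


def split_lisp_object(word):
    """Split a tagged Lisp_Object word into (untagged address, tag)."""
    return word & ~TAG_MASK, word & TAG_MASK


address, tag = split_lisp_object(0x555555c34f15)
print(hex(address), tag, tag == LISP_VECTORLIKE)
```

So a gdb "p" of a Lisp_Object that refers to the vectorlike object at 0x555555c34f10 prints 0x555555c34f15. Applied to the value reported later in the thread, 0x7ffff19e6665, the same function gives address 0x7ffff19e6660 with tag 5.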




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#42832; Package emacs. (Thu, 13 Aug 2020 10:13:02 GMT)

Message #53 received at 42832 <at> debbugs.gnu.org:

From: Lars Ingebrigtsen <larsi <at> gnus.org>
To: Pip Cet <pipcet <at> gmail.com>
Cc: 42832 <at> debbugs.gnu.org, Paul Eggert <eggert <at> cs.ucla.edu>
Subject: Re: bug#42832: 28.0.50; "Bus error" when compiling Emacs now on
 Debian bullseye
Date: Thu, 13 Aug 2020 12:12:08 +0200
Pip Cet <pipcet <at> gmail.com> writes:

> This is a shot in the dark, but in my case, the table containing that
> address is Vdbus_registered_objects_table. Can you check whether
> that's true in your case, too? Something like "p
> globals.f_Vdbus_registered_objects_table" in gdb (using the core dump
> should be fine) should either produce 0x555555c34f15 or something
> else. If it is dbus, can you try compiling without it?

Let's see...

(gdb) p globals.f_Vdbus_registered_objects_table
$1 = (Lisp_Object) 0x7ffff19e6665

I'll try compiling without dbus and see what happens.

-- 
(domestic pets only, the antidote for overdose, milk.)
   bloggy blog: http://lars.ingebrigtsen.no




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#42832; Package emacs. (Thu, 13 Aug 2020 10:17:01 GMT) Full text and rfc822 format available.

Message #56 received at 42832 <at> debbugs.gnu.org (full text, mbox):

From: Lars Ingebrigtsen <larsi <at> gnus.org>
To: Pip Cet <pipcet <at> gmail.com>
Cc: 42832 <at> debbugs.gnu.org, Paul Eggert <eggert <at> cs.ucla.edu>
Subject: Re: bug#42832: 28.0.50; "Bus error" when compiling Emacs now on
 Debian bullseye
Date: Thu, 13 Aug 2020 12:15:50 +0200
Lars Ingebrigtsen <larsi <at> gnus.org> writes:

> I'll try compiling without dbus and see what happens.

With ./configure --without-dbus, "make bootstrap" doesn't error out for
me.

-- 
(domestic pets only, the antidote for overdose, milk.)
   bloggy blog: http://lars.ingebrigtsen.no




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#42832; Package emacs. (Thu, 13 Aug 2020 14:09:02 GMT) Full text and rfc822 format available.

Message #59 received at 42832 <at> debbugs.gnu.org (full text, mbox):

From: Pip Cet <pipcet <at> gmail.com>
To: Lars Ingebrigtsen <larsi <at> gnus.org>
Cc: 42832 <at> debbugs.gnu.org, Paul Eggert <eggert <at> cs.ucla.edu>
Subject: Re: bug#42832: 28.0.50; "Bus error" when compiling Emacs now on
 Debian bullseye
Date: Thu, 13 Aug 2020 14:08:12 +0000
On Thu, Aug 13, 2020 at 10:15 AM Lars Ingebrigtsen <larsi <at> gnus.org> wrote:
> Lars Ingebrigtsen <larsi <at> gnus.org> writes:
> > I'll try compiling without dbus and see what happens.
>
> With ./configure --without-dbus, "make bootstrap" doesn't error out for
> me.

So even though the hash table wasn't the dbus hash table, omitting the
dbus code somehow avoids the problem? Odd.

All that sounds to me like we ought to dig down into the core file and
figure out what happened, since the issue is likely to remain present
otherwise and it seems somewhat difficult to track down and reproduce.

The other odd thing is the value 0xc000000018000000. It looks like a
GC-marked pseudovector header, but I've checked and can't find
anything that would generate a PVEC_COMPILED of length 0, which would
be a severe bug.

Can you find out which hash table lives at 0x7ffff19a41a8? I'd suggest
something like "find &globals,&globals+1,0x7ffff19a41ad" to get the
offset in globals, if it is a global variable, then looking it up with
"ptype/o globals".

(If you don't have the time, I'd be happy to look at the core file
myself, if we can arrange that).




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#42832; Package emacs. (Fri, 14 Aug 2020 11:50:02 GMT) Full text and rfc822 format available.

Message #62 received at 42832 <at> debbugs.gnu.org (full text, mbox):

From: Lars Ingebrigtsen <larsi <at> gnus.org>
To: Pip Cet <pipcet <at> gmail.com>
Cc: 42832 <at> debbugs.gnu.org, Paul Eggert <eggert <at> cs.ucla.edu>
Subject: Re: bug#42832: 28.0.50; "Bus error" when compiling Emacs now on
 Debian bullseye
Date: Fri, 14 Aug 2020 13:48:56 +0200
Pip Cet <pipcet <at> gmail.com> writes:

> Can you find out which hash table lives at 0x7ffff19a41a8? I'd suggest
> something like "find &globals,&globals+1,0x7ffff19a41ad" to get the
> offset in globals, if it is a global variable, then looking it up with
> "ptype/o globals".

That's the value from mark_vectorlike?  It's moved a bit:

#9  0x00005555556d6d7e in mark_vectorlike (header=header <at> entry=0x7ffff19a4190)
    at alloc.c:6280

But it says:

(gdb) find &globals,&globals+1,0x7ffff19a4190
Pattern not found.

> (If you don't have the time, I'd be happy to look at the core file
> myself, if we can arrange that).

The machine is unfortunately deep inside my private network, so there's
no easy way to allow ssh to it...

-- 
(domestic pets only, the antidote for overdose, milk.)
   bloggy blog: http://lars.ingebrigtsen.no




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#42832; Package emacs. (Fri, 14 Aug 2020 12:06:01 GMT) Full text and rfc822 format available.

Message #65 received at 42832 <at> debbugs.gnu.org (full text, mbox):

From: Pip Cet <pipcet <at> gmail.com>
To: Lars Ingebrigtsen <larsi <at> gnus.org>
Cc: 42832 <at> debbugs.gnu.org, Paul Eggert <eggert <at> cs.ucla.edu>
Subject: Re: bug#42832: 28.0.50; "Bus error" when compiling Emacs now on
 Debian bullseye
Date: Fri, 14 Aug 2020 12:05:08 +0000
On Fri, Aug 14, 2020 at 11:49 AM Lars Ingebrigtsen <larsi <at> gnus.org> wrote:
> Pip Cet <pipcet <at> gmail.com> writes:
> > Can you find out which hash table lives at 0x7ffff19a41a8? I'd suggest
> > something like "find &globals,&globals+1,0x7ffff19a41ad" to get the
> > offset in globals, if it is a global variable, then looking it up with
> > "ptype/o globals".
>
> That's the value from mark_vectorlike?  It's moved a bit:

That's strange, but possible if non-reproducible things happen on the dbus...

> #9  0x00005555556d6d7e in mark_vectorlike (header=header <at> entry=0x7ffff19a4190)
>     at alloc.c:6280
>
> But it says:
>
> (gdb) find &globals,&globals+1,0x7ffff19a4190
> Pattern not found.

It would probably be 0x7ffff19a4195 that we'd be looking for, stored
as a tagged pointer, but it's possible it's not a global variable at
all, of course.

> > (If you don't have the time, I'd be happy to look at the core file
> > myself, if we can arrange that).
>
> The machine is unfortunately deep inside my private network, so there's
> no easy way to allow ssh to it...

If you do have a machine that could serve files, it'd be the core file
and the corresponding emacs executable that would be most interesting.
I expect the core file to be rather large, though.




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#42832; Package emacs. (Fri, 14 Aug 2020 12:35:02 GMT) Full text and rfc822 format available.

Message #68 received at 42832 <at> debbugs.gnu.org (full text, mbox):

From: Lars Ingebrigtsen <larsi <at> gnus.org>
To: Pip Cet <pipcet <at> gmail.com>
Cc: 42832 <at> debbugs.gnu.org, Paul Eggert <eggert <at> cs.ucla.edu>
Subject: Re: bug#42832: 28.0.50; "Bus error" when compiling Emacs now on
 Debian bullseye
Date: Fri, 14 Aug 2020 14:34:44 +0200
Pip Cet <pipcet <at> gmail.com> writes:

> If you do have a machine that could serve files, it'd be the core file
> and the corresponding emacs executable that would be most interesting.
> I expect the core file to be rather large, though.

Sure, I put them at:

https://quimby.gnus.org/s/emacs
https://quimby.gnus.org/s/core

-- 
(domestic pets only, the antidote for overdose, milk.)
   bloggy blog: http://lars.ingebrigtsen.no




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#42832; Package emacs. (Fri, 14 Aug 2020 14:26:01 GMT) Full text and rfc822 format available.

Message #71 received at 42832 <at> debbugs.gnu.org (full text, mbox):

From: Pip Cet <pipcet <at> gmail.com>
To: Lars Ingebrigtsen <larsi <at> gnus.org>
Cc: 42832 <at> debbugs.gnu.org, Paul Eggert <eggert <at> cs.ucla.edu>
Subject: Re: bug#42832: 28.0.50; "Bus error" when compiling Emacs now on
 Debian bullseye
Date: Fri, 14 Aug 2020 14:24:48 +0000
On Thu, Aug 13, 2020 at 2:08 PM Pip Cet <pipcet <at> gmail.com> wrote:
> All that sounds to me like we ought to dig down into the core file and
> figure out what happened, since the issue is likely to remain present
> otherwise and it seems somewhat difficult to track down and reproduce.

I have a theory, and it sounds like a somewhat silly bug.

- there's a hash table h in the dumper image
- h->hash points to dynamically allocated storage (as it always does
after my patch)
- the last reference to the hash table dies
- garbage_collect is called and collects h->hash
- h->hash's storage is reallocated for a different vector with a
different start position
- a word (re)appears on the stack which looks like it's a pointer to h
(it isn't, actually)
- garbage_collect is called and calls mark_maybe_pointer(h)
- h is recognized as a pdumper object
- h->hash is marked
- we're now marking a word in the middle of the new vector that
occupies the space that h->hash used to occupy
- in our case, this word is 0xc000000018000005, which is interpreted
as a tagged pointer, dereferencing of which leads to SIGBUS

Is there something which I'm missing which would prevent this scenario?

If not, any ideas on how to fix it? The obvious fix would be to always
mark all pdumped objects, but that has a performance cost. Less
obvious would be clearing the memory in the pdumper image that belongs
to an object that's being "freed", or keeping track of which pdumper
objects are still valid after GC...




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#42832; Package emacs. (Fri, 14 Aug 2020 15:03:02 GMT) Full text and rfc822 format available.

Message #74 received at 42832 <at> debbugs.gnu.org (full text, mbox):

From: Pip Cet <pipcet <at> gmail.com>
To: Lars Ingebrigtsen <larsi <at> gnus.org>
Cc: 42832 <at> debbugs.gnu.org, Paul Eggert <eggert <at> cs.ucla.edu>
Subject: Re: bug#42832: 28.0.50; "Bus error" when compiling Emacs now on
 Debian bullseye
Date: Fri, 14 Aug 2020 15:01:44 +0000
On Fri, Aug 14, 2020 at 2:24 PM Pip Cet <pipcet <at> gmail.com> wrote:
> If not, any ideas on how to fix it? The obvious fix would be to always
> mark all pdumped objects, but that has a performance cost. Less
> obvious would be clearing the memory in the pdumper image that belongs
> to an object that's being "freed", or keeping track of which pdumper
> objects are still valid after GC...

I've gone with the last idea. This patch should fix things, though
given how difficult the bug is to trigger reliably it might also
merely appear to fix it...
[0001-Try-to-avoid-marking-zombie-pdumper-objects.patch (text/x-patch, attachment)]

Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#42832; Package emacs. (Fri, 14 Aug 2020 15:38:01 GMT) Full text and rfc822 format available.

Message #77 received at 42832 <at> debbugs.gnu.org (full text, mbox):

From: Lars Ingebrigtsen <larsi <at> gnus.org>
To: Pip Cet <pipcet <at> gmail.com>
Cc: 42832 <at> debbugs.gnu.org, Paul Eggert <eggert <at> cs.ucla.edu>
Subject: Re: bug#42832: 28.0.50; "Bus error" when compiling Emacs now on
 Debian bullseye
Date: Fri, 14 Aug 2020 17:37:42 +0200
Pip Cet <pipcet <at> gmail.com> writes:

> I've gone with the last idea. This patch should fix things,

It does!  With that patch, I'm not able to reproduce the bug.

> though given how difficult the bug is to trigger reliably it might
> also merely appear to fix it...

:-/

-- 
(domestic pets only, the antidote for overdose, milk.)
   bloggy blog: http://lars.ingebrigtsen.no




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#42832; Package emacs. (Fri, 14 Aug 2020 19:10:02 GMT) Full text and rfc822 format available.

Message #80 received at 42832 <at> debbugs.gnu.org (full text, mbox):

From: Pip Cet <pipcet <at> gmail.com>
To: Lars Ingebrigtsen <larsi <at> gnus.org>
Cc: 42832 <at> debbugs.gnu.org, Paul Eggert <eggert <at> cs.ucla.edu>
Subject: Re: bug#42832: 28.0.50; "Bus error" when compiling Emacs now on
 Debian bullseye
Date: Fri, 14 Aug 2020 19:08:56 +0000
On Fri, Aug 14, 2020 at 3:37 PM Lars Ingebrigtsen <larsi <at> gnus.org> wrote:
> Pip Cet <pipcet <at> gmail.com> writes:
> > I've gone with the last idea. This patch should fix things,
> It does!  With that patch, I'm not able to reproduce the bug.

Oops. There was a bug in the patch which would have resulted in assert
failures had I tested it with assertions. Fixed version attached.

The crash was looking at a hash table created, most likely, by
cl--generic-get-dispatcher; so that makes my theory sound more
plausible, too.
[0001-Try-to-avoid-marking-zombie-pdumper-objects.patch (text/x-patch, attachment)]

Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#42832; Package emacs. (Fri, 14 Aug 2020 19:36:02 GMT) Full text and rfc822 format available.

Message #83 received at 42832 <at> debbugs.gnu.org (full text, mbox):

From: Lars Ingebrigtsen <larsi <at> gnus.org>
To: Pip Cet <pipcet <at> gmail.com>
Cc: 42832 <at> debbugs.gnu.org, Paul Eggert <eggert <at> cs.ucla.edu>
Subject: Re: bug#42832: 28.0.50; "Bus error" when compiling Emacs now on
 Debian bullseye
Date: Fri, 14 Aug 2020 21:35:01 +0200
Pip Cet <pipcet <at> gmail.com> writes:

> Oops. There was a bug in the patch which would have resulted in assert
> failures had I tested it with assertions. Fixed version attached.

I can confirm that this version of the patch also makes the bus error go
away.

But, curiously enough, doing a "git pull" also makes the bus error go
away.  Rewinding git back 24 hours brought the bus error back again, and
I applied the patch to that version, and that made the bus error go
away.

So something has been committed over the last 24 hours resulting in me
no longer being able to reproduce the bug.  

-- 
(domestic pets only, the antidote for overdose, milk.)
   bloggy blog: http://lars.ingebrigtsen.no




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#42832; Package emacs. (Fri, 14 Aug 2020 21:11:02 GMT) Full text and rfc822 format available.

Message #86 received at 42832 <at> debbugs.gnu.org (full text, mbox):

From: Lars Ingebrigtsen <larsi <at> gnus.org>
To: Pip Cet <pipcet <at> gmail.com>
Cc: 42832 <at> debbugs.gnu.org, Paul Eggert <eggert <at> cs.ucla.edu>
Subject: Re: bug#42832: 28.0.50; "Bus error" when compiling Emacs now on
 Debian bullseye
Date: Fri, 14 Aug 2020 23:10:21 +0200
I got a segfault again!

This time on FreeBSD (I'm working on setting up some VMs to do some
testing on non-Debian systems), and in a different place:

  ELC      foldout.elc
  ELC      follow.elc
gmake[3]: *** [Makefile:295: find-cmd.elc] Segmentation fault (core dumped)
gmake[3]: *** Waiting for unfinished jobs....
gmake[3]: Leaving directory '/usr/home/larsi/src/emacs/trunk/lisp'
gmake[2]: *** [Makefile:318: compile-main] Error 2
gmake[2]: Leaving directory '/usr/home/larsi/src/emacs/trunk/lisp'
gmake[1]: *** [Makefile:411: lisp] Error 2
gmake[1]: Leaving directory '/usr/home/larsi/src/emacs/trunk'
gmake: *** [GNUmakefile:93: default] Error 2

Backtrace:

#0  0x00000008023ec1ba in thr_kill () at /lib/libc.so.7
#1  0x00000008023ea5e4 in raise () at /lib/libc.so.7
#2  0x000000000041f5a4 in terminate_due_to_signal
    (sig=sig <at> entry=11, backtrace_limit=backtrace_limit <at> entry=40) at emacs.c:408
#3  0x000000000041fa0c in handle_fatal_signal (sig=sig <at> entry=11)
    at sysdep.c:1782
#4  0x00000000005170c0 in deliver_thread_signal
    (sig=sig <at> entry=11, handler=0x41fa01 <handle_fatal_signal>) at sysdep.c:1756
#5  0x0000000000517126 in deliver_fatal_thread_signal (sig=11) at sysdep.c:1879
#6  handle_sigsegv (sig=11, siginfo=<optimized out>, arg=<optimized out>)
    at sysdep.c:1879
#7  0x00000008014f03ce in  () at /lib/libthr.so.3
#8  0x00000008014ef98f in  () at /lib/libthr.so.3
#9  0x00007ffffffff193 in <signal handler called> ()
#10 0x0000000000558ca2 in cons_marked_p (c=0x50) at pdumper.h:148
#11 mark_object (arg=<optimized out>) at alloc.c:6733
#12 0x000000000055969e in mark_vectorlike (header=0x8043fb620) at alloc.c:6280
#13 0x000000000055969e in mark_vectorlike (header=header <at> entry=0x803d112d8)
    at alloc.c:6280
#14 0x0000000000558fe8 in mark_hash_table (ptr=0x803d112d8) at alloc.c:6651
#15 mark_object (arg=<optimized out>) at alloc.c:6651
#16 0x000000000055960f in mark_memory (end=<optimized out>, 
    end <at> entry=0x7ffffffeea60, start=<optimized out>) at alloc.c:4842

And, again, your zombie patch makes the bug disappear.

So I guess it's time to commit it?  :-)

-- 
(domestic pets only, the antidote for overdose, milk.)
   bloggy blog: http://lars.ingebrigtsen.no





Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#42832; Package emacs. (Fri, 14 Aug 2020 21:50:01 GMT) Full text and rfc822 format available.

Message #89 received at 42832 <at> debbugs.gnu.org (full text, mbox):

From: Paul Eggert <eggert <at> cs.ucla.edu>
To: Pip Cet <pipcet <at> gmail.com>, Lars Ingebrigtsen <larsi <at> gnus.org>
Cc: 42832 <at> debbugs.gnu.org
Subject: Re: bug#42832: 28.0.50; "Bus error" when compiling Emacs now on
 Debian bullseye
Date: Fri, 14 Aug 2020 14:48:53 -0700
Thanks for working on this and writing a fix for this obscure bug.

A couple of minor thoughts on the patch.

We can avoid copying bitmaps by swapping pointers to them.

There are now two opportunities for calloc to fail, and we leak memory if the 
second one fails. A simple fix is to call calloc once and split the result in half.

Proposed patch attached.
[0001-Fix-bus-error-on-Debian-bullseye.patch (text/x-patch, attachment)]

Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#42832; Package emacs. (Fri, 14 Aug 2020 22:26:02 GMT) Full text and rfc822 format available.

Message #92 received at 42832 <at> debbugs.gnu.org (full text, mbox):

From: Pip Cet <pipcet <at> gmail.com>
To: Paul Eggert <eggert <at> cs.ucla.edu>
Cc: Lars Ingebrigtsen <larsi <at> gnus.org>, 42832 <at> debbugs.gnu.org
Subject: Re: bug#42832: 28.0.50; "Bus error" when compiling Emacs now on
 Debian bullseye
Date: Fri, 14 Aug 2020 22:25:14 +0000
On Fri, Aug 14, 2020 at 9:48 PM Paul Eggert <eggert <at> cs.ucla.edu> wrote:
> Thanks for working on this and writing a fix for this obscure bug.

Thank you for having a look at it.

> Proposed patch attached.

LGTM, with one very minor nit:

(pdumper_find_object_type_impl): Return PDUMPER_NO_OBJECT if
the last_mark_bits’ bit is set.

It's actually if it's unset that we return PDUMPER_NO_OBJECT.




Reply sent to Paul Eggert <eggert <at> cs.ucla.edu>:
You have taken responsibility. (Fri, 14 Aug 2020 22:53:02 GMT) Full text and rfc822 format available.

Notification sent to Lars Ingebrigtsen <larsi <at> gnus.org>:
bug acknowledged by developer. (Fri, 14 Aug 2020 22:53:02 GMT) Full text and rfc822 format available.

Message #97 received at 42832-done <at> debbugs.gnu.org (full text, mbox):

From: Paul Eggert <eggert <at> cs.ucla.edu>
To: Pip Cet <pipcet <at> gmail.com>
Cc: Lars Ingebrigtsen <larsi <at> gnus.org>, 42832-done <at> debbugs.gnu.org
Subject: Re: bug#42832: 28.0.50; "Bus error" when compiling Emacs now on
 Debian bullseye
Date: Fri, 14 Aug 2020 15:52:49 -0700
On 8/14/20 3:25 PM, Pip Cet wrote:
> It's actually if it's unset that we return PDUMPER_NO_OBJECT.

I fixed that, installed, and am marking the bug as done. Thanks very much.




bug archived. Request was from Debbugs Internal Request <help-debbugs <at> gnu.org> to internal_control <at> debbugs.gnu.org. (Sat, 12 Sep 2020 11:24:05 GMT) Full text and rfc822 format available.

This bug report was last modified 4 years and 278 days ago.
