GNU bug report logs - #57789
Emacs 28.1 clone build with native compilation crashes on s390x

Package: emacs;

Reported by: Rob Browning <rlb <at> defaultvalue.org>

Date: Wed, 14 Sep 2022 01:05:01 UTC

Severity: normal

Tags: moreinfo

From: Pip Cet <pipcet <at> protonmail.com>
To: Rob Browning <rlb <at> defaultvalue.org>
Cc: gerd.moellmann <at> gmail.com, 57789 <at> debbugs.gnu.org, Eli Zaretskii <eliz <at> gnu.org>, Andrea Corallo <acorallo <at> gnu.org>, Stefan Kangas <stefankangas <at> gmail.com>
Subject: bug#57789: Emacs 28.1 clone build with native compilation crashes on s390x
Date: Fri, 03 Jan 2025 18:57:55 +0000

"Rob Browning" <rlb <at> defaultvalue.org> writes:

> Stefan Kangas <stefankangas <at> gmail.com> writes:
>
>> Thanks.  I guess not a lot of us have access to an s390x machine, so I
>> don't think anyone has been able to test it.
>
> Hmm, I think I've heard there may be (or were?) some public instances
> that provide short-term dev access, but have never looked into it.

I have cfarm access, but cfarm doesn't have an s390 machine :-(

> I was also going to outline an easy way to test in a vm at least on a
> Debian system via debvm/mmdebstrap, but after doing that, I wasn't able
> to reproduce the problem there.  (Happy to provide instructions for
> anyone interested, otherwise.)

Same compiler?  Is ASLR in use? In any case, I'm always interested in
weird machines, even if they're virtual, so I'd appreciate such
instructions.

> In any case, I just tried both the current Debian package and an
> upstream emacs-29.4 checkout on zelenka.debian.org, and both fail.
>
> The emacs-29.4 tree fails like this:
>
> make[3]: Entering directory '/home/rlb/emacs/admin/unidata'
> make[3]: Nothing to be done for 'charscript.el'.
> make[3]: Leaving directory '/home/rlb/emacs/admin/unidata'
> make -C ../admin/unidata emoji-zwj.el
> make[3]: Entering directory '/home/rlb/emacs/admin/unidata'
> make[3]: Nothing to be done for 'emoji-zwj.el'.
> make[3]: Leaving directory '/home/rlb/emacs/admin/unidata'
>   ELC+ELN  ../lisp/emacs-lisp/eldoc.elc
>
> Error: wrong-type-argument ("../lisp/emacs-lisp/eldoc.el" hash-table-p
> [unbound unbound unbound unbound unbound unbound unbound unbound
> unbound unbound unbound unbound unbound unbound unbound unbound
> unbound unbound unbound unbound unbound unbound unbound unbound
> unbound unbound unbound unbound unbound unbound unbound unbound
> unbound unbound unbound unbound unbound unbound unbound unbound
> unbound unbound unbound unbound unbound unbound unbound unbound
> unbound unbound unbound unbound unbound unbound unbound unbound
> unbound unbound unbound unbound unbound unbound unbound unbound
> unbound unbound unbound unbound unbound unbound unbound unbound
> unbound unbound unbound unbound unbound unbound unbound unbound
> unbound unbound unbound unbound unbound unbound unbound unbound
> unbound unbound unbound unbound unbound unbound unbound unbound
> unbound unbound unbound unbound unbound unbound unbound unbound
> unbound unbound unbound unbound unbound unbound unbound unbound
> unbound unbound unbound unbound unbound unbound unbound unbound
> unbound unbound unbound unbound unbound unbound unbound unbound
> unbound unbound])
> Fatal error 11: Segmentation fault

Two random guesses:

1. purespace overflow.  This causes erratic behavior of pretty much
every description.  The tell-tale sign would be a "Pure Lisp storage
overflowed" message at some point in the "make bootstrap" log, maybe a
very long time before we crash.

2. GC problem.  One possible problem is that Emacs currently relies on
__builtin_unwind_init to do the right thing.  If __builtin_unwind_init
isn't implemented on s390, but is necessary (the second part is very
likely), we'll fail to mark some objects on the stack.
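
The check for guess (1) could be sketched as follows. This is a minimal example that assumes the "make bootstrap" output was saved to a file; the function name find_pure_overflow is hypothetical and only the quoted message text comes from the description above.

```python
from typing import Iterable, List, Tuple

# Message text named in guess (1); the rest of the log line is not assumed.
PURE_OVERFLOW_MARKER = "Pure Lisp storage overflowed"


def find_pure_overflow(lines: Iterable[str]) -> List[Tuple[int, str]]:
    """Return (line_number, text) for every log line containing the marker."""
    hits = []
    for number, line in enumerate(lines, start=1):
        if PURE_OVERFLOW_MARKER in line:
            hits.append((number, line.rstrip("\n")))
    return hits
```

Passing an open log file to find_pure_overflow reports where the message appears, which matters because it may show up long before the crash.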

(2) seems more likely.
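
To illustrate why guess (2) would produce this kind of failure, here is a toy model of conservative marking. It is not Emacs code: the heap layout, the addresses, the register name "reg0", and both function names are made up for the illustration. The only point it shows is that an object whose sole reference lives in a register is missed unless the register contents reach the stack before scanning, which is the effect __builtin_unwind_init is relied on for.

```python
from typing import Dict, List, Set


def mark_from_stack(stack: List[int], heap: Dict[int, List[int]]) -> Set[int]:
    """Conservatively mark every heap object reachable from words on the stack."""
    marked: Set[int] = set()
    pending = [word for word in stack if word in heap]
    while pending:
        address = pending.pop()
        if address in marked:
            continue
        marked.add(address)
        pending.extend(child for child in heap[address] if child in heap)
    return marked


def spill_registers(stack: List[int], registers: Dict[str, int]) -> List[int]:
    """Model saving register contents to the stack before the scan."""
    return stack + list(registers.values())


# Toy heap: object address -> addresses it references (all values hypothetical).
heap = {0x1000: [0x2000], 0x2000: [], 0x3000: []}
stack = [0x1000, 42]          # 42 is a non-pointer word and is ignored
registers = {"reg0": 0x3000}  # object referenced only from a register

without_spill = mark_from_stack(stack, heap)
with_spill = mark_from_stack(spill_registers(stack, registers), heap)
print("without spill:", sorted(hex(a) for a in without_spill))
print("with spill:   ", sorted(hex(a) for a in with_spill))
```

In the model, the object at 0x3000 is only marked in the second run. A real collector that missed it would free it while it is still in use.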

> Backtrace:
> ../src/bootstrap-emacs(emacs_backtrace+0x46) [0x2aa1c2f12f6]
> ../src/bootstrap-emacs(terminate_due_to_signal+0x9e) [0x2aa1c18fb76]
> ../src/bootstrap-emacs(+0x8fdde) [0x2aa1c18fdde]
> ../src/bootstrap-emacs(+0x1ef45a) [0x2aa1c2ef45a]
> ../src/bootstrap-emacs(+0x1ef4a2) [0x2aa1c2ef4a2]
> linux-vdso64.so.1(__kernel_rt_sigreturn+0x0) [0x3ffdc0e5480]
> ../src/bootstrap-emacs(+0x2433a4) [0x2aa1c3433a4]
> ../src/bootstrap-emacs(visit_static_gc_roots+0x196) [0x2aa1c342dae]
> ../src/bootstrap-emacs(garbage_collect+0x1e6) [0x2aa1c3445d6]
> ../src/bootstrap-emacs(eval_sub+0x54c) [0x2aa1c370244]
> ../src/bootstrap-emacs(eval_sub+0x4ac) [0x2aa1c3701a4]
> ../src/bootstrap-emacs(Fcond+0x84) [0x2aa1c3711f4]
> ../src/bootstrap-emacs(eval_sub+0x8d2) [0x2aa1c3705ca]
> ../src/bootstrap-emacs(Fwhile+0x6e) [0x2aa1c370fb6]

Can you disassemble the Fwhile, eval_sub, and visit_static_gc_roots
functions?  I assume s390 disassembled code isn't too hard to read...
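
As a possible aid for locating those functions, the backtrace lines can be turned into function start addresses by subtracting each offset from the bracketed address. The sketch below assumes that a frame without a symbol name, such as "(+0x8fdde)", gives an offset from the load base of bootstrap-emacs; that is an assumption about the backtrace format, not something confirmed in this report. The names Frame, FRAME_RE, and parse_frame are hypothetical.

```python
import re
from typing import NamedTuple, Optional


class Frame(NamedTuple):
    symbol: Optional[str]  # None when the frame has no symbol name
    offset: int            # offset from the symbol, or from the load base
    address: int           # absolute address printed in brackets


FRAME_RE = re.compile(r"\((\w*)\+0x([0-9a-f]+)\)\s+\[0x([0-9a-f]+)\]")


def parse_frame(line: str) -> Optional[Frame]:
    """Parse one backtrace line; return None if it is not a frame."""
    match = FRAME_RE.search(line)
    if match is None:
        return None
    symbol, offset, address = match.groups()
    return Frame(symbol or None, int(offset, 16), int(address, 16))


# Lines copied from the backtrace in this report.
lines = [
    "../src/bootstrap-emacs(+0x8fdde) [0x2aa1c18fdde]",
    "../src/bootstrap-emacs(visit_static_gc_roots+0x196) [0x2aa1c342dae]",
    "../src/bootstrap-emacs(eval_sub+0x54c) [0x2aa1c370244]",
    "../src/bootstrap-emacs(Fwhile+0x6e) [0x2aa1c370fb6]",
]
frames = [parse_frame(line) for line in lines]

# Assumed load base, derived from the frame that has no symbol name.
base = frames[0].address - frames[0].offset
for frame in frames[1:]:
    start = frame.address - frame.offset
    print(f"{frame.symbol}: start 0x{start:x}, offset in binary 0x{start - base:x}")
```

The three eval_sub frames in the backtrace all yield the same start address under this arithmetic, which is a quick consistency check on the parsing.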

Random aside: is 0x2aa1c3705ca a likely s390 program counter?  The
number looks familiar because it looks similar to a Lisp_Object
representing a symbol on x86-64 without ASLR (an example would be
0x2aaa8dac00e8).  I guess it's just a coincidence though.
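
One way to compare the two values is to look at their low bits. The sketch below assumes a tagging scheme with three low tag bits in which symbols carry tag 0; both constants are assumptions made for this illustration and should be checked against the lisp.h of the build in question.

```python
GCTYPEBITS = 3                    # assumption: three low tag bits
TAG_MASK = (1 << GCTYPEBITS) - 1
LISP_SYMBOL_TAG = 0               # assumption: symbols carry tag 0


def low_tag(word: int) -> int:
    """Return the low tag bits of a machine word."""
    return word & TAG_MASK


values = [
    ("x86-64 symbol example", 0x2AAA8DAC00E8),
    ("s390 backtrace address", 0x2AA1C3705CA),
]
for name, word in values:
    tag = low_tag(word)
    verdict = "symbol tag" if tag == LISP_SYMBOL_TAG else "not a symbol tag"
    print(f"{name}: 0x{word:x} -> tag {tag} ({verdict})")
```

Under those assumptions the backtrace address would not decode as a symbol, which fits the reading that the resemblance is a coincidence.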

Thanks!

Pip





This bug report was last modified 156 days ago.
