GNU bug report logs - #63288
30.0.50; Emacs 30 packages fail to build with native comp on some machines


Package: emacs;

Reported by: Brian Leung <leungbk <at> posteo.net>

Date: Fri, 5 May 2023 04:00:02 UTC

Severity: normal

Found in version 30.0.50

Done: Pip Cet <pipcet <at> protonmail.com>

Bug is archived. No further changes may be made.



Message #35 received at 63288 <at> debbugs.gnu.org (full text, mbox):

From: Pip Cet <pipcet <at> protonmail.com>
To: damien <at> merenne.me
Cc: Eli Zaretskii <eliz <at> gnu.org>, Andrea Corallo <acorallo <at> gnu.org>,
 63288 <at> debbugs.gnu.org
Subject: Re: bug#63288: 30.0.50;
 Emacs 30 packages fail to build with native comp on some machines
Date: Sat, 25 Jan 2025 17:26:17 +0000

<damien <at> merenne.me> writes:

> To reproduce, I simply have to start `emacs -Q` and eval
>
> ```
> (require 'package)
> (package-read-from-string "((emacs \"25.1\"))")
> ```
> in the scratch buffer:
>
> ```
> Debugger entered--Lisp error: (error "Can’t read whole string")
>   error("Can't read whole string")
>   package-read-from-string("((emacs \"25.1\"))")
>   (progn (package-read-from-string "((emacs \"25.1\"))"))
>   eval((progn (package-read-from-string "((emacs \"25.1\"))")) t)
>   elisp--eval-last-sexp(nil)
> ```
> Again, after re-evaluating the defun, it's working ok.
>
> I found something while messing around with recompiling the file.
> If I rebuild it manually, it works OK, but when I then rebuild the whole Emacs source tree, it fails again.
> So I found this: building with `make -j48` (I have a 24-core CPU) triggers the problem, while building
> with `make -j 1` does not. I attach the build.sh I used to configure the source tree, the configure and
> build logs, the elc file with the problem and a good version.
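
For anyone else checking whether their build is affected, the recipe above can be wrapped so that it reports a result instead of entering the debugger. This is only a sketch: `bug63288-check` is a made-up name for illustration, not something that exists in package.el.

```elisp
(require 'package)

(defun bug63288-check ()
  "Return `ok' if `package-read-from-string' reads a complete sexp.
Return (broken . ERROR-DATA) if it signals an error, as in the
backtrace quoted above.  Hypothetical helper, for illustration only."
  (condition-case err
      (progn
        ;; Same input as in the reproduction recipe.
        (package-read-from-string "((emacs \"25.1\"))")
        'ok)
    (error (cons 'broken err))))
```

Evaluating `(bug63288-check)` in `emacs -Q` after a `-j48` build and again after a `-j 1` build should show whether the two builds really differ.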

Ouch.  It's known that make -j produces different output sometimes
(defsubst gets inlined sometimes, sometimes it doesn't), but this is the
first time I've heard of an actual build breaking in identical settings
depending on it!

No "pure space overflow" message in the logs you sent, so it's probably
not that.

Few people have 24 cores, so it might just be that that particular
constellation results in a weird build order.

-(defalias 'package-read-from-string #[257 "\300!\211\242\243\3011\300\"\210\302\303!0\207\210\207" [read-from-string (end-of-file) error "Can't read whole string"] 7 "Read a Lisp expression from STR.\nSignal an error if the entire string was not used.\n\n(fn STR)"])
+(defalias 'package-read-from-string #[257 "\300!\211\242\243\3011\302\303!0\207\210\207" [read-from-string (end-of-file) error "Can't read whole string"] 6 "Read a Lisp expression from STR.\nSignal an error if the entire string was not used.\n\n(fn STR)"])

That seems to be the problematic code.  For some weird reason, I'm
seeing problems with either definition when running M-x disassemble.
That's very strange, but it might also be a local bug in my current
session.

Disassembling by reading the byte string is possible, but it'll take a
while.
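
Short of a full disassembly, a cheap first step is to find where two byte strings start to differ. Here is a minimal sketch; the helper name is made up, and the strings are toy values rather than the real ones from the .elc files.

```elisp
(defun bug63288-common-prefix-length (a b)
  "Return how many leading characters strings A and B have in common.
Hypothetical helper, for illustration only."
  (let ((r (compare-strings a nil nil b nil nil)))
    (if (eq r t)
        ;; The strings are identical.
        (length a)
      ;; On a mismatch, `compare-strings' returns a number whose
      ;; absolute value is one more than the count of matching
      ;; leading characters.
      (1- (abs r)))))

;; Toy example only: these two strings share their first three bytes.
(bug63288-common-prefix-length "\300\301\302\207" "\300\301\302\303\207")
```

In the two definitions above, the visible difference is the extra `\300\"\210` in the first one, so the offset this returns for the real strings would point at where that sequence starts.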

(I was going to suggest this might be a nativecomp bug, and that maybe
removing comp--type-check-optim from comp.el here:

(defconst comp-passes '(comp--spill-lap
                        comp--limplify
                        comp--fwprop
                        comp--call-optim
                        comp--ipa-pure
                        comp--add-cstrs
                        comp--fwprop
                        comp--type-check-optim
                        comp--tco
                        comp--fwprop
                        comp--remove-type-hints
                        comp--sanitizer
                        comp--compute-function-types
                        comp--final)
  "Passes to be executed in order.")

might help.  However, that seems less likely now so I really wouldn't
bother at this point.  Disassembling the byte code is the next logical
thing to do, and then it's more likely to be a byte-opt bug that only
happens in weird nativecomp setups because the byte compiler options
differ there.)

> I don't think this is memory corruption happening under load: I compile a big C++ code base
> daily and have never encountered any problem with the compiled binaries. Also, I'm not the only one with
> this exact problem, so... But who knows...

Unlikely to be a HW error.  It's reproducible, right?

Pip





This bug report was last modified 132 days ago.
