Package: emacs;
Reported by: Tom Gillespie <tgbugs <at> gmail.com>
Date: Fri, 4 Sep 2020 01:07:02 UTC
Severity: normal
Done: Andrea Corallo <akrl <at> sdf.org>
Bug is archived. No further changes may be made.
View this message in rfc822 format
From: Tom Gillespie <tgbugs <at> gmail.com> To: 43190 <at> debbugs.gnu.org Cc: Andrea Corallo <akrl <at> sdf.org> Subject: bug#43190: feature/native-comp; failure to compile json-mode leaves Emacs broken Date: Thu, 3 Sep 2020 18:06:00 -0700
Hi Andrea, Found another one. I don't know what is going wrong on the compiler side that induces it, but hopefully the repro can get you pointed in the right direction. Also sent to the bug tracker so it should show up there shortly. I have left the section on issues with kill-buffer in the report so that anyone encountering the effect of the bug won't have to spend the same amount of time I did figuring out what is happening. Best! Tom This issue occurs at 3023eb569213a3dd5183640f6e322acd00ea536a. It does not happen in emacs-27 nor on master at ae6daa680a5f5f5fb9c6a15296e5e88c97cd770a. When trying to load json-mode for the first time it fails to compile due to some issue in json-snatcher, the error that is displayed is json-mode.el:33:1: Error: Symbol’s function definition is void: jsons-remove-buffer I suspect that something goes wrong inside json-snatcher.el and the end result is that =(add-hook 'kill-buffer-hook 'jsons-remove-buffer)= on line json-snatcher.el:85 manages to run, but then the defun for that function is never called. The fact that that function cannot be found means that anything that calls kill-buffer will be broken, which includes a terrifyingly large portion of Emacs' debugging functionality (see footnote 1). On a bad day this also breaks things like the ability to exit Emacs via =C-x C-c=. Fortunately in the repro below this state is not triggered so you can at least exit. Just in case I included the escape hatches that can be evaluated using =C-x C-e= in the scratch buffer most relevantly =(pop kill-buffer-hook)=. Here is a simple reproduction that is hopefully enough to track down what is going wrong with native compile. I have not been able to produce a simple visible repro using --batch, but I think the problem is clear enough. #+begin_src bash [[ -d "/tmp/test/.emacs.d/" ]] && rm -r /tmp/test/.emacs.d/; \ emacs -q \ --eval "(setq user-emacs-directory \"/tmp/test/.emacs.d/\")" \ --eval "(require 'package)" \ --eval "(add-to-list 'package-archives '(\"melpa\" . \"https://melpa.org/packages/\") t)" \ --eval "(package-refresh-contents)" \ --eval "(package-install 'json-mode)" #+end_src The repro for emacs-27 without the --eval options. #+begin_src bash [[ -d "/tmp/test/.emacs.d/" ]] && rm -r /tmp/test/.emacs.d/; \ emacs-27 -q \ #+end_src The repro for the master branch without the --eval options. #+begin_src bash ./autogen.sh ./configure make -j 8 [[ -d "/tmp/test/.emacs.d/" ]] && rm -r /tmp/test/.emacs.d/; \ src/emacs -q \ #+end_src A longer reproduction that controls for a number of variables and shows the larger issues with kill-buffer is below. Run this and hit enter in the ielm buffer and you will experience and odd hang. Hit enter again and then switch to the scratch buffer if you want to play around with the additional issues. #+begin_src bash [[ -d "/tmp/test/.emacs.d/" ]] && rm -r /tmp/test/.emacs.d/; \ emacs -q \ --eval "(setq user-emacs-directory \"/tmp/test/.emacs.d/\")" \ --eval "(when (boundp 'comp-eln-load-path) (pop comp-eln-load-path) (add-to-list 'comp-eln-load-path (concat user-emacs-directory \"eln-cache\")))" \ --eval "(require 'package)" \ --eval "(add-to-list 'package-archives '(\"melpa\" . \"https://melpa.org/packages/\") t)" \ --eval "(package-refresh-contents)" \ --eval "(ignore-errors (package-install 'json-mode))" \ --eval "(ielm)" \ --eval "(with-current-buffer \"*ielm*\" (insert \"(setq asdf (+ 1 2))\"))" \ --eval "(with-current-buffer \"*scratch*\" (insert \"(pop kill-buffer-hook) ; calling this will revert the massive breakage asdf ; C-x C-e this to see more brokeness with native comp or working with 27 (fmakunbound 'jsons-remove-buffer) ; use to repro the effects without native comp C-x-C-e-me-i-am-not-bound ; asdf will be bound if testing with 27 ;; run these blocks to be able to safely enter the debugger (setq debug-on-message \\\"break\\\") (defvar break-message \\\"break\\\") (defun jsons-remove-buffer () (let ((bm break-message)) (setq break-message \\\"\\\") (message bm)))\"))" #+end_src Backtraces. #+begin_example elisp Debugger entered--Lisp error: "break" message("break") (let ((bm break-message)) (setq break-message "") (message bm)) jsons-remove-buffer() kill-buffer(#<buffer *temp*-833840>) ielm-eval-input(#("(+ 1 2)" 0 7 (fontified t)) nil) ielm-send-input(nil) ielm-return() funcall-interactively(ielm-return) command-execute(ielm-return) #+end_example #+begin_example elisp Debugger entered--Lisp error: "break" message("break") (let ((bm break-message)) (setq break-message "") (message bm)) jsons-remove-buffer() kill-buffer(#<buffer *temp*-61838>) #f(compiled-function () #<bytecode 0x11ac346e22dcf22>)() cl-print-to-string-with-limit(backtrace--print (void-variable asdf) 5000) backtrace--print-to-string((void-variable asdf) nil) backtrace-print-to-string((void-variable asdf)) debugger--insert-header((error (void-variable asdf))) #f(compiled-function () #<bytecode 0x1f401d22beba>)() backtrace-print() debugger-setup-buffer((error (void-variable asdf))) debug(error (void-variable asdf)) (progn asdf) elisp--eval-last-sexp(nil) eval-last-sexp(nil) funcall-interactively(eval-last-sexp nil) command-execute(eval-last-sexp) #+end_example Thoughts on the presence of a larger issue. While this is clearly a bug in the way that json-snatcher is written (the symbol should never have been added to the hook before the function was defined), it is only causing an issue with the new native comp code. Furthermore this example reveals a number of systematic issues with when and where kill-buffer is being called inside core Emacs code, and with how hook failures on kill-buffer are being handled. This was an absolute nightmare to debug and I think I will submit a separate issue about the utterly undebuggable nature of issues with kill-buffer-hook. Thoughts on potential solutions while they are fresh in my mind. It might be possible to add a check in add-hook that could be invoked for hooks that want to require any symbol added to the hook list to already be bound at the time that it is added to the hook list. This would prevent the issue that we see with json-snatcher in the future. However this does not prevent issues if someone was doing something evil like using push to add to kill-buffer-hook. The error message that is currently emitted for these kinds of failures is almost good enough, I think that it just needs a note that the symbol was originally found the kill-buffer-hook list and maybe provide instructions on how to remove the symbol from the hook. For example "kill-buffer: Symbol’s function definition is void: jsons-remove-buffer Symbol jsons-remove-buffer in kill-buffer-hook list is void you can run (remove-hook 'kill-buffer-hook 'jsons-remove-buffer) to restore kill-buffer functionality" Footnotes. 1. For example toggle-debug-on-error completely fails to function correctly in this case, as do things like ielm-return. Anything which uses with-temp-buffer and tries to print, such as cl-print-to-string-with-limit will have its output silenced or ignored due to the failure of kill-buffer in the unwind forms. Normal approaches to debugging nearly all wind up stuck in infinite loops since most of them make a call to kill-buffer somewhere.
GNU bug tracking system
Copyright (C) 1999 Darren O. Benham,
1997,2003 nCipher Corporation Ltd,
1994-97 Ian Jackson.