Package: emacs;
Reported by: Pip Cet <pipcet <at> protonmail.com>
Date: Sun, 12 Jan 2025 17:09:02 UTC
Severity: normal
View this message in rfc822 format
From: Pip Cet <pipcet <at> protonmail.com> To: Stefan Kangas <stefankangas <at> gmail.com> Cc: eggert <at> cs.ucla.edu, 75520 <at> debbugs.gnu.org, mattiase <at> acm.org, monnier <at> iro.umontreal.ca, Eli Zaretskii <eliz <at> gnu.org>, acorallo <at> gnu.org Subject: bug#75520: Circular code or data can hang Emacs unquittably Date: Wed, 15 Jan 2025 14:02:25 +0000
"Stefan Kangas" <stefankangas <at> gmail.com> writes: > Pip Cet via "Bug reports for GNU Emacs, the Swiss army knife of text > editors" <bug-gnu-emacs <at> gnu.org> writes: > >> "Eli Zaretskii" <eliz <at> gnu.org> writes: >> >>> We can fix only the most painful ones. We don't need to make sure >>> none remain, if it isn't easy. >> >> It's not. Maybe coccinelle can help, though (Stefan Kangas just used >> it, so it seems to still exist, and coccigrep at least was updated >> recently). > > AFAIK, it's actively developed and used. Version 1.3.0 was released > last November. Thanks! Here's what I'm going to do (this is long, but I'd appreciate feedback on how to improve the coccinelle scripts in particular!): 1. DONE: coccinelle script to detect likely loops 2. TODO: improve coccinelle script to un-detect likely loops that call an infloop-breaking function 3. DONE: additional compile-time reporting of globals used in likely loops 4. TODO: go through all of them :-) Coccinelle doesn't seem to handle do { ... } while (...); loops, but we don't really need those: the only two hits would be timer_check_2 and validate_plist. (timer_check is probably a problem: it calls copy-sequence on Vtimer_list and Vtimer_idle_list, which may throw while input is blocked and inhibit-quit is t. It would be really nice to mark critical sections which should not ever throw a signal so we can eassert that we're not in one, but that's only a runtime check...) Here are the cocci files I'm using: // Find loops for bug#75520: Circular code or data can hang Emacs unquittably // This file cannot be applied to intervals.c, because that contains a // double loop which matches the pattern in two different ways. @@ identifier A; expression R; expression R2; @@ + maybe_circular(A); while ( ( CONSP (A) | R && CONSP (A) | CONSP (A) && R | R && CONSP (A) && R2 ) ) { <... A = XCDR (A); ...> } // Find loops for bug#75520: Circular code or data can hang Emacs unquittably @@ identifier A; expression E; expression E2; expression R; expression R2; @@ +E; +maybe_circular (A); -for (E; +for (; ( CONSP (A) | CONSP (A) && R | R && CONSP (A) | R && CONSP (A) && R2 ); E2) { <... A = XCDR(A); ...> } (Note that coccinelle is smart enough to find "A = XCDR(A);" even if that is the third sub-expression "argument" of the for statement. This makes things a lot easier.) These add calls to a new macro, maybe_circular, to happen right before the first iteration of the loop. It also breaks coding.c, because it matches a for loop definition in a macro, but fails to add the necessary final backslash to the new line it produces. So we could leave it at that, and go over the diff, removing calls to maybe_circular that are safe. If we had a list of function calls which we know prevent an infloop (such as maybe_quit, Flength(A), etc.), we could create another .cocci file to remove maybe_circular automatically if the loop is likely to be safe, but that would probably involve a small number of false negatives if the call to the infloop-preventing function is conditional. To get a quick list of global variables which are known to appear in such loops, this works (but, probably, only on ELF systems; definitely requires optimization, probably doesn't work with clang): #define is_tainted_global(v, v2, s) ({ \ if (__builtin_constant_p ((v) == (v2))) \ { \ asm volatile (".pushsection .tainted_globals,\"aw\"\n\t" \ ".asciz \"" s "\"\n\t" \ ".popsection" : : "i" (&v2)); \ } \ }) #include "tainted.h" #define maybe_circular(v) ({ \ is_tainted (v); \ 1; \ }) where tainted.h is generated by this bash script: (echo "#define is_tainted(v) ({ \\"; (grep ' Lisp_Object f_V' | sed -e 's/ Lisp_Object f_//g' | sed -e 's/;$//' | while read; do echo "is_tainted_global(v, $REPLY, \"$REPLY\"); \\"; done) < globals.h; echo "})") > tainted.h This puts the names of all global variables which are known to appear in potentially circular loops into an ELF section: Contents of section .tainted_globals: 67ed60 566f7665 726c6179 5f617272 6f775f76 Voverlay_arrow_v 67ed70 61726961 626c655f 6c697374 00566f76 ariable_list.Vov 67ed80 65726c61 795f6172 726f775f 76617269 erlay_arrow_vari 67ed90 61626c65 5f6c6973 7400566f 7665726c able_list.Voverl 67eda0 61795f61 72726f77 5f766172 6961626c ay_arrow_variabl 67edb0 655f6c69 73740056 6f766572 6c61795f e_list.Voverlay_ 67edc0 6172726f 775f7661 72696162 6c655f6c arrow_variable_l 67edd0 69737400 5677696e 646f775f 70657273 ist.Vwindow_pers 67ede0 69737465 6e745f70 6172616d 65746572 istent_parameter 67edf0 73005677 696e646f 775f7065 72736973 s.Vwindow_persis 67ee00 74656e74 5f706172 616d6574 65727300 tent_parameters. 67ee10 5677696e 646f775f 70657273 69737465 Vwindow_persiste 67ee20 6e745f70 6172616d 65746572 73005663 nt_parameters.Vc 67ee30 6f6d706c 6574696f 6e5f7265 67657870 ompletion_regexp 67ee40 5f6c6973 74005663 6f6d706c 6574696f _list.Vcompletio 67ee50 6e5f6967 6e6f7265 645f6578 74656e73 n_ignored_extens 67ee60 696f6e73 0056636f 6d706c65 74696f6e ions.Vcompletion 67ee70 5f69676e 6f726564 5f657874 656e7369 _ignored_extensi 67ee80 6f6e7300 56646562 75675f69 676e6f72 ons.Vdebug_ignor 67ee90 65645f65 72726f72 73005666 6f6e745f ed_errors.Vfont_ 67eea0 656e636f 64696e67 5f616c69 73740056 encoding_alist.V 67eeb0 66616365 5f666f6e 745f7265 7363616c face_font_rescal 67eec0 655f616c 69737400 56736372 6970745f e_alist.Vscript_ 67eed0 72657072 6573656e 74617469 76655f63 representative_c 67eee0 68617273 00567363 616c6162 6c655f66 hars.Vscalable_f 67eef0 6f6e7473 5f616c6c 6f776564 00 onts_allowed. This finds both cases that are actually safe (Vwindow_persistent_parameters has its own cycle detection: no teleporting there) and cases that seem unsafe (Vcompletion_ignored_extensions). Somewhat independently of this, we might want to put static Lisp_Objects into a separate section: this allows us to make staticpro of such objects a nop and simply protect the entire section. Similarly, we can make global Lisp_Object "variables" (actually elements of struct emacs_globals) contiguous in memory, and protect them all at once, omitting the staticpro calls for that case as well. This would require changes to make-docfile.c which should also allow us to conditionalize DEFVAR_LISP/DEFSYM in a way that detects unreachable DEFVAR_LISP/DEFSYM "calls" and doesn't create variables for all of them. (For example, right now, builds without nativecomp still defsym some 65 symbols from comp.c, wasting space in lispsym and for the symbol name). (Another reason we may want to touch make-docfile.c is that we sometimes use uninitialized global Lisp variables; if they were contiguous in memory, we could simply poison them and catch such errors. I know this approach works because if I poison the entire globals area, it allows me to catch one such bug before crashing because non-Lisp-Object globals are also poisoned: Voverriding_plist_environment is used by init_syntax_once() but only initialized by syms_of_fns()) These changes should help performance, particularly with MPS, and may allow protecting the static/global areas with memory barriers, which may or may not help performance further (it may help debugging to put each Lisp_Object on a separate page). I also think this should allow us to add runtime checks ensuring that certain static Lisp_Objects do not ever refer to circular objects. So that's my approach: coccinelle for static analysis, compile-time analysis producing an ELF section of likely tainted globals, and potentially changes to ensure certain statics are non-circular at runtime. Of course, the first thing to do for any such changes is to remove DEFVAR_LISP_NOPRO; see bug#75521 for a very lengthy discussion of this question; IIUC, the conclusion is that the master branch will retain DEFVAR_LISP_NOPRO for now, which complicates things significantly. Pip
GNU bug tracking system
Copyright (C) 1999 Darren O. Benham,
1997,2003 nCipher Corporation Ltd,
1994-97 Ian Jackson.