Package: emacs;
Reported by: chiaki-ishikawa-thunderbird-account <chiaki.ishikawa <at> ubin.jp>
Date: Tue, 4 Feb 2020 12:39:01 UTC
Severity: normal
Tags: moreinfo, notabug, unreproducible
Found in version 26.2
Done: Lars Ingebrigtsen <larsi <at> gnus.org>
Bug is archived. No further changes may be made.
Message #22 received at 39413 <at> debbugs.gnu.org (full text, mbox):
From: "ISHIKAWA,chiaki" <ishikawa <at> yk.rim.or.jp> To: 39413 <at> debbugs.gnu.org Cc: chiaki-ishikawa-thunderbird-account <chiaki.ishikawa <at> ubin.jp>, Noam Postavsky <npostavs <at> gmail.com> Subject: Re: bug#39413: 26.2; Emacs gets hung Date: Wed, 29 Apr 2020 06:36:46 +0900
Hi, I am back to this strange guru meditation of emacs. Emacs would eventually begin working, but the silent period where no key or mouse event is acknowledged is rather longish. Several weeks after I installed GNU Emacs 28.0.50 (that is what |emacs-version| variable says. I installed from git repo.) , I began noticing the occasional long pause (no response) again. That is noticeable when I was running a mozilla thunderbird build script in *shell* buffer. (M-X shell RETURN) Obviously, the large log may cause some resource issues, but, say the year before, I might have gotten the warning about memory running low (I can't recall the exact warning. It asked something about "Am I sure to do the operation?", like for example, when I invoked |erase-buffer| to the very large *shell* buffer that contains many log lines. Emacs responded something about dire situation and if I was really invoking the function, and not by mistake, etc. However, it never got seemingly hung for like 10-15 seconds. Yesterday, this long pause happened again, and so I attached gdb to the running emacs process and the following is what I found. Basically, sweep_conses() takes unusually long time. (It is possible that by the time I noticed the long hung and attached gdb, the major portion of the problematic code had already been executed.) This is the first stac ktrace when I attach gdb to emacs. So it looks that it was trying to free cons cells. #0 lisp_align_free (block=<optimized out>) at alloc.c:1266 #1 0x000055bc512500f2 in sweep_conses () at alloc.c:6907 #2 0x000055bc51255b27 in gc_sweep () at alloc.c:7134 #3 garbage_collect () at alloc.c:6105 #4 0x000055bc51255ed1 in maybe_garbage_collect () at alloc.c:5968 #5 0x000055bc51271785 in maybe_gc () at lisp.h:5068 #6 Ffuncall (nargs=4, args=args <at> entry=0x7fff92901a70) at eval.c:2778 #7 0x000055bc512a5838 in exec_byte_code (bytestr=<optimized out>, vector=<optimized out>, maxdepth=<optimized out>, args_template=<optimized out>, nargs=<optimized out>, args=<optimized out>) at bytecode.c:633 #8 0x000055bc51271707 in Ffuncall (nargs=4, args=args <at> entry=0x7fff92901f38) at eval.c:2808 ... omitted ... I invoked "finish" command to gdb. (A very handy command, indeed.) finish Run till exit from #0 lisp_align_free (block=<optimized out>) at alloc.c:1266 0x000055bc512500f2 in sweep_conses () at alloc.c:6907 6907 lisp_align_free (cblk); where #0 0x000055bc512500f2 in sweep_conses () at alloc.c:6907 #1 0x000055bc51255b27 in gc_sweep () at alloc.c:7134 #2 garbage_collect () at alloc.c:6105 #3 0x000055bc51255ed1 in maybe_garbage_collect () at alloc.c:5968 ... omitted ... Another finish Run till exit from #0 0x000055bc512500f2 in sweep_conses () at alloc.c:6907 gc_sweep () at alloc.c:7135 7135 sweep_floats (); (gdb) sweep_conses() took rather a long time. Maybe five seconds or so. (Could be longer almost 10 seconds...) Another finish. sweep_floats (); (gdb) finish Run till exit from #0 gc_sweep () at alloc.c:7135 garbage_collect () at alloc.c:7141 7141 check_string_bytes (!noninteractive); (gdb) sweep_float() ends quickly. Another finish. finish Run till exit from #0 garbage_collect () at alloc.c:7141 0x000055bc51271785 in maybe_gc () at lisp.h:5068 5068 maybe_garbage_collect (); (gdb) check_string_bytes finished quickly. finish Run till exit from #0 0x000055bc51271785 in maybe_gc () at lisp.h:5068 deliver_input_available_signal (sig=29) at keyboard.c:7198 7198 deliver_process_signal (sig, handle_input_available_signal); (gdb) Quit (gdb) where The above one took a fraction of a second. It was noticeable. finish Run till exit from #0 deliver_input_available_signal (sig=29) at keyboard.c:7198 <signal handler called> (gdb) finish Run till exit from #0 <signal handler called> 0x000055bc51271785 in maybe_gc () at lisp.h:5068 5068 maybe_garbage_collect (); (gdb) immediate return Well, actually after a few more finish (which completed immediately), emacs was responsive again. I had a few similar experiences yesterday. My observation: So my take on this is sweep_conses() takes a rather long time and not letting emacs to take any input. (And there could be other functions before it.) Question.: Has there been any change in emacs's alloc.c code in the last year or two? (Even an seemingly innocuous change might have introduced a very bad CPU cache behavior that makes sweep_conses() to perform badly. Or maybe it is now allowed to run without time-constraint?) I suspect that since the apparent long hung occurred during the time when |*shell*| buffer was accumulating many lines from a largish running build process, maybe the string allocation was not catching up and causing excessive GC. In that case, the ordering of reclaiming functions could be arranged to find string first? (Addition to the buffer of lines read from a child process is basically an insertion at the end of the buffer of strings, correct?) OR maybe I am hitting the age old adagio of "GC considered harmful". [1] Simply my virtual space is too large to be handled efficiently. I remember reading the paper the first time it appeared, but I never thought it would hit me on my home PC. (The PC at the time had 64KB-128KB of memory). I am running emacs under Debian GNU/linux inside VirtualBox which is assigned 16GB memory. My suggestion: I would think sweep_conses() may want to quit early, if possible, after obtaining certain amount of space (or for a casual GC invoked periodically, may want to quit after one second or something like that. That is, we may want to set a elapsed time limit.) In a really tight situation, we can invoke the reclaiming functions with a flag telling them to use as much CPU/Elapsed time to reclaim as many octets as possible, but for a periodic GC, it might make sense to return half-heartedly. Oh, wait, have I set a variable to control periodic GC and forgot about it? I checked. I don't think I tweaked the following variables. C-H k output. gc-cons-threshold is a variable defined in ‘C source code’. Its value is 800000 gc-cons-percentage is a variable defined in ‘C source code’. Its value is 0.1 As of now in a different incarnation of emacs process (this is a differnt process from the one which I attached gdb.), gc-elapsed is a variable defined in ‘C source code’. Its value is 68.3795433 Hope this helps us shedding some light on the guru meditation of emacs probably due to an unusually longish GC. TIA Chiaki PS: Given that this may have something to do with the paging activity of the emacs process, the linux version change (from 4 to 5) may have affected the paging behavior of the program, but it is hard to say. I now see this linux instance is using 293M SWAP. Compiling mozilla thunderbird source runs |make -j6| and g++-9 compiles CPP source files which are very large and so they exert heavy memory pressure. A bad memory access pattern of emacs, not friendly to locality of access, would add to page faults. Except, |xosview| program does not show any paging activity. I am not sure if |xosview| is correct or not. [1] "Address/memory management for a gigantic LISP environment or, GC considered harmful", Jon L. White, LFP '80: Proceedings of the 1980 ACM conference on LISP and functional programming, August 1980 Pages 119–127 https://doi.org/10.1145/800087.802797 https://dl.acm.org/doi/10.1145/1317203.1317206 On 2020/02/23 2:53, chiaki-ishikawa-thunderbird-account wrote: > Thank you. I will taken note of "finish" and see if I can observe the > return from alloc.c code > if I get to see the next obvious hung. > > Chiaki > > On 2020/02/22 2:27, Noam Postavsky wrote: >> tags 39413 + unreproducible >> quit >> >> chiaki-ishikawa-thunderbird-account <chiaki.ishikawa <at> ubin.jp> writes: >> >>> I printed stack backtrace and Emacs seemed to be in memory allocator. >>> Then I let it continue. >>> Still the emacs screen was locked up. >>> So I interrupted the execution, and printed stack backtrace. >>> Again Emacs was in memory allocator. >>> Eventually I gave up and killed emacs. >>> >>> It seems that garbage collection or >>> routine in alloc.c was looping. >> That's not enough evidence to show that the garbage collection was >> looping. There could be some higher level loop that does a lot of >> allocation, so that if you stop at some random point you would have a >> high probability of stopping in the alloc.c. If you catch this in gdb >> again, try running the 'finish' a few times to see if it can leave the >> alloc.c code. >> >>> This is not a repeatable bug. I have no idea how to reproduce this. It >>> occurs every now and then. > > >
GNU bug tracking system
Copyright (C) 1999 Darren O. Benham,
1997,2003 nCipher Corporation Ltd,
1994-97 Ian Jackson.