Package: emacs
Reported by: Michael Heerdegen <michael_heerdegen <at> web.de>
Date: Mon, 14 Sep 2020 00:44:01 UTC
Severity: normal
Merged with 43395, 43876, 44666
Found in version 28.0.50
Done: Stefan Monnier <monnier <at> iro.umontreal.ca>
Bug is archived. No further changes may be made.
From: Eli Zaretskii <eliz <at> gnu.org>
To: Trevor Bentley <trevor <at> trevorbentley.com>
Cc: 43389 <at> debbugs.gnu.org
Subject: bug#43389: 28.0.50; Emacs memory leaks
Date: Fri, 30 Oct 2020 10:00:29 +0200
> From: Trevor Bentley <trevor <at> trevorbentley.com>
> Date: Thu, 29 Oct 2020 21:17:20 +0100
>
> It doesn't start leaking until it has been active for 2-3 days.
> It might depend on other factors, such as suspending or losing
> network connectivity. Once the leak triggers, it grows at a rate
> of about 1MB every few seconds. My machine has 32GB, so it gets
> pretty far before I notice and kill it. I'm not sure if there is a
> limit.
>
> I built emacs with debug symbols and dumped some strace logs last
> time it happened. This is from the "native-comp" branch, since
> it's the only one I had built with debug symbols: GNU Emacs
> 28.0.50, commit feed53f8b5da0e58cce412cd41a52883dba6c1be. I see
> the same with the version installed from my package manager (Arch,
> GNU Emacs 27.1), and the strace log looks about the same, though
> without symbols.
>
> I waited until it was actively leaking, and then ran the following
> command to print a stack trace whenever the heap is extended with
> brk():
>
> $ sudo strace -p $PID -k -r --trace="?brk" --signal="SIGTERM"
>
> The findings: this particular leak is triggered in libgnutls. I
> get large batches of the following (truncated) stack trace

Thanks. This trace doesn't show how many bytes were allocated, does
it? Without that it is hard to judge whether these GnuTLS calls could
be the culprit.
That is because the full trace shows other calls to malloc, for
example this:

> /usr/lib/libc-2.32.so(brk+0xb) [0xf6e7b]
> /usr/lib/libc-2.32.so(__sbrk+0x84) [0xf6f54]
> /usr/lib/libc-2.32.so(__default_morecore+0xd) [0x8d80d]
> /usr/lib/libc-2.32.so(sysmalloc+0x372) [0x890e2]
> /usr/lib/libc-2.32.so(_int_malloc+0xd9e) [0x8ad6e]
> /usr/lib/libc-2.32.so(_int_memalign+0x3f) [0x8b01f]
> /usr/lib/libc-2.32.so(_mid_memalign+0x13c) [0x8c12c]
> /home/trevor/applications/opt/bin/emacs-28.0.50(lisp_align_malloc+0x2e) [0x2364ee]
> /home/trevor/applications/opt/bin/emacs-28.0.50(Fcons+0x65) [0x237f74]
> /home/trevor/applications/opt/bin/emacs-28.0.50(store_in_alist+0x5f) [0x5c9a3]
> /home/trevor/applications/opt/bin/emacs-28.0.50(gui_report_frame_params+0x46a) [0x607f1]
> /home/trevor/applications/opt/bin/emacs-28.0.50(Fframe_parameters+0x499) [0x5d88b]
> /home/trevor/applications/opt/bin/emacs-28.0.50(Fframe_parameter+0x381) [0x5dc9c]
> /home/trevor/applications/opt/bin/emacs-28.0.50(eval_sub+0x7a7) [0x26f964]
> /home/trevor/applications/opt/bin/emacs-28.0.50(Fif+0x1f) [0x26b590]
> /home/trevor/applications/opt/bin/emacs-28.0.50(eval_sub+0x38b) [0x26f548]
> /home/trevor/applications/opt/bin/emacs-28.0.50(Feval+0x7a) [0x26ef45]
> /home/trevor/applications/opt/bin/emacs-28.0.50(funcall_subr+0x257) [0x271463]
> /home/trevor/applications/opt/bin/emacs-28.0.50(Ffuncall+0x192) [0x270fe9]
> /home/trevor/applications/opt/bin/emacs-28.0.50(internal_condition_case_n+0xa1) [0x26d81a]
> /home/trevor/applications/opt/bin/emacs-28.0.50(safe__call+0x211) [0x73943]
> /home/trevor/applications/opt/bin/emacs-28.0.50(safe__call1+0xba) [0x73b47]
> /home/trevor/applications/opt/bin/emacs-28.0.50(safe__eval+0x35) [0x73bd7]
> /home/trevor/applications/opt/bin/emacs-28.0.50(display_mode_element+0xe32) [0xb5515]

This seems to indicate some mode-line element that uses :eval, but
without knowing what it does it is hard to say anything more specific.
I also see this:

> /home/trevor/applications/opt/bin/emacs-28.0.50(_start+0x2e) [0x4598e]
2.870962 brk(0x55f5ed9a4000) = 0x55f5ed9a4000
> /usr/lib/libc-2.32.so(brk+0xb) [0xf6e7b]
> /usr/lib/libc-2.32.so(__sbrk+0x84) [0xf6f54]
> /usr/lib/libc-2.32.so(__default_morecore+0xd) [0x8d80d]
> /usr/lib/libc-2.32.so(sysmalloc+0x372) [0x890e2]
> /usr/lib/libc-2.32.so(_int_malloc+0xd9e) [0x8ad6e]
> /usr/lib/libc-2.32.so(_int_memalign+0x3f) [0x8b01f]
> /usr/lib/libc-2.32.so(_mid_memalign+0x13c) [0x8c12c]
> /home/trevor/applications/opt/bin/emacs-28.0.50(lisp_align_malloc+0x2e) [0x2364ee]
> /home/trevor/applications/opt/bin/emacs-28.0.50(Fcons+0x65) [0x237f74]
> /home/trevor/applications/opt/bin/emacs-28.0.50(Fmake_list+0x4f) [0x238544]
> /home/trevor/applications/opt/bin/emacs-28.0.50(concat+0x5c3) [0x2792f6]
> /home/trevor/applications/opt/bin/emacs-28.0.50(Fcopy_sequence+0x16a) [0x278d2a]
> /home/trevor/applications/opt/bin/emacs-28.0.50(timer_check+0x33) [0x1b79dd]
> /home/trevor/applications/opt/bin/emacs-28.0.50(readable_events+0x1a) [0x1b5d00]
> /home/trevor/applications/opt/bin/emacs-28.0.50(get_input_pending+0x2f) [0x1bcf3a]
> /home/trevor/applications/opt/bin/emacs-28.0.50(detect_input_pending_run_timers+0x2e) [0x1c4eb1]
> /home/trevor/applications/opt/bin/emacs-28.0.50(wait_reading_process_output+0x14ec) [0x2de0c0]
> /home/trevor/applications/opt/bin/emacs-28.0.50(sit_for+0x211) [0x53e78]
> /home/trevor/applications/opt/bin/emacs-28.0.50(read_char+0x1019) [0x1b3f62]

This indicates some timer that runs; again, without knowing which
timer and what it does, it is hard to proceed.

Etc. etc. -- the bottom line is that I think we need to know how many
bytes are allocated in each call to make some progress. It would be
even more useful if we could somehow know which of the allocated
buffers are free'd soon and which aren't. That's because Emacs calls
memory allocation functions _a_lot_, and it is completely normal to
see a lot of these calls.
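One rough way to get byte counts out of a trace like this is to look
at how far the program break moves between consecutive brk() lines.
The following is only a sketch under stated assumptions: the names
BRK_RE and brk_deltas are made up for illustration, and the first
sample address in the usage note is invented (only the
0x55f5ed9a4000 line comes from the trace above). It reports heap
growth per brk() call, which is a coarse proxy and not the size of any
individual malloc request; requests that glibc serves with mmap would
not appear in a brk-only trace at all.

```python
import re

# Matches strace syscall lines such as
#   "2.870962 brk(0x55f5ed9a4000) = 0x55f5ed9a4000"
# Stack-frame lines like "libc-2.32.so(brk+0xb)" do not match, because
# the regex requires an opening parenthesis directly after "brk".
BRK_RE = re.compile(r"\bbrk\((?:0x[0-9a-fA-F]+|NULL|0)\)\s*=\s*(0x[0-9a-fA-F]+)")


def brk_deltas(lines):
    """Yield (new_break, grown_bytes) for each brk() line after the first.

    `lines` is any iterable of text lines from a strace log. The first
    brk() line only establishes the baseline, so it yields nothing.
    """
    previous = None
    for line in lines:
        match = BRK_RE.search(line)
        if match is None:
            continue  # stack frames and anything else are skipped
        current = int(match.group(1), 16)
        if previous is not None:
            yield current, current - previous
        previous = current
```

For instance, if the preceding brk() line in the log had returned
0x55f5ed983000 (an invented value), the line shown above would be
reported as a growth of 0x21000 bytes. Summing the second tuple
element over a known time window gives a figure that can be compared
with the observed rate of about 1MB every few seconds.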
What we need is to find allocations that don't get free'd, and whose
byte counts come close to explaining the rate of 1MB every few
seconds. So these calls need to be filtered somehow, otherwise we
will not see the forest for the gazillion trees.

> I'm not sure if gnutls is giving back buffers that emacs is
> supposed to free, or if the leak is entirely contained within
> gnutls, but something in that path is hanging on to a lot of
> allocations indefinitely.

The GnuTLS functions we call in emacs_gnutls_read are:

  gnutls_record_recv
  emacs_gnutls_handle_error

The latter is only called if there's an error, so I'm guessing it is
not part of your trace. And the former doesn't say in its
documentation that Emacs should free any buffers after calling it, so
I'm not sure how Emacs could be the culprit here.

If GnuTLS is the culprit (and as explained above, this is not certain
at this point), perhaps upgrading to a newer GnuTLS version or
reporting this to GnuTLS developers would allow some progress.
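As for filtering out the allocations that are free'd soon: one
possible approach, assuming a log of individual malloc and free calls
could be obtained at all (the strace output above does not contain
one), is to replay that log and keep only the blocks still
outstanding at the end. The event tuple format and the name
live_allocations below are hypothetical, chosen only for this sketch.

```python
def live_allocations(events):
    """Return {address: size} for blocks allocated but never freed.

    `events` is a hypothetical, already-parsed log: an iterable of
    ("malloc", address, size) or ("free", address, None) tuples in the
    order the calls happened.
    """
    live = {}
    for op, address, size in events:
        if op == "malloc":
            live[address] = size
        elif op == "free":
            # Frees of blocks allocated before logging began are ignored.
            live.pop(address, None)
    return live
```

The sum of the remaining sizes, taken over a known interval, is the
number to compare against the observed growth; short-lived
allocations such as the conses in the traces above would drop out.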