Package: emacs
Reported by: Michael Heerdegen <michael_heerdegen <at> web.de>
Date: Mon, 14 Sep 2020 00:44:01 UTC
Severity: normal
Merged with 43395, 43876, 44666
Found in version 28.0.50
Done: Stefan Monnier <monnier <at> iro.umontreal.ca>
Bug is archived. No further changes may be made.
From: Carlos O'Donell <carlos <at> redhat.com>
To: Eli Zaretskii <eliz <at> gnu.org>
Cc: fweimer <at> redhat.com, 43389 <at> debbugs.gnu.org, bugs <at> gnu.support,
 dj <at> redhat.com, michael_heerdegen <at> web.de, trevor <at> trevorbentley.com
Subject: bug#43389: 28.0.50; Emacs memory leaks using hard disk all time
Date: Wed, 25 Nov 2020 13:57:34 -0500
On 11/25/20 1:03 PM, Eli Zaretskii wrote:
>> Cc: bugs <at> gnu.support, fweimer <at> redhat.com, 43389 <at> debbugs.gnu.org,
>> dj <at> redhat.com, michael_heerdegen <at> web.de
>> From: Carlos O'Donell <carlos <at> redhat.com>
>> Date: Wed, 25 Nov 2020 12:45:04 -0500
>>
>> On 11/24/20 11:07 AM, Eli Zaretskii wrote:
>>> Look at the large chunks in the tail of this.  Together, they do
>>> account for ~2GB.
>>>
>>> Carlos, are these chunks in use (i.e. allocated and not freed), or are
>>> they the free chunks that are available for allocation, but not
>>> released to the OS?  If the former, then it sounds like this session
>>> does have around 2GB of allocated heap data, so either there's some
>>> allocated memory we don't account for, or there is indeed a memory
>>> leak in Emacs.  If these are the free chunks, then the way glibc
>>> manages free'd memory is indeed an issue.
>>
>> These chunks are all free and mapped for use by the algorithm to
>> satisfy a request by the application.
>
> So we have more than 1.5GB free memory available for allocation, is
> that right?

There are 3 malloc_info traces in the log:

1. Lines 47-219.  Day 1: 1100MiB of RSS.
2. Lines 386-556.  Day 4: 2.3GiB of RSS.
3. Lines 744-792.  Day 5: 4.2GiB of RSS.

(Lines are numbered for the log starting at 1.)

> To make sure there are no misunderstandings, I'm talking about this
> part of the log:

Your analysis is for trace #2, lines 386-556; my analysis was for
trace #3, lines 744-792.

> <heap nr="0">
> <sizes>
> [...]
> <size from="10753" to="12273" total="11387550" count="990"/>
> <size from="12289" to="16369" total="32661229" count="2317"/>
> <size from="16385" to="20465" total="36652437" count="2037"/>
> <size from="20481" to="24561" total="21272131" count="947"/>
> <size from="24577" to="28657" total="25462302" count="958"/>
> <size from="28673" to="32753" total="28087234" count="914"/>
> <size from="32769" to="36849" total="39080113" count="1121"/>
> <size from="36865" to="40945" total="30141527" count="775"/>
> <size from="40961" to="65521" total="166092799" count="3119"/>
> <size from="65537" to="98289" total="218425380" count="2692"/>
> <size from="98321" to="131057" total="178383171" count="1555"/>
> <size from="131089" to="163825" total="167800886" count="1142"/>
> <size from="163841" to="262065" total="367649915" count="1819"/>
> <size from="262161" to="522673" total="185347984" count="560"/>
> <size from="525729" to="30878897" total="113322865" count="97"/>
> <unsorted from="33" to="33" total="33" count="1"/>
> </sizes>
>
> If I sum up the "total=" parts of these large numbers, I get 1.6GB.
> Is this free memory, given back to glibc for future allocations from
> this arena, and if so, are those 1.6GB part of the 4.2GB total?

In trace #2 we have these final statistics:

549 <total type="fast" count="39" size="2656"/>
550 <total type="rest" count="44013" size="1755953515"/>
551 <total type="mmap" count="6" size="121565184"/>
552 <system type="current" size="2246778880"/>
553 <system type="max" size="2246778880"/>
554 <aspace type="total" size="2246778880"/>
555 <aspace type="mprotect" size="2246778880"/>
556 </malloc>

This shows ~1.7GiB of unused free chunks.

Keep in mind that glibc malloc is a heap-based allocator, so with a
FIFO usage pattern you won't see the kernel heap shrink until you free
the most recently allocated chunk.
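As a minimal sketch of that behaviour (my own illustration, not code
from the report; it assumes chunks small enough to stay below glibc's
default 128KiB mmap threshold, so they land on the sbrk heap):

#include <stdio.h>
#include <stdlib.h>
#include <malloc.h>

int
main (void)
{
  enum { N = 1000 };
  static void *blocks[N];

  /* Grow the sbrk heap with chunks below the mmap threshold.  */
  for (int i = 0; i < N; i++)
    blocks[i] = malloc (64 * 1024);

  /* FIFO: free the oldest chunks first.  The freed space coalesces
     into free chunks, but the break cannot drop because the most
     recently allocated chunk still pins the top of the heap.  */
  for (int i = 0; i < N - 1; i++)
    free (blocks[i]);

  /* ~62MiB now shows up in <sizes> as free chunks, while <system
     type="current"> stays large.  */
  malloc_info (0, stdout);

  free (blocks[N - 1]);  /* only now can the kernel heap shrink */
  return 0;
}

Freeing in the reverse (LIFO) order instead would let glibc trim the
heap as the top chunk grows.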
In trace #3 we *do* see that application demand consumes all these
free chunks again, so something is using them in the application.
There are none left reported in the malloc_info statistics (though
that could also be chunk corruption).

During trace #2, the only way to free some of the ~1.7GiB in use by
the algorithm is to call malloc_trim() to give unused pages back; this
requires a walk of the free/unsorted chunks and munmap() calls to the
kernel to reduce RSS accounting.  Calling malloc_trim() is expensive,
particularly if you're just going to use the chunks again, as appears
to be happening the next day.  (A sketch of the call follows the trace
analysis below.)

In trace #3, where we are at 4.2GiB of RSS usage, we see the following:

742 ;; malloc-info
743 (malloc-info)
744 <malloc version="1">
745 <heap nr="0">
746 <sizes>
747 </sizes>
748 <total type="fast" count="0" size="0"/>
749 <total type="rest" count="1" size="112688"/>

a. Arena 0 (kernel heap) shows 0KiB of unused fast bins and 112KiB of
   other free space in 1 bin (probably the top chunk).

750 <system type="current" size="4243079168"/>
751 <system type="max" size="4243079168"/>
752 <aspace type="total" size="4243079168"/>
753 <aspace type="mprotect" size="4243079168"/>

b. Arena 0 (kernel heap) shows 4.2GiB "current", which means the
   sbrk-extended kernel heap is in use up to 4.2GiB.  WARNING: we
   count "foreign" uses of sbrk as brk space, so looking for sbrk or
   brk calls from a foreign source is useful.

754 </heap>
755 <heap nr="1">
756 <sizes>
757 <size from="17" to="32" total="32" count="1"/>
758 <size from="33" to="48" total="240" count="5"/>
759 <size from="49" to="64" total="256" count="4"/>
760 <size from="65" to="80" total="160" count="2"/>
761 <size from="97" to="112" total="224" count="2"/>
762 <size from="33" to="33" total="231" count="7"/>
763 <size from="49" to="49" total="294" count="6"/>
764 <size from="65" to="65" total="390" count="6"/>
765 <size from="81" to="81" total="162" count="2"/>
766 <size from="97" to="97" total="97" count="1"/>
767 <size from="129" to="129" total="516" count="4"/>
768 <size from="161" to="161" total="644" count="4"/>
769 <size from="209" to="209" total="1254" count="6"/>
770 <size from="241" to="241" total="241" count="1"/>
771 <size from="257" to="257" total="257" count="1"/>
772 <size from="305" to="305" total="610" count="2"/>
773 <size from="32209" to="32209" total="32209" count="1"/>
774 <size from="3982129" to="8059889" total="28065174" count="6"/>
775 <unsorted from="209" to="4020593" total="4047069" count="13"/>
776 </sizes>
777 <total type="fast" count="14" size="912"/>
778 <total type="rest" count="61" size="42357420"/>
779 <system type="current" size="42426368"/>
780 <system type="max" size="42426368"/>
781 <aspace type="total" size="42426368"/>
782 <aspace type="mprotect" size="42426368"/>
783 <aspace type="subheaps" size="1"/>

c. Arena 1 has 42MiB of free'd chunks available for reuse.

784 </heap>
785 <total type="fast" count="14" size="912"/>
786 <total type="rest" count="62" size="42470108"/>
787 <total type="mmap" count="9" size="208683008"/>

d. We have:
   - 912 bytes of fast bins.
   - 42MiB of regular bins.
   - 200MiB of mmap'd large chunks.

788 <system type="current" size="4285505536"/>
789 <system type="max" size="4285505536"/>
790 <aspace type="total" size="4285505536"/>

e. Total allocated space is 4.2GiB.

791 <aspace type="mprotect" size="4285505536"/>
792 </malloc>

Something is using the kernel heap chunks, or calling sbrk/brk
directly (foreign brks are counted in our statistics).
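For reference, the malloc_trim() route mentioned above is a single
call; a minimal sketch (mine, under the assumption that the arena
holds unused free pages that glibc can give back to the kernel):

#include <stdio.h>
#include <malloc.h>

void
trim_and_report (void)
{
  malloc_info (0, stderr);  /* before: large free-chunk totals */
  malloc_trim (0);          /* pad = 0: keep no spare space at the top */
  malloc_info (0, stderr);  /* after: unused pages released, RSS drops */
}

As noted, this is costly if the application immediately re-grows into
the same chunks, as appears to happen here from day to day.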
>> This shows the application is USING memory on the main system heap.
>>
>> It might not be "leaked" memory, since the application might be
>> using it.
>>
>> You want visibility into what is USING that memory.
>>
>> With glibc-malloc-trace-utils you can try to do that with:
>>
>> LD_PRELOAD=libmtrace.so \
>> MTRACE_CTL_FILE=/home/user/app.mtr \
>> MTRACE_CTL_BACKTRACE=1 \
>> ./app
>>
>> This will use libgcc's unwinder to get a copy of the malloc caller
>> address, and then we'll have to decode that based on
>> /proc/self/maps.
>>
>> Next steps:
>> - Get a glibc-malloc-trace-utils trace of the application ratcheting.
>> - Get a copy of /proc/$PID/maps for the application (a shorter
>>   version of smaps).
>>
>> Then we might be able to correlate where all the kernel heap data
>> went?
>
> Thanks for the instructions.  Would people please try that and report
> the results?

-- 
Cheers,
Carlos.
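For the decode step described above, a sketch of my own (an assumed
approach, not part of glibc-malloc-trace-utils) that maps a raw caller
address to module+offset via /proc/self/maps; for a running Emacs,
read /proc/$PID/maps instead:

#include <inttypes.h>
#include <stdio.h>

/* Resolve an absolute caller address to "module+offset" using
   /proc/self/maps, in a form addr2line can consume.  */
void
decode_caller (uintptr_t caller)
{
  FILE *maps = fopen ("/proc/self/maps", "r");
  char line[512];

  if (maps == NULL)
    return;
  while (fgets (line, sizeof line, maps) != NULL)
    {
      uintptr_t lo, hi, off;
      char path[256] = "";

      /* Each line: start-end perms offset dev inode [path].  */
      if (sscanf (line, "%" SCNxPTR "-%" SCNxPTR " %*s %" SCNxPTR
                  " %*s %*s %255s", &lo, &hi, &off, path) >= 3
          && lo <= caller && caller < hi)
        {
          printf ("%#" PRIxPTR " = %s+%#" PRIxPTR "\n", caller,
                  path[0] != '\0' ? path : "[anon]", caller - lo + off);
          break;
        }
    }
  fclose (maps);
}

The resulting module+offset pairs can then be fed to
"addr2line -e MODULE OFFSET" to name the allocation sites.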