GNU bug report logs - #21509
25.0.50; X11 error: BadPixmap when creating first emacsclient frame; and memory leak

Previous Next

Package: emacs;

Reported by: Dima Kogan <dima <at> secretsauce.net>

Date: Fri, 18 Sep 2015 04:42:01 UTC

Severity: normal

Found in version 25.0.50

To reply to this bug, email your comments to 21509 AT debbugs.gnu.org.

Toggle the display of automated, internal messages from the tracker.

View this report as an mbox folder, status mbox, maintainer mbox


Report forwarded to bug-gnu-emacs <at> gnu.org:
bug#21509; Package emacs. (Fri, 18 Sep 2015 04:42:01 GMT) Full text and rfc822 format available.

Acknowledgement sent to Dima Kogan <dima <at> secretsauce.net>:
New bug report received and forwarded. Copy sent to bug-gnu-emacs <at> gnu.org. (Fri, 18 Sep 2015 04:42:02 GMT) Full text and rfc822 format available.

Message #5 received at submit <at> debbugs.gnu.org (full text, mbox):

From: Dima Kogan <dima <at> secretsauce.net>
To: bug-gnu-emacs <at> gnu.org
Subject: 25.0.50; X11 error: BadPixmap when creating first emacsclient frame;
 and memory leak
Date: Thu, 17 Sep 2015 21:41:00 -0700
[Message part 1 (text/plain, inline)]
Hi. I'm continuing the hunt for memory leaks:

  http://lists.gnu.org/archive/html/emacs-devel/2015-09/msg00619.html

As noted in that post, even after fixing the winner-mode problem, memory
still leaks, albeit slowly. Chasing down that leak led me to this bug.
The issue is an X11 BadPixmap error thrown maybe 20% of the time when
the FIRST client frame is opened. I have only tested with the lucid
widgets. Steps:

  1. emacs -Q --daemon
  2. In bash:   while true; do timeout 1 emacsclient -a '' -c; sleep 1; done;

So I spawn a new frame kill it after 1 second, then wait 1 more second
and repeat. Most of the time this looks like what you'd expect. About
20% of the time, however, the new X window pops up momentarily, and
instantly goes away. What's happening internally is apparently an X11
error that blows up the window. This isn't printed anywhere, but I use
perf to see it:

   # perf probe -x /usr/bin/emacs --add 'x_connection_closed error_message:string'
   ...

   # perf record -p `pidof emacs` -eprobe_emacs:x_connection_closed
   ... Ctrl-c when done

   # perf script
       emacs-tst 24692 [000] 443392.814048: probe_emacs:x_connection_closed: (4c0000) error_message_string="X protocol error: BadPixmap (invalid Pixmap parameter) on protocol request 55"


This clearly shouldn't happen. I've been running my emacs as a daemon
for years, however and I've never seen this. It turns out that this
happens ONLY if we creates/destroy the only frame. If instead, we


  1. create one frame by keeping an instance of the client open
  2. Run the above bash snippet to repeatedly create/destroy the second
  frame, then this X error does not happen, and everything works.


Another part of this is that when an X frame dies in this way, we leak
memory. The biggest offender is the font system. Normally, when a frame
is created, we open some fonts with xftfont_open(). Then when the frame
is destroyed x_delete_terminal() is called, which among other things
does

  if (dpyinfo->display)
      XtCloseDisplay();

which frees the fonts. If we get an X error, however, then
x_connection_closed() runs first. This sets dpyinfo->display=0, so the
above doesn't run, and the fonts leak.

I'm attaching a plot where I sampled the memory usage every second. At
the start I had an extra client window open, and we leak memory. Then I
closed that client window, and we started leaking memory much faster.
Then I created an extra client window again, and the leak slowed down
once again.

If we can fix the X error, the leak maybe doesn't matter.

[memory_test.pdf (application/pdf, attachment)]

Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#21509; Package emacs. (Fri, 02 Oct 2015 07:33:02 GMT) Full text and rfc822 format available.

Message #8 received at 21509 <at> debbugs.gnu.org (full text, mbox):

From: Dima Kogan <dima <at> secretsauce.net>
To: 21509 <at> debbugs.gnu.org
Subject: update
Date: Fri, 02 Oct 2015 00:32:01 -0700
A small update: this X error only comes up when scroll bars are enabled.
Turning them off with 'M-x scroll-bar-mode' makes the crash disappear.




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#21509; Package emacs. (Sun, 03 Sep 2017 10:16:02 GMT) Full text and rfc822 format available.

Message #11 received at 21509 <at> debbugs.gnu.org (full text, mbox):

From: martin rudalics <rudalics <at> gmx.at>
To: Dima Kogan <dima <at> secretsauce.net>, 21509 <at> debbugs.gnu.org
Subject: Re: bug#21509: 25.0.50; X11 error: BadPixmap when creating first
 emacsclient frame; and memory leak
Date: Sun, 03 Sep 2017 12:15:18 +0200
Dima, just in case you're still around

> The issue is an X11 BadPixmap error thrown maybe 20% of the time when
> the FIRST client frame is opened. I have only tested with the lucid
> widgets. Steps:
>
>    1. emacs -Q --daemon
>    2. In bash:   while true; do timeout 1 emacsclient -a '' -c; sleep 1; done;
>
> So I spawn a new frame kill it after 1 second, then wait 1 more second
> and repeat. Most of the time this looks like what you'd expect. About
> 20% of the time, however, the new X window pops up momentarily, and
> instantly goes away. What's happening internally is apparently an X11
> error that blows up the window. This isn't printed anywhere, but I use
> perf to see it:
>
>     # perf probe -x /usr/bin/emacs --add 'x_connection_closed error_message:string'
>     ...
>
>     # perf record -p `pidof emacs` -eprobe_emacs:x_connection_closed
>     ... Ctrl-c when done
>
>     # perf script
>         emacs-tst 24692 [000] 443392.814048: probe_emacs:x_connection_closed: (4c0000) error_message_string="X protocol error: BadPixmap (invalid Pixmap parameter) on protocol request 55"

I now installed in master a "fix" for this, see also the discussion of
Bug#27816.

Can you try it and also check whether it has any implications on the
memory leak problem?

Thanks, martin




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#21509; Package emacs. (Tue, 05 Sep 2017 06:22:01 GMT) Full text and rfc822 format available.

Message #14 received at 21509 <at> debbugs.gnu.org (full text, mbox):

From: Dima Kogan <dima <at> secretsauce.net>
To: martin rudalics <rudalics <at> gmx.at>
Cc: 21509 <at> debbugs.gnu.org
Subject: Re: bug#21509: 25.0.50;
 X11 error: BadPixmap when creating first emacsclient frame; and memory leak
Date: Mon, 04 Sep 2017 23:21:32 -0700
[Message part 1 (text/plain, inline)]
martin rudalics <rudalics <at> gmx.at> writes:

> Dima, just in case you're still around

Hi. I'm around, just don't have time to work on emacs as much as I would
like.


> I now installed in master a "fix" for this, see also the discussion of
> Bug#27816.
>
> Can you try it and also check whether it has any implications on the
> memory leak problem?

Thank you very much for tracking this down!

I'm at the current HEAD (979797b9eca0ab009cc75a29765f998ec2aa1b45). This
is after your fix was applied.

I gathered data with 4 different sets of code:

- Lucid widgets. At HEAD, but with the fix (commit
  d577d1609c6c9d11b6af30a33e02bb21ffa821fd) pulled out. This should
  trigger this bug. Confirmed.

- Lucid widgets. At HEAD, with fix pulled out (same as above), but with
  the scroll bar turned off. This bug should not be triggered. Confirmed
  also.

- Lucid widgets. At HEAD, without any changes. The fix is in place. The
  bug should not be triggered. Again, confirmed. The fix works.

- GTK widgets. At HEAD, without any changes.

For each of these cases, I started a daemon with

  dima <at> scrawny:~/emacs/src$ ./emacs -Q --daemon=tst

Then I looped the client, repeatedly spawning and killing new frames
(same as what I was doing when looking for leaks before)

  dima <at> scrawny:~/emacs/lib-src$ while true; do timeout 1 ./emacsclient
  -a '' -c -s tst; sleep 1; done;

At the same time, I'd collect memory usage data:

  dima <at> scrawny:~$ while true; do ps -h -p PID -O rss; sleep 1; done | tee /tmp/logfile

I'm attaching the gathered data for all the cases, and a plot made with
this command:

  $ cat \
    <(< /tmp/before.log awk '{print NR,"nofix_yesscroll",$2}') \
    <(< /tmp/before.no.scroll.log  awk '{print NR,"nofix_noscroll",$2}') \
    <(< /tmp/after.yes.scroll.log awk '{print NR,"yesfix_yesscroll",$2}') \
    <(< /tmp/after.yes.scroll.gtk.log awk '{print NR,"yesfix_yesscroll_gtk",$2}') \
    | feedgnuplot --lines --dataid --domain --autolegend --xlabel 'frame index (2 seconds per frame)' --ylabel 'Memory consumed (kB)'

So in the case of lucid widgets, you not only fixed the bug, but also
massively plugged the memory leaks that were associated with new emacs
frames. Awesome!

The fix doesn't affect GTK, so that plot is just for reference. It's
clearly still leaking, and its baseline memory usage is higher. I was
thinking that GTK does a larger number of small allocations that are
soon freed, and thus suffers from more fragmentation issues. To test
that, I invoked malloc_trim(0) just after t=450 in the gtk run. This
asks malloc to return the the kernel all the unused memory it's holding.
And we see a 5MB drop. So the GTK leaks aren't as bad as they look.

So yeah. Good fix and thanks.

[before.log (application/octet-stream, attachment)]
[before.no.scroll.log (application/octet-stream, attachment)]
[after.yes.scroll.log (application/octet-stream, attachment)]
[after.yes.scroll.gtk.log (application/octet-stream, attachment)]
[emacs21509.pdf (application/pdf, attachment)]

Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#21509; Package emacs. (Tue, 05 Sep 2017 08:19:02 GMT) Full text and rfc822 format available.

Message #17 received at 21509 <at> debbugs.gnu.org (full text, mbox):

From: martin rudalics <rudalics <at> gmx.at>
To: Dima Kogan <dima <at> secretsauce.net>
Cc: 21509 <at> debbugs.gnu.org
Subject: Re: bug#21509: 25.0.50; X11 error: BadPixmap when creating first
 emacsclient frame; and memory leak
Date: Tue, 05 Sep 2017 10:18:25 +0200
> So in the case of lucid widgets, you not only fixed the bug,

I coin it a "fix" because I just pushed the same workaround that had
been used for the GTK build before.  But it hardly qualifies as a real
fix because it makes the referencing counting mechanism practically
obsolete for any kind of frame deletion.

> but also
> massively plugged the memory leaks that were associated with new emacs
> frames. Awesome!

IIUC your numbers of Lucid with scrollbars now coincide with the numbers
of Lucid without scrollbars before the "fix".  OTOH the numbers for GTK
largely coincide with those of Lucid with scrollbars before the "fix".
So X itself seems much more dominant than any toolkit particularities.

> The fix doesn't affect GTK, so that plot is just for reference. It's
> clearly still leaking, and its baseline memory usage is higher. I was
> thinking that GTK does a larger number of small allocations that are
> soon freed, and thus suffers from more fragmentation issues. To test
> that, I invoked malloc_trim(0) just after t=450 in the gtk run. This
> asks malloc to return the the kernel all the unused memory it's holding.
> And we see a 5MB drop. So the GTK leaks aren't as bad as they look.

I think what was going on before the "fix" is this: We free data we
shouldn't, probably due to a bug in the reference counting mechanism.
Some time later, after that data got overwritten, the new data triggers
the error in the toolkit code.  I say "some time later" because the
error (1) apparently nowhere triggers with _every_ invocation of
emacsclient, (2) its frequency seems to differ according to the various
bug reports from every third invocation to every fifth, and (3) it seems
to have a fairly constant frequency, at least here when it starts to
happen every third invocation it will continue to happen every third
invocation.

While the above tries to explain the bug reported by the toolkit, any
memory leak should follow from trying to delete the terminal but when
doing so we forget to explicitly free some (maybe font related) data and
when invoking a new client we reallocate that data instead of reusing
the one we forgot about.

This does not explain any difference between the GTK (before the "fix")
and Lucid (after the "fix") behaviors.  What happens with GTK when you
allow it to delete the terminal by allowing terminal->reference_count
drop to zero?  If it does not crash immediately, is the memory leak more
heavy than it is now?

Also, could you try whether changing ‘gc-cons-threshold’ in either
direction has any impact on the occurrence of the toolkit bug or the
growth of the memory leak?  Once I thought that this could affect the
frequency of the error but didn't get any conclusive results.

Thanks, martin





Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#21509; Package emacs. (Sat, 09 Sep 2017 01:50:02 GMT) Full text and rfc822 format available.

Message #20 received at 21509 <at> debbugs.gnu.org (full text, mbox):

From: Dima Kogan <dima <at> secretsauce.net>
To: martin rudalics <rudalics <at> gmx.at>
Cc: 21509 <at> debbugs.gnu.org
Subject: Re: bug#21509: 25.0.50;
 X11 error: BadPixmap when creating first emacsclient frame; and memory leak
Date: Fri, 08 Sep 2017 18:49:38 -0700
martin rudalics <rudalics <at> gmx.at> writes:

>  > but also
>  > massively plugged the memory leaks that were associated with new emacs
>  > frames. Awesome!
>
> IIUC your numbers of Lucid with scrollbars now coincide with the numbers
> of Lucid without scrollbars before the "fix".

No, that's not right.

Lucid with scrollbars post-fix is the blue line: memory usage is stable
as frames are created/destroyed: the leak is ~0

Lucid without scrollbars pre-fix is the green line. Memory usage is
climbing. We aren't leaking as badly as the other cases, but we ARE
leaking.

So the fix resolved the large leakages in the other cases and also the
small leakages that weren't scroll-bar-related.


> OTOH the numbers for GTK largely coincide with those of Lucid with
> scrollbars before the "fix". So X itself seems much more dominant than
> any toolkit particularities.

I don't think this is right either. Lucid with scrollbars pre-fix is the
purple line. We leak memory at a high, constant rate. GTK memory usage
(yellow) is noisy and fragmented (I bet we're invoking malloc/free much
more often). The baseline memory consumption is higher, the past that,
the leak rate isn't nearly as bad as the purple. The higher
fragmentation means that the internals of malloc() matter too: I invoked
malloc_trim() just after t=450s, and we see the memory usage dropped
sharply as a result.


> This does not explain any difference between the GTK (before the "fix")
> and Lucid (after the "fix") behaviors.  What happens with GTK when you
> allow it to delete the terminal by allowing terminal->reference_count
> drop to zero?  If it does not crash immediately, is the memory leak more
> heavy than it is now?

I haven't run that experiment, but I could do that. Probably won't get
to it for a week, though.


> Also, could you try whether changing  gc-cons-threshold  in either
> direction has any impact on the occurrence of the toolkit bug or the
> growth of the memory leak?  Once I thought that this could affect the
> frequency of the error but didn't get any conclusive results.

OK. I'll look.

By the way, if you haven't tried it yet, look at rr:

http://www.rr-project.com

It's a debugging instrumentation tool that saves a process execution
trace, and lets you replay it forwards and backwards in gdb. Could make
it possible to find heisen-problems such as this.




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#21509; Package emacs. (Sat, 09 Sep 2017 08:01:02 GMT) Full text and rfc822 format available.

Message #23 received at 21509 <at> debbugs.gnu.org (full text, mbox):

From: martin rudalics <rudalics <at> gmx.at>
To: Dima Kogan <dima <at> secretsauce.net>
Cc: 21509 <at> debbugs.gnu.org
Subject: Re: bug#21509: 25.0.50; X11 error: BadPixmap when creating first
 emacsclient frame; and memory leak
Date: Sat, 09 Sep 2017 09:59:50 +0200
>> IIUC your numbers of Lucid with scrollbars now coincide with the numbers
>> of Lucid without scrollbars before the "fix".
>
> No, that's not right.
>
> Lucid with scrollbars post-fix is the blue line: memory usage is stable
> as frames are created/destroyed: the leak is ~0
>
> Lucid without scrollbars pre-fix is the green line. Memory usage is
> climbing. We aren't leaking as badly as the other cases, but we ARE
> leaking.
>
> So the fix resolved the large leakages in the other cases and also the
> small leakages that weren't scroll-bar-related.

I can't follow you.  before.no.scroll.log has

16112 19608 S ?        00:00:00 emacs
...
16112 29700 S ?        00:00:12 emacs

while after.yes.scroll.log has

17508 19652 S ?        00:00:00 emacs
...
17508 29160 S ?        00:00:13 emacs

which strike me as very similar (especially given the figures of the two
other logs).  What am I missing?

>> OTOH the numbers for GTK largely coincide with those of Lucid with
>> scrollbars before the "fix". So X itself seems much more dominant than
>> any toolkit particularities.
>
> I don't think this is right either. Lucid with scrollbars pre-fix is the
> purple line. We leak memory at a high, constant rate. GTK memory usage
> (yellow) is noisy and fragmented (I bet we're invoking malloc/free much
> more often). The baseline memory consumption is higher, the past that,
> the leak rate isn't nearly as bad as the purple. The higher
> fragmentation means that the internals of malloc() matter too: I invoked
> malloc_trim() just after t=450s, and we see the memory usage dropped
> sharply as a result.

I can't tell that since it's not in your figures.  But again, the raw
data from before.log and after.yes.scroll.gtk.log seem similar too.

>> This does not explain any difference between the GTK (before the "fix")
>> and Lucid (after the "fix") behaviors.  What happens with GTK when you
>> allow it to delete the terminal by allowing terminal->reference_count
>> drop to zero?  If it does not crash immediately, is the memory leak more
>> heavy than it is now?
>
> I haven't run that experiment, but I could do that. Probably won't get
> to it for a week, though.

Please try.  If you succeed running it, the resulting difference between
the "before" GTK and Lucid runs should IMHO show the noise introduced by
malloc/free more clearly.  And obviously running the before/after
experiments for Motif would be interesting too: The Motif build crashed
in the menu bar section while the Lucid build crashed in the scroll bar
section.

>> Also, could you try whether changing  gc-cons-threshold  in either
>> direction has any impact on the occurrence of the toolkit bug or the
>> growth of the memory leak?  Once I thought that this could affect the
>> frequency of the error but didn't get any conclusive results.
>
> OK. I'll look.
>
> By the way, if you haven't tried it yet, look at rr:
>
> http://www.rr-project.com

It's

http://www.rr-project.org

now.

> It's a debugging instrumentation tool that saves a process execution
> trace, and lets you replay it forwards and backwards in gdb. Could make
> it possible to find heisen-problems such as this.

If it's a GC related problem, it would need more patience than I'm able
to provide anyway.  I'm still trying to find a common pattern for all
these behaviors from a coarse-grained perspective.  And obviously it
would be interesting to find out why the whereabouts at the time of the
"X11 error that blows up the window" apparently differ for the various
builds.  IIUC it's still us who determine which one to draw first - the
menu bar or the scroll bar.

martin




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#21509; Package emacs. (Sun, 15 Oct 2017 03:49:02 GMT) Full text and rfc822 format available.

Message #26 received at 21509 <at> debbugs.gnu.org (full text, mbox):

From: Dima Kogan <dima <at> secretsauce.net>
To: martin rudalics <rudalics <at> gmx.at>
Cc: 21509 <at> debbugs.gnu.org
Subject: Re: bug#21509: 25.0.50;
 X11 error: BadPixmap when creating first emacsclient frame; and memory leak
Date: Sat, 14 Oct 2017 20:48:33 -0700
[Message part 1 (text/plain, inline)]
Hi. Sorry it took me this long to get back to you. I'll try to reply in
a more timely way if you have more requests. Notes inline


martin rudalics <rudalics <at> gmx.at> writes:

>  >> IIUC your numbers of Lucid with scrollbars now coincide with the numbers
>  >> of Lucid without scrollbars before the "fix".
>  >
>  > No, that's not right.
>  >
>  > Lucid with scrollbars post-fix is the blue line: memory usage is stable
>  > as frames are created/destroyed: the leak is ~0
>  >
>  > Lucid without scrollbars pre-fix is the green line. Memory usage is
>  > climbing. We aren't leaking as badly as the other cases, but we ARE
>  > leaking.
>  >
>  > So the fix resolved the large leakages in the other cases and also the
>  > small leakages that weren't scroll-bar-related.
>
> I can't follow you.  before.no.scroll.log has
>
> 16112 19608 S ?        00:00:00 emacs
> ...
> 16112 29700 S ?        00:00:12 emacs
>
> while after.yes.scroll.log has
>
> 17508 19652 S ?        00:00:00 emacs
> ...
> 17508 29160 S ?        00:00:13 emacs
>
> which strike me as very similar (especially given the figures of the two
> other logs).  What am I missing?

I wasn't consistent in how long I was running each trial, so by simply
looking at the final memory usage values as you're doing above, you're
looking at different runtimes. The plot attached the last time shows all
of the raw data, and you should clearly see the different slopes of the
two trials: i.e. one is leaking and the other is not.



>  >> OTOH the numbers for GTK largely coincide with those of Lucid with
>  >> scrollbars before the "fix". So X itself seems much more dominant than
>  >> any toolkit particularities.
>  >
>  > I don't think this is right either. Lucid with scrollbars pre-fix is the
>  > purple line. We leak memory at a high, constant rate. GTK memory usage
>  > (yellow) is noisy and fragmented (I bet we're invoking malloc/free much
>  > more often). The baseline memory consumption is higher, the past that,
>  > the leak rate isn't nearly as bad as the purple. The higher
>  > fragmentation means that the internals of malloc() matter too: I invoked
>  > malloc_trim() just after t=450s, and we see the memory usage dropped
>  > sharply as a result.
>
> I can't tell that since it's not in your figures.  But again, the raw
> data from before.log and after.yes.scroll.gtk.log seem similar too.

Again, the plot is much more descriptive. The GTK case uses a lot more
memory upfront and then leaks slowly. Conversely, the lucid
no-fix-yes-scrollbars case uses much less upfront, but then leaks much
faster. In the arbitrary-length trials from the previous email, they end
up at roughly the same place. But that's a coincidence.



>  >> This does not explain any difference between the GTK (before the "fix")
>  >> and Lucid (after the "fix") behaviors.  What happens with GTK when you
>  >> allow it to delete the terminal by allowing terminal->reference_count
>  >> drop to zero?  If it does not crash immediately, is the memory leak more
>  >> heavy than it is now?
>  >
>  > I haven't run that experiment, but I could do that. Probably won't get
>  > to it for a week, though.
>
> Please try.  If you succeed running it, the resulting difference between
> the "before" GTK and Lucid runs should IMHO show the noise introduced by
> malloc/free more clearly.  And obviously running the before/after
> experiments for Motif would be interesting too: The Motif build crashed
> in the menu bar section while the Lucid build crashed in the scroll bar
> section.
>
>  >> Also, could you try whether changing  gc-cons-threshold  in either
>  >> direction has any impact on the occurrence of the toolkit bug or the
>  >> growth of the memory leak?  Once I thought that this could affect the
>  >> frequency of the error but didn't get any conclusive results.

I ran a few more trials to get the data you asked for. All of these:

- Are at the same baseline revision as before: 979797b9eca0ab. This is
  after your lucid "fix"

- Are 4 minutes long

- invoke malloc_trim() at the end, so the noticeable drop in resident
  memory use at the end of each trial is visible

The different trials are

- GTK stock: has the reference-counting logic to prevent
  terminal->reference_count reaching 0

- GTK without the refcount logic: has NO reference-counting logic to
  prevent terminal->reference_count reaching 0

- Lucid stock

- Lucid with no refcount logic and stock 800k gc-cons-threshold

- Lucid with no refcount logic and 80k gc-cons-threshold

- Lucid with no refcount logic and 8000k gc-cons-threshold

Raw data and accompanying plot attached. Plot made thusly:

cat \
    <(< gtk.stock.log awk '{print NR,"gtk.stock.log",             $2}') \
    <(< gtk.no.refcount.logic.log awk '{print NR,"gtk.no.refcount.logic.log",   $2}') \
    <(< lucid.no.refcount.logic.gc80k.log awk '{print NR,"lucid.no.refcount.logic.gc80k.log",   $2}') \
    <(< lucid.no.refcount.logic.gc800k.log awk '{print NR,"lucid.no.refcount.logic.gc800k.log",  $2}') \
    <(< lucid.no.refcount.logic.gc8000k.log awk '{print NR,"lucid.no.refcount.logic.gc8000k.log", $2}') \
    <(< lucid.yes.refcount.logic.log awk '{print NR,"lucid.yes.refcount.logic.log",           $2}') \
    | feedgnuplot --lines --dataid --domain --autolegend --xlabel 'frame index (2 seconds per frame)' --ylabel 'Memory consumed (kB)'


As expected, turning off the refcounting logic for the GTK terminal
makes it crash long before the trial ends. And it leaks lots of memory
in the meantime: ~ 430kB/frame

For whatever reason, the gtk trial is much more consistent this time. It
leaks at ~ 15kB/frame, although malloc_trim() gives back 1300kB at the
end, so the 15kB/frame is an overestimate, as far as emacs is concerned
at least.

As before, adding the refcounting logic to the lucid terminal drops the
leak rate to 0.

Tweaking gc-cons-threshold does have an effect, but it's not clear what.
The default is 800kB. Looks like the 80k and 800k settings produce
similar leaks at 54kB/frame.

An 8000kB setting produces a sharp climb of about 8000kB above the
baseline, and then a slower leak of ~44kB/frame, although this isn't as
linear as the others.

Does any of this speak to you?



[lucid.no.refcount.logic.gc80k.log (application/octet-stream, attachment)]
[lucid.no.refcount.logic.gc800k.log (application/octet-stream, attachment)]
[lucid.no.refcount.logic.gc8000k.log (application/octet-stream, attachment)]
[lucid.yes.refcount.logic.log (application/octet-stream, attachment)]
[gtk.stock.log (application/octet-stream, attachment)]
[gtk.no.refcount.logic.log (application/octet-stream, attachment)]
[emacs21509_2017-10-14.pdf (application/pdf, attachment)]

Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#21509; Package emacs. (Sun, 15 Oct 2017 09:42:01 GMT) Full text and rfc822 format available.

Message #29 received at 21509 <at> debbugs.gnu.org (full text, mbox):

From: martin rudalics <rudalics <at> gmx.at>
To: Dima Kogan <dima <at> secretsauce.net>
Cc: 21509 <at> debbugs.gnu.org
Subject: Re: bug#21509: 25.0.50; X11 error: BadPixmap when creating first
 emacsclient frame; and memory leak
Date: Sun, 15 Oct 2017 11:40:54 +0200
> The plot attached the last time shows all
> of the raw data, and you should clearly see the different slopes of the
> two trials: i.e. one is leaking and the other is not.
[...]
> Again, the plot is much more descriptive.

I sadly have to admit that I misinterpreted your last posts completely
simply because I then missed the pdf attachment (which I usually don't
see with my settings).  When you were talking about blue and green lines
I thought of them as borders we'd better not transgress ...  So most of
my earlier remarks were just silly.

Now everything is much much clearer to me.  In fact, your plots make it
easy to distinguish the "upfront" consumption from the actual leaks and
allow to see the leakage clearly.  I'll eventually have to include links
to your plots in the code.

> Does any of this speak to you?

The only thing that still stupefies me is why an 80k cons threshold
eventually produces strictly more leakage than the 800k and 8000k ones.
Do you have an explanation for that?  Doesn't that also mean that with
such a low threshold running the test for some twenty minutes will have
us run out of memory?

Ah yes.  Could you please run GTK (not the crashing one) with the 80k
and 8000k cons thresholds too?

Many thanks, martin




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#21509; Package emacs. (Sun, 15 Oct 2017 18:25:01 GMT) Full text and rfc822 format available.

Message #32 received at 21509 <at> debbugs.gnu.org (full text, mbox):

From: Dima Kogan <dima <at> secretsauce.net>
To: martin rudalics <rudalics <at> gmx.at>
Cc: 21509 <at> debbugs.gnu.org
Subject: Re: bug#21509: 25.0.50;
 X11 error: BadPixmap when creating first emacsclient frame; and memory leak
Date: Sun, 15 Oct 2017 11:24:49 -0700
[Message part 1 (text/plain, inline)]
martin rudalics <rudalics <at> gmx.at> writes:

> Now everything is much much clearer to me.  In fact, your plots make it
> easy to distinguish the "upfront" consumption from the actual leaks and
> allow to see the leakage clearly.  I'll eventually have to include links
> to your plots in the code.

Glad that was cleared up.


>  > Does any of this speak to you?
>
> The only thing that still stupefies me is why an 80k cons threshold
> eventually produces strictly more leakage than the 800k and 8000k ones.
> Do you have an explanation for that?
>
> Doesn't that also mean that with such a low threshold running the test
> for some twenty minutes will have us run out of memory?

Sigh. This was my fault: it appears that I sent you the wrong plot.
Apparently if the emacs windows pop up in the background, the toolkits
do less work and allocate less memory. The plot I sent by mistake was
made while I was doing other work on that machine, so the memory usage
is inconsistent. Yesterday I reran the trials to get good data, and the
data in that email is correct. But I sent the wrong plot by mistake.


> Ah yes.  Could you please run GTK (not the crashing one) with the 80k
> and 8000k cons thresholds too?

OK. Let's forget about yesterday's plot. Here's a new plot containing
all the correct data from yesterday, and these two new trials you just
asked for.

In both the lucid and gtk cases, we get similar results with
gc-cons-threshold=80k and gc-cons-threshold=800k.

Setting it to 8000k, we observe an initial steep climb of ~8000k, at
which point the gc kicks in, and we start leaking at the same rate as
the 80k,800k cases, albeit with a ~8000k offset. I think the initial
climb makes sense to me, but not the offset. Does it make sense to you?


[emacs21509_2017-10-15.pdf (application/pdf, attachment)]
[gtk.yes.refcount.logic.gc80k.log (application/octet-stream, attachment)]
[gtk.yes.refcount.logic.gc8000k.log (application/octet-stream, attachment)]

Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#21509; Package emacs. (Mon, 16 Oct 2017 07:43:02 GMT) Full text and rfc822 format available.

Message #35 received at 21509 <at> debbugs.gnu.org (full text, mbox):

From: martin rudalics <rudalics <at> gmx.at>
To: Dima Kogan <dima <at> secretsauce.net>
Cc: 21509 <at> debbugs.gnu.org
Subject: Re: bug#21509: 25.0.50; X11 error: BadPixmap when creating first
 emacsclient frame; and memory leak
Date: Mon, 16 Oct 2017 09:42:09 +0200
> Setting it to 8000k, we observe an initial steep climb of ~8000k, at
> which point the gc kicks in, and we start leaking at the same rate as
> the 80k,800k cases, albeit with a ~8000k offset. I think the initial
> climb makes sense to me, but not the offset. Does it make sense to you?

Do you mean the apparent fact that the value of cons threshold affects
the Resident Set Size?

What surprises me is that both 800K cons-threshold runs apparently start
with a considerable lower memory consumption (around 20000 kB) than the
80k and 8000k runs.

martin




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#21509; Package emacs. (Mon, 16 Oct 2017 16:34:01 GMT) Full text and rfc822 format available.

Message #38 received at 21509 <at> debbugs.gnu.org (full text, mbox):

From: Dima Kogan <dima <at> secretsauce.net>
To: martin rudalics <rudalics <at> gmx.at>
Cc: 21509 <at> debbugs.gnu.org
Subject: Re: bug#21509: 25.0.50;
 X11 error: BadPixmap when creating first emacsclient frame; and memory leak
Date: Mon, 16 Oct 2017 09:33:01 -0700
martin rudalics <rudalics <at> gmx.at> writes:

> What surprises me is that both 800K cons-threshold runs apparently start
> with a considerable lower memory consumption (around 20000 kB) than the
> 80k and 8000k runs.

Ah, that I CAN explain: it's an artifact of the way I ran the trials.

For the 800k cases I'd start the emacs daemon from the commandline, and
then start gathering the data and cycling the frames.

For the 80k and 8000k cases I'd start the daemon, and then take an
extra step:

  Open a single client frame to (setq gc-cons-threshold whatever)

I should have started gathering the usage data before I opened this
first frame, but I guess I didn't. So the memory usage plots the 800k
start when no frames have ever been created, but the 80k and 8000k start
after 1 frame was created.

So this difference isn't indicative of anything interesting.




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#21509; Package emacs. (Tue, 17 Oct 2017 09:00:02 GMT) Full text and rfc822 format available.

Message #41 received at 21509 <at> debbugs.gnu.org (full text, mbox):

From: martin rudalics <rudalics <at> gmx.at>
To: Dima Kogan <dima <at> secretsauce.net>
Cc: 21509 <at> debbugs.gnu.org
Subject: Re: bug#21509: 25.0.50; X11 error: BadPixmap when creating first
 emacsclient frame; and memory leak
Date: Tue, 17 Oct 2017 10:58:59 +0200
> Ah, that I CAN explain: it's an artifact of the way I ran the trials.
[...]
> So this difference isn't indicative of anything interesting.

Thanks.  As for the issue you can't explain was my guess

  Do you mean the apparent fact that the value of cons threshold affects
  the Resident Set Size?

correct?

martin




This bug report was last modified 7 years and 298 days ago.

Previous Next


GNU bug tracking system
Copyright (C) 1999 Darren O. Benham, 1997,2003 nCipher Corporation Ltd, 1994-97 Ian Jackson.