Package: emacs;
Reported by: Spencer Baugh <sbaugh <at> janestreet.com>
Date: Wed, 19 Jul 2023 21:17:02 UTC
Severity: normal
Found in version 29.0.92
View this message in rfc822 format
From: Dmitry Gutov <dmitry <at> gutov.dev> To: Eli Zaretskii <eliz <at> gnu.org> Cc: luangruo <at> yahoo.com, sbaugh <at> janestreet.com, yantar92 <at> posteo.net, 64735 <at> debbugs.gnu.org Subject: bug#64735: 29.0.92; find invocations are ~15x slower because of ignores Date: Fri, 8 Sep 2023 03:53:37 +0300
Let's try to investigate this some more, if we can. On 31/07/2023 14:38, Eli Zaretskii wrote: >> Date: Sun, 30 Jul 2023 04:35:49 +0300 >> Cc: luangruo <at> yahoo.com, sbaugh <at> janestreet.com, yantar92 <at> posteo.net, >> 64735 <at> debbugs.gnu.org >> From: Dmitry Gutov <dmitry <at> gutov.dev> >> >> In this context, junks objects are objects that will need to be >> collected by garbage collector very soon because they are just a >> byproduct of a function's execution (but aren't used in the return >> value, for example). The more of them a function creates, the more work >> it will be, supposedly, for the GC. >> >> Heap is perhaps the wrong term (given that C has its own notion of >> heap), but I meant the memory managed by the Lisp runtime. >> >> And chunks are the buffered strings that get passed to the process >> filter. Chunks of the process' output. By default, these chunks are 4096 >> characters long, but the comparisons tweak that value by 10x and 100x. > > If the subprocess output is inserted into a buffer, its effect on the > GC will be different. (Not sure if this is relevant to the issue at > hand, as I lost track of the many variants of the function that were > presented.) Yes, one of the variants inserts into the buffer (one that uses a synchronous process call and also, coincidentally, spends less time in GC), and the asynchronous work from a process filter. >>> If I read what you wrote superficially, without delving into the >>> details (which I can't understand), you are saying that the overall >>> amount of consing is roughly the same. >> >> What is "amount of consing"? Is it just the number of objects? Or does >> their size (e.g. string length) affect GC pressure as well? > > In general, both, since we have 2 GC thresholds, and GC is actually > done when both are exceeded. So the effect will also depend on how > much Lisp memory is already allocated in the Emacs process where these > benchmarks are run. All right. >>> This is consistent with the >>> fact that the GC times change only very little. So I don't think I >>> see, on this level, what puzzles you in this picture. >> >> Now that you pointed that out, the picture is just more puzzling. While >> 0.1s in GC is not insignificant (it's 10% of the whole runtime), it does >> seem to have been more of a fluke, and on average the fluctuations in GC >> time are smaller. >> >> Here's an extended comparison: >> >> (("with-find 4096" . "Elapsed time: 1.737742s (1.019624s in 28 GCs)") >> ("with-find 40960" . "Elapsed time: 1.515376s (0.942906s in 26 GCs)") >> ("with-find 409600" . "Elapsed time: 1.458987s (0.948857s in 26 GCs)") >> ("with-find 1048576" . "Elapsed time: 1.363882s (0.888599s in 24 GCs)") >> ("with-find-p 4096" . "Elapsed time: 1.202522s (0.745758s in 19 GCs)") >> ("with-find-p 40960" . "Elapsed time: 1.005221s (0.640815s in 16 GCs)") >> ("with-find-p 409600" . "Elapsed time: 0.855483s (0.591208s in 15 GCs)") >> ("with-find-p 1048576". "Elapsed time: 0.825936s (0.623876s in 16 GCs)") >> ("with-find-sync 4096" . "Elapsed time: 0.848059s (0.272570s in 7 GCs)") >> ("with-find-sync 409600"."Elapsed time: 0.912932s (0.339230s in 9 GCs)") >> ("with-find-sync 1048576"."Elapsed time: 0.880479s (0.303047s in 8 GCs)" >> )) >> >> What was puzzling for me, overall, is that if we take "with-find 409600" >> (the fastest among the asynchronous runs without parallelism) and >> "with-find-sync", the difference in GC time (which is repeatable), >> 0.66s, almost covers all the difference in performance. And as for >> "with-find-p 409600", it would come out on top! Which it did in Ihor's >> tests when GC was disabled. >> >> But where does the extra GC time come from? Is it from extra consing in >> the asynchronous call's case? If it is, it's not from all the chunked >> strings, apparently, given that increasing max string's size (and >> decreasing their number by 2x-6x, according to my logging) doesn't >> affect the reported GC time much. >> >> Could the extra time spent in GC just come from the fact that it's given >> more opportunities to run, maybe? call_process stays entirely in C, >> whereas make-process, with its asynchronous approach, goes between C and >> Lisp even time it receives input. The report above might indicate so: >> with-find-p have ~20 garbage collection cycles, whereas with-find-sync - >> only ~10. Or could there be some other source of consing, unrelated to >> the process output string, and how finely they are sliced? > > These questions can only be answered by dumping the values of the 2 GC > thresholds and of consing_until_gc for each GC cycle. It could be > that we are consing more Lisp memory, or it could be that one of the > implementations provides fewer opportunities for Emacs to call > maybe_gc. Or it could be some combination of the two. Do you think the outputs of https://elpa.gnu.org/packages/emacs-gc-stats.html could help? Otherwise, I suppose I need to add some fprintf's somewhere. Would the beginning of maybe_gc inside lisp.h be a good place for that? >> If we get back to increasing read-process-output-max, which does help >> (apparently due to reducing the number we switch between reading from >> the process and doing... whatever else), the sweet spot seems to be >> 1048576, which is my system's maximum value. Anything higher - and the >> perf goes back to worse -- I'm guessing something somewhere resets the >> value to default? Not sure why it doesn't clip to the maximum allowed, >> though. >> >> Anyway, it would be helpful to be able to decide on as high as possible >> value without manually reading from /proc/sys/fs/pipe-max-size. And what >> of other OSes? > > Is this with pipes or with PTYs? All examples which use make-process call it with :connection-type 'pipe. The one that calls process-file (the "synchronous" impl) also probably does, but I don't see that in the docstring.
GNU bug tracking system
Copyright (C) 1999 Darren O. Benham,
1997,2003 nCipher Corporation Ltd,
1994-97 Ian Jackson.