GNU bug report logs - #64735
29.0.92; find invocations are ~15x slower because of ignores

Previous Next

Package: emacs;

Reported by: Spencer Baugh <sbaugh <at> janestreet.com>

Date: Wed, 19 Jul 2023 21:17:02 UTC

Severity: normal

Found in version 29.0.92

Full log


View this message in rfc822 format

From: Dmitry Gutov <dmitry <at> gutov.dev>
To: Eli Zaretskii <eliz <at> gnu.org>
Cc: luangruo <at> yahoo.com, sbaugh <at> janestreet.com, yantar92 <at> posteo.net, 64735 <at> debbugs.gnu.org
Subject: bug#64735: 29.0.92; find invocations are ~15x slower because of ignores
Date: Fri, 8 Sep 2023 03:53:37 +0300
Let's try to investigate this some more, if we can.

On 31/07/2023 14:38, Eli Zaretskii wrote:
>> Date: Sun, 30 Jul 2023 04:35:49 +0300
>> Cc: luangruo <at> yahoo.com, sbaugh <at> janestreet.com, yantar92 <at> posteo.net,
>>   64735 <at> debbugs.gnu.org
>> From: Dmitry Gutov <dmitry <at> gutov.dev>
>>
>> In this context, junks objects are objects that will need to be
>> collected by garbage collector very soon because they are just a
>> byproduct of a function's execution (but aren't used in the return
>> value, for example). The more of them a function creates, the more work
>> it will be, supposedly, for the GC.
>>
>> Heap is perhaps the wrong term (given that C has its own notion of
>> heap), but I meant the memory managed by the Lisp runtime.
>>
>> And chunks are the buffered strings that get passed to the process
>> filter. Chunks of the process' output. By default, these chunks are 4096
>> characters long, but the comparisons tweak that value by 10x and 100x.
> 
> If the subprocess output is inserted into a buffer, its effect on the
> GC will be different.  (Not sure if this is relevant to the issue at
> hand, as I lost track of the many variants of the function that were
> presented.)

Yes, one of the variants inserts into the buffer (one that uses a 
synchronous process call and also, coincidentally, spends less time in 
GC), and the asynchronous work from a process filter.

>>> If I read what you wrote superficially, without delving into the
>>> details (which I can't understand), you are saying that the overall
>>> amount of consing is roughly the same.
>>
>> What is "amount of consing"? Is it just the number of objects? Or does
>> their size (e.g. string length) affect GC pressure as well?
> 
> In general, both, since we have 2 GC thresholds, and GC is actually
> done when both are exceeded.  So the effect will also depend on how
> much Lisp memory is already allocated in the Emacs process where these
> benchmarks are run.

All right.

>>> This is consistent with the
>>> fact that the GC times change only very little.  So I don't think I
>>> see, on this level, what puzzles you in this picture.
>>
>> Now that you pointed that out, the picture is just more puzzling. While
>> 0.1s in GC is not insignificant (it's 10% of the whole runtime), it does
>> seem to have been more of a fluke, and on average the fluctuations in GC
>> time are smaller.
>>
>> Here's an extended comparison:
>>
>> (("with-find 4096" . "Elapsed time: 1.737742s (1.019624s in 28 GCs)")
>>    ("with-find 40960" . "Elapsed time: 1.515376s (0.942906s in 26 GCs)")
>>    ("with-find 409600" . "Elapsed time: 1.458987s (0.948857s in 26 GCs)")
>>    ("with-find 1048576" . "Elapsed time: 1.363882s (0.888599s in 24 GCs)")
>>    ("with-find-p 4096" . "Elapsed time: 1.202522s (0.745758s in 19 GCs)")
>>    ("with-find-p 40960" . "Elapsed time: 1.005221s (0.640815s in 16 GCs)")
>> ("with-find-p 409600" . "Elapsed time: 0.855483s (0.591208s in 15 GCs)")
>> ("with-find-p 1048576". "Elapsed time: 0.825936s (0.623876s in 16 GCs)")
>> ("with-find-sync 4096" . "Elapsed time: 0.848059s (0.272570s in 7 GCs)")
>> ("with-find-sync 409600"."Elapsed time: 0.912932s (0.339230s in 9 GCs)")
>> ("with-find-sync 1048576"."Elapsed time: 0.880479s (0.303047s in 8 GCs)"
>> ))
>>
>> What was puzzling for me, overall, is that if we take "with-find 409600"
>> (the fastest among the asynchronous runs without parallelism) and
>> "with-find-sync", the difference in GC time (which is repeatable),
>> 0.66s, almost covers all the difference in performance. And as for
>> "with-find-p 409600", it would come out on top! Which it did in Ihor's
>> tests when GC was disabled.
>>
>> But where does the extra GC time come from? Is it from extra consing in
>> the asynchronous call's case? If it is, it's not from all the chunked
>> strings, apparently, given that increasing max string's size (and
>> decreasing their number by 2x-6x, according to my logging) doesn't
>> affect the reported GC time much.
>>
>> Could the extra time spent in GC just come from the fact that it's given
>> more opportunities to run, maybe? call_process stays entirely in C,
>> whereas make-process, with its asynchronous approach, goes between C and
>> Lisp even time it receives input. The report above might indicate so:
>> with-find-p have ~20 garbage collection cycles, whereas with-find-sync -
>> only ~10. Or could there be some other source of consing, unrelated to
>> the process output string, and how finely they are sliced?
> 
> These questions can only be answered by dumping the values of the 2 GC
> thresholds and of consing_until_gc for each GC cycle.  It could be
> that we are consing more Lisp memory, or it could be that one of the
> implementations provides fewer opportunities for Emacs to call
> maybe_gc.  Or it could be some combination of the two.

Do you think the outputs of 
https://elpa.gnu.org/packages/emacs-gc-stats.html could help?

Otherwise, I suppose I need to add some fprintf's somewhere. Would the 
beginning of maybe_gc inside lisp.h be a good place for that?

>> If we get back to increasing read-process-output-max, which does help
>> (apparently due to reducing the number we switch between reading from
>> the process and doing... whatever else), the sweet spot seems to be
>> 1048576, which is my system's maximum value. Anything higher - and the
>> perf goes back to worse -- I'm guessing something somewhere resets the
>> value to default? Not sure why it doesn't clip to the maximum allowed,
>> though.
>>
>> Anyway, it would be helpful to be able to decide on as high as possible
>> value without manually reading from /proc/sys/fs/pipe-max-size. And what
>> of other OSes?
> 
> Is this with pipes or with PTYs?

All examples which use make-process call it with :connection-type 'pipe.

The one that calls process-file (the "synchronous" impl) also probably 
does, but I don't see that in the docstring.




This bug report was last modified 1 year and 273 days ago.

Previous Next


GNU bug tracking system
Copyright (C) 1999 Darren O. Benham, 1997,2003 nCipher Corporation Ltd, 1994-97 Ian Jackson.