GNU bug report logs -
#64735
29.0.92; find invocations are ~15x slower because of ignores
Previous Next
Full log
Message #491 received at 64735 <at> debbugs.gnu.org (full text, mbox):
[Message part 1 (text/plain, inline)]
On 10/09/2023 08:33, Eli Zaretskii wrote:
>> Date: Sun, 10 Sep 2023 04:30:24 +0300
>> Cc: luangruo <at> yahoo.com, sbaugh <at> janestreet.com, yantar92 <at> posteo.net,
>> 64735 <at> debbugs.gnu.org
>> From: Dmitry Gutov <dmitry <at> gutov.dev>
>>
>> On 08/09/2023 09:35, Eli Zaretskii wrote:
>>> I think you'd need to expose consing_until_gc to Lisp, and then you
>>> can collect the data from Lisp.
>>
>> I can expose it to Lisp and print all three from post-gc-hook, but the
>> result just looks like this:
>>
>> gc-pct 0.1 gc-thr 800000 cugc 4611686018427387903
>>
>> Perhaps I need to add a hook which runs at the beginning of GC? Or of
>> maybe_gc even?
>
> You could record its value in a local variable at the entry to
> garbage_collect, and the expose that value to Lisp.
That also doesn't seem to give much, given that the condition for
entering 'maybe_garbage_collect' is (consing_until_gc < 0). I.e. we wait
until it's down to 0, then garbage-collect. What we could perhaps do is
add another hook (or a printer) at the beginning of maybe_gc, but either
would result in lots and lots of output.
>> Alternatively, (memory-use-counts) seems to retain some counters which
>> don't get erased during garbage collection.
>
> Maybe using those will be good enough, indeed.
I added this instrumentation:
(defvar last-mem-counts '(0 0 0 0 0 0 0))
(defun gc-record-after ()
(let* ((counts (memory-use-counts))
(diff (cl-map 'list
(lambda (old new) (- new old))
last-mem-counts counts)))
(setq last-mem-counts counts)
(message "counts diff %s" diff)))
(add-hook 'post-gc-hook #'gc-record-after)
so that after each garbage collection we print the differences in all
counters (CONSES FLOATS VECTOR-CELLS SYMBOLS STRING-CHARS INTERVALS
STRINGS).
And a message call when the process finishes.
And made those recordings during the benchmark runs of two different
listing methods (one using make-process, another using process-file) to
list all files in a large directory (there are ~200000 files there).
The make-process one I also ran with a different (large) value of
read-process-output-max. Results attached.
What's in there? First of all, for find-directory-files-recursively-3,
there are 0 garbage collections between the beginning of the function
and when we start parsing the output (no GCs while the process is
writing to the buffer synchronously). I guess inserting output in a
buffer doesn't increase consing, so there's nothing to GC?
Next: for find-directory-files-recursively-2, the process only finishes
at the end, when all GC cycles are done for. I suppose that also means
we block the process's output while Lisp is running, and also that
whatever GC events occur might coincide with the chunks of output coming
from the process, and however many of them turn out to be in total.
So there is also a second recording for
find-directory-files-recursively-2 with read-process-output-max=409600.
It does improve the performance significantly (and reduce the number of
GC pauses). I guess what I'm still not clear on, is whether the number
of GC pauses is fewer because of less consing (the only column that
looks significantly different is the 3rd: VECTOR-CELLS), or because the
process finishes faster due to larger buffers, which itself causes fewer
calls to maybe_gc.
And, of course, what else could be done to reduce the time spent in GC
in the asynchronous case.
[gcs2.txt (text/plain, attachment)]
[gcs3.txt (text/plain, attachment)]
This bug report was last modified 1 year and 273 days ago.
Previous Next
GNU bug tracking system
Copyright (C) 1999 Darren O. Benham,
1997,2003 nCipher Corporation Ltd,
1994-97 Ian Jackson.