GNU bug report logs - #64735
29.0.92; find invocations are ~15x slower because of ignores

Previous Next

Package: emacs;

Reported by: Spencer Baugh <sbaugh <at> janestreet.com>

Date: Wed, 19 Jul 2023 21:17:02 UTC

Severity: normal

Found in version 29.0.92

Full log


View this message in rfc822 format

From: Dmitry Gutov <dmitry <at> gutov.dev>
To: Eli Zaretskii <eliz <at> gnu.org>
Cc: luangruo <at> yahoo.com, sbaugh <at> janestreet.com, yantar92 <at> posteo.net, 64735 <at> debbugs.gnu.org
Subject: bug#64735: 29.0.92; find invocations are ~15x slower because of ignores
Date: Mon, 11 Sep 2023 03:02:55 +0300
[Message part 1 (text/plain, inline)]
On 10/09/2023 08:33, Eli Zaretskii wrote:
>> Date: Sun, 10 Sep 2023 04:30:24 +0300
>> Cc: luangruo <at> yahoo.com, sbaugh <at> janestreet.com, yantar92 <at> posteo.net,
>>   64735 <at> debbugs.gnu.org
>> From: Dmitry Gutov <dmitry <at> gutov.dev>
>>
>> On 08/09/2023 09:35, Eli Zaretskii wrote:
>>> I think you'd need to expose consing_until_gc to Lisp, and then you
>>> can collect the data from Lisp.
>>
>> I can expose it to Lisp and print all three from post-gc-hook, but the
>> result just looks like this:
>>
>> gc-pct 0.1 gc-thr 800000 cugc 4611686018427387903
>>
>> Perhaps I need to add a hook which runs at the beginning of GC? Or of
>> maybe_gc even?
> 
> You could record its value in a local variable at the entry to
> garbage_collect, and the expose that value to Lisp.

That also doesn't seem to give much, given that the condition for 
entering 'maybe_garbage_collect' is (consing_until_gc < 0). I.e. we wait 
until it's down to 0, then garbage-collect. What we could perhaps do is 
add another hook (or a printer) at the beginning of maybe_gc, but either 
would result in lots and lots of output.

>> Alternatively, (memory-use-counts) seems to retain some counters which
>> don't get erased during garbage collection.
> 
> Maybe using those will be good enough, indeed.

I added this instrumentation:

(defvar last-mem-counts '(0 0 0 0 0 0 0))

(defun gc-record-after ()
  (let* ((counts (memory-use-counts))
         (diff (cl-map 'list
                       (lambda (old new) (- new old))
                       last-mem-counts counts)))
    (setq last-mem-counts counts)
    (message "counts diff %s" diff)))

(add-hook 'post-gc-hook #'gc-record-after)

so that after each garbage collection we print the differences in all 
counters (CONSES FLOATS VECTOR-CELLS SYMBOLS STRING-CHARS INTERVALS 
STRINGS).

And a message call when the process finishes.

And made those recordings during the benchmark runs of two different 
listing methods (one using make-process, another using process-file) to 
list all files in a large directory (there are ~200000 files there).

The make-process one I also ran with a different (large) value of 
read-process-output-max. Results attached.

What's in there? First of all, for find-directory-files-recursively-3, 
there are 0 garbage collections between the beginning of the function 
and when we start parsing the output (no GCs while the process is 
writing to the buffer synchronously). I guess inserting output in a 
buffer doesn't increase consing, so there's nothing to GC?

Next: for find-directory-files-recursively-2, the process only finishes 
at the end, when all GC cycles are done for. I suppose that also means 
we block the process's output while Lisp is running, and also that 
whatever GC events occur might coincide with the chunks of output coming 
from the process, and however many of them turn out to be in total.

So there is also a second recording for 
find-directory-files-recursively-2 with read-process-output-max=409600. 
It does improve the performance significantly (and reduce the number of 
GC pauses). I guess what I'm still not clear on, is whether the number 
of GC pauses is fewer because of less consing (the only column that 
looks significantly different is the 3rd: VECTOR-CELLS), or because the 
process finishes faster due to larger buffers, which itself causes fewer 
calls to maybe_gc.

And, of course, what else could be done to reduce the time spent in GC 
in the asynchronous case.
[gcs2.txt (text/plain, attachment)]
[gcs3.txt (text/plain, attachment)]

This bug report was last modified 1 year and 273 days ago.

Previous Next


GNU bug tracking system
Copyright (C) 1999 Darren O. Benham, 1997,2003 nCipher Corporation Ltd, 1994-97 Ian Jackson.