GNU bug report logs - #64735
29.0.92; find invocations are ~15x slower because of ignores
Message #467 received at 64735 <at> debbugs.gnu.org:
On 27/07/2023 16:30, Dmitry Gutov wrote:
> I can imagine that the filter-based approach necessarily creates more
> strings (to pass to the filter function). Maybe we could increase those
> strings' size (thus reducing the number) by increasing the read buffer
> size?
To go further along this route, first of all, I verified that the input
strings are (almost) all the same length: 4096. They are then parsed into
strings 50-100 characters long, meaning the number of "junk" objects
created by the process-filter approach probably shouldn't matter too
much, given that the returned list contains 40-80x more strings than
there are chunks.
But then I ran these tests with different values of
read-process-output-max, which does exactly that: it increases those
strings' size, proportionally reducing their number. The results were:
> (my-bench-rpom 1 default-directory "")
=>
(("with-find-p 4096" . "Elapsed time: 0.945478s (0.474680s in 6 GCs)")
("with-find-p 40960" . "Elapsed time: 0.760727s (0.395379s in 5 GCs)")
("with-find-p 409600" . "Elapsed time: 0.729757s (0.394881s in 5 GCs)"))
where
(defun my-bench-rpom (count path regexp)
  (setq path (expand-file-name path))
  (list
   (cons "with-find-p 4096"
         (let ((read-process-output-max 4096))
           (benchmark count (list 'find-directory-files-recursively-2
                                  path regexp))))
   (cons "with-find-p 40960"
         (let ((read-process-output-max 40960))
           (benchmark count (list 'find-directory-files-recursively-2
                                  path regexp))))
   (cons "with-find-p 409600"
         (let ((read-process-output-max 409600))
           (benchmark count (list 'find-directory-files-recursively-2
                                  path regexp))))))
...with the last configuration consistently showing the same or better
performance than the "sync" version I benchmarked previously.
What does that mean for us? The number of strings in the heap is
reduced, but not by much (again, the result is a list with 43x more
elements). The combined memory taken up by these intermediate strings to
be garbage-collected is the same.
It seems like the per-chunk overhead is non-trivial and affects GC somehow
(but not in a way that just any string would).
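To separate out the GC contribution, here is a rough sketch (untested, the
helper name is made up for this example) using the built-in counters
gcs-done and gc-elapsed around a run:

(defun my-gc-cost (thunk)
  "Call THUNK and return (GC-COUNT . GC-SECONDS) spent in GC during the call."
  (let ((count gcs-done)
        (secs gc-elapsed))
    (funcall thunk)
    (cons (- gcs-done count) (- gc-elapsed secs))))

;; e.g. (my-gc-cost (lambda () (find-directory-files-recursively-2 path "")))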
In this test, by default, the output produces ~6000 strings and passes
them to the filter function. That means read_and_dispose_of_process_output
is called about 6000 times, adding an overhead of roughly 0.2s.
Something in there must be creating extra work for the GC.
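As a back-of-the-envelope figure derived from the numbers above, 0.2s
spread over ~6000 calls works out to roughly 33 microseconds of dispatch
overhead per chunk:

(/ 0.2 6000) ;; => ~3.3e-05, i.e. about 33 microseconds per call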
This line seems suspect:
list3 (outstream, make_lisp_proc (p), text),
It creates three conses and one Lisp object (a tagged pointer) per chunk.
But maybe I'm missing something bigger.