GNU bug report logs - #64735
29.0.92; find invocations are ~15x slower because of ignores


Package: emacs;

Reported by: Spencer Baugh <sbaugh <at> janestreet.com>

Date: Wed, 19 Jul 2023 21:17:02 UTC

Severity: normal

Found in version 29.0.92


From: Dmitry Gutov <dmitry <at> gutov.dev>
To: Eli Zaretskii <eliz <at> gnu.org>
Cc: luangruo <at> yahoo.com, sbaugh <at> janestreet.com, yantar92 <at> posteo.net, 64735 <at> debbugs.gnu.org
Subject: bug#64735: 29.0.92; find invocations are ~15x slower because of ignores
Date: Sat, 29 Jul 2023 03:12:34 +0300
On 27/07/2023 16:30, Dmitry Gutov wrote:
> I can imagine that the filter-based approach necessarily creates more 
> strings (to pass to the filter function). Maybe we could increase those 
> strings' size (thus reducing the number) by increasing the read buffer 
> size?

To go further along this route, I first verified that the input 
strings are (almost) all the same length: 4096 characters. They are 
parsed into strings of 50-100 characters each, so the number of "junk" 
objects created by the process-filter approach probably shouldn't matter 
much, given that the resulting list contains 40-80x more strings.

But then I ran these tests with different values of 
read-process-output-max, which directly increases those strings' size 
and proportionally reduces their number. The results were:

> (my-bench-rpom 1 default-directory "")

=>

(("with-find-p 4096"   . "Elapsed time: 0.945478s (0.474680s in 6 GCs)")
 ("with-find-p 40960"  . "Elapsed time: 0.760727s (0.395379s in 5 GCs)")
 ("with-find-p 409600" . "Elapsed time: 0.729757s (0.394881s in 5 GCs)"))

where

(defun my-bench-rpom (count path regexp)
  (setq path (expand-file-name path))
  (list
   (cons "with-find-p 4096"
         (let ((read-process-output-max 4096))
           (benchmark count (list 'find-directory-files-recursively-2
                                  path regexp))))
   (cons "with-find-p 40960"
         (let ((read-process-output-max 40960))
           (benchmark count (list 'find-directory-files-recursively-2
                                  path regexp))))
   (cons "with-find-p 409600"
         (let ((read-process-output-max 409600))
           (benchmark count (list 'find-directory-files-recursively-2
                                  path regexp))))))

...with the last configuration consistently showing the same or better 
performance than the "sync" version I benchmarked previously.

What does that mean for us? The number of strings on the heap is 
reduced, but not by much (again, the resulting list has 43x more 
elements than there are chunks). And the combined memory taken up by 
these intermediate, garbage-collected strings stays the same.

It seems the per-chunk overhead is non-trivial and affects GC somehow 
(though not in a way that just any string allocation would).

In this test, with the default settings, the output is split into ~6000 
strings which are passed to the filter function. That means 
read_and_dispose_of_process_output is called about 6000 times, 
producing roughly 0.2s of overhead. Something in there must be creating 
extra work for the GC.

This line seems suspect:

       list3 (outstream, make_lisp_proc (p), text),

That creates three conses and one Lisp object (a tagged pointer) per 
chunk. But maybe I'm missing something bigger.



