GNU bug report logs - #66020
[PATCH] Reduce GC churn in read_process_output


Package: emacs

Reported by: Dmitry Gutov <dmitry <at> gutov.dev>

Date: Sat, 16 Sep 2023 01:27:02 UTC

Severity: wishlist

Tags: patch

Done: Dmitry Gutov <dmitry <at> gutov.dev>

Bug is archived. No further changes may be made.


From: Dmitry Gutov <dmitry <at> gutov.dev>
To: Eli Zaretskii <eliz <at> gnu.org>
Cc: 66020 <at> debbugs.gnu.org
Subject: bug#66020: (bug#64735 spin-off): regarding the default for read-process-output-max
Date: Tue, 19 Sep 2023 22:59:43 +0300
This is another continuation from bug#64735; a subthread in this bug 
seems a better fit, given that I ran most of the tests with its patch 
applied.

On 16/09/2023 08:37, Eli Zaretskii wrote:
>> Date: Sat, 16 Sep 2023 04:32:26 +0300
>> Cc: luangruo <at> yahoo.com, sbaugh <at> janestreet.com, yantar92 <at> posteo.net,
>>   64735 <at> debbugs.gnu.org
>> From: Dmitry Gutov <dmitry <at> gutov.dev>
>>
>>>> I wonder what scenario that might become apparent in. Launching many
>>>> small processes at once? Can't think of a realistic test case.
>>> One process suffices.  The effect might not be significant, but
>>> slowdowns due to new features are generally considered regressions.
>> We'd need some objective way to evaluate this. Otherwise we'd just stop
>> at the prospect of slowing down some process somewhere by 9ns (never
>> mind speeding others up).
> That could indeed happen, and did happen in other cases.  My personal
> conclusion from similar situations is that it is impossible to tell in
> advance what the reaction will be; we need to present the numbers and
> see how the chips fall.

I wrote this test:

(defun test-ls-output ()
  (with-temp-buffer
    (let ((proc
           (make-process :name "ls"
                         ;; No-op sentinel, so status-change messages
                         ;; are not inserted into the buffer.
                         :sentinel (lambda (&rest _))
                         :buffer (current-buffer)
                         :stderr (current-buffer)
                         :connection-type 'pipe
                         :command '("ls"))))
      ;; Loop until the process exits and all its output is read.
      (while (accept-process-output proc))
      (buffer-string))))

And tried to find a case where a high buffer length is least favorable. 
The case in its favor we already know: a process with lots and lots of 
output.

But when running 'ls' on a small directory (output 500 chars long), the 
variance in benchmarking is larger than any difference I can see from 
changing read-process-output-max from 4096 to 40960 (or even to 409600). 
The benchmark is the following:

  (benchmark 1000 '(let ((read-process-output-fast t)
                         (read-process-output-max 4096))
                     (test-ls-output)))
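For convenience, the same comparison can be scripted across the three 
candidate values in one go. A sketch (it assumes test-ls-output above, 
and read-process-output-fast comes from the v2 patch, so it only makes 
sense with that patch applied):

  ;; Run the benchmark once per candidate value and print the
  ;; (ELAPSED GC-COUNT GC-TIME) triple that `benchmark-run' returns.
  (dolist (max '(4096 40960 409600))
    (let ((read-process-output-fast t)
          (read-process-output-max max))
      (message "%7d => %S" max (benchmark-run 1000 (test-ls-output)))))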

When the directory is a little larger (output ~50000 chars), there is 
more nuance. At first, as long as (!) the read_and_insert_process_output_v2 
patch is applied and read-process-output-fast is non-nil, the difference 
is negligible:

| read-process-output-max | bench result (sec, GCs, GC sec)     |
|                    4096 | (4.566418994 28 0.8000380139999992) |
|                   40960 | (4.640526664 32 0.8330555910000008) |
|                  409600 | (4.629948652 30 0.7989731299999994) |

For completeness, here are the same results for 
read-process-output-fast=nil (emacs-29 behaves similarly, though 
everything is a little slower):

| read-process-output-max | bench result (sec, GCs, GC sec)     |
|                    4096 | (4.953397326 52 1.354643750000001)  |
|                   40960 | (6.942334958 75 2.0616055079999995) |
|                  409600 | (7.124765651 76 2.0892871070000005) |

But as the session gets older (and I repeat these and other 
memory-intensive benchmarks), the picture changes, and the larger buffer 
leads to uniformly worse numbers (the below is taken with 
read-process-output-fast=t; with that var set to nil the results were 
even worse):

| read-process-output-max | bench result (sec, GCs, GC sec)     |
|                    4096 | (5.02324481 41 0.8851443580000051)  |
|                   40960 | (5.438721274 61 1.2202541989999958) |
|                  409600 | (6.11188183 77 1.5461468160000038)  |

...which seems odd: in general, a buffer length closer to the length of 
the output should be preferable, because otherwise the buffer is 
allocated multiple times, and read_process_output is likewise called 
more often. Perhaps longer strings become more difficult to allocate as 
heap fragmentation increases?
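One way to poke at that hypothesis (my own suggestion, not something 
measured above) is to diff the cumulative allocation counters that 
memory-use-counts returns around a run, which shows how much the run 
itself allocated; the exact counter list varies by Emacs version, but 
it includes conses, vector cells, string chars, and strings:

  (require 'cl-lib)

  ;; Per-run allocation delta: element-wise difference of the
  ;; cumulative counters before and after one call.
  (defun test-ls-allocations ()
    (let ((before (memory-use-counts)))
      (test-ls-output)
      (cl-mapcar #'- (memory-use-counts) before)))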

So: the last table is from a session that had been running since 
yesterday, and the first table was produced after I restarted Emacs 
about an hour ago. The numbers were stable for 1-2 hours while I was 
writing this email on-and-off, then started degrading again slightly, 
though a couple of hours in they are not yet even halfway to the numbers 
in the last table.

Where to go from here?

- Maybe we declare the difference insignificant and bump the default 
value of read-process-output-max, given that it helps in other cases;
- or we try to find the cause of the degradation;
- or we keep the default the same, but make it easier to use different 
values for different processes (meaning, we resurrect the discussion in 
bug#38561).
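For the third option, the shape could be something like the following. 
Entirely hypothetical: :read-output-max is not an existing make-process 
keyword, it is only an illustration of what a bug#38561-style API might 
look like:

  ;; Hypothetical per-process override of the read buffer size,
  ;; leaving the global default untouched.
  (make-process :name "big-output"
                :command '("find" "/usr")
                :connection-type 'pipe
                :read-output-max (* 400 1024))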



