GNU bug report logs -
#64735
29.0.92; find invocations are ~15x slower because of ignores
On 24/07/2023 14:20, Eli Zaretskii wrote:
>> Date: Sun, 23 Jul 2023 22:27:26 +0300
>> Cc: luangruo <at> yahoo.com, sbaugh <at> janestreet.com, yantar92 <at> posteo.net,
>> 64735 <at> debbugs.gnu.org
>> From: Dmitry Gutov <dmitry <at> gutov.dev>
>>
>> On 23/07/2023 20:56, Eli Zaretskii wrote:
>>>> And, ideally, do all the relevant benchmarking when proposing the change.
>>> Of course. Although the benchmarks until now already show quite a
>>> variability.
>>
>> Speaking of your MS Windows results that are unflattering to 'find', it
>> might be worth it to do a more varied comparison, to determine the
>> OS-specific bottleneck.
>>
>> Off the top of my head, here are some possibilities:
>>
>> 1. 'find' itself is much slower there. There is room for improvement in
>> the port.
>
> I think it's the filesystem, not the port (which I did myself in this
> case).

But directory-files-recursively goes through the same filesystem,
doesn't it?

> But I'd welcome similar tests on other Windows systems with
> other ports of Find. Just remember to measure this particular
> benchmark, not just Find itself from the shell, as the times are very
> different (as I reported up-thread).
Concur.
>> 2. The process output handling is worse.
>
> Not sure what that means.
Emacs's ability to process the output of a process on the particular
platform.

You said:

   Btw, the Find command with pipe to some other program, like wc,
   finishes much faster, like 2 to 4 times faster than when it is run
   from find-directory-files-recursively. That's probably the slowdown
   due to communications with async subprocesses in action.
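
For reference, here is a minimal sketch of the asynchronous side of that
comparison: reading the listing from 'find' through a process filter.
The name my-find-files-async is hypothetical and this is not the
-with-find implementation discussed in this thread. It assumes a 'find'
on PATH that supports -print0.

```elisp
;;; -*- lexical-binding: t -*-

(defun my-find-files-async (dir)
  "Return the regular files under DIR, read from an asynchronous `find'.
Hypothetical helper, meant only for timing comparisons."
  (let* ((default-directory (file-name-as-directory (expand-file-name dir)))
         (chunks nil)
         (proc (make-process
                :name "my-find"
                :command '("find" "." "-type" "f" "-print0")
                :connection-type 'pipe
                :noquery t
                ;; Collect raw output chunks; they are joined at the end.
                ;; Note: stderr is not separated here, so error messages
                ;; from `find' would end up mixed into the listing.
                :filter (lambda (_proc string) (push string chunks))
                :sentinel #'ignore)))
    ;; Keep reading until the process has exited and no more output
    ;; arrives within the short timeout.
    (while (or (accept-process-output proc 0.1)
               (process-live-p proc)))
    ;; Names are NUL-separated; make them absolute relative to DIR.
    (mapcar #'expand-file-name
            (split-string (apply #'concat (nreverse chunks)) "\0" t))))
```

Although the process is asynchronous, the function still waits for it to
finish, so its runtime can be compared directly with a synchronous call.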

One thing to try is changing the -with-find implementation to use a
synchronous call, to compare (e.g. using 'process-file'). And repeat
these tests on GNU/Linux too.
That would help us gauge the viability of using an asynchronous process
to get the file listing. But also, if one was just looking into
reimplementing directory-files-recursively using 'find' (to create an
endpoint with swappable implementations, for example), 'process-file' is
a suitable substitute because the original is also currently synchronous.
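
The synchronous variant described above could look roughly like this.
It is only a sketch under assumptions: my-find-files-sync is a
hypothetical name, 'find' is assumed to be on PATH and to support
-print0, and no ignore rules are applied.

```elisp
;;; -*- lexical-binding: t -*-

(defun my-find-files-sync (dir)
  "Return the regular files under DIR, using a synchronous `find' call.
Hypothetical helper, meant only for timing comparisons."
  (with-temp-buffer
    (let ((default-directory
           (file-name-as-directory (expand-file-name dir))))
      ;; `process-file' blocks until `find' exits.  The BUFFER argument
      ;; (t nil) inserts stdout into the current buffer and discards
      ;; stderr.
      (let ((status (process-file "find" nil '(t nil) nil
                                  "." "-type" "f" "-print0")))
        (unless (eq status 0)
          (error "find exited with status %s" status)))
      ;; Names are NUL-separated; make them absolute relative to DIR.
      (mapcar #'expand-file-name
              (split-string (buffer-string) "\0" t)))))

;; Possible comparison (the project path is a placeholder):
;; (benchmark-run 1 (my-find-files-sync "~/some/project"))
;; (benchmark-run 1 (directory-files-recursively "~/some/project" ""))
```

Note that directory-files-recursively returns a sorted list while 'find'
does not, so the two results should be compared as sets.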
>> 3. Something particular to the project being used for the test.
>
> I don't think I understand this one.

This describes the possibility that the disparity between the
implementations' runtimes is due to something unusual in the project
structure. If you tested different projects on Windows and GNU/Linux,
a direct comparison is less useful. It's the least likely cause, but
still a possibility.

>> To look into the possibility #1, you can try running the same command in
>> the terminal with the output to NUL and comparing the runtime to what's
>> reported in the benchmark.
>
> Output to the null device is a bad idea, as (AFAIR) Find is clever
> enough to detect that and do nothing. I run "find | wc" instead, and
> already reported that it is much faster.
Now I see it, thanks.
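
To get a number for the "find | wc" pipeline that is measured the same
way as the benchmark, it could be timed from inside Emacs. This is a
rough sketch: my-time-find-pipeline is a hypothetical name, and it
assumes that the shell Emacs uses can run both 'find' and 'wc'.

```elisp
;;; -*- lexical-binding: t -*-
(require 'benchmark)

(defun my-time-find-pipeline (dir)
  "Return the seconds taken by \"find . | wc -l\" run in DIR via the shell.
Hypothetical helper; the pipeline output is discarded."
  (let ((default-directory (file-name-as-directory (expand-file-name dir))))
    ;; `benchmark-run' returns (ELAPSED GC-COUNT GC-ELAPSED); keep ELAPSED.
    ;; A nil BUFFER argument makes Emacs discard the command's output.
    (car (benchmark-run 1
           (call-process-shell-command "find . | wc -l" nil nil)))))
```

The listing goes to 'wc' rather than back to Emacs, so the result
includes process startup but almost none of Emacs's output handling.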
>> I actually remember, from my time on MS Windows about 10 years ago, that
>> some older ports of 'find' and/or 'grep' did have performance problems,
>> but IIRC ezwinports contained the improved versions.
>
> The ezwinports is the version I'm using here. But maybe someone came
> up with a better one: after all, I did my port many years ago (because
> the native ports available back then were abysmally slow).

We should also look at the exact numbers. If you say that the "| wc"
invocation is 2-4x faster than what's reported in the benchmark, then it
takes about 2-4 seconds, which is still oddly slower than your reported
numbers for directory-files-recursively.

This bug report was last modified 1 year and 273 days ago.