GNU bug report logs -
#64735
29.0.92; find invocations are ~15x slower because of ignores
> Date: Mon, 24 Jul 2023 15:55:13 +0300
> Cc: luangruo <at> yahoo.com, sbaugh <at> janestreet.com, yantar92 <at> posteo.net,
> 64735 <at> debbugs.gnu.org
> From: Dmitry Gutov <dmitry <at> gutov.dev>
>
> >> 1. 'find' itself is much slower there. There is room for improvement in
> >> the port.
> >
> > I think it's the filesystem, not the port (which I did myself in this
> > case).
>
> But directory-files-recursively goes through the same filesystem,
> doesn't it?

It does (more or less; see below). But I was not trying to explain
why Find is slower than directory-files-recursively, I was trying to
explain why Find on Windows is slower than Find on GNU/Linux.

If you are asking why directory-files-recursively is so much faster on
Windows than Find, then the main factors I can think of are:

  . IPC, at least in how we implement it in Emacs on MS-Windows, via a
    separate thread, with OS-level events used to signal that output
    is available for reading, whereas directory-files-recursively
    avoids this overhead completely;
  . Find uses Posix APIs: 'stat', 'chdir', 'readdir' -- which on
    Windows are emulated by wrappers around native APIs. Moreover,
    Find uses 'char *' for file names, so calling native APIs involves
    transparent conversion to UTF-16 and back, which is what native
    APIs accept and return. By contrast, Emacs on Windows calls the
    native APIs directly, and converts to UTF-16 from UTF-8, which is
    faster. (This last point also means that using Find on Windows
    has another grave disadvantage: it cannot fully support non-ASCII
    file names, only those that can be encoded by the current
    single-byte system codepage.)

> >> 2. The process output handling is worse.
> >
> > Not sure what that means.
>
> Emacs's ability to process the output of a process on the particular
> platform.
>
> You said:
>
> Btw, the Find command with pipe to some other program, like wc,
> finishes much faster, like 2 to 4 times faster than when it is run
> from find-directory-files-recursively. That's probably the slowdown
> due to communications with async subprocesses in action.
I see this slowdown on GNU/Linux as well.
> One thing to try is changing the -with-find implementation to use a
> synchronous call, to compare (e.g. using 'process-file'). And repeat
> these tests on GNU/Linux too.
This still uses pipes, albeit without the pselect stuff.
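
For anyone who wants to repeat the comparison, here is a rough sketch
of how the three variants could be timed from Lisp. It is only an
illustration, not the code used for the numbers in this thread: the
my/ function names and the test directory are made up, and it assumes
a Posix-style 'find' is first on PATH (on Windows, a bare "find" may
otherwise resolve to the unrelated find.exe). The file lists are only
roughly comparable, since 'find -type f' and
directory-files-recursively do not treat symlinks identically.

```elisp
;;; Sketch only: the `my/' names are hypothetical, not part of Emacs.
(require 'benchmark)

(defun my/find-files-sync (dir)
  "Return files under DIR by calling find synchronously via `process-file'."
  (let ((default-directory (file-name-as-directory (expand-file-name dir))))
    (with-temp-buffer
      ;; Output still arrives through a pipe, but it is read by a
      ;; blocking synchronous call rather than as async process output.
      (process-file "find" nil t nil "." "-type" "f")
      (split-string (buffer-string) "\n" t))))

(defun my/find-files-async (dir)
  "Return files under DIR using an asynchronous find subprocess."
  (let ((default-directory (file-name-as-directory (expand-file-name dir))))
    (with-temp-buffer
      (let ((proc (make-process :name "find-sketch"
                                :buffer (current-buffer)
                                :command '("find" "." "-type" "f")
                                :connection-type 'pipe
                                :noquery t
                                ;; Keep the default sentinel from inserting
                                ;; a status message into the buffer.
                                :sentinel #'ignore)))
        ;; Loop until the process has exited and its pipe is drained.
        (while (accept-process-output proc))
        (split-string (buffer-string) "\n" t)))))

;; Each `benchmark-run' form returns (ELAPSED-SECONDS GC-COUNT GC-SECONDS).
(let ((dir "/usr/share"))               ; assumed test tree; adjust as needed
  (list (benchmark-run 1 (directory-files-recursively dir ""))
        (benchmark-run 1 (my/find-files-sync dir))
        (benchmark-run 1 (my/find-files-async dir))))
```

Running each variant more than once helps separate cold-cache from
warm-cache timings.
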
> >> 3. Something particular to the project being used for the test.
> >
> > I don't think I understand this one.
>
> This described the possibility where the disparity between the
> implementations' runtimes was due to something unusual in the project
> structure, if you tested different projects between Windows and
> GNU/Linux, making direct comparison less useful. It's the least likely
> cause, but still sometimes a possibility.
I have on my Windows system a d:/usr/share tree that is very similar
to (albeit somewhat smaller than) a typical /usr/share tree on Posix
systems. I tried with that as well, and the results were similar.
> > The ezwinports is the version I'm using here. But maybe someone came
> > up with a better one: after all, I did my port many years ago (because
> > the native ports available back then were abysmally slow).
>
> We should also look at the exact numbers. If you say that the "| wc"
> invocation is 2-4x faster than what's reported in the benchmark, then it
> takes about 2-4 seconds. Which is still oddly slower than your reported
> numbers for directory-files-recursively.
Yes, so there are additional factors at work, at least with this port
of Find.