GNU bug report logs -
#64735
29.0.92; find invocations are ~15x slower because of ignores
Message #416 received at 64735 <at> debbugs.gnu.org:
> Date: Mon, 24 Jul 2023 15:55:13 +0300
> Cc: luangruo <at> yahoo.com, sbaugh <at> janestreet.com, yantar92 <at> posteo.net,
> 64735 <at> debbugs.gnu.org
> From: Dmitry Gutov <dmitry <at> gutov.dev>
>
> >> 1. 'find' itself is much slower there. There is room for improvement in
> >> the port.
> >
> > I think it's the filesystem, not the port (which I did myself in this
> > case).
>
> But directory-files-recursively goes through the same filesystem,
> doesn't it?
It does (more or less; see below).  But I was not trying to explain
why Find is slower than directory-files-recursively; I was trying to
explain why Find on Windows is slower than Find on GNU/Linux.

If you are asking why directory-files-recursively is so much faster on
Windows than Find, then the main factors I can think of are:
  . IPC, at least in how we implement it in Emacs on MS-Windows, via a
    separate thread and OS-level events between them to signal that
    stuff is available for reading, whereas
    directory-files-recursively avoids this overhead completely;
  . Find uses Posix APIs: 'stat', 'chdir', 'readdir' -- which on
    Windows are emulated by wrappers around native APIs.  Moreover,
    Find uses 'char *' for file names, so calling native APIs involves
    transparent conversion to UTF-16 and back, which is what native
    APIs accept and return.  By contrast, Emacs on Windows calls the
    native APIs directly, and converts to UTF-16 from UTF-8, which is
    faster.  (This last point also means that using Find on Windows
    has another grave disadvantage: it cannot fully support non-ASCII
    file names, only those that can be encoded by the current
    single-byte system codepage.)
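
The codepage limitation in the last point can be illustrated with a
minimal Python sketch.  It is not taken from Find or Emacs: the helper
names are hypothetical, and code page 1252 is only an assumed example
of a single-byte system codepage.  The sketch checks which file names
survive encoding to that codepage, and shows that the same names
round-trip through UTF-16, the encoding the native APIs use.

```python
def encodable_in_codepage(name, codepage="cp1252"):
    """Return True if NAME can be represented in the given codepage.

    "cp1252" is an assumed example; the real system codepage depends
    on the Windows locale.
    """
    try:
        name.encode(codepage)
        return True
    except UnicodeEncodeError:
        return False


def utf16_round_trip(name):
    """Return True if NAME survives conversion to UTF-16 and back."""
    return name.encode("utf-16-le").decode("utf-16-le") == name


# Hypothetical file names, chosen only to exercise different scripts.
names = ["README.txt", "café.txt", "файл.txt", "日本語.txt"]

for name in names:
    print(name,
          "codepage:", encodable_in_codepage(name),
          "utf-16:", utf16_round_trip(name))
```

A program that stores file names as 'char *' in the system codepage
can only name the files for which the first check succeeds, whereas a
program that goes through UTF-16 can name all of them.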
> >> 2. The process output handling is worse.
> >
> > Not sure what that means.
>
> Emacs's ability to process the output of a process on the particular
> platform.
>
> You said:
>
> Btw, the Find command with pipe to some other program, like wc,
> finishes much faster, like 2 to 4 times faster than when it is run
> from find-directory-files-recursively. That's probably the slowdown
> due to communications with async subprocesses in action.
I see this slowdown on GNU/Linux as well.
> One thing to try is changing the -with-find implementation to use a
> synchronous call, to compare (e.g. using 'process-file'). And repeat
> these tests on GNU/Linux too.
This still uses pipes, albeit without the pselect stuff.
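
The shape of such a comparison can be sketched in Python.  This is an
analogy under stated assumptions, not the Emacs implementation: the
function names are hypothetical, os.walk stands in for
directory-files-recursively, and a synchronous subprocess.run call
stands in for 'process-file'.  As noted above, the synchronous variant
still reads the output through a pipe.

```python
import os
import subprocess
import time


def walk_in_process(root):
    """List non-directory entries under ROOT without a subprocess."""
    found = []
    for dirpath, _dirnames, filenames in os.walk(root):
        found.extend(os.path.join(dirpath, name) for name in filenames)
    return found


def walk_via_pipe(argv):
    """Run ARGV synchronously and return its stdout split into lines.

    The output still travels through a pipe; only the asynchronous
    event loop is avoided.
    """
    result = subprocess.run(argv, stdout=subprocess.PIPE, check=True)
    return result.stdout.decode("utf-8", errors="replace").splitlines()


def timed(func, *args):
    """Call FUNC with ARGS and return (value, elapsed seconds)."""
    start = time.perf_counter()
    value = func(*args)
    return value, time.perf_counter() - start
```

Assuming a 'find' program is on the PATH, a comparison could call
timed(walk_in_process, root) and
timed(walk_via_pipe, ["find", root, "-type", "f"]) on the same tree,
on both Windows and GNU/Linux, and compare the elapsed times.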
> >> 3. Something particular to the project being used for the test.
> >
> > I don't think I understand this one.
>
> This described the possibility where the disparity between the
> implementations' runtimes was due to something unusual in the project
> structure, if you tested different projects between Windows and
> GNU/Linux, making direct comparison less useful. It's the least likely
> cause, but still sometimes a possibility.
I have on my Windows system a d:/usr/share tree that is very similar
to (albeit somewhat smaller than) a typical /usr/share tree on Posix
systems. I tried with that as well, and the results were similar.
> > The ezwinports is the version I'm using here. But maybe someone came
> > up with a better one: after all, I did my port many years ago (because
> > the native ports available back then were abysmally slow).
>
> We should also look at the exact numbers. If you say that "| wc"
> invocation is 2-4x faster than what's reported in the benchmark, then it
> takes about 2-4 seconds. Which is still oddly slower than your reported
> numbers for directory-files-recursively.
Yes, so there are additional factors at work, at least with this port
of Find.
This bug report was last modified 1 year and 274 days ago.