GNU bug report logs - #35531
problem with ls in coreutils


Package: coreutils;

Reported by: Viktors Berstis <cugnujm <at> berstis.com>

Date: Wed, 1 May 2019 22:53:01 UTC

Severity: normal

Tags: notabug

Done: Pádraig Brady <P <at> draigBrady.com>

Bug is archived. No further changes may be made.

To add a comment to this bug, you must first unarchive it, by sending
a message to control AT debbugs.gnu.org, with unarchive 35531 in the body.
You can then email your comments to 35531 AT debbugs.gnu.org in the normal way.



Report forwarded to bug-coreutils <at> gnu.org:
bug#35531; Package coreutils. (Wed, 01 May 2019 22:53:03 GMT) Full text and rfc822 format available.

Acknowledgement sent to Viktors Berstis <cugnujm <at> berstis.com>:
New bug report received and forwarded. Copy sent to bug-coreutils <at> gnu.org. (Wed, 01 May 2019 22:53:03 GMT) Full text and rfc822 format available.

Message #5 received at submit <at> debbugs.gnu.org (full text, mbox):

From: Viktors Berstis <cugnujm <at> berstis.com>
To: bug-coreutils <at> gnu.org
Subject: problem with ls in coreutils
Date: Wed, 1 May 2019 15:03:31 -0700
When running "ls" or "ls -U" on a windows directory containing 50000 
files, ls takes forever.  Something seems to be highly inefficient in there.

This is for the 64 bit version build 4/20/2005 11:41AM.  The exe size is 
180736 bytes.

Thanks.

- Viktors Berstis




Information forwarded to bug-coreutils <at> gnu.org:
bug#35531; Package coreutils. (Thu, 02 May 2019 05:45:02 GMT) Full text and rfc822 format available.

Message #8 received at 35531 <at> debbugs.gnu.org (full text, mbox):

From: Kamil Dudka <kdudka <at> redhat.com>
To: Viktors Berstis <cugnujm <at> berstis.com>
Cc: bug-coreutils <at> gnu.org, 35531 <at> debbugs.gnu.org
Subject: Re: bug#35531: problem with ls in coreutils
Date: Thu, 02 May 2019 07:44:03 +0200
On Thursday, May 2, 2019 12:03:31 AM CEST Viktors Berstis wrote:
> When running "ls" or "ls -U" on a windows directory containing 50000
> files, ls takes forever.  Something seems to be highly inefficient in there.

Could you please try it with ls -U -1?

Kamil

> This is for the 64 bit version build 4/20/2005 11:41AM.  The exe size is
> 180736 bytes.
> 
> Thanks.
> 
> - Viktors Berstis






Information forwarded to bug-coreutils <at> gnu.org:
bug#35531; Package coreutils. (Thu, 02 May 2019 05:46:02 GMT) Full text and rfc822 format available.

Information forwarded to bug-coreutils <at> gnu.org:
bug#35531; Package coreutils. (Thu, 02 May 2019 23:18:01 GMT) Full text and rfc822 format available.

Message #14 received at 35531 <at> debbugs.gnu.org (full text, mbox):

From: Viktors Berstis <cugnujm <at> berstis.com>
To: 35531 <at> debbugs.gnu.org
Cc: kdudka <at> redhat.com
Subject: Re: bug#35531: problem with ls in coreutils
Date: Thu, 2 May 2019 16:12:52 -0700
My machine has 64 GB of RAM, a 6-core 3.5 GHz processor and fast disks.
The directory in question has 57,600 files in it with a total size of
about 47 GB.
On a freshly booted machine (nothing cached), "dir /on dirname | wc"
takes about 6 seconds.  The second time it takes about 2 seconds.
On a freshly booted machine, "ls -U -1 dirname | wc" takes 5 minutes 48
seconds!  A second time it is about a minute less.
ls might be doing something akin to opening every file.  If I run a
program to actually open and read every file in that directory, the
system seems to cache it all in RAM.  Then the ls takes only about 11
seconds.
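
For scale, the cold-cache timings reported above can be turned into a per-file cost. This is only arithmetic on the numbers in this message (57,600 files, 5 minutes 48 seconds, about 6 seconds); the constant and function names are made up for illustration:

```python
# Figures quoted in the message above; nothing here is measured.
FILES = 57_600
LS_COLD_SECONDS = 5 * 60 + 48   # "5 minutes 48 seconds"
DIR_COLD_SECONDS = 6            # "about 6 seconds"


def per_file_ms(total_seconds, files=FILES):
    """Average wall-clock cost per directory entry, in milliseconds."""
    return total_seconds / files * 1000.0


ls_ms = per_file_ms(LS_COLD_SECONDS)
dir_ms = per_file_ms(DIR_COLD_SECONDS)
slowdown = LS_COLD_SECONDS / DIR_COLD_SECONDS

print(f"ls:  {ls_ms:.2f} ms per file")
print(f"dir: {dir_ms:.3f} ms per file")
print(f"ls is about {slowdown:.0f}x slower than dir on a cold cache")
```

About 6 ms per file would be consistent with one uncached disk access per entry, which fits the guess that ls touches every file, but the arithmetic alone does not prove it.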

- Viktors Berstis

Kamil Dudka wrote:
> On Thursday, May 2, 2019 12:03:31 AM CEST Viktors Berstis wrote:
>> When running "ls" or "ls -U" on a windows directory containing 50000
>> files, ls takes forever.  Something seems to be highly inefficient in there.
> Could you please try it with ls -U -1?
>
> Kamil
>
>> This is for the 64 bit version build 4/20/2005 11:41AM.  The exe size is
>> 180736 bytes.
>>
>> Thanks.
>>
>> - Viktors Berstis
>




Information forwarded to bug-coreutils <at> gnu.org:
bug#35531; Package coreutils. (Fri, 03 May 2019 00:12:02 GMT) Full text and rfc822 format available.

Message #17 received at 35531 <at> debbugs.gnu.org (full text, mbox):

From: Paul Eggert <eggert <at> cs.ucla.edu>
To: Viktors Berstis <cugnujm <at> berstis.com>
Cc: kdudka <at> redhat.com, 35531 <at> debbugs.gnu.org
Subject: Re: bug#35531: problem with ls in coreutils
Date: Thu, 2 May 2019 17:11:20 -0700
It's probably something inside the kernel (e.g., filesystem code).

What does the shell command 'strace -o /tmp/tr -s 128 -T ls -U -1
dirname | wc' say? You can see which system calls are taking the most
time by then running 'sort -t"<" -k2n /tmp/tr'. On my platform (Fedora
29 x86-64 ext4, an older desktop with only disk drives), the hoggiest
syscalls are getdents64, which are as much as 24 ms per call when the
data are not cached, and more like 0.7 ms per call when the data are
cached (each such call retrieves about 1000 directory entries). What do
you see?
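
The ranking step can also be scripted so that time is summed per system call rather than sorted per line. A minimal sketch, assuming the trace was written by `strace -T` without `-f` (so each line looks like `name(args) = result <seconds>`); the function name and the sample lines are invented for illustration and are not from a real trace:

```python
import re
from collections import defaultdict

# Matches "name(args...) = result <seconds>" as written by `strace -T`.
LINE_RE = re.compile(r"^(\w+)\(.*<(\d+\.\d+)>\s*$")

# Invented sample lines in strace -T format (not real measurements).
SAMPLE = [
    'openat(AT_FDCWD, "dirname", O_RDONLY|O_DIRECTORY) = 3 <0.000031>',
    'getdents64(3, /* 1000 entries */, 32768) = 32752 <0.024000>',
    'getdents64(3, /* 1000 entries */, 32768) = 32752 <0.000700>',
    'write(1, "file1", 5) = 5 <0.000012>',
    '+++ exited with 0 +++',
]


def total_time_per_syscall(lines):
    """Sum the -T durations per syscall name, slowest first."""
    totals = defaultdict(float)
    for line in lines:
        match = LINE_RE.match(line)
        if match:  # skip signal and exit lines that carry no duration
            totals[match.group(1)] += float(match.group(2))
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    for name, seconds in total_time_per_syscall(SAMPLE):
        print(f"{seconds:.6f}s  {name}")
```

To run it on a real trace, pass `open("/tmp/tr")` instead of `SAMPLE`.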





Information forwarded to bug-coreutils <at> gnu.org:
bug#35531; Package coreutils. (Fri, 03 May 2019 01:19:02 GMT) Full text and rfc822 format available.

Message #20 received at 35531 <at> debbugs.gnu.org (full text, mbox):

From: Viktors Berstis <cugnujm <at> berstis.com>
To: Paul Eggert <eggert <at> cs.ucla.edu>
Cc: kdudka <at> redhat.com, 35531 <at> debbugs.gnu.org
Subject: Re: bug#35531: problem with ls in coreutils
Date: Thu, 2 May 2019 17:41:45 -0700
I am running coreutils on Windows, not Linux, so strace does not work
there.

The November 10, 1999 version 3.16 of coreutils "ls" command is 
lightning fast on Windows (and on the large directory) but unfortunately 
stops at 32K files.  The newer version of "ls" built for Windows has the 
problem.
By "newer" version, I mean the 64-bit build for Windows dated
4/20/2005 at 11:41AM, with an exe size of 180736 bytes, md5sum
47ba770d80382cbd66ddba13924c1417, version 5.3.0.  I didn't see a place
to download a newer binary version to try.

BTW, booting the machine with Ubuntu, ls on that same large directory is 
very fast.

- Viktors Berstis

Paul Eggert wrote:
> It's probably something inside the kernel (e.g., filesystem code).
>
> What does the shell command 'strace -o /tmp/tr -s 128 -T ls -U -1
> dirname | wc' say? You can see which system calls are taking the most
> time by then running 'sort -t"<" -k2n /tmp/tr'. On my platform (Fedora
> 29 x86-64 ext4, an older desktop with only disk drives), the hoggiest
> syscalls are getdents64, which are as much as 24 ms per call when the
> data are not cached, and more like 0.7 ms per call when the data are
> cached (each such call retrieves about 1000 directory entries). What do
> you see?
>
>





Information forwarded to bug-coreutils <at> gnu.org:
bug#35531; Package coreutils. (Fri, 03 May 2019 01:21:01 GMT) Full text and rfc822 format available.

Message #23 received at 35531 <at> debbugs.gnu.org (full text, mbox):

From: Paul Eggert <eggert <at> cs.ucla.edu>
To: Viktors Berstis <cugnujm <at> berstis.com>
Cc: kdudka <at> redhat.com, 35531 <at> debbugs.gnu.org
Subject: Re: bug#35531: problem with ls in coreutils
Date: Thu, 2 May 2019 18:20:17 -0700
On 5/2/19 5:41 PM, Viktors Berstis wrote:
> The newer version of "ls" built for Windows has the problem.

Ah, then you'll have to talk to whoever built that version, which is not
me (and generally speaking they don't hang out on this mailing list).





Information forwarded to bug-coreutils <at> gnu.org:
bug#35531; Package coreutils. (Fri, 03 May 2019 03:44:02 GMT) Full text and rfc822 format available.

Message #26 received at 35531 <at> debbugs.gnu.org (full text, mbox):

From: Viktors Berstis <cugnujm <at> berstis.com>
To: Paul Eggert <eggert <at> cs.ucla.edu>
Cc: kdudka <at> redhat.com, 35531 <at> debbugs.gnu.org
Subject: Re: bug#35531: problem with ls in coreutils
Date: Thu, 2 May 2019 20:43:20 -0700
I downloaded it from 
https://sourceforge.net/projects/gnuwin32/files/coreutils/5.3.0/coreutils-5.3.0.exe/download
The help said "Report bugs to <bug-coreutils <at> gnu.org>" which is what I did.
The build is so old that I suspect none of the original players are around.
Do you know of a windows binary or windows source that is newer 
anywhere?  Thanks.

- Viktors Berstis

Paul Eggert wrote:
> On 5/2/19 5:41 PM, Viktors Berstis wrote:
>> The newer version of "ls" built for Windows has the problem.
> Ah, then you'll have to talk to whoever built that version, which is not
> me (and generally speaking they don't hang out on this mailing list).
>
>





Information forwarded to bug-coreutils <at> gnu.org:
bug#35531; Package coreutils. (Fri, 03 May 2019 11:16:02 GMT) Full text and rfc822 format available.

Message #29 received at 35531 <at> debbugs.gnu.org (full text, mbox):

From: Kamil Dudka <kdudka <at> redhat.com>
To: Viktors Berstis <cugnujm <at> berstis.com>
Cc: Paul Eggert <eggert <at> cs.ucla.edu>, 35531 <at> debbugs.gnu.org
Subject: Re: bug#35531: problem with ls in coreutils
Date: Fri, 03 May 2019 13:15:44 +0200
On Friday, May 3, 2019 5:43:20 AM CEST Viktors Berstis wrote:
> I downloaded it from
> https://sourceforge.net/projects/gnuwin32/files/coreutils/5.3.0/coreutils-5.3.0.exe/download
> The help said "Report bugs to <bug-coreutils <at> gnu.org>" which is what
> I did. The build is so old that I suspect none of the original players
> are around. Do you know of a windows binary or windows source that is
> newer anywhere?  Thanks.
> 
> - Viktors Berstis

`ls -U1` will not run significantly faster than `ls` on powerful hardware.
The key difference is that `ls -U1` prints the results continuously as the
list of files is read from the file system, whereas `ls` stays silent until
the complete list has been read.  You need a new enough version of coreutils
for this to work properly.  This optimisation was introduced in coreutils-7.5:

https://git.savannah.gnu.org/cgit/coreutils.git/commit/?id=v7.0~113
https://git.savannah.gnu.org/cgit/coreutils.git/commit/?id=v7.5~49
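
The difference between the two modes can be illustrated outside of ls. This is a rough Python analogy, not the coreutils implementation; both function names are hypothetical, and unlike ls the sketch does not hide dot files or sort by locale:

```python
import os


def unsorted_stream(path):
    """Yield names in directory order as they are read (like `ls -U1`)."""
    with os.scandir(path) as entries:
        for entry in entries:
            # Each name is handed to the caller immediately, so output can
            # start before the whole directory has been read.
            yield entry.name


def sorted_listing(path):
    """Read every name first, then return them sorted (like plain `ls`)."""
    # Nothing can be printed until the last entry has been read.
    return sorted(unsorted_stream(path))


if __name__ == "__main__":
    for name in unsorted_stream("."):
        print(name)
```

The total work is about the same in both cases; only the time until the first line of output differs.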

Kamil

> Paul Eggert wrote:
> > On 5/2/19 5:41 PM, Viktors Berstis wrote:
> >> The newer version of "ls" built for Windows has the problem.
> > 
> > Ah, then you'll have to talk to whoever built that version, which is not
> > me (and generally speaking they don't hang out on this mailing list).






Information forwarded to bug-coreutils <at> gnu.org:
bug#35531; Package coreutils. (Fri, 03 May 2019 15:57:02 GMT) Full text and rfc822 format available.

Message #32 received at 35531 <at> debbugs.gnu.org (full text, mbox):

From: Viktors Berstis <cugnujm <at> berstis.com>
To: Kamil Dudka <kdudka <at> redhat.com>
Cc: Paul Eggert <eggert <at> cs.ucla.edu>, 35531 <at> debbugs.gnu.org
Subject: Re: bug#35531: problem with ls in coreutils
Date: Fri, 3 May 2019 08:56:35 -0700
[Message part 1 (text/html, inline)]

Information forwarded to bug-coreutils <at> gnu.org:
bug#35531; Package coreutils. (Fri, 03 May 2019 16:14:02 GMT) Full text and rfc822 format available.

Message #35 received at 35531 <at> debbugs.gnu.org (full text, mbox):

From: Paul Eggert <eggert <at> cs.ucla.edu>
To: Viktors Berstis <cugnujm <at> berstis.com>
Cc: kdudka <at> redhat.com, 35531 <at> debbugs.gnu.org
Subject: Re: bug#35531: problem with ls in coreutils
Date: Fri, 3 May 2019 09:13:05 -0700
On 5/2/19 8:43 PM, Viktors Berstis wrote:
> I downloaded it from
> https://sourceforge.net/projects/gnuwin32/files/coreutils/5.3.0/coreutils-5.3.0.exe/download
>
> The help said "Report bugs to <bug-coreutils <at> gnu.org>" which is what I
> did. 

Whoever built it just copied that line from upstream. If the build has
MS-Windows-specific problems, you'll need to find an MS-Windows person
somewhere who can fix it, or find a better build somewhere. This
bug-reporting system is not the best place to do that; see:

https://www.gnu.org/prep/standards/html_node/System-Portability.html

and look for "Windows".





Information forwarded to bug-coreutils <at> gnu.org:
bug#35531; Package coreutils. (Fri, 03 May 2019 16:25:01 GMT) Full text and rfc822 format available.

Message #38 received at 35531 <at> debbugs.gnu.org (full text, mbox):

From: Kamil Dudka <kdudka <at> redhat.com>
To: Viktors Berstis <cugnujm <at> berstis.com>
Cc: Paul Eggert <eggert <at> cs.ucla.edu>, 35531 <at> debbugs.gnu.org
Subject: Re: bug#35531: problem with ls in coreutils
Date: Fri, 03 May 2019 18:24:09 +0200
On Friday, May 3, 2019 5:56:35 PM CEST Viktors Berstis wrote:
> I don't think the problem has anything to do with sorting or -U1.

It was unclear what you meant by "the problem" so I pointed out the only 
inefficiency that was immediately obvious to me.

>     When ls is taking over 5 minutes for something that should run in a
>     couple of seconds, the task manager shows that it is using nearly no
>     CPU.... it is doing a lot of  "other I/O".

You can try to use some profiling/tracing tools to debug the root cause.

> It doesn't look like the build you referenced is designed to be
>     compileable for Windows.  Is there one that is?  Thanks.

I would suggest building the latest upstream release (coreutils-8.31 now)
from:

https://www.gnu.org/software/coreutils/

Kamil






Information forwarded to bug-coreutils <at> gnu.org:
bug#35531; Package coreutils. (Sat, 04 May 2019 01:05:02 GMT) Full text and rfc822 format available.

Message #41 received at submit <at> debbugs.gnu.org (full text, mbox):

From: Peter Edwards <no-spam <at> optusnet.com.au>
To: bug-coreutils <at> gnu.org
Subject: Re: bug#35531: problem with ls in coreutils
Date: Sat, 4 May 2019 10:01:40 +1000
Hi

Although this bug report seems to be a problem with the windows port
of ls, it reminded me of an interesting investigation into slow ls
speeds due to colorizing via the LS_COLORS environment variable.

See 
https://news.sherlock.stanford.edu/posts/when-setting-an-environment-variable-gives-you-a-40-x-speedup

I thought it an interesting case study.

Regards - PSDE




Information forwarded to bug-coreutils <at> gnu.org:
bug#35531; Package coreutils. (Sat, 04 May 2019 05:28:01 GMT) Full text and rfc822 format available.

Message #44 received at submit <at> debbugs.gnu.org (full text, mbox):

From: L A Walsh <coreutils <at> tlinx.org>
To: Viktors Berstis <cugnujm <at> berstis.com>
Cc: Coreutils <bug-coreutils <at> gnu.org>
Subject: Re: bug#35531: problem with ls in coreutils
Date: Fri, 03 May 2019 22:26:29 -0700
On 5/1/2019 3:03 PM, Viktors Berstis wrote:
> When running "ls" or "ls -U" on a windows directory containing 50000 
> files, ls takes forever.  Something seems to be highly inefficient in there.
>   
---
    It sounds like you are running ls with no options
(nothing in environment and no switches on the command line).

    Is this the case?  If it is, I'm stumped unless whoever
compiled that had it set to do some things by default.

    Basically, on Windows, anything that you might get away with on
Linux with a stat call takes an 'open' call.  That gets costly for
anything that appends a classifier to the end of the file name
(like ls -F, --classify or --file-type) or that would display any
of the data or size information (ls -l would be right out!).  The
only thing 'ls' could display without such a penalty is the file
name.  However, that only applies to stock ls, and since we don't know
what options might have been enabled for that 'ls' (including any
default usage of switches such as those mentioned above), it's
hard to say exactly what the problem is.
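
One way to check whether per-file metadata lookups are the expensive part is to time a name-only listing against a listing that also queries every file. A minimal sketch, assuming Python is available on the affected machine; `time_listing` is a hypothetical helper, and it says nothing about what the GnuWin32 ls build actually does:

```python
import os
import time


def time_listing(path):
    """Return (entry_count, seconds_names_only, seconds_with_lstat)."""
    start = time.perf_counter()
    with os.scandir(path) as entries:
        names = [entry.name for entry in entries]  # names only
    names_only = time.perf_counter() - start

    start = time.perf_counter()
    for name in names:
        os.lstat(os.path.join(path, name))  # one metadata lookup per file
    with_lstat = time.perf_counter() - start

    return len(names), names_only, with_lstat


if __name__ == "__main__":
    count, t_names, t_lstat = time_listing(".")
    print(f"{count} entries: names {t_names:.3f}s, lstat {t_lstat:.3f}s")
```

If the second figure dwarfs the first on a cold cache, that points at the per-file calls rather than at reading the directory itself.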

    A suggestion -- try installing a minimal snapshot of 'Cygwin'
('cygwin.org') and try env -i /bin/ls on cygwin's command line
in that directory and see how fast that is.  If it is slow,  then
something excessively weird is going on that is the wonder of a closed
source Windows.  However, my hunch would have it be 'fast', but since
I don't know the cause, can't say if that would help or not.

    One further possibility that I'd think unlikely: the directory could
be very fragmented and take a long time to read (5 minutes?! really
unlikely; it almost has to be the missing stat call), though the figures
you are stating sound out of bounds for a fragmented directory.
Still, if you grab the 'contig' tool from the sysinternals site (a
windows subsite), it can show you the number of fragments a file
is split into -- and can be used on directories:
/prog/Sysinternals/cmd> contig -a -v .

Contig v1.6 - Makes files contiguous
Copyright (C) 1998-2010 Mark Russinovich
Sysinternals - www.sysinternals.com
------------------------
Processing C:\prog\Sysinternals\cmd:
Scanning file...
Cluster: Length
0: 3
File size: 12288 bytes
C:\prog\Sysinternals\cmd is in 1 fragment
------------------------
Summary:
     Number of files processed   : 1
     Average fragmentation       : 1 frags/file


========
Other than those options, I'm not sure what else to suggest to narrow
it down, but thought I'd at least mention a few possibilities.

Good luck!





Information forwarded to bug-coreutils <at> gnu.org:
bug#35531; Package coreutils. (Fri, 10 May 2019 10:13:01 GMT) Full text and rfc822 format available.

Message #47 received at 35531 <at> debbugs.gnu.org (full text, mbox):

From: Pádraig Brady <P <at> draigBrady.com>
To: Peter Edwards <no-spam <at> optusnet.com.au>, 35531 <at> debbugs.gnu.org
Subject: Re: bug#35531: problem with ls in coreutils
Date: Fri, 10 May 2019 03:12:18 -0700
tag 35531 notabug
close 35531
stop

On 03/05/19 17:01, Peter Edwards wrote:
> Hi
> 
> Although this bug report seems to be a problem with the windows port
> of ls, it reminded me of an interesting investigation into slow ls
> speeds due to colorizing via the LS_COLORS environment variable.
> 
> See 
> https://news.sherlock.stanford.edu/posts/when-setting-an-environment-variable-gives-you-a-40-x-speedup
> 
> I thought it an interesting case study.

Thanks for the info.

In summary, to speed up ls color induced processing significantly,
disable stat() and getxattr() calls with:
  LS_COLORS='ex=00:su=00:sg=00:ca=00:'

A general point though is that colors are for human processing,
and how fast can one process the output from ls :)
I.e., if ls output is being written to a pipe, a file, or anywhere
else speed may be important, coloring is disabled by default anyway.
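
If an existing LS_COLORS value should be kept apart from those four keys, the override can be merged in rather than replacing the whole variable. A small sketch; the function name is hypothetical and only the four keys quoted above are touched:

```python
# The four keys from the LS_COLORS setting quoted above.
STAT_FREE_KEYS = ("ex", "su", "sg", "ca")


def with_stat_free_colors(ls_colors):
    """Return ls_colors with ex, su, sg and ca forced to 00."""
    kept = [
        entry
        for entry in ls_colors.split(":")
        if entry and entry.split("=", 1)[0] not in STAT_FREE_KEYS
    ]
    overrides = [f"{key}=00" for key in STAT_FREE_KEYS]
    return ":".join(kept + overrides) + ":"


if __name__ == "__main__":
    import os
    print(with_stat_free_colors(os.environ.get("LS_COLORS", "")))
```

With an empty input this yields exactly the string given above; the result can then be exported as LS_COLORS before running ls.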

cheers,
Pádraig




Added tag(s) notabug. Request was from Pádraig Brady <P <at> draigBrady.com> to control <at> debbugs.gnu.org. (Fri, 10 May 2019 10:13:02 GMT) Full text and rfc822 format available.

bug closed, send any further explanations to 35531 <at> debbugs.gnu.org and Viktors Berstis <cugnujm <at> berstis.com> Request was from Pádraig Brady <P <at> draigBrady.com> to control <at> debbugs.gnu.org. (Fri, 10 May 2019 10:13:02 GMT) Full text and rfc822 format available.

bug archived. Request was from Debbugs Internal Request <help-debbugs <at> gnu.org> to internal_control <at> debbugs.gnu.org. (Fri, 07 Jun 2019 11:24:06 GMT) Full text and rfc822 format available.

This bug report was last modified 6 years and 7 days ago.



GNU bug tracking system
Copyright (C) 1999 Darren O. Benham, 1997,2003 nCipher Corporation Ltd, 1994-97 Ian Jackson.