GNU bug report logs - #7182
sort -R slow

Previous Next

Package: coreutils;

Reported by: Ole Tange <tange <at> gnu.org>

Date: Sat, 9 Oct 2010 13:12:02 UTC

Severity: normal

Done: Jim Meyering <jim <at> meyering.net>

Bug is archived. No further changes may be made.

To add a comment to this bug, you must first unarchive it, by sending
a message to control AT debbugs.gnu.org, with unarchive 7182 in the body.
You can then email your comments to 7182 AT debbugs.gnu.org in the normal way.

Toggle the display of automated, internal messages from the tracker.

View this report as an mbox folder, status mbox, maintainer mbox


Report forwarded to owner <at> debbugs.gnu.org, bug-coreutils <at> gnu.org:
bug#7182; Package coreutils. (Sat, 09 Oct 2010 13:12:02 GMT) Full text and rfc822 format available.

Acknowledgement sent to Ole Tange <tange <at> gnu.org>:
New bug report received and forwarded. Copy sent to bug-coreutils <at> gnu.org. (Sat, 09 Oct 2010 13:12:02 GMT) Full text and rfc822 format available.

Message #5 received at submit <at> debbugs.gnu.org (full text, mbox):

From: Ole Tange <tange <at> gnu.org>
To: bug-coreutils <at> gnu.org
Subject: sort -R slow
Date: Sat, 9 Oct 2010 14:52:41 +0200
I recently needed to randomize some lines. So I tried using 'sort -R'.
I was astonished how slow that was. So I tested how slow a competing
strategies are. GNU sort is two magnitudes slower than unsort and more
than one magnitude slower than perl:

$ time unsort file
real    0m1.388s

$ unsort --version
unsort 1.1.2

$ time perl -e 'print sort { rand() <=> rand() } <>' file
real    0m6.621s

$ time sort -R file
real    4m8.403s

$ sort --version
sort (GNU coreutils) 8.5

What is even scarier: sort without -R is faster than sort -R:

$ time sort file
real    0m53.553s

I would expect sort -R to be faster than sort and faster than Perl if
not as fast as unsort.


/Ole




Information forwarded to owner <at> debbugs.gnu.org, bug-coreutils <at> gnu.org:
bug#7182; Package coreutils. (Sat, 09 Oct 2010 19:18:02 GMT) Full text and rfc822 format available.

Message #8 received at 7182 <at> debbugs.gnu.org (full text, mbox):

From: "Alan Curry" <pacman-cu <at> kosh.dhis.org>
To: tange <at> gnu.org (Ole Tange)
Cc: 7182 <at> debbugs.gnu.org
Subject: Re: bug#7182: sort -R slow
Date: Sat, 9 Oct 2010 14:21:00 -0500 (GMT+5)
Ole Tange writes:
> 
> I recently needed to randomize some lines. So I tried using 'sort -R'.
> I was astonished how slow that was. So I tested how slow a competing
> strategies are. GNU sort is two magnitudes slower than unsort and more
> than one magnitude slower than perl:

Never heard of "unsort". Why didn't you try shuf(1)?

Also, your perl is not valid:

> 
> $ time perl -e 'print sort { rand() <=> rand() } <>' file
> real    0m6.621s

That comparison function is not consistent (unless very lucky).

> I would expect sort -R to be faster than sort and faster than Perl if
> not as fast as unsort.

How big is your test file? I expect sort(1) to be optimized for big jobs. I
bet it would win the contest if you are shuffling a file that's bigger than
available RAM.





Information forwarded to owner <at> debbugs.gnu.org, bug-coreutils <at> gnu.org:
bug#7182; Package coreutils. (Sat, 09 Oct 2010 21:30:03 GMT) Full text and rfc822 format available.

Message #11 received at submit <at> debbugs.gnu.org (full text, mbox):

From: Davide Brini <dave_br <at> gmx.com>
To: bug-coreutils <at> gnu.org
Subject: Re: bug#7182: sort -R slow
Date: Sat, 9 Oct 2010 22:06:28 +0100
On Sat, 9 Oct 2010 14:52:41 +0200 Ole Tange <tange <at> gnu.org> wrote:

> I recently needed to randomize some lines. So I tried using 'sort -R'.
> I was astonished how slow that was. So I tested how slow a competing
> strategies are. GNU sort is two magnitudes slower than unsort and more
> than one magnitude slower than perl:
> 
> $ time unsort file
> real    0m1.388s
> 
> $ unsort --version
> unsort 1.1.2
> 
> $ time perl -e 'print sort { rand() <=> rand() } <>' file
> real    0m6.621s
> 
> $ time sort -R file
> real    4m8.403s
> 
> $ sort --version
> sort (GNU coreutils) 8.5
> 
> What is even scarier: sort without -R is faster than sort -R:
> 
> $ time sort file
> real    0m53.553s
> 
> I would expect sort -R to be faster than sort and faster than Perl if
> not as fast as unsort.

On my system, locale settings seem to impact the runtime significantly:

$ wc -l bigfile 
1000000 bigfile

$ time LC_ALL=en_US.utf8 sort -R bigfile > /dev/null

real	1m29.302s
user	1m21.009s
sys	0m0.155s

$ time LC_ALL=C sort -R bigfile > /dev/null

real	0m38.881s
user	0m35.276s
sys	0m0.118s


However, shuf is much faster, and seems mostly unaffected by the locale
used:

$ time shuf bigfile > /dev/null

real	0m1.044s
user	0m0.833s
sys	0m0.042s

-- 
D.




Reply sent to Jim Meyering <jim <at> meyering.net>:
You have taken responsibility. (Sun, 07 Aug 2011 20:44:01 GMT) Full text and rfc822 format available.

Notification sent to Ole Tange <tange <at> gnu.org>:
bug acknowledged by developer. (Sun, 07 Aug 2011 20:44:02 GMT) Full text and rfc822 format available.

Message #16 received at 7182-done <at> debbugs.gnu.org (full text, mbox):

From: Jim Meyering <jim <at> meyering.net>
To: Ole Tange <tange <at> gnu.org>
Cc: 7182-done <at> debbugs.gnu.org
Subject: Re: bug#7182: sort -R slow
Date: Sun, 07 Aug 2011 22:42:52 +0200
Davide Brini wrote:
> On Sat, 9 Oct 2010 14:52:41 +0200 Ole Tange <tange <at> gnu.org> wrote:
>
>> I recently needed to randomize some lines. So I tried using 'sort -R'.
>> I was astonished how slow that was. So I tested how slow a competing
>> strategies are. GNU sort is two magnitudes slower than unsort and more
>> than one magnitude slower than perl:
>>
>> $ time unsort file
>> real    0m1.388s
>>
>> $ unsort --version
>> unsort 1.1.2
>>
>> $ time perl -e 'print sort { rand() <=> rand() } <>' file
>> real    0m6.621s
>>
>> $ time sort -R file
>> real    4m8.403s
>>
>> $ sort --version
>> sort (GNU coreutils) 8.5
>>
>> What is even scarier: sort without -R is faster than sort -R:
>>
>> $ time sort file
>> real    0m53.553s
>>
>> I would expect sort -R to be faster than sort and faster than Perl if
>> not as fast as unsort.
>
> On my system, locale settings seem to impact the runtime significantly:
>
> $ wc -l bigfile
> 1000000 bigfile
>
> $ time LC_ALL=en_US.utf8 sort -R bigfile > /dev/null
>
> real	1m29.302s
> user	1m21.009s
> sys	0m0.155s
>
> $ time LC_ALL=C sort -R bigfile > /dev/null
>
> real	0m38.881s
> user	0m35.276s
> sys	0m0.118s
>
>
> However, shuf is much faster, and seems mostly unaffected by the locale
> used:
>
> $ time shuf bigfile > /dev/null
>
> real	0m1.044s
> user	0m0.833s
> sys	0m0.042s

Thanks for the report.
I think the performance of sort -R will often be worse
than that of shuf (by design, since it accesses each byte of each line
once more, to compute the hash), except when the input size is larger
than available memory.

The info documentation for sort -R does refer to "shuf".

Any suggestions for improvements are welcome.
I'm closing this.

You're welcome to reopen or file a new report.




bug archived. Request was from Debbugs Internal Request <help-debbugs <at> gnu.org> to internal_control <at> debbugs.gnu.org. (Mon, 05 Sep 2011 11:24:04 GMT) Full text and rfc822 format available.

This bug report was last modified 13 years and 349 days ago.

Previous Next


GNU bug tracking system
Copyright (C) 1999 Darren O. Benham, 1997,2003 nCipher Corporation Ltd, 1994-97 Ian Jackson.