From debbugs-submit-bounces@debbugs.gnu.org Sat Oct 09 09:11:21 2010
From: Ole Tange
Date: Sat, 9 Oct 2010 14:52:41 +0200
Subject: sort -R slow
To: bug-coreutils@gnu.org

I recently needed to randomize some lines, so I tried using 'sort -R'.
I was astonished at how slow it was, so I tested how slow competing
strategies are.
GNU sort is two orders of magnitude slower than unsort and more than
one order of magnitude slower than perl:

$ time unsort file
real 0m1.388s

$ unsort --version
unsort 1.1.2

$ time perl -e 'print sort { rand() <=> rand() } <>' file
real 0m6.621s

$ time sort -R file
real 4m8.403s

$ sort --version
sort (GNU coreutils) 8.5

What is even scarier: sort without -R is faster than sort -R:

$ time sort file
real 0m53.553s

I would expect sort -R to be faster than sort and faster than Perl, if
not as fast as unsort.

/Ole

From debbugs-submit-bounces@debbugs.gnu.org Sat Oct 09 15:17:42 2010
Message-ID: <20101009192100.4239.qmail@kosh.dhis.org>
From: "Alan Curry"
Subject: Re: bug#7182: sort -R slow
To: tange@gnu.org (Ole Tange)
Date: Sat, 9 Oct 2010 14:21:00 -0500 (GMT+5)
Cc: 7182@debbugs.gnu.org

Ole Tange writes:
>
> I recently needed to randomize some lines. So I tried using 'sort -R'.
> I was astonished how slow that was. So I tested how slow competing
> strategies are.
> GNU sort is two orders of magnitude slower than unsort and more
> than one order of magnitude slower than perl:

Never heard of "unsort". Why didn't you try shuf(1)?

Also, your perl is not valid:

>
> $ time perl -e 'print sort { rand() <=> rand() } <>' file
> real 0m6.621s

That comparison function is not consistent (unless you are very lucky):
it can give a different answer each time it is asked about the same pair
of lines, so the result of the sort is not well defined.

> I would expect sort -R to be faster than sort and faster than Perl if
> not as fast as unsort.

How big is your test file? I expect sort(1) to be optimized for big jobs.
I bet it would win the contest if you were shuffling a file that's bigger
than available RAM.

From debbugs-submit-bounces@debbugs.gnu.org Sat Oct 09 17:29:48 2010
Date: Sat, 9 Oct 2010 22:06:28 +0100
From: Davide Brini
To: bug-coreutils@gnu.org
Subject: Re: bug#7182: sort -R slow
Message-ID: <20101009220628.10381f04@scooter.muppet.show>

On Sat, 9 Oct 2010 14:52:41 +0200 Ole Tange wrote:

> I recently needed to randomize some lines. So I tried using 'sort -R'.
> I was astonished how slow that was. So I tested how slow competing
> strategies are.
> GNU sort is two orders of magnitude slower than unsort and more
> than one order of magnitude slower than perl:
>
> $ time unsort file
> real 0m1.388s
>
> $ unsort --version
> unsort 1.1.2
>
> $ time perl -e 'print sort { rand() <=> rand() } <>' file
> real 0m6.621s
>
> $ time sort -R file
> real 4m8.403s
>
> $ sort --version
> sort (GNU coreutils) 8.5
>
> What is even scarier: sort without -R is faster than sort -R:
>
> $ time sort file
> real 0m53.553s
>
> I would expect sort -R to be faster than sort and faster than Perl if
> not as fast as unsort.

On my system, locale settings seem to impact the runtime significantly:

$ wc -l bigfile
1000000 bigfile

$ time LC_ALL=en_US.utf8 sort -R bigfile > /dev/null

real 1m29.302s
user 1m21.009s
sys 0m0.155s

$ time LC_ALL=C sort -R bigfile > /dev/null

real 0m38.881s
user 0m35.276s
sys 0m0.118s

However, shuf is much faster, and seems mostly unaffected by the locale
used:

$ time shuf bigfile > /dev/null

real 0m1.044s
user 0m0.833s
sys 0m0.042s

--
D.

From debbugs-submit-bounces@debbugs.gnu.org Sun Aug 07 16:43:54 2011
From: Jim Meyering
To: Ole Tange
Subject: Re: bug#7182: sort -R slow
In-Reply-To: <20101009220628.10381f04@scooter.muppet.show> (Davide Brini's message of "Sat, 9 Oct 2010 22:06:28 +0100")
References: <20101009220628.10381f04@scooter.muppet.show>
Date: Sun, 07 Aug 2011 22:42:52 +0200
Message-ID: <87hb5teywj.fsf@rho.meyering.net>
Cc: 7182-done@debbugs.gnu.org

Davide Brini wrote:
> On Sat, 9 Oct 2010 14:52:41 +0200 Ole Tange wrote:
>
>> I recently needed to randomize some lines. So I tried using 'sort -R'.
>> I was astonished how slow that was. So I tested how slow competing
>> strategies are. GNU sort is two orders of magnitude slower than unsort
>> and more than one order of magnitude slower than perl:
>>
>> $ time unsort file
>> real 0m1.388s
>>
>> $ unsort --version
>> unsort 1.1.2
>>
>> $ time perl -e 'print sort { rand() <=> rand() } <>' file
>> real 0m6.621s
>>
>> $ time sort -R file
>> real 4m8.403s
>>
>> $ sort --version
>> sort (GNU coreutils) 8.5
>>
>> What is even scarier: sort without -R is faster than sort -R:
>>
>> $ time sort file
>> real 0m53.553s
>>
>> I would expect sort -R to be faster than sort and faster than Perl if
>> not as fast as unsort.
>
> On my system, locale settings seem to impact the runtime significantly:
>
> $ wc -l bigfile
> 1000000 bigfile
>
> $ time LC_ALL=en_US.utf8 sort -R bigfile > /dev/null
>
> real 1m29.302s
> user 1m21.009s
> sys 0m0.155s
>
> $ time LC_ALL=C sort -R bigfile > /dev/null
>
> real 0m38.881s
> user 0m35.276s
> sys 0m0.118s
>
>
> However, shuf is much faster, and seems mostly unaffected by the locale
> used:
>
> $ time shuf bigfile > /dev/null
>
> real 0m1.044s
> user 0m0.833s
> sys 0m0.042s

Thanks for the report.
I think the performance of sort -R will often be worse than that of shuf,
except when the input size is larger than available memory. That is by
design: sort -R accesses each byte of each line once more, to compute
the hash.

The info documentation for sort -R does refer to "shuf".
Any suggestions for improvements are welcome.

I'm closing this.
You're welcome to reopen or file a new report.

From unknown Sat Aug 16 21:12:49 2025
To: internal_control@debbugs.gnu.org
From: Debbugs Internal Request
Subject: Internal Control
Message-Id: bug archived.
Date: Mon, 05 Sep 2011 11:24:04 +0000
User-Agent: Fakemail v42.6.9

# This is a fake control message.
#
# The action:
# bug archived.
thanks
# This fakemail brought to you by your local debbugs
# administrator
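
The difference between the two approaches discussed in this thread can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the coreutils code: the names random_sort and shuffle are hypothetical, SHA-256 with a random salt is an arbitrary stand-in for whatever hash sort -R uses, and random.shuffle stands in for a plain in-memory shuffle that never looks at line contents.

```python
import hashlib
import os
import random


def random_sort(lines, salt=None):
    """Order lines by a salted hash of their content (the idea behind sort -R).

    Assumption: SHA-256 plus a random salt is only a stand-in hash.
    """
    if salt is None:
        salt = os.urandom(16)  # a fresh salt gives a different order on each run

    def digest(line):
        # Every byte of every line is read once more here, before any sorting.
        return hashlib.sha256(salt + line.encode("utf-8")).digest()

    # Sorting by digest costs O(n log n) comparisons on top of the hashing.
    return sorted(lines, key=digest)


def shuffle(lines, rng=random):
    """Return a shuffled copy without inspecting the contents of any line."""
    shuffled = list(lines)
    rng.shuffle(shuffled)  # linear number of swaps, no hashing, no comparisons
    return shuffled


if __name__ == "__main__":
    sample = ["a", "b", "a", "c", "b", "a"]
    print(random_sort(sample))  # identical lines always end up adjacent
    print(shuffle(sample))      # identical lines may land anywhere
```

Because equal lines hash to equal digests, the hash-based ordering keeps duplicates together, whereas the shuffle treats every line as distinct. The sketch also makes the extra work visible: the hash-based version touches every byte and then sorts, while the shuffle only moves references around.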