From unknown Mon Aug 18 17:53:54 2025
Subject: bug#10877: Wimpy external files.
To: 10877@debbugs.gnu.org
Date: Fri, 24 Feb 2012 14:53:57 +0100
From: Rogier Wolff
Message-ID: <20120224135357.GA31568@bitwizard.nl>

Hi,

I understand the need for external temporary files. (I rewrote DOS 3.3
sort for personal use when I got my first computer in 1987.)

However, on my 8G machine sort imho is allowed to use say 200M or maybe
even 2000M of core memory. If you write a temporary file every 20MB,
there will be a whole lot of them.

zebigbos:/> l /tmp/sort* | wc -l
163
zebigbos:/> free
             total       used       free     shared    buffers     cached
Mem:       8264484    8100420     164064          0     376756    6853884
-/+ buffers/cache:     869780    7394704
Swap:            0          0          0
zebigbos:/>

It seems that I may be allowed to set this with the -S option. The
manual does not mention a default. A more sensible default would be
in order.

	Roger.

-- 
** R.E.Wolff@BitWizard.nl ** http://www.BitWizard.nl/ ** +31-15-2600998 **
**    Delftechpark 26 2628 XH  Delft, The Netherlands. KVK: 27239233    **
*-- BitWizard writes Linux device drivers for any device you may have! --*
The plan was simple, like my brother-in-law Phil. But unlike
Phil, this plan just might work.

From unknown Mon Aug 18 17:53:54 2025
Subject: bug#10877: Wimpy external files.
To: Rogier Wolff
Cc: 10877@debbugs.gnu.org
Message-ID: <4F47C8A2.9060103@cs.ucla.edu>
Date: Fri, 24 Feb 2012 09:28:02 -0800
From: Paul Eggert
In-Reply-To: <20120224135357.GA31568@bitwizard.nl>

I think 'sort' by default uses 1/8 of physical memory.
It sounds like that calculation isn't working on your host;
if so, it'd be nice if you could dive into the code and see why.

From unknown Mon Aug 18 17:53:54 2025
Subject: bug#10877: Wimpy external files.
To: Paul Eggert
Cc: 10877@debbugs.gnu.org, Rogier Wolff
Date: Fri, 24 Feb 2012 19:00:48 +0100
From: Rogier Wolff
Message-ID: <20120224180048.GA3211@bitwizard.nl>
In-Reply-To: <4F47C8A2.9060103@cs.ucla.edu>

On Fri, Feb 24, 2012 at 09:28:02AM -0800, Paul Eggert wrote:
> I think 'sort' by default uses 1/8 of physical memory.
> It sounds like that calculation isn't working on your host;
> if so, it'd be nice if you could dive into the code and see why.

OK. I'll look into it.

The server where I observed the 20M files has 8G RAM, and is currently
very busy with its primary task (serving files). The server runs
32-bit userspace and a PAE kernel.

I checked whether I can reproduce the problem on my workstation, which
would be available for testing. My workstation has 8G RAM and is fully
64-bit (i.e. kernel + userspace).

On my workstation I get 34M files.... So problem reproduced. It is
using 30 times smaller files than what you predict it would.

Analysis so far:

-S 2% results in about 160 Mbytes of core memory used, and temporary
files of about 114 Mbytes. My conclusion is that physmem_total() is
working.

After a bit of instrumentation I get:

default sort size: avail=1946.183594M, total=8001.156250M, mem=1946.183594M
size is now: 1946M
after rlimit size is now: 1946M
after margin size is now: 973M
after rlimitrss size is now: 973M
returning: 973M

so the function "default_sort_size" also seems to be working.

I have to leave now. More later.

	Roger.

From unknown Mon Aug 18 17:53:54 2025
Subject: bug#10877: Wimpy external files.
To: Paul Eggert
Cc: 10877@debbugs.gnu.org, Rogier Wolff
Date: Sat, 25 Feb 2012 13:56:42 +0100
From: Rogier Wolff
Message-ID: <20120225125642.GA10279@bitwizard.nl>
In-Reply-To: <4F47C8A2.9060103@cs.ucla.edu>

On Fri, Feb 24, 2012 at 09:28:02AM -0800, Paul Eggert wrote:
> I think 'sort' by default uses 1/8 of physical memory.
> It sounds like that calculation isn't working on your host;
> if so, it'd be nice if you could dive into the code and see why.

OK. I've dived into the code a bit and after some more
instrumentation I get:

assurancetourix:~/sort/sortproblem/coreutils-8.5/src> tail -n +10 du.unsorted | ./sort
sort_buffer_size: bound=0M
unknown filesize
sortsize=0M
unknown filesize
guessing=1M
default sort size: avail=217.234375M, total=8001.207031M, mem=1000.150879M
size is now: 1000M
after rlimit size is now: 1000M
after margin size is now: 500M
after rlimitrss size is now: 500M
returning: 500M
got size_bound=500M / file_size = 1M
worstcase=49M (worst_case_per_input_byte = 49
returning size=49M

The core of the problem is that when the filesize cannot be determined
(as in this case, because it's a pipe), it guesses the filesize to be
1M, and then uses "worstcase" calculations to come to "about 50M" for
the buffer size. This happens in the function around line 1380 in
sort.c.

I don't think that any guessing should be done if we cannot determine
the filesize. In that case we have great heuristics to come up with a
reasonable buffer size without the filesize.

Similarly, there is a logic error in the code that determines the
maximum memory to use: You said it was supposed to use 1/8th of total
memory. However it then takes another factor of two "margin". I think
the margin should not be applied to "total mem/8". (The quick fix is
to just turn that "8" in the code into a "4".)

Similarly, I think that it is reasonable to use "all available"
memory. This "all available" is really free memory, so there should
not be any risk in trying to use it all.

Of course there are RSS limits. Those should apply, with the
appropriate margin.

Here is my suggestion for default_sort_size. I've left the
instrumentation/debug in there.

(I don't understand why SIZE_MAX/1024/1024 evaluates to -1 on my
machine, but I don't really care. It seems to work....).

/* Return the default sort size.  */
static size_t
default_sort_size (void)
{
  /* structure of this function:
   *
   * We start with too big a number, and then "filter" it as we see
   * fit.
   */
  double mem, total, avail;
  struct rlimit rlimit;
  size_t size;

  /* Let SIZE be MEM, but no more than the maximum object size or
     system resource limits.  Avoid the MIN macro here, as it is not
     quite right when only one argument is floating point.  Don't
     bother to check for values like RLIM_INFINITY since in practice
     they are not much less than SIZE_MAX.  */
  size = SIZE_MAX;
  if (getrlimit (RLIMIT_DATA, &rlimit) == 0 && rlimit.rlim_cur < size)
    size = rlimit.rlim_cur;
  fprintf (stderr, "after rlimit size is now: %dM\n", size/1024/1024);
#ifdef RLIMIT_AS
  if (getrlimit (RLIMIT_AS, &rlimit) == 0 && rlimit.rlim_cur < size)
    size = rlimit.rlim_cur;
#endif

  /* Leave a large safety margin for the above limits, as failure can
     occur when they are exceeded.  */
  size /= 2;
  fprintf (stderr, "after margin size is now: %dM\n", size/1024/1024);

#ifdef RLIMIT_RSS
  /* Leave a 1/16 margin for RSS to leave room for code, stack, etc.
     Exceeding RSS is not fatal, but can be quite slow.  */
  if (getrlimit (RLIMIT_RSS, &rlimit) == 0 && rlimit.rlim_cur / 16 * 15 < size)
    size = rlimit.rlim_cur / 16 * 15;
#endif
  fprintf (stderr, "after rlimitrss size is now: %dM\n", size/1024/1024);

  /* Let MEM be available memory or 1/8 of total memory, whichever
     is greater.  */
  avail = physmem_available ();
  total = physmem_total ();
  mem = MAX (avail, total / 8);
  fprintf (stderr, "default sort size: avail=%fM, total=%fM, mem=%fM\n",
           avail/1024/1024, total/1024/1024, mem/1024/1024);

  if (mem < size)
    size = mem;
  fprintf (stderr, "size is now: %dM\n", size/1024/1024);

  /* Here is the odd one out: If we've reduced the number too far,
     we'll increase it again here to the minimum value: MIN_SORT_SIZE  */
  if (MIN_SORT_SIZE > size)
    size = MIN_SORT_SIZE;

  fprintf (stderr, "returning: %dM\n", MAX (size, MIN_SORT_SIZE)/ 1024/1024);

  /* Use no less than the minimum.  */
  return size;
}

	Roger.

From unknown Mon Aug 18 17:53:54 2025
Subject: bug#10877: Wimpy external files.
To: Rogier Wolff
Cc: Paul Eggert, 10877@debbugs.gnu.org
Message-ID: <4F48F749.3040205@draigBrady.com>
Date: Sat, 25 Feb 2012 14:59:21 +0000
From: Pádraig Brady
In-Reply-To: <20120225125642.GA10279@bitwizard.nl>

A general note on this is to be wary of cache levels.
It may be more efficient to sort in chunks sized to fit
the machine's CPU cache rather than in as large a chunk
as possible.

Some very quick testing here (a while ago),
shows -S2M -T/dev/shm was as fast or faster than
big buffers, as well as being "nicer" to the rest
of the system.

cheers,
Pádraig.

p.s. ensure MALLOC_PERTURB_ is unset when benchmarking sort

From unknown Mon Aug 18 17:53:54 2025
Subject: bug#10877: Wimpy external files.
To: Rogier Wolff
Cc: 10877@debbugs.gnu.org
Message-ID: <4F492B41.9030705@cs.ucla.edu>
Date: Sat, 25 Feb 2012 10:41:05 -0800
From: Paul Eggert
In-Reply-To: <20120225125642.GA10279@bitwizard.nl>

On 02/25/2012 04:56 AM, Rogier Wolff wrote:
> there is a logic error in the code that determines the
> maximum memory to use: You said it was supposed to use 1/8th of total
> memory. However it then takes another factor of two "margin".

Thanks for catching that. I installed a fix (patch at end of this
message).

> I don't think that any guessing should be done if we cannot determine
> the filesize. In that case we have great heuristics to come up with a
> reasonable buffer size without the filesize.

A problem with that idea is, suppose we have many independent 'sort'
invocations running at the same time, as part of a shell pipeline
say? If they each grab 1/8 of physical RAM, merely because they
want to sort piped data of a few bytes, they may exhaust swap space.

Perhaps we can improve the heuristics for pipes, but I hope you can
see why I'm a bit leery of a heuristic that says "if the input is
from a pipe, pretend it's from a file of infinite size".
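To make the factor-of-two issue acknowledged above concrete, here is a small sketch of the arithmetic only. It is not the code in sort.c: the function names are made up for illustration, all sizes are in MiB, and the RSS limit and the MIN_SORT_SIZE floor are deliberately left out.

```c
#include <assert.h>

/* Illustrative only; these helpers are hypothetical and do not exist
   in sort.c.  All quantities are in MiB.  */

/* Advice: the larger of available memory and 1/8 of total memory.  */
double
advised_mem (double avail, double total)
{
  assert (avail >= 0 && total >= 0);
  double eighth = total / 8;
  return avail > eighth ? avail : eighth;
}

/* Old behaviour as described in the thread: the advice is capped by
   the hard limit first, and the safety margin then halves everything,
   including the advice.  */
double
size_before_fix (double avail, double total, double hard_limit)
{
  double mem = advised_mem (avail, total);
  double size = mem < hard_limit ? mem : hard_limit;
  return size / 2;
}

/* New behaviour: only the hard limit is halved; the advice is applied
   afterwards without a further margin.  */
double
size_after_fix (double avail, double total, double hard_limit)
{
  double mem = advised_mem (avail, total);
  double size = hard_limit / 2;
  return mem < size ? mem : size;
}
```

With the figures from the instrumented run earlier in the thread (about 217M available, about 8001M total, no effective hard limit), size_before_fix comes out near 500M and size_after_fix near 1000M.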
>From 28197ef851af8f7e4f5f98f4433090cbbd63fbac Mon Sep 17 00:00:00 2001
From: Paul Eggert
Date: Sat, 25 Feb 2012 10:32:52 -0800
Subject: [PATCH] sort: default to physmem/8, not physmem/16

* src/sort.c (default_sort_size): Don't divide advice by 2.
Just divide the hard limits by 2.  This matches the comments.
Reported by Rogier Wolff in http://bugs.gnu.org/10877
---
 src/sort.c |   14 +++++++-------
 1 files changed, 7 insertions(+), 7 deletions(-)

diff --git a/src/sort.c b/src/sort.c
index 6875a6a..60ff415 100644
--- a/src/sort.c
+++ b/src/sort.c
@@ -1414,13 +1414,9 @@ default_sort_size (void)
   struct rlimit rlimit;
 
   /* Let SIZE be MEM, but no more than the maximum object size or
-     system resource limits.  Avoid the MIN macro here, as it is not
-     quite right when only one argument is floating point.  Don't
-     bother to check for values like RLIM_INFINITY since in practice
-     they are not much less than SIZE_MAX.  */
+     system resource limits.  Don't bother to check for values like
+     RLIM_INFINITY since in practice they are not much less than SIZE_MAX.  */
   size_t size = SIZE_MAX;
-  if (mem < size)
-    size = mem;
   if (getrlimit (RLIMIT_DATA, &rlimit) == 0 && rlimit.rlim_cur < size)
     size = rlimit.rlim_cur;
 #ifdef RLIMIT_AS
@@ -1439,7 +1435,11 @@ default_sort_size (void)
     size = rlimit.rlim_cur / 16 * 15;
 #endif
 
-  /* Use no less than the minimum.  */
+  /* Return the minimum of MEM and SIZE, but no less than
+     MIN_SORT_SIZE.  Avoid the MIN macro here, as it is not quite
+     right when only one argument is floating point.  */
+  if (mem < size)
+    size = mem;
   return MAX (size, MIN_SORT_SIZE);
 }
 
-- 
1.7.6.5

From unknown Mon Aug 18 17:53:54 2025
Subject: bug#10877: Wimpy external files.
To: Pádraig Brady
Cc: 10877@debbugs.gnu.org, Rogier Wolff
Message-ID: <4F492B70.2030506@cs.ucla.edu>
Date: Sat, 25 Feb 2012 10:41:52 -0800
From: Paul Eggert
In-Reply-To: <4F48F749.3040205@draigBrady.com>

On 02/25/2012 06:59 AM, Pádraig Brady wrote:
> Some very quick testing here (a while ago),
> shows -S2M -T/dev/shm was as fast or faster than
> big buffers, as well as being "nicer" to the rest
> of the system.

I think this is a great idea for improving the performance
of 'sort' in the future. It should work particularly well
with multithreaded sort if we can assume that different cores
have different caches. Admittedly getting at this
info portably may be a pain. But perhaps I could assign
that to my students next quarter....

From unknown Mon Aug 18 17:53:54 2025
Subject: bug#10877: Wimpy external files.
To: Pádraig Brady
Cc: Paul Eggert, 10877@debbugs.gnu.org, Rogier Wolff
Date: Sun, 26 Feb 2012 07:53:11 +0100
From: Rogier Wolff
Message-ID: <20120226065311.GB23142@bitwizard.nl>
In-Reply-To: <4F48F749.3040205@draigBrady.com>

On Sat, Feb 25, 2012 at 02:59:21PM +0000, Pádraig Brady wrote:
> Some very quick testing here (a while ago), shows -S2M -T/dev/shm
> was as fast or faster than big buffers, as well as being "nicer" to
> the rest of the system.

If that is the case, I expect it is still way faster to sort chunks of
the processor cache size, and then merge them in memory before writing
out chunks of a gigabyte (in my case with 8G RAM) to disk.

On the other hand, I'm not sure how efficient the 500-way merge is
going to be.

	Roger.

From unknown Mon Aug 18 17:53:54 2025
Subject: bug#10877: Wimpy external files.
To: Paul Eggert
Cc: 10877@debbugs.gnu.org, Rogier Wolff
Date: Sun, 26 Feb 2012 08:14:51 +0100
From: Rogier Wolff
Message-ID: <20120226071451.GD23142@bitwizard.nl>
In-Reply-To: <4F492B41.9030705@cs.ucla.edu>

On Sat, Feb 25, 2012 at 10:41:05AM -0800, Paul Eggert wrote:
> A problem with that idea is, suppose we have many independent 'sort'
> invocations running at the same time, as part of a shell pipeline
> say? If they each grab 1/8 of physical RAM, merely because they
> want to sort piped data of a few bytes, they may exhaust swap space.

Two things.

First: Many modern operating systems do "lazy" allocation. With 8G RAM
+ 8G SWAP you can have hundreds of apps each allocating a gigabyte, as
long as they don't all use that gigabyte.

Second: If you're worried about the lone operating system that DOES
show this problem, then a slight change in the codebase might be in
order for "unknown sort size". Then the "1/8th of memory" becomes an
upper bound and the buffer should be dynamically expanded. If you do
not have to "move" the data from the old buffer to a new buffer, you
can just allocate a 2x larger buffer each time you reach the limit
(but are below the upper bound). If you DO have the expense of having
to move all the data up to that point, I would suggest allocating a 4x
larger buffer up to the upper bound.

	Roger.
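The growth policy proposed above could look roughly like the sketch below. The helper next_buffer_size is hypothetical and is not part of sort.c; the growth factor (2 when data need not be moved, 4 when it must be copied) and the upper bound (for example 1/8 of physical memory) are passed in by the caller, and how the buffer is actually reallocated is left open.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper, not taken from sort.c: return the next buffer
   size once the current buffer of CUR bytes is full.  FACTOR is the
   growth multiplier and UPPER is the cap on the buffer size.  CUR is
   returned unchanged when the cap has already been reached, in which
   case the caller would have to spill to a temporary file.  */
size_t
next_buffer_size (size_t cur, size_t factor, size_t upper)
{
  assert (cur > 0 && factor > 1);
  if (cur >= upper)
    return cur;                 /* already at (or above) the cap */
  if (cur > upper / factor)
    return upper;               /* multiplying would overshoot the cap */
  return cur * factor;          /* cannot overflow: cur * factor <= upper */
}
```

Comparing against upper / factor before multiplying keeps the computation free of size_t overflow even when the cap is close to SIZE_MAX.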
-- ** R.E.Wolff@BitWizard.nl ** http://www.BitWizard.nl/ ** +31-15-2600998 ** ** Delftechpark 26 2628 XH Delft, The Netherlands. KVK: 27239233 ** *-- BitWizard writes Linux device drivers for any device you may have! --* The plan was simple, like my brother-in-law Phil. But unlike Phil, this plan just might work. From unknown Mon Aug 18 17:53:54 2025 X-Loop: help-debbugs@gnu.org Subject: bug#10877: Wimpy external files. Resent-From: Paul Eggert Original-Sender: debbugs-submit-bounces@debbugs.gnu.org Resent-CC: bug-coreutils@gnu.org Resent-Date: Sun, 26 Feb 2012 07:29:01 +0000 Resent-Message-ID: Resent-Sender: help-debbugs@gnu.org X-GNU-PR-Message: followup 10877 X-GNU-PR-Package: coreutils X-GNU-PR-Keywords: To: Rogier Wolff Cc: 10877@debbugs.gnu.org Received: via spool by 10877-submit@debbugs.gnu.org id=B10877.133024132428472 (code B ref 10877); Sun, 26 Feb 2012 07:29:01 +0000 Received: (at 10877) by debbugs.gnu.org; 26 Feb 2012 07:28:44 +0000 Received: from localhost ([127.0.0.1]:56653 helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.72) (envelope-from ) id 1S1YXE-0007PB-Es for submit@debbugs.gnu.org; Sun, 26 Feb 2012 02:28:44 -0500 Received: from smtp.cs.ucla.edu ([131.179.128.62]:42625) by debbugs.gnu.org with esmtp (Exim 4.72) (envelope-from ) id 1S1YXC-0007P3-5a for 10877@debbugs.gnu.org; Sun, 26 Feb 2012 02:28:43 -0500 Received: from localhost (localhost.localdomain [127.0.0.1]) by smtp.cs.ucla.edu (Postfix) with ESMTP id A95E239E800D; Sat, 25 Feb 2012 23:25:57 -0800 (PST) X-Virus-Scanned: amavisd-new at smtp.cs.ucla.edu Received: from smtp.cs.ucla.edu ([127.0.0.1]) by localhost (smtp.cs.ucla.edu [127.0.0.1]) (amavisd-new, port 10024) with ESMTP id 7W3o6iwa4V0G; Sat, 25 Feb 2012 23:25:57 -0800 (PST) Received: from [192.168.1.10] (pool-71-189-109-235.lsanca.fios.verizon.net [71.189.109.235]) by smtp.cs.ucla.edu (Postfix) with ESMTPSA id 53F1639E800C; Sat, 25 Feb 2012 23:25:57 -0800 (PST) Message-ID: <4F49DE89.8050506@cs.ucla.edu> Date: Sat, 25 Feb 
2012 23:26:01 -0800 From: Paul Eggert Organization: UCLA Computer Science Department User-Agent: Mozilla/5.0 (X11; Linux i686; rv:10.0.2) Gecko/20120216 Thunderbird/10.0.2 MIME-Version: 1.0 References: <20120224135357.GA31568@bitwizard.nl> <4F47C8A2.9060103@cs.ucla.edu> <20120225125642.GA10279@bitwizard.nl> <4F492B41.9030705@cs.ucla.edu> <20120226071451.GD23142@bitwizard.nl> In-Reply-To: <20120226071451.GD23142@bitwizard.nl> Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 7bit X-Spam-Score: -1.9 (-) X-BeenThere: debbugs-submit@debbugs.gnu.org X-Mailman-Version: 2.1.13 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Sender: debbugs-submit-bounces@debbugs.gnu.org Errors-To: debbugs-submit-bounces@debbugs.gnu.org X-Spam-Score: -1.9 (-) On 02/25/2012 11:14 PM, Rogier Wolff wrote: > Many modern operating systems do "lazy" allocation. Sure, that's an old trick. But this has its own problems: it can mean a process *thinks* it has memory allocated, but it doesn't *really* have the memory; which means when it tries to actually *use* its memory it can get killed. This is not a direction we want 'sort' to head. > a slight change in the codebase might be in order > for "unknown sort size". Sorry, I didn't follow the rest of that comment. Perhaps you could suggest a patch? That might explain things better. "diff -u" format is typically best. From unknown Mon Aug 18 17:53:54 2025 X-Loop: help-debbugs@gnu.org Subject: bug#10877: Wimpy external files. 
Resent-From: Rogier Wolff Original-Sender: debbugs-submit-bounces@debbugs.gnu.org Resent-CC: bug-coreutils@gnu.org Resent-Date: Sun, 26 Feb 2012 20:22:01 +0000 Resent-Message-ID: Resent-Sender: help-debbugs@gnu.org X-GNU-PR-Message: followup 10877 X-GNU-PR-Package: coreutils X-GNU-PR-Keywords: To: Paul Eggert Cc: 10877@debbugs.gnu.org Received: via spool by 10877-submit@debbugs.gnu.org id=B10877.133028767716140 (code B ref 10877); Sun, 26 Feb 2012 20:22:01 +0000 Received: (at 10877) by debbugs.gnu.org; 26 Feb 2012 20:21:17 +0000 Received: from localhost ([127.0.0.1]:58567 helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.72) (envelope-from ) id 1S1kaq-0004CH-Tp for submit@debbugs.gnu.org; Sun, 26 Feb 2012 15:21:17 -0500 Received: from cust-95-128-94-82.breedbanddelft.nl ([95.128.94.82]:40919 helo=bitwizard.nl) by debbugs.gnu.org with esmtp (Exim 4.72) (envelope-from ) id 1S1kao-0004C8-6Q for 10877@debbugs.gnu.org; Sun, 26 Feb 2012 15:21:15 -0500 Received: by abra2 (Postfix, from userid 1000) id C9A13E03C6; Sun, 26 Feb 2012 21:18:25 +0100 (CET) Date: Sun, 26 Feb 2012 21:18:25 +0100 From: Rogier Wolff Message-ID: <20120226201825.GA16411@bitwizard.nl> References: <20120224135357.GA31568@bitwizard.nl> <4F47C8A2.9060103@cs.ucla.edu> <20120225125642.GA10279@bitwizard.nl> <4F492B41.9030705@cs.ucla.edu> <20120226071451.GD23142@bitwizard.nl> <4F49DE89.8050506@cs.ucla.edu> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <4F49DE89.8050506@cs.ucla.edu> Organization: BitWizard.nl User-Agent: Mutt/1.5.20 (2009-06-14) X-Spam-Score: -0.9 (/) X-BeenThere: debbugs-submit@debbugs.gnu.org X-Mailman-Version: 2.1.13 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Sender: debbugs-submit-bounces@debbugs.gnu.org Errors-To: debbugs-submit-bounces@debbugs.gnu.org X-Spam-Score: -0.9 (/) On Sat, Feb 25, 2012 at 11:26:01PM -0800, Paul Eggert wrote: > On 02/25/2012 11:14 PM, Rogier 
Wolff wrote:
> >
> > > Many modern operating systems do "lazy" allocation.
> >
> > Sure, that's an old trick. But this has its own problems:
> > it can mean a process *thinks* it has memory allocated, but it
> > doesn't *really* have the memory; which means when it tries to
> > actually *use* its memory it can get killed. This is not a direction
> > we want 'sort' to head.

Hmm. Ok.

> > > a slight change in the codebase might be in order
> > > for "unknown sort size".
> >
> > Sorry, I didn't follow the rest of that comment. Perhaps you
> > could suggest a patch? That might explain things better.
> > "diff -u" format is typically best.

This one is more work than 10 minutes. Before I put in the effort I
would like to know if this is something that stands a chance...

Maybe some pseudocode helps explain:

Currently there is a

    bufsize = ....
    buffer = malloc (bufsize);

and then during the sorting something like:

    if (data_in_buffer + new_data_len > bufsize) {
        write_data_from_buffer ();
    }

I propose to make that:

    bufsize = .... ;  // this returns a negative number to indicate it is a
                      // wild guess, but an upper limit.
    if (bufsize < 0) {
        curbufsize = MINBUFSIZE;
        bufsize = -bufsize;
    } else {
        curbufsize = bufsize;
    }
    buffer = malloc (curbufsize);

and then during the sorting:

    if (data_in_buffer + new_data_len > curbufsize) {
        curbufsize *= 2;
        if (curbufsize > bufsize)
            curbufsize = bufsize;
        buffer = realloc (buffer, curbufsize);
        if (data_in_buffer + new_data_len > curbufsize) {
            write_data_from_buffer ();
        }
    }

i.e. we determine an upper limit at "guessing time", and increase the
memory buffer up to that limit when the small default buffer ends up
being too small.

	Roger.

-- 
** R.E.Wolff@BitWizard.nl ** http://www.BitWizard.nl/ ** +31-15-2600998 **
** Delftechpark 26 2628 XH Delft, The Netherlands. KVK: 27239233 **
*-- BitWizard writes Linux device drivers for any device you may have! --*
The plan was simple, like my brother-in-law Phil. But unlike Phil, this
plan just might work.
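
For readers who want something compilable, the proposal above could be
sketched roughly as follows. This is only an illustration under stated
assumptions: every name here (struct grow_buf, grow_buf_init,
grow_buf_append, grow_buf_flush) is hypothetical and none of it is taken
from coreutils' sort.c. The flush function merely counts spills where a
real sort would write a sorted run to a temporary file.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical growable buffer; not the data structure used by sort.c. */
struct grow_buf {
    char  *data;     /* heap storage */
    size_t used;     /* bytes currently held */
    size_t cap;      /* current allocation ("curbufsize" in the mail) */
    size_t max_cap;  /* upper bound ("bufsize" in the mail) */
    size_t flushes;  /* number of times the buffer was spilled */
};

/* Start small (min_cap) and remember the upper bound (max_cap). */
static int grow_buf_init(struct grow_buf *b, size_t min_cap, size_t max_cap)
{
    if (min_cap == 0 || min_cap > max_cap)
        return -1;
    b->data = malloc(min_cap);
    if (b->data == NULL)
        return -1;
    b->used = 0;
    b->cap = min_cap;
    b->max_cap = max_cap;
    b->flushes = 0;
    return 0;
}

/* Stand-in for writing a run to a temporary file: just empty the buffer. */
static void grow_buf_flush(struct grow_buf *b)
{
    b->used = 0;
    b->flushes++;
}

/* Append len bytes. Grow by doubling while below the bound; spill only
   when the data still does not fit. Returns 0 on success, -1 if the
   record cannot fit in the buffer at all. */
static int grow_buf_append(struct grow_buf *b, const void *src, size_t len)
{
    if (len > b->max_cap)
        return -1;
    while (len > b->cap - b->used && b->cap < b->max_cap) {
        /* Double, clamped to the bound; written to avoid overflow. */
        size_t new_cap = (b->cap > b->max_cap / 2) ? b->max_cap : b->cap * 2;
        char *p = realloc(b->data, new_cap);
        if (p == NULL)
            break;              /* keep the old buffer and spill instead */
        b->data = p;
        b->cap = new_cap;
    }
    if (len > b->cap - b->used)
        grow_buf_flush(b);
    if (len > b->cap)
        return -1;              /* only reachable if realloc failed */
    memcpy(b->data + b->used, src, len);
    b->used += len;
    assert(b->used <= b->cap && b->cap <= b->max_cap);
    return 0;
}
```

A caller would run grow_buf_init once with a small minimum and the
guessed upper limit, then call grow_buf_append per input record. Note
that the sketch still allocates lazily in the sense discussed earlier in
the thread only up to what the data actually needs, which is the point of
the proposal; whether realloc moves the data (the 2x versus 4x question)
is left to the C library here.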
From debbugs-submit-bounces@debbugs.gnu.org Thu Aug 30 04:07:08 2012 Received: (at control) by debbugs.gnu.org; 30 Aug 2012 08:07:08 +0000 Received: from localhost ([127.0.0.1]:57030 helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.72) (envelope-from ) id 1T6zmO-0004un-2q for submit@debbugs.gnu.org; Thu, 30 Aug 2012 04:07:08 -0400 Received: from out1-smtp.messagingengine.com ([66.111.4.25]:33853) by debbugs.gnu.org with esmtp (Exim 4.72) (envelope-from ) id 1T6zmL-0004uf-2I for control@debbugs.gnu.org; Thu, 30 Aug 2012 04:07:06 -0400 Received: from compute5.internal (compute5.nyi.mail.srv.osa [10.202.2.45]) by gateway1.nyi.mail.srv.osa (Postfix) with ESMTP id ED43A20DA5 for ; Thu, 30 Aug 2012 04:05:55 -0400 (EDT) Received: from web5.nyi.mail.srv.osa ([10.202.2.215]) by compute5.internal (MEProxy); Thu, 30 Aug 2012 04:05:55 -0400 DKIM-Signature: v=1; a=rsa-sha1; c=relaxed/relaxed; d= messagingengine.com; h=message-id:from:to:mime-version :content-transfer-encoding:content-type:subject:date; s=smtpout; bh=g3x1YkP2vluBEiAMNfVG/RrTs8w=; b=TZifXzPLwB25W+uO3zOoS/hKp7Oe KhJauDa80k+NMovM0kYUSZt+0p/bpnOh0CKI1A233+hxUgr8wXVw3M1yaiA+SIjp 3lcXH7H2asp6p+g+dl6KDunycJuQGMWel+95+AZrDypuVZfWuzQ8lk/q3aAG3yXQ TT9u7m6/S5244xc= Received: by web5.nyi.mail.srv.osa (Postfix, from userid 99) id BD7B04C0212; Thu, 30 Aug 2012 04:05:55 -0400 (EDT) Message-Id: <1346313955.6848.140661121418585.20DF5764@webmail.messagingengine.com> X-Sasl-Enc: T2AqFlIWN6GQ1vxU9Q18H9U0y8luevpcLNcOSgz5PZxR 1346313955 From: era eriksson To: control@debbugs.gnu.org MIME-Version: 1.0 Content-Transfer-Encoding: 7bit Content-Type: text/plain X-Mailer: MessagingEngine.com Webmail Interface Subject: Bug maintenance Date: Thu, 30 Aug 2012 11:05:55 +0300 X-Spam-Score: -2.6 (--) X-Debbugs-Envelope-To: control X-BeenThere: debbugs-submit@debbugs.gnu.org X-Mailman-Version: 2.1.13 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Sender: 
debbugs-submit-bounces@debbugs.gnu.org Errors-To: debbugs-submit-bounces@debbugs.gnu.org X-Spam-Score: -2.6 (--) forcemerge 6366 10924 merge 7389 7394 tags 7394 + patch merge 7948 7963 7968 retitle 9140 [du] broken on OSX 10.7 (Lion) for >4TB file systems retitle 10003 [df] information differs from GUI retitle 10013 [ls] document origin of name tags 10013 + patch retitle 10054 [cp] 8.13: cp -au may replace newer files [sr #107876] retitle 10877 [sort] too eager to use temp files retitle 11760 [mv] data loss after ctrl-C on ntfs -> ntfs move thanks I also wanted to retitle 10900 but it's too opaque, and should be followed up. I took this out because I'm not sure I was able to summarize it correctly. retitle 10639 [cp] test-copy-acl fails on Solaris 64bit + NFS /* era */ -- If this were a real .signature, it would suck less. Well, maybe not. From unknown Mon Aug 18 17:53:54 2025 X-Loop: help-debbugs@gnu.org Subject: bug#10877: [sort] too eager to use temp files (WAS: Wimpy external files) Resent-From: Assaf Gordon Original-Sender: "Debbugs-submit" Resent-CC: bug-coreutils@gnu.org Resent-Date: Mon, 15 Oct 2018 16:13:02 +0000 Resent-Message-ID: Resent-Sender: help-debbugs@gnu.org X-GNU-PR-Message: followup 10877 X-GNU-PR-Package: coreutils X-GNU-PR-Keywords: Cc: 10877@debbugs.gnu.org Received: via spool by 10877-submit@debbugs.gnu.org id=B10877.153961996713180 (code B ref 10877); Mon, 15 Oct 2018 16:13:02 +0000 Received: (at 10877) by debbugs.gnu.org; 15 Oct 2018 16:12:47 +0000 Received: from localhost ([127.0.0.1]:51006 helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1gC5U7-0003QR-53 for submit@debbugs.gnu.org; Mon, 15 Oct 2018 12:12:47 -0400 Received: from mail-pl1-f177.google.com ([209.85.214.177]:43743) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1gC5U5-0003Q9-9L; Mon, 15 Oct 2018 12:12:45 -0400 Received: by mail-pl1-f177.google.com with SMTP id 30-v6so9523786plb.10; Mon, 15 Oct 2018 09:12:45 -0700 (PDT) 
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20161025; h=subject:cc:references:from:message-id:date:user-agent:mime-version :in-reply-to:content-language:content-transfer-encoding; bh=CGp1A1aOla8Esc9Kn+thXN+Uqg6P6kipScEOLvML+k8=; b=ejXJazxvYBZcIpf8qWxx1msBiOHoOxQkApkXuLN9dtkqhirpr/EwrFLvUicsQcVLsV CpkLCzZ/NbhwHf1T6+GQaPA22zlUVFQSkLL3ZmLwSbZ2/soKjqirrCdMlZpU88VMWGcG U97s3F44zO9ZpBRgSp0JmYsoFTAj+D041wRYIEJa/cHNZVq16UrieG0FRjS7tAkNX5m6 V52lYLd/rOJlIk470toszIoOU9SnlNQnMWzok1LcKYPPl4W4cWF58LbUYsvRPuyZwpIL r8uQo6ROMIa6ilGhwe74FBrlcsh+rrYwMVbGU5xubfXDdBlogW0Z3T0OANrskVSUgYOq X28Q== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:subject:cc:references:from:message-id:date :user-agent:mime-version:in-reply-to:content-language :content-transfer-encoding; bh=CGp1A1aOla8Esc9Kn+thXN+Uqg6P6kipScEOLvML+k8=; b=RIEANfn6YbPRqI2DOrwOWtcbQvepQkhw6jrinluD0vBi6236+gg9aXHFBaEfUhoSxH jflGHOj3rB+upBkMHnnLV/h3rxHBjaGYyFS37I9Y+EXmC7jMi0GMCigplkcEGRDXTTuL uHjHAIUMinuKvjTDJZZdISfCAVcdFKlGT7wNd5R13uv2lRvdHWhYID8K0NDEwcBKkP0e 4Ik0FEzz+wTW0iFym1FNtwKuLSDrv800rsrRD9ARNIwN9uI+HIn9p8VkymXQBs9NCZVm kIXIKHf72NCzQbbZwDMzNNSbln500AMizAZg9miCFlDTV8ehROV5udbnGI3fKWCr+MS7 NaBQ== X-Gm-Message-State: ABuFfojxofGMnGqNxF92devmyu/8WrGe3MzXCqRfFHy+4huX/+5BeD4+ MTgr2WtpuEg7QOxYRv5OQTtJlQhcySQ= X-Google-Smtp-Source: ACcGV61oFTQwjIsyzvuTzF41vS2CY/WwqSi+i11o+OR9xWB0jl1R/1zHhxhV0kSjUtXVEH7WcO/DCQ== X-Received: by 2002:a17:902:24a5:: with SMTP id w34-v6mr17024286pla.73.1539619958632; Mon, 15 Oct 2018 09:12:38 -0700 (PDT) Received: from tomato.housegordon.com (moose.housegordon.com. 
[184.68.105.38]) by smtp.googlemail.com with ESMTPSA id q68-v6sm7937365pfb.160.2018.10.15.09.12.35 (version=TLS1_2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128/128); Mon, 15 Oct 2018 09:12:36 -0700 (PDT) References: <20120224135357.GA31568@bitwizard.nl> <4F47C8A2.9060103@cs.ucla.edu> <20120225125642.GA10279@bitwizard.nl> <4F492B41.9030705@cs.ucla.edu> <20120226071451.GD23142@bitwizard.nl> <4F49DE89.8050506@cs.ucla.edu> <20120226201825.GA16411@bitwizard.nl> From: Assaf Gordon Message-ID: Date: Mon, 15 Oct 2018 10:12:34 -0600 User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:52.0) Gecko/20100101 Thunderbird/52.9.1 MIME-Version: 1.0 In-Reply-To: <20120226201825.GA16411@bitwizard.nl> Content-Type: text/plain; charset=utf-8; format=flowed Content-Language: en-US Content-Transfer-Encoding: 7bit X-Spam-Score: 1.2 (+) X-Spam-Report: Spam detection software, running on the system "debbugs.gnu.org", has NOT identified this incoming email as spam. The original message has been attached to this so you can view it or label similar future email. If you have any questions, see the administrator of that system for details. Content preview: tags 10877 fixed close 10877 stop (triaging old bugs) The original issue reported in this bug was fixed here: [...] 
Content analysis details: (1.2 points, 10.0 required) pts rule name description ---- ---------------------- -------------------------------------------------- -0.0 RCVD_IN_DNSWL_NONE RBL: Sender listed at http://www.dnswl.org/, no trust [209.85.214.177 listed in list.dnswl.org] 0.0 RCVD_IN_MSPIKE_H3 RBL: Good reputation (+3) [209.85.214.177 listed in wl.mailspike.net] 1.2 MISSING_HEADERS Missing To: header 0.0 FREEMAIL_FROM Sender email is commonly abused enduser mail provider (assafgordon[at]gmail.com) -0.0 SPF_PASS SPF: sender matches SPF record 0.0 RCVD_IN_MSPIKE_WL Mailspike good senders X-BeenThere: debbugs-submit@debbugs.gnu.org X-Mailman-Version: 2.1.18 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: debbugs-submit-bounces@debbugs.gnu.org Sender: "Debbugs-submit" X-Spam-Score: 0.2 (/) tags 10877 fixed close 10877 stop (triaging old bugs) The original issue reported in this bug was fixed here: https://git.savannah.gnu.org/cgit/coreutils.git/commit/?id=a507ed6ede5064b8f15c979e54e6de3bb478d73e Additional issues regarding sort/memory/temp files optimizations have not been touched upon in 6 years. I'm closing this bug as "fixed". If there is interest, discussion can continue by replying to this thread. regards, - assaf