GNU bug report logs - #10877
[sort] too eager to use temp files

Package: coreutils;

Reported by: Rogier Wolff <R.E.Wolff <at> BitWizard.nl>

Date: Fri, 24 Feb 2012 15:13:01 UTC

Severity: normal

Tags: fixed

Done: Assaf Gordon <assafgordon <at> gmail.com>

Bug is archived. No further changes may be made.

From: Paul Eggert <eggert <at> cs.ucla.edu>
To: Rogier Wolff <R.E.Wolff <at> BitWizard.nl>
Cc: 10877 <at> debbugs.gnu.org
Subject: bug#10877: Wimpy external files.
Date: Sat, 25 Feb 2012 10:41:05 -0800

On 02/25/2012 04:56 AM, Rogier Wolff wrote:

> there is a logic error in the code that determines the
> maximum memory to use: You said it was supposed to use 1/8th of total
> memory. However it then takes another factor of two "margin".

Thanks for catching that.  I installed a fix (patch at end
of this message).
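
To make the factor-of-two issue concrete, here is a minimal sketch of the two orderings. It is a simplified model, not the code in src/sort.c: the function names, the single `hard_limit` parameter (standing in for the rlimit checks), and `SKETCH_MIN_SORT_SIZE` are hypothetical, and the halving step is assumed to be a plain `size /= 2`, since it lies outside the hunks shown in the patch.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical floor, standing in for MIN_SORT_SIZE in src/sort.c.  */
#define SKETCH_MIN_SORT_SIZE ((size_t) 1 << 16)

/* Old order (assumed): clamp to MEM first, then halve.  The halving
   meant for the hard limits also halves the advice MEM.  */
static size_t
sort_size_before (double mem, size_t hard_limit)
{
  size_t size = SIZE_MAX;
  if (mem < size)
    size = (size_t) mem;
  if (hard_limit < size)
    size = hard_limit;
  size /= 2;
  return size < SKETCH_MIN_SORT_SIZE ? SKETCH_MIN_SORT_SIZE : size;
}

/* New order: halve only the hard limit, then clamp to MEM, so the
   advice is used undivided.  */
static size_t
sort_size_after (double mem, size_t hard_limit)
{
  size_t size = SIZE_MAX;
  if (hard_limit < size)
    size = hard_limit;
  size /= 2;
  if (mem < size)
    size = (size_t) mem;
  return size < SKETCH_MIN_SORT_SIZE ? SKETCH_MIN_SORT_SIZE : size;
}
```

With `mem` set to 1 GiB and no effective hard limit (`SIZE_MAX`), the old order yields 512 MiB while the new order yields the full 1 GiB. When the hard limit is the binding constraint, both orders give the same result.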

> I don't think that any guessing should be done if we cannot determine
> the filesize. In that case we have great heuristics to come up with a
> reasonable buffer size without the filesize.

A problem with that idea is, suppose we have many
independent 'sort' invocations running at the same time,
as part of a shell pipeline say?  If they each grab 1/8 of
physical RAM, merely because they want to sort piped data of
a few bytes, they may exhaust swap space.

Perhaps we can improve the heuristics for pipes, but I hope
you can see why I'm a bit leery of a heuristic that says
"if the input is from a pipe, pretend it's from a file of
infinite size".
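
The concern about concurrent invocations can be put in numbers with a small sketch. The helper below is hypothetical and only does the arithmetic; it assumes every invocation takes exactly 1/8 of physical RAM as its buffer, which is the worst case being described, not necessarily what 'sort' does.

```c
#include <stdint.h>

/* Hypothetical helper: total memory requested if COUNT concurrent
   sorts each take PHYSMEM / 8 as their buffer.  */
static uint64_t
aggregate_sort_demand (uint64_t physmem, unsigned int count)
{
  return physmem / 8 * count;
}
```

On a machine with 8 GiB of RAM, 16 such invocations would together ask for 16 GiB, twice the physical memory, even if each one only had a few bytes to sort.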

From 28197ef851af8f7e4f5f98f4433090cbbd63fbac Mon Sep 17 00:00:00 2001
From: Paul Eggert <eggert <at> cs.ucla.edu>
Date: Sat, 25 Feb 2012 10:32:52 -0800
Subject: [PATCH] sort: default to physmem/8, not physmem/16

* src/sort.c (default_sort_size): Don't divide advice by 2.
Just divide the hard limits by 2.  This matches the comments.
Reported by Rogier Wolff in http://bugs.gnu.org/10877
---
 src/sort.c |   14 +++++++-------
 1 files changed, 7 insertions(+), 7 deletions(-)

diff --git a/src/sort.c b/src/sort.c
index 6875a6a..60ff415 100644
--- a/src/sort.c
+++ b/src/sort.c
@@ -1414,13 +1414,9 @@ default_sort_size (void)
   struct rlimit rlimit;
 
   /* Let SIZE be MEM, but no more than the maximum object size or
-     system resource limits.  Avoid the MIN macro here, as it is not
-     quite right when only one argument is floating point.  Don't
-     bother to check for values like RLIM_INFINITY since in practice
-     they are not much less than SIZE_MAX.  */
+     system resource limits.  Don't bother to check for values like
+     RLIM_INFINITY since in practice they are not much less than SIZE_MAX.  */
   size_t size = SIZE_MAX;
-  if (mem < size)
-    size = mem;
   if (getrlimit (RLIMIT_DATA, &rlimit) == 0 && rlimit.rlim_cur < size)
     size = rlimit.rlim_cur;
 #ifdef RLIMIT_AS
@@ -1439,7 +1435,11 @@ default_sort_size (void)
     size = rlimit.rlim_cur / 16 * 15;
 #endif
 
-  /* Use no less than the minimum.  */
+  /* Return the minimum of MEM and SIZE, but no less than
+     MIN_SORT_SIZE.  Avoid the MIN macro here, as it is not quite
+     right when only one argument is floating point.  */
+  if (mem < size)
+    size = mem;
   return MAX (size, MIN_SORT_SIZE);
 }
 
-- 
1.7.6.5






This bug report was last modified 6 years and 280 days ago.
