GNU bug report logs - #10877
[sort] too eager to use temp files

Previous Next

Package: coreutils;

Reported by: Rogier Wolff <R.E.Wolff <at> BitWizard.nl>

Date: Fri, 24 Feb 2012 15:13:01 UTC

Severity: normal

Tags: fixed

Done: Assaf Gordon <assafgordon <at> gmail.com>

Bug is archived. No further changes may be made.

Full log


View this message in rfc822 format

From: Rogier Wolff <R.E.Wolff <at> BitWizard.nl>
To: Paul Eggert <eggert <at> cs.ucla.edu>
Cc: 10877 <at> debbugs.gnu.org
Subject: bug#10877: Wimpy external files.
Date: Sun, 26 Feb 2012 21:18:25 +0100
On Sat, Feb 25, 2012 at 11:26:01PM -0800, Paul Eggert wrote:
> On 02/25/2012 11:14 PM, Rogier Wolff wrote:
> 
> > Many modern operating systems do "lazy" allocation.
> 
> Sure, that's an old trick.  But this has its own problems:
> it can mean a process *thinks* it has memory allocated, but it
> doesn't *really* have the memory; which means when it tries to
> actually *use* its memory it can get killed.  This is not a direction
> we want 'sort' to head.

Hmm. Ok. 

> > a slight change in the codebase might be in order
> > for "unknown sort size".
> 
> Sorry, I didn't follow the rest of that comment.  Perhaps you
> could suggest a patch?  That might explain things better.
> "diff -u" format is typically best.

This one is more work than 10 minutes. Before I put in the effort I
would like to know if this is something that stands a chance...

Maybe some peudocode helps explain:

Currently there is a 

    bufsize = .... 
    buffer = malloc (bufsize); 

and then during the sorting something like: 

    if (data_in_buffer + new_data_len > bufsize) {
       write_data_from buffer (); 
    }

I propose to make that: 

    bufsize = .... ; // this returns a negative number to indicate it is a 
                     // wild guess, but an upper limit. 

    if (bufsize < 0) {
        curbufsize = MINBUFSIZE; 
	bufsize = -bufsize;
    } else {
	curbufsize = bufsize;
    }
    buffer = malloc (curbufsize); 

and then during the sorting: 

    if (data_in_buffer + new_data_len > curbufsize) {
       curbufsize *= 2; 
       if (curbufsize > bufsize) curbufsize = bufsize; 
       buffer = realloc (buffer, curbufsize);
       if (data_in_buffer + new_data_len > curbufsize) {
          write_data_from buffer (); 
       }
       write_data_from buffer (); 
    }

i.e. we determine an upper limit at "guessing time", and increase the
memory buffer up to that limit when the small default buffer ends up
being too small.

	Roger.

-- 
** R.E.Wolff <at> BitWizard.nl ** http://www.BitWizard.nl/ ** +31-15-2600998 **
**    Delftechpark 26 2628 XH  Delft, The Netherlands. KVK: 27239233    **
*-- BitWizard writes Linux device drivers for any device you may have! --*
The plan was simple, like my brother-in-law Phil. But unlike
Phil, this plan just might work.




This bug report was last modified 6 years and 280 days ago.

Previous Next


GNU bug tracking system
Copyright (C) 1999 Darren O. Benham, 1997,2003 nCipher Corporation Ltd, 1994-97 Ian Jackson.