GNU bug report logs -
#14752
sort fails to fork() + execlp(compress_program) if overcommit limit is reached
[Message part 1 (text/plain, inline)]
Your bug report
#14752: sort fails to fork() + execlp(compress_program) if overcommit limit is reached
which was filed against the coreutils package, has been closed.
The explanation is attached below, along with your original report.
If you require more details, please reply to 14752 <at> debbugs.gnu.org.
--
14752: http://debbugs.gnu.org/cgi/bugreport.cgi?bug=14752
GNU Bug Tracking System
Contact help-debbugs <at> gnu.org with problems
[Message part 2 (message/rfc822, inline)]
tag 14752 notabug
close 14752
stop
On 06/30/2013 03:42 AM, Petros Aggelatos wrote:
> I was trying to sort a big file (22GB, 5GB gzipped) with `sort
> --compress-program=gzip -S40% data`. My /tmp filesystem is a 6GB tmpfs
> and the total system RAM is 16GB. The problem was that after a while
> sort would write uncompressed temp files in /tmp causing it to fill up
> and then crash for having no free space.
Thanks for reporting this. However, I think your system's memory is
simply too small for sorting that file this way; see below.
As you recognized yourself, sort(1) was writing huge chunk files into
the /tmp directory, which is a tmpfs file system, i.e., all of that data
reduces the memory available to running processes. Compared to such an
amount of data, the overhead of spawning a new process is negligible.
In such a case, you're much better off telling sort(1) to use a different
directory for the temporary files.
Here's an excerpt from the texinfo manual
(info coreutils 'sort invocation'):
If the environment variable `TMPDIR' is set, `sort' uses its value
as the directory for temporary files instead of `/tmp'. The
`--temporary-directory' (`-T') option in turn overrides the environment
variable.
...
`-T TEMPDIR'
`--temporary-directory=TEMPDIR'
Use directory TEMPDIR to store temporary files, overriding the
`TMPDIR' environment variable. If this option is given more than
once, temporary files are stored in all the directories given. If
you have a large sort or merge that is I/O-bound, you can often
improve performance by using this option to specify directories on
different disks and controllers.
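For example (a sketch; the /var/tmp/sort path below is only an
assumption about where you have space on a disk-backed file system):

    # keep sort's temporary chunk files off the tmpfs
    mkdir -p /var/tmp/sort
    sort --compress-program=gzip -S40% -T /var/tmp/sort data > data.sorted

    # equivalently, via the environment variable
    TMPDIR=/var/tmp/sort sort --compress-program=gzip -S40% data > data.sorted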
Have a nice day,
Berny
[Message part 3 (message/rfc822, inline)]
I was trying to sort a big file (22GB, 5GB gzipped) with `sort
--compress-program=gzip -S40% data`. My /tmp filesystem is a 6GB tmpfs
and the total system RAM is 16GB. The problem was that after a while
sort would write uncompressed temp files in /tmp causing it to fill up
and then crash for having no free space.
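(As an aside, one way to see that /tmp is memory-backed and to watch it
eating into RAM while sort runs:)

    # /tmp is a tmpfs, so its contents live in RAM/swap
    df -h /tmp
    # free memory shrinks as the tmpfs fills with sort's chunk files
    free -h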
After digging around in sort.c I found that fork() would fail with
errno ENOMEM after the first batches had been written successfully to
/tmp. It turned out I was hitting the kernel's VM overcommit limit as
the tmpfs utilisation grew. And indeed, running `sysctl -w
vm.overcommit_memory=1` fixed the problem and the file was sorted.
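(For reference, a sketch of how the overcommit policy can be inspected
and relaxed on Linux; the sysctl in question is vm.overcommit_memory:)

    # current policy: 0 = heuristic, 1 = always overcommit, 2 = strict accounting
    cat /proc/sys/vm/overcommit_memory
    # how much memory is committed vs. the commit limit
    grep -i commit /proc/meminfo
    # allow overcommit for the running system (not persistent across reboots)
    sysctl -w vm.overcommit_memory=1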
As sort is a program with high memory consumption and could be running
in environments where overcommit is either unavailable or disabled, I
would expect it to use another method for spawning its
compress_program. I'm not a C developer, so I'm not sure this is
possible, but vfork() and clone() seem able to solve this problem.
--
Petros Angelatos