GNU bug report logs - #23113
parallel gzip processes trash hard disks, need larger buffers
Message #14 received at 23113 <at> debbugs.gnu.org:
Here are some other approaches which may help:
1. Use gzopen() from zlib to compress the 10GB file as it is generated.
This uses only one CPU core and writes strictly sequentially
(no random writes), but that may be enough in some cases; a minimal
C sketch appears after this list.
2. The output from gzip is written 32 KiB at a time, so a large output file
is grown in many small increments. Buffering the output from gzip
into larger blocks may therefore help, too. Try:
gzip ... | dd obs=... of=...
For example, obs=1M makes dd collect the 32 KiB pieces and write them
out as 1 MiB blocks. (A C equivalent of this rebuffering is sketched
at the end of this message.)
3. Similarly, dd can buffer the input to gzip:
dd if=... ibs=... obs=... | gzip ...
4. dd can also be used to create multiple streams of input
from a single file:
(dd if=file ibs=... skip=0*N count=N obs=... | gzip ... ) &
(dd if=file ibs=... skip=1*N count=N obs=... | gzip ... ) &
(dd if=file ibs=... skip=2*N count=N obs=... | gzip ... ) &
(dd if=file ibs=... skip=3*N count=N obs=... | gzip ... ) &
However dd does not perform arithmetic, so each product j*N must be
written as a literal value; for example, with N=1000 the four commands
would use skip=0, skip=1000, skip=2000, and skip=3000. Each backgrounded
gzip also needs its own output file so that the four compressed streams
do not interleave.
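To make approach 1 concrete, here is a minimal sketch of streaming
compression through zlib's gzopen()/gzwrite()/gzclose(); build it with
"cc sketch.c -lz". The output name "output.gz", the 64 KiB chunk size,
and the use of stdin as the stand-in data generator are illustrative
assumptions, not details from the original report:

#include <stdio.h>
#include <zlib.h>

int main(void)
{
    gzFile out = gzopen("output.gz", "wb");  /* illustrative name */
    if (out == NULL) {
        fprintf(stderr, "gzopen failed\n");
        return 1;
    }

    char buf[64 * 1024];  /* 64 KiB read chunk, an arbitrary choice */
    size_t n;
    /* stdin stands in for whatever generates the 10GB of data;
       each chunk is compressed and appended sequentially. */
    while ((n = fread(buf, 1, sizeof buf, stdin)) > 0) {
        if (gzwrite(out, buf, (unsigned)n) == 0) {
            fprintf(stderr, "gzwrite failed\n");
            gzclose(out);
            return 1;
        }
    }

    if (gzclose(out) != Z_OK) {
        fprintf(stderr, "gzclose failed\n");
        return 1;
    }
    return 0;
}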
The dd utility program is quite versatile!
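Along those lines, the write-side buffering that dd's obs= provides
takes only a few lines of C. The sketch below is an assumed-equivalent
filter (the name "rebuffer" is hypothetical), with the 1 MiB output
block size an arbitrary illustrative value; used as
"gzip ... | ./rebuffer > file.gz" it plays the same role as
"dd obs=1M of=file.gz" above:

#include <stdio.h>
#include <unistd.h>

#define OBS (1024 * 1024)  /* output block size, like dd's obs=1M */

/* Write all len bytes, retrying on short writes. */
static int write_all(int fd, const char *p, size_t len)
{
    while (len > 0) {
        ssize_t n = write(fd, p, len);
        if (n < 0)
            return -1;
        p += n;
        len -= (size_t)n;
    }
    return 0;
}

int main(void)
{
    static char buf[OBS];
    size_t fill = 0;
    ssize_t n;

    /* Collect small reads until a full OBS-sized block is ready,
       then issue one large write, so the output file grows in
       1 MiB steps instead of 32 KiB ones. */
    while ((n = read(STDIN_FILENO, buf + fill, OBS - fill)) > 0) {
        fill += (size_t)n;
        if (fill == OBS) {
            if (write_all(STDOUT_FILENO, buf, OBS) < 0) {
                perror("write");
                return 1;
            }
            fill = 0;
        }
    }
    if (n < 0) {
        perror("read");
        return 1;
    }
    /* Flush any final partial block. */
    if (fill > 0 && write_all(STDOUT_FILENO, buf, fill) < 0) {
        perror("write");
        return 1;
    }
    return 0;
}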