GNU bug report logs - #23113
parallel gzip processes trash hard disks, need larger buffers


Package: gzip

Reported by: "Chevreux, Bastien" <bastien.chevreux <at> dsm.com>

Date: Fri, 25 Mar 2016 18:16:01 UTC

Severity: normal

Done: Jim Meyering <jim <at> meyering.net>

Bug is archived. No further changes may be made.





Report forwarded to bug-gzip <at> gnu.org:
bug#23113; Package gzip. (Fri, 25 Mar 2016 18:16:01 GMT)

Acknowledgement sent to "Chevreux, Bastien" <bastien.chevreux <at> dsm.com>:
New bug report received and forwarded. Copy sent to bug-gzip <at> gnu.org. (Fri, 25 Mar 2016 18:16:01 GMT)

Message #5 received at submit <at> debbugs.gnu.org (full text, mbox):

From: "Chevreux, Bastien" <bastien.chevreux <at> dsm.com>
To: "bug-gzip <at> gnu.org" <bug-gzip <at> gnu.org>
Subject: parallel gzip processes trash hard disks, need larger buffers
Date: Fri, 25 Mar 2016 16:57:12 +0000
Hi there,

I am using gzip 1.6 to compress large files >10 GiB in parallel (Kubuntu 14.04, 12 cores). The underlying disk system (RAID 10) is able to deliver read speeds >1 GB/s (measured with flushed file caches, iostat -mx 1 100).

Here are some numbers when running gzip in parallel:
1 gzip process: the CPU is the bottleneck for compression and utilisation is 100%.
2 gzips in parallel: disk throughput drops to a meagre 70 MB/s and CPU utilisation per process is at ~60%.
6 gzips in parallel: disk throughput fluctuates between 50 and 60 MB/s and CPU utilisation per process is at ~18-20%.

Running 6 gzips in parallel on the same data residing on an SSD: 100% CPU utilisation per process.
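
For reference, a rough sketch of how such a measurement can be reproduced (the file names and process count are illustrative assumptions; dropping the caches requires root):

  sync && echo 3 | sudo tee /proc/sys/vm/drop_caches   # start from cold file caches
  iostat -mx 1 100 > iostat.log &                      # record throughput and utilisation
  for f in chunk1.dat chunk2.dat chunk3.dat chunk4.dat chunk5.dat chunk6.dat; do
    gzip "$f" &                                        # one gzip process per file
  done
  wait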

A bit of googling turned up this thread on SuperUser, where someone saw the same behaviour with a single disk that normally does 125 MB/s: running 4 gzips in parallel drops it to 25 MB/s:
http://superuser.com/questions/599329/why-is-gzip-slow-despite-cpu-and-hard-drive-performance-not-being-maxed-out

The posts there propose a workaround like this:
  buffer -s 100000 -m 10000000 -p 100 < bigfile.dat | gzip > bigfile.dat.gz

And indeed, using "buffer" resolves the thrashing problems when working on a disk system. However, using "buffer" is pretty arcane (it isn't even installed by default on most Unix/Linux installations) and pretty counterintuitive.

Would it be possible to have bigger buffers by default (1 MB? 10 MB?), or to have an automatic heuristic in gzip like "if the file to compress is >10 MB and free RAM is >500 MB, set up the file buffer to use 1 (10?) MB"?

Alternatively, a command line option to manually set the buffer size?

Best,
  Bastien

--
DSM Nutritional Products Microbia Inc | Bioinformatics
60 Westview Street | Lexington, MA 02421 | United States
Phone +1 781 259 7613 | Fax +1 781 259 0615



Information forwarded to bug-gzip <at> gnu.org:
bug#23113; Package gzip. (Sun, 27 Mar 2016 04:18:01 GMT)

Message #8 received at 23113 <at> debbugs.gnu.org (full text, mbox):

From: Jim Meyering <jim <at> meyering.net>
To: "Chevreux, Bastien" <bastien.chevreux <at> dsm.com>
Cc: 23113 <at> debbugs.gnu.org
Subject: Re: bug#23113: parallel gzip processes trash hard disks,
 need larger buffers
Date: Sat, 26 Mar 2016 21:17:11 -0700
On Fri, Mar 25, 2016 at 9:57 AM, Chevreux, Bastien
<bastien.chevreux <at> dsm.com> wrote:
> [...]

Thanks for the report and suggestions.
However, I suggest that you consider using xz in place of gzip.
Not only can it compress better, it also works faster for comparable
compression ratios.

That said, if you find that setting gzip.h's INBUFSIZ or OUTBUFSIZ to
larger values makes a significant difference, we'd like to hear about
the results and how you measured.
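
For anyone wanting to try that, a rough sketch of one possible procedure (the tarball name, the target buffer size and the test file are assumptions; the defines are edited by hand):

  tar xf gzip-1.6.tar.gz && cd gzip-1.6
  ./configure
  $EDITOR gzip.h        # enlarge the INBUFSIZ / OUTBUFSIZ defines, e.g. to 1 MiB
  make
  time ./gzip -c /data/bigfile.dat > /dev/null   # compare wall-clock time and iostat output against stock gzip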




Information forwarded to bug-gzip <at> gnu.org:
bug#23113; Package gzip. (Tue, 29 Mar 2016 23:04:01 GMT)

Message #11 received at 23113 <at> debbugs.gnu.org (full text, mbox):

From: "Chevreux, Bastien" <bastien.chevreux <at> dsm.com>
To: Jim Meyering <jim <at> meyering.net>
Cc: "23113 <at> debbugs.gnu.org" <23113 <at> debbugs.gnu.org>
Subject: RE: bug#23113: parallel gzip processes trash hard disks, need larger
 buffers
Date: Tue, 29 Mar 2016 23:03:44 +0000
> From: meyering <at> gmail.com [mailto:meyering <at> gmail.com] On Behalf Of Jim Meyering
> [...]
> However, I suggest that you consider using xz in place of gzip.
> Not only can it compress better, it also works faster for comparable compression ratios.

xz is not a viable alternative in this case: the use case is not archiving. There is a plethora of programs out there with zlib support compiled in, and these won't work on xz-packed data. Furthermore, gzip -1 is approximately 4 times faster than xz -1 on FASTQ files (sequencing data), and the use case here is "temporary results, so OK-ish compression in a comparatively short amount of time". Gzip is ideal in that respect, as even at -1 it compresses down to ~25-35% ... and that already helps a lot when you need only ~350 GiB of disk instead of 1 TiB. Gzip -1 takes ~4.5 hrs, xz -1 almost a day.

> That said, if you find that setting gzip.h's INBUFSIZ or OUTBUFSIZ to larger values makes a significant difference, we'd like to hear about the results and how you measured.

Changing INBUFSIZ did not have the hoped-for effect, as this is just the buffer size allocated by gzip ... in the end it uses only 64k at most, and the calls to the file system's read() end up requesting only 32k per call.
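
One simple way to see the size of each read() request is strace, for example (the input file here is an assumption):

  strace -e trace=read -o gzip-reads.log gzip -c < bigfile.dat > /dev/null
  head gzip-reads.log    # the third argument of each read() call is the requested byte count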

I traced this down through multiple layers to the function fill_window() in deflate.c, where things get really intricate, with multiple pre-set variables, defines and memcpy()s. It became clear that the code is geared towards a 64k buffer with a rolling 32k window; optimised for 16-bit machines, that is.

There are a few mentions of SMALL_MEM, MEDIUM_MEM and BIG_MEM variants via defines. However, the code comments say that BIG_MEM works on a complete file loaded into memory ... which is a no-go for files in the range of 15 to 30 GiB. I'm not even sure the code still does what the comments say.

Long story short: I do not feel expert enough to touch said functions and change them to provide for larger input buffering. If I were forced to implement something I'd try it with an outer buffering layer, but I'm not sure it would be elegant or even efficient.

Best,
  Bastien

PS: then again, I'm toying with the idea of writing a simple gzip-packer replacement that simply buffers data and passes it to zlib.

--
DSM Nutritional Products Microbia Inc | Bioinformatics
60 Westview Street | Lexington, MA 02421 | United States
Phone +1 781 259 7613 | Fax +1 781 259 0615



Information forwarded to bug-gzip <at> gnu.org:
bug#23113; Package gzip. (Sun, 03 Apr 2016 15:57:02 GMT)

Message #14 received at 23113 <at> debbugs.gnu.org (full text, mbox):

From: John Reiser <vendor <at> bitwagon.com>
To: 23113 <at> debbugs.gnu.org
Subject: alternatives: parallel gzip processes trash hard disks
Date: Sat, 2 Apr 2016 21:43:50 -0700
Here are some other approaches which may help:

1. Use gzopen() from zlib to compress the 10GB file as it is generated.
This uses only one CPU core and requires sequential writing only
(no random writes) but that may be enough in some cases.

2. The output from gzip is written 32 KiB at a time, so a large output file
involves growing the file many times.  Thus buffering the output from gzip
into larger blocks may help, too.  Try:
	gzip ...  |  dd obs=... of=...

3. Similarly, dd can buffer the input to gzip:
	dd if=... ibs=... obs=...  |  gzip ...

4. dd can also be used to create multiple streams of input
from a single file:
	(dd if=file ibs=... skip=0*N count=N obs=...  |  gzip ... ) &
	(dd if=file ibs=... skip=1*N count=N obs=...  |  gzip ... ) &
	(dd if=file ibs=... skip=2*N count=N obs=...  |  gzip ... ) &
	(dd if=file ibs=... skip=3*N count=N obs=...  |  gzip ... ) &
However, dd does not perform arithmetic, so each product j*N
must be given as a literal value (or computed by the shell, as in the sketch below).

The dd utility program is quite versatile!
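
As an illustration of approach 4, a rough sketch that lets the shell do the arithmetic (the block size, slice size and file names are assumptions; concatenating the resulting .gz members still decompresses back to the original data, since gzip accepts concatenated members):

  N=4096                          # blocks per slice; with ibs=1M this is 4 GiB per slice
  for j in 0 1 2 3; do
    dd if=file ibs=1M skip=$((j * N)) count=$N obs=1M 2>/dev/null | gzip > part$j.gz &
  done
  wait
  # cat part0.gz part1.gz part2.gz part3.gz | gzip -dc   reproduces the first 16 GiB of "file"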





Information forwarded to bug-gzip <at> gnu.org:
bug#23113; Package gzip. (Sun, 10 Apr 2016 07:50:02 GMT)

Message #17 received at 23113 <at> debbugs.gnu.org (full text, mbox):

From: Mark Adler <madler <at> alumni.caltech.edu>
To: "Chevreux, Bastien" <bastien.chevreux <at> dsm.com>
Cc: Jim Meyering <jim <at> meyering.net>,
 "23113 <at> debbugs.gnu.org" <23113 <at> debbugs.gnu.org>
Subject: Re: bug#23113: parallel gzip processes trash hard disks,
 need larger buffers
Date: Sun, 10 Apr 2016 00:49:17 -0700
Bastien,

pigz (a parallel version of gzip) has a variable buffer size. The -b or --blocksize option allows up to 512 MB buffers, defaulting to 128K. See http://zlib.net/pigz/
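
For example, something like the following would use much larger buffers than stock gzip (the sizes are illustrative; the -b argument is given in KiB):

  pigz -p 8 -b 131072 bigfile.dat    # 8 threads, 128 MiB compression blocks -> bigfile.dat.gz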

Mark


> On Mar 29, 2016, at 4:03 PM, Chevreux, Bastien <bastien.chevreux <at> dsm.com> wrote:
> [...]





Information forwarded to bug-gzip <at> gnu.org:
bug#23113; Package gzip. (Tue, 12 Apr 2016 02:31:02 GMT)

Message #20 received at 23113 <at> debbugs.gnu.org (full text, mbox):

From: Jim Meyering <jim <at> meyering.net>
To: Mark Adler <madler <at> alumni.caltech.edu>
Cc: "Chevreux, Bastien" <bastien.chevreux <at> dsm.com>,
 "23113 <at> debbugs.gnu.org" <23113 <at> debbugs.gnu.org>
Subject: Re: bug#23113: parallel gzip processes trash hard disks,
 need larger buffers
Date: Mon, 11 Apr 2016 19:30:04 -0700
On Sun, Apr 10, 2016 at 12:49 AM, Mark Adler <madler <at> alumni.caltech.edu> wrote:
> Bastien,
>
> pigz (a parallel version of gzip) has a variable buffer size. The -b or --blocksize option allows up to 512 MB buffers, defaulting to 128K. See http://zlib.net/pigz/

Thanks for the reminder about pigz, Mark.
This is yet another reason to consider gzip to be in maintenance-only mode,
i.e., the barrier to adding new features is even higher, given that
pigz is so compatible, yet with added features and the benefit of
a modern codebase.




Information forwarded to bug-gzip <at> gnu.org:
bug#23113; Package gzip. (Tue, 12 Apr 2016 04:56:01 GMT)

Message #23 received at 23113 <at> debbugs.gnu.org (full text, mbox):

From: Paul Eggert <eggert <at> cs.ucla.edu>
To: Jim Meyering <jim <at> meyering.net>, Mark Adler <madler <at> alumni.caltech.edu>
Cc: "Chevreux, Bastien" <bastien.chevreux <at> dsm.com>,
 "23113 <at> debbugs.gnu.org" <23113 <at> debbugs.gnu.org>
Subject: Re: bug#23113: parallel gzip processes trash hard disks, need larger
 buffers
Date: Mon, 11 Apr 2016 21:55:04 -0700
Jim Meyering wrote:
> Thanks for the reminder about pigz, Mark.
> This is yet another reason to consider gzip is in maintenance-only mode,
> i.e., the barrier to adding new features is even higher, given that
> pigz is so compatible, yet with added features and the benefit of
> a modern codebase.

It'd be nice if we could migrate GNU gzip into merely being a front-end for pigz 
somehow.




Information forwarded to bug-gzip <at> gnu.org:
bug#23113; Package gzip. (Tue, 12 Apr 2016 16:56:01 GMT)

Message #26 received at 23113 <at> debbugs.gnu.org (full text, mbox):

From: "Chevreux, Bastien" <bastien.chevreux <at> dsm.com>
To: Mark Adler <madler <at> alumni.caltech.edu>
Cc: Jim Meyering <jim <at> meyering.net>,
 "23113 <at> debbugs.gnu.org" <23113 <at> debbugs.gnu.org>
Subject: RE: bug#23113: parallel gzip processes trash hard disks, need larger
 buffers
Date: Tue, 12 Apr 2016 16:55:30 +0000
Mark,

I knew about pigz, though not about -b; thank you for that. Together with -p 1 that would replicate gzip and provide input buffering well enough to be used in parallel pipelines (where you do not want, e.g., 40 pipelines running 40 pigz processes with 40 threads each).
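
In such a pipeline that might look something like this (the producer command and the block size are placeholders):

  some_producer | pigz -p 1 -b 16384 > results.dat.gz   # one compression thread, 16 MiB blocks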

Questions: how stable / error-proof is pigz compared to gzip? I always shied away from it, as gzip is so tried and tested that errors are unlikely ... and the zlib.net homepage does not make an "official" statement like "you should all move to pigz now, it's good and tested enough." Additional question: is there a pigzlib planned? :-)

Jim, Paul: I'd say that this thread/bug can be closed if pigz proves to be as stable / error-free as gzip. I suppose that while backporting -b to gzip could be done, it would not make much sense.

Best,
  Bastien

-- 
DSM Nutritional Products Microbia Inc | Bioinformatics
60 Westview Street | Lexington, MA 02421 | United States
Phone +1 781 259 7613 | Fax +1 781 259 0615

-----Original Message-----
From: Mark Adler [mailto:madler <at> alumni.caltech.edu]
Sent: Sunday, 10 April 2016 03:49
[...]

Information forwarded to bug-gzip <at> gnu.org:
bug#23113; Package gzip. (Tue, 12 Apr 2016 17:19:01 GMT)

Message #29 received at 23113 <at> debbugs.gnu.org (full text, mbox):

From: Mark Adler <madler <at> alumni.caltech.edu>
To: "Chevreux, Bastien" <bastien.chevreux <at> dsm.com>
Cc: Jim Meyering <jim <at> meyering.net>,
 "23113 <at> debbugs.gnu.org" <23113 <at> debbugs.gnu.org>
Subject: Re: bug#23113: parallel gzip processes trash hard disks,
 need larger buffers
Date: Tue, 12 Apr 2016 10:18:08 -0700
Bastien,

On Apr 12, 2016, at 9:55 AM, Chevreux, Bastien <bastien.chevreux <at> dsm.com> wrote:
> Questions: how stable / error proof is pigz compared to gzip? I always shied away from it as gzip is so much tried and tested that errors are unlikely ... and the zlib.net homepage does not make an "official" statement like "you should all now move to pigz, it's good and tested enough."

Certainly with -p 1 it is nothing more than a wrapper around zlib, which itself is extensively tested. With -p > 1 it uses threads, and that mode has been tested successfully on many systems, though I do wonder how portable it really is. Unfortunately I have no way to know how widely deployed and used pigz is. (Nor do I know how widely deployed and used gzip is, but presumably pretty widely.)

> Additional question: is there a pigzlib planned? :-)

I have been toying with ideas about how to provide parallel support in zlib. At this point, I'm not sure what the interface should be.

Mark





Reply sent to Jim Meyering <jim <at> meyering.net>:
You have taken responsibility. (Tue, 12 Apr 2016 20:19:01 GMT)

Notification sent to "Chevreux, Bastien" <bastien.chevreux <at> dsm.com>:
bug acknowledged by developer. (Tue, 12 Apr 2016 20:19:02 GMT)

Message #34 received at 23113-done <at> debbugs.gnu.org (full text, mbox):

From: Jim Meyering <jim <at> meyering.net>
To: "Chevreux, Bastien" <bastien.chevreux <at> dsm.com>
Cc: Mark Adler <madler <at> alumni.caltech.edu>, 23113-done <at> debbugs.gnu.org
Subject: Re: bug#23113: parallel gzip processes trash hard disks,
 need larger buffers
Date: Tue, 12 Apr 2016 13:18:18 -0700
On Tue, Apr 12, 2016 at 9:55 AM, Chevreux, Bastien
<bastien.chevreux <at> dsm.com> wrote:
> Mark,
>
> I knew about pigz, albeit not about -b, thank you for that. Together with -p 1 that would replicate gzip and implement input buffering well enough to be used in parallel pipelines (where you do not want, e.g., 40 pipelines running 40 pigz with 40 threads each).
>
> Questions: how stable / error proof is pigz compared to gzip? I always shied away from it as gzip is so much tried and tested that errors are unlikely ... and the zlib.net homepage does not make an "official" statement like "you should all now move to pigz, it's good and tested enough." Additional question: is there a pigzlib planned? :-)

I expect pigz is stable enough to use with very high confidence.
Paul and I are notoriously picky about such things, and would not be
considering how to deprecate gzip in favor of pigz or to make gzip a
wrapper around pigz if we did not have that level of confidence.

One question for Mark: do you know if pigz has been subjected to AFL's
coverage-adaptive fuzzing? If not, it'd be great if someone could find
the time to do that. If someone does that, please also test an
ASAN-enabled binary and tell us how long the tests ran with no trace
of failure.
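
For whoever picks this up, a very rough sketch of such a run (the build flags, seed corpus and paths are assumptions, not a recipe from the pigz project):

  make CC=afl-clang-fast CFLAGS="-O2 -fsanitize=address"    # AFL- and ASan-instrumented build
  mkdir -p seeds && cp small-sample.gz seeds/
  afl-fuzz -i seeds -o findings -m none -- ./pigz -dc @@    # fuzz the decompression path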

For reference, here's what happened when AFL was first applied to
Linux file system driver code:
https://events.linuxfoundation.org/sites/events/files/slides/AFL%20filesystem%20fuzzing,%20Vault%202016.pdf.
If you read nothing else, look at slide 3, with its table of file
system type vs. the amount of time each driver withstood AFL-driven
abuse before first failure.

FYI, anyone can close one of these "issues," and I'm doing so simply
by replying to the usual DDDDD <at> debbugs.gnu.org address, but with an
inserted "-done" before the "@": 23113-done <at> debbugs.gnu.org




Information forwarded to bug-gzip <at> gnu.org:
bug#23113; Package gzip. (Tue, 12 Apr 2016 20:22:01 GMT)

Message #37 received at 23113-done <at> debbugs.gnu.org (full text, mbox):

From: Mark Adler <madler <at> alumni.caltech.edu>
To: Jim Meyering <jim <at> meyering.net>
Cc: "Chevreux, Bastien" <bastien.chevreux <at> dsm.com>, 23113-done <at> debbugs.gnu.org
Subject: Re: bug#23113: parallel gzip processes trash hard disks,
 need larger buffers
Date: Tue, 12 Apr 2016 13:21:42 -0700
Jim,

On Apr 12, 2016, at 1:18 PM, Jim Meyering <jim <at> meyering.net> wrote:
> One question for Mark: do you know if pigz has been subjected to AFL's
> coverage-adaptive fuzzing?

Not that I know of.

Mark





bug archived. Request was from Debbugs Internal Request <help-debbugs <at> gnu.org> to internal_control <at> debbugs.gnu.org. (Wed, 11 May 2016 11:24:04 GMT)



