GNU bug report logs - #30719
Progressively compressing piped input

Package: gzip

Reported by: "Garreau, Alexandre" <galex-713 <at> galex-713.eu>

Date: Mon, 5 Mar 2018 21:20:02 UTC

Severity: wishlist


Message #14 received at 30719 <at> debbugs.gnu.org:

From: Mark Adler <madler <at> alumni.caltech.edu>
To: "Garreau, Alexandre" <galex-713 <at> galex-713.eu>
Cc: 30719 <at> debbugs.gnu.org
Subject: Re: bug#30719: Progressively compressing piped input
Date: Tue, 6 Mar 2018 18:11:51 -0800
> On Mar 6, 2018, at 1:58 PM, Garreau, Alexandre <galex-713 <at> galex-713.eu> wrote:
> 
> On 05/03/2018 at 14:54, Mark Adler wrote:
>> deflate has an inherent latency that accumulates enough data in order
>> to efficiently emit each deflate block. You can deliberately flush
>> (with zlib, not gzip), but if you do that too frequently, e.g. each
>> line, then you will get lousy compression or even expansion.
> 
> Even if the main repetition is between the lines? For example, if 80% of
> one half of each line and 70% of the other half are the same, as in a
> while loop running only ping and date? I thought of it as a very lazy way
> to avoid having to remove all the redundant output caused by the use of
> ASCII and by the repetition of words or similar patterns occurring over
> and over.


Alexandre,

It has nothing to do with how much, how little, or how often there is repetition. It has to do with the overhead of the header of a dynamic block, which is required to describe the Huffman codes used in that block. You need several thousand symbols in a block in order to pay for the bits required for the header.

Mark
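
The cost of flushing on every line can be measured with a rough sketch like the one below. It is not part of the original message. It assumes Python's standard zlib module, and the ping-like sample lines (including the address 192.0.2.1) and the helper name compressed_size are made up for illustration. The same highly repetitive input is compressed twice into a gzip stream: once letting deflate choose its own block boundaries, and once forcing a Z_SYNC_FLUSH after every line.

```python
import zlib


def compressed_size(lines, flush_each_line):
    """Return the gzip-stream size of `lines`, optionally flushing per line.

    Hypothetical helper for illustration only.
    """
    # wbits=31 selects a gzip header and trailer around the deflate data.
    comp = zlib.compressobj(9, zlib.DEFLATED, 31)
    total = 0
    for line in lines:
        total += len(comp.compress(line))
        if flush_each_line:
            # Force out everything buffered so far, ending the current block.
            total += len(comp.flush(zlib.Z_SYNC_FLUSH))
    # Z_FINISH: emit whatever remains plus the gzip trailer.
    total += len(comp.flush())
    return total


# Made-up, highly repetitive input resembling ping output.
lines = [
    f"64 bytes from 192.0.2.1: icmp_seq={i} ttl=57 time={20 + i % 7}.{i % 10} ms\n".encode()
    for i in range(2000)
]

raw = sum(len(line) for line in lines)
batched = compressed_size(lines, flush_each_line=False)
flushed = compressed_size(lines, flush_each_line=True)

print("raw bytes:          ", raw)
print("no explicit flush:  ", batched)
print("flush on every line:", flushed)
```

Each sync flush ends the current deflate block and appends an empty stored block, so every line pays for at least one block header plus the flush marker, no matter how well its contents match earlier lines. Flushing less often, for example every few kilobytes or on a timer, trades latency for compression ratio.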

This bug report was last modified 3 years and 78 days ago.


GNU bug tracking system
Copyright (C) 1999 Darren O. Benham, 1997,2003 nCipher Corporation Ltd, 1994-97 Ian Jackson.