GNU bug report logs - #30719
Progressively compressing piped input

Package: gzip;

Reported by: "Garreau\, Alexandre" <galex-713 <at> galex-713.eu>

Date: Mon, 5 Mar 2018 21:20:02 UTC

Severity: wishlist

From: "Garreau\, Alexandre" <galex-713 <at> galex-713.eu>
To: Mark Adler <madler <at> alumni.caltech.edu>
Cc: 30719 <at> debbugs.gnu.org
Subject: bug#30719: Progressively compressing piped input
Date: Tue, 06 Mar 2018 22:58:56 +0100
On 05/03/2018 at 14:54, Mark Adler wrote:
> deflate has an inherent latency that accumulates enough data in order
> to efficiently emit each deflate block. You can deliberately flush
> (with zlib, not gzip), but if you do that too frequently, e.g. each
> line, then you will get lousy compression or even expansion.

Even if the main repetition is between the lines? For example, if 80% of
half of the lines, and 70% of the other half of the lines, are the same?
As in a while loop with only ping and date? I thought of it as a very lazy
way of not having to remove all the redundant output caused by the use of
ASCII, with the same words or similar patterns occurring over and
over.
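
As a rough way to see the overhead Mark describes using only command-line
tools, the sketch below compares compressing each line on its own against
compressing the whole input at once. Note the assumption: each line becomes
a separate gzip member, which is a harsher stand-in for a zlib flush, since
every member carries its own header and trailer and cannot refer back to
earlier lines. The function names per_line_size and whole_size are made up
for this illustration.

```shell
#!/bin/bash
# Hypothetical helper: total compressed size, in bytes, when every input
# line is compressed as its own gzip member.
per_line_size() {
    local total=0 line n
    while IFS= read -r line || [ -n "$line" ]; do
        n=$(printf '%s\n' "$line" | gzip -9 | wc -c)
        total=$((total + n))
    done
    echo "$total"
}

# Hypothetical helper: compressed size, in bytes, when the whole input is
# compressed as a single gzip stream.
whole_size() {
    gzip -9 | wc -c | tr -d ' '
}

# Example use on a file of repetitive lines (sample.txt is a placeholder):
#   per_line_size < sample.txt
#   whole_size < sample.txt
```

On short, near-identical lines, the first number should come out larger than
the raw input, while the second should be far smaller.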

> I wrote something called gzlog
> (https://github.com/madler/zlib/blob/master/examples/gzlog.h),
> intended to solve this problem. It can take a small amount of input,
> e.g. a line, and update the output gzip file to be complete and valid
> after each line, yet also get good compression in the long run. It
> does this by writing the lines to the log.gz file effectively
> uncompressed (deflate has a “stored” block type), until it has
> accumulated, say, 1 MB of data. Then it goes back and compresses that
> uncompressed 1 MB, again always leaving the gzip file in a valid
> state. gzlog also maintains something like a journal, which allows
> gzlog to repair the gzip file if the last operation was interrupted,
> e.g. by a power failure.

I was rather looking for a tool that could be used as a utility (since
this is for a quick-and-dirty, high-level, low-frequency, medium-term task)
rather than something in C, yet that is quite interesting, at least in
demonstrating the flexibility of gzip…

>> #!/bin/bash
>> while ping -c1 gnu.org ; do
>>    date --rfc-3339=seconds
>>    sleep 30
>> done | gzip -9 -f | tee sample.log | zcat

Maybe the only way to go is simply to gzip everything each time a log is
rotated, the standard way, if the pipe approach cannot work even
with each line being almost the same…
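
One middle ground that stays within shell tools relies on the fact that
concatenated gzip members form a valid gzip file: instead of compressing
per line or only at rotation, lines can be buffered and appended as one
compressed batch every N lines. The following is a minimal sketch under
that assumption, not a replacement for gzlog; compress_in_batches and the
batch size of 100 are a name and a value chosen for illustration.

```shell
#!/bin/bash
# Hypothetical helper: read lines from stdin and append them to the gzip
# file "$1" in batches of "$2" lines (default 100). Each batch is written
# as a separate gzip member, so the file stays decompressible after every
# append.
compress_in_batches() {
    local archive=$1 batch_size=${2:-100} batch='' count=0 line
    while IFS= read -r line || [ -n "$line" ]; do
        batch+=$line$'\n'
        count=$((count + 1))
        if [ "$count" -ge "$batch_size" ]; then
            printf '%s' "$batch" | gzip -9 >> "$archive"
            batch=''
            count=0
        fi
    done
    # Write out whatever is left once the input ends.
    if [ -n "$batch" ]; then
        printf '%s' "$batch" | gzip -9 >> "$archive"
    fi
}

# Example use with the loop quoted above (sample.log.gz is a placeholder):
#   while ping -c1 gnu.org ; do
#       date --rfc-3339=seconds
#       sleep 30
#   done | compress_in_batches sample.log.gz 100
#   zcat sample.log.gz    # reads all members written so far
```

Larger batches should compress better, at the price of more lines sitting
in memory; unlike gzlog, lines still in the buffer are lost if the script
is killed before the next batch is written.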



