GNU bug report logs - #29089
Truncated size of big file


Package: gzip;

Reported by: Alex Peshkoff <peshkoff <at> mail.ru>

Date: Tue, 31 Oct 2017 18:05:01 UTC

Severity: normal

Merged with 17804, 30935, 30936, 38766, 42965, 48424, 52227

Done: Paul Eggert <eggert <at> cs.ucla.edu>

Bug is archived. No further changes may be made.



Message #8 received at 29089 <at> debbugs.gnu.org:

From: Mark Adler <madler <at> alumni.caltech.edu>
To: Alex Peshkoff <peshkoff <at> mail.ru>
Cc: 29089 <at> debbugs.gnu.org
Subject: Re: bug#29089: Truncated size of big file
Date: Tue, 31 Oct 2017 11:20:29 -0700
Alex,

This is inherent in the gzip format, and is not really a bug in gzip. (Though gzip could notice the problem and not display a large negative compression ratio.)

The gzip format stores the uncompressed length at the end using four bytes, which can only represent up to 2^32-1. So what you are seeing is the low 32 bits of 18962535424, which is in fact 1782666240. When gzip uses that truncated value to compute a compression ratio, it gets a nonsensical result.
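To make the arithmetic concrete, here is a minimal Python sketch. It assumes a single-member .gz file that ends exactly with its 8-byte trailer, where the last 4 bytes are the little-endian ISIZE field. `read_isize` and `gzip_l_ratio` are hypothetical helper names. The ratio formula only approximates the `gzip -l` column because it ignores gzip's small header-byte adjustment, but it reproduces the figures in the report:

```python
import os
import struct

def read_isize(path):
    """Return the gzip ISIZE trailer field of the last member in `path`.

    ISIZE is the final 4 bytes of the file, little-endian, and holds the
    uncompressed length modulo 2**32 (assumes no trailing garbage).
    """
    with open(path, "rb") as f:
        f.seek(-4, os.SEEK_END)
        (isize,) = struct.unpack("<I", f.read(4))
    return isize

def gzip_l_ratio(compressed, uncompressed):
    """Hypothetical helper: percent of space saved, approximating the ratio
    column of `gzip -l` (gzip's header-byte adjustment is ignored)."""
    return (uncompressed - compressed) / uncompressed * 100

compressed = 3645968323        # compressed size from the report
true_size = 18962535424        # size after gunzip, from the report
isize = true_size % 2**32      # all a 4-byte field can hold

print(isize)                                          # 1782666240
print(f"{gzip_l_ratio(compressed, isize):.1f}%")      # -104.5%
print(f"{gzip_l_ratio(compressed, true_size):.1f}%")  # 80.8%
```

With the wrapped value, the "uncompressed" size is smaller than the compressed size, which is why the ratio goes negative. With the true size, the file actually compressed by roughly 80.8%.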

Unfortunately the only way to get the real uncompressed length and compute a real ratio is to decompress the entire file. (In fact, pigz will do this with "pigz -lt", which tests the entire file without storing the result, and reports the correct uncompressed size and compression ratio. "pigz -l" will do the same bad thing that "gzip -l" does on > 4 GB uncompressed sizes, though it will report “unk” for questionable ratios, i.e. expansions of the data beyond what would be expected for incompressible data.)
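A rough Python equivalent of both ideas, offered as a sketch rather than as what pigz actually does. `true_uncompressed_size` streams the whole file through Python's `gzip` module and discards the output, much as "pigz -lt" tests the file without storing it. `isize_looks_wrapped` is a hypothetical plausibility check in the spirit of pigz's "unk". It assumes that incompressible data is emitted as stored deflate blocks costing 5 bytes per block of at most 65535 bytes. Note that it can only catch wraps that make the data appear to expand. A wrapped ISIZE that still exceeds the compressed size goes undetected.

```python
import gzip

def true_uncompressed_size(path, chunk_size=1 << 20):
    """Decompress `path` chunk by chunk, discarding the output, and return
    the exact uncompressed byte count (all gzip members, no 2**32 wrap)."""
    total = 0
    with gzip.open(path, "rb") as f:
        while chunk := f.read(chunk_size):
            total += len(chunk)
    return total

def isize_looks_wrapped(compressed_size, isize, slack=1024):
    """Hypothetical plausibility check: flag an ISIZE that implies the data
    expanded more than stored (uncompressed) deflate blocks would allow.

    Assumes 5 bytes of overhead per stored block of at most 65535 bytes,
    plus an 18-byte minimal gzip header and trailer; `slack` is an assumed
    allowance for optional header fields such as the file name.
    """
    blocks = max(1, -(-isize // 65535))  # ceiling division
    max_expected = isize + 5 * blocks + 18 + slack
    return compressed_size > max_expected

# Figures from the report: a 3.6 GB archive cannot hold only 1.7 GB of data.
print(isize_looks_wrapped(3645968323, 1782666240))  # True
```

The first function costs a full decompression pass, which is unavoidable. The second is cheap but only a heuristic, so it can justify printing "unk" instead of a misleading ratio, not recovering the real size.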

Mark


> On Oct 31, 2017, at 10:59 AM, Alex Peshkoff <peshkoff <at> mail.ru> wrote:
> 
> Before decompressing a copy of a database, I decided to take a look at its size:
> 
> localhost stg # gunzip -l SWHTOROLT_20171019.GBK.gz
>          compressed        uncompressed  ratio uncompressed_name
>          3645968323          1782666240 -104.5% SWHTOROLT_20171019.GBK
> 
> The uncompressed size is reported as 1.7 GB, which is clearly wrong, as is the -104.5% compression ratio.
> 
> Actual size after unzip is:
> 
> localhost stg # gunzip SWHTOROLT_20171019.GBK.gz
> localhost stg # ls -l SWHTOROLT_20171019.GBK
> -rw-r--r-- 1 root root 18962535424 Oct 19 15:59 SWHTOROLT_20171019.GBK
> 
> Luckily I had enough disk space. I won't attach the problematic archive to this email; I suppose it's easier to reproduce this locally ;)
> 
> Alex.

This bug report was last modified 3 years and 154 days ago.



GNU bug tracking system
Copyright (C) 1999 Darren O. Benham, 1997,2003 nCipher Corporation Ltd, 1994-97 Ian Jackson.