GNU bug report logs - #21270
gzip huge filesize problem

Package: gzip

Reported by: Alexander Kleinsorge <aleks <at> physik.tu-berlin.de>

Date: Sat, 15 Aug 2015 21:57:01 UTC

Severity: normal

Done: Jim Meyering <jim <at> meyering.net>

Bug is archived. No further changes may be made.

To add a comment to this bug, you must first unarchive it, by sending
a message to control AT debbugs.gnu.org, with unarchive 21270 in the body.
You can then email your comments to 21270 AT debbugs.gnu.org in the normal way.


Report forwarded to bug-gzip <at> gnu.org:
bug#21270; Package gzip. (Sat, 15 Aug 2015 21:57:01 GMT)

Acknowledgement sent to Alexander Kleinsorge <aleks <at> physik.tu-berlin.de>:
New bug report received and forwarded. Copy sent to bug-gzip <at> gnu.org. (Sat, 15 Aug 2015 21:57:01 GMT)

Message #5 received at submit <at> debbugs.gnu.org:

From: Alexander Kleinsorge <aleks <at> physik.tu-berlin.de>
To: bug-gzip <at> gnu.org
Subject: gzip huge filesize problem
Date: Sat, 15 Aug 2015 23:42:20 +0200
Hi Gzip team,

I compressed a 500 GB file (raw HDD image) using gzip 1.6 under Ubuntu
14.10 (64-bit). Uncompressing the file gives a 500 GB file
(checked).
But "gzip -l" shows a wrong (too small) uncompressed size and a wrong ratio
(-5167%).

Below you can see some details, but I think it is a general bug.
Thanks for help, Alexander


gzip -l asus.gz
         compressed        uncompressed  ratio uncompressed_name
        99630975185          1891655680 -5166.9% asus

gzip --version
gzip 1.6

Linux myname 3.16.0-43-generic #58-Ubuntu SMP Fri Jun 19 11:04:02 UTC 
2015 x86_64 x86_64 x86_64 GNU/Linux

The 2 files (compressed 93 GB + uncompressed 500 GB):

-rwxrwx--- 1 root plugdev 99630975185 Aug 15 21:39 asus.gz
-rwxrwx--- 1 root plugdev 500107862016 Aug 14 09:00 sdc.raw
-rwxrwx--- 1 root plugdev 93G Aug 15 21:39 asus.gz
-rwxrwx--- 1 root plugdev 466G Aug 14 09:00 sdc.raw





Information forwarded to bug-gzip <at> gnu.org:
bug#21270; Package gzip. (Sun, 16 Aug 2015 07:59:02 GMT)

Message #8 received at 21270 <at> debbugs.gnu.org:

From: Mark Adler <madler <at> alumni.caltech.edu>
To: Alexander Kleinsorge <aleks <at> physik.tu-berlin.de>
Cc: 21270 <at> debbugs.gnu.org
Subject: Re: bug#21270: gzip huge filesize problem
Date: Sun, 16 Aug 2015 00:58:19 -0700
Alexander,

Thank you for your report.  This is a well-known limitation of the gzip format.  The -l function makes use of the uncompressed length stored in the last four bytes of a gzip stream.  Therein lies the rub, since four bytes can represent no more than 4 GB - 1.
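
To make the four-byte limit concrete, here is a minimal Python sketch (an illustration, not code from gzip itself). It assumes the RFC 1952 trailer layout, in which the final four bytes hold ISIZE, the uncompressed length modulo 2^32, in little-endian order. The helper name trailer_isize is hypothetical; the two file sizes are the ones quoted in the report.

```python
import gzip
import struct


def trailer_isize(gz_bytes: bytes) -> int:
    """Return ISIZE, the little-endian 32-bit value in the last 4 bytes."""
    (isize,) = struct.unpack("<I", gz_bytes[-4:])
    return isize


# For a small single-member stream, ISIZE equals the real length.
payload = b"x" * 100_000
print(trailer_isize(gzip.compress(payload)))  # 100000

# ISIZE only keeps the length modulo 2**32, so the 500107862016-byte
# image from the report wraps around to a much smaller number.
reported = 500107862016 % 2**32
print(reported)  # 1891655680, the "uncompressed" figure gzip -l printed

# Approximate ratio as 1 - compressed/uncompressed (header bytes ignored).
ratio = 1 - 99630975185 / reported
print(f"{ratio:.1%}")  # -5166.9%
```

The wrapped value and the negative ratio match the "gzip -l" output in the report. The ratio formula here is a simplification, so its agreement with gzip's own calculation should be read as approximate.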

There is another problem with that approach, in that a valid gzip file may consist of a series of concatenated gzip streams, in which case -l will report only on the last one.  In that case, even if the entire stream decompresses to less than 4 GB, the result will still be incorrect.
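
The concatenation case can be reproduced with a short Python sketch, assuming only the standard gzip and struct modules. The variable names are illustrative, and the trailer is read by hand rather than through any gzip listing API.

```python
import gzip
import struct

# Two independent gzip streams (members), concatenated byte for byte.
first = gzip.compress(b"a" * 1000)
second = gzip.compress(b"b" * 10)
combined = first + second  # still a valid gzip file

# The last four bytes of the file belong to the last member only.
(isize,) = struct.unpack("<I", combined[-4:])
print(isize)  # 10

# Decompressing the whole file yields the data of both members.
print(len(gzip.decompress(combined)))  # 1010
```

Here the trailer says 10 bytes while the file actually expands to 1010, even though the total is far below 4 GB.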

The only reliable way to determine the uncompressed size of a gzip file is to decompress the entire file (which can be done without storing the result).  This in fact is what "pigz -lt file.gz" does.  It will correctly report the uncompressed length, but takes much longer than "gzip -l".
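
A minimal sketch of the decompress-and-count approach in Python follows. It is not how pigz is implemented; it only illustrates the idea of decompressing without storing the result. The function name uncompressed_size and the 1 MiB chunk size are assumptions chosen for the example.

```python
import gzip
import io


def uncompressed_size(path_or_file, chunk_size: int = 1 << 20) -> int:
    """Decompress a gzip file chunk by chunk and count the bytes.

    Nothing is written to disk and only one chunk is held in memory,
    so this works for outputs larger than 4 GB and for files made of
    several concatenated members.
    """
    total = 0
    with gzip.open(path_or_file, "rb") as f:
        while chunk := f.read(chunk_size):
            total += len(chunk)
    return total


# Demonstration on an in-memory file made of two concatenated members.
demo = io.BytesIO(gzip.compress(b"a" * 1000) + gzip.compress(b"b" * 10))
print(uncompressed_size(demo))  # 1010
```

For a file on disk, pass its name instead, for example uncompressed_size("asus.gz"). This is the same idea as running "gzip -dc file.gz | wc -c", and like that pipeline it has to read and decompress the entire file.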

-l remains useful however in most cases, so it remains a gzip and pigz option.

Mark


> On Aug 15, 2015, at 2:42 PM, Alexander Kleinsorge <aleks <at> physik.tu-berlin.de> wrote:
> 
> Hi Gzip team,
> 
> I compressed a 500 GB file (raw HDD image) using gzip 1.6 under Ubuntu 14.10 (64-bit). Uncompressing the file gives a 500 GB file (checked).
> But "gzip -l" shows a wrong (too small) uncompressed size and a wrong ratio (-5167%).
> 
> Below you can see some details, but I think it is a general bug.
> Thanks for help, Alexander
> 
> 
> gzip -l asus.gz
>          compressed        uncompressed  ratio uncompressed_name
>         99630975185          1891655680 -5166.9% asus
> 
> gzip --version
> gzip 1.6
> 
> Linux myname 3.16.0-43-generic #58-Ubuntu SMP Fri Jun 19 11:04:02 UTC 2015 x86_64 x86_64 x86_64 GNU/Linux
> 
> the 2 files (compressed 93gb + uncompressed 500gb)
> 
> -rwxrwx--- 1 root plugdev 99630975185 Aug 15 21:39 asus.gz
> -rwxrwx--- 1 root plugdev 500107862016 Aug 14 09:00 sdc.raw
> -rwxrwx--- 1 root plugdev 93G Aug 15 21:39 asus.gz
> -rwxrwx--- 1 root plugdev 466G Aug 14 09:00 sdc.raw
> 
> 
> 
> 





Reply sent to Jim Meyering <jim <at> meyering.net>:
You have taken responsibility. (Mon, 17 Aug 2015 03:46:02 GMT)

Notification sent to Alexander Kleinsorge <aleks <at> physik.tu-berlin.de>:
bug acknowledged by developer. (Mon, 17 Aug 2015 03:46:03 GMT)

Message #13 received at 21270-done <at> debbugs.gnu.org:

From: Jim Meyering <jim <at> meyering.net>
To: Mark Adler <madler <at> alumni.caltech.edu>
Cc: Alexander Kleinsorge <aleks <at> physik.tu-berlin.de>,
 21270-done <at> debbugs.gnu.org
Subject: Re: bug#21270: gzip huge filesize problem
Date: Sun, 16 Aug 2015 21:44:54 -0600
tags 21270 notabug
thanks

On Sun, Aug 16, 2015 at 1:58 AM, Mark Adler <madler <at> alumni.caltech.edu> wrote:
> Alexander,
>
> Thank you for your report.  This is a well-known limitation of the gzip format.  The -l function makes use of the uncompressed length stored in the last four bytes of a gzip stream.  Therein lies the rub, since four bytes can represent no more than 4 GB - 1.
>
> There is another problem with that approach, in that a valid gzip file may consist of a series of concatenated gzip streams, in which case -l will report only on the last one.  In that case, even if the entire stream decompresses to less than 4 GB, the result will still be incorrect.
>
> The only reliable way to determine the uncompressed size of a gzip file is to decompress the entire file (which can be done without storing the result).  This in fact is what "pigz -lt file.gz" does.  It will correctly report the uncompressed length, but takes much longer than "gzip -l".
>
> -l remains useful however in most cases, so it remains a gzip and pigz option.

Thank you for replying, Mark.
I've marked this as "notabug" with the in-line comment above, and am
closing the auto-created issue with the "-done" part of the debbugs
email recipient address.




Bug archived. Request was from Debbugs Internal Request <help-debbugs <at> gnu.org> to internal_control <at> debbugs.gnu.org. (Mon, 14 Sep 2015 11:24:04 GMT)

This bug report was last modified 9 years and 340 days ago.


GNU bug tracking system
Copyright (C) 1999 Darren O. Benham, 1997,2003 nCipher Corporation Ltd, 1994-97 Ian Jackson.