GNU bug report logs - #21270
gzip huge filesize problem
To add a comment to this bug, you must first unarchive it, by sending
a message to control AT debbugs.gnu.org, with unarchive 21270 in the body.
You can then email your comments to 21270 AT debbugs.gnu.org in the normal way.
Report forwarded to bug-gzip <at> gnu.org: bug#21270; Package gzip. (Sat, 15 Aug 2015 21:57:01 GMT)
Acknowledgement sent to Alexander Kleinsorge <aleks <at> physik.tu-berlin.de>: New bug report received and forwarded. Copy sent to bug-gzip <at> gnu.org. (Sat, 15 Aug 2015 21:57:01 GMT)
Message #5 received at submit <at> debbugs.gnu.org:
Hi Gzip team,
I compressed a 500 GB file (a raw HDD image) using gzip 1.6 under Ubuntu
14.10 (64-bit). Uncompressing the archive gives back a 500 GB file
(checked).
But "gzip -l" shows bad (small) uncompressed_size and bad ratio
(-5167%).
Below you can see some details, but I think it is a general bug.
Thanks for help, Alexander
gzip -l asus.gz
         compressed        uncompressed  ratio uncompressed_name
        99630975185          1891655680 -5166.9% asus
gzip --version
gzip 1.6
Linux myname 3.16.0-43-generic #58-Ubuntu SMP Fri Jun 19 11:04:02 UTC
2015 x86_64 x86_64 x86_64 GNU/Linux
The two files (compressed: about 93 GiB; uncompressed: about 466 GiB, i.e. 500 GB):
-rwxrwx--- 1 root plugdev 99630975185 Aug 15 21:39 asus.gz
-rwxrwx--- 1 root plugdev 500107862016 Aug 14 09:00 sdc.raw
-rwxrwx--- 1 root plugdev 93G Aug 15 21:39 asus.gz
-rwxrwx--- 1 root plugdev 466G Aug 14 09:00 sdc.raw
Information forwarded to bug-gzip <at> gnu.org: bug#21270; Package gzip. (Sun, 16 Aug 2015 07:59:02 GMT)
Message #8 received at 21270 <at> debbugs.gnu.org:
Alexander,
Thank you for your report. This is a well-known limitation of the gzip format. The -l option uses the uncompressed length stored in the last four bytes of a gzip stream. Therein lies the rub: four bytes can represent no more than 2^32 - 1 (4 GiB - 1), so for larger inputs the field holds only the length modulo 2^32.
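As a worked check of the explanation above, the following Python sketch reproduces the numbers from the report. It assumes the trailer field stores the length modulo 2^32 (as RFC 1952 specifies for ISIZE) and that the ratio column is computed roughly as one minus compressed size over reported size; the exact formula gzip uses may also account for header bytes, so treat the ratio line as an approximation.

```python
# Values taken from the report above.
uncompressed = 500107862016   # actual size of sdc.raw in bytes
compressed = 99630975185      # size of asus.gz in bytes

# The 4-byte trailer field (ISIZE in RFC 1952) holds the length modulo 2**32.
isize = uncompressed % 2**32
print(isize)  # 1891655680, the value "gzip -l" printed

# Assumed ratio formula: share of the *reported* size saved by compression.
ratio = 100 * (1 - compressed / isize)
print(f"{ratio:.1f}%")  # -5166.9%
```

The wrapped length is smaller than the compressed file, which is why the ratio comes out strongly negative.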
There is another problem with that approach, in that a valid gzip file may consist of a series of concatenated gzip streams, in which case -l will report only on the last one. In that case, even if the entire stream decompresses to less than 4 GB, the result will still be incorrect.
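The concatenation issue can be illustrated with a small Python sketch. It uses the standard gzip module rather than the gzip program itself, and the payloads are arbitrary sample data chosen for illustration.

```python
import gzip
import struct

# Two independent gzip streams, concatenated into one valid gzip file.
first = gzip.compress(b"a" * 1000)
second = gzip.compress(b"b" * 10)
data = first + second

# What a trailer-based listing sees: the little-endian length in the last 4 bytes.
(trailer_size,) = struct.unpack("<I", data[-4:])

# What decompression actually produces: both streams, back to back.
actual_size = len(gzip.decompress(data))

print(trailer_size, actual_size)  # 10 1010
```

Only the length of the final stream survives in the last four bytes, even though the whole file is far below the 4 GiB limit.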
The only reliable way to determine the uncompressed size of a gzip file is to decompress the entire file (which can be done without storing the result). This in fact is what "pigz -lt file.gz" does. It will correctly report the uncompressed length, but takes much longer than "gzip -l".
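A minimal Python sketch of the same idea, decompressing while discarding the output, is shown below. It is not how pigz implements -lt; the function name uncompressed_size and the chunk size are assumptions made for this illustration.

```python
import gzip
import io

def uncompressed_size(fileobj, chunk_size=1 << 20):
    """Count decompressed bytes by reading and discarding fixed-size chunks."""
    total = 0
    with gzip.GzipFile(fileobj=fileobj, mode="rb") as stream:
        while chunk := stream.read(chunk_size):
            total += len(chunk)
    return total

# Sample input: two concatenated streams, built in memory for the demonstration.
sample = gzip.compress(b"x" * 3_000_000) + gzip.compress(b"y" * 500)
print(uncompressed_size(io.BytesIO(sample)))  # 3000500
```

For a file on disk, the same function can be given a handle opened in binary mode, for example open("file.gz", "rb"). Memory use stays bounded by the chunk size, but the run time is that of a full decompression.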
-l remains useful however in most cases, so it remains a gzip and pigz option.
Mark
> On Aug 15, 2015, at 2:42 PM, Alexander Kleinsorge <aleks <at> physik.tu-berlin.de> wrote:
>
> Hi Gzip team,
>
> I compressed a 500 GB file (a raw HDD image) using gzip 1.6 under Ubuntu 14.10 (64-bit). Uncompressing the archive gives back a 500 GB file (checked).
> But "gzip -l" shows bad (small) uncompressed_size and bad ratio (-5167%).
>
> Below you can see some details, but I think it is a general bug.
> Thanks for help, Alexander
>
>
> gzip -l asus.gz
>          compressed        uncompressed  ratio uncompressed_name
>         99630975185          1891655680 -5166.9% asus
>
> gzip --version
> gzip 1.6
>
> Linux myname 3.16.0-43-generic #58-Ubuntu SMP Fri Jun 19 11:04:02 UTC 2015 x86_64 x86_64 x86_64 GNU/Linux
>
> The two files (compressed: about 93 GiB; uncompressed: about 466 GiB, i.e. 500 GB):
>
> -rwxrwx--- 1 root plugdev 99630975185 Aug 15 21:39 asus.gz
> -rwxrwx--- 1 root plugdev 500107862016 Aug 14 09:00 sdc.raw
> -rwxrwx--- 1 root plugdev 93G Aug 15 21:39 asus.gz
> -rwxrwx--- 1 root plugdev 466G Aug 14 09:00 sdc.raw
Reply sent to Jim Meyering <jim <at> meyering.net>: You have taken responsibility. (Mon, 17 Aug 2015 03:46:02 GMT)
Notification sent to Alexander Kleinsorge <aleks <at> physik.tu-berlin.de>: bug acknowledged by developer. (Mon, 17 Aug 2015 03:46:03 GMT)
Message #13 received at 21270-done <at> debbugs.gnu.org:
tags 21270 notabug
thanks
On Sun, Aug 16, 2015 at 1:58 AM, Mark Adler <madler <at> alumni.caltech.edu> wrote:
> Alexander,
>
> Thank you for your report. This is a well-known limitation of the gzip format. The -l option uses the uncompressed length stored in the last four bytes of a gzip stream. Therein lies the rub: four bytes can represent no more than 2^32 - 1 (4 GiB - 1), so for larger inputs the field holds only the length modulo 2^32.
>
> There is another problem with that approach, in that a valid gzip file may consist of a series of concatenated gzip streams, in which case -l will report only on the last one. In that case, even if the entire stream decompresses to less than 4 GB, the result will still be incorrect.
>
> The only reliable way to determine the uncompressed size of a gzip file is to decompress the entire file (which can be done without storing the result). This in fact is what "pigz -lt file.gz" does. It will correctly report the uncompressed length, but takes much longer than "gzip -l".
>
> -l remains useful however in most cases, so it remains a gzip and pigz option.
Thank you for replying, Mark.
I've marked this as "notabug" with the in-line comment above, and am
closing the auto-created issue with the "-done" part of the debbugs
email recipient address.
bug archived. Request was from Debbugs Internal Request <help-debbugs <at> gnu.org> to internal_control <at> debbugs.gnu.org. (Mon, 14 Sep 2015 11:24:04 GMT)
This bug report was last modified 9 years and 340 days ago.
GNU bug tracking system
Copyright (C) 1999 Darren O. Benham,
1997,2003 nCipher Corporation Ltd,
1994-97 Ian Jackson.