From: Alexander Kleinsorge
To: bug-gzip@gnu.org
Date: Sat, 15 Aug 2015 23:42:20 +0200
Subject: bug#21270: gzip huge filesize problem

Hi Gzip team,

I compressed a 500 GB file (a raw HDD image) using gzip 1.6 under Ubuntu 14.10 (64-bit). Decompressing the file gives a 500 GB file (checked). But "gzip -l" shows a wrong (too small) uncompressed size and a wrong ratio (-5167%).

Below you can see some details, but I think it is a general bug.
Thanks for help,
Alexander


gzip -l asus.gz
  compressed  uncompressed    ratio uncompressed_name
 99630975185    1891655680 -5166.9% asus

gzip --version
gzip 1.6

Linux myname 3.16.0-43-generic #58-Ubuntu SMP Fri Jun 19 11:04:02 UTC 2015 x86_64 x86_64 x86_64 GNU/Linux

The 2 files (compressed 93 GB + uncompressed 500 GB):

-rwxrwx--- 1 root plugdev  99630975185 Aug 15 21:39 asus.gz
-rwxrwx--- 1 root plugdev 500107862016 Aug 14 09:00 sdc.raw
-rwxrwx--- 1 root plugdev          93G Aug 15 21:39 asus.gz
-rwxrwx--- 1 root plugdev         466G Aug 14 09:00 sdc.raw


From: Mark Adler
To: Alexander Kleinsorge
Cc: 21270@debbugs.gnu.org
Date: Sun, 16 Aug 2015 00:58:19 -0700

Alexander,

Thank you for your report. This is a well-known limitation of the gzip format. The -l function makes use of the uncompressed length stored in the last four bytes of a gzip stream. Therein lies the rub, since four bytes can represent no more than 4 GB - 1 (2^32 - 1 bytes).

There is another problem with that approach, in that a valid gzip file may consist of a series of concatenated gzip streams, in which case -l will report only on the last one. In that case, even if the entire stream decompresses to less than 4 GB, the result will still be incorrect.

The only reliable way to determine the uncompressed size of a gzip file is to decompress the entire file (which can be done without storing the result).
This is in fact what "pigz -lt file.gz" does. It will correctly report the uncompressed length, but takes much longer than "gzip -l".

-l remains useful, however, in most cases, so it remains a gzip and pigz option.

Mark


From: help-debbugs@gnu.org (GNU bug Tracking System)
To: Alexander Kleinsorge
Date: Mon, 17 Aug 2015 03:46:03 +0000
Subject: bug#21270: closed (Re: bug#21270: gzip huge filesize problem)

Your bug report

#21270: gzip huge filesize problem

which was filed against the gzip package, has been closed.

The explanation is attached below, along with your original report.
If you require more details, please reply to 21270@debbugs.gnu.org.

-- 
21270: http://debbugs.gnu.org/cgi/bugreport.cgi?bug=21270
GNU Bug Tracking System
Contact help-debbugs@gnu.org with problems


From: Jim Meyering
To: Mark Adler
Cc: Alexander Kleinsorge, 21270-done@debbugs.gnu.org
Date: Sun, 16 Aug 2015 21:44:54 -0600
Subject: Re: bug#21270: gzip huge filesize problem

tags 21270 notabug
thanks

Thank you for replying, Mark.
I've marked this as "notabug" with the in-line comment above, and am closing the auto-created issue with the "-done" part of the debbugs email recipient address.
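The numbers in the report are consistent with Mark's explanation. Per RFC 1952, each gzip member ends with an 8-byte trailer, CRC32 followed by ISIZE, where ISIZE is the uncompressed length modulo 2^32, stored little-endian. The Python sketch below (the helper name trailer_isize is hypothetical) reduces the true size of sdc.raw modulo 2^32, approximates the displayed ratio under the assumption that it is (uncompressed - compressed) / uncompressed with header and trailer bytes ignored, and shows that with two concatenated members the final four bytes describe only the last member. gzip.compress writes one member per call.

```python
import gzip
import struct


def trailer_isize(data: bytes) -> int:
    """Return the ISIZE field of the *last* gzip member in data.

    Hypothetical helper. Per RFC 1952 each member ends with CRC32 and ISIZE,
    4 bytes each, little-endian; ISIZE is the input length modulo 2**32.
    """
    if len(data) < 18:  # 10-byte header + 8-byte trailer at minimum
        raise ValueError("too short to be a gzip stream")
    (isize,) = struct.unpack("<I", data[-4:])
    return isize


# Sizes taken from the "ls -l" output in the report.
true_size = 500107862016   # sdc.raw
compressed = 99630975185   # asus.gz

# A 4-byte field can only keep the low 32 bits of the true size.
reported = true_size % 2**32
print(reported)  # 1891655680, the value gzip -l printed

# Assumed ratio formula, ignoring the few header/trailer bytes.
ratio = (reported - compressed) / reported
print(f"{ratio * 100:.1f}%")  # -5166.9%

# With concatenated members, the final trailer describes only the last one.
two_members = gzip.compress(b"a" * 1000) + gzip.compress(b"b" * 10)
print(trailer_isize(two_members))         # 10
print(len(gzip.decompress(two_members)))  # 1010
```

The wrapped value is why the ratio goes so far negative: the compressed file (about 93 GB) is compared against what looks like a 1.9 GB input.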
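As Mark notes, only decompressing the whole file gives a reliable size. A minimal sketch of that idea in Python, assuming Python 3.8+ and a hypothetical helper uncompressed_size(): it streams the file through gzip.open in fixed-size chunks and only counts the bytes, so nothing is written to disk and memory use stays roughly constant. Python's gzip reader continues across concatenated members, so the count covers the whole file rather than just the last member. This illustrates the approach; it is not how pigz -lt is implemented.

```python
import gzip
import os
import tempfile


def uncompressed_size(path: str, chunk_size: int = 1 << 20) -> int:
    """Return the total decompressed size of a (possibly multi-member) .gz file.

    Hypothetical helper: data is read in chunks and discarded, so the
    decompressed output is never stored.
    """
    total = 0
    with gzip.open(path, "rb") as f:
        # gzip.open continues past member boundaries, so concatenated
        # streams are all counted, not just the last one.
        while chunk := f.read(chunk_size):
            total += len(chunk)
    return total


if __name__ == "__main__":
    # Build a small two-member file to compare against the trailer value.
    with tempfile.TemporaryDirectory() as d:
        path = os.path.join(d, "demo.gz")
        with open(path, "wb") as out:
            out.write(gzip.compress(b"x" * 5000))
            out.write(gzip.compress(b"y" * 7))
        print(uncompressed_size(path))  # 5007 (the last trailer alone says 7)
```

The price is a full decompression pass over the data, which on a 93 GB archive is exactly why this is much slower than "gzip -l".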