From unknown Mon Aug 18 21:20:35 2025
From: bug#21270 <21270@debbugs.gnu.org>
To: bug#21270 <21270@debbugs.gnu.org>
Subject: Status: gzip huge filesize problem
Reply-To: bug#21270 <21270@debbugs.gnu.org>
Date: Tue, 19 Aug 2025 04:20:35 +0000

retitle 21270 gzip huge filesize problem
reassign 21270 gzip
submitter 21270 Alexander Kleinsorge
severity 21270 normal
thanks

From debbugs-submit-bounces@debbugs.gnu.org Sat Aug 15 17:56:34 2015
Date: Sat, 15 Aug 2015 23:42:20 +0200
From: Alexander Kleinsorge
To: bug-gzip@gnu.org
Subject: gzip huge filesize problem
Message-ID: <720ec2e73b65124c55fe1da9ddd4e503@physik.tu-berlin.de>

Hi Gzip team,

I compressed a 500 GB file (raw hdd image) using gzip 1.6 under Ubuntu 14.10 (64 bit). uncompressing the file gives a file with 500 gb (checked).
But "gzip -l" shows bad (small) uncompressed_size and bad ratio (-5167%).

Below you can see some details, but I think it is a general bug.
Thanks for help, Alexander


gzip -l asus.gz
         compressed        uncompressed  ratio uncompressed_name
        99630975185          1891655680 -5166.9% asus

gzip --version
gzip 1.6

Linux myname 3.16.0-43-generic #58-Ubuntu SMP Fri Jun 19 11:04:02 UTC 2015 x86_64 x86_64 x86_64 GNU/Linux

the 2 files (compressed 93gb + uncompressed 500gb)

-rwxrwx--- 1 root plugdev  99630975185 Aug 15 21:39 asus.gz
-rwxrwx--- 1 root plugdev 500107862016 Aug 14 09:00 sdc.raw
-rwxrwx--- 1 root plugdev  93G Aug 15 21:39 asus.gz
-rwxrwx--- 1 root plugdev 466G Aug 14 09:00 sdc.raw

From debbugs-submit-bounces@debbugs.gnu.org Sun Aug 16 03:58:27 2015
Subject: Re: bug#21270: gzip huge filesize problem
From: Mark Adler
In-Reply-To: <720ec2e73b65124c55fe1da9ddd4e503@physik.tu-berlin.de>
Date: Sun, 16 Aug 2015 00:58:19 -0700
Message-Id: <8615556E-DDFC-40C7-8DCD-DB9AD56866D7@alumni.caltech.edu>
References: <720ec2e73b65124c55fe1da9ddd4e503@physik.tu-berlin.de>
To: Alexander Kleinsorge
Cc: 21270@debbugs.gnu.org

Alexander,

Thank you for your report. This is a well-known limitation of the gzip format. The -l function makes use of the uncompressed length stored in the last four bytes of a gzip stream. Therein lies the rub, since four bytes can represent no more than 4 GB - 1.

There is another problem with that approach, in that a valid gzip file may consist of a series of concatenated gzip streams, in which case -l will report only on the last one. In that case, even if the entire stream decompresses to less than 4 GB, the result will still be incorrect.

The only reliable way to determine the uncompressed size of a gzip file is to decompress the entire file (which can be done without storing the result). This in fact is what "pigz -lt file.gz" does. It will correctly report the uncompressed length, but takes much longer than "gzip -l".

-l remains useful however in most cases, so it remains a gzip and pigz option.

Mark

> On Aug 15, 2015, at 2:42 PM, Alexander Kleinsorge wrote:
>
> Hi Gzip team,
>
> I compressed a 500 GB file (raw hdd image) using gzip 1.6 under Ubuntu 14.10 (64 bit). uncompressing the file gives a file with 500 gb (checked).
> But "gzip -l" shows bad (small) uncompressed_size and bad ratio (-5167%).
>
> Below you can see some details, but I think it is a general bug.
> Thanks for help, Alexander
>
>
> gzip -l asus.gz
> compressed uncompressed ratio uncompressed_name 99630975185 1891655680 -5166.9% asus
>
> gzip --version
> gzip 1.6
>
> Linux myname 3.16.0-43-generic #58-Ubuntu SMP Fri Jun 19 11:04:02 UTC 2015 x86_64 x86_64 x86_64 GNU/Linux
>
> the 2 files (compressed 93gb + uncompressed 500gb)
>
> -rwxrwx--- 1 root plugdev 99630975185 Aug 15 21:39 asus.gz
> -rwxrwx--- 1 root plugdev 500107862016 Aug 14 09:00 sdc.raw
> -rwxrwx--- 1 root plugdev 93G Aug 15 21:39 asus.gz
> -rwxrwx--- 1 root plugdev 466G Aug 14 09:00 sdc.raw

From debbugs-submit-bounces@debbugs.gnu.org Sun Aug 16 23:45:17 2015
In-Reply-To: <8615556E-DDFC-40C7-8DCD-DB9AD56866D7@alumni.caltech.edu>
References: <720ec2e73b65124c55fe1da9ddd4e503@physik.tu-berlin.de> <8615556E-DDFC-40C7-8DCD-DB9AD56866D7@alumni.caltech.edu>
From: Jim Meyering
Date: Sun, 16 Aug 2015 21:44:54 -0600
Subject: Re: bug#21270: gzip huge filesize problem
To: Mark Adler
Cc: Alexander Kleinsorge, 21270-done@debbugs.gnu.org

tags 21270 notabug
thanks

On Sun, Aug 16, 2015 at 1:58 AM, Mark Adler wrote:
> Alexander,
>
> Thank you for your report. This is a well-known limitation of the gzip format. The -l function makes use of the uncompressed length stored in the last four bytes of a gzip stream. Therein lies the rub, since four bytes can represent no more than 4 GB - 1.
>
> There is another problem with that approach, in that a valid gzip file may consist of a series of concatenated gzip streams, in which case -l will report only on the last one. In that case, even if the entire stream decompresses to less than 4 GB, the result will still be incorrect.
>
> The only reliable way to determine the uncompressed size of a gzip file is to decompress the entire file (which can be done without storing the result). This in fact is what "pigz -lt file.gz" does. It will correctly report the uncompressed length, but takes much longer than "gzip -l".
>
> -l remains useful however in most cases, so it remains a gzip and pigz option.

Thank you for replying Mark.
I've marked this as "notabug" with the in-line comment above,
and am closing the auto-created issue with the "-done" part of
the debbugs email recipient address.

From unknown Mon Aug 18 21:20:35 2025
Received: (at fakecontrol) by fakecontrolmessage;
To: internal_control@debbugs.gnu.org
From: Debbugs Internal Request
Subject: Internal Control
Message-Id: bug archived.
Date: Mon, 14 Sep 2015 11:24:04 +0000
User-Agent: Fakemail v42.6.9

# This is a fake control message.
#
# The action:
# bug archived.
thanks
# This fakemail brought to you by your local debbugs
# administrator
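
Editorial illustration (not part of the original thread): Mark Adler's explanation can be checked with a short Python sketch. It assumes the length field in the last four bytes of a gzip stream is a little-endian 32-bit value holding the uncompressed size modulo 2^32, as the gzip format specification (RFC 1952) describes. The function names stored_isize, wrapped_isize, and true_uncompressed_size are hypothetical, chosen for this example; they are not part of gzip or pigz, and this is not a description of how either tool is implemented.

```python
import struct
import zlib

ISIZE_MODULUS = 2 ** 32  # the trailer length field is 32 bits wide


def wrapped_isize(true_size):
    """Value a 32-bit length field would hold for a given true size."""
    return true_size % ISIZE_MODULUS


def stored_isize(path):
    """Read the length field from the last four bytes of the file.

    For a file made of several concatenated gzip streams this is only
    the length of the last stream (modulo 2**32).
    """
    with open(path, "rb") as f:
        f.seek(-4, 2)  # 2 means "relative to end of file"
        return struct.unpack("<I", f.read(4))[0]


def true_uncompressed_size(path, chunk_size=64 * 1024):
    """Decompress every stream in the file, discard the output, count bytes.

    Assumption: the file holds only gzip streams back to back, with no
    trailing padding; zlib raises zlib.error on anything else.
    """
    total = 0
    decomp = None
    with open(path, "rb") as f:
        while True:
            chunk = f.read(chunk_size)
            if not chunk:
                break
            while chunk:
                if decomp is None:
                    # 16 + MAX_WBITS selects the gzip wrapper
                    decomp = zlib.decompressobj(16 + zlib.MAX_WBITS)
                total += len(decomp.decompress(chunk))
                if decomp.eof:
                    # Stream ended; leftover bytes begin the next stream
                    chunk = decomp.unused_data
                    decomp = None
                else:
                    chunk = b""
    if decomp is not None:
        raise ValueError("truncated gzip stream")
    return total


# True size from the report, reduced modulo 2**32
print(wrapped_isize(500107862016))  # 1891655680
```

With the sizes from the report, 500107862016 modulo 2^32 is 1891655680, which is exactly the uncompressed figure "gzip -l" printed. The streaming count has to read the whole file, so like "pigz -lt" it is much slower than reading the trailer.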