From unknown Sat Jun 21 03:10:08 2025 Content-Disposition: inline Content-Transfer-Encoding: quoted-printable MIME-Version: 1.0 X-Mailer: MIME-tools 5.509 (Entity 5.509) Content-Type: text/plain; charset=utf-8 From: bug#23113 <23113@debbugs.gnu.org> To: bug#23113 <23113@debbugs.gnu.org> Subject: Status: parallel gzip processes trash hard disks, need larger buffers Reply-To: bug#23113 <23113@debbugs.gnu.org> Date: Sat, 21 Jun 2025 10:10:08 +0000 retitle 23113 parallel gzip processes trash hard disks, need larger buffers reassign 23113 gzip submitter 23113 "Chevreux, Bastien" severity 23113 normal thanks From debbugs-submit-bounces@debbugs.gnu.org Fri Mar 25 14:15:14 2016 Received: (at submit) by debbugs.gnu.org; 25 Mar 2016 18:15:14 +0000 Received: from localhost ([127.0.0.1]:38090 helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1ajWGO-0004Cw-RC for submit@debbugs.gnu.org; Fri, 25 Mar 2016 14:15:14 -0400 Received: from eggs.gnu.org ([208.118.235.92]:33167) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1ajVGs-0002eM-QR for submit@debbugs.gnu.org; Fri, 25 Mar 2016 13:11:39 -0400 Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71) (envelope-from ) id 1ajVGk-0000Vt-H9 for submit@debbugs.gnu.org; Fri, 25 Mar 2016 13:11:33 -0400 X-Spam-Checker-Version: SpamAssassin 3.3.2 (2011-06-06) on eggs.gnu.org X-Spam-Level: *** X-Spam-Status: No, score=3.3 required=5.0 tests=BAYES_50,HTML_MESSAGE, RECEIVED_FROM_WINDOWS_HOST,T_DKIM_INVALID autolearn=disabled version=3.3.2 Received: from lists.gnu.org ([2001:4830:134:3::11]:57692) by eggs.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1ajVGk-0000Vo-DL for submit@debbugs.gnu.org; Fri, 25 Mar 2016 13:11:30 -0400 Received: from eggs.gnu.org ([2001:4830:134:3::10]:52353) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1ajVGh-0000iS-1P for bug-gzip@gnu.org; Fri, 25 Mar 2016 13:11:30 -0400 Received: from Debian-exim by eggs.gnu.org with spam-scanned 
(Exim 4.71) (envelope-from ) id 1ajVGb-0000RC-Vf for bug-gzip@gnu.org; Fri, 25 Mar 2016 13:11:26 -0400 Received: from mail-am1on0074.outbound.protection.outlook.com ([157.56.112.74]:27026 helo=emea01-am1-obe.outbound.protection.outlook.com) by eggs.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1ajVGb-0000Qp-EH for bug-gzip@gnu.org; Fri, 25 Mar 2016 13:11:21 -0400 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=DSM1234.onmicrosoft.com; s=selector1-dsm-com; h=From:To:Date:Subject:Message-ID:Content-Type:MIME-Version; bh=8czc44Kq/qUAtci4Q746eXgXB/jgM/Q+fHI3pZF37MQ=; b=XYah7UtqLtnTWCJTaNmwe5szRRHdgKZY2wVNoAypARIZGpFkG+JegKu4wjw9J8JOWhJJSso8bgpKla8G7R4+SuRqtfBEk3rjemxUiwTnXC5K724VRwXJ1JWUtDejYvXQi+XTx/QbGuDce9QFr3Jl50dhTAIvE7Y1T8+ipwdfr0U= Received: from DB4PR07MB0558.eurprd07.prod.outlook.com (10.242.221.154) by DB4PR07MB0558.eurprd07.prod.outlook.com (10.242.221.154) with Microsoft SMTP Server (TLS) id 15.1.443.7; Fri, 25 Mar 2016 16:57:12 +0000 Received: from DB4PR07MB0558.eurprd07.prod.outlook.com ([10.242.221.154]) by DB4PR07MB0558.eurprd07.prod.outlook.com ([10.242.221.154]) with mapi id 15.01.0443.015; Fri, 25 Mar 2016 16:57:12 +0000 From: "Chevreux, Bastien" To: "bug-gzip@gnu.org" Subject: parallel gzip processes trash hard disks, need larger buffers Thread-Topic: parallel gzip processes trash hard disks, need larger buffers Thread-Index: AdGGq/ZTmWOZ8qQdQeSojYVjxpZlJw== Date: Fri, 25 Mar 2016 16:57:12 +0000 Message-ID: Accept-Language: en-US Content-Language: en-US X-MS-Has-Attach: X-MS-TNEF-Correlator: authentication-results: gnu.org; dkim=none (message not signed) header.d=none;gnu.org; dmarc=none action=none header.from=dsm.com; x-originating-ip: [64.80.133.250] x-ms-office365-filtering-correlation-id: 59ca277a-5cc5-4567-b104-08d354ce82f8 x-microsoft-exchange-diagnostics: 1; DB4PR07MB0558; 5:ilaYBo7lZO4oeZTErQcokesCgjcmLui84j24PWbCMQUEyE7hvU2/8WeI1MCfsA/2ZxHhRgoH1/3szypFfhpkXXrpD+aZPjIA5gOq7tVBOgkDxqUAEcDmaQXxtWx+7xnoVXvz0SGcz/Ot61C5bmTixw==; 
24:Xu9mLc03iz4GsedWDRpoxsma+66o2q0UFGFNLwRtDL+Zn/EqM6LH0N3EuO3V/DsLLW4D2AdMVJ1+nYgeG0ZNuBpaqIZeMjjquxUWhvqog80= x-microsoft-antispam: UriScan:;BCL:0;PCL:0;RULEID:;SRVR:DB4PR07MB0558; x-microsoft-antispam-prvs: x-exchange-antispam-report-test: UriScan:; x-exchange-antispam-report-cfa-test: BCL:0; PCL:0; RULEID:(601004)(2401047)(8121501046)(5005006)(3002001)(10201501046); SRVR:DB4PR07MB0558; BCL:0; PCL:0; RULEID:; SRVR:DB4PR07MB0558; x-forefront-prvs: 0892FA9A88 x-forefront-antispam-report: SFV:NSPM; SFS:(10009020)(38564003)(51884002)(77096005)(6116002)(790700001)(586003)(122556002)(16236675004)(19300405004)(1220700001)(102836003)(2501003)(5008740100001)(1096002)(2351001)(86362001)(3846002)(450100001)(15975445007)(2900100001)(5004730100002)(110136002)(11100500001)(2906002)(19580395003)(92566002)(107886002)(33656002)(5002640100001)(5003600100002)(50986999)(54356999)(76576001)(229853001)(74316001)(5630700001)(3660700001)(15395725005)(189998001)(81166005)(87936001)(66066001)(19617315012)(3280700002)(10400500002)(5640700001)(19625215002); DIR:OUT; SFP:1101; SCL:1; SRVR:DB4PR07MB0558; H:DB4PR07MB0558.eurprd07.prod.outlook.com; FPR:; SPF:None; MLV:ovrnspm; PTR:InfoNoRecords; LANG:en; spamdiagnosticoutput: 1:23 spamdiagnosticmetadata: NSPM Content-Type: multipart/alternative; boundary="_000_DB4PR07MB0558963CA6B78E62AC02829985830DB4PR07MB0558eurp_" MIME-Version: 1.0 X-OriginatorOrg: dsm.com X-MS-Exchange-CrossTenant-originalarrivaltime: 25 Mar 2016 16:57:12.5610 (UTC) X-MS-Exchange-CrossTenant-fromentityheader: Hosted X-MS-Exchange-CrossTenant-id: 49618402-6ea3-441d-957d-7df8773fee54 X-MS-Exchange-Transport-CrossTenantHeadersStamped: DB4PR07MB0558 X-detected-operating-system: by eggs.gnu.org: Windows 7 or 8 X-detected-operating-system: by eggs.gnu.org: GNU/Linux 2.6.x X-Received-From: 2001:4830:134:3::11 X-Spam-Score: -4.0 (----) X-Debbugs-Envelope-To: submit X-Mailman-Approved-At: Fri, 25 Mar 2016 14:15:11 -0400 X-BeenThere: debbugs-submit@debbugs.gnu.org X-Mailman-Version: 
2.1.18 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: debbugs-submit-bounces@debbugs.gnu.org Sender: "Debbugs-submit" X-Spam-Score: -4.0 (----)

--_000_DB4PR07MB0558963CA6B78E62AC02829985830DB4PR07MB0558eurp_
Content-Type: text/plain; charset="us-ascii"
Content-Transfer-Encoding: quoted-printable

Hi there,

I am using gzip 1.6 to compress large files >10 GiB in parallel (Kubuntu 14.04, 12 cores). The underlying disk system (RAID 10) is able to deliver read speeds >1 GB/s (measured with flushed file caches, iostat -mx 1 100).

Here are some numbers when running gzip in parallel:
1 gzip process: the CPU is the bottleneck in compressing things and utilisation is 100%.
2 gzips in parallel: the disk throughput drops to a meagre 70MB/s and the CPU utilisation per process is at ~60%.
6 gzips in parallel: the disk throughput fluctuates between 50 and 60 MB/s and the CPU utilisation per process is at ~18-20%.

Running 6 gzips in parallel on the same data residing on a SSD: 100% CPU utilisation per process

Googling a bit I found this thread on SuperUser where someone saw the same behaviour already with a single disk doing normally 125 MB/s and running 4 gzips drops it to 25 MB/s:
http://superuser.com/questions/599329/why-is-gzip-slow-despite-cpu-and-hard-drive-performance-not-being-maxed-out

The posts there propose a workaround like this:
  buffer -s 100000 -m 10000000 -p 100 < bigfile.dat | gzip > bigfile.dat.gz

And indeed, using "buffer" resolves trashing problems when working on a disk system. However, using "buffer" is pretty arcane (it isn't even installed per default on most Unix/Linux installations) and pretty counterintuitive.

Would it be possible to have bigger buffers by default (1 MB? 10 MB?) or have an automatism in gzip like "if file to compress >10 MB and free RAM >500MB, setup the file buffer to use 1 (10?) MB" ?

Alternatively, a command line option to manually set the buffer size?

Best,
  Bastien

--
DSM Nutritional Products Microbia Inc | Bioinformatics
60 Westview Street | Lexington, MA 02421 | United States
Phone +1 781 259 7613 | Fax +1 781 259 0615

________________________________

DISCLAIMER:
This e-mail is for the intended recipient only.
If you have received it by mistake please let us know by reply and then delete it from your system; access, disclosure, copying, distribution or reliance on any of it by anyone else is prohibited.
If you as intended recipient have received this e-mail incorrectly, please notify the sender (via e-mail) immediately.

--_000_DB4PR07MB0558963CA6B78E62AC02829985830DB4PR07MB0558eurp_
Content-Type: text/html; charset="us-ascii"
Content-Transfer-Encoding: quoted-printable

--_000_DB4PR07MB0558963CA6B78E62AC02829985830DB4PR07MB0558eurp_-- From debbugs-submit-bounces@debbugs.gnu.org Sun Mar 27 00:17:41 2016 Received: (at 23113) by debbugs.gnu.org; 27 Mar 2016 04:17:41 +0000 Received: from localhost ([127.0.0.1]:39106 helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1ak28y-0002tV-VQ for submit@debbugs.gnu.org; Sun, 27 Mar 2016 00:17:41 -0400 Received: from mail-oi0-f66.google.com ([209.85.218.66]:36391) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1ak28v-0002tG-Mb for 23113@debbugs.gnu.org; Sun, 27 Mar 2016 00:17:39 -0400 Received: by mail-oi0-f66.google.com with SMTP id u194so1718846oia.3 for <23113@debbugs.gnu.org>; Sat, 26 Mar 2016 21:17:37 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113; h=mime-version:sender:in-reply-to:references:from:date:message-id :subject:to:cc; bh=TqTWrwXeRIC9Wtfy+R9mWBHWYGgu/IsL5yWQcw/iXA4=; b=V8x5M2h0BiYxkR2/+oBexKq+MIopgS1kjPemA2kq1iznyBonnN7ElJZ5QbQt49S5wh XvAiKJJSjlHFwfm2Hwh9PG/vOeNgT/Lm3r7Ih+dM0KP8LRi0k2faO9aTYSAU3/rwi5Rb TQTfo0hzJ4UdCuMlYt4Yc2X7xCdsXKIumhlHj8KjT7k0lXWOUL8nBTqiEAeZ/eXqYVOt J4sn4fCb1bFajFB+B2iWuD8xgXPz5JMZofzADsqgeKd8H7py405G2qSIQaA2QTQDcseq uwYtg9WVMs/D8Fq48ucaPZqqywEQbZiYwJniKPGQoNSVOa6bDaaooqWZKhgNPUQZ5qA8 SpIA== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20130820; h=x-gm-message-state:mime-version:sender:in-reply-to:references:from :date:message-id:subject:to:cc; bh=TqTWrwXeRIC9Wtfy+R9mWBHWYGgu/IsL5yWQcw/iXA4=; b=N89xu0PylI2PXTPTYngqgSbGzMckSxBrNunbGkne26yoedWVmVUE2A+1ZR1BREi1tN oHczLxhIWXJUiZMUbsrkOlMcCU00rscJ+EqaoIX9PuiZsQmPPqL0SC0Zos6A6LGfvTLT JzC08RxmPIvnIzJysGLFuzajCaEcclrNmQfKCq9Pne//fPiRuAyC3oe4LBGBJ2M0LFzt TCSm4PBAtOb0lq1H3UGRvo3oZe3YPyymjFN3VSYcHzf0jghdEA1su5CY0L61UIMlZb1O P851MKCB5KgrcDAlYrL2ZgeIQc8N6awYkAriCzB3q31CZmBLWq22i0c09mQ53PfI5hNw 8Pjg== X-Gm-Message-State: 
AD7BkJI/i7mWF4xtJoMvzyCap3QxeOfI6bC0blhZdzrMSPpVEAOn085yjST2DU2M7dVQaQMwEncljYtJIIy0ZA== X-Received: by 10.157.20.146 with SMTP id d18mr8824542ote.172.1459052251855; Sat, 26 Mar 2016 21:17:31 -0700 (PDT) MIME-Version: 1.0 Received: by 10.202.44.5 with HTTP; Sat, 26 Mar 2016 21:17:11 -0700 (PDT) In-Reply-To: References: From: Jim Meyering Date: Sat, 26 Mar 2016 21:17:11 -0700 X-Google-Sender-Auth: tpJJ37UWsqy_FGvBPmzkB2i7kbU Message-ID: Subject: Re: bug#23113: parallel gzip processes trash hard disks, need larger buffers To: "Chevreux, Bastien" Content-Type: text/plain; charset=UTF-8 X-Spam-Score: -0.5 (/) X-Debbugs-Envelope-To: 23113 Cc: 23113@debbugs.gnu.org X-BeenThere: debbugs-submit@debbugs.gnu.org X-Mailman-Version: 2.1.18 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: debbugs-submit-bounces@debbugs.gnu.org Sender: "Debbugs-submit" X-Spam-Score: -0.5 (/) On Fri, Mar 25, 2016 at 9:57 AM, Chevreux, Bastien wrote: > Hi there, > > I am using gzip 1.6 to compress large files >10 GiB in parallel (Kubuntu 14.04, 12 cores). The underlying disk system (RAID 10) is able to deliver read speeds >1 GB/s (measured with flushed file caches, iostat -mx 1 100). > > Here are some numbers when running gzip in parallel: > 1 gzip process: the CPU is the bottleneck in compressing things and utilisation is 100%. > 2 gzips in parallel: the disk throughput drops to a meagre 70MB/s and the CPU utilisation per process is at ~60%. > 6 gzips in parallel: the disk throughput fluctuates between 50 and 60 MB/s and the CPU utilisation per process is at ~18-20%. 
> > Running 6 gzips in parallel on the same data residing on a SSD: 100% CPU utilisation per process > > Googling a bit I found this thread on SuperUser where someone saw the same behaviour already with a single disk doing normally 125 MB/s and running 4 gzips drops it to 25 MB/s: > http://superuser.com/questions/599329/why-is-gzip-slow-despite-cpu-and-hard-drive-performance-not-being-maxed-out > > The posts there propose a workaround like this: > buffer -s 100000 -m 10000000 -p 100 < bigfile.dat | gzip > bigfile.dat.gz > > And indeed, using "buffer" resolves trashing problems when working on a disk system. However, using "buffer" is pretty arcane (it isn't even installed per default on most Unix/Linux installations) and pretty counterintuitive. > > Would it be possible to have bigger buffers by default (1 MB? 10 MB?) or have an automatism in gzip like "if file to compress >10 MB and free RAM >500MB, setup the file buffer to use 1 (10?) MB" ? > > Alternatively, a command line option to manually set the buffer size? Thanks for the report and suggestions. However, I suggest that you consider using xz in place of gzip. Not only can it compress better, it also works faster for comparable compression ratios. That said, if you find that setting gzip.h's INBUFSIZ or OUTBUFSIZ to larger values makes a significant difference, we'd like to hear about the results and how you measured. 
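
For a comparison that does not depend on the "buffer" utility, the read side of the workaround quoted above can be approximated with dd, which reads the input in large blocks and pipes it to gzip. What follows is only a sketch under stated assumptions: the function name compress_buffered and the variable BLOCK_BYTES are made up for illustration, the 1 MiB block size is a guess rather than a measured optimum, and whether this reduces seeking on a particular disk array would still have to be checked, for example with iostat as in the original report.

```shell
#!/bin/sh
# Sketch only: compress_buffered and BLOCK_BYTES are illustrative names,
# not options or features of gzip or dd.
BLOCK_BYTES="${BLOCK_BYTES:-1048576}"   # 1 MiB per read request; an assumed value

compress_buffered() {
    # $1 = input file, $2 = output file (.gz)
    # dd issues BLOCK_BYTES-sized read requests against the input file and
    # writes each block into the pipe; gzip -1 drains the pipe at its own pace.
    dd if="$1" bs="$BLOCK_BYTES" 2>/dev/null | gzip -1 > "$2"
}

# Usage (file names as in the quoted workaround):
#   compress_buffered bigfile.dat bigfile.dat.gz
```

Running several such pipelines in parallel and watching the per-device throughput would show whether larger read requests alone make a difference, independent of any change to INBUFSIZ.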
From debbugs-submit-bounces@debbugs.gnu.org Tue Mar 29 19:03:55 2016 Received: (at 23113) by debbugs.gnu.org; 29 Mar 2016 23:03:55 +0000 Received: from localhost ([127.0.0.1]:44213 helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1al2fy-0007Dw-MY for submit@debbugs.gnu.org; Tue, 29 Mar 2016 19:03:54 -0400 Received: from mail-ve1eur01on0061.outbound.protection.outlook.com ([104.47.1.61]:58126 helo=EUR01-VE1-obe.outbound.protection.outlook.com) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1al2fv-0007Df-37 for 23113@debbugs.gnu.org; Tue, 29 Mar 2016 19:03:52 -0400 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=DSM1234.onmicrosoft.com; s=selector1-dsm-com; h=From:To:Date:Subject:Message-ID:Content-Type:MIME-Version; bh=pEECjxKqAb+34uu+ZU+W/OSR/Ch51J/EYszvhyqIftQ=; b=ktxnC4bYB3FVXow5OROJhAJ0DFLDIkiMiD3TSII8UxzrmeNh08GnOuaS9pWvzMA16bUc7RQfCB3ymzDpVb0NixhrxL/e8GxKs7zMmRjIvE+QeTqBY80QEYn1XMpDYuVQ8Sn7RD0MEjmExPVG/J4JxCciuXEEpYLpy5Px0aN8+Fk= Received: from DB3PR07MB0556.eurprd07.prod.outlook.com (2a01:111:e400:9431::25) by DB3PR07MB0555.eurprd07.prod.outlook.com (2a01:111:e400:9431::24) with Microsoft SMTP Server (TLS) id 15.1.447.15; Tue, 29 Mar 2016 23:03:44 +0000 Received: from DB3PR07MB0556.eurprd07.prod.outlook.com ([fe80::7132:bdda:7f4b:3844]) by DB3PR07MB0556.eurprd07.prod.outlook.com ([fe80::7132:bdda:7f4b:3844%17]) with mapi id 15.01.0447.023; Tue, 29 Mar 2016 23:03:44 +0000 From: "Chevreux, Bastien" To: Jim Meyering Subject: RE: bug#23113: parallel gzip processes trash hard disks, need larger buffers Thread-Topic: bug#23113: parallel gzip processes trash hard disks, need larger buffers Thread-Index: AdGGq/ZTmWOZ8qQdQeSojYVjxpZlJwBM5JKAAIh+zYA= Date: Tue, 29 Mar 2016 23:03:44 +0000 Message-ID: References: In-Reply-To: Accept-Language: en-US Content-Language: en-US X-MS-Has-Attach: X-MS-TNEF-Correlator: authentication-results: meyering.net; dkim=none (message not signed) header.d=none;meyering.net; 
dmarc=none action=none header.from=dsm.com; x-originating-ip: [64.80.133.250] x-ms-office365-filtering-correlation-id: 63a5534a-66b2-4dae-4249-08d3582660fa x-microsoft-exchange-diagnostics: 1; DB3PR07MB0555; 5:2HZq/ZvYfp5onbnT9lip9u8nkfadzgdD8utYbVPvWw40CAiNaBoA6D3GN5Ge7TFwwIOme19A6dawv5ntouIgkT+Z3vDYnvqIsoHyE7Oxw1bYnAVEHfyRB6vpzfjr2n3wVuAUTcTgMbpl9Px3bN6iAw==; 24:1X/sjaDfRm1YcL9mC0SRSbBTn513GmBRT2/e+LtWfD74dRhLe62VIx9GQNsSIDiQDgPSOC7Tz3XOUihLE2/lNLdD0sc/gHFphlWA8pGjpnw= x-microsoft-antispam: UriScan:;BCL:0;PCL:0;RULEID:;SRVR:DB3PR07MB0555; x-microsoft-antispam-prvs: x-exchange-antispam-report-test: UriScan:; x-exchange-antispam-report-cfa-test: BCL:0; PCL:0; RULEID:(601004)(2401047)(8121501046)(5005006)(10201501046)(3002001); SRVR:DB3PR07MB0555; BCL:0; PCL:0; RULEID:; SRVR:DB3PR07MB0555; x-forefront-prvs: 0896BFCE6C x-forefront-antispam-report: SFV:NSPM; SFS:(10009020)(6009001)(38564003)(189998001)(110136002)(76176999)(5003600100002)(6116002)(3846002)(5004730100002)(86362001)(586003)(102836003)(54356999)(92566002)(1096002)(33656002)(3280700002)(50986999)(1220700001)(66066001)(2950100001)(87936001)(2900100001)(11100500001)(5008740100001)(2906002)(4326007)(3660700001)(19580405001)(81166005)(5002640100001)(5250100002)(76576001)(74316001)(19580395003); DIR:OUT; SFP:1101; SCL:1; SRVR:DB3PR07MB0555; H:DB3PR07MB0556.eurprd07.prod.outlook.com; FPR:; SPF:None; MLV:sfv; LANG:en; spamdiagnosticoutput: 1:23 spamdiagnosticmetadata: NSPM Content-Type: text/plain; charset="utf-8" Content-Transfer-Encoding: base64 MIME-Version: 1.0 X-OriginatorOrg: dsm.com X-MS-Exchange-CrossTenant-originalarrivaltime: 29 Mar 2016 23:03:44.6824 (UTC) X-MS-Exchange-CrossTenant-fromentityheader: Hosted X-MS-Exchange-CrossTenant-id: 49618402-6ea3-441d-957d-7df8773fee54 X-MS-Exchange-Transport-CrossTenantHeadersStamped: DB3PR07MB0555 X-Spam-Score: -0.0 (/) X-Debbugs-Envelope-To: 23113 Cc: "23113@debbugs.gnu.org" <23113@debbugs.gnu.org> X-BeenThere: debbugs-submit@debbugs.gnu.org X-Mailman-Version: 
2.1.18 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: debbugs-submit-bounces@debbugs.gnu.org Sender: "Debbugs-submit" X-Spam-Score: -0.0 (/)

> From: meyering@gmail.com [mailto:meyering@gmail.com] On Behalf Of Jim Meyering
> [...]
> However, I suggest that you consider using xz in place of gzip.
> Not only can it compress better, it also works faster for comparable compression ratios.

xz is not a viable alternative in this case: use case is not archiving. There is a plethora of programs out there with zlib support compiled in and these won't work on xz packed data. Furthermore, gzip -1 is approximately 4 times faster than xz -1 on FASTQ files (sequencing data), and the use case here is "temporary results, so ok-ish compression in a comparatively short amount of time". Gzip is ideal in that respect as even at -1 it compresses down to ~25-35% ... and that already helps a lot when you do not need 1 TiB of hard disk but only ~350 GiB. Gzip -1 takes ~4.5 hrs, xz -1 almost a day.

> That said, if you find that setting gzip.h's INBUFSIZ or OUTBUFSIZ to larger values makes a significant difference, we'd like to hear about the results and how you measured.

Changing the INBUFSIZ did not have the effect hoped for as this is just the buffer size allocated by gzip ... but in the end it uses only 64k at most and the calls to the file system read() even end up to request only 32k per call.

I traced this down through multiple layers to the function fill_window() in deflate.c, where things get really intricate using multiple pre-set variables, defines and memcpy()s. It became clear that the code is geared towards using a 64k buffer with a rolling window of 32k. Optimised for 16 bit machines that is.

There are a few mentions of SMALL_MEM, MEDIUM_MEM and BIG_MEM variants via defines. However, code comments say that BIG_MEM would work on a complete file loaded in memory ... which is a no-go for files in the area of 15 to 30 GiB. I'm not even sure the code would be doing what the comments say.

Long story short: I do not feel expert enough to touch said functions and change them to provide for larger input buffering. If I were forced to implement something I'd try it with an outer buffering layer, but I'm not sure it would be elegant or even efficient.

Best,
  Bastien

PS: then again I'm toying with the idea to write a simple gzip-packer replacement which simply buffers data and passes it to zlib.

--
DSM Nutritional Products Microbia Inc | Bioinformatics
60 Westview Street | Lexington, MA 02421 | United States
Phone +1 781 259 7613 | Fax +1 781 259 0615

________________________________

DISCLAIMER:
This e-mail is for the intended recipient only.
If you have received it by mistake please let us know by reply and then delete it from your system; access, disclosure, copying, distribution or reliance on any of it by anyone else is prohibited.
If you as intended recipient have received this e-mail incorrectly, please notify the sender (via e-mail) immediately.

From debbugs-submit-bounces@debbugs.gnu.org Sun Apr 03 11:56:27 2016 Received: (at 23113) by debbugs.gnu.org; 3 Apr 2016 15:56:27 +0000 Received: from localhost ([127.0.0.1]:50111 helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1amkO3-0008Jj-3y for submit@debbugs.gnu.org; Sun, 03 Apr 2016 11:56:27 -0400 Received: from bitwagon.com ([74.82.39.175]:41494) by debbugs.gnu.org with smtp (Exim 4.84_2) (envelope-from ) id 1amZtL-00073G-16 for 23113@debbugs.gnu.org; Sun, 03 Apr 2016 00:44:03 -0400 Received: from f22e64.local ([24.21.156.164]) by bitwagon.com for <23113@debbugs.gnu.org>; Sat, 2 Apr 2016 21:43:51 -0700 To: 23113@debbugs.gnu.org From: John Reiser Subject: alternatives: parallel gzip processes trash hard disks Message-ID: <57009F86.1000908@bitwagon.com> Date: Sat, 2 Apr 2016 21:43:50 -0700 User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:38.0) Gecko/20100101 Thunderbird/38.6.0 MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8; format=flowed Content-Transfer-Encoding: 7bit X-Spam-Score: -1.0 (-) X-Debbugs-Envelope-To: 23113 X-Mailman-Approved-At: Sun, 03 Apr 2016 11:56:26 -0400 X-BeenThere: debbugs-submit@debbugs.gnu.org X-Mailman-Version: 2.1.18 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: debbugs-submit-bounces@debbugs.gnu.org Sender: "Debbugs-submit" X-Spam-Score: -1.0 (-)

Here are some other approaches which may help:

1. Use gzopen() from zlib to compress the 10GB file as it is generated.
This uses only one CPU core and requires sequential writing only
(no random writes) but that may be enough in some cases.

2. The output from gzip is written 32KiB at a time, so a large output file
involves growing the file many times. Thus buffering the output from gzip
into larger blocks may help, too. Try:
    gzip ... | dd obs=... of=...

3. Similarly, dd can buffer the input to gzip:
    dd if=... ibs=... obs=... | gzip ...

4. dd can also be used to create multiple streams of input from a single file:
    (dd if=file ibs=... skip=0*N count=N obs=... | gzip ... ) &
    (dd if=file ibs=... skip=1*N count=N obs=... | gzip ... ) &
    (dd if=file ibs=... skip=2*N count=N obs=... | gzip ... ) &
    (dd if=file ibs=... skip=3*N count=N obs=... | gzip ... ) &
However dd does not perform arithmetic, so the multiplication j*N must be given
as a literal result.

The dd utility program is quite versatile!

From debbugs-submit-bounces@debbugs.gnu.org Sun Apr 10 03:49:29 2016 Received: (at 23113) by debbugs.gnu.org; 10 Apr 2016 07:49:29 +0000 Received: from localhost ([127.0.0.1]:56521 helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1apA7d-0008SO-0s for submit@debbugs.gnu.org; Sun, 10 Apr 2016 03:49:29 -0400 Received: from mail.alumni.caltech.edu ([131.215.242.114]:7375) by debbugs.gnu.org with smtp (Exim 4.84_2) (envelope-from ) id 1apA7b-0008S9-2T for 23113@debbugs.gnu.org; Sun, 10 Apr 2016 03:49:27 -0400 Received: from [10.0.1.29] (unknown [97.90.41.147]) (Authenticated sender: madler) by mail.alumni.caltech.edu (Postfix) with ESMTPSA id D7C1A120094; Sun, 10 Apr 2016 00:49:19 -0700 (PDT) X-DKIM: Sendmail DKIM Filter v2.8.3 mail.alumni.caltech.edu D7C1A120094 DKIM-Signature: v=1; a=rsa-sha256; c=simple/simple; d=alumni.caltech.edu; s=enforce; t=1460274560; bh=RERxs0oRXaKZh6A2PFqUtnGRaJ8ru3a+gPlt/UpJtUM=; h=Subject:Mime-Version:Content-Type:From:In-Reply-To:Date:Cc: Content-Transfer-Encoding:Message-Id:References:To; b=UwrBOnaW5oOrXX+/Weju2U+zjOavooXNIwduNjdDgUQtNFyz7dk1TWCpE4V8iX4Qm N9f2lAxD80kh7p2AJZL6fbAAbBpmjkElkWy7kXWb+h6+kb43xIzJ8J4+syRt30Espn y7U5xBjemup8S4voN0ODMl0+XhfKxThfq/3hzuBM= Subject: Re: bug#23113: parallel gzip processes trash hard disks, need larger buffers Mime-Version: 1.0
(Mac OS X Mail 9.3 \(3124\)) Content-Type: text/plain; charset=us-ascii From: Mark Adler In-Reply-To: Date: Sun, 10 Apr 2016 00:49:17 -0700 Content-Transfer-Encoding: quoted-printable Message-Id: <09BF4605-5D8A-4959-98AC-AFCCF7520FCF@alumni.caltech.edu> References: To: "Chevreux, Bastien" X-Mailer: Apple Mail (2.3124) X-MailScanner-Information-Alumni: X-Alumni-MailScanner-ID: D7C1A120094.A1F93 X-MailScanner-Alumni: No Virii found X-Spam-Status-Alumni: not spam, SpamAssassin (not cached, score=-1.1, required 5, ALL_TRUSTED -1.00, DKIM_SIGNED 0.10, DKIM_VALID -0.10, DKIM_VALID_AU -0.10) X-MailScanner-From: madler@alumni.caltech.edu X-Spam-Score: -3.3 (---) X-Debbugs-Envelope-To: 23113 Cc: Jim Meyering , "23113@debbugs.gnu.org" <23113@debbugs.gnu.org> X-BeenThere: debbugs-submit@debbugs.gnu.org X-Mailman-Version: 2.1.18 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: debbugs-submit-bounces@debbugs.gnu.org Sender: "Debbugs-submit" X-Spam-Score: -3.3 (---)

Bastien,

pigz (a parallel version of gzip) has a variable buffer size. The -b or --blocksize option allows up to 512 MB buffers, defaulting to 128K. See http://zlib.net/pigz/

Mark

> On Mar 29, 2016, at 4:03 PM, Chevreux, Bastien wrote:
>
>> From: meyering@gmail.com [mailto:meyering@gmail.com] On Behalf Of Jim Meyering
>> [...]
>> However, I suggest that you consider using xz in place of gzip.
>> Not only can it compress better, it also works faster for comparable compression ratios.
>
> xz is not a viable alternative in this case: use case is not archiving. There is a plethora of programs out there with zlib support compiled in and these won't work on xz packed data. Furthermore, gzip -1 is approximately 4 times faster than xz -1 on FASTQ files (sequencing data), and the use case here is "temporary results, so ok-ish compression in a comparatively short amount of time". Gzip is ideal in that respect as even at -1 it compresses down to ~25-35% ... and that already helps a lot when you do not need 1 TiB of hard disk but only ~350 GiB. Gzip -1 takes ~4.5 hrs, xz -1 almost a day.
>
>> That said, if you find that setting gzip.h's INBUFSIZ or OUTBUFSIZ to larger values makes a significant difference, we'd like to hear about the results and how you measured.
>
> Changing the INBUFSIZ did not have the effect hoped for as this is just the buffer size allocated by gzip ... but in the end it uses only 64k at most and the calls to the file system read() even end up to request only 32k per call.
>
> I traced this down through multiple layers to the function fill_window() in deflate.c, where things get really intricate using multiple pre-set variables, defines and memcpy()s. It became clear that the code is geared towards using a 64k buffer with a rolling window of 32k. Optimised for 16 bit machines that is.
>
> There are a few mentions of SMALL_MEM, MEDIUM_MEM and BIG_MEM variants via defines. However, code comments say that BIG_MEM would work on a complete file loaded in memory ... which is a no-go for files in the area of 15 to 30 GiB. I'm not even sure the code would be doing what the comments say.
>
> Long story short: I do not feel expert enough to touch said functions and change them to provide for larger input buffering. If I were forced to implement something I'd try it with an outer buffering layer, but I'm not sure it would be elegant or even efficient.
>
> Best,
> Bastien
>
> PS: then again I'm toying with the idea to write a simple gzip-packer replacement which simply buffers data and passes it to zlib.
>
> --
> DSM Nutritional Products Microbia Inc | Bioinformatics
> 60 Westview Street | Lexington, MA 02421 | United States
> Phone +1 781 259 7613 | Fax +1 781 259 0615
>
>
> ________________________________
>
> DISCLAIMER:
> This e-mail is for the intended recipient only.
> If you have received it by mistake please let us know by reply and then delete it from your system; access, disclosure, copying, distribution or reliance on any of it by anyone else is prohibited.
> If you as intended recipient have received this e-mail incorrectly, please notify the sender (via e-mail) immediately.

From debbugs-submit-bounces@debbugs.gnu.org Mon Apr 11 22:30:31 2016 Received: (at 23113) by debbugs.gnu.org; 12 Apr 2016 02:30:31 +0000 Received: from localhost ([127.0.0.1]:59895 helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1apo63-0004bO-Bf for submit@debbugs.gnu.org; Mon, 11 Apr 2016 22:30:31 -0400 Received: from mail-oi0-f65.google.com ([209.85.218.65]:36699) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1apo61-0004bB-Q2 for 23113@debbugs.gnu.org; Mon, 11 Apr 2016 22:30:30 -0400 Received: by mail-oi0-f65.google.com with SMTP id v126so525738oia.3 for <23113@debbugs.gnu.org>; Mon, 11 Apr 2016 19:30:29 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113; h=mime-version:sender:in-reply-to:references:from:date:message-id :subject:to:cc; bh=N0S0gAGH+zd0uh8BJT1z+y2d37qVbI+NVPZDkIQeCTY=; b=K+hvTWdbLDBJnQh4KMDpuyUzDwclu6XeDMbXLdvg6azXdRfL6ziw+Vt8wq3nk2vb6Z CgA4dpidPSKgu6q4gYQ6HbFk/Np2j+3m/BAhyRUOCA0cB8HIjtgehulkFWJubPbC53az 7zkkMxHcb1JsCDuq5x8WjXa1gdZ/27fOG8UC1fqJ2GkdtD0DfV9Jspa0wlYMAyJyyVmr aMLqYIqPvhdn9U3g07d0BsIe6ucWFVPIIyNK1G1YEpIazjbpBewYALrKw8kHNO5qcQYd DFabKmDbbruD64FlZNtPWRoe8dTS4wm05wXUXUes5t+EPLksbLs2UUOxK5HuPWR+NAUl yYKw== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20130820;
h=x-gm-message-state:mime-version:sender:in-reply-to:references:from :date:message-id:subject:to:cc; bh=N0S0gAGH+zd0uh8BJT1z+y2d37qVbI+NVPZDkIQeCTY=; b=U+jHj9VVLbP21cbjqGhKQu3oFUQ5J1DzrfA3QOguEQRk4T2RKaOINpIlJPABJWMQy7 9eVTidVzreBHuoVXBM+9VtwpQrdBUvOs15cCMZ5Qv4vluT8Vtf3adUq+8SrOkRIhwzOm Ch7t4ZrZcj63YIPTTAlpX2Y3v38S3WhMQzrgRRVcoKklurHTce9ObtyhUUDVXUjiz3f3 0/Y1RAPQShAfPczLRrAPchciE12vpSFP0jPNf5FfJBXkju4Nrn9VBMgMmFT5OhBwH/AW 4193xmqGrPD7A96KvEFJ41kXrFawhvoOJ4MtdIYBteg31QbM2HHqWK105mvhkG9HE3z1 FXCw== X-Gm-Message-State: AOPr4FUjzaEH2d9mH2sr1DcH1a9/087DKIJKWDtyqmFnCgbCeyaWCzmi6zZbMx96ZkNo1rBxdZ+taT4PF65LxQ== X-Received: by 10.202.218.133 with SMTP id r127mr335470oig.36.1460428224179; Mon, 11 Apr 2016 19:30:24 -0700 (PDT) MIME-Version: 1.0 Received: by 10.202.213.141 with HTTP; Mon, 11 Apr 2016 19:30:04 -0700 (PDT) In-Reply-To: <09BF4605-5D8A-4959-98AC-AFCCF7520FCF@alumni.caltech.edu> References: <09BF4605-5D8A-4959-98AC-AFCCF7520FCF@alumni.caltech.edu> From: Jim Meyering Date: Mon, 11 Apr 2016 19:30:04 -0700 X-Google-Sender-Auth: slO5teG4bWosLUgmcnjiSBoF560 Message-ID: Subject: Re: bug#23113: parallel gzip processes trash hard disks, need larger buffers To: Mark Adler Content-Type: text/plain; charset=UTF-8 X-Spam-Score: -0.5 (/) X-Debbugs-Envelope-To: 23113 Cc: "Chevreux, Bastien" , "23113@debbugs.gnu.org" <23113@debbugs.gnu.org> X-BeenThere: debbugs-submit@debbugs.gnu.org X-Mailman-Version: 2.1.18 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: debbugs-submit-bounces@debbugs.gnu.org Sender: "Debbugs-submit" X-Spam-Score: -0.5 (/) On Sun, Apr 10, 2016 at 12:49 AM, Mark Adler wrote: > Bastien, > > pigz (a parallel version of gzip) has a variable buffer size. The -b or --blocksize option allows up to 512 MB buffers, defaulting to 128K. See http://zlib.net/pigz/ Thanks for the reminder about pigz, Mark. 
This is yet another reason to consider gzip is in maintenance-only mode, i.e., the barrier to adding new features is even higher, given that pigz is so compatible, yet with added features and the benefit of a modern codebase. From debbugs-submit-bounces@debbugs.gnu.org Tue Apr 12 00:55:15 2016 Received: (at 23113) by debbugs.gnu.org; 12 Apr 2016 04:55:15 +0000 Received: from localhost ([127.0.0.1]:59931 helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1apqM6-00083g-TY for submit@debbugs.gnu.org; Tue, 12 Apr 2016 00:55:15 -0400 Received: from zimbra.cs.ucla.edu ([131.179.128.68]:55902) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1apqM4-00083N-La for 23113@debbugs.gnu.org; Tue, 12 Apr 2016 00:55:13 -0400 Received: from localhost (localhost [127.0.0.1]) by zimbra.cs.ucla.edu (Postfix) with ESMTP id CED431601ED; Mon, 11 Apr 2016 21:55:05 -0700 (PDT) Received: from zimbra.cs.ucla.edu ([127.0.0.1]) by localhost (zimbra.cs.ucla.edu [127.0.0.1]) (amavisd-new, port 10032) with ESMTP id DHbzbgvcKV81; Mon, 11 Apr 2016 21:55:05 -0700 (PDT) Received: from localhost (localhost [127.0.0.1]) by zimbra.cs.ucla.edu (Postfix) with ESMTP id 1B1B3160D71; Mon, 11 Apr 2016 21:55:05 -0700 (PDT) X-Virus-Scanned: amavisd-new at zimbra.cs.ucla.edu Received: from zimbra.cs.ucla.edu ([127.0.0.1]) by localhost (zimbra.cs.ucla.edu [127.0.0.1]) (amavisd-new, port 10026) with ESMTP id qZ92-7npyxZX; Mon, 11 Apr 2016 21:55:05 -0700 (PDT) Received: from [192.168.1.9] (unknown [100.32.155.148]) by zimbra.cs.ucla.edu (Postfix) with ESMTPSA id ECCFF1601ED; Mon, 11 Apr 2016 21:55:04 -0700 (PDT) Subject: Re: bug#23113: parallel gzip processes trash hard disks, need larger buffers To: Jim Meyering , Mark Adler References: <09BF4605-5D8A-4959-98AC-AFCCF7520FCF@alumni.caltech.edu> From: Paul Eggert Organization: UCLA Computer Science Department Message-ID: <570C7FA8.9020808@cs.ucla.edu> Date: Mon, 11 Apr 2016 21:55:04 -0700 User-Agent: Mozilla/5.0 
(X11; Linux x86_64; rv:38.0) Gecko/20100101 Thunderbird/38.6.0 MIME-Version: 1.0 In-Reply-To: Content-Type: text/plain; charset=utf-8; format=flowed Content-Transfer-Encoding: 7bit X-Spam-Score: -1.0 (-) X-Debbugs-Envelope-To: 23113 Cc: "Chevreux, Bastien" , "23113@debbugs.gnu.org" <23113@debbugs.gnu.org> X-BeenThere: debbugs-submit@debbugs.gnu.org X-Mailman-Version: 2.1.18 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: debbugs-submit-bounces@debbugs.gnu.org Sender: "Debbugs-submit" X-Spam-Score: -1.0 (-) Jim Meyering wrote: > Thanks for the reminder about pigz, Mark. > This is yet another reason to consider gzip is in maintenance-only mode, > i.e., the barrier to adding new features is even higher, given that > pigz is so compatible, yet with added features and the benefit of > a modern codebase. It'd be nice if we could migrate GNU gzip into merely being a front-end for pigz somehow. From debbugs-submit-bounces@debbugs.gnu.org Tue Apr 12 12:55:39 2016 Received: (at 23113) by debbugs.gnu.org; 12 Apr 2016 16:55:39 +0000 Received: from localhost ([127.0.0.1]:32995 helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1aq1bH-0003r4-FP for submit@debbugs.gnu.org; Tue, 12 Apr 2016 12:55:39 -0400 Received: from mail-db3on0056.outbound.protection.outlook.com ([157.55.234.56]:41853 helo=emea01-db3-obe.outbound.protection.outlook.com) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1aq1bE-0003qp-RL for 23113@debbugs.gnu.org; Tue, 12 Apr 2016 12:55:37 -0400 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=DSM1234.onmicrosoft.com; s=selector1-dsm-com; h=From:To:Date:Subject:Message-ID:Content-Type:MIME-Version; bh=brlf4MbWBz7C7I4b2M/UKj6qKy8orOqETTLKkcS6yDI=; b=oF4yCzEqkXBoM+56xV2+NwH/oJ6fl4mryyp3OwHiVoZb/aXIRkP13R4pp4ugTNtKov/55IJ3cZ8eXh2GWCPFn83JR6Ocd1kXOd7xDYLTMIOJ9qYIQyhQtb0kDoiEEQU9b9Fx9cMUWNSyBu3l7gMXFf14WGeF0cmB8lTzyht8dH4= Received: from 
DB3PR07MB0556.eurprd07.prod.outlook.com (2a01:111:e400:9431::25) by DB3PR07MB0554.eurprd07.prod.outlook.com (2a01:111:e400:9431::23) with Microsoft SMTP Server (TLS) id 15.1.447.15; Tue, 12 Apr 2016 16:55:30 +0000 Received: from DB3PR07MB0556.eurprd07.prod.outlook.com ([fe80::7132:bdda:7f4b:3844]) by DB3PR07MB0556.eurprd07.prod.outlook.com ([fe80::7132:bdda:7f4b:3844%17]) with mapi id 15.01.0447.029; Tue, 12 Apr 2016 16:55:30 +0000 From: "Chevreux, Bastien" To: Mark Adler Subject: RE: bug#23113: parallel gzip processes trash hard disks, need larger buffers Thread-Topic: bug#23113: parallel gzip processes trash hard disks, need larger buffers Thread-Index: AdGGq/ZTmWOZ8qQdQeSojYVjxpZlJwBM5JKAAIh+zYACPv34gABzE9mQ Date: Tue, 12 Apr 2016 16:55:30 +0000 Message-ID: References: <09BF4605-5D8A-4959-98AC-AFCCF7520FCF@alumni.caltech.edu> In-Reply-To: <09BF4605-5D8A-4959-98AC-AFCCF7520FCF@alumni.caltech.edu> Accept-Language: en-US Content-Language: en-US X-MS-Has-Attach: X-MS-TNEF-Correlator: authentication-results: alumni.caltech.edu; dkim=none (message not signed) header.d=none; alumni.caltech.edu; dmarc=none action=none header.from=dsm.com; x-originating-ip: [64.80.133.250] x-ms-office365-filtering-correlation-id: fab3acaa-ab0e-46ee-08fa-08d362f3416c x-microsoft-exchange-diagnostics: 1; DB3PR07MB0554; 5:8GalqoCQW+RA9hKiKhmIOqQLWgpYUEWb/hqgopbN9/NiSn0AaYv5xnpIT09zxjVdXEPq6u7MI7Py0EWW4hLqvRLAlgGl7Kcel9cjC+Vs3aG0dINnFYmV+bembLcUiRRttQteMkNNntHZaY1wgUsUDQ==; 24:7mFQYRjiVeSHI40pURkxjVASolb8HA/RV7q0LXloD2VkYMybW0/+CBJpmwFGBJ2YvR/igx7ec2yEudj7F8Ce+o5u8+QS+WJFX1Po8gPoFwc= x-microsoft-antispam: UriScan:;BCL:0;PCL:0;RULEID:;SRVR:DB3PR07MB0554; x-microsoft-antispam-prvs: x-exchange-antispam-report-test: UriScan:; x-exchange-antispam-report-cfa-test: BCL:0; PCL:0; RULEID:(601004)(2401047)(5005006)(8121501046)(3002001)(10201501046)(6055026); SRVR:DB3PR07MB0554; BCL:0; PCL:0; RULEID:; SRVR:DB3PR07MB0554; x-forefront-prvs: 0910AAF391 x-forefront-antispam-report: SFV:NSPM; 
SFS:(10009020)(6009001)(377454003)(13464003)(38564003)(24454002)(50986999)(9686002)(76176999)(54356999)(11100500001)(4326007)(66066001)(15975445007)(92566002)(1220700001)(5003600100002)(1096002)(93886004)(5004730100002)(5250100002)(2171001)(87936001)(345774005)(15395725005)(19580395003)(19580405001)(102836003)(6116002)(586003)(33656002)(3846002)(110136002)(86362001)(2950100001)(189998001)(74316001)(2906002)(76576001)(3660700001)(3280700002)(2900100001)(5008740100001)(81166005)(5002640100001); DIR:OUT; SFP:1101; SCL:1; SRVR:DB3PR07MB0554; H:DB3PR07MB0556.eurprd07.prod.outlook.com; FPR:; SPF:None; MLV:sfv; LANG:en; spamdiagnosticoutput: 1:23 spamdiagnosticmetadata: NSPM Content-Type: text/plain; charset="us-ascii" Content-Transfer-Encoding: quoted-printable MIME-Version: 1.0 X-OriginatorOrg: dsm.com X-MS-Exchange-CrossTenant-originalarrivaltime: 12 Apr 2016 16:55:30.2452 (UTC) X-MS-Exchange-CrossTenant-fromentityheader: Hosted X-MS-Exchange-CrossTenant-id: 49618402-6ea3-441d-957d-7df8773fee54 X-MS-Exchange-Transport-CrossTenantHeadersStamped: DB3PR07MB0554 X-Spam-Score: 0.0 (/) X-Debbugs-Envelope-To: 23113 Cc: Jim Meyering , "23113@debbugs.gnu.org" <23113@debbugs.gnu.org> X-BeenThere: debbugs-submit@debbugs.gnu.org X-Mailman-Version: 2.1.18 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: debbugs-submit-bounces@debbugs.gnu.org Sender: "Debbugs-submit" X-Spam-Score: 0.0 (/) Mark, I knew about pigz, albeit not about -b, thank you for that. Together with -p 1 that would replicate gzip and implement input buffering well enough to be used in parallel pipelines (where you do not want, e.g., 40 pipelines running 40 pigz with 40 threads each). Questions: how stable / error proof is pigz compared to gzip? I always shied away from it as gzip is so much tried and tested that errors are unlikely ...
and the zlib.net homepage does not make an "official" statement like "you should all now move to pigz, it's good and tested enough." Additional question: is there a pigzlib planned? :-) Jim, Paul: I'd say that this thread/bug can be closed if pigz proves to be as stable / error free as gzip. I suppose that while backporting -b to gzip could be done, it would not make much sense. Best, Bastien -- DSM Nutritional Products Microbia Inc | Bioinformatics 60 Westview Street | Lexington, MA 02421 | United States Phone +1 781 259 7613 | Fax +1 781 259 0615 -----Original Message----- From: Mark Adler [mailto:madler@alumni.caltech.edu] Sent: Sunday, 10 April 2016 03:49 To: Chevreux, Bastien Cc: Jim Meyering; 23113@debbugs.gnu.org Subject: Re: bug#23113: parallel gzip processes trash hard disks, need larger buffers Bastien, pigz (a parallel version of gzip) has a variable buffer size. The -b or --blocksize option allows up to 512 MB buffers, defaulting to 128K. See http://zlib.net/pigz/ Mark > On Mar 29, 2016, at 4:03 PM, Chevreux, Bastien wrote: > >> From: meyering@gmail.com [mailto:meyering@gmail.com] On Behalf Of Jim >> Meyering [...] However, I suggest that you consider using xz in place >> of gzip. >> Not only can it compress better, it also works faster for comparable compression ratios. > > xz is not a viable alternative in this case: use case is not archiving. There is a plethora of programs out there with zlib support compiled in and these won't work on xz packed data. Furthermore, gzip -1 is approximately 4 times faster than xz -1 on FASTQ files (sequencing data), and the use case here is "temporary results, so ok-ish compression in a comparatively short amount of time". Gzip is ideal in that respect as even at -1 it compresses down to ~25-35% ... and that already helps a lot when you do not need 1 TiB of hard disk but only ~350 GiB. Gzip -1 takes ~4.5 hrs, xz -1 almost a day.
> >> That said, if you find that setting gzip.h's INBUFSIZ or OUTBUFSIZ to larger values makes a significant difference, we'd like to hear about the results and how you measured. > > Changing the INBUFSIZ did not have the effect hoped for as this is just the buffer size allocated by gzip ... but in the end it uses only 64k at most and the calls to the file system read() even end up to request only 32k per call. > > I traced this down through multiple layers to the function fill_window() in deflate.c, where things get really intricate using multiple pre-set variables, defines and memcpy()s. It became clear that the code is geared towards using a 64k buffer with a rolling window of 32k. Optimised for 16 bit machines that is. > > There are a few mentions of SMALL_MEM, MEDIUM_MEM and BIG_MEM variants via defines. However, code comments say that BIG_MEM would work on a complete file loaded in memory ... which is a no-go for files in the area of 15 to 30 GiB. I'm not even sure the code would be doing what the comments say. > > Long story short: I do not feel expert enough to touch said functions and change them to provide for larger input buffering. If I were forced to implement something I'd try it with an outer buffering layer, but I'm not sure it would be elegant or even efficient. > > Best, > Bastien > > PS: then again I'm toying with the idea to write a simple gzip-packer replacement which simply buffers data and passes it to zlib. > > -- > DSM Nutritional Products Microbia Inc | Bioinformatics > 60 Westview Street | Lexington, MA 02421 | United States Phone +1 781 > 259 7613 | Fax +1 781 259 0615 > > > ________________________________ > > DISCLAIMER: > This e-mail is for the intended recipient only. > If you have received it by mistake please let us know by reply and then delete it from your system; access, disclosure, copying, distribution or reliance on any of it by anyone else is prohibited.
> If you as intended recipient have received this e-mail incorrectly, please notify the sender (via e-mail) immediately. From debbugs-submit-bounces@debbugs.gnu.org Tue Apr 12 13:18:18 2016 Received: (at 23113) by debbugs.gnu.org; 12 Apr 2016 17:18:18 +0000 Received: from localhost ([127.0.0.1]:33010 helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1aq1xC-0004Oc-0j for submit@debbugs.gnu.org; Tue, 12 Apr 2016 13:18:18 -0400 Received: from mail.alumni.caltech.edu ([131.215.242.114]:53201) by debbugs.gnu.org with smtp (Exim 4.84_2) (envelope-from ) id 1aq1xA-0004OO-C7 for 23113@debbugs.gnu.org; Tue, 12 Apr 2016 13:18:16 -0400 Received: from dhcp-137-79-213-113.jpl.nasa.gov (unknown [137.79.213.113]) (Authenticated sender: madler) by mail.alumni.caltech.edu (Postfix) with ESMTPSA id 92BC1120149; Tue, 12 Apr 2016 10:18:08 -0700 (PDT) X-DKIM: Sendmail DKIM Filter v2.8.3 mail.alumni.caltech.edu 92BC1120149 DKIM-Signature: v=1; a=rsa-sha256; c=simple/simple; d=alumni.caltech.edu; s=enforce; t=1460481488; bh=5sOArlPSYSZf2DM+orz4yiXLPkGTD2+MmS08YF4WBQ4=; h=Subject:Mime-Version:Content-Type:From:In-Reply-To:Date:Cc: Content-Transfer-Encoding:Message-Id:References:To; b=gxMV5nhm+c2Rcu5fCO8ZMssUkHl2aBlWSDblxRAnLPDfy88Nhn+38Y7a9wsIMmXhw hSzjTwY30EwCEwyc3/1YelnSYUkH11Koq8biOXAi3aVpl982pR8n0CRGElYLVtXtW5 au2X44J1Il0zMNSlX9yJn+q8cm/MyD7DbDdAcxpc= Subject: Re: bug#23113: parallel gzip processes trash hard disks, need larger buffers Mime-Version: 1.0 (Mac OS X Mail 9.3 \(3124\)) Content-Type: text/plain; charset=us-ascii From: Mark Adler In-Reply-To: Date: Tue, 12 Apr 2016 10:18:08 -0700 Content-Transfer-Encoding: quoted-printable Message-Id: References: <09BF4605-5D8A-4959-98AC-AFCCF7520FCF@alumni.caltech.edu> To: "Chevreux, Bastien" X-Mailer: Apple Mail (2.3124) X-MailScanner-Information-Alumni: X-Alumni-MailScanner-ID: 92BC1120149.A1228 X-MailScanner-Alumni: No Virii found X-Spam-Status-Alumni: not spam, SpamAssassin (not cached, score=-1.1,
required 5, ALL_TRUSTED -1.00, DKIM_SIGNED 0.10, DKIM_VALID -0.10, DKIM_VALID_AU -0.10) X-MailScanner-From: madler@alumni.caltech.edu X-Spam-Score: -3.3 (---) X-Debbugs-Envelope-To: 23113 Cc: Jim Meyering , "23113@debbugs.gnu.org" <23113@debbugs.gnu.org> X-BeenThere: debbugs-submit@debbugs.gnu.org X-Mailman-Version: 2.1.18 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: debbugs-submit-bounces@debbugs.gnu.org Sender: "Debbugs-submit" X-Spam-Score: -3.3 (---) Bastien, On Apr 12, 2016, at 9:55 AM, Chevreux, Bastien wrote: > Questions: how stable / error proof is pigz compared to gzip? I always shied away from it as gzip is so much tried and tested that errors are unlikely ... and the zlib.net homepage does not make an "official" statement like "you should all now move to pigz, it's good and tested enough." Certainly with -p 1, it is nothing more than a wrapper around zlib, which itself is extensively tested. With -p > 1 it uses threads, which has been tested on many systems successfully. Though I'd wonder about how portable it really is. Unfortunately I have no way to know how widely deployed and used pigz is. (Nor do I know how widely deployed and used gzip is, but pretty widely.) > Additional question: is there a pigzlib planned? :-) I have been toying with ideas about how to provide parallel support in zlib. At this point, I'm not sure what the interface should be.
Mark From debbugs-submit-bounces@debbugs.gnu.org Tue Apr 12 16:18:47 2016 Received: (at 23113-done) by debbugs.gnu.org; 12 Apr 2016 20:18:47 +0000 Received: from localhost ([127.0.0.1]:33081 helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1aq4lq-0000LL-NN for submit@debbugs.gnu.org; Tue, 12 Apr 2016 16:18:46 -0400 Received: from mail-ob0-f195.google.com ([209.85.214.195]:36137) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1aq4lo-0000L8-Bx for 23113-done@debbugs.gnu.org; Tue, 12 Apr 2016 16:18:45 -0400 Received: by mail-ob0-f195.google.com with SMTP id rf6so1786824obc.3 for <23113-done@debbugs.gnu.org>; Tue, 12 Apr 2016 13:18:44 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113; h=mime-version:sender:in-reply-to:references:from:date:message-id :subject:to:cc:content-transfer-encoding; bh=2qSp79NhAXrrZgH8d5bTw+k3G2UUl+Jacd2lw5Fkq2k=; b=B3sLlZyFch+Kb+dNXDnhcGtDLhPJoEQZWAbGcHqkUGYj93Ipm8n6NLnu3IMWCxYx4w Zve+nDV/4inBrXrEMh01ipIO0R2S0HjF455AwyAa8zIXm9dGZviODbYNtTYMQbTM26bG TIB8XtAsC2IyJFVBs+X5ng3wKt97O2aVNpZ0Q0TuTXZI6672X4heSgoDCzLQbZ72jFIZ o3H1+YNodJKEj0iyBcDdDqkyoWH9dn9ZXdx9DCMC9Oco1maimoPnjruSa9dNBhvpRJzB hRAXRT+HjFTGaTmxypclPjkMwUQcmrlhM06YNNe0rMh+USqvzQAvdW1ATZVVPntSy3P6 M/+A== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20130820; h=x-gm-message-state:mime-version:sender:in-reply-to:references:from :date:message-id:subject:to:cc:content-transfer-encoding; bh=2qSp79NhAXrrZgH8d5bTw+k3G2UUl+Jacd2lw5Fkq2k=; b=ESP0m5vmxd1OvvVZbmFzzhSqTnGAXcNlWXkWkjoRR10b+PBTWsWWegLsutXfefCPFe kgxheSkCrzpFvzLCV2Sqg7Wdvs9tM6nQbHYB6av3mfrbXsORalND3ChTCtv2j2Iz/ZDy 5EgV6g9TFHoyypac/QdvsnN4UQaKBX8pBIWssB6J9Z8vTsUxFwflLA1b78P3mkhpQbwj 3RNjnsDhUVjFQMo+GbnvuAJPYzKeIr55w8pLK6ehOfTpdTCmV9qzS9uI5NlD+FgcD0Ol QDMpsmiUXn01aUsMJU7gYig0RfkT1AeCm3dnf1n8JlcwARs51OufdNKmDWmHPrlQLJwM 3yvw== X-Gm-Message-State: 
AOPr4FXEhVsScGTbU5KIz55YLVR3TqV88Q8LEs7QPERRYILF5KtBX7eJM+eCQ+sptETt0kaD/atlFCcG1hx8BQ== X-Received: by 10.60.67.101 with SMTP id m5mr2629215oet.19.1460492318667; Tue, 12 Apr 2016 13:18:38 -0700 (PDT) MIME-Version: 1.0 Received: by 10.202.213.141 with HTTP; Tue, 12 Apr 2016 13:18:18 -0700 (PDT) In-Reply-To: References: <09BF4605-5D8A-4959-98AC-AFCCF7520FCF@alumni.caltech.edu> From: Jim Meyering Date: Tue, 12 Apr 2016 13:18:18 -0700 X-Google-Sender-Auth: w_c9Mxh-i5v1SlOM5VaL9HX8fWg Message-ID: Subject: Re: bug#23113: parallel gzip processes trash hard disks, need larger buffers To: "Chevreux, Bastien" Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: quoted-printable X-Spam-Score: -0.5 (/) X-Debbugs-Envelope-To: 23113-done Cc: Mark Adler , 23113-done@debbugs.gnu.org X-BeenThere: debbugs-submit@debbugs.gnu.org X-Mailman-Version: 2.1.18 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: debbugs-submit-bounces@debbugs.gnu.org Sender: "Debbugs-submit" X-Spam-Score: -0.5 (/) On Tue, Apr 12, 2016 at 9:55 AM, Chevreux, Bastien wrote: > Mark, > > I knew about pigz, albeit not about -b, thank you for that. Together with -p 1 that would replicate gzip and implement input buffering well enough to be used in parallel pipelines (where you do not want, e.g., 40 pipelines running 40 pigz with 40 threads each). > > Questions: how stable / error proof is pigz compared to gzip? I always shied away from it as gzip is so much tried and tested that errors are unlikely ... and the zlib.net homepage does not make an "official" statement like "you should all now move to pigz, it's good and tested enough." Additional question: is there a pigzlib planned? :-) I expect pigz is stable enough to use with very high confidence.
Paul and I are notoriously picky about such things, and would not be considering how to deprecate gzip in favor of pigz or to make gzip a wrapper around pigz if we did not have that level of confidence. One question for Mark: do you know if pigz has been subjected to AFL's coverage-adaptive fuzzing? If not, it'd be great if someone could find the time to do that. If someone does that, please also test an ASAN-enabled binary and tell us how long the tests ran with no trace of failure. For reference, here's what happened when AFL was first applied to linux file system driver code: https://events.linuxfoundation.org/sites/events/files/slides/AFL%20filesystem%20fuzzing,%20Vault%202016.pdf. If you read nothing else, look at slide 3, with its table of file system type vs. the amount of time each driver withstood AFL-driven abuse before first failure. FYI, anyone can close one of these "issues," and I'm doing so simply by replying to the usual DDDDD@debbugs.gnu.org address, but with an inserted "-done" before the "@": 23113-done@debbugs.gnu.org From debbugs-submit-bounces@debbugs.gnu.org Tue Apr 12 16:21:53 2016 Received: (at 23113-done) by debbugs.gnu.org; 12 Apr 2016 20:21:53 +0000 Received: from localhost ([127.0.0.1]:33088 helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1aq4or-0000QN-7v for submit@debbugs.gnu.org; Tue, 12 Apr 2016 16:21:53 -0400 Received: from mail.alumni.caltech.edu ([131.215.242.114]:62234) by debbugs.gnu.org with smtp (Exim 4.84_2) (envelope-from ) id 1aq4op-0000QA-1a for 23113-done@debbugs.gnu.org; Tue, 12 Apr 2016 16:21:51 -0400 Received: from dhcp-137-79-213-113.jpl.nasa.gov (unknown [137.79.213.113]) (Authenticated sender: madler) by mail.alumni.caltech.edu (Postfix) with ESMTPSA id BBFDA120383; Tue, 12 Apr 2016 13:21:43 -0700 (PDT) X-DKIM: Sendmail DKIM Filter v2.8.3 mail.alumni.caltech.edu BBFDA120383 DKIM-Signature: v=1; a=rsa-sha256; c=simple/simple; d=alumni.caltech.edu; s=enforce; t=1460492503;
bh=lPQiPnIphA94UhtSbTFhZgCK2PVfoYKN2040ifxvjl8=; h=Subject:Mime-Version:Content-Type:From:In-Reply-To:Date:Cc: Content-Transfer-Encoding:Message-Id:References:To; b=YAhtW7hRqFapvLZPSFeuD2vIzdxuAMHFq6jZuRNs1ca5J9ne8ldb6uvVFQ5Nlu/lT khIBpV0ISg5cMqN4ZCbSySH2mQCn5I2r6IHiyhaCL2f77QL1zf21PR9A4KQwgzmfb2 IYXMUhNRlq5JVO/lwx1V45pvR1HeuQV+nqA3yH58= Subject: Re: bug#23113: parallel gzip processes trash hard disks, need larger buffers Mime-Version: 1.0 (Mac OS X Mail 9.3 \(3124\)) Content-Type: text/plain; charset=us-ascii From: Mark Adler In-Reply-To: Date: Tue, 12 Apr 2016 13:21:42 -0700 Content-Transfer-Encoding: 7bit Message-Id: <7F361DDB-CA30-419A-A019-EC0137F1F08D@alumni.caltech.edu> References: <09BF4605-5D8A-4959-98AC-AFCCF7520FCF@alumni.caltech.edu> To: Jim Meyering X-Mailer: Apple Mail (2.3124) X-MailScanner-Information-Alumni: X-Alumni-MailScanner-ID: BBFDA120383.AE598 X-MailScanner-Alumni: No Virii found X-Spam-Status-Alumni: not spam, SpamAssassin (not cached, score=-1.1, required 5, ALL_TRUSTED -1.00, DKIM_SIGNED 0.10, DKIM_VALID -0.10, DKIM_VALID_AU -0.10) X-MailScanner-From: madler@alumni.caltech.edu X-Spam-Score: -3.3 (---) X-Debbugs-Envelope-To: 23113-done Cc: "Chevreux, Bastien" , 23113-done@debbugs.gnu.org X-BeenThere: debbugs-submit@debbugs.gnu.org X-Mailman-Version: 2.1.18 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: debbugs-submit-bounces@debbugs.gnu.org Sender: "Debbugs-submit" X-Spam-Score: -3.3 (---) Jim, On Apr 12, 2016, at 1:18 PM, Jim Meyering wrote: > One question for Mark: do you know if pigz has been subjected to AFL's > coverage-adaptive fuzzing? Not that I know of. Mark From unknown Sat Jun 21 03:10:08 2025 Received: (at fakecontrol) by fakecontrolmessage; To: internal_control@debbugs.gnu.org From: Debbugs Internal Request Subject: Internal Control Message-Id: bug archived. Date: Wed, 11 May 2016 11:24:04 +0000 User-Agent: Fakemail v42.6.9 # This is a fake control message. 
# # The action: # bug archived. thanks # This fakemail brought to you by your local debbugs # administrator