From: Dave Yost
To: 9455@debbugs.gnu.org
Date: Tue, 6 Sep 2011 17:55:51 -0700
Subject: bug#9455: RFE: split --balanced

Z% for x in 1 2 3 4 5 6 7
for> do echo $x ; done | split --lines=3 \
pipe> && for x in x?? ; do echo "=== $x" ; cat $x ; done
=== xaa
1
2
3
=== xab
4
5
6
=== xac
7

In some applications, you would like split to more evenly apportion the output to the files, like this:

Z% for x in 1 2 3 4 5 6 7
for> do echo $x ; done | split --balanced --lines=3 \
pipe> && for x in x?? ; do echo "=== $x" ; cat $x ; done
=== xaa
1
2
3
=== xab
4
5
=== xac
6
7
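
The apportioning requested above can be approximated with standard tools. The following is only a sketch under stated assumptions: `--balanced` is not an existing split option, and the `partNN` file names, the variable names, and the rule "keep the file count that `--lines=3` would give, then spread the lines as evenly as possible" are illustrative choices inferred from the example output.

```shell
#!/bin/sh
# Emulates the layout shown in the request. "--balanced" is NOT a real
# split option; the partNN names and all variables here are hypothetical.
rm -f input.txt part[0-9][0-9]

seq 1 7 > input.txt
max=3                                  # same limit as --lines=3
total=$(wc -l < input.txt)             # 7 lines of input
files=$(( (total + max - 1) / max ))   # ceil(7/3) = 3 output files
base=$(( total / files ))              # every file gets at least 2 lines
extra=$(( total % files ))             # the first 1 file gets one more line

awk -v base="$base" -v extra="$extra" '
  BEGIN { n = 0; left = 0 }
  left == 0 {                          # current file is full: start the next
    if (out != "") close(out)
    out = sprintf("part%02d", n)
    left = base + (n < extra ? 1 : 0)
    n++
  }
  { print > out; left-- }
' input.txt

wc -l part[0-9][0-9]
```

For the seven-line input this yields files of 3, 2 and 2 lines, the same shape as the `xaa`, `xab`, `xac` listing in the request.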

From: Bob Proulx
To: Dave Yost
Cc: 9455@debbugs.gnu.org
Date: Tue, 6 Sep 2011 19:34:18 -0600
Subject: bug#9455: RFE: split --balanced
Message-ID: <20110907013418.GA19585@hysteria.proulx.com>

severity 9455 wishlist
thanks

Hi Dave!

Dave Yost wrote:
> Z% for x in 1 2 3 4 5 6 7
> for> do echo $x ; done | split --lines=3 \
> pipe> && for x in x?? ; do echo "=== $x" ; cat $x ; done

Sure.  Just fyi but GNU seq can produce sequences of numbers very
easily.  I think this is a little more concise example.

  seq 1 7 | split --lines=3
  head *

But do you always mean stdin?  Or mostly is this a file?  Because...

> In some applications, you would like split to more evenly apportion
> the output to the files, like this:
>
> Z% for x in 1 2 3 4 5 6 7
> for> do echo $x ; done | split --balanced --lines=3 \
> pipe> && for x in x?? ; do echo "=== $x" ; cat $x ; done
> === xaa
> 1
> 2
> 3
> === xab
> 4
> 5
> === xac
> 6
> 7

I think it would be really hard to know if the user wanted the extra
lines in the first file.  It would be easier to gather that last widow
line up into the next to last file.  Or easier to leave it alone in
its own file.

  seq 1 7 > input.txt
  numsplits=3
  num=$(wc -l < input.txt)
  perfile=$(($num / $numsplits))
  split --lines=$perfile < input.txt
  head x*

  seq 1 17 > input.txt
  numsplits=3
  num=$(wc -l < input.txt)
  perfile=$(($num / $numsplits))
  split --lines=$perfile < input.txt
  head x*

And from there you could get creative and make a decision based upon
the number of lines in that last file.

  if [ $(($num % $numsplits)) -lt $(($num / 2)) ]; then
  ...

If the widow lines are less than half the number of total lines in the
files then that last file could be concatenated into the next to last
file, the implementation of which I will leave as an exercise.  Just
as an example.  Or they could be put in the first file.  But I think
what the user would want in something like this is so varied that
there isn't any one natural result.  So I think this is better left to
the caller to decide.

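
One possible reading of the exercise mentioned above, offered as a minimal sketch rather than anything posted in the thread: split at `num / numsplits` lines per file, and if that produces more files than requested, append the short trailing file to the one before it. The `bob_` prefix, the `count`/`prev`/`last` variables, and the "more files than requested" merge rule are assumptions made for illustration. `-l` is the short form of `--lines`.

```shell
#!/bin/sh
# Sketch: fold a short trailing file into the next-to-last one.
# The bob_ prefix and the merge rule are illustrative, not from the thread.
rm -f input.txt bob_??

seq 1 7 > input.txt
numsplits=3
num=$(wc -l < input.txt)
perfile=$(( num / numsplits ))           # 7 / 3 = 2 lines per file

split -l "$perfile" input.txt bob_       # bob_aa .. bob_ad; bob_ad holds "7"

# Walk the sorted names to count the files and remember the last two.
count=0 prev= last=
for f in bob_??; do
  prev=$last
  last=$f
  count=$(( count + 1 ))
done

# More files than requested means the last one holds the leftover lines.
if [ "$count" -gt "$numsplits" ]; then
  cat "$last" >> "$prev" && rm "$last"
fi

wc -l bob_??
```

With the seven-line input this leaves three files of 2, 2 and 3 lines. The sketch assumes the input has at least `numsplits` lines; otherwise `perfile` is 0 and split rejects it.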
Bob

From: Pádraig Brady
To: Dave Yost
Cc: 9455@debbugs.gnu.org
Date: Wed, 07 Sep 2011 08:59:16 +0100
Subject: bug#9455: RFE: split --balanced
Message-ID: <4E672454.9000507@draigBrady.com>

On 09/07/2011 01:55 AM, Dave Yost wrote:
> Z% for x in 1 2 3 4 5 6 7
> for> do echo $x ; done | split --lines=3 \
> pipe> && for x in x?? ; do echo "=== $x" ; cat $x ; done
> === xaa
> 1
> 2
> 3
> === xab
> 4
> 5
> 6
> === xac
> 7
>
> In some applications, you would like split to more evenly apportion the output to the files, like this:
>
> Z% for x in 1 2 3 4 5 6 7
> for> do echo $x ; done | split --balanced --lines=3 \
> pipe> && for x in x?? ; do echo "=== $x" ; cat $x ; done
> === xaa
> 1
> 2
> 3
> === xab
> 4
> 5
> === xac
> 6
> 7
>

So you'd like to distribute evenly across the last 2 buckets.
It seems like it would be more general to specify the number of buckets
instead and let split balance across them all, which is supported recently.

$ seq 7 | split -nr/3; tail x??
==> xaa <==
1
4
7

==> xab <==
2
5

==> xac <==
3
6

$ seq 7 > 7; split -nl/3 7; tail x??
==> xaa <==
1
2

==> xab <==
3
4

==> xac <==
5
6
7

Would that suffice?

cheers,
Pádraig.

From: Dave Yost
To: Pádraig Brady
Cc: 9455@debbugs.gnu.org
Date: Wed, 7 Sep 2011 05:31:26 -0700
Subject: bug#9455: RFE: split --balanced
Message-Id: <202F3176-9246-4258-980D-9B145123F11F@Yost.com>
In-Reply-To: <4E672454.9000507@draigBrady.com>

Much better design. Thanks.

From: Pádraig Brady
To: control@debbugs.gnu.org
Date: Wed, 07 Sep 2011 13:49:02 +0100
Subject: closing 9455
Message-ID: <4E67683E.3020009@draigBrady.com>

package coreutils
close 9455 8.8
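
For anyone wanting to try the `-n` modes the thread settled on, here is a small sketch. It assumes GNU split new enough to support `-n` (the closing message records version 8.8); the `seven.txt` file and the `rr_`/`ln_` prefixes are arbitrary names chosen so the two runs do not overwrite each other.

```shell
#!/bin/sh
# Assumes GNU split with -n support; file names and prefixes are arbitrary.
rm -f seven.txt rr_?? ln_??

seq 7 > seven.txt

# r/3: deal whole lines out round-robin to 3 files.
split -n r/3 seven.txt rr_
head rr_??

# l/3: 3 files of roughly equal byte size, without breaking any line.
split -n l/3 seven.txt ln_
wc -l ln_??
```

`r/3` balances by line count, so the files differ by at most one line. `l/3` balances by byte size while keeping lines whole, so its per-file line counts depend on line lengths and should be checked rather than assumed.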