GNU bug report logs - #9455
RFE: split --balanced


Package: coreutils

Reported by: Dave Yost <Dave <at> Yost.com>

Date: Wed, 7 Sep 2011 01:02:01 UTC

Severity: wishlist

Fixed in version 8.8

Done: Pádraig Brady <P <at> draigBrady.com>

Bug is archived. No further changes may be made.

To add a comment to this bug, you must first unarchive it, by sending
a message to control AT debbugs.gnu.org, with unarchive 9455 in the body.
You can then email your comments to 9455 AT debbugs.gnu.org in the normal way.



Report forwarded to owner <at> debbugs.gnu.org, bug-coreutils <at> gnu.org:
bug#9455; Package coreutils. (Wed, 07 Sep 2011 01:02:01 GMT)

Acknowledgement sent to Dave Yost <Dave <at> Yost.com>:
New bug report received and forwarded. Copy sent to bug-coreutils <at> gnu.org. (Wed, 07 Sep 2011 01:02:01 GMT)

Message #5 received at submit <at> debbugs.gnu.org:

From: Dave Yost <Dave <at> Yost.com>
To: bug-coreutils <at> gnu.org
Subject: RFE: split --balanced
Date: Tue, 6 Sep 2011 17:55:51 -0700
Z% for x in 1 2 3 4 5 6 7
for> do echo $x ; done | split --lines=3 \
pipe> && for x in x?? ; do echo "=== $x" ; cat $x ; done
=== xaa
1
2
3
=== xab
4
5
6
=== xac
7

In some applications, you would like split to more evenly apportion 
the output to the files, like this:

Z% for x in 1 2 3 4 5 6 7
for> do echo $x ; done | split --balanced --lines=3 \
pipe> && for x in x?? ; do echo "=== $x" ; cat $x ; done
=== xaa
1
2
3
=== xab
4
5
=== xac
6
7
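
One possible reading of this request, sketched under stated assumptions: pick the smallest number of files that keeps every piece within the --lines limit, then spread the lines so that piece sizes differ by at most one, with the larger pieces first. The function name balanced_split is hypothetical, and split itself has no --balanced option. The sketch names its outputs with numeric suffixes (x00, x01, ...) rather than split's xaa, xab, and it needs a regular file because it reads the input twice.

```shell
# balanced_split FILE MAXLINES [PREFIX]
# Hypothetical helper: write FILE into the fewest pieces of at most
# MAXLINES lines each, with piece sizes differing by at most one line.
balanced_split() {
    input=$1 max=$2 prefix=${3:-x}
    # Count records with awk so a missing final newline is still counted.
    n=$(awk 'END { print NR }' "$input")
    [ "$n" -gt 0 ] || return 0
    k=$(( (n + max - 1) / max ))    # ceiling of n / max
    awk -v n="$n" -v k="$k" -v prefix="$prefix" '
        # Piece i gets n/k lines; the first n%k pieces get one extra.
        function quota(i) { return int(n / k) + (i < n % k) }
        BEGIN { idx = 0; left = quota(0); out = sprintf("%s%02d", prefix, idx) }
        {
            print > out
            if (--left == 0) {      # current piece is full: move to the next
                close(out)
                idx++
                left = quota(idx)
                out = sprintf("%s%02d", prefix, idx)
            }
        }
    ' "$input"
}
```

Running `seq 1 7 > input.txt; balanced_split input.txt 3` would produce x00, x01 and x02 holding 3, 2 and 2 lines, matching the layout requested above.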

Information forwarded to owner <at> debbugs.gnu.org, bug-coreutils <at> gnu.org:
bug#9455; Package coreutils. (Wed, 07 Sep 2011 01:39:01 GMT)

Message #8 received at 9455 <at> debbugs.gnu.org:

From: Bob Proulx <bob <at> proulx.com>
To: Dave Yost <Dave <at> Yost.com>
Cc: 9455 <at> debbugs.gnu.org
Subject: Re: bug#9455: RFE: split --balanced
Date: Tue, 6 Sep 2011 19:34:18 -0600
severity 9455 wishlist
thanks

Hi Dave!

Dave Yost wrote:
> Z% for x in 1 2 3 4 5 6 7
> for> do echo $x ; done | split --lines=3 \
> pipe> && for x in x?? ; do echo "=== $x" ; cat $x ; done

Sure.  Just FYI, GNU seq can produce sequences of numbers very
easily.  I think this is a slightly more concise example:

  seq 1 7 | split --lines=3
  head x??

But do you always mean stdin?  Or is the input usually a file?  Because...

> In some applications, you would like split to more evenly apportion
> the output to the files, like this:
> 
> Z% for x in 1 2 3 4 5 6 7
> for> do echo $x ; done | split --balanced --lines=3 \
> pipe> && for x in x?? ; do echo "=== $x" ; cat $x ; done
> === xaa
> 1
> 2
> 3
> === xab
> 4
> 5
> === xac
> 6
> 7

I think it would be really hard to know if the user wanted the extra
lines in the first file.  It would be easier to gather that last widow
line up into the next to last file.  Or easier to leave it alone in
its own file.

  seq 1 7 > input.txt
  numsplits=3
  num=$(wc -l < input.txt)
  perfile=$(($num / $numsplits))   # integer division: 7 / 3 = 2
  split --lines=$perfile < input.txt
  head x*                          # four files: 2, 2, 2 and 1 lines

  seq 1 17 > input.txt
  numsplits=3
  num=$(wc -l < input.txt)
  perfile=$(($num / $numsplits))   # integer division: 17 / 3 = 5
  split --lines=$perfile < input.txt
  head x*                          # four files: 5, 5, 5 and 2 lines

And from there you could get creative and make a decision based upon
the number of lines in that last file.

  if [ $(($num % $perfile)) -lt $(($perfile / 2)) ]; then ...

If the widow lines number fewer than half the lines per file, then that
last file could be concatenated into the next to last file, the
implementation of which I will leave as an exercise.  Just as an
example.  Or they could be put in the first file.  But I think what the
user would want in something like this is so varied that there isn't
any one natural result.  So I think this is better left to the caller
to decide.

Bob
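
The merge step left as an exercise above might look like the following minimal sketch. It assumes the pieces use split's default two-letter suffixes and that no unrelated files match the prefix. The name split_merge_widow and its arguments are hypothetical, not part of coreutils.

```shell
# split_merge_widow FILE LINES [PREFIX]
# Hypothetical helper: split FILE into pieces of LINES lines, then fold a
# short trailing piece (fewer than half of LINES) into the piece before it.
split_merge_widow() {
    input=$1 perfile=$2 prefix=${3:-x}
    split -l "$perfile" "$input" "$prefix" || return 1
    # Glob results sort lexically (xaa, xab, ...), so the last two
    # positional parameters are the next-to-last and last pieces.
    set -- "$prefix"??
    [ "$#" -ge 2 ] || return 0
    while [ "$#" -gt 2 ]; do shift; done
    prev=$1 last=$2
    widow=$(wc -l < "$last")
    # Compare 2 * widow with perfile to avoid integer-division rounding.
    if [ $((widow * 2)) -lt "$perfile" ]; then
        cat "$last" >> "$prev" && rm -- "$last"
    fi
}
```

With `seq 1 7 > input.txt; split_merge_widow input.txt 3`, the one-line piece xac would be appended to xab, leaving xaa with 3 lines and xab with 4.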




Severity set to 'wishlist' from 'normal'. Request was from Bob Proulx <bob <at> proulx.com> to control <at> debbugs.gnu.org. (Wed, 07 Sep 2011 01:39:01 GMT)

Information forwarded to owner <at> debbugs.gnu.org, bug-coreutils <at> gnu.org:
bug#9455; Package coreutils. (Wed, 07 Sep 2011 08:04:01 GMT)

Message #13 received at 9455 <at> debbugs.gnu.org:

From: Pádraig Brady <P <at> draigBrady.com>
To: Dave Yost <Dave <at> Yost.com>
Cc: 9455 <at> debbugs.gnu.org
Subject: Re: bug#9455: RFE: split --balanced
Date: Wed, 07 Sep 2011 08:59:16 +0100
On 09/07/2011 01:55 AM, Dave Yost wrote:
> Z% for x in 1 2 3 4 5 6 7
> for> do echo $x ; done | split --lines=3 \
> pipe> && for x in x?? ; do echo "=== $x" ; cat $x ; done
> === xaa
> 1
> 2
> 3
> === xab
> 4
> 5
> 6
> === xac
> 7
> 
> In some applications, you would like split to more evenly apportion the output to the files, like this:
> 
> Z% for x in 1 2 3 4 5 6 7
> for> do echo $x ; done | split --balanced --lines=3 \
> pipe> && for x in x?? ; do echo "=== $x" ; cat $x ; done
> === xaa
> 1
> 2
> 3
> === xab
> 4
> 5
> === xac
> 6
> 7
> 

So you'd like to distribute evenly across the last 2 buckets.
It seems like it would be more general to specify the number of buckets
instead and let split balance across them all, which split supports as of
version 8.8 through its -n option:

$ seq 7 | split -nr/3; tail x??
==> xaa <==
1
4
7

==> xab <==
2
5

==> xac <==
3
6

$ seq 7 > 7; split -nl/3 7; tail x??
==> xaa <==
1
2

==> xab <==
3
4

==> xac <==
5
6
7

Would that suffice?

cheers,
Pádraig.
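
The piece sizes in the round-robin (r/3) output above follow from integer arithmetic: with N lines dealt out to K files, every file receives N/K lines and the first N%K files receive one more. A small sketch of that calculation follows; balanced_sizes is a hypothetical helper, not a split feature. The l/3 form divides the input by bytes without breaking lines, so its line counts need not follow this pattern.

```shell
# balanced_sizes N K
# Hypothetical helper: print how many lines each of K files receives when
# N lines are dealt out round-robin, one count per output line.
balanced_sizes() {
    n=$1 k=$2 i=0
    while [ "$i" -lt "$k" ]; do
        # Every file gets n/k lines; the first n%k files get one extra.
        if [ "$i" -lt $((n % k)) ]; then
            echo $((n / k + 1))
        else
            echo $((n / k))
        fi
        i=$((i + 1))
    done
}
```

For example, `balanced_sizes 7 3` prints 3, 2 and 2, the same sizes as xaa, xab and xac in the r/3 run.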




Information forwarded to owner <at> debbugs.gnu.org, bug-coreutils <at> gnu.org:
bug#9455; Package coreutils. (Wed, 07 Sep 2011 12:36:02 GMT)

Message #16 received at 9455 <at> debbugs.gnu.org:

From: Dave Yost <Dave <at> Yost.com>
To: Pádraig Brady <P <at> draigBrady.com>
Cc: "9455 <at> debbugs.gnu.org" <9455 <at> debbugs.gnu.org>
Subject: Re: bug#9455: RFE: split --balanced
Date: Wed, 7 Sep 2011 05:31:26 -0700
Much better design. Thanks.


/ from my iPhone 4 /





Bug marked as fixed in version 8.8; send any further explanations to 9455 <at> debbugs.gnu.org and Dave Yost <Dave <at> Yost.com>. Request was from Pádraig Brady <P <at> draigBrady.com> to control <at> debbugs.gnu.org. (Wed, 07 Sep 2011 12:53:01 GMT)

Bug archived. Request was from Debbugs Internal Request <help-debbugs <at> gnu.org> to internal_control <at> debbugs.gnu.org. (Thu, 06 Oct 2011 11:24:03 GMT)




GNU bug tracking system
Copyright (C) 1999 Darren O. Benham, 1997,2003 nCipher Corporation Ltd, 1994-97 Ian Jackson.