GNU bug report logs - #20725
Shuf problem


Package: coreutils;

Reported by: Federico Alves <venefax <at> gmail.com>

Date: Wed, 3 Jun 2015 13:21:03 UTC

Severity: normal

Tags: wontfix

Done: Pádraig Brady <P <at> draigBrady.com>

Bug is archived. No further changes may be made.

To add a comment to this bug, you must first unarchive it, by sending
a message to control AT debbugs.gnu.org, with unarchive 20725 in the body.
You can then email your comments to 20725 AT debbugs.gnu.org in the normal way.




Report forwarded to bug-coreutils <at> gnu.org:
bug#20725; Package coreutils. (Wed, 03 Jun 2015 13:21:03 GMT)

Acknowledgement sent to Federico Alves <venefax <at> gmail.com>:
New bug report received and forwarded. Copy sent to bug-coreutils <at> gnu.org. (Wed, 03 Jun 2015 13:21:03 GMT)

Message #5 received at submit <at> debbugs.gnu.org:

From: Federico Alves <venefax <at> gmail.com>
To: bug-coreutils <at> gnu.org
Subject: Shuf problem
Date: Wed, 3 Jun 2015 09:20:15 -0400
I think that shuf should have an option, maybe set ON by default, to skip
empty lines in a file when shuffling it. If a file has 100 lines and ten
are simply returns, 99% of the time I do not want an empty line; I want
only the real lines.

Yours

Federico Alves

Information forwarded to bug-coreutils <at> gnu.org:
bug#20725; Package coreutils. (Wed, 03 Jun 2015 13:34:03 GMT)

Message #8 received at 20725 <at> debbugs.gnu.org:

From: Pádraig Brady <P <at> draigBrady.com>
To: Federico Alves <venefax <at> gmail.com>, 20725 <at> debbugs.gnu.org
Subject: Re: bug#20725: Shuf problem
Date: Wed, 03 Jun 2015 14:33:35 +0100
On 03/06/15 14:20, Federico Alves wrote:
> I think that shuf should have an option, maybe set ON by default, to skip empty lines in a file when shuffling it. If a file has 100 lines and ten are simply returns, 99% of the time I do not want an empty line; I want only the real lines.

We would only consider that if it provided functional benefits
or large performance gains.  Neither would be the case here I think
compared to some simple preprocessing like:

 sed '/^$/d' | shuf

cheers,
Pádraig
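A minimal sketch of the preprocessing approach, assuming GNU sed and GNU shuf. The function name drop_blank_lines is hypothetical. One caveat: '/^$/d' deletes only lines that are truly empty, so a file with CRLF line endings, where each "blank" line holds a lone carriage return, keeps those lines. The bracket expression below also deletes lines that contain only whitespace, carriage returns included.

```shell
#!/bin/sh
# Hypothetical helper: delete empty and whitespace-only lines.
# [[:space:]] includes space, tab and carriage return, so the
# "\r" lines of a CRLF file are removed as well.
drop_blank_lines() {
  sed '/^[[:space:]]*$/d' "$@"
}

# Usage (input.txt is a placeholder name):
#   drop_blank_lines input.txt | shuf
#   some_command | drop_blank_lines | shuf
```

sed processes its input as a stream, so the extra pipeline stage does not need a temporary copy of the file.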




Information forwarded to bug-coreutils <at> gnu.org:
bug#20725; Package coreutils. (Wed, 03 Jun 2015 14:01:02 GMT)

Message #11 received at submit <at> debbugs.gnu.org:

From: Stephane Chazelas <stephane.chazelas <at> gmail.com>
To: bug-coreutils <at> gnu.org
Subject: Re: bug#20725: Shuf problem
Date: Wed, 3 Jun 2015 14:58:58 +0100
2015-06-03 14:33:35 +0100, Pádraig Brady:
> On 03/06/15 14:20, Federico Alves wrote:
> > I think that shuf should have an option, maybe set ON by default, to skip empty lines in a file when shuffling it. If a file has 100 lines and ten are simply returns, 99% of the time I do not want an empty line; I want only the real lines.
> 
> We would only consider that if it provided functional benefits
> or large performance gains.  Neither would be the case here I think
> compared to some simple preprocessing like:
> 
>  sed '/^$/d' | shuf
[...]

Or 

grep -v '^$' | shuf

or

grep . | shuf

(that one will also exclude non-empty lines made up only of
byte sequences that don't form valid characters in the current
locale).

-- 
Stephane
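Whether '.' matches a given byte depends on the locale. As a sketch, assuming GNU grep: in the C locale every byte is a valid single-byte character, so forcing LC_ALL=C makes 'grep .' drop only the truly empty lines, just like grep -v '^$'. The function name shuf_nonempty is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: shuffle only the non-empty lines.
# LC_ALL=C applies to grep alone; in the C locale each byte is a
# character, so lines made of bytes that are invalid in the user's
# locale (e.g. stray Latin-1 bytes in a UTF-8 locale) are kept.
shuf_nonempty() {
  LC_ALL=C grep . "$@" | shuf
}

# Usage (input.txt is a placeholder name):
#   shuf_nonempty input.txt
```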





Information forwarded to bug-coreutils <at> gnu.org:
bug#20725; Package coreutils. (Wed, 03 Jun 2015 14:35:03 GMT)

Message #14 received at 20725 <at> debbugs.gnu.org:

From: Pádraig Brady <P <at> draigBrady.com>
To: Federico Alves <venefax <at> gmail.com>
Cc: 20725 <at> debbugs.gnu.org
Subject: Re: bug#20725: Shuf problem
Date: Wed, 03 Jun 2015 15:34:50 +0100
tag 20725 wontfix
close 20725
stop

On 03/06/15 14:40, Federico Alves wrote:
> Dear Padraig
> I think this is exactly the case. Please consider a very large file. It would be very inefficient to use sed first and then shuf. A simple switch in shuf would be way better.
> It should be ON by default, I believe.

For a large file shuf will use reservoir sampling as it does for pipes,
and so there should not be significant differences in the base
operation on files and pipes.  Also shuffling a large file containing
blank lines seems like a slightly unusual case, and thus not worth
complicating the interface with another option.

thanks,
Pádraig.
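To illustrate the reservoir-sampling idea mentioned above, here is a minimal sketch of reservoir sampling (Algorithm R) in awk that also skips empty lines in the same pass. It is a hypothetical helper, sample_nonempty, not shuf's own implementation: it holds at most k lines in memory however large the input is. The reservoir's final order is not a uniformly random permutation (with fewer than k lines it is simply input order), so the sketch pipes it through shuf.

```shell
#!/bin/sh
# Hypothetical helper: print a uniform random sample of up to k
# non-empty lines from stdin (or the named files) using reservoir
# sampling (Algorithm R), keeping only k lines in memory.
sample_nonempty() {
  k=$1
  shift
  awk -v k="$k" '
    BEGIN { srand() }           # seed from the time of day
    length($0) == 0 { next }    # skip empty lines in the same pass
    {
      n++
      if (n <= k) {
        r[n] = $0               # fill the reservoir with the first k lines
      } else {
        j = int(rand() * n) + 1 # uniform integer in 1..n
        if (j <= k) r[j] = $0   # keep line n with probability k/n
      }
    }
    END {
      m = (n < k) ? n : k
      for (i = 1; i <= m; i++) print r[i]
    }
  ' "$@" | shuf                 # reservoir order is not random; shuffle it
}
```

For example, sample_nonempty 10 input.txt would print ten random non-empty lines from a placeholder file input.txt. awk's srand() without an argument seeds from the time of day, typically at one-second resolution, so two runs within the same second may produce the same sample.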





Added tag(s) wontfix. Request was from Pádraig Brady <P <at> draigBrady.com> to control <at> debbugs.gnu.org. (Wed, 03 Jun 2015 14:49:02 GMT)

Bug closed; send any further explanations to 20725 <at> debbugs.gnu.org and Federico Alves <venefax <at> gmail.com>. Request was from Pádraig Brady <P <at> draigBrady.com> to control <at> debbugs.gnu.org. (Wed, 03 Jun 2015 14:49:02 GMT)

Bug archived. Request was from Debbugs Internal Request <help-debbugs <at> gnu.org> to internal_control <at> debbugs.gnu.org. (Thu, 02 Jul 2015 11:24:05 GMT)




GNU bug tracking system
Copyright (C) 1999 Darren O. Benham, 1997,2003 nCipher Corporation Ltd, 1994-97 Ian Jackson.