From unknown Sun Jun 22 07:29:54 2025 Content-Disposition: inline Content-Transfer-Encoding: quoted-printable MIME-Version: 1.0 X-Mailer: MIME-tools 5.509 (Entity 5.509) Content-Type: text/plain; charset=utf-8 From: bug#20725 <20725@debbugs.gnu.org> To: bug#20725 <20725@debbugs.gnu.org> Subject: Status: Shuf problem Reply-To: bug#20725 <20725@debbugs.gnu.org> Date: Sun, 22 Jun 2025 14:29:54 +0000 retitle 20725 Shuf problem reassign 20725 coreutils submitter 20725 Federico Alves severity 20725 normal tag 20725 wontfix thanks From debbugs-submit-bounces@debbugs.gnu.org Wed Jun 03 09:20:31 2015 Received: (at submit) by debbugs.gnu.org; 3 Jun 2015 13:20:31 +0000 Received: from localhost ([127.0.0.1]:38327 helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.80) (envelope-from ) id 1Z08at-0000uM-6Y for submit@debbugs.gnu.org; Wed, 03 Jun 2015 09:20:31 -0400 Received: from eggs.gnu.org ([208.118.235.92]:50908) by debbugs.gnu.org with esmtp (Exim 4.80) (envelope-from ) id 1Z08aq-0000u8-KL for submit@debbugs.gnu.org; Wed, 03 Jun 2015 09:20:29 -0400 Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71) (envelope-from ) id 1Z08ah-0007tt-CV for submit@debbugs.gnu.org; Wed, 03 Jun 2015 09:20:23 -0400 X-Spam-Checker-Version: SpamAssassin 3.3.2 (2011-06-06) on eggs.gnu.org X-Spam-Level: X-Spam-Status: No, score=-1.9 required=5.0 tests=BAYES_00,FREEMAIL_FROM, HTML_MESSAGE,T_DKIM_INVALID autolearn=disabled version=3.3.2 Received: from lists.gnu.org ([208.118.235.17]:36885) by eggs.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1Z08ah-0007tl-A4 for submit@debbugs.gnu.org; Wed, 03 Jun 2015 09:20:19 -0400 Received: from eggs.gnu.org ([2001:4830:134:3::10]:36163) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1Z08ag-00055t-FV for bug-coreutils@gnu.org; Wed, 03 Jun 2015 09:20:19 -0400 Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71) (envelope-from ) id 1Z08af-0007sq-IS for bug-coreutils@gnu.org; Wed, 03 Jun 2015 
09:20:18 -0400 Received: from mail-wi0-x22b.google.com ([2a00:1450:400c:c05::22b]:33297) by eggs.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1Z08af-0007rj-Bb for bug-coreutils@gnu.org; Wed, 03 Jun 2015 09:20:17 -0400 Received: by wiwd19 with SMTP id d19so52464404wiw.0 for ; Wed, 03 Jun 2015 06:20:15 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113; h=mime-version:date:message-id:subject:from:to:content-type; bh=Pf1Zcwu8m9x3MA91I5obsOpxb5bJHfcr8oEzL6Sl6iA=; b=KxEYsj6kYy/vdbvm7YCyJ5txbDfIyHuxn+HAkfhIGoo1UV2cfjDr7W1XGfPr5NWyl5 YqUd5cp/p9MyZki5NNM0DqR3k38uOGf/eLVmM3BxwpBbiMvAQ1B60ZUjzox6uO+VyXdW h2d/S2soL5N05SWFDZq9Ws9ezq04M+yVUV3IkmItYqO1NERg4OTIbPwv4mwNNlcyXYF6 Xh8XwHmk4O8wx/1WLQbxZcPoWvqXWoen1Zl0r1n7tP4rAtsV0RgFW0e6xvGEyiXoRVa1 hAiUX9okLDvdz9ofSLrUDoBZbRbunMG2QPjneg+HuEPyJvQs4cpDx3RYCKgs6u/ZDFJe a5tA== MIME-Version: 1.0 X-Received: by 10.180.75.48 with SMTP id z16mr40595554wiv.49.1433337615461; Wed, 03 Jun 2015 06:20:15 -0700 (PDT) Received: by 10.180.195.36 with HTTP; Wed, 3 Jun 2015 06:20:15 -0700 (PDT) Date: Wed, 3 Jun 2015 09:20:15 -0400 Message-ID: Subject: Shuf problem From: Federico Alves To: bug-coreutils@gnu.org Content-Type: multipart/alternative; boundary=f46d0438951969ccb305179cebd7 X-detected-operating-system: by eggs.gnu.org: Error: Malformed IPv6 address (bad octet value). X-detected-operating-system: by eggs.gnu.org: GNU/Linux 2.6.x X-Received-From: 208.118.235.17 X-Spam-Score: -4.0 (----) X-Debbugs-Envelope-To: submit X-BeenThere: debbugs-submit@debbugs.gnu.org X-Mailman-Version: 2.1.15 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: debbugs-submit-bounces@debbugs.gnu.org Sender: "Debbugs-submit" X-Spam-Score: -4.0 (----) --f46d0438951969ccb305179cebd7 Content-Type: text/plain; charset=UTF-8 I think that shuf should have an option, may set ON by default, to avoid empty lines in a file when shuffling it. 
if a file has 100 lines and ten are simply returns, 99% of the time I do not want an empty line, I want only the real lines.

Yours
Federico Alves

--f46d0438951969ccb305179cebd7 Content-Type: text/html; charset=UTF-8 Content-Transfer-Encoding: quoted-printable
--f46d0438951969ccb305179cebd7-- From debbugs-submit-bounces@debbugs.gnu.org Wed Jun 03 09:33:49 2015 Received: (at 20725) by debbugs.gnu.org; 3 Jun 2015 13:33:50 +0000 Received: from localhost ([127.0.0.1]:38333 helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.80) (envelope-from ) id 1Z08nk-0001Eu-Ap for submit@debbugs.gnu.org; Wed, 03 Jun 2015 09:33:48 -0400 Received: from mx1.redhat.com ([209.132.183.28]:45698) by debbugs.gnu.org with esmtp (Exim 4.80) (envelope-from ) id 1Z08nc-0001Ef-SU for 20725@debbugs.gnu.org; Wed, 03 Jun 2015 09:33:42 -0400 Received: from int-mx10.intmail.prod.int.phx2.redhat.com (int-mx10.intmail.prod.int.phx2.redhat.com [10.5.11.23]) by mx1.redhat.com (Postfix) with ESMTPS id 7C4CA312D3C; Wed, 3 Jun 2015 13:33:38 +0000 (UTC) Received: from localhost.localdomain (ovpn-116-124.ams2.redhat.com [10.36.116.124]) by int-mx10.intmail.prod.int.phx2.redhat.com (8.14.4/8.14.4) with ESMTP id t53DXZ8a025483 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=NO); Wed, 3 Jun 2015 09:33:37 -0400 Message-ID: <556F022F.60106@draigBrady.com> Date: Wed, 03 Jun 2015 14:33:35 +0100 From: =?UTF-8?B?UMOhZHJhaWcgQnJhZHk=?= User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:31.0) Gecko/20100101 Thunderbird/31.6.0 MIME-Version: 1.0 To: Federico Alves , 20725@debbugs.gnu.org Subject: Re: bug#20725: Shuf problem References: In-Reply-To: Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: 8bit X-Scanned-By: MIMEDefang 2.68 on 10.5.11.23 X-Spam-Score: -5.0 (-----) X-Debbugs-Envelope-To: 20725 X-BeenThere: debbugs-submit@debbugs.gnu.org X-Mailman-Version: 2.1.15 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: debbugs-submit-bounces@debbugs.gnu.org Sender: "Debbugs-submit" X-Spam-Score: -5.0 (-----) On 03/06/15 14:20, Federico Alves wrote: > I think that shuf should have an option, may set ON by default, to avoid empty lines in a file when shuffling it. 
if a file has 100 lines and ten are simply returns, 99% of the time I do not want an empty line, I want only the real lines.

We would only consider that if it provided functional benefits
or large performance gains. Neither would be the case here I think
compared to some simple preprocessing like:

  sed '/^$/d' | shuf

cheers,
Pádraig

From debbugs-submit-bounces@debbugs.gnu.org Wed Jun 03 10:00:33 2015 Received: (at submit) by debbugs.gnu.org; 3 Jun 2015 14:00:33 +0000 Received: from localhost ([127.0.0.1]:39124 helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.80) (envelope-from ) id 1Z09Dc-00021b-Kp for submit@debbugs.gnu.org; Wed, 03 Jun 2015 10:00:33 -0400 Received: from eggs.gnu.org ([208.118.235.92]:38943) by debbugs.gnu.org with esmtp (Exim 4.80) (envelope-from ) id 1Z09Da-00021N-Am for submit@debbugs.gnu.org; Wed, 03 Jun 2015 10:00:31 -0400 Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71) (envelope-from ) id 1Z09DU-0001MP-Fq for submit@debbugs.gnu.org; Wed, 03 Jun 2015 10:00:25 -0400 X-Spam-Checker-Version: SpamAssassin 3.3.2 (2011-06-06) on eggs.gnu.org X-Spam-Level: *** X-Spam-Status: No, score=3.3 required=5.0 tests=BAYES_50,FREEMAIL_FROM, TO_NO_BRKTS_PCNT autolearn=disabled version=3.3.2 Received: from lists.gnu.org ([2001:4830:134:3::11]:59125) by eggs.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1Z09DU-0001MG-Do for submit@debbugs.gnu.org; Wed, 03 Jun 2015 10:00:24 -0400 Received: from eggs.gnu.org ([2001:4830:134:3::10]:52440) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1Z09DT-0002C9-Ba for bug-coreutils@gnu.org; Wed, 03 Jun 2015 10:00:24 -0400 Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71) (envelope-from ) id 1Z09DP-0001I2-H4 for bug-coreutils@gnu.org; Wed, 03 Jun 2015 10:00:23 -0400 Received: from plane.gmane.org ([80.91.229.3]:37959) by eggs.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1Z09DP-0001HY-BY for bug-coreutils@gnu.org; Wed, 03 Jun 2015 10:00:19 -0400
Received: from list by plane.gmane.org with local (Exim 4.69) (envelope-from ) id 1Z09DH-0002vC-Q8 for bug-coreutils@gnu.org; Wed, 03 Jun 2015 16:00:11 +0200 Received: from 05448b1b.skybroadband.com ([5.68.139.27]) by main.gmane.org with esmtp (Gmexim 0.1 (Debian)) id 1AlnuQ-0007hv-00 for ; Wed, 03 Jun 2015 16:00:11 +0200 Received: from stephane.chazelas by 05448b1b.skybroadband.com with local (Gmexim 0.1 (Debian)) id 1AlnuQ-0007hv-00 for ; Wed, 03 Jun 2015 16:00:11 +0200 X-Injected-Via-Gmane: http://gmane.org/ To: bug-coreutils@gnu.org From: Stephane Chazelas Subject: Re: bug#20725: Shuf problem Date: Wed, 3 Jun 2015 14:58:58 +0100 Lines: 24 Message-ID: <20150603135858.GA4752@chaz.gmail.com> References: <556F022F.60106@draigBrady.com> Mime-Version: 1.0 Content-Type: text/plain; charset=iso-8859-1 Content-Transfer-Encoding: 8bit X-Complaints-To: usenet@ger.gmane.org X-Gmane-NNTP-Posting-Host: 05448b1b.skybroadband.com Content-Disposition: inline In-Reply-To: <556F022F.60106@draigBrady.com> User-Agent: Mutt/1.5.21 (2010-09-15) X-detected-operating-system: by eggs.gnu.org: Genre and OS details not recognized. X-detected-operating-system: by eggs.gnu.org: Error: Malformed IPv6 address (bad octet value). X-Received-From: 2001:4830:134:3::11 X-Spam-Score: -1.6 (-) X-Debbugs-Envelope-To: submit X-BeenThere: debbugs-submit@debbugs.gnu.org X-Mailman-Version: 2.1.15 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: debbugs-submit-bounces@debbugs.gnu.org Sender: "Debbugs-submit" X-Spam-Score: -4.1 (----)

2015-06-03 14:33:35 +0100, Pádraig Brady:
> On 03/06/15 14:20, Federico Alves wrote:
> > I think that shuf should have an option, may set ON by default, to avoid empty lines in a file when shuffling it. if a file has 100 lines and ten are simply returns, 99% of the time I do not want an empty line, I want only the real lines.
>
> We would only consider that if it provided functional benefits
> or large performance gains. Neither would be the case here I think
> compared to some simple preprocessing like:
>
>   sed '/^$/d' | shuf
[...]

Or

  grep -v '^$' | shuf

or

  grep . | shuf

(that one will also exclude lines that contain sequences of
bytes that don't form any valid characters).

-- 
Stephane

From debbugs-submit-bounces@debbugs.gnu.org Wed Jun 03 10:34:58 2015 Received: (at 20725) by debbugs.gnu.org; 3 Jun 2015 14:34:58 +0000 Received: from localhost ([127.0.0.1]:39134 helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.80) (envelope-from ) id 1Z09kv-0002pr-Tb for submit@debbugs.gnu.org; Wed, 03 Jun 2015 10:34:58 -0400 Received: from mx1.redhat.com ([209.132.183.28]:37275) by debbugs.gnu.org with esmtp (Exim 4.80) (envelope-from ) id 1Z09kt-0002ph-Br for 20725@debbugs.gnu.org; Wed, 03 Jun 2015 10:34:55 -0400 Received: from int-mx13.intmail.prod.int.phx2.redhat.com (int-mx13.intmail.prod.int.phx2.redhat.com [10.5.11.26]) by mx1.redhat.com (Postfix) with ESMTPS id 71F0DB6F24; Wed, 3 Jun 2015 14:34:53 +0000 (UTC) Received: from localhost.localdomain (ovpn-116-124.ams2.redhat.com [10.36.116.124]) by int-mx13.intmail.prod.int.phx2.redhat.com (8.14.4/8.14.4) with ESMTP id t53EYoJq007520 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=NO); Wed, 3 Jun 2015 10:34:52 -0400 Message-ID: <556F108A.4070004@draigBrady.com> Date: Wed, 03 Jun 2015 15:34:50 +0100 From: =?UTF-8?B?UMOhZHJhaWcgQnJhZHk=?= User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:31.0) Gecko/20100101 Thunderbird/31.6.0 MIME-Version: 1.0 To: Federico Alves Subject: Re: bug#20725: Shuf problem References: <556F022F.60106@draigBrady.com> In-Reply-To: Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: 8bit X-Scanned-By: MIMEDefang 2.68 on 10.5.11.26 X-Spam-Score: -5.0 (-----) X-Debbugs-Envelope-To: 20725 Cc: 20725@debbugs.gnu.org X-BeenThere: debbugs-submit@debbugs.gnu.org X-Mailman-Version: 2.1.15
Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: debbugs-submit-bounces@debbugs.gnu.org Sender: "Debbugs-submit" X-Spam-Score: -5.0 (-----)

tag 20725 wontfix
close 20725
stop

On 03/06/15 14:40, Federico Alves wrote:
> Dear Padraig
> I think this is exactly the case. Please consider a very large file. It would be very inefficient to use sed first, and then shuf. A simple switch in shuf would be way better.
> It should be ON by default, I believe.

For a large file shuf will use reservoir sampling
as it does for pipes, and so there should not be
significant differences in the base operation
on files and pipes.

Also shuffling a large file containing blank lines
seems like a slightly unusual case, and thus not worth
complicating the interface with another option for.

thanks,
Pádraig.

From debbugs-submit-bounces@debbugs.gnu.org Wed Jun 03 10:48:53 2015 Received: (at control) by debbugs.gnu.org; 3 Jun 2015 14:48:53 +0000 Received: from localhost ([127.0.0.1]:39142 helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.80) (envelope-from ) id 1Z09yO-0003As-Om for submit@debbugs.gnu.org; Wed, 03 Jun 2015 10:48:53 -0400 Received: from mx1.redhat.com ([209.132.183.28]:42408) by debbugs.gnu.org with esmtp (Exim 4.80) (envelope-from ) id 1Z09yM-0003Af-NR for control@debbugs.gnu.org; Wed, 03 Jun 2015 10:48:51 -0400 Received: from int-mx11.intmail.prod.int.phx2.redhat.com (int-mx11.intmail.prod.int.phx2.redhat.com [10.5.11.24]) by mx1.redhat.com (Postfix) with ESMTPS id E4120B6E8A for ; Wed, 3 Jun 2015 14:48:44 +0000 (UTC) Received: from localhost.localdomain (ovpn-116-124.ams2.redhat.com [10.36.116.124]) by int-mx11.intmail.prod.int.phx2.redhat.com (8.14.4/8.14.4) with ESMTP id t53EmgwP021113 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=NO) for ; Wed, 3 Jun 2015 10:48:44 -0400 Message-ID: <556F13CA.1000200@draigBrady.com> Date: Wed, 03 Jun 2015 15:48:42 +0100 From:
=?UTF-8?B?UMOhZHJhaWcgQnJhZHk=?= User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:31.0) Gecko/20100101 Thunderbird/31.6.0 MIME-Version: 1.0 To: GNU bug tracker automated control server Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: 7bit X-Scanned-By: MIMEDefang 2.68 on 10.5.11.24 X-Spam-Score: -3.0 (---) X-Debbugs-Envelope-To: control X-BeenThere: debbugs-submit@debbugs.gnu.org X-Mailman-Version: 2.1.15 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: debbugs-submit-bounces@debbugs.gnu.org Sender: "Debbugs-submit" X-Spam-Score: -3.0 (---)

tag 20725 wontfix
close 20725
stop

From unknown Sun Jun 22 07:29:54 2025 Received: (at fakecontrol) by fakecontrolmessage; To: internal_control@debbugs.gnu.org From: Debbugs Internal Request Subject: Internal Control Message-Id: bug archived. Date: Thu, 02 Jul 2015 11:24:05 +0000 User-Agent: Fakemail v42.6.9

# This is a fake control message.
#
# The action:
# bug archived.
thanks
# This fakemail brought to you by your local debbugs
# administrator
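
The preprocessing suggested in this thread (filter out empty lines, then pipe into shuf) can be wrapped in a small shell function. What follows is only a sketch under assumptions: the function name shuf_nonempty and the sample data are hypothetical and appear nowhere in the report; sed, grep, shuf and mktemp are the standard tools.

```shell
#!/bin/sh
# Hypothetical helper (not part of coreutils): drop empty lines, then shuffle.
# Reads the named files, or standard input when no file is given.
shuf_nonempty() {
    sed '/^$/d' "$@" | shuf
}

# Illustrative sample: three real lines and three empty ones.
sample=$(mktemp)
printf 'alpha\n\nbeta\n\n\ngamma\n' > "$sample"

# Prints alpha, beta and gamma in random order, with no empty lines.
shuf_nonempty "$sample"

# The grep filters from the thread select the same lines for this input.
grep -v '^$' "$sample" | shuf
grep . "$sample" | shuf

rm -f "$sample"
```

Lines holding only spaces or tabs are not empty as far as these filters are concerned; if those should go too, a pattern such as grep '[^[:space:]]' in place of grep . would be one way to drop them before shuffling.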