From: Daniel Carpenter <dansebpub@gmail.com>
To: bug-coreutils@gnu.org
Date: Sat, 3 Aug 2024 10:19:09 +0200
Subject: bug#72445: shuf with both input-range and head-count biased

The --input-range and --head-count options let me use shuf to simulate a dice roll efficiently, but the results are clearly biased. For example:

$ for i in {1..10000}; do shuf --input-range=1-6 --head-count=1; done | sort | uniq --count
   1730 1
   1411 2
   1882 3
   1809 4
   1520 5
   1648 6

Using seq instead of --input-range does not appear biased:

$ for i in {1..10000}; do seq 6 | shuf --head-count=1; done | sort | uniq --count
   1652 1
   1696 2
   1674 3
   1638 4
   1713 5
   1627 6

Piping through head instead of using --head-count does not appear biased either:

$ for i in {1..10000}; do shuf --input-range=1-6 | head --lines=1; done | sort | uniq --count
   1639 1
   1674 2
   1655 3
   1669 4
   1688 5
   1675 6

It seems that combining both options somehow affects the distribution. I assume shuf uses some performance optimization in that case, since it doesn't need to permute the entire input range.

From: help-debbugs@gnu.org (GNU bug Tracking System)
To: Daniel Carpenter
Date: Sun, 04 Aug 2024 07:12:02 +0000
Subject: bug#72445: closed (Re: bug#72445: shuf with both input-range and head-count biased)

Your bug report

#72445: shuf with both input-range and head-count biased

which was filed against the coreutils package, has been closed.

The explanation is attached below, along with your original report.
If you require more details, please reply to 72445@debbugs.gnu.org.

-- 
72445: https://debbugs.gnu.org/cgi/bugreport.cgi?bug=72445
GNU Bug Tracking System
Contact help-debbugs@gnu.org with problems

From: Paul Eggert
Organization: UCLA Computer Science Department
To: Daniel Carpenter
Cc: 72445-done@debbugs.gnu.org
Date: Sun, 4 Aug 2024 00:10:56 -0700
Subject: Re: bug#72445: shuf with both input-range and head-count biased

Thanks for the bug report. The bug appears to be due to a weakness in ISAAC, the pseudorandom generator shuf uses by default, which randomizes poorly when most of its seed is zero. This weakness was reported in 2006 by Jean-Philippe Aumasson of the University of Applied Sciences Northwestern Switzerland. Although Aumasson wrote a paper about it, nobody seems to have connected the paper with coreutils.

I installed the attached patch, which fixed things for me, and am marking the bug as fixed.

[Attachment: 0001-shuf-fix-randomness-bug.patch]

From bfbb3ec7f798b179d7fa7b42673e068b18048899 Mon Sep 17 00:00:00 2001
From: Paul Eggert <eggert@cs.ucla.edu>
Date: Sat, 3 Aug 2024 22:31:20 -0700
Subject: [PATCH] shuf: fix randomness bug

Problem reported by Daniel Carpenter <https://bugs.gnu.org/72445>.
* gl/lib/randread.c (randread_new): Fill the ISAAC buffer
instead of storing at most BYTES_BOUND bytes into it.
---
 NEWS              |  3 +++
 THANKS.in         |  1 +
 gl/lib/randread.c | 12 +++++++++++-
 3 files changed, 15 insertions(+), 1 deletion(-)

diff --git a/NEWS b/NEWS
index 6251a2f68..2da258c9d 100644
--- a/NEWS
+++ b/NEWS
@@ -16,6 +16,9 @@ GNU coreutils NEWS                                       -*- outline -*-
   have exited with a "Function not implemented" error.
   [bug introduced in coreutils-8.28]
 
+  'shuf' generates more-random output when the output is small.
+  [bug introduced in coreutils-8.6]
+
   'tail -c 4096 /dev/zero' no longer loops forever.
   [This bug was present in "the beginning".]
 
diff --git a/THANKS.in b/THANKS.in
index 17f9d9c69..57ace387e 100644
--- a/THANKS.in
+++ b/THANKS.in
@@ -140,6 +140,7 @@ Dameon G. Rogers                    dgr03@uark.edu
 Dan Hagerty                         hag@gnu.ai.it.edu
 Dan Pascu                           dan@services.iiruc.ro
 Daniel Bergstrom                    noa@melody.se
+Daniel Carpenter                    dansebpub@gmail.com
 Daniel Mach                         dmach@redhat.com
 Daniel P. Berrangé                  berrange@redhat.com
 Daniel Stavrovski                   d@stavrovski.net
diff --git a/gl/lib/randread.c b/gl/lib/randread.c
index cbee224bb..43c0cf09f 100644
--- a/gl/lib/randread.c
+++ b/gl/lib/randread.c
@@ -189,9 +189,19 @@ randread_new (char const *name, size_t bytes_bound)
         setvbuf (source, s->buf.c, _IOFBF, MIN (sizeof s->buf.c, bytes_bound));
       else
         {
+          /* Fill the ISAAC buffer.  Although it is tempting to read at
+             most BYTES_BOUND bytes, this is incorrect for two reasons.
+             First, BYTES_BOUND is just an estimate.
+             Second, even if the estimate is correct
+             ISAAC64 poorly randomizes when BYTES_BOUND is small
+             and just the first few bytes of s->buf.isaac.state.m
+             are random while the other bytes are all zero.  See:
+             Aumasson J-P. On the pseudo-random generator ISAAC.
+             Cryptology ePrint Archive. 2006;438.
+             <https://eprint.iacr.org/2006/438>.  */
           s->buf.isaac.buffered = 0;
           if (! get_nonce (s->buf.isaac.state.m,
-                           MIN (sizeof s->buf.isaac.state.m, bytes_bound)))
+                           sizeof s->buf.isaac.state.m))
             {
               int e = errno;
               randread_free_body (s);
-- 
2.43.0