From unknown Tue Jun 17 20:18:05 2025 Content-Disposition: inline Content-Transfer-Encoding: quoted-printable MIME-Version: 1.0 X-Mailer: MIME-tools 5.509 (Entity 5.509) Content-Type: text/plain; charset=utf-8 From: bug#72445 <72445@debbugs.gnu.org> To: bug#72445 <72445@debbugs.gnu.org> Subject: Status: shuf with both input-range and head-count biased Reply-To: bug#72445 <72445@debbugs.gnu.org> Date: Wed, 18 Jun 2025 03:18:05 +0000 retitle 72445 shuf with both input-range and head-count biased reassign 72445 coreutils submitter 72445 Daniel Carpenter severity 72445 normal thanks From debbugs-submit-bounces@debbugs.gnu.org Sat Aug 03 13:23:52 2024 Received: (at submit) by debbugs.gnu.org; 3 Aug 2024 17:23:52 +0000 Received: from localhost ([127.0.0.1]:54753 helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1saITv-0004gi-7j for submit@debbugs.gnu.org; Sat, 03 Aug 2024 13:23:51 -0400 Received: from lists.gnu.org ([209.51.188.17]:53400) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1sa9zN-0005YH-AZ for submit@debbugs.gnu.org; Sat, 03 Aug 2024 04:19:45 -0400 Received: from eggs.gnu.org ([2001:470:142:3::10]) by lists.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1sa9z2-0004Bs-QU for bug-coreutils@gnu.org; Sat, 03 Aug 2024 04:19:24 -0400 Received: from mail-vs1-xe2f.google.com ([2607:f8b0:4864:20::e2f]) by eggs.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_128_GCM_SHA256:128) (Exim 4.90_1) (envelope-from ) id 1sa9z0-0001Go-Sj for bug-coreutils@gnu.org; Sat, 03 Aug 2024 04:19:24 -0400 Received: by mail-vs1-xe2f.google.com with SMTP id ada2fe7eead31-492959b906eso2425482137.0 for ; Sat, 03 Aug 2024 01:19:20 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20230601; t=1722673160; x=1723277960; darn=gnu.org; h=to:subject:message-id:date:from:mime-version:from:to:cc:subject :date:message-id:reply-to; 
bh=MTtPzPIVqpVxe4UGX51RAP9vpVj9KGbyIe0eLWSCkag=; b=ODjplXAcb6Tpb1xJLRyZwYQSJQ+/L8XddsQt1FGMHndvyxa86y9r5YloO0H5kDEjRA x+pSuJwUcokbpGRhgsOEGfHDfFwCXnfxaqVADPI1JSpkkDo35706P6vQSjdEUo2l7xQr JoCk8yhZ5MAR/E0D8cAJjpC4p16KDpJyLIHI+tRJfMCyLYsRgnzYgUe8csqEtfxoobD1 LXsasBVjgNcPilVPEeIuU0YPk7duQdYZsfPX1EhvJFsXlvMuabBvD9WWv6pU9VWq4Z63 gDL5yiVjSdGAEY5tQaNEcSASiEuiYciJDHXnYA3fTVKbwUFout3O92/AnjF8d7DngyAB FN8Q== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1722673160; x=1723277960; h=to:subject:message-id:date:from:mime-version:x-gm-message-state :from:to:cc:subject:date:message-id:reply-to; bh=MTtPzPIVqpVxe4UGX51RAP9vpVj9KGbyIe0eLWSCkag=; b=MssfLG0YRMjrmrikFklPURnF48ePz4TQi7zPJXojJcb63PCQoVTJuZYoeEr8Ytrfwp 9ZmzPX5wR+cQdutYnSwnsC8HuWXgnd392NMCkzK2FXJy7i6CHO3IHsteDI4uaC60EoHE ETFT4MmvvUrHs9aR52nuoyFfHAcUfd1zLILsS3x3ZwOrhpzPL3eTkM0/i6iddfI7uXW0 y+B04ZIbTehejpxVEBPrlkFiVr65t/Njv+zSkXdAy6MvjB9gNxv8RGK6+F+NCRzKW7dL GtGpt4oNAfq2LSi1LB2Ug2R3aFEEvQOyckEyAOXsJOXa6SqvEL9OjYhE1C8Hzgjv+msg LQGA== X-Gm-Message-State: AOJu0YyX5X+T7c5yjJ4iK8jqMsl95/Auq4C9RFlHYtDVG/DZaUqOzqDt MGafWebFGTP8h+uwMQDwLF+uPhSM93a9NsDywMA9COnN9wYmfxohZwKf7x0Q66rQJjxh6n5QMke h9WNUH8zeRptBggkM1xqQcD2xH++gF4KJVT0= X-Google-Smtp-Source: AGHT+IGqSzRq0vtEo1lBKfljxI8w6LONyRAlazpzuDznQGIG3SGER3rgVJBPsXJLCI6Lso2Clm5KjiNQW/RTz8b1Ci4= X-Received: by 2002:a05:6102:512c:b0:493:2177:9811 with SMTP id ada2fe7eead31-4945be08629mr7530405137.14.1722673159944; Sat, 03 Aug 2024 01:19:19 -0700 (PDT) MIME-Version: 1.0 From: Daniel Carpenter Date: Sat, 3 Aug 2024 10:19:09 +0200 Message-ID: Subject: shuf with both input-range and head-count biased To: bug-coreutils@gnu.org Content-Type: multipart/alternative; boundary="000000000000c34177061ec31924" Received-SPF: pass client-ip=2607:f8b0:4864:20::e2f; envelope-from=dansebpub@gmail.com; helo=mail-vs1-xe2f.google.com X-Spam_score_int: -20 X-Spam_score: -2.1 X-Spam_bar: -- X-Spam_report: (-2.1 / 5.0 requ) BAYES_00=-1.9, DKIM_SIGNED=0.1, 
DKIM_VALID=-0.1, DKIM_VALID_AU=-0.1, DKIM_VALID_EF=-0.1, FREEMAIL_FROM=0.001, HTML_MESSAGE=0.001, RCVD_IN_DNSWL_NONE=-0.0001, SPF_HELO_NONE=0.001, SPF_PASS=-0.001 autolearn=ham autolearn_force=no X-Spam_action: no action X-Spam-Score: -1.3 (-) X-Debbugs-Envelope-To: submit X-Mailman-Approved-At: Sat, 03 Aug 2024 13:23:49 -0400 X-BeenThere: debbugs-submit@debbugs.gnu.org X-Mailman-Version: 2.1.18 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: debbugs-submit-bounces@debbugs.gnu.org Sender: "Debbugs-submit" X-Spam-Score: -2.3 (--) --000000000000c34177061ec31924 Content-Type: text/plain; charset="UTF-8"

The above options allow me to use shuf to efficiently simulate a dice roll, but there is a clear bias when I do so, for example:

$ for i in {1..10000}; do shuf --input-range=1-6 --head-count=1; done | sort | uniq --count
   1730 1
   1411 2
   1882 3
   1809 4
   1520 5
   1648 6

Using seq instead of input-range does not appear biased:

$ for i in {1..10000}; do seq 6 | shuf --head-count=1; done | sort | uniq --count
   1652 1
   1696 2
   1674 3
   1638 4
   1713 5
   1627 6

Same for head:

$ for i in {1..10000}; do shuf --input-range=1-6 | head --lines=1; done | sort | uniq --count
   1639 1
   1674 2
   1655 3
   1669 4
   1688 5
   1675 6

It seems that somehow combining both options affects the distribution. I assume there's some performance optimization in that case since shuf doesn't need to permute the entire input range.

--000000000000c34177061ec31924 Content-Type: text/html; charset="UTF-8" Content-Transfer-Encoding: quoted-printable
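The bias in the counts reported above can be quantified with a chi-square goodness-of-fit test against a fair six-sided die. The following is only a sketch: the function name chi_square_uniform is hypothetical, the counts are copied from the three runs in the report, and 11.07 and 20.515 are the standard chi-square critical values for 5 degrees of freedom at the 0.05 and 0.001 levels.

```python
def chi_square_uniform(counts):
    """Pearson chi-square statistic of counts against a uniform distribution."""
    total = sum(counts)
    expected = total / len(counts)
    return sum((c - expected) ** 2 / expected for c in counts)


# Counts for faces 1..6, copied from the three runs in the report.
runs = {
    "shuf --input-range=1-6 --head-count=1": [1730, 1411, 1882, 1809, 1520, 1648],
    "seq 6 | shuf --head-count=1": [1652, 1696, 1674, 1638, 1713, 1627],
    "shuf --input-range=1-6 | head --lines=1": [1639, 1674, 1655, 1669, 1688, 1675],
}

# Standard critical values for 5 degrees of freedom (6 faces minus 1).
CRITICAL_5DF_P05 = 11.07
CRITICAL_5DF_P001 = 20.515

results = {name: chi_square_uniform(counts) for name, counts in runs.items()}

for name, stat in results.items():
    verdict = "looks biased" if stat > CRITICAL_5DF_P001 else "consistent with fair"
    print(f"{stat:8.2f}  {verdict:22s}  {name}")
```

With these counts the run that combines both options gives a statistic of about 94.7, far beyond 20.515, while the seq and head variants give about 3.4 and 0.9, both well below 11.07.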
--000000000000c34177061ec31924-- From debbugs-submit-bounces@debbugs.gnu.org Sun Aug 04 03:11:30 2024 Received: (at 72445-done) by debbugs.gnu.org; 4 Aug 2024 07:11:31 +0000 Received: from localhost ([127.0.0.1]:55289 helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1saVOr-0002k9-Qz for submit@debbugs.gnu.org; Sun, 04 Aug 2024 03:11:30 -0400 Received: from mail.cs.ucla.edu ([131.179.128.66]:44966) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1saVOm-0002jY-I7 for 72445-done@debbugs.gnu.org; Sun, 04 Aug 2024 03:11:28 -0400 Received: from localhost (localhost [127.0.0.1]) by mail.cs.ucla.edu (Postfix) with ESMTP id 9340F3C00E405; Sun, 4 Aug 2024 00:10:57 -0700 (PDT) Received: from mail.cs.ucla.edu ([127.0.0.1]) by localhost (mail.cs.ucla.edu [127.0.0.1]) (amavis, port 10032) with ESMTP id AWaRPQlXF6wp; Sun, 4 Aug 2024 00:10:57 -0700 (PDT) Received: from localhost (localhost [127.0.0.1]) by mail.cs.ucla.edu (Postfix) with ESMTP id ED0723C00E406; Sun, 4 Aug 2024 00:10:56 -0700 (PDT) DKIM-Filter: OpenDKIM Filter v2.10.3 mail.cs.ucla.edu ED0723C00E406 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=cs.ucla.edu; s=9D0B346E-2AEB-11ED-9476-E14B719DCE6C; t=1722755457; bh=XQF01c6D1V6oif97Uz3TudmiLl35cCzKe9vuXpgmcCg=; h=Message-ID:Date:MIME-Version:To:From; b=eQOS4aCdROSanMZ8gCPADIoOia+4xXEuZJBZh4e8B7x510sTrc9eLyMJ/xfBAorHk JdjlWaYPE+e/aaERrJz6jyARHvBU8Z3N3j+D7vIRXaKVmVkouixyEAlWIHGjEKZ1oO gyfwV0H9KDkEwfFUTC6iAuaKdjw6y5ex2N+SD8JZmHD75jW95hPfXFrDXB+BnNWGld IBYLopxSuES6b1hfxL7PsqxtppfoKhTEM1gQxTA2mrW4I4qBPk7z9SpDiogKcUpBvW roTXm5CVy63hJsd6OpZ9IWtn9Hb82SleDj/Fj+U9AIzcPfiO02OizZpNfpacPmIian bZY7oGJG1Ka4g== X-Virus-Scanned: amavis at mail.cs.ucla.edu Received: from mail.cs.ucla.edu ([127.0.0.1]) by localhost (mail.cs.ucla.edu [127.0.0.1]) (amavis, port 10026) with ESMTP id ddmm4-3BEXVu; Sun, 4 Aug 2024 00:10:56 -0700 (PDT) Received: from [192.168.254.12] (unknown [47.154.17.165]) by mail.cs.ucla.edu (Postfix) with 
ESMTPSA id CC7A93C00E405; Sun, 4 Aug 2024 00:10:56 -0700 (PDT) Content-Type: multipart/mixed; boundary="------------lSMgzvBT4mWhbFrLE6CHLsGI" Message-ID: Date: Sun, 4 Aug 2024 00:10:56 -0700 MIME-Version: 1.0 User-Agent: Mozilla Thunderbird Subject: Re: bug#72445: shuf with both input-range and head-count biased To: Daniel Carpenter References: Content-Language: en-US From: Paul Eggert Organization: UCLA Computer Science Department In-Reply-To: X-Spam-Score: 0.0 (/) X-Debbugs-Envelope-To: 72445-done Cc: 72445-done@debbugs.gnu.org X-BeenThere: debbugs-submit@debbugs.gnu.org X-Mailman-Version: 2.1.18 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: debbugs-submit-bounces@debbugs.gnu.org Sender: "Debbugs-submit" X-Spam-Score: -1.0 (-) This is a multi-part message in MIME format. --------------lSMgzvBT4mWhbFrLE6CHLsGI Content-Type: text/plain; charset=UTF-8; format=flowed Content-Transfer-Encoding: 7bit Thanks for the bug report. The bug appears to be due to a weakness in ISAAC that was reported in 2006 by Jean-Philippe Aumasson of the University of Applied Sciences Northwestern Switzerland. Although Aumasson wrote a paper about it, nobody seems to have connected the paper with coreutils. I installed the attached patch, which fixed things for me, and am marking the bug as fixed. 
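The change in the attached patch, as decoded from its base64 body, replaces MIN (sizeof s->buf.isaac.state.m, bytes_bound) with sizeof s->buf.isaac.state.m in the get_nonce call of randread_new, so that the whole ISAAC state is seeded rather than only its first few bytes. The following is a toy model of that seeding step only, not ISAAC and not the coreutils code: seed_state is a hypothetical helper, os.urandom stands in for get_nonce, and STATE_BYTES = 2048 assumes a state of 256 64-bit words.

```python
import os

# Assumption: the ISAAC64 state array holds 256 words of 8 bytes each.
STATE_BYTES = 256 * 8


def seed_state(bytes_bound, fill_whole_state):
    """Return a zero-initialized state with its leading bytes replaced by a nonce.

    fill_whole_state=False models the old rule: min(STATE_BYTES, bytes_bound) bytes.
    fill_whole_state=True models the patched rule: all STATE_BYTES bytes.
    """
    state = bytearray(STATE_BYTES)
    if fill_whole_state:
        n = STATE_BYTES
    else:
        n = min(STATE_BYTES, bytes_bound)
    state[:n] = os.urandom(n)  # stand-in for get_nonce
    return bytes(state)


# Hypothetical small request, such as a single die roll might need.
old_state = seed_state(bytes_bound=1, fill_whole_state=False)
new_state = seed_state(bytes_bound=1, fill_whole_state=True)

# Under the old rule everything after the first byte is still zero.
print("old rule, zero tail:", old_state[1:] == bytes(STATE_BYTES - 1))
print("new rule, zero tail:", new_state[1:] == bytes(STATE_BYTES - 1))
```

Under the old rule the tail of the state stays all zero whenever bytes_bound is small, which is the condition the patch comment ties to the weakness described in Aumasson's paper; under the patched rule the whole state receives nonce bytes regardless of bytes_bound.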
--------------lSMgzvBT4mWhbFrLE6CHLsGI Content-Type: text/x-patch; charset=UTF-8; name="0001-shuf-fix-randomness-bug.patch" Content-Disposition: attachment; filename="0001-shuf-fix-randomness-bug.patch" Content-Transfer-Encoding: base64 RnJvbSBiZmJiM2VjN2Y3OThiMTc5ZDdmYTdiNDI2NzNlMDY4YjE4MDQ4ODk5IE1vbiBTZXAg MTcgMDA6MDA6MDAgMjAwMQpGcm9tOiBQYXVsIEVnZ2VydCA8ZWdnZXJ0QGNzLnVjbGEuZWR1 PgpEYXRlOiBTYXQsIDMgQXVnIDIwMjQgMjI6MzE6MjAgLTA3MDAKU3ViamVjdDogW1BBVENI XSBzaHVmOiBmaXggcmFuZG9tbmVzcyBidWcKClByb2JsZW0gcmVwb3J0ZWQgYnkgRGFuaWVs IENhcnBlbnRlciA8aHR0cHM6Ly9idWdzLmdudS5vcmcvNzI0NDU+LgoqIGdsL2xpYi9yYW5k cmVhZC5jIChyYW5kcmVhZF9uZXcpOiBGaWxsIHRoZSBJU0FBQyBidWZmZXIKaW5zdGVhZCBv ZiBzdG9yaW5nIGF0IG1vc3QgQllURVNfQk9VTkQgYnl0ZXMgaW50byBpdC4KLS0tCiBORVdT ICAgICAgICAgICAgICB8ICAzICsrKwogVEhBTktTLmluICAgICAgICAgfCAgMSArCiBnbC9s aWIvcmFuZHJlYWQuYyB8IDEyICsrKysrKysrKysrLQogMyBmaWxlcyBjaGFuZ2VkLCAxNSBp bnNlcnRpb25zKCspLCAxIGRlbGV0aW9uKC0pCgpkaWZmIC0tZ2l0IGEvTkVXUyBiL05FV1MK aW5kZXggNjI1MWEyZjY4Li4yZGEyNThjOWQgMTAwNjQ0Ci0tLSBhL05FV1MKKysrIGIvTkVX UwpAQCAtMTYsNiArMTYsOSBAQCBHTlUgY29yZXV0aWxzIE5FV1MgICAgICAgICAgICAgICAg ICAgICAgICAgICAgICAgICAgICAtKi0gb3V0bGluZSAtKi0KICAgaGF2ZSBleGl0ZWQgd2l0 aCBhICJGdW5jdGlvbiBub3QgaW1wbGVtZW50ZWQiIGVycm9yLgogICBbYnVnIGludHJvZHVj ZWQgaW4gY29yZXV0aWxzLTguMjhdCiAKKyAgJ3NodWYnIGdlbmVyYXRlcyBtb3JlLXJhbmRv bSBvdXRwdXQgd2hlbiB0aGUgb3V0cHV0IGlzIHNtYWxsLgorICBbYnVnIGludHJvZHVjZWQg aW4gY29yZXV0aWxzLTguNl0KKwogICAndGFpbCAtYyA0MDk2IC9kZXYvemVybycgbm8gbG9u Z2VyIGxvb3BzIGZvcmV2ZXIuCiAgIFtUaGlzIGJ1ZyB3YXMgcHJlc2VudCBpbiAidGhlIGJl Z2lubmluZyIuXQogCmRpZmYgLS1naXQgYS9USEFOS1MuaW4gYi9USEFOS1MuaW4KaW5kZXgg MTdmOWQ5YzY5Li41N2FjZTM4N2UgMTAwNjQ0Ci0tLSBhL1RIQU5LUy5pbgorKysgYi9USEFO S1MuaW4KQEAgLTE0MCw2ICsxNDAsNyBAQCBEYW1lb24gRy4gUm9nZXJzICAgICAgICAgICAg ICAgICAgICBkZ3IwM0B1YXJrLmVkdQogRGFuIEhhZ2VydHkgICAgICAgICAgICAgICAgICAg ICAgICAgaGFnQGdudS5haS5pdC5lZHUKIERhbiBQYXNjdSAgICAgICAgICAgICAgICAgICAg ICAgICAgIGRhbkBzZXJ2aWNlcy5paXJ1Yy5ybwogRGFuaWVsIEJlcmdzdHJvbSAgICAgICAg 
ICAgICAgICAgICAgbm9hQG1lbG9keS5zZQorRGFuaWVsIENhcnBlbnRlciAgICAgICAgICAg ICAgICAgICAgZGFuc2VicHViQGdtYWlsLmNvbQogRGFuaWVsIE1hY2ggICAgICAgICAgICAg ICAgICAgICAgICAgZG1hY2hAcmVkaGF0LmNvbQogRGFuaWVsIFAuIEJlcnJhbmfDqSAgICAg ICAgICAgICAgICAgIGJlcnJhbmdlQHJlZGhhdC5jb20KIERhbmllbCBTdGF2cm92c2tpICAg ICAgICAgICAgICAgICAgIGRAc3RhdnJvdnNraS5uZXQKZGlmZiAtLWdpdCBhL2dsL2xpYi9y YW5kcmVhZC5jIGIvZ2wvbGliL3JhbmRyZWFkLmMKaW5kZXggY2JlZTIyNGJiLi40M2MwY2Yw OWYgMTAwNjQ0Ci0tLSBhL2dsL2xpYi9yYW5kcmVhZC5jCisrKyBiL2dsL2xpYi9yYW5kcmVh ZC5jCkBAIC0xODksOSArMTg5LDE5IEBAIHJhbmRyZWFkX25ldyAoY2hhciBjb25zdCAqbmFt ZSwgc2l6ZV90IGJ5dGVzX2JvdW5kKQogICAgICAgICBzZXR2YnVmIChzb3VyY2UsIHMtPmJ1 Zi5jLCBfSU9GQkYsIE1JTiAoc2l6ZW9mIHMtPmJ1Zi5jLCBieXRlc19ib3VuZCkpOwogICAg ICAgZWxzZQogICAgICAgICB7CisgICAgICAgICAgLyogRmlsbCB0aGUgSVNBQUMgYnVmZmVy LiAgQWx0aG91Z2ggaXQgaXMgdGVtcHRpbmcgdG8gcmVhZCBhdAorICAgICAgICAgICAgIG1v c3QgQllURVNfQk9VTkQgYnl0ZXMsIHRoaXMgaXMgaW5jb3JyZWN0IGZvciB0d28gcmVhc29u cy4KKyAgICAgICAgICAgICBGaXJzdCwgQllURVNfQk9VTkQgaXMganVzdCBhbiBlc3RpbWF0 ZS4KKyAgICAgICAgICAgICBTZWNvbmQsIGV2ZW4gaWYgdGhlIGVzdGltYXRlIGlzIGNvcnJl Y3QKKyAgICAgICAgICAgICBJU0FBQzY0IHBvb3JseSByYW5kb21pemVzIHdoZW4gQllURVNf Qk9VTkQgaXMgc21hbGwKKyAgICAgICAgICAgICBhbmQganVzdCB0aGUgZmlyc3QgZmV3IGJ5 dGVzIG9mIHMtPmJ1Zi5pc2FhYy5zdGF0ZS5tCisgICAgICAgICAgICAgYXJlIHJhbmRvbSB3 aGlsZSB0aGUgb3RoZXIgYnl0ZXMgYXJlIGFsbCB6ZXJvLiAgU2VlOgorICAgICAgICAgICAg IEF1bWFzc29uIEotUC4gT24gdGhlIHBzZXVkby1yYW5kb20gZ2VuZXJhdG9yIElTQUFDLgor ICAgICAgICAgICAgIENyeXB0b2xvZ3kgZVByaW50IEFyY2hpdmUuIDIwMDY7NDM4LgorICAg ICAgICAgICAgIDxodHRwczovL2VwcmludC5pYWNyLm9yZy8yMDA2LzQzOD4uICAqLwogICAg ICAgICAgIHMtPmJ1Zi5pc2FhYy5idWZmZXJlZCA9IDA7CiAgICAgICAgICAgaWYgKCEgZ2V0 X25vbmNlIChzLT5idWYuaXNhYWMuc3RhdGUubSwKLSAgICAgICAgICAgICAgICAgICAgICAg ICAgIE1JTiAoc2l6ZW9mIHMtPmJ1Zi5pc2FhYy5zdGF0ZS5tLCBieXRlc19ib3VuZCkpKQor ICAgICAgICAgICAgICAgICAgICAgICAgICAgc2l6ZW9mIHMtPmJ1Zi5pc2FhYy5zdGF0ZS5t KSkKICAgICAgICAgICAgIHsKICAgICAgICAgICAgICAgaW50IGUgPSBlcnJubzsKICAgICAg 
ICAgICAgICAgcmFuZHJlYWRfZnJlZV9ib2R5IChzKTsKLS0gCjIuNDMuMAoK --------------lSMgzvBT4mWhbFrLE6CHLsGI-- From unknown Tue Jun 17 20:18:05 2025 Received: (at fakecontrol) by fakecontrolmessage; To: internal_control@debbugs.gnu.org From: Debbugs Internal Request Subject: Internal Control Message-Id: bug archived. Date: Sun, 01 Sep 2024 11:24:09 +0000 User-Agent: Fakemail v42.6.9 # This is a fake control message. # # The action: # bug archived. thanks # This fakemail brought to you by your local debbugs # administrator