GNU bug report logs - #72445
shuf with both input-range and head-count biased


Package: coreutils

Reported by: Daniel Carpenter <dansebpub <at> gmail.com>

Date: Sat, 3 Aug 2024 17:24:01 UTC

Severity: normal

Done: Paul Eggert <eggert <at> cs.ucla.edu>

Bug is archived. No further changes may be made.


From: Daniel Carpenter <dansebpub <at> gmail.com>
To: 72445 <at> debbugs.gnu.org
Subject: bug#72445: shuf with both input-range and head-count biased
Date: Sat, 3 Aug 2024 10:19:09 +0200
The --input-range and --head-count options let me use shuf to efficiently
simulate a dice roll, but the results are clearly biased when I combine
them. For example:

$ for i in {1..10000}; do shuf --input-range=1-6 --head-count=1; done |
sort | uniq --count
   1730 1
   1411 2
   1882 3
   1809 4
   1520 5
   1648 6

Using seq instead of input-range does not appear biased:

$ for i in {1..10000}; do seq 6 | shuf --head-count=1; done |
sort | uniq --count
   1652 1
   1696 2
   1674 3
   1638 4
   1713 5
   1627 6

Same for head:

$ for i in {1..10000}; do shuf --input-range=1-6 | head --lines=1; done |
sort | uniq --count
   1639 1
   1674 2
   1655 3
   1669 4
   1688 5
   1675 6

It seems that somehow combining both options affects the distribution. I
assume there's some performance optimization in that case since shuf
doesn't need to permute the entire input range.
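
To put a number on the bias rather than judging the tables by eye, the counts
can be fed to a Pearson chi-square test against a uniform distribution. The
following is a minimal sketch, not part of coreutils: the function name
chi2_uniform and its output format are made up for illustration, and it
assumes input in the "COUNT VALUE" layout that uniq --count prints.

```shell
#!/bin/sh
# chi2_uniform (hypothetical helper, not a coreutils tool):
# read "COUNT VALUE" lines on stdin, as printed by `uniq --count`, and
# print Pearson's chi-square statistic against a uniform distribution.
chi2_uniform() {
  awk '
    NF >= 2 { obs[NR] = $1; total += $1; k++ }   # one category per line
    END {
      if (k < 2) exit 1                          # need at least two categories
      expected = total / k                       # uniform expectation per category
      for (i in obs) chi2 += (obs[i] - expected) ^ 2 / expected
      printf "chi2=%.2f df=%d n=%d\n", chi2, k - 1, total
    }'
}

# Counts reported above for: shuf --input-range=1-6 --head-count=1
printf '%s\n' '1730 1' '1411 2' '1882 3' '1809 4' '1520 5' '1648 6' |
  chi2_uniform
# prints: chi2=94.72 df=5 n=10000
```

With 5 degrees of freedom the 0.1% critical value is about 20.5, so 94.72 is
far beyond what chance would explain. The same helper gives 3.40 for the seq
counts and 0.89 for the head counts, both consistent with a fair die. It can
also be appended directly to any of the pipelines above in place of reading
the tables by hand, for example "... | sort | uniq --count | chi2_uniform".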


