GNU bug report logs - #72445
shuf with both input-range and head-count biased

Previous Next

Package: coreutils;

Reported by: Daniel Carpenter <dansebpub <at> gmail.com>

Date: Sat, 3 Aug 2024 17:24:01 UTC

Severity: normal

Done: Paul Eggert <eggert <at> cs.ucla.edu>

Bug is archived. No further changes may be made.

Full log


View this message in rfc822 format

From: help-debbugs <at> gnu.org (GNU bug Tracking System)
To: Paul Eggert <eggert <at> cs.ucla.edu>
Cc: tracker <at> debbugs.gnu.org
Subject: bug#72445: closed (shuf with both input-range and head-count biased)
Date: Sun, 04 Aug 2024 07:12:02 +0000
[Message part 1 (text/plain, inline)]
Your message dated Sun, 4 Aug 2024 00:10:56 -0700
with message-id <de2893c2-bbf8-4f02-be79-67e4ce36b3f2 <at> cs.ucla.edu>
and subject line Re: bug#72445: shuf with both input-range and head-count biased
has caused the debbugs.gnu.org bug report #72445,
regarding shuf with both input-range and head-count biased
to be marked as done.

(If you believe you have received this mail in error, please contact
help-debbugs <at> gnu.org.)


-- 
72445: https://debbugs.gnu.org/cgi/bugreport.cgi?bug=72445
GNU Bug Tracking System
Contact help-debbugs <at> gnu.org with problems
[Message part 2 (message/rfc822, inline)]
From: Daniel Carpenter <dansebpub <at> gmail.com>
To: bug-coreutils <at> gnu.org
Subject: shuf with both input-range and head-count biased
Date: Sat, 3 Aug 2024 10:19:09 +0200
[Message part 3 (text/plain, inline)]
The above options allow me to use shuf to efficiently simulate a dice roll,
but there is a clear bias when I do so, for example:

$ for i in {1..10000}; do shuf --input-range=1-6 --head-count=1; done |
sort | uniq --count
   1730 1
   1411 2
   1882 3
   1809 4
   1520 5
   1648 6

Using seq instead of input-range does not appear biased:

$ for i in {1..10000}; do seq 6 |  shuf --head-count=1; done | sort | uniq
--count
   1652 1
   1696 2
   1674 3
   1638 4
   1713 5
   1627 6

Same for head:

$ for i in {1..10000}; do shuf --input-range=1-6 | head --lines=1; done |
sort | uniq --count
   1639 1
   1674 2
   1655 3
   1669 4
   1688 5
   1675 6

It seems that somehow combining both options affects the distribution. I
assume there's some performance optimization in that case since shuf
doesn't need to permute the entire input range.
[Message part 4 (text/html, inline)]
[Message part 5 (message/rfc822, inline)]
From: Paul Eggert <eggert <at> cs.ucla.edu>
To: Daniel Carpenter <dansebpub <at> gmail.com>
Cc: 72445-done <at> debbugs.gnu.org
Subject: Re: bug#72445: shuf with both input-range and head-count biased
Date: Sun, 4 Aug 2024 00:10:56 -0700
[Message part 6 (text/plain, inline)]
Thanks for the bug report. The bug appears to be due to a weakness in 
ISAAC that was reported in 2006 by Jean-Philippe Aumasson of the 
University of Applied Sciences Northwestern Switzerland. Although 
Aumasson wrote a paper about it, nobody seems to have connected the 
paper with coreutils.

I installed the attached patch, which fixed things for me, and am 
marking the bug as fixed.
[0001-shuf-fix-randomness-bug.patch (text/x-patch, attachment)]

This bug report was last modified 288 days ago.

Previous Next


GNU bug tracking system
Copyright (C) 1999 Darren O. Benham, 1997,2003 nCipher Corporation Ltd, 1994-97 Ian Jackson.