GNU bug report logs - #72445
shuf with both input-range and head-count biased

Previous Next

Package: coreutils;

Reported by: Daniel Carpenter <dansebpub <at> gmail.com>

Date: Sat, 3 Aug 2024 17:24:01 UTC

Severity: normal

Done: Paul Eggert <eggert <at> cs.ucla.edu>

Bug is archived. No further changes may be made.

To add a comment to this bug, you must first unarchive it, by sending
a message to control AT debbugs.gnu.org, with unarchive 72445 in the body.
You can then email your comments to 72445 AT debbugs.gnu.org in the normal way.

Toggle the display of automated, internal messages from the tracker.

View this report as an mbox folder, status mbox, maintainer mbox


Report forwarded to bug-coreutils <at> gnu.org:
bug#72445; Package coreutils. (Sat, 03 Aug 2024 17:24:01 GMT) Full text and rfc822 format available.

Acknowledgement sent to Daniel Carpenter <dansebpub <at> gmail.com>:
New bug report received and forwarded. Copy sent to bug-coreutils <at> gnu.org. (Sat, 03 Aug 2024 17:24:01 GMT) Full text and rfc822 format available.

Message #5 received at submit <at> debbugs.gnu.org (full text, mbox):

From: Daniel Carpenter <dansebpub <at> gmail.com>
To: bug-coreutils <at> gnu.org
Subject: shuf with both input-range and head-count biased
Date: Sat, 3 Aug 2024 10:19:09 +0200
[Message part 1 (text/plain, inline)]
The above options allow me to use shuf to efficiently simulate a dice roll,
but there is a clear bias when I do so, for example:

$ for i in {1..10000}; do shuf --input-range=1-6 --head-count=1; done |
sort | uniq --count
   1730 1
   1411 2
   1882 3
   1809 4
   1520 5
   1648 6

Using seq instead of input-range does not appear biased:

$ for i in {1..10000}; do seq 6 |  shuf --head-count=1; done | sort | uniq
--count
   1652 1
   1696 2
   1674 3
   1638 4
   1713 5
   1627 6

Same for head:

$ for i in {1..10000}; do shuf --input-range=1-6 | head --lines=1; done |
sort | uniq --count
   1639 1
   1674 2
   1655 3
   1669 4
   1688 5
   1675 6

It seems that somehow combining both options affects the distribution. I
assume there's some performance optimization in that case since shuf
doesn't need to permute the entire input range.
[Message part 2 (text/html, inline)]

Reply sent to Paul Eggert <eggert <at> cs.ucla.edu>:
You have taken responsibility. (Sun, 04 Aug 2024 07:12:02 GMT) Full text and rfc822 format available.

Notification sent to Daniel Carpenter <dansebpub <at> gmail.com>:
bug acknowledged by developer. (Sun, 04 Aug 2024 07:12:02 GMT) Full text and rfc822 format available.

Message #10 received at 72445-done <at> debbugs.gnu.org (full text, mbox):

From: Paul Eggert <eggert <at> cs.ucla.edu>
To: Daniel Carpenter <dansebpub <at> gmail.com>
Cc: 72445-done <at> debbugs.gnu.org
Subject: Re: bug#72445: shuf with both input-range and head-count biased
Date: Sun, 4 Aug 2024 00:10:56 -0700
[Message part 1 (text/plain, inline)]
Thanks for the bug report. The bug appears to be due to a weakness in 
ISAAC that was reported in 2006 by Jean-Philippe Aumasson of the 
University of Applied Sciences Northwestern Switzerland. Although 
Aumasson wrote a paper about it, nobody seems to have connected the 
paper with coreutils.

I installed the attached patch, which fixed things for me, and am 
marking the bug as fixed.
[0001-shuf-fix-randomness-bug.patch (text/x-patch, attachment)]

bug archived. Request was from Debbugs Internal Request <help-debbugs <at> gnu.org> to internal_control <at> debbugs.gnu.org. (Sun, 01 Sep 2024 11:24:09 GMT) Full text and rfc822 format available.

This bug report was last modified 288 days ago.

Previous Next


GNU bug tracking system
Copyright (C) 1999 Darren O. Benham, 1997,2003 nCipher Corporation Ltd, 1994-97 Ian Jackson.