MIME-Version: 1.0
In-Reply-To: <1451048.pW9z17ilMA@PointedEars.de>
References: <87oaksowwg.fsf@Equus.decebal.nl> <1451048.pW9z17ilMA@PointedEars.de>
Date: Sun, 7 Jun 2015 21:51:47 +1000
Subject: Re: Testing random
From: Chris Angelico <rosuav@gmail.com>
Cc: "python-list@python.org" <python-list@python.org>
Content-Type: text/plain; charset=UTF-8
Precedence: list
Newsgroups: comp.lang.python
Message-ID: <mailman.242.1433677915.13271.python-list@python.org>

On Sun, Jun 7, 2015 at 8:40 PM, Thomas 'PointedEars' Lahn
<PointedEars@web.de> wrote:
> Cecil Westerhof wrote:
>
>> I wrote a very simple function to test random:
>>     def test_random(length, multiplier = 10000):
>>         number_list = length * [0]
>>         for i in range(length * multiplier):
>>             number_list[random.randint(0, length - 1)] += 1
>>         minimum = min(number_list)
>>         maximum = max(number_list)
>>         return (minimum, maximum, minimum / maximum)
>
> As there is no guarantee that every number will occur randomly, using a
> dictionary at first should be more efficient than a list:

Hmm, I'm not sure that's actually so. His code is aiming to get
'multiplier' values in each box; for any serious multiplier (he starts
with 10 in the main code), you can be fairly confident that every
number will come up at least once. The distribution of numbers won't
ever be perfectly even, but you'd expect it to be reasonably close. I
have a similar routine on my Dungeons & Dragons server; since a roll
of a twenty-sided die is crucial to most of the game, I have a simple
tester that shows people that the in-built dice roller is fair.
Its output looks like this:

> roll test
1: 10017
2: 10003
3: 9966
4: 9728
5: 10088
6: 9888
7: 10087
8: 9971
9: 10052
10: 10061
11: 10130
12: 9942
13: 10062
14: 10075
15: 10050
16: 9948
17: 9880
18: 10052
19: 9995
20: 10005
Standard deviation: 90.18 (0.90%)
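
For anyone who wants to reproduce that kind of report in Python, here is
a minimal sketch. It is not the server's actual tester; the names
roll_test and report, and the rng parameter, are made up for
illustration. It assumes the standard deviation is the population form
(statistics.pstdev) and that the percentage is relative to the mean
count, which is consistent with the figures shown above.

```python
import random
import statistics

def roll_test(max_face=20, avg=10000, rng=random):
    """Roll a max_face-sided die max_face * avg times and tally each face."""
    counts = [0] * max_face
    for _ in range(max_face * avg):
        # randint includes both endpoints, so faces run 1..max_face
        counts[rng.randint(1, max_face) - 1] += 1
    return counts

def report(counts):
    """Format the tallies, one face per line, plus a spread summary."""
    mean = sum(counts) / len(counts)
    spread = statistics.pstdev(counts)  # population standard deviation
    lines = ["%d: %d" % (face, n) for face, n in enumerate(counts, start=1)]
    lines.append("Standard deviation: %.2f (%.2f%%)"
                 % (spread, 100 * spread / mean))
    return "\n".join(lines)
```

Calling print(report(roll_test())) gives a fresh run; passing a seeded
random.Random instance as rng makes a run repeatable.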

This is about equivalent to test_random(20), and as you see, he and I
both picked a multiplier of 10K to use by default. (I call the
parameters "max" and "avg" rather than "length" and "multiplier", but
they have the exact same semantics.) The hard part is figuring out
what "looks reasonable"; a true RNG could legitimately produce nothing
but 7s for the entire run, though that is extremely unlikely.
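
To put rough numbers on "looks reasonable" and "extremely unlikely",
here is a small sketch. The function names are hypothetical, and using
Pearson's chi-squared statistic as the yardstick is my choice for
illustration, not something either tester does.

```python
import math

# Counts copied from the roll test output earlier in this message.
POSTED = [10017, 10003, 9966, 9728, 10088, 9888, 10087, 9971, 10052, 10061,
          10130, 9942, 10062, 10075, 10050, 9948, 9880, 10052, 9995, 10005]

def chi_squared(counts):
    """Pearson's chi-squared statistic against a uniform expectation."""
    expected = sum(counts) / len(counts)
    return sum((n - expected) ** 2 / expected for n in counts)

def p_face_never_rolled(faces, rolls):
    """Chance that one particular face is missed on every one of the rolls."""
    return (1 - 1 / faces) ** rolls

def log10_p_single_face_only(faces, rolls):
    """log10 of the chance that every roll shows one chosen face (all 7s, say)."""
    return -rolls * math.log10(faces)
```

For the counts above, chi_squared(POSTED) comes to about 16.27 with 19
degrees of freedom; a chi-squared variable with k degrees of freedom has
mean k, so that is unremarkable. With 20 faces and a multiplier of 10
(200 rolls), p_face_never_rolled(20, 200) is about 3.5e-5. An all-7s run
of 200,000 rolls has a probability of roughly 10 to the power -260206.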

ChrisA