Groups | Search | Server Info | Keyboard shortcuts | Login | Register [http] [https] [nntp] [nntps]


Groups > comp.lang.python > #92207 > unrolled thread

Testing random

Started byCecil Westerhof <Cecil@decebal.nl>
First post2015-06-07 08:27 +0200
Last post2015-06-07 11:04 -0400
Articles 20 on this page of 80 — 22 participants

Back to article view | Back to comp.lang.python


Contents

  Testing random Cecil Westerhof <Cecil@decebal.nl> - 2015-06-07 08:27 +0200
    Re: Testing random Thomas 'PointedEars' Lahn <PointedEars@web.de> - 2015-06-07 12:40 +0200
      Re: Testing random Chris Angelico <rosuav@gmail.com> - 2015-06-07 21:51 +1000
        Re: Testing random Thomas 'PointedEars' Lahn <PointedEars@web.de> - 2015-06-07 17:51 +0200
          Re: Testing random Chris Angelico <rosuav@gmail.com> - 2015-06-08 02:25 +1000
            Re: Testing random Thomas 'PointedEars' Lahn <PointedEars@web.de> - 2015-06-07 18:36 +0200
              Re: Testing random Chris Angelico <rosuav@gmail.com> - 2015-06-08 02:44 +1000
                Re: Testing random Thomas 'PointedEars' Lahn <PointedEars@web.de> - 2015-06-07 20:23 +0200
                  Re: Testing random Chris Angelico <rosuav@gmail.com> - 2015-06-08 04:52 +1000
                    Re: Testing random Thomas 'PointedEars' Lahn <PointedEars@web.de> - 2015-06-07 21:41 +0200
                  Re: Testing random Jussi Piitulainen <jpiitula@ling.helsinki.fi> - 2015-06-07 22:08 +0300
                    Re: Testing random Thomas 'PointedEars' Lahn <PointedEars@web.de> - 2015-06-07 21:29 +0200
                      Re: Testing random random832@fastmail.us - 2015-06-07 15:44 -0400
                        Re: Testing random Thomas 'PointedEars' Lahn <PointedEars@web.de> - 2015-06-07 22:09 +0200
                          Re: Testing random random832@fastmail.us - 2015-06-07 16:41 -0400
                            Re: Testing random Thomas 'PointedEars' Lahn <PointedEars@web.de> - 2015-06-07 22:59 +0200
                            Re: Testing random Steven D'Aprano <steve@pearwood.info> - 2015-06-08 11:26 +1000
                            Re: Testing random random832@fastmail.us - 2015-06-07 21:34 -0400
                            Re: Testing random Chris Angelico <rosuav@gmail.com> - 2015-06-08 11:42 +1000
                            Re: Testing random MRAB <python@mrabarnett.plus.com> - 2015-06-08 02:49 +0100
                            Re: Testing random random832@fastmail.us - 2015-06-07 21:57 -0400
                      Re: Testing random Jussi Piitulainen <jpiitula@ling.helsinki.fi> - 2015-06-08 10:40 +0300
                        Re: Testing random Thomas 'PointedEars' Lahn <PointedEars@web.de> - 2015-06-10 19:03 +0200
                          Re: Testing random sohcahtoa82@gmail.com - 2015-06-10 10:52 -0700
                            Re: Testing random Jussi Piitulainen <jpiitula@ling.helsinki.fi> - 2015-06-10 23:00 +0300
                          Re: Testing random Ian Kelly <ian.g.kelly@gmail.com> - 2015-06-10 12:02 -0600
                            Re: Testing random Thomas 'PointedEars' Lahn <PointedEars@web.de> - 2015-06-12 23:32 +0200
                              Re: Testing random alister <alister.nospam.ware@ntlworld.com> - 2015-06-12 21:46 +0000
                              Re: Testing random random832@fastmail.us - 2015-06-12 17:52 -0400
                              Re: Testing random Ian Kelly <ian.g.kelly@gmail.com> - 2015-06-12 16:00 -0600
                                Re: Testing random Thomas 'PointedEars' Lahn <PointedEars@web.de> - 2015-06-13 00:09 +0200
                                  Re: Testing random sohcahtoa82@gmail.com - 2015-06-12 15:55 -0700
                                  Re: Testing random random832@fastmail.us - 2015-06-12 18:57 -0400
                              Re: Testing random Mark Lawrence <breamoreboy@yahoo.co.uk> - 2015-06-13 08:53 +0100
                          Re: Testing random random832@fastmail.us - 2015-06-10 14:26 -0400
                  Re: Testing random Ned Batchelder <ned@nedbatchelder.com> - 2015-06-07 14:21 -0700
                    Re: Testing random Thomas 'PointedEars' Lahn <PointedEars@web.de> - 2015-06-16 21:18 +0200
                      Re: Testing random random832@fastmail.us - 2015-06-16 16:23 -0400
                      Re: Testing random Ned Batchelder <ned@nedbatchelder.com> - 2015-06-16 13:48 -0700
                        Re: Testing random Thomas 'PointedEars' Lahn <PointedEars@web.de> - 2015-06-16 23:57 +0200
                          Re: Testing random sohcahtoa82@gmail.com - 2015-06-16 15:30 -0700
                            Re: Testing random Ian Kelly <ian.g.kelly@gmail.com> - 2015-06-16 16:58 -0600
                            Re: Testing random Laura Creighton <lac@openend.se> - 2015-06-17 11:28 +0200
                          Re: Testing random Ned Batchelder <ned@nedbatchelder.com> - 2015-06-16 16:26 -0700
                            Re: Testing random Thomas 'PointedEars' Lahn <PointedEars@web.de> - 2015-06-17 01:45 +0200
                              Re: Testing random sohcahtoa82@gmail.com - 2015-06-16 17:36 -0700
                                Re: Testing random Chris Angelico <rosuav@gmail.com> - 2015-06-17 11:01 +1000
                                Re: Testing random Ethan Furman <ethan@stoneleaf.us> - 2015-06-16 18:32 -0700
                                Re: Testing random Mark Lawrence <breamoreboy@yahoo.co.uk> - 2015-06-17 09:41 +0100
                                  Re: Testing random Grant Edwards <invalid@invalid.invalid> - 2015-06-17 14:04 +0000
                                    Re: Testing random Ian Kelly <ian.g.kelly@gmail.com> - 2015-06-17 09:01 -0600
                              Re: Testing random MRAB <python@mrabarnett.plus.com> - 2015-06-17 01:42 +0100
                                Re: Testing random Thomas 'PointedEars' Lahn <PointedEars@web.de> - 2015-06-17 08:53 +0200
                                  Re: Testing random Christian Gollwitzer <auriocus@gmx.de> - 2015-06-17 09:22 +0200
                                    Re: Testing random Chris Angelico <rosuav@gmail.com> - 2015-06-17 17:28 +1000
                                    Re: Testing random Tim Golden <mail@timgolden.me.uk> - 2015-06-17 08:30 +0100
                                      Re: Testing random Cecil Westerhof <Cecil@decebal.nl> - 2015-06-17 11:57 +0200
                              Re: Testing random Steven D'Aprano <steve+comp.lang.python@pearwood.info> - 2015-06-17 01:36 +0000
                              Re: Testing random Laura Creighton <lac@openend.se> - 2015-06-17 12:33 +0200
                                Re: Testing random Steven D'Aprano <steve@pearwood.info> - 2015-06-17 22:47 +1000
                                  Re: Testing random Laura Creighton <lac@openend.se> - 2015-06-17 15:50 +0200
                        Re: Testing random Steven D'Aprano <steve+comp.lang.python@pearwood.info> - 2015-06-17 01:35 +0000
                    Re: Testing random Jussi Piitulainen <jpiitula@ling.helsinki.fi> - 2015-06-17 07:41 +0300
                  Re: Testing random Steven D'Aprano <steve@pearwood.info> - 2015-06-08 11:11 +1000
              Re: Testing random Ian Kelly <ian.g.kelly@gmail.com> - 2015-06-07 11:07 -0600
              Re: Testing random Chris Angelico <rosuav@gmail.com> - 2015-06-08 03:20 +1000
              Re: Testing random "C.D. Reimer" <chris@cdreimer.com> - 2015-06-07 10:36 -0700
              Re: Testing random Steven D'Aprano <steve@pearwood.info> - 2015-06-08 04:28 +1000
                Re: Testing random Chris Angelico <rosuav@gmail.com> - 2015-06-08 04:40 +1000
          Re: Testing random Steven D'Aprano <steve@pearwood.info> - 2015-06-08 04:24 +1000
    Re: Testing random Jonas Wielicki <jonas@wielicki.name> - 2015-06-07 12:41 +0200
    Re: Testing random Steven D'Aprano <steve@pearwood.info> - 2015-06-07 22:52 +1000
      Re: Testing random Steven D'Aprano <steve@pearwood.info> - 2015-06-07 23:06 +1000
      Re: Testing random Peter Otten <__peter__@web.de> - 2015-06-07 15:35 +0200
        Re: Testing random Thomas 'PointedEars' Lahn <PointedEars@web.de> - 2015-06-07 18:36 +0200
          Re: Testing random Peter Otten <__peter__@web.de> - 2015-06-07 18:48 +0200
            Re: Testing random Thomas 'PointedEars' Lahn <PointedEars@web.de> - 2015-06-07 22:15 +0200
        Re: Testing random Steven D'Aprano <steve@pearwood.info> - 2015-06-08 11:35 +1000
    Re: Testing random Christian Gollwitzer <auriocus@gmx.de> - 2015-06-07 14:53 +0200
    Re: Testing random Dennis Lee Bieber <wlfraed@ix.netcom.com> - 2015-06-07 11:04 -0400

Page 1 of 4  [1] 2 3 4  Next page →


#92207 — Testing random

FromCecil Westerhof <Cecil@decebal.nl>
Date2015-06-07 08:27 +0200
SubjectTesting random
Message-ID<87oaksowwg.fsf@Equus.decebal.nl>
I wrote a very simple function to test random:
    def test_random(length, multiplier = 10000):
        number_list = length * [0]
        for i in range(length * multiplier):
            number_list[random.randint(0, length - 1)] += 1
        minimum = min(number_list)
        maximum = max(number_list)
        return (minimum, maximum, minimum / maximum)

When running:
    for i in range(1, 7):
        print(test_random(100, 10 ** i))
I get:
    (3, 17, 0.17647058823529413)
    (73, 127, 0.5748031496062992)
    (915, 1086, 0.8425414364640884)
    (9723, 10195, 0.9537027954879843)
    (99348, 100620, 0.9873583780560524)
    (997198, 1002496, 0.9947151908835546)

and when running:
    for i in range(1, 7):
        print(test_random(1000, 10 ** i))
I get:
    (2, 20, 0.1)
    (69, 138, 0.5)
    (908, 1098, 0.8269581056466302)
    (9684, 10268, 0.9431242695753799)
    (99046, 100979, 0.9808574059953059)
    (996923, 1003598, 0.9933489305478886)

It shows that when the first parameter increases the deviation
increases and when the second parameter increases the deviation
decreases. Exactly what you would expect. But what are the ranges you
would expect with a good random function. Then it could be used to
test a random function.

-- 
Cecil Westerhof
Senior Software Engineer
LinkedIn: http://www.linkedin.com/in/cecilwesterhof

[toc] | [next] | [standalone]


#92226

FromThomas 'PointedEars' Lahn <PointedEars@web.de>
Date2015-06-07 12:40 +0200
Message-ID<1451048.pW9z17ilMA@PointedEars.de>
In reply to#92207
Cecil Westerhof wrote:

> I wrote a very simple function to test random:
>     def test_random(length, multiplier = 10000):
>         number_list = length * [0]
>         for i in range(length * multiplier):
>             number_list[random.randint(0, length - 1)] += 1
>         minimum = min(number_list)
>         maximum = max(number_list)
>         return (minimum, maximum, minimum / maximum)

As there is no guarantee that every number will occur randomly, using a 
dictionary at first should be more efficient than a list:

def test_random (length, multiplier=10000):
    number_list = {}
    for i in range(length * multiplier):
        r = random.randint(0, length - 1)
        number_list[r] = number_list[r] + 1 if r in number_list else 1
    values = number_list.values()
    minimum = min(values)
    maximum = max(values)
    return (minimum, maximum, minimum / maximum)

[Тuple literals are going to be introduced with C# 7 ;-)]


Since Python uses indentation as syntax element, it is better not to indent 
Python code by default when posting in order to make it easier testable.  If 
necessary, code sections can be delimited in an unambiguous way that is 
syntactically correct by lines like

#------------------------------------------------------------------------

The line length then should not exceed 72 to 78 characters, for quoting.

Although the <HT> character can be used for indentation, 4 spaces appear to 
be a sensible, device-independent default (8 is too much for more complex 
snippets that need to fit within the conventional 80-characters-per-line 
limit).

-- 
PointedEars

Twitter: @PointedEars2
Please do not cc me. / Bitte keine Kopien per E-Mail.

[toc] | [prev] | [next] | [standalone]


#92236

FromChris Angelico <rosuav@gmail.com>
Date2015-06-07 21:51 +1000
Message-ID<mailman.242.1433677915.13271.python-list@python.org>
In reply to#92226
On Sun, Jun 7, 2015 at 8:40 PM, Thomas 'PointedEars' Lahn
<PointedEars@web.de> wrote:
> Cecil Westerhof wrote:
>
>> I wrote a very simple function to test random:
>>     def test_random(length, multiplier = 10000):
>>         number_list = length * [0]
>>         for i in range(length * multiplier):
>>             number_list[random.randint(0, length - 1)] += 1
>>         minimum = min(number_list)
>>         maximum = max(number_list)
>>         return (minimum, maximum, minimum / maximum)
>
> As there is no guarantee that every number will occur randomly, using a
> dictionary at first should be more efficient than a list:

Hmm, I'm not sure that's actually so. His code is aiming to get
'multiplier' values in each box; for any serious multiplier (he starts
with 10 in the main code), you can be fairly confident that every
number will come up at least once. The distribution of numbers won't
ever be perfectly even, but you'd expect it to be reasonably close. I
have a similar routine on my Dungeons & Dragons server; since a roll
of a twenty-sided dice is crucial to most of the game, I have a simple
tester that proves to people that the in-built dice roller is fair.
Its output looks like this:

> roll test
1: 10017
2: 10003
3: 9966
4: 9728
5: 10088
6: 9888
7: 10087
8: 9971
9: 10052
10: 10061
11: 10130
12: 9942
13: 10062
14: 10075
15: 10050
16: 9948
17: 9880
18: 10052
19: 9995
20: 10005
Standard deviation: 90.18 (0.90%)

This is about equivalent to test_random(20), and as you see, he and I
both picked a multiplier of 10K to use by default. (I call the
parameters "max" and "avg" rather than "length" and "multiplier", but
they have the exact same semantics.) The hard part is figuring out
what "looks reasonable"; a true RNG could legitimately produce nothing
but 7s for the entire run, it's just extremely unlikely.

ChrisA

[toc] | [prev] | [next] | [standalone]


#92263

FromThomas 'PointedEars' Lahn <PointedEars@web.de>
Date2015-06-07 17:51 +0200
Message-ID<3158703.Lr4HFMbMOd@PointedEars.de>
In reply to#92236
Chris Angelico wrote:

> On Sun, Jun 7, 2015 at 8:40 PM, Thomas 'PointedEars' Lahn
> <PointedEars@web.de> wrote:
>> Cecil Westerhof wrote:
>>> I wrote a very simple function to test random:
>>>     def test_random(length, multiplier = 10000):
>>>         number_list = length * [0]
>>>         for i in range(length * multiplier):
>>>             number_list[random.randint(0, length - 1)] += 1
>>>         minimum = min(number_list)
>>>         maximum = max(number_list)
>>>         return (minimum, maximum, minimum / maximum)
>>
>> As there is no guarantee that every number will occur randomly, using a
>> dictionary at first should be more efficient than a list:
> 
> Hmm, I'm not sure that's actually so. His code is aiming to get
> 'multiplier' values in each box; for any serious multiplier (he starts
> with 10 in the main code), you can be fairly confident that every
> number will come up at least once. 

The wording shows a common misconception: that random distribution would 
mean that it is guaranteed or more probable that every element of the set 
will occur at least once.  It is another common misconception that 
increasing the number of trials would increase the probability of that 
happening.  But that is not so.

The law of large numbers only says that as you increase the number of 
trials, that the relative frequency *approaches* the probability for each 
value of the probability variable, or IOW “the average of the results 
obtained from a large number of trials should be close to the expected 
value, and will *tend to become closer* as more trials are performed.” 
(<http://en.wikipedia.org/wiki/Law_of_large_numbers>; emphasis by me)

> […] a true RNG could legitimately produce nothing but 7s for the entire 
> run, it's just extremely unlikely.

That reasoning is precisely a result of the misconception described above.  
Because people think that every value must occur, they do not think it 
possible that (much) repetition could occur with a (pseudo-)random 
generator, and when they want to mince words they say “(extremely) unlikely” 
instead.  For example, when people see

  6 5 8 7 9 3 7 8 4 7 5 6 8 8 1 2 8 3 5 7 5 4 1 2 4 8 8 7 5 1

and are told that this is random sequence, they find it hard to believe.  
They think something like: “Look at all those repeated 8, and “7, 5” 
occurs twice.  4 occurs more often than 2, and there are much more 5s than 
1s.  That cannot be possibly be random!”

Yes, it *can*.  I have just produced it with

#------------------------------------------------------------------
import random
print(" ".join([str(random.randint(1, 9)) for i in range(0, 30)]))'
#------------------------------------------------------------------

But if you think this *Mersenne Twister* PRNG which “generates numbers with 
nearly uniform distribution and a large period” is flawed, use a proper die 
as RNG, and a sheet of paper to record the outcomes, and do the experiment.  
The outcome is not going to be different.

If the distribution is even and the game is fair, every outcome *always* has 
the same probability of occurring.  As a result, every sequence of outcomes 
of equal length *always* has the same probability of occurring, and the 
probability for a particular sequence of equal length does _not_ increase or 
decrease based on the number of occurrences of previous outcomes.  Those are 
*independent* events.

See also: 
<http://www.teacherlink.org/content/math/interactive/probability/numbersense/misconceptions/home.html>, 
in particular the section “Representativeness”

-- 
PointedEars

Twitter: @PointedEars2
Please do not cc me. / Bitte keine Kopien per E-Mail.

[toc] | [prev] | [next] | [standalone]


#92267

FromChris Angelico <rosuav@gmail.com>
Date2015-06-08 02:25 +1000
Message-ID<mailman.257.1433694323.13271.python-list@python.org>
In reply to#92263
On Mon, Jun 8, 2015 at 1:51 AM, Thomas 'PointedEars' Lahn
<PointedEars@web.de> wrote:
> Chris Angelico wrote:
>
>> On Sun, Jun 7, 2015 at 8:40 PM, Thomas 'PointedEars' Lahn
>> <PointedEars@web.de> wrote:
>>> Cecil Westerhof wrote:
>>>> I wrote a very simple function to test random:
>>>>     def test_random(length, multiplier = 10000):
>>>>         number_list = length * [0]
>>>>         for i in range(length * multiplier):
>>>>             number_list[random.randint(0, length - 1)] += 1
>>>>         minimum = min(number_list)
>>>>         maximum = max(number_list)
>>>>         return (minimum, maximum, minimum / maximum)
>>>
>>> As there is no guarantee that every number will occur randomly, using a
>>> dictionary at first should be more efficient than a list:
>>
>> Hmm, I'm not sure that's actually so. His code is aiming to get
>> 'multiplier' values in each box; for any serious multiplier (he starts
>> with 10 in the main code), you can be fairly confident that every
>> number will come up at least once.
>
> The wording shows a common misconception: that random distribution would
> mean that it is guaranteed or more probable that every element of the set
> will occur at least once.  It is another common misconception that
> increasing the number of trials would increase the probability of that
> happening.  But that is not so.

The greater the multiplier, the lower the chance that any element will
have no hits. With uniform distribution, a length of 10, and a
multiplier of 10, there are 100 attempts with a 90% chance each that
any given number will be avoided - which works out to 0.9**100 ==
2.6561398887587544e-05 probability that (say) there'll be no 4s in the
list. Invert that and raise to the 10th power, and you get a
probability of 0.9997344177567317 that there'll be at least one in
every bucket. Raise the multiplier to 100, and the probability of a
single whiff becomes 1.7478712517226947e-46; invert and raise to the
tenth power, and it becomes closer to certainty than IEEE double
precision can represent. Raise the length to 100 and the numbers get
lower again; with multiplier 10, probability 0.9956920878572284 of
having one in every bucket (that's a half a percent chance of a zero
anywhere), and at multiplier 100, still underflows to certainty.

But you'll notice that I wasn't actually talking about certainty here.
I was talking about confidence, at levels sufficient to make data-type
decisions on. Sure, there's no guarantee that every number will occur;
but if there's a 0.4% chance that any number will be omitted, I think
the list is going to work out more efficient.

You'll notice that some of the other posts have been concerned more
about correctness (for instance, using collections.Counter and then
making sure there's a zero slot for every element - otherwise empty
slots would be omitted), and then they _do_ acknowledge the chance
that something will underflow. But with the numbers the OP gave, I
would be fully satisfied with optimizing for the case where every
bucket gets at least something.

ChrisA

[toc] | [prev] | [next] | [standalone]


#92271

FromThomas 'PointedEars' Lahn <PointedEars@web.de>
Date2015-06-07 18:36 +0200
Message-ID<1656356.OjxBvjpJ5d@PointedEars.de>
In reply to#92267
Chris Angelico wrote:

> On Mon, Jun 8, 2015 at 1:51 AM, Thomas 'PointedEars' Lahn
> <PointedEars@web.de> wrote:
>> Chris Angelico wrote:
>>
>>> On Sun, Jun 7, 2015 at 8:40 PM, Thomas 'PointedEars' Lahn
>>> <PointedEars@web.de> wrote:
>>>> Cecil Westerhof wrote:
>>>>> I wrote a very simple function to test random:
>>>>>     def test_random(length, multiplier = 10000):
>>>>>         number_list = length * [0]
>>>>>         for i in range(length * multiplier):
>>>>>             number_list[random.randint(0, length - 1)] += 1
>>>>>         minimum = min(number_list)
>>>>>         maximum = max(number_list)
>>>>>         return (minimum, maximum, minimum / maximum)
>>>>
>>>> As there is no guarantee that every number will occur randomly, using a
>>>> dictionary at first should be more efficient than a list:
>>>
>>> Hmm, I'm not sure that's actually so. His code is aiming to get
>>> 'multiplier' values in each box; for any serious multiplier (he starts
>>> with 10 in the main code), you can be fairly confident that every
>>> number will come up at least once.
>>
>> The wording shows a common misconception: that random distribution would
>> mean that it is guaranteed or more probable that every element of the set
>> will occur at least once.  It is another common misconception that
>> increasing the number of trials would increase the probability of that
>> happening.  But that is not so.
> 
> The greater the multiplier, the lower the chance that any element will
> have no hits.

Wrong.

> [ex falso quodlibet]

-- 
PointedEars

Twitter: @PointedEars2
Please do not cc me. / Bitte keine Kopien per E-Mail.

[toc] | [prev] | [next] | [standalone]


#92272

FromChris Angelico <rosuav@gmail.com>
Date2015-06-08 02:44 +1000
Message-ID<mailman.259.1433695485.13271.python-list@python.org>
In reply to#92271
On Mon, Jun 8, 2015 at 2:36 AM, Thomas 'PointedEars' Lahn
<PointedEars@web.de> wrote:
>> The greater the multiplier, the lower the chance that any element will
>> have no hits.
>
> Wrong.
>
>> [ex falso quodlibet]

Huh. Do you want to explain how, mathematically, I am wrong, or do you
want to join the RUE in my ignore list?

ChrisA

[toc] | [prev] | [next] | [standalone]


#92283

FromThomas 'PointedEars' Lahn <PointedEars@web.de>
Date2015-06-07 20:23 +0200
Message-ID<5515034.V7dcXEWAvK@PointedEars.de>
In reply to#92272
Chris Angelico wrote:

> On Mon, Jun 8, 2015 at 2:36 AM, Thomas 'PointedEars' Lahn
> <PointedEars@web.de> wrote:
>>> The greater the multiplier, the lower the chance that any element will
>>> have no hits.
>> Wrong.
>>
>>> [ex falso quodlibet]
> 
> Huh. Do you want to explain how, mathematically, I am wrong, or do you
> want to join the RUE in my ignore list?

I already did; you have overlooked it.  In a nutshell, the probability of 

  1 1 1 1 1 1 1 1 1 1 1

is *the same* as that of

  1 2 3 4 5 6 7 8 9 1 2

and the same as that of

  8 3 6 3 1 2 6 8 2 1 6.

If the set to choose from is integer numbers from 1 to 9, then *each* of 
those sequences has *the same* probability (1∕9)¹¹ ≈ 3.1866355 × 10⁻¹¹.
Repeating that random experiment one more time, the probability of

  1 1 1 1 1 1 1 1 1 1 1 2

is the same as of

  1 2 3 4 5 6 7 8 9 1 2 2

and the same as that of

  8 3 6 3 1 2 6 8 2 1 6 2

(that the next outcome is a 2): (1∕9)¹² ≈ 3.5407062 × 10⁻¹².  And the 
probability of

  1 1 1 1 1 1 1 1 1 1 1 1

(only 1s) is the *same* as of

  1 1 1 1 1 1 1 1 1 1 1 2

(only 1s and one 2) and the same as of

  1 1 1 1 1 1 1 1 1 1 1 3

(only 1s and a 3) and of

  8 3 6 3 1 2 6 8 2 1 6 4

(that the next outcome is a 4, and there are no 5s at all).

AISB, those are *independent* events; the number of occurrences of an 
outcome before *does not matter* for the probability of the next one.  And 
so the probability of getting a certain number does _not_ change depending 
on the number of times you repeat the experiment.  It is always the same;
in this example, it is always 1∕9.  And the probability of _not_ getting a 
certain number is always the same; in this example, it is always 1 − 1∕9 = 
8∕9.


So, I am sorry to tell you this, but you do _not_ understand probability.  
And you *cannot* understand it intuitively, like you tried to.  Probability 
is counter-intuitive.

That should not surprise or worry you.  Because as I and others have pointed 
out in this thread, the majority of the human population, including the OP, 
does not understand probability – until it has been explained to them.  I am 
not even sure I understand probability in every instance, or rather, I am 
sure that *intuitively* I do not.

It is why humans play the lottery using their “lucky numbers” whereas other 
numbers have exactly the same probability of being drawn; why they think a  
baby boy “is due” when they have two girls already; why they are surprised 
if two people in the room have the same birthday; why *in the short run* you 
can make a fortune by running a casino even if the tables are not rigged; 
why the “gate three” trick works (for a while); why they pay money for 
horoscopes and truthsayers, not recognizing that their wording is so 
ambiguous that it is highly probable to apply to everyone, and why many of 
them believe in a deity or fate.

A big part of the reason is psychology, how the human mind works: The human 
mind has evolved to recognize patterns.  It is an evolutionary advantage to 
be able to recognize patterns, to tell a mammoth from a sabre-stooth lion at 
a distance, a poisonous from an edible plant, and an X from a U.  So humans 
try to see patterns everywhere, even in pure randomness: clouds, texts, 
numbers, you name it.  And when they are told that something is random, and 
they find patterns regardless, they do not believe that it is random.

An extension of that misconception is emphasized by an anecdote (which may 
be apocryphal) told about Richard Feynman (I heard it from Lawrence Krauss 
in “A Universe from Nothing”; he can tell that in a much more funny way than 
I am able to reproduce it here [1]):

  Richard Feynman used to go up to people all the time and he’d say:

    “You won’t believe what happend to me today!
     You won’t believe what happend to me today!”

  And people would say:

    “What?”

  And he would reply:

    “Absolutely nothing!”

Because humans are evolutionary driven to look for patterns, and they 
believe that everything that happens to them is important and has 
significance in “the greater scheme of things” (whose mere existence
is a misconception, too).

HTH

________
[1] Lawrence KRAUSS (2009). “A Universe from Nothing”. The Mariott hotel,
    Burbank. 1:03:12.  <https://youtu.be/-EilZ4VY5Vs?t=3792>
-- 
PointedEars

Twitter: @PointedEars2
Please do not cc me. / Bitte keine Kopien per E-Mail.

[toc] | [prev] | [next] | [standalone]


#92289

FromChris Angelico <rosuav@gmail.com>
Date2015-06-08 04:52 +1000
Message-ID<mailman.271.1433703133.13271.python-list@python.org>
In reply to#92283
On Mon, Jun 8, 2015 at 4:23 AM, Thomas 'PointedEars' Lahn
<PointedEars@web.de> wrote:
> If the set to choose from is integer numbers from 1 to 9, then *each* of
> those sequences has *the same* probability (1∕9)¹¹ ≈ 3.1866355 × 10⁻¹¹.
>
> AISB, those are *independent* events; the number of occurrences of an
> outcome before *does not matter* for the probability of the next one.  And
> so the probability of getting a certain number does _not_ change depending
> on the number of times you repeat the experiment.  It is always the same;
> in this example, it is always 1∕9.  And the probability of _not_ getting a
> certain number is always the same; in this example, it is always 1 − 1∕9 =
> 8∕9.

Yes, the probability of not getting a certain number is always 8/9...
but the probability of not getting that number for an entire sequence
is 8/9 raised to the power of the length of the sequence, because they
are independent events that must *all* happen. In your case, with a
length of 11, that means there's about a 27% chance of any particular
number not happening. Looking at it the other way, that means there's
a 73% chance of seeing at least one N, for any N. So the probability
of seeing at least one of every digit is actually quite low - about
5.6% - because you're looking for nine digits out of eleven. (Note
that the numbers aren't quite exact here. The probabilities aren't
entirely independent. Calculating off the exact figures is left as an
exercise for the reader, but I don't expect they'll be more than a
percentage point different one way or the other.)

The chances of not seeing a 5 in any of 100 tosses would be virtually
nil. The chances of not seeing a 5 in any of 1000 tosses is so close
to zero that its inverse is indistinguishable from certainty with
Python floats (IEEE double precision):

>>> 1.0-(9/10)**1000 == 1.0
True

That's a pretty low number.

ChrisA

[toc] | [prev] | [next] | [standalone]


#92295

FromThomas 'PointedEars' Lahn <PointedEars@web.de>
Date2015-06-07 21:41 +0200
Message-ID<3006491.U2VKTQuU5p@PointedEars.de>
In reply to#92289
Chris Angelico wrote:

> On Mon, Jun 8, 2015 at 4:23 AM, Thomas 'PointedEars' Lahn
> <PointedEars@web.de> wrote:
>> If the set to choose from is integer numbers from 1 to 9, then *each* of
>> those sequences has *the same* probability (1∕9)¹¹ ≈ 3.1866355 × 10⁻¹¹.
>>
>> AISB, those are *independent* events; the number of occurrences of an
>> outcome before *does not matter* for the probability of the next one. 
>> And so the probability of getting a certain number does _not_ change
>> depending on the number of times you repeat the experiment.  It is always 
>> the same; in this example, it is always 1∕9.  And the probability of 
>> _not_ getting  certain number is always the same; in this example, it is 
>> always 1 − 1∕9 = 8∕9.
> 
> Yes, the probability of not getting a certain number is always 8/9...
> but the probability of not getting that number for an entire sequence
> is 8/9 raised to the power of the length of the sequence

As it is for getting every number at least once, or only two, three, four, 
five, six, seven, or eight of them.  Each number has the probability of 1∕9 
of being drawn, and for a sequence of n numbers drawn, the probability is 
(1∕9)ⁿ, *no matter* which numbers are in it.

> because they are independent events that must *all* happen. […]

As I already pointed out, your reasoning is purely *intuitive*, _not_ based 
on (correct) math, and (therefore) flawed.  And intuition lets you down 
here.

The correct, mathematical reasoning is: *Because* the events are 
*independent*, the probability of the sequence does not change depending on 
the number of occurrences of an event (outcome of a draw).  That all numbers 
occur is _not_ more likely than that only one number occurs, or two, three, 
and so on.  And that does _not_ depend on how many draws you take as there 
is an infinite reservoir for each number.

I do not think I can explain it to you better, and I am not going to waste 
my precious free time to repeat myself.  Follow the reference I gave.

-- 
PointedEars

Twitter: @PointedEars2
Please do not cc me. / Bitte keine Kopien per E-Mail.

[toc] | [prev] | [next] | [standalone]


#92292

FromJussi Piitulainen <jpiitula@ling.helsinki.fi>
Date2015-06-07 22:08 +0300
Message-ID<lf5sia3bakh.fsf@ling.helsinki.fi>
In reply to#92283
Thomas 'PointedEars' Lahn writes:
> Chris Angelico wrote:
>
>> Huh. Do you want to explain how, mathematically, I am wrong, or do
>> you want to join the RUE in my ignore list?
>
> I already did; you have overlooked it.  In a nutshell, the probability
> of
>
>   1 1 1 1 1 1 1 1 1 1 1

There is one way to get those numbers in some order.

(11! / 11! = 1)

> is *the same* as that of
>
>   1 2 3 4 5 6 7 8 9 1 2

There are almost ten million ways to get those numbers in some order.

(11! / 2! / 2! = 9979200)

> and the same as that of
>
>   8 3 6 3 1 2 6 8 2 1 6.

There are more than four hundred thousand ways to get those numbers in
some order.

(11! / 2! / 2! / 2! / 3! / 2! = 415800)

[toc] | [prev] | [next] | [standalone]


#92294

FromThomas 'PointedEars' Lahn <PointedEars@web.de>
Date2015-06-07 21:29 +0200
Message-ID<1583276.2lplL8rY5W@PointedEars.de>
In reply to#92292
Jussi Piitulainen wrote:

> Thomas 'PointedEars' Lahn writes:
>>   8 3 6 3 1 2 6 8 2 1 6.
> 
> There are more than four hundred thousand ways to get those numbers in
> some order.
> 
> (11! / 2! / 2! / 2! / 3! / 2! = 415800)

Fallacy.  Order is irrelevant here.

-- 
PointedEars

Twitter: @PointedEars2
Please do not cc me. / Bitte keine Kopien per E-Mail.

[toc] | [prev] | [next] | [standalone]


#92296

Fromrandom832@fastmail.us
Date2015-06-07 15:44 -0400
Message-ID<mailman.274.1433706275.13271.python-list@python.org>
In reply to#92294
On Sun, Jun 7, 2015, at 15:29, Thomas 'PointedEars' Lahn wrote:
> Jussi Piitulainen wrote:
> 
> > Thomas 'PointedEars' Lahn writes:
> >>   8 3 6 3 1 2 6 8 2 1 6.
> > 
> > There are more than four hundred thousand ways to get those numbers in
> > some order.
> > 
> > (11! / 2! / 2! / 2! / 3! / 2! = 415800)
> 
> Fallacy.  Order is irrelevant here.

Huh? The fact that order is irrelevant is exactly _why_ what he said
_is_ relevant.

[toc] | [prev] | [next] | [standalone]


#92299

FromThomas 'PointedEars' Lahn <PointedEars@web.de>
Date2015-06-07 22:09 +0200
Message-ID<44804165.jDZYAAtjtx@PointedEars.de>
In reply to#92296
random832@fastmail.us wrote:

> On Sun, Jun 7, 2015, at 15:29, Thomas 'PointedEars' Lahn wrote:
>> Jussi Piitulainen wrote:
>> > Thomas 'PointedEars' Lahn writes:
>> >>   8 3 6 3 1 2 6 8 2 1 6.
>> > 
>> > There are more than four hundred thousand ways to get those numbers in
>> > some order.
>> > 
>> > (11! / 2! / 2! / 2! / 3! / 2! = 415800)
>> Fallacy.  Order is irrelevant here.
> 
> Huh? The fact that order is irrelevant is exactly _why_ what he said
> _is_ relevant.

No.  AISB, those sequences all have the same probability:

<http://www.teacherlink.org/content/math/interactive/probability/numbersense/misconceptions/representativeness.html>
<http://www.teacherlink.org/content/math/interactive/probability/numbersense/misconceptions/recency.html>

-- 
PointedEars

Twitter: @PointedEars2
Please do not cc me. / Bitte keine Kopien per E-Mail.

[toc] | [prev] | [next] | [standalone]


#92302

Fromrandom832@fastmail.us
Date2015-06-07 16:41 -0400
Message-ID<mailman.276.1433709682.13271.python-list@python.org>
In reply to#92299

On Sun, Jun 7, 2015, at 16:09, Thomas 'PointedEars' Lahn wrote:
> No.  AISB, those sequences all have the same probability:

Yes and the probability of getting _any_ of the sequences, is the sum of
the probabilities for each one of the sequences individually.

[toc] | [prev] | [next] | [standalone]


#92305

FromThomas 'PointedEars' Lahn <PointedEars@web.de>
Date2015-06-07 22:59 +0200
Message-ID<12048809.77ogS5qXqA@PointedEars.de>
In reply to#92302
random832@fastmail.us wrote:

> On Sun, Jun 7, 2015, at 16:09, Thomas 'PointedEars' Lahn wrote:
>> No.  AISB, those sequences all have the same probability:
> 
> Yes and the probability of getting _any_ of the sequences, is the sum of
> the probabilities for each one of the sequences individually.

The probability of getting any of the sequences is precisely 1, and that is 
irrelevant to the fact that the probability of getting a particular sequence 
is independent of the number of occurrences of a specific event as the 
probability variable is evenly distributed and the events are independent 
(of each other).

You really should follow the references and brush up your math.

-- 
PointedEars

Twitter: @PointedEars2
Please do not cc me. / Bitte keine Kopien per E-Mail.

[toc] | [prev] | [next] | [standalone]


#92318

FromSteven D'Aprano <steve@pearwood.info>
Date2015-06-08 11:26 +1000
Message-ID<5574ef45$0$12981$c3e8da3$5496439d@news.astraweb.com>
In reply to#92302
On Mon, 8 Jun 2015 06:56 am, Thomas 'PointedEars' Lahn wrote:

> You really should follow the references and brush up your math.

You really should stop being patronising to everyone and pay attention to
what is actually being said.

If you won't believe us, and you won't believe mathematics, perhaps you will
believe a simulation.


import random

def trial(size):
    return [random.randint(1, 9) for i in range(size)]

def test(trial_size, test_size):
    every_element_seen = 0
    for i in range(test_size):
        s = set(trial(trial_size))
        if len(s) == 9:
            every_element_seen += 1
    return every_element_seen/test_size


msg = "Pr(every element seen at least once) in %d trials"
for size in range(10, 100, 5):
    print(msg % size, end=' ')
    pr = test(size, 1000)
    print(pr)


When I run this simulation, I get:

Pr(every element seen at least once) in 10 trials 0.007
Pr(every element seen at least once) in 15 trials 0.109
Pr(every element seen at least once) in 20 trials 0.349
Pr(every element seen at least once) in 25 trials 0.6
Pr(every element seen at least once) in 30 trials 0.765
Pr(every element seen at least once) in 35 trials 0.868
Pr(every element seen at least once) in 40 trials 0.917
Pr(every element seen at least once) in 45 trials 0.953
Pr(every element seen at least once) in 50 trials 0.976
Pr(every element seen at least once) in 55 trials 0.981
Pr(every element seen at least once) in 60 trials 0.996
Pr(every element seen at least once) in 65 trials 0.999
Pr(every element seen at least once) in 70 trials 0.997
Pr(every element seen at least once) in 75 trials 1.0
Pr(every element seen at least once) in 80 trials 0.998
Pr(every element seen at least once) in 85 trials 1.0
Pr(every element seen at least once) in 90 trials 1.0
Pr(every element seen at least once) in 95 trials 1.0


 

-- 
Steven

[toc] | [prev] | [next] | [standalone]


#92319

Fromrandom832@fastmail.us
Date2015-06-07 21:34 -0400
Message-ID<mailman.284.1433727282.13271.python-list@python.org>
In reply to#92302
On Sun, Jun 7, 2015, at 16:56, Thomas 'PointedEars' Lahn wrote:
> random832@fastmail.us wrote:
> 
> > On Sun, Jun 7, 2015, at 16:09, Thomas 'PointedEars' Lahn wrote:
> >> No.  AISB, those sequences all have the same probability:
> > 
> > Yes and the probability of getting _any_ of the sequences, is the sum of
> > the probabilities for each one of the sequences individually.
> 
> The probability of getting any of the sequences is precisely 1

Any of the sequences *satisfying "contains at least one of each result"*
(or, in general, any assertion about the degree of coverage, for
example, covering at least two values being more likely than covering
only one value)

Flipping one coin has two possible sequences: H and T. The probability
of getting at least one H and at least one T is 0.

Flipping two coins has four possible sequences: HH, HT, TH, TT. The
probability of getting at least one H and at least one T is 50%, since
two of these sequences satisfies the condition.

Flipping three coins has eight possible sequences. The probability of
getting at least one H and at least one T is 75%, since six of the
sequences: THH, HTH, HHT, HTT, THT, TTH, all satisfy the condition.

Flipping four coins has sixteen possibile sequences. The probability of
getting at least one H and at least one T is 87.5%

Rolling 2d3 has nine possible sequences.
The probability of getting at least one of each value is 0.
The probability of getting at least one each of two values is 66.7%.
The probability of getting only one value is 33.3%.

Rolling 3d3 has 27 possible sequences.
Six of them have one of each value. The probability is 22.2%.
24 have at least two different values. That probability is 88.9%.
Only 3 have all of the same value. 11.1%.

Rolling 4d3 has 81 possible sequences.
36 of them have at least one of each value. The probability is 44.4%.
78 of them have at least two different values. The probability is 96.3%.
Still only 3 have all of the same value. 3.7%.

Rolling 5d3 has 243 possible sequences.
150 of them have at least one of each value. The probability is 61.7%
240 of them have at least two different values. 98.8%.
Still only 3 have all of the same value. 1.2%.


In general, as the number of trials increases, the probability of having
e.g. at least one of each value never _reaches_ 1, but it gets
arbitrarily close.

[toc] | [prev] | [next] | [standalone]


#92321

FromChris Angelico <rosuav@gmail.com>
Date2015-06-08 11:42 +1000
Message-ID<mailman.285.1433727771.13271.python-list@python.org>
In reply to#92302
On Mon, Jun 8, 2015 at 11:34 AM,  <random832@fastmail.us> wrote:
> In general, as the number of trials increases, the probability of having
> e.g. at least one of each value never _reaches_ 1, but it gets
> arbitrarily close.

And by "arbitrarily close", you mean any of:

* So close to 1.0 that IEEE double precision is unable to represent it
* So unlikely that you could perform one trial every nanosecond until
the heat death of the universe and still not expect to see it
* Less likely than that two files could accidentally collide on MD5,
SHA1, SHA256, and file size, simultaneously

I think all of the above are true of this case, though I have to guess
about the heat death of the universe.

ChrisA

[toc] | [prev] | [next] | [standalone]


#92322

FromMRAB <python@mrabarnett.plus.com>
Date2015-06-08 02:49 +0100
Message-ID<mailman.286.1433728391.13271.python-list@python.org>
In reply to#92302
On 2015-06-08 02:42, Chris Angelico wrote:
> On Mon, Jun 8, 2015 at 11:34 AM,  <random832@fastmail.us> wrote:
>> In general, as the number of trials increases, the probability of having
>> e.g. at least one of each value never _reaches_ 1, but it gets
>> arbitrarily close.
>
> And by "arbitrarily close", you mean any of:
>
I believe the word is "asymptotic".

> * So close to 1.0 that IEEE double precision is unable to represent it
> * So unlikely that you could perform one trial every nanosecond until
> the heat death of the universe and still not expect to see it
> * Less likely than that two files could accidentally collide on MD5,
> SHA1, SHA256, and file size, simultaneously
>
> I think all of the above are true of this case, though I have to guess
> about the heat death of the universe.
>

[toc] | [prev] | [next] | [standalone]


Page 1 of 4  [1] 2 3 4  Next page →

Back to top | Article view | comp.lang.python


csiph-web