Groups | Search | Server Info | Keyboard shortcuts | Login | Register [http] [https] [nntp] [nntps]
Groups > comp.lang.python > #92207 > unrolled thread
| Started by | Cecil Westerhof <Cecil@decebal.nl> |
|---|---|
| First post | 2015-06-07 08:27 +0200 |
| Last post | 2015-06-07 11:04 -0400 |
| Articles | 20 on this page of 80 — 22 participants |
Back to article view | Back to comp.lang.python
Testing random Cecil Westerhof <Cecil@decebal.nl> - 2015-06-07 08:27 +0200
Re: Testing random Thomas 'PointedEars' Lahn <PointedEars@web.de> - 2015-06-07 12:40 +0200
Re: Testing random Chris Angelico <rosuav@gmail.com> - 2015-06-07 21:51 +1000
Re: Testing random Thomas 'PointedEars' Lahn <PointedEars@web.de> - 2015-06-07 17:51 +0200
Re: Testing random Chris Angelico <rosuav@gmail.com> - 2015-06-08 02:25 +1000
Re: Testing random Thomas 'PointedEars' Lahn <PointedEars@web.de> - 2015-06-07 18:36 +0200
Re: Testing random Chris Angelico <rosuav@gmail.com> - 2015-06-08 02:44 +1000
Re: Testing random Thomas 'PointedEars' Lahn <PointedEars@web.de> - 2015-06-07 20:23 +0200
Re: Testing random Chris Angelico <rosuav@gmail.com> - 2015-06-08 04:52 +1000
Re: Testing random Thomas 'PointedEars' Lahn <PointedEars@web.de> - 2015-06-07 21:41 +0200
Re: Testing random Jussi Piitulainen <jpiitula@ling.helsinki.fi> - 2015-06-07 22:08 +0300
Re: Testing random Thomas 'PointedEars' Lahn <PointedEars@web.de> - 2015-06-07 21:29 +0200
Re: Testing random random832@fastmail.us - 2015-06-07 15:44 -0400
Re: Testing random Thomas 'PointedEars' Lahn <PointedEars@web.de> - 2015-06-07 22:09 +0200
Re: Testing random random832@fastmail.us - 2015-06-07 16:41 -0400
Re: Testing random Thomas 'PointedEars' Lahn <PointedEars@web.de> - 2015-06-07 22:59 +0200
Re: Testing random Steven D'Aprano <steve@pearwood.info> - 2015-06-08 11:26 +1000
Re: Testing random random832@fastmail.us - 2015-06-07 21:34 -0400
Re: Testing random Chris Angelico <rosuav@gmail.com> - 2015-06-08 11:42 +1000
Re: Testing random MRAB <python@mrabarnett.plus.com> - 2015-06-08 02:49 +0100
Re: Testing random random832@fastmail.us - 2015-06-07 21:57 -0400
Re: Testing random Jussi Piitulainen <jpiitula@ling.helsinki.fi> - 2015-06-08 10:40 +0300
Re: Testing random Thomas 'PointedEars' Lahn <PointedEars@web.de> - 2015-06-10 19:03 +0200
Re: Testing random sohcahtoa82@gmail.com - 2015-06-10 10:52 -0700
Re: Testing random Jussi Piitulainen <jpiitula@ling.helsinki.fi> - 2015-06-10 23:00 +0300
Re: Testing random Ian Kelly <ian.g.kelly@gmail.com> - 2015-06-10 12:02 -0600
Re: Testing random Thomas 'PointedEars' Lahn <PointedEars@web.de> - 2015-06-12 23:32 +0200
Re: Testing random alister <alister.nospam.ware@ntlworld.com> - 2015-06-12 21:46 +0000
Re: Testing random random832@fastmail.us - 2015-06-12 17:52 -0400
Re: Testing random Ian Kelly <ian.g.kelly@gmail.com> - 2015-06-12 16:00 -0600
Re: Testing random Thomas 'PointedEars' Lahn <PointedEars@web.de> - 2015-06-13 00:09 +0200
Re: Testing random sohcahtoa82@gmail.com - 2015-06-12 15:55 -0700
Re: Testing random random832@fastmail.us - 2015-06-12 18:57 -0400
Re: Testing random Mark Lawrence <breamoreboy@yahoo.co.uk> - 2015-06-13 08:53 +0100
Re: Testing random random832@fastmail.us - 2015-06-10 14:26 -0400
Re: Testing random Ned Batchelder <ned@nedbatchelder.com> - 2015-06-07 14:21 -0700
Re: Testing random Thomas 'PointedEars' Lahn <PointedEars@web.de> - 2015-06-16 21:18 +0200
Re: Testing random random832@fastmail.us - 2015-06-16 16:23 -0400
Re: Testing random Ned Batchelder <ned@nedbatchelder.com> - 2015-06-16 13:48 -0700
Re: Testing random Thomas 'PointedEars' Lahn <PointedEars@web.de> - 2015-06-16 23:57 +0200
Re: Testing random sohcahtoa82@gmail.com - 2015-06-16 15:30 -0700
Re: Testing random Ian Kelly <ian.g.kelly@gmail.com> - 2015-06-16 16:58 -0600
Re: Testing random Laura Creighton <lac@openend.se> - 2015-06-17 11:28 +0200
Re: Testing random Ned Batchelder <ned@nedbatchelder.com> - 2015-06-16 16:26 -0700
Re: Testing random Thomas 'PointedEars' Lahn <PointedEars@web.de> - 2015-06-17 01:45 +0200
Re: Testing random sohcahtoa82@gmail.com - 2015-06-16 17:36 -0700
Re: Testing random Chris Angelico <rosuav@gmail.com> - 2015-06-17 11:01 +1000
Re: Testing random Ethan Furman <ethan@stoneleaf.us> - 2015-06-16 18:32 -0700
Re: Testing random Mark Lawrence <breamoreboy@yahoo.co.uk> - 2015-06-17 09:41 +0100
Re: Testing random Grant Edwards <invalid@invalid.invalid> - 2015-06-17 14:04 +0000
Re: Testing random Ian Kelly <ian.g.kelly@gmail.com> - 2015-06-17 09:01 -0600
Re: Testing random MRAB <python@mrabarnett.plus.com> - 2015-06-17 01:42 +0100
Re: Testing random Thomas 'PointedEars' Lahn <PointedEars@web.de> - 2015-06-17 08:53 +0200
Re: Testing random Christian Gollwitzer <auriocus@gmx.de> - 2015-06-17 09:22 +0200
Re: Testing random Chris Angelico <rosuav@gmail.com> - 2015-06-17 17:28 +1000
Re: Testing random Tim Golden <mail@timgolden.me.uk> - 2015-06-17 08:30 +0100
Re: Testing random Cecil Westerhof <Cecil@decebal.nl> - 2015-06-17 11:57 +0200
Re: Testing random Steven D'Aprano <steve+comp.lang.python@pearwood.info> - 2015-06-17 01:36 +0000
Re: Testing random Laura Creighton <lac@openend.se> - 2015-06-17 12:33 +0200
Re: Testing random Steven D'Aprano <steve@pearwood.info> - 2015-06-17 22:47 +1000
Re: Testing random Laura Creighton <lac@openend.se> - 2015-06-17 15:50 +0200
Re: Testing random Steven D'Aprano <steve+comp.lang.python@pearwood.info> - 2015-06-17 01:35 +0000
Re: Testing random Jussi Piitulainen <jpiitula@ling.helsinki.fi> - 2015-06-17 07:41 +0300
Re: Testing random Steven D'Aprano <steve@pearwood.info> - 2015-06-08 11:11 +1000
Re: Testing random Ian Kelly <ian.g.kelly@gmail.com> - 2015-06-07 11:07 -0600
Re: Testing random Chris Angelico <rosuav@gmail.com> - 2015-06-08 03:20 +1000
Re: Testing random "C.D. Reimer" <chris@cdreimer.com> - 2015-06-07 10:36 -0700
Re: Testing random Steven D'Aprano <steve@pearwood.info> - 2015-06-08 04:28 +1000
Re: Testing random Chris Angelico <rosuav@gmail.com> - 2015-06-08 04:40 +1000
Re: Testing random Steven D'Aprano <steve@pearwood.info> - 2015-06-08 04:24 +1000
Re: Testing random Jonas Wielicki <jonas@wielicki.name> - 2015-06-07 12:41 +0200
Re: Testing random Steven D'Aprano <steve@pearwood.info> - 2015-06-07 22:52 +1000
Re: Testing random Steven D'Aprano <steve@pearwood.info> - 2015-06-07 23:06 +1000
Re: Testing random Peter Otten <__peter__@web.de> - 2015-06-07 15:35 +0200
Re: Testing random Thomas 'PointedEars' Lahn <PointedEars@web.de> - 2015-06-07 18:36 +0200
Re: Testing random Peter Otten <__peter__@web.de> - 2015-06-07 18:48 +0200
Re: Testing random Thomas 'PointedEars' Lahn <PointedEars@web.de> - 2015-06-07 22:15 +0200
Re: Testing random Steven D'Aprano <steve@pearwood.info> - 2015-06-08 11:35 +1000
Re: Testing random Christian Gollwitzer <auriocus@gmx.de> - 2015-06-07 14:53 +0200
Re: Testing random Dennis Lee Bieber <wlfraed@ix.netcom.com> - 2015-06-07 11:04 -0400
Page 1 of 4 [1] 2 3 4 Next page →
| From | Cecil Westerhof <Cecil@decebal.nl> |
|---|---|
| Date | 2015-06-07 08:27 +0200 |
| Subject | Testing random |
| Message-ID | <87oaksowwg.fsf@Equus.decebal.nl> |
I wrote a very simple function to test random:
def test_random(length, multiplier = 10000):
number_list = length * [0]
for i in range(length * multiplier):
number_list[random.randint(0, length - 1)] += 1
minimum = min(number_list)
maximum = max(number_list)
return (minimum, maximum, minimum / maximum)
When running:
for i in range(1, 7):
print(test_random(100, 10 ** i))
I get:
(3, 17, 0.17647058823529413)
(73, 127, 0.5748031496062992)
(915, 1086, 0.8425414364640884)
(9723, 10195, 0.9537027954879843)
(99348, 100620, 0.9873583780560524)
(997198, 1002496, 0.9947151908835546)
and when running:
for i in range(1, 7):
print(test_random(1000, 10 ** i))
I get:
(2, 20, 0.1)
(69, 138, 0.5)
(908, 1098, 0.8269581056466302)
(9684, 10268, 0.9431242695753799)
(99046, 100979, 0.9808574059953059)
(996923, 1003598, 0.9933489305478886)
It shows that when the first parameter increases the deviation
increases and when the second parameter increases the deviation
decreases. Exactly what you would expect. But what are the ranges you
would expect with a good random function. Then it could be used to
test a random function.
--
Cecil Westerhof
Senior Software Engineer
LinkedIn: http://www.linkedin.com/in/cecilwesterhof
[toc] | [next] | [standalone]
| From | Thomas 'PointedEars' Lahn <PointedEars@web.de> |
|---|---|
| Date | 2015-06-07 12:40 +0200 |
| Message-ID | <1451048.pW9z17ilMA@PointedEars.de> |
| In reply to | #92207 |
Cecil Westerhof wrote:
> I wrote a very simple function to test random:
> def test_random(length, multiplier = 10000):
> number_list = length * [0]
> for i in range(length * multiplier):
> number_list[random.randint(0, length - 1)] += 1
> minimum = min(number_list)
> maximum = max(number_list)
> return (minimum, maximum, minimum / maximum)
As there is no guarantee that every number will occur randomly, using a
dictionary at first should be more efficient than a list:
def test_random (length, multiplier=10000):
number_list = {}
for i in range(length * multiplier):
r = random.randint(0, length - 1)
number_list[r] = number_list[r] + 1 if r in number_list else 1
values = number_list.values()
minimum = min(values)
maximum = max(values)
return (minimum, maximum, minimum / maximum)
[Тuple literals are going to be introduced with C# 7 ;-)]
Since Python uses indentation as syntax element, it is better not to indent
Python code by default when posting in order to make it easier testable. If
necessary, code sections can be delimited in an unambiguous way that is
syntactically correct by lines like
#------------------------------------------------------------------------
The line length then should not exceed 72 to 78 characters, for quoting.
Although the <HT> character can be used for indentation, 4 spaces appear to
be a sensible, device-independent default (8 is too much for more complex
snippets that need to fit within the conventional 80-characters-per-line
limit).
--
PointedEars
Twitter: @PointedEars2
Please do not cc me. / Bitte keine Kopien per E-Mail.
[toc] | [prev] | [next] | [standalone]
| From | Chris Angelico <rosuav@gmail.com> |
|---|---|
| Date | 2015-06-07 21:51 +1000 |
| Message-ID | <mailman.242.1433677915.13271.python-list@python.org> |
| In reply to | #92226 |
On Sun, Jun 7, 2015 at 8:40 PM, Thomas 'PointedEars' Lahn <PointedEars@web.de> wrote: > Cecil Westerhof wrote: > >> I wrote a very simple function to test random: >> def test_random(length, multiplier = 10000): >> number_list = length * [0] >> for i in range(length * multiplier): >> number_list[random.randint(0, length - 1)] += 1 >> minimum = min(number_list) >> maximum = max(number_list) >> return (minimum, maximum, minimum / maximum) > > As there is no guarantee that every number will occur randomly, using a > dictionary at first should be more efficient than a list: Hmm, I'm not sure that's actually so. His code is aiming to get 'multiplier' values in each box; for any serious multiplier (he starts with 10 in the main code), you can be fairly confident that every number will come up at least once. The distribution of numbers won't ever be perfectly even, but you'd expect it to be reasonably close. I have a similar routine on my Dungeons & Dragons server; since a roll of a twenty-sided dice is crucial to most of the game, I have a simple tester that proves to people that the in-built dice roller is fair. Its output looks like this: > roll test 1: 10017 2: 10003 3: 9966 4: 9728 5: 10088 6: 9888 7: 10087 8: 9971 9: 10052 10: 10061 11: 10130 12: 9942 13: 10062 14: 10075 15: 10050 16: 9948 17: 9880 18: 10052 19: 9995 20: 10005 Standard deviation: 90.18 (0.90%) This is about equivalent to test_random(20), and as you see, he and I both picked a multiplier of 10K to use by default. (I call the parameters "max" and "avg" rather than "length" and "multiplier", but they have the exact same semantics.) The hard part is figuring out what "looks reasonable"; a true RNG could legitimately produce nothing but 7s for the entire run, it's just extremely unlikely. ChrisA
[toc] | [prev] | [next] | [standalone]
| From | Thomas 'PointedEars' Lahn <PointedEars@web.de> |
|---|---|
| Date | 2015-06-07 17:51 +0200 |
| Message-ID | <3158703.Lr4HFMbMOd@PointedEars.de> |
| In reply to | #92236 |
Chris Angelico wrote:
> On Sun, Jun 7, 2015 at 8:40 PM, Thomas 'PointedEars' Lahn
> <PointedEars@web.de> wrote:
>> Cecil Westerhof wrote:
>>> I wrote a very simple function to test random:
>>> def test_random(length, multiplier = 10000):
>>> number_list = length * [0]
>>> for i in range(length * multiplier):
>>> number_list[random.randint(0, length - 1)] += 1
>>> minimum = min(number_list)
>>> maximum = max(number_list)
>>> return (minimum, maximum, minimum / maximum)
>>
>> As there is no guarantee that every number will occur randomly, using a
>> dictionary at first should be more efficient than a list:
>
> Hmm, I'm not sure that's actually so. His code is aiming to get
> 'multiplier' values in each box; for any serious multiplier (he starts
> with 10 in the main code), you can be fairly confident that every
> number will come up at least once.
The wording shows a common misconception: that random distribution would
mean that it is guaranteed or more probable that every element of the set
will occur at least once. It is another common misconception that
increasing the number of trials would increase the probability of that
happening. But that is not so.
The law of large numbers only says that as you increase the number of
trials, that the relative frequency *approaches* the probability for each
value of the probability variable, or IOW “the average of the results
obtained from a large number of trials should be close to the expected
value, and will *tend to become closer* as more trials are performed.”
(<http://en.wikipedia.org/wiki/Law_of_large_numbers>; emphasis by me)
> […] a true RNG could legitimately produce nothing but 7s for the entire
> run, it's just extremely unlikely.
That reasoning is precisely a result of the misconception described above.
Because people think that every value must occur, they do not think it
possible that (much) repetition could occur with a (pseudo-)random
generator, and when they want to mince words they say “(extremely) unlikely”
instead. For example, when people see
6 5 8 7 9 3 7 8 4 7 5 6 8 8 1 2 8 3 5 7 5 4 1 2 4 8 8 7 5 1
and are told that this is random sequence, they find it hard to believe.
They think something like: “Look at all those repeated 8, and “7, 5”
occurs twice. 4 occurs more often than 2, and there are much more 5s than
1s. That cannot be possibly be random!”
Yes, it *can*. I have just produced it with
#------------------------------------------------------------------
import random
print(" ".join([str(random.randint(1, 9)) for i in range(0, 30)]))'
#------------------------------------------------------------------
But if you think this *Mersenne Twister* PRNG which “generates numbers with
nearly uniform distribution and a large period” is flawed, use a proper die
as RNG, and a sheet of paper to record the outcomes, and do the experiment.
The outcome is not going to be different.
If the distribution is even and the game is fair, every outcome *always* has
the same probability of occurring. As a result, every sequence of outcomes
of equal length *always* has the same probability of occurring, and the
probability for a particular sequence of equal length does _not_ increase or
decrease based on the number of occurrences of previous outcomes. Those are
*independent* events.
See also:
<http://www.teacherlink.org/content/math/interactive/probability/numbersense/misconceptions/home.html>,
in particular the section “Representativeness”
--
PointedEars
Twitter: @PointedEars2
Please do not cc me. / Bitte keine Kopien per E-Mail.
[toc] | [prev] | [next] | [standalone]
| From | Chris Angelico <rosuav@gmail.com> |
|---|---|
| Date | 2015-06-08 02:25 +1000 |
| Message-ID | <mailman.257.1433694323.13271.python-list@python.org> |
| In reply to | #92263 |
On Mon, Jun 8, 2015 at 1:51 AM, Thomas 'PointedEars' Lahn <PointedEars@web.de> wrote: > Chris Angelico wrote: > >> On Sun, Jun 7, 2015 at 8:40 PM, Thomas 'PointedEars' Lahn >> <PointedEars@web.de> wrote: >>> Cecil Westerhof wrote: >>>> I wrote a very simple function to test random: >>>> def test_random(length, multiplier = 10000): >>>> number_list = length * [0] >>>> for i in range(length * multiplier): >>>> number_list[random.randint(0, length - 1)] += 1 >>>> minimum = min(number_list) >>>> maximum = max(number_list) >>>> return (minimum, maximum, minimum / maximum) >>> >>> As there is no guarantee that every number will occur randomly, using a >>> dictionary at first should be more efficient than a list: >> >> Hmm, I'm not sure that's actually so. His code is aiming to get >> 'multiplier' values in each box; for any serious multiplier (he starts >> with 10 in the main code), you can be fairly confident that every >> number will come up at least once. > > The wording shows a common misconception: that random distribution would > mean that it is guaranteed or more probable that every element of the set > will occur at least once. It is another common misconception that > increasing the number of trials would increase the probability of that > happening. But that is not so. The greater the multiplier, the lower the chance that any element will have no hits. With uniform distribution, a length of 10, and a multiplier of 10, there are 100 attempts with a 90% chance each that any given number will be avoided - which works out to 0.9**100 == 2.6561398887587544e-05 probability that (say) there'll be no 4s in the list. Invert that and raise to the 10th power, and you get a probability of 0.9997344177567317 that there'll be at least one in every bucket. Raise the multiplier to 100, and the probability of a single whiff becomes 1.7478712517226947e-46; invert and raise to the tenth power, and it becomes closer to certainty than IEEE double precision can represent. Raise the length to 100 and the numbers get lower again; with multiplier 10, probability 0.9956920878572284 of having one in every bucket (that's a half a percent chance of a zero anywhere), and at multiplier 100, still underflows to certainty. But you'll notice that I wasn't actually talking about certainty here. I was talking about confidence, at levels sufficient to make data-type decisions on. Sure, there's no guarantee that every number will occur; but if there's a 0.4% chance that any number will be omitted, I think the list is going to work out more efficient. You'll notice that some of the other posts have been concerned more about correctness (for instance, using collections.Counter and then making sure there's a zero slot for every element - otherwise empty slots would be omitted), and then they _do_ acknowledge the chance that something will underflow. But with the numbers the OP gave, I would be fully satisfied with optimizing for the case where every bucket gets at least something. ChrisA
[toc] | [prev] | [next] | [standalone]
| From | Thomas 'PointedEars' Lahn <PointedEars@web.de> |
|---|---|
| Date | 2015-06-07 18:36 +0200 |
| Message-ID | <1656356.OjxBvjpJ5d@PointedEars.de> |
| In reply to | #92267 |
Chris Angelico wrote: > On Mon, Jun 8, 2015 at 1:51 AM, Thomas 'PointedEars' Lahn > <PointedEars@web.de> wrote: >> Chris Angelico wrote: >> >>> On Sun, Jun 7, 2015 at 8:40 PM, Thomas 'PointedEars' Lahn >>> <PointedEars@web.de> wrote: >>>> Cecil Westerhof wrote: >>>>> I wrote a very simple function to test random: >>>>> def test_random(length, multiplier = 10000): >>>>> number_list = length * [0] >>>>> for i in range(length * multiplier): >>>>> number_list[random.randint(0, length - 1)] += 1 >>>>> minimum = min(number_list) >>>>> maximum = max(number_list) >>>>> return (minimum, maximum, minimum / maximum) >>>> >>>> As there is no guarantee that every number will occur randomly, using a >>>> dictionary at first should be more efficient than a list: >>> >>> Hmm, I'm not sure that's actually so. His code is aiming to get >>> 'multiplier' values in each box; for any serious multiplier (he starts >>> with 10 in the main code), you can be fairly confident that every >>> number will come up at least once. >> >> The wording shows a common misconception: that random distribution would >> mean that it is guaranteed or more probable that every element of the set >> will occur at least once. It is another common misconception that >> increasing the number of trials would increase the probability of that >> happening. But that is not so. > > The greater the multiplier, the lower the chance that any element will > have no hits. Wrong. > [ex falso quodlibet] -- PointedEars Twitter: @PointedEars2 Please do not cc me. / Bitte keine Kopien per E-Mail.
[toc] | [prev] | [next] | [standalone]
| From | Chris Angelico <rosuav@gmail.com> |
|---|---|
| Date | 2015-06-08 02:44 +1000 |
| Message-ID | <mailman.259.1433695485.13271.python-list@python.org> |
| In reply to | #92271 |
On Mon, Jun 8, 2015 at 2:36 AM, Thomas 'PointedEars' Lahn <PointedEars@web.de> wrote: >> The greater the multiplier, the lower the chance that any element will >> have no hits. > > Wrong. > >> [ex falso quodlibet] Huh. Do you want to explain how, mathematically, I am wrong, or do you want to join the RUE in my ignore list? ChrisA
[toc] | [prev] | [next] | [standalone]
| From | Thomas 'PointedEars' Lahn <PointedEars@web.de> |
|---|---|
| Date | 2015-06-07 20:23 +0200 |
| Message-ID | <5515034.V7dcXEWAvK@PointedEars.de> |
| In reply to | #92272 |
Chris Angelico wrote:
> On Mon, Jun 8, 2015 at 2:36 AM, Thomas 'PointedEars' Lahn
> <PointedEars@web.de> wrote:
>>> The greater the multiplier, the lower the chance that any element will
>>> have no hits.
>> Wrong.
>>
>>> [ex falso quodlibet]
>
> Huh. Do you want to explain how, mathematically, I am wrong, or do you
> want to join the RUE in my ignore list?
I already did; you have overlooked it. In a nutshell, the probability of
1 1 1 1 1 1 1 1 1 1 1
is *the same* as that of
1 2 3 4 5 6 7 8 9 1 2
and the same as that of
8 3 6 3 1 2 6 8 2 1 6.
If the set to choose from is integer numbers from 1 to 9, then *each* of
those sequences has *the same* probability (1∕9)¹¹ ≈ 3.1866355 × 10⁻¹¹.
Repeating that random experiment one more time, the probability of
1 1 1 1 1 1 1 1 1 1 1 2
is the same as of
1 2 3 4 5 6 7 8 9 1 2 2
and the same as that of
8 3 6 3 1 2 6 8 2 1 6 2
(that the next outcome is a 2): (1∕9)¹² ≈ 3.5407062 × 10⁻¹². And the
probability of
1 1 1 1 1 1 1 1 1 1 1 1
(only 1s) is the *same* as of
1 1 1 1 1 1 1 1 1 1 1 2
(only 1s and one 2) and the same as of
1 1 1 1 1 1 1 1 1 1 1 3
(only 1s and a 3) and of
8 3 6 3 1 2 6 8 2 1 6 4
(that the next outcome is a 4, and there are no 5s at all).
AISB, those are *independent* events; the number of occurrences of an
outcome before *does not matter* for the probability of the next one. And
so the probability of getting a certain number does _not_ change depending
on the number of times you repeat the experiment. It is always the same;
in this example, it is always 1∕9. And the probability of _not_ getting a
certain number is always the same; in this example, it is always 1 − 1∕9 =
8∕9.
So, I am sorry to tell you this, but you do _not_ understand probability.
And you *cannot* understand it intuitively, like you tried to. Probability
is counter-intuitive.
That should not surprise or worry you. Because as I and others have pointed
out in this thread, the majority of the human population, including the OP,
does not understand probability – until it has been explained to them. I am
not even sure I understand probability in every instance, or rather, I am
sure that *intuitively* I do not.
It is why humans play the lottery using their “lucky numbers” whereas other
numbers have exactly the same probability of being drawn; why they think a
baby boy “is due” when they have two girls already; why they are surprised
if two people in the room have the same birthday; why *in the short run* you
can make a fortune by running a casino even if the tables are not rigged;
why the “gate three” trick works (for a while); why they pay money for
horoscopes and truthsayers, not recognizing that their wording is so
ambiguous that it is highly probable to apply to everyone, and why many of
them believe in a deity or fate.
A big part of the reason is psychology, how the human mind works: The human
mind has evolved to recognize patterns. It is an evolutionary advantage to
be able to recognize patterns, to tell a mammoth from a sabre-stooth lion at
a distance, a poisonous from an edible plant, and an X from a U. So humans
try to see patterns everywhere, even in pure randomness: clouds, texts,
numbers, you name it. And when they are told that something is random, and
they find patterns regardless, they do not believe that it is random.
An extension of that misconception is emphasized by an anecdote (which may
be apocryphal) told about Richard Feynman (I heard it from Lawrence Krauss
in “A Universe from Nothing”; he can tell that in a much more funny way than
I am able to reproduce it here [1]):
Richard Feynman used to go up to people all the time and he’d say:
“You won’t believe what happend to me today!
You won’t believe what happend to me today!”
And people would say:
“What?”
And he would reply:
“Absolutely nothing!”
Because humans are evolutionary driven to look for patterns, and they
believe that everything that happens to them is important and has
significance in “the greater scheme of things” (whose mere existence
is a misconception, too).
HTH
________
[1] Lawrence KRAUSS (2009). “A Universe from Nothing”. The Mariott hotel,
Burbank. 1:03:12. <https://youtu.be/-EilZ4VY5Vs?t=3792>
--
PointedEars
Twitter: @PointedEars2
Please do not cc me. / Bitte keine Kopien per E-Mail.
[toc] | [prev] | [next] | [standalone]
| From | Chris Angelico <rosuav@gmail.com> |
|---|---|
| Date | 2015-06-08 04:52 +1000 |
| Message-ID | <mailman.271.1433703133.13271.python-list@python.org> |
| In reply to | #92283 |
On Mon, Jun 8, 2015 at 4:23 AM, Thomas 'PointedEars' Lahn <PointedEars@web.de> wrote: > If the set to choose from is integer numbers from 1 to 9, then *each* of > those sequences has *the same* probability (1∕9)¹¹ ≈ 3.1866355 × 10⁻¹¹. > > AISB, those are *independent* events; the number of occurrences of an > outcome before *does not matter* for the probability of the next one. And > so the probability of getting a certain number does _not_ change depending > on the number of times you repeat the experiment. It is always the same; > in this example, it is always 1∕9. And the probability of _not_ getting a > certain number is always the same; in this example, it is always 1 − 1∕9 = > 8∕9. Yes, the probability of not getting a certain number is always 8/9... but the probability of not getting that number for an entire sequence is 8/9 raised to the power of the length of the sequence, because they are independent events that must *all* happen. In your case, with a length of 11, that means there's about a 27% chance of any particular number not happening. Looking at it the other way, that means there's a 73% chance of seeing at least one N, for any N. So the probability of seeing at least one of every digit is actually quite low - about 5.6% - because you're looking for nine digits out of eleven. (Note that the numbers aren't quite exact here. The probabilities aren't entirely independent. Calculating off the exact figures is left as an exercise for the reader, but I don't expect they'll be more than a percentage point different one way or the other.) The chances of not seeing a 5 in any of 100 tosses would be virtually nil. The chances of not seeing a 5 in any of 1000 tosses is so close to zero that its inverse is indistinguishable from certainty with Python floats (IEEE double precision): >>> 1.0-(9/10)**1000 == 1.0 True That's a pretty low number. ChrisA
[toc] | [prev] | [next] | [standalone]
| From | Thomas 'PointedEars' Lahn <PointedEars@web.de> |
|---|---|
| Date | 2015-06-07 21:41 +0200 |
| Message-ID | <3006491.U2VKTQuU5p@PointedEars.de> |
| In reply to | #92289 |
Chris Angelico wrote: > On Mon, Jun 8, 2015 at 4:23 AM, Thomas 'PointedEars' Lahn > <PointedEars@web.de> wrote: >> If the set to choose from is integer numbers from 1 to 9, then *each* of >> those sequences has *the same* probability (1∕9)¹¹ ≈ 3.1866355 × 10⁻¹¹. >> >> AISB, those are *independent* events; the number of occurrences of an >> outcome before *does not matter* for the probability of the next one. >> And so the probability of getting a certain number does _not_ change >> depending on the number of times you repeat the experiment. It is always >> the same; in this example, it is always 1∕9. And the probability of >> _not_ getting certain number is always the same; in this example, it is >> always 1 − 1∕9 = 8∕9. > > Yes, the probability of not getting a certain number is always 8/9... > but the probability of not getting that number for an entire sequence > is 8/9 raised to the power of the length of the sequence As it is for getting every number at least once, or only two, three, four, five, six, seven, or eight of them. Each number has the probability of 1∕9 of being drawn, and for a sequence of n numbers drawn, the probability is (1∕9)ⁿ, *no matter* which numbers are in it. > because they are independent events that must *all* happen. […] As I already pointed out, your reasoning is purely *intuitive*, _not_ based on (correct) math, and (therefore) flawed. And intuition lets you down here. The correct, mathematical reasoning is: *Because* the events are *independent*, the probability of the sequence does not change depending on the number of occurrences of an event (outcome of a draw). That all numbers occur is _not_ more likely than that only one number occurs, or two, three, and so on. And that does _not_ depend on how many draws you take as there is an infinite reservoir for each number. I do not think I can explain it to you better, and I am not going to waste my precious free time to repeat myself. Follow the reference I gave. -- PointedEars Twitter: @PointedEars2 Please do not cc me. / Bitte keine Kopien per E-Mail.
[toc] | [prev] | [next] | [standalone]
| From | Jussi Piitulainen <jpiitula@ling.helsinki.fi> |
|---|---|
| Date | 2015-06-07 22:08 +0300 |
| Message-ID | <lf5sia3bakh.fsf@ling.helsinki.fi> |
| In reply to | #92283 |
Thomas 'PointedEars' Lahn writes: > Chris Angelico wrote: > >> Huh. Do you want to explain how, mathematically, I am wrong, or do >> you want to join the RUE in my ignore list? > > I already did; you have overlooked it. In a nutshell, the probability > of > > 1 1 1 1 1 1 1 1 1 1 1 There is one way to get those numbers in some order. (11! / 11! = 1) > is *the same* as that of > > 1 2 3 4 5 6 7 8 9 1 2 There are almost ten million ways to get those numbers in some order. (11! / 2! / 2! = 9979200) > and the same as that of > > 8 3 6 3 1 2 6 8 2 1 6. There are more than four hundred thousand ways to get those numbers in some order. (11! / 2! / 2! / 2! / 3! / 2! = 415800)
[toc] | [prev] | [next] | [standalone]
| From | Thomas 'PointedEars' Lahn <PointedEars@web.de> |
|---|---|
| Date | 2015-06-07 21:29 +0200 |
| Message-ID | <1583276.2lplL8rY5W@PointedEars.de> |
| In reply to | #92292 |
Jussi Piitulainen wrote: > Thomas 'PointedEars' Lahn writes: >> 8 3 6 3 1 2 6 8 2 1 6. > > There are more than four hundred thousand ways to get those numbers in > some order. > > (11! / 2! / 2! / 2! / 3! / 2! = 415800) Fallacy. Order is irrelevant here. -- PointedEars Twitter: @PointedEars2 Please do not cc me. / Bitte keine Kopien per E-Mail.
[toc] | [prev] | [next] | [standalone]
| From | random832@fastmail.us |
|---|---|
| Date | 2015-06-07 15:44 -0400 |
| Message-ID | <mailman.274.1433706275.13271.python-list@python.org> |
| In reply to | #92294 |
On Sun, Jun 7, 2015, at 15:29, Thomas 'PointedEars' Lahn wrote: > Jussi Piitulainen wrote: > > > Thomas 'PointedEars' Lahn writes: > >> 8 3 6 3 1 2 6 8 2 1 6. > > > > There are more than four hundred thousand ways to get those numbers in > > some order. > > > > (11! / 2! / 2! / 2! / 3! / 2! = 415800) > > Fallacy. Order is irrelevant here. Huh? The fact that order is irrelevant is exactly _why_ what he said _is_ relevant.
[toc] | [prev] | [next] | [standalone]
| From | Thomas 'PointedEars' Lahn <PointedEars@web.de> |
|---|---|
| Date | 2015-06-07 22:09 +0200 |
| Message-ID | <44804165.jDZYAAtjtx@PointedEars.de> |
| In reply to | #92296 |
random832@fastmail.us wrote: > On Sun, Jun 7, 2015, at 15:29, Thomas 'PointedEars' Lahn wrote: >> Jussi Piitulainen wrote: >> > Thomas 'PointedEars' Lahn writes: >> >> 8 3 6 3 1 2 6 8 2 1 6. >> > >> > There are more than four hundred thousand ways to get those numbers in >> > some order. >> > >> > (11! / 2! / 2! / 2! / 3! / 2! = 415800) >> Fallacy. Order is irrelevant here. > > Huh? The fact that order is irrelevant is exactly _why_ what he said > _is_ relevant. No. AISB, those sequences all have the same probability: <http://www.teacherlink.org/content/math/interactive/probability/numbersense/misconceptions/representativeness.html> <http://www.teacherlink.org/content/math/interactive/probability/numbersense/misconceptions/recency.html> -- PointedEars Twitter: @PointedEars2 Please do not cc me. / Bitte keine Kopien per E-Mail.
[toc] | [prev] | [next] | [standalone]
| From | random832@fastmail.us |
|---|---|
| Date | 2015-06-07 16:41 -0400 |
| Message-ID | <mailman.276.1433709682.13271.python-list@python.org> |
| In reply to | #92299 |
On Sun, Jun 7, 2015, at 16:09, Thomas 'PointedEars' Lahn wrote: > No. AISB, those sequences all have the same probability: Yes and the probability of getting _any_ of the sequences, is the sum of the probabilities for each one of the sequences individually.
[toc] | [prev] | [next] | [standalone]
| From | Thomas 'PointedEars' Lahn <PointedEars@web.de> |
|---|---|
| Date | 2015-06-07 22:59 +0200 |
| Message-ID | <12048809.77ogS5qXqA@PointedEars.de> |
| In reply to | #92302 |
random832@fastmail.us wrote: > On Sun, Jun 7, 2015, at 16:09, Thomas 'PointedEars' Lahn wrote: >> No. AISB, those sequences all have the same probability: > > Yes and the probability of getting _any_ of the sequences, is the sum of > the probabilities for each one of the sequences individually. The probability of getting any of the sequences is precisely 1, and that is irrelevant to the fact that the probability of getting a particular sequence is independent of the number of occurrences of a specific event as the probability variable is evenly distributed and the events are independent (of each other). You really should follow the references and brush up your math. -- PointedEars Twitter: @PointedEars2 Please do not cc me. / Bitte keine Kopien per E-Mail.
[toc] | [prev] | [next] | [standalone]
| From | Steven D'Aprano <steve@pearwood.info> |
|---|---|
| Date | 2015-06-08 11:26 +1000 |
| Message-ID | <5574ef45$0$12981$c3e8da3$5496439d@news.astraweb.com> |
| In reply to | #92302 |
On Mon, 8 Jun 2015 06:56 am, Thomas 'PointedEars' Lahn wrote:
> You really should follow the references and brush up your math.
You really should stop being patronising to everyone and pay attention to
what is actually being said.
If you won't believe us, and you won't believe mathematics, perhaps you will
believe a simulation.
import random
def trial(size):
return [random.randint(1, 9) for i in range(size)]
def test(trial_size, test_size):
every_element_seen = 0
for i in range(test_size):
s = set(trial(trial_size))
if len(s) == 9:
every_element_seen += 1
return every_element_seen/test_size
msg = "Pr(every element seen at least once) in %d trials"
for size in range(10, 100, 5):
print(msg % size, end=' ')
pr = test(size, 1000)
print(pr)
When I run this simulation, I get:
Pr(every element seen at least once) in 10 trials 0.007
Pr(every element seen at least once) in 15 trials 0.109
Pr(every element seen at least once) in 20 trials 0.349
Pr(every element seen at least once) in 25 trials 0.6
Pr(every element seen at least once) in 30 trials 0.765
Pr(every element seen at least once) in 35 trials 0.868
Pr(every element seen at least once) in 40 trials 0.917
Pr(every element seen at least once) in 45 trials 0.953
Pr(every element seen at least once) in 50 trials 0.976
Pr(every element seen at least once) in 55 trials 0.981
Pr(every element seen at least once) in 60 trials 0.996
Pr(every element seen at least once) in 65 trials 0.999
Pr(every element seen at least once) in 70 trials 0.997
Pr(every element seen at least once) in 75 trials 1.0
Pr(every element seen at least once) in 80 trials 0.998
Pr(every element seen at least once) in 85 trials 1.0
Pr(every element seen at least once) in 90 trials 1.0
Pr(every element seen at least once) in 95 trials 1.0
--
Steven
[toc] | [prev] | [next] | [standalone]
| From | random832@fastmail.us |
|---|---|
| Date | 2015-06-07 21:34 -0400 |
| Message-ID | <mailman.284.1433727282.13271.python-list@python.org> |
| In reply to | #92302 |
On Sun, Jun 7, 2015, at 16:56, Thomas 'PointedEars' Lahn wrote: > random832@fastmail.us wrote: > > > On Sun, Jun 7, 2015, at 16:09, Thomas 'PointedEars' Lahn wrote: > >> No. AISB, those sequences all have the same probability: > > > > Yes and the probability of getting _any_ of the sequences, is the sum of > > the probabilities for each one of the sequences individually. > > The probability of getting any of the sequences is precisely 1 Any of the sequences *satisfying "contains at least one of each result"* (or, in general, any assertion about the degree of coverage, for example, covering at least two values being more likely than covering only one value) Flipping one coin has two possible sequences: H and T. The probability of getting at least one H and at least one T is 0. Flipping two coins has four possible sequences: HH, HT, TH, TT. The probability of getting at least one H and at least one T is 50%, since two of these sequences satisfies the condition. Flipping three coins has eight possible sequences. The probability of getting at least one H and at least one T is 75%, since six of the sequences: THH, HTH, HHT, HTT, THT, TTH, all satisfy the condition. Flipping four coins has sixteen possibile sequences. The probability of getting at least one H and at least one T is 87.5% Rolling 2d3 has nine possible sequences. The probability of getting at least one of each value is 0. The probability of getting at least one each of two values is 66.7%. The probability of getting only one value is 33.3%. Rolling 3d3 has 27 possible sequences. Six of them have one of each value. The probability is 22.2%. 24 have at least two different values. That probability is 88.9%. Only 3 have all of the same value. 11.1%. Rolling 4d3 has 81 possible sequences. 36 of them have at least one of each value. The probability is 44.4%. 78 of them have at least two different values. The probability is 96.3%. Still only 3 have all of the same value. 3.7%. Rolling 5d3 has 243 possible sequences. 150 of them have at least one of each value. The probability is 61.7% 240 of them have at least two different values. 98.8%. Still only 3 have all of the same value. 1.2%. In general, as the number of trials increases, the probability of having e.g. at least one of each value never _reaches_ 1, but it gets arbitrarily close.
[toc] | [prev] | [next] | [standalone]
| From | Chris Angelico <rosuav@gmail.com> |
|---|---|
| Date | 2015-06-08 11:42 +1000 |
| Message-ID | <mailman.285.1433727771.13271.python-list@python.org> |
| In reply to | #92302 |
On Mon, Jun 8, 2015 at 11:34 AM, <random832@fastmail.us> wrote: > In general, as the number of trials increases, the probability of having > e.g. at least one of each value never _reaches_ 1, but it gets > arbitrarily close. And by "arbitrarily close", you mean any of: * So close to 1.0 that IEEE double precision is unable to represent it * So unlikely that you could perform one trial every nanosecond until the heat death of the universe and still not expect to see it * Less likely than that two files could accidentally collide on MD5, SHA1, SHA256, and file size, simultaneously I think all of the above are true of this case, though I have to guess about the heat death of the universe. ChrisA
[toc] | [prev] | [next] | [standalone]
| From | MRAB <python@mrabarnett.plus.com> |
|---|---|
| Date | 2015-06-08 02:49 +0100 |
| Message-ID | <mailman.286.1433728391.13271.python-list@python.org> |
| In reply to | #92302 |
On 2015-06-08 02:42, Chris Angelico wrote: > On Mon, Jun 8, 2015 at 11:34 AM, <random832@fastmail.us> wrote: >> In general, as the number of trials increases, the probability of having >> e.g. at least one of each value never _reaches_ 1, but it gets >> arbitrarily close. > > And by "arbitrarily close", you mean any of: > I believe the word is "asymptotic". > * So close to 1.0 that IEEE double precision is unable to represent it > * So unlikely that you could perform one trial every nanosecond until > the heat death of the universe and still not expect to see it > * Less likely than that two files could accidentally collide on MD5, > SHA1, SHA256, and file size, simultaneously > > I think all of the above are true of this case, though I have to guess > about the heat death of the universe. >
[toc] | [prev] | [next] | [standalone]
Page 1 of 4 [1] 2 3 4 Next page →
Back to top | Article view | comp.lang.python
csiph-web