Groups | Search | Server Info | Keyboard shortcuts | Login | Register [http] [https] [nntp] [nntps]
Groups > comp.lang.python > #92207 > unrolled thread
| Started by | Cecil Westerhof <Cecil@decebal.nl> |
|---|---|
| First post | 2015-06-07 08:27 +0200 |
| Last post | 2015-06-07 11:04 -0400 |
| Articles | 20 on this page of 80 — 22 participants |
Back to article view | Back to comp.lang.python
Testing random Cecil Westerhof <Cecil@decebal.nl> - 2015-06-07 08:27 +0200
Re: Testing random Thomas 'PointedEars' Lahn <PointedEars@web.de> - 2015-06-07 12:40 +0200
Re: Testing random Chris Angelico <rosuav@gmail.com> - 2015-06-07 21:51 +1000
Re: Testing random Thomas 'PointedEars' Lahn <PointedEars@web.de> - 2015-06-07 17:51 +0200
Re: Testing random Chris Angelico <rosuav@gmail.com> - 2015-06-08 02:25 +1000
Re: Testing random Thomas 'PointedEars' Lahn <PointedEars@web.de> - 2015-06-07 18:36 +0200
Re: Testing random Chris Angelico <rosuav@gmail.com> - 2015-06-08 02:44 +1000
Re: Testing random Thomas 'PointedEars' Lahn <PointedEars@web.de> - 2015-06-07 20:23 +0200
Re: Testing random Chris Angelico <rosuav@gmail.com> - 2015-06-08 04:52 +1000
Re: Testing random Thomas 'PointedEars' Lahn <PointedEars@web.de> - 2015-06-07 21:41 +0200
Re: Testing random Jussi Piitulainen <jpiitula@ling.helsinki.fi> - 2015-06-07 22:08 +0300
Re: Testing random Thomas 'PointedEars' Lahn <PointedEars@web.de> - 2015-06-07 21:29 +0200
Re: Testing random random832@fastmail.us - 2015-06-07 15:44 -0400
Re: Testing random Thomas 'PointedEars' Lahn <PointedEars@web.de> - 2015-06-07 22:09 +0200
Re: Testing random random832@fastmail.us - 2015-06-07 16:41 -0400
Re: Testing random Thomas 'PointedEars' Lahn <PointedEars@web.de> - 2015-06-07 22:59 +0200
Re: Testing random Steven D'Aprano <steve@pearwood.info> - 2015-06-08 11:26 +1000
Re: Testing random random832@fastmail.us - 2015-06-07 21:34 -0400
Re: Testing random Chris Angelico <rosuav@gmail.com> - 2015-06-08 11:42 +1000
Re: Testing random MRAB <python@mrabarnett.plus.com> - 2015-06-08 02:49 +0100
Re: Testing random random832@fastmail.us - 2015-06-07 21:57 -0400
Re: Testing random Jussi Piitulainen <jpiitula@ling.helsinki.fi> - 2015-06-08 10:40 +0300
Re: Testing random Thomas 'PointedEars' Lahn <PointedEars@web.de> - 2015-06-10 19:03 +0200
Re: Testing random sohcahtoa82@gmail.com - 2015-06-10 10:52 -0700
Re: Testing random Jussi Piitulainen <jpiitula@ling.helsinki.fi> - 2015-06-10 23:00 +0300
Re: Testing random Ian Kelly <ian.g.kelly@gmail.com> - 2015-06-10 12:02 -0600
Re: Testing random Thomas 'PointedEars' Lahn <PointedEars@web.de> - 2015-06-12 23:32 +0200
Re: Testing random alister <alister.nospam.ware@ntlworld.com> - 2015-06-12 21:46 +0000
Re: Testing random random832@fastmail.us - 2015-06-12 17:52 -0400
Re: Testing random Ian Kelly <ian.g.kelly@gmail.com> - 2015-06-12 16:00 -0600
Re: Testing random Thomas 'PointedEars' Lahn <PointedEars@web.de> - 2015-06-13 00:09 +0200
Re: Testing random sohcahtoa82@gmail.com - 2015-06-12 15:55 -0700
Re: Testing random random832@fastmail.us - 2015-06-12 18:57 -0400
Re: Testing random Mark Lawrence <breamoreboy@yahoo.co.uk> - 2015-06-13 08:53 +0100
Re: Testing random random832@fastmail.us - 2015-06-10 14:26 -0400
Re: Testing random Ned Batchelder <ned@nedbatchelder.com> - 2015-06-07 14:21 -0700
Re: Testing random Thomas 'PointedEars' Lahn <PointedEars@web.de> - 2015-06-16 21:18 +0200
Re: Testing random random832@fastmail.us - 2015-06-16 16:23 -0400
Re: Testing random Ned Batchelder <ned@nedbatchelder.com> - 2015-06-16 13:48 -0700
Re: Testing random Thomas 'PointedEars' Lahn <PointedEars@web.de> - 2015-06-16 23:57 +0200
Re: Testing random sohcahtoa82@gmail.com - 2015-06-16 15:30 -0700
Re: Testing random Ian Kelly <ian.g.kelly@gmail.com> - 2015-06-16 16:58 -0600
Re: Testing random Laura Creighton <lac@openend.se> - 2015-06-17 11:28 +0200
Re: Testing random Ned Batchelder <ned@nedbatchelder.com> - 2015-06-16 16:26 -0700
Re: Testing random Thomas 'PointedEars' Lahn <PointedEars@web.de> - 2015-06-17 01:45 +0200
Re: Testing random sohcahtoa82@gmail.com - 2015-06-16 17:36 -0700
Re: Testing random Chris Angelico <rosuav@gmail.com> - 2015-06-17 11:01 +1000
Re: Testing random Ethan Furman <ethan@stoneleaf.us> - 2015-06-16 18:32 -0700
Re: Testing random Mark Lawrence <breamoreboy@yahoo.co.uk> - 2015-06-17 09:41 +0100
Re: Testing random Grant Edwards <invalid@invalid.invalid> - 2015-06-17 14:04 +0000
Re: Testing random Ian Kelly <ian.g.kelly@gmail.com> - 2015-06-17 09:01 -0600
Re: Testing random MRAB <python@mrabarnett.plus.com> - 2015-06-17 01:42 +0100
Re: Testing random Thomas 'PointedEars' Lahn <PointedEars@web.de> - 2015-06-17 08:53 +0200
Re: Testing random Christian Gollwitzer <auriocus@gmx.de> - 2015-06-17 09:22 +0200
Re: Testing random Chris Angelico <rosuav@gmail.com> - 2015-06-17 17:28 +1000
Re: Testing random Tim Golden <mail@timgolden.me.uk> - 2015-06-17 08:30 +0100
Re: Testing random Cecil Westerhof <Cecil@decebal.nl> - 2015-06-17 11:57 +0200
Re: Testing random Steven D'Aprano <steve+comp.lang.python@pearwood.info> - 2015-06-17 01:36 +0000
Re: Testing random Laura Creighton <lac@openend.se> - 2015-06-17 12:33 +0200
Re: Testing random Steven D'Aprano <steve@pearwood.info> - 2015-06-17 22:47 +1000
Re: Testing random Laura Creighton <lac@openend.se> - 2015-06-17 15:50 +0200
Re: Testing random Steven D'Aprano <steve+comp.lang.python@pearwood.info> - 2015-06-17 01:35 +0000
Re: Testing random Jussi Piitulainen <jpiitula@ling.helsinki.fi> - 2015-06-17 07:41 +0300
Re: Testing random Steven D'Aprano <steve@pearwood.info> - 2015-06-08 11:11 +1000
Re: Testing random Ian Kelly <ian.g.kelly@gmail.com> - 2015-06-07 11:07 -0600
Re: Testing random Chris Angelico <rosuav@gmail.com> - 2015-06-08 03:20 +1000
Re: Testing random "C.D. Reimer" <chris@cdreimer.com> - 2015-06-07 10:36 -0700
Re: Testing random Steven D'Aprano <steve@pearwood.info> - 2015-06-08 04:28 +1000
Re: Testing random Chris Angelico <rosuav@gmail.com> - 2015-06-08 04:40 +1000
Re: Testing random Steven D'Aprano <steve@pearwood.info> - 2015-06-08 04:24 +1000
Re: Testing random Jonas Wielicki <jonas@wielicki.name> - 2015-06-07 12:41 +0200
Re: Testing random Steven D'Aprano <steve@pearwood.info> - 2015-06-07 22:52 +1000
Re: Testing random Steven D'Aprano <steve@pearwood.info> - 2015-06-07 23:06 +1000
Re: Testing random Peter Otten <__peter__@web.de> - 2015-06-07 15:35 +0200
Re: Testing random Thomas 'PointedEars' Lahn <PointedEars@web.de> - 2015-06-07 18:36 +0200
Re: Testing random Peter Otten <__peter__@web.de> - 2015-06-07 18:48 +0200
Re: Testing random Thomas 'PointedEars' Lahn <PointedEars@web.de> - 2015-06-07 22:15 +0200
Re: Testing random Steven D'Aprano <steve@pearwood.info> - 2015-06-08 11:35 +1000
Re: Testing random Christian Gollwitzer <auriocus@gmx.de> - 2015-06-07 14:53 +0200
Re: Testing random Dennis Lee Bieber <wlfraed@ix.netcom.com> - 2015-06-07 11:04 -0400
Page 4 of 4 — ← Prev page 1 2 3 [4]
| From | Laura Creighton <lac@openend.se> |
|---|---|
| Date | 2015-06-17 15:50 +0200 |
| Message-ID | <mailman.548.1434549061.13271.python-list@python.org> |
| In reply to | #92738 |
In a message of Wed, 17 Jun 2015 22:47:37 +1000, "Steven D'Aprano" writes:
>There are magicians who are capable of forcing coins to land the required
>way up, and somebody once built a machine capable of tossing a coin with
>the precise equal force and velocity every single time. Dice are rarely
>unbiased, and neither are roulette wheels. Nevertheless, we ignore those
>factual biases for the sake of simplicity.
Simlplicity is fine. But if you use this for real world decisions,
such as 'Let's try one more time, I really want a daughter!'
because you didn't know that the simplicity is there, or rather
that what is true in populations isn't true because it is also
true for each individual in a population .. well, that can be tragic.
>Correct. And more Y sperm die than X sperm. Together, these two factors lead
>to a small but real bias towards girl children, at least on countries that
>don't practice wide-spread sex-specific abortion.
It is not clear, last I checked, that 'more Y sperm die' matters, because
most of the time, for reasons we still don't know, the ratio of male
and female sperm produced is not 50/50 either. So it may matter. But
maybe not.
>> But when you look at breeding age men and women you will find
>> that the ratio is a lot closer to 50/50 than the birth age ratio.
>> Human beings, as a population currently produce more males than
>> females.
>
>Surely that depends on where (and when) you are?
>
>I know that, today, both India and China both abort far more female fetuses
>than male ones, leading to a large excess of men. But in countries that
>don't practice selective abortions, my understanding is that there is a
>small excess of women at virtually all ages, especially among the elderly.
>Men tend to die earlier than women in every age bracket.
No. Before breeding age there is an excess of males.
I am excluding the notion of abortion here. (Another reason why humans
make lousy experimental subjects for this sort of study.) What I am
saying is that a whole lot more boy babies get born than female ones.
If infant mortality was sex-neutral, we would expect that, when we
took a census of 16-year-olds we would find that there continued to
be an excess of boys at that level, too. Instead, we have a small
excess of females, enough to believe that:
For populations of human beings, when they hit breeding age, it
is desirable to have a sex ratio near 50/50 but with a small
excess of females.
'Desirable' here just means that this is the answer that pops out when
you study real populations (or model them with computer simulations).
And to get this Desirable outcome, you have to produce many more male
babies than female ones.
Laura
[toc] | [prev] | [next] | [standalone]
| From | Steven D'Aprano <steve+comp.lang.python@pearwood.info> |
|---|---|
| Date | 2015-06-17 01:35 +0000 |
| Message-ID | <5580ceed$0$1667$c3e8da3$5496439d@news.astraweb.com> |
| In reply to | #92699 |
On Tue, 16 Jun 2015 13:48:04 -0700, Ned Batchelder wrote: > I apologize, I'm sure I've been using the mathematical terms > imprecisely. We are all intelligent people, so I still believe we > disagree because we are talking about different things. Neil, I believe that your actual mistake is assuming that Thomas is arguing in good faith. I see no evidence that he is, especially given the content of his latest posts. Multiple people have repeatedly explained the difference between his argument and what everyone else is talking about. Others, including me, have demonstrated empirically that he is mistaken, using both simulated tests and direct calculation of the probabilities. At this point, his insistence that we are making the gambler's fallacy is clearly not a mere misunderstanding due to confusion. It is wilful and deliberate misrepresentation of what we are saying. Thomas is correct for a completely different question. Rather than acknowledge that he has misunderstood the question, at every point he doubles down harder and insists that he is right and we are wrong. We have given *absolutely no reason* to think we have fallen for the Gambler's Fallacy. Throughout this thread, Thomas has repeatedly picked on trivial and unimportant errors in terminology as an excuse for dismissing what others have said, while ignoring the substance of their argument. When people have given mathematically indisputable and correct arguments, he has ignored them, or misrepresented them. -- Steven D'Aprano
[toc] | [prev] | [next] | [standalone]
| From | Jussi Piitulainen <jpiitula@ling.helsinki.fi> |
|---|---|
| Date | 2015-06-17 07:41 +0300 |
| Message-ID | <lf5k2v3ymj3.fsf@ling.helsinki.fi> |
| In reply to | #92307 |
Ned Batchelder writes: > Thomas: let's say I generate streams of N digits drawn randomly from > 0-9. I then consider the probability of a zero *never appearing once* > in my stream. Let's call that P(N). Do you agree that as N > increases, P(N) decreases? In probability theory, that could be phrased as the probability that N unknown digits d_1, ..., d_N are all positive, assuming the digits are independent (so learning one digit doesn't reveal anything about any other digit), and for each d_k, the probability p_k of having a positive digit is the same. Mathematicians often abbreviate these assumptions as "i.i.d" for "independent" and "identically distributed". Also assuming uniform distributions, p_1 = p_2 = ... = p_N = 9/10. P(d_k > 0 for k = 1, ..., N) = P(d_1 > 0 and d_2 > 0 and ... and d_N > 0) = (by independence) P(d_1 > 0) * P(d_2 > 0) * ... * P(d_N > 0) = p_1 * p_2 * ... * p_N = (by identical uniform distribution) (9/10)^N In mathematics, (9/10)^N decreases as N increases, so one should indeed agree. Using more impressive notation and terminology correctly will not change the analysis.
[toc] | [prev] | [next] | [standalone]
| From | Steven D'Aprano <steve@pearwood.info> |
|---|---|
| Date | 2015-06-08 11:11 +1000 |
| Message-ID | <5574ebb9$0$13009$c3e8da3$5496439d@news.astraweb.com> |
| In reply to | #92283 |
On Mon, 8 Jun 2015 04:23 am, Thomas 'PointedEars' Lahn wrote:
> Chris Angelico wrote:
>
>> On Mon, Jun 8, 2015 at 2:36 AM, Thomas 'PointedEars' Lahn
>> <PointedEars@web.de> wrote:
>>>> The greater the multiplier, the lower the chance that any element will
>>>> have no hits.
>>> Wrong.
>>>
>>>> [ex falso quodlibet]
>>
>> Huh. Do you want to explain how, mathematically, I am wrong, or do you
>> want to join the RUE in my ignore list?
>
> I already did; you have overlooked it. In a nutshell, the probability of
>
> 1 1 1 1 1 1 1 1 1 1 1
>
> is *the same* as that of
>
> 1 2 3 4 5 6 7 8 9 1 2
>
> and the same as that of
>
> 8 3 6 3 1 2 6 8 2 1 6.
True, but irrelevant. What is irrelevant is that the probability of:
1 1 1 1 1 1 1 1 1 1 1
is MUCH LESS than the probability of:
1 2 3 4 5 6 7 8 9 1 2 or
8 3 6 3 1 2 6 8 2 1 6 or
9 3 1 5 5 4 7 1 6 8 1 or
2 3 3 1 8 9 7 2 5 1 6 or
7 8 9 4 2 7 7 1 3 8 2 or
...
[enumerate millions of other possibilities]
...
2 1 5 5 8 1 8 7 6 5 1 or
3 4 9 5 2 8 4 2 3 2 6.
> So, I am sorry to tell you this, but you do _not_ understand probability.
It's certainly true that you don't.
> And you *cannot* understand it intuitively, like you tried to.
Rubbish. Chris has got it right. Whether he did so "intuitively" or whether
he learned this after years of study is irrelevant, and you have no
evidence for which it is. You are jumping to wrong conclusions -- you are
so sure that you're so much smarter than everyone else that you've ASSUMED
that Chris must be wrong and *completely failed to pay attention* to what
he has actually said, responding to things he hasn't said.
> It is why humans play the lottery using their “lucky numbers” whereas
> other numbers have exactly the same probability of being drawn;
[snip more irrelevant examples]
None of these things are relevant in the slightest.
> An extension of that misconception is emphasized by an anecdote (which may
> be apocryphal) told about Richard Feynman (I heard it from Lawrence Krauss
> in “A Universe from Nothing”; he can tell that in a much more funny way
> than I am able to reproduce it here [1]):
>
> Richard Feynman used to go up to people all the time and he’d say:
>
> “You won’t believe what happend to me today!
> You won’t believe what happend to me today!”
>
> And people would say:
>
> “What?”
>
> And he would reply:
>
> “Absolutely nothing!”
And I wouldn't believe it either. *Something happened*. He got up. He put
his shoes on. He washed his face. These are all events, but they are not
memorable events. They are equivalent to a sample like:
6 8 1 5 9 2 7 6 3 3 2
There are millions of ways to get a non-memorable sample like that, but only
nine ways to get a memorable sample like:
2 2 2 2 2 2 2 2 2 2 2
Your mistake is that you are comparing that one memorable sample to any
*one* of the non-memorable ones, when you should be comparing the
probability of *any* memorable sample against *any* non-memorable ones.
Actually, your real mistake is hubris. *Nothing* in Chris' comments in this
thread should lead you to the conclusion he doesn't know what he is talking
about, but you're so full of the preconceived notion that he must be wrong
because he is not Thomas Lahn that you've actually made an astonishing
blunder and compounded it with laughable arrogance.
--
Steven
[toc] | [prev] | [next] | [standalone]
| From | Ian Kelly <ian.g.kelly@gmail.com> |
|---|---|
| Date | 2015-06-07 11:07 -0600 |
| Message-ID | <mailman.261.1433697238.13271.python-list@python.org> |
| In reply to | #92271 |
On Sun, Jun 7, 2015 at 10:44 AM, Chris Angelico <rosuav@gmail.com> wrote: > On Mon, Jun 8, 2015 at 2:36 AM, Thomas 'PointedEars' Lahn > <PointedEars@web.de> wrote: >>> The greater the multiplier, the lower the chance that any element will >>> have no hits. >> >> Wrong. >> >>> [ex falso quodlibet] > > Huh. Do you want to explain how, mathematically, I am wrong, or do you > want to join the RUE in my ignore list? My best speculation is that he's either objecting to the generality of your statement (it's false if the probability of some element occurring is zero or eventually degrades to zero), or misreading the word "multiplier" to the conclusion that the value of each element is being multiplied rather than the number of trials. Or trolling; I suppose that's always an option too.
[toc] | [prev] | [next] | [standalone]
| From | Chris Angelico <rosuav@gmail.com> |
|---|---|
| Date | 2015-06-08 03:20 +1000 |
| Message-ID | <mailman.263.1433697617.13271.python-list@python.org> |
| In reply to | #92271 |
On Mon, Jun 8, 2015 at 3:07 AM, Ian Kelly <ian.g.kelly@gmail.com> wrote: > On Sun, Jun 7, 2015 at 10:44 AM, Chris Angelico <rosuav@gmail.com> wrote: >> On Mon, Jun 8, 2015 at 2:36 AM, Thomas 'PointedEars' Lahn >> <PointedEars@web.de> wrote: >>>> The greater the multiplier, the lower the chance that any element will >>>> have no hits. >>> >>> Wrong. >>> >>>> [ex falso quodlibet] >> >> Huh. Do you want to explain how, mathematically, I am wrong, or do you >> want to join the RUE in my ignore list? > > My best speculation is that he's either objecting to the generality of > your statement (it's false if the probability of some element > occurring is zero or eventually degrades to zero), or misreading the > word "multiplier" to the conclusion that the value of each element is > being multiplied rather than the number of trials. Or trolling; I > suppose that's always an option too. My first thought was your first option, which is why I specifically responded with explanation about how data type selection can viably be about expectations rather than certainties, but he responded with a one-word "Wrong" so I must have been answering the wrong question. I've no idea about the misreading of "multiplier". A fourth possibility is that mathematics works differently for him and for us, which I suppose is possible; when I visited sci.math a while ago, I found some people for whom everything I'd learned in grade school was clearly wrong, and they were doing their best to enlighten the world about the new truths of mathematics that they'd found. ChrisA
[toc] | [prev] | [next] | [standalone]
| From | "C.D. Reimer" <chris@cdreimer.com> |
|---|---|
| Date | 2015-06-07 10:36 -0700 |
| Message-ID | <mailman.266.1433698601.13271.python-list@python.org> |
| In reply to | #92271 |
On 6/7/2015 10:20 AM, Chris Angelico wrote: > A fourth > possibility is that mathematics works differently for him and for us, > which I suppose is possible; when I visited sci.math a while ago, I > found some people for whom everything I'd learned in grade school was > clearly wrong, and they were doing their best to enlighten the world > about the new truths of mathematics that they'd found. I had the unfortunate luck of taking "Harvard Calculus" in college (circa 1995). The textbook was nothing but word problems from end to end. The goal was to get the students away from symbolic thinking of traditional calculus into thinking about real world problems that uses calculus. (Never mind that the majority of students don't use calculus after they get out school.) I bailed out after two weeks. Taking the class at 7:30AM probably didn't help. Chris R.
[toc] | [prev] | [next] | [standalone]
| From | Steven D'Aprano <steve@pearwood.info> |
|---|---|
| Date | 2015-06-08 04:28 +1000 |
| Message-ID | <55748d5f$0$13008$c3e8da3$5496439d@news.astraweb.com> |
| In reply to | #92271 |
On Mon, 8 Jun 2015 02:36 am, Thomas 'PointedEars' Lahn wrote: > Chris Angelico wrote: >> The greater the multiplier, the lower the chance that any element will >> have no hits. > > Wrong. No, Chris is correct. The "multiplier" increases the number of samples. The larger the number of samples, the lower the chance that any element is not represented. If my previous post didn't convince you, consider an even simpler random distribution: tossing a fair coin. The probability of getting a head is exactly 1/2 whether you toss the coin once or a thousand times. But if you toss the coin once, your chances of getting "no heads" is 1/2. If you toss it a thousand times, your chances of getting "no heads" is 1/2**1000. -- Steven
[toc] | [prev] | [next] | [standalone]
| From | Chris Angelico <rosuav@gmail.com> |
|---|---|
| Date | 2015-06-08 04:40 +1000 |
| Message-ID | <mailman.269.1433702438.13271.python-list@python.org> |
| In reply to | #92284 |
On Mon, Jun 8, 2015 at 4:28 AM, Steven D'Aprano <steve@pearwood.info> wrote: > If my previous post didn't convince you, consider an even simpler random > distribution: tossing a fair coin. The probability of getting a head is > exactly 1/2 whether you toss the coin once or a thousand times. But if you > toss the coin once, your chances of getting "no heads" is 1/2. If you toss > it a thousand times, your chances of getting "no heads" is 1/2**1000. Although... purity and reality do sometimes disagree. The concept of "independent events" says that if I toss a coin 99 times and it comes up heads every time, the probability that it'll come up heads again the hundredth time is still 50-50. But the reality of cheats coins says that the chances of the coin being a fair one and having been tossed 99 times and come up heads every time is one in 1<<99, and given that the mass of this planet is roughly 6E27 grams, that would mean that you'd need about a thousand earths' worth of coins to get that many... so the chances are much better that you're looking at either an imbalanced coin or a crafty operator, and the probability is much higher that it comes up heads :) ChrisA
[toc] | [prev] | [next] | [standalone]
| From | Steven D'Aprano <steve@pearwood.info> |
|---|---|
| Date | 2015-06-08 04:24 +1000 |
| Message-ID | <55748c4f$0$13012$c3e8da3$5496439d@news.astraweb.com> |
| In reply to | #92263 |
On Mon, 8 Jun 2015 01:51 am, Thomas 'PointedEars' Lahn wrote: > Chris Angelico wrote: > >> On Sun, Jun 7, 2015 at 8:40 PM, Thomas 'PointedEars' Lahn >> <PointedEars@web.de> wrote: >>> Cecil Westerhof wrote: >>>> I wrote a very simple function to test random: >>>> def test_random(length, multiplier = 10000): >>>> number_list = length * [0] >>>> for i in range(length * multiplier): >>>> number_list[random.randint(0, length - 1)] += 1 >>>> minimum = min(number_list) >>>> maximum = max(number_list) >>>> return (minimum, maximum, minimum / maximum) >>> >>> As there is no guarantee that every number will occur randomly, using a >>> dictionary at first should be more efficient than a list: >> >> Hmm, I'm not sure that's actually so. His code is aiming to get >> 'multiplier' values in each box; for any serious multiplier (he starts >> with 10 in the main code), you can be fairly confident that every >> number will come up at least once. > > The wording shows a common misconception: that random distribution would > mean that it is guaranteed or more probable that every element of the set > will occur at least once. It is another common misconception that > increasing the number of trials would increase the probability of that > happening. But that is not so. You are badly mistaken. Of course you are correct that a random distribution does not *guarantee* that every element of the set is seen at least once. But as soon as you call it "a common misconception" that increasing the number of trials increases the probability of seeing every element, you are simply mistaken. It's not a misconception at all, it is true. See below for calculations. [...] > If the distribution is even and the game is fair, every outcome *always* > has > the same probability of occurring. As a result, every sequence of > outcomes of equal length *always* has the same probability of occurring, > and the probability for a particular sequence of equal length does _not_ > increase or > decrease based on the number of occurrences of previous outcomes. Those > are *independent* events. You are confused. Think about it more carefully. To make it simple, let's randomly generate from the set 1, 2, 3, and do just three samples. That's small enough to enumerate all the possibilities: 111 112 113 121 122 123 131 132 133 211 212 213 221 222 223 231 232 233 311 312 313 321 322 323 331 332 333 Each event (111 etc) has a probability of 1/27, but that's not important. We're comparing the probability of all elements being seen, versus at least one element being missed. I'm going to tag the events that contain all three elements: 111 112 113 121 122 123* 131 132* 133 211 212 213* 221 222 223 231* 232 233 311 312* 313 321* 322 323 331 332 333 So there are six events out of 27, or 2/9, where all elements are seen in a trial, versus 7/9 where at least one element will have a count of zero. Putting it another way: Pr(at least one element has zero count) = Pr(1 has zero count) + Pr(2 has zero count) + Pr(3 has zero count) - Pr(1 and 2 both have zero counts) - Pr(1 and 3 both have zero counts) - Pr(2 and 3 both have zero counts) = (2/3)**3 * 3 - (1/3)**3 * 3 which gives us 7/9 as confirmed by direct enumeration of all the possibilities. With four samples per trial: Pr(at least one element has zero count) = 5/9 With five samples: Pr(at least one element has zero count) = 31/81 Ten samples: Pr(at least one element has zero count) = 341/6561 or approximately 0.05 Twenty samples: Pr(at least one element has zero count) = 349525/387420489 or approximately 0.0009 For 100 samples: Pr(at least one element has zero count) = 7.4e-18 (approx) which means that with 100 samples from the set (1, 2, 3), if you could run one million trials every second, it would on average take you almost 4300 years to see a trial that had a zero count for one or more of the elements. The probability of any element *not* showing up in a random trial with N samples decreases rapidly as N increases. Elements with zero count are only a factor when the number of samples is small, or the number of elements is large. -- Steven
[toc] | [prev] | [next] | [standalone]
| From | Jonas Wielicki <jonas@wielicki.name> |
|---|---|
| Date | 2015-06-07 12:41 +0200 |
| Message-ID | <mailman.237.1433674295.13271.python-list@python.org> |
| In reply to | #92207 |
[Multipart message — attachments visible in raw view] — view raw
On 07.06.2015 08:27, Cecil Westerhof wrote:
> I wrote a very simple function to test random:
> def test_random(length, multiplier = 10000):
> number_list = length * [0]
> for i in range(length * multiplier):
> number_list[random.randint(0, length - 1)] += 1
> minimum = min(number_list)
> maximum = max(number_list)
> return (minimum, maximum, minimum / maximum)
>
> When running:
> for i in range(1, 7):
> print(test_random(100, 10 ** i))
> I get:
> (3, 17, 0.17647058823529413)
> (73, 127, 0.5748031496062992)
> (915, 1086, 0.8425414364640884)
> (9723, 10195, 0.9537027954879843)
> (99348, 100620, 0.9873583780560524)
> (997198, 1002496, 0.9947151908835546)
>
> and when running:
> for i in range(1, 7):
> print(test_random(1000, 10 ** i))
> I get:
> (2, 20, 0.1)
> (69, 138, 0.5)
> (908, 1098, 0.8269581056466302)
> (9684, 10268, 0.9431242695753799)
> (99046, 100979, 0.9808574059953059)
> (996923, 1003598, 0.9933489305478886)
>
> It shows that when the first parameter increases the deviation
> increases and when the second parameter increases the deviation
> decreases. Exactly what you would expect. But what are the ranges you
> would expect with a good random function.
Really depends on the number of samples. Appearantly, a good RNG would
come close to 1.0 for the ratio.
> Then it could be used to test a random function.
This is an interesting test (more interesting to me than it looks at the
first sight, and certainly better than what I had come up with), but
unfortunately, there is more to testing randomness.
The test clearly suggests that random functions should have a min/max
ratio of about 1.0. Think of a "random" function like this:
def fake_random(minv, maxv, _cache={}):
try:
return next(_cache[minv, maxv])
except (StopIteration, KeyError):
iterator = iter(range(minv, maxv+1))
_cache[minv, maxv] = iterator
return next(iterator)
(if that is hard to read, I agree; it returns the sequence from minv to
maxv (inclusively) over and over again for equal minv and maxv. don’t
pass anything to _cache :), just call it like random.randint)
This gives a "perfect" ratio of 1.0, but is not very random. This kind
of function would probably be ruled out by a compression or correlation
test. If you want to dig deeper into the algorithms for random testing,
the dieharder test suite [1] is probably worth a look.
It all boils down to the use case. For some use cases, the
``fake_random`` might be good enough (unittesting would be such a case:
it is certainly uniformly distributed and allows full coverage of the
tested range), for others it would fail catastrophically (don’t generate
your cryptographic keys with this!).
cheers,
Jonas
[1]: http://www.phy.duke.edu/~rgb/General/dieharder.php
[toc] | [prev] | [next] | [standalone]
| From | Steven D'Aprano <steve@pearwood.info> |
|---|---|
| Date | 2015-06-07 22:52 +1000 |
| Message-ID | <55743ea4$0$13004$c3e8da3$5496439d@news.astraweb.com> |
| In reply to | #92207 |
On Sun, 7 Jun 2015 04:27 pm, Cecil Westerhof wrote:
> I wrote a very simple function to test random:
> def test_random(length, multiplier = 10000):
> number_list = length * [0]
> for i in range(length * multiplier):
> number_list[random.randint(0, length - 1)] += 1
> minimum = min(number_list)
> maximum = max(number_list)
> return (minimum, maximum, minimum / maximum)
Putting aside the timing aspects, your frequency calculations are not done
in a very Pythonic manner. A better way might be:
from collections import Counter
from random import randint
def test_random(length, multiplier = 10000):
freq = Counter(
randint(0, length - 1) for i in range(length * multiplier)
)
minimum = min(freq[i] for i in range(length))
maximum = max(freq.values())
return (minimum, maximum, minimum / maximum)
Note carefully the way I calculate the minimum value. That way, if by chance
some number has a frequency of zero, the minimum returned will be zero. For
the maximum, that doesn't matter, since any missing numbers will not be the
most frequent number.
Better than returning the min and max, you should return the counter object
itself. Before doing so, I make sure that any missing numbers are given a
value of zero.
def test_random(size, samples=1000000):
"""Return a sample of `samples` random numbers from 0 to size-1."""
freq = Counter(randint(0, size-1) for i in range(samples))
# Avoid missing numbers. (Likely for small samples.)
freq.subtract(dict.fromkeys(range(size), 0))
return freq
Now once you have the frequencies, you can look at many interesting
statistics, and verify that you have a valid sample.
py> f = test_random(100)
py> sum(f.values()) == 1000000 # Total number of samples is right?
True
py> sorted(f) == list(range(100)) # Any missing numbers?
True
py> max(f) - min(f) # statistical range
99
py> min(f.values()) # Lowest frequency?
9775
py> max(f.values()) # And the highest?
10313
The ratio of those two frequencies isn't very interesting, but we can
calculate it:
py> min(f.values())/max(f.values())
0.9478328323475226
For a uniformly distributed population, that ratio will be 1.0 exactly, but
for any finite sample, it could be any number between 0 and 1. We expect
that if we have a large random sample, the ratio should approach 1, but
moderate deviations from that are expected. In fact, we should be surprised
if the ratio is exactly 1, and suspect shenanigans if it is.
We can extract the mode, and its frequency:
py> f.most_common(1) # mode
[(85, 10313)]
In Python 3.4 or better, we can look at some more statistics:
py> import statistics
py> statistics.median(f.elements())
50.0
py> statistics.mean(f.elements())
49.531949
py> statistics.stdev(f.elements())
28.873178122988367
And verify that the mode is correct:
py> statistics.mode(f.elements())
85
We can see what happens if the data is fiddled with:
py> g = f.copy()
py> g[42] += g[23]
py> g[23] = 0
py> statistics.median(g.elements())
50.0
py> statistics.mean(g.elements())
49.7263
py> statistics.stdev(g.elements())
28.757647388343212
py> g.most_common(3)
[(42, 20258), (85, 10313), (56, 10208)]
The median is unchanged, the mean shifts slightly higher, and the standard
deviation increases. But as you can see, these are not particularly
powerful tests of randomness: only the mode shows an obvious deviation from
uniformity.
> But what are the ranges you
> would expect with a good random function. Then it could be used to
> test a random function.
Python's pseudo-random number generator is based on the Mersenne Twister.
This is one of the most high-quality and extensively tested PRNGs
available, with a period of 2**19937-1 before it repeats. Its randomness
properties are very good indeed. (Although of course that isn't to say that
there aren't bugs in the Python implementation.)
Mersenne Twister is is not suitable for cryptographic work, but apart from
that, it is pretty much as indistinguishable from random as any
deterministic computer program can be.
--
Steven
[toc] | [prev] | [next] | [standalone]
| From | Steven D'Aprano <steve@pearwood.info> |
|---|---|
| Date | 2015-06-07 23:06 +1000 |
| Message-ID | <557441ed$0$13009$c3e8da3$5496439d@news.astraweb.com> |
| In reply to | #92242 |
On Sun, 7 Jun 2015 10:52 pm, Steven D'Aprano wrote: > The median is unchanged, the mean shifts slightly higher, and the standard > deviation increases. But as you can see, these are not particularly > powerful tests of randomness: only the mode shows an obvious deviation > from uniformity. Oh, I forgot: we can look at the frequencies themselves as well. If our sample is absolutely perfectly distributed uniformly, then every element will have a frequency of exactly 10000. We don't necessarily expect this from a random sample, but if we're too far from that, we should be concerned. Here are the results from the fair sample: py> statistics.mean(f.values()) 10000.0 py> statistics.median(f.values()) 9995.0 py> statistics.stdev(f.values()) 91.8027089673141 If I do the same thing with the biased sample, the standard deviation stands out like a sore thumb: py> statistics.mean(g.values()) 10000.0 py> statistics.median(g.values()) 9993.0 py> statistics.stdev(g.values()) 1442.527341617181 -- Steven
[toc] | [prev] | [next] | [standalone]
| From | Peter Otten <__peter__@web.de> |
|---|---|
| Date | 2015-06-07 15:35 +0200 |
| Message-ID | <mailman.247.1433684166.13271.python-list@python.org> |
| In reply to | #92242 |
Steven D'Aprano wrote:
>> I wrote a very simple function to test random:
>> def test_random(length, multiplier = 10000):
>> number_list = length * [0]
>> for i in range(length * multiplier):
>> number_list[random.randint(0, length - 1)] += 1
>> minimum = min(number_list)
>> maximum = max(number_list)
>> return (minimum, maximum, minimum / maximum)
>
> Putting aside the timing aspects, your frequency calculations are not done
> in a very Pythonic manner.
I would agree if the frequency table were sparse, i. e. many indices with
number_list[index] == 0
but that's not the case with on average 10000 hits per index.
> A better way might be:
I'm too lazy to measure, but this will likely be a tad slower. Did you mean
to convey this by "Putting aside the timing aspects"?
> from collections import Counter
> from random import randint
>
> def test_random(length, multiplier = 10000):
> freq = Counter(
> randint(0, length - 1) for i in range(length * multiplier)
> )
> minimum = min(freq[i] for i in range(length))
How about
if len(freq) < length:
minimum = 0
else:
minimum = min(freq.values())
Not as an optimisation (replacing randint() with randrange() is probably
more effective in that department), but because it's closer to being self-
explanatory.
> maximum = max(freq.values())
> return (minimum, maximum, minimum / maximum)
[toc] | [prev] | [next] | [standalone]
| From | Thomas 'PointedEars' Lahn <PointedEars@web.de> |
|---|---|
| Date | 2015-06-07 18:36 +0200 |
| Message-ID | <1586765.9ka6YhQpJC@PointedEars.de> |
| In reply to | #92252 |
Peter Otten wrote: > Steven D'Aprano wrote: >>> I wrote a very simple function to test random: >>> def test_random(length, multiplier = 10000): >>> number_list = length * [0] >>> for i in range(length * multiplier): >>> number_list[random.randint(0, length - 1)] += 1 >>> minimum = min(number_list) >>> maximum = max(number_list) >>> return (minimum, maximum, minimum / maximum) >> >> Putting aside the timing aspects, your frequency calculations are not >> done in a very Pythonic manner. > > I would agree if the frequency table were sparse, i. e. many indices with > > number_list[index] == 0 > > but that's not the case with on average 10000 hits per index. A common misconception. -- PointedEars Twitter: @PointedEars2 Please do not cc me. / Bitte keine Kopien per E-Mail.
[toc] | [prev] | [next] | [standalone]
| From | Peter Otten <__peter__@web.de> |
|---|---|
| Date | 2015-06-07 18:48 +0200 |
| Message-ID | <mailman.260.1433695702.13271.python-list@python.org> |
| In reply to | #92270 |
Thomas 'PointedEars' Lahn wrote: > Peter Otten wrote: > >> Steven D'Aprano wrote: >>>> I wrote a very simple function to test random: >>>> def test_random(length, multiplier = 10000): >>>> number_list = length * [0] >>>> for i in range(length * multiplier): >>>> number_list[random.randint(0, length - 1)] += 1 >>>> minimum = min(number_list) >>>> maximum = max(number_list) >>>> return (minimum, maximum, minimum / maximum) >>> >>> Putting aside the timing aspects, your frequency calculations are not >>> done in a very Pythonic manner. >> >> I would agree if the frequency table were sparse, i. e. many indices with >> >> number_list[index] == 0 >> >> but that's not the case with on average 10000 hits per index. > > A common misconception. A pointless remark. Not counting the ears ;)
[toc] | [prev] | [next] | [standalone]
| From | Thomas 'PointedEars' Lahn <PointedEars@web.de> |
|---|---|
| Date | 2015-06-07 22:15 +0200 |
| Message-ID | <2115433.UTq2kp7hT1@PointedEars.de> |
| In reply to | #92273 |
Peter Otten wrote: > Thomas 'PointedEars' Lahn wrote: >> Peter Otten wrote: >>> Steven D'Aprano wrote: >>>>> I wrote a very simple function to test random: >>>>> def test_random(length, multiplier = 10000): >>>>> number_list = length * [0] >>>>> for i in range(length * multiplier): >>>>> number_list[random.randint(0, length - 1)] += 1 >>>>> minimum = min(number_list) >>>>> maximum = max(number_list) >>>>> return (minimum, maximum, minimum / maximum) >>>> >>>> Putting aside the timing aspects, your frequency calculations are not >>>> done in a very Pythonic manner. >>> >>> I would agree if the frequency table were sparse, i. e. many indices >>> with >>> >>> number_list[index] == 0 >>> >>> but that's not the case with on average 10000 hits per index. >> >> A common misconception. > > A pointless remark. Not counting the ears ;) The details are in my other follow-ups. Go read them. -- PointedEars Twitter: @PointedEars2 Please do not cc me. / Bitte keine Kopien per E-Mail.
[toc] | [prev] | [next] | [standalone]
| From | Steven D'Aprano <steve@pearwood.info> |
|---|---|
| Date | 2015-06-08 11:35 +1000 |
| Message-ID | <5574f14c$0$13008$c3e8da3$5496439d@news.astraweb.com> |
| In reply to | #92252 |
On Sun, 7 Jun 2015 11:35 pm, Peter Otten wrote:
> Steven D'Aprano wrote:
>
>
>>> I wrote a very simple function to test random:
>>> def test_random(length, multiplier = 10000):
>>> number_list = length * [0]
>>> for i in range(length * multiplier):
>>> number_list[random.randint(0, length - 1)] += 1
>>> minimum = min(number_list)
>>> maximum = max(number_list)
>>> return (minimum, maximum, minimum / maximum)
>>
>> Putting aside the timing aspects, your frequency calculations are not
>> done in a very Pythonic manner.
>
> I would agree if the frequency table were sparse, i. e. many indices with
>
> number_list[index] == 0
>
> but that's not the case with on average 10000 hits per index.
>
>> A better way might be:
>
> I'm too lazy to measure, but this will likely be a tad slower. Did you
> mean to convey this by "Putting aside the timing aspects"?
No, I meant that using a Counter to count stuff is more Pythonic than using
a list.
>> from collections import Counter
>> from random import randint
>>
>> def test_random(length, multiplier = 10000):
>> freq = Counter(
>> randint(0, length - 1) for i in range(length * multiplier)
>> )
>> minimum = min(freq[i] for i in range(length))
>
> How about
>
> if len(freq) < length:
> minimum = 0
> else:
> minimum = min(freq.values())
That's quite nice, and it's extensible to the case where I return the
counter:
if len(freq) != length:
# At least one element never showed up at all.
freq.subtract(dict.fromkeys(range(length), 0)
> Not as an optimisation (replacing randint() with randrange() is probably
> more effective in that department), but because it's closer to being self-
> explanatory.
I thought about replacing randint with randrange, but they are different
functions with different implementations, and could exhibit different
biases. So one might wish to test them separately for statistical
properties.
--
Steven
[toc] | [prev] | [next] | [standalone]
| From | Christian Gollwitzer <auriocus@gmx.de> |
|---|---|
| Date | 2015-06-07 14:53 +0200 |
| Message-ID | <ml1epn$peo$1@dont-email.me> |
| In reply to | #92207 |
Am 07.06.15 um 08:27 schrieb Cecil Westerhof: > I wrote a very simple function to test random: > def test_random(length, multiplier = 10000): > number_list = length * [0] > for i in range(length * multiplier): > number_list[random.randint(0, length - 1)] += 1 > minimum = min(number_list) > maximum = max(number_list) > return (minimum, maximum, minimum / maximum) > > It shows that when the first parameter increases the deviation > increases and when the second parameter increases the deviation > decreases. Exactly what you would expect. But what are the ranges you > would expect with a good random function. Then it could be used to > test a random function. Random number generators (RNG or PRNG) are usually tested using similar methods, where the outcome can be assigned a probability distribution. For example, diehard is a famous sequence of tests for RNGs. http://en.wikipedia.org/wiki/Diehard_tests You are only checking for uniform distribution. This couldn't detect correlation, for instance if the RNG would just return increasing numbers. A better check for uniformness could be done by the chi square test or Kolmogorov-Smirnov. Then there are tables which relate the deviations to significance. Christian
[toc] | [prev] | [next] | [standalone]
| From | Dennis Lee Bieber <wlfraed@ix.netcom.com> |
|---|---|
| Date | 2015-06-07 11:04 -0400 |
| Message-ID | <mailman.252.1433689486.13271.python-list@python.org> |
| In reply to | #92207 |
On Sun, 07 Jun 2015 08:27:59 +0200, Cecil Westerhof <Cecil@decebal.nl>
declaimed the following:
>
>It shows that when the first parameter increases the deviation
>increases and when the second parameter increases the deviation
>decreases. Exactly what you would expect. But what are the ranges you
>would expect with a good random function. Then it could be used to
>test a random function.
From what I could make out, you are testing the distribution of an
equally weighted function (that is, any value in the lower..upper range is
equally likely to appear).
Unfortunately, a measure of distribution shows nothing for
predictability (non-randomness).
Here's an example displaying ideal distribution (I should have made the
first for-loop a generator)
>>> def lin(smpl=10000):
... samples = {}
... for i in range(smpl):
... samples[i % 10] = samples.get(i % 10, 0) + 1
... for i in range(10):
... print i, samples[i]
...
>>> lin()
0 1000
1 1000
2 1000
3 1000
4 1000
5 1000
6 1000
7 1000
8 1000
9 1000
>>> lin(9876)
0 988
1 988
2 988
3 988
4 988
5 988
6 987
7 987
8 987
9 987
>>>
But is it random? Only in that there is a finite chance of a truly
random sequence being: 0, 1, 2, 3, 4, ..., 8, 9, 0, 1, ...
If the distribution is widely uneven, the odds are the generator is not
random -- or at least, not random in the equally weighted sense; it may be
perfectly random gaussian or Poisson distribution.
For the equally weighted case, one may need to also compute, say,
differences between adjacent samples, between alternate samples, between
every 10th sample, whatever... with the intent being to reveal if there is
any cyclic pattern detectable.
--
Wulfraed Dennis Lee Bieber AF6VN
wlfraed@ix.netcom.com HTTP://wlfraed.home.netcom.com/
[toc] | [prev] | [standalone]
Page 4 of 4 — ← Prev page 1 2 3 [4]
Back to top | Article view | comp.lang.python
csiph-web