Groups > comp.lang.java.programmer > #9332 > unrolled thread

RandomDirichlet?

Started by	Giovanni Azua <bravegag@hotmail.com>
First post	2011-11-01 10:09 +0100
Last post	2011-11-01 22:09 -0400
Articles	10 — 5 participants

Back to article view | Back to comp.lang.java.programmer

  RandomDirichlet? Giovanni Azua <bravegag@hotmail.com> - 2011-11-01 10:09 +0100
    Re: RandomDirichlet? Eric Sosman <esosman@ieee-dot-org.invalid> - 2011-11-01 07:50 -0400
      Re: RandomDirichlet? Giovanni Azua <bravegag@hotmail.com> - 2011-11-01 19:50 +0100
    Re: RandomDirichlet? Wojtek <nowhere@a.com> - 2011-11-01 04:54 -0700
    Re: RandomDirichlet? Roedy Green <see_website@mindprod.com.invalid> - 2011-11-01 06:11 -0700
      Re: RandomDirichlet? Roedy Green <see_website@mindprod.com.invalid> - 2011-11-01 06:16 -0700
      Re: RandomDirichlet? Eric Sosman <esosman@ieee-dot-org.invalid> - 2011-11-01 20:23 -0400
        Re: RandomDirichlet? Roedy Green <see_website@mindprod.com.invalid> - 2011-11-02 01:39 -0700
          Re: RandomDirichlet? Eric Sosman <esosman@ieee-dot-org.invalid> - 2011-11-02 08:17 -0400
    Re: RandomDirichlet? Arne Vajhøj <arne@vajhoej.dk> - 2011-11-01 22:09 -0400

#9332 — RandomDirichlet?

From	Giovanni Azua <bravegag@hotmail.com>
Date	2011-11-01 10:09 +0100
Subject	RandomDirichlet?
Message-ID	<CAD575DC.8551%bravegag@hotmail.com>

Hello,

I need to generate random TPCH workloads but I don't want a Uniform
distribution where Q1, RF1, RF2 would be equally probable i.e. using new
Random().nextInt(..)

I instead would like to draw samples from a Dirichlet distribution which
would give a very skewed PD depending on the input alphas (which specify how
skewed I want the PD to be). The Matlab code would be exactly:

function r = dirichlet_sample(a,n)
p = length(a);
r = gamrnd(repmat(a,n,1),1,n,p);
r = r ./ repmat(sum(r,2),1,p);
End

But there is no "gamrnd" RandomGamma in Java nor RandomDirichlet. Before I
spend time writing this, can anyone advice for a low-footprint library
implementing those functions?

TIA,
Best regards,
Giovanni

[toc] | [next] | [standalone]

#9333

From	Eric Sosman <esosman@ieee-dot-org.invalid>
Date	2011-11-01 07:50 -0400
Message-ID	<j8omde$4a7$1@dont-email.me>
In reply to	#9332

On 11/1/2011 5:09 AM, Giovanni Azua wrote:
> Hello,
>
> I need to generate random TPCH workloads but I don't want a Uniform
> distribution where Q1, RF1, RF2 would be equally probable i.e. using new
> Random().nextInt(..)
>
> I instead would like to draw samples from a Dirichlet distribution which
> would give a very skewed PD depending on the input alphas (which specify how
> skewed I want the PD to be). The Matlab code would be exactly:
>
> function r = dirichlet_sample(a,n)
> p = length(a);
> r = gamrnd(repmat(a,n,1),1,n,p);
> r = r ./ repmat(sum(r,2),1,p);
> End
>
> But there is no "gamrnd" RandomGamma in Java nor RandomDirichlet. Before I
> spend time writing this, can anyone advice for a low-footprint library
> implementing those functions?

     Google finds ~114,000 hits for "gamma distribution java".

     As an aside, Google finds ~396,000 hits for "dirichlet
distribution java".

     Finally, Google finds ~111,000,000 hits for "people who
never heard of google".  I'm not sure just what that means ...

-- 
Eric Sosman
esosman@ieee-dot-org.invalid

[toc] | [prev] | [next] | [standalone]

#9345

From	Giovanni Azua <bravegag@hotmail.com>
Date	2011-11-01 19:50 +0100
Message-ID	<CAD5FDF8.87FC%bravegag@hotmail.com>
In reply to	#9333

On 11/1/11 12:50 PM, in article j8omde$4a7$1@dont-email.me, "Eric Sosman"
<esosman@ieee-dot-org.invalid> wrote:
> Google finds ~114,000 hits for "gamma distribution java".
> 
> As an aside, Google finds ~396,000 hits for "dirichlet
> distribution java".
> 
> Finally, Google finds ~111,000,000 hits for "people who
> never heard of google".  I'm not sure just what that means ...
...

This is the kind of useless answer I really do not enjoy and to me means you
are kind of bored at work.

Of course I looked into Google first and found so many hits which means
incredible time to evaluate, test, check it doesn't have zillion useless
dependencies etc etc etc

If you go back and read my OP you will perhaps notice that I asked whether
someone can _recommend_, meaning you freaking used it and can positively
recommend it. I wasn't sure about the full meaning of _recommend_ so I
looked it up in a dictionary.

recommend: "Put forward (someone or something) with approval as being
suitable for a particular purpose or role."

So please learn to read carefully before coming back with such BS answers.

[toc] | [prev] | [next] | [standalone]

#9334

From	Wojtek <nowhere@a.com>
Date	2011-11-01 04:54 -0700
Message-ID	<mn.09267dbb94fe8ed2.70216@a.com>
In reply to	#9332

Giovanni Azua wrote :
> Hello,
>
> I need to generate random TPCH workloads but I don't want a Uniform
> distribution where Q1, RF1, RF2 would be equally probable i.e. using new
> Random().nextInt(..)
>
> I instead would like to draw samples from a Dirichlet distribution which
> would give a very skewed PD depending on the input alphas (which specify how
> skewed I want the PD to be). The Matlab code would be exactly:
>
> function r = dirichlet_sample(a,n)
> p = length(a);
> r = gamrnd(repmat(a,n,1),1,n,p);
> r = r ./ repmat(sum(r,2),1,p);
> End
>
> But there is no "gamrnd" RandomGamma in Java nor RandomDirichlet. Before I
> spend time writing this, can anyone advice for a low-footprint library
> implementing those functions?

Google is your friend: java RandomDirichlet

jgibblda.sourceforge.net

-- 
Wojtek :-)

[toc] | [prev] | [next] | [standalone]

#9336

From	Roedy Green <see_website@mindprod.com.invalid>
Date	2011-11-01 06:11 -0700
Message-ID	<6hrva7h1irrei62sm79ntmsr803c0np907@4ax.com>
In reply to	#9332

On Tue, 01 Nov 2011 10:09:32 +0100, Giovanni Azua
<bravegag@hotmail.com> wrote, quoted or indirectly quoted someone who
said :

>But there is no "gamrnd" RandomGamma in Java nor RandomDirichlet. Before I
>spend time writing this, can anyone advice for a low-footprint library
>implementing those functions?


That class of problem I have on my back shelf to investigate some day.
One way to proceed is to look at the source code for the Gaussian
random number generator to see how they warp the output from a linear
random to an arbitrary shape.  Look in src.zip.

I would love to figure this out to create a recipe for arbitrary
distrtibutions for http://mindprod.com/jgloss/peudorandom.html
 
I have a vaguely related little problem in my todo list.  
Canada has 308 members of parliament.  If there were no bias toward
males or females, what could you say to a novice about how far  from
50/50 it could it stray and still be accounted for by random
variation, not bias? Presumably using the 19 times of 20 beloved of
pollsters.


-- 
Roedy Green Canadian Mind Products
http://mindprod.com
It's difficult to be rigorous about whether a machine really knows,
thinks, etc., because we’re hard put to define these things. 
We understand human mental processes only slightly better than
a fish understands swimming. 
~ John McCarthy (born: 1927-09-04 died: 2011-10-23 at age: 84).
Inventor of the term AI (Artificial Intelligence), 
the short-circuit OR operator (|| in Java), 
and LISP (LIst Processing Language) that makes EMACS 
(Extensible MACro System) so addictive.

[toc] | [prev] | [next] | [standalone]

#9337

From	Roedy Green <see_website@mindprod.com.invalid>
Date	2011-11-01 06:16 -0700
Message-ID	<35sva75ih3to958atf37gdnse3k62ehkej@4ax.com>
In reply to	#9336

On Tue, 01 Nov 2011 06:11:22 -0700, Roedy Green
<see_website@mindprod.com.invalid> wrote, quoted or indirectly quoted
someone who said :

>I have a vaguely related little problem in my todo list.  
>Canada has 308 members of parliament.  If there were no bias toward
>males or females, what could you say to a novice about how far  from
>50/50 it could it stray and still be accounted for by random
>variation, not bias? Presumably using the 19 times of 20 beloved of
>pollsters.


I thought a more convincing way to answer would be a Monte Carlo
Applet that simulated unbiased elections and tracked results,
gradually building up a bell shaped curse.  It would compare the
simulation with the theoretical calculation.

-- 
Roedy Green Canadian Mind Products
http://mindprod.com
It's difficult to be rigorous about whether a machine really knows,
thinks, etc., because we’re hard put to define these things. 
We understand human mental processes only slightly better than
a fish understands swimming. 
~ John McCarthy (born: 1927-09-04 died: 2011-10-23 at age: 84).
Inventor of the term AI (Artificial Intelligence), 
the short-circuit OR operator (|| in Java), 
and LISP (LIst Processing Language) that makes EMACS 
(Extensible MACro System) so addictive.

[toc] | [prev] | [next] | [standalone]

#9350

From	Eric Sosman <esosman@ieee-dot-org.invalid>
Date	2011-11-01 20:23 -0400
Message-ID	<j8q2o3$f4g$1@dont-email.me>
In reply to	#9336

On 11/1/2011 9:11 AM, Roedy Green wrote:
> [...]
> I would love to figure this out to create a recipe for arbitrary
> distrtibutions for http://mindprod.com/jgloss/peudorandom.html

     The completely general approach is to find the cumulative
distribution function F(x) = P(X < x), then find its inverse F[-1],
then evaluate X = F[-1](R) for a uniformly distributed R.  The
rest is "just numerical analysis" ...

> I have a vaguely related little problem in my todo list.
> Canada has 308 members of parliament.  If there were no bias toward
> males or females, what could you say to a novice about how far  from
> 50/50 it could it stray and still be accounted for by random
> variation, not bias? Presumably using the 19 times of 20 beloved of
> pollsters.

     Google "chi-squared".

-- 
Eric Sosman
esosman@ieee-dot-org.invalid

[toc] | [prev] | [next] | [standalone]

#9367

From	Roedy Green <see_website@mindprod.com.invalid>
Date	2011-11-02 01:39 -0700
Message-ID	<m702b7lgaaka7og01tjumog0vanis9qu2l@4ax.com>
In reply to	#9350

On Tue, 01 Nov 2011 20:23:28 -0400, Eric Sosman
<esosman@ieee-dot-org.invalid> wrote, quoted or indirectly quoted
someone who said :

>     The completely general approach is to find the cumulative
>distribution function F(x) = P(X < x), then find its inverse F[-1],
>then evaluate X = F[-1](R) for a uniformly distributed R.  The
>rest is "just numerical analysis" ...

that's simpler than I thought it would be. The hard part is
discovering an inverse function.  I suppose you could always implement
one by interpolation of the original function.

I suppose I could do it for y = x + 1; as an example.

Do you have any free histogram code to recommend to demonstrate that
it works.

-- 
Roedy Green Canadian Mind Products
http://mindprod.com
Capitalism has spurred the competition that makes CPUs faster and 
faster each year, but the focus on money makes software manufacturers 
do some peculiar things like deliberately leaving bugs and deficiencies
in the software so they can soak the customers for upgrades later.
Whether software is easy to use, or never loses data, when the company
has a near monopoly, is almost irrelevant to profits, and therefore 
ignored. The manufacturer focuses on cheap gimicks like dancing paper 
clips to dazzle naive first-time buyers. The needs of existing 
experienced users are almost irrelevant. I see software rental as the 
best remedy.

[toc] | [prev] | [next] | [standalone]

#9370

From	Eric Sosman <esosman@ieee-dot-org.invalid>
Date	2011-11-02 08:17 -0400
Message-ID	<j8rcdd$lji$1@dont-email.me>
In reply to	#9367

On 11/2/2011 4:39 AM, Roedy Green wrote:
> On Tue, 01 Nov 2011 20:23:28 -0400, Eric Sosman
> <esosman@ieee-dot-org.invalid>  wrote, quoted or indirectly quoted
> someone who said :
>
>>      The completely general approach is to find the cumulative
>> distribution function F(x) = P(X<  x), then find its inverse F[-1],
>> then evaluate X = F[-1](R) for a uniformly distributed R.  The
>> rest is "just numerical analysis" ...
>
> that's simpler than I thought it would be. The hard part is
> discovering an inverse function.  I suppose you could always implement
> one by interpolation of the original function.

     Interpolation, or any other approximation/calculation method.
There's a slight snag when F is not monotonically increasing (as
for any discrete distribution), because then F[-1] doesn't really
exist: It has vertical lines corresponding to F's horizontal
sections.  But that's easily dealt with: Replace each vertical with
a "closed" point at one end and an "open" point at the other, and
you've got a function that generates the desired answer.

     Most of the snazzy methods one learns for generating variates
from various distributions are, in essence, tricks or shortcuts that
avoid the need to evaluate F[-1] explicitly.  Even for some "simple"
distributions F[-1] can be unattractive and expensive to evaluate.
Even F itself might have no closed form in elementary functions,
never mind its inverse!  When random variates are required in great
quantity (as in Monte Carlo integration, for example), both the
speed and accuracy of generation become important -- which is what
motivates the search for these strange-looking calculations that
"just happen" to Do The Right Thing.

     Knuth describes several such methods, some in considerable
detail, in Volume II of "The Art of Computer Programming."

> I suppose I could do it for y = x + 1; as an example.

     Not sure what you mean by this.  To qualify as a cumulative
distribution function, the values of F must lie entirely in [0,1]
and F must be non-decreasing (that is, F(x1) <= F(x2) for all pairs
x1 < x2).

> Do you have any free histogram code to recommend to demonstrate that
> it works.

     Again, I'm not sure what you seek.  Dumping a bunch of samples
into a histogram and staring at the shape may suggest something, but
doesn't really "demonstrate" it.  Calculating a chi-squared statistic
can give an estimate of how likely it is that a particular histogram
shape would arise from a hypothesized distribution, but again that's
not what I'd call a "demonstration."

     Histograms with foreordained bin boundaries are pretty easy to
code.  Histograms with dynamically-chosen boundaries (percentiles, say)
are easy if you can afford to store the observations and sort them
afterward.  And (lemme look it up; ah, here) in the October 1985 issue
of CACM there's an article by Raj Jain and Imrich Chlamtac describing
how to get dynamically-chosen histogram *estimates* without storing
the samples.  I don't know if code's available; Jain and Chlamtac
certainly weren't writing Java in 1985!

-- 
Eric Sosman
esosman@ieee-dot-org.invalid

[toc] | [prev] | [next] | [standalone]

#9352

From	Arne Vajhøj <arne@vajhoej.dk>
Date	2011-11-01 22:09 -0400
Message-ID	<4eb0a64d$0$294$14726298@news.sunsite.dk>
In reply to	#9332

On 11/1/2011 5:09 AM, Giovanni Azua wrote:
> I need to generate random TPCH workloads but I don't want a Uniform
> distribution where Q1, RF1, RF2 would be equally probable i.e. using new
> Random().nextInt(..)
>
> I instead would like to draw samples from a Dirichlet distribution which
> would give a very skewed PD depending on the input alphas (which specify how
> skewed I want the PD to be). The Matlab code would be exactly:
>
> function r = dirichlet_sample(a,n)
> p = length(a);
> r = gamrnd(repmat(a,n,1),1,n,p);
> r = r ./ repmat(sum(r,2),1,p);
> End
>
> But there is no "gamrnd" RandomGamma in Java nor RandomDirichlet. Before I
> spend time writing this, can anyone advice for a low-footprint library
> implementing those functions?

http://acs.lbl.gov/software/colt/
http://acs.lbl.gov/software/colt/api/cern/jet/random/Gamma.html

could be worth a try.

Arne

[toc] | [prev] | [standalone]

csiph-web

RandomDirichlet?

Contents

#9332 — RandomDirichlet?

#9333

#9345

#9334

#9336

#9337

#9350

#9367

#9370

#9352