Generating randoms from a specified CDF

Posted on March 9, 2017

0


This post deals with generating random numbers given a CDF (Cumulative distribution function). CDF may be specified as an analytical function or as a table of values. We also assume that we have a source of pseudo-random uniformly distributed numbers.

Probability Integral Transform

At the core of this issue is the ‘Probability Integral Transform’.  It states that, if \bf{X}  is a continuous random variable with cumulative distribution function F_X , then the random variable Y=F_{X}(X) has a uniform distribution on [0, 1] .

This statement might seem a bit abstract or rather too simple. Let me try and break it down. Let’s say, you use np.random.exponential to generate a say 10,000 random numbers drawn from exponential distribution. Note, the cdf of exponential is a simple analytic function, F(p) = 1 - e^{-\lambda p} . So, if you transform all those 10,000 randoms you just generated using this function F(p) , the resulting 10,000 numbers you get will be uniformly distributed. In other words, if you plot an histogram of these resulting numbers after transformation, you should see something like in figure below.

Here, is a short Python script to demo what I am saying,


import numpy as np
import matplotlib.pyplot as plt

def cdf_of_exponential( x, scale=1.0 ): #only for x>0
    return 1.0 - np.exp(-scale*x)

_set = []
_transformed_set = []
for i in range(100000):
    expp = np.random.exponential() #exponential dist randoms
    texpp = cdf_of_exponential( expp )

    _set.append( expp )
    _transformed_set.append( texpp )

plt.hist( _set, 100 )
plt.figure()
plt.hist( _transformed_set )
plt.show()

Screenshot from 2017-03-08 23-13-38

The plot shown of the left is the histogram of the random number generated. The figure on the right is the histogram of the transformed random numbers, ie. variable texpp. As can be seen, it uniformly distributed randoms between 0 and 1. This is exactly what the statement said.

Proof of Probability Integral Transform

The formal proof of this is followed as borrowed from its wikipedia page.

Given any random variable \bf{X} , define $latex Y = \bf{F_x(X)}. Then:

 \begin{align} F_Y (y) &= \operatorname{P}(Y\leq y) \\         &= \operatorname{P}(F_X (X)\leq y) \\         &= \operatorname{P}(X\leq F^{-1}_X (y)) \\         &= F_X (F^{-1}_X (y)) \\         &= y \end{align}

\bf{F_Y}  is just the CDF of a uniform(0,1)  random variable. Thus, \bf{Y} has a uniform distribution on the interval [0,1].

Inverse Probability Integral Transform

Turns out, the inverse is also true. What this means is that, if you are given random numbers drawn from a uniform distribution between 0 and 1. Then, if you transform these, using the inverse function of the cdf of the desired distribution, the resulting numbers are distributed according to the desired distribution. Inverse cdf is also known as the Quantile function.

Generating Kumaraswamy distributed random numbers

Let me illustrate it with a script. Say, I am interested in having random numbers drawn from Kumaraswamy distribution. The cdf of this distribution is 1 - (1-x^a)^b , a,b are scalar parameters. Accordingly, its inverse cdf is g(y) =  \sqrt[a]{1 - \sqrt[b]{1-y}} . So if y is a random number drawn from uniform random variable g(y) will be distributed according to the Kumaraswamy distribution.


import numpy as np
import matplotlib.pyplot as plt

# From uniforms, get Kumaraswamy distribution a=.5,b=0.5

def inv_cdf_kumaraswamy( y, a=2, b=1 ):
    return np.power( 1.0 - np.power( 1.0 - y, 1.0/b ), 1.0/a )

_set = []
_set_transformed = []
for i in range(100000):
    r = np.random.uniform( low=0, high=1 )

    _set.append( r )
    tr = inv_cdf_kumaraswamy( r, a=.5, b=.5 )
    _set_transformed.append( tr )

plt.hist( _set, 100 )
plt.figure()
plt.hist( _set_transformed, 100 )
plt.show()

Screenshot from 2017-03-09 14-21-23

The plot on the left is the histogram of uniformly distributed randoms. The plot on the right is the desired Kumarswamy distribution with parameters a=0.5, b=0.5, obtained from input uniform randoms.

Generating random numbers, specified a probability distribution function (PDF)

Often times, it is not possible to get a closed form of inverse cdf. One prime example is the Gaussian distribution. Its cdf is expressed in terms of the error function (erf). Which is available in a tabular form (although approximations in closed form do exists). In such a case table lookup is needed to compute the inverse function. Let me demo this bit.

Let us assume, we desired to generate random numbers between 0 and 10, drawn from a distribution whose pdf look as follows. This pdf is specified as a table of values [DOWNLOAD Python-npz]:

desired_pdf.png

Desired PDF

The way to obtain such random numbers involve computation of CDF from this table of values specifying the PDF.  The table contains locations and probability values. Note that locations are sorted. Computation of inverse CDF is just an inverse table lookup. Look at the script below to get a feel of it.


import numpy as np
import matplotlib.pyplot as plt

def inverse_cdf(cdf, y ):
for i in range(1,cdf.shape[0]):
if cdf[i-1] <=y and cdf[i] > y:
return i
return i #last index

A = np.load( 'prob_dist_func.npz')
locs = A['locs']
pdf = A['wts'] #Specified Probability distribution function, sum(pdf) should equal 1

cdf = np.cumsum(pdf) #Specified cummulative distribution function

plt.plot( locs, pdf )
plt.title( 'Probability distribution function' )
plt.figure()
plt.plot( locs, cdf)
plt.title( 'cumulative distribution function' )
plt.show()

#
_set = []
_set_transformed = []
for i in range(100000):
r = np.random.uniform( low=0, high=1 )

_set.append( r )
tr_index = inverse_cdf( cdf, r )
tr = locs[ tr_index ]
_set_transformed.append( tr )

plt.figure()
plt.hist( _set,100 )
plt.title( 'input uniform randoms' )
plt.figure()
plt.hist( _set_transformed, 100 )
plt.title( 'histogram of distribution, hopefully in desired distribution' )
plt.show()

histogram

Histogram of generated random numbers

Uniformly Distributed on n-Sphere

One related problem often encountered in practice is how to generate uniformly distributed unit vectors in d-dimensions. For example, we wish to generate (x_i,y_i) ; i=1 \cdots n, such that their 2d plot appears uniform. First approach may be to just independently generate points sets X, Y. But this clearly does not work, firstly because they are not unit normed.

The solution to this problem is simple, but subtly different. The solution is to generate normally distributed 3D points (n+1 – dimensional in general). Divide each point by its norm and select any of the 2 columns. The resulting set is uniformly distributed in unit-circle. Explanation to this is available here.

I will try and give more insight into this possibly later.

Uniformly distributed on unit circle

 

Conclusion

This is recipe to generate random numbers from a desired distribution and related theory.

Another related topic to this is to test if two distribution are from same/similar distribution. One possible naive way could be to test mean and variances, but alas, these randoms could be far from a Gaussian distribution and even if the means and variances match the distribution’s shapes (histograms) might look very different. This is where the Kolmogorov–Smirnov (KS) test comes into play.

Advertisements
Posted in: Research Blog