Rejection sampling is a computational technique for generating random numbers from a target probability distribution f(x). It belongs to the broader family of Monte Carlo methods, which rely on repeated random sampling to obtain numerical estimates of unknown quantities.

## Some words about Randomness

One might ask why a random variable with probability density f(x) cannot simply be generated by a computer algorithm. The fact is that algorithms are deterministic by nature, so we cannot expect true randomness from them. Instead, computers generate so-called pseudo-random numbers: the algorithm picks them from a huge ‘list of numbers’ which is ordered the same way every time, so the only source of variation is the starting point. You can easily see this if, when creating random numbers with any software (here, I’ll use R), you fix the starting point (called the ‘seed’):

```
set.seed(123)
sample(0:200, 10, replace = TRUE)
# 57 158 82 177 189 9 106 179 110 91

# let's generate two more samples, one without resetting the seed:
sample(0:200, 10, replace = TRUE)
# 192 91 136 115 20 180 49 8 65 191

# and one with the same seed as above:
set.seed(123)
sample(0:200, 10, replace = TRUE)
# 57 158 82 177 189 9 106 179 110 91
```

As you can see, two ‘random’ samples created from the same starting point are identical.

Hence, we assume that our only source of randomness is the uniform distribution U(0,1), whose density is given by:

f(u) = 1 for 0 <= u <= 1, and 0 otherwise.

From that distribution, we can derive random samples following other densities by means of deterministic algorithms, which is exactly what we are going to do with rejection sampling.
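As a minimal illustration of this idea (the example and the choice of p are mine, not part of the rejection sampling algorithm below), a Bernoulli(p) sample can be obtained from U(0,1) draws by a purely deterministic transformation:

```r
set.seed(123)
p = 0.3                # success probability, chosen for illustration
u = runif(10)          # draws from our only source of randomness, U(0,1)
x = as.integer(u < p)  # deterministic rule: x = 1 exactly when u < p
x
```

The transformation step contains no randomness at all; every bit of randomness comes from the uniform draws.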

## Rejection sampling

As anticipated, we want to draw a random sample distributed according to f(x), but we cannot do so directly. Hence, we use a so-called proposal function g(x), from which it is easy to draw a random variable X. This function g(x), multiplied by a constant M, has to ‘envelope’ our f(x), so that M*g(x) >= f(x) for every x. Then, we generate another random variable U, distributed as U(0,1). This provides us with a scatter of pairs (x, y), where y = u*M*g(x) (so, it is our uniform variable scaled by M*g(x)). The idea is that a point (x, y) lying under the curve f(x) means x is accepted as a draw from f(x), which happens with probability f(x)/(M*g(x)). So, the decision rule is:

accept x if u <= f(x)/(M*g(x)); reject x otherwise.

Note that the constant M should be chosen so that M*g(x) stays as close as possible to f(x), making the rejection area as small as possible. The optimal (smallest valid) M is given by:

M = sup over x of f(x)/g(x)

And, when f and g are both normalized densities, 1/M represents the overall probability of acceptance.
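To make this concrete, here is a small R sketch that recovers the optimal M numerically, taking as target the beta(2,2) density used later in this article and the uniform(0,1) as proposal:

```r
f = function(x) dbeta(x, 2, 2)   # target density
g = function(x) dunif(x, 0, 1)   # proposal density
# optimal M: the supremum of the ratio f(x)/g(x) over the support
opt = optimize(function(x) f(x) / g(x), interval = c(0, 1), maximum = TRUE)
M = opt$objective
M    # 1.5, so the acceptance probability 1/M is about 0.667
```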

Let’s visualize it step by step:

- Consider a probability density function f(x).

- Select a proposal function g(x) and a constant M such that M*g(x) >= f(x) for all x.

- Generate the random variables X ~ g(x) and U ~ U(0,1). Then, consider all the points (x, y) (where y = u*M*g(x)) that lie under the f(x) curve.

- Append all the accepted points to a list, which will be our random sample from the distribution f(x).
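The steps above can be sketched compactly in vectorized R (the target here is the beta(2,2) density used in the next section; `n_prop` is a name I chose for the number of proposals):

```r
set.seed(1)
f = function(x) 6 * x * (1 - x)  # target density, beta(2,2)
M = 1.5                          # envelope constant; proposal g(x) = 1 on (0,1)
n_prop = 5000                    # number of proposed points
x = runif(n_prop)                # X ~ g, here uniform(0,1)
u = runif(n_prop)                # U ~ U(0,1)
y = u * M                        # y = u * M * g(x), with g(x) = 1
accepted = x[y <= f(x)]          # keep the points under the f(x) curve
length(accepted) / n_prop        # empirical acceptance rate, close to 1/M
```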

Now let’s see an implementation in R. For this purpose, I’m going to use a beta distribution with parameters 2, 2. The beta(a, b) density is given by:

f(x) = (Gamma(a+b) / (Gamma(a)*Gamma(b))) * x^(a-1) * (1-x)^(b-1), for 0 <= x <= 1

And because the gamma function in the first ratio has the property Gamma(n) = (n-1)! for positive integers n, with a = b = 2 that ratio equals Gamma(4)/(Gamma(2)*Gamma(2)) = 3!/(1!*1!) = 6. We can therefore express our beta(2,2) probability density as:

f(x) = 6*x*(1-x)
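The simplified density f(x) = 6*x*(1-x) can be checked numerically against R’s built-in dbeta:

```r
x = seq(0, 1, length = 101)
all.equal(6 * x * (1 - x), dbeta(x, 2, 2))  # TRUE
```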

Let’s visualize it in R:

```
x=seq(0,1, length=100)
plot(x, dbeta(x, 2, 2), ylab="density", type ="l", col='blue')
```

Now, ideally we can enclose this function within another function g(x), namely the uniform density on (0,1), multiplied by M = 1.5 (since the beta(2,2) density reaches its maximum value of 1.5, at x = 0.5).

```
x=seq(0,1, length=100)
plot(x, dbeta(x, 2, 2), ylab="density", type ="l", col='blue')
lines(x, dunif(x, 0, 1)*1.5, col="red")
abline(v = 0, col="red")
abline(v = 1, col="red")
```

Now we consider all the pairs (x, y) (where y = u*M*g(x)) which lie under the blue curve and append the corresponding x values to our list called ‘final_sample’:

```
# let's first set our functions
f = function(x) 6*x*(1-x)          # target density
g = function(x) rep(1, length(x))  # proposal density (uniform(0,1) in vectorized form)
random_g = function(n) runif(n, 0, 1)  # our generator from U(0,1) of our rv X
M = 1.5    # constant
n = 1000   # size of the sample we want to draw from f(x)

acceptance = function(f, M, g, random_g, n) {
  n_accept = 0
  final_sample = rep(NA, n)  # empty vector of length = size of sample
  while (n_accept < n) {
    x = random_g(1)          # draw one proposal at a time
    u = runif(1, 0, 1)
    if (u*M*g(x) <= f(x)) {
      n_accept = n_accept + 1
      final_sample[n_accept] = x  # appending our accepted value to the final sample
    }
  }
  final_sample
}

# let's store and plot the results
vals = acceptance(f, M, g, random_g, n)
hist(vals, breaks=30, freq=FALSE, main="Sample Density")
```

As you can see, the shape resembles a beta(2,2) distribution, but this time we have the (pseudo) certainty that those values have been randomly generated, which means:

- they reproduce independence, in the sense that Xi tells us nothing about its successor Xi+1, nor could it have been predicted from its predecessor Xi-1;
- the underlying uniform draws are evenly dispersed in (0,1), so no part of the support is systematically over- or under-sampled.

The rejection sampling method is a powerful technique, but it suffers from a *caveat*: it is inefficient, in the sense that only a fraction (on average 1/M) of the generated points ends up in the final sample. In other words, we are asking our algorithm to perform extra work whose results are thrown away.
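The wasted work is easy to measure: counting the proposals needed to collect n accepted values shows we generate roughly M times more points than we keep (a sketch reusing the beta(2,2)/uniform setup above):

```r
set.seed(42)
f = function(x) 6 * x * (1 - x)  # target density, beta(2,2)
M = 1.5
n = 1000
n_accept = 0
n_proposed = 0
while (n_accept < n) {
  x = runif(1)                   # proposal from g = uniform(0,1)
  u = runif(1)
  n_proposed = n_proposed + 1
  if (u * M <= f(x)) n_accept = n_accept + 1
}
n_proposed / n                   # close to M = 1.5: about 50% extra draws
```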

That’s why rejection sampling is just one of many techniques one could employ; among the more efficient alternatives, we can cite Inverse Transform Sampling.
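For comparison, here is a minimal sketch of inverse transform sampling for the exponential distribution, whose inverse CDF has the closed form F^(-1)(u) = -log(1 - u)/lambda (the rate lambda = 2 is an arbitrary choice): every uniform draw yields exactly one sample, with nothing rejected.

```r
set.seed(123)
lambda = 2                 # rate parameter, chosen for illustration
u = runif(1000)            # one U(0,1) draw per output value
x = -log(1 - u) / lambda   # apply the inverse CDF of Exp(lambda)
mean(x)                    # close to the theoretical mean 1/lambda = 0.5
```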
