When we talk about convergence of random variables, we want to study the behavior of a sequence of random variables {Xn} = X1, X2, …, Xn, … as n tends towards infinity. Basically, we want to give a meaning to the writing:

$$\lim_{n\to\infty} X_n = X$$
A sequence of random variables, generally speaking, can converge to either another random variable or a constant.
However, there are three different situations we have to take into account:
- Convergence in Probability
- Convergence in Quadratic Mean
- Convergence in Distribution
Let’s examine all of them.
Convergence in Probability
A sequence of random variables {Xn} is said to converge in probability to X if, for any ε>0 (with ε sufficiently small):

$$\lim_{n\to\infty} P\left(|X_n - X| > \varepsilon\right) = 0$$

Or, alternatively:

$$\lim_{n\to\infty} P\left(|X_n - X| \le \varepsilon\right) = 1$$

To say that Xn converges in probability to X, we write:

$$X_n \xrightarrow{P} X$$
This property is meaningful when we have to evaluate the performance, or consistency, of an estimator of some parameter. Indeed, given an estimator T of a parameter θ of our population, we say that T is a weakly consistent estimator of θ if it converges in probability towards θ, that means:

$$T \xrightarrow{P} \theta, \quad \text{that is,} \quad \lim_{n\to\infty} P\left(|T - \theta| > \varepsilon\right) = 0$$

Furthermore, because of the Weak Law of Large Numbers (WLLN), we know that the sample mean of a population converges in probability towards the expected value of that population (the sample mean is also an unbiased estimator of it). Hence:

$$\bar{X}_n \xrightarrow{P} \mu$$
Let’s visualize it with Python. I’m creating a Uniform distribution with mean zero and support between µ − W/2 and µ + W/2. Knowing that the probability density function of a Uniform distribution on [a, b] is:

$$f(x) = \frac{1}{b - a}, \quad a \le x \le b$$

We can see that the expected value is:

$$E(X) = \int_a^b \frac{x}{b - a}\,dx = \frac{a + b}{2}$$

Hence, in our case, with a = µ − W/2 and b = µ + W/2:

$$E(X) = \frac{(\mu - W/2) + (\mu + W/2)}{2} = \mu = 0$$
So let’s visualize it:
import numpy as np
from matplotlib import pyplot as plt
from scipy.stats import uniform

np.random.seed(0)

# sample sizes, log-spaced from 10^2 to 10^4
N = (10 ** np.linspace(2, 4, 1000)).astype(int)
mu = 0
W = 2

# Uniform distribution on [mu - W/2, mu + W/2]
rng = uniform(mu - 0.5 * W, W)

# for each sample size, draw a sample and compute its mean
mu_estimate_mean = np.zeros(N.shape)
for i in range(len(N)):
    x = rng.rvs(N[i])
    mu_estimate_mean[i] = np.mean(x)

fig = plt.figure(figsize=(5, 3.75))
fig.subplots_adjust(hspace=0, bottom=0.15, left=0.15)
ax = fig.add_subplot(111, xscale='log')
ax.scatter(N, mu_estimate_mean, c='b', lw=0, s=4)
ax.set_xlabel('sample size n')
ax.set_ylabel('sample mean')
plt.show()
As you can see, the larger the sample size n, the closer the sample mean is to the real parameter, which is equal to zero.
Convergence in Quadratic Mean
A sequence of random variables {Xn} is said to converge in quadratic mean to X if:

$$\lim_{n\to\infty} E\left[(X_n - X)^2\right] = 0$$

And we write:

$$X_n \xrightarrow{qm} X$$
Again, convergence in quadratic mean is a measure of consistency of an estimator. Indeed, if an estimator T of a parameter θ converges in quadratic mean to θ, that means:

$$\lim_{n\to\infty} E\left[(T - \theta)^2\right] = 0$$
It is said to be a strongly consistent estimator of θ. An example of convergence in quadratic mean can be given, again, by the sample mean. Indeed, given a sequence of i.i.d. random variables with a given distribution, knowing its expected value and variance:

$$E(X_i) = \mu, \quad Var(X_i) = \sigma^2$$

We want to investigate whether the sample mean (which is itself a random variable) converges in quadratic mean to the real parameter, which would mean that the sample mean is a strongly consistent estimator of µ. So we need to prove that:

$$\lim_{n\to\infty} E\left[(\bar{X}_n - \mu)^2\right] = 0$$

Knowing that µ is also the expected value of the sample mean:

$$E\left[(\bar{X}_n - \mu)^2\right] = E\left[\left(\bar{X}_n - E(\bar{X}_n)\right)^2\right] = Var(\bar{X}_n)$$

The former expression is nothing but the variance of the sample mean, which can be computed as:

$$Var(\bar{X}_n) = Var\left(\frac{1}{n}\sum_{i=1}^{n} X_i\right) = \frac{1}{n^2}\sum_{i=1}^{n} Var(X_i) = \frac{\sigma^2}{n}$$

Which, as n tends towards infinity, goes to 0. Hence, the sample mean is a strongly consistent estimator of µ.
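We can check this numerically with a minimal Monte Carlo sketch (the trial count and the `mse_of_sample_mean` helper below are my own choices, not part of any library): it estimates E[(X̄n − µ)²] for the same Uniform distribution used above and compares it with σ²/n = W²/(12n):

```python
import numpy as np

rng = np.random.default_rng(0)
mu, W = 0.0, 2.0
sigma2 = W ** 2 / 12          # variance of Uniform(mu - W/2, mu + W/2)

def mse_of_sample_mean(n, trials=2000):
    # Monte Carlo estimate of E[(sample mean - mu)^2] for sample size n
    samples = rng.uniform(mu - W / 2, mu + W / 2, size=(trials, n))
    return np.mean((samples.mean(axis=1) - mu) ** 2)

for n in (10, 100, 1000):
    print(n, mse_of_sample_mean(n), sigma2 / n)
```

The estimated mean-squared error tracks σ²/n closely and shrinks towards zero as n grows, as the derivation predicts.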
Convergence in Distribution
A sequence of random variables {Xn} with cumulative distribution functions Fn(x) is said to converge in distribution towards X, with cumulative distribution function F(x), if:

$$\lim_{n\to\infty} F_n(x) = F(x)$$

at every point x at which F is continuous. And we write:

$$X_n \xrightarrow{d} X$$
There are two important theorems concerning convergence in distribution which need to be introduced:
- Slutsky’s theorem, which states that, given two sequences Xn and Yn, with Xn converging in distribution to X and Yn converging in probability to the constant a, then:

  $$X_n + Y_n \xrightarrow{d} X + a, \qquad X_n Y_n \xrightarrow{d} aX, \qquad \frac{X_n}{Y_n} \xrightarrow{d} \frac{X}{a} \quad (a \neq 0)$$
- Central Limit Theorem (CLT), which states that, if X1, X2, … are i.i.d. draws from a population of any distribution, with E(X) = µ and Var(X) = σ², then:

  $$\frac{\bar{X}_n - \mu}{\sigma/\sqrt{n}} \xrightarrow{d} N(0, 1)$$
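The product form of Slutsky’s theorem can be illustrated numerically. In the sketch below (the Uniform(0, 1) draws for Xn, the constant a = 3, and the sample sizes are all arbitrary choices of mine), Xn is built to converge in distribution to N(0, 1), Yn converges in probability to a, so the product should behave like a·N(0, 1) = N(0, a²):

```python
import numpy as np

rng = np.random.default_rng(1)
n, reps = 500, 4000
a = 3.0

# Xn: standardized sample means of Uniform(0, 1) draws;
# by the CLT, Xn converges in distribution to N(0, 1)
u = rng.uniform(0, 1, size=(reps, n))
Xn = (u.mean(axis=1) - 0.5) / np.sqrt((1 / 12) / n)

# Yn: sample means of draws with expectation a;
# by the WLLN, Yn converges in probability to the constant a
Yn = rng.normal(a, 1.0, size=(reps, n)).mean(axis=1)

# Slutsky: Xn * Yn converges in distribution to a * N(0, 1) = N(0, a^2)
prod = Xn * Yn
print(prod.mean(), prod.std())
```

The empirical mean of the product is close to 0 and its standard deviation close to a, consistent with the N(0, a²) limit.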
The latter is pivotal in statistics and data science, since it makes an incredibly strong statement. Indeed, more generally, it says that, whenever we are dealing with a sum (or mean) of many random variables (the more, the better), the resulting random variable will be approximately Normally distributed, hence it will be possible to standardize it.
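We can sketch the CLT with a strongly skewed population (Exponential with rate 1; the sample sizes below are my own arbitrary choices): standardized sample means come out approximately standard Normal even though the underlying distribution is far from Normal:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
n, reps = 1000, 5000

# Exponential(1) has mean 1, std 1, and skewness 2 (strongly asymmetric)
x = rng.exponential(1.0, size=(reps, n))

# standardized sample means; by the CLT these are approximately N(0, 1)
z = (x.mean(axis=1) - 1.0) / (1.0 / np.sqrt(n))
print(z.mean(), z.std(), stats.skew(z))
```

Despite the population skewness of 2, the standardized means have mean ≈ 0, standard deviation ≈ 1, and nearly zero skewness.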
Furthermore, we can combine those two theorems when we are not provided with the variance of the population (which is the normal situation in real-world scenarios). In other words, we’d like the previous relation to be true also for:

$$\frac{\bar{X}_n - \mu}{S/\sqrt{n}}$$

Where S² is the sample variance, an estimator of the unknown σ². To do so, we can apply Slutsky’s theorem as follows:

$$\frac{\bar{X}_n - \mu}{S/\sqrt{n}} = \frac{\bar{X}_n - \mu}{\sigma/\sqrt{n}} \cdot \frac{\sigma}{S} \xrightarrow{d} N(0, 1) \cdot 1 = N(0, 1)$$

The convergence in probability of the last factor is explained, once more, by the WLLN, which guarantees that, if E(X⁴) < ∞, then:

$$S^2 \xrightarrow{P} \sigma^2 \quad \Rightarrow \quad \frac{\sigma}{S} \xrightarrow{P} 1$$
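A minimal sketch of this combination (the Normal population and the sample sizes below are arbitrary choices of mine): replacing σ with the sample standard deviation S leaves the studentized statistic approximately N(0, 1), just as Slutsky’s theorem predicts:

```python
import numpy as np

rng = np.random.default_rng(3)
n, reps = 1000, 5000
mu, sigma = 0.0, 2.0

x = rng.normal(mu, sigma, size=(reps, n))
xbar = x.mean(axis=1)
S = x.std(axis=1, ddof=1)           # sample std; sigma is "unknown" in practice

# studentized statistic: sigma replaced by its estimator S;
# since S -> sigma in probability, Slutsky gives convergence to N(0, 1)
t = (xbar - mu) / (S / np.sqrt(n))
print(t.mean(), t.std())
```

Even though σ was never used in the statistic, its empirical mean is ≈ 0 and its standard deviation ≈ 1.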