## Model Selection for Linear Regression

Whenever you want to build a Machine Learning model, you have a set of p-dimensional inputs to start from. However, not all of these inputs might be necessary to obtain the best predictive model. Moreover, using all p predictors might lead to overfitting, especially if the number of observations n is not …

## Optimization algorithms: the Newton Method

Predictive Statistics and Machine Learning aim at building models whose parameters make the final output/prediction as close as possible to the actual value. This implies the optimization of an objective function, which might be either minimized (like a loss function) or maximized (like a likelihood function). The idea behind an optimization routine is to start from …
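As a minimal sketch of the kind of routine the title refers to, the snippet below applies the one-dimensional Newton update x ← x − f′(x)/f″(x) to a toy objective. The objective f(x) = exp(x) − 2x, the starting point, and the function name `newton_minimize` are assumptions chosen for illustration and are not taken from the article. The objective is convex, so the iteration should settle at its minimizer x = ln 2.

```python
import math


def newton_minimize(grad, hess, x0, tol=1e-10, max_iter=50):
    """Minimize a smooth 1-D function given its first and second derivatives.

    Hypothetical helper for illustration: repeats the Newton update
    x <- x - f'(x) / f''(x) until the step is smaller than `tol`.
    """
    x = x0
    for _ in range(max_iter):
        step = grad(x) / hess(x)
        x -= step
        if abs(step) < tol:
            break
    return x


# Toy objective (an assumption): f(x) = exp(x) - 2x, minimized at x = ln 2.
x_min = newton_minimize(
    grad=lambda x: math.exp(x) - 2.0,  # f'(x)
    hess=lambda x: math.exp(x),        # f''(x), always positive here
    x0=0.0,
)
print(x_min, math.log(2.0))
```

For a maximization problem such as Maximum Likelihood, the same update can be applied to the negative of the objective.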

## Maximum Likelihood Estimation

The main goal of statistical inference is learning from data. However, the data we want to learn from are not always available or easy to handle. Imagine we want to know the average income of American women: it might be infeasible or highly expensive to collect all American women’s income data. Besides, even in a scenario where this …

## Understanding the Rejection Sampling Method

Rejection sampling is a computational technique for generating random numbers from a target probability distribution f(x). It belongs to the general field of Monte Carlo methods, which rely on repeated random sampling to estimate unknown parameters numerically.

Some words about Randomness

One might ask why a random variable with probability …
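The technique can be sketched in a few lines of Python. Everything specific here is an assumption made for illustration: the target density f(x) = 6x(1 − x) on [0, 1], the uniform proposal g(x) = 1, and the envelope constant M = 1.5 (the maximum of f). A candidate x drawn from g is kept only when a uniform draw u satisfies u · M · g(x) ≤ f(x).

```python
import random


def target_pdf(x):
    """Assumed target density f(x) = 6x(1 - x) on [0, 1] (a Beta(2, 2))."""
    return 6.0 * x * (1.0 - x)


def rejection_sample(n, rng, m=1.5):
    """Draw n samples from target_pdf using a Uniform(0, 1) proposal.

    m must satisfy target_pdf(x) <= m * 1.0 for every x in [0, 1].
    """
    samples = []
    while len(samples) < n:
        x = rng.random()  # candidate from the proposal g
        u = rng.random()  # uniform draw used for the accept/reject decision
        if u * m <= target_pdf(x):
            samples.append(x)  # accept; otherwise discard and try again
    return samples


rng = random.Random(0)
samples = rejection_sample(10_000, rng)
sample_mean = sum(samples) / len(samples)
print(sample_mean)  # should be close to 0.5, the mean of the assumed target
```

With these choices roughly two out of three candidates are accepted, since the acceptance rate equals 1/M.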

## One-way Analysis of Variance (ANOVA) with Python

When you are dealing with data presented to you in different groups or sub-populations, you might be interested in knowing whether they arise from the same population or represent different populations (with different parameters). Let’s consider the following picture. As you can see, there are three different footpaths. Now the question is: …

## Understanding Geometric and Inverse Binomial distributions

In my previous article, I talked about two of the most popular probability distributions of discrete random variables: Bernoulli and Binomial. Here, I’m going to dwell on their so-called ‘counterparts’, which are Geometric and Inverse Binomial. Both of them concern the idea of a sequence of Bernoulli trials, hence it is worth recalling …

## Convergence of Random Variables

When we talk about convergence of random variables, we want to study the behavior of a sequence of random variables {Xn} = X1, X2, …, Xn, … when n tends towards infinity. Basically, we want to give a meaning to the writing Xn → X. A sequence of random variables, generally speaking, can converge to either another random variable or a constant. However, …

## Conditional Probability and Rare Events

Conditional probability refers to the probability of a generic event, given some extra information. More specifically, the conditional probability of an event A with respect to B, written P(A|B), expresses the probability of A given that B has occurred. If the two events are independent, the simple and conditional probabilities coincide (the occurrence of B has nothing …
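A small enumeration can make the definition concrete. The sketch below uses the standard formula P(A|B) = P(A ∩ B) / P(B) on two fair dice. The dice example, the events, and the helper names `prob` and `cond_prob` are assumptions for illustration only.

```python
from fractions import Fraction
from itertools import product

# Sample space: all 36 equally likely outcomes of rolling two fair dice.
outcomes = list(product(range(1, 7), repeat=2))


def prob(event):
    """Probability of an event given as a predicate over outcomes."""
    return Fraction(sum(1 for o in outcomes if event(o)), len(outcomes))


def cond_prob(a, b):
    """P(A | B) = P(A and B) / P(B), assuming P(B) > 0."""
    return prob(lambda o: a(o) and b(o)) / prob(b)


def first_even(o):
    return o[0] % 2 == 0


def sum_is_8(o):
    return o[0] + o[1] == 8


def second_is_3(o):
    return o[1] == 3


# Dependent events: knowing the first die is even changes the probability.
print(prob(sum_is_8), cond_prob(sum_is_8, first_even))        # 5/36 1/6
# Independent events: the conditional and simple probabilities coincide.
print(prob(second_is_3), cond_prob(second_is_3, first_even))  # 1/6 1/6
```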

## Unbiased Estimators: Bessel’s correction demonstration

When we have a population X of data of size N, we are normally provided with a set (or vector) of parameters θ (for a generic parameter, we will use the notation θ) which describes some statistical characteristics of that population (for example, the mean μ). However, it is more common to deal with subsets of …
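The effect named in the title can be checked numerically. The simulation below is a sketch under assumed settings (a normal population with variance 4, samples of size 5, 20,000 repetitions), none of which come from the article. It averages two variance estimators over many samples: the one dividing by n and the one dividing by n − 1 (Bessel's correction).

```python
import random
import statistics

rng = random.Random(42)
SIGMA = 2.0      # assumed population standard deviation (true variance 4.0)
N = 5            # assumed sample size
TRIALS = 20_000  # number of simulated samples

biased, unbiased = [], []
for _ in range(TRIALS):
    sample = [rng.gauss(0.0, SIGMA) for _ in range(N)]
    biased.append(statistics.pvariance(sample))   # divides by n
    unbiased.append(statistics.variance(sample))  # divides by n - 1

mean_biased = statistics.fmean(biased)
mean_unbiased = statistics.fmean(unbiased)
print(mean_biased, mean_unbiased)
```

On average the estimator dividing by n should land near (n − 1)/n · σ² = 3.2, while the corrected one should land near the true variance 4.0.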