When you are dealing with data presented in different groups or sub-populations, you might want to know whether they arise from the same population or represent different populations (with different parameters). Let's consider the following picture: As you can see, there are three different footpaths. Now the question is: [...]

## 5 Python Packages a Data Scientist can’t live without

Python is a general-purpose language and, as such, it offers a great number of extensions, ranging from scientific programming to data visualization and from statistical tools to machine learning. It is almost impossible to know every available extension; however, a few of them are pivotal if your task consists of analyzing data [...]

## Customers segmentation with Unsupervised Algorithms

Unsupervised learning is the field of Machine Learning that deals with unlabeled data. This means that the final goal of our algorithm is not to find the proper membership of a new observation based on its features. In fact, our algorithm will only be able to segregate the available entries into two or more classes, based [...]

## Visualizing the Deposits Multiplier with Python

In this article I'm going to propose a visual interpretation, with Python, of the so-called deposits multiplier. The latter is a macroeconomic indicator which describes how an initial deposit leads to a greater final increase in the total money supply. To fully understand how it works, we have to consider three actors in the market: [...]
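The mechanism described in this teaser can be sketched numerically. The snippet below is a minimal illustration under a simplified textbook assumption that is not taken from the article: banks keep a fixed fraction of every deposit as reserves and lend out the rest, and each loan is re-deposited in full. The function name `cumulative_deposits` and the values used are hypothetical.

```python
def cumulative_deposits(initial_deposit, reserve_ratio, rounds):
    """Return the running total of deposits after each lending round.

    Assumption (illustrative only): in every round the bank keeps
    `reserve_ratio` of the deposit and lends out the remainder,
    which comes back to the banking system as a new deposit.
    """
    totals = []
    total = 0.0
    deposit = initial_deposit
    for _ in range(rounds):
        total += deposit                  # the new deposit enters the system
        totals.append(total)
        deposit *= (1 - reserve_ratio)    # the lent-out share is re-deposited
    return totals


initial_deposit = 100.0   # hypothetical initial deposit
reserve_ratio = 0.1       # hypothetical reserve requirement (10%)

totals = cumulative_deposits(initial_deposit, reserve_ratio, rounds=200)

# Under this assumption the running total is a geometric series whose
# limit is initial_deposit / reserve_ratio, i.e. a multiplier of 1 / reserve_ratio.
limit = initial_deposit / reserve_ratio
print(f"after 200 rounds: {totals[-1]:.4f}, theoretical limit: {limit:.4f}")
```

Plotting `totals` against the round number would show the total approaching the limit, which is one way to picture how an initial deposit leads to a larger final increase in deposits.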

## Hypothesis tests with Python

In my previous article, I talked about statistical hypothesis tests. They are pivotal in Statistics and Data Science, since we are always asked to 'summarize' the huge amount of data we want to analyze in samples. Once provided with samples, which can be arranged with different techniques, like Bootstrap sampling, the general purpose is [...]

## Handling missing values with Missingno

Whenever you are about to inspect and manage some data, one of the first inconveniences that might arise is the presence of missing values. Together with possible outliers, they might affect the robustness of your Machine Learning model, so it is worth spending some extra time during your cleaning procedure and investigating the nature [...]

## Multivariate Differential Calculus and Optimization-Part 2

In my previous article, I introduced some concepts which are necessary if we want to set up an optimization problem in a multivariate environment. Here, we will first dwell on how to check the smoothness of a surface (which is the main assumption needed to carry out an optimization task), then we will see how to look for [...]

## Multivariate Differential Calculus and Optimization-Part 1

Differential calculus is a powerful tool to find the optimal solution to a given task. When I say 'optimal solution', I'm referring to the result of the optimization of a given function, called the objective function. This result might be either a maximum (for example, if your objective function describes your revenues) or a minimum (for example, if [...]

## Understanding Geometric and Inverse Binomial distribution

In my previous article, I talked about two of the most popular probability distributions of discrete random variables: Bernoulli and Binomial. Here, I'm going to dwell on their so-called 'counterparts', which are Geometric and Inverse Binomial. Both of them concern the idea of a sequence of Bernoulli trials, hence it is worth recalling [...]
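The idea of a sequence of Bernoulli trials can be illustrated with a short simulation. This is only a sketch, assuming the convention in which a Geometric random variable counts the number of trials up to and including the first success; the function name, the seed, and the success probability are hypothetical choices, not taken from the article.

```python
import random


def trials_until_first_success(p, rng):
    """Run Bernoulli(p) trials and return how many were needed
    to observe the first success (the success itself is counted)."""
    count = 1
    while rng.random() >= p:   # rng.random() < p is a success with probability p
        count += 1
    return count


rng = random.Random(0)   # fixed seed so the run is reproducible
p = 0.25                 # hypothetical success probability

samples = [trials_until_first_success(p, rng) for _ in range(20000)]
sample_mean = sum(samples) / len(samples)

# Under the convention above, the expected number of trials is 1 / p.
print(f"sample mean: {sample_mean:.3f}, theoretical mean: {1 / p:.3f}")
```

With a large number of simulated sequences, the sample mean should land close to `1 / p`, which is the expected value of a Geometric random variable under this convention.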

## Convergence of Random Variable

When we talk about convergence of random variables, we want to study the behavior of a sequence of random variables $\{X_n\} = X_1, X_2, \dots, X_n, \dots$ when $n$ tends to infinity. Basically, we want to give a meaning to the notation $X_n \to X$. A sequence of random variables, generally speaking, can converge to either another random variable or a constant. However, [...]