Customers segmentation with Unsupervised Algorithms

Unsupervised learning is the field of Machine Learning that deals with unlabeled data. This means that the final goal of our algorithm is not to find the proper membership of a new observation based on its features. Instead, our algorithm will only be able to segregate the available entries into two or more classes, based [...]

Visualizing the Deposits Multiplier with Python

In this article I'm going to propose a visual interpretation, with Python, of the so-called deposits multiplier. The latter is a macroeconomic indicator which describes how an initial deposit leads to a greater final increase in the total money supply. To fully understand how it works, we have to consider three actors in the market: [...]
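
As a rough illustration of the idea, here is a minimal sketch of the textbook version of the mechanism. It assumes a single reserve ratio (a parameter the excerpt above does not mention) and that every loan is fully re-deposited; the function names `money_supply_rounds` and `deposit_multiplier` are hypothetical, chosen only for this example.

```python
def deposit_multiplier(reserve_ratio):
    """Simple textbook multiplier: 1 / reserve ratio (assumed model)."""
    if not 0 < reserve_ratio <= 1:
        raise ValueError("reserve_ratio must be in (0, 1]")
    return 1 / reserve_ratio


def money_supply_rounds(initial_deposit, reserve_ratio, rounds):
    """Cumulative deposits after each round of lending and re-depositing."""
    if not 0 < reserve_ratio <= 1:
        raise ValueError("reserve_ratio must be in (0, 1]")
    totals = []
    deposit = initial_deposit
    total = 0.0
    for _ in range(rounds):
        total += deposit
        totals.append(total)
        # The bank keeps the reserve share and lends out the rest,
        # which is assumed to come back as a new deposit.
        deposit *= 1 - reserve_ratio
    return totals


# Illustrative numbers only: 100 deposited, 10% reserve ratio.
totals = money_supply_rounds(100.0, 0.10, 200)
limit = 100.0 * deposit_multiplier(0.10)
print(f"after 200 rounds: {totals[-1]:.2f}, theoretical limit: {limit:.2f}")
```

Under these assumptions the cumulative deposits form a geometric series, so the running total approaches the initial deposit times the multiplier.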

Hypothesis tests with Python

In my previous article, I talked about statistical hypothesis tests. These are pivotal in Statistics and Data Science, since we are always asked to 'summarize' in samples the huge amount of data we want to analyze. Once provided with samples, which can be arranged with different techniques, like Bootstrap sampling, the general purpose is [...]

Handling missing values with Missingno

Whenever you are about to inspect and manage some data, one of the first inconveniences that might arise is the presence of missing values. Together with possible outliers, they might affect the robustness of your Machine Learning model, so it is worth spending some extra time during your cleaning procedure and investigating the nature [...]

Multivariate Differential Calculus and Optimization-Part 2

In my previous article, I introduced some concepts which are necessary if we want to set up an optimization problem in a multivariate environment. Here, we will first dwell on how to check the smoothness of a surface (which is the main assumption behind an optimization task), then we will see how to look for [...]

Multivariate Differential Calculus and Optimization-Part 1

Differential calculus is a powerful tool to find the optimal solution to a given task. When I say 'optimal solution', I'm referring to the result of the optimization of a given function, called the objective function. This result might be either a maximum (for instance, if your objective function describes your revenues) or a minimum (for instance, if [...]

Understanding Geometric and Inverse Binomial distribution

In my previous article, I talked about two of the most popular probability distributions of discrete random variables: Bernoulli and Binomial. Here, I'm going to dwell on their so-called 'counterparts', which are Geometric and Inverse Binomial. Both of them concern the idea of a sequence of Bernoulli trials, hence it is worth recalling [...]
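
To make the link with Bernoulli trials concrete, here is a small simulation sketch for the Geometric case only. It assumes the convention where the Geometric variable counts the number of trials up to and including the first success; the helper names and the value p = 0.25 are hypothetical and chosen just for illustration.

```python
import random


def trials_until_first_success(p, rng):
    """Run Bernoulli(p) trials and count them until the first success."""
    count = 1
    while rng.random() >= p:  # a failure happens with probability 1 - p
        count += 1
    return count


def geometric_pmf(k, p):
    """P(X = k): k - 1 failures followed by one success."""
    return (1 - p) ** (k - 1) * p


rng = random.Random(42)  # fixed seed so the run is reproducible
p = 0.25
samples = [trials_until_first_success(p, rng) for _ in range(100_000)]
empirical_mean = sum(samples) / len(samples)

print(f"empirical mean: {empirical_mean:.3f}, theoretical mean 1/p: {1 / p:.3f}")
print(f"share of samples equal to 1: {samples.count(1) / len(samples):.3f}, "
      f"pmf at k=1: {geometric_pmf(1, p):.3f}")
```

With this convention the theoretical mean is 1/p, so the empirical mean of the simulated counts should land close to that value.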

Convergence of Random Variable

When we talk about convergence of random variables, we want to study the behavior of a sequence of random variables {Xn} = X1, X2, ..., Xn, ... when n tends to infinity. Basically, we want to give a precise meaning to the statement that Xn converges to a limit X. A sequence of random variables, generally speaking, can converge to either another random variable or a constant. However, [...]
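
One familiar case of a sequence converging to a constant is the running mean of fair coin flips, which settles around 0.5 as n grows. The sketch below is only an illustration under that assumption (the excerpt above does not name this example), and `running_means` is a hypothetical helper.

```python
import random


def running_means(n, rng):
    """Return the sample mean of the first i fair coin flips, for i = 1..n."""
    heads = 0
    means = []
    for i in range(1, n + 1):
        heads += rng.random() < 0.5  # True counts as 1, False as 0
        means.append(heads / i)
    return means


rng = random.Random(0)  # fixed seed so the run is reproducible
means = running_means(100_000, rng)

# Print the running mean at a few checkpoints to see it stabilize.
for i in (10, 100, 1_000, 10_000, 100_000):
    print(f"n = {i:>6}: mean = {means[i - 1]:.4f}")
```

Each entry of `means` is itself a random variable, so the list can be read as one realization of the sequence {Xn}.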

Conditional Probability and Rare Events

Conditional probability refers to the probability of a generic event, given some extra information. More specifically, the conditional probability of an event A with respect to B, written P(A|B), expresses the probability of A given that B has occurred. If the two events are independent, the simple and conditional probabilities coincide (the occurrence of B has nothing [...]
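
A quick way to check both statements is to enumerate a small sample space and apply the standard definition P(A|B) = P(A and B) / P(B). The sketch below uses two fair dice as a hypothetical example; the events and helper names are assumptions made for illustration only.

```python
from fractions import Fraction
from itertools import product

# All 36 equally likely outcomes of rolling two fair dice.
outcomes = list(product(range(1, 7), repeat=2))


def prob(event):
    """Probability of an event, given as a predicate on an outcome."""
    favorable = sum(1 for o in outcomes if event(o))
    return Fraction(favorable, len(outcomes))


def conditional(a, b):
    """P(A | B) = P(A and B) / P(B), assuming P(B) > 0."""
    return prob(lambda o: a(o) and b(o)) / prob(b)


def sum_is_8(o):
    return o[0] + o[1] == 8


def first_is_even(o):
    return o[0] % 2 == 0


def second_is_even(o):
    return o[1] % 2 == 0


# Dependent events: knowing the first die is even changes the probability.
print(prob(sum_is_8), conditional(sum_is_8, first_is_even))

# Independent events: the conditional and simple probabilities coincide.
print(prob(first_is_even), conditional(first_is_even, second_is_even))
```

Using `Fraction` keeps the probabilities exact, so the equality in the independent case can be compared directly rather than up to rounding.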

Features Engineering: behind the scenes of ML algorithms

The majority of people (including me) tend to think that the core activity of building a Machine Learning algorithm is, doubtlessly, building the algorithm itself. Concretely, this means working with actual data, inferring their structure and making predictions. Well, a survey from some years ago found that data scientists normally spend 80% of [...]