Model Selection for Linear Regression

Whenever you want to build a Machine Learning model, you have a set of p-dimensional inputs to start from. However, not all of these inputs might be necessary to obtain the best predictive model. Moreover, using all of the p predictors might lead to overfitting, especially if the number of observations n is not […]
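To make the overfitting point concrete, here is a toy simulation (an illustration under stated assumptions, not the method from the article). It assumes n = 20 observations and up to 19 predictors that are pure noise, unrelated to the target. Because the models are nested, the in-sample error of ordinary least squares can only shrink as predictors are added, and with an intercept plus 19 predictors the fit interpolates the data exactly even though none of the predictors carries any signal. The names `training_mse`, `X_all` and `p_max` are hypothetical.

```python
import numpy as np

# Hypothetical setup: the target and all predictors are independent noise.
rng = np.random.default_rng(0)
n, p_max = 20, 19
y = rng.normal(size=n)
X_all = rng.normal(size=(n, p_max))


def training_mse(p: int) -> float:
    """In-sample MSE of OLS using an intercept plus the first p noise predictors."""
    X1 = np.column_stack([np.ones(n), X_all[:, :p]])  # intercept column + p predictors
    beta, *_ = np.linalg.lstsq(X1, y, rcond=None)     # least-squares coefficients
    residuals = y - X1 @ beta
    return float(np.mean(residuals ** 2))


for p in (0, 5, 10, 15, 19):
    print(p, training_mse(p))
```

The training error falls towards zero as p approaches n, which says nothing about how the model would perform on new data. This gap is exactly why some form of model selection is needed.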

Time Series: why do we need Stationarity and Ergodicity?

A time series is a series of data points indexed in time order, normally with equally spaced points in time. Examples of time series are stock prices, monthly returns, a company’s sales and so forth. A time series can be seen as data with a target variable (price, returns, amount of sales…) and one feature only: time. […]

Sports Analytics: an exploratory analysis of international football matches-Part 2

In my previous article (Part 1 of this series), I implemented some interesting visualization tools for a meaningful exploratory analysis. Then, with the Python package Streamlit, I made them interactive in the form of a web app. In this article, I’m going to continue working on the same dataset as before, this time focusing […]

Sports Analytics: an exploratory analysis of international football matches-Part 1

Data Science and Analytics have a huge variety of applications: basically any field where information is delivered in the form of data. The sports industry is no exception. There is a big business all around it, and having the possibility to study the sports market via powerful analytics tools is a great […]

Cross-Validation for model selection

When you are dealing with a Machine Learning task, you have to properly identify your problem so that you can pick the most suitable algorithm. First, you can categorize your task as either supervised or unsupervised and, if supervised, as either classification or regression (you can read more about it here). […]

Building an ML model in 3 lines of code? Yes you can

Machine Learning as a subject is not easy. It is indeed a set of tools (mainly algorithms and optimization procedures) whose comprehension inevitably involves a deep understanding of Maths and Stats. Nevertheless, applying an ML model to a real scenario might be easier than expected. Indeed, once you are familiar with the theoretical concepts, […]

Handling missing values with Missingno

Whenever you are about to inspect and manage some data, one of the first inconveniences that might arise is the presence of missing values. Together with possible outliers, they might affect the robustness of your Machine Learning model, so it is worth spending some extra time during your cleaning procedure investigating the nature […]
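Before any visualization, a quick numeric check already tells you how much data is missing and where. The sketch below uses plain pandas on a small hypothetical DataFrame (the columns and values are invented for illustration). It counts missing cells per column, their share, and how many rows contain at least one gap.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data with a few gaps (np.nan / None are both treated as missing).
df = pd.DataFrame({
    "age": [25, np.nan, 40, 31, np.nan],
    "income": [50_000, 62_000, np.nan, 58_000, 71_000],
    "city": ["Rome", "Milan", None, "Turin", "Naples"],
})

missing_count = df.isna().sum()                   # number of missing cells per column
missing_share = df.isna().mean()                  # fraction of missing cells per column
rows_with_missing = int(df.isna().any(axis=1).sum())  # rows with at least one gap

print(missing_count)
print(missing_share)
print(rows_with_missing)
```

The missingno package visualizes the same information graphically, for example with `msno.matrix(df)` after `import missingno as msno`, which makes patterns in where values are missing easier to spot than raw counts.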

Conditional Probability and Rare Events

Conditional probability refers to the probability of a generic event, given some extra information. More specifically, the conditional probability of an event A given an event B, P(A|B) = P(A ∩ B) / P(B) (defined when P(B) > 0), expresses the probability of A given that B has occurred. If the two events are independent, the simple and conditional probabilities coincide (the occurrence of B has nothing […]
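The definition P(A|B) = P(A ∩ B) / P(B) can be checked by brute-force enumeration. The sketch below assumes two fair six-sided dice, so all 36 outcomes are equally likely, and uses exact fractions. The event names are hypothetical examples chosen to show one dependent pair and one independent pair.

```python
from fractions import Fraction
from itertools import product

# All 36 equally likely outcomes of rolling two fair dice.
outcomes = list(product(range(1, 7), repeat=2))


def prob(event):
    """P(event) as the share of outcomes where the event holds."""
    return Fraction(sum(1 for o in outcomes if event(o)), len(outcomes))


def cond_prob(a, b):
    """P(A | B) = P(A and B) / P(B); assumes P(B) > 0."""
    return prob(lambda o: a(o) and b(o)) / prob(b)


def sum_is_8(o):
    return o[0] + o[1] == 8


def first_is_6(o):
    return o[0] == 6


def first_even(o):
    return o[0] % 2 == 0


def second_even(o):
    return o[1] % 2 == 0


print(prob(sum_is_8), cond_prob(sum_is_8, first_is_6))      # dependent events
print(prob(first_even), cond_prob(first_even, second_even))  # independent events
```

Knowing that the first die shows 6 raises the chance of a total of 8 from 5/36 to 1/6, so those events are dependent. Whereas learning that the second die is even leaves the probability that the first die is even unchanged at 1/2, which is what independence means.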

The Bias-Variance trade-off

The ultimate goal of Machine Learning models is to make reliable predictions on new, unseen data. With this purpose in mind, we want our algorithm to capture relations in existing data and replicate them on new entries. At the same time, we do not want our algorithm to have, let’s say, prejudices because of the data it trained […]

Haar Cascade: Integral Image

Computer vision is a field of study that aims to gain a deep understanding from digital images or videos. Combined with AI and ML techniques, it is attracting investment in research and solutions from many industries today. In my article published on Towards Data Science (if you haven’t read it already, I recommend you to […]