Model Selection for Linear Regression

Whenever you want to build a Machine Learning model, you have a set of p-dimensional inputs to start from. However, not all of these inputs may be necessary to obtain the best predictive model. Moreover, using all of the p predictors might lead to overfitting, especially if the number of observations n is not […]

Interactive analytics and predictions on Restaurant tips

Imagine you own a restaurant and you want to analyze not only the trend of your revenue, but also the reasons behind periods of particularly high earnings, the moments of the day when a particular kind of client comes to your restaurant, why tips are higher on some days than on others, and so on. Knowing all those […]

Optimization algorithms: the Newton Method

Predictive Statistics and Machine Learning aim to build models whose parameters make the final output (the prediction) as close as possible to the actual value. This requires optimizing an objective function, which is either minimized (like a loss function) or maximized (like the likelihood function). The idea behind an optimization routine is to start from […]
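As a rough illustration of the Newton method named in the title, here is a minimal one-dimensional sketch. It assumes the objective's first and second derivatives are available in closed form and applies the update x ← x − f'(x)/f''(x) until the step becomes negligible. The function name `newton_minimize`, the tolerance, and the example objective are illustrative choices, not taken from the post. Because the update only looks for a point where f'(x) = 0, the same iteration can serve maximization (for example of a likelihood) when the second derivative is negative near the optimum.

```python
def newton_minimize(grad, hess, x0, tol=1e-10, max_iter=100):
    """Return a stationary point of a 1-D objective via Newton's method.

    Hypothetical helper: grad and hess are the first and second
    derivatives of the objective, supplied by the caller.
    """
    x = x0
    for _ in range(max_iter):
        h = hess(x)
        if h == 0:
            raise ZeroDivisionError("second derivative is zero; Newton step undefined")
        step = grad(x) / h   # Newton step: f'(x) / f''(x)
        x -= step
        if abs(step) < tol:  # stop once the update becomes negligible
            return x
    raise RuntimeError("Newton's method did not converge within max_iter steps")


# Illustrative objective: f(x) = x**4 - 3*x**3 + 2, minimized at x = 9/4.
def grad(x):
    return 4 * x**3 - 9 * x**2


def hess(x):
    return 12 * x**2 - 18 * x


x_min = newton_minimize(grad, hess, x0=3.0)
print(x_min)  # approximately 2.25
```

On a quadratic objective the method lands on the optimum in a single step; on other smooth functions it converges quickly near the optimum but may diverge, or stop at a maximum or saddle point, from a poor starting value.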

Building Machine Learning Apps with Streamlit

Streamlit is an open-source Python library that makes it easy to build beautiful apps for machine learning. You can install it via pip in your terminal and then start writing your web app in Python. In this article, I'm going to show some interesting features of Streamlit, building an app with the purpose of […]

Cross-Validation for model selection

When you are dealing with a Machine Learning task, you have to properly identify your problem so that you can pick the most suitable algorithm. First, you could categorize your task as either supervised or unsupervised and, if supervised, as either classification or regression (you can read more about it here). […]

Building a ML model in 3 lines of code? Yes you can

Machine Learning as a subject is not easy. It is a set of tools (mainly algorithms and optimization procedures) whose comprehension inevitably requires a deep understanding of Maths and Stats. Nevertheless, applying an ML model to a real scenario might be easier than expected. Indeed, once you get familiar with the theoretical concepts, […]

Understanding Rejection Sampling method

Rejection sampling is a computational technique whose aim is to generate random samples from a target probability distribution f(x). It is related to the general field of Monte Carlo methods, whose core idea is to use repeated random sampling to obtain numerical estimates of unknown parameters. Some words about Randomness. One might ask why a random variable with probability […]
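To make the idea concrete, here is a minimal sketch of rejection sampling under assumptions chosen purely for illustration: the target is the Beta(2, 2) density f(x) = 6x(1 − x) on [0, 1], the proposal g is Uniform(0, 1), and the envelope constant is m = 1.5, the maximum of f, so that f(x) ≤ m·g(x) everywhere. Each candidate x drawn from g is accepted with probability f(x) / (m·g(x)); the accepted candidates are distributed according to f. The names `target_pdf` and `rejection_sample` are hypothetical.

```python
import random


def target_pdf(x):
    """Illustrative target: Beta(2, 2) density on [0, 1], f(x) = 6 x (1 - x)."""
    return 6.0 * x * (1.0 - x)


def rejection_sample(pdf, n, m, rng=None):
    """Draw n samples from pdf on [0, 1] using a Uniform(0, 1) proposal.

    Assumes pdf(x) <= m * g(x) for all x in [0, 1], where g(x) = 1 is
    the uniform proposal density.
    """
    if rng is None:
        rng = random.Random()
    samples = []
    while len(samples) < n:
        x = rng.random()       # candidate drawn from the proposal g
        u = rng.random()       # uniform draw for the accept/reject test
        if u <= pdf(x) / m:    # accept with probability f(x) / (m * g(x))
            samples.append(x)
    return samples


draws = rejection_sample(target_pdf, n=10_000, m=1.5, rng=random.Random(0))
print(sum(draws) / len(draws))  # sample mean; Beta(2, 2) has mean 0.5
```

On average only 1/m of the candidates (here two out of three) are accepted, which is why a tight envelope, with m as small as possible, matters for efficiency.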

Ensemble Methods for Machine Learning: AdaBoost

In statistics and machine learning, ensemble methods use multiple learning algorithms to obtain better predictive performance than could be obtained from any of the constituent learning algorithms alone. The idea of combining multiple algorithms was first developed by computer scientist and Professor Michael Kearns, who wondered whether “weak learnability is equivalent to strong learnability”. The goal was to turn a weak […]

5 Python Packages a Data Scientist can’t live without

Python is a general-purpose language and, as such, it offers a great number of extensions, ranging from scientific programming to data visualization and from statistical tools to machine learning. It is almost impossible to know every available extension; however, there are a few which are pivotal if your task consists of analyzing data […]

Customers segmentation with Unsupervised Algorithms

Unsupervised learning is the field of Machine Learning that deals with unlabeled data. This means that the final goal of our algorithm is not to find the proper class membership of a new observation based on its features. Instead, our algorithm will only be able to segregate the available entries into two or more classes, based […]