Cross-Validation for model selection

When you are dealing with a machine learning task, you have to identify your problem properly so that you can pick the most suitable algorithm. First of all, you can categorize your task as either supervised or unsupervised and, if supervised, as either classification or regression (you can read more about it here). […]

Building an ML model in 3 lines of code? Yes, you can

Machine Learning as a subject is not easy. It is a set of tools (mainly algorithms and optimization procedures) whose comprehension inevitably requires a deep understanding of maths and statistics. Nevertheless, applying an ML model to a real scenario might be easier than expected. Indeed, once you are familiar with the theoretical concepts, […]

Analyzing U.S. exports with Plotly

In my previous article, I provided an introduction to some useful graphical tools available in Plotly, an open-source library which can be used in both Python and R. Here, I’m going to play a bit more with Plotly’s functionalities, using as input some data about U.S. exports in 2011. So let’s import and explore […]

Understanding the Rejection Sampling method

Rejection sampling is a computational technique whose aim is to generate random numbers from a target probability distribution f(x). It belongs to the general field of Monte Carlo methods, whose core idea is repeated random sampling to obtain numerical estimates of unknown parameters. Some words about randomness. One might ask why a random variable with probability […]
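As a rough illustration of the idea of drawing from a target density f(x), here is a minimal sketch. The target (a Beta(2, 2) density on [0, 1]), the uniform proposal, the bound M = 1.5 and the function names are all assumptions chosen for the example, not taken from the article.

```python
import random


def target_pdf(x):
    # Assumed target f(x): the Beta(2, 2) density on [0, 1]; its maximum is 1.5 at x = 0.5.
    return 6.0 * x * (1.0 - x)


def rejection_sample(n, m=1.5, seed=0):
    # Proposal g(x) is Uniform(0, 1), so g(x) = 1 and m must satisfy f(x) <= m for all x.
    rng = random.Random(seed)
    samples = []
    while len(samples) < n:
        x = rng.random()  # candidate drawn from the proposal
        u = rng.random()  # uniform number used for the accept/reject decision
        if u * m <= target_pdf(x):  # accept with probability f(x) / (m * g(x))
            samples.append(x)
    return samples


samples = rejection_sample(10_000)
print(sum(samples) / len(samples))  # should be close to 0.5, the mean of Beta(2, 2)
```

On average a fraction 1/M of the candidates is accepted, so a tighter bound M wastes fewer draws.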

Ensemble Methods for Machine Learning: AdaBoost

In statistics and machine learning, ensemble methods use multiple learning algorithms to obtain better predictive performance than could be obtained from any of the constituent learning algorithms alone. The idea of combining multiple algorithms was first developed by computer scientist and Professor Michael Kearns, who was wondering whether “weak learnability is equivalent to strong learnability”. The goal was to turn a weak […]

Combinatorics: permutations, combinations and dispositions

Combinatorics is the field of mathematics primarily concerned with counting elements from one or more sets. It can help us count the number of orders in which something can happen. In this article, I’m going to dwell on three different types of techniques: permutations, dispositions and combinations. Permutations: these are the easiest to compute. Imagine we have n distinct objects. […]
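The three counts can be sketched with Python's standard library. This assumes that "dispositions" means ordered selections of k objects out of n without repetition (often called k-permutations); the values n = 5 and k = 3 are arbitrary examples.

```python
import math
from itertools import combinations, permutations

n, k = 5, 3  # example sizes, chosen only for illustration
items = range(n)

n_permutations = math.factorial(n)  # orderings of all n objects: n!
n_dispositions = math.perm(n, k)    # ordered selections of k objects: n! / (n - k)!
n_combinations = math.comb(n, k)    # unordered selections of k objects: n! / (k! (n - k)!)

# Cross-check the formulas by explicit enumeration.
assert n_permutations == len(list(permutations(items)))
assert n_dispositions == len(list(permutations(items, k)))
assert n_combinations == len(list(combinations(items, k)))

print(n_permutations, n_dispositions, n_combinations)  # prints: 120 60 10
```

`math.perm` and `math.comb` require Python 3.8 or later.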

Some stylized facts about financial time series, with Python

Time series analysis is pivotal in financial markets, since it largely relies on analyzing stock prices and attempting to predict their future values. In this article, I will dwell on some stylized facts about time series. For this purpose, I’m going to use the historical stock prices of Altaba. You can […]

One-way Analysis of Variance (ANOVA) with Python

When you are dealing with data which are presented to you in different groups or sub-populations, you might be interested in knowing whether they arise from the same population, or they represent different populations (with different parameters). Let’s consider the following picture: As you can see, there are three different footpaths. Now the question is: […]

5 Python Packages a Data Scientist can’t live without

Python is a general-purpose language and, as such, it offers a great number of extensions which range from scientific programming to data visualization, from statistical tools to machine learning. It is almost impossible to know every available extension; however, there are a few that are pivotal if your task consists of analyzing data […]