In my last article, I wrote about the spread of COVID-19 without really inferring the structure of the process. I provided some visualization tools and interactive widgets to give an overview of the phenomenon over time. Here, I’m going to dwell on the modeling techniques that can be used to understand the diffusion of […]

# Author Archives: valentinaalto

## Model Selection for Linear Regression

Whenever you want to build a Machine Learning model, you have a set of p-dimensional inputs to start from. However, not all of these inputs may be necessary to obtain the best predictive model. Moreover, using all p predictors might lead to overfitting, especially if the number of observations n is not […]

## Bootstrap sampling: an implementation with Python

Bootstrap methods are powerful techniques used in non-parametric statistics, that is, whenever we are given data drawn from an unknown distribution. The underlying issue that the bootstrap is meant to address is a well-known problem in statistics: we want to collect information about a population, but we are provided only with a sample […]
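As a rough illustration of the idea in this excerpt (it is not the code from the full article), here is a minimal sketch of bootstrap resampling using only the Python standard library. The data in `sample`, the helper names `bootstrap_means` and `percentile_interval`, and the choice of the mean as the statistic are all assumptions made for the example.

```python
import random
import statistics


def bootstrap_means(sample, n_resamples=1000, seed=0):
    """Resample `sample` with replacement and return the mean of each resample."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    n = len(sample)
    return [
        statistics.fmean(rng.choices(sample, k=n))  # draw n points with replacement
        for _ in range(n_resamples)
    ]


def percentile_interval(values, alpha=0.05):
    """Simple percentile interval taken from the sorted bootstrap statistics."""
    ordered = sorted(values)
    lower = ordered[int((alpha / 2) * len(ordered))]
    upper = ordered[int((1 - alpha / 2) * len(ordered)) - 1]
    return lower, upper


# Hypothetical observations standing in for "the sample we are provided with".
sample = [2.1, 3.4, 1.9, 5.0, 4.2, 3.3, 2.8, 4.7]
means = bootstrap_means(sample)
low, high = percentile_interval(means)
print(f"sample mean: {statistics.fmean(sample):.3f}")
print(f"approximate 95% bootstrap interval: ({low:.3f}, {high:.3f})")
```

The spread of `means` gives an estimate of how much the sample mean would vary across samples, without assuming any particular distribution for the population.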

## Interactive analytics and predictions on Restaurant tips

Imagine you own a restaurant and you want to analyze not only the trend of your revenue, but also the reasons behind periods of particularly high earnings, the times of day when a particular kind of client comes to your restaurant, why tips are higher on some days than on others, and so on. Knowing all those […]

## How to make animated charts with Plotly

In most of my previous articles, I’ve stressed the importance of visualizing the results of a technical analysis. Ideally, your charts should be able to summarize at a glance what you have been working on for days. Plus, those charts have to do so in a way that is clear and comprehensible […]

## Time Series: why do we need Stationarity and Ergodicity

A time series is a series of data points indexed in time order, normally with equally spaced points in time. Examples of time series are stock prices, monthly returns, a company’s sales and so forth. Time series can be seen as data with a target variable (price, returns, amount of sales…) and only one feature: time. […]

## Interactive Convolutional Neural Network

Image recognition is one of the main topics Deep Learning focuses on. Indeed, the family of algorithms designed to deal with image recognition belongs to the class of Neural Networks, typically multi-layer algorithms employed in deep learning tasks. More specifically, image recognition employs Convolutional Neural Networks (CNNs), which I explained in my previous […]

## Sports Analytics: an exploratory analysis of international football matches-Part 2

In my previous article (Part 1 of this series), I implemented some interesting visualization tools for a meaningful exploratory analysis. Then, with the Python package Streamlit, I made them interactive in the form of a web app. In this article, I’m going to continue working on the same dataset as before, this time focusing […]

## Optimization algorithms: the Newton Method

Predictive Statistics and Machine Learning aim to build models with parameters such that the final output/prediction is as close as possible to the actual value. This implies the optimization of an objective function, which might be either minimized (like loss functions) or maximized (like the Maximum Likelihood function). The idea behind an optimization routine is starting from […]
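To make the idea of optimizing an objective function concrete, here is a minimal sketch of Newton's method for minimizing a one-dimensional function. It is not taken from the full article: the function name `newton_minimize`, the example objective f(x) = x⁴ − 3x² + 2, and the starting point are assumptions chosen for illustration. The update used is the standard one, x ← x − f′(x) / f″(x).

```python
def newton_minimize(grad, hess, x0, tol=1e-10, max_iter=50):
    """Find a stationary point of f by Newton's method, given f' and f''."""
    x = x0
    for _ in range(max_iter):
        step = grad(x) / hess(x)  # Newton step: f'(x) / f''(x)
        x -= step
        if abs(step) < tol:  # stop once the update is negligible
            break
    return x


# Hypothetical objective: f(x) = x**4 - 3*x**2 + 2
# First derivative:  f'(x)  = 4*x**3 - 6*x
# Second derivative: f''(x) = 12*x**2 - 6
x_star = newton_minimize(
    grad=lambda x: 4 * x**3 - 6 * x,
    hess=lambda x: 12 * x**2 - 6,
    x0=2.0,
)
print(f"stationary point found near x = {x_star:.6f}")
```

Starting from `x0=2.0`, the iterates approach the local minimum at x = √1.5. Newton's method only looks for a point where the derivative vanishes, so a different starting point could lead to a different stationary point, including a maximum.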

## Sports Analytics: an exploratory analysis of international football matches-Part 1

Data Science and Analytics have a huge variety of fields of application, basically whenever information is delivered in the form of data. The sports industry is no exception. There is a great deal of business around it, and having the possibility to study the sports market via powerful analytics tools is a great […]