Handling missing values with Missingno

Whenever you are about to inspect and manage some data, one of the first inconveniences that might arise is the presence of missing values. Together with possible outliers, they might affect the robustness of your Machine Learning model, so it is worth spending some extra time during your cleaning procedure investigating the nature […]

Feature Engineering: behind the scenes of ML algorithms

The majority of people (including me) tend to think that the core activity of building a Machine Learning algorithm is, doubtlessly, building the algorithm itself. Concretely, this means working with actual data, inferring their structure and making predictions. Well, a survey from some years ago found that data scientists normally spend 80% of […]

The Bias-Variance trade-off

The ultimate goal of a Machine Learning model is to make reliable predictions on new, unseen data. With this purpose in mind, we want our algorithm to capture relations in existing data and replicate them on new entries. At the same time, we do not want our algorithm to have, let’s say, prejudices because of the data it trained […]

Streaming analysis with Kafka, InfluxDB and Grafana

If you are dealing with the streaming analysis of your data, there are some tools that offer performant, easy-to-interpret results. First, we have Kafka, a distributed streaming platform that allows its users to send and receive live messages containing a bunch of data (you can read more about it here). We will […]

Unsupervised Learning: PCA and K-means

Machine Learning algorithms can be divided into two main groups: supervised learning, where we are provided with data that are already labeled, so our aim is to find, for a new observation, its category (in a classification task) or its numerical value (in a regression task); and unsupervised learning: in this case, […]

Mapping and building machine learning algorithms on geodata with R

Sometimes the way data is represented can, by itself, provide a huge amount of information and direct you towards a good analysis. In this article, I will dwell on some interesting plotting methods, provided by R, which are pivotal if you are working with geodata. I will use the famous NYC Taxi Dataset, which […]

How to set up and deploy your machine learning experiment with R

The aim of this article is to provide a taste of the potential of machine learning algorithms in R, following step by step a standard procedure that, once you are familiar with it, can be a good starting point for designing customized models. The idea behind each model, indeed, is the same. In a nutshell, it consists of finding an algorithm […]