Convergence of Random Variables

When we talk about the convergence of random variables, we want to study the behavior of a sequence of random variables {Xn} = X1, X2, ..., Xn, ... as n tends to infinity. Basically, we want to give a meaning to a statement such as Xn → X. A sequence of random variables, generally speaking, can converge to either another random variable or a constant. However, …

Conditional Probability and Rare Events

Conditional probability refers to the probability of a generic event, given some extra information. More specifically, the conditional probability of an event A with respect to an event B, written P(A|B), expresses the probability of A given that B has occurred. If the two events are independent, the simple and conditional probabilities coincide (the occurrence of B has nothing …

Feature Engineering: behind the scenes of ML algorithms

The majority of people (including me) tend to think that the core activity of building a Machine Learning algorithm is, doubtlessly, building the algorithm itself. Concretely, it means working with actual data, inferring their structure and making predictions. Well, a survey from some years ago found that data scientists normally spend 80% of …

Unbiased Estimators: Bessel’s correction demonstration

When we have a population X of data with dimension N, we are normally provided with a set (or vector) of parameters θ (for a generic parameter, we will use the notation θ) which describe some statistical characteristics of that population (for example, the mean μ). However, it is more common to deal with subsets of …

The Bias-Variance trade-off

The ultimate goal of a Machine Learning model is to make reliable predictions on new, unseen data. With this purpose in mind, we want our algorithm to capture relations in existing data and replicate them on new entries. At the same time, we do not want our algorithm to have, let's say, prejudices because of the data it trained …

Streaming analysis with Kafka, InfluxDB and Grafana

If you are dealing with the streaming analysis of your data, there are some tools that can deliver efficient and easy-to-interpret results. First, we have Kafka, a distributed streaming platform that allows its users to send and receive live messages containing a bunch of data (you can read more about it here). We will …

Haar Cascade: Integral Image

Computer vision is a field of study that aims to gain a deep understanding of digital images and videos. Today, many industries are investing in computer vision research and solutions, often combined with AI and ML techniques. In my article published on Towards Data Science (if you haven't read it already, I recommend you to …

Computer Vision: Feature Matching with OpenCV

Computer vision is a field of study that aims to gain a deep understanding of digital images and videos. Today, many industries are investing in computer vision research and solutions, often combined with AI and ML techniques. For example, think about security procedures at the airport: when you have to show your passport, it is …

Twitter sentiment analysis with Tweepy

Today, the world of social networks can be considered one of the largest free data sources available. When you think about Big Data, probably the first example that comes to your mind is Twitter. Like many other social networks, Twitter allows its users to post, comment, like and follow, to express their …

Unsupervised Learning: PCA and K-means

Machine Learning algorithms can be categorized mainly into two groups: supervised learning, where we are provided with data that are already labeled, so our aim is to find, given a new observation, its category (in the case of a classification task) or its numerical value (in the case of a regression task); unsupervised learning: in this case, …