Conditional probability is the probability of an event given some extra information. More specifically, the conditional probability of an event A with respect to an event B is defined as:

P(A|B) = P(A ∩ B) / P(B)

It expresses the probability of A given that B has occurred. If the two events are independent, the simple and conditional probabilities coincide (the occurrence of B tells us nothing about that of A), so that P(A|B) = P(A).

Now, there are two further formulas which are widely used.

  • Law of Total Probability: given an event A and a collection of mutually exclusive, exhaustive events B1, B2, …, Bn (that is, a partition of the sample space), the probability of A is given by:

P(A) = P(A ∩ B1) + P(A ∩ B2) + … + P(A ∩ Bn)

Or, alternatively:

P(A) = P(A|B1)·P(B1) + P(A|B2)·P(B2) + … + P(A|Bn)·P(Bn)

Indeed, from the definition of conditional probability, we know that P(A ∩ Bi) = P(A|Bi)·P(Bi), which turns the first sum into the second.

We can easily visualize it as a partition diagram: the events B1, …, Bn tile the sample space, and A is the union of its disjoint slices A ∩ B1, …, A ∩ Bn.

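To make this concrete, here is a minimal Python sketch (the toy die distribution and all names are mine, not from the original post) that checks the law of total probability on a fair die, partitioning the sample space into low and high rolls:

```python
from fractions import Fraction

# Toy sample space: a fair six-sided die.
prob = {outcome: Fraction(1, 6) for outcome in range(1, 7)}

A = {2, 4, 6}                   # event A: the roll is even
B1, B2 = {1, 2, 3}, {4, 5, 6}   # a partition of the sample space

def p(event):
    """Probability of an event (a set of outcomes)."""
    return sum(prob[o] for o in event)

def p_cond(a, b):
    """Conditional probability P(a|b) = P(a ∩ b) / P(b)."""
    return p(a & b) / p(b)

# Law of Total Probability: P(A) = P(A|B1)·P(B1) + P(A|B2)·P(B2)
total = p_cond(A, B1) * p(B1) + p_cond(A, B2) * p(B2)
assert total == p(A) == Fraction(1, 2)
print(f"P(A) = {p(A)} = {total}")
```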
  • Bayes’ Theorem of Conditional Probability: it states that:

P(B|A) = P(A|B)·P(B) / P(A)

The proof follows directly from the definition of conditional probability: both P(B|A)·P(A) and P(A|B)·P(B) equal P(A ∩ B), and equating the two expressions gives the theorem.
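
A quick numerical check in Python, reusing the die example above (again, a sketch of mine rather than anything from the original post):

```python
from fractions import Fraction

# Die example: A = "even roll", B = "low roll (1-3)".
p_A, p_B = Fraction(1, 2), Fraction(1, 2)
p_A_and_B = Fraction(1, 6)              # only the outcome 2 is in both

p_A_given_B = p_A_and_B / p_B           # direct definition: 1/3
p_B_given_A = p_A_given_B * p_B / p_A   # Bayes' theorem

assert p_B_given_A == p_A_and_B / p_A   # agrees with the direct definition
print(p_B_given_A)                      # 1/3
```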

Now we are going to use those two formulas to solve the following problem.

Imagine you work in a clinic and you want to perform a test on your patients to know whether they are ill. Your test returns two results: positive if it detects the illness, negative if it doesn’t. These predicted values may or may not match the actual state of things. So, to evaluate the performance of your test, you need an evaluation tool called the Confusion Matrix.

This metric is used in classification tasks. It is a specific table layout that allows visualization of the performance of an algorithm by counting how many times predictions are equal to (or diverge from) actual values:

Let’s interpret it. First, for those of you who are familiar with Statistics, you might have recognized the common terminology of hypothesis testing: a type I error means rejecting the null hypothesis when it is actually true, while a type II error means failing to reject it when it is false. How these errors map onto the positive and negative conditions depends on which condition we take as the null.

But what do ‘positive’ and ‘negative’ mean?

In general, we refer to the positive condition when it corresponds to our null hypothesis, that is, the status quo (the conservative situation). Otherwise, if we are facing the alternative hypothesis, rejecting the null, we are dealing with a ‘negative’ condition. Note that positive and negative do not necessarily mean good or bad, as you will easily understand with our example:

H0: “The patient is ill”

H1: “The patient is not ill”

We can represent the confusion matrix as follows:

                          Actual: ill (H0 true)           Actual: not ill (H0 false)
Predicted: ill (+)        True Positive                   False Positive (type II error)
Predicted: not ill (−)    False Negative (type I error)   True Negative

If we do not reject the null when our patient is ill, our confusion matrix will count one true positive. If we reject the null, thinking that a patient is perfectly healthy when he’s not, we are facing a type I error: it is the worst scenario you could face, since you are sending home a patient who needs urgent intervention. On the other hand, a type II error (accepting the null when it is false) is not as bad as the previous one: it is always better to remain within the conservative status quo if you do not have enough confidence to accept the alternative. Finally, if both predicted and actual values are negative, our confusion matrix will count this observation as a true negative.
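
As a quick illustration, here is a minimal Python sketch using scikit-learn (the toy labels are mine, not from the post) that counts these four cells, keeping the convention above that ‘ill’ is the positive class:

```python
from sklearn.metrics import confusion_matrix

# Hypothetical labels: 1 = ill (our "positive" / null), 0 = not ill.
actual    = [1, 1, 1, 0, 0, 0, 1, 0]
predicted = [1, 0, 1, 0, 1, 0, 1, 0]

# With labels=[1, 0] the matrix is laid out as:
#   [[TP, FN],   rows    = actual condition
#    [FP, TN]]   columns = predicted condition
(tp, fn), (fp, tn) = confusion_matrix(actual, predicted, labels=[1, 0])
print(f"TP={tp}  FN={fn} (type I)  FP={fp} (type II)  TN={tn}")
```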

Now imagine our test is able to detect the illness in 98% of the cases (meaning that its true positive rate, or sensitivity, is equal to 0.98). Furthermore, it is able to categorize as healthy the people who do not actually exhibit the disease in 99% of the cases (meaning that its true negative rate, or specificity, is 0.99). If we say that I is the event “illness” (with complement Iᶜ, “no illness”), + is the event “positive result” and − is the event “negative result”, we have the following data:

P(+|I) = 0.98,  P(−|I) = 0.02
P(−|Iᶜ) = 0.99,  P(+|Iᶜ) = 0.01

Let’s say that in our population the illness is present with an incidence of p.

We are interested in knowing the conditional probability of being ill, given that the test returned a positive result. In other words, if a patient receives a positive result from the test, what is the probability that he/she is actually ill? Combining Bayes’ Theorem with the Law of Total Probability:

P(I|+) = P(+|I)·P(I) / [P(+|I)·P(I) + P(+|Iᶜ)·P(Iᶜ)] = 0.98·p / (0.98·p + 0.01·(1 − p))

Namely, if the incidence of the disease is p = 0.3, the probability P(I|+) ≈ 98%, which makes sense: we have an extremely accurate test, hence it should segregate ill people from healthy people well.

But what happens if the disease is extremely rare, let’s say p = 0.1%? In that case, we have P(I|+) ≈ 9%.
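
To reproduce both numbers, here is a short Python sketch (the function name is mine) that plugs the test’s sensitivity and specificity into the formula above:

```python
def posterior_ill(p, sensitivity=0.98, specificity=0.99):
    """P(I|+) via Bayes' theorem and the law of total probability."""
    p_pos = sensitivity * p + (1 - specificity) * (1 - p)  # P(+)
    return sensitivity * p / p_pos

print(f"p = 30%:  P(I|+) = {posterior_ill(0.30):.0%}")   # ~98%
print(f"p = 0.1%: P(I|+) = {posterior_ill(0.001):.0%}")  # ~9%
```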

How can that be possible with an extremely accurate test? It means that only 9% of those who have been categorized as ill actually exhibit the disease. The answer lies in the low prior probability of the event we are analyzing, which is extremely rare. This situation, known as the false positive paradox, is an example of the base rate fallacy.

It occurs when false positive results are more probable than true positive ones, which happens when the overall population has a low incidence of the condition and the incidence rate is lower than the false positive rate. Indeed, imagine we have a population of 1 million people to analyze. We have a test which, if the person is healthy, returns a positive result in only 1% of the cases (hence, it is very accurate). The disease we want to detect is, again, very rare, with p = 0.1%. So our test will return, in total, roughly 10,000 false positives (1% of the 999,000 healthy people), while the actual number of ill people is only 1,000 (10 times smaller).
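
The same counting argument in a few lines of Python (variable names are mine), recovering the 9% posterior from raw headcounts:

```python
population = 1_000_000
incidence = 0.001                    # p = 0.1%
sensitivity, specificity = 0.98, 0.99

ill = population * incidence                   # 1,000 actually ill
healthy = population - ill                     # 999,000 healthy

true_positives = ill * sensitivity             # ~980
false_positives = healthy * (1 - specificity)  # ~9,990

print(f"true positives:  {true_positives:,.0f}")
print(f"false positives: {false_positives:,.0f}")
print(f"P(I|+) ≈ {true_positives / (true_positives + false_positives):.0%}")
```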
