In my previous article (Part 1 of this series), I implemented some visualization tools for a meaningful exploratory analysis. Then, with the Python package Streamlit, I made them interactive in the form of a web app.
In this article, I’m going to continue working on the same dataset as before, this time focusing on the interaction between two teams. I will keep using Plotly as the visualization tool, since its graphs are interactive and let you read relevant information directly from them. Since I won’t attach the code from my previous article, if you are new to Streamlit I strongly recommend reading it before starting.
Now, as anticipated, I want to focus on the matches between two teams of interest. So, let’s start by filtering our initial dataset (available here) according to the teams the user picks in a multiselect widget:
import streamlit as st
import pandas as pd
import numpy as np
import plotly.express as px
import seaborn as sns
import matplotlib.pyplot as plt
import plotly.graph_objects as go
from plotly.subplots import make_subplots

st.title('International Football matches')
df = pd.read_csv("results.csv")

st.subheader('Comparing 2 teams')
teams_to_compare = st.multiselect('Pick your teams', df['home_team'].unique())
# Keep only the matches in which both selected teams took part
comparison = df[(df['home_team'].isin(teams_to_compare)) & (df['away_team'].isin(teams_to_compare))]
comparison = comparison.reset_index(drop=True)
st.write(comparison)
st.write('Number of matches: ', len(comparison))
The object ‘teams_to_compare’ is a list of the two selected teams, and I’m interested in analyzing the matches where those two teams played against each other (regardless of which one was playing at home). Then, I ask my app to show me the filtered dataset together with the number of matches:

Here, I’m interested in all the England vs Scotland matches, and this is what my filtered dataset looks like.
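The filter only gives a head-to-head dataset when exactly two teams are selected. As a minimal sketch of that idea outside Streamlit, the snippet below wraps the filter in a hypothetical helper, head_to_head (my own name, not part of the app), and runs it on a tiny made-up table that only borrows the column names of results.csv:

```python
import pandas as pd

# Toy data with the same column names as results.csv (values are made up).
df = pd.DataFrame({
    "home_team": ["England", "Scotland", "England", "Wales"],
    "away_team": ["Scotland", "England", "Wales", "Scotland"],
    "home_score": [2, 1, 3, 0],
    "away_score": [1, 1, 0, 2],
})


def head_to_head(matches, teams):
    """Return the matches in which the two selected teams faced each other."""
    # Hypothetical guard: the filter is only meaningful for two distinct teams.
    if len(set(teams)) != 2:
        raise ValueError("Pick exactly two different teams")
    mask = matches["home_team"].isin(teams) & matches["away_team"].isin(teams)
    return matches[mask].reset_index(drop=True)


comparison = head_to_head(df, ["England", "Scotland"])
print(comparison)
```

In the app itself, a similar check on the length of teams_to_compare could show a message with st.warning and halt the script with st.stop until two teams are picked.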
Now let’s perform some analytics on these two teams.
First, I want to know which match had the highest intensity of play, which I decided to quantify as the total number of goals. So, I create a new pandas Series as the sum of the two score columns (home_score and away_score) and then compute the position of its maximum value.
st.subheader('Highest intensity of play')
# Row of the match with the largest total number of goals
out_c = comparison.iloc[np.argmax(np.array(comparison['home_score'] + comparison['away_score']))]
st.write(out_c)

So, the match with the most goals was the British Championship game of 4/15/1961. With the same reasoning, you can investigate any other kind of performance. For example, you can ask the app to display the match with the largest gap in score between the two teams.
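As a rough sketch of that last idea, the snippet below finds the match with the largest goal gap. The small table is made up for illustration and only reuses the score column names of the dataset:

```python
import pandas as pd

# Made-up matches between two placeholder teams, "A" and "B".
comparison = pd.DataFrame({
    "home_team": ["A", "B", "A"],
    "away_team": ["B", "A", "B"],
    "home_score": [2, 5, 3],
    "away_score": [1, 0, 3],
})

# Absolute goal difference of each match, regardless of who won.
gap = (comparison["home_score"] - comparison["away_score"]).abs()

# idxmax returns the index label of the first maximum if there are ties.
widest = comparison.loc[gap.idxmax()]
print(widest)
```

In the app, the resulting row could be displayed with st.write, exactly like the highest-intensity match.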
Now, I want to visualize the proportion of wins, losses and draws between my teams. For this purpose, I will use a Plotly pie chart:
team1_w = 0
team2_w = 0
teams_draw = 0
team1_cum = []
team2_cum = []
for i in range(len(comparison)):
    if comparison['home_team'][i] == teams_to_compare[0]:
        # Team 1 is playing at home
        if comparison['home_score'][i] > comparison['away_score'][i]:
            team1_w += 1
            team1_cum.append(1)
            team2_cum.append(0)
        elif comparison['home_score'][i] < comparison['away_score'][i]:
            team2_w += 1
            team1_cum.append(0)
            team2_cum.append(1)
        else:
            teams_draw += 1
            team1_cum.append(0)
            team2_cum.append(0)
    else:
        # Team 1 is playing away
        if comparison['home_score'][i] < comparison['away_score'][i]:
            team1_w += 1
            team1_cum.append(1)
            team2_cum.append(0)
        elif comparison['home_score'][i] > comparison['away_score'][i]:
            team2_w += 1
            team1_cum.append(0)
            team2_cum.append(1)
        else:
            teams_draw += 1
            team1_cum.append(0)
            team2_cum.append(0)

comparison_labels = ['Team 1 wins', 'Team 2 wins', 'Draws']
comparison_values = [team1_w, team2_w, teams_draw]
fig5 = go.Figure(data=[go.Pie(labels=comparison_labels, values=comparison_values)])
st.plotly_chart(fig5)
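The same counts can also be obtained without an explicit loop. The following is only a sketch of one possible vectorized alternative, run on made-up scores: it expresses every goal difference from team 1's point of view with np.where and then counts positive, negative and zero values.

```python
import numpy as np
import pandas as pd

teams_to_compare = ["Scotland", "England"]  # team 1, team 2

# Made-up results, only meant to exercise every branch of the logic.
comparison = pd.DataFrame({
    "home_team": ["Scotland", "England", "Scotland", "England"],
    "away_team": ["England", "Scotland", "England", "Scotland"],
    "home_score": [2, 1, 0, 3],
    "away_score": [1, 1, 1, 4],
})

home_diff = comparison["home_score"] - comparison["away_score"]
team1_home = comparison["home_team"] == teams_to_compare[0]

# Goal difference seen from team 1: flip the sign when team 1 plays away.
team1_diff = np.where(team1_home, home_diff, -home_diff)

# Per-match win flags (1 = win, 0 = otherwise), as NumPy arrays.
team1_cum = (team1_diff > 0).astype(int)
team2_cum = (team1_diff < 0).astype(int)

team1_w = int(team1_cum.sum())
team2_w = int(team2_cum.sum())
teams_draw = int((team1_diff == 0).sum())
print(team1_w, team2_w, teams_draw)
```

Here team1_cum and team2_cum are NumPy arrays rather than lists, which np.cumsum accepts just as well.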

In the code above, I also defined two lists, team1_cum and team2_cum, which hold a 1 for every match a team won and a 0 otherwise, so that I can inspect how the wins of my two teams accumulate over time. So let’s build a line chart with buttons and sliders:
st.subheader('Cumulative wins of two teams')
fig6 = go.Figure()
# The match dates come from the filtered dataset
fig6.add_trace(go.Scatter(x=list(comparison['date']), y=np.cumsum(np.array(team1_cum)), name='team 1'))
fig6.add_trace(go.Scatter(x=list(comparison['date']), y=np.cumsum(np.array(team2_cum)), name='team 2'))

# Add range slider
fig6.update_layout(
    xaxis=go.layout.XAxis(
        rangeselector=dict(
            buttons=list([
                dict(count=1, label="1m", step="month", stepmode="backward"),
                dict(count=6, label="6m", step="month", stepmode="backward"),
                dict(count=1, label="YTD", step="year", stepmode="todate"),
                dict(count=1, label="1y", step="year", stepmode="backward"),
                dict(step="all")
            ])
        ),
        rangeslider=dict(visible=True),
        type="date"
    )
)
st.plotly_chart(fig6)

Note: in the pie chart, team 2 (England) won the majority of matches against team 1 (Scotland). So why does the line chart above seem to show Scotland ahead of England for much of the timeline? The reason lies in the dataset: England and Scotland played the majority of their matches after 1910, so the line chart is consistent with the overall proportions seen before.
Furthermore, this graph tells us something the pie chart cannot. Indeed, we see that up to about 1910, Scotland was always ahead of England in cumulative wins. What was the reason for this inversion of trend? One might be interested in focusing on this specific occurrence:
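To make this reasoning concrete, here is a small sketch with made-up dates and results (not the real England vs Scotland history). It shows how one team can win more matches overall while the other leads the cumulative count after most of the matches, and how to locate the first match after which the lead changes hands:

```python
import numpy as np
import pandas as pd

# Made-up match dates and per-match win flags (1 = win, 0 = otherwise).
dates = pd.to_datetime([
    "1880-01-01", "1890-01-01", "1900-01-01", "1920-01-01",
    "1930-01-01", "1940-01-01", "1950-01-01",
])
team1_cum = [1, 1, 1, 0, 0, 0, 0]
team2_cum = [0, 0, 0, 1, 1, 1, 1]

cum = pd.DataFrame(
    {"team1": np.cumsum(team1_cum), "team2": np.cumsum(team2_cum)},
    index=dates,
)

# Positive values mean team 2 is ahead after that match.
lead = cum["team2"] - cum["team1"]

# First match after which team 2 has more cumulative wins than team 1.
overtake_date = lead[lead > 0].index.min()

# Share of matches after which team 1 was still in the lead.
team1_lead_share = (lead < 0).mean()
print(overtake_date, team1_lead_share)
```

In this toy example team 2 wins four matches against three, yet team 1 is in the lead after five of the seven matches, which mirrors the apparent mismatch between the pie chart and the line chart.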

There are two further elements I want to retrieve. First, I want to see how many times those matches have been played in each city. To do so, I will build a bar chart which displays, for each city, how many times that city occurs in my filtered dataset:
st.subheader('Frequency of city of matches')
# Count the matches per city and sort, keeping each city aligned with its count
city_counts = comparison.groupby('city').count()['country'].sort_values()
cities = city_counts.index.values
occurrences = city_counts.values
fig7 = go.Figure(go.Bar(
    x=occurrences,
    y=cities,
    orientation='h'))
st.plotly_chart(fig7)

Second, I want to collect some information about types of tournament. The idea is plotting a bubble chart whose x and y coordinates are the home and away scores, the size represents the intensity of play of that match (sum of goals) and the color represents the type of tournament. Plus, in order to know which of my teams was playing at home, I will set as hover_name the home team, which will be displayed at the top of each bubble.
st.subheader('Tournament information')
comparison['challenge'] = np.array(comparison['home_score'] + comparison['away_score'])
fig8 = px.scatter(comparison, x="home_score", y="away_score",
                  size="challenge", color="tournament",
                  hover_name="home_team")
st.plotly_chart(fig8)

At first glance, this graph shows that the matches with the highest number of goals seem to be those of the British Championship. Finally, let’s combine this information with the frequency of each type of tournament for the selected pair of teams:
tour = st.selectbox('Select a tournament', comparison['tournament'].unique())
comparison_t = comparison[comparison['tournament'] == tour]
per = len(comparison_t) / len(comparison)
st.write(f"{round(per*100,2)}% of matches between the 2 teams have been played as {tour} matches")

So the British Championship hosted not only the highest-intensity matches, but also the highest number of matches between England and Scotland.
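If you prefer to see the share of every tournament at once instead of selecting them one by one, a possible sketch is the following. The table is made up, and the tournament labels are only assumed to resemble those in the dataset:

```python
import pandas as pd

# Made-up filtered dataset: only the tournament column matters here.
comparison = pd.DataFrame({
    "tournament": [
        "British Championship", "British Championship",
        "British Championship", "Friendly", "Friendly",
    ],
})

# Percentage of matches per tournament, sorted from most to least frequent.
shares = (
    comparison["tournament"]
    .value_counts(normalize=True)
    .mul(100)
    .round(2)
)
print(shares)
```

The resulting Series could be passed to st.write, or used as the values of another Plotly bar chart.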
Again, as anticipated in my previous article, these are just a few of the analytics you can build on your dataset. What you build really depends on the information you need. Nevertheless, a first exploratory insight is always a good starting point, since it might provide new intuitions and perspectives for analysis.
I hope you enjoyed reading!