Annexures | Python Libraries - SampathVSS/DataAnalysis-IPL-Datasets-2008-to-2017 GitHub Wiki

This section will provide all the MySQL queries used to achieve the desired results

1. Quick Insights

1.1 Number of Matches conducted in each year and the period of the tournament

Code:

-- Importing necessary libraries
import pandas as pd
import matplotlib.pyplot as plt
from sqlalchemy import create_engine

-- Database Credentials
host = "host name" database = "schema name" user = "user" password = "password"

-- Connecting to the database
-- Create the connection string using the provided credentials
connection_string = f"mysql+pymysql://{user}:{password}@{host}/{database}"

-- Execute the query and load the data into a pandas DataFrame
with create_engine(connection_string).connect() as connection:
query = ''' Enter the Query '''
df = pd.read_sql(query, connection)

-- Print the DataFrame to verify the data
print(df)

Plotting in Matplotlib:

-- Initialize the plot
fig, ax = plt.subplots(figsize=(10, 8))

-- Set the width of the bars
bar_width = 0.35

-- Plotting the bar for total matches
bars1 = ax.bar(df["Season_Year"] - bar_width / 2, df["total_matches"], color="teal", width=bar_width, label="Total Matches")

-- Adding data labels to the first set of bars (total matches)
for bar in bars1:
bar_height = bar.get_height()
ax.text(bar.get_x() + bar.get_width() / 2, bar_height, round(bar_height), fontweight="bold", ha="center", va="bottom")

-- Plotting the bar for the period of the tournament
bars2 = ax.bar(df["Season_Year"] + (bar_width / 2), df["period_of_tournament"], color="darkblue", width=bar_width, label="Period of the Tournament")

-- Adding data labels to the second set of bars (period of the tournament)
for bar in bars2:
bar_height = bar.get_height()
ax.text(bar.get_x() + bar.get_width() / 2, bar_height, round(bar_height), fontweight="bold", ha="center", va="bottom")

-- Setting the title and labels for the axes
plt.title("Tournament Period vs Number of Matches")
plt.xlabel("Season Year")
plt.ylabel("Values")

-- Adjust the x-axis ticks to show each season year
plt.xticks(df["Season_Year"])

-- Ensure a tight layout for better spacing
plt.tight_layout()

-- Add legend to the plot
plt.legend()

-- Display the plot
plt.show()

Plotting in Seaborn:

-- Importing the Libraries
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from sqlalchemy import create_engine

-- Plotting with Seaborn
fig, ax = plt.subplots(figsize=(12, 8))

-- Set the width of the bars
bar_width = 0.35

-- Positions of the bars on the x-axis
r1 = range(len(df))
r2 = [x + bar_width for x in r1]

-- Plot total matches
bars1 = ax.bar(r1, df['total_matches'], color='teal', edgecolor='teal', label='Total Number of Matches')

-- Plot period of the tournament
bars2 = ax.bar(r2, df['period_of_tournament'], color='darkblue', edgecolor='darkblue', label='Period of the Tournament (days)')

-- Adding labels to the bars for total matches
for bar in bars1:
height = bar.get_height()
ax.text(bar.get_x() + bar.get_width() / 2, height, round(height), ha='center', va='bottom', fontweight='bold')

-- Adding labels to the bars for period of the tournament
for bar in bars2:
height = bar.get_height()
ax.text(bar.get_x() + bar.get_width() / 2, height, round(height), ha='center', va='bottom', fontweight='bold')

-- Setting titles and labels
ax.set_title('Number of Matches vs Period of the Tournament')
ax.set_xlabel('Season Year')
ax.set_ylabel('Values')
ax.set_xticks(range(len(df)))
ax.set_xticklabels(df['Season_Year'])

-- Adding legend
ax.legend()

plt.tight_layout()
plt.show()

Explanation:

Importing Libraries: The necessary libraries for data manipulation (pandas), data visualization (matplotlib.pyplot), and database connection (sqlalchemy) are imported.

Database Credentials: Variables for database connection details such as host, database, user, and password are defined.

Connecting to the Database:

  • A connection string is created using the provided credentials.
  • Using a context manager (with), a connection to the database is established.
  • A SQL query is executed to fetch data, which includes the season year, total number of matches, start and end dates of the matches, and the period of the tournament (difference between end and start dates).
  • The fetched data is stored in a pandas DataFrame (df), which is then printed to verify its contents.

Creating a Multi-Stack Bar Graph:

  • A figure and axis are created using plt.subplots.
  • The width of the bars (bar_width) is defined.
  • Two sets of bars are plotted:
    The first set (bars1) represents the total number of matches per season.
    The second set (bars2) represents the period of the tournament per season.
  • Data labels are added on top of each bar to display their respective heights.
  • Titles and labels for the x and y axes are set.
  • The x-axis ticks are set to display each season year.
  • The layout is adjusted using plt.tight_layout for better spacing.
  • A legend is added to differentiate between the two sets of bars.
  • Finally, the plot is displayed using plt.show.

1.2 Number of matches conducted in each year and in each stadium

Code :

import pandas as pd
import matplotlib.pyplot as plt
from sqlalchemy import create_engine

-- Database Credentials
host = '127.0.0.1'
database = 'ipl'
user = 'root'
password = 'sampu'

-- Connecting to database
connection_string = f"mysql+pymysql://{user}:{password}@{host}/{database}"

with create_engine(connection_string).connect() as connection:
query = ''' Enter the Query ''' df = pd.read_sql(query, connection)

-- Sort the DataFrame by total_matches in descending order
df_sorted = df.sort_values(by='total_matches', ascending=True)

-- Creating Horizontal Bar Graph
plt.figure(figsize=(10,6))
bars = plt.barh(df_sorted["venue_name"], df_sorted["total_matches"], color="teal")
plt.xlabel("Total Matches")
plt.ylabel("Venue Name")
plt.title("Number of Matches conducted in each venue")

-- Adding data labels to the bars
for bar in bars:
bar_width = bar.get_width()
plt.text(bar_width, bar.get_y() + bar.get_height()/2, round(bar_width), ha="left", va="center", fontweight="bold")

plt.tight_layout()
plt.show()

1.3 Number of matches played by each team in each year in each stadium

Code :

-- Importing the necessary libraries
import pandas as pd
import matplotlib.pyplot as plt
from sqlalchemy import create_engine

-- Database Credentials
host = 'Host'
database = 'Schema Name'
user = 'user name'
password = 'password'

-- Constructing the connection string for the database
connection_string = f"mysql+pymysql://{user}:{password}@{host}/{database}"

-- Establishing a connection to the database
with create_engine(connection_string).connect() as connection:

-- SQL query to fetch the number of matches played by each team per year
query = '''Enter the Query '''

-- Executing the query and loading the data into a pandas DataFrame
df = pd.read_sql(query, connection)

-- Printing the DataFrame to verify the data
print(df)

-- Creating a horizontal bar graph to visualize the number of matches played by each team
plt.figure(figsize=(15, 12)) -- Setting the figure size
bar_width = 0.35 # Defining the bar width

-- Sorting the DataFrame by total matches in ascending order
df_sorted = df.sort_values(by="total_matches", ascending=True)

-- Creating horizontal bars
bars = plt.barh(df_sorted["team_name"], df_sorted["total_matches"], color="teal")

-- Adding data labels to each bar
for bar in bars:
bar_width = bar.get_width() -- Getting the width of each bar
plt.text(bar_width, bar.get_y() + bar.get_height() / 2, round(bar_width), fontweight="bold", ha="left", va="center")

-- Adding labels and title to the plot
plt.xlabel("Total Matches")
plt.ylabel("Team Name")
plt.title("Matches Played by Each Team")
plt.tight_layout() -- Adjusting the layout for better fit

-- Displaying the plot
plt.show()

1.4 Demographics of the Players

Code :

import pandas as pd
import matplotlib.pyplot as plt
from sqlalchemy import create_engine

-- Database Credentials
host = '127.0.0.1'
database = 'ipl'
user = 'root'
password = 'sampu'

-- Connecting to database
connection_string = f"mysql+pymysql://{user}:{password}@{host}/{database}"
with create_engine(connection_string).connect() as connection:
query = ''' Enter the Query ''' df = pd.read_sql(query, connection)

print(df)

-- Creating a pie chart
plt.figure(figsize=(10,6))
plt.pie(df["total_players"],autopct="%1.0f%%",startangle=140)
plt.title("Player Demographics")
plt.legend(df["Country_Name"])
plt.axis("equal")
plt.show()

-- Creating a bar graph
plt.figure(figsize=(12,12))
bars=plt.bar(df["Country_Name"],df["total_players"],color="teal")

for bar in bars:
bar_height = bar.get_height()
plt.text(bar.get_x()+bar.get_width()/2,bar_height,round(bar_height),fontweight="bold",ha="center",va="bottom")

plt.xlabel("Country Name")
plt.ylabel("Number of Players")
plt.title("Player Demographics")
plt.xticks(df["Country_Name"],rotation=45)
plt.show()

2. Key Performance Metrics

2.1 Tournament wise Key Metrics

Code: import pandas as pd
import matplotlib.pyplot as plt
from sqlalchemy import create_engine

-- Database Credentials
host = '127.0.0.1'
database = 'ipl'
user = 'root'
password = 'sampu'

-- Connecting to database
connection_string = f"mysql+pymysql://{user}:{password}@{host}/{database}"
with create_engine(connection_string).connect() as connection:
query = ''' Enter the Query '''
df = pd.read_sql(query, connection)

-- Create a new figure and axis for the bar plot
fig, ax1 = plt.subplots(figsize=(12, 8))

-- Plot the total runs as a bar graph
bars = ax1.bar(df["Season_year"], df["total_runs"], color='darkblue', width=0.45, label='Total Runs')

-- Add labels to the bars
for bar in bars:
bar_height = bar.get_height()
ax1.text(bar.get_x() + bar.get_width() / 2, bar_height, round(bar_height), ha='center', va='bottom')
-- Set labels for the x-axis and the left y-axis
ax1.set_xlabel('Season Year', fontsize=14)
ax1.set_ylabel('Total Runs', color='darkblue', fontsize=14)

-- Create a second y-axis for the line plot
ax2 = ax1.twinx()

-- Plot the average strike rate as a line graph
line = ax2.plot(df['Season_year'], df['avg_strike_rate'], color='red', linewidth=3, label='Average Strike Rate')

--Set the label for the right y-axis
ax2.set_ylabel('Average Strike Rate', color='red', fontsize=14)

--Set the title and adjust the layout
plt.title('Total Runs Vs Avg Strike Rate')
plt.tight_layout()

--Rotate x-axis labels for better readability
plt.xticks(df['Season_year'], fontsize=14, rotation=45)

-- Display the plot
plt.show()

Scatter Plot

Code: import pandas as pd
import matplotlib.pyplot as plt
from sqlalchemy import create_engine
import seaborn as sns

-- Database Credentials
host = '127.0.0.1'
database = 'ipl'
user = 'root'
password = 'sampu'

  • Connecting to database
    connection_string = f"mysql+pymysql://{user}:{password}@{host}/{database}"
    with create_engine(connection_string).connect() as connection:
    query = ''' Enter the Query '''
    df = pd.read_sql(query, connection)

-- Create a new figure and axis for the bar plot
plt.figure(figsize=(10,6))

sns.scatterplot(x='total_runs',y='avg_strike_rate',data=df,hue='Season_year',palette='viridis',sizes=(100,2000))

-- Adding data labels to the points

for i in range(df.shape[0]):
plt.annotate(
df['Season_year'][i],
(df['total_runs'][i],df['avg_strike_rate'][i]),
textcoords='offset points',
ha='center',
xytext=(5,5)
)

plt.title('Total Runs Vs Avg Strike Rate')
plt.xlabel('Total Runs Scored',fontsize=14)
plt.ylabel('Average Strike Rate',fontsize=14)
plt.tight_layout()
plt.grid()
plt.show()

2.2 Venue Wise Metrics

Code:

import pandas as pd
import matplotlib.pyplot as plt
from sqlalchemy import create_engine
import seaborn as sns

-- Database Credentials
host = '127.0.0.1'
database = 'ipl'
user = 'root'
password = 'sampu'

-- Connecting to database
connection_string = f"mysql+pymysql://{user}:{password}@{host}/{database}"
with create_engine(connection_string).connect() as connection:
query = ''' Enter the Query '''
df = pd.read_sql(query, connection)

print(df)

-- Plotting Graph
fig,ax=plt.subplots(figsize=(10,6))

-- Creating Vertical Bar Graph

Bar=sns.barplot(x='total_runs',y='venue_name',data=df,color='teal',width=0.45,label='Total Runs Scored')

-- Adding Datalabels in the Graph

for index, value in enumerate(df['total_runs']):
ax.text(value, index,f"{round(value)}",color='black',fontweight='bold',ha='left',va='center')

plt.title('Venue Wise Total Runs')
plt.xlabel('Total Runs',fontsize=14)
plt.ylabel('Venue Name',fontsize=14)
plt.legend()
plt.tight_layout()
plt.show()

2.3 Team wise Key Metrics

Code :

import pandas as pd
import matplotlib.pyplot as plt
from sqlalchemy import create_engine
import seaborn as sns

-- Database Credentials
host = '127.0.0.1'
database = 'ipl'
user = 'root'
password = 'sampu'

-- Connecting to database
connection_string = f"mysql+pymysql://{user}:{password}@{host}/{database}"
with create_engine(connection_string).connect() as connection:
query = ''' Enter the Query '''
df = pd.read_sql(query, connection)

print(df)

-- Plotting Graph
plt.figure(figsize=(20,6))

-- Creating Scatter Chart

sns.scatterplot(x='total_runs',y='avg_strike_rate',data=df,hue='Team_Name',palette='pastel',size='Team_Name',sizes=(100,2000),legend=False)

-- Adding data labels to the values

for i in range(df.shape[0]):
plt.annotate(df['Team_Name'][i],(df['total_runs'][i],df['avg_strike_rate'][i]),textcoords='offset points',xytext=(5,5),color="black",ha='center')

plt.title('Team Wise Performance')
plt.xlabel('Total Runs',fontsize=14)
plt.ylabel('Avg Strike Rate',fontsize=14)
plt.xticks()
plt.tight_layout()
plt.show()

2.4 Player wise Key Metrics

Code : import pandas as pd
import matplotlib.pyplot as plt
from sqlalchemy import create_engine
import seaborn as sns

-- Database Credentials
host = '127.0.0.1'
database = 'ipl'
user = 'root'
password = 'sampu'

-- Connecting to database
connection_string = f"mysql+pymysql://{user}:{password}@{host}/{database}"
with create_engine(connection_string).connect() as connection:
query = ''' Enter the Query'''
df = pd.read_sql(query, connection)

-- Caluclating the Median and Mean

median = df['total_runs'].median()
mean = df['total_runs'].mean()

-- Creating Normal Distribution Graph

plt.figure(figsize=(12,6))
sns.kdeplot(df['total_runs'],fill=True,label="Total Runs")
plt.axvline(median,color='red',linestyle='--',label=f"Median Value: {median}")
plt.axvline(mean,color='darkblue',linestyle='-',label=f"Mean Value:{mean}")
plt.title('Normal Distribution of Total Runs')
plt.xlabel('Total Runs',fontsize=14)
plt.ylabel('density')
plt.xticks(range(1,int(df['total_runs'].max())+500,500))
plt.legend()
plt.tight_layout()
plt.show()