Project Purpose - aneeshp4/epl-database GitHub Wiki
Purpose
The purpose of this database is to provide statistical insights into the COVID-affected 2019-2020 English Premier League season. The database can be used in two ways: by querying it directly with SQL, which gives the most control over the data returned, or through a web application, which is documented here. You can also follow the installation/implementation directions to run the database on your own MySQL instance.
Answering Example Questions:
Given that the purpose of this database is to provide statistical insights, here are some questions this database can answer:
- Which teams have the most discrepancy in win rate between home and away games?
- How much more likely is it that a team winning at half-time wins the game?
- Do certain referees tend to give more bookings than other referees?
- Do high scoring matches exhibit any differences in average odds between weekdays and weekends?
Which teams have the most discrepancy in win rate between home and away games?
Query
-- Home/away discrepancy per team, measured as home goal difference minus away goal difference
SELECT
    Team.TeamName,
    SUM(CASE WHEN Matches.HomeTeam = Team.TeamID THEN FullTimeResults.FTHG - FullTimeResults.FTAG ELSE 0 END) AS TotalHomeGoalDifference,
    SUM(CASE WHEN Matches.AwayTeam = Team.TeamID THEN FullTimeResults.FTAG - FullTimeResults.FTHG ELSE 0 END) AS TotalAwayGoalDifference,
    SUM(CASE WHEN Matches.HomeTeam = Team.TeamID THEN FullTimeResults.FTHG - FullTimeResults.FTAG ELSE 0 END) -
    SUM(CASE WHEN Matches.AwayTeam = Team.TeamID THEN FullTimeResults.FTAG - FullTimeResults.FTHG ELSE 0 END) AS TotalDiscrepancy
FROM Team
    JOIN Matches ON Team.TeamID = Matches.HomeTeam OR Team.TeamID = Matches.AwayTeam
    JOIN FullTimeResults ON Matches.MatchID = FullTimeResults.MatchID
GROUP BY Team.TeamName
ORDER BY TotalDiscrepancy DESC;
Results
Findings
This season, Liverpool and Manchester City finished as the top two in the Premier League. The data suggests that consistency both home and away is crucial to success in the Premier League: Liverpool have a goal-difference discrepancy of 6, while Manchester City have a discrepancy of 4.
A rather surprising finding is that four teams actually performed better away than at home, and two of these four, Chelsea and Leicester City, finished in the top five. This raises the question of how significant home advantage really is for these teams and their fans. At the other end of the spectrum, Tottenham had a 17-goal discrepancy, which shows the potential of the squad at home but may call into question its mental strength when playing away.
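The query in this section ranks teams by goal difference rather than by win rate. As a rough sketch of a win-rate version of the same comparison, the example below runs a query against an in-memory SQLite table from Python. The toy rows, the single-letter team names, and storing team names directly in `Matches` are assumptions made for illustration; only the `HomeTeam`, `AwayTeam`, and `FTR` column names come from the queries on this page.

```python
import sqlite3

# Hypothetical toy data: (HomeTeam, AwayTeam, FTR), where FTR is 'H', 'D' or 'A'.
rows = [
    ("A", "B", "H"),
    ("A", "C", "H"),
    ("B", "A", "H"),
    ("C", "A", "A"),
    ("B", "C", "D"),
    ("C", "B", "A"),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Matches (HomeTeam TEXT, AwayTeam TEXT, FTR TEXT)")
conn.executemany("INSERT INTO Matches VALUES (?, ?, ?)", rows)

# Home win rate and away win rate per team, then the gap between them.
query = """
SELECT h.Team,
       h.HomeWinRate,
       a.AwayWinRate,
       h.HomeWinRate - a.AwayWinRate AS WinRateGap
FROM (SELECT HomeTeam AS Team,
             AVG(CASE WHEN FTR = 'H' THEN 1.0 ELSE 0.0 END) AS HomeWinRate
      FROM Matches
      GROUP BY HomeTeam) AS h
JOIN (SELECT AwayTeam AS Team,
             AVG(CASE WHEN FTR = 'A' THEN 1.0 ELSE 0.0 END) AS AwayWinRate
      FROM Matches
      GROUP BY AwayTeam) AS a
  ON a.Team = h.Team
ORDER BY WinRateGap DESC, h.Team
"""
win_rate_gaps = conn.execute(query).fetchall()

for team, home_rate, away_rate, gap in win_rate_gaps:
    print(f"{team}: home {home_rate:.2f}, away {away_rate:.2f}, gap {gap:+.2f}")
```

On the real MySQL database, the same SELECT would presumably also need a join to the Team table to translate TeamID into TeamName; that has not been verified here.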
How much more likely is it that a team winning at half-time wins the game?
Query
-- Share of half-time leaders (home or away) that also win at full-time
SELECT
    winning_half_and_full_time,
    winning_at_half_time,
    (winning_half_and_full_time / winning_at_half_time) * 100 AS winning_percentage
FROM (
    SELECT
        SUM(CASE WHEN (HTR = 'H' AND FTR = 'H') OR (HTR = 'A' AND FTR = 'A') THEN 1 ELSE 0 END) AS winning_half_and_full_time,
        SUM(CASE WHEN HTR = 'H' OR HTR = 'A' THEN 1 ELSE 0 END) AS winning_at_half_time
    FROM Matches
) AS subquery;
Results
Findings
The results suggest that teams leading at half-time usually go on to win: about 70.30% of teams that led at half-time also won at full-time. This shows how significant a half-time lead is in predicting the eventual outcome of a match.
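The percentage above says how often a half-time lead is converted into a win, but "how much more likely" also needs a baseline to compare against. One possible baseline, which is an assumption of this sketch and not something the original analysis defines, is the chance that an arbitrary team in an arbitrary match wins: the number of decisive matches divided by twice the number of matches. The toy rows below are hypothetical; `HTR` and `FTR` are the column names used in the query.

```python
import sqlite3

# Hypothetical toy data: (HTR, FTR), each 'H', 'D' or 'A'.
rows = [
    ("H", "H"), ("H", "D"), ("A", "A"), ("D", "H"),
    ("D", "D"), ("A", "H"), ("H", "H"), ("D", "A"),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Matches (HTR TEXT, FTR TEXT)")
conn.executemany("INSERT INTO Matches VALUES (?, ?)", rows)

held, led, decisive, matches = conn.execute(
    """
    SELECT
        SUM(CASE WHEN HTR IN ('H', 'A') AND HTR = FTR THEN 1 ELSE 0 END),  -- lead held
        SUM(CASE WHEN HTR IN ('H', 'A') THEN 1 ELSE 0 END),                -- someone led
        SUM(CASE WHEN FTR IN ('H', 'A') THEN 1 ELSE 0 END),                -- match had a winner
        COUNT(*)
    FROM Matches
    """
).fetchone()

# P(win | leading at half-time)
p_win_given_lead = held / led
# Assumed baseline: every match has two teams, and at most one of them wins.
p_win_baseline = decisive / (2 * matches)
# How many times more likely a half-time leader is to win than the baseline.
lift = p_win_given_lead / p_win_baseline

print(f"P(win | lead) = {p_win_given_lead:.3f}")
print(f"P(win) baseline = {p_win_baseline:.3f}")
print(f"lift = {lift:.2f}x")
```

Other baselines are equally defensible, for example the win rate of teams that were level at half-time, so the lift figure depends on which comparison is chosen.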
Do certain referees tend to give more bookings than other referees?
Query
-- Average yellow and red cards per match (both teams combined) for each referee
SELECT
    Referee,
    AVG(HY + AY) AS avg_total_yellow_cards,
    AVG(HR + AR) AS avg_total_red_cards
FROM Matches
GROUP BY Referee
ORDER BY avg_total_yellow_cards DESC, avg_total_red_cards DESC;
Results
Findings
The query results show variation in the average number of bookings given by different referees. Referees such as "M Dean," "S Attwell," and "A Taylor" have relatively high averages, roughly 4.19 to 4.35 yellow cards per match, while referees such as "P Bankes," "R Jones," and "M Oliver" have lower averages, often around 3.00 or less. Variation in red cards is smaller, with averages ranging from zero to about 0.25 red cards per match.
These findings suggest that certain referees do tend to give more bookings than others, as indicated by their higher or lower average yellow card counts. However, the variation could stem from several factors, including refereeing style, match dynamics, and how strictly the rules are enforced.
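Per-referee averages can be noisy when a referee has officiated only a few matches, so it may help to report the match count next to each average and to set a minimum threshold. The sketch below shows one way to do that with an in-memory SQLite table. The referee names, the toy rows, and the `min_matches` threshold are hypothetical; the `Referee`, `HY`, `AY`, `HR`, and `AR` columns are taken from the query above.

```python
import sqlite3

# Hypothetical toy data: (Referee, HY, AY, HR, AR)
rows = [
    ("Ref X", 2, 3, 0, 0),
    ("Ref X", 3, 2, 0, 1),
    ("Ref X", 1, 1, 0, 0),
    ("Ref Y", 4, 4, 1, 0),  # a single, unusually card-heavy match
    ("Ref Z", 1, 2, 0, 0),
    ("Ref Z", 2, 1, 0, 0),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Matches (Referee TEXT, HY INTEGER, AY INTEGER, HR INTEGER, AR INTEGER)"
)
conn.executemany("INSERT INTO Matches VALUES (?, ?, ?, ?, ?)", rows)

min_matches = 2  # hypothetical threshold; tune to the real data

referee_stats = conn.execute(
    """
    SELECT Referee,
           COUNT(*)     AS matches,
           AVG(HY + AY) AS avg_total_yellow_cards,
           AVG(HR + AR) AS avg_total_red_cards
    FROM Matches
    GROUP BY Referee
    HAVING COUNT(*) >= ?
    ORDER BY avg_total_yellow_cards DESC, avg_total_red_cards DESC
    """,
    (min_matches,),
).fetchall()

for referee, matches, avg_yellow, avg_red in referee_stats:
    print(f"{referee}: {matches} matches, {avg_yellow:.2f} yellow, {avg_red:.2f} red")
```

In this toy data "Ref Y" has the highest average but is dropped because it rests on one match, which is the kind of outlier the threshold is meant to filter.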
Do high scoring matches exhibit any differences in average odds between weekdays and weekends?
Query
-- In MySQL, DAYOFWEEK returns 1 for Sunday and 7 for Saturday
SELECT
    CASE WHEN DAYOFWEEK(M.Date) IN (1, 7) THEN 'Weekend' ELSE 'Weekday' END AS day_type,
    ROUND(AVG(O.AvgH), 2) AS avg_home_odds,
    ROUND(AVG(O.AvgA), 2) AS avg_away_odds
FROM Odds O
    JOIN Matches M ON O.MatchID = M.MatchID
WHERE M.FTHG + M.FTAG > 4  -- high-scoring: more than 4 goals in total
GROUP BY day_type;
Results
Findings
High-scoring matches, defined as those with more than 4 total goals, show differences in average odds between weekdays and weekends. For a home win, weekday matches have average odds of approximately 3.99, while weekend matches have slightly higher average odds of around 4.31. For an away win, weekday matches have average odds of about 6.90, whereas weekend matches have much lower average odds of around 4.12. Assuming these are decimal odds, where a higher figure implies a lower probability, home teams in high-scoring games are rated slightly less likely to win on weekends than on weekdays, and away teams are rated much less likely to win on weekdays than on weekends. Differences in attendance between weekdays and weekends, when fewer people are at work, may play a role, although high-scoring matches are a small subset of the season and the pattern may also reflect which fixtures happened to fall on which days.
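To make the odds easier to compare, they can be converted to rough implied probabilities. The sketch below assumes that `AvgH` and `AvgA` are decimal odds, which the page does not state explicitly, and uses the simple conversion 1 / odds. This ignores the bookmaker margin, and inverting an average of odds is not the same as averaging per-match probabilities, so the figures are only indicative.

```python
# Average odds quoted in the findings above, assumed to be decimal odds.
avg_odds = {
    "Weekday": {"home": 3.99, "away": 6.90},
    "Weekend": {"home": 4.31, "away": 4.12},
}


def implied_probability(decimal_odds: float) -> float:
    """Rough implied probability of an outcome, ignoring the bookmaker margin."""
    return 1.0 / decimal_odds


implied = {
    day: {side: implied_probability(odds) for side, odds in sides.items()}
    for day, sides in avg_odds.items()
}

for day, sides in implied.items():
    print(f"{day}: home {sides['home']:.1%}, away {sides['away']:.1%}")
```

Under these assumptions the largest shift is on the away side, where the implied chance of an away win is noticeably lower on weekdays than on weekends.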