Injury Analysis of Soccer Players with Python
Photo Credit: 2470279311/Shutterstock.com (license purchased)
Introduction
Sports are an integral part of many people's lives. For some, sports are something to bet on; for others, fantasy leagues become an obsession. Then there's the category I'm a part of: the die-hard fan. Given my interest in sports, I thought it would be fun to explore data from the English Premier League (EPL), one of the most demanding soccer leagues in the world, featuring high-intensity play, rigorous schedules, and some of the best athletes in the sport. The league consists of 20 teams, and at the end of each season the bottom 3 are relegated to the league below, which, unlike in American sports, gives teams at the bottom of the table something to fight for. Each team plays 38 league matches per season, and with additional tournaments and international duties, some players end up playing over 60 games in a single season. The downside of such a relentless schedule is that injuries are inevitable.
But not all injuries are equal. Some players return to peak form after a brief recovery period, while others struggle to regain their previous level of performance. This analysis explores injury trends of seven teams during the 2023/24 season, examining how injuries affect player performance and how likely players are to return to form. The goal is to help managers and analysts make informed decisions about squad selection, substitutions, and player retention.
Data
The data comes from the Kaggle dataset 'Player Injuries and Team Performance Dataset', which follows seven English Premier League teams: Tottenham, Arsenal, Aston Villa, Newcastle, Burnley, Brentford, and Everton. The dataset has 42 columns covering the player, the team, the player's FIFA rating, and each injury (when it happened, when the player returned, the player's ratings before and after, and how the team performed before, during, and after the injury). This project focuses on the team, injury type, injury length, age, and player rating from the 2023/24 Premier League season.
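A minimal pandas sketch of the loading step, assuming the Kaggle CSV has been downloaded locally and has one row per injury. The column names below (`player`, `team`, `age`, `injury`, `days_out`, `rating_before`, `rating_after`, and an optional `season`) are hypothetical placeholders, not the dataset's actual headers, so they would need renaming to match the real file.

```python
import pandas as pd

# Hypothetical column names: rename them to match the real Kaggle CSV header.
FIELDS = ["player", "team", "age", "injury", "days_out",
          "rating_before", "rating_after"]


def select_fields(df: pd.DataFrame, season: str = "2023/24") -> pd.DataFrame:
    """Keep a single season (if a 'season' column exists) and only the analysis fields."""
    if "season" in df.columns:
        df = df[df["season"] == season]
    return df[FIELDS].reset_index(drop=True)


def load_injuries(path: str) -> pd.DataFrame:
    """Read the downloaded CSV (assumed one row per injury) and trim it."""
    return select_fields(pd.read_csv(path))
```

For example, `load_injuries("injuries.csv")` (a hypothetical file name) would return only the seven fields above for the 2023/24 season.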
Question
In the 2023/24 season, the teams finished in the following positions:
- Arsenal: 2
- Aston Villa: 4
- Tottenham: 5
- Newcastle: 7
- Everton: 15
- Brentford: 16
- Burnley: 19
The questions this analysis explores are: How much impact do injuries have on a player's ability to return to form? And can this data help managers make decisions about the future of the players on their team?
Analysis
By looking first at the breakdown of injuries by team, then at the most frequent injuries, and lastly at the number of injuries each player had in a season, we can find out which types of injuries had the most impact and how the number of injuries a player had relates to their return to form.
Number of Injuries, Average Age, and End of Season Ranking per Team
Looking at the bubble chart, we can see that Newcastle had the most injuries, while Tottenham had relatively few. Comparing the average age of injured players per team with the number of injuries, the number of injuries tends to increase as the average age of injured players increases. Arsenal and Everton are slightly out of line with the rest of the data because each team includes an outlier player. For Arsenal, the average age would have been 25 without its 31-year-old player, who raised it to 25.75. For Everton, the average age would have been 24.8 without its 38-year-old player, who raised it to 25.8. Without these outliers, the two teams' data would fit the overall trend better. Tottenham is significantly out of line with the rest of the data because 4 of the team's 6 injuries were incurred by its oldest player.
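The per-team numbers behind the bubble chart can be sketched with a pandas `groupby`, assuming one row per injury and hypothetical `team`, `age`, and `injury` columns. In this sketch each injury is its own row, so a player injured several times is weighted once per injury in the team's average age. The hypothetical helper `mean_without_oldest` mirrors the outlier comparison made for Arsenal and Everton.

```python
import pandas as pd


def team_summary(df: pd.DataFrame) -> pd.DataFrame:
    """Injury count and mean age of injured players, one row per team."""
    return (
        df.groupby("team")
        .agg(injuries=("injury", "size"), avg_age=("age", "mean"))
        .sort_values("injuries", ascending=False)
    )


def age_injury_correlation(summary: pd.DataFrame) -> float:
    """Pearson correlation between a team's average injured-player age and its injury count."""
    return summary["avg_age"].corr(summary["injuries"])


def mean_without_oldest(ages) -> tuple:
    """Return (mean of all ages, mean with the single oldest age removed)."""
    s = pd.Series(ages, dtype=float)
    return s.mean(), s.drop(s.idxmax()).mean()
```

With only seven teams, the correlation is a rough descriptive number rather than evidence of a causal link between age and injuries.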
Types of Injuries
The data was then broken down further to see which injuries occurred most often. Ankle, knee, and hamstring injuries were the most frequent. Unsurprisingly, the majority of injuries overall were leg-related, and these three types accounted for the largest numbers. The heatmap shows that Brentford and Newcastle suffered the most ankle injuries, Aston Villa and Newcastle suffered the most hamstring injuries, and Burnley suffered the most knee injuries. Overall, Newcastle had the most of these three injuries, followed closely by Brentford and Aston Villa.
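The frequency count and the team-by-injury table behind the heatmap could be built as follows. The injury labels in `TOP_INJURIES` and the column names are assumptions about how the data is coded. The resulting table is the kind of input a plotting function such as seaborn's `heatmap` accepts.

```python
import pandas as pd

# Hypothetical labels; the real dataset may word its injury types differently.
TOP_INJURIES = ["Ankle", "Knee", "Hamstring"]


def injury_counts(df: pd.DataFrame) -> pd.Series:
    """How often each injury type occurs, most frequent first."""
    return df["injury"].value_counts()


def team_injury_table(df: pd.DataFrame, injuries=TOP_INJURIES) -> pd.DataFrame:
    """Teams x injury types count table, restricted to the given injuries."""
    subset = df[df["injury"].isin(injuries)]
    table = pd.crosstab(subset["team"], subset["injury"])
    # Fix the column order and add zero-filled columns for injuries nobody had.
    return table.reindex(columns=injuries, fill_value=0)
```

Summing the table across columns (`team_injury_table(df).sum(axis=1)`) gives each team's combined total for the three injuries, which is the comparison made between Newcastle, Brentford, and Aston Villa.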
Average Player Rating Before and After Injury
The boxplots compare the average player rating before and after each type of injury to show how well players recovered. Players who had hamstring injuries were able to return to form or better: the mean of the average player rating increased by 0.235. Players who had knee injuries, however, did not come close to returning to form: the mean of the average player rating decreased by 0.271, and the median is noticeably higher in the before plot than in the after plot. Ankle injuries are harder to judge. The mean of the average player rating increased by only 0.048, and a large outlier is affecting the data. Ankle injuries therefore have to be evaluated on a case-by-case basis.
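A sketch of the before/after comparison, assuming hypothetical `rating_before` and `rating_after` columns that hold a player's average rating around one injury. Reporting the median next to the mean is one way to see how much a single outlier, like the one in the ankle group, shifts the result.

```python
import pandas as pd


def rating_change_by_injury(df: pd.DataFrame,
                            injuries=("Ankle", "Knee", "Hamstring")) -> pd.DataFrame:
    """Mean and median rating before and after each injury type, plus the change in the mean."""
    subset = df[df["injury"].isin(list(injuries))]
    stats = subset.groupby("injury").agg(
        mean_before=("rating_before", "mean"),
        mean_after=("rating_after", "mean"),
        median_before=("rating_before", "median"),
        median_after=("rating_after", "median"),
    )
    # A positive change means players came back rated higher than before, on average.
    return stats.assign(mean_change=stats["mean_after"] - stats["mean_before"])
```

For side-by-side boxplots, `df.melt(id_vars="injury", value_vars=["rating_before", "rating_after"])` reshapes the two rating columns into a long format that most plotting libraries can group by.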
Percentage of the Three Injuries by Team
The bar graph shows knee, ankle, and hamstring injuries as a percentage of all the injuries each team incurred. Most of Burnley's injured players had knee injuries. Since the previous analysis showed that players with knee injuries are less likely to return to form, this could be one reason Burnley finished the season in 19th place. Another interesting finding is that the teams for which hamstring injuries were the most common of the three (Newcastle, Aston Villa, Arsenal) all finished the season in the top 10.
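The percentages in the bar graph can be computed with `pd.crosstab` and `normalize="index"`, which divides each team's counts by that team's total across all injury types, not just the three shown. The column names and injury labels are again hypothetical.

```python
import pandas as pd


def injury_share_by_team(df: pd.DataFrame,
                         injuries=("Knee", "Ankle", "Hamstring")) -> pd.DataFrame:
    """Percentage of each team's injuries (all types) that fall into each listed type."""
    shares = pd.crosstab(df["team"], df["injury"], normalize="index") * 100
    # Keep the three types of interest; a type no team suffered becomes a 0% column.
    return shares.reindex(columns=list(injuries), fill_value=0.0)
```

Calling `.plot.bar()` on the result would draw grouped bars per team, provided matplotlib is installed.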
Average Player Rating Before and After Injury by Age and Number of Injuries
The scatter plots above are broken down by age group (27 and older vs. under 27) and by number of injuries (one vs. multiple). The graphs on the left show no correlation between the average player rating before and after the injury for players with multiple injuries. For players 27 and older, those with one injury appear to show a similar correlation to those with multiple injuries. However, one player with multiple injuries is a significant outlier and may make that correlation look stronger than it is. Comparing these two graphs for players 27 and older, there is some direction for one injury but essentially none for multiple injuries. The last graph, in the top right corner, represents players under 27 with one injury, and it shows a much stronger correlation between the average player rating before and after the injury. This means that players under 27 who have had only one injury in a season are more likely to return to form or better.
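The four-panel comparison can be summarized numerically with a Pearson correlation between the before and after ratings within each group. This sketch assumes one row per injury, hypothetical `team`, `player`, `age`, `injury`, and rating columns, and the same 27-year cutoff used for the plots. With only a handful of points per group, a single outlier can swing these coefficients a lot, which is the caveat raised above.

```python
import pandas as pd


def rating_correlation_by_group(df: pd.DataFrame, age_cutoff: int = 27) -> pd.Series:
    """Before/after rating correlation for each (age group, injury count) group."""
    # Number of injuries per player in the season (one row per injury is assumed).
    n_injuries = df.groupby(["team", "player"])["injury"].transform("size")
    labeled = df.assign(
        age_group=(df["age"] >= age_cutoff).map(
            {True: f"{age_cutoff} and older", False: f"under {age_cutoff}"}),
        injury_group=(n_injuries > 1).map({True: "multiple", False: "one"}),
    )
    # Groups with fewer than two rows yield NaN, since a correlation needs two points.
    return (
        labeled.groupby(["age_group", "injury_group"])[["rating_before", "rating_after"]]
        .apply(lambda g: g["rating_before"].corr(g["rating_after"]))
    )
```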
Breakdown of Number of Injuries Per Team and Where They Ended the Season
Looking at teams with players who had multiple injuries, could those teams have finished the season in a better position by using those players as substitutes instead of starters? If we analyze teams such as Newcastle (finished 7th) and Aston Villa (finished 4th), would we find that they could have performed better?
Newcastle:
- Wilson (32) had 5 injuries, 2 of which were hamstring injuries.
- Joelinton (27) had 3 injuries, 2 of which were hamstring injuries.
- Livramento (21) had 3 injuries, 2 of which were ankle injuries.
- Botman (24) had 3 injuries, 2 of which were knee injuries and 1 of which was an ankle injury.
Aston Villa:
- Ramsey (23) had 4 injuries, 1 of which was a hamstring injury.
- Durán (20) had 4 injuries, 1 of which was a hamstring injury.
- Cash (26) had 3 injuries, 2 of which were knee injuries and 1 of which was a hamstring injury.
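The player-level lists above can be reproduced by cross-tabulating players against injury types and keeping anyone at or above an injury threshold. The threshold of three mirrors the lists above; the column names are hypothetical.

```python
import pandas as pd


def repeat_injury_breakdown(df: pd.DataFrame, min_injuries: int = 3) -> pd.DataFrame:
    """Per-player counts of each injury type for players with at least `min_injuries` injuries."""
    table = pd.crosstab([df["team"], df["player"]], df["injury"])
    table["total"] = table.sum(axis=1)
    return table[table["total"] >= min_injuries].sort_values("total", ascending=False)
```

Because the result is indexed by (team, player), `.loc["Newcastle"]` on it would give one team's list, assuming that is how the team is labeled in the data.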
Number of Injuries per Player, Type of Injury, Age, and Team
The two graphs above show each player's injury types, team, and age. Based on the previous findings, the players to focus on are those who had multiple injuries, including knee or ankle injuries, since those were the injuries players found hardest to return to form from.
The first graph shows that players under 27 are more likely to have ankle or knee injuries. It also shows several players who had 3 or more injuries, at least one of which was an ankle or knee injury. It would be worth questioning the form of players like Ramsey and Durán: even though they had no ankle or knee injuries, they each had four injuries over the course of one season, which can disrupt team dynamics. Botman, Livramento, and Mykolenko should be scrutinized since they each had three injuries, including at least one ankle injury. Botman deserves particular attention because he had two knee injuries and an ankle injury.
The second graph shows far more hamstring injuries for players 27 and older. However, Lo Celso should be evaluated more carefully, since he had four injuries, one of which was a knee injury. The last player to consider seriously is Wilson. Although he had no ankle or knee injuries, he had five injuries in total, two of which were hamstring injuries. Players are highly likely to return to form after hamstring injuries, but two hamstring injuries on top of three other injuries would likely disrupt the team's cohesiveness.
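The selection reasoning above can be written down as a simple screening rule. This is one hypothetical encoding of it, not a validated model: flag a player with more than one injury if any of them was a knee or ankle injury, and flag anyone with four or more injuries regardless of type. The `heavy_load` threshold of four is an assumption chosen to catch cases like Ramsey, Durán, and Wilson, and the injury labels and column names are placeholders.

```python
import pandas as pd

# Injury types that were hardest to return to form from in this analysis (labels assumed).
RISKY_TYPES = {"Knee", "Ankle"}


def flag_players(df: pd.DataFrame, heavy_load: int = 4) -> pd.DataFrame:
    """List players who meet the hypothetical screening rule, most injuries first."""
    rows = []
    for (team, player), group in df.groupby(["team", "player"]):
        n = len(group)
        risky = sorted(set(group["injury"]) & RISKY_TYPES)
        # Multiple injuries including a knee/ankle one, or a heavy load of any type.
        if (n > 1 and bool(risky)) or n >= heavy_load:
            rows.append({"team": team, "player": player, "injuries": n,
                         "risky_types": ", ".join(risky)})
    flagged = pd.DataFrame(rows, columns=["team", "player", "injuries", "risky_types"])
    return flagged.sort_values("injuries", ascending=False, ignore_index=True)
```

A rule like this only shortlists players for a closer look; the decision about starting, substituting, or releasing a player still depends on context the data does not capture.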
Conclusions
- Players who have had only one injury are highly likely to return to form or better. These players should be kept in the starting lineup.
- The more injuries a player has, the less likely he is to return to form.
- Hamstring injuries did not seem to have a significant impact on return to form; they only affected how long the player was sidelined.
- Players who have had multiple injuries are less likely to return to form if one of them was an ankle injury, and even less likely if one was a knee injury. These players should be considered for a substitute role rather than a starting one. Depending on a player's age, it may be necessary to drop him from the team.
Future Works
In the future, I would like to incorporate full team information, such as every player's age and the ratings of healthy players for comparison with players who got injured. This would help separate teams that are simply weaker from players who did not recover well. I would also like to compare team performance at three stages: before the injury, while the player was out, and after the player returned. Finally, after my presentation, it was suggested that I also look at how severe the injuries were, which I would like to include in future work.
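As a possible starting point for the three-stage comparison, the dataset's before/during/after team performance fields could be averaged per team. The column names here (`team_result_before`, `team_result_during`, `team_result_after`) are hypothetical stand-ins for whatever team performance measure the dataset actually records.

```python
import pandas as pd

# Hypothetical column names mapped to readable stage labels.
STAGE_COLUMNS = {
    "team_result_before": "before injury",
    "team_result_during": "without player",
    "team_result_after": "after return",
}


def team_stage_comparison(df: pd.DataFrame) -> pd.DataFrame:
    """Average team performance per team at each of the three stages."""
    stages = df.rename(columns=STAGE_COLUMNS)
    return stages.groupby("team")[list(STAGE_COLUMNS.values())].mean()
```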