Data Study on Expectations of Referees in Top Soccer Leagues
The skills demonstrated here can be learned through the Data Science with Machine Learning bootcamp at NYC Data Science Academy.
"Know who you are dealing with, never offend the wrong person." - Robert Greene (The 48 Laws of Power)
In the Bundesliga, LaLiga, Ligue 1, Premier League, and Serie A, records show which referee is appointed to each game. How would knowing the habits of the referee appointed to your game help you determine what to expect? What are the habits of the referees in each league? Should we adapt our game preparation depending on the referee?
To return to the quote above: knowing which referee will officiate my team's game gives insight into what my team should expect from that game.
The objective of this analysis was to determine the habits of referees across the top 5 soccer (football) leagues over the past 9 seasons. I also wanted to translate the habits identified in the analysis into practical applications for teams. While this is only a preliminary analysis, much can be taken away from it.
Web Scraping Data
In order to compile all the necessary data, I used Selenium to scrape whoscored.com. I scraped season, league, referee name, games officiated, fouls per game, fouls per tackle, penalties per game, yellow cards per game, total yellow cards, red cards per game, and total red cards. For this project, I chose to focus on season, league, referee name, games officiated, fouls per game, and fouls per tackle.
I built a web scraper that looped through the referee table, collecting all the data across multiple pages. Below is an example of the whoscored.com page that I scraped.
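The scraper code itself isn't shown here, so the following is only a minimal sketch of the pagination loop, not the original implementation. To keep it runnable without a browser, page fetching is hidden behind a hypothetical `fetch_page` callable; in a Selenium scraper that callable would click the table's "next" control and read the visible rows. The column names in `FIELDS` are assumptions, as is the idea that league and season come from which page is scraped and are added later during cleaning.

```python
import csv
from typing import Callable, List, Optional, TextIO

# Hypothetical column names for the per-referee fields listed above; the real
# whoscored.com header text and column order are assumptions. This sketch
# assumes league and season are added during cleaning, not scraped per row.
FIELDS = [
    "referee", "games", "fouls_per_game", "fouls_per_tackle",
    "penalties_per_game", "yellows_per_game", "yellows_total",
    "reds_per_game", "reds_total",
]

# Hypothetical: page number (1-based) -> that page's rows, or None/[] when done.
PageFetcher = Callable[[int], Optional[List[List[str]]]]


def scrape_all_pages(fetch_page: PageFetcher, max_pages: int = 50) -> List[List[str]]:
    """Collect table rows page by page until a page comes back empty."""
    rows: List[List[str]] = []
    for page in range(1, max_pages + 1):
        page_rows = fetch_page(page)
        if not page_rows:  # empty page: no more data, stop paginating
            break
        rows.extend(page_rows)
    return rows


def write_rows(out: TextIO, rows: List[List[str]]) -> None:
    """Write a header plus the scraped rows as CSV to an open text stream."""
    writer = csv.writer(out)
    writer.writerow(FIELDS)
    writer.writerows(rows)
```

To save one league/season to its own CSV file, open it with `open(path, "w", newline="", encoding="utf-8")` and pass the file object to `write_rows`. Taking a stream rather than a path keeps the writer easy to test.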
Once I had saved the scraped data to individual CSV files, I used Python to clean it in preparation for analysis. The steps I used:
- I loaded each file into its own DataFrame for seasonal and league analysis
- This was done intentionally to allow a specific analysis of the 2017/2018 season
- I added two columns: the league and the season each referee record belongs to
- I also created a DataFrame to track the average fouls per game in each league over the 9 seasons, **which you will see as you continue to read.**
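The cleaning steps above can be sketched with pandas as follows. This is an illustration under assumptions, not the original notebook: the `(league, season)` to file mapping and the `fouls_per_game` column name are hypothetical, and the league average is an unweighted mean across referees.

```python
from typing import Dict, Tuple

import pandas as pd

# Hypothetical layout: (league, season) -> CSV path or file-like object,
# e.g. {("LaLiga", "2017/18"): "laliga_2017_18.csv"}.
SourceMap = Dict[Tuple[str, str], object]


def load_frames(sources: SourceMap) -> Dict[Tuple[str, str], pd.DataFrame]:
    """Read each file into its own DataFrame and add league/season columns."""
    frames = {}
    for (league, season), source in sources.items():
        df = pd.read_csv(source)
        df["league"] = league  # which league the referee record belongs to
        df["season"] = season  # which season the referee record belongs to
        frames[(league, season)] = df
    return frames


def avg_fouls_per_game(frames: Dict[Tuple[str, str], pd.DataFrame]) -> pd.DataFrame:
    """League x season table of mean fouls per game (unweighted across referees).

    Assumes every CSV has a hypothetical `fouls_per_game` column.
    """
    combined = pd.concat(list(frames.values()), ignore_index=True)
    return combined.pivot_table(
        index="league", columns="season", values="fouls_per_game", aggfunc="mean"
    )
```

Keeping the per-file DataFrames in a dict means a single season, e.g. `frames[("LaLiga", "2017/18")]`, is still available on its own. Weighting each referee by games officiated would be a reasonable alternative to the plain mean.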
**Fouls Per Game on Average**
To start my analysis, I wanted an idea of how often fouls are called per game on average. This was analyzed by season and then by league. I thought it was important to see the trends and whether, and how, each league changed from season to season.
As you can see, 3 out of 5 leagues have seen a big change in the number of fouls called per game. The Premier League changed quite a bit from the 2009/10 to the 2011/12 season, but has remained relatively consistent since then. The same could be said of LaLiga from the 2012/13 to the 2017/18 season.
This gives a small glimpse into how referees interpret the rules and how each league's application of the rules laid out by the league and FIFA has evolved.
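One way to put numbers on "a big change" is to summarize each league's row of the league-by-season averages. The sketch below is a hypothetical helper, not part of the original analysis; it assumes a DataFrame with leagues as rows and season labels such as "2009/10" as columns, which sort chronologically as strings.

```python
import pandas as pd


def season_change_summary(avg: pd.DataFrame) -> pd.DataFrame:
    """Summarize how each league's average fouls per game moved across seasons.

    `avg` is assumed to have leagues as rows and season labels ("2009/10",
    "2010/11", ...) as columns holding average fouls per game.
    """
    avg = avg.sort_index(axis=1)  # "YYYY/YY" labels sort chronologically
    return pd.DataFrame({
        "first_season_avg": avg.iloc[:, 0],
        "last_season_avg": avg.iloc[:, -1],
        "net_change": avg.iloc[:, -1] - avg.iloc[:, 0],
        "range": avg.max(axis=1) - avg.min(axis=1),
        # Largest jump between consecutive seasons, in either direction.
        "largest_season_change": avg.diff(axis=1).abs().max(axis=1),
    })
```

What counts as "big" is still a judgment call; the summary just makes the comparison between leagues explicit.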
**Average Fouls Per Tackle**
The next piece of information I wanted to address was fouls called per tackle. This ties back to one of my objectives: identifying the tendencies or habits of referees during a game. Below is a table of the average fouls called per tackle.
To put all the information into perspective, the average percentage of tackles considered a foul by league:
- Bundesliga: 75%
- LaLiga: 71.2%
- Ligue 1: 73.6%
- Premier League: 60.9%
- Serie A: 80.3%
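The league percentages above are averages of each referee's fouls-per-tackle ratio. A minimal stdlib sketch of that aggregation follows, assuming the input is a list of hypothetical `(league, fouls_per_tackle)` pairs and an unweighted mean per league; the exact averaging used for the figures above isn't stated.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple


def avg_fouls_per_tackle(records: Iterable[Tuple[str, float]]) -> Dict[str, float]:
    """Mean fouls-per-tackle ratio by league, as a percentage rounded to 0.1.

    Each record is a hypothetical (league, fouls_per_tackle) pair for one
    referee; every referee counts equally regardless of games officiated.
    """
    totals: Dict[str, float] = defaultdict(float)
    counts: Dict[str, int] = defaultdict(int)
    for league, ratio in records:
        totals[league] += ratio
        counts[league] += 1
    return {league: round(100 * totals[league] / counts[league], 1) for league in totals}
```

Note that fouls per tackle is a ratio of two counts, so it reads as "fouls called per tackle made" rather than a strict probability for any single tackle.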
This tells my team that if they commit to a tackle in the Bundesliga, it will be called a foul roughly 75% of the time. As a coach, it tells me that I'll need to be acutely aware of my players' behavior on the field and of what to expect from referees in that league. In my opinion, referees are quick to call a foul in 4 out of 5 leagues. These figures do not take into account the context of each foul.
One point I want to raise in this section is that my players' behavior could influence whether the referee gives them the benefit of the doubt on a tackle. To reference the quote from above again: "know who you are dealing with, never offend the wrong person."
**Average Fouls Per Minute**
I was then curious how often referees call fouls. If referees in LaLiga call an average of 28 fouls per game, how many minutes go by before the next foul is called? To determine this:
- I know that a game is 90 minutes long, and I know the average fouls per game.
- Minutes per foul = 90 / (average fouls per game)
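The formula above translates directly into a small helper. As in the text, this sketch assumes a flat 90 minutes of regulation time and ignores stoppage time; the function name is illustrative.

```python
MATCH_MINUTES = 90.0  # regulation time; stoppage time ignored, as in the text


def minutes_per_foul(avg_fouls_per_game: float, match_minutes: float = MATCH_MINUTES) -> float:
    """Average minutes of game time between fouls: match length / fouls per game."""
    if avg_fouls_per_game <= 0:
        raise ValueError("average fouls per game must be positive")
    return match_minutes / avg_fouls_per_game


print(round(minutes_per_foul(28), 1))  # LaLiga's ~28 fouls per game -> 3.2
```

With 28 fouls per game this gives about 3.2 minutes, which matches the LaLiga figure quoted below.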
On average, I should expect 4.2 minutes of play before a foul is called in the Premier League, 3.5 minutes in Ligue 1, 3.4 minutes in Serie A, and 3.2 minutes in the Bundesliga and LaLiga. After a foul is called, there are about 10-40 seconds of recovery before play resumes, depending on the context.
Ignoring the context of what kinds of fouls are occurring and whether or not goals have been scored, I can conclude that referees have a habit of calling fouls quite often (2 or 3 fouls might have been called by the time you finish reading this blog post). Knowing this kind of data should influence training and game preparation.
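As a rough worked example, the two figures above can be combined to estimate the time lost to fouls in a LaLiga game averaging 28 fouls. This assumes that every foul stops play and that each restart takes 10-40 seconds; advantage, injuries, and other stoppages are ignored.

$$
t_{\text{stop}} = F \cdot r,\qquad F \approx 28,\; r \in [10, 40]\ \text{s}
\;\Rightarrow\; t_{\text{stop}} \in [280, 1120]\ \text{s} \approx [4.7, 18.7]\ \text{min}
$$

Under these assumptions, roughly 5 to 19 of the 90 minutes go to restarts after fouls.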
**Correlation Between Referee Appearances and Fouls Called Per Game?**
I assumed that less seasoned referees would be predisposed to calling more fouls per game in order to impress and, hopefully, officiate more games. This turned out to be unfounded: I used a linear regression plot to test the hypothesis, and the data did not support it.
As you can see, the error margin is wide. It turns out that there is no meaningful correlation between fouls called per game and referee appearances (the number of games a referee has officiated). This indicates to me that referees interpret the rules based on their own understanding. It also led me to believe further that being on the right side of the referee could be favorable for my team.
Referees are quick to call a foul, which means less playing time and fewer opportunities to score.
Referees implement the rules based on their interpretation.
- Being on the good side of the referee could work in our favor, meaning more playing time between fouls
Parts of the soccer training model could change.
- If a foul occurs every 3-4 minutes with 10-40 seconds of rest, then training could mimic that.
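As a sketch of what "training could mimic that" might look like, the helper below generates a drill schedule of 3-4 minute play intervals separated by 10-40 second rests, the ranges quoted earlier. The 20-minute session length and the uniform random draws are hypothetical choices, not recommendations from the analysis.

```python
import random
from typing import List, Tuple


def foul_rhythm_intervals(
    session_minutes: float = 20.0,               # hypothetical drill length
    play_range: Tuple[float, float] = (3.0, 4.0),   # minutes of play between fouls
    rest_range: Tuple[float, float] = (10.0, 40.0), # seconds of recovery after a foul
    seed: int = 0,
) -> List[Tuple[float, int]]:
    """Return (play_minutes, rest_seconds) pairs that fill the session.

    Play lengths and rests are drawn uniformly from the given ranges; the last
    play interval is shortened if needed so total play never exceeds the session.
    """
    rng = random.Random(seed)  # seeded so a schedule can be reproduced
    intervals: List[Tuple[float, int]] = []
    elapsed = 0.0  # minutes
    while elapsed < session_minutes:
        play = min(rng.uniform(*play_range), session_minutes - elapsed)
        elapsed += play
        rest = rng.uniform(*rest_range) if elapsed < session_minutes else 0.0
        elapsed += rest / 60
        intervals.append((round(play, 2), round(rest)))
    return intervals


for play, rest in foul_rhythm_intervals():
    print(f"play {play} min, rest {rest} s")
```

A coach could tighten the ranges per league, for example using the longer 4.2-minute gap for the Premier League.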