Expectations of Referees in Top 5 Soccer Leagues and How to Prepare

Posted on Apr 27, 2019
"Know who you are dealing with, never offend the wrong person." - Robert Greene (48 Laws of Power)

See my GitHub here


In the Bundesliga, LaLiga, Ligue 1, Premier League, and Serie A, referees are appointed to specific games. How would knowing the habits of the referee appointed to your game help you determine what to expect? What are the habits of the referees in each league? Should we adapt our game prep depending on the referee?

To return to the quote above, knowing which referee will officiate my team's game gives insight into what my team should expect from that game.


The objective of this analysis was to determine the habits of referees across the top 5 soccer (football) leagues over the past 9 seasons. I also wanted to work out how those habits could be turned into practical guidance for teams. While this is only a preliminary analysis, there is plenty to take away.

Web Scraping

To compile the necessary data, I used Selenium to scrape whoscored.com. I scraped season, league, referee name, games officiated, fouls per game, fouls per tackle, penalties per game, yellow cards per game, total yellow cards, red cards per game, and total red cards. For this project, I chose to focus on season, league, referee name, games officiated, fouls per game, and fouls per tackle.

I created a web scraper that loops through the table and scrapes the data from each of its pages. Below is an example of the whoscored.com page that I scraped.
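As a rough illustration of that page-by-page loop, here is a minimal sketch. It is not the scraper used for this post: the CSS selectors (`table tbody tr`, `a.next`), the function name `scrape_paginated_table`, and the `max_pages` cap are all assumptions made for the example, and a real scraper would also need to wait for each page to finish loading before reading it. The function accepts any Selenium-style driver object.

```python
def scrape_paginated_table(driver, row_selector="table tbody tr",
                           next_selector="a.next", max_pages=50):
    """Collect the cell text of every table row across paginated views.

    `driver` is expected to behave like a Selenium WebDriver. The default
    selectors are placeholders, not the real whoscored.com markup.
    """
    by_css = "css selector"  # the string value of Selenium's By.CSS_SELECTOR
    rows = []
    for _ in range(max_pages):
        for tr in driver.find_elements(by_css, row_selector):
            cells = [td.text.strip() for td in tr.find_elements(by_css, "td")]
            if cells:  # skip header or spacer rows that have no <td> cells
                rows.append(cells)
        next_links = driver.find_elements(by_css, next_selector)
        if not next_links or not next_links[0].is_enabled():
            break  # no usable "next" control, so this was the last page
        next_links[0].click()
    return rows
```

With Selenium installed, `driver` could be a browser driver that has already opened the referee statistics page, and each returned row could then be written to a CSV file with `csv.writer`.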


Data Cleaning

Once I had saved the scraped information to individual CSV files, I used Python to clean the data in preparation for analysis. The steps I used:

  • I loaded each file into its own DataFrame for seasonal and league analysis.
    • This was done intentionally to allow specific analysis of the 2017/2018 season.
  • I added two columns: the league and the season each referee was part of.

  • I also created a DataFrame to track the average fouls per game in each league over the course of 9 seasons, **which you will see as you continue to read.**
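The cleaning steps above might look something like the following pandas sketch. The column names (`referee`, `games`, `fouls_pg`, `fouls_per_tackle`), the referee names, and every number are invented for illustration; the in-memory strings simply stand in for the scraped CSV files.

```python
import io

import pandas as pd

# Stand-ins for two scraped CSV files. The referee names and numbers are
# invented for illustration; they are not values from the real data.
raw_files = {
    ("Premier League", "2017/2018"): (
        "referee,games,fouls_pg,fouls_per_tackle\n"
        "Referee A,20,21.0,0.60\n"
        "Referee B,15,22.5,0.62\n"
    ),
    ("LaLiga", "2017/2018"): (
        "referee,games,fouls_pg,fouls_per_tackle\n"
        "Referee C,18,28.0,0.71\n"
    ),
}

frames = {}
for (league, season), text in raw_files.items():
    df = pd.read_csv(io.StringIO(text))  # one DataFrame per file
    df["league"] = league                # the two added columns
    df["season"] = season
    frames[(league, season)] = df

# Stack everything for cross-league work, or filter down to one season.
all_refs = pd.concat(list(frames.values()), ignore_index=True)
season_2017_18 = all_refs[all_refs["season"] == "2017/2018"]
print(all_refs)
```

Keeping the per-file DataFrames in a dictionary keyed by league and season makes it easy to pull out a single league-season, while the stacked `all_refs` frame is convenient for comparisons across leagues.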

Data Analysis

**Fouls Per Game on Average**

To start my analysis, I wanted an idea of how often fouls are called per game on average. I analyzed this by season and then by league, because I thought it was important to see the trends and whether, and how, each league differed from season to season.

As you can see, in 3 out of 5 leagues there has been a big change in the number of fouls called per game. The Premier League changed quite a bit from the 2009/10 to the 2011/12 season, yet has remained relatively consistent since. LaLiga has been similarly consistent from the 2012/13 season through 2017/18.

This gives a glimpse into how referees interpret the rules and how each league's application of the rules laid out by the League and FIFA has evolved.
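For readers who want to reproduce this kind of season-by-league comparison, one possible approach is a pandas pivot table. The sketch below uses invented rows and an assumed column name (`fouls_pg`); it takes a simple mean across referees, which is an assumption, since the post does not say whether the averages were weighted by games officiated.

```python
import pandas as pd

# Invented rows, one per referee per season; not real whoscored.com values.
refs = pd.DataFrame({
    "league": ["LaLiga", "LaLiga", "LaLiga", "Premier League", "Premier League"],
    "season": ["2016/2017", "2016/2017", "2017/2018", "2016/2017", "2017/2018"],
    "fouls_pg": [27.0, 29.0, 28.5, 21.0, 22.0],
})

# Rows are seasons, columns are leagues, and each cell is the mean fouls per
# game across that league's referees in that season.
avg_fouls = refs.pivot_table(index="season", columns="league",
                             values="fouls_pg", aggfunc="mean")
print(avg_fouls.round(1))
```

A table shaped this way has one column per league, so each league's trend across seasons can be read straight down its column.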

**Average Fouls Per Tackle**

The next piece of information I wanted to address was fouls called per tackle. This ties back to one of my objectives, which is to identify the tendencies or habits of referees during a game. Below is a table of the average fouls called per tackle.

To put all the information into perspective, the average percentage of tackles considered a foul by league:

  1. Bundesliga: 75%
  2. LaLiga: 71.2%
  3. Ligue 1: 73.6%
  4. Premier League: 60.9%
  5. Serie A: 80.3%

This tells my team that if they commit to a tackle in the Bundesliga, roughly 75% of the time it will be called a foul. As a coach, it tells me I need to be acutely aware of my players' behavior on the field and of what to expect from referees in that league. In my opinion, referees are quick to call a foul in 4 out of 5 leagues. Note that these figures do not account for the context of each foul.

One point I want to make in this section is that my players' behavior could influence whether the referee gives them the benefit of the doubt on a tackle. To return to the quote above: "know who you are dealing with, never offend the wrong person."
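To make those percentages concrete, here is a small sketch that takes the post's reading of the ratio at face value and turns it into an expected number of fouls for a given number of tackles. The function name `expected_fouls` and the choice of 20 tackles are just for illustration.

```python
# Share of tackles called as fouls, copied from the list above.
FOUL_SHARE = {
    "Bundesliga": 0.750,
    "LaLiga": 0.712,
    "Ligue 1": 0.736,
    "Premier League": 0.609,
    "Serie A": 0.803,
}

def expected_fouls(league, tackles):
    """Expected number of fouls if `tackles` tackles are attempted."""
    return FOUL_SHARE[league] * tackles

for league in FOUL_SHARE:
    print(f"{league}: about {expected_fouls(league, 20):.1f} fouls per 20 tackles")
```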

**Average Minutes Per Foul**

I was then curious how often referees call fouls. If referees in LaLiga call an average of 28 fouls per game, how many minutes go by until the next foul is called? To determine this:

  • I know a game is 90 minutes long, and I know the average fouls per game.
    • 90/(average fouls per game) = How Many Minutes Per Foul

On average, I should expect 4.2 minutes of play before a foul is called in the Premier League, 3.5 minutes in Ligue 1, 3.4 minutes in Serie A, and 3.2 minutes in the Bundesliga and LaLiga. After a foul is called, there are about 10-40 seconds of recovery before play resumes (depending on context).

Ignoring the context of what kind of fouls are occurring and whether or not goals have been scored, I can conclude that referees call fouls quite often (2 or 3 fouls might have been called by the time you finish reading this blog post). This kind of data should influence training and game preparation.
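The arithmetic above is simple enough to sketch in a few lines. The example below applies the 90-minute formula to the LaLiga figure of 28 fouls per game, then estimates how much time the 10-40 second recovery windows add up to over a game. The function names are illustrative, and the stoppage estimate assumes every foul costs the same amount of time, which is a simplification.

```python
GAME_MINUTES = 90

def minutes_per_foul(fouls_per_game):
    """Average minutes between fouls in a 90-minute game."""
    return GAME_MINUTES / fouls_per_game

def stoppage_minutes(fouls_per_game, seconds_per_foul):
    """Total minutes lost if every foul stops play for `seconds_per_foul`."""
    return fouls_per_game * seconds_per_foul / 60

laliga_fouls = 28  # the LaLiga figure quoted above
print(round(minutes_per_foul(laliga_fouls), 1))  # 3.2

low = stoppage_minutes(laliga_fouls, 10)
high = stoppage_minutes(laliga_fouls, 40)
print(f"{low:.1f} to {high:.1f} minutes of stoppage per game")
```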

**Correlation Between Ref Appearances and Fouls Per Game Called?**

My assumption was that less seasoned referees were predisposed to calling more fouls per game in order to impress and, hopefully, officiate more games. This turned out to be unfounded: a linear regression plot showed my hypothesis to be untrue.

As you can see, the error margin is wide. It turns out that there is no correlation between fouls per game called and referee appearances (games a referee has officiated). This indicates to me that referees interpret the rules based on their own understanding, which led me to believe even more strongly that being on the right side of the referee could be favorable for my team.
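A numeric check can complement a regression plot. The sketch below computes the Pearson correlation coefficient and the least-squares slope by hand, using only the standard library. The appearance and foul values are invented for illustration and are not the scraped data, so the printed numbers say nothing about real referees.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

def ols_slope(xs, ys):
    """Slope b of the least-squares line y = a + b * x."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    return cov / var_x

# Invented example values: games officiated and fouls called per game.
appearances = [5, 12, 20, 27, 33, 40]
fouls_pg = [26.0, 23.5, 27.0, 24.0, 26.5, 24.5]

print(f"r = {pearson_r(appearances, fouls_pg):.2f}")
print(f"slope = {ols_slope(appearances, fouls_pg):.3f} fouls per extra game")
```

A coefficient near 0, together with a slope near 0, would be consistent with the "no correlation" reading; values close to 1 or -1 would contradict it.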

Final Thoughts


Referees are quick to call a foul, meaning less playing time and fewer opportunities to score.

Referees implement the rules based on their interpretation.

  • Being on the good side of the referee could be in our favor, meaning more playing time between fouls

Parts of the soccer training model could change.

  • If a foul occurs every 3-4 minutes with 10-40 seconds of rest, then training could mimic that.
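As one hypothetical way to turn that last idea into a drill plan, the sketch below generates a session of play and rest blocks using the 3-4 minute and 10-40 second ranges from this post. The function name `build_session`, the 30-minute default, and the use of random durations are all assumptions made for the example, not a tested training method.

```python
import random

def build_session(total_minutes=30, play_range=(180, 240),
                  rest_range=(10, 40), seed=None):
    """Return a list of (play_seconds, rest_seconds) blocks.

    Play blocks last 3-4 minutes and rest blocks 10-40 seconds by default,
    and blocks are added only while they still fit in the session.
    """
    rng = random.Random(seed)  # pass a seed to get a repeatable plan
    limit = total_minutes * 60
    blocks, elapsed = [], 0
    while True:
        play = rng.randint(*play_range)  # randint includes both endpoints
        rest = rng.randint(*rest_range)
        if elapsed + play + rest > limit:
            break
        blocks.append((play, rest))
        elapsed += play + rest
    return blocks

for number, (play, rest) in enumerate(build_session(seed=1), start=1):
    print(f"Block {number}: play {play}s, rest {rest}s")
```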

About Author

Kyle Greeley

Kyle Greeley has a B.S. degree in Kinesiology from Springfield College. While teaching mathematics in Texas and NYC, Kyle discovered his real passion for understanding data. Through NYC Data Science Academy, Kyle has become a machine learning enthusiast...