Data Study on Expectations of Referees in Top Soccer Leagues

Posted on Apr 27, 2019
"Know who you are dealing with, never offend the wrong person." -
Robert Greene (48 Laws of Power)

See my GitHub here


In the Bundesliga, LaLiga, Ligue 1, Premier League, and Serie A, records show which referee is appointed to each game. How would knowing the habits of the referee appointed to your game help you know what to expect? What are the habits of the referees in each league? Should we adapt our game prep depending on the referee?

To return to the quote above: knowing which referee will officiate my team's game gives insight into what my team should expect from that game.


The objective of this analysis was to determine the habits of referees across the top 5 soccer (football) leagues over the past 9 seasons. I also wanted to work out how those habits could be turned into practical applications for teams. While this is only a preliminary analysis, there is much to take away.

Web Scraping Data

To compile the necessary data, I used Selenium for the scraping. I scraped season, league, referee name, games officiated, fouls per game, fouls per tackle, penalties per game, yellow cards per game, yellow cards total, red cards per game, and red cards total. For this project, I chose to focus on season, league, referee name, games officiated, fouls per game, and fouls per tackle.

I created a web scraper that looped through the table, scraping the data from multiple pages. Below is an example of the website I scraped.
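
A minimal sketch of how a paging scraper like this could look. It is not the author's code: the column order, the `a.next` selector, and the function names are assumptions made for illustration, and it assumes the site replaces the table rows when the next page loads.

```python
# Assumed column order; the fields are the ones listed in the text.
COLUMNS = [
    "referee", "games", "fouls_per_game", "fouls_per_tackle",
    "penalties_per_game", "yellows_per_game", "yellows_total",
    "reds_per_game", "reds_total",
]


def row_to_record(cells, season, league):
    """Turn one table row (a list of cell strings) into a labeled record."""
    record = {"season": season, "league": league}
    record.update(zip(COLUMNS, (c.strip() for c in cells)))
    return record


def scrape_all_pages(url, season, league, max_pages=50):
    """Walk a paginated table with Selenium and collect every row."""
    # Imported here so row_to_record can be used without Selenium installed.
    from selenium import webdriver
    from selenium.common.exceptions import (
        NoSuchElementException, TimeoutException,
    )
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
    from selenium.webdriver.support.ui import WebDriverWait

    driver = webdriver.Chrome()
    records = []
    try:
        driver.get(url)
        for _ in range(max_pages):
            rows = driver.find_elements(By.CSS_SELECTOR, "table tbody tr")
            if not rows:
                break
            for row in rows:
                cells = [td.text for td in row.find_elements(By.TAG_NAME, "td")]
                if cells:
                    records.append(row_to_record(cells, season, league))
            try:
                # Hypothetical selector for the "next page" control.
                driver.find_element(By.CSS_SELECTOR, "a.next").click()
                # Wait until the old rows are replaced by the next page.
                WebDriverWait(driver, 10).until(EC.staleness_of(rows[0]))
            except (NoSuchElementException, TimeoutException):
                break  # no further pages
    finally:
        driver.quit()
    return records


# The row parser can be tried without a browser (placeholder values):
example = row_to_record(
    ["Ref A", "20", "21.5", "0.7", "0.2", "3.1", "62", "0.1", "2"],
    season="2017/2018",
    league="Premier League",
)
print(example["referee"], example["fouls_per_game"])
```

Keeping the row parsing separate from the browser loop means the records can be checked or written to CSV without launching a browser.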



Data Cleaning

Once I extracted the information from individual CSV files, I used Python to clean the data in preparation for analysis. The steps I used:

  • I assigned each file to its own DataFrame for seasonal and league analysis
    • This was done intentionally to allow specific analysis of the 2017/2018 season
  • I added two columns: the league and the season each referee was part of.


  • I also created a DataFrame to track the average fouls per game for each league over the course of the 9 seasons, which appears later in this post.
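
The cleaning steps above can be sketched roughly as follows with pandas. The file contents, column names, and numbers here are placeholders rather than the scraped data, and the plain (unweighted) mean across referees is an assumption about how the averages were computed.

```python
import io

import pandas as pd

# Hypothetical stand-ins for two scraped CSV files, keyed by (league, season).
# Column names and values are placeholders, not the author's data.
RAW_FILES = {
    ("Premier League", "2017/2018"):
        "referee,games,fouls_per_game\nRef A,20,21.0\nRef B,10,23.0\n",
    ("LaLiga", "2017/2018"):
        "referee,games,fouls_per_game\nRef C,15,27.0\nRef D,15,29.0\n",
}


def load_tagged(csv_text, league, season):
    """Read one file into its own DataFrame and tag it with league and season."""
    df = pd.read_csv(io.StringIO(csv_text))
    df["league"] = league
    df["season"] = season
    return df


# One DataFrame per file, as in the first bullet.
frames = {key: load_tagged(text, *key) for key, text in RAW_FILES.items()}

# Stack the per-file frames, then average fouls per game by league and season.
combined = pd.concat(list(frames.values()), ignore_index=True)
avg_fouls = (
    combined.groupby(["league", "season"])["fouls_per_game"]
    .mean()
    .unstack("season")
)
print(avg_fouls)
```

With real files, `pd.read_csv(path)` would replace the `io.StringIO` wrapper. Weighting each referee by games officiated would be a reasonable alternative to the simple mean.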

Data Analysis

**Fouls Per Game on Average**

To start my analysis, I wanted an idea of how many fouls are called per game on average. I analyzed this by season and then by league. I thought it was important to see the trends and whether, and how, each league differed from season to season.


As you can see, in 3 out of 5 leagues, there has been a big change in the number of fouls called per game. The Premier League changed quite a bit from the 2009/10 to the 2011/12 season, yet has remained relatively consistent since. The same could be said of LaLiga from the 2012/13 season through 2017/18.

This gives a glimpse into how referees interpret the rules and how each league's enforcement of the rules laid out by the league and FIFA has evolved.

**Average Fouls Per Tackle**

The next bit of information I wanted to address was fouls called per tackle. This ties back to one of my objectives, which is to see the tendencies or habits of referees during a game. Below is a table of the average fouls called per tackle.

To put the information into perspective, here is the average percentage of tackles considered a foul, by league:

  1. Bundesliga: 75%
  2. LaLiga: 71.2%
  3. Ligue 1: 73.6%
  4. Premier League: 60.9%
  5. Serie A: 80.3%

Taking fouls per tackle at face value, this tells my team that if they commit to a tackle in the Bundesliga, roughly 75% of the time it will be called a foul. As a coach, it tells me that I need to be acutely aware of my players' behavior on the field and of what to expect from referees in that league. In my opinion, referees are quick to call a foul in 4 out of 5 leagues. These figures do not account for the context of each foul.

One point I want to address in this section is that my players' behavior could influence whether the referee gives them the benefit of the doubt on a tackle. I reference again the quote from above: "Know who you are dealing with, never offend the wrong person."

**Average Fouls Per Minute**

I was then curious how often referees call fouls. If referees in LaLiga call an average of 28 fouls per game, how many minutes go by until the next foul is called? To determine this:

  • I know a game is 90 minutes long, and I know the average fouls per game.
    • 90 / (average fouls per game) = minutes per foul
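
As a quick worked example of that formula, here is a minimal sketch using the LaLiga figure of 28 fouls per game mentioned above. The function name is illustrative, and stoppage time is ignored.

```python
MATCH_MINUTES = 90  # regulation time only; stoppage time is ignored


def minutes_per_foul(avg_fouls_per_game):
    """Average minutes of play between fouls."""
    if avg_fouls_per_game <= 0:
        raise ValueError("average fouls per game must be positive")
    return MATCH_MINUTES / avg_fouls_per_game


# LaLiga example from the text: about 28 fouls per game.
print(round(minutes_per_foul(28), 1))  # 3.2
```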

On average, I should expect 4.2 minutes of play before a foul is called in the Premier League, 3.5 minutes in Ligue 1, 3.4 minutes in Serie A, and 3.2 minutes in both the Bundesliga and LaLiga. After a foul is called, there are about 10-40 seconds of recovery before play resumes, depending on context.

Ignoring what kinds of fouls are occurring and whether or not goals have been scored, I can conclude that referees call fouls quite often (2 or 3 fouls might have been called by the time you finish reading this blog post). Data like this should influence training and game preparation.

**Correlation Between Ref Appearances and Fouls Per Game Called?**

I had assumed that less seasoned referees were predisposed to calling more fouls per game in order to impress and, hopefully, officiate more games. This turned out to be unfounded: a linear regression plot showed that my hypothesis did not hold.

As you can see, the error margin is wide. It turns out that there is no correlation between fouls per game called and referee appearances (games a referee has officiated). This indicates to me that referees interpret the rules based on their own understanding. It led me to believe further that being on the right side of the referee could be favorable for my team.

Final Thoughts


Referees are quick to call a foul, meaning less playing time and opportunities to score.

Referees implement the rules based on their interpretation.

  • Being on the good side of the referee could be in our favor, meaning more playing time between fouls

Parts of the soccer training model could change.

  • If a foul occurs every 3-4 minutes with 10-40 seconds of rest, then training could mimic that.

About Author

Kyle Greeley

Kyle Greeley has a B.S. degree in Kinesiology from Springfield College. While teaching mathematics in Texas and NYC, Kyle discovered his real passion for understanding data. Through NYC Data Science Academy, Kyle has become a machine learning enthusiast...
