Using Data Analytics to Manage Soccer

Posted on Jul 31, 2018
The skills the author demoed here can be learned through the Data Science with Machine Learning bootcamp at NYC Data Science Academy.


Most soccer data sets are limited in what they provide: Goals, Shots, Fouls, Cards, and that's about it. But there is so much more to the beautiful game than those not-so-flavorful stats. To start, the context in which those basic stats happen is very important! For example, saying that two different teams each landed 6 out of 10 shots on goal does not paint a clear picture of either team's actual abilities.

When you pair the shot statistics with the range of each shot, where on the pitch it was taken, and its outcome (blocked, over the bar, or in the back of the net), you add immeasurably more insight into not only the abilities of the players taking those shots but also the flow of the game!
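As a rough illustration of that point, here is a minimal Python sketch that summarizes shots with their context rather than as a bare count. The `Shot` record, its fields (`distance_m`, `zone`, `outcome`), the outcome labels, and the two teams' shots are all hypothetical; the post does not describe the schema of the data set it used.

```python
from collections import Counter, defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Shot:
    team: str
    distance_m: float  # hypothetical: distance from goal in metres
    zone: str          # hypothetical: "box" or "outside_box"
    outcome: str       # hypothetical labels: "goal", "saved", "blocked", "off_target"


ON_TARGET = {"goal", "saved"}  # assumption: on target = scored or saved


def shot_profile(shots):
    """Group shots by team and describe them with context, not just a count."""
    by_team = defaultdict(list)
    for shot in shots:
        by_team[shot.team].append(shot)

    profiles = {}
    for team, team_shots in by_team.items():
        n = len(team_shots)
        outcomes = Counter(s.outcome for s in team_shots)
        profiles[team] = {
            "shots": n,
            "on_target": sum(outcomes[o] for o in ON_TARGET),
            "goals": outcomes["goal"],
            "from_box": sum(1 for s in team_shots if s.zone == "box"),
            "avg_distance_m": round(sum(s.distance_m for s in team_shots) / n, 1),
            "outcomes": dict(outcomes),
        }
    return profiles


# Hypothetical shots: same headline numbers, very different quality.
shots = (
    [Shot("A", 9.0, "box", "goal")] * 3
    + [Shot("A", 11.0, "box", "saved")] * 3
    + [Shot("A", 14.0, "box", "blocked")] * 4
    + [Shot("B", 28.0, "outside_box", "saved")] * 6
    + [Shot("B", 30.0, "outside_box", "off_target")] * 4
)

for team, profile in shot_profile(shots).items():
    print(team, profile)
```

Both made-up teams come out with 10 shots and 6 on target, yet team A shot only from inside the box at an average of 11.6 m and scored 3, while team B shot from an average of 28.8 m and scored none.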


Goals, Goals, Goals!! It's all about Goals!

And Cristiano Ronaldo scores!!! Again!! And again! And again! He will surely carry us to the cup final, and together, on his back, we will hoist that trophy high into the air!



Strikers scoring goals. That is what the game is about, right? Well, nobody can argue that it isn't what makes or breaks the excitement of the game. However, the old adage "Offense wins games, but defense wins championships" has always held true in my own soccer career; let's see how well it stacks up for the pros.



As we can see from the graphs above, teams that score more goals generally have a higher win rate, and conversely, teams that are scored on more often generally have a lower win rate. This makes sense; that is how you win at soccer, after all: score more than your opponent and tally one in the win column. But does one have a larger effect than the other? Does scoring more on your opponents or being scored on have a larger impact on your win rate? We will first look at how a team's win rate changes based on goals scored.
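Before any of these graphs can be drawn, match results have to be rolled up into per-team rates. Here is a minimal Python sketch of that step, assuming each match is a hypothetical `(home, away, home_goals, away_goals)` tuple and that win rate means wins divided by games played, with draws counted as non-wins; the post does not say how its source data was structured or how draws were treated.

```python
from collections import defaultdict


def team_stats(matches):
    """Roll (home, away, home_goals, away_goals) results up into per-team rates."""
    totals = defaultdict(lambda: {"games": 0, "wins": 0, "gf": 0, "ga": 0})
    for home, away, home_goals, away_goals in matches:
        # Credit each match to both sides, from each side's point of view.
        for team, scored, conceded in (
            (home, home_goals, away_goals),
            (away, away_goals, home_goals),
        ):
            t = totals[team]
            t["games"] += 1
            t["gf"] += scored
            t["ga"] += conceded
            t["wins"] += int(scored > conceded)  # draws and losses count as non-wins
    return {
        team: {
            "goals_for_per_game": t["gf"] / t["games"],
            "goals_against_per_game": t["ga"] / t["games"],
            "win_rate": t["wins"] / t["games"],
        }
        for team, t in totals.items()
    }


# Hypothetical results, for illustration only.
matches = [
    ("Reds", "Blues", 2, 1),
    ("Blues", "Greens", 0, 0),
    ("Greens", "Reds", 3, 1),
    ("Reds", "Greens", 1, 1),
]
for team, row in team_stats(matches).items():
    print(team, row)
```

Each team's row supplies one bubble for the graphs: its goals for and goals against per game, and its win rate.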

The graph below shows Goals For vs. Win Rate and the size of each bubble depicts Goals Against as an added feature.

[Figure: Goals For vs. Win Rate; bubble size shows Goals Against]

As we might expect, teams that score more per game also have higher win rates. In addition, teams with larger bubbles, that is, teams that were scored on more, tend to sit at the lower end of the win rate scale. Let us now look at Goals Against vs. Win Rate, this time with Goals For as the added feature.


Again, this is what we would expect to see: the more goals against, the lower the win rate, with the larger bubbles (more goals scored) nearer the upper end of the win rate scale. But the truly interesting thing is the comparison between the two graphs. Look at the slope of the Goals For vs. Win Rate graph compared with the slope of the Goals Against vs. Win Rate graph. Each slope describes how much a team's win rate changes per additional goal per game, and the steeper the slope, the bigger the impact.
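The slope comparison can be made concrete with an ordinary least-squares fit of win rate on each goal measure separately, which is what the two graphs show visually. This is a minimal Python sketch; the `(goals_for_per_game, goals_against_per_game, win_rate)` tuple layout and the toy numbers are assumptions, and the post does not state how its slopes were computed.

```python
def ols_slope(xs, ys):
    """Least-squares slope of ys on xs: sum of co-deviations / sum of squared x deviations."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    return sxy / sxx


def goal_effects(teams):
    """teams: (goals_for_per_game, goals_against_per_game, win_rate) per team.

    Returns the change in win rate for one extra goal per game, from two
    separate one-variable fits (one per graph).
    """
    goals_for = [t[0] for t in teams]
    goals_against = [t[1] for t in teams]
    win_rate = [t[2] for t in teams]
    return {
        "goals_for": ols_slope(goals_for, win_rate),
        "goals_against": ols_slope(goals_against, win_rate),
    }


def stronger_effect(effects):
    """Name the measure whose slope is steeper in absolute terms."""
    return max(effects, key=lambda k: abs(effects[k]))


# Made-up toy numbers, only to exercise the functions.
teams = [(1.0, 1.0, 0.4), (2.0, 1.0, 0.5), (1.0, 2.0, 0.1)]
effects = goal_effects(teams)
print(effects, stronger_effect(effects))
```

Comparing absolute values matters because the Goals Against slope is negative. Because each slope comes from its own one-variable fit, neither controls for the other, just as in the graphs.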

Alas, Goals Against has a much steeper (negative) slope than Goals For does! This means that being scored on has a much larger impact on a team's win rate than scoring more goals. Not what we'd expect, nor what makes for an energized crowd!


To paraphrase Michael Lewis in Moneyball: The Art of Winning an Unfair Game: to the extent you can eliminate beliefs and bias and replace them with data, you gain a clear advantage. If I were managing a team, I would devote more financial resources and recruiting effort to top-tier defenders instead of flashy strikers.

About Author

Will Thurston

Will is currently a student at New York City Data Science Academy. He graduated from Rochester Institute of Technology with a BS in Computer Security in 2016. He then spent the following year as a Network Engineer gaining...
