Managing Soccer Through Analytics

Posted on Jul 31, 2018


Most soccer datasets are limited in what they provide: Goals, Shots, Fouls, Cards, and that's about it. But there is so much more to the beautiful game than those not-so-flavorful stats. To start, the context in which those basic stats happen is very important! For example, saying two different teams each converted 6 of their 10 shots on goal does not paint a good picture of either team's actual abilities. When you pair the shot statistics with the range of each shot, where on the pitch it was taken, and its outcome (blocked, over the bar, or in the back of the net), you add immeasurably more insight into not only the abilities of the players taking those shots but also the flow of the game!
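The point about context can be made concrete with a small record type. This is a minimal sketch under assumptions: the `Shot` fields, the zone and outcome labels, and the `summarize` helper are hypothetical and are not the schema of the dataset used in this post. The six shots are made-up values chosen so that both teams score 1 goal from 3 shots, yet their shot profiles differ.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class Shot:
    """One shot attempt plus its context (hypothetical schema)."""
    team: str
    distance_m: float  # range of the shot, in metres
    zone: str          # where on the pitch it was taken, e.g. "inside_box"
    outcome: str       # e.g. "goal", "saved", "blocked", "off_target"


def summarize(shots, team):
    """Return shot count, outcome counts and mean shot distance for one team."""
    own = [s for s in shots if s.team == team]
    if not own:
        return {"shots": 0, "outcomes": {}, "mean_distance_m": None}
    return {
        "shots": len(own),
        "outcomes": dict(Counter(s.outcome for s in own)),
        "mean_distance_m": sum(s.distance_m for s in own) / len(own),
    }


# Made-up example: identical goals/shots ratio, very different shot profiles.
shots = [
    Shot("A", 8.0, "inside_box", "goal"),
    Shot("A", 10.0, "inside_box", "saved"),
    Shot("A", 12.0, "inside_box", "blocked"),
    Shot("B", 25.0, "outside_box", "goal"),
    Shot("B", 28.0, "outside_box", "off_target"),
    Shot("B", 31.0, "outside_box", "off_target"),
]

for team in ("A", "B"):
    print(team, summarize(shots, team))
```

A flat "1 goal from 3 shots" line hides that team A worked its chances close in while team B relied on long-range efforts.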


Goals, Goals, Goals!! It's all about Goals!

And Cristiano Ronaldo scores!!! Again!! And again! And again! He will surely carry us to the land of the cup finals, and together, on his back, we will hoist that trophy high into the air!

Strikers scoring goals. That is what the game is about, right? Well, nobody can argue that it isn't what makes or breaks the excitement of the game. However, the old adage "Offense wins games, but Defense wins championships" has always rung true in my own soccer career; let's see how well it stacks up for the pros.



As we can see from the graphs above, teams that score more goals generally have a higher win rate and, conversely, teams that are scored on more often generally have a lower win rate. This makes sense; that is how you win at soccer, after all: score more than your opponent and tally one in the win column. But does one have a larger effect than the other? Does scoring more goals or being scored on have the larger impact on win rate? We will first look at how a team's win rate changes with goals scored. The graph below plots Goals For vs. Win Rate, with the size of each bubble showing Goals Against as an added feature.
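The per-team quantities in these graphs (goals for per game, goals against per game, win rate) can be derived from match results. This is a minimal sketch, assuming match-level rows of the form `(home, away, home_goals, away_goals)`; the row format, the `team_table` name, and counting draws as non-wins are assumptions, not necessarily how the original dataset was processed.

```python
from collections import defaultdict


def team_table(matches):
    """Aggregate (home, away, home_goals, away_goals) rows into per-team rates.

    Win rate is wins / games played; draws count as non-wins (an assumption).
    """
    stats = defaultdict(lambda: {"games": 0, "wins": 0, "gf": 0, "ga": 0})
    for home, away, home_goals, away_goals in matches:
        # Credit each match to both sides, from each side's point of view.
        for team, scored, conceded in ((home, home_goals, away_goals),
                                       (away, away_goals, home_goals)):
            s = stats[team]
            s["games"] += 1
            s["gf"] += scored
            s["ga"] += conceded
            if scored > conceded:
                s["wins"] += 1
    return {
        team: {
            "gf_per_game": s["gf"] / s["games"],
            "ga_per_game": s["ga"] / s["games"],
            "win_rate": s["wins"] / s["games"],
        }
        for team, s in stats.items()
    }


# Made-up fixtures, only to show the shape of the output.
matches = [("A", "B", 2, 0), ("B", "C", 1, 1), ("C", "A", 0, 3)]
for team, row in sorted(team_table(matches).items()):
    print(team, row)
```

Normalizing by games played keeps teams with different schedule lengths on the same per-game scale, which is what makes the two axes comparable.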


As we might expect, teams that score more per game also have higher win rates. Teams that were scored on more (the larger bubbles) also tend to sit toward the lower end of the win rate scale. Let us now look at Goals Against vs. Win Rate, this time with Goals For as the added feature.

Again, this is what we would expect to see: the more goals against, the lower the win rate, with the larger bubbles (teams that score more) nearer the upper end of the win rate scale. But the truly interesting thing appears when we compare the two graphs. Look at the slope of the Goals For vs. Win Rate graph compared with the slope of the Goals Against vs. Win Rate graph. Each slope describes how much a team's win rate changes for each additional goal per game, so the steeper the slope, the larger the impact. Alas, Goals Against has a much steeper slope (in magnitude, since it slopes downward) than Goals For does! This means that being scored on has a much larger impact on a team's win rate than scoring more goals! Not what we'd expect, nor what makes for an energized crowd!
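The slope comparison can be expressed numerically with an ordinary least-squares fit of win rate on each variable separately, which mirrors reading the two graphs side by side. This is a minimal sketch: `ols_slope` and `compare_slopes` are hypothetical helpers, and the toy numbers below are made up purely to exercise them; they are not the data or results from this post.

```python
def ols_slope(xs, ys):
    """Least-squares slope of ys on xs: sum((x-mx)(y-my)) / sum((x-mx)^2)."""
    n = len(xs)
    if n < 2 or n != len(ys):
        raise ValueError("need two equal-length sequences with at least 2 points")
    mx = sum(xs) / n
    my = sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("xs has no variance")
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return sxy / sxx


def compare_slopes(gf_per_game, ga_per_game, win_rate):
    """Fit win rate on each variable separately; report which slope is steeper."""
    b_for = ols_slope(gf_per_game, win_rate)
    b_against = ols_slope(ga_per_game, win_rate)
    # Compare magnitudes: the Goals Against slope is expected to be negative.
    steeper = "goals against" if abs(b_against) > abs(b_for) else "goals for"
    return b_for, b_against, steeper


# Toy numbers only (not the post's data): four hypothetical teams.
gf = [1.0, 2.0, 3.0, 4.0]
ga = [2.5, 2.0, 1.5, 1.0]
wr = [0.2, 0.4, 0.6, 0.8]
print(compare_slopes(gf, ga, wr))
```

Because both variables are in the same unit (goals per game), their slopes are directly comparable. Fitting win rate on both variables jointly would instead estimate each effect while holding the other fixed.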


To paraphrase Michael Lewis in Moneyball: The Art of Winning an Unfair Game: to the extent you can eliminate beliefs and bias and replace them with data, you gain a clear advantage. If I were managing a team, I would devote more financial resources and recruiting effort to signing top-tier defenders instead of flashy strikers.

About Author

Will Thurston

Will is currently a student at New York City Data Science Academy. He graduated from Rochester Institute of Technology with a BS in Computer Security in 2016. He then spent the following year as a Network Engineer gaining...
