Using Data Analytics to Manage Soccer
The skills the author demonstrates here can be learned by taking the Data Science with Machine Learning bootcamp at NYC Data Science Academy.
Most soccer data sets are limited in what they provide: goals, shots, fouls, cards, and that's about it. But there is so much more to the beautiful game of soccer than just those not-so-flavorful stats. To start, the context in which those basic stats happen is very important! For example, saying two different teams each scored on 6 out of 10 shots on goal does not paint a good picture of either team's actual abilities.
When you pair the shot statistics with the range of the shot, the location on the pitch where it was taken, and its outcome (blocked, over the top, or back of the net), you gain far more insight into not only the abilities of the players taking these shots but also the flow of the game!
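To make the point concrete, here is a minimal sketch of what a context-rich shot record could look like. The `Shot` fields, the zone labels, and every number below are invented for illustration and are not taken from the data set used in this post. Both hypothetical teams score on 6 of 10 shots, yet the added context tells two very different stories.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Shot:
    """One shot with context (hypothetical schema, not the author's)."""
    team: str
    distance_m: float  # distance from goal in metres
    zone: str          # where on the pitch the shot was taken
    outcome: str       # "goal", "saved", "blocked", or "over"


def summarize(shots):
    """Return conversion rate, mean shot distance, and outcome counts."""
    goals = sum(1 for s in shots if s.outcome == "goal")
    return {
        "conversion": goals / len(shots),
        "mean_distance_m": sum(s.distance_m for s in shots) / len(shots),
        "outcomes": Counter(s.outcome for s in shots),
    }


# Invented data: both teams convert 6 of 10 shots.
team_a = ([Shot("A", 6.0, "six-yard box", "goal")] * 6
          + [Shot("A", 8.0, "penalty area", "saved")] * 4)
team_b = ([Shot("B", 22.0, "outside box", "goal")] * 6
          + [Shot("B", 25.0, "outside box", "blocked")] * 4)

summary_a = summarize(team_a)
summary_b = summarize(team_b)
print(summary_a)
print(summary_b)
```

The conversion rates match, but in this made-up example Team A is finishing close-range chances while Team B is scoring from distance and having the rest blocked, which says something different about both the shooters and the defenses they faced.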
Goals, Goals, Goals!! It's all about Goals!
And Cristiano Ronaldo scores!!! Again!! And again! And again! He will surely carry us to the land of the cup finals and we together, on his back, will hoist that trophy high into the air!
Strikers scoring goals. That is what the game is about, right? Well, nobody can deny that goals are what make or break the excitement of the game. However, the old adage "Offense wins games, but defense wins championships" has always held true in my personal soccer career; let's see how well that stacks up for the pros.
As we can see from the graphs above, teams that score more goals generally have a higher win rate and conversely teams that are scored on more often generally have a lower win rate. This makes sense, that is how you win at soccer after all: Score more than your opponent and tally one in the win column. But does one have a larger effect over the other? Does scoring more on your opponents or being scored on have a larger impact on your win rate? We will first look at how a team's win rate changes based on goals scored.
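For readers who want to build the same kind of per-team figures, here is a rough sketch of how win rate, goals for per game, and goals against per game could be derived from raw match results. The team names, the scores, and the `team_table` helper are all hypothetical; the post does not show how its own numbers were computed.

```python
from collections import defaultdict

# Hypothetical results: (home_team, away_team, home_goals, away_goals)
matches = [
    ("Lions", "Tigers", 2, 0),
    ("Tigers", "Bears", 1, 1),
    ("Bears", "Lions", 0, 3),
    ("Tigers", "Lions", 2, 1),
]


def team_table(matches):
    """Aggregate per-team win rate and goals for/against per game."""
    totals = defaultdict(lambda: {"played": 0, "wins": 0, "gf": 0, "ga": 0})
    for home, away, home_goals, away_goals in matches:
        # Record the match once from each team's point of view.
        for team, scored, conceded in ((home, home_goals, away_goals),
                                       (away, away_goals, home_goals)):
            t = totals[team]
            t["played"] += 1
            t["gf"] += scored
            t["ga"] += conceded
            t["wins"] += int(scored > conceded)  # draws count as non-wins
    return {
        team: {
            "win_rate": t["wins"] / t["played"],
            "gf_per_game": t["gf"] / t["played"],
            "ga_per_game": t["ga"] / t["played"],
        }
        for team, t in totals.items()
    }


table = team_table(matches)
for team, stats in table.items():
    print(team, stats)
```

Treating a draw as a non-win is an assumption made here for simplicity; a points-based measure would handle draws differently.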
The graph below shows Goals For vs. Win Rate and the size of each bubble depicts Goals Against as an added feature.
As we might expect, teams that score more per game also have higher win rates. In addition, the larger bubbles, which mark teams that were scored on more, tend to sit at the lower end of the win rate scale. Let us now look at Goals Against vs. Win Rate, but this time with Goals For as the added feature.
Again, this is what we would expect to see: the more goals against, the lower the win rate, with the larger bubbles sitting nearer the upper end of the win rate scale. But the truly interesting thing comes from comparing the graphs. Look at the slope of the Goals For vs. Win Rate graph compared to the slope of the Goals Against vs. Win Rate graph. The slope of each graph describes the effect of each additional goal on win rate, and the steeper the slope, the larger the impact.
Alas, Goals Against has a much steeper slope than Goals For does! This suggests that being scored on has a much larger impact on a team's win rate than scoring more goals! Not what we'd expect, nor what makes for an energized crowd!
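The slope comparison can be sketched with an ordinary least-squares fit of win rate against each variable. This is only an illustration of the idea: the post does not say how its trend lines were fitted, the `ols_slope` helper is hypothetical, and the per-team numbers are invented so the example runs. They are not evidence for the conclusion.

```python
def ols_slope(x, y):
    """Least-squares slope of y regressed on x (simple linear regression)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    return sxy / sxx


# Invented per-team season figures (goals are per game).
goals_for = [1.0, 1.2, 1.4, 1.6, 1.8]
goals_against = [1.6, 1.4, 1.3, 1.1, 1.0]
win_rate = [0.25, 0.35, 0.40, 0.55, 0.60]

slope_for = ols_slope(goals_for, win_rate)
slope_against = ols_slope(goals_against, win_rate)

# Compare magnitudes: the Goals Against slope is negative by nature.
print(f"Goals For slope:     {slope_for:+.3f} win rate per goal per game")
print(f"Goals Against slope: {slope_against:+.3f} win rate per goal per game")
print("Steeper:", "Goals Against" if abs(slope_against) > abs(slope_for) else "Goals For")
```

Comparing the raw slopes is reasonable here because both variables are in the same unit, goals per game. If the two were measured on different scales, they would need to be standardized first.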
To paraphrase Michael Lewis in Moneyball: The Art of Winning an Unfair Game: to the extent you can eliminate beliefs and bias and replace them with data, you gain a clear advantage. If I were managing a team, I would devote more financial resources and recruiting effort to signing upper-tier defenders instead of flashy strikers.