Data Analysis on MVP Voting
The skills I demonstrate here can be learned through the Data Science with Machine Learning bootcamp at NYC Data Science Academy.
Background on Our Data Analysis
With the rise of analytically focused websites such as Fangraphs and Baseball Prospectus, as well as the popularity of the book and movie Moneyball, the use of advanced statistics has become much more mainstream. These metrics have begun to play a vital role in the roster decisions teams make and in the debates fans across the world have.
As the way teams and fans view and evaluate players has changed, I wanted to look into whether the way writers view players has changed as well when it comes time to vote for awards. At the end of each season, writers vote for an MVP in both the American and National League to recognize the best player. While the focus historically has been on the triple crown stats (batting average, home runs, and RBI), I wanted to see if advanced stats have begun to play a larger role in these decisions.
For my analysis, I scraped data from baseball-reference.com, which has an awards page for each year containing the MVP voting results for each league. I used the seasons from 1950 to 2019 and looked only at hitters, focusing on the aforementioned triple crown stats as the traditional stats and WAR (wins above replacement) as the modern stat. Additionally, I factored in how a player's team performed in a given year to see what impact that had.
I first wanted to look at how the league leader in the different metrics fared in the voting. To do this, I looked at the distribution of where the league leader in WAR, home runs, RBI, and batting average finished in the voting. The first thing that stood out is that the league leader in WAR historically did well in the voting, typically finishing around the top 5. Surprisingly, though, he wasn’t winning the award all that frequently and would occasionally fall outside of the top 10.
In the last 20 years, though, and particularly in the 2010s, the league leader in WAR finished very high in the voting, generally in the top 3, and frequently won the award. It was also very unlikely for these players to fall out of the top 10 or even the top 5 in the voting. The reverse trend appeared to be true for RBI. The league leader typically finished towards the top of the voting and had a decent chance of winning the award through the 1990s but has not fared as well since the new millennium began.
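One way to build this kind of distribution is sketched below in Python. The row layout and field names (`year`, `league`, `mvp_finish`, and the stat columns) are assumptions for illustration, not the actual scraped schema, and the sample rows are made up. The sketch only sees players who received votes, so a league leader who received none would need separate handling.

```python
from collections import Counter


def leader_finishes(rows, stat):
    """Map (year, league) to the MVP finish of the row with the highest `stat`.

    Ties keep the first row encountered; that is a simplification.
    """
    best = {}
    for row in rows:
        key = (row["year"], row["league"])
        if key not in best or row[stat] > best[key][stat]:
            best[key] = row
    return {key: row["mvp_finish"] for key, row in best.items()}


# Hypothetical rows, one per player who received MVP votes.
rows = [
    {"year": 2001, "league": "AL", "name": "Player A", "war": 7.9, "rbi": 95, "mvp_finish": 2},
    {"year": 2001, "league": "AL", "name": "Player B", "war": 5.1, "rbi": 140, "mvp_finish": 1},
    {"year": 2001, "league": "NL", "name": "Player C", "war": 9.0, "rbi": 120, "mvp_finish": 1},
    {"year": 2001, "league": "NL", "name": "Player D", "war": 4.2, "rbi": 131, "mvp_finish": 6},
]

war_finishes = leader_finishes(rows, "war")
rbi_finishes = leader_finishes(rows, "rbi")

# Count how often each finishing position occurs for the WAR leader.
war_distribution = Counter(war_finishes.values())
print(war_distribution)
```

Running the same function per decade, rather than over all seasons at once, would show how the distribution shifts over time.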
Next, I looked at how direct the correlation was between where a player ranked in the triple crown stats and where he finished in the voting, while also accounting for team quality. The team quality benchmark used is a 90-win pace, as that is generally the standard of a very good team. A pace is used rather than a raw win total because the number of games teams have played has varied over the years.
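As a rough illustration of the win-pace idea, the sketch below scales a team's record to a full season. The 162-game baseline and the function names are assumptions made for this example; the analysis itself does not state which season length the pace was normalized to.

```python
def win_pace(wins: int, games: int, season_length: int = 162) -> float:
    """Scale a team's record to a full-season win total.

    The 162-game default is an assumption for illustration.
    """
    if games <= 0:
        raise ValueError("games must be positive")
    return wins / games * season_length


def is_good_team(wins: int, games: int, threshold: float = 90.0) -> bool:
    """True when the team's win pace meets the 90-win benchmark."""
    return win_pace(wins, games) >= threshold


# 86 wins in a 154-game schedule (the length used in the 1950s).
print(round(win_pace(86, 154), 1))  # 90.5
print(is_good_team(86, 154))        # True
print(is_good_team(85, 154))        # False
```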
There was a clear trend: the higher a player ranked in these stats, the higher he finished in the MVP voting. Team quality was also a large factor in the voting results. Players on better teams have fared better in the voting, though the importance of team quality has shrunk over the years. These trends were also noticeable when performing the same analysis with a player’s rank in WAR instead.
Lastly, I wanted to see if there were any types of players that typically over or underperformed in MVP voting.
I defined a player's expected MVP finish as where he ranked in the league in WAR. The difference between that rank and his actual finish is positive for players who finished higher in the voting than their WAR rank and negative for players who finished lower. What I noticed is that the top 10 overperformers were players who typically compiled very high home run and RBI totals, while the underperformers were typically very well-rounded players who provided a lot of value with baserunning and defense in addition to being strong hitters.
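The over/under measure can be sketched in a few lines of Python. The field names and sample players here are hypothetical, and for brevity the example ranks WAR only among the rows supplied rather than across the whole league.

```python
def rank_desc(values):
    """Return 1-based ranks where the highest value gets rank 1.

    Tied values share the best rank among them.
    """
    ordered = sorted(values, reverse=True)
    return [ordered.index(v) + 1 for v in values]


def over_under(players):
    """Map each player's name to WAR rank minus actual MVP finish.

    Positive means the player finished higher in the voting than his
    WAR rank (overperformed); negative means he finished lower.
    """
    war_ranks = rank_desc([p["war"] for p in players])
    return {p["name"]: rank - p["mvp_finish"] for p, rank in zip(players, war_ranks)}


# Hypothetical sample data.
players = [
    {"name": "Player A", "war": 8.1, "mvp_finish": 4},
    {"name": "Player B", "war": 5.0, "mvp_finish": 1},
    {"name": "Player C", "war": 6.5, "mvp_finish": 2},
    {"name": "Player D", "war": 7.2, "mvp_finish": 3},
]

diffs = over_under(players)
print(diffs)  # {'Player A': -3, 'Player B': 3, 'Player C': 1, 'Player D': -1}
```

Here Player B, who ranks last in WAR but won the award, is the largest overperformer, while Player A is the largest underperformer.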
Data on Home Runs
To look further into this, I examined extreme home run and RBI seasons. To identify an extreme season, I looked for players whose home run or RBI totals were more than 1.5 standard deviations above the average among the players who received MVP votes that year, and who were not in the top 10 in WAR.
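A minimal sketch of that filter follows. The field names, the sample data, and the use of the sample standard deviation (`statistics.stdev`) rather than the population version are all assumptions for illustration; each call is meant to receive the vote receivers for one league in one year.

```python
from statistics import mean, stdev


def extreme_power_seasons(voters, z_cutoff=1.5, war_rank_cutoff=10):
    """Names of vote receivers with an extreme HR or RBI total.

    A total is extreme when it is more than `z_cutoff` standard
    deviations above the mean of the supplied vote receivers. Players
    ranked inside the WAR top `war_rank_cutoff` are excluded.
    """
    flagged = []
    for stat in ("hr", "rbi"):
        values = [p[stat] for p in voters]
        threshold = mean(values) + z_cutoff * stdev(values)
        for p in voters:
            outside_war_top = p["war_rank"] > war_rank_cutoff
            if outside_war_top and p[stat] > threshold and p["name"] not in flagged:
                flagged.append(p["name"])
    return flagged


# Hypothetical vote receivers for a single league and year.
voters = [
    {"name": "Player A", "hr": 50, "rbi": 130, "war_rank": 14},
    {"name": "Player B", "hr": 20, "rbi": 80, "war_rank": 1},
    {"name": "Player C", "hr": 22, "rbi": 85, "war_rank": 2},
    {"name": "Player D", "hr": 21, "rbi": 78, "war_rank": 3},
    {"name": "Player E", "hr": 19, "rbi": 90, "war_rank": 5},
    {"name": "Player F", "hr": 23, "rbi": 82, "war_rank": 12},
]

print(extreme_power_seasons(voters))  # ['Player A']
```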
It was very noticeable that these players did in fact perform much better in MVP voting than their WAR rank would indicate, particularly those on very good teams. While this group of players has not fared as well in recent years, they are still finishing within the top 10 in MVP voting, which indicates there is a bias towards very high home run and/or RBI totals regardless of the player's overall value.
Data on the Underperformed
I also looked into extreme WAR seasons that weren’t paired with an elite offensive season. For this I looked at players whose WAR was more than 1 standard deviation above the mean for that league's MVP vote receivers and who did not rank in the top 5 in any of the triple crown stats. It became very noticeable that these players were being undervalued in MVP voting, finishing outside of the top 10 in many cases. In recent years, however, these players were much more likely to finish higher in the voting, indicating that the voters are weighing WAR more heavily in their decisions.
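The selection of these high-WAR seasons can be sketched the same way. As before, the field names and sample rows are hypothetical, the sample standard deviation is an assumption, and the triple crown ranks are taken as precomputed inputs.

```python
from statistics import mean, stdev


def undervalued_candidates(voters, z_cutoff=1.0, rank_cutoff=5):
    """Names of vote receivers with a high WAR but no elite triple crown rank.

    WAR must be more than `z_cutoff` standard deviations above the mean
    of the supplied vote receivers, and the player's best rank across
    batting average, home runs, and RBI must be outside the top
    `rank_cutoff`.
    """
    wars = [p["war"] for p in voters]
    threshold = mean(wars) + z_cutoff * stdev(wars)
    selected = []
    for p in voters:
        best_rank = min(p["avg_rank"], p["hr_rank"], p["rbi_rank"])
        if p["war"] > threshold and best_rank > rank_cutoff:
            selected.append(p["name"])
    return selected


# Hypothetical vote receivers for a single league and year.
war_voters = [
    {"name": "Player A", "war": 9.0, "avg_rank": 8, "hr_rank": 2, "rbi_rank": 6},
    {"name": "Player B", "war": 4.0, "avg_rank": 1, "hr_rank": 9, "rbi_rank": 7},
    {"name": "Player C", "war": 4.5, "avg_rank": 3, "hr_rank": 1, "rbi_rank": 1},
    {"name": "Player D", "war": 5.0, "avg_rank": 6, "hr_rank": 4, "rbi_rank": 3},
    {"name": "Player E", "war": 3.5, "avg_rank": 20, "hr_rank": 5, "rbi_rank": 2},
    {"name": "Player F", "war": 8.5, "avg_rank": 12, "hr_rank": 9, "rbi_rank": 15},
]

print(undervalued_candidates(war_voters))  # ['Player F']
```

Player A clears the WAR threshold too but ranks second in home runs, so only Player F qualifies as a high-WAR season without an elite triple crown stat.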
The conclusion I was able to draw from this is that while the traditional triple crown stats are still heavily valued when it comes to voting for awards, there is also a clear trend towards relying on more modern and advanced metrics. Going forward, it appears there will be more focus on a player's all-around contribution, as opposed to just hitting. What matters is the value the player provides, and that he gets the credit due to him even when his teammates don’t perform as well.