Game Log MLB Stats - R Shiny Analysis
The skills we demoed here are taught in NYC Data Science Academy's Data Science with Machine Learning bootcamp.
Baseball has arguably become the most analytical professional sport. Analytics in baseball have gone so far that some teams even use them to tell a player whether or not he should swing at a particular pitch. That's why I chose a game log of almost every game in MLB history since 1910 as my data set. While this particular set of data doesn't give any pitch-by-pitch stats, it still shows how easily the game can be broken down into numbers; each pitch can serve as the base case for all other statistics involved.
Challenge
One of the challenges of this project was organizing the game log data in a way that would be meaningful both for the league as a whole and for individual players. The data spans over 100 years and more than 35 different teams, and each game records its stats separately for the home team and the away team, so deriving some of the stats proved challenging.
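One way to make home/away game rows easier to aggregate is to split each game into two rows, one per team. The sketch below is in Python rather than the R used for the app, and it is not the app's actual code: the column names (`home_team`, `away_hits`, and so on), the team codes, and the numbers are all hypothetical.

```python
# Hypothetical column names and made-up values; the real game log's
# headers and contents may differ.
games = [
    {"date": "2015-06-09", "home_team": "AAA", "away_team": "BBB",
     "home_hits": 0, "away_hits": 8,
     "home_strikeouts": 10, "away_strikeouts": 6},
    {"date": "2015-06-10", "home_team": "BBB", "away_team": "CCC",
     "home_hits": 7, "away_hits": 5,
     "home_strikeouts": 4, "away_strikeouts": 9},
]


def to_team_rows(game):
    """Split one game row into two rows, one for each team."""
    rows = []
    for side, other in (("home", "away"), ("away", "home")):
        rows.append({
            "season": int(game["date"][:4]),   # year taken from the date
            "team": game[f"{side}_team"],
            "opponent": game[f"{other}_team"],
            "is_home": side == "home",
            "hits": game[f"{side}_hits"],
            "strikeouts": game[f"{side}_strikeouts"],
        })
    return rows


# One flat list of team-level rows, ready to group by team or season.
team_rows = [row for game in games for row in to_team_rows(game)]
```

With the data in this shape, league-wide stats come from grouping by `season`, and team stats come from grouping by `team`, without treating home and away columns separately each time.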
For this particular project, I decided to visualize a couple of parts of the game that commentators and baseball broadcasters have discussed over the last few years. The first is the rise of dominant pitching. A lot of this talk began in 2014 and 2015, when there were clusters of games in which a pitcher didn't give up any hits (also known as a no-hitter).
Increase
There was also a noticeable increase in the size of the contracts for pitchers, which stirred the conversation as well. I wanted to visualize pitching stats in a way that portrays the evolution of the game based on the logs dating back to 1910. The first graph shown below is the average number of strikeouts per game, for one team. If you doubled these numbers, you would get the average total number of strikeouts in a game, but I found the smaller number to be more intuitive for people who know baseball.
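The per-team average described above can be sketched as follows. This is a Python illustration, not the app's R code; the column names and the toy numbers are assumptions made up for the example.

```python
from collections import defaultdict

# Toy rows with hypothetical column names; values are made up.
games = [
    {"season": 1910, "home_strikeouts": 3, "away_strikeouts": 5},
    {"season": 1910, "home_strikeouts": 4, "away_strikeouts": 4},
    {"season": 2015, "home_strikeouts": 9, "away_strikeouts": 7},
]


def strikeouts_per_team_game(games):
    """Average strikeouts per game for one team, grouped by season."""
    totals = defaultdict(int)      # strikeouts by both teams, per season
    team_games = defaultdict(int)  # each game counts once for each team
    for g in games:
        totals[g["season"]] += g["home_strikeouts"] + g["away_strikeouts"]
        team_games[g["season"]] += 2
    return {season: totals[season] / team_games[season] for season in totals}


averages = strikeouts_per_team_game(games)
# Doubling the per-team figure gives the average total per game.
per_game_totals = {season: 2 * avg for season, avg in averages.items()}
```

Dividing by two team-games per game is what produces the smaller, per-team number; the doubled figure is the combined total for both teams.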
The second graph shows season totals of no-hitters. I chose to look at these numbers as part of the rise of dominant pitching. As you can see, a few recent years had a high total number of no-hitters. Overall, however, no-hitters are too rare to show the kind of steady increase that strikeouts do.
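Counting no-hitters per season from a game log could look roughly like this. It is a Python sketch under assumptions, not the app's code: the `home_hits` and `away_hits` column names are hypothetical, the rows are made up, and the check is simply "a team finished with zero hits". The official definition also requires a game of at least nine innings, which this simplified version does not check.

```python
from collections import Counter

# Toy rows with hypothetical column names; values are made up.
games = [
    {"season": 2014, "home_hits": 0, "away_hits": 7},
    {"season": 2014, "home_hits": 5, "away_hits": 9},
    {"season": 2015, "home_hits": 6, "away_hits": 0},
    {"season": 2015, "home_hits": 0, "away_hits": 4},
]


def no_hitters_by_season(games):
    """Count team-games with zero hits, grouped by season."""
    counts = Counter()
    for g in games:
        # Check each side separately: either team can be held hitless.
        for side in ("home_hits", "away_hits"):
            if g[side] == 0:
                counts[g["season"]] += 1
    return dict(counts)


season_totals = no_hitters_by_season(games)
```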
Of course, one of the most common ways to visualize sports statistics is a player index. The data set I was working with did not include individual player statistics, but I was able to derive the number of wins for particular pitchers. I would have loved to recreate a classic player index with all of their stats. Unfortunately, the game log did not support it.
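If each game row names the pitcher credited with the win, pitcher win totals reduce to a simple count. The Python sketch below assumes a hypothetical `winning_pitcher` column and uses placeholder names; the actual log's field name and the app's R code may differ.

```python
from collections import Counter

# Toy rows; "winning_pitcher" is an assumed column name and the
# pitcher names are placeholders.
games = [
    {"season": 2015, "winning_pitcher": "Pitcher A"},
    {"season": 2015, "winning_pitcher": "Pitcher B"},
    {"season": 2015, "winning_pitcher": "Pitcher A"},
    {"season": 2015, "winning_pitcher": None},  # missing value is skipped
]

# Count one win per game for the credited pitcher.
wins = Counter(
    g["winning_pitcher"] for g in games if g.get("winning_pitcher")
)

# Pitchers sorted from most wins to fewest.
leaders = wins.most_common()
```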
Final Part
The final part of the project was to look at how the rise of pitching has affected the hitting side of the game. If you know baseball, you understand that when a batter strikes out, the runners on base generally do not advance. A common ground ball out still leaves the possibility of a runner advancing and getting closer to scoring a run. On the other hand, a home run will score a run no matter what.
So I decided to see whether there is a correlation between strikeouts and the ratio of home runs per run scored, which is the final visualization. I also derived the league average for this statistic so that each team could be compared against it.
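The ratio, the league average, and the correlation can be sketched in a few lines. This is a Python illustration under assumptions, not the app's R code: the column names and team codes are hypothetical, the season totals are made-up numbers that say nothing about real baseball, the league average is taken as league home runs divided by league runs, and Pearson's coefficient is one possible choice of correlation measure.

```python
from math import sqrt

# Toy season totals per team; names and values are made up.
team_seasons = [
    {"team": "AAA", "strikeouts": 1000, "home_runs": 120, "runs": 600},
    {"team": "BBB", "strikeouts": 1200, "home_runs": 150, "runs": 600},
    {"team": "CCC", "strikeouts": 1400, "home_runs": 180, "runs": 600},
]


def hr_per_run(row):
    """Home runs hit per run scored for one team-season."""
    return row["home_runs"] / row["runs"]


def league_hr_per_run(rows):
    """League-wide ratio: all home runs divided by all runs."""
    return sum(r["home_runs"] for r in rows) / sum(r["runs"] for r in rows)


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)


ratios = [hr_per_run(r) for r in team_seasons]
league_average = league_hr_per_run(team_seasons)
r_value = pearson([r["strikeouts"] for r in team_seasons], ratios)
```

A value of `r_value` near 1 or -1 would indicate a strong linear relationship, and comparing each entry of `ratios` with `league_average` shows which teams rely on home runs more than the league as a whole.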
To view the app: https://jdsipala.shinyapps.io/shinyProj/