Using Data to Evaluate MLB Pitchers with Visualizations
Data Context & Motivation:
By playing in one of the most popular American sports leagues, MLB players have the opportunity to earn millions in compensation for their performance. The average major league player’s salary, irrespective of position, is about $4.17 million. As the graph below shows, average salary increased from 2003 to 2017 and has since plateaued between $4 million and $4.5 million. However, some individual players have contracts that pay tens of millions of dollars per year.
For example, New York Mets starting pitcher Jacob deGrom has an estimated 2021 payroll salary of $36 million, roughly 9 times the league-average salary. In fact, starting pitchers account for 10 of the 13 largest payroll salaries in 2021.
It is reasonable to assume that a player’s salary is a function of past, current, and expected future performance. This raises the question: how can a pitcher’s performance be measured accurately enough that organizations have a reliable way to judge whether a pitcher's salary is justified relative to the organization's overall spending power?
In the context of starting pitchers, it is most common to measure performance by run suppression. Run suppression broadly refers to a pitcher’s ability to minimize the number of runs scored against their team. The most common statistic for quantifying run suppression is ERA (Earned Run Average), calculated as [(Total Earned Runs) / (Total Innings Pitched)] * 9. ERA represents the number of earned runs a pitcher allows per 9 innings pitched.
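The ERA formula is simple, but one practical wrinkle is that box scores and most stat sites write partial innings in a special notation: 6.1 means 6 1/3 innings and 6.2 means 6 2/3, because the digit after the decimal counts outs. The sketch below is a minimal illustration, assuming innings arrive in that notation; the function names are hypothetical and not from the original analysis.

```python
def innings_to_decimal(ip: float) -> float:
    """Convert box-score innings (6.1 = 6 1/3, 6.2 = 6 2/3) to true innings."""
    whole = int(ip)
    # The digit after the decimal point counts outs (0, 1, or 2), not tenths.
    outs = round((ip - whole) * 10)
    if outs not in (0, 1, 2):
        raise ValueError(f"invalid innings notation: {ip}")
    return whole + outs / 3


def era(earned_runs: int, innings_pitched: float) -> float:
    """Earned runs allowed per 9 innings: 9 * ER / IP."""
    ip = innings_to_decimal(innings_pitched)
    if ip <= 0:
        raise ValueError("innings pitched must be positive")
    return 9 * earned_runs / ip


print(era(60, 180.0))           # 3.0
print(round(era(7, 20.1), 2))   # 20.1 is read as 20 1/3 innings -> 3.1
```

Treating 20.1 as 20.1 decimal innings instead of 20 1/3 would give a slightly wrong ERA, which is why the conversion happens before the division.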
While ERA has long been the standard measure of pitcher effectiveness, modern baseball analysts have uncovered its flaws. In this post, I aim to build the case against ERA as a reliable measure of pitcher effectiveness. To do so, I analyzed aggregate statistics for individual starting pitchers from 2017 to 2021. The data was downloaded as CSV files from Fangraphs.com and baseballsavant.mlb.com and imported into Python for visualization and analysis. At the time of analysis, the 2021 season was approximately halfway through.
Method of Calculation
We will build our argument around BABIP (Batting Average on Balls in Play), calculated as BABIP = (H - HR) / (AB - K - HR + SF), where H = hits, HR = home runs, AB = at-bats, K = strikeouts, and SF = sacrifice flies. Simply stated, a ball in play is any at-bat that does not end in a strikeout or home run; sacrifice flies are added back because they are balls in play that are not scored as at-bats. By convention, walks, hit-by-pitches, and bases awarded on interference do not count as at-bats, so they are implicitly excluded from the calculation as well. BABIP represents the rate at which balls put into the field of play fall for hits.
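As a quick illustration of the formula above, here is a minimal sketch of BABIP allowed by a pitcher, computed from season counting stats. The function name and the sample numbers are hypothetical, chosen only so the arithmetic is easy to follow.

```python
def babip(h: int, hr: int, ab: int, k: int, sf: int) -> float:
    """BABIP allowed: (H - HR) / (AB - K - HR + SF)."""
    # Numerator: hits that stayed in the park (home runs are not balls in play).
    hits_in_play = h - hr
    # Denominator: at-bats minus strikeouts and home runs, plus sacrifice flies.
    balls_in_play = ab - k - hr + sf
    if balls_in_play <= 0:
        raise ValueError("no balls in play")
    return hits_in_play / balls_in_play


# Hypothetical season line: 121 hits in play over 400 balls in play.
print(babip(h=141, hr=20, ab=616, k=200, sf=4))  # 0.3025
```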
How Are Runs Scored?
Generally speaking, when opposing hitters consistently put the ball in play and reach base, runs tend to follow. We would therefore expect a pitcher's ERA to be positively correlated with BABIP. The figure below shows the relationship between ERA and BABIP, as well as the frequency distribution of individual BABIP values.
BABIP appears to be approximately normally distributed, and ERA and BABIP exhibit a moderately positive correlation. The relationship is likely weaker than it would otherwise be because home runs, which directly produce earned runs, are excluded from BABIP. Regardless, since ERA is positively correlated with BABIP, we must ask what makes a ball in play more likely to fall for a hit. To answer this question, we consider BABIP's relationship with the following measures of contact type and quality:
Average Launch Angle - Average vertical angle, relative to the ground, at which the ball leaves the opposing hitter's bat.
Average Exit Velocity - Average speed at which the ball leaves the opposing hitter's bat.
Barrel Rate - Rate of batted balls that Statcast classifies as barrels: an exit velocity of at least 98 mph combined with an optimal launch angle (26-30 degrees at 98 mph, with the range widening as exit velocity increases).
Flyball Rate - Rate at which batted balls are hit high in the air to the outfield.
Groundball Rate - Rate at which batted balls bounce or roll along the ground immediately after contact.
Popup Rate - Rate at which batted balls are hit high in the air but stay in the infield.
Correlation coefficients with BABIP by figure panel (top row, left to right): -0.31, 0.27, -0.13; (bottom row, left to right): -0.26, 0.22, -0.39.
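Coefficients like these are Pearson correlations between BABIP and each contact metric across pitchers. The sketch below spells out that calculation in plain Python; the function names and the dictionary-of-columns layout are assumptions for illustration. If the data sits in a pandas DataFrame, `DataFrame.corrwith` (Pearson by default) computes the same statistic column by column.

```python
from math import sqrt


def pearson_r(xs: list[float], ys: list[float]) -> float:
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Covariance and variances, all left unnormalized since the n terms cancel.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / sqrt(var_x * var_y)


def correlations_with(target: list[float], metrics: dict[str, list[float]]) -> dict[str, float]:
    """Correlate one column (e.g. BABIP) with each named contact metric."""
    return {name: pearson_r(values, target) for name, values in metrics.items()}


# Toy check: a perfectly linear relationship gives r = 1.0.
print(pearson_r([1, 2, 3, 4], [2, 4, 6, 8]))  # 1.0
```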
Somewhat surprisingly, BABIP does not seem to be strongly correlated with any of the above contact-quality measurements. Since the quality of contact depends only on the pitcher and the hitter, the weak link between contact quality and BABIP suggests that whether a ball in play falls for a hit is driven largely by factors outside the pitcher's control, such as defense and luck. Some pitchers may disproportionately benefit from good fortune on balls in play simply because they have a better defense behind them than other pitchers do.
If our goal is to evaluate a pitcher's skill, it may make more sense to look at metrics uncorrelated with BABIP. In other words, we should seek out metrics that pitchers have more control over and that remove any dependence on defensive play.
So What Does The Pitcher Have Control Over?
Assuming an umpire who calls balls and strikes accurately, a pitcher has control over two things: strikeouts and walks. Figure 4 below shows the relationship between ERA and the following:
Strikeout Rate (K%), Walk Rate (BB%), and their difference (K-BB%).
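These three rates are straightforward to derive from counting stats. The sketch below assumes the rates are taken per batter faced (TBF), which is the usual convention for pitchers; the function name and sample numbers are hypothetical.

```python
def strikeout_walk_rates(k: int, bb: int, tbf: int) -> dict[str, float]:
    """K%, BB%, and K-BB% as fractions of total batters faced (TBF)."""
    if tbf <= 0:
        raise ValueError("total batters faced must be positive")
    k_pct = k / tbf
    bb_pct = bb / tbf
    # K-BB% is simply the difference of the two rates.
    return {"K%": k_pct, "BB%": bb_pct, "K-BB%": k_pct - bb_pct}


print(strikeout_walk_rates(k=200, bb=50, tbf=800))
# {'K%': 0.25, 'BB%': 0.0625, 'K-BB%': 0.1875}
```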
As the data show, ERA has a relatively strong negative correlation with K% and K-BB%. Our aim now is to estimate ERA while taking into account strikeouts, walks, and home runs, none of which enter the calculation of BABIP. One statistic that attempts to do so is FIP (Fielding Independent Pitching), calculated as FIP = [13*HR + 3*(BB + HBP) - 2*K] / IP + C, where HR = home runs, BB = walks, HBP = hit by pitch, K = strikeouts, IP = innings pitched, and C is a constant. The constant puts FIP on the same scale as ERA and is derived from league-average ERA.
In Figure 5 below, we can see that FIP is highly correlated with ERA and shows little to no correlation with BABIP.
FIP does a reasonably good job of estimating ERA while minimizing dependence on defensive performance and BABIP. Its formula makes clear that FIP is minimized by a high K%, a low BB%, and few home runs allowed. We can also be more confident in a low FIP when it is backed by a large number of innings pitched. Figure 6 below plots FIP vs. K-BB%.
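To make the formula concrete, here is a minimal FIP sketch. It assumes one common way of computing the constant: league ERA minus the league-wide value of the bracketed fraction, which forces league-average FIP to equal league-average ERA. The league totals in the example are made up for illustration, the function names are hypothetical, and IP is expected in true innings (convert 180.2-style notation first).

```python
def fip_core(hr: int, bb: int, hbp: int, k: int, ip: float) -> float:
    """FIP before the constant: (13*HR + 3*(BB + HBP) - 2*K) / IP."""
    if ip <= 0:
        raise ValueError("innings pitched must be positive")
    return (13 * hr + 3 * (bb + hbp) - 2 * k) / ip


def fip_constant(lg_era: float, lg_hr: int, lg_bb: int, lg_hbp: int,
                 lg_k: int, lg_ip: float) -> float:
    """Constant chosen so that league-wide FIP equals league-wide ERA."""
    return lg_era - fip_core(lg_hr, lg_bb, lg_hbp, lg_k, lg_ip)


def fip(hr: int, bb: int, hbp: int, k: int, ip: float, constant: float) -> float:
    """Fielding Independent Pitching on the ERA scale."""
    return fip_core(hr, bb, hbp, k, ip) + constant


# Made-up league totals, used only to illustrate the constant.
c = fip_constant(lg_era=4.50, lg_hr=100, lg_bb=300, lg_hbp=30, lg_k=900, lg_ip=1000.0)
# Hypothetical pitcher: 20 HR, 50 BB, 5 HBP, 200 K over 180 innings.
print(round(fip(hr=20, bb=50, hbp=5, k=200, ip=180.0, constant=c), 2))  # 4.15
```

Because strikeouts enter with a negative weight and walks and home runs with positive weights, more strikeouts lower FIP while walks and home runs raise it.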
From a Management Perspective:
Here is a quick recap of what we have concluded so far:
1.) ERA is dependent on BABIP.
2.) BABIP is largely out of the pitcher's control, and thus has some luck associated with it. A pitcher's ERA seems to inherit some of the variance associated with BABIP, likely through no fault of his own.
3.) FIP can be a good estimate of ERA, and effectively removes the dependence on BABIP and defensive performance. The difference between the two can be useful in identifying overperformers and underperformers.
4.) FIP has a strong negative correlation with K-BB%.
So, as a manager, what should you do with these conclusions? I would suggest considering the difference between a pitcher's ERA and FIP (E-F). A positive difference can imply that a pitcher is more effective than their ERA suggests. Conversely, a negative difference can imply that a pitcher is overperforming and potentially benefiting from good fortune on balls in play.
I do not mean to say that BABIP is solely responsible for the difference between ERA and FIP. However, it is a good starting point when determining why a pitcher may be overperforming or underperforming. In general, managers should be careful not to overpay overperformers, while underperformers can present great value opportunities in trades, free-agent signings, or contract discussions. Figure 7 below illustrates what I propose as the first step in evaluating pitchers.
Another suggestion is to emphasize strikeouts and walks when evaluating a pitcher. A deeper analysis of why a given pitcher excels at generating strikeouts can reveal which pitchers have a better chance of sustained success. Which pitchers are better prepared to weather the inevitable stretches of bad luck on balls in play? The ones who strike batters out and keep the ball from being put in play.
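The E-F screen described above can be sketched in a few lines. This is a minimal illustration, not the method behind the post's figures: the `Pitcher` record, the names, the ERA and FIP values, and the 0.5-run threshold are all hypothetical choices. Pitchers are ranked so that the largest positive E-F (ERA well above FIP, a potential value opportunity) comes first.

```python
from typing import NamedTuple


class Pitcher(NamedTuple):
    name: str
    era: float
    fip: float


def e_minus_f(p: Pitcher) -> float:
    """ERA minus FIP: positive means ERA is worse than FIP suggests."""
    return p.era - p.fip


def label(p: Pitcher, threshold: float = 0.5) -> str:
    """Flag pitchers whose ERA and FIP differ by more than `threshold` runs."""
    diff = e_minus_f(p)
    if diff > threshold:
        return "possible underperformer (ERA above FIP)"
    if diff < -threshold:
        return "possible overperformer (ERA below FIP)"
    return "ERA roughly in line with FIP"


def rank_by_e_minus_f(pitchers: list[Pitcher]) -> list[Pitcher]:
    """Largest positive E-F first: candidate value opportunities at the top."""
    return sorted(pitchers, key=e_minus_f, reverse=True)


# Hypothetical staff, for illustration only.
staff = [
    Pitcher("Pitcher A", era=4.50, fip=3.50),
    Pitcher("Pitcher B", era=3.00, fip=4.00),
    Pitcher("Pitcher C", era=3.60, fip=3.50),
]
for p in rank_by_e_minus_f(staff):
    print(f"{p.name}: E-F = {e_minus_f(p):+.2f}, {label(p)}")
```

The threshold is a judgment call; a small E-F gap over few innings is easily noise, so in practice it may make sense to require a minimum innings total before reading much into the label.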
Further Work To Be Done:
With the evolution of measurement technology, we find ourselves in an era of baseball full of highly detailed data. A pitcher's arsenal can be analyzed pitch by pitch via metrics such as average spin rate, average movement, average velocity, or even how effectively the pitcher targets locations around the strike zone. This kind of analysis is the necessary next step toward understanding what makes a pitcher elite at generating strikeouts, and I intend to take it as soon as time permits. In the meantime, thank you for reading this post all the way through! For anyone more interested in the interplay between ERA and FIP, check out this article posted on Fangraphs.com:
The skills I demoed here can be learned through the Data Science with Machine Learning bootcamp at NYC Data Science Academy.