Where ERA Falls Short in Evaluating MLB Pitchers

Posted on Aug 2, 2021

Business Context & Motivation:

By playing in one of the most popular American sports leagues, MLB players have the opportunity to earn millions in compensation for their performance. The average major league player’s salary, irrespective of position, is about $4.17 million. As you can see in the graph below, the average salary rose from 2003 to 2017 and has since plateaued between $4 million and $4.5 million. However, some individual players have contracts that pay tens of millions per year. For example, New York Mets starting pitcher Jacob deGrom has an estimated payroll salary of $36 million in 2021, roughly 9 times the league-average salary. In fact, starting pitchers account for 10 of the 13 largest payroll salaries in 2021.

Figure 1: MLB Average Player Salary from 2003-2021

We can assume that a player’s salary is a function of past, current, and expected future performance. This raises the question: how can a pitcher’s performance be accurately measured, such that organizations have a reliable means to determine whether a pitcher's salary is justified relative to the organization's overall spending power? For starting pitchers, performance is most commonly measured by run suppression, which broadly refers to a pitcher’s ability to minimize the number of runs scored against their team. The most common statistic for quantifying run suppression is ERA (Earned Run Average), calculated as ERA = [(Total Earned Runs) / (Total Innings Pitched)] * 9. ERA represents the number of earned runs a pitcher allows per 9 innings pitched.
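As a quick illustration of the ERA formula above, here is a minimal Python sketch. The function name and the sample season line are hypothetical, and the function assumes innings pitched are given as a true fraction (box scores often write 6.1 to mean 6 1/3 innings, which would need converting first).

```python
def era(earned_runs: float, innings_pitched: float) -> float:
    """Earned runs allowed per 9 innings pitched."""
    # Assumption: innings_pitched is a true fraction (6 1/3 innings = 6.333...),
    # not the box-score shorthand where 6.1 means 6 1/3.
    if innings_pitched <= 0:
        raise ValueError("innings_pitched must be positive")
    return earned_runs / innings_pitched * 9


# Hypothetical season line: 60 earned runs over 180 innings.
print(round(era(60, 180), 2))  # 3.0
```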

While ERA has undoubtedly been the standard in measuring pitcher effectiveness over the years, modern baseball analysts have since uncovered its flaws. In this blog, I aim to develop the argument against ERA as a reliable measurement for pitcher effectiveness. To do so, I analyzed individual starting pitcher aggregate statistics from 2017-2021. The data was easily gathered from Fangraphs.com and baseballsavant.mlb.com via CSV download, and imported into Python for visualization and analysis. At the time of analysis, the 2021 season was approximately halfway through.

We will be building our argument around BABIP (Batting Average on Balls in Play), calculated as BABIP = (H - HR) / (AB - K - HR + SF) where H = hits, HR = home runs, AB = at-bats, K = strikeouts, SF = sacrifice flies. Simply stated, a ball in play is any at-bat that does not result in a strikeout or home run. By convention, walks, hit-by-pitches, and bases awarded by interference are not counted as at-bats, and are therefore implicitly removed from the calculation as well. BABIP represents the rate at which balls put into the field of play fall for hits.
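The BABIP formula above can be sketched in a few lines of Python. The function name and the sample totals are made up for illustration and are not taken from the dataset used in this post.

```python
def babip(h: int, hr: int, ab: int, k: int, sf: int) -> float:
    """Batting Average on Balls in Play: (H - HR) / (AB - K - HR + SF)."""
    balls_in_play = ab - k - hr + sf
    if balls_in_play <= 0:
        raise ValueError("no balls in play")
    return (h - hr) / balls_in_play


# Hypothetical totals allowed by a pitcher: 150 hits, 20 home runs,
# 600 at-bats, 150 strikeouts, 5 sacrifice flies.
print(round(babip(h=150, hr=20, ab=600, k=150, sf=5), 3))  # 0.299
```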

How Are Runs Scored?:

Generally speaking, when opposing hitters consistently put the ball in play and get on base, runs tend to follow. Therefore, a pitcher's ERA should be positively correlated with BABIP. The figure below shows the relationship between ERA and BABIP, as well as the frequency distribution of individual BABIP.

We can assume BABIP is approximately normally distributed, while ERA and BABIP exhibit a moderate positive correlation. The relationship is likely not stronger because home runs are omitted from the calculation of BABIP. Regardless, since ERA is positively correlated with BABIP, we must ask ourselves what makes a ball put in play more likely to fall for a hit. To answer this question, we consider BABIP's relationship with the following type/quality of contact measurements:

Average Launch Angle - Average angle, relative to the ground, at which the ball exits an opposing hitter's bat.

Average Exit Velocity - Velocity at which the ball exits an opposing hitter's bat on average.

Barrel Rate - Rate at which the ball exits an opposing hitter's bat with an exit velocity of at least 98 mph and a launch angle in the optimal range, which is roughly 26-30 degrees at 98 mph and widens as exit velocity increases.

Flyball Rate - Rate at which the ball exiting the opposing hitter's bat is hit high in the air to the outfield.

Groundball Rate - Rate at which the ball exiting the opposing hitter's bat bounces or rolls along the ground immediately after contact.

Popup Rate - Rate at which the ball exiting the opposing hitter's bat is hit high in the air and stays in the infield.

Somewhat surprisingly, BABIP doesn't seem to have a strong correlation with any of the above quality of contact measurements. The lack of correlation may imply that BABIP is largely out of the pitcher's control, since quality of contact depends only on the pitcher and the hitter. Some pitchers may disproportionately benefit from good fortune on balls put in play simply because they have a better defense behind them than other pitchers do. If our goal is to evaluate a pitcher's skill, it may make more sense to look at metrics uncorrelated with BABIP. In other words, we should seek out metrics that pitchers have more control over and remove any dependencies on defensive play.
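For readers who want to run this kind of check themselves, here is a minimal sketch of a Pearson correlation between BABIP and each contact metric. It uses only the Python standard library, and the numbers are made-up placeholder values, not the Fangraphs or Baseball Savant data behind this post. The column names are also hypothetical.

```python
import math


def pearson_r(xs: list[float], ys: list[float]) -> float:
    """Pearson correlation coefficient between two equal-length samples."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two samples of equal length (at least 2 points)")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sample")
    return cov / math.sqrt(var_x * var_y)


# Placeholder values for five hypothetical pitchers (NOT real data).
babip_values = [0.280, 0.295, 0.310, 0.301, 0.288]
contact_metrics = {
    "avg_launch_angle": [12.1, 14.3, 10.8, 13.0, 15.2],
    "avg_exit_velocity": [87.9, 88.4, 89.1, 88.0, 87.5],
    "groundball_rate": [0.45, 0.41, 0.48, 0.43, 0.39],
}

# One correlation per contact metric, each one-per-pitcher sample paired with BABIP.
for name, values in contact_metrics.items():
    print(name, round(pearson_r(babip_values, values), 2))
```

With real data, values near zero for every metric would match the weak relationships described above.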

So What Does The Pitcher Have Control Over?

Assuming an umpire who calls balls and strikes accurately, a pitcher has control over two things: strikeouts and walks. Figure 4 below shows the relationship between ERA and the following:

Strikeout Rate (K%), Walk Rate (BB%), and their difference (K-BB%).
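The three rates listed above can be sketched as follows. This post does not spell out the denominator, so the sketch assumes the common convention of dividing by total batters faced. The function name and sample totals are hypothetical.

```python
def strikeout_walk_rates(
    strikeouts: int, walks: int, batters_faced: int
) -> tuple[float, float, float]:
    """Return (K%, BB%, K-BB%) as fractions of batters faced."""
    # Assumption: rates are per batter faced, not per inning or per at-bat.
    if batters_faced <= 0:
        raise ValueError("batters_faced must be positive")
    k_pct = strikeouts / batters_faced
    bb_pct = walks / batters_faced
    return k_pct, bb_pct, k_pct - bb_pct


# Hypothetical pitcher: 200 strikeouts and 50 walks against 700 batters.
k_pct, bb_pct, k_minus_bb = strikeout_walk_rates(200, 50, 700)
print(f"K%={k_pct:.1%}  BB%={bb_pct:.1%}  K-BB%={k_minus_bb:.1%}")
```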

As we can see, ERA has a relatively strong negative correlation with K% and K-BB%. Our aim now is to estimate ERA while taking strikeouts, walks, and home runs into account, none of which enter the calculation of BABIP. One statistic that attempts to do so is FIP (Fielding Independent Pitching), calculated as FIP = [13*HR + 3*(B + HBP) - 2*K] / IP + Constant where HR = home runs, B = walks, HBP = hit by pitch, K = strikeouts, IP = innings pitched. The constant is intended to put FIP on the same scale as ERA and is a function of league-average ERA. In Figure 5 below, we can see that FIP is highly correlated with ERA and shows little to no correlation with BABIP.
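The FIP formula above can be sketched in Python as shown here. The function names and sample numbers are hypothetical. The constant is passed in explicitly, and the 3.10 used in the example is only a placeholder, not a value from this post. The helper `fip_constant` assumes the commonly published convention of choosing the constant so that league-wide FIP equals league-wide ERA. This post only says the constant is a function of league-average ERA, so treat the helper as an assumption.

```python
def fip(hr: int, bb: int, hbp: int, k: int, ip: float, constant: float) -> float:
    """Fielding Independent Pitching: [13*HR + 3*(B + HBP) - 2*K] / IP + Constant."""
    if ip <= 0:
        raise ValueError("ip must be positive")
    return (13 * hr + 3 * (bb + hbp) - 2 * k) / ip + constant


def fip_constant(
    lg_era: float, lg_hr: int, lg_bb: int, lg_hbp: int, lg_k: int, lg_ip: float
) -> float:
    """Assumed convention: pick the constant so league FIP matches league ERA."""
    return lg_era - (13 * lg_hr + 3 * (lg_bb + lg_hbp) - 2 * lg_k) / lg_ip


# Hypothetical pitcher: 20 HR, 50 walks, 5 HBP, 200 strikeouts in 180 innings.
# The constant 3.10 is a placeholder for illustration only.
print(round(fip(hr=20, bb=50, hbp=5, k=200, ip=180, constant=3.10), 2))  # 3.24
```

Keeping the constant as a required argument avoids baking in a single season's value, since it changes from year to year with the league run environment.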

FIP does a reasonably good job of estimating ERA while minimizing dependence on defensive performance and BABIP. It is clear from its formula that FIP is minimized by a high K%, a low BB%, and a low HR total. Moreover, we can be more confident in a low FIP when it is backed by a large number of innings pitched. Figure 6 below plots FIP vs. K-BB%.

Figure 6: FIP vs. K-BB%, with color and size weighted by innings pitched and pitches thrown, respectively. Once again, each point represents an individual pitcher.

From a Management Perspective:

Here is a quick recap of what we have concluded so far:

1.) ERA is dependent on BABIP.

2.) BABIP is largely out of the pitcher's control, and thus has some luck associated with it. A pitcher's ERA seems to inherit some of the variance associated with BABIP, likely at no fault of his own.

3.) FIP can be a good estimate of ERA, and effectively takes out the dependence on BABIP & defensive performance. The difference between the two can be useful in identifying over and under performers.

4.) FIP has a strong negative correlation with K-BB%.

So as a manager, what are you supposed to do with these conclusions? I would suggest considering the difference between a pitcher's ERA and their FIP (E-F). A positive difference (ERA higher than FIP) can imply that a pitcher is more effective than their ERA might suggest. Conversely, a negative difference (ERA lower than FIP) implies that a pitcher is overperforming and potentially benefiting from good fortune on balls in play. I do not intend to say that BABIP is solely responsible for the difference between ERA and FIP. However, it is a good starting point when determining why a pitcher may be over- or underperforming. In general, managers should be careful not to overpay overperformers. Underperformers can prove to be great value opportunities when considering player trades, free agency pickups, or contract discussions. Figure 7 below gives a comprehensive picture of what my proposed first step in evaluating pitchers should be.

Another suggestion is to emphasize strikeouts and walks when evaluating a pitcher. A deeper analysis regarding why a given pitcher is successful in the strikeout department can reveal which pitchers have a better chance at sustained success. Which pitchers are better prepared to weather the inevitable storms of poor luck on balls in play? The answer: the ones who strike batters out and don't let the ball get put in play.

Figure 7: Identifying over- and underperformers by looking at E-F vs. FIP, color weighted by BABIP. The best pitchers are in the left half of the plot (low FIP). Ideally, we seek pitchers in the top left quadrant of the plot with a yellow color weight. These points represent underperforming pitchers w.r.t. ERA, with low FIP and poor fortune w.r.t. BABIP. These are the best value players.
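The E-F screen described in this section can be sketched as a small Python helper. The function names, the 0.25-run tolerance, and the three pitchers are hypothetical and chosen only for illustration. A real screen would use actual ERA and FIP values and a tolerance suited to the sample size.

```python
def era_minus_fip(era: float, fip: float) -> float:
    """E-F: positive means ERA is higher (worse) than FIP suggests."""
    return era - fip


def classify(era: float, fip: float, tolerance: float = 0.25) -> str:
    """Label a pitcher by E-F; the tolerance is an arbitrary illustrative cutoff."""
    diff = era_minus_fip(era, fip)
    if diff > tolerance:
        return "underperformer"  # possible value opportunity
    if diff < -tolerance:
        return "overperformer"  # possibly benefiting from good fortune
    return "in line"


# Made-up pitchers, not real players or real statistics.
pitchers = [
    {"name": "Pitcher A", "era": 4.20, "fip": 3.30},
    {"name": "Pitcher B", "era": 2.90, "fip": 3.80},
    {"name": "Pitcher C", "era": 3.60, "fip": 3.50},
]

for p in pitchers:
    diff = era_minus_fip(p["era"], p["fip"])
    print(p["name"], f"E-F={diff:+.2f}", classify(p["era"], p["fip"]))
```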

Further Work To Be Done:

With the evolution of measurement technology, we find ourselves in an era of baseball full of highly detailed data. Specific pitch arsenals of any pitcher can be analyzed via metrics such as average spin rate, average movement, average velocity, or even how effectively a pitcher targets locations around the strike zone. To understand what makes a pitcher elite in terms of strikeouts, this is the necessary next step. I intend to do so as soon as time permits. In the meantime, thank you for reading this post all the way through! For anyone more interested in the interplay between ERA and FIP, check out this article posted on Fangraphs.com:

