Using Data to Evaluate MLB Pitchers with Visualizations

Posted on Aug 2, 2021

Data Context & Motivation:

By playing in one of the most popular American sports leagues, MLB players have the opportunity to earn millions in compensation for their performance. The average major league player’s salary, irrespective of position, is about $4.17 million. As the graph below shows, the average salary rose from 2003 to 2017 and has since plateaued between $4 million and $4.5 million. However, some individual players have contracts that pay tens of millions per year.

For example, New York Mets starting pitcher Jacob deGrom has an estimated payroll salary of $36 million in 2021, roughly 9 times the league-average salary. In fact, starting pitchers account for 10 of the 13 largest payroll salaries in 2021.

Figure 1: MLB Average Player Salary from 2003-2021

Data Assumption

We can assume that a player’s salary is a function of past, current, and expected future performance. This raises the question: how can a pitcher’s performance be measured accurately enough that organizations have a reliable means to determine whether a pitcher's salary is justified relative to the organization's overall spending power?

Common Sense

In the context of starting pitchers, it is most common to measure performance by run suppression. Run suppression broadly refers to a pitcher’s ability to minimize the number of runs scored against their team. The most common statistic for quantifying run suppression is ERA (Earned Run Average), calculated as [(Total Earned Runs) / (Total Innings Pitched)] * 9. ERA represents the number of earned runs a pitcher allows per 9 innings pitched.
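The ERA formula above can be sketched in a few lines of Python. This is a minimal illustration, not code from the original analysis; the function name and the sample numbers are hypothetical.

```python
def earned_run_average(earned_runs: int, innings_pitched: float) -> float:
    """Earned runs allowed per nine innings pitched."""
    # innings_pitched is assumed to be a true fraction (180 1/3 innings = 180.333...),
    # not the box-score shorthand "180.1".
    if innings_pitched <= 0:
        raise ValueError("innings_pitched must be positive")
    return earned_runs / innings_pitched * 9


# Hypothetical season line: 60 earned runs over 180 innings.
print(round(earned_run_average(60, 180.0), 2))  # 3.0
```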

While ERA has undoubtedly been the standard in measuring pitcher effectiveness over the years, modern baseball analysts have since uncovered its flaws. In this post, I aim to develop the argument against ERA as a reliable measurement of pitcher effectiveness. To do so, I analyzed individual starting pitchers' aggregate statistics from 2017-2021. The data was gathered via CSV download and imported into Python for visualization and analysis. At the time of analysis, the 2021 season was approximately halfway through.

Method of Calculation

We will be building our argument around BABIP (Batting Average on Balls in Play), calculated as BABIP = (H - HR) / (AB - K - HR + SF), where H = hits, HR = home runs, AB = at-bats, K = strikeouts, and SF = sacrifice flies. Simply stated, a ball in play is any at-bat that does not result in a strikeout or home run. By convention, walks, hit-by-pitches, and bases awarded by interference are not counted as at-bats, and are therefore implicitly removed from the calculation as well. BABIP represents the rate at which balls put into the field of play fall for hits.
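As a quick illustration of the BABIP formula, here is a minimal Python sketch. The function name and the example stat line are made up for demonstration and do not come from the dataset used in this post.

```python
def babip(hits: int, home_runs: int, at_bats: int,
          strikeouts: int, sacrifice_flies: int) -> float:
    """Batting average on balls in play: (H - HR) / (AB - K - HR + SF)."""
    balls_in_play = at_bats - strikeouts - home_runs + sacrifice_flies
    if balls_in_play <= 0:
        raise ValueError("no balls in play for this stat line")
    return (hits - home_runs) / balls_in_play


# Hypothetical line allowed by a pitcher: 150 H, 20 HR, 650 AB, 180 K, 5 SF.
# (150 - 20) / (650 - 180 - 20 + 5) = 130 / 455
print(round(babip(150, 20, 650, 180, 5), 3))  # 0.286
```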

Data Processing

How Are Runs Scored?

Generally speaking, when opposing hitters consistently put the ball in play and get on base, runs tend to follow. Therefore, a pitcher's ERA should be positively correlated with BABIP. The data figure below shows the relationship between ERA and BABIP, as well as the frequency distribution of individual BABIP.

BABIP appears to be approximately normally distributed, and ERA and BABIP exhibit a moderate positive correlation. The relationship is likely not stronger because home runs are omitted from the calculation of BABIP. Regardless, since ERA is positively correlated with BABIP, we must ask ourselves what makes a ball put in play more likely to fall for a hit. To answer this question, we consider BABIP's relationship with the following type/quality of contact measurements:

Average Launch Angle - Angle relative to the ground the ball exits an opposing hitter's bat on average.

Average Exit Velocity - Velocity at which the ball exits an opposing hitter's bat on average.

Barrel Rate - Rate at which the ball exits an opposing hitter's bat at a velocity of at least 98 mph and within a launch-angle window of roughly 26-30 degrees (the window widens as exit velocity increases).

Flyball Rate - Rate at which the ball exiting the opposing hitter's bat is hit high and to the outfield.

Groundball Rate - Rate at which the ball exiting the opposing hitter's bat bounces or rolls off the ground immediately after contact.

Popup Rate - Rate at which the ball exiting the opposing hitter's bat is hit high and in the infield.

Somewhat surprisingly, BABIP doesn't seem to have a strong correlation with any of the above quality of contact measurements. Quality of contact depends only on the pitcher and the hitter, so the lack of correlation may imply that BABIP is largely out of the pitcher's control. Some pitchers may disproportionately benefit from good fortune on balls put in play simply because they have a better defense behind them than other pitchers do.
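For readers who want to reproduce this kind of check, the strength of a linear relationship can be measured with the Pearson correlation coefficient. The sketch below is a plain-Python version under my own assumptions; the helper name and the sample values are invented for illustration and are not the numbers behind the figures in this post.

```python
from math import sqrt


def pearson_r(xs: list[float], ys: list[float]) -> float:
    """Pearson correlation coefficient between two equal-length samples."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two samples of equal length (at least 2 points)")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sample")
    return cov / sqrt(var_x * var_y)


# Made-up values for five pitchers, purely for demonstration.
babip_values = [0.270, 0.295, 0.310, 0.285, 0.300]
avg_exit_velocity = [88.9, 87.5, 89.4, 88.1, 87.8]
print(round(pearson_r(babip_values, avg_exit_velocity), 3))
```

A value near 0 indicates little linear relationship, while values near +1 or -1 indicate a strong positive or negative one.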

If our goal is to evaluate a pitcher's skill, it may make more sense to look at metrics uncorrelated with BABIP. In other words, we should seek out metrics that pitchers have more control over and remove any dependencies on defensive play.

So What Does The Pitcher Have Control Over?

Assuming an umpire who calls balls and strikes accurately, a pitcher has control over two things: strikeouts and walks. Figure 4 below shows the relationship between ERA and the following:

Strikeout Rate (K%), Walk Rate (BB%), and their difference (K-BB%).
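These three rates are simple to compute. The sketch below assumes the common convention of dividing by total batters faced, which this post does not spell out; the function name and sample numbers are hypothetical.

```python
def strikeout_walk_rates(strikeouts: int, walks: int,
                         batters_faced: int) -> tuple[float, float, float]:
    """Return (K%, BB%, K-BB%) as fractions of batters faced."""
    # Assumption: rates are per batter faced, not per inning or per at-bat.
    if batters_faced <= 0:
        raise ValueError("batters_faced must be positive")
    k_pct = strikeouts / batters_faced
    bb_pct = walks / batters_faced
    return k_pct, bb_pct, k_pct - bb_pct


# Hypothetical season: 200 strikeouts and 50 walks against 800 batters.
k_pct, bb_pct, k_minus_bb = strikeout_walk_rates(200, 50, 800)
print(f"K% = {k_pct:.1%}, BB% = {bb_pct:.1%}, K-BB% = {k_minus_bb:.1%}")
```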

As the data shows, ERA has a relatively strong negative correlation with K% and K-BB%. Our aim now is to estimate ERA while taking strikeouts, walks, and home runs into account, all of which are excluded from balls in play and therefore from BABIP. One statistic that attempts to do so is FIP (Fielding Independent Pitching), calculated as FIP = [13*HR + 3*(BB + HBP) - 2*K] / IP + Constant, where HR = home runs, BB = walks, HBP = hit by pitch, K = strikeouts, and IP = innings pitched. The constant is intended to bring FIP onto the same scale as ERA and is a function of league-average ERA.
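The FIP formula above translates directly into code. In this minimal sketch the constant is passed in by the caller, because it changes from season to season; the value 3.10 used in the example is only a placeholder, and the stat line is hypothetical.

```python
def fip(home_runs: int, walks: int, hit_by_pitch: int, strikeouts: int,
        innings_pitched: float, constant: float) -> float:
    """Fielding Independent Pitching: (13*HR + 3*(BB + HBP) - 2*K) / IP + constant."""
    if innings_pitched <= 0:
        raise ValueError("innings_pitched must be positive")
    numerator = 13 * home_runs + 3 * (walks + hit_by_pitch) - 2 * strikeouts
    return numerator / innings_pitched + constant


# Hypothetical line: 20 HR, 50 BB, 5 HBP, 200 K over 180 IP.
# The constant 3.10 is a placeholder, not an official league value.
print(round(fip(20, 50, 5, 200, 180.0, constant=3.10), 2))  # 3.24
```

Note how strikeouts enter with a negative weight while home runs carry the largest positive weight, so a pitcher lowers FIP by missing bats and keeping the ball in the park.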

In Figure 5 below, we can see that FIP is highly correlated with ERA and shows relatively little correlation with BABIP.

FIP does a reasonably good job of estimating ERA while minimizing dependence on defensive performance and BABIP. It is clear from its formula that FIP is minimized by a high K%, a low BB%, and a low HR total. Moreover, we can be more confident in a player with a low FIP and a large number of innings pitched. Figure 6 below plots FIP against K-BB%.

Figure 6: FIP vs. K-BB% color and size weighted by innings pitched and pitches thrown respectively. Once again, each point represents an individual pitcher.

From a Management Perspective:

Here is a quick recap of what we have concluded so far:

1.) ERA is dependent on BABIP.

2.) BABIP is largely out of the pitcher's control, and thus has some luck associated with it. A pitcher's ERA seems to inherit some of the variance associated with BABIP, likely at no fault of his own.

3.) FIP can be a good estimate of ERA, and effectively takes out the dependence on BABIP & defensive performance. The difference between the two can be useful in identifying over- and underperformers.

4.) FIP has a strong negative correlation with K-BB%.


Data Interpretation

So as a manager, what are you supposed to do with these conclusions? I would suggest considering the difference between a pitcher's ERA and their FIP (E-F). A positive difference can imply that a pitcher is more effective than their ERA might suggest. Conversely, a negative difference implies that a pitcher is overperforming and potentially benefiting from good fortune on balls in play.
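The E-F rule of thumb can be expressed as a small helper. This is only a sketch of the idea: the labels and the `tolerance` threshold are my own illustrative choices, not values derived from the data.

```python
def label_pitcher(era: float, fip: float, tolerance: float = 0.25) -> str:
    """Classify a pitcher by E-F (ERA minus FIP).

    tolerance is a hypothetical cutoff below which the gap is treated as noise.
    """
    e_minus_f = era - fip
    if e_minus_f > tolerance:
        # ERA is worse than FIP: results lag the underlying skill.
        return "underperforming"
    if e_minus_f < -tolerance:
        # ERA is better than FIP: results may be flattered by luck or defense.
        return "overperforming"
    return "in line"


# Hypothetical pitchers as (ERA, FIP) pairs.
for era, fip_value in [(4.50, 3.50), (3.00, 4.00), (3.60, 3.50)]:
    print(era, fip_value, label_pitcher(era, fip_value))
```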

Potential interpretation

I do not intend to say that BABIP is solely responsible for the difference between ERA and FIP. However, it is a good starting point when determining why a pitcher may be over- or underperforming. In general, managers should be careful not to overpay overperformers. Underperformers can prove to be great value opportunities when considering player trades, free agency pickups, or contract discussions. Figure 7 below gives a comprehensive picture of what I propose as the first step in evaluating pitchers.


Another suggestion is to emphasize strikeouts and walks when evaluating a pitcher. A deeper analysis regarding why a given pitcher is successful in the strikeout department can reveal which pitchers have a better chance at sustained success. Which pitchers are better prepared to weather the inevitable storms of poor luck on balls in play? The answer: the ones who strike batters out and don't let the ball get put in play.

Figure 7: Identifying over- and underperformers by looking at E-F vs. FIP, color weighted by BABIP. The best pitchers are in the left half of the plot (low FIP). Ideally we seek pitchers in the top left quadrant of the plot, with a yellow color weight. These points represent underperforming pitchers w.r.t. ERA, with low FIP and poor fortune w.r.t. BABIP. These are the best value players.

Further Work To Be Done:

With the evolution of measurement technology, we find ourselves in an era of baseball full of highly detailed data. Specific pitch arsenals of any pitcher can be analyzed via metrics such as average spin rate, average movement, average velocity, or even how effectively a pitcher targets locations around the strike zone. To understand what makes a pitcher elite in terms of strikeouts, this is the necessary next step. I intend to do so as soon as time permits. In the meantime, thank you for reading this post all the way through! For anyone more interested in the interplay between ERA and FIP, check out this article posted on


