Using Data to Evaluate MLB Pitchers with Visualizations

Robert Lando

Posted on Aug 2, 2021

Data Context & Motivation:

By playing in one of the most popular American sports leagues, MLB players have the opportunity to earn millions in compensation for their performance. The average major league player’s salary, irrespective of position, is about $4.17 million. As the graph below shows, the average salary increased from 2003 to 2017 and has since plateaued between $4 million and $4.5 million. However, some individual players have contracts that pay tens of millions per year.

For example, New York Mets starting pitcher Jacob deGrom has an estimated payroll salary of $36 million in 2021, roughly 9 times the league-average salary. In fact, starting pitchers account for 10 of the 13 largest payroll salaries in 2021.

Figure 1: MLB Average Player Salary from 2003-2021

Data Assumption

We can assume that a player’s salary is a function of past, current, and expected future performance. This raises the question: how can a pitcher’s performance be measured accurately enough that organizations have a reliable means to determine whether a pitcher's salary is justified relative to the organization's overall spending power?

Common Sense

In the context of starting pitchers, it is most common to measure performance by run suppression. Run suppression broadly refers to a pitcher’s ability to minimize the number of runs scored against their team. The most common statistic for quantifying run suppression is ERA (Earned Run Average), calculated as [(Total Earned Runs) / (Total Innings Pitched)] * 9. ERA represents the number of earned runs a pitcher allows per 9 innings pitched.
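The ERA formula above can be sketched in a few lines of Python. The function names are illustrative and not taken from the original analysis. The innings helper assumes the usual box-score notation, where "180.1" means 180 and one third innings.

```python
def innings_to_float(ip_notation: str) -> float:
    """Convert box-score innings notation to a number of innings.

    Assumption: the digit after the dot counts outs, so '180.1' is
    180 + 1/3 innings and '180.2' is 180 + 2/3 innings.
    """
    whole, _, outs_text = ip_notation.partition(".")
    outs = int(outs_text) if outs_text else 0
    if outs not in (0, 1, 2):
        raise ValueError(f"unexpected outs digit in {ip_notation!r}")
    return int(whole) + outs / 3


def era(earned_runs: int, innings_pitched: float) -> float:
    """Earned runs allowed per 9 innings pitched."""
    if innings_pitched <= 0:
        raise ValueError("innings_pitched must be positive")
    return 9 * earned_runs / innings_pitched


if __name__ == "__main__":
    # Hypothetical season line, not real data.
    print(round(era(60, innings_to_float("180.0")), 2))
```

Converting the notation first matters because treating "180.1" as the decimal 180.1 slightly misstates the innings total.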

While ERA has undoubtedly been the standard in measuring pitcher effectiveness over the years, modern baseball analysts have since uncovered its flaws. In this post, I aim to develop the argument against ERA as a reliable measurement of pitcher effectiveness. To do so, I analyzed aggregate statistics for individual starting pitchers from 2017-2021. The data was gathered from Fangraphs.com and baseballsavant.mlb.com via CSV download and imported into Python for visualization and analysis. At the time of analysis, the 2021 season was approximately halfway through.

Method of Calculation

We will be building our argument around BABIP (Batting Average on Balls in Play), calculated as BABIP = (H - HR) / (AB - K - HR + SF), where H = hits, HR = home runs, AB = at-bats, K = strikeouts, and SF = sacrifice flies. Simply stated, a ball in play is any at-bat that does not result in a strikeout or home run, plus sacrifice flies, which are not scored as at-bats. By convention, walks, hit-by-pitches, and bases awarded by interference are not credited as at-bats either, and are therefore implicitly removed from the calculation as well. BABIP represents the rate at which balls put into the field of play fall for hits.
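As a minimal sketch, the BABIP formula translates directly into Python. The function and argument names are illustrative, and the sample numbers are made up.

```python
def babip(hits: int, home_runs: int, at_bats: int,
          strikeouts: int, sac_flies: int) -> float:
    """Batting average on balls in play: (H - HR) / (AB - K - HR + SF)."""
    balls_in_play = at_bats - strikeouts - home_runs + sac_flies
    if balls_in_play <= 0:
        raise ValueError("no balls in play for these totals")
    return (hits - home_runs) / balls_in_play


if __name__ == "__main__":
    # Hypothetical totals allowed by one pitcher, not real data.
    print(round(babip(hits=150, home_runs=20, at_bats=600,
                      strikeouts=150, sac_flies=5), 3))
```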

Data Processing

How Are Runs Scored?

Generally speaking, when opposing hitters consistently put the ball in play and get on base, runs tend to follow. Therefore, a pitcher's ERA should be positively correlated with BABIP. The data figure below shows the relationship between ERA and BABIP, as well as the frequency distribution of individual BABIP.

Figure 2: (Left) Frequency distribution of individual BABIP. A Shapiro-Wilk test for normality yielded a p-value of ~0.21 > 0.05.
(Right) Scatterplot of ERA vs. BABIP (correlation coefficient = 0.45), size weighted by total innings pitched. Each point represents an individual pitcher.

Since the Shapiro-Wilk test does not reject normality, we can treat BABIP as approximately normally distributed, while ERA and BABIP exhibit a moderate positive correlation. The relationship is likely not stronger because home runs are omitted from the calculation of BABIP. Regardless, since ERA is positively correlated with BABIP, we must ask ourselves what makes a ball put in play more likely to fall for a hit. To answer this question, we consider BABIP's relationship with the following type/quality of contact measurements:
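For readers who want to reproduce a correlation coefficient like the one quoted for ERA vs. BABIP, here is a minimal pure-Python sketch of the Pearson coefficient. It assumes an unweighted Pearson correlation, which the post does not state explicitly, and the sample values are invented. For the normality check, SciPy provides a Shapiro-Wilk test as `scipy.stats.shapiro`; it is not used here so the example stays dependency-free.

```python
import math
from typing import Sequence


def pearson_r(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Unweighted Pearson correlation coefficient between two samples."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two samples of equal length >= 2")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sample")
    return cov / math.sqrt(var_x * var_y)


if __name__ == "__main__":
    # Made-up per-pitcher values, for illustration only.
    babip_values = [0.270, 0.285, 0.300, 0.310, 0.325]
    era_values = [3.10, 3.80, 3.60, 4.40, 4.20]
    print(round(pearson_r(babip_values, era_values), 2))
```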

Average Launch Angle - Angle relative to the ground the ball exits an opposing hitter's bat on average.

Average Exit Velocity - Velocity at which the ball exits an opposing hitter's bat on average.

Barrel Rate - Rate at which the ball exits an opposing hitter's bat at a velocity of at least 98 mph and within a launch angle window that starts at roughly 26-30 degrees and widens as exit velocity increases.

Flyball Rate - Rate at which the ball exiting the opposing hitter's bat is hit high and to the outfield.

Groundball Rate - Rate at which the ball exiting the opposing hitter's bat bounces or rolls off the ground immediately after contact.

Popup Rate - Rate at which the ball exiting the opposing hitter's bat is hit high and in the infield.
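To make the measurements above concrete, here is a hedged sketch that summarizes a pitcher's contact profile from a list of batted balls. The record layout (the `type`, `launch_angle`, and `exit_velocity` keys and the type labels) is hypothetical. Sources such as Fangraphs and Baseball Savant already publish these aggregates, so this only illustrates what the rates mean. Barrel rate is left out because its definition depends on a velocity-dependent launch angle window.

```python
from collections import Counter
from typing import Dict, List


def contact_profile(batted_balls: List[dict]) -> Dict[str, float]:
    """Average launch angle / exit velocity and batted-ball type rates.

    Assumption: each record is a dict with hypothetical keys
    'type' ('groundball', 'flyball', 'popup', ...),
    'launch_angle' (degrees) and 'exit_velocity' (mph).
    """
    n = len(batted_balls)
    if n == 0:
        raise ValueError("no batted balls to summarize")
    counts = Counter(ball["type"] for ball in batted_balls)
    return {
        "avg_launch_angle": sum(b["launch_angle"] for b in batted_balls) / n,
        "avg_exit_velocity": sum(b["exit_velocity"] for b in batted_balls) / n,
        "groundball_rate": counts["groundball"] / n,
        "flyball_rate": counts["flyball"] / n,
        "popup_rate": counts["popup"] / n,
    }


if __name__ == "__main__":
    # Invented batted balls, for illustration only.
    sample = [
        {"type": "groundball", "launch_angle": -5, "exit_velocity": 90},
        {"type": "flyball", "launch_angle": 30, "exit_velocity": 100},
        {"type": "popup", "launch_angle": 60, "exit_velocity": 80},
        {"type": "groundball", "launch_angle": 3, "exit_velocity": 94},
    ]
    print(contact_profile(sample))
```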

Figure 3: BABIP's relationship with several quality of contact measurements, color weighted by total innings pitched. Each point represents a single pitcher. Horizontal and vertical lines represent sample averages w.r.t. the corresponding axis.
Correlation coefficients (top, left to right): -0.31, 0.27, -0.13;
(bottom, left to right): -0.26, 0.22, -0.39

Somewhat surprisingly, BABIP doesn't seem to have a strong correlation with any of the above quality of contact measurements. Since quality of contact is determined by the pitcher and the hitter alone, the lack of correlation suggests that BABIP is largely out of the pitcher's control. Some pitchers may disproportionately benefit from good fortune on balls put in play simply because they have a better defense behind them than other pitchers do.

If our goal is to evaluate a pitcher's skill, it may make more sense to look at metrics uncorrelated with BABIP. In other words, we should seek out metrics that pitchers have more control over and remove any dependencies on defensive play.

So What Does The Pitcher Have Control Over?

Assuming an umpire who calls balls and strikes accurately, a pitcher has control over two things: strikeouts and walks. Figure 4 below shows the relationship between ERA and the following:

Strikeout Rate (K%), Walk Rate (BB%), and their difference (K-BB%).
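A minimal sketch of these three rates follows. It assumes the common convention of dividing by total batters faced, which the post does not spell out; the function name and sample totals are illustrative.

```python
from typing import Tuple


def k_bb_rates(strikeouts: int, walks: int,
               batters_faced: int) -> Tuple[float, float, float]:
    """Return (K%, BB%, K-BB%) as fractions of batters faced.

    Assumption: rates are per batter faced, not per inning or per at-bat.
    """
    if batters_faced <= 0:
        raise ValueError("batters_faced must be positive")
    k_rate = strikeouts / batters_faced
    bb_rate = walks / batters_faced
    return k_rate, bb_rate, k_rate - bb_rate


if __name__ == "__main__":
    # Hypothetical season totals, not real data.
    k, bb, diff = k_bb_rates(strikeouts=200, walks=50, batters_faced=800)
    print(f"K% = {k:.1%}, BB% = {bb:.1%}, K-BB% = {diff:.1%}")
```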

Figure 4: ERA vs. BB%, K%, and K-BB%, color and size weighted by total innings pitched and pitches thrown respectively. Correlation coefficients from left to right are 0.28, -0.64, -0.68. Each point represents an individual pitcher.

As we can see from the data collected, ERA has a relatively strong negative correlation with K% and K-BB%. Our aim now is to estimate ERA while taking strikeouts, walks, and home runs into account, none of which enter the calculation of BABIP. One statistic that attempts to do so is FIP (Fielding Independent Pitching), calculated as FIP = [13*HR + 3*(BB + HBP) - 2*K] / IP + Constant, where HR = home runs, BB = walks, HBP = hit by pitch, K = strikeouts, and IP = innings pitched. The constant is intended to bring FIP onto the same scale as ERA and is a function of league-average ERA.
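The FIP formula can be sketched as follows. The `fip_constant` helper uses the commonly published definition, in which the constant is chosen so that league-wide FIP equals league-wide ERA. That is an assumption about how the constant was obtained for this analysis, and all names and numbers below are illustrative.

```python
def fip(home_runs: int, walks: int, hit_by_pitch: int, strikeouts: int,
        innings_pitched: float, constant: float) -> float:
    """FIP = [13*HR + 3*(BB + HBP) - 2*K] / IP + constant."""
    if innings_pitched <= 0:
        raise ValueError("innings_pitched must be positive")
    raw = 13 * home_runs + 3 * (walks + hit_by_pitch) - 2 * strikeouts
    return raw / innings_pitched + constant


def fip_constant(lg_era: float, lg_home_runs: int, lg_walks: int,
                 lg_hit_by_pitch: int, lg_strikeouts: int,
                 lg_innings: float) -> float:
    """Constant that makes league FIP equal league ERA (assumed definition)."""
    # League FIP with a zero constant is the unscaled component total.
    unscaled = fip(lg_home_runs, lg_walks, lg_hit_by_pitch,
                   lg_strikeouts, lg_innings, constant=0.0)
    return lg_era - unscaled


if __name__ == "__main__":
    # Hypothetical pitcher line and a made-up constant of 3.10.
    print(round(fip(20, 50, 5, 200, 180.0, constant=3.10), 2))
```

Because home runs carry a weight of 13 against 2 for strikeouts, a handful of extra home runs moves FIP far more than the same number of extra strikeouts.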

In Figure 5 below, we can see that FIP is highly correlated with ERA and shows little to no correlation with BABIP.

Figure 5: (Left) FIP vs. BABIP, size weighted by total innings pitched. (Right) FIP vs. ERA, color and size weighted by their difference (ERA-FIP) and total innings pitched respectively. Each point represents an individual pitcher.

FIP does a reasonably good job of estimating ERA while minimizing dependence on defensive performance and BABIP. It is also clear from its formula that FIP is minimized by a high K%, a low BB%, and a low home run total. Moreover, we can be more confident in a player with a low FIP when it is backed by a large number of innings pitched. Figure 6 below is a plot of FIP vs. K-BB%.

Figure 6: FIP vs. K-BB%, color and size weighted by innings pitched and pitches thrown respectively. Once again, each point represents an individual pitcher.

From a Management Perspective:

Here is a quick recap of what we have concluded so far:

1.) ERA is dependent on BABIP.

2.) BABIP is largely out of the pitcher's control, and thus has some luck associated with it. A pitcher's ERA seems to inherit some of the variance associated with BABIP, likely at no fault of his own.

3.) FIP can be a good estimate of ERA, and effectively takes out the dependence on BABIP & defensive performance. The difference between the two can be useful in identifying over and under performers.

4.) FIP has a strong negative correlation with K-BB%.

Data Interpretation

So as a manager, what are you supposed to do with these conclusions? I would suggest considering the difference between a pitcher's ERA and their FIP (E-F). A positive difference (ERA higher than FIP) can imply that a pitcher is more effective than their ERA might suggest. Conversely, a negative difference (ERA lower than FIP) implies that a pitcher is overperforming and potentially benefiting from good fortune on balls in play.
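The E-F comparison can be sketched as a simple labeling rule. The 0.50-run threshold is a hypothetical cutoff chosen for illustration, not a value from this analysis, and the function names are likewise illustrative.

```python
def era_fip_gap(era: float, fip: float) -> float:
    """E-F: positive when ERA is higher (worse) than FIP."""
    return era - fip


def label_gap(gap: float, threshold: float = 0.50) -> str:
    """Label a pitcher by E-F.

    Assumption: `threshold` is an arbitrary illustrative cutoff in runs;
    gaps inside +/- threshold are treated as noise.
    """
    if gap > threshold:
        return "underperforming"   # ERA worse than FIP suggests
    if gap < -threshold:
        return "overperforming"    # ERA better than FIP suggests
    return "in line"


if __name__ == "__main__":
    # Made-up (ERA, FIP) pairs, for illustration only.
    for era_value, fip_value in [(4.50, 3.60), (2.80, 3.70), (3.50, 3.40)]:
        gap = era_fip_gap(era_value, fip_value)
        print(f"ERA {era_value:.2f}, FIP {fip_value:.2f}: {label_gap(gap)}")
```

In practice the cutoff would need to account for innings pitched, since E-F is noisier for pitchers with small samples.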

Potential Interpretation

I do not intend to say that BABIP is solely responsible for the difference between ERA and FIP. However, it is a good starting point when determining why a pitcher may be over- or underperforming. In general, managers should be careful not to overpay overperformers. Underperformers can prove to be great value opportunities when considering player trades, free agency pickups, or contract discussions. Figure 7 below gives a comprehensive picture of what my proposed first step in evaluating pitchers should be.

Suggestions

Another suggestion is to emphasize strikeouts and walks when evaluating a pitcher. A deeper analysis regarding why a given pitcher is successful in the strikeout department can reveal which pitchers have a better chance at sustained success. Which pitchers are better prepared to weather the inevitable storms of poor luck on balls in play? The answer: the ones who strike batters out and don't let the ball get put in play.

Further Work To Be Done:

With the evolution of measurement technology, we find ourselves in an era of baseball full of highly detailed data. Specific pitch arsenals of any pitcher can be analyzed via metrics such as average spin rate, average movement, average velocity, or even how effectively a pitcher targets locations around the strike zone. To understand what makes a pitcher elite in terms of strikeouts, this is the necessary next step. I intend to do so as soon as time permits. In the meantime, thank you for reading this post all the way through! For anyone more interested in the interplay between ERA and FIP, check out this article posted on Fangraphs.com:

Evaluating the Gap Between ERA and FIP

The skills I demoed here can be learned by taking the Data Science with Machine Learning bootcamp at NYC Data Science Academy.
