Using Data to Analyze Clutch Batting in MLB
Introduction
Major League Baseball (MLB) is one of the largest sports leagues in the United States. According to estimates by Sportico, the average value of the thirty MLB franchises reached about $2.2 billion in 2021. Because players are among the most critical assets of a team, assessing the skills of baseball athletes is important for the teams and interesting for the fans.
More importantly, the success story of the Oakland Athletics’ “Moneyball” season in 2002 has pushed the entire baseball community to develop data-driven approaches that better evaluate a player’s skill from statistics.
Despite the long history of research in baseball statistics, clutch pitching and batting remain mysterious. Roughly speaking, we say a batting situation is a clutch situation if the batter could change the game with one swing. We are particularly interested in the following question: Is there an indicator, or a combination of indicators, that can help us identify good clutch batters?
Looking back at the literature, the answer to this question has been controversial. Many statisticians believe that clutch batting ability is nonexistent or negligible. For example, Cramer stated that there is virtually no evidence of clutch hitting [Cramer 1977], while Silver believed that clutch hitting exists but that the difference is insignificant [Silver 2006]. On the other hand, many baseball players do believe that there is a noticeable difference between good and bad clutch hitters when the game is on the line (see [Verducci 2004]).
The goal of this project is to look into the data and try to answer the following questions:
- Does clutch hitting ability statistically exist?
- What are the factors related to clutch hitting ability?
- Is there a certain group of players that is better at clutch hitting?
Data Acquisition
The first step of the project is to collect the baseball data. Baseball statistics nowadays are more comprehensive than in the past, but since the planned scope of the project covers players throughout MLB history, we scraped the batting data from Retrosheet. Using a Python script built on the BeautifulSoup library, we obtained a database of 19917 MLB players. After ignoring those who never played as a batter, we scraped 16009 players and got 84231 batting seasons among these players.
A very handy feature of the Retrosheet data is that Retrosheet has parsed the entire log of every game and compiled the batting statistics for four clutch situations:
- When there is a runner on base
- When there is a runner in scoring position (second or third base)
- When the game is “late and close”
- When all three bases are loaded with runners
In each situation, we will look at the following three batting statistics of each player, and compare them with the corresponding overall statistics:
- AVG, the batting average, which is the fraction of at-bats in which a batter gets a hit,
- OBP, the on-base percentage, which is the fraction of plate appearances in which a batter reaches at least first base, and
- SLG, the slugging percentage, which is the number of total bases per at-bat and therefore gives extra weight to extra-base hits.
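As a small illustration of these three statistics, the sketch below computes them from raw counting statistics using the standard definitions. The function names and the sample numbers are hypothetical and are not taken from the Retrosheet data.

```python
def batting_average(h, ab):
    """AVG: hits per at-bat."""
    return h / ab


def on_base_percentage(h, bb, hbp, ab, sf):
    """OBP: times on base (hits, walks, hit by pitch) per qualifying plate appearance."""
    return (h + bb + hbp) / (ab + bb + hbp + sf)


def slugging_percentage(h, doubles, triples, hr, ab):
    """SLG: total bases per at-bat."""
    singles = h - doubles - triples - hr
    total_bases = singles + 2 * doubles + 3 * triples + 4 * hr
    return total_bases / ab


# Hypothetical season line, for illustration only.
avg = batting_average(h=150, ab=500)
obp = on_base_percentage(h=150, bb=50, hbp=5, ab=500, sf=5)
slg = slugging_percentage(h=150, doubles=30, triples=5, hr=20, ab=500)
print(f"AVG={avg:.3f} OBP={obp:.3f} SLG={slg:.3f}")
```

The same functions can be applied to the counting statistics of a single clutch situation to get the situational AVG, OBP, and SLG.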
Does clutch batting exist? Introducing the p-value scores
Now that we have the data, it is time to find a way to evaluate “clutch batting”. The essence of the problem is that we need metrics that address the following challenges:
- They should let us statistically test whether clutch batting outcomes are completely determined by luck.
- They should reflect the fact that every batter has a different batting ability.
Our first approach is to introduce a p-value for each combination of statistic and situation. To give a concrete example, let us look at the overall career AVG and the career AVG in the “runner in scoring position” (RISP) situation for two Hall of Famers:
Name | AVG | AVG_RISP |
Ivan Rodriguez | .293 | .290 |
Hank Aaron | .296 | .326 |
Table 1: The total AVG and the AVG in RISP situation for two baseball players
Since the two players have different overall batting averages (Hank Aaron's is slightly higher), we need to take each player's baseline into account when evaluating clutch batting ability. To do so, we compute a p-value for each player, which serves as a rectified score for his batting average in the RISP situation.
The intuitive meaning of the p-value score for Hank Aaron is as follows: assume that the chance of Hank getting a hit is .296 and is the same in every situation (just imagine that Hank flips a coin that comes up heads with probability 29.6% every time he is at bat). What is the probability that his batting average in RISP is .326 or less?
Using Fisher’s exact test as implemented in SciPy, we can get the p-value score, which is a score between 0 (when the batting average in RISP is extremely low) and 1 (when the batting average in RISP is extremely high). In a world where there is no clutch batting ability at all, the p-value scores would in theory be uniformly distributed between 0 and 1. So, we can gain some insight into whether clutch batting ability exists by checking whether this is the case.
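A minimal sketch of how such a score could be computed with `scipy.stats.fisher_exact` is shown below. It assumes the 2x2 table compares hits and outs in clutch at-bats against hits and outs in all other at-bats; the exact table layout used in the project is not stated, so this layout, the helper name `clutch_score`, and the sample counts are all hypothetical.

```python
from scipy.stats import fisher_exact


def clutch_score(clutch_hits, clutch_ab, total_hits, total_ab):
    """One-sided Fisher exact p-value used as a clutch score.

    Values near 0 mean unusually few clutch hits; values near 1 mean
    unusually many, relative to the player's other at-bats.
    """
    other_hits = total_hits - clutch_hits
    other_ab = total_ab - clutch_ab
    table = [
        [clutch_hits, clutch_ab - clutch_hits],  # clutch: hits, outs
        [other_hits, other_ab - other_hits],     # non-clutch: hits, outs
    ]
    # alternative="less" gives P(clutch hits <= observed) under the
    # hypothesis that the hit probability is the same in both rows.
    _, p_value = fisher_exact(table, alternative="less")
    return p_value


# Hypothetical counts (not real career totals): a .296 hitter overall.
hot = clutch_score(clutch_hits=391, clutch_ab=1200, total_hits=1480, total_ab=5000)
cold = clutch_score(clutch_hits=320, clutch_ab=1200, total_hits=1480, total_ab=5000)
print(f"hot in the clutch: {hot:.3f}, cold in the clutch: {cold:.3f}")
```

Splitting the at-bats into clutch and non-clutch rows keeps the two rows disjoint, which is what Fisher's exact test expects.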
To visualize the distribution of the p-value scores, we can draw a cumulative graph: we sort the players by score, so that the x-axis is the number of players and the y-axis is the p-value score. If the hypothesis that clutch batting ability does not exist is true (which means that batting in a clutch situation depends solely on luck), the cumulative graph will be a straight line.
The red lines in Table 2 show the p-value scores of the AVG for all eligible players in the four clutch situations. Alternatively, we can check whether the distribution is uniform with the Kolmogorov-Smirnov test. Using the functions in SciPy, we can get the p-value of the test; by convention, we reject the hypothesis that the distribution is uniform if the p-value is less than 0.05.
Situation | pKS |
Runners in scoring position | 0.1356 |
Men on base | 6.97e-12 |
Close and late | 0.0012 |
Bases loaded | 0.5413 |
Table 2: The cumulative graphs, the histograms of the rectified p-value scores, and the Kolmogorov-Smirnov p-values (pKS) of the rectified p-value scores of the AVG for all eligible players in various clutch situations. In the cumulative graphs, the y-axes are the p-value scores, and the x-axes are the number of eligible players. The red lines are the original p-value scores, and the blue lines are the rectified p-value scores.
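The uniformity check described above can be sketched with `scipy.stats.kstest`. The helper name `uniformity_check` and the two synthetic score sets are assumptions made for illustration; they are not the project's data.

```python
import numpy as np
from scipy.stats import kstest


def uniformity_check(scores):
    """Return the points of the cumulative graph and the KS p-value.

    x = player rank (1..n), y = sorted p-value score. Under a uniform
    distribution the points lie close to a straight line.
    """
    y = np.sort(np.asarray(scores, dtype=float))
    x = np.arange(1, len(y) + 1)
    # Test against the uniform distribution on [0, 1].
    result = kstest(y, "uniform")
    return x, y, result.pvalue


# Synthetic examples: evenly spread scores, and scores pushed toward 0.
even_scores = (np.arange(1000) + 0.5) / 1000
skewed_scores = even_scores ** 3

_, _, p_even = uniformity_check(even_scores)
_, _, p_skewed = uniformity_check(skewed_scores)
print(f"evenly spread: pKS={p_even:.4f}, skewed: pKS={p_skewed:.3g}")
```

With the usual 0.05 threshold, the evenly spread scores are not rejected as uniform, while the skewed scores are.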
Looking at the cumulative graphs, the raw p-value scores do not behave well: they are skewed toward either 0 or 1. This means that in all the clutch situations, the batting averages are systematically biased, sometimes toward the higher end and sometimes toward the lower end. We cannot pin down the reasons for these phenomena, but some possible explanations are that in late and close situations, batters face elite setup men or closers, so getting hits is harder; and that weaker pitchers face RISP situations more often than stronger ones, so batters have an easier time in those situations on average.
Therefore, to remove this kind of bias, we adjusted the clutch batting statistics by a weighted linear regression before computing the p-value scores. The cumulative graphs of the adjusted p-value scores are shown in blue in Table 2. Table 2 also shows the histograms of the adjusted p-value scores and the corresponding Kolmogorov-Smirnov p-values.
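The details of the regression are not spelled out here, so the following is only one plausible sketch. It assumes that each player's clutch AVG is regressed on his overall AVG, with each player weighted by his number of clutch at-bats, and that the fitted value then serves as the expected clutch AVG. The function name and the sample arrays are hypothetical.

```python
import numpy as np


def expected_clutch_avg(overall_avg, clutch_avg, clutch_ab):
    """Weighted least-squares fit of clutch AVG on overall AVG.

    Returns (slope, intercept, fitted clutch AVG for each player).
    """
    overall_avg = np.asarray(overall_avg, dtype=float)
    clutch_avg = np.asarray(clutch_avg, dtype=float)
    # np.polyfit minimizes sum((w * residual) ** 2), so passing sqrt(AB)
    # weights each squared residual by the number of clutch at-bats.
    weights = np.sqrt(np.asarray(clutch_ab, dtype=float))
    slope, intercept = np.polyfit(overall_avg, clutch_avg, deg=1, w=weights)
    return slope, intercept, intercept + slope * overall_avg


# Hypothetical players whose clutch AVG follows an exact linear trend.
overall = np.array([0.240, 0.260, 0.280, 0.300, 0.320])
clutch = 0.01 + 0.95 * overall
at_bats = np.array([100, 400, 250, 800, 50])

slope, intercept, fitted = expected_clutch_avg(overall, clutch, at_bats)
print(f"slope={slope:.3f}, intercept={intercept:.3f}")
```

Under this assumption, the fitted value would replace the overall AVG as the baseline against which a player's clutch results are scored.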
It is evident that after the adjustment, the distribution of the p-value scores is much closer to the uniform distribution. The Kolmogorov-Smirnov p-values even suggest that in some of the clutch situations, the clutch batting average is statistically indistinguishable from batting completely by luck!
In conclusion, after the bias removal, clutch batting ability seems to be nonexistent in some situations. In the other situations, the p-value scores are still skewed toward both extremes and the Kolmogorov-Smirnov test rejects uniformity, so we suspect that clutch batting ability still exists, but not to a large extent. We will dive deeper into these p-value scores across various statistics and situations to search for more facts that might relate to clutch batting ability under our settings.
Searching for factors affecting clutch batting ability
So far, for each statistic (AVG, OBP, and SLG) and each clutch situation, we can compute the p-value score for each batter, giving us 12 scores. A natural question is: are those scores consistent? It turned out to be trickier than we thought. Let’s look again at the AVG example of Ivan Rodriguez and Hank Aaron in Table 3:
Name | AVG | AVG_RISP | p_RISP | p_Men_on_base | p_Close_&_late | p_Bases_loaded |
Ivan Rodriguez | .293 | .290 | .139 | .271 | .266 | .520 |
Hank Aaron | .296 | .326 | .994 | .767 | .997 | .485 |
Table 3: The AVG and p-value scores of two Hall-of-Famer players in various situations
According to our metric, although Hank Aaron has a much higher p-value score than Ivan Rodriguez in the RISP situation, his scores are not consistent across the other clutch situations: his bases-loaded score (.485) is close to the middle of the range. This suggests computing the correlations among the p-value scores across all situations and batting statistics.
Table 4: The correlation matrix of p-value scores over various situations and statistics
We observed some facts from the correlation matrix in Table 4. On the one hand, it is not surprising that the p-value scores of different statistics (AVG, OBP, and SLG) in the same situation are highly correlated, as shown in the diagonal blocks separated by the blue lines. On the other hand, it is surprising that the correlations across different clutch situations are generally low (within ±0.3), even though those clutch situations overlap to some degree.
The low correlation of p-value scores across different clutch situations seems to suggest that the p-value scores of the batting statistics (AVG, OBP, and SLG) might reflect luck more than a true “clutch batting ability”.
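A correlation matrix like the one discussed above could be computed along the following lines with `numpy.corrcoef`. The column labels are hypothetical, and the scores are synthetic, independent uniform numbers that stand in for the real p-value scores, so every off-diagonal correlation here is weak by construction.

```python
import numpy as np

# Hypothetical score columns; the real analysis has 12 of them
# (3 statistics x 4 clutch situations).
labels = ["p_AVG_RISP", "p_AVG_Men_on_base", "p_AVG_Close_late", "p_AVG_Bases_loaded"]

# Synthetic stand-in data: independent uniform scores for 500 imaginary players.
rng = np.random.default_rng(0)
scores = rng.uniform(size=(500, len(labels)))

# rowvar=False treats each column as one variable (one score type).
corr = np.corrcoef(scores, rowvar=False)


def weak_pairs(corr, labels, threshold=0.3):
    """List pairs of score columns whose correlation lies within +/- threshold."""
    pairs = []
    for i in range(len(labels)):
        for j in range(i + 1, len(labels)):
            if abs(corr[i, j]) < threshold:
                pairs.append((labels[i], labels[j], float(corr[i, j])))
    return pairs


for a, b, r in weak_pairs(corr, labels):
    print(f"{a} vs {b}: r = {r:+.3f}")
```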
Let’s assume that the p-value scores still reflect clutch batting ability to some extent. A natural question is whether any other batting abilities or characteristics are related to the p-value scores. To investigate this question, let us look at some other batting statistics that have been used extensively to evaluate a player:
- AB/HR: At bats per home run
- BABIP: Batting average on balls in play
- BB/PA: Walks per plate appearance
- BB/K: Walks per strikeout
- HR/H: Home runs per hit
- ISO: Isolated power: a hitter's ability to hit for extra bases
- PA/SO: Plate appearances per strikeout
- OPS: On-base percentage plus slugging percentage
Some of them can be picked up directly from Retrosheet, and the others can be derived from basic statistics. We can then combine these statistics with the p-value scores and get a large correlation matrix, as shown in Table 5:
Table 5: The correlation matrix of p-value scores over various situations and statistics, together with other batting statistics
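For the statistics that have to be derived, the sketch below uses their standard definitions. The function name, the simplified plate-appearance count, and the sample season line are assumptions for illustration only.

```python
def derived_stats(ab, h, doubles, triples, hr, bb, so, hbp=0, sf=0):
    """Derive the listed batting statistics from basic counting statistics.

    No guard against division by zero (e.g. a player with no home runs);
    a real pipeline would need to handle those cases.
    """
    # Simplified plate appearances: ignores sacrifice bunts and interference.
    pa = ab + bb + hbp + sf
    singles = h - doubles - triples - hr
    avg = h / ab
    obp = (h + bb + hbp) / (ab + bb + hbp + sf)
    slg = (singles + 2 * doubles + 3 * triples + 4 * hr) / ab
    return {
        "AB/HR": ab / hr,
        "BABIP": (h - hr) / (ab - so - hr + sf),
        "BB/PA": bb / pa,
        "BB/K": bb / so,
        "HR/H": hr / h,
        "ISO": slg - avg,
        "PA/SO": pa / so,
        "OPS": obp + slg,
    }


# Hypothetical season line, for illustration only.
stats = derived_stats(ab=500, h=150, doubles=30, triples=5, hr=20,
                      bb=50, so=100, hbp=5, sf=5)
for name, value in stats.items():
    print(f"{name}: {value:.3f}")
```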
From Table 5, we can see that most of the batting statistics have a low correlation with the p-value scores (within ±0.3) and low statistical significance. If one is willing to aggressively read a weak correlation (around ±0.2) as evidence of a causal relationship, and to believe that clutch batting ability is captured by the p-value scores, one can make the following hypotheses:
- Since AB/HR has a positive correlation with the p-value scores, and HR/H and ISO have negative correlations, players who hit for more power tend to be worse in the clutch; and
- Since BB/K and PA/SO have positive correlations with the p-value scores, players who walk more and strike out less often tend to be better clutch players.
These hypotheses are consistent with Silver’s assertions on sluggers and plate discipline in [Silver 2006].
Comparing among groups
We are also interested in whether the backgrounds of baseball players affect their mentality or clutch batting ability. Again, assuming that the p-value scores reflect clutch batting ability to some extent, we can make some comparisons.
Left-handed versus right-handed: First, we can compare batting sides. Figure 6 is a violin plot that compares the distributions of p-value scores of left-handed, right-handed, and switch hitters. It turned out that the p-value scores of the left- and right-handed batters are rather uniformly distributed from 0 to 1. On the other hand, although the p-value scores of switch hitters lean somewhat toward the lower end, we will reserve our conclusion since the sample size is much smaller.
Figure 6: The frequency distribution of the p-value scores over left-handed, right-handed, and switch hitters
Domestic versus international players: Figure 7 compares the distributions of p-value scores of players from the United States and players from other countries. There seems to be no significant difference in clutch batting ability between the two groups.
Figure 7: The frequency distribution of the p-value scores over domestic and international players
Are Hall-of-Famers better in clutch situations? Figure 8 compares the distribution of p-value scores of Hall-of-Fame batters with that of other players. The p-value scores of Hall-of-Famers seem to lean toward the higher end, which may indicate that Hall-of-Famers are better clutch players on average.
Figure 8: The frequency distribution of the p-value scores over Hall-of-Fame batters and other players
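The comparisons above are visual. One way to complement them, which is not part of the original analysis, would be a two-sample Kolmogorov-Smirnov test on the p-value scores of two groups. The sketch below uses `scipy.stats.ks_2samp`; the helper name and the synthetic groups are hypothetical.

```python
import numpy as np
from scipy.stats import ks_2samp


def compare_groups(scores_a, scores_b, alpha=0.05):
    """Two-sample KS test: do two groups of p-value scores differ in distribution?"""
    result = ks_2samp(scores_a, scores_b)
    return result.statistic, result.pvalue, bool(result.pvalue < alpha)


# Synthetic groups: evenly spread scores, and scores shifted toward 1.
group_even = (np.arange(500) + 0.5) / 500
group_high = np.sqrt(group_even)

_, p_same, differs_same = compare_groups(group_even, group_even)
_, p_shifted, differs_shifted = compare_groups(group_even, group_high)
print(f"same distribution: p={p_same:.3f}, shifted: p={p_shifted:.3g}")
```

For small groups such as switch hitters, a test like this has limited power, so a non-significant result should not be read as proof that the groups are the same.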
Conclusion and further directions
Using the traditional batting statistics of MLB players in various clutch situations, we developed a scoring metric, the p-value scores, to evaluate the outcomes of a player in clutch situations, adjusted for his overall batting ability. We analyzed these metrics and found that clutch batting ability might have at most a slight relation with the actual outcomes. Finally, we analyzed the relation of the p-value scores to other batting statistics and to players' backgrounds.
Listed below are some aspects that I am also interested in but did not have time to address:
- Find a better model than weighted linear regression to adjust the p-value scores.
- Investigate the relation between p-value scores and different periods in the season.
- Take into account the effect of different stadiums and years.
- Since 2015, MLB has used improved equipment to collect more advanced metrics through Statcast. Investigate whether any of these metrics better indicates clutch batting ability.
References
[Cramer 1977] R. Cramer, "Do Clutch Hitters Exist?", SABR Baseball Research Journal (1977)
[Silver 2006] N. Silver, "Is David Ortiz a Clutch Hitter?", in Jonah Keri, Ed., Baseball Between the Numbers (New York: Basic Books, 2006): 14–35.
[Verducci 2004] T. Verducci, "Does Clutch Hitting Truly Exist?", Sports Illustrated, April 5, 2004: 60–62.