Data on the 3-Point Shot: NBA Trends and Team Success
The skills demonstrated here can be learned by taking the Data Science with Machine Learning bootcamp at NYC Data Science Academy.
Data can tell us a lot about how the 3-point shot has shaped NBA trends and team success. The first NBA season was 1949-50; however, the 3-point shot was not introduced until the 1979-80 season. The current NBA 3-point line is 22 feet from the basket in the corners and 23 feet 9 inches along the rest of the arc. It took time for players to attempt these shots frequently, and the 3-pointer became more prevalent only gradually. In the past 10 years, however, shooters have become more accurate through training and practice, and attempts have increased so dramatically that the shot has completely changed offensive and defensive strategies.
The goals of this analysis were to examine 1) league-wide shooting trends, to determine which types of shots are most efficient, and 2) team-by-team trends, to correlate shot-making strategies with team success (more wins).
The stakeholders are 1) team owners, general managers, and scouts deciding which players to acquire, 2) coaches designing offensive and defensive strategies, and 3) players deciding which types of shots to practice and improve.
The data consisted of individual CSV files, each containing every play of one NBA season (https://sports-statistics.com/sports-data/nba-basketball-datasets-csv-files/). From each CSV file, I created a pandas DataFrame in which each row was a single play and each column was an attribute of that play. The DataFrames were condensed to a small subset of columns, such as the play description and team. The play description was a string describing the play, such as “James 27’ 3PT Jump Shot (9 PTS) (Rondo 5 AST)”. Using regular expressions and other techniques, I extracted the shot type (2-point or 3-point), shot distance, and shot outcome (make or miss), and added these as new columns to the DataFrame for analysis.
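The extraction step can be sketched with Python's `re` module and pandas. This is a minimal illustration, not the code used for this analysis: the function name `parse_shot`, the column names, and the format rules (a 3-point attempt contains "3PT", a miss starts with "MISS", a shot with no written distance counts as 0 feet) are all assumptions that should be checked against the real files.

```python
import re

import pandas as pd

# Distance in feet, written as e.g. 27' or 27’ in the description (assumed format).
DISTANCE_RE = re.compile(r"(\d+)['’]")


def parse_shot(description: str) -> dict:
    """Pull shot type, distance (ft), and outcome out of one play description.

    Hypothetical format rules (verify against the real data):
    - a 3-point attempt contains the token "3PT";
    - a missed shot starts with "MISS";
    - a shot with no written distance is treated as 0 ft.
    """
    match = DISTANCE_RE.search(description)
    return {
        "shot_type": 3 if "3PT" in description else 2,
        "distance": int(match.group(1)) if match else 0,
        "made": not description.startswith("MISS"),
    }


# Toy rows in the style of the example description; assumed to be shot attempts only.
plays = pd.DataFrame({"description": [
    "James 27’ 3PT Jump Shot (9 PTS) (Rondo 5 AST)",
    "MISS Davis 12' Jump Shot",
    "Green Driving Layup (4 PTS)",
]})
parsed = pd.DataFrame([parse_shot(d) for d in plays["description"]], index=plays.index)
shots = plays.join(parsed)
print(shots[["shot_type", "distance", "made"]])
```

Building the new columns from a list of dicts keeps the parsing logic in one plain function that is easy to test on single strings before running it over a whole season.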
I first analyzed how the frequency of 3-point attempts has changed over the last decade. Figure 1 shows the proportions of league-wide shot attempts that were 2-pointers and 3-pointers from the 2010-11 season through the 2018-19 season. The share of attempts that were 3-pointers rose from 20% in 2010-11 to 30% in 2018-19. Considering that most of the court area players typically shoot from lies inside the 3-point line, having ~30% of all shot attempts come from beyond it is quite high.
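The per-season proportion behind Figure 1 reduces to a single groupby. The sketch below assumes one row per shot attempt with hypothetical columns `season` and `shot_type`; the toy numbers are made up and are not NBA data.

```python
import pandas as pd


def three_point_share(shots: pd.DataFrame) -> pd.Series:
    """Share of all field-goal attempts that are 3-pointers, for each season.

    Assumes one row per shot attempt, with hypothetical columns
    'season' (e.g. "2010-11") and 'shot_type' (2 or 3).
    """
    is_three = shots["shot_type"] == 3
    # The mean of a boolean Series is the fraction of True values.
    return is_three.groupby(shots["season"]).mean()


# Toy attempts (not real NBA data).
toy = pd.DataFrame({
    "season": ["2010-11"] * 5 + ["2018-19"] * 4,
    "shot_type": [3, 2, 2, 2, 2, 3, 2, 3, 2],
})
share = three_point_share(toy)
print(share)
```

The 2-point share for each season is simply `1 - share`.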
Figure 2 shows the distribution of all shot attempts during the 2019-20 season by shot distance from the hoop. There are two clear peaks: the first from 0-3 feet and the second from 23-27 feet. These correspond to shots at or near the basket and to the shortest 3-point distances, respectively. Relatively few attempts fall between these peaks, in the mid-range, a shot that most NBA teams have gradually phased out of their offensive strategy.
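A distance distribution like the one in Figure 2 can be tabulated by counting attempts per whole foot. This is a sketch under the assumption that distances are stored as integers in feet; the 40-foot cap and the function name are illustrative choices, not taken from the original analysis.

```python
import pandas as pd


def attempts_by_distance(distances: pd.Series, max_ft: int = 40) -> pd.Series:
    """Number of shot attempts at each whole-foot distance from 0 to max_ft.

    Attempts beyond max_ft are dropped; distances with no attempts get 0.
    """
    in_range = distances[(distances >= 0) & (distances <= max_ft)]
    return in_range.value_counts().reindex(range(max_ft + 1), fill_value=0)


# Toy distances in feet (not real data): two at the rim, three from 25 ft.
toy = pd.Series([0, 0, 1, 25, 25, 25, 12, 45])
counts = attempts_by_distance(toy)
print(counts[counts > 0])
```

Reindexing over every foot keeps empty distances as explicit zeros, so the mid-range gap shows up in the table rather than silently disappearing. With matplotlib installed, `counts.plot.bar()` draws the histogram.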
Shot Distance Analysis
The distribution of shot attempts is likely closely related to efficiency at each shot distance, so I investigated this next. Figure 3 shows field goal percentage (FG%), the proportion of shot attempts that are made, as a function of shot distance from 0-40 feet. FG% was highest near the basket: >70% from 0-1 feet and >60% from 2 feet. It decayed quickly to <40% at 4 feet. Interestingly, FG% varies little (+/- a few percentage points) with distance over most of the rest of the range, from 4-30 feet.
I hypothesize that this is likely not due to shot difficulty but rather to the position of defenders. Shots taken closer to the basket are likely more contested (the defender is closer to the shooter) than those taken farther away, such as 3-point attempts. Data on the distance between the shooter and the closest defender would help explain this trend.
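FG% by distance, as in Figure 3, is the mean of a make/miss flag grouped by distance. The sketch below assumes hypothetical columns `distance` (feet) and `made` (bool), and adds an optional `min_attempts` filter, an illustrative choice rather than part of the original analysis, because distances with only a handful of attempts give noisy percentages.

```python
import pandas as pd


def fg_pct_by_distance(shots: pd.DataFrame, max_ft: int = 40, min_attempts: int = 1) -> pd.DataFrame:
    """Attempts and FG% (makes / attempts) at each whole-foot distance up to max_ft.

    Assumes hypothetical columns 'distance' (ft) and 'made' (bool).
    Distances with fewer than min_attempts attempts are dropped as too noisy.
    """
    in_range = shots[shots["distance"] <= max_ft]
    table = in_range.groupby("distance")["made"].agg(attempts="size", fg_pct="mean")
    return table[table["attempts"] >= min_attempts]


# Toy shots (not real data).
toy = pd.DataFrame({
    "distance": [0, 0, 0, 1, 4, 4, 25, 25],
    "made": [True, True, False, True, False, True, False, True],
})
table = fg_pct_by_distance(toy, min_attempts=2)
print(table)
```

Keeping the attempt count next to the percentage makes it easy to see which points on a plot like Figure 3 rest on very little data.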
Effective Field Goal Percentage
Since a 3-point shot is worth 1.5 times as much as a 2-point shot, a statistic commonly used in the current NBA is effective field goal percentage (EFG%), which weights 3-point makes by a factor of 1.5 in the FG% calculation. Figure 4 shows EFG% versus shot distance. As in Figure 3, shots from 0-2 feet are still the most efficient. Beyond that range, however, 3-point shots are the most efficient because they are worth more than 2-point shots.
As a result, most players who cannot get within 2 feet of the basket should attempt a 3-pointer. I emphasize that this is a general strategy; more specific strategies can be tailored to individual players using their own statistics.
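EFG% is (FGM + 0.5 × 3PM) / FGA, so each 3-point make counts 1.5 times. The sketch below computes it from totals and per distance; the column names and the 40%/35% shooter are hypothetical, chosen only to show why a lower FG% on 3s can still be the better shot.

```python
import numpy as np
import pandas as pd


def efg_pct(made: int, three_made: int, attempts: int) -> float:
    """EFG% = (FGM + 0.5 * 3PM) / FGA, so each 3-point make counts 1.5 times."""
    return (made + 0.5 * three_made) / attempts


def efg_pct_by_distance(shots: pd.DataFrame) -> pd.Series:
    """EFG% at each distance. Assumes hypothetical columns 'distance', 'made', 'shot_type'."""
    # Credit per attempt: 0 for a miss, 1 for a 2-point make, 1.5 for a 3-point make.
    weight = np.where(shots["shot_type"] == 3, 1.5, 1.0)
    credit = shots["made"] * weight
    return credit.groupby(shots["distance"]).mean()


# Hypothetical shooter: 40% on long 2s vs 35% on 3s (illustrative numbers only).
print(efg_pct(made=40, three_made=0, attempts=100))   # 0.4
print(efg_pct(made=35, three_made=35, attempts=100))  # 0.525
```

Doubling EFG% gives points per attempt (free throws aside): 0.80 for the long 2s versus 1.05 for the 3s in this example, which is the arithmetic behind preferring the 3-pointer once a shot at the rim is off the table.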
Team Trends Correlated with Wins
After investigating league-wide trends in shot attempts and makes, I next asked whether shooting statistics correlate with wins on a team-by-team basis. Figure 5 shows the number of wins for each NBA team in the 2019-20 season versus that team's 3-point percentage. There is no apparent overall trend, and the data show a large amount of scatter.
Shooting Efficiency of Each Team
Figure 6 shows the same data as Figure 5, split into two groups (teams that made the playoffs and teams that did not); however, there is still no observable trend. This is likely due to the simplicity of the 3-point percentage statistic, which ignores the number of 3-point attempts, the number of 2-point attempts, and the 2-point percentage. EFG% combines all of these into a weighted average and is likely a better indicator of each team's shooting efficiency.
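A small worked example shows why 3-point percentage alone can hide real differences. The two teams below are hypothetical: both shoot 36% from three, but their shot mix and 2-point accuracy differ, and the team-level formula (2PM + 1.5 × 3PM) / (2PA + 3PA) separates them.

```python
def efg_from_totals(fg2m: int, fg2a: int, fg3m: int, fg3a: int) -> float:
    """Team EFG% from box-score totals: (2PM + 1.5 * 3PM) / (2PA + 3PA)."""
    return (fg2m + 1.5 * fg3m) / (fg2a + fg3a)


# Two hypothetical teams that both shoot 36% from three (18/50 and 9/25).
team_a = efg_from_totals(fg2m=26, fg2a=50, fg3m=18, fg3a=50)  # 52% on 2s, many 3PA
team_b = efg_from_totals(fg2m=36, fg2a=75, fg3m=9, fg3a=25)   # 48% on 2s, few 3PA
print(round(team_a, 3), round(team_b, 3))
```

By hand, team A works out to (26 + 27) / 100 = 0.53 and team B to (36 + 13.5) / 100 = 0.495, a gap that a plot of wins against 3-point percentage cannot show.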
Figure 7 shows the number of wins plotted against EFG%, and a much stronger linear trend was observed, with an R² value of 0.59. The trend is quite good considering that only shooting statistics were used and many others were neglected, such as rebounds, assists, turnovers, and free throw percentage. It is also accurate at the extremes: the team with the lowest EFG% had the fewest wins, and the team with the highest EFG% had the most wins.
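A linear fit and its R² like the one reported for Figure 7 can be computed with NumPy. This is a sketch, not the original code; the function name and the toy team values are hypothetical and are not the 2019-20 data. For a straight-line fit with an intercept, R² = 1 - SS_res / SS_tot equals the squared Pearson correlation.

```python
import numpy as np


def fit_wins_vs_efg(efg, wins):
    """Fit wins ≈ slope * EFG + intercept by least squares; return (slope, intercept, R²)."""
    x = np.asarray(efg, dtype=float)
    y = np.asarray(wins, dtype=float)
    slope, intercept = np.polyfit(x, y, deg=1)  # highest-degree coefficient first
    residuals = y - (slope * x + intercept)
    ss_res = np.sum(residuals ** 2)
    ss_tot = np.sum((y - y.mean()) ** 2)
    return slope, intercept, 1.0 - ss_res / ss_tot


# Toy team values (not the 2019-20 data).
efg = [0.50, 0.51, 0.52, 0.53, 0.54, 0.55, 0.56]
wins = [22, 30, 28, 41, 38, 50, 55]
slope, intercept, r2 = fit_wins_vs_efg(efg, wins)
print(f"{slope / 100:.2f} extra wins per EFG percentage point, R^2 = {r2:.2f}")
```

Because EFG% is a fraction, the fitted slope is in wins per 1.0 of EFG; dividing by 100 expresses it per percentage point, which is easier to interpret.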
Figure 8 shows the same data as Figure 7, separated into playoff and non-playoff teams. With a few outliers, teams that made the playoffs had an EFG% greater than 0.53, while teams that did not had an EFG% below 0.53. Repeating this team analysis for the 2018-19 season gave virtually the same results.
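The 0.53 split can be read as a one-feature rule ("EFG% above 0.53 means a playoff team") and scored directly. The sketch below assumes hypothetical columns `team`, `efg`, and `playoffs`; the five toy teams are made up to show how the accuracy and the outliers would be reported.

```python
import pandas as pd


def playoff_cutoff_check(teams: pd.DataFrame, cutoff: float = 0.53):
    """Score the rule "EFG% > cutoff means a playoff team" and list the exceptions.

    Assumes hypothetical columns 'team', 'efg' and 'playoffs' (bool).
    """
    predicted = teams["efg"] > cutoff
    correct = predicted == teams["playoffs"]
    return correct.mean(), teams.loc[~correct, "team"].tolist()


# Toy teams (not real data); team E is above the cutoff but missed the playoffs.
toy = pd.DataFrame({
    "team": ["A", "B", "C", "D", "E"],
    "efg": [0.55, 0.54, 0.52, 0.50, 0.535],
    "playoffs": [True, True, False, False, False],
})
accuracy, outliers = playoff_cutoff_check(toy)
print(accuracy, outliers)
```

Running the same check on each season's team table would turn "virtually the same results" into a directly comparable accuracy figure and a named list of outliers.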
Data Science Conclusions
The frequency of 3-point attempts has increased over the last decade (fig. 1), and most shot attempts are now taken near the basket or from 3-point range (fig. 2). The most efficient shots are those near the basket (0-2 feet), followed by 3-pointers, because of the extra weight 3-point makes receive in EFG% (figs. 3, 4). The team analysis showed no trend in wins versus 3-point percentage (figs. 5, 6). However, there was a clear trend of wins versus EFG%, and most playoff teams had an EFG% greater than 0.53 in the 2019-20 season (figs. 7, 8).
In the future, other statistical categories, such as turnovers, FT%, and rebounds, could be included to build a more robust model of team wins. The team analysis could also be done on an individual player basis, using the time remaining and score to determine which players are the most clutch performers.
Additional data not included in this dataset that would be extremely useful are the location of each shot attempt and the distance to the nearest defender. These data could be used to build a predictive model of EFG% (on a player and team basis) as a function of shooter location and distance to the nearest defender.