Data on the 3-Point Shot in the NBA: Trends and Team Success

Posted on Oct 24, 2021

The skills demonstrated here can be learned by taking the Data Science with Machine Learning bootcamp at NYC Data Science Academy.

Data Introduction

Shot data can tell us a lot about how NBA shooting trends relate to team success. The first NBA season was in 1949-50; however, the 3-point shot was not introduced until the 1979-80 season. The current NBA 3-point line is shortest in the corners, at 22 feet, and the rest of the 3-point arc is 23 feet 9 inches from the basket. Players were slow to adopt the shot, and its use grew only gradually over time. Over the past 10 years, shooters have become more accurate through training and practice, and 3-point attempts have become so much more frequent that they have completely changed offensive and defensive strategies.


The goals of this analysis were 1) to examine league-wide shooting trends to determine which types of shots are most efficient and 2) to examine trends by team to correlate shot-selection strategies with team success (more wins).

The stakeholders are 1) team owners, general managers, and scouts deciding which players to acquire, 2) coaches designing offensive and defensive strategies, and 3) players deciding which types of shots to practice and improve.

Data Cleaning

The data consisted of individual CSV files, each containing every play of one NBA season. From each CSV file I created a pandas DataFrame in which each row was a single play and each column an attribute of that play. I condensed the DataFrames to a small subset of columns, such as the play description and the team. The play description was a string describing the play, such as “James 27’ 3PT Jump Shot (9 PTS) (Rondo 5 AST)”. Using regular expressions and other techniques, I extracted the shot type (2-point or 3-point), shot distance, and shot outcome (make or miss), and added these as new columns in the DataFrame for analysis.
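The extraction step can be sketched with a single regular expression plus two substring checks. This is a minimal illustration, not the author's code: it assumes the DataFrame has already been filtered to field-goal attempts, that the description column is named `description`, and that missed shots are prefixed with `MISS` (an assumption; only the made-shot example above comes from the post). The helper names `parse_shot` and `add_shot_columns` are hypothetical.

```python
import re

import pandas as pd

# Assumed description formats (hypothetical, extrapolated from the one example):
#   made shot:   "James 27' 3PT Jump Shot (9 PTS) (Rondo 5 AST)"
#   missed shot: "MISS James 27' 3PT Jump Shot"   (assumed "MISS" prefix)
# Shot distance is the integer before a straight or curly apostrophe.
DISTANCE_RE = re.compile(r"(\d+)['\u2019]")


def parse_shot(description: str) -> dict:
    """Extract shot type, distance (ft), and outcome from one shot description."""
    match = DISTANCE_RE.search(description)
    return {
        "shot_type": 3 if "3PT" in description else 2,
        # Some shots (e.g. layups) may omit a distance; leave those missing.
        "shot_distance": int(match.group(1)) if match else None,
        "made": not description.startswith("MISS"),
    }


def add_shot_columns(plays: pd.DataFrame) -> pd.DataFrame:
    """Append parsed shot columns to a DataFrame of field-goal attempts."""
    parsed = pd.DataFrame(plays["description"].map(parse_shot).tolist(),
                          index=plays.index)
    return plays.join(parsed)


# Hypothetical example rows.
plays = pd.DataFrame({"description": [
    "James 27' 3PT Jump Shot (9 PTS) (Rondo 5 AST)",
    "MISS Davis 15' Jump Shot",
    "Davis Driving Layup (4 PTS)",
]})
shots = add_shot_columns(plays)
```

Keeping the parsing in a plain function makes it easy to unit-test on individual strings before applying it to a full season of plays.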

Data Analysis

League-wide Trends

I first analyzed how the frequency of 3-point attempts has changed over the last decade. Figure 1 shows the proportion of all league-wide shot attempts that were 2-pointers and 3-pointers from the 2010-11 season through the 2018-19 season. The share of attempts that were 3-pointers increased from 20% in 2010-11 to 30% in 2018-19. Considering that most of the floor area from which players typically shoot lies inside the 3-point line, the fact that ~30% of all shot attempts are 3-pointers is quite high.


Figure 1. Proportion of 2-point and 3-point shot attempts by year.
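A per-season breakdown like Figure 1 reduces to a row-normalized cross-tabulation. The sketch below assumes one row per shot attempt with hypothetical columns `season` and `shot_type` (2 or 3); the toy values are made up for illustration and are not NBA data.

```python
import pandas as pd


def attempt_share_by_season(shots: pd.DataFrame) -> pd.DataFrame:
    """Fraction of field-goal attempts that are 2s and 3s in each season.

    Assumes hypothetical columns "season" and "shot_type" (2 or 3).
    Each row of the result sums to 1.
    """
    return pd.crosstab(shots["season"], shots["shot_type"], normalize="index")


# Toy data for illustration only (not real NBA numbers).
toy = pd.DataFrame({
    "season": ["2010-11"] * 5 + ["2018-19"] * 5,
    "shot_type": [2, 2, 2, 2, 3, 2, 2, 2, 3, 3],
})
share = attempt_share_by_season(toy)
# share.loc["2010-11", 3] is 0.2 and share.loc["2018-19", 3] is 0.4
```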


Figure 2 shows the distribution of all shot attempts during the 2019-20 season by distance from the hoop. There are two clear peaks: one from 0-3 feet and another from 23-27 feet. These correspond to shots at or near the basket and to the shortest 3-point distances, respectively. Between these peaks there are relatively few attempts; this range corresponds to mid-range shots, which most NBA teams have gradually phased out of their offensive strategies.

Figure 2. Shot attempts versus shot distance.
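One way to quantify the two peaks and the mid-range gap is to bin integer shot distances into bands and take each band's share of attempts. The band edges below are hypothetical choices that mirror the ranges named in the text; note that distance alone does not separate 22-foot corner 3s from long 2s, so the `shot_type` column would be needed for that. Attempts with no recorded distance are dropped from the shares.

```python
import pandas as pd

# Hypothetical distance bands (feet): rim, mid-range, shortest 3s, deep 3s.
BINS = [-0.5, 3.5, 22.5, 27.5, float("inf")]
LABELS = ["0-3 ft", "4-22 ft", "23-27 ft", "28+ ft"]


def attempts_by_band(distances: pd.Series) -> pd.Series:
    """Share of attempts in each distance band (distances in whole feet)."""
    bands = pd.cut(distances, bins=BINS, labels=LABELS)
    # sort=False keeps the bands in court order; NaN distances are excluded.
    return bands.value_counts(normalize=True, sort=False)


# Toy distances for illustration only.
bands = attempts_by_band(pd.Series([0, 1, 2, 15, 24, 25, 26]))
```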


Shot Distance Analysis

The distribution of shot attempts is likely closely related to the efficiency at each shot distance, so I investigated efficiency next. Figure 3 shows field goal percentage (FG%), the ratio of shots made to shots attempted, as a function of shot distance from 0-40 feet. FG% was highest near the basket: >70% from 0-1 feet and >60% from 2 feet. It then dropped quickly to <40% at 4 feet. Interestingly, FG% varies little (+/- a few percentage points) with shot distance over most of the rest of the distribution, from 4-30 feet.

I hypothesize that this plateau is not explained by shot difficulty alone but also reflects the positioning of defenders. Shots taken closer to the basket are likely more contested (the defender is closer to the shooter) than those taken farther away, such as 3-point attempts, which would offset the advantage of the shorter distance. Data on the distance from the shooter to the closest defender would help test this explanation.

Figure 3. Field goal percentage versus shot distance.
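FG% by distance is simply the mean of a boolean make/miss column within each distance group, since that mean equals FGM / FGA. The sketch assumes hypothetical columns `shot_distance` (feet) and `made` (bool), and also reports attempt counts, because long-distance bins contain few shots and their FG% values are noisy.

```python
import pandas as pd


def fg_pct_by_distance(shots: pd.DataFrame, max_ft: int = 40) -> pd.DataFrame:
    """Attempts and FG% (makes / attempts) for each whole-foot distance.

    Assumes hypothetical columns "shot_distance" (feet) and "made" (bool).
    """
    in_range = shots[shots["shot_distance"].between(0, max_ft)]
    # The mean of a boolean column is the fraction of True values: FGM / FGA.
    return in_range.groupby("shot_distance")["made"].agg(
        attempts="size", fg_pct="mean"
    )


# Toy shots for illustration only.
toy = pd.DataFrame({
    "shot_distance": [0, 0, 0, 0, 24, 24, 24, 24, 24],
    "made": [True, True, True, False, True, False, False, True, False],
})
result = fg_pct_by_distance(toy)
```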

Effective Field Goal Percentage

Since a 3-point shot is worth 1.5 times as much as a 2-point shot, a statistic commonly used in the modern NBA is effective field goal percentage (EFG%), which weights 3-point makes by a factor of 1.5 in the FG% calculation. Figure 4 shows EFG% versus shot distance. As in Figure 3, shots from 0-2 feet are still the most efficient. Beyond that range, however, 3-point shots are the most efficient because of their higher point value.

As a result, most players who cannot get within 2 feet of the basket should attempt a 3-pointer. This is a general strategy; more specific strategies can be tailored to individual players using their own statistics.

Figure 4. Effective field goal percentage versus shot distance.
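The 1.5 weighting is the standard EFG% formula, (FGM + 0.5 × 3PM) / FGA. A convenient way to compute it per distance is to give each attempt a credit of 1.0 for a made 2, 1.5 for a made 3, and 0 for a miss, then average that credit within each group. The second function makes the efficiency argument concrete: a 3-point attempt beats a 2-point attempt whenever 3 × 3P% > 2 × 2P%, so the break-even 3P% is two-thirds of the 2P%. Column names and helper names are hypothetical, and the 40% mid-range figure in the example is illustrative, not taken from the data.

```python
import numpy as np
import pandas as pd


def efg_pct_by_distance(shots: pd.DataFrame) -> pd.Series:
    """EFG% per shot distance: (FGM + 0.5 * 3PM) / FGA.

    Assumes hypothetical columns "shot_distance", "shot_type" (2 or 3),
    and "made" (bool).
    """
    # Made 3s earn 1.5 credits, made 2s earn 1.0, misses earn 0.
    weight = np.where(shots["shot_type"] == 3, 1.5, 1.0)
    credit = shots["made"].astype(float) * weight
    return credit.groupby(shots["shot_distance"]).mean()


def breakeven_3pt_pct(two_pt_pct: float) -> float:
    """3P% giving the same points per attempt as a 2 made at two_pt_pct.

    Solves 3 * p3 = 2 * p2 for p3.
    """
    return 2.0 * two_pt_pct / 3.0


# Toy shots for illustration only.
toy = pd.DataFrame({
    "shot_distance": [2, 2, 25, 25],
    "shot_type": [2, 2, 3, 3],
    "made": [True, False, True, False],
})
efg = efg_pct_by_distance(toy)        # 0.5 at 2 ft, 0.75 at 25 ft
threshold = breakeven_3pt_pct(0.40)   # about 0.267 for a 40% mid-range shot
```

With these toy numbers, the 25-foot shots have the same raw FG% (50%) as the 2-foot shots but a higher EFG% (75%), which is the same effect Figure 4 shows relative to Figure 3.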


Team Trends Correlated with Wins

After investigating league-wide trends in shot attempts and makes, I next asked whether shooting statistics correlate with wins on a team-by-team basis. Figure 5 shows the number of wins for each NBA team in the 2019-20 season versus that team's 3-point percentage. There is no apparent overall trend, and the data show substantial scatter.

Shooting efficiency of each team

Figure 6 shows the same data as Figure 5 but separates the teams into two groups (teams that made the playoffs and teams that did not); there is still no observable trend. This is likely because 3-point percentage is too simple a statistic: it does not account for the number of 3-point attempts, the number of 2-point attempts, or the 2-point percentage. EFG% combines all of these in a weighted average and is likely a better indicator of each team's shooting efficiency.

Figure 5. Wins versus 3-point percentage by team.


Figure 6. Wins versus 3-point percentage by team, grouped by playoff and non-playoff teams.
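A small worked example shows why two teams with identical 3-point percentages can differ in overall shooting efficiency. The two teams below are hypothetical: both shoot 35% from three, but they differ in how many 3s they take and in how well they shoot 2s, and EFG% captures both differences.

```python
def team_efg(fg2m: int, fg2a: int, fg3m: int, fg3a: int) -> float:
    """Team EFG% from season totals: (FGM + 0.5 * 3PM) / FGA."""
    fgm = fg2m + fg3m
    fga = fg2a + fg3a
    return (fgm + 0.5 * fg3m) / fga


# Two hypothetical teams with the same 3P% (14/40 = 7/20 = 35%).
team_a = team_efg(fg2m=25, fg2a=50, fg3m=14, fg3a=40)  # 50% on 2s, EFG ~0.511
team_b = team_efg(fg2m=31, fg2a=70, fg3m=7, fg3a=20)   # ~44% on 2s, EFG ~0.461
```

On a plot of wins versus 3P%, these teams would sit at the same x-value despite a five-point gap in EFG%.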


Figure 7 plots the number of wins against EFG%, and a much stronger linear trend emerges, with an R² value of 0.59. This fit is notable considering that only shooting statistics were used and many others were neglected, such as rebounds, assists, turnovers, and free throw percentage. The trend also holds at the extremes: the team with the lowest EFG% had the fewest wins, and the team with the highest EFG% had the most wins.
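The R² of a straight-line fit can be computed directly from a least-squares fit as 1 − SS_res / SS_tot. The sketch below uses `numpy.polyfit`; the post does not say which tool produced its R², so this is one possible approach. For a simple linear fit with an intercept, R² equals the squared Pearson correlation, so an R² of 0.59 corresponds to a correlation of about 0.77 between EFG% and wins.

```python
import numpy as np


def linear_fit_r2(x, y):
    """Least-squares line y = slope * x + intercept, plus its R^2."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    slope, intercept = np.polyfit(x, y, deg=1)
    residuals = y - (slope * x + intercept)
    ss_res = np.sum(residuals ** 2)          # unexplained variation
    ss_tot = np.sum((y - y.mean()) ** 2)     # total variation around the mean
    return slope, intercept, 1.0 - ss_res / ss_tot


# Hypothetical team EFG% and win totals, for illustration only.
efg = [0.50, 0.51, 0.53, 0.55, 0.56]
wins = [22, 35, 33, 48, 51]
slope, intercept, r2 = linear_fit_r2(efg, wins)
```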

Figure 8 shows the same data as Figure 7, separated into playoff and non-playoff teams. With a few outliers, teams that made the playoffs had an EFG% greater than 0.53, while teams that did not make the playoffs had an EFG% below 0.53. Repeating this team analysis for the 2018-19 season gave virtually the same results.

Figure 7. Wins versus effective field goal percentage by team.


Figure 8. Wins versus effective field goal percentage by team, grouped by playoff and non-playoff teams.

Data Science Conclusions

The frequency of 3-point attempts has increased over the last decade (fig. 1), and most shot attempts are now taken near the basket or from 3-point range (fig. 2). The most efficient shots are those near the basket (0-2 feet), followed by 3-pointers, whose extra point value is captured by the 1.5 weighting in EFG% (figs. 3, 4). The team analysis showed no trend in wins versus 3-point percentage (figs. 5, 6). However, there was a clear trend of wins versus EFG%, and most playoff teams had an EFG% greater than 0.53 in the 2019-20 season (figs. 7, 8).

Future Work

Future work could incorporate other statistical categories, such as turnovers, FT%, and rebounds, to build a more robust model of team wins. The team analysis could also be repeated on an individual player basis, using the time remaining and the score to identify the most clutch performers.

Two pieces of data not included in this dataset would be extremely useful: the location of each shot attempt and the distance to the nearest defender. These could be used to build a predictive model of EFG% (for players and teams) as a function of shot location and distance to the nearest defender.
