Visualizing the Game Style and Shooting Performance among Superstars via NBA Shot-log

Xinyuan Wu
Posted on Oct 24, 2016

In the NBA, a top player attempts around a thousand shots over the entire regular season. A question worth asking is: what information can we get by looking at these shots? As a basketball fan for more than 10 years, I am particularly interested in discovering facts that cannot be seen directly on live TV. While browsing the web last week, I found a data set called NBA shot-log on Kaggle. It records every shot taken by each player during the games of the 2014-15 regular season, along with a variety of features. I decided to perform an exploratory visualization of this data.

Now let's dive into the shot-log and see what it reveals about the game style and shooting performance of NBA players. I focused this analysis on Stephen Curry, James Harden, LeBron James and Russell Westbrook, who finished first through fourth in the 2014-15 MVP voting and are undoubtedly superstars in the league.

Data Obtaining and Processing

The CSV file and the variable descriptions can be accessed here. Data cleaning, feature creation and plotting were performed in R, and the graphs were generated with the ggplot2 package. The R code for data cleaning and feature creation can be found here.
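The original cleaning code is in R and is not reproduced in this post. As a hedged illustration of one feature-creation step, the Python sketch below splits a game description into a date, the player's team, the opponent and a home/away flag. It assumes the matchup field looks like `MAR 04, 2015 - CHA @ BKN` (away game) or `MAR 04, 2015 - CHA vs. BKN` (home game); verify that format against the variable descriptions before relying on it. The function name `parse_matchup` is hypothetical.

```python
from datetime import datetime

def parse_matchup(matchup):
    """Split a matchup string into (date, team, opponent, is_home).

    Assumed format (hypothetical, verify against the data):
      'MAR 04, 2015 - CHA @ BKN'   -> away game at BKN
      'MAR 04, 2015 - CHA vs. BKN' -> home game against BKN
    """
    date_part, teams = matchup.split(" - ", 1)
    # %b matches English month abbreviations in the default C locale
    game_date = datetime.strptime(date_part.strip().title(), "%b %d, %Y").date()
    if " @ " in teams:
        team, opponent = teams.split(" @ ")
        is_home = False
    elif " vs. " in teams:
        team, opponent = teams.split(" vs. ")
        is_home = True
    else:
        raise ValueError(f"unrecognized matchup: {matchup!r}")
    return game_date, team.strip(), opponent.strip(), is_home


print(parse_matchup("MAR 04, 2015 - CHA @ BKN"))
```

A date extracted this way is the kind of field a season timeline needs, and the opponent code is the kind of field a per-opponent summary groups by.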



Figure 1. Shot density plot with respect to shot distance.

The graph above shows the distribution of each player's shot attempts versus shot distance. All four players have local maxima centered at around 5 feet and 25 feet, corresponding to the lay-up region and the three-point region. Curry's shot density leans toward the three-point zone, while James takes more of his shots in the paint, indicating different playing styles. Westbrook also takes two-point jumpers frequently, as suggested by the peak at around 17 feet.
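Figure 1 is a kernel density plot. By default, ggplot2's density geom uses a Gaussian kernel with R's rule-of-thumb bandwidth (`bw.nrd0`). The Python sketch below re-implements that idea from scratch for a list of shot distances. The bandwidth follows the same 0.9 * min(sd, IQR/1.34) * n^(-1/5) formula with a simplified fallback, so its output should be close to, but is not guaranteed to match, R's. The function names are hypothetical.

```python
import math
import statistics

def rule_of_thumb_bandwidth(xs):
    """Silverman-style bandwidth: 0.9 * min(sd, IQR / 1.34) * n ** (-1/5)."""
    sd = statistics.stdev(xs)
    q1, _, q3 = statistics.quantiles(xs, n=4, method="inclusive")
    spread = min(sd, (q3 - q1) / 1.34)
    if spread <= 0:  # simplified fallback when the IQR is zero
        spread = sd if sd > 0 else 1.0
    return 0.9 * spread * len(xs) ** -0.2

def gaussian_kde(xs, grid, bandwidth=None):
    """Gaussian kernel density estimate of xs, evaluated at each grid point."""
    h = bandwidth if bandwidth is not None else rule_of_thumb_bandwidth(xs)
    norm = 1.0 / (len(xs) * h * math.sqrt(2.0 * math.pi))
    return [norm * sum(math.exp(-0.5 * ((g - x) / h) ** 2) for x in xs)
            for g in grid]


# Toy shot distances in feet (illustrative, not from the data set).
distances = [2.0, 3.5, 4.0, 5.0, 16.5, 17.0, 24.0, 25.0, 25.5, 26.0]
grid = [0.5 * i for i in range(0, 61)]  # 0 to 30 ft in half-foot steps
density = gaussian_kde(distances, grid)
```

The bandwidth controls how smooth the curve is. A bandwidth that is too large can merge the lay-up and three-point peaks into one bump.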


Figure 2. Violin plot that summarizes shot accuracy for each player.

The violin plot above summarizes the shot accuracy of each player throughout the season. Based on a visual inspection, Curry and James have more stable shot accuracy than Harden and Westbrook, as suggested by their wider, more compact violins.
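A violin of shot accuracy implies one accuracy value per game. Assuming that is how the plot was built (the post does not say), the sketch below computes each player's per-game field-goal percentage. As one numeric stand-in for "stability," it also computes the standard deviation of those values. Shots are plain `(player, game_id, made)` tuples here, and that record layout and the function names are hypothetical.

```python
from collections import defaultdict
import statistics

def per_game_accuracy(shots):
    """shots: iterable of (player, game_id, made) with made as 0 or 1.
    Returns {player: {game_id: field-goal percentage in that game}}."""
    made = defaultdict(int)
    attempts = defaultdict(int)
    for player, game_id, hit in shots:
        attempts[(player, game_id)] += 1
        made[(player, game_id)] += hit
    result = defaultdict(dict)
    for (player, game_id), n in attempts.items():
        result[player][game_id] = made[(player, game_id)] / n
    return dict(result)

def accuracy_spread(per_game):
    """Sample standard deviation of each player's per-game accuracy
    (0.0 when a player has fewer than two games)."""
    return {
        player: statistics.stdev(list(games.values())) if len(games) > 1 else 0.0
        for player, games in per_game.items()
    }
```

A smaller spread corresponds to the compact violins described above.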


Figure 3. Boxplot that describes the shot accuracy with respect to match result.

Having summarized shot attempts and shot accuracy, let's explore how these values behave when other factors are taken into account. First, let's split shot accuracy by match result. In the plot, Curry, James and Westbrook show a large accuracy gap between wins and losses. In contrast, Harden's gap is relatively small.
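One way to put a single number on the win/loss gap is to pool every shot a player took in wins and, separately, in losses, and then compare the two field-goal percentages. The sketch below does that. Pooling weights each game by its number of attempts, which differs from the per-game values a boxplot shows. The `(player, result, made)` tuple layout, with `result` as `'W'` or `'L'`, is an assumption.

```python
from collections import defaultdict

def accuracy_by_result(shots):
    """shots: iterable of (player, result, made), result 'W' or 'L'.
    Returns {player: {result: pooled field-goal percentage}}."""
    tally = defaultdict(lambda: [0, 0])  # (player, result) -> [made, attempts]
    for player, result, hit in shots:
        tally[(player, result)][0] += hit
        tally[(player, result)][1] += 1
    out = defaultdict(dict)
    for (player, result), (m, n) in tally.items():
        out[player][result] = m / n
    return dict(out)

def win_loss_gap(by_result):
    """Accuracy in wins minus accuracy in losses, for players with both."""
    return {p: r["W"] - r["L"] for p, r in by_result.items()
            if "W" in r and "L" in r}
```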


Figure 4. The shot number and shot accuracy with respect to date.

Next, let's look at how the number of shots and the accuracy change over the course of the season. Westbrook tends to take more shots toward the end of the season, when the Oklahoma City Thunder were fighting for the last playoff spot. In the graph on the right, Curry and James have relatively stable shot accuracy throughout the season, while the accuracy of Harden and Westbrook appears to vary more.


Figure 5. Number of shots with respect to touch time.

Now let's plot the number of shots against touch time. Curry takes more of his shots with a very short touch time, reflecting his catch-and-release shooting style. In contrast, Westbrook tends to hold the ball for a few seconds before shooting.
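To back up the catch-and-release reading with numbers, one option is to compute the share of a player's shots that fall in each touch-time bucket. The sketch below uses one-second buckets and lumps everything at or beyond a cap into the last bucket. The bucket width, the cap and the clamping of negative values to zero are illustrative choices, not taken from the original analysis.

```python
from collections import Counter

def touch_time_shares(touch_times, width=1.0, cap=6):
    """Fraction of shots in each touch-time bucket.

    Bucket k covers [k * width, (k + 1) * width) seconds; every value at or
    above cap * width goes into bucket `cap`. Negative values are clamped to 0.
    """
    counts = Counter(min(int(max(t, 0.0) // width), cap) for t in touch_times)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {k: counts.get(k, 0) / total for k in range(cap + 1)}
```

A player whose bucket 0 share is high is shooting mostly within a second of receiving the ball.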


Figure 6. Shot accuracy with respect to shot distance.

An interesting pattern appears when shot accuracy is plotted against shot distance. As shown above, accuracy decreases from the lay-up region out to around 10 feet. For Curry, James and Westbrook, although their accuracy values differ, all three have a local maximum at around 14 feet. Let's call this region the comfort zone. Harden's accuracy peak, unlike the others', extends beyond the three-point line. Past the comfort zone, accuracy decreases monotonically for all players.
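An accuracy-versus-distance curve needs an accuracy value per distance bin. The sketch below bins shots by distance (two-foot bins by default) and drops bins with too few attempts, because accuracy estimated from a handful of shots is noisy. The bin width, the `min_attempts` threshold and the `(distance_ft, made)` input layout are assumptions. The post does not say how its curve was binned or smoothed.

```python
from collections import defaultdict

def accuracy_by_distance(shots, bin_width=2.0, min_attempts=1):
    """shots: iterable of (distance_ft, made) with made as 0 or 1.
    Returns a sorted list of (bin_start_ft, attempts, field-goal percentage),
    keeping only bins with at least min_attempts shots."""
    made = defaultdict(int)
    attempts = defaultdict(int)
    for dist, hit in shots:
        b = int(dist // bin_width)  # index of the bin containing this distance
        attempts[b] += 1
        made[b] += hit
    return [(b * bin_width, attempts[b], made[b] / attempts[b])
            for b in sorted(attempts) if attempts[b] >= min_attempts]
```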


Figure 7. Density plot with respect to shot distance and closest defender distance.

Adding the closest defender's distance to Figure 1 gives a contour plot that conveys a general sense of each player's style. In the plot on the left, Westbrook's contours in the lay-up region lie below Curry's, meaning that Westbrook attempts more tough, closely defended lay-ups than Curry. To my surprise, Westbrook is even more aggressive at the rim than LeBron James.
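The contour comparison can also be summarized numerically. The approach is to keep only close-range attempts and ask how near the closest defender was. For attempts within `rim_range` feet, the sketch below reports the number of attempts, their median closest-defender distance, and the share with a defender within `tight` feet. Both thresholds (5 ft and 2 ft) are hypothetical choices for illustration. A lower median and a higher contested share would correspond to tougher lay-ups.

```python
import statistics

def rim_pressure(shots, rim_range=5.0, tight=2.0):
    """shots: iterable of (shot_dist_ft, closest_def_dist_ft).

    For attempts within rim_range feet, returns (attempts, median defender
    distance, share with a defender within `tight` feet). The thresholds are
    illustrative, not taken from the original analysis.
    """
    near = [d for dist, d in shots if dist <= rim_range]
    if not near:
        return 0, None, None
    contested = sum(1 for d in near if d <= tight)
    return len(near), statistics.median(near), contested / len(near)
```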


Figure 8. Shot number and shot accuracy with respect to opponent and players.

The heat map above shows the number of shots and the shot accuracy against each opponent. For example, Westbrook took more shots against the New Orleans Pelicans and the Portland Trail Blazers, and Harden shot poorly against the Boston Celtics.


Figure 9. Shot accuracy after missed and made shots. The top graph includes all shots, while the bottom graph takes only three-point shots into account.

Some people believe that making one shot affects the accuracy of the next one. The shot-log lets us explore this effect. For each player, the leftmost red bar shows the accuracy of shots taken right after a miss. The green, blue and purple bars show the accuracy after 1, 2 and 3 consecutive made shots. Interestingly, for almost all players in this study, making a shot seems to be followed by lower accuracy on the next shot, and the more consecutive shots are made, the lower the accuracy of the next one. When only three-point shots are taken into account, this trend still holds for Curry and LeBron James.
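Conditional accuracies like these can be computed by walking through each player's shots in order while tracking the current run of made shots. The post does not state how it handled the details, so the sketch below makes several assumptions:

- Shots are ordered within each game, for example by a shot-number column.
- The streak resets at the start of every game.
- Each game's first shot is skipped because it has no predecessor.
- Streaks of three or more makes are pooled into the last category.

The function name is hypothetical.

```python
from collections import defaultdict

def accuracy_after_streak(games, max_streak=3):
    """games: list of games; each game is a list of 0/1 outcomes for one
    player, in the order the shots were taken.

    Returns {k: accuracy of shots taken in state k}, where k = 0 means the
    previous shot was missed and k >= 1 means the previous k shots were all
    made; k = max_streak pools streaks of max_streak or more makes.
    """
    made = defaultdict(int)
    attempts = defaultdict(int)
    for shots in games:
        streak = None  # None: no earlier shot in this game (streak resets per game)
        for hit in shots:
            if streak is not None:  # the first shot of a game has no predecessor
                k = min(streak, max_streak)
                attempts[k] += 1
                made[k] += hit
            # length of the run of consecutive makes ending at this shot
            streak = (streak or 0) + 1 if hit else 0
    return {k: made[k] / attempts[k] for k in sorted(attempts)}


# Toy sequences (illustrative, not from the data set): 1 = made, 0 = missed.
print(accuracy_after_streak([[1, 1, 1, 0, 1, 0], [0, 1]]))
```

Key 0 corresponds to the "after a miss" bar, and keys 1 to 3 correspond to the bars for consecutive makes.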

Takeaways and Future Direction

These graphs show that the four stars have dramatically different playing styles. For example, Stephen Curry tends to catch and quickly release the ball, while Russell Westbrook prefers to attack the rim with the ball in his hands. In terms of shot accuracy, Stephen Curry and LeBron James are more consistent than Harden and Westbrook. Interestingly, in most cases, making a shot tends to be followed by lower accuracy on the next shot, a phenomenon that deserves deeper exploration. As a future direction, focusing on the defensive side of the data is a potentially interesting extension. Furthermore, we could apply machine learning techniques to predict the probability of making a shot.

About Author

Xinyuan Wu

Xinyuan recently obtained his Ph.D. from North Carolina State University. He developed skills in quantitative analysis, statistics and critical thinking through years of research on magnetic and photophysical chemistry. His belief in the trend of predictive analysis, along with...