Data Visualizing Overall Game Performance among Superstars

Posted on Oct 24, 2016

Introduction

In the NBA, a top player attempts around a thousand shots over the regular season. A question worth asking is: what can we learn by looking at these shots? As a basketball fan for more than 10 years, I am particularly interested in discovering facts that cannot be seen directly on live TV. While browsing the web last week, I found a data set on Kaggle called the NBA shot-log. It records every shot taken by each player during the 2014-15 regular season, along with a variety of features. I decided to perform an exploratory visualization with this data.

Now let's dive into the shot-log and see what we can discover about game style and shooting performance among NBA players. I focused this analysis on Stephen Curry, James Harden, LeBron James and Russell Westbrook, who finished first through fourth in the MVP voting for the 2014-15 season and are undoubtedly superstars in the league.


Data Obtaining and Processing

The CSV file and the variable descriptions can be accessed here. Data cleaning, feature creation and plotting were performed in R, with graphs generated by the ggplot2 package. The R code for data cleaning and feature creation can be found here.
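
The original processing was done in R. As a rough illustration of the first step (keeping only the four players and counting their attempts and makes), here is a minimal Python sketch using only the standard library. The column names `player_name` and `SHOT_RESULT`, the lowercase player names, and the "made" label are assumptions about the CSV layout and should be checked against the variable descriptions.

```python
import csv

# Assumed column names; verify against the data set's variable descriptions.
PLAYER_COL = "player_name"
RESULT_COL = "SHOT_RESULT"

# The four players studied in this post (assumed to be stored in lowercase).
STARS = {"stephen curry", "james harden", "lebron james", "russell westbrook"}


def load_star_shots(csv_file, stars=STARS):
    """Return the rows (as dicts) whose player is one of `stars`."""
    reader = csv.DictReader(csv_file)
    return [row for row in reader if row[PLAYER_COL].strip().lower() in stars]


def summarize(rows):
    """Map each player to (attempts, makes, accuracy)."""
    totals = {}
    for row in rows:
        name = row[PLAYER_COL].strip().lower()
        attempts, makes = totals.get(name, (0, 0))
        made = row[RESULT_COL].strip().lower() == "made"
        totals[name] = (attempts + 1, makes + int(made))
    return {p: (a, m, m / a) for p, (a, m) in totals.items()}
```

With a local copy of the file (the name `shot_logs.csv` is hypothetical), this could be called as `with open("shot_logs.csv", newline="") as f: print(summarize(load_star_shots(f)))`.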


Data Visualization

Shot Attempts

Figure 1. Shot density plot with respect to shot distance.

The graph above shows the distribution of each player's shot attempts over shot distance. All four players have local maxima at around 5 feet and 25 feet, corresponding to the lay-up region and the three-point region. Curry's shot density leans towards the three-point zone while James takes more of his shots in the paint, indicating different play styles between the two players. It can also be seen that Westbrook uses the two-point jumper frequently, as suggested by the peak at around 17 feet.
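
To make the idea of "local maxima in shot distance" concrete, here is a small Python sketch that uses a plain histogram as a coarse stand-in for a smoothed density curve. The 2-foot bin width and the 40-foot cutoff are arbitrary assumptions, not values taken from the original analysis.

```python
def distance_histogram(distances, bin_width=2.0, max_dist=40.0):
    """Count shots in consecutive bins of `bin_width` feet from 0 to `max_dist`."""
    n_bins = int(max_dist // bin_width)
    counts = [0] * n_bins
    for d in distances:
        idx = int(d // bin_width)
        if 0 <= idx < n_bins:  # ignore shots beyond max_dist
            counts[idx] += 1
    return counts


def local_maxima(counts, bin_width=2.0):
    """Return bin centres (feet) whose count is strictly above both neighbours."""
    peaks = []
    for i in range(1, len(counts) - 1):
        if counts[i] > counts[i - 1] and counts[i] > counts[i + 1]:
            peaks.append((i + 0.5) * bin_width)
    return peaks
```

A histogram is noisier than a kernel density estimate, so on real data a wider bin or some smoothing may be needed before the lay-up and three-point peaks stand out cleanly.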

Shot Accuracy

Figure 2. Violin plot that summarizes shot accuracy for each player.

The violin plot above summarizes the shot accuracy of each player throughout the season. Based on visual inspection, Curry and James have relatively stable shot accuracy compared to Harden and Westbrook, as suggested by the wider, more concentrated shape of their violins.
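
The visual impression of "stable" accuracy can be backed by a number. The sketch below assumes the violins are drawn from per-game accuracy (an assumption; the post does not say how the values were grouped) and reports the mean and standard deviation of that per-game accuracy for one player.

```python
from collections import defaultdict
from statistics import mean, pstdev


def per_game_accuracy(shots):
    """shots: iterable of (game_id, made) pairs for one player.

    Returns a dict mapping game_id to the fraction of shots made in that game.
    """
    tally = defaultdict(lambda: [0, 0])  # game_id -> [attempts, makes]
    for game_id, made in shots:
        tally[game_id][0] += 1
        tally[game_id][1] += int(made)
    return {g: makes / attempts for g, (attempts, makes) in tally.items()}


def accuracy_spread(shots):
    """Return (mean, population standard deviation) of per-game accuracy."""
    acc = list(per_game_accuracy(shots).values())
    return mean(acc), pstdev(acc)
```

A smaller standard deviation would correspond to the more concentrated violins described above.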

Match Results

Figure 3. Boxplot that describes the shot accuracy with respect to match result.

Having summarized shot attempts and shot accuracy, let's explore how these values behave when other factors are taken into account, starting with the match result. In the plot, Curry, James and Westbrook show a large accuracy gap between games won and games lost. In contrast, Harden shows a relatively small gap.

Shot Attempts and Shot Accuracy

Figure 4. The shot number and shot accuracy with respect to date.

Next, let's look at how the number of shots and the accuracy change over the course of the season. Westbrook tends to take more shots at the end of the season, when the Oklahoma City Thunder were fighting for the last playoff spot. From the graph on the right, Curry and James have relatively stable shot accuracy throughout the season, while the accuracy of Harden and Westbrook seems to have greater variance.

Touch Time

Figure 5. Number of shots with respect to touch time.

Now let's look at the number of shots plotted against touch time. Curry takes more of his shots after a very short touch time, reflecting his catch-and-release shooting style. In contrast, Westbrook tends to hold the ball for a few seconds before taking the shot.
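
One way to put a number on "catch-and-release" is the share of shots taken within a short touch time. The sketch below is illustrative only: the 1-second threshold is an arbitrary assumption, and negative touch times are dropped on the assumption that they are tracking errors.

```python
def quick_shot_share(touch_times, threshold=1.0):
    """Fraction of shots released within `threshold` seconds of touching the ball."""
    # Assumption: negative touch times are recording errors and are ignored.
    valid = [t for t in touch_times if t >= 0]
    if not valid:
        return 0.0
    return sum(t <= threshold for t in valid) / len(valid)
```

Comparing this share across the four players would give a single-number counterpart to the shapes in Figure 5.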

Shot Distance

Figure 6. Shot accuracy with respect to shot distance.
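
A curve like the one in Figure 6 can be approximated by grouping shots into distance bins and computing the accuracy in each bin. This is a minimal Python sketch of that idea, not the code behind the figure; the 2-foot bin width is an assumption.

```python
def accuracy_by_distance(shots, bin_width=2.0):
    """shots: iterable of (distance_ft, made) pairs.

    Returns a list of (bin_start_ft, attempts, accuracy) sorted by distance.
    """
    tally = {}  # bin_start -> (attempts, makes)
    for dist, made in shots:
        start = (dist // bin_width) * bin_width
        attempts, makes = tally.get(start, (0, 0))
        tally[start] = (attempts + 1, makes + int(made))
    return [(start, a, m / a) for start, (a, m) in sorted(tally.items())]
```

Bins with very few attempts give unreliable accuracy values, so the attempt count is returned alongside each accuracy to allow filtering.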

An interesting pattern appears when shot accuracy is plotted against shot distance. As shown above, accuracy decreases from the lay-up region out to around 10 feet. For Curry, James and Westbrook, although the accuracy values differ, each has a local maximum at around 14 feet. Let's call this region the comfort zone. Harden's accuracy peak, by contrast, extends beyond the three-point line. Past the comfort zone, accuracy decreases monotonically for all players.

Heat Maps

Figure 7. Density plot with respect to shot distance and closest defender distance.

Adding closest-defender distance to Figure 1 gives a contour plot that conveys the general play style of each player. From the plot on the left, it can be seen that in the lay-up region the contours for Westbrook lie below those for Curry, that is, at smaller defender distances, meaning that Westbrook tends to attempt tougher lay-ups than Curry. To my surprise, Westbrook is even more aggressive at the rim than LeBron James.

Figure 8. Shot number and shot accuracy with respect to opponent and players.

The heat map above shows the number of shots and the shot accuracy against each opponent. For example, Westbrook took more shots when playing against the New Orleans Pelicans and the Portland Trail Blazers, and Harden had poor accuracy against the Boston Celtics.

Hot Hand Effect

Figure 9. Shot accuracy after made shots. The top graph combines all shots, while the bottom graph takes only three-point shots into account.

Some people believe that making one shot affects the accuracy of the next shot. With the shot-log, we can actually explore this effect. For each player, the leftmost red bar represents the accuracy of all shots taken right after a miss. The green, blue and purple bars represent the accuracy after making 1, 2 and 3 consecutive shots.

Interestingly, for almost all players under study, making a shot seems to have a negative effect on the following shot: the more consecutive shots are made, the lower the accuracy of the next one. When only three-point shots are taken into account, this trend still holds for Curry and LeBron James.
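
The bars described above can be computed by walking through each game's shots in order and tracking the current run of makes. The Python sketch below shows one way to do this and is not the original R code. It rests on several assumptions: streaks reset at the start of every game, the first shot of a game is skipped because it has no preceding shot, and streaks longer than three makes are grouped with three.

```python
from collections import defaultdict


def accuracy_after_streak(games, max_streak=3):
    """games: list of per-game shot sequences (lists of bools, in shot order).

    Returns {streak: (attempts, accuracy)}, where streak 0 means the previous
    shot was missed and streak k means the previous k shots were all made
    (capped at `max_streak`).
    """
    tally = defaultdict(lambda: [0, 0])  # streak -> [attempts, makes]
    for shots in games:
        streak = None  # no previous shot yet in this game
        for made in shots:
            if streak is not None:
                key = min(streak, max_streak)
                tally[key][0] += 1
                tally[key][1] += int(made)
            # Update the run of consecutive makes for the next shot.
            streak = (streak or 0) + 1 if made else 0
    return {k: (a, m / a) for k, (a, m) in sorted(tally.items())}
```

For the three-point version of the plot, the same function could be applied after filtering each game's sequence to three-point attempts, although that changes which shots count as "previous".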


Takeaways and Future Direction

From these graphs, we can see that the four stars have dramatically different play styles. For example, Stephen Curry favors a catch-and-quick-release shot, while Russell Westbrook prefers to attack the rim with the ball in hand. In terms of shot accuracy, Stephen Curry and LeBron James are more stable than Harden and Westbrook. Interestingly, in most cases, hitting one shot tends to have a negative effect on the next shot.

A deeper exploration is needed to understand this phenomenon in more detail. As a future direction, focusing on the defender side of the data is a potentially interesting extension. Furthermore, we could apply machine learning techniques to predict the probability of hitting a shot.

About Author

Xinyuan Wu

Xinyuan recently obtained his Ph.D. from North Carolina State University. He gained quantitative analysis, statistical knowledge and critical thinking from years of research on magnetic and photophysical chemistry. His belief in the trend of predictive analysis, along with...