2020 NBA Season Analysis

Posted on May 17, 2020

Inspiration and Goals

Sadly, the NBA season has been put on hold this year due to the coronavirus, and as an avid basketball fan and longtime player, I was devastated to learn that the NBA might cancel the remainder of the season if the situation does not improve. Usually, around this time of year, I would be eagerly awaiting the NBA playoffs, excited to watch the best players in the world compete for the ultimate prize. Instead, the season might be canceled altogether.

I decided to reconnect with my passion for basketball by doing my Python web scraping project on the NBA season. My goals for the project were twofold:

  1. Although this season might be over, I wanted to use the 2020 season data to look ahead to next season. I wanted to see if I could identify rising talent to watch for next season, both individual players and teams, while also identifying undervalued players.
  2. Additionally, I wanted to see whether the data would let me crown my own 2020 NBA award winners, specifically the league MVP and the NBA champion.


The Data

For this project I chose to use Selenium to scrape stats.nba.com for data on each player for each of the last 5 seasons. The data came out to around 2,500 rows and 31 columns. I had to web scrape the data because the NBA does not make it downloadable on its website, and I used Selenium because the website builds its tables dynamically with JavaScript, which Scrapy cannot execute on its own.

There are a few definitions within the data that are important to the project. First, all the stats collected were on a per-game basis. Second, the Plus/Minus stat measures the team's point differential while the player was on the court.

This provides insight into how impactful a player is for his team, while also taking into account "intangibles" that do not get recorded on the stat sheet. Lastly, "Relevant Players" is a definition that I created myself in order to avoid including injured players or bench warmers in my calculations, since including these players skews the numbers.

I defined Relevant Players as players who played at least half the season, meaning that they appeared in at least half of their team's games and, in those games, averaged at least half the minutes (24 of a regulation game's 48). By doing this, I essentially filtered for the core players who do the majority of the heavy lifting for their respective teams.
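Here is a minimal sketch of the Relevant Players filter in Python. The `Player` record and its field names are hypothetical (they are not the stats.nba.com column names), and I assume a 48-minute regulation game with overtime ignored.

```python
from dataclasses import dataclass

REGULATION_MINUTES = 48  # length of a regulation NBA game; overtime ignored


@dataclass
class Player:
    # Hypothetical record; field names are illustrative only.
    name: str
    games_played: int
    team_games: int  # games the player's team had played so far
    minutes_per_game: float


def is_relevant(p: Player) -> bool:
    """True if the player appeared in at least half of his team's games
    and averaged at least half of a regulation game's minutes."""
    played_half_the_games = p.games_played >= p.team_games / 2
    played_half_the_minutes = p.minutes_per_game >= REGULATION_MINUTES / 2
    return played_half_the_games and played_half_the_minutes


# Toy records for illustration, not real players.
players = [
    Player("Player A", 60, 65, 34.1),  # core starter
    Player("Player B", 20, 65, 31.0),  # missed most of the season
    Player("Player C", 58, 65, 12.5),  # bench minutes only
]
relevant = [p.name for p in players if is_relevant(p)]
print(relevant)  # ['Player A']
```

Both conditions use "at least", so a player exactly at half the games or exactly at 24 minutes still counts.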


Looking Ahead to 2021

How do I identify up-and-coming players?

In order to find up-and-coming players, I used my own domain knowledge as well as research to filter the 500+ NBA players on specific criteria. First, the players needed to be young, so I filtered for players under 24 years old. I chose 24 because the average NBA retirement age is in the mid-30s and a typical career lasts about 15 years, which puts the start of a career around age 20. A 24-year-old is therefore roughly in the first third of his career.

Second, I wanted players with above-average team impact, so I filtered for a Plus/Minus greater than +0.3. I chose +0.3 because this was the average Plus/Minus for Relevant Players, and I wanted players above that. Lastly, I noticed that these filters still returned players who were already considered stars, and these stars all played over 30 minutes per game because their teams relied on them heavily.

To filter out the established stars and keep the players who are still rising, I filtered for players who averaged fewer than 30 minutes per game. Based on these criteria, I identified the group of young players below who I believe have the potential to exceed expectations next year.
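The three criteria combine into a single filter. This is a sketch over made-up records; the `Player` fields are hypothetical, and the +0.3 cutoff is a parameter because with the real data it comes from the average Plus/Minus of Relevant Players.

```python
from dataclasses import dataclass


@dataclass
class Player:
    # Hypothetical per-game record; field names are illustrative only.
    name: str
    age: int
    plus_minus: float
    minutes_per_game: float


def rising_players(players, max_age=24, min_plus_minus=0.3, max_minutes=30.0):
    """Young players with above-average team impact who are not yet
    playing star-level minutes."""
    return [
        p.name
        for p in players
        if p.age < max_age
        and p.plus_minus > min_plus_minus
        and p.minutes_per_game < max_minutes
    ]


# Toy records for illustration, not real players.
sample = [
    Player("Player A", 22, 2.1, 26.4),   # young, positive impact, moderate minutes
    Player("Player B", 23, 4.5, 35.2),   # already a heavy-minutes star
    Player("Player C", 27, 3.0, 24.0),   # too old for this list
    Player("Player D", 21, -1.2, 28.0),  # below-average impact
]
print(rising_players(sample))  # ['Player A']
```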

Young Teams to Watch?

After analyzing young players to watch, I also wanted to look at young teams to watch. To analyze this, I grouped players by team and created a box plot (see below) of player ages for each team, comparing the teams by median age. What I found was not surprising: most of the older, more experienced teams were among the best in the league, while the younger teams were all toward the bottom of the league.

For example, Milwaukee, Houston and the LA Lakers were the three most experienced teams, and they had 142 wins between them. Meanwhile, the least experienced teams, Minnesota, Phoenix and Cleveland, combined for just 64 wins.
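The grouping behind the box plot can be sketched with the standard library. This assumes the data is reduced to hypothetical `(team, age)` pairs, one per player, and uses made-up rosters rather than the real 2020 ones.

```python
from collections import defaultdict
from statistics import median


def median_age_by_team(rows):
    """rows: iterable of (team, age) pairs, one per player.
    Returns (team, median_age) pairs sorted from oldest to youngest."""
    ages = defaultdict(list)
    for team, age in rows:
        ages[team].append(age)
    medians = {team: median(team_ages) for team, team_ages in ages.items()}
    return sorted(medians.items(), key=lambda item: item[1], reverse=True)


# Toy roster data for illustration only, not real 2020 rosters.
rows = [
    ("Team X", 30), ("Team X", 28), ("Team X", 33),
    ("Team Y", 21), ("Team Y", 23), ("Team Y", 22), ("Team Y", 26),
    ("Team Z", 25), ("Team Z", 27),
]
for team, age in median_age_by_team(rows):
    print(team, age)
```

The median is used rather than the mean so that one veteran on an otherwise young roster does not drag the team's age up.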


Additionally, two teams stood out to me as teams to watch next season. The first was Memphis, the only team in the youngest quartile by age that was still on track to make the playoffs this year. Next year, with a little more experience, they could perform even better.
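The Memphis check can be written as an intersection of two sets. This is a sketch under my own assumptions: the youngest quartile is taken as the youngest quarter of teams by rank (7 of 30 in a full league), and the playoff picture is a hypothetical set of team names.

```python
def young_contenders(median_ages, playoff_teams):
    """median_ages: dict of team -> median player age.
    playoff_teams: teams currently in playoff position.
    Returns teams in the youngest quarter of the league that are still
    in playoff position, in alphabetical order."""
    # Rank teams from youngest to oldest and keep the youngest quarter by count.
    ranked = sorted(median_ages, key=median_ages.get)
    youngest = set(ranked[: len(ranked) // 4])
    return sorted(youngest & set(playoff_teams))


# Hypothetical 8-team league, for illustration only.
median_ages = {
    "Team A": 22.0, "Team B": 23.0, "Team C": 24.0, "Team D": 25.0,
    "Team E": 26.0, "Team F": 27.0, "Team G": 28.0, "Team H": 29.0,
}
playoff_teams = {"Team B", "Team E", "Team F", "Team G", "Team H"}
print(young_contenders(median_ages, playoff_teams))  # ['Team B']
```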

The second team that stood out was Golden State. The Warriors are interesting because this year two of their top players, Steph Curry and Klay Thompson, were out with injuries, which allowed their fairly young team to build up crucial game experience it might not otherwise have gotten had the two stars been healthy.


Which Players are Undervalued?

To get a sense of which players were undervalued, I started by plotting points per game vs. salary for every player in the NBA. As shown in the plot, most players fall into the category of low scoring (under 10 points per game) and modest pay (under $5M per year). However, as players score more, their salaries also tend to increase, with the top-paid players scoring over 20 points per game and making upwards of $35M per year.

The interesting area that I wanted to look at was the red box, which shows players who score over 15 points a game but still make less than $5M per year. After filtering for these players, I came up with the plot below showing 16 undervalued scorers in the NBA. This information could be useful for coaches who are looking to add a scorer to their lineup, or for a GM who is concerned about salary cap space.
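The red box translates directly into a two-condition filter. This sketch assumes hypothetical records with points per game and annual salary in US dollars; the sample players are made up.

```python
from dataclasses import dataclass


@dataclass
class Player:
    # Hypothetical record; field names are illustrative only.
    name: str
    points_per_game: float
    salary: float  # annual salary in US dollars


def undervalued_scorers(players, min_ppg=15.0, max_salary=5_000_000):
    """Players inside the 'red box': scoring over min_ppg while earning
    under max_salary, sorted by scoring with the highest first."""
    hits = [p for p in players if p.points_per_game > min_ppg and p.salary < max_salary]
    return sorted(hits, key=lambda p: p.points_per_game, reverse=True)


# Toy records for illustration, not real salaries.
sample = [
    Player("Player A", 18.2, 2_500_000),
    Player("Player B", 24.0, 30_000_000),  # scores a lot but is paid like it
    Player("Player C", 9.5, 1_200_000),    # cheap but not a scorer
    Player("Player D", 15.6, 4_800_000),
]
print([p.name for p in undervalued_scorers(sample)])  # ['Player A', 'Player D']
```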


2020 NBA Awards


Next I wanted to look at who would win the MVP award. My methodology for analyzing this was threefold:

  • Look at stats for the previous 4 NBA MVPs
  • Combine this with domain knowledge to come up with Key Stats
  • Compare the 2020 MVP candidates using these Key Stats

Based on this methodology, the three Key Stats I came up with were Points per Game, Plus/Minus (Team Impact), and Team Wins. Looking at the graphs below, we can see that previous MVPs not only averaged over 25 points per game and had high team impact, but their teams also performed at an extremely high level, winning close to 80% of their games.




Now, by comparing the 2020 MVP candidates on these stats, we can speculate about who will win this year. In the first graph, Giannis Antetokounmpo and James Harden lead the way in scoring, but Giannis has a far higher Plus/Minus. Additionally, when we look at team performance, both Giannis and LeBron James stand out, but Giannis also holds the lead in this category. For this reason, I select Giannis Antetokounmpo as my 2020 MVP.
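I compared the graphs by eye, but one hypothetical way to make that comparison explicit is a rank sum: rank every candidate on each Key Stat (higher is better for all three) and add up the ranks, so the lowest total wins. The candidate names and numbers below are placeholders, not the 2020 stats.

```python
def rank_sum(candidates, stats=("ppg", "plus_minus", "win_pct")):
    """candidates: dict of name -> dict of stat -> value, where higher is
    better for every stat. Returns names ordered by total rank, best first."""
    totals = {name: 0 for name in candidates}
    for stat in stats:
        # Rank 1 goes to the best value of this stat.
        ordered = sorted(candidates, key=lambda name: candidates[name][stat], reverse=True)
        for rank, name in enumerate(ordered, start=1):
            totals[name] += rank
    return sorted(totals, key=totals.get)


# Placeholder numbers for illustration only.
candidates = {
    "Candidate A": {"ppg": 30.0, "plus_minus": 10.0, "win_pct": 0.80},
    "Candidate B": {"ppg": 33.0, "plus_minus": 4.0, "win_pct": 0.60},
    "Candidate C": {"ppg": 26.0, "plus_minus": 7.0, "win_pct": 0.55},
}
print(rank_sum(candidates))  # ['Candidate A', 'Candidate B', 'Candidate C']
```

A rank sum treats the three stats as equally important and ignores how large the gaps are, which is a design choice a weighted score could revisit.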


NBA Championship

For the final section, I wanted to see if I could crown my own NBA champion using the data from the season. Comparing teams can get tricky because you cannot simply compare them by record: teams play in different divisions and conferences, so they face different schedules and opponents.

The way I chose to compare teams was by comparing the players on each team. To do this, I grouped the players by team and looked at each team's average Plus/Minus, to see how much value its players bring. This comparison rated Milwaukee the highest.
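The team comparison is a group-by followed by an average. This sketch assumes hypothetical `(team, plus_minus)` pairs, one per player, with made-up values; in practice you might first restrict the rows to Relevant Players.

```python
from collections import defaultdict
from statistics import mean


def team_average_plus_minus(rows):
    """rows: iterable of (team, plus_minus) pairs, one per player.
    Returns (team, average Plus/Minus) pairs, highest first."""
    by_team = defaultdict(list)
    for team, pm in rows:
        by_team[team].append(pm)
    averages = {team: mean(values) for team, values in by_team.items()}
    return sorted(averages.items(), key=lambda item: item[1], reverse=True)


# Toy values for illustration only.
rows = [
    ("Team X", 5.0), ("Team X", 3.0), ("Team X", 1.0),
    ("Team Y", 2.0), ("Team Y", -2.0),
    ("Team Z", -1.0), ("Team Z", -3.0),
]
print(team_average_plus_minus(rows))
```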

The last piece of analysis I did, which was the most eye-opening, was to look at the top 20 players in the NBA based purely on team impact, or Plus/Minus, and then count how many of those players were on each team. Milwaukee had a whopping 7 players in the top 20, more than double the next highest team. This was especially impressive because it means that Milwaukee's entire starting lineup, along with two bench players, all posted significantly positive Plus/Minus numbers when they were on the court. Based on these numbers, I selected the Milwaukee Bucks as my 2020 NBA Champion.
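The top-20 count is a sort, a slice and a `Counter`. This sketch assumes hypothetical `(name, team, plus_minus)` tuples; the example uses a top 4 instead of a top 20 so the made-up data stays short.

```python
from collections import Counter


def top_n_by_team(players, n=20):
    """players: iterable of (name, team, plus_minus) tuples.
    Counts how many of the n highest Plus/Minus players each team has,
    most represented team first."""
    top = sorted(players, key=lambda p: p[2], reverse=True)[:n]
    return Counter(team for _, team, _ in top).most_common()


# Toy data for illustration only.
players = [
    ("P1", "Team X", 9.0), ("P2", "Team X", 8.0), ("P3", "Team Y", 7.5),
    ("P4", "Team X", 7.0), ("P5", "Team Z", 6.0), ("P6", "Team Y", 1.0),
]
print(top_n_by_team(players, n=4))  # [('Team X', 3), ('Team Y', 1)]
```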

