Data Analysis on the Performances in March Madness

Posted on Sep 21, 2020
The skills I demoed here can be learned through taking Data Science with Machine Learning bootcamp with NYC Data Science Academy.


Every year in March, the NCAA holds its March Madness basketball tournament to crown a National Champion. Prior to this, many conferences also hold a conference tournament to name their Conference Champions. I've often wondered, 'Is there any correlation between how a team performs in its conference tourney and how it performs in the Big Dance?' So I decided to dive into the data.

The data used was taken from the NCAA and Google Cloud ML Kaggle competition. The competition itself did not take place because the 2020 NCAA Tournament was cancelled due to COVID-19. Conference Tournament data only went back to 2001, so I restricted all the data to the 2001 season onward.


Comparing Wins

Overall, there doesn't seem to be much of a correlation. Only one team in this data won more than 3 games in its conference tournament and then won the National Championship (UConn, 2011). Conversely, no team has won the National Championship without winning at least one conference tournament game. But I wanted to look deeper.
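One way to put a number on "not much of a correlation" is a Pearson coefficient between each team's conference tournament wins and its NCAA Tournament wins. The snippet below is only a sketch of that idea, not the code behind this post: the record layout (season, winner, loser) and the names count_wins, pearson, and wins_correlation are assumptions made for illustration.

```python
from collections import Counter
from math import sqrt


def count_wins(games):
    """Count wins per (season, team) from (season, winner, loser) records."""
    return Counter((season, winner) for season, winner, _loser in games)


def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences.

    Raises ZeroDivisionError if either sequence has zero variance.
    """
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def wins_correlation(conf_games, ncaa_games, field):
    """Correlate conference tourney wins with NCAA Tournament wins.

    field is an iterable of (season, team) pairs that made the NCAA field;
    teams with no recorded win in a tournament count as 0 wins there.
    """
    conf_wins = count_wins(conf_games)
    ncaa_wins = count_wins(ncaa_games)
    xs = [conf_wins[key] for key in field]
    ys = [ncaa_wins[key] for key in field]
    return pearson(xs, ys)
```

A value near 0 would match the impression above, while a value near 1 would mean that teams with deep conference runs also tend to go deep in March.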

Subsetting the Conferences

Every Conference Tournament champion automatically makes the NCAA Tournament. For a lot of conferences, this is the only team that makes it into the field, and these teams often do not win a single game. These conferences should be filtered out to really get at the underlying question.

Here I averaged the number of teams each conference sends to the NCAA Tournament per year and filtered out those that average fewer than 1.5. The reasoning is that, more often than not, these conferences only send one team. That left only 12 conferences: Conference USA, American Athletic (AAC), Atlantic 10, Atlantic Coast (ACC), Big 12, Big East, Big Ten, Missouri Valley (MVC), Mountain West, Pac-12, SEC, and West Coast (WCC). These conferences and a graph showing the relationship between conference wins and tournament wins are shown below.
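The filtering step described above can be sketched in a few lines. This is an illustration under assumptions rather than the original analysis code: the input is taken to be one (season, conference, team) row per team in the NCAA field, and the function names and the min_avg parameter are made up for the example. The average is taken over the seasons in which a conference appears in the data.

```python
from collections import defaultdict
from statistics import mean


def avg_bids_per_season(rows):
    """Average number of NCAA Tournament teams per season for each conference.

    rows is an iterable of (season, conference, team) tuples, one per team
    in the field (a hypothetical layout chosen for this sketch).
    """
    counts = defaultdict(lambda: defaultdict(int))
    for season, conference, _team in rows:
        counts[conference][season] += 1
    return {conf: mean(by_season.values()) for conf, by_season in counts.items()}


def multi_bid_conferences(rows, min_avg=1.5):
    """Conferences whose average number of bids is at least min_avg."""
    averages = avg_bids_per_season(rows)
    return sorted(conf for conf, avg in averages.items() if avg >= min_avg)
```

Calling multi_bid_conferences(rows, min_avg=4) with the same data would apply the stricter cut-off of 4 or more teams per year.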

[Figure: conference tournament wins versus NCAA Tournament wins for the 12 remaining conferences]

Data on Power Conferences

After subsetting to conferences that average at least 1.5 teams, we are down to 12. There do not seem to be any overwhelming trends, just a few possibly interesting observations. We can subset even further to the Power Conferences: those that average sending 4 or more teams to the Dance.


Though not strong, the trend seems to suggest that conference champions perform better. Big East and West Coast champions average twice as many wins as other teams from those conferences that make the field, although most of the WCC champion's wins come from Gonzaga, a perennially strong team. AAC teams that win two conference games have a much higher average than other teams; however, the conference has not been around that long, so there are not enough data points to draw any major conclusions. Not surprisingly, power conference teams overall average more wins in the NCAA Tournament. The Shiny App can be accessed here.
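The champion-versus-everyone-else comparison above amounts to a grouped average. A minimal sketch, assuming one (conference, is_conf_champion, ncaa_wins) row per team-season in the NCAA field; the row layout and the function name are hypothetical and not taken from the project code:

```python
from collections import defaultdict
from statistics import mean


def avg_wins_by_champion_status(rows):
    """Average NCAA Tournament wins per (conference, is_conf_champion) group.

    rows is an iterable of (conference, is_conf_champion, ncaa_wins) tuples.
    """
    groups = defaultdict(list)
    for conference, is_champion, ncaa_wins in rows:
        groups[(conference, is_champion)].append(ncaa_wins)
    return {key: mean(wins) for key, wins in groups.items()}
```

Comparing the (conference, True) and (conference, False) entries for a conference shows how much better, if at all, its champions do than its other entrants. Groups with only a handful of rows, as with the AAC, should be read with caution.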

Code available on GitHub

Photo by Markus Spiske on Unsplash
