Data Analysis on the Performances in March Madness
The skills I demonstrate here can be learned through the Data Science with Machine Learning bootcamp at NYC Data Science Academy.
Every March, the NCAA holds its March Madness basketball tournament to crown a National Champion. Before that, many conferences hold their own tournaments to name their Conference Champions. I've often wondered: is there any correlation between how a team performs in its conference tourney and how it performs in the Big Dance? So I decided to dive into the data.
The data comes from the NCAA and Google Cloud ML Kaggle competition. The competition itself never took place because the 2020 NCAA Tournament was cancelled due to COVID-19. Conference Tournament data only goes back to 2001, so I dropped everything from before 2001.
Overall, there doesn't seem to be much of a correlation. In this data, only one team has won more than 3 games in its conference tournament and gone on to win the National Championship (UConn, 2011). The reverse also holds: no team has won the National Championship without winning at least one conference tournament game. But I wanted to look deeper.
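To make "correlation" concrete, here is a minimal sketch of a Pearson correlation between conference tournament wins and NCAA Tournament wins. It is only an illustration: the `pearson` helper and the two lists are hypothetical, the numbers are made up rather than taken from the Kaggle data, and the original analysis may have measured the relationship differently.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length sequences with at least 2 points")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Sum of co-deviations and sums of squared deviations from the means.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined when one variable is constant")
    return cov / sqrt(var_x * var_y)


# Made-up toy values (one entry per team-season), NOT real tournament results.
conf_tourney_wins = [0, 1, 2, 3, 1, 0, 4, 2]
ncaa_wins = [1, 0, 2, 1, 3, 0, 2, 6]

r = pearson(conf_tourney_wins, ncaa_wins)
print(f"r = {r:.2f}")
```

A value of r near 0 would match the "not much of a correlation" reading, while values near 1 or -1 would indicate a strong linear relationship.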
Subsetting the Conferences
Every Conference Tournament champion automatically makes the NCAA Tournament. For a lot of conferences, this is the only team that makes it into the field, and these teams often do not win a single game. These conferences should be filtered out to really get at the underlying question.
Here I averaged the number of teams each conference sends to the NCAA Tournament per year and filtered out the conferences that average fewer than 1.5. The reasoning is that, more often than not, these conferences send only one team. That left only 12 conferences: Conference USA, American Athletic (AAC), Atlantic 10, Atlantic Coast (ACC), Big 12, Big East, Big Ten, Missouri Valley (MVC), Mountain West, Pac-12, SEC, and West Coast (WCC). These conferences and a graph showing the relationship between conference wins and tournament wins are shown below.
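The filtering step can be sketched roughly as follows. This is a hedged illustration and not the code from the project: the function names, the `(season, conference, team)` tuple layout, and the placeholder conferences and teams are all assumptions, and the average is taken over seasons in which a conference sent at least one team.

```python
from collections import defaultdict


def average_bids(bids):
    """Average number of NCAA Tournament teams per season for each conference.

    bids: iterable of (season, conference, team) tuples, one per tournament team.
    Assumption: only seasons in which a conference appears count toward its average.
    """
    counts = defaultdict(lambda: defaultdict(int))  # conference -> season -> teams
    for season, conference, _team in bids:
        counts[conference][season] += 1
    return {
        conf: sum(by_season.values()) / len(by_season)
        for conf, by_season in counts.items()
    }


def multi_bid_conferences(bids, min_average=1.5):
    """Conferences whose average bids per season is at least min_average."""
    return sorted(
        conf for conf, avg in average_bids(bids).items() if avg >= min_average
    )


# Placeholder data with made-up conferences and teams, NOT real tournament fields.
bids = [
    (2018, "Conf A", "Team 1"), (2018, "Conf A", "Team 2"), (2018, "Conf A", "Team 3"),
    (2019, "Conf A", "Team 1"), (2019, "Conf A", "Team 2"),
    (2018, "Conf B", "Team 4"), (2019, "Conf B", "Team 4"),
    (2018, "Conf C", "Team 5"), (2019, "Conf C", "Team 6"), (2019, "Conf C", "Team 7"),
]

kept = multi_bid_conferences(bids)
print(kept)
```

Raising `min_average` gives a stricter subset using the same function.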
Data on Power Conferences
After subsetting to conferences that average at least 1.5 teams, we are down to 12. There do not seem to be any overwhelming trends, just a few possibly interesting observations. We can subset even further to the Power Conferences. These are conferences that average sending 4 or more teams to the Dance.
Though not strong, the trend seems to suggest that conference champions perform better. Big East and West Coast champions average twice as many wins as other teams that make it, although most of the WCC champion's wins come from Gonzaga, a perennially good team. AAC teams that win two conference games have a much higher average than other teams; however, the conference has not been around that long, so there are not enough data points to draw any major conclusions. Not surprisingly, power conference teams overall average more wins in the NCAA Tournament. The Shiny App can be accessed here.
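The champion-versus-everyone-else comparison boils down to a grouped average. Below is a minimal sketch under assumed field names (`conference`, `is_conf_champion`, `ncaa_wins`); the records are invented placeholders, not results from the dataset, and the actual project code may be organized differently.

```python
from collections import defaultdict


def average_ncaa_wins(records):
    """Average NCAA Tournament wins per (conference, is_conf_champion) group.

    records: iterable of dicts with the assumed keys
    'conference', 'is_conf_champion', and 'ncaa_wins' (one dict per team-season).
    """
    totals = defaultdict(lambda: [0, 0])  # group -> [total wins, team-seasons]
    for rec in records:
        key = (rec["conference"], rec["is_conf_champion"])
        totals[key][0] += rec["ncaa_wins"]
        totals[key][1] += 1
    return {key: wins / count for key, (wins, count) in totals.items()}


# Invented placeholder records, NOT real tournament results.
records = [
    {"conference": "Conf A", "is_conf_champion": True, "ncaa_wins": 4},
    {"conference": "Conf A", "is_conf_champion": True, "ncaa_wins": 2},
    {"conference": "Conf A", "is_conf_champion": False, "ncaa_wins": 1},
    {"conference": "Conf A", "is_conf_champion": False, "ncaa_wins": 2},
    {"conference": "Conf A", "is_conf_champion": False, "ncaa_wins": 0},
    {"conference": "Conf B", "is_conf_champion": True, "ncaa_wins": 1},
    {"conference": "Conf B", "is_conf_champion": False, "ncaa_wins": 0},
    {"conference": "Conf B", "is_conf_champion": False, "ncaa_wins": 1},
]

averages = average_ncaa_wins(records)
for (conference, is_champion), avg in sorted(averages.items()):
    label = "champion" if is_champion else "other teams"
    print(f"{conference} {label}: {avg:.2f} wins on average")
```

Swapping the boolean for the number of conference tournament wins in the group key would give the per-win-count averages, and the group sizes are worth checking before reading much into any one average.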
Code available on GitHub