Examining Higher Education in the United States with the College Scorecard Dataset
The Department of Education collects a great deal of data from colleges and universities across the country and releases this data annually on a site called College Scorecard. The data is intended to help prospective college students and their families make decisions about which colleges are best for them.
I used R Shiny to build a tool that visualizes the College Scorecard data in ways not already available on the College Scorecard site. My hope is that it will provide useful information about higher education in the United States both to those looking for the right college and to those interested in trends in higher education as a whole.
I used the College Scorecard data from the 2014-2015 school year. I restricted my analysis to institutions that offer bachelor's degrees and that provided data for all of the metrics I examined.
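A restriction of this kind can be sketched in base R. This is a minimal illustration under stated assumptions, not the code behind the application: the rows are made up, the column names (`INSTNM`, `PREDDEG`, `CDR3`, `GRAD_DEBT_MDN`) are assumed to match the Scorecard data dictionary, and treating `PREDDEG == 3` as "offers bachelor's degrees" is one possible reading of that criterion.

```r
# Made-up rows standing in for the raw Scorecard file; values are illustrative only.
raw <- data.frame(
  INSTNM        = c("College A", "College B", "College C", "College D"),
  PREDDEG       = c(3, 3, 2, 3),  # assumed coding: 3 = predominantly bachelor's
  CDR3          = c("0.05", "PrivacySuppressed", "0.12", "0.08"),
  GRAD_DEBT_MDN = c("21000", "25000", "9500", "NULL"),
  stringsAsFactors = FALSE
)

# Metrics that must all be present for a school to be kept (assumed subset).
metrics <- c("CDR3", "GRAD_DEBT_MDN")

# Placeholder strings such as "PrivacySuppressed" or "NULL" become NA
# when the columns are coerced to numeric.
raw[metrics] <- lapply(raw[metrics], function(x) suppressWarnings(as.numeric(x)))

# Keep bachelor's-granting rows that have a value for every metric.
schools <- raw[raw$PREDDEG == 3 & complete.cases(raw[metrics]), ]
print(schools)
```

Dropping every row with a missing metric keeps all of the views consistent with one another, at the cost of excluding schools that report only some of the metrics.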
Questions for Analysis
- Of the three types of institutions (private, public, and for-profit), which offers the best value for its students?
- How do the institutions in each state compare to one another?
I created a Shiny application to support the analysis by visualizing the data in four ways: a data table view of the entire data set, density plots of a single user-chosen variable, scatter plots of two user-chosen variables, and a geographic plot, built with Leaflet, of state averages of a user-chosen variable. The data table view is useful for examining single data points and for finding the best and worst institutions for a given metric via the 'sort by column' feature. The single-variable density plots reveal the approximate probability distribution of each variable for each type of institution. The two-variable scatter plots reveal the relationship between pairs of variables. The geographic plots show the mean value of the chosen variable by state.
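To make the structure of such an app concrete, here is a minimal sketch of the density view alone, assuming the `shiny` package is installed. It is not the actual application: the data frame, the input and output ids (`var`, `density_plot`), and the use of base graphics are all choices made for illustration.

```r
library(shiny)

# Made-up values standing in for the cleaned Scorecard table.
schools <- data.frame(
  type         = rep(c("Public", "Private", "For-profit"), each = 4),
  default_rate = c(0.04, 0.06, 0.05, 0.08,
                   0.03, 0.05, 0.07, 0.04,
                   0.10, 0.15, 0.12, 0.20),
  median_debt  = c(19000, 21000, 23000, 24000,
                   25000, 26000, 27000, 29000,
                   24000, 28000, 31000, 35000)
)

ui <- fluidPage(
  titlePanel("Density by institution type"),
  # The user picks which variable to plot.
  selectInput("var", "Variable",
              choices = c("Default rate" = "default_rate",
                          "Median debt"  = "median_debt")),
  plotOutput("density_plot")
)

server <- function(input, output, session) {
  output$density_plot <- renderPlot({
    # One kernel density estimate per institution type.
    dens <- lapply(split(schools[[input$var]], schools$type), density)
    x_range <- range(sapply(dens, function(d) range(d$x)))
    y_max   <- max(sapply(dens, function(d) max(d$y)))
    plot(NULL, xlim = x_range, ylim = c(0, y_max),
         xlab = input$var, ylab = "Density")
    for (i in seq_along(dens)) lines(dens[[i]], col = i, lwd = 2)
    legend("topright", legend = names(dens), col = seq_along(dens), lwd = 2)
  })
}

app <- shinyApp(ui, server)
# In an interactive R session, runApp(app) launches the app.
```

The other three views would follow the same pattern: an input that selects one or two variables, and a render function that redraws the table, scatter plot, or map whenever the selection changes.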
The variables available to view are:
- Admission rate
- Average family income
- Median family income
- Default rate
- Three-year repayment rates for students from low-, middle-, and high-income families
- Median debt
- Number of students
Examining the single-variable plots for default rate and median debt, we see that the distributions for for-profit schools have fatter right tails than those for public and private schools; that is, a larger share of for-profit schools sit at the high end of both metrics. We also note that there are many private schools whose median debt is between $25,000 and $30,000.
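One rough way to put a number on a "fatter right tail" is to compare an upper quantile across institution types. The sketch below does this in base R with made-up default rates; the choice of the 90th percentile is arbitrary, and the values are not taken from the Scorecard data.

```r
# Made-up default rates for illustration; not Scorecard values.
schools <- data.frame(
  type = rep(c("Public", "Private", "For-profit"), each = 5),
  default_rate = c(0.03, 0.04, 0.05, 0.06, 0.07,
                   0.02, 0.03, 0.04, 0.05, 0.06,
                   0.05, 0.08, 0.12, 0.18, 0.25)
)

# 90th percentile of default rate within each institution type.
p90 <- tapply(schools$default_rate, schools$type,
              function(x) unname(quantile(x, probs = 0.9)))
print(p90)
```

A type whose upper quantile sits well above the others has more of its schools in the high-value region, which is what the density plots show visually.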
The two-variable scatter plots reveal connections between the default rate and both the median debt and the median family income of students. Beyond a threshold of roughly $27,000 in median debt, very few schools have low default rates.
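A pattern like this can be checked numerically by comparing the share of low-default schools on either side of the debt threshold. The following is a minimal sketch with made-up values; the 5% cutoff for a "low" default rate is a hypothetical definition, since the text does not specify one.

```r
# Made-up values for illustration; not Scorecard data.
schools <- data.frame(
  median_debt  = c(18000, 22000, 26000, 28000, 30000, 33000),
  default_rate = c(0.03, 0.04, 0.09, 0.11, 0.04, 0.14)
)

debt_cutoff <- 27000  # threshold read off the scatter plot in the text
low_cutoff  <- 0.05   # hypothetical definition of a "low" default rate

above <- schools$median_debt > debt_cutoff
low   <- schools$default_rate < low_cutoff

# Share of low-default schools on each side of the debt cutoff.
share_low <- c(at_or_below_cutoff = mean(low[!above]),
               above_cutoff       = mean(low[above]))
print(share_low)

# Rank correlation as a rough summary of the overall association.
rho <- cor(schools$median_debt, schools$default_rate, method = "spearman")
print(rho)
```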
The geographic plot reveals that colleges and universities in southern states have a higher average default rate than those in other states.
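The state-level values behind a map like this come from a simple group-by-and-average step, sketched below in base R. The rows are made up for illustration, and the mean here is unweighted across institutions; whether the application weights by enrollment is not stated.

```r
# Made-up rows for illustration; not Scorecard values.
schools <- data.frame(
  state        = c("AL", "AL", "MA", "MA", "TX"),
  default_rate = c(0.10, 0.12, 0.03, 0.05, 0.08)
)

# Unweighted mean default rate per state.
state_means <- aggregate(default_rate ~ state, data = schools, FUN = mean)

# Sort from highest to lowest average for easier reading.
state_means <- state_means[order(-state_means$default_rate), ]
print(state_means)
```

A table of this shape, with one row per state, could then be joined to state boundary polygons to color a Leaflet map.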
Thus the visualizations in my Shiny application allowed me to notice the high default rates and median debts in a segment of the for-profit schools, as well as the geographic trend in default rates.