Examining Higher Education in the United States with the College Scorecard Dataset

Posted on Oct 15, 2017

Introduction
The Department of Education collects a great deal of data from colleges and universities across the country and releases this data annually on a site called College Scorecard. The data is intended to help prospective college students and their families make decisions about which colleges are best for them.

I used R Shiny to create a tool which allows the College Scorecard data to be visualized via methods other than the ones already available on the College Scorecard site. My hope is that it will provide useful information about higher education in the United States both to those looking to find the right college and to those interested in trends in higher education as a whole.

Dataset
I used the College Scorecard data from the school year 2014-2015. I restricted my analysis to institutions which offer bachelor's degrees and which provided data for all of the metrics I examined.
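The restriction described above can be sketched in a few lines. This is a minimal illustration in Python, not the code behind the application. The column names (PREDDEG, ADM_RATE, CDR3, GRAD_DEBT_MDN, MD_FAMINC, UGDS) and the missing-value markers are assumptions taken from the public College Scorecard data dictionary, since the post does not list the exact columns used, and the sample rows are made up.

```python
# Assumed College Scorecard column names; the post does not say which columns
# were used, so treat every name here as an illustration.
BACHELORS = "3"  # assumed PREDDEG code for predominantly bachelor's-granting schools
METRICS = ["ADM_RATE", "CDR3", "GRAD_DEBT_MDN", "MD_FAMINC", "UGDS"]
MISSING = {"", "NULL", "PrivacySuppressed"}  # assumed markers for absent values


def keep_row(row):
    """Return True for a bachelor's institution with a value for every metric."""
    if row.get("PREDDEG") != BACHELORS:
        return False
    return all(row.get(col, "") not in MISSING for col in METRICS)


def filter_rows(rows):
    """Keep only the rows that pass keep_row."""
    return [row for row in rows if keep_row(row)]


# Made-up rows for illustration only (not real Scorecard records).
sample = [
    {"INSTNM": "Example A", "PREDDEG": "3", "ADM_RATE": "0.61", "CDR3": "0.04",
     "GRAD_DEBT_MDN": "24000", "MD_FAMINC": "52000", "UGDS": "8000"},
    {"INSTNM": "Example B", "PREDDEG": "2", "ADM_RATE": "0.90", "CDR3": "0.11",
     "GRAD_DEBT_MDN": "12000", "MD_FAMINC": "31000", "UGDS": "1500"},
    {"INSTNM": "Example C", "PREDDEG": "3", "ADM_RATE": "NULL", "CDR3": "0.07",
     "GRAD_DEBT_MDN": "27000", "MD_FAMINC": "45000", "UGDS": "3000"},
]

kept = filter_rows(sample)
print([row["INSTNM"] for row in kept])
```

In practice the rows could come from `csv.DictReader` over the downloaded Scorecard file, which yields one dictionary of strings per institution.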

Questions for Analysis

  • Of the three types of institutions (private, public, and for-profit), which offers the best value for its students?
  • How do the institutions in each state compare to one another?

Shiny Application
I created a Shiny application to support the analysis by visualizing the data in four different ways: a data table view of the entire data set, density plots of a single user-chosen variable, scatter plots of two user-chosen variables, and a geographic plot, built with Leaflet, of state averages of a user-chosen variable. The data table view is useful for examining single data points and for finding the best and worst institutions for a given metric via the 'sort by column' feature. The single-variable density plots reveal the approximate probability distribution of each variable for each type of institution. The two-variable scatter plots reveal the interaction between pairs of variables. The geographic plots show the mean value of the chosen variable by state.
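The aggregation behind the geographic view amounts to a per-state mean. The sketch below shows one way to compute it in Python. It assumes an unweighted mean across institutions and the Scorecard column names STABBR (state) and CDR3 (default rate); the post specifies neither, and the rows are made up.

```python
from collections import defaultdict
from statistics import mean


def state_means(rows, variable):
    """Return the unweighted mean of `variable` for each state abbreviation."""
    by_state = defaultdict(list)
    for row in rows:
        by_state[row["STABBR"]].append(float(row[variable]))
    return {state: mean(values) for state, values in by_state.items()}


# Made-up rows for illustration only.
sample = [
    {"STABBR": "NY", "CDR3": "0.04"},
    {"STABBR": "NY", "CDR3": "0.08"},
    {"STABBR": "TX", "CDR3": "0.10"},
]

means = state_means(sample, "CDR3")
print(means)
```

Weighting each institution by its number of students would give a different picture, so the choice of mean is worth stating alongside any map built this way.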

The variables available to view are:

  • Admission rate
  • Average family income
  • Median family income
  • Default rate
  • 3-year repayment rates for students from low, middle, and high income families
  • Median debt
  • Number of students

Findings

Examining the single-variable plots for default rate and median debt, we see that the distribution for for-profit schools has a fatter right tail than the distributions for public and private schools. We also note that there are many private schools whose median debt is between $25,000 and $30,000.
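One rough numerical counterpart to reading a tail off a density plot is to compare an upper percentile with the median for each type of institution. The sketch below does this in Python with made-up default rates. It is not the method used in the application, and the 90th percentile is an arbitrary choice made for illustration.

```python
from statistics import median, quantiles


def tail_summary(values):
    """Return the median, the 90th percentile, and the gap between them."""
    # quantiles(..., n=10) returns nine cut points; the last is the 90th percentile.
    p90 = quantiles(values, n=10, method="inclusive")[-1]
    mid = median(values)
    return {"median": mid, "p90": p90, "gap": p90 - mid}


# Made-up default rates for illustration only.
default_rates = {
    "public": [0.03, 0.04, 0.05, 0.06, 0.07],
    "for-profit": [0.04, 0.06, 0.09, 0.15, 0.25],
}

summary = {kind: tail_summary(rates) for kind, rates in default_rates.items()}
for kind, stats in summary.items():
    print(kind, stats)
```

A larger gap between the 90th percentile and the median points to a heavier right tail, which is the pattern the density plots show for for-profit schools.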

The two-variable scatter plots revealed connections between the default rate and both the median debt and the median family income of students. We notice that beyond a median debt of roughly $27,000, there are very few schools with low default rates.
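The threshold observation can be checked numerically by comparing the share of low-default schools on each side of the debt threshold. The sketch below is a Python illustration with made-up data points. The $27,000 threshold comes from the text above, but the 5% cutoff for a "low" default rate is a hypothetical value, since the post does not define one.

```python
DEBT_THRESHOLD = 27_000  # approximate threshold read off the scatter plot
LOW_DEFAULT = 0.05       # hypothetical cutoff for a "low" default rate


def low_default_share(schools, debt_threshold=DEBT_THRESHOLD, low_default=LOW_DEFAULT):
    """Return the share of low-default schools at or below and above the threshold.

    `schools` is a list of (median_debt, default_rate) pairs.
    """
    below = [rate for debt, rate in schools if debt <= debt_threshold]
    above = [rate for debt, rate in schools if debt > debt_threshold]

    def share(rates):
        # None signals that no schools fall on this side of the threshold.
        if not rates:
            return None
        return sum(rate < low_default for rate in rates) / len(rates)

    return {"at_or_below": share(below), "above": share(above)}


# Made-up (median debt, default rate) pairs for illustration only.
sample = [
    (18_000, 0.03), (22_000, 0.04), (26_000, 0.07),
    (28_000, 0.09), (31_000, 0.12), (33_000, 0.04),
]

shares = low_default_share(sample)
print(shares)
```

A much smaller share above the threshold than below it would be consistent with the pattern seen in the scatter plot.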

The geographic plot reveals that colleges and universities in southern states have a higher average default rate than those in other states.

Thus the visualizations in my Shiny application allowed me to notice the high default rates and median debts in a segment of the for-profit schools, as well as the geographic trend in default rates.
