Examining Higher Education in the United States with the College Scorecard Dataset

Posted on Oct 15, 2017

The Department of Education collects a great deal of data from colleges and universities across the country and releases this data annually on a site called College Scorecard. The data is intended to help prospective college students and their families make decisions about which colleges are best for them.

I used R Shiny to build a tool that visualizes the College Scorecard data in ways not already available on the College Scorecard site. My hope is that it will provide useful information about higher education in the United States, both to those looking for the right college and to those interested in trends in higher education as a whole.

I used the College Scorecard data from the school year 2014-2015. I restricted my analysis to institutions which offer bachelor's degrees and which provided data for all of the metrics I examined.
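As a rough illustration of that restriction, here is a minimal Python sketch (the app itself was written in R, and this is not the author's code). It assumes the raw CSV marks missing values as `NULL` or `PrivacySuppressed`, and it uses `PREDDEG == 3` as one possible way to identify bachelor's-granting institutions; the post does not say which field was actually used. The sample rows are made up.

```python
import csv
import io

# Markers assumed to denote missing values in the raw CSV export.
MISSING = {"", "NULL", "PrivacySuppressed"}

def filter_institutions(rows, metrics, degree_col="PREDDEG", bachelors_code="3"):
    """Keep bachelor's-granting rows that have a value for every metric.

    degree_col and bachelors_code are assumptions, not the author's choice.
    """
    kept = []
    for row in rows:
        if row.get(degree_col) != bachelors_code:
            continue  # not flagged as bachelor's-granting
        if any(row.get(m, "") in MISSING for m in metrics):
            continue  # at least one metric is missing
        kept.append(row)
    return kept

# Tiny made-up sample, for illustration only (not real Scorecard records).
sample = io.StringIO(
    "INSTNM,PREDDEG,CDR3,GRAD_DEBT_MDN\n"
    "College A,3,0.05,21000\n"
    "College B,2,0.12,9000\n"
    "College C,3,NULL,25000\n"
    "College D,3,0.09,PrivacySuppressed\n"
)
rows = list(csv.DictReader(sample))
complete = filter_institutions(rows, ["CDR3", "GRAD_DEBT_MDN"])
print([r["INSTNM"] for r in complete])
```

Only the first sample row survives: the second is not bachelor's-granting and the last two each lack one metric.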

Questions for Analysis

  • Of the three types of institutions (public, private nonprofit, and for-profit), which offers the best value for its students?
  • How do the institutions in each state compare to one another?

Shiny Application

I created a Shiny application to support the analysis by allowing the data to be visualized in four different ways: a data table view of the entire data set, density plots of a single user-chosen variable, scatter plots of two user-chosen variables, and a geographic plot, built with Leaflet, of state averages of a user-chosen variable. The data table view is useful for examining single data points and for finding the best and worst institutions for a given metric via the 'sort by column' feature. The single-variable density plots reveal the approximate probability distribution of each variable for each type of institution. The two-variable scatter plots reveal the interaction between pairs of variables. The geographic plots show the mean value of the chosen variable by state.
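The per-state aggregation behind the geographic view can be sketched as follows. This is a hypothetical Python illustration, not the app's R code: the field names `state` and `default_rate_pct` are invented, the records are made up, and it assumes a simple unweighted mean over institutions (the post does not say whether the averages were weighted by enrollment).

```python
from collections import defaultdict
from statistics import mean

def state_means(records, variable):
    """Group records by state and return the unweighted mean of one variable."""
    by_state = defaultdict(list)
    for rec in records:
        by_state[rec["state"]].append(rec[variable])
    return {state: mean(values) for state, values in sorted(by_state.items())}

# Made-up records, for illustration only.
records = [
    {"state": "NY", "default_rate_pct": 4.0},
    {"state": "NY", "default_rate_pct": 8.0},
    {"state": "TX", "default_rate_pct": 10.0},
]
averages = state_means(records, "default_rate_pct")
print(averages)
```

The resulting dictionary of state means is the kind of value a choropleth layer would then color by.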

The variables available to view are:

  • Admission rate
  • Average family income
  • Median family income
  • Default rate
  • Three-year repayment rates for students from low-, middle-, and high-income families
  • Median debt
  • Number of students


Examining the single-variable plots for default rate and median debt, we see that the distributions for for-profit schools have fatter right tails than those for public and private schools; that is, a larger share of for-profit schools sit at the high end of both metrics. We also note that many private schools have a median debt between $25,000 and $30,000.

The two-variable scatter plots reveal connections between the default rate and both the median debt and the median family income of students. Beyond a threshold of roughly $27,000 in median debt, very few schools have low default rates.
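One way to check a threshold effect like this numerically, rather than by eye, is to compare the share of low-default schools on either side of the debt cutoff. The Python sketch below is only an illustration under stated assumptions: the 5% cutoff for a "low" default rate is hypothetical (the post does not define one), the field names are invented, and the sample schools are made up.

```python
def low_default_share(schools, debt_threshold=27_000, low_default_cutoff=0.05):
    """Return the share of low-default schools below and at/above the debt threshold.

    low_default_cutoff is a hypothetical definition of a "low" default rate.
    """
    def share(group):
        if not group:
            return None  # no schools on this side of the threshold
        low = sum(1 for s in group if s["default_rate"] < low_default_cutoff)
        return low / len(group)

    below = [s for s in schools if s["median_debt"] < debt_threshold]
    above = [s for s in schools if s["median_debt"] >= debt_threshold]
    return share(below), share(above)

# Made-up schools, for illustration only.
schools = [
    {"median_debt": 18_000, "default_rate": 0.03},
    {"median_debt": 22_000, "default_rate": 0.04},
    {"median_debt": 25_000, "default_rate": 0.09},
    {"median_debt": 26_000, "default_rate": 0.02},
    {"median_debt": 29_000, "default_rate": 0.11},
    {"median_debt": 31_000, "default_rate": 0.08},
]
below_share, above_share = low_default_share(schools)
print(below_share, above_share)
```

A large gap between the two shares on real data would be consistent with the pattern seen in the scatter plot, though it would not by itself show that debt causes defaults.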

The geographic plot reveals that colleges and universities in southern states have a higher average default rate than those in other states.

Thus the visualizations in my Shiny application allowed me to notice the high default rates and median debts in a segment of the for-profit schools, as well as the geographic trend in default rates.
