What school is the best "bang for your buck"? An analysis of the College Scorecard.

Julia Goldstein
Posted on Jul 23, 2017


In 2017, Americans hold more student loan debt than at any point in history: over $1.4 trillion owed by 44 million borrowers. That number is likely to grow as more people attend college and tuition continues to rise. At the same time, a college degree has become more of a necessity in today's economy, as the average salary of a worker with a college education is more than twice that of someone with a high school diploma.

Given the advantages of a college degree paired with rising costs, it is important for prospective students to be able to compare costs across schools, as well as assess their post-graduation outcomes.

The Data 

I used the College Scorecard data released by the Department of Education for my analysis. The Department began releasing the College Scorecard in 2015 to improve transparency in higher education and hold colleges accountable for measures like value and quality. The full data set has information on almost 8,000 institutions in the United States, including community colleges, undergraduate schools, and post-graduate institutions like law and medical schools. It also contains over 1,500 variables, including:

  • School Type: Whether the institution’s governance structure is public, private nonprofit, or private for-profit.
  • Net Tuition Revenue: Tuition revenue minus discounts and allowances, divided by the number of full-time students.
  • Average Cost: Average annual cost of attendance, including tuition and fees, books and supplies, and living expenses for all students who receive federal aid.
  • Median Earnings: Median earnings for all federally aided students. Data is available for each year from six to 10 years after a student enrolls in college (for this analysis, I used 10-year earnings data).
  • Median Graduate Debt: The median loan debt accumulated at the institution by all student borrowers of federal loans (debt for students who left the institution before graduating is tracked separately).
  • Default Rate: The three-year cohort default rate percentage at the institution.

It is important to mention that many of these variables, including median earnings and graduate debt, only apply to student borrowers of federal loans and may not be representative of students who have private loans or no student debt.
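To make the Net Tuition Revenue definition above concrete, here is a minimal Python sketch of the arithmetic. The function name and the dollar figures are hypothetical and chosen only for illustration; they do not come from the College Scorecard.

```python
def net_tuition_revenue_per_student(tuition_revenue, discounts_and_allowances,
                                    full_time_students):
    """Tuition revenue minus discounts and allowances, per full-time student."""
    if full_time_students <= 0:
        raise ValueError("full_time_students must be positive")
    return (tuition_revenue - discounts_and_allowances) / full_time_students


# Hypothetical school: $50M in tuition revenue, $12M in discounts and
# allowances, 4,000 full-time students.
example = net_tuition_revenue_per_student(50_000_000, 12_000_000, 4_000)
print(example)  # 9500.0
```

In this made-up case the school nets $9,500 per full-time student, well below what its gross tuition revenue alone would suggest.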

Research Questions

For my analysis, I specifically looked at institutions that offer four-year undergraduate degrees and focused on variables related to cost and post-graduation outcomes. I attempted to answer the following questions:

  1. Best Value Schools: Which schools both cost less and provide students with higher earning potential? Which schools have high overall costs but poor outcomes?
  2. State Variation: Do college costs and outcomes vary by state? What states have students with the highest and lowest earnings?
  3. Outcomes by School Type: Does the data validate recent coverage of private for-profit schools? Specifically, do they target low-income students and result in worse outcomes?


Best Value Schools

First, I wanted to get a sense of earnings and employment prospects of former students and compare that against the average cost of each school. For prospective students considering loans to pay for college, it might be valuable to understand where they can get the best "bang for their buck" - schools with low average costs but relatively high earnings among former students. I graphed average costs for each institution against 10-year median earnings, separated by school type.

Some insights from the graph included:

  • Unsurprisingly, public institutions have lower average costs overall than private nonprofit or private for-profit schools. Some of the schools with the lowest costs and highest median earnings include selective public schools such as the University of Virginia, the University of California schools, and the University of Michigan.
  • Program emphasis likely has an impact on student outcomes. Schools with high proportions of STEM majors (Georgia Institute of Technology, New Jersey Institute of Technology, Massachusetts Institute of Technology) have relatively high median earnings.
  • Less selective (60%+ admission rate) private colleges are most likely to have the highest annual costs and relatively low median earnings. While there are many factors that influence college selection, this is something students should keep in mind, especially if they plan on borrowing money to attend.
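One way to turn the cost-versus-earnings comparison into a simple screen is to split schools at the median of each measure. The Python sketch below is an illustration, not the code behind the graph: the median cutoffs are my own assumption, and the school names and figures are hypothetical.

```python
from statistics import median

# Hypothetical records: (school, school type, average annual cost,
# median earnings 10 years after enrollment). Not real Scorecard values.
schools = [
    ("School A", "public", 15000, 52000),
    ("School B", "private nonprofit", 42000, 38000),
    ("School C", "private for-profit", 30000, 35000),
    ("School D", "public", 18000, 41000),
    ("School E", "private nonprofit", 45000, 70000),
]


def classify_by_value(records):
    """Split schools into best-value and poor-value groups.

    Best value: cost below the median and earnings above the median.
    Poor value: cost above the median and earnings below the median.
    Schools sitting exactly on a cutoff fall into neither group.
    """
    cost_cutoff = median(r[2] for r in records)
    earnings_cutoff = median(r[3] for r in records)
    best = [r[0] for r in records
            if r[2] < cost_cutoff and r[3] > earnings_cutoff]
    poor = [r[0] for r in records
            if r[2] > cost_cutoff and r[3] < earnings_cutoff]
    return best, poor


best_value, poor_value = classify_by_value(schools)
print(best_value)  # ['School A']
print(poor_value)  # ['School B']
```

A median split is deliberately crude. It ignores program mix and selectivity, both of which the insights above suggest matter.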

State Variation

Next, I was curious to see what average costs, debt, and earnings look like across the United States. I used the College Scorecard data, grouped by state, to create a heat map of each value with leaflet.

From mapping the data, I found that:

  • While Massachusetts has the highest overall cost for college, Delaware is the state with the highest student debt after graduation ($27,546).
  • It is not surprising that states with high costs of living (e.g., the Northeast and California) have high college costs, since cost takes living expenses into account.
  • Wyoming is the state with both the lowest cost and lowest debt in the country, though that is based on a limited number of data points.
  • Many states in the South and Midwest with low overall college costs have comparatively high post-graduation debt (e.g., Alabama, Mississippi). Given that those states have lower median household incomes, students in those states may still need to take out larger federal loans despite lower college costs.
  • Finally, earnings data by state looks similar to existing data about median household incomes. The District of Columbia has the highest overall earnings ($50,656) in the country, along with many states in the tri-state area. Mississippi has the lowest overall earnings ($33,320), followed by South Carolina ($34,570). This data is based on information from students' W-2 forms, and is not adjusted for cost of living.
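The state-level figures above come from grouping institution-level records by state. The Python sketch below shows that aggregation step only, not the mapping. It assumes the per-state value is the median across institutions, which the post does not specify; the state codes, field names, and numbers are placeholders.

```python
from collections import defaultdict
from statistics import median

# Placeholder institution records with made-up state codes and values.
rows = [
    {"state": "AA", "median_earnings": 40000},
    {"state": "AA", "median_earnings": 44000},
    {"state": "AA", "median_earnings": None},   # missing value, skipped
    {"state": "BB", "median_earnings": 36000},
]


def summarize_by_state(records, field):
    """Return {state: median of `field`} across institutions, skipping missing values."""
    grouped = defaultdict(list)
    for row in records:
        value = row.get(field)
        if value is not None:
            grouped[row["state"]].append(value)
    return {state: median(values) for state, values in grouped.items()}


earnings_by_state = summarize_by_state(rows, "median_earnings")
highest_state = max(earnings_by_state, key=earnings_by_state.get)
print(highest_state)  # AA
```

States with only a handful of institutions, like the Wyoming case noted above, produce summaries that rest on very few data points, so it is worth carrying the count per state alongside the median.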

Outcomes by School Type

Finally, I wanted to see whether there was a difference across school type based on several different variables. Specifically, I wanted to look at demographic and outcomes data for private for-profit colleges, which have received criticism in the United States for their predatory recruitment practices and poor post-graduation opportunities.

Insights included:

  • Interestingly, the median family income of for-profit college students is concentrated at the very low end of the scale and is significantly lower than family income for both public and private nonprofit students. This is notable because, overall, public colleges have a lower price tag than for-profit colleges.
  • Three-year default rates are also higher among for-profit college graduates. The Department of Education withholds federal loans from many for-profit colleges because of their high default rates, so these figures do not even include default rates for students with private loan debt.
  • Median earnings appear similarly dispersed across all school types, which was unexpected. Earnings are concentrated around $40,000 with tails at both ends.


The College Scorecard data definitely has its shortcomings - much of its data is based on students who have federal loans, and it may not completely represent the full undergraduate population. However, it also provides a trove of information that was previously unavailable, including data on student outcomes. While no single data point can capture a school's "value," the College Scorecard is a very useful resource for prospective college students to understand and compare different schools across a variety of important metrics. I invite you to interact with my Shiny App to further explore the data and my insights.

Link to my GitHub.
