A Data Analysis of the College Scorecard
The skills the author demonstrated here can be learned through the Data Science with Machine Learning bootcamp at NYC Data Science Academy.
As of 2017, Americans carry more student loan debt than at any point in history: over $1.4 trillion among 44 million borrowers. That number is likely to grow as more people attend college and tuition continues to rise. At the same time, a college degree has become more of a necessity in today's economy, as the average salary of a worker with a college education is more than twice that of someone with a high school diploma.
Given the advantages of a college degree paired with rising costs, it is important for prospective students to be able to compare costs across schools and to assess the post-graduation outcomes of each school's students.
I used the College Scorecard data released by the Department of Education for my analysis. The Department began releasing the College Scorecard in 2015 to improve transparency in higher education and hold colleges accountable for measures like value and quality. The full data set has information on almost 8,000 institutions in the United States, including community colleges, undergraduate schools, and post-graduate institutions like law and medical schools. It also contains over 1,500 variables, including:
- School Type: Whether the institution’s governance structure is public, private nonprofit, or private for-profit.
- Net Tuition Revenue: Tuition revenue minus discounts and allowances, divided by the number of full-time students.
- Average Cost: Average annual cost of attendance, including tuition and fees, books and supplies, and living expenses for all students who receive federal aid.
- Median Earnings: Median earnings for all federally aided students. Data is available for each year starting six years after a student enrolls in college and up to 10 years after the student enrolls (for this analysis, I used 10-year earnings data).
- Median Graduate Debt: The median loan debt accumulated at the institution by all student borrowers of federal loans (debt for students who left the institution before graduating is tracked separately).
- Default Rate: The three-year cohort default rate percentage at the institution.
It is important to mention that many of these variables, including median earnings and graduate debt, only apply to student borrowers of federal loans and may not be representative of students who have private loans or no student debt.
For my analysis, I specifically looked at institutions that offer four-year undergraduate degrees and focused on variables related to cost and post-graduation outcomes. I attempted to answer the following questions:
- Best Value Schools: Which schools both cost less and provide students with higher earning potential? Which schools have high overall costs but poor outcomes?
- State Variation: Do college costs and outcomes vary by state? What states have students with the highest and lowest earnings?
- Outcomes by School Type: Does the data validate recent coverage of private for-profit schools? Specifically, do they target low-income students and result in worse outcomes?
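As a rough illustration of the filtering step described above, here is a minimal Python sketch that keeps only institutions whose predominant degree is a bachelor's and pulls out the cost and earnings fields. It is not the code behind this analysis, which appears to have been built in R. The column names (`INSTNM`, `PREDDEG`, `CONTROL`, `COSTT4_A`, `MD_EARN_WNE_P10`) and the `PrivacySuppressed` marker follow my understanding of the Scorecard data dictionary and should be checked against the current release. Using `PREDDEG == 3` as the definition of "four-year" is an assumption, and the rows are invented for illustration.

```python
import csv
import io

# Toy rows for illustration only; these are NOT real Scorecard values.
SAMPLE_CSV = """INSTNM,PREDDEG,CONTROL,COSTT4_A,MD_EARN_WNE_P10
School A,3,1,21000,45000
School B,2,1,12000,30000
School C,3,3,32000,PrivacySuppressed
School D,3,2,48000,52000
"""

# Assumed coding of the CONTROL column (verify against the data dictionary).
CONTROL_LABELS = {"1": "Public", "2": "Private nonprofit", "3": "Private for-profit"}


def to_number(value):
    """Convert a raw cell to float; return None for missing or suppressed values."""
    try:
        return float(value)
    except (TypeError, ValueError):
        return None


def load_four_year(csv_text):
    """Keep institutions whose predominant degree is a bachelor's (assumed PREDDEG == 3)."""
    rows = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        if row["PREDDEG"] != "3":
            continue
        rows.append({
            "name": row["INSTNM"],
            "school_type": CONTROL_LABELS.get(row["CONTROL"], "Unknown"),
            "avg_cost": to_number(row["COSTT4_A"]),
            "median_earnings": to_number(row["MD_EARN_WNE_P10"]),
        })
    return rows


four_year = load_four_year(SAMPLE_CSV)
for school in four_year:
    print(school)
```

Suppressed or missing values are kept as `None` rather than dropped, so that later steps can decide per metric whether a school has enough data to be included.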
Best Value Schools
First, I wanted to get a sense of the earnings and employment prospects of former students and compare them against the average cost of each school. For prospective students considering loans to pay for college, it might be valuable to understand where they can get the best "bang for their buck": schools with low average costs but relatively high earnings among former students. I graphed average cost for each institution against 10-year median earnings, separated by school type.
Some insights from the graph included:
- Unsurprisingly, public institutions have lower average costs overall than private nonprofit or private for-profit schools. The schools with the lowest costs and highest median earnings include selective public schools such as the University of Virginia, the University of California schools, and the University of Michigan.
- Program emphasis likely has an impact on student outcomes. Schools with high proportions of STEM majors (Georgia Institute of Technology, New Jersey Institute of Technology, Massachusetts Institute of Technology) have relatively high median earnings.
- Less selective (60%+ admission rate) private colleges are most likely to have the highest annual costs and relatively low median earnings. While there are many factors that influence college selection, this is something students should keep in mind, especially if they plan on borrowing money to attend.
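One simple way to put a number on "bang for the buck" is the ratio of 10-year median earnings to average annual cost. The sketch below ranks schools by that ratio. This ratio is my own illustrative metric, not something used in the analysis above, which relied on a scatter plot. The field names and the rows are hypothetical.

```python
# Toy records for illustration only; these are NOT real Scorecard values.
schools = [
    {"name": "School A", "school_type": "Public", "avg_cost": 20000, "median_earnings": 50000},
    {"name": "School B", "school_type": "Private nonprofit", "avg_cost": 50000, "median_earnings": 40000},
    {"name": "School C", "school_type": "Private for-profit", "avg_cost": 30000, "median_earnings": None},
    {"name": "School D", "school_type": "Public", "avg_cost": 25000, "median_earnings": 40000},
]


def rank_by_value(rows):
    """Return (name, earnings-to-cost ratio) pairs, highest ratio first.

    Schools with a missing or zero cost, or missing earnings, are skipped.
    """
    scored = []
    for row in rows:
        cost, earnings = row["avg_cost"], row["median_earnings"]
        if not cost or earnings is None:
            continue
        scored.append((row["name"], earnings / cost))
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


ranking = rank_by_value(schools)
for name, ratio in ranking:
    print(f"{name}: {ratio:.2f}")
```

A single ratio hides a lot (program mix, selectivity, regional wages), so it works better as a first-pass filter than as a verdict on any one school.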
State Variation
Next, I was curious to see what average costs, debt, and earnings look like across the United States. I used the College Scorecard data, grouped by state, to create a heat map of each value with leaflet.
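The grouping step can be sketched in a few lines of Python. The example below takes per-school records and computes a per-state median for each metric, along with a count of schools, since a state with very few schools can produce a misleading figure. Whether the original maps used medians or means is not stated, so the choice of median here is an assumption. The state codes, field names, and values are made up.

```python
from collections import defaultdict
from statistics import median

# Toy records for illustration only; "AA" and "BB" are placeholder state codes.
rows = [
    {"state": "AA", "avg_cost": 20000, "median_debt": 24000, "median_earnings": 40000},
    {"state": "AA", "avg_cost": 30000, "median_debt": 26000, "median_earnings": 44000},
    {"state": "BB", "avg_cost": 15000, "median_debt": None, "median_earnings": 35000},
    {"state": "BB", "avg_cost": 17000, "median_debt": 21000, "median_earnings": 37000},
    {"state": "BB", "avg_cost": 19000, "median_debt": 23000, "median_earnings": 39000},
]

METRICS = ("avg_cost", "median_debt", "median_earnings")


def summarize_by_state(records):
    """Compute the per-state median of each metric, ignoring missing values."""
    grouped = defaultdict(lambda: defaultdict(list))
    for rec in records:
        for metric in METRICS:
            if rec[metric] is not None:
                grouped[rec["state"]][metric].append(rec[metric])

    summary = {}
    for state, metrics in grouped.items():
        summary[state] = {
            m: median(metrics[m]) if metrics[m] else None for m in METRICS
        }
        # Track how many schools back the cost figure, to flag thin samples.
        summary[state]["n_schools_with_cost"] = len(metrics["avg_cost"])
    return summary


summary = summarize_by_state(rows)
for state, stats in sorted(summary.items()):
    print(state, stats)
```

The resulting per-state values are what a mapping library would then join to state boundaries and color by.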
From mapping the data, I found that:
- While Massachusetts has the highest overall cost for college, Delaware is the state with the highest student debt after graduation ($27,546).
- It is not surprising that states with high costs of living (e.g., the Northeast and California) have high college costs, since cost takes living expenses into account.
- Wyoming is the state with both the lowest cost and lowest debt in the country, though that is based on a limited number of data points.
- Many states in the South and Midwest with low overall college costs have comparatively high post-graduation debt (e.g., Alabama, Mississippi). Given that those states have lower median household incomes, students in those states may still need to take out larger federal loans despite lower college costs.
- Finally, earnings data by state looks similar to existing data about median household incomes. The District of Columbia has the highest overall earnings ($50,656) in the country, along with many states in the tri-state area. Mississippi has the lowest overall earnings ($33,320), followed by South Carolina ($34,570). This data is based on information from students' W-2 forms, and is not adjusted for cost of living.
Outcomes by School Type
Finally, I wanted to see how school types differ across several variables. Specifically, I wanted to look at demographic and outcomes data for private for-profit colleges, which have received criticism in the United States for predatory recruitment practices and poor post-graduation opportunities. Comparing the three school types, I found that:
- Interestingly, the median family income of for-profit college students is concentrated at the very low end of the scale and is significantly lower than family income for both public and private nonprofit students. This is notable because, overall, public colleges have a lower price tag than for-profit colleges.
- Three-year default rates are also higher among for-profit college graduates. The Department of Education withholds federal loans from many for-profit colleges because of their high default rates, so these figures do not even include default rates for students with private loan debt.
- Median earnings appear similarly dispersed across all school types, which was unexpected. Earnings are concentrated around $40,000 with tails at both ends.
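A comparison like the one above comes down to summarizing each metric within each school type. The following sketch reports the median, range, and number of non-missing values per type for a chosen metric. The field names are hypothetical, and the values are invented to show the mechanics. They are not drawn from the Scorecard.

```python
from collections import defaultdict
from statistics import median

# Toy records for illustration only; these are NOT real Scorecard values.
records = [
    {"school_type": "Public", "family_income": 60000, "default_rate": 0.05, "median_earnings": 41000},
    {"school_type": "Public", "family_income": 50000, "default_rate": 0.07, "median_earnings": 39000},
    {"school_type": "Private nonprofit", "family_income": 70000, "default_rate": 0.04, "median_earnings": 42000},
    {"school_type": "Private nonprofit", "family_income": 64000, "default_rate": 0.06, "median_earnings": 38000},
    {"school_type": "Private for-profit", "family_income": 22000, "default_rate": 0.15, "median_earnings": 40000},
    {"school_type": "Private for-profit", "family_income": 18000, "default_rate": None, "median_earnings": 36000},
]


def describe(values):
    """Summarize a list of numbers, skipping missing entries."""
    present = [v for v in values if v is not None]
    if not present:
        return None
    return {
        "median": median(present),
        "min": min(present),
        "max": max(present),
        "n": len(present),
    }


def summarize_by_type(rows, metric):
    """Group rows by school type and describe the chosen metric in each group."""
    grouped = defaultdict(list)
    for row in rows:
        grouped[row["school_type"]].append(row[metric])
    return {school_type: describe(values) for school_type, values in grouped.items()}


income_by_type = summarize_by_type(records, "family_income")
default_by_type = summarize_by_type(records, "default_rate")
for school_type, stats in income_by_type.items():
    print(school_type, stats)
```

Reporting `n` alongside each summary matters here: if default rates are missing for many for-profit schools, the group's median reflects only the schools that still report one.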
The College Scorecard data has clear shortcomings: much of it is based on students who have federal loans, so it may not represent the full undergraduate population. However, it also provides a trove of information that was previously unavailable, including data on student outcomes. While no single data point can capture a school's "value," the College Scorecard is a very useful resource for prospective college students to understand and compare different schools across a variety of important metrics. I invite you to interact with my Shiny App to further explore the data and my insights.
Link to my GitHub.