Data Analysis of Public vs Private Institutions

Posted on Oct 24, 2016


In 2008, I was a senior in high school applying to colleges around the country, eager to start the next phase of my life. Unfortunately, my college application cycle fell right in the middle of the recession that followed the collapse of the housing market, which began in 2006.

Although I had gained admission to Northwestern University, a prestigious private school, I had to decide whether it was worth spending my parents' life savings and taking out large loans. My other choice was my home state's public school, State University of New York (SUNY), Binghamton. Although SUNY Binghamton was not as prestigious, it would have cost about a quarter as much and greatly reduced the financial burden on my family. Ultimately, I had to choose between an expensive private college and a cheaper public college. I ended up going to Northwestern University, but if I had had more data on the differences, I might have chosen differently.

It is not uncommon for students to have to choose between more prestigious private schools and cheaper public schools. However, as college tuition, student debt, and the need for a college degree all rise, it is becoming ever more important to choose carefully. In fact, regarding the student debt bubble, billionaire entrepreneur Mark Cuban has said we are “going to see a repeat of what we saw in the housing market...” [1]. Furthermore, cumulative U.S. student debt is reported to be over $1.45 trillion, more than the total credit card and auto debt in the country.


In this post, we will do an exploratory analysis of data released by the U.S. Department of Education and compare the costs and benefits of public and private colleges. The post focuses on predominantly bachelor's-degree-granting schools and on the latest available year of data (2013), and it looks at three aspects: cost, debt, and earnings. We will see U.S. maps of the cost and adjusted net cost, density plots of the median debt and median earnings after graduation, and lastly a scatter plot of net cost vs. median earnings.
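The filtering step described above (predominantly bachelor's schools, 2013 only, split by public vs. private) can be sketched as follows. This is a minimal illustration, not the author's code: the column names (`degree_type`, `control`, `year`, `net_cost`) and the sample rows are hypothetical placeholders, and the real file's names and codes should be taken from its data dictionary.

```python
import csv
import io

# Hypothetical sample rows (NOT real data); real files use their own
# column names and category codes.
SAMPLE = """name,year,degree_type,control,net_cost
College A,2013,bachelor,public,9500
College B,2013,associate,public,6000
College C,2013,bachelor,private,21000
College D,2012,bachelor,private,20000
"""


def load_bachelor_schools(csv_text, year):
    """Group school names by control type, keeping only bachelor's
    schools from the requested year."""
    groups = {"public": [], "private": []}
    for row in csv.DictReader(io.StringIO(csv_text)):
        if row["degree_type"] == "bachelor" and int(row["year"]) == year:
            groups[row["control"]].append(row["name"])
    return groups


groups = load_bachelor_schools(SAMPLE, 2013)
print(groups)
```

Splitting into the two groups once, up front, keeps every later plot a comparison over the same set of schools.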


U.S. Maps of Cost and Adjusted Net Cost

[Figures: U.S. maps of cost of attendance for public and private institutions]

The cost of attendance is the college's reported estimate of the total cost needed per year, including tuition, living expenses, fees, etc. The public school map is dominated by green and yellow points ($10,000 to $30,000), whereas the private school map is dominated by orange and red points ($30,000+). Private colleges appear to be roughly $20,000 more expensive. The above plots confirm that most private institutions are more expensive than public institutions.

[Figures: U.S. maps of adjusted net cost for public and private institutions]

However, when we plot the adjusted net cost (cost of attendance minus average grants and scholarships), the difference in cost is actually smaller: private institutions provide more financial aid but still ultimately cost more. The public school map is dominated by blue and green dots ($0 to $20,000), whereas the private school map is dominated by green and yellow dots ($10,000 to $30,000). Roughly speaking, public institutions give about $10,000 in aid, whereas private institutions give about $20,000.
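The adjusted net cost arithmetic can be made concrete with a small sketch. The dollar amounts below are illustrative round numbers chosen to match the rough ranges read off the maps, not values taken from the dataset, and the function name is an assumption.

```python
def adjusted_net_cost(cost_of_attendance, average_aid):
    """Cost of attendance minus average grants and scholarships."""
    return cost_of_attendance - average_aid


# Illustrative round numbers only (assumed, not from the dataset).
public = {"cost": 20_000, "aid": 10_000}
private = {"cost": 40_000, "aid": 20_000}

public_net = adjusted_net_cost(public["cost"], public["aid"])
private_net = adjusted_net_cost(private["cost"], private["aid"])

sticker_gap = private["cost"] - public["cost"]  # gap before aid
net_gap = private_net - public_net              # gap after aid

print(f"Sticker-price gap: ${sticker_gap:,}")
print(f"Net-cost gap:      ${net_gap:,}")
```

With these assumed figures, the extra aid at private schools halves the gap but does not close it.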


Density Graphs of Debt and Median Earning Data


In the median debt density graph, the public and private curves have different peaks and a portion that overlaps. From this graph, it seems that median debt at most public schools is lower than at private schools. This makes sense: as the previous U.S. maps showed, private schools cost more, so students there will likely have to take out more loans to pay for tuition.


Surprisingly, in the median earnings density graph, the median earnings (10 years after graduation) of public and private schools show little visual difference. The density curves fall almost on top of one another, with the peaks nearly aligned, although the private curve has a little more variance. It seems that, in general, earnings are about the same regardless of whether the school is private or public.
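For readers unfamiliar with density graphs, the curves above can be thought of as smoothed histograms. The sketch below builds a Gaussian kernel density estimate by hand and compares the peaks of two groups. It is an illustration only: the earnings values are synthetic, the bandwidth uses a common normal-reference rule of thumb, and none of this is confirmed to be how the original plots were produced.

```python
import math
import random
import statistics


def rule_of_thumb_bandwidth(samples):
    """Normal-reference rule of thumb: 1.06 * sigma * n^(-1/5)."""
    return 1.06 * statistics.stdev(samples) * len(samples) ** (-0.2)


def gaussian_kde(samples):
    """Return a function that estimates the density of `samples` at x."""
    h = rule_of_thumb_bandwidth(samples)
    norm = 1.0 / (len(samples) * h * math.sqrt(2.0 * math.pi))

    def density(x):
        return norm * sum(math.exp(-0.5 * ((x - s) / h) ** 2) for s in samples)

    return density


# Synthetic earnings (NOT the real data): similar centers, wider private spread.
random.seed(0)
public_earnings = [random.gauss(40_000, 8_000) for _ in range(300)]
private_earnings = [random.gauss(41_000, 11_000) for _ in range(300)]

public_density = gaussian_kde(public_earnings)
private_density = gaussian_kde(private_earnings)

# Evaluate both curves on a shared grid and locate each peak.
step = 500
grid = range(0, 100_001, step)
public_peak = max(grid, key=public_density)
private_peak = max(grid, key=private_density)

print(f"Public peak near  ${public_peak:,}")
print(f"Private peak near ${private_peak:,}")
```

Two curves with nearly aligned peaks but different widths, as in this synthetic example, are what "about the same earnings, a little more variance" looks like in a density graph.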


Scatter Plot of Net Cost vs. Median Earning Data


To better understand how cost and earnings relate to each other for each school, it helps to show a scatter plot of net cost against median earnings. Here the public and private institutions appear to form their own clusters, with the public school cluster cheaper than the private school cluster but at about the same earnings level. However, a small portion of the two clusters overlaps.


To show the distributions a little more clearly, 2D density contours are overlaid on the scatter points to illustrate where the highest-density regions are. The innermost contour line marks the densest region of each group. Even though the two college types overlap, the peaks of the two groups are clearly separated.
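The idea behind the contour overlay can be sketched without any plotting: estimate a 2D density for each group and find where it is highest, which is the point the innermost contour would enclose. The code below does not draw contours. It uses synthetic (net cost, earnings) points and fixed, assumed bandwidths, so it illustrates the technique rather than reproducing the figure.

```python
import math
import random


def kde_2d(points, bw_x, bw_y):
    """Return a 2D density function built from a product of Gaussian kernels."""
    norm = 1.0 / (len(points) * 2.0 * math.pi * bw_x * bw_y)

    def density(x, y):
        total = 0.0
        for px, py in points:
            zx = (x - px) / bw_x
            zy = (y - py) / bw_y
            total += math.exp(-0.5 * (zx * zx + zy * zy))
        return norm * total

    return density


def densest_point(points, bw_x, bw_y, xs, ys):
    """Return the grid point (x, y) where the estimated density is highest."""
    density = kde_2d(points, bw_x, bw_y)
    return max(((x, y) for x in xs for y in ys), key=lambda p: density(*p))


# Synthetic (net cost, median earnings) pairs, NOT the real data.
random.seed(1)
public_pts = [(random.gauss(10_000, 3_000), random.gauss(40_000, 8_000))
              for _ in range(200)]
private_pts = [(random.gauss(20_000, 5_000), random.gauss(41_000, 10_000))
               for _ in range(200)]

cost_grid = range(0, 40_001, 1_000)
earn_grid = range(10_000, 80_001, 2_000)

# Bandwidths are assumed values picked for illustration.
public_peak = densest_point(public_pts, 3_000, 5_000, cost_grid, earn_grid)
private_peak = densest_point(private_pts, 3_000, 5_000, cost_grid, earn_grid)

print("Public peak (net cost, earnings): ", public_peak)
print("Private peak (net cost, earnings):", private_peak)
```

Comparing the two peak coordinates gives a numeric counterpart to the visual claim: the peaks differ mainly along the cost axis, not the earnings axis.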



In general, the cost of attendance at private institutions is roughly $20,000 higher than at public schools. However, private institutions give on the order of $10,000 more in financial aid, so their net cost ends up only about $10,000 higher. On the other side of the analysis, students tend to leave private institutions with more debt but earn about the same amount after graduation. Finally, a higher net cost does not appear to translate into higher earnings: public schools are cheaper, yet their graduates earn around the same as graduates of private schools. All else being equal, this analysis suggests choosing a public school to save money.


Future Steps
In this analysis, we took a high-level view of whether the monetary investment in more expensive colleges (private institutions) is worth it, based on how much debt students graduate with and how much they earn afterwards. However, college is more than just money in and money out; many other factors define a good and worthwhile investment. These factors may include faculty-to-student ratio, school size, location, types of programs, and many others.

In the future, a deeper analysis will be done to include these additional factors. In addition, this study only looked at predominantly bachelor's-degree-granting institutions. It would be enlightening to see how schools that grant other levels of degrees, such as associate degrees, medical degrees, etc., play into earnings and investment returns.

About Author

Nelson Chen

Nelson has a Bachelor's degree from Northwestern University and a Master's degree from University of California, Berkeley in Mechanical Engineering. His graduate work specialized in developing and applying new Computational Fluid Dynamic algorithms to astrophysical fluid dynamic problems...