Data Study on U.S. Health Insurance

Posted on Oct 22, 2016


Is it possible to purchase a health insurance plan with lower rates and more benefits? The answer is yes, and this project offers some insight into how. First, it analyzes the factors driving plan rates, based on over 12 million records of health insurance marketplace data. Second, drawing on over 5 million records of plan benefits data, it describes the different plan types: their benefits, variety, and specialty.

All the data used in the project are from 2014 to 2016.

Data on Plan Rate

Age vs. Plan Rate

Besides your name, the first thing you are asked when you request a health insurance rate quote is your age. To regulate the marketplace and limit the amount that can be charged to older people, the Department of Health and Human Services (HHS) established the federal default standard age curve, which has been in effect since 2014.


For rating purposes, HHS defines a single age band for children (ages 0 to 20), a single age band for older adults (ages 64 and over), and one-year age bands for adults ages 21 through 63. As shown in the chart above, health insurance plan rates increase with age, and the maximum ratio for age rating is 3:1. That is, for the same insurance plan, if the rate for a 21-year-old adult is 100 dollars, the rate for an older adult cannot exceed 300 dollars. However, age is a factor that we cannot change. Let's explore some controllable factors of plan rates.
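To make the banding and the 3:1 limit concrete, here is a minimal Python sketch. It is only an illustration of the rule as described above, not the project's own code; the function names (age_band, max_rate_for_older_adult, violates_age_cap) are hypothetical.

```python
MAX_AGE_RATIO = 3.0  # the 3:1 cap relative to the rate for a 21-year-old


def age_band(age: int) -> str:
    """Return the rating band for an age, using the bands described in the text."""
    if age < 0:
        raise ValueError("age must be non-negative")
    if age <= 20:
        return "0-20"          # single band for children
    if age >= 64:
        return "64 and over"   # single band for older adults
    return str(age)            # one-year bands for ages 21 through 63


def max_rate_for_older_adult(rate_at_21: float) -> float:
    """Highest rate allowed for an older adult on the same plan."""
    return rate_at_21 * MAX_AGE_RATIO


def violates_age_cap(rate_at_21: float, rate_for_older_adult: float) -> bool:
    """True if the older adult's rate exceeds three times the rate at age 21."""
    return rate_for_older_adult > max_rate_for_older_adult(rate_at_21)


print(age_band(35))                     # 35
print(max_rate_for_older_adult(100.0))  # 300.0
print(violates_age_cap(100.0, 320.0))   # True
```

With the 100-dollar example from the text, any rate above 300 dollars for an older adult on the same plan would be flagged.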

Plan Length vs. Plan Rate

The second question in a rate quote might be the plan length. There are various choices, such as 3 months, 6 months, or 12 months. Here's a graph for each age group, showing how plan rates are affected by plan length. The project uses the month as the unit of length, and all the plan rates mentioned in this article are monthly rates.


The graph shows that, for each age group, average rates are lowest for 12-month plans. Another good choice of plan length is 6 months. Why are the rates for shorter-term, or even longer-term, plans so much higher? One possible reason is that insurance companies assume customers buy a short-term or long-term plan for a special medical purpose, whereas 6-month and 12-month plans are treated as "regular" plans by default. For the plan length option, 12 months is the best choice.
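The comparison behind a chart like this can be sketched as a group-by average. The Python snippet below is an assumption-laden illustration, not the project's R code: the records are made-up numbers, not values from the marketplace data, and the helper names are hypothetical.

```python
from collections import defaultdict
from statistics import mean

# Made-up records for illustration only:
# (age band, plan length in months, monthly rate in dollars)
records = [
    ("30", 3, 420.0),
    ("30", 6, 310.0),
    ("30", 12, 250.0),
    ("30", 12, 270.0),
    ("50", 3, 640.0),
    ("50", 6, 480.0),
    ("50", 12, 400.0),
]


def average_rate_by_group(rows):
    """Average monthly rate for each (age band, plan length) pair."""
    groups = defaultdict(list)
    for band, length, rate in rows:
        groups[(band, length)].append(rate)
    return {key: mean(rates) for key, rates in groups.items()}


def cheapest_length_per_band(averages):
    """For each age band, the plan length with the lowest average rate."""
    best = {}
    for (band, length), avg in averages.items():
        if band not in best or avg < best[band][1]:
            best[band] = (length, avg)
    return {band: length for band, (length, _) in best.items()}


averages = average_rate_by_group(records)
print(averages[("30", 12)])                # 260.0
print(cheapest_length_per_band(averages))  # {'30': 12, '50': 12}
```

The toy numbers were chosen so that 12-month plans come out cheapest; on real data the same grouping would simply report whatever the records show.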

However, the average rates shown in the bar chart seem too high. Let's take a look at the raw data shown in a violin plot.


The plot above, which shows the density of the raw data, reveals two main trends: an upward trend, and a flat trend at the bottom. What does this mean? It means that for each age group, at least two price levels of insurance plans are available: cheap ones and expensive ones. The seemingly odd shapes at the bottom actually provide some good news: no matter what age group you are in, insurance companies offer plenty of free health insurance plans.
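A quick way to see why the averages look too high when rates fall into two price levels is to compare the mean with the median and to count the zero-rate plans separately. The sketch below uses made-up rates for a single age group; the function name summarize_rates is hypothetical, and this is not the project's own analysis.

```python
from statistics import mean, median

# Made-up monthly rates for one age group, for illustration only:
# a cluster of free or cheap plans and a cluster of expensive ones.
rates = [0.0, 0.0, 0.0, 25.0, 30.0, 900.0, 950.0, 1000.0]


def summarize_rates(values):
    """Summary statistics that expose a two-level price distribution."""
    paid = [v for v in values if v > 0]
    free_count = len(values) - len(paid)
    return {
        "mean": mean(values),
        "median": median(values),
        "share_free": free_count / len(values),
        "mean_paid": mean(paid) if paid else None,
    }


summary = summarize_rates(rates)
print(summary["mean"])        # 363.125
print(summary["median"])      # 27.5
print(summary["share_free"])  # 0.375
```

Here the mean is pulled far above the median by the expensive cluster, which is the kind of gap a violin plot makes visible and a bar chart of averages hides.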

Tobacco Policy vs. Plan Rate

Another question often asked during the rate quote process is "Do you smoke?" Why? Because it could affect your plan rates! The project found that an insurance company's tobacco policy, meaning whether or not the company takes customers' tobacco use into account, is another important factor in plan rates.


Besides increasing with age, rates vary with company policy. Companies that have no preference regarding customers' tobacco use offer relatively lower rates to the public. In other words, when people purchase insurance plans and no questions are asked about tobacco use, the rates might be lower than for plans that do ask.

Family Option, Region vs. Plan Rate

A family option is the most effective way to cut expenses. According to the project's analysis, buying with dependents, or as a family, reduces plan rates massively. Since region, like age, affects plan rates in the same way regardless of the other factors, here we combine region with the family option as an example.


As shown above, even the highest average rate is only $44.81 per month. Due to the limitations of this blog post, the chart's interactive feature cannot be shown here; the chart illustrates New Jersey as an example. In general, rates vary across states, and the darker the color, the higher the average plan rate.


Data on Plan Benefit

Plan Type vs. Benefit Variety

There are five types of plans in the marketplace: HMO, PPO, POS, EPO, and Indemnity. The left graph shows the number of benefits for each type of coverage. The right graph shows the cover ratio of individual insurance plans under each plan type in the marketplace.



It seems that HMOs have the largest share of the marketplace. However, that doesn't mean that HMOs provide the most benefits.



In fact, the PPO plan type covers more kinds of benefits. In other words, the graphs illustrate that when people have no preference about benefits, PPOs and HMOs are the two better choices.

Plan Type vs. Benefit Specialty

If people do have a preference about benefits, which plan type should they choose? Below is a list of the benefits covered only by specific plan types. When we have a preference in one of the fields mentioned below, we know which type to choose.

  • PPO - Prescription Drugs: non-/preferred brand, off label, other
  • HMO - Inpatient Rehabilitation; Sterilization; Naprapathic services
  • POS - Therapeutic radiology; Telehealth visit: PCP, specialist; Optical services
  • EPO - Mastectomy related coverage; Wellness plan benefit
  • INDEMNITY - Dental
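A list like the one above can be derived with set differences: for each plan type, keep the benefits that no other plan type covers. The Python sketch below shows the idea on a small toy input. The function name unique_benefits and the shared "Primary Care Visit" entry are assumptions for illustration, and this is not the project's own R code.

```python
def unique_benefits(benefits_by_type):
    """Map each plan type to the benefits that only that type covers."""
    unique = {}
    for plan_type, benefits in benefits_by_type.items():
        # Union of everything covered by the other plan types.
        covered_elsewhere = set()
        for other_type, other_benefits in benefits_by_type.items():
            if other_type != plan_type:
                covered_elsewhere |= other_benefits
        unique[plan_type] = benefits - covered_elsewhere
    return unique


# Toy input for illustration only; "Primary Care Visit" is a placeholder
# for a benefit shared by several plan types.
benefits_by_type = {
    "HMO": {"Primary Care Visit", "Inpatient Rehabilitation"},
    "PPO": {"Primary Care Visit", "Off Label Prescription Drugs"},
    "INDEMNITY": {"Dental"},
}

result = unique_benefits(benefits_by_type)
print(result["HMO"])        # {'Inpatient Rehabilitation'}
print(result["INDEMNITY"])  # {'Dental'}
```

A benefit shared by two or more plan types drops out of every entry, so only the specialties remain.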


Age, plan length, tobacco policy, family option, and region are possible factors in plan rates. If you choose a one-year plan, pick an insurance company that doesn't have a tobacco policy, and purchase with dependents, you could get a lower rate.

PPOs and HMOs cover more benefits than other health insurance plan types. However, each type has its own benefit specialty.


R code is here:
