Data Trends on US Obesity

Posted on Jul 30, 2018


The main objective of this study was to examine the data and demographics of obesity in the United States. Obesity remains an important issue because almost half of the world’s adult population could be overweight or obese by 2030, according to The Global Obesity Threat, published by McKinsey & Company. Obesity is often related to health conditions such as heart disease, stroke, type 2 diabetes, and certain types of cancer, and it also contributes to economic and productivity costs.

My main goal was to identify distinct segments with high obesity within the US population. Market segmentation could allow for more effective targeting of products and marketing campaigns, such as promotions of gym memberships, healthy eating programs, or healthcare discounts, in order to move Americans toward healthier weights.

The datasets for analysis include national and state-specific data on youth and adult diet, physical activity, and weight status across years and demographics. Youth data were collected every two years from 2011 to 2015, and adult data were collected every year from 2011 to 2016. Please refer to my Shiny app for links to the raw datasets.

Data Analysis

The heat maps below show the percentage of the population for the input selected by the user. The available inputs include obesity, diet, and physical activity attributes for both the youth and adult groups. Looking at the obesity data, the southern states have the highest obesity rates, whereas states with a more active lifestyle, such as Colorado and Hawaii, have the lowest.

The user can further explore diet and physical activity patterns across the states with the heat map. Another way to investigate the correlation between obesity and the behavioral factors is through scatter plots.

Youth Obesity Trend Data 

The scatter plots below show the general trend between obesity and the behavioral attributes that are included in the heat map selection. Each data point on a scatter plot represents one state’s data at one point in time.

The graphs below show that states where a higher percentage of the population eats fruit and vegetables tend to have a lower percentage of obese population. Interestingly, physical activity does not have a significant correlation with youth obesity levels. This suggests a diet-focused strategy might be more beneficial in moderating youth obesity.
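
The strength of a relationship like the one described above can be summarized with a Pearson correlation coefficient. The following is a minimal Python sketch, not the code behind the Shiny app: the function name `pearson_r` and the sample values are hypothetical and exist only to illustrate the calculation.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Return the Pearson correlation coefficient of two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length sequences with at least 2 points")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Sum of co-deviations and sums of squared deviations from the means
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / sqrt(var_x * var_y)


# Made-up illustrative values, NOT taken from the datasets:
# percent of youth eating vegetables vs. percent of youth who are obese.
veg_pct = [10.0, 12.0, 14.0, 16.0, 18.0]
obesity_pct = [16.0, 15.0, 13.5, 13.0, 11.0]

r = pearson_r(veg_pct, obesity_pct)
print(f"r = {r:.2f}")  # a negative r means obesity falls as consumption rises
```

A value of r near -1 or +1 indicates a strong linear relationship, while a value near 0 corresponds to the lack of correlation reported for youth physical activity.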

Adult Obesity Trend Data

The adult obesity trend for fruit and vegetable consumption is similar to that of the youth group. Unlike in the youth group, however, physical activity also has a significant correlation with adult obesity. For each category of physical activity, states where a higher percentage of the population is physically active have lower obesity rates. The inverse relationship holds for no physical activity. Both diet and physical activity play an important role in weight management for the adult group.
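
One way to express the direction of these relationships is the slope of a least-squares trend line through a scatter plot. The sketch below is an assumption about how such a line could be fitted in Python, not the method used in the Shiny app; `fit_line` and all sample values are hypothetical.

```python
def fit_line(xs, ys):
    """Fit y = slope * x + intercept by ordinary least squares."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length sequences with at least 2 points")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("cannot fit a line when all x values are equal")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Made-up illustrative values, NOT taken from the datasets.
active_pct = [40.0, 45.0, 50.0, 55.0]      # percent of adults who are active
obesity_a = [34.0, 32.0, 29.0, 27.0]       # percent of adults who are obese
inactive_pct = [18.0, 22.0, 26.0, 30.0]    # percent with no physical activity
obesity_b = [25.0, 28.0, 31.0, 33.0]

slope_active, _ = fit_line(active_pct, obesity_a)
slope_inactive, _ = fit_line(inactive_pct, obesity_b)
print(f"active slope: {slope_active:.2f}, inactive slope: {slope_inactive:.2f}")
```

With data shaped like the adult trends described above, the slope would be negative for the physical activity categories and positive for no physical activity.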

Demographic Trends

In addition to obesity data provided for each state, the datasets also provided a breakdown of obesity across different demographics for each state as well as the whole nation. Below are a few interesting trends I observed from different demographics at the national level.

For the age demographic, obesity tends to stay at the same level for the youth group (age 14-17) but gradually increases in the young adult group (age 18-35) and peaks from the mid-40s to the mid-50s.


For the ethnicity demographic, obesity trends are similar among the youth and adult groups. African American and Hispanic populations have the highest obesity rates, followed by Caucasian and Asian populations. But again, adults have a significantly higher obesity rate than the youth group.

Two additional demographic categories were provided in the adult dataset. Adults with less education (less than high school) have higher obesity rates (close to 40%) than college graduates (low 30% range). Lower-income populations also tend to have higher obesity rates.

Perhaps educational campaigns on a healthy lifestyle should start at an earlier age. Weight loss products and health insurance plans could be personalized toward low-education, low-income, and even certain ethnic subgroups of the population.

Sharp rise of obesity in young adults

The previous demographic data showed a unifying trend: adults have higher obesity rates than youth across all years. I was curious to see how obesity changes with age and whether the change can be explained by the behavioral factors captured in the dataset.

I took a snapshot in time and picked the most recent year to construct the box plots below. Each data point represents one state’s data in 2015. The median obese population increases sharply in the young adult group (age 18-35).
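
The center line of each box in a box plot is the median of the state-level values for that age group. The following Python sketch shows how such medians could be computed; it is an illustration only, and the function name `median_by_group`, the age group labels, and the values are all hypothetical rather than taken from the datasets.

```python
from collections import defaultdict
from statistics import median


def median_by_group(records):
    """Group (label, value) pairs by label and return the median of each group."""
    groups = defaultdict(list)
    for label, value in records:
        groups[label].append(value)
    return {label: median(values) for label, values in groups.items()}


# Made-up illustrative values, NOT taken from the datasets:
# each pair is (age group, percent obese in one state).
records = [
    ("14-17", 12.0), ("14-17", 14.0), ("14-17", 13.0),
    ("18-24", 17.0), ("18-24", 19.0), ("18-24", 18.0),
    ("25-34", 27.0), ("25-34", 29.0), ("25-34", 31.0),
]

medians = median_by_group(records)
for age_group, value in medians.items():
    print(age_group, value)
```

Comparing the medians of adjacent age groups is what makes a sharp increase visible, independent of how widely individual states are spread within each group.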

Fruit consumption has no obvious trend across age groups, and interestingly, vegetable consumption rises with age. This might seem to contradict the earlier trend observed for vegetable consumption, but it does not entirely: the earlier trend was constructed within the adult data and the youth data separately, whereas the comparison below is made across groups. The box plots below simply indicate that the diet parameters in the dataset cannot explain the sharp increase in obesity for the young adult group, which is the question I was interested in.

As for physical activity, the rates for the different categories of physical activity are either similar or higher for the young adult group. This again indicates that physical activity may not be the main contributor to the significant increase in young adult obesity. Other factors, such as a high-carbohydrate diet, sleeping patterns, stress levels, and changes in metabolism, should be investigated.


  • Youth obesity has a stronger correlation with fruit and vegetable consumption than with physical activity. Parents and schools should focus on providing a healthier diet to control youth obesity.
  • Hispanic and African American populations have the highest prevalence of obesity, followed by Caucasian and Asian populations, for both youth and adults. Populations with lower education and income have higher obesity rates. Although genetics play a role, controllable behaviors such as diet and exercise can moderate obesity levels for adults.
  • Because populations with lower education and lower income have higher obesity rates, educational campaigns on a healthy lifestyle should start at an earlier age. Weight loss products and health insurance plans can also be personalized toward certain subgroups of the population.
  • The sharp rise of obesity in young adults from age 18 to 35 could not be explained by the diet and physical activity attributes captured in this dataset. Further studies of factors such as sleeping patterns, stress levels, high-carbohydrate diets, and metabolism could be conducted to determine the principal cause of the observed trend.


About Author

Kelly Ho

Kelly graduated from Cornell University with a Master of Engineering degree. She has three years of experience in analytics, statistical modeling, and providing data-driven recommendations for process improvement.