Data Analysis on Infant Natality & Mortality Rates


Reach out to us via LinkedIn

View the GitHub repo for this project here


Introduction

The Centers for Disease Control and Prevention (CDC) annually publishes data on the birth and death rates of infants born in the United States. It gathers a tremendous amount of data about each birth, including the parents' education, age, health status, and tobacco use, as well as the newborn's health status, APGAR score, and delivery method.

A large amount of data is gathered for every child born in the United States, and the sheer volume can make it hard to keep track of it all. As a team of data scientists, we dove into this data to uncover relationships between maternal health (e.g., the mother's BMI and smoking habits) and subsequent infant natality and mortality rates.

 

Goal

The goal of this project was two-fold:

  1. To analyze and interpret the data collected from the CDC website on infant natality and mortality rates
  2. To develop a user-friendly application that aims to educate expecting mothers about the potential impacts their lifestyle choices may have on the health of their future child.

Please follow along with our custom Shiny App as you read through the rest of this blog post.

 

Data Acquisition

All of the data used in this analysis was collected from the CDC website and includes:

  • Infant Natality Data, 2016-18
  • Period Linked Infant Birth & Death Data, 2016-17

Since the original dataset from the CDC was quite large (~3.8 million observations and 90+ variables per year of data), we built a custom parser to extract only the data relevant to our scope of work. Exploratory data analysis (EDA) was conducted in both Python and R.
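A parser of this kind might look like the minimal sketch below. It is not the project's actual parser: it assumes the raw file is fixed-width text, and the field names and character offsets in FIELD_SPECS are hypothetical placeholders that would have to be replaced with the positions documented for the file being parsed. Records are streamed one line at a time so that a multi-million-row file never has to fit in memory.

```python
import csv
import io
from typing import Dict, Iterable, Iterator, Tuple

# Hypothetical layout: field name -> (start, end) character offsets,
# 0-based and end-exclusive. These are illustrative, not the real positions.
FIELD_SPECS: Dict[str, Tuple[int, int]] = {
    "mother_age": (0, 2),
    "bmi": (2, 6),
    "cig_1st_tri": (6, 8),
    "gestation_weeks": (8, 10),
}


def parse_fixed_width(
    lines: Iterable[str],
    specs: Dict[str, Tuple[int, int]] = FIELD_SPECS,
) -> Iterator[Dict[str, str]]:
    """Yield one dict per non-blank line, keeping only the fields in specs."""
    for line in lines:
        line = line.rstrip("\n")
        if not line.strip():
            continue  # skip blank lines
        yield {name: line[start:end].strip() for name, (start, end) in specs.items()}


def extract_to_csv(lines: Iterable[str], out, specs=FIELD_SPECS) -> int:
    """Write the selected fields to a CSV stream and return the row count."""
    writer = csv.DictWriter(out, fieldnames=list(specs))
    writer.writeheader()
    count = 0
    for record in parse_fixed_width(lines, specs):
        writer.writerow(record)
        count += 1
    return count
```

Because both functions accept any iterable of lines, an open file handle can be passed in directly, and the resulting CSV can then be loaded in either Python or R for EDA.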

Our team focused on maternal and infant factors that include, but are not limited to:

    • Demographics
      • Age
      • Education Level
      • Race
    • Maternal Health
      • BMI (weight)
      • Infections (e.g. syphilis)
      • Risk Factors (e.g. pre-pregnancy diabetes)
      • Tobacco Use
    • Infant Health
      • Gestation Period (in weeks)
      • Birth Weight
      • APGAR Score
      • Infant Survival

Once missing values and irrelevant variables had been removed from the data, our team created several visualizations based on our analysis of the key factors listed above. We then built an interactive web application in R Shiny where expecting mothers or couples wishing to start a family can explore national statistics as well as customized statistics based on their own demographics or current health conditions.
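The cleaning step described above can be sketched roughly as follows. This is an illustration under assumptions, not the team's code: the field names in KEEP_FIELDS and the "unknown" sentinel codes in MISSING_CODES are hypothetical and should be checked against the documentation for the actual dataset. A record is dropped if any retained field is blank, non-numeric, or set to a sentinel code.

```python
from typing import Dict, Iterable, List, Optional, Set

# Hypothetical fields to retain; every other variable is discarded.
KEEP_FIELDS: List[str] = ["mother_age", "bmi", "cig_1st_tri", "apgar5"]

# Hypothetical sentinel codes that mark a value as unknown or not stated.
MISSING_CODES: Dict[str, Set[str]] = {
    "bmi": {"99.9"},
    "cig_1st_tri": {"99"},
    "apgar5": {"99"},
}


def clean_record(record: Dict[str, str]) -> Optional[Dict[str, float]]:
    """Return the record restricted to KEEP_FIELDS as floats, or None if unusable."""
    cleaned: Dict[str, float] = {}
    for field in KEEP_FIELDS:
        raw = record.get(field, "").strip()
        if raw == "" or raw in MISSING_CODES.get(field, set()):
            return None  # missing value: drop the whole record
        try:
            cleaned[field] = float(raw)
        except ValueError:
            return None  # non-numeric value: treat as missing
    return cleaned


def clean_records(records: Iterable[Dict[str, str]]) -> List[Dict[str, float]]:
    """Keep only the records that survive clean_record."""
    return [c for c in map(clean_record, records) if c is not None]
```

Dropping whole records is the simplest policy; with a dataset this large it leaves plenty of observations, but imputing values or dropping per-analysis would be reasonable alternatives.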

 

Exploratory Data Analysis

Data presented in the app includes, but is not limited to, the following:

  • One graph shows infant deaths in relation to the mother's age group; mothers aged 15-19 and 45-49 have the highest infant death rates.


  • Another graph plots the mother's average BMI against the infant's gestation period in weeks. It shows that mothers with lower BMI tend to have longer gestation periods than mothers with higher BMI.
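A comparison of death rates across maternal age groups, like the one described above, could be computed along these lines. This is a minimal sketch: it assumes each record is reduced to a (mother's age, survived) pair, and the five-year bins starting at age 15 are chosen only to match the 15-19 and 45-49 groups mentioned in the text.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple


def age_group(age: int, width: int = 5, lowest: int = 15) -> str:
    """Map an age to a label such as '15-19' using fixed-width bins."""
    start = lowest + ((age - lowest) // width) * width
    return f"{start}-{start + width - 1}"


def mortality_per_1000(records: Iterable[Tuple[int, bool]]) -> Dict[str, float]:
    """Return infant deaths per 1,000 births for each maternal age group.

    Each record is (mother_age, survived), where survived is False
    if the infant died.
    """
    births: Dict[str, int] = defaultdict(int)
    deaths: Dict[str, int] = defaultdict(int)
    for age, survived in records:
        group = age_group(age)
        births[group] += 1
        if not survived:
            deaths[group] += 1
    return {group: 1000 * deaths[group] / births[group] for group in births}
```

Expressing the result per 1,000 births rather than as a raw death count keeps age groups with very different numbers of births comparable.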

1st Trimester Data


  • This plot shows the daily rate of cigarettes mothers smoked during the 1st trimester of pregnancy, by the mother's age.
  • The first two trimesters show a similar trend in mothers' smoking levels. The rate does not exceed 1.5 in either trimester; however, there appears to be considerable variance in smoking trends, especially for mothers in their 40s.

2nd Trimester Data


  • This plot shows the daily rate of cigarettes mothers smoked during the 2nd trimester of pregnancy, by the mother's age.

3rd Trimester Data


  • This plot shows the daily rate of cigarettes mothers smoked during the 3rd and final trimester of pregnancy, by the mother's age.
  • The third trimester is notable: the smoking rate starts high at the youngest ages, drops sharply to about 1, and then stays relatively flat until mothers reach their 40s.
  • One consistent pattern is that mothers in their 40s tend to smoke the most during pregnancy.
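To check a pattern like the one noted for mothers in their 40s, the smoking figures for a single trimester can be summarized by age decade. The sketch below is only an illustration: it assumes each record is a (mother's age, cigarettes per day) pair for one trimester, and grouping by decade is a simplification chosen here, not something taken from the app.

```python
from collections import defaultdict
from statistics import mean, pvariance
from typing import Dict, Iterable, List, Tuple


def smoking_summary(
    records: Iterable[Tuple[int, float]],
) -> Dict[int, Dict[str, float]]:
    """Summarize cigarettes per day by the mother's age decade.

    Each record is (mother_age, cigarettes_per_day) for one trimester.
    Returns {decade: {"mean": ..., "variance": ..., "n": ...}}, where
    variance is the population variance within that decade.
    """
    by_decade: Dict[int, List[float]] = defaultdict(list)
    for age, cigarettes in records:
        by_decade[(age // 10) * 10].append(cigarettes)
    return {
        decade: {
            "mean": mean(values),
            "variance": pvariance(values),
            "n": len(values),
        }
        for decade, values in sorted(by_decade.items())
    }
```

Reporting the group size alongside the mean and variance matters here, since far fewer births occur to mothers in their 40s and a small group can produce a noisy average.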

 

  • Above is an example of what the app shows a mother, based on the CDC data we gathered.
  • In this example, a Caucasian mother aged 25-29 with a bachelor's degree has an infant mortality rate of 3 deaths per 1,000 births.

 

Further Development

So far, the app brings this data together to give mothers the information they need about their child's health. Further development would include adding information that hospitals and doctors could use to understand what the infant needs and what health recommendations to give to the mother, the father, and others.

Given more time and resources, our team's next steps would be to explore other maternal factors beyond maternal weight and tobacco use, and possibly to make predictions using a machine learning model. These tools could be used to find further insights such as:

    • The variables that are the strongest predictors of an infant's overall survival rate
    • The effect (if any) that geographic region (mother's state/county of residence) has on infant natality and mortality
    • Differences in an infant's health conditions based on the mother's and father's demographic background

Thank you for taking the time to read our blog post! Please don't hesitate to reach out to us with any questions, comments, or concerns regarding this project.

About Authors

Jason Hoffmeier

Jason Hoffmeier is an NYC Data Science fellow who currently resides in New York City. He has a master's degree in Systems Engineering from SUNY Binghamton, and has recently earned his Lean Six Sigma Black Belt for quality...

Baptiste Mokas

Hello! I am a student-researcher in cognitive and mathematical bioscience dedicated to the modeling of the integration of information in complex adaptive and multiobjective systems. I use a lot of data science tools for my work. Always ready...

Edwin Back

Graduated from the University of Michigan in 2015 with a BSE in Environmental Engineering. Data Analyst with a robust understanding of probability and statistics backed by 2 years of professional work involving business data analytics (sales/marketing/real estate), environmental...

Mike Lim

Michael Lim is a Data Analyst with a strong quantitative foundation and several years of industrial engineering. His collection of technical skills includes Python, R, and SQL with an emphasis on data analysis and visualization techniques and found...
