Data Analysis of New York City Leading Causes of Death

Posted on Jan 31, 2017


The leading causes of death affect New York City's racial and ethnic groups differently. One group may have more deaths than another from a given leading cause, and there may be correlations among leading causes, deaths, race/ethnicity, sex, and health indicators. Such an analysis could be useful to the Department of Health for disease control and medical research. This post explores the following questions using the NYC Leading Causes of Death data (2007-2014) and the NYC Health Indicators data (2012-2014).

  • How are deaths changing over time? Are there more deaths for males than females? What is the trend for individual race?
  • Which leading cause has the main impact on the change of the trend?
  • Which race has a higher death number? Which leading cause contributes to it?
  • How do the health indicators show the related death causes?

Data Sources

NYC Leading Causes of Death data (2007-2014):

NYC Health Indicators (2012-2014):

Data Cleaning

Merged the NYC Health Indicators table into the NYC Leading Causes of Death table.

Created new columns from NYC Health Indicators and updated the column values manually in R.
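The merge step can be sketched as a left join. The original work was done in R; the Python below is only an illustration, and the join key (`race`), the column names, and every row value are hypothetical placeholders rather than figures from the NYC datasets.

```python
# Placeholder rows only: names and numbers are invented for illustration.
deaths = [
    {"year": 2012, "race": "Group A", "cause": "Cause X", "deaths": 10},
    {"year": 2012, "race": "Group B", "cause": "Cause X", "deaths": 20},
    {"year": 2013, "race": "Group A", "cause": "Cause Y", "deaths": 5},
    {"year": 2013, "race": "Group C", "cause": "Cause Y", "deaths": 7},
]
indicators = [
    {"race": "Group A", "indicator": "Indicator 1", "rate": 1.5},
    {"race": "Group B", "indicator": "Indicator 1", "rate": 2.5},
]


def left_join(left, right, key):
    """Attach matching right-hand fields to each left row; keep unmatched left rows."""
    index = {}
    for row in right:
        index.setdefault(row[key], []).append(row)

    merged = []
    for row in left:
        # A left row with no match is kept once, without the extra columns.
        for match in index.get(row[key], [None]):
            combined = dict(row)
            if match is not None:
                combined.update({k: v for k, v in match.items() if k != key})
            merged.append(combined)
    return merged


merged = left_join(deaths, indicators, "race")
for row in merged:
    print(row)
```

A left join keeps every death record even when no health indicator exists for that group, which is why the "Group C" row survives without a `rate` column.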

A glance at the Shiny App 


The Shiny app lets users compare deaths by leading cause, year, sex, and race/ethnicity.


The 2007-2014 death trend shows a dramatic drop from 2008 to 2009, and 2012 has the fewest deaths. These movements trace back to changes in deaths from several major leading causes. From 2008 to 2009: Chronic Liver Disease and Cirrhosis; Nephritis, Nephrotic Syndrome and Nephrosis. From 2011 to 2012: Mental and Behavioral Disorders due to Accidental Poisoning and Other Psychoactive Substance Use; Human Immunodeficiency Virus Disease.
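One way to find which causes drive a year-over-year change is to total deaths per year and then rank causes by their change between two years. The sketch below is illustrative Python, not the author's R code; the records are invented placeholders and the function names are hypothetical.

```python
from collections import defaultdict

# (year, cause, deaths) placeholders, not real NYC figures.
records = [
    (2008, "Cause X", 100),
    (2008, "Cause Y", 50),
    (2009, "Cause X", 70),
    (2009, "Cause Y", 55),
]


def totals_by_year(rows):
    """Sum deaths across all causes for each year."""
    totals = defaultdict(int)
    for year, _cause, deaths in rows:
        totals[year] += deaths
    return dict(totals)


def change_by_cause(rows, start, end):
    """Return (cause, change) pairs from start to end, largest decrease first."""
    delta = defaultdict(int)
    for year, cause, deaths in rows:
        if year == end:
            delta[cause] += deaths
        elif year == start:
            delta[cause] -= deaths
    return sorted(delta.items(), key=lambda item: item[1])


yearly = totals_by_year(records)
changes = change_by_cause(records, 2008, 2009)
print(yearly)
print(changes)
```

Sorting by the signed change puts the causes with the largest decreases first and the largest increases last, so both ends of the list are worth reading.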


Data Analysis

We have seen the death trend for males and females combined. Now we look at the death trends for males and females separately.


Females have more deaths than males. The trend for females is very similar to the trend for males and females combined, which tells us that female deaths have a significant impact on the overall death trend.

Race Trend

The death trends by race/ethnicity show that White Non-Hispanic has the most deaths, with a downward-sloping trend. This suggests that the health of White Non-Hispanic residents has improved over time. Asian and Pacific Islander has the fewest deaths, with an upward-sloping trend, which suggests their health is gradually getting worse. The Shiny app lets users look at the trend for an individual race and compare trends across race/ethnicity groups.
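Whether a trend slopes upward or downward can be checked with a least-squares slope of yearly deaths against year. This is a minimal Python sketch under stated assumptions: the helper name `trend_slope` is hypothetical, the counts are invented placeholders, and the original analysis read the trends from plots in R rather than from this calculation.

```python
def trend_slope(years, counts):
    """Ordinary least-squares slope of counts regressed on years."""
    n = len(years)
    mean_x = sum(years) / n
    mean_y = sum(counts) / n
    sxx = sum((x - mean_x) ** 2 for x in years)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(years, counts))
    return sxy / sxx


# Placeholder series, not real NYC figures.
years = [2007, 2008, 2009]
falling = trend_slope(years, [30, 20, 10])  # negative slope: downward trend
rising = trend_slope(years, [1, 2, 3])      # positive slope: upward trend
print(falling, rising)
```

A slope computed on raw death counts does not adjust for changes in population size, so it describes the direction of the counts rather than a change in mortality rate.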


Heart Disease

Heart disease is the number one leading cause of death, but the category may cover several types of heart disease. "All Other Causes" ranks third, which raises the question of what it contains. The health indicators may tell the story behind these rankings. Coronary heart disease mortality is high for both Black Non-Hispanic and White Non-Hispanic residents, and Black mortality is slightly higher than White mortality for heart disease and stroke. Whites have a high risk of unintentional injury, and the elderly have a higher risk of falls. Blacks have a high risk of asthma and chronic lower respiratory hospitalizations. Whites have high suicide mortality, while Blacks have high drug-related hospitalizations. Blacks have a higher risk of diabetes. Both Blacks and Whites are at high risk of cancer. Blacks have higher birth-related mortality.


Heart disease mortality is higher for Blacks and Whites than for other races. Together, these two groups account for most heart disease deaths.



Whites have the highest suicide mortality. Surprisingly, Asian and Pacific Islander ranks second.



Blacks have higher diabetes mortality than other races.


Blacks have higher cancer mortality than other races. In particular, female breast cancer mortality is higher for both Blacks and Whites.

For more details on the code, please see the R Shiny source code.


Through this exploratory data analysis, we conclude that the death trend for females has a significant impact on the overall trend. Female health improved significantly in 2009, as reflected in the drop in deaths. Females have more deaths than males, and breast cancer is one of the major leading causes of death for females. Black and White residents have more health issues and hospitalizations than other races.


About Author

Yaxiong Huang

Tommy Huang received his Master of Arts in Statistics at Hunter College and Bachelor of Science in Mathematics and Economics at College of Staten Island. He has 7 years of experience in catastrophic modeling research for the insurance...
