Data Analysis of New York City Leading Causes of Death
Introduction
The leading causes of death affect racial and ethnic groups in New York City differently. One group may have more deaths than another for a given leading cause, and the pattern can change from cause to cause. There may also be correlations among leading causes, deaths, race/ethnicity, sex, and health indicators. Such an analysis could be useful to a department of health for disease control and medical research. This post explores the following questions using the NYC Leading Causes of Death data (2007-2014) and the NYC Health Indicators (2012-2014).
- How are deaths changing over time? Are there more deaths for males than for females? What is the trend for each race?
- Which leading causes drive the changes in the trend?
- Which race has the highest number of deaths? Which leading causes contribute to it?
- How do the health indicators relate to the causes of death?
Data Sources
NYC Leading Causes of Death data (2007-2014): https://catalog.data.gov/dataset/new-york-city-leading-causes-of-death-ce97f
NYC Health Indicators (2012-2014): https://www.health.ny.gov/statistics/community/minority/county/newyorkcity.htm
Data Cleaning
Merged the NYC Health Indicators table into the NYC Leading Causes of Death table.
Created new columns from the NYC Health Indicators and used R to update the column values manually.
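The merge step can be sketched roughly as follows. The original cleaning was done in R; this is an illustrative Python version using only the standard library. The column names, the join key (race), and all values are assumptions made up for the example, not taken from the actual data sets.

```python
# Toy rows standing in for the two tables. Column names and values are
# hypothetical; the real data sets may be laid out differently.
deaths = [
    {"year": 2012, "race": "Group A", "cause": "Heart Disease", "deaths": 100},
    {"year": 2012, "race": "Group B", "cause": "Heart Disease", "deaths": 80},
    {"year": 2012, "race": "Group C", "cause": "Heart Disease", "deaths": 20},
]
indicators = [
    {"race": "Group A", "indicator": "diabetes_mortality", "value": 1.5},
    {"race": "Group B", "indicator": "diabetes_mortality", "value": 2.5},
]


def merge_indicators(death_rows, indicator_rows):
    """Left-join indicator values onto death rows, one new column per indicator."""
    # Index the indicator values by race: {race: {indicator_name: value}}
    by_race = {}
    for row in indicator_rows:
        by_race.setdefault(row["race"], {})[row["indicator"]] = row["value"]

    # Every indicator becomes a column, even for races that lack a value.
    names = sorted({row["indicator"] for row in indicator_rows})

    merged = []
    for row in death_rows:
        extra = by_race.get(row["race"], {})
        # Missing indicator values are filled with None.
        merged.append({**row, **{name: extra.get(name) for name in names}})
    return merged


merged = merge_indicators(deaths, indicators)
for row in merged:
    print(row)
```

A left join keeps every death record even when a race has no matching indicator, which makes the gaps visible before any manual updates to the column values.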
A glance at the Shiny App
The Shiny app lets users compare deaths by leading cause, year, sex, and race/ethnicity.
The 2007-2014 death trend shows a dramatic drop from 2008 to 2009, and 2012 has the fewest deaths. These movements are driven by changes in deaths from a few major leading causes. From 2008 to 2009: Chronic Liver Disease and Cirrhosis; Nephritis, Nephrotic Syndrome and Nephrosis. From 2011 to 2012: Mental and Behavioral Disorders due to Accidental Poisoning and Other Psychoactive Substance Use; Human Immunodeficiency Virus Disease.
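One way to find which causes drive a year-to-year change is to total deaths per year and then rank the causes by how much their counts moved between two years. The sketch below illustrates the idea in Python; the records, cause names, and counts are invented for the example and are not figures from the data set.

```python
from collections import defaultdict

# (year, cause, deaths) tuples with made-up counts, for illustration only.
records = [
    (2008, "Cause A", 120), (2009, "Cause A", 90),
    (2008, "Cause B", 50), (2009, "Cause B", 55),
]


def totals_by_year(rows):
    """Sum deaths across all causes for each year."""
    totals = defaultdict(int)
    for year, _cause, count in rows:
        totals[year] += count
    return dict(totals)


def change_by_cause(rows, start, end):
    """Return (cause, change) pairs from start to end, largest decrease first."""
    counts = defaultdict(lambda: defaultdict(int))
    for year, cause, count in rows:
        counts[cause][year] += count
    # A cause missing in one of the two years counts as zero for that year.
    changes = {cause: by_year[end] - by_year[start] for cause, by_year in counts.items()}
    return sorted(changes.items(), key=lambda item: item[1])


print(totals_by_year(records))
print(change_by_cause(records, 2008, 2009))
```

Sorting by the signed change puts the causes with the largest decreases first, which is the useful order when explaining a drop in the overall total.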
Data Analysis
We have seen the death trend for males and females combined. Now we look at the death trends for males and females separately.
Females have more deaths than males. The trend for females is very similar to the combined trend for males and females. This tells us that female deaths have a significant impact on the overall death trend.
The death trends by race/ethnicity show that White Non-Hispanic has the most deaths, with a downward-sloping trend. This suggests that the health of the White Non-Hispanic population has improved over time. Asian and Pacific Islander has the fewest deaths, with an upward-sloping trend, which suggests that this group's health is gradually getting worse. The Shiny app lets users look at the trend for an individual race and compare trends across races/ethnicities.
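Whether a trend slopes upward or downward can be checked numerically by fitting a least-squares line to the yearly counts and reading the sign of its slope. The following is a minimal Python sketch of that calculation; the group names and counts are hypothetical, and the original app may simply have plotted the series without fitting a line.

```python
def trend_slope(years, counts):
    """Ordinary least-squares slope of counts against years (change per year)."""
    n = len(years)
    mean_x = sum(years) / n
    mean_y = sum(counts) / n
    sxx = sum((x - mean_x) ** 2 for x in years)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(years, counts))
    return sxy / sxx


# Made-up yearly death counts for two hypothetical groups.
years = [2007, 2008, 2009, 2010]
series = {
    "Group A": [40, 38, 36, 34],  # falling counts
    "Group B": [10, 11, 12, 13],  # rising counts
}

for group, counts in series.items():
    slope = trend_slope(years, counts)
    direction = "downward" if slope < 0 else "upward"
    print(f"{group}: slope {slope:+.2f} deaths per year ({direction})")
```

Note that a slope on raw counts does not account for changes in population size, so it describes the number of deaths rather than the death rate.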
Heart Disease
Heart disease is the number one leading cause of death, but it may cover several different types of heart disease. In addition, "all other causes" ranks third. What are all those other causes? The health indicators may help tell the story behind them. Coronary heart disease mortality is high for both Black Non-Hispanic and White Non-Hispanic residents, and Black mortality is slightly higher than White mortality for heart disease and stroke. White residents have a high risk of unintentional injury, and the elderly have a higher risk of falls. Black residents have a high risk of asthma and chronic lower respiratory hospitalizations. White residents have high suicide mortality, while Black residents have high drug-related hospitalizations and a higher risk of diabetes. Both groups are at high risk of cancer. Black residents have higher birth-related mortality.
Heart disease is more common among Black and White residents than among other races. These two groups account for most of the heart disease cases.
Suicides
White residents have the highest suicide mortality. Surprisingly, Asian and Pacific Islander residents rank second.
Black residents have higher diabetes mortality.
Black residents have higher cancer mortality. In particular, female breast cancer mortality is higher for both Black and White residents.
For more details on the code, please see the R Shiny source code.
Conclusion
Through exploratory data analysis, we conclude that the death trend for females has a significant impact on the overall trend. Female deaths fell sharply in 2009, which suggests an improvement in women's health. Females have more deaths than males, and breast cancer is one of the major leading causes of death for females. Black and White residents have more health issues and hospitalizations than other races.