Data Analysis on The Mental Health Crisis

Posted on Nov 2, 2021
The skills I demoed here can be learned by taking the Data Science with Machine Learning bootcamp at NYC Data Science Academy.

Github | Linkedin | Shiny


Mental health is an issue that has recently come into the spotlight in mainstream media here in the United States.  It is a multi-sector issue that involves healthcare professionals and all branches of the government, and extends to patients, caregivers, and family members.  It affects all facets of the workplace and everyone who is on social media.

I wanted to dive in, look at the hard numbers, and see what story the data was telling.  "Is it an issue?", "Just how bad is it?", and "Is it only impacting the United States?" were a few of the questions that came to mind when I first thought about this topic.


My goals for this analysis were twofold:

  • First, to answer the questions above with hard data to support any claims.
  • More importantly, to spread awareness of mental health by offering some statistical insight to those who may be affected by it, directly or indirectly.


The dataset used for this analysis:

  • Time Period: 1985-2016
  • Compiled by: World Bank, World Health Organization (WHO)
  • Countries: 101 countries
  • Gender: 2 gender classifications
  • Age Groups: 6 different groups
  • Yearly GDP: for each country
  • Population: broken out for each group
  • # of Suicides: broken out for each group

Data Scrubbing

Data integrity was inconsistent for the years prior to 2002 and for the final year, 2016.  For 2016, a lot of data appeared to be missing because it had yet to be compiled or collected.  For the years prior to 2002, data was missing from countries that had not yet started tracking or did not track consistently.

Therefore, I filtered the data accordingly:

  • Time period shortened to 2002-2015
  • Total of 51 countries
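
A minimal sketch of this kind of filter in Python, using only the standard library. The field names (`country`, `year`, `suicides`, `population`) and the sample values are hypothetical, and the rule for dropping countries (keep only those that report every year in the window) is an assumption about how the list was narrowed, not a record of the exact steps taken.

```python
from collections import defaultdict


def filter_rows(rows, first_year=2002, last_year=2015):
    """Keep rows inside [first_year, last_year], then drop any country
    that does not report every year in that window (assumed criterion)."""
    in_window = [r for r in rows if first_year <= r["year"] <= last_year]

    # Collect the set of years each country actually reports.
    years_seen = defaultdict(set)
    for r in in_window:
        years_seen[r["country"]].add(r["year"])

    required = set(range(first_year, last_year + 1))
    complete = {c for c, years in years_seen.items() if required <= years}
    return [r for r in in_window if r["country"] in complete]


# Made-up sample: country "A" reports every year, "B" skips 2003.
sample = [{"country": "A", "year": y, "suicides": 10, "population": 1000}
          for y in range(2000, 2006)]
sample += [{"country": "B", "year": y, "suicides": 5, "population": 800}
           for y in (2002, 2004)]

kept = filter_rows(sample, first_year=2002, last_year=2004)
print(sorted({r["country"] for r in kept}), len(kept))
```

Requiring full coverage is the strictest option; a looser rule (for example, allowing one missing year) would keep more countries at the cost of gaps in the trend lines.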

Data Analysis

Initially, I wanted to take a wide-angle lens approach and see everything on a macro level:

The global trend is downward, which is what we hoped for, but it looks too good to be true.
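
For a macro view like this, one common way to compare years is a rate per 100,000 people, pooling every country and group within a year. The sketch below assumes that definition and uses hypothetical field names and made-up numbers; the chart itself may have been built differently.

```python
from collections import defaultdict


def yearly_rate_per_100k(rows):
    """Return {year: suicides per 100,000 people}, pooling all rows in a year."""
    suicides = defaultdict(int)
    population = defaultdict(int)
    for r in rows:
        suicides[r["year"]] += r["suicides"]
        population[r["year"]] += r["population"]
    # Multiply before dividing so small counts do not lose precision.
    return {y: suicides[y] * 100_000 / population[y] for y in sorted(suicides)}


# Made-up rows for illustration only.
sample_rows = [
    {"year": 2002, "suicides": 30, "population": 200_000},
    {"year": 2002, "suicides": 20, "population": 300_000},
    {"year": 2003, "suicides": 45, "population": 500_000},
]

rates = yearly_rate_per_100k(sample_rows)
for year, rate in rates.items():
    print(year, round(rate, 2))
```

Pooling counts before dividing weights large countries more heavily than averaging each country's rate would, which is one reason a global line can hide countries moving the other way.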

How is the spread of the data for each country?

The spread is fairly wide for a good number of the countries.

Data on each country from the 2002-2015 period.

This chart doesn't say much on its own, but it does highlight that Europe & Central Asia accounts for the bulk of the highest suicide rates for this period.

Gender Data

There appears to be a trend of males being more prone to suicide than females.

Is there a better way to visualize this?

Yes, this chart beautifully illustrates the ratio split by gender.  Note the two lines representing the average for females on the left and males on the right.
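
A rough sketch of how a gender split like this could be computed: a rate per 100,000 for each sex within each country, plus the male-to-female ratio. The `sex` field name, the pooling over all years, and the sample numbers are all assumptions for illustration.

```python
from collections import defaultdict


def rates_by_sex(rows):
    """Return {country: {sex: suicides per 100,000}}, pooled over all rows."""
    suicides = defaultdict(lambda: defaultdict(int))
    population = defaultdict(lambda: defaultdict(int))
    for r in rows:
        suicides[r["country"]][r["sex"]] += r["suicides"]
        population[r["country"]][r["sex"]] += r["population"]
    return {
        country: {
            sex: suicides[country][sex] * 100_000 / population[country][sex]
            for sex in suicides[country]
        }
        for country in suicides
    }


# Made-up rows for illustration only.
gender_rows = [
    {"country": "A", "sex": "male", "suicides": 30, "population": 100_000},
    {"country": "A", "sex": "female", "suicides": 10, "population": 100_000},
]

by_sex = rates_by_sex(gender_rows)
male_to_female = by_sex["A"]["male"] / by_sex["A"]["female"]
print(by_sex["A"], male_to_female)
```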

Age Group Data

This is a disturbing chart: it shows that elderly age brackets are at higher risk of suicide than younger age groups.

This made me think: what if we evaluate generations instead of just age groups?

Nothing eye-opening here.  This shows what I was expecting: the oldest generations have the highest rates and the youngest have the lowest.  I don't really see rates increasing or decreasing by a large amount.  (Note: Silent is the generation prior to the Boomers.)

Adding GDP into the equation

I expected some correlation here, but there doesn't appear to be much.  My read is that a country's wealth doesn't have a great impact on its suicide rate.

Finding a country with Increasing Suicide Rates


I thought correlation would be the best way to find the outliers: countries showing increasing suicide rates, in contrast to the first chart, where we saw a decreasing trend globally.  This chart gives a clear visual representation of that.
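
One way to read "correlation" here is the Pearson correlation between the year and a country's suicide rate: values near +1 mean the rate rose steadily, values near -1 mean it fell steadily. The sketch below assumes that reading. The country names and rates are made up, and `rank_by_trend` is a hypothetical helper, not a function from the original project.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / sqrt(var_x * var_y)


def rank_by_trend(series):
    """series: {country: [(year, rate), ...]}.
    Return [(country, r)] sorted from most increasing to most decreasing."""
    scored = []
    for country, points in series.items():
        years = [year for year, _ in points]
        rates = [rate for _, rate in points]
        scored.append((country, pearson(years, rates)))
    return sorted(scored, key=lambda item: item[1], reverse=True)


# Made-up yearly rates per 100,000 for three hypothetical countries.
trend_sample = {
    "Rising": [(2002, 10.0), (2003, 11.0), (2004, 12.5)],
    "Falling": [(2002, 9.0), (2003, 8.0), (2004, 7.5)],
    "Noisy": [(2002, 5.0), (2003, 6.0), (2004, 5.0)],
}

ranked = rank_by_trend(trend_sample)
for country, r in ranked:
    print(country, round(r, 3))
```

Note that this score measures how consistent a trend is, not how large it is. A country whose rate creeps up by a tiny amount every year can rank above one with a bigger but bumpier increase, so a fitted slope is a useful companion number.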

Evaluating the U.S. (2nd on the list)

It definitely looks like the U.S. has an increasing trend in suicides throughout this period.

As the country got richer and had more resources, the numbers still increased.

There is a slight rise overall for both males and females, but the ratio seems fairly consistent.

The most disturbing chart for me is this one.  Remember we saw that the older age groups were at higher risk of suicide? According to this chart, in the U.S. it's actually the 35-54 age bracket.

My Conclusion

From the analysis shown, we can infer the following:

  • Global suicide rates decreased from 2002 to 2015
  • This doesn't paint the whole picture
  • Countries such as the United States are outliers with increasing suicide rates
  • The United States is also an outlier in that the 35-54 age bracket is at the highest risk!
  • The elderly population is at higher risk overall than the younger population
  • Males have a much higher risk of suicide than females

Final Thoughts

I feel that this is a first step in the right direction toward better understanding suicide rates and the mental health crisis.  I just wish more years of data were available from all countries for a more conclusive analysis.

The issues I encountered point to a few things that would help:

  • Helping smaller countries and countries with smaller GDP better track their data
  • Seeing if any missing historical data is available and providing it to the World Health Organization
  • Fact-checking the data to ensure the integrity of the numbers

Next Steps

I would love to eventually revisit, rethink, and add to this project.  Here are some ideas worth considering:

  • Bringing in more individual characteristics as factors in this analysis (ethnicity and education, for example)
  • Bringing in more tracked key indicators to add layers to this analysis (e.g., happiness index, unemployment rate)
  • Incorporating the years of the COVID-19 pandemic

A brighter future

This analysis could be used to help build a predictive model of future trends in suicide.  The model's factor weights could then be used to better understand which factors impact mental health the most.  Eventually, this could become an effective tool to help combat this issue.

If you or anyone you know needs help:

National Institute of Mental Health:
National Suicide Prevention Lifeline: 1-800-273-TALK (8255)


World Health Organization (WHO):
Kaggle Dataset: Dataset link
National Institute of Mental Health:

About Author

David Jhang

David has 10+ years of experience in the financial investment industry in NYC. He currently works at a Long/Short Equity Hedge Fund that focuses on TMT. He is also an aspiring Data Scientist at NYC Data Science Academy.