Data Analysis on Mental Health During COVID

Posted on Apr 4, 2021
The skills I demoed here can be learned through taking Data Science with Machine Learning bootcamp with NYC Data Science Academy.


Background: Covid-19 and Mental Health

For many people in the United States and worldwide, the Covid-19 pandemic has had devastating impacts on health, mental health, and financial security. As the virus moved into position as the third leading cause of death in 2020, data show that countries around the world experienced multiple periods of lockdown and social isolation that created distance among loved ones, pushed many stores and restaurants out of business, and caused dramatic increases in unemployment.

The crisis triggered significant increases in mental health problems, with over 40 states reporting an increase in opioid-related mortality in 2020. As Covid-19 infections rose, hospitals also saw record-breaking declines in patient volume and revenue, in part due to Covid-related restrictions and in part due to patient anxiety about visiting medical facilities during the pandemic.

One important question is: What can medical facilities do to ease patient anxieties associated with visiting medical facilities in the wake of the Covid-19 pandemic?

Even before the pandemic, reports showed that 84% of health care leaders regarded improvements in patient experience as a top priority and that these improvements can be as simple as taking time to talk to patients and showing compassion. These improvements are particularly important as health care providers continue to struggle with Covid-related stressors that impact mental health and well-being.

Present Data Analysis

The purpose of the present analysis is to analyze frequencies of posts in Patient.Info forums about mental health during the rise of the Covid-19 pandemic, to determine whether patients are relating anxiety and depression to Covid-19, and to identify the extent to which other themes, such as concerns about the economy, might be associated with mental health.

The results of these analyses can be used by doctors and hospitals to enhance communication with patients who may be experiencing anxiety or depression related to the Covid-19 pandemic. Effective communication with patients can improve patient experience and satisfaction, which in turn can increase patient visits and facilitate patient care.

The Data

To conduct this analysis, data were collected from Patient.Info via web scraping using Python’s open-source Scrapy framework. Patient.Info is an online health information directory that hosts multiple community forums. The data consisted of 4,902 user posts in three of Patient.Info’s community forums: the Depression forum, the Anxiety Disorders forum, and the Coronavirus (Covid-19) forum. Posts were collected from December 2018 to September 2020, which allowed for a comparison of anxiety- and depression-related posts before and during the rise of the Covid-19 pandemic.

Data Analyses

An overall analysis of post frequencies in the three forums showed that there were multiple peaks in the number of posts in the Anxiety forum, including one substantial rise in posts from February to April of 2020. That rise aligns with the first wave of Covid-19 cases and with the first posts in the Covid-19 forum. In contrast, the overall number of posts in the Depression forum did not vary substantially over time.
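A frequency analysis like this one can be sketched in a few lines of Python. The example below is only an illustration: the records, forum labels, and the `monthly_counts` helper are hypothetical stand-ins for the scraped posts, not the code used in the original analysis.

```python
from collections import Counter
from datetime import date

# Hypothetical records of (forum, post date); the real data came from the scraped posts.
posts = [
    ("anxiety", date(2020, 2, 14)),
    ("anxiety", date(2020, 3, 2)),
    ("anxiety", date(2020, 3, 20)),
    ("depression", date(2020, 3, 5)),
    ("covid", date(2020, 3, 28)),
]


def monthly_counts(records):
    """Count posts in each (forum, 'YYYY-MM') bucket."""
    counts = Counter()
    for forum, posted in records:
        counts[(forum, posted.strftime("%Y-%m"))] += 1
    return counts


counts = monthly_counts(posts)
for (forum, month), n in sorted(counts.items()):
    print(forum, month, n)
```

Plotting these monthly counts as one line per forum is what makes peaks like the February to April rise visible.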

To examine the extent to which Patient.Info users might be connecting depression and anxiety to the Covid-19 pandemic, natural language processing was employed to analyze the presence of Covid-related discourse in the Depression and Anxiety forums. Covid-related discourse was identified through the use of key Covid-related words that were used to reference the Covid-19 pandemic in the Covid-19 forum. These Covid-related key words included the following:


An analysis of trends over time shows that the proportion of posts that included references to Covid-19 increased for posts in both the Anxiety and Depression forums, with the increase being greater for Anxiety posts earlier in the pandemic (i.e., Spring 2020) and greater for Depression posts later in the pandemic (i.e., late summer, 2020).
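As a rough sketch of how the proportion of Covid-related posts per month could be computed, consider the example below. The keyword list is a placeholder assumption (the article's actual keyword list is not reproduced here), and the function names and sample posts are hypothetical.

```python
import re
from collections import defaultdict

# Placeholder keywords, assumed for illustration only.
COVID_KEYWORDS = ["covid", "coronavirus", "pandemic"]

# No trailing word boundary, so "Covid-19" and "covid19" both match.
COVID_PATTERN = re.compile(
    r"\b(?:" + "|".join(re.escape(k) for k in COVID_KEYWORDS) + r")",
    re.IGNORECASE,
)


def mentions_covid(text):
    """Return True if the text contains any Covid-related keyword."""
    return COVID_PATTERN.search(text) is not None


def covid_share_by_month(posts):
    """posts: iterable of ('YYYY-MM', text). Returns {month: share of posts mentioning Covid}."""
    totals = defaultdict(int)
    hits = defaultdict(int)
    for month, text in posts:
        totals[month] += 1
        if mentions_covid(text):
            hits[month] += 1
    return {month: hits[month] / totals[month] for month in totals}


sample = [
    ("2020-01", "I can't sleep and my heart races."),
    ("2020-04", "The coronavirus news makes my anxiety worse."),
    ("2020-04", "Feeling low again this week."),
]
shares = covid_share_by_month(sample)
print(shares)
```

Using a proportion rather than a raw count keeps the trend comparable across forums with very different posting volumes.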

Depression & Anxiety

To examine the presence of depression-related themes and anxiety-related themes across both the Depression and Anxiety forums, codebooks were created using key words derived from mental health inventories that are typically used to diagnose depression and anxiety disorders. The anxiety codebook was created using keywords from the Beck Anxiety Inventory (BAI), Hamilton Anxiety Rating Scale (HAM-A), and the Zung Self-Rating Anxiety Scale. The depression codebook was created using keywords from the Beck Depression Inventory and the Center for Epidemiologic Studies Depression Scale (CES-D).

The Depression Codebook included the following key words:

The Anxiety Codebook included the following key words:

Discourse about stressors related to economic decline and social distancing was also examined. Keywords used to examine discourse about economic concerns included job, economy, money, finance, homeless, bills, rent, and stimulus. Keywords used to examine discourse about social distancing included isolation, distance, trapped, and fatigue.
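A codebook lookup of this kind might look like the sketch below. The two keyword lists are the ones given above; the whole-word matching rule, the `theme_counts` helper, and the example post are assumptions made for illustration and may differ from the original implementation.

```python
import re

# Keyword lists taken from the article; the dictionary layout is an assumption.
CODEBOOKS = {
    "economy": ["job", "economy", "money", "finance", "homeless", "bills", "rent", "stimulus"],
    "social_distancing": ["isolation", "distance", "trapped", "fatigue"],
}


def compile_codebook(keywords):
    """Build a case-insensitive, whole-word pattern so 'rent' does not match 'parents'."""
    alternatives = "|".join(re.escape(k) for k in keywords)
    return re.compile(r"\b(?:" + alternatives + r")\b", re.IGNORECASE)


PATTERNS = {theme: compile_codebook(words) for theme, words in CODEBOOKS.items()}


def theme_counts(text):
    """Return the number of codebook keyword occurrences per theme for one post."""
    return {theme: len(pattern.findall(text)) for theme, pattern in PATTERNS.items()}


example = "I lost my job and can't pay rent. My parents say the isolation is temporary."
result = theme_counts(example)
print(result)
```

Whole-word matching avoids false positives but also misses inflected forms such as "jobs" or "finances". Stemming the text first, or dropping the trailing boundary, would trade precision for recall.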

Data on Usage of Words Before and During COVID

The text of posts in all three forums was analyzed for frequencies of the key words before and during the Covid-19 pandemic. The results showed that the number of depression keywords in depression posts and the number of anxiety keywords in anxiety posts significantly increased as Covid-19 cases increased. This finding indicates that use of the depression and anxiety inventories as sources for developing codebooks was valid.

Recall that the frequency of posts in the Anxiety forum increased over the course of the Covid-19 pandemic, but the frequency of posts in the Depression forum did not increase. However, as shown in the bar graphs above, an examination of depression-related discourse reveals an increase in depression discourse within the text of the Depression forum posts. This finding suggests that the overall frequency of depression posts may not have increased over the course of the pandemic, but that the intensity or severity of depression-related symptoms may have increased for individuals who were posting in those forums.


Interestingly, the results also showed that anxiety discourse was more frequent than depression discourse in the Covid-19 forum, suggesting that Patient.Info users were more likely to associate the Covid-19 pandemic with feelings of anxiety.

In addition, concerns about health were significantly associated with COVID-related discourse in the anxiety forum but not in the depression forum.


Here is an example of a post in the Anxiety forum that emphasized health concerns related to Covid-19:

Analyses of relationships between mental health discourse and other stressors showed that concerns about the economy were more strongly correlated with discourse about depression than with discourse about anxiety. It is important to note that there were substantially more posts about the economy (N = 206) than about social distancing (N = 33) in these forums, making it difficult to conduct correlation analyses for concerns related to social distancing. The difference in the number of posts related to concerns about the economy vs. social distancing may be due to the difference in the number of keywords used to identify these posts.

In the heatmap figure below, lighter green represents stronger positive correlations. As discourse about economic concerns increased, discourse about depression also increased.
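The article does not say which correlation coefficient was used. As one plausible reading, the sketch below computes a Pearson correlation between keyword counts. The `pearson` helper and the count vectors are made-up illustrations, not results from the analysis.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length numeric sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length with at least 2 values")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / sqrt(var_x * var_y)


# Made-up keyword counts for five time periods (illustration only).
economy = [0, 1, 2, 3, 4]
depression = [1, 3, 5, 7, 9]
anxiety = [2, 1, 2, 1, 2]

r_depression = pearson(economy, depression)
r_anxiety = pearson(economy, anxiety)
print(round(r_depression, 3), round(r_anxiety, 3))
```

A heatmap like the one described is simply a grid of such coefficients, one for each pair of themes.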


Together, these findings suggest that anxiety and depression both increased over the course of the Covid-19 pandemic, but that the two were expressed differently. While anxiety posts increased in frequency, depression posts did not. However, an analysis of the discourse within the posts using key words derived from mental health inventories showed an increase in both anxiety-related and depression-related discourse within the post text, suggesting that there was an increase in the number of symptoms or in the severity of these mental health concerns for both anxiety and depression.

This finding also indicates that use of inventory keywords serves as a valid method for analyzing and tracking trends in online mental health discourse. The results of additional analyses showed that Patient.Info users were more likely to associate Covid-19 with feelings of anxiety, were more likely to connect feelings of anxiety about Covid-19 to specific health concerns, and were more likely to connect feelings of depression to economic concerns.


When working with patients who express anxiety or depression, health care providers can enhance communication and compassion by applying the findings from the present analysis to their own communication and aligning their discourse to the specific concerns of patients. Patients who report symptoms of anxiety may be more likely to focus on health concerns, whereas patients who report symptoms of depression may be more concerned about Covid-related stressors such as economic and financial concerns.

In many cases, symptoms associated with depression and anxiety are comorbid (i.e., co-occur), so many patients may express concerns related to multiple stressors. Future research should build on these findings by examining the effectiveness of aligned communication among providers and patients for enhancing patient experience and satisfaction during the Covid-19 pandemic.

About Author

Sara Kien

As a social science researcher with a PhD in cognitive psychology, Sara has been applying data science skills (including hypothesis testing, statistics, machine learning, and natural language processing) to program evaluation and behavioral science in ways that enhance...