Mental Illness Classifier Platform: WeCare

Posted on Jan 15, 2018


Depression reportedly affects one in ten Americans at one point or another, and its incidence is higher in some states than in others. According to research conducted by Healthline, 44% of college students report symptoms of depression, and 75% of them do not seek help for their mental health problems. Suicide is the third leading cause of death among children and young adults aged 10 to 24. Among people over 30, about 50% of adults experiencing symptoms of depression will not talk to a doctor or seek help.

These statistics cover only depression, not other forms of mental illness. Some people with depression avoid seeking outside help because they fear social judgment. Others are more concerned about the high cost of treatment, even with insurance.

Because Ayush, Chetan, Suprith, Shi, and I share the same values and goals, we teamed up at the HackUMass event and created WeCare to address the problems stated above. WeCare is a mental illness classifier platform that gives individuals and families affected by mental illness a better understanding of their mental health condition. WeCare uses machine learning to analyze text entered by users and predict the kind of disorder that could be affecting them. There is still a lot of stigma associated with mental illness, and it prevents people from reaching out to others. For this reason, we created this website so that people can be heard and can easily find possible solutions.


Data Collection

Ayush and I chose mental health forums because the people who post there about their day-to-day lives have mostly been diagnosed by trained practitioners. That gave us a reasonable basis for treating the forum group as a label. For example, if a user posted in a PTSD group, we put the post in the PTSD category. We also scraped original posts from Reddit, which we thought would be a good fit because the posts are popular and tagged by mental disorder. In the end, we gathered 520 posts across disorders such as depression, PTSD (Post-Traumatic Stress Disorder), ADHD (Attention-Deficit/Hyperactivity Disorder), and PPD (Paranoid Personality Disorder), and labeled them accordingly.
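
The labeling rule described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the group names in `GROUP_TO_LABEL`, the `label_posts` helper, and the sample posts are all hypothetical and are not taken from our scraper.

```python
from collections import Counter

# Hypothetical mapping from a forum group or subreddit name to a label.
GROUP_TO_LABEL = {
    "depression": "depression",
    "ptsd": "PTSD",
    "adhd": "ADHD",
    "paranoid-personality-disorder": "PPD",
}


def label_posts(scraped):
    """Turn (group, text) pairs into labeled records.

    Posts from groups we do not track, and posts with empty text, are skipped.
    """
    labeled = []
    for group, text in scraped:
        label = GROUP_TO_LABEL.get(group.strip().lower())
        text = text.strip()
        if label and text:
            labeled.append({"text": text, "label": label})
    return labeled


# Made-up posts, for illustration only.
scraped = [
    ("PTSD", "I keep having nightmares about the accident."),
    ("adhd", "I cannot focus on one task for more than five minutes."),
    ("cooking", "Best pasta recipe?"),  # untracked group, dropped
    ("depression", "   "),              # empty text, dropped
]

posts = label_posts(scraped)
label_counts = Counter(post["label"] for post in posts)
```

Counting labels right after scraping makes it easy to spot categories with too few posts before any model is trained.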

The code is available on GitHub.


Machine Learning

After collecting the labeled posts, Ayush and I pre-processed the text by removing stop words, tokenizing, and stemming. We then trained a multinomial Naive Bayes classifier, using grid search with five-fold cross-validation to tune its hyperparameters. Accuracy on the test set is about 70 percent.
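
A minimal sketch of this kind of training setup is shown below. It assumes scikit-learn, which this post does not name, and it runs on a made-up toy dataset rather than our scraped posts, so its score says nothing about the real model. Stemming is left out because it would need an extra library such as NLTK. The parameter grid is also only an example.

```python
from itertools import combinations

from sklearn.feature_extraction.text import CountVectorizer
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import Pipeline

# Made-up keywords used to generate 10 toy sentences per class.
KEYWORDS = {
    "depression": ["hopeless", "empty", "tired", "worthless", "sad"],
    "PTSD": ["flashbacks", "nightmares", "trauma", "startled", "panic"],
    "ADHD": ["distracted", "restless", "forgetful", "impulsive", "unfocused"],
    "PPD": ["suspicious", "distrustful", "watched", "betrayed", "guarded"],
}

texts, labels = [], []
for label, words in KEYWORDS.items():
    for first, second in combinations(words, 2):
        texts.append(f"lately i feel {first} and {second} most of the time")
        labels.append(label)

# Hold out a stratified test set so every class appears in both splits.
X_train, X_test, y_train, y_test = train_test_split(
    texts, labels, test_size=0.2, stratify=labels, random_state=0
)

# CountVectorizer lowercases, tokenizes, and drops English stop words.
pipeline = Pipeline([
    ("vectorizer", CountVectorizer(lowercase=True, stop_words="english")),
    ("classifier", MultinomialNB()),
])

# Example grid: n-gram range and the Naive Bayes smoothing strength.
param_grid = {
    "vectorizer__ngram_range": [(1, 1), (1, 2)],
    "classifier__alpha": [0.1, 0.5, 1.0],
}

# cv=5 runs five-fold cross-validation for every grid combination.
search = GridSearchCV(pipeline, param_grid, cv=5, scoring="accuracy")
search.fit(X_train, y_train)

test_accuracy = search.score(X_test, y_test)
prediction = search.predict(["i have nightmares and flashbacks"])[0]
```

Putting the vectorizer inside the pipeline means it is refit on the training part of each fold, so the vocabulary never leaks from the validation fold into training.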

Suprith, Shi, and Chetan collaboratively built a web app that is supported by Django.


Web App

The front end of our project was built with Node.js, and the back end was built with Django. The Django project was deployed to production on a DigitalOcean droplet with minimal processing and storage resources. Gunicorn was used to spawn workers for the Django server so that it could serve user requests in real time, and Nginx served as the interface between the network and those workers. The trained classifier object was serialized and stored using pickle. For each real-time diagnosis request, the classifier object is unpickled and used to classify the text the user submitted. Based on the classification, a custom page is generated and returned with disorder-specific coping mechanisms. For example, a diagnosis of ADHD returns a list of curated resources for ADHD, and likewise for every disorder our model predicts. In the future, we hope to add information about healthcare practitioners within a 20-mile radius who specialize in the predicted disorder.
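
The pickle-and-classify flow can be sketched as follows. This is an illustration under assumptions, not our production code: it uses scikit-learn and a tiny made-up training set as a stand-in for the real classifier, and `RESOURCES`, `MODEL_PATH`, `load_model`, and `diagnose` are hypothetical names. Unlike the per-request unpickling described above, this sketch caches the loaded model so it is read from disk only once per process.

```python
import pickle
import tempfile
from functools import lru_cache
from pathlib import Path

from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

# Placeholder resource lists; the real pages held curated content.
RESOURCES = {
    "ADHD": ["ADHD coping strategies (placeholder)"],
    "PTSD": ["PTSD coping strategies (placeholder)"],
}

# Train a tiny stand-in model on made-up text and pickle it to disk.
model = make_pipeline(CountVectorizer(), MultinomialNB())
model.fit(
    [
        "cannot focus easily distracted",
        "restless and forgetful",
        "nightmares and flashbacks",
        "trauma keeps coming back",
    ],
    ["ADHD", "ADHD", "PTSD", "PTSD"],
)
MODEL_PATH = Path(tempfile.gettempdir()) / "wecare_demo_model.pkl"
MODEL_PATH.write_bytes(pickle.dumps(model))


@lru_cache(maxsize=1)
def load_model():
    """Unpickle the classifier once and reuse it for later requests."""
    # Only unpickle files you created yourself; pickle can run arbitrary code.
    return pickle.loads(MODEL_PATH.read_bytes())


def diagnose(text):
    """Classify the user's text and look up resources for the predicted label."""
    label = str(load_model().predict([text])[0])
    return {"label": label, "resources": RESOURCES.get(label, [])}


result = diagnose("I get distracted and cannot focus at work")
```

A function like `diagnose` could be called from a Django view, which would then render the returned label and resources into the custom page.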



  • Unlike other online mental health assessment platforms, the WeCare website asks open-ended questions, which let users describe their feelings in more detail.
  • A free and real-time diagnosis allows people to learn more about their mental health condition and figure out possible solutions to their problems.
  • Since the dataset is small and somewhat unbalanced, it would be better to scrape more data and collect a proportional number of posts for each disorder. We plan to work on a Convolutional Neural Network (CNN) approach to classifying mental illnesses once a larger dataset is obtained.
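
One simple way to quantify how unbalanced a labeled dataset is, sketched here with made-up label counts (this post does not report per-class counts), is to compute each class's share and an inverse-frequency weight. The `class_balance` helper is hypothetical.

```python
from collections import Counter


def class_balance(labels):
    """Return count, share, and inverse-frequency weight for each class."""
    counts = Counter(labels)
    total = sum(counts.values())
    n_classes = len(counts)
    report = {}
    for label, count in counts.items():
        report[label] = {
            "count": count,
            "share": count / total,
            # A weight above 1 means the class has fewer samples than an even split.
            "weight": total / (n_classes * count),
        }
    return report


# Toy labels, for illustration only.
toy_labels = ["depression"] * 6 + ["PTSD"] * 3 + ["ADHD"] * 2 + ["PPD"] * 1
report = class_balance(toy_labels)
```

Classes with a weight well above 1 are the ones where scraping more posts would help most.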



Ayush Sharma: CICS, UMass Amherst (M.S. in Computer Science)

Chetan Manjesh: CICS, UMass Amherst (M.S. in Computer Science)

Suprith Aireddy: Illinois State University (B.S. in Computer Science)

Shi Zhang: New York University (B.S. in Computer Science)

Yu-Han Chen: New York University (M.S. in Management and Systems)

About Author

Yu-Han Chen

Yu-Han is currently pursuing a Master’s degree in Management and Systems at New York University and works as a part-time data scientist and teaching assistant at NYC Data Science Academy. In her prior role as a market research consultant,...