Data Tells How Happy The World Is (World Happiness Report)

Posted on Mar 9, 2022

The skills I demonstrated here can be learned by taking the Data Science with Machine Learning bootcamp at NYC Data Science Academy.

Data Science Motivation

I was motivated to do this analysis by my curiosity about how happy the world is and which countries are the happiest and the least happy. I also assess whether this data can help identify potential correlations between happiness and economic or social factors.

The World Happiness Report is published every year with data on the comparative happiness of countries. Because the report has been published for many years, rich historical data is already available. This EDA compares trends over time, trends by geographical region, and the potential correlations of different variables with the Happiness Index. The data includes around 140 countries; some countries were removed where the data was sparse or inconsistent, and together these amounted to less than 5% of the world population.

The data mainly consists of the metrics below; the categorical variables are Country and Region.

  1. Happiness score: This is the primary data point of the survey, gathered from the Cantril Ladder, where respondents are asked to imagine themselves on a ladder with steps from zero to ten. Zero is the worst possible life and ten is the best possible.
  2. Log GDP per capita: GDP per capita is in terms of Purchasing Power Parity (PPP), adjusted to constant 2011 international dollars.
  3. Social support: The national average of the binary responses (0 = No, 1 = Yes) to the Gallup World Poll (GWP) question, "If you were in trouble, do you have relatives or friends you can count on to help you whenever you need them, or not?"
  4. Healthy life expectancy: Healthy life expectancy at birth, based on data from the WHO Global Health Observatory data repository.
  5. Freedom to make life choices: The national average of binary responses to the GWP question, "Are you satisfied or dissatisfied with your freedom to choose what you do with your life?"
  6. Generosity: The residual of regressing the national average of GWP responses to the question, "Have you donated money to a charity in the past month?" on GDP per capita.
  7. Perceptions of corruption: The average of binary answers to two GWP questions, "Is corruption widespread throughout the government or not?" and "Is corruption widespread within businesses or not?"

Specifically, these are the questions this article will try to answer:

1) How does the 2021 Happiness Score data compare to previous years?

2) How has happiness varied over the last 10 years across different regions?

3) How have national and international issues affected the Happiness Score? e.g. Has Covid-19 affected the average happiness score of the world?

4) Which available variables show a high correlation with the Happiness Score?

5) What inferences can be drawn from this analysis? e.g. Can you decide to relocate to a happier place based on this analysis?

6) How can this analysis be improved?

Data Analysis

Preliminary EDA: Distributions, top 10, bottom 10 and box plots by region

The distribution of the 2021 scores is similar to that of the historical data (average happiness scores over the last 10 years). However, the 2021 distribution is shifted to the right, indicating an increase in the overall average happiness score compared with the last 10 years.

Here are the top 10 and the bottom 10 countries by happiness score.
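
A ranking like this can be sketched in a few lines of plain Python. The countries and scores below are placeholders invented for illustration, not values from the report, and rank_countries is a hypothetical helper name rather than a function from the notebook.

```python
# Illustrative rows only: placeholder countries and made-up scores, not report data.
scores = {
    "Country A": 7.8,
    "Country B": 6.1,
    "Country C": 2.5,
    "Country D": 5.4,
    "Country E": 7.2,
    "Country F": 3.6,
}


def rank_countries(score_by_country, n, happiest=True):
    """Return the n happiest (or least happy) countries as (country, score) pairs."""
    ordered = sorted(
        score_by_country.items(),
        key=lambda item: item[1],
        reverse=happiest,  # highest first for the top list, lowest first otherwise
    )
    return ordered[:n]


top_2 = rank_countries(scores, 2)
bottom_2 = rank_countries(scores, 2, happiest=False)
print("Happiest:", top_2)
print("Least happy:", bottom_2)
```

With the real data, n would be 10 and the dictionary would hold one entry per country for the chosen year.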

For 2021, the Middle East and North Africa, South Asia and Western Europe regions show a wide range of Happiness Scores across their respective countries. The North America and ANZ region shows lower variability.
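
One simple way to put a number on the spread a box plot shows is the range (maximum minus minimum) of scores within each region. The sketch below assumes rows of (region, country, score); the region names, countries and scores are made up for illustration, and score_range_by_region is a hypothetical helper.

```python
from collections import defaultdict

# Made-up rows of (region, country, score), for illustration only.
rows = [
    ("Region X", "Country A", 7.0),
    ("Region X", "Country B", 7.5),
    ("Region Y", "Country C", 3.0),
    ("Region Y", "Country D", 6.5),
    ("Region Y", "Country E", 5.0),
]


def score_range_by_region(records):
    """Map each region to the gap between its highest and lowest country score."""
    by_region = defaultdict(list)
    for region, _country, score in records:
        by_region[region].append(score)
    return {region: max(vals) - min(vals) for region, vals in by_region.items()}


ranges = score_range_by_region(rows)
for region, spread in ranges.items():
    print(f"{region}: range = {spread:.1f}")
```

A region with only a couple of countries will naturally tend to show a small range, so a narrow spread is not by itself evidence that the region is more uniform.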

Deep Dives:

Although North America and ANZ has the highest mean happiness, it has been declining over the years. Central and Eastern Europe has a lower mean happiness score, but it has gradually increased over the years.

By plotting mean happiness scores over the years, we see that Central and Eastern Europe's distribution is the widest, and South Asia and North America have the narrowest distributions. However, by digging in at the country level and plotting the scores by year, we can see that happiness is trending downward in the North American region and upward in Central and Eastern Europe.

Central and Eastern European countries show an uptrend in happiness levels

The North America and ANZ region shows a downward trend in happiness levels
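
The direction of a trend can also be summarized numerically as the slope of a least-squares line fitted to each region's yearly mean score. This is a minimal sketch under that assumption, not the method used for the plots; the regions and values are invented for illustration, and both helper functions are hypothetical.

```python
from collections import defaultdict

# Made-up regional mean scores by year, for illustration only (not report data).
regional_means = [
    ("Region X", 2017, 7.4), ("Region X", 2018, 7.3), ("Region X", 2019, 7.2),
    ("Region X", 2020, 7.1), ("Region X", 2021, 7.0),
    ("Region Y", 2017, 5.0), ("Region Y", 2018, 5.2), ("Region Y", 2019, 5.4),
    ("Region Y", 2020, 5.6), ("Region Y", 2021, 5.8),
]


def least_squares_slope(points):
    """Slope of the ordinary least-squares line through (x, y) points.

    Needs at least two distinct x values, otherwise the denominator is zero.
    """
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    numerator = sum((x - mean_x) * (y - mean_y) for x, y in points)
    denominator = sum((x - mean_x) ** 2 for x, _ in points)
    return numerator / denominator


def trend_by_region(records):
    """Map each region to the yearly change in its mean score."""
    series = defaultdict(list)
    for region, year, score in records:
        series[region].append((year, score))
    return {region: least_squares_slope(pts) for region, pts in series.items()}


trends = trend_by_region(regional_means)
for region, slope in trends.items():
    direction = "up" if slope > 0 else "down"
    print(f"{region}: {slope:+.2f} points per year ({direction})")
```

A single slope hides turning points such as a decline that starts partway through the period, so it complements the year-by-year plots rather than replacing them.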

If we isolate one region and look at happiness over the years, South Asia shows the most variability, with happiness declining from 2016 until 2019. To explore potential hypotheses, we dive deeper by plotting scores for only the countries in the South Asian region.


Clearly, Afghanistan and India have seen the steepest declines in this region, causing the region's overall average happiness to drop. Potential reasons could be social and political unrest in these countries, especially in Afghanistan, which has witnessed multiple changes in leadership and devastating wars.

Potential Impact from Covid-19

In the plot above, the Happiness Score seems to drop in 2021. Before 2021 the score was increasing at a steady pace, but it took a downward turn in 2021. One hypothesis is that this reflects the effects of Covid-19. However, it is difficult to confirm this hypothesis from the available data alone.

Data Correlation Analysis


According to the correlation heatmap, Log GDP per capita, Social support and Healthy life expectancy are the variables most strongly correlated with the Happiness Score. Generosity shows little to no correlation, and Perceptions of corruption is negatively correlated.
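
Each cell of a correlation heatmap is typically a Pearson correlation coefficient between two columns. The sketch below computes that coefficient by hand for a few columns against the happiness score; the column names and all values are made up for illustration and were chosen only to show a strong positive, a near-zero and a strong negative case.

```python
import math

# Made-up values for four placeholder countries, for illustration only.
columns = {
    "happiness": [7.5, 6.0, 4.5, 3.0],
    "log_gdp": [11.0, 10.2, 9.1, 8.0],
    "generosity": [0.1, -0.1, -0.1, 0.1],
    "corruption": [0.2, 0.5, 0.7, 0.9],
}


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


target = columns["happiness"]
correlations = {
    name: pearson(values, target)
    for name, values in columns.items()
    if name != "happiness"
}

# Print variables from most positively to most negatively correlated.
for name, r in sorted(correlations.items(), key=lambda item: item[1], reverse=True):
    print(f"{name}: {r:+.2f}")
```

A coefficient near +1 or -1 indicates a strong linear relationship and a value near 0 indicates little linear relationship; none of these values says anything about cause.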

Data Science Conclusion:

The title poses the question: can data tell us how happy the world is? We can find some insights in this data to answer it. For example, we can see the uptrend in Central and Eastern Europe and the downtrend in North America and ANZ. South Asia seems to have the lowest levels of happiness, with the lowest level in Afghanistan. Finland is the happiest country. Beyond these insights, this data is also helpful in formulating hypotheses, which need additional data sources to test.

1) How does the 2021 Happiness Score data compare to previous years?

The distribution does not differ in a meaningful way compared to previous years.

2) How has happiness varied over the last 10 years across different regions?

a) For North America and ANZ, the average Happiness Score has declined, whereas it has increased in Central and Eastern Europe.

b) The South Asian countries India and Afghanistan are seeing gradual declines in happiness. More data is needed to pinpoint the actual causes; however, from news and observation one can hypothesize that social and political turmoil could be responsible for this decline.

3) How have national and international issues affected the Happiness Score? e.g. Has Covid-19 affected the average happiness score of the world?

a) Covid seems to have had a negative effect on the Happiness Score.

b) As stated in 2) above, social and political issues could be responsible for lower happiness in certain regions.

4) Which available variables show a high correlation with the Happiness Score?

The top 3 variables positively correlated with the happiness score are:

  1. Log GDP per capita
  2. Social support
  3. Healthy life expectancy

Perceptions of corruption is negatively correlated, and generosity seems to be the least correlated.

5) What inferences can be drawn from this analysis? e.g. Can you decide to relocate to a happier place based on this analysis?

As explained in 4) above, the data tells us which variables are positively and negatively correlated with the happiness score.

It is difficult to draw conclusions with high certainty based on this data. However, we can formulate several hypotheses that can lead to further analyses to confirm or reject them. For example, Covid-19 seems to have had a negative impact on the happiness score, and social and political conditions in South Asia may be causing a decrease in happiness.

Future Analysis

One way to improve this analysis would be to add population data and weight the happiness score by the population of each region. Second, we can use regression techniques to predict the future happiness of the regions. Many more hypotheses can be formulated based on this data. The more context and knowledge one has of the geopolitical and social landscape of the world, the better the hypotheses one can draw from this data.
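
To illustrate why population weighting could change the picture, here is a minimal sketch comparing a simple mean with a population-weighted mean. The two countries, their scores and their populations are invented for illustration, and the helper names are hypothetical.

```python
# Made-up rows of (country, score, population in millions), for illustration only.
countries = [
    ("Country A", 7.0, 10),
    ("Country B", 4.0, 90),
]


def simple_mean(records):
    """Average score where every country counts equally."""
    return sum(score for _, score, _ in records) / len(records)


def population_weighted_mean(records):
    """Average score where each country counts in proportion to its population."""
    total_population = sum(pop for _, _, pop in records)
    return sum(score * pop for _, score, pop in records) / total_population


unweighted = simple_mean(countries)
weighted = population_weighted_mean(countries)
print(f"Simple mean: {unweighted:.2f}")
print(f"Population-weighted mean: {weighted:.2f}")
```

In this toy case the weighted mean is pulled toward the larger country, which is the effect weighting would have on any regional average dominated by one or two populous countries.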


Data and Resources

References: Information regarding the data and the World Happiness Report can be found at the pages below. The Jupyter notebook has all the reproducible code, which includes steps to clean the data and helper functions to plot the visualizations.

