Data Tells How Happy The World Is (World Happiness Report)
The skills demonstrated here can be learned through the Data Science with Machine Learning bootcamp at NYC Data Science Academy.
Data Science Motivation
This analysis was motivated by my curiosity about how happy the world is and which countries are the happiest and the least happy. I also assess whether the data can help identify potential correlations between happiness and economic or social factors.
The World Happiness Report is published every year with data on the comparative happiness of countries. Because the report has been published for many years, rich historical data is available. This EDA compares trends over time, trends by geographical region, and the potential correlations of different variables with the Happiness Score. The data covers around 140 countries; countries where the data was sparse or inconsistent were removed, and these amounted to less than 5% of the world population.
The data consists mainly of the metrics below. The categorical variables are Country and Region.
- Happiness score: This is the primary data point of the survey. It comes from the Cantril Ladder, where respondents are asked to imagine themselves on a ladder with steps numbered from zero to ten, where zero is the worst possible life and ten is the best possible life.
- Log GDP per capita: GDP per capita is in terms of Purchasing Power Parity (PPP), adjusted to constant 2011 international dollars.
- Social support: The national average of the binary responses (0 = No, 1 = Yes) to the Gallup World Poll (GWP) question, "If you were in trouble, do you have relatives or friends you can count on to help you whenever you need them, or not?"
- Healthy life expectancy: Healthy life expectancy at birth, based on data from the WHO Global Health Observatory data repository.
- Freedom to make life choices: It is the national average of binary responses to the GWP question, "Are you satisfied or dissatisfied with your freedom to choose what you do with your life?"
- Generosity: It is the residual of regressing the national average of GWP responses to the question, "Have you donated money to a charity in the past month?" on GDP per capita.
- Perceptions of corruption: It is the average of binary answers to two GWP questions, "Is corruption widespread throughout the government or not?" and "Is corruption widespread within businesses or not?"
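The Generosity metric above is a regression residual, which is easier to see in code. The following is a minimal sketch, assuming a one-variable least-squares fit of the donation share on log GDP per capita. The function name `ols_residuals` and all numbers are made up for illustration; the report's actual regression may be specified differently.

```python
from statistics import mean

def ols_residuals(x, y):
    """Residuals of y after a least-squares fit of y on x (with intercept)."""
    x_bar, y_bar = mean(x), mean(y)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    # Residual = observed value minus the value predicted from x
    return [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]

# Illustrative (made-up) values, one entry per country.
log_gdp = [7.5, 8.5, 9.5, 10.5, 11.0]
donated = [0.20, 0.22, 0.35, 0.40, 0.38]  # share answering "yes"

generosity = ols_residuals(log_gdp, donated)
```

A positive residual means a country donates more than its income level alone would predict, and a negative residual means it donates less.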
Specifically, this article tries to answer the following questions:
1) How does the 2021 Happiness Score data compare to previous years?
2) How has happiness varied over the last 10 years across different regions?
3) How have national and international issues impacted the Happiness Score? For example, has Covid-19 impacted the average happiness score of the world?
4) Which available variables show a high correlation with the Happiness Score?
5) What inferences can be drawn from this analysis? For example, can you decide to relocate to a happier place based on this analysis?
6) How can this analysis be improved?
Data Analysis
Preliminary EDA: Distributions, top 10, bottom 10 and box plots by region
The distribution of the 2021 scores is similar to the distribution of the historical data (each country's average happiness score over the last 10 years). However, the 2021 distribution is shifted to the right, indicating an increase in overall average happiness compared to the last 10 years.
Here are the top 10 and the bottom 10 countries by happiness score.
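A ranking like this comes down to a sort and two slices. The sketch below is illustrative only: `top_and_bottom` is a hypothetical helper, and the country labels and scores are placeholders rather than values from the report.

```python
def top_and_bottom(scores, n=10):
    """Return the n highest and n lowest (country, score) pairs."""
    ranked = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
    top = ranked[:n]
    bottom = ranked[-n:][::-1]  # lowest score first
    return top, bottom

# Placeholder data, not real report values.
scores = {"A": 7.8, "B": 7.6, "C": 6.1, "D": 5.0, "E": 3.6, "F": 2.5}
top, bottom = top_and_bottom(scores, n=2)
```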
For 2021, the Middle East and North Africa, South Asia and Western Europe regions show a wide range of happiness scores across their countries. The North America and ANZ region shows lower variability.
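The spread that a box plot shows per region can also be summarized numerically. This is a minimal sketch using only the standard library; `spread_by_region`, the region labels and the scores are assumptions made for illustration.

```python
from collections import defaultdict
from statistics import median

def spread_by_region(rows):
    """Group (region, score) pairs and summarize each region's spread."""
    by_region = defaultdict(list)
    for region, score in rows:
        by_region[region].append(score)
    return {
        region: {
            "min": min(vals),
            "median": median(vals),
            "max": max(vals),
            "range": max(vals) - min(vals),
        }
        for region, vals in by_region.items()
    }

# Placeholder rows, not real report values.
rows = [
    ("Region X", 4.0), ("Region X", 6.5), ("Region X", 5.0),
    ("Region Y", 7.0), ("Region Y", 7.25),
]
summary = spread_by_region(rows)
```

A region with a large `range` corresponds to a tall box plot, and a small `range` to a compact one.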
Deep Dives:
Although North America and ANZ has the highest mean happiness, it has been declining over the years. Central and Eastern Europe has a lower mean happiness score, but it has gradually increased over the years.
Plotting mean happiness scores over the years shows that the distribution for Central and Eastern Europe is the widest, while South Asia and North America have the narrowest distributions. Digging in at the country level and plotting the scores by year shows that happiness is trending downward in the North American region and upward in Central and Eastern Europe.
Countries in Central and Eastern Europe show an upward trend in happiness levels
Countries in the North America and ANZ region show a downward trend in happiness levels
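One way to put a number on these regional trends is to average the scores per region and year, then fit a straight line through the yearly means. The sketch below assumes that approach; `regional_trends` is a hypothetical helper, and the rows are placeholder values rather than report data.

```python
from collections import defaultdict
from statistics import mean

def regional_trends(rows):
    """rows: (region, year, score) tuples. Returns {region: slope per year}.

    Each region needs at least two distinct years for the fit to be defined.
    """
    cells = defaultdict(list)
    for region, year, score in rows:
        cells[(region, year)].append(score)

    # Mean score per region and year
    yearly = defaultdict(dict)
    for (region, year), vals in cells.items():
        yearly[region][year] = mean(vals)

    trends = {}
    for region, series in yearly.items():
        years = sorted(series)
        means = [series[y] for y in years]
        y_bar, m_bar = mean(years), mean(means)
        sxx = sum((y - y_bar) ** 2 for y in years)
        sxy = sum((y - y_bar) * (m - m_bar) for y, m in zip(years, means))
        trends[region] = sxy / sxx  # least-squares slope
    return trends

# Placeholder rows, not real report values.
rows = [
    ("Region X", 2019, 7.0), ("Region X", 2019, 7.2),
    ("Region X", 2020, 7.0),
    ("Region X", 2021, 6.9), ("Region X", 2021, 6.7),
    ("Region Y", 2019, 5.0), ("Region Y", 2020, 5.2), ("Region Y", 2021, 5.4),
]
trends = regional_trends(rows)
```

A negative slope indicates a downward trend and a positive slope an upward trend, in score points per year.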
If we isolate one region and look at its happiness over the years, South Asia shows the most variability, with happiness declining from 2016 until 2019. To explore potential hypotheses, we dive deeper by plotting scores for only the countries in the South Asian region.
Clearly, Afghanistan and India have seen the steepest declines in this region, causing the overall average happiness of the region to drop. Potential reasons could be the social and political unrest in these countries, especially in Afghanistan, which witnessed multiple changes in leadership and devastating wars.
Potential Impact from Covid-19
In the plot above, the Happiness Score seems to drop in 2021. Before 2021 the score was increasing at a steady pace, but it took a downward turn in 2021. One hypothesis is that this reflects the effects of Covid-19. However, it is difficult to confirm this hypothesis from the available data alone.
Data Correlation Analysis
According to the correlation heatmap, Log GDP per capita, Social support and Healthy life expectancy are the variables most strongly correlated with the Happiness Score. Generosity does not show any correlation, and Perceptions of corruption is negatively correlated.
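Each cell of a correlation heatmap is a pairwise correlation coefficient. As a minimal sketch, assuming the Pearson coefficient, the ranking against the Happiness Score can be computed as follows. The helper names and all numbers are made up for illustration; in a notebook, pandas `DataFrame.corr` would typically do the same job on the full table.

```python
from math import sqrt
from statistics import mean

def pearson(x, y):
    """Pearson correlation coefficient between two equal-length lists."""
    x_bar, y_bar = mean(x), mean(y)
    cov = sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y))
    var_x = sum((a - x_bar) ** 2 for a in x)
    var_y = sum((b - y_bar) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)

def correlations_with(target, columns):
    """Rank columns by their correlation with target, highest first."""
    pairs = ((name, pearson(vals, target)) for name, vals in columns.items())
    return sorted(pairs, key=lambda kv: kv[1], reverse=True)

# Placeholder values, one entry per country.
happiness = [7.5, 6.0, 5.0, 4.0, 3.0]
columns = {
    "log_gdp": [11.0, 10.0, 9.5, 8.5, 7.5],
    "corruption": [0.2, 0.6, 0.7, 0.8, 0.9],
}
ranked = correlations_with(happiness, columns)
```

Correlation only describes how variables move together; it does not show that any of them causes higher or lower happiness.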
Data Science Conclusion:
The title poses the question, "Can data answer how happy the world is?" We can find some insights in this data to answer it. For example, we can see the uptrend in Central and Eastern Europe and the downtrend in North America and ANZ. South Asia seems to have the lowest levels of happiness, with the lowest in Afghanistan. Finland is the happiest country. Beyond these insights, the data is also helpful for formulating hypotheses, which need additional data sources to test.
1) How does the 2021 Happiness Score data compare to previous years?
The distribution does not differ in a meaningful way from previous years.
2) How has happiness varied over the last 10 years across different regions?
a) For North America and ANZ the average Happiness Score has declined, whereas it has increased in Central and Eastern Europe.
b) Among South Asian countries, India and Afghanistan are seeing gradual declines in happiness. More data is needed to pinpoint the actual causes. However, from news and observation, one can hypothesize that social and political turmoil could be responsible for this decline.
3) How have national and international issues impacted the Happiness Score? For example, has Covid-19 impacted the average happiness score of the world?
a) Covid-19 seems to have had a negative effect on the Happiness Score.
b) As stated in 2) above, social and political issues could be responsible for lower happiness in certain regions.
4) Which available variables show a high correlation with the Happiness Score?
The top 3 variables positively correlated with the happiness score are:
- Log GDP per capita
- Social support
- Healthy life expectancy
Perceptions of corruption is negatively correlated, and Generosity seems to be the least correlated.
5) What inferences can be drawn from this analysis? For example, can you decide to relocate to a happier place based on this analysis?
As explained in 4) above, the data tells us which variables are positively and negatively correlated with the happiness score.
It is difficult to draw conclusions with high certainty based on this data. However, we can formulate several hypotheses that can lead to further analyses. For example, Covid-19 seems to have had a negative impact on the happiness score, and social and political conditions in South Asia may be causing the decrease in happiness there.
Future Analysis
One way to improve this analysis would be to add population data and weight the happiness score by the population of each region. A second would be to use regression techniques to predict the future happiness of the regions. Many more hypotheses can be formulated from this data. The more context and knowledge one has of the geopolitical and social landscape of the world, the better the hypotheses one can draw from it.
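The population weighting suggested above could look like the following sketch. It assumes each country contributes in proportion to its population; `weighted_mean` is a hypothetical helper, and the scores and populations are placeholders.

```python
def weighted_mean(scores, populations):
    """Average of scores where each country counts in proportion to its population."""
    total = sum(populations)
    return sum(s * p for s, p in zip(scores, populations)) / total

# Placeholder values: two countries in one region.
scores = [4.0, 6.0]
populations = [3, 1]  # e.g. in millions

simple = sum(scores) / len(scores)         # treats both countries equally
weighted = weighted_mean(scores, populations)
```

Here the larger, less happy country pulls the weighted regional average below the simple average, which is the effect the unweighted analysis cannot show.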
Data and Resources
References: Information about the data and the World Happiness Report can be found at the pages below.
- https://github.com/hkspro1/happiness.git: The Jupyter notebook has all the reproducible code, including the steps to clean the data and the helper functions to plot the visualizations.
- https://www.kaggle.com/ajaypalsinghlo/world-happiness-report-2021
- https://www.kaggle.com/mathurinache/world-happiness-report-20152021
- https://worldhappiness.report/faq/
- https://ourworldindata.org/happiness-and-life-satisfaction