Data Study on World Health Expenditure and Life Expectancy

Posted on Nov 21, 2016


The World Health Organization (WHO) was founded in 1948 by the United Nations. The organization works with national governments and other associations to improve the quality of health for everyone. For the WHO and many other organizations, it is useful to know how to gauge the overall quality of health in a country and, more importantly, which factors have the greatest influence. The WHO webpage has data and information on life expectancy, health expenditure as a % of GDP, health expenditure per capita, and other indices for each country. One often hears on the news that the US has the highest % of GDP expenditure on health in the world, yet many countries have better health indices.


Here, I use data that I scraped from the WHO webpage to show that % of GDP expenditure is not the best predictor of better health indices, and that expenditure per capita has a stronger relationship with life expectancy. The resulting linear model can serve as a guide to how much life expectancy can be expected to change for a given change in expenditure per capita.


Figure 1. The colors of the bars represent groupings by the value of male life expectancy.

Male Life Expectancy and the Measures of Expenditure

The plot of the two measures of expenditure in Figure 2 shows that expenditure per capita has a much stronger relationship with male life expectancy than % of GDP expenditure does. In fact, expenditure per capita seems to grow exponentially with male life expectancy.


Figure 2

Linear Model

This exponential relationship between expenditure per capita and male life expectancy can be exploited to create a linear model. In this model, we ask the question "how does male life expectancy change with the natural log of the expenditure per capita?"
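
As a rough illustration of this idea, the sketch below fits male life expectancy against the natural log of expenditure per capita using ordinary least squares. The function name `fit_log_linear` and the five data points are hypothetical, made up for illustration; they are not the WHO data or the code used for this study.

```python
import math

def fit_log_linear(expenditure, life_expectancy):
    """Ordinary least squares fit of y = b0 + b1 * ln(x)."""
    logs = [math.log(x) for x in expenditure]
    n = len(logs)
    mean_x = sum(logs) / n
    mean_y = sum(life_expectancy) / n
    # Closed-form simple regression: slope = Sxy / Sxx
    sxx = sum((lx - mean_x) ** 2 for lx in logs)
    sxy = sum((lx - mean_x) * (y - mean_y)
              for lx, y in zip(logs, life_expectancy))
    b1 = sxy / sxx
    b0 = mean_y - b1 * mean_x
    return b0, b1

# Hypothetical values for illustration only (not WHO data).
expenditure = [50.0, 200.0, 1000.0, 4000.0, 9000.0]   # dollars per capita
life_expectancy = [57.0, 63.5, 71.0, 77.5, 81.0]      # years

b0, b1 = fit_log_linear(expenditure, life_expectancy)
print(f"intercept = {b0:.3f} years, slope = {b1:.3f} years per unit of ln($)")
```

Taking the log of the predictor is what turns the curved relationship seen in the scatter plot into a straight line that a linear model can describe.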

The resulting fit of the linear model to the data from the WHO website is shown in Figure 3. The intercept and slope coefficients for the model are β0 = 38.916 years and β1 = 4.657 years per unit of ln($). The p-values for both coefficients are much less than 0.05, so both are statistically significant at the 5% level. The positive slope β1 indicates how much male life expectancy changes with a change in the natural log of the expenditure per capita: each one-unit increase in ln(expenditure per capita) corresponds to about 4.66 more years of male life expectancy.
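
To make the coefficients concrete, here is a small sketch that plugs the reported values into the model. Because the predictor is a log, multiplying expenditure by a fixed factor always adds the same number of years, whatever the starting level. The function names are hypothetical, and the sketch assumes expenditure is expressed in dollars per capita, as the units of β1 suggest.

```python
import math

BETA0 = 38.916  # intercept in years, as reported in the text
BETA1 = 4.657   # slope in years per unit of ln($), as reported in the text

def predicted_male_life_expectancy(expenditure_per_capita):
    """Model prediction in years for a given expenditure per capita."""
    return BETA0 + BETA1 * math.log(expenditure_per_capita)

def gain_from_scaling(factor):
    """Change in predicted years when expenditure is multiplied by `factor`."""
    return BETA1 * math.log(factor)

print(round(gain_from_scaling(2), 2))   # doubling expenditure: 3.23
print(round(gain_from_scaling(10), 2))  # tenfold increase: 10.72
```

Under this model, doubling expenditure per capita corresponds to roughly 3.2 additional years of predicted male life expectancy. This is an association in the fitted data, not a guarantee of what any single country would gain.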


Figure 3


Although it seems that the model in Figure 3 forms a good description of the data, a check of whether the assumptions for forming a linear model are met is warranted. Figure 4 shows some of the diagnostics used to validate our linear model in Figure 3. The positive diagnostics are:

  1. the residuals vs. fitted plot shows a roughly flat trend line, so there is no sign of remaining nonlinearity
  2. the scale-location plot shows that different regions of the fitted values have similar variance
  3. the residuals vs. leverage plot shows no influential points (points that are both outliers and have high leverage), and none of the points fall close to the Cook's distance lines at 0.5 or 1.0.

The only possible drawback of this model is found in the normal Q-Q plot. While the relationship is linear for most of the quantiles, the points at the lower quantiles deviate slightly from the line, indicating that the distribution of the residuals may be skewed rather than normal.


Figure 4
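
For readers unfamiliar with the normal Q-Q plot, the sketch below computes the pairs of points such a plot displays: sorted standardized residuals against the quantiles a normal distribution would produce. The function name `qq_points`, the plotting positions (i + 0.5) / n, and the residual values are assumptions chosen for illustration; statistical packages may use slightly different conventions.

```python
from statistics import NormalDist, mean, stdev

def qq_points(residuals):
    """Return (theoretical, sample) quantile pairs for a normal Q-Q plot."""
    n = len(residuals)
    center, spread = mean(residuals), stdev(residuals)
    # Standardize and sort the residuals to get the sample quantiles.
    sample = sorted((r - center) / spread for r in residuals)
    # Matching quantiles of the standard normal distribution.
    standard_normal = NormalDist()
    theoretical = [standard_normal.inv_cdf((i + 0.5) / n) for i in range(n)]
    return list(zip(theoretical, sample))

# Hypothetical residuals with a long lower tail (not the study's residuals).
residuals = [-9.5, -4.0, -1.5, -0.5, 0.5, 1.0, 2.0, 3.0, 4.0, 5.0]

for theoretical, sample in qq_points(residuals):
    print(f"{theoretical:6.2f} {sample:6.2f}")
```

If the residuals were normal, the two columns would track each other closely. In this made-up example the smallest sample quantile sits well below its theoretical counterpart, which is the kind of lower-tail departure described above.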

It is possible that a relationship between expenditure per capita and male life expectancy other than log(expenditure per capita) would reduce the departure from normality at the lower quantiles. To this end, a Box-Cox transformation may be performed to find a different linear relationship. However, considering most of the diagnostics, the present model shown in Figure 3 is a good linear model and directly relates expenditure per capita to male life expectancy. The results here suggest it may be more prudent to discuss health indices with respect to the expenditure per capita in a given country instead of the % of GDP expenditure.
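
As a minimal sketch of what a Box-Cox search involves, the code below defines the transform and picks the power λ from a grid by maximizing the normal profile log-likelihood of the transformed values. This is the simple single-variable form, written under stated assumptions: the function names are hypothetical, and in a regression setting the variance term would instead come from the residuals of the model refit at each λ. Note that λ = 0 reproduces the natural log used in the present model.

```python
import math

def box_cox(values, lam):
    """Box-Cox transform of positive values; lam = 0 gives the natural log."""
    if any(v <= 0 for v in values):
        raise ValueError("Box-Cox requires strictly positive values")
    if abs(lam) < 1e-12:
        return [math.log(v) for v in values]
    return [(v ** lam - 1.0) / lam for v in values]

def profile_log_likelihood(values, lam):
    """Normal profile log-likelihood of the transformed values, up to a constant."""
    n = len(values)
    z = box_cox(values, lam)
    mean_z = sum(z) / n
    var_z = sum((zi - mean_z) ** 2 for zi in z) / n
    # The second term is the log of the Jacobian of the transform.
    return -0.5 * n * math.log(var_z) + (lam - 1.0) * sum(math.log(v) for v in values)

def best_lambda(values, grid):
    """Grid value of lambda with the highest profile log-likelihood."""
    return max(grid, key=lambda lam: profile_log_likelihood(values, lam))

# Hypothetical positive values for illustration only (not WHO data).
values = [50.0, 200.0, 1000.0, 4000.0, 9000.0]
grid = [-1.0, -0.5, 0.0, 0.5, 1.0]
print(best_lambda(values, grid))
```

A selected λ near 0 would support keeping the log transform, while a clearly different value would suggest trying another power.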

