Diseases with Vaccines: Impact of Regionality & Development

Posted on Feb 7, 2021

Introduction to Vaccine Preventable Diseases

As we focus our attention on the distribution of coronavirus (COVID-19) vaccines, it is appropriate to apply insights derived from studying the impact of vaccines on other vaccine-preventable diseases. Vaccinations are an integral tool for controlling a range of bacterial and viral diseases. Their effectiveness varies, which is why tracking disease incidence is necessary to control outbreaks.

Even though there are stories with happy endings, such as the eradication of smallpox, other diseases have been difficult to eradicate even with vaccines and continue to challenge us.

This project uses data from the World Health Organization (WHO) to assess trends in the incidence of vaccine-preventable diseases and how they relate to a country's region and level of development. The analysis hopes to shed light on what we can expect as we look forward to a world without masks and social distancing.

WHO Dataset

The datasets used for this study cover the years 2010 to 2019. The source of the data can be found here. The WHO provides data on vaccine-preventable diseases and vaccine coverage by country. Disease incidence data is available for the following diseases:

  • Diphtheria
  • Japanese encephalitis
  • Measles
  • Pertussis (whooping cough)
  • Polio
  • Rubella
  • Tetanus
  • Yellow fever (not included in this study)

The WHO limits its incidence data to these diseases, which generally affect people across regions worldwide. Vaccine coverage data, on the other hand, is available for the following vaccines:

  • BCG for tuberculosis
  • DTP(n) for diphtheria, tetanus and pertussis
  • IPV(n) for polio
  • HepB for Hepatitis B
  • Hib3 for Haemophilus influenzae type b
  • JapEnc for Japanese Encephalitis
  • MCV(n) for measles
  • PCV(n) for pneumococcal disease (including pneumonia)
  • Pol3 for polio
  • Rota(n) for Rotavirus
  • RCV(n) for Rubella
  • TT2plus for Tetanus
  • PAB for Tetanus
  • VAD1 for Vitamin A supplementation
  • YFV for yellow fever

The analysis did not include yellow fever, as there were no reported cases during the time frame being studied (2010 to 2019). Only vaccines related to the illnesses being studied were included. Therefore, the following vaccines were not part of the analysis: BCG, HepB, Hib3, Rota(n), VAD1 and YFV.

Another important point about the vaccine data is that the WHO provides it as a percentage coverage rate within each country. Thus, the values had to be normalized to account for the differing population sizes of countries.
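
The post does not spell out the exact normalization, so the following is only one plausible reading: a population-weighted average of country coverage rates within a group. The original project was written in R; this sketch uses plain Python for illustration, and the function name and the sample figures are hypothetical.

```python
def weighted_coverage(rows):
    """Population-weighted mean coverage (%) for a group of countries.

    rows: iterable of (coverage_percent, population) pairs.
    Returns None when the group has no population data.
    This is an assumed normalization, not necessarily the author's.
    """
    rows = list(rows)
    total_population = sum(population for _, population in rows)
    if total_population == 0:
        return None
    covered = sum(coverage * population for coverage, population in rows)
    return covered / total_population


# Made-up figures: a small country at 90% and a large one at 50%.
example = [(90.0, 1_000_000), (50.0, 9_000_000)]
print(weighted_coverage(example))  # 54.0, versus a simple mean of 70.0
```

Weighting by population keeps a small country with high coverage from masking low coverage in a much larger neighbor.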

Main Objective and Data Groupings on Vaccine Preventable Diseases

The main objective of this project was to visualize data to see patterns in disease incidence and vaccine administration by:

  • WHO regions 
  • Human Development Index (HDI) 

The WHO regions are as follows:

  • AFR - African region
  • AMR - American region 
  • EMR - Eastern Mediterranean region
  • EUR - European region
  • SEAR - Southeast Asian region
  • WPR - Western Pacific region

Region information was included in the disease incidence and vaccines data sets provided by the WHO.

To group country data by level of development, HDI was used. HDI is a score (between 0 and 1) assigned by the United Nations that summarizes a country's health, education and standard of living. Countries are grouped by HDI as follows:

  • Very High Development (HDI >= 0.8)
  • High Development (HDI between 0.7 and 0.799)
  • Medium Development (HDI between 0.55 and 0.699)
  • Low Development (HDI < 0.55)
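
The thresholds above translate directly into a small lookup. This is a minimal Python sketch (the project itself used R); the function name `hdi_category` is hypothetical, and scores that fall between the listed bands, such as 0.7995, are assigned to the lower band.

```python
def hdi_category(hdi):
    """Map an HDI score (0 to 1) to the development category listed above."""
    if not 0.0 <= hdi <= 1.0:
        raise ValueError(f"HDI must be between 0 and 1, got {hdi}")
    if hdi >= 0.8:
        return "Very High Development"
    if hdi >= 0.7:
        return "High Development"
    if hdi >= 0.55:
        return "Medium Development"
    return "Low Development"


print(hdi_category(0.62))  # Medium Development
```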

HDI data was retrieved from the UN website here. This data was joined with the WHO dataset by country name. Details on HDI can be found here.
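
Joining on country name is fragile because two sources often spell the same country differently. The sketch below, in Python rather than the project's R, shows one way to join on a normalized name and list the rows that fail to match so they can be fixed by hand. The field names (`country`, `cases`, `hdi`) and all sample values are assumptions made for illustration.

```python
def normalize(name):
    """Lowercase a country name and collapse extra whitespace."""
    return " ".join(name.strip().lower().split())


def join_on_country(who_rows, hdi_by_country):
    """Attach an HDI score to each WHO row by normalized country name.

    Returns (joined_rows, unmatched_country_names).
    """
    lookup = {normalize(name): hdi for name, hdi in hdi_by_country.items()}
    joined, unmatched = [], []
    for row in who_rows:
        hdi = lookup.get(normalize(row["country"]))
        if hdi is None:
            unmatched.append(row["country"])
        else:
            joined.append({**row, "hdi": hdi})
    return joined, unmatched


# Made-up rows and scores, for illustration only.
who_rows = [
    {"country": "India ", "cases": 10},
    {"country": "Viet Nam", "cases": 3},
]
hdi_by_country = {"india": 0.6, "Vietnam": 0.7}

joined, unmatched = join_on_country(who_rows, hdi_by_country)
print(joined)     # the India row, now with an "hdi" field
print(unmatched)  # ['Viet Nam'], a spelling mismatch to resolve manually
```

Reporting the unmatched names, rather than silently dropping them, makes it harder for a spelling difference to remove a country from the analysis unnoticed.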


Data visualizations were created in R with the dplyr and ggplot2 packages and deployed in an R Shiny app. Incidence and vaccine data were presented by region and HDI. Here is the visualization for Diphtheria based on HDI. Diphtheria is concentrated in Medium Development countries, with more recent cases in Low Development countries.
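
Before plotting, country-level records have to be rolled up by region or HDI category and year. In the R workflow this would typically be a dplyr `group_by` followed by `summarise`; the post does not show that code, so the Python sketch below is only an illustration of the same aggregation step. The field names and the sample rows are hypothetical.

```python
from collections import defaultdict


def total_cases_by_group(rows, group_key):
    """Sum reported cases per (group, year).

    rows: dicts with "year", "cases" and the field named by group_key
    (for example "region" or "hdi_category").
    """
    totals = defaultdict(int)
    for row in rows:
        totals[(row[group_key], row["year"])] += row["cases"]
    return dict(totals)


# Made-up records, for illustration only.
rows = [
    {"region": "AFR", "year": 2018, "cases": 5},
    {"region": "AFR", "year": 2018, "cases": 7},
    {"region": "EMR", "year": 2018, "cases": 2},
]
print(total_cases_by_group(rows, "region"))
```

The resulting totals, keyed by group and year, are the kind of table a line or bar chart per region or HDI category can be drawn from.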

[Figure: Diphtheria incidence by HDI category]

Drilling down further into Low Development countries, one can see falling vaccination rates (shown by the pink line). This may explain the gradual rise in disease incidence.

[Figure: Diphtheria incidence and vaccination rates in Low Development countries]

The next visualization shows the prevalence of polio across WHO regions. It is clear that polio is near eradication in most of the world, although some cases are still seen in the AFR and EMR regions. Drilling down further into the EMR region, we can see that vaccinations have been effective in reducing polio rates since 2014.

[Figure: Polio by WHO region]

Although the above visualizations show the impact of vaccinations very clearly, such trends are not as obvious for other diseases. The data includes omissions or errors, and the source of the errors was hard to identify and correct. Therefore, clear conclusions based on vaccine data could not be made for all diseases. The disease incidence rates were more reliable, and clear conclusions can be drawn from the disease data.

Conclusion About Vaccine Preventable Diseases

The visualizations clearly depict the regionality of disease incidence and show how incidence tends to be concentrated in countries within a particular HDI category. The following tables summarize the findings at a high level.

Disease rates by region
Disease rates by HDI

The visualizations generated for this project were a good tool for analyzing disease rates across regions and HDI categories.  The tables above show that Diphtheria, for example, is more prevalent in AFR and SEAR regions.

However, polio is more prevalent in the AFR and EMR regions.  Japanese Encephalitis seems to be a problem only in the SEAR and WPR regions.  So, one could conclude that each region faces different challenges when it comes to these diseases.    

The above tables also show that Pertussis is a problem even in Very High Development countries, which are not immune to these vaccine-preventable diseases. Most other diseases are more prevalent in Low and Medium Development countries.

Another finding is that some regions and HDI categories have eradicated diseases such as Polio altogether. Regions that have struggled with polio have contained it by ramping up vaccine distribution.

The implication of this analysis is that even though we are in the early stages of vaccine distribution for COVID-19, the road ahead is a long one. Disease eradication is a challenge, and each region and HDI category will face its own challenges.

Future Work

The analysis here is far from complete. Future work must include (1) more vaccine data, (2) a study of other parameters that affect disease prevalence and eradication, and (3) statistical analysis.

Vaccine Visualization Repository

About Author

Chitra Sharathchandra

Chitra Sharathchandra is a software engineer who is passionate about technology. Her current focus is on data science and data engineering. Chitra enjoys teaching South Indian classical music.