Vaccine preventable diseases - impact of regionality and development

Chitra Sharathchandra
Posted on Feb 7, 2021
Infant in the UK receiving the smallpox vaccine in 1951 (source: commons.wikimedia.org)

Introduction

As we focus our attention on the distribution of the coronavirus (COVID-19) vaccines, it is appropriate to apply insight derived from studying the impact of vaccines on other vaccine-preventable diseases. Vaccinations are an integral tool for controlling a range of bacterial and viral diseases. Their effectiveness varies, which is why tracking disease incidence is necessary to control outbreaks. Even though there are stories with happy endings, such as the eradication of smallpox, other diseases have been difficult to eradicate even with vaccines and continue to challenge us. This project studies data from the World Health Organization (WHO) to assess trends in the incidence of vaccine-preventable diseases and how they relate to region and to a country's level of development. The analysis hopes to shed light on what we can expect as we look forward to a world without masks and social distancing.

WHO data set

The datasets used for this study cover the years 2010 to 2019. The source of the data can be found here. The WHO provides data on vaccine-preventable diseases and vaccine coverage by country. Disease incidence data is available for the following diseases:

  • Diphtheria
  • Japanese encephalitis
  • Measles
  • Pertussis (whooping cough)
  • Polio
  • Rubella
  • Tetanus
  • Yellow fever (not included in this study)

The WHO data is limited to these diseases, which affect people globally and across regions. Vaccine coverage data, on the other hand, is available for the following vaccines:

  • BCG for tuberculosis
  • DTP(n) for diphtheria, tetanus and pertussis
  • IPV(n) for polio
  • HepB for Hepatitis B
  • Hib3 for Haemophilus influenzae type b
  • JapEnc for Japanese Encephalitis
  • MCV(n) for measles
  • PCV(n) for pneumococcal disease (including pneumonia)
  • Pol3 for polio
  • Rota(n) for Rotavirus
  • RCV(n) for Rubella
  • TT2plus for Tetanus
  • PAB for Tetanus
  • VAD1 for Vitamin A supplementation
  • YFV for yellow fever

The analysis here did not include yellow fever, as there were no reported cases during the time frame being studied (2010 to 2019). Only vaccines related to the illnesses being studied were included. Therefore, the following vaccines were not part of the analysis: BCG, HepB, Hib3, Rota(n), VAD1 and YFV.

Another important point about vaccine data is that the WHO provides this as a percentage coverage rate within a country.  Thus, the value had to be normalized to account for population levels in various countries.
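The post does not spell out how the normalization was done. One plausible approach, sketched here in Python rather than the R used for the project, is to convert each country's percentage into an estimated count of people covered and then compute a population-weighted rate for a group of countries. The function names and the figures below are hypothetical and are not taken from the WHO data.

```python
def vaccinated_count(coverage_pct, population):
    """Convert a percentage coverage rate into an estimated number of people covered."""
    return coverage_pct / 100.0 * population


def weighted_coverage(rows):
    """Population-weighted coverage (%) for a group of countries.

    rows: iterable of (coverage_pct, population) pairs.
    """
    rows = list(rows)
    total_population = sum(population for _, population in rows)
    if total_population <= 0:
        raise ValueError("total population must be positive")
    covered = sum(vaccinated_count(pct, population) for pct, population in rows)
    return 100.0 * covered / total_population


# Hypothetical figures for illustration only (not WHO data):
# a small country with high coverage and a large one with low coverage.
example = [(90.0, 1_000_000), (50.0, 9_000_000)]
simple_mean = sum(pct for pct, _ in example) / len(example)

print(simple_mean)                 # unweighted mean of the two rates
print(weighted_coverage(example))  # pulled toward the larger country's rate
```

A simple average of the two rates would overstate coverage for the group, because the larger country contributes most of the people.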

Main objective and data groupings

The main objective of this project was to visualize data to see patterns in disease incidence and vaccine administration by:

  • WHO regions 
  • Human Development Index (HDI) 

The WHO regions are as follows:

  • AFR - African region
  • AMR - American region 
  • EMR - Eastern Mediterranean region
  • EUR - European region
  • SEAR - South East Asian region
  • WPR - Western Pacific region

Region information was included in the disease incidence and vaccines data sets provided by the WHO.

To group country data by levels of development, HDI was used. HDI is a score (between 0 and 1) assigned by the United Nations. It summarizes a country's health, education and standard of living, and countries are grouped into four categories based on the score:

  • Very High Development (HDI >= 0.8)
  • High Development (HDI between 0.7 and 0.799)
  • Medium Development (HDI between 0.55 and 0.699)
  • Low Development (HDI < 0.55)
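The cutoffs above can be expressed as a small classification function. This is a minimal Python sketch (the project itself was written in R), and the function name is hypothetical. It uses half-open intervals so that a score such as 0.7995 does not fall between the "0.799" and "0.8" boundaries.

```python
def hdi_category(hdi):
    """Map an HDI score in [0, 1] to the development category used in this post."""
    if not 0.0 <= hdi <= 1.0:
        raise ValueError(f"HDI must be between 0 and 1, got {hdi}")
    if hdi >= 0.8:
        return "Very High Development"
    if hdi >= 0.7:
        return "High Development"
    if hdi >= 0.55:
        return "Medium Development"
    return "Low Development"


# Boundary values taken from the list of categories above.
for score in (0.8, 0.799, 0.7, 0.55, 0.549):
    print(score, hdi_category(score))
```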

HDI data was retrieved from the UN website here. This data was joined with the WHO data set on country name. Details on HDI can be found here.
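Joining two sources on country name tends to drop rows when the spellings differ, so it helps to list the names that fail to match. The following Python sketch illustrates the idea with placeholder country names and made-up values. The helper names, field names and data are all hypothetical, and the project's actual join was done in R.

```python
def normalize_name(name):
    """Lowercase and collapse whitespace so trivially different spellings match."""
    return " ".join(name.lower().split())


def join_on_country(who_rows, hdi_by_country):
    """Left-join WHO rows to HDI scores by country name.

    Returns (joined_rows, unmatched_names) so that mismatches can be reviewed
    instead of being silently dropped.
    """
    lookup = {normalize_name(name): hdi for name, hdi in hdi_by_country.items()}
    joined, unmatched = [], []
    for row in who_rows:
        key = normalize_name(row["country"])
        if key in lookup:
            joined.append({**row, "hdi": lookup[key]})
        else:
            unmatched.append(row["country"])
    return joined, unmatched


# Placeholder data for illustration only (not WHO or UN figures).
who_rows = [
    {"country": "Country A", "year": 2019, "cases": 12},
    {"country": "country  b", "year": 2019, "cases": 3},
    {"country": "Country C", "year": 2019, "cases": 0},
]
hdi_by_country = {"Country A": 0.81, "Country B": 0.52}

joined, unmatched = join_on_country(who_rows, hdi_by_country)
print(len(joined), "rows joined; unmatched:", unmatched)
```

Names that differ by more than case or spacing would still need a manual mapping table, which this sketch does not attempt.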

Visualizations

Data visualizations were created in R with the dplyr and ggplot2 packages and were deployed in an R Shiny app. Incidence and vaccine data were presented by region and HDI. Here is the visualization for diphtheria based on HDI. Diphtheria is most prevalent in Medium Development countries, with more recent prevalence in Low Development countries.

Drilling down further into Low Development countries, one can see the dropping vaccination rates (shown with the pink line chart). This may explain the gradual rise in disease incidence.

The next visualization shows the prevalence of polio across WHO regions. In this first visualization, it is clear that polio is near eradication in most of the world. Some cases are still seen in the AFR and EMR regions. Drilling down further into the EMR region, we can see that vaccinations have been effective in reducing rates of polio since 2014.

Although the above visualizations show the impact of vaccinations very clearly, such trends are not so obvious for other diseases. The vaccine data includes omissions and errors, and the sources of the errors were hard to identify and correct. Therefore, clear conclusions based on vaccine data could not be made for all diseases. The disease incidence data was more reliable, and clear conclusions can be drawn from it.

Conclusions

The visualizations clearly depict the regionality of disease incidence and also show how incidence tends to be concentrated in countries in a certain HDI category. The following tables summarize the findings at a high level.

Disease rates by region
Disease rates by HDI

The visualizations generated for this project were a good tool for analyzing disease rates across regions and HDI categories. The tables above show that diphtheria, for example, is more prevalent in the AFR and SEAR regions. However, polio is more prevalent in the AFR and EMR regions. Japanese encephalitis seems to be a problem only in the SEAR and WPR regions. So, one could conclude that each region faces different challenges when it comes to these diseases.

The above tables also show that pertussis is a problem even in Very High Development countries, which are not immune to vaccine-preventable diseases. Most other diseases are more prevalent in Low and Medium Development countries. Another finding is that some regions and HDI categories have eliminated diseases such as polio altogether. Regions that have struggled with polio have contained it by ramping up vaccine distribution.

The implication of this analysis is that even though we are in the early stages of vaccine distribution for COVID-19, the road ahead is a long one. Disease eradication is a challenge, and each region and HDI category will face its own challenges.

Future work

The analysis here is far from complete. Future work should include (1) more vaccine data, (2) a study of other parameters that affect disease prevalence and eradication, and (3) statistical analysis.

Vaccine Visualization Repository

About Author

Chitra Sharathchandra

Chitra Sharathchandra is a software engineer who is passionate about technology. Her current focus is on data science and data engineering. Chitra enjoys teaching South Indian classical music.