Diseases with Vaccines: Impact of Regionality & Development
Introduction to Vaccine Preventable Diseases
As we focus our attention on the distribution of the coronavirus (COVID-19) vaccines, it is appropriate to apply insights from studying how vaccines have affected other vaccine preventable diseases. Vaccinations are an integral tool for controlling a range of bacterial and viral diseases. Their effectiveness varies, which is why tracking disease incidence is necessary to control outbreaks.
There are stories with happy endings, such as the eradication of smallpox. Other diseases, however, have been difficult to eradicate even with vaccines and continue to challenge us.
This project uses data from the World Health Organization (WHO) to assess trends in the incidence of vaccine preventable diseases and how they relate to a country's region and level of development. The analysis hopes to shed light on what we can expect as we look forward to a world without masks and social distancing.
WHO Dataset
The datasets used for this study cover the years 2010 to 2019. The source of the data can be found here. The WHO provides data on vaccine preventable diseases and vaccine coverage by country. Disease incidence data is available for the following diseases:
- Diphtheria
- Japanese encephalitis
- Measles
- Pertussis (whooping cough)
- Polio
- Rubella
- Tetanus
- Yellow fever (not included in this study)
The WHO limits its reporting to these diseases, which generally affect people globally and across regions. Vaccine coverage data, on the other hand, is available for the following vaccines:
- BCG for tuberculosis
- DTP(n) for diphtheria, tetanus and pertussis
- IPV(n) for polio
- HepB for Hepatitis B
- Hib3 for Haemophilus influenzae type b
- JapEnc for Japanese Encephalitis
- MCV(n) for measles
- PCV(n) for pneumococcal disease (including pneumonia)
- Pol3 for polio
- Rota(n) for Rotavirus
- RCV(n) for Rubella
- TT2plus for Tetanus
- PAB for Tetanus (protection at birth)
- VAD1 for Vitamin A supplementation
- YFV for yellow fever
The analysis excluded yellow fever because no cases were reported during the study period (2010 to 2019). Only vaccines related to the diseases being studied were included. Therefore, the following vaccines were not part of the analysis: BCG, HepB, Hib3, Rota(n), VAD1 and YFV.
Another important point about the vaccine data is that the WHO provides it as a percentage coverage rate within each country. The values therefore had to be normalized to account for differing population sizes before countries could be combined.
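The post does not say exactly how the normalization was done. One plausible approach, sketched here as an assumption, is a population-weighted average, so that a large country counts for more than a small one when coverage is combined for a region or HDI group. The original work was done in R; this sketch uses plain Python for illustration, and the function name and sample figures are hypothetical.

```python
def weighted_coverage(rows):
    """Population-weighted mean coverage (%) for a group of countries.

    rows: iterable of (coverage_pct, population) pairs. Entries whose
    coverage is missing (None) are skipped rather than treated as zero.
    """
    covered = 0.0  # estimated number of people covered
    total = 0.0    # total population with a reported coverage value
    for coverage_pct, population in rows:
        if coverage_pct is None:
            continue
        covered += coverage_pct / 100.0 * population
        total += population
    # Return None when no country in the group reported coverage.
    return 100.0 * covered / total if total else None


# Hypothetical figures, for illustration only (not WHO data).
example = [(90.0, 1_000_000), (50.0, 9_000_000)]
group_coverage = weighted_coverage(example)
```

With these made-up numbers the weighted result is about 54%, whereas a simple mean of the two percentages would give 70%. That gap is why raw country percentages should not be averaged directly.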
Main Objective and Data Groupings on Vaccine Preventable Diseases
The main objective of this project was to visualize data to see patterns in disease incidence and vaccine administration by:
- WHO regions
- Human Development Index (HDI)
The WHO regions are as follows:
- AFR - African region
- AMR - American region
- EMR - Eastern Mediterranean region
- EUR - European region
- SEAR - Southeast Asian region
- WPR - Western Pacific region
Region information was included in the disease incidence and vaccines data sets provided by the WHO.
To group country data by level of development, HDI was used. HDI is a score (between 0 and 1) assigned by the United Nations. It summarizes a country's health, education and standard of living, and countries are grouped by score as follows:
- Very High Development (HDI >= 0.8)
- High Development (HDI between 0.7 and 0.799)
- Medium Development (HDI between 0.55 and 0.699)
- Low Development (HDI < 0.55)
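The thresholds above translate directly into a small classification function. This is a minimal sketch in Python (the original project used R), and the function name is hypothetical. The cut-offs are treated as half-open intervals, so a score such as 0.7995 falls into High Development instead of slipping between categories.

```python
def hdi_category(hdi):
    """Map an HDI score in [0, 1] to the development category used above."""
    if hdi is None:
        return None  # country has no HDI score
    if not 0.0 <= hdi <= 1.0:
        raise ValueError(f"HDI must be between 0 and 1, got {hdi}")
    if hdi >= 0.8:
        return "Very High Development"
    if hdi >= 0.7:
        return "High Development"
    if hdi >= 0.55:
        return "Medium Development"
    return "Low Development"
```

For example, `hdi_category(0.699)` returns "Medium Development" and `hdi_category(0.7)` returns "High Development".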
HDI data was retrieved from the UN website here. This data was joined with the WHO data set on country name. Details on HDI can be found here.
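Joining on country name is fragile, because two sources often spell the same country differently. The sketch below shows one way such a join could be done while keeping track of the names that fail to match. It assumes the records are simple dictionaries. It is written in Python for illustration (the original work used R), and the helper names and sample records are hypothetical.

```python
def normalize_name(name):
    """Lowercase a country name and collapse surrounding/internal whitespace."""
    return " ".join(name.strip().lower().split())


def join_on_country(who_rows, hdi_by_country):
    """Attach an HDI score to each WHO row, matching on normalized name.

    Returns (joined_rows, unmatched_names) so that mismatches can be
    inspected and fixed by hand instead of being silently dropped.
    """
    lookup = {normalize_name(k): v for k, v in hdi_by_country.items()}
    joined, unmatched = [], []
    for row in who_rows:
        key = normalize_name(row["country"])
        if key in lookup:
            joined.append({**row, "hdi": lookup[key]})
        else:
            unmatched.append(row["country"])
    return joined, sorted(set(unmatched))


# Hypothetical records, for illustration only (not WHO or UN data).
who_rows = [
    {"country": "Exampleland ", "disease": "Measles", "cases": 12},
    {"country": "Republic of Sample", "disease": "Measles", "cases": 3},
]
hdi_by_country = {"exampleland": 0.61, "Sample (Republic of)": 0.74}
joined, unmatched = join_on_country(who_rows, hdi_by_country)
```

Here "Exampleland " matches despite the stray space and different capitalization, while "Republic of Sample" ends up in `unmatched` because the two sources word the name differently. Mismatches of that kind need a manual mapping.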
Visualizations
Data visualizations were created using R with the packages dplyr and ggplot2 and were deployed in an R Shiny app. Incidence and vaccine data were presented by region and HDI. Here is the visualization for Diphtheria based on HDI. Diphtheria occurs mostly in Medium Development countries, with more recent cases in Low Development countries.
Drilling down further into Low Development countries, one can see the dropping vaccination rates (shown with the pink line chart). This may explain the gradual rise in disease incidence.
The next visualization shows the prevalence of polio across WHO regions. In this first visualization, it is clear that polio is near eradication in most of the world. Some cases are still seen in the AFR and EMR regions. Drilling down further into the EMR region, we can see that vaccinations have been effective in reducing rates of polio since 2014.
Although the above visualizations show the impact of vaccinations very clearly, such trends are not so obvious for other diseases. The vaccine data includes omissions or errors, and the source of the errors was hard to identify and correct. Therefore, clear conclusions based on vaccine data could not be made for all diseases. The disease incidence rates were more reliable, and clear conclusions can be drawn from the disease data.
Conclusion About Vaccine Preventable Diseases
The visualizations clearly depict the regionality of disease incidence and also show how incidence tends to be concentrated in countries within a certain HDI category. The following tables summarize the findings at a high level.
The visualizations generated for this project were a good tool for analyzing disease rates across regions and HDI categories. The tables above show that Diphtheria, for example, is more prevalent in AFR and SEAR regions.
Polio, by contrast, is more prevalent in the AFR and EMR regions. Japanese Encephalitis seems to be a problem only in the SEAR and WPR regions. So, one could conclude that each region faces different challenges when it comes to these diseases.
The above tables also show that Pertussis is a problem for Very High Development countries, which means that even these countries are not immune from vaccine preventable diseases. Most other diseases seem to be more prevalent in Low and Medium Development countries.
Another finding is that some regions and HDI categories have eliminated diseases such as Polio altogether. Regions that have struggled with polio have contained it by ramping up vaccine distribution.
The implication of this analysis is that even though we are in the early stages of vaccine distribution for COVID-19, the road ahead is a long one. Disease eradication is a challenge, and each region and HDI category will face its own challenges.
Future Work
The analysis here is far from complete. Future work should include (1) more vaccine data, (2) a study of other parameters that affect disease prevalence and eradication, and (3) statistical analysis.