Visualizing Trends in Primary Education

Posted on Aug 5, 2017


Primary education is a fundamental requirement for success. Regardless of how one might define the term “success”, the skills attained in primary schooling are vital. Those of us who grew up in a first-world country with universal access to primary school may take for granted how important the early years of education are.

The problem:

As of 2013, nearly 60 million children of primary school age were not in school [1]. While this figure is down from 99 million in the year 2000, clearly more action needs to be taken.

Given that the non-poor essentially have full access to primary education, we can reasonably assume that most of those who aren't enrolled are poor. At a regional level, the gaps are concentrated in the poorer parts of the globe, like Sub-Saharan Africa and South Asia, which are also among the areas most affected by regional conflicts.

Using the data visualization tools available within R, we can graph specific indicators provided by the World Bank to better portray our findings. In the graph above, the cumulative dropout rate to the last grade of primary education reflects the proportion of pupils enrolled in a given grade who are no longer enrolled the following school year.


Comparing GNI per capita using the PPP (Purchasing Power Parity) method, we find Sub-Saharan Africa and South Asia lagging behind the rest of the world. In the next graph, we look at the net enrollment rate and find these two regions at the bottom of the scale as well. This supports our assumption that poorer countries and regions are the most likely to lack access to primary education. While the trends show some improvement, there is much more work to be done.

What can we do?

Now that we’ve identified where the problem areas are geographically, we can do a little more digging to understand some of the factors behind the success (or failure) of primary education.

Using Pearson’s method of correlation, we can use R to calculate the covariance of x and y divided by the product of the standard deviation of x and the standard deviation of y. The result ranges from -1 to 1 and shows how strongly, and in which direction, two indicators move together. Essentially, we are quantifying the linear relationship between two indicators in order to try to identify what drives primary education success.
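As a minimal sketch of the calculation described above, the R snippet below computes Pearson's r by hand and checks it against R's built-in cor() function. The vectors pupil_teacher_ratio and literacy_rate hold made-up values chosen purely for illustration; they are not World Bank figures, and the helper pearson_r is a hypothetical name, not something from the original analysis.

```r
# Illustrative values only (NOT real World Bank data).
pupil_teacher_ratio <- c(18, 25, 31, 40, 47, 55)
literacy_rate       <- c(96, 90, 82, 71, 63, 52)

# Pearson's r: the covariance of x and y divided by the product of
# their standard deviations. (pearson_r is a hypothetical helper.)
pearson_r <- function(x, y) {
  cov(x, y) / (sd(x) * sd(y))
}

r_manual  <- pearson_r(pupil_teacher_ratio, literacy_rate)
r_builtin <- cor(pupil_teacher_ratio, literacy_rate, method = "pearson")

# The hand-rolled value and the built-in agree up to floating-point error.
stopifnot(isTRUE(all.equal(r_manual, r_builtin)))

# Negative here, because literacy falls as the ratio rises in this toy data.
print(r_manual)
```

In practice cor() alone is enough; the manual version is only there to make the formula concrete.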

Given the data we have available, we can define the success of primary education as the literacy rate in the adult population. In addition to serving as an indicator of primary education success, literacy is an important factor in reducing poverty. A study by the World Bank linking education and poverty found that “in all cases where detailed analysis of household data has been carried out, poverty rates are highest for households headed by illiterate people and decline with increased education of the household head.” [2]

The correlation plot identifies the indicators associated with higher (or lower) literacy rates in the population aged 25-64. Enrollment ratios and the GPI (Gender Parity Index) are positively correlated with literacy rates, while pupil-to-teacher ratios are negatively correlated. The GPI measures females' access to education relative to males': it is the ratio of the female value of an indicator to the male value, so the closer it is to 1, the more equal access is. Gender parity is especially helpful when looking at countries and regions that may not prioritize female education.
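The post does not show how the correlation plot was built. One plausible starting point, sketched here under that assumption, is a correlation matrix computed with base R's cor(). All values and column names below are invented for illustration and are not World Bank data; the GPI is derived as the female value divided by the male value.

```r
# Toy data, invented for illustration (NOT real World Bank data).
female_enrollment <- c(50, 60, 70, 81, 92, 97)
male_enrollment   <- c(64, 71, 78, 85, 93, 97)

indicators <- data.frame(
  literacy_rate       = c(52, 63, 71, 82, 90, 96),
  net_enrollment      = c(57, 66, 74, 83, 92, 97),
  # Gender Parity Index: female value divided by male value (1 = parity).
  gpi                 = female_enrollment / male_enrollment,
  pupil_teacher_ratio = c(55, 47, 40, 31, 25, 18)
)

# Pairwise Pearson correlations between every pair of columns.
cor_matrix <- cor(indicators, method = "pearson")

# Correlation of each indicator with the literacy rate.
literacy_cor <- cor_matrix[, "literacy_rate"]
print(round(literacy_cor, 2))
```

In this toy data the enrollment and GPI columns correlate positively with literacy and the pupil-to-teacher ratio correlates negatively, mirroring the pattern described above. Real indicator data usually has missing values, which cor() can handle through its use argument (for example use = "pairwise.complete.obs").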

Additionally, we note that dropout rates rise with the pupil-to-teacher ratio. In other words, the more students there are per teacher, the less likely those students are to succeed and stay in school the following year. This feeds a cycle of illiteracy and, in turn, poverty.


If we want to get serious about tackling the lack of access to primary school for those in need, we must use data and data visualization tools to shed light on the issues. It’s time to end the cycle of poverty, and we can do so by providing basic primary education to all children. Ensuring that students do not vastly outnumber their teachers appears to be an important factor in keeping students enrolled and improving literacy rates. If we can collect more data, we can answer the questions we have and raise additional ones we should be asking.

Further research shows that the highest returns in less developed countries come from primary education [3]. We must prioritize our future generations and give them a fighting chance to attain something better.

A child’s success in life should not be dictated by the region in which they are born.


Link to my ShinyApp.

Link to my GitHub repository.






About Author


Mike Ghoul

Mike is a strategic analyst with 5 years of financial services experience coupled with data science skills and an insatiable drive to solve problems. While at Morgan Stanley, he built predictive compensation models forecasting future costs and presented...