Student Enrollment Visualization

Smitha Mathew
Posted on Dec 22, 2016

Introduction

This visualization project uses data collected by the World Bank for the indicator "School enrollment, secondary (gross), gender parity index (GPI)". This post details how the data is loaded, cleansed, filtered, and plotted using ggplot2 and dplyr. The reshape2 library is also used to convert the source data into a format that is suitable for analysis.

Data Source

The data used for this visualization was sourced from the education section of the World Bank Gender Data portal. According to the World Bank's description of the indicator:

Gender parity index for gross enrollment ratio in secondary education is the ratio of girls to boys enrolled at secondary level in public and private schools. Ratio of girls to boys gross enrollment ratio in secondary school is calculated by dividing female gross enrollment ratio in secondary education by male gross enrollment ratio in secondary education.
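As a small worked example of the definition above, the ratio can be computed directly. The function name gpi and the enrollment figures below are hypothetical and chosen only to illustrate the arithmetic.

```r
# Gender parity index: female gross enrollment ratio divided by
# male gross enrollment ratio (both expressed in percent).
gpi <- function(female_ger, male_ger) {
  female_ger / male_ger
}

# Hypothetical gross enrollment ratios for one country and year.
female_ger <- 76.0
male_ger <- 80.0

# A value below 1 means fewer girls than boys are enrolled,
# relative to their respective populations.
example_gpi <- gpi(female_ger, male_ger)
print(round(example_gpi, 3))
```

A value of exactly 1.0 would indicate parity between the two gross enrollment ratios.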

Data on education are collected by the UNESCO Institute for Statistics from official responses to its annual education survey. All the data are mapped to the International Standard Classification of Education (ISCED) to ensure the comparability of education programs at the international level. The current version was formally adopted by UNESCO Member States in 2011.

The reference years reflect the school year for which the data are presented. In some countries the school year spans two calendar years (for example, from September 2010 to June 2011); in these cases the reference year refers to the year in which the school year ended (2011 in the example).

Data Cleansing

The raw CSV file was opened in a text editor and, as a preliminary step before analysis in R, content that did not conform to the CSV format was removed. Specifically, some extra header lines associated with the column headings were removed. The column header row was also found to be missing a field, which led to issues during analysis; this was corrected as well.
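The same cleanup could also be done in R rather than by hand. The sketch below is not the author's code: the sample text, the number of metadata lines, and the object names are assumptions made for illustration. It relies on a documented behavior of read.csv: when the header row has one field fewer than the data rows, the first column is silently used as row names, so the header is handled separately here.

```r
# Hypothetical raw text imitating the problem: three metadata lines, then a
# header row with one field fewer than the data rows (trailing comma).
raw <- c(
  "Data Source,Hypothetical export",
  "Last Updated,Hypothetical date",
  "Indicator,Hypothetical GPI indicator",
  "Country Name,Country Code,1960,1961",
  "Country A,AAA,0.95,0.97,",
  "Country B,BBB,,1.01,"
)

# Take the column names from the real header row (line 4 in this sample).
header <- strsplit(raw[4], ",", fixed = TRUE)[[1]]

# Skip the metadata and header lines and read the data rows without a
# header, so the first column is not mistaken for row names.
gpi_raw <- read.csv(text = raw, skip = 4, header = FALSE,
                    stringsAsFactors = FALSE)

# Drop the empty trailing column and apply the real column names.
gpi_raw <- gpi_raw[, seq_along(header)]
names(gpi_raw) <- header

str(gpi_raw)
```

For a real file, the text argument would be replaced by the file path, and the skip value would need to match the number of lines that precede the data rows in that file.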

The source data came in two files. The first file contained 264 observations of 62 variables. The majority of the 264 observations corresponded to the nations of the world. Of the 62 variables, 58 were related to the GPI data for the years 1960 to 2016. There were also a few rows that contained summary data.

The second file consisted of country information, classifying each country into a region and an income group. The data from both files was joined to enable categorization of the GPI data by region and income level.
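A join of this kind can be sketched with dplyr as follows. The column names (Country.Code, Region, IncomeGroup) and all values are hypothetical stand-ins for the two files, and the choice of left_join is an assumption, not necessarily what the original analysis used.

```r
library(dplyr)

# Hypothetical GPI table: one row per country or summary entry.
gpi_data <- data.frame(
  Country.Code = c("AAA", "BBB", "ZZZ"),
  X2015 = c(0.98, 0.81, 0.97),
  stringsAsFactors = FALSE
)

# Hypothetical country metadata with region and income group.
country_info <- data.frame(
  Country.Code = c("AAA", "BBB"),
  Region = c("Region 1", "Region 2"),
  IncomeGroup = c("High income", "Low income"),
  stringsAsFactors = FALSE
)

# Keep every GPI row; rows with no matching metadata
# (for example, summary rows) get NA for Region and IncomeGroup.
gpi_joined <- left_join(gpi_data, country_info, by = "Country.Code")

print(gpi_joined)
```

Using a left join keeps unmatched rows visible, which makes it easy to filter them out explicitly afterwards, for example with filter(!is.na(Region)).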

The data was difficult to use directly for visualization in R. To visualize the progress of different nations, the GPI data, which was organized as a separate column for each year, was converted to a melted (long) form using the reshape2 package. This produced 12098 rows of GPI observations.
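The melting step can be illustrated on a tiny table. This is a minimal sketch under stated assumptions: the data frame, its column names, and its values are hypothetical, and dropping missing values with na.rm = TRUE is an assumption about the cleanup, not something the source confirms.

```r
library(reshape2)

# Hypothetical wide table: one column per year, as in the source layout.
gpi_wide <- data.frame(
  Country.Code = c("AAA", "BBB"),
  Region = c("Region 1", "Region 2"),
  X1960 = c(0.70, NA),
  X1961 = c(0.72, 0.55),
  X1962 = c(0.75, 0.58),
  stringsAsFactors = FALSE
)

# Melt the year columns into (Year, GPI) pairs, one row per observation.
# na.rm = TRUE drops country/year combinations with no reported value.
gpi_long <- melt(
  gpi_wide,
  id.vars = c("Country.Code", "Region"),
  variable.name = "Year",
  value.name = "GPI",
  na.rm = TRUE
)

# Year labels such as "X1960" become the integer 1960.
gpi_long$Year <- as.integer(sub("^X", "", as.character(gpi_long$Year)))

print(gpi_long)
```

The long form has one row per country and year, which is the shape ggplot2 expects when year is mapped to an axis.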

https://gist.github.com/smithaam/54ca238d3d5dc2548f0b5000a7f9fe0b

Visualizations

Income based Visualization

https://gist.github.com/smithaam/2d0cd47acd1b1fcfd83dcd689087e7e9
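For readers who cannot open the gist, an income-based comparison of this kind can be sketched as below. This is not the author's code: the data frame, column names, and values are hypothetical, and averaging GPI per income group and year is one possible choice of summary.

```r
library(dplyr)
library(ggplot2)

# Hypothetical long-format data: two countries per income group, two years.
gpi_long <- data.frame(
  Country = rep(c("A", "B", "C", "D"), each = 2),
  IncomeGroup = rep(c("High income", "Low income"), each = 4),
  Year = rep(c(1990, 2010), times = 4),
  GPI = c(0.96, 1.00, 0.98, 1.02, 0.50, 0.70, 0.60, 0.80)
)

# Average GPI per income group and year, ignoring missing values.
income_summary <- gpi_long %>%
  group_by(IncomeGroup, Year) %>%
  summarise(MeanGPI = mean(GPI, na.rm = TRUE), .groups = "drop")

# One line per income group, with a dashed reference line at parity (1.0).
p <- ggplot(income_summary,
            aes(x = Year, y = MeanGPI, colour = IncomeGroup)) +
  geom_line() +
  geom_point() +
  geom_hline(yintercept = 1, linetype = "dashed") +
  labs(x = "Year", y = "Mean GPI", colour = "Income group",
       title = "Secondary enrollment GPI by income group")
```

Calling print(p) renders the chart. The same pattern applies to a region-based comparison by grouping on a region column instead.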

Figure: income level comparison (SVG version: rplot01)

Region based visualization

https://gist.github.com/smithaam/38b68fd491da565e629a124b2f4db5de

Figure: region level comparison (SVG version: rplot13)

Figure: region comparison

A global heat map depiction

https://gist.github.com/smithaam/ed8f5f0a1f17ed6925994a3eb759b504

Figure: density plot with points

Conclusion

Even from this introductory visualization, it is clear that the global gender parity index for secondary school enrollment has steadily climbed toward equality between the two genders (a ratio of 1.0). We can also see that regions such as Europe and the Americas have had a good ratio since the 1970s. The Arab, Middle East and North Africa regions, although historically at a low GPI of 0.7, have recently improved the ratio to 0.95. The conflict-affected areas and Sub-Saharan Africa are still a concern, as they have improved the ratio only to 0.8.

We can also clearly see a pattern of higher-income countries having a better ratio than lower-income countries, which have consistently lagged over the course of 50 years.

About Author

Smitha Mathew

Technology enthusiast with attention to detail and global exposure. She is a self-motivated problem solver with experience analyzing data and deriving meaningful statistical information. Her goal is to be able to make a positive difference in people's lives...