World Development Indicator Explorer - Shiny

Aravind Kolumum Raja
Posted on Apr 1, 2016

Contributed by Aravind Kolumum Raja. He attended the NYC Data Science Academy 12-week full-time Data Science Bootcamp, which ran from January 11th to April 1st, 2016. This post is based on his second class project, R Shiny (due in the 4th week of the program).

 

The World Bank collects and processes large amounts of data and generates further data on the basis of economic models. These data and models have gradually been made available to the public in a way that encourages reuse. The World Development Indicators are the primary World Bank collection of development indicators, compiled from officially recognized international sources. The collection presents the most current and accurate global development data available, and includes national, regional and global estimates.

The data span 1960-2015 across 214 economies and cover a wide range of indicators across various domains, including Agriculture & Rural Development, Aid Effectiveness, Climate Change, Economy & Growth, Education, Energy & Mining, Environment, External Debt, Financial Sector, Gender, Health, Infrastructure, Labor & Social Protection, Poverty, Private Sector, Public Sector, Science & Technology, Social Development, Trade and Urban Development. The dataset contains 1,346 unique indicators, which have been analyzed and used in research across a variety of domains.

This Shiny project is a first attempt at building a visualization and statistical analysis toolbox that can operate on all of these indicators, so that one can study cross-sectional as well as time series comparisons between indicators and countries. A header of the dataset is shown below. The data had more than 330,000 rows, one for each (Country, Indicator) combination, and each row was a time series ranging from 1960 to 2015. As expected, the data contained many missing values in the earlier years.

I added two more columns, Indicator category and Currency, based on the accompanying documentation file. Along with data for more than 200 countries, the dataset also includes group economies such as Arab World, Europe, Asia, etc.

worldbankdata
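
The layout described above (one row per country and indicator, one column per year) is usually easier to filter and plot once it is reshaped into a long format. Below is a minimal sketch in base R, not the code used in the app. The column names (CountryName, IndicatorName, X1960, ...) and all values are made up for illustration.

```r
# Toy data in the wide layout: one row per (country, indicator) pair and
# one column per year. Column names and values are hypothetical.
wide <- data.frame(
  CountryName   = c("China", "India", "China"),
  IndicatorName = c("Fertility rate", "Fertility rate", "GDP growth"),
  X1960 = c(NA, 1.0, NA),
  X1961 = c(2.0, 3.0, 4.0),
  X1962 = c(5.0, NA, 6.0),
  stringsAsFactors = FALSE
)

# Pick out the year columns by name
year_cols <- grep("^X[0-9]{4}$", names(wide), value = TRUE)

# Reshape to long format: one row per (country, indicator, year)
long <- reshape(
  wide,
  direction = "long",
  varying   = year_cols,
  v.names   = "Value",
  timevar   = "Year",
  times     = as.integer(sub("^X", "", year_cols)),
  idvar     = c("CountryName", "IndicatorName")
)
rownames(long) <- NULL

# Share of missing values per year
missing_by_year <- tapply(is.na(long$Value), long$Year, mean)
print(missing_by_year)
```

On the real data, a summary like `missing_by_year` would show how sparse the earlier years are.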

Shiny makes it possible to build interactive visualization platforms in R without writing any JavaScript, so I wrote my first Shiny app using a single R file.

I used dygraphs (a time series visualization package), xts (a time series package) and googleVis (the R interface to the Google Charts API, which lets users create interactive charts based on data frames).

Shiny development roughly consists of designing the User Interface and coding the Server component, where all the calculations and data crunching are done.

 

For the UI, I created three basic pages using the code below.

https://gist.github.com/kraravind/2cfe89d8ea77695f116957548128305d
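
For readers who do not want to open the gist, here is a rough sketch of what a three-page Shiny UI can look like. It is not the code from the gist: the page titles, input and output IDs, and the choice lists are all hypothetical, and it assumes the shiny and dygraphs packages are installed.

```r
library(shiny)
library(dygraphs)

# Hypothetical choices; a real app would derive these from the dataset.
indicators <- c("Fertility rate", "GDP growth")
countries  <- c("China", "India", "Japan")

ui <- navbarPage(
  "World Development Indicator Explorer",

  # Page 1: time series of one indicator across several countries
  tabPanel(
    "Time Series",
    sidebarLayout(
      sidebarPanel(
        selectInput("ts_indicator", "Indicator", choices = indicators),
        selectInput("ts_countries", "Countries", choices = countries,
                    selected = countries, multiple = TRUE)
      ),
      mainPanel(dygraphOutput("ts_plot"))
    )
  ),

  # Page 2: map plus top/bottom tables for one (year, indicator) choice
  tabPanel(
    "Map",
    sidebarLayout(
      sidebarPanel(
        selectInput("map_indicator", "Indicator", choices = indicators),
        sliderInput("map_year", "Year", min = 1960, max = 2015,
                    value = 2010, sep = "")
      ),
      mainPanel(
        htmlOutput("map_plot"),      # googleVis charts render as HTML
        tableOutput("top_table"),
        tableOutput("bottom_table")
      )
    )
  ),

  # Page 3: bubble/scatter plot of two indicators
  tabPanel(
    "Bubble Chart",
    sidebarLayout(
      sidebarPanel(
        selectInput("x_indicator", "X indicator", choices = indicators),
        selectInput("y_indicator", "Y indicator", choices = indicators,
                    selected = indicators[2])
      ),
      mainPanel(htmlOutput("bubble_plot"))
    )
  )
)
```

Each `tabPanel` becomes one page in the navigation bar, and the output IDs are what a server function would fill in.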

 

  • Indicator time series visualization across multiple countries. The example below shows one of the indicators, Fertility rate, across three different economies: China, India and Japan. The slider can be used to zoom in on a particular range of years. Hovering the mouse over any of the lines shows the value of the indicator at that point, giving a cross-sectional comparison across the countries for that year.

shinypage1

 

 

  • Choropleth maps for a chosen (Year, Indicator) combination, showing how that indicator varies across countries, along with the top and bottom countries for the same combination.

shinypage21

shinypage22 shinypage23
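
The map page needs one value per country for the chosen year and indicator, plus the top and bottom countries ranked by that value. A minimal dplyr sketch of that step follows. The function names, column names and values are hypothetical, and the map helper assumes the googleVis package and its `gvisGeoChart()` function; the code in the actual app may differ.

```r
library(dplyr)

# Toy long-format data with made-up values
toy <- data.frame(
  CountryName   = c("China", "India", "Japan", "Brazil", "China"),
  IndicatorName = "Fertility rate",
  Year          = c(2010, 2010, 2010, 2010, 2009),
  Value         = c(1.5, NA, 4.2, 2.8, 1.6),
  stringsAsFactors = FALSE
)

# One row per country for the chosen (year, indicator), dropping missing values
indicator_snapshot <- function(long_data, sel_indicator, sel_year) {
  long_data %>%
    filter(IndicatorName == sel_indicator, Year == sel_year, !is.na(Value)) %>%
    select(CountryName, Value)
}

# The n highest and n lowest countries in a snapshot
top_and_bottom <- function(snapshot, n = 10) {
  list(
    top    = head(arrange(snapshot, desc(Value)), n),
    bottom = head(arrange(snapshot, Value), n)
  )
}

# Choropleth of the snapshot (not called here; requires googleVis)
indicator_map <- function(snapshot) {
  googleVis::gvisGeoChart(snapshot, locationvar = "CountryName",
                          colorvar = "Value")
}

snap  <- indicator_snapshot(toy, "Fertility rate", 2010)
ranks <- top_and_bottom(snap, n = 2)
print(ranks$top)
print(ranks$bottom)
```

With the real dataset, the group economies (Arab World, Europe, etc.) would probably need to be filtered out before ranking, since they are not countries.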

 

 

  • Bubble/scatter plot of two different indicators against each other, with a slider over years (motion play).

shinypage3
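
A bubble plot of two indicators needs both of them side by side for every (country, year) pair. One way to prepare that, sketched with dplyr under assumed column names and made-up values, is to split the long data by indicator and join the two pieces. The chart helper assumes the googleVis function `gvisMotionChart()`; it is only an illustration, not the code used in the app.

```r
library(dplyr)

# Toy long-format data with made-up values
toy <- data.frame(
  CountryName   = c("China", "China", "India", "China", "China", "India"),
  IndicatorName = c(rep("Fertility rate", 3), rep("GDP growth", 3)),
  Year          = c(2000, 2001, 2000, 2000, 2001, 2000),
  Value         = c(1, 2, 3, 10, NA, 30),
  stringsAsFactors = FALSE
)

# Put two indicators side by side, one row per (country, year),
# keeping only rows where both values are present
pair_indicators <- function(long_data, x_indicator, y_indicator) {
  x_data <- long_data %>%
    filter(IndicatorName == x_indicator) %>%
    select(CountryName, Year, x = Value)
  y_data <- long_data %>%
    filter(IndicatorName == y_indicator) %>%
    select(CountryName, Year, y = Value)

  inner_join(x_data, y_data, by = c("CountryName", "Year")) %>%
    filter(!is.na(x), !is.na(y))
}

# Motion chart of the paired data (not called here; requires googleVis)
bubble_chart <- function(paired) {
  googleVis::gvisMotionChart(paired, idvar = "CountryName", timevar = "Year",
                             xvar = "x", yvar = "y")
}

paired <- pair_indicators(toy, "Fertility rate", "GDP growth")
print(paired)
```

Dropping incomplete pairs up front keeps countries with missing years from appearing as empty bubbles.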

 

The underlying Server implementation involved some data crunching with dplyr, converting the attribute selected by the user into a time series object (xts), and rendering the charts.

https://gist.github.com/kraravind/c80e7ec13a074a5cc2ee7577bf1b1f01
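
The steps described above (filter with dplyr, convert to xts, render the chart) can be sketched as follows. This is an illustration under assumptions, not the code from the gist: the data frame `wdi_long`, its column names, the helper `indicator_xts()` and the input/output IDs are all hypothetical, and the values are made up. It assumes the dplyr, xts and dygraphs packages are installed.

```r
library(dplyr)
library(xts)

# Toy long-format data with made-up values
wdi_long <- data.frame(
  CountryName   = rep(c("China", "India"), each = 3),
  IndicatorName = "Fertility rate",
  Year          = rep(1960:1962, times = 2),
  Value         = c(NA, 2, 3, 4, 5, 6),
  stringsAsFactors = FALSE
)

# Build an xts object with one column per country for a single indicator
indicator_xts <- function(long_data, indicator, countries) {
  selected <- long_data %>%
    filter(IndicatorName == indicator, CountryName %in% countries) %>%
    arrange(Year)

  years <- sort(unique(selected$Year))

  # Align every country on the common set of years (NA where missing)
  values <- sapply(countries, function(cn) {
    rows <- selected[selected$CountryName == cn, ]
    rows$Value[match(years, rows$Year)]
  })
  values <- matrix(values, nrow = length(years),
                   dimnames = list(NULL, countries))

  # xts needs a time index, so each year is mapped to January 1st
  xts(values, order.by = as.Date(paste0(years, "-01-01")))
}

# Server function: re-renders the chart whenever the inputs change
server <- function(input, output) {
  output$ts_plot <- dygraphs::renderDygraph({
    series <- indicator_xts(wdi_long, input$ts_indicator, input$ts_countries)
    dygraphs::dygraph(series) %>% dygraphs::dyRangeSelector()
  })
}

demo_series <- indicator_xts(wdi_long, "Fertility rate", c("China", "India"))
print(demo_series)
```

Keeping the conversion in a plain function makes it easy to check outside of Shiny, and `dyRangeSelector()` adds the kind of zoom slider described for the time series page.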

 

The app has been deployed at https://aravindkr.shinyapps.io/worldbank/ if anyone wishes to play with the data.

The app still needs a lot of improvement in UI design. There is plenty of scope to expand this platform, which would ultimately be useful for social scientists, economists and financial analysts, among many others, with the inclusion of

  • Regression/time series modelling
  • Cointegration testing
  • Machine learning using both supervised and unsupervised methods, including cluster analysis of countries based on certain parameters and GDP prediction using random forests, neural nets and other advanced algorithms
  • Subset data set viewing/download capabilities

within this Shiny framework, which could then serve as an initial visual or statistical input for quantitative research.

However, owing to the short duration of the bootcamp, I have not been able to devote much time to this particular project. If anyone is interested in this application and the World Development data and wishes to work towards expanding it into a complete application involving statistical modelling and machine learning, I am happy to collaborate at [email protected]

 

 

 

 

 

 

About Author

Aravind Kolumum Raja

Aravind obtained his Masters degree in Statistics from Columbia University in 2012 and is presently an Analyst with a global investment management firm based in New York. His primary interests are in Mathematics, Statistics & Machine learning. He...