World Development Indicator Explorer - Shiny

Posted on Apr 1, 2016

Contributed by Aravind Kolumum Raja. He attended the NYC Data Science Academy 12-week full-time Data Science Bootcamp, which ran from January 11th to April 1st, 2016. This post is based on his second class project, R Shiny (due in the 4th week of the program).

 

The World Bank collects and processes large amounts of data, and also generates data on the basis of economic models. These data and models have gradually been made available to the public in a way that encourages reuse. The World Development Indicators are the primary World Bank collection of development indicators, compiled from officially recognized international sources. The collection presents the most current and accurate global development data available, and includes national, regional and global estimates.

The data span 1960 to 2015 across 214 economies and cover a wide range of domains, including Agriculture & Rural Development, Aid Effectiveness, Climate Change, Economy & Growth, Education, Energy & Mining, Environment, External Debt, Financial Sector, Gender, Health, Infrastructure, Labor & Social Protection, Poverty, Private Sector, Public Sector, Science & Technology, Social Development, Trade and Urban Development. The dataset contains 1,346 unique indicators, which have been analyzed and used in research across a variety of fields.

This Shiny project is an attempt to start building a visualization and statistical analysis toolbox that can operate on all these indicators, so that one can study cross-sectional as well as time series comparisons between indicators and countries. A header of the dataset is shown below. The data had more than 330,000 rows, one for each (Country, Indicator) combination, and each row was a time series ranging from 1960 to 2015. As expected, many values were missing for the earlier years.

I added two more columns, Indicator category and Currency, based on the accompanying documentation file. Besides data for more than 200 countries, the dataset also includes aggregate economies such as Arab World, Europe and Asia.

[Image: worldbankdata]
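Attaching the two extra columns amounts to a left join between the indicator table and a lookup table built from the documentation file. The following is a minimal sketch of that idea in base R, not the code used in the app: the column names, the lookup table and the values are all illustrative assumptions (the numbers are toy values, not World Bank figures). `dplyr::left_join` would do the same job.

```r
# Toy slice of the wide table: one row per (country, indicator) pair.
# Column names and values are illustrative assumptions, not real data.
wdi <- data.frame(
  CountryName   = c("China", "India", "China", "India"),
  IndicatorName = c("Fertility rate", "Fertility rate", "GDP", "GDP"),
  X2014         = c(1.0, 2.0, 3.0, NA),
  stringsAsFactors = FALSE
)

# Hypothetical lookup table derived from the documentation file.
indicator_info <- data.frame(
  IndicatorName = c("Fertility rate", "GDP"),
  Category      = c("Health", "Economy & Growth"),
  Currency      = c(NA, "USD"),
  stringsAsFactors = FALSE
)

# Left join: keep every data row and attach Category and Currency to it.
wdi_tagged <- merge(wdi, indicator_info, by = "IndicatorName", all.x = TRUE)
print(wdi_tagged)
```

Using `all.x = TRUE` keeps indicators that have no entry in the lookup table; they simply get `NA` in the new columns.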

Shiny makes it possible to build interactive visualization platforms in R without writing JavaScript. I wrote my first Shiny app as a single R file.

I used dygraphs (a time series visualization package), xts (a package for time series objects), and googleVis, the R interface to the Google Charts API, which lets users create interactive charts from data frames.

Shiny development consists roughly of two parts: designing the user interface, and coding the server component, where all the calculations and data crunching are done.

 

For the UI, I created three basic pages:

 

  • Indicator time series visualization across multiple countries. The example below shows one of the indicators, Fertility rate, across three economies: China, India and Japan. The slider can be used to zoom in on a particular range of years. Hovering the mouse over any of the lines shows the indicator values at that point, giving a cross-sectional comparison across the countries for that year.

[Image: shinypage1]

 

 

  • Choropleth maps for a chosen (Year, Indicator) combination, showing the variation of that indicator across countries, along with the top and bottom countries for the same combination.

[Images: shinypage21, shinypage22, shinypage23]

 

 

  • Bubble/scatter plot of two different indicators, animated across a slider of years (motion play).

[Image: shinypage3]
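A three-page layout like the one described above can be sketched with `navbarPage` and one `tabPanel` per page. This is only an outline under stated assumptions, not the app's actual UI code: the page titles, the input and output ids and the placeholder choices are all hypothetical.

```r
library(shiny)
library(dygraphs)

# Placeholder choices; a real app would fill these from the data set.
countries  <- c("China", "India", "Japan")
indicators <- c("Fertility rate")

ui <- navbarPage(
  "World Development Indicator Explorer",
  # Page 1: one indicator plotted over time for several countries.
  tabPanel(
    "Time Series",
    selectInput("ts_indicator", "Indicator", choices = indicators),
    selectInput("ts_countries", "Countries", choices = countries,
                selected = countries, multiple = TRUE),
    dygraphOutput("ts_plot")
  ),
  # Page 2: choropleth map for a (Year, Indicator) combination.
  tabPanel(
    "Map",
    selectInput("map_indicator", "Indicator", choices = indicators),
    sliderInput("map_year", "Year", min = 1960, max = 2015,
                value = 2000, sep = ""),
    htmlOutput("map_plot")      # googleVis charts are rendered as HTML
  ),
  # Page 3: bubble/scatter plot of two indicators played across years.
  tabPanel(
    "Motion Chart",
    selectInput("x_indicator", "X indicator", choices = indicators),
    selectInput("y_indicator", "Y indicator", choices = indicators),
    htmlOutput("motion_plot")
  )
)
```

The googleVis pages use `htmlOutput` because googleVis produces HTML and JavaScript rather than an R plot; on the server side they would typically be filled with `googleVis::renderGvis`.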

 

The underlying server implementation involved some data crunching using dplyr, converting the attribute selected by the user into a time series object (xts), and rendering the charts.
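The filter-then-convert step might look roughly like the sketch below. It assumes a long-format table with hypothetical columns `Country`, `Indicator`, `Year` and `Value` and uses toy values; the helper `indicator_xts` is an illustrative name, not a function from the app.

```r
library(dplyr)
library(xts)

# Toy long-format data; names and values are illustrative assumptions.
wdi_long <- data.frame(
  Country   = rep(c("China", "India"), each = 3),
  Indicator = "Fertility rate",
  Year      = rep(2000:2002, times = 2),
  Value     = c(1.1, 1.2, 1.3, 2.1, 2.2, 2.3),
  stringsAsFactors = FALSE
)

# Build an xts object with one column per selected country.
indicator_xts <- function(data, indicator_name, country_names) {
  selected <- data %>%
    dplyr::filter(Indicator == indicator_name,
                  Country %in% country_names) %>%
    dplyr::arrange(Year)

  # Year x Country matrix; missing combinations become NA.
  wide <- tapply(selected$Value,
                 list(selected$Year, selected$Country),
                 FUN = function(v) v[1])

  # xts needs a time-based index, so map each year to January 1st.
  dates <- as.Date(paste0(rownames(wide), "-01-01"))
  xts(wide, order.by = dates)
}

fert <- indicator_xts(wdi_long, "Fertility rate", c("China", "India"))
print(fert)

# Inside a Shiny server function, the result could feed a dygraph, e.g.
# (input and output ids are hypothetical):
# output$ts_plot <- dygraphs::renderDygraph({
#   indicator_xts(wdi_long, input$ts_indicator, input$ts_countries) %>%
#     dygraphs::dygraph() %>%
#     dygraphs::dyRangeSelector()
# })
```

Mapping each year to a date is one simple way to give annual data the time-based index that xts requires; `dyRangeSelector` adds the zoom slider under the chart.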

 

The app has been deployed at https://aravindkr.shinyapps.io/worldbank/ if anyone wishes to play with the data.

The app still needs a lot of improvement in UI design. There is also a lot of scope to expand this platform within the Shiny framework, so that it can serve as an initial visual or statistical input for quantitative research. That would ultimately make it useful for social scientists, economists and financial analysts, among many others, with the inclusion of:

  • Regression/time series modelling
  • Cointegration testing
  • Machine learning using both supervised and unsupervised methods, including cluster analysis of countries based on certain parameters and GDP prediction using random forests, neural nets and other advanced algorithms
  • Subset data set viewing/download capabilities

However, owing to the short duration of the bootcamp, I have not been able to devote much time to this particular project. If anyone is interested in this application and the World Development data and wishes to work towards expanding it into a complete application involving statistical modelling and machine learning, I am happy to collaborate at [email protected]

 

 

 

 

 

 

About Author

Aravind Kolumum Raja

Aravind obtained his Master's degree in Statistics from Columbia University in 2012 and is presently an Analyst with a global investment management firm based in New York. His primary interests are in Mathematics, Statistics & Machine learning. He...