World Development Indicator Explorer - Shiny
Contributed by Aravind Kolumum Raja. He attended the NYC Data Science Academy 12-week full-time Data Science Bootcamp, which ran from January 11th to April 1st, 2016. This post is based on his second class project, R Shiny (due in the 4th week of the program).
The World Bank collects and processes large amounts of data and produces further estimates from economic models. These data and models have gradually been made available to the public in a way that encourages reuse. The World Development Indicators are the primary World Bank collection of development indicators, compiled from officially recognized international sources. The collection presents the most current and accurate global development data available, and includes national, regional and global estimates.
The data, spanning 1960 to 2015 across 214 economies, covers a wide range of indicators in domains including Agriculture & Rural Development, Aid Effectiveness, Climate Change, Economy & Growth, Education, Energy & Mining, Environment, External Debt, Financial Sector, Gender, Health, Infrastructure, Labor & Social Protection, Poverty, Private Sector, Public Sector, Science & Technology, Social Development, Trade and Urban Development. The dataset contains 1,346 unique indicators, which have been analyzed and used in research across a variety of domains.
This Shiny project is an attempt to start building a visualization and statistical analysis toolbox that can operate on all of these indicators, so that one can study cross-sectional as well as time series comparisons between indicators and countries. A header of the dataset is shown below. The data had more than 330,000 rows, one for each (Country, Indicator) combination, each holding a time series from 1960 to 2015. The data contained many missing values in the earlier years (as expected).
I added two more columns, Indicator category and Currency, based on the accompanying documentation file. Besides data for more than 200 countries, the dataset also includes group economies such as Arab World, Europe, Asia, etc.
Shiny makes it possible to build interactive visualization applications in R without writing any JavaScript. So I wrote my first Shiny app using a single R file.
I used the time series visualization package (dygraphs), the time series package (xts), and googleVis, the R interface to the Google Charts API, which allows users to create interactive charts based on data frames.
Shiny development roughly divides into designing the user interface and coding the server component, where all the calculations and data crunching are done.
For the UI, I created three basic pages:
- Indicator time series visualization across multiple countries. The example below shows just one of the indicators, Fertility rate, across three different economies: China, India and Japan. The slider can be used to zoom in or focus on a particular range of years. Hovering the mouse over any of the lines shows the value of the indicator at that point, giving a cross-sectional comparison across the countries for that year.
- Choropleth maps for a chosen (Year, Indicator) combination, showing the variation of that indicator across countries along with the top and bottom countries for the same combination.
- Bubble/scatter plot of two different indicators across a slider of years (motion play).
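The three pages described above could be laid out roughly as follows. This is only a minimal sketch, not the app's actual UI code: the input and output IDs, the placeholder `countries` and `indicators` vectors, and the choice of `navbarPage` are all assumptions made for illustration.

```r
library(shiny)

# Hypothetical choices; a real app would derive these from the dataset.
countries  <- c("China", "India", "Japan")
indicators <- c("Fertility rate", "GDP per capita")

ui <- navbarPage(
  "World Development Indicator Explorer",

  # Page 1: one indicator over time, compared across several countries
  tabPanel(
    "Time Series",
    selectInput("ts_indicator", "Indicator", choices = indicators),
    selectInput("ts_countries", "Countries", choices = countries,
                selected = countries, multiple = TRUE),
    dygraphs::dygraphOutput("ts_plot")
  ),

  # Page 2: map for one (Year, Indicator) pair plus top/bottom countries
  tabPanel(
    "Choropleth Map",
    selectInput("map_indicator", "Indicator", choices = indicators),
    sliderInput("map_year", "Year", min = 1960, max = 2015,
                value = 2010, step = 1, sep = ""),
    htmlOutput("map"),          # googleVis charts render into htmlOutput
    tableOutput("top_bottom")
  ),

  # Page 3: two indicators against each other, animated over the years
  tabPanel(
    "Bubble Chart",
    selectInput("x_indicator", "X indicator", choices = indicators),
    selectInput("y_indicator", "Y indicator", choices = indicators,
                selected = indicators[2]),
    sliderInput("bubble_year", "Year", min = 1960, max = 2015,
                value = 1960, step = 1, sep = "", animate = TRUE),
    htmlOutput("bubble")
  )
)
```

Each `tabPanel` corresponds to one of the pages in the list, and the `animate = TRUE` slider is one simple way to get a "play" button for stepping through the years.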
The underlying server implementation involved some data crunching with dplyr, converting the attribute selected by the user into a time series object (xts), and rendering the charts.
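As a rough illustration of that server-side step, the sketch below filters a long-format table with dplyr and converts the result into an xts object with one column per country. The column names (`Country`, `Indicator`, `Year`, `Value`), the helper `to_xts`, and the toy values are assumptions for this example, not the app's real data or code; tidyr is used here for the reshaping as a convenience.

```r
library(dplyr)
library(tidyr)
library(xts)

# Toy stand-in for the long-format data; the values are placeholders,
# not real World Bank figures.
wdi <- data.frame(
  Country   = rep(c("China", "India", "Japan"), each = 3),
  Indicator = "Fertility rate",
  Year      = rep(1960:1962, times = 3),
  Value     = 1:9
)

# Hypothetical helper: subset one indicator for the chosen countries and
# return an xts object indexed by year, with one column per country.
to_xts <- function(data, indicator, countries) {
  wide <- data %>%
    filter(Indicator == indicator, Country %in% countries) %>%
    select(Country, Year, Value) %>%
    pivot_wider(names_from = Country, values_from = Value) %>%
    arrange(Year)

  xts(as.matrix(wide[, -1]),
      order.by = as.Date(paste0(wide$Year, "-01-01")))
}

series <- to_xts(wdi, "Fertility rate", c("China", "India"))
print(series)
```

Inside a Shiny server function, an object like `series` could then be passed to `dygraphs::dygraph()` within `dygraphs::renderDygraph()`, with `dygraphs::dyRangeSelector()` providing the zoom slider mentioned earlier.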
The app has been deployed at https://aravindkr.shinyapps.io/worldbank/ if anyone wishes to play with the data.
The app still needs many improvements in UI design. There is also a lot of scope to expand this platform into something that would ultimately be useful for social scientists, economists and financial analysts, among many others, and that could serve as an initial visual or statistical input for quantitative research. Possible additions within this Shiny framework include:
- Regression/time series modelling
- Cointegration testing
- Machine learning using both supervised and unsupervised methods, including cluster analysis of countries based on certain parameters and GDP prediction using random forests, neural nets and other advanced algorithms
- Subset data set viewing/download capabilities
Owing to the short duration of the bootcamp, I have not been able to devote much time to this particular project. However, if anyone is interested in this application and the World Development data and wishes to work towards expanding it into a complete application involving statistical modelling and machine learning, I am happy to collaborate at kr.aravind@gmail.com