Explore Global Insights in a World Atlas Shiny App
In April 2023, Posit released Shiny for Python. In doing so, it extended support for building Shiny apps from R to Python, alongside the tools and packages Posit offers for Python, Julia, and other languages. While that introduction was both exciting and potentially helpful, it was a challenge to get the most value out of it, because there were not many example apps to look at for inspiration and no community discussion online. To make the app, I first had to find an interesting and robust dataset to work with. I chose a Global Country Information Dataset and imagined what a nice Python Shiny app might look like.
Inspired by the viral TED Talk by Hans Rosling, The Best Stats You've Ever Seen, this World Atlas app was created to help users learn about the world visually. It is designed for users to select and learn about the countries and socioeconomic indicators they're most interested in from two standpoints: an interactive world map and a variety of graphs that show how the world has changed over time.
Challenges of the project
In building the Python Shiny app, you might assume (as I did) that grouping data for countries and regions accurately would be easy, but combining the two turned out not to be simple. Country names have spelling variations, and a lot of the data had to be cleaned manually.
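To illustrate the spelling-variation problem, here is a minimal sketch of one way to map variant country names onto a single canonical spelling before grouping. The alias table and the function names are hypothetical and are not taken from the app's code; a real table would be built up from whatever variants appear in the source datasets.

```python
import unicodedata

# Hypothetical alias table: normalized variant spelling -> canonical name.
ALIASES = {
    "ivory coast": "Côte d'Ivoire",
    "cote d'ivoire": "Côte d'Ivoire",
    "republic of korea": "South Korea",
    "usa": "United States",
    "united states of america": "United States",
}


def _key(name: str) -> str:
    """Lowercase, strip accents, and collapse whitespace so variants compare equal."""
    decomposed = unicodedata.normalize("NFKD", name)
    no_accents = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return " ".join(no_accents.lower().split())


def canonical_country(name: str) -> str:
    """Return the canonical name for a known variant, else the whitespace-tidied input."""
    return ALIASES.get(_key(name), " ".join(name.split()))
```

Names that are not in the table pass through unchanged, so anything the table misses still shows up later as an unmatched row rather than being silently altered.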
In doing some exploratory data analysis, I realized that the visualizations I wanted to create would require merging several data frames containing the historical data for each country whose socioeconomic factors would be plotted. The next step was merging these datasets with additional information: ISO alpha codes, which identify a country’s location on an interactive Plotly map; commonly used regions (APAC/AMER/EMEA); and a dataset with all six continents labeled. Adding ISO codes, regions, and continents allowed for more creativity in visualizing each country’s socioeconomic data for the past two decades.
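The merge described above can be sketched with pandas roughly as follows. The column names and the tiny data frames are assumptions made for illustration, and the indicator values are placeholders rather than real statistics; only the ISO codes and region labels reflect real conventions.

```python
import pandas as pd

# Historical indicator data: one row per country and year (placeholder values).
history = pd.DataFrame({
    "country": ["Kenya", "Brazil", "Japan", "Kenya"],
    "year": [2000, 2000, 2000, 2001],
    "indicator_value": [1.0, 2.0, 3.0, 1.5],
})

# Lookup table: one row per country with ISO code, region, and continent.
lookup = pd.DataFrame({
    "country": ["Kenya", "Brazil", "Japan"],
    "iso_alpha": ["KEN", "BRA", "JPN"],
    "region": ["EMEA", "AMER", "APAC"],
    "continent": ["Africa", "South America", "Asia"],
})

# A left merge keeps every historical row. validate= raises an error if the
# lookup has duplicate countries, and indicator= adds a "_merge" column that
# flags rows with no match (often a spelling mismatch).
merged = history.merge(
    lookup, on="country", how="left", validate="many_to_one", indicator=True
)
unmatched = merged.loc[merged["_merge"] == "left_only", "country"].unique().tolist()
merged = merged.drop(columns="_merge")
```

Checking `unmatched` after each merge is a cheap way to catch country names that still need cleaning before they disappear from a map.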
Other challenges included working with different types of numerical data, some with ranges far smaller or larger than others. To visualize the data correctly, each variable had to be matched with a suitable type of graph and with compatible variables. The information in each column of the data frames varies widely: percentage data, life expectancy (ranging from 40 to 90), rates per 1,000 population for birth rate and maternal mortality, and GDP in billions of price-adjusted US dollars.
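One way to cope with indicators whose ranges differ by orders of magnitude is to choose the axis scale from the data itself. The sketch below is an illustrative heuristic rather than the app's actual logic; the function name and the threshold of 1,000 are assumptions.

```python
def suggest_axis_scale(values, ratio_threshold=1000.0):
    """Suggest "log" when strictly positive values span a very wide range, else "linear"."""
    values = list(values)
    # A log axis cannot show zero or negative values, so fall back to linear.
    if not values or any(v <= 0 for v in values):
        return "linear"
    spread = max(values) / min(values)
    return "log" if spread >= ratio_threshold else "linear"
```

A narrow range such as life expectancy (40 to 90) stays linear, while a column like GDP that runs from a fraction of a billion to tens of thousands of billions would be flagged for a log axis, which Plotly Express exposes through the `log_x` and `log_y` arguments.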
Technical challenges that arose included:
- Creating a requirements.txt file with package versions that let the dependencies work together smoothly and would not break the app for users who want to run it in their local environment
- Fixing environment problems that surfaced when VS Code needed the correct Python version running for the app.py file and a separate environment for exploratory data analysis in a Jupyter notebook
- Finding missing packages for certain graphs: “statsmodels” had to be installed and added to the requirements.txt file, whereas comparable functionality is already part of R
- Learning to create a user interface and write UI code with Python Shiny (some UI code had already been deprecated after Python Shiny had been live for less than a year, and the code was a little different from the very few existing apps that could be used for inspiration)
- Working without any online examples of Python Shiny apps that used Plotly geo-visualizations
- Writing some HTML and CSS to tidy up the style and design of the final app version
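The missing-package problem in the list above can be caught before the app starts. Here is a minimal sketch using only the standard library; the `REQUIRED` list is a hypothetical set of import names, not the app's actual requirements.txt.

```python
from importlib.util import find_spec


def missing_packages(names):
    """Return the top-level import names that cannot be found in this environment."""
    return [name for name in names if find_spec(name) is None]


# Hypothetical import names the app might depend on.
REQUIRED = ["shiny", "plotly", "pandas", "statsmodels"]

if __name__ == "__main__":
    missing = missing_packages(REQUIRED)
    if missing:
        print("Install before running the app:", ", ".join(missing))
```

Running a check like this in each environment (the one for app.py and the one for the notebook) makes it clearer which interpreter is missing what.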
Deciding on the final view of the app was difficult. I had three goals:
1 - Turn the data into neat visualizations that serve as an interactive learning experience,
2 - Create a great user experience with a logical flow of information, and
3 - Produce visually-appealing maps.
I tried ipyleaflet basemaps, GeoPandas, and Folium, and eventually settled on Plotly’s Mapbox maps. At first, I thought it would look interesting to have the option to view multiple ipyleaflet basemaps on one navigation tab. In the end, though, I opted for Plotly’s Mapbox in the earth-terrain style for the home page and a second page with either a white base map or a dark base map, depending on the time of day: after 5 pm, the dark map appears. Overall, Plotly’s Mapbox was not just the most functional and intuitive choice; it also offered the best aesthetic for displaying geo-visualization data. In contrast, ipyleaflet did not have much documentation, and some of its basemaps had been deprecated and were no longer available as of October 2023.
When it came to creating the data visualizations, I once again found Plotly far more effective than pandas, matplotlib, and seaborn: users of the app can zoom in and out of Plotly base maps and interact with Plotly’s wide range of graphs, which helps them understand the information better than static graphs would.
Data for this project was sourced from Kaggle, Gapminder, and the World Bank. I also used a few other sources found online to add historical CO2 emissions, four-letter region codes, ISO codes, and continents. If I had to do it over again, I would pull more data straight from Gapminder, which has a lot of really rich historical datasets that could have provided decades’ worth of depth on my historical data page. In retrospect, though, it was good practice merging the datasets to create the historical data page.
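The time-of-day switch between the white and dark base maps can be sketched as a small pure function. The style names below are built-in Plotly map styles chosen as assumptions, since the post does not say which styles the app uses, and treating exactly 5:00 pm as "dark" is also an assumption.

```python
from datetime import datetime, time

DARK_FROM = time(17, 0)           # 5 pm, the cutoff mentioned in the text
LIGHT_STYLE = "carto-positron"    # assumed light base map style
DARK_STYLE = "carto-darkmatter"   # assumed dark base map style


def map_style_for(now: datetime) -> str:
    """Return the dark style from 5 pm onward, otherwise the light style."""
    return DARK_STYLE if now.time() >= DARK_FROM else LIGHT_STYLE
```

Passing the current time in as an argument keeps the function easy to test, and it leaves the choice of clock (the server's or the user's) to the caller.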
Discovering questions and answers
In the course of working with the original dataset and then creating a second dataset with historical data, some research questions emerged naturally:
1 - How can the world’s information be simplified with geo-visualization?
2 - What insights can be found in the data from 2023 with interactive data visualizations?
3 - What insights, patterns, and trends in the historical data led us to the current global status?
These questions can be answered by the user of the app who explores the maps to understand the world data through geo-visualizations. The interactive visualizations of current-year data shed light on the state of the world in 2023, and the final page of the app can be used to explore interactive Plotly graphs with historical data.
Who would find the app useful
I kept the app’s ideal users in mind while developing it. At first, the idea was to build a useful app for educators and researchers. At its current stage, there are a few ideal users:
- Data-Driven Decision-Makers: People who need socioeconomic data analysis and geo-visualizations.
- Educators and Advocates: Specialists who use technology and data for education and advocacy.
- Global Health Experts: Professionals who need annual or up-to-date health and environmental data.
- Economic Analysts and Environmentalists: Consultants who require data on economic and environmental trends.
If further developed, the app could have an even wider range of users.
Key takeaways and insights
Here are some of the most interesting insights I found in my own research:
Higher education correlates with lower birth rates around the world. This could reflect a couple of things. One is that women around the world have more opportunity to get an education and work, and subsequently have fewer children. I also believe it may reflect the amount of debt higher education creates, which may leave people reluctant to incur the additional expense of a large family.
Here we can see that certain countries in Africa, part of the EMEA region, experience higher infant and maternal mortality rates as their birth rates rise.
This chart shows the highest levels of health expenditure in the world and highlights a stark contrast in what someone might expect to spend when receiving health care abroad.
Here we see the wide range of countries’ priorities when it comes to producing food. The size of each bubble represents the country’s population, which paints an interesting picture when compared with the percentage of land devoted to agriculture.
In this graph, I wanted to portray the countries in which people enjoy the greatest longevity. The top 30 countries all have an average life expectancy above 80 years. However, economic superpowers with the largest populations, such as the US, China, and India, did not pass the 80-year mark. This surprising insight suggests that having one of the strongest economies does not correlate with having one of the healthiest, longest-living societies.
The higher the population, the more CO2 emissions. This graph shows the clear correlation between each country’s population size and its CO2 emissions. Not pictured: US, China, India.
This graph was made possible by the distinct regional codes, which allowed me to split countries into three subsets. It paints an interesting picture of which countries maintain a large area of forest and which countries prioritize food production.
In this graph, users can hover over each country's two bars, which represent education levels in the world's biggest economies.
Overall, I found the Plotly documentation pretty helpful; it can be viewed here: Plotly Scatter Mapbox documentation.