Don't Know Much About History: Visualizing the Scale of Major 20th Century Conflicts (Overview)

Posted on May 1, 2017

Check out the app here while reading the article.

Executive Summary:

I built a comprehensive app to plot, filter, and analyze World War Two and other major 20th century conflict data using R Shiny. Users can investigate these historical events spatially (on maps), temporally (through histogram data), and relationally (through scatter and bar plots), and even animate the progression of these conflicts over time.

Skip straight to the technical parts! (separate blog post)

I. Introduction

History is about human experiences of the past, and few of us have access to these shared stories anymore.

Sure, my grandfather served in the Navy in the Pacific Theater, but he died before I was born. I did get to hear a few stories from my great uncles, who flew supply runs during World War Two, but I really don't have a sense of what living through this period in history was like. Neither do most people alive today.

For this first project, I wanted to use the power of data visualization in R through Shiny to give people a sense of the immense scale of the conflicts of the 20th century that I can hardly understand myself.

(Why Shiny? Because Shiny makes “app-ifying” data through R incredibly easy while still providing comprehensive functionality—I’d highly recommend it.)

II. Motivations - Why I Chose These Data

These datasets were published only within the past several months. I was frankly shocked to see the US military jumping on the Open Data bandwagon, so I couldn't help but take a peek. Within the data I found an enormous collection of points, each of which carries heavy weight as a human story. I decided to make a tool that lets anyone with even a passing curiosity about, say, WW2 history explore that history in an interactive and intuitive way.

The data themselves are records of aerial bombing operations performed by the United States military (reportedly every single bombing run made, with a few caveats), along with a small collection of records of aerial bombing operations performed by other nations in conjunction with the US military. They cover four important conflicts of the 20th century: World War One, World War Two, the Korean War, and the Vietnam War (or “Vietnam Era Conflict”, as it used to be classified by the Library of Congress because an official declaration of war was never made).

I have no way of proving that these data are exactly representative of the larger conflicts (though the publisher insists they are comprehensive), so I will shy away from making large-scale comparisons. Still, the data appear representative and complete enough for me to feel confident in the utility of the tool I've created. Each point represents a story that had significant impact on the people and places involved, even if much of the residue of these conflicts has been washed away by the ages, as craters have been filled in and entire cities have been rebuilt. I implore the user to dive into the individual point data first to best get a sense of the importance of this project.

These datasets also posed significant challenges for me as a programmer: many numerical data were missing, inaccurate, or incomplete; textual entries were difficult to process; and the sheer volume of the data (nearly 5 million observations across hundreds of columns) required a special level of care to make an exploratory tool that people would enjoy using.

My hope is that anyone—from the amateur historian to the student with a passing curiosity in these histories—will be able to develop a sense of these events through this tool.

III. Overview

One of the most important senses I want to give people is a sense of scale, both in terms of space and magnitude or intensity.

To give people a sense of the sheer amount of land (and sea) area affected by these conflicts, I've plotted the unique targets of aerial bombing operations for all of these wars on a central "overview" map. Each conflict has a separate color to make comparisons easy (compare WW1, when airplanes were just beginning to be used in combat, with WW2, just some 25 years later, for instance). The data are sampled heavily to make plotting feasible, especially within the platform the app runs on. I have also made the opacity change with the number of points plotted to improve visibility: a few points appear distinct, while many points blend into a cloud. Even from a bird's-eye view, you can see how much territory was affected by these conflicts and how each conflict was distributed across space and time. Feel free to adjust the map and label style to your liking as you explore the spatial distributions.
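As a rough illustration of the sampling and opacity ideas described above, here is a minimal sketch in Python. It is not the app's actual R code: the function names, the opacity bounds, and the `scale` constant are all assumptions chosen for the example.

```python
import random


def sample_points(points, max_points, seed=0):
    """Return at most max_points items chosen uniformly at random.

    Hypothetical helper: the real app's sampling scheme is not described
    in detail, so a plain uniform sample is assumed here.
    """
    points = list(points)
    if len(points) <= max_points:
        return points
    rng = random.Random(seed)  # fixed seed keeps the map stable between redraws
    return rng.sample(points, max_points)


def point_opacity(n_points, min_opacity=0.1, max_opacity=0.9, scale=1000):
    """Lower the per-marker opacity as more markers share the map.

    The bounds and the scale constant are illustrative assumptions.
    """
    if n_points <= 0:
        return max_opacity
    opacity = max_opacity * scale / (scale + n_points)
    # Clamp so markers never vanish entirely or exceed the maximum.
    return max(min_opacity, min(max_opacity, opacity))


if __name__ == "__main__":
    targets = [(lat, lon) for lat in range(-60, 60) for lon in range(-180, 180)]
    shown = sample_points(targets, max_points=2000)
    print(len(shown), point_opacity(len(shown)))
```

With a scheme like this, a handful of points stay nearly opaque and easy to pick out, while tens of thousands of points each become faint enough that dense regions read as a cloud.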

I wanted to give people an otherworldly sense of magnitude by displaying the number of missions flown, bombs dropped, and weight of bombs dropped in the info boxes at the top. These update based on your selections of wars to include and specific date ranges to inspect. You may find a few noticeable distinctions between the different wars based on the magnitude of their bombing campaigns.

It should be noted that the weight shown in pounds is just the weight of the deliverable or the warhead itself—that is, a 10-pound explosive propelled by 90 pounds of rocket fuel would be counted as a 10-pound bomb. Astute observers may have already noticed the relative magnitude of the atomic bombs dropped near the end of the Second World War as well—they have been listed according to their weight in TNT explosive equivalent.

Each point comes with a clickable tooltip that provides a little snippet of the event as far as the data can show. The text has been heavily processed to appear reasonable and well-formatted to the human eye, and missing or incomplete data have been replaced with general descriptions. I consider this a key feature of the app, as many memorials around the world are made most effective by showing individual names and stories. Often a small, digestible snippet is more impactful than a full report: there's simply too much information in the complete record, and focusing on a few small points that one can grasp (along with an understanding of the scale of the conflict) can provide the best intuitive understanding.

IV. Sandbox

I mentioned that I wanted to create a tool for exploring history through data, and while the overview plots are intuitive and informative, I wanted to give any user, regardless of programming skill, the ability to further investigate trends and patterns in the data. I call it the sandbox, as it's a tool that allows the user to create new things in an exploratory and creative manner. It works as a hypothesis generator, letting people explore trends between whichever two variables they'd like.

V. Animation

I also wanted to give users a sense of the progression of these conflicts, so I added an animation component to all of the graphs and maps. The app will automatically cycle through the entire date range of the conflicts selected by year, by month, or by week, to show the different stages of each war. For instance, with World War Two, you can see the European Theatre open up before the Pacific Theatre, followed by the liberation of France, followed by the sudden crescendo of bombings throughout Japan, culminating in the atomic bombings at Hiroshima and Nagasaki.
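The cycling described above can be sketched as a generator that walks a date range one frame at a time. This is a minimal Python illustration, not the app's actual R implementation; the function name `frame_starts` and the choice to snap later frames to calendar boundaries are assumptions.

```python
from datetime import date, timedelta


def frame_starts(start, end, step):
    """Yield the first day of each animation frame from start through end.

    step is "year", "month", or "week". After the first frame, monthly and
    yearly frames snap to the first day of the next month or year
    (an assumption made for this sketch).
    """
    if step not in ("year", "month", "week"):
        raise ValueError("step must be 'year', 'month', or 'week'")
    current = start
    while current <= end:
        yield current
        if step == "week":
            current += timedelta(weeks=1)
        elif step == "month":
            # December rolls over into January of the following year.
            year = current.year + current.month // 12
            month = current.month % 12 + 1
            current = date(year, month, 1)
        else:
            current = date(current.year + 1, 1, 1)


if __name__ == "__main__":
    for frame in frame_starts(date(1944, 11, 20), date(1945, 1, 10), "month"):
        print(frame.isoformat())
```

Each yielded date would then serve as the lower bound of the filter applied to the maps and graphs for that frame, with the next yielded date as the upper bound.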

VI. Historical Interests

Have you ever wondered where the expression “taking flak” comes from? Watch a “How to Avoid Flak” training video (genuinely used for World War Two pilots) in the Pilot tab. For an overview of major battles and events, see the Commander tab. For an eye on the sky as a civilian, see a heatmap of bombing intensity in the Civilian tab.


Needless to say, if you have any suggestions (or find any errors in the app or its code), please don’t hesitate to contact me.

Dig Deeper

That’s it for the surface of the project. The true magic lies in what’s under the hood: how the data were prepared. To learn more about that, see my technical blog post on this project here.

About Author

Scott Dobbins

Scott Dobbins is a trained biochemist (B.A. Columbia 2011, M.S. Stanford 2013) with a specialty in international education and teaching. He is fluent in multiple languages and learns best by figuring out how to translate difficult concepts into...