Don't Know Much About History: Visualizing the Scale of Major 20th Century Conflicts (Overview)

Posted on May 1, 2017

Check out the app here while reading the article.

 

Executive Summary:

I built a comprehensive app to plot, filter, and analyze World War Two and other major 20th century conflict data using R Shiny. Users can investigate these historical events spatially (on maps), temporally (through histogram data), and relationally (through scatter and bar plots), and even animate the progression of these conflicts over time.

 

Skip straight to the technical parts! (separate blog post)

 

I. Introduction

 

History is about human experiences of the past, and few of us have access to these shared stories anymore.

 

Sure, my grandfather served in the Navy in the Pacific Theater, but he died before I was born. I did get to hear a few stories from my great uncles, who flew supply runs during World War Two, but I really don't have a sense of what living through this period in history was like. Neither do most people alive at this point.

For this first project, I wanted to use the power of data visualization in R through Shiny to give people a sense of the immense scale of the conflicts of the 20th century that I can hardly understand myself.

 

(Why Shiny? Because Shiny makes “app-ifying” data through R incredibly easy while still providing comprehensive functionality—I’d highly recommend it.)

 

II. Motivations - Why I Chose These Data

 

I obtained these datasets, which were published within the past several months, from data.mil. I was frankly shocked to see the US military jumping on the Open Data bandwagon, so I couldn't help but take a peek at the data. Within them I found an enormous collection of data points that each carry heavy weight as a human story. I decided to make a tool that lets anyone passingly curious about, say, WW2 history explore it in an interactive and intuitive way.

 

The data themselves are records of aerial bombing operations performed by the United States military (reportedly every single bombing run made, with a few caveats), along with a small collection of records of operations performed by other nations in conjunction with the US military. They cover four important conflicts of the 20th century: World War One, World War Two, the Korean War, and the Vietnam War (or “Vietnam Era Conflict”, as it used to be classified by the Library of Congress because an official declaration of war was never made). I have no way of proving that these data are exactly representative of the larger conflicts (though data.mil insists they are comprehensive), so I will shy away from making large-scale comparisons. Still, the data appear representative and complete enough for me to feel confident in the utility of the tool I've created. Each point represents a story that had significant impact on the people and places involved, even if much of the residue of these conflicts has been washed away by the ages, as craters have been filled in and entire cities have been rebuilt. I implore the user to dive into the individual point data first to best get a sense of the importance of this project.

 

These datasets also posed significant challenges for me as a programmer: many numerical data were missing, inaccurate, or incomplete; textual entries were difficult to process; and the sheer volume of the data (nearly 5 million observations altogether, across hundreds of columns) required special care to make an exploratory tool that people would enjoy using.

 

My hope is that anyone, from the amateur historian to the student with a passing curiosity about these histories, will be able to develop a sense of these events through this tool.

 

III. Overview

 

One of the most important senses I want to give people is a sense of scale, both in terms of space and magnitude or intensity.

 

To give people a sense of the sheer amount of land (and sea) area affected by these conflicts, I've plotted the unique targets of aerial bombing operations for all of these wars on a central "overview" map. Each conflict has a separate color to make comparisons easy (compare WW1, when airplanes were just beginning to be used in combat, with WW2, just some 25 years later, for instance). The data are sampled heavily in order to make plotting (especially within the platform of Shinyapps.io) feasible. I have also allowed the opacity to change based on how many points are plotted for improved visibility: notice that a few points appear distinct, while many points blend into a cloud. Even just from a bird's eye view, you can see how much territory was affected by these conflicts and how each conflict was distributed across space and time. Feel free to adjust the map and label style to your liking as you explore the spatial distributions.
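
To make the sampling and opacity idea concrete, here is a minimal sketch in Python. The app itself is written in R Shiny and this is not its code: the function names, the seed, and the `scale` constant that controls how quickly points fade are all assumptions chosen for illustration.

```python
import random


def sample_points(points, max_points, seed=0):
    """Return at most max_points items, drawn uniformly without replacement."""
    points = list(points)
    if len(points) <= max_points:
        return points
    # A fixed seed keeps the plotted subset stable between redraws.
    return random.Random(seed).sample(points, max_points)


def opacity_for(n_points, min_opacity=0.1, max_opacity=0.9, scale=1000):
    """Opacity shrinks from max_opacity toward min_opacity as n_points grows.

    `scale` (hypothetical) is the point count at which opacity sits halfway
    between the two bounds.
    """
    if n_points < 0:
        raise ValueError("n_points must be non-negative")
    fraction = scale / (scale + n_points)
    return min_opacity + (max_opacity - min_opacity) * fraction
```

With these assumed defaults, a handful of points is drawn almost opaque, while tens of thousands of points are drawn nearly transparent, so dense regions read as a cloud instead of a solid blob.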

 

I wanted to give people an otherworldly sense of magnitude by displaying the number of missions flown, bombs dropped, and weight of bombs dropped in the info boxes at the top. These update based on your selections of wars to include and specific date ranges to inspect. You may find a few noticeable distinctions between the different wars based on the magnitude of their bombing campaigns.

It should be noted that the weight shown in pounds is just the weight of the deliverable or the warhead itself—that is, a 10-pound explosive propelled by 90 pounds of rocket fuel would be counted as a 10-pound bomb. Astute observers may have already noticed the relative magnitude of the atomic bombs dropped near the end of the Second World War as well—they have been listed according to their weight in TNT explosive equivalent.
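
The info box totals and the payload-only weight rule can be sketched as follows. This is an illustrative Python example, not the app's R code; the `Mission` fields and both function names are hypothetical, and the records used with it would be made up.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Mission:
    war: str           # e.g. "WW2" (hypothetical label)
    day: date
    bombs: int
    payload_lb: float  # deliverable/warhead weight only, no propellant


def payload_weight(gross_lb, propellant_lb):
    """Weight counted toward the totals: gross weight minus propellant."""
    return gross_lb - propellant_lb


def summarize(missions, wars, start, end):
    """Totals for the selected wars within the inclusive date range."""
    selected = [m for m in missions if m.war in wars and start <= m.day <= end]
    return {
        "missions": len(selected),
        "bombs": sum(m.bombs for m in selected),
        "pounds": sum(m.payload_lb for m in selected),
    }
```

Under this rule, the rocket from the example above (10 pounds of explosive plus 90 pounds of fuel) contributes `payload_weight(100, 90)`, that is, 10 pounds, to the total.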

 

Each point comes with a clickable tooltip that provides a little snippet of the event as far as the data can show. The text has been heavily processed to appear reasonable and well-formatted to the human eye, and missing or incomplete data have been replaced with general descriptions. I consider this a key feature of the app, as many memorials around the world are made most effective by showing individual names and stories. Often a small, digestible snippet is more impactful than a full report: there's simply too much information in the complete record, and focusing on a few small points that one can grasp (along with an understanding of the scale of the conflict) can provide the best intuitive understanding.
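
As a rough illustration of replacing missing fields with general descriptions, here is a small Python sketch. It is not the app's R code; the record keys (`date`, `aircraft`, `target_city`) and the fallback phrases are assumptions made up for this example.

```python
def tooltip_text(record):
    """Build a one-sentence tooltip, substituting generic text for gaps."""

    def clean(value, fallback):
        # Treat None and whitespace-only strings as missing.
        text = "" if value is None else str(value).strip()
        # Raw entries are often all caps, so normalize the capitalization.
        return text.title() if text else fallback

    target = clean(record.get("target_city"), "an unspecified target")
    aircraft = clean(record.get("aircraft"), "unidentified aircraft")
    day = record.get("date") or "an unknown date"
    return f"On {day}, {aircraft} bombed {target}."
```

Keeping the fallbacks generic means every point still yields a readable sentence, even when most of its fields are empty.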

 

IV. Sandbox

 

I mentioned that I wanted to create a tool for exploring history through data, and while the overview plots are intuitive and informative, I wanted to give any user, regardless of programming skill, the ability to further investigate trends and patterns in the data. I call it the sandbox, as it's a tool that allows the user to create new things in an exploratory and creative manner. It works as a hypothesis generator, letting people explore the relationship between any two variables they'd like.

 

V. Animation

 

I also wanted to give users a sense of the progression of these conflicts, so I added an animation component to all of the graphs and maps. The app will automatically cycle through the entire date range of the conflicts selected by year, by month, or by week, to show the different stages of each war. For instance, with World War Two, you can see the European Theatre open up before the Pacific Theatre, followed by the liberation of France, followed by the sudden crescendo of bombings throughout Japan, culminating in the atomic bombings at Hiroshima and Nagasaki.
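
The stepping behind such an animation can be sketched like this. It is a Python illustration under my own assumptions, not the app's R code: the function name and the choice to snap monthly and yearly frames to the first day of the period are hypothetical.

```python
from datetime import date, timedelta


def frame_starts(start, end, step):
    """Start dates of successive animation frames covering start..end."""
    if start > end:
        raise ValueError("start must not be after end")
    if step == "week":
        frames = []
        current = start
        while current <= end:
            frames.append(current)
            current += timedelta(weeks=1)
        return frames
    if step == "month":
        frames = []
        year, month = start.year, start.month
        while (year, month) <= (end.year, end.month):
            frames.append(date(year, month, 1))
            month += 1
            if month > 12:
                year, month = year + 1, 1
        return frames
    if step == "year":
        return [date(y, 1, 1) for y in range(start.year, end.year + 1)]
    raise ValueError("step must be 'week', 'month', or 'year'")
```

Each frame would then show only the records dated from one start up to the next, so playing the frames in order traces the stages of a war.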

 

VI. Historical Interests

 

Have you ever wondered where the expression “taking flak” comes from? Watch a “How to Avoid Flak” training video (genuinely used to train World War Two pilots) in the Pilot tab. For an overview of major battles and events, see the Commander tab. For an eye on the sky as a civilian, see a heatmap of bombing intensity in the Civilian tab.

 

Feedback

 

Needless to say, if you have any suggestions (or find any errors in the app or its code), please don’t hesitate to contact me.

 

Dig Deeper

 

That’s it for the surface of the project. The true magic lies in what’s under the hood: how the data were prepared. To learn more about that, see my technical blog post on this project here.

About Author


Scott Dobbins

Scott Dobbins is a trained biochemist (B.A. Columbia 2011, M.S. Stanford 2013) with a specialty in international education and teaching. He is fluent in multiple languages and learns best by figuring out how to translate difficult concepts into...
