Air Crash Investigation

George Alster
Posted on May 12, 2019


This Air Crash Investigation post is an introduction to, and summary of, the Shiny app I developed. The app was built as an interactive analysis of historical airplane accidents since 1908. To check out the app, please click the link here!

For an insight into the code used to develop the app, please follow this link to my GitHub page.


The motivation for this project comes in two parts. The first is from the customer's point of view, namely, mine. It is not uncommon to be a nervous flyer and to have worst-case scenarios pop into your head as you experience some mid-flight turbulence. To calm my nerves, I looked to statistics. The dataset used for this project contains just over five thousand observations recorded over a one-hundred-year period and, according to multiple sources, there are approximately one hundred thousand commercial flights daily. Doing this simple math quickly shows how infrequent these accidents are.
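As a rough illustration of that simple math, here is a back-of-the-envelope sketch in Python (the app itself is written in R Shiny; this snippet is not taken from it). It uses only the rounded figures quoted above and assumes, as a deliberate simplification, that today's daily flight volume applies to every year, which overstates the number of historical flights. The variable names are my own.

```python
# Back-of-the-envelope comparison using the rounded figures quoted in the post.
# Assumption: the present-day flight volume is applied to every year, which
# is a simplification (there were far fewer flights in earlier decades).
ACCIDENTS_RECORDED = 5_000   # "just over five thousand observations"
YEARS_COVERED = 100          # "a one-hundred-year period"
FLIGHTS_PER_DAY = 100_000    # approximate commercial flights per day

accidents_per_year = ACCIDENTS_RECORDED / YEARS_COVERED
flights_per_year = FLIGHTS_PER_DAY * 365
accidents_per_million_flights = accidents_per_year / flights_per_year * 1_000_000

print(f"Recorded accidents per year: {accidents_per_year:.0f}")
print(f"Flights per year at today's volume: {flights_per_year:,}")
print(f"Accidents per million flights: {accidents_per_million_flights:.2f}")
```

Under these assumptions, the result is on the order of one recorded accident per million flights. It should be read as an order-of-magnitude estimate only.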

Nevertheless, we would of course like to reduce this probability to zero. So how can this data help? Analysis of this data gives an insight into the count of accidents by both type of vehicle and airline. It also highlights the fatality tally for these two filters, and the results can additionally be filtered by year. Therefore, we can observe which types of vehicle and/or airlines are most frequently involved in incidents. For an aviation safety consultancy, this information helps to target where to focus.

The data

The data used for this project consists of over 5,000 observations and focuses on seven variables: Date, Location, Type, Operator, Fatalities, Ratio and Ground. The ratio is the proportion of fatalities to those on board the plane, and the ground variable is the number of fatalities that occurred on the ground as a result of the incident. During the development of this app, the original dataset was merged with a second dataset to obtain the latitude and longitude of each location where an accident occurred. This enabled development of the interactive globe in the first tab.
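The ratio and the merge described above can be sketched as follows. This is a minimal Python illustration, not the code used in the app (which is written in R). The `Aboard` field, the `Latitude` and `Longitude` field names, the sample rows and the coordinate lookup are all made up for the example. The merge is written as a left join, so accidents whose location has no match keep empty coordinates rather than being dropped.

```python
# Hypothetical sample rows; "Aboard" is an assumed field name for the number
# of people on board, and the place names and coordinates are invented.
accidents = [
    {"Location": "Placeville", "Fatalities": 3, "Aboard": 12},
    {"Location": "Sampletown", "Fatalities": 0, "Aboard": 40},
    {"Location": "Unknown Field", "Fatalities": 2, "Aboard": 0},
]
coordinates = {
    "Placeville": (10.0, 20.0),
    "Sampletown": (-5.5, 30.25),
}


def fatality_ratio(fatalities, aboard):
    """Proportion of fatalities to people on board, or None if nobody is recorded aboard."""
    if not aboard:
        return None
    return fatalities / aboard


def add_ratio_and_coordinates(rows, coords):
    """Left-join each accident with its coordinates and attach the fatality ratio."""
    enriched = []
    for row in rows:
        lat, lon = coords.get(row["Location"], (None, None))
        enriched.append({
            **row,
            "Ratio": fatality_ratio(row["Fatalities"], row["Aboard"]),
            "Latitude": lat,
            "Longitude": lon,
        })
    return enriched


enriched = add_ratio_and_coordinates(accidents, coordinates)
for row in enriched:
    print(row)
```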

The app

Follow the link posted at the start of this blog to check out the application. The app consists of four main tabs. The homepage gives a brief introduction to the project. The interactive globe lets you see the pinpointed locations where these accidents occurred, and it can be filtered by type, operator and date. The annual analysis shows two graphs: the first displays the history of incidents by total number of fatalities, and the second shows the count of incidents over the years. Both can be filtered by the same variables. Finally, the monthly analysis provides a small insight into when accidents most frequently occur. Looking at this graph across all years, incidents occur most often during the winter months. One possible reason is the sharp increase in global tourism during the holiday period around New Year.
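The filtering and monthly counting behind a tab like the monthly analysis could look roughly like the sketch below. It is a Python illustration under my own assumptions, not the app's R code. The records, operator names and aircraft types are invented, and the function names `filter_records` and `incidents_per_month` are hypothetical.

```python
from collections import Counter
from datetime import date

# Invented sample records using the Date, Operator and Type variables.
records = [
    {"Date": date(1950, 1, 15), "Operator": "Example Air", "Type": "Model A"},
    {"Date": date(1972, 12, 3), "Operator": "Example Air", "Type": "Model B"},
    {"Date": date(1985, 1, 28), "Operator": "Sample Airways", "Type": "Model A"},
    {"Date": date(2001, 7, 9), "Operator": "Example Air", "Type": "Model A"},
]


def filter_records(rows, operator=None, aircraft_type=None, years=None):
    """Keep rows matching every filter that is set; years is an inclusive (start, end) pair."""
    selected = []
    for row in rows:
        if operator is not None and row["Operator"] != operator:
            continue
        if aircraft_type is not None and row["Type"] != aircraft_type:
            continue
        if years is not None and not (years[0] <= row["Date"].year <= years[1]):
            continue
        selected.append(row)
    return selected


def incidents_per_month(rows):
    """Count incidents for each calendar month (1 to 12), including months with zero."""
    counts = Counter(row["Date"].month for row in rows)
    return {month: counts.get(month, 0) for month in range(1, 13)}


monthly_all = incidents_per_month(records)
monthly_example_air = incidents_per_month(filter_records(records, operator="Example Air"))
print(monthly_all)
print(monthly_example_air)
```

Returning all twelve months, including those with no incidents, keeps the x-axis of a bar chart stable when filters change.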

Future work

Data analysis is never truly finished and there are multiple further investigations to be done with this data. One useful analysis would be to utilise the 'summary' variable in the dataset. Although this was not used in my analysis, it holds a lot of key information detailing what caused each individual accident. Regular expression analysis or word clouds could be used here to find the most common causes of airplane accidents. Fire? Terrorism? A mid-air collision?
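As a starting point for that idea, here is a minimal sketch of regular expression matching over summary text, written in Python rather than the app's R. The keyword patterns and the sample summaries are invented for illustration. A real analysis would need a much richer vocabulary and some manual checking, since a keyword match does not prove that the keyword describes the cause.

```python
import re
from collections import Counter

# Hypothetical keyword patterns per cause category; these are illustrative guesses.
CAUSE_PATTERNS = {
    "fire": re.compile(r"\b(fire|flames?)\b", re.IGNORECASE),
    "collision": re.compile(r"\b(collided|collision)\b", re.IGNORECASE),
    "terrorism": re.compile(r"\b(bomb\w*|hijack\w*)\b", re.IGNORECASE),
}


def count_causes(summaries):
    """Count how many summaries mention each cause category at least once."""
    counts = Counter()
    for text in summaries:
        for cause, pattern in CAUSE_PATTERNS.items():
            if pattern.search(text):
                counts[cause] += 1
    return counts


# Invented summaries standing in for the 'summary' variable.
sample_summaries = [
    "The aircraft caught fire shortly after takeoff.",
    "Two planes collided in mid-air over the airfield.",
    "A bomb exploded in the cargo hold.",
    "Engine fire followed by a collision with terrain.",
    "Crashed in poor weather.",
]

cause_counts = count_causes(sample_summaries)
for cause, count in cause_counts.most_common():
    print(cause, count)
```

A summary can fall into more than one category (the fourth sample counts towards both fire and collision), so the totals can exceed the number of accidents.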


Thank you very much for taking the time to read through my blog. I hope you enjoy my app!

About Author

George Alster

George graduated with First Class Honours in his Chemical Engineering (MEng) degree at University College London (UCL) in 2018. Alongside completing groundbreaking research in the Electrochemical Innovation Lab at his university, George also has experience in the private...