Data Study on Air Crash Investigation

Posted on May 12, 2019

Introduction

This Air Crash Investigation blog is an introduction to, and summary of, the Shiny app I developed. The app was built as an interactive and insightful analysis of historical airplane accidents since 1908. To check out the app, please follow this link: https://alster96.shinyapps.io/gsa_project2/

For an insight into the code used to develop the app, please follow this link to my GitHub page: https://github.com/Alster96/gsa_project2

Motivation

The motivation for this project comes in two parts. The first is from the customer's point of view, namely, mine. It is not uncommon to be a nervous flyer and to have worst-case scenarios pop into your head as you experience some mid-flight turbulence. In order to calm my nerves, I looked to statistics. The dataset used for this project contains just over five thousand observations recorded over a one-hundred-year period, and according to multiple sources there are approximately one hundred thousand commercial flights daily. At that rate, a single year sees over 36 million flights, against roughly five thousand recorded accidents across the whole century, which makes the infrequency of these accidents plain.
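The simple math mentioned above can be sketched as follows. This is a rough, order-of-magnitude illustration in Python (the app itself is written in R Shiny), using only the round figures quoted in this post. It assumes accidents are spread evenly over the century and compares that yearly average with a present-day flight count, which is a simplification since flight volumes were far lower in earlier decades. The function names are my own, chosen for illustration.

```python
# Round figures taken from the post; everything else is a simplifying assumption.
ACCIDENTS_RECORDED = 5_000   # "just over five thousand observations"
YEARS_COVERED = 100          # "one hundred year period"
FLIGHTS_PER_DAY = 100_000    # "approximately one hundred thousand commercial flights daily"


def accidents_per_year(accidents: int, years: int) -> float:
    """Average number of recorded accidents per year (assumes an even spread)."""
    return accidents / years


def flights_per_year(flights_per_day: int) -> int:
    """Flights in one year at a constant daily rate (365-day year)."""
    return flights_per_day * 365


def accidents_per_million_flights(accidents: int, years: int, flights_per_day: int) -> float:
    """Yearly average accident count divided by one year of flights, per million."""
    yearly_accidents = accidents_per_year(accidents, years)
    yearly_flights = flights_per_year(flights_per_day)
    return yearly_accidents / yearly_flights * 1_000_000


if __name__ == "__main__":
    rate = accidents_per_million_flights(ACCIDENTS_RECORDED, YEARS_COVERED, FLIGHTS_PER_DAY)
    print(f"Roughly {rate:.2f} recorded accidents per million flights")
```

With these inputs the estimate comes out at a little over one recorded accident per million flights. It should be read as a sanity check rather than a measured accident rate.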

Nevertheless, we would of course like to reduce this probability to zero. So how can this data help? Analysis of this data gives an insight into the count of accidents by both type of vehicle and airline. It also highlights the fatality tally for these two filters, and the results can additionally be filtered by year. We can therefore observe which types of vehicle and/or airlines are most frequently involved in incidents. For an aviation safety consultancy, this information shows where to target your focus.

The data

The data used for this project consists of over 5,000 observations and focuses on seven variables: Date, Location, Type, Operator, Fatalities, Ratio and Ground. The Ratio is the proportion of fatalities to those on board the plane, and the Ground variable is the number of fatalities that occurred on the ground as a result of the incident. During the development of this app, the original dataset was merged with a second one to find the latitude and longitude of each location where an accident occurred. This enabled development of the interactive globe in the first tab.
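To make the Ratio variable and the location merge concrete, here is a minimal sketch in Python (the app itself is written in R Shiny, so this is not the code behind it). The "Aboard" field, the function names, the sample rows and the coordinates are all invented for illustration. The merge is modelled as a left join on Location, so accidents without a known coordinate are kept with empty values.

```python
from typing import Optional


def fatality_ratio(fatalities: int, aboard: int) -> Optional[float]:
    """Proportion of fatalities to those on board; None when nobody is recorded aboard."""
    if aboard <= 0:
        return None
    return fatalities / aboard


def add_ratio_and_coordinates(accidents: list, coordinates: dict) -> list:
    """Left-join accidents to a Location -> (lat, lon) lookup and add the Ratio."""
    merged = []
    for row in accidents:
        lat_lon = coordinates.get(row["Location"])  # None if the location is unknown
        merged.append({
            **row,
            "Ratio": fatality_ratio(row["Fatalities"], row["Aboard"]),
            "Latitude": lat_lon[0] if lat_lon else None,
            "Longitude": lat_lon[1] if lat_lon else None,
        })
    return merged


# Invented sample data, not taken from the real dataset.
sample_accidents = [
    {"Location": "Example City A", "Fatalities": 8, "Aboard": 10, "Ground": 0},
    {"Location": "Unknown Field", "Fatalities": 2, "Aboard": 0, "Ground": 1},
]
sample_coordinates = {"Example City A": (10.0, 20.0)}

merged_rows = add_ratio_and_coordinates(sample_accidents, sample_coordinates)

if __name__ == "__main__":
    for row in merged_rows:
        print(row)
```

Keeping unmatched rows means they can still be counted in the graphs even though they cannot be plotted on the globe.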

The app's data 

Follow the link posted at the start of this blog to check out the application. The app consists of four main tabs. The homepage gives a brief introduction to the project. The interactive globe lets you observe the pinpoint locations where these accidents occurred, and can be filtered by type, operator and date. The annual analysis tab shows two graphs: the first displays the total number of fatalities per year, and the second shows the count of incidents per year. Both can be filtered by the same variables.

Finally, the monthly analysis provides a small insight into when accidents most frequently occur. If we observe this graph across all years, we note that incidents are most frequent during the winter months. One possible reason for this is the sharp increase in global tourism during the holiday period around New Year.
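The annual and monthly views described above both come down to filtering the records and then counting by a part of the date. The following is a minimal Python sketch of that idea (the app itself is written in R Shiny, so this is not its actual code). The function names and the sample rows are invented for illustration. Only the field names follow the variables listed earlier.

```python
from collections import Counter
from datetime import date

# Invented sample rows, not taken from the real dataset.
SAMPLE = [
    {"Date": date(1972, 12, 29), "Type": "Type A", "Operator": "Operator X", "Fatalities": 12},
    {"Date": date(1972, 1, 10), "Type": "Type B", "Operator": "Operator Y", "Fatalities": 3},
    {"Date": date(1985, 12, 2), "Type": "Type A", "Operator": "Operator X", "Fatalities": 40},
    {"Date": date(1985, 7, 19), "Type": "Type A", "Operator": "Operator Y", "Fatalities": 0},
]


def filter_records(records, aircraft_type=None, operator=None, start=None, end=None):
    """Keep rows matching every filter that is set; a filter left as None is ignored."""
    kept = []
    for row in records:
        if aircraft_type is not None and row["Type"] != aircraft_type:
            continue
        if operator is not None and row["Operator"] != operator:
            continue
        if start is not None and row["Date"] < start:
            continue
        if end is not None and row["Date"] > end:
            continue
        kept.append(row)
    return kept


def incidents_per_year(records) -> Counter:
    """Number of incidents in each calendar year."""
    return Counter(row["Date"].year for row in records)


def fatalities_per_year(records) -> Counter:
    """Total fatalities in each calendar year."""
    totals = Counter()
    for row in records:
        totals[row["Date"].year] += row["Fatalities"]
    return totals


def incidents_per_month(records) -> Counter:
    """Number of incidents in each month (1-12), pooled across all years."""
    return Counter(row["Date"].month for row in records)


if __name__ == "__main__":
    type_a = filter_records(SAMPLE, aircraft_type="Type A")
    print(incidents_per_year(type_a))
    print(fatalities_per_year(type_a))
    print(incidents_per_month(SAMPLE))
```

Pooling months across all years, as `incidents_per_month` does, is what allows a seasonal pattern to show up even when any single year has few incidents.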

Future work

Data analysis is never truly finished, and there are multiple further investigations to be done with this data. One useful analysis would be to use the 'summary' variable in the dataset. Although it was not used in my analysis, it holds a lot of key information detailing what caused each individual accident. Regular expression analysis or word clouds could be used here to find the most common causes of airplane accidents: fire, terrorism, mid-air collision, and so on.
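As a starting point for the regular expression idea, here is a minimal Python sketch. It was not part of the app, and the patterns, category names and sample summaries are all invented for illustration. Mapping words such as "bomb" or "hijack" to terrorism is a crude assumption that a real analysis would need to refine.

```python
import re
from collections import Counter

# Hypothetical keyword patterns per cause; these would need tuning on the real text.
CAUSE_PATTERNS = {
    "fire": re.compile(r"\b(fire|flames?)\b", re.IGNORECASE),
    "terrorism": re.compile(r"\b(hijack\w*|bomb\w*|terroris\w*)\b", re.IGNORECASE),
    "mid-air collision": re.compile(r"\b(mid[- ]?air collision|collided)\b", re.IGNORECASE),
}


def count_causes(summaries) -> Counter:
    """Count how many summaries mention each cause; one summary can match several."""
    counts = Counter()
    for text in summaries:
        for cause, pattern in CAUSE_PATTERNS.items():
            if pattern.search(text):
                counts[cause] += 1
    return counts


# Invented example summaries, not taken from the real dataset.
SAMPLE_SUMMARIES = [
    "An engine fire broke out shortly after takeoff.",
    "The aircraft was hijacked and later crashed.",
    "Two aircraft collided in mid-air over the airfield.",
    "Crashed in poor weather.",
    "A bomb exploded in the cargo hold, causing a fire.",
]

if __name__ == "__main__":
    for cause, n in count_causes(SAMPLE_SUMMARIES).most_common():
        print(cause, n)
```

Summaries that match none of the patterns are simply left uncounted. Inspecting those would be a natural way to discover further cause categories.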

 

Thank you very much for taking the time to read through my blog. I hope you enjoy my app!

About Author

George Alster

George graduated with First Class Honours in his Chemical Engineering (MEng) degree at University College London (UCL) in 2018. Alongside completing groundbreaking research in the Electrochemical Innovation Lab at his university, George also has experience in the private...