Data Study on Air Crash Investigation
The skills the author demoed here can be learned by taking the Data Science with Machine Learning bootcamp at NYC Data Science Academy.
Introduction
This Air Crash Investigation blog post introduces and summarises the Shiny app I developed: an interactive, insightful data analysis of historical airplane accidents since 1908. To check out the app, follow this link: https://alster96.shinyapps.io/gsa_project2/
For a look at the code used to develop the app, see my GitHub page: https://github.com/Alster96/gsa_project2
Motivation
The motivation for this project comes in two parts. The first is from the customer's point of view, namely mine. It is not uncommon to be a nervous flyer and to have worst-case scenarios pop into your head during mid-flight turbulence. To calm my nerves, I looked to statistics. The dataset used for this project contains just over five thousand observations recorded over a one-hundred-year period, and according to multiple sources there are approximately one hundred thousand commercial flights every day. A quick back-of-the-envelope calculation immediately shows how rare these accidents are.
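The simple calculation mentioned above can be sketched in a few lines of Python. It uses the rounded figures quoted in the text (5,000 accidents, 100 years, 100,000 flights per day). Because a present-day daily flight count is applied to a century of records, the result is only a rough illustration, not a true per-flight accident probability.

```python
# Back-of-the-envelope accident rate using the rounded figures quoted in the text.
# Caveat: ~100,000 flights per day is a present-day figure; flight volumes were far
# lower for most of the period since 1908, so this is only a rough illustration.
ACCIDENTS = 5_000          # "just over five thousand" records, rounded down
YEARS = 100                # roughly a century of records
FLIGHTS_PER_DAY = 100_000  # approximate daily commercial flights today

accidents_per_year = ACCIDENTS / YEARS             # 50.0
flights_per_year = FLIGHTS_PER_DAY * 365           # 36,500,000
flights_per_accident = flights_per_year / accidents_per_year

print(f"About 1 recorded accident per {flights_per_accident:,.0f} flights")
```

This prints roughly one recorded accident per 730,000 flights.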
Nevertheless, we would of course like to reduce this probability to zero. So how can this data help? Analysing it gives an insight into the number of accidents by both aircraft type and airline. It also highlights the fatality tally for these two filters, as well as by year. We can therefore observe which aircraft types and/or airlines are most frequently involved in incidents. This is the second part of the motivation: for an aviation safety consultancy, this information makes it possible to target its efforts.
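As a rough illustration of the counts described above, the following pandas sketch groups accidents by aircraft type and by operator, counting incidents and summing fatalities. It is not the app's code (the app is built with Shiny); the rows are hypothetical placeholders and `summarise_by` is a hypothetical helper name.

```python
import pandas as pd

# Hypothetical placeholder rows; the real dataset has over 5,000 records.
accidents = pd.DataFrame({
    "Type": ["Aircraft X", "Aircraft Y", "Aircraft X", "Aircraft X"],
    "Operator": ["Airline A", "Airline B", "Airline B", "Airline A"],
    "Fatalities": [12, 3, 40, 0],
})

def summarise_by(df: pd.DataFrame, column: str) -> pd.DataFrame:
    """Count accidents and total fatalities per value of `column`, most frequent first."""
    return (
        df.groupby(column)
        .agg(accidents=("Fatalities", "size"), fatalities=("Fatalities", "sum"))
        .sort_values("accidents", ascending=False)
    )

by_type = summarise_by(accidents, "Type")
by_operator = summarise_by(accidents, "Operator")
print(by_type)
print(by_operator)
```

Using `size` counts every accident in a group, even one whose fatality figure is missing, whereas `count` would skip such rows.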
The data
The data used for this project consists of over 5,000 observations and focuses on seven variables: Date, Location, Type, Operator, Fatalities, Ratio and Ground. Ratio is the number of fatalities as a proportion of the people on board, and Ground is the number of fatalities on the ground caused by the incident. During development, the original dataset was merged with a second dataset to obtain the latitude and longitude of each accident location. This made the interactive globe tab possible.
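The derived Ratio variable and the coordinate merge might look something like the pandas sketch below. It assumes an `Aboard` column holding the number of people on board, and a second table keyed on the same `Location` text with hypothetical `Latitude`/`Longitude` columns. The post does not say how the merge was actually performed, and all rows are placeholders.

```python
import pandas as pd

# Hypothetical placeholder rows. "Aboard" is an assumed column holding the number
# of people on board, from which Ratio is derived.
accidents = pd.DataFrame({
    "Location": ["Place A", "Place B", "Place C"],
    "Aboard": [50, 4, 0],
    "Fatalities": [10, 4, 0],
})

# Hypothetical lookup table standing in for the second dataset of coordinates.
coordinates = pd.DataFrame({
    "Location": ["Place A", "Place B"],
    "Latitude": [10.0, 20.0],
    "Longitude": [30.0, 40.0],
})

# Ratio: share of people on board who were killed; left as NaN when nobody was aboard.
accidents["Ratio"] = accidents["Fatalities"] / accidents["Aboard"].where(accidents["Aboard"] > 0)

# Left join keeps every accident, even those whose location has no coordinates.
merged = accidents.merge(coordinates, on="Location", how="left")
print(merged)
```

A left join keeps unmatched accidents (with missing coordinates) instead of silently dropping them; they would then need to be excluded from any map.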
The app's data
Follow the link at the start of this blog to check out the application. The app consists of four main tabs. The homepage gives a brief introduction to the project. The interactive globe shows the exact location of each accident and can be filtered by type, operator and date. The annual analysis tab shows two graphs: the first displays the history of incidents by total number of fatalities, and the second shows the number of incidents over the years. Both can be filtered by the same variables.
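The filtering behind the annual graphs can be sketched in pandas as follows: optional type, operator and date-range filters, then a per-year fatality total and incident count. This is a hypothetical Python analogue of the app's behaviour, not its actual code; `filter_accidents`, `annual_summary` and the placeholder rows are illustrative.

```python
import pandas as pd

# Hypothetical placeholder rows; column names follow the variables listed earlier.
accidents = pd.DataFrame({
    "Date": pd.to_datetime(["1950-03-01", "1950-11-20", "1972-06-15", "1990-01-05"]),
    "Type": ["Aircraft X", "Aircraft Y", "Aircraft X", "Aircraft X"],
    "Operator": ["Airline A", "Airline A", "Airline B", "Airline A"],
    "Fatalities": [5, 20, 8, 1],
})

def filter_accidents(df, type_=None, operator=None, start=None, end=None):
    """Apply optional type / operator / date-range filters (None means no filter)."""
    mask = pd.Series(True, index=df.index)
    if type_ is not None:
        mask &= df["Type"] == type_
    if operator is not None:
        mask &= df["Operator"] == operator
    if start is not None:
        mask &= df["Date"] >= pd.Timestamp(start)
    if end is not None:
        mask &= df["Date"] <= pd.Timestamp(end)
    return df[mask]

def annual_summary(df):
    """Per-year totals behind the two annual graphs: fatalities and incident count."""
    return df.groupby(df["Date"].dt.year).agg(
        fatalities=("Fatalities", "sum"),
        incidents=("Fatalities", "size"),
    )

yearly = annual_summary(filter_accidents(accidents, operator="Airline A"))
print(yearly)
```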
Finally, the monthly analysis tab gives a small insight into when accidents occur most frequently. Viewed across all years, the graph shows that incidents are most common during the winter months. One possible reason is the sharp increase in global travel over the New Year holiday period.
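The monthly view amounts to counting incidents by calendar month across all years. A minimal pandas sketch with hypothetical placeholder dates (not the app's code):

```python
import pandas as pd

# Hypothetical placeholder dates; the real analysis would use the full Date column.
dates = pd.to_datetime(pd.Series(
    ["1950-01-10", "1961-12-02", "1975-01-30", "1988-07-14", "2001-12-24"]
))

# Count incidents per calendar month across all years (1 = January ... 12 = December),
# filling in months with no incidents so the graph always has twelve bars.
monthly = dates.dt.month.value_counts().reindex(range(1, 13), fill_value=0)
print(monthly)
```

These are raw counts: they are not adjusted for how many flights operate in each month, so a higher winter count does not by itself show a higher per-flight risk.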
Future work
Data analysis is never truly finished, and there are several further investigations to be done with this data. One useful analysis would be to utilise the 'summary' variable in the dataset. Although I did not use it in my analysis, it holds a lot of key information describing what caused each accident. Regular expressions or word clouds could be used to find the most common causes of airplane accidents. Fire? Terrorism? Mid-air collision?
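A first pass at the regular-expression idea could flag each summary with keyword patterns and tally the matches. In this sketch the summaries, the cause categories and the patterns in `CAUSE_PATTERNS` are all hypothetical placeholders; real summaries would need a much richer, hand-checked pattern list.

```python
import re
from collections import Counter

# Hypothetical placeholder summaries standing in for the dataset's free-text summary field.
summaries = [
    "The aircraft caught fire shortly after takeoff.",
    "Mid-air collision with a light aircraft in poor visibility.",
    "A bomb exploded in the cargo hold.",
    "Engine fire followed by a forced landing.",
    None,  # some records may have no summary
]

# Hypothetical cause categories and the regex patterns used to flag them.
CAUSE_PATTERNS = {
    "fire": re.compile(r"\bfire\b", re.IGNORECASE),
    "terrorism": re.compile(r"\b(bomb|hijack\w*|terror\w*)\b", re.IGNORECASE),
    "mid-air collision": re.compile(r"\b(mid-?air|in-?air)\s+collision\b", re.IGNORECASE),
}

def count_causes(texts):
    """Count how many summaries mention each cause (one summary can match several)."""
    counts = Counter()
    for text in texts:
        if not text:
            continue  # skip missing or empty summaries
        for cause, pattern in CAUSE_PATTERNS.items():
            if pattern.search(text):
                counts[cause] += 1
    return counts

print(count_causes(summaries).most_common())
```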
Thank you very much for taking the time to read my blog. I hope you enjoy my app!