Exploring Aviation Accidents from 1908 through the Present

Posted on Jan 9, 2018

If you can walk away from a landing, it's a good landing. If you use the airplane the next day, it's an outstanding landing.
-Chuck Yeager

Ever since the Wright brothers flew the first airplane in 1903, aviation accidents have been both an inevitable tragedy and a subject of fascination. In this project, I used a dataset covering the full history of airplane crashes around the world, from 1908 to the present, to analyze aviation accidents by time, aircraft type, airline, and accident summary.

 

Data


The original data contain over 5,000 rows and come from Open Data by Socrata. Each row represents a single accident, and the data consist of the following features:
Date: date the accident happened
Time: time the accident happened
Location: where the accident happened
Operator: operator of the crashed aircraft
Type: type of the crashed aircraft
Aboard: number of people aboard the crashed aircraft
Fatalities: number of people aboard the aircraft who died
Ground: number of people on the ground (not aboard the aircraft) who died
Summary: brief description of the accident
Note that I removed all rows belonging to military operators in order to focus on commercial aircraft.
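The filtering step can be sketched in a few lines. This is a minimal Python illustration, not the code behind the project (which is an R Shiny app). It assumes that military operators can be recognized by the word "military" in the Operator field; the article does not state the actual rule, and the sample rows are made up.

```python
# Hypothetical rows for illustration only; these are not taken from the dataset.
sample_rows = [
    {"Operator": "Military - Example Air Force", "Aboard": 5, "Fatalities": 5},
    {"Operator": "Example Airlines", "Aboard": 40, "Fatalities": 12},
    {"Operator": "", "Aboard": 2, "Fatalities": 2},
]


def is_military(operator):
    """Assumed rule: an operator is military if its name contains 'military'."""
    return "military" in (operator or "").lower()


def drop_military(rows):
    """Return only the rows whose Operator is not flagged as military."""
    return [row for row in rows if not is_military(row.get("Operator"))]


civil_rows = drop_military(sample_rows)
print([row["Operator"] for row in civil_rows])
```

Rows with an empty operator are kept here; whether they should count as commercial flights is a separate judgment call.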

 

Time



The first two plots show the number of air crashes in each year, and the fatality count and death ratio in each decade. After 1970, both the total number of accidents and the number of people who died in them decrease. Because annual flight-count data are not available, we can't simply say that the accident rate is decreasing. However, the death rate went down from over 90% to around 65% over the period covered, which means that passengers involved in an air crash are more likely to survive than before.
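One way to compute a per-decade death ratio is sketched below in Python. It assumes the ratio is total fatalities divided by total people aboard within each decade; the article does not spell out its exact definition, and the records are invented for illustration.

```python
from collections import defaultdict

# Hypothetical (year, aboard, fatalities) records for illustration only.
sample_records = [
    (1952, 10, 9),
    (1958, 10, 9),
    (1994, 100, 50),
    (1999, 100, 70),
    (1996, None, 3),  # missing Aboard value, skipped below
]


def death_ratio_by_decade(records):
    """Map each decade to total fatalities divided by total people aboard."""
    totals = defaultdict(lambda: [0, 0])  # decade -> [fatalities, aboard]
    for year, aboard, fatalities in records:
        if aboard is None or fatalities is None:
            continue  # a ratio needs both counts
        decade = year // 10 * 10
        totals[decade][0] += fatalities
        totals[decade][1] += aboard
    return {
        decade: dead / total
        for decade, (dead, total) in sorted(totals.items())
        if total > 0
    }


ratios = death_ratio_by_decade(sample_records)
for decade, ratio in ratios.items():
    print(f"{decade}s: {ratio:.0%}")
```

Summing before dividing weights each decade by people aboard; averaging per-accident ratios instead would give small aircraft as much weight as large ones.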
The third plot shows the air crash count and death ratio by time of day, with the crash counts distinguished by day and night. Again, because of the data limitation, we can't conclude which time of day has the higher accident rate, but the death rate is higher at night than during the day: 4 of the 5 time periods with the highest death rate fall between 12am and 5am.
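Grouping by time of day first requires turning the Time field into an hour. The Python sketch below assumes the field is an "HH:MM" string and that day runs from 06:00 to 18:00; both are assumptions for illustration, since the article gives neither the format nor the day/night cutoff.

```python
import re

TIME_PATTERN = re.compile(r"^\s*(\d{1,2}):(\d{2})\s*$")


def hour_of(time_text):
    """Return the hour (0-23) from an 'HH:MM' string, or None if it can't be parsed."""
    match = TIME_PATTERN.match(time_text or "")
    if not match:
        return None
    hour, minute = int(match.group(1)), int(match.group(2))
    if hour > 23 or minute > 59:
        return None
    return hour


def day_or_night(hour, day_start=6, day_end=18):
    """Assumed split: hours in [day_start, day_end) count as day, the rest as night."""
    return "day" if day_start <= hour < day_end else "night"


for text in ["02:30", "14:05", "25:00", "unknown"]:
    hour = hour_of(text)
    label = day_or_night(hour) if hour is not None else "unparsed"
    print(text, "->", label)
```

Returning None for unparseable values keeps them out of both groups rather than silently counting them as midnight.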

 

Aircraft Type


Next, I analyzed the types of crashed aircraft. First, I wanted to see whether aircraft size affects the death rate. As the first plot shows, the death rate decreases overall as aircraft size increases. This suggests that passengers in larger aircraft have a higher survival rate than those in smaller aircraft.
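A comparison like this needs some measure of aircraft size. The Python sketch below uses the number of people aboard as a stand-in for size and groups accidents into three bins; the article does not say how size was measured, so the proxy, the bin edges, and the sample records are all hypothetical.

```python
from bisect import bisect_right

# Hypothetical bin edges and labels; the article does not say how size was measured.
SIZE_EDGES = [20, 100]  # people aboard
SIZE_LABELS = ["small", "medium", "large"]


def size_class(aboard):
    """Label an aircraft by the number of people aboard (a stand-in for size)."""
    return SIZE_LABELS[bisect_right(SIZE_EDGES, aboard)]


def death_rate_by_size(records):
    """records: iterable of (aboard, fatalities); returns {label: fatalities / aboard}."""
    totals = {label: [0, 0] for label in SIZE_LABELS}
    for aboard, fatalities in records:
        if not aboard:
            continue  # skip missing or zero occupancy
        label = size_class(aboard)
        totals[label][0] += fatalities
        totals[label][1] += aboard
    return {label: dead / total for label, (dead, total) in totals.items() if total}


# Hypothetical (aboard, fatalities) records for illustration only.
sample_records = [(4, 4), (50, 25), (200, 50)]
print(death_rate_by_size(sample_records))
```

People aboard understates the size of a large aircraft flying nearly empty, so seat capacity by model would be a better measure if it were available.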
The next two plots show the air crash count by aircraft model and by aircraft manufacturer. Over the whole time range (you can drag the slider in the Shiny app to change the year range), the Douglas DC-3 has the most crashes, far more than the second-place DHC-6. If we consider only accidents after 1964, however, the DHC-6 surpasses the DC-3 as the model with the most accidents. After 2000, the Cessna 208-B takes the lead.
The second plot makes it easy to compare the share of crashed aircraft by manufacturer across different eras. We can also select specific manufacturers to compare in the Shiny app. For example, in the subplot we selected Boeing and Airbus. The plot shows that Boeing has experienced more accidents than Airbus over the whole history, but that is too rough a basis for concluding which is safer. We would still need more data, such as the total number of aircraft in service per year and the total number of passengers carried.
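The year-range selection described above amounts to filtering by year and then counting. Here is a minimal Python sketch of that idea; it is not the Shiny app's code, the sample pairs are made up, and taking the first word of the Type field as the manufacturer is an assumed heuristic.

```python
from collections import Counter

# Hypothetical (year, aircraft type) pairs for illustration only.
sample_crashes = [
    (1950, "Douglas DC-3"),
    (1955, "Douglas DC-3"),
    (1960, "Douglas DC-3"),
    (1970, "Boeing 727"),
    (1980, "Douglas DC-3"),
    (1985, "Boeing 727"),
    (2005, "Airbus A320"),
]


def crashes_by_model(crashes, start_year, end_year):
    """Count crashes per aircraft type within [start_year, end_year], most frequent first."""
    counts = Counter(
        model for year, model in crashes if start_year <= year <= end_year and model
    )
    return counts.most_common()


def manufacturer_of(model):
    """Assumed heuristic: the first word of the Type field is the manufacturer."""
    return model.split()[0]


print(crashes_by_model(sample_crashes, 1908, 2018))
print(crashes_by_model(sample_crashes, 1965, 2018))
print(Counter(manufacturer_of(model) for _, model in sample_crashes))
```

The first-word heuristic breaks for multi-word manufacturer names such as de Havilland Canada, so a real analysis would need a lookup table or manual cleanup.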

 

Airlines


 

The plots show the air crash count and death rate by airline. Over the whole history, Aeroflot has the most crashes (179 accidents in total), far more than second-place Air France (67 accidents in total). Selecting only the years after 1991, however, private aircraft surpass Aeroflot as the operator category with the most accidents.

 

Accident Summary


Lastly, I put all of the accident summaries together and generated a word cloud. Not surprisingly, Crashed is the most frequent word in the summaries. Landing is more frequent than Takeoff, which suggests that more crashes happened during landing than during takeoff. Also, from words such as Mountain, Ground, Runway, Engine, Fuel, and Fire, we can roughly estimate where accidents happened and what caused them.
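The word counts behind a word cloud can be sketched as follows. This Python example is only an illustration of the counting step, with a hand-picked stop-word list and invented summaries; it is not how the word cloud in this post was generated.

```python
import re
from collections import Counter

# A small, hand-picked stop-word list; a real word cloud would likely use a longer one.
STOP_WORDS = {
    "the", "a", "an", "and", "of", "to", "in", "on",
    "into", "while", "after", "during", "at",
}


def word_frequencies(summaries):
    """Count lowercase alphabetic words across all summaries, skipping stop words."""
    counts = Counter()
    for summary in summaries:
        words = re.findall(r"[a-z]+", summary.lower())
        counts.update(word for word in words if word not in STOP_WORDS)
    return counts


# Hypothetical summaries for illustration only.
sample_summaries = [
    "Crashed into a mountain while landing.",
    "Crashed after takeoff following an engine fire.",
    "Crashed short of the runway during landing.",
]
frequencies = word_frequencies(sample_summaries)
print(frequencies.most_common(3))
```

Raw counts treat "landing" and "landed" as different words, so stemming or lemmatizing first would give a fairer comparison between flight phases.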

 


Thanks for reading! Please feel free to browse my Shiny app and GitHub via the following link.

Link to Shiny App

Email: [email protected]

About Author


Tianyi Gu

Tianyi Gu is a creative thinker with strong quantitative and analytical skills. Tianyi received his MS in Urban Informatics from New York University and BS in Actuarial Science from SUNY Buffalo. With great passion in infinite possibilities in...

Exploring Aviation Accidents from 1908 through the Present – Mubashir Qasim January 9, 2018
[…] article was first published on R – NYC Data Science Academy Blog, and kindly contributed to […]
