Exploring Aviation Accidents from 1908 through the Present

Posted on Jan 9, 2018

If you can walk away from a landing, it's a good landing. If you use the airplane the next day, it's an outstanding landing.
-Chuck Yeager

Ever since the Wright brothers flew the first airplane in 1903, aviation accidents have been both an inevitable tragedy and a compelling subject of study. In this project, I used a dataset covering the full history of airplane crashes around the world, from 1908 to the present, to analyze aviation accidents by time, aircraft type, airline, and accident summary.



The original data I used come from Open Data by Socrata and contain over 5,000 rows. Each row represents a single accident, and the data consist of the following features:
Date: date the accident happened
Time: time the accident happened
Location: where the accident happened
Operator: operator the crashed aircraft belonged to
Type: type of the crashed aircraft
Aboard: number of people aboard the crashed aircraft
Fatalities: number of people aboard the aircraft who died
Ground: number of people on the ground who died
Summary: brief description of the accident
Note that I removed all rows belonging to military operators in order to focus on commercial airplanes.
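The filtering step can be sketched as follows. This is written in Python purely for illustration and is not taken from the project code. It assumes that military rows can be recognized by the word "Military" in the Operator field; the helper names and the sample rows are hypothetical.

```python
def is_military(operator):
    """Return True when the operator name mentions the military (an assumed convention)."""
    return "military" in (operator or "").lower()


def drop_military(rows):
    """Keep only the rows whose Operator is not flagged as military."""
    return [row for row in rows if not is_military(row.get("Operator"))]


# Made-up rows for illustration only; they are not taken from the dataset.
sample_rows = [
    {"Operator": "Military - Example Air Force", "Aboard": 5, "Fatalities": 5},
    {"Operator": "Example Airlines", "Aboard": 40, "Fatalities": 12},
    {"Operator": None, "Aboard": 2, "Fatalities": 1},
]

civil_rows = drop_military(sample_rows)
print(len(civil_rows))
```

Rows with a missing operator are kept here; whether they should be is a judgment call that depends on how the data are labeled.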



The first two plots show the air crash count in each year, and the fatality count and death ratio in each ten-year period. After 1970, both the total number of accidents and the number of people killed in accidents decrease. Because I lack annual flight-count data, we can't simply say that the accident rate is decreasing. However, the death rate went down from over 90% to around 65% over the course of the dataset, which means that people aboard a crashed aircraft are more likely to survive than before.
The third plot shows the air crash count and death ratio by time of day, with crash counts distinguished by day and night. Again, because of data limitations, we can't conclude at what time of day the accident rate is highest, but the death rate is higher at night than during the day (4 of the 5 time periods with the highest death rates fall between 12am and 5am).
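One way to compute a death ratio per ten-year period is sketched below in Python. It assumes the ratio is total Fatalities divided by total Aboard within each decade and that Date is formatted as MM/DD/YYYY; both are assumptions rather than details confirmed by the project, and the rows are made up.

```python
from collections import defaultdict
from datetime import datetime


def death_ratio_by_decade(rows):
    """Map each decade (e.g. 1950) to total fatalities divided by total people aboard."""
    totals = defaultdict(lambda: [0, 0])  # decade -> [fatalities, aboard]
    for row in rows:
        year = datetime.strptime(row["Date"], "%m/%d/%Y").year  # assumed date format
        decade = year - year % 10
        totals[decade][0] += row["Fatalities"]
        totals[decade][1] += row["Aboard"]
    return {
        decade: dead / aboard
        for decade, (dead, aboard) in sorted(totals.items())
        if aboard > 0  # skip decades with no recorded people aboard
    }


# Made-up rows for illustration only.
example_rows = [
    {"Date": "03/14/1952", "Aboard": 6, "Fatalities": 6},
    {"Date": "11/02/1957", "Aboard": 4, "Fatalities": 3},
    {"Date": "07/21/2004", "Aboard": 20, "Fatalities": 13},
]

ratios = death_ratio_by_decade(example_rows)
print(ratios)
```

Summing before dividing weights each decade by the number of people involved; averaging per-accident ratios instead would give small aircraft the same weight as large ones.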
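Grouping by time of day first requires turning the Time field into an hour. The Python sketch below assumes times look like HH:MM, possibly with stray characters around them, and uses 06:00 to 17:59 as a hypothetical definition of "day"; the cutoff used in the plots is not stated.

```python
import re

TIME_PATTERN = re.compile(r"(\d{1,2}):(\d{2})")


def hour_of(time_text):
    """Extract the hour (0-23) from a time string, or None if it cannot be parsed."""
    match = TIME_PATTERN.search(time_text or "")
    if match is None:
        return None
    hour, minute = int(match.group(1)), int(match.group(2))
    if hour > 23 or minute > 59:
        return None
    return hour


def day_or_night(hour, day_start=6, day_end=18):
    """Label an hour as 'day' or 'night' using an assumed 06:00-17:59 daytime window."""
    return "day" if day_start <= hour < day_end else "night"


# Made-up time strings for illustration only.
for text in ["17:18", "c 02:30", "", "25:00"]:
    hour = hour_of(text)
    label = day_or_night(hour) if hour is not None else "unknown"
    print(repr(text), hour, label)
```

Rows whose time cannot be parsed are reported as unknown rather than guessed, so they can be excluded from the by-hour ratios.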


Aircraft Type

Next, I analyzed the type of crashed aircraft. First, I wanted to see whether aircraft size affects the death rate. As the first plot shows, the death rate generally decreases as aircraft size increases. This suggests that passengers in larger aircraft have a higher survival rate in a crash than those in smaller aircraft.
The next two plots show the air crash count for different aircraft models and aircraft manufacturers. Over the whole time range (you can drag the bar in the Shiny app to change the year range), the Douglas DC-3 has the most air crashes, far more than the second-place DHC-6. If we consider only accidents after 1964, however, the DHC-6 surpasses the DC-3 as the model with the most accidents. After 2000, the Cessna 208-B takes the lead.
In the second plot, it’s easy to compare the proportions of crashed aircraft from different manufacturers in different periods. We can also select particular manufacturers to compare in the Shiny app. For example, in the subplot, I selected Boeing and Airbus. The plot shows that Boeing has had more accidents than Airbus over the whole history, but that is too crude a basis for concluding which is safer. We would still need more data, such as the total number of aircraft in service per year and the total number of passengers carried.
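The effect of the year-range slider can be approximated with a filtered count. The Python sketch below is not the Shiny app's code; it assumes each row already carries a numeric Year derived from Date, and the rows and the `top_types` helper are hypothetical.

```python
from collections import Counter


def top_types(rows, start_year, end_year, n=3):
    """Return the n most frequent aircraft types among crashes in [start_year, end_year]."""
    counts = Counter(
        row["Type"]
        for row in rows
        if row.get("Type") and start_year <= row["Year"] <= end_year
    )
    return counts.most_common(n)


# Made-up rows for illustration only; the counts do not reflect the real data.
toy_rows = [
    {"Year": 1950, "Type": "Model A"},
    {"Year": 1955, "Type": "Model A"},
    {"Year": 1960, "Type": "Model A"},
    {"Year": 1970, "Type": "Model B"},
    {"Year": 1980, "Type": "Model B"},
    {"Year": 1990, "Type": "Model A"},
    {"Year": 1995, "Type": None},
]

print(top_types(toy_rows, 1900, 2020))
print(top_types(toy_rows, 1965, 1985))
```

Changing the year range changes the ranking, which is the same effect the slider produces in the plots.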




The plots show the air crash count and death rate for different airlines. Over the whole history, Aeroflot has the most air crashes (179 accidents in total), far more than the second-place Air France (67 accidents in total). After 1991, however, private airplanes surpass Aeroflot as the category with the most accidents.


Accident Summary

Lastly, I combined all the accident summaries and generated a word cloud. Not surprisingly, Crashed is the most frequent word. Landing is more frequent than Takeoff, which suggests that crashes happened more often during landing than during takeoff. Also, from words such as Mountain, Ground, Runway, Engine, Fuel, and Fire, we can roughly infer where accidents happened and what caused them.
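A word cloud is driven by word frequencies, which can be sketched in Python as follows. The tokenization rule, the short stop-word list, and the sample summaries are all assumptions for illustration; they are not the settings used for the word cloud above.

```python
import re
from collections import Counter

# A deliberately small, hypothetical stop-word list.
STOP_WORDS = {"the", "and", "a", "an", "of", "to", "in", "on", "at", "into", "after", "while"}


def word_frequencies(summaries, stop_words=STOP_WORDS):
    """Count lowercase alphabetic words across all summaries, skipping stop words."""
    counts = Counter()
    for text in summaries:
        for word in re.findall(r"[a-z]+", (text or "").lower()):
            if word not in stop_words and len(word) > 2:
                counts[word] += 1
    return counts


# Made-up summaries for illustration only.
toy_summaries = [
    "Crashed into a mountain while landing.",
    "Crashed on the runway after an engine fire.",
    "Crashed shortly after takeoff.",
    None,
]

frequencies = word_frequencies(toy_summaries)
print(frequencies.most_common(3))
```

The resulting counts can be passed to any word-cloud renderer, where each word's size is scaled by its frequency.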


Thanks for reading! Please feel free to browse my Shiny app and GitHub via the following link.

Link to Shiny App

Email: [email protected]

About Author

Tianyi Gu

Tianyi Gu is a creative thinker with strong quantitative and analytical skills. Tianyi received his MS in Urban Informatics from New York University and BS in Actuarial Science from SUNY Buffalo. With great passion in infinite possibilities in...