Data Analysis on Airline Companies

Posted on Feb 16, 2020


According to data from the United States Department of Transportation, U.S. airlines carried 925.5 million passengers in 2019 (about 2.5 million passengers a day) and logged more than 23 million aircraft hours over the year. Many travelers favor air travel because it is faster and less accident-prone than other modes of transport. Many factors go into booking a flight, and airfare and travel time are probably the two most decisive for most travelers.

Imagine you are sitting at the gate, barely containing your excitement about the upcoming trip, only to find out that your flight is delayed indefinitely. How you wish you had booked another flight with a higher on-time rate. For this reason, I developed an app that offers some guidance on which airline to choose to reduce the chance of a delay.



The data behind this app is the Marketing Carrier On-Time Performance data set for 09/2019 through 11/2019, published by the U.S. Bureau of Transportation Statistics. For each flight it includes the time, carrier, origin and destination airports, departure and arrival delays, and geographic information for the airports. After I dropped the flights with missing departure or arrival time information, the data for analysis contained 303,807 flights, covering 26 U.S. airlines and 370 airports.
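As a rough illustration of the filtering step described above, here is a minimal Python sketch. It is not the app's actual R code. The field names `dep_time` and `arr_time` and the sample rows are hypothetical, and it assumes a flight is dropped when either time is missing or empty.

```python
# Hypothetical raw rows; a missing time is represented as None or "".
raw_flights = [
    {"carrier": "AA", "dep_time": "0815", "arr_time": "1130"},
    {"carrier": "DL", "dep_time": None, "arr_time": "0930"},
    {"carrier": "UA", "dep_time": "1400", "arr_time": ""},
]


def has_both_times(flight):
    """Return True only if departure and arrival times are both present."""
    return bool(flight.get("dep_time")) and bool(flight.get("arr_time"))


# Keep only the flights that have complete time information.
clean_flights = [f for f in raw_flights if has_both_times(f)]
```

With the sample rows above, only the first flight survives the filter.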


The R Shiny App

The interactive app was built as a Shiny dashboard in R. It lets the user select the origin and arrival airports and the departure time period, which I grouped into four categories: early morning (midnight to 6 am), morning (6 am to noon), afternoon (noon to 6 pm), and evening (6 pm to midnight).
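The four departure periods can be sketched as a small bucketing function. This is an illustrative Python version, not the R code used in the app. The function name `departure_period` is hypothetical, and it assumes each period includes its start hour and excludes its end hour (so 6:00 am counts as morning).

```python
from datetime import time


def departure_period(dep: time) -> str:
    """Map a departure time to one of the four periods described in the post."""
    hour = dep.hour
    if hour < 6:
        return "early morning"  # midnight up to, but not including, 6 am
    if hour < 12:
        return "morning"        # 6 am up to noon
    if hour < 18:
        return "afternoon"      # noon up to 6 pm
    return "evening"            # 6 pm up to midnight


period = departure_period(time(14, 30))  # a 2:30 pm departure
```

How flights exactly on a boundary are assigned is a design choice the post does not specify. The half-open intervals above simply guarantee that every flight lands in exactly one group.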



The app contains two analyses: one for airport on-time performance and one for carrier on-time performance. Under the Airport tab, I aggregated the observations by origin and arrival airport and built two leaflet maps showing the average delay in minutes for every departure and arrival airport.
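The per-airport aggregation can be sketched as a group-and-average step. The Python below is only an illustration of the idea, not the author's R implementation. The tuple layout, the helper name `average_delay_by_airport`, and the sample values are all hypothetical.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical records: (origin, destination, departure delay, arrival delay),
# with delays in minutes. Negative values mean the flight was early.
flights = [
    ("JFK", "LAX", 12.0, 5.0),
    ("JFK", "ORD", 30.0, 22.0),
    ("LGA", "ORD", -3.0, -8.0),
    ("EWR", "LAX", 9.0, 15.0),
]


def average_delay_by_airport(records, airport_index, delay_index):
    """Group records by the airport column and average the delay column."""
    delays = defaultdict(list)
    for record in records:
        delays[record[airport_index]].append(record[delay_index])
    return {airport: mean(values) for airport, values in delays.items()}


# One summary per map: departure delays by origin, arrival delays by destination.
departure_avg = average_delay_by_airport(flights, 0, 2)
arrival_avg = average_delay_by_airport(flights, 1, 3)
```

Each resulting dictionary holds one value per airport, which is the kind of summary a map marker could then display.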

First Visualization

Two boxes at the top of the page display the average delay for the selected airports. Markers on the map show the locations of the airports, and the color and size of each marker indicate how severe the delay at that airport is compared to the others. I added a label to each marker so that users can hover over any airport to see its average delay. This lets users check nearby airports and perhaps save some time by flying from a different one. A similar map for arrival airports appears below the departure airport map.


Second Visualization

The second visualization, under the Carrier tab, compares carrier on-time performance. I aggregated the observations by day of the week and carrier for the user-specified departure airport, arrival airport, and departure time period. The three info boxes at the top serve as a general guide to the day with the most flights and the best and worst days to travel, based on the average delay across all carriers. Airfares could be cheaper on days with more flights. On the other hand, avoiding those busier days could improve your overall flying experience.
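A minimal Python sketch of this day-and-carrier aggregation follows. It is not the app's R code. The row layout, the function names, and the sample values are hypothetical, and it assumes the best and worst days are ranked by average arrival delay, which the post does not state explicitly.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical rows for one route and time period:
# (day of week, carrier code, arrival delay in minutes).
rows = [
    ("Mon", "AA", 10.0), ("Mon", "DL", 20.0), ("Mon", "AA", 30.0),
    ("Tue", "AA", 5.0), ("Tue", "DL", -5.0),
    ("Wed", "DL", 40.0),
]


def average_by_day_and_carrier(records):
    """Average delay for every (day, carrier) pair, as a bar chart would need."""
    groups = defaultdict(list)
    for day, carrier, delay in records:
        groups[(day, carrier)].append(delay)
    return {key: mean(values) for key, values in groups.items()}


def summarize_days(records):
    """Busiest, best, and worst day across all carriers."""
    by_day = defaultdict(list)
    for day, _carrier, delay in records:
        by_day[day].append(delay)
    avg = {day: mean(values) for day, values in by_day.items()}
    return {
        "busiest_day": max(by_day, key=lambda d: len(by_day[d])),
        "best_day": min(avg, key=avg.get),
        "worst_day": max(avg, key=avg.get),
    }


chart_data = average_by_day_and_carrier(rows)
summary = summarize_days(rows)
```

In this toy data, Monday has the most flights, Tuesday has the lowest average delay, and Wednesday has the highest.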

Charts and Graphs

Four bar charts, plotted with plotly, use the day of the week as the x-axis and color the bars by carrier. The first chart shows the number of flights for each day of the week and each carrier that flies between the chosen airports in the chosen time period. The second and third charts show the average departure and arrival delay for each carrier. Together they let users compare carrier on-time performance for any day of the week.

The last chart shows the average arrival delay caused by the carrier. Note that a delay can have several other causes, such as weather, a late aircraft, or security. Because the goal of this app is to recommend carriers, causes outside the carrier's control were excluded from this analysis. A summary table of the four bar charts can be found under the Table tab.
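To make the idea of isolating carrier-caused delay concrete, here is a small Python sketch. It is an illustration under assumptions, not the author's R code. The field names (`carrier_delay`, `weather_delay`, `late_aircraft_delay`) and the sample rows are hypothetical stand-ins for per-cause delay columns.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical rows: each flight's arrival delay split by cause, in minutes.
flights = [
    {"carrier": "AA", "carrier_delay": 15.0, "weather_delay": 0.0, "late_aircraft_delay": 10.0},
    {"carrier": "AA", "carrier_delay": 5.0, "weather_delay": 30.0, "late_aircraft_delay": 0.0},
    {"carrier": "DL", "carrier_delay": 0.0, "weather_delay": 45.0, "late_aircraft_delay": 0.0},
]


def average_carrier_caused_delay(records):
    """Average only the carrier-attributed minutes; other causes are ignored."""
    groups = defaultdict(list)
    for flight in records:
        groups[flight["carrier"]].append(flight["carrier_delay"])
    return {carrier: mean(values) for carrier, values in groups.items()}


carrier_avg = average_carrier_caused_delay(flights)
```

In this toy data the second carrier's flight was delayed 45 minutes by weather, yet its carrier-caused average is zero, which is the distinction the chart is meant to draw.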


The Data tab in the sidebar lists all the observations used for this app, for anyone who is interested in the data set.

Finally, you can find my information under the Author tab.


Going Forward

The app was built on three months of data because of the RAM limits of Shiny apps. More data could improve accuracy and enable other analyses, such as how well each carrier handles special situations like extreme weather or holiday seasons.

I would also like to develop an app that recommends the best flights to take, not just the best carriers. It could offer users more flexible options, such as preferred carriers and nearby airports, to truly improve their flying experience.


Thank you!

Thank you for reading my post. I hope you find the app interesting. The code is available in my GitHub repository if you want to see it. Feel free to contact me by email at [email protected] if you have any questions or would like to discuss further.

About Author

Yunmei Zhang

Yunmei (May) graduated from Cornell University with a master's degree in chemical engineering. After graduation, she started a career as a process engineer in a manufacturing company, where she utilized the power of data in problem solving and...