Which Airline Should I Choose?

Yunmei Zhang
Posted on Feb 16, 2020

Background

According to the United States Department of Transportation, U.S. airlines carried 925.5 million passengers in 2019 (about 2.5 million passengers a day), with a total of over 23 million aircraft hours for the year. Many travelers favor air travel because it is faster and less accident-prone than other modes of transport. There are many factors to consider when booking a flight, and airfare and travel time are probably the two most decisive for most travelers. But imagine you are sitting at the gate, excited about your upcoming trip, only to find out that your flight is delayed indefinitely. How you wish you had booked another flight with a higher on-time rate! For this reason, I developed an app that offers some guidance on which airline to choose to reduce the chance of a delay.

 

Data

The data I used to make this app is the Marketing Carrier On-Time Performance data set from 09/2019 to 11/2019, published by the U.S. Bureau of Transportation Statistics. For each flight, it includes the time, carrier, origin and destination airports, departure and arrival delays, and geographic information for the airports. After I dropped the flights with missing departure or arrival time information, the data for analysis included a total of 303,807 flights, covering 26 U.S. airlines and 370 airports.
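As an illustration of the cleaning step, here is a minimal sketch in Python (not the app's actual R code). The field names `dep_time` and `arr_time` and the sample rows are hypothetical stand-ins, not the real column names or records from the data set.

```python
def drop_incomplete(flights, required=("dep_time", "arr_time")):
    """Keep only the flights that have a value for every required field.

    `flights` is a list of dicts; the field names are hypothetical.
    """
    return [
        f for f in flights
        if all(f.get(key) not in (None, "") for key in required)
    ]


# Made-up rows for illustration only.
raw = [
    {"carrier": "AA", "dep_time": 815, "arr_time": 1040},
    {"carrier": "DL", "dep_time": None, "arr_time": 1300},  # missing departure
    {"carrier": "UA", "dep_time": 1710, "arr_time": ""},    # missing arrival
]
clean = drop_incomplete(raw)
print(len(raw), "->", len(clean))
```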

 

The R Shiny App

The interactive app was built as a Shiny dashboard in R. It lets users select the origin and arrival airports and the departure time period, which I categorized into 4 groups: early morning (midnight - 6 am), morning (6 am - noon), afternoon (noon - 6 pm) and evening (6 pm - midnight).
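The four departure periods can be sketched as a small bucketing function. This is a Python illustration, not the app's R code. Two things are assumptions on my part: that a boundary hour belongs to the later period (so 6 am counts as morning), and that departure times arrive as hhmm integers such as 1745.

```python
def departure_period(hour: int) -> str:
    """Map a departure hour (0-23) to one of the four periods in the app."""
    if not 0 <= hour <= 23:
        raise ValueError(f"hour must be in 0..23, got {hour}")
    if hour < 6:
        return "early morning"
    if hour < 12:
        return "morning"
    if hour < 18:
        return "afternoon"
    return "evening"


def period_from_hhmm(dep_time: int) -> str:
    """Bucket an hhmm integer (assumed format) by its hour part."""
    hour = (dep_time // 100) % 24  # treat 2400 the same as midnight
    return departure_period(hour)


for t in (30, 600, 1745, 2359):
    print(t, period_from_hhmm(t))
```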

 

The app contains two analyses, one for airport on-time performance and the other for carrier on-time performance. Under the Airport tab, I aggregated the observations by origin and arrival airports and made two maps using leaflet to show the average delay in minutes for all departure and arrival airports. Two boxes at the top of the page display the average delay for the selected airports. Markers on the map show the locations of the airports, and the color and size of each marker indicate how severe the delay is at that airport compared to the others. I added a label to each marker so that users can hover over any airport to see its average delay. This lets users check nearby airports and perhaps save some time by flying from a different one. A similar map for the arrival airports appears below the departure airport map.
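The aggregation behind the maps can be sketched as follows, in Python rather than the app's R. The field names, the sample rows, and the linear scaling of marker size are all assumptions for illustration. The post does not say how delay is actually mapped to marker color and size.

```python
from collections import defaultdict


def average_delay_by_airport(flights, airport_key, delay_key):
    """Return {airport: mean delay in minutes} for the chosen airport field."""
    totals = defaultdict(lambda: [0.0, 0])  # airport -> [sum, count]
    for f in flights:
        entry = totals[f[airport_key]]
        entry[0] += f[delay_key]
        entry[1] += 1
    return {airport: s / n for airport, (s, n) in totals.items()}


def marker_radius(avg_delay, min_delay, max_delay, r_min=4.0, r_max=16.0):
    """Scale a delay linearly into a marker radius (an assumed scheme)."""
    if max_delay == min_delay:
        return (r_min + r_max) / 2
    share = (avg_delay - min_delay) / (max_delay - min_delay)
    return r_min + share * (r_max - r_min)


# Made-up rows for illustration only.
flights = [
    {"origin": "JFK", "dest": "ORD", "dep_delay": 10.0, "arr_delay": 4.0},
    {"origin": "JFK", "dest": "LAX", "dep_delay": 20.0, "arr_delay": 12.0},
    {"origin": "EWR", "dest": "ORD", "dep_delay": 30.0, "arr_delay": 26.0},
]
dep_avg = average_delay_by_airport(flights, "origin", "dep_delay")
arr_avg = average_delay_by_airport(flights, "dest", "arr_delay")
lo, hi = min(dep_avg.values()), max(dep_avg.values())
radii = {a: marker_radius(d, lo, hi) for a, d in dep_avg.items()}
print(dep_avg, arr_avg, radii)
```

The same function serves both maps: only the airport field and the delay field change.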

 

The second visualization, under the Carrier tab, compares carrier on-time performance. I aggregated observations by day of the week and carrier for the user-specified departure airport, arrival airport and departure time. The three infoboxes at the top serve as a general guide to the day with the most flights, the best day to travel and the worst day to travel, based on average delay time across all carriers. Airfares could be cheaper on days with more flights. On the other hand, avoiding those busier days could improve your overall flying experience.
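The three infobox values can be sketched like this, again in Python for illustration. I assume here that "average delay" means average arrival delay, which the post does not specify. The field names and rows are made up.

```python
from collections import defaultdict


def day_summary(flights):
    """Return the busiest, best and worst day across all carriers.

    Best and worst are ranked by mean arrival delay (an assumption).
    """
    count = defaultdict(int)
    delay_sum = defaultdict(float)
    for f in flights:
        count[f["day"]] += 1
        delay_sum[f["day"]] += f["arr_delay"]
    avg = {day: delay_sum[day] / count[day] for day in count}
    return {
        "busiest": max(count, key=count.get),
        "best": min(avg, key=avg.get),
        "worst": max(avg, key=avg.get),
    }


# Made-up rows, already filtered to one route and departure period.
route_flights = [
    {"day": "Mon", "carrier": "AA", "arr_delay": 5.0},
    {"day": "Mon", "carrier": "DL", "arr_delay": 10.0},
    {"day": "Mon", "carrier": "AA", "arr_delay": 15.0},
    {"day": "Tue", "carrier": "DL", "arr_delay": 2.0},
    {"day": "Fri", "carrier": "AA", "arr_delay": 30.0},
    {"day": "Fri", "carrier": "DL", "arr_delay": 40.0},
]
summary = day_summary(route_flights)
print(summary)
```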

There are four bar charts plotted using plotly, with the day of the week on the x-axis and the available carriers as bar colors. The first chart shows the number of flights for each day of the week and each carrier that flies between the chosen airports in the chosen time period. The second and third charts show the average departure and arrival delays for all carriers, so users can compare carrier on-time performance for any day of the week. The last chart shows the average arrival delay caused by the carrier. Note that other factors can also lead to a delay, such as weather, late aircraft and security. Because the goal of this app is to recommend carriers, causes outside the carrier's control were excluded from this analysis. A summary table of the four bar charts can be found under the Table tab.
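The grouping behind the charts can be sketched as one helper applied to different fields. This is a Python illustration with hypothetical field names and made-up rows. Skipping flights that have no carrier-caused delay recorded is my assumption. The post does not say how the app treats those rows.

```python
from collections import defaultdict


def avg_by_day_and_carrier(flights, value_key):
    """Return {(day, carrier): mean of value_key}, skipping missing values."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    for f in flights:
        value = f.get(value_key)
        if value is None:  # assumed handling of missing delay causes
            continue
        key = (f["day"], f["carrier"])
        sums[key] += value
        counts[key] += 1
    return {key: sums[key] / counts[key] for key in counts}


# Made-up rows for illustration only.
flights = [
    {"day": "Mon", "carrier": "AA", "arr_delay": 20.0, "carrier_delay": 10.0},
    {"day": "Mon", "carrier": "AA", "arr_delay": 40.0, "carrier_delay": 30.0},
    {"day": "Mon", "carrier": "DL", "arr_delay": 5.0, "carrier_delay": None},
    {"day": "Tue", "carrier": "DL", "arr_delay": 25.0, "carrier_delay": 0.0},
]
arrival = avg_by_day_and_carrier(flights, "arr_delay")
carrier_caused = avg_by_day_and_carrier(flights, "carrier_delay")
print(arrival)
print(carrier_caused)
```

Comparing the two results for the same day and carrier shows how much of the arrival delay is attributed to the carrier itself.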

 

The Data tab in the sidebar lists all the observations used in this app, for anyone who is interested in the data set.

Finally, you can find my information under the Author tab.

 

Going Forward 

The app was built on only three months of data because of the RAM limitations of Shiny apps. More data could improve the accuracy and make other analyses possible, such as how well each carrier handles special situations like extreme weather or holiday seasons.

I would also like to develop an app that recommends the best flights to take, not just the carriers. This could give users more flexible options, such as preferred carriers and nearby airports, to truly improve their flying experience.

 

Thank you!

Thank you for reading my post. I hope you find the app interesting. Here is the link to my GitHub repository if you want to see the code. Feel free to contact me by email: [email protected] if you have any questions or would like to discuss more.

About Author

Yunmei Zhang

Yunmei (May) graduated from Cornell University with a master's degree in chemical engineering. After graduation, she started a career as a process engineer in a manufacturing company, where she utilized the power of data in problem solving and...