Data Analysis on Post-COVID Air Travels

Posted on Mar 19, 2021
The skills demonstrated here can be learned through the Data Science with Machine Learning bootcamp at NYC Data Science Academy.

Github Repository | LinkedIn: Ryan Kniewel, James Welch

Photo by Sarah Brown on Unsplash

Before We Start

Turning large datasets into actionable insights is a core part of any data scientist’s job. For this quarter’s hackathon at the NYC Data Science Academy we were provided a dataset of flights curated from a Kaggle submission. Our prompt was to consider the strengths and weaknesses of four major airlines and present a data-driven strategy for them to adopt.

After a brainstorming session, we decided to focus on long- and short-term growth through the lens of changes in trends driven by COVID-19. We worked over the course of a week and put together our presentation for the judges. Our findings are presented below.

Introduction

The airline industry is at an inflection point. After a year of pandemic restrictions, hope is on the horizon and the industry is primed to accelerate out of this slump.

The COVID-19 pandemic dramatically reduced airline travel due to travel restrictions, lockdowns, and initial safety concerns. A year after the first cases in the US, travel volume is still down 60%. While airlines are far from the only industry affected, they are certainly near the top of the list. Business travel, which formerly accounted for just 12% of travelers but 75% of airline profits, has been decimated by the pandemic. Many routine business trips have likely been permanently displaced by remote work and online meetings.

But there is hope on the horizon: safety measures, therapeutics, and vaccines suggest that the end of the pandemic is in sight. As air travel resumes, airlines are in a position to accelerate into the future.

Our analysis encompasses three key themes:

(1) Identifying cities and airports that are undervalued in the old business model, and where an increased presence can yield great returns.

(2) Leveraging growing vacation travel demand to gain market share and provide a counter-cyclical profit driver.

(3) Using coupons as a nudge to shape customer behavior.

First Impressions


Our dataset was sourced from the US Bureau of Transportation Statistics and was curated to include only domestic flights, covering all 50 states, Puerto Rico and the US Virgin Islands. It contained ticket information such as the purchase price, the number of tickets purchased and how many coupons were available for the flight. The tickets were linked to specific flights with origin and destination airports, and the distance traveled.

Finally, included metadata provided the purchase code, the calendar year quarter, and the airline. This initial dataset was a starting point, but to enhance our analysis we integrated additional sources of information. 

To understand the changing population dynamics of cities we used US Census data for the decade from 2010 to 2019 and a more specific analysis for 2018-2019. Additionally, to understand COVID-19 specific trends we integrated an analysis by McKinsey Global Institute that compiled changes in LinkedIn users' locations. We also included a vacation destination trend analysis from Forbes and RENTCafé and incorporated data on hub and focus airports for each airline. By bringing all these data sources together we assembled a picture of how the airline industry was optimized in the past and how we propose it be reorganized for the future.

Figure 1: Exploration of 2018 route networks for four airlines
Hub airports are indicated by green points, other airports are blue. Airports with 10 or more connections are designated by their three letter IATA code. Route passenger volume is indicated by increased line widths and color depths. 
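The quantities drawn in Figure 1 (passenger volume per route and the number of connections per airport) can be derived from ticket-level records roughly as follows. This is only a minimal sketch: the record layout, the sample rows, and the helper names are assumptions made for illustration, not the dataset's actual columns or the code used for the figure.

```python
from collections import defaultdict

# Hypothetical ticket-level records: (airline, origin, destination, passengers).
# The real column names are not given in this post, so this layout is an assumption.
tickets = [
    ("Southwest", "AUS", "DAL", 2),
    ("Southwest", "DAL", "AUS", 1),
    ("Southwest", "AUS", "MDW", 3),
    ("United", "AUS", "ORD", 1),
    ("United", "ORD", "DEN", 4),
]


def route_volumes(records, airline):
    """Sum passengers per undirected route (A-B and B-A count as one route)."""
    volumes = defaultdict(int)
    for carrier, origin, dest, passengers in records:
        if carrier == airline:
            route = tuple(sorted((origin, dest)))
            volumes[route] += passengers
    return dict(volumes)


def connection_counts(volumes):
    """Count how many distinct airports each airport is connected to."""
    neighbors = defaultdict(set)
    for a, b in volumes:
        neighbors[a].add(b)
        neighbors[b].add(a)
    return {airport: len(others) for airport, others in neighbors.items()}


southwest_routes = route_volumes(tickets, "Southwest")
southwest_connections = connection_counts(southwest_routes)

# Figure 1 labels airports with 10 or more connections; the toy data uses 2.
label_threshold = 2
labeled = sorted(a for a, n in southwest_connections.items() if n >= label_threshold)
```

Treating each route as undirected keeps the line widths in a map comparable regardless of travel direction; a directed count would be an equally reasonable choice.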

Opportunities in Growing Cities

Analysis of the route networks indicated that three of the four airlines (American, United and Delta) use a hub system (Figure 1). In this system, passengers from outlying airports fly to a hub and from there transfer to their final destination, or at most to another hub and then on to their final destination. This model works well for concentrating infrastructure and costs in one location and has served business and leisure travelers well for decades.

Southwest Airlines, on the other hand, has a different business model that is evident from its highly distributed route network. It specializes in middle-distance routes and provides more connections between a larger range of mid-size cities than its competitors.

The existing route system is optimized to the utmost level, and has provided transport for millions of passengers a year, but like many optimizations it isn't easy to change when conditions shift.

Data Plots and Graphs

Figure 2: Relationship between metro area population, flight volume and airline hub locations
Hub airports are indicated by black points.

Hubs have traditionally been located in the largest, most dynamic cities in the United States, but that is not necessarily true anymore (Figure 2). Cities like Detroit and Chicago have large metropolitan areas and serve as hubs for the upper Midwest, but saw zero or negative population change in the 2010s (Figure 3).

Figure 3: Relationship between population increase in the 2010s and flight volume
Hub airports are indicated by black points.

Analysis of the route network for airports closest to the top 15 fastest growing cities in 2018-2019 reveals that each airline is differently positioned to service these fast growing regions (Figure 5).

Figure 5: Airline route networks serving the top 15 high-growth regions for 2018-2019
Hub airports are indicated by green points, other airports are blue. Airports closest to the top 15 fastest growing areas for 2018-2019 are designated by their three letter IATA code. Route passenger volume is indicated by increased line widths and color depths. 
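Matching each fast-growing city to its closest airport, as in Figure 5, can be done with a great-circle distance. The sketch below uses the haversine formula; the coordinates are approximate values entered by hand for illustration, and the function names are hypothetical rather than taken from the project code.

```python
from math import asin, cos, radians, sin, sqrt


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometers between two (lat, lon) points in degrees."""
    earth_radius_km = 6371.0
    dlat = radians(lat2 - lat1)
    dlon = radians(lon2 - lon1)
    a = (sin(dlat / 2) ** 2
         + cos(radians(lat1)) * cos(radians(lat2)) * sin(dlon / 2) ** 2)
    return 2 * earth_radius_km * asin(sqrt(a))


# Approximate coordinates, for illustration only (assumed, not from the dataset).
airports = {"AUS": (30.19, -97.67), "SAT": (29.53, -98.47)}
cities = {"Leander": (30.58, -97.85), "New Braunfels": (29.70, -98.12)}


def nearest_airport(city_coords, airport_coords):
    """Return the airport code with the smallest haversine distance to the city."""
    lat, lon = city_coords
    return min(
        airport_coords,
        key=lambda code: haversine_km(lat, lon, *airport_coords[code]),
    )


nearest = {city: nearest_airport(coords, airports) for city, coords in cities.items()}
```

Straight-line distance ignores roads and drive times, so it is only a first approximation of which airport a resident would actually choose.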

 

One metro area that is rapidly gaining population and is prominent in both the Census and LinkedIn growth data is around Austin, Texas. 

Data Case Study: Austin

Over the past decade, the Austin metro area, which includes Georgetown and Round Rock, has grown by an astonishing 30%. Not only is the whole metro area growing, Austin is also surrounded by some of the fastest growing cities of recent years: Leander, Georgetown, and New Braunfels. The Austin airport, AUS, is one of the largest in the area and services 13,000 passengers with 246 routes (2018 data).

Our analysis revealed an area ripe for disruption: many routes are predominantly serviced by one airline, even between populous cities. In our dataset, Southwest is the dominant airline serving Austin, yet it competes heavily with United and American on many routes. One in particular is the Austin-Chicago route.

Considering that Chicago O’Hare is a hub for both United and American, these two airlines are positioned to take market share from their dominant competitor, Southwest. On the flip side, Southwest could work to fortify its foothold in Austin by increasing the range of routes servicing AUS, increasing brand visibility, and building customer loyalty.

However, AUS isn’t the only airport serving this cluster of metro areas. San Antonio (SAT) is only a 90-minute drive from Austin, while New Braunfels is even closer. The city has been growing at half the rate of the Austin metro area, but as its metro area continues to grow, the difference between AUS and SAT for that population will come down to choice rather than proximity.

This leads to the intuition that there may be many other underserved airports in cities with large and/or growing populations where one airline provides most of the flights, so we attempted to quantify that insight.

High Potential Cities Data 

The list above was generated by analyzing all the cities in our 2018 flights dataset. We linked each city to its 2019 population and its change in population over the 2010s. Then we looked at how many routes each carrier provided for that airport. We kept airports where the population change was greater than 10% and a single airline accounted for more than 50% of the routes serving the airport.
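The two cut-offs described here translate into a simple filter. The following is a sketch under stated assumptions: the airport codes (AAA, BBB, CCC), growth figures, and route counts are placeholders invented for the example, and `high_potential` is a hypothetical helper, not the function used to build the actual list.

```python
# Placeholder inputs: population growth over the 2010s (percent) and the number
# of routes each airline flies from the airport. All values are made up.
airports = {
    "AAA": {"growth_pct": 15.0, "routes_by_airline": {"Southwest": 6, "United": 2}},
    "BBB": {"growth_pct": 5.0, "routes_by_airline": {"Delta": 9, "American": 1}},
    "CCC": {"growth_pct": 20.0, "routes_by_airline": {"American": 5, "United": 5}},
}


def high_potential(airport_data, min_growth_pct=10.0, min_route_share=0.5):
    """Keep airports with growth above the cut-off where one airline holds
    more than the given share of routes. Returns (airport, airline, share)."""
    results = []
    for code, info in airport_data.items():
        routes = info["routes_by_airline"]
        total = sum(routes.values())
        if total == 0:
            continue  # no routes recorded, nothing to compare
        top_airline, top_routes = max(routes.items(), key=lambda kv: kv[1])
        share = top_routes / total
        if info["growth_pct"] > min_growth_pct and share > min_route_share:
            results.append((code, top_airline, round(share, 2)))
    return results


candidates = high_potential(airports)
```

Both thresholds are keyword arguments so that different cut-offs can be tried without changing the function.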

The utility and forward-looking model of Southwest’s distributed network is again obvious in this analysis. Their network is already well positioned to provide a variety of routes serving smaller airports across the nation. Southwest is favorably overrepresented in this analysis due to many high growth cities being located in the southwestern US. 

This is a curated list and the cut-offs we used were arbitrary. We think that with more domain knowledge and a broader set of data, more informed cut-offs could be selected.

Ultimately, investing in new hub cities is a costly decision which would require more information, analysis, and expertise. However, we think this analysis can offer a promising strategy for investing in growing cities with an eye toward sustaining the growth opportunities for each airline. Like any investment, hub reorganization would not be expected to pay off immediately but in the current economic climate repositioning to take advantage of growth markets could be fruitful.

Leisure Travel as Opportunity

Besides business travel, leisure is why most customers choose to fly: occasionally to relocate, more often to visit family, but most commonly to vacation. Generally, leisure travel peaks in the summer and winter, but increasing the volume of holiday travel during non-peak periods could offset profit lost from decreased business travel.

Another trend seen recently is the “COVID working vacation” where professionals, taking advantage of work-from-home policies, relocate to smaller cities, rural areas, or vacation destinations. This pattern was illuminated through a recent search trend analysis by RENTCafe.com.

Search Trends

Their analysis discovered that in addition to vacation search volume being down, the type of destinations cited in chatter online was also changing. Keywords representing beaches, natural attractions, and small towns were increasingly being searched. While many of these locations overlap with popular vacation destinations, some airlines are in a better position to take advantage of this trend than others.

Figure 6 reveals the existing route networks serving vacation destinations being searched on the internet. These are airports closest to beaches, natural attractions, and small towns, while also including some popular vacation city destinations (e.g. Los Angeles and Las Vegas).

Overall, three airlines (American, United and Southwest) already have diverse routes serving many vacation destinations. Southwest makes up somewhat for its lack of longer Hawaii routes with a high density of flights to southern locales, including Florida, where it has a hub in Orlando (MCO). (Note: Southwest recently added routes to Hawaii, which weren't in our dataset from 2018. A great confirmation of our thesis.) Delta, on the other hand, has relatively few flights to Caribbean destinations, flying predominantly from its hubs in Atlanta and New York.

Figure 6: Airline route networks serving vacation destinations

Hub airports are indicated by green points, other airports are blue. Airports closest to vacation destinations are designated by their three letter IATA code. Route passenger volume is indicated by increased line widths and color depths.

 

We wanted to look specifically at travel to the Caribbean as another case study. All four airlines fly to Caribbean airports in Puerto Rico, St. Thomas, and St. Croix. Together these flights comprised 239 routes and served 56,000 passengers in 2018.

Figure 7: Airline route networks serving Caribbean Destinations

Hub airports are indicated by green points, other airports are blue. Airports closest to vacation destinations are designated by their three letter IATA code. Route passenger volume is indicated by increased line widths and color depths.

 

Increasing service to the Caribbean and other vacation destinations will help airlines take advantage of the pent-up travel demand as the pandemic begins to subside. Many of these destinations are not only vacation spots but also fast-growing cities, and one we wanted to highlight is Sarasota, Florida.

Combining Our Thesis

Over the past decade the Sarasota (SRQ) metro area has grown nearly 20%. It is an hour's drive from Tampa (TPA), another large, high-growth city. Sarasota sits on the western coast of Florida and has a large beachfront. This lets Sarasota illustrate both our theses: that there are many growing cities where airlines can establish a footprint to capture market share, and that vacation destinations are locations in need of increased throughput.

Many growth areas are suburbs lying at the edge of, or between, existing metro areas and for most suburbs the difference between TPA and SRQ airports is a slightly longer drive and the choice of flights. 

The final question is whether consumer behavior will match our predictions. A feature in our dataset might be leveraged to encourage that behavior.

SWOT Data Analysis

Airlines compared: American Airlines, United, Delta, Southwest

Strengths

American Airlines: hubs in growth cities (PHX, DFW); hubs in vacation destinations (MIA, LAX); 3 hub cities serve Hawaii routes
United: hubs in growth cities (IAH, DEN); hub in vacation destination (LAX); many Hawaii routes
Delta: hub in growth city (SLC); hub in vacation destination (LAX); many Hawaii routes
Southwest: hubs in growth cities (MCO, HOU, DAL, LAS, DEN); massive route network

Weakness

United: less expansive route network with respect to high growth areas and most vacation destinations
Delta: having hubs in northern cities might hinder establishment of new routes in growth areas and vacation destinations
Southwest: no Hawaii routes as of 2018 (vacation destination)

Opportunity

American Airlines: exploit growth markets currently only served by one competitor airline; use MIA hub to undercut vacation travel for FL routes
United: exploit growth markets currently only served by one competitor airline
Delta: exploit growth markets currently only served by one competitor airline; since several routes to HI are established, increase marketing/advertising push
Southwest: bolster cornered markets where no route competition exists; use extensive route network to reinforce vacation travel market

Threat

American Airlines: need to maintain market share in growth areas and vacation destinations; heavy FL competition
United: need to maintain market share in growth areas and vacation destinations
Delta: having much route coverage in northern cities may prove detrimental as these are not growth areas (with the exception of SEA and PDX); very weak route coverage for some vacation areas (Caribbean)
Southwest: exposure to competition in cornered markets in growth areas; heavy FL vacation travel competition

Changing Behavior with Coupons

The documentation from the Bureau of Transportation Statistics for the coupon feature was a little suspect. Every flight had at least 1 coupon available, which set off some alarm bells, and we don’t know whether the customer used those coupons or what the coupons were used for (Figure 8).

Regardless, we believe coupons are an underutilized resource and embracing them could serve as a way to nudge consumer behavior and respond to consumer sentiment.

Figure 8: Relationship between ticket prices and number of coupons for flights by the four airlines

Only 3% of flights had more than 1 coupon available, so there is a lot of room for improvement. Note that the boxes above are similarly sized, but there are nearly 40 times as many flights in the 1-coupon category as in the 3-coupon category.

Even with the low number of multi-coupon flights, we can see a broad trend of more coupons being available for more expensive flights. This trend is only strong for United; for the other airlines the difference between 1-coupon and 3-coupon flights is around $50.
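The two summaries discussed around Figure 8, the share of multi-coupon flights and the price gap between coupon categories, can be computed along these lines. The records below are toy values chosen to keep the arithmetic easy to follow, not figures from the dataset, and the tuple layout and function names are assumptions for illustration.

```python
from collections import defaultdict
from statistics import median

# Toy records: (airline, number_of_coupons, fare_in_dollars). Values are made up.
flights = [
    ("United", 1, 200.0),
    ("United", 1, 240.0),
    ("United", 3, 400.0),
    ("Delta", 1, 210.0),
    ("Delta", 1, 230.0),
    ("Delta", 3, 270.0),
]


def multi_coupon_share(records):
    """Fraction of flights with more than one coupon."""
    multi = sum(1 for _, coupons, _ in records if coupons > 1)
    return multi / len(records)


def median_fare_by_coupons(records):
    """Median fare for each (airline, coupon count) group."""
    groups = defaultdict(list)
    for airline, coupons, fare in records:
        groups[(airline, coupons)].append(fare)
    return {key: median(fares) for key, fares in groups.items()}


share = multi_coupon_share(flights)
medians = median_fare_by_coupons(flights)

# Gap between 3-coupon and 1-coupon median fares, per airline.
united_gap = medians[("United", 3)] - medians[("United", 1)]
delta_gap = medians[("Delta", 3)] - medians[("Delta", 1)]
```

The median is used here because fare distributions tend to be skewed by a few expensive tickets; a mean would work the same way in code but could overstate the gap.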

While we don’t know how these coupons are applied or even if they are used by the customer, we think they can be leveraged to nudge behavior. The coupon doesn’t necessarily need to reduce the cost of the flight. For many consumers, a drink coupon or a free checked bag can be the difference between booking a flight and delaying the purchase. 

The coupons may not need to come from the airlines themselves. The value of tourists is so high that destinations might be willing to partner with airlines, compensating them for coupons that nudge vacationers toward one destination or another.

Conclusions

Though the current position of airlines is precarious, there is the opportunity for growth and expansion in a post-COVID world. We hope we have shown that smaller, fast-growing cities can be the future of your hub system, that vacation travel presents a reliable opportunity to make up for a lost customer base, and that a better utilization of coupons can help shape demand. The future is in your hands, and with an appropriate strategy you are cleared for take-off.

Our Thoughts

We both had a great time working on this hackathon. It was a lot of work, a couple of long nights, and some last-minute adjustments, but we are proud of what we made. Though we only had a week to work on our presentation, and our experience with the airline industry is limited to what we have seen from inside the cabin, we found this to be a rich learning opportunity.

Ultimately, we wish we had more time. We wanted to delve into predictive models for these routes, build dashboards that highlight ‘orphan routes’ serviced by only one provider, and explore the idea of airline companies as aggregators of consumer demand. Lastly, we are just looking forward to the future: learning with NYCDSA, analyzing data, and building. Keep a lookout for our next projects and thank you for reading.

Our Data Sources

Airline Hubs, Codes, Locations

Metropolitan Statistical Area Population, 2010-2019 Population Growth

US Census, 2018-2019 Fastest Growing Cities

McKinsey Global Institute

Forbes Post-Covid Travel Destinations

Trondent Development Corp

Airlines for America

About Authors

James Welch

I was trained as a synthetic biologist and I am working to become a data scientist too. I have expertise in the genetic engineering of a variety of single-celled organisms, DNA and protein design, and industrial process scaling...

Ryan Kniewel

I have a diverse background in biotechnology and synthetic biology with over 20 years of experience engineering microorganisms using tools from biochemistry, molecular biology, genetics and bioinformatics. I am expanding my knowledge base to address a new range...
