Data Study on Airline Customer Satisfaction
The sheer size of the airline industry provides a reason to care about it: it affects not only millions of people directly (flyers, pilots, engineers, etcetera), but also millions more indirectly by the heft of its economic presence. In a December 2016 report, the International Air Transport Association (IATA) wrote:
"While airline industry profits are expected to have reached a cyclical peak in 2016 of $35.6 billion, a soft landing in profitable territory is expected in 2017 with a net profit of $29.8 billion. 2017 is expected to be the eighth year in a row of aggregate airline profitability, illustrating the resilience to shocks that have been built into the industry structure. On average, airlines will retain $7.54 for every passenger carried." (reference)
As a resident of the US, the daughter of an airline pilot, and a semi-frequent flyer, I have a personal interest in the US airline industry in particular. Looking at carriers by region in the same report, the IATA concluded: "The strongest financial performance is being delivered by airlines in North America. Net post-tax profits will be the highest at $18.1 billion next year [...]. The net margin for the region’s carriers is also expected to be the strongest at 8.5% with an average profit of $19.58/passenger."
Although the North American airline industry is strong, it must be ever-vigilant about keeping up with customer demands to sustain its growth and its position as the industry leader across regions. Of course, success in this regard requires airlines to know what customers care about in the first place. Discovering what airline customers like and dislike about their flight experiences was the starting point for this project. To view the project in an interactive Shiny App, click here.
To understand more precisely which aspects of a flight shape customers' opinions, I decided to scrape the website Skytrax, which collects customer-written reviews of flights for nearly every airline in operation. A typical review appears as follows:
To get the data from posts of this kind into an analyzable format, I wrote a Python script using Selenium to define a web scraping spider, the code for which can be found here on my GitHub. Of the hundreds of airlines available for reviewing on Skytrax, I limited the scope of my investigation to the eleven largest US-based companies: Alaska Airlines, Allegiant Air, American Airlines, Delta Air Lines, Frontier Airlines, Hawaiian Airlines, JetBlue Airways, Southwest Airlines, Spirit Airlines, United Airlines, and Virgin America. I scraped around 10,000 reviews in total. The variables I included in the scrape were:
- airline: Airline with which the review-writer flew
- overall: Overall airline rating (out of 10) given by the review-writer
- author: Name of the review-writer
- date: Date the review was written
- customer_review: Text of the customer review
- aircraft: Aircraft class/type on which the review-writer flew (possibilities too numerous to list; example: Boeing 737)
- traveller_type: Type of traveller of the review-writer (Business, Couple Leisure, Family Leisure, Solo Leisure)
- cabin: Type of cabin/class in which the review-writer flew (Business Class, Economy Class, First Class, Premium Economy)
- route: Origin and destination of the flight (example: Chicago to Boston)
- date_flown: Month and year in which the flight in question took place
- seat_comfort: Rating (out of 5) of seat comfort
- cabin_service: Rating (out of 5) of the inflight service
- food_bev: Rating (out of 5) of the quality of the inflight food and beverages
- entertainment: Rating (out of 5) of the availability and connectivity of the inflight entertainment*
- ground_service: Rating (out of 5) of the service on the ground before and/or after the flight
- value_for_money: Rating (out of 5) of the value of the airline against the cost of a ticket
- recommended: Does the review-writer plan to recommend the airline to others (Yes or No)
Process of Choosing Data
One detail of the data worth mentioning is that the variables coming from the "table" in the review (aircraft, type of traveller, etcetera) were not required fields for the review writer. This meant that different reviews could contain different subsets of variables from that table. This affected both the engineering and analysis aspects of this project.
On the engineering side, the changing tables from review to review forced me to use Selenium rather than the much faster Python package Scrapy. On the analysis side, reviews that did not contain all possible table fields have missing values (NAs) for the omitted variables, and I leveraged this missingness to help answer the more focused questions that follow.
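As a rough illustration of how optional table fields turn into missing values, here is a minimal Python sketch. It is not the author's scraper: the `Review` class, the `build_review` helper, and the shape of the `scraped` dictionary are all assumptions made for this example, and only the field names come from the variable list above.

```python
from dataclasses import dataclass, fields
from typing import Optional


@dataclass
class Review:
    """Fixed schema; any rating a review did not expose stays None (an NA)."""
    airline: str
    overall: Optional[int] = None
    seat_comfort: Optional[int] = None
    cabin_service: Optional[int] = None
    food_bev: Optional[int] = None
    entertainment: Optional[int] = None
    ground_service: Optional[int] = None


def build_review(airline, scraped):
    """Map whatever fields one review exposed onto the fixed schema.

    `scraped` is a hypothetical dict of field name -> raw text, containing
    only the fields that were present on that particular review page.
    """
    known = {f.name for f in fields(Review)} - {"airline"}
    values = {name: int(scraped[name]) for name in known if name in scraped}
    return Review(airline=airline, **values)


# A review whose table omitted food, entertainment, and ground service.
partial = build_review(
    "Delta Air Lines",
    {"overall": "7", "seat_comfort": "4", "cabin_service": "5"},
)
```

Keeping every review in the same schema means the omitted fields can later be counted, imputed, or dropped consistently.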
*It was unclear from the review submission page and the data alone whether the category of "Cabin Wifi and Connectivity" was available for older reviews, as it may have been subsumed by "Inflight Entertainment." In any case, the wifi connectivity variable was not scraped and deserves more attention in future analyses.
Out of seat comfort, cabin service, food and beverages, entertainment, and ground service, which aspect of a flight has the most influence on a customer's overall rating?
This is a classic machine learning question that is easy to ask but difficult to answer, the difficulty lying in the potentially subtle interactions among the predictor variables. To attack this question in my project, I turned to random forests; this choice was motivated by the desire to avoid machine learning techniques (such as linear regression) that rely on distributional assumptions, which are hard to justify for this data. Given the biased nature of reviews, nearly all of the predictor variables in question here (seat comfort, cabin service, food and beverages, entertainment, ground service) were not normally distributed but rather bimodal, with peaks near ratings of 1 and 5.
I used the randomForest() function from the R package "randomForest", which uses the non-parametric Breiman random forest algorithm to produce regression models. As a side perk, it estimates the importance of each predictor variable, relative to the others in question, in predicting the response variable. This output is what I used to determine which of my five variables was most important to the overall flight rating.
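One of the importance measures that random forest implementations commonly report is permutation-based: shuffle one predictor, re-score the model, and see how much the error grows. The post does not say which importance measure was plotted, so the Python sketch below only illustrates the permutation idea. It uses a hand-written stand-in model (`toy_model`) rather than a random forest, and all data and names in it are hypothetical.

```python
import random


def mse(predict, rows, targets):
    """Mean squared error of a prediction function over a dataset."""
    return sum((predict(r) - t) ** 2 for r, t in zip(rows, targets)) / len(rows)


def permutation_importance(predict, rows, targets, n_features, seed=0):
    """Increase in MSE after shuffling each feature column in turn."""
    rng = random.Random(seed)
    baseline = mse(predict, rows, targets)
    importances = []
    for j in range(n_features):
        column = [r[j] for r in rows]
        rng.shuffle(column)  # break the link between feature j and the target
        shuffled = [r[:j] + (v,) + r[j + 1:] for r, v in zip(rows, column)]
        importances.append(mse(predict, shuffled, targets) - baseline)
    return importances


# Toy data: three hypothetical 1-5 ratings per row (not the scraped data).
rows = [(a, b, c) for a in range(1, 6) for b in range(1, 6) for c in range(1, 6)]


def toy_model(row):
    """Stand-in for a fitted model: leans on rating 0, ignores rating 2."""
    return 1.5 * row[0] + 0.5 * row[1]


targets = [toy_model(r) for r in rows]
scores = permutation_importance(toy_model, rows, targets, n_features=3)
```

In this toy setup the first rating gets the largest score and the ignored third rating gets a score of zero, which is the kind of relative ranking an importance plot conveys.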
The details of the Breiman random forest regression algorithm as well as the details for the algorithms used to determine the prediction importance can be found here. Below is a visual of the variable importance output from the randomForest() function when run on my data:
One thing to note is that the above computation was done on a version of my dataset in which I had imputed "3" ratings into all the missing ratings. This was done with the idea that if someone had omitted a variable from their review, they likely felt ambivalent about it and would give a middling rating of 3/5 if pressed. However, this was an assumption that could alter my results so I did two things:
- Ran the same computation on three different versions of my data: one with "3" imputed for NAs; one with the variable mean imputed for NAs in that variable; one with only reviews that were entirely complete. The above result, as mentioned, is from the first version of imputed data. While the importance ratings (the numbers themselves) changed from version to version, the order of variable importance was constant throughout. This allows me to conclude that according to the Breiman algorithm, ground service was the most important variable in predicting a customer's overall rating for a flight, followed by seat comfort, cabin service, food and beverage, and entertainment (in that order).
- Analyzed the proportion of reviews that included each variable. Imputing missing values inevitably skews results, but eliminating them costs us potentially valuable information. In this case, I believed missingness in various fields tended to fall into the category of "Missing Not at Random," meaning that the cause for missingness was actually related to the value of the variable in question; in particular, I believed fields with high amounts of missingness were likely less important to customers than fields with very little missingness. To analyze this, I graphed the proportion of reviews that included each variable:
From this we see that cabin service and seat comfort are fields that are included in nearly every review, while ground service is only included in about 55% of the reviews.
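The three dataset versions and the per-field inclusion proportions described above can be sketched in Python as follows. This is an illustration under assumptions, not the author's R code: reviews are assumed to be dictionaries with `None` for missing ratings, the helper names are hypothetical, and the four toy reviews are invented placeholders rather than scraped data.

```python
from statistics import mean

FIELDS = ["seat_comfort", "cabin_service", "food_bev", "entertainment", "ground_service"]


def _fill(review, fill_values):
    """Return a copy of one review with missing ratings replaced."""
    filled = dict(review)
    for field in FIELDS:
        if filled[field] is None:
            filled[field] = fill_values[field]
    return filled


def impute_constant(reviews, value=3):
    """Version 1: treat every omitted rating as a middling 3."""
    return [_fill(r, dict.fromkeys(FIELDS, value)) for r in reviews]


def impute_mean(reviews):
    """Version 2: replace omitted ratings with that field's observed mean.

    Assumes every field has at least one observed rating.
    """
    means = {f: mean(r[f] for r in reviews if r[f] is not None) for f in FIELDS}
    return [_fill(r, means) for r in reviews]


def complete_cases(reviews):
    """Version 3: keep only reviews with every rating present."""
    return [r for r in reviews if all(r[f] is not None for f in FIELDS)]


def inclusion_rate(reviews):
    """Share of reviews that filled in each field."""
    return {f: sum(r[f] is not None for r in reviews) / len(reviews) for f in FIELDS}


# Invented toy reviews for illustration only.
toy = [
    {"overall": 8, "seat_comfort": 4, "cabin_service": 5, "food_bev": 3, "entertainment": None, "ground_service": None},
    {"overall": 2, "seat_comfort": 1, "cabin_service": 2, "food_bev": None, "entertainment": 1, "ground_service": 1},
    {"overall": 9, "seat_comfort": 5, "cabin_service": 5, "food_bev": 4, "entertainment": 3, "ground_service": 5},
    {"overall": 3, "seat_comfort": 2, "cabin_service": 1, "food_bev": 1, "entertainment": None, "ground_service": None},
]
rates = inclusion_rate(toy)
```

Running the same model on the output of all three functions and comparing the resulting importance order is the robustness check the first bullet describes.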
Cabin service and seat comfort are filled in by nearly every customer who writes a review, so these aspects of a flight are important to most customers. Since these variables rank as second and third most important in predicting overall flight score according to the Breiman random forest algorithm, we may conclude that these two fields have the most influence on a customer's overall rating of a flight.
Furthermore, while ground service appears to be the field most often omitted from reviews (and therefore possibly the least important to customers in general), overall flight rating is highly dependent on ground service for those customers who do include it. In other words, most people don't care too much about ground service, but those who do care a lot.
IMPORTANT EDIT: The above analysis is not reliable due to an ex post facto realization (thank you, Adi) about the review survey format. The fields of entertainment and food and beverage had 'NA' as an option while cabin service, ground service, and seat comfort did not.
Such a difference in survey format introduces bias into the responses and makes comparison across the differently-formatted fields unsound. A correct analysis would compare the impact of cabin service, ground service, and seat comfort on overall rating separately from that of entertainment and food and beverage. In addition, applying any sort of tree-based machine learning technique in this corrected situation would be like shooting at pigeons with cannons: it would be overkill.
How are US airlines performing across different aspects of customer flight experience?
Given our results in Question 1, an airline may now want to compare itself to other airlines and to the industry as a whole across the variables of cabin service, entertainment, food and beverage, ground service, and seat comfort. To analyze this, I counted the number of reviews giving 1, 2, 3, 4, 5, and NA ratings for each variable and within each airline, as well as for the industry as a whole. For seat comfort ratings specifically, we have the following results:
Each of the five variables (cabin service, entertainment, food and beverage, ground service, and seat comfort) was explored in this way, and going over all of them led to the following observations:
- JetBlue Airways had the best ratings for seat comfort across all airlines and thus should market themselves as the industry leaders in seat comfort. Similarly, Alaska Airlines should market themselves as the industry leader in cabin service. Given the results from Question 1, both JetBlue and Alaska would likely see a boost in sales if customers knew they lead in seat comfort and cabin service since these are the variables (of the five studied so far) that impact customers' overall impressions of a flight the most.
- The industry as a whole sees quite a lot of low or missing ratings in entertainment. An airline that has received middling ratings in other fields could distinguish itself with entertainment (inflight movies, wifi access, etcetera) and potentially influence more customers to care more about their inflight entertainment experiences.
- Spirit Airlines consistently receives mostly 1 ratings, which suggests that across all considered fields customers tend to be unhappy with their experience. Yet, Spirit Airlines continues to grow. This suggests a need for more exploration into the needs and wants of airline customers.
On the whole the US airline industry is doing the best with cabin service and the worst with ground service and seat comfort (in these fields, there were fewer 5s than any other rating). In addition, there is a huge number of 'NA' ratings for entertainment. Within each of the five fields studied, there are opportunities for individual airlines to capitalize on the difference between their own performance and that of the industry: either boast about leading in a currently important field (seat comfort, cabin service, or ground service) or put money towards becoming a leader in an overlooked arena (entertainment or food and beverage).
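The counting step behind this comparison (how many 1, 2, 3, 4, 5, and NA ratings each airline and the industry as a whole received for a given field) can be sketched in Python as below. The `rating_counts` helper and the dictionary layout are assumptions for illustration, and the toy reviews are invented placeholders, not values from the scraped dataset.

```python
from collections import Counter, defaultdict


def rating_counts(reviews, field):
    """Tally ratings (and NAs) for one field, per airline and industry-wide."""
    counts = defaultdict(Counter)
    for review in reviews:
        rating = review.get(field)
        key = "NA" if rating is None else rating
        counts[review["airline"]][key] += 1
        counts["Industry"][key] += 1  # running total across all airlines
    return counts


# Invented toy reviews for illustration only.
toy_reviews = [
    {"airline": "JetBlue Airways", "seat_comfort": 5},
    {"airline": "JetBlue Airways", "seat_comfort": 4},
    {"airline": "Spirit Airlines", "seat_comfort": 1},
    {"airline": "Spirit Airlines", "seat_comfort": None},
    {"airline": "Spirit Airlines", "seat_comfort": 1},
]
seat_counts = rating_counts(toy_reviews, "seat_comfort")
```

Dividing each airline's counts by its total number of reviews would put airlines of different sizes on a comparable scale before plotting.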
What words come up most frequently in positive reviews? In negative reviews?
The previous questions aimed to better understand airline customer views on five specific aspects of flight experience (cabin service, entertainment, food and beverage, ground service, seat comfort), but since these five fields do not account for all that could impact a customer's overall experience, I wanted to analyze the actual text of their reviews. To do this, I used word clouds, which show the most-used words by raw count within a body of text (a corpus) after filtering out common words like 'the' and 'to.'
I generated the word clouds using a combination of the "tm", "wordcloud," and "memoise" packages in R. I analyzed the positive and negative reviews separately and for both individual airlines and the industry as a whole. Positive reviews were those with overall ratings of 6/10 or better and negative reviews were those with overall ratings of 5/10 or worse. Here are the positive and negative word clouds for the industry as a whole:
In both the positive and negative reviews, the word 'time' is one of the top three most-used words. This suggests that, indeed, the five fields considered in questions 1 and 2 did not capture all that is important to flyers (an obvious statement to flyers, of course!). Looking through the word clouds for all the airlines individually further emphasizes that customers primarily rave about and complain about time-related topics.
Interestingly, the word 'hours' comes up in negative reviews, suggesting that customers can tolerate a moderate amount of lateness or delay (say, under an hour) but lose patience after that. Conversely, no amount of time comes up in positive reviews; potentially, any amount of time that an airline can save a customer is rewarded.
The words 'seats' and 'service' still appear in the top five words too, though, so the analysis from the previous questions is corroborated.
Beyond revealing trends in the industry as a whole, these word cloud analyses can benefit individual airlines, which can look for words that appear in their own clouds and not in others (or vice versa). For example, 'Atlanta' comes up in the negative reviews for Delta, suggesting that Delta has problems in Atlanta; similarly, Frontier has problems in Denver, and Hawaiian Airlines customers mention class - both positively and negatively - more than those of any other airline.
Spirit, although it ranks as the worst among the airlines in nearly every one of the five fields considered in questions 1 and 2, has a negative word cloud that is not distinguished from the other negative word clouds. That is, Spirit customers still complain the most in text reviews about delays and time, and in fact, Spirit customers mention the word 'seat' in their positive reviews!
Airline customers write about time more than anything else, followed by service and seats. Given the surprising findings about Spirit Airlines, it's possible that saving/wasting time is an even stronger predictor of a customer's overall flight rating than anything else. In business practice, this would suggest that anything an airline can do to save customers time would boost their sales; for some airlines, making tweaks even in single cities may lead to a higher level of customer satisfaction.
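The raw word counts underlying the word clouds in this section can be approximated with a short Python sketch. The author's analysis used the R packages named earlier, so this is a stand-in under assumptions: `STOPWORDS` is a tiny hypothetical list rather than the one those packages use, the helper names are invented, and the three reviews are made-up examples. Only the 6/10 cutoff for positive reviews comes from the text.

```python
import re
from collections import Counter

# Tiny hypothetical stopword list; a real one would be much longer.
STOPWORDS = {"the", "to", "a", "and", "was", "of", "in", "on", "we", "i", "for", "were", "it"}


def split_by_rating(reviews, cutoff=6):
    """Positive reviews score `cutoff`/10 or better; the rest are negative."""
    positive = [r["customer_review"] for r in reviews if r["overall"] >= cutoff]
    negative = [r["customer_review"] for r in reviews if r["overall"] < cutoff]
    return positive, negative


def top_words(texts, n=3):
    """Most frequent words by raw count after dropping stopwords."""
    counts = Counter()
    for text in texts:
        words = re.findall(r"[a-z']+", text.lower())
        counts.update(w for w in words if w not in STOPWORDS)
    return counts.most_common(n)


# Made-up reviews for illustration only (all have an overall rating).
toy_reviews = [
    {"overall": 9, "customer_review": "Left on time and arrived on time, great service"},
    {"overall": 2, "customer_review": "Delayed for hours, waited hours on the tarmac"},
    {"overall": 8, "customer_review": "Comfortable seats and we landed ahead of time"},
]
positive, negative = split_by_rating(toy_reviews)
top_positive = top_words(positive, n=1)
top_negative = top_words(negative, n=1)
```

Comparing the top words per airline against the industry-wide lists is one way to surface airline-specific terms such as city names.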
Conclusion and future directions
Despite the fact that there are a myriad of factors that impact a flyer's experience with a flight, airlines can boost customer satisfaction by focusing on a few main aspects of flights - particularly time, seat comfort, and cabin service. This project only scratches the surface in understanding all that impacts flyers' impressions and each flight aspect merits a full investigation of its own. For example, research into the cost and benefit of improving seat comfort would be highly beneficial for an airline looking to improve sales through seat comfort.
In addition, I'd like to employ more advanced machine learning techniques to help predict customer flight ratings and, subsequently, sales. For this, I'd like to look into the unused data I scraped about routes, traveller type, cabin, and date flown, as well as incorporate weather and economic data. A more detailed analysis on the review text could also be done with more sophisticated Natural Language Processing tools such as topic modeling.