Reaching for the Stars on Skytrax

Introduction
Many travelers today don't enjoy their flights. The three most common reasons for negative reviews are poor customer service, delays, and cancellations. I wanted to learn how the experience could be improved, so I started this data science project centered on Skytrax, a UK-based consultancy that runs an airline and airport review and ranking site. Its 5-Star Airline rating is a mark of quality achievement, and only 10 airlines are certified in this category. I set out to find what those 10 airlines do better to earn a spot on the list. The code is available at https://github.com/freddy90503/SkyTrax_Scraping



The Data
The data I scraped are individual reviews, all left by verified customers. A full review includes the title, reviewer name, country, review date, comments, type of traveler, seat type, route, date flown, overall score, seat comfort score, cabin staff service score, food & beverage score, inflight entertainment score, ground service score, value for money score, and whether or not the reviewer recommends the airline. Not every reviewer answered every question.

Data scraping with Scrapy
I used Scrapy to collect the reviews, producing one file per airline (10 files in total). In each file, every row is one review and every column is one of the fields listed above.
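Scrapy can write scraped items straight to CSV through its feed exports (for example `scrapy crawl <spider> -o reviews.csv`). The standard-library sketch below only illustrates the resulting file layout, one row per review and one column per field, with skipped questions left as empty cells. The column names and the `write_airline_file` helper are hypothetical and not taken from the project's repository.

```python
import csv
from pathlib import Path

# Hypothetical snake_case names for the review fields listed under "The Data";
# the real scraper's column names may differ.
FIELDS = [
    "title", "name", "country", "review_date", "comments",
    "type_of_traveller", "seat_type", "route", "date_flown",
    "overall", "seat_comfort", "cabin_staff_service", "food_beverages",
    "inflight_entertainment", "ground_service", "value_for_money",
    "recommended",
]


def write_airline_file(airline: str, reviews: list, out_dir: Path) -> Path:
    """Write one CSV per airline: one row per review, one column per field.

    Questions a reviewer skipped are absent from that review's dict and are
    written as empty cells via DictWriter's restval.
    """
    path = out_dir / f"{airline}.csv"
    with path.open("w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=FIELDS, restval="")
        writer.writeheader()
        writer.writerows(reviews)
    return path
```

Calling it once per airline with that airline's list of review dicts yields the 10 files described above.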


Data cleaning with Pandas
After collecting the raw data, I cleaned it with the pandas package in Python. I first combined the 10 airline files into one. Then I dropped duplicates and removed parentheses and colons that were not useful for the analysis. I cast the date fields to a date type, then filled missing ("N/A") values with either 0 or NaN, depending on the column.
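The cleaning steps above can be sketched in pandas as follows. This is a minimal sketch, not the project's actual code: the function name `clean_reviews` and its column-list parameters are hypothetical, and because the write-up does not say which columns received 0 and which NaN, that split is left to the caller. Real Skytrax date strings may also need an explicit `format=` argument to `pd.to_datetime`.

```python
import pandas as pd


def clean_reviews(frames, text_cols, date_cols, zero_cols, nan_cols):
    """Combine per-airline frames and apply the cleaning steps described above.

    Which columns go in zero_cols versus nan_cols is a caller decision
    (hypothetical; the original split is not stated).
    """
    # Stack the airline files into one table and drop exact duplicate rows.
    df = pd.concat(frames, ignore_index=True).drop_duplicates(ignore_index=True)

    # Remove parentheses and colons from free-text columns.
    for col in text_cols:
        df[col] = df[col].str.replace(r"[():]", "", regex=True).str.strip()

    # Parse dates; values that cannot be parsed become NaT instead of raising.
    for col in date_cols:
        df[col] = pd.to_datetime(df[col], errors="coerce")

    # Placeholders such as "N/A" become NaN when converted to numbers.
    for col in nan_cols:
        df[col] = pd.to_numeric(df[col], errors="coerce")

    # For columns where 0 is the chosen default, replace the NaN with 0.
    for col in zero_cols:
        df[col] = pd.to_numeric(df[col], errors="coerce").fillna(0)

    return df
```

Keeping a skipped score as NaN rather than 0 matters for the later averages: pandas skips NaN in `mean()`, whereas a 0 would drag an airline's average down.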

Overall
First, I created a bar chart of the overall ratings for all 10 airlines. All of them are well rated: for every airline except Lufthansa, scores of 9 and 10 make up the majority of reviews.
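The numbers behind such a chart can be computed with a cross-tabulation. The sketch below assumes a combined DataFrame with hypothetical `airline` and `overall` columns (an `airline` column is not among the scraped fields, so it would have to be added when the files are combined); the helper names are illustrative.

```python
import pandas as pd


def rating_distribution(df: pd.DataFrame) -> pd.DataFrame:
    """Share of each overall score within each airline (each row sums to 1)."""
    return pd.crosstab(df["airline"], df["overall"], normalize="index")


def top_score_share(df: pd.DataFrame) -> pd.Series:
    """Fraction of each airline's rated reviews that gave a 9 or a 10."""
    rated = df.dropna(subset=["overall"])
    is_top = rated["overall"].ge(9)
    return is_top.groupby(rated["airline"]).mean().sort_values(ascending=False)
```

With matplotlib installed, `rating_distribution(df).plot(kind="bar", stacked=True)` draws a stacked bar chart, and any airline whose `top_score_share` is below 0.5 is one where 9s and 10s are not the majority.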

By travel type
I made another chart to see how different types of travelers rate their flights, and found that solo travelers give the highest ratings on average, while couples give the lowest.
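A per-segment average like this is a group-by. The sketch below is illustrative: `mean_rating_by` and the column names are hypothetical. It reports the review count next to each mean, because a small group can top the ranking on only a handful of reviews.

```python
import pandas as pd


def mean_rating_by(df: pd.DataFrame, column: str, score: str = "overall") -> pd.DataFrame:
    """Mean score and review count per category of `column`, highest mean first."""
    rated = df.dropna(subset=[column, score])
    return (
        rated.groupby(column)[score]
        .agg(["mean", "count"])
        .sort_values("mean", ascending=False)
    )
```

For example, `mean_rating_by(df, "type_of_traveller")` would put the traveler type with the highest average overall score in the first row.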

By Aircraft Model
I also broke the ratings down by aircraft model and found that the Boeing 777 and 787 are the most highly rated models. Hainan Airlines' Boeing fleet has the highest ratings overall.
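Grouping by aircraft model first requires normalizing the aircraft text, since one model can appear under several spellings and variants. The sketch below assumes a hypothetical free-text `aircraft` column with values like "Boeing 777-300ER" or "B787-9" and collapses them into model families with a regular expression covering only Boeing 7x7 and Airbus A3xx codes. Both the column and the pattern are assumptions, not the project's method.

```python
import pandas as pd

# Matches Boeing 7x7 codes (737, 777, 787, ...) and Airbus A3xx codes (A330, A350, ...).
MODEL_PATTERN = r"(7[0-9]7|A3[0-9]{2})"


def aircraft_family(aircraft: pd.Series) -> pd.Series:
    """Map free-text aircraft names to a family such as 'Boeing 777' (NaN if unknown)."""
    code = aircraft.str.upper().str.extract(MODEL_PATTERN, expand=False)
    is_boeing = code.str.startswith("7", na=False)
    # NaN codes stay NaN because string + NaN propagates NaN.
    return ("Boeing " + code).where(is_boeing, "Airbus " + code)


def mean_by_family(df: pd.DataFrame, score: str = "overall") -> pd.Series:
    """Average score per aircraft family, highest first; unmatched rows are dropped."""
    family = aircraft_family(df["aircraft"])
    return df.groupby(family)[score].mean().sort_values(ascending=False)
```

Variants such as the 777-200 and 777-300ER end up in one "Boeing 777" group, which matches the model-level comparison in the text.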

By Seat Type
This chart shows how travelers in different seat types rate their flights. First Class travelers give the highest ratings on average, and Premium Economy travelers give the lowest.
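Because each airline has a different mix of seat types, one way to check that the First Class versus Premium Economy gap is not just an airline effect is to compare seat types within each airline. Here is a minimal sketch with a pivot table, assuming hypothetical `airline`, `seat_type`, and `overall` columns.

```python
import pandas as pd


def seat_type_by_airline(df: pd.DataFrame, score: str = "overall") -> pd.DataFrame:
    """Mean score for each airline (rows) and seat type (columns)."""
    return pd.pivot_table(df, values=score, index="airline",
                          columns="seat_type", aggfunc="mean")
```

If First Class scores above Premium Economy in most rows, the pattern holds within airlines and not only in the pooled averages.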

By other methods
To analyze the scores for individual segments such as wifi, entertainment, food, ground service, seat comfort, and value, I made stacked histograms. For most of the airlines, more than 75% of users rated almost every category 3 or higher out of 5.
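The "more than 75% rated 3 or above" figure is a share per airline and category. The sketch below computes it, assuming hypothetical category column names on a 1 to 5 scale and an `airline` column. Unanswered questions (NaN) are left out of the denominator instead of being counted as low scores.

```python
import pandas as pd


def share_at_least(df: pd.DataFrame, categories: list, threshold: int = 3) -> pd.DataFrame:
    """Fraction of answered ratings >= threshold, per airline and category."""
    scores = df[categories]
    # 1.0 / 0.0 for answered ratings, NaN where the question was skipped.
    at_least = scores.ge(threshold).astype(float).where(scores.notna())
    # mean() skips NaN, so each share uses only the reviewers who answered.
    return at_least.groupby(df["airline"]).mean()
```

`(share_at_least(df, categories) > 0.75)` then flags which airline and category pairs clear the 75% mark.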




Word clouds
I also used word clouds to find the most frequent words in each airline's comments. Most of the words were positive, and phrases like "Good service," "Excellent," "Friendly," "Great experience," and "Comfortable" stood out.
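A word cloud is essentially a picture of word frequencies. The standard-library sketch below counts single words and adjacent two-word phrases (such as "good service") after dropping common filler words. The stopword list is a small hypothetical one, and this is a stand-in for whatever word-cloud tool the project used, not its actual code.

```python
import re
from collections import Counter

# Small illustrative stopword list; real word-cloud tools ship much larger ones.
STOPWORDS = {"the", "a", "an", "and", "to", "of", "was", "were", "is", "in",
             "on", "for", "with", "we", "i", "it", "my", "our", "very"}


def top_terms(comments, n=10):
    """Most common words and adjacent word pairs across a list of comments."""
    counts = Counter()
    for text in comments:
        tokens = re.findall(r"[a-z']+", text.lower())
        counts.update(t for t in tokens if t not in STOPWORDS)
        # Only count pairs that are adjacent in the original text and contain no stopword.
        counts.update(f"{a} {b}" for a, b in zip(tokens, tokens[1:])
                      if a not in STOPWORDS and b not in STOPWORDS)
    return counts.most_common(n)
```

A frequency dictionary like `dict(top_terms(comments, n=100))` can be passed to a word-cloud renderer, for example the `wordcloud` package's `generate_from_frequencies`.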



By Correlation with Overall Score
After all these charts, I found that most of the top 10 airlines do very well in every segment, so I wanted to see which segment matters most to reviewers. I computed the correlation between each segment score and the overall score. Consumers usually give a high overall score when they feel the flight was good value and when they received good service both in flight and on the ground. Entertainment and wifi have the lowest correlation with the overall score.
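The correlation step can be sketched with pandas' `DataFrame.corr`, which uses pairwise-complete observations, so a review that skipped one question still counts for the others. The column names and the `correlation_with_overall` helper are hypothetical, and the sketch uses Pearson correlation, which pandas applies by default. The write-up does not say which method the project used.

```python
import pandas as pd


def correlation_with_overall(df: pd.DataFrame, categories: list,
                             target: str = "overall") -> pd.Series:
    """Pearson correlation of each category score with the overall score, strongest first."""
    corr = df[categories + [target]].corr(method="pearson")[target]
    return corr.drop(target).sort_values(ascending=False)
```

Since the scores are ordinal star ratings, `method="spearman"` is a reasonable alternative that compares rankings instead of raw values.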

Conclusion
The project gave me a few insights:
- First-class passengers are more likely to give positive feedback.
- Solo travelers are more likely to give positive feedback.
- Boeing 787 & 777 passengers are more likely to give positive feedback.
- The most important elements are value and customer service, which are areas where US airlines fall short.
- US airlines are good at entertainment and wifi, but those are not crucial elements.
- The top 10 airlines in the world are all very good at multiple areas instead of just one.
This project helped me answer the questions I started with, and I look forward to expanding it in the future.