NYC Data Science Academy| Blog
Bootcamps
Lifetime Job Support Available Financing Available
Bootcamps
Data Science with Machine Learning Flagship ๐Ÿ† Data Analytics Bootcamp Artificial Intelligence Bootcamp New Release ๐ŸŽ‰
Free Lesson
Intro to Data Science New Release ๐ŸŽ‰
Find Inspiration
Find Alumni with Similar Background
Job Outlook
Occupational Outlook Graduate Outcomes Must See ๐Ÿ”ฅ
Alumni
Success Stories Testimonials Alumni Directory Alumni Exclusive Study Program
Courses
View Bundled Courses
Financing Available
Bootcamp Prep Popular ๐Ÿ”ฅ Data Science Mastery Data Science Launchpad with Python View AI Courses Generative AI for Everyone New ๐ŸŽ‰ Generative AI for Finance New ๐ŸŽ‰ Generative AI for Marketing New ๐ŸŽ‰
Bundle Up
Learn More and Save More
Combination of data science courses.
View Data Science Courses
Beginner
Introductory Python
Intermediate
Data Science Python: Data Analysis and Visualization Popular ๐Ÿ”ฅ Data Science R: Data Analysis and Visualization
Advanced
Data Science Python: Machine Learning Popular ๐Ÿ”ฅ Data Science R: Machine Learning Designing and Implementing Production MLOps New ๐ŸŽ‰ Natural Language Processing for Production (NLP) New ๐ŸŽ‰
Find Inspiration
Get Course Recommendation Must Try ๐Ÿ’Ž An Ultimate Guide to Become a Data Scientist
For Companies
For Companies
Corporate Offerings Hiring Partners Candidate Portfolio Hire Our Graduates
Students Work
Students Work
All Posts Capstone Data Visualization Machine Learning Python Projects R Projects
Tutorials
About
About
About Us Accreditation Contact Us Join Us FAQ Webinars Subscription An Ultimate Guide to
Become a Data Scientist
    Login
NYC Data Science Acedemy
Bootcamps
Courses
Students Work
About
Bootcamps
Bootcamps
Data Science with Machine Learning Flagship
Data Analytics Bootcamp
Artificial Intelligence Bootcamp New Release ๐ŸŽ‰
Free Lessons
Intro to Data Science New Release ๐ŸŽ‰
Find Inspiration
Find Alumni with Similar Background
Job Outlook
Occupational Outlook
Graduate Outcomes Must See ๐Ÿ”ฅ
Alumni
Success Stories
Testimonials
Alumni Directory
Alumni Exclusive Study Program
Courses
Bundles
financing available
View All Bundles
Bootcamp Prep
Data Science Mastery
Data Science Launchpad with Python NEW!
View AI Courses
Generative AI for Everyone
Generative AI for Finance
Generative AI for Marketing
View Data Science Courses
View All Professional Development Courses
Beginner
Introductory Python
Intermediate
Python: Data Analysis and Visualization
R: Data Analysis and Visualization
Advanced
Python: Machine Learning
R: Machine Learning
Designing and Implementing Production MLOps
Natural Language Processing for Production (NLP)
For Companies
Corporate Offerings
Hiring Partners
Candidate Portfolio
Hire Our Graduates
Students Work
All Posts
Capstone
Data Visualization
Machine Learning
Python Projects
R Projects
About
Accreditation
About Us
Contact Us
Join Us
FAQ
Webinars
Subscription
An Ultimate Guide to Become a Data Scientist
Tutorials
Data Analytics
  • Learn Pandas
  • Learn NumPy
  • Learn SciPy
  • Learn Matplotlib
Machine Learning
  • Boosting
  • Random Forest
  • Linear Regression
  • Decision Tree
  • PCA
Interview by Companies
  • JPMC
  • Google
  • Facebook
Artificial Intelligence
  • Learn Generative AI
  • Learn ChatGPT-3.5
  • Learn ChatGPT-4
  • Learn Google Bard
Coding
  • Learn Python
  • Learn SQL
  • Learn MySQL
  • Learn NoSQL
  • Learn PySpark
  • Learn PyTorch
Interview Questions
  • Python Hard
  • R Easy
  • R Hard
  • SQL Easy
  • SQL Hard
  • Python Easy
Data Science Blog > Python > Data Scraping Tripadvisor: Finding the Best Bargains

Data Scraping Tripadvisor: Finding the Best Bargains

Edwin Back
Posted on Feb 3, 2020
The skills I demoed here can be learned through taking Data Science with Machine Learning bootcamp with NYC Data Science Academy.

View Edwin's Github or LinkedIn profile


Introduction

Ever wondered when the best time and date to book a flight is? Or how about which airlines really provide the best bang for your buck? As a casual flyer myself, I've pondered these same questions countless times without ever reaching a real conclusion that wasn't influenced by the last clickbait ad I saw on Twitter five minutes ago. Well, I finally said enough was enough and took matters into my own hands by shamelessly extracting nearly an entire calendar year's worth of flight deals on Tripadvisor.

I say "nearly" because the Tripadvisor website did not flight deals available past December 21, 2020. Keep in mind that these deals were scraped in real time on January 27, 2020.

 

Data Extraction

Although I desperately wanted to use Scrapy Web Crawler for its relatively fast scraping rate, I resorted to Python's rather slow Selenium library package simply for the fact that it was able to scrape my desired site, Tripadvisor. Like most major corporations, Tripadvisor runs their site dynamically, allowing for cool features such as searching for flight deals based on lowest to highest prices or booking hotel rooms based on reviews and user ratings. Unfortunately, Scrapy is unable to handle sites that are configured this way. As a result, I created a Selenium web driver that executed the following steps:

  • Opened a Google Chrome web browser that navigated to Tripadvisor's Flights website
  • Entered starting departure date as January 27, 2020, origin airport as NYC (i.e. JFK, LGA, EWR), and first destination airport as LAX; submitted query
  • Once the page results loaded, scrolled through every page of flight deals for NYC to LAX and wrote the pieces of text I identified from each deal (i.e. price, Fly Score, departure date, etc.) to a CSV file
  • The previous step is repeated but for the four other destination airports
  • Once all five destinations for an entire day are scraped, the collection of flight deals for that day are written and stored in a CSV file
  • Then, the scraper repeats the search process again with LAX as the first destination but with the departure date set to the next day, January 28th, 2020
  • The scraper was interrupted and stopped when it reached December 22, 2020 since there were no flight deals available by Tripadvisor for that date and beyond

CSV File

Overall, I was left with one CSV file per day of flights between 01/27/20 - 12/21/20. From these daily files, I concatenated them into 12 separate monthly files based on the month of the flight's departure date. Then, I combined the 12 monthly files into one single file with every row representing a flight deal and the following variables to describe each flight:

  • Departure Date
  • Departure Time
  • Origin Airport Code
  • Arrival Date
  • Arrival Time
  • Destination Airport Code
  • Flight Duration in Hours
  • Overnight Flight Flag
  • Name of Airline
  • Flight Price
  • Flight Deal Rating (Fly Score)
  • Day of the Week Flight Scheduled (i.e. Monday, Tuesday, etc.)

The following analysis was performed using the twelve key variables listed above and a total of 57,801 individual flight deals.

 

Analysis

Although the bulk of my analysis was exploratory in nature, I extracted a handful of insights that could be valuable to both casual flyers (those who fly at least once a year) and frequent flyers (those who fly at least once a month for business and/or leisure), as well as airline corporations. According to the trade organization Airlines for America, commercial aviation helps drive more than $1.5 trillion per year in U.S. economic activity and more than 11 million U.S. jobs.

So, if people like us are going to keep flying but also continue searching for the best quality deals while the airline industry thinks of new ways to get us to purchase those same deals, I was left with one last question: are there any potential areas of compromise where both sides can essentially get what they want? 

Before we reach that juicy conclusion, we need to build up our understanding of the data as a whole through the exploration of three key factors:

  1. Date/Time
  2. Prices
  3. Fly Scores

 

Daily Flight Prices

First, let's take a look at the daily average price of flights over the entire dataset (approximately 10.5 months between 01/27/20 - 12/21/20) to five different cities including Los Angeles (LAX), Miami (MIA), Chicago (ORD), San Juan (SJU), Montreal (YUL).

 

 

Right off the bat, it's apparent that there are peaks and valleys when it comes to flight prices throughout a given year. The obvious suspects for price peaks were the major federal U.S. holidays like Thanksgiving, Christmas, New Year's, Memorial Day, and my personal favorite the 4th of July. Indeed, all of these holidays were confirmed as local price maximums in the data that I scraped shown in the plot above. However, I didn't anticipate Valentine's Day (February 14th) to have anywhere near the highest average prices for any given time span in this dataset, let alone amongst any of the federal holidays where people actually have time off work to travel. Looking further into the data, it makes a lot more sense why the days leading up to, on, and the day after Valentine's Day experience the price inflations that they did.

 

Evaluation

My evaluation is that the near $400 average daily price increase from February 11th ($103.95) to February 15th ($504.13) was most likely due to  Valentine's Day falling on a Friday in 2020. Whether it was finally taking that couple's vacation to Miami or Puerto Rico, or visiting that long distance significant other out in L.A. or Chicago, I think New Yorkers were more inclined to travel this year for Valentine's Day since it didn't fall on an "inconvenient" day such as a Tuesday or Wednesday. With the day falling on a weekend, couples had the option to plan memorable weekend getaways instead of settling for overpriced dinner reservations at a local restaurant they visited last week. My contention is that airlines used a similar type of analysis to mark up their prices for flights leading up to, on, and immediately after Valentine's Day in order to maximize profits. Essentially, these airlines anticipated the demand for outgoing flights between February 11th and February 15th to be unusually high this year for the reasons listed above and took full advantage of it. Indeed, this is just one angle of approach to explain this data and, in reality, there are probably several different factors that might play into this particular price surge. With more time and resources, I think it would be valuable to conduct further research on the factors and circumstances that drove airlines to quadruple their prices during this five day stretch in February.

 

Monthly Flight Prices

One thing that is clearly shown by this day-to-day time series is that the average daily price of flights are constantly fluctuating. As a consumer, there aren't many practical insights I can draw from this data. However, when the time series is manipulated to show average monthly prices versus average daily prices, a few insights seem to stick out.

 

Here, we're able to see that at the time this data was scraped in January 2020, September 2020 was projected to have the cheapest overall flights averaging a whopping $147 per flight. Keep in mind that this data included flights from NYC to both Montreal, Canada and San Juan, Puerto Rico. Meanwhile, June 2020 was projected to have the most expensive flights averaging about $204 per flight.

 

For bargain hunters, this information could be quite useful when planning their next vacation as they could potentially save an average of $57 per ticket by waiting an extra 3 months if their schedule and circumstances are flexible.

 

Flight Prices by Day of the Week

Great, now we know which months to look into and which ones to avoid. So, which day of the week has the cheapest flights available? Well, if you happen to have a very flexible schedule and can pick any day of the week to fly, your best bet is booking a Tuesday flight as shown in the distribution below. To not get distracted by outliers, I limited the data to only show flights $500 and below.

 

Out of all the days of the week, Tuesday has the lowest projected median price at $115 as well as the smallest range of prices.

 

According to the distribution of prices shown above, 50% of flights available on Tuesdays were between $89 and $154, the first quartile and third quartile respectively. Wednesday had the next lowest median flight price at $132 followed by Monday with a median flight price of $154. Contrastingly, Fridays had the most expensive prices with a median flight price of $179 followed by Sundays at $171 and Thursdays at $169.

 

Fly Scores by Day of the Week

Similarly, the distribution of Fly Scores for each day of the week only reinforces the conclusion that Tuesdays and Wednesday are the best days to book a flight.

 

Indeed, Tuesday and Wednesday had almost identical distributions of fly scores with a median score of 7.1 and 50 percent of the scores falling between 5.9 and 7.7 points. Thus, it can be inferred that Tuesdays and Wednesdays are the best days of the week to fly because they have the lowest median prices and the highest median fly scores. Keep in mind that these findings are based on flight prices for days of the week over the entire year. Therefore, Tuesdays and Wednesdays are the best days of the week to book the cheapest flights with the best ratings regardless of what month it is.

Contrastingly, Thursdays and Sundays had the worst fly scores with median scores of 6.5 and 6.4 respectively.

 

This is consistent with the distribution of flight prices previously shown where Thursdays and Sundays had the highest median flight prices. Therefore, based on prices and fly scores, it is safe to say that Thursdays and Sundays are the worst days to fly out of NYC because they have the highest median flight prices as well as the lowest median fly scores out of any given day of the week.

 

Flight Prices by Airline

Now that we've identified the best months and days of the week to purchase flights, let us consider how flight prices are distributed amongst our participating airlines.

 

For all flights out of NYC at or below $500, Frontier airlines had the lowest projected median flight price of $119 while WestJet had the highest projected median flight price of $182. Of course, this is an oversimplification since every airline shown above doesn't service every destination included in this dataset. As a result, this analysis is more useful when a single destination is chosen in order to equally compare the "best" and "worst" airlines.

 

The example above shows the distribution of prices for all flights $500 or below, departing NYC and arriving in LAX. Here, there are only four airlines that service flights from NYC to LAX (American, Alaska, United, and JetBlue). Amongst these four airlines, Alaska airlines has the lowest projected median flight price of $154 while United airlines has the highest projected median flight price of $217. Therefore, if your desired destination is Los Angeles and you have a choice between these four airlines, then the cheapest overall option would be to fly with Alaska airlines.

 

Conclusion

Based on the five core analyses I conducted on this data scraped from Tripadvisor, my suggestion for consumers seeking the cheapest flights with the highest ratings is follows:

  • Book flights in the months of September or May
  • Book flights that depart on either a Tuesday or Wednesday
  • If your desired destination is Los Angeles, book a flight with Alaska airlines if possible

So, if your next vacation includes Los Angeles, Chicago, Miami, San Juan, or Montreal as possible destinations, strongly consider this extensive analysis before you purchase that flight. Thank you for your time and safe travels!

About Author

Edwin Back

Graduated from the University of Michigan in 2015 with a BSE in Environmental Engineering. Data Analyst with a robust understanding of probability and statistics backed by 2 years of professional work involving business data analytics (sales/marketing/real estate), environmental...
View all posts by Edwin Back >

Related Articles

Capstone
Catching Fraud in the Healthcare System
Capstone
The Convenience Factor: How Grocery Stores Impact Property Values
Capstone
Acquisition Due Dilligence Automation for Smaller Firms
Machine Learning
Pandemic Effects on the Ames Housing Market and Lifestyle
Machine Learning
The Ames Data Set: Sales Price Tackled With Diverse Models

Leave a Comment

No comments found.

View Posts by Categories

All Posts 2399 posts
AI 7 posts
AI Agent 2 posts
AI-based hotel recommendation 1 posts
AIForGood 1 posts
Alumni 60 posts
Animated Maps 1 posts
APIs 41 posts
Artificial Intelligence 2 posts
Artificial Intelligence 2 posts
AWS 13 posts
Banking 1 posts
Big Data 50 posts
Branch Analysis 1 posts
Capstone 206 posts
Career Education 7 posts
CLIP 1 posts
Community 72 posts
Congestion Zone 1 posts
Content Recommendation 1 posts
Cosine SImilarity 1 posts
Data Analysis 5 posts
Data Engineering 1 posts
Data Engineering 3 posts
Data Science 7 posts
Data Science News and Sharing 73 posts
Data Visualization 324 posts
Events 5 posts
Featured 37 posts
Function calling 1 posts
FutureTech 1 posts
Generative AI 5 posts
Hadoop 13 posts
Image Classification 1 posts
Innovation 2 posts
Kmeans Cluster 1 posts
LLM 6 posts
Machine Learning 364 posts
Marketing 1 posts
Meetup 144 posts
MLOPs 1 posts
Model Deployment 1 posts
Nagamas69 1 posts
NLP 1 posts
OpenAI 5 posts
OpenNYC Data 1 posts
pySpark 1 posts
Python 16 posts
Python 458 posts
Python data analysis 4 posts
Python Shiny 2 posts
R 404 posts
R Data Analysis 1 posts
R Shiny 560 posts
R Visualization 445 posts
RAG 1 posts
RoBERTa 1 posts
semantic rearch 2 posts
Spark 17 posts
SQL 1 posts
Streamlit 2 posts
Student Works 1687 posts
Tableau 12 posts
TensorFlow 3 posts
Traffic 1 posts
User Preference Modeling 1 posts
Vector database 2 posts
Web Scraping 483 posts
wukong138 1 posts

Our Recent Popular Posts

AI 4 AI: ChatGPT Unifies My Blog Posts
by Vinod Chugani
Dec 18, 2022
Meet Your Machine Learning Mentors: Kyle Gallatin
by Vivian Zhang
Nov 4, 2020
NICU Admissions and CCHD: Predicting Based on Data Analysis
by Paul Lee, Aron Berke, Bee Kim, Bettina Meier and Ira Villar
Jan 7, 2020

View Posts by Tags

#python #trainwithnycdsa 2019 2020 Revenue 3-points agriculture air quality airbnb airline alcohol Alex Baransky algorithm alumni Alumni Interview Alumni Reviews Alumni Spotlight alumni story Alumnus ames dataset ames housing dataset apartment rent API Application artist aws bank loans beautiful soup Best Bootcamp Best Data Science 2019 Best Data Science Bootcamp Best Data Science Bootcamp 2020 Best Ranked Big Data Book Launch Book-Signing bootcamp Bootcamp Alumni Bootcamp Prep boston safety Bundles cake recipe California Cancer Research capstone car price Career Career Day ChatGPT citibike classic cars classpass clustering Coding Course Demo Course Report covid 19 credit credit card crime frequency crops D3.js data data analysis Data Analyst data analytics data for tripadvisor reviews data science Data Science Academy Data Science Bootcamp Data science jobs Data Science Reviews Data Scientist Data Scientist Jobs data visualization database Deep Learning Demo Day Discount disney dplyr drug data e-commerce economy employee employee burnout employer networking environment feature engineering Finance Financial Data Science fitness studio Flask flight delay football gbm Get Hired ggplot2 googleVis H20 Hadoop hallmark holiday movie happiness healthcare frauds higgs boson Hiring hiring partner events Hiring Partners hotels housing housing data housing predictions housing price hy-vee Income industry Industry Experts Injuries Instructor Blog Instructor Interview insurance italki Job Job Placement Jobs Jon Krohn JP Morgan Chase Kaggle Kickstarter las vegas airport lasso regression Lead Data Scienctist Lead Data Scientist leaflet league linear regression Logistic Regression machine learning Maps market matplotlib Medical Research Meet the team meetup methal health miami beach movie music Napoli NBA netflix Networking neural network Neural networks New Courses NHL nlp NYC NYC Data Science nyc data science academy NYC Open Data nyc property NYCDSA NYCDSA Alumni Online Online Bootcamp Online Training Open Data painter pandas Part-time performance phoenix pollutants Portfolio Development precision measurement prediction Prework Programming public safety PwC python Python Data Analysis python machine learning python scrapy python web scraping python webscraping Python Workshop R R Data Analysis R language R Programming R Shiny r studio R Visualization R Workshop R-bloggers random forest Ranking recommendation recommendation system regression Remote remote data science bootcamp Scrapy scrapy visualization seaborn seafood type Selenium sentiment analysis sentiment classification Shiny Shiny Dashboard Spark Special Special Summer Sports statistics streaming Student Interview Student Showcase SVM Switchup Tableau teachers team team performance TensorFlow Testimonial tf-idf Top Data Science Bootcamp Top manufacturing companies Transfers tweets twitter videos visualization wallstreet wallstreetbets web scraping Weekend Course What to expect whiskey whiskeyadvocate wildfire word cloud word2vec XGBoost yelp youtube trending ZORI

NYC Data Science Academy

NYC Data Science Academy teaches data science, trains companies and their employees to better profit from data, excels at big data project consulting, and connects trained Data Scientists to our industry.

NYC Data Science Academy is licensed by New York State Education Department.

Get detailed curriculum information about our
amazing bootcamp!

Please enter a valid email address
Sign up completed. Thank you!

Offerings

  • HOME
  • DATA SCIENCE BOOTCAMP
  • ONLINE DATA SCIENCE BOOTCAMP
  • Professional Development Courses
  • CORPORATE OFFERINGS
  • HIRING PARTNERS
  • About

  • About Us
  • Alumni
  • Blog
  • FAQ
  • Contact Us
  • Refund Policy
  • Join Us
  • SOCIAL MEDIA

    ยฉ 2025 NYC Data Science Academy
    All rights reserved. | Site Map
    Privacy Policy | Terms of Service
    Bootcamp Application