NYC Data Science Academy | Blog

Data Science Analysis of Scraped TripAdvisor Reviews

Theodore
Posted on Dec 14, 2021

The skills I demonstrated here can be learned through the Data Science with Machine Learning bootcamp at NYC Data Science Academy.

Check out my source code at: https://github.com/tdchoi7/Web_Scrape_Proj

Data Science Background

Good reviews are important for any business today, especially for city attractions and dining. Potential customers rely on the information provided by previous visitors. Although the tourism industry can already gauge business through revenue or ticket sales, one potentially untapped way to gauge an attraction's value, and to boost its sales, may lie in the words of reviewers.

The written word can have a great impact on future visitors, affecting both their number and their overall sentiment. An analysis of reviews of a city's top attractions can therefore reveal what visitors like and what they find lacking. Based on that insight, proprietors can respond to popular demand and increase revenue.

Data Analytic Method

To make the data set manageable, 400 reviews were scraped on December 12, 2020 from each of the top five attractions in four major cities: Boston, Chicago, Los Angeles, and New York City. Using a combination of Scrapy and Selenium, the following fields were scraped: attraction name, city, review posted date, attraction visit date, number of user reviews, number of user helpful votes, number of review helpful votes, rating, review text, review title, username, and user location.

Scrapy alone did not suffice because, at the time of scraping, it could not properly expand TripAdvisor's review text boxes. In total, the data set contained 38,294 rows and 12 columns. A sample row with all 12 features is shown in Figure 1.

Figure 1: Example of Original 12 columns

A minor issue arose when scraping the New York attractions, most likely because of limited computer memory at the time. The 9/11 Memorial and Central Park reviews were therefore scraped again separately and appended to the DataFrame using Pandas.
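Appending separately re-scraped reviews to an existing DataFrame can be sketched with `pd.concat`. The column names and toy rows below are hypothetical placeholders, not the project's actual schema or data. The linked repository holds the real code.

```python
import pandas as pd

# Hypothetical column names and toy rows; the real project scraped 12 fields per review.
main_df = pd.DataFrame({
    "attraction": ["Attraction A", "Attraction B"],
    "city": ["New York City", "New York City"],
    "rating": [5, 4],
})

# Reviews for the two attractions that were re-scraped in a separate run (toy rows).
rescraped_df = pd.DataFrame({
    "attraction": ["9/11 Memorial", "Central Park"],
    "city": ["New York City", "New York City"],
    "rating": [5, 5],
})

# Stack the re-scraped rows under the main DataFrame; ignore_index rebuilds a 0..n-1 index.
combined = pd.concat([main_df, rescraped_df], ignore_index=True)
print(combined)
```

Passing `ignore_index=True` avoids duplicate index labels, since both input frames start their own index at 0.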

Before analysis, null visit dates were filled with the corresponding posted date, and the number of features was expanded to 16 so that a user's location could be split into all of its possible fields. If a user's location included a state, the state's abbreviation was used in its place. Before analyzing the Boston attractions specifically, ratings of 1 to 3 stars were grouped into a single, broader rating of "Poor," since there were too few 1-, 2-, and 3-star reviews to analyze each rating well on its own.
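The cleaning steps above can be sketched in Pandas as follows. The column names, the comma-based location split, and the two-entry state lookup are hypothetical. The post does not say exactly how the feature count grew to 16.

```python
import pandas as pd

# Toy rows with hypothetical column names (not the project's real schema).
df = pd.DataFrame({
    "posted_date": pd.to_datetime(["2020-11-02", "2020-10-15", "2020-09-30"]),
    "visit_date": pd.to_datetime(["2020-10-01", None, "2020-09-01"]),
    "user_location": ["Boston, Massachusetts", "London, UK", "Austin, TX"],
    "rating": [5, 2, 3],
})

# 1. Fill missing visit dates with the date the review was posted.
df["visit_date"] = df["visit_date"].fillna(df["posted_date"])

# 2. Split the free-text location into separate fields.
#    A two-part split (city, region) is an illustrative assumption.
parts = df["user_location"].str.split(",", n=1, expand=True)
df["user_city"] = parts[0].str.strip()
df["user_region"] = parts[1].str.strip()

# 3. Replace full state names with abbreviations (tiny hypothetical lookup table).
STATE_ABBREV = {"Massachusetts": "MA", "Texas": "TX"}
df["user_region"] = df["user_region"].replace(STATE_ABBREV)

# 4. Collapse 1- to 3-star ratings into a single "Poor" group.
df["rating_group"] = df["rating"].map(lambda r: "Poor" if r <= 3 else str(r))
print(df)
```

Grouping the low ratings before counting keeps each rating group large enough to compare, which is the motivation the post gives for the "Poor" bucket.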

Data Analysis

Analysis was performed mainly with basic Natural Language Processing (NLP) and sentiment analysis. Counts of single words, word pairs (bigrams), and word triplets (trigrams) were graphed for all reviews scraped for Boston. The most frequent single words were generally names of, or words associated with, the attractions, but the bigrams showed a visually interesting trend (Graph 1): certain word pairings appeared more often in reviews giving poor ratings.
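A minimal, dependency-free sketch of counting bigrams separately for each rating group, as in Graph 1. The tokenizer, the small stop-word list, and the sample reviews are illustrative assumptions. The post does not specify which NLP library or preprocessing was used.

```python
import re
from collections import Counter, defaultdict

# Small illustrative stop-word list; the post does not say which list was used.
STOP_WORDS = {"the", "a", "an", "and", "of", "to", "was", "is", "it", "in", "for", "but", "we", "at"}

def tokenize(text):
    """Lowercase the text, keep word tokens, and drop stop words."""
    return [w for w in re.findall(r"[a-z']+", text.lower()) if w not in STOP_WORDS]

def ngrams(tokens, n):
    """Return consecutive n-word tuples (n=2 gives bigrams, n=3 trigrams)."""
    return list(zip(*(tokens[i:] for i in range(n))))

def ngram_counts_by_rating(reviews, n=2):
    """Count n-grams separately for each rating group."""
    counts = defaultdict(Counter)
    for rating, text in reviews:
        counts[rating].update(ngrams(tokenize(text), n))
    return counts

# Toy reviews (made-up text, not scraped data).
reviews = [
    ("Poor", "The gift shop was the best part."),
    ("Poor", "Skip it and go to the gift shop."),
    ("5", "Great tour guide and a fun ship."),
]
by_rating = ngram_counts_by_rating(reviews, n=2)
print(by_rating["Poor"].most_common(1))
```

Keeping one `Counter` per rating group makes it straightforward to compare how often the same pairing appears in poor versus high ratings.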

This pattern was most apparent for the pairing "gift" and "shop." As Graph 2 shows, all but one of the poorly rated reviews containing this pairing were for the Boston Tea Party Ships & Museum (BTPSM).

Graph 1: Bigram Count by Rating

Graph 2: Count of Word Pairing for "Gift" and "Shop" in Poor Reviews
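The per-attraction count behind a chart like Graph 2 can be sketched as follows. It treats a "pairing" as the two words appearing next to each other in the raw text. That is a simplification that may differ from the author's n-gram pipeline, and the sample reviews are made up.

```python
from collections import Counter

def has_pairing(text, first, second):
    """True if `first` is immediately followed by `second` (case-insensitive)."""
    words = [w.strip('.,!?;:"\'') for w in text.lower().split()]
    return any(a == first and b == second for a, b in zip(words, words[1:]))

def count_pairing_by_attraction(reviews, first="gift", second="shop", rating="Poor"):
    """Count reviews in one rating group that contain the pairing, per attraction."""
    return Counter(
        attraction
        for attraction, rating_group, text in reviews
        if rating_group == rating and has_pairing(text, first, second)
    )

# Toy (attraction, rating group, review text) tuples.
reviews = [
    ("BTPSM", "Poor", "Overpriced, but the gift shop was nice."),
    ("BTPSM", "Poor", "Best part was the gift shop!"),
    ("Fenway Park", "Poor", "Field access was limited; gift shop was fine."),
    ("Fenway Park", "5", "Loved the gift shop."),
]
print(count_pairing_by_attraction(reviews))
```

The same function can be reused for other pairings, for example `first="tour", second="guide"`.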

Boston Tea Party Ships & Museum's Gift Shop Analysis

Because this pairing appeared so often in reviews that gave poor ratings, one might expect that the gift shop needed substantial improvement. However, a closer reading of the actual reviews showed that the gift shop was not the issue. Rather, reviewers considered it the best that BTPSM had to offer. One example is shown in Figure 2.

Figure 2: Review of Poor Remarks for BTPSM by DavvaW

Although Graph 3 shows that the poorly rated BTPSM reviews were not strongly negative in polarity, the review in Figure 2 indicates that the attraction itself was not good and that the gift shop was the only part worth the user's while. In fact, a careful read-through of the reviews rating BTPSM poorly shows that the overall sentiment toward the attraction leaned negative.

Sentiment analysis could not separate the negative views of the main attraction from the positive views of the gift shop. This underscores the importance of the ratings themselves and of reading through the reviews. In the one poorly rated Fenway Park review that contained the pairing "gift" and "shop," the reviewer had purchased a tour upgrade that allowed them onto the field, but the on-field activity seemed limited (Figure 3).

Graph 3: Polarity of Reviews Giving BTPSM Poor Ratings

Figure 3: Review with "Gift" and "Shop" Pairing Giving Fenway Park a Poor Rating
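The masking effect described above can be illustrated with a toy example. When a review's polarity is the average of its opinion words, a negative remark about the attraction and a positive remark about the gift shop largely cancel out. The lexicon and its scores are made up for illustration. The post does not say which sentiment tool produced Graph 3.

```python
import re

# Toy polarity lexicon with hypothetical scores in [-1, 1]; not the tool the author used.
LEXICON = {"boring": -0.8, "overpriced": -0.6, "disappointing": -0.7,
           "great": 0.8, "best": 1.0, "lovely": 0.6}

def polarity(text):
    """Average the lexicon scores of the opinion words found; 0.0 if none are found."""
    scores = [LEXICON[w] for w in re.findall(r"[a-z]+", text.lower()) if w in LEXICON]
    return sum(scores) / len(scores) if scores else 0.0

review = ("The tour was boring and overpriced. "
          "The gift shop was great and the best part.")

# One score for the whole review versus one score per sentence.
whole = polarity(review)
per_sentence = [polarity(s) for s in review.split(". ") if s]
print(round(whole, 2), [round(p, 2) for p in per_sentence])
```

Here the whole review scores about 0.1, close to neutral, while its two sentences score about -0.7 and 0.9. A single review-level polarity therefore hides the split between the attraction and the gift shop.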

Fenway Park Tour Analysis

A closer look at the Fenway Park reviews suggested that reviews of games tended to be more positive than reviews of the tour. This becomes clearer in the poorly rated reviews that mentioned the pairing "tour" and "guide" (Graph 4). Most of these reviews described problems with the tour guide, such as a lack of awareness or training. Others noted that the tour lacked substantial experiences of the park, such as visiting the dugout or seeing the press box. Some complained about poor planning and scheduling of the tours (Figure 4).

Graph 4: Polarity of Reviews Containing "Tour" and "Guide"

Figure 4: Review with "Tour" and "Guide" Pairing Giving Fenway Park a Poor Rating

One interesting observation concerned the trigram "waste," "money," and "time." It appears in only three reviews, which were for Fenway Park and BTPSM, but it may point to a need for change at these two attractions.
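One plausible reason the trigram reads "waste," "money," "time" rather than as a full phrase is that stop words were removed before the n-grams were formed. The sketch below assumes that preprocessing step, which the post does not confirm, and uses a hypothetical stop-word list. It counts how many reviews contain each trigram, matching the post's "appears in only three reviews" framing.

```python
import re
from collections import Counter

# Illustrative stop-word list (an assumption; the post does not name one).
STOP_WORDS = {"a", "an", "the", "of", "and", "was", "it", "is"}

def trigrams(text):
    """Drop stop words, then return consecutive three-word tuples."""
    tokens = [w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOP_WORDS]
    return list(zip(tokens, tokens[1:], tokens[2:]))

# Toy reviews (made-up text).
reviews = [
    "It was a waste of money and time.",
    "The tour guide knew little; a waste of money and time.",
    "Great views of the park.",
]

# set() counts each trigram at most once per review, i.e. a document frequency.
counts = Counter(t for r in reviews for t in set(trigrams(r)))
print(counts[("waste", "money", "time")])
```

With "of" and "and" removed, "a waste of money and time" collapses to the single trigram ("waste", "money", "time").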

Possible Improvements for Attractions

Although BTPSM is already a money-making attraction, it could still improve its time constraints and scheduling: some reviewers complained that they could not see all of the artifacts in the museum. Staggering times for visitors with and without children could also help, but it would require actors who know the history of the Tea Party well enough to keep adults informed about the history behind the attraction.

At Fenway, letting tourists see the hidden parts of the field and park could improve their view of the tour, which seems to offer less of an experience than an actual baseball game. Adding stops that let visitors see the dugout, the locker rooms, or displays of past Hall of Famers and trophies would be a good addition to the routine.

Another way to bring the experience of a ballgame into the tour would be to offer discounted tickets to a Red Sox game with the purchase of a tour ticket. Having players practice or warm up during a tour, or even having someone from management address the tour group, could also improve the experience. Better training for tour guides, or providing them with notes, could help even more.

Possible Routes for Further Data Analysis

In the future, analyzing how reviews change over time, and looking for recurring or newly emerging patterns, could help attractions increase revenue. Analyzing attractions' responses to reviews over the years could also indicate whether they have been taking proper steps to increase revenue. Even for attractions that do not focus on reviews, tracking these trends could be most useful when coupled with targeted fixes based on reviews and ratings.
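As a starting point for the time-based analysis suggested above, reviews could be grouped by attraction and calendar month to track review volume and average rating. This is a minimal sketch with hypothetical column names and toy data. Monthly grouping is one arbitrary choice of granularity.

```python
import pandas as pd

# Toy data with hypothetical column names.
df = pd.DataFrame({
    "attraction": ["Fenway Park"] * 4,
    "posted_date": pd.to_datetime(["2019-01-10", "2019-01-25", "2019-02-03", "2019-02-20"]),
    "rating": [5, 3, 2, 4],
})

# Group by attraction and month start ("MS"), then compute review count and mean rating.
monthly = (
    df.groupby(["attraction", pd.Grouper(key="posted_date", freq="MS")])["rating"]
      .agg(["count", "mean"])
      .reset_index()
)
print(monthly)
```

Plotting the `mean` column per attraction over time would show whether ratings shift after changes such as new tour routines.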

 

About the Author

Theodore

Theodore is a jack of many trades and an expert in overthinking. He has worked in healthcare, healthcare administration, and finance and has experience in medical research. Having volunteered with medical missions abroad, managed building a new primary...