Data Analysis on Restaurant Reviews

Posted on Feb 2, 2020
The skills the author demonstrates here can be learned through the Data Science with Machine Learning bootcamp at NYC Data Science Academy.

How Important Are Online Reviews

Data shows that online restaurant reviews have changed how customers decide where to eat. In an ultra-competitive market like New York City, online reviews can make or break a restaurant. Restaurant owners are fully aware of the importance of customer reviews and are doing everything possible to boost their online image. There are many credible review sites: Squarespace, Yelp, and TripAdvisor, just to name a few. Some restaurants have vastly different ratings on different review sites, which makes it difficult for a customer to know which source is more reliable.

“94% of Diners Will Choose Your Restaurant Based on Online Reviews.”

– Michael Guta (Small Business Trends)

Data on the top-ranked NYC restaurant on TripAdvisor

One night I was browsing TripAdvisor for new restaurants to add to my restaurant bucket list. When I saw that Obao Midtown East was ranked No. 1 in New York City with a jaw-dropping 4.5 stars (out of 5), I thought I must have applied the wrong filters. After redoing the search, I was shocked to find that the Midtown Seamless staple, which makes everything from Vietnamese pho to pad thai, was indeed the top-ranked restaurant in all of New York City.

Perhaps Obao had upped their game recently? I checked Yelp for a second opinion, only to find the rating still at an uninspiring 3.5 stars. Curious about the huge difference between the two sites, I decided to get to the bottom of this.

To conduct a more in-depth analysis, I first scraped all the reviews of Obao from the two websites. At first glance, I noticed that ratings tend to be more generous on TripAdvisor than on Yelp.


Obao's rating has been on the move

Although there has always been a gap between Yelp and TripAdvisor reviews, ratings on both sites remained relatively consistent over the years. However, the rating on TripAdvisor surged in the most recent two months. Given that the restaurant has been around for many years, such a sharp turn is quite unexpected.

More surprisingly, the jump in rating coincided with a sudden increase in the number of reviews in December 2019 and January 2020.
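
The trend described above comes down to grouping reviews by site and month, then tracking the average rating and the review count side by side. Here is a minimal sketch of that aggregation using only the Python standard library. The rows are made-up toy data, not the scraped Obao reviews, and the `monthly_summary` helper and its field names are assumptions for illustration rather than the code used for this post.

```python
from collections import defaultdict
from datetime import date
from statistics import mean

# Toy, made-up reviews: (site, review date, star rating). Not real Obao data.
reviews = [
    ("tripadvisor", date(2019, 10, 3), 4),
    ("tripadvisor", date(2019, 10, 21), 3),
    ("tripadvisor", date(2019, 12, 5), 5),
    ("tripadvisor", date(2019, 12, 18), 5),
    ("tripadvisor", date(2019, 12, 27), 5),
    ("yelp", date(2019, 10, 9), 3),
    ("yelp", date(2019, 12, 14), 4),
]


def monthly_summary(rows):
    """Group reviews by (site, year, month); return mean rating and count."""
    buckets = defaultdict(list)
    for site, day, stars in rows:
        buckets[(site, day.year, day.month)].append(stars)
    return {
        key: {"mean_rating": mean(stars), "n_reviews": len(stars)}
        for key, stars in sorted(buckets.items())
    }


summary = monthly_summary(reviews)
for (site, year, month), stats in summary.items():
    print(f"{site} {year}-{month:02d}: "
          f"mean={stats['mean_rating']:.2f}, n={stats['n_reviews']}")
```

Plotting both series on a shared time axis makes it easy to see whether a rating jump lines up with a jump in review volume.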


Digging deeper, I discovered that more than half of the reviews in Dec’19 and Jan’20 were from first-time reviewers. In addition, the average rating for the period is an eye-popping 4.99. This doesn’t make sense…
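
A check like this one can be sketched as follows, assuming each scraped review carries the reviewer's total number of reviews on the site, so that a count of 1 marks a first-time reviewer. That field, the `period_profile` helper, and the rows below are all hypothetical and exist only to illustrate the calculation.

```python
from datetime import date
from statistics import mean

# Toy, made-up rows: (review date, star rating, reviewer's total review count).
reviews = [
    (date(2019, 11, 12), 4, 15),
    (date(2019, 12, 2), 5, 1),
    (date(2019, 12, 20), 5, 1),
    (date(2020, 1, 8), 5, 7),
    (date(2020, 1, 19), 5, 1),
]


def period_profile(rows, start, end):
    """Mean rating and first-time-reviewer share for start <= date <= end."""
    in_period = [(stars, total) for day, stars, total in rows
                 if start <= day <= end]
    if not in_period:
        return None  # no reviews in the requested window
    first_timers = sum(1 for _, total in in_period if total == 1)
    return {
        "n_reviews": len(in_period),
        "mean_rating": mean(stars for stars, _ in in_period),
        "first_time_share": first_timers / len(in_period),
    }


profile = period_profile(reviews, date(2019, 12, 1), date(2020, 1, 31))
print(profile)
```

Running the same function over an earlier window gives a baseline to compare the suspicious period against.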

How did they do it?

Seeking a clue, I started reading the user reviews and came across one written by an ex-colleague. It turned out Obao was offering each customer a free drink in exchange for writing a review. Pretty clever, right? But it is actually a violation of TripAdvisor's website policy and can result in penalties such as warning signs on the business's page and disqualification from TripAdvisor awards.

“TripAdvisor encourages businesses to ask all customers to write reviews and share their feedback. However, we do not allow offering any kind of incentive for a review because this can impact the impartiality of that review. Under our incentives policy, we penalize any businesses that are found to be offering incentives to customers.”

– TripAdvisor policy

A better review data platform is needed

Review websites such as TripAdvisor are built on credibility. Biased or inflated reviews can have a detrimental effect on user trust. Review sites have to step up their effort to continually evolve their review platforms to prevent businesses from gaming the system. Here are a couple of recommendations:

  1. Adding a question before the review section asking review writers if they were asked by the business to write a review. This can deter businesses from offering incentives in exchange for reviews.
  2. Implementing rating trend line analysis, such as the technique used in this article, to catch any suspicious rating increase.
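
The second recommendation could be sketched roughly as follows: compare each month's average rating and review count against the average of the preceding few months, and flag months where both jump at once. The function name, the three-month window, and the thresholds are illustrative assumptions, not values any review site is known to use, and the monthly figures are made-up toy data.

```python
from statistics import mean


def flag_suspicious_months(monthly, window=3, rating_jump=0.5, volume_ratio=2.0):
    """Flag months whose rating and review volume both jump above the
    average of the preceding `window` months. Thresholds are illustrative."""
    flagged = []
    for i in range(window, len(monthly)):
        baseline = monthly[i - window:i]
        base_rating = mean(m["mean_rating"] for m in baseline)
        base_volume = mean(m["n_reviews"] for m in baseline)
        current = monthly[i]
        rating_up = current["mean_rating"] - base_rating >= rating_jump
        volume_up = current["n_reviews"] >= volume_ratio * base_volume
        if rating_up and volume_up:
            flagged.append(current["month"])
    return flagged


# Toy, made-up monthly aggregates, oldest first.
monthly = [
    {"month": "2019-08", "mean_rating": 4.0, "n_reviews": 10},
    {"month": "2019-09", "mean_rating": 4.1, "n_reviews": 12},
    {"month": "2019-10", "mean_rating": 3.9, "n_reviews": 11},
    {"month": "2019-11", "mean_rating": 4.0, "n_reviews": 9},
    {"month": "2019-12", "mean_rating": 4.9, "n_reviews": 40},
]
flagged = flag_suspicious_months(monthly)
print(flagged)
```

One caveat with this simple version: once a flagged month enters the trailing window, it inflates the baseline for the months that follow, so a production check would likely exclude flagged months from later baselines.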


About Author

Vincent Ji

Vincent is a data scientist and a former research data associate at Bridgewater Associates. Prior to that, he was an associate at BlackRock, focusing on data analytics, business strategy, and implementation. He started his career as a management...