Food delivery: a new revenue source, but also more complexity to manage
The skills the authors demonstrated here can be learned by taking the Data Science with Machine Learning bootcamp at NYC Data Science Academy.
Introduction
Over the past decade, the number of consumer review websites such as Yelp.com has exploded. These websites let consumers share their experiences of service, product quality, restaurant atmosphere and other aspects. Today it is easy to find information from countless other consumers about restaurants, hotels and products, and this information has a significant impact on businesses.
Another significant change in recent years is the growth of the food delivery market. Many websites and apps now deliver meals from restaurants, including some that have not traditionally offered food to go. For restaurant owners the extra business is often welcome, but introducing a third party can create a number of problems.
Bad reviews can harm a business, and delivery service adds a new factor for customers to review. Given this, the study compares reviews of the same restaurants on Yelp with reviews on Seamless (a delivery service).
Data Collection
On Yelp.com, I used Scrapy to web scrape 393,314 reviews from 570 restaurants in New York City.
On Seamless.com, I used Selenium to web scrape 335,169 reviews from 5,612 restaurants in New York City.
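The post does not include the scraping code. The sketch below illustrates only the parsing step, using Python's standard-library html.parser. It runs against a hypothetical, simplified review markup. The class names review, rating and text were invented for this example and do not reflect the real structure of Yelp or Seamless pages, which also changes over time. In an actual Scrapy spider, the same extraction would normally be written with CSS or XPath selectors (for example response.css(...)) inside the spider's parse method.

```python
from html.parser import HTMLParser


class ReviewParser(HTMLParser):
    """Collect rating/text pairs from a HYPOTHETICAL review markup:

    <div class="review">
      <span class="rating">4</span>
      <p class="text">Great dumplings</p>
    </div>
    """

    def __init__(self):
        super().__init__()
        self.reviews = []     # finished reviews, one dict each
        self._current = None  # review currently being built
        self._field = None    # field whose text is currently being read

    def handle_starttag(self, tag, attrs):
        classes = (dict(attrs).get("class") or "").split()
        if "review" in classes:
            self._current = {"rating": None, "text": ""}
        elif self._current is not None and "rating" in classes:
            self._field = "rating"
        elif self._current is not None and "text" in classes:
            self._field = "text"

    def handle_data(self, data):
        # Ignore text outside a review field (e.g. whitespace between tags).
        if self._current is None or self._field is None or not data.strip():
            return
        if self._field == "rating":
            self._current["rating"] = float(data.strip())
        else:
            self._current["text"] += data.strip()

    def handle_endtag(self, tag):
        if self._field is not None and tag in ("span", "p"):
            self._field = None
        elif tag == "div" and self._current is not None:
            self.reviews.append(self._current)
            self._current = None


# Usage on a made-up snippet.
html = """
<div class="review"><span class="rating">4</span><p class="text">Great dumplings</p></div>
<div class="review"><span class="rating">2</span><p class="text">Arrived cold</p></div>
"""
parser = ReviewParser()
parser.feed(html)
print(parser.reviews)
```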
The charts on the left show the number of reviews per borough from each website. The charts on the right show the number of restaurants per borough and price range:
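The counts behind charts like these are simple group-by tallies. Below is a minimal sketch using the standard library's collections.Counter. It assumes each scraped review and each restaurant is a dict with hypothetical site, borough and price fields. The values are toy data, not the study's data.

```python
from collections import Counter

# Toy records standing in for the scraped data; field names and values are made up.
reviews = [
    {"site": "yelp", "borough": "Manhattan"},
    {"site": "yelp", "borough": "Manhattan"},
    {"site": "yelp", "borough": "Brooklyn"},
    {"site": "seamless", "borough": "Queens"},
    {"site": "seamless", "borough": "Manhattan"},
]
restaurants = [
    {"name": "Sam's Deli", "borough": "Manhattan", "price": "$"},
    {"name": "Green Bowl", "borough": "Manhattan", "price": "$$"},
    {"name": "Golden Dragon", "borough": "Queens", "price": "$"},
]

# Reviews per (website, borough): the kind of data behind the left-hand charts.
reviews_per_borough = Counter((r["site"], r["borough"]) for r in reviews)

# Restaurants per (borough, price range): the kind of data behind the right-hand charts.
restaurants_per_borough_price = Counter((r["borough"], r["price"]) for r in restaurants)

for (site, borough), n in sorted(reviews_per_borough.items()):
    print(f"{site:9} {borough:10} {n} reviews")
```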
Analysis of the Yelp database:
Before joining the databases from both websites for the final analysis, I analyzed the Yelp data on its own.
Comparing restaurants across boroughs, Manhattan, Queens and Staten Island stand out with higher-rated restaurants: in each of them, 75% of the restaurants have an overall rating between 4 and 5.
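A box-plot statement like "75% of the restaurants are rated between 4 and 5" amounts to saying that the first quartile (Q1) is at or above 4. The minimal sketch below computes per-borough quartiles with the standard library's statistics.quantiles. It also computes the exact share of restaurants rated 4 or higher, so the two numbers can be checked against each other. The data are made-up (borough, rating) pairs.

```python
from collections import defaultdict
from statistics import quantiles

# Toy (borough, overall Yelp rating) pairs; made-up values, not the study's data.
restaurants = [
    ("Manhattan", 4.5), ("Manhattan", 4.0), ("Manhattan", 3.5),
    ("Manhattan", 4.5), ("Manhattan", 5.0),
    ("Bronx", 3.0), ("Bronx", 4.0), ("Bronx", 3.5), ("Bronx", 4.5),
]

ratings_by_borough = defaultdict(list)
for borough, rating in restaurants:
    ratings_by_borough[borough].append(rating)

q1_by_borough = {}
share_4_plus = {}
for borough, ratings in ratings_by_borough.items():
    # method="inclusive" interpolates linearly between data points, matching
    # numpy's default percentile method used by matplotlib box plots.
    q1, median, q3 = quantiles(ratings, n=4, method="inclusive")
    q1_by_borough[borough] = q1
    # Exact share of restaurants rated 4 or higher.
    share_4_plus[borough] = sum(r >= 4 for r in ratings) / len(ratings)
    print(f"{borough}: Q1={q1:.2f} median={median:.2f} Q3={q3:.2f} "
          f"share rated 4+={share_4_plus[borough]:.0%}")
```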
Since the main purpose of this study is to analyze how delivery service affects ratings, the following chart shows Yelp user ratings for restaurants with and without delivery service. Restaurants with delivery service have a larger share of low ratings.
To check whether lower ratings for restaurants with delivery are a general pattern across New York City or vary by borough, I plotted the following chart. Manhattan and Brooklyn show the same pattern, Queens shows the opposite, and in the Bronx delivery service seems to make no difference. This suggests that the infrastructure, traffic or service in each borough might have an impact on reviews.
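The comparison above groups restaurants by delivery flag, both city-wide and within each borough, and looks at how many of them are rated low. In the minimal sketch below, the 3.5-star cut-off for a "low" rating is an assumption chosen for illustration; the post does not state a threshold. The records are toy values.

```python
from collections import defaultdict

# Toy (borough, has_delivery, Yelp rating) records; made-up values.
restaurants = [
    ("Manhattan", True, 3.0), ("Manhattan", True, 4.5),
    ("Manhattan", False, 4.5), ("Manhattan", False, 4.0),
    ("Queens", True, 4.5), ("Queens", False, 3.0),
]
LOW_RATING = 3.5  # ASSUMED cut-off for a "low" rating; not from the original study


def low_rating_share(ratings):
    """Share of restaurants rated strictly below LOW_RATING."""
    return sum(r < LOW_RATING for r in ratings) / len(ratings)


# Group ratings by (borough, delivery flag), plus a city-wide group per flag.
groups = defaultdict(list)
for borough, delivery, rating in restaurants:
    groups[(borough, delivery)].append(rating)
    groups[("All NYC", delivery)].append(rating)

shares = {key: low_rating_share(ratings) for key, ratings in groups.items()}
for (borough, delivery), share in sorted(shares.items()):
    print(f"{borough:10} delivery={delivery!s:5} low-rated share={share:.0%}")
```

Comparing the delivery=True and delivery=False rows within each borough is what reveals whether the pattern holds everywhere or reverses, as it does for Queens in the post.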
Analysis of the Seamless database:
The same analysis done for Yelp was performed on the Seamless data. Seamless also reports what reviewers say about the quality of the food, the quality of the delivery and the accuracy of the order placed on the website. I used box plots to check whether these variables are related to the restaurants' overall ratings.
There do not appear to be many "bad" reviews related to order accuracy, that is, whether the food delivered matched the order placed on the website or app.
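The numbers behind those box plots can be summarized per aspect: how many reviews were negative on it, and the mean overall rating when the aspect was positive versus negative. The minimal sketch below assumes each Seamless review carries a star rating and three yes/no flags. The field names food_good, delivery_on_time and order_accurate, and the yes/no encoding itself, are hypothetical.

```python
from statistics import fmean

# Toy Seamless-style reviews; field names and the yes/no encoding are ASSUMPTIONS.
reviews = [
    {"rating": 5, "food_good": True, "delivery_on_time": True, "order_accurate": True},
    {"rating": 2, "food_good": False, "delivery_on_time": False, "order_accurate": True},
    {"rating": 4, "food_good": True, "delivery_on_time": False, "order_accurate": True},
    {"rating": 1, "food_good": False, "delivery_on_time": True, "order_accurate": False},
]
ASPECTS = ("food_good", "delivery_on_time", "order_accurate")

summary = {}
for aspect in ASPECTS:
    yes = [r["rating"] for r in reviews if r[aspect]]
    no = [r["rating"] for r in reviews if not r[aspect]]
    summary[aspect] = {
        "negative_reviews": len(no),  # few negatives => few "bad" reviews on this aspect
        "mean_if_yes": fmean(yes) if yes else None,
        "mean_if_no": fmean(no) if no else None,
    }
    print(aspect, summary[aspect])
```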
Analysis of the combined Seamless and Yelp database:
To compare restaurants that appear on both websites, I joined the two databases, ending up with a total of 135 restaurants. The chart below shows the total number of reviews for these restaurants on each website.
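The post does not say how restaurants were matched across the two sites. One plausible approach, sketched below purely as an assumption, is an inner join on a normalized key built from the restaurant name (lower-cased, punctuation removed) plus its ZIP code. The match_key helper, the field names and the records are all hypothetical.

```python
import re


def match_key(name, zip_code):
    """HYPOTHETICAL join key: lower-cased name without punctuation, plus ZIP code."""
    normalized = re.sub(r"[^a-z0-9 ]", "", name.lower())
    return (" ".join(normalized.split()), zip_code)


# Toy records; names, ZIP codes and numbers are made up.
yelp = [
    {"name": "Sam's Deli", "zip": "10014", "yelp_rating": 4.0, "yelp_reviews": 1520},
    {"name": "Golden Dragon", "zip": "11354", "yelp_rating": 4.0, "yelp_reviews": 2300},
    {"name": "Corner Cafe", "zip": "11201", "yelp_rating": 3.5, "yelp_reviews": 120},
]
seamless = [
    {"name": "Sams Deli", "zip": "10014", "seamless_rating": 4.5, "seamless_reviews": 300},
    {"name": "GOLDEN DRAGON", "zip": "11354", "seamless_rating": 4.0, "seamless_reviews": 800},
    {"name": "Other Place", "zip": "10001", "seamless_rating": 3.0, "seamless_reviews": 50},
]

# Index one side by key, then keep only restaurants present on both sites (inner join).
seamless_by_key = {match_key(r["name"], r["zip"]): r for r in seamless}
combined = []
for y in yelp:
    s = seamless_by_key.get(match_key(y["name"], y["zip"]))
    if s is not None:
        combined.append({**s, **y})  # Yelp's name/zip win on overlapping keys

print(len(combined), "restaurants on both sites")
```

Exact-key matching is strict: restaurants listed under different names or addresses on the two sites will be dropped. That is one reason a join like this can shrink the data set sharply, as seen in the post.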
If you see the overall rating from the restaurants together if it doesn't seem to be very different but to confirm that, I plotted the results for restaurants that had more than 1,000 reviews on Yelp. The chart shows that some restaurants do not match but it looks like they usually have similar results.
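The filtered comparison can be sketched as follows: keep restaurants with more than 1,000 Yelp reviews, then flag those whose two ratings differ noticeably. The one-star gap used to call a pair "not matching" is an assumption for illustration, and the joined records are toy values.

```python
# Toy joined records (same restaurant on both sites); made-up values.
combined = [
    {"name": "Sam's Deli", "yelp_reviews": 1520, "yelp_rating": 4.0, "seamless_rating": 4.5},
    {"name": "Golden Dragon", "yelp_reviews": 2300, "yelp_rating": 4.0, "seamless_rating": 4.0},
    {"name": "Green Bowl", "yelp_reviews": 1100, "yelp_rating": 4.5, "seamless_rating": 3.0},
    {"name": "Corner Cafe", "yelp_reviews": 120, "yelp_rating": 3.5, "seamless_rating": 4.5},
]
MIN_YELP_REVIEWS = 1000  # threshold used in the post
MISMATCH_STARS = 1.0     # ASSUMED: a gap of at least one star counts as "not matching"

popular = [r for r in combined if r["yelp_reviews"] > MIN_YELP_REVIEWS]
mismatched = [
    r["name"] for r in popular
    if abs(r["yelp_rating"] - r["seamless_rating"]) >= MISMATCH_STARS
]

for r in popular:
    print(f'{r["name"]:14} Yelp {r["yelp_rating"]}  Seamless {r["seamless_rating"]}')
print("Not matching:", mismatched)
```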
Whatever the charts suggested, I also tested the correlation between the two ratings. The coefficient was -0.0215, which indicates almost no linear relationship. I then ran a two-sample t-test. The p-value of 1.9931e-223 gives strong evidence that the two samples do not share the same mean. So the overall rating given by Yelp users (in most cases, people dining in the restaurant) differs from the rating given by people ordering food online through Seamless.
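The post does not show how these two statistics were computed. In practice one would typically call scipy.stats.pearsonr and scipy.stats.ttest_ind, where equal_var=False gives Welch's test. To stay self-contained, the sketch below implements both with the standard library. Two points are assumptions. First, it uses Welch's variant of the two-sample t-test, because the post does not say which variant was used. Second, it computes the p-value with the normal approximation to the t distribution, which is only reasonable for very large samples such as hundreds of thousands of reviews. The post also does not state whether the test compared per-review or per-restaurant ratings; these functions work on either.

```python
import math
from statistics import fmean, variance


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = fmean(x), fmean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def welch_t_test(a, b):
    """Welch's t statistic with an APPROXIMATE two-sided p-value.

    The p-value uses the standard normal in place of the t distribution,
    which is only accurate for large samples.
    """
    se = math.sqrt(variance(a) / len(a) + variance(b) / len(b))
    t = (fmean(a) - fmean(b)) / se
    p = math.erfc(abs(t) / math.sqrt(2))  # equals 2 * (1 - Phi(|t|))
    return t, p


# Toy paired ratings for the same restaurants; made-up values. At this tiny
# size the normal approximation is poor, so treat the p-value as illustrative.
yelp_ratings = [4.0, 4.5, 3.5, 4.0, 5.0, 3.0]
seamless_ratings = [4.5, 4.0, 4.0, 3.5, 4.5, 4.0]
r = pearson_r(yelp_ratings, seamless_ratings)
t, p = welch_t_test(yelp_ratings, seamless_ratings)
print(f"r = {r:.3f}, t = {t:.3f}, p = {p:.3g}")
```

A correlation near zero and a tiny p-value answer different questions. The first says that knowing a restaurant's Yelp rating tells you little about its Seamless rating. The second says that the average rating differs between the two samples.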
Conclusions
This study showed that Yelp and Seamless give different overall ratings to the same restaurant. So, before starting a food delivery service, a restaurant needs to be aware of the new factors it may bring to restaurant management.
Some concerns related to starting a delivery service are the quality and temperature of the food and the extra orders at peak dinner times. Online prices may also be higher than those on the restaurant's menu. Any of these may lead to a bad experience with the delivered food.
A deeper study could apply sentiment analysis to the reviews to gather more evidence for these findings.
If you want to see more information about this study, you can check my GitHub.