Food delivery: a new revenue source but also more complexity to manage

Posted on Oct 29, 2017

Introduction

Over the past decade, the number of consumer review websites, such as Yelp.com, has exploded. These websites allow consumers to share their experiences of service, product quality, restaurant environment and other aspects. It is now very easy to obtain information from countless other consumers about restaurants, hotels and products, and these reviews have a significant impact on businesses.

Another significant change in recent years is the continued growth of the food delivery market, driven by websites and apps that deliver meals from restaurants, including some that have not traditionally offered food to go. For restaurant owners, the extra business is often welcome, but introducing a third party can create a large number of problems.

Given that bad reviews can harm a business and that the delivery service is a new factor to be reviewed, this study analyzes how reviews on Yelp compare with reviews on Seamless (a delivery service).

Data Collection

On Yelp.com, I used Scrapy to web scrape 393,314 reviews from 570 restaurants in New York City.

On Seamless.com, I used Selenium to web scrape 335,169 reviews from 5,612 restaurants in New York City.

The charts on the left show the number of reviews per borough from each website, and the charts on the right show the number of restaurants per borough and price:

Analysis on Yelp database:

Before joining the databases from both websites for the final analysis, I performed a separate analysis on the Yelp data.

Comparing restaurants across boroughs, Manhattan, Queens and Staten Island show higher ratings: 75% of their restaurants have an overall rating between 4 and 5.
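The "75% of restaurants rate between 4 and 5" statement is equivalent to saying that the first quartile of the overall rating is at least 4. Below is a minimal sketch of that check using only the Python standard library. The records, the tuple layout and the function name are hypothetical and were invented for illustration; they are not the scraped data or the code used in this study.

```python
from collections import defaultdict
from statistics import quantiles

# Hypothetical (borough, overall_rating) records, invented for illustration only.
restaurants = [
    ("Manhattan", 4.5), ("Manhattan", 4.0), ("Manhattan", 4.0),
    ("Manhattan", 3.5), ("Manhattan", 4.5),
    ("Bronx", 3.0), ("Bronx", 3.5), ("Bronx", 4.0),
    ("Bronx", 2.5), ("Bronx", 4.5),
]

def first_quartile_by_borough(rows):
    """Return the first quartile (Q1) of the overall rating for each borough."""
    grouped = defaultdict(list)
    for borough, rating in rows:
        grouped[borough].append(rating)
    # quantiles(..., n=4) returns the three cut points Q1, Q2 and Q3.
    return {
        borough: quantiles(ratings, n=4, method="inclusive")[0]
        for borough, ratings in grouped.items()
    }

q1 = first_quartile_by_borough(restaurants)
for borough, value in q1.items():
    # Q1 >= 4 means at least 75% of the restaurants are rated 4 or higher.
    print(borough, value, value >= 4.0)
```

A box plot shows the same information graphically: the bottom edge of each box is Q1.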

The main purpose of this study is to analyze how delivery service affects review ratings. The following chart shows user ratings on Yelp for restaurants with and without delivery service. Restaurants with delivery service have a larger share of lower ratings.

To check whether lower ratings for restaurants with delivery service are a general pattern in New York City or vary from one borough to another, I plotted the following chart. Manhattan and Brooklyn show the same pattern, Queens shows the opposite, and in the Bronx delivery service seems to make no difference. This suggests that the infrastructure, traffic or service in each borough might have an impact on reviews.
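One way to put a number on the borough comparison is to compute, for each borough, the share of low-rated restaurants with and without delivery. The sketch below assumes hypothetical records and an arbitrary cut-off of 3.5 for a "low" rating. Neither comes from the study, which relied on charts.

```python
from collections import defaultdict

# Hypothetical (borough, has_delivery, overall_rating) records for illustration.
rows = [
    ("Manhattan", True, 3.0), ("Manhattan", True, 4.0),
    ("Manhattan", False, 4.5), ("Manhattan", False, 4.0),
    ("Queens", True, 4.5), ("Queens", True, 4.0),
    ("Queens", False, 3.0), ("Queens", False, 4.0),
]

def low_rating_share(records, threshold=3.5):
    """Share of restaurants rated below `threshold` per (borough, has_delivery)."""
    counts = defaultdict(lambda: [0, 0])  # key -> [low count, total count]
    for borough, has_delivery, rating in records:
        entry = counts[(borough, has_delivery)]
        if rating < threshold:  # threshold is an assumed cut-off, not from the study
            entry[0] += 1
        entry[1] += 1
    return {key: low / total for key, (low, total) in counts.items()}

shares = low_rating_share(rows)
for (borough, has_delivery), share in sorted(shares.items()):
    print(f"{borough:10s} delivery={has_delivery!s:5s} low-rated share={share:.2f}")
```

With real data, a borough where the delivery group has the higher share would match the Manhattan and Brooklyn pattern described above, and a borough where it has the lower share would match Queens.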

Analysis on Seamless database:

As with Yelp, the same analysis was performed on the Seamless data. Seamless publishes what reviewers say about the quality of the food, the quality of the delivery and the quality of the order placed on the website. I drew a box plot to check whether these variables could be related to the overall rating of the restaurants.

There do not appear to be many "bad" reviews related to order accuracy (whether the food delivered matches the order placed on the website/app).

Analysis on Seamless and Yelp combined database:

To perform the complete analysis comparing restaurants that appear on both websites, I joined the databases, ending up with a total of 135 restaurants. The total number of reviews for these restaurants on both websites is shown in the chart below.
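The post does not say which key was used to match restaurants across the two websites. As a rough sketch only, the join could be done on a normalized restaurant name plus ZIP code. The field names, the key and the sample records below are all assumptions made for illustration.

```python
def normalize(name):
    """Lowercase and keep only letters and digits, so small formatting
    differences (case, punctuation, spacing) do not block a match."""
    return "".join(ch for ch in name.lower() if ch.isalnum())

# Hypothetical records; the field names are assumptions, not the scraped schema.
yelp = [
    {"name": "Sample Pizza Co.", "zip": "10014", "rating": 4.0},
    {"name": "Example Trattoria", "zip": "11201", "rating": 4.5},
]
seamless = [
    {"name": "SAMPLE PIZZA CO", "zip": "10014", "rating": 4.6},
    {"name": "Placeholder Sushi", "zip": "10003", "rating": 4.2},
]

def inner_join(yelp_rows, seamless_rows):
    """Keep only restaurants that appear on both websites."""
    index = {(normalize(r["name"]), r["zip"]): r for r in seamless_rows}
    joined = []
    for y in yelp_rows:
        match = index.get((normalize(y["name"]), y["zip"]))
        if match is not None:
            joined.append({
                "name": y["name"],
                "zip": y["zip"],
                "yelp_rating": y["rating"],
                "seamless_rating": match["rating"],
            })
    return joined

matched = inner_join(yelp, seamless)
print(matched)
```

An inner join like this drops every restaurant that is missing from either site, which explains why the combined set is much smaller than either source.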

Initially, I drew a box plot comparing the ratings of these restaurants per borough, and you can see differences in some boroughs, as follows:

Looking at the overall ratings of all restaurants together, the two websites do not seem very different. To confirm that, I plotted the results for restaurants that had more than 1,000 reviews on Yelp. The chart shows that some restaurants do not match, but they usually have similar results.

Regardless of what both charts showed, I tested the correlation between the two sets of ratings and obtained -0.0215, which indicates essentially no linear relationship. I also ran a two-sample t-test, and the p-value was 1.9931e-223, which means the hypothesis that both samples have the same mean is rejected. So, the overall rating given by people using Yelp (in most cases after going to the restaurant) is different from the rating given by people ordering food online through Seamless.
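For readers who want to see how these two statistics are computed, here is a minimal standard-library sketch. It makes several assumptions: the ratings are invented, the t statistic uses Welch's unequal-variance form (the post does not say which variant was run), and the p-value uses a normal approximation that is only reasonable for large samples. A statistics library would use the t distribution instead.

```python
from math import sqrt
from statistics import NormalDist, mean, variance

# Hypothetical paired ratings for the same restaurants, invented for illustration.
yelp_ratings = [4.0, 3.5, 4.5, 4.0, 3.0, 4.5]
seamless_ratings = [4.6, 4.4, 4.1, 4.8, 4.5, 4.3]

def pearson(x, y):
    """Pearson correlation coefficient between two equal-length samples."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def welch_t(a, b):
    """Two-sample t statistic with unequal variances (Welch's form, assumed)."""
    standard_error = sqrt(variance(a) / len(a) + variance(b) / len(b))
    return (mean(a) - mean(b)) / standard_error

def two_sided_p_normal(t):
    """Two-sided p-value from a normal approximation (large samples only)."""
    return 2 * (1 - NormalDist().cdf(abs(t)))

r = pearson(yelp_ratings, seamless_ratings)
t = welch_t(yelp_ratings, seamless_ratings)
p = two_sided_p_normal(t)
print(f"correlation={r:.4f}  t={t:.3f}  p={p:.4g}")
```

The two numbers answer different questions: the correlation asks whether restaurants rated higher on one site tend to be rated higher on the other, while the t-test asks whether the average rating differs between the sites.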

Conclusions

This study showed that Yelp and Seamless have different overall ratings for the same restaurant. So, before starting a food delivery service, a restaurant needs to be aware of the new factors that it may bring to restaurant management.

Some concerns related to starting a delivery service are the quality and temperature of the food, more orders arriving at the restaurant during peak dinner times, and online prices that may be higher than on the restaurant menu, all of which may lead to a bad delivery experience.

A deeper study could use sentiment analysis on the reviews to gather more information to support this conclusion.

If you want to see more information about this study, you can check my GitHub.

About Author

Neuton Fonseca

MBA in Business Analytics and Big Data (ongoing) and recent certification as Data Scientist with an engineering background alongside with 7 years of corporate business experience. A problem solver with passion to gather and analyze data to drive...
