Market Research Data Analysis for Restaurant Expansion

Posted on May 17, 2020
The skills the author demonstrates here can be learned through the Data Science with Machine Learning bootcamp at NYC Data Science Academy.


In an age of globalization, international cuisines have appeared on every corner of US metropolitan cities, and the resulting competition has raised the standard of dining. Restaurant-goers have also become more critical, sharing their opinions through social media and on online review websites (e.g. Yelp, Google Reviews).

Personally, I have a strong affinity for Italian food. Having grown up along the Mediterranean coast on a diet similar to the Italian one, I almost always go for a Neapolitan-style pizza with its charred, bubbly crust and a thin layer of sweet tomato sauce, topped off with a silky white mountain of fresh mozzarella and leaves of aromatic basil (see Figure 1 below).

Figure 1. Credit to @Nik_owens on Unsplash

Inspiration and Objective

I wanted my first piece of work to potentially help bring joy to other US cities that may be lacking a quality Italian marketplace and restaurant. With existing locations in New York City, Los Angeles, Chicago, Las Vegas, and Boston, Eataly has clearly prospered in large, affluent metropolitan cities, which raises the question of which city could be home to the next Eataly marketplace and restaurant.

First, I narrowed down the options by considering only cities whose median household income and population are similar to those of cities with existing Eataly locations. Starting from the 25 most populous US cities and those with the highest median income, I compared each candidate's median income and population to the averages across the existing Eataly cities, which narrowed the list to six. The resulting cities, all with high population and median income, were Seattle, Washington DC, Austin, San Francisco, Miami, and San Jose.
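The screening step above can be sketched roughly as follows. This is a minimal illustration, not the author's actual method: the `min_ratio` threshold, the function name `shortlist`, and the rule "at least a given fraction of the existing-city average" are all assumptions, and the city names and figures are placeholders rather than real census data.

```python
from statistics import mean


def shortlist(existing, candidates, min_ratio=0.8):
    """Keep candidates whose median income AND population are both at least
    `min_ratio` times the average across the existing cities.
    The threshold rule is an assumption made for this sketch."""
    avg_income = mean(c["median_income"] for c in existing.values())
    avg_pop = mean(c["population"] for c in existing.values())
    return [
        name
        for name, c in candidates.items()
        if c["median_income"] >= min_ratio * avg_income
        and c["population"] >= min_ratio * avg_pop
    ]


# Placeholder figures for illustration only (not real census data).
existing = {
    "Existing A": {"median_income": 70_000, "population": 900_000},
    "Existing B": {"median_income": 60_000, "population": 700_000},
}
candidates = {
    "Candidate X": {"median_income": 80_000, "population": 750_000},
    "Candidate Y": {"median_income": 45_000, "population": 1_200_000},
    "Candidate Z": {"median_income": 66_000, "population": 300_000},
}

print(shortlist(existing, candidates))  # only Candidate X clears both thresholds
```

Requiring both measures to clear the bar mirrors the idea that a candidate should resemble the existing cities on income and population at the same time, rather than excelling on only one.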

Web Scraping & Data Preparation

To better assess which expansion location would be the best fit for a new Eataly marketplace and restaurant, I turned to Yelp to gather data on the 1,000 top-rated restaurants in each of the six potential cities. Yelp lets customers rate any business both with a standardized score on a scale of 1 to 5 stars and with a more thorough written review. For this study, only the aggregated rating for each restaurant was necessary; the detailed reviews would be better suited to a different method, such as Natural Language Processing.

When searching for Italian restaurants in a given city, as seen in Figure 2, Yelp returns a list of 30 restaurants per page, all within close proximity of the selected location. For each restaurant, I looked to gather the following information:

  • Location within the designated city
  • Name
  • Number of reviews obtained
  • Average rating
  • Price range
Figure 2. Sample of scraped restaurant review on Yelp

After determining the important features to collect from Yelp, I built a web scraping tool using Scrapy, a Python framework for large-scale web scraping, and saved the output of 6,000 restaurant ratings across the six cities to a CSV file for analysis.
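With 30 results per page and 1,000 restaurants per city, the scraper has to walk through roughly 34 result pages for each city. The sketch below only builds the list of page URLs; it does not reproduce the author's spider. The URL pattern and the query parameter names (`find_desc`, `find_loc`, `start`) are assumptions about Yelp's search pages and may not match the live site, and `search_urls` is a hypothetical helper.

```python
from urllib.parse import urlencode

RESULTS_PER_PAGE = 30  # page size described in the text


def search_urls(city, term="Italian", max_results=1000):
    """Build one search URL per result page for a city.
    The base URL and parameter names are assumed, not confirmed."""
    base = "https://www.yelp.com/search"
    urls = []
    for offset in range(0, max_results, RESULTS_PER_PAGE):
        query = urlencode({"find_desc": term, "find_loc": city, "start": offset})
        urls.append(f"{base}?{query}")
    return urls


urls = search_urls("Seattle, WA")
print(len(urls))  # 34 pages cover offsets 0 through 990
print(urls[1])
```

A list like this could be assigned to a Scrapy spider's `start_urls`, with the parsing callback extracting the name, rating, review count, and price range from each result.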


Before drawing any conclusions, I had to make a few assumptions, both because of the limited data attainable for each restaurant and to create more objective metrics for comparing the favorability of the different cities. The assumptions are listed below:

  • The age of a restaurant is not a factor in the number of reviews it has
  • Customers are equally likely to leave a review whether their experience was subpar, mediocre, good, or of the highest quality
  • Every review represents a unique customer or group of customers for each restaurant, rather than a single customer (or group) being associated with multiple reviews for one restaurant

Data Analysis

The goal of my analysis was to determine the optimal location for Eataly to expand to while targeting a similar market segment of customers. I took three approaches: comparing average ratings to gauge customers' overall satisfaction with the restaurants and cuisine, measuring the frequency of visits to restaurants as an indicator of demand in each city, and comparing the proportion of restaurants within Eataly’s price range in each city to gauge the level of competition a new restaurant would face.

First Approach - Ratings

My first approach was to analyze and compare the aggregate ratings of restaurants across all six cities. With one thousand restaurants in each location and hundreds of thousands of reviews in total, Seattle has the highest average rating for Italian restaurants among all cities, as seen in Table 1 and Figure 3 below. Seattle’s average restaurant rating of 4.12 out of 5 is the highest and also has the lowest standard deviation, which suggests that the large majority of restaurants in Seattle are of high quality and well appreciated by customers.

City                 Mean    Standard Deviation
Seattle, WA          4.12    0.28
Miami, FL            3.96    0.45
San Francisco, CA    3.92    0.46
San Jose, CA         3.90    0.40
Austin, TX           3.87    0.45
Washington, DC       3.80    0.48

Table 1 - Mean and Standard Deviation of restaurant ratings


Figure 3 - Comparing average Yelp ratings for Italian restaurants in potential cities
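The per-city summary in Table 1 amounts to grouping the scraped rows by city and taking the mean and standard deviation of the ratings. Below is a minimal sketch using only the standard library. The column names and the toy rows are assumptions standing in for the real CSV, and the sample standard deviation is used here because the text does not say which variant was computed.

```python
import csv
import io
from collections import defaultdict
from statistics import mean, stdev

# Toy rows standing in for the scraped CSV; column names are assumed.
SAMPLE_CSV = """city,name,rating,reviews,price
Seattle,A,4.5,120,$$
Seattle,B,4.0,80,$$
Seattle,C,3.5,40,$
Austin,D,4.0,200,$$
Austin,E,3.0,50,$$$
"""


def rating_summary(rows):
    """Return {city: (mean rating, sample standard deviation)}, rounded to 2 places."""
    by_city = defaultdict(list)
    for row in rows:
        by_city[row["city"]].append(float(row["rating"]))
    return {
        city: (round(mean(vals), 2), round(stdev(vals), 2))
        for city, vals in by_city.items()
    }


rows = list(csv.DictReader(io.StringIO(SAMPLE_CSV)))
summary = rating_summary(rows)

# Print cities from highest to lowest mean rating, as in Table 1.
for city, (avg, sd) in sorted(summary.items(), key=lambda kv: kv[1][0], reverse=True):
    print(f"{city}: mean={avg}, sd={sd}")
```

Filtering `rows` to a single price symbol before calling `rating_summary` would give the kind of price-restricted comparison made in the next step of the analysis.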


Focusing on Italian restaurants as a whole, this suggests that Seattle has a high level of competition for any Italian restaurant, but also that customers there generally have a very high appreciation for Italian food.

To dig deeper into the restaurant ratings, I considered only restaurants within the same price category as Eataly ($$, or middle-priced). Comparing the ratio of middle-priced restaurants to all restaurants in each city, as seen in Table 2, there is a relatively low proportion of restaurants within Eataly’s category in Seattle (0.73), Miami (0.74), and Washington DC (0.67); thus, oversaturation of middle-priced restaurants does not appear to be a concern in any of these three locations.

City                 Mean     Standard Deviation    Ratio of Restaurants
Seattle, WA          4.114    0.306                 0.73
San Francisco, CA    3.937    0.462                 0.79
Miami, FL            3.936    0.466                 0.74
Austin, TX           3.912    0.427                 0.73
San Jose, CA         3.900    0.405                 0.76
Washington, DC       3.783    0.480                 0.67

Table 2 - Aggregate of middle-priced restaurant ratings in each location


Figure 4 - Comparing average Yelp ratings for $$ Italian restaurants

Since Yelp reviews can be very subjective and they are likely to vary in their level of critique from one city to another and from one price category to another, I decided to look at the general picture of all Italian restaurants across all price points in each of the six potential cities.

One thing that stands out in Figure 5 is that Seattle has the highest average rating for Italian restaurants in each price category among all the cities. We could simply read this as a very competitive market, with all restaurants faring tremendously well; on the other hand, it could just as well mean that Seattle's restaurant-goers do not have a very difficult palate to satisfy and are less “picky” for the most part. To look into the matter of competition further, I decided to home in on another feature to see how competitive each city is across all three price categories.

Figure 5 - Average restaurant rating per price category

Second Approach – Customer visits/frequency data

My next approach was to look at customer frequency: which price category of restaurants draws the highest proportion of visitors, relative to the other categories, in each potential city.

First, considering Italian restaurants that range between $11 and $30 per person (Eataly’s price range per Yelp’s classification), the proportion of customers visiting middle-priced Italian restaurants is highest in Seattle and San Francisco, at 79%. Per Table 3 and Figure 6, Washington DC and San Jose have a noticeably lower proportion of customers visiting middle-priced Italian restaurants than the remaining cities, which could indicate low demand for a new Eataly restaurant.

City                 $1-$10 Range    $11-$30 Range    $31+ Range
Seattle, WA          0.07            0.79             0.14
Miami, FL            0.12            0.73             0.15
Washington, DC       0.19            0.68             0.13
San Jose, CA         0.10            0.69             0.21
San Francisco, CA    0.06            0.79             0.15
Austin, TX           0.13            0.76             0.11

Table 3 - Proportion of customers per restaurant price category


Figure 6 - Proportion of customers per restaurant price category
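One plausible way to arrive at proportions like those in Table 3 is to treat each review as one customer visit, in line with the assumptions listed earlier, and sum review counts per price category within each city. The sketch below follows that reading. The review-count proxy, the field names, the handling of higher price symbols, and the toy numbers are all assumptions rather than the author's confirmed method.

```python
from collections import defaultdict

# Yelp price symbols mapped to the ranges used in the tables;
# anything above "$$" is folded into the top range (an assumption).
PRICE_LABELS = {"$": "$1-$10", "$$": "$11-$30"}


def customer_share(restaurants):
    """Share of reviews (a proxy for customers) per price range, by city."""
    totals = defaultdict(lambda: defaultdict(int))
    for r in restaurants:
        label = PRICE_LABELS.get(r["price"], "$31+")
        totals[r["city"]][label] += r["reviews"]
    shares = {}
    for city, by_price in totals.items():
        city_total = sum(by_price.values())
        shares[city] = {
            label: round(count / city_total, 2) for label, count in by_price.items()
        }
    return shares


# Toy data for illustration only.
restaurants = [
    {"city": "Seattle", "price": "$$", "reviews": 300},
    {"city": "Seattle", "price": "$", "reviews": 50},
    {"city": "Seattle", "price": "$$$", "reviews": 100},
    {"city": "Seattle", "price": "$$$$", "reviews": 50},
]

shares = customer_share(restaurants)
print(shares["Seattle"])
```

Replacing the review count with a constant 1 per restaurant would instead give the share of restaurants in each price range.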

Another point that I wanted to consider is the purchasing power of customers in each city. Per Table 4 and Figure 7, 73% of Italian restaurants in Seattle are within the $11-$30 price range whereas, per Table 3, 79% of Italian restaurant-goers in Seattle visit this category. This is the largest gap among the six cities between demand and the supply available to meet it.

The same can’t be said for the other cities, such as Miami, Washington DC, San Jose, and San Francisco, where the proportion of customers at middle-priced Italian restaurants is lower than, or within about one percentage point of, the proportion of available restaurants. Austin is the only other city where demand clearly exceeds supply (0.76 versus 0.73). This presents a potential expansion opportunity for Eataly in both Seattle and Austin.

City                 $1-$10 Range    $11-$30 Range    $31+ Range
Seattle, WA          0.10            0.73             0.17
Miami, FL            0.12            0.74             0.14
Washington, DC       0.22            0.67             0.11
San Jose, CA         0.13            0.76             0.11
San Francisco, CA    0.05            0.79             0.16
Austin, TX           0.16            0.73             0.11

Table 4 - Proportion of restaurants per price category

Figure 7 - Proportion of restaurants per price category
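The supply-versus-demand comparison can be made explicit by subtracting the middle-price column of Table 4 from the middle-price column of Table 3. The values below are copied from those two tables; the variable names and the use of a simple difference as the gap measure are choices made for this sketch.

```python
# Middle-price ($11-$30) shares copied from Table 3 (customers)
# and Table 4 (restaurants).
customers_mid = {
    "Seattle": 0.79, "Miami": 0.73, "Washington DC": 0.68,
    "San Jose": 0.69, "San Francisco": 0.79, "Austin": 0.76,
}
restaurants_mid = {
    "Seattle": 0.73, "Miami": 0.74, "Washington DC": 0.67,
    "San Jose": 0.76, "San Francisco": 0.79, "Austin": 0.73,
}

# Positive gap: customers favor middle-priced restaurants more than supply does.
gap = {
    city: round(customers_mid[city] - restaurants_mid[city], 2)
    for city in customers_mid
}

ranked = sorted(gap.items(), key=lambda kv: kv[1], reverse=True)
for city, g in ranked:
    print(f"{city}: {g:+.2f}")
```

Seattle (+0.06) and Austin (+0.03) come out on top, while San Jose has the most negative gap (-0.07).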


Across the different approaches used to analyze the cities as potential expansion locations, most of the insights lead to the same conclusion: Seattle could be home to a new competitive and successful Eataly location.

First, the fact that Seattle has the highest average rating for Italian restaurants and the lowest variation in ratings among all cities suggests that the quality of its Italian eateries is, in general, high enough to match customers' expectations. Then, when diving deeper into Italian restaurants within Eataly’s price range, Seattle still had the highest average rating while maintaining the second-lowest ratio of restaurant availability in the region and the highest customer traffic to middle-priced restaurants, which points to a strong opportunity for a new Italian eatery in the city.

While this analysis shows Seattle having an overwhelming upper hand over the other potential cities, it only covers the potential pull of customers to a new Eataly. Many other factors should be taken into account when considering opening a restaurant in a location, such as operating expenses and the availability of land or rental space.

Mohamad's code and analysis for this project can be found on his GitHub.

About Author

Mohamad Sayed

Mohamad has an MS in Operations Research Engineering from the University of Southern California. Prior to the bootcamp, he worked in a variety of roles, mainly supply chain and project management. Currently, Mohamad is a Data Science Fellow...