New York City Menu Items - Web Scraping

Posted on Feb 20, 2017


Restaurant reviews are extremely common and used frequently by users to find restaurants with good reviews near them. Yelp, Urbanspoon, and Zomato are just a few of the popular apps and websites that aggregate reviews. However, these sites, while incredibly useful and popular, could do a better job of providing insights into individual menu items. What's good, what's overpriced, and where to find it. I wanted to create a way to search by menu item and see prices, information and location. I scraped menus from the website using Scrapy, providing me with menu items, prices, and descriptions, along with the locations and names of the restaurants. After data cleaning and removing lower priced items like beverages, I was left with about 500,000 menu items, mostly contained to Manhattan. From there, I created a map in shiny using R and leaflet that allows the user to filter by cuisine and more interestingly, by ingredient and even cooking method.

Overall_ViewOn the map, there are markers colored by the price of the item, Green for items less than $10, yellow less than $15, orange less than $20, and red for items that cost over $20


Clicking on a marker provides info about the item, including the restaurant, price, and description.

Popup_DescriptionThe map also provides the number of search results, as well as the minimum and maximum price


A heat map is also included, allowing the user to get a sense for the density of items with the desired features



Additionally, there is a table tab, so users can see the database the map is pulling from. Using this they can sort by price, the name of the item or restaurant, or by location.

Screen Shot 2017-02-20 at 3.05.50 PM


Finally, histograms of price are created dynamically, following a similar coloring structure, lower priced items in green and highest priced items in red. These histograms allow the user to gain insights into the distribution of prices, as well as the mean and median. Users can also change the bin size manually.




This application could be useful for potential restaurateurs, allowing them to get a sense for individual menu items around the city. They could use this information to competitively price their items or find locations where their food is not well represented, allowing them to take advantage of this information to better position their restaurant in the marketplace. It could also be useful for consumers, allowing users to find menu items with ingredients they like across cuisines. They can also find restaurants near them that  they haven't tried yet, but are intrigued by a specific menu item. In the future I would like to add the ability for users to rate individual menu items in addition to the restaurant overall, as I feel this would be provide useful information to users about which items to get when they eat at a restaurant as well as allow for a recommendation system based on the reviews individual items.

About Author


Daniel Epstein

Daniel Epstein is a neuroscience PHD candidate at the University of Utah, expecting to graduate in summer 2017. While performing analyses on behavioral and neuroimaging data, he became interested in utilizing data science to understand human behavior and...
View all posts by Daniel Epstein >

Related Articles

Leave a Comment

The Use of Web Scraping Services in Research - DataHen February 5, 2019
[…] see such information about it as prices, location and other details. Thus, he decided to do his own research using a web scraping […]
Jordan Brooker December 19, 2017
Hello, I am working on a similar project for a class and I was wondering about your use of In their terms & conditions, they say that users are not allowed to scrape or automatically access their webpages. Was the different when you completed this project? If you only access the site at a rate comparable to a real user, is that ok? Or, do you know of any restaurant sites that allow scraping?

View Posts by Categories

Our Recent Popular Posts

View Posts by Tags

#python #trainwithnycdsa 2019 airbnb Alex Baransky alumni Alumni Interview Alumni Reviews Alumni Spotlight alumni story Alumnus API Application artist aws beautiful soup Best Bootcamp Best Data Science 2019 Best Data Science Bootcamp Best Data Science Bootcamp 2020 Best Ranked Big Data Book Launch Book-Signing bootcamp Bootcamp Alumni Bootcamp Prep Bundles California Cancer Research capstone Career Career Day citibike clustering Coding Course Demo Course Report D3.js data Data Analyst data science Data Science Academy Data Science Bootcamp Data science jobs Data Science Reviews Data Scientist Data Scientist Jobs data visualization Deep Learning Demo Day Discount dplyr employer networking feature engineering Finance Financial Data Science Flask gbm Get Hired ggplot2 googleVis Hadoop higgs boson Hiring hiring partner events Hiring Partners Industry Experts Instructor Blog Instructor Interview Job Job Placement Jobs Jon Krohn JP Morgan Chase Kaggle Kickstarter lasso regression Lead Data Scienctist Lead Data Scientist leaflet linear regression Logistic Regression machine learning Maps matplotlib Medical Research Meet the team meetup Networking neural network Neural networks New Courses nlp NYC NYC Data Science nyc data science academy NYC Open Data NYCDSA NYCDSA Alumni Online Online Bootcamp Online Training Open Data painter pandas Part-time Portfolio Development prediction Prework Programming PwC python Python Data Analysis python machine learning python scrapy python web scraping python webscraping Python Workshop R R language R Programming R Shiny r studio R Visualization R Workshop R-bloggers random forest Ranking recommendation recommendation system regression Remote remote data science bootcamp Scrapy scrapy visualization seaborn Selenium sentiment analysis Shiny Shiny Dashboard Spark Special Special Summer Sports statistics streaming Student Interview Student Showcase SVM Switchup Tableau team TensorFlow Testimonial tf-idf Top Data Science Bootcamp twitter visualization web scraping Weekend Course What to expect word cloud word2vec XGBoost yelp