Data Study on NYC Restaurant Reviews and Inspection Scores

Posted on May 18, 2018

Introduction

If you walk past a restaurant in New York City, you’ll notice a prominently displayed letter grade. Since July 2010, the Health Department has required restaurants to post letter grades that summarize their sanitary inspection results.

An A grade attests to top marks for health and safety, so you can feel secure about eating there. But it doesn’t tell you whether you will enjoy the food or receive courteous service. To find that out, you’d turn to restaurant reviews. For this project, I carried out a simple analysis and visualization of NYC restaurant reviews and inspection scores to find out whether the two are correlated. The data also shows which cuisines and which NYC locations tend to attract more ratings.

Nowadays, reviews, ratings, and grades drive customer decisions and serve as measures of a business’s quality, popularity, and future success. For restaurants, ratings, hygiene, and cleanliness are essential. Yelp, a popular review site, offers many individual ratings for restaurants. The New York City Department of Health and Mental Hygiene (DOHMH) conducts unannounced restaurant inspections annually. Inspectors check whether food handling, food temperature, workers’ personal hygiene, and vermin control comply with hygienic standards. The scoring and grading process can be found here.

Data

The restaurant ratings and location information used in this project come from Yelp’s API. The inspection data was downloaded from the NYC Open Data website. I merged the Yelp review data with the inspection data and removed rows that were missing either an inspection score or reviews. I also converted the inspection scores into the letter grades A, B, and C, since this measure is widely used and is what restaurants display. Other grade values, primarily P, Z, or some version of grade pending, are ignored in this analysis. Restaurants with a score between 0 and 13 points earn an A, those with 14 to 27 points receive a B, and those with 28 or more receive a C.
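The merge and grading steps described above can be sketched roughly as follows. This is a minimal illustration in plain Python, not the code used in the project. The field names (`restaurant_id`, `rating`, `score`), the helper names `score_to_grade` and `merge_records`, and the sample records are all hypothetical. Only the grade thresholds come from the text.

```python
def score_to_grade(score):
    """Map an inspection score to a letter grade using the thresholds above."""
    if score < 0:
        raise ValueError("inspection score cannot be negative")
    if score <= 13:
        return "A"
    if score <= 27:
        return "B"
    return "C"


def merge_records(reviews, inspections):
    """Join reviews and inspections on a hypothetical 'restaurant_id' key,
    dropping restaurants that lack either a rating or an inspection score."""
    scores = {row["restaurant_id"]: row.get("score") for row in inspections}
    merged = []
    for row in reviews:
        rating = row.get("rating")
        score = scores.get(row["restaurant_id"])
        if rating is None or score is None:
            continue  # skip rows with a missing rating or inspection score
        merged.append({
            "restaurant_id": row["restaurant_id"],
            "rating": rating,
            "score": score,
            "grade": score_to_grade(score),
        })
    return merged


# Made-up sample data, for illustration only.
sample_reviews = [
    {"restaurant_id": 1, "rating": 4.0},
    {"restaurant_id": 2, "rating": 3.5},
    {"restaurant_id": 3, "rating": None},  # no rating, so it is dropped
]
sample_inspections = [
    {"restaurant_id": 1, "score": 9},
    {"restaurant_id": 2, "score": 30},
    {"restaurant_id": 3, "score": 15},
]
merged = merge_records(sample_reviews, sample_inspections)
```

Note that on this scale a lower score is better, so the grade gets worse as the score rises.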


The data shows that A is the most commonly assigned inspection grade for restaurants of all types in all locations. I drew several bar plots to visualize the inspection scores and ratings by borough and cuisine type.

With respect to location, the borough bar plot shows that Manhattan has the highest number of restaurants at every grade. This is expected, as it has the most restaurants overall. Staten Island has the fewest restaurants with grades A, B, and C.

Ratings by Cuisine

As for cuisine types, the cuisine plot shows the 15 cuisines with the highest restaurant counts. American cuisine has the highest number of A grades. This might suggest that American restaurants focus more on hygiene and cleanliness than other types of restaurants, though the larger number of American restaurants overall could also explain it.

Ratings by Boroughs

The review plot indicates that most restaurants achieve a high rating of 4 stars. Again, Manhattan has the highest number of restaurants with four-star ratings, while Staten Island has the lowest number of restaurants with high ratings. The plot also shows that almost all boroughs have a low number of 2-star restaurants. Moreover, the cuisine review plot indicates that American cuisine tends to have the highest number of high ratings compared to other cuisines. The reason could be that more restaurants fall under the American category than under the others.

 

Findings

The scatter plots show the relationship between inspection score and rating. They indicate that there is no clear correlation between the two variables. It is fairly common for a restaurant with a C inspection grade to have a 4 or 5 star rating. It is also possible to find a number of A-grade restaurants that have only 1 or 2 stars. This could be because, as long as the food is tasty, people rate the restaurant well without paying much attention to cleanliness and hygiene.
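One way to put a number on what the scatter plots show is a Pearson correlation coefficient between inspection score and rating. The post does not say that such a coefficient was computed, so the sketch below is only an illustration. The function name `pearson_r` and the sample values are hypothetical.

```python
import math


def pearson_r(xs, ys):
    """Return the Pearson correlation coefficient of two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length with at least 2 values")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sums of cross-products and squared deviations from the means.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined when a variable is constant")
    return cov / math.sqrt(var_x * var_y)


# Made-up inspection scores and star ratings, for illustration only.
inspection_scores = [5, 12, 20, 30, 9, 41]
star_ratings = [4.0, 2.0, 4.5, 4.0, 3.5, 3.0]
r = pearson_r(inspection_scores, star_ratings)
```

A value of `r` near 0 would be consistent with the lack of a clear relationship described above, while values near 1 or -1 would indicate a strong linear relationship.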

The scatter plots also show that some restaurants maintain a very high level of cleanliness and hygienic food conditions yet fail to get good ratings, which could be due to bad service or less than tasty food. Further analysis could look at both groups of restaurants by examining review comments to find out why some restaurants have good reviews but poor inspection scores, and vice versa. This would require review comment data and further analysis using NLP.

 

 

The cluster map of NYC restaurants helps visualize locations and filter the restaurants by cuisine type. The color of each point indicates the rating, and each point includes a description of the restaurant. The heat map shows the density of restaurants for the selected borough or cuisine, indicating which areas have a greater number of restaurants. This could help business owners make informed decisions about where to open new restaurants based on the types of restaurants already in place.

Conclusion

Finally, this app can be useful for filtering the data by borough, cuisine, rating, and inspection grade. People who want to eat out with specific criteria in mind can filter the restaurants and pick the ones with top marks for both ratings and inspection grades. The Shiny app link is here.
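The kind of filtering the app offers can be sketched in a few lines. The app itself is built with Shiny, so the Python below is not its implementation. It is a rough illustration only, and the function name `filter_restaurants`, the field names, and the sample restaurants are all hypothetical.

```python
def filter_restaurants(restaurants, borough=None, cuisine=None,
                       min_rating=None, grade=None):
    """Return the restaurants that match every criterion that was supplied.
    A criterion left as None is ignored."""
    results = []
    for r in restaurants:
        if borough is not None and r["borough"] != borough:
            continue
        if cuisine is not None and r["cuisine"] != cuisine:
            continue
        if min_rating is not None and r["rating"] < min_rating:
            continue
        if grade is not None and r["grade"] != grade:
            continue
        results.append(r)
    return results


# Made-up restaurants, for illustration only.
sample_restaurants = [
    {"name": "Example Diner", "borough": "Manhattan", "cuisine": "American",
     "rating": 4.5, "grade": "A"},
    {"name": "Example Noodles", "borough": "Queens", "cuisine": "Chinese",
     "rating": 4.0, "grade": "B"},
    {"name": "Example Grill", "borough": "Manhattan", "cuisine": "American",
     "rating": 2.5, "grade": "A"},
]

# Restaurants in Manhattan with grade A and at least 4 stars.
top_picks = filter_restaurants(sample_restaurants, borough="Manhattan",
                               min_rating=4.0, grade="A")
```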

 

Leave a Comment

Akshay Vaghani June 6, 2018
Hi Andrew ! Thank you for comments , I will make it to show in % per borough or % cuisine.
Andrew June 6, 2018
With a 4-day trip to NYC coming up, this is great! Some rambling thoughts =) Manhatten obviously is the prime tourist destination which correlates to total # of review. Instead of showing absolute count, it would be interesting to see rating in % per borough --- maybe include a summary statistics of average rating with error bars per borough per cuisine. Is there temporal information? Would be neat to observe change in rating overtime.
