Data Study on New York Restaurant Safety

Posted on Feb 13, 2017

According to NYC Health, “Each year, thousands of New York City residents become sick from consuming foods or drinks that are contaminated with harmful bacteria, viruses or parasites”[1]. Since eating out at insalubrious restaurants is a common source of food poisoning, I decided to create a Shiny app where users can easily search and access data on restaurants that were temporarily closed due to hazardous sanitary violations in the past 5 years. The code for this project can be found on GitHub. My Shiny app can be found here.

I. Introduction

NYC’s Department of Health and Mental Hygiene (DOHMH) conducts unannounced inspections of restaurants at least once a year to check for a variety of issues, such as compliance in food handling, food temperature, personal hygiene, and vermin control. Under its scoring system, each violation of a regulation earns a restaurant a certain number of points, which are added up to an overall score at the end of the inspection. The higher the score, the worse a restaurant performs. Each score is converted to a letter grade (e.g. A/B/C), which must be prominently posted at the entrance of a restaurant.
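The scoring rule can be sketched in a few lines of Python. This is only an illustration, not code from the app (which was built in R). The helper name `letter_grade` is hypothetical, and the cutoffs (0 to 13 points for an A, 14 to 27 for a B, 28 or more for a C) come from the DOHMH's published grading scheme rather than from this post.

```python
# Hypothetical helper, for illustration only.
# Assumed cutoffs (DOHMH published scheme, not stated in this post):
# 0-13 points = A, 14-27 points = B, 28 or more = C.
def letter_grade(violation_points):
    """Add up the points from each violation and map the total to a grade."""
    score = sum(violation_points)
    if score <= 13:
        grade = "A"
    elif score <= 27:
        grade = "B"
    else:
        grade = "C"
    return score, grade


# Points are cumulative, so several violations can push a restaurant
# into a worse grade.
print(letter_grade([5, 7]))
print(letter_grade([10, 8, 12]))
```

In practice the posted grade also depends on re-inspections, so a single total like this is a simplification.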

For my Shiny project, I was interested in investigating restaurant closures within NYC from 2013 to 2017. For example, some of my initial questions were:

  • Do Manhattan restaurants have a lower proportion of closed restaurants than those in the Bronx?
  • Are Salad Shops in Staten Island more likely to have been closed than ones in Manhattan?
  • Is the length of a temporary closure related to the overall score at the sanitary inspection (i.e., does a higher score lead to a longer closure)?


II. Data Set & Cleanup

The Restaurant Inspection Results dataset is provided by the NYC DOHMH and can be found at NYC Open Data. It consists of about 400,000 entries from inspections conducted between 2012 and 2017 and includes, at a high level, information on each restaurant's location, cuisine, inspection dates and individual violations.

The January 17, 2016 version of the dataset was used for this Shiny app. Some initial cleanup was performed on the data, including: removing rows with missing or negative scores; changing the format of dates; fixing borough naming issues; and shortening text values.
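A minimal Python sketch of these cleanup steps follows. It is not the author's cleanup code, which was written in R and lives in the GitHub repository. The column names (`SCORE`, `INSPECTION DATE`, `BORO`), the `MM/DD/YYYY` date format, and the borough normalization are assumptions made for illustration, as are the toy rows.

```python
from datetime import datetime


def clean_rows(rows):
    """Drop rows with missing or negative scores and normalize the rest.

    Column names and the date format are assumptions, not taken from the post.
    """
    cleaned = []
    for row in rows:
        raw_score = (row.get("SCORE") or "").strip()
        if not raw_score:
            continue  # no score recorded
        score = int(raw_score)
        if score < 0:
            continue  # negative scores are treated as invalid
        cleaned.append({
            "score": score,
            # Parse the text date into a real date object.
            "inspection_date": datetime.strptime(
                row["INSPECTION DATE"], "%m/%d/%Y"
            ).date(),
            # One possible naming fix: trim whitespace and use title case.
            "boro": row["BORO"].strip().title(),
        })
    return cleaned


# Toy rows for illustration, not real inspection records.
sample_rows = [
    {"SCORE": "12", "INSPECTION DATE": "03/15/2016", "BORO": "STATEN ISLAND "},
    {"SCORE": "", "INSPECTION DATE": "04/01/2016", "BORO": "BRONX"},
    {"SCORE": "-1", "INSPECTION DATE": "05/20/2016", "BORO": "QUEENS"},
]
print(clean_rows(sample_rows))
```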

Three new columns were generated for the analysis of restaurant closures:

  • n_infractions: number of infractions committed at a particular date of inspection
  • n_closures: number of closures a restaurant has had within the past 5 years.
  • days_diff: number of days a restaurant took to reopen after being closed by the Health Department
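One way to derive these three columns is sketched below in Python. This is an assumption-laden illustration rather than the original R code. It assumes one row per violation, with hypothetical fields `camis` (a restaurant ID), `date`, and a simplified `action` value of `"closed"`, `"reopened"`, or `"violation"`.

```python
from collections import Counter, defaultdict
from datetime import date


def derive_columns(rows):
    """Compute n_infractions, n_closures and days_diff from violation rows."""
    # n_infractions: rows (violations) per restaurant per inspection date.
    n_infractions = Counter((r["camis"], r["date"]) for r in rows)

    closed = defaultdict(set)
    reopened = defaultdict(set)
    for r in rows:
        if r["action"] == "closed":
            closed[r["camis"]].add(r["date"])
        elif r["action"] == "reopened":
            reopened[r["camis"]].add(r["date"])

    # n_closures: distinct closure dates per restaurant.
    n_closures = {camis: len(dates) for camis, dates in closed.items()}

    # days_diff: days from each closure to the next reopening, if any.
    days_diff = {}
    for camis, dates in closed.items():
        for closed_on in dates:
            later = [d for d in reopened[camis] if d > closed_on]
            if later:
                days_diff[(camis, closed_on)] = (min(later) - closed_on).days
    return n_infractions, n_closures, days_diff


# Toy rows for illustration, not real inspection records.
sample_rows = [
    {"camis": "A1", "date": date(2016, 3, 1), "action": "closed"},
    {"camis": "A1", "date": date(2016, 3, 1), "action": "closed"},
    {"camis": "A1", "date": date(2016, 3, 4), "action": "reopened"},
    {"camis": "B2", "date": date(2016, 5, 10), "action": "violation"},
]
n_infractions, n_closures, days_diff = derive_columns(sample_rows)
print(n_infractions, n_closures, days_diff)
```

Counting distinct closure dates, rather than rows, avoids counting one closure several times when an inspection lists multiple violations.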

Latitude and longitude were also added to the dataset for the map feature of my Shiny app, using the geocode function from the ggmap library. For full details on the code used to clean up this dataset, please refer to my GitHub account.


III. Data Analysis

A. Data on Overall Distribution of Grades over the years

Since letter grades are what most NYC residents are familiar with, I first wanted to visualize the breakdown of restaurant grades and how that proportion has changed over the years. Looking at all 5 Boroughs combined, a few observations can be made:

  • Most restaurants perform well at the restaurant inspection, with more than 80% of restaurants receiving an A grade.
  • The proportion of A grades has been increasing over the years, signaling a move by restaurants to become more hygienic.

The distribution of grades for each Borough, found on my Shiny app, also shows that in terms of grades there is not much differentiation across Boroughs, with most restaurants receiving A grades.


B. Proportion of closures

As most restaurants performed well at the sanitary inspections, I decided to focus my analysis on restaurant closures, as those were places that, independent of score received, committed some kind of sanitary violation that posed a serious threat to the health of customers.

B.1 Proportion of closures by Borough

The graph below provides some interesting insights:

  • Overall, less than 2% of restaurants in NYC have been shut down due to sanitary violations, consistent with the generally clean record seen in the grade distribution above.
  • The Bronx performs slightly worse than the other Boroughs, with 1.7% of restaurants closed at some point in the past 5 years, 0.5 percentage points higher than Manhattan, the best performer.


B.2 Proportion of closures by Borough and Cuisine

Since not much differentiation could be seen at the Borough level, I decided to take my analysis a level deeper, and investigate the proportion of closures by Borough and Cuisine.

The graph below displays the results of my analysis for the six worst performers, i.e. the Borough and Cuisine combinations with the highest closure proportions. On my Shiny app, users have the option to view more or fewer bars if they wish. This segmentation by both Borough and Cuisine offered some interesting insights:

  • Four of the six worst performers are in the Bronx, which corroborates the initial finding that the Bronx had the highest closure proportion among Boroughs.
  • Restaurants on this graph, such as Salad Stores in Staten Island, should potentially be avoided: their closure proportion over the past 5 years, 23.1%, is much higher than the NYC average of 1.1%.
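The ranking behind a chart like this can be sketched in Python as follows. It is an illustration under stated assumptions, not the app's R code. It assumes one record per restaurant with hypothetical fields `boro`, `cuisine`, and `n_closures`, and `top_n` stands in for the app's control over how many bars to show.

```python
from collections import defaultdict


def closure_proportions(restaurants, top_n=6):
    """Rank (borough, cuisine) groups by the share of restaurants ever closed."""
    totals = defaultdict(int)
    closed = defaultdict(int)
    for r in restaurants:
        key = (r["boro"], r["cuisine"])
        totals[key] += 1
        if r["n_closures"] > 0:
            closed[key] += 1
    proportions = {key: closed[key] / totals[key] for key in totals}
    # Highest closure proportion first; keep only the top_n groups.
    ranked = sorted(proportions.items(), key=lambda kv: kv[1], reverse=True)
    return ranked[:top_n]


# Toy records for illustration, not real inspection data.
sample = [
    {"boro": "Bronx", "cuisine": "Pizza", "n_closures": 1},
    {"boro": "Bronx", "cuisine": "Pizza", "n_closures": 0},
    {"boro": "Manhattan", "cuisine": "Pizza", "n_closures": 0},
    {"boro": "Manhattan", "cuisine": "Pizza", "n_closures": 0},
]
print(closure_proportions(sample, top_n=2))
```

Small groups can produce very high proportions from only a few closures, so it may be worth reporting each group's restaurant count alongside its proportion.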

[Figure: proportion of closures by Borough and Cuisine]


C. Length of closure vs Inspection Score

Another question I was interested in was whether inspection scores had any relationship with the length of closure. Even though around 85% of restaurants took less than a week to reopen, I still expected restaurants with higher scores to take a few more days to fix their problems and reopen.

The boxplot below shows the distribution of scores by closure length. There was no apparent relationship between the two variables, however: the mean score was around 46 across the different categories of closure length.
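A comparison of this kind can be sketched in Python by grouping closures into length categories and averaging the score in each. The category boundaries, function names, and toy values below are assumptions for illustration. The post does not say how closure length was binned.

```python
from collections import defaultdict
from statistics import mean


def closure_bucket(days_diff):
    """Assign a closure length in days to a category (assumed boundaries)."""
    if days_diff < 7:
        return "under 1 week"
    if days_diff < 14:
        return "1 to 2 weeks"
    return "2 weeks or more"


def mean_score_by_bucket(closures):
    """closures: iterable of (days_diff, score) pairs, one per closure."""
    scores = defaultdict(list)
    for days_diff, score in closures:
        scores[closure_bucket(days_diff)].append(score)
    return {bucket: mean(values) for bucket, values in scores.items()}


# Toy values for illustration, not real inspection data.
sample = [(2, 40), (5, 50), (10, 46), (20, 45)]
print(mean_score_by_bucket(sample))
```

Means that are close to each other across categories, as in the boxplot described above, suggest that the score carries little information about how long a closure lasts.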



IV. Conclusion

Data Findings

  • Overall, dining out in NYC is pretty safe, with more than 80% of restaurants receiving an A grade. The proportion of A grades has also been increasing over the past 5 years, signaling a move by restaurants to become more hygienic.
  • At the Borough level, the Bronx should be avoided if you do not want to dine somewhere with a poor record of closures issued by the Health Department. At the Borough and Cuisine level, steer clear of Staten Island Salad Shops, which have a closure proportion of about 23% (NYC average: about 1%).
  • There is no particular relationship between length of closure and inspection score. Around 85% of restaurants reopen within a week of closing.

Further Improvements/Analysis:

  • Look over seasonal trends of closures
  • Analyze Inspection scores by type of violations
  • Include extra information on closed restaurants, with a graph showing the evolution of inspection scores


V. Other Features of the Restaurant Closures Shiny App

Aside from the graphs discussed in this blog post, my Shiny app also includes a map showing the location and further details of restaurants shut down after poor performance in sanitary inspections within the past five years. There is also a tab labeled “Overview of All Restaurants”, where users can check the distribution of scores by Borough and find a heat map of the distribution of scores by Borough and Cuisine.

About Author

Yvonne Lau

Yvonne Lau is a recent Yale University graduate with a B.A. degree in Economics and Mathematics. Hailing from Rio de Janeiro, Brazil, she became interested in data science after serving as a Data Analyst for a nonprofit organization,...
Leave a Comment

edut February 14, 2017
Nice post, but... what's with the y-axis of the barplot in B.1 ? 3 lines for 1%, 2 for 2% ? Bronx has a value of 1.7% but the bar extends above 2%? Please fix that. It's embarrassing (to me)