Data Study on New York Restaurant Safety
The skills the author demoed here can be learned by taking the Data Science with Machine Learning bootcamp at NYC Data Science Academy.
According to NYC Health, “Each year, thousands of New York City residents become sick from consuming foods or drinks that are contaminated with harmful bacteria, viruses or parasites”. Since a common source of food poisoning is eating at unsanitary restaurants, I decided to create a Shiny App where users can easily search and access information on restaurants that were temporarily closed due to hazardous sanitary violations in the past 5 years. The code for this project can be found on GitHub. My Shiny App can be found here.
NYC’s Department of Health and Mental Hygiene (DOHMH) conducts unannounced inspections of restaurants at least once a year to check for a variety of issues, such as food handling, food temperature, personal hygiene, and vermin control. Under its scoring system, each violation carries a certain number of points, and these points are added up into an overall score at the end of the inspection. The higher the score, the worse the restaurant performed. Each score is converted to a letter grade (A, B or C), which must be posted prominently at the restaurant's entrance.
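To make the scoring system concrete, here is a minimal Python sketch of the points-to-grade conversion (the app itself was built in R/Shiny, so this is an illustration, not the project's code). The cut-offs used (0-13 for A, 14-27 for B, 28 and above for C) are an assumption based on DOHMH's published grading thresholds, and the point values in the example inspection are hypothetical.

```python
# Minimal sketch of DOHMH-style scoring and grading.
# ASSUMPTION: grade thresholds A = 0-13, B = 14-27, C = 28+ (verify against DOHMH).


def inspection_score(violation_points: list[int]) -> int:
    """Overall inspection score: the sum of the points for each violation cited."""
    return sum(violation_points)


def letter_grade(score: int) -> str:
    """Convert an inspection score to a letter grade (a lower score is better)."""
    if score < 0:
        raise ValueError("inspection scores cannot be negative")
    if score <= 13:
        return "A"
    if score <= 27:
        return "B"
    return "C"


# Hypothetical inspection with three violations worth 5, 7 and 10 points.
score = inspection_score([5, 7, 10])
print(score, letter_grade(score))
```

In practice the grading process also involves re-inspections that determine when a B or C is actually posted; this sketch ignores those rules.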
For my Shiny project, I was interested in investigating restaurant closures within NYC from 2013 to 2017. Some of my initial questions were:
- Do Manhattan restaurants have a lower proportion of closed restaurants than those in the Bronx?
- Are Salad Shops in Staten Island more likely to have been closed than ones in Manhattan?
- Is the length of a temporary closure related to the overall inspection score (i.e., does a higher score lead to a longer closure)?
II. Data Set & Cleanup
The Restaurant Inspection Results dataset is provided by the NYC DOHMH and can be found at NYC Open Data. It consists of about 400,000 entries from inspections conducted between 2012 and 2017, and includes high-level information on restaurant location, cuisine, inspection dates and individual violations.
The January 17, 2016 version of the dataset was used for this Shiny App. Some initial cleanup was performed on the data, including removing rows with missing or negative scores, changing the date format, fixing borough naming issues, and shortening text values.
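The cleanup steps above can be sketched in Python roughly as follows. This is not the author's R code: the column names (`CAMIS`, `BORO`, `INSPECTION DATE`, `SCORE`, `CUISINE DESCRIPTION`) are assumed to follow the NYC Open Data export, dates are assumed to arrive as MM/DD/YYYY strings, and the borough mapping, the 30-character text limit and the sample rows are hypothetical.

```python
from datetime import datetime
from typing import Optional

# ASSUMPTIONS: column names follow the NYC Open Data export and dates are
# MM/DD/YYYY strings. The mapping below is a hypothetical borough-label fix.
BORO_FIXES = {
    "MANHATTAN": "Manhattan",
    "BRONX": "Bronx",
    "BROOKLYN": "Brooklyn",
    "QUEENS": "Queens",
    "STATEN ISLAND": "Staten Island",
}


def parse_score(raw: Optional[str]) -> Optional[int]:
    """Return the score as an int, or None if it is missing or negative."""
    raw = (raw or "").strip()
    if not raw:
        return None
    score = int(raw)
    return score if score >= 0 else None


def clean_rows(rows: list[dict], max_text: int = 30) -> list[dict]:
    """Drop unusable scores, parse dates, normalise boroughs, shorten text."""
    cleaned = []
    for row in rows:
        score = parse_score(row.get("SCORE"))
        if score is None:
            continue  # rows with no score or a negative score are removed
        boro = row["BORO"].strip()
        cleaned.append({
            **row,
            "SCORE": score,
            "INSPECTION DATE": datetime.strptime(row["INSPECTION DATE"], "%m/%d/%Y").date(),
            "BORO": BORO_FIXES.get(boro.upper(), boro),  # unknown labels kept as-is
            "CUISINE DESCRIPTION": row["CUISINE DESCRIPTION"][:max_text],
        })
    return cleaned


sample = [  # hypothetical rows
    {"CAMIS": "1", "BORO": "MANHATTAN", "INSPECTION DATE": "03/15/2016",
     "SCORE": "12", "CUISINE DESCRIPTION": "Salads"},
    {"CAMIS": "2", "BORO": "BRONX", "INSPECTION DATE": "04/01/2016",
     "SCORE": "", "CUISINE DESCRIPTION": "Pizza"},
    {"CAMIS": "3", "BORO": "queens", "INSPECTION DATE": "05/20/2016",
     "SCORE": "-1", "CUISINE DESCRIPTION": "Chinese"},
]
print(clean_rows(sample))
```

Only the first sample row survives: the second has no score and the third has a negative one.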
Three new columns were generated for the analysis of restaurant closures:
- n_infractions: number of infractions committed on a given inspection date
- n_closures: number of closures a restaurant has had within the past 5 years
- days_diff: number of days a restaurant took to reopen after being closed by the Health Department
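One way these three columns could be derived is sketched below in Python (the original was R, and its code is not reproduced here). The sketch assumes one row per cited violation, as in the DOHMH export, and recognises closures and re-openings through hypothetical substrings of the `ACTION` text; the sample rows are invented.

```python
from collections import Counter, defaultdict
from datetime import date

# ASSUMPTIONS: one row per cited violation, and closures/re-openings are
# recognised by these hypothetical substrings of the ACTION column.
CLOSED = "closed by dohmh"
REOPENED = "re-opened by dohmh"


def add_closure_columns(rows: list[dict]) -> list[dict]:
    """Return copies of rows with n_infractions, n_closures and days_diff added."""
    # n_infractions: violations cited per restaurant per inspection date
    n_infractions = Counter((r["CAMIS"], r["INSPECTION DATE"]) for r in rows)

    closed, reopened = defaultdict(set), defaultdict(set)
    for r in rows:
        action = r["ACTION"].lower()
        if CLOSED in action:
            closed[r["CAMIS"]].add(r["INSPECTION DATE"])
        elif REOPENED in action:
            reopened[r["CAMIS"]].add(r["INSPECTION DATE"])

    # n_closures: distinct closure inspections per restaurant
    n_closures = {camis: len(dates) for camis, dates in closed.items()}

    # days_diff: days from a closure to the next re-opening of the same restaurant
    days_diff = {}
    for camis, dates in closed.items():
        for closed_on in dates:
            later = [d for d in reopened[camis] if d >= closed_on]
            if later:
                days_diff[(camis, closed_on)] = (min(later) - closed_on).days

    return [
        {
            **r,
            "n_infractions": n_infractions[(r["CAMIS"], r["INSPECTION DATE"])],
            "n_closures": n_closures.get(r["CAMIS"], 0),
            # only rows from a closure inspection get a value; others are None
            "days_diff": days_diff.get((r["CAMIS"], r["INSPECTION DATE"])),
        }
        for r in rows
    ]


sample = [  # hypothetical, already-cleaned rows
    {"CAMIS": "A1", "INSPECTION DATE": date(2016, 3, 1), "ACTION": "Establishment Closed by DOHMH."},
    {"CAMIS": "A1", "INSPECTION DATE": date(2016, 3, 1), "ACTION": "Establishment Closed by DOHMH."},
    {"CAMIS": "A1", "INSPECTION DATE": date(2016, 3, 4), "ACTION": "Establishment re-opened by DOHMH."},
    {"CAMIS": "B2", "INSPECTION DATE": date(2016, 5, 10), "ACTION": "Violations were cited."},
]
for row in add_closure_columns(sample):
    print(row["CAMIS"], row["n_infractions"], row["n_closures"], row["days_diff"])
```

Matching on free-text `ACTION` values is fragile; if the export provides a more structured field for closures, that would be a safer key.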
Latitude and longitude were also added to the dataset for the map feature of my Shiny App, using the geocode function from the ggmap library. For full details on the code used to clean up this dataset, please refer to my GitHub account.
III. Data Analysis
A. Overall Distribution of Grades over the Years
Since letter grades are what most NYC residents are familiar with, I first wanted to visualize the breakdown of restaurant grades and how that proportion has changed over the years. Looking at all 5 Boroughs combined, a few observations can be made:
- Most restaurants perform well in inspections, with more than 80% of restaurants receiving an A grade.
- The proportion of A grades has been increasing over the years, suggesting that restaurants are becoming more hygienic.
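The year-by-year breakdown behind these observations amounts to counting grades per year and normalising. A minimal Python sketch, assuming one `(year, grade)` pair per graded inspection; the sample data is hypothetical and not the project's results:

```python
from collections import Counter, defaultdict


def grade_shares_by_year(inspections):
    """inspections: iterable of (year, grade) pairs, one per graded inspection.
    Returns year -> {grade: share of that year's graded inspections}."""
    counts = defaultdict(Counter)
    for year, grade in inspections:
        counts[year][grade] += 1
    return {
        year: {grade: n / sum(c.values()) for grade, n in c.items()}
        for year, c in sorted(counts.items())
    }


# Hypothetical data, for illustration only.
sample = [(2015, "A"), (2015, "A"), (2015, "B"), (2015, "C"),
          (2016, "A"), (2016, "A"), (2016, "A"), (2016, "B")]
shares = grade_shares_by_year(sample)
print(shares)
```

Whether to count every graded inspection or only each restaurant's latest grade is a design choice; this sketch counts inspections.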
The distribution of grades for each Borough, available on my Shiny App, also shows that there is not much differentiation across Boroughs in terms of grades, with most restaurants receiving A grades.
B. Proportion of closures
Since most restaurants performed well in the sanitary inspections, I decided to focus my analysis on restaurant closures: regardless of the score received, these were places that committed a sanitary violation posing a serious threat to the health of customers.
B.1 Proportion of closures by Borough
The graph below provides some interesting insights:
- Overall, less than 2% of restaurants in NYC have been shut down due to sanitary violations, consistent with the generally clean picture seen in the grade analysis.
- The Bronx performs slightly worse than the other Boroughs, with 1.7% of its restaurants closed at some point in the past 5 years, 0.5 percentage points higher than Manhattan, the best performer.
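The closure proportion per Borough can be read as the share of restaurants with at least one closure. A minimal Python sketch, assuming one `(borough, n_closures)` entry per restaurant; the numbers are hypothetical and chosen only to keep the arithmetic simple:

```python
from collections import defaultdict


def closure_proportion_by_borough(restaurants):
    """restaurants: iterable of (borough, n_closures), one entry per restaurant.
    Returns borough -> share of restaurants closed at least once."""
    total, closed = defaultdict(int), defaultdict(int)
    for boro, n_closures in restaurants:
        total[boro] += 1
        if n_closures > 0:  # a restaurant counts once, however many closures it had
            closed[boro] += 1
    return {boro: closed[boro] / total[boro] for boro in total}


# Hypothetical data: 1 of 4 Bronx and 1 of 8 Manhattan restaurants were closed.
sample = [("Bronx", 1)] + [("Bronx", 0)] * 3 + [("Manhattan", 2)] + [("Manhattan", 0)] * 7
props = closure_proportion_by_borough(sample)
gap_pp = 100 * (props["Bronx"] - props["Manhattan"])  # difference in percentage points
print(props, gap_pp)
```

The gap between two proportions is an absolute difference, so it is best expressed in percentage points rather than as a relative percentage.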
B.2 Proportion of closures by Borough and Cuisine
Since not much differentiation could be seen at the Borough level, I decided to take my analysis a level deeper, and investigate the proportion of closures by Borough and Cuisine.
The graph below displays the results of my analysis for the six worst performers, i.e. the Borough-Cuisine combinations with the highest closure proportions. On my Shiny App, users can choose to view more or fewer bars. This segmentation by both Borough and Cuisine offered some interesting insights:
- Four of the six worst performers are in the Bronx, which corroborates the initial finding that the Bronx has the highest closure proportion among Boroughs.
- Places on this graph such as Salad Shops in Staten Island may be worth avoiding, since their closure proportion over the past 5 years (23.1%) is much higher than the NYC average of 1.1%.
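Ranking Borough-Cuisine groups works the same way as the Borough-level proportion, just with a compound key. In this Python sketch, `top_n` is a hypothetical stand-in for the app's option to show more or fewer bars, and `min_count` is an added, hypothetical filter: small groups can reach high proportions from only a handful of closures, so it may be useful to require a minimum group size. The sample data is invented.

```python
from collections import defaultdict


def top_closure_groups(restaurants, top_n=6, min_count=1):
    """restaurants: iterable of (borough, cuisine, n_closures), one per restaurant.
    Returns up to top_n (borough, cuisine, proportion) tuples, highest first,
    ignoring groups with fewer than min_count restaurants."""
    total, closed = defaultdict(int), defaultdict(int)
    for boro, cuisine, n_closures in restaurants:
        key = (boro, cuisine)
        total[key] += 1
        closed[key] += int(n_closures > 0)
    ranked = sorted(
        ((boro, cuisine, closed[(boro, cuisine)] / n)
         for (boro, cuisine), n in total.items() if n >= min_count),
        key=lambda group: group[2],
        reverse=True,
    )
    return ranked[:top_n]


# Hypothetical data, for illustration only.
sample = ([("Staten Island", "Salads", 1)] + [("Staten Island", "Salads", 0)] * 3
          + [("Bronx", "Pizza", 1), ("Bronx", "Pizza", 0)]
          + [("Manhattan", "Pizza", 0)] * 2)
print(top_closure_groups(sample, top_n=2))
print(top_closure_groups(sample, min_count=3))
```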
C. Length of closure vs Inspection Score
Another question I was interested in was whether inspection scores had any relationship with the length of closure. Even though around 85% of restaurants took less than a week to reopen, I still expected restaurants with higher scores to take a few more days to fix their problems and reopen.
The boxplot below shows the distribution of scores by closure length. However, there was no apparent relationship between the two variables: the mean score was around 46 across all closure-length categories.
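The comparison behind the boxplot can be sketched by bucketing `days_diff` and averaging scores per bucket. The bucket edges below are hypothetical (the app's actual categories are not documented here), as are the sample closures; the sketch uses only Python's standard library.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical closure-length categories, as (lower, upper, label) in days.
BUCKETS = [(0, 7, "< 7 days"), (7, 14, "7-13 days"), (14, 30, "14-29 days")]


def bucket(days: int) -> str:
    """Map a closure length in days to a category label."""
    for lower, upper, label in BUCKETS:
        if lower <= days < upper:
            return label
    return "30+ days"


def mean_score_by_closure_length(closures):
    """closures: iterable of (days_diff, score). Returns label -> mean score."""
    groups = defaultdict(list)
    for days, score in closures:
        groups[bucket(days)].append(score)
    return {label: mean(scores) for label, scores in groups.items()}


def share_within_week(closures):
    """Share of closures that ended in less than 7 days."""
    closures = list(closures)
    return sum(1 for days, _ in closures if days < 7) / len(closures)


# Hypothetical (days_diff, score) pairs, for illustration only.
sample = [(2, 40), (3, 52), (5, 46), (10, 44), (20, 48)]
print(mean_score_by_closure_length(sample), share_within_week(sample))
```

Similar means across buckets are consistent with no strong relationship; a correlation coefficient on the raw pairs would be a natural complementary check.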
IV. Conclusions
- Overall, dining out in NYC is pretty safe, with more than 80% of restaurants receiving an A grade. The proportion of A grades has also been increasing over the past 5 years, suggesting that restaurants are becoming more hygienic.
- At the Borough level, the Bronx is the one to avoid if you do not want to dine somewhere with a poor record of closures issued by the Health Department. At the Borough and Cuisine level, steer clear of Staten Island Salad Shops, which have a closure proportion of about 23% (NYC average ~1%).
- There is no apparent relationship between length of closure and inspection score. More than 85% of restaurants reopen within a week of closing.
Possible next steps for this project include:
- Looking at seasonal trends in closures
- Analyzing inspection scores by type of violation
- Including extra information on closed restaurants, such as a graph showing the evolution of their inspection scores
V. Other Features of the Restaurant Closures Shiny App
Aside from the graphs mentioned in this blog post, my Shiny App also includes a map showing the location and further details of restaurants shut down after poor performance in sanitary inspections within the past five years. There is also a tab labeled “Overview of All Restaurants”, where users can check the distribution of scores by Borough and view a heat map of scores by Borough and Cuisine.