Data Study on NYC Complaints
Contributed by Kelly Mejia Breton. She is currently in the NYC Data Science Academy 12-week full-time Data Science Bootcamp program taking place from April 11th to July 1st, 2016. This post is based on her second class project, R Shiny (due in the 4th week of the program).
“If I can make it there, I'll make it anywhere.” ~ Frank Sinatra
When I think of New York City I think of noise, sirens, taxis, the constant battle with the landlord over adequate heat, and the scent of a sanitation truck on a hot summer day. There is nothing like home sweet home, unless the data disagrees.
Two years ago I moved to New Jersey and it took me a while to get used to the sound of silence. Strange, but I found it difficult to sleep at night without the noise. My husband, a native New Yorker as well, never fully got used to the silence; he turns on the heater or air conditioner every night just to hear some noise.
Just as a New Yorker has to adjust to New Jersey, I am sure there is much for a newcomer to get accustomed to in New York City. Some people avoid certain NYC areas at all costs, while for others those same areas may be just what they are looking for.
Using the 311 dataset provided by NYC Open Data, I built an interactive application to assist current and prospective NYC residents and visitors who are in search of an area with a sound and style to their liking. NYC Complaints by Borough (N.C.B.) is a quick reference tool to aid users with their planning, based on the complaints reported in a specific borough or neighborhood.
The Data Set
The dataset is a daily collection of complaints made directly to 311 by the public. 311 is an NYC customer service center, launched by Mayor Michael Bloomberg in March 2003. It provides the public with easy access to NYC government services, while helping agencies improve through consistent measurements. 311 has been a huge success, with usage growing year over year since its launch. Here are some facts on 311:
- Open 24 hours a day, 365 days a year
- Service available in 180 languages
- Receives 51,000 calls per day on average
- Annual call volumes to 911 have decreased since the inception of 311
- 311 online was launched in March 2009
The Shiny App
The N.C.B. application is designed to be simple so that it caters to users of all levels. Today I will walk you through N.C.B. with an example that focuses on a user visiting Brooklyn (my hometown), while pointing out the other features of the application.
Note: this demonstration uses only a sample of the data. The original dataset contains over 11 million observations and 53 variables; the app is based on only 4k observations (not randomized).
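Since the sample used here is not randomized, one possible way to draw a uniform random sample from a file too large to load at once is reservoir sampling. The sketch below is in Python rather than R, and it is not how the sample for the app was produced. The `rows` iterable is a hypothetical stand-in for streaming records out of the full 311 export.

```python
import random

def reservoir_sample(rows, k, seed=None):
    """Return k items chosen uniformly at random from an iterable of unknown length."""
    rng = random.Random(seed)
    sample = []
    for i, row in enumerate(rows):
        if i < k:
            # Fill the reservoir with the first k rows.
            sample.append(row)
        else:
            # Keep the new row with probability k / (i + 1),
            # replacing a randomly chosen row already in the reservoir.
            j = rng.randint(0, i)  # inclusive on both ends
            if j < k:
                sample[j] = row
    return sample

# Hypothetical stand-in for rows streamed from the full dataset.
rows = range(100_000)
sample = reservoir_sample(rows, 4000, seed=42)
```

Each row is read once and only the 4,000 kept rows are held in memory, which is the reason to consider this approach for an 11 million row file.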
Complaints
The “Complaints” tab opens to a word cloud showing the different types of complaints for the selected borough; the bigger the word, the higher the frequency. Based on this sample, Brooklyn complaints are most often about street conditions/signs, heat and hot water, and noise, and less often about snow, rodents, and taxis. This information gives the user an idea of possible problems they may encounter if they decide to visit Brooklyn.
Borough
On the top left, “Select a Borough” allows the user to change the selected borough, and the change is reflected in the charts on all tabs.
Frequency
The “Frequency” tab opens to a bar graph, with the option to view complaints annually, monthly or daily.
This option can be adjusted on the lower left of the tab and applies to this chart only. In the monthly view we see a spike in May. It could be because New Yorkers end their winter hibernation in May, or because more festivals and street fairs are starting. Or maybe the sample data is skewed towards May? It is definitely an area that could be looked into further. Knowing the frequency by year, month, or day gives the user an idea of when these complaints occur. If I am deciding when to visit Brooklyn, I may not choose May, since complaints are at their highest.
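The kind of aggregation behind a chart like this can be sketched in a few lines. This is a minimal Python illustration, not the app's actual R code: the field names `created` and `borough`, the timestamp format, and the sample records are all assumptions made for the example.

```python
from collections import Counter
from datetime import datetime

# Hypothetical records; field names and timestamp format are assumed.
records = [
    {"created": "2016-05-03 09:15:00", "borough": "BROOKLYN"},
    {"created": "2016-05-17 22:40:00", "borough": "BROOKLYN"},
    {"created": "2016-06-01 08:05:00", "borough": "BROOKLYN"},
    {"created": "2016-05-09 13:30:00", "borough": "QUEENS"},
]

# How each period is written as a grouping key.
PERIOD_FORMATS = {"year": "%Y", "month": "%Y-%m", "day": "%Y-%m-%d"}

def complaints_per_period(records, borough, period="month"):
    """Count complaints for one borough, grouped by year, month, or day."""
    key_format = PERIOD_FORMATS[period]
    counts = Counter()
    for rec in records:
        if rec["borough"] != borough:
            continue
        created = datetime.strptime(rec["created"], "%Y-%m-%d %H:%M:%S")
        counts[created.strftime(key_format)] += 1
    # Sort by period so the bars come out in chronological order.
    return dict(sorted(counts.items()))

monthly = complaints_per_period(records, "BROOKLYN", "month")
yearly = complaints_per_period(records, "BROOKLYN", "year")
```

Comparing such counts against the number of sampled rows per month would be one way to check whether a spike reflects real behavior or just a skewed sample.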
Top Complaints
The “Top Ten Complaints” tab gives a table holding the top ten complaints by borough. In Brooklyn, the top complaint is a blocked driveway, followed by illegal parking and dirty conditions. As a visitor I may not care too much about the blocked driveways and illegal parking since I plan to take mass transit, but I may keep in mind the dirty conditions.
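A top-N table like this comes down to counting complaint types within a borough and keeping the most frequent ones. Here is a minimal Python sketch of that idea, assuming the data has been reduced to (borough, complaint type) pairs. The sample pairs are made up for illustration and the function name is hypothetical.

```python
from collections import Counter

# Hypothetical (borough, complaint type) pairs.
complaints = [
    ("BROOKLYN", "Blocked Driveway"),
    ("BROOKLYN", "Illegal Parking"),
    ("BROOKLYN", "Blocked Driveway"),
    ("BROOKLYN", "Dirty Conditions"),
    ("BROOKLYN", "Illegal Parking"),
    ("BROOKLYN", "Blocked Driveway"),
    ("QUEENS", "Noise"),
]

def top_complaints(complaints, borough, n=10):
    """Return the n most frequent complaint types for a borough with their counts."""
    counts = Counter(ctype for b, ctype in complaints if b == borough)
    return counts.most_common(n)

brooklyn_top = top_complaints(complaints, "BROOKLYN", n=2)
```

The same counts could also drive the word cloud on the “Complaints” tab, with each count mapped to a word size.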
Map
The “Map” tab allows the user to get more detailed information by zooming in closer. If I were considering visiting Bedford Avenue in Williamsburg, Brooklyn, a trendy neighborhood, I would zoom in and see which complaints are specific to that area. It seems there is a flooring/stairs complaint, a water leak, and a street light condition.
OK, noted: if I decide to visit Brooklyn, I expect to hear loud noise, may have to take a cold shower, will probably avoid visiting in the month of May, will look out for dirty conditions, watch my step as I walk, and look both ways before I cross the street.
Next steps
Allow the user to sort by complaint type, specific neighborhood, and time period, and perhaps enter longitude and latitude coordinates to see complaints near a specific location. Automate the application to update live with the latest data available. Upload the full data set.
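As a rough idea of how the coordinate search could work, the Python sketch below filters complaints by great-circle distance from a point using the haversine formula. It has not been implemented in the app. The field names `lat` and `lon`, the coordinates, and the sample complaints are all assumptions for illustration.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometers between two latitude/longitude points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))

def complaints_near(complaints, lat, lon, radius_km):
    """Return the complaints whose location lies within radius_km of (lat, lon)."""
    return [c for c in complaints
            if haversine_km(lat, lon, c["lat"], c["lon"]) <= radius_km]

# Hypothetical complaints with approximate, made-up coordinates.
complaints = [
    {"type": "Water Leak", "lat": 40.7180, "lon": -73.9570},
    {"type": "Noise", "lat": 40.7549, "lon": -73.9840},
]

# Search within 1 km of an assumed point in Williamsburg, Brooklyn.
nearby = complaints_near(complaints, 40.7170, -73.9565, radius_km=1.0)
```

On the full data set, checking every row this way would be slow, so a bounding-box filter on latitude and longitude before the distance calculation would likely be worth adding.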