Data Study on NYC Complaints

Posted on May 15, 2016
Contributed by Kelly Mejia Breton. She is currently in the NYC Data Science Academy 12-week full-time Data Science Bootcamp program taking place from April 11th to July 1st, 2016. This post is based on her second class project, R Shiny (due in the 4th week of the program).

If I can make it there, I'll make it anywhere. ~ Frank Sinatra

When I think of New York City I think of noise, sirens, taxis, the constant battle with the landlord over suitable heat, and the scent of a sanitation truck on a hot summer day. There is nothing like home sweet home, unless the data disagrees.

Two years ago I moved to New Jersey and it took me a while to get used to the sound of silence. Strange, but I found it difficult to sleep at night without the noise. My husband, a native New Yorker as well, never fully got used to the silence; he turns on the heater or air conditioner every night just to hear some noise.

Like a New Yorker in New Jersey, I am sure there is much to get accustomed to in New York City. Some people avoid certain NYC areas at all costs, while for others those areas may be just what they are looking for.

Using the 311 dataset provided by NYC Open Data, I built an interactive application to help potential and current NYC residents and visitors who are searching for an area with a sound and style to their liking. NYC Complaints by Borough (N.C.B.) is a quick reference tool to aid users with their planning, based on the complaints reported in a specific borough or neighborhood.



The Data Set

The dataset is a daily collection of complaints made directly to 311 by the public. 311 is an NYC customer service center, launched by Mayor Michael Bloomberg in March 2003, that provides the public with easy access to NYC government services while helping agencies improve through consistent measurement. 311 is a huge success, with usage growing year over year since its launch. Here are some facts on 311:



The Shiny App

The N.C.B. application is designed to be simple enough for users of all levels. Today I will walk you through N.C.B. with an example that focuses on a user visiting Brooklyn (my hometown), while pointing out the other features of the application.

Note: the original data contains over 11 million observations and 53 variables, so this demonstration is based on a sample of only 4k observations (not randomized).
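Because the sample is not randomized, patterns in it may reflect how the rows happened to be ordered rather than the city itself. One common way to draw a uniform random sample from a file too large to load at once is reservoir sampling. The sketch below is in Python rather than the R used for the app, the function name `reservoir_sample` is hypothetical, and it is not how the 4k sample in this post was drawn.

```python
import random
from typing import Iterable, List, Optional, TypeVar

T = TypeVar("T")


def reservoir_sample(rows: Iterable[T], k: int, seed: Optional[int] = None) -> List[T]:
    """Pick k rows uniformly at random in a single pass (Algorithm R).

    Hypothetical helper for illustration; `rows` can be any iterable,
    for example a csv.reader over the full 311 export.
    """
    rng = random.Random(seed)
    reservoir: List[T] = []
    for i, row in enumerate(rows):
        if i < k:
            # Fill the reservoir with the first k rows.
            reservoir.append(row)
        else:
            # Row i replaces a stored row with probability k / (i + 1).
            j = rng.randint(0, i)  # randint is inclusive on both ends
            if j < k:
                reservoir[j] = row
    return reservoir


# Stand-in for the full dataset: integers instead of real complaint rows.
sample = reservoir_sample(range(1_000_000), k=4000, seed=42)
print(len(sample))
```

Only `k` rows are held in memory at any time, so the same approach would work on all 11 million observations.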



The “Complaints” tab opens to a word cloud showing the different types of complaints by borough and frequency: the bigger the word, the higher the frequency. Based on this sample, Brooklyn complaints are more often about street conditions/signs, heat and hot water, and noise, and less often about snow, rodents, and taxis. This information gives the user an idea of possible problems they may encounter if they decide to visit Brooklyn.
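The idea behind the word cloud, word size growing with complaint frequency, can be sketched as a count followed by a linear rescaling. This is an illustration in Python, not the app's R code; the function name `word_sizes`, the size range, and the rows in `toy_records` are all made up for the example.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple


def word_sizes(records: Iterable[Tuple[str, str]], borough: str,
               min_size: float = 10.0, max_size: float = 60.0) -> Dict[str, float]:
    """Map each complaint type in one borough to a font size.

    `records` holds (borough, complaint_type) pairs. The least frequent
    type gets min_size, the most frequent gets max_size, and the rest
    are placed linearly in between.
    """
    counts = Counter(ctype for b, ctype in records if b == borough)
    if not counts:
        return {}
    lo, hi = min(counts.values()), max(counts.values())
    span = hi - lo
    sizes = {}
    for ctype, n in counts.items():
        # If every type has the same count, draw them all at max_size.
        frac = (n - lo) / span if span else 1.0
        sizes[ctype] = min_size + frac * (max_size - min_size)
    return sizes


# Made-up rows, not taken from the 311 data.
toy_records = [
    ("BROOKLYN", "Noise"), ("BROOKLYN", "Noise"), ("BROOKLYN", "Noise"),
    ("BROOKLYN", "Heat/Hot Water"), ("BROOKLYN", "Heat/Hot Water"),
    ("BROOKLYN", "Rodent"),
    ("QUEENS", "Noise"),
]
print(word_sizes(toy_records, "BROOKLYN"))
```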


On the top left, “Select a Borough” allows the user to change the selected borough, and the change is reflected in the charts on all tabs.



The “Frequency” tab opens to a bar graph, with the option to view complaints annually, monthly or daily.

This can be adjusted on the lower left of this tab for this particular chart. In the monthly view we see a spike in May; for some reason most complaints seem to be made in May. It could be because New Yorkers end their winter hibernation in May, or because more festivals and street fairs are starting. Or maybe the sample data is skewed towards May? This is definitely an area that could be looked into further. Knowing the frequency by year, month, or day gives the user an idea of when these complaints occur. If I am deciding when to visit Brooklyn, I may not choose May, since complaints are at their highest.
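The three views on this tab amount to grouping complaint dates by a different key. Here is a minimal Python sketch of that grouping, not the app's R code. The function name `complaint_frequency`, the `%m/%d/%Y` date format, and the choice to treat "daily" as day of the week are assumptions, and the dates are made up.

```python
from collections import Counter
from datetime import datetime
from typing import Dict, Iterable

# One grouping key per view. Reading "daily" as day of the week is an assumption.
_KEYS = {
    "annual": lambda d: d.year,
    "monthly": lambda d: d.month,
    "daily": lambda d: d.isoweekday(),  # 1 = Monday ... 7 = Sunday
}


def complaint_frequency(created: Iterable[str], view: str = "monthly",
                        fmt: str = "%m/%d/%Y") -> Dict[int, int]:
    """Count complaints per year, month, or weekday, sorted by key."""
    key = _KEYS[view]
    counts = Counter(key(datetime.strptime(s, fmt)) for s in created)
    return dict(sorted(counts.items()))


# Made-up creation dates, not taken from the 311 data.
toy_dates = ["05/01/2015", "05/15/2015", "01/03/2016"]
print(complaint_frequency(toy_dates, "monthly"))
print(complaint_frequency(toy_dates, "annual"))
```

Comparing monthly counts from a random sample against these would be one way to check whether the May spike is real or an artifact of the non-random sample.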

Top Complaints


The “Top Ten Complaints” tab gives a table holding the top ten complaints by borough. In Brooklyn, the top complaint is a blocked driveway, followed by illegal parking and dirty conditions. As a visitor I may not care too much about the blocked driveways and illegal parking since I plan to take mass transit, but I may keep in mind the dirty conditions.
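A top-ten table like this one is a frequency count truncated to the ten largest entries. The following is a hedged Python sketch of that step, not the app's R code; `top_complaints` is a hypothetical name and the rows are made up.

```python
from collections import Counter
from typing import Iterable, List, Tuple


def top_complaints(records: Iterable[Tuple[str, str]], borough: str,
                   n: int = 10) -> List[Tuple[str, int]]:
    """Return the n most frequent complaint types in a borough, with counts.

    `records` holds (borough, complaint_type) pairs.
    """
    counts = Counter(ctype for b, ctype in records if b == borough)
    return counts.most_common(n)


# Made-up rows, not taken from the 311 data.
toy_rows = (
    [("BROOKLYN", "Blocked Driveway")] * 3
    + [("BROOKLYN", "Illegal Parking")] * 2
    + [("BROOKLYN", "Dirty Conditions")]
    + [("BRONX", "Noise")] * 5
)
for complaint, count in top_complaints(toy_rows, "BROOKLYN"):
    print(complaint, count)
```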


The “Map” tab allows the user to get more detailed information by zooming in closer. If I were considering visiting Bedford Avenue in Williamsburg, Brooklyn, a trending neighborhood, I would zoom in closer and see what complaints are specific to that area. It seems there is a flooring/stairs complaint, a water leak, and a street light condition.

OK, noted. If I decide to visit Brooklyn, I expect to hear loud noise, may have to take a cold shower, will probably avoid visiting in May, will look out for dirty conditions, watch my step as I walk, and look both ways before I cross the street.


Next steps

Allow the user to sort by complaint type, specific neighborhood, and time period, and maybe to enter longitude and latitude coordinates to see complaints near a specific area. Automate the application to update live with the latest data available. Upload the full data set.
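The coordinate lookup could work along these lines: compute the great-circle distance from the entered point to each complaint and keep those within a radius. This is a sketch under assumptions, in Python rather than R. The names `haversine_km` and `complaints_near`, the `lat`/`lon` record keys, the 0.5 km default radius, and the coordinates are all made up for illustration.

```python
import math
from typing import Dict, Iterable, List

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def haversine_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in kilometres between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def complaints_near(records: Iterable[Dict], lat: float, lon: float,
                    radius_km: float = 0.5) -> List[Dict]:
    """Keep the records whose 'lat'/'lon' fall within radius_km of the point."""
    return [r for r in records
            if haversine_km(lat, lon, r["lat"], r["lon"]) <= radius_km]


# Made-up points for illustration, not real complaint locations.
toy_points = [
    {"type": "Street Light Condition", "lat": 40.7140, "lon": -73.9610},
    {"type": "Water Leak", "lat": 40.6500, "lon": -73.9500},
]
print(complaints_near(toy_points, 40.7142, -73.9612))
```

A linear scan like this is fine for a 4k sample; with the full data set a spatial index would probably be needed.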



About Author


KB is a driven and determined Senior Analyst with nearly 15 years of proven data analytics expertise. Most recently focused on forecasting short-term and long-term global crude oil and product prices for PIRA Energy Group. Previously held a...