The Rise of NYC 311 Noise Complaints: A Python Exploratory Data Analysis

Posted on Dec 7, 2021

Project GitHub | LinkedIn

Key Takeaways


  • Residential noise complaint volumes in NYC vary significantly between 2019, 2020 and 2021. Both data visualization (showing a surge during the 2020 lockdown period) and hypothesis testing (Kruskal-Wallis test) are used to assess the rise in noise complaints.
  • The Bronx has the second-quickest resolution time of all boroughs, despite having the largest complaint volume.
  • While city agencies effectively handle indoor noise complaints within 24 hours, complaints related to outdoor sources of noise, such as manufacturing, construction and lawn equipment, have longer resolution times (> 70 hours), suggesting that there is room to improve the management of complaints related to outdoor noise sources.

Background


In the spring of 2020, New York City emerged as a coronavirus disease epicenter, inspiring a mass exodus as well as a dramatic shift to remote work across many professions. Given lockdown restrictions, the high population density and tightly packed apartments in NYC, and a general frustration in response to the pandemic, I suspected that there would be an increase in noise complaints logged to the city during this time.

The objective of this exploratory analysis is to compare noise complaints in NYC before and during the pandemic. It addresses the following questions:

  • Did complaints increase, and if so, by how much?
  • Which types of noise are most prevalent?
  • Which locations have the highest noise pollution?
  • How does the average time to resolve complaints vary with location and type?

While it is highly likely that residential noise complaints increased while the lockdown stay-at-home orders were in effect, these specific insights can inform a data-driven effort by the city government to better handle noise complaints, alleviate annoyance related to noise pollution and, hopefully, retain NYC residents.

Data Collection


I analyzed a subset of the 27 million service requests logged in the NYC311’s Service Requests from 2010-Present dataset, made publicly available via NYC Open Data's Socrata API. NYC311 is the non-emergency New York City call center providing the public 24/7 access to city services and government information. Calls cover a broad range of topics with requests encompassing over 500 complaint types, including everything from street/sidewalk repair and missed trash collection to heat/hot water issues and rodent sightings. Residents call in or submit service requests online, and those are then logged into a database. 

I queried the Socrata Open Data API, with SoQL (a syntax similar to SQL), to filter the data down to the years 2019-2021, and to reduce the original 41 columns to only the most relevant features. The resulting dataset had 7.2 million rows and 15 attributes:

  1. Unique Key
  2. Created Date
  3. Closed Date
  4. Agency
  5. Agency Name
  6. Complaint Type
  7. Descriptor
  8. Location Type
  9. Incident Zip
  10. Incident Address
  11. Community Board
  12. Borough
  13. Latitude
  14. Longitude
  15. Location
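As a rough illustration of the query step, here is a minimal sketch of how a SoQL request for these 15 attributes could be assembled in Python. The endpoint ID (`erm2-nwe9`), the lowercase field names, and the helper `build_query_url` are assumptions for illustration and are not taken from the project code. The sketch only builds the URL and does not call the API.

```python
from urllib.parse import urlencode

# Assumed endpoint for the "311 Service Requests from 2010 to Present" dataset.
BASE_URL = "https://data.cityofnewyork.us/resource/erm2-nwe9.json"

# Assumed API field names for the 15 attributes listed above.
COLUMNS = [
    "unique_key", "created_date", "closed_date", "agency", "agency_name",
    "complaint_type", "descriptor", "location_type", "incident_zip",
    "incident_address", "community_board", "borough", "latitude",
    "longitude", "location",
]


def build_query_url(start, end, limit=50000, offset=0):
    """Build a SoQL query URL selecting COLUMNS for requests created in [start, end]."""
    params = {
        "$select": ", ".join(COLUMNS),
        "$where": f"created_date between '{start}' and '{end}'",
        "$order": "created_date",
        "$limit": limit,    # page size
        "$offset": offset,  # increase by `limit` to fetch the next page
    }
    return f"{BASE_URL}?{urlencode(params)}"


url = build_query_url("2019-01-01T00:00:00", "2021-12-31T23:59:59")
print(url)
```

With roughly 7.2 million rows in the date range, the results would have to be fetched page by page, stepping `offset` forward by `limit` until a page comes back empty.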

Residential Noise is the largest complaint category, accounting for 13.67% of all complaints. The following visualizations were used to gain a high-level overview of the data before data cleaning.

While 20 different city agencies are represented in this dataset, the majority of service requests are handled by the NYPD (46%), HPD (17%), and DOT (9%). Each agency is responsible for its own subset of service request complaint types.

The above tree map shows the distribution of complaints, by complaint type and within each agency. Note that Noise-related complaints are handled by 3 different agencies and include about 15 different descriptors. This observation will guide data cleaning, as described below.

Data Cleaning & Manipulation 


The original noise-related service requests are not labeled consistently across agencies. Some noise complaints contain ‘descriptors’ that are too specific (‘Construction before/after hours’, ‘Construction equipment’), while others have a ‘complaint type’ that is too vague (‘Noise’). I created a function with regex conditions to consolidate these various categories and relabel the requests using two new features, ‘Noise Type’ (such as ‘Loud Talking’) and ‘Noise Location’ (such as ‘Residential Building/Housing’). I also engineered a third feature, ‘Resolution Time,’ by subtracting the service request's created date from its closed date.

Finally, the dataset was filtered down to noise-related complaints, resulting in 1,984,645 records from 2019 to 2021.
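The relabeling and the resolution-time feature described above could look something like the sketch below. The regex rules, the label set, the timestamp format, and the function names `noise_type` and `resolution_hours` are hypothetical stand-ins; the actual rules live in the project code and are not reproduced here.

```python
import re
from datetime import datetime

# Hypothetical (pattern, label) rules; first match wins.
NOISE_TYPE_RULES = [
    (re.compile(r"music|party", re.IGNORECASE), "Loud Music/Partying"),
    (re.compile(r"talking", re.IGNORECASE), "Loud Talking"),
    (re.compile(r"banging|pounding", re.IGNORECASE), "Banging/Pounding"),
    (re.compile(r"construction", re.IGNORECASE), "Construction"),
]


def noise_type(descriptor):
    """Map a raw descriptor string to a consolidated 'Noise Type' label."""
    for pattern, label in NOISE_TYPE_RULES:
        if pattern.search(descriptor):
            return label
    return "Other"


def resolution_hours(created, closed, fmt="%Y-%m-%dT%H:%M:%S"):
    """Resolution time in hours: closed date minus created date."""
    delta = datetime.strptime(closed, fmt) - datetime.strptime(created, fmt)
    return delta.total_seconds() / 3600


print(noise_type("Construction before/after hours"))  # Construction
print(noise_type("Construction equipment"))           # Construction
print(resolution_hours("2020-06-01T22:00:00", "2020-06-02T01:30:00"))  # 3.5
```

In a pandas workflow, functions like these would typically be applied column-wise to build the new features.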

A Closer Look at Noise 


I. Monthly total noise complaints increased from 58,842 in June 2019 to 107,725 in June 2020, a percent change of 83.08%. To get a big-picture view of noise complaints during the baseline period (2019) and after (2020-2021), I created a line plot of complaint volume over time.
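The percent change quoted above follows directly from the two June totals. A small sketch of the arithmetic, along with one way monthly totals could be tallied from created-date timestamps (the helper names and the timestamp format are assumptions):

```python
from collections import Counter
from datetime import datetime


def monthly_counts(created_dates, fmt="%Y-%m-%dT%H:%M:%S"):
    """Count complaints per (year, month) from created-date strings."""
    counts = Counter()
    for stamp in created_dates:
        dt = datetime.strptime(stamp, fmt)
        counts[(dt.year, dt.month)] += 1
    return counts


def percent_change(old, new):
    """Percent change from old to new."""
    return (new - old) / old * 100


# June 2019 vs. June 2020 totals reported in the text.
print(round(percent_change(58842, 107725), 2))  # 83.08
```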

II. Looking at the types of noise, Loud Music/Partying is the most prevalent complaint, followed by Vehicle Noise and Banging/Pounding. Residential and Street/Sidewalk complaints heavily outnumber noise complaints from other locations. Visually, one can see a surge in noise complaint volumes in 2020. Hypothesis testing reveals that noise complaint volumes differed significantly (at the 95% confidence level) between 2019, 2020 and 2021.

Two vertical lines were added to the plots below to mark March 2020 (when stay-at-home orders first started) and August 2020 (right after phase 4 reopening), highlighting the time window when most residents were indoors. While seasonality is definitely at play, with a higher number of Loud Music complaints in the summer months (when partying is more popular), there still appears to be a higher complaint volume in the summer of 2020 than in the summers of 2019 and 2021.

A Kruskal-Wallis hypothesis test was applied to see whether the distribution of monthly noise complaint volumes differed significantly across years. (Kruskal-Wallis is a rank-based test, so it compares distributions rather than means.) The test yielded a p-value of 0.0137, so the null hypothesis that noise complaint volumes in 2019, 2020 and 2021 come from the same distribution was rejected at the 5% significance level (95% confidence). The Kruskal-Wallis test was also applied to each noise location and noise type individually, across years. Here, Residential noise (p-value = 0.0004) and Banging/Pounding noise (p-value = 0.0007) stood out as the categories that varied most significantly between 2019, 2020 and 2021, with both p-values below 0.001.
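To make the mechanics of the test concrete, here is a standard-library sketch of the Kruskal-Wallis H statistic for three groups (one per year). In practice one would more likely call a library routine such as `scipy.stats.kruskal`; the post does not say which implementation was used. The function name `kruskal_wallis_3` is hypothetical, and the input values below are made up and are not the article's data.

```python
import math
from collections import Counter


def kruskal_wallis_3(a, b, c):
    """Kruskal-Wallis H test for exactly three non-empty samples.

    Returns (H, p). With three groups the chi-square approximation has
    2 degrees of freedom, whose survival function is exp(-H / 2).
    """
    groups = [list(a), list(b), list(c)]
    pooled = sorted(v for g in groups for v in g)
    n = len(pooled)

    # Give each distinct value the average of the ranks it occupies.
    rank_of = {}
    i = 0
    while i < n:
        j = i
        while j < n and pooled[j] == pooled[i]:
            j += 1
        rank_of[pooled[i]] = (i + 1 + j) / 2  # mean of ranks i+1 .. j
        i = j

    rank_term = sum(sum(rank_of[v] for v in g) ** 2 / len(g) for g in groups)
    h = 12 / (n * (n + 1)) * rank_term - 3 * (n + 1)

    # Standard correction for tied values.
    ties = Counter(pooled)
    correction = 1 - sum(t ** 3 - t for t in ties.values()) / (n ** 3 - n)
    if correction == 0:
        raise ValueError("all observations are identical")
    h = max(h / correction, 0.0)
    return h, math.exp(-h / 2)


# Made-up values (NOT the article's data), one list per year.
h, p = kruskal_wallis_3([1, 2, 3], [4, 5, 6], [7, 8, 9])
print(round(h, 3), round(p, 4))  # 7.2 0.0273
```

A p-value below 0.05, as in this toy case, would lead to rejecting the null hypothesis at the 5% level.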

III. The Bronx had the highest volume of noise complaints (554,655 total) while Staten Island had the fewest (44,391). The figure below shows the normalized volume of complaints adjusted by the 2020 Census population in each borough.
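The population adjustment can be sketched as a simple per-capita rate. The complaint totals below are the two figures quoted in the text. The 2020 Census borough populations are supplied here as an assumption and should be checked against the Census Bureau's published counts. The function name and the "per 1,000 residents" scaling are also illustrative choices, since the post does not state the exact normalization used.

```python
# 2020 Census borough populations (assumed here; verify before reuse).
POPULATION_2020 = {
    "BRONX": 1_472_654,
    "BROOKLYN": 2_736_074,
    "MANHATTAN": 1_694_251,
    "QUEENS": 2_405_464,
    "STATEN ISLAND": 495_747,
}


def complaints_per_1000(complaints, population):
    """Complaints per 1,000 residents for each borough present in `complaints`."""
    return {b: complaints[b] * 1000 / population[b] for b in complaints}


# Total noise complaints over 2019-2021 as reported in the text.
totals = {"BRONX": 554_655, "STATEN ISLAND": 44_391}
rates = complaints_per_1000(totals, POPULATION_2020)
for borough, rate in rates.items():
    print(f"{borough}: {rate:.1f} complaints per 1,000 residents")
```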

IV. Brooklyn has the highest average resolution time, 14.7 hours, while Queens has the lowest, 7.6 hours. It is interesting to note that the Bronx has the second-lowest resolution time despite having the highest volume of complaints adjusted by population.

V. Across all boroughs, complaints associated with next-door neighbors (loud music/partying, loud talking, banging/pounding noise, loud television) were all resolved quickly, well within 24 hours of being filed. Complaint categories such as lawn care equipment, manufacturing and construction took at least 71 hours to resolve, possibly because they are related to outdoor noise sources and/or long-term projects, which makes them more difficult to track down and control. Air condition/ventilation equipment noise complaints took the longest to resolve on average, ranging from 4.04 days (the Bronx) to 9.58 days (Manhattan). The heatmap below was created to visualize complaint resolution times (in hours) by borough and noise descriptor.

* Note, the above plots show results with the Helicopter complaints removed from the dataset, as they are the only noise complaints handled by the Economic Development Corporation agency, and they behaved as outliers, with the highest median resolution time of 20 days! 69% of Helicopter complaints were in Manhattan, which makes sense given the three public-use heliports located in the borough.
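The table behind a borough-by-descriptor heatmap like the one described above is a grouped mean of resolution times. Here is a standard-library sketch of that aggregation; in pandas the same shape would usually come from `DataFrame.pivot_table` with `aggfunc="mean"`. The function name and the sample records are made up for illustration and are not the article's data.

```python
from collections import defaultdict
from statistics import mean


def mean_resolution_table(records):
    """Aggregate (borough, noise_type, hours) records into
    {noise_type: {borough: mean resolution hours}}."""
    buckets = defaultdict(list)
    for borough, noise, hours in records:
        buckets[(noise, borough)].append(hours)

    table = defaultdict(dict)
    for (noise, borough), values in buckets.items():
        table[noise][borough] = mean(values)
    return {noise: dict(row) for noise, row in table.items()}


# Made-up records (NOT the article's data).
sample = [
    ("BRONX", "Loud Talking", 2.0),
    ("BRONX", "Loud Talking", 4.0),
    ("MANHATTAN", "Loud Talking", 6.0),
    ("MANHATTAN", "Construction", 80.0),
]
table = mean_resolution_table(sample)
print(table["Loud Talking"]["BRONX"])  # 3.0
```

Each inner dictionary corresponds to one row of the heatmap, and each key within it to one borough column.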

Recommendations


  • Pre-pandemic and pandemic data visualizations indicate noise complaints fluctuated with government policies on reopening, with a surge in Residential noise during the lockdown period. Hypothesis testing confirms that noise complaint volumes across 2019, 2020 and 2021 vary significantly, with Residential Noise and Banging/Pounding complaints varying the most significantly. Based on this, I recommend that the city prioritize infrastructure and building insulation that can help reduce the number of complaints called in due to noisy neighbors. 
  • Although Residential noise is the most prevalent type, an analysis on resolution time indicates that city agencies are effectively able to resolve these cases within a couple of hours. The city agencies should focus on solutions to quickly identify and resolve Construction, Manufacturing, Lawn Care equipment and Air condition/ventilation equipment noise as these complaints had the longest resolution times.
  • Of all the boroughs, the Bronx has the highest number of complaints adjusted by population, but it also has the second-quickest resolution time. This could be due to a number of factors. One possible explanation is that Bronx residents report duplicate incidents (with multiple residents complaining about the same incident), indicating that the community is very engaged and highly comfortable with reporting to NYC311. Alternatively, it could mean that the agency employees responding to requests in the Bronx are more efficient at resolving noise complaints than those in the other boroughs. Further investigation into the Bronx's handling of noise complaints is warranted to see whether there are any best practices that could help the other boroughs improve noise complaint resolution time.

Thank you for taking the time to read about my work! Please check out my author page if you would like to take a look at my other projects.

About Author

Niki Agrawal

Niki is a data science professional with 4+ years of data analysis experience in industry (digital health tech) and computational research (neuroscience, biomedical engineering). Niki enjoys applying creative and analytical thinking to solve real world problems with data....