Chemical Spills

Sam Nuzbrokh
Posted on May 19, 2020




Do you know what's in your water?

Improper handling and storage of petroleum, hazardous substances and chemicals, or liquefied natural gas (LNG) can result in spills that threaten the environment or pose health and safety risks to people nearby. Across New York State, petroleum and chemical spills have contaminated groundwater, including some public water supplies. Places like Flint, Michigan show how much a community suffers when its government fails to keep contaminants out of the drinking water.

To help failing governments and to inform the citizenry, I created a Shiny app that visualizes all the spills that have occurred in New York State since the state began keeping records. The app aims to provide both an intuitive interface for exploring the data and rigorous analytical summaries of spills, Department of Environmental Conservation (DEC) responsiveness, and the institutions or companies tied to the most dumping of chemicals into the commons, all organized by region, county, and type of material.

Interactive UI

The interactive map encourages users to explore and investigate chemical spills that have occurred in NY State. Each spill is drawn as a circle whose size grows with the logarithm of the spill size. Spills can be selected by the material involved, and linked facilities by the material stored. Spills can also be filtered by size and year of occurrence. Hovering over a facility or a spill brings up summary information. In addition, facilities can be grouped and toggled by status: closed, inactive, or active.
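The marker sizing and filtering described above can be sketched in a few lines. This is a minimal Python illustration, not the app's actual Shiny (R) code: the record fields (`material`, `quantity`, `year`), the function names, and the `base_radius` and `scale` constants are all assumptions made for the example, and the sample records are invented values rather than rows from the dataset.

```python
import math

# Hypothetical record layout: each spill is a dict with
# "material", "quantity", and "year" keys (names assumed for this sketch).

def spill_radius(quantity, base_radius=2.0, scale=3.0):
    """Return a marker radius that grows with log10 of the spill quantity."""
    if quantity is None or quantity <= 0:
        # Unknown or zero-size spills still get a small visible marker.
        return base_radius
    # Adding 1 keeps the logarithm non-negative for quantities below 1.
    return base_radius + scale * math.log10(1.0 + quantity)

def filter_spills(spills, material=None, min_quantity=0.0, year_range=(2000, 2020)):
    """Keep spills matching the selected material, minimum size, and year range."""
    first_year, last_year = year_range
    return [
        s for s in spills
        if (material is None or s["material"] == material)
        and s["quantity"] >= min_quantity
        and first_year <= s["year"] <= last_year
    ]

# Invented sample records, for illustration only.
spills = [
    {"material": "Sodium Hypochlorite", "quantity": 999.0, "year": 2005},
    {"material": "Ferric Chloride", "quantity": 9.0, "year": 2012},
    {"material": "Ferric Chloride", "quantity": 50.0, "year": 1995},
]
selected = filter_spills(spills, material="Ferric Chloride")
radii = [spill_radius(s["quantity"]) for s in selected]
```

The logarithm keeps a handful of very large spills from covering the map while still letting small spills show up as distinct markers.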

For a quick walkthrough, we're going to head over to the NYC area and select Hazardous Materials spills of all sizes and all chemical types from 2000 to 2020.


The Analysis tab summarizes total chemical spills by volume per DEC region, county, and material family. Of particular interest to policymakers, the "Spill Sources" tab breaks down the total spills for the selected set of counties by source. Areas of NY State with a lot of industrial activity will have a higher percentage of their spills coming from Industrial and Storage sources than a rural region, where most spills come from Vehicles or Municipal sources.
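A per-source breakdown of this kind amounts to a grouped sum followed by a normalization. The sketch below is a Python illustration under stated assumptions, not the app's own code: the field names (`county`, `source`, `quantity`) are hypothetical, the sample records are invented, and it assumes the shares are weighted by spill volume. Replacing the quantity with 1 would give shares by number of incidents instead.

```python
from collections import defaultdict

def source_shares(spills, counties):
    """Return each source's percentage of total spill volume in the given counties."""
    totals = defaultdict(float)
    for s in spills:
        if s["county"] in counties:
            totals[s["source"]] += s["quantity"]
    grand_total = sum(totals.values())
    if grand_total == 0:
        # No matching spills, so there is nothing to break down.
        return {}
    return {src: 100.0 * qty / grand_total for src, qty in totals.items()}

# Invented sample records, for illustration only.
sample = [
    {"county": "Kings", "source": "Industrial", "quantity": 60.0},
    {"county": "Kings", "source": "Storage", "quantity": 20.0},
    {"county": "Queens", "source": "Vehicles", "quantity": 20.0},
    {"county": "Essex", "source": "Vehicles", "quantity": 500.0},
]
shares = source_shares(sample, counties={"Kings", "Queens"})
```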

The density, distribution, chemical type, and source of spills are given on the Analysis tab of the app. Below is a graphic generated from a particular user selection:

Each chemical type is shown by its bulk spill volume over the user-selected time period and location. In this case the graphic depicts our earlier selection in the interactive UI: all Hazardous Chemical spills of all sizes in NYC from 2000 to 2020. We notice low volumes of Ferric Chloride spills throughout the time period, while at higher volumes Sodium Hypochlorite begins to dominate.

DEC Responsiveness

This tab shows the "Case Lag" per DEC Region and County. Case Lag is the amount of time from when a spill is reported to the DEC Regional office to when the case is closed. Closing a case entails processing the spill, organizing a cleanup, and administering the reopening of the contaminated site. The chart below, for example, shows the mean case lag (in red) for the NYC region over a period of 30 years. The number of individual spill incidents is shown in purple. We notice a sharp peak around September 11, 2001, as the city bureaucracy was dealing with the physical and chemical fallout of that event.
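The Case Lag definition above translates directly into a date difference. Here is a minimal Python sketch of how a yearly summary like the one in the chart could be computed. It is not the app's implementation: the function names and the (reported, closed) tuple layout are hypothetical, the sample dates are invented, and grouping by the year a spill was reported and skipping still-open cases when averaging are both assumptions.

```python
from collections import defaultdict
from datetime import date

def case_lag_days(reported, closed):
    """Days between the spill report and case closure, or None if still open."""
    if closed is None:
        return None
    return (closed - reported).days

def yearly_summary(cases):
    """Summarize (reported, closed) date pairs by the year the spill was reported."""
    lags = defaultdict(list)
    counts = defaultdict(int)
    for reported, closed in cases:
        counts[reported.year] += 1          # every reported incident is counted
        lag = case_lag_days(reported, closed)
        if lag is not None:                 # open cases have no lag yet
            lags[reported.year].append(lag)
    summary = {}
    for year in sorted(counts):
        closed_lags = lags.get(year, [])
        mean_lag = sum(closed_lags) / len(closed_lags) if closed_lags else None
        summary[year] = {"spills": counts[year], "mean_lag_days": mean_lag}
    return summary

# Invented sample cases, for illustration only.
cases = [
    (date(2001, 9, 11), date(2001, 9, 21)),  # closed after 10 days
    (date(2001, 1, 1), date(2001, 1, 31)),   # closed after 30 days
    (date(2002, 3, 1), None),                # still open
]
summary = yearly_summary(cases)
```

Keeping the incident count next to the mean lag mirrors the two series in the chart and makes it easier to tell whether a spike in lag coincides with a spike in caseload.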

The amount of statistical insight available from this tool is limited. Producing truly actionable insights is an avenue for future work, but the dataset itself would first need to be enriched with company financial data, NYS administrative data, and similar sources. What this tool CAN do is serve the public's understanding of what the NYS Department of Environmental Conservation does and the caseload it deals with.





About Author

Sam Nuzbrokh

Sam Nuzbrokh is a certified data scientist with a Master's in Space Engineering and a Bachelors in Theoretical Physics. He has 3+ years of data science, engineering, and research experience across satellite communication, engineering telemetry, and academic research....