Data: Improper handling and Chemical Spills

Posted on May 19, 2020
The skills I demonstrate here can be learned through the Data Science with Machine Learning bootcamp at NYC Data Science Academy.


Do you know what's in your water?

Improper handling and storage of petroleum, hazardous substances/chemicals, or liquefied natural gas (LNG) can result in spills that threaten the environment or pose health and safety risks to nearby persons. Across New York State, spills of petroleum or chemicals have contaminated groundwater, including some public water supplies. Places like Flint, Michigan show how severe the consequences can be when a public water supply is contaminated.

To help failing governments and to inform the citizenry, I created a Shiny app that visualizes all the spills that have occurred in New York State since the state began keeping records. The app aims to provide an intuitive interface for exploring the data and to offer rigorous analytical summaries of spills, Department of Environmental Conservation (DEC) responsiveness, and the institutions or companies tied to the most dumping of chemicals into the commons. All of it is organized by region, county, and type of material.

Interactive UI

The interactive map encourages user exploration and investigation of chemical spills that have occurred in NY State. Each spill is drawn as a circle whose size grows with the logarithm of the spill size. Spills can be selected by the material involved, and linked facilities by the material stored. Spills can also be filtered by size and year of occurrence. Hovering over a facility or a spill brings up useful summary information. In addition, facilities can be grouped and toggled based on their status: closed, inactive, or active.
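To make the logarithmic sizing and the size/year filters concrete, here is a minimal sketch in Python. It is an illustration only, not the app's actual code (the app is written in Shiny). The function names, the record fields ("size", "year"), and the base_radius and scale values are all assumptions chosen for the example.

```python
import math


def marker_radius(spill_size, base_radius=2.0, scale=3.0):
    """Return a circle radius that grows with the log of the spill size.

    base_radius and scale are hypothetical tuning values, not the app's.
    """
    # log10(1 + x) stays defined for zero-sized records, which then
    # simply get the base radius.
    return base_radius + scale * math.log10(1.0 + max(spill_size, 0.0))


def filter_spills(spills, min_size, max_size, year_from, year_to):
    """Keep spills whose size and year fall inside the selected ranges.

    Each spill is assumed to be a dict with "size" and "year" keys.
    """
    return [
        s for s in spills
        if min_size <= s["size"] <= max_size
        and year_from <= s["year"] <= year_to
    ]
```

With a logarithmic scale, a spill ten times larger adds a fixed amount to the radius instead of multiplying it, so very large spills do not swamp the map while small ones remain visible.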

For a quick walkthrough, we're going to head over to the NYC area and select Hazardous Materials spills of all sizes and all chemical types from 2000 to 2020.


The Analysis tab summarizes total chemical spills by volume per DEC region, County, and Material Family. Of particular interest to policymakers, the "Spill Sources" tab breaks down the total spills for the selected set of counties by source. Areas of NY State with a lot of industrial activity tend to have a higher percentage of their spills coming from Industrial and Storage sources than rural regions, where most spills come from Vehicles or Municipal sources.
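The source breakdown described above amounts to summing spill volume per source within the selected counties and dividing by the total. The sketch below shows that calculation in Python under assumed field names ("county", "source", "volume"); it is not taken from the app, and the real dataset's column names may differ.

```python
from collections import defaultdict


def source_shares(spills, counties):
    """Return each source's share of total spill volume for the given counties.

    Each spill is assumed to be a dict with "county", "source", and
    "volume" keys (hypothetical names for this example).
    """
    totals = defaultdict(float)
    for spill in spills:
        if spill["county"] in counties:
            totals[spill["source"]] += spill["volume"]

    grand_total = sum(totals.values())
    if grand_total == 0:
        # No matching spills (or only zero-volume ones): nothing to report.
        return {}
    return {source: volume / grand_total for source, volume in totals.items()}
```

Comparing the result for an industrial county set against a rural one would show the contrast in Industrial and Storage shares that the tab is meant to surface.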

The density, distribution, chemical type, and source of spills are given on the Analysis tab of the app. Below is a graphic generated from a particular user selection:

Each chemical type is shown by its bulk spill volume over the user-selected time period and location. In this case, the graphic depicts our earlier selection in the interactive UI: all Hazardous Chemical spills of all sizes in NYC from 2000 to 2020. Ferric Chloride spills appear at low volumes throughout the period, while at higher volumes Sodium Hypochlorite begins to dominate.

DEC Responsiveness

This tab shows the "Case Lag" per DEC Region and County. Case Lag is the amount of time from when a spill is reported to the DEC Regional office to when the case is closed. Closing a case entails processing the spill, organizing a cleanup, and administering the reopening of the contaminated site. The chart below, for example, shows the mean case lag (in red) for the NYC region over a period of 30 years. The number of individual spill incidents is shown in purple. There is a sharp peak around Sept 11, 2001, as the city bureaucracy was dealing with the physical and chemical fallout of that event.
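As a rough illustration of the Case Lag definition, the Python sketch below computes the lag in days for each case and averages it per year. The choice to group by the year the spill was reported, the use of days as the unit, and the skipping of still-open cases are assumptions made for this example; the app may aggregate differently.

```python
from collections import defaultdict
from datetime import date


def case_lag_days(reported, closed):
    """Days between the report date and the close date of a case."""
    return (closed - reported).days


def mean_lag_by_year(cases):
    """Mean case lag in days, keyed by the year the spill was reported.

    cases is an iterable of (reported, closed) date pairs; a case that
    is still open is assumed to carry closed=None and is skipped.
    """
    lag_sum = defaultdict(int)
    case_count = defaultdict(int)
    for reported, closed in cases:
        if closed is None:
            continue
        lag_sum[reported.year] += case_lag_days(reported, closed)
        case_count[reported.year] += 1
    return {year: lag_sum[year] / case_count[year] for year in lag_sum}
```

Plotting the resulting yearly means next to the yearly case counts would give a chart of the kind described above.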

The statistical insight this tool can offer is limited. Deeper analysis is an avenue for future work, but the dataset would first need to be enriched with company financial data, NYS administrative data, and similar sources to produce truly actionable insights. What this tool can do is serve public understanding of what the NYS Department of Environmental Conservation does and the caseload it deals with.






About Author

Sam Nuzbrokh

Sam Nuzbrokh is a certified data scientist with a Master's in Space Engineering and a Bachelors in Theoretical Physics. He has 3+ years of data science, engineering, and research experience across satellite communication, engineering telemetry, and academic research....