Data Analysis on NYC Air Quality

Posted on Nov 8, 2021

Background and Research Goal

Air pollution can be a major public health issue, especially in large cities where industrial centers are abundant. Data shows that high levels of pollution can be especially dangerous to people with asthma or other respiratory conditions, and in some cases, it can be dangerous to children. With that in mind, I think it would benefit anyone with respiratory problems to have an accessible source of information on how pollution levels differ between areas of a city, in this case New York.

I will be creating a catalog of how different types of pollutants have changed over time in different areas of New York, as well as how many emergency room visits and deaths have been attributed to pollution where the data is available. Hopefully this can help people make more informed choices when looking for housing in New York, and who knows, it's possible that people implementing pollution-reducing initiatives in NYC will be able to consult this catalog for easier access to that information. If you'd like more information about how certain pollutants can be harmful, I will link some CDC articles later for some introductory information.

Notes on the Original Dataset

The original dataset (linked in the sources section), while comprehensive, isn't formatted in a way that allows for easy access by a layperson. The dataset contains a separate row for every measurement of a pollutant, whether that is the average parts per billion of ozone in a certain year, or the per capita number of emergency room visits attributed to fine particles from the beginning of winter in one year until the start of winter in the next. In order to get easily interpretable data, I took the closest approximation I could get to a yearly average, or in some cases, the average over a few years.

While the specifics are long-winded, if you're interested, a GitHub repo linked below contains the script that makes the provided graphs (in the src folder). While most of the exploratory analysis was done separately, the script is extensively commented and will walk you through the decisions I made to get each approximate average where there wasn't an easily accessible one.
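To give a rough idea of what approximating a yearly average can look like, here is a minimal sketch. It is not the script from the repo: the row layout, the area names, the values, and the rule of assigning each measurement to the first year that appears in its period label are all assumptions made for illustration.

```python
import re
from collections import defaultdict
from statistics import mean

# Hypothetical rows: (area, pollutant, period label, value).
# The labels and values are made up; they are not taken from the dataset.
rows = [
    ("Area A", "Fine particles", "Winter 2014-15", 10.0),
    ("Area A", "Fine particles", "Summer 2015", 8.0),
    ("Area A", "Fine particles", "Annual Average 2015", 9.0),
    ("Area B", "Fine particles", "Summer 2015", 12.0),
]


def period_to_year(label):
    """Assign a period label to one year: the first 4-digit number it contains.

    This is an assumed rule, so "Winter 2014-15" is counted under 2014.
    """
    match = re.search(r"\d{4}", label)
    if match is None:
        raise ValueError(f"no year found in period label: {label!r}")
    return int(match.group())


def yearly_averages(rows):
    """Average every value that falls in the same (area, pollutant, year)."""
    grouped = defaultdict(list)
    for area, pollutant, period, value in rows:
        grouped[(area, pollutant, period_to_year(period))].append(value)
    return {key: mean(values) for key, values in grouped.items()}


averages = yearly_averages(rows)
for (area, pollutant, year), value in sorted(averages.items()):
    print(area, pollutant, year, value)
```

A different assignment rule (for example, counting a winter that spans two years under the later year) would only require changing period_to_year.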

Data on Types of Pollutants

This dataset contains information on four pollutants: fine particles, ozone (O3), nitrogen dioxide (NO2), and sulfur dioxide (SO2), measured across approximately 114 different areas in NYC (some of which are sub-areas of others). We will be separating the measurements of each pollutant, as not all of the pollutants have the same information available.

For fine particles and ozone, we have not only the levels over time but also estimates of the number of deaths and emergency room visits (but not hospitalizations) attributed to each pollutant. For nitrogen dioxide (NO2) and sulfur dioxide (SO2), we have only the measured levels over time. In addition, some people may have reason to believe that they are more sensitive to one type of pollutant than another, in which case they may only care about the differences in one type of pollutant.

Data Catalog Info

This project has a collection of graph images that track either the levels of each pollutant over time in each area, or the numbers of emergency room visits and deaths. The images are separated first by pollutant, then by the type of information available, such as levels of the pollutant over time or the rate of emergency room visits per capita over time.

In the case of emergency room visits, the graphs are also separated by whether the rates are for minors or adults. Once you've navigated to the folder that contains the information you want on a specific pollutant, you'll find a graph for every one of the 114 areas where data was available.
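The folder layout described above can be pictured with a small sketch. The folder and file names here (graphs, ozone, er_visits, minors, and so on) are hypothetical and may not match the names used in the actual catalog; only the nesting order follows the description.

```python
from pathlib import PurePosixPath


def graph_path(pollutant, info_type, area, age_group=None):
    """Build the path to one graph: pollutant, then type of information,
    then (for emergency room visits only) age group, then the area.

    All folder names are illustrative assumptions, not the catalog's own.
    """
    parts = [pollutant, info_type]
    if age_group is not None:
        parts.append(age_group)
    return PurePosixPath("graphs", *parts, f"{area}.png")


print(graph_path("ozone", "levels", "Area A"))
print(graph_path("fine_particles", "er_visits", "Area A", age_group="minors"))
```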

A fair warning: the y-axis is not consistent across these graphs, simply due to time constraints, although I plan to update this project in the near future to fix this. Until then, I would advise paying close attention to the tick marks on each graph in order to read the values accurately.
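One possible way to make the y-axis consistent is to compute a single range per measure across all areas and apply it to every graph. The sketch below only illustrates that idea; the function name, the padding value, and the sample numbers are assumptions, and this is not the fix planned for the project.

```python
def shared_ylim(series_by_area, padding=0.05):
    """Return one (low, high) y-range that covers every area's values.

    series_by_area maps an area name to its list of yearly values.
    padding adds a margin, as a fraction of the full range, on both ends.
    """
    values = [v for series in series_by_area.values() for v in series]
    if not values:
        raise ValueError("no values to compute a range from")
    low, high = min(values), max(values)
    margin = (high - low) * padding
    return low - margin, high + margin


# Made-up values for two hypothetical areas.
example = {"Area A": [10.0, 20.0], "Area B": [15.0, 30.0]}
low, high = shared_ylim(example, padding=0.0)
print(low, high)
```

If the graphs are drawn with matplotlib (an assumption), the resulting pair could be passed to Axes.set_ylim(low, high) on each figure so that all areas share the same scale.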

In the future, I would also like to further divide the information based on location within the city, with a different folder for each larger area showing all of the sub-areas contained within it. I would also like to make a simple web app that would allow you to show the different levels of pollution, death rates, etc. over time in multiple areas on a single graph to facilitate easier comparisons.


Link to Project Catalog / GitHub Repo

Link to Original Presentation Slides (meant to be given orally)

CDC Articles on Pollutants
