NYC Air Quality

Posted on Nov 8, 2021

Background and Research Goal

Air pollution can be a major public health issue, especially in large cities where industrial centers are abundant. High levels of pollution can be especially dangerous to people with asthma or other respiratory conditions, and in some cases, it can be dangerous to children. With that in mind, I think it would benefit anyone with respiratory problems to have an accessible source of information on how pollution levels differ between areas of a city, in this case New York. I will be creating a catalog of how different types of pollutants have changed over time in different areas of New York, as well as how many emergency room visits and deaths have been attributed to pollution where the data is available. Hopefully this can help people make more informed choices when looking for housing in New York, and who knows, it's possible that people implementing pollution-reducing initiatives in NYC will be able to consult this catalog for easier access to that information. If you'd like more information about how certain pollutants can be harmful, I link some CDC articles at the end of this post as an introduction.

Notes on the Original Dataset

The original dataset (linked in the sources section), while comprehensive, isn't exactly formatted in a way that allows for easy access by a layperson. The dataset contains a different row for every measurement of a pollutant, whether that be the average parts per billion of ozone in a certain year, or the number of emergency room visits per capita attributed to fine particles from the beginning of winter in one year until the start of winter in the next. In order to get easily interpretable data, I took the closest approximation I could get to a yearly average, or in some cases, the average over a few years. The specifics are long-winded, but if you're interested, a GitHub repo linked below contains the script that makes the provided graphs (in the src folder). While most of the exploratory analysis was done separately, the script is extensively commented, and will walk you through the decisions I made to get each approximate average where there wasn't an easily accessible one.
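To make the idea of an approximate yearly average more concrete, here is a minimal sketch in plain Python. It is not the script from the repo: the row layout (pollutant, area, start date, value), the area names, and the numbers are all made up for illustration, and assigning each measurement to the calendar year of its start date is just one possible approximation.

```python
from collections import defaultdict
from datetime import date
from statistics import mean

# Hypothetical rows: (pollutant, area, measurement start date, value).
# These values are invented for illustration, not taken from the dataset.
ROWS = [
    ("Ozone (O3)", "Area A", date(2014, 6, 1), 30.0),
    ("Ozone (O3)", "Area A", date(2014, 12, 1), 20.0),
    ("Ozone (O3)", "Area A", date(2015, 6, 1), 28.0),
    ("Ozone (O3)", "Area B", date(2014, 6, 1), 32.0),
]


def yearly_averages(rows):
    """Average all measurements that start in the same calendar year.

    Assumption: a seasonal measurement (for example, one winter) is
    counted toward the year in which it starts.
    """
    grouped = defaultdict(list)
    for pollutant, area, start, value in rows:
        grouped[(pollutant, area, start.year)].append(value)
    return {key: mean(values) for key, values in grouped.items()}


averages = yearly_averages(ROWS)
for (pollutant, area, year), value in sorted(averages.items()):
    print(pollutant, area, year, value)
```

Grouping on the start year keeps the logic short, but it does blur measurements that span two calendar years, which is why the result should be read as an approximation rather than a true annual mean.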

This dataset contains information on 4 pollutants: fine particles, ozone (O3), nitrogen dioxide (NO2), and sulfur dioxide (SO2), across approximately 114 different areas in NYC (some of which are sub-areas of others). We will be separating the measurements by pollutant, as not all of the pollutants have the same information. For fine particles and ozone, we not only have the levels over time, but we also have estimates for the number of deaths and emergency room visits (but not hospitalizations) attributed to each pollutant. Nitrogen dioxide and sulfur dioxide only have the levels over time. Separating by pollutant also helps people who have reason to believe that they are more sensitive to one type of pollutant than another, in which case they may only care about the differences in that pollutant.

Catalog Info

This project has a collection of graph images that track either the levels of each pollutant over time in each area, or the numbers of emergency room visits and deaths. The images are first separated by pollutant, then by the type of information available, such as levels of the pollutant over time, or the rate of emergency room visits per capita over time. Emergency room visits are further separated by whether the rates are for minors or adults. Once you've navigated to the folder that contains the information you want on a specific pollutant, you'll find a graph for every one of the 114 areas where data was available.
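The folder layout described above can be pictured with a small helper that builds the path to one graph. This is only a sketch: the folder names, the `.png` extension, the area name, and the `slug` and `graph_path` helpers are hypothetical and may not match how the actual catalog is named.

```python
import re
from pathlib import PurePosixPath


def slug(name):
    """Turn a display name into a lowercase, underscore-separated folder name."""
    return re.sub(r"[^a-z0-9]+", "_", name.lower()).strip("_")


def graph_path(pollutant, measure, area, age_group=None):
    """Build pollutant/measure/[age group]/area.png for a single graph.

    The age-group level is only added when one is given, mirroring the
    extra split between minors and adults for emergency room visits.
    """
    parts = [slug(pollutant), slug(measure)]
    if age_group is not None:
        parts.append(slug(age_group))
    return PurePosixPath(*parts) / f"{slug(area)}.png"


# Example names below are placeholders, not the catalog's real folder names.
levels_graph = graph_path("Sulfur Dioxide", "Levels", "Area A")
er_graph = graph_path("Ozone", "ER visits", "Upper West Side", age_group="Adults")
print(levels_graph)
print(er_graph)
```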

A fair warning: due to time constraints, the y-axis is not constant across all these graphs, although I plan to update this project in the near future to fix this. Until then, I would advise paying close attention to the tick marks on each graph in order to get accurate information. In the future, I would also like to further divide the information based on location within the city, with a different folder for each larger area that shows all of the sub-areas contained within it. I would also like to make a simple web app that would let you plot the levels of pollution, death rates, etc. over time for multiple areas on a single graph to make comparisons easier.
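One possible way to make the y-axis constant is to compute a single range from every area's values for a given pollutant and reuse it on each graph. The sketch below assumes a made-up dictionary of values per area and a hypothetical `shared_y_limits` helper; it is a suggestion, not how the current graphs were produced.

```python
def shared_y_limits(series_by_area, padding=0.05):
    """Return one (low, high) range that fits the values of every area.

    `padding` adds a small margin, as a fraction of the overall range,
    so the lowest and highest points do not sit on the edge of the plot.
    """
    all_values = [v for values in series_by_area.values() for v in values]
    low, high = min(all_values), max(all_values)
    margin = (high - low) * padding
    return low - margin, high + margin


# Invented values for two placeholder areas.
EXAMPLE_SERIES = {
    "Area A": [10.0, 20.0],
    "Area B": [12.0, 30.0],
}

y_low, y_high = shared_y_limits(EXAMPLE_SERIES)
print(y_low, y_high)
```

If the graphs are drawn with matplotlib, the same pair could be passed to `Axes.set_ylim` on every plot for that pollutant so that all areas share one scale.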


Link to Project Catalog / Github Repo

Link to Original Presentation Slides (meant to be given orally)

CDC Articles on Pollutants
