Data Analysis on NYC Air Quality
The skills the author demonstrated here can be learned by taking the Data Science with Machine Learning bootcamp at NYC Data Science Academy.
Background and Research Goal
Air pollution can be a major public health issue, especially in large cities where industrial centers are abundant. Data shows that high levels of pollution can be especially dangerous to people with asthma or other respiratory conditions, and in some cases they can be dangerous to children. With that in mind, I think anyone with respiratory problems would benefit from an accessible source of information on how pollution levels differ between areas of a city, in this case New York.
I will be creating a catalog of how different types of pollutants have changed over time in different areas of New York, as well as how many emergency room visits and deaths have been attributed to pollution where the data is available. Hopefully this can help people make more informed choices when looking for housing in New York, and who knows, it's possible that people implementing pollution-reducing initiatives in NYC will be able to consult this catalog for easier access to that information. If you'd like more information about how certain pollutants can be harmful, I will link some CDC articles later for some introductory information.
Notes on the Original Dataset
The original dataset (linked in the sources section), while comprehensive, isn't formatted in a way that allows for easy access by a layperson. The dataset contains a separate row for every measurement of a pollutant, whether that is the average parts per billion of ozone in a certain year, or the number of emergency room visits per capita attributed to fine particles from the beginning of winter in one year until the start of winter in the next. In order to get easily interpretable data, I took the closest approximation I could get to a yearly average, or in some cases, the average over a few years.
While the specifics are long-winded, if you're interested, a GitHub repo will be linked below that contains the script used to make the provided graphs (in the src folder). While most of the exploratory analysis was done separately, the script is extensively commented and will walk you through the decisions I made to get each approximate average where an easily accessible one wasn't available.
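To make the averaging step above more concrete, here is a minimal sketch of how one-row-per-measurement data could be collapsed into approximate yearly averages. This is not the script from the repo: the field names (`pollutant`, `area`, `period`, `value`), the sample rows, and the assumption that every period label ends in a four-digit year are all hypothetical, chosen only for illustration.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical rows: one measurement per row, as in a long-format table.
# Field names and values are made up for illustration.
rows = [
    {"pollutant": "O3", "area": "Area A", "period": "Summer 2015", "value": 30.0},
    {"pollutant": "O3", "area": "Area A", "period": "Summer 2016", "value": 32.0},
    {"pollutant": "PM2.5", "area": "Area A", "period": "Winter 2015", "value": 10.0},
    {"pollutant": "PM2.5", "area": "Area A", "period": "Summer 2015", "value": 8.0},
]


def year_of(period):
    """Return the year from a period label (assumes it ends in a 4-digit year)."""
    return int(period.split()[-1])


def yearly_averages(rows):
    """Average all measurements that share a pollutant, area, and year."""
    groups = defaultdict(list)
    for row in rows:
        key = (row["pollutant"], row["area"], year_of(row["period"]))
        groups[key].append(row["value"])
    return {key: mean(values) for key, values in groups.items()}


averages = yearly_averages(rows)
```

Periods that span two calendar years (such as a winter-to-winter window) would need their own rule for which year they count toward, which is the kind of decision the commented script documents.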
Data on Types of Pollutants
This dataset contains information on 4 pollutants: fine particles, ozone (O3), nitrogen dioxide (NO2), and sulfur dioxide (SO2), measured across approximately 114 different areas in NYC (some of which are sub-areas of others). I treat each pollutant separately, as not all of the pollutants have the same information available.
For fine particles and ozone, we not only have the levels over time, but also estimates of the number of deaths and emergency room visits (but not hospitalizations) attributed to each pollutant. Nitrogen dioxide and sulfur dioxide only have the measured levels over time. In addition, some people may have reason to believe that they are more sensitive to one type of pollutant than another, in which case they may only care about the differences in that one pollutant.
Data Catalog Info
This project has a collection of graph images that track either the levels of each pollutant over time in each area, or the numbers of emergency room visits and deaths. The images are first separated by pollutant, then by the type of information available, such as levels of the pollutant over time or the rate of emergency room visits per capita over time.
In the case of emergency room visits, it is also separated by whether the rates are for minors or adults. Once you've navigated to the folder that contains the information you want on a specific pollutant, you'll find a collection of graphs for every area out of the 114 where data was available.
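As a rough illustration of the folder layout described above, the sketch below builds the path where a given graph would live. The folder names, the file naming scheme, and the `graph_path` helper are hypothetical; the actual catalog may name things differently.

```python
from pathlib import PurePosixPath


def graph_path(root, pollutant, measure, area, age_group=None):
    """Build a catalog path: root / pollutant / measure / [age group] / area.png.

    All folder and file names here are illustrative assumptions.
    """
    parts = [root, pollutant, measure]
    # Only emergency room visit graphs are split into minors and adults.
    if age_group is not None:
        parts.append(age_group)
    filename = area.lower().replace(" ", "_") + ".png"
    return PurePosixPath(*parts) / filename


levels_graph = graph_path("graphs", "ozone", "levels", "Area A")
er_graph = graph_path("graphs", "ozone", "er_visits", "Area A", age_group="adults")
```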
A fair warning: the y-axis is not constant across all these graphs, simply due to time constraints, although I plan to update this project in the near future to fix this. Until then, I would advise paying close attention to the tick marks on each graph in order to read the values accurately.
In the future, I would also like to further divide the information based on location within the city, with a different folder for each larger area that shows all of the sub-areas contained within it. I would also like to make a simple web app that lets you show the levels of pollution, death rates, etc. over time in multiple areas on a single graph to make comparisons easier.
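One possible way to make the y-axis consistent is to compute a single range from every area's values for a given pollutant and measure, then apply that range to each graph. The sketch below shows only the range calculation; the `shared_ylim` helper, the `padding` parameter, and the sample values are assumptions for illustration, not part of the existing script.

```python
def shared_ylim(series_by_area, padding=0.05):
    """Return one (low, high) y-axis range that covers every area's values.

    `padding` adds a fraction of the full range above and below so the
    lines do not touch the edges of the plot.
    """
    values = [v for series in series_by_area.values() for v in series]
    low, high = min(values), max(values)
    margin = (high - low) * padding
    return low - margin, high + margin


# Made-up yearly levels for two hypothetical areas.
levels = {
    "Area A": [10.0, 12.5, 20.0],
    "Area B": [0.0, 3.0, 5.0],
}
ylim = shared_ylim(levels)
```

If the graphs are drawn with matplotlib, the resulting pair could be passed to `Axes.set_ylim` on every graph in the same folder so that all of them share a scale.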
Link to Project Catalog / Github Repo
Link to Original Presentation Slides (meant to be given orally)
CDC Articles on Pollutants