Determinants of asthma throughout NYC

Posted on Dec 25, 2022

Asthma in NYC

A community health profile identified asthma as one of the leading causes of avoidable hospitalizations in the Mott Haven neighborhood in the Bronx. The neighborhood has the highest child asthma hospitalization rate and the third-highest rate of avoidable adult asthma hospitalizations in the city. The profile suggested a number of preventable possible causes/triggers, including:

  • Air quality (specifically, fine particulate matter of a certain size, denoted PM2.5)
  • Housing-quality related exposure to triggers, like cockroaches, mice, and secondhand smoke

These determinants were supported by some descriptive statistics about particulate matter and housing quality. The goals of this research were to see if these determinants generalized across the city, and to verify or exclude determinants using statistical tests. We found:

  • Asthma rates vary significantly by location across the city, and they are elevated near Mott Haven.
  • Moreover, high asthma rates are localized to a few "hotspots", namely: South Bronx/Harlem, Lower Manhattan, North Brooklyn, North Staten Island, and the Rockaways.
  • Determinants are not uniform across hotspots.
  • Even in areas in the middle 50% of O3 (ozone) density, O3-related asthma hospitalizations are highly elevated in areas where the asthma rate is high.
  • The South Bronx/Harlem and North Brooklyn hotspots fit this pattern. The Rockaways are an outlier (high O3 and asthma rates without elevated O3-related asthma hospitalizations), and O3 does not account for the elevated hospitalizations and diagnoses in Lower Manhattan.
  • On the other hand, the Lower Manhattan hotspot has significantly elevated PM2.5 levels.
  • Locations with more tobacco availability per capita do have slightly elevated hospitalization rates among beneficiaries diagnosed with asthma. Across all of NYC, smelling secondhand smoke regularly within the home does correspond with a significant increase in asthma diagnoses.
  • Rat sightings, a proxy for housing quality, are not independent of asthma diagnoses across the city; increased sightings are associated with increased diagnoses.
  • Asthma diagnosis rates are significantly increased in high-poverty neighborhoods across the city.

While some of these results corroborate Mott Haven's community health profile, they show that the determinants are regionalized within the city. Hence, public health initiatives targeting high asthma rates require a regionalized approach, as determinants which may be nearly irrelevant in one area may be highly relevant in another.

Asthma rates vary significantly by location

In order to determine the distribution of asthma diagnoses throughout the city, we looked at publicly available data on Medicaid beneficiaries. We computed the percentage of beneficiaries who had received an asthma diagnosis within each zip code. First, a χ2 test confirmed that asthma rates were not independent of zip code (p < 2.2 × 10^-16).
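To make the test of independence concrete, here is a minimal sketch in plain Python. The counts are made up for illustration and are not the Medicaid data, and the helper names `chi2_sf` and `chi2_independence` are hypothetical. Each row is a zip code and the two columns are beneficiaries with and without an asthma diagnosis.

```python
import math

def chi2_sf(x, dof):
    """Upper-tail probability P(X >= x) for a chi-square variable with integer dof."""
    if x <= 0:
        return 1.0
    half = x / 2.0
    if dof % 2 == 0:
        # Even dof: exp(-x/2) * sum over k < dof/2 of (x/2)^k / k!
        term, total = 1.0, 1.0
        for k in range(1, dof // 2):
            term *= half / k
            total += term
        return math.exp(-half) * total
    # Odd dof: start from the dof = 1 case (erfc) and add one term per extra 2 dof.
    total = math.erfc(math.sqrt(half))
    term = math.sqrt(half) / math.gamma(1.5)
    for j in range(1, (dof - 1) // 2 + 1):
        total += math.exp(-half) * term
        term *= half / (j + 0.5)
    return total

def chi2_independence(table):
    """Pearson chi-square test of independence for a rows x columns table of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / grand
            stat += (observed - expected) ** 2 / expected
    dof = (len(table) - 1) * (len(table[0]) - 1)
    return stat, dof, chi2_sf(stat, dof)

# Hypothetical counts: one row per zip code, columns = (diagnosed, not diagnosed).
counts = [
    [300, 1700],
    [150, 1850],
    [120, 1880],
]
stat, dof, p_value = chi2_independence(counts)
print(f"chi2 = {stat:.1f}, dof = {dof}, p = {p_value:.3g}")
```

With one row per zip code, the degrees of freedom equal the number of zip codes minus one, so a citywide table can produce a very small p-value even when most individual zip codes sit close to the average.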

Asthma diagnosis rates across the city are not independent of zip code, with some areas significantly higher or lower than others. The highest rate is in the Mott Haven area, corroborating the results of the community health profile.

The diagnosis rates corroborate the inferences drawn from hospitalizations in the community health profile: the highest rates are concentrated in the Mott Haven area. Moreover, many of the areas with elevated rates are contiguous. To identify these "hotspots", each rate was compared to the citywide average with a binomial test, and those locations where the percentage was significantly elevated (p<0.05) were identified. The zip codes with significantly elevated diagnosis rates are shown below.
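The hotspot screening step can be sketched as follows. This is an illustration only: the counts are invented, the zip labels are placeholders, and we assume a one-sided (upper-tail) exact binomial test, since the goal is to flag rates that are elevated relative to the citywide average.

```python
import math

def binom_upper_tail(k, n, p):
    """Exact P(X >= k) for X ~ Binomial(n, p), summed in log space for stability."""
    log_p, log_q = math.log(p), math.log1p(-p)
    total = 0.0
    for i in range(k, n + 1):
        log_pmf = (math.lgamma(n + 1) - math.lgamma(i + 1) - math.lgamma(n - i + 1)
                   + i * log_p + (n - i) * log_q)
        total += math.exp(log_pmf)
    return min(total, 1.0)

# Hypothetical per-zip counts: (beneficiaries with a diagnosis, all beneficiaries).
zip_counts = {
    "zip_A": (300, 2000),
    "zip_B": (150, 2000),
    "zip_C": (120, 2000),
}

# Citywide rate = all diagnosed beneficiaries / all beneficiaries.
diagnosed = sum(d for d, _ in zip_counts.values())
beneficiaries = sum(n for _, n in zip_counts.values())
citywide_rate = diagnosed / beneficiaries

# Flag zip codes whose rate is significantly above the citywide rate.
hotspots = [z for z, (d, n) in zip_counts.items()
            if binom_upper_tail(d, n, citywide_rate) < 0.05]
print(hotspots)
```

Note that this sketch applies no correction for testing many zip codes at once; with a few hundred zip codes, some would be flagged at p < 0.05 by chance alone.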

Many of the zip codes with significantly elevated asthma diagnosis rates are contiguous. The contiguous areas with elevated asthma diagnosis rates cluster into a few major "hotspots". We will loosely refer to these hotspots as Bronx/Harlem, North Brooklyn, Lower Manhattan, the Rockaways, and (North) Staten Island. We can also compare these to the hospitalization rates. For each zip, we look at the proportion of beneficiaries with asthma who have at least one asthma-related ER visit in a year, averaged by year.

We see that in many of the same areas where the proportion of beneficiaries with asthma diagnoses is increased, so is the rate of ER visits among those diagnosed. That is, not only are diagnoses elevated in those areas, but the rate of hospitalization is higher among those with a diagnosis. Numerically, this appears as a moderate correlation (r = 0.59) between the two rates.
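For readers who want to reproduce this kind of comparison, a Pearson correlation between two per-zip rates can be computed as in the sketch below. The two lists are hypothetical values chosen for illustration, not the rates behind the r reported above.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

# Hypothetical per-zip values, aligned by position (same zip code at each index).
diagnosis_rate = [0.06, 0.08, 0.10, 0.12, 0.15]  # share of beneficiaries diagnosed
er_visit_rate = [0.20, 0.18, 0.25, 0.24, 0.30]   # share of diagnosed with an ER visit

r = pearson_r(diagnosis_rate, er_visit_rate)
print(f"r = {r:.2f}")
```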

Both statistics are possibly biased by the data source: publicly available Medicaid data from hospitals. While it is possible that the availability of health services from these locations biases the diagnosis and hospitalization rates, it does not appear to fully explain the variance in these statistics across the city.

Asthma and air quality

One determinant of asthma hospitalizations suggested by the community health profile was air quality, specifically fine particulate matter (PM2.5). We looked at PM2.5 density from a dataset from a similar time period with measurements taken from the winter. Winter is when PM2.5 concentration is worst, hence giving the worst-case scenario for asthma triggers. The distribution of PM2.5 density across geographic regions is bimodal, with significantly elevated levels near the Lower Manhattan hotspot.

As the asthma ER visit rate among beneficiaries with asthma is elevated in this area, we cannot rule out the possibility that PM2.5 is a determinant in this area. However, while PM2.5 density is elevated in Mott Haven, as noted by the community health profile, we see that many areas with comparable air quality are not associated with elevated diagnosis or hospitalization rates. Moreover, many of the hotspots - namely, North Brooklyn, the Rockaways, and Staten Island - do not have elevated PM2.5 density. Overall, there is a low correlation between PM2.5 density and elevated asthma-related ER visit rates among beneficiaries with asthma (r = 0.33). These together suggest that only extremely elevated PM2.5 density may be associated with increased risk. Measures of other non-O3 pollutants, like benzene and emissions due to boilers, pattern similarly.

Increased PM2.5 density is not universally associated with higher ER visit rates among beneficiaries with asthma

O3 density is also uneven across the city. As there is data available directly on O3-attributable asthma, there is evidence that even moderately elevated O3 levels lead to a detectable increase in associated asthma hospitalizations.

The O3-attributable asthma-related hospitalization rates out of the whole population for both children and adults are shown above. For both age groups, the rates are clearly elevated for both the South Bronx/Harlem and North Brooklyn hotspots. O3 itself is approximately normally distributed throughout the city, with the geographic distribution also provided below.

Notably, the Lower Manhattan hotspot has relatively low O3 density levels, and hence the O3-attributable asthma hospitalizations are low (despite overall having a higher asthma hospitalization rate). While it is a logical necessity that an area with high O3-attributable asthma hospitalizations must have both (a) a larger population with an asthma diagnosis, and (b) elevated O3 density, it is somewhat surprising that O3 levels do not need to be significantly elevated to detect the effects.

Areas with a low O3 density or low diagnosis rates have low O3-attributable asthma hospitalization rates, as is logically expected. However, we can see that O3 density does not have to be significantly elevated for it to begin elevating O3-attributable hospitalizations in areas with elevated asthma diagnosis rates. Neighborhoods between the red dashed lines are within the middle 50% of O3 density levels, yet even moderately elevated diagnosis rates have significantly higher O3-attributable asthma-related hospitalizations, with little distinction between neighborhoods with similar asthma rates on either side of this middle 50%.
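The "middle 50%" comparison can be sketched as below. All values are hypothetical and exist only to show the mechanics: neighborhoods are split into bands at the first and third quartiles of O3 density, and the O3-attributable hospitalization rate is averaged per band among neighborhoods with an elevated diagnosis rate. The `HIGH_DIAGNOSIS` cutoff is an assumption for the sketch, not a threshold used in the analysis.

```python
import statistics
from collections import defaultdict

# Hypothetical neighborhoods:
# (name, O3 density, asthma diagnosis rate, O3-attributable hospitalizations per 100k)
neighborhoods = [
    ("n1", 26.0, 0.12, 4.0),
    ("n2", 27.0, 0.05, 2.0),
    ("n3", 28.0, 0.13, 11.0),
    ("n4", 29.0, 0.06, 3.0),
    ("n5", 30.0, 0.14, 12.0),
    ("n6", 31.0, 0.05, 3.5),
    ("n7", 32.0, 0.13, 12.5),
    ("n8", 33.0, 0.06, 4.0),
]
HIGH_DIAGNOSIS = 0.10  # assumed cutoff for an "elevated" diagnosis rate

# Quartile cut points of O3 density; the middle 50% lies between q1 and q3.
q1, _median, q3 = statistics.quantiles([n[1] for n in neighborhoods], n=4)

def o3_band(o3_level):
    """Classify an O3 density as below, within, or above the middle 50%."""
    if o3_level < q1:
        return "below"
    if o3_level > q3:
        return "above"
    return "middle"

# Average the attributable rate per O3 band, restricted to high-diagnosis areas.
by_band = defaultdict(list)
for name, o3_level, diag_rate, attributable in neighborhoods:
    if diag_rate >= HIGH_DIAGNOSIS:
        by_band[o3_band(o3_level)].append(attributable)
band_means = {band: statistics.mean(values) for band, values in by_band.items()}
print(band_means)
```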

Notably, neighborhoods with comparable asthma diagnosis rates, but which are below the middle 50% of the O3 distribution, have nearly 1/3 of the O3-attributable hospitalizations. Also notable is one outlying neighborhood, with highly elevated O3, an elevated diagnosis rate, but low O3-attributable asthma hospitalization rate. This outlier corresponds to the Rockaways hotspot. It is the only hotspot with notably elevated O3 levels which does not experience elevated hospitalizations attributable to it. Assuming that the attributions are correct, this indicates that O3 may be a significant contributing factor in two out of the three hotspots where it is relevant, with the third deserving further research.

A final, less direct, comparison was made based on zoning. Other research into elevated asthma diagnosis and hospitalization rates in the Bronx has identified industrial zones as possible determinants. While some hotspots, including the one containing Mott Haven, are near industrial zones, a more robust analysis based on GIS and geographic inference is required.

Summarizing air quality

While PM2.5 density was suggested as a possible determinant, it overall has a low correlation with elevated hospitalization rates among beneficiaries with an asthma diagnosis. Many areas with elevated PM2.5 density do not exhibit increased hospitalization rates, and some areas with elevated hospitalization rates do not exhibit increased PM2.5 density. Extremely high PM2.5 density and asthma-related hospitalizations do overlap in the Lower Manhattan hotspot, however. On the other hand, even areas within the middle 50% of O3 density distribution in the city have significantly elevated O3-attributable asthma hospitalizations in areas with elevated asthma diagnosis rates, showing that the effect of O3 is significant even with only moderately elevated O3 levels. This pattern holds for two out of the three hotspots where O3 is prevalent - South Bronx/Harlem and North Brooklyn - with the Rockaways being an exception.

Tobacco use and asthma rates

Another determinant of asthma diagnosis and hospitalization rates suggested by the community health profile was secondhand smoke. While location data from the Community Health Survey (CHS) on secondhand smoke was not available to the public, we can get a proxy for regional smoking habits by looking at the distribution of tobacco retailers. The distribution of tobacco retailers per capita is roughly log-normal. This is a very noisy signal: the correlation between log(retailers/capita) and hospitalization rates is weak (r = 0.30).

While location data from the CHS was not publicly available, and hence could not be compared with the location-based hospital data, the survey results internally corroborate indoor secondhand smoke as a determinant. Specifically, self-reported asthma diagnoses and smelling secondhand smoke indoors were not independent under an age-balanced χ2 test (p < 0.0111). Those who responded "Daily", "Weekly", or "Monthly" to the question "How often do you smell cigarettes in your apartment from outside?" were significantly more likely to have been diagnosed with asthma.
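The exact procedure behind the "age-balanced" test is not spelled out here. One common way to test for association while controlling for age is the Cochran-Mantel-Haenszel (CMH) test, which pools 2x2 tables across age strata. The sketch below shows that technique as an assumption, not as the method actually used, and the counts are invented.

```python
import math

def cmh_test(strata):
    """Cochran-Mantel-Haenszel chi-square (1 dof, continuity corrected).

    Each stratum is a 2x2 table ((a, b), (c, d)) with
    rows = smells smoke (yes, no) and columns = asthma diagnosis (yes, no).
    """
    diff_sum, var_sum = 0.0, 0.0
    for (a, b), (c, d) in strata:
        n = a + b + c + d
        row1, row2 = a + b, c + d
        col1, col2 = a + c, b + d
        diff_sum += a - row1 * col1 / n  # observed minus expected in the top-left cell
        var_sum += row1 * row2 * col1 * col2 / (n * n * (n - 1))
    stat = max(abs(diff_sum) - 0.5, 0.0) ** 2 / var_sum
    p_value = math.erfc(math.sqrt(stat / 2))  # chi-square upper tail with 1 dof
    return stat, p_value

# Hypothetical survey counts, one 2x2 table per age group.
age_strata = [
    ((30, 170), (60, 740)),  # younger respondents
    ((25, 125), (50, 550)),  # older respondents
]
stat, p_value = cmh_test(age_strata)
print(f"CMH chi2 = {stat:.2f}, p = {p_value:.2g}")
```

Stratifying this way keeps a difference in age mix between exposed and unexposed respondents from being mistaken for an effect of secondhand smoke.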

While both of these are proxies for the essential statistics, they do support the claims of the community health profile. As exposure to triggers may be more likely to cause symptoms, it may increase the number of people who seek diagnoses, and hence be reflected in the diagnosis rates.

Housing quality and poverty levels

Beyond secondhand smoke, other indicators of housing quality such as rat sightings were also flagged as possible determinants by the community health profile. Again, location data from the CHS was not publicly available, but rat sightings were also not independent of self-reported asthma rates, as determined by an age-balanced χ2 test (p < 0.0002). Respondents who had seen mice in the area around their building were more likely to have been diagnosed with asthma.

To get a sense of the geographic distribution of housing quality, we looked at the proportion of rat inspections which ended in a result associated with active rat activity. While the available data was sparse, with uneven measurements across geographic areas and time, a map showing the results of such inspections from a similar time period indicates both the South Bronx/Harlem and North Brooklyn hotspots as areas with housing quality issues.

As many areas from the relevant time interval did not have rodent inspection data, reliable statistical tests could not be performed. Additionally, the sampling process is biased, as areas where an inspection has been requested are more likely to have rodent activity. As a result, areas where data is available at all are likely to have highly inflated rates. However, the pattern of available data and rates both suggest that housing quality may be an issue in these two hotspots.

Finally, socioeconomic status is also a potential determinant for many health issues, as it is related to housing quality and other triggers. The probability that a respondent in the CHS reported an asthma diagnosis was significantly dependent on the estimated poverty level of the respondent's neighborhood. Respondents who were estimated to be in "high" or "very high" poverty group areas were significantly more likely to report an asthma diagnosis across the city.

However, it is likely that this determinant is only predictive when taken in context with other factors. While some of the lowest-income areas are within hotspots - specifically, the two hotspots where housing quality may be a major determinant (South Bronx/Harlem and North Brooklyn) - average household income is overall not a strong predictor of asthma diagnoses. There is a negligible negative correlation between income and asthma diagnosis rate (r = -0.09), but so many areas are in the lower part of the citywide income distribution that it is not a strong predictor. The income distribution throughout the city is given below, with the caveat that the available data was from a different time period from the asthma rate data (using data generated from the 2020 Census).


This research sought to verify, generalize, and rule out certain determinants of asthma diagnosis and hospitalization rates across NYC, building on observations in the Mott Haven community health profile. We showed that these rates are in fact dependent on location, with rates varying significantly between zip codes. Moreover, areas with elevated rates tend to cluster into one of a few hotspots.

However, the determinants associated with these rates are not uniform. While fine particulate matter density has been suggested as a determinant, only extremely high density seems to have an effect on hospitalizations, as in the Lower Manhattan hotspot. Elsewhere, areas with comparable particulate matter density have high variance in their hospitalization rate.

On the other hand, even mildly elevated O3 levels, while not an issue for the Lower Manhattan hotspot, do affect the South Bronx/Harlem and North Brooklyn hotspots, with the Rockaways being the only exception to the pattern. These two hotspots also have housing quality as a likely determinant, possibly related to the income levels in those areas. While asthma rates are likely not independent from the prevalence of secondhand smoke, it is not apparent that tobacco availability is a strong determinant, and public health initiatives should likely focus on other issues, like air and housing quality, instead.

As a public health research project using publicly available data, there were many restrictions on the inferences we could draw definitively. For example, we frequently had to compare data measured over different time periods using different geographic boundaries, as a wide range of sources were used (Medicaid, UHF, air quality, rodent inspection, census, and retail data, to name a few). Time windows were aligned as much as possible, and summary statistics were recomputed to be comparable across various geographic boundaries, using the best information available to make inferences about the geographic region each datapoint belonged to. However, some misalignment was unavoidable.
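As an illustration of recomputing a statistic across geographic boundaries, the sketch below rolls zip-level counts up to larger neighborhoods through a crosswalk. The crosswalk, labels, and counts are all hypothetical placeholders. The point is to sum numerators and denominators before dividing, rather than averaging zip-level rates, so that larger zip codes carry proportionally more weight.

```python
from collections import defaultdict

# Hypothetical crosswalk from zip code to a larger neighborhood unit.
zip_to_neighborhood = {
    "zip_A": "nbhd_1",
    "zip_B": "nbhd_1",
    "zip_C": "nbhd_2",
}

# Hypothetical per-zip counts: (beneficiaries with a diagnosis, all beneficiaries).
zip_counts = {
    "zip_A": (300, 2000),
    "zip_B": (150, 2000),
    "zip_C": (120, 2000),
}

# Accumulate [diagnosed, beneficiaries] per neighborhood.
totals = defaultdict(lambda: [0, 0])
for zip_code, (diagnosed, beneficiaries) in zip_counts.items():
    nbhd = zip_to_neighborhood[zip_code]
    totals[nbhd][0] += diagnosed
    totals[nbhd][1] += beneficiaries

# Recompute the rate at the neighborhood level from the pooled counts.
neighborhood_rate = {nbhd: d / b for nbhd, (d, b) in totals.items()}
print(neighborhood_rate)
```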

These misalignments, combined with the fact that the medical data sources were mostly drawn only from Medicaid beneficiaries, could have introduced bias into the analysis. In order to gain a more robust understanding of these determinants, more sophisticated geographic analysis methods could be used, such as taking into account the (estimated) distances between the geographic origins of various data, or interpolating data to the requisite timeframes. However, the available data frequently corroborated the existence of certain asthma hotspots in NYC and the relevance of certain determinants, which we believe will help limit the scope of future research and outreach in this area.

About Author

Zach Stone

I am a data scientist with a background in linguistics research and math. I love to make it easier to analyze and draw insights from complex patterns using a combination of research, code, and modeling.