Do Demographic Patterns Influence Which Neighborhoods Are Gentrified?

Charles Leung
Posted on Aug 8, 2016


Back in the 1960s, Williamsburg, Brooklyn was a manufacturing hub that lacked glamour but promised work to thousands of lower-income residents. The neighborhood attracted large numbers of Latin American immigrants and the development of public housing projects. However, with the decline of manufacturing in the 1990s, many residents were left unemployed, which compounded the social ills of the time: poverty, racism, poor health care, and inadequate education. With its walls and roads in decay, the neighborhood was a far cry from today’s bustling hipster and artist haven.

However, an unfortunate side effect of the massive influx of educated millennials is soaring rent. Growing demand has started to displace poorer, underprivileged residents; many have been forced out to the suburbs, even farther from the city where they earn the wages that carry them from paycheck to paycheck. These are the two sides of the urban development coin, and they raise a question: why did Williamsburg, with Myrtle Avenue nicknamed “Murder Avenue”, suddenly boom when Flatbush, Canarsie, and Bedford-Stuyvesant did not? What made real estate developers start to see the area as prime for gentrification over the past decade? I sought to answer these questions by looking at demographic data for New York communities from 2005 to 2014.


Source Data

The American Community Survey (ACS) provides 1-year, 3-year, and 5-year estimates for the years between decennial censuses. The level of geography I used is the PUMA (Public Use Microdata Area). PUMAs are identified by 5-digit codes and describe population zones of around 100,000 people. They were first developed for use in the Census and the ACS. A cooperative program between the Census Bureau and the states allows local input on their boundaries.


The source file contains individual responses, which need to be summarized:


I summarized the responses by PUMA, counting the categorical values and taking the median of the numeric values, to create the working data set:
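The summarization step described above can be sketched roughly as follows. This is a minimal illustration in plain Python, not the original data formatting code: the field names (`puma`, `year`, `education`, `income`) and all values are hypothetical, and the sketch ignores survey weights, which a real ACS analysis would normally apply.

```python
from collections import Counter, defaultdict
from statistics import median

# Hypothetical person-level records; field names and values are made up
# for illustration and are not the actual ACS column names.
responses = [
    {"puma": "04008", "year": 2005, "education": "bachelors", "income": 52000},
    {"puma": "04008", "year": 2005, "education": "high_school", "income": 31000},
    {"puma": "04008", "year": 2005, "education": "bachelors", "income": 78000},
    {"puma": "03802", "year": 2005, "education": "high_school", "income": 28000},
    {"puma": "03802", "year": 2005, "education": "bachelors", "income": 45000},
]


def summarize_by_puma(records, categorical, numeric):
    """Count categorical values and take the numeric median per (PUMA, year)."""
    groups = defaultdict(list)
    for rec in records:
        groups[(rec["puma"], rec["year"])].append(rec)

    summary = {}
    for key, recs in groups.items():
        summary[key] = {
            "n": len(recs),
            "counts": Counter(r[categorical] for r in recs),
            "median": median(r[numeric] for r in recs),
        }
    return summary


summary = summarize_by_puma(responses, "education", "income")
for key in sorted(summary):
    print(key, summary[key])
```

Grouping on the (PUMA, year) pair keeps one summary row per zone per survey year, which is the shape a time series comparison needs.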

Data Formatting Code

Percentages for Mapping
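For mapping, raw counts are usually converted to shares so that PUMAs of different sizes can be compared. The following is a small sketch of that conversion under assumed category names and made-up counts; it is not the original code.

```python
def to_percentages(counts):
    """Convert a {category: count} mapping into {category: percent of total}."""
    total = sum(counts.values())
    if total == 0:
        # Avoid dividing by zero for an empty PUMA/year group.
        return {category: 0.0 for category in counts}
    return {
        category: round(100.0 * count / total, 1)
        for category, count in counts.items()
    }


# Hypothetical education counts for one PUMA in one year.
education_counts = {
    "bachelors_or_higher": 320,
    "some_college": 180,
    "high_school_or_less": 500,
}

shares = to_percentages(education_counts)
print(shares)
```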


Page 1: PUMA Dictionary

The first part of my Shiny application is a table that shows which neighborhoods each PUMA represents. If you’re interested in a certain neighborhood, you can search for it, or find it manually by sorting the table and paging through it.



Page 2: Comparison Chart

The second part of my application is a dynamic line chart that lets you select the PUMAs of interest and the variables you would like to compare. For example, the chart below shows the change in household income for PUMAs 04008, 03805, and 03802:
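The data preparation behind such a comparison chart might look like the sketch below: filter the working set to the selected PUMAs and return one time-ordered series per PUMA for the chosen variable. The row layout, the `household_income` field name, and the income figures are assumptions for illustration; the application itself was written in Shiny.

```python
# Hypothetical working-set rows, one per PUMA and year (values are made up).
rows = [
    {"puma": "04008", "year": 2006, "household_income": 41000},
    {"puma": "04008", "year": 2005, "household_income": 40000},
    {"puma": "03805", "year": 2005, "household_income": 90000},
    {"puma": "03805", "year": 2006, "household_income": 93000},
    {"puma": "03802", "year": 2005, "household_income": 35000},
]


def select_series(rows, pumas, variable):
    """Return {puma: [(year, value), ...]} sorted by year for the chosen PUMAs."""
    series = {puma: [] for puma in pumas}
    for row in rows:
        if row["puma"] in series:
            series[row["puma"]].append((row["year"], row[variable]))
    for points in series.values():
        points.sort()  # tuples sort by year first
    return series


chart_data = select_series(rows, ["04008", "03805"], "household_income")
print(chart_data)
```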



Page 3: Variable Map

Sometimes it is useful to see the changes geographically; perhaps the demographics of nearby PUMAs or neighborhoods have a domino effect that sets off gentrification. The third part of my application lets you visualize these changes with a map overlay, a year slider, and a variable selector.




The following code was used to create the Shiny application:




These two scripts use a JavaScript library (NVD3):





If we compare neighborhoods that were gentrified with those that were not, we may be able to identify potential factors. I first sorted the PUMA listing by the difference in housing growth between 2005 and 2014; this shows which neighborhoods developed the most and which lagged behind.
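The ranking step can be illustrated with a short sketch: compute each PUMA's change between the first and last year, sort in descending order, and take the two ends of the list as comparison groups. The PUMA labels and metric values here are placeholders, not figures from the analysis.

```python
def rank_by_change(metric, start=2005, end=2014):
    """Sort PUMAs by (value at end - value at start), largest change first."""
    changes = {}
    for puma, by_year in metric.items():
        # Skip PUMAs that lack a value for either endpoint year.
        if start in by_year and end in by_year:
            changes[puma] = by_year[end] - by_year[start]
    return sorted(changes.items(), key=lambda item: item[1], reverse=True)


# Hypothetical metric per PUMA and year; labels and numbers are made up.
metric = {
    "A": {2005: 100, 2014: 180},
    "B": {2005: 120, 2014: 125},
    "C": {2005: 90, 2014: 140},
    "D": {2005: 110},  # no 2014 value, so it is left out of the ranking
}

ranked = rank_by_change(metric)
top_group = ranked[:1]      # most growth
bottom_group = ranked[-1:]  # least growth
print(ranked)
```

With the full PUMA listing, slicing the first and last ten entries would give the two groups compared in the rest of the post.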



Some of the top ten are what we would expect: Williamsburg, Cobble Hill, Long Island City, and other neighborhoods with very noticeable revitalization. The bottom tier consists mostly of poorer neighborhoods in the Bronx and Brooklyn, such as Bronxdale and East New York. I’ve colored the two groups, Gentrified and Unable to Gentrify, blue and red respectively.


Race and Household Type


Time series plots may show whether there is a tipping point at which a neighborhood becomes gentrified. Gentrification is traditionally defined by the influx of educated white millennials (ages from the early 20s to the mid-30s). I plotted the growth of white populations and married households within the two groups (Gentrified in green, Unable to Gentrify in purple). All of the gentrified neighborhoods have had a white population of at least 40% throughout the period. However, there are also poorer neighborhoods with a large white population (see Brighton Beach). Likewise, nothing notable can be seen in the married household demographic. Both results are inconclusive.




The above graph plots the share of the population with a bachelor’s degree or higher. Education shows the clearest distinction. All of the gentrified neighborhoods (blue) have at least 30% of residents with a college degree. Williamsburg (lowest blue line) was at the bottom in 2008, but its share started increasing dramatically in 2010. This suggests that education may be a factor in neighborhood development.
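One simple way to look for a tipping point in a series like this is to find the first year in which a neighborhood's share reaches a chosen threshold. The sketch below does that for a 30% threshold; the neighborhood labels and percentages are invented for illustration and are not read off the chart.

```python
def first_year_at_or_above(share_by_year, threshold):
    """Return the earliest year whose share meets the threshold, or None."""
    for year in sorted(share_by_year):
        if share_by_year[year] >= threshold:
            return year
    return None


# Hypothetical shares (percent with a bachelor's degree or higher).
bachelors_share = {
    "gentrified_example": {2008: 24.0, 2009: 26.5, 2010: 31.0, 2011: 35.5},
    "not_gentrified_example": {2008: 12.0, 2009: 12.5, 2010: 13.0, 2011: 13.5},
}

crossings = {
    name: first_year_at_or_above(series, 30.0)
    for name, series in bachelors_share.items()
}
print(crossings)
```

A threshold crossing is only a descriptive marker; it does not by itself show that education caused the change.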

If we do find that education is the main factor that attracts developers, then funding for neighborhood education is of utmost importance. To preserve the local community, the city may also have to find ways to improve local education and to adopt housing strategies that do not displace longtime residents. Whether this is the case remains to be seen; further sociological and urban development studies will be needed. This is an issue we have yet to solve: the reverse urban sprawl of the educated and wealthy moving into cities while the underprivileged move out to the suburbs. By moving locals farther from their workplaces and community programs, gentrification raises the barriers to equality. Current government strategies include rent-controlled housing and, in the future, subsidized technical education, both of which sound excellent for protecting the poor. New York City is and has always been a melting pot; I would love to see it develop symbiotically, with locals and newcomers hand in hand.

About Author

Charles Leung

During his past three years in the manufacturing industry, Charles has discovered and developed his passion for big data – not only to solve quality and production issues but also to create tools that automated and optimized steelmaking...
