NYC Real Estate Web Scraping Project

DongHwi Kim
Posted on Oct 21, 2019

Introduction

  • Since the housing bubble that occurred in 2008, real estate has been a more popular topic than ever, whether the attention is positive or negative.
  • This project looks at the current housing market of NYC and its surrounding areas. It takes into account the different factors that might cause price differences between the areas and tries to assess the current state of the market.
  • The data for this project came from Realtor.com.
  • Both Scrapy and Selenium were used to scrape it.

Question

  • What are the average housing prices in and around NYC?
  • Is the real estate market of NYC a good investment?
  • Can the upcoming generation (millennials) afford to purchase real estate given these factors, and if not, what income would be needed to make the purchase feasible?

To Answer the Question

To understand the real estate market of NYC, I used both Scrapy and Selenium to scrape Realtor.com.

I used Scrapy first because it was the fastest scraper, but ran into problems when Realtor.com detected that it was a bot and started banning the IP addresses. To get around this I used a proxy pool to rotate through a list of IP addresses. But because the sample size was so large, all of the IP addresses ended up being banned, so I resorted to Selenium.
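
The proxy rotation described above can be sketched roughly as follows. This is not the project's code: the class name, the proxy URLs, and the ban-handling method are hypothetical, and the only Scrapy convention relied on is that a downloader middleware can set `request.meta["proxy"]` inside `process_request`. A stand-in request object is used so the sketch runs without Scrapy installed.

```python
from types import SimpleNamespace


class RotatingProxyMiddleware:
    """Round-robin proxy rotation, shaped like a Scrapy downloader middleware.

    Hypothetical sketch: the names and proxy URLs are illustrative only.
    """

    def __init__(self, proxies):
        self.alive = list(proxies)  # proxies that have not been banned yet
        self._next = 0              # counter used for round-robin selection

    def next_proxy(self):
        if not self.alive:
            # The situation described in the post: the whole pool is banned.
            raise RuntimeError("every proxy in the pool has been banned")
        proxy = self.alive[self._next % len(self.alive)]
        self._next += 1
        return proxy

    def mark_banned(self, proxy):
        """Drop a proxy once the site starts rejecting it."""
        if proxy in self.alive:
            self.alive.remove(proxy)

    def process_request(self, request, spider=None):
        # Scrapy sends a request through the proxy named in meta["proxy"].
        request.meta["proxy"] = self.next_proxy()
        return None  # None tells Scrapy to continue handling the request


# Stand-in for scrapy.Request so the sketch runs on its own.
middleware = RotatingProxyMiddleware(["http://10.0.0.1:8000", "http://10.0.0.2:8000"])
request = SimpleNamespace(meta={})
middleware.process_request(request)
print(request.meta["proxy"])
```

Once `mark_banned` has removed every address, `next_proxy` raises, which is the point at which a slower browser-driven approach such as Selenium becomes the fallback.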

To answer the questions, I scraped the houses for sale in Manhattan, Brooklyn, Queens, and Jersey City, collecting the variables property type, bedrooms, bathrooms, and sqft.
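
One way to hold the scraped variables is a small record type plus a helper that turns scraped strings into numbers. The sketch below is an assumption about how such data might be structured, not the project's actual schema: the `Listing` fields, the `parse_number` helper, and the example strings (such as `"$750,000"`) are hypothetical, and a price field is included only because the later analysis relies on prices.

```python
import re
from dataclasses import dataclass
from typing import Optional


def parse_number(text: str) -> Optional[float]:
    """Pull the first number out of strings such as '$1,150,000' or '1,200 sqft'."""
    match = re.search(r"\d[\d,]*(?:\.\d+)?", text)
    if match is None:
        return None  # e.g. a listing that shows 'N/A'
    return float(match.group().replace(",", ""))


@dataclass
class Listing:
    area: str            # Manhattan, Brooklyn, Queens, or Jersey City
    property_type: str   # e.g. condo, co-op
    price: float
    bedrooms: float
    bathrooms: float
    sqft: float

    @property
    def price_per_sqft(self) -> float:
        return self.price / self.sqft


# Made-up example values, not scraped data.
example = Listing(
    area="Brooklyn",
    property_type="condo",
    price=parse_number("$750,000"),
    bedrooms=parse_number("2 bed"),
    bathrooms=parse_number("1.5 bath"),
    sqft=parse_number("1,000 sqft"),
)
print(example.price_per_sqft)
```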

Price Analysis (Manhattan)

  • From the graphs and charts, listings are concentrated around $1 million, which suggests that this is also the most accessible price point given the number of houses available.
  • Most listings with up to 4 bedrooms also fall within the $1 million range.
  • To purchase an average-size home at either $1 million or $2 million, a range that includes half of the listings, one can expect the mortgage payments below.
  • The 30% rule was used to calculate the annual income (in parentheses) needed to afford each house below, excluding the down payment.
  • $1 million home = $50,000 down payment with $6,550 monthly for 30 years ($262,000)
  • $1 million home = $200,000 down payment with $5,427 monthly for 30 years ($217,080)
  • $2 million home = $100,000 down payment with $12,867 monthly for 30 years ($514,680)
  • $2 million home = $400,000 down payment with $10,621 monthly for 30 years ($424,840)
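
The income figures in parentheses follow from the 30% rule: twelve monthly payments should equal 30% of annual income. A minimal sketch of that arithmetic is below. The monthly payments are taken from the list above as given; the post does not state the interest rate or what the payment includes, so the sketch does not try to recompute them. The function name is hypothetical.

```python
def required_annual_income(monthly_payment: float, housing_share: float = 0.30) -> int:
    """Annual income at which 12 monthly payments equal `housing_share` of income."""
    return round(monthly_payment * 12 / housing_share)


# Monthly payments quoted in the Manhattan list above.
for monthly in (6_550, 5_427, 12_867, 10_621):
    print(f"${monthly:,}/month -> ${required_annual_income(monthly):,}/year")
```

With a 30% share this reduces to multiplying the monthly payment by 40.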

Price Analysis (Brooklyn)

  • The median house price in Brooklyn is $1.15 million, but listings are concentrated under $1 million.
  • What is interesting is the negative slope of $ per sqft as the sqft of the house increases, which means you get more for your dollar the bigger the house gets.
  • To purchase an average-size home at either $750k or $1 million, a range that includes half of the listings, one can expect the mortgage payments below.
  • The 30% rule was used to calculate the annual income (in parentheses) needed to afford each house below, excluding the down payment.
  • $750k home = $40,000 down payment with $4,958 monthly for 30 years ($198,320)
  • $750k home = $150,000 down payment with $4,129 monthly for 30 years ($165,160)
  • $1 million home = $50,000 down payment with $6,550 monthly for 30 years ($262,000)
  • $1 million home = $200,000 down payment with $5,427 monthly for 30 years ($217,080)
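
The slope of $ per sqft against sqft mentioned above can be estimated with an ordinary least-squares fit. The sketch below uses a hand-written slope formula and made-up listings chosen only to show a negative slope; they are not the scraped Brooklyn data, and the post does not say how its trend line was actually fitted.

```python
def least_squares_slope(xs, ys):
    """Slope of the best-fit line y = a + b*x (ordinary least squares)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    variance = sum((x - mean_x) ** 2 for x in xs)
    return covariance / variance


# Made-up (sqft, price) pairs for illustration, NOT scraped data.
toy_listings = [(500, 500_000), (1_000, 900_000), (1_500, 1_200_000), (2_000, 1_400_000)]

sqft = [s for s, _ in toy_listings]
price_per_sqft = [p / s for s, p in toy_listings]

slope = least_squares_slope(sqft, price_per_sqft)
# A negative slope means each extra square foot is cheaper in larger homes.
print(slope < 0)
```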

Price Analysis (Queens)

  • The median house price in Queens is $700k, but listings are concentrated under $500,000.
  • What is interesting is the negative slope of $ per sqft as the sqft of the house increases, which means you get more for your dollar the bigger the house gets.
  • To purchase an average-size home at either $400k or $700k, a range that includes half of the listings, one can expect the mortgage payments below.
  • The 30% rule was used to calculate the annual income (in parentheses) needed to afford each house below, excluding the down payment.
  • $400k home = $20,000 down payment with $2,760 monthly for 30 years ($110,400)
  • $400k home = $80,000 down payment with $2,311 monthly for 30 years ($92,440)
  • $700k home = $35,000 down payment with $4,655 monthly for 30 years ($186,200)
  • $700k home = $140,000 down payment with $3,869 monthly for 30 years ($154,760)

Price Analysis (Jersey City)

  • The median house price in Jersey City is $645k, with an even distribution of listings from $300k to $900k.
  • What is interesting is the positive slope of $ per sqft as the sqft of the house increases, which means you get less for your dollar as the house gets bigger.
  • To purchase an average-size home at either $400k or $700k, a range that includes half of the listings, one can expect the mortgage payments below.
  • The 30% rule was used to calculate the annual income (in parentheses) needed to afford each house below, excluding the down payment.
  • $400k home = $20,000 down payment with $2,760 monthly for 30 years ($110,400)
  • $400k home = $80,000 down payment with $2,311 monthly for 30 years ($92,440)
  • $700k home = $35,000 down payment with $4,655 monthly for 30 years ($186,200)
  • $700k home = $140,000 down payment with $3,869 monthly for 30 years ($154,760)

Price Analysis Overall

  • Judging from this graph alone, Queens has the lowest price per sqft, which makes it the cheapest of the four areas above. It also has the lowest average price for 1-bedroom houses.
  • It is also interesting that the price per sqft chart had a negative slope for Brooklyn and Queens but a positive slope for Manhattan and Jersey City. In the latter two, each additional square foot costs more as the house gets bigger, so the total price rises faster than in proportion to size.
  • It is also important to note that across all areas, co-ops had the cheapest price per sqft.
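
Comparisons like these come down to grouping listings and summarizing price per sqft within each group. The sketch below shows one way to do that with the standard library. The dictionary keys and the four listings are made up for illustration and are not the scraped data; the use of the median rather than the mean is also an assumption.

```python
from collections import defaultdict
from statistics import median


def median_price_per_sqft(listings, key):
    """Median price per sqft for each distinct value of `key` (e.g. area)."""
    groups = defaultdict(list)
    for listing in listings:
        groups[listing[key]].append(listing["price"] / listing["sqft"])
    return {name: median(values) for name, values in groups.items()}


# Made-up listings for illustration, NOT scraped data.
toy = [
    {"area": "Queens", "property_type": "co-op", "price": 300_000, "sqft": 750},
    {"area": "Queens", "property_type": "condo", "price": 600_000, "sqft": 1_000},
    {"area": "Manhattan", "property_type": "co-op", "price": 900_000, "sqft": 900},
    {"area": "Manhattan", "property_type": "condo", "price": 1_500_000, "sqft": 1_000},
]

by_area = median_price_per_sqft(toy, "area")
by_type = median_price_per_sqft(toy, "property_type")
cheapest_area = min(by_area, key=by_area.get)
print(by_area, by_type, cheapest_area)
```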

Is it a good investment?

  • Buying real estate as an investment is worthwhile only if the rent collected on the property is higher than the mortgage payments plus the fees required to maintain it.
  • Historically, real estate prices have seen an overall increase over the past 30 years.
  • But compared to the areas around NYC, the NYC real estate market has been stagnant for the past 2 years. This makes the outer areas a better return on investment, since rental prices there are also cheaper, which makes them more attractive to the general population with incomes under $100k.
  • Assuming one can find a tenant whose rent covers the mortgage and the maintenance costs, one can expect a steady increase of about 7% per year in places like Jersey City, which translates to a gain of $35,000 per year on a $500k home. (http://www.noradarealestate.com/blog/jersey-city-real-estate-market/)
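
The two conditions above can be written out as a short calculation. This is a sketch under stated assumptions: the function names are hypothetical, the rent and fee amounts in the usage lines are made-up inputs, and only the 7% rate, the $500k price, and the $2,311 monthly payment come from the post.

```python
def rent_covers_costs(monthly_rent: float, monthly_mortgage: float, monthly_fees: float) -> bool:
    """True when rent is higher than the mortgage payment plus upkeep fees."""
    return monthly_rent > monthly_mortgage + monthly_fees


def first_year_appreciation(price: float, annual_rate: float) -> int:
    """Dollar gain after one year at a constant appreciation rate."""
    return round(price * annual_rate)


# 7% on a $500k home, as quoted in the post.
print(first_year_appreciation(500_000, 0.07))

# Hypothetical rent and fees against the $2,311 payment listed for a $400k home.
print(rent_covers_costs(monthly_rent=3_000, monthly_mortgage=2_311, monthly_fees=500))
```

If the 7% rate held and compounded, gains in later years would be larger than the first-year figure, since each year's increase applies to the already higher value.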

Income needed vs Average Income

  • The average income in NYC in 2017 was found to be around $65,000, which is far below the income needed to safely pay for a mortgage in and around NYC.
  • This reflects the inability of millennials to purchase homes in recent years: studies have found home buying among millennials at an all-time low, as they are forced to purchase homes later and later because current incomes are not sufficient.
  • Even the homes with the best prices are usually out of reach because they are co-ops, which have high barriers to entry: they require a down payment of at least 20-25% and sometimes as much as 50%, which is hard to manage as it translates to having $150k+ in cash.
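
To put the gap in numbers, the required incomes from the price analysis sections can be divided by the $65,000 average. The sketch below does that for three of the scenarios listed earlier. The function names are hypothetical, and pairing the $150k cash figure with a 20% down payment on a $750k home is an assumption made for illustration, since the post does not say which price it had in mind.

```python
AVERAGE_INCOME = 65_000  # 2017 NYC average quoted in the post

# Required incomes copied from the price analysis sections.
required_incomes = {
    "Queens or Jersey City, $400k home, $80k down": 92_440,
    "Brooklyn, $750k home, $150k down": 165_160,
    "Manhattan, $1 million home, $200k down": 217_080,
}


def income_multiple(required_income: float, average_income: float = AVERAGE_INCOME) -> float:
    """How many times the average income a purchase requires."""
    return required_income / average_income


def down_payment(price: float, fraction: float) -> float:
    """Cash needed up front for a given down-payment fraction."""
    return price * fraction


for scenario, income in required_incomes.items():
    print(f"{scenario}: {income_multiple(income):.1f}x the average income")

# Assumed example: 20% down on a $750k co-op.
print(down_payment(750_000, 0.20))
```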

Worth it to live in Queens or Jersey City?

  • To decide whether it is worth living in Queens or Jersey City, assuming the job is in Manhattan, one must weigh commute time, the cost of the commute, the money saved on housing, and the value of the individual's time in terms of hourly worth.
  • Commute time and cost (Jersey City)
    • Jersey City (Newport) to NYC = 25 mins on average x 2 = 50 mins per day x 20 (days in month) = 1,000 mins, or about 17 hours (rounded up to roughly 20 hours)
    • Monthly PATH = $89
  • Commute time and cost (Queens)
    • Queens to NYC = 30 mins on average x 2 = 1 hour per day x 20 days = 20 hours
    • Monthly LIRR = $234
    • Monthly MTA = $121

On average, one saves around $2,000 a month by living in Jersey City and $3,000 by living in Queens. To offset this difference, the 20 hours spent commuting must be worth more than $2,000 or $3,000 to make it worth living in NYC, which translates to $100 or $150 an hour. That corresponds to an annual salary of roughly $210,000 or $300,000, which fits the earlier estimate of needing a salary of about $217,000 to afford a house in Manhattan.
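
The break-even reasoning above can be reproduced in a few lines. The sketch assumes a full-time year of 40 hours x 52 weeks = 2,080 working hours to convert an hourly value into a salary; that conversion and the function names are assumptions, since the post does not state how it arrived at its salary figures. Commute fares are not subtracted, matching the calculation in the text.

```python
WORK_HOURS_PER_YEAR = 40 * 52  # assumption: full-time schedule, 2,080 hours


def break_even_hourly_value(monthly_savings: float, monthly_commute_hours: float) -> float:
    """Hourly value of time at which the commute cancels out the housing savings."""
    return monthly_savings / monthly_commute_hours


def equivalent_salary(hourly_value: float) -> float:
    """Annual salary implied by an hourly value under the full-time assumption."""
    return hourly_value * WORK_HOURS_PER_YEAR


for area, savings in (("Jersey City", 2_000), ("Queens", 3_000)):
    hourly = break_even_hourly_value(savings, monthly_commute_hours=20)
    print(f"{area}: ${hourly:,.0f}/hour, about ${equivalent_salary(hourly):,.0f}/year")
```

Under this assumption the salaries come out to $208,000 and $312,000, close to the rounded $210,000 and $300,000 quoted above.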

Future work

  • I would also assess the quality of each home, such as the year it was built, amenities, kitchen appliances, etc., in order to better analyze home prices.
  • The listings would be split into smaller regions rather than entire counties, as this would give more accurate data on what affects prices.
  • I would also scrape rental data to compare house prices with rental prices.
  • Machine learning could then be applied to all of these factors and variables to find the most financially sound decision on where and when to buy a house.

About Author

DongHwi Kim

James (Dong-Hwi) Kim is NYC Data Science Fellow with a Bachelor's Degree in Applied Mathematics and Statistics from Stony Brook University. Before coming to NYCDSA James was a CEO and founder for a startup where he found a...