NYC Real Estate Market Data Analysis and Visualization

Posted on Mar 28, 2018

Introduction

As a former real estate consultant, I often found that existing real estate data providers don't break down sales by property class within each New York City neighborhood. Doing so requires a large amount of data and a tool that can handle it efficiently; Excel often falls short as the data grows. This project addresses that gap by visualizing and charting sale transactions in New York City, categorized by property class and neighborhood, using R and Shiny.

Data

I used the NYC Rolling Sales data for the analysis. I chose this dataset for the following reasons:

  • 50,000+ rows in each file, which is hard to navigate in Excel
  • Updated every month, with records going back to 2003
  • Rich, factual information, but in a messy format with a lot of noise
  • Provides information that industry reports don't
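
To make the idea of breaking sales down by neighborhood and property class concrete, here is a minimal base R sketch. It is not the project's actual code. The column names (`neighborhood`, `building_class`, `sale_price`) are assumed, cleaned-up versions of the Rolling Sales headers, and the rows are made-up placeholders rather than real transactions. Dropping rows with a sale price of 0 is likewise an assumption about how to handle ownership transfers with no cash consideration.

```r
# Toy stand-in for one Rolling Sales file.
# Rows and values are illustrative placeholders, not real transactions.
sales <- data.frame(
  neighborhood   = c("ASTORIA", "ASTORIA", "ASTORIA", "RIVERDALE", "RIVERDALE"),
  building_class = c("01 ONE FAMILY DWELLINGS", "01 ONE FAMILY DWELLINGS",
                     "07 RENTALS - WALKUP APARTMENTS",
                     "01 ONE FAMILY DWELLINGS", "01 ONE FAMILY DWELLINGS"),
  sale_price     = c(900000, 0, 2500000, 750000, 850000),
  stringsAsFactors = FALSE
)

# Assumption: a sale price of 0 is a transfer without cash consideration,
# so it is excluded from price statistics.
market_sales <- sales[sales$sale_price > 0, ]

# Median sale price per neighborhood and property class.
median_price <- aggregate(sale_price ~ neighborhood + building_class,
                          data = market_sales, FUN = median)
names(median_price)[3] <- "median_price"

# Number of sales per neighborhood and property class.
n_sales <- aggregate(sale_price ~ neighborhood + building_class,
                     data = market_sales, FUN = length)
names(n_sales)[3] <- "n_sales"

# One summary row per (neighborhood, property class) pair.
summary_tbl <- merge(median_price, n_sales,
                     by = c("neighborhood", "building_class"))
print(summary_tbl)
```

A table shaped like `summary_tbl` is the kind of input a Shiny app could filter by neighborhood and chart by property class.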

Business Values

The more important question is how to create business value from the dataset: how does this project differentiate itself from the tools that other data websites provide?

The following is a simple comparison between several widely used data providers in the industry and this project, to show where the business value lies.

StreetEasy

  • Focuses only on residential properties
  • No past transactions on a property level
  • No trend graphs

The Project:

  • Covers every major property class
  • Includes past transactions at the property level
  • Includes trend analysis

Real Capital Analytics

  • Provides records going back only a limited number of years
  • Doesn't provide trend analysis
  • Doesn't provide unit-level data

The Project:

  • Can be extended several years back
  • Provides unit-level data

When people think of New York, they think of Manhattan, Queens and Brooklyn, and those are the boroughs that brokerage firms mainly cover in their industry research reports. The Bronx and Staten Island real estate markets are not covered, even though those two boroughs' markets have performed really well in the past year.

Further Development

  • Combine more historical data dating back to 2003
  • More cleaning and more real estate metrics
  • Predict neighborhoods that have investment potential in the future
