Data Study on Property Pricing Trends - Brooklyn, NY

Posted on Dec 4, 2018
The skills we demoed here can be learned through taking Data Science with Machine Learning bootcamp with NYC Data Science Academy.


Data on property prices in Brooklyn show interesting trends. Several factors make a property cost more or less, including location, lot size, and others. The question for people seeking to invest in property is how to find the best value.


1. The Toll of Renovations on Sale Price

The first step is to examine the years in which buildings were built and whether they were renovated once or twice. There is a spike in the 1930s, which may be due to Roosevelt’s presidency. To stimulate the American economy during the Great Depression, Roosevelt launched various building campaigns, and it is possible that the trend we see results from some of those policies. After that big spike, building and renovating continued at other times, but not to the same extent.

In the past year, some new buildings were constructed. However, without the room to expand that existed in the 1930s, development cannot reach the same scale. Today, renovations appear to occur at about the same rate as new building, so years with more building also see more renovating.
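The comparison of building and renovating over time can be sketched as a count per decade. This is a minimal illustration, not the code behind the charts: the field names `year_built` and `renovation_years` are assumptions about how the data might be laid out, and the sample records are made up.

```python
from collections import Counter

def decade(year):
    """Round a year down to the start of its decade, e.g. 1937 -> 1930."""
    return (year // 10) * 10

def activity_by_decade(buildings):
    """Count how many buildings were built and renovated in each decade."""
    built, renovated = Counter(), Counter()
    for b in buildings:
        built[decade(b["year_built"])] += 1
        # A building may have been renovated zero, one, or two times.
        for year in b.get("renovation_years", []):
            renovated[decade(year)] += 1
    return built, renovated

# Made-up records for illustration only (field names are assumptions).
buildings = [
    {"year_built": 1931, "renovation_years": [1988]},
    {"year_built": 1935, "renovation_years": []},
    {"year_built": 1987, "renovation_years": [1999, 2012]},
]
built, renovated = activity_by_decade(buildings)
```

Plotting `built` and `renovated` side by side for each decade would show whether the two move together.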

The charts presented here relate sale price to renovation. One of them also shows the year the building was built. The two graphs show a consistent relationship.

From these graphs, we can conclude that, for the most part, very expensive buildings did not have any renovations. Renovating a newer building does not appear to raise its price much relative to other buildings.

2. Sale Price Based on Location

Sale prices depend not only on the size of a property but also on where it is located.

This map is divided by zip code across Brooklyn, NY. It shows that the areas near Manhattan have the highest prices. This could be due solely to the proximity to Manhattan, or competition for buildings there could also be driving prices up. In the middle of the borough there are also some higher-priced properties, but those are generally houses with high values.
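A per-zip-code price summary of the kind such a map needs can be sketched as follows. The post does not say which statistic was mapped, so the median here is an assumption, as are the field names `zip_code` and `sale_price`; the sample values are made up.

```python
from collections import defaultdict
from statistics import median

def median_price_by_zip(sales):
    """Map each zip code to the median sale price of its properties."""
    prices = defaultdict(list)
    for sale in sales:
        prices[sale["zip_code"]].append(sale["sale_price"])
    return {zip_code: median(values) for zip_code, values in prices.items()}

# Made-up sales for illustration only (field names are assumptions).
sales = [
    {"zip_code": "11201", "sale_price": 1_200_000},
    {"zip_code": "11201", "sale_price": 900_000},
    {"zip_code": "11201", "sale_price": 3_000_000},
    {"zip_code": "11234", "sale_price": 450_000},
    {"zip_code": "11234", "sale_price": 550_000},
]
by_zip = median_price_by_zip(sales)
```

The median is less sensitive than the mean to a handful of very expensive sales, which matters here because the most expensive buildings behave like outliers.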

3. Tax Class Based on Location

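Different kinds of properties have different tax classes. In New York City, tax classes 1 and 2 are residential: class 1 covers houses with up to 3 units, while class 2 means that the building has more than 3 units in it. Tax class 3 covers utility property, and tax class 4 is for commercial property such as office buildings, hotels, factories and stores.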

The bar graph below shows the different tax classes by the price range of the property. I needed to subset the data because a few sales stretched the price range to 0-50e6, and at that scale none of the other price ranges could be seen. Most properties are in either class 4 or class 2. It seems that some buildings, factories or stores are cheap, because they fall into the cheapest price range. This shows that the very expensive buildings are really outliers.
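The subsetting and binning behind a chart like this can be sketched as below. The cutoff of 5,000,000 and the bin width of 500,000 are illustrative assumptions, not the values used for the graph, and the field names and sample sales are made up.

```python
from collections import Counter

def price_bin(price, width=500_000):
    """Label the price range a sale falls into, e.g. '0-500000'."""
    low = int(price // width) * width
    return f"{low}-{low + width}"

def tax_class_counts(sales, max_price=5_000_000, width=500_000):
    """Count sales per (price range, tax class), skipping outliers.

    Sales at or above max_price are left out so that the lower
    price ranges stay visible on a bar chart.
    """
    counts = Counter()
    for sale in sales:
        if sale["sale_price"] < max_price:
            key = (price_bin(sale["sale_price"], width), sale["tax_class"])
            counts[key] += 1
    return counts

# Made-up sales for illustration only (field names are assumptions).
sales = [
    {"sale_price": 300_000, "tax_class": "4"},
    {"sale_price": 450_000, "tax_class": "2"},
    {"sale_price": 700_000, "tax_class": "2"},
    {"sale_price": 48_000_000, "tax_class": "4"},  # outlier, excluded
]
counts = tax_class_counts(sales)
```

Each key in `counts` corresponds to one bar: a price range paired with a tax class.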

The second graph is a map from Leaflet. Its colors represent the different tax classes based on location/zip code. Tax class two seems to be in every location. Interestingly, tax class one shows up in the more expensive areas. Tax class four is in basically every location, which makes sense because there are stores, hotels and buildings in every location. Tax class three only shows up in one spot, which makes it useless for comparisons.

From this we can see that tax class makes some difference to the price of a property, though not a very substantial one. While it is interesting to see where the tax classes are located and which price range is the most common in each tax class, this does not offer great insight into the differences in property prices.

About the Data Set

I acquired a dataset from Kaggle covering different properties in Brooklyn. I then grouped the properties by zip code so I could use Leaflet to map the data points, which helped visualize the data more clearly.
Any row with more than 3 N/As was removed. This did not limit the analysis, because more than 10,000 rows remained after the removal. Of those, the first 10,000 rows were used in the analysis.
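The cleaning rule described above can be sketched as follows. This is a minimal illustration rather than the original code: the helper names, the column names, and the set of strings treated as missing are all assumptions, and rows are taken to be dictionaries such as those produced by `csv.DictReader`.

```python
# Assumed spellings of a missing value in the raw file.
MISSING = {"", "NA", "N/A", "NaN"}

def count_missing(row):
    """Number of fields in a row that are None or a missing marker."""
    return sum(
        1 for value in row.values()
        if value is None or str(value).strip() in MISSING
    )

def clean(rows, max_missing=3, keep=10_000):
    """Drop rows with more than max_missing N/As, then keep the first rows."""
    kept = [row for row in rows if count_missing(row) <= max_missing]
    return kept[:keep]

# Made-up rows for illustration only (column names are assumptions).
sample = [
    {"zip_code": "11201", "sale_price": "950000", "year_built": "1931",
     "tax_class": "1", "lot_size": "2000"},
    {"zip_code": "NA", "sale_price": "NA", "year_built": "NA",
     "tax_class": "NA", "lot_size": "1800"},   # 4 N/As: removed
    {"zip_code": "11215", "sale_price": "NA", "year_built": "1925",
     "tax_class": "2", "lot_size": "NA"},      # 2 N/As: kept
]
cleaned = clean(sample)
```

Note that taking the first 10,000 rows is not a random sample, so any ordering in the source file (for example by date or neighborhood) carries over into the analysis.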

Future Work

I would like to explore the parts of the dataset that I have not yet used. There were 40 columns in the dataset and I used only a few of them.
Also, I would like to split the dataset between "buildings" and "houses" in order to understand the trends better. Looking at both types of property together can skew the data, since buildings are much more expensive than houses. It would be interesting to see whether houses are more expensive in places with fewer buildings, which would change how the Leaflet map is presented.


In conclusion, many factors contribute to the price of a house or building, and some matter more than others. An analysis like this one could help people see trends across areas so that they can make an informed decision about investing in an area that is currently cheaper. Alternatively, they may want to build in the cheapest area in the hope that others follow and develop the whole area. Big cities don't develop overnight, but people who are willing to invest for the long term can see substantial returns.

