Data Study on Property Pricing Trends - Brooklyn, NY
The skills demonstrated here can be learned by taking the Data Science with Machine Learning bootcamp at NYC Data Science Academy.
Data on property prices in Brooklyn show interesting trends. Several contributors push the cost of a property higher or lower, including the location, the lot size, and other factors. The question for people seeking to invest in property is: how do you find the best value?
1. Effect of Renovations on Sale Price
The first step is to examine the years in which buildings were built and whether they were renovated once or twice. There is a spike in the 1930s, which may be due to Roosevelt’s presidency. To stimulate the American economy during the Great Depression, Roosevelt launched various building campaigns. It is possible that the trends we see are the result of some of Roosevelt’s policies. After that big spike, there were other periods of building and renovating, but not to the same extent.
In the past year, some new buildings were constructed. However, without the space to expand that was available back in the 1930s, it is impossible to achieve the same scale of development. Today, renovations appear to occur at about the same rate as new construction: in years with more building, there is also more renovating.
The charts presented here relate sale price to renovation. One graph also shows the year the building was built. The two graphs show a consistent relationship.
From these graphs, we can conclude that, for the most part, very expensive buildings did not have any renovations. Among newer buildings, renovations do not appear to have much effect on how the price of one building compares with another.
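The comparison behind these graphs can be sketched in a few lines of Python. This is only an illustration, not the code used for the charts: the column names `sale_price`, `year_alter1` and `year_alter2`, the convention that 0 means "no renovation", and the sample rows are all assumptions made for the example.

```python
from collections import defaultdict
from statistics import median

def renovation_count(row):
    """Count recorded renovations (0, 1 or 2).

    Assumes hypothetical columns year_alter1 / year_alter2,
    where 0 means the renovation never happened.
    """
    return sum(1 for key in ("year_alter1", "year_alter2") if row.get(key, 0) > 0)

def price_summary_by_renovations(rows):
    """Group sale prices by renovation count and summarize each group."""
    groups = defaultdict(list)
    for row in rows:
        groups[renovation_count(row)].append(row["sale_price"])
    return {n: {"median": median(p), "max": max(p)} for n, p in groups.items()}

# Made-up rows for illustration; not records from the real dataset.
sample = [
    {"sale_price": 5_000_000, "year_alter1": 0, "year_alter2": 0},
    {"sale_price": 600_000, "year_alter1": 1985, "year_alter2": 0},
    {"sale_price": 800_000, "year_alter1": 2010, "year_alter2": 2016},
    {"sale_price": 400_000, "year_alter1": 0, "year_alter2": 0},
]
summary = price_summary_by_renovations(sample)
for count in sorted(summary):
    print(count, summary[count])
```

Comparing the maximum and the median per group is one simple way to check whether the most expensive properties sit in the "never renovated" group.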
2. Sale based on Location
Sometimes sale prices depend not only on the size of the property but also on where the property is located.
This map is divided by zip code throughout Brooklyn, NY. It shows that the areas near Manhattan have the highest prices. This could be due only to the proximity to Manhattan, or it could also be because competition among buildings drives property prices higher. In the middle, there are also some higher priced properties, but those are generally houses with high values.
3. Tax class based on Location
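Different kinds of properties/buildings have different tax classes. Tax class 1 covers most residential properties with up to 3 units, and tax class 2 covers residential properties with more than 3 units. Tax class 3 covers utility property, while tax class 4 is for commercial properties such as office buildings, hotels, factories and stores.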
The bar graph below shows the different tax classes based on the price range of the property. I needed to subset the data because, when the full price range of 0-50e6 was included, none of the other price ranges could be seen. Most properties are in either group 4 or group 2. It seems that some buildings, factories or stores are cheap, because they fall into the cheapest price range. This shows that the very expensive buildings are really outliers.
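The subsetting step can be sketched as follows. The bin edges, the column names `sale_price` and `tax_class`, and the sample rows are hypothetical; the actual cut-offs used for the bar graph are not reproduced here.

```python
from collections import Counter

# Hypothetical price ranges; anything above the last edge is treated as an
# outlier and left out so that the remaining bars stay readable.
PRICE_BINS = [(0, 500_000), (500_000, 1_000_000), (1_000_000, 5_000_000)]

def price_bin(price):
    """Return the (low, high) range containing price, or None if it is outside all ranges."""
    for low, high in PRICE_BINS:
        if low <= price < high:
            return (low, high)
    return None

def tax_class_counts(rows):
    """Count properties per (price range, tax class), skipping out-of-range prices."""
    counts = Counter()
    for row in rows:
        bin_ = price_bin(row["sale_price"])
        if bin_ is not None:
            counts[(bin_, row["tax_class"])] += 1
    return counts

# Made-up rows for illustration only.
sample = [
    {"sale_price": 300_000, "tax_class": "4"},
    {"sale_price": 750_000, "tax_class": "2"},
    {"sale_price": 40_000_000, "tax_class": "4"},  # outlier, dropped from the counts
    {"sale_price": 450_000, "tax_class": "2"},
]
counts = tax_class_counts(sample)
for key, n in sorted(counts.items()):
    print(key, n)
```

Each key of `counts` corresponds to one bar in a grouped bar chart: a price range paired with a tax class.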
The second graph is a map from Leaflet. Its colors represent the different tax classes based on location/zip code. Tax class two seems to be in every location. Interestingly, tax class one shows up in the more expensive areas. Tax class four is in basically every location, which makes sense because there are stores, hotels and buildings in every location. Tax class three only shows up in one spot, which makes it useless for comparisons.
From this we can see that tax classes make some difference in the price of a property, though not a very substantial one. While it is interesting to see where the tax classes are located and which price range is the most common in each tax class, this does not offer great insight into the differences in property prices.
About the Data Set
I acquired a dataset from Kaggle that had different properties in Brooklyn. I then grouped them by zip code so I could use Leaflet to graph the data points. It helped visualize the data more clearly.
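Grouping by zip code can be sketched like this. The column names `zip_code` and `sale_price` are assumptions, and the median is just one reasonable choice of summary per zip code, not necessarily the statistic behind the maps above.

```python
from collections import defaultdict
from statistics import median

def median_price_by_zip(rows):
    """Collect sale prices per zip code and reduce each group to its median."""
    prices = defaultdict(list)
    for row in rows:
        prices[row["zip_code"]].append(row["sale_price"])
    return {zip_code: median(values) for zip_code, values in prices.items()}

# Made-up rows for illustration only.
sample = [
    {"zip_code": "11201", "sale_price": 900_000},
    {"zip_code": "11201", "sale_price": 1_100_000},
    {"zip_code": "11234", "sale_price": 450_000},
]
by_zip = median_price_by_zip(sample)
for zip_code, price in sorted(by_zip.items()):
    print(zip_code, price)
```

One value per zip code is the shape a choropleth-style map needs: each region gets a single number that can be turned into a color.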
If a row had more than 3 N/As in it, it was removed. This did not leave the dataset too small, because more than 10,000 rows remained after the removal. Of the remaining rows, the first 10,000 were used in the analysis.
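The cleaning rule described above can be written as a short filter. This is a sketch under assumptions: rows are represented as dictionaries, and the set of values treated as missing (`None`, empty string, "NA", "N/A") is a guess at how the file encodes them.

```python
from itertools import islice

# Assumed encodings of a missing value; adjust to match the actual file.
NA_MARKERS = {None, "", "NA", "N/A"}

def count_missing(row):
    """Number of fields in a row that hold a missing value."""
    return sum(1 for value in row.values() if value in NA_MARKERS)

def clean_rows(rows, max_missing=3, limit=10_000):
    """Drop rows with more than max_missing missing fields, then keep the first `limit` rows."""
    kept = (row for row in rows if count_missing(row) <= max_missing)
    return list(islice(kept, limit))

# Made-up rows for illustration only.
sample = [
    {"sale_price": 500_000, "zip_code": "11201", "year_built": 1931, "lot_size": 2000, "tax_class": "1"},
    {"sale_price": None, "zip_code": "NA", "year_built": "", "lot_size": "N/A", "tax_class": "2"},  # 4 missing: removed
    {"sale_price": 750_000, "zip_code": "11215", "year_built": None, "lot_size": None, "tax_class": "NA"},  # 3 missing: kept
]
cleaned = clean_rows(sample)
print(len(cleaned))
```

Note that a row with exactly 3 missing values is kept, since only rows with more than 3 are removed.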
I wish to expand on and use different parts of the data set that I did not yet get to use. There were 40 columns in the dataset and I only used a few of them.
Also, I would like to split the dataset between "buildings" and "houses" in order to understand more about trends. Looking at both types of properties together can skew the data, since buildings are much more expensive than houses. It would be interesting to understand whether houses are more expensive in places with fewer buildings, which would change how the Leaflet graph is presented.
In conclusion, there are many factors that contribute to the price of a house/building. Certain factors are more important than others. With this analysis, we could help people see trends in areas so that they can make an informed decision about investing in a particular area that is currently cheaper. Alternatively, they may want to build in the cheapest area in the hope that others will follow them and develop the whole area. Big cities don't develop overnight, but if people are willing to invest for the long term, they can see substantial returns.
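A possible starting point for that split is sketched below. It has not been run on the real data. Which tax classes count as "houses" is passed in as a parameter, and the choice of classes "1" and "2" in the example, like the column names and sample rows, is only an assumption for illustration.

```python
from statistics import median

def split_by_type(rows, house_classes):
    """Split rows into (houses, buildings) based on a caller-supplied set of tax classes."""
    houses = [row for row in rows if row["tax_class"] in house_classes]
    buildings = [row for row in rows if row["tax_class"] not in house_classes]
    return houses, buildings

def median_price(rows):
    """Median sale price of a group of rows."""
    return median(row["sale_price"] for row in rows)

# Made-up rows for illustration only.
sample = [
    {"tax_class": "1", "sale_price": 700_000},
    {"tax_class": "2", "sale_price": 900_000},
    {"tax_class": "4", "sale_price": 3_000_000},
]
# Assumption: classes "1" and "2" are treated as houses, everything else as buildings.
houses, buildings = split_by_type(sample, house_classes={"1", "2"})
print(median_price(houses), median_price(buildings))
```

Summarizing each group separately keeps the much higher building prices from dominating the figures for houses.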