Data Analysis on Real Estate in Ames, Iowa

Posted on Dec 6, 2020
The skills I demoed here can be learned through the Data Science with Machine Learning bootcamp at NYC Data Science Academy.

Introduction

Ames has an interesting housing market because the data shows the city has been expanding in recent years. Newer neighborhoods tend to be on the outskirts of the city, but there have also been a good number of renovations.

Map of all the neighborhoods in Ames, Iowa

Data

Recent Development

As you can see above, all the neighborhoods surround Iowa State University (ISU). Old Town is one of the oldest neighborhoods in Ames, but it has very high renovation activity. Ames appears to be in the process of becoming a more modern housing market, which is one of the recent developments that make it an up-and-coming market.

 

Graphic showing the distribution of houses by the year they were built.

The graphic above shows that the more recent houses were built on the outskirts of Ames, which suggests the city is expanding outward. That said, a few houses are being built around the airport, which might mean that the city wants to make it easier for people to travel into Ames. This could lead to increased business activity in the downtown area.

Year of House Built 

Distribution of renovations

The above graphic shows the year the house was built (if it was not remodeled) or the year of the most recent renovation (until 2011). This shows that there is high renovation activity in the neighborhoods with the oldest homes. There seems to be a higher concentration of renovation activity near the downtown area; this might increase economic activity in the business districts.

Renovations 

Number of Renovations since 1995 by Neighborhood

This facet graph shows the number of renovations since 1995 by neighborhood. Only four neighborhoods are on an upward trend, which suggests that the housing market is expanding mainly by building new houses. Renovations may not be the primary means of expansion in Ames.
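The counting behind a graph like this can be sketched as follows. This is a minimal illustration, not the code used for the project: the records and the tuple layout are made up, and it assumes the convention described earlier, where the remodel year equals the build year for houses that were never remodeled.

```python
from collections import Counter

# Hypothetical records: (neighborhood, year_built, year_remodeled).
# Assumption: year_remodeled equals year_built when the house
# was never remodeled.
houses = [
    ("OldTown", 1920, 1998),
    ("OldTown", 1915, 1915),
    ("OldTown", 1930, 2004),
    ("NAmes", 1960, 1960),
    ("NAmes", 1958, 1996),
    ("Somerst", 2005, 2005),
]


def renovations_since(records, start_year=1995):
    """Count renovations per (neighborhood, year) from start_year onward."""
    counts = Counter()
    for neighborhood, built, remodeled in records:
        # A true renovation has a remodel year later than the build year.
        if remodeled > built and remodeled >= start_year:
            counts[(neighborhood, remodeled)] += 1
    return counts


counts = renovations_since(houses)

# Collapse the yearly counts into one total per neighborhood.
per_neighborhood = Counter()
for (neighborhood, _year), n in counts.items():
    per_neighborhood[neighborhood] += n

print(per_neighborhood)
```

Keeping the yearly counts separate from the totals makes it possible to plot one trend line per neighborhood, as in a facet graph.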

However, this means that there is room for investment in housing renovation in Ames. We can see that Ames is expanding by building more modern homes near the airport, downtown, and on the outskirts of town. The property value of these newer homes would improve if the older homes in these areas were renovated to match the increasing quality of the homes being built. So, if you wanted to invest in the Ames housing market by renovating older homes, where would you start? Are there any neighborhoods that have a high probability of a substantial return on investment?

 

Neighborhoods

Renovations by Neighborhood

When doing data analysis, the number of observations matters. With a small sample, it is difficult to draw reliable conclusions because the sample statistics may not approximate those of the population well. Therefore, we focus our analysis on the six neighborhoods with the most observations: Old Town, North Ames (NAmes), College Creek (CollgCr), Northridge Heights (NridgHt), Somerset, and Gilbert.
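Picking the neighborhoods with the most observations is a simple frequency count. The sketch below uses invented counts and neighborhood labels purely for illustration; the real dataset has many more rows.

```python
from collections import Counter

# Hypothetical neighborhood labels, one per house (counts are made up).
labels = (
    ["NAmes"] * 9
    + ["CollgCr"] * 7
    + ["OldTown"] * 6
    + ["Somerst"] * 5
    + ["NridgHt"] * 4
    + ["Gilbert"] * 3
    + ["Greens"] * 2
    + ["Blueste"] * 1
)


def top_neighborhoods(neighborhood_labels, k=6):
    """Return the k neighborhoods with the most observations."""
    return [name for name, _count in Counter(neighborhood_labels).most_common(k)]


top_six = top_neighborhoods(labels)
print(top_six)
```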

 

Sale Price Distribution

Sale Price Distribution

From the boxplot, we can see that Old Town has a lot of outliers. Why? Most likely because Old Town is one of the oldest neighborhoods in Ames and a lot of new houses have been built there. These new houses were probably sold for higher prices than the older homes. The high number of outliers also suggests that Old Town's housing landscape is being updated. It might be in the city's business interest to update Old Town, since it is near downtown and other business districts.

Northridge Heights has the widest distribution of sale prices. It is in the northern part of Ames and, as stated before, Ames is expanding; Northridge Heights is a byproduct of this expansion. The houses in this neighborhood are therefore more modern and more expensive. The 25th percentile of the sale price in Northridge Heights is greater than the median sale price of each of the other five neighborhoods. This highlights that newer houses tend to be more expensive than older houses. Inflation is not the cause of the price difference because all the homes in this dataset were appraised in the same time period.
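The quartile comparison above can be checked with a few lines of standard-library Python. The prices below are invented (in thousands of dollars) and only illustrate the comparison; they are not values from the dataset.

```python
import statistics

# Hypothetical sale prices in thousands of dollars (not real data).
prices = {
    "NridgHt": [250, 270, 290, 300, 320, 350, 400, 450],
    "OldTown": [80, 95, 110, 120, 130, 140, 160, 300],
    "NAmes": [110, 125, 135, 140, 145, 150, 160, 175],
}


def first_quartile(values):
    """Return the 25th percentile using linear interpolation."""
    # quantiles(n=4) returns the three cut points: Q1, median, Q3.
    return statistics.quantiles(values, n=4, method="inclusive")[0]


q1_nridght = first_quartile(prices["NridgHt"])
other_medians = {
    name: statistics.median(values)
    for name, values in prices.items()
    if name != "NridgHt"
}

# True when the cheapest quarter of NridgHt still ends above
# the median of every other neighborhood.
q1_above_all = all(q1_nridght > m for m in other_medians.values())
print(q1_nridght, other_medians, q1_above_all)
```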

Selected Features

The dataset had well over 100 features, so some feature selection was needed to avoid the curse of dimensionality. Lasso regression, along with some judgment calls, worked very well here and produced the following set of features to analyze:

 

The distance features in blue were created with GeoPy in Python.
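For readers unfamiliar with Lasso-based feature selection, here is a minimal sketch using scikit-learn on synthetic data. The feature names, the penalty strength `alpha=0.3`, and the data are all assumptions made for illustration; they are not the settings or features used in this project.

```python
import numpy as np
from sklearn.linear_model import Lasso
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
n = 300

# Hypothetical feature names; only the first two drive the target.
feature_names = ["living_area", "overall_quality", "noise_1", "noise_2", "noise_3"]
X = rng.normal(size=(n, len(feature_names)))
y = 5.0 * X[:, 0] + 3.0 * X[:, 1] + rng.normal(scale=0.5, size=n)

# Lasso penalizes coefficient size, so features should share a scale.
X_scaled = StandardScaler().fit_transform(X)

# alpha is an assumed penalty strength; in practice it would be tuned,
# for example by cross-validation.
model = Lasso(alpha=0.3).fit(X_scaled, y)

# Features whose coefficients were shrunk exactly to zero are dropped.
selected = [name for name, coef in zip(feature_names, model.coef_) if coef != 0.0]
print(selected)
```

The L1 penalty sets the coefficients of uninformative features exactly to zero, which is what makes Lasso usable as a selection step rather than only as a regularizer.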

R-Squared Model

These were the features used in the descriptive modeling process. Our team wanted to understand the importance of each feature at the neighborhood level, so we ran a multiple linear regression for each neighborhood individually. Here are the R-squared values for each model:

 

The top graph shows the R-squared value for each model and the bottom graph shows the number of observations for each neighborhood. Notice that the neighborhoods with the highest R-squared tend to have the lowest number of observations, although there are some notable exceptions. Old Town has a high number of observations but a relatively low R-squared value. How can this be? Most likely because Old Town has a high number of outliers, as can be seen in the previous boxplot, so the model did not fit well.
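The effect of outliers on R-squared can be illustrated with a small per-group regression. This is a sketch on synthetic data using ordinary least squares via NumPy; the group names, coefficients, and outlier sizes are assumptions, not values from the project.

```python
import numpy as np


def r_squared(X, y):
    """Fit ordinary least squares with an intercept and return R-squared."""
    design = np.column_stack([np.ones(len(X)), X])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    residuals = y - design @ coef
    ss_res = float(residuals @ residuals)
    ss_tot = float(((y - y.mean()) ** 2).sum())
    return 1.0 - ss_res / ss_tot


rng = np.random.default_rng(1)


def make_neighborhood(n, n_outliers):
    """Generate synthetic houses; the first n_outliers sell far above trend."""
    X = rng.uniform(0, 10, size=(n, 2))
    y = 100 + 20 * X[:, 0] + 5 * X[:, 1] + rng.normal(scale=5, size=n)
    y[:n_outliers] += 300  # a few unusually expensive sales
    return X, y


# One model per (hypothetical) neighborhood, fitted independently.
data = {
    "clean": make_neighborhood(200, 0),
    "with_outliers": make_neighborhood(200, 15),
}
r2_by_neighborhood = {name: r_squared(X, y) for name, (X, y) in data.items()}
print(r2_by_neighborhood)
```

Both groups have the same number of observations and the same underlying relationship, yet the group with outliers gets a lower R-squared because the outliers add variance the linear model cannot explain.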

If you were an investor looking to invest in the housing market in Ames by renovating houses, what would be your strategy? Based on the multiple linear regression models, we decided to focus on the six neighborhoods with the most observations and give business insights from their regression coefficients. Here is what we found:

 

Old Town

We start with Old Town. Old Town has a number of old homes, but many of them are in the process of being renovated. As with many old places, it is important to modernize the necessities without losing the historic aspects of the property. Older homes usually have subpar insulation by today's standards, so an improvement in insulation and heating quality would increase the values of these homes.

As you can see, the coefficients for Gilbert and North Ames were very similar, so we decided to give business insights for both simultaneously. The airport distance feature has a large coefficient in both models, so one strategy is to buy and renovate properties in these neighborhoods that are closer to the airport.

College Town

Ames is a college town, but families with no college students might prefer to live away from ISU. Also, college students usually need cheaper homes, so the most expensive houses tend to be farther away from ISU. Therefore, buying homes under the median price that are farther away from ISU should bring a good return on investment after renovation.
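The screening rule above (below the median price and farther from ISU) can be sketched as a simple filter. The project's distance features were created with GeoPy; to keep this example dependency-free, it swaps in a standard-library haversine formula instead. The campus coordinates are approximate, and the houses, prices (in thousands of dollars), and the 3 km threshold are all hypothetical.

```python
import math
import statistics

# Approximate ISU campus coordinates (assumption for illustration).
ISU = (42.0267, -93.6465)


def haversine_km(a, b):
    """Great-circle distance in km between two (lat, lon) points in degrees."""
    lat1, lon1 = map(math.radians, a)
    lat2, lon2 = map(math.radians, b)
    dlat, dlon = lat2 - lat1, lon2 - lon1
    h = math.sin(dlat / 2) ** 2 + math.cos(lat1) * math.cos(lat2) * math.sin(dlon / 2) ** 2
    return 2 * 6371.0 * math.asin(math.sqrt(h))


# Hypothetical houses: price in thousands of dollars, location as (lat, lon).
houses = [
    {"id": "A", "price": 150, "location": (42.0300, -93.6500)},
    {"id": "B", "price": 120, "location": (42.0600, -93.6200)},
    {"id": "C", "price": 260, "location": (42.0650, -93.6100)},
    {"id": "D", "price": 110, "location": (42.0280, -93.6450)},
    {"id": "E", "price": 135, "location": (41.9950, -93.6000)},
]


def renovation_candidates(records, min_km=3.0):
    """Return ids of houses priced below the median and farther than min_km from ISU."""
    median_price = statistics.median(r["price"] for r in records)
    return [
        r["id"]
        for r in records
        if r["price"] < median_price and haversine_km(ISU, r["location"]) > min_km
    ]


candidates = renovation_candidates(houses)
print(candidates)
```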

Northridge Heights is one of the newer neighborhoods in Ames, which might explain why it has a more expensive housing market than the other neighborhoods. Also, the more expensive houses tend to be located farther away from ISU. Therefore, one should try to find houses cheaper than the median price of $300k in the northern part of the neighborhood.

In Somerset, the median house price is $223k, so the advice is similar to the others: buy below the median and renovate to raise the sale price above it.

Conclusion

This concludes the Ames Housing project! This work was done by me, Brian Kuo, and Evan Kiolbassa.

About Author

Mark Carthon

Data Scientist with 7+ years of work experience and a strong mathematical background. Passionate about applications of Machine Learning and Deep Learning in industry.
