Investment Opportunity in Ames, Iowa
Home ownership has long been touted as part of the American dream. Unfortunately for those living in coastal cities, such as New York or San Francisco, home ownership is an increasingly distant dream. Luckily there are more affordable options in Ames, Iowa.
Why Ames? For those unfamiliar with the city, Ames has been named the best college town in America and CNN Money's 9th best place to live. It is close to many nature reserves and is not too far from bigger cities like Minneapolis or Chicago. Because it is a university town, you can expect no shortage of incoming students and faculty looking to move to Ames, making property in Ames an ideal rental investment.
I decided to look at real estate investment opportunities from two perspectives:
- A home buyer
- A real estate developer
The data I used for my analysis is from Kaggle.
Before starting my analysis, I needed to deal with missing data. At first glance, many features have missing values, but the data dictionary shows that most of these are missing because the home does not have that feature. For example, a missing pool quality simply means the home does not have a pool, and a missing value for fence similarly means there is no fence. For these features, I replaced the missing value with an appropriate text value stating there is no pool, no garage, or no fence. For the remaining variables, where values were missing at random, I used mode imputation for categorical variables and median imputation for numerical variables.
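The two-step approach described above can be sketched with pandas roughly as follows. This is a minimal illustration on a made-up four-row frame, not the original notebook code; the column names (PoolQC, Fence, Electrical, LotFrontage) and the replacement labels ("NoPool", "NoFence") are assumptions chosen for the example.

```python
import pandas as pd

# Hypothetical toy data; column names and values are assumptions for illustration.
df = pd.DataFrame({
    "PoolQC": [None, "Gd", None, None],
    "Fence": ["MnPrv", None, None, "GdWo"],
    "Electrical": ["SBrkr", "SBrkr", None, "FuseA"],
    "LotFrontage": [65.0, None, 80.0, 70.0],
})

# Step 1: for these columns a missing value means the feature is absent,
# so replace it with an explicit label (labels are hypothetical).
absent_labels = {"PoolQC": "NoPool", "Fence": "NoFence"}
for col, label in absent_labels.items():
    df[col] = df[col].fillna(label)

# Step 2: for whatever is still missing, use the median for numeric
# columns and the mode (most frequent value) for categorical columns.
for col in df.columns:
    if df[col].isna().any():
        if pd.api.types.is_numeric_dtype(df[col]):
            df[col] = df[col].fillna(df[col].median())
        else:
            df[col] = df[col].fillna(df[col].mode().iloc[0])

print(df)
```

Handling the "feature is absent" columns first matters: otherwise mode imputation would fill a missing pool quality with the most common pool rating, inventing pools that do not exist.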
Data Exploration and Feature Engineering:
One of the things I tried to do was consolidate information into as few variables as possible. The data provided five different variables reporting porch-related space, which I combined into one variable: TotalPorchSF. Many of the variables are also related to each other. For example, high garage capacity (number of cars) usually indicates a large garage size, so I kept only the number of cars for my analysis. In total, I removed 16 variables that either had low predictive power or duplicated information found elsewhere.
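As a rough sketch of this consolidation step: the five porch column names below are what I assume the Kaggle data calls them, and GarageArea and GarageCars are likewise assumed names for the garage size and capacity variables. The two rows of data are made up.

```python
import pandas as pd

# Assumed names for the five porch-related columns in the Kaggle data.
porch_cols = ["WoodDeckSF", "OpenPorchSF", "EnclosedPorch", "3SsnPorch", "ScreenPorch"]

# Hypothetical two-home example.
df = pd.DataFrame({
    "WoodDeckSF": [0, 120],
    "OpenPorchSF": [61, 0],
    "EnclosedPorch": [0, 0],
    "3SsnPorch": [0, 0],
    "ScreenPorch": [0, 40],
    "GarageCars": [2, 3],
    "GarageArea": [548, 800],
})

# Collapse the five porch columns into a single total.
df["TotalPorchSF"] = df[porch_cols].sum(axis=1)

# Drop the originals, plus the garage area that duplicates garage capacity.
df = df.drop(columns=porch_cols + ["GarageArea"])

print(df)
```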
Another issue was outliers. I removed a total of 5 observations from the training set. Two were homes priced very low in comparison to their size: the two data points at the bottom right of the plot below. The remaining three were removed because they represented extremely rare exterior materials; there simply was not enough data to tell us about the quality of those materials.
The distribution of the sale price of homes in Ames, Iowa is right-skewed, so I took the log of the sale price. As you can see, the log-transformed price is closer to being normally distributed. I used the log of the sale price as the target variable when building my machine learning models.
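To illustrate the effect of the log transform, here is a small sketch using synthetic, right-skewed prices rather than the actual Ames data. The lognormal parameters and the skewness helper are assumptions made for the example.

```python
import numpy as np

# Synthetic right-skewed "sale prices" (not the Ames data).
rng = np.random.default_rng(0)
prices = rng.lognormal(mean=12.0, sigma=0.4, size=1000)

def skewness(x):
    """Sample skewness: third central moment over the cubed standard deviation."""
    x = np.asarray(x, dtype=float)
    centered = x - x.mean()
    return (centered ** 3).mean() / (centered ** 2).mean() ** 1.5

log_prices = np.log(prices)

skew_raw = skewness(prices)      # clearly positive: long right tail
skew_log = skewness(log_prices)  # close to zero: roughly symmetric

print(f"skewness of raw prices: {skew_raw:.2f}")
print(f"skewness of log prices: {skew_log:.2f}")

# A model trained on log prices predicts on the log scale;
# np.exp converts those predictions back to dollars.
recovered = np.exp(log_prices)
```

One consequence of this choice is that model errors are measured on the log scale, so they behave like relative (percentage) errors rather than dollar errors.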
I tried a variety of linear and tree-based models, including LASSO, Ridge, Random Forest, and Gradient Boosting. Of these, the tree-based models gave the best results. Ultimately, I chose Random Forest for my analysis, as it performed best when considering both speed and accuracy.
One of the benefits of the Random Forest model is that adding more trees does not cause it to overfit the data. However, more trees slow down computation while offering little improvement in results. I chose 500 trees for my model, as I did not see an improvement in accuracy when increasing the number of trees past 500.
| # of trees | Test RMSE | Kaggle Score |
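A comparison like this can be run with a short loop over tree counts. The sketch below assumes scikit-learn and uses synthetic regression data in place of the Ames features, so its numbers say nothing about the actual results; the tree counts tried (50, 100, 500) are also just illustrative.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the housing features and (log) sale price.
X, y = make_regression(n_samples=300, n_features=10, noise=10.0, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

# Fit one forest per tree count and record the test RMSE for each.
rmse_by_trees = {}
for n_trees in (50, 100, 500):
    model = RandomForestRegressor(n_estimators=n_trees, random_state=0)
    model.fit(X_train, y_train)
    mse = mean_squared_error(y_test, model.predict(X_test))
    rmse_by_trees[n_trees] = float(np.sqrt(mse))

for n_trees, rmse in rmse_by_trees.items():
    print(f"{n_trees:>4} trees: test RMSE = {rmse:.2f}")
```

If the RMSE stops improving between two tree counts, the smaller forest is usually the better choice, since training and prediction time grow roughly in proportion to the number of trees.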
The second model is trained on 12 variables related to the materials and style of the home. The purpose was to identify areas of focus from a real estate development perspective. I was able to fit a Random Forest with an RMSE of 0.243, which is reasonably good considering that this model excludes many of the most important variables because they do not relate directly to home construction.
Below are the most important features for the home sale price model, and construction model, respectively.
Conclusion and Business Recommendations:
Based on the results of my model above, I recommend the following investment plan for home buyers:
- Invest big, in terms of square feet (not price)
- Improve quality and condition of the home through renovations
- Only consider homes with fireplaces
- Boosting overall home quality from 6 to 10 increases home value by $21,900 (on average). The following specific renovations are the most recommended:
- Improving kitchen quality increases home value by $11,100
- Improving fireplace increases home value by $3,800
- Improving basement increases home value by $7,300
I recommend the following home design for construction considerations:
- Poured concrete or wood foundation
- Stone masonry veneer
- Two story home with built-in garage
- Single-family detached home, or townhouse end unit
- Cement board or vinyl siding exterior
Finally, I recommend both buying and selling in the summer, when the most homes are on the market and the most people are looking to buy. Real estate developers should plan construction with this timeline in mind.
One issue with my model is that it tends to overpredict the price of cheaper homes and underpredict the price of more expensive homes. As a next step, I will also work on consolidating some of the related features in my main model.