Using Predictive ML to Value Basement Renovation Projects
[__working draft__ ]
What is the Goal of this Project?
This project focuses on developing a Machine Learning model to predict how a basement renovation project will affect the value of a house.
This project uses the classic housing dataset from Ames, Iowa.
It covers individual residential properties sold in Ames, Iowa from 2006 to 2010, with a total of 2930 observations and a large number of explanatory variables (23 nominal, 23 ordinal, 14 discrete, and 20 continuous). The data came directly from the Assessor’s Office in the form of a data dump from their records system.
After removal of extraneous variables, 80 variables remained that were directly related to property sales.
The data comprises many attributes of a property that a potential buyer could factor into their decision to buy a house and how much they are willing to pay for it.
Who is it For?
This information would be valuable to several categories of potential stakeholders:
- Current homeowner - who is trying to be pragmatic about how much to spend on the renovation.
- Prospective buyer - who intends to flip the house by buying it, changing or upgrading aspects of the home, and then reselling it.
- Contractor or renovation company - who wants to understand the project and home context when providing project bids and estimates.
Exploratory Data Analysis
Before developing the ML model, we want to understand the details of the original dataset.
There are many details in the raw data that could be interesting to explore. Here, however, we look at the data with the goal of finding information that helps in developing the ML model.
Some features of note:
Neighborhood has a clear relationship to Price
Exterior Quality also has a clear positive relationship with Sale Price.
However, 'Exterior Condition' displays a surprising plateau for anything above a Typical/Average rating.
We run a simple correlation analysis to check how strongly each of the original features correlates with the Sale Price of the house.
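As a minimal sketch of this correlation check, the snippet below ranks a few features by their Pearson correlation with sale price. The column names follow the public Ames data dictionary, but the values are made up for illustration and the helper names `pearson` and `rank_by_correlation` are hypothetical. On the full dataset, a library call such as pandas `DataFrame.corr` would do the same job.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

# Toy columns with made-up values; names mirror the Ames data dictionary.
houses = {
    "GrLivArea":  [1200, 1500, 1800, 2100, 2400],
    "BsmtFinSF1": [0, 400, 300, 800, 900],
    "YrSold":     [2008, 2006, 2010, 2007, 2009],
    "SalePrice":  [130000, 165000, 190000, 240000, 265000],
}

def rank_by_correlation(table, target):
    """Return (feature, r) pairs sorted by absolute correlation with target."""
    scores = [(name, pearson(col, table[target]))
              for name, col in table.items() if name != target]
    return sorted(scores, key=lambda pair: abs(pair[1]), reverse=True)

ranking = rank_by_correlation(houses, "SalePrice")
for name, r in ranking:
    print(f"{name:12s} {r:+.3f}")
```

Sorting by absolute value keeps strongly negative correlations near the top, which matters when a feature lowers the price.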
In developing the predictive model, the first goal is to predict the price of a house accurately, so we are interested in every feature we can use. Second, because we are investigating basement renovations, we look specifically at the importance of basement-related features. The amount of finished basement square footage is shown in green, and the year remodeled in light blue.
The Predictive ML model is based on an ensembling method known as a voting regressor. This takes the predictions from several model types and linearly combines their individual predictions to produce a final predicted Sale Price.
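The combination step of a voting regressor is a weighted average of the base predictions. The sketch below illustrates only that step: the three "models" are hand-written stand-in functions with invented coefficients, not the trained models used in this project. scikit-learn's `VotingRegressor` applies the same averaging to fitted estimators.

```python
def voting_predict(models, features, weights=None):
    """Linearly combine base-model predictions into one Sale Price estimate."""
    if weights is None:
        weights = [1.0] * len(models)      # equal vote for every model
    preds = [model(features) for model in models]
    return sum(w * p for w, p in zip(weights, preds)) / sum(weights)

# Hypothetical stand-ins for trained base models (coefficients are invented).
def linear_model(f):
    return 50_000 + 80 * f["GrLivArea"]

def tree_like_model(f):
    return 210_000 if f["GrLivArea"] > 1600 else 150_000

def basement_aware_model(f):
    return 60_000 + 70 * f["GrLivArea"] + 30 * f["BsmtFinSF1"]

house = {"GrLivArea": 1800, "BsmtFinSF1": 500}
models = [linear_model, tree_like_model, basement_aware_model]

equal_vote = voting_predict(models, house)
weighted_vote = voting_predict(models, house, weights=[2, 1, 1])
print(round(equal_vote), round(weighted_vote))
```

Averaging tends to smooth out the individual errors of the base models, which is one reason a simple vote can compete with a more complex stacked model.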
The Data Preprocessing and Model Structure are summarized by the two following graphics.
More details about the preprocessing, feature engineering, and model training can be found in the technical appendix section.
The following graph shows the accuracy of our model for different test observations.
We compare the performance of our models on the training and test sets. Although the stacked model slightly outperforms the Voting Regressor on RMSLE (root mean squared logarithmic error), the difference is small and the Voting Regressor has a lower bias, so the final implementation of our prediction model uses the Voting Regressor.
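For readers unfamiliar with the metric, here is a minimal sketch of how RMSLE can be computed. The price lists are placeholders, not results from this project, and the names `model_a_preds` and `model_b_preds` are hypothetical.

```python
import math

def rmsle(actual, predicted):
    """Root mean squared logarithmic error between two price lists."""
    squared_log_errors = [
        (math.log1p(p) - math.log1p(a)) ** 2   # log1p(x) = log(1 + x)
        for a, p in zip(actual, predicted)
    ]
    return math.sqrt(sum(squared_log_errors) / len(squared_log_errors))

# Placeholder prices for illustration only.
actual_prices = [150_000, 200_000, 320_000]
model_a_preds = [155_000, 195_000, 310_000]
model_b_preds = [152_000, 203_000, 335_000]

print("model A:", round(rmsle(actual_prices, model_a_preds), 4))
print("model B:", round(rmsle(actual_prices, model_b_preds), 4))
```

Because the error is taken in log space, a 10% miss on a cheap house counts about as much as a 10% miss on an expensive one.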
Now we can look at a simple visualization comparing the predictions of our model against the actual Sale Price in the training data. On first inspection of this graph, our approach looks to be accurate across the entire collection of training data.
One final check of the integrity of our model is the normal q-q plot, which indicates that the residuals of the Voting Regressor model are approximately normally distributed, the desired outcome.
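The points behind a normal q-q plot can be sketched as follows: each sorted residual is paired with the quantile a fitted normal distribution would place at the same rank. The residual values are placeholders, and the plotting position `(i + 0.5) / n` is one common convention assumed here, not necessarily the one used for the original graphic.

```python
from statistics import NormalDist, mean, stdev

def normal_qq_points(residuals):
    """Pair each sorted residual with its theoretical normal quantile."""
    n = len(residuals)
    fitted = NormalDist(mean(residuals), stdev(residuals))
    ordered = sorted(residuals)
    # (i + 0.5) / n is one common choice of plotting position.
    theoretical = [fitted.inv_cdf((i + 0.5) / n) for i in range(n)]
    return list(zip(theoretical, ordered))

# Placeholder residuals for illustration only.
residuals = [-2.0, 1.0, 0.0, 2.0, -1.0]
points = normal_qq_points(residuals)
for expected, observed in points:
    print(f"{expected:+.3f}  {observed:+.3f}")
```

If the residuals are close to normal, these pairs fall near a straight line when plotted.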
Inspecting the ML Model
After the model is trained and tested we can use Permutation Importance to determine how different features impact the house price.
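The idea behind permutation importance is to shuffle one feature column at a time and measure how much the model's error grows. The sketch below assumes a hypothetical stand-in `price_model` and toy rows; it is not the project's trained model. scikit-learn provides the same technique as `sklearn.inspection.permutation_importance`.

```python
import random

def price_model(row):
    """Hypothetical stand-in for the trained model; it ignores YrSold."""
    return 50_000 + 80 * row["GrLivArea"] + 30 * row["BsmtFinSF1"]

def mean_abs_error(model, rows, targets):
    return sum(abs(model(r) - t) for r, t in zip(rows, targets)) / len(rows)

def permutation_importance(model, rows, targets, feature, n_repeats=20, seed=0):
    """Average increase in error after shuffling one feature column."""
    rng = random.Random(seed)
    baseline = mean_abs_error(model, rows, targets)
    increases = []
    for _ in range(n_repeats):
        shuffled = [r[feature] for r in rows]
        rng.shuffle(shuffled)
        # Rebuild the rows with only this one column permuted.
        permuted = [{**r, feature: v} for r, v in zip(rows, shuffled)]
        increases.append(mean_abs_error(model, permuted, targets) - baseline)
    return sum(increases) / n_repeats

# Toy rows with made-up values; targets equal the model output for simplicity.
rows = [
    {"GrLivArea": 1200, "BsmtFinSF1": 0,   "YrSold": 2008},
    {"GrLivArea": 1500, "BsmtFinSF1": 400, "YrSold": 2006},
    {"GrLivArea": 1800, "BsmtFinSF1": 300, "YrSold": 2010},
    {"GrLivArea": 2100, "BsmtFinSF1": 800, "YrSold": 2007},
    {"GrLivArea": 2400, "BsmtFinSF1": 900, "YrSold": 2009},
]
targets = [price_model(r) for r in rows]

importances = {name: permutation_importance(price_model, rows, targets, name)
               for name in rows[0]}
for name, score in importances.items():
    print(f"{name:12s} {score:10.1f}")
```

A feature the model never uses, such as `YrSold` in this toy setup, scores zero because shuffling it cannot change any prediction.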
The Role of Basement Attributes
Looking at the specifics of permutation importance of Basement-related features reveals some clear trends:
- Amount of Unfinished Basement has Negative Impact on Value
- Amount of Square Feet Finished in the Basement Increases Value
- Year of Remodel affects the price, but the behavior of the relationship is not quite linear
Now, let's take a look at the Partial Dependence Plot for two features: Total Basement Square Feet and Basement Square Feet that are finished.
The color scale on the right represents the sale price: larger numbers on the color scale indicate a higher house value.
Compare the red circles, which mark scenarios where all of the square footage in a basement is finished, with the black circles at the bottom, which mark basements of the same size with less of the area finished. The value of the house is higher when the entire basement is finished.
This is a result we would expect based on intuition but it is a good validation for our model. Next, we will quantify this result.
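A two-feature partial dependence surface like the one discussed above can be sketched by forcing both features to each grid value for every house and averaging the predictions. The model, its coefficients, and the column name `BsmtFinSF` are hypothetical stand-ins chosen for illustration.

```python
def price_model(row):
    """Hypothetical stand-in for the trained model (invented coefficients)."""
    return (40_000 + 75 * row["GrLivArea"]
            + 20 * row["TotalBsmtSF"] + 25 * row["BsmtFinSF"])

def partial_dependence_2d(model, rows, feat_a, grid_a, feat_b, grid_b):
    """Average prediction with two features forced to each grid pair."""
    surface = {}
    for a in grid_a:
        for b in grid_b:
            if b > a:
                continue   # finished area cannot exceed total basement area
            preds = [model({**r, feat_a: a, feat_b: b}) for r in rows]
            surface[(a, b)] = sum(preds) / len(preds)
    return surface

# Toy rows with made-up values.
rows = [
    {"GrLivArea": 1500, "TotalBsmtSF": 900,  "BsmtFinSF": 0},
    {"GrLivArea": 2000, "TotalBsmtSF": 1100, "BsmtFinSF": 600},
]
surface = partial_dependence_2d(
    price_model, rows,
    "TotalBsmtSF", [800, 1200],
    "BsmtFinSF", [0, 600, 1200],
)
fully_finished = surface[(1200, 1200)]
unfinished = surface[(1200, 0)]
print(fully_finished - unfinished)
```

Comparing two points with the same total size but different finished area isolates the effect of finishing the basement from the effect of simply having a bigger one.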
How Much Value does a Renovation Add?
Which Houses will serve as our candidates for a basement remodel?
- greater than 1,000 square feet
- more than 95% unfinished
- Condition of ‘typical’ or better (no moisture issues)
- Ceiling Height: 90 - 100+ inches (standard 8ft ceiling is 96 in.)
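The selection criteria above can be sketched as a simple filter. The column names and rating codes are assumptions based on the public Ames data dictionary, where `BsmtQual` encodes basement height (`Gd` for 90-99 inches, `Ex` for 100+ inches) and `BsmtCond` encodes condition; the project's preprocessed columns may differ.

```python
CONDITION_OK = {"TA", "Gd", "Ex"}   # 'typical' or better
HEIGHT_OK = {"Gd", "Ex"}            # 90-99 in. and 100+ in.

def is_remodel_candidate(house):
    """Apply the four candidate criteria to one house record."""
    total = house["TotalBsmtSF"]
    if total <= 1000:                # must be greater than 1,000 sq ft
        return False
    unfinished_share = house["BsmtUnfSF"] / total
    return (unfinished_share > 0.95
            and house["BsmtCond"] in CONDITION_OK
            and house["BsmtQual"] in HEIGHT_OK)

# Toy records with made-up values.
houses = [
    {"Id": 1, "TotalBsmtSF": 1200, "BsmtUnfSF": 1200, "BsmtCond": "TA", "BsmtQual": "Gd"},
    {"Id": 2, "TotalBsmtSF": 1400, "BsmtUnfSF": 700,  "BsmtCond": "Gd", "BsmtQual": "Ex"},
    {"Id": 3, "TotalBsmtSF": 900,  "BsmtUnfSF": 900,  "BsmtCond": "TA", "BsmtQual": "Gd"},
    {"Id": 4, "TotalBsmtSF": 1500, "BsmtUnfSF": 1500, "BsmtCond": "Fa", "BsmtQual": "Gd"},
]
candidate_ids = [h["Id"] for h in houses if is_remodel_candidate(h)]
print(candidate_ids)
```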
After the Remodel
- Entire Basement is Finished
- Finish Type is ‘Good Living Quarters’ - the highest encoded value
- Remodel Year is updated to most recent year in data
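The remodel scenario can be sketched as a what-if: copy the house record, apply the three changes listed above, and compare the model's prediction before and after. The column names and the `GLQ` code are assumptions based on the Ames data dictionary, and `price_model` is a hypothetical stand-in with invented coefficients.

```python
def apply_remodel(house, latest_year):
    """Return a copy of the record with a fully finished basement."""
    remodeled = dict(house)                       # leave the original untouched
    remodeled["BsmtFinSF1"] = house["TotalBsmtSF"]
    remodeled["BsmtUnfSF"] = 0
    remodeled["BsmtFinType1"] = "GLQ"             # Good Living Quarters
    remodeled["YearRemodAdd"] = latest_year
    return remodeled

def remodel_uplift(model, house, latest_year):
    """Return the predicted change in value in dollars and in percent."""
    before = model(house)
    after = model(apply_remodel(house, latest_year))
    return after - before, 100 * (after - before) / before

def price_model(row):
    """Hypothetical stand-in for the trained model (invented coefficients)."""
    bonus = 5_000 if row["BsmtFinType1"] == "GLQ" else 0
    return (100_000 + 15 * row["BsmtFinSF1"] - 2 * row["BsmtUnfSF"]
            + 100 * (row["YearRemodAdd"] - 1950) + bonus)

house = {"TotalBsmtSF": 1200, "BsmtFinSF1": 0, "BsmtUnfSF": 1200,
         "BsmtFinType1": "Unf", "YearRemodAdd": 1990}

dollars, percent = remodel_uplift(price_model, house, latest_year=2010)
print(f"${dollars:,} ({percent:.1f}%)")
```

Holding every other attribute fixed means the difference between the two predictions is attributable to the remodel alone, at least as far as the model is concerned.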
Remodeling Results as Percent
Remodeling Results in $
Additional Notes On the Outcomes
- median and mean increase of house value:
- a small number of outlying results (such as a negative change in house value) that merit further investigation
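Summaries like these can be computed with a few lines once the per-house uplifts are available. The numbers below are placeholders for illustration, not results from this project.

```python
import statistics

# Placeholder dollar uplifts, one per candidate house (not project results).
uplifts = [12_000, 18_500, 9_000, -1_500, 22_000]

mean_uplift = statistics.mean(uplifts)
median_uplift = statistics.median(uplifts)

# Flag houses whose predicted value dropped, for follow-up investigation.
negative_cases = [(i, u) for i, u in enumerate(uplifts) if u < 0]

print(f"mean: {mean_uplift:,.0f}  median: {median_uplift:,.0f}")
print("negative cases:", negative_cases)
```

Reporting the median alongside the mean shows whether a handful of outliers is pulling the average.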