Ames Housing: Predicting House Prices with Machine Learning
Sundail Real Estate is dedicated to helping home buyers purchase their dream house. We built a tree-based model to forecast the sale price of a house in Ames by focusing on the features that influence the price most. The dataset comes from Kaggle's prediction competition, House Prices: Advanced Regression Techniques.
With these steps we were able to make accurate predictions:
1) Load data sets and packages
2) Multivariable Analysis
3) Clean Data and Impute Missing Data
4) Feature Engineering
5) Modeling and Predicting
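The write-up does not say how missing values were imputed in step 3. As a minimal sketch of one common approach (an assumption, not necessarily what we did), numeric columns can be filled with the median and categorical columns with the most frequent value. The tiny DataFrame below is made-up illustration data that borrows two column names from the Kaggle file.

```python
import numpy as np
import pandas as pd

# Made-up rows for illustration only; not taken from the real dataset.
df = pd.DataFrame({
    "LotFrontage": [65.0, np.nan, 80.0, 70.0],
    "GarageType": ["Attchd", None, "Detchd", "Attchd"],
})

# Split columns by type so each group gets a suitable fill value.
num_cols = df.select_dtypes(include="number").columns
cat_cols = df.columns.difference(num_cols)

# Numeric columns: fill gaps with the column median.
df[num_cols] = df[num_cols].fillna(df[num_cols].median())

# Categorical columns: fill gaps with the most frequent value (the mode).
for col in cat_cols:
    df[col] = df[col].fillna(df[col].mode().iloc[0])

print(df)
```

In the real data, a missing value in some columns (GarageType, for example) means the house lacks that feature, so a separate "none" category may suit those columns better than the mode.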
A correlation matrix heatmap gave us an idea of how strongly individual features correlate with the Sale Price.
The first-floor square footage of a house has a clear linear correlation with the Sale Price.
It was no surprise that the garage-related features are highly correlated with the Sale Price.
We also noticed that the OverallQual of a house is highly correlated with the Sale Price. This feature rates the overall material and finish of the house on a scale from 1 (very poor) to 10 (very excellent).
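The correlation screening described above can be sketched as follows. This is only an illustration: the data is synthetic, the coefficients are invented, and the column names (OverallQual, 1stFlrSF, GarageArea, SalePrice) are assumed to match the Kaggle file. With the real data you would load the training CSV instead and could pass the matrix to a plotting library to draw the heatmap.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for the training data; all values are invented.
rng = np.random.default_rng(0)
n = 200
overall_qual = rng.integers(1, 11, size=n)
first_flr_sf = rng.normal(1100, 300, size=n)
garage_area = rng.normal(450, 150, size=n)
noise = rng.normal(0, 20000, size=n)

df = pd.DataFrame({
    "OverallQual": overall_qual,
    "1stFlrSF": first_flr_sf,
    "GarageArea": garage_area,
    "SalePrice": 20000 * overall_qual + 60 * first_flr_sf
                 + 80 * garage_area + noise,
})

# Pairwise Pearson correlations between all numeric columns.
corr = df.corr()

# Rank the features by their correlation with the target.
top = corr["SalePrice"].drop("SalePrice").sort_values(ascending=False)
print(top)
```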
After feature engineering, we built a random forest regressor, which gave us the best predictions. We used grid search with cross-validation (GridSearchCV from sklearn's model_selection module) to find the best parameters for our tree-based model.
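A minimal sketch of that tuning step is shown below. The parameter grid, the fold count, and the synthetic regression data are assumptions chosen to keep the example small and fast; they are not the settings used in the project.

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import GridSearchCV, train_test_split

# Synthetic regression data standing in for the engineered features.
X, y = make_regression(n_samples=300, n_features=8, n_informative=4,
                       noise=10.0, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0)

# Hypothetical grid; the real search space is not given in the write-up.
param_grid = {"n_estimators": [50, 100], "max_depth": [None, 5]}

# Try every combination with 3-fold cross-validation on the training set.
search = GridSearchCV(RandomForestRegressor(random_state=0),
                      param_grid, cv=3)
search.fit(X_train, y_train)

# By default the best estimator is refit on the whole training set.
best_model = search.best_estimator_
train_r2 = best_model.score(X_train, y_train)  # R^2 on training data
test_r2 = best_model.score(X_test, y_test)     # R^2 on held-out data

print(search.best_params_)
print(f"train R^2: {train_r2:.3f}, test R^2: {test_r2:.3f}")
```

Comparing the training and held-out scores, as in the last lines, is a quick way to see how much the tuned model overfits.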
With a training score of 97.8% and a test score of 87.3%, the model identified the most important features for the sale price as the overall quality, the garage area, and the square footage of the basement, first floor, and second floor of a house.
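A feature ranking like the one above can be read from a fitted random forest through its impurity-based feature_importances_ attribute. The sketch below uses synthetic data with invented coefficients; the column names are assumed to match the Kaggle file, and the resulting ranking only illustrates the technique, not our actual results.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestRegressor

# Synthetic features; all values and coefficients are invented.
rng = np.random.default_rng(0)
n = 300
X = pd.DataFrame({
    "OverallQual": rng.integers(1, 11, size=n),
    "GarageArea": rng.normal(450, 150, size=n),
    "TotalBsmtSF": rng.normal(1000, 300, size=n),
    "1stFlrSF": rng.normal(1100, 300, size=n),
    "2ndFlrSF": rng.normal(400, 200, size=n),
})
y = (20000 * X["OverallQual"] + 80 * X["GarageArea"]
     + 40 * X["TotalBsmtSF"] + 50 * X["1stFlrSF"]
     + 45 * X["2ndFlrSF"] + rng.normal(0, 15000, size=n))

model = RandomForestRegressor(n_estimators=100, random_state=0)
model.fit(X, y)

# Impurity-based importances; they sum to 1 across all features.
importances = pd.Series(model.feature_importances_, index=X.columns)
importances = importances.sort_values(ascending=False)
print(importances)
```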