Predicting Housing Prices in Ames: A Machine Learning Project
Introduction
In this project, we use machine learning to predict housing prices in Ames, Iowa. The data comes from the Kaggle competition "House Prices: Advanced Regression Techniques". The team consists of Gregory Brucchieri, Billy Fallon and Adrian Phillips-Samuels. We are The Fighting Mongooses.
The Data
The training set contained 1,460 observations of 79 features plus the target variable, SalePrice. Of the features, 28 were continuous and 51 categorical, and 34 contained missing values. We imputed these with a variety of techniques, usually guided by each variable's definition in the data description; the details are in the final code. Wherever a categorical variable had a natural order, we encoded it as ordinal. We also observed two outliers with an unusual price-to-square-footage ratio and chose robust scaling so that these values would not distort the scaling of the features. Finally, we engineered a number of additional features and replaced the remaining categorical variables with one-hot encodings.
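The preprocessing steps above can be sketched in pandas. This is a minimal illustration on a four-row toy DataFrame, not the project's actual pipeline (which lives in the repo). The column names come from the Kaggle Ames data, but the specific choices are assumptions picked as typical examples: imputing LotFrontage with the neighborhood median, mapping quality ratings onto a 0 to 5 scale, and a hypothetical engineered feature TotalSF. Robust scaling is done by hand with the median and interquartile range, which is what scikit-learn's RobustScaler does at its default settings.

```python
import numpy as np
import pandas as pd

# Ordinal scale used by several Ames quality columns (e.g. KitchenQual, BsmtQual).
# Mapping "no basement" (a missing value) to 0 is an assumption for illustration.
QUALITY_MAP = {"None": 0, "Po": 1, "Fa": 2, "TA": 3, "Gd": 4, "Ex": 5}


def preprocess(df: pd.DataFrame) -> pd.DataFrame:
    """Illustrative cleaning steps; not the project's actual pipeline."""
    out = df.copy()

    # Impute LotFrontage with the median of the same neighborhood (one common choice).
    out["LotFrontage"] = out.groupby("Neighborhood")["LotFrontage"].transform(
        lambda s: s.fillna(s.median())
    )

    # In BsmtQual, a missing value means the house has no basement.
    out["BsmtQual"] = out["BsmtQual"].fillna("None")

    # Encode ordered quality ratings as integers rather than one-hot columns.
    for col in ["BsmtQual", "KitchenQual"]:
        out[col] = out[col].map(QUALITY_MAP)

    # Hypothetical engineered feature: total area of basement plus both floors.
    out["TotalSF"] = out["TotalBsmtSF"] + out["1stFlrSF"] + out["2ndFlrSF"]

    # Robust scaling: subtract the median and divide by the IQR, so a few
    # extreme houses do not dominate the scale. A zero IQR is replaced by 1.
    num_cols = ["LotFrontage", "TotalSF"]
    median = out[num_cols].median()
    iqr = out[num_cols].quantile(0.75) - out[num_cols].quantile(0.25)
    out[num_cols] = (out[num_cols] - median) / iqr.where(iqr != 0, 1.0)

    # One-hot encode the remaining nominal categorical column.
    return pd.get_dummies(out, columns=["Neighborhood"], dtype=int)


# Toy data with a missing LotFrontage and a house without a basement.
toy = pd.DataFrame({
    "Neighborhood": ["NAmes", "NAmes", "CollgCr", "CollgCr"],
    "LotFrontage": [60.0, np.nan, 80.0, 70.0],
    "BsmtQual": ["TA", np.nan, "Gd", "Ex"],
    "KitchenQual": ["TA", "Fa", "Gd", "Gd"],
    "TotalBsmtSF": [800, 0, 1000, 1200],
    "1stFlrSF": [900, 1000, 1100, 1300],
    "2ndFlrSF": [0, 500, 700, 0],
})
result = preprocess(toy)
print(result)
```

Mapping ordered ratings to integers keeps a single column whose order tree-based and linear models can both use, whereas one-hot encoding would spread each rating over several unordered indicator columns.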
The Models
We tried a number of models to explain and predict sale price, including Random Forest, Gradient Boosting, XGBoost and linear models. Our best model was a weighted ensemble of Random Forest, Gradient Boosting and XGBoost, which received a Kaggle score of 0.1232 (the root mean squared error between the logarithms of predicted and actual sale prices, so lower is better).
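A weighted ensemble averages the models' predictions using fixed weights. The sketch below assumes the three models have already been fit and have produced price predictions for the same houses; the prices, predictions and weights (0.2, 0.4, 0.4) are made up for illustration and are not the project's values. It also computes the competition metric, RMSE on log prices, assuming the natural log.

```python
import numpy as np


def rmse_log(y_true, y_pred):
    """RMSE between log prices (natural log assumed), the competition's metric."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.sqrt(np.mean((np.log(y_pred) - np.log(y_true)) ** 2)))


def weighted_blend(predictions, weights):
    """Weighted average of per-model predictions for the same houses.

    np.average divides by the sum of the weights, so they need not sum to 1.
    """
    stacked = np.vstack(predictions)  # shape: (n_models, n_houses)
    return np.average(stacked, axis=0, weights=weights)


# Hypothetical sale prices and model predictions for three houses.
y_true = np.array([200000.0, 150000.0, 310000.0])
rf_pred = np.array([210000.0, 145000.0, 300000.0])   # Random Forest
gbm_pred = np.array([195000.0, 155000.0, 320000.0])  # Gradient Boosting
xgb_pred = np.array([205000.0, 148000.0, 305000.0])  # XGBoost

# Hypothetical weights; not the ones used in the project.
blend = weighted_blend([rf_pred, gbm_pred, xgb_pred], weights=[0.2, 0.4, 0.4])

for name, pred in [("RF", rf_pred), ("GBM", gbm_pred), ("XGB", xgb_pred), ("Blend", blend)]:
    print(f"{name:>5}: {rmse_log(y_true, pred):.4f}")
```

One common way to choose the weights is to compare this score on held-out data for a few candidate weightings and keep the best.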
All code and results are available in our GitHub repo. The final prediction code is in the Project_Consolidated.py file.