Using Data to Predict House Sale Prices in Ames, Iowa
The skills the author demonstrated here can be learned through the Data Science with Machine Learning bootcamp at NYC Data Science Academy.
Introduction
Ames, Iowa is the college town of Iowa State University. The Ames housing data set consists of about 2,500 house sale records from 2006 to 2010, with detailed information about each house's attributes and its sale price.
It has 81 features describing each house. They focus on the quality and quantity of many physical attributes of the property, such as the number of rooms, lot size, year built, overall quality, and exterior quality.
The aims of the project:
- Perform descriptive data analysis to gain business insights.
- Build descriptive machine learning models to understand the local housing market.
- Build predictive machine learning models for the local house price prediction.
Results and Findings:
Feature Correlation to Price
The following features have a correlation score above 0.5 with the sale price column:
- OverallQual
- GrLivArea
- ExterQual
- KitchenQual
- TotalBsmtSF
- 1stFlrSF
- GarageCars
- GarageArea
- BsmtQual
- YearBuilt
- FullBath
- FireplaceQu
- GarageYrBlt
- YearRemodAdd
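A filter like the one that produced the list above could be sketched with pandas as follows. The six-row table is made up for illustration, and the ordinal encoding of the text quality grades (Po to Ex mapped to 1 to 5) is an assumption; the source does not say how columns such as ExterQual were converted to numbers.

```python
import pandas as pd

# Hypothetical miniature sample; the real data set has about 2,500 rows.
df = pd.DataFrame({
    "SalePrice":   [208500, 181500, 223500, 140000, 250000, 143000],
    "OverallQual": [7, 6, 7, 5, 8, 5],
    "GrLivArea":   [1710, 1262, 1786, 1100, 2198, 1362],
    "ExterQual":   ["Gd", "TA", "Gd", "TA", "Gd", "TA"],
    "OverallCond": [5, 8, 5, 7, 5, 6],
})

# Assumed ordinal encoding for the text quality grades.
quality_scale = {"Po": 1, "Fa": 2, "TA": 3, "Gd": 4, "Ex": 5}
df["ExterQual"] = df["ExterQual"].map(quality_scale)

# Pearson correlation of every column with the target, excluding the target itself.
corr = df.corr()["SalePrice"].drop("SalePrice")

# Keep only features whose correlation with the sale price exceeds 0.5.
strong = corr[corr > 0.5].sort_values(ascending=False)
print(strong)
```

In this toy sample OverallCond is negatively correlated with the price, so it is filtered out while the other three columns remain.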
In general, the house price increases as quality increases.
People tend to buy more houses in the middle of the year (May to August). This may also be related to warmer temperatures and the school holiday period.
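One way to check the seasonal pattern is to count sales per month. The sketch below assumes the month of sale is stored in a column named MoSold with values 1 to 12; the ten sample values are invented for illustration.

```python
import pandas as pd

# Hypothetical month-of-sale values (1 = January, 12 = December).
sales = pd.DataFrame({"MoSold": [6, 7, 5, 6, 1, 7, 12, 6, 8, 3]})

# Number of sales in each calendar month, including months with no sales.
per_month = sales["MoSold"].value_counts().reindex(range(1, 13), fill_value=0)

# Share of all sales that fall between May and August (labels 5..8, inclusive).
share_may_aug = per_month.loc[5:8].sum() / per_month.sum()
print(per_month)
print(f"May-August share: {share_may_aug:.0%}")
```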
Average Sale Price and Neighborhood tend to be correlated.
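The neighborhood effect can be inspected with a simple group-by. This is a minimal sketch on invented rows; the column names Neighborhood and SalePrice are assumed to match the data set.

```python
import pandas as pd

# Hypothetical sample rows for three neighborhoods.
houses = pd.DataFrame({
    "Neighborhood": ["NridgHt", "NAmes", "NridgHt", "OldTown", "NAmes", "OldTown"],
    "SalePrice":    [320000, 150000, 300000, 120000, 140000, 130000],
})

# Mean, median, and number of sales per neighborhood, most expensive first.
by_area = (
    houses.groupby("Neighborhood")["SalePrice"]
    .agg(["mean", "median", "count"])
    .sort_values("mean", ascending=False)
)
print(by_area)
```

Reporting the count next to the mean helps spot neighborhoods whose average rests on only a handful of sales.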
The sale price increases as the ground living area increases. The overall quality rating, which reflects the materials used in the construction of the house, also has a direct relationship with the price.
The sale price also increases with the total basement area in square feet. For basement areas above 2,000 sq ft, the BLQ type dominates in terms of sales.
Lot size in square feet has little effect on the price of the house; in general, most houses have small lots. In contrast, the exterior quality affects the price directly.
The majority of buyers choose one-story or two-story homes of the single-family building type.
Buyers show little interest in whether a house has a swimming pool, and this holds across the years of sale.
Customers tend to buy houses that have heating and central cooling units.
The Modeling:
All Features Approach:
We used all features plus some newly derived features.
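The source does not list which derived features were created. As a purely illustrative sketch, the hypothetical features below (TotalSF, HouseAge, TotalBath) show how such columns are commonly built from existing ones; the input column names are assumed to match the data set.

```python
import pandas as pd

# Two hypothetical rows using assumed column names.
df = pd.DataFrame({
    "TotalBsmtSF": [856, 1262],
    "1stFlrSF":    [856, 1262],
    "2ndFlrSF":    [854, 0],
    "YearBuilt":   [2003, 1976],
    "YrSold":      [2008, 2007],
    "FullBath":    [2, 2],
    "HalfBath":    [1, 0],
})

# Hypothetical derived features, not confirmed by the source:
# total floor area, age of the house at sale, and a weighted bathroom count.
df["TotalSF"] = df["TotalBsmtSF"] + df["1stFlrSF"] + df["2ndFlrSF"]
df["HouseAge"] = df["YrSold"] - df["YearBuilt"]
df["TotalBath"] = df["FullBath"] + 0.5 * df["HalfBath"]

print(df[["TotalSF", "HouseAge", "TotalBath"]])
```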
Partial Features Approach:
We chose the 15 features most highly correlated with the sale price.
Model Comparison:
[Figure: model comparison scores, all features vs. the partial set of highly correlated features]
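A comparison between the two approaches could be set up along the following lines. This is only a sketch under stated assumptions: the data is synthetic, the model is ordinary least squares fitted with NumPy (the source does not name the models used), k = 3 stands in for the 15 selected features, and the helper cv_rmse is a hypothetical name introduced here.

```python
import numpy as np

rng = np.random.default_rng(0)
n, p, k = 300, 10, 3

# Synthetic features; only the first three columns carry signal.
X = rng.normal(size=(n, p))
y = 3.0 * X[:, 0] + 2.5 * X[:, 1] + 2.0 * X[:, 2] + rng.normal(scale=0.5, size=n)

def cv_rmse(X, y, folds=5):
    """Mean RMSE of a least-squares fit over contiguous cross-validation folds."""
    idx = np.arange(len(y))
    errors = []
    for test_idx in np.array_split(idx, folds):
        train_idx = np.setdiff1d(idx, test_idx)
        # Prepend a column of ones so the model has an intercept.
        A_train = np.column_stack([np.ones(len(train_idx)), X[train_idx]])
        A_test = np.column_stack([np.ones(len(test_idx)), X[test_idx]])
        coef, *_ = np.linalg.lstsq(A_train, y[train_idx], rcond=None)
        errors.append(np.sqrt(np.mean((A_test @ coef - y[test_idx]) ** 2)))
    return float(np.mean(errors))

# Rank features by absolute correlation with the target and keep the top k.
# Note: selecting on the full data before cross-validation leaks a little
# information; a stricter setup would repeat the selection inside each fold.
corr = np.array([np.corrcoef(X[:, j], y)[0, 1] for j in range(p)])
top_k = np.argsort(-np.abs(corr))[:k]

rmse_all = cv_rmse(X, y)
rmse_partial = cv_rmse(X[:, top_k], y)
print(f"all features: {rmse_all:.3f}, top-{k} features: {rmse_partial:.3f}")
```

Scoring both feature sets with the same folds keeps the comparison fair, since each model is evaluated on identical held-out rows.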
Geo Area of Ames:
Future Work:
Apply different types of models:
- Try more models on the data and check whether their prediction accuracy is better.
Enhance existing models:
- Filter the features using feature engineering methods to improve the accuracy score.
Enhance mapping the data on the map:
- Tune and improve the mapping of locations onto the map by increasing the number of points and coloring them by price.