Finding a dream house? Data can help!

Posted on Aug 22, 2016

The Portland, Oregon metro area has a lot to offer. A 90-minute drive west brings you to the beach. In the opposite direction there's a snow-capped mountain. The area has a thriving food and wine scene, a burgeoning tech sector, and temperate weather. An increasing number of people would like to move here, and perhaps their greatest concern is finding housing. Data can help.

Records of house sales on redfin.com over the past three years in the neighboring cities of Portland, OR and Vancouver, WA were scraped using the Python packages BeautifulSoup and Selenium. Around 20,000 sold-house records with 15 variables were extracted, including zip code, address, location, selling price, days on Redfin, beds and baths, size, and price per sqft. Public school ratings were gathered by scraping greatschools.org by zip code. A screenshot is below.

Screen Shot 2016-08-15 at 7.47.17 PM-18
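To give a feel for the parsing step, here is a minimal sketch that pulls a few fields out of a listing-like HTML fragment with BeautifulSoup. The markup, class names, and sample values are hypothetical and do not reflect redfin.com's actual page structure; in a real run, Selenium would presumably load the page and hand over its HTML.

```python
from bs4 import BeautifulSoup

# Hypothetical markup for illustration: NOT the real redfin.com structure.
SAMPLE_HTML = """
<div class="listing">
  <span class="address">123 Example St</span>
  <span class="zip">97202</span>
  <span class="price">$350,000</span>
  <span class="sqft">1,750</span>
</div>
<div class="listing">
  <span class="address">456 Sample Ave</span>
  <span class="zip">98661</span>
  <span class="price">$240,000</span>
  <span class="sqft">1,600</span>
</div>
"""

def to_number(text):
    """Strip '$' and ',' from a scraped string and convert it to a float."""
    return float(text.replace("$", "").replace(",", "").strip())

def parse_listings(html):
    """Return one dict per listing card found in the HTML."""
    soup = BeautifulSoup(html, "html.parser")
    records = []
    for card in soup.select("div.listing"):
        price = to_number(card.select_one(".price").get_text())
        sqft = to_number(card.select_one(".sqft").get_text())
        records.append({
            "address": card.select_one(".address").get_text(strip=True),
            "zipcode": card.select_one(".zip").get_text(strip=True),
            "price": price,
            "sqft": sqft,
            "price_per_sqft": price / sqft,  # derived variable
        })
    return records

records = parse_listings(SAMPLE_HTML)
```

Keeping zip codes as strings rather than numbers avoids accidental arithmetic on them and makes later grouping by zip code straightforward.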

1). When is a good time to buy a house?

In the Portland metro area, the median house price per square foot increased 35% from Aug 2013 to Aug 2016. Transaction volume also increased over these three years.

price_sqft_portland_median
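A growth figure like this can be computed by taking the median price per square foot for each month and comparing the first and last months. The sketch below uses only the standard library; the six records are made-up values chosen to produce a 35% change, and the function names are my own.

```python
from collections import defaultdict
from statistics import median

# Made-up records for illustration: (sale month "YYYY-MM", price per sqft in USD).
sales = [
    ("2013-08", 140.0), ("2013-08", 160.0), ("2013-08", 150.0),
    ("2016-08", 190.0), ("2016-08", 210.0), ("2016-08", 202.5),
]

def monthly_median(records):
    """Group price-per-sqft values by month and take the median of each group."""
    by_month = defaultdict(list)
    for month, ppsf in records:
        by_month[month].append(ppsf)
    return {month: median(values) for month, values in by_month.items()}

def percent_change(start, end):
    """Percentage change from start to end."""
    return (end - start) / start * 100.0

medians = monthly_median(sales)
growth = percent_change(medians["2013-08"], medians["2016-08"])
```

The median is used rather than the mean so that a handful of very expensive houses does not dominate a month's value.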

House transaction volume is strongly seasonal: it is highest in summer and lowest in winter. The new school year starts in September, so the market is influenced by people looking to move into a desirable school district before then. Because an offer takes one or two months to close, the recorded sales lag behind the agreements: sales that close in summer were usually agreed on in spring. Spring is therefore the hottest season for the real estate market.

season_dependence
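The effect of the closing delay can be sketched by counting sales per season twice: once by closing date, and once after shifting every date back by an assumed lag. The 45-day lag is an assumption picked from the middle of the one-to-two-month range, and the dates are made up.

```python
from collections import Counter
from datetime import date, timedelta

# Made-up closing dates for illustration.
closing_dates = [
    date(2015, 1, 15), date(2015, 6, 10), date(2015, 6, 25),
    date(2015, 7, 5), date(2015, 7, 20), date(2015, 8, 12),
    date(2015, 12, 1),
]

# Month number -> season name.
SEASONS = {12: "winter", 1: "winter", 2: "winter",
           3: "spring", 4: "spring", 5: "spring",
           6: "summer", 7: "summer", 8: "summer",
           9: "fall", 10: "fall", 11: "fall"}

def season_counts(dates, lag_days=0):
    """Count sales per season after shifting each date back by lag_days."""
    shifted = (d - timedelta(days=lag_days) for d in dates)
    return Counter(SEASONS[d.month] for d in shifted)

by_closing = season_counts(closing_dates)                 # as recorded
by_agreement = season_counts(closing_dates, lag_days=45)  # assumed 45-day lag
```

With these sample dates, the busiest season by closing date is summer, while the busiest season after the shift is spring.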

2). Where are the best locations?

Vancouver, WA and Portland, OR are the twin cities of the metro area. Which is the better place to live? In general, housing in Portland is more expensive than in Vancouver, as shown in the boxplot. The red boxes show the price per square foot for each zip code area in Portland and the green boxes show those in Vancouver. In seven out of ten zip code areas in Portland, the median price is above $200/sqft. The median price per square foot in every zip code area of Vancouver is lower.

Port_Vanc_three_years
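The comparison behind a boxplot like this comes down to grouping price per square foot by zip code and checking each group's median against a threshold. Here is a minimal sketch with made-up prices; the helper names are hypothetical.

```python
from collections import defaultdict
from statistics import median

# Made-up rows for illustration: (city, zip code, price per sqft in USD).
rows = [
    ("Portland", "97202", 230.0), ("Portland", "97202", 250.0),
    ("Portland", "97206", 205.0), ("Portland", "97206", 215.0),
    ("Portland", "97220", 170.0),
    ("Vancouver", "98661", 150.0), ("Vancouver", "98661", 160.0),
    ("Vancouver", "98682", 140.0),
]

def median_by_zip(records):
    """Median price per sqft for each (city, zip code) pair."""
    grouped = defaultdict(list)
    for city, zipcode, ppsf in records:
        grouped[(city, zipcode)].append(ppsf)
    return {key: median(values) for key, values in grouped.items()}

def zips_above(medians, city, threshold):
    """Sorted zip codes in a city whose median exceeds the threshold."""
    return sorted(z for (c, z), m in medians.items()
                  if c == city and m > threshold)

medians = median_by_zip(rows)
portland_above_200 = zips_above(medians, "Portland", 200.0)
vancouver_above_200 = zips_above(medians, "Vancouver", 200.0)
```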

3). Which area has significant appreciation?

The growth rate of the price per unit area over the past three years indicates an area's appreciation potential. The top 10 zip codes by growth are in Portland and the bottom 10 are in Vancouver. Some areas in Portland, such as 97202, 97206, and 97220, have seen steadily increasing house prices, so buying a house there would conceivably be more rewarding in the long run. In comparison, price appreciation in Vancouver is relatively weak.

house_price_area-36
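Ranking zip codes by appreciation can be sketched as computing a growth rate per zip code from its starting and ending median price, then sorting. The medians below are made up for illustration and are not the values from the data set.

```python
# Made-up medians for illustration:
# zip code -> (median price per sqft at start, median price per sqft at end).
zip_medians = {
    "97202": (180.0, 252.0),
    "97206": (150.0, 204.0),
    "98661": (120.0, 138.0),
}

def growth_rate(start, end):
    """Fractional growth from start to end (0.4 means 40%)."""
    return (end - start) / start

def rank_by_growth(medians):
    """Return (zip code, growth rate) pairs, highest growth first."""
    rates = {z: growth_rate(s, e) for z, (s, e) in medians.items()}
    return sorted(rates.items(), key=lambda item: item[1], reverse=True)

ranking = rank_by_growth(zip_medians)
top_zip, top_rate = ranking[0]
```

Slicing the front and back of `ranking` would give top-N and bottom-N lists like the ones discussed above.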

4). Are the public schools good?

The public school ratings for elementary, middle, and high schools were scraped from the web. The ratings were averaged by zip code since the house data was segmented that way. As the plots show, the ratings at the three levels are positively correlated. The correlation is especially strong between elementary and middle school ratings, with a coefficient of about 0.91. A zip code with good elementary schools usually has good middle and high schools as well, and the same holds for poorly rated schools.

Rplot-38
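The two steps here, averaging ratings by zip code and then correlating the averages, can be sketched as follows. The ratings are made up, the 1 to 10 scale is an assumption, and the Pearson coefficient is written out by hand so the example needs only the standard library.

```python
from collections import defaultdict
from math import sqrt
from statistics import mean

# Made-up ratings for illustration: (zip code, school level, rating).
school_ratings = [
    ("97202", "elementary", 8), ("97202", "elementary", 6), ("97202", "middle", 7),
    ("97206", "elementary", 5), ("97206", "middle", 4), ("97206", "middle", 5),
    ("98661", "elementary", 3), ("98661", "middle", 3),
]

def average_by_zip(ratings, level):
    """Average rating per zip code for one school level."""
    grouped = defaultdict(list)
    for zipcode, lvl, rating in ratings:
        if lvl == level:
            grouped[zipcode].append(rating)
    return {z: mean(values) for z, values in grouped.items()}

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)

elem = average_by_zip(school_ratings, "elementary")
mid = average_by_zip(school_ratings, "middle")
zips = sorted(set(elem) & set(mid))  # zip codes rated at both levels
r = pearson([elem[z] for z in zips], [mid[z] for z in zips])
```

Only zip codes that have ratings at both levels enter the correlation, which is why the intersection of the two sets is taken first.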

As shown in the plots, the housing cost in Portland is higher than that in Vancouver for the same education quality. Also, as school ratings increase, so do house prices.

price_elementary_rating_oneyear_Portland_

price_high_rating_oneyear_Portland

5). Finally: Picking specific house features in your chosen area

Now you know when and where to narrow your target for house hunting based on price, location, and school district. What specific features do you care about in a house? The following plot shows the correlations among the numerical features. As previously explained, school ratings among elementary, middle, and high schools are highly correlated. The price, house size (sqft), number of bedrooms, and number of bathrooms are also positively correlated: larger houses with more bedrooms and bathrooms tend to sell for more. In general, one would expect smaller houses to be cheaper than larger houses. However, once price per square foot is taken into consideration, this relationship doesn't really hold.

Rplot06

As observed for houses in one zip code over a short time period (August 2015 to August 2016), house price is positively correlated with house size (sqft). However, the price per square foot shows a more complex trend. Below 2,000 sqft, the price per square foot drops as size increases. It is roughly constant from 2,000 to 5,000 sqft and decreases slightly above 5,000 sqft. Buying a house smaller than 2,000 sqft therefore means paying more for each square foot. The combination of house price and house size should be weighed before submitting an offer.

Linear

$sqft_area1
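One simple way to check a trend like this is to bin houses by size and compare the median price per square foot in each bin. The sketch below uses the 2,000 and 5,000 sqft break points mentioned above; the houses and prices are made up, and binning is my own simplification rather than the method behind the plots.

```python
from collections import defaultdict
from statistics import median

# Made-up houses for illustration: (size in sqft, sale price in USD).
houses = [
    (900, 225000), (1500, 330000), (1800, 360000),
    (2500, 450000), (3500, 630000), (4500, 810000),
    (5500, 880000),
]

# (lower bound inclusive, upper bound exclusive, label)
BINS = [
    (0, 2000, "under 2000"),
    (2000, 5000, "2000 to 5000"),
    (5000, float("inf"), "over 5000"),
]

def size_bin(sqft):
    """Return the label of the size bin a house falls into."""
    for low, high, label in BINS:
        if low <= sqft < high:
            return label
    raise ValueError(f"size out of range: {sqft}")

def median_ppsf_by_bin(records):
    """Median price per sqft within each size bin."""
    grouped = defaultdict(list)
    for sqft, price in records:
        grouped[size_bin(sqft)].append(price / sqft)
    return {label: median(values) for label, values in grouped.items()}

ppsf = median_ppsf_by_bin(houses)
```

In this toy data the smallest bin has the highest median price per square foot, which mirrors the pattern described above.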

As we have seen, analyzing historical housing records can help residents find the house they want, with data to back up the decision.

 

 

About Author

Jingyu Zhang

Jingyu Zhang has a Ph.D. in Electrical Engineering with an emphasis on optics and electronics. She worked as a display scientist at Sharp Labs of America for four years. Now she is passionate about data science and...