MyShoeBox - Finding Value in NYC Apartments

Zachary Escalante
Posted on May 24, 2016

Contributed by Zach Escalante. He is currently in the NYC Data Science Academy 12-week full-time Data Science Bootcamp program, taking place from April 11th to July 1st, 2016. This post is based on his third class project, Python web scraping (due in the 6th week of the program).

Introduction:

Whether you're new to the Big Apple or a native Brooklynite, you know that apartment hunting can be a painful hassle. If it's your first time moving to the concrete jungle, it can feel overwhelming to figure out which neighborhoods provide the best value, or where to find a flat with four bedrooms for you and your friends. Perhaps you're done with roommates and want to see how much you need to budget for a studio within walking distance of your favorite West Village brunch spot.

Well, now with MyShoeBox, you have access to all this information and more with the click of a button. Never again will you have to second-guess placing a deposit on an apartment because the broker says it won't last: you'll know exactly how many apartments are just like it and how much they cost!

How It Works:

MyShoeBox gathers the listing data from www.streeteasy.com and buckets it by neighborhood. With this data, users can see a breakdown of what they can expect to find in each neighborhood and how much they should expect to pay. This way, when they find the apartment of their dreams, they can have the confidence to make a quick decision.

Data Collection:

The first challenge in building this application was obtaining the data from StreetEasy. To do this, I used the BeautifulSoup library in Python. I initialized a dictionary and then added each apartment as a nested dictionary inside it, with the apartment's data id as the key. After this, I added the apartment's attributes (beds, baths, square feet, price, neighborhood, coordinates) to the nested dictionary.
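
The nested-dictionary structure described above can be sketched roughly as follows. This is a minimal illustration, not the original scraping code: the real StreetEasy markup is not shown in this post, so the sample HTML, the tag and class names, the `data-*` attributes, and the function name `scrape_listings` are all assumptions made for the example.

```python
from bs4 import BeautifulSoup

# Hypothetical markup: the tag names, class names, and data-* attributes
# below are assumptions, not the actual structure of a StreetEasy page.
SAMPLE_HTML = """
<div class="item" data-id="1001" data-lat="40.7336" data-lng="-74.0027">
  <span class="price">$3,200</span>
  <span class="neighborhood">West Village</span>
</div>
<div class="item" data-id="1002" data-lat="40.7265" data-lng="-73.9815">
  <span class="price">$2,750</span>
  <span class="neighborhood">East Village</span>
</div>
"""


def scrape_listings(html):
    """Build a dictionary of listings keyed by each listing's data id."""
    soup = BeautifulSoup(html, "html.parser")
    apartments = {}
    for item in soup.find_all("div", class_="item"):
        data_id = item["data-id"]
        # Each apartment is a nested dictionary holding its attributes.
        apartments[data_id] = {
            "price": item.find("span", class_="price").get_text(strip=True),
            "neighborhood": item.find("span", class_="neighborhood").get_text(strip=True),
            "coordinates": (float(item["data-lat"]), float(item["data-lng"])),
        }
    return apartments


apartments = scrape_listings(SAMPLE_HTML)
print(apartments["1001"])
```

Keying the outer dictionary by data id means a listing that appears on more than one results page overwrites itself instead of being counted twice.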

To get the beds, baths, and square feet, I had to write a second for-loop that iterated over each apartment item and returned any tag whose text matched one of the words 'bed', 'bath' or 'ft'. To do this, I created a variable 'pattern' that returned any of these words found in a string, then added that word as an attribute in the apartment dictionary and assigned the matching value to it. Once I had collected all the data in my nested dictionary, I cleaned it, converted it to a data frame, and exported it as a file that I could load into R to create the Shiny web application.
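
As a rough sketch of the pattern-matching and cleaning steps, assuming the detail text for each listing has already been pulled out as plain strings: the helper names `add_details`, `to_number`, and `to_csv` are hypothetical, and the standard-library `csv` module stands in here for the data frame export mentioned above.

```python
import csv
import io
import re

# The three keywords named in the post. Note that 'ft' would also match inside
# a word such as 'loft', so real listing text may need a stricter pattern.
pattern = re.compile(r"bed|bath|ft")


def add_details(apartment, detail_texts):
    """Store each matching detail string under the keyword it contains."""
    for text in detail_texts:
        match = pattern.search(text)
        if match:
            apartment[match.group()] = text
    return apartment


def to_number(text):
    """Clean a string such as '$3,200' or '1.5 baths' into a float."""
    digits = re.search(r"\d[\d,]*\.?\d*", text)
    return float(digits.group().replace(",", "")) if digits else None


def to_csv(apartments):
    """Flatten the nested dictionary into CSV text, one row per listing."""
    fields = ["id", "neighborhood", "price", "bed", "bath", "ft"]
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=fields)
    writer.writeheader()
    for data_id, attrs in apartments.items():
        row = {"id": data_id, "neighborhood": attrs["neighborhood"]}
        for key in ("price", "bed", "bath", "ft"):
            # Missing attributes become empty cells rather than errors.
            row[key] = to_number(attrs[key]) if key in attrs else ""
        writer.writerow(row)
    return buffer.getvalue()


# Hypothetical detail strings for a single listing.
apartments = {"1001": {"neighborhood": "West Village", "price": "$3,200"}}
add_details(apartments["1001"], ["2 beds", "1.5 baths", "750 ft", "No fee"])
csv_text = to_csv(apartments)
print(csv_text)
```

The resulting text can be written to a file and read on the R side with `read.csv`.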

Shiny Application:

The user is prompted to select a neighborhood they would like to live in. Once they have made their choice, they are shown the number of apartments available in that neighborhood, the price per square foot, the average price of a studio, and the number of studios available. Planned improvements include asking the user how many rooms they are looking for and then showing how many apartments match their criteria. This way they can compare two-bedroom apartments in the West Village with similar apartments in the East Village or Tribeca. My next steps also include showing the top 5 picks on StreetEasy that match the user-defined criteria, ranked by either asking price or price per square foot.
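
The four numbers shown for a selected neighborhood could be computed along these lines. This is a Python sketch of the summary logic only, not the Shiny code itself. It assumes cleaned numeric fields named `beds`, `sqft`, and `price`, treats a listing with zero bedrooms as a studio, and takes "price per square foot" to mean the average of each listing's price divided by its square footage; the sample listings are made up for illustration.

```python
from statistics import mean

# Made-up listings for illustration; the field names are assumptions.
listings = [
    {"neighborhood": "West Village", "beds": 0, "sqft": 400, "price": 2400},
    {"neighborhood": "West Village", "beds": 0, "sqft": 500, "price": 2600},
    {"neighborhood": "West Village", "beds": 2, "sqft": 800, "price": 4000},
    {"neighborhood": "East Village", "beds": 1, "sqft": 600, "price": 2700},
]


def summarize(listings, neighborhood):
    """Return the headline numbers for one neighborhood."""
    in_area = [a for a in listings if a["neighborhood"] == neighborhood]
    # Assumption: a studio is recorded as a listing with zero bedrooms.
    studios = [a for a in in_area if a["beds"] == 0]
    return {
        "apartments": len(in_area),
        "price_per_sqft": mean(a["price"] / a["sqft"] for a in in_area) if in_area else None,
        "studios": len(studios),
        "avg_studio_price": mean(a["price"] for a in studios) if studios else None,
    }


summary = summarize(listings, "West Village")
print(summary)
```

Returning `None` for an empty group keeps the function usable for neighborhoods with no listings or no studios, where an average is undefined.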

Conclusion:

This was an exciting project for me because I believe this information can be very helpful to people of all ages and socio-economic backgrounds in finding the best value in an NYC apartment. Additionally, I was able to combine Python's BeautifulSoup with a Shiny web application in R, and I learned how to take a project from start to finish: data collection, scrubbing, and user interface. If you have questions or comments about how this application could be improved, please feel free to let me know!
