Data Study on Finding Value in NYC Apartments
The skills the author demonstrates here can be learned through the Data Science with Machine Learning bootcamp at NYC Data Science Academy.
Introduction:
Whether you're new to the Big Apple or a native Brooklynite, you know that apartment hunting can be a painful hassle. If it's your first time moving to the concrete jungle, it can feel overwhelming to figure out which neighborhoods offer the best value, or where to find a four-bedroom flat for you and your friends. Or perhaps you're done with roommates and want to know how much to budget for a studio within walking distance of your favorite West Village brunch spot.
Now, with MyShoeBox, you have access to all this information and more with the click of a button. Never again will you have to second-guess placing a deposit on an apartment because the broker says it won't last: you'll know exactly how many apartments are just like it and how much they cost!
How It Works:
MyShoeBox collects apartment listing data from www.streeteasy.com and groups it by neighborhood. With this data, the user can see a breakdown of what they can expect to find in each neighborhood and how much they should expect to pay. This way, when they find the apartment of their dreams, they can make a quick decision with confidence.
Data Collection:
The first challenge in building this application was obtaining the data from Streeteasy. To do this, I used the BeautifulSoup library in Python. I initialized a dictionary and then added each apartment as a nested dictionary inside it, with the apartment's data ID as the key. After this, I added the apartment's attributes (beds, baths, square feet, price, neighborhood, coordinates) to the nested dictionary.
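As a rough illustration of the nested-dictionary structure described above, here is a minimal sketch. The HTML snippet, the `listing`, `price` and `neighborhood` class names, the `data-id` attribute and the `parse_listings` helper are all assumptions made for the example; the real Streeteasy markup is not shown in this post and will differ.

```python
from bs4 import BeautifulSoup

# Hypothetical markup standing in for a listings page; not the real site structure.
SAMPLE_HTML = """
<ul>
  <li class="listing" data-id="101">
    <span class="price">$3,200</span>
    <span class="neighborhood">West Village</span>
  </li>
  <li class="listing" data-id="102">
    <span class="price">$2,750</span>
    <span class="neighborhood">East Village</span>
  </li>
</ul>
"""


def parse_listings(html):
    """Return {data_id: {attribute: value}} for every listing in the HTML."""
    soup = BeautifulSoup(html, "html.parser")
    apartments = {}
    for item in soup.find_all("li", class_="listing"):
        apt_id = item["data-id"]   # the apartment's data ID is the outer key
        apartments[apt_id] = {}    # nested dictionary for this apartment

        price_text = item.find("span", class_="price").get_text(strip=True)
        apartments[apt_id]["price"] = int(
            price_text.replace("$", "").replace(",", "")
        )
        apartments[apt_id]["neighborhood"] = item.find(
            "span", class_="neighborhood"
        ).get_text(strip=True)
    return apartments


apartments = parse_listings(SAMPLE_HTML)
```

Keying the outer dictionary by data ID means a listing that shows up on more than one results page overwrites itself instead of being counted twice.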
To get the bed, bath and square-foot values, I had to write a second for-loop that iterated over each apartment item and returned every tag whose text matched any of the words 'bed', 'bath' or 'ft'. To do this, I created a variable, 'pattern', that matched any of these words when they were found in a string. I then added the matched word as an attribute in the apartment's dictionary and assigned the corresponding value to it. Once I had collected all the data in my nested dictionary, I cleaned it, converted it to a data frame and exported it as a file that I could load into R to build the Shiny web application.
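The matching and export steps described above might look something like the sketch below. It is not the original code: the plain strings in `raw_details` stand in for the text of the matched tags, the sample values are made up, and the standard-library `csv` module is swapped in for a data frame so that the example has no dependencies. The helper names `extract_details` and `to_csv` are hypothetical.

```python
import csv
import io
import re

# Matches any of the three keywords inside a detail string such as "2 beds".
pattern = re.compile(r"bed|bath|ft")


def extract_details(detail_texts):
    """Map each matched keyword ('bed', 'bath', 'ft') to its numeric value."""
    details = {}
    for text in detail_texts:
        keyword = pattern.search(text)
        number = re.search(r"\d[\d,]*(?:\.\d+)?", text)
        if keyword and number:  # skip strings like "studio" with no keyword
            details[keyword.group()] = float(number.group().replace(",", ""))
    return details


# Made-up sample data in the nested-dictionary shape keyed by data ID.
apartments = {
    "101": {"price": 3200, "neighborhood": "West Village"},
    "102": {"price": 2750, "neighborhood": "East Village"},
}
raw_details = {
    "101": ["2 beds", "1 bath", "850 ft"],
    "102": ["studio", "1 bath", "1,020 ft"],
}
for apt_id, texts in raw_details.items():
    apartments[apt_id].update(extract_details(texts))


def to_csv(apartments):
    """Flatten the nested dictionary into CSV text, one row per apartment."""
    columns = ["id", "neighborhood", "price", "bed", "bath", "ft"]
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=columns, restval="")
    writer.writeheader()
    for apt_id, attrs in apartments.items():
        writer.writerow({"id": apt_id, **attrs})
    return buffer.getvalue()


csv_text = to_csv(apartments)
```

Missing attributes are written as empty cells, which R reads as `NA` for numeric columns by default. Note that a bare `ft` pattern would also match words such as "loft", so a stricter pattern may be needed on real listing text.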
Shiny Application Data:
The user is prompted to select a neighborhood they would like to live in. Once they have made their choice, they are shown the number of apartments available in that neighborhood, the price per square foot, the average price of a studio, and the number of studios available.
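The application itself is written in R with Shiny, but the per-neighborhood summary can be sketched in Python to show the arithmetic. This is only an illustration under stated assumptions: the sample listings are made up, a studio is assumed to be a listing with zero bedrooms, and price per square foot is taken as the mean of each listing's price divided by its area, which may not be how the app computes it.

```python
from statistics import mean

# Made-up listings; 'bed' == 0 is assumed to mean a studio.
listings = [
    {"neighborhood": "West Village", "bed": 0, "ft": 400, "price": 2800},
    {"neighborhood": "West Village", "bed": 0, "ft": 500, "price": 3200},
    {"neighborhood": "West Village", "bed": 2, "ft": 1000, "price": 6000},
    {"neighborhood": "Tribeca", "bed": 1, "ft": 700, "price": 4900},
]


def summarize(listings, neighborhood):
    """Compute the four figures shown for the selected neighborhood."""
    selected = [a for a in listings if a["neighborhood"] == neighborhood]
    studios = [a for a in selected if a["bed"] == 0]
    # Only listings with a known, non-zero area contribute to price per sq ft.
    with_area = [a for a in selected if a.get("ft")]
    return {
        "apartments": len(selected),
        "price_per_sqft": (
            mean(a["price"] / a["ft"] for a in with_area) if with_area else None
        ),
        "studios": len(studios),
        "avg_studio_price": (
            mean(a["price"] for a in studios) if studios else None
        ),
    }


west_village = summarize(listings, "West Village")
tribeca = summarize(listings, "Tribeca")
```

Returning `None` when a neighborhood has no studios keeps the summary from reporting a misleading average of zero.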
Planned improvements include letting the user specify how many rooms they are looking for and then showing how many apartments match their criteria. This way, they can compare two-bedroom apartments in the West Village with similar apartments in the East Village or Tribeca. My next steps also include showing the top five listings on Streeteasy that match the user-defined criteria, ranked by either asking price or price per square foot.
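One possible shape for the planned filter and top-five ranking is sketched below in Python. This feature is not built yet, so everything here is hypothetical: the `top_picks` function, its parameters and the sample listings are made up, and ranking from lowest to highest is an assumption.

```python
# Made-up listings for the example.
listings = [
    {"id": "a", "neighborhood": "West Village", "bed": 2, "ft": 900, "price": 6300},
    {"id": "b", "neighborhood": "West Village", "bed": 2, "ft": 1200, "price": 7200},
    {"id": "c", "neighborhood": "West Village", "bed": 1, "ft": 600, "price": 4000},
    {"id": "d", "neighborhood": "East Village", "bed": 2, "ft": 800, "price": 5000},
]


def top_picks(listings, neighborhood, bedrooms, sort_by="price", limit=5):
    """Return up to `limit` matching listings, cheapest first."""
    if sort_by not in ("price", "price_per_sqft"):
        raise ValueError("sort_by must be 'price' or 'price_per_sqft'")

    matches = [
        a for a in listings
        if a["neighborhood"] == neighborhood and a["bed"] == bedrooms
    ]
    if sort_by == "price_per_sqft":
        # Listings without a usable area cannot be ranked per square foot.
        matches = [a for a in matches if a.get("ft")]
        return sorted(matches, key=lambda a: a["price"] / a["ft"])[:limit]
    return sorted(matches, key=lambda a: a["price"])[:limit]


by_price = top_picks(listings, "West Village", bedrooms=2)
by_sqft = top_picks(listings, "West Village", bedrooms=2, sort_by="price_per_sqft")
```

In this sample the two rankings disagree: listing "a" has the lower asking price, while listing "b" is cheaper per square foot, which is the kind of comparison the feature is meant to surface.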
Conclusion:
This was an exciting project for me because I believe this information can help people of all ages and socio-economic statuses find the best value in their NYC apartment search. I was also able to combine Python's BeautifulSoup with a Shiny web application in R, and I learned how to take a project from start to finish: data collection, data cleaning and user interface. If you have questions or comments about how this application could be improved, please feel free to let me know!