Airbnb Traveling Data Analysis: To be or not to Airbnb
The skills I demonstrate here can be learned by taking the Data Science with Machine Learning bootcamp at NYC Data Science Academy.
The inspiration for performing an exploratory data analysis (EDA) on Airbnb data came from the work done at Inside Airbnb and from the legislation introduced in recent years to regulate short-term rentals in major metropolitan areas such as New York. I wanted, at a minimum, to gain greater insight into Airbnb ownership patterns in New York City, to see whether listing owners and managers adopted particular strategies in how listings were displayed, and to discern whether regulatory attempts to curtail short-term lets have had a significant impact on rental prices in the city.
I discovered some interesting broad trends in how listings are owned and managed throughout New York City. As one would expect, the overwhelming majority of listings either were single-owner listings or had managers who oversaw only a single listing. However, in spite of the regulatory pressure, there were still several people who managed dozens or even hundreds of listings.
This seemed to indicate that a variety of private and business structures may still operate within the Airbnb universe, despite regulatory attempts to curtail these kinds of practices. In addition, the distribution of the number of listings by neighborhood varied markedly within each borough.
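The ownership tally described above can be sketched in a few lines. This is a minimal illustration in Python rather than the R script used for the analysis, and it assumes each listing row carries a host identifier; the sample rows, the `host_id` values and the function names are all hypothetical, not taken from the actual data set.

```python
from collections import Counter

# Hypothetical sample rows: (listing_id, host_id). Not real Airbnb data.
listings = [
    (1, "host_a"),
    (2, "host_b"),
    (3, "host_b"),
    (4, "host_c"),
    (5, "host_b"),
    (6, "host_d"),
]


def listings_per_host(rows):
    """Count how many listings each host id appears on."""
    return Counter(host_id for _, host_id in rows)


def portfolio_size_distribution(rows):
    """Map portfolio size -> number of hosts with that many listings."""
    return Counter(listings_per_host(rows).values())


per_host = listings_per_host(listings)
distribution = portfolio_size_distribution(listings)

# Share of hosts who hold exactly one listing.
single_share = distribution[1] / len(per_host)
```

On real data, a long right tail in `distribution` (a handful of hosts with very large portfolios) would correspond to the managers with dozens or hundreds of listings noted above.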
Overall, ownership was also heavily concentrated in Manhattan and Brooklyn (the neighborhoods here are log-weighted by the number of listings they contain), while Staten Island seems to be underrepresented on Airbnb. In addition, not only did room type seem to have a bearing on the median listing price by borough, but listing titles were also disproportionately between twenty-five and fifty characters long.
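Both summaries in the paragraph above, the median price by borough and room type and the share of titles falling in the 25-to-50-character band, are simple grouped computations. The sketch below is again in Python rather than the original R, and the rows, prices, titles and function names are invented purely for illustration.

```python
from collections import defaultdict
from statistics import median

# Hypothetical rows: (borough, room_type, nightly_price, title). Not real data.
rows = [
    ("Manhattan", "Entire home/apt", 200, "Sunny loft near Central Park"),
    ("Manhattan", "Entire home/apt", 250, "Modern 1BR in Midtown"),
    ("Manhattan", "Private room", 90, "Cozy room"),
    ("Brooklyn", "Private room", 60, "Quiet private room in Williamsburg brownstone"),
    ("Brooklyn", "Private room", 80, "Bright room close to the L train"),
]


def median_price_by_group(data):
    """Median nightly price for each (borough, room_type) pair."""
    groups = defaultdict(list)
    for borough, room_type, price, _ in data:
        groups[(borough, room_type)].append(price)
    return {key: median(prices) for key, prices in groups.items()}


def share_of_titles_in_range(data, low=25, high=50):
    """Fraction of titles whose character length lies in [low, high]."""
    lengths = [len(title) for *_, title in data]
    in_range = sum(low <= n <= high for n in lengths)
    return in_range / len(lengths)


medians = median_price_by_group(rows)
title_share = share_of_titles_in_range(rows)
```

The median is used rather than the mean because listing prices tend to be skewed by a few very expensive listings; that choice matches the "median listing price" wording above.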
Unfortunately, drawing firm conclusions about ownership patterns, ownership strategies and the effectiveness of legal regulation was stymied by two issues: weakly correlated variables and the limited granularity of the data set, which was too coarse to address cost-comparative and legal regulatory questions. In future work, I aim to use multiple, different data sets to draw stronger conclusions, and to better integrate my R script with shinyapps.io (the app failed to upload correctly and has to be run locally for the time being).