Airbnb in NYC - Spatial Analysis of Illegal Activity

Posted on Jul 22, 2016

Airbnb in New York City

Airbnb boasts almost two million listings in 34,000 cities and, according to data from Inside Airbnb, an independent data analysis website, listed about 36,000 apartments in New York as of July 5, 2016. This data exploration sets out to visualize how Airbnb operates in New York City.

Airbnb's presence in NYC has been clouded in controversy from the beginning. Lawmakers argue that Airbnb drives up rents for New York residents and facilitates a great deal of illegal hosting, all while paying none of the fees that hotels are subject to. Rents are driven up when landlords choose to rent apartments to short-term guests at higher rates rather than sign tenants to yearlong leases.

In a study conducted in 2014, the New York State Attorney General concluded that 72% of all units used as private short-term rentals on Airbnb from 2010 through mid-2014 appeared to violate both state and local New York laws. New York’s short-term rental laws, which were last updated in 2010, prohibit most apartments (in buildings with three or more units) in New York City from being rented out for less than 30 days. This means that the majority of entire home/apartment listings on Airbnb and similar sites for New York City would be considered illegal, especially if they can be booked for fewer than 30 days. Airbnb has not actively helped city or state officials limit illegal listings on its site and, according to data supplied by Airbnb, entire home listings outnumber listings for private rooms or shared spaces on its platform for New York City.

This analysis investigates the number of listings that are available for extended periods throughout the year, and sheds some light on just how many illegal listings are hosted on Airbnb's platform. I first visualize the distributions of Prices, Reviews per Month, Availability and Minimum Stay for each of the five boroughs. Next, I visualize how Median Income in each neighborhood is correlated with prices and the percentage of entire home listings available. Finally, I do a Geo-Spatial Analysis of Prices, Reviews per Month, Availability and Minimum Stay.


To start things off, let’s look at the distributions of Price, Reviews, Availability and Minimum Nights. These plots primarily investigate how the five boroughs differ in the prices, reviews, and availability of their listings.


In the figures we can see that the price distributions are skewed to the right. Even after zooming in and cutting off the x-axis at 500, the distribution is still skewed to the right. I decided to take the log of prices to address this.
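As a rough illustration of why the log helps, here is a minimal Python sketch. The prices are made-up values, not taken from the Inside Airbnb data, and the `skewness` helper is my own illustrative function rather than anything used in the original analysis. It simply measures how lopsided a set of values is before and after taking logs.

```python
import math

# Hypothetical nightly prices in USD (illustrative only, not real listings).
prices = [45, 60, 75, 80, 95, 110, 150, 200, 450, 2500]

def skewness(values):
    """Population moment skewness: third central moment / variance ** 1.5.

    Positive values indicate a long right tail.
    """
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    return m3 / m2 ** 1.5

# Natural log of each price; prices must be strictly positive for this to work.
log_prices = [math.log(p) for p in prices]

raw_skew = skewness(prices)
log_skew = skewness(log_prices)

print(f"skewness of raw prices: {raw_skew:.2f}")
print(f"skewness of log prices: {log_skew:.2f}")
```

With a few very expensive listings in the mix, the raw prices have a much longer right tail than their logs, which is why the log-scaled plots are easier to read. Note that listings with a price of zero would need to be dropped or handled separately before taking logs.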


We can clearly see now how prices are distributed. Manhattan, as expected, is the most expensive, followed by Brooklyn, Queens, Staten Island and finally the Bronx.

Reviews per month

The Reviews per Month variable gives us a good indication of how much activity goes on in each borough. The more reviews a listing has per month, the more it gets rented out. The plot below shows that the median number of reviews per month is 1. The plot to the right shows the outliers. The most active listing is in Queens, and it averages about 11 reviews per month. Listings in Queens seem to be the busiest on average, but we see quite a few high-frequency listings in Manhattan and Queens.
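A per-borough median like the one described here can be computed with a simple group-and-summarize step. The sketch below uses only the Python standard library; the rows are hypothetical values for illustration, and `median_by_group` is a helper name I made up, not part of the original analysis.

```python
from collections import defaultdict
from statistics import median

# Hypothetical (borough, reviews_per_month) rows; values are illustrative only.
listings = [
    ("Manhattan", 0.5), ("Manhattan", 1.0), ("Manhattan", 4.2),
    ("Brooklyn", 0.8), ("Brooklyn", 1.1),
    ("Queens", 1.6), ("Queens", 2.3), ("Queens", 11.0),
]

def median_by_group(rows):
    """Group (key, value) pairs by key and return the median value per key."""
    groups = defaultdict(list)
    for key, value in rows:
        groups[key].append(value)
    return {key: median(values) for key, values in groups.items()}

medians = median_by_group(listings)
for borough, value in sorted(medians.items()):
    print(f"{borough}: median reviews per month = {value}")
```

The median is a sensible summary here because a single very active listing, such as one with 11 reviews per month, barely moves it, whereas it would pull the mean upward.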



Next I investigate how availability differs across the five boroughs. What struck me as quite interesting is how many days per year these listings are available. The distribution shows that listings are either available for a very short period, or essentially for 365 days out of the year. A quick look at estimated occupancy rates for Entire Home rentals in New York City (as of July 2, 2016) shows that more than 6,000 entire homes are being rented for more than half the year, and most likely are no longer available on the rental or owner-occupied housing markets.


Minimum Nights

Now let's test the claim that 72% of listings are illegal. Under state law, it is illegal to lease most homes, with the exception of one- and two-family residences, for periods of less than 30 days when the owner or tenant is not present. This means that if a listing is an “entire home” and can be booked for fewer than thirty days, it is most likely an illegal listing. The data shows that there is definitely something fishy going on.
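The rule of thumb above can be written as a simple filter. This is only a sketch: the records are hypothetical, and the field names `room_type` and `minimum_nights` and the value "Entire home/apt" are assumptions about how the listing data is laid out. The 30-day threshold comes from the law as described above.

```python
# Hypothetical listings; field names and values are assumptions for illustration.
listings = [
    {"id": 1, "room_type": "Entire home/apt", "minimum_nights": 2},
    {"id": 2, "room_type": "Private room", "minimum_nights": 1},
    {"id": 3, "room_type": "Entire home/apt", "minimum_nights": 30},
    {"id": 4, "room_type": "Entire home/apt", "minimum_nights": 5},
    {"id": 5, "room_type": "Shared room", "minimum_nights": 3},
]

SHORT_TERM_LIMIT = 30  # days, per the short-term rental rule described above

def is_potentially_illegal(listing, limit=SHORT_TERM_LIMIT):
    """Flag entire-home listings that can be booked for fewer than `limit` nights."""
    return (
        listing["room_type"] == "Entire home/apt"
        and listing["minimum_nights"] < limit
    )

flagged = [item for item in listings if is_potentially_illegal(item)]
share = len(flagged) / len(listings)

print(f"{len(flagged)} of {len(listings)} listings flagged ({share:.0%})")
```

This filter is a heuristic and likely overcounts. One- and two-family residences are exempt, and a field like `minimum_nights` says nothing about building type or whether the host is present, so a flag means "worth a closer look" rather than "illegal".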


Correlation Plots

The following section explores the relationship between Median Income in each neighborhood and both prices and the percentage of entire home listings available. As we might expect, the more affluent regions in New York charge higher prices and list a higher percentage of entire homes.


The plot above shows that the Median Income is correlated with the Median Price of Private Room Listings. As the Median Income increases, starting from the bottom left of the plot, so does the Median Price of Private Room Listings. The correlation coefficient is 0.69.
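For readers who want to see what goes into a correlation coefficient like this, here is a minimal sketch of the Pearson formula in plain Python. The income and price figures are invented for illustration and will not reproduce the 0.69 reported above; `pearson_r` is my own helper, not the function used in the original analysis.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sums of cross-products and squared deviations from the means.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)

# Hypothetical neighborhood-level medians (illustrative only).
median_income = [35000, 48000, 62000, 85000, 120000]
median_room_price = [55, 60, 75, 70, 110]

r = pearson_r(median_income, median_room_price)
print(f"correlation between median income and median price: {r:.2f}")
```

A value near 1 means the two medians rise together almost linearly, a value near 0 means no linear relationship, and a negative value means one falls as the other rises.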


The relationship between Median Entire Home Price and the Median Income for each neighborhood showed a positive correlation of 0.61. Wealthier neighborhoods list more expensive apartments. We again see that Brooklyn and Manhattan have the most expensive listings.


After applying a log transformation, we observe a linear relationship, with a correlation of 0.632 between the log of median income and the percentage of listings that are entire homes. I then colored the points by borough. We see that Brooklyn and Manhattan have the highest percentage of entire apartment listings.

Geo-Spatial Analysis of Airbnb Listings in New York City

This section of my data exploration spatially visualizes Airbnb’s listings in the city. I’m going to look at prices, activity levels, and which rental units are illegal under New York law.


Prices of Listings

The red on the map shows the expensive listings, and the yellow denotes the cheaper listings. As expected, Manhattan has the highest concentration of expensive listings, especially in areas like Soho, Chelsea and the Financial District. The next plot shows how prices are distributed across neighborhoods.

Price by Neighborhood


The plot shows exactly what we could have guessed: Manhattan has the more expensive listings, on average, and the most expensive neighborhoods are concentrated around Chelsea, Soho and the Financial District. The Bronx, as well as the outskirts of Brooklyn, have the cheapest neighborhoods on average.

Type of Listing


Above we can see how the type of listing is distributed across NYC. We can clearly see that the more expensive regions are also the regions with more entire apartment listings. Shared rooms are rarely seen.

Minimum Nights for Entire Apartments


On June 17, the New York state legislature passed a bill that would heavily fine hosts on Airbnb and other short-term rental sites like HomeAway, FlipKey, and VRBO who post listings that violate the state’s laws on short-term rentals. The bill sets a penalty of up to $1,000 for the first violation, $5,000 for the second violation, and $7,500 for the third and subsequent violations. My analysis suggests that a large number of hosts would be exposed to these fines.

Governor Cuomo has until January to make his decision about the bill, but some have wondered: is this law, as written, legal? Or could opponents argue that it is unconstitutional? If Governor Cuomo does approve this advertising law, it could have a chilling effect on the number of listings Airbnb has for New York City, its biggest market in the U.S. Hosts renting out entire apartments for less than 30 days may choose not to list their apartments any longer, for fear of paying those hefty fines.

About Author

Jurgen De Jager

Jurgen’s fascination with analytics and its applications, specifically within data science, led to his decision some time ago that this is the career path he wants to pursue post-graduation. In anticipation of this, he has worked extensively...

Leave a Comment

Thomas October 3, 2016
Where can one find the airbnb data sources?
