Analysis Of NYC Airbnb Listings

Posted on Aug 4, 2023

Research Questions for NYC Listings

In this Airbnb analysis we will try to answer these questions:

  1. Where is the best area for Airbnb listings in NYC?
  2. Where are the opportunities in the market that would allow for a new lister on Airbnb to optimize their revenue?
  3. Which listing features most influence annual revenue?

Analysis Background

Recently, Airbnb’s revenue from listings nearly doubled globally, from $3.88 billion in 2020 to $6.85 billion in 2021, with North America growing the most among regions. With many families facing difficult economic times, listing a property on Airbnb may be an effective way to earn additional income.

According to Mashvisor, as of March 2023, there are over 6.6 million active listings on the platform, managed by over 4.4 million hosts, and over 150 million people use Airbnb as their primary means of finding rental units. Because of this influx of listings, new listers often find it difficult to get started: they see a saturated market and are deterred from listing their property at all. This analysis aims to identify potential gaps in the NYC listings market so that listers can optimize their revenue.

Airbnb Data

The data for this analysis was gathered from Inside Airbnb, specifically the New York City listings from December. Although a single month of data may not capture seasonality across the rest of the year, the dataset includes columns with predicted availability over the next 365 days for each listing. The data dictionary describing each column can be found at this link.

Initial Findings

For an initial look at the dataset and at how its features relate to one another, I created a Pearson correlation heat map, shown in this figure.
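The values in such a heat map are pairwise Pearson correlation coefficients. As a minimal sketch, assuming each listing is a dict keyed by Inside Airbnb column names such as price, availability_365 and calculated_host_listings_count (assumed names, not taken from the original analysis), the matrix the heat map colours can be computed with the standard library alone:

```python
import math
from itertools import combinations


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return float("nan")  # correlation is undefined for a constant column
    return cov / math.sqrt(var_x * var_y)


def correlation_matrix(rows, columns):
    """Pairwise Pearson correlations between numeric columns of listing dicts."""
    data = {c: [float(r[c]) for r in rows] for c in columns}
    matrix = {(c, c): 1.0 for c in columns}  # diagonal fixed at 1.0
    for a, b in combinations(columns, 2):
        matrix[(a, b)] = matrix[(b, a)] = pearson(data[a], data[b])
    return matrix
```

In pandas the same matrix is `df[cols].corr(method="pearson")`, which can be passed to `seaborn.heatmap(..., annot=True)` to draw a figure of this kind; the post does not say which tools produced the original figure.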

From the heat map, we see that price has a weak positive correlation with the number of listings a host has: as a host's number of listings increases, the price of each listing also tends to increase. This may be because many of these listings are in areas where prices are naturally higher, such as Manhattan. There is also a weak positive correlation between a listing's availability and its price. This is interesting because the availability_365 column is defined as the number of days a listing is available in the next 365 days, so listings that are available more often tend to be priced higher. Although this seems counterintuitive, the data dictionary states that a listing may be unavailable either because the owner has blocked it from being booked or because it has already been booked. This means either that many of the listings in this dataset are blocked by their hosts, or that there truly is a negative relationship between price and how often a listing is booked. To explore this further, we would need another dataset covering all listings in NYC and a deeper look at the types of listings there are.

The heat map above shows where listings in NYC are most heavily concentrated: there are large concentrations of listings in Manhattan and Brooklyn. This is logical because those areas have more tourist attractions, such as Central Park and the Empire State Building.
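One way to build such a concentration map is to count listings in small latitude/longitude cells and colour each cell by its count. The sketch below assumes latitude, longitude and neighbourhood_group fields (neighbourhood_group being Inside Airbnb's borough column) and an arbitrary cell size of 0.01 degrees; the cell size and function names are illustrative choices, not the author's.

```python
import math
from collections import Counter


def density_grid(rows, cell_deg=0.01):
    """Count listings per (lat, lon) grid cell; denser cells would be 'hotter' on a heat map."""
    counts = Counter()
    for r in rows:
        # Map each coordinate to the index of the grid cell that contains it.
        cell = (math.floor(r["latitude"] / cell_deg),
                math.floor(r["longitude"] / cell_deg))
        counts[cell] += 1
    return counts


def listings_per_borough(rows):
    """Number of listings in each borough, a coarser view of the same concentration."""
    return Counter(r["neighbourhood_group"] for r in rows)
```

`density_grid(rows).most_common(5)` would list the five densest cells, and any plotting library can then colour the cells by count.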

To compare how successful listings were, we created a column in the data frame named "annual_revenue", calculated from the availability_365 and price columns, which gives each listing an estimated annual revenue. The neighborhoods with the highest average annual revenue are as follows:

  1. Chelsea - $119443.27
  2. Brooklyn Heights - $105301.77
  3. Theater District - $102203.87
  4. SoHo - $96167.34
  5. Tribeca - $89216.63
  6. Financial District - $82792.28
  7. West Village - $79576.62
  8. Greenwich Village - $77590.37
  9. Midtown - $77515.15
  10. Boerum Hill - $77406.62
  11. NoHo - $72173.75
  12. Downtown Brooklyn - $69474.00
  13. Battery Park City - $67842.68
  14. Hell's Kitchen - $65484.57
  15. Kips Bay - $63182.99
  16. Columbia St - $62889.90
  17. Vinegar Hill - $62258.82
  18. East Village - $61701.07
  19. Nolita - $59398.92
  20. Cobble Hill - $57816.72
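The post does not give the exact formula behind annual_revenue. A plausible sketch, assuming a numeric price, that booked nights are estimated as 365 minus availability_365 (which overstates revenue for nights a host has blocked, as the data dictionary caveat above implies), and a neighbourhood field per listing, computes revenue per listing and then averages it per neighborhood:

```python
from collections import defaultdict


def annual_revenue(price, availability_365):
    """Hypothetical estimate: nightly price x nights NOT available in the next 365 days,
    treating every unavailable night as a booked night."""
    return price * (365 - availability_365)


def top_neighbourhoods(rows, n=20):
    """Return the n neighbourhoods with the highest mean estimated annual revenue."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for r in rows:
        totals[r["neighbourhood"]] += annual_revenue(r["price"], r["availability_365"])
        counts[r["neighbourhood"]] += 1
    means = {hood: totals[hood] / counts[hood] for hood in totals}
    # Sort by mean revenue, highest first, and keep the top n.
    return sorted(means.items(), key=lambda kv: kv[1], reverse=True)[:n]
```

With pandas this would be roughly `df["annual_revenue"] = df["price"] * (365 - df["availability_365"])` followed by `df.groupby("neighbourhood")["annual_revenue"].mean().nlargest(20)`.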

From this list, we can see that the average annual revenue for listings in these neighborhoods ranges from roughly $58,000 to $120,000 a year. To get a better understanding, we need to look into the categorical descriptors.

The right figure above suggested that the borough is associated with annual revenue. On closer inspection, Manhattan has the highest median price, while the Bronx has the lowest.
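Comparing medians rather than means keeps a handful of very expensive listings from dominating a borough's figure. A minimal sketch, assuming a numeric price field and a neighbourhood_group field holding the borough:

```python
from collections import defaultdict
from statistics import median


def median_price_by_borough(rows):
    """Median nightly price for each borough."""
    prices = defaultdict(list)
    for r in rows:
        prices[r["neighbourhood_group"]].append(r["price"])
    return {borough: median(values) for borough, values in prices.items()}
```

`max(result, key=result.get)` and `min(result, key=result.get)` then give the boroughs with the highest and lowest median price.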

The next step in the analysis is to break the listings into price bins. This allows each borough to be analyzed one price range at a time, so we can see where a lister should optimally price their property. Once this is done for all of the boroughs, gaps in the market may appear.
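The bin labels used later ($980-$990, $820-$830) suggest fixed $10-wide bins. Assuming that width, and half-open bins in which a price of exactly $990 falls into $990-$1000, binning could look like this (the function names are illustrative):

```python
from collections import defaultdict


def price_bin(price, width=10):
    """Return (lower, upper) for the half-open bin [lower, upper) containing price."""
    lower = int(price // width) * width
    return (lower, lower + width)


def group_by_bin(rows, width=10):
    """Group listing dicts by price bin, ordered from cheapest to most expensive bin."""
    groups = defaultdict(list)
    for r in rows:
        groups[price_bin(r["price"], width)].append(r)
    return dict(sorted(groups.items()))
```

For prices up to $1,000, pandas' `pd.cut(df["price"], bins=list(range(0, 1010, 10)), right=False)` produces equivalent half-open intervals.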

Below are line charts for each borough showing, for each price bin, the number of listings, the average number of days a listing is not booked, and the estimated annual revenue (average nights booked multiplied by the nightly rate). A spike in a chart may show that a price bin is popular in that borough and may reveal an optimal price range for a listing there.
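Each borough chart plots three numbers per bin. A sketch of that per-bin summary, assuming $10 bins, booked nights equal to 365 minus availability_365, and reading "average nights booked multiplied by the nightly rate" as the bin's mean booked nights times its mean price (the exact aggregation the author used is not stated):

```python
from collections import defaultdict


def summarize_bins(rows, width=10):
    """Per price bin: listing count, mean availability_365 and an estimated annual revenue.

    Assumes booked nights = 365 - availability_365 and estimates revenue as
    mean booked nights x mean nightly price within the bin (hypothetical aggregation).
    """
    groups = defaultdict(list)
    for r in rows:
        lower = int(r["price"] // width) * width
        groups[(lower, lower + width)].append(r)

    summary = {}
    for bin_range, members in sorted(groups.items()):
        n = len(members)
        mean_avail = sum(m["availability_365"] for m in members) / n
        mean_price = sum(m["price"] for m in members) / n
        summary[bin_range] = {
            "listings": n,
            "mean_availability": mean_avail,
            "est_annual_revenue": (365 - mean_avail) * mean_price,
        }
    return summary
```

Running this once per borough, after filtering rows on neighbourhood_group, yields the three series each line chart plots.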

Brooklyn Analysis

From these charts we can see that the majority of the listings in this dataset are Brooklyn listings in the lower price bins. However, we can conclude from the graph on the right that even though the availability is higher, the price . When looking at the availability of that price bin in the second chart, it seems that

Manhattan Analysis

This analysis shows that the $980-$990 price bin again generates more annual revenue than its neighboring price bins. Manhattan also has more listings around that price range, meaning that a market already exists there. This may be because guests have a budget of around $1,000 a night and are looking for the best listing they can get within it. It also suggests that a lister priced near this range might earn higher annual revenue by raising or lowering their price into that specific bin.

Queens Analysis

These charts show a large increase in estimated annual revenue around the $820-$830 price bin. On closer inspection, there is also more demand for listings in this bin, as shown in the middle figure, where the number of days listings are available drops steeply in this price range. The neighboring price bins have a very similar number of listings to the $820-$830 bin, but far less estimated annual revenue. This suggests there may be a market for listings in that price range. The same pattern appears in the $980-$990 price bin.
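The pattern described for Queens, and for Manhattan's $980-$990 bin, is a bin whose estimated revenue is higher, and whose availability is lower, than in both adjacent bins. A minimal sketch for flagging such bins automatically, assuming a per-borough list of (label, estimated revenue, mean availability) tuples ordered by price; "adjacent" here means the neighboring entries in that list, so empty bins are effectively skipped:

```python
def opportunity_bins(bins):
    """Return labels of bins that out-earn both neighbours while being less available.

    bins: list of (label, est_revenue, mean_availability) tuples ordered by price.
    The first and last bins have only one neighbour and are never flagged.
    """
    hits = []
    for i in range(1, len(bins) - 1):
        label, rev, avail = bins[i]
        _, prev_rev, prev_avail = bins[i - 1]
        _, next_rev, next_avail = bins[i + 1]
        # A revenue spike combined with an availability dip suggests unmet demand.
        if rev > prev_rev and rev > next_rev and avail < prev_avail and avail < next_avail:
            hits.append(label)
    return hits
```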

The Bronx Analysis

There are only about 250-300 listings in the Bronx. The third chart shows an increase in annual revenue around the $250 price bin. When explored further

Staten Island Analysis

From these charts we can see that there are not many Staten Island listings in our dataset. The spike in estimated annual revenue near the right end of the graph can be attributed to the fact that there are fewer than 10 listings in that price range in Staten Island, so it is considered noise caused by the small sample size. Either there truly are not many listings in Staten Island, or the data frame we used does not include all of them. For a more in-depth look, we would need a Staten Island-specific dataset, which may provide the insights this dataset is missing.
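One simple guard against this kind of noise is to set aside bins below a minimum listing count before comparing revenue. In the sketch below, the threshold of 10 mirrors the figure in the text but is otherwise a hypothetical choice, and each bin's stats are assumed to be a dict that includes a "listings" count:

```python
def split_by_sample_size(bin_stats, min_listings=10):
    """Split bins into (reliable, sparse) dicts by listing count.

    bin_stats maps a bin label to a dict that includes a "listings" count.
    Averages from sparse bins are likely to be noise and can be excluded
    before looking for revenue spikes.
    """
    reliable, sparse = {}, {}
    for label, stats in bin_stats.items():
        target = reliable if stats["listings"] >= min_listings else sparse
        target[label] = stats
    return reliable, sparse
```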

Analyzing the Type of Room

From the earlier analysis we can see that the majority of the listings are in Manhattan and Brooklyn. Looking more closely at which room types generate the most revenue, we find that each borough has a different optimal room type. Below are analyses that identify potential markets for new listers with a specific listing type.
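To find the optimal room type per borough, one option is to average estimated revenue over each (borough, room type) pair and keep the best pair for each borough. This sketch assumes Inside Airbnb's neighbourhood_group and room_type fields (with values such as "Entire home/apt" and "Private room") and the same hypothetical revenue estimate of price times (365 - availability_365):

```python
from collections import defaultdict


def best_room_type_by_borough(rows):
    """For each borough, the room type with the highest mean estimated annual revenue.

    Revenue per listing is a hypothetical estimate: price * (365 - availability_365).
    """
    totals = defaultdict(float)
    counts = defaultdict(int)
    for r in rows:
        key = (r["neighbourhood_group"], r["room_type"])
        totals[key] += r["price"] * (365 - r["availability_365"])
        counts[key] += 1

    best = {}  # borough -> (room_type, mean revenue)
    for (borough, room), total in totals.items():
        mean = total / counts[(borough, room)]
        if borough not in best or mean > best[borough][1]:
            best[borough] = (room, mean)
    return {borough: room for borough, (room, _) in best.items()}
```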

Brooklyn Home

From these graphs we can find potential markets as well as parts of the market that listers should avoid. The $990-$1000 price bin generates more revenue than its neighboring price bins while also having lower availability, and there are more listings at that price point. By contrast, the $860-$870 price bin should be avoided: it has less estimated revenue and higher availability than its neighbors while having roughly the same number of listings. This means that listings at that price point are not as popular as those in other price bins.
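The "avoid" signal described here is the mirror image of a spike: lower estimated revenue and higher availability than both neighboring bins, with about the same number of listings. A sketch that flags such bins, assuming per-bin tuples of (label, listings, estimated revenue, mean availability) ordered by price; the 20% tolerance used to decide that listing counts are "roughly the same" is a hypothetical parameter, not something the analysis specifies:

```python
def bins_to_avoid(bins, count_tolerance=0.2):
    """Flag bins that under-earn and sit empty more than both neighbours at similar supply.

    bins: list of (label, listings, est_revenue, mean_availability) tuples ordered by price.
    count_tolerance: max allowed relative difference in listing count vs. each neighbour.
    """
    flagged = []
    for i in range(1, len(bins) - 1):
        label, n, rev, avail = bins[i]
        neighbours = (bins[i - 1], bins[i + 1])
        # Lower revenue AND higher availability than both adjacent bins.
        worse = all(rev < nb[2] and avail > nb[3] for nb in neighbours)
        # Listing count close to each neighbour's, so supply alone does not explain it.
        similar_supply = all(abs(n - nb[1]) <= count_tolerance * nb[1] for nb in neighbours)
        if worse and similar_supply:
            flagged.append(label)
    return flagged
```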

Manhattan Private Room

Looking at private room listings in Manhattan, the $990-$1000 price point is again very popular: there is a drastic increase in estimated annual revenue, a decrease in availability, and an increase in the number of listings. This means the price bin is very popular among people searching for a listing. A lister with a private room in Manhattan should consider pricing it at around $990-$1000. Listers should avoid the $400-$420 price range: although it has more listings, its estimated annual revenue is lower and its availability higher than those of its neighboring price bins.

Limitations to the Analysis

Even though the analysis shows the price points at which listings are popular, it does not explain why they are popular. A fuller picture would require more details, such as the amenities provided, neighboring businesses, and the cost of living in the area. For example, it is much more expensive to live in certain parts of Tribeca than in certain parts of Chinatown.

About Author

Roger Liu

I recently graduated from Cornell University in 2022 with a B.S. in Information Science, Systems, and Technology from the College of Engineering.