Becoming a Successful Airbnb Host in NYC

Posted on Aug 24, 2023

1. Introduction

In 2022, New York City ranked among the world’s ‘most powerful’ tourism cities, according to the World Travel & Tourism Council (WTTC). In that year alone, NYC attracted around 56 million tourists, a figure that was expected to rise to 61 million in 2023. Since the Airbnb platform launched in 2008 and rebranded in 2014, it has become a household name for vacation and short-term rentals. There are now more Airbnb listings in NYC than rental listings, so it is not surprising that people consider investing in Airbnb properties.

2. Research Questions

The purpose of this project is to identify what indicators Airbnb property investors should consider to maximize potential profits.

The exploratory data analysis seeks to answer the following questions:

  • How does someone become a successful Airbnb host in NYC?
  • What makes a listing popular?
  • For someone aspiring to be an Airbnb host, which areas in NYC would be the most profitable? 
  • What are the trends saying about the behavior of NYC visitors?
  • How can data be leveraged to improve the online booking experience?

3. The Data

The dataset I chose for this analysis was the Airbnb Open Data NYC dataset found on Kaggle. As this was an already cleaned version, little additional cleaning was necessary. The minor cleaning that was performed included rounding numeric values to two decimal places, correcting misspelled column names, and converting certain column values to boolean-style values, such as “Yes” or “No.” The dataset shows activity for Airbnb listings in NYC with construction years ranging from 2003 to 2022.
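
The cleaning steps described here can be sketched roughly as follows. This is a minimal illustration in plain Python, not the code used for the project: the rows, the misspelled column name, and the "TRUE"/"FALSE" source values are all assumptions made for the example.

```python
# Illustrative rows only; column names and values are assumptions, not the real dataset.
raw_rows = [
    {"avaliability 365": 365, "price": 142.4567, "instant_bookable": "TRUE"},
    {"avaliability 365": 120, "price": 620.0, "instant_bookable": "FALSE"},
]

# Hypothetical mapping from a misspelled header to its corrected spelling.
COLUMN_FIXES = {"avaliability 365": "availability 365"}


def clean_row(row):
    """Fix column names, round floats to two decimals, map flags to Yes/No."""
    cleaned = {}
    for name, value in row.items():
        name = COLUMN_FIXES.get(name, name)
        if isinstance(value, float):
            value = round(value, 2)
        elif isinstance(value, str) and value.upper() in {"TRUE", "FALSE"}:
            value = "Yes" if value.upper() == "TRUE" else "No"
        cleaned[name] = value
    return cleaned


clean_rows = [clean_row(row) for row in raw_rows]
for row in clean_rows:
    print(row)
```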

4. Exploratory Data Analysis

To start, a heat map was plotted to show which room types were most prominent in each borough. Brooklyn and Manhattan accounted for the majority of both overall Airbnb listings and entire home/apartment listings.
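
The table behind a heat map like this is a count of listings for every borough and room type pair. A minimal sketch in plain Python, assuming each listing is reduced to a (borough, room type) tuple; the sample rows are made up for illustration:

```python
from collections import Counter

# Made-up sample of (borough, room type) pairs, one per listing.
listings = [
    ("Brooklyn", "Entire home/apt"),
    ("Brooklyn", "Private room"),
    ("Manhattan", "Entire home/apt"),
    ("Manhattan", "Entire home/apt"),
    ("Queens", "Private room"),
    ("Brooklyn", "Entire home/apt"),
]

counts = Counter(listings)
boroughs = sorted({borough for borough, _ in listings})
room_types = sorted({room_type for _, room_type in listings})

# Rows are boroughs, columns are room types; missing pairs count as 0.
matrix = [[counts[(b, r)] for r in room_types] for b in boroughs]

for borough, row in zip(boroughs, matrix):
    print(borough, dict(zip(room_types, row)))
```

The resulting matrix is what a plotting library would color by value to produce the heat map.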

This is most likely due to higher demand. Manhattan is the center of NYC tourism, with Brooklyn coming in second. Identifying where each room type is most concentrated can help prospective hosts determine where there are opportunities for growth and which areas may be too saturated to compete in.

Most in Demand Room Types Per Borough

Using the number of reviews as the metric, we determined which room types were most popular. On the Airbnb platform, only travelers who have completed a stay are able to leave reviews. Because each review on Airbnb corresponds to a completed booking, the review count can serve as a proxy for popularity.

To get a picture of current demand, the data was filtered on the ‘last_reviewed’ column to keep only listings last reviewed in 2022. This separate data frame was then used to create a bar chart showing which borough and room type combinations were accumulating the most reviews. The results showed that entire homes/apartments in Brooklyn received significantly more reviews in 2022 than other room/borough combinations. Results may be skewed toward entire home/apartment listings in Brooklyn and Manhattan because they represent the majority of overall Airbnb listings in New York City. Brooklyn has more open space and a less hectic environment than the busy streets of Manhattan. It’s possible that travelers prefer to stay in more private and less busy areas.
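
As a rough sketch of this filter-then-aggregate step, the snippet below keeps listings last reviewed in 2022 and sums their review counts per borough and room type. It uses plain Python rather than a data frame, and the field names other than ‘last_reviewed’ and all values are assumptions for illustration.

```python
from collections import defaultdict
from datetime import date

# Made-up listings; only the 'last_reviewed' name comes from the article.
listings = [
    {"borough": "Brooklyn", "room_type": "Entire home/apt",
     "number_of_reviews": 120, "last_reviewed": date(2022, 3, 14)},
    {"borough": "Brooklyn", "room_type": "Entire home/apt",
     "number_of_reviews": 80, "last_reviewed": date(2022, 1, 5)},
    {"borough": "Manhattan", "room_type": "Private room",
     "number_of_reviews": 150, "last_reviewed": date(2019, 7, 1)},
    {"borough": "Manhattan", "room_type": "Entire home/apt",
     "number_of_reviews": 90, "last_reviewed": date(2022, 2, 20)},
]

# Keep only listings whose most recent review falls in 2022.
recent = [row for row in listings if row["last_reviewed"].year == 2022]

# Sum review counts for each (borough, room type) combination.
totals = defaultdict(int)
for row in recent:
    totals[(row["borough"], row["room_type"])] += row["number_of_reviews"]

# Highest review totals first, ready to feed a bar chart.
ranked = sorted(totals.items(), key=lambda item: item[1], reverse=True)
for (borough, room_type), reviews in ranked:
    print(borough, room_type, reviews)
```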

Highly Saturated Neighborhoods in NYC

The data reveals the top 10 neighborhoods with the largest number of listings. This bar graph shows prospective hosts which areas they might want to avoid investing in. The more saturated a neighborhood already is with listings, the harder it will be to remain competitive and earn a strong ROI on the property.
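
Ranking neighborhoods by listing count is a simple frequency count. A minimal sketch, assuming one neighborhood name per listing; the names and counts below are placeholders, not results from the dataset:

```python
from collections import Counter

TOP_N = 10

# Placeholder data: one neighborhood name per listing.
neighborhoods = [
    "Bedford-Stuyvesant", "Williamsburg", "Harlem",
    "Bedford-Stuyvesant", "Williamsburg", "Bedford-Stuyvesant",
]

# most_common returns (name, count) pairs sorted by count, highest first.
top_neighborhoods = Counter(neighborhoods).most_common(TOP_N)
for name, count in top_neighborhoods:
    print(name, count)
```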

Analyzing the Distribution of High-Rated Rooms

The data was then filtered to only the listings that received a rating of 4.0 or 5.0, treated as good and great respectively. The distribution of high ratings per room type also depended on the total number of listings for each room type: the more listings a room type has, the more high-rated listings it is likely to contribute. Of a total of 69,305 listings, there were 37,212 entire home/apartment listings and 30,508 private rooms. Together, they made up about 97.7% of all listings, which explains why these two room types make up the majority of the high-rated listings.
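
The combined share of the two dominant room types can be checked directly from the counts quoted in this section:

```python
# Counts taken from the text above.
total_listings = 69_305
entire_home = 37_212
private_room = 30_508

combined_share = (entire_home + private_room) / total_listings
print(f"{combined_share:.1%}")  # prints 97.7%
```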

Almost the same distribution patterns appeared when high-rated rooms were grouped by borough and room type. The data was consistent with where and which types of listings were most saturated: Manhattan and Brooklyn among boroughs, and entire homes/apartments and private rooms among room types.

Year-Round Available Listings (365 days)

After filtering the data to high-rated listings that were available 365 days a year, a pie chart was used to visualize the top 10 neighborhoods. The result was almost entirely consistent with the top 10 neighborhoods with the largest number of listings.

Identifying Common Description Words & Phrases in the Top 100 Most Reviewed Listings (by Review Count)

To determine whether specific keywords and phrases were commonly associated with the most booked listings, a word count was done on the descriptions of the top 100 most reviewed listings. Any word that appeared more than two times among those listings was pulled into a word cloud. Filler words and symbols, such as &, of, -, +, !, and the, were excluded from the word count.
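
The word count described above can be sketched as follows. This is only an illustration: the descriptions are invented, the stop-word list is an assumed stand-in for whatever filler words were actually excluded, and the tokenizer is one simple choice among many.

```python
import re
from collections import Counter

# Assumed filler words; the article does not give the full list.
STOP_WORDS = {"the", "of", "in", "and", "a", "to", "with"}

# Invented listing descriptions for illustration.
descriptions = [
    "Private room near the subway",
    "Quiet & private studio in Brooklyn",
    "Private, quiet apartment - close to subway!",
    "Sunny Brooklyn loft near subway + airport",
]


def tokenize(text):
    """Lowercase the text, keep alphabetic words, drop symbols and filler words."""
    words = re.findall(r"[a-z']+", text.lower())
    return [word for word in words if word not in STOP_WORDS]


counts = Counter(word for text in descriptions for word in tokenize(text))

# Keep only words that appear more than two times, as in the article.
frequent = {word: n for word, n in counts.items() if n > 2}
print(frequent)
```

The `frequent` mapping is the kind of word-to-frequency input a word cloud is drawn from.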

We found that location-specific keywords were significant. Words about proximity to nearby airports and the subway were common, and the words “Private” and “Quiet” suggested that travelers prefer to stay in quieter areas with more privacy. “No cleaning fee” also appeared often, which suggests that travelers prefer to avoid cleaning fees. Consistent with the demand, Brooklyn and Manhattan were also among the most frequent words in the top 100 most popular listings.

This analysis can help Airbnb hosts pick the right words and phrases when describing their listings to improve their visibility in search results. Greater visibility should increase the number of bookings for their properties.

A Look at Seasonality Over Time

To measure seasonality accurately, this dataset would have needed the actual booking dates. To approximate seasonal demand over time as closely as possible, a scatterplot was set up to visualize the number of reviews (y axis) against the date the listing was last reviewed (x axis). The last reviewed dates were reformatted into just month and year values for ease of visualization.
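
Reducing each last-reviewed date to a month and year, then totaling reviews per month, might look like the sketch below. The dates and review counts are invented, and the "YYYY-MM" label format is an assumption chosen because it sorts chronologically.

```python
from collections import defaultdict
from datetime import date

# Invented (last reviewed date, number of reviews) pairs, one per listing.
reviews = [
    (date(2022, 1, 15), 40),
    (date(2022, 1, 30), 25),
    (date(2019, 6, 2), 30),
    (date(2022, 2, 10), 50),
]

# Collapse each date to a year-month label and total the reviews per label.
by_month = defaultdict(int)
for last_reviewed, review_count in reviews:
    by_month[last_reviewed.strftime("%Y-%m")] += review_count

# Labels in this format sort chronologically as plain strings.
for month in sorted(by_month):
    print(month, by_month[month])
```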

January, February and March 2022 saw the highest number of reviews, with June and July 2019 following closely behind. January to March is usually considered off-season for travel, so these months tend to be the cheapest time to travel. In addition, travel rebounded strongly in 2022: people were most likely more comfortable traveling than in 2020 and 2021, the peak of the pandemic.

Airbnb Stay Attributes 

Another important question is whether any attributes of a listing had an effect on its booking rate, price, and ratings. Listings with a one-night minimum stay saw the most bookings and reviews. Box plots were used to see whether the cancellation policy (flexible, moderate, or strict) and instant bookability had an effect on price, and neither did.

House rules also had no effect on a listing’s rating. 

Determining the Most Optimal Price for Each Area and Room Type

For a potential investor in a property to list on Airbnb, it’s important to understand the price distribution to get a sense of how a listing should be priced. The first step was to determine the median price for each borough and room type. Compared to the mean, the median is a better measure of central tendency for skewed distributions because it is more robust to outliers.
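
To illustrate why the median is preferred here, the sketch below computes both the mean and the median price per borough and room type. The prices are invented to include one extreme outlier and are not taken from the dataset.

```python
from collections import defaultdict
from statistics import mean, median

# Invented (borough, room type, nightly price) rows; 1200 is a deliberate outlier.
prices = [
    ("Manhattan", "Entire home/apt", 150),
    ("Manhattan", "Entire home/apt", 180),
    ("Manhattan", "Entire home/apt", 200),
    ("Manhattan", "Entire home/apt", 1200),
    ("Brooklyn", "Private room", 80),
    ("Brooklyn", "Private room", 90),
    ("Brooklyn", "Private room", 100),
]

# Collect the prices for each (borough, room type) group.
groups = defaultdict(list)
for borough, room_type, price in prices:
    groups[(borough, room_type)].append(price)

# Store (mean, median) per group to compare the two measures.
summary = {key: (mean(values), median(values)) for key, values in groups.items()}
for key, (avg, mid) in summary.items():
    print(key, "mean:", avg, "median:", mid)
```

In the Manhattan group the single outlier pulls the mean to 432.5 while the median stays at 190.0, which is much closer to what a typical listing charges.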

Based on the results, someone looking to invest in a property to list on Airbnb should consider a price between $621 and $650 for entire apartments and private rooms if they want to be competitive in the market.

Current Demand of Still Active Listings

Lastly, it was important to see where the most recent demand lay, so the focus was put on specific NYC neighborhoods by filtering for the top 10 areas with the most bookings/reviews in 2022 alone. A review in 2022 was also used as a signal that a listing was still active. Of the listings that are still active, the Bedford-Stuyvesant neighborhood in Brooklyn had substantially more bookings in 2022 than other neighborhoods. This aligns with the earlier finding that entire apartments in Brooklyn are the most in demand.

5. Conclusion

Based on this exploratory analysis of the given dataset, location is a clear factor that travelers take into consideration when booking a room on Airbnb.

Proximity to specific places, such as subways, airports, and popular areas like Manhattan and Brooklyn, was also an important decision factor. The most common keywords associated with popular listings are meaningful data that could be used to build an algorithm within Airbnb that puts more weight on certain keywords based on city and neighborhood. When deciding how to describe a listing to reach the target traveler faster, these factors and keywords should be taken into account.

It is clear that Brooklyn and Manhattan are the most popular boroughs, and entire apartments and private rooms the most popular room types. However, Brooklyn and Manhattan are most likely already too saturated with listings, making it that much more difficult to stay competitive. That means an investor should carefully consider which section of these popular boroughs to invest in.

6. Future Work

Several more areas require further exploration in order to obtain more accurate business insights. Data on whether each host was a Superhost or a regular host would have been very helpful in determining the impact on booking rate. In addition, data on exact booking dates, length of stays, and the days of the week on which bookings occurred could give us insight into which specific days and periods of the year see the most travel activity.

Finally, the text of each listing’s reviews could be used to perform a common word/phrase count, as we did with the listing descriptions. This would help identify whether a listing’s reviews are mostly positive or negative, so that we can distinguish review volume from listing quality. A review on Airbnb may mean a booking, but it does not necessarily mean it’s a positive review.

About Author

Emily In

After earning my bachelor’s degree in Sociology and working as an SEO Analyst, I discovered my interest in researching and deriving business insights and stories from data. A family matter required me to leave my job and work...