Data Analysis on the Airbnb NYC Market
The skills I demonstrate here can be learned by taking the Data Science with Machine Learning bootcamp at NYC Data Science Academy.
Shiny app | GitHub Repo | LinkedIn
What is this project about?
Airbnb has driven considerable innovation in the hospitality industry, disrupting traditional models and making it easier for smaller players to enter the market by simplifying some of the business aspects. The data that Airbnb has made available can shed light on many aspects of the hospitality market and help potential investors and entrepreneurs make informed investment decisions. The data used in this project includes listings, coordinate location, neighbourhood, borough, price per night, number of reviews, and availability days per year.
The purpose of this project is to gather insights from this data and to allow easy exploration and customization through an interactive visual dashboard built as a Shiny app. However, this is more of a proof of concept for the tools that can be used, and the impact of COVID-19 requires further analysis.
The data is sourced from insideairbnb.com, an independent non-commercial project that gathers publicly available Airbnb information.
Some of the questions we'll be looking to answer are:
- How is the market composed in terms of size, price, demand, supply and other characteristics?
- How has the market evolved over time?
- As this is a novel business model, how have characteristics in the offering changed?
- What has happened to the supply and demand side of the equation in this market?
- Are there any underlying factors impacting those changes?
- What has been the impact of COVID-19 in the market?
- Who has benefited, and where is the impact more pronounced?
- Where are the most interesting investment opportunities, that is, where is the mismatch between supply and demand largest, so that higher occupancy rates and price-per-night increases could follow?
How has the market evolved over time?
Our first objective is to understand how the market has changed over the last five years, the period for which we have data.
The first variable we will analyze is market size, which we calculate as the number of listings multiplied by their average availability days per year (giving listing days) and by their price per night. This metric describes the supply side of the market: it tells us how much inventory is being offered in a given period.
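As a minimal sketch of this calculation, the snippet below multiplies the three inputs for a single year. The `YearSummary` type, the function names, and the numbers are hypothetical and chosen only for illustration. They are not taken from the dataset or from the app's source code.

```python
from dataclasses import dataclass


@dataclass
class YearSummary:
    """Aggregated inputs for one year (illustrative structure, not the app's)."""
    year: int
    listings: int               # number of active listings
    avg_available_days: float   # average availability days per listing per year
    avg_price: float            # average price per night, in USD


def listing_days(summary: YearSummary) -> float:
    """Supply in nights: listings multiplied by average available days."""
    return summary.listings * summary.avg_available_days


def market_size(summary: YearSummary) -> float:
    """Market size in USD: listing days multiplied by average nightly price."""
    return listing_days(summary) * summary.avg_price


# Made-up values, only to show the arithmetic.
example = YearSummary(year=2020, listings=1_000,
                      avg_available_days=100.0, avg_price=150.0)
print(listing_days(example))  # 100000.0
print(market_size(example))   # 15000000.0
```

Because the metric is a product of three terms, a change in market size can always be traced back to a change in listings, availability, or price, which is how the breakdown that follows reads it.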
We can see an initial reduction followed by small increases over time before a moderate fall in 2020.
In order to have a better understanding of what is going on let's take a look at the underlying variables.
We can see that the initial drop in market size, from 2015 to 2016, was caused by a large drop in the average available days per listing. This probably reflects a behavioral change by apartment owners once they had been able to test the business model.
We can also see that the drop in 2020 was caused by a reduction in both listings and average price, which was mitigated by an increase in average availability.
I believe that this decrease in 2020 was caused mainly by a large reduction in demand for the service. This can be observed in our next metric, number of reviews, which we use as a proxy for demand.
Data on the Reviews
So, while we see a decrease of 25.4% in market size from 2019 to 2020, we see a much larger decrease (67.1%) in the number of reviews. This could be, in part, the result of changes in customer habits, such as longer average stays, which produce fewer reviews but not necessarily less revenue. But we can also suspect that there has been a large decrease in demand.
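For reference, a year-over-year figure like the ones above is a plain percentage change. The sketch below shows the arithmetic with made-up inputs. The function name is hypothetical and the numbers are not the project's data.

```python
def percent_change(previous: float, current: float) -> float:
    """Percentage change from `previous` to `current` (negative means a decrease)."""
    if previous == 0:
        raise ValueError("previous value must be non-zero")
    return (current - previous) / previous * 100


# Made-up totals: a fall from 200 to 150 is a 25% decrease.
print(percent_change(200.0, 150.0))  # -25.0
```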
Ratio of Reviews
We have calculated another metric that lets us easily compare reviews with the relative size of the market. This is the ratio of reviews to listing days (listings multiplied by average available days per year). The ratio increases when reviews (demand) grow proportionally more than listing days (supply). The following graph allows us to visually examine how this ratio has changed over the years.
Here we can easily see that there was a trend towards a larger ratio, meaning larger relative increases in demand than in supply, until 2020, when the impact of COVID-19 caused a major decrease in the ratio for all neighbourhoods.
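The ratio can be sketched as follows, assuming reviews, listings, and average available days are already aggregated per year. The function name and the sample numbers are hypothetical.

```python
def reviews_per_listing_day(reviews: int, listings: int,
                            avg_available_days: float) -> float:
    """Reviews (demand proxy) divided by listing days (supply)."""
    listing_days = listings * avg_available_days
    if listing_days == 0:
        raise ValueError("listing days must be non-zero")
    return reviews / listing_days


# Made-up values: reviews fall while availability rises, so the ratio drops.
before = reviews_per_listing_day(reviews=5_000, listings=1_000,
                                 avg_available_days=100.0)
after = reviews_per_listing_day(reviews=2_000, listings=1_000,
                                avg_available_days=125.0)
print(before, after)  # 0.05 0.016
```

The ratio is unitless with respect to price, so it can be compared across areas with very different nightly rates.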
Price
We can suspect that this ratio has had an effect on prices, since the balance between supply and demand has changed so dramatically. The next graph allows us to see the changes in price per night by borough.
We can see that although there was not much change for most boroughs, there was a large decrease in Manhattan, which is by far the largest. The average price per night in Manhattan fell by 13.1%. We can also assume, given the large decrease in the reviews per listing days ratio, that the occupancy rate must have gone down.
Distribution
Finally, we can also compare the distribution of the market by room type.
We can see some minor changes over time, but the distribution has mostly stayed the same. We can also see that the market is made up largely of entire homes/apartments, followed by private rooms. The distribution for 2020 is the following:
- Entire home or apartment: 69.7%
- Private room: 26.1%
- Hotel room: 2.4%
- Shared room: 1.8%
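A breakdown like this can be sketched by counting listings per room type and dividing by the total. This assumes the shares are based on listing counts, which the text does not state explicitly. The function name and the sample list are hypothetical.

```python
from collections import Counter


def room_type_shares(room_types: list[str]) -> dict[str, float]:
    """Percentage of listings per room type, largest share first."""
    counts = Counter(room_types)
    total = sum(counts.values())
    return {room_type: 100 * n / total
            for room_type, n in counts.most_common()}


# Made-up sample of ten listings, not the real distribution.
sample = (["Entire home/apt"] * 7) + (["Private room"] * 2) + ["Shared room"]
shares = room_type_shares(sample)
for room_type, share in shares.items():
    print(f"{room_type}: {share:.1f}%")
```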
In the next section we will take a look at some tools that will allow us to more closely understand and examine the market each year.
Data on Yearly Snapshot
The next section of the app allows for more exploration of the yearly data. There are several types of graphs that allow us to get a quick understanding of the characteristics of the market at that point in time.
First there is a dashboard with a summary of the most important variables:
One can select the year from the drop down menu and get a quick view of the most important variables from this dashboard. This yearly selection can be applied to any of the following graphs as well.
Heatmap Data
Next there are several heatmaps which overlay the listings, price per night, and reviews on a map of NYC. The listings heatmap also has dynamic clusters that show how many units are available in each area they cover.
One can zoom in and out to get a quick visual of the data for different geographical areas. From this map we can quickly see that Manhattan and Brooklyn have the highest density of listings.
Treemap Data
We have also created several treemaps which display the different boroughs and their neighbourhoods. The area of each tile is based on either the number of listings or the market value. The tiles are also colored as a heatmap, which can display either the price per night or the reviews per listing days ratio.
We can see that the number of listings in Manhattan and Brooklyn looks similar. However, once the area is based on market value, it is clear that Manhattan makes up over 50% of the market.
In the second graph we can also get an idea of which neighbourhoods are more attractive for investment because they have a higher reviews per listing days ratio. Also, if we compare it to the same graph for 2019, we see a large difference in the heatmap due to the general decrease in this ratio.
The next graph is designed to also help us understand which neighbourhoods are more attractive for investment based on the reviews per listing days ratio.
Neighborhood Data
It shows the top neighbourhoods with a market value above a certain amount, so we can see which could potentially be the best opportunities. The graph also lets us adjust both how many results to display and the minimum market value a neighbourhood must have to be included.
The graph also maps market size to dot size, so we can quickly get an idea of which markets are larger. For example, East Elmhurst is the most attractive neighbourhood based on this ratio among the neighbourhoods with a market value above USD 1 million annually.
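The selection behind a chart like this can be sketched as a filter followed by a sort. The `Neighbourhood` type, the function name, the inclusive threshold, and the sample rows below are all hypothetical. They illustrate the idea and are not the app's code or data.

```python
from typing import NamedTuple


class Neighbourhood(NamedTuple):
    name: str
    market_value: float  # annual market value in USD
    ratio: float         # reviews per listing day


def top_neighbourhoods(rows: list[Neighbourhood], min_market_value: float,
                       top_n: int) -> list[Neighbourhood]:
    """Keep rows at or above the minimum market value, ranked by ratio."""
    eligible = [r for r in rows if r.market_value >= min_market_value]
    return sorted(eligible, key=lambda r: r.ratio, reverse=True)[:top_n]


# Made-up rows with placeholder names.
rows = [
    Neighbourhood("A", 2_500_000, 0.020),
    Neighbourhood("B", 800_000, 0.050),
    Neighbourhood("C", 1_200_000, 0.035),
    Neighbourhood("D", 4_000_000, 0.010),
]
best = top_neighbourhoods(rows, min_market_value=1_000_000, top_n=2)
print([r.name for r in best])  # ['C', 'A']
```

Here "B" has the highest ratio but is excluded by the market value threshold, which is the reason for the filter: a high ratio in a very small market is a weaker signal.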
Data Table and Download
Lastly, we have a section that lets us look at the neighbourhood data for each year and download it in CSV format. This makes it possible to take a closer look at a specific neighbourhood of interest or to download the data for further manipulation.
Final Thoughts and Further Opportunities
Thanks to the visualization tools, we have been able to quickly understand how the major features of the Airbnb market in NYC have evolved over time. The market appeared to be moving toward greater demand than supply, as reviews increased at a faster pace than the supply of days available for rent.
This, however, abruptly changed due to the impact of COVID-19, which wiped out demand in NYC. Once the pandemic struck the city and lockdowns were imposed, far fewer people travelled into the city because tourist attractions and restaurants were closed. Even those with business interests were unlikely to come, as remote work and communication were adopted for greater safety. The result was a much lower relative demand for Airbnb rentals. Still, with real estate prices falling, there could be adequate investment opportunities. More analysis of real estate prices would give us a more complete picture and support an educated investment decision.
We were also able to quickly identify the largest markets and which are the most attractive today. We also gained an idea of the different characteristics of the neighbourhoods and their geographical locations.
Future Research
Nevertheless, further research that could bring additional insights includes:
- Study of real estate prices to determine if an adequate ROI can be achieved for investments in properties that will be used as Airbnb rentals or similar.
- Analysis of individual reviews to understand the factors that impact the value perceived by the customers in order to tailor offerings that generate the most value.
- Analysis of price elasticity in the market to determine if there is a subset of assets where value could be generated through acquisition and optimization of the pricing strategy.
- Combining this data with available real estate data on land value to determine if this data can help predict increases in value in the real estate market.
- Comparing this data with hotel information to gather insights into how the traditional hospitality industry could better compete by tailoring its offer to the needs of Airbnb customers.
Finally, I hope this presentation and data visualization tool can bring value to you and your organization. I'm passionate about data and its potential impact, so please feel free to connect with me on LinkedIn to discuss any related topic or to stay in touch for future discussions.