NYC Tourism, Web Scraping

Posted on Jul 23, 2020


New York City is one of the world’s most visited cities, which makes tourism one of its major sources of income. As someone who moved to New York City over ten years ago, I thought it would be interesting to run a short analysis of how tourism has grown in recent years. Let’s dive in…

Scraping data

The two sites I scraped are Tripadvisor and NYCData, a site affiliated with Baruch College. Tripadvisor is a great resource for looking up travel information, and it is one of the most used sites in this category. The dataset I scraped contains four columns: the attraction name, the type of attraction, the number of reviews and the rating it received. I wanted to find out which attractions were the most popular out of the hundreds or thousands in the city, and this dataset gives an overall sense of how popular each attraction is. The following is a peek at the dataset:

Dataset 1 - Tripadvisor
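
To illustrate how a page can be turned into the four columns described above, here is a minimal sketch using only Python's standard library. The HTML snippet, its class names, the `data-type` attribute and all the numbers in it are made up for illustration; Tripadvisor's real markup is different, and this is not necessarily how the original scraper was written.

```python
from dataclasses import dataclass
from html.parser import HTMLParser

# Hypothetical markup with made-up values; the real page structure differs.
SAMPLE_HTML = """
<div class="attraction" data-type="Parks">
  <span class="name">Central Park</span>
  <span class="reviews">130,000</span>
  <span class="rating">4.5</span>
</div>
<div class="attraction" data-type="Museums">
  <span class="name">The Metropolitan Museum of Art</span>
  <span class="reviews">54,000</span>
  <span class="rating">5.0</span>
</div>
"""


@dataclass
class Attraction:
    name: str
    category: str
    reviews: int
    rating: float


class AttractionParser(HTMLParser):
    """Collects one Attraction per <div class="attraction"> block."""

    def __init__(self):
        super().__init__()
        self.rows = []        # finished Attraction records
        self._current = None  # raw fields of the block being read
        self._field = None    # class of the <span> currently open

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "div" and attrs.get("class") == "attraction":
            self._current = {"category": attrs.get("data-type", "")}
        elif tag == "span" and self._current is not None:
            self._field = attrs.get("class")

    def handle_data(self, data):
        # Only keep text that sits inside a <span> of an attraction block.
        if self._current is not None and self._field:
            self._current[self._field] = data.strip()

    def handle_endtag(self, tag):
        if tag == "span":
            self._field = None
        elif tag == "div" and self._current is not None:
            self.rows.append(Attraction(
                name=self._current["name"],
                category=self._current["category"],
                # Review counts are written with thousands separators.
                reviews=int(self._current["reviews"].replace(",", "")),
                rating=float(self._current["rating"]),
            ))
            self._current = None


parser = AttractionParser()
parser.feed(SAMPLE_HTML)
parser.close()
for row in parser.rows:
    print(row)
```

In practice a scraping library such as Scrapy or Beautiful Soup would handle the page download and the parsing, but the idea of mapping page elements to columns is the same.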

The second site that I scraped is affiliated with Baruch College. It contains data and information mostly for educational purposes, including the data I scraped on New York City’s tourism over the past 13 years. The data covers four categories: Domestic and International Visitors, Economic Impact of Tourism on New York City's Economy, New York City (NYC) Hotel Market, and International Visitors to New York City By Major Countries and Regions.

Dataset 2 - NYCData, Baruch College


While analyzing the Tripadvisor dataset, I wanted to find out what the top ten attractions in the city are and how popular they actually are. To do this, I simply grouped the data and looked for the highest values in the number of ratings or reviews. The results are shown in the captures below.

Top ten attractions in NYC
Top ten attraction categories
Top ten attraction categories - Graph
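
The ranking step can be sketched roughly as follows. The rows are made-up placeholders in the four-column layout of the Tripadvisor dataset, and summing reviews per category is an assumption about how the grouping was done, not a description of the original code.

```python
from collections import defaultdict

# Made-up placeholder rows: name, category, number of reviews, rating.
rows = [
    {"name": "Attraction A", "category": "Parks", "reviews": 1200, "rating": 4.5},
    {"name": "Attraction B", "category": "Museums", "reviews": 900, "rating": 4.0},
    {"name": "Attraction C", "category": "Parks", "reviews": 400, "rating": 3.5},
    {"name": "Attraction D", "category": "Landmarks", "reviews": 1500, "rating": 5.0},
    {"name": "Attraction E", "category": "Museums", "reviews": 100, "rating": 4.5},
]


def top_attractions(rows, n=10):
    """Return the n rows with the most reviews, most reviewed first."""
    return sorted(rows, key=lambda r: r["reviews"], reverse=True)[:n]


def top_categories(rows, n=10):
    """Sum reviews per category and return the n largest (category, total) pairs."""
    totals = defaultdict(int)
    for r in rows:
        totals[r["category"]] += r["reviews"]
    return sorted(totals.items(), key=lambda kv: kv[1], reverse=True)[:n]


for r in top_attractions(rows, n=3):
    print(r["name"], r["reviews"])
for category, total in top_categories(rows, n=3):
    print(category, total)
```

With the real dataset, the default `n=10` would give the top ten attractions and the top ten categories.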

After analyzing the first dataset, I moved on to the second. It has the four categories introduced earlier. The Domestic and International Visitors category covers the total number of domestic and international visitors. The graphs below show that NYC had over 710 million visitors from 2004 to 2017, of which 569 million were domestic and the rest international. NYC is clearly popular with both domestic and international visitors, and visitor numbers have been growing.
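
As a quick check of the split between the two groups, the arithmetic can be worked through in a few lines. The total is quoted as "over 710 million", so the results are approximate; the variable names are only for illustration.

```python
# Figures quoted in the text, in millions of visitors (2004 to 2017).
total_millions = 710      # "over 710 million", so treat as approximate
domestic_millions = 569

international_millions = total_millions - domestic_millions
domestic_share = domestic_millions / total_millions

print(f"International visitors: about {international_millions} million")
print(f"Domestic share: about {domestic_share:.0%}")
```

So roughly four out of five visitors in this period were domestic.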

The next category I inspected was Economic Impact of Tourism on New York City's Economy. It contains data on total visitor spending, taxes, wages for local workers and jobs created by tourism. On average, from 2004 to 2017, domestic and international visitors together generated over 32 billion dollars in spending each year, 8 billion dollars in taxes and 18 billion dollars in wages, and tourism created over 34,000 jobs each year. Tourism helps NYC locals by creating more job opportunities and more financial resources. The relationship between total visitor spending and wages is positive, which suggests that more people visiting NYC helps generate more job opportunities and income for the city.
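
One way to check whether two yearly series move together is the Pearson correlation coefficient. The sketch below uses made-up yearly values, not the NYCData figures, and is only meant to show the calculation; a value close to 1 indicates a strong positive linear relationship, although it does not by itself show that one causes the other.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of the same length (at least 2)")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)


# Made-up yearly values in billions of dollars, for illustration only.
spending = [25.0, 28.5, 30.0, 34.0, 38.5]
wages = [14.0, 16.0, 17.0, 19.5, 22.0]

r = pearson(spending, wages)
print(f"Correlation between spending and wages: {r:.3f}")
```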

The next category is the hotel market, and the two variables I looked at were the daily room rate and the average hotel occupancy. As one of the most popular tourist destinations, NYC has one of the biggest and most profitable hotel markets. As the first graph shows, the daily room rate in NYC has always been above $200, and the occupancy rate has held steady above 0.8 (80%) on average.


Future work

I hope to collect more data and provide a more comprehensive analysis of the NYC tourism business.



About Author



Wei (Evin) Lin is a certified data scientist with a bachelor’s in Finance and a bachelor’s in Statistics. He has 3+ years of finance and accounting internship experience across sales and trading, accounting and general finance fields. He...