Web Scraping Data on NYC-Tourism

Posted on Jul 23, 2020
The skills the author demonstrated here can be learned by taking the Data Science with Machine Learning bootcamp at NYC Data Science Academy.


New York City is one of the most visited cities in the world, and tourism is one of the city's primary sources of revenue. As someone who has lived in New York City for more than 10 years, I thought it would be fascinating to do a quick data analysis to observe how tourism has changed over time. Let's get started...

Data - Scraping Data

I scraped data from two sites: TripAdvisor and NYCdata. TripAdvisor is one of the most popular sites for travel information. The dataset I scraped from it has four columns: the name of the attraction, its type, the number of reviews it has received, and its rating. I was curious to learn which of the city's hundreds, if not thousands, of attractions were the most popular, and this dataset captures the overall popularity of each one. Here's a sneak peek at the dataset:

Dataset 1 - Tripadvisor
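
As a rough illustration of how a four-column dataset like this could be pulled out of a listing page, here is a minimal sketch that uses only Python's standard library. The HTML snippet, its class names, and the helper names (`AttractionParser`, `clean`, `parse_attractions`) are all assumptions made up for this example. They do not reflect TripAdvisor's real markup or the scraper used for this post.

```python
from html.parser import HTMLParser

# Hypothetical markup: these class names and values are invented for
# illustration and do not match TripAdvisor's real pages or data.
SAMPLE_HTML = """
<div class="attraction">
  <span class="name">Example Park</span>
  <span class="type">Parks</span>
  <span class="reviews">1,234 reviews</span>
  <span class="rating">4.5</span>
</div>
<div class="attraction">
  <span class="name">Example Museum</span>
  <span class="type">Museums</span>
  <span class="reviews">987 reviews</span>
  <span class="rating">4.0</span>
</div>
"""

FIELDS = ("name", "type", "reviews", "rating")


class AttractionParser(HTMLParser):
    """Collect one dict of raw text fields per attraction block."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._current_field = None

    def handle_starttag(self, tag, attrs):
        css_class = dict(attrs).get("class")
        if tag == "div" and css_class == "attraction":
            # A new attraction block starts: open a fresh row.
            self.rows.append({})
        elif tag == "span" and css_class in FIELDS and self.rows:
            self._current_field = css_class

    def handle_data(self, data):
        # Only keep text that sits inside one of the four field spans.
        if self._current_field is not None:
            self.rows[-1][self._current_field] = data.strip()

    def handle_endtag(self, tag):
        if tag == "span":
            self._current_field = None


def clean(row):
    """Convert the raw text fields into typed values."""
    return {
        "name": row["name"],
        "type": row["type"],
        # "1,234 reviews" -> 1234
        "reviews": int(row["reviews"].split()[0].replace(",", "")),
        "rating": float(row["rating"]),
    }


def parse_attractions(html):
    parser = AttractionParser()
    parser.feed(html)
    parser.close()
    return [clean(row) for row in parser.rows]


rows = parse_attractions(SAMPLE_HTML)
for row in rows:
    print(row)
```

A real scraper would also need to fetch the pages, follow pagination, and cope with markup changes. A dedicated library such as Scrapy or Beautiful Soup usually handles those parts more comfortably than the standard-library parser.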

The second site I scraped was New York City Data (NYCdata), a page that publishes data and information mostly for educational purposes, including figures on New York City tourism over a 13-year period. The data I scraped covers the number of domestic and international tourists, the economic impact of tourism on New York, and the performance of the city's hotel industry.

Dataset 2 - NYCdata

Data Analysis


Top ten attractions in NYC
The ten attraction categories
Top ten attraction categories - Graph
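
Rankings like the ones captioned above can be computed from the four-column dataset in a few lines. The sketch below assumes that popularity is measured by review count, which the post does not state explicitly. The rows are illustrative placeholders rather than values from the scraped data, and `top_attractions` and `reviews_by_category` are hypothetical helper names.

```python
from collections import defaultdict

# Illustrative placeholder rows, not values from the scraped dataset.
attractions = [
    {"name": "Attraction A", "type": "Parks", "reviews": 500, "rating": 4.5},
    {"name": "Attraction B", "type": "Museums", "reviews": 300, "rating": 4.0},
    {"name": "Attraction C", "type": "Parks", "reviews": 200, "rating": 4.5},
    {"name": "Attraction D", "type": "Landmarks", "reviews": 450, "rating": 5.0},
]


def top_attractions(rows, n=10):
    """Return the n attractions with the most reviews, largest first."""
    return sorted(rows, key=lambda row: row["reviews"], reverse=True)[:n]


def reviews_by_category(rows, n=10):
    """Sum review counts per attraction type and return the top n types."""
    totals = defaultdict(int)
    for row in rows:
        totals[row["type"]] += row["reviews"]
    ranked = sorted(totals.items(), key=lambda item: item[1], reverse=True)
    return ranked[:n]


for row in top_attractions(attractions):
    print(row["name"], row["reviews"])

for category, total in reviews_by_category(attractions):
    print(category, total)
```

Counting attractions per category instead of summing reviews would be an equally reasonable definition of a "top category". Only the aggregation inside `reviews_by_category` would need to change.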

Second Dataset

After reviewing the findings from the first dataset, I moved on to the second. The Domestic and International Tourists section covers the overall number of visitors, both foreign and domestic. Between 2004 and 2017, there were almost 710 million visits, as seen in the graphs below. Domestic visitors accounted for 569 million of those visits, and international visitors for the rest. It's no secret that New York City is a popular destination for both domestic and foreign travelers, and its popularity keeps growing.

The Economic Impact on NYC's Economy was the next category I examined. It includes information on overall tourist spending, taxes, earnings for local workers, and jobs created by tourism. Between 2004 and 2017, international and domestic visits generated over 32 billion dollars in annual revenue, 8 billion dollars in annual taxes, 18 billion dollars in annual earnings, and over 34,000 jobs.

Furthermore, tourism benefits New Yorkers by providing more job opportunities and financial resources. Total visitor spending and wages are positively correlated, which suggests that more people visiting NYC helps the city generate more jobs and revenue.
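
One common way to quantify a relationship like the one between visitor spending and wages is the Pearson correlation coefficient. The post does not say which measure was used, so the sketch below is only an assumption about how such a check could look. The two series are made-up placeholders, not the NYCdata figures, and `pearson` is a hypothetical helper name.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long numeric series."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two series of equal length with at least 2 points")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sum((x - mean_x) ** 2 for x in xs)
    spread_y = sum((y - mean_y) ** 2 for y in ys)
    if spread_x == 0 or spread_y == 0:
        raise ValueError("correlation is undefined for a constant series")
    return covariance / math.sqrt(spread_x * spread_y)


# Placeholder yearly values, NOT the figures from the NYCdata tables.
visitor_spending = [1.0, 2.0, 3.0, 4.0]
wages = [2.0, 4.0, 6.0, 8.0]

r = pearson(visitor_spending, wages)
print(f"Pearson r = {r:.3f}")
```

A value close to 1 indicates that the two series rise together, and a value close to -1 that one falls as the other rises. Neither shows on its own that spending causes higher wages.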

Hotel industry performance is the next category, where I looked at two variables: the average daily room rate and the average hotel occupancy rate. As one of the most popular tourist destinations, New York City has one of the largest and most profitable hotel markets. The daily room rate in NYC has always been above $200, while the occupancy rate has stayed above 0.8 (80 percent) on average, as shown in the first graph.



To summarize, tourism is unquestionably one of the city's most valuable revenue streams. From 2004 to 2017, it helped the government generate about 18 billion dollars in revenue each year, although the numbers dropped slightly during the financial crisis of 2008 and 2009.


If you have any questions or comments, please feel free to reach out on LinkedIn or GitHub.

GitHub: https://github.com/Evinwlin/NYC_Toursim

LinkedIn: https://www.linkedin.com/in/wei-evin-lin/



About Author


With a bachelor's degree in Finance and a bachelor's degree in Statistics, Wei (Evin) Lin is a certified data scientist. He has more than two years of finance and accounting internship experience in the area of sales and trade,...