NYC Tourism, Web Scraping

Posted on Jul 23, 2020


New York City is one of the most visited cities in the world, and tourism is one of the city's primary sources of revenue. As someone who has lived in New York City for more than 10 years, I thought it would be fascinating to do a quick analysis to observe how tourism has changed over time. Let's get started...

Data - Scraping Data

I scraped data from two sites: TripAdvisor and NYCdata. TripAdvisor is one of the most popular sites for finding travel information. The dataset I scraped from it has four columns: the name of the attraction, its type, the number of reviews it has received, and its rating. I was curious to learn which of the city's hundreds or thousands of attractions were the most popular, and this dataset captures the overall popularity of each attraction. Here's a sneak peek at the dataset:

Dataset 1 - Tripadvisor
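To make the four-column layout concrete, here is a minimal sketch of how raw scraped strings could be cleaned into typed records. The raw formats ("1,234 reviews", "4.5 of 5 bubbles"), the dictionary keys, and the sample row are assumptions for illustration, not the actual scraper output.

```python
import re
from dataclasses import dataclass


@dataclass
class Attraction:
    """One row of the dataset: name, type, number of reviews, rating."""
    name: str
    category: str
    review_count: int
    rating: float


def parse_review_count(text: str) -> int:
    """Turn a string such as '12,345 reviews' into the integer 12345."""
    match = re.search(r"\d[\d,]*", text)
    if match is None:
        raise ValueError(f"no review count in {text!r}")
    return int(match.group().replace(",", ""))


def parse_rating(text: str) -> float:
    """Take the first number in a string such as '4.5 of 5 bubbles'."""
    match = re.search(r"\d+(?:\.\d+)?", text)
    if match is None:
        raise ValueError(f"no rating in {text!r}")
    return float(match.group())


def to_attraction(raw: dict) -> Attraction:
    """Convert one raw scraped row (hypothetical keys) into an Attraction."""
    return Attraction(
        name=raw["name"].strip(),
        category=raw["type"].strip(),
        review_count=parse_review_count(raw["reviews"]),
        rating=parse_rating(raw["rating"]),
    )


# Made-up row for illustration only; not a value from the scraped dataset.
sample = {
    "name": " Example Park ",
    "type": "Parks",
    "reviews": "1,234 reviews",
    "rating": "4.5 of 5 bubbles",
}
print(to_attraction(sample))
```

Storing the review count and rating as numbers rather than strings is what makes sorting and aggregating possible later on.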

The second site I scraped was New York City Data (NYCdata). It's a site that publishes data and information mostly for educational purposes, including figures on New York City tourism over the last 13 years. What I scraped covers the number of domestic and international tourists, the economic impact of tourism on New York, and the performance of NYC's hotel industry.

Dataset 2 - NYCData



Top ten attractions in NYC
Top ten attraction categories
Top ten attraction categories - Graph
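Rankings like the ones above could be produced along the following lines. This is only a sketch: it assumes review count is the popularity measure and that categories are ranked by their summed review counts, and the rows are placeholders rather than scraped values.

```python
from collections import Counter

# Hypothetical rows (name, type, review count); placeholders, not scraped values.
rows = [
    ("Attraction A", "Museums", 5000),
    ("Attraction B", "Parks", 12000),
    ("Attraction C", "Museums", 800),
    ("Attraction D", "Landmarks", 7000),
]


def top_attractions(rows, n=10):
    """Return the n rows with the most reviews, largest first."""
    return sorted(rows, key=lambda row: row[2], reverse=True)[:n]


def top_categories(rows, n=10):
    """Return the n types with the most total reviews as (type, total) pairs."""
    totals = Counter()
    for _, category, reviews in rows:
        totals[category] += reviews
    return totals.most_common(n)


for name, category, reviews in top_attractions(rows, n=3):
    print(f"{name} ({category}): {reviews} reviews")

for category, total in top_categories(rows, n=3):
    print(f"{category}: {total} reviews in total")
```

Ranking categories by the number of attractions in each type instead of by total reviews would be an equally reasonable choice; it only requires adding 1 per row rather than the review count.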

After reviewing the first dataset, I moved on to the second. The Domestic and International Tourists section covers the overall number of visitors, both foreign and domestic. Between 2004 and 2017, there were almost 710 million visits, as seen in the graphs below. About 569 million of those visits were domestic, and the rest were international. It's no secret that New York City is a popular destination for both domestic and foreign travelers, and its popularity keeps growing.
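As a quick check on those totals, the split can be worked out directly. The sketch below treats the 569 million figure as the domestic portion of the roughly 710 million visits, which is an interpretation of the numbers quoted above; both inputs are rounded, so the results are approximate.

```python
total_visits_m = 710     # millions of visits, 2004-2017, as quoted in the text
domestic_visits_m = 569  # millions, read here as domestic visits (assumption)

international_visits_m = total_visits_m - domestic_visits_m
domestic_share = domestic_visits_m / total_visits_m

print(f"International visits: about {international_visits_m} million")
print(f"Domestic share: {domestic_share:.1%}")
```

On these rounded inputs, roughly four out of five visits were domestic.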

The next category I examined was the economic impact of tourism on NYC. It includes information on overall visitor spending, taxes, earnings for local workers, and employment created by tourism. Between 2004 and 2017, international and domestic visits generated over 32 billion dollars in annual revenue, 8 billion dollars in annual taxes, 18 billion dollars in annual earnings, and over 34,000 jobs. Tourism therefore benefits New Yorkers by providing more job opportunities and financial resources. Total visitor spending and wages are positively correlated, which suggests that more people visiting NYC goes hand in hand with more job opportunities and revenue for the city.
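One common way to quantify a link like the one between visitor spending and wages is the Pearson correlation coefficient. The sketch below is an assumption about how such a check could be done, not the method used for this post, and the yearly values are placeholders rather than NYCdata figures.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length, at least 2 items")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / sqrt(var_x * var_y)


# Placeholder yearly values in billions of dollars; not figures from NYCdata.
spending = [20.0, 25.0, 30.0, 35.0]
wages = [11.0, 13.5, 16.0, 18.5]

r = pearson(spending, wages)
print(f"Pearson r = {r:.3f}")
```

A value near 1 indicates that the two series rise together. It does not show on its own that one causes the other.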

Hotel industry performance is the next category, where I looked at two variables: the daily room rate and the average hotel occupancy. As one of the most popular tourist destinations, New York City has one of the largest and most profitable hotel markets. The daily room rate in NYC has always been above $200, and the occupancy rate has stayed above 0.8 (80 percent) on average, as shown in the first graph.



To summarize, tourism is unquestionably one of the city's most valuable revenue streams. From 2004 to 2017, it helped the government generate about 18 billion dollars in revenue each year, although the numbers dropped slightly during the financial crisis in 2008 and 2009.


If you have any questions or comments, please feel free to reach out on LinkedIn or GitHub.





About Author



With a bachelor's degree in Finance and a bachelor's degree in Statistics, Wei (Evin) Lin is a certified data scientist. He has more than two years of finance and accounting internship experience in the area of sales and trade,...