National Park Analysis: Season, Popularity, and Fees

Posted on Jul 28, 2019



Motivation

Why did I choose national parks?
I like to travel, and whenever I have free time I try to take a trip.
I also like to look up the places that appear in the background of movies. I became interested in national parks after watching the movie ‘Wild’. Since I have not visited one yet, I chose "which national park to visit" as the subject of this project.

Questions

This project addresses three questions.

Q1. Which are the 10 most popular national parks in the United States?

Q2. When is a good time to go to a national park?

Q3. How much is the entrance fee at a popular national park? Is the entrance fee related to the number of visitors?

Web Scraping

I used Scrapy to scrape data from two websites.
From Wikipedia I scraped the visitor numbers per park. From the official National Park Service site I scraped the visitor numbers for the 10 most visited national parks and the entrance fee data.

  • Wikipedia : 

    -  https://en.wikipedia.org/wiki/List_of_national_parks_of_the_United_States

  • National Park official site :  

   - https://www.nps.gov/aboutus/visitation-numbers.htm

   - https://www.nps.gov/aboutus/entrance-fee-prices.htm
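Scraped table cells arrive as text, so visitor counts and fees have to be converted to numbers before they can be plotted. The following is a minimal sketch of that cleaning step, not the project's actual code. The helper names `parse_count` and `parse_fee` are hypothetical, and the input formats (comma-separated counts, a trailing footnote marker, a "Free" label) are assumptions about what the pages return.

```python
import re


def parse_count(text):
    """Turn a scraped count such as '11,421,200' into an int.

    Only the first run of digits and commas is used, so a trailing
    footnote marker like '[5]' is ignored.
    """
    match = re.search(r"\d[\d,]*", text)
    if match is None:
        raise ValueError(f"no number found in {text!r}")
    return int(match.group().replace(",", ""))


def parse_fee(text):
    """Turn a scraped fee such as '$20.00' into a float.

    Assumption: a cell that just says 'Free' means a fee of 0.
    """
    cleaned = text.strip().lower()
    if cleaned == "free":
        return 0.0
    match = re.search(r"\d+(?:\.\d+)?", cleaned.replace(",", ""))
    if match is None:
        raise ValueError(f"no fee found in {text!r}")
    return float(match.group())


# Illustrative strings only, not values taken from the scraped pages.
print(parse_count("1,234,567[3]"))
print(parse_fee("$15 per person"))
print(parse_fee("Free"))
```

Inside a Scrapy spider, helpers like these would typically be applied to each extracted cell before the item is yielded.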

Analysis 

Q1. Which are the 10 most popular national parks in the United States?

[10 Most Visited National Parks in 2018]

This bar graph shows the parks with the most visitors in 2018. Before I got this data, I thought Grand Canyon or Yosemite National Park would be number one or two, but Great Smoky Mountains National Park is number one by an overwhelming margin. Its number of visitors is almost twice that of the number two park, Grand Canyon. Third through tenth place run from Rocky Mountain National Park to Glacier National Park, and the gaps between their visitor numbers are small.

So, was Great Smoky Mountains also number one in the past? Was the Grand Canyon, currently in second place, always the second most visited park? Let's take a look at historical data for the 10 most popular national parks.

[Number of visitors to National Park by Year]

Great Smoky Mountains National Park still tops the list by a wide margin. The Grand Canyon was not number two in 1980, but steady gains have kept it in second place since 1990.

One notable part of the graph is that before 1990 Acadia National Park was second in visitors, ahead of the Grand Canyon, but in 1990 it dropped from second to sixth place by a wide margin. I suspect some specific factor caused its number of visitors to fall sharply.
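A change in position like this can be checked by ranking the parks within each year. The following is a minimal sketch with a hypothetical helper, `ranks_by_year`. The park names and numbers are placeholders, not real visitor counts.

```python
def ranks_by_year(visits):
    """Map each year to {park: rank}, where rank 1 is the most visited park."""
    result = {}
    for year, counts in visits.items():
        # Sort park names by their visitor count, largest first.
        ordered = sorted(counts, key=counts.get, reverse=True)
        result[year] = {park: i for i, park in enumerate(ordered, start=1)}
    return result


# Placeholder data: made-up parks and counts for illustration only.
visits = {
    1980: {"Park A": 900, "Park B": 400, "Park C": 300},
    1990: {"Park A": 950, "Park B": 250, "Park C": 380},
}

ranks = ranks_by_year(visits)
for year in sorted(ranks):
    print(year, ranks[year])
```

With real data, the same dictionary shape (year, then park, then visitor count) would show at a glance in which year a park gained or lost places.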

Q2. When is a good time to go to the national park?  

[National Park’s Monthly Visitors count]

This graph shows the number of visitors per month.
At every park, June through September are the months with the most visitors, and although it varies by park, July is usually the busiest month.
If you want to avoid crowds, I recommend skipping June through September and going in May or October.
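The busy season can also be read off the monthly counts directly. This is a minimal sketch assuming one park's visits are stored as a list of 12 numbers, January first. The helper names and the numbers are placeholders, not real data.

```python
MONTHS = [
    "January", "February", "March", "April", "May", "June",
    "July", "August", "September", "October", "November", "December",
]


def busiest_months(monthly, top=4):
    """Return the names of the `top` busiest months, busiest first."""
    order = sorted(range(12), key=lambda i: monthly[i], reverse=True)
    return [MONTHS[i] for i in order[:top]]


def summer_share(monthly):
    """Fraction of the year's visits that fall in June through September."""
    return sum(monthly[5:9]) / sum(monthly)


# Placeholder monthly counts for one made-up park.
monthly = [10, 12, 20, 35, 60, 95, 120, 110, 90, 55, 25, 12]

print(busiest_months(monthly))
print(round(summer_share(monthly), 2))
```

A high summer share together with much lower May and October counts is the pattern that supports travelling in those shoulder months.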

Q3. How much is the entrance fee at a popular national park?

[National Park’s Entrance fee]

This graph shows the entrance fee and the Annual Pass price for the 10 popular national parks. By entrance fee, the parks fall into three groups. The first group is Olympic National Park and Acadia National Park at $15 per person, the second group is seven parks at $20, and the last group, Great Smoky Mountains, charges no admission. As you can see, Great Smoky Mountains is also number one in visitors. So, is there a correlation between the number of visitors and the entrance fee?

Q3. Is the entrance fee related to the number of visitors?

[Correlation between the Number of Visitors and the Entrance Fee]

This scatter plot shows the relationship between the number of visitors and the entrance fee for 32 national parks.
Before analyzing the data, I expected that a cheaper entrance fee would mean more visitors, but the data did not show this.
Great Smoky Mountains has the largest number of visitors, and its entrance is free. Grand Canyon and Rocky Mountain also have many visitors, but their entrance fees are higher than those of other parks.

So, is the entrance fee related to the number of visitors for the national parks other than Great Smoky Mountains?

[Correlation between the Number of Visitors and the Entrance Fee]

(except Great Smoky Mountain)

I plotted the relationship between the entrance fee and the number of visitors, excluding Great Smoky Mountains, as a Seaborn scatter plot.
In this graph, parks with more visitors tend to have higher entrance fees. However, this is only what the data shows, and the difference between the most expensive and the cheapest entrance fee is small. High visitor numbers may not be the reason a fee is expensive; other factors may be at work. My guess is that the number of visitors is driven by a park's services, facilities, activities, and size. I would therefore like to examine the relationship between these factors and the number of visitors in future work.
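The effect of leaving out one free, heavily visited park can be illustrated with a Pearson correlation coefficient computed with and without that park. This is a minimal sketch, not the analysis behind the plots above. The `pearson` helper is written out by hand, and the fee and visitor values are placeholders chosen only to show how a single outlier can flip the sign.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length, at least 2 long")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    if spread_x == 0 or spread_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / (spread_x * spread_y)


# Placeholder rows: (park, entrance fee in dollars, visitors in arbitrary units).
# The first row stands in for a free park with far more visitors than the rest.
parks = [
    ("Free park", 0, 100),
    ("Park B", 15, 20),
    ("Park C", 15, 25),
    ("Park D", 20, 40),
    ("Park E", 20, 45),
    ("Park F", 20, 50),
]

r_all = pearson([p[1] for p in parks], [p[2] for p in parks])
rest = parks[1:]  # drop the free outlier
r_without = pearson([p[1] for p in rest], [p[2] for p in rest])

print("with the free park:   ", round(r_all, 2))
print("without the free park:", round(r_without, 2))
```

With only a few distinct fee values, a coefficient like this describes the sample and says nothing about cause, which matches the caution in the paragraph above.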

Summary

From this project, my conclusions on the three questions are as follows.

  • Great Smoky Mountains is number one in visitors. Among the other parks, the rankings have changed over time.
  • The months with the most visitors are June through September. If you prefer smaller crowds, May or October is a good time to go.
  • Great Smoky Mountains has the largest number of visitors, and its entrance is free. Excluding Great Smoky Mountains, parks with more visitors tend to have higher entrance fees. But this is only what the data shows; it does not establish a direct relationship between the entrance fee and the number of visitors. Other factors, such as a park's services, facilities, activities, and size, likely affect the number of visitors.

Future Works

If I have more time and data, I would like to analyze the following topics in more depth.

  • Correlation between a national park's services, facilities, activities, and size and its number of visitors
  • Relationship between weather and the number of visitors to national parks

Thank you for reading my blog! 
