Data shows how Economy Impacts National Park Visitation

Posted on Oct 25, 2021

The skills I demonstrated here can be learned through the Data Science with Machine Learning bootcamp at NYC Data Science Academy.

Data Science Background:

Over 300 million people visit US National Parks each year. The National Park Service (NPS) has over 22,000 full-time employees and hires around 10,000 more on short-term contracts in spring and summer, when visitation is higher. These short-term contracts are a large expense for the NPS. In years with large spikes in visitation, this number may not be enough and parks may not run as efficiently; if visitation grows more slowly than expected, the NPS may be spending more money than necessary.

Research Question and Objectives:

I gathered National Park visitation data as well as economic data to see if the two were correlated. I first checked whether National Park visitation as a whole is correlated with US average retail fuel prices and US median income. I then checked whether the economic factors had a stronger correlation with visitation to popular parks (parks with visitation above the yearly mean) or to unpopular parks (parks with visitation below the yearly mean). Finally, I took a single popular park and a single unpopular park and conducted a case study to check whether local economic data had a stronger correlation with visitation.

The Data:

  • National Park visitation data was gathered from the NPS website
  • Average retail fuel prices were gathered from the Energy Information Administration
  • Median income was gathered from the St. Louis Federal Reserve

Correlation was found by combining the data into a Pandas data frame and computing Pearson's correlation coefficient.
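As a rough illustration of that step, here is a minimal sketch of merging yearly series and computing Pearson's coefficient. The column names and all values are made-up placeholders, not the actual NPS, EIA, or Federal Reserve figures, and the original notebook may have been structured differently.

```python
import pandas as pd

# Placeholder values for illustration only (not the real data sets).
years = [2015, 2016, 2017, 2018, 2019]
visits = pd.DataFrame({"year": years, "visits": [100, 110, 115, 112, 120]})
income = pd.DataFrame({"year": years, "median_income": [50, 52, 54, 55, 58]})
fuel = pd.DataFrame({"year": years, "fuel_price": [3.0, 2.6, 2.5, 2.8, 2.4]})

# Join the three yearly series on their shared "year" column.
combined = visits.merge(income, on="year").merge(fuel, on="year")

# Series.corr uses Pearson's coefficient by default.
r_income = combined["visits"].corr(combined["median_income"])
r_fuel = combined["visits"].corr(combined["fuel_price"])

print(f"visits vs. median income: {r_income:.2f}")
print(f"visits vs. fuel price:    {r_fuel:.2f}")
```

With real data, the same two `corr` calls would produce the coefficients reported in the sections that follow.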

National Park Visitation as a Whole:

I first looked at how US median income and fuel prices correlated with National Park visitation as a whole. US median income had a .84 correlation with National Park visitation, whereas average retail fuel prices had a -.71 correlation.


National Park Visitation Over US Median Income


National Park Visitation Over Average Retail Fuel Price


This shows that both factors are strongly correlated with National Park visitation, but not all National Parks are equal. Some parks see millions of visitors each year while others see barely over 10,000, so there may be differences between popular parks and unpopular parks.

Popular Parks and Unpopular Parks:

To check whether the correlation between economic factors and visitation differed between parks, I split the National Parks into two groups: those above the yearly mean visitation and those below it. I then correlated the same US economic data with each group's visitation.
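The split can be sketched roughly as follows. This assumes "yearly mean" means the average visitation across all parks within a given year, which is one reading of the description; the park names, column names, and numbers are hypothetical.

```python
import pandas as pd

# Hypothetical long-format table: one row per park per year (made-up numbers).
park_visits = pd.DataFrame({
    "year":   [2017, 2017, 2017, 2018, 2018, 2018, 2019, 2019, 2019],
    "park":   ["A", "B", "C"] * 3,
    "visits": [3900, 290, 110, 4000, 300, 100, 4400, 310, 120],
})

# Mean visitation across parks within each year, aligned to every row.
yearly_mean = park_visits.groupby("year")["visits"].transform("mean")

# Label each park-year as above or below that year's mean.
park_visits["group"] = (park_visits["visits"] > yearly_mean).map(
    {True: "popular", False: "unpopular"}
)

# Total visitation per group per year.
group_totals = park_visits.pivot_table(
    index="year", columns="group", values="visits", aggfunc="sum"
)

# Placeholder economic series indexed by year (not real figures).
median_income = pd.Series([61, 63, 68], index=[2017, 2018, 2019])

# Pearson correlation of each group's yearly total with the economic series.
group_corr = group_totals.corrwith(median_income)
print(group_corr)
```

Grouping per year lets a park move between groups from one year to the next; fixing each park's group once over the whole period would be an equally plausible design.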

The popular parks' correlations were closest to those for total visitation: .83 for median income and -.71 for average retail fuel prices.


Popular Park Visitation Over US Median Income


Popular Park Visitation Over Average Retail Fuel Prices


This is in line with my initial assumptions. Most people using National Parks as a vacation destination would most likely go to the popular parks and would most likely be influenced by economic factors.


On the other hand, unpopular park visitation did not have as strong a correlation with the economic data. While the correlation between visitation and median income was .82, the correlation with average retail fuel prices was only -.61.


Unpopular Park Visitation over US Median Income


Unpopular Park Visitation Over Average Retail Fuel Prices


I had initially expected a larger drop-off in correlation for both, since someone willing to visit one of these parks would most likely do so regardless of economic factors. I then thought that by looking at more localized data I might find a stronger correlation (for example, if a park is located in California, its visitation might be more correlated with California economic data). I ended up taking data from one popular park and one unpopular park and comparing it with their respective state data.
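A per-park comparison of this kind can be sketched with a correlation matrix, which is also what a correlation heat map visualizes. The column names and values below are invented for illustration and do not reproduce the case-study numbers.

```python
import pandas as pd

# Made-up yearly figures for a single hypothetical park.
park = pd.DataFrame({
    "visits":       [3.7, 3.9, 4.1, 4.0, 4.4],
    "state_income": [60, 62, 66, 67, 71],
    "us_income":    [55, 57, 59, 61, 63],
    "state_fuel":   [3.6, 3.1, 2.9, 3.3, 3.4],
    "us_fuel":      [3.0, 2.6, 2.5, 2.8, 2.4],
})

# Pairwise Pearson correlations between every pair of columns.
corr_matrix = park.corr()

# Keep only the correlations with visitation and find the strongest one.
visit_corr = corr_matrix["visits"].drop("visits")
strongest = visit_corr.abs().idxmax()

print(visit_corr.round(2))
print("strongest factor:", strongest)
```

The resulting matrix could then be passed to a plotting function such as `seaborn.heatmap` to draw the heat map.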

Data Case 1: Yosemite

Location: California

2019 Number of Visitors: ~4,420,000

CA Median Income Correlation Coefficient: .61

US Median Income Correlation Coefficient: .58

CA Average Retail Fuel Price Correlation Coefficient: -.43

US Average Retail Fuel Price Correlation Coefficient: -.62


Yosemite Visitation Correlation Heat Map


While local median income had a stronger correlation than US median income for Yosemite, it still was not very strong. Also, US retail fuel prices had a stronger correlation than the local fuel prices.

Data Case 2: Great Basin

Location: Nevada

2019 Number of Visitors: ~120,000

US Median Income Correlation Coefficient: .72

NV Median Income Correlation Coefficient: .59

US Average Retail Fuel Price Correlation Coefficient: -.55

NV Average Retail Fuel Price Correlation Coefficient: -.61


Great Basin Visitation Correlation Heat Map


In this case, the local retail fuel prices had a stronger correlation than the US data, but once again it was not very strong.

Data Conclusion:

Although not much data was gathered for the local cases, their correlations were not as strong, so I decided to keep the initial recommendation to the NPS general rather than park-specific.

For every 1% increase in US median income, increase the number of short-term contracts by .43%, and for every 1% increase in US retail fuel prices, decrease the number of short-term contracts by .36%.


The recommendation is based on taking 50% of each economic factor's correlation coefficient, to play it safe, and testing its success as time goes on. Success means the NPS is able to accommodate guests without having too many people on staff.
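To make the rule concrete, here is a small sketch that applies the two stated rates to a baseline contract count. The function name and parameters are hypothetical, and adding the two effects together is an assumption; the post does not say how the adjustments should be combined.

```python
def adjusted_contracts(baseline, income_change_pct, fuel_change_pct,
                       income_rate=0.43, fuel_rate=0.36):
    """Apply the recommended rates to a baseline number of short-term contracts.

    Assumes the income and fuel adjustments are simply added together.
    """
    change_pct = income_rate * income_change_pct - fuel_rate * fuel_change_pct
    return baseline * (1 + change_pct / 100)


# Example: about 10,000 seasonal hires, median income up 2%, fuel prices up 5%.
new_total = adjusted_contracts(10_000, income_change_pct=2.0, fuel_change_pct=5.0)
print(round(new_total))
```

In this example the fuel price increase outweighs the income increase, so the rule suggests slightly fewer contracts than the baseline (about 9,906).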

While I think this could end up being successful, the recommendation does come with a caveat: after going through the data, I do not fully believe that economic factors are the most influential.

Future Work:

While going through the data comparisons, I noticed that the year had the strongest correlation with National Park visitation, so I plotted visitation over time.


National Park Visits Over Time


I noticed that there was a big jump after 2013 and wondered what could explain it. It turns out a possible answer was right on my phone: 2013 is when Instagram's popularity began to explode. While it may not be causation, the number of Instagram users per year had a .81 correlation coefficient with National Park visitation. Overall Instagram usage is only loosely related to park visits, but I believe that looking at Instagram photos taken within National Park boundaries over time may yield a stronger prediction.

Thank you for reading! The GitHub repository, with a link to the slide deck, can be found here.

About Author

Jacob Smith

Hello! My name is Jacob Smith. I am a University of Arizona alumnus.