Data Shows How the Economy Impacts National Park Visitation
The skills I demoed here can be learned through the Data Science with Machine Learning bootcamp at NYC Data Science Academy.
Data Science Background:
Data shows that over 300 million visitors go to US National Parks each year. The National Park Service (NPS) has over 22,000 full-time employees and hires around 10,000 more on short-term contracts in spring and summer, when visitation is higher. These short-term contracts make up a large expenditure for the NPS. In years with larger spikes in visitation, this number of seasonal hires may not be enough and parks may not run as efficiently, but if visitation does not grow as quickly, the NPS may be spending more money than necessary.
Research Question and Objectives:
I gathered National Park visitation data as well as economic data to see if the two were correlated. I first checked whether National Park visitation as a whole is correlated with US average retail fuel prices and US median income. I then checked whether the economic factors had a stronger correlation with visitation to popular parks (parks with visitation above the yearly mean) or to unpopular parks (parks with visitation below the yearly mean). Finally, I took a single popular park and a single unpopular park and conducted a case study to check whether local economic data had a stronger correlation with visitation.
- National Park visitation data was gathered from the NPS website
- Average retail fuel prices were gathered from the Energy Information Administration
- Median income was gathered from the St. Louis Federal Reserve
Correlation was found by combining the data into a pandas DataFrame and computing Pearson's correlation coefficient.
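As a rough sketch of that step, the snippet below merges three yearly tables on a shared year column and reads off Pearson's coefficient against visitation. The column names (`year`, `visitors`, `fuel_price`, `median_income`) and all of the numbers are made-up placeholders for illustration, not the actual NPS, EIA, or St. Louis Fed figures, and the real notebook may be organized differently.

```python
import pandas as pd

# Placeholder values for illustration only; NOT the real data used in this post.
visits = pd.DataFrame({
    "year": [2015, 2016, 2017, 2018, 2019],
    "visitors": [100, 110, 105, 120, 125],
})
fuel = pd.DataFrame({
    "year": [2015, 2016, 2017, 2018, 2019],
    "fuel_price": [3.5, 3.0, 3.2, 2.6, 2.4],
})
income = pd.DataFrame({
    "year": [2015, 2016, 2017, 2018, 2019],
    "median_income": [50, 52, 51, 55, 57],
})

# Combine the three sources into one DataFrame keyed by year.
combined = (
    visits.merge(fuel, on="year")
          .merge(income, on="year")
          .set_index("year")
)

# Pearson's coefficient of each economic series against visitation.
correlations = combined.corr(method="pearson")["visitors"].drop("visitors")
print(correlations)
```

Setting `year` as the index keeps it out of the correlation matrix, so only the two economic series are compared against visitation.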
National Park Visitation as a Whole:
I first looked at how US median income and fuel price correlated with National Park visitation as a whole. US median income had a .84 correlation with National Park visitation, whereas average retail fuel prices had a -.71 correlation.
This shows that both factors are strongly correlated with National Park visitation, but not all National Parks are equal. Some parks see millions of visitors each year while others see barely over 10,000. Perhaps there are some differences between popular parks and unpopular parks.
Popular Parks and Unpopular Parks:
To check whether the correlation between economic factors and National Park visitation differed by park popularity, I split the National Parks by yearly mean visitation. I then computed the correlation of the same US economic data with the visitation of each group.
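One way the split could look in pandas is sketched below. It assumes a long-format table with one row per park per year and classifies each park within each year against that year's mean; whether the original analysis classified per year or once across all years is not stated, so treat that as an assumption. The park names and visitor counts are invented.

```python
import pandas as pd

# Hypothetical parks and made-up visitor counts, for illustration only.
park_visits = pd.DataFrame({
    "park": ["A", "B", "C", "A", "B", "C"],
    "year": [2018, 2018, 2018, 2019, 2019, 2019],
    "visitors": [4_000_000, 150_000, 900_000, 4_400_000, 120_000, 950_000],
})

# Mean visitation across parks for each year, aligned back to every row.
yearly_mean = park_visits.groupby("year")["visitors"].transform("mean")

# Popular = above that year's mean; unpopular = at or below it.
park_visits["group"] = (park_visits["visitors"] > yearly_mean).map(
    {True: "popular", False: "unpopular"}
)

# Total visitation per year for each group, ready to correlate with economic data.
group_totals = park_visits.pivot_table(
    index="year", columns="group", values="visitors", aggfunc="sum"
)
print(group_totals)
```

Each column of `group_totals` can then be correlated with the economic series in the same way as total visitation.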
The popular parks saw correlations most like those for total visitation, with median income at .83 and average retail fuel prices at -.71.
This is in line with my initial assumptions. Most people using National Parks as a vacation destination would most likely go to the popular parks and would most likely be influenced by economic factors.
On the other hand, visitation to unpopular parks did not have as strong a correlation with the economic data. While the correlation between visitation and median income was .82, the correlation with average retail fuel prices was only -.61.
I had initially expected a larger drop-off in correlation for both, because someone willing to visit one of these parks would most likely do so regardless of economic factors. I then thought that if I looked at more localized data I might find a stronger correlation (for example, if a park is located in California, its visitation would be more correlated with California economic data). I ended up taking data from one popular park and one unpopular park and comparing it with their respective state data.
Data Case 1: Yosemite:
2019 Number of Visitors: ~4,420,000
CA Median Income Correlation Coefficient: .61
US Median Income Correlation Coefficient: .58
CA Average Retail Fuel Price Correlation Coefficient: -.43
US Average Retail Fuel Price Correlation Coefficient: -.62
While the local median income correlation was stronger than the national one for Yosemite, it still was not very strong. Also, US retail fuel prices had a stronger correlation than the local data.
Data Case 2: Great Basin:
2019 Number of Visitors: ~120,000
US Median Income Correlation Coefficient: .72
NV Median Income Correlation Coefficient: .59
US Average Retail Fuel Price Correlation Coefficient: -.55
NV Average Retail Fuel Price Correlation Coefficient: -.61
In this case the local retail fuel prices had a stronger correlation than the US data, but once again it was not very strong.
Although I only looked at two local cases, the local correlations were not as strong, so I decided to keep my initial, general recommendation to the NPS.
For every 1% increase in US median income, increase the number of short-term contracts by .43%, and for every 1% increase in US retail fuel prices, decrease the number of short-term contracts by .36%.
The recommendation takes 50% of the correlation coefficient for each economic factor, to play it safe, and the NPS can test its success as time goes on. It is successful if the NPS is able to accommodate guests without having too many people on staff.
While I think this could end up being successful, the recommendation does come with a caveat. After going through the data, I do not fully believe that economic factors are the most influential.
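To make the rule concrete, here is a small sketch that applies the .43 and .36 factors to a pair of percentage changes. The function name, the choice to simply add the two effects together, and the example inputs (+2% income, +5% fuel prices) are my own assumptions for illustration; the 10,000 baseline is the approximate seasonal hiring figure mentioned earlier.

```python
def contract_change_pct(income_change_pct, fuel_change_pct,
                        income_factor=0.43, fuel_factor=0.36):
    """Suggested % change in short-term contracts.

    Assumes the two effects combine additively, which the rule
    itself does not spell out.
    """
    return income_factor * income_change_pct - fuel_factor * fuel_change_pct


# Approximate number of seasonal hires per year.
baseline_seasonal_hires = 10_000

# Hypothetical year: median income up 2%, fuel prices up 5%.
change = contract_change_pct(income_change_pct=2.0, fuel_change_pct=5.0)
adjusted_hires = round(baseline_seasonal_hires * (1 + change / 100))

print(f"Suggested change: {change:.2f}%")
print(f"Adjusted seasonal hires: {adjusted_hires}")
```

In this made-up scenario the fuel price increase outweighs the income increase, so the rule suggests slightly fewer short-term contracts than the baseline.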
While I was going through the data comparisons, I noticed that the year had the strongest correlation with National Park visitation, so I pulled a graph of visitation over time.
I noticed that there was a big jump after 2013 and wondered what that could be. It turns out a possible answer was right on my phone: 2013 is when Instagram's popularity began to explode. While it may not exactly be causation, the number of Instagram users per year had a .81 correlation coefficient with National Park visitation. While this data is relatively unrelated, I believe that looking at Instagram photos taken within National Park boundaries over time may yield a stronger prediction.
Thank you for reading! The GitHub repository with a link to the slide deck can be found here.