Data Study on Colorado's 14ers
The skills the author demonstrated here can be learned through the Data Science with Machine Learning bootcamp at NYC Data Science Academy.
Introduction
Hiking Colorado's 14ers is one of the most popular summer pastimes in the state. Every year, data shows that thousands of people hoping to get away from civilization and take in a beautiful mountain sunrise head to one of the 53 (or 58, depending on who you’re talking to) 14ers in Colorado. Some of these 14ers are as easy as a walk in the park, while others should only be attempted by experienced hikers.
In this analysis, I looked at several aspects of the 14ers. All of the data was scraped from 14ers.com, a popular site used by most hikers. Route data, mountain data, and weather data were the primary sources of information for this project. I directed my analysis toward the outdoor industry: there are many opportunities for outdoor companies to get involved in trail restoration, and the data could also be used to glean insights into what items hikers will need and where (based on weather data, length of routes, etc.).
Data
Working exclusively with BeautifulSoup, I began by scraping high-level mountain data: the class of the hike, the mountain name, and the elevation above sea level. I then cleaned the data returned from the page, which involved building a regex to determine whether each mountain was, in fact, a 14er. (13ers are also widely popular in Colorado but are outside the scope of this project.) Finally, I segmented the data so that each mountain could be parsed independently.
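A minimal sketch of what such a regex check might look like, assuming a 14er is any peak with an elevation of at least 14,000 feet and that the scraped elevation arrives as text such as "14,433'" or "14255 ft". The row layout, the peak names, the elevations, and the helper names `parse_elevation` and `is_14er` are all hypothetical and are not taken from the original scraper.

```python
import re

# Matches a four- or five-digit elevation with an optional thousands comma,
# e.g. "14,433" or "14255". The lookarounds stop it from matching a slice of
# a longer number.
ELEVATION_RE = re.compile(r"(?<!\d)(\d{1,2}),?(\d{3})(?!\d)")


def parse_elevation(text):
    """Return the first elevation (in feet) found in text, or None."""
    match = ELEVATION_RE.search(text)
    if match is None:
        return None
    return int(match.group(1) + match.group(2))


def is_14er(elevation_text):
    """True when the parsed elevation is at least 14,000 feet."""
    feet = parse_elevation(elevation_text)
    return feet is not None and feet >= 14000


# Hypothetical rows: (mountain name, elevation text, class text).
rows = [
    ("Example Peak A", "14,433'", "Class 1"),
    ("Example Peak B", "14255 ft", "Class 3"),
    ("Example Peak C", "13,822'", "Class 2"),
]

# Keep only the rows whose elevation qualifies as a 14er.
fourteeners = [row for row in rows if is_14er(row[1])]
```

Converting the match to an integer before comparing keeps the threshold in one place, which makes it easy to adjust if 13ers are ever brought into scope.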
I followed the same process to scrape the route data and the weather data. Each 14er may have multiple routes to the summit, which I took into account later in the analysis. Each mountain forecast provided five days and nights of weather data. The site did not present the forecast in a traditional datetime format, so the data required manipulation to determine the exact date of each forecast on any given run of the scraping code.
To pull the raw data, I looped through the route URL for each mountain and returned a list of that mountain's forecasts for the next five days.
The returned data needed significant cleaning and had to be assigned a real date, because each forecast came back labeled only with a string rather than a date.
To clean the dates, I built a conversion dictionary whose keys and values specify how many hours to add for each index of the returned weather. For example, the weather always begins with Today’s High and continues in 12-hour intervals, so each index is approximately 12 hours later than the previous one.
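The idea of an index-to-hours conversion dictionary can be sketched roughly as follows. This is an illustration under stated assumptions rather than the original implementation: the dictionary name `HOURS_BY_INDEX`, the function `date_forecasts`, the choice to count hours from midnight of the run date, and the sample forecast strings are all hypothetical.

```python
from datetime import date, datetime, time, timedelta

# Assumed mapping from forecast position to an hour offset. Index 0 is
# "Today's High" and each later entry is about 12 hours further out; five
# days and nights give ten entries.
HOURS_BY_INDEX = {index: 12 * index for index in range(10)}


def date_forecasts(forecasts, run_date):
    """Pair each forecast string with the calendar date it applies to."""
    # Counting from midnight keeps each day's high and low on the same date.
    start = datetime.combine(run_date, time.min)
    dated = []
    for index, text in enumerate(forecasts):
        when = start + timedelta(hours=HOURS_BY_INDEX[index])
        dated.append((when.date(), text))
    return dated


# Hypothetical scraped strings; the site's real wording may differ.
sample = ["Today's High: 55", "Tonight's Low: 34", "Tomorrow's High: 58"]
dated = date_forecasts(sample, date(2019, 7, 1))
```

With these inputs, the first two entries are assigned to the run date and the third to the following day.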
The result is a dictionary with each mountain name as a key and tuples of weather data as the values.
This result not only lets me use actual dates in any later analysis but also lets me run the scraper on a consistent schedule and build a history of weather on actual dates.
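One way such a history could be accumulated across scheduled runs is sketched below. It assumes each run produces a dictionary mapping a mountain name to a tuple of (date, forecast text) pairs that alternate between day and night entries. The function name `merge_run`, the "day"/"night" labels, and the sample values are hypothetical.

```python
from datetime import date


def merge_run(history, run_result):
    """Fold one scraping run into history, keeping the newest forecast per slot."""
    for mountain, entries in run_result.items():
        per_mountain = history.setdefault(mountain, {})
        for index, (day, text) in enumerate(entries):
            # Assumed layout: even indexes are daytime highs, odd are night lows.
            period = "day" if index % 2 == 0 else "night"
            per_mountain[(day, period)] = text
    return history


d1, d2 = date(2019, 7, 1), date(2019, 7, 2)

# Two hypothetical runs on consecutive days for one made-up mountain.
run_one = {"Example Peak": ((d1, "High 55"), (d1, "Low 34"), (d2, "High 58"))}
run_two = {"Example Peak": ((d2, "High 60"), (d2, "Low 36"))}

history = {}
merge_run(history, run_one)
merge_run(history, run_two)
```

Because later runs overwrite earlier entries for the same date and period, the history ends up holding the most recent forecast issued for each slot.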
Jupyter
From here, I moved from scraping to my Jupyter notebook for data exploration and initial analysis. Since my main focus was scraping and cleaning the data, my exploration only scratches the surface of what is possible.
To load the data from the previous file, I called the function and assigned the results to their respective datasets.
I started with some basic questions:
Seaborn
I also used Seaborn to visualize some of the data.
Then, I used some of the data to look at the weather patterns:
Overall, the project allowed me to begin scratching the surface of what’s possible. This data can easily be used to help understand where the best opportunities for trail improvement are and which trails are still experiencing harsher weather. This, in turn, can help companies decide not only what to stock but where to stock it.
Further research:
- Detailed data on which routes and mountains are most frequented by hikers of different skills during different times of the year. Possible data here.
- Data on how much each route is in disrepair. Possible data here.
- Where most accidents are happening on different mountains. Possible data here.
- Historical and present weather data in order to better understand long-term inventory. Possible data here.