Data Study on Colorado's 14ers

Posted on Jul 28, 2019

Introduction

Hiking Colorado’s 14ers is one of the state’s most popular summer pastimes. Every year, thousands of people hoping to get away from civilization and take in a beautiful mountain sunrise head to one of the 53 (or 58, depending on who you’re talking to) 14ers in Colorado. Some of these 14ers are as easy as a walk in the park, while others should only be attempted by experienced hikers.

In this analysis I looked at several aspects of the 14ers. All of the data was scraped from 14ers.com, a popular site used by most hikers. Route data, mountain data, and weather data were the primary sources of information for this project. I directed my analysis toward the outdoor industry: there are many opportunities for outdoor companies to get involved in trail restoration, and the data could also be used to glean insights into what items hikers will need and where (based on weather data, length of routes, etc.).

Data

Working exclusively with BeautifulSoup, I began by scraping high-level mountain data: the class of the hike, the mountain name, and the elevation above sea level. I then cleaned the data returned from the page, which involved building a regex to determine whether each mountain was, in fact, a 14er. 13ers are also widely popular in Colorado but are outside the scope of this project. Finally, I segmented the data so that each mountain could be parsed independently.
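The original regex is not shown in this post, so here is a minimal sketch of what such a check could look like. It assumes the scraped text contains an elevation written like 14,210' or 14210 ft. The pattern, the function names, and the sample rows are all illustrative, not taken from the project.

```python
import re

# Assumed format: an elevation such as "14,210'" or "14210 ft" somewhere in the text.
ELEVATION_RE = re.compile(r"\b(\d{2}),?(\d{3})\s*(?:'|ft|feet)")


def parse_elevation(text):
    """Return the first elevation found in text as an int, or None if absent."""
    match = ELEVATION_RE.search(text)
    if match is None:
        return None
    return int(match.group(1) + match.group(2))


def is_14er(text):
    """True when the elevation in text is at least 14,000 and below 15,000 feet."""
    elevation = parse_elevation(text)
    return elevation is not None and 14000 <= elevation < 15000


# Made-up rows standing in for scraped page text.
rows = ["Sample Peak A | Class 1 | 14,210'", "Sample Peak B | Class 2 | 13,988'"]
fourteeners = [row for row in rows if is_14er(row)]
```

Filtering on the parsed number rather than on the raw string keeps the 13ers out even when they appear in the same listing.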

I followed the same process for scraping route data and weather data. Each 14er may have multiple routes to the summit, which I took into account later in the analysis. Each mountain forecast provided five days and nights of weather data. The site did not present dates in a traditional datetime format, so the data required manipulation to determine the exact date of each forecast on any given run of the scraping code.

 

To pull the raw data, I looped through the route URL for each mountain and returned a list of that mountain’s forecast for the next five days.
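The scraping code itself is not reproduced here, so the following is only a sketch of that loop under stated assumptions. The page markup (div.period, .label, .text), the example URL, and the sample forecast text are hypothetical, and the fetch argument stands in for a real HTTP call so the example runs without network access.

```python
from bs4 import BeautifulSoup


def parse_forecast(html):
    """Return (period label, forecast text) pairs from one forecast page."""
    soup = BeautifulSoup(html, "html.parser")
    periods = []
    # Hypothetical markup: one <div class="period"> per 12-hour forecast period.
    for block in soup.select("div.period"):
        label = block.select_one(".label").get_text(strip=True)
        text = block.select_one(".text").get_text(strip=True)
        periods.append((label, text))
    return periods


def scrape_forecasts(mountain_urls, fetch):
    """Loop over {mountain name: url} and collect the parsed forecast for each."""
    return {name: parse_forecast(fetch(url)) for name, url in mountain_urls.items()}


# Invented page content; a real run would download the HTML for each URL instead.
SAMPLE_PAGE = """
<div class="period"><span class="label">Today's High</span><span class="text">Sunny, 55F</span></div>
<div class="period"><span class="label">Tonight</span><span class="text">Clear, 34F</span></div>
"""

forecasts = scrape_forecasts(
    {"Sample Peak": "https://example.com/sample-peak"},
    lambda url: SAMPLE_PAGE,
)
```

Passing the fetch function in as a parameter keeps the parsing logic testable on saved pages, which is handy when the site layout changes.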

 

The returned data needed significant cleaning and had to be assigned a real date, because the site labels each forecast period only with a string (the first one is always Today’s High) rather than a date.

To clean the dates, I built a conversion dictionary that maps each index of the returned weather to the number of hours to add. The weather always begins with Today’s High and continues in 12-hour intervals, so each index is approximately 12 hours later than the previous one.
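As a rough illustration of the idea (not the original implementation), the conversion can be sketched as below. The names HOURS_BY_INDEX and date_forecast are hypothetical, the forecast strings are invented, and anchoring the first period at noon on the day of the run is an assumption.

```python
from datetime import datetime, timedelta

# Index of each forecast period -> hours after the first period (Today's High).
# Five days and nights give ten periods, 12 hours apart.
HOURS_BY_INDEX = {i: 12 * i for i in range(10)}


def date_forecast(periods, run_time):
    """Attach an approximate datetime to each forecast string."""
    # Assumption: the first period is anchored at noon on the day of the run.
    start = run_time.replace(hour=12, minute=0, second=0, microsecond=0)
    return [
        (start + timedelta(hours=HOURS_BY_INDEX[i]), text)
        for i, text in enumerate(periods)
    ]


# Invented forecast strings for a run started on the morning of July 28, 2019.
dated = date_forecast(["Sunny", "Clear", "Showers"], datetime(2019, 7, 28, 8, 30))
```

Because the offsets are relative to the run time, the same dictionary works on any day the scraper runs. A forecast with more than ten periods would raise a KeyError in this sketch.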

The result is a dictionary with each mountain name as a key and tuples of dated weather as the values.

This result not only lets me use actual dates in any later analysis but also makes it possible to run the scraper on a consistent schedule and build a history of weather on actual dates.
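One way such a history could be accumulated across scheduled runs is sketched below. The update_history helper, the sample runs, and the rule that a newer forecast replaces an older one for the same time slot are all assumptions made for illustration.

```python
from datetime import datetime


def update_history(history, run_result):
    """Merge one scraping run into the history, keyed by (mountain, timestamp)."""
    for mountain, dated_periods in run_result.items():
        for timestamp, weather in dated_periods:
            # Assumed policy: a later run overwrites the older forecast for a slot.
            history[(mountain, timestamp)] = weather
    return history


# Two invented runs, one day apart, that overlap on July 30 at noon.
first_run = {
    "Sample Peak": (
        (datetime(2019, 7, 29, 12), "Sunny"),
        (datetime(2019, 7, 30, 0), "Clear"),
        (datetime(2019, 7, 30, 12), "Showers"),
    )
}
second_run = {
    "Sample Peak": (
        (datetime(2019, 7, 30, 12), "Sunny"),
        (datetime(2019, 7, 31, 0), "Windy"),
    )
}

history = {}
update_history(history, first_run)
update_history(history, second_run)
```

Keying on the mountain and the timestamp means overlapping five-day windows from consecutive runs collapse into one record per time slot.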

Jupyter 

From here I moved from scraping to my Jupyter notebook for data exploration and initial analysis. Since my main focus was to scrape and clean the data, my exploration only scratches the surface of what is possible.

To load the data from the previous file, I called the scraping functions and assigned each result to its own dataset.

 

I started with some basic questions:

Seaborn

I also used Seaborn to visualize some of the data.

 

Then, I used some of the data to look at the weather patterns:

 

Overall, the project allowed me to begin scratching the surface of what’s possible. This data can easily be used to help understand where the best opportunities for trail improvement are and which trails are still experiencing harsher weather. This, in turn, can help companies decide not only what to stock but where to stock it.

 

Further research:

  • Detailed data on which routes and mountains are most frequented by hikers of different skill levels at different times of the year. Possible data here.
  • Data on the state of disrepair of each route. Possible data here.
  • Data on where most accidents happen on different mountains. Possible data here.
  • Historical and present weather data, in order to better understand long-term inventory needs. Possible data here.

 

About Author

Katherine Treadwell

Analytics professional passionate about empowering teams to make well-informed, data-driven decisions. Proven leader and strategic partner in developing long-term plans for success. Strong foundation in technology and business applications. Teacher, developer, and mentor of team members and the...