Comparing Running Shoes

Posted on May 18, 2020

Motivation   

Interest in running has risen as the COVID-19 pandemic has forced gyms to close and people run more to stay fit, especially as the weather gets warmer. This web scraping project seeks to give beginner runners some comparison information as they look to buy a new pair of running shoes from Zappos.com, a third-party shoe retail site that offers 22 brands and close to 1,000 shoes. In addition to traditional factors such as price, rating and fit, two further measures are examined: shoe weight and the age of the brand. Beginners do not typically consider these two measures, but they yield some interesting insights.

 

Dataset

The data was scraped from the www.Zappos.com website. The site was chosen because it has a larger running shoe listing than similar sites such as Finish Line and Overstock.com. The scraped information includes brand, model, star rating, price, fit measures, weight and comments. Ratings are on a 5-star scale (5 stars = best). The fit measures consist of 3 categories (True to size, True to width, and Arch support), each graded by buyers on a scale of 0 to 100%. The age of each brand was found separately from other sources on the web. Of the 22 brands offered by the site, the 11 most popular were selected for analysis.
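As a rough illustration of the cleanup that scraped fields like these usually need, the sketch below converts raw strings into numbers and keeps only the most-listed brands. The field names, the sample rows, and the use of listing count as the popularity measure are all assumptions made for this example; they are not details of the original scraper.

```python
import re
from collections import Counter


def to_number(text):
    """Pull the first number out of a scraped string such as '$119.95' or '85%'."""
    match = re.search(r"\d+(?:\.\d+)?", text.replace(",", ""))
    return float(match.group()) if match else None


def top_brands(rows, n):
    """Return the n brands with the most listings (an assumed popularity measure)."""
    counts = Counter(row["brand"] for row in rows)
    return [brand for brand, _ in counts.most_common(n)]


# Hypothetical raw rows; the real scraped fields may be named differently.
raw_rows = [
    {"brand": "Brooks", "price": "$129.95", "weight": "10 oz", "true_to_size": "88%"},
    {"brand": "Brooks", "price": "$99.95", "weight": "11 oz", "true_to_size": "91%"},
    {"brand": "ASICS", "price": "$119.95", "weight": "12 oz", "true_to_size": "85%"},
    {"brand": "ASICS", "price": "$89.95", "weight": "10 oz", "true_to_size": "80%"},
    {"brand": "Altra", "price": "$140.00", "weight": "9 oz", "true_to_size": "75%"},
]

# Keep only rows from the top brands and convert the text fields to floats.
keep = set(top_brands(raw_rows, 2))
clean_rows = [
    {
        "brand": row["brand"],
        "price": to_number(row["price"]),
        "weight_oz": to_number(row["weight"]),
        "true_to_size": to_number(row["true_to_size"]),
    }
    for row in raw_rows
    if row["brand"] in keep
]
```

With the project's data, the second argument to `top_brands` would be 11 rather than 2.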

 

Price range  

Shoe prices are clustered in the $70 to $120 range. Most Nike models sell for under $100, which may be because Nike has devoted most of its R&D and marketing to basketball shoes and only recently began directing energy toward running shoes. In addition, Nike’s most advanced and higher-priced selections (such as the Vaporfly) are listed only on Nike’s own website, not on Zappos.

Because the analysis focuses on the most popular shoes, it is no surprise that the ratings are clustered around 4 and 5 stars. One conclusion that can be drawn at this point, however, is that there is little correlation between price and rating, so one does not have to spend a lot of money to buy a good pair of running shoes.
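One way to check a claim like this is to compute a Pearson correlation coefficient between price and rating. The sketch below is a minimal, standard-library version; the sample values are made up for illustration and are not the scraped Zappos data.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length sequences with at least 2 values")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


# Made-up values for illustration only; not the scraped data.
prices = [70.0, 85.0, 100.0, 120.0, 150.0]
ratings = [4.5, 4.0, 5.0, 4.5, 4.5]

r = pearson(prices, ratings)
print(f"price vs. rating: r = {r:.2f}")
```

A coefficient near 0 would be consistent with "little correlation", while values near 1 or -1 would indicate a strong linear relationship.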

 

 

Rating and shoe fit

Digging a little deeper into the information content of the ratings, we can look at how ratings relate to shoe fit. As noted earlier, the fit measures consist of 3 categories (True to size, True to width, and Arch support), each graded by buyers on a scale of 0 to 100%. Size and width are mostly consistent with the ratings, with higher ratings reflecting a better fit, but feedback on arch support is more varied. Arch support is harder to pin down because it relates to shoes that provide special cushioning to ‘correct’ biomechanical and pronation issues. Since these can be highly individual medical issues, it is no wonder that the spread of the feedback is wider. This suggests that runners looking for shoes with corrective cushioning should take more care in choosing.
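The "spread" of feedback can be quantified, for example, as the standard deviation of each fit measure within each star rating. The sketch below assumes hypothetical field names (`stars`, `true_to_size`, `arch_support`) and uses made-up rows; it is one possible way to make the comparison, not necessarily how the original analysis was done.

```python
from collections import defaultdict
from statistics import pstdev


def spread_by_rating(rows, measure):
    """Population standard deviation of a fit measure within each star rating."""
    groups = defaultdict(list)
    for row in rows:
        groups[row["stars"]].append(row[measure])
    return {stars: pstdev(values) for stars, values in sorted(groups.items())}


# Made-up rows for illustration; fit scores are percentages from 0 to 100.
shoes = [
    {"stars": 4, "true_to_size": 82, "arch_support": 60},
    {"stars": 4, "true_to_size": 86, "arch_support": 90},
    {"stars": 5, "true_to_size": 90, "arch_support": 70},
    {"stars": 5, "true_to_size": 94, "arch_support": 98},
]

for measure in ("true_to_size", "arch_support"):
    print(measure, spread_by_rating(shoes, measure))
```

A larger standard deviation for arch support than for size or width, at the same star rating, would match the wider spread described above.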

 

Shoe weight  

Median shoe weight mostly falls between 10 and 12 oz., which is reasonable for most average runners. Some shoes cater to those looking for extra cushioning. There is little correlation between weight and price, so a potential buyer may not have to worry about paying a higher price for shoes with extra cushioning.
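A per-brand median weight can be computed with a simple grouping step, as in the sketch below. The field names and sample weights are hypothetical and chosen only to illustrate the calculation.

```python
from collections import defaultdict
from statistics import median


def median_weight_by_brand(rows):
    """Median shoe weight in ounces for each brand."""
    weights = defaultdict(list)
    for row in rows:
        weights[row["brand"]].append(row["weight_oz"])
    return {brand: median(values) for brand, values in weights.items()}


# Made-up weights for illustration only.
shoes = [
    {"brand": "Brooks", "weight_oz": 10.0},
    {"brand": "Brooks", "weight_oz": 11.0},
    {"brand": "Brooks", "weight_oz": 12.5},
    {"brand": "ASICS", "weight_oz": 9.5},
    {"brand": "ASICS", "weight_oz": 11.5},
]

medians = median_weight_by_brand(shoes)
for brand, weight in medians.items():
    print(f"{brand}: {weight} oz")
```

The median is used rather than the mean so that a few unusually heavy, heavily cushioned models do not skew a brand's typical weight.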

 

Brand Age

Age is used as a proxy for the intangible qualities of freshness and innovation. This particularly relates to On and Hoka One One, the two newest entrants to the running shoe market. Though the two brands are only about 10 years old, they have become very popular with runners: On for the innovative design of its soles, and Hoka for cushioning that does not add much to the overall weight of the shoes. However, the two brands typically cost more than more established brands. It is interesting to note that though the Nike brand is about 70 years old, it has a strong innovation cycle. As Nike directed its vast resources to the running shoe market in recent years, it introduced the highly touted, and controversial, Vaporfly model, which has generated a great deal of buzz. The net effect is that these brands have forced others to go back to the drawing board to rethink and innovate to keep things fresh.

 

Future considerations

Future project considerations include comparing women’s running shoes. This is particularly significant because women make up about half of marathon finishers in the U.S. and more than 50% of all casual runners. Other interesting topics to explore include the demographic makeup of the brands and a time series of brand evolution. A deeper analysis could also be performed on a richer dataset obtained by scraping the brand sites directly, which, combined with machine learning algorithms, may yield more insight.

About Author

Peter Liu

Peter Liu has more than 14 years of experience in corporate credit risk management and held various financial analytics positions. He has an MBA and a BA in mathematics.