Data Comparison on Running Shoes

Posted on May 18, 2020

Motivation   

There is rising interest in running as the COVID-19 pandemic has forced gym closures and people are running more to stay fit, especially as the weather gets warmer.  This web scraping project seeks to give beginner runners some comparison information as they look to buy a new pair of running shoes from Zappos.com, a third-party shoe retail site that offers 22 brands and close to 1,000 shoes.

Traditional factors such as price, rating and fit are considered, along with two additional measures: shoe weight and the age of the brand.  Beginners do not typically consider these two measures, but they do yield some interesting insights.

 

Dataset

The data was scraped from the Zappos website.  The site was chosen because it has a larger running shoe listing than similar sites such as Finish Line and Overstock.com.

The scraped information includes brand, model, star ratings, price, fit measures, weight and comments.  The ratings are on a 5-star scale (5 stars = best).  The fit measures consist of 3 categories (True to size, True to width, and Arch support), each graded by the buyers on a scale of 0 – 100%.  The age of the brand was found separately from other sources on the web.  From the 22 brands offered by the site, the 11 most popular brands were selected for analysis.
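The selection step can be sketched in a few lines of Python.  This is only an illustration, not the code used in the project: the record layout, the field names, the sample values, and the choice to measure popularity by number of listings are all assumptions, since the post does not say how popularity was defined.

```python
from collections import Counter

# Hypothetical scraped records; the field names and values are made up
# for illustration and are not taken from the actual dataset.
listings = [
    {"brand": "Nike", "true_to_size": "83%"},
    {"brand": "Nike", "true_to_size": "79%"},
    {"brand": "Brooks", "true_to_size": "91%"},
    {"brand": "Brooks", "true_to_size": "88%"},
    {"brand": "Brooks", "true_to_size": "85%"},
    {"brand": "On", "true_to_size": "76%"},
]


def parse_percent(text):
    """Convert a fit score such as '83%' to the float 83.0."""
    return float(text.strip().rstrip("%"))


def top_brands(records, n):
    """Return the n brands with the most listings, most frequent first.

    Assumption: popularity is approximated by the number of listings.
    """
    counts = Counter(record["brand"] for record in records)
    return [brand for brand, _ in counts.most_common(n)]


def keep_brands(records, brands):
    """Keep only the records whose brand is in the given collection."""
    wanted = set(brands)
    return [record for record in records if record["brand"] in wanted]


popular = top_brands(listings, 2)
subset = keep_brands(listings, popular)
fit_scores = [parse_percent(record["true_to_size"]) for record in subset]
```

With the real data, the same idea would apply with n set to 11.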


 

Price range Data

The shoe prices are clustered between $70 and $120.  Most Nike models sell for under $100, which may be because Nike devoted most of its R&D and marketing to basketball shoes and only recently started to direct energy toward running shoes.  In addition, Nike’s most advanced and higher-priced selections (such as Vaporfly) are listed only on Nike’s own website, not on Zappos.

Because the analysis focuses on the most popular shoes, it’s no surprise that the ratings are clustered around 4 and 5 stars.  However, one conclusion that can be drawn at this point is that there is little correlation between price and ratings, so one does not have to spend a lot of money to buy a good pair of running shoes.
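One way to put a number on "little correlation" is the Pearson correlation coefficient, where values near 0 indicate a weak linear relationship.  The sketch below is a minimal standard-library version; the post does not say which statistic, if any, was computed, and the prices and ratings shown are made-up sample values, not the scraped data.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of products of deviations (covariance times n).
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # Sums of squared deviations (variances times n).
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


# Made-up sample values for illustration only.
prices = [70.0, 85.0, 100.0, 120.0, 150.0]   # dollars
ratings = [4.5, 4.0, 5.0, 4.0, 4.5]          # stars

r = pearson(prices, ratings)
```

A coefficient close to 0 on the real data would support the conclusion above, while a value near 1 or -1 would contradict it.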

 

 

Data on Rating and shoe fit 

Digging a little deeper into the information content of the ratings, we can look at how the ratings relate to shoe fit.  As previously indicated, the fit measures consist of 3 categories (True to size, True to width, and Arch support), each graded by the buyers on a scale of 0 – 100%.

While size and width are mostly consistent with the ratings, in that higher ratings reflect greater fitting comfort, there is more varied feedback on arch support.  Arch support is harder to pin down as it relates to shoes that provide special cushioning to ‘correct’ biomechanical and pronation issues.  As these are potentially highly individual medical issues, it is no wonder that the spread of the feedback is wider.  This indicates that runners looking for shoes that provide corrective cushioning must take more care.
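The "wider spread" observation can be checked by grouping shoes by star rating and comparing the standard deviation of each fit measure within a group.  The following is a rough sketch under assumed field names and made-up values; it is not the method used to produce the original charts.

```python
from collections import defaultdict
from statistics import pstdev


def spread_by_rating(records, field):
    """Population standard deviation of `field` within each star rating.

    Groups with fewer than two shoes are skipped because their spread
    carries no information.
    """
    groups = defaultdict(list)
    for record in records:
        groups[record["stars"]].append(record[field])
    return {
        stars: pstdev(values)
        for stars, values in groups.items()
        if len(values) >= 2
    }


# Made-up fit scores (percent) for illustration only.
shoes = [
    {"stars": 5, "true_to_size": 90.0, "arch_support": 95.0},
    {"stars": 5, "true_to_size": 88.0, "arch_support": 70.0},
    {"stars": 4, "true_to_size": 80.0, "arch_support": 85.0},
    {"stars": 4, "true_to_size": 78.0, "arch_support": 55.0},
]

size_spread = spread_by_rating(shoes, "true_to_size")
arch_spread = spread_by_rating(shoes, "arch_support")
```

If the arch support spread is consistently larger than the size spread at each rating level, that matches the pattern described above.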

 

Shoe weight  

The median weight of the shoes falls mostly between 10 and 12 oz., which is reasonable for the average runner.  There are also some heavier shoes for those looking for extra cushioning.  There is little correlation between weight and price, so a potential buyer may not have to worry about paying a higher price for shoes with extra cushioning.
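As a sketch of how such medians might be computed, the snippet below groups shoe weights by brand and checks which brand medians fall inside the 10 to 12 oz. band.  Reading the statement as a per-brand median is an assumption, and the brand labels and weights are placeholder values rather than scraped figures.

```python
from collections import defaultdict
from statistics import median


def median_weight_by_brand(records):
    """Map each brand to the median weight (oz.) of its shoes."""
    weights = defaultdict(list)
    for record in records:
        weights[record["brand"]].append(record["weight_oz"])
    return {brand: median(values) for brand, values in weights.items()}


def in_band(value, low=10.0, high=12.0):
    """True if the value lies within the inclusive [low, high] band."""
    return low <= value <= high


# Placeholder brands and weights for illustration only.
shoes = [
    {"brand": "Brand A", "weight_oz": 10.5},
    {"brand": "Brand A", "weight_oz": 11.0},
    {"brand": "Brand A", "weight_oz": 12.5},
    {"brand": "Brand B", "weight_oz": 8.5},
    {"brand": "Brand B", "weight_oz": 9.5},
]

medians = median_weight_by_brand(shoes)
typical = sorted(brand for brand, m in medians.items() if in_band(m))
```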

 

Brand Age Data

Age is used as a proxy to capture the intangible measures of freshness and innovation.  This particularly relates to On and Hoka One One, the two newest entrants to the running shoe market.  Though the two brands are only about 10 years old, they have become very popular with runners:  On with the innovative design of its soles and Hoka with cushioning that does not add much to the overall weight of the shoes.

However, the two brands typically cost more than other, more established brands.  It is interesting to note that though the Nike brand is about 70 years old, it has a strong innovation cycle.  As Nike directed its vast resources to the running shoe market in recent years, it introduced the highly touted, and controversial, Vaporfly model that has generated a great deal of buzz.  The net effect of what these brands have done is to force other brands to go back to the drawing board to rethink and innovate to keep things fresh.

 

Future considerations

Future project considerations include comparing women’s running shoes.  This is particularly significant as women make up about half of the marathon finishers in the U.S. and more than 50% of all casual runners.  Other interesting topics that can be explored include the demographic makeup of the brands and the time series of the brand evolution.  Also, a deeper analysis can be performed on a richer dataset obtained by scraping the brand sites directly, which, combined with machine learning algorithms, may yield more insight.

About Author

Peter Liu

Peter Liu has more than 14 years of experience in corporate credit risk management and held various financial analytics positions. He has an MBA and a BA in mathematics.