Web Scraping the Running Shoe Portal runrepeat.com
Motivation
Whether you run for fitness or you are a marathon runner, finding the best-fitting shoe among the many choices at a running store isn't always easy, and a data-based comparison can help.
Research Questions
What are popular shoe brands?
What are the popular shoes for specific needs?
Which features may have a critical influence on customer satisfaction?
Data Collection
For my web scraping project, I decided to scrape http://www.runrepeat.com, a running shoe discovery and review platform. It has 134,867 expert reviews and over 1,000 shoes for users to choose from.
To narrow down my research scope, I focused on the top women's running shoes in all categories. I was able to scrape 400+ shoes with top scores in terms of popularity and top reviews. For the product dataset, I scraped brand name, shoe name, overall product rating, run score, rank, summary, and reviews. The review dataset additionally includes shoe details such as terrain, use, release date, score, reviews, and review summary. My web scraping code is available on GitHub.
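As a rough illustration of the kind of extraction described above, here is a minimal sketch using only Python's standard library. It is not the code from the GitHub repository. The HTML snippet, the class names (product, brand, name, score), the brand and model names, and the helper parse_products are all hypothetical placeholders, since the actual page structure of runrepeat.com is not shown in this post.

```python
from html.parser import HTMLParser

# Hypothetical markup: the class names and values below are placeholders,
# not the real structure or data of runrepeat.com.
SAMPLE_HTML = """
<ul>
  <li class="product">
    <span class="brand">BrandA</span>
    <span class="name">Model One</span>
    <span class="score">92</span>
  </li>
  <li class="product">
    <span class="brand">BrandB</span>
    <span class="name">Model Two</span>
    <span class="score">88</span>
  </li>
</ul>
"""

# Fields to collect from each product card (assumed names).
FIELDS = {"brand", "name", "score"}


class ProductParser(HTMLParser):
    """Collects one dict per <li class="product"> element."""

    def __init__(self):
        super().__init__()
        self.products = []    # finished product records
        self._current = None  # record being filled, or None outside a card
        self._field = None    # field whose text we are currently reading

    def handle_starttag(self, tag, attrs):
        cls = dict(attrs).get("class")
        if tag == "li" and cls == "product":
            self._current = {}
        elif tag == "span" and cls in FIELDS and self._current is not None:
            self._field = cls

    def handle_data(self, data):
        # Only keep text that sits inside one of the tracked spans.
        if self._field is not None:
            self._current[self._field] = data.strip()

    def handle_endtag(self, tag):
        if tag == "span":
            self._field = None
        elif tag == "li" and self._current is not None:
            self.products.append(self._current)
            self._current = None


def parse_products(html):
    """Return a list of product dicts with the score converted to int."""
    parser = ProductParser()
    parser.feed(html)
    for product in parser.products:
        if "score" in product:
            product["score"] = int(product["score"])
    return parser.products


products = parse_products(SAMPLE_HTML)
for product in products:
    print(product)
```

In a real scraper, the HTML would come from an HTTP request rather than a string, and the selectors would need to match the site's actual markup. The resulting list of dicts can be written to CSV or loaded into a data frame for the analysis that follows.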
Exploratory Data Analysis
Price Distribution
Rating vs Number of Reviews
Word Cloud of good reviews
Word Cloud of bad reviews
Source