Web Scraping the Running Shoe Portal runrepeat.com

Lalith Sugavanam
Posted on Dec 15, 2017

Motivation

Whether you run for fitness or train for marathons, finding the best-fitting shoe among the many choices at a running store isn't always easy.

Research Questions

What are the most popular shoe brands?

What are the most popular shoes for specific needs?

Which features may have the greatest influence on customer satisfaction?

Data Collection

For my web scraping project, I decided to scrape http://www.runrepeat.com, a running shoe discovery and review platform. It has over 134,867 expert reviews and over 1,000 shoes for users to choose from.

To narrow my research scope, I focused on the top women's running shoes across all categories. I was able to scrape more than 400 shoes with the top scores for popularity and reviews. For the product dataset, I scraped brand name, shoe name, overall product rating, run score, rank, summary, and reviews. The review dataset also includes shoe details such as terrain, use, release date, score, reviews, and review summary. My web scraping code is available on GitHub.
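The post does not show the scraper itself, so the following is only a minimal sketch of how one product record (brand, shoe name, rating, run score, rank) could be pulled out of a listing page. It uses Python's standard-library html.parser so it runs without extra packages. The markup and class names (shoe, brand, name, rating, run-score, rank) are hypothetical stand-ins, not runrepeat.com's real HTML, and the sample values are placeholders rather than scraped data.

```python
from dataclasses import dataclass
from html.parser import HTMLParser

# HYPOTHETICAL markup: runrepeat.com's real HTML differs and changes over time.
SAMPLE_HTML = """
<div class="shoe">
  <span class="brand">Brand A</span>
  <span class="name">Shoe One</span>
  <span class="rating">4.5</span>
  <span class="run-score">92</span>
  <span class="rank">1</span>
</div>
<div class="shoe">
  <span class="brand">Brand B</span>
  <span class="name">Shoe Two</span>
  <span class="rating">4.3</span>
  <span class="run-score">88</span>
  <span class="rank">2</span>
</div>
"""


@dataclass
class Shoe:
    brand: str
    name: str
    rating: float
    run_score: int
    rank: int


class ShoeListParser(HTMLParser):
    """Collect one Shoe per <div class="shoe"> block (hypothetical class names)."""

    FIELDS = {"brand", "name", "rating", "run-score", "rank"}

    def __init__(self):
        super().__init__()
        self.shoes = []       # finished Shoe records
        self._current = None  # raw field text for the shoe being parsed
        self._field = None    # which field's text we are currently inside

    def handle_starttag(self, tag, attrs):
        cls = dict(attrs).get("class")
        if tag == "div" and cls == "shoe":
            self._current = {}
        elif self._current is not None and cls in self.FIELDS:
            self._field = cls

    def handle_data(self, data):
        # Only keep text that sits inside one of the known field spans.
        if self._field is not None:
            self._current[self._field] = self._current.get(self._field, "") + data.strip()

    def handle_endtag(self, tag):
        if tag == "span":
            self._field = None
        elif tag == "div" and self._current is not None:
            # Convert the raw strings into typed values when the block closes.
            c = self._current
            self.shoes.append(Shoe(
                brand=c["brand"],
                name=c["name"],
                rating=float(c["rating"]),
                run_score=int(c["run-score"]),
                rank=int(c["rank"]),
            ))
            self._current = None


def parse_shoes(html):
    parser = ShoeListParser()
    parser.feed(html)
    parser.close()
    return parser.shoes


shoes = parse_shoes(SAMPLE_HTML)
for shoe in shoes:
    print(shoe)
```

In a real project the HTML would come from an HTTP request (or a framework such as Scrapy), and the selectors would have to be adapted to the site's actual markup; the parsing-into-typed-records step is the part this sketch illustrates.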

Exploratory Data Analysis

Price Distribution

Rating vs. Number of Reviews

Word Cloud of Good Reviews

Word Cloud of Bad Reviews
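The post shows the word clouds but not how reviews were split into good and bad or how words were counted. One common approach, sketched here under stated assumptions, is to split reviews by score and count the non-stopword tokens in each group. The score threshold (GOOD_THRESHOLD = 4), the small stopword list, and the sample reviews are all hypothetical placeholders, not the author's data or method.

```python
import re
from collections import Counter

# HYPOTHETICAL reviews as (score out of 5, text); placeholder data, not scraped results.
REVIEWS = [
    (5, "Great cushioning and a comfortable fit for long runs"),
    (4, "Comfortable, light, and great on the road"),
    (2, "Narrow toe box and the sole wore out fast"),
    (1, "Too narrow, blisters after one run"),
]

# Small illustrative stopword list; a real analysis would use a fuller one.
STOPWORDS = {"a", "and", "the", "for", "on", "after", "one", "too", "out"}

# Hypothetical cut-off: a score of 4 or more counts as a "good" review.
GOOD_THRESHOLD = 4


def word_counts(texts):
    """Lowercase, tokenize on letters/apostrophes, drop stopwords, and count."""
    counts = Counter()
    for text in texts:
        for word in re.findall(r"[a-z']+", text.lower()):
            if word not in STOPWORDS:
                counts[word] += 1
    return counts


good = word_counts(text for score, text in REVIEWS if score >= GOOD_THRESHOLD)
bad = word_counts(text for score, text in REVIEWS if score < GOOD_THRESHOLD)

print("good:", good.most_common(3))
print("bad:", bad.most_common(3))
```

Frequency tables like these are what a word-cloud renderer sizes words by; for example, the third-party wordcloud package's WordCloud.generate_from_frequencies accepts a dict, and a Counter is a dict subclass.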

Source

http://www.runrepeat.com

About Author

Lalith Sugavanam

Lalith holds a Master's degree in Computer Applications from Bharathidasan University, India. She loves to program and has recently developed a fascination with extracting meaning from data. She has completed a 12-week data science course at NYC Data Science Academy.