If the Shoe Fits

Posted on Nov 16, 2018


Have you ever been out shoe shopping and wondered whether your favorite brand ever goes on sale? Ever been curious whether women's heels or flats are more expensive? As an avid shoe lover and buyer, I'm always hunting for a good deal on a pair of shoes. To make more data-driven decisions before purchasing shoes, I wanted to explore and visualize online data on women's shoes. A sample dataset is available on Kaggle, listing 10,000 women's shoes and their product information from 2014-2017, drawn from Datafiniti's product database. (Note that this is a sample of a database with 93 million product listings and 300 million price offers from thousands of online retailers.)

I used this Shiny project as an opportunity to analyze the relationship between a shoe's price and its style, brand, and merchant.


For the purposes of this analysis I set the following price ranges:

  • Cheap: $0.00 - $40.00
  • Mid-Priced: $41.00 - $150.00
  • Luxury: $151.00 +
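As a minimal sketch, these tiers can be written as a small Python function. The function name `price_tier` is hypothetical and not taken from the project. Note that the ranges as listed leave gaps between $40.00 and $41.00 and between $150.00 and $151.00; the sketch assumes that a price above $40.00 and up to $150.00 counts as mid-priced, and anything above $150.00 counts as luxury.

```python
def price_tier(price: float) -> str:
    """Return the price tier for a shoe price in US dollars.

    Assumption: the listed ranges have gaps (e.g. $40.50), so the upper
    bound of each tier is treated as inclusive and everything above it
    falls into the next tier.
    """
    if price < 0:
        raise ValueError("price must be non-negative")
    if price <= 40.00:
        return "Cheap"
    if price <= 150.00:
        return "Mid-Priced"
    return "Luxury"
```

With this rule, a $40.50 shoe lands in the mid-priced tier rather than falling between two tiers.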

Additionally, I filtered the dataset to include only the top 20 brands and merchants by product count. To start off, let's take a look at the price distributions across women's shoe styles:

Note: Prices up to the green dotted line fall in the 'cheap' range, prices between the green and red dotted lines fall in the 'mid' range, and prices beyond the red dotted line fall in the 'luxury' range.

  • Boots, unsurprisingly, have the highest prices
    • Boots also appear to have the widest price distribution, with Sandals not close behind
  • The majority of styles are offered at cheap and mid-price ranges. All styles are also available at luxury prices

How do brands vary in price range and style?

  • The first 3 brands listed (Dearfoams, Bamboo, C Label) exclusively offer shoes in the cheap price range
  • About half (or 9/20) of the top brands don’t offer sneakers
    • Most of the brands that offer sneakers sell sneakers exclusively
  • Fewer than half (8/20) of the top brands offer shoes in the expensive price range
  • Flats are the only style that top brands don’t offer in the expensive price range
  • Mid-priced shoes seem to be the most common among top brands

How do merchants vary in price range and style?

Note: y-axis = price

  • Ralph Lauren clearly offers the widest price range across styles
  • Sears.com is the only multi-brand retailer offering luxury-priced shoes
    • It's also interesting to note that Sears.com offers a wide selection of all styles except sneakers
  • Most retailers offer all styles

Price Discounts

The discount ratio is the percentage of a brand's shoes currently available at a discounted price, shown here for the top 20 brands. It is calculated by dividing the number of a brand's products with an 'on-sale' value of True by the total number of products that brand offers.

  • Brands with the highest discount ratios: Easy Spirit, Tommy Hilfiger, New Balance
  • Over half of the top brands have between 30% and 50% of their shoes on sale
  • If you're looking for a popular athletic shoe at a discount, Easy Spirit or New Balance would be a much better option than Nike.

Here the discount ratio is the percentage of shoes in each style currently available at a discounted price. It is calculated by dividing the number of products with an 'on-sale' value of True by the total number of products in that shoe style.

  • It comes as a surprise that sneakers are the most discounted style
  • A bit less surprising is that boots are the least discounted, given that they are the most expensive shoe style
  • Heels vs Flats: You're probably more likely to find a pair of heels on sale than flats
    • Perhaps demand is higher for flats because they're generally more comfortable than heels, so they don't go on sale as often
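
Both discount ratios above are the same calculation grouped by a different field, so they can be sketched with one helper. This is an illustration in plain Python, not the code behind the app: the function name `discount_ratio`, the field names `brand`, `style`, and `on_sale`, and the sample rows are all made up for the example.

```python
from collections import defaultdict


def discount_ratio(products, key):
    """Return {group: share of products on sale} for the given grouping key.

    `products` is a list of dicts; each dict must contain `key` and a
    boolean "on_sale" field (hypothetical field names).
    """
    totals = defaultdict(int)
    on_sale = defaultdict(int)
    for product in products:
        group = product[key]
        totals[group] += 1
        if product["on_sale"]:
            on_sale[group] += 1
    # Products on sale divided by all products in the group.
    return {group: on_sale[group] / totals[group] for group in totals}


# Made-up rows for illustration only; not taken from the dataset.
sample = [
    {"brand": "Brand A", "style": "sneakers", "on_sale": True},
    {"brand": "Brand A", "style": "boots", "on_sale": False},
    {"brand": "Brand B", "style": "sneakers", "on_sale": True},
    {"brand": "Brand B", "style": "sneakers", "on_sale": False},
    {"brand": "Brand B", "style": "flats", "on_sale": True},
]

by_brand = discount_ratio(sample, "brand")
by_style = discount_ratio(sample, "style")
```

Multiplying each value by 100 gives the percentage form used in the charts.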
