If the Shoe Fits

Posted on Nov 16, 2018


Have you ever been out shoe shopping and wondered whether your favorite brand ever goes on sale? Ever been curious whether women's heels or flats are more expensive? As an avid shoe lover and buyer, I'm always hunting for a good deal on a pair of shoes. To make more data-driven decisions before purchasing, I wanted to explore and visualize data on women's shoes sold online. A sample dataset is available on Kaggle: it lists 10,000 women's shoes and their product information from 2014-2017, drawn from Datafiniti's product database. (Note that this is a sample from a database of 93 million product listings with 300 million price offers from thousands of online retailers.)

I used this Shiny project as an opportunity to analyze the relationship between a shoe's price and its style, brand, and merchant.


For the purposes of this analysis I set the following price ranges:

  • Cheap: $0.00 - $40.00
  • Mid-Priced: $41.00 - $150.00
  • Luxury: $151.00 +
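
The tiers above can be expressed as a small helper. This is a minimal sketch in Python, not the code behind the Shiny app. The function name `price_tier` is hypothetical. Because the ranges as listed leave gaps between $40.00 and $41.00 and between $150.00 and $151.00, the sketch assumes that $40.00 and $150.00 are inclusive upper bounds.

```python
def price_tier(price: float) -> str:
    """Map a shoe price in dollars to one of the three tiers used in this post."""
    if price < 0:
        raise ValueError("price must be non-negative")
    # Assumption: $40.00 and $150.00 are inclusive upper bounds, which
    # resolves the gaps between the ranges as listed.
    if price <= 40.00:
        return "Cheap"
    if price <= 150.00:
        return "Mid-Priced"
    return "Luxury"


if __name__ == "__main__":
    for p in (19.99, 40.50, 89.00, 151.00):
        print(f"${p:.2f} -> {price_tier(p)}")
```

Under this assumption a $40.50 shoe counts as mid-priced; a different rounding rule would move prices that fall in the gaps.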

Additionally, I filtered the dataset to include only the top 20 brands and merchants by product count. To start off, let's take a look at the price distributions across women's shoe styles:

Note: Prices up to the green dotted line fall in the 'cheap' range, prices between the green and red dotted lines fall in the 'mid' range, and prices above the red dotted line fall in the 'luxury' range.

  • Boots, unsurprisingly, have the highest prices
    • Boots also appear to have the widest price distribution, with sandals not far behind
  • The majority of styles are offered at cheap and mid-price ranges. All styles are also available at luxury prices

How do brands vary in price range and style?

  • The first 3 brands listed (Dearfoams, Bamboo, C Label) exclusively offer shoes in the cheap price range
  • About half (or 9/20) of the top brands don’t offer sneakers
    • Most of the brands that do offer sneakers sell sneakers exclusively
  • Fewer than half (8/20) of the top brands offer expensive shoe styles
  • Flats are the only style that top brands don’t offer in the expensive price range
  • Mid-priced shoes seem to be the most common among top brands

How do merchants vary in price range and style?

Note: y-axis = price

  • Ralph Lauren clearly offers the widest price range across styles
  • Sears.com is the only multi-brand retailer offering luxury-priced shoes
    • It's also interesting to note that Sears.com offers a wide selection of all styles except sneakers
  • Most retailers offer all styles

Price Discounts

For each of the top 20 brands, the discount ratio is the percentage of that brand's shoes currently available at a discounted price. It is calculated by dividing the number of the brand's products with an 'on-sale' value = True by the total number of products the brand offers.

  • Brands with the highest discount ratios: Easy Spirit, Tommy Hilfiger, New Balance
  • Over half of the top brands have 30% to 50% of their shoes on discount
  • If you’re looking for a cheaper but popular athletic shoe, Easy Spirit or New Balance would be a much better option than Nike
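
The discount ratio described above can be sketched in a few lines. This is an illustrative Python version under stated assumptions, not the project's own code. The field names `brand`, `style`, and `on_sale` and the helpers `top_groups` and `discount_ratio` are hypothetical, and the sample rows are made up rather than taken from the Kaggle dataset.

```python
from collections import Counter, defaultdict


def top_groups(products, key, n=20):
    """Return the n values of `key` with the highest product counts."""
    counts = Counter(p[key] for p in products)
    return [value for value, _ in counts.most_common(n)]


def discount_ratio(products, key):
    """Share of products flagged as on sale, grouped by `key` (e.g. brand)."""
    on_sale = defaultdict(int)
    total = defaultdict(int)
    for p in products:
        total[p[key]] += 1
        if p["on_sale"]:
            on_sale[p[key]] += 1
    return {group: on_sale[group] / total[group] for group in total}


if __name__ == "__main__":
    # Made-up rows for illustration only; not taken from the Kaggle dataset.
    sample = [
        {"brand": "Brand A", "style": "Sneakers", "on_sale": True},
        {"brand": "Brand A", "style": "Boots", "on_sale": False},
        {"brand": "Brand B", "style": "Sneakers", "on_sale": True},
        {"brand": "Brand B", "style": "Flats", "on_sale": True},
        {"brand": "Brand C", "style": "Heels", "on_sale": False},
    ]
    # Keep only the brands with the most products, then compute the ratio.
    keep = set(top_groups(sample, "brand", n=2))
    filtered = [p for p in sample if p["brand"] in keep]
    print(discount_ratio(filtered, "brand"))
```

Grouping by `"style"` instead of `"brand"` would give a per-style ratio with the same function.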

For each style, the discount ratio is the percentage of shoes in that style currently available at a discounted price. It is calculated by dividing the number of products with an 'on-sale' value = True by the total number of products in that shoe style.

  • It comes as a surprise that sneakers are the most discounted style
  • A bit less surprising is that boots are the least discounted, given that they are the most expensive shoe style
  • Heels vs. flats: You're more likely to find a pair of heels on sale than a pair of flats
    • Perhaps demand is higher for flats because they're generally more comfortable than heels, so they don't go on sale as often
