Data Analysis on Athletic Shoes and Sneakers

Posted on Apr 29, 2019
The skills the author demonstrated here can be learned by taking the Data Science with Machine Learning bootcamp at NYC Data Science Academy.


Using data from …, I was able to paint a picture, brand by brand, of what consumers value in top-selling men’s athletic shoes and sneakers. My motivation stems from wanting to know what consumers of men’s athletic shoes and sneakers have to say about them. I wanted to analyze which qualities consumers tend to base their purchasing decisions on. To do this, I posed some questions of interest. Which brands reign supreme in different ratings, such as comfort and style? What do consumers value in shoes? Is it the price? What about arch support? Could it be the true size feeling, true width feeling, or style?

True size feeling tells us whether consumers thought a shoe was too small, too large, or just right. Similarly, true width feeling tells us whether consumers thought a shoe was too narrow, too wide, or just right. I also wanted to see what consumers thought about different products, so I asked, “What are consumers saying about the shoes?” Finally, are there trends based on different shoe qualities? Without further ado, let’s examine how the data for this project was collected, cleaned, and analyzed. We will then look at what the findings revealed.


Using the Scrapy web-crawling framework in Python, I collected data on the top 200 selling men’s athletic shoes and sneakers from …. After collecting basic information on each shoe, such as price, brand name, and product name, I scraped the ratings and comments from its individual reviews. The data was then cleaned and sorted in descending order by true size feeling, true width feeling, and arch support. This structured data was used to generate summary statistics for the shoe features, plots showing how the ratings vary and how they correlate with one another, and WordClouds showing what consumers are saying about different shoe brands.
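The post does not include its cleaning code. The sketch below uses assumed field names to show what the last two steps might look like in Python. It parses a scraped price string into a number, then sorts the records in descending order by true size feeling, then true width feeling, then arch support. The `Shoe` record, `clean_price`, `rank`, and the sample products are all hypothetical. The Scrapy crawl itself is not reproduced.

```python
from dataclasses import dataclass


@dataclass
class Shoe:
    # Hypothetical record layout; these field names are illustrative only.
    brand: str
    product: str
    price: float
    true_size: float     # fit score out of 100
    true_width: float    # fit score out of 100
    arch_support: float


def clean_price(raw: str) -> float:
    """Convert a scraped price string such as ' $52.99 ' into a float."""
    return float(raw.strip().lstrip("$").replace(",", ""))


def rank(shoes: list[Shoe]) -> list[Shoe]:
    """Sort by true size, then true width, then arch support, all descending."""
    return sorted(
        shoes,
        key=lambda s: (s.true_size, s.true_width, s.arch_support),
        reverse=True,
    )


shoes = [  # hypothetical products, not scraped data
    Shoe("Brand X", "Shoe A", clean_price(" $50.00 "), 83.0, 92.0, 70.0),
    Shoe("Brand Y", "Shoe B", clean_price("$75.00"), 84.0, 91.0, 65.0),
    Shoe("Brand Z", "Shoe C", clean_price("$55.00"), 83.0, 92.0, 72.0),
]
print([s.product for s in rank(shoes)])  # ['Shoe B', 'Shoe C', 'Shoe A']
```

Because the sort key is a tuple, a later field only breaks ties in the earlier ones. In the example, Shoe A and Shoe C tie on size and width, so arch support decides their order.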

Data Findings

After analyzing the shoe data, I identified the top 5 selling brands and products. They are listed below:

Below are some statistics for different shoe attributes of the 19 unique brands that comprise the top 200 selling products:


From the table above, we can see that most of the top-selling shoes are priced at $65 or less. On average, customers rate the shoes as having a good true size feeling and an even better true width feeling. Notice how the mean arch support rating drops compared to those two ratings. This suggests consumers are not very satisfied with the arch support most shoes offer. We also see that consumers are, in general, generous when rating shoe style. They appear most critical of comfort and less critical in their overall and style ratings.
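As a rough sketch of how such a summary table can be produced, the helper below uses only Python’s `statistics` module. It computes the count, mean, sample standard deviation, minimum, median, and maximum of a list of values. A second helper gives the share of products at or below a price threshold, which is the kind of figure behind “priced at $65 or less.” The function names and sample prices are hypothetical. With pandas, `DataFrame.describe()` produces a similar table.

```python
import statistics


def describe(values):
    """Count, mean, sample std, min, median and max of a list of numbers."""
    return {
        "count": len(values),
        "mean": statistics.mean(values),
        "std": statistics.stdev(values),  # sample (n - 1) standard deviation
        "min": min(values),
        "median": statistics.median(values),
        "max": max(values),
    }


def share_at_or_below(values, threshold):
    """Fraction of values that are less than or equal to threshold."""
    return sum(v <= threshold for v in values) / len(values)


prices = [45.0, 50.0, 55.0, 60.0, 90.0]  # hypothetical product prices
print(describe(prices))
print(share_at_or_below(prices, 65))  # 0.8
```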


Let’s now look at how the 19 unique brands performed individually in true size and true width feeling, along with their mean product price. I produced horizontal barplots showing each brand’s performance in these three categories, with each brand’s mean value listed next to its bar.
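The number behind each bar is a per-brand mean. Below is a minimal sketch of that grouping step in plain Python. It assumes each product is a dict with a `brand` key; the field names and sample rows are hypothetical. Sorting the result gives the bar order of a horizontal barplot. In pandas, the equivalent would typically be `df.groupby("brand")["price"].mean().sort_values(ascending=False)`.

```python
from collections import defaultdict
from statistics import mean


def brand_means(rows, field):
    """Mean of one attribute per brand, sorted from highest to lowest."""
    groups = defaultdict(list)
    for row in rows:
        groups[row["brand"]].append(row[field])
    means = {brand: mean(values) for brand, values in groups.items()}
    return sorted(means.items(), key=lambda item: item[1], reverse=True)


rows = [  # hypothetical products, not scraped data
    {"brand": "Brand X", "price": 50.0},
    {"brand": "Brand X", "price": 56.0},
    {"brand": "Brand Y", "price": 80.0},
]
for brand, value in brand_means(rows, "price"):
    print(f"{brand:<8} {value:6.2f}")
```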


Looking at the “Mean Price by Brand” plot, we can see right away that Vans and Converse sit at the lower end of the price spectrum, at $53.03 and $52.99, respectively. Their low prices and timeless styles make them quite popular, so it’s no surprise that together they hold 4 of the top 5 selling products. Joining them are Saucony Originals and SKECHERS. SKECHERS has long been known for highly affordable shoes that serve general-purpose athletics. Saucony, a pioneer of running shoes, seems to have begun making cheaper shoes to increase sales.

In the “Mean True Size Feeling by Brand” plot, we see high-quality names like ASICS, Brooks, Nike, and New Balance leading the way at 86.5, 88.23, 83.97, and 83.71, respectively. The classic Vans joins them with a value of 82.93. Poorer performers include Superga at 56.0 and Converse at 63.13. I once owned a pair of Converse Chuck Taylor All Star Core Ox and found that they did not fit true to their labeled size; they ran rather large.
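The post reports true size and true width feeling as scores out of 100 but does not say how they are derived. One plausible reading is the percentage of reviewers who said a shoe fit “just right.” That reading is an assumption, not the author’s stated method. The sketch below computes a score that way; the answer label and the function name are hypothetical.

```python
from collections import Counter

# Hypothetical answer label; the site's actual wording is not given in the post.
JUST_RIGHT = "just right"


def fit_score(answers):
    """Percentage (0-100) of reviewers whose fit answer was 'just right'.

    Assumption: one possible definition of "true size feeling"
    (or "true width feeling"), not the author's confirmed formula.
    """
    if not answers:
        raise ValueError("no fit answers to score")
    counts = Counter(answer.strip().lower() for answer in answers)
    return 100 * counts[JUST_RIGHT] / len(answers)


size_answers = ["just right", "too large", "Just Right", "just right", "too small"]
print(fit_score(size_answers))  # 60.0
```

Under this reading, a brand-level value such as those in the plot would be the mean of its products’ scores.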

In the “Mean True Width Feeling by Brand” plot, we can see that the 19 brands perform well as a group. Leaders in this category include Saucony Originals at 91.99, Vans at 91.89, Nike at 91.73, Converse at 91.54, and SKECHERS at 91.29. This was expected, since makers of athletic shoes strive for nominal size and width fits. Judging from the plots, Vans seems adept at making properly fitting, everyday, stylish shoes at affordable prices; it has found its niche in the shoe market. Superga performed poorly in true width feeling, with a score of 71.0, and I recommend that it overhaul its shoe design process. Its customers are critical of how its shoes actually fit in both size and width.

Overall Rating vs Price

Below are a couple of jointplots and a boxplot that I made with the Python data visualization libraries Seaborn and Matplotlib. A jointplot shows a scatterplot of two variables along with a histogram of each variable along the margins. In the examples below, the variables of interest are mean overall rating vs price and mean style rating vs price. The tables next to the plots give statistics on different ratings for the top 5 selling brands.


The jointplot of overall rating vs price shows that reviews are most heavily concentrated at overall ratings of 4-5 stars for shoes in the $45-60 price range. Among the top 5 brands, Converse has the highest mean overall rating, 4.73, while Nike has the lowest, 4.41. Their standard deviations are 0.68 and 1.12, respectively. This suggests that, of the top 5, Nike customers were the most critical and the most likely to give lower overall ratings, while Converse customers were the least critical and the most likely to give high overall ratings.
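The comparison of Converse and Nike rests on each brand’s mean and standard deviation of star ratings. A minimal sketch of that computation follows, using hypothetical ratings rather than the scraped reviews. It uses `statistics.stdev`, the sample standard deviation. The post does not say whether its figures use the sample version or the population version (`statistics.pstdev`).

```python
from statistics import fmean, stdev


def rating_summary(ratings_by_brand):
    """Mean and sample standard deviation of star ratings per brand."""
    return {
        brand: (round(fmean(ratings), 2), round(stdev(ratings), 2))
        for brand, ratings in ratings_by_brand.items()
    }


ratings = {  # hypothetical 1-5 star ratings, not the scraped reviews
    "Brand A": [5, 5, 5, 4, 5],
    "Brand B": [5, 3, 5, 2, 5],
}
print(rating_summary(ratings))  # {'Brand A': (4.8, 0.45), 'Brand B': (4.0, 1.41)}
```

A larger standard deviation means the ratings are more spread out. In this example, Brand B’s customers disagree with one another more than Brand A’s do, and they also give lower ratings on average.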

The jointplot of style rating vs price shows that consumers are, for the most part, generous with style ratings; the histogram of style ratings shows that most consumers gave 5 stars for style.

As with overall rating, reviews are most heavily concentrated at style ratings of 4-5 stars for shoes in the $45-60 price range. Converse leads the top 5 brands in style with a mean rating of 4.90, while Nike comes last with 4.67. Consumers are very pleased with the style of Converse shoes. A standard deviation of 0.36 tells us that they consistently give high ratings and are not very critical of Converse’s style. Nike has the highest standard deviation in style rating, at 0.717. This suggests Nike customers are more critical of its style and more likely to give lower ratings.
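Claims such as “reviews are concentrated at 4-5 stars for shoes in the $45-60 range” and “most consumers gave 5 stars for style” come from reading a jointplot’s scatter and its marginal histograms. The sketch below reproduces both readings numerically from hypothetical `(price, stars)` pairs. It counts the reviews at each star level, then computes the fraction of reviews that fall in a given price band with at least a given number of stars. The helper names and data are illustrative only.

```python
from collections import Counter


def rating_histogram(ratings):
    """Number of reviews at each star level from 1 to 5 (the marginal histogram)."""
    counts = Counter(ratings)
    return {star: counts.get(star, 0) for star in range(1, 6)}


def share_in_cell(reviews, price_lo, price_hi, min_stars):
    """Fraction of (price, stars) reviews inside the price band with stars >= min_stars."""
    hits = sum(
        1
        for price, stars in reviews
        if price_lo <= price <= price_hi and stars >= min_stars
    )
    return hits / len(reviews)


# Hypothetical (price, style stars) pairs, not the scraped data.
reviews = [(52.99, 5), (55.0, 5), (48.0, 4), (90.0, 3), (60.0, 5), (120.0, 2)]
print(rating_histogram([stars for _, stars in reviews]))  # {1: 0, 2: 1, 3: 1, 4: 1, 5: 3}
print(round(share_in_cell(reviews, 45, 60, 4), 2))        # 0.67
```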

Comfort Rating

The boxplot shows the variation in comfort ratings for the top 5 brands. Because of the outliers, it is difficult to gauge consumer opinion on comfort from the plot alone, so a table of rating statistics is given on the right. Interestingly, Converse and Nike swap places for comfort. Converse led the top 5 in overall and style ratings but comes last here, while Nike came last in both and comes first.
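A boxplot draws the quartiles of the data and marks any points beyond the whiskers as outliers. The sketch below computes those pieces with Python’s `statistics.quantiles` (Python 3.8+). It assumes Matplotlib’s defaults, which Seaborn’s boxplot builds on: quartiles use linear interpolation between data points, which `method="inclusive"` reproduces, and whiskers reach at most 1.5 times the interquartile range. The function name and ratings are hypothetical.

```python
import statistics


def box_stats(values, whis=1.5):
    """Quartiles and outliers as a default Matplotlib-style boxplot would draw them."""
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    low_fence, high_fence = q1 - whis * iqr, q3 + whis * iqr
    # Points outside the fences are drawn individually as outliers ("fliers").
    outliers = [v for v in values if v < low_fence or v > high_fence]
    return {"q1": q1, "median": median, "q3": q3, "outliers": outliers}


comfort = [5, 5, 5, 5, 4, 5, 5, 1, 4, 5]  # hypothetical comfort ratings
print(box_stats(comfort))  # {'q1': 4.25, 'median': 5.0, 'q3': 5.0, 'outliers': [1]}
```

Star ratings pile up at 4 and 5, so the interquartile range is narrow and even a single 1-star review falls outside the whiskers. This illustrates why a table of means and standard deviations can be easier to read than the boxplot for this kind of data.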

Converse received the lowest comfort rating, with a mean score of 4.49, along with the lowest standard deviation, 0.81. This suggests consumers are less satisfied with the comfort of Converse shoes than with that of the other top brands. This may be attributed to the shoes’ poor arch support; we’ll see evidence of this below. The low standard deviation tells us that people who wear Converse largely agree with one another that the shoes aren’t as comfortable as shoes from the other four leading brands.

Nike has the highest comfort rating, with a mean score of 4.53, along with the highest standard deviation, 0.97. I expected Nike to lead here, since as a master maker of athletic shoes across many sports it should have a competitive edge in comfort and arch support. The high standard deviation, however, tells us that its customers disagree considerably with one another.

Word Clouds

The following WordClouds, made with the WordCloud package for Python, show what consumers are saying about products from different brands.
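A word cloud sizes each word by how often it appears. The sketch below shows the counting step with the standard library: it lowercases the review text, splits it into words, drops common stopwords, and tallies the rest. The stopword set and sample reviews are hypothetical. The wordcloud package ships its own, larger `STOPWORDS` set. A `Counter` like the one returned here is a dict, so it can be passed to `WordCloud.generate_from_frequencies` to draw the image.

```python
import re
from collections import Counter

# Small illustrative stopword set (hypothetical); wordcloud.STOPWORDS is much larger.
STOPWORDS = {"a", "and", "are", "i", "is", "it", "my", "of", "the",
             "them", "these", "they", "to", "very"}


def word_frequencies(reviews, stopwords=STOPWORDS):
    """Count lowercased words across review texts, skipping stopwords."""
    counts = Counter()
    for text in reviews:
        for word in re.findall(r"[a-z0-9']+", text.lower()):
            if word not in stopwords:
                counts[word] += 1
    return counts


reviews = [  # hypothetical review snippets
    "Love these shoes, very comfortable!",
    "Comfortable and great. Love them.",
]
freqs = word_frequencies(reviews)
print(freqs["love"], freqs["comfortable"], freqs["great"])  # 2 2 1
```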


In the Vans WordCloud, we see words such as love, comfortable, great, feel, slip on, good, and recommend. This supports our earlier finding that Vans is great at making affordable, stylish, properly fitting everyday shoes. People seem happiest with the slip-on shoes.


In the Converse WordCloud, we see words such as love, good, great, classic, Chuck Taylor, and arch support. As noted in the discussion of the comfort rating boxplot above, and as seen in the WordCloud, consumers feel the arch support of Converse shoes is a bit weak. Consumers still love the timeless look Converse has to offer, however, and seem particularly fond of the Chuck Taylor shoes.


In the Saucony WordCloud, we see words such as great, running, comfortable, fit, look, good, love, and M574. This may indicate that consumers are very pleased with the M574 running shoe because it is highly affordable, comfortable, properly fitting, and stylish. Saucony’s running shoes tend to be on the pricier end, so its push to deliver cheaper shoes seems to be paying off.

Conclusion and Future Pursuits

To recap:

  • People love a timeless look. Everyone needs a back-to-basics option to fall back on, and Converse has that classic feel that transcends time.
  • Consumers prioritize affordability and style when it comes to decision making.
  • Everyday, all-purpose sneakers in the $40-60 price range sell well.
  • Companies should look to improve comfort in shoes. Consumers may not shout it out loud, but they will always appreciate great arch support.
  • Comfort is the third pillar of a shoe’s success, after affordability and style. Athletic shoes have a better true fit, while leisure shoes rank higher in style. The “athleisure” craze makes sense now: people love style that is comfortable and doesn’t break the bank. Since mean comfort ratings are lower than style and overall ratings, comfort is where companies have the most to gain.


This project provided a lot of valuable insight into men’s athletic shoes and sneakers. But, as always, there is more to learn from a larger and more varied analysis. With more data and time, I would explore the connection between best-selling products and social media influencers. I would also use sentiment analysis and more WordClouds to see how shoe trends change over time. Finally, I would repeat the entire process to see what makes athletic shoes and sneakers sell in the women’s market.

Thanks for reading!

About Author

Sashank Gummella

Sashank graduated from the University of Illinois in May of 2018 with a Bachelor of Science degree in Aerospace Engineering. He's had the privilege of interning at NASA Langley Research Center, where he was involved with the design...